Big Data Systems House Sensitive Data, Security Exposures

SANS Institute survey concludes 73% of big data systems contain personal information, recommends tighter controls.

Big data systems are invading enterprise data centers at a rapid rate, but they often lack the controlled access, data encryption, and other protections inherent in relational systems, according to a SANS Institute survey of 206 companies. Of the respondents, 43% were from organizations with 10,000 or more employees and 53% held a title related to IT security.

Big data systems increasingly serve as the repository for personal-identification information and corporate intellectual property. For example, the SANS survey found 73% of respondents with big data applications “use them to store personal data on customers and 72% store important business data,” such as employee records (64%), intellectual property (59%), and payment card information (53%).

The result is an exposure that companies may not have counted on as they initiated their pilot big data projects, according to the survey report, “Enabling Big Data By Removing Security and Compliance Barriers,” available here (registration required). Cloudera, the supplier of Hadoop system Cloudera Enterprise, sponsored the SANS survey.

Many times, those projects demonstrate the utility of bringing together diverse data that was previously hard to assemble given the radically different data types. Big data systems gain utility as more data is brought in. The result is a slow brew of gathering risk without sufficient safeguards, the study warns.

[Hortonworks is adding encryption to its big data system. Learn more: Hortonworks Deploys Hadoop Into Public Clouds.]

The SANS Institute is a private company that provides training and certification in cyber-security skills. Its name springs from its initial target group of IT professionals: system administrators and audit, networking, and security managers. The results of the survey were reported by SANS analyst Barbara Filkins, with John Pescatore, SANS director of emerging technologies, acting as an adviser.

Cloudera claims a marketplace lead with its built-in security measures, according to Alex Gutow, a Cloudera product marketing manager. For example, Cloudera is PCI compliant in handling credit card information. Other Hadoop systems have yet to achieve the rating, she said in an interview at the Hadoop Summit in San Jose, Calif., Wednesday.

MasterCard, a Cloudera partner and customer, has been using a PCI-certified enterprise data hub since 2014, said Sam Heywood, director of Cloudera’s Security Center of Excellence in Austin, Texas.

But other Hadoop-based systems are bent on catching up. Hortonworks, in an announcement before the summit, said it has added the protection of encryption for data at rest as well as data in transit to its 2.3 release of the Hortonworks Data Platform. Most big data system suppliers will look at the SANS survey and redouble their efforts to protect data in their systems.

(Image: alengo/iStockphoto)

Among the survey respondents, 27.4% said they were running a big data production system; 10.4% were running a pilot system; 17.4% were engaged in a proof of concept; 28.4% had plans for a system but had not implemented it, usually due to resource issues; and 4.5% had no plans for a big data system. The remaining respondents did not know whether such a system was in the works.

At the same time, 83% of the SANS survey respondents who had a running system said their systems “must comply with one or more regulatory standards.” In 40% of these cases, compliance must be established by external audit.

The stakes are large for all of the system suppliers. The market for big data products, such as Hortonworks, Riak, Couchbase, MongoDB, and Cloudera, is expected to grow from $16.55 billion in 2014 to $41.52 billion by 2018, according to market researcher IDC.

Security was one of the topics addressed by a panel of big data users Thursday, the last day of the Hadoop Summit. Anil Varma, VP of data and analytics for Schlumberger, said imposing user access controls, based on identity and roles, is one way to improve big data security. In order for role restrictions to work, companies will have to practice good data governance. Data must be tagged and segmented as it’s gathered, with personal-identification information having a much higher role restriction than anonymous, click-stream data.

“The next two to three years will be really important on that (data governance),” he said. Due to worries over security, “a lot of this data still hasn’t been brought in,” he noted.
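The tagging and role restrictions Varma describes can be sketched in a few lines of Python. This is a minimal illustration, not how any Hadoop product implements access control: the tag names, role names, and clearance levels are all hypothetical, and a real deployment would take them from its own governance policy.

```python
from dataclasses import dataclass

# Hypothetical sensitivity tags, ordered from least to most restricted.
SENSITIVITY = {"clickstream": 0, "business": 1, "pii": 2}

# Hypothetical roles and the highest sensitivity level each may read.
ROLE_CLEARANCE = {"analyst": 0, "finance": 1, "privacy_officer": 2}


@dataclass(frozen=True)
class Record:
    tag: str      # assigned when the data is gathered
    payload: str


def can_read(role, record):
    """Return True if the role's clearance covers the record's tag."""
    # Unknown roles get no access; unknown tags are treated as most restricted.
    clearance = ROLE_CLEARANCE.get(role, -1)
    level = SENSITIVITY.get(record.tag, max(SENSITIVITY.values()))
    return clearance >= level


def readable(role, records):
    """Filter a batch of records down to those the role may see."""
    return [r for r in records if can_read(role, r)]
```

The sketch fails closed: a record that arrives without a recognized tag is handled as if it were personal data, which is why tagging at ingestion matters for keeping low-risk data usable.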

David Lin, Symantec cloud platform engineer, said his firm needed to protect its data before it could extend services that help customers protect theirs. He urged companies to build up their big data lakes, initially in a restricted fashion, and then figure out how to grant more access to them.

“There’s a lot of uncertainty around security. Kill the fear. Haters to the left. Get started and go. Smart people will figure it out,” he said.

Sam Gentsch, manager of IT at Home Depot, said his firm is imposing a user-access-control framework with fine-grained controls on its Hadoop big data system.

Charles Babcock is an editor-at-large for InformationWeek and author of Management Strategies for the Cloud Revolution, a McGraw-Hill book. He is the former editor-in-chief of Digital News, former software editor of Computerworld and former technology editor of Interactive …



Charlie Babcock,
User Rank: Author
6/12/2015 | 5:00:08 PM

Users want it now, protected or not

The appetite to use big data, and to assemble more useful data in the Hadoop or NoSQL system, grows faster than the concern about protecting it. The big data administrator is caught between wanting to extend the use of his system to more people and protecting what he’s got.

Ulf Mattsson,
User Rank: Strategist
6/12/2015 | 12:10:05 PM

The result is a slow brew of gathering risk without sufficient safeguards

I agree that “Many times, those projects demonstrate the utility of bringing together diverse data that was previously hard to assemble given the radically different data types. Big data systems gain utility as more data is brought in. The result is a slow brew of gathering risk without sufficient safeguards.”

To reach the goal of securing the data while preserving its value, the data itself must be protected at as fine-grained a level as possible. Securing individual fields allows for the greatest flexibility in protecting sensitive identifying fields while allowing nonidentifying information to remain in the clear.

Protecting this information within the enterprise is a significant challenge on its own, but monetizing the data means sending it to one or many other organizations, each of which has its own security profile. Anonymizing privacy data completely may not be feasible in a monetizing scenario, but deidentifying the most sensitive information, e.g., names, social security numbers, birth dates, is vital to protecting the privacy of individuals.

Using data protection methods such as tokenization can also allow businesses to preserve the type and length of the data, and to deidentify only part of a data field while leaving the relevant part in the clear, such as exposing a birth year rather than the entire date. This will keep the data usable for third parties to analyze, while helping to protect the privacy of the individuals who make up the data.
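As a rough illustration of that idea, here is a minimal Python sketch of vault-style tokenization that keeps a field's length and character classes, plus a helper that leaves a birth year in the clear. Everything in it is an assumption made for the example: `TokenVault` and `mask_birth_date` are hypothetical names, the vault is a plain in-memory dictionary, and a production system would use a hardened token store or a vetted format-preserving scheme instead.

```python
import secrets
import string


class TokenVault:
    """Toy in-memory token vault; a real one would be a hardened, access-controlled store."""

    def __init__(self):
        self._to_token = {}  # original value -> token
        self._to_value = {}  # token -> original value

    @staticmethod
    def _swap(ch):
        # Replace digits with digits and letters with letters; keep punctuation.
        if ch.isdigit():
            return secrets.choice(string.digits)
        if ch.isalpha():
            return secrets.choice(string.ascii_letters)
        return ch

    def tokenize(self, value):
        """Return a random token with the same length and character classes as value."""
        if value in self._to_token:
            return self._to_token[value]
        for _ in range(100):  # retry on the rare collision with an existing token
            token = "".join(self._swap(ch) for ch in value)
            if token not in self._to_value:
                self._to_token[value] = token
                self._to_value[token] = value
                return token
        raise RuntimeError("could not find an unused token")

    def detokenize(self, token):
        """Look up the original value; only the vault can reverse a token."""
        return self._to_value[token]


def mask_birth_date(vault, iso_date):
    """Keep the year of a YYYY-MM-DD date in the clear and tokenize the rest."""
    year, rest = iso_date[:4], iso_date[4:]
    return year + vault.tokenize(rest)
```

For example, `vault.tokenize("123-45-6789")` returns a random string that still looks like `ddd-dd-dddd`, and `mask_birth_date(vault, "1975-03-14")` returns a value that starts with `1975-`, so a third party can still group records by birth year without seeing the full date.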

We may not be able to completely prevent hackers from stealing data, but we can make it far more difficult for them to cause significant damage with it. By protecting data at a very fine-grained level—fields or even part(s) of a field—we can continue to reap the benefits of data monetization while putting forth a significant barrier to identity theft.

Ulf Mattsson, CTO Protegrity
