What's Missing from DLP

By David Gibson

In most organizations today, there is sensitive data that is overexposed and vulnerable to misuse or theft, leaving IT in an ongoing race to prevent data loss. Packet sniffers, firewalls, virus scanners, and spam filters are doing a good job securing the borders, but what about insider threats? The threat of legitimate authorized users unwittingly (or wittingly) leaking critical information just by accessing data that is available to them is all too real. Analyst firms such as IDC estimate that unstructured data, which makes up 80 percent of organizational data, will grow by 650 percent over the next five years.

The risk of data loss is increasing beyond this explosive rate, as more dynamic, cross-functional teams collaborate, and data is continually transferred between network shares, email accounts, SharePoint sites, mobile devices, and other platforms. As a result, security professionals are turning to data loss prevention (DLP) solutions for help. Unfortunately, organizations are finding that these DLP solutions, in many cases, fail to fully protect critical data because they focus on symptomatic, perimeter-level solutions and not a much deeper problem: the fact that users have inappropriate or excessive rights to sensitive information.

DLP Alone is Not a Panacea: DLP solutions primarily focus on classifying sensitive data and preventing its transfer with a three-pronged technology approach:

Endpoint protections encrypt data on hard drives and disable external storage to stop data from escaping via employee laptops and workstations.
Network protections scan and filter sensitive data to prevent it from leaving the organization via email, HTTP, FTP, and other protocols.
Server protections focus on content classification and identifying sensitive files that need to be protected before they have a chance to escape.
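
As a rough illustration of the server-side content classification described above, the following Python sketch flags text that contains patterns resembling U.S. Social Security numbers or payment card numbers. The patterns, labels, and function names are assumptions made for this example only; commercial DLP products rely on far richer rule sets and are not implemented this way.

```python
import re

# Hypothetical patterns for illustration only; real rule sets are much broader.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
CARD_PATTERN = re.compile(r"\b(?:\d[ -]?){13,16}\b")


def luhn_valid(number):
    """Return True if the digits in the string pass the Luhn checksum."""
    digits = [int(ch) for ch in number if ch.isdigit()]
    checksum = 0
    for index, digit in enumerate(reversed(digits)):
        if index % 2 == 1:  # double every second digit, counting from the right
            digit *= 2
            if digit > 9:
                digit -= 9
        checksum += digit
    return len(digits) >= 13 and checksum % 10 == 0


def classify_text(text):
    """Return the set of sensitive-data labels found in the text."""
    labels = set()
    if SSN_PATTERN.search(text):
        labels.add("ssn")
    # Only count card-like digit runs that also pass the checksum,
    # which filters out many random long numbers.
    if any(luhn_valid(m.group()) for m in CARD_PATTERN.finditer(text)):
        labels.add("card_number")
    return labels
```

Even a simple scan like this shows why classification alone produces so many alerts: it says what a file contains, but nothing about who owns it or who uses it.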

This approach works well if an organization knows who owns all the sensitive data and who is using it. Since that is almost never the case, once the sensitive data is identified, which in the average-sized organization can take months, IT is left with the monumental job of finding out who the sensitive data belongs to, who has and should have access to it, and who is using it. These questions must be answered in order to identify the highest priority sensitive data (that is, the data-in-use) and to determine the appropriate data loss prevention procedures.
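
One way to picture the prioritization step is a simple ranking that combines classification results with exposure and usage. The sketch below is only an illustration: the `FolderRecord` fields, the scoring formula, and the weights are all assumptions for this example, not a method taken from any DLP product.

```python
from dataclasses import dataclass


@dataclass
class FolderRecord:
    path: str
    sensitive_hits: int     # sensitive matches found by content classification
    users_with_access: int  # number of accounts that can open the folder
    recent_accesses: int    # access events seen in the review window


def risk_score(record):
    """Hypothetical score: sensitive data that is widely exposed and in use ranks highest."""
    if record.sensitive_hits == 0:
        return 0
    in_use_weight = 2 if record.recent_accesses > 0 else 1
    return record.sensitive_hits * record.users_with_access * in_use_weight


def prioritize(records):
    """Return the records that carry risk, highest score first."""
    risky = [r for r in records if risk_score(r) > 0]
    return sorted(risky, key=risk_score, reverse=True)
```

A folder with sensitive content that many people can reach and that is actively being opened rises to the top of the list, which matches the article's point that data-in-use deserves attention first.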

Early solutions that focused primarily on endpoint and network protections were quickly overwhelmed by the massive amounts of data traversing countless networks and devices. Unfortunately, DLP’s file-based approach to content classification is cumbersome at best. Upon implementing DLP, it is not uncommon to have tens of thousands of “alerts” about sensitive files.

The challenge doesn’t stop here. Select an alert at random. The sensitive files involved may have been auto-encrypted and auto-quarantined, but what comes next? Who has the knowledge and authority to decide the appropriate access controls? Who are we now preventing from doing their jobs? How and why were the files placed here in the first place?

DLP solutions provide little context about data usage, permissions, and ownership, making it difficult for IT to proceed with sustainable remediation. IT does not have the information it needs to make decisions about accessibility and acceptable use on its own. And even if the information were available, it is not realistic to make these kinds of decisions for every file.

Digital collaboration is essential for organizations to function successfully, so the reality is that sensitive files are being used to achieve important business objectives. For that to happen safely, sensitive data must be stored somewhere that allows people to collaborate with it, while ensuring that only the right people have access and that their use of sensitive data is monitored.

Context is King: When an incident occurs or an access control issue is detected, organizations should not be required to turn their business into a panic room. Rather, solutions to prevent data loss need to enable the personnel with the most knowledge about the data, the data owners, to take the appropriate action to remediate risks quickly, in the right order. To do this, organizations need enterprise context awareness, including knowledge of who owns the data, who uses the data, and who should and should not have access.

Managing and protecting sensitive information requires an ongoing, repeatable process. The analyst firm Forrester refers to this as protecting information consistently with identity context (PICWIC).

The central idea of PICWIC is that data is assigned to business owners at all times. When identity context is combined with data management, organizations can provision new user accounts with correct levels of access, recertify access entitlements regularly, and take the appropriate actions when an employee changes roles or is terminated. By following the PICWIC best practices, the chances of accidental data leakage are dramatically reduced while lifting a substantial burden from IT.

Advanced and Comprehensive DLP: The concept of PICWIC and the resulting policies and procedures that it enables are most promising, but the question remains of how to implement PICWIC and improve DLP implementations. The key to providing the necessary context lies in metadata. Organizations need to collect and analyze the required metadata non-intrusively, automate workflows and report generation, and have a reliable operational plan to follow.

With the recent advancements in metadata technology, data governance software is providing organizations with the ability to improve DLP implementations by not only automating the process of identifying sensitive data, but also simultaneously showing what data is in use and who is using it, providing the needed context for comprehensive DLP.

By non-intrusively and continuously collecting critical metadata such as permissions, user and group activity, access, and sensitivity and then synthesizing this information, data governance software provides visibility never before available with traditional DLP implementations. When data governance software is used in conjunction with traditional DLP software, implementations move faster, and sensitive data is more accurately identified and protected.
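
To make the idea of synthesizing metadata concrete, the sketch below combines three of the metadata types mentioned above (permissions, activity, and sensitivity) to list users who hold access to sensitive folders but have not used it during a review window. The data shapes and the function name are assumptions for illustration; this is not how any particular data governance product works.

```python
def find_excess_access(permissions, activity, sensitive_paths):
    """
    permissions: dict mapping a path to the set of users who can access it
    activity: iterable of (user, path) access events observed in the review window
    sensitive_paths: set of paths flagged as containing sensitive data

    Returns a dict mapping each sensitive path to a sorted list of users
    who have access but showed no activity on that path.
    """
    active_users = {}
    for user, path in activity:
        active_users.setdefault(path, set()).add(user)

    report = {}
    for path in sorted(sensitive_paths):
        unused = permissions.get(path, set()) - active_users.get(path, set())
        if unused:
            report[path] = sorted(unused)
    return report
```

A report like this gives a data owner a short, reviewable list of candidates for access removal instead of a file-by-file pile of alerts.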

With over 23 million records containing personally identifiable information leaked in 2011 alone, it is more important than ever for organizations to ensure sensitive data is secure. Regulatory moves such as the European Union’s recent decision to fine businesses that breach its privacy rules up to two percent of their global turnover make it imperative for organizations to ensure their DLP practices are quick, comprehensive, and continuous.

Integrating data governance software automation into existing or new DLP implementations not only ensures sensitive data is secure, but it also provides a speed and scale that traditional DLP cannot achieve. Because data governance software automatically adjusts as changes to file structures and activity profiles occur, access controls to shared data are always current and based on business needs. As a result, the fundamental step to data loss prevention is addressed, limiting what data makes its way to laptops, printers, and USB drives in the first place. That way, efforts to further protect data via filtering and encryption can be focused more efficiently on only those items that are valuable, sensitive, and actively being accessed.

David Gibson is the director of strategy at Varonis, a provider of unstructured and semi-structured data governance for file systems, SharePoint and NAS devices, and Exchange servers.

[From the April/May 2012 issue of AnswerStat magazine]