Jack Jaffe

Why DLP Projects Fail

Data Loss Prevention (DLP) products have been generally available for some time, yet their reputation amongst security professionals is not good. The products are known for resource-intensive implementations that yield small improvements in security.




The core issue seems to be the failure of the discovery mechanisms built into these products,
which classify data inaccurately and generate a high number of false positives. These
misclassifications annoy end users and cost the project organizational support by inhibiting
the flow of information needed to conduct business.

The challenges faced by legacy DLP products have only been exacerbated by the movement of
workloads and storage from the organization's premises to a variety of cloud and SaaS providers.


So why has such a popular security technology been so challenged to accurately understand the
data that flows through its systems? This paper looks at several key reasons.

Rules Are Made To Be Broken

DLP products often use rules or policies to determine the sensitivity and classification of
organizational data. A typical implementation can take six months or longer as these rules are
determined and configured, and then several months more to accommodate the inevitable exceptions.

For example, it may be tempting to label all employee health care information as data that is
too sensitive to leave the organizational network. While this may be good for privacy, it may
prevent your claims administrator from getting support from your insurance company for your
employees. So you make an exception for a particular individual. But this exception fails
when that individual changes jobs or the responsibility for claims moves from (for example) HR
to Finance.

Any change in personnel, organizational structure, or workflow can render any number of DLP
rules moot. The dynamics of a busy enterprise almost guarantee the obsolescence of the
resulting policies by the time the project is complete.
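
The brittleness of a person-specific exception can be sketched in a few lines of Python. This
is an illustration only, not how any particular DLP product stores its policies: the Rule
class, the "employee_health" label, and the example.com addresses are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Rule:
    """Hypothetical DLP rule: block one data label unless the sender is excepted."""
    label: str
    allowed_senders: set = field(default_factory=set)

    def allows(self, sender, label):
        # Data carrying a different label is outside this rule's scope.
        if label != self.label:
            return True
        return sender in self.allowed_senders


# The exception is granted to one named individual (hypothetical address).
rule = Rule(label="employee_health", allowed_senders={"alice@example.com"})

# While Alice administers claims, the exception works as intended.
works_at_first = rule.allows("alice@example.com", "employee_health")

# Claims responsibility moves to Bob in Finance, and nobody updates the rule.
bob_blocked = not rule.allows("bob@example.com", "employee_health")
alice_still_allowed = rule.allows("alice@example.com", "employee_health")
```

After the reorganization the rule is wrong in both directions: the new claims owner is
blocked, and the former owner keeps an exception that is no longer needed.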

This interference with normal business operations is one of many major downsides of DLP.
The more aggressively the security team adds and updates rules to regulate sensitive data,
applications and user actions, the more often false positives occur, resulting in employee backlash.

False Positives From Patterns and Regex

DLP products will often try to classify data based on familiar patterns or regex. Regex is short
for regular expression, a string of text that defines a pattern used to match, locate, and
manage data.
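
As a minimal sketch of what pattern-based detection looks like, the Python below uses the
standard re module with two simplified patterns. The patterns, the find_sensitive helper, and
the sample text are assumptions made for illustration; commercial products use their own,
much larger pattern sets.

```python
import re

# Simplified, hypothetical patterns for illustration only.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")           # e.g. 078-05-1120
CARD_PATTERN = re.compile(r"\b(?:\d{4}[ -]?){3}\d{4}\b")     # 16 digits in groups of 4


def find_sensitive(text):
    """Return the substrings of text that match each pattern, keyed by label."""
    return {
        "ssn": SSN_PATTERN.findall(text),
        "card": CARD_PATTERN.findall(text),
    }


sample = "SSN 078-05-1120, card 4111 1111 1111 1111, order 12345."
hits = find_sensitive(sample)
```

Here hits maps each label to the text it matched. Note that the patterns only describe the
shape of the data; they know nothing about what the digits actually mean.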

Some types of sensitive information, such as credit card and Social Security numbers, follow a
predictable structure and can be detected programmatically. However, this approach is highly
error prone. First, very different items can look very similar. A ten-digit number such as
2132397219 could be a phone number, a bank routing number, a product part code or SKU, an
employee number, or a great many other things.
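
The ambiguity is easy to reproduce. In the sketch below, three hypothetical patterns, each
plausible on its own, all match the same ten-digit value. The pattern names and definitions
are assumptions for illustration and are not taken from any product.

```python
import re

# Hypothetical patterns; each is a reasonable guess at one kind of identifier.
PATTERNS = {
    "us_phone": re.compile(r"[2-9]\d{2}[2-9]\d{6}"),   # area code + exchange + line
    "bank_account": re.compile(r"\d{8,12}"),            # account numbers vary in length
    "employee_or_part_id": re.compile(r"\d{10}"),       # internal ten-digit identifier
}


def candidate_labels(value):
    """Return every label whose pattern matches the entire value."""
    return [label for label, pattern in PATTERNS.items() if pattern.fullmatch(value)]


labels = candidate_labels("2132397219")
```

All three labels come back for this value, so the pattern alone cannot say which one is
correct. Whatever label the system picks will be wrong for most ten-digit numbers it sees.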

Second, information like bank account numbers can take many different forms in practice, so
even dozens of detection patterns may not catch them all, and may instead quarantine a lot of
harmless information in the process. For example, DLP might encounter this telephone
number (8183415285) and identify it as a bank account, a false positive. An outgoing email
attachment containing this telephone number might be blocked, causing a slowdown in the business
where none is warranted.

Third, DLP products can struggle to detect sensitive information when it has been altered by
accident or intent. The insertion of random characters or spaces can cause the system to miss
sensitive information, allowing it to pass freely when it should not. For credit cards, a
classic exfiltration bypass is to spell out the credit card number (“nine six one…”), change the
credit card number to Wingdings font, or rewrite it as Roman numerals. It is easy to think up
ways to get past DLP pattern matching.
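
A short Python sketch shows how little it takes. The card pattern, the is_flagged check, and
the spell_out helper are hypothetical and deliberately simple; the point is only that a
trivial transformation defeats a shape-based match.

```python
import re

# Hypothetical card pattern: 16 digits, optionally grouped by spaces or hyphens.
CARD_PATTERN = re.compile(r"\b(?:\d{4}[ -]?){3}\d{4}\b")

DIGIT_WORDS = ["zero", "one", "two", "three", "four",
               "five", "six", "seven", "eight", "nine"]


def is_flagged(text):
    """Return True if the text contains something shaped like a card number."""
    return CARD_PATTERN.search(text) is not None


def spell_out(number):
    """Replace each digit with its English word, dropping any other characters."""
    return " ".join(DIGIT_WORDS[int(ch)] for ch in number if ch.isdigit())


original = "Card: 4111 1111 1111 1111"
with_noise = "Card: 4111x1111x1111x1111"               # stray characters inserted
spelled = "Card: " + spell_out("4111 1111 1111 1111")  # digits written as words

results = [is_flagged(original), is_flagged(with_noise), is_flagged(spelled)]
```

Only the unmodified text is flagged. Both altered versions carry exactly the same
information to a human reader and pass through unnoticed.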

Fourth, the vast majority of valuable IP and personal data does not have obvious markers that
give the DLP something to grab hold of. Things like source code, trade secrets, key
supplier lists, financial records, patent applications and more consist of unique data strings
and cannot easily be fit into a pattern. The net result is that DLP ends up focusing on the
lowest common denominator of information, leaving whole swaths of information entirely
unprotected.

People Are No Help

DLP solutions will often try to fill the gaps left by rules, patterns and regex with manual
intervention. This introduces the inconsistency of human opinion into your security scheme.
Two people may not judge the sensitivity of a certain document the same way. Further,
employees are likely to view the extra documentation requirement as a burden and not
participate fully, or at all.