Haviv Ohayon

What Is NLP and Why Does It Matter To You?

Learn how a disruptive new technology, Natural Language Processing, is bringing new visibility and control to the world of corporate data.

Image Credits: Suridata

Hello, Dr. Reichart. Please tell us about yourself.

I am an Associate Professor at the Faculty of Industrial Engineering and Management of the Technion. I received my Ph.D. from the Hebrew University and was a postdoctoral fellow at MIT and the University of Cambridge. I have worked with several companies, such as ExB, Yahoo, and Gong. I usually join startups in their early stages and help build the technology.

In your own words, what is NLP?

NLP is an acronym that stands for Natural Language Processing, a field that develops technologies and algorithms that can understand human language. NLP deals with both written text and spoken language (conversations, interactions). The field relies heavily on Machine Learning (ML). For example, a system that has been trained on many texts should be able to tell whether a new text expresses positive or negative sentiment.
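To make the sentiment example concrete, here is a minimal sketch of one classic ML technique for it, a multinomial Naive Bayes classifier with add-one smoothing, trained on a handful of made-up hotel reviews. The class name, training texts, and labels are hypothetical and chosen only for illustration; this is not the method used by any product mentioned in this interview, and real systems are trained on far more data, usually with more powerful models.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Lowercase and keep runs of letters/apostrophes as words.
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayesSentiment:
    """Hypothetical multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self) -> None:
        self.word_counts: dict[str, Counter] = {}  # label -> word frequencies
        self.doc_counts: Counter = Counter()       # label -> number of texts
        self.vocab: set[str] = set()

    def fit(self, texts: list[str], labels: list[str]) -> None:
        for text, label in zip(texts, labels):
            self.doc_counts[label] += 1
            counts = self.word_counts.setdefault(label, Counter())
            for word in tokenize(text):
                counts[word] += 1
                self.vocab.add(word)

    def predict(self, text: str) -> str:
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label, counts in self.word_counts.items():
            # Log prior plus the sum of smoothed log likelihoods of each word.
            score = math.log(self.doc_counts[label] / total_docs)
            denom = sum(counts.values()) + len(self.vocab)
            for word in tokenize(text):
                score += math.log((counts[word] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical training data: four labeled reviews.
texts = [
    "I loved this hotel, great service",
    "Wonderful stay, friendly staff",
    "Terrible room and rude staff",
    "Awful experience, I hated it",
]
labels = ["positive", "positive", "negative", "negative"]

clf = NaiveBayesSentiment()
clf.fit(texts, labels)
print(clf.predict("The staff were friendly and the service was great"))
print(clf.predict("Rude staff and a terrible room"))
```

With only four training texts the classifier simply learns which words tend to co-occur with each label; the same idea scales up when it is trained on thousands of examples.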

How does Suridata integrate NLP into its products?

Our goal is to discover sensitive information in our clients’ data sources (on-prem and cloud). A lot of sensitive information just sits on servers, local hosts, and other places you might never think of. For example, hotel systems may contain information about their guests’ allergies and visitors, and this information should not fall into malicious hands. To find such information, Suridata’s product uses NLP algorithms to search across all systems, sometimes going over thousands of documents, and identify which ones contain sensitive data.

Other companies also use NLP algorithms. What is unique about your product?

Finding sensitive information poses several challenges. First and foremost, each client is different, and their sensitive data looks and behaves differently. For example, a bank’s sensitive information does not look like a real estate company’s. Suridata does not serve only one type of client but all of them, because its NLP algorithm continually evolves and improves itself based on each customer’s data.

But what if the client does not know whether they have sensitive information? Can your algorithm handle that?

Every AI learns from samples. If you give it enough samples, it knows what to look for. One of the first things we discovered in our early stages is that organizations don’t always know what they have. There could be hundreds of sensitive files sitting on some remote, unused server that need to be protected. Our job is to locate those files, even if the client never thought of them.

Do you have an example of this?

Yes. Recently, we ran the algorithm on an organization’s data, but we received a very limited set of samples. We usually need about 10-15 samples for the algorithm to learn what the unique sensitive content looks like, but we received only three. Not only did the algorithm find documents with sensitive information, it also found files that the organization didn’t know about, which were written in a different format but still included sensitive information, as if a real person had gone over all the documents and marked them.
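The interview does not describe how Suridata’s algorithm works. Purely to illustrate the general idea of learning from a few samples, the sketch below uses a simple, swapped-in baseline: it represents each text as a bag of words, compares every document with a handful of known sensitive samples using cosine similarity, and flags documents whose best match exceeds a threshold. The function names, the 0.3 threshold, the file paths, and all sample data are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Lowercase and keep runs of letters, digits, and apostrophes.
    return re.findall(r"[a-z0-9']+", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two word-count vectors.
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def find_sensitive(samples: list[str], documents: dict[str, str],
                   threshold: float = 0.3) -> list[tuple[str, float]]:
    # Hypothetical detector: flag documents similar to any known sample.
    sample_vecs = [Counter(tokenize(s)) for s in samples]
    flagged = []
    for name, text in documents.items():
        vec = Counter(tokenize(text))
        score = max(cosine(vec, s) for s in sample_vecs)
        if score >= threshold:
            flagged.append((name, round(score, 2)))
    # Most similar documents first.
    return sorted(flagged, key=lambda item: item[1], reverse=True)


# Three hypothetical labeled samples of sensitive hotel records.
samples = [
    "Guest name: Dana Levi. Allergy: peanuts. Room 214.",
    "Guest name: Omer Katz. Allergy: gluten. Room 305.",
    "Guest name: Noa Cohen. Allergy: shellfish. Room 118.",
]
# Hypothetical documents found on a client's servers.
documents = {
    "old_server/guests_2019.txt": "Room 402 guest name Yael Bar allergy lactose",
    "shared/menu.txt": "Today's menu: pasta, salad, and dessert",
    "hr/notes.txt": "Team meeting moved to Thursday",
}

print(find_sensitive(samples, documents))
```

Here the old guest record is flagged even though its layout differs from the samples, because it shares key words such as "guest", "name", "allergy", and "room". A word-overlap measure like this would miss sensitive files that use different vocabulary, which is one reason practical systems tend to rely on learned text representations rather than raw word counts.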

So, do you think that one day AI will learn how we read and write and emulate us?

No, I think there is still a long way to go. Right now, AI can do relatively simple things, but nothing truly complex. Even so, AI can do wondrous things, such as forecasting NBA players’ behavior during a game based on the interviews they gave beforehand, and several socially oriented AI applications can identify people with suicidal tendencies based on their online activity. But we are a long way from being replaced by AI.