Machine learning and privacy: The bumpy road from proof of concept to production

Details a supervised machine learning (ML) solution, the PII filter, created by the computer consultant Netquest.

Introduction

In the era of Big Data, when companies handle vast amounts of data to build valuable, comprehensive profiles of their clients, clickstream data takes on an important role. Data providers seek to acquire detailed information on online behavior, but such data can contain personally identifiable information (or PII). Moreover, since the arrival of the GDPR, giving away such information is not only unethical but also illegal. It is therefore paramount to have a technique that detects those personal bits of information and strips them from the dataset. To address this problem, we came up with a supervised machine learning (ML) solution called "PII filter", which we presented at ESOMAR Fusion 2018. The model's results so far have been remarkably good, so we decided to put it into production. In this paper, we present the strategy we chose to deploy our service and make it accessible to our clients, be they internal or external.