Data marts vs. GDPR

Introduction
One of the "attractions" of our modern world is the GDPR. I am sure everyone has heard of it, so I will not go into detail. From a data warehouse management point of view, its essence is, put very simply, this: because of a statutory requirement, a company may not keep the personal data of a customer who no longer has any business with it beyond a set retention period (8 years in our case). Every company has its own in-house data processing solution, and we had to implement GDPR-compliant data processing for large corporate clients in large data warehouses with hundreds of tables, including the deletion of customers' personal data from the data mart layers of these warehouses.
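The retention rule above boils down to a date comparison. The following is a minimal Python sketch, assuming each customer record carries the date on which the customer relationship ended (`None` for active customers). The function names and the input format are hypothetical, and the 8-year default is the period mentioned in the text, not a constant defined by the GDPR itself.

```python
from datetime import date
from typing import Dict, List, Optional

# Retention period mentioned in the text; an assumption here, since the actual
# period depends on the applicable law and the company's own policy.
RETENTION_YEARS = 8


def add_years(d: date, years: int) -> date:
    """Shift a date by whole years; Feb 29 becomes Feb 28 in a non-leap target year."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        return d.replace(year=d.year + years, day=28)


def is_due_for_deletion(relationship_ended: Optional[date], today: date,
                        retention_years: int = RETENTION_YEARS) -> bool:
    """True if the relationship has ended and the full retention period has elapsed."""
    if relationship_ended is None:  # still an active customer: keep the data
        return False
    return add_years(relationship_ended, retention_years) <= today


def customers_due_for_deletion(ended_by_customer: Dict[int, Optional[date]],
                               today: date) -> List[int]:
    """Return the (hypothetical) customer IDs whose personal data must be masked."""
    return sorted(cid for cid, ended in ended_by_customer.items()
                  if is_due_for_deletion(ended, today))
```

Treating the day the period elapses as "due" is a choice made for this sketch; a stricter or looser reading of the rule only changes the `<=` comparison.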
To someone with less experience of data warehouses, this may not seem a complicated task at first. For me, however, having devoted essentially my whole career to this field, it was somewhere between very difficult and unsolvable.
How? What? When?
Suppose we receive from the customer record system a list of the customers whose personal data must be deleted. A simpler alternative to physically deleting these customers from the related tables is to mask their data irreversibly. The next step is to know exactly where in the data mart layer the relevant data is stored and which fields must be deleted, and to decide what kind of deletion or masking solution to use. After a thorough analysis we may, depending on the complexity of the data warehouse, identify as many as a hundred relevant data marts.
One approach follows the principle "where the problem arises, there needs to be a solution": tackle the problem locally in every data mart by adding a new processing step. However, this is not the best solution, because of time constraints, because normal business operations cannot be stopped, and because of the enormous demand on human resources. A more general and better approach is a central solution, in which masking can be enabled for a data mart through parameters, after which the task is performed automatically. It is important to incorporate this parameterization into the development standard, so that newly developed data marts support it from the start.
A complicating factor is that the first round must mask the historical data that has accumulated over up to a decade; only after that can the fresh data be addressed.
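The central, parameterized approach can be sketched as a single driver that reads one configuration entry per data mart and masks the listed personal columns for the given customers. This is a minimal Python sketch using SQLite, assuming each data mart table has a customer key column; `MaskRule`, `mask_customers`, the `***` mask value and the `dm_sales` table are hypothetical and are not the production solution described here.

```python
from __future__ import annotations

import re
import sqlite3
from dataclasses import dataclass

MASK = "***"  # hypothetical fixed mask value; the original values cannot be recovered from it

_IDENT = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


@dataclass(frozen=True)
class MaskRule:
    """One parameter entry: which data mart table to mask and how (hypothetical schema)."""
    table: str
    key_column: str
    personal_columns: tuple[str, ...]
    enabled: bool = True


def _check(name: str) -> str:
    # Table and column names cannot be bound as SQL parameters, so validate them
    # even though they come from the central configuration.
    if not _IDENT.match(name):
        raise ValueError(f"invalid identifier: {name!r}")
    return name


def mask_customers(conn: sqlite3.Connection, rules: list[MaskRule],
                   customer_ids: list[int]) -> dict[str, int]:
    """Irreversibly overwrite personal columns for the given customers in every enabled mart.

    Returns the number of updated rows per table.
    """
    if not customer_ids:
        return {}
    masked: dict[str, int] = {}
    placeholders = ", ".join("?" for _ in customer_ids)
    with conn:  # one transaction: commit if all marts succeed, roll back on any error
        for rule in rules:
            if not rule.enabled:  # masking not switched on for this data mart
                continue
            set_clause = ", ".join(f"{_check(c)} = ?" for c in rule.personal_columns)
            sql = (f"UPDATE {_check(rule.table)} SET {set_clause} "
                   f"WHERE {_check(rule.key_column)} IN ({placeholders})")
            params = [MASK] * len(rule.personal_columns) + list(customer_ids)
            masked[rule.table] = conn.execute(sql, params).rowcount
    return masked


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE dm_sales (customer_id INTEGER, name TEXT, email TEXT, amount REAL)")
    conn.executemany("INSERT INTO dm_sales VALUES (?, ?, ?, ?)",
                     [(1, "Anna", "anna@example.com", 10.0), (2, "Bela", "bela@example.com", 20.0)])
    rules = [MaskRule("dm_sales", "customer_id", ("name", "email"))]
    print(mask_customers(conn, rules, [1]))  # {'dm_sales': 1}
    print(conn.execute("SELECT * FROM dm_sales ORDER BY customer_id").fetchall())
```

In this sketch, enabling masking for a new data mart means adding one `MaskRule` entry rather than writing a new processing step. The same call could first process the historical backlog (every customer already past the retention period) and then run regularly for newly eligible customers. With millions of records, a production version would more likely join against a staging table of customer IDs and work in batches, since SQLite, for example, limits the number of bound parameters per statement.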
Postface
It was a long, hard road to the first live deployment, but after many successful deletion rounds for one of our clients and many millions of masked records, the solution now runs smoothly. Our client's data mart layers comply with the GDPR.
Gábor Ébert
Office: 1139 Budapest, Lomb u. 15, 3rd floor
info@eisys.hu | +36 1 611 4117
EiSYS Számítástechnikai Kft. | HU13669555