Data Privacy and the Data Professional

This was the first of three sessions I attended at SQL Saturday 712.

Session link: http://www.sqlsaturday.com/712/Sessions/Details.aspx?sid=77101

Martin Catherall gave a great talk on the ever-important subject of data privacy. Developers and SQL Administrators may find themselves with more access than they need. Recent versions of SQL Server give us more tools to help manage and respect other people’s data, especially PII.

Martin unwrapped these recent features and put them in the context of GDPR (General Data Protection Regulation). I found the timing fascinating, because GDPR took effect the previous day.

There are at least five key rights (known as Data Subject Rights) to be met:

  1. Right to Delete*
  2. Right to Edit
  3. Right to Restrict
  4. Right to Export
  5. Right to View/Access

We could say one of the major sources of confusion around data management is Security vs Convenience. It is a common assumption that most people click through privacy policies / terms and conditions / EULAs without reading. Some people even brag about it. As a result, they skip over the security implications of the data they choose to share, often to third parties without really “knowing” what is happening to their data.

*GDPR’s right to delete has interesting implications for logs, backups, and immutable data stores, but I’ll leave this thought for now.

Companies sell your data. Surprised?

The common perception of how data is treated is innocent. We ask “are all businesses data-driven?” And are surprised when people find out about data-mining and start getting worried. It was in the policies you just accepted!

The quip “If you are not paying, you are the product” (disputed sources: Television Delivers People / Andrew Lewis) seems obvious to people who use technology often. It is clear this is not communicated clearly to non-expert users.

All of these privacy and security implications extend to developers and other people working with data. We should utilize our tools to protect this data throughout its life-cycle. So, let’s look at what SQL Server offers…

Data Privacy in SQL Server

From a SQL Server/Cloud perspective, we can consider features like Azure Key Vault, Always Encrypted, and Transparent Data Encryption (TDE). Row Level Security (RLS) is also a very notable feature.

Azure Key Vault

Azure Key Vault enables teams to keep passwords and other secrets safe, in a world which seems dead set on moving almost everything to the cloud. Passwords sent in emails are no longer secure, and sticky notes can be observed (accidentally or otherwise).

There are plenty of password management tools in general (be sure to search for reviews before using one). However, since Key Vault part of the Azure ecosystem, you may already have access. It has all of the features you might expect from Microsoft: import/export, automation for certificates, auditing and monitoring, and more.

Always Encrypted

Arriving in SQL Server 2016, Always Encrypted allows us to store data without being able to inspect it. I.e., applications managing the data have keys to work with the data, but those keys are never shared with SQL Server. Always Encrypted works at the application driver level. This means it is effectively middleware between the application and the database server. Encryption and decryption are automatic.

This allows database administrators to manage the storage, performance, and health of SQL Server without putting sensitive data at risk.

Always Encrypted is per-column: you can encrypt sensitive columns and leave other columns as-is.

The keys used by the Always Encrypted driver need to be stored in a compatible Key Store. The MSDN documentation lists “Azure Key Vault, Windows Certificate Store on a client machine, or a hardware security module” as options (see how this all links?). The application driver will work with the key store transparently as needed.

Like all things, there are considerations noted in the documentation, specifically around migrating secure data.

Transparent Data Encryption

Transparent Data Encryption (TDE) allows us to encrypt data at rest (the files on disk are encrypted). While GDPR does not mandate encryption, it is a recommendation to consider encryption as part of overall security. Besides noting that TDE is encrypting the database files on disk, it’s really a technical/implementation process. The benefits of encrypting the databases on disk are straight-forward: if the keys are protected, and the files are comprised, attackers won’t be able to access the data by simply restoring the database files.

Row Level Security

Row Level Security (RLS) really piqued my interest. RLS potentially has a part to play when we consider restricting access on a per-user or per-group basis. This is common for enterprise applications. RLS allows us to hide rows in tables based on a lookup such as user ID. The lookup predicate can join to other tables, allowing for complex access controls such as geographically-limited access. Since RLS is built-in, implementation transfers much better than home-grown solutions. It acts like an invisible filter. If you type “SELECT * FROM…” it feels like an extra WHERE clause tacked on at the end. Used correctly, this could really remove some of the permissions-based struggles I’ve seen. In particular, with developers having access to non-test sets of data, even in side-by-side systems.

The impact of custom solutions to this problem include knowledge debt, technical debt, and sometimes complete failure if an edge case skips the check (usually an “empty” case or “*”/”everything” case).

I like the recognition of this need by Microsoft. Providing a standard implementation for a common problem is one of their strong suits. This is much the same as implementing OAuth 2.0 using OWIN middleware: a lot easier than doing it yourself! However, there are some drawbacks, namely, the profiler. This is worth repeating:

RLS doesn’t protect you from the profiler

RLS cannot filter external observers. E.g., SQL Server Profiler.

Data masking

Martin also discussed the need to mask personally identifiable information (PII). Citing RedGate, he noted a (recent) survey showed 2 out of 3 companies were testing with production data in development environments. A good strategy for avoiding this can be found on RedGate’s blog at Protecting production data in non-production environments.

Martin discussed three approaches to dealing with this.

Manually mask PII

This leads to eventual gaps in the ad hoc process as new PII fields are added over time. PII can sometimes be placed into generic fields (“comments”, “notes”) and so slip through unmasked.

Create your own data

The usual objection to this is that manual test data doesn’t exercise the same code paths that production data does. I’d like to point out that a deficiency in test suite data doesn’t give you the right to use others’ PII.

Product placement: automatic masking

RedGate (and presumably other companies) provide tools where you can set up field mappings to transform PII. E.g., by randomizing names. You can adjust the transformations to maintain existing distributions. These tools also suffer from gaps over time, but enforcing data movement through these tools implies examining the results. It also suggests you’ll manage the suite of transformations as part of your overall data management strategy.

Data privacy processes vs culture

Even with good processes in place, your company culture could sabotage efforts to respect PII and meet your data privacy obligations.

E.g., in a company where multiple people share an account password, you can no longer restrict access to data appropriately. In this case, each person should have an account with the appropriate access.

Where we can automate or remove the ability for people to access sensitive data, we no longer have to worry about enforcement and compliance. Set up the system to protect the data, and then safely work within your access levels.

Conclusion

We’ve always had the responsibility to look after the data with which we are entrusted. GDPR gives us a kick to do it properly. New versions of SQL Server provide a suite of powerful tools and integration opportunities to safe-guard data in a standard way.

Unfortunately, each session at SQL Saturday was only an hour long, otherwise I’d hope to hear more from Martin. You can find Martin at SQL Down Under, where they feature training, mentoring, and free stuff!

A big thanks to Martin for taking the time to present these important features.

Advertisements