- Create a new tag of the type 'Custom HTML'. Use the JavaScript below.
- The tag (a) reads the title and URL of the page on which it fires, (b) replaces the previously defined personal data in both, using the function mentioned above, (c) updates the URL in the browser and/or the page title if personal data were found, and (d) pushes a 'piiRedacted' event to the dataLayer, together with the new URL.
<script>
(function () {
  var PII = {{PII}};
  var URL = {{Page URL}};

  // Redact the URL and update the address bar only if something changed.
  var newURL = {{return redactData function}}(URL, PII);
  if (newURL !== URL) {
    window.history.replaceState({}, document.title, newURL);
  }

  // Redact the page title in the same way.
  var title = document.title;
  var newTitle = {{return redactData function}}(title, PII);
  if (newTitle !== title) {
    document.title = newTitle;
  }

  // Signal that the URL and title are now free of personal data.
  window.dataLayer = window.dataLayer || [];
  window.dataLayer.push({
    "event": "piiRedacted",
    "Page URL": newURL
  });
})();
</script>
- Create a trigger based on the "piiRedacted" event.
Create a new trigger of type 'Custom Event' based on the 'piiRedacted' event. This event indicates the moment that all actions in the Custom HTML tag from above have been executed - at this moment the URL and title of the webpage are completely free of personal data.
- Replace the existing 'All Pages' trigger with the new 'piiRedacted' trigger.
To ensure that other tags do not fire until personal data have been removed from the URL and page title, replace the 'All Pages' trigger on existing tags with the new trigger based on the 'piiRedacted' event.
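The flow of the Custom HTML tag can also be sketched as plain JavaScript, which makes it easy to try outside GTM. This is a minimal sketch under assumptions: `redactData` is a hypothetical stand-in for the function returned by the {{return redactData function}} variable, the `pii` array holds a single illustrative e-mail pattern, and `page` and `dataLayer` are plain objects standing in for the browser state and `window.dataLayer`.

```javascript
// Hypothetical stand-in for the function returned by {{return redactData function}}:
// apply every pattern in the list to the input string.
function redactData(text, patterns) {
  return patterns.reduce(function (result, pattern) {
    return result.replace(pattern.regex, pattern.replacement);
  }, text);
}

// Illustrative pattern list in the same shape as the PII variable.
var pii = [{
  name: 'EMAIL',
  regex: /[^\s\/?&=#]+@[^\s\/?&=#]+\.[a-z]{2,}/gi,
  replacement: '[REDACTED]'
}];

// Stand-ins for the page state and window.dataLayer.
var page = {
  url: 'https://www.domein.nl/bedankt?email=siemon@i-spark.nl',
  title: 'Thanks siemon@i-spark.nl'
};
var dataLayer = [];

function runTag(pageState, patterns, layer) {
  var newUrl = redactData(pageState.url, patterns);
  if (newUrl !== pageState.url) {
    pageState.url = newUrl; // the real tag calls history.replaceState here
  }
  var newTitle = redactData(pageState.title, patterns);
  if (newTitle !== pageState.title) {
    pageState.title = newTitle; // the real tag sets document.title here
  }
  layer.push({ event: 'piiRedacted', 'Page URL': newUrl });
}

runTag(page, pii, dataLayer);
console.log(page.url, page.title, dataLayer[0]);
```

Because the event is pushed only after both replacements have run, any tag that waits for 'piiRedacted' sees the cleaned URL and title.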
Tool-independent whitelists - how do I do that?
With tool-independent whitelists you define the data that are not personal data (the whitelist). You then turn the whitelist into blacklist patterns, because you want to replace every value that is not on the whitelist. An example will make this clear.
As an illustration, I want to replace the value of all URL parameters with "[REDACTED]", except for the parameters "foo" and "bar" - this is my whitelist. Specifically, this means that the URL "https://www.domein.nl?foo=waarde&bar=waarde&email=siemon@i-spark.nl&foobar=waarde" will be replaced by "https://www.domein.nl?foo=waarde&bar=waarde&email=[REDACTED]&foobar=[REDACTED]". Below I explain step by step how you can achieve this using Google Tag Manager:
- Define your blacklist
- Turn your whitelist into a blacklist. The goal is to replace all parameters except those on the whitelist, so express 'all parameters except the whitelist' as a regular expression. In a similar way you can specify a pattern that matches every word except those on a whitelist.
- Create a new variable of the type "Custom JavaScript macro" and call it "PII".
- Within the new variable, define an array containing an object with 3 keys: 'name', 'regex' and 'replacement'. For the name key, enter a string describing the type of data that does not appear in the whitelist. In this case I use 'NON-WHITELISTED PARAMETER'. For the regex key, specify the regular expression that matches the data not on the whitelist - in this case all parameters except 'foo' and 'bar'. For the replacement key, specify the string with which the matched data is to be replaced. A "$" followed by a number refers to the capturing group with that number, whose match is kept in the replacement.
- Return the defined array.
function () {
  var piiRegex = [{
    name: 'NON-WHITELISTED PARAMETER',
    regex: /([?&](?!((foo|bar)=))[^=]+=)([^&$#])+/gi,
    replacement: "$1[REDACTED]"
  }];
  return piiRegex;
}
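A quick way to check the variable is to run its pattern against the example URL in a browser console or Node.js. The sketch below assumes the array is assigned to a local `piiRegex` and applies each entry with `String.prototype.replace`; the loop is only an illustration, not the redactData function itself.

```javascript
// Same array as the PII variable returns.
var piiRegex = [{
  name: 'NON-WHITELISTED PARAMETER',
  regex: /([?&](?!((foo|bar)=))[^=]+=)([^&$#])+/gi,
  replacement: '$1[REDACTED]'
}];

var url = 'https://www.domein.nl?foo=waarde&bar=waarde&email=siemon@i-spark.nl&foobar=waarde';

// Apply every entry in turn; $1 keeps the "&name=" part of each match.
var redacted = url;
piiRegex.forEach(function (entry) {
  redacted = redacted.replace(entry.regex, entry.replacement);
});

console.log(redacted);
// https://www.domein.nl?foo=waarde&bar=waarde&email=[REDACTED]&foobar=[REDACTED]
```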
The regular expression has become more complicated because it uses a negative lookahead. For the enthusiast, here is the expression bit by bit:
- ([?&](?!((foo|bar)=))[^=]+=)
First capturing group: match a "?" or "&" that is not followed by "foo=" or "bar=", then the parameter name up to and including "=".
- [?&]
Match a "?" or "&".
- (?!((foo|bar)=))
Negative lookahead: match the above only if it is not followed by the expression inside the lookahead.
- ((foo|bar)=)
The string "foo" or "bar" followed by "=". "foo|bar" is your whitelist!
- [^=]+=
Match each character except "=" at least once, up to and including "=".
- ([^&$#])+
Final capturing group (number 4, because the two groups inside the lookahead are counted as well): match each character except "&", "$" or "#" at least once. Inside a character class "$" is a literal dollar sign, not the end of the string; the match simply stops when the string ends.
- The 'g' means 'global'. In other words, search (and replace) all matches within the string instead of just the first match.
- The 'i' indicates that the regular expression is case-insensitive.
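The effect of the two flags is easiest to see by toggling them. The following sketch builds the same pattern with different flag combinations; the test URL with an upper-case "FOO" parameter and a "phone" parameter is an assumption made for illustration.

```javascript
// Same pattern as above, as a string so the flags can be varied.
var source = '([?&](?!((foo|bar)=))[^=]+=)([^&$#])+';
var testUrl = 'https://www.domein.nl?FOO=1&email=a&phone=b';

// 'g' and 'i': every non-whitelisted parameter is replaced, FOO counts as foo.
var withBoth = testUrl.replace(new RegExp(source, 'gi'), '$1[REDACTED]');

// Only 'i': the replacement stops after the first match.
var withoutGlobal = testUrl.replace(new RegExp(source, 'i'), '$1[REDACTED]');

// Only 'g': the whitelist is case-sensitive, so FOO is redacted as well.
var withoutIgnoreCase = testUrl.replace(new RegExp(source, 'g'), '$1[REDACTED]');

console.log(withBoth);
// https://www.domein.nl?FOO=1&email=[REDACTED]&phone=[REDACTED]
console.log(withoutGlobal);
// https://www.domein.nl?FOO=1&email=[REDACTED]&phone=b
console.log(withoutIgnoreCase);
// https://www.domein.nl?FOO=[REDACTED]&email=[REDACTED]&phone=[REDACTED]
```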
Applying the above regular expression to the URL
"https://www.domein.nl?foo=waarde&bar=waarde&email=siemon@i-spark.nl&foobar=waarde" gives two matches, namely "&email=siemon@i-spark.nl" and "&foobar=waarde". After all, there are 2 parameters that do not match the whitelist within the negative lookahead, namely "email" and "foobar". I only want to replace the parameter value with "[REDACTED]", so the 1st capturing group of each match - "&email=" and "&foobar=" - must be preserved. I do this by replacing the full regex matches with "$1[REDACTED]".
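To inspect the matches and their first capturing group directly, the pattern can be run in a loop with `RegExp.prototype.exec`. This is a small sketch for the console; the `matches` array and its `full` and `group1` keys are names chosen here for illustration.

```javascript
var regex = /([?&](?!((foo|bar)=))[^=]+=)([^&$#])+/gi;
var url = 'https://www.domein.nl?foo=waarde&bar=waarde&email=siemon@i-spark.nl&foobar=waarde';

// With the 'g' flag, exec() continues from the previous match on each call.
var matches = [];
var match;
while ((match = regex.exec(url)) !== null) {
  matches.push({ full: match[0], group1: match[1] });
}

console.log(matches);
// [ { full: '&email=siemon@i-spark.nl', group1: '&email=' },
//   { full: '&foobar=waarde', group1: '&foobar=' } ]
```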
Steps b to e remain the same for the tool-independent whitelisting, as described above for the tool-independent blacklisting.
Takeaways
- The Personal Data Authority uses a broad definition for 'personal data'. This requires measures to prevent the processing of personal data as much as possible. A combination of blacklisting and whitelisting is recommended.
- The 'replace' function in combination with regular expressions makes it easy to replace personal data. This method can be applied tool-independently, before other scripts are loaded, and is therefore very efficient.
- The tool-independent solution using regular expressions supports both blacklists and whitelists. For whitelisting you can use a negative lookahead.
Would you like to have help implementing the above mentioned solutions? Or do you want advice on what is the best solution for your business? Get in touch!
Siemon