The End of Days in Selling Personally Identifiable Information for Microtargeting has Arrived

realdanielbyrne
6 min read · Jan 14, 2021

With California’s new law placing restrictions and penalties on capturing, sharing, and selling personally identifiable information, the time has come for the IT industry to examine whether it really needs to gather such information.

A strong case can be made for capturing only the meta information and stripping away names, phone numbers, and other personally identifiable information. Given the way the data is used in predictive analytics and machine learning models, it is still just as useful and highly valuable to marketers without a consumer’s name and his son’s birthday ending up in the hands of Cambridge Analytica.
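As a minimal sketch of what stripping that information could look like, the Python snippet below drops directly identifying fields from an event record before it is handed to a model. The field names and the record itself are hypothetical, chosen only for illustration; a real schema would differ.

```python
# Hypothetical list of directly identifying fields; a real schema would differ.
PII_FIELDS = {"name", "phone", "email", "street_address"}

def strip_pii(record: dict) -> dict:
    """Return a copy of the record without the directly identifying fields."""
    return {key: value for key, value in record.items() if key not in PII_FIELDS}

# Made-up example event, not real data.
raw_event = {
    "name": "Jane Doe",
    "phone": "555-0100",
    "email": "jane@example.com",
    "age_bracket": "35-44",
    "region": "TX",
    "interest": "camping",
}

model_input = strip_pii(raw_event)
print(model_input)
```

What remains (age bracket, region, interest) is the kind of meta information the rest of this post argues is sufficient for targeting.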

Targeted Personalized Marketing is Hype that Never Lived up to its Promise

Facebook and Google do not need my personally identifiable data to feed me relevant ads. The data in aggregate is more valuable than the single data point that a person represents. Furthermore, the data these companies collect is time sensitive, and most targeted marketing platforms do not filter out old data effectively. Thus, digital profiles are a confusing mess of old haunts, current trends, and a search for a friend’s birthday present two years ago.
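Filtering out stale data does not have to be complicated. The sketch below, in Python, keeps only the interests observed within a recent window. The 180-day window, the function name, and the sample signals are all assumptions made for illustration, not a description of how any ad platform works.

```python
from datetime import date, timedelta

def recent_interests(signals, today, max_age_days=180):
    """Keep interests whose last observation falls inside the window."""
    cutoff = today - timedelta(days=max_age_days)
    return sorted({interest for interest, seen in signals if seen >= cutoff})

# Made-up (interest, last_seen) pairs.
signals = [
    ("camping gear", date(2021, 1, 5)),
    ("birthday gifts", date(2019, 1, 10)),
    ("Birmingham restaurants", date(2016, 3, 2)),
    ("basketball", date(2020, 12, 20)),
]

print(recent_interests(signals, today=date(2021, 1, 14)))
# ['basketball', 'camping gear']
```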

For example, this snippet from my Google Ad Settings page, which represents the data Google has collected about me and uses to tailor targeted ads to me, is laughably off base.

For starters, they got a few things right. I am a male between the ages of 35–44, and I like American Football (good guess) and basketball.

However, pretty quickly I can start to pick out some inaccuracies in the data. For instance, I am a programmer, but I haven’t used C/C++ in a few years. I’m an Apple guy, so seeing Microsoft on my profile makes me cringe a little bit. I used to live in Birmingham, Alabama, but that was five years ago. I never read celebrity and entertainment news; I’m not into audio equipment, cameras, or comics; I don’t really watch comedy films; and I’ve never used Udemy for anything.

How Predictive Marketing Models are Built

Market researchers use statistical modeling and predictive analytics, from linear regression to machine learning models, to build micro-targeting tools that predict a consumer’s behavior. Researchers take samples of data on people who have bought specific products and build a predictive model based upon a number of statistically significant shared traits. These traits can be as specific as location, sex, age, marital status, or number of children. They may be as broad as an interest in camping, American Football, programming, or anything else uniquely tied to the social group in question.
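To make the idea concrete, here is a deliberately simple Python sketch. It is not a regression or a neural network; it just estimates a purchase rate per group of shared traits, which is the most basic form of the same idea. The traits, the rows, and the function name are invented for illustration.

```python
from collections import defaultdict

def fit_segment_rates(rows):
    """Estimate the purchase rate for each (age_bracket, interest) segment."""
    counts = defaultdict(lambda: [0, 0])  # segment -> [purchases, total]
    for age_bracket, interest, bought in rows:
        tally = counts[(age_bracket, interest)]
        tally[0] += int(bought)
        tally[1] += 1
    return {segment: purchases / total
            for segment, (purchases, total) in counts.items()}

# Made-up sample: (age_bracket, interest, bought_product). No names needed.
rows = [
    ("35-44", "camping", True),
    ("35-44", "camping", True),
    ("35-44", "camping", False),
    ("35-44", "camping", True),
    ("18-24", "camping", False),
    ("18-24", "camping", True),
]

rates = fit_segment_rates(rows)
print(rates[("35-44", "camping")])  # 0.75
print(rates[("18-24", "camping")])  # 0.5
```

Note that nothing in the input identifies a person. The model only ever sees the group.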

While it is appropriate for an individual seller to capture personal information, since they must maintain receipts, credit card transactions, shipping addresses and the like, there is little real incentive to include that data when building these models. The reality is that individuals cease to be individuals when their behaviors are compared to the group. Rather, people tend to fall into readily identifiable categories.

It is thus important to point out that when researchers train these models, personal information such as names and phone numbers is removed from the training data, because it is not helpful and could cause problems like model overfitting or underfitting.

Generally, a model is said to overfit if it is more accurate in fitting known training data but significantly less accurate in predicting new data. One can intuitively understand overfitting from understanding that information from past experience can be divided into information that is relevant to predict future events and information that is irrelevant or noise. Generally, the more difficult a thing is to predict, the more noise exists in past information that should be ignored. Failing to ignore this noise can result in overfitting.
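A toy Python example can show why a name is exactly this kind of noise. The sketch below trains two trivial lookup "models" on made-up data: one keyed on the shopper's name, the other keyed on the shopper's interest. The data, the function names, and the choice to default to "no purchase" for unseen values are all assumptions made for illustration.

```python
from collections import Counter, defaultdict

def fit_lookup(rows, key_index):
    """Learn the majority label for each value found in one column."""
    votes = defaultdict(Counter)
    for row in rows:
        votes[row[key_index]][row[-1]] += 1
    return {key: counter.most_common(1)[0][0] for key, counter in votes.items()}

def accuracy(model, rows, key_index, default=False):
    """Share of rows predicted correctly; unseen keys fall back to `default`."""
    hits = sum(model.get(row[key_index], default) == row[-1] for row in rows)
    return hits / len(rows)

# Made-up rows: (name, interest, bought_tent)
train = [
    ("Ann", "camping", True),
    ("Bob", "camping", True),
    ("Cat", "fashion", False),
    ("Dan", "fashion", False),
    ("Eve", "camping", False),
]
new_shoppers = [
    ("Fay", "camping", True),
    ("Gus", "fashion", False),
    ("Hal", "camping", True),
    ("Ivy", "fashion", False),
]

by_name = fit_lookup(train, key_index=0)
by_interest = fit_lookup(train, key_index=1)

print(accuracy(by_name, train, 0), accuracy(by_name, new_shoppers, 0))          # 1.0 0.5
print(accuracy(by_interest, train, 1), accuracy(by_interest, new_shoppers, 1))  # 0.8 1.0
```

The name-based lookup is perfect on the data it memorized and no better than a fixed guess on new shoppers. The interest-based lookup is slightly worse on the training data and much better on people it has never seen.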

Generalization is the key to developing a good model. You want your predictive models to give the best results for a wide variety of inputs, not the correct result for a small subset of inputs and wildly incorrect results for everything else. So why are companies like Facebook selling this specific information to begin with if it isn’t being used?

Two Real World Examples

The fact is, most people, and by that I mean the people marketers want to target, act like a hive mind. They do, say, and buy what the people around them are doing, saying, and buying, with the exception of the rare few influencers. This is especially true for, say, individuals walking into a mall or individuals shopping for camping equipment on Amazon.

Let’s examine these two real world examples and the power of generalization to see how marketers can use this information in delivering generalized but still micro-targeted ads tailored to a specific subset of consumers.

It is not a mystery what an individual’s motivations are when they walk into a mall. Nor is it easy to mistake the intentions of a shopper on Amazon searching for camping equipment.

A marketer can be fairly confident that a shopper on Amazon looking for camping equipment will be interested in other things associated with camping, for instance outdoor clothing, first aid kits, RV accessories, sunglasses, and bug spray. Furthermore, using cookies and browser history, that same shopper can be followed to Facebook and the Cabela’s website and subsequently pitched time-relevant ads following this intuition. The marketer does not need to know this individual’s name, phone number, or e-mail address. They can simply feed his browser history into a multi-class neural net to predict which things he might purchase as a result of his recent interest in camping equipment.
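As a rough sketch of that last step, the Python below scores product categories from a browsing history using a single linear layer followed by a softmax. This is a stand-in for a trained multi-class neural net, not one: the categories, keywords, and weights are invented by hand for illustration, whereas a real model would learn its weights from data.

```python
import math

CATEGORIES = ["outdoor clothing", "first aid kits", "bug spray", "office chairs"]

# Hypothetical hand-set weights: how strongly each browsed keyword
# points at each category. A real model would learn these from data.
WEIGHTS = {
    "tent": [1.5, 1.0, 1.2, -1.0],
    "sleeping bag": [1.2, 0.5, 0.8, -1.0],
    "standing desk": [-0.5, -0.5, -1.0, 2.0],
}

def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def predict(history):
    """Sum the keyword weights per category, then apply softmax."""
    scores = [0.0] * len(CATEGORIES)
    for keyword in history:
        for i, weight in enumerate(WEIGHTS.get(keyword, [0.0] * len(CATEGORIES))):
            scores[i] += weight
    return dict(zip(CATEGORIES, softmax(scores)))

probs = predict(["tent", "sleeping bag"])
best = max(probs, key=probs.get)
print(best)  # outdoor clothing
```

The only input is a list of browsed keywords. No name, phone number, or e-mail address appears anywhere in the pipeline.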

Likewise, a marketer can be fairly confident that a person walking into a mall is looking to buy either clothes, shoes, lunch at the food court, a coffee at Starbucks, jewelry, a cellphone, or a cookie cake. These potentialities, while diverse, are also limited. That means a marketer can safely assume and confidently pitch any or all of these options while a shopper is in the mall.

That location information could be easily and anonymously gathered if malls would dispense with their antiquated use of captive portals, which slow down and dissuade people from logging onto free WiFi systems. The proprietors of a mall, an airport, or for that matter any commercialized public space should want people to log on to their WiFi.

The monetizing of the captured audience in a mall could go something like this:

  • The local mall reports to Facebook every minute how many people are logged into its public WiFi. This number is an accurate measurement of the number of people in the mall, since the mall does not use a captive portal and allows any user walking into the facility with a WiFi-compatible device to access its public WiFi.
  • Facebook inserts ads into the feeds of individuals scrolling through Facebook as they are walking through the mall.
  • These ads are purchased by marketers looking to sell individuals clothes, shoes, lunch at the food court, a coffee at Starbucks, jewelry, a cellphone, or a cookie cake.
  • The user, who is completely anonymous considering that all he or she has shared is a device’s MAC address and general location, is fed timely and relevant ads specific to the things he or she might find in that mall.
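The steps above could be sketched in Python roughly as follows. Everything here is hypothetical: the report format, the venue types, the ad inventory, and the function names are assumptions made for illustration, and no real Facebook or WiFi controller API is being described.

```python
# Hypothetical ad inventory, keyed by the kind of venue.
AD_INVENTORY = {
    "mall": ["clothes", "shoes", "food court", "coffee",
             "jewelry", "cellphones", "cookie cake"],
    "airport": ["coffee", "phone chargers"],
}

def occupancy_report(venue_id, venue_type, connected_macs):
    """Summarize connected devices as a count; no MAC address is included."""
    distinct_devices = {mac.lower() for mac in connected_macs}
    return {
        "venue_id": venue_id,
        "venue_type": venue_type,
        "device_count": len(distinct_devices),
    }

def ads_for_venue(report, inventory, min_audience=1):
    """Pick ad categories from the venue type alone, if anyone is there."""
    if report["device_count"] < min_audience:
        return []
    return inventory.get(report["venue_type"], [])

# Made-up MAC addresses; the first two are the same device.
macs = ["AA:BB:CC:00:00:01", "aa:bb:cc:00:00:01", "AA:BB:CC:00:00:02"]

report = occupancy_report("mall-001", "mall", macs)
print(report["device_count"])  # 2
print(ads_for_venue(report, AD_INVENTORY))
```

In this sketch the only things that leave the venue are a head count and a venue type, and that is all the ad selection step needs.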

In Conclusion

Since personally identifiable information is already tossed when training predictive models, and since the mere location or current browsing history of a shopper is sufficient to identify what should be marketed to them, it is clear that the entire market of buying and selling an individual’s personally identifiable information is of poor value. All that really matters is the meta information like location, browsing history, and various other abstracted data points that I didn’t discuss in this post.

So, here is my request to Facebook and the other new-economy data warehouses: “STOP selling my personally identifiable information.” However, I give you permission to sell the fact that I’m in the airport with 80K other people, which implies that I’m probably looking for a Starbucks and a phone charger. My name is irrelevant at this point, and thus I applaud California’s digital privacy law as a good first step in the reversal of decades of exploitation at my expense for the benefit of a select few.
