Does anyone read the terms and conditions before installing software, enabling the LCD panel on your vehicle, or buying a laptop? If you are one of the few who have the time and patience to read through (and understand) all of the purposefully lengthy and obfuscated terms and conditions, you may cringe before grudgingly accepting.

One seemingly innocuous tidbit is how your data may be collected and sent back anonymously. While this gives some people a sense of anonymity, scientists say not so fast. According to AmericanScientist.Org, data scientists can predict who an anonymous person is, with reasonable accuracy, from tidbits of information within anonymized data. Wait, what?

Anonymizing data usually means that your name, address, Social Security number and other pieces of PII (personally identifiable information) are stripped from the data set. This would appear to prevent the entities using the data from uniquely identifying any one individual.

A Web site set up by Latanya Sweeney of Harvard, http://aboutmyinfo.org/, begs to differ. The Web site shows how seemingly innocuous information present in anonymized data can reveal a person’s identity. Armed with just three pieces of information (your gender, birth date and zip code), the site can predict who you are with 99% certainty. In sparsely populated areas, it’s almost 100%.
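One way to see how quickly a few attributes narrow things down is to count how many rows in a data set have a one-of-a-kind combination of them. The Python sketch below does that on a handful of made-up rows; the data and the function name `unique_fraction` are hypothetical and only illustrate the idea, not how the site itself works.

```python
from collections import Counter

# Made-up rows of (gender, birth date, zip code); no names attached.
people = [
    ("F", "1982-11-02", "02139"),
    ("F", "1982-11-02", "02139"),  # shares all three values with the row above
    ("M", "1950-03-14", "02139"),
    ("F", "1975-06-30", "02140"),
    ("M", "1990-01-01", "02140"),
]

def unique_fraction(rows):
    """Return the share of rows whose combination of values appears only once."""
    counts = Counter(rows)
    unique = sum(1 for row in rows if counts[row] == 1)
    return unique / len(rows)

fraction = unique_fraction(people)
print(f"{fraction:.0%} of rows are one of a kind")
```

Every row counted here is a person who could be singled out by anyone holding a second list that pairs the same three values with a name.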

After I entered my birth date, gender and zip code into the site, it rendered the statistics: there are 315 people my age in my zip code, which has a dense population of about 35,000. Hidden, but not really.

Since I am the only resident with my birth date, I can be positively identified without even having to enter my gender.
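To get a feel for why a lone birth date is plausible among 315 same-age neighbors, here is a rough back-of-the-envelope sketch in Python. It assumes birthdays are spread evenly over 365 days and genders are split evenly, which is my simplification and not something the site reports; the function name `chance_unique` is hypothetical.

```python
def chance_unique(same_age_people, buckets=365):
    """Probability that none of the other people land in your bucket.

    Assumes every bucket (a birth date, or a birth date plus gender)
    is equally likely, which real populations only approximate.
    """
    others = same_age_people - 1
    return ((buckets - 1) / buckets) ** others

by_birth_date = chance_unique(315)            # birth date only
by_date_and_gender = chance_unique(315, 730)  # birth date plus gender

print(f"Unique by birth date alone: {by_birth_date:.0%}")
print(f"Unique by birth date and gender: {by_date_and_gender:.0%}")
```

Under these assumptions the odds work out to roughly four in ten for birth date alone and roughly two in three once gender is added, so being the only match in a zip code of this size is nothing unusual.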

Sweeney began to work on de-identification while she was a graduate student at MIT. She was concerned about privacy in medical records.

She examined a batch of medical records that had been anonymized and released for statistical purposes, and she identified William Weld, the governor of Massachusetts at the time, by cross-referencing birth dates, genders and zip codes against voter registration rolls. The point was well received. Partly in reaction to this research, HIPAA laws were enacted and it’s now illegal to release anonymous medical records that contain birth dates. Other industries are not so regulated.
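The cross-referencing step can be sketched in a few lines of Python. Everything below is invented for illustration: the records, the voter names, and the helper names `quasi_key` and `link` are hypothetical, and this is a minimal sketch of the general linkage idea rather than the method used in the original study.

```python
from collections import defaultdict

# Hypothetical "anonymized" records: names removed, other fields kept.
medical_records = [
    {"birth_date": "1950-03-14", "gender": "M", "zip": "02139", "diagnosis": "condition A"},
    {"birth_date": "1982-11-02", "gender": "F", "zip": "02139", "diagnosis": "condition B"},
    {"birth_date": "1975-06-30", "gender": "F", "zip": "02140", "diagnosis": "condition C"},
]

# Hypothetical public voter roll: names plus the same three fields.
voter_roll = [
    {"name": "Voter One", "birth_date": "1950-03-14", "gender": "M", "zip": "02139"},
    {"name": "Voter Two", "birth_date": "1982-11-02", "gender": "F", "zip": "02139"},
    {"name": "Voter Three", "birth_date": "1982-11-02", "gender": "F", "zip": "02139"},
    {"name": "Voter Four", "birth_date": "1990-01-01", "gender": "M", "zip": "02140"},
]

FIELDS = ("birth_date", "gender", "zip")

def quasi_key(record):
    """Build the (birth date, gender, zip) tuple used to match records."""
    return tuple(record[field] for field in FIELDS)

def link(records, roll):
    """Pair each record with a voter only when exactly one voter matches."""
    names_by_key = defaultdict(list)
    for voter in roll:
        names_by_key[quasi_key(voter)].append(voter["name"])

    matches = []
    for record in records:
        candidates = names_by_key.get(quasi_key(record), [])
        if len(candidates) == 1:
            matches.append((candidates[0], record["diagnosis"]))
    return matches

reidentified = link(medical_records, voter_roll)
print(reidentified)
```

In this toy data only the first record is pinned to a single name. The second matches two voters and the third matches none, which is why a sparse population or an unusual birth date makes a match more likely.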

The next time a Web site asks you for your birth date and zip code, it’s likely trying to identify you without having to ask you directly. And it’s not just birth dates and zip codes: other pieces of nondescript information can reveal your identity through the magic (or analysis) of statistical mathematics when analyzed against the troves of already existing data on the Internet.

While large data sets are often used by scientists and organizations to find trends, solve problems and serve other beneficial purposes, keep in mind that if the data is made public, sold or traded, your identity can be inferred using statistical math.
