VTech Data Breach - Some Steps To Take

3 min read

Edit, 11/27/2015: Troy Hunt - who was consulted on the Vice story - has a comprehensive review of the breach that is a great read. End edit.

Earlier today, Vice published a story on a data breach at VTech, a popular toy and game manufacturer. The bad news is that the breach was very extensive, with nearly 5 million unique emails and personal info on approximately 200,000 children. The good news - if there is ever good news when reporting a data breach - is that the person who compromised the data isn't planning on releasing it.

However, from reading the description of the breach, it is possible that other people have already accessed the data stored by VTech. Data that could be accessed included email addresses, children's first names, genders, and dates of birth, passwords, and password recovery questions and answers. According to the hacker (as quoted in the article linked above):

The hacker said that while he doesn’t intend to publish the data publicly, it’s possible others exfiltrated it before him.
"It was pretty easy to dump, so someone with darker motives could easily get it".

What To Do Now

First, don't panic. If you have bought a VTech product in the past, these steps can help minimize any risk. And, even if you haven't bought a VTech product, these steps are good practice. Nothing listed here is particularly novel or earth shattering, but these steps can help protect you over time.

  • On sites that use password recovery questions, go back and update your password and change the recovery questions and answers. This can be tedious, but it's a lot less work than recovering an account after it has been compromised.
  • Use a password manager (something like LastPass or 1Password) to store your passwords. This will allow you to store and use more complex passwords.
  • Establish a fraud alert. This guide from the Privacy Rights Clearinghouse contains clear instructions on how to do that, as well as other useful information on steps to take if and when your personal data gets compromised.
  • Visit Have I Been Pwned to see if your data was exposed in any of the high profile breaches. While this site is not 100% comprehensive, it's a useful resource for seeing whether you have been affected by recent data breaches. The list from the VTech breach has since been added to Have I Been Pwned, so you can check there to see whether your information was part of this breach, and it remains a useful resource for monitoring whether your information has been breached anywhere.
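The password-manager and recovery-question advice above can be made concrete. As a minimal sketch (not tied to any particular manager), Python's standard `secrets` module can generate the kind of long, random passwords - and random recovery-question answers - that a password manager makes practical to use:

```python
import secrets
import string

# Character set for generated secrets; adjust to each site's rules.
ALPHABET = string.ascii_letters + string.digits + string.punctuation

def random_secret(length=20):
    """Return a cryptographically random string for use as a password,
    or as a password recovery answer stored in a password manager."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))

# Recovery answers can be random strings too - real facts (mother's maiden
# name, first school) are often discoverable, and recovery answers were
# among the data exposed in this breach.
password = random_secret()
recovery_answer = random_secret()
```

Storing a random string as a recovery answer means a leaked answer reveals nothing personal about you, and can simply be rotated like any other password.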

And, to state the obvious: now is not a good time to buy any new VTech games.

This is why we encrypt hard drives, folks #privacy #humanerror

Student data was on computers that were stolen. The thieves were probably just after the computers; the data was an unfortunate coincidence.

Tracking We Can't Hear

4 min read

The Center for Democracy and Technology recently filed comments to the FTC on cross device tracking. Their report is a good summary of current practices in cross device tracking, and worth a read. In this report, they highlight the use of high frequency tones embedded in television and online video ads that allow marketers to connect a single individual to multiple devices. Because each device holds information about how a person interacts online, the ability to combine these separate views into a single profile allows marketers to develop more comprehensive (and more invasive) profiles of people. Ars Technica has a solid writeup that summarizes these points. The FTC is holding an event dedicated to cross device tracking tomorrow (November 16, 2015).

As noted in the Ars writeup and the CDT document, multiple companies use high frequency pitches to track users. In September 2015, one of these companies announced additional VC funding. While the quotations below are specifically about this one company, they are generally accurate about this tracking practice.

With such data related to TV commercials, companies can come up with targeted mobile ads. The technology essentially consists of an audio beacon signal embedded into tv commercials which are picked up silently by an app installed on a user phone (unknown to a user).

A rough profile of user (sic) is then created, containing information about where the ad was watched, for how long did the user watch that commercial before changing the channel, which kind of mobile device is user using and so on.

Just to highlight: the app that picks up the audio signal that cannot be detected by human hearing needs to be installed on a person's phone. That would seem to be a pretty significant barrier, as very few people would willingly install software on their phone for the express purpose of tracking them.

However, affiliate deals sidestep this barrier:

The company reportedly has agreements with about 6-7 apps to incorporate this technology in their app to catch signals from TV and claims to have data of 18 million smartphones already. It has already created mobile ads for over 50 brands in six countries including Google, Dominos, Samsung, Candy Crush, Airtel, P&G, Kabam and Myntra.

Based on this report, it sounds like the tracking technology is embedded within other apps. So, when you download an app from the Play Store or the Apple App Store, it could have this tracking software silently embedded in it, with no notice to end users. Both Google and Apple could play a positive role here by requiring apps that embed this tracking software to display a prominent notice to end users on their app pages.

It's also worth noting that, while the stated use is for advertisers to connect multiple devices to a single user, this technology could also be used to track multiple people to a single location. For example, high frequency pitches could be sent out in a mall (tracking people through a store), at a concert, or via any televised display. This would allow a specific device (and the person carrying it) to be tracked to a precise location, even if that person has their location services fully disabled on their phone. This technology would also allow marketers or observers to identify people who were in the same place at the same time.
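To make the mechanism concrete: detecting a single high frequency tone in microphone samples is computationally cheap, which is part of what makes this practice feasible on a phone. The sketch below uses the Goertzel algorithm with an illustrative 18 kHz beacon frequency; the actual frequencies, encodings, and thresholds these companies use are not public, so treat all the specific values here as assumptions:

```python
import math

def goertzel_power(samples, sample_rate, target_freq):
    """Relative power of target_freq in samples, via the Goertzel
    algorithm (a cheap single-bin DFT)."""
    n = len(samples)
    k = round(n * target_freq / sample_rate)   # nearest frequency bin
    coeff = 2 * math.cos(2 * math.pi * k / n)
    s_prev, s_prev2 = 0.0, 0.0
    for x in samples:
        s = x + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    return s_prev2 ** 2 + s_prev ** 2 - coeff * s_prev * s_prev2

SAMPLE_RATE = 44100
BEACON_FREQ = 18000  # hypothetical inaudible beacon frequency

def make_tone(freq, duration=0.05, rate=SAMPLE_RATE):
    """Synthesize a pure sine tone, standing in for microphone input."""
    n = int(rate * duration)
    return [math.sin(2 * math.pi * freq * i / rate) for i in range(n)]

beacon_audio = make_tone(BEACON_FREQ)  # ad audio carrying a beacon
plain_audio = make_tone(440)           # audible content, no beacon
print(goertzel_power(beacon_audio, SAMPLE_RATE, BEACON_FREQ) >
      goertzel_power(plain_audio, SAMPLE_RATE, BEACON_FREQ))  # True
```

A real beacon would modulate an identifier onto the tone rather than just being present or absent, but the detection step is this simple - a loop over raw samples, no audible playback required.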

Intrusive practices like this move marketing solidly into the realm of profiling and surveillance. Technologies like this also make the case for requiring a hardware switch on mobile devices to disable microphones and cameras. At the very least, these intrusive practices by marketers show us that browsing the web with the volume turned off, and muting the television when commercials play, are best practices. It also highlights how privacy weaknesses in the Internet of Things (most recently seen in Vizio's sloppy business and privacy practices) can be compounded into greater intrusions into our privacy. These intrusions - committed by marketers in their self-described mission to deliver more relevant content - cross the line from marketing research into tracking and surveillance. Many of these practices are invisible, and offer no option to opt out, let alone to review or correct the full profiles amassed on us.

Now, it turns out that in addition to being invisible, the tracking is inaudible as well.

Privacy Policies and Machine Learning

3 min read

Today, Google announced the release of TensorFlow, their second-generation machine learning system, under an open source license. This is a big deal for a few reasons. First, to understate things, Google understands machine learning. The opportunity to see how Google approaches machine learning will save huge numbers of people huge amounts of time. Second, this lets us take a look inside what is generally a black box. We don't often get to see how ratings, reviews, recommendations, etc. are made at scale. This release peels back one piece of one curtain and lets us look inside.

Before we go any further, it's worth highlighting that machine learning - even with a solid codebase - is incredibly complex. Doing it well involves a range of work on infrastructure, data structure, training the algorithm, and ongoing, constant monitoring for accuracy - and even then, there is still a lot of confusion and misconception about what machine learning does, and what machine learning should do. Doing machine learning well requires (at the very least) clearly defined goals, a reliable dataset, and months of dedicated, focused work training the algorithm with representative data. The codebase can jumpstart the process, but it is only the beginning.

As part of the work we're doing at Common Sense Media, Jeff Graham and I are working with a large number of school districts on a system that streamlines the process of evaluating the legal policies and terms of a range of education technology applications.

The first part of this work involves tracking policies and terms so we can (among other things) track changes to policies to alert us when we need to update an evaluation. There are a range of other observations this will allow - and we have started talking about some of them already.
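The change-tracking step can be sketched with nothing more than the Python standard library. This is an illustration of the general approach (fingerprint each policy, alert on change, diff for human review), not the actual system we built; the function names here are made up:

```python
import difflib
import hashlib

def policy_fingerprint(text):
    """Hash a policy after normalizing whitespace, so purely cosmetic
    reflows of the page don't register as policy changes."""
    normalized = " ".join(text.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

def policy_changed(old_text, new_text):
    """True when the stored fingerprint no longer matches the live page."""
    return policy_fingerprint(old_text) != policy_fingerprint(new_text)

def change_summary(old_text, new_text):
    """Unified diff of the two versions, for whoever re-runs the evaluation."""
    return "\n".join(difflib.unified_diff(
        old_text.splitlines(), new_text.splitlines(),
        fromfile="old", tofile="new", lineterm=""))
```

In practice each vendor's policy text would be fetched on a schedule, fingerprinted, and compared against the last stored fingerprint; a mismatch triggers the alert to update the evaluation.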

The second part of this work involves mapping specific paragraphs in privacy policies to specific privacy concerns. When it comes to evaluating policies, this analysis is the most time-consuming part. Doing it well requires reading and re-reading the policies to pull relevant sections together. While there are ways to simplify this, those methods are more useful for a general triage than a comprehensive review.

However, Jeff has been looking at machine learning as a way to simplify the initial triage for a while. To understate things, it's complicated. Doing it right - training and adjusting the algorithm - is no small feat, and implementing machine learning as part of the privacy work is a distant speck on a very crowded roadmap. We have a lot of work to do before it makes sense to begin doing the initial categorization this way. But announcements like the one from Google today get us closer.
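For a sense of what "initial triage" means here, a far simpler stand-in than machine learning is keyword matching: map each paragraph to the privacy concerns it might touch, then route it to a human reviewer. The concern labels and keyword lists below are invented for illustration; a machine learning approach would, in effect, learn these associations from a corpus of labeled policies instead of having them hand-coded:

```python
# Illustrative concern labels and keyword lists - a real system would
# learn these associations rather than hard-code them.
CONCERN_KEYWORDS = {
    "data_sharing": ["third party", "third parties", "affiliates", "share"],
    "data_retention": ["retain", "retention", "delete", "deletion"],
    "advertising": ["advertising", "targeted ads", "marketing"],
}

def triage_paragraph(paragraph):
    """Return the privacy concerns a policy paragraph may touch,
    as a first pass before human review."""
    text = paragraph.lower()
    return sorted(
        concern for concern, keywords in CONCERN_KEYWORDS.items()
        if any(kw in text for kw in keywords)
    )

print(triage_paragraph("We may share your data with third parties for marketing."))
# ['advertising', 'data_sharing']
```

Even this crude version shows why triage helps: a reviewer starts from paragraphs flagged for a concern instead of re-reading the whole policy, and the comprehensive review then confirms or corrects the mapping.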

Is This Part Of Your Social Media Training For Kids?

2 min read

While the fact that a variety of data brokers engage in digital redlining is old news, and Facebook's patent on assessing credit via friends could theoretically be dismissed as a future plan, we now have reports that credit rating agencies are using social media to assess credit ratings:

FICO is working with credit card companies to use several different methods for deciding what size loans people can handle, and using non-traditional sources like social media allows them to collect information on people who don't have an in-depth credit history.

As educators, if you steer kids towards social media use, how do you prepare them for the reality that their posts are being archived by companies that will use their interactions to judge and sort them for the indeterminate future? Does your social media training and digital citizenship for kids cover how to create an online persona that is as creditworthy as possible? And how do we reconcile the need for "authentic conversation" against a backdrop where for-profit companies invisibly mine these "authentic" interactions looking for predictors of future behavior? How many of us could withstand the actions of our youth weighed alongside our adult choices? How many parents consider these realities when they share information about their kids online?

NCES brief from 2010 on managing PII in student education records.

This is a technical doc - need to review this in more detail.

Privacy Protection and Human Error

5 min read

As part of my work, I spend a fair amount of time reading through the websites of educational technology offerings. The other day, while on the site of a well known, established product, I came across a comment from one person asking for information about another person. Both people - the commenter and the person who was the subject of the question - were identified by first and last name. The nature of the question on this site struck me as strange, so I did a search on the name of the person who left the comment.

The search on the commenter's name returned several hits - including all of the top five - that clearly showed that the commenter is a principal at a school in the United States. The school's webpage, in turn, clearly shows that the school serves young children. With that information, I returned to the comment. Knowing that the original questioner is the principal of a school, it became clear that the subject of the question - who, remember, is identified by first and last name - is almost certainly a student at the school.

I had stumbled across a comment on an edtech site, on the open web, where a principal identified a student at their school by name and asked a question that implied an issue with the student. The question had been posted over a month earlier.

To make matters worse, the principal's question about a student on the open web had been responded to by the vendor. Staff for the company answered the question, and left the thread intact.

In this post, we're going to break down the ways that this exchange is problematic, what is indicated by these problems, and what to do when you encounter something similar in the future.

The Problems

Problem 1: The principal who asked the original question has access to large amounts of data on kids, but doesn't understand privacy law or the implications of sharing student information - including information with implications for behavioral issues - on the open web. This problem is particularly relevant now, when some people are complaining that teachers haven't been adequately trained on new privacy laws coming onto the books. The lack of awareness around privacy requirements is as old as data collection, and it's disingenuous and ahistorical to pretend otherwise.

Problem 2: The vendor responded to the question, and allowed a student to be identified by name, by that student's principal, on their product's web site. The product in question here is in a position to collect, manage, and store large amounts of student data, and much of that data contains potentially sensitive student information. Every member of their staff should be trained on handling sensitive data, and on how to respond when someone discloses sensitive information in a non-secure way. When a staff member stares a potential FERPA violation in the face and blissfully responds, we have a problem.

This problem is exacerbated by rhetoric used by a small but vocal set of vendors, who insist that they "get" privacy, and that people with valid privacy concerns are an impediment to progress. Their stance is that people should get out of their way and let them innovate. However, when a vendor fails to adequately respond to an obvious privacy issue, it erodes confidence in the potential for sound judgment around complicated technical, pedagogical, and ethical issues. If a vendor can't master the comment field in blogging software, they have no business going anywhere near any kind of tracking or predictive analytics.

How To Respond

If you ever see an issue that is a privacy concern, reach out to the company, school, and/or organization directly. In this case, I reached out via several private channels (email, the vendor's online support, and a phone call to their support line). The comment with sensitive data and the vendor's response were removed within a couple of hours. A private response is an essential part of responsible disclosure. We make privacy issues worse when we publicize the existence of an issue before it has time to be addressed.

For principals and educators, and anyone in a school setting who is managing student data: spend some time reading through the resources at the federal Privacy Technical Assistance Center. While some of the documents are technical, and not every piece of information will be applicable in every situation, the resources collected there provide a sound foundation for understanding the basics. At the very least, schools and districts should create a student data privacy protection plan.

For vendors, train your staff. If you're a founder, train yourself. For founders: start with the PTAC and FERPA resources linked in this document. Cross reference the data you collect for your application with the data covered under FERPA. If there is any chance that you will have any people under the age of 13 using your site, familiarize yourself with COPPA. Before you have any student data in your application, get some specific questions about your application and your legal concerns and talk with a lawyer who knows privacy law.

For staff: make sure you have a Data Access Policy and some training on how to respond if a customer discloses private information. If you are part of an accelerator, ask for help and guidance. Talk to other companies as well. This is well-trodden ground, and there is some great work that has been done and shared.


Privacy is complicated. We will all make mistakes, and by working together, over time, we will hopefully make fewer of them, and the ones we do make will be smaller in magnitude. This is why we need an increased awareness of privacy, and sound protections for student data. By taking concrete steps, we can improve the way we handle data, and move toward an informed conversation around both the risks and rewards of sound data use.

Spring 2015. Auburn leaks personal data on HS students acquired from ACT

So, this data was sold/shared with Auburn, and included people who never went to Auburn.