Do You Like Working On Privacy? So Do We!

1 min read

Common Sense Media is hiring a Privacy Editor. If you want to be part of a team that is working on improving privacy and information security within educational technology, read the job description and apply! We'd love to find someone in Portland, OR, but we also recognize that there are smart, talented people all over the country.

Please share this with people you think might be interested.

Privacy, Parenting, and the VTech Breach

1 min read

I put a post out on the Common Sense Media blog about the VTech breach:

Late last week, news broke about a data breach at the popular game and toy manufacturer VTech. The initial description of the breach sounded fairly large: Nearly five million parent accounts and 200,000 student accounts were compromised. If we make the assumption that one account equals one person, that's the equivalent of a data breach that covers every person living in Wyoming, Vermont, the District of Columbia, Alaska, North Dakota, South Dakota, and Delaware combined.

Read the whole post over at Common Sense.

This is why we encrypt hard drives, folks #privacy #humanerror

Student data was on computers that were stolen. The thieves were probably just after the hardware; the data was an unfortunate coincidence.

Privacy Policies and Machine Learning

3 min read

Today, Google announced the release of their second-generation machine learning system under an open source license. This is a big deal for a few reasons. First, to understate things, Google understands machine learning. The opportunity to see how Google approaches machine learning will save huge numbers of people huge amounts of time. Second, this lets us take a look inside what is generally a black box. We don't often get to see how ratings, reviews, recommendations, etc., are made at scale. This release peels back one piece of one curtain and lets us look inside.

Before we go any further, it's worth highlighting that machine learning - even with a solid codebase - is incredibly complex. Doing it well involves a range of work on infrastructure, data structure, and training the algorithm, plus ongoing, constant monitoring for accuracy - and even then, there is still a lot of confusion and misconception about what machine learning does, and what it should do. Doing machine learning well requires (at the very least) clearly defined goals, a reliable dataset, and months of dedicated, focused work training the algorithm with representative data. The codebase can jumpstart the process, but it is only the beginning.

As part of the work we're doing at Common Sense Media, Jeff Graham and I are working with a large number of school districts on a system that streamlines the process of evaluating the legal policies and terms of a range of education technology applications.

The first part of this work involves tracking policies and terms so we can (among other things) track changes to policies to alert us when we need to update an evaluation. There are a range of other observations this will allow - and we have started talking about some of them already.
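The change-tracking piece can be sketched simply. This is a minimal illustration, not a description of the actual Common Sense system: store a fingerprint of each policy's text, and flag a policy for re-evaluation whenever a freshly fetched copy no longer matches. All names here are hypothetical.

```python
import hashlib


def policy_fingerprint(policy_text: str) -> str:
    """Return a stable fingerprint for a policy's text.

    Whitespace is normalized first, so cosmetic reflows of the page
    do not register as policy changes.
    """
    normalized = " ".join(policy_text.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def needs_review(stored_fingerprint: str, current_text: str) -> bool:
    """True when the fetched policy no longer matches the stored copy."""
    return policy_fingerprint(current_text) != stored_fingerprint
```

A real pipeline would also keep the old and new text around for a diff, so a reviewer can see what changed rather than just that something changed.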

The second part of this work involves mapping specific paragraphs in privacy policies to specific privacy concerns. When it comes to evaluating policies, this analysis is the most time consuming. Doing it well requires reading and re-reading the policies to pull relevant sections together. While there are ways to simplify this, these methods are more useful for a general triage than a comprehensive review.
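To make the triage-versus-review distinction concrete, here is a deliberately crude sketch of the triage step - plain keyword matching rather than machine learning, with made-up concern categories and keywords. It shows the shape of the problem (mapping paragraphs to concerns), and also why this kind of first pass is no substitute for reading the policy.

```python
# Hypothetical concern categories and keywords, for illustration only.
CONCERN_KEYWORDS = {
    "data_sharing": ["third party", "third-party", "affiliates", "partners"],
    "advertising": ["advertising", "targeted ads", "marketing"],
    "retention": ["retain", "retention", "delete your data"],
}


def triage_paragraphs(paragraphs):
    """Map each privacy concern to the paragraphs that mention it.

    Returns a dict of concern name -> list of paragraph indices.
    """
    hits = {concern: [] for concern in CONCERN_KEYWORDS}
    for i, paragraph in enumerate(paragraphs):
        text = paragraph.lower()
        for concern, keywords in CONCERN_KEYWORDS.items():
            if any(keyword in text for keyword in keywords):
                hits[concern].append(i)
    return hits
```

Keyword matching like this misses synonyms, negations ("we do not share..."), and context, which is exactly the gap a trained classifier - and ultimately a human reader - has to close.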

However, Jeff has been looking at machine learning as a way to simplify the initial triage for a while. To understate things, it's complicated. Doing it right - and training and adjusting the algorithm - is no small feat. Implementing machine learning as part of the privacy work is a distant speck on a very crowded roadmap. It's incredibly complicated, and we have a lot of work to do before it makes sense to begin looking at implementing machine learning to do the initial categorization. But, announcements like the one from Google today get us closer.

Privacy Protection and Human Error

5 min read

As part of my work, I spend a fair amount of time reading through the websites of educational technology offerings. The other day, while on the site of a well known, established product, I came across a comment from one person asking for information about another person. Both people - the commenter and the person who was the subject of the question - were identified by first and last name. The nature of the question on this site struck me as strange, so I did a search on the name of the person who left the comment.

The search on the commenter's name returned several hits - including all of the top five - that clearly showed that the commenter is a principal at a school in the United States. The school's webpage, in turn, clearly shows that the principal's school serves young children. With that information, I returned to the comment. Knowing that the original questioner is the principal of a school, it became clear that the subject of the question - who, remember, is identified by first and last name - is almost certainly a student at the school.

I had stumbled across a comment on an edtech site where a principal, on the open web, identified a student at their school by name and asked a question that implied an issue with that student. The question had been asked more than a month earlier.

To make matters worse, the principal's question about a student on the open web had been responded to by the vendor. Staff for the company answered the question, and left the thread intact.

In this post, we're going to break down the ways that this exchange is problematic, what is indicated by these problems, and what to do when you encounter something similar in the future.

The Problems

Problem 1: The principal who asked the original question has access to large amounts of data on kids, but doesn't understand privacy law or the implications of sharing student information - including information with implications for behavioral issues - on the open web. This problem is particularly relevant now, when some people are complaining that teachers haven't been adequately trained on new privacy laws coming onto the books. The lack of awareness around privacy requirements is as old as data collection, and it's disingenuous and ahistorical to pretend otherwise.

Problem 2: The vendor responded to the question, and allowed a student to be identified by name, by that student's principal, on their product's web site. The product in question here is in a position to collect, manage, and store large amounts of student data, and much of that data contains potentially sensitive student information. Every member of their staff should be trained on handling sensitive data, and on how to respond when someone discloses sensitive information in a non-secure way. When a staff member stares a potential FERPA violation in the face and blissfully responds, we have a problem.

This problem is exacerbated by rhetoric used by a small but vocal set of vendors, who insist that they "get" privacy, and that people with valid privacy concerns are an impediment to progress. Their stance is that people should get out of their way and let them innovate. However, when a vendor fails to adequately respond to an obvious privacy issue, it erodes confidence in the potential for sound judgment around complicated technical, pedagogical, and ethical issues. If a vendor can't master the comment field in blogging software, they have no business going anywhere near any kind of tracking or predictive analytics.

How To Respond

If you ever see an issue that is a privacy concern, reach out to the company, school, and/or organization directly. In this case, I reached out via several private channels (email, the vendor's online support, and a phone call to their support). The comment with sensitive data and the vendor's response were removed within a couple of hours. A private response is an essential part of responsible disclosure. We make privacy issues worse when we publicize an issue before the people responsible have had time to address it.

For principals and educators, and anyone in a school setting who is managing student data: spend some time reading through the resources at the federal Privacy Technical Assistance Center. While some of the documents are technical, and not every piece of information will be applicable in every situation, the resources collected there provide a sound foundation for understanding the basics. At the very least, schools and districts should create a student data privacy protection plan.

For vendors, train your staff. If you're a founder, train yourself. For founders: start with the PTAC and FERPA resources linked in this document. Cross reference the data you collect for your application with the data covered under FERPA. If there is any chance that you will have any people under the age of 13 using your site, familiarize yourself with COPPA. Before you have any student data in your application, prepare some specific questions about your application and your legal concerns, and talk with a lawyer who knows privacy law.

For staff: make sure you have a Data Access Policy and some training on how to respond if a customer discloses private information. If you are part of an accelerator, ask for help and guidance. Talk to other companies as well. This is well-worn ground, and there is some great work that has been done and shared.


Privacy is complicated. We will all make mistakes; by working together, over time, we will hopefully make fewer of them, and the ones we do make will be smaller in magnitude. This is why we need an increased awareness of privacy, and sound protection for student data. By taking concrete steps, we can improve the way we handle data, and move toward an informed conversation about both the risks and rewards of sound data use.

Details/background on Universal 2nd Factor. Nice set of supporting libraries/tools here as well. #privacy

Resources for a standalone server, integrations with existing CMS's, and other tools and info to get started.

What Peeple Tells Us About Privacy

2 min read

The latest Internet furor du jour is over an app called Peeple. This post is not going to get into the details or problems with the app, as other people have already done a great job with that.

In brief, the app allows anyone with a Facebook account to rate anyone else. No consent is needed or asked for. All a person needs to rate another person is their phone number.

As seen in the links above (and in a growing angry mob on Twitter), people are pointing out many of the obvious weaknesses in this concept.

The reason many people are justifiably furious about Peeple is that it allows strangers to rate us, and makes that rating visible as a judgment we potentially need to account for in our lives. However, what Peeple aims to do - in a visible and public way - is a small subset of the ways we are rated and categorized every day by data brokers, marketers, human resources software, credit ratings agencies, and other "data driven" processes. These judgments - anonymous, silent, and invisible - affect us unpredictably, and we often don't know about it until much later, if at all.

While Peeple is likely just a really bad idea brought to life by people with more money and time than sense, I'm still holding out hope that Peeple is a large scale trolling experiment designed to highlight the need for increased personal privacy protections.

Some Tips For Vendors When Looking At Your Privacy Policies

4 min read

This post is the result of many conversations over the last several years with Jeff Graham. It highlights some things that we have seen in our work on privacy and open educational resources. This post focuses on privacy, but the general lesson - that bad markup gets in the way of good content - holds true in both the OER and the privacy space.

When looking at privacy policies and terms of service, the most important element is the content of the policy. However, these policies are generally delivered over the web, so it's also important to look at how the pages containing them perform on the web, and to ensure that they are as accessible as possible to as many people as possible, with as few barriers as possible.

Here are four things that vendors should be doing to test the technical performance of their policies.

  • View the source. In a web browser, use the "view source" option. Does the text of your policy appear in the "main content" area of your page, or some semantic equivalent? Are you using h1-h6 tags appropriately? These are simple things to fix or do right.
  • Google your privacy policy and terms of service, and see what comes up. First, search for "privacy policy OR terms of service [your_product_name]". See what comes up. Then, run the more focused "privacy policy OR terms of service" restricted to your own domain - in this search, be sure to omit the initial "www" from the domain so that your search picks up any subdomains.
  • Use an automated tool (like PhantomJS) to capture screenshots of your policies. If PhantomJS has issues grabbing a screenshot of your page, it's a sign that you have issues with the markup on your page.
  • Use a screenreader to read your page. Listen to whether and how it works. Where we have observed a page failing to behave in a screenreader, it's frequently due to faulty markup, or to the page being loaded dynamically via JavaScript.
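Some of these checks can be scripted. As a minimal sketch - using only Python's standard library, and covering just two of the markup problems we describe later in this post (policy text wrapped in a form tag, and pages that are nearly all markup and very little content) - you could run something like this against a saved copy of your policy page:

```python
from html.parser import HTMLParser


class PolicyMarkupCheck(HTMLParser):
    """Track visible text, and whether any of it sits inside a <form> tag."""

    def __init__(self):
        super().__init__()
        self.form_depth = 0
        self.text_chars = 0
        self.text_in_form = 0

    def handle_starttag(self, tag, attrs):
        if tag == "form":
            self.form_depth += 1

    def handle_endtag(self, tag):
        if tag == "form" and self.form_depth:
            self.form_depth -= 1

    def handle_data(self, data):
        chars = len(data.strip())
        self.text_chars += chars
        if self.form_depth:
            self.text_in_form += chars


def check_policy_page(html: str) -> dict:
    """Return a rough content-to-markup ratio and a form-wrapping flag."""
    parser = PolicyMarkupCheck()
    parser.feed(html)
    total = len(html)
    return {
        "content_ratio": parser.text_chars / total if total else 0.0,
        "text_inside_form": parser.text_in_form > 0,
    }
```

The ratio here is an approximation, but a page where only a few percent of the bytes are actual policy text - or where the policy text lives inside a form - is worth a closer look in "view source."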

To people working on the web or in software development, these checks probably sound rudimentary - and they are. They are the technical equivalent of being able to tie your shoes, or walking and chewing gum at the same time.

In our research and analysis of privacy policies, we have seen the following issues repeated in many places; some of these issues are present on the sites of large companies. Also worth noting: this is a short list, highlighting only the most basic issues.

  • Pages where the policies are all wrapped in a form tag. For readers unfamiliar with html, the form tag is used to create forms to collect data.
  • Pages where, according to the markup, the policies are part of the footer.
  • Pages where, according to character count, the actual policies only account for 3% of the content on the page, with the other 97% being markup and scripts.
  • Sites where Google couldn't pick up the text of the policy and was only able to index the script that is supposed to load it.

We are not going to be naming names or pointing fingers, at least not yet, and hopefully never. These issues are easy to fix, and require skills that can be found in a technically savvy middle schooler. Vendors can and should be doing these reviews on their own. The fix for these issues is simple: use standard html for your policies.

We hear a lot of talk in the privacy world about how privacy concerns could stifle innovation - that's a separate conversation that will almost certainly be the topic of a different post, but it's also relevant here. When the people claiming to be the innovators have basic, demonstrable problems mastering html, it doesn't speak well to their ability to solve more complex issues. Let's walk before we run.

Breakdown, by state, of how voter registration info and history is sold. #privacy

A state by state breakdown of how voter information and history is sold.