The Privacy Divide

4 min read

One of the questions that arises in privacy work is whether and how privacy rights - or access to those rights - play out across economic lines. The answer is complicated, and this post is a messy first step into what will almost certainly be an ongoing series of posts on this topic. Both inside and outside education, we often talk about issues related to the digital divide, but we rarely look at a companion issue: the privacy divide.

This post is not intended to be exhaustive, by any means - and for people reading this, please share any relevant resources that would help expand the conversation.

There are a range of ways to dig into the conversation within EdTech, but one way to start to look at the issue is to examine how parents are informed of their rights under FERPA. This is an area where more work needs to be done, but even a superficial scan suggests that an awareness of FERPA rights is not evenly distributed.

Leaving FERPA aside, it's worth looking at how content filtering plays out within schools. The quotes that follow are from a post about Securly, but the dynamic is broadly applicable to any environment that defaults to filtering.

"From the Securly dashboard, the administrators can see what students have and haven’t been able to access," she explains. "If I want to see what kids are posting on Twitter or Facebook, I can--everything on our Chromebooks gets logged by Securly."

However, for students whose only access is via a school-issued machine, the level of surveillance becomes more pervasive.

"Most of our students are economically disadvantaged, and use our device as their only device," DeLapo explains. "Students take Chromebooks home, and the Securly filters continue there."

This raises some additional questions. Who is more likely to have their activities tracked via social media monitoring? If something gets flagged, who is more likely to have the results passed to law enforcement, rather than a school official? What zip codes are more likely to receive the additional scrutiny of predictive policing?

These patterns follow the general trends of disproportionate suspension based on race.

Throughout these conversations, we need to remain aware that the systems currently in use are designed to spot problems. The absence of a problem - or, more generally, a lower probability that a problem will eventually exist - is the most these systems can register; they view everything through a lens focused on a spectrum of deficits. The absence of a problem is not the same as something good, and when we use tools explicitly designed to identify and predict problems, they will "work" as designed. In the process of working, of course, they generate more data that will be used as the justification or rationale for future predictions and judgments.

Increasing access and eliminating the digital divide need to happen, but people can be given access to different versions of the internet, or access via chokepoints that behave differently. We need look no further than the efforts of major industry players to destroy net neutrality to see how these visions play out.

To be more concrete about this, we can look at how AT&T is charging extra for the right to opt out of some (but not all) ad scanning on some of its Fiber internet access offerings. Julia Angwin has documented the cost - in cash and time - she spent over a year to protect her privacy.

Taking a step to the side, current examples of how data is used show how data analysis fuels bias - from lenders using phone habits to judge borrowers, to digital redlining based on online habits, to using data to discriminate in lending.

The digital divide is real, and the need to eliminate it is real. But as we move to correct this issue, we need to be very cognizant that not all access is created equal. We can't close the digital divide while opening the privacy divide - this approach would both exacerbate and extend existing issues far into the future.

Public and Private

2 min read

In discussions around digital citizenship and online footprints, the terms "private" and "public" are often misused.

A couple quick notes:

First, the "private" settings on most social media platforms are a joke. Within social media, "private" often means "just visible to my friends," which in turn means "it appears on my friends' timelines." So even if you are locking down your content to your friends, in many cases your "privacy" is determined by the judgment of your least careful friend.

And, of course, this doesn't even touch the reality that "private" in most social software means that all content is fully visible and accessible to the mining algorithms of the company providing the service, and their affiliates. Thus, "private" means "invisibly shared with organizations who exist by selling predictions about your behavior for the foreseeable future."

These public/private conversations also don't mention the constant practices of brand monitoring, social media monitoring in education, and social media monitoring by law enforcement. If you are a kid in a school, ask your district and your school board if they use social media monitoring tools, and how you - as a student - can review what the school has collected.

People often also give advice about tagging photos. The purpose of tagging photos is simple: it's how companies crowdsource improving their facial recognition algorithms. But not tagging photos isn't the same as respecting someone's privacy. Facial recognition works, and social media companies already know who your friends are - because you tell them explicitly via the "friend" mechanisms they provide, and implicitly via who you interact with. Geolocation tracking makes this even easier.

So when you upload a picture of you with three of your friends, the algorithms already have a small pool of people it might be - and they can use facial recognition software to narrow the field. For example, Facebook's facial recognition software is now accurate to the point where it can use other visual cues to identify a person even when their face isn't visible.

So, let's talk about public and private, but let's also be explicit that true privacy is elusive. What frequently gets called privacy is really surveillance that has yet to be noticed.

Portland Public Schools, and a Calendar Built For Adults

4 min read

When talking about school and learning, scheduling and routines are frequently overlooked. A level of stability and predictability can be helpful for both kids and their families. This year, the Portland Public Schools calendar shows how not to build a schedule.

At the outset, I want to highlight that the issues with the calendar have been co-created by the district and the union - much of the choppiness in this calendar is the result of the district meeting requirements in the union contract. But both the district and the union need to come together to fix this.

Right now, we have a calendar that works for adults, but not for kids and their families.

You can verify my tallies against the PPS Calendar.

  • Week 1: Start Thursday, Aug 27th. 2 days.
  • Week 2: Aug 31 to Sept 4. Full week
  • Week 3: Sept 8 to Sept 11. 4 days (Labor Day)
  • Week 4: Sept 14 to 18. 4.5 days; late start on Wednesday.
  • Week 5: Sept 21 to 25. Full week
  • Week 6: Sept 28 to Oct 2nd: Full week
  • Week 7: Oct 5 to Oct 8. 4 days
  • Week 8: Oct 12 to 16. 4.5 days; late start on Wednesday.
  • Week 9: Oct 19-20. 2 days. Conferences for three days.
  • Week 10: Oct 26-29. 4 days. 
  • Week 11: Nov 2 to 6. Full week.
  • Week 12: Nov 9 to 13.  4 days. Veterans Day.
  • Week 13: Nov 16 to 20. Full week.
  • Week 14: Nov 23 to 25. 3 days. Thanksgiving.
  • Week 15: Nov 30 to Dec 4. Full week.
  • Week 16: Dec 7 to Dec 11. Full week.
  • Week 17: Dec 14 to 18. 4.5 days; late start on Wednesday.
  • Week 18: Jan 6 to 8. 3 days (snow days).
  • Week 19: Jan 11 to 15. Full week.
  • Week 20: Jan 19 to Jan 22. 3.5 days; late start on Wednesday and MLK holiday.
  • Week 21: Jan 26 to 29. 4 days.
  • Week 22: Feb 1 to 5. Full week.
  • Week 23: Feb 8 to 12. Full week.
  • Week 24: Feb 16 to 19. 3.5 days; late start on Wednesday and President's Day holiday.
  • Week 25: Feb 22 to 26. Full week.
  • Week 26: Feb 29 to Mar 4. Full week.
  • Week 27: Mar 7 to 11. Full week.
  • Week 28: Mar 14 to 18. 4.5 days; late start on Wednesday.
  • March 21: Spring Break.

In the first 14 weeks - from the start of school to Thanksgiving - PPS students have only 5 complete (aka full) weeks of school, and only two of those are consecutive. Of the remaining 9 weeks, three have three days or fewer, and the other 6 are fractured by holidays and/or late starts. The late starts are especially disruptive - and the burden lands heaviest on working families with younger kids, who need to find childcare and transportation to get their kids to school. Anecdotally, late starts are also a culprit in increased absenteeism.
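The tallies above are easy to check mechanically. This is a quick sketch using day counts transcribed from the week-by-week list (the `days` list is my reading of that list, not an official data source):

```python
# Day counts for weeks 1-14, transcribed from the list above.
days = [2, 5, 4, 4.5, 5, 5, 4, 4.5, 2, 4, 5, 4, 5, 3]

# Full weeks are the five-day weeks.
full_weeks = [i + 1 for i, d in enumerate(days) if d == 5]
# "Short" weeks are those with three days or fewer.
short_weeks = [i + 1 for i, d in enumerate(days) if d <= 3]
# Everything else is fractured by a holiday or a late start.
fractured = len(days) - len(full_weeks) - len(short_weeks)

print(f"full weeks: {full_weeks}")          # [2, 5, 6, 11, 13]
print(f"three days or fewer: {short_weeks}")  # [1, 9, 14]
print(f"fractured weeks: {fractured}")        # 6
```

Only weeks 5 and 6 are adjacent in the full-week list, which is the "only two consecutive full weeks" claim.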

Between Thanksgiving and the winter holiday (also known as peak cold and flu season) we manage to string together two consecutive full weeks. But, as anyone who has worked in a school will tell you, the weeks between Thanksgiving and winter break are difficult ones for detailed work, sandwiched as they are between two significant vacations - and the week before an extended vacation tends to be more hectic than usual for a variety of reasons.

In January, the choppiness returns in force, with one full week of school in the entire month.

In February - the 22nd week of the school year - we are finally ready to actually get down to school, with 5 out of 6 weeks being full weeks. For a lot of kids, this is too little, too late. The time when they need this consistency the most is at the beginning of the school year. By February, patterns have been established, and relationships have been formed. While the schedule at PPS probably meets the legal requirements for seat time and calendar days, it would be wonderful to see PPS and the teachers' union come together on a calendar that actually works for kids and families.

Do You Like Working On Privacy? So Do We!

1 min read

Common Sense Media is hiring a Privacy Editor. If you want to be part of a team that is working on improving privacy and information security within educational technology, read the job description and apply! We'd love to find someone in Portland, OR, but we also recognize that there are smart, talented people all over the country.

Please, share this with people who you think might be interested.

Some Additional Questions On Data Sharing and Collection

2 min read

Senator Al Franken has sent a letter to Google CEO Sundar Pichai asking some questions about Google's data collection practices. His letter raises some excellent questions, but there are two specific use cases that are left out that also need to be addressed. In a post from May 2015, I described some of the issues with how Google structures their terms for Google Apps for Education; these questions build on that post.

  • If a student is logged into their GAFE account and accesses a non-GAFE service provided by Google, can data collected via the non-GAFE service be shared with ad and data brokers, like Doubleclick or Quantcast?
  • If a student is logged into their GAFE account and accesses a 3rd party service that is integrated with GAFE, what student data can be accessed by that 3rd party vendor?

The focus on targeted advertising is important, but ultimately too narrow. Targeted advertising is a visible manifestation of data collection and user profiling. While the presence of targeted ads lets us know that some services are tracking users, larger potential harms accrue when people are invisibly profiled, with their data being shared with different brokers and combined with data sets from different sources. This should not happen within an educational context where students are legally required to attend.

It's also worth noting that while these specific questions focus on Google, comparable abuses of student data occur in many ecosystems that have a business plan based on 3rd party integrations.

Surveillance, Worst Case Scenarios, and the Winceable Moment

4 min read

In discussing issues related to privacy, people often devolve to trying to identify and define immediate harm and/or a worst case scenario. Both of these lenses are reductive and incomplete. Because data analysis often occurs invisibly to us, via proprietary algorithms that we don't even know are in play, assigning harm can be a matter of informed guesswork and inference. As one example, try explaining how and why your credit score is determined - this algorithmically defined number determines many opportunities we receive or don't receive, yet few of us can say with any certainty how this number is derived. Algorithms aren't neutral - they are a series of human judgments automated in a formula. There isn't any single worst case scenario, and discussions of worst case scenarios risk creating a false vision that there is a single spectrum with "privacy" at one end and some vague "worst case scenario" at the other - and this is not how it works.

The reason privacy matters - and the reason that profiling matters - is that we are seeing increasingly experimental and untested uses of data, especially in the realm of predictive analytics. Products using new statistical methods are used in hiring decisions, lending, mortgage decisions, finance, search, and personalization. The hype is that these new - or "innovative" or "disruptive" - uses of data will help us get more efficient, and push past the biases of the past. However, this fails in at least two ways: first, algorithms contain the biases of their creators. Second, the performance of these products fails to live up to the hype, which means the results don't justify the risk.

Data collected in an educational setting - by definition - is data collected on people in the midst of enormous development, questioning, and growth. If people are doing adolescence right, they will make mistakes, ask questions, break things - all in the name of growth and learning. In the context of, for example, an eighth grade classroom, it all makes sense. But outside that context, it's very different. One of the promises of Big Data and Learning Analytics is that the data sets will be large enough to allow researchers to distill signal from noise, but as noted earlier, the reality fails to live up to the hype.

How many of us have memories of our behavior from high school, middle school, and elementary school that make us wince? Those winceable moments are our data trail. I mentioned earlier that talking about worst case scenarios is an inaccurate frame, and this is why: there is no single data point that, if undone, can "fix" our past. However, data collected from our adolescence is bound to contain things that are inaccurate, temporary, flawed, or confusing - for us, and for people attempting to find patterns.

If people are aware of surveillance, it shifts the way we act. When students are habituated to surveillance from an early age, it has the potential to shift the way we develop. If this data is shared outside of an educational context, it creates the potential that every person attending a public school is fully profiled before they graduate. A commonly overlooked element of this conversation is that profiles never come from a single source - they are assembled and combined from multiple sources. When data collected within an educational context gets combined with data sets collected from social media or our personal browsing history, different stories emerge.

For most people over 30 reading this post, our detailed records begin early in the 21st century, when we were 15 or older. For some kids in school now, their data profile begins when their parents posted their ultrasound on Facebook. While targeted advertising to kids is an immediate concern, at least targeted advertising is visible. Profiling by algorithm is invisible, and is forever. Requiring students to pay for their public education with the data that will then be used to judge them sells out our kids. We can use data intelligently, but we need to have a candid conversation about what that means.

You Look Unprofessional When You Copy Your Policies

2 min read

Examples abound, but here's one. Searching for the exact phrase "resources we make available on our Site. Users may visit our Site anonymously" returns over 15,000 hits.

You can do this yourself by going to the policies on any site, and selecting a random phrase. I recommend selecting the second half of one sentence and the beginning of the next, to include about 10-15 words total. If you find a typo or a misspelled word, use that word on the off-chance that the mistake has been faithfully plagiarized. If you are searching using Google, be sure to enclose the phrase in quotation marks to only get results using the exact string.
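The phrase-picking step can be sketched roughly like this - `probe_phrase` is a name invented here for illustration, not a real tool, and the period-splitting is a crude heuristic rather than proper sentence detection:

```python
def probe_phrase(policy_text, words_per_side=6):
    """Build a quoted search phrase that straddles a sentence boundary."""
    # Split into sentences on periods - a rough heuristic, fine for a sketch.
    sentences = [s.strip() for s in policy_text.split('.') if s.strip()]
    if len(sentences) < 2:
        # Only one sentence found: fall back to the opening words.
        return '"' + " ".join(policy_text.split()[:2 * words_per_side]) + '"'
    # Take the tail of one sentence and the head of the next.
    tail = sentences[0].split()[-words_per_side:]
    head = sentences[1].split()[:words_per_side]
    return '"' + " ".join(tail) + ". " + " ".join(head) + '"'

sample = ("resources we make available on our Site. "
          "Users may visit our Site anonymously and browse freely.")
print(probe_phrase(sample))
# → "we make available on our Site. Users may visit our Site anonymously"
```

Paste the quoted output into a search engine; anything beyond a handful of hits suggests the terms were copied.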

This is a simple, fast test to do on any site you think about using. It takes all of about 30 seconds. While plagiarized terms don't necessarily mean that there are issues with the site, they do indicate that the people behind the site have cut corners on privacy issues. Given that many breaches occur due to human error, and that problems we can observe often indicate issues that are hidden from view, plagiarized terms should at least give us pause.

If you are a vendor, do this test on your own terms. Pull ten excerpts at random, and see what comes up. If you discover that, for whatever reason, your terms have been plagiarized and you don't know why, you can then begin to fix the issue.

FERPA Directory Information as Anchor Data

3 min read

I'm currently working on a longer piece about education technology, the "school official" designation, parental consent, and how these legal definitions get frayed in practice. There has been a lot of attention paid to the school official designation in recent weeks, which is good. However, there have also been some inaccuracies and overstatements, and an unrealistic focus on one company - Google - in these conversations. While the actions of Google are interesting and very relevant because of their size and reach, we should not focus exclusively on one company: the practices Google uses are widespread, and as we pursue better privacy practice we are better off identifying practices and habits that need to improve, rather than playing whack-a-mole with companies.

But, the process of writing this longer (not yet finished) piece has also spurred me on to finally get some thoughts out on directory information. As described in this FERPA directory information model form, "Directory information, which is information that is generally not considered harmful or an invasion of privacy if released, can also be disclosed to outside organizations without a parent’s prior written consent."

The list of information included as part of directory information - or "information that is generally not considered harmful or an invasion of privacy if released" - is pretty complete:

  • Student's name
  • Address
  • Telephone listing
  • Electronic mail address
  • Photograph
  • Date and place of birth
  • Major field of study
  • Dates of attendance
  • Grade level
  • Participation in officially recognized activities and sports
  • Weight and height of members of athletic teams
  • Degrees, honors, and awards received
  • The most recent educational agency or institution attended
  • Student ID number, user ID, or other unique personal identifier used to communicate in electronic systems
  • A student ID number or other unique personal identifier that is displayed on a student ID badge

If this information were compromised as part of a data breach, it would be considered substantial - yet this information about children can be shared without parental consent, for their entire K12 experience.

Directory information data is accurate, high quality data. It places a person in a school. It ties them to a location. It provides a list of friends or acquaintances (kids in the same year at the same school). It provides a date and place of birth. It can provide a unique ID that could appear in other data systems - a boon for data enhancement via recombination. It includes a photo, and can include their height and weight. It's also worth noting that each school district identifies directory information differently, so some schools might share less information than what is listed in the model letter.

This data is updated annually, so that over the thirteen years of a child's K12 education, the data trail of directory information provides a comprehensive record of a child's life from age 5 to 18.

Or, to put it in the words of Experian, directory information can be used to presort our high school students into those more likely to be "American Royalty" or those who are "Small Town Shallow Pockets".

While FERPA does allow parents to opt out of this data sharing, parental rights are often poorly understood.

As it currently stands, directory information creates an accurate, high quality dataset that provides the foundation for uses ranging from identity theft to profiling. The standard arguments in favor of easy sharing of directory information generally center on creating yearbooks, school play brochures, and media coverage of student athletes - but we can celebrate student accomplishments without compromising student privacy. To achieve that, we should reconsider how we handle directory information.


I'd Like Some "Smart" in Smart Toys, aka Hello Barbie Is Extra Chatty

2 min read

As reported in The Guardian, Hello Barbie had multiple security issues that would have allowed hackers to reroute the doll's communications to a different server and override the doll's privacy settings. This initial breach would let an attacker access account information, learn a home address, listen in on the doll's immediate surroundings, and even take over the responses the doll gives.

But the fun doesn't end there. Because the doll connects to the internet via a home wireless connection, hackers could use that information to take over the home internet connection, monitor traffic on the connection, and compromise devices and steal information within the home network. A compromised Hello Barbie doll would also be a great tool for people looking to see if and when a family is home. No need to place a microphone inside the house, because Jimmy is playing with it!

It's worth noting that these security issues were discovered by a researcher and disclosed to the manufacturer responsibly. As reported in the Guardian, ToyTalk, the company that partners with Mattel to make Hello Barbie, downplayed the multiple security holes.

"An enthusiastic researcher has reported finding some device data and called that a hack. While the path that researcher used to find that data is not obvious and not user-friendly, it important to note that all that information was already directly available to Hello Barbie customers through the Hello Barbie Companion App. No user data, no Barbie content, and no major security nor privacy protections has been compromised to our knowledge."

If the manufacturer of a toy can't understand why an attack that compromises an entire home network is a "major security or privacy" issue, they need to get into another line of work. I don't think we're ready for smart toys until manufacturers can behave responsibly.

Privacy, Parenting, and the VTech Breach

1 min read

I put a post out on the Common Sense Media blog about the VTech breach:

Late last week, news broke about a data breach at the popular game and toy manufacturer VTech. The initial description of the breach sounded fairly large: Nearly five million parent accounts and 200,000 student accounts were compromised. If we make the assumption that one account equals one person, that's the equivalent of a data breach that covers every person living in Wyoming, Vermont, the District of Columbia, Alaska, North Dakota, South Dakota, and Delaware combined.

Read the whole post over at Common Sense.