Students, Data, and Blurred Lines: A Closer Look at Google’s Aim to Organize U.S. Youth and Student Information
Thorough read on GAFE and Business practices. Solid footnotes. From April, 2015.
4 min read
For schools running programs using Apple hardware, Apple has set up a guide to streamline the process of creating an Apple ID for each student.
This guide includes a section on creating accounts for students under the age of 13:
The instructions quoted above indicate that the school is the broker for arranging parental consent. This is a fairly standard practice among edtech companies (whether or not this is good practice is a different conversation). We will also look at the linked Parental Disclosure Consent doc later in this post.
Apple's guide includes step by step instructions for creating Apple IDs. The first step merits a close reading:
Step 1. Prepare Apple ID request. To create new Apple ID accounts, you will need to upload a correctly formatted, comma-separated value (CSV) file containing a list of students who need Apple IDs. To download a template, go to Import Accounts and click Download Account Template. To edit the template, use an application such as Numbers, Microsoft Excel, or another application that can save CSV files. To complete the template, you will need to provide the batch number or batch name, student name, student Apple ID, student date of birth, and the parent or guardian email address for each student request.
To highlight a couple of key points: Apple's process requires that every school prepare a text file (CSV stands for comma-separated values) with the name, birthdate, and parent contact of every student. Plain text files are notoriously insecure - they are unencrypted by default, so anyone who gets this file can access the information within it. So, Apple's recommended method for creating student IDs requires creating a comprehensive list of sensitive student data in one of the least secure formats available.
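To make the concern concrete, here is a minimal sketch in Python of what such a roster looks like on disk. The column names are assumptions (the guide lists the required fields, but the exact headers in Apple's template are not reproduced here), and the single record is entirely fictional.

```python
import csv
import io

# Hypothetical column names: the guide lists the required fields, but the
# exact headers in Apple's template are an assumption here.
FIELDS = [
    "batch_name",
    "student_name",
    "student_apple_id",
    "date_of_birth",
    "parent_email",
]

# Entirely fictional placeholder record.
rows = [
    {
        "batch_name": "grade5-batch1",
        "student_name": "Sample Student",
        "student_apple_id": "sample.student@example.org",
        "date_of_birth": "2005-01-31",
        "parent_email": "parent@example.org",
    },
]


def write_roster(records):
    """Serialize the roster to CSV text, much as a spreadsheet export would."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


csv_text = write_roster(rows)

# No password, key, or special tool is involved: the raw text is the data.
print(csv_text)

# Anyone holding the file can parse every field back out in one line.
recovered = list(csv.DictReader(io.StringIO(csv_text)))
print(recovered[0]["student_name"], recovered[0]["parent_email"])
```

Nothing in the format itself stands between the file and whoever opens it; any protection has to come from how the file is stored and transmitted.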
This post explaining the process goes a step further; they use student Gmail addresses for the Apple ID. Practically, this means that if this file were ever compromised, the people who accessed the file would have student names, dates of birth, parent email, and student email.
In case people wonder why this is a concern: when you store data in an insecure format, you expose your students to greater risk - as happened with these students in New Orleans. And these students in Escondido. And these students in Seattle. And these students in Maine.
By encouraging the use of CSV files to move sensitive student information, Apple encourages insecure data handling practice. It's unacceptable for any age group, but it somehow feels worse for students under the age of 13. The fact that this is accepted as sound practice by edtech staff anywhere is also problematic.
The full text of the Parent Disclosure doc is essential reading, but we will highlight a couple sections here. For example, after Apple has parental consent, they are clear that they will collect a range of information from all students.
We may collect other information from your student that in some cases has been defined under COPPA as personal information. For example, when your student is signed in with his or her Apple ID, we may collect device identifiers, cookies, IP addresses, geographic locations, and time zones where his or her Apple device is used. We also may collect information regarding your student's activities on, or interaction with our websites, apps, products and services.
Apple states very clearly that they will tie location and behavior to an identity for all students, of all ages.
At times Apple may make certain personal information available to strategic partners that work with Apple to provide products and services, or that help Apple market to customers. Personal information will only be shared by Apple to provide or improve our products, services and advertising; it will not be shared with third parties for their marketing purposes.
In this clause, Apple clearly states that they (Apple) will use personal information to improve Apple's advertising and marketing.
According to Apple's published policies and documentation, participating in a school's iPad program requires student data to flow to Apple's marketing and advertising, and encourages sloppy data handling by schools.
2 min read
Trying to find Google's Apps for Education Terms of Service page is akin to spending a weekend unicorn hunting while quaffing cocktails from the Holy Grail.
And please, if I have missed an obvious place where Google's current Terms of Service for Apps for Edu are linked, please tell me. I have spent a foolishly long amount of time trying to nail this down, and I would love to know that I had missed something obvious. The shadow of a PEBKAC looms long, and could easily extend into this examination.
Our quest for current Google Apps for Edu Terms of Service leads from Google's Trust page, to the signup page where a district or school would get Apps for Edu, to the Product list, to the product overview page, to the top-level Google for Education page, to Google search (which leads to outdated terms).
It seems like the most reliable way to see the Terms of Service for Google Apps for Edu is to ask people who are already running Google Apps. It really shouldn't be this complicated.
Google could improve this easily by taking the following basic steps:
Future posts will address some of the ways that Google's terms allow student data to leak out and be used outside the Apps for Edu terms of service. However, that is a separate issue from basic transparency. For a company founded on making data on the web more discoverable, the opacity of Google's basic terms should be an easy problem for Google to fix.
2 min read
Last night, I had the opportunity to present on privacy issues in educational technology to the Portland EdTech meetup. We had about 45 minutes to talk, which let us scratch the surface. We also had a good mix of vendors, people from higher ed, and people from K12. I'd love to see parents and (gasp) students in the mix at future events. I strongly prefer getting different stakeholders together, as all stakeholders benefit from hearing different perspectives and concerns.
The slides from the presentation are on Google Drive. The presentation is licensed under a Creative Commons Attribution-Share Alike license. Feel free to use any piece of it, and link to this post by way of attribution.
Below, I pulled out links to useful resources from the presentation.
This is FAR FROM COMPREHENSIVE. Rather, these small examples show some of the complexities involved in deidentifying data, and how combining data sets can render some efforts at deidentification meaningless.
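As a rough illustration of how combining data sets can undo deidentification, the sketch below joins a "deidentified" score file against a separate named roster on a few shared attributes. Every record is fictional, and the field names and the choice of zip code, birth date, and gender as the linking attributes are assumptions made for the example.

```python
# All records are fictional; field names are assumptions for illustration.

# A "deidentified" data set: names removed, other attributes left in place.
deidentified_scores = [
    {"zip": "11111", "birth_date": "2004-03-14", "gender": "F", "score": 71},
    {"zip": "11111", "birth_date": "2004-07-02", "gender": "M", "score": 88},
    {"zip": "22222", "birth_date": "2004-03-14", "gender": "F", "score": 93},
]

# A second, separately obtained data set that includes names.
named_roster = [
    {"name": "Student A", "zip": "11111", "birth_date": "2004-03-14", "gender": "F"},
    {"name": "Student B", "zip": "22222", "birth_date": "2004-03-14", "gender": "F"},
]

# Attributes that are not names, but can single a person out in combination.
QUASI_IDENTIFIERS = ("zip", "birth_date", "gender")


def linking_key(record):
    """Build the combination of attributes used to link the two data sets."""
    return tuple(record[field] for field in QUASI_IDENTIFIERS)


def reidentify(anonymous_records, named_records):
    """Return (name, score) pairs where the shared attributes match one person."""
    names_by_key = {}
    for person in named_records:
        names_by_key.setdefault(linking_key(person), []).append(person["name"])

    matches = []
    for record in anonymous_records:
        candidates = names_by_key.get(linking_key(record), [])
        if len(candidates) == 1:  # only an unambiguous match counts
            matches.append((candidates[0], record["score"]))
    return matches


matches = reidentify(deidentified_scores, named_roster)
for name, score in matches:
    print(name, score)
```

Removing the name column did nothing here: two of the three "anonymous" records link back to exactly one named person.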
3 min read
As is very obvious to the three regular readers of the FunnyMonkey blog, we care about privacy. Our work around privacy comes directly from our belief that learner agency and learner control are both essential elements in education, and frequently ignored elements of our educational process. Our commitment to learner agency informs much of the work we do - it's why, in addition to privacy, we care about open content, student-directed portfolios, and empowering people and organizations via open source tools.
Over the last eleven years, as part of our work with FunnyMonkey, we have been able to work on a range of projects covering all of these issues. We have been fortunate to work with some amazing people at some amazing organizations. Although software development is a big part of what we do, we never looked at software as the end goal of any project. Technology isn't neutral, and we always worked with people to make sure that any solution removed barriers to doing good work. If writing code was part of making things better, so be it.
For the last seven years, Jeff Graham has been directly involved in shaping and guiding the work we do. Jeff is a rarity among developers - equally comfortable discussing deployment process, the pros and cons of different open licenses, scalability, security, and the emotions of people as they interact with the software we build. There really isn't much we've done over the last seven years that hasn't been made better by his insights and expertise.
Increasingly, as our work around privacy has ramped up, we have been looking at ways to improve both awareness of privacy, and practice around privacy and security. Many of those ideas have been shared here, on this blog. Now, the time feels right - more people seem to be aware of a broader range of issues related to privacy and data use than at any point in the post-NCLB era.
To further this work, both Jeff and I will be joining Common Sense Media this summer. The team at Common Sense is already doing amazing work around student data privacy, and we are incredibly excited to be able to join them. The time feels right. Over the past few years, privacy was the thing we made time for among the other areas of our work. Now, privacy will be the thing we do. Fun times lie ahead.
2 min read
If you are building and selling an EdTech product, you can benefit from having people on the sales and marketing teams go through the user-facing interface of your product. As part of this process, you should also have members of these teams read through your terms of service and privacy policies.
I observed that one of the odd things about the service was that it appeared to have two separate policies: one for the free service, and a second set of terms for the premium service.
This screencast breaks down how this issue is presented to end users.
An easy and obvious fix for this would be to have the links from the premium signup page point back to the privacy policies and terms from the main PaperRater site. At that point, there could be a discussion about how to improve the actual terms with clarity around what terms applied to the free and fee-based service.
Of course, it's also possible that different terms should apply to fee-based and free services. But, if that is the case, then the differences should be made clear and transparent to end users.
3 min read
Over the last few years, we have been looking at ways in which privacy policies and data stewardship can be improved. Over that time, one of the issues we have encountered repeatedly is that it is difficult to track how and why policies change over time. This lack of transparency hurts people who want to learn about privacy, and how an application treats student data. It also hurts companies - these decisions should be part of organizational culture, and losing them means losing an opportunity to see how a company has evolved and improved over time.
These issues are addressed via a small, simple change: placing terms of service, privacy policies, and other related policy docs on GitHub. Over on the Clever blog, Mohit Gupta has a great blog post describing how to get this done.
The short version: to get started here, all you need to do is create a repository on GitHub that contains your terms. Ideally, use this structure:
https:/. Use Clever's terms as an example.
Putting policies on GitHub creates some immediate benefits that will accrue over time.
Putting terms on GitHub is not a panacea - this won't magically fix weak terms. However, making terms of service and policies available to broader audiences in an accessible format will help more people understand how data gets used - and doesn't get used - in software. Creating concrete steps that help companies commit to greater transparency helps shift norms around privacy. Creating tools that help us identify sound practice allows us to improve the conversation around privacy, one facet at a time.
Most importantly, this is something that can be done now.
I'm very happy to say that a group of companies have already committed to getting their policies on GitHub. In the next 1-2 weeks, we will be announcing the "official" launch, and doing some additional outreach. If you want to get your terms onto GitHub and be part of the initial announcement, please get in touch.
3 min read
In recent days, people have attempted to justify doxxing. Ironically, a person was doxxed in the name of student privacy. I didn't think that it would be necessary - in the education space - to have a conversation about why doxxing is a very bad idea, but here we are.
I left the text below as a comment but I wanted to post this here as well so I have a copy.
If you are in the education world, or the technology world, please speak up on this issue.
I noticed in reading through this most recent post that you omitted this pretty thorough debunking of both the doxxing angle, and the actual conflict of interest that has been used to justify the doxxing: http://hackeducation.com/2015/03/21/doxxing/
You also omitted https://www.schneier.com/blog/archives/2015/01/doxing_as_an_at.html - which documents doxxing going back to 2001. One of the comments on that piece is from a person who was doxxed in 1997.
Which is to say: reality often differs from the story you get told on Google trends.
But more than anything, I am left nearly speechless by this statement: "I will not spend time editing out info from public docs."
Why not? In all the cases you cite, the home address of the subject is completely irrelevant to the story you are trying to tell. Yet, you are willing to expose these people to the potential risk for harm because editing out an address will slow you down?
I've done this work. I've edited docs before - it takes around 30 seconds. Cleaning up docs so you are not exposing personal information is *sound research*. Please, add this into your workflow. It will improve your credibility.
Let's say a student hands in work filled with spelling errors. If their justification for it was, "well, I would have corrected these things, but I didn't have time," - what would your reaction be?
Finally, you also justify doxxing by saying that you only research adults, not children. This is a dangerous shield to use. Many of these adults *have children* - when you dox the parent, you put the child at risk.
Additionally, the original issue here (the social media monitoring) is rooted in expectations of privacy, and the expectation to be free from excessive, unnecessary surveillance. We *all* have those rights. You, me, and the people we disagree with. Adult or child.
Please - reconsider what you are saying in this piece. It is a dangerous escalation. At best, it will result in some Pyrrhic victories. At worst, someone will get hurt. Badly.
6 min read
As part of their work on PARCC, Pearson appears to be monitoring social media accounts for mention of the test. This monitoring appears to make no distinctions between student accounts, teacher accounts, or anyone else.
This recently came to light when an email from a school superintendent in New Jersey was shared publicly. I'm including a version of the email in this post for context. However, unlike many other places where this was shared, I am removing the sender's name and contact info. In this post, we'll also address some of the issues around how this story came out.
From the above email, it sounds like something like this happened:
Pearson's social media monitoring software flags a tweet that mentions the PARCC exam. This tweet was identified as containing information about a test question sent during the testing period, and therefore potentially compromising the integrity of the test. It's unclear if this determination is done by a human, or flagged by a machine.
This report got passed on to the New Jersey Department of Education. At some point, the author of the tweet was identified. It's unclear whether Pearson or NJ DOE did that, and how complicated that process was (i.e., did the author have their name in their profile, or was there more to it than that). NJ DOE then contacted the Superintendent of the district where the student attends school and asked for a disciplinary consequence.
The Superintendent who received the report, however, does her own research, and uncovers that the NJ DOE and Pearson information was inaccurate. The tweet was sent after the test, and did not contain information that compromised a question.
This sequence of events raises many questions - and, unfortunately, the loudest conversations don't seem to be getting past the "Pearson is spying on our kids" handwaving.
We'll dispense with that first: EVERYONE is spying on your kids - it's called social media monitoring. Districts use social media monitoring software. Law enforcement uses it. Just google "social media monitoring school district" and "law enforcement social media monitoring school" and start reading. Brands monitor social media traffic constantly. The platforms we all use to communicate (Facebook, Twitter, Instagram, Snapchat, etc, etc, etc) ALL monitor, analyze, and sell our interactions.
Pearson is a brand. Their products are brands. For better or for worse (and no, we are not having that conversation in this post) Pearson has invested heavily in PARCC. They will monitor social media to protect that brand. Pearson is just like every other brand, and they are all currently monitoring behavior on social media. That doesn't make it okay, but that does make it both pervasive, and yesterday's news.
Several additional questions get raised here.
It's also worth noting that, in the situation in New Jersey, the Superintendent's response was about as good as one could hope for. She took the report, examined it, and researched it. She cut through the inaccuracies in the NJ DOE claim. She was - correctly - concerned about the role of external vendors and state DOE in monitoring student conversations outside the school day, off the school infrastructure.
However, in the current testing climate, people are so eager to score points against testing companies that they distributed an email from her that contained her full name, and her contact info at her place of work. The email that was shared appears to have been shared without her knowledge or consent (UPDATE: the Superintendent did not give permission for her email to go public - found via Frank Noschese END UPDATE). So, in this case, we have a Superintendent doing right by kids having her job made more difficult by people who, in their rush to "get" Pearson, spread her contact info over the entire internet. There are ways of breaking this story and learning more about this that don't involve throwing a Superintendent under the bus.
It's also interesting to note that Pearson appears to be taking some steps to clean up their tracks on their social media monitoring. Up until yesterday, Pearson was featured as a case study for a social media monitoring firm called Tracx. Today, however, that link is dead. Using Google's cache, I grabbed a screenshot of Pearson's social media monitoring case study, as the cache will likely expire in a few days. All that remains on the Tracx site currently is the Pearson logo, at the accurately named "client logo" page. At some point, I expect this link to return a "Page not found" error, which will indicate that it has also been scrubbed.
UPDATE: In the ongoing conversations on Twitter, Jason Buell noted that, while there is outrage about Pearson monitoring discussions about a test, there has been relative silence about the ongoing use of social media monitoring that targets youth of color. There are glaring examples of profiling via social media monitoring. I have a hard time understanding the furor over Pearson alongside the acceptance of surveillance of students of color. END UPDATE.
At this point, there are more unanswered questions than anything. However, my main question when thinking about this remains rooted in how we perceive and react to learning that we are monitored. What is required to help people realize that the monitoring that surprises them in an academic context is persistent and ongoing everywhere else?
3 min read
Clever recently updated their privacy terms, and their method and process provide a good example for other companies to replicate.
In terms of the actual content, Clever made two substantial changes around how data is treated in case of a sale, transfer, or bankruptcy.
Here is Clever's original language, before the updates:
In the event of a change of control: If we sell, divest or transfer the business or a portion of our business, we may transfer information, provided that the new provider has agreed to data privacy standards no less stringent than our own. We may also transfer personal information - under the same conditions - in the course of mergers, acquisitions, bankruptcies, dissolutions, reorganizations, liquidations, similar transactions or proceedings involving all or a portion of our business.
Here is Clever's updated language:
In the event of a change of control: If we sell, divest or transfer our business, we will not transfer personal information of our customers unless the new owner intends to maintain and provide the Service as a going concern, and provided that the new owner has agreed to data privacy standards no less stringent than our own. In such case we will provide you with notice and an opportunity to opt-out of the transfer of personally identifiable Student Data.
The updated language contains two important distinctions. First, the terms now clearly state that personal data will not be transferred as part of any transaction unless the buyer is going to maintain the service, and do so under sound policy. This goes a long way toward mitigating the risk of another ConnectEdu or Corinthian Colleges fiasco.
The second important change in this section is that the terms now guarantee notification in case of sale, paired with user opt-out of data transfers. This puts control in the hands of end users. I'm also assuming here that end users could be both schools that contract with Clever, and people who attend those schools.
These changes get a lot of things right. By recognizing that a sale to a company that wants to maintain and grow the service is a different type of event than a liquidation, Clever ties good data stewardship to maintaining the continuity of their service. By committing to notification and user opt-out, Clever demonstrates confidence in what they offer, and respect for the trust people place in them. By putting their terms on GitHub, and maintaining solid commit messages, they make it very clear how their terms evolve over time.
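To show the kind of visibility that version-controlled terms provide, here is a small sketch using Python's standard `difflib` module on excerpts of the two versions quoted above. It is only a stand-in for what a version control system displays, not a reproduction of Clever's repository; the file labels are hypothetical, and the clause-by-clause line breaks were chosen for the example.

```python
import difflib

# Excerpts of the "before" and "after" language quoted above, split into
# clauses so that the diff is readable. The line breaks are an assumption.
original_terms = [
    "If we sell, divest or transfer the business or a portion of our business,",
    "we may transfer information,",
    "provided that the new provider has agreed to data privacy standards "
    "no less stringent than our own.",
]

updated_terms = [
    "If we sell, divest or transfer our business,",
    "we will not transfer personal information of our customers unless the "
    "new owner intends to maintain and provide the Service as a going concern,",
    "and provided that the new owner has agreed to data privacy standards "
    "no less stringent than our own.",
    "In such case we will provide you with notice and an opportunity to "
    "opt-out of the transfer of personally identifiable Student Data.",
]

# The file labels below are hypothetical, used only to title the diff.
diff_lines = list(
    difflib.unified_diff(
        original_terms,
        updated_terms,
        fromfile="privacy-policy (before)",
        tofile="privacy-policy (after)",
        lineterm="",
    )
)

# Removed clauses are prefixed with "-", added clauses with "+".
print("\n".join(diff_lines))
```

Read alongside a commit message explaining why the change was made, a diff like this lets anyone see exactly which commitments were added or removed between versions.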
Most importantly, both the process that Clever has used, and the content of these changes, show how this can be done well. I'd love to see other companies follow Clever's lead here.