Building Consensus for Privacy and Security

1 min read

I had the pleasure of presenting at ATLIS on April 19, 2016, in Atlanta. The conversation covered several facets of privacy, and how to evaluate the varying attitudes toward privacy and security in schools.

One element in the conversation that we sped over involved some simple browser-based tools that highlight third-party trackers. The example I used compared two news sites (Huffington Post and the NY Times), but the process works just as well with educational technology apps: enable Lightbeam, log in to an edtech site, and see what loads.
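
To make this concrete, here is a minimal sketch in Python of a related check. It is not the Lightbeam workflow itself (Lightbeam watches live network activity, while this only inspects a page's HTML), the URL is a placeholder, and it assumes the requests and beautifulsoup4 packages are installed - but it shows how quickly third-party domains surface.

```python
# A rough approximation of what Lightbeam surfaces: which third-party
# domains does a page ask the browser to load content from?
from urllib.parse import urlparse

import requests
from bs4 import BeautifulSoup


def third_party_domains(page_url):
    """Return the external domains referenced by script, iframe, and img tags."""
    first_party = urlparse(page_url).netloc
    html = requests.get(page_url, timeout=10).text
    soup = BeautifulSoup(html, "html.parser")

    domains = set()
    for tag in soup.find_all(["script", "iframe", "img"]):
        src = tag.get("src")
        if not src:
            continue
        host = urlparse(src).netloc
        if host and host != first_party:
            domains.add(host)
    return domains


if __name__ == "__main__":
    # Placeholder URL - substitute the login page of any edtech app you use.
    for domain in sorted(third_party_domains("https://www.example.com")):
        print(domain)
```

Because this only parses static HTML, it will miss trackers loaded by scripts at runtime - which is exactly what Lightbeam is better at catching.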

The full presentation is available below.

Terms of Service and Privacy Policies at CharacterLab

10 min read

I probably spend more time than recommended browsing the web and reading privacy policies and terms of service, looking for patterns. When I encounter a new app, the first thing I do is find the terms and read them. Terms are useful in a range of ways. First, what they say matters. Second, how they say it can provide insight into the service, and into how the company views itself. Third, terms can indicate the business plan (or possible business plans) of a company. Finally, the degree to which the terms align (or not) with the product can indicate how coherent the planning within a company has been. There are other things we can glean from terms, but these are some of the more common ones.

Last week, I encountered the terms of service at characterlab.org, which offers an application to support character growth. The terms discussed in this post were updated in August 2015. I downloaded an archived version this morning (April 4, 2016).

The target audience of Character Lab is teachers, but the service also gets information about children (to set up accounts) and from children (once accounts have been set up).

Account Creation and Parental Consent

In the process defined by the terms and reinforced via their user interface, teachers create accounts for students.

The information we collect varies based upon the type of user you are.
(i) Teachers: In order to use the Service, you will need to register for an account. In order to register, we will collect your name, login name, and institution you are associated with, grade level, years of experience, along with your telephone number and email address.
(ii) Students: Students will not be asked to provide Information. Teachers will create accounts for students by providing their name. Students and teachers will both input information related to student character assessment tests and other Services-related functions.

In the terms, parental consent is mentioned, but only in passing, in the "Eligibility" section:

You must be at least 18 years old, an emancipated minor, or possess legal parental or guardian consent, and be fully able and competent to enter into and abide by these Terms to access the Service. If you are under 13 years of age, you may only access the Service with the express permission of your legal parent or guardian.

Given the account creation workflow in place with this site, a teacher is binding a student to these terms, potentially without any parental consent. In the case of a student under the age of 13, the way the eligibility terms are written ("If you are under 13 years of age, you may only access the Service with the express permission of your legal parent or guardian."), the onus for understanding the need for parental consent and obtaining it appears to rest on the student, who may or may not be aware that the terms exist, and who has no role in setting up their account.

At the very least, the terms should require that the teacher or school creating student accounts obtain and maintain verifiable parental consent.

A suggestion for vendors looking to avoid this circular setup: read your terms from the perspective of each of your target users. If likely scenarios exist where a person would have data in your system before that person had any opportunity to interact with your system, you should consider revising your terms, your onboarding process, or both.

Grammar Counts

From the "Protecting Children's Information" section, we are given text that fails to meet basic standards for clarity.

If you are a student, please note that your parent can view request a copy of your character report, and any and all other information associated with you, on this Site, including without limitation messages between. If you are a parent, you may request a copy of your child's character report (whether self-reported or reported by any and all other information associated with your child) on this Site by either submitting an email request to Character Lab at cgc@characterlab.org.

A couple of things jump out here: first, as highlighted above, students play no role in creating their account, so the chances they would learn from these terms that parents can request a copy are slim. Second, both sentences in the "Protecting Children's Information" section contain grammatical errors and word omissions that make them less than comprehensible.

If you are putting out an application that collects data, read your terms. Have a good editor read your terms. Have a good lawyer read your terms. Have your lead developer read your terms. If you are the company founder, read your terms. If terms contain basic grammatical errors, or sentences riddled with omissions, it raises the question: in how many other places do similar weaknesses exist?

Data collection and minimization

In looking at the data that is collected, several areas exist where the terms claim the right to collect more data than is needed to run the service.

Your browser type, language, plug-ins, Internet domain and operating system;

This service has no need to collect information about browser plugins. Collecting this information is a component of browser fingerprinting, a precise method of tying a specific browser to a specific machine - which can often lead to uniquely identifying a person without collecting data traditionally considered Personally Identifiable Information (or PII). Tracking "Internet domain" seems excessive as well. While the term is pretty vague, one common definition could mean that the service tracks the domains from which requests originate, so the vendor would know if someone was connecting from the network of a specific school or university. This information replicates a lot of what can be inferred from collecting an IP address (which characterlab.org also collects), but connecting an IP address to a domain seems unnecessary - especially because teachers are required to state a school affiliation when they register.
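
To illustrate why plugin and domain details matter, here is a minimal sketch of how server-side fingerprinting works: hash the collected attributes into a stable identifier. The attribute values below are invented for the example, and this is not Character Lab's code - just a sketch of the general technique.

```python
# A minimal sketch of browser fingerprinting: hash a handful of browser
# attributes (the same kinds named in the terms) into a stable identifier.
# No cookies or traditional PII required; the values here are invented.
import hashlib


def fingerprint(attributes):
    """Produce a stable identifier from a dict of browser attributes."""
    canonical = "|".join(f"{key}={attributes[key]}" for key in sorted(attributes))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]


visitor = {
    "browser": "Firefox 45.0",
    "language": "en-US",
    "plugins": "Flash 21.0;QuickTime 7.7;Silverlight 5.1",
    "os": "Mac OS X 10.11",
    "domain": "students.exampleschool.org",
}

# The same browser yields the same hash on every visit, so the hash acts as a
# persistent ID even if the user clears cookies or never logs in.
print(fingerprint(visitor))
```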

Moving on, the terms also claim the rights to collect and store device IDs and physical location.

Unique identifiers, including mobile device identification numbers, that may identify the physical location of such devices;

This service does not require a device ID or physical location to run. If they actually collect and retain this information, it creates a much more valuable dataset that could be compromised via a data breach or human error.

If this data is actually needed to run the application, then the terms need to clarify how and why it is used. I suspect that this is an example of something we see pretty regularly: the terms are out of sync with what the app actually does. Character Lab is not alone in claiming the rights to obtain device IDs. Many other EdTech companies do this. While it is easy to get a device ID, it is generally not necessary, and many EdTech companies could eliminate this practice with no negative effect on their service.

Data collection and retention should be minimized to reflect the specific needs of the app. When a vendor thinks about these details, they can build better software that is easier to maintain. By making sound technical decisions as a regular part of the development process - and by verifying that the terms of service reflect actual practice - vendors can have confidence that they understand their product, and how it runs.
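
As a sketch of what minimization can look like at the point of collection (the field names here are hypothetical, not taken from Character Lab's actual schema), a service can keep an explicit allowlist of the fields it needs and drop everything else before anything is stored:

```python
# A sketch of data minimization at collection time: keep an explicit allowlist
# of fields the service needs, and discard the rest before storage.
# Field names are hypothetical.
REQUIRED_TEACHER_FIELDS = frozenset({"name", "email", "school", "grade_level"})


def minimize(payload, allowed=REQUIRED_TEACHER_FIELDS):
    """Return only the allowed fields from an incoming registration payload."""
    return {key: value for key, value in payload.items() if key in allowed}


raw_registration = {
    "name": "A. Teacher",
    "email": "teacher@example.org",
    "school": "Example Middle School",
    "grade_level": "7",
    # Fields like these never need to reach the database:
    "device_id": "a1b2c3d4",
    "browser_plugins": "Flash;QuickTime",
    "geolocation": "45.52,-122.68",
}

print(minimize(raw_registration))
# {'name': 'A. Teacher', 'email': 'teacher@example.org',
#  'school': 'Example Middle School', 'grade_level': '7'}
```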

Data transfers

The issues with data collection and retention are highlighted by how data will be treated in the case of a merger or an acquisition.

(d) in the event that Character Lab goes through a business transition, such as a merger, divestiture, acquisition, liquidation or sale of all or a portion of its assets, your Information will, in most instances, be part of the assets transferred;

This provision creates the very real possibility that data can be sold or transferred as part of a larger deal. This is a very problematic clause. As we saw with ConnectEdu and Corinthian (where student data was included in a sale to a student loan collection agency), these sales happen. Given the rate of churn in the education technology space, terms that allow student data to be sold or transferred create significant risk that data will be used in a range of ways that are completely unrelated to the stated goals of Character Lab.

The ability to transfer data, paired with the data that can be collected, could be mitigated to an extent by a good deletion policy. However, Character Lab does not deliver on that either.

Please note that certain Information may remain in the possession of Character Lab after your account has been terminated. Character Lab reserves the right to use your Information in any aggregated data collection after you have terminated your Account, however Character Lab will ensure that the use of such Information will not identify you personally.

When data is deleted, it should be deleted, full stop. Given that Character Lab claims the right to collect browser plugin details or device IDs - either of which can be used to precisely identify an individual - the claim that they will ensure that their data set won't identify you personally rings hollow.

This problem is exacerbated because the terms contain no language banning recombination with other datasets.

To be clear, the reason they include this claim over deleted data is to support research. However, they could support their research needs and respect user intent by specifying that all user data will be deleted and not incorporated into aggregate data sets moving forward, while any aggregate data sets created before the deletion remain unaffected.

Their provisions here would also be less problematic if the app minimized data collection, as outlined above.

Changes to terms

Finally, these terms contain the poison pill of terms of service: the right to change them at any time, without notice.

Character Lab shall have the right to modify these Terms at any time, which modification shall be effective upon posting the new Terms on the Terms of Use page of the Site. We recommend that you check the Site regularly for any such changes. Your use of the Character Lab Service following such posting shall be deemed to constitute your acceptance of such modification. The Terms may not otherwise be changed or modified.

The ability to change terms with no notice is always problematic, but it is especially problematic given that this site contains student information, and that the site has limited the ability of people to fully delete their information.

If terms are substantially modified, users should be notified via email, and via notice on the site - ideally as a banner, and as added text on the login page. The updated terms should also be posted for a specified period (generally around 30 days) before they become active.

Closing

The issues outlined here are a summary - there are other things in these terms that could be improved, but in the interests of brevity I kept a narrow focus.

These terms have issues that appear frequently across many terms in both educational and consumer technology. My sense in reading these terms is that the terms of using the service have drifted from the intent of the people creating the service. This is a common issue - building an app and releasing it into the world is a lot of work, and it's easy to overlook the need to clarify the terms of service. Imprecise or poorly written terms are rarely a sign of bad intent.

However, given that the terms provide the legal basis and rights of both the vendor and the users of a service, getting them right is essential. For a vendor, making sure that the terms align with the practice and intent of the application is a very practical way to build organizational clarity about your goals, and the role technology plays in reaching them.

Encryption, Privacy, and Security

9 min read

In conversations about student data privacy, the terms "encryption," "security," and "privacy" are often used interchangeably. While these terms are related, they ultimately are distinct concepts. In this post, we will break down how these terms overlap with each other, and how they are distinct.

But at the outset, I need to emphasize that this post will be incomplete - a comprehensive treatment of these terms and the distinctions between them would be a good subject for a book. Details will be left out. If you're not okay with that, feel free to stop reading now. I imagine the Kardashians are up to something curious or interesting - you could check that out instead.

As is hopefully obvious by now, this post is not intended to be comprehensive. This post is intended to provide a starting point for people looking to learn more about these concepts.

Privacy

Privacy is arguably the least technical element in this conversation. There are two facets to privacy we will highlight here:

  • It's possible to have great security and bad privacy practices; and
  • We often speak about "privacy" without clarifying "private from whom."

Great security and bad privacy

A vendor can go to extreme lengths to make sure that data can only be accessed by the vendor, or the partners of the vendor. However, if the vendor reserves the right to sell your data to whomever they want, whenever they want, that's not great for your privacy. The ways that vendors can use the data they acquire from you are generally spelled out in their terms of service - so, if a vendor reserves rights to share and reuse your data in their terms, and you agree to those terms, you have given the vendor both data, and the permission to use that data.

There are many vendors who have solid security paired with privacy policies and data usage practices that compromise user privacy.

Who is that private from, really?

Different people think of different things when we say the word "private" - in most cases, when we think about privacy, we focus on things we don't want other people to know. When we are working with technology, though, the concept of "other people" gets abstract and impersonal pretty quickly.

When we use services that store a record of what we have done (and it's worth noting that "doing" means read, said, searched for, liked, shared, moused over, and how long we have done any of these things), the "private" things we do are handed over to systems that have a perfect memory. This changes the nature of what "private" can mean. For the purposes of this post, we'll use four different categories of people who might be interested in us over time, and how that impacts our privacy.

  • Criminal - these are the folks people agree about the most: the people stealing data, perpetrating identity theft, and using a range of attacks to get unauthorized access to data with bad intent.
  • Personal - there is also large agreement about personal privacy. We can all agree that we don't want Great Uncle Wilfred to know about our dating life, or to talk about it during Thanksgiving. The ability to control which of our acquaintances knows what is something we all want.
  • Corporate - there is less agreement here, as one person's desire for privacy often runs counter to a data broker's or a marketer's business plan. But, when using a service like Facebook, Instagram, Twitter, Snapchat, Pinterest, etc., the "privacy settings" provided by the vendor might offer a degree of personal privacy, but they do nothing to prevent the vendor from knowing, storing, and profiting from everything you do online. This often includes tracking you all over the web (via cookies and local shared objects), in real life (via location information collected via a mobile app), or buying additional data about you from a data broker.
  • State - there is also less agreement about what constitutes an appropriate level of protection or freedom from state sponsored surveillance. While people have been aware of the inclination of the state to violate privacy in the name of security and law enforcement throughout history, the Snowden leaks helped create specific clarity about what this looked like in the present day.

(As an aside, the data use practices within politics should possibly be included in this list.)

Many conversations about privacy don't move past considering issues related to criminal activity or personal compromises. However, both corporate and state level data collection and use expose us to risk. As was recently illustrated by the Ashley Madison and the OPM breaches, corporate data collection and state data collection pose criminal and personal risk.

For people looking to learn more about the various factors at play in larger privacy conversations, I strongly recommend Frank Pasquale's recent book, The Black Box Society. The book itself is great, and the footnotes are an incredible source of information.

Security

In very general terms, security can be interpreted to mean how data is protected from unauthorized access and use. Encryption is a part of security, but far from the only part. If a systems administrator leaves his username and password on a post-it note stuck to his monitor, that undercuts the value of encrypting the servers. Human error can result in snafus like W2s for a popular tech startup being emailed to a scammer.

If people email passwords to one another - or store passwords online in a Google Spreadsheet - a system with fantastic technical security can be compromised by a person who has limited technical abilities but who happens to stumble onto the passwords. Phishing and social engineering attacks exploit human judgement to sidestep technical security measures. If a csv file of user information is transferred via Spider Oak and then copied to an unencrypted USB key, the protection provided by secure file transfer is immediately destroyed by storing sensitive information in plain text, on a portable device that is easy to lose. In short, security is the combination of technical and human factors which, taken together, decrease the risk of unauthorized access or use of information.

Encryption is an element of security, but not the only element. It is, however, a big part of the foundation upon which security, and our hopes for privacy, rest.

Encryption

Encryption is often discussed in general terms, as a monolithic construct, as in: "We need to fight to protect encryption" or "Only criminals need encryption."

However, the general conversation rarely gets into the different ways that information can be encrypted. Additionally, there are differences between encrypting a device (like a hard drive), data within an app, and data in transit between an app and a server or another user.

As an example, all of the following questions look at possible uses of encryption for a standard application:

  • Does the application encrypt data at rest on the device where the data is stored?
  • If the application pushes data to a remote server for storage, is the data encrypted while in transit to and from the remote location?
  • If the data is stored at the remote location, is the data encrypted while at the remote location?
  • If the remote location uses multiple servers to support the application, is communication between these servers encrypted?

If the answer to any of these questions is "no" then, arguably, the data is not getting the full benefits of encryption. To further complicate matters, if a vendor encrypts data at rest, and encrypts data moving between servers, and encrypts data moving between servers and applications, but that vendor can still decrypt that data, then there is no guarantee that the benefits of encryption will protect an individual user. When vendors can decrypt the data on their hardware, then the data is only as secure - and the information stored only as private - as the vendor is able or willing to protect that encryption.

True end to end encryption (where the data is encrypted before it leaves the application, is sent via an encrypted connection, and only decrypted at its final destination) is the ideal, but often a vendor will function as a middleman - storing and archiving the data before sending it along to its intended recipient. This is one of many reasons that the encryption debate looks different for vendors that make hardware relative to vendors that build software.
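
As a simplified sketch of the who-holds-the-key question (using symmetric encryption from Python's cryptography package for brevity, with a dict standing in for the vendor's storage): when the key is generated and kept on the user's device, the vendor only ever holds ciphertext it cannot read.

```python
# A simplified sketch of the end-to-end idea: the key never leaves the user's
# device, so the "vendor" (a dict standing in for remote storage) stores and
# forwards ciphertext it cannot decrypt.
from cryptography.fernet import Fernet

# The key is generated on, and stays with, the user's device.
user_key = Fernet.generate_key()
user_cipher = Fernet(user_key)

# Data is encrypted before it is sent anywhere.
message = b"character report: perseverance, self-control"
ciphertext = user_cipher.encrypt(message)

# The vendor can store, archive, and forward the ciphertext, but holds no key.
vendor_storage = {"user_42": ciphertext}

# Only the holder of the key can recover the plaintext.
recovered = user_cipher.decrypt(vendor_storage["user_42"])
assert recovered == message
```

Contrast this with the more common setup, where the vendor generates and holds the key: storage and transport can all be encrypted, and the vendor can still read, mine, or hand over every record.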

In very general terms, hardware manufacturers fighting for encryption are protecting user data, and it's in their best interest to do so: if hardware vendors fail to protect user data, they lose user trust, and then people won't buy their products.

In equally general terms, many application vendors fighting for encryption have a more complicated position. A small number of vendors have been vocal supporters of encryption for years - these are the small number of vendors who offer true end to end encryption, or who implement encryption where the user, not the vendor, retains control of their keys. However, the ongoing legal battle between Apple and the FBI over encryption has elicited broad support from within the tech community, including companies who use data to power advertising and user profiling. For companies whose business is predicated on access to and use of a large dataset of sensitive user information, strong encryption is essential to their business interests.

In their external communications, they can get a public relations win by advancing the position that they are defending people's right to privacy. Internally, however, encryption protects the biggest asset these companies possess: the data sets they have collected, and the communications they have about their work. This is where the paradox of strong security with questionable privacy practice comes into play: why should encryption give large companies an additional tool to protect the means by which they compromise the privacy of individuals?

And the answer is that, without encryption available to individuals, or small companies, none of us have a chance to enjoy even limited privacy. If we - people with less access to technical and financial resources than the more wealthy or connected - want to have a chance at maintaining our privacy, encryption is one of the tools we must have at our disposal. The fact that it's also useful to companies that make a living by mining our information and - arguably - violating our privacy doesn't change the reality that encryption is essential for the rest of us too.

NOTE: I'd like to thank Jeff Graham for critical feedback on drafts of this piece.

The Privacy Divide

4 min read

One of the questions that arises in privacy work is whether and how privacy rights - or access to those rights - play out across economic lines. The answer is complicated, and this post is a messy step into what will almost certainly be an ongoing series of posts on this topic. Both inside and outside education, we often talk about issues related to the digital divide, but we don't often look at a companion issue, the privacy divide.

This post is not intended to be exhaustive, by any means - and please, for people reading this, I'd love to read any resources that would be relevant and help expand the conversation.

There are a range of ways to dig into the conversation within EdTech, but one way to start to look at the issue is to examine how parents are informed of their rights under FERPA. This is an area where more work needs to be done, but even a superficial scan suggests that an awareness of FERPA rights is not evenly distributed.

Leaving FERPA aside, it's worth looking at how content filtering plays out within schools. The quotes that follow are from a post about Securly, but the pattern is broadly applicable to any environment that defaults to filtering.

"From the Securly dashboard, the administrators can see what students have and haven’t been able to access," she explains. "If I want to see what kids are posting on Twitter or Facebook, I can--everything on our Chromebooks gets logged by Securly."

However, for students whose only access is via a school-issued machine, the level of surveillance becomes more pervasive.

"Most of our students are economically disadvantaged, and use our device as their only device," DeLapo explains. "Students take Chromebooks home, and the Securly filters continue there."

This raises some additional questions. Who is more likely to have their activities tracked via social media monitoring? If something gets flagged, who is more likely to have the results passed to law enforcement, rather than a school official? What zip codes are more likely to receive the additional scrutiny of predictive policing? These patterns follow the general trends of disproportionate suspension based on race.

Throughout these conversations, we need to remain aware that the systems currently in use are designed to spot problems. The absence of a problem - or more generally, the lower probability that a problem will eventually exist - creates a lens focused on a spectrum of deficits. The absence of a problem is not the same as something good, and when we use tools explicitly designed to identify and predict problems, they will "work" as designed. In the process of working, of course, they generate more data that will be used as the justification or rationale for future predictions and judgments.

Increasing access and eliminating the digital divide need to happen, but people can be given access to different versions of the internet, or access via chokepoints that behave differently. We need look no further than the stunted vision of internet.org or the efforts of major industry players to destroy net neutrality to see how these visions play out.

To be more concrete about this, we can look at how AT&T is charging extra for the right to opt out of some (but not all) ad scanning on some of its Fiber internet access offerings. Julia Angwin has documented the cost - in cash and time - she spent over a year to protect her privacy.

Taking a step to the side, current examples of how data is used show how data analysis fuels bias - from using phone habits to judge borrowers, to digital redlining based on online habits, to using data to discriminate in lending.

The digital divide is real, and the need to eliminate it is real. But as we move to correct this issue, we need to be very cognizant that not all access is created equal. We can't close the digital divide while opening the privacy divide - that approach would both exacerbate and extend existing issues far into the future.

Public and Private

2 min read

In discussions around digital citizenship and online footprints, the terms "private" and "public" are often misused.

A couple quick notes:

First, the "private" settings on most social media platforms are a joke. Within social media, "private" often means "just visible to my friends", which in turn means "it appears on my friend's timeline." So, even if you are locking down your content to your friends, in many cases this means that your "privacy" is determined by the judgment of your least secure friend.

And, of course, this doesn't even touch the reality that "private" in most social software means that all content is fully visible and accessible to the mining algorithms of the company providing the service, and their affiliates. Thus, "private" means "invisibly shared with organizations who exist by selling predictions about your behavior for the foreseeable future."

These public/private conversations also don't mention the constant practices of brand monitoring, social media monitoring in education, and social media monitoring by law enforcement. If you are a kid in a school, ask your district and your school board if they use social media monitoring tools, and how you - as a student - can review what the school has collected.

People often also give advice about tagging photos. The purpose of tagging photos is simple: it's how companies crowdsource improving their facial recognition algorithms. But not tagging photos isn't the same as respecting someone's privacy. Facial recognition works, and social media companies already know who your friends are - because you tell them explicitly with the "friend" mechanisms they provide, and implicitly via who you interact with. Geolocation tracking makes this easier.

So when you upload a picture of you with three of your friends, the algorithms already have a small pool of people it might be - and they can use facial recognition software to narrow the field. For example, Facebook's facial recognition software is now accurate to the point where it can use other visual cues to identify a person.

So, let's talk about public and private, but let's also be explicit that true privacy is elusive. What frequently gets called privacy is really surveillance that has yet to be noticed.

Portland Public Schools, and a Calendar Built For Adults

4 min read

When talking about school and learning, scheduling and routines are frequently overlooked. A level of stability and predictability can be helpful for both kids and their families. This year, the Portland Public Schools calendar shows how not to build a schedule.

At the outset, I want to highlight that the issues with the calendar have been co-created by the district and the union - much of the choppiness in this calendar is the result of the district meeting requirements in the union contract. But both the district and the union need to come together to fix this.

Right now, we have a calendar that works for adults, but not for kids and their families.

You can verify my tallies against the PPS Calendar.

  • Week 1: Start Thursday, Aug 27th. 2 days.
  • Week 2: Aug 31 to Sept 4. Full week
  • Week 3: Sept 8 to Sept 11. 4 days (Labor Day)
  • Week 4: Sept 14 to 18. 4.5 days; late start on Wednesday.
  • Week 5: Sept 21 to 25. Full week
  • Week 6: Sept 28 to Oct 2. Full week.
  • Week 7: Oct 5 to Oct 8. 4 days
  • Week 8: Oct 12 to 16. 4.5 days; late start on Wednesday.
  • Week 9: Oct 19-20. 2 days. Conferences for three days.
  • Week 10: Oct 26-29. 4 days. 
  • Week 11: Nov 2 to 6. Full week.
  • Week 12: Nov 9 to 13.  4 days. Veterans Day.
  • Week 13: Nov 16 to 20. Full week.
  • Week 14: Nov 23 to 25. 3 days. Thanksgiving.
  • Week 15: Nov 30 to Dec 4. Full week.
  • Week 16: Dec 7 to Dec 11. Full week.
  • Week 17: Dec 14 to 18. 4.5 days; late start on Wednesday.
  • Week 18: Jan 6 to 8. 3 days (snow days).
  • Week 19: Jan 11 to 15. Full week.
  • Week 20: Jan 19 to Jan 22. 3.5 days; late start on Wednesday and MLK holiday.
  • Week 21: Jan 26 to 29. 4 days.
  • Week 22: Feb 1 to 5. Full week.
  • Week 23: Feb 8 to 11. Full week.
  • Week 24: Feb 16 to 19. 3.5 days; late start on Wednesday and President's Day holiday.
  • Week 25: Feb 22 to 26. Full week.
  • Week 26: Feb 29 to Mar 4. Full week.
  • Week 27: Mar 7 to 11. Full week.
  • Week 28: Mar 14 to 18. 4.5 days; late start on Wednesday.
  • March 21: Spring Break.

In the first 14 weeks - from the start of school to Thanksgiving - PPS students have only 5 complete (aka full) weeks of school, and only two consecutive full weeks. Of the remaining 9 weeks, three have three days or fewer, and the other six are fractured by holidays and/or late starts. The late starts are especially disruptive - and the burden lands heaviest on working families with younger kids, because these families need to find childcare and transportation to get their kids to school. Anecdotally, late starts are also a culprit in increased absenteeism.

Between Thanksgiving and the winter holiday (also known as peak cold and flu season) we manage to string together two consecutive full weeks. But, as anyone who has worked in a school can attest, the weeks between Thanksgiving and winter break are difficult ones for detailed work: they are sandwiched between two significant vacations, and the week before an extended vacation tends to be more hectic than usual for a variety of reasons.

In January, the choppiness returns in force, with one full week of school in the entire month.

In February - the 22nd week of the school year - we are finally ready to actually get down to school, with 5 out of 6 weeks being full weeks. For a lot of kids, this is too little, too late. The time when they need this consistency the most is at the beginning of the school year. By February, patterns have been established, and relationships have been formed. While the schedule at PPS probably meets the legal requirements for seat time and calendar days, it would be wonderful to see PPS and the teachers' union work together to build a calendar that actually works for kids and families.

Do You Like Working On Privacy? So Do We!

1 min read

Common Sense Media is hiring a Privacy Editor. If you want to be part of a team that is working on improving privacy and information security within educational technology, read the job description and apply! We'd love to find someone in Portland, OR, but we also recognize that there are smart, talented people all over the country.

Please, share this with people who you think might be interested.

Some Additional Questions On Data Sharing and Collection

2 min read

Senator Al Franken has sent a letter to Google CEO Sundar Pichai asking some questions about Google's data collection practices. His letter raises some excellent questions, but there are two specific use cases that are left out that also need to be addressed. In a post from May 2015, I described some of the issues with how Google structures their terms for Google Apps for Education; these questions build on that post.

  • If a student is logged into their GAFE account and accesses a non-GAFE service provided by Google, can data collected via the non-GAFE service be shared with ad and data brokers, like Doubleclick or Quantcast?
  • If a student is logged into their GAFE account and accesses a 3rd party service that is integrated with GAFE, what student data can be accessed by that 3rd party vendor?

The focus on targeted advertising is important, but ultimately too narrow. Targeted advertising is a visible manifestation of data collection and user profiling. While the presence of targeted ads lets us know that some services are tracking users, larger potential harms accrue when people are invisibly profiled, with their data being shared with different brokers and combined with data sets from different sources. This should not happen within an educational context where students are legally required to attend.

It's also worth noting that while these specific questions focus on Google, comparable abuses of student data occur in many ecosystems that have a business plan based on 3rd party integrations.

Surveillance, Worst Case Scenarios, and the Winceable Moment

4 min read

In discussing issues related to privacy, people often devolve into trying to identify and define immediate harm and/or a worst case scenario. Both of these lenses are reductive and incomplete. Because data analysis often occurs invisibly to us, via proprietary algorithms that we don't even know are in play, assigning harm can be a matter of informed guesswork and inference. As one example, try explaining how and why your credit score is determined - this algorithmically defined number determines many opportunities we receive or don't receive, yet few of us can say with any certainty how it is derived. Algorithms aren't neutral - they are a series of human judgments automated in a formula. There isn't any single worst case scenario, and discussions of worst case scenarios risk creating a false vision that there is a single spectrum with "privacy" at one end and some vague "worst case scenario" at the other - and this is not how it works.

The reason privacy matters - and the reason that profiling matters - is that we are seeing increasingly experimental and untested uses of data, especially in the realm of predictive analytics. Products built on new statistical methods are used in hiring, lending, mortgage decisions, finance, search, and personalization. The hype is that these new - or "innovative" or "disruptive" - uses of data will help us get more efficient, and push past the biases of the past. However, this fails in at least two ways: first, algorithms contain the biases of their creators. Second, the performance of these products fails to live up to the hype, which in turn doesn't justify the risk.

Data collected in an educational setting - by definition - is data collected on people in the midst of enormous development, questioning, and growth. If people are doing adolescence right, they will make mistakes, ask questions, break things - all in the name of growth and learning. In the context of, for example, an eighth grade classroom, it all makes sense. But outside that context, it's very different. One of the promises of Big Data and Learning Analytics is that the data sets will be large enough to allow researchers to distill signal from noise, but as noted earlier, the reality fails to live up to the hype.

How many of us have memories of our behavior from high school, middle school, and elementary school that make us wince? Those winceable moments are our data trail. I mentioned earlier that talking about worst case scenarios is an inaccurate frame, and this is why: there is no single data point that, if undone, can "fix" our past. However, data collected from our adolescence is bound to contain things that are inaccurate, temporary, flawed, or confusing - for us, and for people attempting to find patterns.

If people are aware of surveillance, it shifts the way we act. When students are habituated to surveillance from an early age, it has the potential to shift the way we develop. If this data is shared outside of an educational context, it creates the potential that every person attending a public school is fully profiled before they graduate. A commonly overlooked element of this conversation is that profiles never come from a single source - they are assembled and combined from multiple sources. When data collected within an educational context gets combined with data sets collected from social media or our personal browsing history, different stories emerge.

For most people over 30 reading this post, our detailed records begin early in the 21st century, when we were 15 or older. For some kids in school now, their data profile begins when their parents posted their ultrasound on Facebook. While targeted advertising to kids is an immediate concern, at least targeted advertising is visible. Profiling by algorithm is invisible, and is forever. Requiring students to pay for their public education with the data that will then be used to judge them sells out our kids. We can use data intelligently, but we need to have a candid conversation about what that means.

You Look Unprofessional When You Copy Your Policies

2 min read

Examples like this abound, but here's one. Searching for the exact phrase "resources we make available on our Site. Users may visit our Site anonymously" returns over 15,000 hits.

You can do this yourself by going to the policies on any site and selecting a random phrase. I recommend selecting the second half of one sentence and the beginning of the next, to include about 10-15 words total. If you find a typo or a misspelled word, use that word on the off-chance that the mistake has been faithfully plagiarized. If you are searching using Google, be sure to enclose the phrase in quotation marks so you only get results containing the exact string.
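
If you would rather script the check than paste phrases into a search box, here is a minimal sketch that pulls a phrase out of one policy and tests whether it appears verbatim in another. The URLs are placeholders, and it assumes the requests and beautifulsoup4 packages are installed.

```python
# A minimal sketch of the same test in code: grab a phrase from one policy
# and check whether it appears verbatim in another. URLs are placeholders.
import re

import requests
from bs4 import BeautifulSoup


def policy_text(url):
    """Fetch a page and reduce it to normalized, lowercased plain text."""
    html = requests.get(url, timeout=10).text
    text = BeautifulSoup(html, "html.parser").get_text(" ")
    return re.sub(r"\s+", " ", text).lower()


def shares_phrase(source_url, other_url, words=12, offset=100):
    """Take a short phrase from the source policy and look for it elsewhere."""
    source_words = policy_text(source_url).split()
    phrase = " ".join(source_words[offset:offset + words])
    return phrase, phrase in policy_text(other_url)


if __name__ == "__main__":
    phrase, copied = shares_phrase(
        "https://www.example.com/privacy",  # placeholder
        "https://www.example.net/privacy",  # placeholder
    )
    print(f'Checked: "{phrase}"')
    print("Verbatim match found." if copied else "No verbatim match.")
```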

Either way, this is a simple, fast test to run on any site you are thinking about using - the manual version takes all of about 30 seconds. While plagiarized terms don't necessarily mean that there are issues with the site, they do indicate that the people behind the site have cut corners on privacy issues. Given that many breaches occur due to human error, and that problems we can observe often indicate issues that are hidden from view, plagiarized terms should at least give us pause.

If you are a vendor, do this test on your own terms. Pull ten excerpts at random, and see what comes up. If you discover that, for whatever reason, your terms have been plagiarized and you don't know why, you can then begin to fix the issue.