Privacy Postcard: Starbucks Mobile App

2 min read

For more information about Privacy Postcards, read this post.

General Information

App permissions

The Starbucks app has permissions to read your contacts, and to get network location and location from GPS.

Starbucks app permissions

Access contacts

The application permissions indicate that the app can access contacts, and this is reinforced in the privacy policy.


Law enforcement

Starbucks' terms specify that they will share data if sharing the information is required by law, or if sharing information helps protect Starbucks' rights.

Starbucks law enforcement

Location information and Device IDs

Starbucks can use location as part of a broader user profile.

Starbucks collects location info

Data Combined from External Sources

The terms specify that Starbucks can collect, store, and use information about you from multiple sources, including other companies.

Starbucks data collection

Third Party Collection

The terms state that Starbucks can allow third parties to collect device and location information.

Third party

Social Sharing or Login

The terms state that Starbucks facilitates tracking across multiple services.

Social sharing

Summary of Risk

The Starbucks mobile app has several problematic areas. Individually, they would all be grounds for concern. Collectively, they show a clear lack of regard for the privacy of people who use the Starbucks app. The fact that the service harvests contacts, and harvests location information, and allows selected information to be used by third parties to profile people creates significant privacy risk.

People shouldn't have to sell out their contact list and share their physical location to get a cup of coffee. I love coffee as much as the next person, but avoid the app (and maybe go to a local coffee shop), pay cash, and tip the barista well.

Privacy Postcards, or Poison Pill Privacy

10 min read

NOTE: While this is obvious to most people, I am restating this here for additional emphasis: this is my personal blog, and only represents my personal opinions. In this space, I am only writing for myself. END NOTE.

I am going to begin this post with a shocking, outrageous, hyperbolic statement: privacy policies are difficult to read.

Shocking. I know. Take a moment to pull yourself up from the fainting couch. Even Facebook doesn't read all the necessary terms. Policies are dense, difficult to parse, and in many cases appear to be overwhelming by design.

When evaluating a piece of technology, "regular" people want an answer to one simple question: how will this app or service impact my privacy?

It's a reasonable question, and this process is designed to make it easier to get an answer to that question. When we evaluate the potential privacy risks of a service, good practice can often be undone by a single bad practice, so the art of assessing risk is often the art of searching for the poison pill.

To highlight that this process is not comprehensive and is focused on surfacing risks, I'm calling it Privacy Postcards, or Poison Pill Privacy. It is not designed to be comprehensive at all. Instead, it is designed to highlight potential problem areas that impact privacy, and to be straightforward enough that anyone can do it. Various privacy concerns are broken down, and each includes keywords that can be used to find relevant text in the policies.

To see an example of what this looks like in action, check out this example. The rest of this post explains the rationale behind the process.

If anyone reading this works in K12 education and you want to use this with students as part of media literacy, please let me know. I'd love to support this process, or just hear how it went and how it could be improved.

1. The Process

Application/Service

Collect some general information about the service under evaluation.

  • Name of Service:
  • Android App:
  • Privacy Policy url:
  • Policy Effective Date:

App permissions

Pull a screenshot of selected app permissions from the Google Play store. The iOS store from Apple does not support the transparency that is implemented in the Google Play store. If the service being evaluated does not have a mobile app, or only has an iOS version, skip this step.

The listing of app permissions is useful because it highlights some of the information that the service collects. The listing of app permissions is not a complete list of what the service collects, nor does it provide insight into how the information is used, shared, or sold. However, the breakdown of app permissions is a good tool to use to get a snapshot of how well or poorly the service limits data collection to just what is needed to deliver the service.
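For readers comfortable with a command line, here is a minimal sketch of an alternative to the Play Store screenshot. It assumes the Android SDK's aapt tool is installed and that you have a local copy of the APK (the filename below is hypothetical); `aapt dump permissions` prints the permissions an APK requests.

```python
import subprocess

def requested_permissions(apk_path):
    """Return the permissions an APK declares, via `aapt dump permissions`."""
    output = subprocess.run(
        ["aapt", "dump", "permissions", apk_path],
        capture_output=True, text=True, check=True,
    ).stdout
    perms = []
    for line in output.splitlines():
        line = line.strip()
        # Newer aapt prints: uses-permission: name='android.permission.READ_CONTACTS'
        # Older aapt prints: uses-permission: android.permission.READ_CONTACTS
        if line.startswith("uses-permission"):
            perms.append(line.split("'")[1] if "'" in line else line.split(":")[-1].strip())
    return perms

if __name__ == "__main__":
    for permission in requested_permissions("starbucks.apk"):  # hypothetical filename
        print(permission)
```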

Access contacts

Accessing contacts from a phone or address book is one way that we can compromise our own privacy, and the privacy of our friends, family, and colleagues. This can be especially true for people who work in jobs where they have access to sensitive or privileged information. For example, if a therapist had contact information of patients stored in their phone and that information was harvested by an app, that could potentially compromise the privacy of the therapist's clients.

When looking at if or how contacts are accessed, it's useful to cross-reference what the app permissions tell us against what the privacy policy tells us. For example, if the app permissions state that the app can access contacts and the privacy policy says nothing about how contacts are protected, that's a sign that the privacy policy could have areas that are incomplete and/or inadequate.

Keywords: contact, friend, list, access
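A minimal sketch of how these keyword lists can be put to work: save a copy of the privacy policy as plain text, then print every sentence that mentions a keyword so it can be read in context. The filename and the keyword lists shown here are only examples.

```python
import re

CATEGORIES = {
    "Access contacts": ["contact", "friend", "list", "access"],
    "Law enforcement": ["legal", "law enforcement", "comply"],
    "Location and device IDs": ["location", "zip", "postal", "identifier",
                                "browser", "device", "address"],
}

def keyword_hits(policy_text, keywords):
    """Return sentences from the policy that mention any of the keywords."""
    sentences = re.split(r"(?<=[.!?])\s+", policy_text)
    return [s for s in sentences
            if any(k.lower() in s.lower() for k in keywords)]

if __name__ == "__main__":
    with open("privacy_policy.txt", encoding="utf-8") as f:  # hypothetical file
        text = f.read()
    for category, keywords in CATEGORIES.items():
        print(f"== {category} ==")
        for sentence in keyword_hits(text, keywords):
            print(" -", sentence.strip())
```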

Law enforcement

Virtually every service in the US needs to comply with law enforcement requests, should they come in. However, the language that a service uses about how it complies with law enforcement requests can tell us a lot about its posture toward protecting user privacy.

Additionally, if a service has no language in its terms about how it responds to law enforcement or other legal requests, that can be an indicator that the terms are incomplete and/or inadequate in other areas as well.

Keywords: legal, law enforcement, comply

Location information and Device IDs

As individual data elements, both a physical location and a device ID are sensitive pieces of information. It's also worth noting that there are multiple ways to get location information, and different ways of identifying an individual device. The easiest way to get precise location information is via the GPS functionality in mobile devices. However, IP addresses can also be mapped to specific locations, and a string of IP addresses (i.e., what someone would get if they connected to a wireless network at their house, a local coffee shop, and a library) can give a sense of someone's movement over time.

Device IDs are unique identifiers, and every phone or tablet has multiple IDs that are unique to the device. Additionally, browser fingerprinting can be used on its own or alongside other IDs to precisely identify an individual.

The combination of a device ID and location provides the holy grail for data brokers and other trackers, such as advertisers: the ability to tie online and offline behavior to a specific identity. Once a data broker knows that a person with a specific device goes to a set of specific locations, they can use that information to refine what they know about a person. In this way, data collectors build and maintain profiles over time.

Keywords: location, zip, postal, identifier, browser, device, ID, street, address
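To make the mechanics concrete, here is a minimal sketch with invented data of how timestamped location pings tied to a device ID can be turned into an inference about where that device "lives": keep the overnight pings and take the most common rounded coordinate. Real trackers do far more than this, but the basic move is the same.

```python
from collections import Counter
from datetime import datetime

# Invented pings: (device ID, ISO timestamp, latitude, longitude)
pings = [
    ("device-42", "2018-03-01T23:10:00", 45.5231, -122.6761),
    ("device-42", "2018-03-02T01:40:00", 45.5232, -122.6762),
    ("device-42", "2018-03-02T13:05:00", 45.5120, -122.6587),
    ("device-42", "2018-03-03T00:15:00", 45.5229, -122.6759),
]

def guess_home(pings, device_id):
    """Most common rounded coordinate seen for a device between 9pm and 6am."""
    overnight = Counter()
    for dev, ts, lat, lon in pings:
        if dev != device_id:
            continue
        hour = datetime.fromisoformat(ts).hour
        if hour >= 21 or hour < 6:
            overnight[(round(lat, 3), round(lon, 3))] += 1
    return overnight.most_common(1)[0][0] if overnight else None

print(guess_home(pings, "device-42"))  # -> (45.523, -122.676)
```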

Data Combined from External Sources

As noted above, if a data broker can use a device ID and location information to tie a person to a location, they can then combine information from external sources to create a more thorough profile about a person, and that person's colleagues, friends, and families.

We can see examples of data recombination in how Experian sorts humans into classes: data recombination helps them identify and distinguish their "Picture Perfect Families" from the "Stock cars and State Parks" and the "Urban Survivors" and the "Small Towns Shallow Pockets".

And yes, the company combining this data and making these classifications is the same company that sold data to an identity thief and was responsible for a breach affecting 15 million people. Data recombination matters, and device identifiers within data sets allow companies to connect disparate data sources into a larger, more coherent profile.

Keywords: combine, enhance, augment, source
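Here is a minimal sketch, again with invented data, of the recombination described above: two datasets that look harmless on their own become a profile once they share a device identifier. The segment name is borrowed from the Experian example; everything else is made up.

```python
# Invented data: an ad network's sightings of a device, and a segment bought
# from a data broker. Individually bland; joined on the device ID, a profile.
ad_network_log = {  # device ID -> sites where the ad network saw this device
    "device-42": ["news-site.example", "clinic-finder.example"],
}
data_broker_file = {  # device ID -> demographic data purchased from a broker
    "device-42": {"segment": "Urban Survivors", "zip": "97214"},
}

profiles = {}
for device_id, sites in ad_network_log.items():
    profiles[device_id] = {"sites_visited": sites, **data_broker_file.get(device_id, {})}

print(profiles["device-42"])
# {'sites_visited': ['news-site.example', 'clinic-finder.example'],
#  'segment': 'Urban Survivors', 'zip': '97214'}
```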

Third Party Collection

If a service allows third parties to collect data from users of the service, that creates an opportunity for each of these third parties to get information about people in the ways that we have described above. Third parties can access a range of information (such as device IDs, browser fingerprints, and browsing histories) about users on a service, and frequently, there is no practical way for people using a service to know what third parties are collecting information, or how these third parties will use it.

Additionally, third parties can also combine data from multiple sources.

Keywords: third, third party, external, partner, affiliate

Social Sharing or Login

Social Sharing or Login, when viewed through a privacy lens, should be seen as a specialized form of third party data collection. With social login, however, information about a person can flow in both directions between the two services, or be pulled from one into the other.

Social login and social sharing features (like the Facebook "like" button, a "Pin it" link, or a "Share on Twitter" link) can send tracking information back to the home sites, even if the share never happens. Solutions like this option from Heise highlight how this privacy issue can be addressed.

Keywords: login, external, social, share, sharing

Education-specific Language

This category only makes sense on services that are used in educational contexts. For services that are only used in a consumer context, this section might be superfluous.

As noted below, I'm including COPPA in the list of keywords here even though COPPA is a consumer law. Because COPPA (in the US) is focused on children under 13, there are times when COPPA connects with educational settings.

Keywords: parent, teacher, student, school, family, education, FERPA, child, COPPA

Other

Because this list of concerns is incomplete, and there are other problematic areas, we need a place to highlight these concerns if and when they come up. When I use this structure, I will use this section to highlight interesting elements within the terms that don't fit into the other sections.

If, however, there are elements in the other sections that are especially problematic, I probably won't spend the time on this section.

Summary of Risk

This section is used to summarize the types of privacy risks associated with the service. As with this entire process, the goal here is not to be comprehensive. Rather, this section highlights potential risks, and whether those risks are in line with what a service does. For example, if a service collects location information, how is that information both protected from unwarranted use by third parties and used to benefit the user?

2. Closing Notes

At the risk of repeating myself unnecessarily, this process is not intended to be comprehensive.

The only goal here is to streamline the process of identifying and describing poison pills buried in privacy policies. This method of evaluation is not thorough. It will not capture every detail. It will even miss problems. But it will catch a lot as well. In a world where nothing is perfect, this process will hopefully prove useful.

The categories listed here all define different ways that data can be collected and used. One of the categories explicitly left out of the Privacy Postcard is data deletion. This is not an oversight; this is an intentional choice. Deletion is not well understood, and actual deletion is easier to do in theory than in practice. This is a longer conversation, but the main reason that I am leaving deletion out of the categories I include here is that data deletion generally doesn't touch any data collected by third party adtech allowed on a service. Because of this, assurances about data deletion can often create more confusion.

The remedy to this, of course, is for a service to not use any third party adtech, and to have strict contractual requirements with any third party services (like analytics providers) that restrict data use. Many educational software providers already do this, and it would be great to see this adopted more broadly within the tech industry at large.

The ongoing voyage of MySpace data - sold to an adtech company in 2011, re-sold in 2016, and breached in 2016 - highlights that data that is collected and not deleted can have a long shelf life, completely outside the context in which it was originally collected.

For those who want to use this structure to create your own Privacy Postcards, I have created a skeleton structure on Github. Please, feel free to clone this, copy it, modify it, and make it your own.

Facebook, Cambridge Analytica, Privacy, and Informed Consent

4 min read

There has been a significant amount of coverage and commentary on the new revelations about Cambridge Analytica and Facebook, and how Facebook's default settings were exploited to allow personal information about 50 million people to be exfiltrated from Facebook.

There are a lot of details to this story - if I ever have the time (unlikely), I'd love to write about many of them in more detail. I discussed a few of them in this thread over on Twitter. But as we digest this story, we need to move past the focus on the Trump campaign and Brexit. This story has implications for privacy and our political systems moving forward, and we need to understand them in this broader context.

But for this post, I want to focus on two things that are easy to overlook in this story: informed consent, and how small design decisions that don't respect user privacy allow large numbers of people -- and the systems we rely on -- to be exploited en masse.

The following quote is from a NY Times article - the added emphasis is mine:

Dr. Kogan built his own app and in June 2014 began harvesting data for Cambridge Analytica. The business covered the costs — more than $800,000 — and allowed him to keep a copy for his own research, according to company emails and financial records.

All he divulged to Facebook, and to users in fine print, was that he was collecting information for academic purposes, the social network said. It did not verify his claim. Dr. Kogan declined to provide details of what happened, citing nondisclosure agreements with Facebook and Cambridge Analytica, though he maintained that his program was “a very standard vanilla Facebook app.”

He ultimately provided over 50 million raw profiles to the firm, Mr. Wylie said, a number confirmed by a company email and a former colleague. Of those, roughly 30 million — a number previously reported by The Intercept — contained enough information, including places of residence, that the company could match users to other records and build psychographic profiles. Only about 270,000 users — those who participated in the survey — had consented to having their data harvested.

The first highlighted quotation gets at what passes for informed consent. However, in this case, for people to make informed consent, they had to understand two things, neither of which are obvious or accessible: first, they had to read the terms of service for the app and understand how their information could be used and shared. But second -- and more importantly -- the people who took the quiz needed to understand that by taking the quiz, they were also sharing personal information of all their "friends" on Facebook, as permitted and described in Facebook's terms. This was a clearly documented feature available to app developers that wasn't modified until 2015. I wrote about this privacy flaw in 2009 (as did many other people over the years). But, this was definitely insider knowledge, and the expectation that a person getting paid three dollars to take an online quiz (for the Cambridge Analytica research) would read two sets of dense legalese as part of informed consent is unrealistic.

As reported in the NYT and quoted above, only 270,000 people took the quiz for Cambridge Analytica - yet these 270,000 people exposed 50,000,000 people via their "friends" settings. This is what happens when we fail to design for privacy protections. To state this another way, this is what happens when we design systems to support harvesting information for companies, as opposed to protecting information for users.

Facebook worked as designed here, and this design allowed the uninformed decisions of 270,000 people to create a dataset that potentially undermined our democracy.

Filter Bubbles and Privacy, and the Myth of the Privacy Setting

6 min read

When discussing information literacy, we often ignore the role of pervasive online tracking. In this post, we will lay out the connections between accessing accurate information, tracking, and privacy. We will use Twitter as an explicit example. However, while Twitter provides a convenient example, the general principles we lay out here are applicable across the web.

Major online platforms "personalize" the content we see on them. Everything from Amazon's shopping recommendations to Facebook's News Feed to our timelines on Twitter are controlled by algorithms. This "personalization" uses information that these companies have collected about us to present us with an experience that is designed to have us behave in a way that aligns with the company's interests. And we need to be clear on this: personalization is often sold as "showing people more relevant information" but that definition is incomplete. Personalization isn't done for the people using a product; it's done to further the needs of the company offering the product. To the extent that personalization shows people "more relevant information," this information furthers the goals of the company first, and the needs of users second.

Personalization requires that companies collect, store, and analyze information about us. Personalization also requires that we are compared against other people. This process begins with data collection about us -- what we read, what we click on, what we hover over, what we share, what we "like", sites we visit, our location, who we connect with, who we converse with, what we buy, what we search for, the devices we use, etc. This information is collected in many ways, but some of the more visible methods companies use to get this information is via cookies that are set by ad networks, or social share icons. Of course, every social network (Facebook, Instagram, Twitter, Pinterest, Musical.ly, etc) collects this information from you directly when you spend time on their sites.

The web, flipping us the bird

When you see social sharing icons like these, know that the site is flipping you the bird: your browsing information is being widely shared with these companies and other ad brokers.

This core information collected by sites can be combined with information from other sources. Many companies explicitly claim this right in their terms of service. For example, Voxer's terms claim this right using this language:

Information We May Receive From Third Parties. We may collect information about you from other Product users, such as when a friend provides friend details or contact information, or indicates a relationship with you. If you authorize the activity, Facebook may share with us certain approved data, which may include your profile information, your image and your list of friends, their profile information and their images.

By combining information from other sources, companies can have information about us that includes our educational background, employment history, where we live, voting records, any criminal justice information from parking tickets to arrests to felonies, in addition to our browsing histories. With these datasets, companies can sort us into multiple demographics, which they can then use to compare us against other people pulled from other demographics.

In very general terms, this is how targeted advertising, content recommendation, shopping recommendation, and other forms of personalization all work. Collect a data set, then mine it for patterns and for the probability that those patterns are significant and meaningful. Computers make math cheap, so this process can be repeated and refined as needed.
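A minimal sketch, with made-up data, of the pattern-mining step just described: represent each person as a vector of interactions, find the most similar other person, and recommend whatever they engaged with that you haven't seen.

```python
from math import sqrt

interactions = {  # person -> {item: engagement count} -- invented data
    "alice": {"privacy-article": 3, "coffee-review": 1},
    "bob":   {"privacy-article": 2, "coffee-review": 1, "ad-tech-expose": 4},
    "carol": {"celebrity-news": 5},
}

def cosine(a, b):
    """Cosine similarity between two sparse interaction vectors."""
    items = set(a) | set(b)
    dot = sum(a.get(i, 0) * b.get(i, 0) for i in items)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def recommend(person):
    """Recommend items from the most similar other person."""
    others = [(cosine(interactions[person], vec), name)
              for name, vec in interactions.items() if name != person]
    _, nearest = max(others)
    return [item for item in interactions[nearest]
            if item not in interactions[person]]

print(recommend("alice"))  # -> ['ad-tech-expose']
```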

However, while the algorithms can churn nearly indefinitely, they need data and interaction to continue to have relevance. In this way, algorithms can be compared to the annoying office mate with pointless gossip and an incessant need to publicly overshare: they derive value from their audience.

And we are the audience.

Twitter's "Personalization and Data" settings provice a great example of how this works. As we state earlier, while Twitter provides this example, they are not unique. The settings shown in the screenshot below highlight some of the data that is collected, and how this information is used. The screenshot also highlights how, on social media, there is no such thing as a privacy setting. What they give us is a visibility setting -- while we have minimal control over what we might see, nothing is private from the company that offers the service.

Twitter's personalization settings

From looking at this page, we can see that Twitter can collect a broad range of information that has nothing to do with the core functionality of Twitter, and everything to do with creating profiles about us. For example, why would Twitter need to know the other apps on our devices to allow us to share 140-character text snippets?

Twitter is also clear that regardless of what we see here, they will personalize information for us. If we use Twitter, we only have the option to play by their rules (to the extent that they enforce them, of course):

Twitter always uses some information, like where you signed up and your current location, to help show you more relevant content.

What this explanation leaves out, of course, is for whom the content is most relevant: the person reading it, or Twitter. Remember: their platform, their business, their needs.

But when we look at the options on this page, we also need to realize that the data they collect in the name of personalization is where our filter bubbles begin. A best-case definition of "relevant content" is "information they think we are most interested in." However, a key goal of many corporate social sites is to make it more difficult to leave. In design, dark patterns are used to get people to act against their best interest. Creating feeds of "relevant content" -- or more accurately, suppressing information according to the dictates of an algorithm -- can be understood as a dark information pattern. "Relevant content" might be what is most likely to keep us on a site, but it probably won't have much overlap with information that challenges our bias, breaks our assumptions, or broadens our world.

The fact that our personal information is used to narrow the information we encounter only adds insult to injury.

We can counter this, but it takes work. Some easier steps include:

  • Use ad blockers and javascript blockers (uBlock Origin and Privacy Badger are highly recommended as ad blockers. For javascript blockers, try Scriptsafe for Chrome and NoScript for Firefox).
  • Clear your browser cookies regularly.
  • When searching or doing other research, use Tor and/or a VPN.

These steps will help minimize the amount of data that companies can collect and use, but they don't eliminate the problem. The root of the problem lies in information asymmetry: companies know more about us than we know about them, and this gap increases over time. However, privacy and information literacy are directly related issues. The more we safeguard our personal information, the more freedom we have from filter bubbles.


Privacy and Security Exercise

2 min read

Do this exercise with your phone, tablet, and/or any computer you use regularly.

Imagine that someone has accessed your device and can log in and access all information on the device.

  • If they were a thief, what information could they access about you?
  • If they were a blackmailer, what information could they access about you?
  • What information could they access about your friends, family, or professional contacts?
  • If you work as a teacher, counselor, consultant, or other type of advisor: what information could someone glean about the people you work with?

As you do this exercise, be sure to look at all apps (on a phone or tablet), online accounts accessible via a web browser, address books, and ways that any of this information could be cross referenced or combined. For example, what information could be accessed about people you "know" via social media accounts?

  • What steps can you take to protect this information?
  • Assuming that someone you know has comparable information about you, what steps would you want them to take?

Are there differences between the steps you could take, and the steps you would want someone else to take? What accounts for those differences?

When it comes to protecting information, we are connected. At some level, we are as private and secure as our least private and secure friend.

Targeted Ads Compromising Privacy in Healthcare

2 min read

For a current example of how and why privacy matters, we need look no further than the practices of a company that uses "mobile geo fencing and IP targeting services" to target people with ads.

In this specific case, the company is targeting ads to women inside Planned Parenthood clinics with anti-choice materials. The anti-choice messaging - euphemistically referred to as "pregnancy help" - gets delivered to women who enter selected health clinics.

"'We can set up a mobile geo fence around an area—Planned Parenthood clinic, hospitals, doctor's offices that perform abortions,' Flynn said. 'When a smartphone user enters the geo fence, we tag their smartphone's ID. When the user opens an app [the ad] appears.'"

Let's stop pretending that a phone ID isn't personal information. This is how data can be used to compromise people's privacy. We should also note that the anti-choice groups are now clearly in the business of harvesting personal information about women who visit health clinics, and who knows what they are doing with that information. With the device ID in hand, they can easily combine that dataset with data from any of the big data brokers and get detailed profiles of the people they are targeting.
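A minimal sketch, with invented coordinates, of the geofencing described in the quote above: if a device's reported location falls within a radius of a target point, its advertising ID gets tagged for later targeting.

```python
from math import radians, sin, cos, asin, sqrt

def distance_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters (haversine formula)."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371000 * asin(sqrt(a))

GEOFENCE = {"lat": 40.7410, "lon": -73.9897, "radius_m": 100}  # made-up clinic location

def tag_if_inside(device_id, lat, lon, tagged):
    """Tag a device's advertising ID if it reports a location inside the fence."""
    if distance_m(lat, lon, GEOFENCE["lat"], GEOFENCE["lon"]) <= GEOFENCE["radius_m"]:
        tagged.add(device_id)  # this ID will now be targeted with ads

tagged = set()
tag_if_inside("ad-id-123", 40.7411, -73.9896, tagged)  # inside the fence
tag_if_inside("ad-id-456", 40.7600, -73.9800, tagged)  # well outside
print(tagged)  # -> {'ad-id-123'}
```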

This is how private institutions target and exploit individuals. However, they are using techniques adopted by advertisers and political campaigns.

Tech isn't neutral. Whenever you hear talk of "place based advertising", this is what we are talking about.

Building Consensus for Privacy and Security

1 min read

I had the pleasure to present at ATLIS on April 19, 2016, in Atlanta. The conversation covered different facets of privacy, and how to evaluate the different attitudes toward privacy and security in schools.

One element in the conversation that we sped over involved some simple browser-based tools that highlight third party trackers. The example I used highlighted two news sites (Huffington Post and the NY Times), but the process works just as well with educational technology apps: enable Lightbeam, log in to an edtech site, and see what loads.

The full presentation is available below.

Terms of Service and Privacy Policies at CharacterLab

10 min read

I probably spend more time than recommended browsing the web and reading privacy policies and terms of service, looking for patterns. When I encounter a new app, the first thing I do is find the terms and read them. Terms are useful in a range of ways. First, what they say matters. Second, how they say it can provide insight into the service, and how the company views themselves. Third, terms can indicate the business plan (or possible business plans) of a company. Finally, the degree to which the terms align (or not) with the product can indicate how coherent the planning within a company has been. There are other elements we can glean from terms, but the points outlined here are some of the more common items that can be inferred from terms.

Last week, I encountered the terms of service at characterlab.org. They offer an application to support character growth. The terms discussed in this post were updated in August 2015. I downloaded an archive version this morning (April 4, 2016).

The target audience of Character Lab is teachers, but they also get information about children (to set up accounts) and from children (once accounts have been set up). 

Account Creation and Parental Consent

In the process defined by the terms and reinforced via their user interface, teachers create accounts for students.

The information we collect varies based upon the type of user you are.
(i) Teachers: In order to use the Service, you will need to register for an account. In order to register, we will collect your name, login name, and institution you are associated with, grade level, years of experience, along with your telephone number and email address.
(ii) Students: Students will not be asked to provide Information. Teachers will create accounts for students by providing their name. Students and teachers will both input information related to student character assessment tests and other Services-related functions.

In the terms, parental consent is mentioned, but only in passing, in the "Eligibility" section:

You must be at least 18 years old, an emancipated minor, or possess legal parental or guardian consent, and be fully able and competent to enter into and abide by these Terms to access the Service. If you are under 13 years of age, you may only access the Service with the express permission of your legal parent or guardian.

Given the account creation workflow in place with this site, a teacher is binding a student to these terms, potentially without any parental consent. In the case of a student under the age of 13, the way the eligibility terms are written ("If you are under 13 years of age, you may only access the Service with the express permission of your legal parent or guardian."), the onus for understanding and obtaining parental consent appears to be on the student, who may or may not be aware that the terms exist, and who has no role in setting up their account.

At the very least, the terms should require that the teacher or school creating student accounts obtain and maintain verifiable parental consent.

A suggestion for vendors looking to avoid this circular setup: read your terms from the perspective of each of your target users. If likely scenarios exist where a person would have data in your system before that person had any opportunity to interact with your system, you should consider revising your terms, your onboarding process, or both.

Grammar Counts

From the "Protecting Children's Information" section, we are given text that fails to meet basic standards for clarity.

If you are a student, please note that your parent can view request a copy of your character report, and any and all other information associated with you, on this Site, including without limitation messages between. If you are a parent, you may request a copy of your child's character report (whether self-reported or reported by any and all other information associated with your child) on this Site by either submitting an email request to Character Lab at cgc@characterlab.org.

A couple things jump out here: first, as highlighted above, students play no role in creating their account, so the chances they would be informed that parents can request a copy via these terms is slim. Second, both sentences in the "Protecting Children’s Information" section contain grammatical errors and word omissions that make them less than comprehensible.

If you are putting out an application that collects data, read your terms. Have a good editor read your terms. Have a good lawyer read your terms. Have your lead developer read your terms. If you are the company founder, read your terms. If terms contain basic grammatical errors, or sentences riddled with omissions, it raises the question: in how many other places do similar weaknesses exist?

Data collection and minimization

In looking at the data that is collected, several areas exist where the terms claim the right to collect more data than is needed to run the service.

Your browser type, language, plug-ins, Internet domain and operating system;

This service has no need to collect information about browser plugins. Collecting this information is a component of browser fingerprinting, which is a precise method of tying a specific browser to a specific machine - which can often lead to uniquely identifying a person without collecting data traditionally considered Personally Identifiable Information (or PII). Additionally, tracking "Internet domain" seems excessive as well. While the term is pretty vague, one common definition could mean that the service tracks the domains from which requests originate, so the vendor would know if someone was connecting from the network of a specific school or university. This information replicates a lot of what can be inferred from collecting an IP address (which characterlab.org also collects), but connecting an IP address to a domain seems unnecessary - especially because teachers are required to state a school affiliation when they register.
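To illustrate why plugin lists and similar attributes matter, here is a minimal sketch (with invented values) of browser fingerprinting: a handful of individually bland properties, hashed together, yield a surprisingly stable identifier with no cookie and no traditional PII involved.

```python
import hashlib

# Invented values for attributes any web server or embedded script can read.
browser_attributes = {
    "user_agent": "Mozilla/5.0 (X11; Linux x86_64) ...",
    "language": "en-US",
    "plugins": "PDF Viewer;Chrome PDF Plugin;Widevine",
    "timezone": "America/New_York",
    "screen": "1920x1080x24",
}

# Hash the sorted attributes into a single stable identifier for this browser.
fingerprint = hashlib.sha256(
    "|".join(f"{k}={v}" for k, v in sorted(browser_attributes.items())).encode()
).hexdigest()

print(fingerprint[:16])  # a short, reasonably stable ID -- no cookie required
```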

Moving on, the terms also claim the rights to collect and store device IDs and physical location.

Unique identifiers, including mobile device identification numbers, that may identify the physical location of such devices;

This service does not require a device ID or physical location to run. If they actually collect and retain this information, it creates a much more valuable dataset that could be compromised via a data breach or human error.

If this data is actually needed to run the application, then the terms need to clarify how and why it is used. I suspect that this is an example of something we see pretty regularly: the terms are out of sync with what the app actually does. CharacterLab is not alone in claiming the rights to obtain device IDs. Many other EdTech companies do this. While it is easy to get a device ID, it is generally not necessary, and many EdTech companies could eliminate this practice with no negative effect on their service.

Data collection and retention should be minimized to reflect the specific needs of the app. When a vendor thinks about these details, they can build better software that is easier to maintain. By making sound technical decisions as a regular part of the development process - and by verifying that the terms of service reflect actual practice - vendors can have confidence that they understand their product, and how it runs.

Data transfers

The issues with data collection and retention are highlighted by how data will be treated in the case of a merger or acquisition.

(d) in the event that Character Lab goes through a business transition, such as a merger, divestiture, acquisition, liquidation or sale of all or a portion of its assets, your Information will, in most instances, be part of the assets transferred;

This provision creates the very real possibility that data can be sold or transferred as part of a larger deal. This is a very problematic clause. As we saw with ConnectEdu and Corinthian (where student data was included in a sale to a student loan collection agency), these sales happen. Given the rate of churn in the education technology space, terms that allow student data to be sold or transferred create significant risk that data will be used in a range of ways that are completely unrelated to the stated goals of Character Lab.

The ability to transfer data, paired with the data that can be collected, could be mitigated to an extent by a good deletion policy. However, Character Lab does not deliver on that either.

Please note that certain Information may remain in the possession of Character Lab after your account has been terminated. Character Lab reserves the right to use your Information in any aggregated data collection after you have terminated your Account, however Character Lab will ensure that the use of such Information will not identify you personally.

When data is deleted, it should be deleted, full stop. Given that Character Lab claims the right to collect browser plugins or device IDs - either of which can be used to precisely identify an individual - the claim that they will ensure that their data set won't identify you personally rings hollow.

This problem is exacerbated because the terms contain no language banning recombination with other datasets.

To be clear, the reason that they include this claim over deleted data is to support research. However, they could support their research needs and respect user intent by specifying that they will delete all user data and exclude it from aggregate data sets moving forward, while noting that aggregate data sets created before the deletion will not be affected. A sketch of that distinction follows.
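A minimal sketch of that distinction, using invented data and structure (this is not Character Lab's actual system): deleting a user removes their raw records and keeps them out of future aggregates, while aggregates computed before the deletion are left as they were.

```python
# Invented records and structure -- not Character Lab's actual system.
records = [
    {"user": "student-1", "score": 4},
    {"user": "student-2", "score": 2},
    {"user": "student-3", "score": 5},
]

def aggregate(records):
    """Aggregate statistics computed over whatever raw records exist right now."""
    scores = [r["score"] for r in records]
    return {"n": len(scores), "mean": round(sum(scores) / len(scores), 2)}

published_aggregates = [aggregate(records)]      # computed before any deletion

def delete_user(records, user):
    """Remove a user's raw records; previously published aggregates stay as-is."""
    return [r for r in records if r["user"] != user]

records = delete_user(records, "student-2")
published_aggregates.append(aggregate(records))  # future aggregates exclude them

print(published_aggregates)
# [{'n': 3, 'mean': 3.67}, {'n': 2, 'mean': 4.5}]
```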

Their provisions here would also be less problematic if the app minimized data collection, as outlined above.

Changes to terms

Finally, this app contains the poison pill for terms of service.

Character Lab shall have the right to modify these Terms at any time, which modification shall be effective upon posting the new Terms on the Terms of Use page of the Site. We recommend that you check the Site regularly for any such changes. Your use of the Character Lab Service following such posting shall be deemed to constitute your acceptance of such modification. The Terms may not otherwise be changed or modified.

The ability to change terms with no notice is always problematic, but it is especially problematic given that this site contains student information, and that the site has limited the ability of people to fully delete their information.

If terms are substantially modified, users should be notified via email, and via notice on the site - ideally as a banner, and as added text on the login page. The updated terms should also be posted for a specified period (generally around 30 days) before they become active.

Closing

The issues outlined here are a summary - there are other things in these terms that could be improved, but in the interests of brevity I kept a narrow focus.

These terms have issues that appear frequently across many terms in both educational and consumer technology. My sense in reading these terms is that the terms of using the service have drifted from the intent of the people creating the service. This is a common issue - building an app and releasing it into the world is a lot of work, and it's easy to overlook the need to clarify the terms of service. Imprecise or poorly written terms are rarely a sign of bad intent.

However, given that the terms provide the legal basis and rights of both vendor and users of a service, getting them right is essential. For a vendor, ensuring that the terms align with the practice and intent of the application is a very practical way to ensure that you have organizational clarity about the goals of your organization, and the role technology plays in reaching them.

Encryption, Privacy, and Security

9 min read

In conversations about student data privacy, the terms "encryption," "security," and "privacy" are often used interchangeably. While these terms are related, they ultimately are distinct concepts. In this post, we will break down how these terms overlap with each other, and how they are distinct.

But at the outset, I need to emphasize that this post will be incomplete - a comprehensive treatment of these terms and the distinctions between them would be a good subject for a book. Details will be left out. If you're not okay with that, feel free to stop reading now. I imagine that the Kardashians are up to something curious or interesting - feel free to check that out.

As is hopefully obvious by now, this post is not intended to be comprehensive. This post is intended to provide a starting point for people looking to learn more about these concepts.

Privacy

Privacy is arguably the least technical element in this conversation. There are two facets to privacy we will highlight here:

  • It's possible to have great security and bad privacy practices; and
  • We often speak about "privacy" without clarifying "private from whom."

Great security and bad privacy

A vendor can go to extreme lengths to make sure that data can only be accessed by the vendor, or the partners of the vendor. However, if the vendor reserves the right to sell your data to whomever they want, whenever they want, that's not great for your privacy. The ways that vendors can use the data they acquire from you are generally spelled out in their terms of service - so, if a vendor reserves rights to share and reuse your data in their terms, and you agree to those terms, you have given the vendor both data, and the permission to use that data.

There are many vendors who have solid security paired with privacy policies and data usage practices that compromise user privacy.

Who is that private from, really?

Different people think of different things when we say the word "private" - in most cases, when we think about privacy, we focus on things we don't want other people to know. When we are working with technology, though, the concept of "other people" gets abstract and impersonal pretty quickly.

When we use services that store a record of what we have done (and it's worth noting that "doing" means read, said, searched for, liked, shared, moused over, and how long we have done any of these things), the "private" things we do are handed over to systems that have a perfect memory. This changes the nature of what "private" can mean. For the purposes of this post, we'll use four different categories of people who might be interested in us over time, and how that impacts our privacy.

  • Criminal - these are the folks people agree about the most: the people stealing data, perpetrating identity theft, and using a range of attacks to get unauthorized access to data with bad intent.
  • Personal - there is also large agreement about personal privacy. We can all agree that we don't want Great Uncle Wilfred to know about our dating life, or to talk about it during Thanksgiving. The ability to control which of our acquaintances knows what is something we all want.
  • Corporate - there is less agreement here, as one person's desire for privacy often runs counter to a data broker's or a marketer's business plan. But, when using a service like Facebook, Instagram, Twitter, Snapchat, Pinterest, etc, the "privacy settings" provided by the vendor might offer a degree of personal privacy, but they do nothing to prevent the vendor from knowing, storing, and profiting from everything you do online. This often includes tracking you all over the web (via cookies and local shared objects), in real life (via location information collected via a mobile app), or from buying additional data about you from a data broker.
  • State - there is also less agreement about what constitutes an appropriate level of protection or freedom from state sponsored surveillance. While people have been aware of the inclination of the state to violate privacy in the name of security and law enforcement throughout history, the Snowden leaks helped create specific clarity about what this looked like in the present day.

(As an aside, the data use practices within politics should possibly be included in this list.)

Many conversations about privacy don't move past considering issues related to criminal activity or personal compromises. However, both corporate and state level data collection and use expose us to risk. As was recently illustrated by the Ashley Madison and the OPM breaches, corporate data collection and state data collection pose criminal and personal risk.

For people looking to learn more about the various factors at play in larger privacy conversations, I strongly recommend Frank Pasquale's recent book, the Black Box Society. The book itself is great, and the footnotes are an incredible source of information.

Security

In very general terms, security can be interpreted to mean how data is protected from unauthorized access and use. Encryption is a part of security, but far from the only part. If a systems administrator leaves his username and password on a post-it note stuck to his monitor, that undercuts the value of encrypting the servers. Human error can result in snafus like W2s for a popular tech startup being emailed to a scammer.

If people email passwords to one another - or store passwords online in a Google Spreadsheet - a system with fantastic technical security can be compromised by a person who has limited technical abilities but who happens to stumble onto the passwords. Phishing and social engineering attacks exploit human judgement to sidestep technical security measures. If a csv file of user information is transferred via Spider Oak and then copied to an unencrypted USB key, the protection provided by secure file transfer is immediately destroyed by storing sensitive information in plain text, on a portable device that is easy to lose. In short, security is the combination of technical and human factors which, taken together, decrease the risk of unauthorized access or use of information.

Encryption is an element of security, but not the only element. It is, however, a big part of the foundation upon which security, and our hopes for privacy, rests.

Encryption

Encryption is often used in general terms, as a monolithic construct, as in: "We need to fight to protect encryption" or "Only criminals need encryption."

However, the general conversation rarely gets into the different ways that information can be encrypted. Additionally, there are differences between encrypting a device (like a hard drive), data within an app, and data in transit between an app and a server or another user.

As an example, all of the following questions look at possible uses of encryption for a standard application: does an application encrypt data at rest on the device where the data is stored? If the application pushes data to a remote server for storage, is the data encrypted while in transit to and from the remote location? If the data is stored at the remote location, is the data encrypted while at the remote location? If the remote location uses multiple servers to support the application, is communication between these servers encrypted?

If the answer to any of these questions is "no" then, arguably, the data is not getting the full benefits of encryption. To further complicate matters, if a vendor encrypts data at rest, and encrypts data moving between servers, and encrypts data moving between servers and applications, but that vendor can still decrypt that data, then there is no guarantee that the benefits of encryption will protect an individual user. When vendors can decrypt the data on their hardware, then the data is only as secure - and the information stored only as private - as the vendor is able or willing to protect that encryption.

True end to end encryption (where the data is encrypted before it leaves the application, is sent via an encrypted connection, and only decrypted at its final destination) is the ideal, but often a vendor will function as a middleman - storing and archiving the data before sending it along to its intended recipient. This is one of many reasons that the encryption debate looks different for vendors that make hardware relative to vendors that build software.
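A minimal sketch of the difference a client-held key makes, using the Python cryptography package's Fernet interface (how the recipient gets the key is out of scope here): if the key is generated and kept on the user's device, whatever a middleman stores is opaque to them; if the vendor generates and keeps the key, the same ciphertext is only as private as the vendor chooses to make it.

```python
# Requires the third-party "cryptography" package (pip install cryptography).
from cryptography.fernet import Fernet

# Key generated and kept on the user's device -- never sent to the vendor.
user_key = Fernet.generate_key()
ciphertext = Fernet(user_key).encrypt(b"message for a friend")

# A vendor in the middle can store and forward the ciphertext, but cannot read it.
stored_by_vendor = ciphertext

# Only someone holding the key (here, the user or their intended recipient,
# assuming the key was shared out of band) can decrypt.
print(Fernet(user_key).decrypt(stored_by_vendor))  # b'message for a friend'
```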

In very general terms, hardware manufacturers fighting for encryption are protecting user data, and it is in their best interest to do so: if they fail to protect user data, they lose user trust, and then people won't buy their products.

In equally general terms, many application vendors fighting for encryption have a more complicated position. A small number of vendors have been vocal supporters of encryption for years - these are the small number of vendors who offer true end to end encryption, or who implement encryption where the user, not the vendor, retains control of their keys. However, the ongoing legal battle between Apple and the FBI over encryption has elicited broad support from within the tech community, including companies who use data to power advertising and user profiling. For companies whose business is predicated on access to and use of a large dataset of sensitive user information, strong encryption is essential to their business interests.

In their external communications, they can get a public relations win by advancing the position that they are defending people's right to privacy. Internally, however, encryption protects the biggest asset these companies possess: the data sets they have collected, and the communications they have about their work. This is where the paradox of strong security with questionable privacy practice comes into play: why should encryption give large companies an additional tool to protect the means by which they compromise the privacy of individuals?

And the answer is that, without encryption available to individuals, or small companies, none of us have a chance to enjoy even limited privacy. If we - people with less access to technical and financial resources than the more wealthy or connected - want to have a chance at maintaining our privacy, encryption is one of the tools we must have at our disposal. The fact that it's also useful to companies that make a living by mining our information and - arguably - violating our privacy doesn't change the reality that encryption is essential for the rest of us too.

NOTE: I'd like to thank Jeff Graham for critical feedback on drafts of this piece.

The Privacy Divide

4 min read

One of the questions that arises in privacy work is if or how privacy rights - or access to those rights - play out across economic lines. The answer is complicated, and this post is a messy step into what will almost certainly be an ongoing series of posts on this topic. Both inside and outside education, we often talk about issues related to the digital divide, but we don't often look at a companion issue, the privacy divide.

This post is not intended to be exhaustive, by any means - and please, for people reading this, I'd love to read any resources that would be relevant and help expand the conversation.

There are a range of ways to dig into the conversation within EdTech, but one way to start to look at the issue is to examine how parents are informed of their rights under FERPA. This is an area where more work needs to be done, but even a superficial scan suggests that an awareness of FERPA rights is not evenly distributed.

Leaving FERPA aside, it's worth looking at how content filtering plays out within schools. The quotes that follow are from a post about Securly, but it's broadly applicable to any environment that defaults to filtering.

"From the Securly dashboard, the administrators can see what students have and haven’t been able to access," she explains. "If I want to see what kids are posting on Twitter or Facebook, I can--everything on our Chromebooks gets logged by Securly."

However, for students whose only access is via a school-issued machine, the level of surveillance becomes more pervasive.

"Most of our students are economically disadvantaged, and use our device as their only device," DeLapo explains. "Students take Chromebooks home, and the Securly filters continue there."

This raises some additional questions. Who is more likely to have their activities tracked via social media monitoring? If something gets flagged, who is more likely to have the results passed to law enforcement, rather than a school official?

These patterns follow the general trends of disproportionate suspension based on race.

What zip codes are more likely to receive the additional scrutiny of predictive policing?

Throughout these conversations, we need to remain aware that the systems in use currently are designed to spot problems. The absence of a problem - or more generally, the lower probability that a problem will eventually exist - creates a lens focused on a spectrum of deficits. The absence of a problem is not the same as something good, and when we use tools explicitly designed to identify and predict problems, they will "work" as designed. In the process of working, of course, they generate more data that will be used as the justification or rationale for future predictions and judgments.

Increasing access and eliminating the digital divide need to happen, but people can be given access to different versions of the internet, or access via chokepoints that behave differently. We need look no further than the stunted vision of internet.org or the efforts of major industry players to destroy net neutrality to see how these visions play out.

To be more concrete about this, we can look at how AT&T is charging extra for the right to opt out of some (but not all) ad scanning on some of its Fiber internet access offerings. Julia Angwin has documented the cost - in cash and time - she spent over a year to protect her privacy.

Taking a step to the side, current examples of how data is used show how data analysis fuels bias - from lenders using phone habits to judge borrowers, to digital redlining based on online habits, to using data to discriminate in lending.

The digital divide is real, and the need to eliminate it is real. But as we move to correct this issue, we need to be very cognizant that not all access is created equal. We can't close the digital divide while opening the privacy divide - this approach would both exacerbate and extend existing issues far into the future.