ISPs Can Continue to Collect and Sell All of Our Browsing History, and We'll Never Know


Yesterday, on March 28, 2017, Congressional Republicans gave a huge gift to Internet Service Providers (ISPs) by killing rules that would have prevented them from selling our browsing history. As a result, ISPs - companies like Comcast, Verizon, CenturyLink (formerly Qwest), and AT&T - can continue to sell information about how we browse the web. All browsing we do on the web - from a young child looking for information about dinosaurs, to a teen curious about their sexual identity, to a person reading the news, to a parent looking for medical information, to a person browsing pornography - all of these activities, done inside people's homes, can continue to be tracked and sold to anyone, without our knowledge or consent.

We need to pause here - this is actually as bad as it sounds. If you have kids in your house, their browsing activity can be bundled and sold by your ISP. As their parent, you will never be told whether any sale took place, who the buyers are, or how they are using that information. So, the next time your kids are having a playdate, if your kid's friends connect to your internet, your ISP is profiting from the playdate. Thanks to the actions of Congressional Republicans, this is universal across the US. Every ISP in the US can continue to do this.

However, this isn't the worst of it. An element that has gone largely undiscussed is how this rule change puts ISPs in a commanding position when it comes to connecting online and offline behavior. Connecting online and offline identity is a leading priority for advertisers - and rest assured, they are looking at this through a racial lens as well. For those of us who connect to the internet via both a phone and a computer, our ISP can now identify both devices as belonging to a specific home. This is incredibly valuable information - and because it can be shared and sold indiscriminately, it allows a solid connection to be made between an individual, their home address, their computer, and their phone.

In practical terms, this puts ISPs in a position to track our physical location over time, and to predict our location in real time. For all of us who carry smartphones, our phones connect to multiple ISPs over the course of every day - from different cell towers, to coffee shop wireless, to library wireless, to connectivity provided by our school or workplace. If our ISP shares our device information, we can be precisely identified across a range of locations, and a record of our movement can be stored and collected. Location data has been shown to be a strong predictor of identity, but our ISPs are in a position where location data is just a small part of their overall data set.

At this point, the only real protection is to use a VPN. However, a VPN typically protects a single device - protecting a whole home requires setting up a VPN on every device, or configuring a router to connect to the internet via a VPN. While setting up a router to connect via a VPN is not enormously complicated, it's a significant technical barrier that will be beyond the reach of many consumers.
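
For readers curious what router-level protection involves: router firmware like DD-WRT or Tomato can run an OpenVPN client, so every device behind the router is covered. A minimal client configuration looks roughly like the sketch below - the hostname, port, and certificate filenames are placeholders, and the exact options depend on your VPN provider:

```
client
dev tun
proto udp
# Hypothetical VPN endpoint - replace with your provider's server
remote vpn.example.com 1194
resolv-retry infinite
nobind
persist-key
persist-tun
# Certificates and keys issued by the VPN provider
ca ca.crt
cert client.crt
key client.key
remote-cert-tls server
cipher AES-256-CBC
verb 3
```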

It's also worth noting that VPNs will only be a realistic alternative if our ISPs don't throttle VPN connections, reducing their speed to a crawl. Because of actions taken at the FCC under Tom Wheeler, we currently have some protections, but Republicans are also looking to kill net neutrality. That would be bad for a variety of reasons, but it would also be another blow to personal privacy.

Adtech and Misinformation: the Middlemen Who Sell to All Sides


When we look at adtech, much of the focus falls on one of two places: the advertisers who sell specific products via online advertising, or the data brokers who package and sell our information. And this is good - data brokers in particular pose a unique threat, and need much more attention. However, online ads get delivered via a network of middlemen that automate and streamline the process. These middlemen are effective - for example, when we read about youth in Macedonia making thousands of dollars a month from political misinformation aimed at US audiences, we need to remember that the profits generated by these sites wouldn't happen without the use of adtech. (These middlemen are generally described by those in the advertising industry in jargon-heavy prose. For people looking for a high-level background, this post discusses Supply Side Platforms; this post discusses Demand Side Platforms; and this video describes how ads are bought, targeted, and delivered.)

In a speech from January 2017, Randall Rothenberg - the President and CEO of the Interactive Advertising Bureau, or IAB - directly acknowledged the effectiveness of these middlemen, and the role that advertising plays in making misinformation profitable:

As an industry, it is our obligation to again step up. But this time, our goal cannot be merely to fix our supply chain. Our objective isn’t to preserve marketing and advertising. When all information becomes suspect – when it’s not just an ad impression that may be fraudulent, but the data, news, and science that undergird society itself – then we must take civic responsibility for our effect on the world.

Who Shares What

A few weeks back, Kris Shaffer and I began talking about getting a clearer understanding of what people read and share on Twitter, and what that looks like across the political spectrum. We started looking at the stories shared - and the sites they are shared from - to get a sense of patterns. Based on patterns observed in the data Kris collected, I created a list of 25 sites from across the political spectrum, ranging from misinformation targeted to progressives, to left-leaning sites, to mainstream media, to right-leaning sites, to hate sites.

  • Addicting Info
  • AllenBWest
  • Alternet
  • Bipartisan Report
  • Breitbart
  • Daily Caller
  • Daily Stormer
  • Fox News Insider
  • Gateway Pundit
  • Guardian
  • Huffington Post
  • New York Times
  • Newsmax
  • Patriot Post
  • Project Veritas
  • Ralph Retort
  • Reddit
  • RT
  • The Atlantic
  • The Blaze
  • Wall Street Journal
  • Washington Post
  • White Rabbit Radio
  • YouTube
  • ZeroHedge

This list of sites is obviously not exhaustive, and the work here is just a start, but from this initial review, some interesting patterns begin to emerge. Over the entire list of 25 sites, just under 500 different ad tracking domains are called. Of all of these adtech companies, over 60 are used on 10 or more sites. Amazon Adsystem is used on 12 of the 25 sites; 18 of the 25 sites use adtech supplied by Yahoo, and 23 of the 25 sites use Doubleclick, from Google.
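
The prevalence counts above can be reproduced from a simple site-to-trackers mapping. The sketch below uses placeholder site names (not entries from the actual dataset) and a toy threshold of 2 sites where the text uses 10:

```python
from collections import Counter

# Hypothetical site -> tracker mapping; the site names are placeholders,
# the tracker domains are real adtech domains mentioned in the text.
site_trackers = {
    "left-blog.example":  {"doubleclick.net", "amazon-adsystem.com", "yahoo.com"},
    "right-blog.example": {"doubleclick.net", "yahoo.com"},
    "newspaper.example":  {"doubleclick.net"},
}

# Count the number of distinct sites each tracker appears on.
prevalence = Counter(t for trackers in site_trackers.values() for t in trackers)

# The real analysis used a threshold of 10 sites out of 25; this toy data uses 2 of 3.
widespread = sorted(t for t, n in prevalence.items() if n >= 2)
print(widespread)  # -> ['doubleclick.net', 'yahoo.com']
```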

To collect information about the URLs called when visiting each site, we set up an intercepting proxy - for these tests, I used OWASP ZAP, an open source tool. I browsed using Firefox, and set up a custom profile to use while testing. Before visiting each site, I removed all browsing history, cache, and cookies. To test each site, I visited the home page, a story linked from the home page, and a second story or page on the site, for a total of three pages per site. Then, using reporting functionality built into ZAP, I exported all the URLs called while visiting each site.
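
A minimal sketch of the counting step - collapsing the exported URL list down to distinct base domains - might look like the following. The URLs are hypothetical, and the two-label heuristic is a simplification; a production analysis would use the Public Suffix List (for example, via the tldextract package):

```python
from urllib.parse import urlparse

def base_domain(url):
    """Naive registrable-domain heuristic: keep the hostname's last two labels."""
    host = urlparse(url).hostname or ""
    parts = host.split(".")
    return ".".join(parts[-2:]) if len(parts) >= 2 else host

def unique_domains(urls):
    """Collapse an exported proxy URL list to the set of distinct base domains."""
    return {base_domain(u) for u in urls if u.startswith("http")}

# Hypothetical excerpt from one site's exported ZAP log:
log = [
    "https://ads.tracker-one.example/pixel.gif",
    "https://cdn.tracker-one.example/tag.js",
    "https://sync.tracker-two.example/match",
]
print(sorted(unique_domains(log)))  # -> ['tracker-one.example', 'tracker-two.example']
```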

This is the same list of sites, sorted by the number of different domains called when visiting each site. At the risk of getting overly technical, a "single domain" is counted by its base path, so if a site made calls to "" and "", those count as a single domain because of the common base of "":

  • ZeroHedge: 184
  • AllenBWest: 183
  • Daily Caller: 152
  • Gateway Pundit: 142
  • The Blaze: 139
  • Bipartisan Report: 129
  • Huffington Post: 129
  • RT: 121
  • Alternet: 116
  • Breitbart: 87
  • Ralph Retort: 82
  • New York Times: 78
  • Newsmax: 78
  • Addicting Info: 75
  • Washington Post: 71
  • The Atlantic: 68
  • Fox News Insider: 55
  • Wall Street Journal: 49
  • Project Veritas: 20
  • Guardian: 18
  • Reddit: 16
  • White Rabbit Radio: 16
  • Daily Stormer: 12
  • YouTube: 11
  • Patriot Post: 10

The core dataset used for the analysis in this post is available here. At the end of this post, I also include additional details on the trackers used and their affiliations.

When we drill down into individual sites, we see that Daily Stormer and White Rabbit Radio - two far right sites - make use of Doubleclick advertising - owned by Google - to generate ad revenue.

Daily Stormer:


White Rabbit Radio:


It's also worth noting that the owners of Daily Stormer and White Rabbit Radio use Google Analytics to understand how people interact with their sites. Daily Stormer and White Rabbit Radio have plenty of company here: Addicting Info, Alternet, Breitbart, Daily Caller, Fox News Insider, the New York Times, Reddit, and the Atlantic - among others - also use this common infrastructure that is provided by - and sends data to - Google. When we look at the web through the lens of adtech, one thing becomes abundantly clear: adtech vendors sell indiscriminately, across the political and social spectrum. It's also clear that without the complicity of adtech vendors, sites on the political fringe - both right and left - would have far fewer resources.

The broad use of adtech to generate revenue creates a level of interconnectedness and dependency between content providers and the adtech networks that profit from them. Ads allow content providers to make money, but that money means different things - for both ad networks and content providers. The overhead of a four-person content farm publishing dishonest clickbait, or of a right wing blog like Gateway Pundit, or a left wing blog like Alternet, or a right wing site like Breitbart, is very different from that of the Wall Street Journal, the New York Times, or the Washington Post. Yet all of these sites use (as just one example) comScore, a data aggregator and ad exchange.

When Breitbart attempts to undermine the credibility of the Washington Post and the New York Times, the services offered by comScore make that profitable.


When Gateway Pundit attempts to undermine the credibility of the Washington Post and the New York Times, the services offered by comScore make that profitable.

Gateway Pundit

When the Times and the Post cover news, they also attempt to generate revenue via the services of comScore. But as these publishers fire back at one another, comScore - like its brothers in adtech - generates consistent revenue from the crossfire, because comScore arms both sides.

As we look at the sites people read and share on social media, what should we make of the fact that 10 of the 25 sites surveyed (Addicting Info, AllenBWest, Alternet, Bipartisan Report, Daily Stormer, Gateway Pundit, Huffington Post, Newsmax, Patriot Post, and Project Veritas) load this 18,400-line JavaScript file supplied by Facebook? Should we believe that all of these sites, from all over the political and social spectrum, require this enormous file to function? Have web developers and site owners become lazy, willing to use prefab components without considering the larger consequences?

The pervasive use of the same adtech provided by the same companies to competing sites and political enemies raises some interesting questions.

  • What does it mean that the same individual companies that specialize in data collection, analysis, and reuse are woven into our news and information systems across the ideological spectrum?
  • What does it mean when the act of reading news, or engaging in political activism online, is an observed activity?
  • What does it mean when legitimate news outlets are reliant on a small number of adtech companies for revenue, and these adtech companies sell to anyone, regardless of whether they traffic in hate, deception, or news?
  • Given the higher expenses of doing news well - with editors, paid writers, professional fact checkers - what obligations, if any, does adtech have to police hate speech or propaganda?

Tracking the Trackers

Tracking the companies that profit from selling ads to all sides is complicated by the fact that adtech is highly opaque. If we attempt to visit the URLs that show up in the proxy logs, we are met with a dizzying array of responses, the vast majority of which are completely uninformative. Getting a company name generally requires some or all of these four steps:

  • visiting the domain directly, to see if the home page identifies a company;
  • running a Whois lookup on the domain;
  • doing a reverse IP address lookup, to find other domains hosted at the same address;
  • searching on the domain names, to find an opt-out or policy page that names the company.

Even with these tools, tying a specific URL back to a specific company can be time-consuming. As an example, the domain is called on 17 out of the 25 sites we surveyed. Visiting their home page shows nothing.

A Whois lookup indicates that the domain has been registered anonymously via GoDaddy, so there is no company information publicly available for the domain.

anon registration

However, we can look up the IP address of the domain and do a reverse IP address lookup, which turns up three other sites hosted on the same IP address.

Reverse IP Lookup

A search using the domain names turns up this opt-out page, which confirms that The Trade Desk controls the domain.
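
The lookup chain described above can be partially automated. The sketch below is illustrative: the raw WHOIS query follows RFC 3912 (a plain-text exchange over TCP port 43), the server shown is the Verisign WHOIS server for .com, and the canned response fragment is a stand-in for what an anonymized GoDaddy registration typically returns:

```python
import socket

def whois_query(domain, server="whois.verisign-grs.com"):
    """Query a WHOIS server directly over TCP port 43 (RFC 3912)."""
    with socket.create_connection((server, 43), timeout=10) as s:
        s.sendall((domain + "\r\n").encode())
        chunks = []
        while True:
            data = s.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks).decode(errors="replace")

def registrant_lines(whois_text):
    """Pull the registrar/registrant lines out of a raw WHOIS response."""
    return [line.strip() for line in whois_text.splitlines()
            if line.strip().lower().startswith(("registrar:", "registrant"))]

# Canned fragment resembling an anonymized registration:
sample = """Registrar: GoDaddy.com, LLC
Registrant Name: Registration Private
Registrant Organization: Domains By Proxy, LLC"""
print(registrant_lines(sample))
```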

This is ridiculously opaque. You should not need to know how to do a reverse IP address lookup or a Whois search to know what company is collecting information on you - and this is just for one tracker loaded on one site. The Huffington Post loads upwards of 120 trackers; the Daily Caller loads upwards of 150; another site loads over 175. If we estimate very low, and assume that we can identify each tracker in 5 minutes, that still comes out to 600 minutes to identify all the trackers used by the Huffington Post, or 875 minutes for the site with over 175 trackers. The adtech companies that profit from our data and sell access to us as we surf the web use this opacity to further their business interests.

Commonly Used Trackers

As noted earlier, just over 60 trackers/services are used on 10 or more of the 25 sites we surveyed. For this post, I identified the companies involved to get a sense of who some of the bigger players are. To emphasize: the list of 25 sites surveyed is incomplete yet representative. This is the beginning of work that will likely be ongoing, as time allows.

But when we look at the 60 services used most commonly across these 25 different sites, the vast majority of these companies are IAB members. As Randall Rothenberg, the President and CEO of the IAB, observed in his speech:

(W)e face a challenge that has boiled over into crisis, perhaps the greatest crisis it is possible to face. For it is a crisis not of our industry, not of our digital media and marketing village, but a crisis of society writ large.

Right now, the status quo in adtech is to sell to all sides, and profit from both the arms race and the battles. While our discourse and news ecosystem remains mired in misinformation, adtech pulls profit.

Adtech profits when we read lies, and adtech allows liars to earn revenue.

Adtech profits when we read hate speech, and adtech allows the people who spread hate to earn revenue.

Adtech profits when places like the Huffington Post convince writers to publish for "exposure," and adtech allows the Huffington Post to generate revenue for these exploitive practices.

Adtech profits when people read traditional news outlets, and adtech allows these news outlets to generate revenue.

It's worth remembering that the impact of ad revenue will vary based on the overhead within an organization. The more a site cuts corners, eliminates editors and fact checkers, or doesn't pay writers, the greater the benefit of revenue generated via adtech. The benefits of adtech tilt the scales toward falsehood, sensationalism, and hate. Adtech in its current form - predicated on online monitoring of consumers, and selling access to user data via ad exchanges - gives a decided advantage to those who are willing to bypass facts in favor of bias, superficiality, or an emotional appeal.

Appendix 1 - Dataset

Full dataset in csv format:

Appendix 2 - List of third party services, by URL

  • - used in 23 sites.
  • - used in 22 sites.
  • - used in 22 sites.
  • - used in 21 sites.
  • - used in 21 sites.
  • - used in 21 sites.
  • - used in 20 sites.
  • - used in 19 sites.
  • - used in 19 sites.
  • - used in 18 sites.
  • - used in 18 sites.
  • - used in 18 sites.
  • - used in 17 sites.
  • - used in 17 sites.
  • - used in 17 sites.
  • - used in 17 sites.
  • - used in 17 sites.
  • - used in 16 sites.
  • - used in 16 sites.
  • - used in 16 sites.
  • - used in 16 sites.
  • - used in 16 sites.
  • - used in 15 sites.
  • - used in 15 sites.
  • - used in 15 sites.
  • - used in 15 sites.
  • - used in 15 sites.
  • - used in 15 sites.
  • - used in 14 sites.
  • - used in 14 sites.
  • - used in 14 sites.
  • - used in 14 sites.
  • - used in 14 sites.
  • - used in 14 sites.
  • - used in 14 sites.
  • - used in 14 sites.
  • - used in 14 sites.
  • - used in 14 sites.
  • - used in 13 sites.
  • - used in 13 sites.
  • - used in 13 sites.
  • - used in 13 sites.
  • - used in 12 sites.
  • - used in 12 sites.
  • - used in 12 sites.
  • - used in 12 sites.
  • - used in 12 sites.
  • - used in 12 sites.
  • - used in 11 sites.
  • - used in 11 sites.
  • - used in 11 sites.
  • - used in 11 sites.
  • - used in 11 sites.
  • - used in 11 sites.
  • - used in 10 sites.
  • - used in 10 sites.
  • - used in 10 sites.
  • - used in 10 sites.
  • - used in 10 sites.
  • - used in 10 sites.
  • - used in 10 sites.
  • - used in 10 sites.
  • - used in 10 sites.

Appendix 3 - List of most used third party services, with additional details

RhythmOne is an IAB member.

AOL/ is an IAB member.

AddThis is an IAB member

AppNexus is an IAB member.

The Trade Desk is an IAB member

Drawbridge is an IAB member.

AOL is an IAB member.

Neustar is an IAB member

Amazon is an IAB member.

ConvertMedia is an IAB member

IAB and TRUSTe member

Index Exchange is an IAB member

Rubicon Project is an IAB Member

  • Owned by Amazon
  • Used primarily as a cdn, so its use will vary widely among sites.
  • Not explicitly used for ad networks

  • Like Cloudfront, Cloudflare is a CDN

Pulsepoint is an IAB member

Lotame is an IAB member

Adobe is an IAB member, although their Marketing Cloud appears to not be an IAB member.

Conversant is an IAB member

Adobe is an IAB member.

Eyereturn is an IAB member

Eyeview is an IAB member.

Facebook companies


Facebook is an IAB member.

Google companies

The following domains are associated with Google services - some are ad-related, some (like YouTube) both provide a service and tracking. All of these domains - individually - were called on 10 or more of the 25 sites surveyed.

  • (part of Google/Doubleclick)

Additional info on these domains/services:

Google is an IAB member.

RadiumOne is an IAB member

MediaMath is an IAB member.

Dstillery is an IAB member.

Maxpoint is an IAB member.

Datalogix is an IAB member.

OpenX is an IAB member.

Pubmatic is an IAB member.

Quantcast is an IAB member.

AudienceScience is an IAB member.

Rocket Fuel is an IAB member.

Rubicon Project is an IAB Member.

comScore is an IAB member.

Simplifi is an IAB member.

Sitescout is an IAB member.

SpotXchange is an IAB member.

Tapad Inc is an IAB member.

Videology is an IAB member.

Exponential is an IAB member.

Tubemogul is an IAB member.

Turn is an IAB member.

Yahoo is owned by Verizon, and is an IAB member.

Google, Lawsuits, and the Importance of Good Documentation


This week, the Mississippi Attorney General sued Google, claiming that Google is mining student data. In this post, I'll share some general, personal thoughts, and some recommendations for Google.

To start, it's worth watching a statement from the press conference where the suit was announced - this video clip was shared by Anna Wolfe, a journalist who covered the event.

At 1:46 in the video, the AG describes the "tests" that were run. To be blunt, these tests don't sound like actual tests - it sounds more like browsing and looking at the screen. Unless the student account they were using was relatively new, had never done any searches on the topic being "tested," had never browsed while logged in to any non-Google site that had ad tracking, and all testing browsers had their cache, cookies, and browsing history cleared, there is a range of benign explanations for behavior that looks like targeted ads. And that doesn't even take into account the difference between targeted ads based on past behavior and content-based ads delivered because a page describes a specific subject.

Without additional detail from the Mississippi AG on how they tested for tracking, the current claims of tracking are less than persuasive.

G Suite Terms, and (a Lack of) Clarity

An area where Google can improve is highlighted in the suit: Google's terms, and the way Google describes how educational data are handled, are not easily accessible or comprehensible (all the necessary disclaimers apply: I am not a lawyer, this is not legal advice, etc, etc). This commentary is limited to transparency and clarity. With that said, Google could blunt a lot of the claims and criticisms they receive with better documentation. The people who are doing this work at Google are smart and talented - they should be allowed to describe the details of their work more effectively.

Google has built a "Trust" page for G Suite, formerly known as Google Apps for Education. The opening paragraphs of text on this page highlight the confusing complexity of Google's terms.

Opening text from Trust page

In this opening text, Google links to five different policies that govern use of Google products in education:

However, this list of five different legal documents leaves out five additional documents that potentially govern use of G Suite in Education:

Of these five additional documents, two (the Data Processing Amendment and the Model Contract Clauses) are optional. However, these ten documents are not listed together in a single, coherent list anywhere on the Google site that I could find. The trust page also links to this list of Google services that are not included in G Suite/Google Apps for Education, but that can be enabled within G Suite. The list includes over 40 individual services, all covered by different sets of terms.

Moving down the "Trust" page, we see several different words or phrases used to refer to the Education Terms: "contracts," "G Suite Agreement," and "agreements." These all link to the same document, but the different names for the same document make it more difficult to follow than it needs to be.

Some simple things Google could do on the "Trust" page:

  • list out all applicable terms and policies, with a simple description of what is covered;
  • list out the order of precedence among the different documents that govern G Suite use. If there is a contradiction between any of these documents, identify which document is authoritative. As just one example, the Data Processing Amendment and the G Suite Agreement define key terms like "affiliate" in slightly different ways;
  • highlight what documents are optional;
  • create a simple template for districts (or state departments of ed, or universities) to document the agreements governing a particular G Suite/Google Apps implementation;
  • standardize language used when referring to different policies;
  • define the differences between the Education-specific contracts and the Consumer contracts;
  • in each of their legal terms, create IDs that allow for linking directly to a section of a document.

While the above steps would be an improvement, creating standalone, education-specific terms that were fully independent of the consumer terms would add additional clarity. From a product development standpoint, drafting standalone terms would force an internal review to ensure that legal terms and technical implementation were in sync. To be clear, this is an enormous undertaking, but if Google did this, it would add some much-needed clarity. Practically speaking, Google could use this step to generate some solid PR as well. The PR messaging practically writes itself: "Google has always prided itself on being a leader in security, data privacy, and transparency. As our products evolve and improve, we are always making sure that our agreements evolve and improve as well."

G Suite and Advertising

Google has stated on multiple occasions that "There are no ads in the suite of G Suite core services." Here, it's worth noting that the "core services" for education include only Gmail, Google Calendar, Google Talk, Google Hangouts, Google Drive, Google Docs, Google Sheets, Google Slides, Google Forms, Google Sites, Google Contacts, and Google Vault. Other services - like Maps, Blogger, YouTube, History, and Custom Search - are not part of the core services, and are not covered under the educational terms.

Ads text from Trust page

There are differences, however, between showing ads, targeting ads, and collecting data for use in profiles. Ads can be shown on the basis of the content of the page (i.e., read an article about canoeing, see an ad for canoes), and this requires no information about the person reading the page.

Targeted ads use information collected from or about a user to target them, or their general demographic, with specific ads. While targeted ads are annoying and intrusive, they at least provide visual evidence that personal data is being collected and organized into a profile.

On their "Trust" page, as pictured above, Google states that "Google does not use any user personal information (or any information associated with a Google Account) to target ads."

In Google's Educational Terms, they state that they collect the following information from users of their educational services:

  • device information, such as the hardware model, operating system version, unique device identifiers, and mobile network information including phone number of the user;
  • log information, including details of how a user used our service, device event information, and the user's Internet protocol (IP) address;
  • location information, as determined by various technologies including IP address, GPS, and other sensors;

While it is great that Google states that they don't use information collected from educational users for advertising, Google also needs to provide a technical explanation that demonstrates how they ensure that IP addresses collected from students, unique IDs tied to student devices, and student phone numbers are explicitly excluded from advertising activity. Also, Google should clearly define what they mean by "advertising purposes," as this phrase is vague enough to take on many different meanings, often revealing more about the opinions of the reader than the practices of Google.

This technical explanation should also include how the prohibitions against advertising based on data collected in Google Apps square with this definition of advertising, pulled from the optional Data Processing Amendment:

"'Advertising' means online advertisements displayed by Google to End Users, excluding any advertisements Customer expressly chooses to have Google or any Google Affiliate display in connection with the Services under a separate agreement (for example, Google AdSense advertisements implemented by Customer on a website created by Customer using the "Google Sites" functionality within the Services)."

There are many ways that all of these statements can be true simultaneously, but without a technically sound explanation of how this is accomplished, Google is essentially asking people to trust them with no demonstration of how this is possible.


Google has been working in the educational space for years, and they have put a lot of thought into their products. However, real questions still exist about how these products work, and about how data collected from kids in these products is handled. Google has created copious documentation, but - ironically - that is part of the problem: the sheer volume of what they have created contains contradictions, and repetitions with slight variations, that impede understanding. Having seen Google's terms evolve over the years, and having seen terms in multiple other products, these issues actually feel pretty normal. That doesn't mean they don't need to be addressed, but I don't see malice in any of these shortcomings.

However, the concern is real, for Google and other EdTech companies: if your product supports learning today, it shouldn't support redlining and profiling tomorrow.

Why I Signed


Yesterday, December 14th, was an interesting day in technology. Evernote announced an update to their terms of service that appears to allow selected employees to read notes stored in their system, with no opt-out, in the interest of improving machine learning. People using Evernote are - rightly - talking about abandoning the service en masse, which seems like a pretty reasonable response to such horrible privacy practices. Of course, I have heard nary a peep from Evernote's education ambassadors about this. Who knows - maybe if they actually said something they might have to give back their t-shirts and stickers.

But Evernote's issues were a footnote compared to the spectacle of major tech leaders shuffling into Trump Tower to meet with the president-elect, the incoming vice-president, and the children of the president-elect. If we are searching for a situation that illustrates how ethics get bent for reasons of politics and profit, we don't need to look much further than this event.

Trump Tower tech meeting

An additional backdrop here is that Trump ascended to the presidency with the help of the company that he didn't invite because they refused his emoji. And, during a campaign that was marked by promises of creating a registry for Muslims, the Trump campaign was steadily creating a version of that registry, and more, with data pulled from Facebook, assembled and augmented by Cambridge Analytica, and further extended by data purchased from the major data brokers here in the US - data that combines in-person and online habits, with up to 5,000 individual data points on 220 million Americans. This data set is privately held, so potentially, White House advisors like Richard Spencer and Steve Bannon could be using it to inform their work. But let's be clear - this data set exists because of the work of the tech industry, and the data it collects.

Third party tracking is pervasive on the web. This technology creates marked and growing information asymmetry, where the odds are increasingly stacked against people, and stacked for corporations. Technology fuels this power imbalance, and technologists build the tools that make it possible.

The day before the leading technologists in our country shuffled into Trump Tower, news broke of 200 million records for sale on the dark web containing information that appears to come from a data broker. The records identify individuals, and include details like spending habits, political contributions, political leaning, credit rating, charitable contributions, travel habits, and information on gambling habits/tendencies. These records were certainly assembled and stored via different tracking technologies.

With this as a backdrop, when I see something like I will admit a degree of skepticism. The profiling tools are built, and the data sets are assembled, multiple times over. I also want to make explicitly clear that my signature, or lack of signature, on the list is pretty unimportant in the larger scheme of things. But with all that said - and with all the technology that has been built, and is right now humming along, collecting data, serving bad search results, and tracking us - we can still make things better. Hell, we might even be able to make things right.

With regard to privacy, people often use two metaphors to describe why efforts to increase privacy protections are meaningless: "the genie is out of the bottle" and "the train has left the station." What people using these metaphors fail to recognize is that the stories end with the genie returning to the bottle, and the train pulling into another station. "Too late" is the language of the lazy or the overwhelmed. Change starts with awareness, and change grows with organized voices. That's something I can get behind, and is the reason I signed.

Facebook, Voter Suppression, and AdTech

3 min read

This piece over on Medium ties together several news stories that have been written about the Trump campaign's use of Dark Posts on Facebook to suppress the vote among Clinton voters. There are some great details in the post, and you should read it in full. A few details stood out that bear highlighting.

The Trump campaign used pre-built tools within Facebook, and data on users exposed by Facebook. In other words, Facebook already had the tools to support vote suppression built into their system. I don't think that this was done intentionally by Facebook, but it really hammers home the point: all tech has unintended consequences. When we look at tech, we need to evaluate the fringes, and ask hard questions about what the tech can break, because we humans are great at breaking things. But in this case, the mechanisms for manipulating behavior via ads worked very well for suppressing turnout in the electorate. Predictive analytics lost, but mood manipulation via big data worked well.

The Trump campaign used data from within Facebook to suppress turnout among Clinton supporters. This means that every progressive organization on any issue that has been organizing on Facebook helped provide the Trump campaign with a list of potential voters to receive Dark Posts to suppress their vote (in brief, Dark Posts are private ads microtargeted to specific demographics; on some days, the Trump campaign delivered 100,000 different ads, tailored by demographic data). The message to progressive orgs should be clear: when you organize on Facebook, you expose your organization and your stakeholders to profiling and targeted political ads by your opponents. Use better tools.

Finally, according to the piece, the Trump campaign created a privately owned database that contains between 4,000 and 5,000 data points on the online and offline behavior (i.e., where we go, our credit card purchases, etc.) of approximately 220 million Americans. This database was compiled from multiple sources, including Cambridge Analytica, Experian PLC, Datalogix, Epsilon, and Acxiom Corporation. It's privately held, and it's unclear what restrictions, if any, exist around who can access it. Unlike the data collected on us by the NSA, where there are levels of bureaucracy tracking access, the dataset compiled during the campaign is far more openly accessible to people within the Trump campaign.

Also worth noting: Facebook explicitly offers advertising services that tie online and offline behaviors. If you look at the list of partners, you will see some of the same players that determine our credit scores.

Data Clean Up - No Time Like the Present

1 min read

I can't think of a better time than the present for schools to clean up some of the existing demographic data collected on students. Ideally, demographic data can be used to ensure that students and schools get the resources they need, but in some cases, the same demographic data used to help deliver services could also be used to help identify parents or families that have stayed past the time permitted on their visa.

For example, information on languages spoken in the home, the presence or absence of a Social Security number, or questions that look directly at immigration status can all be used in multiple ways. Given that collecting accurate data on sensitive topics is never easy, deleting this data is a recommended way of ensuring that it isn't misused or misconstrued.
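To make the clean-up concrete, here is a minimal sketch of stripping sensitive fields from an exported student roster. The field names and the sample row are hypothetical - map them to whatever your student information system actually exports:

```python
import csv
import io

# Hypothetical field names - adjust to match your own exports.
SENSITIVE_FIELDS = {"home_language", "ssn", "immigration_status", "place_of_birth"}

def scrub_rows(reader):
    """Yield each row with the sensitive demographic fields removed."""
    for row in reader:
        yield {k: v for k, v in row.items() if k not in SENSITIVE_FIELDS}

# A small in-memory export standing in for a real student data file.
raw = io.StringIO(
    "name,grade,home_language,ssn\n"
    "A. Student,3,Spanish,123-45-6789\n"
)
cleaned = list(scrub_rows(csv.DictReader(raw)))
print(cleaned)  # name and grade survive; the sensitive columns are gone
```

Scrubbing exports only helps if the original files (and their backups) are cleaned up too; a scrubbed copy does nothing if the unscrubbed file is still sitting on a shared drive.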

If you don't have data, it can't be compromised, leaked, or misused. For schools that have sensitive demographic data on the students entrusted to their care, now is an ideal time to clean up.

How Do We Support Each Other As We Do The Work?

3 min read

Donald Trump and Mike Pence won the election last night. This raises a whole slew of questions, but I'll start here with some questions grounded in an educational context:

  • What does it mean to create a safe space for learning for black and brown kids when the leader of the country considers people that look like them to be terrorists, rapists, or drug dealers who should be kicked out of the country?
  • What does it mean to stand up against bullying when we have a leader who incorporated abusive behavior as a campaign strategy?
  • What does it mean to encourage honesty when we have a leader who actively ignores the truth?
  • What does it mean to educate women when we have a leader who consistently demeans women based on their physical appearance, and who brags of sexual assault?

I don't have answers to any of these questions - and really, the answers to these questions reside in our day to day actions. We - all of us - will have a constant series of small interactions where we will have the opportunity to do well, or to do something else. Hopefully, we will get it right more than we get it wrong, and hopefully, when we get it wrong we will have the humility to admit it, make amends, and move forward.

The conditions that led to the election of Donald Trump existed well before Donald Trump announced his candidacy. The racism, misogyny, and xenophobia that he voiced while campaigning have been well documented. However, racism, misogyny, and xenophobia are well worn in the history of the United States. I don't say this to be inflammatory, but rather to acknowledge a basic reality. I mean, I'm writing this within the borders of a state that was founded as a bastion of white supremacy.

So here we are. To state the obvious: the need to do the work would have existed regardless of who won, but the Trump/Pence victory amplifies the need to center intersectional social justice in our work. And yes, I am being intentionally vague when I say, "the work." We all need to define it in the way that makes sense to us - for some of us, it will be intensely local; for others, it will be organizing at the national level. For most of us, it will be something in between. For people who look like me, let's consider talking less, and listening more.

Food is my thing. When I woke up this morning, I peeled and diced some shallots, onions, parsnips, garlic, and a turnip and threw them in a pot with a chicken, salt, pepper, brown sugar, soy sauce, and some spices. It's simmering now as I write, and soon, the smell will fill the house.

It's not a solution, but it'll feed those around me. Today starts now. How do we support each other as we do the work? 

Ransomware Focused on K12 and Government

1 min read

While ransomware attacks have been on the rise, education has seen (fortunately) few attacks. However, as reported in Softpedia, that could be changing.

The focus on educational and government users attempts to take advantage of (among other things) weak or nonexistent disaster recovery strategies.

By going after government institutions, they might get lucky and infect a target that has failed to implement a proper backup procedure, effectively shutting down its system until a ransom has been paid. The chances of squeezing a ransom payment out of these targets are higher than with regular home users.

The attack has been delivered using bogus ticket confirmations, which in turn contain a link to the ransomware. Now is the time to do two things:

  • Test your backup and disaster recovery strategy; and
  • Review good email and download habits with your colleagues. This will protect against phishing, social engineering, and ransomware attacks.
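On the first point, "test your backups" means actually restoring them and comparing the result against live data, not just confirming that a backup job ran. A minimal sketch of that comparison using SHA-256 checksums - the directories below are toy stand-ins for your own live and restored data:

```python
import hashlib
import tempfile
from pathlib import Path

def checksums(root: Path) -> dict:
    """Map each file's path (relative to root) to its SHA-256 digest."""
    return {
        str(p.relative_to(root)): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in root.rglob("*")
        if p.is_file()
    }

def backup_matches(live_dir, restored_dir) -> bool:
    """True only if the restored copy matches the live data byte for byte."""
    return checksums(Path(live_dir)) == checksums(Path(restored_dir))

# Toy demonstration: two directories standing in for live data and a restore.
live, restored = Path(tempfile.mkdtemp()), Path(tempfile.mkdtemp())
(live / "grades.csv").write_text("id,grade\n1,A\n")
(restored / "grades.csv").write_text("id,grade\n1,A\n")
print(backup_matches(live, restored))  # True - the restore is verified
```

A restore test like this is worth scheduling regularly; a backup that has never been restored is a hope, not a strategy.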

Students, Directory Information, and Social Media - Part 2

5 min read

Last week, I put out a post on social media and kids. Apparently, it was read by more than a couple of people. I don't keep track of pageviews or reach here - I have no analytics running on this blog, and while I will talk with people on Twitter about things, I have disabled comments on the blog - thanks, spammers and trolls! So, I have no sense of which posts on my personal blog resonate with people. My approach to writing online is to treat it as my outboard brain - the process of writing helps me figure out what I'm thinking, what I'm getting wrong, and where I need to look and learn more. Based on the feedback and response I received on the last post, I wanted to clarify and expand on a couple of things.

The premise of the post is that parents should be able to opt their kids out of directory information sharing and having their kid's information (photo and name) shared on social media, and not have this be a barrier to other school activities like yearbook, local news, athletic and music publications, class pictures, and streamlined access to childcare. 

The conversation would be different if directory information was limited to basic information, and if teacher sharing on social media showed higher levels of restraint, but, unfortunately, that is not where we are.

Some school districts - not all, but some - consider a student's name, address, email address, phone number, picture, date of birth, place of birth, and enrollment status to be directory information under FERPA. Under FERPA, local education agencies have the right to define what constitutes directory information. FERPA allows directory information to be shared without consent. It's worth noting that if a company had an incident where this same data was accessed, this would be considered a data breach. Yet, for a kid in kindergarten, schools and districts have the right to designate this as information that can be shared freely. To get a sense of how districts are defining directory information and managing opt-out, take some time and read through district forms. These forms are pretty short, and most of them can be read in under five minutes.

Moving on to social media, some teachers who make regular use of social media often overshare. This is not the case for all teachers - there are many teachers who don't show kids' faces, only share images of larger activities, don't share student names, and don't share other personal information collected from students. But the small subset of teachers who overshare complicate the space for their peers, and for school districts attempting to balance proactive outreach with real concerns about learner privacy. When teachers share their school and grade in their bio, that information can be combined with what is shared in social media posts. It's also worth noting that, in many cases, a search across a username on social media sites reveals additional information about people.

Opinions vary on this point, but I do not consider a school web site to be part of social media. Most school web sites have nowhere near the traffic or visibility (to people or search engines) of social media sites, and I want that distinction clear in this post.

Three recommendations for districts that would address these basic issues include:

  • Limit what is covered under directory information. Ideally, information that allows a kid to be contacted directly would be excluded from directory information;
  • Create a social media policy for teachers that limits the amount of information that teachers can share about a student via an individual teacher's social media account (Twitter, Facebook, Instagram, Snapchat, etc.). Ideally, names should not accompany portraits, and call-outs to edtech vendors should not accompany a kid's image;
  • Avoid grouping parental and learner rights around data sharing into all-or-nothing buckets. These two forms are good examples of how districts are proactively addressing these needs.

If a district takes steps to minimize what is considered directory information and has a sound social media policy in place, the number of people who feel the need to opt out will likely decrease. This is an opinion based on multiple conversations over the years, and like all opinions, it requires time to see if it holds water. However, my sense (from talking with people in parent and school communities who care about learner privacy) is that having the option to opt out reduces concerns about the need to opt out - in other words, when schools recognize the need for the option, people have more trust that the schools understand the issues and are addressing them effectively.

And, a closing thought: parents also have a role to play here in their sharing on their own social media feeds. Periodically, take a step back and review your social media presence with an eye toward seeing what information you have shared about your family and friends. If we want to emphasize the need for privacy with our kids, we have an obligation to model that with our own behavior as well.

Students and Social Media

10 min read

Update: I put out a second part to this post based on some conversations. End Update.

Introductory note: In this post, I reference hashtags and tweets I have seen that compromise student privacy. Ordinarily, I would link to the hashtags or tweets, and/or post obscured screenshots. In this post, I am doing neither because I do not want to further compromise the privacy of the people in the examples I have seen.

When teachers post pictures of students on social media, it raises the questions of whose story is being told, in whose voice, and for what audience. Multiple answers exist for each of these questions: the "story" being told can range from the story of a kid's experience in the class, to a teacher's documenting of class activities, to a teacher documenting activities that are prioritized within a district. In most cases, even when the story is told from the student's perspective, the voice telling the story is an adult voice. The audience for these pieces can also vary widely, from parents, to other teachers, to the district front office, to the broader education technology world.

While students often figure prominently in classroom images posted on social media, student voice is rarely highlighted, and students are rarely the audience. The recent example of the IWishMyTeacherKnew hashtag - where a teacher took the thoughts and words of 8-, 9-, and 10-year-olds, posted them on Twitter, and parlayed that experience into a book deal - provides a clear example of student words appropriated to tell an adult story. As a side note, it's also worth highlighting that student handwriting is a biometric identifier under FERPA, so sharing samples of student handwriting online without prior parental consent is, at best, a legal gray area. To emphasize: asking your students what matters and what they care about is great. Publishing these personal details to the world via social media - especially when their words can be traced back to them within their local community - prioritizes an adult need over learners' needs.

When posting student pictures on social media, the adults in the room need to be careful about the details they include with their images. I have seen examples of teachers doing a great job documenting their classroom when they show pictures of kids working on a project, and the pictures focus on the work, do not include student names, and do not include student faces (or only include them as part of a group shot, not as a close-up portrait). Conversely, I have also seen teachers post pictures of kids that include a close-up of the student's face and a student name tag, where the teacher's bio identifies their school and grade. Teachers should also ensure that location services for photos are off; otherwise, the images they post can share precise geographic locations.
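On that last point: photos taken with location services on carry GPS coordinates in their EXIF metadata. As a rough illustration, screening a photo's tags before posting might look like the sketch below. The tag numbers come from the EXIF specification; in practice you would read them from the image file with a library such as Pillow, and the toy dictionary here just stands in for that step:

```python
# EXIF tag numbers, per the EXIF specification.
GPS_IFD = 0x8825    # pointer to the GPS sub-directory (latitude, longitude, etc.)
DATETIME = 0x0132   # capture timestamp - also worth a look before sharing

def safe_to_post(exif_tags: dict) -> bool:
    """Return False if the photo's metadata still carries location data."""
    return GPS_IFD not in exif_tags

def strip_location(exif_tags: dict) -> dict:
    """Drop the GPS directory while keeping the rest of the metadata."""
    return {tag: value for tag, value in exif_tags.items() if tag != GPS_IFD}

# Toy tag dictionary standing in for metadata read from a real photo.
photo_exif = {GPS_IFD: {"lat": 45.5, "lon": -122.6}, DATETIME: "2016:09:01 08:30:00"}
print(safe_to_post(photo_exif))                  # False - GPS data is present
print(safe_to_post(strip_location(photo_exif)))  # True - safe once stripped
```

Many social media platforms strip EXIF data on upload, but relying on that is a gamble; turning location services off, or stripping the metadata yourself, keeps the decision in your hands.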

In some more extreme examples, I have seen teachers post portraits of students that include a name tag, grade, and a reference to a specific EdTech vendor. While it is great to see teachers highlighting student effort and growth, including a specific tech vendor in the callout with a picture of an elementary school student looks a lot like a kid being used as an unpaid spokesperson to market a tech product. To make matters worse, I have also seen examples where these pictures included student usernames and passwords. To be crystal clear, writing usernames and passwords down in a publicly accessible place shouldn't happen. Posting these passwords to the open web is a surefire way to make a bad practice worse.

When a teacher posts a picture of a student that includes the above details, they are potentially sharing directory information or parts of an educational record, as defined under FERPA. Beyond what is covered under FERPA, we must ask whose needs are served by sharing this information on commercial social media - the student's, the teacher's, or the school's? Taking pictures is fine. Recognizing students for work and progress is obviously fine. Tying that work and progress to a specific app or vendor is less fine. Posting this collection of information on social media has the potential to cross some serious legal and ethical lines.

Given that some of this information is covered under FERPA, parents have some rights to control how and where information is shared. Schools and districts can play a key role in ensuring that learners have full and unfettered access to their basic rights to privacy. Unfortunately, many districts do not approach this issue with adequate flexibility or understanding of how their policies can protect or impair a parent's ability to access their rights. For a pretty typical example, we will take a look at the opt-out and disclosure form from Baltimore County Public Schools.

The form has three sections: FERPA Directory Information Opt-Out, Intellectual Property Opt-Out, and Student Photographs, Videos and/or Sound Recordings Opt-Out. These are the right categories to include in an opt-out form, but the way the opt-outs are structured is hostile to student privacy.

Taking a closer look, starting with the FERPA Directory Information Opt-Out, the section closes with this explanatory note, followed by three options.

BCPS opt-out excerpt - FERPA

Note: If you “opt-out” of the release of directory information, BCPS will not release your child’s directory information to anyone, including, but not limited to: Boys and Girls Clubs, YMCA, scouts, PTA, booster clubs, yearbook/memory book companies that take photographs at schools and/or other agencies and organizations.

The reference to Boys and Girls Clubs and the YMCA is telling here: these outside vendors are used to run childcare programs for parents that need it. Because the district takes a blanket approach where parents are required to choose all or nothing, the current district opt-out policy appears to place a barrier in the way of parents who want to protect their child's privacy and need childcare. The likely scenario here is that parents who opt out of data sharing at the district level need to make additional arrangements with the childcare providers at the school level. While this is not an insurmountable obstacle, it creates unneeded friction for parents, which can be read as a disincentive for parents and children to access their rights under FERPA.

Districts can address this issue very easily by adding a single check box to their form that authorizes the release of directory information to school-approved childcare providers.

Moving on to the Intellectual Property Opt-Out section, Baltimore County Public Schools takes a similarly broad approach with students' IP rights. The terms of the opt-out form combine multiple different activities, with multiple different means of publishing and distribution, into an all-or-nothing option.

BCPS opt-out form - IP Rights

Having a student's intellectual property uploaded to a web site with weak privacy protections is a very different situation than having a kid covered in the news, or having a kid participating in a school-sponsored video. The fact that a district conflates these very different activities undercuts the protections available to learners. This also creates the impression that the district values district-created processes more than student privacy and learner agency.

Moving on to Student Photographs, Videos and/or Sound Recordings Opt-Out, Baltimore County Public Schools again takes an all-or-nothing approach.

BCPS opt-out - Photos, Videos, Recordings

If the parent denies such permission, the student’s picture will not be used in any BCPS publication or communication vehicle, including, but not limited to, printed materials, web sites, social media sites or the cable television channel operated, produced or maintained by BCPS’ schools or offices, nor will my child’s picture be part of a school yearbook, memory book, memory video, sports team, club or any other medium.

Social media, yearbook, childcare, and sports activities are all very different events. When schools structure permissions in a way that removes agency from parents and kids, they burn goodwill. Also, given that teachers and districts are still publishing pictures of kids online in ways that share personal information, including (on some rare occasions) passwords, parents should have some granular ability to differentiate between sharing in a yearbook and sharing on Instagram, Facebook, or Twitter. Until schools and districts consistently get this right, they have an obligation to err on the side of restraint. To state the obvious, kids don't walk through the school doors so adults can use their likeness and work on social media. Similarly, yearbooks and social media are very different things, and yearbook companies and social media companies have very different business models, and - in most cases - very different approaches to handling data.

The solution here is pretty straightforward: provide parents a granular set of options. A parent or kid should be able to say that they want to be in the yearbook; a high school athlete should be able to say they want to be in the program or in the paper; a musician should be able to be acknowledged in a newsletter - and these options do not need to be tethered to sharing directory information, streamlined access to childcare, or indiscriminate sharing on social media. That is a reasonable request, and if a teacher, school, or district lacks the data handling and media literacy skills required to make that happen, then we have an opportunity for teachers and district staff to develop and grow professionally.

The argument we generally hear against allowing parents and students real choices over their privacy rights is that the burden would be too much for schools to handle. However, we only need to look at how parental rights are managed with regard to health curriculum to see the hollowness of that argument. In Baltimore County Public Schools - as with many schools in many districts nationwide - parents and students can opt out of individual units in the health curriculum. Districts have been managing this granular level of opt-out for years, and somehow - miraculously - the educational system has not tumbled into ruin as a result.

The main difference, of course, is that in many states parental opt-out rights are required and defined by law.

For parents: use the opt-out form provided by the World Privacy Forum to assert your rights. In an email accompanying the form, explain that you would like to see your district develop more flexible policies on opt-out and data sharing.

For teachers: if you are going to share student images and work on social media, make intentional choices about what you share, how you share, and why you share. Additionally, ask your district about more granular policies for parents and learners. While the initial change might be hard, over time the more flexible rules will make your work easier, and increase trust between you, your students, and their guardians.

For districts: get ahead of the curve and start offering more flexible options. As we have seen with health curriculum and with privacy in the last few years, state legislatures are not shy about introducing and passing legislation. Districts have an opportunity to address these concerns proactively. It would be great to see them take advantage of this opportunity.