Bill has worked in education as an English and history teacher, an administrator, and a technology director. Bill initially discovered the Internet in the mid-1990s at the insistence of a student who wouldn't stop talking about it.

Thirty Seconds

3 min read

In my years working in and around education, I have heard a lot of arguments about how to "reach" teachers in order to provide them with information. A lot of these arguments have the stench of SEO, and quickly devolve into keyword placement, catchy titles, finding the right post length, using pictures, using video, and making sure to embed current jargon. At some point in this screed, the question of time gets raised. Teachers are busy, they will say. They need to make a decision in [X seconds] or [Y minutes]. Any longer than that and we've lost our chance.

And when I hear these arguments, I'm always at a loss as to how to proceed. Teachers are busy, but teachers are also caring, informed professionals. Far too frequently, when I hear people talk about "reaching" people, or how to make pages "sticky," I hear the language of trickery. It's the language used when -- consciously or unconsciously -- people view attention as something to be gamed, not earned -- as something to be taken, not offered. It's the language of people who lack confidence in what they offer, and feel their first and best recourse is to resort to gimmickry to keep people engaged.

And when I ask how they are working to improve their information, how they talk with the people they want to reach, how they make room to elevate voices within their readership, or what their unique perspective on a specific issue might be, it often feels like I'm addressing a native English speaker in Greek. When I suggest spending less time and money on the frills that adorn a piece and more time figuring out what new or unique thing a specific piece offers, the conversations generally grind to a halt.

And that's too bad, because if you write well, write with a purpose, and have an actual vision that makes sense, people will read. If you want to make sure that you have an edge in search, encrypt your site (serve it over HTTPS), and make sure it uses standards-compliant markup. But assuming that your best ideas need to be accessible in under [X seconds/Y minutes] patronizes the people who might have a deep interest in your posts. It also encourages unexamined oversimplifications, which leads to sloppy thought. There are some decisions that shouldn't be made in under 30 seconds, or under 2 minutes. And while there is a balance that needs to be struck between accessibility and depth, the content should drive where that line is drawn. I'd argue we create more useful information in educational content when we err on the side of an intelligent reader.

Thinking is okay. Acknowledging that aspects of the world are complex, and don't fit into easily consumed chunks, is a part of how we "reach" people. We need to keep the simple things simple, and we need to explain the complex things well. Attempting to take shortcuts through intellectual complexity is another facet of technology as solutionism. The only people who win are the folks selling shortcuts -- and they have generally cashed their checks by the time the rest of us are cleaning up their messes.

Who Knows What You Read?

2 min read

I spend a lot of time thinking about ways to help people understand data collection and privacy. I've done workshops on tracking before, but over the next year I'd like to try this with a group of teachers, or possibly at an EdCamp. This activity could also work at the high school level, and possibly even with middle school students.

The goal of the activity is to provide participants the skills and tools to begin analyzing how online trackers work, and how to spot and identify them.

If anyone runs this activity, or has suggestions on how to improve or modify it, please let me know.

Select an individual news article, and document how you found it. 

Then, read the article.

Then, document:

  • a. how you chose this individual article;
  • b. what device you read it on;
  • c. how long you spent reading it;
  • d. the web page you visited after you read it;
  • e. your physical location when you read it;
  • f. when during the day you read it.

Then, describe who else would know the answers to questions a-f, listed above. Include any companies that might be tracking any of the pages you visited, including the company that owns/controls the site that published the article. How difficult or easy would it be for them to share that information with other companies? How would you know if/when any of this information was shared, or how it was used?

Then, compare this process to reading an article in a newspaper or magazine. When we read something in print, who else knows about it? How do they know?

Using just the information from only one article, what statements, judgments, or assumptions could someone make about you?

How would this change if they had information about 10 articles you read?

How would this change if they had access to your reading habits for the last week? The last month? The last year? 

To get a sense of the trackers on a page, use a tool like Ghostery or Lightbeam (Firefox only). While neither is as accurate as an intercepting proxy, both are very accessible, and they help illustrate the point with much less work.

Amazon and Whole Foods: Can I Have Some Data with that Kale?

4 min read

It looks like Amazon is buying Whole Foods.

Let's take a step back and look at the data involved here. We will start by looking at a person who only uses Amazon to shop online, buys food from Whole Foods, and reads using the Kindle app.

For anyone who has ever bought something from Amazon, Amazon has our home address, and possibly related shipping addresses (e.g., if you have ever bought something as a gift and had it shipped directly to the recipient). Amazon potentially has one or more credit cards stored for us. Amazon has our purchasing history, and our browsing history. If we ever responded to an ad online for an Amazon product, Amazon has that referrer history, and can infer and expand their profile on us based on the sites that refer us to Amazon.

And, of course, Amazon collects information about all the different devices you use to access Amazon services - so Amazon has a precise record of all the hardware and software you use when you shop, potentially going back to when you first started shopping online. If you can't remember the phone you used in 2007, Amazon could probably tell you.

Moving on to Whole Foods, every time someone uses a credit card in the store, Whole Foods gets the person's name, their credit card number, their geographic location (the store), the time they were there, and the list of items they have purchased. When this information is cross-referenced with data collected by Amazon, the credit card number, or the name and zip code, could be sufficient to connect these data sets with close to 100% certainty.
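A minimal Python sketch of that kind of cross-referencing might look like the following. Everything in it is hypothetical. The records, the field names, and the `link_records` helper were invented for illustration, and real record linkage also has to cope with typos, nicknames, and different people who share a name and zip code.

```python
from collections import defaultdict

# Hypothetical online orders and in-store receipts (invented data, not real records).
online_orders = [
    {"name": "Pat Lee", "zip": "94110", "item": "e-reader case"},
    {"name": "Sam Cruz", "zip": "10001", "item": "running shoes"},
]
store_receipts = [
    {"name": "Pat  Lee", "zip": "94110", "store": "Mission St", "item": "kale"},
    {"name": "pat lee", "zip": "94110", "store": "Mission St", "item": "chicken"},
    {"name": "Alex Kim", "zip": "60601", "store": "Lakeview", "item": "coffee"},
]

def normalize_key(record):
    """Build a join key from name and zip code, ignoring case and extra spaces."""
    return (" ".join(record["name"].lower().split()), record["zip"].strip())

def link_records(left, right):
    """Return {key: {"online": [...], "store": [...]}} for keys found in both data sets."""
    index = defaultdict(lambda: {"online": [], "store": []})
    for rec in left:
        index[normalize_key(rec)]["online"].append(rec)
    for rec in right:
        index[normalize_key(rec)]["store"].append(rec)
    # Keep only the people who appear in both sources.
    return {k: v for k, v in index.items() if v["online"] and v["store"]}

linked = link_records(online_orders, store_receipts)
for (name, zip_code), records in linked.items():
    print(name, zip_code, [r["item"] for r in records["online"] + records["store"]])
```

Normalizing the key before joining matters. Without that step, "Pat  Lee" and "pat lee" would look like two different shoppers.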

For people who use the Whole Foods App, the list of data collected by Whole Foods expands dramatically. The application collects geographic location, device information (e.g., the brand of phone or tablet, some form of device ID, the IP addresses it uses), presumably an email address, and the ability to read and access wireless and Bluetooth connections. I'm not sure if Whole Foods does tracking via Bluetooth beacons, but the app permissions for the Android app leave that open as a possibility. If the Whole Foods app does ship with Bluetooth tracking enabled, anyone with the app installed and running could be tracked by Bluetooth beacons just about anywhere those beacons are deployed. Potentially, if tracking were set up between any of Amazon's home devices (the Echo, etc.) and the Whole Foods app that Amazon can now access, that would be a very effective way to map in-person social connections and online/offline activity.

If a person shops online at Amazon, buys (expensive) food at Whole Foods, and reads using the Kindle app, then they are also sharing their reading history, reading patterns, reading speed, and book buying history with Amazon. This data can also be used to infer interests (a person reads one type of book over another, and reads this type of book faster than another), habits (a person generally reads in the morning, and for a certain amount of time), and other personal patterns. When reading habits are cross-referenced against other personal habits (like the food we buy or the items we shop for), it creates a more complete profile of an individual.
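As a rough illustration of how little it takes to infer habits like these, here is a Python sketch that computes pages per minute by genre and the hour of day when someone most often reads. The session data and the `reading_profile` function are hypothetical. This is not a claim about how Amazon actually processes Kindle data.

```python
from collections import Counter, defaultdict
from datetime import datetime

# Hypothetical reading sessions: (genre, pages read, start time, minutes spent).
sessions = [
    ("mystery", 40, "2017-06-12T07:10", 30),
    ("mystery", 36, "2017-06-13T07:05", 30),
    ("history", 12, "2017-06-13T21:40", 20),
    ("mystery", 44, "2017-06-14T06:55", 30),
]

def reading_profile(sessions):
    """Infer reading speed (pages per minute) by genre and the most common reading hour."""
    pages = defaultdict(int)
    minutes = defaultdict(int)
    hours = Counter()
    for genre, page_count, start, spent in sessions:
        pages[genre] += page_count
        minutes[genre] += spent
        hours[datetime.fromisoformat(start).hour] += 1
    speed = {genre: round(pages[genre] / minutes[genre], 2) for genre in pages}
    usual_hour = hours.most_common(1)[0][0]
    return speed, usual_hour

speed, usual_hour = reading_profile(sessions)
print(speed, usual_hour)
```

Four sessions are enough to suggest that this reader prefers mysteries, reads them faster than history, and usually reads around 7 AM.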

It doesn't take much of a leap to see how a list of the food we buy, the items we shop for, the information we read, and where and when we do each of these actions would be of interest to industries like health care.

And, of course, Amazon has been moving into health care. And, given that we are seeing more experiments using things like sentiment analysis and wearable tech as a means to adjust insurance rates, scenarios that include shopping lists in insurance calculations aren't a stretch.

It's also worth noting that the depth of the Whole Foods data set will be a boon for companies like Amazon that look at differential pricing. Amazon will now be in a great position to identify people willing to pay more for everyday items.

So, have fun shopping at Whole Foods. That organic, free-range, hormone-free chicken you will be eating tonight will be pecking in your data trail for a while.

Twitter's Misleading User Experience When Reporting Abuse

2 min read

Twitter's history of combating trolls and abuse has been problematic, at best.

Recently, I discovered a corner in their toolkit that highlights why Twitter's current efforts remain ineffective.

When reporting a person for abuse (or, more likely, a bot), Twitter leads you through a multi-step process. 

In the first step, we select an account or a tweet to report.

Step 1 

In the second step, we define the reason for the report.

Step 2

In the third step, we provide additional details.

Step 3

In the fourth step, we indicate who is being harassed.

Step 4

In the fifth step, we select up to five tweets that demonstrate the harassment.

Step 5

In the sixth step, we decide whether we want to block the account, mute the account, or do neither. When we click "Done", the offending tweets we reported are no longer visible. Voila. The process has worked.

Step 6

Except, it hasn't. Despite appearances, Twitter has done nothing to address the abuse. When you are logged in, you can't see the Tweets you reported. To the rest of the world - including, literally, everyone who isn't you - the content is still visible. This almost certainly includes search engines.

From your perspective, it actually looks like Twitter has done something, but from a practical perspective, Twitter has engaged in a game of smoke and mirrors. This happens regardless of whether or not we select "Block" or "Mute"; Twitter still hides the tweets you reported from you, and you alone.
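To make the distinction concrete, here is a toy Python model of the behavior described above. Reporting hides a tweet from the reporter only, while a real removal would hide it from everyone. The `ToyTimeline` class is hypothetical and is not based on Twitter's code. It only models what the reporter sees compared with what everyone else sees.

```python
class ToyTimeline:
    """Toy model contrasting a per-reporter hide with an actual removal (not Twitter's code)."""

    def __init__(self):
        self.tweets = {}      # tweet_id -> text, in posting order
        self.hidden_for = {}  # viewer -> set of tweet_ids hidden from that viewer only
        self.removed = set()  # tweet_ids taken down for every viewer

    def post(self, tweet_id, text):
        self.tweets[tweet_id] = text

    def report(self, reporter, tweet_id):
        # The behavior described in the post: the tweet disappears for the reporter alone.
        self.hidden_for.setdefault(reporter, set()).add(tweet_id)

    def remove(self, tweet_id):
        # What a reporter might reasonably assume happened: removal for everyone.
        self.removed.add(tweet_id)

    def visible_to(self, viewer):
        hidden = self.hidden_for.get(viewer, set()) | self.removed
        return [tid for tid in self.tweets if tid not in hidden]


timeline = ToyTimeline()
timeline.post("t1", "abusive reply")
timeline.post("t2", "unrelated tweet")
timeline.report("reporter", "t1")
print(timeline.visible_to("reporter"))        # ['t2']
print(timeline.visible_to("search_crawler"))  # ['t1', 't2']
```

Only a change to the shared state (`remove`) affects what the rest of the world sees. A per-viewer hide never does.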

This is dangerous. If a person has been doxxed on Twitter and they report the tweet, Twitter's UX creates the misleading impression that the offending content has been removed. The solution to this problem is simple: Twitter should let the "Block" or "Mute" options work as intended. While this wouldn't fix Twitter's abysmal record of responding to abuse, it would at least provide a more honest user experience.

When Twitter automatically hides offensive content from the people who have reported it, they create the impression that they have done something, when they have done nothing. Design choices like this demonstrate Twitter's apathy towards effectively addressing hate and abuse on their platform.

Edmodo Has Removed Tracking From Their Web Site For Students and Teachers

2 min read

Last night, I heard from representatives at Edmodo in response to my post on ad trackers. I need to emphasize at the outset that the speed of their response here is a very positive sign. I published my post around 9:00 AM on a Saturday, and I heard from them less than 12 hours later on Saturday night.

In their email to me, they shared that the code and tracking behavior I observed was left over from testing. While they investigate solutions, they are both removing this code, and turning off ads. This change is already in place. As of this writing, there no longer appears to be any tracking of teacher or student accounts. I have done a quick visual examination to verify this with my test accounts.

This is the right step to take. Edmodo deserves credit for taking it, and for taking it so quickly. I am hopeful that this is a permanent change.

UPDATE: May 14, 2017

I have heard additional details from the team at Edmodo about their technical implementation. Although my original post was not about their Beta Sponsored Content program, they wanted to be very clear that, for that program, they used Doubleclick's COPPA-compliant flag. The information they conveyed to me is included below:

"(f)or the ads we recently started serving to Edmodo users through Doubleclick, we turned on the COPPA-compliant tag. The COPPA-compliant tag is supposed to prevent behavioral tracking. We have turned off those ads until we can confirm that the COPPA-compliant tag is working properly to prevent behavioral tracking."


Tracking of Teachers and Students in Edmodo

7 min read

UPDATE: Sunday, May 14th, 2017 - I heard from Edmodo last night, and they have removed the tracking that is observed and discussed in this post from their web application. Their response was fast, and they deserve a lot of credit for making this decision, and implementing it quickly. Details available here. END UPDATE

0. Introduction

This has been a rough week for Edmodo. Unlike many other people, I will not be writing about the data breach that leaked information about 77 million Edmodo users. Instead, in this post, I will look at ad tracking within Edmodo that affects both teachers and students.

Looking at Edmodo was not on my list of things to do this week. I did this research on my personal time, completely disconnected from my work. The only reason I looked at all was that I received a message from a person advising me about what to look for, and this message contained details that made the report credible. While I can't promise I will be able to research everything sent my way, I am always interested in working with students, parents, and teachers. If you see something that looks or feels odd, please be in touch.

1. Process

For this post, I set up a test Edmodo teacher account, and two sample student accounts. While logged in, I observed traffic using OWASP ZAP, an intercepting proxy. The student account used in this test belongs to a student in a fourth grade class, so the student would be under 13. All cookies, the browser cache, and browsing history were cleared prior to testing, and the browser was re-cleared between all test sessions.

2. What We Aren't Looking At

This spring, Edmodo announced that they are allowing ads (Edmodo calls them "sponsored" or "promoted" content) to be displayed on their site. This post is not about Edmodo displaying ads on their site.

3. Displaying Ads versus Tracking

There is a big difference between displaying an ad and tracking users. When an ad is displayed, the actual ad can be understood as a visual indication of potential tracking.

However, users can be tracked without ads being immediately displayed. This type of tracking is largely invisible to end users, but this tracking sends a regular stream of data back to the data broker/ad network. This data includes, at minimum, the page a user is on, the precise time they are on it, the operating system and version, the IP address of the user, and the browser and version. All of this information is tied together via a common identifier. In many cases, the combination of technical factors about a user - device information and/or IP address - is adequate to identify, or come close to identifying, an individual. Because this information is all tied together with a common identifier, the probability of identifying an individual increases.
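The Python sketch below simplifies how a tracker could tie these data points together. It groups page views by cookie ID. It then attaches events that arrive without a cookie, as long as their technical attributes (IP address, operating system, browser) match a known profile. The event fields and the hash-based `fingerprint` helper are assumptions made for illustration. Real fingerprinting draws on many more signals.

```python
import hashlib
from collections import defaultdict

# Hypothetical events as a tracker might receive them (203.0.113.0/24 is a documentation range).
events = [
    {"cookie_id": "abc123", "page": "/home", "time": "09:01",
     "ip": "203.0.113.7", "os": "Windows 10", "browser": "Chrome 58"},
    {"cookie_id": "abc123", "page": "/assignments", "time": "09:04",
     "ip": "203.0.113.7", "os": "Windows 10", "browser": "Chrome 58"},
    {"cookie_id": None, "page": "/quiz", "time": "09:20",
     "ip": "203.0.113.7", "os": "Windows 10", "browser": "Chrome 58"},
]

def fingerprint(event):
    """Crude device fingerprint built from a few technical attributes (illustrative only)."""
    raw = "|".join([event["ip"], event["os"], event["browser"]])
    return hashlib.sha256(raw.encode()).hexdigest()[:12]

def build_profiles(events):
    """Group page views by cookie ID, then attach cookie-less events with a matching fingerprint."""
    profiles = defaultdict(list)
    fingerprint_to_cookie = {}
    for e in events:
        if e["cookie_id"]:
            profiles[e["cookie_id"]].append((e["time"], e["page"]))
            fingerprint_to_cookie[fingerprint(e)] = e["cookie_id"]
    for e in events:
        if not e["cookie_id"]:
            fp = fingerprint(e)
            key = fingerprint_to_cookie.get(fp, "fp:" + fp)
            profiles[key].append((e["time"], e["page"]))
    return dict(profiles)

print(build_profiles(events))
```

The cookie-less `/quiz` visit still ends up in the `abc123` profile, because its IP address, operating system, and browser match earlier events.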

Because of this, we treat the display of ads as a separate issue from tracking users. Both can be problematic, and ads can be displayed with or without user tracking. In this post, I focus only on mechanisms used to track users.

4. Tracking Teachers

Teachers are targeted by a range of ad trackers, as shown below. The teacher login occurs in line 175; we can observe multiple trackers getting called after login.

Tracking teachers

This is pretty standard ad tracking behavior, and we are not going to spend additional time on this, as the student tracking is more complicated. However, for educators using Edmodo, this is how your usage information is passed to data brokers when you are logged into the site working with students.

5. Tracking Students

In Edmodo, students are exposed to targeted ad tracking as follows. I will open with a brief description, and then follow that with a more detailed description that includes screenshots from the proxy logs used to capture traffic.

5.1 Brief description

  • A. When a student logs in to Edmodo, Edmodo allows Google's Doubleclick to set a tracking cookie.
  • B. While a student is logged in, there are additional calls to Doubleclick. These calls include information about the student's computer, and the page that they are currently on.
  • C. When the student logs out of Edmodo, this triggers a call to Doubleclick.
  • D. In turn, this spawns two additional calls to ad networks. The ID value that is sent to Doubleclick is the same value that is set when the student logged in, and the referrer from Edmodo clearly identifies the user as a student.

5.2 Details

5.2.A. When a student logs in to Edmodo, Edmodo allows Google's Doubleclick to set a tracking cookie.

Setting a cookie on a student account at login

The login occurs in line 141. The call to Doubleclick occurs after login in line 160.

Setting a cookie value

In the above screenshot, Doubleclick sets a cookie in the student's browser with a unique ID. The test account in this writeup is a student in a fourth grade class, so the student would be well under 13. Edmodo allows teachers to specify the grade level of their courses, so in some cases Edmodo arguably has actual knowledge that a student is under 13.

Choosing a grade level in Edmodo

5.2.B. While a student is logged in, there are additional calls to Doubleclick. These calls include information about the student's computer, and the page that they are currently on.

Additional calls to Doubleclick

Each of these individual calls contains information about the student's path through the platform, which is shared with Doubleclick and tied to the tracking ID created in Step A.

5.2.C. When the student logs out of Edmodo, this triggers a call to Doubleclick.

Student logout

The logout occurs in Line 554. The calls to Doubleclick occur in Lines 561, 564, 571, and 573. These calls are discussed in more detail below.

5.2.D. In turn, this spawns two additional calls to ad networks.

Calls to multiple networks

The ID value that is sent to Doubleclick is the same value that is set when the student logged in, and the referrer from Edmodo clearly identifies the user as a student (note the user_type=student at the end of the URL).
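To check the same two things in your own captured traffic, a short Python sketch like this one can help. It tests whether a single ID value persists across the login and logout calls, and whether any referrer carries `user_type=student`. The hosts, the `id` parameter name, and the log structure are placeholders I made up. Only `user_type=student` comes from the observed referrer.

```python
from urllib.parse import urlsplit, parse_qs

# Placeholder rows standing in for proxy-log entries (hypothetical hosts and parameter names).
log = [
    {"step": "login", "url": "https://ads.example/pixel?id=XYZ789&ev=page",
     "referer": "https://lms.example/home"},
    {"step": "logout", "url": "https://ads.example/pixel?id=XYZ789&ev=page",
     "referer": "https://lms.example/logout?user_type=student"},
]

def query_value(url, name):
    """Return the first value of a query parameter, or None if it is absent."""
    values = parse_qs(urlsplit(url).query).get(name)
    return values[0] if values else None

# Does one ID value persist from login through logout?
ids = {query_value(row["url"], "id") for row in log}
same_id_throughout = len(ids) == 1 and None not in ids

# Which calls carry a referrer that labels the user as a student?
flagged_as_student = [row["step"] for row in log
                      if query_value(row["referer"], "user_type") == "student"]

print(same_id_throughout, flagged_as_student)
```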

On the left hand side of the screenshot, you will notice a reference to "pubmatic" and "rubicon." These are two commonly used ad brokers: and

Calls are made to these two ad brokers based on the redirect observed above.




6. This Couldn't Happen Without Edmodo's Active Involvement

To see a little bit behind the mechanics here, we need to take a look at the source code on Edmodo's site. The screenshot below is taken from the page source, while logged in as a student user in a test fourth grade class.

Hardcoded Google IDs

Note the conversion ID that Edmodo has hardcoded into their web page. Then, we will take a look at the call that is made to Doubleclick after our test 4th grade student has logged in:

Google IDs sent over

The referrer here is the student's home page within Edmodo, and the call to Doubleclick includes the hardcoded value set by Edmodo.
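Here is a Python sketch of this check. It pulls a hardcoded ID out of the page source with a regular expression, then confirms that the same value appears in the path of an outgoing request. The variable name `google_conversion_id` is borrowed, as an assumption, from Google's older conversion-tracking snippet. The ID, the request URL, and the idea that the ID travels in the URL path are all placeholders, not values from Edmodo's page.

```python
import re
from urllib.parse import urlsplit

# Placeholder page source and request; the ID and URL are invented for illustration.
page_source = """
<script type="text/javascript">
  var google_conversion_id = 123456789;
</script>
"""
request_url = "https://ads.example/conversion/123456789/?fmt=3"

# Pull the hardcoded numeric ID out of the page source.
match = re.search(r"google_conversion_id\s*=\s*(\d+)", page_source)
hardcoded_id = match.group(1) if match else None

# Check whether that same value shows up as a segment of the outgoing request's path.
path_segments = urlsplit(request_url).path.split("/")
sent_in_request = hardcoded_id is not None and hardcoded_id in path_segments

print(hardcoded_id, sent_in_request)
```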

7. Conclusions

As documented in this post, the presence of ad trackers for both teachers and students can be observed when we inspect traffic via an intercepting proxy. Some obvious questions that come to mind are:

  1. How aware are teachers in the Edmodo community that they are being tracked by ad brokers permitted on the site by Edmodo?
  2. How aware are students, teachers, and parents that ad brokers can collect data on students while using Edmodo?
  3. How does the presence of ad trackers that push information about student use to data brokers improve student learning?
  4. Are Edmodo Ambassadors briefed on the student-level tracking that occurs within Edmodo? If not, why not?

An additional (and likely) possibility here is that not everyone within Edmodo is aware that this tracking is occurring. Companies are not monoliths, and few decisions within companies have the support and/or awareness of everyone in the company.

It is also possible that the student level tracking is the result of a technical error that did not get caught by a QA/testing process.

There are additional questions that can and should be asked, but in the interest of keeping a narrow focus, I will leave things here.

Ad Tracking on Kaiser Permanente's Patient Health Portal

4 min read

Last night, I logged onto the Kaiser Permanente patient health portal. I hadn't done this in a while.

I use a JavaScript blocker in my web browser. After logging into the site, I was very surprised to see a call to Google Ad Manager.

Call to Google Ads

This sparked my curiosity, so I decided to run the entire session through an intercepting proxy.

The intercepting proxy showed that Kaiser Permanente permits multiple ad trackers to collect data about people seeking health information from the Kaiser Permanente patient portal. To be clear, I was logged in to the portal - I was not browsing anonymously. The observed trackers specifically target logged in users.

In my very brief test, I observed the following trackers: Google Ad Services, WebTrends, Demdex, Omniture, and Doubleclick (which is part of Google). The screenshot below shows a subset of these trackers, taken from the intercepting proxy. I have saved the proxy logs in case it's ever necessary to review or verify them.

Trackers, after login

Kaiser's terms are very clear that they allow third party ad trackers to collect information about Kaiser patients who use the member health portal.

Their terms lack any details about any limits placed on how these third parties can use the data they collect from patients who have logged in to Kaiser's portal seeking health information. Specifically, the terms do not state that third parties who collect data from Kaiser's patient health portal are prohibited from enhancing or potentially re-identifying people within the data set. It's also worth noting that the "opt out" feature is completely ineffective.

However, even without enhancement or re-identification, basic information could help advertisers target or exploit users. If a person logs onto the Kaiser site four times in a week, that tells ad trackers a different story than a person who logs onto the site once a month.
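Here is a minimal Python sketch of that kind of frequency signal. It counts logins per ISO week. The patients and dates are hypothetical. The point is only that a simple count is enough to separate frequent visitors from occasional ones.

```python
from collections import Counter
from datetime import date

# Hypothetical portal login dates for two patients.
logins = {
    "patient_a": [date(2017, 3, 6), date(2017, 3, 7), date(2017, 3, 9), date(2017, 3, 10)],
    "patient_b": [date(2017, 2, 14), date(2017, 3, 14)],
}

def logins_per_iso_week(days):
    """Count logins in each (ISO year, ISO week number) bucket."""
    return Counter(tuple(d.isocalendar()[:2]) for d in days)

for patient, days in logins.items():
    busiest_week = max(logins_per_iso_week(days).values())
    print(patient, "max logins in one week:", busiest_week)
```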

Then, if that same person logs onto the Kaiser patient health portal and heads over to WebMD to look for additional information, data brokers can connect the same individual (via cookie values set on the Kaiser site) to both sites.

This ad tracking takes on an even more invasive and intrusive tone for parents who have linked a child's account, or for an adult who is managing health care for an aging parent or sick spouse or partner. Because Kaiser permits ad trackers on its health portal (or really, on our health portal), these intimate, highly personal moments are exposed to ad trackers and data brokers.

The opportunistic business models of data brokers are clearly documented. Packaging health information is good business for them. Data brokers know that people with health issues or concerns can be more vulnerable. As Frank Pasquale notes in this piece from 2014, data brokers create and sell multiple lists that target health-related issues:

They have created lists of victims of sexual assault, and lists of people with sexually transmitted diseases. Lists of people who have Alzheimer’s, dementia and AIDS. Lists of the impotent and the depressed.

Because of the language Kaiser has included in their terms, it is clear that Kaiser has made a very intentional decision: they are allowing patients looking for health information to be targeted by ad trackers. Kaiser should provide some additional clarity about this practice, and answering these questions would be a good start:

  • What third party trackers are allowed on the Kaiser Site to collect data about logged in Kaiser patients?
  • How long have these trackers been allowed on Kaiser's Health Portal?
  • For each tracker, what data are collected? How is this data used?
  • Why were these ad trackers chosen over other ad trackers?
  • How much revenue is generated for Kaiser via these ad trackers? What are the precise details of the business arrangement between the ad trackers and Kaiser Permanente?
  • How can a Kaiser patient who uses the portal review all of the data that Kaiser has allowed to be collected about them?
  • How does the placement of these ad trackers on the Kaiser Permanente web site, that collect information about logged in users, improve patient outcomes?

I will be contacting Kaiser directly to share these concerns, and I will update this post and/or write follow up posts to share what I learn.

TRUSTe's Opt Out Is a Cynical Joke

1 min read

I've been meaning to write this out for a while.

TRUSTe's "opt-out" option is a cynical joke. The page is here:

Here are a few reasons why this "solution" is worthless.

  • this opt-out "solution" doesn't stop data collection, it just stops the display of ads.
  • participation in this program isn't required; it's voluntary. Some vendors don't participate at all, while others participate but don't integrate with TRUSTe's platform. This doesn't work. From an end-user perspective, this means that opting out via TRUSTe's "solution" requires visiting multiple sites, just to trigger an opt-out mechanism that doesn't actually stop data collection.
  • the "solution" is cookie based, so whenever a person resets their cookies, even these pathetically limited opt out options go away.
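The cookie problem is easy to model. In this toy Python sketch, a hypothetical ad network checks for an opt-out cookie. It keeps collecting data either way, and the opt-out disappears as soon as the cookie jar is cleared. The `ToyBrowser` class, the `optout` cookie name, and the domain are invented for illustration. They do not reflect TRUSTe's actual implementation.

```python
class ToyBrowser:
    """Toy cookie jar mapping domain -> {cookie name: value}."""

    def __init__(self):
        self.cookies = {}

    def set_cookie(self, domain, name, value):
        self.cookies.setdefault(domain, {})[name] = value

    def clear_cookies(self):
        self.cookies.clear()


def tracker_behavior(browser, domain):
    """Hypothetical ad network honoring a cookie-based opt-out: data collection continues,
    and only targeted ads stop, and only while the opt-out cookie is present."""
    opted_out = browser.cookies.get(domain, {}).get("optout") == "1"
    return {"collects_data": True, "targeted_ads": not opted_out}


browser = ToyBrowser()
browser.set_cookie("ads.example", "optout", "1")
print(tracker_behavior(browser, "ads.example"))  # {'collects_data': True, 'targeted_ads': False}
browser.clear_cookies()
print(tracker_behavior(browser, "ads.example"))  # {'collects_data': True, 'targeted_ads': True}
```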

An actual solution is using a JavaScript blocker, and/or uBlock Origin and Privacy Badger.

When an industry-backed "solution" is this toothless, it creates the distinct impression that industry is phoning this in. Over the last few years, the FTC and the New York Attorney General have seen some problems here as well.

BuzzFeed and Methods for Tracking the Trackers; or This Is Hard, Chapter 9674

7 min read

For the last several months, Kris Shaffer and I have been working together on tracking news sites, partisan sites, and hate sites, and their relative popularity on social media. We have also been looking at the advertising and tracking technology used on these sites in an effort to understand how these sites generate revenue. Based on our research, with initial summaries published between February and late March, 2017, we concluded that ad tech and tracking allows misleading news and hate speech to generate revenue.

Kris has three posts on the subject:

I published this piece:

We have been continuing this work because, while our early research showed some significant and interesting patterns, these issues are complex, and we want to be thorough.

Fortunately, there are other people doing similar work. This BuzzFeed article published in early April looks at very similar details to what Kris and I have been researching, and reaches some similar conclusions. However, when reviewing the data behind the BuzzFeed work, I noticed some anomalies that appear to be related to the methodology used to collect the data supporting the BuzzFeed piece.

At the outset, I want to highlight that this conversation wouldn't be possible if all of us weren't describing our methods. While the methodology of the BuzzFeed piece omits some essential details, the overall conclusions still hold up. The need to counter misinformation and the business models that make misinformation profitable are universally recognized, and the more people we have looking at these details, the better.

The more we credit the range of work happening in this space, the better. One paper I hadn't seen until yesterday was this study from Mezzobit. I will definitely be reaching out to look at this service. I have also benefited from being able to talk with and learn from David Carroll, Chris Gilliard, Jeff Graham, and Girard Kelly, among others.

But, returning to the BuzzFeed story, this post will look at 3 main concerns: the methodology, the focus on display ads versus the larger ecosystem, and how BuzzFeed's adtech practices compare to the companies they study. I have additional questions on the use of as a tool to track adtech, but a detailed discussion of that topic is outside the scope of this post.

Methodology (Ghostery-based versus intercepting proxy)

Our methodology in studying trackers is pretty straightforward. We use OWASP ZAP (an intercepting proxy) to capture activity when we visit a site. Then, we export all URLs from the session, which is core functionality in the proxy. Then, we use tldextract to break these URLs down into their component pieces to make them easier to study. This gives us a precise (albeit labor intensive) view into what trackers are placed on what sites.
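For readers who want to try something similar, here is a Python sketch of the step that breaks URLs into their component pieces. To keep it self-contained, it uses the standard library's `urllib.parse.urlsplit` with a naive last-two-labels rule in place of tldextract. tldextract relies on the Public Suffix List and handles suffixes like `.co.uk` correctly. The URLs are hypothetical stand-ins for a proxy export.

```python
from collections import Counter
from urllib.parse import urlsplit

# Hypothetical URLs standing in for an export from the proxy session.
urls = [
    "https://www.news.example/article/123",
    "https://cdn.news.example/static/app.js",
    "https://pixel.tracker.example/p?uid=42",
    "https://sync.tracker.example/match?partner=other",
    "https://ads.other.example/bid",
]

def naive_registered_domain(host):
    """Keep the last two labels of a hostname.

    Simplification: this is wrong for multi-part suffixes such as .co.uk, which is
    why a Public Suffix List-based library like tldextract is preferable in practice."""
    return ".".join(host.split(".")[-2:])

def breakdown(url):
    """Split a URL into host, approximate registered domain, path, and query."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    return {"host": host, "domain": naive_registered_domain(host),
            "path": parts.path, "query": parts.query}

counts = Counter(breakdown(u)["domain"] for u in urls)
print(counts.most_common())
```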

There are multiple ways to get this view, each with their own advantages and drawbacks. The BuzzFeed methodology uses a web-based tool:

Liliana Bounegru, a co-investigator on the upcoming A Field Guide To Fake News, used the Tracker Tracker tool to extract ad trackers currently present on the homepage and one article page of each of these sites. Some sites on the list are no longer active, so those were discarded in the analysis. Bounegru then used the Wayback Machine to look for archived versions of the homepage and an article page for each of these sites prior to November 2016. In the end, we identified 51 sites that had trackers on their archived pages and were still online in March 2017.

The tracker tool used to drive the BuzzFeed article is a web-based tool that appears to be based on Ghostery. While its output is informative, it's not precise enough to be considered complete. It's still a useful tool because it's going to be imprecise in consistent ways, but the imprecision can lead to a lack of necessary detail.

As an example, the BuzzFeed article mentions multiple sites where they were unable to identify the source of some ads and their associated ad networks.

The networks serving ads on the pages were collected into a spreadsheet. In some cases, we were not able to identify the provider responsible for pop-unders that were present on several sites. We noted that in the spreadsheet.

Using an intercepting proxy, identifying the source of the pop-under ads is pretty straightforward. We ran a test on TMZWorldStarNews, one of the sites identified as having unidentified pop-under ads. The full archive of the BuzzFeed data set is available on Github.

In our review, the url of the pop-under was

When we look at the URL, it contains the string "?utm_source=advertisecom" - and is a known ad network. When we take a deeper look into the proxy logs, we can see the full set of popunders that will be triggered by this provider, along with the affiliated urls used to deliver content. In tracking ads, affiliating domains with specific providers is both important and difficult to do. Using an intercepting proxy helps give a clearer view of the actual traffic, which helps make these connections.
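Filtering a proxy export for a known marker like this takes only a few lines of Python. The URLs below are placeholders, not the ones we captured. Matching on `utm_source` is also just one clue. Affiliated domains that don't carry the parameter still have to be found by searching the logs by hand.

```python
from urllib.parse import urlsplit, parse_qs

# Placeholder URLs; the real pop-under URLs from the test are not reproduced here.
logged_urls = [
    "https://popunder-a.example/landing?utm_source=advertisecom&cid=1",
    "https://popunder-b.example/offer?utm_source=advertisecom",
    "https://unrelated.example/page?utm_source=newsletter",
]

def hosts_from_source(urls, source):
    """Return the hosts of URLs whose utm_source query parameter equals `source`."""
    hosts = []
    for url in urls:
        parts = urlsplit(url)
        if source in parse_qs(parts.query).get("utm_source", []):
            hosts.append(parts.hostname)
    return hosts

print(hosts_from_source(logged_urls, "advertisecom"))
```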

Display ads versus trackers/advertising ecosystem

The BuzzFeed article appears to focus on which ads get displayed, rather than on the larger tracking ecosystem.

In order to determine the ads currently running on fake news sites, Silverman visited 76 active fake news sites without an ad blocker, and with the Ghostery browser plugin enabled. (Ghostery identifies which ad trackers are active on a given webpage, and is also used in the Tracker Tracker tool.) For each site he visited the homepage and at least one article page to examine the ads.

However, focusing solely on the display of ads omits the larger ecosystem of vendors that track users. For TMZWorldStarNews, for example, the BuzzFeed dataset doesn't identify any trackers.

Using an intercepting proxy, we observe nearly 800 different calls to several hundred distinct URLs while visiting the homepage and a single article on TMZWorldStarNews. Scores of these distinct URLs belong to ad trackers. Each of these ad trackers gets data on users, and many of them appear to be affiliated because they pass cookie IDs to one another. These affiliations are visible via an intercepting proxy, although spotting them requires some detailed searches through the proxy logs.
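The counting, and a first pass at those searches, can be sketched in Python. This is a minimal sketch under stated assumptions: the proxy log has been reduced to a list of request URLs, and "passing cookie IDs" is approximated by a heuristic, namely a long query-string value that shows up on requests to two or more different hosts. The function names, the `min_len` threshold, and the sample URLs are all hypothetical. Real ID syncing can also travel through cookies, redirects, or request bodies, which this sketch does not inspect.

```python
from collections import defaultdict
from urllib.parse import urlsplit, parse_qsl


def summarize(urls):
    """Return (total calls, distinct URLs, distinct hosts) for a list of request URLs."""
    hosts = {urlsplit(u).hostname for u in urls}
    return len(urls), len(set(urls)), len(hosts)


def shared_id_candidates(urls, min_len=8):
    """Find query-string values that appear on requests to two or more hosts.

    Heuristic only: a long, opaque value seen on several unrelated hosts
    suggests (but does not prove) that those hosts are exchanging a user ID.
    """
    hosts_by_value = defaultdict(set)
    for url in urls:
        parts = urlsplit(url)
        for _name, value in parse_qsl(parts.query):
            if len(value) >= min_len:  # skip short values like "1" or "en"
                hosts_by_value[value].add(parts.hostname)
    return {value: hosts for value, hosts in hosts_by_value.items() if len(hosts) > 1}


if __name__ == "__main__":
    # Placeholder log entries for illustration only.
    calls = [
        "https://tracker-a.example/pixel?uid=abc123def456",
        "https://tracker-b.example/sync?partner_uid=abc123def456",
        "https://tracker-a.example/pixel?uid=abc123def456",
        "https://cdn.example/app.js?v=2",
    ]
    print(summarize(calls))
    print(shared_id_candidates(calls))
```

On the sample above, `summarize` reports 4 calls, 3 distinct URLs, and 3 hosts, and `shared_id_candidates` flags `abc123def456` as shared by the two tracker hosts. Each flagged value is a lead to check in the raw logs, not a confirmed affiliation.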

What does BuzzFeed do?

Another interesting question we encountered in our study of ad tracking is how more mainstream sites track their visitors and deliver ads. It's one thing to say that ad networks will indiscriminately sell to misinformation sites, but it's another thing entirely when mainstream sites continue to work with ad tech vendors who will sell to anyone. If we look at the web through the lens of ad tech, many websites with very different content overlap significantly in the ad tech they use.

A quick glance at the ad tech used on BuzzFeed shows some overlap with what we observed on TMZWorldStarNews. Both sites make calls to the third-party sites/ad trackers listed below:


BuzzFeed is not alone here. As we observed earlier, other mainstream sites use the same ad tech as highly partisan or misinformation sites.
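Checking for this kind of overlap comes down to a set intersection over the third-party hosts each site contacts. The Python sketch below assumes we already have the request URLs from one visit to each site; the function names, site domains, and tracker hosts in the example are placeholders, and "third party" is simplified to "any host that is not the site's own domain or a subdomain of it". Because a single ad tech company can serve from several domains, host-level overlap like this can understate how much two sites actually share.

```python
from urllib.parse import urlsplit


def third_party_hosts(urls, site_domain):
    """Hosts contacted during a visit, excluding the site's own domain and its subdomains."""
    hosts = set()
    for url in urls:
        host = urlsplit(url).hostname
        if host is None:
            continue
        if host == site_domain or host.endswith("." + site_domain):
            continue  # first-party request
        hosts.add(host)
    return hosts


def shared_third_parties(visits):
    """Intersect third-party hosts across sites.

    `visits` maps a site's domain to the list of request URLs seen on it
    (hypothetical input format).
    """
    host_sets = [third_party_hosts(urls, domain) for domain, urls in visits.items()]
    return set.intersection(*host_sets) if host_sets else set()


if __name__ == "__main__":
    # Placeholder domains, not the real sites or trackers.
    visits = {
        "news-a.example": [
            "https://news-a.example/",
            "https://static.news-a.example/app.js",
            "https://tracker-x.example/pixel",
            "https://ads-y.example/bid",
        ],
        "news-b.example": [
            "https://www.news-b.example/article",
            "https://tracker-x.example/pixel",
            "https://cdn-z.example/lib.js",
        ],
    }
    print(sorted(shared_third_parties(visits)))
```

Run as-is, this prints `['tracker-x.example']`. Grouping hosts by the company that operates them, using affiliations found in the proxy logs, would turn this host-level comparison into a vendor-level one.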

How can we expect ad trackers to heed calls for increased responsibility when mainstream news organizations continue to give money and user data to companies that support misinformation?


Tracking ad trackers is far more complex than it should be - and getting the details right is essential in mapping the terrain. Ad tracking - and the profiling it requires - is central to making misinformation profitable. It also lays the foundation for increased information asymmetry, which is a key element in maintaining existing power structures. We need to make the entire ad tracking system easier to understand. It's difficult, complicated work - and that's another reason why people who care about getting this right need to work together.

ISPs Can Continue to Collect and Sell All of Our Browsing History, and We'll Never Know


Yesterday, on March 28, 2017, Congressional Republicans gave a huge gift to Internet Service Providers (ISPs) by killing rules that would have prevented them from selling our browsing history. As a result, ISPs - companies like Comcast, Verizon, Qwest (aka CenturyLink), AT&T, and others - can continue to sell information about how we browse the web. Everything we do on the web - a young child looking for information about dinosaurs, a teen curious about their sexual identity, a person reading the news, a parent looking for medical information, a person browsing pornography - all of these activities, done inside people's homes, can continue to be tracked and sold to anyone, without our knowledge or consent.

We need to pause here - this is actually as bad as it sounds. If you have kids in your house, their browsing activity can be bundled and sold by your ISP. As their parent, you will never be told that any sale took place, who the buyers are, or how they are using that information. So the next time your kids have a playdate, if their friends connect to your internet, your ISP is profiting from the playdate. Thanks to the actions of Congressional Republicans, this is universal across the US: every ISP in the US can continue to do this.

However, this isn't the worst of it. An element that has gone largely undiscussed is how this rule change puts ISPs in a commanding position when it comes to connecting online and offline behavior. Connecting online and offline identity is a leading concern for advertisers - and rest assured, they are looking at this through a racial lens as well. For people who connect to the internet via both a phone and a computer, our ISP can identify both devices as belonging to a specific home. This is incredibly valuable information - and because it can be shared and sold indiscriminately, it allows a solid connection to be made between an individual, their home address, their computer, and their phone.

In practical terms, this puts ISPs in a position to track our physical location over time, and to predict our location in real time. For those of us who carry smartphones, our phones connect to multiple ISPs over the course of every day - from different cell towers, to coffee shop wireless, to library wireless, to connectivity provided by our school or workplace. If our ISP shares our device information, we can be precisely identified across a range of locations, and a record of our movements can be collected and stored. Location data has been shown to be a strong predictor of identity, and for our ISPs, location data is just a small part of their overall data set.

At this point, the only real protection is to use a VPN. However, many VPNs protect only a single device; protecting a whole home requires setting up a VPN on every device, or configuring a router to connect to the internet via a VPN. While setting up a router to connect via a VPN is not enormously complicated, it's a significant technical barrier that will be beyond the reach of many consumers.

It's also worth noting that VPNs will only be a realistic alternative if our ISPs don't throttle VPN connections and reduce their speed to a crawl. Because of actions taken by the FCC under Tom Wheeler, we currently have some protections, but Republicans are also looking to kill net neutrality. That would be bad for a variety of reasons, and it would also be another blow to personal privacy.