By Jennifer Stromer-Galley
If you regularly visit the dashboard, read the blog, or subscribe to the newsletter, you may have noticed some issues with the site over the last few weeks. Each week, we compare the overall spend data that Facebook reports with what we report on the website, to verify as best we can the accuracy of the collection and classification pipelines that the dashboard is built on.
On Monday, October 5th, one of our team members found a significant discrepancy between the amount of money Facebook reported that Biden and Trump had spent in the past seven days and the amount we were displaying on the website. While our figure for Trump was fairly accurate, our website was reporting roughly one-third less ad spending for Biden than Facebook was. We thus began a mad scramble to determine the source of the error.
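For readers curious what a weekly check of this kind might look like, here is a minimal sketch in Python. It is not our actual code: the function names, the 5% tolerance, and the dollar figures are all hypothetical and chosen only for illustration.

```python
def spend_gap(reported: float, displayed: float) -> float:
    """Fraction of the reported spend that is missing from the displayed total.

    Positive means the website shows less than the platform reports.
    """
    if reported <= 0:
        raise ValueError("reported spend must be positive")
    return (reported - displayed) / reported


def flag_discrepancies(reported: dict, displayed: dict, tolerance: float = 0.05) -> dict:
    """Return {advertiser: gap} for every advertiser whose gap exceeds the tolerance.

    The 5% default tolerance is an assumption made for this sketch.
    """
    flagged = {}
    for name, reported_spend in reported.items():
        # An advertiser missing from the website counts as zero displayed spend.
        gap = spend_gap(reported_spend, displayed.get(name, 0.0))
        if abs(gap) > tolerance:
            flagged[name] = gap
    return flagged


# Made-up seven-day totals, not real campaign data.
reported = {"Candidate A": 300.0, "Candidate B": 300.0}
displayed = {"Candidate A": 297.0, "Candidate B": 200.0}

flags = flag_discrepancies(reported, displayed)
for name, gap in flags.items():
    print(f"{name}: website shows {gap:.0%} less than reported")
```

With these made-up numbers, only Candidate B is flagged, because the website total is about one-third below the reported total.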
In the end, two issues were identified with our collections and classification processes.
- While the Trump campaign tends to run many ads on short timelines, the Biden campaign runs many ads on short timelines and also makes some big ad buys with longer timelines. Our collectors were not reaching far enough back in time to look up older ads and update their overall spending.
- When we built our pipeline in January to classify new ads and push the data to a Mongo database for the website to access, we built in an assumption: all ads would have ad creative body text. That turned out to be a bad assumption. Although all or nearly all of Trump’s ads have creative body text, some of Biden’s biggest ad buys consist only of an image. With no text for our classifiers to tag, the metadata for those ads, including the spend and demographic data, was never added to the Mongo database.
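Both issues can be illustrated with a small sketch in Python. This is not our pipeline: the field names (`start`, `body`, `spend`, `demographics`), the placeholder `classify` function, and the sample ads are all hypothetical, and the records are plain dictionaries rather than real database writes.

```python
from datetime import date, timedelta


def ads_started_since(ads: list, today: date, lookback_days: int) -> list:
    """Select the ads whose start date falls inside the lookback window."""
    cutoff = today - timedelta(days=lookback_days)
    return [ad for ad in ads if ad["start"] >= cutoff]


def classify(text: str) -> list:
    """Hypothetical stand-in for the real classifiers."""
    return ["mentions_vote"] if "vote" in text.lower() else ["other"]


def build_record(ad: dict) -> dict:
    """Build the record to store, whether or not the ad has body text.

    An image-only ad gets an empty label list, but its spend and
    demographic data are still kept.
    """
    body = (ad.get("body") or "").strip()
    return {
        "id": ad["id"],
        "spend": ad["spend"],
        "demographics": ad.get("demographics", {}),
        "labels": classify(body) if body else [],
        "has_body_text": bool(body),
    }


# Made-up ads: one short-running text ad, one long-running image-only ad.
ads = [
    {"id": "short", "start": date(2020, 10, 1), "spend": 100.0, "body": "Vote early"},
    {"id": "long", "start": date(2020, 8, 1), "spend": 5000.0, "body": None},
]
today = date(2020, 10, 5)

narrow = ads_started_since(ads, today, lookback_days=7)   # misses the long-running ad
wide = ads_started_since(ads, today, lookback_days=90)    # picks up both ads

records = [build_record(ad) for ad in wide]
total_spend = sum(record["spend"] for record in records)
```

In this toy example, the seven-day window finds only the short-running ad, while the wider window also finds the large image-only buy. Because `build_record` does not require body text, that ad's spend still counts toward the total.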
After lots of sweat, swearing, and a few tears (all from yours truly), we now have the website aligned more closely with what Facebook reports for Trump and Biden ad spending.
We have removed incorrect blog posts and are working to correct The Conversation article that reports inaccurate spend data. We will repost older blog posts with corrected spend data as soon as possible, while we also work to share new reports on analyses we are conducting.
Our goal is always to be transparent with the research and journalism community. We regret any errors that we have caused, and will always be forthcoming when we see troubles with our data.