Artificial intelligence is truly a black box.
Journalists are reporting on a phenomenon that is hard to explain, even for experts in the field. Compounding matters, most of the important conversations are taking place behind closed doors. Many of the major advances in the field are proprietary, and the public is overly reliant on one-sided corporate press releases that maximise shareholder benefit and play down risks. Meanwhile, publicly available information is heavily academic, requiring advanced knowledge of the field to decipher anything beyond the executive summary.
Why It Is Crucial That Journalists Understand AI
Journalists need to develop a fluency in AI before it disrupts both our newsrooms and our society. We have to get better at explaining this technology that impacts nearly all aspects of our lives – from determining what movies appear in our Netflix queue to whether we qualify for loans. But to develop fluency, one needs to have a solid understanding of the infrastructure that makes artificial intelligence work – the datasets that feed the systems and where this information is coming from.
For one thing, datasets and how they are collected, used and compromised can influence the results of any system. This seems like an obvious point. But even a basic question – like “what information is in the training data for this AI model?” – can lead to a complex answer.
For instance, some of the most important datasets used for machine learning are composed of millions of images. Usually, a programmer can say where the data came from or which library was used to generate the results. But what information makes up that library? Until recently, this was difficult to answer.
Machine learning systems need a large number of training examples to work, so most libraries collect and compile information from a few massive repositories, like Google Images or Flickr. And while most maintainers try to ensure that the data is properly categorised, errors can occur at scale.
In 2015, Google had a widely publicised misstep when software engineer Jacky Alciné realised the Google Photos image recognition algorithms were tagging black people as “gorillas.” It is a horrific and racist association, but why would this happen in the first place? Most experts in the AI field knew why. There wasn’t some racist engineer causing mayhem behind the scenes. The system had been trained on a data set that contained more images of gorillas than of African Americans.
Trickier still is how to solve this problem: a 2018 follow-up piece from Wired showed that Google had employed a workaround that blocked its image recognition systems from identifying gorillas, but still hadn’t fixed the core problem.
And remember, Google owns this dataset, which is powered by users uploading their own photos. And that was just one example that was caught and publicised.
These kinds of issues are more common than we think. To help surface them, the Google People + AI Research team created a machine learning data visualisation tool called Facets. Now open source, Facets lets users explore a dataset and see a clearer picture of the information it contains. Researchers Fernanda Viégas and Martin Wattenberg explained the genius of the system, and what it can reveal, during a MoMA R&D salon.
With Facets, the errors and biases in a dataset are made visible. The first few examples of bias are benign. For example, airplanes are overwhelmingly blue, which may confuse a system asked to recognise a red or silver flying object as the same kind of thing. Blank spaces, errors and places where humans and computers disagree on categorisation are also easily seen. But some bias isn’t so easy to correct, and can be quite damaging. At the same salon, noted academic and researcher Kate Crawford linked the underlying bias in photography and in news – for example, why a dataset of the most labeled faces on the Web is 78 per cent white men – to categorisation errors in AI.
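To make the idea concrete, here is a minimal sketch of the kinds of checks described above: how lopsided the labels are, how many entries are blank, and where a human and a computer disagree. It is not Facets and does not use its interface; it is plain Python run on a handful of invented records, and the field names "human" and "model" and the audit function are assumptions made purely for illustration.

```python
from collections import Counter

# Hypothetical toy records, invented for illustration only.
# "human" is the label a person assigned (None means it was left blank);
# "model" is the label a computer predicted.
records = [
    {"human": "airplane", "model": "airplane"},
    {"human": "airplane", "model": "bird"},
    {"human": "airplane", "model": "airplane"},
    {"human": "bird", "model": "bird"},
    {"human": None, "model": "airplane"},
    {"human": "airplane", "model": "airplane"},
]


def audit(records):
    """Summarise label balance, blank labels and human/model disagreements."""
    labelled = [r for r in records if r["human"] is not None]
    counts = Counter(r["human"] for r in labelled)
    total = len(labelled)
    # Share of each label among the records that actually have a human label.
    shares = {label: n / total for label, n in counts.items()} if total else {}
    blanks = len(records) - total
    disagreements = [r for r in labelled if r["human"] != r["model"]]
    return {"shares": shares, "blanks": blanks, "disagreements": disagreements}


report = audit(records)
for label, share in sorted(report["shares"].items(), key=lambda kv: -kv[1]):
    print(f"{label}: {share:.0%}")
print("blank labels:", report["blanks"])
print("human/model disagreements:", len(report["disagreements"]))
```

Even on six records the pattern is visible: one label dominates, one entry has no label at all, and one image is categorised differently by the person and the machine. A reporter does not need to write this code, but asking a developer for these three numbers is a reasonable first question about any training dataset.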
No Simple Answers
There are no simple answers in reporting on or understanding artificial intelligence, and these examples just scratch the surface of the larger implications of biased systems. Many technology and data journalists have invested in understanding programming principles. I’m going to suggest that all journalists begin studying how computing and programming work on a basic level.
One does not need to become a programmer or even gain proficiency in a language like Python to report on AI. Just looking at how developers approach solving problems will greatly aid the understanding of how these systems are built and designed. This will then improve our framing of these issues in reporting and our understanding of how these systems will eventually impact our newsrooms.
Because journalists do not understand the basics of how artificial intelligence works, we are prone to missing the larger picture or oversensationalising our stories. Rachel Thomas, co-founder of Fast.ai, recently took the Harvard Business Review to task and shared lessons applicable to how journalists think about AI:
“The media often frames advances in AI through a lens of humans vs. machines: who is the champion at X task. This framework is both inaccurate as to how most algorithms are used, as well as a very limited way to think about AI. In all cases, algorithms have a human component, in terms of who gathers the data (and what biases they have), which design decisions are made, how they are implemented, how results are used to make decisions, the understanding various stakeholders have of correct uses and limitations of the algorithm, and so on.”
So much in understanding machine learning and artificial intelligence is about the framing. If you ask better questions and set better parameters, you receive a better result. Journalists are trained to examine the frameworks. We do this as a matter of course in our work. But for us to truly inform the public on the full potential of the AI revolution, we need to be working from a stronger knowledge base.
The deadline is also quickly approaching for the AI Ethics Initiative’s challenge, which offers a share of $750,000 for ideas that shape the impact artificial intelligence (AI) has on the field of news and information. Anyone around the world can submit an idea in one of four categories that address pressing issues.