Artificial Intelligence or Artificial Stupidity? The Difference Rests On Data Quality

Dennis Kessler, Head of Data Governance, European Investment Bank (EIB)

A squashed fly drops into a police printer, causing the arrest warrant being issued to name ‘Buttle,’ a law-abiding family man, instead of ‘Tuttle,’ a rogue plumber. So begins ‘Brazil,’ Terry Gilliam’s 1985 dystopian film featuring bureaucracy working beyond anyone’s control. Well-intentioned attempts to correct the administrative error and free the wrongly arrested man lead inevitably to torture, madness, and death.

Basic data-quality controls that cross-checked the name against other key data fields could have revealed the error before too much damage was done. But blind faith in processes and procedures meant that salvation came too late.
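As a rough illustration of the kind of cross-field check described above, the following Python sketch compares the surname on a warrant with a second key field. The `Warrant` class, the `citizen_id` field and the `REGISTRY` lookup are all hypothetical and invented for this example; nothing in the film specifies such a system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Warrant:
    surname: str
    citizen_id: str  # hypothetical second key field used for cross-checking


# Hypothetical reference registry mapping citizen IDs to surnames.
REGISTRY = {
    "A-1001": "Tuttle",
    "B-2002": "Buttle",
}


def cross_validate(warrant: Warrant, registry: dict[str, str]) -> list[str]:
    """Return the problems found when key fields are checked against the registry."""
    problems = []
    expected = registry.get(warrant.citizen_id)
    if expected is None:
        problems.append(f"unknown citizen_id {warrant.citizen_id!r}")
    elif expected != warrant.surname:
        problems.append(
            f"surname {warrant.surname!r} does not match registry entry {expected!r}"
        )
    return problems


# The misprinted warrant carries Tuttle's ID but Buttle's name.
misprinted = Warrant(surname="Buttle", citizen_id="A-1001")
print(cross_validate(misprinted, REGISTRY))
```

A single mismatch between two fields that should agree is enough to hold the warrant for human review rather than act on it.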

Such are the fears surrounding autonomous artificial intelligence (AI) and machine learning (ML) systems: that they will wrest control from humans and operate beyond anyone’s control.

Growing awareness of the presence of algorithm-based decision-making systems in everyday life and commerce explains why books such as Homo Deus by Yuval Noah Harari have become runaway best-sellers.

Yet while algorithms, AI and ML attract attention and investment, the need to ensure quality data is often overlooked. Even finely calibrated and carefully tested models can be tainted by faulty data.

A poorly judged Amazon book recommendation or Netflix film recommendation will not harm quality of life or well-being. The stakes are different for a loan request, a job application or even a medical diagnosis, where biased decisions from tainted data can be highly damaging.

Certainly AI can help to identify and reduce the bias and prejudices inherent in human-based decision-making—though sometimes the opposite can be true.

In May 2016, an investigation found that algorithms already widely used by judges to help assess the risk of reoffending erroneously flagged black defendants as future criminals almost twice as often as white defendants.

Conversely, white defendants were classified as low risk more often than black defendants.

Those disparities could not be explained by the defendants’ prior crimes or the type of crimes for which they had been arrested. Ultimately, the data on which the methodology was based were skewed and, in some cases, not even properly understood.
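One common way to surface a disparity of this kind is to compare false positive rates across groups: among people who did not go on to reoffend, what share had been flagged as high risk? The sketch below assumes a simple record layout of `(group, flagged_high_risk, reoffended)` and uses a handful of synthetic records made up purely for illustration; it is not the investigation's method or data.

```python
from collections import defaultdict


def false_positive_rates(records):
    """Compute, per group, the share flagged high risk among those who did not reoffend.

    records: iterable of (group, flagged_high_risk, reoffended) tuples.
    """
    flagged = defaultdict(int)
    negatives = defaultdict(int)
    for group, flagged_high_risk, reoffended in records:
        if not reoffended:  # only non-reoffenders can be false positives
            negatives[group] += 1
            if flagged_high_risk:
                flagged[group] += 1
    return {group: flagged[group] / negatives[group] for group in negatives}


# Synthetic toy records, invented for this example (not real data).
toy_records = [
    ("group_a", True, False),
    ("group_a", True, False),
    ("group_a", False, False),
    ("group_a", False, False),
    ("group_a", True, True),
    ("group_b", True, False),
    ("group_b", False, False),
    ("group_b", False, False),
    ("group_b", False, False),
    ("group_b", False, True),
]

rates = false_positive_rates(toy_records)
print(rates)
```

A model can look accurate overall and still show very different error rates between groups, which is why a per-group breakdown like this is worth running before a system is trusted.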

Ironically, the system had been introduced to try to reduce the bias inherent in sentencing by judges and in release decisions made by parole boards.

The algorithms driving such applications are expected to be self-learning: they improve after errors are detected, and hidden bias is found and purged from the model over time in a virtuous cycle.

Yet racial bias has even been detected in Google’s revered search algorithms. In 2013, Latanya Sweeney, a well-known Harvard University professor and respected computer scientist, noticed that searches for her name and other African-American-sounding names were more likely to trigger the display of an ad for a background check on arrest records than searches for white-sounding names. She eventually concluded that Google’s automated ad-serving algorithm was detecting and adapting to user click-through actions occurring more frequently when the ads were linked to black-sounding names.

The algorithm seemed to be recalibrating itself based on a widespread pattern of discriminatory behaviour by millions of users—effectively learning from a flawed dataset.

Savvy organisations with good data governance standards and practices typically define data quality in terms of ‘dimensions’ such as completeness, accuracy, integrity, and so on. Data quality issues are classified by dimension and then used to identify trends, support root-cause analysis, and resolve issues.
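To make the idea of classifying issues by dimension concrete, here is a minimal Python sketch. The sample records, the three checks and the way each check is mapped to a dimension are assumptions chosen for illustration; real governance frameworks define their own dimensions and rules.

```python
from collections import Counter

# Illustrative records, invented for this example.
RECORDS = [
    {"id": 1, "name": "Ada", "age": 36, "country": "FR"},
    {"id": 2, "name": "", "age": 41, "country": "DE"},     # missing name
    {"id": 3, "name": "Lin", "age": -5, "country": "LU"},  # impossible age
    {"id": 4, "name": "Sam", "age": 29, "country": "XX"},  # unknown country code
]

# Hypothetical reference list used for the integrity check.
KNOWN_COUNTRIES = {"FR", "DE", "LU"}


def find_issues(record):
    """Return (dimension, description) pairs for every rule the record breaks."""
    issues = []
    if not record["name"]:
        issues.append(("completeness", "name is empty"))
    if not 0 <= record["age"] <= 120:
        issues.append(("accuracy", "age is outside a plausible range"))
    if record["country"] not in KNOWN_COUNTRIES:
        issues.append(("integrity", "country is not in the reference list"))
    return issues


def issues_by_dimension(records):
    """Count issues per dimension so that trends can be tracked over time."""
    counts = Counter()
    for record in records:
        for dimension, _description in find_issues(record):
            counts[dimension] += 1
    return counts


print(issues_by_dimension(RECORDS))
```

Tallying issues by dimension rather than by individual record is what lets a team spot trends and trace them back to a root cause, such as a single upstream system that routinely omits a field.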

The successful campaign to re-elect Barack Obama got data quality and analytics right in 2012. Supported by powerful ML tools, the data analytics team categorised known voters in terms of different metrics, which they then used to aim campaign ads at specific target groups. This drove the ‘Get out the vote’ campaign, which helped return Obama to the White House. Accurate data combined with the right analytical insights were key.

By contrast, Hillary Clinton’s election campaign in 2016 relied on faulty polling data that erroneously indicated a big lead in three states she ended up losing. The analytics led the campaign to prioritise other states instead, which contributed to Clinton’s narrow defeat.

Although the algorithms might have been solid, the data was fatally flawed.

More than elections are at stake when data analytics drive decisions of life and death. In 2018, the U.S. Department of Defense set up its Joint Artificial Intelligence Center (JAIC).

Having concentrated so far on object detection, classification, and tracking, the centre is now shifting its focus to developing AI-driven autonomous weapons of the kind central to films such as ‘Robocop’ and ‘The Terminator’ series.

A real-world version of the fictional ‘SkyNet’ becoming self-aware and triggering an AI-led robot apocalypse might not be imminent—whether or not it features indestructible cyborgs with Austrian accents.

Yet plenty of data scientists, AI specialists, and futurist entrepreneurs are clearly concerned about this rapidly developing trend. In 2012, ‘The Campaign to Stop Killer Robots’ was co-founded by Jody Williams, who won the Nobel Peace Prize for her global campaign to ban landmines.

Then in July 2018, 2,400 AI scientists and entrepreneurs signed a pledge refusing to participate in developing or manufacturing so-called lethal autonomous weapon systems (LAWS).

This led to the UN debating proposals to ban LAWS in Geneva in March 2019. The UN secretary-general, António Guterres, described such machines as ‘morally repugnant.’

Reflecting the inherent risks, the U.S. Department of Defense’s JAIC announced in September 2019 that it was recruiting an ‘ethicist’ to help guide the development and application of AI in weapons systems.

Doubts remain about not just algorithms but also the sensors and datasets, with many fearful that deployment of such weapons systems will lead to tragedy or even worse.

In his 2015 book, The Master Algorithm, author Pedro Domingos notes, “People worry that computers will get too smart and take over the world, but the real problem is that they are too stupid and they have already taken over the world.”

Developing AI applications without the right data makes no sense, given the age-old principle of ‘garbage in, garbage out.’

However finely honed the algorithms may be, data of pristine quality is needed to avoid ‘artificial stupidity’ and instead deliver the more intelligent kind.