People tend to simplify. So all types of machine-assisted thinking get dubbed “artificial intelligence,” or AI.
What we’re really talking about with AI or machine learning is an algorithmic approach to analysis: We’re going to look at a bunch of data, and artificial intelligence is going to help us figure some things out. It’s going to tell us things we would have a difficult time finding ourselves.
Here’s where machine learning comes in. It’s all about training a model. When we speak about a model, we’re really just talking about code: a coded algorithm, also called a statistical model. Data architects hear “data model” and think of tables, columns, and relationships. Here, in the simplest form, we’re talking about an algorithm trained on a specific set of data.
Here’s the famous example: If you feed the algorithm only pictures of cats, the only thing it knows about is cats. And when you ask it a question, the only thing it can say is, “Is it a cat or is it not a cat?” So what’s the outcome? In the data world, we would probably call it skew: The data is skewed in one direction because we gave it this limited set of data. When we talk about the ethics of data, one of the things we’re talking about is bias. The mere overemphasis on cats is a form of bias. Structurally, data can be biased because of how it relates inside of a machine learning algorithm.
Cats don’t appear to be a big issue, but what if the same structure is about employment? Look at unemployment data in the midst of the pandemic, say July 2020. If all of your predictions about the economy are based on that data set, the results will be biased. That’s why when it comes to machine learning, the more data we have, the better. If you add data sets for employment before and after the pandemic to the data from the middle of the pandemic, you’re going to end up with less bias.
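One rough way to spot this kind of skew before training is to count how the examples are spread across categories. What follows is a minimal sketch, not a full bias audit: the function names `label_shares` and `imbalance_ratio` and the sample labels are hypothetical, invented here for illustration.

```python
from collections import Counter


def label_shares(labels):
    """Return the fraction of examples that carry each label."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: count / total for label, count in counts.items()}


def imbalance_ratio(labels):
    """Ratio of the most common label count to the least common one.

    1.0 means perfectly balanced; larger values mean more skew.
    """
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())


# Hypothetical training labels: almost everything is a cat.
labels = ["cat"] * 95 + ["dog"] * 5
print(label_shares(labels))     # {'cat': 0.95, 'dog': 0.05}
print(imbalance_ratio(labels))  # 19.0
```

A high ratio doesn’t prove the resulting model will be unfair, but it is a cheap early warning that the data leans heavily in one direction.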
When we examine bias in machine learning, a data set is just the beginning. But it’s the first element you have to wrestle with. What data are you providing for the machine learning algorithms? Where did it come from? Are you getting a broad enough set of data that you’re not presupposing the outcome? That’s the base level set of questions.
When it comes to the breadth of a data set, where are you going to get it? Do you have to buy it? You might. In most cases, more information is better, but again, you still have to protect against this potential bias in the data.
Algorithms Are Coded Bias
Donald Farmer, a well-known industry thought leader, was asked whether an algorithm can have prejudice or bias. His answer, essentially, was “of course.” Algorithms are coded bias.1
So the first issue was potential bias in the data, but now we have the algorithm itself to confront. The algorithm can be biased depending on who wrote it and how. That’s where we enter a conversation about diversity in the workforce—in all the dimensions of diversity including experience, socio-economic background, ethnicity, race, and more. It’s not just one factor. It’s multidimensional like so many things in analytics, like so many things in life itself. How an algorithm is coded depends on the thought process of the person doing the coding. The algorithm reflects us, both our promise and our limits.
Amazon, for example, tried to automate the evaluation of job applicants, and it turned out that the AI was systematically placing women at a disadvantage.2 Women applying for a job wouldn’t make it past the first stage because the pertinent data set showed that mostly men had held these jobs previously. Even though we now know we want more diversity in our workforce and we’re trying to get more women into technology, algorithms like that one still reinforce the bias of previous generations.
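A disadvantage like that can be made visible by comparing how often each group gets past a screening step. The sketch below assumes you have a list of (group, advanced) outcomes; the data is invented and the function names are hypothetical. The 0.8 cutoff echoes the “four-fifths” rule of thumb used in US employment practice, and it is a screening heuristic, not a verdict.

```python
from collections import defaultdict


def selection_rates(decisions):
    """decisions: iterable of (group, advanced) pairs; advanced is a bool."""
    totals = defaultdict(int)
    selected = defaultdict(int)
    for group, advanced in decisions:
        totals[group] += 1
        if advanced:
            selected[group] += 1
    return {group: selected[group] / totals[group] for group in totals}


def disparate_impact_ratio(rates):
    """Lowest group selection rate divided by the highest one."""
    highest = max(rates.values())
    if highest == 0:
        return 1.0  # nobody was selected, so there is no gap to measure
    return min(rates.values()) / highest


# Hypothetical screening outcomes, invented for illustration only.
decisions = (
    [("men", True)] * 60 + [("men", False)] * 40
    + [("women", True)] * 30 + [("women", False)] * 70
)
rates = selection_rates(decisions)
ratio = disparate_impact_ratio(rates)
print(rates)        # {'men': 0.6, 'women': 0.3}
print(ratio)        # 0.5
print(ratio < 0.8)  # True: worth a closer look by a human reviewer
```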
So the challenge becomes building a team that can look at not only the algorithms, but also the data, conclusions, and results, in an equitable and fair-minded way, in an expansive fashion. It’s going to require investment in things like an AI risk framework, an evaluation framework, and some sort of an ethics program where people are actively engaged in resolving these issues.
Not everything is automatable, at least not right now. You are currently going to need a data governance council or data governance board: a group of people who are there to examine output, examine the process, and ensure balance. Furthermore, you’re going to need to exert human will over issues such as privacy. Ethics is not just eliminating bias. It includes all the other issues a company needs to engage with, for example, “My information is not for you to share, and that means not just my data, but how that data interacts with algorithms.”
So to establish an ethical AI process, we have to put some controls in place to make sure that we aren’t incurring additional risk and that we won’t inadvertently continue to promote a stereotype or violate a compliance regulation.
First Three Steps
A discussion of the ethics of AI is an essential undertaking for any company that wants to do the right thing and serve its customers. This discussion is not merely rhetorical. We have to have it. But we also need specific, well-crafted actions to create ethical movement in our companies. Let’s talk about three initial steps any of us could take to do just that.
1. Define ethics
First you have to define what you mean by ethics, and make sure the definition aligns with your corporate values. You must build awareness in your organization that ethics in AI is important. You have to build a framework of some sort to track and monitor compliance with your AI ethics standards, which means, just like all analytics, you have to define KPIs. How are you going to measure your success or failure in implementing ethical AI?
Then you have to express clearly how you arrived at your ethical profile. Make certain that you articulate this, along with the operational and governance aspects, to your stakeholders and to the public. Transparency is an important element that could affect your organization’s reputation as more people become concerned about their privacy and how their data is used.
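What might one of those KPIs look like in practice? Here is one possibility, sketched under stated assumptions: the KPI is the gap in true positive rate between two groups, and the organization has agreed on a largest acceptable gap. The `EthicsKpi` class, the 0.1 limit, and the sample labels are all hypothetical. Your own KPIs should follow from the definition of ethics you settled on.

```python
from dataclasses import dataclass


@dataclass
class EthicsKpi:
    name: str
    value: float  # latest measured value
    limit: float  # largest value the organization will accept

    def within_limit(self) -> bool:
        return self.value <= self.limit


def true_positive_rate(y_true, y_pred):
    """Share of actual positives that the model also predicted positive."""
    predictions_for_positives = [p for t, p in zip(y_true, y_pred) if t == 1]
    if not predictions_for_positives:
        raise ValueError("no positive examples to evaluate")
    return sum(predictions_for_positives) / len(predictions_for_positives)


# Hypothetical labels (1 = qualified) and model predictions for two groups.
group_a_true, group_a_pred = [1, 1, 1, 1, 0, 0], [1, 1, 1, 0, 0, 0]
group_b_true, group_b_pred = [1, 1, 1, 1, 0, 0], [1, 1, 0, 0, 0, 1]

tpr_gap = abs(
    true_positive_rate(group_a_true, group_a_pred)
    - true_positive_rate(group_b_true, group_b_pred)
)
kpi = EthicsKpi(name="true positive rate gap", value=tpr_gap, limit=0.1)
print(kpi.value, kpi.within_limit())  # 0.25 False
```

Whatever metric you pick, the point is the same as with any other KPI: a number, a limit, and someone accountable for watching it.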
2. Find unbiased data sets
Suppose you buy an AI program that’s been built and tested, and customers are raving about it and testifying to the results they get. Well, you still have to feed that AI data. Do you have an unbiased data set to feed it? In my mind, that’s the biggest obstacle to ethical AI. This requires good data management and data governance, against which you can profile the data. You must be able to trace the data lineage. How fresh is the data? Where did it come from? Is the profile of the data appropriate for feeding into the AI?
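Those lineage questions can be turned into simple checks that run before any data reaches the model. This is only a sketch: the `DatasetRecord` structure, its field names, and the sample values are hypothetical and stand in for whatever your data catalog or governance tool actually records.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class DatasetRecord:
    """Hypothetical lineage entry for one data set."""
    name: str
    source: str          # where the data came from
    collected_on: date   # when it was gathered
    rows: list           # list of dicts, one per record


def age_in_days(record, today):
    """How fresh is the data?"""
    return (today - record.collected_on).days


def missing_share(record, field):
    """Fraction of rows where `field` is absent or None."""
    if not record.rows:
        return 1.0
    missing = sum(1 for row in record.rows if row.get(field) is None)
    return missing / len(record.rows)


record = DatasetRecord(
    name="applicants_2020",
    source="internal HR export",
    collected_on=date(2020, 7, 1),
    rows=[{"age": 34}, {"age": None}, {"age": 51}, {}],
)
print(age_in_days(record, today=date(2021, 7, 1)))  # 365
print(missing_share(record, "age"))                 # 0.5
```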
Again, it is not only the data set itself that should be considered. It’s also the algorithm that runs that data set.
To guard against both biased data sets and biased algorithms, companies should consider the following factors when creating their own data sets.
- The source, variety, completeness, and appropriateness of the data set feeding the models. Does it contain a comprehensive cross section of the data about the subject being studied? If not, it may be possible to add synthetic data to make the data set more balanced. More data is usually better.
- Is the model-building team diverse and inclusive so that no one perspective (potentially introducing unconscious bias) can skew the results?
- Is there a review process to evaluate the output of the models to ensure the results are unbiased and the suggested actions are ethical? Is the outcome of the model in line with the business’s values and goals?
The same applies to third-party data sets. Basically, you want to do due diligence on both the data and the provider.
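To make the balancing idea from the first factor concrete, here is a deliberately crude stand-in for synthetic data: random oversampling, which simply duplicates rows from under-represented groups until every group matches the largest one. The `oversample` helper and the sample rows are hypothetical. Note that duplication adds no new information; real synthetic data generation is considerably more involved.

```python
import random
from collections import Counter


def oversample(rows, key, seed=0):
    """Duplicate rows from smaller groups until every group matches the largest."""
    rng = random.Random(seed)  # fixed seed so the result is repeatable
    groups = {}
    for row in rows:
        groups.setdefault(row[key], []).append(row)
    target = max(len(members) for members in groups.values())
    balanced = []
    for members in groups.values():
        balanced.extend(members)
        extra = target - len(members)
        balanced.extend(rng.choice(members) for _ in range(extra))
    return balanced


# Hypothetical employment records, mostly from the middle of the pandemic.
rows = [{"period": "pandemic"}] * 8 + [{"period": "pre-pandemic"}] * 2
balanced = oversample(rows, key="period")
counts = Counter(row["period"] for row in balanced)
print(counts["pandemic"], counts["pre-pandemic"])  # 8 8
```

Rebalancing like this treats a symptom. It is still worth asking why one group was under-represented in the first place.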
3. Make the relationship between your AI and customers ethical
As a potential customer, if I buy something, or even just look at a page online, you may want to offer me ways to pursue that apparent interest. But if you operate clumsily, you will end up offering inappropriate options, and firing what looks like sales offers at me. If I looked at a garage door once, you’re not necessarily helping me by sending a dozen ads for garage doors every time I sign on. More likely you are now annoying me because I did not ask for this. That’s definitely not customer friendly. And I may consider it an unethical use of my information.
It gets to an even more basic question. As a company, is it ethical to use the data I get from you even if my intent is to make your online experience smoother?
Here’s another example many people are familiar with: Cambridge Analytica.3 Pretty much everyone today agrees that was an unethical use of data. It was gathered without permission, it was uncompensated, it was harvested from Facebook profiles through a third-party app, and Cambridge Analytica used it on behalf of political campaigns. The people who provided the data would never have approved of this had they been aware of it.
This question of awareness brings us back around to where we started. We have to acknowledge and examine ethics in how we deal with data, algorithms, machine learning, and artificial intelligence. We cannot help our customers without understanding what they want. As customers, we cannot understand what companies ought to do, what they are capable of doing, if we are not aware of what we want and what we’re willing to pay for it.
In other words, we need to talk to ourselves and we need to talk to one another. This is an exciting opportunity to create an ethical machine learning environment together.
To learn more about the ethics of AI, explore these articles.
Ethics of Artificial Intelligence and Robotics: This is an academic article from the Stanford Encyclopedia of Philosophy on ethics in robotics and AI.
A Practical Guide to Building Ethical AI: This is an HBR article that proposes a methodology for companies to consider when operationalizing data and AI ethics.