Training AI in an Ethical Way



Over the past year, the business world has had a collective crash course in AI, developing a deeper understanding of how to use this new technology impactfully and responsibly. Near the end of 2023, a survey of digital marketers revealed 95% consider AI a marketing game-changer, and half believe AI will play a significant role in most marketing tasks.

This year, we can expect an even greater outpouring of practical AI use cases for business. Leaders in AI tech are jockeying for position in the marketplace: OpenAI has been teasing a leveling-up in AI sophistication via the upcoming GPT-5, and Google has announced that its Gemini platform will expand AI training data beyond text.

The data sets used to train AI are expanding. But larger data sets mean greater data privacy risks. One of the great AI challenges for 2024 will be learning to harness the power of data to produce positive outcomes, while also protecting consumer privacy and guarding against brand risk. Violating data privacy will trigger regulatory scrutiny, negative publicity, and a loss of users’ trust. Business leaders must take the time to develop ethical usage policies for AI – policies that will ensure consumers are delighted, rather than frustrated or threatened, by the introduction of AI tools into their daily lives.

Bigger can be better – but riskier

In reality, size in and of itself is not the most valuable characteristic of an AI model. When Common Sense Media rated 10 of the most popular AI tools on how they managed privacy, ethical use, transparency, safety, and impact, the tools trained on the broadest data sets carried the most risk, while those trained on the most selective data sets were the safest. Google’s Bard – powered by Gemini – scored 75% for privacy, while ChatGPT scored 48%.

These findings have significant implications for how businesses can and should implement AI tools. Businesses are well advised to begin their AI implementation journey with manageable data sets, where they can proceed with confidence about how the data was sourced and used in training, the risks of exposing personal information, and the impact of privacy regulations. Only when these initial, contained use cases have been successfully deployed should business leaders advocate for greater expansion.

What data sets should AI be trained on?

A broader range of training data sets gives AI a more diverse set of information sources, allowing it to generalize and predict outcomes across any number of possible scenarios. But what if those data sets contain biases or inaccuracies? If you’ve been following the evolution of generative AI, you’ve almost certainly heard how insufficiently trained AI tools can replicate and amplify misinformation and skewed perspectives.

To avoid these potentially harmful pitfalls, it’s critical to prepare and clean the data before it is fed into AI training models; to provide human oversight at critical junctures; and to maintain tight control over how tools are used and monitored for bias and undesired outcomes. For some AI tools, using more selective data sets can even help deliver more relevant outcomes for use cases that necessitate highly specific queries and scenarios, and that demand accurate and highly specific insights.
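To make the idea of pre-training cleanup concrete, here is a minimal sketch of the kind of preparation step described above. It assumes training records are simple text strings and shows only the most basic hygiene – deduplication and normalization; a production pipeline would add many more filters, plus human review at critical junctures.

```python
def clean_training_records(records):
    """Deduplicate and normalize raw text records before model training.

    A minimal sketch: real pipelines would also add language detection,
    toxicity screening, bias checks, and human oversight.
    """
    seen = set()
    cleaned = []
    for text in records:
        normalized = " ".join(text.split()).strip()  # collapse whitespace
        if not normalized:
            continue  # drop empty records
        if normalized.lower() in seen:
            continue  # drop case-insensitive duplicates
        seen.add(normalized.lower())
        cleaned.append(normalized)
    return cleaned
```

The point of even a sketch like this is that data quality decisions are made explicitly, in code, where they can be reviewed and audited – rather than implicitly, by whatever happened to be in the raw feed.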

What are the risks that the data contains personal information?

Data sets used for AI training always demand scrutiny upfront, because if any sensitive or personally identifiable information (PII) makes its way into the data set, serious privacy issues can ensue. When a data set contains PII, there’s a very real risk that an AI system might misuse it. Significant negative outcomes, such as a privacy breach, can harm consumers and leave a reputational black eye on the business. Such risks are especially problematic if data was collected without explicit user consent, or without clear disclosure that it may be used for AI training. Even ChatGPT was briefly banned in Italy in 2023 over a lack of clear disclosures and controls. Developers and organizations must implement strict data handling policies to mitigate privacy issues while training AI – including disclosures and easy-to-understand opt-ins and opt-outs.
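As an illustration of what upfront scrutiny can look like, the sketch below flags records that appear to contain common PII patterns. The patterns shown (email, US phone number) are hypothetical examples only – a real deployment would rely on a vetted PII-detection tool and human review, not a pair of regexes.

```python
import re

# Hypothetical patterns for illustration; real pipelines need far
# broader coverage (names, addresses, IDs) from a dedicated tool.
PII_PATTERNS = {
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
    "us_phone": re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"),
}

def flag_pii(record):
    """Return the names of PII patterns found in a text record."""
    return [name for name, pattern in PII_PATTERNS.items()
            if pattern.search(record)]

def audit_dataset(records):
    """Map record index -> detected PII types, for records needing review."""
    return {i: hits for i, rec in enumerate(records)
            if (hits := flag_pii(rec))}
```

Flagged records can then be routed to redaction or exclusion before training ever begins, which is far cheaper than remediating a model after PII has leaked into it.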

Is the data compliant with privacy regulations?

Massive growth in AI implementation arrived in the midst of a worldwide reassessment of how data privacy should be practiced. Today, businesses navigate a complex landscape of compliance, where the bar for privacy is set wherever privacy regulations are tightest.

Regulators in the European Union have demonstrated they’re not afraid to enforce GDPR and levy steep fines for violations. Stringent management and oversight of personal data must be a top-level concern. Businesses must grant individual users control over how their data is used, and provide transparency into how AI algorithms consume that data and how AI systems reach their decisions.

This is where bias in AI algorithms becomes not just an ethical matter, but a legal one. Regulations are emerging that require AI applications to avoid bias and discrimination. To ensure compliance, businesses should conduct regular audits and assessments of their AI models to identify and rectify any biases that have been inadvertently engineered into the tech. We should expect even more regulations to take effect around the world, keeping pace with the proliferation of AI tools that rely on user data.
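One way to make the audits described above concrete – purely as an illustrative sketch, with demographic parity standing in for whatever fairness metric a given regulation actually requires – is to compare positive-outcome rates across groups:

```python
def demographic_parity_gap(outcomes, groups):
    """Largest difference in positive-outcome rate between any two groups.

    outcomes: list of 0/1 model decisions; groups: group label per record.
    A gap near 0 suggests similar treatment across groups; larger gaps
    warrant investigation. (Illustrative metric only – real audits use a
    battery of fairness tests chosen for the use case and jurisdiction.)
    """
    totals, positives = {}, {}
    for outcome, group in zip(outcomes, groups):
        totals[group] = totals.get(group, 0) + 1
        positives[group] = positives.get(group, 0) + outcome
    rates = [positives[g] / totals[g] for g in totals]
    return max(rates) - min(rates)
```

Running a check like this on a schedule, and logging the results, gives a business a documented record of bias monitoring – exactly the kind of evidence regulators increasingly expect.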

As AI continues to evolve rapidly, businesses and organizations must ensure their use of the technology remains ethical and privacy-compliant. All business leaders, and any stakeholders in the organization who touch AI applications, are obliged to stay informed about emerging regulations and guidelines, and to educate themselves about where the ethical risks lie in the AI tools they use. Ongoing compliance is a necessity to retain and build trust in the marketplace among users and business partners alike, and stakeholders need to work together to adapt their AI strategies accordingly.

Damian Rollison
With over a decade of local search experience, Damian Rollison, SOCI's Director of Market Insights, has focused his career on discovering innovative ways to help businesses large and small get noticed online. Damian's columns appear frequently at Street Fight, Search Engine Land, and other publications, and he is a frequent speaker at industry conferences such as Localogy, Brand Innovators, State of Search, SMX, and more.

