5 Big Hurdles in Taking Machine Learning to Production

1.11.2023
Dr. Matthias Böck, Roua Guesmi

Introduction
Production and adoption: The introduction of machine learning is a complex task
For machine learning and AI: big investment ≠ big harvest
Good data products require good product thinking
Striking the right chord between AI capabilities and public perception
Ethical perspectives on machine learning are becoming more and more important
FOMO around generative AI: Do all roads lead to Large Language Models?
Hallucinating and concatenating: What many people forget about AI
Summary

Ever since the rise of OpenAI and Stable Diffusion, machine learning has been hyped as a game-changing technology that can revolutionize everything from our everyday work life to how businesses function on a fundamental level.

By leveraging machine learning algorithms, businesses can automate processes, improve customer experiences, and make data-driven decisions at scale. To do so, one has only to adopt the latest tool stack, go through one of the “speed up your productivity 100x with these prompts” posts on their preferred social media channel, or hire some skilled data scientists. Or at least that’s the theory.

In this blog post series, we’ll highlight the hidden elements that we believe are preventing businesses like yours from getting the most out of machine learning.

We’ll delve into why many businesses fail to maximize their usage of machine learning systems over time, why a change of mindset might be the key, and outline what we believe to be the best approach to help you avoid the most common pitfalls.

We’ll also provide real-world examples and insights from customers to help you get to grips with a new way of approaching common machine learning challenges.

Production and adoption: The introduction of machine learning is a complex task

Developing a sophisticated algorithm based on high-quality data and making accurate predictions are critical components of a successful machine learning project. However, the major challenges remain those of a strategic or process-related nature, namely:

Defining the right questions
Deploying machine learning into production and maintaining it there
Driving organizational adoption and enabling the company to actually make use of it

On a technical level, only a small percentage of a machine learning solution actually comprises machine learning code. The rest is dedicated to duties such as model deployment, retraining, maintenance, updates, experimentation, auditing, versioning, and monitoring. As a result, bringing machine learning solutions into production is a complex task that requires a combination of various skills and technologies.

Overview of the connected infrastructure

(Source: www.datarevenue.com)

For machine learning and AI: big investment ≠ big harvest

Many companies have responded to these developments by investing heavily in their tool stack and recruiting skilled data scientists. This helped them gain deeper insights into their data and apply state-of-the-art machine learning methods. And yet they continue to struggle to bring these methods into production or to keep up with recent AI developments and fully leverage their potential.

This is largely due to a reliance on project-focused mindsets instead of product-oriented approaches. Proofs of concept (POCs) or ad hoc analyses help businesses to get a better understanding of a problem or to get buy-in for a specific initiative, but they lack a plan for ongoing development and sustainability.

This lack of product-oriented thinking usually manifests as:

Failing to adapt models to changes in the data or applied context
Unclear accountability for maintenance and the processes around it (spoiler: in most cases, it shouldn’t be your data scientists taking care of ongoing maintenance)
A lack of monitoring of the model’s performance
Unreliable code maintenance and versioning of data
And worst of all: solutions that are not used, or fail to add value for their users.
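To make the first and third points more tangible, here is a minimal sketch of how a drift check on a single numeric input feature could look. It uses the Population Stability Index (PSI), one common drift metric among several; it is our illustrative choice, not a prescribed method. The function names, the bucket count, and the alert threshold of 0.2 (a frequently quoted rule of thumb) are assumptions that you would tune for your own data.

```python
import math
from typing import List, Sequence


def population_stability_index(
    reference: Sequence[float], current: Sequence[float], bins: int = 10
) -> float:
    """PSI of one numeric feature: training-time sample vs. production sample."""
    lo, hi = min(reference), max(reference)
    if hi == lo:
        raise ValueError("reference sample has no spread")
    width = (hi - lo) / bins

    def bucket_shares(values: Sequence[float]) -> List[float]:
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            # Values outside the reference range land in the edge buckets.
            idx = min(max(idx, 0), bins - 1)
            counts[idx] += 1
        # A small floor avoids log(0) for empty buckets.
        return [max(c / len(values), 1e-6) for c in counts]

    ref = bucket_shares(reference)
    cur = bucket_shares(current)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref, cur))


def needs_review(psi: float, threshold: float = 0.2) -> bool:
    """Assumed rule of thumb: a PSI above 0.2 is worth a closer look."""
    return psi > threshold
```

A check like this only adds value if someone owns it: it needs to run on a schedule, and an alert needs to reach a person who is accountable for retraining or rolling back the model.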

Technology and a group of data scientists alone cannot overcome these issues.

Good data products require good product thinking

One solution to these issues is to start asking ourselves the question: Are we aiming to build something that lasts? Or are we just building a POC that we’re willing to potentially throw away again soon?

Product thinking is a framework that helps you to build products that people actually care about. By including all relevant stakeholders throughout the entire development process, it helps to keep the focus on the actual user’s needs. This user-centric approach is the cornerstone of value creation in any data product.

What is the definition of a good data product?

A simple definition of a successful product is that it needs to fall into the intersection of being valuable, feasible, and usable. This means that the perspectives of user experience, technology, and business need to be harmoniously interwoven during the product creation process. There are several ways to achieve this, and in this blog post series, we’ll give you some real-world examples and insights from our customers to help you do so.

Figure: Definition of the most important aspects of a product (inspired by the work of Martin Eriksson)

Alongside what we believe to be the right approach to developing data products, there are a number of external challenges to bringing machine learning systems into production. We’ll now look at a couple of these and outline what you need to consider.

Striking the right chord between AI capabilities and public perception

According to Gartner, even among companies with significant AI experience, 85% of machine learning projects fail to meet expectations and just 53% of projects successfully move from prototype to production.

Machine learning systems fail not only on their way to production but also when we are not aware of how the system is used and perceived by its users. Let’s take two prominent examples from Amazon and Microsoft:

Amazon’s recruiting tool
Amazon developed an AI-powered recruiting tool in 2014 to automate the process of identifying top talent from resumes. However, in 2015, the company’s machine learning experts found that the tool was favoring male candidates for technical roles, as it was trained on resumes submitted over a 10-year period, where the majority of applicants were male. This incident underlines the importance of training AI models on diverse datasets to avoid biases.
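A skew of this kind can often be surfaced early with a simple check on the model’s outputs. The sketch below compares selection rates between groups and computes their ratio. It is an illustration under our own assumptions, not a description of how Amazon found the issue: the function names and the record format are hypothetical, and a single ratio is only a first signal, not a full fairness audit.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def selection_rates(records: Iterable[Tuple[str, bool]]) -> Dict[str, float]:
    """Share of positively scored candidates per group.

    `records` holds (group, was_selected) pairs, a hypothetical format.
    """
    totals: Dict[str, int] = defaultdict(int)
    selected: Dict[str, int] = defaultdict(int)
    for group, was_selected in records:
        totals[group] += 1
        if was_selected:
            selected[group] += 1
    return {group: selected[group] / totals[group] for group in totals}


def disparate_impact_ratio(rates: Dict[str, float]) -> float:
    """Lowest selection rate divided by the highest (1.0 means parity)."""
    highest = max(rates.values())
    if highest == 0:
        return 1.0  # nobody was selected in any group
    return min(rates.values()) / highest
```

A ratio well below 1.0 does not prove discrimination, but it is a clear prompt to inspect the training data and the features the model relies on. A commonly cited rule of thumb treats values below 0.8 as a warning sign.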

Microsoft Tay
In 2016, Microsoft introduced Tay, an AI-powered Twitter chatbot, which was designed to simulate human-like conversations. Within 24 hours, Microsoft had to shut down the chatbot after it started making racist and offensive comments to other Twitter users. This incident might be a few years back, but recent large language models (such as ChatGPT) also use (among other data) social media data for model training. A significant proportion of the fine-tuning efforts after training the model goes into setting boundaries and filters for the model to not repeat what it has “seen” there.

And you don’t have to look far for other stories of AI failures with drastic consequences:

Tesla’s car crash due to autopilot failure
IBM Watson misdiagnosed cancer patients
Alexa tells a 10-year-old to touch a live plug with a penny for a challenge
Further examples of AI failures: https://www.privateinternetaccess.com/blog/ai-gone-wrong/

Failure typically comes at a cost, and the cost increases with the number of people who are affected by or perceive said failure. Therefore, it is essential for companies to recognize the challenges involved in implementing these technologies as well as the potential costs of rolling out a solution that works poorly or not at all.

Ethical perspectives on machine learning are becoming more and more important

It’s crucial to take ethical concerns such as fairness, transparency, safety, and privacy into account when working with machine learning models – and legislative bodies are already beginning to do so:

UNESCO produced the first-ever global standard on AI ethics – the ‘Recommendation on the Ethics of Artificial Intelligence’ in November 2021. This framework was adopted by all 193 member states.
The European Parliament adopted its negotiating position on the EU AI Act in June 2023. The Act is designed to regulate the use of AI systems in Europe. Assessing the potential impact of these regulations will be crucial for any organization leveraging AI.

The academic and industry sectors are not far behind. Incorporating ethical considerations into their strategies has become paramount. A telling indicator is the 2023 AI Index Report by Stanford University, which states:

“The number of accepted submissions to FAccT, a leading AI ethics conference, has more than doubled since 2021 and increased by a factor of 10 since 2018. 2022 also saw more submissions than ever from industry actors.”

For your own organization, this raises a few questions worth answering early on:

What are our organization’s ethical standards for AI?
Where are we currently in relation to these guardrails?
How can we come to a deep understanding of potential risks and how can we contain those risks?

By taking an ethical approach to data product design, you can create products that are valuable to users and society at large while also avoiding damaging consequences. It also puts into context the risks associated with adopting machine learning systems purely from the perspective of not wanting to “miss out” on the next big thing.

FOMO around generative AI: Do all roads lead to Large Language Models?

The fear of missing out on generative AI is prevalent enough for many companies to throw caution to the wind when faced with the concerns listed above. They’re drawn into the whirlwind of AI innovations published daily by various influencers and media outlets. These stories chart the race between big tech companies to produce the most advanced large language model (LLM) and the promise of endless new business opportunities.

This hype wave was triggered by OpenAI’s move to make their LLM publicly available with ChatGPT, a chatbot that enables human-like communication. According to rumors, their LLM consists of up to one trillion parameters. It has been trained on a huge amount of text (reportedly 570 GB) taken from diverse sources such as Wikipedia, books, and the web, with the goal of predicting the next most likely word given a sequence of words, such as a user’s input question.
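The core idea of predicting the next most likely word can be shown with a toy example. The sketch below counts which word follows which in a tiny text and returns the most frequent follower. This is a deliberately simplified bigram model and not how ChatGPT works internally: real LLMs use neural networks over much longer contexts. The function names and the sample text are our own.

```python
from collections import Counter, defaultdict
from typing import DefaultDict, Optional


def train_bigram_model(text: str) -> DefaultDict[str, Counter]:
    """Count, for every word, which words follow it and how often."""
    words = text.lower().split()
    followers: DefaultDict[str, Counter] = defaultdict(Counter)
    for current_word, next_word in zip(words, words[1:]):
        followers[current_word][next_word] += 1
    return followers


def predict_next(model: DefaultDict[str, Counter], word: str) -> Optional[str]:
    """Return the most frequent follower of `word`, or None if it is unseen."""
    candidates = model.get(word.lower())
    if not candidates:
        return None
    return candidates.most_common(1)[0][0]


# Made-up sample text, for illustration only.
model = train_bigram_model("the cat sat on the mat the cat ran")
next_word = predict_next(model, "the")
```

Even this toy version shows why such a model has no notion of truth: it returns whatever followed most often in its training text.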

The capabilities of the latest LLMs are impressive. To give a few examples, they allow you to:

Write or debug programming code
Summarize the transcript of a meeting or any other text or audio
Summarize the answer to a given search request (currently not all models can search the internet and also give real references)
Translate between various languages
Create content for diverse formats such as email, apps, and blog articles
Get a copilot for any of the above topics and more, one that helps with research, challenges an existing concept, or suggests a plan for developing one.

The list goes on and is updated on a daily basis. It has expanded from automated slide generation to AI-powered analysts that are capable of analyzing and interpreting data sets or screenshots of dashboards and images by themselves.

New concepts such as AutoGPT even spawn their own AI agents to autonomously solve a given task list and expand the LLM’s existing functionalities with the ability to access the web or generate and execute Python code.

Microsoft just released an open-source Python package called AutoGen, which allows you to build a workflow with different agents, which can be customized with specific capabilities and roles. The general idea is that these agents communicate with each other, challenge the results, ask follow-up questions, or decide if a solution has been found or even needs to be discarded due to restrictions.
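The general pattern behind such agent workflows can be sketched in a few lines of plain Python. Note that the code below does not use the AutoGen API. It is a hypothetical, minimal loop in which a "solver" proposes a result and a "reviewer" either approves it or sends feedback, with both agents represented by simple stand-in functions where a real setup would call an LLM.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Message:
    sender: str
    content: str


# Hypothetical agent signatures: each agent sees the full history.
Solver = Callable[[List[Message]], str]
Reviewer = Callable[[List[Message]], Tuple[bool, str]]


def run_conversation(
    solver: Solver, reviewer: Reviewer, task: str, max_rounds: int = 3
) -> Tuple[Optional[str], List[Message]]:
    """Alternate between solver and reviewer until approval or the round limit."""
    history = [Message("user", task)]
    for _ in range(max_rounds):
        proposal = solver(history)
        history.append(Message("solver", proposal))
        approved, feedback = reviewer(history)
        history.append(Message("reviewer", feedback))
        if approved:
            return proposal, history
    return None, history  # no accepted solution within the limit


# Stand-ins for LLM-backed agents, for illustration only.
def toy_solver(history: List[Message]) -> str:
    reviews = sum(1 for m in history if m.sender == "reviewer")
    return f"draft v{reviews + 1}"


def toy_reviewer(history: List[Message]) -> Tuple[bool, str]:
    latest = history[-1].content
    if latest == "draft v2":
        return True, "approved"
    return False, "please revise"
```

The round limit matters in practice: without it, two agents that never agree would keep calling the underlying model, and paying for it, indefinitely.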

Hallucinating and concatenating: What many people forget about AI

These are impressive and potentially world-changing capabilities. Though the models are getting better and better, they are also prone to “hallucinating”, meaning that they make up answers that are incorrect or make no sense. In the heated debates and marketing around AI, we should remember that these are language models, not knowledge models.

These models have no understanding of the world and the interrelation of different topics but are often – but not always – astoundingly good at concatenating the right words together.

The flip side of this ability is that they can be used on a vast scale for fraud and phishing, fake news, and political influence through bots on social media. They are also vulnerable to so-called prompt injection, which allows attackers to influence the model’s behavior. The models have outpaced our ability to understand them, and most of their answers to the questions we throw at them are convincingly eloquent. But they are not necessarily correct.

Big search engine providers such as Google and Microsoft Bing have been watching the incredible rise of OpenAI’s ChatGPT, which has the fastest-growing user base in history. They have since made strong moves to roll out their own solutions. Both Google’s Bard and Bing’s AI have garnered a lot of attention, but the first malfunctions were quick to pop up on social media. Bing’s chatbot claimed to have fallen in love with its user and tried to convince him to leave his spouse. On another occasion, it refused to believe that the year was 2023 and became quite snappy with its user.

While these examples sound funny, companies can seriously damage their reputations by putting out a poor solution. This is especially true in the field of internet search, where Google has been the dominant player for years. Nearly all of the big tech companies are rolling out or will soon roll out not only text generation but also image, audio, and video generation as key elements of their products (Alexa is on her way to becoming LLM-powered, and the same goes for Apple with AppleGPT).

The heated debate on what has been released upon the world and how this will change the internet forever will continue, and we should be an informed part of it. For a critical perspective on these developments, we recommend reading Gary Marcus or Alberto Romero.

As the big tech companies continue their race, how do we, as a society and individuals, ensure that the path of AI development aligns with ethical standards and societal well-being? The debate is far from over, and the decision on which path to take has to be made sooner rather than later.

Even though we’re getting faster and faster at building new prototypes, the fact remains that ethical discussions continue to loom like a shadow over this progress, and that only 53% of machine learning projects make it from prototype into production. If we want to build something that’s actually useful for users – and continues to be useful over time – we have to engage with the hard questions and be open to a shift in thinking to help us bring projects over the finish line.

Summary

Bringing machine learning systems into production requires companies to overcome not only technological but primarily organizational challenges. Whether you are a data scientist, a software developer, or a business leader, we hope that the insights provided in this blog post help you to better navigate the beautiful world of machine learning deployment and successfully bring your machine learning use cases into production.

In the coming weeks, we will dig into the following key areas with dedicated blog posts including successful examples from customer projects, common pitfalls, and of course potential solutions:

Maturity
Strategy
Model development
Evaluation
Operation

If your business needs support in tackling the obstacles discussed in this post, our team of data experts is happy to provide insights and support with the process of bringing machine learning into production. You can find out more about our services here or get in touch with someone from our team here.

If you’d like to be notified when the next post goes live, feel free to sign up for our newsletter to receive updates every 2-3 months about key industry updates and opinion pieces such as this one. Happy reading!
