GenAI Language Models: Scope and Typical Use Cases


After writing a good number of times about how to use AI responsibly, especially generative AI, and about how to find the first use cases and the right models to work with, it is now time to take a deeper look at making sense of the whole AI story again.

Mind you, the hype is not over yet: some consultancies juggle numbers of several trillion dollars that generative AI could add to the economy, while a lot of startups still create applications that merely use an existing large language model (LLM) without adding significant IP.

On the positive side, the EU AI Act is taking serious shape and has already passed the European Parliament.

On a micro-scale, we are still looking far too much to LLMs to solve business problems, almost considering them a silver bullet.

Which they are not, far from it.

As is often the case, one size doesn’t fit all, or shall we say, one model doesn’t fit all. And, as we know by now, LLMs are called large because they are large, very large. This means that, even if it is possible to use an LLM for a use case, throwing it at that use case is not always the best approach, for efficiency reasons. It might take an LLM a bit longer to solve the problem, and it surely incurs higher run-time costs. There is plenty of research on this topic. Here’s one from May 23 that quotes the cost of running ChatGPT at $700,000 per day and the usage of GPT4 to support customer service for an SMB at more than $21,000 per month.
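To put these two figures on a common footing, here is a small back-of-the-envelope calculation. It only rescales the numbers quoted above; the 30-day month and the 365-day year are my assumptions, and the variable names are purely illustrative.

```python
# Figures as quoted in the text above; everything else is simple rescaling.
CHATGPT_COST_PER_DAY = 700_000     # USD per day
SMB_GPT4_COST_PER_MONTH = 21_000   # USD per month, quoted as "more than"

# Assumptions for the conversion, not taken from the quoted research.
DAYS_PER_MONTH = 30
DAYS_PER_YEAR = 365
MONTHS_PER_YEAR = 12

chatgpt_per_month = CHATGPT_COST_PER_DAY * DAYS_PER_MONTH
chatgpt_per_year = CHATGPT_COST_PER_DAY * DAYS_PER_YEAR
smb_per_year = SMB_GPT4_COST_PER_MONTH * MONTHS_PER_YEAR

print(f"ChatGPT, per month:    ${chatgpt_per_month:,}")
print(f"ChatGPT, per year:     ${chatgpt_per_year:,}")
print(f"SMB on GPT4, per year: more than ${smb_per_year:,}")
```

Under these assumptions, the SMB example alone adds up to roughly a quarter of a million dollars per year, which is the kind of number worth comparing against the cost of an existing, smaller solution.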

On the other hand, a good number of problems are already solved efficiently using technologies other than LLMs. Think predictions, anomaly detection, NLP, translations, transcripts, and many more use cases. Surely, all these tasks can be accomplished with an LLM, perhaps augmented by RAG (retrieval-augmented generation) and/or fine-tuning. You get the point here, right? One uses the more powerful and expensive tool, and invests more into it, just to be able to apply it to a given task.

The basic question to ask oneself is whether the new solution approach is better than existing solutions. Example: Blockchain never got real traction because it is a solution in search of a problem. To be sure, this is not the case for LLMs: the level and quality of content generation across modes that these systems can achieve is unprecedented. There are clear use cases here.

Each type of model serves different needs, balancing between computational costs, versatility, and task-specific performance. The choice of model depends on the application’s requirements, including desired output quality, available computational resources, and the specific nature of the tasks it needs to perform.

So, let’s talk a little about typical models, their scope, and typical use cases. Who better to ask than GPT4 – of course with some editing?

Large Language Model (LLM)

Large language models are AI systems with a vast number of parameters, typically in the billions or even trillions, designed to understand, generate, and interact with human language at a sophisticated level. They are trained on extensive datasets covering a wide range of topics, languages, and formats.

Differentiation

LLMs are differentiated by their size (number of parameters) and their ability to perform a wide array of tasks without task-specific training. Their size allows for a deep understanding of language and context, making them highly versatile.

Size Suggestion

Typically, LLMs have over a billion parameters, with some of the most advanced models today reaching into tens or hundreds of billions.

Typical Use Cases

Content generation, including the formulation of recommendations, conversational agents, language translation, complex question answering, summarization, and use as a foundation for further fine-tuning for specialized tasks.

Medium Language Model (MLM)

Medium language models are scaled-down versions of LLMs, with parameters usually in the range of hundreds of millions. They maintain a broad understanding of language but may not reach the depth or nuance of LLMs.

Differentiation

MLMs strike a balance between computational efficiency and language understanding capabilities. They require significantly less computational power to run than LLMs but can still handle a variety of language tasks well.

Size Suggestion

Typically in the range of 100 million to single-digit billions of parameters.

Typical Use Cases

More constrained content generation like summarization, anomaly detection, basic conversational agents, text classification, and language understanding tasks where the full power of an LLM is not required.

Small Language Model (SLM)

Small language models are the most lightweight in the language model family, with parameters often in the tens of millions. They are designed for efficiency and can run on limited computational resources.

Differentiation

SLMs are optimized for speed and low resource usage. While their understanding and generation capabilities are more limited, they are well-suited for real-time applications and devices with limited processing power.

Size Suggestion

Usually under 100 million parameters.

Typical Use Cases

Embedded systems, mobile applications, real-time text analysis and extraction, domain-specific translation and categorization; tasks requiring fast, on-the-fly language processing with minimal computational overhead.

Narrow Model

Narrow models, also known as domain-specific models, are AI systems designed to perform well on a specific task or within a particular domain. They are not necessarily language models but can be any type of machine learning model focused on a narrow set of capabilities.

Differentiation

The key distinction is their specialized focus. Unlike language models that are generalists, narrow models are experts in their domain, potentially offering higher accuracy and efficiency for specific tasks.

Size Suggestion

Size varies widely depending on the task and the complexity of the domain; it can range from thousands to millions of parameters.

Typical Use Cases

Image recognition in medical diagnostics, optical character recognition, voice recognition for specific accents, fraud detection in finance, predictive maintenance in manufacturing, and any application where specialized knowledge significantly outperforms generalist approaches.

Now we know it, coming from the definitive source …
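The size suggestions for the three language model types can be condensed into a small lookup. This is a minimal sketch of my own, not something GPT4 produced: the names `SIZE_RANGES` and `matching_tiers` are hypothetical, "single-digit billions" is read as "below 10 billion", and "over a billion" is treated as an inclusive lower bound. Narrow models are left out, since they are defined by their focus rather than by their size.

```python
import math

# Suggested parameter ranges as (lower bound inclusive, upper bound exclusive).
# Assumptions: the MLM cap of 10 billion is a reading of "single-digit
# billions", and "over a billion" for LLMs is treated as ">= 1 billion".
SIZE_RANGES = {
    "SLM": (0, 100_000_000),
    "MLM": (100_000_000, 10_000_000_000),
    "LLM": (1_000_000_000, math.inf),
}


def matching_tiers(num_parameters: int) -> list[str]:
    """Return every tier whose suggested size range contains num_parameters."""
    return [
        tier
        for tier, (low, high) in SIZE_RANGES.items()
        if low <= num_parameters < high
    ]


for size in (50_000_000, 300_000_000, 7_000_000_000, 70_000_000_000):
    print(f"{size:>14,} parameters -> {matching_tiers(size)}")
```

Note that the suggested ranges overlap: a model with a few billion parameters matches both the medium and the large tier. The function therefore returns all matching tiers instead of forcing a single label, which fits the point that these categories are rules of thumb rather than hard boundaries.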

A real-world scenario

As this is quite theoretical, let’s dive into a typical business scenario, like the submission of an expense report, which usually has three steps: taking a picture on a mobile device, submitting it, and then triggering an approval… or not.

When the photo of the expense report is taken on a mobile device, it is typically run through an OCR system to extract text from the photo (OCR = narrow model). The text is then broken down into pieces of information, like vendor, amounts, currency, etc., which then get categorized and perhaps converted into the company currency before submission as an electronic document. These jobs are done by a small language model on the device.

After submission, the claim is checked for anomalies, including policy violations, requiring domain understanding such as applicable rules and policies. This is a task for a medium language model, which uses the documents that describe the rules and policies.
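The division of labor in this scenario can be written down as a simple routing table. The following is only an illustrative sketch: the `Step` class, the step descriptions, and the helper functions are hypothetical names I chose, and no real OCR or language model is called.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    task: str
    model_type: str
    phase: str  # "before submission" or "after submission"


# The expense report scenario, one entry per job described above.
EXPENSE_REPORT_STEPS = [
    Step("extract text from the photo", "narrow model (OCR)", "before submission"),
    Step("break text into vendor, amounts, currency", "small language model", "before submission"),
    Step("categorize the extracted information", "small language model", "before submission"),
    Step("convert into the company currency", "small language model", "before submission"),
    Step("check for anomalies and policy violations", "medium language model", "after submission"),
]


def tasks_by_model_type(steps):
    """Group task descriptions by the type of model that handles them."""
    grouped = {}
    for step in steps:
        grouped.setdefault(step.model_type, []).append(step.task)
    return grouped


def uses_llm(steps) -> bool:
    """Tell whether any step in the scenario relies on a large language model."""
    return any(step.model_type == "large language model" for step in steps)


for model_type, tasks in tasks_by_model_type(EXPENSE_REPORT_STEPS).items():
    print(f"{model_type}: {', '.join(tasks)}")
print("LLM needed:", uses_llm(EXPENSE_REPORT_STEPS))
```

Written down this way, the scenario makes its point rather plainly: none of the steps is routed to a large language model.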

Other use cases include the electronic signature of a contract in purchasing, the creation of an offer in sales, and many more.

In summary

Decision makers in IT and lines of business need to take a detailed look at the real AI offering, its associated cost, and whether the necessary intelligence is already available in-house.

Just because generative AI is the new shiny toy does not mean that it is the most effective or efficient tool to use.

Work with partners who are willing and able to guide you to, or away from, the use of an LLM for your problem statement, also taking environmental and societal considerations into account. A partner will do this. A vendor will mainly make sure that you buy the shiny new tool.

Thomas Wieberneit

Thomas helps organisations of different industries and sizes to unlock their potential through digital transformation initiatives using a Think Big - Act Small approach. He is a long-standing CRM practitioner, covering sales, marketing, service, collaboration, customer engagement, and customer experience. Coming from the technology side, Thomas has the ability to translate business needs into technology solutions that add value. In his successful leadership positions and consulting engagements, he has initiated, designed, and implemented transformational change and delivered mission-critical systems.
