Beyond Words: Introducing Multimodal GenAI


In part two of this three-part series, I want to introduce you to the world of multimodal Generative AI (GenAI) and explore the potential it offers to enhance the customer experience.

First, let’s begin by outlining what multimodal GenAI is. Imagine you’re creating a digital book and want to include some images, perhaps some video and certainly some sound. These elements will need to sit alongside the text that you write, supported by diagrams that explain some of the concepts in your digital book. Once your digital book is ready, you want to hold a launch event and share the book’s availability in multiple languages with your audience (who knows, you may even want to compose a musical theme for the book launch as well).

In a traditional production model, you’d need an army of editors, illustrators, a film crew and linguists (and of course, if you really want to create a launch theme tune, you’d need musicians and songwriters as well). It’s a daunting list.

Step in multimodal GenAI, a concept designed to deliver a fully immersive, multisensory experience. At its core is the idea of a GenAI capability that can ingest, process and then output multiple types of data, giving it more precision in interpreting customer needs, sentiment and tone of voice from the variety of inputs it receives. To create a digital book, we could share with the AI what we want the book to be about, how we want to illustrate certain points, define the video content, and even ask it to produce the book in multiple languages. A solution built on multimodal GenAI would then generate the text, video, images, sounds and diagrams, translating the content into different languages as needed.
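For readers curious what "ingesting multiple types of data" looks like in practice, here is a minimal sketch of how a text brief and a reference image might be combined into a single multimodal prompt. It follows the content-part structure of OpenAI-style multimodal chat APIs, but the model name, the helper function and the book brief are all illustrative assumptions; the sketch only assembles the request payload and does not call any service.

```python
def build_book_request(brief: str, image_url: str, languages: list[str]) -> dict:
    """Combine a text brief, a reference image and target languages
    into one multimodal prompt payload (no API call is made here)."""
    instructions = (
        f"{brief} Illustrate the key points in the style of the attached "
        f"reference image, and produce the text in: {', '.join(languages)}."
    )
    return {
        "model": "gpt-4o",  # assumption: any multimodal-capable model
        "messages": [
            {
                "role": "user",
                # Text and image travel together as parts of one message,
                # which is what lets the model reason across modalities.
                "content": [
                    {"type": "text", "text": instructions},
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            }
        ],
    }

payload = build_book_request(
    "Write a short digital book about coastal hiking.",
    "https://example.com/cover-sketch.png",  # hypothetical reference image
    ["English", "French", "Japanese"],
)
print(payload["messages"][0]["content"][0]["type"])  # prints "text"
```

The key point of the sketch is that the different modalities arrive in one request rather than in separate, disconnected calls, so the model can use the image to inform the text it generates.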

How, then, can a marketer apply this concept to meet the needs of a hitherto untapped customer segment, or to build a better relationship with an existing customer base? Let’s consider the following possibility. I run a website offering flights and hotels to customers who want to book online and create their own unique tour packages. I want to increase the number of long-distance travellers I attract and help travellers maximize the experience when they travel with me. So far, this could describe any of the plethora of travel websites out there. I, however, have built a customer interface on my website based on a multimodal GenAI service. Potential customers can share images of places they want to visit, record a video outlining what they want in their holiday package, voice concerns they have about certain destinations, and define budget restrictions, party size and any physical or health limitations.

My site then assembles a travel proposal based on all these inputs and creates a fully personalized ‘travelog’ that lets the customer experience their potential holiday through immersive multimedia. They can see the places they want to visit, get recommendations on other places they should not miss, hear what other travellers have to say about the location and sights, learn more about the culture and customs of the destination, pick up some basics of the language, and have any concerns or worries they expressed answered. Add in augmented or virtual reality and the immersion becomes even deeper. Bring in haptic and other sensory technologies and we could touch, see, hear, even smell it all before we go and experience the whole thing for real.

Multimodal GenAI can help us reach audiences we have struggled to convert or retain in previous engagements. It can give confidence to the nervous traveller, make the most of every second for the person determined to maximize a two-week break, and open up new places, ideas and experiences for the traveller who has returned to the same location for the past ten trips out of comfort or, perhaps, apathy.

How realistic is multimodal GenAI? Well, the example above is already very real, and new examples emerge every day. Like all technology, though, we must consider the challenges, and there are some significant ones. First is the sheer volume and complexity of the different data types needed to train our models, not to mention the complexity of the computations themselves. It isn’t just the volume of data, either, but the need to align data across the different modalities, made harder by the differing formats and semantic definitions within each data type. It is also fair to point out that multimodal data sets remain very limited today, and there are no fully developed approaches to handling data that is missing from one modality but present in another. Lastly, think about the complexity of the decision logic: we are looking at data from different sources in a variety of formats, which means we need complex frameworks and algorithms to extract value and inference from the data itself.

Even given these challenges, multimodal GenAI will become more of a reality in the very near future.

In the final article of this series, we will examine the future of Generative AI and explore ideas that may lead us from GenAI to Artificial General Intelligence (AGI) and beyond.

Mike Turner
As a multi-award winner with over 25 years of experience in the field of Customer Intelligence, Mike has led many successful projects for international blue-chip companies. At SAS, Mike helps clients understand the future direction of Customer Intelligence and how it will be shaped by rapid change and growth in technology and consumer expectations. He works across topics such as the Internet of Things, algorithmic decisioning, open and collaborative data strategies, and next-generation marketing incorporating artificial intelligence and machine learning.
