Friday, June 26, 2026

How To Scale AI in Digital Commerce Effectively

Dana Gardner: Welcome to Don't Panic, It's Just Data, the podcast that explores how organizations turn data into a business advantage. I'm Dana Gardner, Principal Analyst at Interarbor Solutions, and I'll be your host for this discussion.

Joining me are Jürgen Obermann, Senior Go-To-Market Leader for EMEA at Vespa.ai, and Piotr Kobziakowski, Senior Principal Solutions Architect also at Vespa.ai. They've worked with some of the world's largest brands to move from static search experiences to more powerful and dynamic personalized customer journeys.

In this episode, we'll tackle one of the biggest challenges facing digital commerce teams: how to unlock the full potential of AI-driven search, ranking, and personalization. And we'll explore how technical leaders can design platforms that deliver meaningful impact for their customers. Jürgen and Piotr, welcome.

Jürgen Obermann: Thank you for having us.

Piotr Kobziakowski: Thank you very much.

Dana Gardner: Many teams see the potential of AI, but the practical blockers, from data fragmentation to slow experimentation, can make it difficult to turn ambition into real customer impact. Jürgen, let's start with a common challenge. Where do digital commerce companies most often struggle when adopting AI-driven search, ranking, and personalization?

[Listen to the discussion or watch it.]

Jürgen Obermann: We see three problem areas with our e-commerce customers today. The first is at the operational level. All these e-commerce sites obviously have a history. They have had a long development, they have fragmented environments, and they all have architectures based on microservices, which was a good thing at the time. But today, AI and its performance needs cause performance problems, and they also cause flexibility problems. People tell us that, even for the slightest changes, they have 90- to 180-day delivery times from the engineering teams, because there are so many areas where they need to fine-tune things, and if they touch one thing too much, something else will break.

So, it's a very fragile infrastructure. That's challenge number one. Challenge number two concerns the customer experience and the search experience customers have on their websites. With the newer AI technologies, people can now do much more sophisticated, personalized search, particularly using our technology. That's where people would really like to see some improvements. The challenge they're facing is that what they use today does not allow them to do that.

The third area is the business area, where they would love to run campaigns and check whether the campaigns actually have an impact. I like to use the example of the Netherlands, where they have King's Day, and on King's Day everybody wears orange. Somebody selling sneakers, hats, and t-shirts should offer them in orange and should push these campaigns to customers. Today that takes three weeks and lots of involvement from data scientists to create personalization and ranking that reflect it.

What people really want is to have this online more or less within minutes, to be able to configure it, and then to see the impact in A/B testing right away. This is something where we get involved a lot with our customers, because that's exactly where they want to go.

Dana Gardner: Are today's digital commerce search and recommendation stacks hitting a ceiling? When you look at large e-commerce systems, what's the architectural bottleneck that you see most often?

Jürgen Obermann: It seems to be the Lucene-based solutions, and I used to work for Elastic for a long time, so I know the environment pretty well. The Lucene-based implementations seem to hit the ceiling as soon as you start using vector operations.

I just talked to an analyst today, and they told me that they feel like vector support is bolted on and therefore not really effective when it comes to being used in these environments. I'll give you an example. One of our customers, who recently implemented our solution, had been using one of the Lucene-based solutions, and they had about four queries per second using vector operations in the background. We implemented our solution, and we were using vectors and tensors to do this, not much different from what they were using before.

We could come up with 4,000 queries per second. So, going from four to 4,000, you see that there's a difference of three orders of magnitude. And this is the sort of architectural bottleneck a lot of these people face, besides being a bit too distributed. They have search, ranking, and personalization divided, with a network in between, which causes latency and so on. But Piotr, maybe you can give us an additional view on this from a technical perspective.

Piotr Kobziakowski: When you think about the multiple systems involved in an e-commerce operation, first of all, we're coming from search. We need to find items, then we need to rank them, and we need to personalize them.

But this requires a lot of other things as well: capturing signals from the users, updating the feature stores, and then running inference on the captured information with different machine learning models in inference platforms, which are usually also outside of the main systems.

So those things need to be interconnected, right? That means calls between platforms. That's one thing that really limits speed and response time, because every connection, every call to an API, causes a delay, right? Also, if you have separate systems, let's say a system for ranking, a system for search, and a system for recommendation, they usually need to replicate the same catalog multiple times, and then the information needs to be collected.

So not only do you store data in multiple places, but you also slow down your actions. Then, if you'd like to use all of these components as one single system to provide a good answer to the end customer, this requires connecting calls across all of the systems. The data is not available as fast as it would be in a single system, right?

So that's definitely a big, big bottleneck in making the systems work fast. And from the operational perspective, it's a roadblock to updating all of the systems. If you introduce just a single field to add one more feature, that means you need to update your APIs. You need to update everything around all the systems. So this definitely slows down evolution and innovation.
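A minimal sketch, not from the discussion itself, of how per-call network overhead adds up when search, feature lookup, inference, and recommendations live in separate services. Every latency figure and service name below is a hypothetical placeholder chosen for illustration, and the model assumes the calls run one after another.

```python
# Hypothetical per-service compute times in milliseconds.
# None of these figures come from the discussion.
SEPARATE_SERVICES_MS = {
    "search": 12.0,
    "feature_store": 8.0,
    "model_inference": 15.0,
    "recommendations": 10.0,
}
NETWORK_HOP_MS = 2.0  # assumed round-trip overhead of one remote call


def sequential_latency_ms(service_ms, hop_ms):
    """Latency when each service is a separate remote call, made in sequence."""
    return sum(cost + hop_ms for cost in service_ms.values())


def colocated_latency_ms(service_ms, hop_ms):
    """Latency when the same work runs next to the data: one remote call only."""
    return sum(service_ms.values()) + hop_ms


distributed = sequential_latency_ms(SEPARATE_SERVICES_MS, NETWORK_HOP_MS)
colocated = colocated_latency_ms(SEPARATE_SERVICES_MS, NETWORK_HOP_MS)
print(f"separate services: {distributed} ms, single system: {colocated} ms")
```

Under these assumptions the only difference is the number of hops, so the gap grows with every additional service placed on the request path.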

Dana Gardner: Piotr, let's talk about what AI native search architectures look like. If a technical leader were designing a digital commerce search and ranking platform today -- with those vectors, the tensors, and the real-time inference -- what fundamental design principles should they prioritize? How should this be done properly?

Piotr Kobziakowski: From my perspective, we should look at all the areas where the bottlenecks are. We already discussed that, right? We need to put the processing where the data is, to shorten the path and enable richer and better calculations on all the signals and data that we have. So effectively, when you look at the Vespa architecture, that would be, from my perspective, really the go-to platform for e-commerce.

And it's simple, because you can combine standard product search, lexical or semantic, and you can implement signal handling, so a feature store, so that you learn from user interactions. And then you can serve recommended items very nicely and quickly, because this is just another element in your rankings.

And obviously, I mentioned ranking, so what does ranking mean? Real-time ranking means that you can perform all the calculations, not just how to order the elements based on your text input, but also include business logic in this ranking, which will prioritize, for example, items that are better for revenue, without losing the personalization element, which makes customers feel that the system understands what they like and what they actually see.

We have very good examples across many of the customers we interact with. If users are looking for cars or mobile phones or houses, and there are, let's say, hundreds of thousands of offers to navigate, then even with keyword search, or let's say even semantic search, it's very hard to find exactly the things they are looking for. Let's imagine that I would like to find a car with a specific engine, a specific, let's say, color, and so on.

If you enter the website and it's personalized, you see the cars you are normally interested in. How much faster will you find the precise car you are looking for when, instead of 3,000, you see only maybe 200 that are exactly what you need? So, your decision-making process will be much faster.
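To make the idea of one ranking that blends text relevance, business logic, and personalization concrete, here is a small sketch that is not from the discussion. The `Offer` fields, the weights, and the example data are all hypothetical, and a weighted sum is only one of many ways such signals could be combined.

```python
from dataclasses import dataclass


@dataclass
class Offer:
    name: str
    text_relevance: float  # lexical or semantic match score, assumed in 0..1
    margin: float          # business signal, assumed in 0..1
    embedding: list        # item vector in the same space as the user vector


def dot(a, b):
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def score(offer, user_vector, w_text=0.5, w_business=0.2, w_personal=0.3):
    """Weighted blend of the three signals; the weights are illustrative."""
    return (w_text * offer.text_relevance
            + w_business * offer.margin
            + w_personal * dot(offer.embedding, user_vector))


def rank(offers, user_vector):
    """Order offers from highest to lowest blended score."""
    return sorted(offers, key=lambda o: score(o, user_vector), reverse=True)


offers = [
    Offer("A", text_relevance=0.9, margin=0.2, embedding=[0.0, 1.0]),
    Offer("B", text_relevance=0.8, margin=0.5, embedding=[1.0, 0.0]),
    Offer("C", text_relevance=0.6, margin=0.9, embedding=[0.5, 0.5]),
]
user_vector = [1.0, 0.0]  # hypothetical preference profile
ranked_names = [o.name for o in rank(offers, user_vector)]
print(ranked_names)
```

In this toy data, offer A has the best text match but ends up last, because the business and personalization terms favor the other two.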

Another topic is obviously the platform that does all these calculations. It has to have a really proper document store. It cannot be just an inverted index for search, which enables lexical search, or just a vector store, which enables vector search, or just a feature store, which keeps the values for personalization, for example, let's say profiles for the users.

It has to be all combined, because when you do ranking, you need to reach all of these data sets at once in a very fast way. If you have, let's say, 10,000 queries per second, you can imagine that the load on the network will be extremely high if you need to make these calls to separate platforms. We are not talking about megabytes. We are talking about even gigabytes per second when we have combined systems, meaning distributed systems. So, the network has extremely big importance. If you can put the computation next to the data, so that you can make direct calls to your data, it's much, much faster. So again, many of these things are represented today in tensors.
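A back-of-the-envelope sketch, added for illustration, of how 10,000 queries per second can turn into gigabytes per second of network traffic when ranking data has to be fetched from another system. Only the query rate comes from the discussion. The candidate count and payload size are assumptions.

```python
def network_bytes_per_second(qps, candidates_per_query, bytes_per_candidate):
    """Bytes moved per second if every candidate's data crosses the network."""
    return qps * candidates_per_query * bytes_per_candidate


qps = 10_000                  # query rate mentioned in the discussion
candidates_per_query = 1_000  # assumption: documents scored per query
bytes_per_candidate = 512     # assumption: one 128-dimension float32 vector

total_bytes = network_bytes_per_second(qps, candidates_per_query, bytes_per_candidate)
gigabytes_per_second = total_bytes / 1e9
print(f"{gigabytes_per_second:.2f} GB/s")
```

With these assumed values the result is 5.12 GB/s, which is the kind of load that disappears when the scoring runs where the vectors are stored.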

We hear a lot about vector databases and vectors, but vectors are just a small subset of tensors. Tensors can represent scalars, matrices, maps, or maps of vectors. And when you think about personalization, about how you represent a user, a user cannot be represented as a single vector, because if you have multiple categories, obviously we don't like the same things in cars as we like in t-shirts.

So, we need multiple elements that represent different categories. We can put them in a single tensor in Vespa and then use it in our ranking, calculating a simple dot product to get a really accurate representation of what the user likes. Now, when we think about tensors, we should also think a lot about machine learning models. So obviously inference, and where the inference is happening, is also extremely important.
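The per-category user representation described above can be sketched in plain Python as a map from category to preference vector, with a dot product against an item vector. This is an illustration only: it does not use Vespa's tensor types, and the category names, dimensions, and values are hypothetical.

```python
# A user profile as a "map of vectors": one preference vector per category.
# The three dimensions are hypothetical features of an item.
user_profile = {
    "cars":    [0.9, 0.1, 0.0],
    "tshirts": [0.0, 0.2, 0.8],
}


def dot(a, b):
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def affinity(profile, category, item_vector):
    """Dot product of the item with the user's vector for that category.

    Returns 0.0 when the user has no preferences recorded for the category.
    """
    preferences = profile.get(category)
    if preferences is None:
        return 0.0
    return dot(preferences, item_vector)


car_score = affinity(user_profile, "cars", [1.0, 0.0, 0.0])
shirt_score = affinity(user_profile, "tshirts", [0.0, 0.0, 1.0])
unknown_score = affinity(user_profile, "shoes", [1.0, 1.0, 1.0])
print(car_score, shirt_score, unknown_score)
```

Keeping one vector per category means a strong preference in cars does not leak into how t-shirts are scored.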

When you have access to all of this data and we have tensor representations, then it's natural to run different models, like GPT models or any ONNX model, which you can download, experiment with, and use with this data immediately in your ranking process.

I haven't yet talked about Vespa ranking. Vespa ranking is not what people understand from a traditional system, where you do, let's say, hybrid ranking and find documents through a combination of lexical and semantic matching. Vespa ranking is really a big system that enables you to divide your calculations into three different phases, per node and then per global cluster. You can execute any type of mathematical operation there.

You can also use any signals from the typical lexical and semantic world and combine them with business logic, with if conditions, let's say full conditional structures, and build all of the logic nicely. Ranking also enables you to expose all the calculated values, which can be used later to optimize and to train models that will, for example, fine-tune the weights for each segment of the ranking to be as accurate as possible.
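The phased approach described above, with cheaper scoring on each node and a final pass over the merged results, can be sketched as follows. This is a simplified illustration under stated assumptions, not Vespa's implementation: the function name, the document fields (`bm25`, `model`, `boost`), and the choice to drop documents outside each node's top candidates are all hypothetical.

```python
def phased_rank(nodes, cheap_score, expensive_score, global_score,
                rerank_count=2, hits=3):
    """Three-phase ranking over documents spread across several nodes."""
    merged = []
    for docs in nodes:
        # First phase: a cheap score for every matched document on the node.
        first = sorted(docs, key=cheap_score, reverse=True)
        # Second phase: a costlier score, only for the node's top candidates.
        top = sorted(first[:rerank_count], key=expensive_score, reverse=True)
        merged.extend(top)
    # Global phase: final ordering over the candidates merged from all nodes.
    return sorted(merged, key=global_score, reverse=True)[:hits]


nodes = [
    [
        {"id": "a", "bm25": 0.9, "model": 0.2, "boost": 0.0},
        {"id": "b", "bm25": 0.8, "model": 0.7, "boost": 0.0},
        {"id": "c", "bm25": 0.1, "model": 0.99, "boost": 0.0},
    ],
    [
        {"id": "d", "bm25": 0.7, "model": 0.5, "boost": 0.5},
        {"id": "e", "bm25": 0.6, "model": 0.6, "boost": 0.0},
    ],
]

result = phased_rank(
    nodes,
    cheap_score=lambda d: d["bm25"],
    expensive_score=lambda d: d["model"],
    global_score=lambda d: d["model"] + d["boost"],  # business boost applied last
)
result_ids = [d["id"] for d in result]
print(result_ids)
```

Document `c` never reaches the expensive model because its cheap score is too low, which is the trade-off a phased design accepts to keep the costly computation bounded.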

Digital commerce teams rarely lack ideas. Most understand how AI, data, and personalization could improve customer experiences. The problem is turning those ideas into something that works at scale, in real time, and without slowing the business down.

Dana Gardner: Let's drill into the personalization a bit. I should think that the digital commerce systems of the future need to adapt. They can't be static and rule driven. They need to be more adaptive in real time.

So how do we move these personalization systems, which rely on nightly batches, static segmentation, and manual tuning, to a world where ranking adapts instantly to user behavior?

Piotr Kobziakowski: Yes. Because of the topics we discussed in the previous question, when we had that in separate systems, we needed to collect information. We were not able to quickly run operations on, let's say, millions of users, right? The systems had their own limitations, and it was pretty hard to do.

When we move to Vespa, we can shift some of the operations from complex models into just tensor operations. It turns out that when we use tensor representations for profiles, we can update these models based on the user signals, and that updating is possible in Vespa. After all, Vespa enables partial updates.

So, you can update not just single fields in the document with really high efficiency, but even a single cell in a tensor. That enables you to manipulate every factor of the personalization and quickly do the dot product calculations. That's similar to vector search, which enables you to quickly find the documents that are, let's say, the closest to what users like. And when you have this ability, you can easily combine it in ranking with your text search.

We can actually start from the text search. You have some results. And then in the second stage, which I mentioned, you apply reordering based on the user preferences, and users will see what they were looking for, but in their own preferred, let's say, colors, shapes, and everything. That also gives the feeling of a really good search, because we are not getting things we don't like up front, right?
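As an illustration of updating a single cell of a profile from a user signal and seeing the effect on the next ranking, here is a plain-Python simulation. It does not call any Vespa API. The update rule (moving the cell part of the way toward the signal), the profile layout, and the item vectors are assumptions made for the sketch.

```python
def dot(a, b):
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def update_cell(profile, category, index, signal, learning_rate=0.5):
    """Nudge one cell of a category vector toward an observed signal, in place."""
    vector = profile.setdefault(category, [0.0, 0.0, 0.0])
    vector[index] += learning_rate * (signal - vector[index])
    return vector[index]


def rerank(items, profile, category):
    """Reorder already-retrieved items by affinity to the user's profile."""
    prefs = profile[category]
    return sorted(items, key=lambda item: dot(prefs, item["vector"]), reverse=True)


# Hypothetical dimensions: [blue, orange, green].
profile = {"shoes": [0.2, 0.0, 0.0]}
items = [
    {"name": "blue sneaker", "vector": [1.0, 0.0, 0.0]},
    {"name": "orange sneaker", "vector": [0.0, 1.0, 0.0]},
]

before = [i["name"] for i in rerank(items, profile, "shoes")]
# The user clicks an orange item: only the "orange" cell is touched.
new_value = update_cell(profile, "shoes", index=1, signal=1.0)
after = [i["name"] for i in rerank(items, profile, "shoes")]
print(before, after)
```

Only one number changes in the profile, yet the very next ranking reflects it, which is the contrast with a nightly batch recomputation.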

Dana Gardner: And let's look at the impact on the resources needed as we make these advances. So from an engineering and operations perspective, what's the real cost of running search, vector, and recommendation stacks as separate systems? What inefficiencies arise when organizations spread their search, vector retrieval, and inference across different services and databases?

Piotr Kobziakowski: When we have everything separate, even lexical and semantic search are separate. Then there is another re-ranker. Then another ranking platform that puts business weights on every single document. And then you have a recommender component that learns from user behavior, running nightly batches, or let's say hourly, or at any other interval.

Then you will have a feature store, which keeps users or, let's say, models, plus the model server and inference platform. When you think about it, many of these systems will have replicated data, right? So first of all, there might be inconsistency between these data sets, because when we do ranking, we may update our catalog later, or not in every system, or something will happen and the inconsistency will be there. So that's a heavy risk of having broken results. Then, obviously, latency. We already spoke about latency and calls. We are not looking at single calls. We are looking at thousands of calls per second. That generates a lot of data transfer across the network, and it heavily impacts P95 and P99 latency on the system, which is very important for the user experience. And then there is platform complexity.
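A rough way to see why many calls per request hurt P95 and P99 latency, sketched outside the conversation: if each call has a small chance of landing in its slow tail, a request that depends on several calls is slow whenever any one of them is. The sketch assumes the calls are statistically independent, which real systems only approximate, and the figures are illustrative rather than measurements.

```python
def slow_request_probability(calls, slow_fraction):
    """Chance that at least one of `calls` independent calls is slow."""
    return 1.0 - (1.0 - slow_fraction) ** calls


# Assumption: each backend call exceeds its own P99 latency 1% of the time.
SLOW_FRACTION = 0.01

single_system = slow_request_probability(1, SLOW_FRACTION)
five_services = slow_request_probability(5, SLOW_FRACTION)
print(f"1 call: {single_system:.2%} slow, 5 calls: {five_services:.2%} slow")
```

Under these assumptions, a request that touches five services hits at least one slow call about 4.9 percent of the time, so what was a P99 event for one service becomes roughly a P95 event for the whole request.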

You already mentioned at the beginning that innovation is key today, because the world has accelerated heavily, right? We see models every day. We see innovation every day. So, if we are not able to modify our path to compete with other systems, which are now being built by startups, that's a really bad thing, because we may lose our position as the leader and end up last if we are not competing.

We need to think about how to make this complex ecosystem much simpler, so that we can introduce changes and modifications every day. There is also the aspect of testing. Testing is not trivial, right? You need the ability to run, let's say, one schema of ranking, how you'll perform your operations. And Vespa also enables you to run multiple different ranking profiles just by setting a parameter, so you can create an almost unlimited number of ranking profiles and select among them to make the results comparable across different sets.
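One way the "select a ranking profile by setting a parameter" idea could support A/B testing is sketched below. It is an illustration under stated assumptions: the profile names and the hashing scheme are hypothetical, and the request is built as a plain dictionary rather than sent anywhere. The `ranking.profile` key follows the Vespa query API's parameter for choosing a rank profile, as best understood here, so check it against the current documentation before relying on it.

```python
import hashlib

# Hypothetical rank profile names; they would have to exist in the schema.
PROFILES = ["baseline", "personalized_v2"]


def assign_profile(user_id, profiles=PROFILES):
    """Deterministically map a user to one profile, so buckets stay stable."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    return profiles[digest[0] % len(profiles)]


def build_query(user_id, text):
    """Request parameters for one search, differing only in the rank profile."""
    return {
        "query": text,
        "ranking.profile": assign_profile(user_id),
    }


request = build_query("user-42", "orange sneakers")
print(request)
```

Because the assignment is a pure function of the user id, the same user always sees the same variant, which keeps the comparison between profiles clean.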

Dana Gardner: It certainly sounds like the implications of AI-driven commerce are forcing a reckoning of search almost from top to bottom. Let's talk about the migration, how you get from the current state to the next state. If you were leading a move from a legacy search stack to an AI native platform, how would you phase it? How would you go through crawl, walk, run in order to get there?

What do technical leaders need to modernize search and recommendation? You just can't rip and replace legacy systems. What are the practical steps to make this transition?

Piotr Kobziakowski: The biggest challenge is to bring personalization to the category pages. When you visit these websites, you will see products whose ordering is driven by business, recommendation, and personalization features, right? And this is usually not heavily implemented in the e-commerce space. I would start from the personalization component.

This personalization component requires a copy of the full catalog, as we already discussed, because these catalogs live in all of these components, right? So, you start doing the category pages, you start building the catalog, and at the same time you start thinking: can I use the same catalog in the same platform for semantic and lexical search?

By successfully moving the personalization, you realize that adding the search component is not really complex. It's a trivial task, because you can just build a new ranking profile, which will be responsible for search. And you already have the personalization component, which was built for the category page. If you fuse those two, you now have personalized search for the user.

So step by step, using this kind of approach, you'll have a really easy move. And of course, when you do category pages, you can start from single categories as well, see how they perform, measure the results, and then add more categories. Once that's finalized, you move to lexical and semantic search. But you don't start from lexical and semantic search, because you don't want to change things that work well today, right? You move that later, when you have implemented the personalization components. And then you will see that you can save by reusing the same data in the same platform. Over time, you will not need additional platforms to run around this.
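The phased rollout described above, category pages first and one category at a time, with search left on the existing stack until later, could be expressed as a simple routing rule. This is a hypothetical sketch: the backend labels, page types, and the set of migrated categories are placeholders, not part of any product.

```python
# Hypothetical rollout state: categories already served by the new platform.
MIGRATED_CATEGORIES = {"sneakers"}


def backend_for(page_type, category=None, migrated=MIGRATED_CATEGORIES):
    """Pick the backend for a request during a phased migration.

    Category pages move first, one category at a time. Search stays on the
    legacy stack until the personalization rollout is finished.
    """
    if page_type == "category" and category in migrated:
        return "new_platform"
    return "legacy"


print(backend_for("category", "sneakers"))  # already migrated
print(backend_for("category", "hats"))      # not migrated yet
print(backend_for("search"))                # search moves in a later phase
```

Adding a category to the set is then the whole rollout step, and removing it is the rollback.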

Dana Gardner: Before we close out, from a go-to-market perspective, Jürgen, do you think some business leaders are underappreciating the impact that the move toward AI has on search? What we've been hearing from Piotr sounds fairly involved. Do you think it's underappreciated on the business side what it takes to make search evolve along with AI?

Jürgen Obermann: Yes, we need to move to AI, but nobody really knows what the impact of doing AI is, because AI is a very, very wide kind of expression. What we see is that there is a push to do AI, but almost from a very high-level management perspective, for the sake of AI, without realizing what the impact is on the existing infrastructure.

For example, on the effect of using AI technology for e-commerce: some of our customers, in a one-to-one change of infrastructure from whatever they had before to Vespa, saw in certain categories up to a 20 to 25 percent increase in revenue, because of the better representation of their products, the easier access to their products, and the more personalized delivery of information to the customer.

The impact is profound. And I think where the gap is today is in understanding this from a business perspective. Yes, they want to do AI. But how do they get AI implemented in a way that is actually useful for the company? I think this is the challenge today, and it's where we also sometimes struggle, because AI is such a wide expression that you really need to clarify it. But once it's done, I have not seen any manager or product owner who would not be excited to have this type of implementation.

Dana Gardner: Piotr, regardless of the level of AI adoption, it certainly sounds like the usability and detail availability for digital commerce is in itself a force to change and improve your search capabilities.

What advice would you give technical leaders who want to deliver those usability and commerce benefits to the business? How should they start rethinking digital commerce architectures? How do you get the technical people to be able to deliver on these promises?

Piotr Kobziakowski: I advise that they look at the AI itself, right? AI is a very broad topic. We obviously have large language models (LLMs) that everybody is evaluating now with ChatGPT, Perplexity, and other systems, which do one thing. They understand and answer questions.

But there is a hidden fact about those LLMs. For example, they can be extremely good at extracting features or information from documents. So they can be used to improve the system's own understanding of what the items are, what they are for, and how they can be represented.

When you have these representations, obviously they will mostly be generated as tensors. So you need a platform that handles that very well. The whole ecosystem around those AI systems, LLMs, and other models has to be really cohesive, with all the required features working in tandem.

I suggest that they look at all of the parameters that we mentioned, such as latency, data availability, how updates can be done, and how we can combine it all into one single response to the user in the fastest possible way. They need to think about all of these aspects, and then when they look at Vespa, they will realize and understand that Vespa is not just a search engine, and not just a vector database, but a platform that delivers all of these components in one single system.

Dana Gardner: Thank you so much, Jürgen and Piotr, for uncovering some of the details and complexity involved with transitioning digital commerce. It was a pleasure to have you on the show.

Jürgen Obermann: Thank you.

Piotr Kobziakowski: Thank you.

Dana Gardner: For our audience, if you would like to learn more about what we covered today, please visit www.vespa.ai. If you enjoyed this discussion, we'll be back next week with another episode in our ongoing podcast series.

Until then, make sure you subscribe to this podcast on all major platforms, and follow the conversation on our social channels at EM360 Tech on X and LinkedIn. And for more insightful daily content, head over to em360tech.com. Thank you again.

(Vespa.ai supported the creation of this discussion).


