Tuesday, October 10, 2017

Inside story on HPC’s AI role in Bridges 'strategic reasoning' research at CMU

The next BriefingsDirect high performance computing (HPC) success interview examines how strategic reasoning is becoming more common and capable -- even using imperfect information.

We’ll now learn how Carnegie Mellon University and a team of researchers there are producing amazing results with strategic reasoning thanks in part to powerful new memory-intense systems architectures.

Listen to the podcast. Find it on iTunes. Get the mobile app. Read a full transcript or  download a copy. 

To learn more about strategic reasoning advances, please join me in welcoming Tuomas Sandholm, Professor and Director of the Electronic Marketplaces Lab at Carnegie Mellon University in Pittsburgh. The discussion is moderated by Dana Gardner, Principal Analyst at Interarbor Solutions.

Here are some excerpts:

Gardner: Tell us about strategic reasoning and why imperfect information is often the reality that these systems face?

Sandholm: In strategic reasoning we take the word “strategic” very seriously. It means game theoretic, so in multi-agent settings where you have more than one player, you can't just optimize as if you were the only actor -- because the other players are going to act strategically. What you do affects how they should play, and what they do affects how you should play.

Sandholm
That's what game theory is about. In artificial intelligence (AI), there has been a long history of strategic reasoning. Most AI reasoning -- not all of it, but most of it until about 12 years ago -- was really about perfect information games like Othello, Checkers, Chess and Go.

And there has been tremendous progress. But these complete information, or perfect information, games don't really model real business situations very well. Most business situations are of imperfect information.

Know what you don’t know

So you don't know the other guy's resources, their goals and so on. You then need totally different algorithms for solving these games, or game-theoretic solutions that define what rational play is, or opponent exploitation techniques where you try to find out the opponent's mistakes and learn to exploit them.

So totally different techniques are needed, and this has way more applications in reality than perfect information games have.

Gardner: In business, you don't always know the rules. All the variables are dynamic, and we don't know the rationale or the reasoning behind competitors’ actions. People sometimes are playing offense, defense, or a little of both.

Before we dig in to how is this being applied in business circumstances, explain your proof of concept involving poker. Is it Five-Card Draw?

Heads-Up No-Limit Texas Hold'em has become the leading benchmark in the AI community.
Sandholm: No, we’re working on a much harder poker game called Heads-Up No-Limit Texas Hold'em as the benchmark. This has become the leading benchmark in the AI community for testing these application-independent algorithms for reasoning under imperfect information.

The algorithms have really nothing to do with poker, but we needed a common benchmark, much like the IC chip makers have their benchmarks. We compare progress year-to-year and compare progress across the different research groups around the world. Heads-Up No-limit Texas Hold'em turned out to be great benchmark because it is a huge game of imperfect information.

It has 10 to the 161 different situations that a player can face. That is one followed by 161 zeros. And if you think about that, it’s not only more than the number of atoms in the universe, but even if, for every atom in the universe, you have a whole other universe and count all those atoms in those universes -- it will still be more than that.

Gardner: This is as close to infinity as you can probably get, right?

Sandholm: Ha-ha, basically yes.

Gardner: Okay, so you have this massively complex potential data set. How do you winnow that down, and how rapidly does the algorithmic process and platform learn? I imagine that being reactive, creating a pattern that creates better learning is an important part of it. So tell me about the learning part.

Three part harmony

Sandholm: The learning part always interests people, but it's not really the only part here -- or not even the main part. We basically have three main modules in our architecture. One computes approximations of Nash equilibrium strategies using only the rules of the game as input. In other words, game-theoretic strategies.

That doesn’t take any data as input, just the rules of the game. The second part is during play, refining that strategy. We call that subgame solving.

Then the third part is the learning part, or the self-improvement part. And there, traditionally people have done what’s called opponent modeling and opponent exploitation, where you try to model the opponent or opponents and adjust your strategies so as to take advantage of their weaknesses.

However, when we go against these absolute best human strategies, the best human players in the world, I felt that they don't have that many holes to exploit and they are experts at counter-exploiting. When you start to exploit opponents, you typically open yourself up for exploitation, and we didn't want to take that risk. In the learning part, the third part, we took a totally different approach than traditionally is taken in AI.

We are letting the opponents tell us where the holes are in our strategy. Then, in the background, using supercomputing, we are fixing those holes.
We said, “Okay, we are going to play according to our approximate game-theoretic strategies. However, if we see that the opponents have been able to find some mistakes in our strategy, then we will actually fill those mistakes and compute an even closer approximation to game-theoretic play in those spots.”

One way to think about that is that we are letting the opponents tell us where the holes are in our strategy. Then, in the background, using supercomputing, we are fixing those holes.


HPC from HPE
To Supercomputing and Deep Learning

Gardner: Is this being used in any business settings? It certainly seems like there's potential there for a lot of use cases. Business competition and circumstances seem to have an affinity for what you're describing in the poker use case. Where are you taking this next?

Sandholm: So far this, to my knowledge, has not been used in business. One of the reasons is that we have just reached the superhuman level in January 2017. And, of course, if you think about your strategic reasoning problems, many of them are very important, and you don't want to delegate them to AI just to save time or something like that.

Now that the AI is better at strategic reasoning than humans, that completely shifts things. I believe that in the next few years it will be a necessity to have what I call strategic augmentation. So you can't have just people doing business strategy, negotiation, strategic pricing, and product portfolio optimization.

You are going to have to have better strategic reasoning to support you, and so it becomes a kind of competition. So if your competitors have it, or even if they don't, you better have it because it’s a competitive advantage.

Gardner: So a lot of what we're seeing in AI and machine learning is to find the things that the machines do better and allow the humans to do what they can do even better than machines. Now that you have this new capability with strategic reasoning, where does that demarcation come in a business setting? Where do you think that humans will be still paramount, and where will the machines be a very powerful tool for them?

Human modeling, AI solving

Sandholm: At least in the foreseeable future, I see the demarcation as being modeling versus solving. I think that humans will continue to play a very important role in modeling their strategic situations, just to know everything that is pertinent and deciding what’s not pertinent in the model, and so forth. Then the AI is best at solving the model.

That's the demarcation, at least for the foreseeable future. In the very long run, maybe the AI itself actually can start to do the modeling part as well as it builds a better understanding of the world -- but that is far in the future.

Gardner: Looking back as to what is enabling this, clearly the software and the algorithms and finding the right benchmark, in this case the poker game are essential. But with that large of a data set potential -- probabilities set like you mentioned -- the underlying computersystems must need to keep up. Where are you in terms of the threshold that holds you back? Is this a price issue that holds you back? Is it a performance limit, the amount of time required? What are the limits, the governors to continuing?

Sandholm: It's all of the above, and we are very fortunate that we had access to Bridges; otherwise this wouldn’t have been possible at all.  We spent more than a year and needed about 25 million core hours of computing and 2.6 petabytes of data storage.

This amount is necessary to conduct serious absolute superhuman research in this field -- but it is something very hard for a professor to obtain. We were very fortunate to have that computing at our disposal.

Gardner: Let's examine the commercialization potential of this. You're not only a professor at Carnegie Mellon, you’re a founder and CEO of a few companies. Tell us about your companies and how the research is leading to business benefits.

Superhuman business strategies

Sandholm: Let’s start with Strategic Machine, a brand-new start-up company, all of two months old. It’s already profitable, and we are applying the strategic reasoning technology, which again is application independent, along with the Libratus technology, the Lengpudashi technology, and a host of other technologies that we have exclusively licensed to Strategic Machine. We are doing research and development at Strategic Machine as well, and we are taking these to any application that wants us.

                                                           HPC from HPE
Overcomes Barriers 
To Supercomputing and Deep Learning

Such applications include business strategy optimization, automated negotiation, and strategic pricing. Typically when people do pricing optimization algorithmically, they assume that either their company is a monopolist or the competitors’ prices are fixed, but obviously neither is typically true.

We are looking at how do you price strategically where you are taking into account the opponent’s strategic response in advance. So you price into the future, instead of just pricing reactively. The same can be done for product portfolio optimization along with pricing.

Let's say you're a car manufacturer and you decide what product portfolio you will offer and at what prices. Well, what you should do depends on what your competitors do and vice versa, but you don’t know that in advance. So again, it’s an imperfect-information game.

Gardner: And these are some of the most difficult problems that businesses face. They have huge billion-dollar investments that they need to line up behind for these types of decisions. Because of that pipeline, by the time they get to a dynamic environment where they can assess -- it's often too late. So having the best strategic reasoning as far in advance as possible is a huge benefit.

If you think about machine learning traditionally, it's about learning from the past. But strategic reasoning is all about figuring out what's going to happen in the future.
Sandholm: Exactly! If you think about machine learning traditionally, it's about learning from the past. But strategic reasoning is all about figuring out what's going to happen in the future. And you can marry these up, of course, where the machine learning gives the strategic reasoning technology prior beliefs, and other information to put into the model.

There are also other applications. For example, cyber security has several applications, such as zero-day vulnerabilities. You can run your custom algorithms and standard algorithms to find them, and what algorithms you should run depends on what the other opposing governments run -- so it is a game.

Similarly, once you find them, how do you play them? Do you report your vulnerabilities to Microsoft? Do you attack with them, or do you stockpile them? Again, your best strategy depends on what all the opponents do, and that's also a very strategic application.

And in upstairs blocks trading, in finance, it’s the same thing: A few players, very big, very strategic.

Gaming your own immune system

The most radical application is something that we are working on currently in the lab where we are doing medical treatment planning using these types of sequential planning techniques. We're actually testing how well one can steer a patient's T-cell population to fight cancers, autoimmune diseases, and infections better by not just using one short treatment plan -- but through sophisticated conditional treatment plans where the adversary is actually your own immune system.

Gardner: Or cancer is your opponent, and you need to beat it?

Sandholm: Yes, that’s right. There are actually two different ways to think about that, and they lead to different algorithms. We have looked at it where the actual disease is the opponent -- but here we are actually looking at how do you steer your own T-cell population.

Gardner: Going back to the technology, we've heard quite a bit from HPE about more memory-driven and edge-driven computing, where the analysis can happen closer to where the data is gathered. Are these advances of any use to you in better strategic reasoning algorithmic processing?

Algorithms at the edge

Sandholm: Yes, absolutely! We actually started running at the PSC on an earlier supercomputer, maybe 10 years ago, which was a shared-memory architecture. And then with Bridges, which is mostly a distributed system, we used distributed algorithms. As we go into the future with shared memory, we could get a lot of speedups.

We have both types of algorithms, so we know that we can run on both architectures. But obviously, the shared-memory, if it can fit our models and the dynamic state of the algorithms, is much faster.

Gardner: So the HPE Machine must be of interest to you: HPE’s advanced concept demonstration model, with a memory-driven architecture, photonics for internal communications, and so forth. Is that a technology you're keeping a keen eye on?


HPC from HPE
Overcomes Barriers 
To Supercomputing and Deep Learning

Sandholm: Yes. That would definitely be a desirable thing for us, but what we really focus on is the algorithms and the AI research. We have been very fortunate in that the PSC and HPE have been able to take care of the hardware side.

We really don’t get involved in the hardware side that much, and I'm looking at it from the outside. I'm trusting that they will continue to build the best hardware and maintain it in the best way -- so that we can focus on the AI research.

Gardner: Of course, you could help supplement the cost of the hardware by playing superhuman poker in places like Las Vegas, and perhaps doing quite well.

Sandholm: Actually here in the live game in Las Vegas they don't allow that type of computational support. On the Internet, AI has become a big problem on gaming sites, and it will become an increasing problem. We don't put our AI in there; it’s against their site rules. Also, I think it's unethical to pretend to be a human when you are not. The business opportunities, the monetary opportunities in the business applications, are much bigger than what you could hope to make in poker anyway.

Listen to the podcast. Find it on iTunes. Get the mobile app. Read a full transcript or  download a copy. Sponsor: Hewlett Packard Enterprise.


You may also be interested in: