The era of Microsoft exclusivity is over! Sam Altman's latest interview: Why must OpenAI partner with AWS?

  • OpenAI and AWS launch Bedrock Managed Agents, integrating cutting-edge models into AWS-native identity, permission, logging, and security systems, not just simple API calls.
  • Enterprise AI shifts from "model invocation" to "enterprise workflow" phase, focusing on building "virtual coworkers" that can execute tasks, access data, and respect permission boundaries.
  • Background: Microsoft and OpenAI revised their agreement, ending Azure's exclusive access, allowing OpenAI to offer models via AWS and other clouds.
  • Competition focus moves from "who has the best model" to "who can turn models into usable enterprise infrastructure," including permission management, data governance, and deployment.
  • Future trend: Deep binding of models, cloud, data, and enterprise permission systems to build a higher-level agent runtime environment.

Original author: Ben Thompson, Stratechery

Compiled by: Peggy, BlockBeats

Editor's Note: On April 27, OpenAI and Microsoft amended their cooperation agreement, with Azure no longer having exclusive rights to OpenAI models. This allows OpenAI to expand its products to other cloud platforms such as AWS.

Note: Azure is Microsoft's cloud computing platform, usually called Microsoft Azure. Like AWS and Google Cloud, it primarily provides enterprises with cloud services such as servers, databases, storage, networking, security, and AI model deployment.

To outsiders, this may seem like just a change in cloud service distribution channels; but judging from the discussion between Sam Altman and AWS CEO Matt Garman, the more crucial change is that AI is moving from "model invocation" to "enterprise-level workflow".

This article is a translation of an interview with Sam Altman and Matt Garman by the technology business analysis media Stratechery. It focuses on Bedrock Managed Agents, a product of the collaboration between OpenAI and AWS, and discusses the similarities between cloud computing and the migration of AI platforms, the challenges of deploying enterprise-level agents, the differences between AgentCore and managed services, and AWS's position in the competition for AI infrastructure.

Note: Stratechery was founded by technology analyst Ben Thompson and focuses on changes in technology company strategy, the platform economy, cloud computing, AI, and the media industry. Its content consists primarily of in-depth analysis and executive interviews, and it is highly influential in Silicon Valley's technology and investment circles, often regarded as an important window into the strategic moves of large technology companies.

The core of Bedrock Managed Agents is not just enabling AWS customers to use OpenAI models, but embedding those models into AWS's native identity, permissions, logging, governance, deployment, and security systems. In other words, what enterprises truly need is not a smarter chat window, but a system of "virtual colleagues" that can operate within the organization, access data, perform tasks, and adhere to permission boundaries.

This is also the most noteworthy aspect of this collaboration: the focus of AI competition is shifting from "who has the strongest model" to "who can turn the model into usable enterprise infrastructure." In individual developer scenarios, Codex can rely on local environments to solve many complex problems; however, in enterprise scenarios, the agent must deal with databases, SaaS, permission systems, security boundaries, and compliance requirements.

In a sense, this collaboration mirrors the early logic of cloud computing. AWS lowered startup costs for companies, enabling small teams to build internet products without having to build their own servers. Now, OpenAI and AWS are attempting to lower the barrier to entry for enterprises deploying AI agents, allowing them to integrate AI into real-world business processes without having to assemble models, permissions, data, and security systems themselves. The difference this time is that adoption is faster and enterprise needs are more urgent.

Therefore, this article is not really about "listing" OpenAI models on AWS, but rather about AI infrastructure entering the next stage: models, cloud, data, and enterprise access control systems are becoming deeply integrated. Future competition may no longer be about API prices, chip performance, or model rankings, but about who can build an AI platform that enterprises can confidently use, continuously expand, and truly execute.

The following is the original text:

Introduction

Good morning. As I mentioned yesterday, today's Stratechery interview is ahead of schedule in terms of my release timeline (from Thursday to Tuesday); however, it is actually delayed in terms of delivery time (from 6 a.m. ET to 1 p.m. ET) because the topic is subject to embargo restrictions.

Over the past few days, this embargo also put me in a somewhat delicate position: last Friday, I interviewed OpenAI CEO Sam Altman and AWS CEO Matt Garman about Bedrock Managed Agents, powered by OpenAI. Naturally, one question I raised was: how exactly does this collaboration reconcile with the agreement between OpenAI and Microsoft that granted Azure exclusive access to OpenAI models?

Note: Bedrock Managed Agents is a managed AI agent service offered by AWS, with model capabilities supported by OpenAI. It doesn't just allow enterprises to call OpenAI models on AWS; instead, it embeds models into AWS's native identity authentication, access control, logging, security, governance, and deployment systems. This enables enterprises to build AI agents within their own cloud environment that can perform tasks, access internal data, and adhere to permission boundaries. Simply put, it can be understood as OpenAI agent infrastructure running within an AWS enterprise environment.

Late Sunday, I heard a rumor that Microsoft would be making some announcements on Monday morning. I wondered if it would be a preemptive lawsuit!

On Monday, Microsoft and OpenAI announced that they had revised their agreement to allow OpenAI to offer its products on other cloud service providers, including AWS.

Thus, this interview came about.

I believe this new arrangement between Microsoft and OpenAI is reasonable for both parties. Here are the key points of the new agreement as listed in Microsoft's official article:

• Microsoft remains OpenAI's primary cloud partner; OpenAI products will launch primarily on Azure unless Microsoft is unable or chooses not to support the necessary capabilities.

• OpenAI can now offer its full range of products to customers through any cloud provider.

• Microsoft will continue to license OpenAI's models and product-related IP until 2032; however, Microsoft's license will no longer be exclusive.

• Microsoft will no longer pay revenue sharing to OpenAI.

• OpenAI's revenue-share payments to Microsoft will continue until 2030. This arrangement remains in place regardless of OpenAI's technological progress, but the total amount is capped.

• Microsoft, as a major shareholder, will continue to be directly involved in OpenAI's growth.

I believe the last point is the most important. Previously, Azure did have a genuine competitive advantage as the only hyperscale cloud provider able to offer OpenAI models. However, that exclusivity was also limiting OpenAI, especially as more and more enterprises prioritize access to models on the cloud platform they already use. I've pointed out many times that this was a key competitive advantage for Anthropic. In other words, Azure's exclusivity was actually harming Microsoft's investment in OpenAI. Given Anthropic's rapid growth this year, Microsoft must safeguard that investment, even if it means diminishing Azure's differentiation.

At the same time, OpenAI clearly sees AWS as a huge opportunity—so big that it's willing to forgo a portion of its Azure-related revenue for the next few years. Combined with the previous point, this also makes it easier for Azure management to accept losing its exclusivity: after all, Azure's profit and loss statement will look much better without paying OpenAI a share of its revenue. OpenAI also freed Microsoft from the AGI terms; now, whatever happens, the agreement between the two companies will last until 2032.

It's now fairly clear that OpenAI's next focus will be on AWS. And the strongest evidence for this is the subject of this interview: Bedrock Managed Agents, powered by OpenAI. The simplest way to understand this product is to think of it as the AWS equivalent of Codex. Codex works well largely because it's localized, which naturally solves many complex problems, especially security issues. But enabling agents to operate across departments and systems within an organization is a completely different matter. The goal of this product is to make it easier for organizations that already have most of their data on AWS to use this type of workflow.

In this interview, we discussed how AWS pioneered the entire cloud computing category and its impact on startups; we also explored the similarities and differences between AI and that paradigm shift. We then discussed Bedrock Managed Agents: what it is and how it differs from Amazon's existing AgentCore product. We also talked about Trainium, why chips aren't that important for most AI users, and why collaboration is a logical choice compared to Google's emphasis on full-stack integration.

Just a reminder, all Stratechery content, including interviews, is available to listen to on podcasts; click the link at the top of this email to add Stratechery to your podcast player.

Now, on to the interview.

Interview content

This interview has been lightly edited to improve clarity.

OpenAI enters AWS, ending Azure's era of exclusivity.

Ben Thompson (Host): Matt Garman, Sam Altman—Matt, welcome to Stratechery; Sam, welcome back. I previously interviewed Altman in October 2025, March 2025, and February 2023.

Sam Altman (OpenAI CEO): Thank you.

Matt Garman (AWS CEO): Thank you, thank you for the invitation.

Host: Matt, this is your first time on Stratechery. Unfortunately, I think Sam's presence will prevent us from doing our usual "Meet the Guest" segment. Besides, he probably doesn't want to hear us reminiscing about our time at Kellogg. However, it's still great to have an alumnus on the podcast.

Matt Garman: Yes, I'm happy to be here. I hope to come again next time so we can talk in more depth.

Host: That's great. You've been involved with AWS since your internship, and now you're leading the entire AWS organization amidst the AI wave. In your opinion, what are the similarities and differences between building the AI business and building the initial general computing business—let's put it that way for now?

Matt Garman: I think the similarity lies in the same excitement I see, and the same ability for internet builders to start doing things they couldn't do before. One of the cool things about AWS when we first started was that developers suddenly had access to infrastructure that was previously only available to the largest companies. In the past, only companies with millions of dollars in budgets to build data centers could afford this. Now, developers only need a credit card and a few dollars to launch an application. This has dramatically expanded what internet builders can do.

Our idea was that people could build whatever they wanted. We wouldn't presuppose what they should do. We believed that creativity exists all over the world; given powerful tools, they would create interesting and amazing things.

I believe AI's empowerment of builders is at least as transformative, and perhaps even more so. Think about what's become possible now: you don't need to spend ten years learning programming to build an application; you don't need a huge team of hundreds of people, or months upon months to develop something. You can build quickly and iterate rapidly with small teams. AI is unlocking innovation across all sectors of the world. In many ways, it's very similar to what happened in the past. Seeing the capabilities it brings to our customer base is truly exciting.

Host: However, when AWS emerged, you were the only player, so both the advantages and disadvantages naturally fell on you in a sense. Do you get the feeling that in the AWS era, many things were about general-purpose computing—making computing replaceable, flexible, and cheap; but in the AI field, especially in the training phase, the winning abstraction seems more like a highly vertically integrated supercluster, a very advanced network, and extremely tight interaction between software and hardware? Did that catch you off guard? Because this time, you weren't starting from scratch, nor were you "the only ones here"; you had a particular understanding of large-scale computing from the past, but at least in the early years of AI, it didn't seem to align perfectly.

Matt Garman: I'm not sure how different this is for us. I think what's really different is the astonishing speed of adoption. I think that might have surprised everyone. Sam, feel free to add if you disagree. But the speed at which people have embraced these capabilities, and the speed at which they've grasped them, I think, has exceeded everyone's expectations.

This is very different from when we first started working on cloud computing. Back then, we spent a very long time explaining why a bookseller would provide computing power. We had to put in a lot of effort explaining what cloud computing was. There was a lot of hard work involved, which people often forget now. But in 2006, nobody took it for granted that the world's computing would migrate to the cloud. There was indeed a lot of difficult explaining and pushing work back then.

Host: So do you think we need to do some explanation now? Because many people were initially anchored in the training era, while you would say, "We're thinking about the inference era," which is something else entirely. Do you need to exercise that explanatory muscle again?

Matt Garman: Yes, it's necessary, but the speed at which people understand what you're saying is completely different now. So I think, yes, when you're getting people from "this thing looks cool, I can talk to a smart chatbot" to "it can actually get the job done in your business," there is definitely an educational process. But in terms of the speed of technological evolution, this process has been relatively fast.

Host: I promise we'll get right to today's product topic. But Sam, looking back from the perspective of the startup ecosystem, AWS was clearly revolutionary; it completely changed the barriers to starting a business. Now anyone can start a business. Seed rounds and angel investors emerged, and the point at which you needed serious funding got pushed later. You don't need to write "We need to buy servers" in your PowerPoint presentation; you can build an application first, and then raise a Series A or other rounds.

From your perspective, what are the differences and similarities between the world opened up by AWS and the world opened up by AI today?

Sam Altman: I believe there have been four platform-driven moments in history that massively empowered startups: the internet, the cloud, mobile, and AI. Of these four, the first I experienced as an adult was the cloud. In the early days of Y Combinator, it was difficult to exaggerate how much it changed startups.

Before that, startups had to rent colocation space, buy their own servers, and rack the equipment themselves. It was an extremely complex process, and you had to raise a lot of money first. Then suddenly, the cloud appeared, although it emerged after Y Combinator was founded, probably the following year.

Host: That's exactly what I wanted to ask—ultimately, are YC and the cloud more inseparable than you realized at the time?

Sam Altman: At the time, we felt that they were highly intertwined. It felt like YC was riding the cloud wave from the very beginning, because there were already some early examples of cloud services before AWS.

Host: If AWS exists, then the amount of money needed to get a startup started is indeed much less than before.

Sam Altman: This was a massive enabling change, which is why YC sounded so crazy at the time. People would say, "It's impossible to invest tens of thousands of dollars in a startup, it's simply impossible, the server costs alone exceed that amount." So this completely changed what startups could do with a small amount of capital.

Generally speaking, startups win when there's a major platform shift and you can do things with a faster cycle and less capital than before. This is the classic way startups beat big companies. Early in my career, I witnessed this kind of change firsthand with the cloud. Now, looking at companies building products based on AI, the direction feels very similar. But as Matt said, this time the pace is insane.

Host: Is it also the case that existing large companies and industry incumbents are adopting AI much faster than they adopted cloud computing back then?

Sam Altman: This is certainly more common. But I'm also referring to the rate at which startups are growing their revenue. I recently spoke at YC, and at the end I asked, "What are people's revenue expectations for a good company at the end of YC?" They said, "That answer changes every month. The answer might be different at the beginning and end of the same YC batch." This has never happened before. The speed at which people are building businesses at scale on this new platform is something I've never seen before.

Host: Matt, throughout the cloud era, AWS was basically the cloud of choice for all startups, which gave you a huge advantage. So what makes you still the cloud of choice today? Is it because many people are now building products on the OpenAI API; or do you feel, "We're entering this market from a very different angle. We have a huge existing customer base that's asking us to provide AI capabilities, but we don't have the visibility into this startup community that Sam does"?

Matt Garman: I think there are several aspects to this. First, we are very excited about this collaboration, and I believe it will be very important for many startups. But even today, if you talk to startups, most of the ones that are expanding are still expanding on AWS, and there are many reasons for that. The scale is there, the availability is there, the security is there, the reliability is there, the ecosystem of other ISV partners is on AWS, and the customers are on AWS.

Host: (laughs) Whether they like it or not, everyone has used the AWS console, so they're used to it.

Matt Garman: And we help them. We spend a lot of time empowering startups, not just giving them credits, but also advising them on how to build systems, how to think about go-to-market, and a lot of things like that. I think a lot of startups really appreciate that. We invest a lot of time and effort to make that happen because we truly believe that startups are the lifeblood of AWS. That's been true from the beginning, and as Sam just mentioned, it still is today. I still go to Silicon Valley or elsewhere every quarter to meet directly with startups, hear what they're doing, and make sure that what we're building truly meets their needs.

So, the competition for startups' attention is definitely greater today than it was 20 years ago. But it remains just as important to us as it has always been. We invest a significant amount of time in ensuring we meet the needs of these startups.

Host: Is it fair to say that those who build products directly based on the OpenAI API, rather than using the Azure version of OpenAI services, are more likely to adopt a technology stack where regular computing runs on AWS and the AI portion uses OpenAI?

Matt Garman: I think this is a very common model for many startups today, absolutely.

Bedrock Managed Agents: Bringing AI Agents into Enterprise Workflows

Host: This brings us to today's announcement: Bedrock Managed Agents, powered by OpenAI. Correct me if I'm wrong, but as I understand it, the selling point of this product isn't just that OpenAI models can be used on AWS—which, until this week, wasn't even allowed—but that OpenAI's cutting-edge models are wrapped in a native AWS agent runtime, including identity, permissions, state, logging, governance, and deployment. Sam, is that accurate?

Sam Altman: Yes, that's a good summary.

Host: Thank you. So what exactly is this? Please explain it in plain language.

Sam Altman: I think the next stage of AI will move from "you give an agent some text and get more text back," or even "you give it a bunch of code and get more code back," to a new stage where these agents will run within companies to do all sorts of different types of work.

"Virtual colleagues" is the closest description I've ever heard, but no one has truly found the most accurate language to describe it yet. We're working together to build a new product to help companies that want to build these stateful agents actually create them and make them usable. Again, I don't think we know yet how the world will ultimately talk about these agents or how they will be used. But if you look at what's happening at Codex, I think that's a great example of where this is headed.

Host: For an AI agent to truly function, a model alone isn't enough. It also needs a complete supporting system: a runtime environment, callable tools, task state, memory, access control, and performance evaluation. You specifically mentioned the word "state." So, how crucial are these external infrastructure components to the agent's actual functionality?

Sam Altman: Its importance cannot be overstated. I no longer see the Harness and the model as two completely separate things. From my own experience, when I start a task in Codex and it accomplishes something amazing, one thing is very clear: I don't always know how much credit should go to…

Host: Is it the model that's strong, or the supporting system (Harness)?

Sam Altman: Yes, that's exactly right.

Host: To what extent was the Harness system developed alongside the model? Where did this integration occur? Was it during post-training? During the prompt? What exactly made this integration effective?

Sam Altman: Both. It's not really part of the pre-training process. But I would say there's something more interesting here: we've seen many times in the past that things we initially thought were very separable are being baked deeper and deeper into the system.

For example, our initial understanding of tool-calling. Now it's a crucial part of how we use these models, but initially, we didn't think it needed to be deeply integrated into the training process. Over time, we've done more and more of that.

I also suspect that models and their supporting systems (Harness) will become increasingly integrated over time. Furthermore, I expect pre-training and post-training will eventually become more integrated as well. This may sound like a cliché, but I'll say it anyway because I believe it's very, very true: we are still in a very early stage of this paradigm. As an industry, we're probably only about as far along as personal computing was in the Homebrew Computer Club era.

Host: That's why I find this very interesting. I wrote a few weeks ago that in any value chain, there eventually comes a point of integration, a crucial point because two parts must come together for things to function. Over time, a lot of value obviously settles at that point. My assessment at the time was that the integration of the harness and the model is that crucial point. This certainly aligns with your interests, but it sounds like you agree with that assessment.

Sam Altman: That certainly aligns with my interests, and I do agree. But I would go a step further: what really matters is that you type what you want to happen in Codex, and it actually happens.

Host: You don't care about the implementation details.

Sam Altman: We've seen far too many examples of this in our exploration: things that initially had to be handled at the system-prompt level later no longer needed to be. The overall observation here is that as models become smarter, you have greater flexibility to make them behave the way you want. It sounds obvious, but it really is…

Host: It's easier to ask a 10-year-old to do something than a 5-year-old to do something.

Sam Altman: When I think back to the GPT-3 era and everything we had to do to squeeze even a little usability out of those models, and then look at the present, you don't need to do any of that, because the models can understand and do things right out of the box. This trend will likely continue.

Matt Garman: I'd like to add something. I completely agree with Sam. And when you talk to clients, they actually know exactly what they want these systems to do. Before this collaboration, clients were, to some extent, forced to piece things together themselves. They wanted these models and agents to remember certain things, to collaborate well, and to integrate into their existing systems. And this wasn't just a matter of third-party tools; it included their own tools as well. They wanted these agents to understand their own data, their own applications, and their own operating environment. And today, at least for now, all of this integration work has to be done by each client themselves.

So, part of our collaboration is to build a new type of product that brings these elements closer together, making it easier for customers to accomplish what they want. For example, identity capabilities are already built into the product; the ability to connect to databases and complete authentication will also be done within your AWS VPC, or Virtual Private Cloud. Theoretically, these things could also be done with the OpenAI API on one side and AWS on the other. But by building this together, we enable customers to realize value more easily and quickly, and to accomplish what they want in their own enterprise environments.
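Note: To make the "permission boundaries" idea concrete, below is a minimal, hypothetical sketch of how an enterprise might pre-provision a narrowly scoped execution role for an agent using boto3. The role name, bucket, and secret ARN are placeholders, and the actual Bedrock Managed Agents setup API is not shown here; the sketch only reflects the pattern Garman describes, in which the agent assumes a role and that role spells out exactly what it may touch.

```python
import json
import boto3

iam = boto3.client("iam")

# Trust policy: allow the Bedrock service to assume this role on the agent's
# behalf. "bedrock.amazonaws.com" is the service principal Bedrock agents use;
# every other name below is an illustrative placeholder.
trust_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": {"Service": "bedrock.amazonaws.com"},
        "Action": "sts:AssumeRole",
    }],
}

iam.create_role(
    RoleName="orders-agent-execution-role",  # placeholder name
    AssumeRolePolicyDocument=json.dumps(trust_policy),
    Description="Narrowly scoped execution role for a managed agent",
)

# The permission boundary in practice: the agent may read one bucket prefix
# and one read-only database credential, and nothing else. Anything outside
# this policy is denied by default.
scoped_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["s3:GetObject"],
            "Resource": "arn:aws:s3:::example-orders-data/reports/*",
        },
        {
            "Effect": "Allow",
            "Action": ["secretsmanager:GetSecretValue"],
            "Resource": "arn:aws:secretsmanager:us-east-1:123456789012:secret:orders-db-readonly-*",
        },
    ],
}

iam.put_role_policy(
    RoleName="orders-agent-execution-role",
    PolicyName="orders-agent-scope",
    PolicyDocument=json.dumps(scoped_policy),
)
```

The agent would then be configured to assume this role and to reach the database through endpoints inside the customer's VPC, which is what keeps data and credentials inside the AWS boundary described above.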

Host: So you're saying it's possible to build a working agent with a general-purpose supporting system (Harness) today, just much more difficult? Are you making it easier, or are there things that simply can't be done unless these pieces are bundled together?

Sam Altman: Going back to your earlier analogy: before AWS existed, if you were willing to stand in a cage at a colocation facility, buy a bunch of servers, figure out how to wire them together, and hire your own network engineers, you could do a lot of things. You could make a lot happen. Then suddenly, you just needed to log into the AWS console and click something like "I need another S3 instance," and you could do even more, because the activation energy and the work required for the basics had dropped dramatically.

Today, you can certainly do a lot with models. But every time I see someone using our models, or trying to build the workflows Matt just mentioned, I get conflicted. On one hand, I'm glad they find these models impressive, like some kind of magical technology; on the other hand, I'm almost driven crazy because they've gone through so much pain and struggle to get anything actually working.

This applies not only to the developers who build these products. Even just using ChatGPT, I see people copying and pasting from here to there, trying to create a complex prompt, and I know that will all disappear, which excites me. Right now, everything is still too early and too bad.

Host: Just don't remove its integration with BBEdit. That's my favorite feature of the ChatGPT app, bar none.

Note: BBEdit is a long-established text and code editor for macOS. The host is half-jokingly saying that while the AI agent will reduce copy-pasting and manual operations in the future, he still hopes ChatGPT will retain the ability to integrate with the local editor.

Sam Altman: Okay.

Host: (Laughs) Thank you.

Sam Altman: First, these things are just too difficult to do right now. We believe that if we make them much easier, it will bring more value to developers and businesses. Second, there are many things that simply can't work reliably right now. So I think this collaboration won't just be a story about ease of use, about "not having to build your own colo anymore." We will also explore many new things together, enabling people to build products and services that previously weren't possible even with a lot of pain and struggle.

The real challenge for enterprise agents lies in permissions, data, and security.

Host: I'd like to return to the point of "what can be built" later. But let's quickly go back to Codex. Codex is a harness plus a model, and it runs locally. Why is it easier to have agents working locally now?

Sam Altman: Actually, we initially had it running in the cloud. I think ultimately you do want it to run in the cloud.

Host: Of course. I'm asking this by following the path of transitioning to this cloud product. But why did you go back to local?

Sam Altman: Because your whole environment is already there. Your computer is already configured, your data is already there, and you don't have to think about too many things. While this isn't the final state, it's definitely easier to get things running.

However, entering a world where agents actually run in the cloud would obviously be great. For example, if you have a very demanding task, or need to shut down your computer, or in other situations, you could delegate the work to the cloud. This direction would certainly be fantastic. But in the short term, the ease of use we can offer is clearly still superior when using the user's local environment.

Host: I have a way of understanding this: the old security models were more like a "castle and moat" model, while now you are moving towards a new zero-trust security model, where everything has a proper permission structure, authentication mechanism, and all these details. To me, running locally is somewhat like a self-imposed "castle and moat": everything is local, so I assume they are all fine and easy to handle.

One way I understand this product is that to get all these parts truly operational in a production environment, you can't have them all on-premises. You have to be running in that environment from the start. Matt, does that make sense to you?

Matt Garman: I don't think any computing environment has truly gotten rid of the client. Running locally does have its advantages. There's a reason why most of your iPhone apps also have local components, whether it's for connectivity, latency, local computing, or access to files and apps.

Local clients certainly have their limitations. As Sam points out, they're simple and work well, but they're also restricted and have boundaries. You can't scale your local laptop; you have what you have. Things get more complicated once you move into enterprise scenarios, like sharing between two people; thinking about permissions and security boundaries becomes much more difficult.

So there are many parts to this. I wouldn't say a local environment is a bad thing, it's just another thing. I think eventually you'll want to build a bridge between local and the cloud.

Host: That's exactly my question. In the cloud era, containers help bring your local and production environments closer together. But in the agent scenario, it sounds like you're dealing with agents like the virtual colleagues you just mentioned, or something similar. If they have their own identities, their own permissions, and all of that, then even just to build them you need to be in the environment where they'll ultimately be deployed. That's how it seems to me, anyway.

Sam Altman: I think there's still a lot to figure out here. For example, if you're an employee of a company, when you use a service, should you only have one account? And should your agent also use your account? Or should your agent use a different account so the server can tell who it is?

Host: Or, what if you want a lot of agents?

Sam Altman: Exactly. I suspect what we really need is something we haven't figured out yet. Maybe when Ben's agent logs in as Ben, it uses Ben's account but identifies itself as an agent, not the real Ben. We don't even have a basic concept to think about this yet, but we'll probably have to figure it out soon.

My feeling is that there will be another 50 similar things. As agents join the workforce and act with increasing autonomy and task complexity, many of our mental models about how software works, and how access control and permissions operate within companies and on the wider internet, must evolve.

Host: Matt, what are your thoughts on agent security, access policies, and similar issues?

Matt Garman: Yes, I do think that when you migrate more of these kinds of workloads to the cloud, as a centralized organization you can exert more control over the security-related aspects. We've been talking to customers, and this is definitely something they're concerned about. They'll say, "I love the prospect of these powerful models and agents, but how do I make sure I don't mess things up and cause an incident that could end the company?"

This concern is real.

I think we can help in this area because these problems are solvable. Indeed, we can. I think we can give customers some confidence: for example, "It's running inside this VPC," then you at least have control over the boundaries and know what it can access; or it's through a gateway, and you can assign it permissions, just like you would assign it a role elsewhere in the environment.

These are capabilities we've built over the past 20 years. We've built a very rich set of capabilities around these structures, enabling not only Y Combinator startups to use AWS, but also banks, healthcare organizations, and government agencies around the world. The entire security architecture built around AWS, I believe, will help us further accelerate our customers' adoption of this technology while providing them with the security safeguards they need to act quickly.

Often, in a company, especially in strongly risk-averse industries, having these safety barriers allows them to say, "As long as it runs in this sandbox, I'm willing to move forward quickly," which can actually help many customers start using these technologies in a wider range of scenarios.

Host: Many of the capabilities you just mentioned were built by AWS over the past 20 years, and you're now trying to use them for agents. These capabilities are already exposed today through AgentCore. So, what is the relationship between Bedrock Managed Agents, powered by OpenAI, and Bedrock AgentCore?

Note: AgentCore can be understood as a "bottom-level toolkit" or "basic component platform" provided by AWS for enterprises to develop AI agents. The relationship between the two can be understood as follows: AgentCore = the underlying building blocks; Bedrock Managed Agents = the finished solution assembled by AWS and OpenAI.

Matt Garman: A lot of the things we built together were based on AgentCore building blocks, putting those parts together.

Host: So it's kind of like a superset on top of AgentCore?

Matt Garman: The AWS team and the OpenAI team used AgentCore components together, combined with OpenAI models and many other parts, to build this product.

AgentCore can be understood as a set of basic building blocks we provide. Just like on AWS, if you want to build your own agent workflow, you can directly use these modules: such as memory components, secure execution environments, and permission management capabilities. You can configure these capabilities yourself and combine them to create an agent system suitable for your business. Some customers are already running these capabilities in production environments and have created many cool applications.
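Note: The division of labor Garman describes (memory, a secure execution environment, and permission management as separate building blocks that the builder wires together around a model call) can be sketched in a few lines of Python. The classes below are purely illustrative stand-ins, not the AgentCore SDK.

```python
from dataclasses import dataclass, field

@dataclass
class Memory:
    """Stateful store the agent reads and writes across steps."""
    events: list = field(default_factory=list)

    def remember(self, note: str) -> None:
        self.events.append(note)

class PermissionGuard:
    """Allows an action only if it falls inside the agent's granted scopes."""
    def __init__(self, allowed_scopes: set):
        self.allowed_scopes = allowed_scopes

    def check(self, scope: str) -> bool:
        return scope in self.allowed_scopes

class Sandbox:
    """Stand-in for a secure execution environment that runs tool calls."""
    def run(self, tool: str, args: dict) -> str:
        return f"ran {tool} with {args}"  # placeholder result

def agent_step(task: str, memory: Memory, guard: PermissionGuard, sandbox: Sandbox) -> str:
    # A real system would ask the model which tool to call; hard-coded here.
    tool, scope = "query_orders_db", "db:read"
    if not guard.check(scope):
        return f"denied: agent lacks scope '{scope}'"
    result = sandbox.run(tool, {"task": task})
    memory.remember(f"{task} -> {result}")
    return result

if __name__ == "__main__":
    memory = Memory()
    guard = PermissionGuard(allowed_scopes={"db:read"})  # no write access granted
    print(agent_step("summarize yesterday's orders", memory, guard, Sandbox()))
```

A managed offering, by contrast, assembles and operates pieces like these for the customer instead of leaving the wiring to each team.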

Host: But not with OpenAI.

Matt Garman: But not with OpenAI. Today they have to use different models, that's true. Actually, no, that's not entirely true either. We do have people doing this with OpenAI.

Host: Oh, it's just calling another model on the cloud, or something similar.

Matt Garman: They're just calling OpenAI models directly. So people are definitely using OpenAI for this today, just not in the native Bedrock way, but they're still using it. It's an open ecosystem; you can pull different capabilities to build whatever you want. I bet people will continue to do that.

There are some builders out there who—to borrow Sam's analogy—still enjoy assembling their own computers at home, even though it's no longer necessary today. People love building. We believe people will continue to build their own agents for a long time to come. But the vast majority will want an easier way of doing things; they don't want to configure all those parts themselves. That's one of the things we're launching in this collaboration.

Host: I'd like to clarify this distinction further. Bedrock Managed Agents is a managed service; however, users can also use AgentCore to connect to different models, regardless of whether the model is on AWS or another cloud. Sam, does this constitute the difference between it and OpenAI's Azure service? Simply put, on Azure, users primarily access the OpenAI API directly; while on Amazon, it's a more complete managed agent service. Is this understanding correct?

Sam Altman: Correct, yes.

Host: You're very confident about this? That it's properly defined in terms of all the clauses and scope, and it won't become a problem down the road?

Sam Altman: Yes. I think things will evolve over time, but as a starting point, I'm very confident in this approach.

Host: Will this be an AWS exclusive? Or do you also expect to offer a similar managed experience on other clouds?

Sam Altman: Yes, we will be doing this exclusively with Amazon, and we are very excited about it.

Host: How much of this exclusivity comes from "Look, we use all of Amazon's APIs, so of course it's only on Amazon"? Or is it not simply "we use Amazon APIs," but rather that the managed-experience concept as a whole will, for now, live on Amazon?

Sam Altman: In spirit, we hope this is a collaborative effort between the two companies.

Host: I understand. The press release mentioned something, which goes back to what Matt just said: theoretically, you can call other APIs and then glue everything together yourself. But in this case, the customer data will remain within AWS. So what exactly can OpenAI see? What does that mean?

Matt Garman: Yes. The whole thing basically stays in your VPC, so the data is protected within the Bedrock environment.

Note: VPC stands for Virtual Private Cloud, which can be understood as a "private cloud network space" that an enterprise allocates within AWS.

Host: I understand. This product will run on OpenAI models via Bedrock, and these models will run on Trainium, right?

Matt Garman: They'll run as a mix—part on Trainium, and part on GPUs.

Note: Trainium is AWS's self-developed AI acceleration chip used to support large model training and inference. Similar to NVIDIA GPUs, it belongs to the underlying computing infrastructure. For ordinary enterprise customers, they typically don't need to directly interact with Trainium, but rather indirectly use the underlying computing power through managed services like Bedrock.

Host: Is this just due to the time factor? Because I remember you mentioned in your announcement a few months ago...

Matt Garman: Part of it is time, and part of it is capability. I think we will use a mix of different components and appropriate infrastructure for different parts as we build the system together. But over time, more and more parts will run on Trainium.

Sam Altman: We are very much looking forward to running these models on Trainium.

AI platform competition is shifting from models to infrastructure.

Host: I can imagine. Matt, I have a quick question about Trainium, which is also a more general one. This is how I currently understand Trainium, and I'd like to confirm whether it's right. The name Trainium is quite unfortunate because its real importance going forward will actually be in inference, and it will mostly be surfaced through managed services like Bedrock. That is to say, customers may not even know exactly what computing resources they're using. Is that understanding fair?

Matt Garman: First of all, I am willing to take responsibility for the terrible naming of all AWS services.

Host: That's okay. I run a word-of-mouth website called Stratechery, so I totally understand bad naming.

Sam Altman: I think the word "trainium" is pretty cool.

Matt Garman: It's really cool.

Host: That's a pretty cool term, but it feels more like an inference chip than a training chip.

Matt Garman: Yes. But putting the naming aside, it's useful for both training and inference. Honestly, this is a chip that excites us a lot. We believe it will be a huge business, both in the current generation and future versions, and a major driving force behind many things we're working on together.

By the way, I think that, like with GPUs, you'll interact with many of these accelerator chips through an abstraction layer. Most customers don't actually interact directly with GPUs, unless they're using them for graphics-related scenarios on their laptops. But when you interact with OpenAI, even if it runs on a GPU, you're not talking to the GPU; when you talk to Claude, regardless of whether it's based on a GPU, Trainium, or TPU, you're not talking to those chips, you're talking to the interface.

The vast majority of inference is performed by a few models. So whether it's 5, 10, 20, or 100 models, it's not millions of people directly programming these chips. This will continue in the future because these systems are too complex and too large. If you want to train a model, not many people have enough money to train it, and not many people truly have the ability to manage them. They are extremely complex systems, and the OpenAI team's ability to extract value from large computing clusters is astonishing. But not many people have such a team. Regardless of the specific chip, I believe this applies to all accelerator chips.

Sam Altman: Ben, I'm increasingly feeling that what we, as a company, need to do is become a token factory. But what customers really care about is that we can deliver the best intelligence units at the lowest price, and in the quantity and capacity they want.

Host: Do you think we will continue with our current pricing method, that is, pricing by tokens? Is this reasonable in the long run?

Sam Altman: No, not in the long run. In fact, our recently released 5.5 model is an interesting example. Its cost per token is much higher than 5.4's, but it requires significantly fewer tokens to complete the same answer. Essentially, you don't care how many tokens the answer costs; you just want the work done. What you want is a price and the capacity you can get.

So maybe I was wrong when I said "token factory." We're more like an intelligence factory, or something like that. We want to provide as many "intelligence units" as possible at the lowest possible price. Whether it's a larger model running fewer tokens or a smaller model running many tokens; whether it's GPUs, Trainium, or something else; or whatever other creative way we do it, I don't think customers care.

In fact, they won't directly handle these things at all. When you put things into Codex, or build a new agent in an SRE (Stateful Runtime Environment), you shouldn't need to think about these issues at all. You should just be surprised at how much you've gotten at such a low cost.
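Note: A quick worked example of the point about per-token price versus cost per task. The figures below are hypothetical, not actual OpenAI pricing; they only show that a model can charge more per token and still be cheaper for the same piece of work.

```python
# Hypothetical pricing: the newer model charges more per token,
# but spends far fewer tokens to finish the same task.
old_price_per_1k = 0.010   # USD per 1,000 tokens, illustrative only
new_price_per_1k = 0.015   # USD per 1,000 tokens, illustrative only

old_tokens_per_task = 12_000
new_tokens_per_task = 5_000

old_cost = old_tokens_per_task / 1_000 * old_price_per_1k   # $0.120
new_cost = new_tokens_per_task / 1_000 * new_price_per_1k   # $0.075

print(f"older model: ${old_cost:.3f} per task")
print(f"newer model: ${new_cost:.3f} per task")
# Per-token price rose 50%, yet the cost per completed task fell about 38%.
```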

Host: Is the decrease in token usage due to the model itself, or the supporting system (Harness)?

Sam Altman: It's mainly the model, but the supporting system (Harness) also plays a small part.

Host: I understand. Matt, by the way, I just asked Sam about the exclusivity issue. Do you anticipate providing similar managed services for other models in the future?

Matt Garman: We're focused on doing this with OpenAI right now. We're very excited about what we're building together. As for the longer term, that's a long time.

Host: "The longer-term future is a very long time," I'll let you reserve that answer for now. No problem, I have to ask this question.

Regarding customers, I have another question. Sam, considering your points just now, I'd also like to hear your perspective. When customers actually go into production, where does OpenAI's responsibility end and where does AWS's responsibility begin? To me, if all the data is on AWS and stays there, and the customer is operating at a higher level, then ultimately it's AWS's responsibility? From the customer's perspective, is that understanding correct?

Matt Garman: Yes, I think that's correct. When you need to contact someone, you contact AWS support for help. It's part of your AWS environment; it's what you build on AWS. Your AWS account reps will be there to help you. When we build it, we also involve our OpenAI colleagues to help you figure out how to best utilize the product or handle similar issues. In some cases, if we encounter bugs that require their help, we escalate the issue to them. But AWS will be your direct, first-line support.

Host: Sam, what's your take on the scale of this business relative to OpenAI's core API business?

Sam Altman: I hope it will be very large. We are investing a lot of effort into this and have committed to purchasing a lot of computing power. I believe there will be a lot of revenue to support all of this. One framework I'm increasingly convinced of is that when the price is low enough, the demand for intelligence is essentially limitless.

Host: So from this perspective, its demand elasticity is very high? Prices fall, and demand rises?

Sam Altman: Of course, there is that. But to put it another way, if you lower the price of water, maybe you'll drink more water, maybe you'll go from showering once a day to showering twice a day; there's some flexibility. But at some point, you'll say, "You know what? I have enough water."

Host: And if you absolutely need water, you'll buy it no matter how expensive it is.

Sam Altman: The same goes for other utilities. If electricity is cheaper, you'll naturally use more. But if you think of intelligence as a utility, I don't know of any other utility that would make me think, "I just want more. As long as the price is low enough, I'll keep using more."

Matt Garman: Interestingly, much the same has been true of computing. Think about the cost of a compute cycle today: it's orders of magnitude cheaper than it was 30 years ago, and more computing is being sold today than ever before.

Host: Right. At least until you reach extremely high scale and cost starts to matter, people don't usually think much about computing costs. Generally speaking, from a strategic perspective, everyone just assumes they have compute. So how far does AI need to go to reach that point, where "how much did I spend here" is no longer people's first reaction?

Sam Altman: I don't think that's the first reaction right now. There are far more customers asking us now, "No matter the price, can you give me more? I just need more capacity, and I'm willing to pay more." By comparison, there are far fewer people competing with us on price.

But I do believe we will continue to bring prices down significantly, by a truly staggering margin. Perhaps the more we do this, the more wealth will flow into this sector. But I am confident that we will continue to substantially reduce costs at our current level of intelligence.

One thing that surprised me somewhat is that, at least for today, a significant portion of total market demand is focused on absolute frontier models.

Host: Yes, there are many issues in this regard. The frontier model is very expensive to serve, and people could actually get by with the previous version. But you're saying that, regardless, people just want to use the most advanced version?

Sam Altman: So far, yes.

Matt Garman: I think this is a very good sign that we are still far from the state we really want to achieve, and there is still a lot of unmet demand. I do think it's a bit like the computing needs of 40 years ago. Back then, a computer was extremely expensive, while now everyone's mobile phone has far more computing power than it did then, and we've sold billions of such devices.

I believe the same thing will happen in the world of AI. Today, everyone wants to use frontier models because you need them to get a lot of useful work done, and everyone is very excited about what those capabilities make possible.

I believe that over time you will have a mixed set of models. Incidentally, some smaller models will be able to accomplish certain things, even things the latest OpenAI models can't yet do. They will become smaller, cheaper, and faster over time. At the same time, there will also be the super-large models trying to tackle cancer and other problems of that kind.

But I think we're still in the early stages of what's possible. When you see this much demand and this much growth this early on, the future is very exciting.

Host: Is there a somewhat cynical perspective: Sam, you have a group of clients who say, "We really want to use OpenAI models, but everything we have is on AWS, and we're not moving." Matt, on your side, it's, "Look, everything we have is on AWS, can you bring the OpenAI models over?" So this is simply about fulfilling that demand. And it turns out that because AWS is the largest, the demand is astronomical. Is this the simplest answer? Or is there another layer, that you genuinely believe you can deliver something highly differentiated, and that it will attract new customers for both of you?

Sam Altman: We are certainly very happy to reach AWS customers, and many people really like AWS. Yes, that's true.

Matt Garman: This part is definitely true.

Host: (Laughs) Right.

Matt Garman: Conversely, our customers are also very excited to have access to OpenAI technology.

Sam Altman: But I do believe we can build something incredible new together. I hope that a year from now, when people look back on this, the most important thing they'll talk about won't be, "Oh, finally we can access these models through AWS," or something like that. Instead, they'll say, "Wow, we didn't realize how important this new product was before."

I believe that, in terms of models, supporting systems, and capabilities, we are approaching a completely new form of computing. It will feel very different from the existing "I need the API for this model" mentality.

Matt Garman: I completely agree, that's the key. The first part is good, very good; but the second part, I think, is where we all really get excited.

Host: Speaking of which, I'd like to return to this topic. I have a theory, which may not be entirely accurate, and I'm curious to hear your thoughts, about "what else needs to be built." Specifically, there might eventually be a real middleware or intermediate layer. Within an organization, there are various databases, SaaS applications, and various data fragments spanning different systems. On top of that, there would be an agent layer or supporting system (Harness). It seems there's something else that needs to be built in between. OpenAI Frontier touched on this issue to some extent. Is this part of it? Or is this something that needs to be built in the future? Or am I completely wrong, and we don't need this at all?

Sam Altman: You're absolutely right, we definitely need something like that. Lately, I've been talking to clients, especially large enterprises, and they're saying things like, "I want some kind of agent runtime environment; I want a management layer that can connect my data to the agents, while making sure I understand where tokens are spent and where they aren't, and having some kind of oversight; I also want some kind of workspace"—hopefully Codex—"something like that for my employees."

The set of things people are asking for is becoming very consistent. But now we still need to actually build the whole product.

Host: It sounds like we almost need a dual agent layer. One agent layer maintains the middle layer, constantly delving into various data sources; the other is the actual user interface layer, where people actually interact. Does this align with the direction we're heading? Or am I digressing?

Sam Altman: I agree with both of those points; that's roughly how the world might look today. But as models become truly intelligent, I don't think we yet know what future architectures will actually look like.

Now, at this layer, which you could call the user agent layer, people really want to interact with multiple agents. We let you build agents for this thing, agents for that thing, and they can talk to each other, and so on. Then, at the company's management level, people have various control mechanisms to help AI explore files in the file system.

Host: And then at some point, you realize that you're just clinging to the past for no reason. These things should have been done in the model.

Sam Altman: That's exactly what I was trying to say. At some point, you might say, "We already have such amazing capabilities, let's redesign the whole architecture."

Matt Garman: Yes, I agree. I think something different is definitely going to emerge here. I'm not sure if we know exactly what it is now, but that's part of the beauty of it. You get customers to use it, to build it, and then you can learn from them and figure out how to make these things easier, faster, and better for them.

Host: Sam, this is our second time doing a product launch interview like this. The last time was with Kevin Scott about New Bing. At that time, you were quite confident in the threat you posed to Google. What do you think happened in the end?

Note: Kevin Scott is Microsoft's Chief Technology Officer. New Bing is an AI search product launched by Microsoft in February 2023, powered by OpenAI technology. It attempts to upgrade traditional search from "returning links" to an interactive method that "directly generates answers and assists in completing tasks." At the time, New Bing was seen as a significant attempt by Microsoft to challenge Google's search dominance using OpenAI.

Sam Altman: I think we did better than I expected. ChatGPT is, in my opinion, the first truly large-scale new consumer product since Facebook.

Host: So that's the answer? In other words, you did better than you expected, but it was mainly reflected in ChatGPT, not in other areas?

Sam Altman: No, I think we've done a pretty good job with APIs, especially Codex. But that wasn't what I was thinking at the time. I was thinking that maybe these new language interfaces would change the way people find information on the internet. And Google is an absolutely extraordinary company. I think Google is still underestimated in many ways, considering the breadth and depth of what it does. But relatively speaking, I'm satisfied with ChatGPT's performance.

Host: Matt, I also have a similar Google question for you. Just this week, Thomas Kurian (CEO of Google Cloud) spoke about their fully integrated technology stack, from models to chips to the agent layer—it's all integrated from top to bottom. You're sitting here today with an executive from another company, so Amazon, by definition, isn't fully integrated.

Many people have criticized you for not having a cutting-edge model. But now we've entered the era of inference, and you're used to serving a large number of companies. Is it possible that by maintaining a certain degree of neutrality, you've actually found yourself in a better position? Was this intentional, or did you accidentally find yourself in a very good position, simply unaware of its importance before?

Matt Garman: One thing is intentional. Ever since we started AWS, we've always considered our partners a critical part of supporting our end customers. From the beginning, this has been a very important part of our strategy: working deeply with our partners. Perhaps unlike some other companies, we believe that if our partners succeed, if they are building on top of us or with us, then we succeed, and that's great.

We see it as working together to make the pie bigger, and that's a victory. But that's not necessarily how other people see the world. Sometimes they say, "I have to have it all." That's fine too; it's one perspective.

But I think choice is important. That's how the best product wins. By the way, in this world, you can have first-party products, and you can have many third-party products. But our view is that we want customers to choose what's best for them. If what's best for them is something you build yourself, that's great.

For us, if the best thing is built by our partners but runs on top of us, we consider that a victory, because it's the best thing for our customers. We've always thought this way, and that's actually how we build the Bedrock platform in the AI world. We want to support a wide range of models and a wide range of capabilities. This has always been true, from databases to computing platforms and everything else.

So I think this is a deliberate strategy. I also think it's a strategy that clients appreciate because they like this approach. We're excited to continue exploring this direction.

Host: Yes, that's very interesting. There's a balance between software, platforms, and infrastructure, with everyone claiming to serve everyone. But it feels like, if we go back to the beginning of AWS, it started with the I, or Infrastructure. From my perspective, that gave you almost maximum flexibility, allowing you to meet Sam in the middle. Sam has a strong S, or Software; together you're building a P, or Platform. Is that a fair assessment?

Matt Garman: Exactly. It does get more difficult in some areas. For example, we say, "We only have one S3," and there aren't any other S3 products, and that's true. So some core components, like you said, at the infrastructure level, we do place a lot of emphasis on what we build ourselves.

But as you move up the technology stack, I think the capability set becomes broader. In any case, I don't believe any single company will own all the applications. As you move down the technology stack, into the model and services layer, the number decreases; further down to the infrastructure layer, the number decreases even more. Our view is that embracing the entire partner portfolio is good for our end customers.

Host: Sam, is there anything you'd like to say in closing?

Sam Altman: I think Matt made an excellent point. I truly believe there's enormous potential in the next generation of products that developers can build right now. Given that we expect model capabilities to improve at a very steep curve over the next year, the timing is perfect for us to embark on this journey together and work to truly build a platform to enable it. I think people will love it.

Host: Great. Matt, Sam, thank you for coming to Stratechery.

Matt Garman: That's great. Thank you for inviting us.

Sam Altman: Thank you.
