By: Thomas Stahura
AGENTS! AGENTS!! AGENTS!!!
Big tech can’t get enough of them! Google’s got Mariner. Microsoft’s got Copilot. Salesforce rolled out Agentforce. OpenAI’s cooking up Operator. And Anthropic has Computer Use. (Naming is hard.)
You’ve heard the hype. Maybe you’re already sick of it. They even got Matthew McConaughey to say it during the Super Bowl — America's most sacred Sunday ritual.
But have you actually used one? Probably not. And funny enough, most of the “agents” I just listed aren’t even real agents.
So what is an agent, anyway?
An agent, put simply, is a Large Language Model (LLM) in a reasoning loop that has access to tools (like a browser, code interpreter, or calculator). The LLM is prompted to break down tasks into steps and to use tools to autonomously accomplish its given goal. The tools then provide feedback from the digital environment and the LLM continues to its next step until the task is complete.
Consider a browser agent given a task: “Book a flight from San Francisco to Seattle.” First, it runs an “open browser” command, and the browser confirms: “Browser is open,” with a screenshot. Next, it types “San Francisco to Seattle flights” into the search bar, hits enter, and waits for results. It scans the listings, picks a booking site, clicks through, and follows the prompts, step by step. Each action generates feedback to keep it on track until the task is complete.
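To make that loop concrete, here is a minimal sketch in Python. It is not how any of the products above are built. The names (Action, ScriptedLLM, FakeBrowser, run_agent) are made up for illustration, and the "LLM" simply replays a fixed plan where a real model would reason over the task and the feedback so far.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class Action:
    tool: str           # name of the tool to call, or "done" to stop
    argument: str = ""  # single string argument passed to the tool


class ScriptedLLM:
    """Hypothetical stand-in for an LLM: replays a fixed plan."""

    def __init__(self, plan: List[Action]):
        self._plan = list(plan)

    def next_action(self, task: str, history: List[Tuple[Action, str]]) -> Action:
        # A real model would read the task and all prior feedback here.
        step = len(history)
        return self._plan[step] if step < len(self._plan) else Action("done")


class FakeBrowser:
    """Hypothetical in-memory 'browser' that only reports what it was asked to do."""

    def open_browser(self, _: str) -> str:
        return "Browser is open"

    def type_text(self, text: str) -> str:
        return f"Typed: {text}"

    def press_enter(self, _: str) -> str:
        return "Results loaded"

    def click(self, target: str) -> str:
        return f"Clicked: {target}"


def run_agent(task: str, llm: ScriptedLLM,
              tools: Dict[str, Callable[[str], str]],
              max_steps: int = 10) -> List[Tuple[Action, str]]:
    """Ask for an action, run the tool, feed the result back, repeat."""
    history: List[Tuple[Action, str]] = []
    for _ in range(max_steps):
        action = llm.next_action(task, history)
        if action.tool == "done":
            break
        tool = tools.get(action.tool)
        feedback = tool(action.argument) if tool else f"Unknown tool: {action.tool}"
        history.append((action, feedback))  # feedback informs the next step
    return history


browser = FakeBrowser()
tools = {
    "open_browser": browser.open_browser,
    "type": browser.type_text,
    "press_enter": browser.press_enter,
    "click": browser.click,
}
plan = [
    Action("open_browser"),
    Action("type", "San Francisco to Seattle flights"),
    Action("press_enter"),
    Action("click", "first booking site"),
]
history = run_agent("Book a flight from San Francisco to Seattle.",
                    ScriptedLLM(plan), tools)
for action, feedback in history:
    print(f"{action.tool}: {feedback}")
```

The max_steps cap is a design choice in this sketch: without it, a model that never decides it is "done" would loop forever.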
Most agents have a litany of specific tools, but all you really need is to move the mouse, click, type, and scroll. After all, that's all humans need to use a computer.
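Those four primitives can be sketched as a tiny tool interface. This is an illustration under assumptions, not a real automation library: SimulatedDesktop and dispatch are hypothetical names, and the "desktop" only tracks state in memory without touching an actual mouse or keyboard.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class SimulatedDesktop:
    """Hypothetical in-memory desktop exposing the four basic tools."""
    cursor: Tuple[int, int] = (0, 0)
    scroll_y: int = 0
    typed: str = ""
    clicks: List[Tuple[int, int]] = field(default_factory=list)

    def move_mouse(self, x: int, y: int) -> str:
        self.cursor = (x, y)
        return f"cursor at {self.cursor}"

    def click(self) -> str:
        # A click lands wherever the cursor currently is.
        self.clicks.append(self.cursor)
        return f"clicked at {self.cursor}"

    def type_text(self, text: str) -> str:
        self.typed += text
        return f"typed {text!r}"

    def scroll(self, dy: int) -> str:
        # Clamp at zero so the page cannot scroll above the top.
        self.scroll_y = max(0, self.scroll_y + dy)
        return f"scrolled to y={self.scroll_y}"


def dispatch(desktop: SimulatedDesktop, name: str, **kwargs) -> str:
    """Route a tool call by name and return text feedback for the LLM."""
    tools = {
        "move_mouse": desktop.move_mouse,
        "click": desktop.click,
        "type": desktop.type_text,
        "scroll": desktop.scroll,
    }
    if name not in tools:
        return f"unknown tool: {name}"
    return tools[name](**kwargs)


desktop = SimulatedDesktop()
print(dispatch(desktop, "move_mouse", x=10, y=20))
print(dispatch(desktop, "click"))
print(dispatch(desktop, "type", text="Seattle"))
print(dispatch(desktop, "scroll", dy=300))
```

Every tool returns a short text description of what happened, which is the feedback the LLM would read before choosing its next step.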
So what, then, makes me say that most agents out there aren't actually agents? For starters, Mariner is on a waitlist, Copilot doesn't have access to any tools, and Agentforce only has access to Salesforce-specific tools. OpenAI’s Operator and Anthropic’s Computer Use are what I’d call actual agents. But Operator is $200/month and Computer Use is in beta.
Open source is not far behind. Browser Use (YC W25) exploded onto the scene about a month ago and already has 27k GitHub stars. I’ve used browser-use for my AI bias hackathon project. It works with any LLM in only 15 lines of code. Totally free.
AutoGen, a Microsoft agent framework, is also open source with 39k stars, along with Skyvern (12k stars, YC S23) and Stagehand (7.5k stars). And these are just browser agents! There are also coding agents that live within an integrated development environment (IDE), like the closed-source Replit, GitHub Copilot, and Cursor, and the open-source Cline (28k stars), Continue.dev (23k stars), and Void (10k stars, YC S24).
Agents, at the end of the day, are about autonomous control. Whether it's a browser or a calculator, the more tools, control, and thus access you give an LLM, the more it can do on your behalf. In that respect, not all agents are created equal.
When I use my computer, I don't just use the browser or IDE. Sure, I spend a bunch of time online (who doesn't?), and coding (so much), but I control my computer at the OS level. I’m able to jump between different applications and navigate my file system with my keyboard and mouse, so shouldn't my agent be able to, too?
Many thought an OS-level agent was impossible a few months ago. Now it seems inevitable. Imagine a future where we interact with our devices in the same way Tony Stark interacts with Jarvis in Iron Man (2008). This is an entirely new human-computer interaction paradigm that is set to completely change the industry.
Big tech knows this. Apple has enabled developers to write custom tools for Apple Intelligence to interact with. And Microsoft’s Copilot Recall automatically records your screen to automate tasks (that is, before it was recalled over privacy issues).
In the open community, Open Interpreter (58k stars) is an OS-level agent that can write and execute commands in the command line. It has limitations (no vision capabilities) but is impressive and the first of its kind. Models built for OS-level control, such as OS-Atlas and UI-TARS, also exist but are not nearly as popular as browser or IDE agents. (We invested in Moondream, a startup building vision “pointing” capabilities for agent developers.)
The OS agent wars are existential for big tech. Any agent that exists within Windows or macOS will get hamstrung by permission requirements, enshittifying the experience of alternatives while Microsoft and Apple keep their control over the industry. If these companies own and control the software that controls your computer, is it really your computer? I think not.
Regardless, agents still have a long way to go. Reliability remains a large issue, along with handling authentication (to email, social media, and other sites). These, however, are solvable problems. Meta has already set up GAIA, a benchmark for general AI assistants that, if solved, “would represent a milestone in AI research.” And Okta, owner of Auth0, invested in Browserbase to help the agent company manage web authentication.
It's only a matter of time at this point.
P.S. If you have any questions or just want to talk about AI, email me! thomas@ascend.vc