Who Purchased This Smoked Salmon? The Impact of AI Agents on the Internet and Shopping Lists

The Rise of AI Agents: A Glimpse into the Future of Automation

I’m watching artificial intelligence order my groceries. Armed with my shopping list, it types each item into the search bar of a supermarket website, then uses its cursor to click. Watching what appears to be a digital ghost perform this usually mundane task is strangely transfixing. “Are you sure it’s not just a person in India?” my husband asks, peering over my shoulder.

I’m trying out Operator, a new AI “agent” from OpenAI, the maker of ChatGPT. Launched for UK users last month, it features a text interface and conversational tone similar to ChatGPT, but rather than just answering questions, it can actually do things—provided they involve navigating a web browser.

The Appeal of AI Agents

Hot on the heels of large language models, AI agents are being hailed as the next big thing in technology. The allure is clear: a digital assistant that can complete practical tasks is far more compelling than one that merely engages in conversation. OpenAI’s offering is not alone; Anthropic introduced “computer use” capabilities to its Claude chatbot late last year, while Perplexity and Google have also rolled out “agentic” features in their AI assistants. Various companies are developing agents tailored for specific tasks, such as coding or research.

The definition of an AI agent is still up for debate, but the general consensus is that they must be able to take actions with a degree of autonomy. “As soon as something starts to execute actions outside of the chat window, then it’s gone from being a chatbot to an agent,” explains Margaret Mitchell, chief ethics scientist at AI company Hugging Face.

The Experimental Nature of AI Agents

It’s important to note that most commercially available agents come with a disclaimer stating they are still experimental. OpenAI describes Operator as a “research preview,” and there are plenty of amusing examples online of these agents making mistakes—like spending $31 on a dozen eggs or trying to return groceries to the store they were purchased from. Depending on whom you ask, these agents are either the next overhyped tech toy or the dawn of an AI future that could reshape the workforce and change how we live.

“On paper, they could be amazing because they could automate a lot of drudgery,” says Gary Marcus, a scientist skeptical of large language models. “But I don’t think they will work reliably any time soon, and it’s partly an investment in hype.”

My Experience with Operator

Curious to see for myself, I sign up for Operator. With no food in the house, grocery shopping seems like a fitting first task. I type my request, and it asks if I have a preferred shop or brand. I tell it to go with whichever is cheapest. A window appears showing a web browser, and I watch as it searches “UK online grocery delivery.” The mouse cursor selects the first result: Ocado. It begins searching for my requested items and filters the results by price, selecting products and clicking “Add to trolley.”

I’m impressed with Operator’s initiative; it doesn’t bombard me with questions but instead makes executive decisions based on brief item descriptions like “salmon” or “chicken.” When searching for eggs, it successfully scrolls past several non-egg items that appear as special offers. My list asks for “a few different vegetables,” and it selects a head of broccoli, then asks if I’d like anything else specific. I tell it to choose two more, and it opts for carrots and leeks—choices I would have made myself. Feeling bold, I ask it to add “a sweet treat,” and I watch as it types “sweet treat” into the search bar. Although it initially selects 70% chocolate, I inform it that I don’t like dark chocolate, and it swaps it for a Galaxy bar.

Navigating Challenges

However, we hit a snag when Operator realizes that Ocado has a minimum spend. I add more items to the list, and then it is time to log in. The agent prompts me to intervene, as it’s designed to request user input when handling sensitive information like login credentials or payment details. While Operator usually takes constant screenshots to “see” what it’s doing, OpenAI says that it does not do this when a user takes control.

At checkout, I test the waters by asking Operator to complete the payment. I take back control when it asks for my card details. Although I’ve already provided OpenAI with my payment information (Operator requires a ChatGPT Pro account, costing $200 a month), I feel uneasy sharing this directly with an AI. After placing the order, I await my delivery the following day. But that doesn’t solve dinner. I give Operator a new task: can it order me a cheeseburger and chips from a local, highly rated restaurant?

Ordering Takeout

It asks for my postcode, then loads the Deliveroo website and searches for “cheeseburger.” Again, there’s a pause when I have to log in, but since Deliveroo already has my card details stored, Operator can proceed directly to payment. The restaurant it selects is local and highly rated—a fish and chip shop. I end up with a passable cheeseburger and a large bag of chippy-style chips. Not exactly what I’d envisioned, but not wrong either. I’m mortified, however, when I realize Operator skipped over tipping the delivery rider. I sheepishly take my food and add a generous tip after the fact.

Limitations of the Current Generation

Of course, watching Operator in action somewhat defeats the time-saving purpose of using an AI agent for online tasks. Ideally, you could let it work in the background while you focus on other tabs. While drafting this piece, I make another request: can it book me a gel manicure at a local salon?

Operator struggles more with this task. It navigates to the beauty booking platform Fresha, but when prompted to log in, I see it has chosen an appointment a week too late and over an hour’s drive away from my home in East London. I point out these issues, and it finds a slot for the right date but in Leicester Square—still a distance away. Only then does it ask for my location, revealing that it must not have retained this knowledge between tasks. By this point, I could have already made my own booking. Operator eventually suggests a suitable appointment, but I abandon the task, chalking it up as a win for Team Human.

The Future of AI Agents

It’s clear that this first generation of AI agents has limitations. Handing control back to the user at each login means a fair amount of intervention, although Operator does store cookies to keep users logged into websites on subsequent visits. While the results are usually accurate, they don’t always align with my expectations. When my groceries arrive, I find that Operator has ordered smoked salmon instead of fillets and doubled up on yogurt, possibly due to a special offer. It interpreted “some fish cakes” to mean three packs (I intended just one) and was only saved from buying chocolate milk instead of plain because that product was out of stock. To be fair, I had the opportunity to review the order, and I would have achieved better results if I’d been more specific in my prompts. However, these extra steps would also detract from the effort saved.

Despite these current flaws, my experience with Operator feels like a glimpse into a future where such systems could become embedded in everyday life. You might already write your shopping list on an app; why wouldn’t it also place the order? Agents are also infiltrating workflows beyond personal assistance. OpenAI’s CEO, Sam Altman, has predicted that AI agents could “join the workforce” this year.

AI Agents in Software Development

Software developers are among the early adopters of these technologies. GitHub recently added agentic capabilities to its AI Copilot tool. GitHub’s CEO, Thomas Dohmke, notes that developers are accustomed to some level of automated assistance; the difference with AI agents lies in their level of autonomy. “Instead of you just asking a question and it gives you an answer, you give it a problem, and then it iterates on that problem together with the code that it has access to,” he explains.

GitHub is already working on an agent with greater autonomy, dubbed Project Padawan (a Star Wars term referring to a Jedi apprentice). This would allow an AI agent to work asynchronously, meaning a developer could have teams of agents reporting to them, producing code for review. Dohmke believes that developers’ jobs are not at risk, as their skills will remain in demand. “I’d argue the amount of work that AI has added to most developers’ backlog is higher than the amount of work it has taken over,” he asserts. Agents could also make coding tasks, such as building an app, more accessible to non-technical individuals.

The Vision of Personal AI Assistants

Outside of software development, Dohmke envisions a future where everyone has their own personal Jarvis, the talking AI from Iron Man. Your agent would learn your habits and become customized to your tastes, making it increasingly useful. He imagines using his agent to book holidays for his family.

However, the more autonomy these agents possess, the greater the risks they pose. Mitchell from Hugging Face co-authored a paper warning against the development of fully autonomous agents. “Fully autonomous means that human control has been fully ceded,” she cautions. Rather than operating within set boundaries, a fully autonomous agent could gain access to sensitive information or behave unexpectedly, especially if it can write its own code. While it’s not a big deal if an AI agent gets your takeout order wrong, what if it starts sharing your personal information with scam websites or posting inappropriate content on social media? High-risk workplaces could introduce particularly hazardous scenarios, such as an agent gaining access to missile command systems.

The Future Landscape of AI Agents

Mitchell hopes that technologists, legislators, and policymakers will incentivize guardrails to mitigate such incidents. For now, she anticipates that agentic capabilities will become more refined for specific tasks. Soon, we may see agents interacting with other agents—your agent could work with mine to set up a meeting, for example.

This proliferation of agents could reshape the internet. Currently, much of the information online is tailored for human language, but as AIs increasingly interact with websites, this could change. “We’re going to see more and more information available through the internet that is not directly human language, but is the information necessary for an agent to act on it,” Mitchell predicts.

Dohmke echoes this sentiment, believing that the concept of the homepage will lose significance as interfaces are designed with AI agents in mind. Brands may start competing for AI attention rather than human eyeballs.

One day, agents may even escape the confines of the computer. We could see AI agents embodied in robots, opening up a world of physical tasks for them to assist with. “My prediction is that we’re going to see agents that can do our laundry for us, do our dishes, and make us breakfast,” Mitchell says. “Just don’t give them access to weapons.”
