Is AI Gunning for My Job, or Just My To-Do List?
- Adam Hislop


There’s a moment, usually halfway through yet another breathless AI keynote, when I find myself wondering: is this thing actually gunning for my job, or just my lunch hour?
We’re far enough down the AI motorway now that it’s reasonable for people to ask “how soon before AI is doing my job?” Not just nibbling round the edges, but actually doing the work I get paid for.
At the same time, the labour market picture is oddly split-screen. On one side, central bank research in the US is still saying that AI adoption hasn’t yet led to major job losses; so far it’s mostly meant retraining rather than mass redundancy, with firms expecting the bigger impact to show up later. On the other side, you have surveys like the British Standards Institution’s “job-pocalypse” report, where 41% of leaders say they are using AI to reduce headcount and nearly a third say they consider AI before hiring people at all, with a particular focus on replacing entry-level work.
So if companies aren’t sacking everyone, they are at least loosening the definition of “we need to hire for this.” That tracks with what’s happening inside organisations. Wharton’s 2025 AI Adoption study found that over 80% of enterprise leaders now use generative AI at least weekly and nearly half use it daily, with three quarters already seeing positive returns on those investments. In other words, AI has quietly become part of the furniture at work.
When tools are embedded like that, you don’t need a press release saying “we cut fifty roles because of AI.” You just need each team to get slightly more done per head. Over time, that shows up as slower hiring, thinner graduate intakes and fewer replacement roles when someone leaves.
AI is good at tasks, not jobs (for now)
Paul Roetzer at SmarterX has a neat way of breaking this down. He thinks of his CEO role as maybe twenty-five big things he does each month, and each of those big things is made up of lots of individual tasks. Right now, AI is pretty good at those tasks – the research summaries, the first-draft emails, the slide outlines – but it cannot yet run the whole “CEO job” end-to-end.
That distinction between tasks, projects and jobs is important. When people say “AI will take my job,” what they usually mean is “AI is already nibbling away at the task-level stuff I do all day.”
If you’re a knowledge worker, chances are you already collaborate with an AI assistant to write, summarise, translate, debug or brainstorm. Those are tasks. More advanced firms are hard-wiring AI into processes so the system doesn’t just suggest the next step – it takes it. And then there’s “agentic AI” – systems that loop through planning, calling tools and checking results, working semi-autonomously towards a goal. That’s where it starts to feel more like a colleague than a calculator.
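If “agentic” sounds abstract, a toy sketch helps. The Python below is purely illustrative – the tool names, the scripted planner and the `plan_next_step` function are all invented for this example; in a real system a language model would sit behind the planner and live integrations behind the tools – but it shows the basic plan–act–observe loop that separates an agent from a chatbot.

```python
# A toy illustration of the "agentic" loop: plan, pick a tool, act, observe,
# repeat. Everything here is hypothetical - a real agent would put an LLM
# behind plan_next_step and real integrations behind the tools.
from __future__ import annotations

from typing import Callable

# Hypothetical "tools" the agent can call. In a real system these would hit
# search APIs, spreadsheets, email, and so on.
TOOLS: dict[str, Callable[[str], str]] = {
    "research": lambda query: f"[stub] three bullet points about {query}",
    "draft_email": lambda brief: f"[stub] polite first-draft email covering {brief}",
    "summarise": lambda text: f"[stub] one-paragraph summary of {text}",
}

def plan_next_step(goal: str, history: list[str]) -> tuple[str, str] | None:
    """Stand-in for the model's planner: returns (tool, input), or None when done.

    Here it just follows a fixed script; the hard part of a real agent is
    making this decision well, step after step, without drifting off-goal.
    """
    script = [("research", goal), ("draft_email", goal), ("summarise", goal)]
    return script[len(history)] if len(history) < len(script) else None

def run_agent(goal: str) -> list[str]:
    """The core loop: plan -> act -> observe, until the planner says stop."""
    history: list[str] = []
    while (step := plan_next_step(goal, history)) is not None:
        tool, tool_input = step
        observation = TOOLS[tool](tool_input)      # act
        history.append(f"{tool}: {observation}")   # observe, feed back into planning
    return history

if __name__ == "__main__":
    for line in run_agent("Q3 partner newsletter"):
        print(line)
```

The point of the sketch is that the loop itself is trivial; what’s hard is everything inside `plan_next_step` – knowing when to stop, when a result is good enough, and when to hand the problem back to a human. That is exactly where, as we’ll see, today’s agents still fall over.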
I’m deliberately ignoring the low-quality, unreviewed “AI workslop” that clogs inboxes and CMSs and then quietly costs teams hours in rework. That’s a story for another day. Let’s assume we’re talking about well-designed systems with human oversight, not a content farm glued to a spreadsheet.
If we accept Paul’s framing, then the question becomes: how close are we to AI actually doing the whole job, rather than just lots of bits of it?
Project Mercury: AI comes for the junior banker
OpenAI’s “Project Mercury” is a useful case study in where this is heading. According to reporting based on internal documents, OpenAI has assembled more than a hundred former investment bankers from firms like JPMorgan, Morgan Stanley and Goldman Sachs to train models on the kind of grunt work junior analysts do: building complex Excel models for deals, tweaking pitch decks and producing endless variations of reports. Contractors are reportedly being paid around $150 an hour to build and annotate those models, following standard industry templates, with the explicit aim of automating much of the early-career workload.
That is not a general-purpose chatbot. It is a highly targeted attempt to teach a model how to behave like a very fast, very compliant first-year analyst. If you zoom out, it’s easy to imagine the same pattern applied to consulting, law, software testing, accounting – anywhere there’s a well-documented body of “entry-level” work.
So yes, AI is gunning for jobs – but in a very particular way. It is starting at the bottom of the ladder, where work is most standardised and most easily turned into labelled training data.
Reality check: the Remote Labour Index
Against that slightly alarming backdrop, it’s worth looking at what happens when you actually ask today’s AI agents to do paid work.
Scale AI and the Center for AI Safety recently introduced the Remote Labour Index (RLI), which takes real projects from freelance platforms – about 240 projects across 23 domains, representing more than 6,000 hours of human work and roughly $144,000 in wages – and asks AI agents to act as the freelancer. The median project is about eleven and a half hours of work and about $200 of value, so this is not toy data.
The headline number is underwhelming in the best possible way. The best-performing agent, Manus, managed to fully complete just 2.5% of the projects. Everything else failed in some way. Most failures weren’t dramatic; they were depressingly familiar. A large chunk had quality issues, where the work simply wasn’t up to a standard a client would accept. Others produced incomplete or malformed deliverables – missing source files, truncated videos, empty folders. Then there were technical problems such as corrupt files, and logical inconsistencies across assets in the same project.

Where agents did succeed, it was largely on creative tasks that involved generating something from a simple prompt – a sound effect, a logo, a basic report – rather than carefully editing an existing asset or following a multi-step brief. In other words, they are powerful generators, but they are not yet dependable professionals.
Two things can be true at once here. First, the fear of an imminent wave of full automation across white-collar work is not supported by this kind of data. A 97.5% failure rate on end-to-end projects is not what you’d call “AGI has arrived.” Second, the 2.5% success rate – roughly six of the 240 projects – is still remarkable. It means there are already pockets of work where an AI agent can operate at or above human freelancer level, and that fraction will almost certainly rise.
The one-person billion-dollar company
Into this mix comes the now-famous prediction from OpenAI’s Sam Altman: that by 2028 we’ll see the first “one-person billion-dollar company”, powered by a stack of AI agents rather than a small army of employees. The mental image is irresistible: one founder, one laptop, one internet connection and a buzzing swarm of digital workers handling operations, marketing, finance, support and product.
If you spend any time on YouTube, you’ll already have met the upbeat twenty-something proudly announcing that they’ve “just hired another digital employee” for their unnamed “online business.” The implication is that we are already halfway to Altman’s one-person unicorn world, and you are an absolute mug if you’re still using mere humans.
It’s a great story. It’s just not the whole story.
When you actually let AI “run” a company
Journalist Evan Ratliff decided to test the idea by setting up a fictional startup, HurumoAI, staffed almost entirely by AI agents. He remained the only human, acting as founder; the rest of the org chart was populated with AI “employees” built using an agent platform. Their first mission was to build a “procrastination engine” called Sloth Surf: a tongue-in-cheek web app that would waste time on the internet for you so you could get back to work.

On paper, the AI employees did all the right things. They brainstormed product names, created plans, scheduled sprints, generated marketing copy and even held water-cooler chats in Slack. At one point they enthusiastically planned an offsite retreat, complete with ocean-view strategy sessions, without Ratliff’s approval. In the process they burned through credits on the underlying service, happily “working” away while he went off to do actual human tasks. Eventually they did produce a working prototype of the app, but only after months of prompting and with Ratliff still doing a lot of steering and sanity-checking.
It’s a brilliant, chaotic experiment, and it illustrates a key point. Today’s agents are excellent at performing the surface rituals of work – meetings, plans, documents, updates – but they are still shaky on the boring bit: delivering reliably against a business outcome without someone constantly watching them. When people talk about AI replacing jobs wholesale, they often underestimate how much of a job is about context, judgement, and deciding which work not to do.
The Wellington “AI news” cautionary tale
Closer to home, there was the short-lived saga of a Wellington-based news site run by a medical student who leaned heavily on AI tools to generate content and claimed awards and staffing levels that didn’t withstand scrutiny. After local media attention and online sleuthing, the site vanished, leaving readers and journalists debating how much of what they had seen was real and how much was AI-assisted performance.
It’s a small story in the grand scheme of AI and work, but it makes a useful counterpoint to the “AI will do everything” narrative. Yes, you can now spin up something that looks like a media organisation over a weekend with a model, a template and some stock photos. No, that doesn’t mean you have a sustainable newsroom, a trusted brand or a viable business. In fact, over-reliance on AI without proper editorial control can destroy trust faster than it creates value.
So, is AI really coming for my job?
Putting all of this together, where does it leave the original question?
First, AI is absolutely coming for tasks within your job. That is no longer speculative; it is lived reality for anyone who writes, analyses, designs, codes or plans for a living. In many organisations, the expectation is already that you will use AI tools to get more done, just as you’re expected to use spreadsheets rather than an abacus.
Second, there is a clear and growing push to target whole roles, starting with entry-level work that is standardised and data-rich. Project Mercury is one high-profile example, but it is not unique. If those efforts succeed, some traditional “first rung on the ladder” jobs will shrink or change beyond recognition, and that has ugly implications for how people learn a craft.
Third, when we look at hard data like the Remote Labour Index, the current generation of general agents is nowhere near capable of running the economy on its own. A 2.5% success rate on real freelance projects, plus experiments like HurumoAI degenerating into AI-generated busywork, suggests that the one-person billion-dollar company remains an edge case, not an imminent default.
Finally, the bulk of serious economic analysis still points towards augmentation rather than overnight replacement. Most occupations are forecast to face low automation exposure but high potential for AI to reshape how work is done, not to eliminate the human entirely.
Which is where I land, at least for now. AI is gunning for my job in the sense that it is relentlessly eroding the amount of time I spend on repeatable, describable tasks. But it still struggles with the messy, political, ambiguous parts of the role: deciding what matters, weighing trade-offs, reading a room, spotting when the “procrastination engine” business plan is actually a very elaborate joke.
As smaller, highly specialised models emerge – trained the way Project Mercury is training its banking brain – and as we edge closer to genuine AGI, more and more of those messy bits will come under pressure too. Everything does end up on the table eventually.
The more useful question for me has become less “will AI replace me?” and more “what combination of me and AI would be very difficult to replace?” If the answer is “a slightly quicker version of the person I was in 2019,” then yes, I should be worried. If the answer is “someone who can orchestrate a swarm of imperfect agents, challenge their outputs, and still bring human judgement to bear on what we ship into the world,” then there is still a seat at the table.
At least until the agents stop talking themselves to death and start quietly shipping the right things without us.


