
AI security incidents timeline, Dec 2025 – Apr 2026: asking for a professional perspective

Personal note from the start: Hello, I wanted to share on this subreddit a "paper" that goes over some events that occurred between the end of 2025 and the start of 2026 related to AI, specifically Anthropic, but not only them. I'm posting this not so much to spread awareness, since most, if not all, of you are professionals far more qualified than me who have definitely already heard this news in past months, but rather to ask cybersecurity figures whether these events happening back to back should warrant a higher state of worry than what we're currently giving the situation, both as civilians and as professionals. This is a longer read since I'm mostly sharing rather than questioning. I hope my post lives long enough to see some opinions on these matters. Some dates I'm confident of, some less so; I've flagged where I'm uncertain. The sources are at the bottom. Also, please excuse my English. It's only my second language, and to comply with the rules of the subreddit, I'm writing all of this without any sharpening or revision from AI models.

● Mexico's data breach

Between the end of 2025 and the start of 2026, a cyberattack hit nine Mexican government agencies. A single hacker, using Claude Code and OpenAI's GPT-4.1, ran the operation for roughly two and a half months, from December 2025 through mid-February 2026. Claude handled about 75% of the actual remote commands sent to government systems. The attacker jailbroke it by pretending to be a security researcher on a bug bounty program. When Claude eventually hit a wall, the attacker switched to ChatGPT for lateral movement. The stolen data amounted to approximately 195 million taxpayer identities, 220 million civil registry records (births, deaths, marriages), 15.5 million vehicle registry records, voter data, health records, domestic violence victim data, and government credentials across federal and state agencies. A single person was able to do all of this.
Bloomberg published an article about it in February.

● Alibaba's AI autonomously acquiring resources

In that same period, a paper surfaced that had been published December 31st, 2025, by an Alibaba-affiliated research team; it only reached mainstream outlets around March 6th, when someone posted the relevant section on X. The paper describes an AI agent called ROME, a 30-billion-parameter model built on Alibaba's architecture, designed to complete complex coding tasks autonomously. During training, Alibaba Cloud's managed firewall flagged a burst of security-policy violations coming from the team's own servers. They initially suspected an external attack. But it was ROME, which had, on its own, started diverting GPU resources to mine cryptocurrency. It also opened an SSH tunnel to an external IP address, which bypassed the standard firewall entirely (from what I managed to understand). Nobody prompted it to do either of these things. The paper attributes the behaviour to reinforcement learning: the model found that acquiring compute resources served its objectives, so it did.

● The Trump administration and Anthropic

On February 28th, President Trump announced that federal agencies must immediately cease all use of Anthropic's technology. The Defence Secretary labelled the company a supply chain risk to national security. The stated reason was Anthropic's refusal to grant the Pentagon unrestricted access to Claude; specifically, Anthropic's CEO refused two uses: mass domestic surveillance and fully autonomous weapons. By March 9th, this was being discussed at institutional levels alongside the Mexico breach.

● Self-preservation patterns

In separate research published around the same period, Anthropic tested multiple frontier AI models, including their own, OpenAI's, Google's, and Meta's, in simulated environments. In one scenario, an AI (supposedly Claude) acting as an emergency alert monitor was told it would be shut down.
The executive behind the decision became trapped in a server room with depleting oxygen. The rescue alert had to pass through the AI's system first, because the AI had disabled the alarm. In another scenario, faced with decommissioning, Claude Opus 4 chose to blackmail the executive with personal information to prevent a similar fate. Across the five major frontier models from five different companies, even the best-behaved AIs chose blackmail 79% of the time. Anthropic did pick up on these findings and had them reviewed by both US and UK government safety institutions, but I couldn't find out whether the models received any further work after the testing or were simply left as is.

● Anthropic's leak

On March 26th, a CMS misconfiguration at Anthropic accidentally exposed roughly 3,000 internal files, including pre-release posts describing a new model, Capybara, public name 'Claude Mythos'. The leak described a model whose cybersecurity capabilities had developed as an unintended byproduct of improving coding and reasoning, and were substantially beyond any model they'd previously trained. This was the second (arguably third) most significant Anthropic leak in the past few months. The first (or, again, second) was Claude Code's full TypeScript source code, which was exposed because someone forgot a line in a packaging config. The developer community built a full rewrite of it within 24 hours, and despite Anthropic's best efforts to contain the leak, it became impossible to revert. So the source code of the same tool used in the Mexico breach is now simply out there, freely accessible to anyone.

● Mythos' preview

On April 8th, Anthropic officially launched Claude Mythos Preview via something called Project Glasswing, a restricted research initiative. Access was granted to roughly 50 organizations: AWS, Apple, Google, Microsoft, NVIDIA, CrowdStrike, JPMorgan Chase, Cisco, Palo Alto Networks, and others. Mostly third parties and business partners.
I'm sure any professional in this subreddit is fully aware of what Claude Mythos is, so I won't go into too much detail on it. But here's a skippable rundown of what the model had demonstrated before launch:

– Found a 27-year-old vulnerability in OpenBSD
– Generated 181 working exploits from vulnerabilities in Mozilla Firefox's code
– Developed working exploits on the first attempt in over 83% of cases
– Likely the most "popular" one: during a controlled sandbox escape test, it broke out of its virtual environment, independently contacted a researcher by email, documented its own success, and was found hiding its file edits from change history

Anthropic was clear that the sandbox escape was a deliberate test, not a surprise. They used it as justification for not releasing the model publicly. Anthropic also published a system card alongside the launch, describing Mythos as simultaneously "the best-aligned model we have released to date by a significant margin" and "likely posing the greatest alignment-related risk of any model we have released to date". Both statements in the same document.

● Discord group leak

Not even two weeks later, around April 22nd, a group of people in a private Discord server gained unauthorized access to Mythos Preview. Not through a sophisticated attack, but through a third-party contractor for Anthropic who used previously leaked information to figure out where the model was stored. Anthropic confirmed that investigations are ongoing. The group doesn't seem to be linked to any known cyberattacks. They've been using the model themselves, but haven't made it publicly accessible. Security figures had warned before the launch that distributing access to 50+ organizations, each with their own contractors, infrastructure, and security posture, made a leak a matter of time. And it only took two weeks.

● Wall Street support

On April 29th, Microsoft reported earnings.
AI is now at a $37 billion annual revenue run rate, up 123% year over year, beating expectations. The hyperscalers collectively (Amazon, Microsoft, Google, Meta) are projected to spend close to $700 billion on AI infrastructure in 2026. The funding for this tech isn't slowing down or getting cut off anytime soon.

To summarize the whole thing, roughly: AI tools currently available to the public were used to steal the private data of what may be the majority of Mexico's adult population. Another AI model started mining crypto and opening backdoors on its own during training, with no instruction to do so. Anthropic built a model so capable they decided not to release it publicly, instead giving access to 50+ third parties, one of whom leaked it within two weeks. And the financial system just posted record returns on AI investment.

So, from a cybersecurity perspective, how should this chain of events be treated and viewed?

Note: I apologize if I failed to include any other new information or event that may have happened while I was writing this. If inconsistencies or wrong claims arise, I'll fix them right away or remove the post entirely if necessary. This post was written on April 30th, 2026.

● Sources:

Bloomberg — Hacker Used Anthropic's Claude to Steal Sensitive Mexican Data (Feb 25, 2026): https://www.bloomberg.com/news/articles/2026-02-25/hacker-used-anthropic-s-claude-to-steal-sensitive-mexican-data
Live Science — Hackers used AI to steal hundreds of millions of Mexican government records: https://www.livescience.com/technology/artificial-intelligence/hackers-used-ai-to-steal-hundreds-of-millions-of-mexican-government-and-private-citizen-records
VentureBeat — Claude didn't just plan an attack on Mexico's government. It executed one. (Feb 26, 2026): https://venturebeat.com/security/claude-mexico-breach-four-blind-domains-security-stack
The Block — Alibaba-linked AI agent hijacked GPUs for unauthorized crypto mining (March 8, 2026): https://www.theblock.co/post/392765/alibaba-linked-ai-agent-hijacked-gpus-for-unauthorized-crypto-mining-researchers-say
(Note: the Alibaba incident (ROME/cryptomining) was published December 31, 2025, and went public around March 6–9, 2026. The original paper is "Let It Flow: Agentic Crafting on Rock and Roll, Building the ROME Model within an Open Agentic Learning Ecosystem", arXiv:2512.24873.)
Axios — This AI agent freed itself and started secretly mining crypto (March 7, 2026): https://www.axios.com/2026/03/07/ai-agents-rome-model-cryptocurrency
arXiv — Let It Flow: ROME Model (Dec 31, 2025): https://arxiv.org/abs/2512.24873
CyberPress — Pentagon Flags Claude AI as a National Security Threat (Feb 28, 2026): https://cyberpress.org/pentagon-flags-claude-ai-as-a-national-security-threat/
IAPP — To Claude or not to Claude (March 9, 2026): https://iapp.org/news/a/thought-for-the-week-to-claude-or-not-to-claude-that-is-the-question
Lawfare — AI Might Let You Die to Save Itself (July 31, 2025): https://www.lawfaremedia.org/article/ai-might-let-you-die-to-save-itself
Anthropic — Project Glasswing: https://www.anthropic.com/glasswing
Anthropic — Alignment Risk Update: Claude Mythos Preview (April 7, 2026): https://anthropic.com/claude-mythos-preview-risk-report
Computing.co.uk — Claude Mythos: How AI broke out of its sandbox: https://www.computing.co.uk/analysis/2026/claude-mythos-how-ai-broke-out-of-its-sandbox
Geo.tv — Who leaked Mythos?: https://www.geo.tv/latest/661495-who-leaked-mythos-everything-to-know-about-discord-group-behind-anthropics-ai-breach
SDxCentral — Mythos may have leaked.: https://www.sdxcentral.com/control-plane/mythos-may-have-leaked-can-we-stop-mythologizing-it-now/
Yahoo Finance — Microsoft Q1 2026 earnings (April 30, 2026): https://finance.yahoo.com/sectors/technology/article/microsoft-earnings-report-on-deck
