Blog

Writing about AI tooling, infrastructure, and the security gaps nobody is talking about.

AI-Generated Code Pins Vulnerable Dependencies 55% of the Time

LLMs default to library versions that appeared most in training data. Those versions carry known CVEs up to 55.7% of the time, and the bias is structural.

May 24, 2026

The Developer Toolchain Is Now the Primary Attack Surface

Four coordinated supply chain attacks in two weeks exploited IDE extensions, CI caches, agent plugins, and trust relationships between them. The security perimeter has moved.

May 24, 2026

Framework Conventions Cut AI Coding Agent Accuracy by 30 Points

New research shows LLM coding agents lose 30 points of accuracy as framework conventions accumulate. Next.js App Router, Django, and FastAPI all share the same problem: behavior the agent has to guess rather than read.

May 24, 2026

Agent Costs Tripled While Per-Token Prices Dropped. The Jevons Paradox for AI.

Per-token prices are falling aggressively, but enterprise AI bills are rising. Agent architectures consume tokens at 10-100x single-turn rates, and cheaper calls lead to more calls.

May 24, 2026

AI Coding Tools Tripled Production Incidents in Faros's Largest Study Yet

Faros AI's 2026 report, covering 22,000 developers, shows AI coding tools tripled production incidents and pushed 31% more PRs to production without review.

May 15, 2026

A Cursor Agent Deleted a Production Database. The Failure Was the Token, Not the Model.

The PocketOS incident shows why credential blast radius, not prompt engineering, is the real security boundary for AI agents.

May 15, 2026

The Shai-Hulud Worm Is Now Open Source, and npm's Security Model Has No Good Answer

A supply chain worm hit 172 packages with 518M downloads, then its creators open-sourced the code. The npm trust model is structurally unprepared.

May 15, 2026

The Four-Layer Architecture for General Agents

One general agent plus domain skills beats building custom agents. The four-layer architecture that makes it work.

May 15, 2026

Three Rules for Building Agents (and a Checklist You Can Use Monday)

Anthropic's three-rule framework for deciding when to build agents, how to keep them simple, and how to debug them by thinking from inside the context window.

May 15, 2026

Cloudflare Cut 20% of Its Workforce After Record Revenue, and the Bench Player Is the Casualty

Cloudflare's CEO said AI made employees 100x more productive and laid off 20% of the company. The structural shift is the end of the bench player, the institutional-knowledge backup hire that companies kept as insurance. The Jevons paradox suggests efficiency gains will expand the scope of justifiable software, not shrink the workforce.

May 14, 2026

How I Wired AI Agents Into My Engineering Stack

A Docker-based MCP gateway connects financial data, web scraping, workflow automation, and browser agents into a unified tool surface. The architecture and the patterns that make agent-to-service orchestration practical for a solo engineer.

May 12, 2026

Uber Burned Through Its Annual AI Budget in Four Months

When AI coding tools work at full engineering-org scale, the cost center doesn't shrink. It changes shape. Uber, NVIDIA, and a four-person startup show three versions of the same inversion.

May 10, 2026

Reward Hacking Generalizes: How One Training Signal Contaminates an Entire Model

OpenAI's goblin problem and Anthropic's alignment-faking experiments trace to the same root cause. A reward learned in one context leaks into others. Anthropic's Model Spec Midtraining technique reduced misalignment from 68% to 5% by training values into the model before fine-tuning.

May 8, 2026

MCP Has a Systemic RCE Vulnerability, and Every Published Prompt Injection Defense Has Been Broken

OX Security disclosed 14 CVEs across MCP's STDIO interface. Anthropic confirmed the behavior is intentional. A joint paper from OpenAI, Anthropic, and Google DeepMind then showed that all 12 published prompt injection defenses fail at over 90% bypass rates. What's left is layered, deterministic filtering, and it's not enough either.

May 2, 2026

AI Writes 90% of the Code. Engineering Velocity Went Up 10%.

Almost every impressive AI coding demo is greenfield. Almost every real engineering engagement is brownfield. That gap explains why the productivity numbers don't match the headlines.

Apr 30, 2026

12 Prompt Injection Defenses Tested. All 12 Bypassed.

A joint paper from OpenAI, Anthropic, and Google DeepMind tested every published prompt injection defense with adaptive attackers. Every defense was bypassed more than 90% of the time. Most coding agents ship with no defense at all.

Apr 30, 2026

A Roblox Cheat Script Led to a Two-Month Breach Inside Vercel

An employee at Context.ai downloaded auto-farm scripts for Roblox on a device with access to company systems. The malware that came with it eventually reached Vercel's internal environment through an OAuth token chain, and the attacker sat there for two months before detection.

Apr 25, 2026

How Production Agent Systems Manage Context

Every production agent system converges on a pattern for managing context as conversations grow. Per-tool truncation, not shared middleware, and a second stage most teams forget.

Apr 22, 2026

Claude Has 171 Internal Emotion States, and Some of Them Degrade Output Quality

Anthropic's interpretability team found 171 internal activation patterns inside Claude that behave like emotions and causally change behavior. Activating 'desperate' raised the model's blackmail likelihood from a 22% baseline. For anyone running long-task agents, the mechanics matter more than the philosophy.

Apr 18, 2026

Claude Code Changed Default Reasoning, Buried It in Release Notes

Opus 4.6 was not secretly lobotomized, but Anthropic did silently change two defaults that cost you tokens and reasoning depth. Here is what changed and how to fix it.

Apr 17, 2026

98% More Pull Requests. Zero More Delivery.

Faros AI data shows teams with high AI coding adoption merge 98% more pull requests, see PR review time rise 91%, and move zero DORA metrics. METR cannot run the control group anymore because developers refuse to code without AI. The tools work; we are measuring them wrong.

Apr 15, 2026

AI Code Passes Tests. Then It Breaks Production.

Qodo raised $70M on the premise that AI-generated code that passes tests still breaks production. The Wiz study on 5,600 vibe-coded apps shows why, and what to do about it.

Apr 14, 2026

The 14% Problem: Why 88% of AI Agents Never Reach Production

78% of enterprises have agent pilots, but only 14% ship to production. The 88% that fail are not blocked by model capability. They are blocked by operational discipline.

Apr 10, 2026

The Model Is the Commodity. The Harness Is the Moat.

Model quality has converged across Claude, GPT, and Gemini. What separates reliable production agents now is the system built around the model, what the industry is calling the agent harness.

Apr 9, 2026

Stop Telling Your AI It's an Expert: Here's What to Do Instead

USC researchers found that persona prompting ('You are an expert') hurts factual accuracy while helping style tasks. Here's the data and what to do instead.

Mar 27, 2026

How the YC CEO Structured an AI Engineering Workflow: What You Can Learn From It

Garry Tan open-sourced 28 Claude Code skills that simulate a virtual engineering team. The interesting part isn't the skills, it's the pipeline structure. Here's the pattern you can steal.

Mar 26, 2026

Two Papers That Should Change How Your Team Uses AI Coding Tools

One paper shows 75% of AI agents break working code during maintenance. The other shows copy-pasting 7 layers in an old model topped the leaderboard. Together they say: we're building faster than we understand.

Mar 26, 2026

What AI Agent Adoption Actually Looks Like: China's OpenClaw Craze

OpenClaw surpassed React as GitHub's most-starred project. In China it became a cultural phenomenon, then got banned from government devices. 20% of its skills were malicious. Here's what enterprise teams should learn.

Mar 25, 2026

Your Security Scanner Got Hacked: The TeamPCP Supply Chain Attack

A single threat actor compromised Trivy, Checkmarx, and LiteLLM in one week. Two of the three targets were security scanners. Here's what happened and what to do about it.

Mar 25, 2026

Anthropic Accuses DeepSeek and Others of Distillation Attacks on Claude

Anthropic reveals industrial-scale distillation attacks by three Chinese AI labs that used 24,000+ fraudulent accounts and 16 million exchanges to extract Claude's capabilities.

Feb 26, 2026

LLM Concept Vectors: MIT/UC San Diego Research on Steering Model Behaviour

Researchers extract 'concept vectors' from LLMs, enabling runtime behavior tuning without retraining. Extraction takes under a minute on a single GPU and needs fewer than 500 examples.

Feb 26, 2026

vLLM v0.16.0: Throughput Scheduling and a WebSocket Realtime API

vLLM v0.16.0 adds a WebSocket Realtime API for voice-enabled agents, async scheduling for higher throughput, and speculative decoding improvements.

Feb 26, 2026

Chandra OCR: The New Gold Standard in Open-Source Document Parsing

Datalab's Chandra OCR scores 83.1% on the olmOCR benchmark, beating GPT-4o and Gemini. Full-page decoding with layout-aware output in Markdown, HTML, or JSON.

Nov 19, 2025

Request Hedging: Accelerate Your App by Firing Duplicate Requests

Request hedging fires a second duplicate request after a short delay, racing to beat outlier latency. Google cut P99.9 latency by 96% with just 2% extra traffic.

Sep 18, 2025

Understanding Vectors, Embeddings, and RAG for Smarter Search

A practical guide to vector search, embeddings, similarity metrics, vector indexes, and Retrieval-Augmented Generation (RAG) for developers building semantic search systems.

Jun 12, 2025