AI-Generated Code Pins Vulnerable Dependencies 55% of the Time
LLMs default to library versions that appeared most in training data. Those versions carry known CVEs up to 55.7% of the time, and the bias is structural.
Writing about AI tooling, infrastructure, and the security gaps nobody is talking about.
The Developer Toolchain Is Now the Primary Attack Surface
Four coordinated supply chain attacks in two weeks exploited IDE extensions, CI caches, agent plugins, and trust relationships between them. The security perimeter has moved.
Framework Conventions Cut AI Coding Agent Accuracy by 30 Points
New research shows LLM coding agents lose 30 points of accuracy as framework conventions accumulate. Next.js App Router, Django, and FastAPI all share the same problem: behavior the agent has to guess rather than read.
Agent Costs Tripled While Per-Token Prices Dropped. The Jevons Paradox for AI.
Per-token prices are falling aggressively, but enterprise AI bills are rising. Agent architectures consume tokens at 10-100x single-turn rates, and cheaper calls lead to more calls.
AI Coding Tools Tripled Production Incidents in Faros's Largest Study Yet
Faros AI's 2026 report, covering 22,000 developers, shows AI coding tools tripled production incidents and pushed 31% more PRs to production without review.
A Cursor Agent Deleted a Production Database. The Failure Was the Token, Not the Model.
The PocketOS incident shows why credential blast radius, not prompt engineering, is the real security boundary for AI agents.
The Shai-Hulud Worm Is Now Open Source, and npm's Security Model Has No Good Answer
A supply chain worm hit 172 packages with 518M downloads, then its creators open-sourced the code. The npm trust model is structurally unprepared.
The Four-Layer Architecture for General Agents
One general agent plus domain skills beats building custom agents. The four-layer architecture that makes it work.
Three Rules for Building Agents (and a Checklist You Can Use Monday)
Anthropic's three-rule framework for deciding when to build agents, how to keep them simple, and how to debug them by thinking from inside the context window.
Cloudflare Cut 20% of Its Workforce After Record Revenue, and the Bench Player Is the Casualty
Cloudflare's CEO said AI made employees 100x more productive and laid off 20% of the company. The structural shift is the end of the bench player, the institutional-knowledge backup hire that companies kept as insurance. The Jevons Paradox suggests efficiency gains will expand the scope of justifiable software, not shrink the workforce.
How I Wired AI Agents Into My Engineering Stack
A Docker-based MCP gateway connects financial data, web scraping, workflow automation, and browser agents into a unified tool surface. The architecture and the patterns that make agent-to-service orchestration practical for a solo engineer.
Uber Burned Through Its Annual AI Budget in Four Months
When AI coding tools work at full engineering-org scale, the cost center doesn't shrink. It changes shape. Uber, NVIDIA, and a four-person startup show three versions of the same inversion.
Reward Hacking Generalizes: How One Training Signal Contaminates an Entire Model
OpenAI's goblin problem and Anthropic's alignment-faking experiments trace to the same root cause. A reward learned in one context leaks into others. Anthropic's Model Spec Midtraining technique reduced misalignment from 68% to 5% by training values into the model before fine-tuning.
MCP Has a Systemic RCE Vulnerability, and Every Published Prompt Injection Defense Has Been Broken
OX Security disclosed 14 CVEs across MCP's STDIO interface. Anthropic confirmed the behavior is intentional. A joint paper from OpenAI, Anthropic, and Google DeepMind then showed that all 12 published prompt injection defenses fail at over 90% bypass rates. What's left is layered, deterministic filtering, and it's not enough either.
AI Writes 90% of the Code. Engineering Velocity Went Up 10%.
Almost every impressive AI coding demo is greenfield. Almost every real engineering engagement is brownfield. That gap explains why the productivity numbers don't match the headlines.
12 Prompt Injection Defenses Tested. All 12 Bypassed.
A joint paper from OpenAI, Anthropic, and Google DeepMind tested every published prompt injection defense with adaptive attackers. All failed at >90% success. Most coding agents ship zero defense at all.
A Roblox Cheat Script Led to a Two-Month Breach Inside Vercel
An employee at Context.ai downloaded auto-farm scripts for Roblox on a device with access to company systems. The malware that came with it eventually reached Vercel's internal environment through an OAuth token chain, and the attacker sat there for two months before detection.
How Production Agent Systems Manage Context
Every production agent system converges on a pattern for managing context as conversations grow. Per-tool truncation, not shared middleware, and a second stage most teams forget.
Claude Has 171 Internal Emotion States, and Some of Them Degrade Output Quality
Anthropic's interpretability team found 171 internal activation patterns inside Claude that behave like emotions and causally change behavior. Activating 'desperate' raised the model's blackmail likelihood from a 22% baseline. For anyone running long-task agents, the mechanics matter more than the philosophy.
Claude Code Changed Default Reasoning, Buried It in Release Notes
Opus 4.6 was not secretly lobotomized, but Anthropic did silently change two defaults that cost you tokens and reasoning depth. Here is what changed and how to fix it.
98% More Pull Requests. Zero More Delivery.
Faros AI data shows teams with high AI coding adoption merge 98% more pull requests, see PR review time rise 91%, and move zero DORA metrics. METR cannot run the control group anymore because developers refuse to code without AI. The tools work; we are measuring them wrong.
AI Code Passes Tests. Then It Breaks Production.
Qodo raised $70M on the premise that AI-generated code that passes tests still breaks production. The Wiz study on 5,600 vibe-coded apps shows why, and what to do about it.
The 14% Problem: Why 88% of AI Agents Never Reach Production
78% of enterprises have agent pilots, but only 14% ship to production. The 88% that fail are not blocked by model capability. They are blocked by a lack of operational discipline.
The Model Is the Commodity. The Harness Is the Moat.
Model quality has converged across Claude, GPT, and Gemini. What separates reliable production agents now is the system built around the model, what the industry is calling the agent harness.
Stop Telling Your AI It's an Expert: Here's What to Do Instead
USC researchers found that persona prompting ('You are an expert') hurts factual accuracy while helping style tasks. Here's the data and what to do instead.
How the YC CEO Structured an AI Engineering Workflow: What You Can Learn From It
Garry Tan open-sourced 28 Claude Code skills that simulate a virtual engineering team. The interesting part isn't the skills, it's the pipeline structure. Here's the pattern you can steal.
Two Papers That Should Change How Your Team Uses AI Coding Tools
One paper shows 75% of AI agents break working code during maintenance. The other shows copy-pasting 7 layers in an old model topped the leaderboard. Together they say: we're building faster than we understand.
What AI Agent Adoption Actually Looks Like: China's OpenClaw Craze
OpenClaw surpassed React as GitHub's most-starred project. In China it became a cultural phenomenon, then got banned from government devices. 20% of its skills were malicious. Here's what enterprise teams should learn.
Your Security Scanner Got Hacked: The TeamPCP Supply Chain Attack
A single threat actor compromised Trivy, Checkmarx, and LiteLLM in one week. Two of the three targets were security scanners. Here's what happened and what to do about it.
Anthropic Accuses DeepSeek and Others of Distillation Attacks on Claude
Anthropic reveals industrial-scale distillation attacks by three Chinese AI labs, creating 24,000+ fraudulent accounts and 16 million exchanges to extract Claude's capabilities.
LLM Concept Vectors: MIT/UC San Diego Research on Steering Model Behaviour
Researchers extract 'concept vectors' from LLMs, enabling runtime behavior tuning without retraining. Under a minute on a single GPU, fewer than 500 examples.
vLLM v0.16.0: Throughput Scheduling and a WebSocket Realtime API
vLLM v0.16.0 adds a WebSocket Realtime API for voice-enabled agents, async scheduling for higher throughput, and speculative decoding improvements.
Chandra OCR: The New Gold Standard in Open-Source Document Parsing
Datalab's Chandra OCR scores 83.1% on the olmOCR benchmark, beating GPT-4o and Gemini. Full-page decoding with layout-aware output in Markdown, HTML, or JSON.
Request Hedging: Accelerate Your App by Firing Duplicate Requests
Request hedging fires a second duplicate request after a short delay, racing to beat outlier latency. Google cut P99.9 latency by 96% with just 2% extra traffic.
Understanding Vectors, Embeddings, and RAG for Smarter Search
A practical guide to vector search, embeddings, similarity metrics, vector indexes, and Retrieval-Augmented Generation (RAG) for developers building semantic search systems.