LLM Concept Vectors: MIT/UC San Diego Research on Steering Model Behavior

Date: February 23, 2026
Source: Science
Researchers from MIT and UC San Diego published a paper in Science describing a new algorithm, the Recursive Feature Machine (RFM), that extracts concept vectors from large language models: patterns of neural activity corresponding to specific ideas or behaviors. Using fewer than 500 training samples and under a minute of compute on a single A100 GPU, the team was able to steer models toward or away from specific behaviors, bypass safety features, and transfer concepts across languages.
The technique works across LLMs, vision-language models, and reasoning models.
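To make the idea concrete, here is a minimal sketch of concept-vector steering in general, not the paper's RFM algorithm. It uses a common simple baseline: take the difference of mean hidden activations between examples that do and don't express a concept, then add a scaled copy of that direction to a hidden state at inference. All dimensions and data here are synthetic stand-ins; in a real model the activations would be captured from a chosen transformer layer.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 64  # hidden dimension (made up for this sketch)

# A hidden "ground truth" direction the synthetic data is built around.
true_direction = rng.normal(size=d)
true_direction /= np.linalg.norm(true_direction)

# Stand-ins for hidden states collected from a model: activations on
# prompts that express the concept vs. neutral baseline prompts.
concept_acts = rng.normal(size=(200, d)) + 2.0 * true_direction
baseline_acts = rng.normal(size=(200, d))

# Difference-of-means concept vector, normalized to unit length.
concept_vector = concept_acts.mean(axis=0) - baseline_acts.mean(axis=0)
concept_vector /= np.linalg.norm(concept_vector)

def steer(hidden, alpha=4.0):
    """Push a hidden state toward the concept direction by strength alpha."""
    return hidden + alpha * concept_vector

h = rng.normal(size=d)
before = float(h @ concept_vector)
after = float(steer(h) @ concept_vector)
print(after > before)  # steering increases alignment with the concept
```

The same mechanism runs in reverse: subtracting the vector (negative alpha) suppresses the concept, which is what "steering away" from a behavior means in this setting.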
Why This Matters for Developers
This research points to a future beyond prompt engineering. Instead of coaxing a model into a desired behavior with carefully crafted text, developers will be able to directly manipulate the model's internal representations of concepts. That is a fundamentally different level of control.
It opens the door to more precise model customization and makes it easier to debug why a model behaves a certain way. The ability to transfer concepts across languages is particularly significant for teams building multilingual applications, since it sidesteps the need to curate a separate alignment dataset for each language. The practical takeaway: developers who learn to work with internal model representations, not just prompts, will have a serious edge in building AI-powered applications.