News
Discovering Concept-Editing Algorithms With LLM Agents - Less Wrong
1+ hour, 48+ min ago (23+ words) Concept erasure is a technique that removes unwanted information from a model's activations, but current erasure methods struggle to fully remove tar...
Please make me care about x-risk - Less Wrong
13+ hour, 49+ min ago (275+ words) I've heard people talk a lot about how AI Safety, as a community, has a communication issue. In reality I think we, as society, have a 'difficulty ca...
The Problem with Chat - Less Wrong
14+ hour, 36+ min ago (13+ words) https://www.magfrump.net/blog/the-problem-with-chat
Links #4: 2026/06 Part 2 - Less Wrong
17+ hour, 12+ min ago (1450+ words) Preface: * I show my discovery graph in "(via …)" blocks; those without one usually come from my RSS reader or from that site's algorithm. * This is app...
"Correct Answer Features" Cannot Explain Multiple Choice Capabilities - Less Wrong
19+ hour, 9+ min ago (24+ words) TL;DR: I present theoretical and empirical evidence that LLMs cannot be (exclusively) using a "correct answer feature" as the main mechanism by which...
The Name is Not The Model - Less Wrong
1+ day, 20+ min ago (1322+ words) Abstract A safety evaluation has to trust something to know what it is testing: the model's name, its version string, and the reasons the model gives...
Gradient-free Single-pass Model Beats nano GPT on Shakespeare - Less Wrong
2+ day, 1+ hour ago (291+ words) Beam is a character-level language model that computes count tables mapping character contexts to next-character frequencies. Each order receives a capacity score composed of two terms: where H(p) is the Shannon entropy of the smoothed distribution. This is 1 when all…...
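The excerpt does not include the post's capacity-score formula, so the sketch below leaves it out. It shows only the two ingredients the excerpt does name: per-order count tables that map character contexts to next-character frequencies, and the Shannon entropy H(p) of a smoothed next-character distribution. The following are assumptions, not Beam's actual method:

- Add-alpha smoothing.
- Entropy measured in bits.
- The function names `build_count_tables`, `smoothed_distribution` and `shannon_entropy`, which are hypothetical.

```python
import math
from collections import Counter, defaultdict


def build_count_tables(text, max_order):
    """tables[k] maps each length-k context to a Counter of the characters that follow it."""
    tables = []
    for k in range(max_order + 1):
        table = defaultdict(Counter)
        for i in range(k, len(text)):
            context = text[i - k:i]  # the k characters preceding position i
            table[context][text[i]] += 1
        tables.append(table)
    return tables


def smoothed_distribution(counts, vocab, alpha=1.0):
    """Add-alpha smoothing over the vocabulary (an assumed scheme, chosen for illustration)."""
    total = sum(counts.values()) + alpha * len(vocab)
    return {c: (counts[c] + alpha) / total for c in vocab}


def shannon_entropy(dist):
    """H(p) = -sum_c p(c) * log2 p(c), in bits."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)


text = "to be or not to be"
vocab = sorted(set(text))
tables = build_count_tables(text, max_order=2)
p = smoothed_distribution(tables[2]["to"], vocab)
print(shannon_entropy(p))
```

In this toy string, the order-2 context "to" is always followed by a space. With 7 distinct characters and alpha = 1, the space gets probability 3/9 and each other character gets 1/9. An unseen context falls back to the uniform distribution, which has the maximum entropy, log2 of the vocabulary size.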
We Should Be Scaling RL on Forecasting - Less Wrong
2+ day, 22+ hour ago (44+ words) This is a crosspost of a post from my blog, Metal Ivy. The original is here: Reinforcement Learning on Forecasting Will Give Us a Superhuman Forecast...
Reinforcement Learning on Forecasting Can Give Us a Superhuman Forecaster - Less Wrong
2+ day, 22+ hour ago (44+ words) This is a crosspost of a post from my blog, Metal Ivy. The original is here: Reinforcement Learning on Forecasting Will Give Us a Superhuman Forecast...
Reinforcement Learning on Forecasting Will Give Us a Superhuman Forecaster - Less Wrong
2+ day, 22+ hour ago (44+ words) This is a crosspost of a post from my blog, Metal Ivy. The original is here: Reinforcement Learning on Forecasting Will Give Us a Superhuman Forecast...