News

lesswrong.com
lesswrong.com/posts/x5qddsdXfqSZXi8Yu/discovering-concept-editing-algorithms-with-llm-agents

Discovering Concept-Editing Algorithms With LLM Agents - LessWrong

1+ hour, 48+ min ago  (23+ words) Concept erasure is a technique that removes unwanted information from a model's activations, but current erasure methods struggle to fully remove tar...

lesswrong.com
lesswrong.com/posts/itDxBkHaQneYwczLp/please-make-me-care-about-x-risk

Please make me care about x-risk - LessWrong

13+ hour, 49+ min ago  (275+ words) I've heard people talk a lot about how AI Safety, as a community, has a communication issue. In reality I think we, as society, have a 'difficulty ca...

lesswrong.com
lesswrong.com/posts/3mrBLJFLgqaE6vajD/the-problem-with-chat

The Problem with Chat - LessWrong

14+ hour, 36+ min ago  (13+ words) https://www.magfrump.net/blog/the-problem-with-chat

lesswrong.com
lesswrong.com/posts/ZowoNiJGmHnSABgyz/links-4-2026-06-part-2

Links #4: 2026/06 Part 2 - LessWrong

17+ hour, 12+ min ago  (1450+ words) Preface: * I show my discovery graph in (via ") blocks; those without usually come from my RSS reader or the algorithm of that site * This is app...

lesswrong.com
lesswrong.com/posts/bJG6t8iBoDfKq4xcE/correct-answer-features-cannot-explain-multiple-choice

"Correct Answer Features" Cannot Explain Multiple Choice Capabilities - LessWrong

19+ hour, 9+ min ago  (24+ words) TL;DR: I present theoretical and empirical evidence that LLMs cannot be (exclusively) using a "correct answer feature" as the main mechanism by which...

lesswrong.com
lesswrong.com/posts/cZ2ShKLcFiiPjhLg6/the-name-is-not-the-model

The Name is Not The Model - LessWrong

1+ day, 20+ min ago  (1322+ words) Abstract A safety evaluation has to trust something to know what it is testing: the model's name, its version string, and the reasons the model gives...

lesswrong.com
lesswrong.com/posts/rSiwKbisKmMhALpBk/gradient-free-single-pass-model-beats-nanogpt-on-shakespeare-1

Gradient-free Single-pass Model Beats nanoGPT on Shakespeare - LessWrong

2+ day, 1+ hour ago  (291+ words) Beam is a character-level language model that computes count tables mapping character contexts to next-character frequencies. Each order receives a capacity score composed of two terms: where H(p) is the Shannon entropy of the smoothed distribution. This is 1 when all...
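
The count-table idea in the excerpt above can be illustrated with a rough sketch. This is not the post's implementation: the function names, the additive smoothing with parameter `alpha`, and the choice of base-2 logarithm are all assumptions made for illustration. The two-term capacity score is not reproduced here because its formula is cut off in the excerpt.

```python
import math
from collections import Counter, defaultdict


def build_count_table(text, order):
    """Map each context of `order` characters to counts of the next character."""
    table = defaultdict(Counter)
    for i in range(order, len(text)):
        context = text[i - order:i]
        table[context][text[i]] += 1
    return table


def smoothed_distribution(counts, alphabet, alpha=1.0):
    """Additive smoothing over `alphabet` (an assumed smoothing scheme)."""
    total = sum(counts.values()) + alpha * len(alphabet)
    return {ch: (counts.get(ch, 0) + alpha) / total for ch in alphabet}


def shannon_entropy(dist):
    """Shannon entropy H(p) in bits of a distribution given as {symbol: prob}."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)


# Hypothetical usage on a toy string.
text = "abababab"
alphabet = sorted(set(text))
table = build_count_table(text, order=1)
p_next = smoothed_distribution(table["a"], alphabet)
entropy_bits = shannon_entropy(p_next)
```

A model of this kind needs only a single pass over the training text to fill the tables, which matches the "gradient-free single-pass" framing in the title.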

lesswrong.com
lesswrong.com/posts/pQLQ5GMjQP7qKb7HS/we-should-be-scaling-rl-on-forecasting

We Should Be Scaling RL on Forecasting - LessWrong

2+ day, 22+ hour ago  (44+ words) This is a crosspost of a post from my blog, Metal Ivy. The original is here: Reinforcement Learning on Forecasting Will Give Us a Superhuman Forecast...

lesswrong.com
lesswrong.com/posts/pQLQ5GMjQP7qKb7HS/reinforcement-learning-on-forecasting-can-give-us-a

Reinforcement Learning on Forecasting Can Give Us a Superhuman Forecaster - LessWrong

2+ day, 22+ hour ago  (44+ words) This is a crosspost of a post from my blog, Metal Ivy. The original is here: Reinforcement Learning on Forecasting Will Give Us a Superhuman Forecast...

lesswrong.com
lesswrong.com/posts/pQLQ5GMjQP7qKb7HS/reinforcement-learning-on-forecasting-will-give-us-a

Reinforcement Learning on Forecasting Will Give Us a Superhuman Forecaster - LessWrong

2+ day, 22+ hour ago  (44+ words) This is a crosspost of a post from my blog, Metal Ivy. The original is here: Reinforcement Learning on Forecasting Will Give Us a Superhuman Forecast...
