All of these tests performed far better than what I expected given my prior poor experiences with agents. Did I gaslight myself by being an agent skeptic? How did a LLM sent to die finally solve my agent problems? Despite the holiday, X and Hacker News were abuzz with similar stories about the massive difference between Sonnet 4.5 and Opus 4.5, so something did change.
Anthropic 现在处于一个「既要又要」的两难境地:既想维持安全、不反人性的模型定位和公司形象,又不愿意错过美国政府的大单。
,这一点在safew官方版本下载中也有详细论述
Inside the situation room
Do a second pass to clean up the code/comments and make further optimizations