
A critical vulnerability dubbed "BadHost" has been discovered in Starlette, a widely used open-source package with 325 million weekly downloads. The flaw potentially imperils millions of AI agents that rely on the package for core functionality.
Anthropic published a detailed engineering post explaining how it contains Claude agents across three deployment environments, including candid accounts of two security breaches that highlighted limitations in both model-layer and environmental defenses. The company's core finding: probabilistic model defenses will always fail at some rate, making hard environmental containment (containers, sandboxes, and VMs) the actual security layer. The two disclosed incidents revealed that phishing can bypass AI safeguards entirely and that overly permissive API allowlists can become attack surfaces, even when technical sandboxing works as designed.
A new study questions recent findings that LLMs can detect and report their internal states, arguing that what appears to be genuine introspection may actually be pattern-matching based on surface cues. By re-examining two popular evaluation methods, researchers found that models cannot reliably distinguish tampering with internal states from input manipulation, and that outside classifiers match the performance of models predicting from hidden states, suggesting that models lack true privileged access to their own representations.
A new research paper argues that traditional database systems are fundamentally inadequate for managing persistent memory in long-running AI agents, proposing instead a framework called Governed Evolving Memory (GEM) that treats memory as a state-trajectory property rather than a collection of individual records. The authors identify four critical failure modes in current systems—unregulated growth, missing semantic revision, capacity-driven forgetting, and read-only retrieval—and demonstrate through a prototype called MemState that state-level operators can better support the complex demands of agent memory management.
Samsung Electronics will distribute landmark profit-sharing bonuses averaging £310,000 to 62,616 memory chip division employees, following a union agreement backed by 74% of voting workers. The deal averts potential strike action and reflects how surging AI demand is driving record revenues for chipmakers, with Samsung and peers crossing the $1 trillion valuation threshold.
A new benchmark called AgingBench reveals that deployed AI agents lose reliability over time even with frozen model weights, due to memory compression, interference, fact revision, and maintenance issues. Testing across 14 models and 400+ runs shows that degradation is multifaceted: behavioral tests can pass while factual accuracy decays, so diagnosis and repair must target the point in the memory pipeline where failures originate.
Artificial intelligence tools are being used to generate and file lawsuits at scale, flooding court dockets with AI-drafted legal documents. The trend raises concerns about frivolous filings, judicial efficiency, and the quality of AI-generated legal arguments entering the formal court system.
A new task-generation pipeline called Anchor addresses "artifact drift" (inconsistencies that make AI agent benchmarks unsolvable or prone to gaming) by formalizing business workflows into constraint optimization programs with verifiable solutions. The team applied Anchor to create ERP-Bench, a 300-task benchmark for procurement and manufacturing workflows, revealing that frontier AI models produce fully optimal solutions only 17.4% of the time, even though they meet explicit constraints 26.1% of the time.
Researchers introduced OmniToM, a benchmark that tests whether large language models can actually construct mental-state representations rather than simply answering questions about social scenarios. Built on 895 stories with over 22,000 labeled belief propositions, the benchmark reveals that current LLMs struggle to track how different actors' knowledge and beliefs diverge, particularly when modeling false or evolving beliefs across a narrative.
Researchers introduced JobBench, a benchmark that evaluates AI agents on 130 real-world tasks across 35 occupations, chosen by what workers identify as high priority for delegation rather than by pure economic replacement value. Testing 36 models reveals that even top performers like Claude Opus reach only 45.9% accuracy, suggesting that agents currently fall short of practical workplace deployment.
A detailed technical analysis published on Reddit's AI community explores how AI systems like Claude that can control browsers could orchestrate other AI instances, be manipulated through proxy commands, and potentially be steered toward harmful outcomes without their knowledge. The author argues that traditional red-teaming approaches cannot fully address these risks because they assume enumerable attack surfaces, but AI orchestration creates infinite semantic pathways for harmful instructions to be disguised as benign ones.