
Amnesty International has revealed that U.S. software company Palantir and other contractors received unrestricted access to identifiable patient information from NHS England, raising significant privacy and data governance concerns. The report highlights potential vulnerabilities in how sensitive healthcare data is being shared with private technology firms without adequate safeguards.
SpaceX has filed its S-1 registration statement for a public offering, its first step toward becoming publicly traded. The filing reveals ambitious targets, including an estimated $28 trillion total addressable market and executive compensation packages directly tied to establishing a Mars colony, positioning the offering as potentially the largest IPO in U.S. history.
Waymo has temporarily suspended operations in six cities after videos surfaced showing two of its autonomous vehicles stopped on waterlogged streets in Atlanta. The incident highlights a critical safety gap in how self-driving systems handle extreme weather and flooded roads.
A new study probes AI concept understanding by testing implausible category assignments—like asking whether an olive is a vehicle—revealing that models significantly diverge from human reasoning on fundamental categories. Researchers found AI systems incorrectly classify objects across semantic boundaries, treating words as vehicles, misidentifying vegetables as fruits, and assigning non-weapons to weapons categories, with downstream safety implications.
A new controlled study examining how AI affects skill development in logical reasoning tasks found that greater AI usage correlates with weaker performance, though the quality of AI assistance significantly mediates this effect. Heavy AI users underperformed peers, while light users matched non-AI users; high-informativeness AI preserved learning outcomes, but low-informativeness AI degraded both immediate and post-assistance performance.
A new arXiv preprint introduces a technique called Controlled Latent-space Evasion that suppresses refusal behavior in safety-aligned language models by manipulating their internal representations. The attack achieves higher success rates than existing jailbreak methods across 15 different models, including multimodal and reasoning variants. The research frames refusal suppression as an evasion attack against linear probes, offering a theoretical framework for understanding how such attacks work.
Researchers introduced AttuneBench, a benchmark that evaluates how well language models recognize and respond to emotions in genuine multi-turn conversations, drawing from 200 real human-model interactions with turn-by-turn emotional annotations. Testing 11 models revealed that emotional intelligence isn't monolithic—models that excel at emotion recognition may struggle with response preference prediction, suggesting emotionally intelligent behavior requires distinct, separable capabilities.
Researchers introduced SMDD-Bench, a standardized benchmark with 502 drug design tasks spanning multiple chemistry types and protein targets, to evaluate how well large language models can handle autonomous molecular discovery. Testing seven frontier LLMs showed that even the best performer, GPT-4, solved only 40.2% of tasks, suggesting significant gaps remain in LLM reasoning for complex chemical and biological problems.
Anthropic researchers tested AI models on three new academic benchmarks designed to measure their ability to develop software exploits, finding that Mythos Preview significantly outperformed competing models. The findings suggest that as AI capabilities advance, the technical barrier to creating exploits will fall substantially, potentially democratizing a capability that currently requires specialized expertise.
Anthropic is leveraging its Claude AI model to identify high-severity vulnerabilities at scale in open source projects, then working to help fix them before malicious actors can exploit them. The initiative represents a shift toward using AI capabilities defensively to strengthen software security across the ecosystem.
Anthropic's latest Claude models have shown significant progress in executing multistage cyberattacks against networks containing dozens of hosts using only standard, open-source tools—a substantial leap from prior generations that required custom-built exploit frameworks. The advancement was revealed through Anthropic's red teaming evaluation process, which stress-tests AI systems for security vulnerabilities and misuse potential.