A new research method can identify coalition structures forming in multi-agent AI systems by analyzing internal neural representations rather than relying on observable behavior alone. The technique builds mutual-information graphs over agents' internal states and applies spectral partitioning to detect subgroups. It has been validated in reinforcement learning environments and large language models, revealing organizational structure that scalar, behavior-level metrics cannot capture.
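The pipeline described above can be sketched in a minimal form: estimate pairwise mutual information between agents' internal signals, treat the result as a weighted graph, and split it with the Fiedler vector of the graph Laplacian. This is an illustrative simplification, not the paper's implementation: it assumes each agent's representation is reduced to a scalar time series, uses a simple histogram MI estimator, and performs only a two-way partition.

```python
import numpy as np

def mutual_information(x, y, bins=8):
    # Histogram-based MI estimate (in nats) between two 1-D signals.
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    p_xy = joint / joint.sum()
    p_x = p_xy.sum(axis=1, keepdims=True)
    p_y = p_xy.sum(axis=0, keepdims=True)
    nz = p_xy > 0
    return float((p_xy[nz] * np.log(p_xy[nz] / (p_x @ p_y)[nz])).sum())

def spectral_bipartition(activations, bins=8):
    """Two-way coalition guess from the MI graph of agent activations.

    activations: (n_agents, n_steps) array, one scalar internal
    signal per agent (a simplifying assumption for this sketch).
    """
    n = activations.shape[0]
    W = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            W[i, j] = W[j, i] = mutual_information(
                activations[i], activations[j], bins)
    L = np.diag(W.sum(axis=1)) - W        # unnormalized graph Laplacian
    _, vecs = np.linalg.eigh(L)           # eigenvalues in ascending order
    fiedler = vecs[:, 1]                  # 2nd-smallest eigenvalue's vector
    return (fiedler > 0).astype(int)      # sign split = partition labels

# Toy demo: two coalitions of 3 agents, each sharing a latent signal.
rng = np.random.default_rng(0)
t = rng.normal(size=2000)
s = rng.normal(size=2000)
acts = np.stack(
    [t + 0.3 * rng.normal(size=2000) for _ in range(3)]
    + [s + 0.3 * rng.normal(size=2000) for _ in range(3)])
labels = spectral_bipartition(acts)  # groups {0,1,2} and {3,4,5}
```

In the demo, within-coalition MI is large (agents share a latent driver) while cross-coalition MI is near zero, so the Fiedler vector's sign cleanly separates the two blocks; a full multi-way version would cluster several Laplacian eigenvectors instead of thresholding one.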
Why it matters: As AI systems become more complex and distributed, detecting emergent coalitions at the representational level, before they manifest in behavior, is critical for AI safety monitoring and alignment verification.