The Atlantic's Alex Reisner uncovered four datasets, containing up to 21 million songs, that have been used to train AI music generators, with confirmed use by Google and Stability AI, and made them searchable for public scrutiny. The datasets range from massive collections of 12 million and 9 million tracks to smaller but still substantial repositories of more than 100,000 songs each, raising questions about how transparently AI developers source their training data.
Why it matters: As AI music generation becomes mainstream, knowing which datasets fuel these models is critical for musicians, rights holders, and regulators grappling with the copyright and ethical implications of AI training practices.