Elon Musk says AI developers have exhausted human-generated training data and need to move to synthetic data.
“All the data and knowledge created by humans has been exhausted in the process of training AI. This happened last year,” the billionaire said in an interview published on January 9.
He believes the only way to address the shortage of source data for training new models is to switch to synthetic data generated by AI itself. “It’s like writing an essay or thesis, then grading and evaluating yourself, and building new knowledge,” he said.
Leading tech companies like Meta, Microsoft, Google, and OpenAI have all used synthetic data to refine their models.

However, Musk also warned that AI models are still prone to “hallucinations” – a term for incorrect or nonsensical outputs – which raises the risk of misleading information when AI trains on its own synthesized material. “Hallucinations pose many challenges to the process of using synthetic data, because it is impossible to know whether what the AI gives is an illusion or the real answer to the problem,” he said.
Andrew Duncan, director of AI at the Alan Turing Institute in the UK, said Musk’s comments echoed a recent academic paper estimating that publicly available data for training AI models could run out by 2026. He added that relying too heavily on synthetic data risks “model collapse”: the quality of AI output declines, bias increases, and creativity is lost.
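The “model collapse” dynamic Duncan describes can be illustrated with a toy simulation (an illustrative sketch of the general idea, not taken from the article or the paper it references): repeatedly fit a simple statistical model to samples drawn from the previous generation’s fit, so each generation trains only on the last generation’s synthetic output, and the fitted distribution drifts away from the original data.

```python
import random
import statistics

def simulate_collapse(generations=30, sample_size=20, seed=42):
    """Toy analogue of recursive training on self-generated data.

    Each 'generation' draws a finite synthetic sample from the previous
    generation's fitted normal distribution, then refits (mean, std) on
    that sample alone. Estimation error compounds across generations, so
    the fitted distribution wanders away from the original N(0, 1).
    """
    random.seed(seed)
    mean, std = 0.0, 1.0          # the original "real data" distribution
    history = [(mean, std)]
    for _ in range(generations):
        # Generate synthetic data from the current model...
        sample = [random.gauss(mean, std) for _ in range(sample_size)]
        # ...then refit the model using only that synthetic data.
        mean = statistics.fmean(sample)
        std = statistics.stdev(sample)
        history.append((mean, std))
    return history

history = simulate_collapse()
print(f"generation 0:  mean={history[0][0]:.3f}, std={history[0][1]:.3f}")
print(f"generation {len(history) - 1}: mean={history[-1][0]:.3f}, std={history[-1][1]:.3f}")
```

In typical runs the fitted spread drifts and the distribution loses fidelity to the original data, a crude mirror of the loss of quality and diversity Duncan warns about; collapse in large language models is analogous in spirit but vastly more complex.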
High-quality data, and control over it, has become one of the most contested legal battlegrounds of the AI boom.
OpenAI admitted last year that it could not have created tools like ChatGPT without access to copyrighted material, while creative industries and publishers are demanding payment for work mined in AI training.