What are some of the limits?
To start, recent research from MIT (https://arxiv.org/pdf/2103.14749.pdf) found a high number of errors in publicly available datasets that are widely used for training models. Across the test sets of 10 of the most widely used computer vision, natural language processing (NLP), and audio datasets, the researchers estimated an average label error rate of 3.3%.
Data is often recorded as free text
There are no shared data collection standards
Self-reported data can be inaccurate
Fraud and abuse distort the data

But it's not all doom and gloom: progress is being made.
https://www.forbes.com/sites/forbestechcouncil/2022/02/16/the-accuracy-limits-of-data-driven-healthcare/?sh=4c141b174623
