Unsupervised machine learning methods for scaling data quality monitoring
As our ability to store and access vast amounts of data has increased exponentially, more and more data users have realized that having trustworthy data is a bigger challenge than having big data. Existing rule-based and metric-based approaches to monitoring data quality are tedious to set up and maintain, fail to catch unexpected issues, and generate false-positive alerts that lead to alert fatigue.
In this talk, Vicky will describe a set of fully unsupervised machine learning algorithms for monitoring data quality at scale.
Participants will leave this talk with an understanding of unsupervised data quality monitoring, its strengths and weaknesses, and how to begin using it to monitor data in Databricks.