From the course: Data Planning, Strategy, and Compliance for AI Initiatives

Data sampling techniques and statistical considerations

- [Instructor] We often need to use data sampling when we're working in AI. The reason is that working with full datasets is sometimes impractical, even with advanced computing resources. Sample selection can directly determine AI model quality and reliability, so we want to be careful about sampling errors, which can cascade through development and lead to flawed business decisions. So why do we sample the data? There are a number of reasons. It reduces costs while maintaining statistical validity and insights. It also accelerates exploration and development cycles by lowering processing requirements. Sampling enables more efficient hypothesis testing across a larger data landscape, and it can be useful for identifying data quality issues before we commit to a full-blown implementation. Random sampling is really the gold standard for sampling, and there are several different types. Simple random sampling is a sampling process where every data point has an equal…
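
As a rough illustration of simple random sampling, here is a minimal Python sketch. It relies on the standard definition, in which every data point has the same chance of being selected and no point is drawn twice. The function name `simple_random_sample`, the toy `population`, and the `seed` parameter are assumptions made for this example and do not come from the course.

```python
import random


def simple_random_sample(records, k, seed=None):
    """Draw k records without replacement, each equally likely to be chosen.

    `records`, `k`, and `seed` are hypothetical names for this sketch.
    """
    # A dedicated generator keeps the draw reproducible when a seed is given
    # and avoids touching the global random state.
    rng = random.Random(seed)
    # random.Random.sample selects k unique elements uniformly at random.
    return rng.sample(records, k)


# Toy population standing in for a full dataset (an assumption for the demo).
population = list(range(1000))
sample = simple_random_sample(population, 50, seed=42)

# Compare a summary statistic to see how far the sample drifts from the
# full data; the gap is the sampling error for this particular draw.
population_mean = sum(population) / len(population)
sample_mean = sum(sample) / len(sample)
print(f"population mean: {population_mean:.1f}, sample mean: {sample_mean:.1f}")
```

Fixing the seed makes the draw repeatable, which helps when a sample feeds later experiments that need to be compared against each other.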
