Creating example datasets should not be the hardest part of your workflow. Instead of searching for data that almost fits your needs, you can simply draw your own. With the drawdata library in Python, you can sketch data points and turn them into structured datasets within seconds.

Here are some key advantages:
✔ Full control over your data
✔ Create exactly the patterns you want to demonstrate
✔ No dependency on external datasets
✔ Fast prototyping of ideas and methods
✔ Ideal for teaching and clear examples
✔ Saves time compared to searching for and cleaning data

The visualization below shows the idea. Instead of generating data with formulas, you draw points on a canvas, create clusters, trends, and outliers, and then export the result as a dataset for analysis. This makes it easy to create realistic scenarios for testing, teaching, and debugging.

I’ve just published a new module in the Statistics Globe Hub that shows how to draw synthetic datasets using the drawdata Python library and analyze them afterward in R with k-means clustering. It includes a full video walkthrough, practical examples, and detailed exercises.

Not part of the Statistics Globe Hub yet? It is an ongoing learning program with new modules released every Monday, covering topics such as statistics, data science, AI, R, and Python.

More information about the Statistics Globe Hub: https://lnkd.in/exBRgHh2

#datascience #python #machinelearning #datavisualization #syntheticdata #statisticsglobehub
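A minimal sketch of the "draw, export, then cluster" workflow, assuming the drawn points have already been exported as (x, y) pairs. The drawing itself happens interactively in a notebook (recent drawdata releases provide a `ScatterWidget` for this; check the drawdata documentation for how your installed version exposes the drawn points). The module runs k-means in R; this sketch swaps in plain Python (Lloyd's algorithm, no third-party libraries) so it stays self-contained. The `points` list is a hypothetical stand-in for a drawn export, and `kmeans` is an illustrative helper, not part of drawdata.

```python
import math

# Hypothetical stand-in for points drawn on the canvas:
# two hand-placed groups, roughly around (1, 1) and (8, 8).
points = [
    (1.0, 1.2), (0.8, 0.9), (1.3, 1.1), (1.1, 0.7),
    (8.1, 7.9), (7.8, 8.3), (8.4, 8.0), (7.9, 7.6),
]


def kmeans(data, k, max_iter=100):
    """Plain Lloyd's k-means on a list of (x, y) tuples.

    Centroids are seeded with evenly spaced points from the input list,
    which keeps the illustration deterministic (illustrative choice only).
    """
    centroids = [data[i * len(data) // k] for i in range(k)]
    labels = [0] * len(data)
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in data
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(data, labels) if lab == c]
            if members:
                new_centroids.append(
                    tuple(sum(coord) / len(members) for coord in zip(*members))
                )
            else:
                # An empty cluster keeps its previous centroid.
                new_centroids.append(centroids[c])
        # Stop once the centroids no longer move.
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return labels, centroids


labels, centroids = kmeans(points, k=2)
print(labels)  # [0, 0, 0, 0, 1, 1, 1, 1]
```

Because you placed the two groups yourself, you know the correct grouping in advance, so a wrong labeling points to a problem in the analysis rather than in the data. Real implementations such as R's `kmeans()` start from randomly chosen centers; the fixed seeding here only keeps the example reproducible.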
Super handy trick for testing ETL pipeline logic and edge cases early on.
Does it do 3D?
So much potential if it can be generalized to multi-dimensional data.
Seems practical for edu purposes :) I’ll try it!
This is really cool, thanks for sharing!
Thank you, Joachim Schork, for sharing this. It's actually super useful and kind of overlooked. A lot of people practice on clean, ready-made datasets, so they skip the part where you actually think about how the data is generated. When you draw it yourself, you're forced to think about patterns, noise, outliers, and separability, basically the stuff models are trying to learn. It's also really good for debugging: if your model fails on synthetic data you fully understand, that tells you way more than it failing on some random dataset you downloaded. It feels like a simple tool, but it builds much better intuition than people expect.
That's very handy! I'd love to have a similar tool to draw time-series data in the same way! Have you ever considered that feature?
Wow, this is awesome. Thanks for the hint. Can I define constraints such as variance or average?
Such a great idea.
Damn, it is actually a great push for learning. It is time-consuming to write functions that generate synthetic data meeting your exact requirements. Would love to try this. Thanks for sharing, Joachim Schork!