Adding a few thoughts on Data Mining & Big Data
Disclaimers : I am a Digital Product Manager who has worked in E-commerce, Fin-Tech, Telecom (Data + OTT) and Consumer Durable Retail, so my insights are based on B2C industries & perspectives. Also, I am no expert in any tool (R, Python, SQL), but I have used them at an intermediate level for running the product. So please read this from a "User in Need" perspective.
Sharing an article by @SanilSubhashChandraBose which helped me consolidate my learning :
https://www.garudax.id/pulse/practical-guide-data-mining-e-commerce-business-subhash-chandra-bose/
In order to perform data mining, there are a few must-haves :
- Customer Behavioral Data (at least a year's worth, so that trends are visible) covering fields like name, location, mobile number, transaction details, date & time of transactions, etc.
- A tool to analyze it with, e.g. R, Python, SQL
- Business Objectives / Consumer Problems to solve, e.g. segmenting customer data or projecting customer sales
- And lastly, an open mind to question the findings from a domain-knowledge perspective
Here I am sharing a few techniques for Data Mining :
- Supervised (Predictive) : Used when the exercise is driven by a resulting/controlled (target) variable. Z = F(X) + Y. E.g. campaign analytics, predicting the likely outcome of a given set of changes.
  - Classification : Decision Tree, Rule Induction, Neural Network, Nearest Neighbor Classification
  - Regression : Linear, Logistic, Polynomial, Stepwise, Ridge, Lasso, Elastic Net
  - Forecasting
  - Predictive Modelling
- Unsupervised (Descriptive) : Used when there is no controlled variable to follow & it's an exercise based purely on the values of the inputs. Z = F(X). E.g. collating CUG or UAT feedback, or survey responses. Collating social feedback metrics like likes, comments, Play Store ratings, etc. is also Descriptive.
  - Clustering (with PCA & feature selection to reduce dimensions)
  - Association
  - Sequential Analysis
- Diagnostic : Used to analyse why something happened & the possible reasons, e.g. Root Cause Analysis.
- Prescriptive : Used to guide what should be done to prevent something from happening. It takes the form of a process note or a set of recommendations.
  - Monte Carlo Simulation
  - Pattern Identification & Alerts
  - Optimizations
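To give a feel for the Monte Carlo item in the list above, here is a minimal sketch in Python (standard library only). It is my own illustration, not something from the linked article: every number in it (100 orders a day on average, a spread of 20, an average order value of 500, the revenue target) is a made-up assumption, and so is the choice of a normal distribution for daily orders. The idea is simply to replay a month many times with random daily orders and count how often revenue misses the target.

```python
import random


def simulate_month(rng, days=30, mean_orders=100.0, sd_orders=20.0,
                   avg_order_value=500.0):
    """Simulate one month's revenue from noisy daily order counts.

    All default figures are hypothetical, chosen only for illustration.
    """
    revenue = 0.0
    for _ in range(days):
        # Daily orders drawn from a normal distribution, floored at zero.
        orders = max(0.0, rng.gauss(mean_orders, sd_orders))
        revenue += orders * avg_order_value
    return revenue


def shortfall_probability(target, runs=10_000, seed=42):
    """Share of simulated months whose revenue is below `target`."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    misses = sum(simulate_month(rng) < target for _ in range(runs))
    return misses / runs


if __name__ == "__main__":
    p = shortfall_probability(target=1_450_000)
    print(f"Estimated chance of missing the target: {p:.1%}")
```

The output is a probability rather than a single forecast, which is what makes it useful for a prescriptive recommendation such as "add a promotion if the chance of a miss is above our comfort level".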
Knowing which technique to use in which problem situation is the trick to master, I believe :)
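One rough way I think about the choice: if the data has a column you want to predict, it is a supervised problem; if you only have inputs and want to find structure, it is unsupervised. The sketch below, in plain Python, shows both on toy data. It is only an illustration under my own assumptions: the ad spend, sales and monthly spend figures are invented, and `fit_line`, `assign` and `kmeans_1d` are hypothetical helper names, not functions from any library.

```python
from statistics import mean


def fit_line(xs, ys):
    """Least-squares fit of y = a + b*x (supervised: ys is the target)."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b


def assign(values, centroids):
    """Put each value in the cluster of its nearest centroid."""
    clusters = [[] for _ in centroids]
    for v in values:
        nearest = min(range(len(centroids)), key=lambda i: abs(v - centroids[i]))
        clusters[nearest].append(v)
    return clusters


def kmeans_1d(values, centroids, iterations=20):
    """Plain k-means on one feature (unsupervised: inputs only, no target)."""
    centroids = list(centroids)
    for _ in range(iterations):
        clusters = assign(values, centroids)
        # Move each centroid to its cluster mean; keep it if the cluster is empty.
        centroids = [mean(c) if c else centroids[i] for i, c in enumerate(clusters)]
    return centroids, assign(values, centroids)


if __name__ == "__main__":
    # Supervised: hypothetical ad spend (input) vs. sales (target).
    ad_spend = [1, 2, 3, 4, 5]
    sales = [12.1, 13.9, 16.2, 17.8, 20.0]
    a, b = fit_line(ad_spend, sales)
    print(f"Projected sales at spend 6: {a + b * 6:.2f}")

    # Unsupervised: hypothetical monthly spend per customer, no target column.
    spend = [200, 250, 220, 900, 950, 1000, 240, 980]
    centroids, clusters = kmeans_1d(spend, [min(spend), max(spend)])
    print("Segment centres:", centroids)
```

Starting the centroids at the minimum and maximum value is just a simple, repeatable choice for this toy case; real clustering tools usually pick starting points more carefully.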
Also sharing a Predictive Analysis/Data Mining Ecosystem Diagram :
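For every test, once the objective is defined, it's important to calculate the required sample size (n) and fix the significance level (alpha, where confidence level = 1 - alpha) before running it, then compare the resulting p-value against alpha to decide whether the result falls in the accepted region.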
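As a rough sketch of that calculation for a simple A/B test on conversion rates, using only Python's standard library: the sample size uses the usual normal-approximation formula for comparing two proportions, and the p-value comes from a pooled two-proportion z-test. The conversion rates and visitor counts are invented, and the 80% power setting is my own assumption (the post above only talks about alpha).

```python
from math import ceil, sqrt
from statistics import NormalDist


def sample_size_per_group(p1, p2, alpha=0.05, power=0.80):
    """Approximate n per group to detect a change from rate p1 to rate p2.

    Two-sided test, normal approximation; `power` is an assumed setting.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_beta) ** 2 * variance / (p1 - p2) ** 2)


def two_proportion_p_value(conv_a, n_a, conv_b, n_b):
    """Two-sided p-value for the difference between two conversion rates."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


if __name__ == "__main__":
    alpha = 0.05
    # Hypothetical goal: detect a lift from 10% to 12% conversion.
    n = sample_size_per_group(0.10, 0.12, alpha=alpha)
    print(f"Visitors needed per variant: {n}")

    # Hypothetical results: 400/4000 conversions for A, 480/4000 for B.
    p_value = two_proportion_p_value(400, 4000, 480, 4000)
    verdict = "significant" if p_value < alpha else "not significant"
    print(f"p-value = {p_value:.4f} ({verdict} at alpha = {alpha})")
```

The point of doing the sample size first is that it tells you how long the test has to run before the p-value means anything.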
https://www.invisionapp.com/inside-design/cognitive-science-design/
https://becominghuman.ai/lets-talk-about-advanced-analytics-a-brief-look-at-artificial-intelligence-bf1c7a7d3f96
https://medium.freecodecamp.org/if-youre-a-developer-transitioning-into-data-science-here-are-your-best-resources-c31928b53cd1