Use of Regression in Data Science

The emergence of Big Data has introduced us to two major variables:

Key & Value

Let us look at the relationship between two such variables in the context of Data Science. In basic terms, Regression also describes two variables, X & Y, to work with. These variables are:

Independent Variables & Dependent Variables

Let us take the behavior of users of a financial institution. We take hypothetical data (a random sample) of 6 users visiting one specific website of a Line of Business (LOB) in one specific hour.

User:   1, 2, 3, 4, 5, 6
Visits: 5, 17, 11, 8, 14, 5

Suppose we need to predict the behavior of another user, User # 7, in visiting the same website. We use the statistical measure called the "Mean": add up the visits of the six randomly selected users and divide the total by the number of users, so Mean (Visits) = 60/6 = 10. This is our best estimate of how many times User # 7 will visit the same site. This can also be considered Internal LOB Forensics of User Behavior.

The mean is a measure of central tendency; how far the individual visits fall from it is the measure of variability. Let us now find the distance between each data point and the fit we obtained from the mean, which is 10. These are the users' usage deviations (Visit - Mean):

Residuals (Errors): -5, 7, 1, -2, 4, -5 {Adding the negatives and the positives separately: -5 - 2 - 5 = -12 and 7 + 1 + 4 = 12}
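
The arithmetic so far can be checked with a short Python sketch. The variable names are illustrative and not part of the original article; each residual is computed as the visit count minus the mean, which matches the signs of the values listed above.

```python
# Visits for the six sampled users (from the table in this article).
visits = [5, 17, 11, 8, 14, 5]

# Mean: total visits divided by the number of users.
mean_visits = sum(visits) / len(visits)

# Residual for each user: observed visits minus the mean.
residuals = [v - mean_visits for v in visits]

# Residuals measured from the mean cancel each other out.
residual_sum = sum(residuals)

print(mean_visits)   # 10.0
print(residuals)     # [-5.0, 7.0, 1.0, -2.0, 4.0, -5.0]
print(residual_sum)  # 0.0
```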

This means -12 + 12 = 0: the residuals cancel out around the mean, so 10 is a balanced estimate of the next user's visits to the website we chose the sample for. Let us now compute the Sum of Squared Residuals (Errors): 25 + 49 + 1 + 4 + 16 + 25 = 120. This entire example is based on one dependent variable only, the number of visits to one specific website by some users within a Line of Business. This predictive analytics discussion has introduced the idea of website usage as a lead-in to Simple Linear Regression. We can certainly explore more if we know the time users spent on that website or the number of pages they visited; in that case we have both Independent and Dependent Variables available to work with, and our prediction can improve.
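
The sum of squared residuals can be verified the same way. The sketch below also compares the mean against a few other constant predictions, to illustrate why the mean is a sensible single-number estimate: among constant predictions it gives the smallest sum of squared errors. The helper name `sum_squared_errors` and the candidate values are assumptions made for illustration only.

```python
# Visits for the six sampled users (from the table in this article).
visits = [5, 17, 11, 8, 14, 5]

def sum_squared_errors(values, prediction):
    """Sum of squared differences between each value and one fixed prediction."""
    return sum((v - prediction) ** 2 for v in values)

mean_visits = sum(visits) / len(visits)
ssr_at_mean = sum_squared_errors(visits, mean_visits)
print(ssr_at_mean)  # 120.0

# A few other constant predictions, chosen arbitrarily for comparison.
for candidate in [8, 9, 10, 11, 12]:
    print(candidate, sum_squared_errors(visits, candidate))
```

Predicting 9 or 11 visits instead of 10 raises the sum of squared errors from 120 to 126, and moving further from the mean raises it more.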

Linear Regression is a continuation of Correlation and ANOVA. When working with Correlation we work with two variables, the X & Y discussed in this article, and points are plotted against X & Y on a graph. We then explore the relationship between these plotted points. We can also say that the value of one variable is a function of another variable. It can be shown as:

y = f(x) { the value of y is a function of x }

The value of the dependent variable y depends on the value of the independent variable x.
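
For the simplest case, where f is a straight line y = intercept + slope * x, the two coefficients can be estimated by ordinary least squares. The following is a minimal sketch, not the article's own method: the function name `fit_line` is hypothetical, and the x and y values in the usage lines are made-up numbers that lie exactly on a line so the result is easy to check by hand. They are not user data from the example above.

```python
def fit_line(x, y):
    """Ordinary least squares fit of y = intercept + slope * x.

    Returns (intercept, slope). Assumes x and y have equal length
    and that the x values are not all identical.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Slope: co-variation of x and y divided by the variation of x.
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    # The fitted line passes through the point (mean_x, mean_y).
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Made-up points on the line y = 1 + 2x (illustration only).
x = [0, 1, 2, 3]
y = [1, 3, 5, 7]
intercept, slope = fit_line(x, y)
print(intercept, slope)  # 1.0 2.0
```

When no independent variable is available, as in the visits-only example, the best constant prediction in the least-squares sense is simply the mean of y.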

It is hoped that this article sheds some light on the basic use of investigative forensics within a department or a Line of Business of an organization that may be looking at internal users' behavior in serving clients through one single resource.

Defining all the fixed and variable factors in any linear regression is the most important matter. The most difficult step is working out how to find the real ones and adjust them so they can be compared to the real situation.

It's objective and indispensable for forecasting from data.

Sometimes simple linear models are "good enough" and provide considerable lift over "naive"/intuitive mental models. I agree that non-linearity, etc. is often present, but the lift is often minimal and should only be extensively mined if "small lift" means "big returns" ($$). Sometimes this is not the case ... Linear regression is "good enough."

A key point not to be ignored here is that you have to characterize the behavior of each "variable" before you can make a meaningful comparison between the "variables" --- or for that matter, even choose an appropriate comparison technique.
