How I learned to Data Science
There are few things that I know for sure, and one of them is that I am not a Data Scientist.
Having said that, over the past year or so, I've been learning how to use Python and its many modules to shape stagnant data into something useful. After having spent this much time working on multiple projects, I feel like that might actually be the true definition of Data Science:
Using advanced tools to shape stagnant data into something useful
Reading the real definition, it turns out I'm not that far off.
Data science is a multi-disciplinary field that uses scientific methods, processes, algorithms and systems to extract knowledge and insights from structured and unstructured data.
The experts that shaped me
Now I'm going to spend a few minutes walking you through the steps I took to start diving into the world of Data Science.
I have to admit right out of the gate that there are a few people, both coworkers and distant educators, who helped me down the path. One guy I worked with, Dr. Petropoulos (I kid you not), really got me interested in the field. His knowledge of how Data Science and Machine Learning worked made me feel like I had no idea what he was talking about. But then some of the classes by Hugo Bowne-Anderson on DataCamp really pushed me over the edge to try new things.
To confirm my thoughts that I might be doing Data Science-y things, I reviewed some of my work with a few true Data Scientists at a bar in Reston, VA who were in town for an annual conference and visiting from the USF Diabetes Health Research Center. They confirmed my fears that I might be slowly becoming a Data Scientist.
Just to be clear, the only programming language I've used so far is Python. There are many other options like R, SAS, and Java, but I chose Python.
The Language that shaped my Data
Ok, so let's get started.
I started by relearning the basics of Python: opening files, setting variables, etc. Boring stuff. Then I moved into NumPy arrays, list comprehensions, and Pandas DataFrames. NumPy arrays store data in multi-dimensional structures, list comprehensions let you calculate and build lists from many elements in a single command, and Pandas DataFrames let you ingest, structure, view, and shape data into something humans can understand and use.
NumPy array:
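Here's a minimal sketch of creating a multi-dimensional NumPy array, using made-up placeholder values:

import numpy as np

# A 2-dimensional NumPy array: 2 rows by 3 columns of made-up values
myarray = np.array([[1, 2, 3],
                    [4, 5, 6]])

print(myarray.shape)   # (2, 3)
print(myarray * 2)     # math applies element-wise across the whole array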
List Comprehension Calculation:
# Scale each odd number from 1 to 99 and collect the results in a new list
myarray = [(arrayelement * 16) / 12 for arrayelement in range(1, 100) if arrayelement % 2]
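And a minimal Pandas DataFrame sketch, assuming a small made-up set of records (the groups, techniques, and counts below are purely illustrative):

import pandas as pd

# Made-up records of how often each group used a technique
df = pd.DataFrame({
    "group": ["APT1", "APT1", "APT29"],
    "technique": ["T1059", "T1566", "T1059"],
    "count": [4, 2, 7],
})

print(df.head())                           # view the structured data
print(df.groupby("group")["count"].sum())  # shape it into a quick summary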
The Data I shaped
Side Note: how to import the modules mentioned above and below.
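Assuming each package is already installed (for example with pip), the conventional imports and aliases look like this:

# Standard aliases for the modules used throughout this post
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import networkx as nx
import seaborn as sns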
However, a Pandas DataFrame on its own only lets an analyst view the data as text. I needed a way to view the data that made it easy to quickly spot trends, patterns, and groupings. Before the next section I should add that you need a working platform that supports modules that give you a canvas to work with your data. So I started learning about Jupyter, which lets you run code and view canvas-based graphs in your browser, and matplotlib.pyplot, a canvas for visualizing the data.
This lets you view trending patterns over time for groupings of data. The graph above shows shared techniques across MITRE ATT&CK hacking groups. Cool.
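I can't share the exact code behind that graph, but the general pattern looks something like this sketch, using made-up monthly counts per group:

import pandas as pd
import matplotlib.pyplot as plt

# Made-up monthly technique counts for two groups, just to show the plotting pattern
data = pd.DataFrame({
    "month": pd.to_datetime(["2019-01-01", "2019-02-01", "2019-03-01"] * 2),
    "group": ["APT1"] * 3 + ["APT29"] * 3,
    "techniques": [5, 8, 6, 3, 7, 9],
})

# Draw one line per group over time on the same canvas
for name, grp in data.groupby("group"):
    plt.plot(grp["month"], grp["techniques"], label=name)

plt.legend()
plt.title("Techniques observed per group, by month")
plt.show()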
Other data I used was unstructured, and I had no idea what the content looked like. In situations like that you may wind up with graphs like this: Not cool.
Graphs like that are really only helpful to the analyst in showing that they've figured out how to view the data, even if they still don't actually know what the data looks like.
I then wanted a way to view link analysis of MITRE ATT&CK groups to TTPs, so I learned about NetworkX and Seaborn. This actually turned out to be helpful in a different project I can't display here.
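The gist of the link analysis, sketched with NetworkX and a handful of made-up group-to-TTP edges (the real graph is much larger):

import networkx as nx
import matplotlib.pyplot as plt

# Build a graph linking threat groups to the TTPs they use (made-up edges for illustration)
G = nx.Graph()
G.add_edges_from([
    ("APT1", "T1059"),
    ("APT1", "T1566"),
    ("APT29", "T1059"),
    ("APT29", "T1078"),
])

# TTPs shared by multiple groups show up as nodes with more than one connection
nx.draw_networkx(G, with_labels=True, node_color="lightblue")
plt.show()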
The future I'm predicting from the data my code shaped
As I continued to shape the data into something useful, I started to be able to view Cyber Threat Actor attribution of active threats across sectors in the United States.
As I started moving into visualizing the data and seeing the trends, patterns, and groupings the shaping was producing, I started to be able to predict sector-based Cyber Threat Actor attacks from month to month.
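I can't share the real data here, but at its simplest the idea starts with counting attacks per sector per month and looking at the trend. A minimal sketch with made-up incident records:

import pandas as pd

# Made-up incident records: one row per observed attack
attacks = pd.DataFrame({
    "month": pd.to_datetime(["2019-01-05", "2019-01-15", "2019-02-03", "2019-02-20"]),
    "sector": ["Finance", "Energy", "Finance", "Finance"],
})

# Count attacks per sector per month
monthly = (attacks.groupby([attacks["month"].dt.to_period("M"), "sector"])
                  .size()
                  .unstack(fill_value=0))

# A rolling average gives a naive estimate of what next month might look like
print(monthly)
print(monthly.rolling(window=2, min_periods=1).mean())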
So, in closing: by using high-quality stagnant data we just had lying around, I was able to start performing advanced analytics against structured and unstructured data to help my community better understand their current sector-based threat landscape and what might be in store for them in the near future.