How I learned to Data Science
[Cover image: plot graph of sector-based cyber threat activity]

There are few things that I know for sure, and one of them is that I am not a Data Scientist.

Having said that, over the past year or so I've been learning how to use Python and many of its modules to shape stagnant data into something useful. After spending this much time on multiple projects, I feel like that might actually be the true definition of Data Science:

Using advanced tools to shape stagnant data into something useful

Reading the real definition, it turns out I'm not that far off.

Data science is a multi-disciplinary field that uses scientific methods, processes, algorithms and systems to extract knowledge and insights from structured and unstructured data.


The experts that shaped me

Now I'm going to spend a few minutes walking you through the steps I took to start diving into the world of Data Science.

I have to admit right out of the gate that a few people, both coworkers and distant educators, helped me down the path. One coworker, Dr. Petropoulos (I kid you not), really got me interested in the field. His knowledge of how Data Science and Machine Learning worked made me feel like I had no idea what he was talking about. Then some of the classes by Hugo Bowne-Anderson on DataCamp pushed me over the edge to try new things.

To confirm my suspicion that I might be doing Data Science-y things, I reviewed some of my work with a few true Data Scientists at a bar in Reston, VA. They were visiting from the USF Diabetes Health Research Center and were in town for an annual conference. They confirmed my fears that I might be slowly becoming a Data Scientist.

Just to be clear, all of the programming I've done so far has been in Python. There are many other options like R, SAS, and Java, but I chose Python.


The Language that shaped my Data

Ok, so let's get started.

I started by relearning the basics of Python: opening files, setting variables, etc. Boring stuff. Then I moved into NumPy arrays, list comprehensions, and pandas DataFrames. NumPy arrays store data in multi-dimensional arrays, and list comprehensions let you compute and build a list from another sequence in a single expression. Pandas DataFrames let you ingest, structure, view, and shape data into something humans can understand and use.

[Figure: NumPy array illustration, credited to an external source]

List Comprehension Calculation:

# Scale each even number from 1 to 99 by 16/12
myarray = [(arrayelement * 16) / 12 for arrayelement in range(1, 100) if arrayelement % 2 == 0]
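To show how the three pieces fit together, here is a minimal sketch that builds a list with a comprehension, turns it into a NumPy array, and wraps the result in a DataFrame. The values and column names are made up for illustration and are not data from my projects.

```python
import numpy as np
import pandas as pd

# Hypothetical sample values, not data from the article.
values = [n for n in range(1, 11) if n % 2 == 0]  # list comprehension: even numbers 2..10

# A NumPy array applies arithmetic to every element at once.
scaled = np.array(values) * 16 / 12

# A DataFrame gives the same numbers labeled columns that are easy to read.
df = pd.DataFrame({"value": values, "scaled": scaled})
print(df)
```

The list comprehension and the array do similar work here; the array starts to pay off once the data has more than one dimension or the math gets heavier.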


The Data I shaped

Side Note: How to import the above and below mentioned modules

[Image: screenshot of the import statements]
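The screenshot did not survive as text, so the following is an assumption about what it showed: the modules named in this article, imported under their conventional aliases.

```python
import numpy as np               # arrays and numeric operations
import pandas as pd              # DataFrames
import matplotlib.pyplot as plt  # plotting canvas
import networkx as nx            # graphs and link analysis
import seaborn as sns            # statistical plots built on Matplotlib
```

Each of these has to be installed first, for example with pip, before the imports will work.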

However, a pandas DataFrame on its own only shows an analyst the data as a table of text. I needed to view data in a way that made it easy to quickly spot trends, patterns, and groupings. Before the next section I should add that you need a working platform that lets you use modules that create a canvas for your data. So I started learning about Jupyter, which lets you run code and view canvas-based graphs in your browser, and matplotlib.pyplot, a canvas for visualizing the data.

[Figure: graph of techniques shared by MITRE ATT&CK groups]

This lets you view trends over time for groupings of data. The graph above shows techniques shared by MITRE ATT&CK hacking groups. Cool.
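A rough sketch of that kind of trend plot, assuming a DataFrame with one column per grouping and one row per month. The group names and counts are invented placeholders, not my ATT&CK data.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical monthly counts for two made-up groupings.
df = pd.DataFrame(
    {"Group A": [3, 5, 4, 7], "Group B": [1, 2, 6, 5]},
    index=["Jan", "Feb", "Mar", "Apr"],
)

fig, ax = plt.subplots()
df.plot(ax=ax, marker="o")  # draws one line per column
ax.set_xlabel("Month")
ax.set_ylabel("Observed techniques")
ax.set_title("Activity by grouping over time")
```

In a Jupyter notebook the figure renders below the cell; in a plain script you would add `fig.savefig(...)` or `plt.show()` to see it.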

Other data I used was unstructured, and I had no idea what the content looked like. In situations like that you may wind up with graphs like this. Not cool.

[Figure: an unreadable graph of unstructured data]

Graphs like these are only helpful in that they show the analyst they have found a way to view the data, even if they still don't know what the data actually looks like.
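One way to avoid that kind of graph is to look at the raw shape of the data before plotting anything. This is a small sketch with made-up records standing in for an unfamiliar data set; the column names are assumptions for illustration.

```python
import pandas as pd

# Hypothetical messy records standing in for an unfamiliar data set.
records = [
    {"sector": "Energy", "count": "12", "note": None},
    {"sector": "Finance", "count": "n/a", "note": "duplicate?"},
    {"sector": None, "count": "7", "note": None},
]
df = pd.DataFrame(records)

print(df.head())        # first rows, to see what the content looks like
print(df.dtypes)        # column types; "count" arrives as text here
print(df.isna().sum())  # missing values per column

# Convert the text column to numbers; entries that cannot be parsed become NaN.
df["count"] = pd.to_numeric(df["count"], errors="coerce")
```

A few minutes with `head`, `dtypes`, and `isna` usually tells you which columns are worth graphing at all.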

I then wanted a way to view link analysis of MITRE ATT&CK groups to TTPs, so I learned about NetworkX and Seaborn. This was actually helpful in a different project I can't display here.

[Figure: link-analysis graph]
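A minimal sketch of group-to-technique link analysis with NetworkX. The group and technique names are placeholders, not real ATT&CK mappings, and this is only one way to model the links.

```python
import networkx as nx

# Hypothetical group-to-technique links; placeholders, not real ATT&CK mappings.
links = [
    ("Group A", "Technique 1"),
    ("Group A", "Technique 2"),
    ("Group B", "Technique 2"),
    ("Group C", "Technique 2"),
    ("Group C", "Technique 3"),
]

G = nx.Graph()
G.add_edges_from(links)  # nodes are created automatically from the edge list

# A technique connected to more than one group is shared between groups.
techniques = {technique for _, technique in links}
shared = sorted(t for t in techniques if G.degree(t) > 1)
print(shared)
```

Calling `nx.draw(G, with_labels=True)` in a notebook draws the graph on a Matplotlib canvas, which is where the visual link analysis comes from.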

The future I'm predicting from the data my code shaped

As I continued to shape the data into something useful, I started to be able to view Cyber Threat Actor attribution of active threats across sectors in the United States.

[Figure: Cyber Threat Actor attribution of active threats across U.S. sectors]
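A view like that can start from a simple count of actors per sector. The sketch below assumes one row per attributed event; the sector and actor labels are invented, and this is not necessarily how my own tables were built.

```python
import pandas as pd

# Hypothetical attribution records: one row per observed event.
events = pd.DataFrame({
    "sector": ["Energy", "Energy", "Finance", "Health", "Finance"],
    "actor": ["Actor X", "Actor Y", "Actor X", "Actor Y", "Actor X"],
})

# Count how often each actor shows up in each sector.
table = pd.crosstab(events["sector"], events["actor"])
print(table)
```

Passing a table like this to `seaborn.heatmap(table, annot=True)` turns the counts into a color grid that is easier to scan than the numbers alone.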

As I moved into visualizing data and seeing the trends, patterns, and groupings the shaping was producing, I started to be able to predict sector-based Cyber Threat Actor attacks from month to month.

[Figure: predicted sector-based Cyber Threat Actor attacks by month]
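This article doesn't go into the prediction method, so the following is only a stand-in: a naive straight-line trend fitted with NumPy and extended one month ahead. The monthly counts are made up.

```python
import numpy as np

# Hypothetical monthly attack counts for one sector.
counts = np.array([4, 6, 5, 8, 9, 11])
months = np.arange(len(counts))  # 0, 1, 2, ... as the time axis

# Fit a straight line (degree-1 polynomial) through the counts.
slope, intercept = np.polyfit(months, counts, 1)

# Extend the line one step past the last observed month.
next_month = slope * len(counts) + intercept
print(round(float(next_month), 1))
```

A straight line is about the simplest forecast there is; real month-to-month prediction would need more history and some check of how far off past forecasts were.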

So in closing, by using high-quality stagnant data we just had lying around, I was able to start performing advanced analytics against structured and unstructured data to help my community better understand their current sector-based threat landscape and what might be in store for them in the near future.
