Artificial Intelligence
During 2017, we witnessed significant attacks on a monthly basis, such as WannaCry and NotPetya. We noticed an increase in ransomware, botnets and other vectors as popular forms of malware attack. In addition, cybercrime continually expands its methods of attack, utilizing APT tools (Vault 7, Vault 8, etc.), scripts and toolkits for phishing attacks, randomization and much more. These trends pushed organizations toward machine learning (ML), hoping it would provide a forceful deterrent.
One definition of the term Artificial Intelligence: "An area of computer science that deals with giving machines the ability to seem like they have human intelligence" (Webster's Dictionary).
Systems based on AI help us automate many tasks, but also let us tackle complex problems that previously only humans could solve. AI can be applied to a variety of issues: robotic planning and navigation, computer vision, natural language processing. The questions are to what extent AI can help engage the threat, and what changes made this possible.
· Overload of information - We all generate enormous amounts of information, and it is available in digital form, ready for use.
· The technology is there - Incredible resources are now available, and the cost of accessing them in the cloud keeps decreasing. Even local computing power and storage continue to grow exponentially.
· Research into algorithms - New algorithms give us the ability to apply these computing resources to the massive data sets now available.
The intelligence process deals with gathering, analyzing and presenting (graphically, i.e., link analysis, and via statistical models) a variety of statistical and narrative data. This means that in order to make available information actionable and relevant, the machinery must be able to work with Big Data's three V's:
· Volume, the attribute most associated with big data: the sheer amount of data can be enormous.
· Velocity, the measure of how fast the data is coming in.
· Variety, the number of types of data.
We can find ourselves using research areas based on logic and artificial intelligence, focusing primarily on the use of logic to model human-like planning, reasoning and problem-solving.
In general, we can divide Machine Learning algorithms into two broad categories: supervised and unsupervised.
Supervised algorithms require a labeled training dataset. Once trained, a model should be able to correctly classify or predict from any new input. Most neural network architectures can be considered supervised learning algorithms.
Unsupervised algorithms do not require labeled training datasets. They typically use inherent data properties to predict or classify data. For example, most clustering techniques, such as K-Means, are unsupervised algorithms.
One of the best books is Stuart Russell and Peter Norvig's Artificial Intelligence: A Modern Approach, which covers significant topics including intelligent agents, problem-solving by searching, adversarial search, probability theory, multi-agent systems, social AI, and the philosophy, ethics and future of AI.
After this brief explanation of why AI provides added value at the management level and to the decision-making process, here are a few algorithms that will serve you[1]:
1. Naive Bayes Classification: Naive Bayes classifiers are a family of simple probabilistic classifiers based on applying Bayes' theorem with strong (naive) independence assumptions between the features. In the Bayes equation, P(A|B) is the posterior probability, P(B|A) is the likelihood, P(A) is the class prior probability, and P(B) is the predictor prior probability.
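As a minimal sketch (not any particular library's implementation), Bayes' rule with the naive independence assumption can be applied to a tiny, made-up spam-filter dataset; the feature values and labels below are hypothetical:

```python
from collections import Counter, defaultdict

def train(samples):
    """samples: list of (feature_tuple, label). Count class priors and feature values."""
    priors = Counter(label for _, label in samples)
    counts = defaultdict(lambda: defaultdict(Counter))  # counts[label][i][value]
    for features, label in samples:
        for i, value in enumerate(features):
            counts[label][i][value] += 1
    return priors, counts

def predict(priors, counts, features):
    """Pick the label maximizing P(A) * product of P(B|A), with add-one smoothing."""
    total = sum(priors.values())
    best_label, best_score = None, -1.0
    for label, n in priors.items():
        score = n / total  # class prior P(A)
        for i, value in enumerate(features):
            score *= (counts[label][i][value] + 1) / (n + 2)  # likelihood P(B|A)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical emails: (contains_link, sender_known) -> label
samples = [(("link", "unknown"), "spam"), (("link", "unknown"), "spam"),
           (("nolink", "known"), "ham"), (("nolink", "known"), "ham"),
           (("link", "known"), "ham")]
priors, counts = train(samples)
print(predict(priors, counts, ("link", "unknown")))  # -> spam
```

Note that P(B) is omitted: it is identical for every candidate label, so it does not change which label wins.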
2. Ordinary Least Squares Regression: If you know statistics, you have probably heard of linear regression. Least squares is a method for performing linear regression. You can think of linear regression as the task of fitting a straight line through a set of points. There are multiple possible strategies for doing this, and the "ordinary least squares" strategy goes like this: draw a line, and for each of the data points measure the vertical distance between the point and the line; the fitted line is the one where the sum of these squared distances is as small as possible. "Linear" refers to the kind of model you are using to fit the data, while "least squares" refers to the type of error metric you are minimizing.
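For one explanatory variable, the least-squares line has a closed-form solution; here is a small sketch on invented, near-linear data:

```python
def ols_fit(xs, ys):
    """Ordinary least squares for one variable: minimize the sum of squared
    vertical distances between the points and the line y = a*x + b."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    a = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
         / sum((x - mean_x) ** 2 for x in xs))
    b = mean_y - a * mean_x
    return a, b

# Invented points lying roughly on y = 2x + 1
a, b = ols_fit([1, 2, 3, 4], [3.1, 4.9, 7.2, 8.8])
print(round(a, 2), round(b, 2))  # slope near 2, intercept near 1
```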
3. Logistic Regression: Logistic regression is a powerful statistical way of modeling a binomial outcome with one or more explanatory variables. It measures the relationship between the categorical dependent variable and one or more independent variables by estimating probabilities using a logistic function, which is the cumulative logistic distribution. In general, regressions can be used in real-world applications such as credit scoring, measuring the success rates of marketing campaigns, and predicting the emergence of a specific malware.
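The logistic (sigmoid) function and a tiny gradient-descent fit illustrate the idea; the single-variable data below is invented for the sketch:

```python
import math

def sigmoid(z):
    """The logistic function: maps any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(xs, ys, lr=0.1, epochs=2000):
    """Fit P(y=1 | x) = sigmoid(w*x + b) by simple gradient descent."""
    w, b = 0.0, 0.0
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            p = sigmoid(w * x + b)
            w += lr * (y - p) * x
            b += lr * (y - p)
    return w, b

# Invented binary outcome (e.g., a "malicious" flag vs. some risk score)
w, b = fit_logistic([0, 1, 2, 3, 4, 5], [0, 0, 0, 1, 1, 1])
print(sigmoid(w * 0 + b) < 0.5, sigmoid(w * 5 + b) > 0.5)  # True True
```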
4. Clustering Algorithms: Clustering is the task of grouping a set of objects such that objects in the same group (cluster) are more similar to each other than to those in other groups. Every clustering algorithm is different: centroid-based, connectivity-based, density-based, probabilistic, dimensionality-reduction-based, and neural networks / deep learning.
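As an example of a centroid-based method, here is a bare-bones K-Means sketch on invented 2-D points (not a production implementation):

```python
import random

def kmeans(points, k, iters=20, seed=0):
    """Bare-bones K-Means: assign each point to its nearest centroid,
    then move each centroid to the mean of its cluster."""
    random.seed(seed)
    centroids = random.sample(points, k)
    clusters = [[] for _ in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: (p[0] - centroids[j][0]) ** 2
                                                  + (p[1] - centroids[j][1]) ** 2)
            clusters[nearest].append(p)
        for j, cluster in enumerate(clusters):
            if cluster:  # keep the old centroid if a cluster empties out
                centroids[j] = (sum(p[0] for p in cluster) / len(cluster),
                                sum(p[1] for p in cluster) / len(cluster))
    return centroids, clusters

# Two obvious blobs of invented points
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, clusters = kmeans(points, k=2)
print(sorted(len(c) for c in clusters))  # [3, 3]
```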
5. Decision Trees: A decision tree is a decision support tool that uses a tree-like graph or model of decisions and their possible consequences, including chance-event outcomes, resource costs, and utility. From a business decision point of view, a decision tree is the minimum number of yes/no questions one has to ask to assess the probability of making a correct decision most of the time. As a method, it allows you to approach a problem in a structured and systematic way to arrive at a logical conclusion, such as identifying lateral movement in a network.
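A decision tree as a chain of yes/no questions can be sketched with nested dicts; the security-event feature names here are hypothetical, not from any real detection product:

```python
# Hand-built tree: internal nodes ask a yes/no question, leaves are verdicts.
TREE = {
    "question": "new_admin_login",          # did a new admin account log in?
    "yes": {
        "question": "off_hours",            # did it happen outside work hours?
        "yes": "suspicious: possible lateral movement",
        "no": "review manually",
    },
    "no": "benign",
}

def classify(tree, event):
    """Walk the tree, answering each yes/no question from the event's features."""
    node = tree
    while isinstance(node, dict):
        node = node["yes"] if event[node["question"]] else node["no"]
    return node

print(classify(TREE, {"new_admin_login": True, "off_hours": True}))
# -> suspicious: possible lateral movement
```

In practice the questions are not hand-built but learned from labeled data, by choosing at each node the split that best separates the classes.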
As briefly described above, there are two major types of ML classification techniques: supervised learning and unsupervised learning, which are differentiated by the data (i.e., input) they accept.
Supervised learning refers to algorithms that are provided with a set of labeled training data, in order to learn what distinguishes the labels. For example, modern image recognition algorithms, such as Google Image search, can accurately distinguish tens of thousands of objects, and advanced facial recognition algorithms exceed the performance of human beings. By learning what makes each category unique, the algorithm can be presented with new, unlabeled data and apply the correct label. Note the critical need for choosing a representative training dataset: if the training data contains only dogs and cats, but the new photo is a fish, the algorithm will have no way of knowing the proper label.
Unsupervised learning refers to algorithms provided with unlabeled training data, with the task of inferring the categories by themselves. Sometimes labeled data is very rare, the job of labeling is difficult, or we may not even know whether labels exist. For example, consider the case of network flow data. While we have enormous amounts of data to examine, attempting to label it would be extremely time-intensive, and it would be tough for a human to determine what label to assign.
Separating data into groups assumes that the relevant data is present. Determining the color of someone's skin is trivial for a sighted person, but a blind person will find that task impossible, as they lack the most critical sensor. They will have to rely on other information, such as the person's voice, in an attempt to "label" the individual correctly.
Machines are no different in this regard. We mentioned earlier the concept of a feature. This concept can be understood straightforwardly: if our data is stored in a spreadsheet where a single row represents one data point, then the features are the columns. For our email example, some features may be the sender, recipient, date and content of the email. For our network flow example, features include packet size, remote IP address, network port, packet content, or any of the hundreds of other attributes that network traffic can have. Having useful features is a critical prerequisite for successfully applying machine-learning techniques. At the same time, having too many non-informative features may degrade algorithm performance, as the overabundance of noise can hide the information that is useful.
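Turning one raw record into a row of feature columns might look like the sketch below; all field names are illustrative, not from any particular mail system:

```python
def email_features(email):
    """Map one raw email (a dict) to a flat row of feature columns."""
    return {
        "sender_domain": email["sender"].split("@")[-1],
        "num_recipients": len(email["recipients"]),
        "body_length": len(email["body"]),
        "has_attachment": bool(email.get("attachments")),
    }

row = email_features({
    "sender": "alice@example.com",
    "recipients": ["bob@example.com", "carol@example.com"],
    "body": "See the attached report.",
    "attachments": ["report.pdf"],
})
print(row["sender_domain"], row["num_recipients"])  # example.com 2
```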
The steps required to create a ML tool are varied, but typically proceed as follows:
1. Data collection. While it's possible to run and even create ML algorithms based on streaming, real-time data (e.g., trading decisions based on stock market data), the majority of techniques involve collecting data ahead of time and creating a model using stored data.
2. Data cleaning. Raw data is often unusable for ML purposes. There may be missing data, inconsistent encodings (e.g., a categorical feature may contain "North," "north," and "N," all identical in meaning), and numeric data with non-numeric characters, among many other possible problems. This step also involves the integration of multiple data sources into a single usable source. Cleaning is often a time-consuming and iterative process, as fixing one issue often uncovers another.
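A small sketch of the kind of normalization this step involves, using the direction example from the text (the mappings are illustrative):

```python
def normalize_direction(value):
    """Collapse 'North', 'north' and 'N' into one canonical spelling."""
    canonical = {"north": "North", "n": "North",
                 "south": "South", "s": "South",
                 "east": "East", "e": "East",
                 "west": "West", "w": "West"}
    return canonical.get(value.strip().lower())  # None for unknown values

def clean_numeric(value):
    """Strip non-numeric characters such as units or thousands separators."""
    digits = "".join(ch for ch in value if ch.isdigit() or ch in ".-")
    return float(digits) if digits else None

print(normalize_direction("  N "))   # North
print(clean_numeric("1,234 KB"))     # 1234.0
```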
3. Feature engineering. After all the data is ready for use, it's time to ensure that maximum information is extracted from the data itself, as described above. This process usually takes place before creating the ML algorithm.
4. Model building / model validation. This set of steps involves building the model and testing it to ensure it works correctly on unlabeled data. There are many statistical considerations when testing the model. When working with supervised ML, a major concern is whether the model is overfit to the training data (overfitting is "the production of an analysis that corresponds too closely or exactly to a particular set of data, and may therefore fail to fit additional data or predict future observations reliably"), i.e., whether the model takes into account properties that are unique to the training data. There are many statistical techniques used to minimize this risk, which are often employed during model validation.
5. Deployment/Monitoring. Implementation of an ML model is rarely a "once-and-done" event. Generally, and especially in the case of network traffic, historical observations do not necessarily match future activity. For that reason, even after deployment, models are monitored and periodically rerun through the build/validate steps to ensure top performance.
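The steps above can be sketched end-to-end on hypothetical network-flow records; every field name and threshold here is invented for the illustration:

```python
def collect(source):
    # 1. Data collection: materialize stored records.
    return list(source)

def clean(rows):
    # 2. Data cleaning: drop rows missing the protocol, normalize its case.
    return [{**r, "proto": r["proto"].upper()} for r in rows if r.get("proto")]

def engineer(rows):
    # 3. Feature engineering: derive a bytes-per-packet feature.
    return [{**r, "bytes_per_packet": r["bytes"] / max(r["packets"], 1)}
            for r in rows]

def split(rows, frac=0.8):
    # 4. Model building/validation starts with a train/test split.
    cut = int(len(rows) * frac)
    return rows[:cut], rows[cut:]

# 5. Deployment/monitoring would wrap these steps in a periodic re-run.
raw = [{"proto": "tcp", "bytes": 1200, "packets": 4},
       {"proto": None, "bytes": 80, "packets": 1},   # dropped by cleaning
       {"proto": "udp", "bytes": 300, "packets": 3}]
rows = engineer(clean(collect(raw)))
train_rows, test_rows = split(rows)
print(len(rows), rows[0]["proto"], rows[0]["bytes_per_packet"])  # 2 TCP 300.0
```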
To conclude the argument, we will look at the topic from another angle. In counterterrorism intelligence, consider the case of a Libyan citizen with an Australian passport who intends to arrive in Belgium and join a terrorist cell to carry out an attack. Ten years ago, this task required dozens of overt and covert security personnel to monitor every suspicious step at the airport.
Today, thanks to AI technology, we can reduce physical forces on the ground and process information at a rapid pace, such that finding a person matching a description transmitted by intelligence forces becomes a critical tool in the war on terrorism.
This analogy shows how we can adopt new technology to increase effectiveness and improve our actions; this is true when we fight terrorism, but also when we tackle cyber threats.
I would like to thank Dana Toren for her kind help and contribution to this article.
[1] The information is taken from Wikipedia.