Our First Machine Learning Project!

Pulkit Mathur

Published May 13, 2019

A few months ago, we started the "AI A-Team", a group of very enthusiastic people within the Oracle Group of #CapgeminiNL who wanted to upskill in Machine Learning technologies and help their clients get more value out of their existing tools.

For our first project, we chose to focus on Natural Language Processing (NLP). NLP is one of the most widely used areas of Machine Learning, and many companies already use it to derive insights from text. For example, a company launching a major product might want to analyze thousands of Twitter reviews to see whether people like the product; or an enterprise might use NLP to analyze log files across its various applications to gain previously unseen insights into its systems. Since there are plenty of resources on the internet that explain "What is Machine Learning and NLP", I will not go into that in this blog. We all use NLP products every day to make sense of what we say or type; some of the most prominent ones are voice assistants like Google Assistant and Siri, and products like Google Translate.

Use case: With a very enthusiastic group of some of our A-Team members (Leon Smiers, Kasturi Kugathas, Anil Suri), we decided to focus on the use case of a stock broker in a financial company who needs to analyze hundreds of incoming financial reports every day and decide whether or not to invest in the analysed company.

Problem: As you can imagine, the analysis involves reading the financial documents, searching for any hidden clues about a company's financial condition and analyzing a lot of data. Doing all of this by hand is, of course, error-prone and can lead to a lot of financial decisions gone wrong!

Result: Frustrated Stockbroker!

Solution: We decided to create a summarization tool using #TFIDF (term frequency-inverse document frequency), an algorithm used in NLP, to condense the 30-page(!) documents into 30 lines that convey the meaning of the document accurately and can be read more quickly and easily. The tool we created breaks the document up into sentences, analyses the sentences and determines a "score" for each sentence. The summary produced by the model is simply the top "x" highest-scoring sentences. For the stock broker, having the short summary at hand means faster turnaround times and a lower error rate (a lot less analysis!).
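To make the scoring idea concrete, here is a minimal sketch of sentence-level TF-IDF summarization using only the Python standard library. It is an illustration under assumptions, not the code from our repository: the function names (`split_sentences`, `tfidf_sentence_scores`, `summarize`), the naive sentence splitter and the choice to treat each sentence as its own "document" are all hypothetical.

```python
import math
import re
from collections import Counter


def split_sentences(text):
    """Naive splitter (assumption): break after '.', '!' or '?' plus whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def tokenize(sentence):
    """Lowercase the sentence and keep alphanumeric word tokens."""
    return re.findall(r"[a-z0-9']+", sentence.lower())


def tfidf_sentence_scores(sentences):
    """Score each sentence by the sum of its TF-IDF weights.

    Each sentence is treated as one 'document'. Term frequency is
    normalised by sentence length, so long sentences do not win
    simply because they contain more words.
    """
    tokenized = [tokenize(s) for s in sentences]
    n = len(tokenized)
    # Document frequency: in how many sentences does each term occur?
    df = Counter(term for tokens in tokenized for term in set(tokens))
    scores = []
    for tokens in tokenized:
        if not tokens:
            scores.append(0.0)
            continue
        tf = Counter(tokens)
        score = sum(
            (count / len(tokens)) * math.log(n / df[term])
            for term, count in tf.items()
        )
        scores.append(score)
    return scores


def summarize(text, top_x=3):
    """Return the top_x highest-scoring sentences in their original order."""
    sentences = split_sentences(text)
    scores = tfidf_sentence_scores(sentences)
    ranked = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    keep = sorted(ranked[:top_x])
    return [sentences[i] for i in keep]
```

Calling `summarize(report_text, top_x=30)` would then return up to 30 sentences, kept in document order so the summary still reads naturally.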

On top of the capabilities offered by the algorithm, we also used custom rules. For example, in our model, we increased the significance of a sentence if it contains the words "billions", "millions", "sell" or "buy". We chose these words because sentences containing them probably have a high impact on the stock broker's decision and should ideally be included in the top-ranked sentences and, consequently, in the summary.
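A custom rule like this can be sketched as a simple post-processing step on the sentence scores. The keyword list comes from the paragraph above; the function name `boost_scores`, the multiplier of 1.5 and the choice to boost when any one of the words appears are assumptions made for illustration.

```python
import re

# Keywords named in the article.
BOOST_WORDS = {"billions", "millions", "sell", "buy"}
# Hypothetical multiplier; the real weighting is not stated in the article.
BOOST_FACTOR = 1.5


def boost_scores(sentences, scores, boost_words=BOOST_WORDS, factor=BOOST_FACTOR):
    """Multiply a sentence's score by `factor` if it contains any boost word.

    Matching is done on whole lowercase words, so "buyer" does not
    trigger the rule for "buy".
    """
    boosted = []
    for sentence, score in zip(sentences, scores):
        words = set(re.findall(r"[a-z]+", sentence.lower()))
        boosted.append(score * factor if words & boost_words else score)
    return boosted
```

The boosted scores can then be ranked exactly like the original ones, so sentences mentioning these words move up in the summary.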

On top of this, to make the job of the stock broker even easier, we gave an automated BUY, NEUTRAL or SELL rating to each company. This is done using a branch of NLP called "Sentiment Analysis", which derives the sentiment of the analysed text. For example, in the screenshot below, you can see that our model gives the company a 15.8% BUY (Positive) rating, a 77.7% NEUTRAL (Neutral) rating and a 6.5% SELL (Negative) rating.
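One very simple way to arrive at such a percentage breakdown is a word-list (lexicon) approach, sketched below. This is not necessarily how our model computes it: the tiny `POSITIVE` and `NEGATIVE` word lists and the function `rating_breakdown` are hypothetical, and a real sentiment analyser would use a much larger lexicon or a trained model.

```python
import re

# Hypothetical, deliberately tiny word lists for illustration only.
POSITIVE = {"growth", "profit", "gain", "strong", "improved"}
NEGATIVE = {"loss", "decline", "weak", "debt", "risk"}


def rating_breakdown(text):
    """Return the share of positive, neutral and negative words as percentages.

    Positive maps to BUY, negative to SELL, everything else to NEUTRAL.
    """
    tokens = re.findall(r"[a-z]+", text.lower())
    if not tokens:
        return {"BUY": 0.0, "NEUTRAL": 100.0, "SELL": 0.0}
    pos = sum(t in POSITIVE for t in tokens)
    neg = sum(t in NEGATIVE for t in tokens)
    neu = len(tokens) - pos - neg
    counts = (("BUY", pos), ("NEUTRAL", neu), ("SELL", neg))
    return {label: round(100.0 * count / len(tokens), 1) for label, count in counts}
```

Because most words in a financial report carry no sentiment on their own, a breakdown dominated by NEUTRAL, as in our example, is what you would typically expect from this kind of scoring.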

Result: Happy Stockbroker!

This model can be applied in any situation where a lot of text needs to be analyzed to derive insights, such as application logs, communications, search data, etc. We are always looking for more real-life use cases (NLP/text/image/others) to help clients get more value out of their existing applications. Sample use cases:

Analyzing logs in a complex IT landscape to uncover hidden insights which can lead to a reduction in cost/downtime.
"Intelligent" Search: Instead of just searching for/in a document based on text, make your search intelligent. Improve your results by matching the "context", not just the text.
Ticketing support: Analyse newly generated tickets and match them against a database of existing tickets. See which tickets are the most similar to the newly raised ticket (not just text, also context) and improve turnaround times.
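As a starting point for the ticketing use case, the same TF-IDF idea can be combined with cosine similarity to find the closest existing ticket. The sketch below is a baseline under assumptions: the function names are hypothetical, and it only captures word overlap, so matching on "context" would need something richer on top (for example, word embeddings).

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(docs):
    """Build one sparse TF-IDF vector (a dict of term -> weight) per document."""
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    df = Counter(term for tokens in tokenized for term in set(tokens))
    # Smoothed IDF so terms present in every document keep a small weight.
    return [
        {t: c * (math.log((1 + n) / (1 + df[t])) + 1) for t, c in Counter(tokens).items()}
        for tokens in tokenized
    ]


def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def most_similar(new_ticket, existing_tickets):
    """Return (index, similarity) of the existing ticket closest to new_ticket.

    Expects a non-empty list of existing ticket texts.
    """
    vectors = tfidf_vectors(list(existing_tickets) + [new_ticket])
    query = vectors[-1]
    sims = [cosine(query, v) for v in vectors[:-1]]
    best = max(range(len(sims)), key=sims.__getitem__)
    return best, sims[best]
```

A support tool could show the best match (or the top few) next to each new ticket, so the engineer can reuse an earlier resolution.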

If you have something in mind which we can help you with, feel free to send a message or drop in for a coffee!

GitHub link: https://github.com/pulkit5454/FinancialAnalysis/blob/master/model

