Our First Machine Learning Project!

A few months ago, we started the "AI A-Team", a group of very enthusiastic people within the Oracle Group of #CapgeminiNL who want to upskill in Machine Learning technologies and help their clients get more value out of their existing tools.

For our first project, we chose to focus on Natural Language Processing (NLP). NLP is one of the most widely used areas of Machine Learning, and many companies already rely on it to derive insights from text. For example, a company doing a major product launch might want to analyze thousands of Twitter reviews to see whether people like the product or not, and an enterprise might use NLP to analyze log files across its applications and uncover insights about its systems that were previously hidden. Since there are plenty of resources on the internet that explain what Machine Learning and NLP are, I will not go into that in this blog. We all use NLP products every day that make sense of what we say or type. Some of the most prominent ones are voice assistants like Google Assistant and Siri, and products like Google Translate.


Use case: With a very enthusiastic group of A-Team members (Leon Smiers, Kasturi Kugathas, Anil Suri), we decided to focus on the use case of a stock broker in a financial company who needs to analyze hundreds of incoming financial reports every day and decide whether or not to invest in the analyzed company.

Problem: As you can imagine, the analysis involves reading the financial documents, searching for hidden clues about a company's financial condition and working through a lot of data. Done by hand, this is of course error-prone and can lead to a lot of financial decisions gone wrong!


Result: Frustrated Stockbroker!





Solution: We decided to create a summarization tool using #TFIDF (term frequency-inverse document frequency), a word-weighting technique used in NLP, to condense the 30-page(!) documents into 30 lines that convey the meaning of the document accurately and can be read much more quickly. The tool breaks the document up into sentences, analyzes them and determines a "score" for each sentence. The summary produced by the model is simply the top "x" highest-scoring sentences. For the stock broker, having the short summary at hand means faster turnaround times and a lower error rate (a lot less to analyze!).
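To make the scoring idea concrete, here is a minimal sketch in plain Python. It is not the code from our repository: the function names, the naive sentence splitter and the choice to score a sentence by the mean TF-IDF weight of its distinct words (treating each sentence as one "document") are all assumptions made for illustration.

```python
import math
import re
from collections import Counter


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence):
    """Lowercase word tokens; a real tool would likely also drop stop words."""
    return re.findall(r"[a-z]+", sentence.lower())


def score_sentences(sentences):
    """Score each sentence by the mean TF-IDF weight of its distinct words.

    Assumption: every sentence is treated as one 'document' for the IDF part.
    """
    tokenized = [tokenize(s) for s in sentences]
    n = len(tokenized)
    # Document frequency: in how many sentences does each word occur?
    df = Counter(word for tokens in tokenized for word in set(tokens))
    scores = []
    for tokens in tokenized:
        if not tokens:
            scores.append(0.0)
            continue
        tf = Counter(tokens)
        total = sum((count / len(tokens)) * math.log(n / df[word])
                    for word, count in tf.items())
        scores.append(total / len(tf))
    return scores


def summarize(text, top_x=3):
    """Return the top_x highest-scoring sentences, kept in document order."""
    sentences = split_sentences(text)
    scores = score_sentences(sentences)
    ranked = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    return [sentences[i] for i in sorted(ranked[:top_x])]
```

Calling summarize(report_text, top_x=30) on the text of a report would return a 30-sentence summary. The selected sentences are kept in their original order so the summary still reads naturally.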

On top of the capabilities offered by the algorithm, we also used custom rules. For example, in our model, we increased the significance of a sentence if it contains any of the words "billions", "millions", "sell" or "buy". We chose these words because sentences containing them probably have a high impact on the stock broker's decision and should ideally be among the top-ranked sentences and, consequently, in the summary.
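A custom rule like this can be sketched as a simple multiplier on the sentence score. The four words come from our model, but the boost factor of 1.5 and the function name are assumptions for illustration; the actual weighting in the repository may differ.

```python
import re

BOOST_WORDS = {"billions", "millions", "sell", "buy"}  # words named in this post
BOOST_FACTOR = 1.5  # assumed value, for illustration only


def boost_scores(sentences, scores, boost_words=BOOST_WORDS, factor=BOOST_FACTOR):
    """Multiply the score of every sentence that contains a boost word."""
    boosted = []
    for sentence, score in zip(sentences, scores):
        words = set(re.findall(r"[a-z]+", sentence.lower()))
        if words & boost_words:  # at least one boost word is present
            score *= factor
        boosted.append(score)
    return boosted
```

The match is on whole words, so "buy" is boosted but "buyers" is not. Whether related word forms should count as well is a design choice left open here.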

On top of this, to make the stock broker's job even easier, we gave each company an automated BUY, NEUTRAL or SELL rating. This is done by deriving the sentiment of the analyzed text using a branch of NLP called "Sentiment Analysis" and displaying the result. For example, in the screenshot from our tool, you can see that the model gives the company a 15.8% BUY (positive) rating, a 77.7% NEUTRAL (neutral) rating and a 6.5% SELL (negative) rating.
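As a rough illustration of how three percentages like these can come out of a text, here is a toy word-counting sketch. It is not the sentiment model we used: the two word lists are hypothetical, and a real sentiment analyzer is considerably more sophisticated.

```python
import re

# Hypothetical mini-lexicons, for illustration only.
POSITIVE = {"growth", "profit", "strong", "gain", "record"}
NEGATIVE = {"loss", "decline", "weak", "debt", "lawsuit"}


def rate(text):
    """Return BUY / NEUTRAL / SELL percentages based on word counts."""
    words = re.findall(r"[a-z]+", text.lower())
    if not words:
        return {"BUY": 0.0, "NEUTRAL": 100.0, "SELL": 0.0}
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    neu = len(words) - pos - neg
    total = len(words)
    return {
        "BUY": round(100 * pos / total, 1),
        "NEUTRAL": round(100 * neu / total, 1),
        "SELL": round(100 * neg / total, 1),
    }
```

In this sketch, positive words count toward BUY, negative words toward SELL, and everything else toward NEUTRAL.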


Result: Happy Stockbroker!





This model can be applied in any situation where a lot of text needs to be analyzed to derive insights, such as application logs, communications or search data. We are always looking for more real-life use cases (NLP, text, image or others) to help clients get more value out of their existing applications. Sample use cases:

  • Analyzing logs in a complex IT landscape to uncover hidden insights which can lead to a reduction in cost/downtime.
  • "Intelligent" Search: Instead of just searching for/in a document based on literal text, make your search intelligent. Improve your results by matching the "context", not just the text.
  • Ticketing support: Analyze newly raised tickets and match them against a database of existing tickets. See which existing tickets are the most similar to the new one (not just in text, also in context) and improve turnaround times.
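As a starting point for the ticketing use case, the sketch below finds the most similar existing ticket using TF-IDF vectors and cosine similarity. This is an illustrative assumption rather than an existing product: the function names are hypothetical, it expects a non-empty list of existing tickets, and it only matches on shared words. Matching on "context" would need a richer text representation than this.

```python
import math
import re
from collections import Counter


def tfidf_vectors(texts):
    """Build one sparse TF-IDF vector (a dict) per text."""
    tokenized = [re.findall(r"[a-z]+", t.lower()) for t in texts]
    n = len(tokenized)
    df = Counter(w for tokens in tokenized for w in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        # Smoothed IDF so words shared by all texts keep a small weight.
        vectors.append({w: c * (math.log((1 + n) / (1 + df[w])) + 1)
                        for w, c in tf.items()})
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(weight * b.get(word, 0.0) for word, weight in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def most_similar(new_ticket, existing_tickets):
    """Return (index, similarity) of the closest existing ticket."""
    vectors = tfidf_vectors(existing_tickets + [new_ticket])
    query = vectors[-1]
    sims = [cosine(query, v) for v in vectors[:-1]]
    best = max(range(len(sims)), key=sims.__getitem__)
    return best, sims[best]
```

A support tool could show the best match, or the top few matches, next to each new ticket so the agent can reuse an earlier resolution.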

If you have something in mind which we can help you with, feel free to send a message or drop in for a coffee!

GitHub link: https://github.com/pulkit5454/FinancialAnalysis/blob/master/model
