Tackling the Data Challenge

We now live in a world of data-driven insight. To make the best business decisions in this world, we need to continuously improve the quality of our data sources and our analytics capabilities. We must also improve our analytical frameworks by deploying existing tools in new ways. These improvements will enable us to solve complex business challenges, and indeed to survive, in the new data economy. 

A Brief History 

For more than 15 years, legacy tools such as SQL and Python have been used to provide data-driven insights. These were designed as general-purpose tools, applicable to a wide range of topics, and have been used to solve problems in fields as diverse as computer science and finance. Because of this breadth of application, they have been improved with multidisciplinary use in mind, and have contributed to the rapid development of analytics techniques such as machine learning (ML), natural language processing (NLP), and graph-based analytic platforms. 

As these analytics techniques developed, we were constrained by limited computational resources, namely the amount of memory and storage available. To overcome this, since 2011 we have deployed distributed systems built on batch processing, which enabled us to handle substantially larger datasets for analytics. This framework has been maintained and further developed by open-source communities, and it continues to extend our analytic capability. Thus, "Big Data" was born. 

Batch processes have been widely used for data handling and processing in the financial sector, but they are limited in their time resolution. Because of the interval between batch runs, the outputs are not continuous, and these gaps have a critical impact on time-sensitive problems such as fraudulent payments and equity trading. To minimize this impact, over the last few years we have been introducing streaming-process methods in place of batch processes. This approach enables us to process data in real time. 
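The latency difference between the two styles can be sketched in a few lines of Python. Everything below is illustrative: the payment records, field names, and fraud threshold are invented for the example.

```python
# Hypothetical payment records; field names and amounts are illustrative.
payments = [
    {"id": 1, "amount": 120.0},
    {"id": 2, "amount": 9800.0},   # exceeds the demo threshold
    {"id": 3, "amount": 45.0},
    {"id": 4, "amount": 15000.0},  # exceeds the demo threshold
]

FRAUD_THRESHOLD = 5000.0  # illustrative cutoff, not a real rule

def batch_alerts(records):
    """Batch style: alerts appear only after the whole interval's
    records have been collected and processed together."""
    return [r["id"] for r in records if r["amount"] > FRAUD_THRESHOLD]

def stream_alerts(records):
    """Streaming style: each record is evaluated as it arrives,
    so an alert can be raised before the interval closes."""
    for r in records:
        if r["amount"] > FRAUD_THRESHOLD:
            yield r["id"]

# Batch: nothing is flagged until the end of the window.
print(batch_alerts(payments))            # [2, 4]

# Streaming: payment 2 is flagged as soon as it is seen.
print(next(stream_alerts(payments)))     # 2
```

The logic is identical in both functions; only the delivery timing differs, which is exactly the time-resolution limitation described above.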

By combining the analytics techniques mentioned above with data from larger and more varied sources, we are now able to derive higher-quality results (e.g., raising alerts on fraudulent payments) in a more timely manner. To be at the forefront of the financial sector, we need to deploy these technologies and techniques in the appropriate processes and optimize the overall analytics framework. 

Winners and Losers 

Despite financial institutions' increasing need for data insight, the resources allocated to this function are still relatively limited. Banks that want to lead invest substantial resources in creating innovative analytics methodologies and solving their business challenges. 

JPMorgan, for example, has deployed distributed systems to extract insights from a wide variety of data sources in real time, using both batch (Hadoop) and streaming (Spark) processes. The firm has focused in particular on merging legacy resources with newly developed technologies, and has successfully handled both structured and unstructured datasets, incorporating emails and social media posts into its analyses. One of its prime successes is the surveillance of internal compliance-policy violations: the surveillance system generates alerts and prevents violations, which has substantially reduced potential legal fees. 
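A generic rule-based message scan gives a feel for where such surveillance starts. This is a minimal sketch only; the policy phrases and messages are made up, and nothing here describes JPMorgan's actual system.

```python
# Hypothetical policy phrases; a production system would use trained
# NLP models rather than a fixed keyword set.
POLICY_KEYWORDS = {"guaranteed return", "off the record", "delete this email"}

def scan_message(text):
    """Return the policy phrases found in one message."""
    lowered = text.lower()
    return sorted(k for k in POLICY_KEYWORDS if k in lowered)

messages = [
    "Please review the attached quarterly report.",
    "This product has a guaranteed return, but keep it off the record.",
]

# Collect (message index, matched phrases) pairs for messages with hits.
alerts = [(i, hits) for i, m in enumerate(messages) if (hits := scan_message(m))]
print(alerts)  # [(1, ['guaranteed return', 'off the record'])]
```

Run in streaming mode, a scan like this flags a message the moment it is sent rather than in a nightly review, which is what makes prevention, not just detection, possible.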

This success demonstrates how JPMorgan deploys batch and streaming processes in its analytics framework. More importantly, it indicates that JPMorgan can run these processes on internal, "home grown" systems without relying on vendors. This enables JPMorgan to deploy the same processes seamlessly in other scenarios, such as risk analysis over multidisciplinary data sources for portfolio management operations, or improving customer service by analyzing customer behavior and predicting potential issues. 

These benefits and their future potential will vary from bank to bank. Because of the installation cost, this technology may not be worthwhile for some banks. It also requires transforming existing databases and modifying legacy software, which may prove too onerous for some. For such institutions, these considerations could outweigh the advantages described above. 

For example, in 2015 a regional bank announced its preference to preserve its legacy databases and software. The bank recognized the potential of distributed systems, but decided to postpone adoption because of the implementation costs involved. At the same time, JPMorgan was actively improving its distributed systems, to great success. 

Introducing well-tested technologies at an optimal cost is one possible strategy, to be pursued in light of a bank's business challenges and available resources. However, to truly innovate in these areas, and to maintain industry leadership, organizations need to derive the best possible solutions for their businesses using larger amounts of a wider variety of data. They can shorten the time-to-market for these new solutions by building on the work of open-source communities. 

The New Terrain 

For wealth management firms, there are obvious opportunities to strengthen their portfolio management operations using these approaches. Examples include:

  • Segmentation of investors, companies, and clients and prediction of their behavior
  • Measurement and analysis of risk tolerance and product suitability
  • Sales prospect discovery and determination of effective sales scenarios 

To capture these opportunities, the first step is to conduct text-data analysis using batch and streaming processes. Text data from both the public domain (e.g., news feeds, blogs, social media posts) and the private domain (e.g., proprietary firm information) records the activities of investors, companies, and clients. To follow a client's moment-by-moment activities, it is crucial to gather information from a wide range of text sources by deploying ML and NLP over streaming processes. Batch processing of text data, on the other hand, is sufficient for non-time-sensitive scenarios such as weekly reports and journal publications. Feeding the collected information into business analyses, we can build better datasets. 
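One streaming step of such a pipeline can be sketched as follows. This is a simplification under stated assumptions: a keyword extractor stands in for a full ML/NLP model, and the tracked terms and sample posts are invented.

```python
import re
from collections import Counter

# Terms we track as proxies for client/company activity (illustrative).
CLIENT_TERMS = {"acquisition", "lawsuit", "expansion", "layoffs"}

def extract_signals(post):
    """Tokenize one incoming text item and keep only tracked terms.
    A real pipeline would run NLP models here instead."""
    tokens = re.findall(r"[a-z]+", post.lower())
    return [t for t in tokens if t in CLIENT_TERMS]

def consume_stream(posts):
    """Fold each item into a running dataset as it arrives,
    rather than waiting for a nightly batch job."""
    dataset = Counter()
    for post in posts:
        dataset.update(extract_signals(post))
    return dataset

stream = [
    "Rumors of an acquisition lift shares.",
    "The lawsuit over the acquisition continues.",
]
print(consume_stream(stream))  # Counter({'acquisition': 2, 'lawsuit': 1})
```

The same `extract_signals` function could run inside a nightly batch over archived documents; the choice between batch and streaming is about delivery timing, not the analysis itself.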

These datasets will enable better classification of targets and better predictions of their behavior. For example, improved datasets will allow us to estimate a target's risk tolerance more precisely. Information extracted from text data can describe what their current risk is; since that risk varies with circumstances, its description supports other analysis methods, such as time-series analysis. Once we have a well-articulated, precisely estimated risk tolerance, we can use it to develop better products for clients and to maintain better relationships with them. 
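One way text signals could feed such an estimate is as an adjustment to a questionnaire-based baseline. The weights, feature names, and scale below are assumptions made up for illustration, not a methodology from the article.

```python
def risk_tolerance(baseline, negative_mentions, positive_mentions):
    """Nudge a 0-1 baseline risk-tolerance score by recent text-derived
    sentiment counts, clamped to [0, 1]. The 0.05 weight per mention
    is an arbitrary illustrative choice; result rounded for display."""
    adjustment = 0.05 * (positive_mentions - negative_mentions)
    return round(max(0.0, min(1.0, baseline + adjustment)), 2)

# A client with a moderate baseline whose recent text signals are negative:
print(risk_tolerance(0.6, negative_mentions=4, positive_mentions=1))  # 0.45
```

Because the text-derived counts update continuously under a streaming process, the estimate can track circumstance-driven changes that a static questionnaire would miss.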

There are also opportunities to identify relationships among investors and companies using these datasets. Text data explicitly describes a firm's or investor's current business partners and their activities, as well as those partners' other personal and professional connections. By gathering this connection data, we can explore the connections between a firm and its target investors or companies. Importantly, because this information arrives via streaming processes, we can also track how connections change over time and use that temporal signal to drive better predictions. 
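A minimal sketch of such a temporal connection graph, built from streamed "A mentioned B" events: the entity names, timestamps, and recency window are all invented for illustration.

```python
from collections import defaultdict

# (a, b) -> list of times the connection was observed in the stream.
edges = defaultdict(list)

def observe(a, b, t):
    """Record that a connection between a and b was seen at time t.
    The key is sorted so the edge is undirected."""
    edges[tuple(sorted((a, b)))].append(t)

def connection_strength(a, b, now, window=3):
    """Count only recent observations, so stale ties fade over time."""
    key = tuple(sorted((a, b)))
    return sum(1 for t in edges[key] if now - t <= window)

# Events arriving from a stream (hypothetical entities and times).
observe("InvestorX", "FirmA", t=1)
observe("FirmA", "FirmB", t=2)
observe("InvestorX", "FirmA", t=5)

# At time 5 with a window of 3, the t=1 sighting has aged out:
print(connection_strength("InvestorX", "FirmA", now=5))  # 1
```

A batch rebuild of the graph would show only which edges exist; keeping the timestamps from the stream is what lets the recency window distinguish active relationships from dormant ones.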

By taking advantage of integration, distributed systems, and varied analytics, wealth management firms have an opportunity to significantly strengthen their portfolio management business lines. This dynamic will lead to innovative solutions for the business challenges on the horizon. 






More articles by Tomoya Wada

  • On the Return of Old Problems in AI Agents

    There is a strong analogy between today’s agent frameworks and earlier generations of ETL and workflow tooling. Both…
