Big Data in a Small Page - Cognitive Computing Simplified

From Coseer Blog: https://coseer.com/s/b9

Through various media - sight, sound, touch, etc. - people accept or disseminate information in multiple formats: written and spoken language, facial expressions, intonations, physical language like hugs, and so on. Each of them is highly nuanced. For example, it is the slightest twitch of an eye that differentiates surprise from sarcasm. This is why it takes a computing system with roughly 100 billion nodes, i.e. our brain, for two people to communicate successfully. This kind of processing power is not yet available outside nature.

So we engineers do what we do best - pick a niche and assume that nothing else matters. Even if we pick the simplest of these formats, language, we still have to contend with insane complexity. Language is the simplest because it is consensus-based: two people cannot exchange information unless they agree on a language. Hence language has been codified. This is very useful for engineers who are trying to use structured, quantitative computing science to emulate human behavior.

The trouble is that this consensus-based codification is highly local - geographically, temporally, professionally, demographically, or along any other axis you cut it. For example, a sentence like "Today is a good day." means completely different things to people living in San Francisco than to people living in Byrd Station, Antarctica. For people living in Washington DC it means different things in August than in February. On the same day in Washington DC - say, the day of the Supreme Court's decision on Obamacare - this sentence means different things to Republicans than to Democrats.

To the ~80% of the earth's population that does not understand English, the sentence means nothing at all - but let us engineers do the niche magic again and ignore them for a while.

Ergo, to design a cognitive computing solution that can be applied to more than one data point, we have to allow for lax codification. The same word can take multiple parts of speech. The same meaning can be conveyed by different word orders - try "A good day, today is." The same fact can be expressed using different words. For example, "launch", "unveil", "release", "unlock", "announce", "beta test" and "introduce" may all refer to the same event in some contexts. The same fact can also be colored with different sentiments, or related to multiple other facts in multiple ways, based on the opinions of different authors.
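One piece of this lax codification can be sketched very simply: many surface forms collapsing to one canonical event. The vocabulary and the `canonical_event` helper below are illustrative assumptions, not Coseer's actual lexicon or API.

```python
# A minimal sketch of lax codification: many surface forms of the same
# event map to one canonical label. Hypothetical vocabulary for illustration.
EVENT_SYNONYMS = {
    "product_launch": {"launch", "unveil", "release", "unlock",
                       "announce", "beta test", "introduce"},
}

def canonical_event(word):
    """Return the canonical event label for a surface form, if any."""
    w = word.lower()
    for label, forms in EVENT_SYNONYMS.items():
        if w in forms:
            return label
    return None

print(canonical_event("Unveil"))   # product_launch
print(canonical_event("cancel"))   # None
```

In a real system the mapping would of course be context-dependent - "release" means something else entirely in a legal document - which is exactly why a single static dictionary is not enough.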

Many current NLP solutions try to make a good guess as soon as they come across these tokens. For example, Stanford's NLP toolkit assigns parts of speech using complex sentence-parsing models built over statistical dictionaries. Another approach, like Coseer's, is to treat every possibility as a hypothesis. As these hypotheses are processed against each other in increasingly constrained steps, most permutations simply perish. For example, in the sentence "A report is due." we can make two hypotheses about "report" - verb or noun. When we inter-relate these with hypotheses about the other words in the sentence, the verb hypothesis is voided. Either way, even the simplest decisions require immense computing power.
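The hypothesize-then-prune idea for "A report is due." can be sketched in a few lines. The toy lexicon and the single grammar constraint below are assumptions chosen for illustration; a real system would carry far richer hypotheses and constraints.

```python
from itertools import product

# Every plausible part of speech for each word starts as a hypothesis.
# Toy lexicon for illustration, not any real tagger's model.
HYPOTHESES = {
    "A": ["DET"],
    "report": ["NOUN", "VERB"],
    "is": ["VERB"],
    "due": ["ADJ"],
}

def plausible(tags):
    """Toy grammar constraint: a determiner cannot be followed by a verb."""
    for prev, cur in zip(tags, tags[1:]):
        if prev == "DET" and cur == "VERB":
            return False
    return True

words = ["A", "report", "is", "due"]
survivors = [tags for tags in product(*(HYPOTHESES[w] for w in words))
             if plausible(tags)]
print(survivors)  # [('DET', 'NOUN', 'VERB', 'ADJ')]
```

The verb reading of "report" dies because it cannot follow the determiner, leaving only the noun reading - but note that even this four-word sentence generates a cross-product of hypotheses, which is where the combinatorial cost comes from.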

Let's make this real. At 9:30 pm Pacific Time on March 6, 2015, we ran a basic Finder module on www.nytimes.com. This module focuses only on the spatial position of text elements and creates hypotheses about the basic facts encountered in the text. The output is then used by multiple, more sophisticated algorithms to solve real problems. When we printed the JSON for the NY Times, the basic-mode output for a single page ran to 1.7 million lines. It is available here for the brave among you (~4,000 pages, ~9 MB). For our Finance and eCommerce solutions, our servers process 1-1.5 million documents every day. In other words, for the simplest of our products we need to process trillions of data points.
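The arithmetic behind "trillions" is worth making explicit. Taking the figures above as assumptions - 1.7 million output lines for one page, and the lower bound of 1 million documents per day:

```python
# Back-of-envelope scale check using the figures quoted above.
lines_per_page = 1_700_000    # JSON lines produced for one NY Times page
docs_per_day = 1_000_000      # lower bound of the 1-1.5M documents quoted

datapoints_per_day = lines_per_page * docs_per_day
print(f"{datapoints_per_day:.1e} data points per day")  # 1.7e+12
```

Even at the conservative end, that is on the order of 10^12 data points per day, which is why commodity Big Data infrastructure matters here.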

With recent developments in Big Data technology it has become possible to process this kind of information on commodity hardware, so we can now attempt to emulate our brains and get computers to do things that until now have been exclusively human. Managing this ambiguity - exploding data permutations within latencies sensible for real-time applications - is what cognitive computing is all about. To learn more, follow us on Facebook, LinkedIn, Twitter, or Google Plus.



Thanks for your comment, Narsimhan. You make a great point. There is already a lot of work going on in the way you suggest - think of research in Speech Recognition, Augmented Reality, Clinical Genomics, etc., which uses Artificial Intelligence as a tool to assist humans in solving problems. Real-world examples are Siri, driverless cars (or at least cars with advanced, predictive safety features) and new molecules. There are also numerous startups, and companies like Pandora, that learn about your tastes. We are trying to solve a slightly different problem. In the post I talk about the NY Times, but what if for some reason you needed to process thousands of such pages, like a good investment management professional would; or even millions, like someone trying to predict the success of their promotional deals? That is impossible as of today. Today we drive blind or have to be content with samples. If we can emulate human-like understanding of that page, we can scale up to any level and provide significantly more nuanced answers. Instead of hours poring over redundant data, these answers can now be available in seconds. We can also keep tracking the information sources for changes, so that users can be very responsive. In some ways we are still only assisting humans, but our focus is on professionals in enterprises who hardly get to spend time on what they are best at - creativity, innovation and human judgment.


Hi Praful, I confess I did not understand everything you have written above, but the message I tell myself is this: In many contexts, doing stuff as well as nature does, or as we naturally do, is gonna take a long, long time, if at all. So, to folks like you who come to the table with enormous intellectual horsepower, here is a poser: Rather than try to design systems that try to substitute or replace what nature does, will it be a better idea to design systems that work along with nature? For instance, in the example provided by you for arriving at hypotheses based on text positions in the NY Times page, rather than trying to arrive at hypotheses (which the brain is probably far superior in doing, and in fact different brains could do it differently!), would it be better to use the computing power to only highlight words and texts in some ways so that it is easier for the brain to synthesise it and form hypotheses of their own? I guess I could have said the earlier stuff in an easier way: Why not let machines do what they can do best, so that humans can focus on what they can do best? Just imagine how much more value the brilliance of guys like you could add if you could realign your thinking this wee bit. All the best!

