How to Reduce AI Hallucinations, Enhance Security, and Cut GPU Cost by Up to 90% in Your Enterprise AI Project
In an earlier post I discussed the trend toward Small Language Models (SLMs) as an alternative to Large Language Models (LLMs). Unfortunately, whether you use an LLM or an SLM in your enterprise AI project, you are likely to encounter issues that contribute to the much-publicized statistic that 90% of enterprise AI projects do not meet their objectives.
This post discusses a new software technology co-developed by IBM and NVIDIA called Content Aware Storage (CAS), which addresses many of the technical and financial challenges affecting enterprise AI project success.
How Does Content Aware Storage Work?
Content Aware Storage solutions from IBM provide near-real-time access to enterprise data by monitoring data sources for changes as they happen; the monitoring is done at the data source, not against a copy. Changes are processed as a series of GPU micro-transactions through NVIDIA's NeMo Retriever extraction workflow, which includes four steps: Extract, Prepare, Chunk, and Vectorize. Once the data is vectorized, it is available for semantic search within CAS using software capabilities developed by IBM Research, either with or without a backend SLM or LLM.
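The four steps above can be sketched in miniature. This is an illustrative toy, not the NeMo Retriever API: the hash-based "embedding" merely stands in for a real GPU embedding model, and the function names are my own.

```python
# Toy sketch of the Extract -> Prepare -> Chunk -> Vectorize pipeline.
# The vectorize() step is a hash-based stand-in for a real embedding model.
import hashlib

def extract(raw_bytes: bytes) -> str:
    """Extract: pull plain text out of the raw document bytes."""
    return raw_bytes.decode("utf-8", errors="ignore")

def prepare(text: str) -> str:
    """Prepare: normalize whitespace before chunking."""
    return " ".join(text.split())

def chunk(text: str, size: int = 40) -> list[str]:
    """Chunk: split prepared text into fixed-size pieces."""
    return [text[i:i + size] for i in range(0, len(text), size)]

def vectorize(piece: str, dim: int = 8) -> list[float]:
    """Vectorize: toy embedding -- hash bytes scaled into [0, 1)."""
    digest = hashlib.sha256(piece.encode()).digest()
    return [b / 256 for b in digest[:dim]]

doc = b"Content Aware Storage feeds changed data into an AI pipeline in near real time."
vectors = [vectorize(p) for p in chunk(prepare(extract(doc)))]
```

In the real workflow each changed file flows through these stages as a small GPU micro-transaction, and the resulting vectors land in the search index rather than a Python list.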
So Why the Change, and What Are the Benefits?
The real-time data feeds into AI workflows provided by CAS address many of the issues affecting enterprise AI project success:
Increases the accuracy of AI recommendations – AI solutions today are trained to provide answers, and when they lack access to applicable data, they can make answers up. The AI industry calls such a made-up answer a "hallucination," an issue you may have read about. The problems resulting from hallucinations can be severe, and there is frequently no easy way to tell whether an answer from an AI solution is underpinned by good data. This issue is greatly mitigated when the AI tooling has real-time access to enterprise data.
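The principle behind this benefit can be shown in a few lines: answer only from retrieved enterprise data, and admit ignorance rather than guess. The knowledge store and matching logic below are illustrative stand-ins, not CAS internals.

```python
# Sketch of grounded answering: respond only from known enterprise data,
# and return an explicit "unknown" instead of a fabricated answer.
knowledge = {
    "refund policy": "Refunds are issued within 14 days of purchase.",
}

def answer(question: str) -> str:
    """Return a grounded answer, or say so when no data matches."""
    for topic, fact in knowledge.items():
        if topic in question.lower():
            return fact
    return "I don't have data on that."  # better than a made-up answer

grounded = answer("What is the refund policy?")
unknown = answer("What is next year's revenue?")
```

A retrieval-backed model follows the same contract: the retrieved chunks bound what the model is allowed to assert.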
Significantly enhances the security of AI systems – current AI solutions typically rely on copies of data. Maintaining access controls and security permissions on those copies usually requires manual work, and many of the much-publicized cloud data breaches trace back to copies of sensitive data left without appropriate security controls. A CAS-enabled workflow picks up both the changed data and the access rights assigned to it, so you have only one place to secure your data, which makes ensuring data security significantly easier.
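One way to picture permission propagation: each indexed chunk carries the access-control list captured from the source, and every search filters on it. The index structure and field names here are my own illustration, not the CAS implementation.

```python
# Minimal sketch of permission-aware retrieval: each chunk keeps the ACL
# captured from the source system, and search enforces it at query time.
index = [
    {"text": "Q3 revenue summary", "acl": {"finance"}},
    {"text": "Public product FAQ", "acl": {"finance", "support", "public"}},
]

def search(query_terms: set[str], user_groups: set[str]) -> list[str]:
    """Return chunks the user may read that match any query term."""
    return [
        entry["text"]
        for entry in index
        if entry["acl"] & user_groups                       # enforce source permissions
        and query_terms & set(entry["text"].lower().split())  # toy keyword match
    ]

# A support user cannot see the finance-only document:
results = search({"revenue", "faq"}, {"support"})
```

Because the ACL travels with the data, revoking a group at the source removes access in the AI workflow too; there is no second copy of the permissions to keep in sync.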
Significantly improves the TCO of AI solutions – without CAS, vector databases are updated as part of a periodic Retrieval-Augmented Generation (RAG) process, and these updates are typically done by re-ingesting the entire source database. With CAS, small updates are processed in near real time throughout the day. Because the GPUs process only changed data and stay consistently utilized, the GPU cost inside an AI solution can be greatly reduced – in some cases by over 90%. Since GPUs are the majority of the expense for many AI solutions, CAS can greatly improve the TCO of an AI initiative.
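The arithmetic behind that saving is simple to sketch. The figures below are illustrative assumptions of mine, not IBM or NVIDIA benchmarks: a 10 TB corpus of which roughly 1% changes per day.

```python
# Back-of-envelope GPU-hours comparison: periodic full re-ingestion vs
# incremental updates of only the changed data. All numbers are assumptions.
corpus_gb = 10_000           # total corpus size (assumed)
daily_change_gb = 100        # ~1% of the corpus changes per day (assumed)
gpu_hours_per_gb = 0.05      # ingestion cost per GB (assumed)

full_daily = corpus_gb * gpu_hours_per_gb               # re-ingest everything
incremental_daily = daily_change_gb * gpu_hours_per_gb  # changed data only
savings = 1 - incremental_daily / full_daily            # fraction of GPU work avoided
```

Under these assumptions the incremental approach does 1% of the full re-ingestion work, i.e. a 99% reduction in ingestion GPU-hours; the actual saving depends on how much of your corpus changes each day.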
Simpler sizing to accelerate your AI project – with CAS, sizing is based on the average daily data ingested rather than more complex GPU/CPU metrics. This makes solution design easier and pricing simpler, and it can accelerate AI projects by enabling accurate sizing without the need for a pilot.
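A sizing model driven by daily ingest can be expressed in one function. The per-node throughput and headroom factor below are assumptions for illustration, not CAS sizing guidance.

```python
# Sketch of ingest-based sizing: nodes needed for the average daily volume.
# Throughput and headroom values are illustrative assumptions.
import math

def size_pipeline(avg_daily_gb: float,
                  gb_per_node_per_day: float = 500.0,
                  headroom: float = 1.25) -> int:
    """Nodes needed to process the average daily ingest with headroom."""
    return math.ceil(avg_daily_gb * headroom / gb_per_node_per_day)

nodes = size_pipeline(2_000)   # e.g. 2 TB/day of changed data
```

Contrast this with GPU/CPU-metric sizing, which typically requires profiling a representative workload in a pilot before the node count can be estimated.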
How to Learn More
If you’d like to learn more about CAS, watch this YouTube recording of the joint announcement by Jacob Liberman (NVIDIA Director of Product Management) and Joe Dain (IBM Storage CTO Office – Senior Technical Staff Member and Master Inventor) at this month’s IBM TechXchange conference:
Enable Agentic AI at Scale in the Enterprise with NVIDIA and IBM