Cognos Analytics: Data Sets

What is a Data Set?

A Data Set is a tabular result set stored in the Content Store. Data Sets can be created from a Framework Manager package or a Data Module. They are saved as parquet files, a compressed binary columnar format; because of the compression, the parquet file is often smaller than the raw input data. Although these files are stored in the Content Store, once the query service (DQM) requests them they are stored locally on the server so that query planning can leverage caching opportunities. With Cognos Analytics 11.1, DQM communicates with a new Spark-based service that runs in its own JVM to process these parquet files. A Data Set can be used to create a Report, Dashboard, or Exploration.


How should you use a Data Set?


There are a number of good reasons to leverage a Data Set, and some instances where it may not be an ideal solution. First, since the Data Set is stored in the Content Store, you need to be conscious of its impact on database sizing. To assist with that, there are governor settings that limit the size of each file and the total size of all files per user.

When performance is critical, or the source database is large or complex enough that queries against it are not as fast as desired, consider a Data Set. When creating the Data Set, be aware of the number of rows, the number of columns, and specifically which columns are actually used. Even though up to 2,000 columns can be defined in a Data Set, including columns that are not used is inefficient. A Data Set includes aggregations to improve overall performance, but including unused aggregations can have a negative impact. As a general rule, tens of millions of rows are workable, whereas hundreds of millions may not provide the performance you need; consider aggregate tables in your database for those large data volumes.

When creating the Data Set, sort on frequently used columns: because of the columnar organization of the data in the parquet file and how it is accessed, sorting improves performance. Also, as with any data source design, including frequently computed expressions in the data is more efficient than computing them at runtime.

As mentioned previously, the parquet files are retrieved from the Content Store, but at query execution time they are stored on the file system in the ../data location; placing that location on fast disk also improves performance. The parquet files are stored on disk instead of being passed directly to the query service to allow reuse, and files that have not been used for a period of time are automatically purged to conserve disk space. These files are encrypted by default, but that can be changed in the system configuration.
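The benefit of sorting frequently filtered columns can be sketched in plain Python. Parquet readers keep min/max statistics per row group and can skip groups whose range excludes a filter value; the simulation below (illustrative only, not the product's implementation) shows that sorted data yields narrow, disjoint ranges, so most groups are skipped:

```python
import random

# 100,000 values split into "row groups" of 10,000, mimicking how a
# parquet file is chunked. Each group's min/max acts as its statistics.
random.seed(42)
values = [random.randint(1, 1_000_000) for _ in range(100_000)]
GROUP = 10_000

def groups_scanned(data, target):
    """Count how many row groups a filter `col = target` must read."""
    scanned = 0
    for i in range(0, len(data), GROUP):
        chunk = data[i:i + GROUP]
        # Statistics check: only read the group if target can be inside it.
        if min(chunk) <= target <= max(chunk):
            scanned += 1
    return scanned

unsorted_cost = groups_scanned(values, 500_000)
sorted_cost = groups_scanned(sorted(values), 500_000)

# Unsorted groups each span almost the whole value range, so nearly every
# group must be read; sorted groups have disjoint ranges, so almost all
# are skipped.
print(unsorted_cost, sorted_cost)
```

This is the same mechanism behind the advice above: sorting on the columns users filter on lets the compute service touch far less of the file.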
The parquet files are created by Cognos and include Cognos-specific metadata; therefore, Cognos cannot consume parquet files created externally. To aid in creating the Data Set, pressing Ctrl+Alt+M (Ctrl+Opt+M on Mac) enables the report authoring toolbar.


Logging to assist with diagnosing Data Set queries is set via the JDBC event in the xqe.diagnosticlogging.xml file. Also, in the configuration under Diagnostic logging, you can enable logging of the Flint service, which is the compute service that processes the parquet files. This generates files (flint-console.log and dataset-service.log) in the ../log directory.
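As a sketch only — the element and attribute names below are illustrative assumptions, not the product's documented syntax — enabling the JDBC event amounts to editing the xqe.diagnosticlogging.xml template that ships with the install so the relevant event entry is turned on:

```xml
<!-- Hypothetical fragment: edit the commented xqe.diagnosticlogging.xml
     template shipped with the product rather than writing one from
     scratch; names shown here are assumptions for illustration. -->
<facility name="JDBC">
  <events enabled="true"/>
</facility>
```

Consult the comments inside the shipped file for the exact elements and the list of available events.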


Additionally, you can enable the Spark console in flint-app.properties.


What’s different between 11.0 and 11.1 Data Sets?

In Cognos Analytics 11.0.x, DQM processes the parquet files, whereas in Cognos Analytics 11.1 a new compute service offloads the processing. The new service uses Spark, which allows for greater parallelism; benchmarking has shown significant improvements both in the file sizes that are possible and in performance. With 11.1 the parquet file format has changed. All new Data Sets use the new format, and a Data Set refresh updates existing ones to it. There is also a utility, <install>/bin64/parquetUpgrade, that scans the Content Store and updates the files. When the compute service receives a request for a file that is not in the new format, the data is processed as-is and you may not realize the improvements.

Documentation on Data Sets can be found here.


Data sets behave very differently from data modules when it comes to aggregations. I tried a simple running total on a field and got different results.


Very nice article. I am working on a report where we have approximately 12 million records and around 60 columns. In FM we join the fact table with three dimension tables and create a single query subject, which we use to create the data set. But it seems Cognos is not handling this properly; the server times out. Has anyone created a data set with such a high volume of data?


Hi I think that this feature could be useful to build pre-aggregates used by dashboards and data modules. My concern is about the impacts on the content store. Reading this link https://www.ibm.com/support/knowledgecenter/SSEP7J_11.0.0/com.ibm.swg.ba.cognos.inst_cr_winux.doc/t_install_global_external_repository.html I seem to understand that it is possible to define an object store to store datasets on the local disk bypassing the content store. Have you ever tried this feature and verified that the content store isn't actually used?


The command-line option might be used by an admin who wants to force existing Data Sets in the old format to be converted; otherwise, as a report or dashboard runs, if the old format is detected a conversion occurs automatically. You can improve data skipping during parquet access by sorting the input data (i.e. the sort option in Report Studio) on the primary columns you are likely to filter on. In general, do not attempt to stuff a large number (several hundred) of columns per row into a parquet file; dumping in columns that will never be used slows uploads, wastes space, etc. Per the Kimball methodology, you do better to define precomputed column values rather than repeating the expressions used to drive filters and calculations in each query. Similarly, filtering on integer types is superior to searching on strings (especially long strings).
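The last two points in this reply — precompute expressions instead of repeating them per query, and filter on integers rather than strings — can be sketched in plain Python (the data and column names are illustrative, not Cognos APIs):

```python
from datetime import date

# Illustrative source rows as they might arrive from a query subject.
raw_rows = [
    {"order_id": 1, "order_date": "2023-01-15", "region": "EMEA"},
    {"order_id": 2, "order_date": "2023-02-03", "region": "APAC"},
    {"order_id": 3, "order_date": "2023-02-20", "region": "EMEA"},
]

# At load time, materialize computed columns once per row ...
for r in raw_rows:
    d = date.fromisoformat(r["order_date"])
    r["order_month"] = d.month                             # precomputed expression
    r["region_id"] = {"EMEA": 1, "APAC": 2}[r["region"]]   # integer surrogate key

# ... so each query is a cheap integer comparison instead of re-parsing
# dates or matching strings on every row, every run.
feb_emea = [r["order_id"] for r in raw_rows
            if r["order_month"] == 2 and r["region_id"] == 1]
print(feb_emea)  # [3]
```

The same trade applies when building a Data Set: pay the expression cost once when the file is created, not on every report run.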


Article by Norbert Bracke