Cognos Analytics: Data Sets

What is a Data Set?

A Data Set is a tabular result set stored in the Content Store. Data Sets can be created from a Framework Manager package or a Data Module. They are saved as parquet files, a compressed binary columnar format; because of the compression, the parquet file is often smaller than the raw input data. Although these files are stored in the Content Store, once the query service (DQM) requests them they are stored locally on the server so that query planning can leverage caching opportunities. With Cognos Analytics 11.1, DQM communicates with a new Spark-based service that runs in its own JVM to process these parquet files. A Data Set can be used to create a Report, Dashboard, or Exploration.


How should you use a Data Set?


There are a number of good reasons to leverage a Data Set, and some instances where it may not be an ideal solution. First, since the Data Set is stored in the Content Store, you need to be conscious of its impact on database sizing. To assist with that, there are governor settings that limit the size of each file and the total size of all files per user.

When performance is critical, or the source database is large or complex enough that queries against it are not as fast as desired, consider a Data Set. When creating the Data Set, be aware of the number of rows, the number of columns, and specifically which columns are actually used. Even though up to 2,000 columns can be defined in a Data Set, including columns that are not used is inefficient. A Data Set includes aggregations to improve overall performance, but including unused aggregations can have a negative impact. As a general rule, tens of millions of rows are workable, whereas hundreds of millions may not provide the performance you need; consider aggregate tables in your database for those large data volumes.

When creating the Data Set, sort on frequently used columns: because of the columnar organization of the data in the parquet file and how it is accessed, sorting improves performance. Also, as with any data source design, including frequently computed expressions in the data is more efficient than computing them at runtime.

As mentioned previously, the parquet files are retrieved from the Content Store, but at query execution time they are stored on the file system in the ../data location; placing that location on fast disk also improves performance. The parquet files are stored on disk instead of being passed directly to the query service to allow reuse, and files that have not been used for a period of time are automatically purged to conserve disk space. These files are encrypted by default, but that can be changed in the system configuration.
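The benefit of sorting frequently filtered columns can be sketched in plain Python. Parquet readers keep min/max statistics per row group and can skip groups whose range excludes a filter value; the simulation below (illustrative only, not the product's implementation) shows that sorted data yields narrow, disjoint ranges, so most groups are skipped:

```python
import random

# 100,000 values split into "row groups" of 10,000, mimicking how a
# parquet file is chunked. Each group's min/max acts as its statistics.
random.seed(42)
values = [random.randint(1, 1_000_000) for _ in range(100_000)]
GROUP = 10_000

def groups_scanned(data, target):
    """Count how many row groups a filter `col = target` must read."""
    scanned = 0
    for i in range(0, len(data), GROUP):
        chunk = data[i:i + GROUP]
        # Statistics check: only read the group if target can be inside it.
        if min(chunk) <= target <= max(chunk):
            scanned += 1
    return scanned

unsorted_cost = groups_scanned(values, 500_000)
sorted_cost = groups_scanned(sorted(values), 500_000)

# Unsorted groups each span almost the whole value range, so nearly every
# group must be read; sorted groups have disjoint ranges, so almost all
# are skipped.
print(unsorted_cost, sorted_cost)
```

This is the same mechanism behind the advice above: sorting on the columns users filter on lets the compute service touch far less of the file.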
The parquet files are created by Cognos and include Cognos-specific metadata; therefore, Cognos cannot consume parquet files created externally. To aid in creating the Data Set, pressing Ctrl+Alt+M (Ctrl+Opt+M on Mac) enables the report authoring toolbar.


Logging to assist with diagnosing Data Set queries is set via the JDBC event in the xqe.diagnosticlogging.xml file. Also, in the configuration under Diagnostic logging, you can enable logging of the Flint service, which is the compute service that processes the parquet files. This generates files (flint-console.log and dataset-service.log) in the ../log directory.
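As a sketch only — the element and attribute names below are illustrative assumptions, not the product's documented syntax — enabling the JDBC event amounts to editing the xqe.diagnosticlogging.xml template that ships with the install so the relevant event entry is turned on:

```xml
<!-- Hypothetical fragment: edit the commented xqe.diagnosticlogging.xml
     template shipped with the product rather than writing one from
     scratch; names shown here are assumptions for illustration. -->
<facility name="JDBC">
  <events enabled="true"/>
</facility>
```

Consult the comments inside the shipped file for the exact elements and the list of available events.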


Additionally, you can enable the Spark console in flint-app.properties.


What’s different between 11.0 and 11.1 Data Sets?

In Cognos Analytics 11.0.x, DQM processes the parquet files, whereas in Cognos Analytics 11.1 a new compute service offloads the processing. The new service uses Spark, which allows for greater parallelism; benchmarking has shown significant improvements both in the file sizes that are possible and in performance. With 11.1 the parquet file format has changed. All new Data Sets use the new format, and a Data Set refresh updates existing ones to it. There is also a utility, <install>/bin64/parquetUpgrade, that scans the Content Store and updates the files. When the compute service receives a request for a file that is not in the new format, the data is processed as-is and you may not realize the improvements.

Documentation on Data Sets can be found here.


Data sets behave very differently from data modules when it comes to aggregations. I tried a simple running total on a field and got different results.


Very nice article. I am working on a report where we have approximately 12 million records and around 60 columns. In FM we join the fact table with three dimension tables and create a single query subject, which we use to create the data set. But it seems Cognos is not handling this properly; the server times out. Has anyone created a data set with such a high volume of data?


Hi I think that this feature could be useful to build pre-aggregates used by dashboards and data modules. My concern is about the impacts on the content store. Reading this link https://www.ibm.com/support/knowledgecenter/SSEP7J_11.0.0/com.ibm.swg.ba.cognos.inst_cr_winux.doc/t_install_global_external_repository.html I seem to understand that it is possible to define an object store to store datasets on the local disk bypassing the content store. Have you ever tried this feature and verified that the content store isn't actually used?


The command-line option might be used by an admin who wants to force existing Data Sets in the old format to be converted; otherwise, as a report or dashboard runs, if the old format is detected a conversion occurs automatically. You can improve data skipping during parquet access by sorting the input data (i.e. the sort option in Report Studio) on the primary columns you are likely to filter on. In general, do not attempt to stuff a large number (several hundred) of columns per row into a parquet file; dumping in columns that will never be used slows uploads, wastes space, etc. Per the Kimball methodology, you do better to define precomputed column values rather than repeating the expressions used to drive filters and calculations in each query. Similarly, filtering on integer types is superior to searching on strings (especially long strings).
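The last two points in this reply — precompute expressions instead of repeating them per query, and filter on integers rather than strings — can be sketched in plain Python (the data and column names are illustrative, not Cognos APIs):

```python
from datetime import date

# Illustrative source rows as they might arrive from a query subject.
raw_rows = [
    {"order_id": 1, "order_date": "2023-01-15", "region": "EMEA"},
    {"order_id": 2, "order_date": "2023-02-03", "region": "APAC"},
    {"order_id": 3, "order_date": "2023-02-20", "region": "EMEA"},
]

# At load time, materialize computed columns once per row ...
for r in raw_rows:
    d = date.fromisoformat(r["order_date"])
    r["order_month"] = d.month                             # precomputed expression
    r["region_id"] = {"EMEA": 1, "APAC": 2}[r["region"]]   # integer surrogate key

# ... so each query is a cheap integer comparison instead of re-parsing
# dates or matching strings on every row, every run.
feb_emea = [r["order_id"] for r in raw_rows
            if r["order_month"] == 2 and r["region_id"] == 1]
print(feb_emea)  # [3]
```

The same trade applies when building a Data Set: pay the expression cost once when the file is created, not on every report run.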


Article by Norbert Bracke