A Syntax Dictionary and Test Routines for processing GS1 Application Identifier data
This article describes the result of a collaboration between the Barcode Writer in Pure PostScript and Zint barcode generation projects in the creation of a “Syntax Dictionary” describing GS1 Application Identifiers with the aim of improving data quality in the GS1 system by providing a harmonised framework that can be adopted by any project that processes Application Identifier based data such as element strings and GS1 Digital Link URIs.
Update: As of August 2022, ongoing development and maintenance of the GS1 Syntax Dictionary is now coordinated by GS1 Global Office in the form of the GS1 Barcode Syntax Resource: https://ref.gs1.org/tools/gs1-barcode-syntax-resource/
The GS1 Syntax Dictionary is a text file that is both human-readable and machine-readable, which consists of a set of entries describing each currently assigned GS1 Application Identifier (AI) and its relationship to other AIs. The contents of the dictionary are intentionally straightforward, yet they are sufficient to facilitate certain activities that are essential to all good quality barcode generation and barcode data processing software that supports GS1 symbologies, chiefly:
The GS1 Syntax Dictionary, together with a set of reference Linter procedures, can either be used directly or transliterated into third-party software. It should be straightforward for projects that adopt these resources to update to new revisions whenever they are updated in response to changes in the corresponding specifications.
The raison d'être for releasing these resources under a permissive software license (Apache 2.0) is to provide an extensible framework for accurately processing AI syntax data that can be used by free and proprietary software alike and thereby improve the quality of data and artefacts (such as barcode symbols) that enter the supply chain.
In this article we demonstrate how this is the case for barcoding applications that support the GS1 Application Identifier system and we conclude with a case study showing benefits to supply chain data quality.
Representations of GS1 AI syntax data
Within barcoding systems, AI syntax data is commonly represented in several distinct formats depending upon the context.
The following examples present the same information in a number of common formats.
1. Human-friendly rendition:
Sometimes referred to as "non-HRI text".
GTIN: 03453120000011
USE BY or EXPIRY: 210508 (8th May 2021)
BATCH/LOT: ABCD1234
SHIP TO LOC: 9501101020917
This is a presentation format in which titles are shown for each individual Application Identifier so that it is clear exactly what each element string represents.
It is the most frequently used display format when an individual must act directly on the data since it is straightforward for a human to understand the entirety of the information that is represented.
2. GS1 AI syntax string:
Sometimes referred to as a "bracketed AI element string".
(01)03453120000011(17)210508(10)ABCD1234(410)9501101020917
Or equivalently displayed as a block that is typically presented in the form of “HRI text” adjacent to barcodes:
(01) 03453120000011
(17) 210508
(10) ABCD1234
(410) 9501101020917
These are presentation formats in which the numeric representation of the Application Identifiers is used, which is still easy for a human to read but places the onus on the individual to look up the corresponding definition for each AI themselves.
The AI syntax string is the most frequently accepted input format used by good-quality barcode generation and labelling software since it is compact and does not require the user to understand the nuances of how the data is to be encoded within a barcode message.
Note: Some software requires that AIs be input using an alternative to parentheses (round brackets) since these can appear as legitimate data value characters: Square brackets are commonly substituted for this purpose. Other software uses parentheses to denote AIs but requires any parentheses that appear in data values to be entered using some character escaping mechanism.
3. GS1 barcode message data:
Sometimes referred to as an "unbracketed AI element string".
{FNC1}01034531200000111721050810ABCD1234{FNC1}4109501101020917
This is an interface format for the data that most closely resembles what is actually encoded in a barcode symbol. Some low-quality barcode software requires that the user carefully convert their AI syntax data into this machine interface format in order to create valid GS1 symbols.
Mistakes made by users manually performing this conversion, and buggy software that performs this conversion incorrectly, are the leading sources of bad quality data in the GS1 barcoding system. The GS1 Syntax Dictionary contains all of the information that is necessary to facilitate an accurate conversion as we shall see in the next section.
“{FNC1}” represents the special FNC1 function character that is either included directly in the barcode message using a unique symbol character (that has no direct ASCII representation) or is inferred using a reversible ASCII character substitution whose effect is undone at scan time. The leading FNC1, which is either directly included in the encoded barcode message or inferred through the presence of a mode indicator, denotes the use of AI syntax data. Any subsequent instances act as a data separator that must be placed at the end of any non-terminal element strings that are not defined as fixed-length.
4. Scanned GS1 barcode data:
]d201034531200000111721050810ABCD1234{GS}4109501101020917
This is an interface format representing the data that a barcode reader transmits to the host when scanning a barcode symbol containing AI syntax data.
When a barcode symbol is scanned, each instance of an FNC1 character in the barcode message is transmitted by the reader to the host as the Group Separator character (“{GS}” having ASCII value 29), except for the leading FNC1 character which is represented by the “symbology identifier” at the start of the data — always “]d2” for GS1 DataMatrix.
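The substitution described above can be sketched in Python. This is an illustrative simulation, not scanner firmware: the function name `message_to_scanned` is hypothetical, the literal text `{FNC1}` is simply this article's placeholder notation, and the default `]d2` symbology identifier assumes GS1 DataMatrix.

```python
GS = "\x1d"  # ASCII 29, the Group Separator character


def message_to_scanned(message: str, symbology_identifier: str = "]d2") -> str:
    """Simulate what a reader transmits for a message written with {FNC1} placeholders."""
    fnc1 = "{FNC1}"
    if not message.startswith(fnc1):
        raise ValueError("AI syntax data must begin with FNC1")
    body = message[len(fnc1):]
    # The leading FNC1 becomes the symbology identifier; every later FNC1 becomes GS.
    return symbology_identifier + body.replace(fnc1, GS)


scanned = message_to_scanned(
    "{FNC1}01034531200000111721050810ABCD1234{FNC1}4109501101020917"
)
print(scanned.replace(GS, "{GS}"))
```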
Aside: Symbology-specific variations
The latter two representations relate to the interface formats used by a barcode image generator to encode message data within a symbol and by a barcode scanner to retrieve the stored message data from a barcode symbol and pass it to a host. Herein a certain amount of generalisation is presented because the fine details of the encoding vary between symbologies, although the principles remain the same.
Throughout this document the examples provided relate to GS1 DataMatrix for which the data representations provided exactly match the barcode message encoded in the symbol and the data transmitted by a barcode scanner upon decoding it.
For those with a thirst for more detail, here are examples of subtle differences in the internal barcode message for different symbologies:
It is decidedly not the intention of the designers of barcode symbologies that these nuances be exposed to the casual user of barcode applications. They are artefacts of the design of each general symbology that become apparent when these are adapted for application-specific use within the GS1 system. In some cases these differences arise from the lack of provision for the AI syntax data in the original design, whilst in others they arise from optimisations specially included in the design to increase the space efficiency when encoding AI syntax data.
All processes presented in this document can be trivially adapted to support the nuances of each symbology since the overall encoding and decoding principles are common. In all cases the relevant technical specifications should be consulted to determine whether the data interface formats require specialisation to accommodate any peculiarities of the symbology.
5. GS1 Digital Link URI:
https://example.com/01/3453120000011/10/ABCD1234?17=210508&410=9501101020917
This is a relatively new representation for Application Identifier based data that encodes it in the form of a web address that facilitates linking to online information and services. When included within a barcode carrier it is always represented using the generic symbology (i.e. without FNC1 in first position) as a simple string message.
GS1 Digital Link URIs encode semantics that are not possible to represent using regular GS1 AI element strings. In particular there is a clear differentiation between the thing being represented — identified by the qualified key present in the URI's path information — and the attributes of the thing, which are represented by the query parameters.
Nevertheless, there exists a rough equivalence between a GS1 Digital Link URI and an AI element string. Care must be taken when constructing a GS1 Digital Link URI to differentiate between identification keys and regular attribute data and ensure that these are properly encoded within the path information and query parameters, respectively.
The GS1 Syntax Dictionary describes the key to key-qualifier associations permitted by the GS1 Digital Link URI grammar, and as such enables conversion between regular AI element strings and GS1 Digital Link URIs. The details of this conversion are not described further in this article, in which we will now focus on conversion between traditional AI element string representations.
Converting between representations of Application Identifier data
A requirement of any software that handles AI syntax data is to accurately convert the data between multiple representations. This may be necessary for presentation purposes, encoding or decoding of AI syntax data as a barcode message, or some other purpose.
The following diagram shows the typical format conversions required within a barcode system when processing AI syntax data:
The following descriptions explain how to use the GS1 Syntax Dictionary to perform the numbered conversions shown in the preceding diagram:
➀ To convert a scanned GS1 barcode to a GS1 AI syntax string
]d201034531200000111721050810ABCD1234{GS}4109501101020917
|
V
(01)03453120000011(17)210508(10)ABCD1234(410)9501101020917
Notice that within scanned barcode data any FNC1 separator characters (or implied FNC1 characters) that existed in the barcode message have been converted to ASCII GS characters for transmission to the host.
When parsing the scanned GS1 barcode data it is necessary to consult the GS1 Syntax Dictionary for each AI to determine whether or not it belongs to the predefined set of fixed-length AIs that do not require separation from a subsequent non-terminal AI using an FNC1 character in the barcode message data, and if so to look up its specified data length. The entries for these fixed-length AIs are denoted with the "*" flag character.
01 * N14,csum,key ex=02,255,37 dlpkey=22,10,21|235 # GTIN
17 * N6,yymmd0 req=01,02,255,8006,8026 # USE BY or EXPIRY
10 X..20 req=01,02,8006,8026 # BATCH/LOT
410 * N13,csum,key # SHIP TO LOC
It can be seen from the entries selected above that (01), (17) and (410) are in the predefined set of fixed-length AIs, with character lengths of 14, 6 and 13 respectively, whereas (10) is not flagged as fixed length. Without the corresponding length data it would not be possible to reliably determine where one AI finishes and the next AI begins.
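To make the layout of these entries concrete, here is a minimal Python sketch that splits one entry into its parts. It is inferred only from the four sample lines shown in this article, not from the dictionary's formal grammar; the `Entry` class and `parse_entry` function are hypothetical names, and attribute values such as `ex=...` are deliberately kept as raw strings.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    ai: str
    fixed_length: bool   # True when the "*" flag is present
    components: list     # list of (type_spec, [linter names])
    attributes: dict     # key -> list of raw value strings
    title: str


def parse_entry(line: str) -> Entry:
    """Split a sample entry line into AI, flag, components, attributes and title."""
    spec, _, title = line.partition("#")
    tokens = spec.split()
    ai = tokens.pop(0)
    fixed_length = bool(tokens) and tokens[0] == "*"
    if fixed_length:
        tokens.pop(0)
    components = []
    attributes = {}
    for token in tokens:
        if "=" in token:
            key, _, value = token.partition("=")
            # Stored in a list in case a key appears more than once (an assumption).
            attributes.setdefault(key, []).append(value)
        else:
            type_spec, *linters = token.split(",")
            components.append((type_spec, linters))
    return Entry(ai, fixed_length, components, attributes, title.strip())


entry = parse_entry("01 * N14,csum,key ex=02,255,37 dlpkey=22,10,21|235 # GTIN")
print(entry.ai, entry.fixed_length, entry.components, entry.title)
```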
The conversion of scanned barcode data to an AI syntax string starts by dropping the “]d2” symbology identifier and looking up a prefix of the remaining data that matches some entry in the GS1 Syntax Dictionary. The matching entry (01) is specified as having length fourteen (“N14”), indicating that the next fourteen characters denote the value for this element string, with no requirement for a GS character separator: the next AI may immediately follow. Since the next AI, (17), is also a fixed-length AI, of length six (“N6”), the next six characters denote its value and there is no requirement for a GS separator before the AI that follows it.
The next AI, (10), does not belong to the predefined set of fixed-length AIs, so its data value must be read until a GS character is encountered or the end of the data is reached. The final AI, (410), is fixed-length (“N13”), so its value is the thirteen characters that follow it, which in this example run to the end of the data.
Compatibility note
The standards firmly recommend that an FNC1 separator is omitted after a fixed-length AI, yet they do not strictly forbid it. Therefore the conversion must accommodate GS1 barcode messages in which fixed-length AIs are followed by an FNC1 character by discarding any superfluous GS character that appears after a fixed-length data element in the scanned barcode message.
General-purpose application software should follow "Postel's robustness principle" which ensures optimal interoperability: Be rigorous with the data that you produce and be liberal with the data that you accept. This principle does not apply to data validation software which should report any deviations from the nominal or canonical format defined by the relevant standards, no matter how slight.
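Conversion ➀, including the liberal handling of a superfluous GS after a fixed-length AI, can be sketched as follows. This is a minimal illustration under stated assumptions: `AI_TABLE` is a hand-copied subset of the four entries quoted above (a real implementation would load the full GS1 Syntax Dictionary), the function name `scanned_to_bracketed` is hypothetical, and no type or Linter checks are performed.

```python
GS = "\x1d"  # ASCII 29, the Group Separator character

# Hypothetical subset of the dictionary: AI -> (fixed_length, maximum_length)
AI_TABLE = {
    "01": (True, 14),
    "17": (True, 6),
    "10": (False, 20),
    "410": (True, 13),
}


def scanned_to_bracketed(scanned: str, symbology_identifier: str = "]d2") -> str:
    if not scanned.startswith(symbology_identifier):
        raise ValueError("missing symbology identifier")
    data = scanned[len(symbology_identifier):]
    out = []
    pos = 0
    while pos < len(data):
        # Look for a 2, 3 or 4 digit prefix that matches a known AI.
        for n in (2, 3, 4):
            ai = data[pos:pos + n]
            if ai in AI_TABLE:
                break
        else:
            raise ValueError(f"unrecognised AI at offset {pos}")
        fixed, length = AI_TABLE[ai]
        pos += len(ai)
        if fixed:
            value = data[pos:pos + length]
            if len(value) < length or GS in value:
                raise ValueError(f"AI ({ai}): Too short")
            pos += length
            # Be liberal in what is accepted: discard a superfluous GS.
            if data[pos:pos + 1] == GS:
                pos += 1
        else:
            end = data.find(GS, pos)
            if end == -1:
                end = len(data)
            value = data[pos:end]
            if not 1 <= len(value) <= length:
                raise ValueError(f"AI ({ai}): bad length")
            pos = end + 1  # skip over the GS separator, if any
        out.append(f"({ai}){value}")
    return "".join(out)


print(scanned_to_bracketed(
    "]d201034531200000111721050810ABCD1234" + GS + "4109501101020917"
))
```

Data validation software, as opposed to general-purpose software, would report the superfluous GS rather than silently discard it.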
➁ To convert a GS1 AI syntax string to GS1 barcode message data
(01)03453120000011(17)210508(10)ABCD1234(410)9501101020917
|
V
{FNC1}01034531200000111721050810ABCD1234{FNC1}4109501101020917
The conversion begins by including an FNC1 character at the beginning of the message data to indicate that the barcode message contains AI syntax data.
The conversion continues by writing out each AI, dropping the parentheses, followed by its data field value. After processing each non-terminal element string it is necessary to consult the GS1 Syntax Dictionary for the entry corresponding to the element string’s AI to determine whether it is a member of the predefined set of fixed-length AIs that do not require separation from a subsequent AI with an FNC1 character. These fixed-length entries are specified with the “*” flag character.
01 * N14,csum,key ex=02,255,37 dlpkey=22,10,21|235 # GTIN
17 * N6,yymmd0 req=01,02,255,8006,8026 # USE BY or EXPIRY
10 X..20 req=01,02,8006,8026 # BATCH/LOT
410 * N13,csum,key # SHIP TO LOC
It can be seen from the entries selected above that AI (01), (17) and (410) are flagged as fixed-length whereas AI (10) is not. An FNC1 separator following the data field should be excluded in precisely those cases where the AI belongs to the predefined list of fixed-length AIs. In all other cases the FNC1 character must be included with the exception of the terminal element string.
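A minimal Python sketch of conversion ➁ follows. It assumes that data values contain no parentheses (the escaping mechanisms mentioned earlier are out of scope), that `FIXED_LENGTH_AIS` is a hand-copied subset of the entries flagged "*" above, and that the output uses this article's `{FNC1}` placeholder notation; the function name `bracketed_to_message` is hypothetical.

```python
import re

# Hypothetical subset: AIs flagged "*" in the entries quoted above.
FIXED_LENGTH_AIS = {"01", "17", "410"}


def bracketed_to_message(bracketed: str, fnc1: str = "{FNC1}") -> str:
    elements = re.findall(r"\((\d{2,4})\)([^(]+)", bracketed)
    # Reject input that the pattern above did not consume completely.
    if not elements or "".join(f"({ai}){v}" for ai, v in elements) != bracketed:
        raise ValueError("malformed AI syntax string")
    parts = [fnc1]  # the leading FNC1 signals AI syntax data
    for index, (ai, value) in enumerate(elements):
        parts.append(ai + value)
        is_last = index == len(elements) - 1
        # A separator is needed only after non-terminal, non-fixed-length AIs.
        if not is_last and ai not in FIXED_LENGTH_AIS:
            parts.append(fnc1)
    return "".join(parts)


print(bracketed_to_message(
    "(01)03453120000011(17)210508(10)ABCD1234(410)9501101020917"
))
```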
Compatibility note
As noted previously, the transmission protocol for barcode message data specifies that instances of the FNC1 non-data separator character are converted by the scanner into a Group Separator character so that they can be transmitted to the host as regular ASCII. The recipient of a barcode message provided by a general-purpose barcode scanner is unable to distinguish between an FNC1 character and a literal GS character encoded within the barcode — nor does this matter for correct decoding of AI syntax data.
Historically this has resulted in many applications producing barcode images in which non-terminal, variable-length AIs have been erroneously terminated with literal GS characters rather than FNC1 non-data characters. After much debate over whether this actually matters (given that in practice decoding is unaffected by the error) it was decided that the GS character is also permitted for this purpose.
Again, follow the robustness principle by using the traditional FNC1 as the separator character unless some application standard is in effect that specifies that you use GS, since doing so will often lead to a more optimal data encoding which may result in a smaller symbol size.
➂ To convert a GS1 AI syntax string to a human-friendly rendition
(01)03453120000011(17)210508(10)ABCD1234(410)9501101020917
|
V
GTIN: 03453120000011
USE BY or EXPIRY: 210508 (8th May 2021)
BATCH/LOT: ABCD1234
SHIP TO LOC: 9501101020917
For this simple conversion process the GS1 Syntax Dictionary should be consulted to look up the entry for each AI to determine the corresponding data field title.
01 * N14,csum,key ex=02,255,37 dlpkey=22,10,21|235 # GTIN
17 * N6,yymmd0 req=01,02,255,8006,8026 # USE BY or EXPIRY
10 X..20 req=01,02,8006,8026 # BATCH/LOT
410 * N13,csum,key # SHIP TO LOC
Each AI should be resolved to the corresponding data field title and emitted. It is common that additional processing is performed to render information such as dates into an unambiguous human-friendly format.
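Conversion ➂ can be sketched in Python as follows. The `TITLES` table is a hand-copied subset of the four entries above, and `DATE_AIS`, `render_date` and `render_human` are hypothetical names. The date rendering is a simplifying assumption: it treats every year as 20YY and does not handle a day field of "00", both of which a complete implementation would need to address according to the relevant specifications.

```python
import re
from datetime import date

# Hypothetical subset of titles taken from the entries quoted above.
TITLES = {"01": "GTIN", "17": "USE BY or EXPIRY", "10": "BATCH/LOT", "410": "SHIP TO LOC"}
DATE_AIS = {"17"}
MONTHS = ["January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December"]


def ordinal(day: int) -> str:
    if 11 <= day % 100 <= 13:
        suffix = "th"
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(day % 10, "th")
    return f"{day}{suffix}"


def render_date(yymmdd: str) -> str:
    year = 2000 + int(yymmdd[0:2])  # assumption: century is always 20
    month = int(yymmdd[2:4])
    day = int(yymmdd[4:6])
    date(year, month, day)  # raises ValueError for impossible dates
    return f"{ordinal(day)} {MONTHS[month - 1]} {year}"


def render_human(bracketed: str) -> str:
    lines = []
    for ai, value in re.findall(r"\((\d{2,4})\)([^(]+)", bracketed):
        title = TITLES[ai]  # KeyError for AIs outside this small subset
        if ai in DATE_AIS:
            value = f"{value} ({render_date(value)})"
        lines.append(f"{title}: {value}")
    return "\n".join(lines)


print(render_human("(01)03453120000011(17)210508(10)ABCD1234(410)9501101020917"))
```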
During all of the above conversion processes an important task that should be performed is to validate the data to ensure that it conforms to the definitions for the AIs provided by the corresponding specifications, as represented by both the type specification and Linter references in the GS1 Syntax Dictionary.
GS1 AI syntax data validation and association checks
The GS1 Syntax Dictionary can be used along with a set of Linter procedures either directly by application code or by a project's build system to generate code that performs a deep validation of any message consisting of GS1 Application Identifier element strings.
Basic data type and length checks
10 X..20 req=01,02,8006,8026 # BATCH/LOT
The above entry describes AI (10) with the title “BATCH/LOT”. The data field for this AI consists of a single component that can contain arbitrary text from one to twenty characters in length drawn from GS1 Character Set 82 (“X..20”).
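A basic type and length check of this kind can be sketched in Python. The sketch handles only the "N" and "X" type specifications that appear in this article, the function name `check_type` is hypothetical, and the returned messages are borrowed from the wording of the case study table later in the article rather than from the reference implementation.

```python
import re
import string

# GS1 AI encodable character set 82: digits, letters and twenty punctuation marks.
CSET82 = set("!\"%&'()*+,-./:;<=>?_" + string.digits + string.ascii_letters)


def check_type(type_spec: str, value: str):
    """Return None when value satisfies type_spec, otherwise a short error message."""
    match = re.fullmatch(r"([NX])(\.\.)?(\d+)", type_spec)
    if match is None:
        raise ValueError(f"unsupported type specification: {type_spec}")
    charset = match.group(1)
    variable = match.group(2) is not None  # ".." marks a variable-length field
    maximum = int(match.group(3))
    minimum = 1 if variable else maximum
    if len(value) < minimum:
        return "Too short"
    if len(value) > maximum:
        return "Too long"
    if charset == "N" and not (value.isascii() and value.isdigit()):
        return "Not numeric"
    if charset == "X" and not set(value) <= CSET82:
        return "Invalid CSET 82 character"
    return None


print(check_type("X..20", "ABCD1234"))
print(check_type("X..20", "AB#12"))
```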
The type check means that the following frequently occurring erroneous inputs would be rejected:
Content validation using reference Linter routines
01 * N14,csum,key ex=02,255,37 dlpkey=22,10,21|235 # GTIN
The above syntax entry describes AI (01) with the title “GTIN”. The data field for this AI consists of a single component that must be exactly fourteen digits (“N14”).
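The entry also names a "csum" Linter. As an illustration of the kind of content check involved, here is a Python sketch of the standard GS1 mod-10 check digit calculation, in which weights of 3 and 1 alternate starting from the digit immediately to the left of the check digit. The function names are hypothetical stand-ins and this is not the reference Linter's implementation.

```python
def gs1_check_digit(payload: str) -> int:
    """Compute the GS1 check digit for the digits that precede it."""
    total = 0
    for position, char in enumerate(reversed(payload)):
        weight = 3 if position % 2 == 0 else 1  # rightmost payload digit has weight 3
        total += int(char) * weight
    return (10 - total % 10) % 10


def csum_ok(value: str) -> bool:
    """True when the final digit of value is the correct check digit for the rest."""
    if not (value.isascii() and value.isdigit()) or len(value) < 2:
        return False
    return gs1_check_digit(value[:-1]) == int(value[-1])


print(csum_ok("03453120000011"))  # the GTIN used throughout this article
print(csum_ok("03453120000012"))  # same GTIN with a corrupted final digit
```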
Additionally this component should be validated with the listed set of Linters:
The type check alone allows the following erroneous inputs to be rejected:
Additionally the Linters enable the rejection of a further class of erroneous inputs that are syntactically well-formed but whose content is incorrect, for example:
Reference Linters such as “key” perform offline checks that ensure that the data contents are feasible. However, their implementation can be extended to support the specific requirements of an application. For example, an application that has access to online data sources might perform a real time lookup to ensure that the given GCP has been allocated and is in current use. In that case the following example of an otherwise valid GTIN might be rejected:
Application Identifiers with multiple components
7007 N6,yymmdd [N6],yymmdd req=01,02 # HARVEST DATE
The above entry describes AI (7007) with the title “HARVEST DATE”. The data field for this AI consists of two components:
Together, the type checks and Linters allow the following erroneous inputs to be rejected:
The GS1 Syntax Dictionary applies an extensive set of Linters to AI syntax messages for validating around thirty types of data including country codes, currency codes, IBANs, alphanumeric check character pairs and structured coupon data.
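For the AI (7007) entry shown above, a multi-component check might be sketched in Python as follows. The sketch assumes that the square brackets in "[N6]" mark an optional component, treats every year as 20YY when testing whether a date is possible, and uses hypothetical names throughout (`COMPONENTS_7007`, `split_components`, `yymmdd_ok`, `check_harvest_date`); it is not the reference "yymmdd" Linter.

```python
from datetime import date

# Hypothetical encoding of "N6,yymmdd [N6],yymmdd": (length, optional)
COMPONENTS_7007 = [(6, False), (6, True)]


def split_components(value: str, components) -> list:
    """Cut value into its components, allowing trailing optional ones to be absent."""
    parts = []
    pos = 0
    for length, optional in components:
        if pos == len(value) and optional:
            break
        part = value[pos:pos + length]
        if len(part) < length:
            raise ValueError("Too short")
        parts.append(part)
        pos += length
    if pos < len(value):
        raise ValueError("Too long")
    return parts


def yymmdd_ok(part: str) -> bool:
    try:
        date(2000 + int(part[0:2]), int(part[2:4]), int(part[4:6]))
    except ValueError:
        return False
    return True


def check_harvest_date(value: str):
    """Return None when value is acceptable for AI (7007), otherwise an error message."""
    if value and not (value.isascii() and value.isdigit()):
        return "Not numeric"
    try:
        parts = split_components(value, COMPONENTS_7007)
    except ValueError as error:
        return str(error)
    for part in parts:
        if not yymmdd_ok(part):
            return "Invalid date"
    return None


print(check_harvest_date("210508"))
print(check_harvest_date("2105082105"))
```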
Application Identifier association checks
Note also that the GS1 Syntax Dictionary entries for an AI may contain additional "key=value" pairs with the following meanings...
Within the context of either regular AI element string data or a GS1 Digital Link URI:
Within the context of a GS1 Digital Link URI:
Case Study: GS1 AI syntax data linting in the Online Barcode Generator
The Online Barcode Generator is a web-based tool with a long pedigree and support for around 100 formats that allows users to generate barcode images for their data. It is based on Barcode Writer in Pure PostScript and therefore benefits directly from the integrated GS1 data linting described above.
Within 24 hours of activating AI data format validation it had detected more than 3500 instances of bad user-supplied AI syntax data thereby preventing the generation of defective GS1 barcodes, many of which would have previously entered the global supply chain.
The most frequently detected errors in AI values are presented in the following table:
Error detected ............................. Occurrences (to 2 sig.fig.)
AI (01): Too short ................................................. 500
AI (02): Bad checksum .............................................. 420
AI (02): Too short ................................................. 350
AI (01): Bad checksum .............................................. 230
AI (96): Invalid CSET 82 character ................................. 200
AI (95): Invalid CSET 82 character ................................. 140
AI (01): Too long .................................................. 130
AI (17): Invalid month ............................................. 130
AI (15): Invalid month ............................................. 110
AI (00): Bad checksum .............................................. 110
AI (00): Too short .................................................. 98
AI (15): Too long ................................................... 95
Unrecognised AI ..................................................... 94
AI (10): Invalid CSET 82 character .................................. 89
AI (23): Unrecognised AI ............................................ 84
AI (3103): Too short ................................................ 83
AI (00): Not numeric ................................................ 72
AI (02): Not numeric ................................................ 63
AI (00): Too long ................................................... 46
AI (01): Not numeric ................................................ 54
AI (21): Too long ................................................... 43
AI (37): Not numeric ................................................ 42
AI (15): Too short .................................................. 39
AI (37): Too long ................................................... 38
AI (90): Invalid CSET 82 character .................................. 34
AI (10): Too short .................................................. 26
AI (15): Not numeric ................................................ 23
AI (98): Invalid CSET 82 character .................................. 22
AI (11): Invalid month .............................................. 19
AI (14): Invalid month .............................................. 17
AI (17): Too short .................................................. 17
AI (241): Invalid CSET 82 character ................................. 17
AI (3103): Too long ................................................. 16
AI (402): Too long .................................................. 15
AI (402): Too short ................................................. 15
AI (8110): Coupon fields must be 1,2,3,4,5,6 or 9, increasing order . 13
AI (8110): Invalid month in expiration date ......................... 12
AI (97): Invalid CSET 82 character .................................. 12
...
Clearly the GS1 Syntax Dictionary and Linters have the potential to drive huge improvements in supply chain data quality.
Afterword
If you would like to find out more about integrating the GS1 Syntax Dictionary with your project then please contact the author Terry Burton or either of the Barcode Writer in Pure PostScript or Zint forerunner projects.
Terry Burton is the principal of Terry Burton Consulting Ltd and a recognised subject matter expert in the field of barcode generation. He is the current Chair of the AIM Technical Symbology Committee and continues to be a major contributor to barcode symbology standards. He maintains an extensive open source software barcode generation library, Barcode Writer in Pure PostScript.
Acknowledgements
Many thanks to Harald Oehlmann, Wilfried Weigelt and Steven Keddie for their valuable suggestions when reviewing drafts of this article.