Simplify Complex Data Format Test Planning with Context Free Grammars

The following is an excerpt from my book "Writing Test Plans Made Easy", available in Kindle and paperback format here: https://www.amazon.com/Writing-Test-Plans-Made-Easy/dp/1478333693

Explaining Things that have Pieces and Parts Using Context Free Grammars

It is very common for a tester to need an approach for handling different data formats. Simple data formats, such as numbers, can usually be covered by lists of boundary values. But data formats get complicated very quickly. When that happens, switching to a context-free grammar notation such as BNF can often help explain the tests.

Imagine, for example, that a program takes a URL as an input and from that determines the source domain, the server address, the file path, the filename, and each individual argument in the query string and then stores each in a database. This implies the program has a lot of parsing code to extract each part. It is clear we need to think about the different forms of a URL. There is a lot to test here, so let’s look at just one example:

Test Issue: How well does the application isolate the server and domain portion of a URL?

Summary: The server portion of the URL is preceded by the protocol specification string and followed by an optional port identifier. Further, the domain portion itself can consist of 1 to N dot-separated identifiers. These tests are designed to exercise the variations on these formats.

Methodology: Test data will be derived from URLs that match the following syntax:

<Protocol>://<Server_Domain>[/[<path>]]

<Protocol>:={HTTP | HTTPS}

<Server_Domain>:=<DomainTag> [:<port>]

<DomainTag>:=<Domain>[.<DomainTag>]

<port>:={1…99999}

<path>:=dir1/dir2/file.html

Using the above, tests will be generated using the following strategy:

  • Every option block [] will have tests with the optional part present and with it absent
  • Every range block {x…y} will have tests for max, min, and N random values in between
  • Every choice block with explicit values {x|y} will have every option selected
  • <Protocol>: both HTTP and HTTPS
  • <DomainTag>: 1…N dot-separated <Domain> entries
  • <port>: N randomly chosen values as well as max and min
  • <path>: a fixed value across all tests
  • <Domain>: random alphanumeric values
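The strategy above can be sketched in code. The following Python is a minimal illustration, not the book's implementation: the function names, the label length of 8, the default of 3 domains and 2 random ports, and the fixed seed are all assumptions made for the example. It treats <DomainTag> as 1 to N dot-separated labels, as the strategy describes, and returns the expected server and port alongside each URL so a test can compare them with what the application extracted.

```python
import random
import string

PROTOCOLS = ["HTTP", "HTTPS"]        # <Protocol>:={HTTP | HTTPS}
PORT_MIN, PORT_MAX = 1, 99999        # <port>:={1…99999}
FIXED_PATH = "dir1/dir2/file.html"   # <path> is fixed across all tests


def random_domain(rng, length=8):
    """One <Domain>: a random alphanumeric label (length is an assumption)."""
    alphabet = string.ascii_lowercase + string.digits
    return "".join(rng.choice(alphabet) for _ in range(length))


def domain_tag(rng, count):
    """<DomainTag>: `count` dot-separated <Domain> labels."""
    return ".".join(random_domain(rng) for _ in range(count))


def port_values(rng, n_random):
    """Min, max, and n_random values strictly in between."""
    between = [rng.randint(PORT_MIN + 1, PORT_MAX - 1) for _ in range(n_random)]
    return [PORT_MIN, PORT_MAX] + between


def generate_urls(max_domains=3, n_random_ports=2, seed=0):
    """Yield (url, expected_server, expected_port) for every combination."""
    rng = random.Random(seed)  # seeded so a reviewer can reproduce the data
    # None models the optional [:<port>] block being absent.
    ports = [None] + port_values(rng, n_random_ports)
    # Option block [/[<path>]]: absent, "/" only, or "/" plus the fixed path.
    tails = ["", "/", "/" + FIXED_PATH]
    for protocol in PROTOCOLS:
        for count in range(1, max_domains + 1):
            for port in ports:
                for tail in tails:
                    server = domain_tag(rng, count)
                    port_part = "" if port is None else f":{port}"
                    url = f"{protocol}://{server}{port_part}{tail}"
                    yield url, server, port


if __name__ == "__main__":
    for url, server, port in list(generate_urls())[:5]:
        print(url, server, port)
```

With these assumed defaults the generator produces 2 protocols × 3 domain counts × 5 port settings × 3 path settings = 90 URLs, which shows how a few lines of grammar expand into a large test set.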

In the above example, it doesn’t take much text to describe what will turn out to be a rather large set of tests. Because the test model is based on the syntax of the URL format, a reviewer can easily see how the test selection relates to the parsing function under test. Note that the BNF expression of the URL format is not the same as the standard W3C URL specifications. The tester has chosen a BNF description that expresses a set of tests for a specific problem. The “<path>” part of the URL is therefore set to a fixed value, because the tester did not find other forms of the path relevant to the test.

