Automatically Assigning Random Product IDs in Snowflake Using ARRAY_CONSTRUCT + UNIFORM() + Random()

Automatically Assigning Random Product IDs in Snowflake Using ARRAY_CONSTRUCT + UNIFORM() + Random()

In many data engineering workflows, especially during data simulation, testing, or backfilling incomplete records, you may need to assign a default product identifier or any such identifiers when the source system leaves it blank.

Snowflake makes this seamless by combining arrays with built-in random number functions.

In this post, we break down a simple, powerful SQL pattern for filling missing product_id values using a predefined list of product codes.

Here’s the query:

Article content

🔍 What This Query Does

This update statement:

✔ randomly selects one product ID from a list

✔ fills it into rows where product_id is NULL

✔ preserves existing non-null product IDs

This is perfect when you need quick synthetic product mapping or want product variety in a demo or test dataset.

🧩 Breaking Down the Logic

1. Define the set of possible product IDs

ARRAY_CONSTRUCT(9000,9001,9002,9003,9004)        

This creates an in-line array:

[9000, 9001, 9002, 9003, 9004]

No lookup table, No CTEs - the list lives directly in the statement.

2. Select a random index using UNIFORM()

UNIFORM(0, 5, RANDOM())        

This returns a random integer between:

  1. 0 (inclusive)
  2. 5 (exclusive)
  3. So the possible index values are: 0, 1, 2, 3, 4

which match the array’s positions.

3. Use the index to pick a product ID

ARRAY_CONSTRUCT(...)[index]        

Examples:

index 0 → 9000

index 1 → 9001

index 4 → 9004

Every row gets one random product ID.

4. Convert numeric to string

If your product_id column stores text values, Snowflake ensures correct type via:

TO_VARCHAR(...)        

5. Only update missing product IDs

WHERE product_id IS NULL        

This safely prevents overwriting existing values.

🚀 When to Use This Pattern

This technique is great for:

  1. Synthetic product data creation
  2. Backfilling missing product info
  3. Demo datasets for downstream teams
  4. A/B test distributions
  5. Mock catalog assignments
  6. Quick randomization in Snowflake exercises

🧠 Why This Pattern Is Clean & Powerful

  1. No staging tables
  2. No temporary functions
  3. No stored procedures
  4. One simple, expressive SQL query
  5. You choose what random values with bounding list

Snowflake’s functional SQL style lets you keep logic inline, readable, controlled list of values and performant.

To view or add a comment, sign in

More articles by Karthikeyan Shanthakumar

Explore content categories