How to use External Variables in your Data Pipeline

Learn how to use dynamic external variables as part of a collaborative data pipeline process

12 Oct 2021 • 1 min read

Externalized variables let you or others control certain parameters of your data pipeline without modifying the pipeline schema or opening the pipeline builder at all.

For example, say you create a data pipeline that generates a daily report based on a list of product codes that is updated frequently. These product codes are set by your client, who does not have access to your Data Pipelines account. The product codes need to be externalized so they can be set dynamically. This can be done by storing them in a Google Sheet.

The Google Sheet storing the list of product codes is accessed and updated by your client and consumed by your data pipeline. It effectively becomes just another dataset that can be combined with any other dataset in your pipeline. So, for example, to filter by these codes you could do a left join with the sheet's codes on the left side. Once the list of product codes is updated, the result of the join changes too.
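To make the join idea concrete outside the pipeline builder, here is a minimal sketch in plain Python. It is not how Data Pipelines implements the join; it only illustrates the behavior. It assumes the sheet has been exported as CSV text with a single `product_code` column, and the `units` column, the sample rows, and the function names `read_codes` and `left_join_on_code` are all hypothetical.

```python
import csv
import io


def read_codes(csv_text):
    """Parse the externalized product codes from CSV text.

    Assumes a header row with a 'product_code' column (hypothetical name).
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    codes = []
    for row in reader:
        code = (row.get("product_code") or "").strip()
        if code:
            codes.append(code)
    return codes


def left_join_on_code(codes, sales):
    """Left join with the code list on the left and the sales rows on the right.

    Sales rows whose code is not in the list are dropped. Codes with no
    matching sales row are kept with units set to None.
    """
    # Index the right-hand dataset by product code.
    by_code = {}
    for row in sales:
        by_code.setdefault(row["product_code"], []).append(row)

    joined = []
    for code in codes:
        matches = by_code.get(code)
        if matches:
            for row in matches:
                joined.append({"product_code": code, "units": row["units"]})
        else:
            joined.append({"product_code": code, "units": None})
    return joined


# Hypothetical sales dataset already present in the pipeline.
sales = [
    {"product_code": "A100", "units": 5},
    {"product_code": "B200", "units": 7},
    {"product_code": "A100", "units": 2},
]

# The client's current list, as it might look after a CSV export of the sheet.
codes = read_codes("product_code\nA100\nC300\n")
report = left_join_on_code(codes, sales)
print(report)
```

When the client replaces the list in the sheet, rerunning the same code with the new CSV text gives a different report. Nothing in the join logic itself has to change.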

This is a powerful way to create dynamic variables that may range from just a single value to a whole table. Access to the variables is controlled by whoever has access to the Google Sheet, and Google's sharing options let you decide who can view or edit it.

This technique is demonstrated in the following video.

A simple example about dynamic externalized variables in your data pipeline