Data Connections

How data connections work


Data Pipelines makes it easy to connect data sources and destinations. There is no limit to the number of data connections within a pipeline.

Once a connection is created, pipelines may read data from it and write data to it. Most data connections are two-way (read and write), with a few exceptions (e.g. Google Analytics) that are read-only.

Data connections are identified by name for the following reasons:

  • Deleting and re-adding a data connection: if data connections were identified by something like a primary key or a generated ID, it would be impossible to delete a connection and re-add it as the same connection, because the newly generated key would never match the one that was deleted. All pipelines and schedules that depended on the connection would then have to be updated to use the newly added connection.
  • Quick switching between data sources and destinations: a pipeline is usually built against some kind of development database. Once the pipeline is built, we want to start using the production database. There are multiple ways to do this. One is to clone the pipeline and update the connections in the cloned pipeline's definition. An alternative is to update the connection parameters. Because connections are identified by name, no changes are needed to the pipelines or schedules that depend on the connection. Note that this approach may not be suitable if other pipelines also use the same data connection.
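
The two reasons above can be illustrated with a minimal sketch. It is not the Data Pipelines implementation: ConnectionRegistry, its methods, the connection name orders_db and the host names are all hypothetical, chosen only to show why resolving connections by name keeps pipeline definitions unchanged.

```python
from dataclasses import dataclass


@dataclass
class Connection:
    """Hypothetical connection record: a name plus its connection parameters."""
    name: str
    params: dict


class ConnectionRegistry:
    """Hypothetical in-memory registry that resolves connections by name."""

    def __init__(self):
        self._by_name = {}

    def add(self, name, params):
        self._by_name[name] = Connection(name, dict(params))

    def delete(self, name):
        del self._by_name[name]

    def update_params(self, name, params):
        self._by_name[name].params = dict(params)

    def resolve(self, name):
        return self._by_name[name]


registry = ConnectionRegistry()
registry.add("orders_db", {"host": "dev.example.internal"})

# The pipeline definition stores only the connection name (assumed layout).
pipeline = {"name": "daily_orders", "source": "orders_db"}

# Deleting and re-adding under the same name leaves the pipeline valid.
registry.delete("orders_db")
registry.add("orders_db", {"host": "dev.example.internal"})
dev_host = registry.resolve(pipeline["source"]).params["host"]

# Switching to production only changes the connection parameters.
registry.update_params("orders_db", {"host": "prod.example.internal"})
prod_host = registry.resolve(pipeline["source"]).params["host"]
```

In this sketch the pipeline dictionary is never modified. Every pipeline that refers to orders_db picks up the new parameters, which is also why the approach may be unsuitable when a connection is shared.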

When a data connection is deleted, the system checks whether any pipelines consume data from it or any schedules write to it, and warns if so. This prevents accidental deletion of data connections that existing pipelines or schedules depend on.

Note that a data connection's name can be edited only if no pipelines or schedules currently depend on it.
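
The deletion warning and the rename restriction both come down to the same dependency lookup. The sketch below shows one way such a check could work. It is an assumption-laden illustration, not the product's code: the function names and the source and destination keys are hypothetical.

```python
def find_dependents(connection_name, pipelines, schedules):
    """Return (pipelines reading from, schedules writing to) the connection."""
    reading = [p["name"] for p in pipelines if p.get("source") == connection_name]
    writing = [s["name"] for s in schedules if s.get("destination") == connection_name]
    return reading, writing


def deletion_warning(connection_name, pipelines, schedules):
    """Return a warning message if anything depends on the connection, else None."""
    reading, writing = find_dependents(connection_name, pipelines, schedules)
    if not reading and not writing:
        return None
    return (
        f"Connection '{connection_name}' is used by "
        f"{len(reading)} pipeline(s) and {len(writing)} schedule(s)."
    )


def can_rename(connection_name, pipelines, schedules):
    """A name is editable only when nothing depends on the connection."""
    reading, writing = find_dependents(connection_name, pipelines, schedules)
    return not reading and not writing


# Hypothetical sample data for illustration only.
pipelines = [{"name": "daily_orders", "source": "orders_db"}]
schedules = [{"name": "nightly_export", "destination": "warehouse"}]

warning = deletion_warning("orders_db", pipelines, schedules)
unused_ok = can_rename("scratch_db", pipelines, schedules)
used_ok = can_rename("warehouse", pipelines, schedules)
```

Here warning is a message because one pipeline reads from orders_db, scratch_db can be renamed because nothing refers to it, and warehouse cannot because a schedule writes to it.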