Frequently Asked Questions

What data connectors are supported?

Data Pipelines currently supports the following:

  • AWS S3 (CSV, Parquet, JSON Lines, Avro, ORC)
  • AWS DynamoDB
  • Google Sheets
  • Google Analytics
  • Google BigQuery
  • Azure SQL
  • SQL Server
  • MongoDB
  • Hadoop (self-hosted accounts only)
  • MySQL - Including AWS, GCloud, Azure & more
  • PostgreSQL - Including AWS, GCloud, Azure, Heroku & more
  • MariaDB

We will be adding more connectors in the future. If there's one you'd like to see, please contact us.

Why isn't the pipeline view updated instantly when I click Preview?

Data Pipelines is designed to scale to the very largest of workloads. This requires consistency, so we use Apache Spark both to generate data previews and to run the workloads. This means that your pipelines are kept validated and are guaranteed to run the same way every time. The trade-off is that a Spark job runs with each iteration of the preview, and this takes a few seconds to complete regardless of the size of your datasets.

Can I mix different types of data connections in the same pipeline?

Yes. You can combine data from different types of connectors in a single pipeline. You can also write to different connections on the same run.

Is there a self-hosting option?

Yes. Self-hosting is available on our enterprise plan. Get in touch for more information.