Frequently Asked Questions
Which data sources and destinations does Data Pipelines support?
Data Pipelines currently supports the following:
- AWS S3 (CSV, Parquet, JSON Lines, Avro, ORC)
- AWS DynamoDB
- Google Sheets
- Google BigQuery
- Hadoop (self-hosted accounts only)
Why does generating a data preview take a few seconds?
Data Pipelines is designed to scale to the very largest workloads. That requires consistency, so we use Apache Spark both to generate data previews and to run the workloads themselves. Because previews use the same engine as full runs, your pipelines stay validated and are guaranteed to run every time. The trade-off is that a Spark job runs with each iteration of the preview, and this takes a few seconds to complete regardless of the size of your datasets.
Can I combine data from different connectors in one pipeline?
Yes. You can combine data from different types of connectors in a single pipeline. You can also write to different connections on the same run.
Can I self-host Data Pipelines?
Yes. Self-hosting is available on our enterprise plan. Get in touch for more information.