
Introducing Data Pipelines

Data Pipelines makes it easy to create pipelines from large datasets without in-depth knowledge of Apache Spark. It removes the need for the programming and SQL knowledge usually associated with data analytics. Our platform provides a user-friendly interface for building reporting pipelines step by step.

How it works

Once you have an account with us, you will be able to create pipelines from your data stored locally (if self-hosted), in Amazon S3, or by connecting to SQL databases. Each pipeline reads data and produces output in CSV or Parquet format or as SQL tables. Each step in a pipeline corresponds to an SQL operation (select, join, and so on), but because these operations are built through the UI, only basic tabular data processing skills are needed.
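To make the correspondence concrete, here is a rough sketch of what a simple two-step pipeline (a join followed by a select) amounts to in Apache Spark. The file paths, table names, and columns below are hypothetical, and the UI builds the equivalent of these operations without any code being written:

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("pipeline-sketch").getOrCreate()

# Read the source data (hypothetical CSV files in S3).
orders = spark.read.csv("s3a://example-bucket/orders/", header=True, inferSchema=True)
customers = spark.read.csv("s3a://example-bucket/customers/", header=True, inferSchema=True)
orders.createOrReplaceTempView("orders")
customers.createOrReplaceTempView("customers")

# Step 1 (join) and step 2 (select/filter), each corresponding to one UI step.
report = spark.sql("""
    SELECT o.order_id, c.country, o.amount
    FROM orders o
    JOIN customers c ON o.customer_id = c.customer_id
    WHERE o.amount > 100
""")

# Output in CSV, Parquet, or a SQL table; Parquet is shown here.
report.write.mode("overwrite").parquet("s3a://example-bucket/reports/large_orders/")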

Once a pipeline is built, it can be scheduled to run at desired intervals or time of day.

Built for large workloads

Data Pipelines uses Apache Spark to process data, so processing scales out across a cluster and even very large workloads can be handled.

Sign up for a 3-month free trial and start building data pipelines in minutes.
Use cases
Interactive Analysis

Problem: A market research company collects ad impressions in CSV files stored in Amazon S3. These files are too large to inspect in Excel or similar tools.

Solution: Using Data Pipelines, researchers need only a few clicks to connect the data and start inspecting it through the graphical interface.
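As a rough illustration of the kind of Spark job such an inspection replaces (the bucket name and options below are made up), the same preview happens entirely in the UI:

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("impression-inspection").getOrCreate()

# Hypothetical location of the ad-impression CSV files in S3.
impressions = spark.read.csv("s3a://example-research-bucket/impressions/",
                             header=True, inferSchema=True)

impressions.printSchema()        # column names and types
impressions.show(20)             # first rows, similar to a spreadsheet preview
impressions.describe().show()    # basic summary statistics for numeric columns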

Reporting Pipeline

Problem: A mobile game's microtransaction metadata is stored in Parquet files in Amazon S3. The studio wants to deliver several aggregated daily performance reports to a MySQL database.

Solution: Admins connect their data to our platform and within minutes start creating pipelines, previewing at each step what the final output will look like. The pipelines are then scheduled to run once a day.
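For illustration only (the database host, table, and column names are hypothetical), such a scheduled pipeline is roughly equivalent to a daily Spark job like the one below; writing over JDBC assumes the MySQL driver is available on the Spark classpath:

from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("daily-report-sketch").getOrCreate()

# Hypothetical Parquet location for the microtransaction metadata.
purchases = spark.read.parquet("s3a://example-game-bucket/purchases/")

# Aggregate revenue and purchase counts per day and item.
daily = (purchases
         .groupBy(F.to_date("purchase_ts").alias("day"), "item_id")
         .agg(F.sum("amount").alias("revenue"),
              F.count("*").alias("purchases")))

# Write the report to MySQL over JDBC (connection details are placeholders).
(daily.write
      .format("jdbc")
      .option("url", "jdbc:mysql://reports-db:3306/analytics")
      .option("dbtable", "daily_item_performance")
      .option("user", "report_writer")
      .option("password", "REPLACE_ME")
      .mode("overwrite")
      .save())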

Archive Processing

Problem: A company has archives of historical medical data stored on hard drives that need to be processed and migrated to Amazon S3. The company has its own servers and prefers to host everything in-house.

Solution: Data Pipelines can be self-hosted, which enables seamless processing and combining of data stored locally and in Amazon S3. Locally processed data is then written to the company's S3 bucket.
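A minimal sketch of what this processing step amounts to, assuming a local directory of CSV archives and an illustrative bucket name (both hypothetical):

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("archive-migration-sketch").getOrCreate()

# Read the archives from local storage on the company's own servers.
archive = spark.read.csv("file:///data/archives/medical/", header=True, inferSchema=True)

# Any cleaning or combining steps would go here, built step by step in the UI.

# Write the processed result to the company's S3 bucket as Parquet.
archive.write.mode("append").parquet("s3a://example-company-bucket/medical-archive/")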

Screenshots: main page; create pipeline page.