Modern Data Stack: What Data Ingestion tool should you pick?
The ingestion tool you pick shapes the rest of your data stack, and the number of available options makes that choice hard. Making an informed decision requires evaluating a few key factors.
When selecting an ingestion tool for your modern data stack, consider your team’s capabilities, whether you need real-time replication, and how your data volume affects cost and performance. Robustness is essential, especially for mission-critical applications. Also assess the deployment options and how custom connectors are built, since these differ per tool. For cloud solutions, check how pricing scales with data volume.
In this post, we compare Fivetran, Airbyte, and Meltano, helping you confidently find the ideal tool.
Why do you need a data ingestion tool?
Data ingestion tools play a critical role in modern data stacks by facilitating the extraction and loading of data from various sources into a centralized location, such as a data warehouse or data lake. These tools are essential for enabling organizations to make informed decisions based on accurate and up-to-date data.
The primary function of data ingestion tools is to connect to different data sources, extract data in various formats, transform it if necessary, and load it into the target destination. They are responsible for handling data movement, data integration, and data synchronization efficiently and reliably. Example data sources include your CRM (e.g., HubSpot), invoicing software like Exact, marketing tools, and your operational databases.
In short: Airbyte, Fivetran, and Meltano for data Ingestion
Below is a comparison of three products used for ingestion as part of a modern data stack: Fivetran, Airbyte, and Meltano. They are useful in different situations, with varying trade-offs. In short:
- Fivetran is fully managed and has more mature connectors. Technical experience may still be necessary to set up custom connectors, or to connect sources such as PostgreSQL that require more configuration. Fivetran can be very expensive.
- Airbyte can be used through their cloud service or by deploying the open-source variant yourself. Costs are lower, and you have much more control when you deploy the service yourself. Their built-in connectors are much less mature, however, and most are still in alpha or beta. Some say Airbyte is not yet stable or fast enough to be used in production.
- Meltano trades the user-friendly web interface for a developer-focused experience. It provides no user interface, relying instead on (version-controlled) configuration files and the CLI. Connectors (called taps and targets) are based on the Singer specification. Many such connectors have been written over the years, although not necessarily for use with Meltano. They are often maintained by third parties, with unclear support.
Quality of existing connectors
First we turn to the main functionality of any ingestion tool, the data connectors it employs.
Even though Airbyte has many connectors, most of them are in either alpha or beta according to its connector catalog. Seeing all the alpha/beta tags in a newly added Airbyte interface may not inspire confidence in a client.
Airbyte states the following about alpha releases: “We strongly discourage using alpha releases for production use cases and do not offer Cloud Support SLAs around these products, features, or connectors.”
Airbyte thus explicitly advises against using alpha releases in production and does not offer Cloud Support SLAs for them. If you run into issues with alpha connectors, dedicated support may not be available. In the open-source context, Airbyte’s support is limited to a community Slack, and there is no official support SLA in place.
To their credit, Airbyte is transparent that their alpha connectors are not ready for production use. Instead, they encourage users to explore beta connectors, for which they reported a sync success rate of 93% in January 2023, compared with 90% for alpha connectors. Their page on connector release stages explains in detail which tests a connector must pass before moving out of alpha or beta.
In conclusion, Airbyte has the potential to be a great source of well-tested and robust connectors in the future, but it is not quite there yet if you require very high reliability from your data connectors.
Fivetran is far less transparent than Airbyte about how its connectors are implemented, which leaves us with limited information to share. Fivetran is undoubtedly a robust product, but its connectors are completely closed source. What is clear is the pricing, which is substantial.
There are many connectors (taps and targets) available for Meltano. They are based on the open-source Singer specification, which was also used by Stitch and for which many connectors have been developed. Airbyte claims that the quality of these connectors varies wildly, and that many may be broken due to schema changes and left without support.
Implementing Custom Data connectors
Implementing custom data connectors is something that always comes up sooner or later, and when it does, you want to be prepared.
Airbyte: Empowering Customization and Extensibility
Airbyte emphasizes customization and extensibility. Connectors implement the Airbyte protocol and can be created in three ways: as low-code connectors, through a connector builder UI, or, for more complex cases, in Python or Java with the Airbyte Connector Development Kit (CDK). The platform’s openness allows you to build connectors that match your data needs, making Airbyte an attractive choice for those seeking a more tailored data integration solution.
Fivetran: Streamlined Data Pipelines with a Hands-Off Approach
Fivetran offers a hands-off approach to building data pipelines. Setting up a custom connector in Fivetran involves using a cloud function, as detailed in their documentation, so custom sources require more engineering than the rest of the product.
Image credit: https://fivetran.com/docs/functions
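To make the cloud function approach more concrete, here is a minimal sketch of what such a function could look like in Python. It assumes the request/response shape as we recall it from Fivetran's Functions documentation (a `state` object in the request, and `state`, `insert`, `schema`, and `hasMore` in the response); verify the exact field names against the documentation before relying on them. The `orders` table, the `cursor` key, and `SAMPLE_ORDERS` are hypothetical and only stand in for a real source system.

```python
from typing import Any, Dict, List

# Hypothetical in-memory "source system"; a real function would call an API here.
SAMPLE_ORDERS: List[Dict[str, Any]] = [
    {"id": 1, "amount": 10.0},
    {"id": 2, "amount": 25.5},
    {"id": 3, "amount": 7.25},
]


def handler(request: Dict[str, Any], context: Any = None) -> Dict[str, Any]:
    """Return the rows that are new since the cursor stored in the request state.

    Assumed contract (check Fivetran's docs): the request carries the state
    returned by the previous invocation, and the response carries the new
    state plus the rows to insert per table.
    """
    state = request.get("state") or {}
    cursor = state.get("cursor", 0)  # hypothetical bookmark: last synced order id

    new_rows = [row for row in SAMPLE_ORDERS if row["id"] > cursor]
    new_cursor = max((row["id"] for row in new_rows), default=cursor)

    return {
        "state": {"cursor": new_cursor},                  # stored for the next run
        "insert": {"orders": new_rows},                   # rows per table
        "schema": {"orders": {"primary_key": ["id"]}},    # assumed schema shape
        "hasMore": False,                                 # no further pages
    }


if __name__ == "__main__":
    first = handler({"state": {}, "secrets": {}})
    print(first["state"], len(first["insert"]["orders"]))
```

The function itself stays stateless: the bookmark travels in the request and the response, which is what makes incremental syncs possible without the function storing anything.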
Meltano: Following the Standardized Singer Specification
Meltano follows the well-established Singer specification, which uses “taps” to extract data records from sources and “targets” to store these records in the desired destinations. The specification standardizes the message format exchanged between taps and targets, so any tap can in principle be combined with any target. Airbyte has published blog posts that are critical of Meltano/Singer, and it is worth being aware of those opinions.
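As an illustration of that message format, the sketch below builds the three Singer message types (SCHEMA, RECORD, STATE) and writes them as one JSON object per line, which is how a tap hands data to a target. It is a bare-bones example and not a production tap: the `users` stream, its fields, and the `last_id` bookmark are made up for illustration.

```python
import json
import sys
from typing import Any, Dict, Iterable, Iterator, List, TextIO


def singer_messages(
    stream: str,
    schema: Dict[str, Any],
    key_properties: List[str],
    rows: Iterable[Dict[str, Any]],
    state: Dict[str, Any],
) -> Iterator[Dict[str, Any]]:
    """Yield Singer messages for one stream: SCHEMA, then RECORDs, then STATE."""
    # The SCHEMA message describes the stream before any of its records appear.
    yield {
        "type": "SCHEMA",
        "stream": stream,
        "schema": schema,
        "key_properties": key_properties,
    }
    for row in rows:
        yield {"type": "RECORD", "stream": stream, "record": row}
    # The STATE message carries a bookmark so a later run can resume.
    yield {"type": "STATE", "value": state}


def emit(messages: Iterable[Dict[str, Any]], out: TextIO = sys.stdout) -> None:
    """Write each message as a single line of JSON."""
    for message in messages:
        out.write(json.dumps(message) + "\n")


if __name__ == "__main__":
    users_schema = {
        "type": "object",
        "properties": {"id": {"type": "integer"}, "email": {"type": "string"}},
    }
    users = [{"id": 1, "email": "a@example.com"}, {"id": 2, "email": "b@example.com"}]
    emit(singer_messages("users", users_schema, ["id"], users, {"last_id": 2}))
```

Because the output is plain JSON lines on standard output, a tap and a target can be connected with a pipe, and neither needs to know how the other is implemented.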
Deployment & Configuration: A Comparative Overview
Airbyte offers both an open-source option and a cloud-hosted solution. The open-source version grants the flexibility to run it locally, in containers, or on Kubernetes. The web interface makes configuration a breeze for both options.
Fivetran is a robust SaaS application, but it runs only on their cloud infrastructure. Configuration is straightforward thanks to the user-friendly web interface, although exporting configurations can be challenging. Migrating away from Fivetran might prove difficult, as you cannot host their solution yourself.
Meltano provides an open-source component and a beta version of a managed cloud solution. The open-source variant runs locally or in the cloud and is configured through a CLI and config files. It lacks a web-based UI for easy configuration, but configurations can be version-controlled, saved, and restored with ease. Developers can change and update a Meltano project the same way they change code, which makes it an excellent choice for those well-versed in code and configuration.
Fivetran and Airbyte are accessible to non-developers with their intuitive web-based interfaces, ideal for quick and efficient setups. However, exporting or version-controlling configurations may be a limitation.
In contrast, Meltano takes a developer-centric approach, focusing on code and config files. While it lacks a web-based interface, integration with Dagster and Airflow provides options for pipeline scheduling. Configurations for sources, targets, and pipeline specifics remain code-based tasks, catering to developers with more technical expertise.
Comparing Ingestion tools on Pricing
Airbyte: Cloud Pricing with Credits
Airbyte Cloud uses a credit-based pricing system, with each credit priced at $2.50; see their pricing page for further details. The page also lists the credit cost per type of data operation, such as reading API sources (6 credits per million rows) and reading databases, warehouses, and file sources (4 credits per GB). Writing to database and warehouse destinations incurs no additional cost. For example, importing a database of just over 7 GB with 30 million rows would cost around $70 (7 GB × 4 credits per GB × $2.50 per credit). Using Fivetran’s starter plan for the same task would cost over $3000, a substantial price difference.
|Data operation type|Credit cost|
|---|---|
|Read API source|6 credits per million rows|
|Read database, warehouse & file source|4 credits per GB|
|Read custom source|6 credits per million rows|
|Write database & warehouse destination|free|
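The credit arithmetic is easy to reproduce for your own volumes. The sketch below uses only the figures quoted in this post ($2.50 per credit, 6 credits per million rows for API sources, 4 credits per GB for database sources); the function names are our own, and actual prices may have changed since writing.

```python
CREDIT_PRICE_USD = 2.50  # price per credit as quoted in this post

# Credits per unit, taken from the pricing table in this post.
CREDITS_PER_MILLION_ROWS_API = 6
CREDITS_PER_GB_DATABASE = 4


def api_source_cost(rows: int) -> float:
    """Estimated USD cost of reading `rows` rows from an API source."""
    return rows / 1_000_000 * CREDITS_PER_MILLION_ROWS_API * CREDIT_PRICE_USD


def database_source_cost(gigabytes: float) -> float:
    """Estimated USD cost of reading `gigabytes` GB from a database source."""
    return gigabytes * CREDITS_PER_GB_DATABASE * CREDIT_PRICE_USD


if __name__ == "__main__":
    # The 7 GB database example: 7 * 4 credits * $2.50
    print(database_source_cost(7))        # 70.0
    # The same 30 million rows read from an API source: 30 * 6 credits * $2.50
    print(api_source_cost(30_000_000))    # 450.0
```

Note that database sources are billed by size and API sources by row count, so the same data can cost noticeably more or less depending on the source type.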
Fivetran: Pricing Based on Monthly Active Rows
Fivetran’s pricing is based on Monthly Active Rows (MAR), calculated as the number of rows added or updated in a month. The cost per MAR varies with plan and usage, with detailed information available in Fivetran’s pricing documentation. For instance, processing 1 million MAR per month costs around $500/month on a starter plan and approximately $750/month on a standard plan. To compare features across plans, refer to their feature comparison page.
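To estimate your own MAR before committing to a plan, you can count the rows touched per month in a change log. The sketch below is our interpretation and not Fivetran's billing logic: it assumes a row counts once per month no matter how often it is updated, identified by table and primary key. Check Fivetran's documentation for the exact definition. The event format is hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

# A change event: (month as "YYYY-MM", table name, primary key of the row).
ChangeEvent = Tuple[str, str, str]


def monthly_active_rows(changes: Iterable[ChangeEvent]) -> Dict[str, int]:
    """Count distinct (table, primary key) pairs added or updated per month.

    Assumption: repeated updates to the same row within a month count once.
    """
    active: Dict[str, Set[Tuple[str, str]]] = defaultdict(set)
    for month, table, primary_key in changes:
        active[month].add((table, primary_key))
    return {month: len(rows) for month, rows in active.items()}


if __name__ == "__main__":
    events = [
        ("2023-01", "orders", "1"),
        ("2023-01", "orders", "1"),   # second update of the same row
        ("2023-01", "orders", "2"),
        ("2023-01", "users", "1"),    # same key, different table
        ("2023-02", "orders", "1"),
    ]
    print(monthly_active_rows(events))
```

Under this reading, tables that are updated frequently but contain few distinct rows are cheap, while append-heavy tables such as event logs drive the bill.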
Meltano: Cloud solution in Beta
Meltano is primarily an open-source product, with a cloud offering in beta for which pricing is not yet available. The only costs of using Meltano as OSS are hosting (cloud or on-premise) and, potentially, additional engineering labor to maintain the service.
Conclusion: Navigating the Ingestion Tool Landscape
In this comparison of three popular ingestion tools, Fivetran, Airbyte, and Meltano, we’ve explored the distinct features, pricing models, and capabilities they offer. Fivetran is a fully managed solution that excels in mature connectors and ease of use, but it comes with a higher price tag. Airbyte provides a flexible cloud and open-source offering that allows greater control and lower costs, though its connectors are still evolving. Meltano takes a developer-focused approach with an open-source solution that relies on code and configuration files, making it ideal for teams with strong technical skills.
Ultimately, the ideal choice among these tools depends on your organization’s specific requirements, resources, and expertise. By carefully considering the factors mentioned and understanding the strengths of each tool, you can seamlessly integrate data into your stack, setting the foundation for effective and data-driven decision-making in your business.
About the author
Maximilian is a machine learning enthusiast, experienced data engineer, and co-founder of BiteStreams. In his free time he listens to electronic music and is into photography.