Trino and dbt open source data tools snuggle closer with integrated SaaS


Two SaaS products built on open source data management and analytics technologies have joined forces in a move hoped to attract users who want to model and manage data for crunching.

The partnership between dbt and Starburst aims to serve a large market by helping prepare data for analytics without moving it, one analyst told The Register.

Starburst is the company built around open source Trino (formerly Presto), the analytics and data lake project that originated in Facebook's Hadoop environment and counts AWS, Salesforce and Pinterest among its community. dbt, on the other hand, is the company built around the open source tool of the same name, which helps organizations model, manage and predict the data transformations needed for complex "internet scale" analytics. Stock exchange Nasdaq, engineering company Vestas and martech giant HubSpot are among its customers.

Starburst co-founder Matt Fuller said dbt allows analytics engineers to model data in a higher-level language but outputs SQL to manipulate data in a database or data lake such as Starburst.

"It's a really complementary technology," he told The Register.

Starburst also allows users to analyze data outside its data lake using SQL, including systems such as MySQL or PostgreSQL, as well as non-relational systems like MongoDB, Kafka and Elastic.

With customers already using Trino and dbt together, it made sense to integrate them in the companies' SaaS products, dbt Cloud and Starburst Galaxy.

"People were previously using [dbt Core] with Galaxy, but it's a little cumbersome because Galaxy is a fully managed offering and dbt Core is open source, so you have to manage it yourself. With this announcement, you can now use both products together as managed offerings, and that wasn't possible before," he said.

Analyst Kevin Petrie, research vice president at Eckerson Group, said the combined service was targeting a large market.

"Enterprise environments are more distributed than ever, with data residing on-premises and in two or more clouds. This makes it tricky to move and prepare data for analytics projects. By using Starburst's federated query engine alongside dbt's transformation engine, data teams can prepare data for analytics without needing to move it. So they can analyze a wider range of data, wherever it sits, for a given analytics project.


"They can use Starburst to query the distributed data, and dbt to clean, model and document it, with no need to ingest it across platforms."

A string of data warehouse and analytics vendors have become interested in offering users the possibility of bringing analytics to the data, without moving the data into a data warehouse or data lake. Teradata worked with Starburst to adapt Trino for this purpose in its QueryGrid product in 2020.

More recently, Google BigQuery, Snowflake and Cloudera announced their adoption of Apache Iceberg, the open source table format that originated at Netflix.

Starburst also has an Iceberg connector, but Fuller argued its approach was more open than the data warehouse vendors' when applied to a data lakehouse, the recently coined concept of combining data lakes and data warehouses.

"I'm glad they're finally catching up to understanding the value of Iceberg, but I don't think they quite get it right," Fuller said. "Iceberg and Trino are completely independent open source projects. Combined, they create a truly open data lakehouse. If you do want to use them together as a commercial offering, there is Starburst Galaxy and Tabular, which is the company behind Iceberg. The difference with Snowflake and the other approaches is they have limitations for it. In some cases, the catalog for Iceberg tables isn't accessible to other tools, for example. There's always a slight lock-in angle."

Petrie told us: "Enterprises want to consolidate as much data as they can onto cloud platforms such as Snowflake, BigQuery or Databricks. But data gravity and migration complexity prevent them from moving everything to just one platform. So I think many environments will use both consolidated platforms such as Snowflake and query engines such as Starburst or Dremio to support their analytics projects." ®