Keep Canvases Moving with DuckDB on the Server

Product Updates

David Tomasoni-Major

February 20, 2025

April 16, 2025

We're super excited to announce the general availability of DuckDB on the server. This is our way of dramatically speeding up queries while reducing compute costs for customers, helping you drive adoption throughout your organization with a great user experience and without worrying about unexpected costs.

It's a simple truth that when data is widely accessible, explorable and responsive, teams can better identify and tackle business problems that would otherwise be lost in the static of slow-moving dashboards.

If you use Count today, you are already benefiting from DuckDB running in your browser. Our customers run around 80% of their queries this way, dramatically reducing their compute costs. With this new feature, we can offload the majority of the remaining queries.

Queries can now run in one of three locations: in your data warehouse, in serverless VMs, or in the browser. We balance across the last two to deliver quicker and cheaper queries.

We believe that by bringing on-demand, serverless DuckDB instances together with that existing capability, we are now offering one of the most computationally sophisticated BI tools on the market. We're taking more and more of the load, and therefore cost, away from your database so that compute costs never factor into decisions about how you use data as an organization.

This feature is available today to all our users in Explore, Scale, and Enterprise workspaces.

Duck Duck Go (explore)

You can easily switch canvas cells to query against DuckDB rather than the original data source they draw from. You'll see this reflected in the cell status bar, and most likely in the processing time too!

Behind the scenes, we'll then run these queries either in the browser or, from today, on demand in VMs within our secure infrastructure. We stream results efficiently between cells and automatically cache data as we go, letting DuckDB decide the most effective way of delivering your results quickly.

Switch cells as early as possible downstream of a large data query to improve the performance of subsequent analysis.

In the overwhelming majority of instances, shifting queries to the server or browser will be quicker, because we can deliver and cache the exact slices of data needed downstream.

And in almost every instance, it will also be cheaper, because fewer queries run against your data warehouse. We've heard many customers talk of compute costs rapidly outpacing BI costs and limiting the widest usage of data in their organization.

When you're working with a great technology like DuckDB, it's tempting to contrive some A vs. B comparison showing how much faster it is at aggregations and filtering. But let's be honest: BigQuery will dance as quickly as you pay it to. We believe Count (and companies) are at their best when enabling rapid iteration and exploration, rather than just going fast along a fixed analytical pipeline.

Here is how we have been using this functionality internally:

Create a BigQuery cell to pull from the source database table, limited to 10k rows (the default).
Use DuckDB on the client to build some visuals iteratively. Here you benefit from the zero latency and rapid feedback loops of having the data locally in your client.
Once you're happy, change the upstream cell to DuckDB on the server to remove the row limit.
The downstream DuckDB visual will now start running on the server. It's a bit slower, but it's still not hitting your database, and you haven't had to change anything.
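
To make the last step more concrete, here is a rough sketch of how a downstream cell's run location could follow its upstream cell. This is not Count's actual implementation: the `Cell` class, the `on_server` flag and the `resolve_location` rule are all hypothetical names, and the rule itself is only an assumption inferred from the workflow above.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class Location(Enum):
    WAREHOUSE = "warehouse"
    BROWSER = "browser"
    SERVER = "server"


@dataclass
class Cell:
    # Hypothetical model of a canvas cell, for illustration only.
    name: str
    engine: str                      # "bigquery" or "duckdb"
    on_server: bool = False          # assumed per-cell "run on server" switch
    upstream: List["Cell"] = field(default_factory=list)


def resolve_location(cell: Cell) -> Location:
    """Assumed rule: non-DuckDB cells run in the warehouse; a DuckDB cell runs
    on the server if it, or any cell it reads from, runs on the server."""
    if cell.engine != "duckdb":
        return Location.WAREHOUSE
    if cell.on_server:
        return Location.SERVER
    if any(resolve_location(up) is Location.SERVER for up in cell.upstream):
        return Location.SERVER
    return Location.BROWSER


# Prototype: a row-limited BigQuery cell feeding a DuckDB visual.
source = Cell("orders", engine="bigquery")
chart = Cell("orders_chart", engine="duckdb", upstream=[source])
location_while_prototyping = resolve_location(chart)

# Switch only the upstream cell to DuckDB on the server.
source.engine = "duckdb"
source.on_server = True
location_after_switch = resolve_location(chart)
```

In this toy model the chart cell is never edited, yet its location moves from the browser to the server once the upstream cell is switched, which mirrors the "you haven't had to change anything" point above.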

The opportunity

From examining how our customers use their data, we see two big areas where this feature will have an impact:

maintaining the performance of canvases when large result sets are dynamically filtered, for example with control cells
handling plain old "big queries" whose row-level detail is used in further analysis

Switching to DuckDB and local cells can have an immediate impact in both of these instances.
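
As a rough illustration of why this helps in the first case, the toy model below counts how often a warehouse is queried while a control value changes. It is a sketch under stated assumptions, not how Count works internally: `Warehouse`, `CachedSourceCell` and `orders_by_region` are hypothetical names, the data is made up for the example, and plain Python filtering stands in for the SQL that DuckDB would run over the cached result.

```python
from collections import Counter


class Warehouse:
    """Stand-in for a data warehouse that counts every query it serves."""

    def __init__(self, rows):
        self._rows = rows
        self.queries_run = 0

    def query(self):
        self.queries_run += 1
        return list(self._rows)


class CachedSourceCell:
    """Queries the warehouse once, then serves the cached rows downstream."""

    def __init__(self, warehouse):
        self._warehouse = warehouse
        self._cache = None

    def rows(self):
        if self._cache is None:
            self._cache = self._warehouse.query()
        return self._cache


def orders_by_region(source, min_amount):
    """Downstream cell: filter and aggregate locally for a given control value."""
    counts = Counter(
        row["region"] for row in source.rows() if row["amount"] >= min_amount
    )
    return dict(counts)


# Made-up rows, purely for the example.
warehouse = Warehouse([
    {"region": "EU", "amount": 120},
    {"region": "EU", "amount": 40},
    {"region": "US", "amount": 300},
])
source = CachedSourceCell(warehouse)

# Simulate a user moving a control through three threshold values.
results = [orders_by_region(source, threshold) for threshold in (0, 100, 200)]
warehouse_queries = warehouse.queries_run
```

However many times the control changes, the warehouse in this model is queried only once; every re-run of the downstream cell works from the cached rows.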

We can now move even more queries off your infrastructure

This is just the latest in our efforts to keep canvases performant. It naturally combines with our existing query caching and scheduling features that can reduce query load further.