Databento sponsors Polars: accelerating data wrangling in financial applications

June 30, 2023

Databento is proud to sponsor Polars, a high-performance dataframe library ideal for fast data wrangling. By leveraging Polars' capabilities, market data users can process and analyze large datasets with ease, allowing for timely and data-driven decisions in response to market changes.

Polars' ability to handle large datasets efficiently has made it a go-to solution for data exploration in finance. The library is particularly useful for single-machine workflows that process datasets of fewer than 1 billion rows, a common case in quantitative trading.

Databento is not the only financial user that has found Polars game-changing. Arthur Whitney, creator of kdb, notes his appreciation for Polars: "Time—user and machine—is expensive. Pandas and Polars are free—god bless them."

Polars offers an exciting alternative to Pandas, with a few notable properties:

  • Impressive performance gains: Benchmark tests have shown Polars outperforming Pandas and other competing solutions by 8–15x. This stems from several optimizations in Polars: parallel processing and SIMD, lazy evaluation and fewer copies, efficient memory traversal patterns, and more.
  • Familiar syntax: Polars offers a syntax similar to Pandas, making it easy for users to transition while delivering significant speed improvements.
  • Efficient columnar format: Polars is built on Apache Arrow and provides bindings for its excellent memory-mapping API, which makes it easier to handle datasets much larger than available RAM (typical of tick-level and order book data) on a single machine.
  • Cross-compatibility with Rust: pyo3-polars exposes pyo3 extensions for Polars' primary data structures, making it easy to extend Polars with user-defined functions compiled in Rust.

Another reason we chose to sponsor Polars is our common DNA in Rust. We've been heavy proponents of Rust for finance, evolving from a C/C++ codebase to a codebase that's primarily written in Rust. (We've documented our journey in Rust here.)

At the moment, Databento users can convert their data from DBN (Databento Binary Encoding) to Polars indirectly: first decode it to a Pandas dataframe, then pass that dataframe to pl.from_pandas.

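import databento as db
import polars as pl

# With no arguments, the client reads the API key from the
# DATABENTO_API_KEY environment variable.
client = db.Historical()

# Request Nasdaq TotalView-ITCH trades for three symbols.
data = client.timeseries.get_range(
    dataset='XNAS.ITCH',
    schema='trades',
    symbols=['GOOG', 'AAPL', 'BAC'],
    start='2023-06-21'
)

# Decode the DBN data into a Pandas dataframe.
pd_df = data.to_df()

# Convert to Polars. Note that pl.from_pandas does not carry over the
# Pandas index by default, so call pd_df.reset_index() first if the
# index holds values you need.
df = pl.from_pandas(pd_df)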

We'll look to provide additional support for Polars dataframes as it makes its way into more of our internal workflows and financial applications.

Overall, we see Polars as a compelling choice for financial applications involving large datasets, and the team at Databento looks forward to supporting Polars' continued development.