
Databento Binary Encoding

Databento Binary Encoding (DBN) is an extremely fast message encoding and storage format for normalized market data. The DBN specification includes a simple, self-describing metadata header and a fixed set of struct definitions, which enforce a standardized way to normalize market data.

All official Databento client libraries use DBN under the hood, both as a data interchange format and for in-memory representation of data. DBN is also the default encoding for all Databento APIs, including live data streaming, historical data streaming, and batch flat files.

Getting started with DBN

The easiest way to get started with DBN is through any of the official Databento client libraries, which support reading and writing DBN files.

Other resources are also available:

  • The dbn Rust crate is the reference implementation of DBN. It provides a library for encoding and decoding DBN, as well as for converting DBN to CSV and JSON.
  • dbn-cli is a command-line tool that can read DBN files, transcode them to CSV or JSON, and print the output or write it to disk. You can install it with cargo install dbn-cli.

Why should you use DBN?

The key advantages of using DBN are:

  • End-to-end. DBN can be used to store and transport normalized data across all components of a typical trading system. It fulfills the requirements of a file format for efficient storage, a message encoding for fast real-time streaming, and an in-memory representation of market data for a low-latency system—all at once. This simplifies your trading system and eliminates the need for multiple serialization formats. It also ensures that market data is immutable, lossless, and consistent as it passes between components.
  • Use the same code for historical and live. Our client libraries exploit the end-to-end nature of DBN, allowing you to use the same code for historical and live data; you can write an event-driven trading platform that runs the exact same code in backtesting and in production trading.
  • Zero-copy. DBN data is structured the same way whether in memory, on the wire, or on disk. Thus, the data is read and written directly as-is, without costly encoding or decoding steps that shuttle it through the CPU and back out (see the sketch after this list).
  • Symbology metadata. The DBN protocol includes a lightweight header that provides metadata for interpreting and using the market data payload, such as symbology mappings, so a DBN file is self-sufficient for many use cases.
  • Highly compressible. DBN strictly uses fixed lengths and offsets for all fields. This layout enables typical compression algorithms, such as zstd and lz4, to achieve high compression ratios.
  • Optimized for modern CPUs. The predictable layout of DBN records also allows for highly-optimized, sequential access patterns that take advantage of instruction pipelining and prefetching on modern CPUs. The struct definitions are also deliberately designed so that most records fit into a single cache line.
  • Extremely fast. DBN achieves extremely fast speeds on reads and writes. Most use cases of DBN are compression-bound or I/O-bound while only using a single CPU core. DBN has been used in environments with a median internal latency of 6.1 microseconds; we've also seen user-reported benchmarks of full order book replay at over 19.1 million messages per second using our C++ client library on a Google Cloud VM.
  • Normalization format. Using DBN also automatically means you're adopting its normalization format. While there are many ways to normalize data, our team arrived at these best practices after many years of combined experience at top-tier trading firms and integrating dozens of trading venues. For example, DBN allows you to replay full order book data at I/O-bound speeds, backtest with passive orders in precise sequence, and losslessly achieve much of what's possible with raw packet captures, with severalfold improvements in speed and storage requirements.
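
To make the zero-copy point concrete, here is a minimal sketch in Rust. It is not the dbn crate's API; the struct and helper are hypothetical, mirror the 16-byte record header described under Records below, and assume a little-endian host, since all DBN fields are little-endian.

use std::mem::size_of;

/// Hypothetical struct mirroring the 16-byte DBN record header (see Records below).
/// #[repr(C)] fixes the field layout so it matches the wire and on-disk bytes.
#[repr(C)]
#[derive(Debug, Clone, Copy)]
struct RecordHeader {
    length: u8,         // record length in 32-bit words
    rtype: u8,          // record type
    publisher_id: u16,  // dataset and venue, assigned by Databento
    instrument_id: u32, // numeric instrument ID
    ts_event: u64,      // event timestamp, UNIX epoch nanoseconds
}

/// Reinterpret the first 16 bytes of `buf` as a RecordHeader without per-field decoding.
fn read_header(buf: &[u8]) -> Option<RecordHeader> {
    if buf.len() < size_of::<RecordHeader>() {
        return None;
    }
    // Safety: RecordHeader is #[repr(C)], contains no padding, and every bit pattern is valid.
    Some(unsafe { std::ptr::read_unaligned(buf.as_ptr().cast::<RecordHeader>()) })
}

Because the buffer already is the record, the same bytes can be memory-mapped from a file, received from a socket, or passed between components without transformation.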

Layout

A valid DBN stream or file has two parts: metadata, immediately followed by records.

The following diagram shows the field layout of the DBN encoding:

Version 1
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|             magic string = "DBN"              |  version = 1  |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                            length                             |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                                                               |
+                                                               +
|                                                               |
+                            dataset                            +
|                                                               |
+                                                               +
|                                                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|             schema            |                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+                               +
|                       start (UNIX nanos)                      |
+                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                               |                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+                               +
|                        end (UNIX nanos)                       |
+                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                               |                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+                               +
|                      limit (max records)                      |
+                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                               |                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+                               +
|                            reserved                           |
+                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                               |   stype_in    |   stype_out   |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|    ts_out     |                                               |
+-+-+-+-+-+-+-+-+                                               +
|                                                               |
+                                                               +
|                                                               |
+                                                               +
|                                                               |
+                                                               +
|                                                               |
+                                                               +
|                                                               |
+                                                               +
|                 reserved (47 bytes of padding)                |
+                                                               +
|                                                               |
+                                                               +
|                                                               |
+                                                               +
|                                                               |
+                                                               +
|                                                               |
+                                                               +
|                                                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                    schema_definition_length                   |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                  schema_definition (variable)                 |
|                              ...                              |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                         symbols_count                         |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                       symbols (variable)                      |
|                              ...                              |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                         partial_count                         |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                       partial (variable)                      |
|                              ...                              |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                        not_found_count                        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                      not_found (variable)                     |
|                              ...                              |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                         mappings_count                        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                      mappings (variable)                      |
|                              ...                              |
+-+-+-+-+-+-+-+-+-+-end metadata; begin body--+-+-+-+-+-+-+-+-+-+
|                            records                            |
|                              ...                              |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Versions 2 and above
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|             magic string = "DBN"              |    version    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                            length                             |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                                                               |
+                                                               +
|                                                               |
+                            dataset                            +
|                                                               |
+                                                               +
|                                                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|             schema            |                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+                               +
|                       start (UNIX nanos)                      |
+                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                               |                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+                               +
|                        end (UNIX nanos)                       |
+                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                               |                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+                               +
|                      limit (max records)                      |
+                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                               |   stype_in    |   stype_out   |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|    ts_out     |        symbol_cstr_len        |               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+               |
|                                                               |
+                                                               +
|                                                               |
+                                                               +
|                                                               |
+                                                               +
|                                                               |
+                                                               +
|                                                               |
+                                                               +
|                                                               |
+                                                               +
|                 reserved (53 bytes of padding)                |
+                                                               +
|                                                               |
+                                                               +
|                                                               |
+                                                               +
|                                                               |
+                                                               +
|                                                               |
+                                                               +
|                                                               |
+                                                               +
|                                                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                    schema_definition_length                   |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                  schema_definition (variable)                 |
|                              ...                              |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                         symbols_count                         |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                       symbols (variable)                      |
|                              ...                              |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                         partial_count                         |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                       partial (variable)                      |
|                              ...                              |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                        not_found_count                        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                      not_found (variable)                     |
|                              ...                              |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                         mappings_count                        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                      mappings (variable)                      |
|                              ...                              |
+-+-+-+-+-+-+-+-+-+-end metadata; begin body--+-+-+-+-+-+-+-+-+-+
|                            records                            |
|                              ...                              |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Metadata

Metadata is included at the beginning of every DBN stream or file. Basic information is found at the start of the metadata, followed by optional symbology mappings.

The metadata contains all of the parameters needed to construct a request for the exact same data via Databento's historical API. Likewise, if you fetched DBN-encoded data from Databento's historical API, the metadata header will contain the parameters of your original request.

The following table describes the metadata fields, in the order of appearance. All fields are little-endian.

| Field | Type | Description |
| --- | --- | --- |
| version | char[4] | "DBN" followed by a u8 giving the version of DBN the data is encoded in. |
| length | uint32_t | The length of the remaining metadata header, i.e. excluding version and length. |
| dataset | char[16] | The dataset ID (string identifier). |
| schema | uint16_t | The data record schema. u16::MAX indicates a potential mix of schemas and record types, which will always be the case for live data. |
| start | uint64_t | The start time of the query range in UNIX epoch nanoseconds. |
| end | uint64_t | The end time of the query range in UNIX epoch nanoseconds. u64::MAX indicates no end time was provided. |
| limit | uint64_t | The maximum number of records to return. 0 indicates no limit. |
| stype_in | uint8_t | The symbology type of input symbols. u8::MAX indicates a potential mix of types, such as with live data. |
| stype_out | uint8_t | The symbology type of output symbols. |
| ts_out | uint8_t | Whether each record has an appended gateway send timestamp. |
| symbol_cstr_len | uint16_t | The number of bytes in fixed-length symbol strings, including the null terminator. Versions 2 and above only; in version 1, symbol strings are always 22 bytes. |
| schema_definition_length | uint32_t | The number of bytes in the schema definition. |
| schema_definition | uint8_t[schema_definition_length] | Self-describing schema, to be implemented in the future. |
| symbols_count | uint32_t | The number of symbols in the original query. |
| symbols | char[symbols_count][symbol_cstr_len] | The symbols from the original query. |
| partial_count | uint32_t | The number of symbols partially resolved. |
| partial | char[partial_count][symbol_cstr_len] | The partially resolved symbols. |
| not_found_count | uint32_t | The number of unresolved symbols. |
| not_found | char[not_found_count][symbol_cstr_len] | The unresolved symbols. |
| mappings_count | uint32_t | The number of symbols at least partially resolved. |
| mappings | SymbolMapping[mappings_count] | The SymbolMappings, one for each resolved symbol. |

where SymbolMapping has the following structure:

| Field | Type | Description |
| --- | --- | --- |
| raw_symbol | char[symbol_cstr_len] | The requested symbol, in stype_in. |
| interval_length | uint32_t | The number of MappingIntervals in intervals. |
| intervals | MappingInterval[interval_length] | The MappingIntervals associated with raw_symbol. |

and where MappingInterval has the following structure:

| Field | Type | Description |
| --- | --- | --- |
| start_date | uint32_t | The start date of the interval, as a YYYYMMDD integer. |
| end_date | uint32_t | The end date of the interval, as a YYYYMMDD integer. |
| symbol | char[symbol_cstr_len] | The symbol in stype_out to which raw_symbol corresponds for the interval between start_date and end_date, where symbol_cstr_len is specified earlier in the metadata. This is often instrument_id because it is the default stype_out. |
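
As a concrete illustration of the fixed-size portion of this layout, here is a minimal, dependency-free Rust sketch (again, not the dbn crate) that reads the fields up to symbol_cstr_len from a version 2 or later metadata header. The struct and function names are hypothetical; the offsets follow the diagram and table above, and all multi-byte fields are read as little-endian.

use std::convert::TryInto;

/// Hypothetical container for the fixed-size portion of a DBN v2+ metadata header.
#[derive(Debug)]
struct MetadataPrelude {
    version: u8,
    length: u32,          // length of the remaining metadata header, excluding version and length
    dataset: String,      // 16-byte, null-padded dataset ID
    schema: u16,          // u16::MAX indicates mixed schemas
    start: u64,           // UNIX epoch nanoseconds
    end: u64,             // UNIX epoch nanoseconds; u64::MAX means no end time
    limit: u64,           // 0 means no limit
    stype_in: u8,
    stype_out: u8,
    ts_out: u8,
    symbol_cstr_len: u16, // versions 2 and above; reserved bytes in version 1
}

/// Parse the fixed-size fields at the documented little-endian offsets.
fn parse_prelude(buf: &[u8]) -> Option<MetadataPrelude> {
    if buf.len() < 55 || &buf[0..3] != b"DBN" {
        return None; // too short or missing magic string
    }
    // dataset is null-padded to 16 bytes; trim at the first null terminator.
    let dataset_end = buf[8..24].iter().position(|&b| b == 0).map_or(24, |i| 8 + i);
    Some(MetadataPrelude {
        version: buf[3],
        length: u32::from_le_bytes(buf[4..8].try_into().ok()?),
        dataset: String::from_utf8_lossy(&buf[8..dataset_end]).into_owned(),
        schema: u16::from_le_bytes(buf[24..26].try_into().ok()?),
        start: u64::from_le_bytes(buf[26..34].try_into().ok()?),
        end: u64::from_le_bytes(buf[34..42].try_into().ok()?),
        limit: u64::from_le_bytes(buf[42..50].try_into().ok()?),
        stype_in: buf[50],
        stype_out: buf[51],
        ts_out: buf[52],
        symbol_cstr_len: u16::from_le_bytes(buf[53..55].try_into().ok()?),
    })
}

The variable-length portions that follow (schema_definition, symbols, partial, not_found, and mappings) can then be read sequentially using the count fields described above.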

Records

The metadata is immediately followed by DBN records. A valid DBN stream or file contains zero or more records.

All records begin with the same 16-byte RecordHeader with the following structure:

| Field | Type | Description |
| --- | --- | --- |
| length | uint8_t | The length of the record in 32-bit words. |
| rtype | uint8_t | The record type. Each schema corresponds with a single rtype value. See Rtype. |
| publisher_id | uint16_t | The publisher ID assigned by Databento, which denotes the dataset and venue. |
| instrument_id | uint32_t | The numeric instrument ID. |
| ts_event | uint64_t | The event timestamp as the number of nanoseconds since the UNIX epoch. |

See the Schemas and data formats section for a full list of fields for the record associated with each schema.
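
Putting the header together with its length field, a sketch of walking the records section could look like the following, reusing the hypothetical RecordHeader and read_header from the earlier zero-copy sketch. Since length counts 32-bit words, the next record starts length * 4 bytes after the current one.

/// Walk a buffer of concatenated DBN records, calling `f` with each header and
/// the record's full bytes (header included).
fn for_each_record(body: &[u8], mut f: impl FnMut(&RecordHeader, &[u8])) {
    let mut offset = 0usize;
    while let Some(hdr) = read_header(&body[offset..]) {
        let record_len = hdr.length as usize * 4; // length is in 32-bit words
        if record_len < size_of::<RecordHeader>() || offset + record_len > body.len() {
            break; // malformed or truncated record
        }
        f(&hdr, &body[offset..offset + record_len]);
        offset += record_len;
    }
}

In practice, a decoder would dispatch on hdr.rtype to interpret the bytes after the header as the record struct for the corresponding schema.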

Versioning

We use the version field in the metadata header to signal changes to the structure of record types and metadata.

Version 2

The following was changed:

  • Metadata:
    • Sets version to 2
    • Adds symbol_cstr_len field
    • Rearranges padding
    • The fixed-length strings for symbology are now defined to have symbol_cstr_len characters (currently 71), whereas in version 1 they always had 22
  • InstrumentDefMsg (definition schema):
    • raw_symbol now has symbol_cstr_len characters (71)
    • Rearranges padding
  • SymbolMappingMsg (live symbology):
    • stype_in_symbol and stype_out_symbol now have symbol_cstr_len characters (71)
    • Adds stype_in and stype_out fields
    • Removes padding
  • ErrorMsg (gateway errors in live):
    • Adds space to err for longer error messages
    • Adds code and is_last fields
  • SystemMsg (non-error gateway messages in live):
    • Adds space to msg for longer messages
    • Adds code field

Version 3

This set of changes adds support for strategy legs to the definition schema and expands the quantity field in the statistics schema.

  • Adds 8-byte alignment padding to the end of the metadata
  • Expands quantity to 64 bits in StatMsg (statistics schema)
  • InstrumentDefMsg (definition schema):
    • A definition record will be created for each strategy leg
    • Adds the following leg fields:
      • leg_count
      • leg_index
      • leg_instrument_id
      • leg_raw_symbol
      • leg_side
      • leg_underlying_id
      • leg_instrument_class
      • leg_ratio_qty_numerator
      • leg_ratio_qty_denominator
      • leg_ratio_price_numerator
      • leg_ratio_price_denominator
      • leg_price
      • leg_delta
    • Expands asset to 11 bytes
    • Expands raw_instrument_id to 64 bits to support publishers that use larger IDs
    • Removes the statistics-schema-related fields trading_reference_price, trading_reference_date, and settl_price_type
    • Removes the status-schema-related field md_security_trading_status

Note: the CSV and JSON encodings are also affected by the new fields.

Currently, version 2 is used for the IFEU.IMPACT and NDEX.IMPACT datasets. The DBN crate and client libraries will continue to support decoding version 1 data.

Upgrading between versions

DBN version 1 files can be upgraded to version 2 with the dbn CLI tool by passing the --upgrade or -u flag.

dbn version1.dbn --output version2.dbn --upgrade

Comparison with other encodings and formats

DBN is designed specifically for normalized market data. It adopts a fixed set of struct definitions, also called message schemas, for this purpose. It's important to note that DBN is not a general-purpose serialization format like Simple Binary Encoding (SBE) or Google Protocol Buffers (protobufs), which provide a flexible schema definition language. Unlike these formats, DBN doesn't offer tools for generating decoders or encoders from any user-specified schema.

When comparing DBN to most encodings or serialization formats, a critical difference is that DBN is a zero-copy encoding. Moreover, what makes DBN most unique is that it's simultaneously intended for three common use cases in a trading system: file format, real-time messaging format, and in-memory representation. This is a very specific convergence of use cases that manifests frequently in financial trading systems.

Other encodings or formats typically used in situations where DBN would be a suitable replacement include:

  • SBE
  • Apache Parquet
  • Apache Arrow
  • Feather

Compared with these, DBN is intended to be good at all three of the trading system use cases mentioned above—so you don't have to mix multiple serialization formats in one system—while the others tend to excel in only one or two.

A single format for all use cases carries a more important benefit for trading than just the performance upside that comes with minimizing copies; it ensures that market data is immutable when it passes through your trading system. The following diagram helps you visualize the difference between a potential trading system that uses DBN compared to a typical trading system that doesn't.

| Typical trading environment (top chart) | Trading environment using DBN (bottom chart) |
| --- | --- |
| Market data in multiple message, file, and in-memory formats | Market data in a single format |
| Multiple layers of serialization and deserialization | No transformation of data |
| Incurs risk of inconsistent state between components using market data | Eliminates risk of inconsistent state between components using market data |
| Complex, slow code | Simple, fast code |

[Figure: typical trading environment]

[Figure: trading environment using DBN]

Immutable market data makes it easy to align live trading with post-production logs and historical data; it makes it easy to use the same code for live trading, backtesting, and exploratory research; it also makes it easy to write GUIs that need to accurately synchronize order events with market data events, especially the market data events that triggered those order events. In short, using the same encoding or format everywhere ensures that state is synchronized throughout the distributed parts of your trading system.

For these reasons, most mature trading firms eventually end up implementing their own proprietary encoding that resembles DBN.

The following table summarizes other key comparisons:

|  | DBN | SBE | Parquet | Arrow |
| --- | --- | --- | --- | --- |
| Schema definition | Fixed schemas | XML | Thrift, Avro, Protobuf | Arrow object model |
| Layout | Sequential | Sequential | Column-oriented | Column-oriented |
| Zero copy | Yes | Yes | No | Limited support |
| Suitable for real-time messaging | Yes | Yes | No | No |
| Suitable as file format | Yes | Yes | Yes | Through Feather |
| Metadata | Yes | No, user-defined | No, user-defined | No |
| Sequential read | Fastest | Fast | Moderate | Moderate |
| Sequential write | Fastest | Fast | Slowest | Moderate (Feather) |
| Compressed size | Small | Moderate | Smallest | Largest (Feather) |
| Transcoding to CSV | Yes | No | Through pandas | Yes |
| Transcoding to JSON | Yes | No | Through pandas | No |
| Mapping to pandas | Yes | No | Yes | Yes |
| Package size (lines of code) | 16.0k (v0.14.0) | 55.7k (v1.27.0) | 108.5k (v1.12.3) | 1.6M (v12.0.0) |
| Language support | Python, C++, Rust, C bindings | C++, Java, C# | 11+ languages | 11+ languages |
| Use case | Market data (storage, replay, research, real-time messaging, normalization, OMS, EMS, GUIs) | Direct venue connectivity | Storage file format | Data exploration |

Frequently asked questions

Isn't this basically a bunch of raw structs? What's so special about this?

Yes, pretty much! And it's not exactly novel—in our experience, most top-tier trading firms will have something similar already implemented, along with proprietary tooling to support it. The significance of DBN is that we're open sourcing the whole toolset, with many best practices for normalization and performance optimization, so that you don't have to reinvent the wheel.

Another purpose of DBN is to provide a standardized data interchange format that can be used for high-throughput, low-latency streaming between a data provider like us and you. At the time of DBN's initial release, we're not aware of any data provider that adopts a binary flat file or messaging format with similar zero-copy semantics.

Even if you don't want to use DBN exactly, it's a lightweight specification so it's easy to mimic some of its practices or fork our reference implementation for your own use case.

When should you not use DBN?

  • When you depend on many tools and frameworks in the Apache ecosystem. Many data processing tools have native support for Apache Parquet and Arrow. However, in our experience, mature trading environments generally use fewer general-purpose computation frameworks. In these cases, DBN is still excellent as an interchange format for receiving and storing data from Databento, and we still support converting DBN to pandas DataFrames and hence to the Arrow format.
  • If you don't use Databento, only ever plan on trading on one venue, have already written parsers for the raw feeds, and have direct extranet connectivity, then there's a strong argument for just using the venue's original wire protocol, such as ITCH, and even rolling your own, thinner normalization format.
  • If you have an academic or toy project and only plan on working with historical data. Many such projects employ relatively small amounts of data and don't require live data. In these circumstances, it makes sense to just store the data in your own, thinner binary format or a binary format with some structure like Parquet or HDF5.
  • If you have to support many teams with different trading styles and business functions on one platform, many of which only require low-frequency data. In this situation, the performance benefits of fixed schemas become much less important within your organization, and flexibility becomes more important. It's also quite likely that you'll have to constantly update your normalization format for new exploratory workflows. In those cases, DBN is still excellent as an interchange format for receiving and storing data from Databento, but your firm will likely benefit from converting DBN data into more flexible formats downstream.

Encoding, serialization format, protocol—what's the difference?

There are slight differences between these terms, but DBN is all three at once.

One way to look at DBN is that it's an OSI layer 6 presentation protocol, much like SBE—except that DBN is much stricter about message schemas, whereas SBE is flexible.

Data written in accordance with such a protocol can be persisted to disk, so it can also serve as a storage format; it can likewise be written on the wire as a message encoding or wire format. SBE excels as a message encoding but is less often used as a storage format, whereas DBN accommodates both equally well.

Why is the reference implementation written in Rust? Can I use it?

The majority of Databento's infrastructure is written in Rust, C, and Python. The reference implementation of DBN is a rewrite of the original C implementation and was originally written with internal use in mind.

Rust's ability to expose a C ABI makes it interoperable with multiple languages through thin bindings, and its memory management model makes it safe and performant. It's simple to integrate the Rust DBN library into your Python application, as seen in our Python client library.

Why is DBN sequential and not column-oriented? Aren't modern column-oriented layouts more optimized for querying?

The sequential layout of DBN makes it more performant for real-time messaging use cases.

Column-oriented formats do have the theoretical potential for more optimization in non-real time use cases, but this depends on the actual implementation. Our reference DBN implementation is heavily optimized and still on par with column-oriented formats like Apache Arrow on common use cases for historical market data.

Am I locked-in to a proprietary binary format here?

No, the DBN reference implementation is open-sourced under the permissive Apache 2.0 License. We also provide transcoders to convert your DBN data into CSV and JSON.

How is DBN being used currently?

We store upwards of 4 PB and over 30 trillion records of normalized historical data in DBN internally at Databento. Every single message that passes through our infrastructure gets encoded in DBN—billions of messages per day, at single-port peak messaging rates over 60 Gbps, spanning multiple asset classes and over 1.8 million instruments on any given day. It is used for all of our data schemas, including full order book, tick-by-tick trades, top of the book, OHLCV aggregates, venue statistics, instrument definitions, and more.

Most of our users, including some of the world's largest hedge funds and market making firms, are already using DBN through our client libraries, putting it through multiple production use cases that involve real-time streaming and historical flat files.