
Databento Binary Encoding

Databento Binary Encoding (DBN) is an extremely fast message encoding and storage format for normalized market data. The DBN specification includes a simple, self-describing metadata header and a fixed set of struct definitions, which enforce a standardized way to normalize market data.

All official Databento client libraries use DBN under the hood, both as a data interchange format and for in-memory representation of data. DBN is also the default encoding for all Databento APIs, including live data streaming, historical data streaming, and batch flat files.

Getting started with DBN

The easiest way to get started with DBN is through any of the official Databento client libraries, which support reading and writing DBN files.

Other resources are also available:

  • The dbn Rust crate is the reference implementation of DBN. It provides a library for encoding and decoding DBN, as well as for converting DBN to CSV and JSON.
  • dbn-cli is a command-line tool that can read DBN files, transcode them to CSV or JSON, and print the output or write it to disk. You can install it with cargo install dbn-cli.

Why should you use DBN?

The key advantages of using DBN are:

  • End-to-end. DBN can be used to store and transport normalized data across all components of a typical trading system. It fulfills the requirements of a file format for efficient storage, a message encoding for fast real-time streaming, and an in-memory representation of market data for a low-latency system—all at once. This simplifies your trading system and eliminates the need for multiple serialization formats. It also ensures that market data is immutable, lossless, and consistent as it passes between components.
  • Use the same code for historical and live. Our client libraries exploit the end-to-end nature of DBN, allowing you to use the same code for historical and live data; you can write an event-driven trading platform that runs the exact same code in backtesting and in production trading.
  • Zero-copy. DBN data is structured the same way whether in memory, on the wire, or on disk. Thus, the data is read and written directly as-is, without costly encoding or decoding steps that shuttle it through the CPU and back out (see the sketch after this list).
  • Symbology metadata. The DBN protocol includes a lightweight header that provides metadata for interpreting and using the market data payload, such as symbology mappings, so a DBN file is self-sufficient for many use cases.
  • Highly compressible. DBN strictly uses fixed lengths and offsets for all fields. This layout enables typical compression algorithms, such as zstd and lz4, to achieve high compression ratios.
  • Optimized for modern CPUs. The predictable layout of DBN records also allows for highly-optimized, sequential access patterns that take advantage of instruction pipelining and prefetching on modern CPUs. The struct definitions are also deliberately designed so that most records fit into a single cache line.
  • Extremely fast. DBN achieves extremely fast speeds on reads and writes. Most use cases of DBN are compression-bound or I/O-bound while only using a single CPU core. DBN has been used in environments with a median internal latency of 6.1 microseconds; we've also seen user-reported benchmarks of full order book replay at over 19.1 million messages per second using our C++ client library on a Google Cloud VM.
  • Normalization format. Using DBN also automatically means you're adopting its normalization format. While there are many ways to normalize data, our team arrived at these best practices after many years of combined experience at top-tier trading firms and integrating dozens of trading venues. For example, DBN allows you to replay full order book data at I/O-bound speeds, backtest with passive orders in precise sequence, and losslessly achieve much of what's possible with raw packet captures, with severalfold improvements in speed and storage requirements.
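
To make the zero-copy point concrete, here is a minimal sketch in Rust. It is not the dbn crate's API; the struct and helper are hypothetical, mirror the 16-byte record header described under Records below, and assume a little-endian host, since all DBN fields are little-endian.

use std::mem::size_of;

/// Hypothetical struct mirroring the 16-byte DBN record header (see Records below).
/// #[repr(C)] fixes the field layout so it matches the wire and on-disk bytes.
#[repr(C)]
#[derive(Debug, Clone, Copy)]
struct RecordHeader {
    length: u8,         // record length in 32-bit words
    rtype: u8,          // record type
    publisher_id: u16,  // dataset and venue, assigned by Databento
    instrument_id: u32, // numeric instrument ID
    ts_event: u64,      // event timestamp, UNIX epoch nanoseconds
}

/// Reinterpret the first 16 bytes of `buf` as a RecordHeader without per-field decoding.
fn read_header(buf: &[u8]) -> Option<RecordHeader> {
    if buf.len() < size_of::<RecordHeader>() {
        return None;
    }
    // Safety: RecordHeader is #[repr(C)], contains no padding, and every bit pattern is valid.
    Some(unsafe { std::ptr::read_unaligned(buf.as_ptr().cast::<RecordHeader>()) })
}

Because the buffer already is the record, the same bytes can be memory-mapped from a file, received from a socket, or passed between components without transformation.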

Layout

A valid DBN stream or file has two parts: metadata, immediately followed by records.

The following diagram shows the field layout of the DBN encoding:

Version 1
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|             magic string = "DBN"              |  version = 1  |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                            length                             |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                                                               |
+                                                               +
|                                                               |
+                            dataset                            +
|                                                               |
+                                                               +
|                                                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|             schema            |                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+                               +
|                       start (UNIX nanos)                      |
+                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                               |                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+                               +
|                        end (UNIX nanos)                       |
+                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                               |                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+                               +
|                      limit (max records)                      |
+                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                               |                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+                               +
|                            reserved                           |
+                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                               |   stype_in    |   stype_out   |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|    ts_out     |                                               |
+-+-+-+-+-+-+-+-+                                               +
|                                                               |
+                                                               +
|                                                               |
+                                                               +
|                                                               |
+                                                               +
|                                                               |
+                                                               +
|                                                               |
+                                                               +
|                 reserved (47 bytes of padding)                |
+                                                               +
|                                                               |
+                                                               +
|                                                               |
+                                                               +
|                                                               |
+                                                               +
|                                                               |
+                                                               +
|                                                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                    schema_definition_length                   |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                  schema_definition (variable)                 |
|                              ...                              |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                         symbols_count                         |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                       symbols (variable)                      |
|                              ...                              |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                         partial_count                         |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                       partial (variable)                      |
|                              ...                              |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                        not_found_count                        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                      not_found (variable)                     |
|                              ...                              |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                         mappings_count                        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                      mappings (variable)                      |
|                              ...                              |
+-+-+-+-+-+-+-+-+-+-end metadata; begin body--+-+-+-+-+-+-+-+-+-+
|                            records                            |
|                              ...                              |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Versions 2 and above
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|             magic string = "DBN"              |    version    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                            length                             |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                                                               |
+                                                               +
|                                                               |
+                            dataset                            +
|                                                               |
+                                                               +
|                                                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|             schema            |                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+                               +
|                       start (UNIX nanos)                      |
+                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                               |                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+                               +
|                        end (UNIX nanos)                       |
+                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                               |                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+                               +
|                      limit (max records)                      |
+                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                               |   stype_in    |   stype_out   |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|    ts_out     |        symbol_cstr_len        |               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+               |
|                                                               |
+                                                               +
|                                                               |
+                                                               +
|                                                               |
+                                                               +
|                                                               |
+                                                               +
|                                                               |
+                                                               +
|                                                               |
+                                                               +
|                 reserved (53 bytes of padding)                |
+                                                               +
|                                                               |
+                                                               +
|                                                               |
+                                                               +
|                                                               |
+                                                               +
|                                                               |
+                                                               +
|                                                               |
+                                                               +
|                                                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                    schema_definition_length                   |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                  schema_definition (variable)                 |
|                              ...                              |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                         symbols_count                         |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                       symbols (variable)                      |
|                              ...                              |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                         partial_count                         |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                       partial (variable)                      |
|                              ...                              |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                        not_found_count                        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                      not_found (variable)                     |
|                              ...                              |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                         mappings_count                        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                      mappings (variable)                      |
|                              ...                              |
+-+-+-+-+-+-+-+-+-+-end metadata; begin body--+-+-+-+-+-+-+-+-+-+
|                            records                            |
|                              ...                              |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Metadata

Metadata is included at the beginning of every DBN stream or file. Basic information is found at the start of the metadata, followed by optional symbology mappings.

The metadata contains all of the parameters needed to construct a request for the exact same data via Databento's historical API. Likewise, if you fetched DBN-encoded data from Databento's historical API, the metadata header will contain the parameters of your original request.

The following table describes the metadata fields, in the order of appearance. All fields are little-endian.

| Field | Type | Description |
| --- | --- | --- |
| version | char[4] | "DBN" followed by a u8 giving the version of DBN the data is encoded in. |
| length | uint32_t | The length of the remaining metadata header, i.e. excluding version and length. |
| dataset | char[16] | The dataset ID (string identifier). |
| schema | uint16_t | The data record schema. u16::MAX indicates a potential mix of schemas and record types, which will always be the case for live data. |
| start | uint64_t | The start time of the query range in UNIX epoch nanoseconds. |
| end | uint64_t | The end time of the query range in UNIX epoch nanoseconds. u64::MAX indicates no end time was provided. |
| limit | uint64_t | The maximum number of records to return. 0 indicates no limit. |
| stype_in | uint8_t | The symbology type of input symbols. u8::MAX indicates a potential mix of types, such as with live data. |
| stype_out | uint8_t | The symbology type of output symbols. |
| ts_out | uint8_t | Whether each record has an appended gateway send timestamp. |
| symbol_cstr_len | uint16_t | The number of bytes in fixed-length symbol strings, including the null terminator. Versions 2 and above only; in version 1, symbol strings are always 22 bytes. |
| schema_definition_length | uint32_t | The number of bytes in the schema definition. |
| schema_definition | uint8_t[schema_definition_length] | Self-describing schema, to be implemented in the future. |
| symbols_count | uint32_t | The number of symbols in the original query. |
| symbols | char[symbols_count][symbol_cstr_len] | The symbols from the original query. |
| partial_count | uint32_t | The number of symbols partially resolved. |
| partial | char[partial_count][symbol_cstr_len] | The partially resolved symbols. |
| not_found_count | uint32_t | The number of unresolved symbols. |
| not_found | char[not_found_count][symbol_cstr_len] | The unresolved symbols. |
| mappings_count | uint32_t | The number of symbols at least partially resolved. |
| mappings | SymbolMapping[mappings_count] | The SymbolMappings, one for each resolved symbol. |

where SymbolMapping has the following structure:

| Field | Type | Description |
| --- | --- | --- |
| raw_symbol | char[symbol_cstr_len] | The requested symbol, in stype_in. |
| interval_length | uint32_t | The number of MappingIntervals in intervals. |
| intervals | MappingInterval[interval_length] | The MappingIntervals associated with raw_symbol. |

and where MappingInterval has the following structure:

| Field | Type | Description |
| --- | --- | --- |
| start_date | uint32_t | The start date of the interval, as a YYYYMMDD integer. |
| end_date | uint32_t | The end date of the interval, as a YYYYMMDD integer. |
| symbol | char[symbol_cstr_len] | The symbol in stype_out to which raw_symbol corresponds for the interval between start_date and end_date, where symbol_cstr_len is specified earlier in the metadata. This is often instrument_id because it is the default stype_out. |
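
As a concrete illustration of the fixed-size portion of this layout, here is a minimal, dependency-free Rust sketch (again, not the dbn crate) that reads the fields up to symbol_cstr_len from a version 2 or later metadata header. The struct and function names are hypothetical; the offsets follow the diagram and table above, and all multi-byte fields are read as little-endian.

use std::convert::TryInto;

/// Hypothetical container for the fixed-size portion of a DBN v2+ metadata header.
#[derive(Debug)]
struct MetadataPrelude {
    version: u8,
    length: u32,          // length of the remaining metadata header, excluding version and length
    dataset: String,      // 16-byte, null-padded dataset ID
    schema: u16,          // u16::MAX indicates mixed schemas
    start: u64,           // UNIX epoch nanoseconds
    end: u64,             // UNIX epoch nanoseconds; u64::MAX means no end time
    limit: u64,           // 0 means no limit
    stype_in: u8,
    stype_out: u8,
    ts_out: u8,
    symbol_cstr_len: u16, // versions 2 and above; reserved bytes in version 1
}

/// Parse the fixed-size fields at the documented little-endian offsets.
fn parse_prelude(buf: &[u8]) -> Option<MetadataPrelude> {
    if buf.len() < 55 || &buf[0..3] != b"DBN" {
        return None; // too short or missing magic string
    }
    // dataset is null-padded to 16 bytes; trim at the first null terminator.
    let dataset_end = buf[8..24].iter().position(|&b| b == 0).map_or(24, |i| 8 + i);
    Some(MetadataPrelude {
        version: buf[3],
        length: u32::from_le_bytes(buf[4..8].try_into().ok()?),
        dataset: String::from_utf8_lossy(&buf[8..dataset_end]).into_owned(),
        schema: u16::from_le_bytes(buf[24..26].try_into().ok()?),
        start: u64::from_le_bytes(buf[26..34].try_into().ok()?),
        end: u64::from_le_bytes(buf[34..42].try_into().ok()?),
        limit: u64::from_le_bytes(buf[42..50].try_into().ok()?),
        stype_in: buf[50],
        stype_out: buf[51],
        ts_out: buf[52],
        symbol_cstr_len: u16::from_le_bytes(buf[53..55].try_into().ok()?),
    })
}

The variable-length portions that follow (schema_definition, symbols, partial, not_found, and mappings) can then be read sequentially using the count fields described above.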

Records

The metadata is immediately followed by DBN records. A valid DBN stream or file contains zero or more records.

All records begin with the same 16-byte RecordHeader with the following structure:

| Field | Type | Description |
| --- | --- | --- |
| length | uint8_t | The length of the record in 32-bit words. |
| rtype | uint8_t | The record type. Each schema corresponds with a single rtype value. See Rtype. |
| publisher_id | uint16_t | The publisher ID assigned by Databento, which denotes the dataset and venue. |
| instrument_id | uint32_t | The numeric instrument ID. |
| ts_event | uint64_t | The event timestamp as the number of nanoseconds since the UNIX epoch. |

See the Schemas and data formats section for a full list of fields for the record associated with each schema.
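
Putting the header together with its length field, a sketch of walking the records section could look like the following, reusing the hypothetical RecordHeader and read_header from the earlier zero-copy sketch. Since length counts 32-bit words, the next record starts length * 4 bytes after the current one.

/// Walk a buffer of concatenated DBN records, calling `f` with each header and
/// the record's full bytes (header included).
fn for_each_record(body: &[u8], mut f: impl FnMut(&RecordHeader, &[u8])) {
    let mut offset = 0usize;
    while let Some(hdr) = read_header(&body[offset..]) {
        let record_len = hdr.length as usize * 4; // length is in 32-bit words
        if record_len < size_of::<RecordHeader>() || offset + record_len > body.len() {
            break; // malformed or truncated record
        }
        f(&hdr, &body[offset..offset + record_len]);
        offset += record_len;
    }
}

In practice, a decoder would dispatch on hdr.rtype to interpret the bytes after the header as the record struct for the corresponding schema.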

Versioning

We use the version field in the metadata header to signal changes to the structure of record types and metadata.

Version 2

The following was changed:

  • Metadata:
    • Sets version to 2
    • Adds symbol_cstr_len field
    • Rearranges padding
    • The fixed-length strings for symbology are now defined to have symbol_cstr_len characters (currently 71), whereas in version 1 they always had 22
  • InstrumentDefMsg (definition schema):
    • raw_symbol now has symbol_cstr_len characters (71)
    • Rearranges padding
  • SymbolMappingMsg (live symbology):
    • stype_in_symbol and stype_out_symbol now have symbol_cstr_len characters (71)
    • Adds stype_in and stype_out fields
    • Removes padding
  • ErrorMsg (gateway errors in live):
    • Adds space to err for longer error messages
    • Adds code and is_last fields
  • SystemMsg (non-error gateway messages in live):
    • Adds space to msg for longer messages
    • Adds code field

Version 3

This set of changes adds support for strategy legs to the definition schema and expands the quantity field in the statistics schema.

  • Adds 8-byte alignment padding to the end of the metadata
  • Expands quantity to 64 bits in StatMsg (statistics schema)
  • InstrumentDefMsg (definition schema):
    • A definition record will be created for each strategy leg
    • Adds the following leg fields:
      • leg_count
      • leg_index
      • leg_instrument_id
      • leg_raw_symbol
      • leg_side
      • leg_underlying_id
      • leg_instrument_class
      • leg_ratio_qty_numerator
      • leg_ratio_qty_denominator
      • leg_ratio_price_numerator
      • leg_ratio_price_denominator
      • leg_price
      • leg_delta
    • Expands asset to 11 bytes
    • Expands raw_instrument_id to 64 bits to support publishers that use larger IDs
    • Removes the statistics-schema-related fields trading_reference_price, trading_reference_date, and settl_price_type
    • Removes the status-schema-related field md_security_trading_status

Note: the CSV and JSON encodings are also affected by the new fields.

Currently, version 2 is used for the IFEU.IMPACT and NDEX.IMPACT datasets. The DBN crate and client libraries will continue to support decoding version 1 data.

Upgrading between versions

DBN version 1 files can be upgraded to version 2 with the dbn CLI tool by passing the --upgrade or -u flag.

dbn version1.dbn --output version2.dbn --upgrade

Comparison with other encodings and formats

DBN is designed specifically for normalized market data. It adopts a fixed set of struct definitions, also called message schemas, for this purpose. It's important to note that DBN is not a general-purpose serialization format like Simple Binary Encoding (SBE) or Google Protocol Buffers (protobufs), which provide a flexible schema definition language. Unlike these formats, DBN doesn't offer tools for generating decoders or encoders from any user-specified schema.

When comparing DBN to most encodings or serialization formats, a critical difference is that DBN is a zero-copy encoding. Moreover, what makes DBN most unique is that it's simultaneously intended for three common use cases in a trading system: file format, real-time messaging format, and in-memory representation. This is a very specific convergence of use cases that manifests frequently in financial trading systems.

Other encodings or formats typically used in situations where DBN would be a suitable replacement include:

  • SBE
  • Apache Parquet
  • Apache Arrow
  • Feather

Compared with these, DBN is intended to be good at all three of the trading system use cases mentioned above—so you don't have to mix multiple serialization formats in one system—while the others tend to excel in only one or two.

A single format for all use cases carries a more important benefit for trading than just the performance upside that comes with minimizing copies; it ensures that market data is immutable when it passes through your trading system. The following diagram helps you visualize the difference between a potential trading system that uses DBN compared to a typical trading system that doesn't.

| Typical trading environment (top chart) | Trading environment using DBN (bottom chart) |
| --- | --- |
| Market data in multiple message, file, and in-memory formats | Market data in a single format |
| Multiple layers of serialization and deserialization | No transformation of data |
| Incurs risk of inconsistent state between components using market data | Eliminates risk of inconsistent state between components using market data |
| Complex, slow code | Simple, fast code |

[Figure: typical trading environment]

[Figure: trading environment using DBN]

Immutable market data makes it easy to align live trading with post-production logs and historical data; it makes it easy to use the same code for live trading, backtesting, and exploratory research; it also makes it easy to write GUIs that need to accurately synchronize order events with market data events, especially the market data events that triggered those order events. In short, using the same encoding or format everywhere ensures that state is synchronized throughout the distributed parts of your trading system.

For these reasons, most mature trading firms eventually end up implementing their own proprietary encoding that resembles DBN.

The following table summarizes other key comparisons:

|  | DBN | SBE | Parquet | Arrow |
| --- | --- | --- | --- | --- |
| Schema definition | Fixed schemas | XML | Thrift, Avro, Protobuf | Arrow object model |
| Layout | Sequential | Sequential | Column-oriented | Column-oriented |
| Zero copy | Yes | Yes | No | Limited support |
| Suitable for real-time messaging | Yes | Yes | No | No |
| Suitable as file format | Yes | Yes | Yes | Through Feather |
| Metadata | Yes | No, user-defined | No, user-defined | No |
| Sequential read | Fastest | Fast | Moderate | Moderate |
| Sequential write | Fastest | Fast | Slowest | Moderate (Feather) |
| Compressed size | Small | Moderate | Smallest | Largest (Feather) |
| Transcoding to CSV | Yes | No | Through pandas | Yes |
| Transcoding to JSON | Yes | No | Through pandas | No |
| Mapping to pandas | Yes | No | Yes | Yes |
| Package size (lines of code) | 16.0k (v0.14.0) | 55.7k (v1.27.0) | 108.5k (v1.12.3) | 1.6M (v12.0.0) |
| Language support | Python, C++, Rust, C bindings | C++, Java, C# | 11+ languages | 11+ languages |
| Use case | Market data (storage, replay, research, real-time messaging, normalization, OMS, EMS, GUIs) | Direct venue connectivity | Storage file format | Data exploration |

Frequently asked questions

Isn't this basically a bunch of raw structs? What's so special about this?

Yes, pretty much! And it's not exactly novel—in our experience, most top-tier trading firms will have something similar already implemented, along with proprietary tooling to support it. The significance of DBN is that we're open sourcing the whole toolset, with many best practices for normalization and performance optimization, so that you don't have to reinvent the wheel.

Another purpose of DBN is to provide a standardized data interchange format that can be used for high-throughput, low-latency streaming between a data provider like us and you. At the time of DBN's initial release, we're not aware of any data provider that adopts a binary flat file or messaging format with similar zero-copy semantics.

Even if you don't want to use DBN exactly, it's a lightweight specification so it's easy to mimic some of its practices or fork our reference implementation for your own use case.

When should you not use DBN?

  • When you depend on many tools and frameworks in the Apache ecosystem. Many data processing tools have native support for Apache Parquet and Arrow. However, in our experience, mature trading environments generally use fewer general-purpose computation frameworks. In these cases, DBN is still excellent as an interchange format for receiving and storing data from Databento, and we still support converting DBN to pandas DataFrames and hence to the Arrow format.
  • If you don't use Databento, only ever plan on trading on one venue, have already written parsers for the raw feeds, and have direct extranet connectivity, then there's a strong argument for just using the venue's original wire protocol, such as ITCH, and even rolling your own, thinner normalization format.
  • If you have an academic or toy project and only plan on working with historical data. Many such projects employ relatively small amounts of data and don't require live data. In these circumstances, it makes sense to just store the data in your own, thinner binary format or a binary format with some structure like Parquet or HDF5.
  • If you have to support many teams with different trading styles and business functions on one platform, many of which only require low-frequency data. In this situation, the performance benefits of fixed schemas become much less important within your organization, and flexibility becomes more important. It's also quite likely that you'll have to constantly update your normalization format for new exploratory workflows. In those cases, DBN is still excellent as an interchange format for receiving and storing data from Databento, but your firm will likely benefit from converting DBN data into more flexible formats downstream.

Encoding, serialization format, protocol—what's the difference?

There are slight differences between these terms, but DBN is all three at once.

One way to look at DBN is that it's an OSI layer 6 presentation protocol, much like SBE—except that DBN is much stricter about message schemas, whereas SBE is flexible.

Data written in accordance with such a protocol can be persisted to disk, so it can also serve as a storage format; it can likewise be written on the wire as a message encoding or wire format. SBE excels as a message encoding but is less often used as a storage format, whereas DBN accommodates both equally well.

Why is the reference implementation written in Rust? Can I use it?

The majority of Databento's infrastructure is written in Rust, C, and Python. The reference implementation of DBN is a rewrite of the original C implementation and was originally written with internal use in mind.

Rust's ability to expose a C ABI makes it interoperable with multiple languages through thin bindings, and its memory management model makes it safe and performant. It's simple to integrate the Rust DBN library into your Python application, as seen in our Python client library.

Why is DBN sequential and not column-oriented? Aren't modern column-oriented layouts more optimized for querying?

The sequential layout of DBN makes it more performant for real-time messaging use cases.

Column-oriented formats do have the theoretical potential for more optimization in non-real time use cases, but this depends on the actual implementation. Our reference DBN implementation is heavily optimized and still on par with column-oriented formats like Apache Arrow on common use cases for historical market data.

Am I locked-in to a proprietary binary format here?

No, the DBN reference implementation is open-sourced under the permissive Apache 2.0 License. We also provide transcoders to convert your DBN data into CSV and JSON.

How is DBN being used currently?

We store upwards of 4 PB and over 30 trillion records of normalized historical data in DBN internally at Databento. Every single message that passes through our infrastructure gets encoded in DBN—billions of messages per day, at single-port peak messaging rates over 60 Gbps, spanning multiple asset classes and over 1.8 million instruments on any given day. It is used for all of our data schemas, including full order book, tick-by-tick trades, top of the book, OHLCV aggregates, venue statistics, instrument definitions, and more.

Most of our users, including some of the world's largest hedge funds and market making firms, are already using DBN through our client libraries, putting it through multiple production use cases that involve real-time streaming and historical flat files.