Databento Binary Encoding
Databento Binary Encoding (DBN) is an extremely fast message encoding and storage format for normalized market data. The DBN specification includes a simple, self-describing metadata header and a fixed set of struct definitions, which enforce a standardized way to normalize market data.
All official Databento client libraries use DBN under the hood, both as a data interchange format and for in-memory representation of data. DBN is also the default encoding for all Databento APIs, including live data streaming, historical data streaming, and batch flat files.
Getting started with DBN
The easiest way to get started with DBN is through any of the official Databento client libraries, which support reading and writing DBN files. See:
- The DBNStore.from_file and DBNStore.to_file methods for Python
- The DbnFileStore::Replay method for C++
- The AsyncDbnStore::from_file method and the databento::dbn::encode module for Rust
Other resources are also available:
- The dbn Rust crate is the reference implementation of DBN and provides a library for decoding and encoding DBN, and also for converting from DBN to CSV and JSON.
- dbn-cli is a command-line tool that can read DBN files, transcode DBN to CSV or JSON, and print the output or write it to disk. You can install it with cargo install dbn-cli.
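As a quick illustration, here is a minimal sketch of reading a DBN file with the Python client library's DBNStore; the file names are hypothetical placeholders, and the databento package is assumed to be installed.

```python
# A minimal sketch, assuming the databento Python package is installed and
# "example.dbn.zst" is a DBN file you have on disk (hypothetical file name).
import databento as db

store = db.DBNStore.from_file("example.dbn.zst")
df = store.to_df()             # decode the records into a pandas DataFrame
store.to_csv("example.csv")    # or transcode them to CSV
```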
Why should you use DBN?
The key advantages of using DBN are:
- End-to-end. DBN can be used to store and transport normalized data across all components of a typical trading system. It can fulfill the requirements of a file format for efficient storage, a message encoding for fast real-time streaming, and an in-memory representation of market data for a low-latency system—all at once. This simplifies your trading system and eliminates the use of multiple serialization formats. It also ensures that market data is immutable, lossless, and consistent as it passes between components.
- Use the same code for historical and live. Our client libraries exploit the end-to-end aspect of DBN and allow you to use the same code in historical and live; you can write an event-driven trading platform that runs the exact same code in backtest and production trading.
- Zero-copy. DBN data is structured the same way whether in memory, on the wire, or on disk. The data is therefore read and written directly as-is, without costly encoding or decoding steps that copy it through intermediate representations (see the sketch after this list).
- Symbology metadata. The DBN protocol includes a lightweight header that provides metadata for interpreting and using the market data payload, such as symbology mappings, so a DBN file is self-sufficient for many use cases.
- Highly compressible. DBN strictly uses fixed lengths and offsets for all fields. This layout enables typical compression algorithms, such as zstd and lz4, to achieve high compression ratios.
- Optimized for modern CPUs. The predictable layout of DBN records also allows for highly-optimized, sequential access patterns that take advantage of instruction pipelining and prefetching on modern CPUs. The struct definitions are also deliberately designed so that most records fit into a single cache line.
- Extremely fast. DBN achieves extremely fast speeds on reads and writes. Most use cases of DBN are compression-bound or I/O-bound while only using a single CPU core. DBN has been used in environments with 6.1 microseconds median internal latency; we've also seen user-reported benchmarks of full order book replay at over 19.1 million messages per second using our C++ client library on a Google Cloud VM.
- Normalization format. Using DBN also automatically means you're adopting its normalization format. While there are many ways to normalize data, our team arrived at these best practices after many years of combined experience at top-tier trading firms and integrating dozens of trading venues. For example, DBN allows you to replay full order book data at I/O-bound speeds, backtest with passive orders in precise sequence, and losslessly achieve much of what's possible with raw packet captures, with severalfold improvements in speed and storage requirements.
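To make the zero-copy point concrete, here is a small sketch (not part of the DBN tooling) that views a record header in place using the 16-byte layout documented under Records below; buf is a hypothetical variable holding raw DBN record bytes.

```python
# A minimal sketch of the zero-copy idea, assuming `buf` holds the raw bytes
# that follow the metadata in a DBN stream (hypothetical variable).
import numpy as np

header_dtype = np.dtype([
    ("length", "u1"),           # record length in 32-bit words
    ("rtype", "u1"),            # record type
    ("publisher_id", "<u2"),    # little-endian u16
    ("instrument_id", "<u4"),   # little-endian u32
    ("ts_event", "<u8"),        # little-endian u64, UNIX nanoseconds
])
# np.frombuffer creates a view over the existing bytes; nothing is copied.
first_header = np.frombuffer(buf, dtype=header_dtype, count=1)[0]
```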
Layout
A valid DBN stream or file has two parts: metadata, followed immediately by records.
The following diagram shows the field layout of the DBN encoding:
Version 1
0 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| magic string = "DBN" | version = 1 |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| length |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| |
+ +
| |
+ dataset +
| |
+ +
| |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| schema | |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +
| start (UNIX nanos) |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| | |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +
| end (UNIX nanos) |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| | |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +
| limit (max records) |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| | |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +
| reserved |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| | stype_in | stype_out |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| ts_out | |
+-+-+-+-+-+-+-+-+ +
| |
+ +
| |
+ +
| |
+ +
| |
+ +
| |
+ +
| reserved (47 bytes of padding) |
+ +
| |
+ +
| |
+ +
| |
+ +
| |
+ +
| |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| schema_definition_length |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| schema_definition (variable) |
| ... |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| symbols_count |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| symbols (variable) |
| ... |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| partial_count |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| partial (variable) |
| ... |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| not_found_count |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| not_found (variable) |
| ... |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| mappings_count |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| mappings (variable) |
| ... |
+-+-+-+-+-+-+-+-+-+-end metadata; begin body--+-+-+-+-+-+-+-+-+-+
| records |
| ... |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Versions 2 and above
0 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| magic string = "DBN" | version |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| length |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| |
+ +
| |
+ dataset +
| |
+ +
| |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| schema | |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +
| start (UNIX nanos) |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| | |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +
| end (UNIX nanos) |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| | |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +
| limit (max records) |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| | stype_in | stype_out |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| ts_out | symbol_cstr_len | |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |
| |
+ +
| |
+ +
| |
+ +
| |
+ +
| |
+ +
| |
+ +
| reserved (53 bytes of padding) |
+ +
| |
+ +
| |
+ +
| |
+ +
| |
+ +
| |
+ +
| |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| schema_definition_length |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| schema_definition (variable) |
| ... |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| symbols_count |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| symbols (variable) |
| ... |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| partial_count |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| partial (variable) |
| ... |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| not_found_count |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| not_found (variable) |
| ... |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| mappings_count |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| mappings (variable) |
| ... |
+-+-+-+-+-+-+-+-+-+-end metadata; begin body--+-+-+-+-+-+-+-+-+-+
| records |
| ... |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Metadata
Metadata is included at the beginning of every DBN stream or file. Basic information is found at the start of the metadata, followed by optional symbology mappings.
The metadata contains all of the parameters needed to construct a request for the exact same data via Databento's historical API. Likewise, if you fetched DBN-encoded data from Databento's historical API, the metadata header will contain the parameters of your original request.
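As an illustration of this round-trip property, the sketch below rebuilds a historical API request from a decoded file using the Python client library; the file name and API key are hypothetical placeholders, and the exact DBNStore property names may vary by client version.

```python
# A minimal sketch, assuming the databento Python package; the file name and
# API key are hypothetical placeholders.
import databento as db

store = db.DBNStore.from_file("example.dbn.zst")
client = db.Historical("YOUR_API_KEY")
data = client.timeseries.get_range(
    dataset=store.dataset,     # parameters recovered from the metadata header
    schema=store.schema,
    start=store.start,
    end=store.end,
    symbols=store.symbols,
    stype_in=store.stype_in,
)
```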
The following table describes the metadata fields, in the order of appearance. All fields are little-endian.
Field | Type | Description
---|---|---
version | char[4] | "DBN" followed by the version of DBN the file is encoded in as a u8.
length | uint32_t | The length of the remaining metadata header, i.e. excluding version and length.
dataset | char[16] | The dataset ID (string identifier).
schema | uint16_t | The data record schema. u16::MAX indicates a potential mix of schemas and record types, which will always be the case for live data.
start | uint64_t | The start time of the query range in UNIX epoch nanoseconds.
end | uint64_t | The end time of the query range in UNIX epoch nanoseconds. u64::MAX indicates no end time was provided.
limit | uint64_t | The maximum number of records to return. 0 indicates no limit.
stype_in | uint8_t | The symbology type of input symbols. u8::MAX indicates a potential mix of types, such as with live data.
stype_out | uint8_t | The symbology type of output symbols.
ts_out | uint8_t | Whether each record has an appended gateway send timestamp.
symbol_cstr_len | uint16_t | The number of bytes in fixed-length string symbols, including the null terminator byte. Versions 2 and above only; in version 1, symbol strings are always 22 bytes.
schema_definition_length | uint32_t | The number of bytes in the schema definition.
schema_definition | uint8_t[schema_definition_length] | Self-describing schema, to be implemented in the future.
symbols_count | uint32_t | The number of symbols in the original query.
symbols | char[symbols_count][symbol_cstr_len] | The symbols from the original query.
partial_count | uint32_t | The number of partially resolved symbols.
partial | char[partial_count][symbol_cstr_len] | The partially resolved symbols.
not_found_count | uint32_t | The number of unresolved symbols.
not_found | char[not_found_count][symbol_cstr_len] | The unresolved symbols.
mappings_count | uint32_t | The number of symbols at least partially resolved.
mappings | SymbolMapping[mappings_count] | The SymbolMappings, one for each resolved symbol.
where SymbolMapping has the following structure:
Field | Type | Description
---|---|---
raw_symbol | char[symbol_cstr_len] | The requested symbol in stype_in.
interval_length | uint32_t | The number of MappingIntervals in intervals.
intervals | MappingInterval[interval_length] | The MappingIntervals associated with raw_symbol.
and where MappingInterval has the following structure:
Field | Type | Description
---|---|---
start_date | uint32_t | The start date of the interval, as a YYYYMMDD integer.
end_date | uint32_t | The end date of the interval, as a YYYYMMDD integer.
symbol | char[symbol_cstr_len] | The symbol in stype_out to which raw_symbol corresponds for the interval between start_date and end_date, where symbol_cstr_len is specified earlier in the metadata. This is often an instrument ID, because instrument_id is the default stype_out.
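As a rough illustration of the fixed-size portion of this layout, the following sketch reads the prelude fields from an uncompressed DBN file; the file name is hypothetical and error handling is omitted.

```python
# A minimal sketch, assuming "example.dbn" is an uncompressed DBN file
# (hypothetical name). All fields are little-endian, per the table above.
import struct

with open("example.dbn", "rb") as f:
    prelude = f.read(8)
    magic, version = prelude[:3], prelude[3]
    assert magic == b"DBN"
    (length,) = struct.unpack("<I", prelude[4:8])   # bytes of metadata remaining
    meta = f.read(length)

dataset = meta[:16].rstrip(b"\x00").decode()
(schema,) = struct.unpack("<H", meta[16:18])
start, end, limit = struct.unpack("<QQQ", meta[18:42])
print(version, dataset, schema, start, end, limit)
```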
Records
The metadata is immediately followed by DBN records. A valid DBN stream or file contains zero or more records.
All records begin with the same 16-byte RecordHeader with the following structure:
Field | Type | Description
---|---|---
length | uint8_t | The length of the record in 32-bit words.
rtype | uint8_t | The record type. Each schema corresponds with a single rtype value. See Rtype.
publisher_id | uint16_t | The publisher ID assigned by Databento, which denotes the dataset and venue.
instrument_id | uint32_t | The numeric instrument ID.
ts_event | uint64_t | The event timestamp as the number of nanoseconds since the UNIX epoch.
See the Schemas and data formats section for a full list of fields for the record associated with each schema.
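Because every record carries its own length in the header, a decoder can walk the record stream without knowing each schema in advance. The sketch below illustrates this, assuming body already holds the raw bytes that follow the metadata in an uncompressed DBN stream (a hypothetical variable).

```python
# A minimal sketch of iterating records via the 16-byte RecordHeader; `body`
# is a hypothetical bytes object containing everything after the metadata.
import struct

offset = 0
while offset + 16 <= len(body):
    length, rtype, publisher_id, instrument_id, ts_event = struct.unpack_from(
        "<BBHIQ", body, offset
    )
    record_size = length * 4   # the header's length field is in 32-bit words
    # ...dispatch on rtype here to decode the schema-specific fields...
    offset += record_size
```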
Versioning
We use the version field in the metadata header to signal changes to the structure of record types and metadata.
Version 2
The following was changed:
- Metadata:
  - Sets version to 2
  - Adds the symbol_cstr_len field
  - Rearranges padding
  - The fixed-length strings for symbology are now defined to have symbol_cstr_len characters (currently 71), whereas in version 1 they always had 22
- InstrumentDefMsg (definition schema):
  - raw_symbol now has symbol_cstr_len characters (71)
  - Rearranges padding
- SymbolMappingMsg (live symbology):
  - stype_in_symbol and stype_out_symbol now have symbol_cstr_len characters (71)
  - Adds stype_in and stype_out fields
  - Removes padding
- ErrorMsg (gateway errors in live):
  - Adds space to err for longer error messages
  - Adds code and is_last fields
- SystemMsg (non-error gateway messages in live):
  - Adds space to msg for longer messages
  - Adds code field
Version 3
This set of changes adds support for strategy legs in the definition schema and an expanded quantity field in the statistics schema.
- Added 8-byte alignment padding to the end of the metadata
- Expanded quantity to 64 bits in StatMsg (statistics schema)
- InstrumentDefMsg (definition schema):
  - A definition record will be created for each strategy leg
  - Adds the following leg fields:
    - leg_count
    - leg_index
    - leg_instrument_id
    - leg_raw_symbol
    - leg_side
    - leg_underlying_id
    - leg_instrument_class
    - leg_ratio_qty_numerator
    - leg_ratio_qty_denominator
    - leg_ratio_price_numerator
    - leg_ratio_price_denominator
    - leg_price
    - leg_delta
  - Expands asset to 11 bytes
  - Expands raw_instrument_id to 64 bits to support publishers that use larger IDs
  - Removes the statistics-schema-related fields trading_reference_price, trading_reference_date, and settl_price_type
  - Removes the status-schema-related field md_security_trading_status
Info
Currently, version 2 is used for the IFEU.IMPACT and NDEX.IMPACT datasets.
The DBN crate and client libraries will continue to support decoding version 1 data.
Upgrading to newer versions
DBN version 1 files can be upgraded to version 2 with the dbn CLI tool by passing the --upgrade or -u flag:
dbn version1.dbn --output version2.dbn --upgrade
Comparison with other encodings and formats
DBN is designed specifically for normalized market data. It adopts a fixed set of struct definitions, also called message schemas, for this purpose. It's important to note that DBN is not a general-purpose serialization format like Simple Binary Encoding (SBE) or Google Protocol Buffers (protobufs), which provide a flexible schema definition language. Unlike these formats, DBN doesn't offer tools for generating decoders or encoders from any user-specified schema.
When comparing DBN to most encodings or serialization formats, a critical difference is that DBN is a zero-copy encoding. Moreover, what makes DBN distinctive is that it's simultaneously intended for three common use cases in a trading system: file format, real-time messaging format, and in-memory representation. This is a very specific convergence of use cases that manifests frequently in financial trading systems.
Other encodings or formats typically used in situations where DBN would be a suitable replacement include:
- SBE
- Apache Parquet
- Apache Arrow
- Feather
Compared to these, DBN is intended to be good at all three of the trading-system use cases mentioned above—so you don't have to mix multiple serialization formats in one system—while the others tend to excel in only one or two of them.
A single format for all use cases carries a more important benefit for trading than just the performance upside that comes with minimizing copies; it ensures that market data is immutable when it passes through your trading system. The following diagram helps you visualize the difference between a potential trading system that uses DBN compared to a typical trading system that doesn't.
Typical trading environment (top chart) | Trading environment using DBN (bottom chart) |
---|---|
Market data in multiple message, file, and in-memory formats | Market data in a single format |
Multiple layers of serialization and deserialization | No transformation of data |
Incurs risk of inconsistent state between components using market data | Eliminates risk of inconsistent state between components using market data |
Complex, slow code | Simple, fast code |
Immutable market data makes it easy to align live trading with post-production logs and historical data; it makes it easy to use the same code for live trading, backtesting, and exploratory research; it also makes it easy to write GUIs that need to accurately synchronize order events with market data events, especially the market data events that triggered those order events. In short, using the same encoding or format everywhere ensures that state is synchronized throughout distributed parts of your trading system.
For these reasons, most mature trading firms eventually end up implementing their own proprietary encoding that resembles DBN.
The following table summarizes other key comparisons:
  | DBN | SBE | Parquet | Arrow |
---|---|---|---|---|
Schema definition | Fixed schemas | XML | Thrift, Avro, Protobuf | Arrow object model |
Layout | Sequential | Sequential | Column-oriented | Column-oriented |
Zero copy | Yes | Yes | No | Limited support |
Suitable for real-time messaging | Yes | Yes | No | No |
Suitable as file format | Yes | Yes | Yes | Through Feather |
Metadata | Yes | No, user-defined | No, user-defined | No |
Sequential read | Fastest | Fast | Moderate | Moderate |
Sequential write | Fastest | Fast | Slowest | Moderate (Feather) |
Compressed size | Small | Moderate | Smallest | Largest (Feather) |
Transcoding to CSV | Yes | No | Through pandas | Yes |
Transcoding to JSON | Yes | No | Through pandas | No |
Mapping to pandas | Yes | No | Yes | Yes |
Package size (lines of code) | 16.0k (v0.14.0) | 55.7k (v1.27.0) | 108.5k (v1.12.3) | 1.6M (v12.0.0) |
Language support | Python, C++, Rust, C bindings | C++, Java, C# | 11+ languages | 11+ languages |
Use case | Market data (storage, replay, research, real-time messaging, normalization, OMS, EMS, GUIs) | Direct venue connectivity | Storage file format | Data exploration |
Frequently asked questions
Isn't this basically a bunch of raw structs? What's so special about this?
Yes, pretty much! And it's not exactly novel—in our experience, most top-tier trading firms will have something similar already implemented, along with proprietary tooling to support it. The significance of DBN is that we're open sourcing the whole toolset, with many best practices for normalization and performance optimization, so that you don't have to reinvent the wheel.
Another purpose of DBN is that it provides a standardized data interchange format that can be used for high-throughput, low latency streaming between a data provider like us and you. At the time of the initial release of DBN, we're not aware of any data provider that adopts a binary flat file or messaging format with similar zero-copy semantics.
Even if you don't want to use DBN exactly, it's a lightweight specification so it's easy to mimic some of its practices or fork our reference implementation for your own use case.
When should you not use DBN?
- When you depend on many tools and frameworks in the Apache ecosystem. Many data processing tools have native support for Apache Parquet and Arrow. However, in our experience, mature trading environments generally use fewer general-purpose computation frameworks. In these cases, DBN is still excellent as an interchange format for receiving and storing data from Databento, and we still support converting DBN to pandas DataFrames and hence to Arrow format.
- If you don't use Databento, only ever plan on trading on one trading venue, have already written parsers for the raw feeds, and have direct extranet connectivity, then there's a strong argument for just using the original wire protocol like ITCH and even rolling your own, thinner normalization format.
- If you have an academic or toy project and only plan on working with historical data. Many such projects employ relatively small amounts of data and don't require live data. In these circumstances, it makes sense to just store the data in your own, thinner binary format or a binary format with some structure like Parquet or HDF5.
- If you have to support many teams on one platform, with different trading styles and business functions, many of which only require low-frequency data. In this situation, the performance benefits of fixed schemas become much less important within your organization, and flexibility becomes more important. It's also quite likely that you'll have to constantly update your normalization format for new exploratory workflows. In those cases, DBN is still excellent as an interchange format for receiving and storing data from Databento, but your firm will likely benefit from converting DBN data into more flexible formats downstream.
Encoding, serialization format, protocol—what's the difference?
There are slight differences among these terms, but DBN is all three at once.
One way to look at DBN is that it's an OSI layer 6 presentation protocol, much like SBE—except that DBN is much stricter about message schemas, whereas SBE is flexible.
Data written in accordance with such a protocol can be persisted to disk, so it can also serve as a storage format; it can likewise be written on the wire as a message encoding or wire format. SBE excels as a message encoding but is less often used as a storage format, whereas DBN accommodates both equally well.
Why is the reference implementation written in Rust? Can I use it?
The majority of Databento's infrastructure is written in Rust, C, and Python. The reference implementation of DBN is a Rust rewrite of the original C implementation and was originally written with internal use in mind.
Rust's ability to expose a C ABI makes it interoperable across multiple languages with thin bindings, and its memory management model makes it safe and performant. It's simple to integrate the Rust DBN library into your Python application, as seen in our Python client library.
Why is DBN sequential and not column-oriented? Aren't modern column-oriented layouts more optimized for querying?
The sequential layout of DBN makes it more performant for real-time messaging use cases.
Column-oriented formats do have the theoretical potential for more optimization in non-real time use cases, but this depends on the actual implementation. Our reference DBN implementation is heavily optimized and still on par with column-oriented formats like Apache Arrow on common use cases for historical market data.
Am I locked-in to a proprietary binary format here?
No, the DBN reference implementation is open-sourced under the permissive Apache 2.0 License. We also provide transcoders to convert your DBN data into CSV and JSON.
How is DBN being used currently?
We store upwards of 4 PB and over 30 trillion records of normalized historical data in DBN internally at Databento. Every single message that passes through our infrastructure gets encoded in DBN—billions of messages per day, at single-port peak messaging rates of over 60 Gbps, spanning multiple asset classes and over 1.8 million instruments on any given day. It is used for all of our data schemas, including full order book, tick-by-tick trades, top of the book, OHLCV aggregates, venue statistics, instrument definitions, and more.
Most of our users, including some of the world's largest hedge funds and market making firms, are already using DBN through our client libraries, putting it through multiple production use cases that involve real-time streaming and historical flat files.