
Common fields, enums and types

Publishers, datasets, and venues

We use a few different terms to describe our data:

  • A dataset is a source of data.
  • A venue is an exchange, OTC market (e.g., ATS, ECN) or reporting entity.
  • A publisher is a specific venue from a specific dataset.
See also

Read our Venues and publishers section for a more detailed explanation.

Publisher identifiers

All of our schemas include a publisher_id field, which is a unique numeric ID assigned by Databento to each publisher. A full list of publishers can be found using the metadata.list_publishers endpoint.

Venue and dataset identifiers

Each publisher is also assigned a string identifier (e.g., OPRA.PILLAR.XCBO), composed of two parts:

  • Dataset ID (e.g., OPRA.PILLAR). This is used as the dataset argument in any API or client method. Dataset IDs can be found on the Databento portal on each dataset's details page or via the metadata.list_datasets endpoint.
  • Venue (e.g., XCBO). For most markets, this is its ISO 10383 MIC code, which is guaranteed to be four characters long. For entities without a MIC code, this string is arbitrarily assigned by Databento and will also be four characters long.
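As an illustration of the two-part structure described above, the following minimal sketch splits a publisher string into its dataset ID and venue. The function name `split_publisher` is hypothetical and not part of any Databento client library; the sketch assumes the venue is always the final dot-separated component and is four characters long, as stated above.

```python
def split_publisher(publisher: str) -> tuple[str, str]:
    """Split a publisher string such as 'OPRA.PILLAR.XCBO' into (dataset, venue)."""
    # The dataset ID may itself contain dots, so split on the last dot only.
    dataset, sep, venue = publisher.rpartition(".")
    if not sep or not dataset or len(venue) != 4:
        raise ValueError(f"not a publisher identifier: {publisher!r}")
    return dataset, venue


print(split_publisher("OPRA.PILLAR.XCBO"))  # ('OPRA.PILLAR', 'XCBO')
```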

Instrument identifiers

All of our schemas contain an instrument_id field which is a numeric ID that maps to a given instrument. In most cases, this numeric ID is assigned by the publisher. For publishers that do not assign this value, we create a synthetic mapping for it.

instrument_id is only guaranteed to be unique within a given day. Some publishers provide a different instrument ID on different days for the same underlying instrument. Other publishers may use the same instrument ID for different underlying instruments at different points in time.

Depending on the use case, it may be easier to work with other symbology types such as raw_symbol. Our symbology documentation outlines the various symbology types we support.

Timestamps

All the timestamps in our data are expressed as the number of nanoseconds since the UNIX epoch, i.e. UNIX timestamps. All timestamp fields are prefixed with ts_. Some of our timestamps are encoded as the difference, i.e. delta, relative to another timestamp. Such timestamp fields are suffixed with _delta.

We provide four types of timestamps, through the following fields:

  • Event timestamp, ts_event
  • Publisher sending timestamp, ts_in_delta
  • Databento receive timestamp, ts_recv
  • Databento sending timestamp (live only), ts_out

UNDEF_TIMESTAMP (UINT64_MAX, 18446744073709551615) is used to denote a null or undefined timestamp.
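The sketch below shows one way to turn a nanosecond UNIX timestamp into a readable UTC string while treating UNDEF_TIMESTAMP as null. It uses only the Python standard library; the helper name `ns_to_iso` is hypothetical. Because Python's `datetime` only stores microseconds, the nanosecond remainder is formatted separately.

```python
from datetime import datetime, timezone
from typing import Optional

UNDEF_TIMESTAMP = 2**64 - 1  # UINT64_MAX, 18446744073709551615


def ns_to_iso(ts_ns: int) -> Optional[str]:
    """Format nanoseconds since the UNIX epoch as an ISO 8601 UTC string."""
    if ts_ns == UNDEF_TIMESTAMP:
        return None  # null or undefined timestamp
    # Split into whole seconds and the nanosecond remainder to keep full precision.
    seconds, nanos = divmod(ts_ns, 1_000_000_000)
    whole = datetime.fromtimestamp(seconds, tz=timezone.utc)
    return f"{whole:%Y-%m-%dT%H:%M:%S}.{nanos:09d}Z"


print(ns_to_iso(1_710_028_800_000_000_001))  # 2024-03-10T00:00:00.000000001Z
```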

The event and publisher sending timestamps are provided by the publisher (or market), and we provide their original values without any adjustment.

ts_event

Most users will only need the event timestamp, i.e. ts_event. For market data, this represents the time that the event is received by the matching engine (tag 60 in FIX encoding).

The exact location where this timestamp is taken varies with the matching engine architecture of each market. Some markets will handle different subsets of instruments on independent order gateways, while other markets will load balance the same subset of instruments across independent order gateways. Some markets take the event timestamp at the time it is received on the independent order gateways, while others may take this timestamp at the time it reaches a FIFO matching queue. In the former case, the clocks on independent order gateways are often not properly synchronized to the same clock source, so event timestamps may be non-monotonic. Since we do not adjust the publisher's timestamps, any non-monotonicity in the original data will remain.

ts_in_delta

The publisher sending timestamp represents the time when the data message associated with an event is sent (tag 52 in FIX encoding). We encode this information in ts_in_delta, which expresses the number of nanoseconds between the Databento receive timestamp (ts_recv) and the publisher sending timestamp. To get the sending timestamp itself, simply subtract ts_in_delta from ts_recv. Since the publisher and Databento are not necessarily synchronized to the same clock source, ts_in_delta may be negative.

ts_in_delta is a 32-bit signed integer. The minimum will clamp to INT32_MIN and the maximum will clamp to INT32_MAX, even if the true value exceeds these limits.
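A minimal sketch of the arithmetic described above: the publisher sending timestamp is recovered as ts_recv minus ts_in_delta. The function name is hypothetical. Note that a delta sitting exactly on an int32 limit may be a clamped value, so the sketch flags it as possibly clamped rather than asserting that it is.

```python
INT32_MIN, INT32_MAX = -(2**31), 2**31 - 1


def publisher_send_ts(ts_recv: int, ts_in_delta: int) -> tuple[int, bool]:
    """Return (sending timestamp in ns, True if the delta may have been clamped)."""
    maybe_clamped = ts_in_delta in (INT32_MIN, INT32_MAX)
    # ts_in_delta = ts_recv - sending timestamp, so invert the subtraction.
    return ts_recv - ts_in_delta, maybe_clamped


# A positive delta means the message was sent before we received it.
print(publisher_send_ts(1_000_000_000, 250_000))  # (999750000, False)
# A negative delta can occur because the two clocks are not synchronized.
print(publisher_send_ts(1_000_000_000, -500))     # (1000000500, False)
```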

Some markets do not provide both match event timestamps and sending timestamps. Often, they will provide only one of the two. In such cases, we take it that the event timestamp and sending timestamp assume the same value. As such, ts_event will be the provided timestamp and ts_in_delta will be equal to the difference between ts_recv and ts_event.

ts_recv

Unless otherwise specified, Databento receive timestamps, i.e. ts_recv, are synchronized against UTC with sub-microsecond accuracy. Moreover, these receive timestamps are guaranteed to be monotonic for any given symbol.

These receive timestamps rely on hardware timestamping on the network adapter and are synchronized against a GPS clock source using PTP. The clock is corrected by slewing the time, which prevents discrete jumps backwards in time. In other words, our local receive timestamps are guaranteed to be monotonic for any given symbol.

ts_recv is also adjusted for leap seconds. The local receive timestamp is not immediately adjusted intraday when a leap second is introduced. Instead, the leap second update is applied at the end of the market session.

ts_out

For live data, we optionally include a timestamp taken just before the data leaves our data gateways. This information is encoded as ts_out. Both ts_out and ts_recv are synced to the same GPS clock source. Subtracting ts_recv from ts_out gives the number of nanoseconds the data spent in our system.

Index timestamp

All schemas have a primary timestamp that should be used for sorting records as well as indexing into any symbology data structure. This index timestamp will be ts_recv if it exists in the schema, otherwise it will be ts_event.

When requesting historical data, the data will be filtered based on the index timestamp.
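The rule for choosing the index timestamp can be sketched as follows. Records are modeled here as plain dictionaries purely for illustration; this is an assumption of the example and not how the client libraries represent records, and `index_ts` is a hypothetical helper name.

```python
def index_ts(record: dict) -> int:
    """Return ts_recv when the record carries it, otherwise ts_event."""
    return record["ts_recv"] if "ts_recv" in record else record["ts_event"]


# Records from a schema that contains ts_recv: sorting uses ts_recv, not ts_event.
records = [
    {"ts_event": 300, "ts_recv": 320},
    {"ts_event": 305, "ts_recv": 310},
]
ordered = sorted(records, key=index_ts)
print([r["ts_recv"] for r in ordered])  # [310, 320]

# A record from a schema without ts_recv falls back to ts_event.
print(index_ts({"ts_event": 7}))  # 7
```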

When requesting data in CSV and JSON encodings, the first field will be set to this index timestamp. Additionally, for schemas that contain ts_recv, the second field will be set to ts_event.

Encodings

We support DBN, CSV, and JSON encodings for our data. DBN is an extremely fast message encoding and storage format for normalized market data. All official Databento client libraries use DBN under the hood, both as a data interchange format and for in-memory representation of data. DBN is also the default encoding for all Databento APIs, including live data streaming, historical data streaming, and batch flat files.

Our batch download system also supports CSV and JSON encodings.

Time zone

By default, all of our data is in the UTC time zone. Likewise, our site displays all dates and times in UTC by default.

Dates and times

We use the ISO 8601 date and time format to express dates and times used as parameters to our APIs. All dates and times used as parameters are in UTC by default.

The "reduced precision" concept in the ISO 8601 standard allows for dates and times to be represented with varying levels of detail. Any number of components may be dropped from a date and time representation, but only in order from the least to the most significant. For example, "2024-05" corresponds to "2024-05-01T00:00:00".

Any parameter that takes an ISO 8601 timestamp can instead be given a timestamp in nanoseconds since the UNIX epoch, as described in the Timestamps section above.

All of our timestamp parameters are start-inclusive and end-exclusive.

Forward filling end parameters

For our APIs that take an optional end parameter as an ISO 8601 string, we will implement the following behavior when the end parameter is not provided:

We will forward fill any date or time components of the associated start parameter that are omitted. This "rounds up" the start timestamp for use as the end timestamp, and is done for more concise usage.

Examples of this behavior are shown below.

Info

We will only forward fill start timestamps given at a resolution coarser than one second, i.e. with at least the seconds component omitted.

Start timestamp | Effective start timestamp | Forward filled end timestamp
"2024" | "2024-01-01T00:00:00" | "2025-01-01T00:00:00"
"2024-03" | "2024-03-01T00:00:00" | "2024-04-01T00:00:00"
"2024-03-10" | "2024-03-10T00:00:00" | "2024-03-11T00:00:00"
"2024-03-10T01" | "2024-03-10T01:00:00" | "2024-03-10T02:00:00"
"2024-03-10T00:01" | "2024-03-10T00:01:00" | "2024-03-10T00:02:00"

For example, a query for the entire month of March 2024 can be specified with start="2024-03" without an end.
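The forward-fill behavior in the table above can be approximated with the sketch below. This is an illustration of the rule, not the server's actual implementation; the function name `forward_fill` and the regular expression are assumptions of this example. The returned pair is meant to be read as a start-inclusive, end-exclusive range in UTC.

```python
import re
from datetime import datetime, timedelta
from typing import Optional

# Reduced-precision ISO 8601: components may only be dropped from the right.
_PATTERN = re.compile(
    r"^(\d{4})(?:-(\d{2})(?:-(\d{2})(?:T(\d{2})(?::(\d{2})(?::(\d{2}))?)?)?)?)?$"
)


def forward_fill(start: str) -> tuple[datetime, Optional[datetime]]:
    """Return (effective start, forward filled end) for a reduced-precision start."""
    match = _PATTERN.match(start)
    if match is None:
        raise ValueError(f"unsupported timestamp: {start!r}")
    parts = [int(p) for p in match.groups() if p is not None]
    # Pad the omitted components with their minimum values.
    defaults = [1, 1, 1, 0, 0, 0]
    year, month, day, hour, minute, second = parts + defaults[len(parts):]
    begin = datetime(year, month, day, hour, minute, second)
    given = len(parts)
    if given == 1:    # year only: roll over to the next year
        end = datetime(year + 1, 1, 1)
    elif given == 2:  # year and month: roll over to the next month
        end = datetime(year + (month == 12), month % 12 + 1, 1)
    elif given == 3:
        end = begin + timedelta(days=1)
    elif given == 4:
        end = begin + timedelta(hours=1)
    elif given == 5:
        end = begin + timedelta(minutes=1)
    else:             # second resolution: nothing is forward filled
        end = None
    return begin, end


begin, end = forward_fill("2024-03")
print(begin.isoformat(), end.isoformat())  # 2024-03-01T00:00:00 2024-04-01T00:00:00
```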

rtype

An rtype, or record type, is an unsigned 8-bit discriminant in the header of every DBN record that indicates the type of record structure. Each schema has one rtype and, by extension, one record structure associated with it.

Info

Some rtypes are not associated with a schema and are only present in live data.

Name | Hex | Decimal | Description
MBP-0 | 0x00 | 0 | A market-by-price record with a book depth of 0. Used for the trades schema.
MBP-1 | 0x01 | 1 | A market-by-price record with a book depth of 1. Used for the TBBO and MBP-1 schemas.
MBP-10 | 0x0A | 10 | A market-by-price record with a book depth of 10.
Status | 0x12 | 18 | An exchange status record.
Definition | 0x13 | 19 | An instrument definition record.
Imbalance | 0x14 | 20 | An order imbalance record.
Error | 0x15 | 21 | An error record from the live gateway.
Symbol mapping | 0x16 | 22 | A symbol mapping record from the live gateway.
System | 0x17 | 23 | A non-error record from the live gateway.
Statistics | 0x18 | 24 | A statistics record from the publisher.
OHLCV-1s | 0x20 | 32 | An OHLCV record at a 1-second cadence.
OHLCV-1m | 0x21 | 33 | An OHLCV record at a 1-minute cadence.
OHLCV-1h | 0x22 | 34 | An OHLCV record at an hourly cadence.
OHLCV-1d | 0x23 | 35 | An OHLCV record at a daily cadence.
MBO | 0xA0 | 160 | A market-by-order record.
CMBP-1 | 0xB1 | 177 | A consolidated market-by-price record with a book depth of 1.
CBBO-1s | 0xC0 | 192 | A consolidated market-by-price record with a book depth of 1 at a 1-second cadence.
CBBO-1m | 0xC1 | 193 | A consolidated market-by-price record with a book depth of 1 at a 1-minute cadence.
TCBBO | 0xC2 | 194 | A consolidated market-by-price record with a book depth of 1 with only trades.
BBO-1s | 0xC3 | 195 | A market-by-price record with a book depth of 1 at a 1-second cadence.
BBO-1m | 0xC4 | 196 | A market-by-price record with a book depth of 1 at a 1-minute cadence.
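For code that dispatches on the record type, the table above can be transcribed into an enumeration. The sketch below is one possible transcription; the class and member names (`RType`, `MBP_0`, and so on) and the helper `describe_rtype` are chosen for this example and are not necessarily the names used by the official client libraries.

```python
from enum import IntEnum


class RType(IntEnum):
    """Record type discriminants, transcribed from the table above."""
    MBP_0 = 0x00
    MBP_1 = 0x01
    MBP_10 = 0x0A
    STATUS = 0x12
    DEFINITION = 0x13
    IMBALANCE = 0x14
    ERROR = 0x15
    SYMBOL_MAPPING = 0x16
    SYSTEM = 0x17
    STATISTICS = 0x18
    OHLCV_1S = 0x20
    OHLCV_1M = 0x21
    OHLCV_1H = 0x22
    OHLCV_1D = 0x23
    MBO = 0xA0
    CMBP_1 = 0xB1
    CBBO_1S = 0xC0
    CBBO_1M = 0xC1
    TCBBO = 0xC2
    BBO_1S = 0xC3
    BBO_1M = 0xC4


def describe_rtype(value: int) -> str:
    """Map a raw 8-bit rtype to its name, tolerating values not in the table."""
    try:
        return RType(value).name
    except ValueError:
        return f"unknown (0x{value:02X})"


print(describe_rtype(160))   # MBO
print(describe_rtype(0xFF))  # unknown (0xFF)
```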

Prices

Prices are expressed as signed integers in fixed-precision format, whereby every 1 unit corresponds to 1e-9, i.e. 1/1,000,000,000 or 0.000000001. For example, a price of 5411750000000 corresponds to 5411.75 (decimal format).

When requesting data via batch download in CSV and JSON encodings, you can optionally choose for prices to be returned in decimal format. If you are requesting data using the online portal, you can select Decimal prices in the Advanced customization section. Otherwise, you can specify the pretty_px parameter in batch.submit_job using the client libraries.

Additionally, our client libraries support functionality to view prices in decimal format.

In certain scenarios, such as calendar spreads in futures, prices can be negative.

UNDEF_PRICE is used to denote a null or undefined price. It will be equal to 9223372036854775807 (INT64_MAX) when using the fixed-precision integer format. When expressed in decimal format, it will be equal to null in JSON, or "" (an empty string) in CSV.
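The fixed-precision convention can be illustrated with a small standard-library sketch. The helper name `to_decimal` is hypothetical; `Decimal` is used instead of `float` here so that the conversion is exact, which is a design choice of this example rather than something the source prescribes.

```python
from decimal import Decimal
from typing import Optional

UNDEF_PRICE = 2**63 - 1            # INT64_MAX, 9223372036854775807
PRICE_SCALE = Decimal(1_000_000_000)  # every 1 unit corresponds to 1e-9


def to_decimal(px: int) -> Optional[Decimal]:
    """Convert a fixed-precision integer price to a decimal, or None if undefined."""
    if px == UNDEF_PRICE:
        return None
    return Decimal(px) / PRICE_SCALE


print(to_decimal(5411750000000))  # 5411.75
print(to_decimal(-250000000))     # -0.25
print(to_decimal(UNDEF_PRICE))    # None
```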

Side

The side field contains information about the side of an order event. Its meaning will vary depending on the action field.

  • When action is Trade:

    • A - The trade aggressor was a seller
    • B - The trade aggressor was a buyer
    • N - No side specified
  • When action is Fill:

    • A - A resting sell order was filled
    • B - A resting buy order was filled
    • N - No side specified
  • When action is Add, Modify, or Cancel:

    • A - A resting sell order updated the book
    • B - A resting buy order updated the book
    • N - No side specified
  • When action is Clear (R), side will always be N

side can be N in the following cases:

  • The source does not disseminate a side for trades.
  • Trades happening during opening and closing auctions
  • Trades against non-displayed orders
  • Trades involving implied orders
  • Off-exchange trades

The Venues and datasets section provides more information regarding the specific cases for each dataset where no side will be specified.

Action

The action field contains information about the type of order event contained in the message.

Name | Value | Action
Add | A | Insert a new order into the book.
Modify | M | Change an order's price and/or size.
Cancel | C | Fully or partially cancel an order from the book.
Clear | R | Remove all resting orders for the instrument.
Trade | T | An aggressing order traded. Does not affect the book.
Fill | F | A resting order was filled. Does not affect the book.
None | N | No action: does not affect the book, but may carry flags or other information.

Flags

The flags field is a bit field that contains information about the message. Multiple flags can be set on a single message.

The meaning of each bit is as follows:

Flag | Value | Decimal | Description
F_LAST | 1 << 7 | 128 | Marks the last record in a single event for a given instrument_id.
F_TOB | 1 << 6 | 64 | Top-of-book message, not an individual order.
F_SNAPSHOT | 1 << 5 | 32 | Message sourced from a replay, such as a snapshot server.
F_MBP | 1 << 4 | 16 | Aggregated price level message, not an individual order.
F_BAD_TS_RECV | 1 << 3 | 8 | The ts_recv value is inaccurate due to clock issues or packet reordering.
F_MAYBE_BAD_BOOK | 1 << 2 | 4 | An unrecoverable gap was detected in the channel.
F_PUBLISHER_SPECIFIC | 1 << 1 | 2 | Semantics depend on the publisher_id. Refer to the relevant dataset supplement for more details.
(unnamed) | 1 << 0 | 1 | Reserved for internal use; can safely be ignored. May be set or unset.
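Since several flags can be set at once, the field is usually tested bit by bit. The sketch below decodes a raw flags value with Python's `enum.IntFlag`; the class name `Flags` and the helper `decode_flags` are hypothetical. Bit 0 is deliberately left out of the enumeration, so it is ignored as the table says it can be.

```python
from enum import IntFlag


class Flags(IntFlag):
    """Named flag bits, transcribed from the table above (bit 0 is reserved)."""
    F_LAST = 1 << 7
    F_TOB = 1 << 6
    F_SNAPSHOT = 1 << 5
    F_MBP = 1 << 4
    F_BAD_TS_RECV = 1 << 3
    F_MAYBE_BAD_BOOK = 1 << 2
    F_PUBLISHER_SPECIFIC = 1 << 1


def decode_flags(raw: int) -> list[str]:
    """Return the names of all named flags set in a raw 8-bit flags value."""
    return [flag.name for flag in Flags if raw & flag]


print(decode_flags(130))         # ['F_LAST', 'F_PUBLISHER_SPECIFIC']
print(decode_flags(0b10100001))  # ['F_LAST', 'F_SNAPSHOT']  (reserved bit 0 ignored)
```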

Top-of-book datasets

Some datasets are built on feeds from vendors that only provide top-of-book information (best bid and offer). Top-of-book messages are normalized into a pair of MBO records with the Add action and the F_TOB flag (0x40, 64) set. Typically for these datasets, there is no information available about the passive side of trades, so there are no Fill records and the side of the Trade record is always set to N (no side specified).

The removal of a price level is normalized as an Add action with a size of 0 and a price of UNDEF_PRICE (INT64_MAX, 9223372036854775807), which appears as NaN in Python. This indicates that there are currently no quotes for that side.
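A consumer of a top-of-book dataset could track the best bid and offer from these records roughly as sketched below. Records are modeled as plain dictionaries with integer prices, and `apply_tob` is a hypothetical helper; both are assumptions of this example rather than the client library's record types.

```python
UNDEF_PRICE = 2**63 - 1  # INT64_MAX
F_TOB = 1 << 6           # 0x40, 64


def apply_tob(bbo: dict, record: dict) -> None:
    """Update the best bid/offer in place from a top-of-book Add record."""
    if record["action"] != "A" or not record["flags"] & F_TOB:
        return  # not a top-of-book update
    if record["size"] == 0 and record["price"] == UNDEF_PRICE:
        bbo[record["side"]] = None  # the level was removed: no quote on this side
    else:
        bbo[record["side"]] = (record["price"], record["size"])


bbo = {"B": None, "A": None}
apply_tob(bbo, {"action": "A", "flags": F_TOB, "side": "B", "price": 100_000_000_000, "size": 5})
apply_tob(bbo, {"action": "A", "flags": F_TOB, "side": "A", "price": 101_000_000_000, "size": 3})
apply_tob(bbo, {"action": "A", "flags": F_TOB, "side": "A", "price": UNDEF_PRICE, "size": 0})
print(bbo)  # {'B': (100000000000, 5), 'A': None}
```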

Other schemas, such as MBP-1, Trades, and OHLCV, remain the same for top-of-book datasets.

Market-by-price datasets

Some datasets are built on feeds from vendors that only provide market-by-price information (with limited depth).

Messages adding/modifying/deleting a price level are normalized into an MBO record with the Add/Modify/Cancel action, with the size field containing the full quantity at that level and the F_MBP flag (0x10, 16) set. A price level can be identified from the combination of the side and the price. The order_id field should be ignored for those messages.

If the upstream feed has a maximum depth, an additional record with the Cancel action will be sent whenever a price level falls outside the maximum depth, even if there are still orders at that price level.
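The normalization above suggests keeping a book keyed by side and price instead of by order_id. The following sketch shows that idea with records modeled as plain dictionaries; `apply_mbp` is a hypothetical helper, and the sketch assumes that a Cancel with F_MBP set removes the whole level, which covers both deleted levels and levels that fell outside the maximum depth.

```python
F_MBP = 1 << 4  # 0x10, 16


def apply_mbp(book: dict, record: dict) -> None:
    """Update a {(side, price): size} book in place from an F_MBP record."""
    if not record["flags"] & F_MBP:
        return  # an individual order, not an aggregated price level
    level = (record["side"], record["price"])  # order_id is ignored
    if record["action"] in ("A", "M"):
        book[level] = record["size"]  # size is the full quantity at the level
    elif record["action"] == "C":
        book.pop(level, None)  # level deleted or pushed out of the visible depth


book = {}
apply_mbp(book, {"action": "A", "flags": F_MBP, "side": "B", "price": 100, "size": 10})
apply_mbp(book, {"action": "M", "flags": F_MBP, "side": "B", "price": 100, "size": 4})
apply_mbp(book, {"action": "A", "flags": F_MBP, "side": "A", "price": 101, "size": 7})
apply_mbp(book, {"action": "C", "flags": F_MBP, "side": "A", "price": 101, "size": 0})
print(book)  # {('B', 100): 4}
```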

Typically for these datasets, there is no information available about the passive side of trades, so there are no Fill records.

MBP-10 will only include depth up to the depth provided by the publisher. The remaining levels will always be empty.

Other schemas, such as Trades and OHLCV, are otherwise unaffected.