Common fields, enums and types
Publishers, datasets, and venues
We use a few different terms to describe our data:
- A dataset is a source of data.
- A venue is an exchange, OTC market (e.g., ATS, ECN) or reporting entity.
- A publisher is a specific venue from a specific dataset.
See also: Read our Venues and publishers section for a more detailed explanation.
Publisher identifiers
All of our schemas include a publisher_id field, which is a unique numeric ID assigned by Databento to each publisher.
A full list of publishers can be found using the metadata.list_publishers endpoint.
Venue and dataset identifiers
Each publisher is also assigned a string identifier (e.g., OPRA.PILLAR.XCBO), composed of two parts:
- Dataset ID (e.g., OPRA.PILLAR). This is used as the dataset argument in any API or client method. Dataset IDs can be found on the Databento portal on each dataset's details page or via the metadata.list_datasets endpoint.
- Venue (e.g., XCBO). For most markets, this is the venue's ISO 10383 MIC code, which is guaranteed to be four characters long. For entities without a MIC code, this string is arbitrarily assigned by Databento and will also be four characters long.
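As a quick illustration, the dataset ID and venue can be separated by splitting the publisher identifier at its last dot, since the dataset ID may itself contain a dot. This is a sketch only; the authoritative list comes from metadata.list_publishers.

```python
def split_publisher(publisher: str) -> tuple[str, str]:
    """Split a publisher identifier into (dataset ID, venue).

    The venue is the part after the last dot; the dataset ID
    may itself contain a dot (e.g., OPRA.PILLAR).
    """
    dataset, venue = publisher.rsplit(".", 1)
    return dataset, venue

print(split_publisher("OPRA.PILLAR.XCBO"))  # ('OPRA.PILLAR', 'XCBO')
```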
Instrument identifiers
All of our schemas contain an instrument_id field, which is a numeric ID that maps to a given instrument. In most cases, this numeric ID is assigned by the publisher. For publishers that do not assign this value, we create a synthetic mapping for it.
instrument_id is only guaranteed to be unique within a given day. Some publishers provide a different instrument ID on different days for the same underlying instrument. Other publishers may use the same instrument ID for different underlying instruments at different points in time. Depending on the use case, it may be easier to work with other symbology types such as raw_symbol.
Our symbology documentation outlines the various symbology types we support.
Timestamps
All the timestamps in our data are expressed as the number of nanoseconds since the UNIX epoch, i.e. UNIX timestamps. All timestamp fields are prefixed with ts_. Some of our timestamps are encoded as the difference, i.e. delta, relative to another timestamp. Such timestamp fields are suffixed with _delta.
We provide four types of timestamps, through the following fields:
- Event timestamp, ts_event
- Publisher sending timestamp, ts_in_delta
- Databento receive timestamp, ts_recv
- Databento sending timestamp (live only), ts_out
UNDEF_TIMESTAMP (UINT64_MAX, 18446744073709551615) is used to denote a null or undefined timestamp.
The event and publisher sending timestamps are provided by the publisher (or market), and we provide their original values without any adjustment.
ts_event
Most users will only need the event timestamp, i.e. ts_event. For market data, this represents the time that the event is received by the matching engine (tag 60 in FIX encoding).
The exact location where this timestamp is taken varies with the matching engine architecture of each market. Some markets handle different subsets of instruments on independent order gateways, while other markets load balance the same subset of instruments across independent order gateways. Some markets take the event timestamp at the time it is received on the independent order gateways, while others may take this timestamp at the time it reaches a FIFO matching queue. In the former case, the clocks on independent order gateways are often not properly synchronized to the same clock source. Since we do not adjust the publisher's timestamps, any non-monotonicity in the original data will remain.
ts_in_delta
The publisher sending timestamp represents the time when the data message associated with an event is sent (tag 52 in FIX encoding). We encode this information in ts_in_delta, which expresses the number of nanoseconds between the Databento receive timestamp (ts_recv) and the publisher sending timestamp. To get the sending timestamp itself, subtract ts_in_delta from ts_recv. Since the publisher and Databento are not necessarily synchronized to the same clock source, ts_in_delta may be negative.
ts_in_delta is a 32-bit signed integer. The minimum will clamp to INT32_MIN and the maximum will clamp to INT32_MAX, even if the true value exceeds these limits.
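To illustrate, the publisher sending timestamp can be recovered from ts_recv and ts_in_delta as follows. The nanosecond values here are made up for the example.

```python
def sending_timestamp(ts_recv: int, ts_in_delta: int) -> int:
    """Recover the publisher sending timestamp (ns since the UNIX epoch)
    by subtracting ts_in_delta from ts_recv."""
    return ts_recv - ts_in_delta

# Hypothetical values: received 1,500 ns after the publisher sent the message.
print(sending_timestamp(1_700_000_000_000_001_500, 1_500))  # 1700000000000000000
```

Note that a negative ts_in_delta yields a sending timestamp later than ts_recv, which can happen when the publisher's clock runs ahead of ours.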
Some markets do not provide both match event timestamps and sending timestamps; often, they provide only one of the two. In such cases, we treat the event timestamp and the sending timestamp as having the same value. As such, ts_event will be the provided timestamp, and ts_in_delta will be equal to the difference between ts_recv and ts_event.
ts_recv
Unless otherwise specified, Databento receive timestamps, i.e. ts_recv, are synchronized against UTC with sub-microsecond accuracy. Moreover, these receive timestamps are always guaranteed to be monotonic for any given symbol.
These receive timestamps rely on hardware timestamping on the network adapter and are synchronized against a GPS clock source using PTP. The clock is corrected by slewing the time, which prevents discrete jumps backwards in time. In other words, our local receive timestamps are guaranteed to be monotonic for any given symbol.
ts_recv is also adjusted for leap seconds. The local receive timestamp is not immediately adjusted intraday when a leap second is introduced. Instead, the leap second update is applied at the end of the market session.
ts_out
For live data, we optionally include a timestamp taken just before the data leaves our data gateways. This information is encoded as ts_out. Both ts_out and ts_recv are synced to the same GPS clock source. Subtracting ts_recv from ts_out gives the number of nanoseconds spent in our system.
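For instance, the time a record spent inside Databento's system can be computed per record. The values below are hypothetical.

```python
def time_in_system_ns(ts_recv: int, ts_out: int) -> int:
    """Nanoseconds spent in Databento's system: ts_out - ts_recv.

    Both timestamps are synced to the same GPS clock source, so the
    difference is meaningful."""
    return ts_out - ts_recv

print(time_in_system_ns(1_000, 4_500))  # 3500
```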
Index timestamp
All schemas have a primary timestamp that should be used for sorting records as well as indexing into any symbology data structure. This index timestamp will be ts_recv if it exists in the schema; otherwise, it will be ts_event.
When requesting historical data, the data will be filtered based on the index timestamp. When requesting data in CSV and JSON encodings, the first field will be set to this index timestamp. Additionally, for schemas that contain ts_recv, the second field will be set to ts_event.
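Selecting the index timestamp for sorting can be sketched as follows, using plain dicts as stand-in records.

```python
def index_ts(rec: dict) -> int:
    """Return ts_recv when the schema has it, otherwise ts_event."""
    return rec.get("ts_recv", rec["ts_event"])

records = [
    {"ts_event": 5, "ts_recv": 9},
    {"ts_event": 7},                # schema without ts_recv
    {"ts_event": 1, "ts_recv": 3},
]
records.sort(key=index_ts)
print([index_ts(r) for r in records])  # [3, 7, 9]
```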
Encodings
We support DBN, CSV, and JSON encodings for our data. DBN is an extremely fast message encoding and storage format for normalized market data. All official Databento client libraries use DBN under the hood, both as a data interchange format and for in-memory representation of data. DBN is also the default encoding for all Databento APIs, including live data streaming, historical data streaming, and batch flat files.
Our batch download system also supports CSV and JSON encodings.
Time zone
By default, all of our data is expressed in the UTC time zone. Likewise, our site displays all dates and times in UTC by default.
Dates and times
We use the ISO 8601 date and time format to express dates and times used as parameters to our APIs. All dates and times used as parameters are in UTC by default.
The "reduced precision" concept in the ISO 8601 standard allows dates and times to be represented with varying levels of detail. Components may be omitted from a date or time representation, starting from the least significant. For example, "2024-05" corresponds to "2024-05-01T00:00:00".
Any parameter that takes an ISO 8601 timestamp can instead be given a timestamp in nanoseconds since the UNIX epoch, as described in the above section.
All of our timestamp parameters are start-inclusive and end-exclusive.
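As an illustration, a reduced-precision ISO 8601 string can be converted to nanoseconds since the UNIX epoch with the standard library. This is a sketch covering a few common precisions, not an exhaustive ISO 8601 parser.

```python
from datetime import datetime, timezone

def iso_to_unix_ns(value: str) -> int:
    """Parse a (possibly reduced-precision) ISO 8601 string as UTC and
    return nanoseconds since the UNIX epoch."""
    for fmt in ("%Y", "%Y-%m", "%Y-%m-%d", "%Y-%m-%dT%H:%M:%S"):
        try:
            dt = datetime.strptime(value, fmt).replace(tzinfo=timezone.utc)
        except ValueError:
            continue  # try the next, more detailed format
        return int(dt.timestamp()) * 1_000_000_000
    raise ValueError(f"unsupported timestamp: {value!r}")

print(iso_to_unix_ns("2024-05"))  # 1714521600000000000
```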
Forward filling end parameters
For our APIs that take an optional end parameter as an ISO 8601 string, the following behavior applies when the end parameter is not provided: we forward fill any date or time components of the associated start parameter that are omitted. This "rounds up" the start timestamp for use as the end timestamp, allowing more concise usage. Examples of this behavior are shown below.
Info: We will only forward fill timestamps with less than one-second resolution.
Start timestamp | Effective start timestamp | Forward filled end timestamp |
---|---|---|
"2024" | "2024-01-01T00:00:00" | "2025-01-01T00:00:00" |
"2024-03" | "2024-03-01T00:00:00" | "2024-04-01T00:00:00" |
"2024-03-10" | "2024-03-10T00:00:00" | "2024-03-11T00:00:00" |
"2024-03-10T01" | "2024-03-10T01:00:00" | "2024-03-10T02:00:00" |
"2024-03-10T00:01" | "2024-03-10T00:01:00" | "2024-03-10T00:02:00" |
For example, a query for the entire month of March 2024 can be specified with start="2024-03" without an end.
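The forward-fill behavior in the table above can be sketched as follows. This is an illustration of the documented behavior, not the server's actual implementation.

```python
from datetime import datetime, timedelta

def forward_fill_end(start: str) -> str:
    """Derive the effective end from a reduced-precision ISO 8601 start by
    incrementing its least significant provided component."""
    formats = [
        ("%Y", "years"),
        ("%Y-%m", "months"),
        ("%Y-%m-%d", "days"),
        ("%Y-%m-%dT%H", "hours"),
        ("%Y-%m-%dT%H:%M", "minutes"),
    ]
    for fmt, unit in formats:
        try:
            dt = datetime.strptime(start, fmt)
        except ValueError:
            continue  # not this precision; try the next format
        if unit == "years":
            end = dt.replace(year=dt.year + 1)
        elif unit == "months":
            end = dt.replace(year=dt.year + (dt.month == 12), month=dt.month % 12 + 1)
        elif unit == "days":
            end = dt + timedelta(days=1)
        elif unit == "hours":
            end = dt + timedelta(hours=1)
        else:
            end = dt + timedelta(minutes=1)
        return end.strftime("%Y-%m-%dT%H:%M:%S")
    raise ValueError(f"unsupported timestamp: {start!r}")

print(forward_fill_end("2024-03"))  # 2024-04-01T00:00:00
```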
rtype
An rtype or record type is an unsigned 8-bit discriminant in the header of every DBN record that indicates the type of record structure. Each schema has one rtype and by extension one record structure associated with it.
Info: Some rtypes are not associated with a schema and are only present in live data.
Name | Hex | Decimal | Description |
---|---|---|---|
MBP-0 | 0x00 | 0 | A market-by-price record with a book depth of 0. Used for the trades schema. |
MBP-1 | 0x01 | 1 | A market-by-price record with a book depth of 1. Used for the TBBO and MBP-1 schemas. |
MBP-10 | 0x0A | 10 | A market-by-price record with a book depth of 10. |
Status | 0x12 | 18 | An exchange status record. |
Definition | 0x13 | 19 | An instrument definition record. |
Imbalance | 0x14 | 20 | An order imbalance record. |
Error | 0x15 | 21 | An error record from the live gateway. |
Symbol mapping | 0x16 | 22 | A symbol mapping record from the live gateway. |
System | 0x17 | 23 | A non-error record from the live gateway. |
Statistics | 0x18 | 24 | A statistics record from the publisher. |
OHLCV-1s | 0x20 | 32 | An OHLCV record at a 1-second cadence. |
OHLCV-1m | 0x21 | 33 | An OHLCV record at a 1-minute cadence. |
OHLCV-1h | 0x22 | 34 | An OHLCV record at an hourly cadence. |
OHLCV-1d | 0x23 | 35 | An OHLCV record at a daily cadence. |
MBO | 0xA0 | 160 | A market-by-order record. |
CMBP-1 | 0xB1 | 177 | A consolidated market-by-price record with a book depth of 1. |
CBBO-1s | 0xC0 | 192 | A consolidated market-by-price record with a book depth of 1 at a 1-second cadence. |
CBBO-1m | 0xC1 | 193 | A consolidated market-by-price record with a book depth of 1 at a 1-minute cadence. |
TCBBO | 0xC2 | 194 | A consolidated market-by-price record with a book depth of 1, containing only trades. |
BBO-1s | 0xC3 | 195 | A market-by-price record with a book depth of 1 at a 1-second cadence. |
BBO-1m | 0xC4 | 196 | A market-by-price record with a book depth of 1 at a 1-minute cadence. |
Prices
Prices are expressed as signed integers in fixed-precision format, whereby every 1 unit corresponds to 1e-9, i.e. 1/1,000,000,000 or 0.000000001. For example, a price of 5411750000000 corresponds to 5411.75 (decimal format).
When requesting data via batch download in CSV and JSON encodings, you can optionally choose for prices to be returned in decimal format.
If you are requesting data using the online portal, you can select Decimal prices in the Advanced customization section.
Otherwise, you can specify the pretty_px parameter in batch.submit_job using the client libraries.
Additionally, our client libraries support functionality to view prices in decimal format.
In certain scenarios, such as calendar spreads in futures, prices can be negative.
UNDEF_PRICE is used to denote a null or undefined price. It will be equal to 9223372036854775807 (INT64_MAX) when using the fixed-precision integer format. When expressed in decimal format, it will be equal to null in JSON, or "" (an empty string) in CSV.
Side
The side field contains information about the side of an order event. Its meaning will vary depending on the action field.
When action is Trade:
- A - The trade aggressor was a seller
- B - The trade aggressor was a buyer
- N - No side specified
When action is Fill:
- A - A resting sell order was filled
- B - A resting buy order was filled
- N - No side specified
When action is Add, Modify, or Cancel:
- A - A resting sell order updated the book
- B - A resting buy order updated the book
- N - No side specified
When action is Clear, side will always be N.
side can be N in the following cases:
- The source does not disseminate a side for trades
- Trades happening during opening and closing auctions
- Trades against non-displayed orders
- Trades involving implied orders
- Off-exchange trades
The Venues and datasets section provides more information regarding the specific cases for each dataset where no side will be specified.
Action
The action
field contains information about the type of order event contained in the message.
Name | Value | Description |
---|---|---|
Add | A | Insert a new order into the book. |
Modify | M | Change an order's price and/or size. |
Cancel | C | Fully or partially cancel an order from the book. |
Clear | R | Remove all resting orders for the instrument. |
Trade | T | An aggressing order traded. Does not affect the book. |
Fill | F | A resting order was filled. Does not affect the book. |
None | N | No action: does not affect the book, but may carry flags or other information. |
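As an illustration of these actions, a toy order book keyed by order ID could be updated as below. This is a sketch only: field names follow the conventions above, and Cancel is treated as a full removal for simplicity.

```python
def apply_action(book: dict, rec: dict) -> None:
    """Apply one MBO-style record to a toy book keyed by order_id."""
    action = rec["action"]
    if action in ("A", "M"):       # Add or Modify: upsert the order
        book[rec["order_id"]] = (rec["side"], rec["price"], rec["size"])
    elif action == "C":            # Cancel: remove the order (full cancel assumed)
        book.pop(rec["order_id"], None)
    elif action == "R":            # Clear: remove all resting orders
        book.clear()
    # T (Trade), F (Fill), and N (None) do not modify the book

book: dict = {}
apply_action(book, {"action": "A", "order_id": 1, "side": "B", "price": 100, "size": 5})
apply_action(book, {"action": "C", "order_id": 1})
print(book)  # {}
```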
Flags
The flags field is a bit field that contains information about the message. Multiple flags can be set on a single message. The meaning of each bit is as follows:
Flag | Value | Decimal | Description |
---|---|---|---|
F_LAST | 1 << 7 | 128 | Marks the last record in a single event for a given instrument_id. |
F_TOB | 1 << 6 | 64 | Top-of-book message, not an individual order. |
F_SNAPSHOT | 1 << 5 | 32 | Message sourced from a replay, such as a snapshot server. |
F_MBP | 1 << 4 | 16 | Aggregated price level message, not an individual order. |
F_BAD_TS_RECV | 1 << 3 | 8 | The ts_recv value is inaccurate due to clock issues or packet reordering. |
F_MAYBE_BAD_BOOK | 1 << 2 | 4 | An unrecoverable gap was detected in the channel. |
F_PUBLISHER_SPECIFIC | 1 << 1 | 2 | Semantics depend on the publisher_id. Refer to the relevant dataset supplement for more details. |
(Reserved) | 1 << 0 | 1 | Reserved for internal use and can safely be ignored. May be set or unset. |
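Individual flags can be tested with a bitwise AND, for example:

```python
# Flag bits as documented above
F_LAST = 1 << 7
F_TOB = 1 << 6
F_SNAPSHOT = 1 << 5
F_MBP = 1 << 4
F_BAD_TS_RECV = 1 << 3
F_MAYBE_BAD_BOOK = 1 << 2

def has_flag(flags: int, flag: int) -> bool:
    """True if the given flag bit is set in the flags field."""
    return flags & flag != 0

flags = F_LAST | F_MBP  # 144: last record of the event, aggregated price level
print(has_flag(flags, F_LAST), has_flag(flags, F_TOB))  # True False
```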
Top-of-book datasets
Some datasets are built on feeds from vendors that only provide top-of-book information (best bid and offer).
Top-of-book messages are normalized into a pair of MBO records with the Add action and the F_TOB flag (0x40, 64) set.
Typically for these datasets, there is no information available about the passive side of trades, so there are no Fill records and the side of the Trade record is always set to None.
The removal of a price level is normalized as an Add action with a size of 0 and a price of UNDEF_PRICE (INT64_MAX, 9223372036854775807), or NaN in Python. This indicates that there are currently no quotes for that side.
Other schemas, such as MBP-1, Trades, and OHLCV, remain the same for top-of-book datasets.
Market-by-price datasets
Some datasets are built on feeds from vendors that only provide market-by-price information (with limited depth).
Messages adding, modifying, or deleting a price level are normalized into an MBO record with the Add/Modify/Cancel action, with the size field containing the full quantity at that level and the F_MBP flag (0x10, 16) set. A price level can be identified from the combination of the side and the price. The order_id field should be ignored for these messages.
If the upstream feed has a maximum depth, an additional record with the Cancel action will be sent whenever a price level falls outside the maximum depth, even if there are still orders at that price level.
Typically for these datasets, there is no information available about the passive side of trades, so there are no Fill records.
MBP-10 will only include depth up to the depth provided by the publisher. The remaining levels will always be empty.
Other schemas, such as Trades and OHLCV, are otherwise unaffected.
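A toy view of this normalization keeps price levels keyed by (side, price), where each F_MBP-flagged MBO record carries the full quantity at its level. This is a sketch, not the client libraries' book builder.

```python
def apply_level(levels: dict, rec: dict) -> None:
    """Apply an F_MBP-flagged MBO record to a map of (side, price) -> size."""
    key = (rec["side"], rec["price"])
    if rec["action"] in ("A", "M"):
        levels[key] = rec["size"]      # full quantity at this price level
    elif rec["action"] == "C":
        levels.pop(key, None)          # level removed, or fell outside max depth

levels: dict = {}
apply_level(levels, {"action": "A", "side": "B", "price": 100, "size": 7})
apply_level(levels, {"action": "M", "side": "B", "price": 100, "size": 3})
print(levels)  # {('B', 100): 3}
```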