API reference - Live
You can receive live data from Databento via our live APIs, namely the Databento Raw API.
The Raw API is a simple socket-based, subscription-style protocol. Clients communicate with our live data gateways through a regular TCP/IP socket. To make it easier to integrate the API, we also provide official client libraries that simplify the code you need to write.
You can use our live APIs to subscribe to real-time data in your application. You can also use the APIs to request intraday historical data and instrument definitions for any number of products in a venue.
Overview
The Raw API is a proprietary wire protocol used between Raw API clients and our live data gateways. The following specification of this protocol prescribes encoding and decoding, a data model, valid message structures, and messaging behavior. This protocol is strictly an application layer protocol and relies on TCP for transport.
The protocol's main uses are subscribing to live market data and performing intraday playback for data from the last 24 hours.
Host and port
The Live API host to connect to depends on the dataset you'd like to subscribe to. To get the Live API host for a dataset:
- Take the dataset ID, e.g. GLBX.MDP3, and convert it to lowercase
- Replace periods with dashes, e.g. glbx.mdp3 becomes glbx-mdp3
- Append .lsg.databento.com
The Live API port is always 13000.
For example, to connect to GLBX.MDP3, you'd open a TCP connection with glbx-mdp3.lsg.databento.com:13000.
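The steps above can be sketched in Python (a minimal helper; the function name is ours):

```python
def live_gateway(dataset_id: str) -> str:
    """Derive the Live API gateway address for a dataset:
    lowercase the dataset ID, replace periods with dashes,
    and append the lsg.databento.com domain and port 13000."""
    host = dataset_id.lower().replace(".", "-")
    return f"{host}.lsg.databento.com:13000"
```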
Warning: Please connect to the public DNS hostname and not the resolved IP, as this may change.
Authentication
Databento uses API keys to authenticate requests. You can view and manage your keys on the API Keys page of your portal.
Each API key is a 32-character string.
Our API relies on a challenge-response authentication mechanism (CRAM) to ensure your API key is never sent over an unsecured network.
To authenticate, the server sends a challenge request message, to which the client must reply with an authentication request message. The server will then reply with an authentication response message to indicate whether the authentication succeeded.
For a detailed description of the algorithm, see the authentication message flow.
Schemas and conventions
A schema is a data record format represented as a collection of different data fields. Our datasets support multiple schemas, such as order book, tick data, and bar aggregates. You can see the full list in our List of market data schemas.
You can get a list of all supported schemas for any given dataset using the MetadataListSchemas method. The same information can also be found on each dataset's detail page, accessible through the Explore feature.
The following table provides details about the data types and conventions used for various fields that you will commonly encounter in the data.
| Name | Field | Description |
|---|---|---|
| Dataset | dataset | A unique string name assigned to each dataset by Databento. The full list of datasets can be found from the metadata. |
| Publisher ID | publisher_id | A unique 16-bit unsigned integer assigned to each publisher by Databento. The full list of publisher IDs can be found from the metadata. |
| Instrument ID | instrument_id | A unique 32-bit unsigned integer assigned to each instrument by the venue. Information about instrument IDs for any given dataset can be found in the symbology. |
| Order ID | order_id | A unique 64-bit unsigned integer assigned to each order by the venue. |
| Timestamp (event) | ts_event | The matching-engine-received timestamp expressed as the number of nanoseconds since the UNIX epoch. |
| Timestamp (receive) | ts_recv | The capture-server-received timestamp expressed as the number of nanoseconds since the UNIX epoch. |
| Timestamp delta (in) | ts_in_delta | The matching-engine-sending timestamp expressed as the number of nanoseconds before ts_recv. See the timestamping guide. |
| Timestamp out | ts_out | The Databento gateway-sending timestamp expressed as the number of nanoseconds since the UNIX epoch. See the timestamping guide. |
| Price | price | The price expressed as a signed integer where every 1 unit corresponds to 1e-9, i.e. 1/1,000,000,000 or 0.000000001. |
| Book side | side | The side that initiates the event. Can be Ask for a sell order (or sell aggressor in a trade), Bid for a buy order (or buy aggressor in a trade), or None where no side is specified by the original source. |
| Size | size | The order quantity. |
| Flag | flag | A bit field indicating event end, message characteristics, and data quality. |
| Action | action | The event type or order book operation. Can be Add, Cancel, Modify, cleaR book, Trade, Fill, or None. |
| Sequence number | sequence | The original message sequence number from the venue. |
Datasets
Databento provides time series datasets for a variety of markets, sourced from different publishers. Our available datasets can be browsed through the search feature on our site.
Each dataset is assigned a unique string identifier (dataset ID) in the form PUBLISHER.DATASET, such as GLBX.MDP3.
For publishers that are also markets, we use standard four-character ISO 10383 Market Identifier Codes (MIC).
Otherwise, Databento arbitrarily assigns a four-character identifier for the publisher.
These dataset IDs are also found on the Data catalog and Download request features of the Databento user portal.
When a publisher provides multiple data products with different levels of granularity, Databento subscribes to the most-granular product. We then provide this dataset with alternate schemas to make it easy to work with the level of detail most appropriate for your application.
More information about different types of venues and publishers is available in our Knowledge base.
Symbology
Databento's live API supports several ways to select an instrument in a dataset. An instrument is specified using a symbol and a symbology type, also referred to as an stype. The supported symbology types are:
- Raw symbology (raw_symbol): the original string symbols used by the publisher in the source data.
- Instrument ID symbology (instrument_id): a unique numeric ID assigned to each instrument by the publisher.
- Parent symbology (parent): groups instruments related to the market for the same underlying.
- Continuous contract symbology (continuous): proprietary symbology that specifies instruments based on certain systematic rules.
Info: In the live API, existing subscriptions to continuous contracts will not be remapped to different instruments. However, submitting a new identical subscription may result in a new mapping.
When subscribing to live data, an input symbology type can be specified. By default, our client libraries will use raw symbology for the input type. Not all symbology types are supported for every dataset.
For live data, symbology mappings are provided through SymbolMappingMsg records.
These records are sent after the session has started and can be used to map the instrument_id from a data record's header to a text symbol.
For more about symbology at Databento, see our Standards and conventions.
Dates and times
Timestamps returned with live data are 64-bit integers representing UNIX nanoseconds.
These timestamps are always in UTC. To localize from other timezones, the conversion to and from UTC should be implemented on the client side.
The raw API control messages support UNIX nanoseconds and datetime string formatting based on ISO 8601 as listed below. All times are given in UTC, and the timezone cannot be set.
- yyyy-mm-dd, e.g. "2022-02-28" (midnight UTC)
- yyyy-mm-ddTHH:MM, e.g. "2022-02-28T23:50"
- yyyy-mm-ddTHH:MM:SS, e.g. "2022-02-28T23:50:59"
- yyyy-mm-ddTHH:MM:SS.NNNNNNNNN, e.g. "2022-02-28T23:50:59.123456789"
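Converting between the two accepted forms is straightforward; a Python sketch (the helper name is ours) that renders UNIX nanoseconds as one of the ISO 8601 strings above:

```python
from datetime import datetime, timezone

def ns_to_iso(ts_ns: int) -> str:
    """Render a UNIX-nanosecond timestamp as an ISO 8601 UTC string,
    including the nanosecond part only when it is nonzero."""
    secs, ns = divmod(ts_ns, 1_000_000_000)
    base = datetime.fromtimestamp(secs, tz=timezone.utc).strftime("%Y-%m-%dT%H:%M:%S")
    return f"{base}.{ns:09d}" if ns else base
```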
Intraday replay
Our live API offers intraday replay within the last 24 hours.
Users can connect to the live service and request data from a particular start time.
Data will be filtered on ts_event for all schemas except CBBO and BBO, which will be filtered on ts_recv.
A different start time can be specified for each subscription. There can be multiple subscriptions for the same schema, with each subscription having a different start time. When the session starts, records will be immediately replayed for each schema. A replay completed SystemMsg will be sent for each replayed schema once it catches up to real-time data. Once a session has started, newly added subscriptions are not eligible for intraday replay.
As a special case for the GLBX.MDP3 dataset, we also provide replay of the entire weekly session in the MBO and definition schemas
outside of the 24-hour window to aid with recovery, as these schemas are highly stateful.
The Raw API accepts an ISO 8601 string or a UNIX timestamp in nanoseconds for the start time. Refer to the Dates and times article for more information on how our Raw API handles timestamps.
System messages
Our live API uses a system record (SystemMsg) to indicate non-error information to clients.
One use is heartbeating, to ensure the TCP connection remains open and to help clients detect a connection issue.
A heartbeat will only be sent if no other data record was sent to the client during the heartbeat interval.
The interval between heartbeat messages can be configured by setting heartbeat_interval_s in the Authentication request.
For heartbeats, msg will be "Heartbeat".
The layout of these records changed in DBN version 2.
You can use the version field in the Metadata header to check the DBN version.
Version 1
| Field | Type | Description |
|---|---|---|
| msg | char[64] | The message from the gateway. |
Info: For datasets on DBN version 1 where code is not populated, you should parse the msg field to determine the type of message.
Versions 2 and above
| Field | Type | Description |
|---|---|---|
| msg | char[303] | The message from the gateway. |
| code | uint8_t | Describes the type of system message. See table below. |
System code variants
| Variant | code | Description |
|---|---|---|
| HEARTBEAT | 0 | A message sent in the absence of other records to indicate the connection remains open. |
| SUBSCRIPTION_ACK | 1 | An acknowledgement of a subscription request. |
| SLOW_READER_WARNING | 2 | The gateway has detected this session is falling behind real-time data. |
| REPLAY_COMPLETED | 3 | Indicates a replay subscription has caught up with real-time data. This message will be sent per schema. |
| END_OF_INTERVAL | 4 | Signals that all records for interval-based schemas have been published for the given timestamp. |
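The code variants above map naturally to an enum when decoding system messages; a Python sketch (the class name and Python-style member naming are ours):

```python
from enum import IntEnum

class SystemCode(IntEnum):
    """System message code variants (DBN version 2 and above)."""
    HEARTBEAT = 0
    SUBSCRIPTION_ACK = 1
    SLOW_READER_WARNING = 2
    REPLAY_COMPLETED = 3
    END_OF_INTERVAL = 4
```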
Errors
Our live API uses an error record (ErrorMsg) to indicate failures to clients.
These are sent as data records to the client after the session has been started.
The layout of these records changed in DBN version 2.
You can use the version field in the Metadata header to check the DBN version.
Version 1
| Field | Type | Description |
|---|---|---|
| err | char[64] | The error message. |
Info: For datasets on DBN version 1 where code is not populated, you should parse the err field to determine the type of message.
Versions 2 and above
| Field | Type | Description |
|---|---|---|
| err | char[302] | The error message. |
| code | uint8_t | Describes the type of error message. See table below. |
| is_last | uint8_t | Boolean flag indicating whether this is the last in a series of error records. |
Error code variants
| Variant | code | Description |
|---|---|---|
| AuthFailed | 1 | The authentication step failed. |
| ApiKeyDeactivated | 2 | The user account or API key was deactivated. |
| ConnectionLimitExceeded | 3 | The user has exceeded their open connection limit. |
| SymbolResolutionFailed | 4 | One or more symbols failed to resolve. |
| InvalidSubscription | 5 | There was an issue with a subscription request (other than symbol resolution). |
| InternalError | 6 | An error occurred in the gateway. |
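As with system codes, the error variants can be represented as an enum for decoding (the class name and Python-style member naming are ours):

```python
from enum import IntEnum

class ErrorCode(IntEnum):
    """Error message code variants (DBN version 2 and above)."""
    AUTH_FAILED = 1
    API_KEY_DEACTIVATED = 2
    CONNECTION_LIMIT_EXCEEDED = 3
    SYMBOL_RESOLUTION_FAILED = 4
    INVALID_SUBSCRIPTION = 5
    INTERNAL_ERROR = 6
```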
Connection limits
With our live API, there is a limit of 10 simultaneous connections (sessions) per dataset per team for Usage-based and Standard plans. Unlimited and Enterprise plans are limited to 50 simultaneous connections per dataset per team. Creating additional API keys will not affect the maximum number of connections per team.
In addition, a single gateway will allow at most five incoming connections per second from the same IP address. If an IP address goes over this limit, incoming connections will be immediately closed by the gateway - existing connections will not be affected. If this happens, clients should wait one second before retrying.
Subscription rate limits
Symbol resolution is a relatively slow operation; as such, subscription requests are throttled to prevent abuse and accidental performance impact on other users. Subscriptions above the limit of 3 per second will not be rejected; instead, the gateway will wait until the session is back under the rate limit before processing them. The gateway will send a subscription acknowledgement when it has finished processing a subscription request.
Metered pricing
Databento only charges for the data that you use. You can find rates (per MB) for the various datasets and estimate pricing on our Data catalog. We meter the data by its uncompressed size in binary encoding.
When you stream the data, you are billed incrementally for each outbound byte of data sent from our live gateway.
If your connection becomes unresponsive while streaming our data, our live gateway will send data up to the TCP connection's receive window size, and you will not be billed for data over this limit.
Duplicate subscriptions within the same session will be deduplicated and not incur additional charges. Separate sessions with identical subscriptions will incur repeated charges.
Access to metadata, symbology, and account management is free.
Related: Billing management.
Encodings
DBN
Databento Binary Encoding (DBN) is an extremely fast message encoding and highly-compressible storage format for normalized market data. It includes a self-describing metadata header and adopts a binary format with zero-copy serialization.
We recommend using our Python, C++, or Rust client libraries to read DBN files locally. A CLI tool is also available for converting DBN files to CSV or JSON.
JSON
JavaScript Object Notation (JSON) is a flexible text file format with broad language support and wide adoption across web apps.
Our JSON data records follow the JSON lines specification, where
each line of the file is a JSON record.
Lines use UNIX-style \n separators.
Compression
Databento provides options for compressing files from our API.
zstd
The zstd compression option uses the Zstandard format.
This adds a slight performance cost to encoding and decoding.
Read more about working with Zstandard-compressed files.
none
The none compression option disables compression entirely, resulting in larger data transfers.
By default, live data is uncompressed.
Protocol
The Raw API protocol is divided into two main parts: control messages and data records.
All messages sent by the client to the gateway are control messages.
Control messages
Control messages are used for authentication and subscribing, and are encoded
as ASCII text. A single control message is composed of a series of key=value
pairs (referred to as fields) separated by the | character.
Every control message terminates with a newline character (\n).
A control message can have optional and required fields. A control message which contains an unknown field or does not contain a required field will be considered invalid by the gateway. Control messages are limited to 64KiB; messages exceeding this length are invalid. After an invalid control message, the gateway will terminate the session with an error message.
An example of a correctly-formatted subscription request control message is schema=trades|stype_in=raw_symbol|symbols=SPY\n.
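A control message like the one above can be assembled programmatically; a minimal Python sketch (the helper name is ours):

```python
def encode_control(**fields: str) -> bytes:
    """Encode key=value fields as a pipe-delimited, newline-terminated
    ASCII control message."""
    body = "|".join(f"{key}={value}" for key, value in fields.items())
    return f"{body}\n".encode("ascii")
```

For example, encode_control(schema="trades", stype_in="raw_symbol", symbols="SPY") produces the subscription request shown above.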
Data records
Once the client has authenticated with the gateway, the gateway will not send any new control messages; all subsequent messages will be data records.
If compression is enabled, the data record stream will need to be decompressed with the compression algorithm of choice before decoding.
Data records are sent in the encoding specified during authentication. Unlike in the historical API, the CSV encoding is not supported.
In the DBN encoding, the records will be sequentially streamed.
In the JSON encoding, the records will follow the JSON lines specification,
where each line of the file is a JSON object. Lines use UNIX-style \n separators.
The ts_out parameter
If the client chose to set the ts_out parameter during authentication, the gateway will send its send
timestamp with every data record. The format varies by encoding.
In the dbn encoding, the ts_out is a 64-bit unsigned integer appended to every data record.
In the json encoding, the ts_out is added as an additional field within each JSON object.
Error detection
When maintaining a Raw API connection, clients should monitor their connection for errors.
There are three main ways in which a session can enter an error state:
- Hung connection: The client is not receiving any data from the gateway.
- Disconnect without error: The client is explicitly disconnected by the gateway without receiving an error message.
- Disconnect with error: The client is explicitly disconnected by the gateway. Immediately prior to being disconnected, the client will receive an error record or a Raw API error response (that is, one containing success=0).
Hung connection
To detect hung connections, clients are instructed to make use of system heartbeats.
Clients can configure a heartbeat interval when authenticating by setting the heartbeat_interval_s parameter on the authentication request message.
If the heartbeat interval is not set by the client, it will default to 30 seconds.
Once a session is started, if no data is sent by the gateway for the entirety of a heartbeat interval, the gateway will send a system message to the client to indicate a heartbeat.
If the gateway is regularly sending other messages to the client (for example, MboMsg), heartbeats will not be sent.
Once a session is started, if a client does not receive any messages from the gateway for the duration of one heartbeat interval plus two seconds, the session can be considered hung. Clients are instructed to disconnect from the gateway and reconnect upon detecting hung connections.
Clients with unstable internet connections may need larger intervals than two seconds to ensure the connection is truly hung, as opposed to merely delayed.
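The detection rule above can be expressed as a small predicate (a sketch; the names are ours, and the grace period may need enlarging on unstable connections as noted):

```python
DEFAULT_HEARTBEAT_INTERVAL_S = 30.0
GRACE_PERIOD_S = 2.0  # consider a larger value on unstable connections

def is_hung(seconds_since_last_msg: float,
            heartbeat_interval_s: float = DEFAULT_HEARTBEAT_INTERVAL_S) -> bool:
    """A session is considered hung once no message has arrived for
    one heartbeat interval plus the grace period."""
    return seconds_since_last_msg > heartbeat_interval_s + GRACE_PERIOD_S
```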
Disconnect without error
From the point of view of the client, a disconnect is detected when the underlying TCP session is closed. Upon being disconnected, clients are instructed to wait one second and initiate a new connection. Waiting too short an interval to reconnect may trigger the gateway's rate limiter.
See also: Connection limits for more details.
Disconnect with error
From the point of view of the client, a disconnect with error is detected when the underlying TCP session is closed after an ErrorMsg or a Raw API error response is received.
Clients disconnected with an error are instructed to not reconnect automatically. In the vast majority of cases, reconnecting and resubscribing with the same parameters will lead to the same errors being received again. The error sent to the client will indicate the issue to be fixed prior to resubscribing.
Recovering after a disconnection
When reconnecting to the gateway, clients should resubscribe to all desired symbols. In order to avoid missing data after a reconnection, there are three main approaches to recovery:
- Natural refresh
- Intraday replay
- Snapshot (MBO only)
The best approach to recovery will depend on the client's use case and specific needs.
Natural refresh
To recover via natural refresh, clients can resubscribe to all desired symbols without the start or snapshot parameters.
This means no data will be replayed, and the client will immediately receive the newest messages upon subscribing.
This recovery approach is the fastest (since there's no data replay), and is recommended for stateless schemas such as MBP-10, in cases where the client only requires the current state of each instrument.
Intraday replay
To recover via intraday replay, clients should store the last ts_event and the number of records received with that last timestamp, per schema and instrument.
The ts_event and record count should be continuously updated when processing incoming records.
When reconnecting, clients should set the start parameter of the resubscription to the lowest stored ts_event across all instruments for that schema.
The gateway will then send all records starting from that timestamp (including records with the exact same ts_event).
The resubscription may lead to duplicated data being sent to the client. Clients who require that each message is delivered exactly once are instructed to:
- Discard all records with a lower ts_event than the stored one for the corresponding instrument
- Discard the first N records with the same ts_event as the stored one for the corresponding instrument, where N is the number of records already seen with that ts_event prior to the disconnection. This is important in case there are multiple events with the same ts_event and the client is disconnected halfway through processing those events
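The two discard rules can be sketched as a filter over the replayed stream (assuming each record exposes its ts_event; the names are ours):

```python
def dedup_replay(records, stored_ts_event: int, seen_at_ts: int):
    """Drop records already processed before the disconnect.

    Discards records older than the stored ts_event, and the first
    `seen_at_ts` records that share the stored ts_event."""
    remaining = seen_at_ts
    for record in records:
        if record["ts_event"] < stored_ts_event:
            continue  # already processed before the disconnect
        if record["ts_event"] == stored_ts_event and remaining > 0:
            remaining -= 1  # duplicate of a record already seen
            continue
        yield record
```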
This recovery approach is recommended when clients require the uninterrupted history of records for the desired schema (for example, when using the Trades schema to construct a ticker tape). However, this approach can take a longer time to synchronize with the live stream, depending on how long the client was disconnected.
For the CBBO and BBO schemas where filtering is based on ts_recv, clients should store the last ts_recv per instrument.
When reconnecting, clients should set the start parameter of the resubscription to the lowest stored ts_recv across all instruments.
The gateway will then send all records starting from that timestamp (including records with the same ts_recv).
Since CBBO and BBO are stateless schemas, you should always refer to the most recent record per instrument.
Snapshot (MBO only)
When resubscribing to the MBO schema, clients can request a snapshot to receive the current state of the book after a disconnection. This eliminates the need to replay the missed messages and leads to faster synchronization with the live stream. This recovery approach is generally recommended over intraday replay when using the MBO schema.
Maintenance schedule
We restart our live gateways on Sunday at the following times:
- CME Globex: 09:30 UTC
- All ICE venues: 09:45 UTC
- All other datasets: 10:30 UTC
All clients will be disconnected during this time.
Additionally, we may restart our gateways mid-week. While we generally post these mid-week restarts on our status page, we may perform these restarts without notice due to an urgent fix. You should configure your client to handle reconnecting automatically.
While exchanges are closed on Saturday, our systems are still connected to the exchange feeds. The exchange may send test data, and our gateways will disseminate this data to all connected clients. If you are not interested in receiving this test data, we recommend you disconnect after the Friday close and reconnect on Sunday after the scheduled restart.
Info: Any test data sent through the Live API will not be seen in our historical data.
Authentication
The authentication message flow is initiated by the gateway.
The gateway will send a greeting message followed by a challenge request.
The client must then issue an authentication request, which the gateway will respond to with an authentication response.
Example
For the purpose of this example, assume the API key is db-89s9vCvwDDKPdQJ5Pb30Fyj9mNUM6.
When connecting to the gateway, the gateway will send two messages:
lsg_version=0.2.0\n
cram=j5pwMHz6vwXruJM4cOwQrQeQE0bImIzT\n
To generate the response, the client first concatenates the cram value and their API key in the format $cram|$key.
In this example, this would be j5pwMHz6vwXruJM4cOwQrQeQE0bImIzT|db-89s9vCvwDDKPdQJ5Pb30Fyj9mNUM6.
The concatenated string is then hashed with SHA-256.
In this example, it produces 6d3c875bb9f8cf503c3ed83ee5f476a3ad53f0c67706c51cf42d2db5ad8ff5a9.
This result is then concatenated with the last five bytes of the API key (also referred to as the bucket_id) in the format $auth-$bucket_id.
With this, the authentication response string is generated as 6d3c875bb9f8cf503c3ed83ee5f476a3ad53f0c67706c51cf42d2db5ad8ff5a9-mNUM6.
The client then sends the authentication message containing this string to the gateway:
auth=6d3c875bb9f8cf503c3ed83ee5f476a3ad53f0c67706c51cf42d2db5ad8ff5a9-mNUM6|dataset=GLBX.MDP3|encoding=dbn|ts_out=0\n
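The challenge-response calculation in this walkthrough can be reproduced in Python (a minimal sketch; the function name is ours):

```python
import hashlib

def cram_reply(challenge: str, api_key: str) -> str:
    """Build the auth value for the authentication request:
    SHA-256 of "$cram|$key" as a hex string, followed by a dash
    and the last five characters of the key (the bucket_id)."""
    digest = hashlib.sha256(f"{challenge}|{api_key}".encode()).hexdigest()
    return f"{digest}-{api_key[-5:]}"
```

Applied to the example challenge and key above, this yields the auth string shown in the authentication message.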
If the key is valid, the gateway will reply with a successful response:
success=1|session_id=135567\n
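Gateway replies use the same key=value format as client control messages, so they can be parsed back into a mapping (a sketch; the helper name is ours):

```python
def decode_control(line: bytes) -> dict:
    """Parse a pipe-delimited, newline-terminated ASCII control
    message into a dict of its key=value fields."""
    text = line.decode("ascii").rstrip("\n")
    return dict(field.split("=", 1) for field in text.split("|"))
```

For example, decoding the success response above yields {"success": "1", "session_id": "135567"}.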
Subscription
After authenticating, the client can subscribe to live data.
To subscribe, the client must send a subscription request to the gateway.
Subscriptions for multiple symbols and schemas can be made in the same session. Different datasets require separate connections.
The same symbol can have separate subscriptions in different schemas.
A subscription will not take effect until the session is started.
Once a subscription is sent to the gateway, it cannot be unsubscribed from. In order to remove a subscription, the client must disconnect from the gateway and establish a new session.
If a subscription fails on the gateway, the gateway will send one or more ErrorMsg records to the client and the session will be terminated.
Example
To subscribe to trades for all E-mini S&P 500 Futures on CME, the client would send:
schema=trades|stype_in=parent|symbols=ES.FUT\n
Intraday historical subscription
After authenticating, the client can subscribe to intraday historical data.
The subscription mechanism is similar to live data: the client must send a
subscription request
to the gateway, containing the start parameter which can be formatted as nanoseconds since
the UNIX epoch or as a datetime string of one of the formats listed in Dates and times.
When handling an intraday historical subscription, the gateway will send all messages beginning from the specified timestamp and, after catching up in time, will continue sending live data as it arrives.
This makes intraday historical subscriptions the ideal way to deal with application restarts on the client side.
Our gateways keep data from up to the last 24 hours to serve via the intraday historical API. Requests for data over 24 hours old will be rejected.
Example
To subscribe to trades for all E-mini S&P 500 Futures on CME starting from 2023-04-05 00:00:00 UTC, the client would send:
schema=trades|stype_in=parent|symbols=ES.FUT|start=1680652800000000000\n
or
schema=trades|stype_in=parent|symbols=ES.FUT|start=2023-04-05T00:00:00\n
Starting the session
Once the gateway receives the session start message, it will start streaming data messages to the client and will continue until the connection is terminated.
Once the session has started, it's still possible to send new subscriptions to the
gateway, as long as they do not contain the start field.
Info: A session cannot be started more than once.
Example
After authenticating, to subscribe to trades for all E-mini S&P 500 Futures on CME starting from 2023-04-05 00:00:00 UTC, the client would send:
schema=trades|stype_in=parent|symbols=ES.FUT|start=1680652800000000000\n
Then, when ready to receive data, the client would send:
start_session=1\n
After this, the gateway will begin streaming data records to the client.
Challenge request
The gateway sends this message immediately following the greeting message to authenticate the client.
Fields
- cram: The challenge string the client must use to generate the authentication response.
Authentication response
The gateway sends this response following the client's authentication request.
Fields
- success: 1 if the client was successfully authenticated, 0 otherwise.
Authentication request
Authenticates the session against the gateway and sets the dataset for the session.
Fields
- auth: The response to the CRAM challenge sent by the gateway. See authentication for a detailed walkthrough of the process.
- dataset: The dataset for the session.
- encoding: The encoding for data records. Defaults to dbn.
- compression: The compression for data records. Defaults to none.
- ts_out: If set to 1, the gateway will prepend the timestamp at which the message was processed to every data record. Defaults to 0.
- pretty_px: If set to 1, the gateway will format fixed-precision fields as a decimal string. Only applicable for JSON encoding. Defaults to 0.
- pretty_ts: If set to 1, the gateway will format timestamp fields as ISO 8601 strings. Only applicable for JSON encoding. Defaults to 0.
- heartbeat_interval_s: The interval in seconds between heartbeat system messages. Defaults to 30.
Subscription request
Add a new subscription to the session.
Supports multiple subscriptions for different schemas, which enables rich data streams containing mixed record types.
Specify an optional start time for intraday replay with subscriptions made before starting the session.
Please note there is no unsubscribe method. Subscriptions end when the TCP connection closes.
When subscribing to many symbols, a subscription can be split across multiple control messages so as to avoid exceeding the maximum length.
It's recommended to chunk the symbols in groups of 500.
All but the last message should include is_last=0 to indicate to the gateway that another part of the subscription remains.
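The chunking procedure above can be sketched in Python (the helper name and the comma-separated symbols format are assumptions of this sketch):

```python
def chunked_subscriptions(symbols, schema, stype_in, chunk_size=500):
    """Split a large symbol list into multiple subscription request
    messages, marking all but the last with is_last=0."""
    messages = []
    for i in range(0, len(symbols), chunk_size):
        chunk = symbols[i:i + chunk_size]
        is_last = 1 if i + chunk_size >= len(symbols) else 0
        messages.append(
            f"schema={schema}|stype_in={stype_in}"
            f"|symbols={','.join(chunk)}|is_last={is_last}\n"
        )
    return messages
```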
Fields
- schema: The data record schema to subscribe to.
- stype_in: The symbology type of the values in symbols.
- symbols: The symbols to subscribe to. ALL_SYMBOLS will subscribe to all the symbols in the dataset.
- start: Optional start time for intraday replay. Data is filtered on ts_event except for the CBBO and BBO schemas, which filter on ts_recv. Takes an ISO 8601 string or a UNIX timestamp in nanoseconds. Must be within the last 24 hours. Pass 0 to request all available data. Cannot be specified after the session is started.
- snapshot: Requests a snapshot of the current book state (MBO only). When snapshot=1, start must be absent.
- is_last: 0 indicates more requests will be sent with the same parameters other than symbols and the gateway should wait to process the request. is_last=1 is the default and instructs the gateway to process the request immediately.