API reference - Historical
Databento's historical data service can be accessed programmatically over its HTTP API. To make it easier to integrate the API, we also provide official client libraries that simplify the code you need to write.
Our HTTP API is designed as a collection of RPC-style methods, which can be called using URLs in the form https://hist.databento.com/v0/{method}.
Our client libraries wrap these HTTP RPC-style methods with more idiomatic interfaces in their respective languages.
You can use our API to stream or load data directly into your application. You can also use our API to make batch download requests, which instruct our service to prepare the data as flat files that can be downloaded from the Download center.
Overview
Our historical API has the following structure:
- Metadata provides information about the datasets themselves.
- Time series provides all types of time series data. This includes subsampled data (second, minute, hour, daily aggregates), trades, top-of-book, order book deltas, order book snapshots, summary statistics, static data and macro indicators. We also provide properties of products such as expirations, tick sizes and symbols as time series data.
- Symbology provides methods that help find and resolve symbols across different symbology systems.
- Batch provides a means of submitting and querying for details of batch download requests.
Authentication
Databento uses API keys to authenticate requests. You can view and manage your keys on the API keys page of your portal.
Each API key is a 32-character string starting with db-. By default, our library uses the DATABENTO_API_KEY environment variable as your API key. However, if you pass an API key to the Historical constructor through the key parameter, then that value will be used instead.
Related: Securing your API keys.
Schemas and conventions
A schema is a data record format represented as a collection of different data fields. Our datasets support multiple schemas, such as order book, trades, bar aggregates, and so on. You can get a dictionary describing the fields of each schema from our List of market data schemas.
You can get a list of all supported schemas for any given dataset using the Historical client's list_schemas method. The same information can also be found on the dataset details pages on the user portal.
The following table provides details about the data types and conventions used for various fields that you will commonly encounter in the data.
Name | Field | Description |
---|---|---|
Dataset | dataset | A unique string name assigned to each dataset by Databento. The full list of datasets can be found from the metadata. |
Publisher ID | publisher_id | A unique 16-bit unsigned integer assigned to each publisher by Databento. The full list of publisher IDs can be found from the metadata. |
Instrument ID | instrument_id | A unique 32-bit unsigned integer assigned to each instrument by the venue. Information about instrument IDs for any given dataset can be found in the symbology. |
Order ID | order_id | A unique 64-bit unsigned integer assigned to each order by the venue. |
Timestamp (event) | ts_event | The matching-engine-received timestamp expressed as the number of nanoseconds since the UNIX epoch. |
Timestamp (receive) | ts_recv | The capture-server-received timestamp expressed as the number of nanoseconds since the UNIX epoch. |
Timestamp delta (in) | ts_in_delta | The matching-engine-sending timestamp expressed as the number of nanoseconds before ts_recv. See timestamping guide. |
Timestamp out | ts_out | The Databento gateway-sending timestamp expressed as the number of nanoseconds since the UNIX epoch. See timestamping guide. |
Price | price | The price expressed as a signed integer where every 1 unit corresponds to 1e-9, i.e. 1/1,000,000,000 or 0.000000001. |
Book side | side | The side that initiates the event. Can be Ask for a sell order (or sell aggressor in a trade), Bid for a buy order (or buy aggressor in a trade), or None where no side is specified by the original source. |
Size | size | The order quantity. |
Flag | flag | A bit field indicating event end, message characteristics, and data quality. |
Action | action | The event type or order book operation. Can be Add, Cancel, Modify, Clear book, Trade, Fill, or None. |
Sequence number | sequence | The original message sequence number from the venue. |
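For example, under the price convention above, a raw price field can be converted to a floating-point value by scaling by 1e-9. A minimal sketch (the raw value is illustrative):
# Prices are signed integers with a fixed precision of 1e-9
raw_price = 4164000000000  # illustrative value from a trades record
price = raw_price * 1e-9
print(price)  # 4164.0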
Datasets
Databento provides time series datasets for a variety of markets, sourced from different publishers. Our available datasets can be browsed through the search feature on our site.
Each dataset is assigned a unique string identifier (dataset ID) in the form PUBLISHER.DATASET, such as GLBX.MDP3.
For publishers that are also markets, we use standard four-character ISO 10383 Market Identifier Codes (MICs). Otherwise, Databento arbitrarily assigns a four-character identifier to the publisher.
These dataset IDs are also found on the Data catalog and Download request features of the Databento user portal.
When a publisher provides multiple data products with different levels of granularity, Databento subscribes to the most-granular product. We then provide this dataset with alternate schemas to make it easy to work with the level of detail most appropriate for your application.
More information about different types of venues and publishers is available in our FAQs.
Symbology
Databento's historical API supports several ways to select an instrument in a dataset. An instrument is specified using a symbol and a symbology type, also referred to as an stype. The supported symbology types are:
- Raw symbology (raw_symbol): the original string symbols used by the publisher in the source data.
- Instrument ID symbology (instrument_id): the unique numeric ID assigned to each instrument by the publisher.
- Parent symbology (parent): groups instruments related to the market for the same underlying.
- Continuous contract symbology (continuous): proprietary symbology that specifies instruments based on certain systematic rules.
When requesting data from our timeseries.get_range or batch.submit_job endpoints, an input and output symbology type can be specified. By default, our client libraries will use raw symbology for the input type and instrument ID symbology for the output type. Not all symbology types are supported for every dataset.
The process of converting from one symbology type to another is called symbology resolution. This conversion can be done, at no cost, with the symbology.resolve endpoint.
For more about symbology at Databento, see our Standards and conventions.
Encodings
DBN
Databento Binary Encoding (DBN) is an extremely fast message encoding and highly-compressible storage format for normalized market data. It includes a self-describing metadata header and adopts a binary format with zero-copy serialization.
We recommend using our Python, C++, or Rust client libraries to read DBN files locally. A CLI tool is also available for converting DBN files to CSV or JSON.
CSV
Comma-separated values (CSV) is a simple text file format for tabular data. CSVs can be easily opened in Excel, loaded into pandas DataFrames, or parsed in C++.
Our CSVs have one header line, followed by one record per line. Lines use UNIX-style \n separators.
JSON
JavaScript Object Notation (JSON) is a flexible text file format with broad language support and wide adoption across web apps.
Our JSON files follow the JSON lines specification, where each line of the file is a JSON record. Lines use UNIX-style \n separators.
Compression
Databento provides options for compressing files from our API. Available compression formats depend on the encoding you select.
Zstd
The Zstd compression option uses the Zstandard format. This option is available for all encodings, and is recommended for faster transfer speeds and smaller files.
You can read Zstandard files in Python using the zstandard package.
Read more about working with Zstandard-compressed files.
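As an illustrative sketch (the file name is hypothetical), a Zstandard-compressed JSON lines file can be stream-decompressed and parsed one record per line:
import io
import json
import zstandard

# Stream-decompress a .json.zst file and parse each JSON lines record
with open("glbx-mdp3-20220606.trades.json.zst", "rb") as f:
    reader = zstandard.ZstdDecompressor().stream_reader(f)
    for line in io.TextIOWrapper(reader, encoding="utf-8"):
        record = json.loads(line)
        print(record)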
None
The None compression option disables compression entirely, resulting in significantly larger files. However, this can be useful for loading small CSV files directly into Excel.
Dates and times
Our Python client library has several functions with timestamp arguments. These arguments have type pandas.Timestamp | datetime.date | str | int and support a variety of formats.
We recommend using pandas.Timestamp, which fully supports timezones and nanosecond precision. If a datetime.date is used, the time is set to midnight UTC. If an int is provided, the value is interpreted as UNIX nanoseconds.
The client library also handles several string-based timestamp formats based on ISO 8601.
- yyyy-mm-dd, e.g. "2022-02-28" (midnight UTC)
- yyyy-mm-ddTHH:MM, e.g. "2022-02-28T23:50"
- yyyy-mm-ddTHH:MM:SS, e.g. "2022-02-28T23:50:59"
- yyyy-mm-ddTHH:MM:SS.NNNNNNNNN, e.g. "2022-02-28T23:50:59.123456789"
Timezone specification is also supported.
- yyyy-mm-ddTHH:MMZ
- yyyy-mm-ddTHH:MM±hh
- yyyy-mm-ddTHH:MM±hhmm
- yyyy-mm-ddTHH:MM±hh:mm
Bare dates
Some parameters require a bare date, without a time. These arguments have type datetime.date | str and must either be a datetime.date object, or a string in yyyy-mm-dd format, e.g. "2022-02-28".
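The following sketch shows equivalent ways to express the same instant under these conventions (the instant itself is illustrative):
import datetime
import pandas as pd

# All of the following represent 2022-02-28T00:00:00 UTC
as_string = "2022-02-28"                          # midnight UTC
as_date = datetime.date(2022, 2, 28)              # midnight UTC
as_timestamp = pd.Timestamp("2022-02-28", tz="UTC")
as_unix_ns = 1646006400000000000                  # UNIX nanoseconds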
Errors
Our historical API uses HTTP response codes to indicate the success or failure of an API request. The client library provides exceptions that wrap these response codes.
- 2xx indicates success.
- 4xx indicates an error on the client side, represented as a BentoClientError.
- 5xx indicates an error with Databento's servers, represented as a BentoServerError.
The full list of the response codes and associated causes is as follows:
Code | Message | Cause |
---|---|---|
200 | OK | Successful request. |
206 | Partial Content | Successful request, with partially resolved symbols. |
400 | Bad Request | Invalid request. Usually due to a missing, malformed or unsupported parameter. |
401 | Unauthorized | Invalid username or API key. |
402 | Payment Required | Issue with your account payment information. |
403 | Forbidden | The API key has insufficient permissions to perform the request. |
404 | Not Found | A resource is not found, or a requested symbol does not exist. |
409 | Conflict | A resource already exists. |
422 | Unprocessable Entity | The request is well formed, but we cannot or will not process the contained instructions. |
429 | Too Many Requests | API rate limit exceeded. |
500 | Internal Server Error | Unexpected condition encountered in our system. |
503 | Service Unavailable | Data gateway is offline or overloaded. |
504 | Gateway Timeout | Data gateway is available but other parts of our system are offline or overloaded. |
Rate limits
Our historical API allows each IP address up to:
- 100 concurrent connections.
- 100 time series requests per second.
- 100 symbology requests per second.
- 20 metadata requests per second.
- 20 batch list jobs requests per second.
- 20 batch submit job requests per minute.
When a request exceeds a rate limit, a BentoClientError exception is raised with a 429 error code.
Retry-After
The Retry-After response header indicates how long the user should wait before retrying.
If you find that your application has been rate-limited, you can retry after waiting for the time specified in the Retry-After header.
If you are using Python, you can use the time.sleep function to wait for the duration specified in the Retry-After header, e.g. time.sleep(int(response.headers.get("Retry-After", 1))).
This works with our current APIs and their rate limits. Future APIs may have different rate limits, and might require a different default time delay.
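As a minimal sketch (assuming the third-party requests library and a hypothetical URL), a rate-limited request can be retried after waiting for the suggested duration:
import time
import requests

def get_with_retry(url: str, max_retries: int = 5) -> requests.Response:
    # Retry a GET request while the server responds with 429
    for _ in range(max_retries):
        response = requests.get(url)
        if response.status_code != 429:
            return response
        # Wait for the duration suggested by the server, defaulting to 1 second
        time.sleep(int(response.headers.get("Retry-After", 1)))
    raise RuntimeError("Request was rate-limited after all retries")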
Size limits
There is no size limit for either stream or batch download requests. Batch download is more manageable for large datasets, so we recommend using batch download for requests over 5 GB.
You can also manage the size of your request by splitting it into multiple, smaller requests. The historical API allows you to make stream and batch download requests with time ranges specified up to nanosecond resolution.
You can also use the limit parameter in any request to limit the number of data records returned from the service.
Batch download supports different delivery methods, which can be specified using the delivery parameter.
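For instance, a large streaming request can be split into smaller daily requests; a minimal sketch (dataset, symbol, and dates are illustrative):
import pandas as pd
import databento as db

client = db.Historical()  # uses the DATABENTO_API_KEY environment variable
# Split one large request into one stream request per day
days = pd.date_range("2022-06-06", "2022-06-10", tz="UTC")
for day_start, day_end in zip(days[:-1], days[1:]):
    data = client.timeseries.get_range(
        dataset="GLBX.MDP3",
        symbols=["ESM2"],
        schema="trades",
        start=day_start,
        end=day_end,
    )
    data.to_file(f"glbx-mdp3-esm2-{day_start.date()}.trades.dbn.zst")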
Metered pricing
Databento only charges for the data that you use. You can find rates (per MB) for the various datasets and estimate pricing on our Data catalog. We meter the data by its uncompressed size in binary encoding.
When you stream data, you are billed incrementally for each outbound byte sent from our historical gateway. If your connection is interrupted while streaming and our historical gateway detects a connection timeout of over 5 seconds, it will immediately stop sending data and you will not be billed for the remainder of your request.
Duplicate streaming requests will incur repeated charges. If you intend to access the same data multiple times, we recommend using our batch download feature. When you make a batch download request, you are only billed once for the request and, subsequently, you can download the data from the Download center multiple times over 30 days for no additional charge.
You will only be billed for usage of time series data. Access to metadata, symbology, and account management is free. The Historical.metadata.get_cost method can be used to determine cost before you request any data.
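For example, a minimal cost check before requesting data (dataset and symbols are illustrative):
import databento as db

client = db.Historical("$YOUR_API_KEY")
# Determine the cost in US dollars before requesting any data
cost = client.metadata.get_cost(
    dataset="GLBX.MDP3",
    symbols=["ESM2"],
    schema="trades",
    start="2022-06-06",
    end="2022-06-10",
)
print(cost)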
Related: Billing management.
Versioning
Our historical and live APIs and their client libraries adopt the MAJOR.MINOR.PATCH format for version numbers. These version numbers conform to semantic versioning. We are using major version 0 for initial development, where our API is not considered stable.
Once we release major version 1, our public API will be stable. This means that you will be able to upgrade minor or patch versions to pick up new functionality, without breaking your integration.
Starting with major versions after 1, we will provide support for previous versions for one year after the date of the subsequent major release.
For example, if version 2.0.0 is released on January 1, 2024, then all versions 1.x.y of the API and client libraries will be deprecated. However, they will remain supported until January 1, 2025.
We may introduce backwards-compatible changes between minor versions in the form of:
- New data encodings
- Additional fields to existing data schemas
- Additional batch download customizations
Our Release notes will contain information about both breaking and backwards-compatible changes in each release.
Our API and official client libraries are kept in sync with same-day releases for major versions. For instance, version 1.x.y of the C++ client library will provide the same functionality found in any 1.x.y version of the Python client.
Related: Release notes.
Historical
To access Databento's historical API, first create an instance of the Historical client. The entire API is exposed through instance methods of the client.
Note that the API key can be passed as a parameter, which is not recommended for production applications. Instead, you can leave out this parameter to pass your API key via the DATABENTO_API_KEY environment variable:
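import databento as db

# Key passed explicitly (not recommended for production applications)
client = db.Historical("$YOUR_API_KEY")

# Key read from the DATABENTO_API_KEY environment variable
client = db.Historical()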
Currently, only the BO1 gateway is supported for historical data.
Parameters
- key: the user API key for authentication. If None then the DATABENTO_API_KEY environment variable is used.
- gateway: the historical gateway to connect to. Currently, only BO1 is supported. If None then will connect to the default historical gateway.
Historical.metadata.list_publishers
List all publisher ID mappings.
Use this method to list the details of publishers, including their dataset and venue mappings.
Returns
list[dict[str, int | str]]
A list of publisher details objects.
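A minimal example call (producing output like the listing below):
import databento as db

client = db.Historical("$YOUR_API_KEY")
publishers = client.metadata.list_publishers()
print(publishers)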
[{'dataset': 'GLBX.MDP3',
'description': 'CME Globex MDP 3.0',
'publisher_id': 1,
'venue': 'GLBX'},
{'dataset': 'XNAS.ITCH',
'description': 'Nasdaq TotalView-ITCH',
'publisher_id': 2,
'venue': 'XNAS'},
{'dataset': 'XBOS.ITCH',
'description': 'Nasdaq BX TotalView-ITCH',
'publisher_id': 3,
'venue': 'XBOS'},
{'dataset': 'XPSX.ITCH',
'description': 'Nasdaq PSX TotalView-ITCH',
'publisher_id': 4,
'venue': 'XPSX'},
{'dataset': 'BATS.PITCH',
'description': 'Cboe BZX Depth',
'publisher_id': 5,
'venue': 'BATS'},
{'dataset': 'BATY.PITCH',
'description': 'Cboe BYX Depth',
'publisher_id': 6,
'venue': 'BATY'},
{'dataset': 'EDGA.PITCH',
'description': 'Cboe EDGA Depth',
'publisher_id': 7,
'venue': 'EDGA'},
{'dataset': 'EDGX.PITCH',
'description': 'Cboe EDGX Depth',
'publisher_id': 8,
'venue': 'EDGX'},
{'dataset': 'XNYS.PILLAR',
'description': 'NYSE Integrated',
'publisher_id': 9,
'venue': 'XNYS'},
{'dataset': 'XCIS.PILLAR',
'description': 'NYSE National Integrated',
'publisher_id': 10,
'venue': 'XCIS'},
{'dataset': 'XASE.PILLAR',
'description': 'NYSE American Integrated',
'publisher_id': 11,
'venue': 'XASE'},
{'dataset': 'XCHI.PILLAR',
'description': 'NYSE Texas Integrated',
'publisher_id': 12,
'venue': 'XCHI'},
{'dataset': 'XCIS.BBO',
'description': 'NYSE National BBO',
'publisher_id': 13,
'venue': 'XCIS'},
{'dataset': 'XCIS.TRADES',
'description': 'NYSE National Trades',
'publisher_id': 14,
'venue': 'XCIS'},
{'dataset': 'MEMX.MEMOIR',
'description': 'MEMX Memoir Depth',
'publisher_id': 15,
'venue': 'MEMX'},
{'dataset': 'EPRL.DOM',
'description': 'MIAX Pearl Depth',
'publisher_id': 16,
'venue': 'EPRL'},
{'dataset': 'XNAS.NLS',
'description': 'FINRA/Nasdaq TRF Carteret',
'publisher_id': 17,
'venue': 'FINN'},
{'dataset': 'XNAS.NLS',
'description': 'FINRA/Nasdaq TRF Chicago',
'publisher_id': 18,
'venue': 'FINC'},
{'dataset': 'XNYS.TRADES',
'description': 'FINRA/NYSE TRF',
'publisher_id': 19,
'venue': 'FINY'},
{'dataset': 'OPRA.PILLAR',
'description': 'OPRA - NYSE American Options',
'publisher_id': 20,
'venue': 'AMXO'},
{'dataset': 'OPRA.PILLAR',
'description': 'OPRA - BOX Options',
'publisher_id': 21,
'venue': 'XBOX'},
{'dataset': 'OPRA.PILLAR',
'description': 'OPRA - Cboe Options',
'publisher_id': 22,
'venue': 'XCBO'},
{'dataset': 'OPRA.PILLAR',
'description': 'OPRA - MIAX Emerald',
'publisher_id': 23,
'venue': 'EMLD'},
{'dataset': 'OPRA.PILLAR',
'description': 'OPRA - Cboe EDGX Options',
'publisher_id': 24,
'venue': 'EDGO'},
{'dataset': 'OPRA.PILLAR',
'description': 'OPRA - Nasdaq GEMX',
'publisher_id': 25,
'venue': 'GMNI'},
{'dataset': 'OPRA.PILLAR',
'description': 'OPRA - Nasdaq ISE',
'publisher_id': 26,
'venue': 'XISX'},
{'dataset': 'OPRA.PILLAR',
'description': 'OPRA - Nasdaq MRX',
'publisher_id': 27,
'venue': 'MCRY'},
{'dataset': 'OPRA.PILLAR',
'description': 'OPRA - MIAX Options',
'publisher_id': 28,
'venue': 'XMIO'},
{'dataset': 'OPRA.PILLAR',
'description': 'OPRA - NYSE Arca Options',
'publisher_id': 29,
'venue': 'ARCO'},
{'dataset': 'OPRA.PILLAR',
'description': 'OPRA - Options Price Reporting Authority',
'publisher_id': 30,
'venue': 'OPRA'},
{'dataset': 'OPRA.PILLAR',
'description': 'OPRA - MIAX Pearl',
'publisher_id': 31,
'venue': 'MPRL'},
{'dataset': 'OPRA.PILLAR',
'description': 'OPRA - Nasdaq Options',
'publisher_id': 32,
'venue': 'XNDQ'},
{'dataset': 'OPRA.PILLAR',
'description': 'OPRA - Nasdaq BX Options',
'publisher_id': 33,
'venue': 'XBXO'},
{'dataset': 'OPRA.PILLAR',
'description': 'OPRA - Cboe C2 Options',
'publisher_id': 34,
'venue': 'C2OX'},
{'dataset': 'OPRA.PILLAR',
'description': 'OPRA - Nasdaq PHLX',
'publisher_id': 35,
'venue': 'XPHL'},
{'dataset': 'OPRA.PILLAR',
'description': 'OPRA - Cboe BZX Options',
'publisher_id': 36,
'venue': 'BATO'},
{'dataset': 'OPRA.PILLAR',
'description': 'OPRA - MEMX Options',
'publisher_id': 37,
'venue': 'MXOP'},
{'dataset': 'IEXG.TOPS',
'description': 'IEX TOPS',
'publisher_id': 38,
'venue': 'IEXG'},
{'dataset': 'DBEQ.BASIC',
'description': 'DBEQ Basic - NYSE Texas',
'publisher_id': 39,
'venue': 'XCHI'},
{'dataset': 'DBEQ.BASIC',
'description': 'DBEQ Basic - NYSE National',
'publisher_id': 40,
'venue': 'XCIS'},
{'dataset': 'DBEQ.BASIC',
'description': 'DBEQ Basic - IEX',
'publisher_id': 41,
'venue': 'IEXG'},
{'dataset': 'DBEQ.BASIC',
'description': 'DBEQ Basic - MIAX Pearl',
'publisher_id': 42,
'venue': 'EPRL'},
{'dataset': 'ARCX.PILLAR',
'description': 'NYSE Arca Integrated',
'publisher_id': 43,
'venue': 'ARCX'},
{'dataset': 'XNYS.BBO',
'description': 'NYSE BBO',
'publisher_id': 44,
'venue': 'XNYS'},
{'dataset': 'XNYS.TRADES',
'description': 'NYSE Trades',
'publisher_id': 45,
'venue': 'XNYS'},
{'dataset': 'XNAS.QBBO',
'description': 'Nasdaq QBBO',
'publisher_id': 46,
'venue': 'XNAS'},
{'dataset': 'XNAS.NLS',
'description': 'Nasdaq Trades',
'publisher_id': 47,
'venue': 'XNAS'},
{'dataset': 'EQUS.PLUS',
'description': 'Databento US Equities Plus - NYSE Texas',
'publisher_id': 48,
'venue': 'XCHI'},
{'dataset': 'EQUS.PLUS',
'description': 'Databento US Equities Plus - NYSE National',
'publisher_id': 49,
'venue': 'XCIS'},
{'dataset': 'EQUS.PLUS',
'description': 'Databento US Equities Plus - IEX',
'publisher_id': 50,
'venue': 'IEXG'},
{'dataset': 'EQUS.PLUS',
'description': 'Databento US Equities Plus - MIAX Pearl',
'publisher_id': 51,
'venue': 'EPRL'},
{'dataset': 'EQUS.PLUS',
'description': 'Databento US Equities Plus - Nasdaq',
'publisher_id': 52,
'venue': 'XNAS'},
{'dataset': 'EQUS.PLUS',
'description': 'Databento US Equities Plus - NYSE',
'publisher_id': 53,
'venue': 'XNYS'},
{'dataset': 'EQUS.PLUS',
'description': 'Databento US Equities Plus - FINRA/Nasdaq TRF Carteret',
'publisher_id': 54,
'venue': 'FINN'},
{'dataset': 'EQUS.PLUS',
'description': 'Databento US Equities Plus - FINRA/NYSE TRF',
'publisher_id': 55,
'venue': 'FINY'},
{'dataset': 'EQUS.PLUS',
'description': 'Databento US Equities Plus - FINRA/Nasdaq TRF Chicago',
'publisher_id': 56,
'venue': 'FINC'},
{'dataset': 'IFEU.IMPACT',
'description': 'ICE Europe Commodities',
'publisher_id': 57,
'venue': 'IFEU'},
{'dataset': 'NDEX.IMPACT',
'description': 'ICE Endex',
'publisher_id': 58,
'venue': 'NDEX'},
{'dataset': 'DBEQ.BASIC',
'description': 'Databento US Equities Basic - Consolidated',
'publisher_id': 59,
'venue': 'DBEQ'},
{'dataset': 'EQUS.PLUS',
'description': 'EQUS Plus - Consolidated',
'publisher_id': 60,
'venue': 'EQUS'},
{'dataset': 'OPRA.PILLAR',
'description': 'OPRA - MIAX Sapphire',
'publisher_id': 61,
'venue': 'SPHR'},
{'dataset': 'EQUS.ALL',
'description': 'Databento US Equities (All Feeds) - NYSE Texas',
'publisher_id': 62,
'venue': 'XCHI'},
{'dataset': 'EQUS.ALL',
'description': 'Databento US Equities (All Feeds) - NYSE National',
'publisher_id': 63,
'venue': 'XCIS'},
{'dataset': 'EQUS.ALL',
'description': 'Databento US Equities (All Feeds) - IEX',
'publisher_id': 64,
'venue': 'IEXG'},
{'dataset': 'EQUS.ALL',
'description': 'Databento US Equities (All Feeds) - MIAX Pearl',
'publisher_id': 65,
'venue': 'EPRL'},
{'dataset': 'EQUS.ALL',
'description': 'Databento US Equities (All Feeds) - Nasdaq',
'publisher_id': 66,
'venue': 'XNAS'},
{'dataset': 'EQUS.ALL',
'description': 'Databento US Equities (All Feeds) - NYSE',
'publisher_id': 67,
'venue': 'XNYS'},
{'dataset': 'EQUS.ALL',
'description': 'Databento US Equities (All Feeds) - FINRA/Nasdaq TRF Carteret',
'publisher_id': 68,
'venue': 'FINN'},
{'dataset': 'EQUS.ALL',
'description': 'Databento US Equities (All Feeds) - FINRA/NYSE TRF',
'publisher_id': 69,
'venue': 'FINY'},
{'dataset': 'EQUS.ALL',
'description': 'Databento US Equities (All Feeds) - FINRA/Nasdaq TRF Chicago',
'publisher_id': 70,
'venue': 'FINC'},
{'dataset': 'EQUS.ALL',
'description': 'Databento US Equities (All Feeds) - Cboe BZX',
'publisher_id': 71,
'venue': 'BATS'},
{'dataset': 'EQUS.ALL',
'description': 'Databento US Equities (All Feeds) - Cboe BYX',
'publisher_id': 72,
'venue': 'BATY'},
{'dataset': 'EQUS.ALL',
'description': 'Databento US Equities (All Feeds) - Cboe EDGA',
'publisher_id': 73,
'venue': 'EDGA'},
{'dataset': 'EQUS.ALL',
'description': 'Databento US Equities (All Feeds) - Cboe EDGX',
'publisher_id': 74,
'venue': 'EDGX'},
{'dataset': 'EQUS.ALL',
'description': 'Databento US Equities (All Feeds) - Nasdaq BX',
'publisher_id': 75,
'venue': 'XBOS'},
{'dataset': 'EQUS.ALL',
'description': 'Databento US Equities (All Feeds) - Nasdaq PSX',
'publisher_id': 76,
'venue': 'XPSX'},
{'dataset': 'EQUS.ALL',
'description': 'Databento US Equities (All Feeds) - MEMX',
'publisher_id': 77,
'venue': 'MEMX'},
{'dataset': 'EQUS.ALL',
'description': 'Databento US Equities (All Feeds) - NYSE American',
'publisher_id': 78,
'venue': 'XASE'},
{'dataset': 'EQUS.ALL',
'description': 'Databento US Equities (All Feeds) - NYSE Arca',
'publisher_id': 79,
'venue': 'ARCX'},
{'dataset': 'EQUS.ALL',
'description': 'Databento US Equities (All Feeds) - Long-Term Stock Exchange',
'publisher_id': 80,
'venue': 'LTSE'},
{'dataset': 'XNAS.BASIC',
'description': 'Nasdaq Basic - Nasdaq',
'publisher_id': 81,
'venue': 'XNAS'},
{'dataset': 'XNAS.BASIC',
'description': 'Nasdaq Basic - FINRA/Nasdaq TRF Carteret',
'publisher_id': 82,
'venue': 'FINN'},
{'dataset': 'XNAS.BASIC',
'description': 'Nasdaq Basic - FINRA/Nasdaq TRF Chicago',
'publisher_id': 83,
'venue': 'FINC'},
{'dataset': 'IFEU.IMPACT',
'description': 'ICE Europe - Off-Market Trades',
'publisher_id': 84,
'venue': 'XOFF'},
{'dataset': 'NDEX.IMPACT',
'description': 'ICE Endex - Off-Market Trades',
'publisher_id': 85,
'venue': 'XOFF'},
{'dataset': 'XNAS.NLS',
'description': 'Nasdaq NLS - Nasdaq BX',
'publisher_id': 86,
'venue': 'XBOS'},
{'dataset': 'XNAS.NLS',
'description': 'Nasdaq NLS - Nasdaq PSX',
'publisher_id': 87,
'venue': 'XPSX'},
{'dataset': 'XNAS.BASIC',
'description': 'Nasdaq Basic - Nasdaq BX',
'publisher_id': 88,
'venue': 'XBOS'},
{'dataset': 'XNAS.BASIC',
'description': 'Nasdaq Basic - Nasdaq PSX',
'publisher_id': 89,
'venue': 'XPSX'},
{'dataset': 'EQUS.SUMMARY',
'description': 'Databento Equities Summary',
'publisher_id': 90,
'venue': 'EQUS'},
{'dataset': 'XCIS.TRADESBBO',
'description': 'NYSE National Trades and BBO',
'publisher_id': 91,
'venue': 'XCIS'},
{'dataset': 'XNYS.TRADESBBO',
'description': 'NYSE Trades and BBO',
'publisher_id': 92,
'venue': 'XNYS'},
{'dataset': 'XNAS.BASIC',
'description': 'Nasdaq Basic - Consolidated',
'publisher_id': 93,
'venue': 'EQUS'},
{'dataset': 'EQUS.ALL',
'description': 'Databento US Equities (All Feeds) - Consolidated',
'publisher_id': 94,
'venue': 'EQUS'},
{'dataset': 'EQUS.MINI',
'description': 'Databento US Equities Mini',
'publisher_id': 95,
'venue': 'EQUS'},
{'dataset': 'XNYS.TRADES',
'description': 'NYSE Trades - Consolidated',
'publisher_id': 96,
'venue': 'EQUS'},
{'dataset': 'IFUS.IMPACT',
'description': 'ICE Futures US',
'publisher_id': 97,
'venue': 'IFUS'},
{'dataset': 'IFUS.IMPACT',
'description': 'ICE Futures US - Off-Market Trades',
'publisher_id': 98,
'venue': 'XOFF'},
{'dataset': 'IFLL.IMPACT',
'description': 'ICE Europe Financials',
'publisher_id': 99,
'venue': 'IFLL'},
{'dataset': 'IFLL.IMPACT',
'description': 'ICE Europe Financials - Off-Market Trades',
'publisher_id': 100,
'venue': 'XOFF'},
{'dataset': 'XEUR.EOBI',
'description': 'Eurex EOBI',
'publisher_id': 101,
'venue': 'XEUR'},
{'dataset': 'XEEE.EOBI',
'description': 'European Energy Exchange EOBI',
'publisher_id': 102,
'venue': 'XEEE'},
{'dataset': 'XEUR.EOBI',
'description': 'Eurex EOBI - Off-Market Trades',
'publisher_id': 103,
'venue': 'XOFF'},
{'dataset': 'XEEE.EOBI',
'description': 'European Energy Exchange EOBI - Off-Market Trades',
'publisher_id': 104,
'venue': 'XOFF'}
]
Historical.metadata.list_datasets
List all valid dataset IDs on Databento.
Use this method to list the available dataset IDs (string identifiers), so you can use other methods which take the dataset parameter.
Parameters
- start_date: the start date (UTC) of the request range. If None then the first date available.
- end_date: the end date (UTC) of the request range. If None then the last date available.
Returns
list[str]
A list of available dataset IDs.
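A minimal example call:
import databento as db

client = db.Historical("$YOUR_API_KEY")
datasets = client.metadata.list_datasets()
print(datasets)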
Historical.metadata.list_schemas
List all available schemas for a dataset.
Parameters
- dataset: the dataset ID.
Returns
list[str]
A list of available data schemas.
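A minimal example call (the dataset is illustrative):
import databento as db

client = db.Historical("$YOUR_API_KEY")
schemas = client.metadata.list_schemas(dataset="GLBX.MDP3")
print(schemas)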
Historical.metadata.list_fields
List all fields for a particular schema and encoding.
Parameters
- schema: the data record schema.
- encoding: the data encoding.
Returns
list[dict[str, str]]
A list of field details objects.
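A minimal example call; the field listing below corresponds to the trades schema:
import databento as db

client = db.Historical("$YOUR_API_KEY")
fields = client.metadata.list_fields(schema="trades", encoding="dbn")
print(fields)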
[
{
"name": "length",
"type": "uint8_t"
},
{
"name": "rtype",
"type": "uint8_t"
},
{
"name": "publisher_id",
"type": "uint16_t"
},
{
"name": "instrument_id",
"type": "uint32_t"
},
{
"name": "ts_event",
"type": "uint64_t"
},
{
"name": "price",
"type": "int64_t"
},
{
"name": "size",
"type": "uint32_t"
},
{
"name": "action",
"type": "char"
},
{
"name": "side",
"type": "char"
},
{
"name": "flags",
"type": "uint8_t"
},
{
"name": "depth",
"type": "uint8_t"
},
{
"name": "ts_recv",
"type": "uint64_t"
},
{
"name": "ts_in_delta",
"type": "int32_t"
},
{
"name": "sequence",
"type": "uint32_t"
}
]
Historical.metadata.list_unit_prices
List unit prices for each feed mode and data schema in US dollars per gigabyte.
Parameters
- dataset: the dataset ID.
Returns
list[dict[str, Any]]
A list of maps of feed mode to schema to unit price.
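A minimal example call (the dataset is illustrative):
import databento as db

client = db.Historical("$YOUR_API_KEY")
unit_prices = client.metadata.list_unit_prices(dataset="GLBX.MDP3")
print(unit_prices)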
[
{
"mode": "historical",
"unit_prices": {
"mbp-1": 0.04,
"ohlcv-1s": 280.0,
"ohlcv-1m": 280.0,
"ohlcv-1h": 600.0,
"ohlcv-1d": 600.0,
"tbbo": 210.0,
"trades": 280.0,
"statistics": 11.0,
"definition": 5.0
}
},
{
"mode": "historical-streaming",
"unit_prices": {
"mbp-1": 0.04,
"ohlcv-1s": 280.0,
"ohlcv-1m": 280.0,
"ohlcv-1h": 600.0,
"ohlcv-1d": 600.0,
"tbbo": 210.0,
"trades": 280.0,
"statistics": 11.0,
"definition": 5.0
}
},
{
"mode": "live",
"unit_prices": {
"mbp-1": 0.05,
"ohlcv-1s": 336.0,
"ohlcv-1m": 336.0,
"ohlcv-1h": 720.0,
"ohlcv-1d": 720.0,
"tbbo": 252.0,
"trades": 336.0,
"statistics": 13.2,
"definition": 6.0
}
}
]
Historical.metadata.get_dataset_condition
Get the dataset condition from Databento.
Use this method to discover data availability and quality.
Parameters
- dataset: the dataset ID.
- start_date: the start date (UTC) of the request range. If None then the first date available.
- end_date: the end date (UTC) of the request range. If None then the last date available.
Returns
list[dict[str, str | None]]
A list of conditions per date.
The last_modified_date value will be None when condition is 'missing'.
Possible values for condition:
- available: the data is available with no known issues
- degraded: the data is available, but there may be missing data or other correctness issues
- pending: the data is not yet available, but may be available soon
- missing: the data is not available
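A minimal example call (dataset and dates are illustrative):
import databento as db

client = db.Historical("$YOUR_API_KEY")
conditions = client.metadata.get_dataset_condition(
    dataset="GLBX.MDP3",
    start_date="2022-06-06",
    end_date="2022-06-10",
)
print(conditions)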
[
{
"date": "2022-06-06",
"condition": "available",
"last_modified_date": "2024-05-18"
},
{
"date": "2022-06-07",
"condition": "available",
"last_modified_date": "2024-05-21"
},
{
"date": "2022-06-08",
"condition": "available",
"last_modified_date": "2024-05-21"
},
{
"date": "2022-06-09",
"condition": "available",
"last_modified_date": "2024-05-21"
},
{
"date": "2022-06-10",
"condition": "available",
"last_modified_date": "2024-05-22"
}
]
Historical.metadata.get_dataset_range
Get the available range for the dataset given the user's entitlements.
Use this method to discover data availability.
The start and end values in the response can be used with the timeseries.get_range and batch.submit_job endpoints.
This endpoint will return the start and end timestamps over the entire dataset, as well as the per-schema start and end timestamps under the schema key.
In some cases, a schema's availability is a subset of the entire dataset availability.
Parameters
- dataset: the dataset ID.
Returns
dict[str, str | dict[str, str]]
The available range for the dataset, given as start and end timestamps.
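A minimal example call (the dataset is illustrative):
import databento as db

client = db.Historical("$YOUR_API_KEY")
print(client.metadata.get_dataset_range(dataset="GLBX.MDP3"))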
{
"start":"2018-05-01T00:00:00.000000000Z",
"end":"2025-01-30T00:00:00.000000000Z",
"schema": {
"mbo": {
"start":"2018-05-01T00:00:00.000000000Z",
"end":"2025-01-30T00:00:00.000000000Z"
},
"mbp-1": {
"start":"2018-05-01T00:00:00.000000000Z",
"end":"2025-01-30T00:00:00.000000000Z"
},
"mbp-10": {
"start":"2018-05-01T00:00:00.000000000Z",
"end":"2025-01-30T00:00:00.000000000Z"
},
"bbo-1s": {
"start":"2018-05-01T00:00:00.000000000Z",
"end":"2025-01-30T00:00:00.000000000Z"
},
"bbo-1m": {
"start":"2018-05-01T00:00:00.000000000Z",
"end":"2025-01-30T00:00:00.000000000Z"
},
"tbbo": {
"start":"2018-05-01T00:00:00.000000000Z",
"end":"2025-01-30T00:00:00.000000000Z"
},
"trades": {
"start":"2018-05-01T00:00:00.000000000Z",
"end":"2025-01-30T00:00:00.000000000Z"
},
"ohlcv-1s": {
"start":"2018-05-01T00:00:00.000000000Z",
"end":"2025-01-30T00:00:00.000000000Z"
},
"ohlcv-1m": {
"start":"2018-05-01T00:00:00.000000000Z",
"end":"2025-01-30T00:00:00.000000000Z"
},
"ohlcv-1h": {
"start":"2018-05-01T00:00:00.000000000Z",
"end":"2025-01-30T00:00:00.000000000Z"
},
"ohlcv-1d": {
"start":"2018-05-01T00:00:00.000000000Z",
"end":"2025-01-30T00:00:00.000000000Z"
},
"definition": {
"start":"2018-05-01T00:00:00.000000000Z",
"end":"2025-01-30T00:00:00.000000000Z"
},
"statistics": {
"start":"2018-05-01T00:00:00.000000000Z",
"end":"2025-01-30T00:00:00.000000000Z"
},
"status": {
"start":"2018-05-01T00:00:00.000000000Z",
"end":"2025-01-30T00:00:00.000000000Z"
},
"imbalance": {
"start":"2018-05-01T00:00:00.000000000Z",
"end":"2025-01-30T00:00:00.000000000Z"
}
}
}
Historical.metadata.get_record_count
Get the record count of the time series data query.
This method may not be accurate for time ranges that are not discrete multiples of 10 minutes, potentially over-reporting the number of records in such cases. The definition schema is only accurate for discrete multiples of 24 hours.
Parameters
- dataset: the dataset ID.
- start: the start of the request time range (inclusive). Takes pd.Timestamp, Python datetime, Python date, ISO 8601 string, or UNIX timestamp in nanoseconds. Assumes UTC as timezone unless otherwise specified.
- end: the end of the request time range (exclusive). Takes the same types as start. Defaults to the forward filled value of start based on the resolution provided.
- symbols: the instruments to request. If 'ALL_SYMBOLS' or None then will select all symbols.
- schema: the data record schema.
- stype_in: the input symbology type of symbols. Must be one of 'raw_symbol', 'instrument_id', 'parent', or 'continuous'.
- limit: the maximum number of records to count. If None then no limit.
Returns
int
The record count.
Historical.metadata.get_record_count(
dataset: Dataset | str,
start: pd.Timestamp | datetime | date | str | int,
end: pd.Timestamp | datetime | date | str | int | None = None,
symbols: Iterable[str | int] | str | int | None = None,
schema: Schema | str = "trades",
stype_in: SType | str = "raw_symbol",
limit: int | None = None,
) -> int
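A minimal example call (dataset, symbols, and dates are illustrative):
import databento as db

client = db.Historical("$YOUR_API_KEY")
count = client.metadata.get_record_count(
    dataset="GLBX.MDP3",
    symbols=["ESM2"],
    schema="trades",
    start="2022-06-06",
    end="2022-06-10",
)
print(count)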
Historical.metadata.get_billable_size
Get the billable uncompressed raw binary size for historical streaming or batched files.
This method may not be accurate for time ranges that are not discrete multiples of 10 minutes, potentially over-reporting the size in such cases. The definition schema is only accurate for discrete multiples of 24 hours.
Info: The amount billed will be based on the actual amount of bytes sent; see our pricing documentation for more details.
Parameters
- dataset: the dataset ID.
- start: the start of the request time range (inclusive). Takes pd.Timestamp, Python datetime, Python date, ISO 8601 string, or UNIX timestamp in nanoseconds. Assumes UTC as timezone unless otherwise specified.
- end: the end of the request time range (exclusive). Takes the same types as start. Defaults to the forward filled value of start based on the resolution provided.
- symbols: the instruments to request. If 'ALL_SYMBOLS' or None then will select all symbols.
- schema: the data record schema.
- stype_in: the input symbology type of symbols. Must be one of 'raw_symbol', 'instrument_id', 'parent', or 'continuous'.
- limit: the maximum number of records to include. If None then no limit.
Returns
int
The size in number of bytes used for billing.
Historical.metadata.get_billable_size(
dataset: Dataset | str,
start: pd.Timestamp | datetime | date | str | int,
end: pd.Timestamp | datetime | date | str | int | None = None,
symbols: Iterable[str | int] | str | int | None = None,
schema: Schema | str = "trades",
stype_in: SType | str = "raw_symbol",
limit: int | None = None,
) -> int
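A minimal example call (dataset, symbols, and dates are illustrative):
import databento as db

client = db.Historical("$YOUR_API_KEY")
size = client.metadata.get_billable_size(
    dataset="GLBX.MDP3",
    symbols=["ESM2"],
    schema="trades",
    start="2022-06-06",
    end="2022-06-10",
)
print(size)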
Historical.metadata.get_cost
Get the cost in US dollars for a historical streaming or batch download request. This cost respects any discounts provided by flat rate plans.
This method may not be accurate for time ranges that are not discrete multiples of 10 minutes, potentially over-reporting the cost in such cases. The definition schema is only accurate for discrete multiples of 24 hours.
Info: The amount billed will be based on the actual amount of bytes sent; see our pricing documentation for more details.
Parameters
- dataset: the dataset ID.
- start: the start of the request time range (inclusive). Takes pd.Timestamp, Python datetime, Python date, ISO 8601 string, or UNIX timestamp in nanoseconds. Assumes UTC as timezone unless otherwise specified.
- end: the end of the request time range (exclusive). Takes the same types as start. Defaults to the forward filled value of start based on the resolution provided.
- mode: the data feed mode (default "historical-streaming").
- symbols: the instruments to request. If 'ALL_SYMBOLS' or None then will select all symbols.
- schema: the data record schema.
- stype_in: the input symbology type of symbols. Must be one of 'raw_symbol', 'instrument_id', 'parent', or 'continuous'.
- limit: the maximum number of records to include. If None then no limit.
Returns
float
The cost in US dollars.
Historical.metadata.get_cost(
dataset: Dataset | str,
start: pd.Timestamp | datetime | date | str | int,
end: pd.Timestamp | datetime | date | str | int | None = None,
mode: FeedMode | str = "historical-streaming",
symbols: Iterable[str | int] | str | int | None = None,
schema: Schema | str = "trades",
stype_in: SType | str = "raw_symbol",
limit: int | None = None,
) -> float
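A minimal example call showing the mode parameter (dataset, symbols, and dates are illustrative):
import databento as db

client = db.Historical("$YOUR_API_KEY")
cost = client.metadata.get_cost(
    dataset="GLBX.MDP3",
    symbols=["ESM2"],
    schema="trades",
    start="2022-06-06",
    end="2022-06-10",
    mode="historical-streaming",
)
print(cost)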
Historical.timeseries.get_range
Makes a streaming request for time series data from Databento.
This is the primary method for getting historical market data, instrument definitions, and status data directly into your application.
This method only returns after all of the data has been downloaded, which can take a long time. For large requests, consider using batch.submit_job instead.
Parameters
- dataset: the dataset ID.
- start: the start of the request time range (inclusive), based on ts_recv if it exists in the schema, otherwise ts_event. Takes pd.Timestamp, Python datetime, Python date, ISO 8601 string, or UNIX timestamp in nanoseconds. Assumes UTC as timezone unless otherwise specified.
- end: the end of the request time range (exclusive), based on ts_recv if it exists in the schema, otherwise ts_event. Takes the same types as start and assumes UTC as timezone unless otherwise specified. Defaults to the forward filled value of start based on the resolution provided.
- symbols: the instruments to request. If 'ALL_SYMBOLS' or None then will select all symbols.
- schema: the data record schema.
- stype_in: the input symbology type of symbols. Must be one of 'raw_symbol', 'instrument_id', 'parent', or 'continuous'.
- stype_out: the output symbology type.
- limit: the maximum number of records to return. If None then no limit.
- path: an optional file path to also write the streamed data to.
Returns
A DBNStore object.
A full list of fields for each schema is available through Historical.metadata.list_fields.
Historical.timeseries.get_range(
dataset: Dataset | str,
start: pd.Timestamp | datetime | date | str | int,
end: pd.Timestamp | datetime | date | str | int | None = None,
symbols: Iterable[str | int] | str | int | None = None,
schema: Schema | str = "trades",
stype_in: SType | str = "raw_symbol",
stype_out: SType | str = "instrument_id",
limit: int | None = None,
path: PathLike[str] | str | None = None,
) -> DBNStore
import databento as db
client = db.Historical("$YOUR_API_KEY")
data = client.timeseries.get_range(
dataset="GLBX.MDP3",
symbols=["ESM2"],
schema="trades",
start="2022-06-06T00:00:00",
end="2022-06-10T00:10:00",
limit=1,
)
df = data.to_df()
print(df.iloc[0].to_json(indent=4))
Historical.timeseries.get_range_async
Asynchronously request a historical time series data stream from Databento.
Primary method for getting historical intraday market data, daily data, instrument definitions and market status data directly into your application.
This method only returns after all of the data has been downloaded, which can take a long time. For large requests, consider using batch.submit_job instead.
Parameters
- dataset: the dataset ID.
- start: the start of the request time range (inclusive), based on ts_recv if it exists in the schema, otherwise ts_event. Takes pd.Timestamp, Python datetime, Python date, ISO 8601 string, or UNIX timestamp in nanoseconds. Assumes UTC as timezone unless otherwise specified.
- end: the end of the request time range (exclusive), based on ts_recv if it exists in the schema, otherwise ts_event. Takes the same types as start and assumes UTC as timezone unless otherwise specified. Defaults to the forward filled value of start based on the resolution provided.
- symbols: the instruments to request. If 'ALL_SYMBOLS' or None then will select all symbols.
- schema: the data record schema.
- stype_in: the input symbology type of symbols. Must be one of 'raw_symbol', 'instrument_id', 'parent', or 'continuous'.
- stype_out: the output symbology type.
- limit: the maximum number of records to return. If None then no limit.
- path: an optional file path to also write the streamed data to.
Returns
A DBNStore object.
A full list of fields for each schema is available through Historical.metadata.list_fields.
Historical.timeseries.get_range_async(
dataset: Dataset | str,
start: pd.Timestamp | datetime | date | str | int,
end: pd.Timestamp | datetime | date | str | int | None = None,
symbols: Iterable[str | int] | str | int | None = None,
schema: Schema | str = "trades",
stype_in: SType | str = "raw_symbol",
stype_out: SType | str = "instrument_id",
limit: int | None = None,
path: PathLike[str] | str | None = None,
) -> Awaitable[DBNStore]
import asyncio
import databento as db
client = db.Historical("$YOUR_API_KEY")
coro = client.timeseries.get_range_async(
dataset="GLBX.MDP3",
symbols=["ESM2"],
schema="trades",
start="2022-06-06T00:00:00",
end="2022-06-10T00:10:00",
limit=1,
)
data = asyncio.run(coro)
df = data.to_df()
print(df.iloc[0].to_json(indent=4))
Historical.symbology.resolve
Resolve a list of symbols from an input symbology type to an output symbology type.
Take, for example, a raw symbol to an instrument ID: ESM2 → 3403.
Parameters
- dataset: the dataset ID.
- symbols: the symbols to resolve. Use 'ALL_SYMBOLS' to request all symbols (not available for every dataset).
- stype_in: the input symbology type of symbols. Must be one of 'raw_symbol', 'instrument_id', 'parent', or 'continuous'.
- stype_out: the output symbology type.
- start_date: the start date (UTC) of the request range (inclusive).
- end_date: the end date (UTC) of the request range (exclusive). Defaults to the forward filled value of start_date based on the resolution provided.
Returns
dict[str, Any]
The results for the symbology resolution.
See also: For more information on symbology resolution, visit our symbology documentation.
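A minimal example call matching the response below:
import databento as db

client = db.Historical("$YOUR_API_KEY")
result = client.symbology.resolve(
    dataset="GLBX.MDP3",
    symbols=["ESM2"],
    stype_in="raw_symbol",
    stype_out="instrument_id",
    start_date="2022-06-01",
    end_date="2022-06-30",
)
print(result)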
{
"result": {
"ESM2": [
{
"d0": "2022-06-01",
"d1": "2022-06-26",
"s": "3403"
}
]
},
"symbols": [
"ESM2"
],
"stype_in": "raw_symbol",
"stype_out": "instrument_id",
"start_date": "2022-06-01",
"end_date": "2022-06-30",
"partial": [],
"not_found": [],
"message": "OK",
"status": 0
}
Batch downloads
Batch downloads allow you to download data files directly from within your portal. For more information, see Streaming vs. batch download.
Historical.batch.submit_job
Make a batch download job request.
Once a request is submitted, our system processes the request and prepares the batch files in the background. The status of your request and the files can be accessed from the Download center in your user portal.
This method takes longer than a streaming request, but is advantageous for larger requests as it supports delivery mechanisms that allow multiple accesses of the data without additional cost for each subsequent download after the first.
Related: batch.list_jobs.
Parameters
- dataset: the dataset ID.
- symbols: the instruments to request. If 'ALL_SYMBOLS' or None then will select all symbols.
- schema: the data record schema.
- start: the start of the request time range (inclusive), based on ts_recv if it exists in the schema, otherwise ts_event. Takes pd.Timestamp, Python datetime, Python date, ISO 8601 string, or UNIX timestamp in nanoseconds. Assumes UTC as timezone unless otherwise specified.
- end: the end of the request time range (exclusive), based on ts_recv if it exists in the schema, otherwise ts_event. Takes the same types as start. Defaults to the forward filled value of start based on the resolution provided.
- encoding: the data encoding (default "dbn").
- compression: the data compression format (default "zstd").
- pretty_px: if True, prices are formatted in a human-readable decimal form (default False).
- pretty_ts: if True, timestamps are formatted in a human-readable ISO 8601 form (default False).
- map_symbols: if True, a symbol field is appended to every record (default False).
- split_symbols: if True, the output is split by symbol. Cannot be used with 'ALL_SYMBOLS' or with limit.
- split_duration: the maximum duration of data in each file (default "day").
- split_size: the maximum size of each file. If None then no size-based splitting.
- delivery: the delivery mechanism (default "download").
- stype_in: the input symbology type of symbols. Must be one of 'raw_symbol', 'instrument_id', 'parent', or 'continuous'.
- stype_out: the output symbology type.
- limit: the maximum number of records to include. If None then no limit. Cannot be used with split_symbols.
Returns
dict[str, Any]
The description of the submitted batch job.
Some fields (such as cost_usd, record_count, and the timestamps after ts_received) will be None until the job is done processing.
Historical.batch.submit_job(
dataset: Dataset | str,
symbols: Iterable[str | int] | str | int,
schema: Schema | str,
start: pd.Timestamp | datetime | date | str | int,
end: pd.Timestamp | datetime | date | str | int | None = None,
encoding: Encoding | str = "dbn",
compression: Compression | str = "zstd",
pretty_px: bool = False,
pretty_ts: bool = False,
map_symbols: bool = False,
split_symbols: bool = False,
split_duration: Duration | str = "day",
split_size: int | None = None,
delivery: Delivery | str = "download",
stype_in: SType | str = "raw_symbol",
stype_out: SType | str = "instrument_id",
limit: int | None = None,
) -> dict[str, Any]
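A minimal example call matching the response below (dataset, symbols, and dates are illustrative):
import databento as db

client = db.Historical("$YOUR_API_KEY")
job = client.batch.submit_job(
    dataset="GLBX.MDP3",
    symbols=["ESM2"],
    schema="trades",
    start="2022-06-06T12:00",
    end="2022-06-10",
)
print(job)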
{
"id": "GLBX-20221217-MN5S5S4WAS",
"user_id": "NBPDLF33",
"api_key": "prod-001",
"cost_usd": None,
"dataset": "GLBX.MDP3",
"symbols": "ESM2",
"stype_in": "raw_symbol",
"stype_out": "instrument_id",
"schema": "trades",
"start": "2022-06-06T12:00:00.000000000Z",
"end": "2022-06-10T00:00:00.000000000Z",
"limit": None,
"encoding": "dbn",
"compression": "zstd",
"pretty_px": False,
"pretty_ts": False,
"map_symbols": False,
"split_symbols": False,
"split_duration": "day",
"split_size": None,
"packaging": None,
"delivery": "download",
"record_count": None,
"billed_size": None,
"actual_size": None,
"package_size": None,
"state": "queued",
"ts_received": "2022-12-17T00:36:37.844913000Z",
"ts_queued": None,
"ts_process_start": None,
"ts_process_done": None,
"ts_expiration": None
}
Historical.batch.list_jobs
List batch job details for the user account.
The job details will be sorted in order of ts_received.
Related: Download center.
Parameters
Returns
list[dict[str, Any]]
A list of batch job details. See batch.submit_job for a detailed list of returned values.
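A minimal example call:
import databento as db

client = db.Historical("$YOUR_API_KEY")
jobs = client.batch.list_jobs()
print(jobs)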
[
{
"id": "GLBX-20221126-DBVXWPJJQN",
"user_id": "NBPDLF33",
"api_key": "prod-001",
"cost_usd": 23.6454,
"dataset": "GLBX.MDP3",
"symbols": "ZC.FUT,ES.FUT",
"stype_in": "parent",
"stype_out": "instrument_id",
"schema": "mbo",
"start": "2022-10-24T00:00:00.000000000Z",
"end": "2022-11-24T00:00:00.000000000Z",
"limit": None,
"encoding": "csv",
"compression": "zstd",
"pretty_px": False,
"pretty_ts": False,
"map_symbols": False,
"split_symbols": False,
"split_duration": "day",
"split_size": None,
"packaging": None,
"delivery": "download",
"record_count": 412160224,
"billed_size": 23080972544,
"actual_size": 8144595219,
"package_size": 8144628684,
"state": "done",
"ts_received": "2022-11-26T09:23:17.519708000Z",
"ts_queued": "2022-12-03T14:34:57.897790000Z",
"ts_process_start": "2022-12-03T14:35:00.495167000Z",
"ts_process_done": "2022-12-03T14:48:15.710116000Z",
"ts_expiration": "2023-01-02T14:48:15.710116000Z",
"progress": 100
},
...
Historical.batch.list_files
List files for a batch job.
Will include the manifest.json, the metadata.json, and batched data files.
Related: Download center.
Parameters
- job_id: the batch job identifier.
Returns
list[dict[str, Any]]
The file details for the batch job.
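A minimal example call (the job ID is illustrative, matching the response below):
import databento as db

client = db.Historical("$YOUR_API_KEY")
files = client.batch.list_files(job_id="GLBX-20230203-WF9WJYSCDU")
print(files)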
[
{
"filename": "metadata.json",
"size": 1102,
"hash": "sha256:0168d53e1705b69b1d6407f10bb3ab48aac492fa0f68f863cc9b092931cc67a7",
"urls": {
"https": "https://api.databento.com/v0/batch/download/46PCMCVF/GLBX-20230203-WF9WJYSCDU/metadata.json",
"ftp": "ftp://ftp.databento.com/46PCMCVF/GLBX-20230203-WF9WJYSCDU/metadata.json",
}
},
{
"filename": "glbx-mdp3-20220610.mbo.csv.zst",
"size": 21832,
"hash": "sha256:1218930af153b4953632216044ef87607afa467fc7ab7fbb1f031fceacf9d52a",
"urls": {
"https": "https://api.databento.com/v0/batch/download/46PCMCVF/GLBX-20230203-WF9WJYSCDU/glbx-mdp3-20220610.mbo.csv.zst",
"ftp": "ftp://ftp.databento.com/46PCMCVF/GLBX-20230203-WF9WJYSCDU/glbx-mdp3-20220610.mbo.csv.zst",
}
}
]
Historical.batch.download
Download a batch job or a specific file to {output_dir}/{job_id}/.
Will automatically generate any necessary directories if they do not already exist.
Related: Download center.
Parameters
- job_id: the batch job identifier.
- output_dir: the directory to download to. If None, defaults to the current working directory.
- filename_to_download: a specific file to download. If None then will download all files for the batch job.
- keep_zip: if True, and filename_to_download is None, all job files will be saved as a .zip archive in the output_dir.
Returns
list[Path]
A list of paths to the downloaded files.
import databento as db
client = db.Historical("$YOUR_API_KEY")
# Download all files for the batch job
client.batch.download(
job_id="GLBX-20220610-5DEFXVTMSM",
output_dir="my_data/",
)
# Alternatively, you can download a specific file
client.batch.download(
job_id="GLBX-20220610-5DEFXVTMSM",
output_dir="my_data/",
filename_to_download="metadata.json",
)
Historical.batch.download_async
Asynchronously download a batch job or a specific file to {output_dir}/{job_id}/.
Will automatically generate any necessary directories if they do not already exist.
Related: Download center.
Parameters
- job_id: the batch job identifier.
- output_dir: the directory to download to. If None, defaults to the current working directory.
- filename_to_download: a specific file to download. If None then will download all files for the batch job.
- keep_zip: if True, and filename_to_download is None, all job files will be saved as a .zip archive in the output_dir.
Returns
list[Path]
A list of paths to the downloaded files.
import asyncio
import databento as db
client = db.Historical("$YOUR_API_KEY")
# Download all files for the batch job
coro = client.batch.download_async(
job_id="GLBX-20220610-5DEFXVTMSM",
output_dir="my_data/",
)
asyncio.run(coro)
# Alternatively, you can download a specific file
coro = client.batch.download_async(
job_id="GLBX-20220610-5DEFXVTMSM",
output_dir="my_data/",
filename_to_download="metadata.json",
)
asyncio.run(coro)
DBNStore
The DBNStore object is an I/O helper class for working with DBN-encoded data. Typically, this object is created when performing historical requests. However, it can also be created directly from DBN data on disk or in memory using the provided factory methods:
Attributes
- schema: the data record schema of the DBNStore. If None, the DBNStore may contain multiple schemas.
- stype_in: the input symbology type. If None, the DBNStore may contain mixed STypes.
- end: the query end time. If None, the DBNStore data was created without a known end time.
DBNStore.from_bytes
Read data from a DBN byte stream.
Parameters
- data: the DBN bytes or byte stream to read from.
Returns
A DBNStore object.
import databento as db
client = db.Historical("$YOUR_API_KEY")
data = client.timeseries.get_range(
dataset="GLBX.MDP3",
symbols=["ESM2"],
schema="trades",
start="2022-06-06",
)
# Save streamed data to .dbn.zst
path = "GLBX-ESM2-20220606.trades.dbn.zst"
data.to_file(path)
# Open saved data as a byte stream.
with open(path, "rb") as saved:
stored_data = db.DBNStore.from_bytes(saved)
# Convert to dataframe
df = stored_data.to_df()
print(df.head())
ts_event rtype publisher_id instrument_id action ... size flags ts_in_delta sequence symbol
ts_recv ...
2022-06-06 00:00:00.070314216+00:00 2022-06-06 00:00:00.070033767+00:00 0 1 3403 T ... 1 0 18681 157862 ESM2
2022-06-06 00:00:00.090544076+00:00 2022-06-06 00:00:00.089830441+00:00 0 1 3403 T ... 1 0 18604 157922 ESM2
2022-06-06 00:00:00.807324169+00:00 2022-06-06 00:00:00.807018955+00:00 0 1 3403 T ... 4 0 18396 158072 ESM2
2022-06-06 00:00:01.317722490+00:00 2022-06-06 00:00:01.317385867+00:00 0 1 3403 T ... 1 0 22043 158111 ESM2
2022-06-06 00:00:01.317736158+00:00 2022-06-06 00:00:01.317385867+00:00 0 1 3403 T ... 7 0 17280 158112 ESM2
[5 rows x 13 columns]
DBNStore.from_file
Read data from a DBN file.
See also: databento.read_dbn is an alias for DBNStore.from_file.
Parameters
- path: the path to the DBN file to read.
Returns
A DBNStore object.
import databento as db
client = db.Historical("$YOUR_API_KEY")
data = client.timeseries.get_range(
dataset="GLBX.MDP3",
symbols=["ESM2"],
schema="trades",
start="2022-06-06",
)
# Save streamed data to .dbn.zst
path = "GLBX-ESM2-20220606.trades.dbn.zst"
data.to_file(path)
# Read saved .dbn.zst
stored_data = db.DBNStore.from_file(path)
# Convert to dataframe
df = stored_data.to_df()
print(df.head())
ts_event rtype publisher_id instrument_id action ... size flags ts_in_delta sequence symbol
ts_recv ...
2022-06-06 00:00:00.070314216+00:00 2022-06-06 00:00:00.070033767+00:00 0 1 3403 T ... 1 0 18681 157862 ESM2
2022-06-06 00:00:00.090544076+00:00 2022-06-06 00:00:00.089830441+00:00 0 1 3403 T ... 1 0 18604 157922 ESM2
2022-06-06 00:00:00.807324169+00:00 2022-06-06 00:00:00.807018955+00:00 0 1 3403 T ... 4 0 18396 158072 ESM2
2022-06-06 00:00:01.317722490+00:00 2022-06-06 00:00:01.317385867+00:00 0 1 3403 T ... 1 0 22043 158111 ESM2
2022-06-06 00:00:01.317736158+00:00 2022-06-06 00:00:01.317385867+00:00 0 1 3403 T ... 7 0 17280 158112 ESM2
[5 rows x 13 columns]
DBNStore.reader
Return an I/O reader for the data.
Returns
A raw IO stream for reading the DBNStore data.
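A minimal sketch (the file name is illustrative):
import databento as db

data = db.DBNStore.from_file("GLBX-ESM2-20220606.trades.dbn.zst")
# Read the first 16 bytes of the raw DBN stream
print(data.reader.read(16))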
DBNStore.replay
Replay data by passing records sequentially to the given callback.
Refer to the List of fields by schema article for documentation on the fields contained with each record type.
Parameters
- callback: the callable that will be passed each record.
Returns
None
import databento as db
client = db.Historical("$YOUR_API_KEY")
data = client.timeseries.get_range(
dataset="GLBX.MDP3",
symbols=["ESM2"],
start="2022-06-06",
)
def print_large_trades(trade):
size = getattr(trade, "size", 0)
if size >= 200:
print(trade)
data.replay(print_large_trades)
TradeMsg { hd: RecordHeader { length: 12, rtype: Mbp0, publisher_id: GlbxMdp3Glbx, instrument_id: 3403, ts_event: 1654524078339857609 }, price: 4164.000000000, size: 291, action: 'T', side: 'B', flags: 0, depth: 0, ts_recv: 1654524078342408839, ts_in_delta: 20352, sequence: 3605032 }
TradeMsg { hd: RecordHeader { length: 12, rtype: Mbp0, publisher_id: GlbxMdp3Glbx, instrument_id: 3403, ts_event: 1654524133736900455 }, price: 4160.000000000, size: 216, action: 'T', side: 'B', flags: 0, depth: 0, ts_recv: 1654524133737794739, ts_in_delta: 28024, sequence: 3659203 }
TradeMsg { hd: RecordHeader { length: 12, rtype: Mbp0, publisher_id: GlbxMdp3Glbx, instrument_id: 3403, ts_event: 1654538295588752739 }, price: 4140.000000000, size: 200, action: 'T', side: 'B', flags: 0, depth: 0, ts_recv: 1654538295589900967, ts_in_delta: 21708, sequence: 10031624 }
DBNStore.request_full_definitions
Request for full instrument Definition(s) for all symbols based on the metadata properties. This is useful for retrieving the instrument definitions for saved DBN data.
A timeseries.get_range request is made to obtain the definitions data which will incur a cost.
Parameters
- client: the Historical client used to make the request (the request parameters are taken from the metadata of the DBNStore).
Returns
A DBNStore object.
A full list of fields for each schema is available through Historical.metadata.list_fields.
import databento as db
client = db.Historical(
key="$YOUR_API_KEY",
)
trades = client.timeseries.get_range(
dataset="GLBX.MDP3",
symbols=["ES.FUT"],
stype_in="parent",
schema="trades",
start="2022-06-06",
)
definitions = trades.request_full_definitions(client).to_df()
definitions = definitions.sort_values(["expiration", "symbol"]).set_index("expiration")
print(definitions[["symbol"]])
symbol
expiration
2022-06-17 13:30:00+00:00 ESM2
2022-06-17 13:30:00+00:00 ESM2-ESH3
2022-06-17 13:30:00+00:00 ESM2-ESM3
2022-06-17 13:30:00+00:00 ESM2-ESU2
2022-06-17 13:30:00+00:00 ESM2-ESZ2
2022-09-16 13:30:00+00:00 ESU2
2022-09-16 13:30:00+00:00 ESU2-ESH3
2022-09-16 13:30:00+00:00 ESU2-ESM3
2022-09-16 13:30:00+00:00 ESU2-ESU3
2022-09-16 13:30:00+00:00 ESU2-ESZ2
2022-12-16 14:30:00+00:00 ESZ2
2022-12-16 14:30:00+00:00 ESZ2-ESH3
2022-12-16 14:30:00+00:00 ESZ2-ESM3
2022-12-16 14:30:00+00:00 ESZ2-ESU3
2023-03-17 13:30:00+00:00 ESH3
2023-03-17 13:30:00+00:00 ESH3-ESM3
2023-03-17 13:30:00+00:00 ESH3-ESU3
2023-03-17 13:30:00+00:00 ESH3-ESZ3
2023-06-16 13:30:00+00:00 ESM3
2023-06-16 13:30:00+00:00 ESM3-ESU3
2023-06-16 13:30:00+00:00 ESM3-ESZ3
2023-09-15 13:30:00+00:00 ESU3
2023-09-15 13:30:00+00:00 ESU3-ESH4
2023-09-15 13:30:00+00:00 ESU3-ESZ3
2023-12-15 14:30:00+00:00 ESZ3
2023-12-15 14:30:00+00:00 ESZ3-ESH4
2024-03-15 13:30:00+00:00 ESH4
2024-03-15 13:30:00+00:00 ESH4-ESM4
2024-06-21 13:30:00+00:00 ESM4
2024-09-20 13:30:00+00:00 ESU4
2024-12-20 14:30:00+00:00 ESZ4
2025-12-19 14:30:00+00:00 ESZ5
2026-12-18 14:30:00+00:00 ESZ6
DBNStore.request_symbology
Request to resolve symbology mappings based on the metadata properties.
Parameters
- client: the Historical client used to make the symbology request.
Returns
dict[str, Any]
A result including a map of input symbol to output symbol across a date range.
import databento as db
client = db.Historical("$YOUR_API_KEY")
data = client.timeseries.get_range(
dataset="GLBX.MDP3",
symbols=["ESM2"],
schema="trades",
start="2022-06-06",
)
# Save streamed data to .dbn.zst
data.to_file("GLBX-ESM2-20201229.trades.dbn.zst")
# Read saved .dbn.zst
stored_data = db.DBNStore.from_file("GLBX-ESM2-20201229.trades.dbn.zst")
# Request symbology from .dbn.zst metadata
symbology = stored_data.request_symbology(client=client)
print(symbology)
{
"result": {
"ESM2": [
{
"d0": "2022-06-06",
"d1": "2022-06-07",
"s": "3403"
}
]
},
"symbols": [
"ESM2"
],
"stype_in": "raw_symbol",
"stype_out": "instrument_id",
"start_date": "2022-06-06",
"end_date": "2022-06-07",
"partial": [],
"not_found": [],
"message": "OK",
"status": 0
}
DBNStore.to_csv
Write data to a file in CSV format.
Parameters
- path: the file path to write to.
- pretty_px: if True, prices are formatted in a human-readable decimal form.
- pretty_ts: if True, timestamps are converted to pandas.Timestamp (UTC).
- map_symbols: if True, a symbol column is appended to each record.
- schema: the data record schema to write. Only required for a DBNStore with mixed record types.
Returns
None
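A minimal sketch (file names are illustrative):
import databento as db

data = db.DBNStore.from_file("GLBX-ESM2-20220606.trades.dbn.zst")
# Write the records out in CSV format
data.to_csv("GLBX-ESM2-20220606.trades.csv")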
DBNStore.to_df
Converts data to a pandas DataFrame.
Info: The DataFrame index will be set to ts_recv if it exists in the schema, otherwise it will be set to ts_event.
See also: While not optimized for use with live data due to their column-oriented format, pandas DataFrames can still be used with live data by first streaming DBN data to a file, then converting to a DataFrame with DBNStore.from_file().to_df(). See this example for more information.
Parameters
- price_type: the price representation. If "fixed", prices will have a type of int in fixed decimal format, with each unit representing 1e-9 or 0.000000001. If "float", prices will have a type of float. If "decimal", prices will be instances of decimal.Decimal.
- pretty_ts: if True, timestamps are converted to pandas.Timestamp. The timezone can be specified using the tz parameter.
- map_symbols: if True, a symbol column is appended to each record.
- schema: the data record schema to convert. Only required for a DBNStore with mixed record types.
- tz: the timezone for timestamps. If pretty_ts is True, all timestamps will be converted to the specified timezone.
- count: the maximum number of records per DataFrame. If set, instead of a DataFrame a DataFrameIterator instance will be returned. When iterated, this object will yield a DataFrame with at most count elements until the entire contents of the DBNStore are exhausted.
Returns
A pandas DataFrame object.
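A minimal example producing a DataFrame like the one below (dataset, symbol, and dates are illustrative):
import databento as db

client = db.Historical("$YOUR_API_KEY")
data = client.timeseries.get_range(
    dataset="GLBX.MDP3",
    symbols=["ESM2"],
    schema="trades",
    start="2022-03-06",
    limit=5,
)
df = data.to_df()
print(df)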
ts_event rtype publisher_id instrument_id action ... size flags ts_in_delta sequence symbol
ts_recv ...
2022-03-06 23:00:00.039463300+00:00 2022-03-06 23:00:00.036436177+00:00 0 1 3403 T ... 1 0 18828 5178 ESM2
2022-03-06 23:00:01.098111252+00:00 2022-03-06 23:00:01.097477845+00:00 0 1 3403 T ... 1 0 19122 6816 ESM2
2022-03-06 23:00:04.612334175+00:00 2022-03-06 23:00:04.611714663+00:00 0 1 3403 T ... 1 0 18687 10038 ESM2
2022-03-06 23:00:04.613776789+00:00 2022-03-06 23:00:04.613240435+00:00 0 1 3403 T ... 1 0 18452 10045 ESM2
2022-03-06 23:00:06.881864467+00:00 2022-03-06 23:00:06.880575603+00:00 0 1 3403 T ... 1 0 18478 11343 ESM2
[5 rows x 13 columns]
DBNStore.to_file
Write data to a DBN file.
Parameters
- path: the file path to write to.
- compression: the output compression. If None, uses the same compression as the underlying data.
Returns
A DBNStore object.
DBNStore.to_json
Write data to a file in JSON format.
Parameters
- path: the file path to write to.
- pretty_px: if True, prices are formatted in a human-readable decimal form.
- pretty_ts: if True, timestamps are converted to pandas.Timestamp (UTC).
- map_symbols: if True, a symbol key is appended to each record.
- schema: the data record schema to write. Only required for a DBNStore with mixed record types.
Returns
None
DBNStore.to_ndarray
Converts data to a numpy N-dimensional array. Each element will contain a Python representation of the binary fields as a Tuple.
Parameters
- schema: the data record schema to convert. Only required for a DBNStore with mixed record types.
- count: the maximum number of records per array. If set, instead of an np.ndarray a NDArrayIterator instance will be returned. When iterated, this object will yield a np.ndarray with at most count elements until the entire contents of the DBNStore are exhausted.
Returns
A numpy.ndarray object.
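A brief sketch of both forms, reusing the trades data from the earlier examples; the chunk size is arbitrary.
# Materialize all records as a single structured array
arr = data.to_ndarray()
print(arr[0])  # one record as a tuple of its binary fields
# Or consume the store in bounded chunks to limit memory usage
for chunk in data.to_ndarray(count=1000):
    print(len(chunk))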
DBNStore.to_parquet
Write data to a file in Apache Parquet format.
Parameters
price_type
The price type to use for price fields. If "fixed", prices will have a type of int in fixed decimal format; each unit representing 1e-9 or 0.000000001. If "float", prices will have a type of float.
pretty_ts
If True, timestamp fields are converted to pyarrow.TimestampType (UTC).
schema
The schema to use; only required when reading a DBNStore with mixed record types.
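Continuing the same sketch; the filename is illustrative.
# Write all records to an Apache Parquet file
data.to_parquet("GLBX-ESM2-20220606.trades.parquet")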
DBNStore.__iter__
Iterate over records using a for loop; records will be yielded one at a time. Iteration will stop when there are no more records in the DBNStore instance.
Refer to the List of fields by schema article for documentation on the fields contained within each record type.
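For example, iterating over the trades store loaded in the DBNStore.request_symbology example prints one record per line, producing output like the following.
for record in stored_data:
    print(record)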
TradeMsg { hd: RecordHeader { length: 12, rtype: Mbp0, publisher_id: GlbxMdp3Glbx, instrument_id: 3403, ts_event: 1654524078339857609 }, price: 4164.000000000, size: 291, action: 'T', side: 'B', flags: 0, depth: 0, ts_recv: 1654524078342408839, ts_in_delta: 20352, sequence: 3605032 }
TradeMsg { hd: RecordHeader { length: 12, rtype: Mbp0, publisher_id: GlbxMdp3Glbx, instrument_id: 3403, ts_event: 1654524133736900455 }, price: 4160.000000000, size: 216, action: 'T', side: 'B', flags: 0, depth: 0, ts_recv: 1654524133737794739, ts_in_delta: 28024, sequence: 3659203 }
TradeMsg { hd: RecordHeader { length: 12, rtype: Mbp0, publisher_id: GlbxMdp3Glbx, instrument_id: 3403, ts_event: 1654538295588752739 }, price: 4140.000000000, size: 200, action: 'T', side: 'B', flags: 0, depth: 0, ts_recv: 1654538295589900967, ts_in_delta: 21708, sequence: 10031624 }
DBNStore.insert_symbology_json
Insert JSON symbology data, which may be obtained from a symbology request or loaded from a file.
Parameters
json_data
The JSON symbology data to insert, as a dict, string, or file-like object.
clear_existing
If True, any existing symbology data in the DBNStore is cleared before inserting.
import databento as db
client = db.Historical("$YOUR_API_KEY")
data = client.timeseries.get_range(
dataset="XNAS.ITCH",
symbols=["ALL_SYMBOLS"],
schema="trades",
start="2022-06-06",
end="2022-06-07",
)
# Request symbology for all symbols and then insert this data
symbology_json = data.request_symbology(client)
data.insert_symbology_json(symbology_json, clear_existing=True)
map_symbols_csv
Use a symbology.json file to map a symbols column onto an existing CSV file. The result is written to out_file.
Parameters
symbology_file
Path to the symbology.json file to use as a symbology source.
csv_file
Path to the CSV file to map; it must contain a ts_recv or ts_event column and an instrument_id column.
out_file
Path to write the mapped CSV file to; if not given, _mapped will be appended to the csv_file name.
Returns
Path to the written file.
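A sketch that produces output like the sample below, assuming daily OHLCV bars from XNAS.ITCH and a package-level db.map_symbols_csv helper; the filenames are illustrative, and the pretty_px, pretty_ts, and map_symbols flags are assumptions set to False to match the raw integer output shown.
import json
import databento as db

client = db.Historical("$YOUR_API_KEY")
data = client.timeseries.get_range(
    dataset="XNAS.ITCH",
    symbols=["AMZN", "NVDA", "TSLA", "NOC"],
    schema="ohlcv-1d",
    start="2023-08-21",
    end="2023-08-25",
)
# Write the bars without a symbol column, keeping raw integer fields
data.to_csv("xnas-ohlcv-1d.csv", pretty_px=False, pretty_ts=False, map_symbols=False)
# Save the symbology mappings to use as the symbology source file
with open("symbology.json", "w") as f:
    json.dump(data.request_symbology(client), f)
# Append a symbol column resolved per instrument_id and date
out_path = db.map_symbols_csv("symbology.json", "xnas-ohlcv-1d.csv")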
ts_event,rtype,publisher_id,instrument_id,open,high,low,close,volume,symbol
1692576000000000000,35,2,523,133550000000,135200000000,132710000000,134360000000,11015261,AMZN
1692576000000000000,35,2,7290,439090000000,472900000000,437260000000,470990000000,11098972,NVDA
1692576000000000000,35,2,10157,217000000000,233500000000,217000000000,233380000000,21336884,TSLA
1692576000000000000,35,2,7130,430170000000,434700000000,0,433000000000,68661,NOC
1692662400000000000,35,2,7132,431610000000,438480000000,0,437740000000,86950,NOC
1692662400000000000,35,2,10161,236710000000,241550000000,229560000000,232870000000,17069349,TSLA
1692662400000000000,35,2,523,135320000000,135900000000,133740000000,134200000000,8133698,AMZN
1692662400000000000,35,2,7293,475120000000,483440000000,453340000000,457910000000,11700447,NVDA
1692748800000000000,35,2,523,135120000000,137860000000,133220000000,137060000000,11430081,AMZN
1692748800000000000,35,2,7296,463190000000,518870000000,452080000000,502150000000,13361964,NVDA
1692748800000000000,35,2,7135,0,440000000000,0,434030000000,68411,NOC
1692748800000000000,35,2,10163,236590000000,243750000000,226500000000,240810000000,13759234,TSLA
1692835200000000000,35,2,7287,508190000000,512600000000,466720000000,468700000000,21611800,NVDA
1692835200000000000,35,2,10154,242000000000,244140000000,228180000000,229980000000,13724054,TSLA
1692835200000000000,35,2,7126,433650000000,437730000000,0,429880000000,54570,NOC
1692835200000000000,35,2,521,137250000000,137820000000,131410000000,131880000000,11990573,AMZN
map_symbols_json
Use a symbology.json file to insert a symbol key into the records of an existing JSON file. The result is written to out_file.
Parameters
symbology_file
Path to the symbology.json file to use as a symbology source.
json_file
Path to the JSON file to map.
out_file
Path to write the mapped JSON file to; if not given, _mapped will be appended to the json_file name.
Returns
Path to the written file.
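Continuing the map_symbols_csv sketch above for JSON output; db.map_symbols_json at package level and the map_symbols flag are again assumptions.
# Write the records without symbols, then map them in place
data.to_json("xnas-ohlcv-1d.json", map_symbols=False)
out_path = db.map_symbols_json("symbology.json", "xnas-ohlcv-1d.json")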
{"hd":{"ts_event":"1692576000000000000","rtype":35,"publisher_id":2,"instrument_id":523},"open":"133550000000","high":"135200000000","low":"132710000000","close":"134360000000","volume":"11015261","symbol":"AMZN"}
{"hd":{"ts_event":"1692576000000000000","rtype":35,"publisher_id":2,"instrument_id":7290},"open":"439090000000","high":"472900000000","low":"437260000000","close":"470990000000","volume":"11098972","symbol":"NVDA"}
{"hd":{"ts_event":"1692576000000000000","rtype":35,"publisher_id":2,"instrument_id":10157},"open":"217000000000","high":"233500000000","low":"217000000000","close":"233380000000","volume":"21336884","symbol":"TSLA"}
{"hd":{"ts_event":"1692576000000000000","rtype":35,"publisher_id":2,"instrument_id":7130},"open":"430170000000","high":"434700000000","low":"0","close":"433000000000","volume":"68661","symbol":"NOC"}
{"hd":{"ts_event":"1692662400000000000","rtype":35,"publisher_id":2,"instrument_id":10161},"open":"236710000000","high":"241550000000","low":"229560000000","close":"232870000000","volume":"17069349","symbol":"TSLA"}
{"hd":{"ts_event":"1692662400000000000","rtype":35,"publisher_id":2,"instrument_id":523},"open":"135320000000","high":"135900000000","low":"133740000000","close":"134200000000","volume":"8133698","symbol":"AMZN"}
{"hd":{"ts_event":"1692662400000000000","rtype":35,"publisher_id":2,"instrument_id":7293},"open":"475120000000","high":"483440000000","low":"453340000000","close":"457910000000","volume":"11700447","symbol":"NVDA"}
{"hd":{"ts_event":"1692662400000000000","rtype":35,"publisher_id":2,"instrument_id":7132},"open":"431610000000","high":"438480000000","low":"0","close":"437740000000","volume":"86950","symbol":"NOC"}
{"hd":{"ts_event":"1692748800000000000","rtype":35,"publisher_id":2,"instrument_id":523},"open":"135120000000","high":"137860000000","low":"133220000000","close":"137060000000","volume":"11430081","symbol":"AMZN"}
{"hd":{"ts_event":"1692748800000000000","rtype":35,"publisher_id":2,"instrument_id":7296},"open":"463190000000","high":"518870000000","low":"452080000000","close":"502150000000","volume":"13361964","symbol":"NVDA"}
{"hd":{"ts_event":"1692748800000000000","rtype":35,"publisher_id":2,"instrument_id":7135},"open":"0","high":"440000000000","low":"0","close":"434030000000","volume":"68411","symbol":"NOC"}
{"hd":{"ts_event":"1692748800000000000","rtype":35,"publisher_id":2,"instrument_id":10163},"open":"236590000000","high":"243750000000","low":"226500000000","close":"240810000000","volume":"13759234","symbol":"TSLA"}
{"hd":{"ts_event":"1692835200000000000","rtype":35,"publisher_id":2,"instrument_id":7287},"open":"508190000000","high":"512600000000","low":"466720000000","close":"468700000000","volume":"21611800","symbol":"NVDA"}
{"hd":{"ts_event":"1692835200000000000","rtype":35,"publisher_id":2,"instrument_id":10154},"open":"242000000000","high":"244140000000","low":"228180000000","close":"229980000000","volume":"13724054","symbol":"TSLA"}
{"hd":{"ts_event":"1692835200000000000","rtype":35,"publisher_id":2,"instrument_id":7126},"open":"433650000000","high":"437730000000","low":"0","close":"429880000000","volume":"54570","symbol":"NOC"}
{"hd":{"ts_event":"1692835200000000000","rtype":35,"publisher_id":2,"instrument_id":521},"open":"137250000000","high":"137820000000","low":"131410000000","close":"131880000000","volume":"11990573","symbol":"AMZN"}