Futures: Introduction
Info: This article introduces additional concepts to get started with futures data on Databento. If you're new to Databento, see the Quickstart guide.
Overview
In this example, we'll show how to:
- Find a futures dataset
- Find the 10 futures contracts with the highest volume
- Use instrument definitions to get the tick size, expiration, and matching algorithm of an instrument
- Stream live BBO data
- Use parent symbology to fetch all contract expirations
- Use continuous contract symbology to get the lead month contract
We'll also highlight any special conventions of futures datasets on Databento that differ from those of other asset classes.
Finding a futures dataset
To use futures data on Databento, first identify the dataset that you want from our data catalog and go to its detail page. Here, you can find its dataset ID at the top left of the page.
For this example, we'll use the CME Globex MDP 3.0 dataset, whose dataset ID is GLBX.MDP3. You'll need to pass this in as the dataset parameter of any API or client method.
Finding futures contracts with highest volume
A quick way to find the most actively traded futures contracts, across all expirations, is to fetch the daily volumes from the OHLCV-1d schema. (You can also get a similar result using the statistics schema, which provides the official daily settlement prices and trade volumes.)
import databento as db

# Create historical client
client = db.Historical("$YOUR_API_KEY")

def rank_by_volume(top=10):
    # Request OHLCV-1d data
    data = client.timeseries.get_range(
        dataset="GLBX.MDP3",
        symbols="ALL_SYMBOLS",
        schema="ohlcv-1d",
        start="2023-08-15",
    )

    # Convert to DataFrame and keep the `top` instruments by volume
    df = data.to_df()
    return df.sort_values(by="volume", ascending=False)["instrument_id"].to_list()[:top]

top_instruments = rank_by_volume()
print(top_instruments)
This returns the following list of numeric instrument IDs, from the instrument_id field.
[338574, 3445, 404144, 9235, 2922, 2130, 72156, 399495, 225833, 1562]
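If you extend the request across several sessions, each instrument can appear once per day, so sorting the rows directly may list the same instrument more than once. The following is a minimal sketch of summing volume per instrument before ranking. It works on plain (instrument_id, volume) pairs rather than the DataFrame, and both the helper name rank_by_total_volume and the sample volumes are hypothetical.

```python
from collections import Counter


def rank_by_total_volume(rows, top=10):
    """Sum volume per instrument ID, then return the IDs with the largest totals."""
    totals = Counter()
    for instrument_id, volume in rows:
        totals[instrument_id] += volume
    return [instrument_id for instrument_id, _ in totals.most_common(top)]


# Hypothetical volumes for two sessions; only the instrument IDs come from the output above
sample_rows = [
    (338574, 1_200), (3445, 900), (404144, 400),
    (338574, 1_000), (3445, 1_500), (404144, 300),
]
print(rank_by_total_volume(sample_rows, top=2))
```

With a DataFrame, the same idea would be a groupby on instrument_id followed by a sum of the volume column.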
Using instrument definitions to get tick size, expiration, and matching algorithm
In the case of CME Globex MDP 3.0, these instrument IDs are sourced from tag 48-SecurityID of the original Security Definition messages.
Instrument IDs are necessary for many order routing and post-trade scenarios with the exchange, but can be hard to work with, so we will print their raw symbols instead. Let's extract useful properties of these instruments, like their tick sizes, expiration dates, and matching algorithms.
def get_symbol_properties(instrument_id_list):
    # Request definition data
    data = client.timeseries.get_range(
        dataset="GLBX.MDP3",
        stype_in="instrument_id",
        symbols=instrument_id_list,
        schema="definition",
        start="2023-08-15",
    )

    # Convert to DataFrame
    df = data.to_df()
    return df[["instrument_id", "raw_symbol", "min_price_increment", "match_algorithm", "expiration"]]

print(get_symbol_properties(top_instruments))
                           instrument_id  raw_symbol  min_price_increment  match_algorithm                 expiration
ts_recv
2023-08-15 00:00:00+00:00         225833        ZBU3             0.031250                F  2023-09-20 17:01:00+00:00
2023-08-15 00:00:00+00:00           1562       SR3Z3             0.005000                A  2024-03-19 21:00:00+00:00
2023-08-15 00:00:00+00:00           2922       MESU3             0.250000                F  2023-09-15 13:30:00+00:00
2023-08-15 00:00:00+00:00         399495        TNU3             0.015625                F  2023-09-20 17:01:00+00:00
2023-08-15 00:00:00+00:00         338574        ZNU3             0.015625                F  2023-09-20 17:01:00+00:00
2023-08-15 00:00:00+00:00           3445        ESU3             0.250000                F  2023-09-15 13:30:00+00:00
2023-08-15 00:00:00+00:00          72156        ZTU3             0.003906                K  2023-09-29 17:01:00+00:00
2023-08-15 00:00:00+00:00           9235       MNQU3             0.250000                F  2023-09-15 13:30:00+00:00
2023-08-15 00:00:00+00:00           2130        NQU3             0.250000                F  2023-09-15 13:30:00+00:00
2023-08-15 00:00:00+00:00         404144        ZFU3             0.007812                F  2023-09-29 17:01:00+00:00
Observe that the match_algorithm values are the raw values passed through from the exchange. The full list of supported instrument definition fields and details about the output can be found on the exchange's specifications page.
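Because the codes are passed through raw, it can help to translate them into readable labels when printing. The sketch below covers only the three codes that appear in the table above. The labels are an assumption based on CME's published values for tag 1142-MatchAlgorithm and should be checked against the exchange specification; the helper name describe_match_algorithm is hypothetical.

```python
# Assumed labels for CME tag 1142-MatchAlgorithm; only the codes seen in the
# table above are listed. Verify against the exchange specification.
MATCH_ALGORITHM_LABELS = {
    "F": "FIFO",
    "A": "Allocation",
    "K": "Configurable",
}


def describe_match_algorithm(code):
    """Return a readable label for a raw code, or flag it as unknown."""
    return MATCH_ALGORITHM_LABELS.get(code, f"unknown ({code})")


# Symbol and code pairs taken from the table above
for symbol, code in [("ESU3", "F"), ("SR3Z3", "A"), ("ZTU3", "K")]:
    print(symbol, describe_match_algorithm(code))
```

Unknown codes are reported rather than raising, so the helper stays usable on instruments that use other matching algorithms.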
Streaming live BBO data
While highly liquid futures contracts generally maintain narrow bid-ask spreads, certain market conditions can lead to this spread widening. Stream our BBO-1s schema to monitor the current best bid and best offer, subsampled at 1-second intervals.
This example uses the BBO-1s schema for top-of-book information, but other similar schemas exist that may better suit your use case. Read more about these schemas on our MBP-1 vs. TBBO vs. BBO schemas page.
Info: This example requires a live license to GLBX.MDP3. Visit our live data portal to sign up.
import databento as db

# Enable basic logging
db.enable_logging("INFO")

# Create a live client
live_client = db.Live("$YOUR_API_KEY")

# Subscribe to the BBO-1s schema for the continuous NQ contract
live_client.subscribe(
    dataset="GLBX.MDP3",
    schema="bbo-1s",
    symbols="NQ.v.0",
    stype_in="continuous",
)

# Add a print callback
live_client.add_callback(print)

# Start the live client to begin streaming
live_client.start()

# Run the stream for 15 seconds before closing
live_client.block_for_close(timeout=15)
Info: If you do not see any output, it could be because the markets are closed. See the start parameter in Live.subscribe to utilize intraday replay.
SymbolMappingMsg { hd: RecordHeader { length: 44, rtype: SymbolMapping, publisher_id: 0, instrument_id: 42288528, ts_event: 1738586561618888067 }, stype_in: 255, stype_in_symbol: "NQ.v.0", stype_out: 255, stype_out_symbol: "NQH5", start_ts: 18446744073709551615, end_ts: 18446744073709551615 }
BboMsg { hd: RecordHeader { length: 20, rtype: Bbo1S, publisher_id: GlbxMdp3Glbx, instrument_id: 42288528, ts_event: 1738586561944562749 }, price: 21181.750000000, size: 1, side: 'B', flags: LAST (130), ts_recv: 1738586562000000000, sequence: 20360097, levels: [BidAskPair { bid_px: 21180.250000000, ask_px: 21181.250000000, bid_sz: 1, ask_sz: 1, bid_ct: 1, ask_ct: 1 }] }
BboMsg { hd: RecordHeader { length: 20, rtype: Bbo1S, publisher_id: GlbxMdp3Glbx, instrument_id: 42288528, ts_event: 1738586562826454327 }, price: 21180.750000000, size: 1, side: 'A', flags: LAST (130), ts_recv: 1738586563000000000, sequence: 20360523, levels: [BidAskPair { bid_px: 21180.250000000, ask_px: 21181.250000000, bid_sz: 1, ask_sz: 1, bid_ct: 1, ask_ct: 1 }] }
BboMsg { hd: RecordHeader { length: 20, rtype: Bbo1S, publisher_id: GlbxMdp3Glbx, instrument_id: 42288528, ts_event: 1738586563257839295 }, price: 21182.500000000, size: 1, side: 'B', flags: LAST (130), ts_recv: 1738586564000000000, sequence: 20361382, levels: [BidAskPair { bid_px: 21182.250000000, ask_px: 21183.000000000, bid_sz: 3, ask_sz: 1, bid_ct: 1, ask_ct: 1 }] }
...
BboMsg { hd: RecordHeader { length: 20, rtype: Bbo1S, publisher_id: GlbxMdp3Glbx, instrument_id: 42288528, ts_event: 1738586573077758761 }, price: 21181.750000000, size: 2, side: 'A', flags: LAST (130), ts_recv: 1738586574000000000, sequence: 20367086, levels: [BidAskPair { bid_px: 21182.250000000, ask_px: 21183.000000000, bid_sz: 1, ask_sz: 1, bid_ct: 1, ask_ct: 1 }] }
BboMsg { hd: RecordHeader { length: 20, rtype: Bbo1S, publisher_id: GlbxMdp3Glbx, instrument_id: 42288528, ts_event: 1738586573077758761 }, price: 21181.750000000, size: 2, side: 'A', flags: LAST (130), ts_recv: 1738586575000000000, sequence: 20367373, levels: [BidAskPair { bid_px: 21182.750000000, ask_px: 21183.750000000, bid_sz: 1, ask_sz: 1, bid_ct: 1, ask_ct: 1 }] }
BboMsg { hd: RecordHeader { length: 20, rtype: Bbo1S, publisher_id: GlbxMdp3Glbx, instrument_id: 42288528, ts_event: 1738586575233261001 }, price: 21182.250000000, size: 1, side: 'A', flags: LAST (130), ts_recv: 1738586576000000000, sequence: 20367799, levels: [BidAskPair { bid_px: 21181.250000000, ask_px: 21182.250000000, bid_sz: 2, ask_sz: 1, bid_ct: 2, ask_ct: 1 }] }
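To monitor spread widening, one option is to express the spread in ticks. The following is a minimal sketch, assuming the bid and ask arrive as fixed-point integers in units of 1e-9 (so 21180.25 is 21180250000000) and that a missing side is marked with the maximum signed 64-bit value. The helper name spread_in_ticks is hypothetical, and the tick size of 0.25 is the min_price_increment shown for NQ in the definitions above.

```python
FIXED_PRICE_SCALE = 1_000_000_000  # fixed-point prices are in units of 1e-9
UNDEF_PRICE = 2**63 - 1            # assumed sentinel for a missing price


def spread_in_ticks(bid_px, ask_px, tick_size):
    """Return the bid-ask spread in ticks, or None if either side is missing."""
    if UNDEF_PRICE in (bid_px, ask_px):
        return None
    tick_fixed = round(tick_size * FIXED_PRICE_SCALE)
    return (ask_px - bid_px) / tick_fixed


# Bid 21180.25 and ask 21181.25, as in the first BboMsg above
bid = 21_180_250_000_000
ask = 21_181_250_000_000
print(spread_in_ticks(bid, ask, tick_size=0.25))  # 4.0
```

A negative result indicates an inverted spread. If you are working with prices that have already been converted to floats, divide the price difference by the tick size directly.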
Parent symbology
On futures trading venues, it can be tedious to fetch every child instrument and contract expiration (like ESU3, ESZ3, ESU3-ESZ3) for a given parent instrument (like ES). You can use our parent symbology type to do this, by passing in stype_in="parent".
def get_child_instruments(parents=["ZB.FUT", "SR3.FUT"]):
    # Request definition data for parent symbols
    data = client.timeseries.get_range(
        dataset="GLBX.MDP3",
        symbols=parents,
        stype_in="parent",
        schema="definition",
        start="2023-08-15",
    )

    # Convert to DataFrame
    df = data.to_df()
    return df[["instrument_id", "raw_symbol"]]

print(get_child_instruments().head())
                           instrument_id       raw_symbol
ts_recv
2023-08-15 00:00:00+00:00          34810    SR3:AB 01Y M8
2023-08-15 00:00:00+00:00          45040      SR3M6-SR3Z7
2023-08-15 00:00:00+00:00          51186      SR3Z4-SR3M6
2023-08-15 00:00:00+00:00          22508  SR3:DF H7Z7U8M9
2023-08-15 00:00:00+00:00         350151  SR3:SB PK M4-M5
Alternatively, you can replicate this logic by requesting definition data for all symbols and filtering on the asset field. The asset field is populated with the root of the parent symbol.
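As a rough illustration of the asset filter, the sketch below works on plain dictionaries rather than a DataFrame. The helper name children_of and the sample rows are hypothetical: the raw symbols come from the outputs above, and the asset values assume the field holds the parent root as described. Note that this filter ignores the suffix of the parent symbol (such as .FUT), so it may match more instruments than a parent symbology request would.

```python
def children_of(definitions, parent):
    """Return raw symbols whose asset matches the root of a parent symbol."""
    root = parent.split(".", 1)[0]  # "SR3.FUT" -> "SR3"
    return [d["raw_symbol"] for d in definitions if d["asset"] == root]


# Hypothetical definition rows, for illustration only
definitions = [
    {"raw_symbol": "ZBU3", "asset": "ZB"},
    {"raw_symbol": "SR3Z3", "asset": "SR3"},
    {"raw_symbol": "SR3M6-SR3Z7", "asset": "SR3"},
    {"raw_symbol": "ESU3", "asset": "ES"},
]
print(children_of(definitions, "SR3.FUT"))
```

With a DataFrame of definitions, the equivalent would be a boolean mask comparing the asset column to the root.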
Databento's web portal only exposes parent products, and not child instruments. When you set up a batch download through the web portal, note that you're making a parent symbology request and you'll receive interleaved data from multiple instruments. If you want to set up a batch download of individual child instruments, you must use our API instead.
Continuous contract symbology
Likewise, it's tedious to track the lead month contract of a futures product over a long period of time, due to rollovers. You can use our continuous contract symbology type to get a single symbol that is pegged to the lead month contract, by passing in stype_in="continuous".
For example, let's plot the two lead month ES contracts, ES.n.0 and ES.n.1.
import databento as db
import matplotlib.pyplot as plt

client = db.Historical("$YOUR_API_KEY")

dataset = "GLBX.MDP3"
symbols = ["ES.n.0", "ES.n.1"]
start = "2024"

# Request daily bars for the two lead month continuous contracts
data = client.timeseries.get_range(
    dataset=dataset,
    schema="ohlcv-1d",
    stype_in="continuous",
    symbols=symbols,
    start=start,
)

df = data.to_df()

# Plot one closing price line per continuous symbol
df.groupby("symbol")["close"].plot(
    xlabel="Date",
    ylabel="Price",
)
plt.legend()
plt.show()
Tip: If you'd like to see the two calendar front month contracts instead, you may use ES.c.0 and ES.c.1. However, in most cases these will resolve to the same symbols as ES.n.0 and ES.n.1 respectively, because open interest tends to decay with increasing time to expiration.
Many futures products reflect seasonality in commodities or term structure in fixed income, so the nearest calendar month may not be the lead month. In such cases, you need to specify how you want to resolve the lead month, also known as a roll rule. For example, you could use ES.v.0 to resolve the lead month by volume instead of open interest.
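The continuous symbols used in this section share a three-part shape: a root, a one-letter roll rule, and a rank. The following is a minimal sketch of splitting such a symbol into its parts, covering only the three roll rules mentioned here (c for calendar, n for open interest, v for volume). The helper name parse_continuous_symbol is hypothetical.

```python
# Roll rules as described in this section
ROLL_RULES = {"c": "calendar", "n": "open interest", "v": "volume"}


def parse_continuous_symbol(symbol):
    """Split a symbol like "ES.n.0" into root, roll rule, and rank."""
    parts = symbol.split(".")
    if len(parts) != 3 or parts[1] not in ROLL_RULES:
        raise ValueError(f"not a recognized continuous symbol: {symbol!r}")
    root, rule, rank = parts
    return {"root": root, "roll_rule": ROLL_RULES[rule], "rank": int(rank)}


print(parse_continuous_symbol("ES.v.0"))
```

A rank of 0 refers to the lead contract under the chosen rule, and 1 to the next one.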
Special conventions for futures on Databento
- Weekly trading session. Unlike many venues, CME Globex has a weekly trading session. This affects how you process MBO data. We provide a synthetic snapshot of the book at the start of each UTC day to make it easier to start from any day of the week.
- User-defined instruments and spreads. CME Globex has a large number of user-defined instruments. While many vendors do not expose these and their raw symbols may be foreign to a user who's seeing these for the first time, Databento includes all of them as many are highly liquid and active.
- Asynchronous trade publication. CME Globex prints fills and order deletions associated with the fills asynchronously, with the fills published before the deletions. This is unlike most venues, which treat the individual fill and corresponding order deletion as a single atomic event. You may choose to preemptively update your book based on trades or fills, or wait until the corresponding deletes, which we represent with action C.
- Implied book. CME Globex has implied matching. If a trade is partially filled by contra liquidity on the implied book, we show the full quantity of the trade but only the fill quantities on the direct book. A full implied trade will have trade side N.
- Inverted spreads. CME Globex's matching engine has various price limits and circuit breakers. The spread may be inverted during a trading halt.
- No rollover back-adjustments. Our continuous contract symbology is a notation that maps to an actual, tradable instrument on any given date. The continuous contract prices provided are the original, unadjusted prices. We don't create a synthetic time series by back-adjusting the prices to remove jumps during rollovers.
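Since continuous prices are unadjusted, you may want to locate the rollovers yourself before comparing prices across them. The following is a minimal sketch, assuming each record for a continuous symbol carries the instrument_id of the underlying contract it mapped to on that date, so that a change in ID marks a roll. The helper name find_rollovers, the dates, and the instrument IDs are all hypothetical.

```python
def find_rollovers(rows):
    """Return (date, old_id, new_id) for each row where the underlying instrument changes."""
    rollovers = []
    previous_id = None
    for date, instrument_id in rows:
        if previous_id is not None and instrument_id != previous_id:
            rollovers.append((date, previous_id, instrument_id))
        previous_id = instrument_id
    return rollovers


# Hypothetical (date, instrument_id) pairs for one continuous symbol
rows = [
    ("2024-03-13", 1001),
    ("2024-03-14", 1001),
    ("2024-03-15", 1002),
    ("2024-03-18", 1002),
]
print(find_rollovers(rows))
```

Any price jump on a reported date then reflects the switch between contracts as well as ordinary market movement.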