For calculating latency from the Live client, see this example.
Matching engine latencies
Overview
In this example, we will use the Historical client to process instrument definition and MBO data to calculate matching engine and feed latency. The matching engine latency is the time between a market event and when the exchange sent the corresponding message. The feed latency is the time between the exchange sending that message and Databento's servers receiving it.
See also
Definition
We will use the definition schema in this example to map each instrument to its asset, allowing us to examine these measurements for different assets.
MBO
We will use the MBO schema in this example to measure the matching engine and feed latency, but any schema that contains the ts_recv and ts_in_delta timestamps can be used for this purpose.
Timestamps
Databento provides several nanosecond-resolution timestamps for every record. For these calculations, we will use: ts_event, ts_recv, and ts_in_delta. You can review our timestamping guide to learn more about where each of these is captured, but they are briefly described as:
| Timestamp | Description |
|---|---|
| ts_event | The matching-engine-received timestamp expressed as the number of nanoseconds since the UNIX epoch. |
| ts_recv | The capture-server-received timestamp expressed as the number of nanoseconds since the UNIX epoch. |
| ts_in_delta | The matching-engine-sending timestamp expressed as the number of nanoseconds before ts_recv. |
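Given these three timestamps, both latencies fall out by simple arithmetic. Here is a minimal sketch with made-up timestamp values:

```python
# Synthetic timestamps for a single record (values are made up for illustration)
ts_event = 1_706_281_200_000_000_000  # matching-engine-received, ns since UNIX epoch
ts_recv = 1_706_281_200_000_150_000   # capture-server-received, ns since UNIX epoch
ts_in_delta = 40_000                  # ns between the exchange sending and ts_recv

# The exchange sent the message ts_in_delta nanoseconds before it was captured
ts_exchange_send = ts_recv - ts_in_delta

# Matching engine latency: market event to exchange send
latency_matching_ns = ts_exchange_send - ts_event  # -> 110_000 ns

# Feed latency: exchange send to capture at Databento's servers
latency_feed_ns = ts_in_delta  # -> 40_000 ns
```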
Example
We will use a few parent symbols for this analysis: ES.FUT, SR3.FUT, 6E.FUT, BTC.FUT, ZF.FUT, GC.FUT, and CL.FUT. These assets belong to different channels in CME Globex's matching engine, which allows us to compare the matching engine latency between channels. You can read more about Databento's parent symbology here.
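The example below drops any record whose ts_recv is flagged as bad before computing latencies. The bitmask test behind that filter can be sketched with plain integers; the specific flag values here (8 for a bad ts_recv, 128 for last-in-event) are assumptions for illustration, and real code should use db.RecordFlags as the example does:

```python
import pandas as pd

# Illustrative flag bit values (assumptions; use db.RecordFlags in real code)
F_BAD_TS_RECV = 8  # set when ts_recv may be inaccurate
F_LAST = 128       # set on the last record in an event

# Synthetic flags column: two clean records, two with a bad ts_recv
flags = pd.Series([0, F_BAD_TS_RECV, F_LAST | F_BAD_TS_RECV, 64])

# Keep only rows where the bad-timestamp bit is not set
good = flags[(flags & F_BAD_TS_RECV) == 0]  # -> values [0, 64]
```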
import databento as db
import matplotlib.pyplot as plt
import numpy as np
# Set parameters
dataset = "GLBX.MDP3"
products = [
    "ES.FUT",
    "SR3.FUT",
    "6E.FUT",
    "BTC.FUT",
    "ZF.FUT",
    "GC.FUT",
    "CL.FUT",
]
# Create a historical client
client = db.Historical(key="$YOUR_API_KEY")
# Request definition data
definitions = client.timeseries.get_range(
    dataset=dataset,
    schema="definition",
    symbols=products,
    stype_in="parent",
    start="2024-01-21T00:00:00-6",
    end="2024-01-22T17:00:00-6",
)
# Request MBO data to calculate latency
mbo = client.timeseries.get_range(
    dataset=dataset,
    schema="mbo",
    symbols=products,
    stype_in="parent",
    start="2024-01-26T05:00:00-6",
    end="2024-01-26T17:00:00-6",  # 12 hours of MBO data
)
# Create a DataFrame for the mbo data
mbo_df = mbo.to_df(pretty_ts=False, map_symbols=False)
# Filter out any records with a bad ts_recv timestamp
mbo_df = mbo_df[(mbo_df["flags"] & db.RecordFlags.F_BAD_TS_RECV) == 0]
# Check for bad timestamps
if (mbo_df.index == db.UNDEF_TIMESTAMP).any():
    raise ValueError("Data contains one or more undefined ts_recv timestamps")
if (mbo_df["ts_in_delta"] == db.UNDEF_TIMESTAMP).any():
    raise ValueError("Data contains one or more undefined ts_in_delta timestamps")
# Calculate latency
mbo_df["latency_matching_us"] = (mbo_df.index - mbo_df["ts_in_delta"] - mbo_df["ts_event"]) / 1e3
mbo_df["latency_send_to_recv_us"] = mbo_df["ts_in_delta"] / 1e3
mbo_df = mbo_df[["instrument_id", "latency_matching_us", "latency_send_to_recv_us"]]
# Get the asset and instrument class for every symbol
def_df = definitions.to_df(pretty_ts=False, map_symbols=False)
def_df = def_df[["instrument_id", "asset", "instrument_class"]].set_index("instrument_id")
# Join the two DataFrames
latency_df = mbo_df.join(def_df, on="instrument_id")
# Plot send to receive latency across all symbols
data = latency_df["latency_send_to_recv_us"]
sorted_data = np.sort(data)
cum_probability = np.linspace(0, 1, num=len(sorted_data))
plt.plot(sorted_data, cum_probability)
plt.title("Send to Receive Latency")
plt.grid(axis="x")
plt.xscale("log")
plt.ylabel("Cumulative Probability")
plt.xlabel("Latency (microseconds)")
plt.show()
# Plot matching engine latency across all symbols
data = latency_df["latency_matching_us"]
sorted_data = np.sort(data)
cum_probability = np.linspace(0, 1, num=len(sorted_data))
plt.plot(sorted_data, cum_probability)
plt.title("Matching Engine Latency")
plt.grid(axis="x")
plt.xscale("log")
plt.ylabel("Cumulative Probability")
plt.xlabel("Latency (microseconds)")
plt.show()
# Plot matching engine latency for each asset
for asset, group in latency_df.groupby("asset")["latency_matching_us"]:
    data = np.percentile(group, range(0, 101))
    plt.plot(
        data.take([50, 90, 95, 99]),
        [asset] * 4,
        marker="|",
        linestyle="",
        markersize=20,
    )
    plt.scatter(
        data,
        [asset] * len(data),
        marker=".",
        alpha=[1 - abs(i - 50) / 50 for i, _ in enumerate(data)],
    )
plt.title("Matching Engine Latency")
plt.grid(axis="x")
plt.xscale("log")
plt.ylabel("Channel")
plt.xlabel("Latency (microseconds)")
plt.show()
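The per-asset percentile computation in the final plot can be exercised offline without a data request. This sketch uses synthetic lognormal latencies for two made-up asset labels to show the same groupby-then-percentile pattern:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Synthetic per-record latencies for two made-up assets (values are illustrative)
toy = pd.DataFrame(
    {
        "asset": ["AAA"] * 1000 + ["BBB"] * 1000,
        "latency_matching_us": np.concatenate(
            [rng.lognormal(3.0, 0.5, 1000), rng.lognormal(3.5, 0.5, 1000)]
        ),
    }
)

# Same grouping as the plot loop: the 50th/90th/95th/99th percentile per asset
percentiles = {
    asset: np.percentile(group, [50, 90, 95, 99])
    for asset, group in toy.groupby("asset")["latency_matching_us"]
}
```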