Matching engine latencies

Overview

In this example, we will use the Historical client to process instrument definition and MBO data to calculate matching engine and feed latency. Matching engine latency is the time between a market event and when the message was sent by the exchange. Feed latency is the time between that message being sent by the exchange and when Databento's servers received it.

See also

To calculate latency with the Live client instead, see this example.

Definition

We will use the definition schema in this example to assign an asset to each instrument, allowing us to compare these measurements across assets.

MBO

We will use the MBO schema in this example to measure matching engine and feed latency, but any schema that contains the ts_recv and ts_in_delta timestamps can be used for this purpose.

Timestamps

Databento provides several nanosecond-resolution timestamps for every record. For these calculations, we will use ts_event, ts_recv, and ts_in_delta. You can review our timestamping guide to learn more about where each of these is captured, but they are briefly described as:

Timestamp Description
ts_event The matching-engine-received timestamp expressed as the number of nanoseconds since the UNIX epoch.
ts_recv The capture-server-received timestamp expressed as the number of nanoseconds since the UNIX epoch.
ts_in_delta The matching-engine-sending timestamp expressed as the number of nanoseconds before ts_recv.
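To make these definitions concrete, here is a minimal sketch with made-up timestamp values showing how the two latencies fall out of the three fields. Note the arithmetic is the same as what the DataFrame calculation uses: ts_recv - ts_in_delta - ts_event.

```python
# Made-up example values (nanoseconds since the UNIX epoch; not real data)
ts_event = 1_706_281_200_000_000_000  # matching-engine-received timestamp
ts_recv = ts_event + 150_000          # capture-server-received, 150 us after the event
ts_in_delta = 60_000                  # sent by the exchange 60 us before ts_recv

# Matching engine latency: market event -> message sent by the exchange
ts_sent = ts_recv - ts_in_delta
latency_matching_us = (ts_sent - ts_event) / 1e3   # 90.0 us

# Feed latency: message sent by the exchange -> received by Databento
latency_send_to_recv_us = ts_in_delta / 1e3        # 60.0 us
```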

Example

We will use a few parent symbols for this analysis: ES.FUT, SR3.FUT, 6E.FUT, BTC.FUT, ZF.FUT, GC.FUT, and CL.FUT. These assets belong to different channels in CME Globex's matching engine, allowing us to compare matching engine latency between channels. You can read more about Databento's parent symbology here.

import databento as db
import matplotlib.pyplot as plt
import numpy as np

# Set parameters
dataset = "GLBX.MDP3"
products = [
    "ES.FUT",
    "SR3.FUT",
    "6E.FUT",
    "BTC.FUT",
    "ZF.FUT",
    "GC.FUT",
    "CL.FUT",
]

# Create a historical client
client = db.Historical(key="$YOUR_API_KEY")

# Request definition data
definitions = client.timeseries.get_range(
    dataset=dataset,
    schema="definition",
    symbols=products,
    stype_in="parent",
    start="2024-01-21T00:00:00-6",
    end="2024-01-22T17:00:00-6",
)

# Request MBO data to calculate latency
mbo = client.timeseries.get_range(
    dataset=dataset,
    schema="mbo",
    symbols=products,
    stype_in="parent",
    start="2024-01-26T05:00:00-6",
    end="2024-01-26T17:00:00-6",  # 12 hours of mbo
)

# Create a DataFrame for the mbo data
mbo_df = mbo.to_df(pretty_ts=False, map_symbols=False)

# Filter out any records with a bad ts_recv timestamp
mbo_df = mbo_df[(mbo_df["flags"] & db.RecordFlags.F_BAD_TS_RECV) == 0]

# Check for bad timestamps
if (mbo_df.index == db.UNDEF_TIMESTAMP).any():
    raise ValueError("Data contains one or more undefined ts_recv timestamps")
if (mbo_df["ts_in_delta"] == db.UNDEF_TIMESTAMP).any():
    raise ValueError("Data contains one or more undefined ts_in_delta timestamps")

# Calculate latency
mbo_df["latency_matching_us"] = (mbo_df.index - mbo_df["ts_in_delta"] - mbo_df["ts_event"]) / 1e3
mbo_df["latency_send_to_recv_us"] = mbo_df["ts_in_delta"] / 1e3
mbo_df = mbo_df[["instrument_id", "latency_matching_us", "latency_send_to_recv_us"]]

# Get the asset and instrument class for every symbol
def_df = definitions.to_df(pretty_ts=False, map_symbols=False)
def_df = def_df[["instrument_id", "asset", "instrument_class"]].set_index("instrument_id")

# Join the two DataFrames
latency_df = mbo_df.join(def_df, on="instrument_id")

# Plot send to receive latency across all symbols
data = latency_df["latency_send_to_recv_us"]
sorted_data = np.sort(data)
cum_probability = np.linspace(0, 1, num=len(sorted_data))

plt.plot(sorted_data, cum_probability)
plt.title("Send to Receive Latency")
plt.grid(axis="x")
plt.xscale("log")
plt.ylabel("Cumulative Probability")
plt.xlabel("Latency (microseconds)")
plt.show()

# Plot matching engine latency across all symbols
data = latency_df["latency_matching_us"]
sorted_data = np.sort(data)
cum_probability = np.linspace(0, 1, num=len(sorted_data))

plt.plot(sorted_data, cum_probability)
plt.title("Matching Engine Latency")
plt.grid(axis="x")
plt.xscale("log")
plt.ylabel("Cumulative Probability")
plt.xlabel("Latency (microseconds)")
plt.show()

# Plot matching engine latency for each asset
for asset, group in latency_df.groupby("asset")["latency_matching_us"]:
    data = np.percentile(group, range(0, 101))
    plt.plot(
        data.take([50, 90, 95, 99]),
        [asset] * 4,
        marker="|",
        linestyle="",
        markersize=20,
    )
    plt.scatter(
        data,
        [asset] * len(data),
        marker=".",
        alpha=[1 - abs(i - 50) / 50 for i, _ in enumerate(data)],
    )

plt.title("Matching Engine Latency")
plt.grid(axis="x")
plt.xscale("log")
plt.ylabel("Channel")
plt.xlabel("Latency (microseconds)")
plt.show()
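The per-asset percentiles highlighted in the final plot can also be tabulated directly with pandas. Here is a sketch on synthetic data; the values below are randomly generated stand-ins, since real latencies require the API requests above:

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for latency_df (made-up lognormal latencies, not real data)
rng = np.random.default_rng(0)
latency_df = pd.DataFrame(
    {
        "asset": ["ES"] * 1000 + ["CL"] * 1000,
        "latency_matching_us": np.concatenate(
            [
                rng.lognormal(mean=3.0, sigma=0.5, size=1000),  # median ~20 us
                rng.lognormal(mean=3.5, sigma=0.5, size=1000),  # median ~33 us
            ]
        ),
    }
)

# Tabulate the same percentiles the final plot highlights
summary = (
    latency_df.groupby("asset")["latency_matching_us"]
    .quantile([0.50, 0.90, 0.95, 0.99])
    .unstack()
)
print(summary.round(1))
```

This yields one row per asset and one column per percentile, which can be a convenient complement to the log-scale plots when comparing channels numerically.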

Results

Feed latency

[Figure: cumulative distribution of send-to-receive latency across all symbols]

Matching engine latency

[Figure: cumulative distribution of matching engine latency across all symbols]

Matching engine channel latency

[Figure: per-asset matching engine latency percentiles (p50, p90, p95, p99)]