Support

Resample OHLCV from 1-minute to 5-minute

Databento follows a convention in the OHLCV schema to only publish a record when a trade occurs in the interval.

This approach is adopted by most data vendors for two key reasons:

  • Multiple interpolation strategies exist and the optimal choice depends on the specific use case. Client-side interpolation keeps the strategy transparent.
  • This reduces storage and bandwidth requirements, especially for markets with many illiquid instruments like options.

This example demonstrates one way to interpolate OHLCV-1m data to ensure exactly one row per minute. The interpolation strategy used in this example does the following:

  • Forward-fills the close price from the last known value.
  • Sets open, high, and low equal to the forward-filled close price.
  • Sets volume to 0 for interpolated periods.

Overview

In this example, we'll use the historical client to request data from the OHLCV-1m schema. First, we'll highlight how to interpolate missing rows in the 1-minute data. Next, we'll show how to resample the data to 5-minute bars.

Note that while this example is showing these methods being used individually, you can also chain these methods together.

Example

import databento as db
import pandas as pd

def interpolate_ohlcv(
    df: pd.DataFrame,
    start: pd.Timestamp,
    end: pd.Timestamp,
    interp_interval: str,
) -> pd.DataFrame:
    """
    Interpolate OHLCV records between `start` and `end`, since Databento only sends
    an OHLCV record if a trade happens in that interval.
    """
    def _interpolate_group(group):
        """Interpolate OHLCV records for each group"""
        # Reindex with a complete index using specified start/end times
        group = group.reindex(
            pd.date_range(
                start=start,
                end=end,
                freq=interp_interval,
                inclusive="left",
            ).rename(group.index.name),
        )
        # Forward fill close prices (may remain NaN if no prior data exists)
        group["close"] = group["close"].ffill()

        # For intervals with no trades, set open/high/low equal to the close and volume to 0
        group = group.fillna({
            **{col: group["close"] for col in ["open", "high", "low"]},
            "volume": 0,
        })
        group["volume"] = group["volume"].astype(int)
        group = group.drop(columns=["rtype", "instrument_id"], errors="ignore")

        return group

    df = (
        df.groupby(["publisher_id", "symbol"])
        .apply(_interpolate_group, include_groups=False)
        .reset_index(["publisher_id", "symbol"])
        .sort_values(["ts_event", "publisher_id", "symbol"])
    )
    return df


def resample_ohlcv(
    df: pd.DataFrame,
    resample_interval: str,
) -> pd.DataFrame:
    """Resample OHLCV bars to the specified interval"""
    resampled_df = (
        df.groupby(["publisher_id", "symbol"])
        .resample(resample_interval)
        .agg({
            "open": "first",
            "high": "max",
            "low": "min",
            "close": "last",
            "volume": "sum",
        })
        .reset_index(["publisher_id", "symbol"])
        .sort_values(["ts_event", "publisher_id", "symbol"])
    )

    return resampled_df


# Set parameters
start = pd.Timestamp("2024-12-17T09:30:00", tz="US/Eastern")
end = pd.Timestamp("2024-12-17T16:00:00", tz="US/Eastern")

# Create a historical client
client = db.Historical("$YOUR_API_KEY")

# Request OHLCV-1m data for all AAPL options and convert to DataFrame
df = client.timeseries.get_range(
    dataset="OPRA.PILLAR",
    schema="ohlcv-1m",
    symbols="AAPL.OPT",
    stype_in="parent",
    start=start,
    end=end,
).to_df(tz=start.tzinfo)

# Interpolate missing rows.
df1 = interpolate_ohlcv(df, start, end, "1min")
print(df1)

# Resample to 5-minute bars
df2 = resample_ohlcv(df, "5min")
print(df2)

Result

                           publisher_id                 symbol   open   high    low  close  volume
ts_event
2024-12-17 09:30:00-05:00            20  AAPL  241220C00115000    NaN    NaN    NaN    NaN       0
2024-12-17 09:30:00-05:00            20  AAPL  241220C00155000    NaN    NaN    NaN    NaN       0
2024-12-17 09:30:00-05:00            20  AAPL  241220C00170000    NaN    NaN    NaN    NaN       0
2024-12-17 09:30:00-05:00            20  AAPL  241220C00195000    NaN    NaN    NaN    NaN       0
2024-12-17 09:30:00-05:00            20  AAPL  241220C00200000    NaN    NaN    NaN    NaN       0
...                                 ...                    ...    ...    ...    ...    ...     ...
2024-12-17 15:59:00-05:00            61  AAPL  261218C00240000  50.70  50.70  50.70  50.70       0
2024-12-17 15:59:00-05:00            61  AAPL  261218P00050000   0.29   0.29   0.29   0.29       0
2024-12-17 15:59:00-05:00            61  AAPL  270115C00240000  50.14  50.14  50.14  50.14       0
2024-12-17 15:59:00-05:00            61  AAPL  270115C00250000  44.34  44.34  44.34  44.34       0
2024-12-17 15:59:00-05:00            61  AAPL  270115C00440000   3.50   3.50   3.50   3.50       0

[3129360 rows x 7 columns]

                           publisher_id                 symbol   open   high    low  close  volume
ts_event
2024-12-17 09:30:00-05:00            20  AAPL  241220C00220000  30.70  30.70  30.70  30.70       1
2024-12-17 09:30:00-05:00            20  AAPL  241220C00225000  25.95  25.95  25.95  25.95       2
2024-12-17 09:30:00-05:00            20  AAPL  241220C00242500   8.20   8.20   8.20   8.20       1
2024-12-17 09:30:00-05:00            20  AAPL  241220C00245000   6.30   6.30   6.17   6.17       4
2024-12-17 09:30:00-05:00            20  AAPL  241220C00247500   3.70   3.85   3.70   3.85       2
...                                 ...                    ...    ...    ...    ...    ...     ...
2024-12-17 15:55:00-05:00            61  AAPL  250124C00255000   5.95   5.95   5.95   5.95       1
2024-12-17 15:55:00-05:00            61  AAPL  250124C00285000   0.30   0.30   0.30   0.30      47
2024-12-17 15:55:00-05:00            61  AAPL  250124P00235000   1.01   1.01   1.01   1.01       1
2024-12-17 15:55:00-05:00            61  AAPL  250321C00225000  33.90  33.90  33.90  33.90       1
2024-12-17 15:55:00-05:00            61  AAPL  250919P00105000   0.33   0.33   0.33   0.33       2

[231713 rows x 7 columns]