A feature is any kind of basic independent variable that we think has some predictive value. This follows machine learning nomenclature; others may refer to this as a predictor or regressor in a statistical or econometric setting.
A trading rule is a hardcoded trading decision. An example of a trading rule is "If there's only 1 order left at the best offer, lift the offer; if there is only 1 order left at the best bid, hit the bid." A trading rule often takes the form of a decision made when a feature value crosses a certain threshold.
A strategy that is based on trading rules is a rule-based strategy.
A liquidity-taking strategy takes liquidity by crossing the spread with aggressive or marketable orders.
A high-frequency strategy is characterized by a large number of trades.

Book skew and trading rule

The simplest type of book feature is called the book skew, which is the imbalance between resting bid depth (v_b) and resting ask depth (v_a) at the top of the book. We could formulate this as a simple difference between v_b and v_a, but depth can vary by orders of magnitude, so it's more convenient to take the difference of their logarithms instead. The implementation in this example uses base-10 logarithms, so a skew of 1 means the bid depth is 10 times the ask depth.

skew = \log(v_{b}) - \log(v_{a}) = \log \frac{v_{b}}{v_{a}}

Notice that we picked this ordering simply because it's useful to formulate features such that positive values imply that we expect prices to go up; this makes it easier to debug your strategy. Intuitively, we expect higher bid depth to imply higher buy demand and hence higher prices.
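
As a quick illustration, here is a minimal sketch of the feature in Python. The function name `book_skew` and the sample sizes are assumptions made for illustration; base-10 logarithms are used to match the implementation later in this example.

```python
import math


def book_skew(bid_size: int, ask_size: int) -> float:
    """Log difference between top-of-book bid and ask depth (base 10)."""
    if bid_size <= 0 or ask_size <= 0:
        # The logarithm is undefined when one side of the book is empty
        raise ValueError("both sides of the book must have resting depth")
    return math.log10(bid_size) - math.log10(ask_size)


# Hypothetical top-of-book sizes
print(book_skew(100, 10))   # bid-heavy book, positive skew
print(book_skew(10, 100))   # ask-heavy book, negative skew
print(book_skew(50, 50))    # balanced book, zero skew
```

Swapping the two sizes only flips the sign, which is what makes a symmetric threshold on this feature natural.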

We can introduce a trading rule that buys when this feature exceeds some alpha threshold k, and sells when it goes below some threshold -k.

Minimum order quantity

While there's some practical advantage to trading larger clips for a liquidity-taking strategy, we'll start with a constant trade size equal to the minimum order quantity so that we minimize slippage and market impact considerations.

The minimum order quantity depends on the trading platform or market that you're on. For this toy strategy, we'll use E-mini S&P 500 (ES) futures as an example and trade in clips of 1 contract.

skew > k \to Buy 1 lot

skew < - k \to Sell 1 lot

Commissions

This strategy will be very sensitive to commissions because of the large number of trades it makes, so we'll include commissions in the estimated PnL. These commissions can be found here.
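
To get a feel for the scale involved, here is a rough back-of-the-envelope sketch. It assumes the per-side fees used in the configuration later in this example (0.39 venue plus 0.05 clearing, in dollars per contract) and the $50 point value of ES. The variable names and the count of 10,000 sides are hypothetical, and the cost of crossing the spread is not included.

```python
from decimal import Decimal

POINT_VALUE = Decimal("50")                         # dollars per ES index point
FEES_PER_SIDE = Decimal("0.39") + Decimal("0.05")   # venue + clearing, assumed values

round_trip_fees = 2 * FEES_PER_SIDE                 # one buy and one sell
breakeven_points = round_trip_fees / POINT_VALUE    # price move needed to cover fees


def total_fees(sides_traded: int) -> Decimal:
    """Cumulative fees after trading the given number of contract sides."""
    return FEES_PER_SIDE * sides_traded


print(f"fees per round trip: ${round_trip_fees}")
print(f"break-even move: {breakeven_points} index points")
print(f"fees for 10,000 sides: ${total_fees(10_000)}")
```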

Position and risk limits

This strategy is convenient because you don't have to worry about complex order and position state management. You just let it build up whatever maximum position you want. Because we expect buys and sells to be symmetrically distributed in the long run, you will eventually get out of position. However, you might not have enough margin to build up arbitrarily large positions, so for proof of concept, we'll specify a maximum position of 10 contracts.

skew > k and pos < 10 lots \to Buy 1 lot

skew < - k and pos > - 10 lots \to Sell 1 lot
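
The two rules above can be sketched as a single decision function. This is only an illustration: the name `decide`, the default threshold of 1.7 (taken from the configuration below), and the sequence of skew values are assumptions.

```python
def decide(skew: float, position: int, k: float = 1.7, max_pos: int = 10) -> int:
    """Return +1 to buy 1 lot, -1 to sell 1 lot, or 0 to do nothing."""
    if skew > k and position < max_pos:
        return 1
    if skew < -k and position > -max_pos:
        return -1
    return 0


# Walk a hypothetical sequence of skew values and track the position
position = 0
for skew in [2.0, 1.9, 0.3, -1.8, -2.5]:
    position += decide(skew, position)
    print(f"skew={skew:+.1f} position={position}")
```

Returning a signed quantity keeps the position update to a single addition and makes the limit checks easy to test in isolation.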

Implementation

These are the parameters we have so far:

Info
This example uses the instrument with raw symbol ESU3. You'll need to substitute this with your desired symbol.

Python

import math
import json
import sys
from dataclasses import dataclass, field
from decimal import Decimal

import pandas as pd
import databento as db

API_KEY = '$YOUR_API_KEY'


class Config:

    # Alpha threshold to buy/sell, k (in base-10 log units)
    ALPHA_THRESHOLD: float = 1.7

    # Symbol
    DATASET = 'GLBX.MDP3'
    SYMBOL = 'ESU3'
    POINT_VALUE = 50    # $50 per index point

    # Fees
    VENUE_FEES_PER_SIDE: Decimal = Decimal('0.39')
    CLEARING_FEES_PER_SIDE: Decimal = Decimal('0.05')
    FEES_PER_SIDE: Decimal = VENUE_FEES_PER_SIDE + CLEARING_FEES_PER_SIDE

    # Position limit
    POSITION_MAX: int = 10

Since we're only simulating liquidity-taking at minimum size, we only need top-of-book prices and sizes, so our MBP-1 schema is sufficient. You can learn more about our MBP-1 schema here.

To keep this example simple, we'll assume zero round-trip latency for any orders placed. This enables us to implement a simple, online calculation of PnL for our real-time trading simulation.

Python

@dataclass
class Strategy:

    # Dataset
    dataset: str

    # Instrument to trade
    symbol: str

    # Current position, in contract units
    position: int = 0
    # Number of long contract sides traded
    buy_qty: int = 0
    # Number of short contract sides traded
    sell_qty: int = 0

    # Total realized buy price
    real_total_buy_px: Decimal = Decimal('0')
    # Total realized sell price
    real_total_sell_px: Decimal = Decimal('0')

    # Total buy price to liquidate current position
    theo_total_buy_px: Decimal = Decimal('0')
    # Total sell price to liquidate current position
    theo_total_sell_px: Decimal = Decimal('0')

    # Total fees paid
    fees: Decimal = Decimal('0')

    # List to track results
    results: list = field(default_factory=list)

    def run(self) -> None:
        client = db.Live(key=API_KEY)
        client.subscribe(
            dataset=self.dataset,
            schema='mbp-1',
            stype_in='raw_symbol',
            symbols=[self.symbol],
            # start="2023-08-08T12:00",    # Burn-in start time
        )
        for record in client:
            if not isinstance(record, db.MBP1Msg):
                continue
            self.update(record)

    def update(self, record: db.MBP1Msg) -> None:
        ask_size = record.levels[0].ask_sz
        bid_size = record.levels[0].bid_sz
        ask_price = record.levels[0].ask_px / Decimal("1e9")
        bid_price = record.levels[0].bid_px / Decimal("1e9")

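        # Skip updates where one side of the book is empty; log10(0) is undefined
        if bid_size == 0 or ask_size == 0:
            return

        # Book skew: difference of base-10 logs of bid and ask depth
        skew = math.log10(bid_size) - math.log10(ask_size)
        mid_price = (ask_price + bid_price) / 2

        # Buy/sell when the skew signal is large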
        if skew > Config.ALPHA_THRESHOLD and self.position < Config.POSITION_MAX:
            self.position += 1
            self.buy_qty += 1
            self.real_total_buy_px += ask_price
            self.fees += Config.FEES_PER_SIDE
        elif skew < -Config.ALPHA_THRESHOLD and self.position > -Config.POSITION_MAX:
            self.position -= 1
            self.sell_qty += 1
            self.real_total_sell_px += bid_price
            self.fees += Config.FEES_PER_SIDE

        # Update prices
        # Fill prices are based on BBO with assumed zero latency
        # In practice, should be worse because of alpha decay
        if self.position == 0:
            self.theo_total_buy_px = 0
            self.theo_total_sell_px = 0
        elif self.position > 0:
            self.theo_total_sell_px = bid_price * abs(self.position)
        elif self.position < 0:
            self.theo_total_buy_px = ask_price * abs(self.position)

        # Compute PnL
        theo_pnl = (
            Config.POINT_VALUE
            * (
                self.real_total_sell_px
                + self.theo_total_sell_px
                - self.real_total_buy_px
                - self.theo_total_buy_px
            )
            - self.fees
        )

        result = {
            'ts_strategy': str(pd.Timestamp(record.ts_recv, tz='UTC')),
            'bid': f'{bid_price:0.2f}',
            'ask': f'{ask_price:0.2f}',
            'skew' : f'{skew:0.3f}',
            'position': self.position,
            'trade_ct': self.buy_qty + self.sell_qty,
            'fees': f'{self.fees:0.2f}',
            'pnl': f'{theo_pnl:0.2f}',
        }

        print(json.dumps(result, indent=4))

        self.results.append(result)


if __name__ == "__main__":
    strategy = Strategy(dataset=Config.DATASET, symbol=Config.SYMBOL)
    while True:
        try:
            strategy.run()
        except KeyboardInterrupt:
            df = pd.DataFrame(strategy.results)
            df.to_csv('strategy_log.csv', index=False)
            sys.exit()
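
The PnL bookkeeping in `update` can be checked by hand. The following standalone sketch restates the same formula as a function and marks an open long position to the current bid. The function name `theo_pnl` and the sample prices are hypothetical; the fee of 0.44 per side is the sum of the two fees in `Config`.

```python
from decimal import Decimal

POINT_VALUE = 50
FEES_PER_SIDE = Decimal("0.44")


def theo_pnl(real_buy_px, real_sell_px, position, bid, ask, sides_traded):
    """Realized cash flows plus the value of liquidating the open position
    at the opposite side of the BBO, net of fees."""
    theo_sell_px = bid * position if position > 0 else Decimal(0)
    theo_buy_px = ask * -position if position < 0 else Decimal(0)
    fees = FEES_PER_SIDE * sides_traded
    return POINT_VALUE * (
        real_sell_px + theo_sell_px - real_buy_px - theo_buy_px
    ) - fees


# Hypothetical history: bought at 4500.25 and 4500.50, sold 1 lot at 4501.00,
# now long 1 lot with the market at 4501.25 bid / 4501.50 ask.
pnl = theo_pnl(
    real_buy_px=Decimal("4500.25") + Decimal("4500.50"),
    real_sell_px=Decimal("4501.00"),
    position=1,
    bid=Decimal("4501.25"),
    ask=Decimal("4501.50"),
    sides_traded=3,
)
print(pnl)  # 73.68
```

Here the price terms net to 1.50 index points, or $75.00, and three sides of fees cost $1.32.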

Results

Further improvements

This implementation uses our simple synchronous client. For production applications, we recommend using our asynchronous client or callback model.

One problem with the book skew is that extreme values may be influenced by spoofing. One possibility is to modify the trading rule and introduce an upper limit L on the magnitude of the skew, as follows:

skew > k and abs(skew) < L and pos < 10 lots \to Buy 1 lot

skew < - k and abs(skew) < L and pos > - 10 lots \to Sell 1 lot
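
A minimal sketch of the capped rule follows. The function name `decide_capped` and the upper limit of 3.0 are assumptions chosen for illustration, not recommended values.

```python
def decide_capped(skew: float, position: int, k: float = 1.7,
                  upper: float = 3.0, max_pos: int = 10) -> int:
    """Return +1 (buy 1 lot), -1 (sell 1 lot), or 0 (no trade)."""
    if abs(skew) >= upper:
        return 0  # skew is suspiciously extreme, so stand aside
    if skew > k and position < max_pos:
        return 1
    if skew < -k and position > -max_pos:
        return -1
    return 0


# Hypothetical skew values, all evaluated from a flat position
for skew in [1.0, 2.0, 3.5, -2.0, -3.5]:
    print(f"skew={skew:+.1f} -> {decide_capped(skew, position=0)}")
```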

Finally, recall that we assumed zero delay in order placement and fill. When extending this example, it's important to incorporate a realistic delay, since fills in practice will tend to be worse because of alpha decay.
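
One simple way to approximate a delay in simulation is to queue each order and fill it at the first quote seen after the delay has elapsed. The sketch below is an assumption-laden illustration: the class name `DelayedFiller`, the 500 microsecond latency, and the quotes are hypothetical, and it assumes every marketable order fills in full at the prevailing best bid or offer.

```python
from collections import deque
from dataclasses import dataclass, field

LATENCY_NS = 500_000  # hypothetical round-trip delay of 500 microseconds


@dataclass
class DelayedFiller:
    latency_ns: int = LATENCY_NS
    pending: deque = field(default_factory=deque)  # (due_ts, side)
    fills: list = field(default_factory=list)      # (fill_ts, side, price)

    def submit(self, ts: int, side: int) -> None:
        """Queue an order decided at time ts; side is +1 (buy) or -1 (sell)."""
        self.pending.append((ts + self.latency_ns, side))

    def on_quote(self, ts: int, bid: float, ask: float) -> None:
        """Fill every order whose delay has elapsed, at the current BBO."""
        while self.pending and self.pending[0][0] <= ts:
            _, side = self.pending.popleft()
            price = ask if side > 0 else bid
            self.fills.append((ts, side, price))


filler = DelayedFiller()
filler.submit(ts=0, side=+1)                            # decide to buy at ts=0
filler.on_quote(ts=100_000, bid=4500.00, ask=4500.25)   # too early, no fill
filler.on_quote(ts=600_000, bid=4500.25, ask=4500.50)   # fills at the later ask
print(filler.fills)
```

In this example the buy is filled one tick above the ask that prevailed when the decision was made, which is the kind of slippage the zero-latency assumption hides.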