Advantages of raw PCAP files
Raw market data in PCAP (packet capture) format has become increasingly popular among institutional traders and investors for its granularity and flexibility. In this article, we delve into what PCAP files are, the advantages of PCAPs, and when raw data can be more beneficial than normalized data.
When data is transmitted over a network, it's sent in discrete packets. A PCAP is a file containing the raw data received from the exchange, byte for byte and without any normalization, saved to disk so that you can analyze and process it according to your needs.
The PCAP file format is simple. It starts with a global header that identifies the link-layer type of the capture (such as Ethernet), followed by a sequence of packet records. Each record consists of a packet header carrying metadata, like the receive timestamp and packet length, and the captured packet bytes, which hold the protocol headers (Ethernet, VLAN, IP, UDP/TCP) and the raw market data payload.
PCAP files can be read by tools and libraries, including Wireshark/tshark, tcpdump, and libpcap.
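Because the format is this simple, a file can also be read with nothing but a standard library. The following is a minimal sketch, assuming the classic libpcap file layout (a 24-byte global header, then a 16-byte record header before each packet) rather than the newer pcapng format. The function names `read_global_header` and `iter_packets` are illustrative and not part of any library.

```python
import io
import struct

MAGIC_MICROS = 0xA1B2C3D4  # timestamps stored as seconds + microseconds
MAGIC_NANOS = 0xA1B23C4D   # timestamps stored as seconds + nanoseconds


def read_global_header(stream):
    """Read the 24-byte global header; return (endian, frac_to_ns, linktype)."""
    header = stream.read(24)
    if len(header) < 24:
        raise ValueError("truncated global header")
    # The magic number tells us both the byte order and the timestamp unit.
    for endian in ("<", ">"):
        magic = struct.unpack(endian + "I", header[:4])[0]
        if magic in (MAGIC_MICROS, MAGIC_NANOS):
            break
    else:
        raise ValueError("not a classic pcap file")
    frac_to_ns = 1 if magic == MAGIC_NANOS else 1000
    # Remaining fields: version major/minor, timezone offset, timestamp
    # accuracy, snapshot length, and link-layer type (1 means Ethernet).
    _vmaj, _vmin, _zone, _sigfigs, _snaplen, linktype = struct.unpack(
        endian + "HHiIII", header[4:]
    )
    return endian, frac_to_ns, linktype


def iter_packets(stream, endian, frac_to_ns):
    """Yield (timestamp_ns, captured_bytes, original_length) per packet record."""
    while True:
        record = stream.read(16)
        if not record:
            return  # clean end of file
        if len(record) < 16:
            raise ValueError("truncated packet header")
        ts_sec, ts_frac, incl_len, orig_len = struct.unpack(endian + "IIII", record)
        data = stream.read(incl_len)
        if len(data) < incl_len:
            raise ValueError("truncated packet data")
        yield ts_sec * 1_000_000_000 + ts_frac * frac_to_ns, data, orig_len
```

To use it, open a capture in binary mode, call `read_global_header` once, then loop over `iter_packets` with the returned byte order and timestamp scale. The captured bytes still include the link-layer and protocol headers, so a market data parser would need to skip past them to reach the payload.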
You can learn more about the PCAP file format here.
Here's an example of viewing an OPRA PCAP in Wireshark.
Databento PCAPs offer the most granular solution, with PTP-synchronized timestamps and raw data in its native wire protocol, allowing for more in-depth analysis than normalized data. This is especially helpful for low-latency strategies, microstructure research, transaction cost analysis, and more.
We use FPGAs in our capture infrastructure to maintain line-rate capture at up to 100 Gbps.
We colocate within the same data center as the matching engine or primary point of presence (PoP), so that our timestamps accurately reflect a real trading environment. All of our capture servers are synchronized over PTP to a GPS clock source. Each packet is timestamped at the start of frame (SOF) on ingress.
Our PCAPs can be merged and deduplicated before delivery, reducing storage, bandwidth, and transfer costs by up to 50%. We offer multiple delivery methods, such as AWS S3, GCP, rsync, and more, to accommodate your needs.
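To illustrate what merging and deduplicating can look like, here is a minimal sketch. It assumes the duplicates come from two redundant copies of the same feed, and that every packet carries a sequence number that identifies it across both copies. Neither assumption is stated in this article, and the function `merge_dedup` and the `(timestamp_ns, sequence, payload)` tuple layout are hypothetical, not a description of Databento's pipeline.

```python
import heapq


def merge_dedup(feed_a, feed_b):
    """Merge two timestamp-sorted feeds, keeping the first copy of each packet.

    Each feed is an iterable of (timestamp_ns, sequence, payload) tuples,
    already sorted by timestamp. The sequence number is the (assumed)
    identifier that is equal for both copies of the same packet.
    """
    seen = set()  # grows without bound; a real tool would expire old entries
    merged = heapq.merge(feed_a, feed_b, key=lambda packet: packet[0])
    for ts_ns, seq, payload in merged:
        if seq in seen:
            continue  # later-arriving duplicate from the other feed
        seen.add(seq)
        yield ts_ns, seq, payload
```

When both copies carry the same packets, dropping the later-arriving duplicate of each one roughly halves the packet count, which is in line with the savings of up to 50% mentioned above.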
Choosing between raw data and normalized data will depend on the type of analysis you plan to conduct. While normalized data provides a standardized format that simplifies data manipulation, there are situations where using raw data is more advantageous.
- No loss of information due to normalization: Raw data preserves every field the exchange publishes, whereas normalization can drop some of them. For example, on CME, normalized data schemas don't provide MDOrderPriority, which can be valuable for Lead Market Maker (LMM) strategies.
- Compatibility with existing parsers: Raw data is compatible with existing parsers that were written to handle the exchange's market data directly, making it easy to test and integrate previously developed tools without needing to make any changes.
- Historical analysis and testing: PCAP files can be used to replay past network traffic, which can be beneficial for historical analysis of market events, testing new algorithms against historical data, or simulating past market conditions. Normalized data can also be used for these purposes, but it introduces inherent limitations that can compromise accuracy.
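The replay use case in the last point can be sketched as a loop that hands each packet to your code at the same relative time it was originally captured. This is a minimal illustration, not a production replay tool. The `replay` function, its `(timestamp_ns, payload)` input, and the `handler` callback are all hypothetical names, and the `sleep` and `clock` parameters exist only so the pacing can be swapped out in tests.

```python
import time


def replay(packets, handler, speed=1.0, sleep=time.sleep, clock=time.monotonic):
    """Call handler(timestamp_ns, payload) for each packet, paced by capture time.

    packets: iterable of (timestamp_ns, payload), sorted by timestamp.
    speed:   2.0 replays twice as fast as the original traffic, 0.5 half as fast.
    """
    first_ts = None
    start_wall = None
    for ts_ns, payload in packets:
        if first_ts is None:
            # Anchor the capture timeline to the current wall-clock time.
            first_ts = ts_ns
            start_wall = clock()
        # Schedule against the start time rather than the previous packet,
        # so that time spent in the handler does not accumulate as drift.
        target = start_wall + (ts_ns - first_ts) / 1e9 / speed
        delay = target - clock()
        if delay > 0:
            sleep(delay)
        handler(ts_ns, payload)
```

A handler here would typically be an existing feed parser or strategy entry point. Timing with `time.sleep` is only coarse, so this kind of software replay suits functional testing more than latency measurement.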