Why does Databento have its own Autonomous System Number (ASN), and why does it matter to our users?
Before answering this, let's dive into ASNs and why they're important.
Whenever we connect to the Internet, we indirectly tap into an autonomous system: a group of IP networks run by a single operator under one routing policy. Autonomous systems are like traffic controllers: they connect computer networks, help organize how data moves between them, and make transmission more efficient. Autonomous System Numbers (ASNs) are unique identifiers that allow these systems to exchange routing information with one another.
ASNs must be applied for, approved, and assigned by a regional Internet registry, such as the American Registry for Internet Numbers (ARIN). Organizations requesting an ASN must demonstrate that they have a unique routing policy or a multi-homed site.
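To make the idea of an ASN more concrete, here is a minimal sketch that sorts a number into the special-purpose ranges as we understand them from the IANA registries (private use, documentation, reserved) versus the pool a registry can actually assign. The function name `classify_asn` and the labels it returns are ours, chosen for illustration only.

```python
def classify_asn(asn: int) -> str:
    """Roughly classify an AS number (hypothetical helper, illustrative labels)."""
    if not 0 <= asn <= 4_294_967_295:
        raise ValueError("ASNs are 32-bit unsigned integers")
    # Individually reserved values: 0, AS_TRANS (23456), and the last
    # 16-bit and 32-bit numbers.
    if asn in (0, 23_456, 65_535, 4_294_967_295):
        return "reserved"
    # Block held in reserve just above the documentation range.
    if 65_552 <= asn <= 131_071:
        return "reserved"
    # Ranges set aside for examples in documentation.
    if 64_496 <= asn <= 64_511 or 65_536 <= asn <= 65_551:
        return "documentation"
    # Ranges anyone may use internally without a registry assignment.
    if 64_512 <= asn <= 65_534 or 4_200_000_000 <= asn <= 4_294_967_294:
        return "private use"
    # Everything else can be assigned by a registry such as ARIN
    # (which does not mean it has been assigned).
    return "assignable"


print(classify_asn(64_512))  # private use
print(classify_asn(13_335))  # assignable
```

Only a number from the assignable pool, obtained through a registry, can be announced to the rest of the Internet; private-use numbers are meant to stay inside one organization's network.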
All of our data is self-hosted. To our knowledge, we operate one of the largest storage clusters in the financial data space. Our next largest competitor has over $300M in annual data revenues. When we designed our infrastructure, a specific requirement was that we'd be able to serve two forms of data over the Internet: historical full order book data and real-time order book snapshots mixed from multiple trading venues. The throughput requirements for these are significant. The few providers that offer this will usually push users towards fetching the data within the AWS cloud, over a cross-connect, or even via shipments of physical hard disks.
Given the amount of data we have to store and the bandwidth we'd need to serve it, the economics shift strongly in favor of a self-hosted infrastructure instead of a cloud provider like AWS or GCP. This raises the question of how to engineer a network to serve this firehose of data ourselves.
The most economical solution is to rent static IPs or IP blocks from the lowest-cost Internet service provider (ISP) and have them advertise those IPs. This is common practice: if you run a traceroute from different locations to most vendors' data gateways (on the off chance that they even publicly disclose their IPs), you'll often see the traffic pass through the same ISP's nodes every time.
But with a single ISP comes the risk of downtime, less route diversity, and poorer performance. A single ISP may limit your maximum bandwidth, and some ISPs don't offer more than 10 Gbps. Low-cost ISPs sometimes have peering disputes that lead to poor performance for their users. To get around this, we use multiple redundant ISPs, as most large video streaming sites do, but this leads to another problem: you generally can't have one ISP advertise static IPs that you've borrowed from another ISP.
Having our own ASN lets us possess our own IP blocks and advertise our ASN and IPs via multiple ISPs through Border Gateway Protocol (BGP). Running BGP on our routers comes with the option of putting the full routing table on those routers, which we do, allowing us to pick the best routes between those ISPs and better utilize multiple IP links. It also allows us to set up direct peering with large cloud providers at various interconnection facilities, allowing their users to reach us directly without any hops on the public Internet. This setup is a lot costlier than renting from a single ISP, but allows us to provide a performant and robust service.
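To illustrate what "picking the best route" between ISPs means, here is a heavily simplified sketch of BGP-style route selection. It is not our router configuration: it models only the first two common tie-breakers (highest local preference, then shortest AS path), while real BGP implementations apply several more. The `Route` class, the `best_route` function, and the ISP labels are hypothetical, and the prefix and ASNs come from ranges reserved for documentation.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Route:
    prefix: str               # destination network
    upstream: str             # hypothetical label for the ISP that announced it
    local_pref: int           # operator-assigned preference, higher wins
    as_path: Tuple[int, ...]  # ASNs the announcement traversed, shorter wins


def best_route(candidates: List[Route]) -> Route:
    """Pick one route per prefix using a simplified subset of BGP tie-breakers."""
    if not candidates:
        raise ValueError("no route to destination")
    # Sort key: highest local_pref, then shortest AS path, then upstream
    # name (only to make the result deterministic in this sketch).
    return min(candidates, key=lambda r: (-r.local_pref, len(r.as_path), r.upstream))


routes = [
    Route("203.0.113.0/24", "isp_a", 100, (64496, 64500, 64510)),
    Route("203.0.113.0/24", "isp_b", 100, (64497, 64510)),
]

print(best_route(routes).upstream)      # isp_b: same preference, shorter AS path
print(best_route(routes[:1]).upstream)  # isp_a: still reachable if isp_b withdraws
```

The second call shows the redundancy argument in miniature: when one upstream withdraws its announcement, the destination remains reachable through the other without any change of IP addresses.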
If you want to see this at work, this quick curl command helps you benchmark our network throughput and how fast it ramps up when downloading MBP-10 data, or market depth at 10 levels, which requires significant bandwidth:
curl -G 'https://hist.databento.com/v0/timeseries.get_range' -d dataset=GLBX.MDP3 -d schema=mbp-10 -d start=20230526 -d encoding=dbn -d compression=zstd -u YOUR_API_KEY: -o /dev/null 2>&1 | LC_CTYPE=C tr '\r' '\n'
Here's a result obtained by one of our customers whose server is based in AWS US-East, fetching over 1 million rows per second of order book data (nearly 40 MB/s compressed) over the public Internet:
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
  0     0    0     0    0     0      0      0 --:--:--  0:00:01 --:--:--     0
  0     0    0     0    0     0      0      0 --:--:--  0:00:02 --:--:--     0
100 29.4M    0 29.4M    0     0  9957k      0 --:--:--  0:00:03 --:--:-- 9955k
100 66.9M    0 66.9M    0     0  16.6M      0 --:--:--  0:00:04 --:--:-- 16.6M
100  106M    0  106M    0     0  21.1M      0 --:--:--  0:00:05 --:--:-- 21.2M
100  144M    0  144M    0     0  23.9M      0 --:--:--  0:00:06 --:--:-- 30.7M
100  182M    0  182M    0     0  26.0M      0 --:--:--  0:00:07 --:--:-- 38.1M
100  222M    0  222M    0     0  27.6M      0 --:--:--  0:00:08 --:--:-- 38.4M
100  261M    0  261M    0     0  28.9M      0 --:--:--  0:00:09 --:--:-- 38.9M
100  300M    0  300M    0     0  29.9M      0 --:--:--  0:00:10 --:--:-- 38.8M
100  339M    0  339M    0     0  30.7M      0 --:--:--  0:00:11 --:--:-- 38.8M
100  378M    0  378M    0     0  31.4M      0 --:--:--  0:00:12 --:--:-- 39.0M
100  418M    0  418M    0     0  32.0M      0 --:--:--  0:00:13 --:--:-- 39.2M
100  456M    0  456M    0     0  32.5M      0 --:--:--  0:00:14 --:--:-- 39.1M
100  497M    0  497M    0     0  33.0M      0 --:--:--  0:00:15 --:--:-- 39.2M
100  536M    0  536M    0     0  33.4M      0 --:--:--  0:00:16 --:--:-- 39.4M
100  575M    0  575M    0     0  33.8M      0 --:--:--  0:00:17 --:--:-- 39.5M
100  615M    0  615M    0     0  34.1M      0 --:--:--  0:00:18 --:--:-- 39.6M
100  654M    0  654M    0     0  34.3M      0 --:--:--  0:00:19 --:--:-- 39.4M
100  692M    0  692M    0     0  34.6M      0 --:--:--  0:00:20 --:--:-- 39.2M
100  733M    0  733M    0     0  34.8M      0 --:--:--  0:00:21 --:--:-- 39.5M
100  772M    0  772M    0     0  35.0M      0 --:--:--  0:00:22 --:--:-- 39.3M
100  811M    0  811M    0     0  35.2M      0 --:--:--  0:00:23 --:--:-- 39.1M
100  850M    0  850M    0     0  35.4M      0 --:--:--  0:00:24 --:--:-- 39.3M
100  891M    0  891M    0     0  35.6M      0 --:--:--  0:00:25 --:--:-- 39.7M
100  930M    0  930M    0     0  35.7M      0 --:--:--  0:00:26 --:--:-- 39.4M
100  971M    0  971M    0     0  35.9M      0 --:--:--  0:00:27 --:--:-- 39.9M
100 1012M    0 1012M    0     0  36.1M      0 --:--:--  0:00:28 --:--:-- 39.9M
100 1051M    0 1051M    0     0  36.2M      0 --:--:--  0:00:29 --:--:-- 40.0M
100 1091M    0 1091M    0     0  36.3M      0 --:--:--  0:00:30 --:--:-- 39.8M
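If you want to check the numbers in a run like this yourself, the sketch below recomputes the average throughput from a single row of curl's progress meter. It assumes the standard 12-column layout and that curl's `k`, `M`, `G`, and `T` suffixes are 1024-based; the helper names (`parse_size`, `parse_elapsed`, `average_mib_per_s`) are ours, not part of curl or any Databento tooling.

```python
import re

# curl's progress meter uses 1024-based suffixes.
_UNITS = {"": 1, "k": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}


def parse_size(text: str) -> float:
    """Convert a progress-meter size such as '29.4M' or '9957k' to bytes."""
    match = re.fullmatch(r"(\d+(?:\.\d+)?)([kMGT]?)", text)
    if match is None:
        raise ValueError(f"unrecognized size: {text!r}")
    return float(match.group(1)) * _UNITS[match.group(2)]


def parse_elapsed(text: str) -> int:
    """Convert an 'H:MM:SS' time-spent field to whole seconds."""
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def average_mib_per_s(row: str) -> float:
    """Average download rate implied by one progress row, in MiB/s."""
    fields = row.split()
    received = parse_size(fields[3])    # fourth column: bytes received so far
    elapsed = parse_elapsed(fields[9])  # tenth column: time spent
    if elapsed == 0:
        raise ValueError("no time has elapsed yet")
    return received / elapsed / 1024**2


last_row = "100 1091M 0 1091M 0 0 36.3M 0 --:--:-- 0:00:30 --:--:-- 39.8M"
print(f"{average_mib_per_s(last_row):.1f} MiB/s")
```

The average stays below the "Current Speed" column because it includes the first couple of seconds, during which no bytes had arrived yet; the current speed is the better indicator of sustained throughput.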
You can see an up-to-date list of the interconnection facilities where we peer on our PeeringDB entry. This currently includes the Equinix NY4 and CyrusOne Aurora I data centers, in Secaucus and Aurora respectively. You can also see our network architecture and selection of peers through public BGP tools like bgpview.
Databento doesn't stop here when it comes to latency optimization. We find that routers and firewalls are usually the next largest source of variance in WAN latency. Considering the tons of literature on switching latency, and even on the internal latency of cloud-based trading platforms these days, there's surprisingly little on router latency. We use proprietary DPDK-accelerated firewalls and routers from our partnership with Netris and NVIDIA, which deliver combined routing, firewalling, and load balancing at a median latency of 63 microseconds. Combined with our blazing-fast data gateways written in Rust, we're able to deliver real-time market data to public cloud servers (commodity servers that can be spun up for as little as $50 per month) in under 315 microseconds. The next cheapest solution that achieves this costs a staggering 30 to 100 times as much, showing that we've made a generational shift in the Pareto frontier between cost and latency.
We see an increasing trend of our users moving towards cloud-native trading applications where WAN latency is the source of the largest variance. If your trading application runs in the cloud and you want a feed with consistently low latency, then Databento is a great solution for your market data needs.
If you're interested in learning more about our current architecture, see our docs for our network connectivity guide and architecture.