Yesterday, from a post at the Deploy360 website, I learned of Comcast's IPv4 and IPv6 network speed testing tool:
http://speedtest.comcast.net/
I did a quick test from my laptop in my office and got some very surprising results. The measured IPv6 performance was better than IPv4 by a gigantic margin. With IPv6, I got 822 Mbps download and 667 Mbps upload throughput. With IPv4, a mere 99 Mbps upload and 18 Mbps download!
Something seemed fishy, but I had to run off to other work, so I quickly posted the result to Twitter, planning to look into it later.
This generated quite a bit of discussion with numerous folks on Twitter and elsewhere. My initial speculation was that we do some rate limiting of IPv4 traffic at the Penn border routers for selected areas of the campus, and perhaps this was throttling the IPv4 performance. My other suspicion was that something significantly different in the IPv4 and IPv6 routing paths was contributing to the difference. The graphic above does show a round-trip time of 63 ms for the IPv4 path versus 32 ms for the IPv6 path, which supports that idea. Furthermore, if the TCP window was not scaled high enough to keep the pipe filled at 63 ms (but was at 32 ms), that would also reduce throughput, though not by enough on its own to account for the observed difference.
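To make that window-scaling point concrete, here is a quick back-of-the-envelope sketch in Python of the throughput ceiling a fixed window imposes at those two round-trip times. The 64 KB figure is the classic unscaled-window maximum and is purely an assumption for illustration; the actual window sizes in yesterday's test weren't captured.

    # A single TCP stream cannot exceed (window size / RTT), no matter how
    # fast the links are. 64 KB is the unscaled-window maximum (an assumption
    # here, not a measured value).
    def max_throughput_mbps(window_bytes: int, rtt_ms: float) -> float:
        """Upper bound on single-stream TCP throughput, in Mbps."""
        return (window_bytes * 8) / (rtt_ms / 1000) / 1e6

    for rtt in (63, 32):
        print(f"RTT {rtt} ms, 64 KB window: "
              f"{max_throughput_mbps(64 * 1024, rtt):.1f} Mbps max")
    # RTT 63 ms, 64 KB window: 8.3 Mbps max
    # RTT 32 ms, 64 KB window: 16.4 Mbps max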
Patrik Fältström suspected a DPI device or other middlebox was causing the problem. The only problem is that we don't have any such middleboxes (unless you consider an IP border router imposing IP-address-based rate limits a middlebox). In any case, I was leaning towards the rate limits as the cause myself, until I confirmed that those rate limits weren't being applied to any of the traffic from my office network. The rate limits are primarily targeted at the student residential dormitories - without them, our external links typically get overwhelmed with traffic to/from the dorms (most likely due to file sharing, a very common activity on college campuses). The border routers are configured to apply a token bucket rate policer to each individual IPv4 address within the network prefixes that cover the residential networks. Note that this rate limiting is completely application agnostic. Also note that this scheme cannot scale to IPv6 (a single IPv6 subnet has more than 18 quintillion addresses!), a problem we're ignoring for the time being :-)
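For readers unfamiliar with the mechanism, here is a minimal, illustrative sketch in Python of per-address token bucket policing. It is emphatically not the actual router configuration, and the rate and burst values are made up; it just shows why the state is kept per IPv4 address (and why a per-address table is hopeless for a /64).

    import time

    class TokenBucketPolicer:
        """Toy token bucket policer: traffic exceeding the configured
        rate/burst is dropped rather than queued (policing, not shaping).
        Rate and burst values are arbitrary, for illustration only."""

        def __init__(self, rate_bps: float, burst_bytes: float):
            self.rate = rate_bps / 8           # token refill rate, bytes/sec
            self.burst = burst_bytes           # bucket depth, bytes
            self.tokens = burst_bytes
            self.last = time.monotonic()

        def allow(self, packet_bytes: int) -> bool:
            now = time.monotonic()
            self.tokens = min(self.burst,
                              self.tokens + (now - self.last) * self.rate)
            self.last = now
            if packet_bytes <= self.tokens:
                self.tokens -= packet_bytes
                return True
            return False                       # out of profile: drop

    # Conceptually, one bucket per IPv4 address in the residential prefixes.
    policers: dict[str, TokenBucketPolicer] = {}

    def police(src_ip: str, packet_bytes: int) -> bool:
        bucket = policers.setdefault(src_ip,
                                     TokenBucketPolicer(rate_bps=20e6,
                                                        burst_bytes=1_500_000))
        return bucket.allow(packet_bytes)

Note that the policer looks only at the source address and packet size, which is what makes it completely application agnostic.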
Repeat of the test
This morning, I decided to do another test (same laptop), but more carefully, and with a packet capture this time. I also explicitly turned off the wireless interface (hmmmm) to make sure that all tests were using the wired Gigabit Ethernet interface. This time I got much more reasonable looking results, with both address families in the neighborhood of each other: IPv4 853 Mbps down and 547 Mbps up, IPv6 827 Mbps down and 730 Mbps up. One other difference I noticed is that the round-trip (ping) times to the destination server are 12 ms for both IPv4 and IPv6. This is substantially different from yesterday's test (63 and 32 ms respectively), despite the fact that I chose the same destination server at Comcast (Washington, DC).
A packet capture reveals that the destination server at Comcast was 68.87.73.52 for IPv4 and 2001:558:1010:5:68:87:73:52 for IPv6. Are these the same endpoint? Hard to tell, but the fact that the last four fields of the IPv6 address spell out the IPv4 address in decimal might be a hint. The traffic streams use TCP port 5050. A traceroute to the IPv4 destination shows the outbound path taking one of Penn's commercial ISP links (Cogent) to New York and then back down to Washington/VA. An IPv6 traceroute shows the outbound path going out via our Internet2 link, the I2 commercial peering service, then Cogent (New York), Level3 (New York), and then Comcast to DC. So the IPv4 and IPv6 paths are substantially different in the forward direction. It's harder to determine the path of the return traffic without the aid of reverse traceroute tools or similar.
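The "last four fields spell out the IPv4 address" observation is easy to check mechanically; a small sketch (the addresses are the ones from the capture, everything else is just illustration):

    import ipaddress

    v4 = ipaddress.IPv4Address("68.87.73.52")
    v6 = ipaddress.IPv6Address("2001:558:1010:5:68:87:73:52")

    # Expand the IPv6 address into its eight 16-bit groups (hex strings).
    groups = v6.exploded.split(":")   # ['2001', '0558', ..., '0073', '0052']

    # Read the last four groups as decimal numbers and compare to the octets.
    embedded = ".".join(str(int(group)) for group in groups[-4:])
    print(embedded, embedded == str(v4))   # 68.87.73.52 True

Of course, that only confirms the visual pattern, not that the two addresses terminate on the same machine.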
Getting a substantial fraction of Gigabit Ethernet speed is not surprising - that's probably the bottleneck bandwidth along the measured path. My laptop has a Gigabit Ethernet connection to the building network, which in turn has dual 10 Gigabit Ethernet links to a 100 Gig campus core, and then multiple 10 Gig links out to commercial ISPs, Internet2, etc. Most tier-1 ISP links and peerings are at least 10 Gig.
The bandwidth-delay product on these paths is about 1,464 KB (1000 Mbps * 12 ms). The Comcast endpoint's receive window exceeds this, but my laptop's is slightly undersized, so I could probably do a bit of host tuning to boost the download numbers a bit more.
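For reference, the arithmetic behind that figure, as a quick sketch (1000 Mbps is assumed to be the bottleneck, per the gigabit NIC):

    # Bandwidth-delay product: bytes that must be in flight to keep a
    # 1000 Mbps path with a 12 ms RTT full.
    bandwidth_bps = 1000e6
    rtt_s = 0.012

    bdp_bytes = bandwidth_bps / 8 * rtt_s
    print(f"{bdp_bytes / 1024:.1f} KB")    # 1464.8 KB, i.e. roughly 1.5 MB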
So, what's the explanation for the strange results I got yesterday? I wish I had a packet capture to investigate, but my leading suspicion is that my laptop's wireless adapter (lower bandwidth, shared medium) was used for the IPv4 test and the wired connection for the IPv6 one. If I have time later, I'll try to reproduce the issue.
--Shumon Huque
Addendum (February 9th, 2014) - On closer inspection of the packet trace, the speed test appears to use multiple TCP streams in parallel, so scaling the window of any single connection as high as the bandwidth*delay product of the path isn't necessary.
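To illustrate why parallel streams relax the per-connection window requirement, here is a small sketch; the stream count and window size below are assumptions for illustration, not values taken from the trace:

    def aggregate_throughput_mbps(n_streams: int, window_bytes: int,
                                  rtt_ms: float) -> float:
        """Aggregate ceiling of n parallel, window-limited TCP streams."""
        return n_streams * (window_bytes * 8) / (rtt_ms / 1000) / 1e6

    # e.g. four parallel streams with 512 KB windows over the 12 ms path
    print(f"{aggregate_throughput_mbps(4, 512 * 1024, 12):.0f} Mbps")  # ~1398

With several streams in flight, the aggregate can fill the gigabit bottleneck even though each individual window is well below the ~1.5 MB bandwidth-delay product.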