AS path information in mirror selection #308

ott · 2022-01-16T02:32:14Z

At the moment, mirror selection is based on geolocation information, preference for IP prefixes and ASNs and Internet2 membership. In some countries, the Internet topology and peering agreements do not allow the conclusion that a mirror should be preferred because it is in the same country or geographically nearby and has high bandwidth.

Germany, for example, has a star topology that is heavily centred on the city of Frankfurt, and some larger Internet access providers have restrictive peering policies and higher interconnection costs. As a result, packet latencies are not strictly proportional to the distance between nodes, and some of the country's autonomous systems are connected through paths to larger Internet access providers that can be congested at times of high utilization, as was sometimes evident during the restrictions on public life during the COVID-19 pandemic.

An example would be a client in the northern city of Hamburg which receives the URL of a mirror server in the same city with a high preference, even though the path from the client to the server goes through the southern city of Frankfurt, because it is a major Internet interconnection point in Europe and many networks in Germany interconnect there. So the latency between client and server can easily be 30 ms for residential broadband Internet connections, and all the data has to be sent twice over more expensive and potentially more congested long-range Internet paths, even though the distance between both hosts is only a few kilometres and both hosts are in one of the country's largest and economically most important cities. In this example it would perhaps be better to assign a higher preference to a server in Frankfurt because it might have lower latency and higher usable bandwidth.

Moreover, many mirror servers for Free Software projects in the country, and perhaps the majority of the bandwidth of mirror servers, are provided by institutions of higher education. These institutions are members of the German Research Network DFN (AS680), which connects them to the Internet. Although this has changed somewhat over recent years, it is still not uncommon that paths to AS680 lead through AS3320 (Deutsche Telekom). It seems that some paths through AS3320 can become congested because Deutsche Telekom charges more than other providers for transit, and not all of its customers can or want to pay for the capacity that they need at times of peak utilization, nor are they able to do traffic engineering to avoid the problem. As a result, these mirror servers are often used but the download speeds can be lower than with other mirror servers. I have also heard of similar results with AS3209 (Vodafone), formerly AS31334.

Clearly, a mirror selection algorithm cannot account for all of the aspects of the example and related problems with Internet topology and capacity without realtime end-to-end measurements of network capacity and latency. Nonetheless, it might be possible to use AS path information to improve the result of the algorithm.

A first use case could be to use information about customer and peering relationships between autonomous systems. The hypothesis is that a peering or perhaps also a customer relationship indicates a higher bandwidth and lower latency path between hosts of both autonomous systems. A source for this AS relationship information could be the CAIDA AS relationships dataset.
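To make the hypothesis concrete, here is a minimal sketch of how such relationship data might feed into a mirror score. It assumes the pipe-separated layout used by CAIDA's as-rel files (`<provider>|<customer>|-1` for a customer relationship, `<peer>|<peer>|0` for peering, `#` for comments). The function names, the bonus values and the sample lines are hypothetical; the sample uses private-use ASNs and is not real CAIDA data.

```python
from collections import defaultdict

# Hypothetical sample in the as-rel layout (private-use ASNs, not real data).
SAMPLE = """\
# provider|customer|-1  or  peer|peer|0
64512|64513|-1
64513|64514|0
"""


def parse_as_rel(lines):
    """Return (providers, peers): customer -> set of providers, AS -> set of peers."""
    providers = defaultdict(set)
    peers = defaultdict(set)
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip comments and blank lines
        fields = line.split("|")
        a, b, rel = int(fields[0]), int(fields[1]), int(fields[2])
        if rel == -1:      # a is a provider of b
            providers[b].add(a)
        elif rel == 0:     # a and b peer with each other
            peers[a].add(b)
            peers[b].add(a)
    return providers, peers


def relationship_bonus(client_asn, mirror_asn, providers, peers):
    """Arbitrary illustrative bonus: same AS > peering > customer/provider > none."""
    if client_asn == mirror_asn:
        return 3
    if mirror_asn in peers.get(client_asn, ()):
        return 2
    if (mirror_asn in providers.get(client_asn, ())
            or client_asn in providers.get(mirror_asn, ())):
        return 1
    return 0


providers, peers = parse_as_rel(SAMPLE.splitlines())
```

Such a bonus would only be one input next to the existing geolocation and prefix preferences, and whether peering really should rank above a customer relationship is exactly the part of the hypothesis that would need to be checked against measurements.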

ott · 2024-08-18T21:34:36Z

One metric that can be easily computed with a pre-trained model seems to be the distance of vectors that are computed by the BGP2Vec algorithm.
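As a rough illustration of what such a metric could look like at request time, the sketch below ranks candidate mirrors by the cosine similarity between the client's AS vector and each mirror's AS vector. It assumes the embeddings have already been exported from a trained model into a plain dictionary; the dictionary contents, the ASNs (private-use numbers) and the function names are hypothetical, and nothing here reflects the actual BGP2Vec code or file format.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank_mirrors(client_asn, mirror_asns, embeddings):
    """Sort mirror ASNs by similarity to the client's AS, most similar first.

    Mirrors without an embedding are dropped; a real implementation would
    probably fall back to the existing geolocation-based preference instead.
    """
    client_vec = embeddings[client_asn]
    scored = [
        (cosine_similarity(client_vec, embeddings[asn]), asn)
        for asn in mirror_asns
        if asn in embeddings
    ]
    return [asn for _, asn in sorted(scored, reverse=True)]


# Toy two-dimensional vectors for illustration only; real embeddings would
# have many more dimensions and come from a trained model.
toy_embeddings = {
    64512: [1.0, 0.0],
    64513: [0.9, 0.1],
    64514: [0.0, 1.0],
}
ranking = rank_mirrors(64512, [64514, 64513], toy_embeddings)
```

With a lookup table like this, the per-request cost is a handful of dot products, which fits the expectation that the approach is cheap once the model is trained.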

I'm skeptical about neural-network-based algorithms that are not based on an explicit model of reality. However, my experience with word2vec is that it is without major doubt a simple measure of semantic relatedness of English words. So I can imagine that it could work for BGP paths too and that BGP2Vec could be used as a simple measure of AS relatedness.

It seems likely that graph-based algorithms could be a better measure. On the other hand, BGP2Vec seems less computationally intensive and seems to have lower storage requirements once the model is trained.

As only BGP paths are used for training, and measured data transfer rates or latencies are not considered, the result is likely better than nothing but will not be perfect. For example, it could perhaps account for special network topologies in a country, as used to be the case in North Macedonia, where most data between large autonomous systems was exchanged in other countries.

Perhaps BGP2Vec could be extended with BGP weights as a basic and incomplete indirect measure of data transfer rate or latency that can be expected.

This was referenced Sep 17, 2022

Avoid by and ru mirrors for ua requests openSUSE/MirrorCache#298

Merged

Fails to select a mirror in some cases etix/mirrorbits#113

Open

ott mentioned this issue Sep 8, 2024

Inaccurate geolocations in Fedora Mirror Manager PhirePhly/micromirrors#7

Closed
