You would be startled by the number of studies on the geography of cyber-attacks that overlook a key factor: the use of proxies. Hiding one’s IP address behind another by means of a Virtual Private Network (VPN), an anonymity network (such as Tor), or a data center is, unsurprisingly, a common technique among attackers. This oversight in previous cyber-geopolitical research could stem from a misunderstanding of proxies and their effects, or from a lack of resources to identify them.
Continuing GoSecure’s ongoing Remote Desktop Protocol (RDP) research, we now turn our attention to the use of proxies. Proxies are usually detected through network traffic, an artifact that is not collected during RDP login attempts. This raises the question: how do we detect proxies when given only an IP address? The most obvious solution is to consult maintained lists of VPN providers’ exit nodes. An exit node determines the external IP address of a server, as it acts as the final gateway between one’s internal network and the rest of the internet. Checking whether an IP address appears in such a list would be a very efficient and accurate test of whether it belongs to a VPN. However, keeping track of the few billion IP addresses that exist is a tremendous job. Moreover, these lists would need constant refreshing, and they still would not account for every type of proxy.
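To make the list-based idea concrete, here is a minimal sketch of such a lookup using Python’s standard ipaddress module. The entries in KNOWN_EXIT_NODES are placeholders taken from address ranges reserved for documentation, not real exit nodes, and the function names are ours; an actual list would have to come from a maintained source.

```python
import ipaddress

# Hypothetical exit-node list (assumption): documentation-reserved ranges
# stand in for a real, maintained list of VPN exit nodes.
KNOWN_EXIT_NODES = ["192.0.2.10", "198.51.100.0/24"]


def build_index(entries):
    """Parse single addresses and CIDR ranges into network objects."""
    return [ipaddress.ip_network(entry, strict=False) for entry in entries]


def is_known_exit_node(ip, networks):
    """Return True if the address falls inside any listed network."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in networks)


EXIT_NODE_INDEX = build_index(KNOWN_EXIT_NODES)

print(is_known_exit_node("198.51.100.77", EXIT_NODE_INDEX))
print(is_known_exit_node("203.0.113.5", EXIT_NODE_INDEX))
```

The lookup itself is cheap; as noted above, the hard part is keeping the list complete and current.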
Other methods to identify proxies include reverse DNS lookups. Once the domain name associated with an IP address is obtained, it can be easy to recognize a VPN provider and conclude that the address is a proxy. Similarly, if the IP address is located close to a known VPN or data center hub, the probability that it is also a proxy increases. Of course, these methods are only a portion of the techniques that could be employed. Thankfully, there are online services that do it all for us: the VPN detection Application Programming Interfaces (APIs).
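The reverse DNS idea could be sketched roughly as follows. The keyword list is a hypothetical starting point of our own, not a vetted set of provider names, and a missing or generic PTR record proves nothing either way.

```python
import socket

# Hypothetical keywords (assumption): a real deployment would need a
# curated list of VPN and hosting provider domains.
PROVIDER_HINTS = ("vpn", "proxy", "tor-exit", "hosting", "cloud")


def reverse_dns(ip):
    """Return the PTR hostname for an address, or None if the lookup fails."""
    try:
        return socket.gethostbyaddr(ip)[0]
    except OSError:
        return None


def hostname_suggests_proxy(hostname):
    """Return True if the hostname contains one of the provider hints."""
    name = hostname.lower()
    return any(hint in name for hint in PROVIDER_HINTS)


def looks_like_proxy(ip):
    """Return True/False from the hostname, or None when there is no PTR record."""
    hostname = reverse_dns(ip)
    if hostname is None:
        return None
    return hostname_suggests_proxy(hostname)
```

Calling looks_like_proxy on an address performs a live DNS query, so results depend on the resolver and on whatever hostname the address owner chose to publish.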
Testing the services
Five services were tested in an attempt to draw conclusions about our data. The first one, ip-api.com, is a recent (2022) tool that is free for non-commercial use and, on top of the basic information, indicates whether the submitted IP address is a proxy or is used for hosting. The second API retained for the tests is ipapi.is, which is also relatively new to the market and offers the same type of results. It is partially free, with paid tiers for more demanding use cases. Similarly, the third tool, ipqualityscore.com, has free and paid tiers but specializes in fraud prevention through the analysis of IP address reputation. Virus Total was the fourth tool used. Primarily known for its IP address reputation checks and suspicious file scanning services, Virus Total sometimes adds a “VPN” tag to certain IP addresses (see Figure 1). This feature is included in the first, free tier of the API. Finally, Neustar’s paid Fraud Solution was the last tool employed. This fraud detection solution sometimes reports a “VPN service,” so this is the result we will focus on.
Figure 1. Detection of a VPN by Virus Total
Although proxy detection is not the main purpose of some of these tools, one of the goals of the study was to evaluate how good that detection really is when it is offered. The five tools were tested on a subset of 3 months of login attempts on our honeypot servers. This subset comprises over 3.4 million login attempts by 1529 unique IP addresses. More information about this data can be found here.
Graph 1. Percentage of IP addresses flagged as proxy for each tool tested
The disparity between the results is striking (see Graph 1): Virus Total flagged 61 addresses (4%) as proxies, while ipqualityscore.com flagged a whopping 1185 out of 1529 (77.5%). Neustar and ip-api.com delivered comparable results, each flagging about 17% of the IP addresses.
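For readers who want to reproduce these percentages, the short sketch below derives them from the per-tool totals reported in Table 1 and the 1529 unique IP addresses in the subset. The ipapi.is share is not quoted in the text but follows from the same table; the variable and function names are ours.

```python
# Per-tool totals taken from the TOTAL column of Table 1.
TOTAL_IPS = 1529
FLAGGED = {
    "ip-api.com": 273,
    "ipapi.is": 813,
    "ipqualityscore.com": 1185,
    "Virus Total": 61,
    "Neustar": 249,
}


def flagged_share(count, total=TOTAL_IPS):
    """Percentage of unique IP addresses flagged, rounded to one decimal."""
    return round(100 * count / total, 1)


SHARES = {tool: flagged_share(count) for tool, count in FLAGGED.items()}

# Print the tools from the least to the most aggressive flagger.
for tool, pct in sorted(SHARES.items(), key=lambda item: item[1]):
    print(f"{tool:<20}{pct:>6.1f}%")
```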
Looking at the overlap between the tools does not yield a clear picture either (see Table 1). The tools do not identify the same IP addresses as proxies: only 25 IP addresses were flagged by all 5 tools. On the other hand, ip-api.com and ipqualityscore.com flagged 29 and 310 IP addresses, respectively, that were not recognized by any other tool.
Agrees with… | 0 other APIs | 1 other API | 2 other APIs | 3 other APIs | 4 other APIs | TOTAL |
ip-api.com | 29 | 29 | 62 | 128 | 25 | 273 |
ipapi.is | 0 | 527 | 137 | 124 | 25 | 813 |
ipqualityscore.com | 310 | 565 | 155 | 130 | 25 | 1185 |
Virus Total | 0 | 4 | 4 | 28 | 25 | 61 |
Neustar | 0 | 7 | 107 | 110 | 25 | 249 |
Table 1. Number of IP addresses flagged by each API, broken down by how many other APIs flagged the same address
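A breakdown like the one in Table 1 can be computed from the set of addresses each tool flagged. The sketch below shows one way to do it, assuming each tool’s output has already been reduced to a set of flagged IP addresses; the tool names and addresses are toy values for illustration, not our data.

```python
from collections import Counter


def agreement_table(flagged):
    """For each tool, count its flagged IPs by how many other tools agree.

    flagged maps a tool name to the set of IPs it flagged as proxies.
    Position k of a tool's row holds the number of its IPs that were
    flagged by exactly k other tools.
    """
    # Number of tools that flagged each IP address.
    votes = Counter(ip for ips in flagged.values() for ip in ips)
    n_others = len(flagged) - 1
    table = {}
    for tool, ips in flagged.items():
        row = [0] * (n_others + 1)
        for ip in ips:
            row[votes[ip] - 1] += 1  # exclude the tool's own vote
        table[tool] = row
    return table


# Toy input (assumption): three hypothetical tools, documentation addresses.
TOY_FLAGS = {
    "tool_a": {"192.0.2.1", "192.0.2.2", "192.0.2.3"},
    "tool_b": {"192.0.2.1", "192.0.2.2"},
    "tool_c": {"192.0.2.1"},
}

TOY_TABLE = agreement_table(TOY_FLAGS)
for tool, row in TOY_TABLE.items():
    print(tool, row, "total:", sum(row))
```

Each row sums to the tool’s total number of flagged addresses, and the last column is identical for every tool because it counts the addresses flagged by all of them.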
What could explain such a big contrast between the results?
The results may be a bit disappointing at first: even among the tools whose primary purpose is to identify proxies, the outputs vary widely and are inconclusive.
One reason for this phenomenon could be the lack of precision in the returned data. Once the API response is parsed, the result is either a “0” or a “1”. A “1” implies that the IP address is a proxy, while a “0” means that it is not. However, as the research progressed, we realized that a “0” could also mean that no substantial information was available and that the result had simply defaulted to “0”. Thus, a “0” does not necessarily indicate a regular residential router; it may instead reflect a lack of sufficient data to make a definitive determination. This ambiguity highlights the need for tools to include an “N/A” option.
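As an illustration of what an “N/A” option could look like on the consuming side, the sketch below parses a response into three states instead of two. The field name "proxy" and the response shapes are assumptions for the example, and it only helps when a service actually omits or nulls the field for unknown addresses, which the tools we tested do not necessarily do.

```python
from typing import Optional


def parse_proxy_flag(response: dict, field: str = "proxy") -> Optional[bool]:
    """Map an API response to True (proxy), False (not a proxy) or None (N/A).

    The field name is hypothetical; each service uses its own schema.
    """
    value = response.get(field)
    if value in (True, 1, "1", "true"):
        return True
    if value in (False, 0, "0", "false"):
        return False
    # Missing, null, or unrecognized values are kept as "unknown"
    # instead of being silently folded into "not a proxy".
    return None


# Hypothetical responses for illustration.
print(parse_proxy_flag({"proxy": 1}))
print(parse_proxy_flag({"proxy": "0"}))
print(parse_proxy_flag({}))
```

Keeping the unknown state separate makes it possible to report how many addresses a tool could not classify, rather than counting them as non-proxies.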
Another explanation for the disparity between the tools is that it is difficult not only to identify a proxy, but also to unflag an IP address that is no longer used as one. Just as more recent databases could be considered too fresh to hold accurate data, older ones could be regarded as polluted by years of accumulated data.
Conclusion
Before starting this project, the general assumption was that analyzing the behavior of attackers based on their use of proxies would be simple: using any proxy detection tool and assuming its information is accurate should do the trick. However, to ensure the rigor of our research, we decided to compare several proxy detectors and check that the various sources of information were consistent. Contrary to our expectations, and as demonstrated in this blog post, the results were highly inconsistent, with the tools frequently disagreeing with one another.
Given these discrepancies, the following question arises: which tool should be trusted? Should we rely on the one that identifies the most proxies? There is no indication that this tool is more accurate than the others. Further analysis comparing the tools’ results against concrete data (where the presence of proxies is known) is needed.
Comparing the tools against data where the presence of proxies is known will help us evaluate their accuracy and reliability more effectively. Stay tuned for our next blog post in this series to find out which tool performs best.
Author: Constance Prevot
We would like to thank Andréanne Bergeron for supervising this research project and for contributing to the writing and review of this blog post.