Persistent BGP Routing Inconsistency Causing Intermittent Reachability Issues for My Website

Hello RIPE Community,

I am facing a persistent and highly concerning routing issue affecting my website, and I am hoping to get insight from members with expertise in BGP and inter-domain routing. The core problem is intermittent reachability of my website from certain geographic regions and autonomous systems, even though the server infrastructure is fully operational and reachable from other networks. My monitoring tools show that the web server stays online, with stable CPU, memory, and application performance metrics. However, users on specific ISPs report timeouts or a complete inability to connect, while others access the site normally at the same time. This strongly suggests that the problem is not at the application or server layer but in upstream routing.

On closer investigation, traceroutes from affected networks show traffic stopping at an upstream transit provider before it reaches my hosting provider's ASN. In contrast, traceroutes from unaffected networks follow a different AS path and reach my server without delay. BGP looking glass tools show that my prefix is announced correctly from my hosting provider's ASN, and global route visibility appears normal at most route collectors. However, some networks appear to select alternative routes with unusually long AS paths, or routes through possibly suboptimal upstream peering arrangements, and these coincide with packet loss or outright route instability. This inconsistent propagation is difficult to diagnose because it does not present as a full outage but as selective reachability failure.
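One way to make the "unusually long AS path" observation concrete is to collect the AS paths for the prefix from several looking glasses and flag prepending, loops, and length outliers. The Python sketch below is a heuristic, not a standard method: it assumes plain space-separated AS paths (AS_SET tokens are skipped), and the threshold of two hops above the median is an arbitrary illustrative choice. The ASNs in the example are documentation-range placeholders, not real networks.

```python
from statistics import median


def parse_as_path(text: str) -> list[int]:
    """Parse a space-separated AS path such as "64496 64510 64500".

    Assumption: plain AS_SEQUENCE output; AS_SET tokens like "{64501,64502}"
    are skipped because they are not pure digits.
    """
    return [int(tok) for tok in text.split() if tok.isdigit()]


def collapse_prepends(path: list[int]) -> list[int]:
    """Drop consecutive repeats (AS path prepending), keeping one copy of each hop."""
    collapsed: list[int] = []
    for asn in path:
        if not collapsed or collapsed[-1] != asn:
            collapsed.append(asn)
    return collapsed


def has_loop(path: list[int]) -> bool:
    """True if an ASN reappears after a different ASN (a non-consecutive repeat)."""
    collapsed = collapse_prepends(path)
    return len(collapsed) != len(set(collapsed))


def flag_anomalies(paths: list[str], extra_hops: int = 2) -> list[dict]:
    """Report prepending, loops, and paths notably longer than the median.

    `extra_hops` is a hypothetical, illustrative threshold.
    """
    parsed = [parse_as_path(p) for p in paths]
    typical = median(len(collapse_prepends(p)) for p in parsed)
    report = []
    for raw, path in zip(paths, parsed):
        hops = len(collapse_prepends(path))  # hop count after removing prepends
        report.append({
            "path": raw,
            "hops": hops,
            "prepends": len(path) - hops,
            "loop": has_loop(path),
            "long": hops > typical + extra_hops,
        })
    return report


if __name__ == "__main__":
    sample = [
        "64496 64500",
        "64497 64500",
        "64498 64500 64500 64500",
        "64499 64510 64511 64509 64508 64500",
    ]
    for row in flag_anomalies(sample):
        print(row)
```

Prepending on its own is ordinary traffic engineering; a path is mainly worth raising with an upstream when it is both long and consistently seen from the affected networks.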

I have confirmed with my hosting provider that no DDoS mitigation rules, prefix filtering policies, or RPKI invalid states are affecting the prefix announcement. RPKI validation reports indicate that the prefix is covered by a ROA (properly signed) and is not marked as invalid. Additionally, no recent changes were made to the routing policies or prefix advertisements. Despite this, affected users consistently report connection timeouts, and network monitoring shows that packets from specific regions never reach the server interface. This suggests that somewhere upstream, route selection or filtering is causing traffic to be dropped or misrouted before it reaches the destination ASN.
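Since networks that drop RPKI-invalid routes would lose reachability selectively, it may be worth double-checking every announced prefix and origin pair (including any more-specifics, or an origin ASN different from the one in the ROA) against the route origin validation rules in RFC 6811. The sketch below is a simplified illustration of those rules; the `VRP` records are assumed to come from a validator export or the published ROA data, and all prefixes and ASNs shown are documentation placeholders.

```python
import ipaddress
from typing import NamedTuple


class VRP(NamedTuple):
    """A Validated ROA Payload: prefix, maxLength, and authorized origin ASN."""
    prefix: str
    max_length: int
    asn: int


def rov_state(route_prefix: str, origin_asn: int, vrps: list[VRP]) -> str:
    """Classify a route as "valid", "invalid", or "not-found" per RFC 6811.

    A VRP covers the route if the route prefix lies inside the VRP prefix.
    The route is valid if some covering VRP has the same origin ASN and a
    maxLength at least as long as the route prefix; invalid if it is covered
    but nothing matches; not-found if no VRP covers it.
    """
    route = ipaddress.ip_network(route_prefix)
    covered = False
    for vrp in vrps:
        net = ipaddress.ip_network(vrp.prefix)
        # subnet_of() raises TypeError across IPv4/IPv6, so compare versions first.
        if net.version != route.version or not route.subnet_of(net):
            continue
        covered = True
        if vrp.asn == origin_asn and route.prefixlen <= vrp.max_length:
            return "valid"
    return "invalid" if covered else "not-found"


if __name__ == "__main__":
    vrps = [VRP("192.0.2.0/24", 24, 64500), VRP("2001:db8::/32", 32, 64500)]
    print(rov_state("192.0.2.0/24", 64500, vrps))     # authorized origin, allowed length
    print(rov_state("2001:db8:1::/48", 64500, vrps))  # more-specific than maxLength allows
```

The second case shows a common pitfall: a ROA can make the covering prefix valid while a more-specific announced by the same ASN is invalid because of `maxLength`.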

What makes this issue particularly challenging is that it is intermittent and geographically selective. At times, the website becomes reachable again from previously affected networks without any intervention on my side. BGP update logs from route monitoring platforms show occasional path changes for my prefix, but no obvious withdrawals or major disruptions. Route flap damping, peering instability, or traffic engineering by upstream providers could be causing inconsistent path selection. However, without direct visibility into upstream routing policies, it is difficult to tell whether the problem lies in my provider's announcements, a transit provider's filtering rules, or a remote ISP's path preference decisions.
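To judge whether route flap damping is a plausible explanation, it can help to replay the observed path changes through the RFC 2439 penalty model: each flap adds a penalty, the penalty decays exponentially with a configured half-life, the route is suppressed above a suppress threshold and reused once it decays below a reuse threshold. The sketch below assumes a uniform penalty per event and uses commonly cited vendor default values; these are assumptions, since the parameters any given upstream actually uses are unknown, and the maximum suppress time is not modeled.

```python
import math


def damping_timeline(flap_times_min, penalty_per_flap=1000.0, suppress=2000.0,
                     reuse=750.0, half_life_min=15.0):
    """Simulate an RFC 2439-style penalty for one route.

    Thresholds are assumed (commonly cited defaults), not any upstream's config.
    Returns a list of (time_min, penalty, suppressed) after each flap.
    """
    penalty = 0.0
    suppressed = False
    last_t = None
    timeline = []
    for t in sorted(flap_times_min):
        if last_t is not None:
            # Exponential decay: the penalty halves every half_life_min minutes.
            penalty *= 2 ** (-(t - last_t) / half_life_min)
            # Decay is monotonic, so if the penalty dropped below reuse at any
            # point since the last flap, it is still below reuse now.
            if suppressed and penalty < reuse:
                suppressed = False
        penalty += penalty_per_flap
        if penalty > suppress:
            suppressed = True
        timeline.append((t, round(penalty, 1), suppressed))
        last_t = t
    return timeline


def minutes_until_reuse(penalty, reuse=750.0, half_life_min=15.0):
    """Minutes for `penalty` to decay to `reuse`: solve p * 2**(-t/h) = reuse."""
    if penalty <= reuse:
        return 0.0
    return half_life_min * math.log2(penalty / reuse)


if __name__ == "__main__":
    # Hypothetical timing: three path changes one minute apart.
    for entry in damping_timeline([0, 1, 2]):
        print(entry)
    print(minutes_until_reuse(damping_timeline([0, 1, 2])[-1][1]))
```

If an upstream were damping the prefix, reachability through it would return on its own once the penalty decayed, which would fit the self-healing behaviour described above; it remains only one candidate explanation.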

I have also considered whether asymmetric routing contributes to the issue. In some traceroutes the forward path appears normal, but it is unclear whether return traffic takes a different path that may be filtered or rate-limited. However, packet captures on the server show no incoming SYN packets from some of the reported client IP ranges. Since those SYNs never arrive, the loss occurs on the client-to-server path rather than only on the return path, which reinforces the suspicion that traffic is dropped before it reaches my ASN. Given that there are no firewall or application-layer restrictions on my end, the evidence increasingly points to an upstream routing inconsistency rather than a server configuration issue.
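The capture check can be made systematic by counting SYN source addresses per reported client range and listing the ranges that produced none. The sketch below assumes the SYN source IPs have already been extracted from a server-side capture into a list of strings, and that affected users' ranges are available as CIDR prefixes; the addresses shown are documentation placeholders.

```python
import ipaddress


def syn_counts_by_range(syn_sources, reported_ranges):
    """Count captured SYN source addresses that fall inside each reported range."""
    nets = [ipaddress.ip_network(r) for r in reported_ranges]
    counts = {str(net): 0 for net in nets}
    for src in syn_sources:
        addr = ipaddress.ip_address(src)
        for net in nets:
            # Membership is only meaningful within the same address family.
            if addr.version == net.version and addr in net:
                counts[str(net)] += 1
    return counts


def silent_ranges(syn_sources, reported_ranges):
    """Reported ranges from which no SYN reached the server interface."""
    counts = syn_counts_by_range(syn_sources, reported_ranges)
    return sorted(r for r, n in counts.items() if n == 0)


if __name__ == "__main__":
    syns = ["198.51.100.7", "198.51.100.9", "2001:db8::1"]
    reported = ["198.51.100.0/24", "203.0.113.0/24", "2001:db8::/32"]
    print(syn_counts_by_range(syns, reported))
    print(silent_ranges(syns, reported))
```

A list of silent ranges, together with the times of the capture, is the kind of concrete evidence an upstream NOC can match against its own routing tables.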

I am seeking guidance from the RIPE community on effective ways to further diagnose and isolate this routing inconsistency. Specifically, I would appreciate advice on identifying selective prefix filtering, detecting partial route propagation, analyzing AS path anomalies, and verifying whether upstream transit providers are inadvertently affecting reachability. Recommendations on best practices for coordinating with upstream providers, using RIPE RIS data, or validating route propagation across multiple collectors would also be extremely valuable. My ultimate goal is consistent global reachability for my website's prefix, without intermittent routing failures affecting specific networks or regions. Apologies for the long post!
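For checking propagation across RIS collectors, one option is the RIPEstat Data API, whose calls follow the URL pattern `https://stat.ripe.net/data/<call>/data.json?resource=<resource>` and wrap results in a top-level `data` member. The sketch below fetches a call and then, for each collector, counts which AS hands the route to the origin (the upstream transit as seen from that collector). The response shape assumed by `upstreams_by_collector` (`rrcs`, `rrc`, `peers`, `as_path`) is modeled on the `looking-glass` call but is an assumption to verify against a live response; the origin ASN in the example is a placeholder.

```python
import json
import urllib.parse
import urllib.request
from collections import Counter

RIPESTAT_URL = "https://stat.ripe.net/data/{call}/data.json?{query}"


def fetch_ripestat(call, resource):
    """Fetch one RIPEstat Data API call and return its "data" member."""
    query = urllib.parse.urlencode({"resource": resource})
    url = RIPESTAT_URL.format(call=call, query=query)
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.load(resp)["data"]


def upstreams_by_collector(lg_data, origin_asn):
    """Count, per collector, the AS adjacent to the origin in each peer's path.

    ASSUMED input shape (verify against the real response):
    {"rrcs": [{"rrc": "RRC00", "peers": [{"as_path": "64496 64510 64500"}]}]}
    An empty Counter means no peer at that collector carried the route.
    """
    result = {}
    for rrc in lg_data.get("rrcs", []):
        counter = Counter()
        for peer in rrc.get("peers", []):
            path = [int(t) for t in str(peer.get("as_path", "")).split() if t.isdigit()]
            # Strip origin prepends so the hop before the origin is the upstream.
            while len(path) > 1 and path[-1] == origin_asn and path[-2] == origin_asn:
                path.pop()
            if len(path) >= 2 and path[-1] == origin_asn:
                counter[path[-2]] += 1
        result[rrc.get("rrc", "?")] = counter
    return result
```

Usage would look like `upstreams_by_collector(fetch_ripestat("looking-glass", "192.0.2.0/24"), 64500)`. Collectors where the route is missing, or reaches peers only through an unexpected upstream, are good candidates to compare against the affected users' locations.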

Is there anyone who can guide me?