Atlas probe v5 suddenly died

Pinging the probe from the fiber router it is directly connected to works flawlessly (about 0.8 ms per ping, 0% loss). Pinging it from a PC on the LAN side of the same router, or from a mobile phone connected to the router's built-in Wi-Fi hotspot, works just as well. DHCP also works 100% of the time. According to the router, the probe has been connected 100% of the time, except when I manually unplugged it and plugged it back in via the USB power cable. I also tried rebooting my internet router and checked its security/firewall settings: all is clean, nothing is blocked from the probe, there are no spurious NAT rules, no MAC address restrictions and no time-of-day restrictions, and the router's own log files show no alert that could have forced it to apply a restriction.
Is there an IP address for the server it’s supposed to connect to that I can ping (or “webping” with HTTP(S)/STAT) from a PC on the LAN side?
So maybe the probe has not been able to find an IP route to your server since May 15 (after a visible reboot you performed that day, it worked for about 3 hours and then stopped). My guess is that you've placed the probe on the wrong path: is it incorrectly configured to connect to one of your proxies intended only for use from Russia/Belarus or from Asia, so that your server never sees the connection because of a bad route? Or was there a problem when my account was reconfigured, causing an error in your database?
Also I’m not aware of any major internet routing disruption on May 15.
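One way to answer the reachability question from a LAN PC without relying on ICMP is to attempt a plain TCP handshake to the server's address and port. This is a minimal sketch, assuming Python 3 on the PC; the host and port are placeholders you would fill in, since the actual controller address would have to come from RIPE NCC. A completed handshake only shows that a route exists and something is listening; it says nothing about whether the service accepts the probe.

```python
import socket


def tcp_reachable(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP handshake to (host, port) completes within `timeout` seconds.

    `host` and `port` are placeholders: the real controller address is not given
    in this thread and would have to be supplied by RIPE NCC.
    """
    try:
        # create_connection resolves the name and tries each returned address in turn
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # DNS failure, connection refused and timeouts are all OSError subclasses
        return False
```

For example, `tcp_reachable("<controller-host>", <port>)` run from the LAN side would distinguish "no route / filtered" (returns `False` after the timeout) from "reachable" (returns `True`).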

Your colleague just asked me to follow the suggestions displayed on

However, most of them do not apply: the v4 probe has NEVER had a USB flash stick, only a USB power connector and an RJ45 Ethernet port.

  • The two external LEDs on the Ethernet port flash irregularly, each time very briefly.
  • The two internal LEDs (visible through the air vents, on the internal motherboard near the USB port) are constantly on.
  • The probe is connected to the router, has an IP address assigned, and responds to ICMP pings from the fiber router, from the LAN and from the Internet.

Apparently it cannot upgrade its firmware as intended, or cannot reach your download server (or mirror), which may be inactive or unreachable from France. Have you configured it to connect to a download server in Azerbaijan? I know that some locations in Azerbaijan have not been reachable from France since May (this also affected other internet services, such as OpenStreetMap tile image servers, which had to be relocated because of a damaged link passing through the war zone in Ukraine/Russia). Some of your mirror servers may be affected by missing routes.

My ISP is AS5410; it has excellent upstream peering links to 3 GIXes in France, plus additional links in the Netherlands, Italy and 2 in the US (the link through the UK has been closed, possibly because of EU regulations). But some traffic has to be redirected. I think this is a problem in your own domain zone, in the peerings of the service providers hosting your mirrors, or in the domains/subdomains you've subscribed to for hosting this service. In that case I should not be alone, and indeed, looking at the current stats, it seems that all probes hosted on my ISP (AS5410) have also been inactive since May: there's probably a missing route between my ISP and the service provider hosting your firmware updates.

The router is not misconfigured and shows no sign at all of blocking any traffic.

I also note that on the day just before this occurred (May 14) there was a sudden huge drop in ping round-trip time to the DNS root servers, immediately followed by a return to normal; then, a few hours later, all round-trip times increased by about 20 ms for a few hours before returning to normal (for most DNS root servers, except server K, which is now failing most of the time even though it was the preferred and fastest one before May). This seems related to a major reconfiguration of international routing (and I suspect it is an effect of the UA-RU war).

Interestingly, the “K” root server is the one operated by you, RIPE. Normally it should work just as well for my ISP, which has colocations in Paris, Lyon and Marseille (RIPE has colocation in 2 of them in France) as well as in the Netherlands. From my PC I now detect no problem with the K root servers; times are normal (and in fact better than what your current online measurements indicate). My ISP (like other major providers such as Google, Facebook, Level3, Cloudflare, Microsoft…) seems to have correctly handled the outage that occurred on May 14 on a link to Azerbaijan by finding alternate routes, but the providers for some of your download servers may still be affected and no longer reachable (you may need to check your DNS records: maybe one critical entry is defective and was not updated to route the services you need, and no alternate backup route is available via the remaining entries).

Hello @igormp and @verdyp,

I’ve requested some additional logging information in the hope I can find out a bit more about probes 54080 and 60623.

@verdyp, the blink pattern you describe on a v4 probe indicates a successful connection to the backend.

I cannot speak to the K root, unfortunately; my focus is on the probes.

Will let you know as soon as I have any further information.

Regards,

Michel


Which kind of “log” are you requesting? There’s no known interface where I can see any log from the probe.
Maybe you have access to it, since it is reachable at its IP address and replies to pings from the Internet (both ICMPv4 and ICMPv6). Maybe you have internal access, but I could not detect any working telnet, HTTP or SSL port with a visible prompt. I suppose the private interface sends no prompt at all and only accepts incoming connections initiated by you, with specific credentials and request data.

You said that the irregularly blinking Ethernet LEDs indicate that it is connected to your backend, but then your backend does not forward the request to your actual server, so this could be a misconfiguration of your firewall, or some misconfiguration in the user database over the last 3 months.

Or the probe missed a required firmware update in May, and the new firmware it needs is not available on the server that was previously configured, which you may have shut down and replaced (hoping that all probes configured on it had already applied the update pointing them to a newer firmware server before you permanently shut down the old one; this shutdown may have come too soon, possibly because you saw my probe connecting “successfully” in May while it was still running the older firmware, which is now no longer compatible with newer Atlas requirements).

It's very likely that for 3 months the probe has been desperately trying to perform its firmware update at boot time (from a server that no longer works) before running any other Atlas test. The last successful connection to Atlas was in May; it worked for only 3 hours, then stopped completely and was rejected, and every reboot since then fails to fetch the new firmware it needs (and there is no longer any redirection from the older address/domain/URL that was configured and used before May).

Anyway, I once again power-cycled the probe, and nothing has changed: it is still shown as offline. Yet it still replies to IPv4 and IPv6 ICMP pings from the LAN, from the ISP router it is connected to, and from remote hosts on the net, and it also successfully retrieves its IPv4 and IPv6 addresses via DHCP from the router, where it is shown as connected with addresses exactly identical to those shown on my Atlas probe online account.

Other possibilities are that:

  • the probe itself received a bad firmware update, or the update failed to apply (with no way to fall back to a recovery mode to reapply it, or the recovery part of the firmware is not functional).
  • the probe was compromised by an external attack from the net, which obtained root access and corrupted the internal configuration data (it would then be interesting for you to get it back for analysis, to see what was corrupted).
  • the probe has partial damage in its flash memory, making some parts of the firmware unreadable, or in its internal RAM (it boots successfully but

Online, you indicate that it was “abandoned” on May 15 around 11:00 UTC; the last recorded measurements were successful a few minutes before that. However, the probe is shown as having successfully connected to the backend an hour later, around 12:00 UTC, and then remained visible for 3 hours, until about 15:00 UTC. It rebooted automatically at midday and at 15:00 UTC (there was no power or internet outage at my home that day).

If I send the probe back to you in the Netherlands, the USB power cable that came with it will come too: it cannot be removed, as it is apparently physically soldered to the micro-USB connector, indicating that at some point too much current passed through a power pin. This is not new: it happened about 2 years ago (I reported it at the time), but the probe kept working until May 15.

The USB-A connector (on the power adapter) is normal and shows no visible damage, and there is no damage on the Ethernet port either.

What do I need to return to you? I have the black probe with the black USB power cable now permanently attached to it, the small USB power adapter, and the short RJ45 Ethernet cable.

Hi,

We only need the probe. No need to send either the power adapter or the ethernet cable back.

One last thing you could try to do: connect it to another internet connection and see if that brings it back.

Firmware updates are not the issue here; the probe is running the latest firmware. We see it connect to our registration server and get assigned a controller, but it never reaches that controller, which, assuming it's not a network issue, might indicate a problem with the internal flash.

We would very much like the probe returned so we can see what the issue was and perhaps avoid it for the future.

Kind regards,
Johan

Hello, I got my hands on a new probe v5 (ID: 64729) and I am seeing the same condition. I have a v3 probe (ID: 54620) and a software probe (ID: 1008755) connected to the same router but on different ASNs working fine.

Is this issue specific to v5 probes?

I'm having a similar issue. I upgraded my firewall this morning without issue, but the Atlas Probe v5 is not reconnecting and has shown as offline for over 8 hours now.

I can ping it and I see DNS & NTP traffic in ZenArmor.

Probe #61467

Edit: The probe just randomly reconnected. If someone fixed it on the backend, thank you!

My hardware probes now show as connected but not able to make any measurements. This is definitely not an issue with my network.

Now I am having issues with software probes as well, for example 1008768, which I created last night but which has never connected once.

I'll send the probe (with the USB cable, which cannot be detached because it is visibly soldered to the connector; trying to remove it by force would certainly break the connector or the internal motherboard).
You can see that it runs the correct firmware and that it connects to the registration server (so I doubt it is a firmware installation problem or a network issue). Can't you open a remote session on it to get its internal logs, or some of its internal configuration variables and profiles?
If it can't reach the controller with this firmware and configuration data, maybe the controller is not reachable from my location: a missing internet route from my ISP, or possibly a problem in your DNS, your firewalls, your own routers, or the routers of your provider (if you use Cloudflare, it has very frequent downtimes). If my IP address is blocked somewhere in your firewall or by your hosting providers, I am not aware of it (maybe it was blacklisted for an unknown reason, possibly by an overly broad or misconfigured blacklist covering too many IP addresses, or because of specific routing requirements, broken/expired certificates for secure sessions, or unsupported security/encryption algorithms: maybe a required certificate is now missing from the probe firmware or its supplemental data).
Why not include a small HTTP server that could show basic status information, or trigger a firmware reset or a reset of the configuration data, in case the firmware was badly reconfigured by the last update on May 15, which left it working for just 3 hours and never again afterwards?

Hey camin,

Not sure if I should just start my own thread or not. My Probe (61467) was rock solid until I upgraded my firewall yesterday morning, now it seems to be up but is having connection issues. I can ping it on my LAN and I see some traffic (mainly DNS & NTP, some pings). It was offline for about 10 hours yesterday, then randomly came back up and suddenly dropped again about 15 minutes ago. My Internet has been fine since the firewall upgrade.

Aaand it’s back up. Not sure what’s going on, guess I’ll just wait it out if it goes offline again.

Hi there,
my probe 61103 seems to be affected, too. I can see DNS and ICMP traffic, but no response from the other side. Interestingly, it has been down for almost the same time as @milindhvijay's probe 64729.
However there was a firewall block for domain onclkds . com coming from the probe - can this be true and/or relevant?

Just a heads up that the same probe is acting weird today, with lots of disconnects and reconnects (even though the actual network connection is rock solid).

At the moment of writing this message it’s saying it’s disconnected, even though internet is working and it somehow is sending those SOS DNS messages without issues to your backend.

I thought I was the only one having problems! My probe has been disconnecting for the last 24 hours or so. I tried a new USB drive, a different switch, etc. Then I read that it could be an infrastructure problem? My probe is 24725. Thank you.

Richard Bejtlich

I am also having problems with probe 63034. My ISP went offline for around 2 minutes at 1 am EST and my device hasn't come back online. I tried rebooting it, but nothing. At the moment I have a red light that is on for 1 second and off for 1 second, continuously. It gets an IP address via DHCP and I can ping the probe locally.

We are aware of an ongoing problem, and have an entry about this on the RIPE NCC status page: https://status.ripe.net/


Did you receive the RIPE probe I sent last week by letter to the address in the Netherlands you gave me? I can't follow the tracking from the French postal service after it crossed the French border, so I don't know which service is delivering it. Maybe it's still held by international customs (who may have opened the bubble envelope, which contained a small protective carton with the probe and USB cable inside). I used an official French postal envelope bought at a post office.

Your announcement posted here on Friday comes too late. My probe has been unable to connect for 3 months; I reported that, and it was then marked as abandoned. You gave me a postal return address and I sent it to the Netherlands last week. Your last attempt to change the server it should connect to still failed after 24 hours of tests with your modified configuration (even though you noted that it was connecting to your frontend, it failed to go any further, although you said the firmware version was apparently OK).
There are not many details, but my opinion is that the firmware is missing a certificate needed for secure connections (SSL or TLS), or includes an obsolete/expired certificate with no valid alternative for the target domain, so the connection has simply been rejected by your servers since May 15. It worked for only 3 hours after the last firmware update; then either your servers rejected the old certificate it presented, or the firmware dropped it automatically after expiration, possibly because of an incorrect date/time setting in the probe or a failure to set the time from a reachable NTP server.
You did not give any detail about the cause of this failure.
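The clock part of the certificate hypothesis above can be illustrated with a tiny check: X.509 validation rejects a certificate when the local clock falls outside its notBefore/notAfter window, so a device whose clock was never set from NTP can reject a perfectly good certificate. This is a minimal Python sketch, not RIPE Atlas code; the dates are made up for illustration, and the thread never confirmed that certificates or clocks were the actual cause.

```python
from datetime import datetime, timezone


def validity_status(not_before: datetime, not_after: datetime, clock: datetime) -> str:
    """Classify a certificate's validity window as seen from a given (possibly wrong) clock."""
    if clock < not_before:
        return "not yet valid"  # e.g. a device clock that fell back to the Unix epoch
    if clock > not_after:
        return "expired"
    return "valid"


# Hypothetical certificate window, for illustration only
NOT_BEFORE = datetime(2023, 1, 1, tzinfo=timezone.utc)
NOT_AFTER = datetime(2024, 1, 1, tzinfo=timezone.utc)

# A correctly synchronized clock vs. one that was never set from NTP
print(validity_status(NOT_BEFORE, NOT_AFTER, datetime(2023, 5, 15, tzinfo=timezone.utc)))  # valid
print(validity_status(NOT_BEFORE, NOT_AFTER, datetime(1970, 1, 1, tzinfo=timezone.utc)))   # not yet valid
```

The same certificate is judged differently depending only on the clock, which is why a failed NTP sync and an expired certificate can produce the same symptom.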

Anyway, I no longer have this probe; it's now in the Netherlands, and you should have received it. You should be able to use your debug tools from the USB or Ethernet port. Don't try to remove the USB cable: it has apparently been permanently soldered into the USB port for several years, but the probe was still working properly up to May 15, 15:00 UTC, so I sent you the black probe box with its USB cable still attached. However, the package I sent does not include the small Ethernet cable or the small USB power adapter, to reduce the mailing cost: this let it fit in a standard bubble envelope with an additional internal protective carton, for a total weight below 90 grams.

Note that I had repeatedly posted on this forum from the first few days after I saw it was “disconnected”. However, the “community” incorrectly flagged those posts as “inappropriate” (even though they were perfectly on topic, not harassing anyone, and not violating any rules; a couple of them were restored by RIPE admins a few days ago). So my messages here went unnoticed by RIPE for 3 months. I don't know how you monitor “community” actions when your contributors are simply using the communication tools as intended.

Issue still seems to be going on:

Funny to see it only being connected 5 minutes at a time, then going back to disconnected:

As always, SOS packets are still being sent even though it says it’s disconnected, go figure:

Hi Philippe,

As one of the forum admins, I took a closer look at your flagged posts. They were flagged due to some of our customised forum settings. I’m aware that it showed up as a community flag - we had rather strict settings that have been changed in light of your posts being flagged, since they were, as you rightly point out, not inappropriate.

Kind regards,

Ulka