Probe keeps disconnecting and reconnecting with no changes made

Since today, my probe keeps disconnecting and reconnecting to the Atlas network.
FW logs do not show any blocks, and in any case, no changes were made these past few days.
The USB storage has been replaced several times without any effect…

Hi, you’re not the only one. Since this morning, my probe #62520 on AS2027 has been connecting and disconnecting every 10-20 minutes or so. It’s not linked to a specific AS: I have another probe on the same AS (#61107) which is working fine, but other probes on that AS on the other side of France also have the issue.

We can rule out FW issues. I have checked and, as you stated, there are no drops whatsoever (and the probe is reachable from other local VLANs - at least it responds to ping).

On the global probe map, I saw that some probes in my region and elsewhere have shown similar behaviour since this morning (22/08/2024). A few examples:
probe 62520 - AS2027 (my probe)
probe 17151 - AS12322
probe 60488 - AS12322
probe 17132 - AS15557
probe 61139 - AS2027
etc…
and many others in my physical area.

So that seems to be a global issue.

Thank you for the confirmation.
Hopefully the team is aware of this and is working on it…

I bet they are, but just in case, I dropped them a mail 2 minutes ago with a link to this post as well.

So for now, there’s nothing we can do except patiently watch our uptimes go down the crapper.

Let’s hope for the best…

Which country are you from?
For the moment, I can confirm issues in France (probes listed in my first post), but also Luxembourg:
probe 13005 - AS2602 impacted - but Anchor 6249 on same AS seems OK (even though it did disconnect once today)
probe 13066 - AS6661
anchor 6919 (!!) - AS210834
and many others in LU as well.

Germany impacted as well:
probe 25341
probe 50918
etc

OK I’ll stop here, I think we can say with confidence that this is definitely a global issue.

I’m in the US and having the same problem. Probe 24725.

Richard

apparently they are aware of an issue that is preventing probes from connecting to controllers:

You can add Israel to the list.

Please note we have an entry about this on the RIPE NCC status page: https://status.ripe.net/

Hi robertk,

Thanks for the link (in fact I didn’t even know there was a status page - so thanks!)

Is the root cause of the issue confidential? If not, I’m sure many geeks here would really like to understand what’s going on, and why only certain probes are impacted. I think I can speak for all of us: we are aware this is not a paid service, so there’s no SLA involved anyway, but from the technical side it’d be interesting to understand what’s happening.

No secrecy here, we’ll publish relevant details in a post mortem once the issue is resolved.

If that helps… probe #62520 has been stable for 5+ hours now.

Mine was stable for almost 7 hours but lost connection an hour ago.

Same here. It was stable for a couple of hours today and now it’s still stable, but in a disconnected state! :laughing:

I really hope the RIPE team will be able to correct the uptime or reset the probes to zero afterwards, otherwise it doesn’t even make sense to keep the affected probes running as the data is now completely inaccurate.

Same here. Connection was stable for some time but now in disconnected state.

Hi all…

My first message here.

Same issue here: my probe has been disconnecting for no reason since yesterday.

thx

We need to be patient. robertk (from the RIPE team) has confirmed there is a known global issue and that they are working on it, and so has support (reply to my ticket):

Thank you for getting in touch with us.
Correct, please note that we are currently experiencing some issues which are making some Atlas probes disconnect.
We have identified the cause and are currently working on this internally. We are sorry for any inconvenience this has caused.
Should you have any questions, please do let us know. Have a nice day.

According to the status page, the issue has been solved.

I still had 3hrs of downtime yesterday.

Do you have some insights that you can share with the community about what happened?

Are you planning to delete the wrong downtimes from the database on the impacted probes?

Even though everything is green on status.ripe.net, the probes still disconnect and reconnect all the time:
today 14 disconnects 1007887 (sw)
today 13 disconnects 1000568 (sw)
today 16 disconnects 1000234 (sw)
today stable 53236 (hw)

Is there a special problem with sw-probes? Should we update?