Probe suddenly claiming it was offline for almost a week

Probe 61817 seemed to work fine, but suddenly I see it being offline for most of the last week (shows online percent as 9.87%, when yesterday it showed ~99.9%).

The status page now shows it mostly offline since 2025-01-20 13:49:30.

The probe also shows offline despite sending pings etc.

Is something off on the backend side that caused some data loss?

It now seems to show the probe as back online, but it claims to have been offline even longer (i.e., a segment that earlier showed as connected has disappeared).

This is a bit of a blind spot for us for now: in some (hopefully increasingly rare) cases the infrastructure has a hard problem and therefore cannot report disconnect events to us. Once probes reconnect, this is reported, but we can only make educated guesses about when the previous connection period ended. This leaves a gap in the connection logs. We have some ideas on how to improve on this, but at this time I can only apologise, because I cannot offer a foolproof solution.

I had a similar situation. In January 2025, probe 62520 suddenly and RETROACTIVELY showed as having been offline for a week at the end of December 2024. This is the first time I have seen something like this.

Later that month, it showed as offline for 2 days on Jan 27, 2025 (in real time, not retroactively this time), but this was a false positive. Probe 61107 (same AS2027, same network) was OK, so there was no network issue. Probes frequently appear disconnected for a couple of minutes (up to 20-30 min), but false positives lasting several days are really weird. Also, this f***s up the stats.

Do you have a clue what could have happened?

The same here

The probe suddenly reported a gap of 6 days without any prior notice, but the traffic and built-in charts look OK for the last week. :face_with_raised_eyebrow:

My probe 1007887 has also had the problem: not one week, but more than a day instead of a few hours (caused by the provider).
“Uptime History” should be renamed to “Estimated Uptime History”. It is a little sad to care about good uptime when the log doesn’t reflect it.

Mine just did this too, down for about 10 minutes but it now says it was down for 22 days. That’s pretty bad.
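
To put numbers on that, here is a minimal sketch of how much a mis-recorded gap changes the uptime percentage. It assumes a 30-day reporting window, which is my assumption (the thread doesn't say which window the status page uses), and `uptime_percent` is a hypothetical helper, not part of any RIPE Atlas tooling.

```python
from datetime import timedelta


def uptime_percent(window: timedelta, offline: timedelta) -> float:
    """Share of the window spent connected, as a percentage."""
    # Offline time cannot exceed the window itself.
    offline = min(offline, window)
    return 100.0 * (1 - offline / window)


# Assumed reporting window; not taken from the thread.
window = timedelta(days=30)

# Roughly 10 minutes of real downtime vs. the 22 days that were recorded.
actual = uptime_percent(window, timedelta(minutes=10))
recorded = uptime_percent(window, timedelta(days=22))

print(f"actual: {actual:.2f}%  recorded: {recorded:.2f}%")
```

With these assumptions, the same outage lands either very close to 100% or well under a third of the window, depending only on where the end of the previous connection period is placed.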

Same with probe ID 1009707. The probe was down for about 1 hour, but now it says it was down for 11 days (1 week 4 days). Is the problem caused by the server-side infrastructure, or was there a problem with the RIPE Atlas controller?

Today I installed some Debian updates on all my software probes, including a reboot. Instead of a disconnect time of a few minutes, I lost hours and days again. :scream:

Later that day, all probes were affected. :enraged_face:

Probes frequently appear disconnected for a couple of minutes (up to 15-30 min, sometimes up to several hours) and then lose several days or weeks of recorded uptime.

This is my probe:

  • Probe ID 1009313 - down for over 5 minutes, but now it says it was down for 5 days.

Sister probes:

  • Probe ID 1010000 - down for several minutes, but now it says it was down for 14 days (2 weeks).
  • Probe ID 64654 - down for several seconds, but now it says it was down for 4 days.

Other probes/anchors (not mine) are also affected:

  • Probe ID 7470 - down for several seconds, but now it says it was down for 9 days (1 week 2 days).
  • Probe ID 21165 - down for almost 10 minutes, but now it says it was down for 5 days.

Hi all,

Update on this: we can see that this is an issue and we’re actively working on a solution.

I have applied a fix to probes 1009313, 1010000, 64654 to fill in the connection history. Can you check to see if it’s now how you expect?

Thanks for the explanation.

After a few days of downtime and no further intervention on my part, the probe came back up.

For this probe (1009313) this is the connection log for the past few weeks:

Internet Address  Controller    Connected (UTC)      Connected for  Disconnected (UTC)   Disconnected for
112.202.96.244    ctr-dub-sw01  2025-03-19 04:45:22  20h 30m        Still Connected
112.202.112.38    ctr-dub-sw01  2025-03-18 20:06:04  8h 39m         2025-03-19 04:45:10  0h 0m
112.202.112.38    ctr-dub-sw01  2025-03-18 03:24:26  16h 41m        2025-03-18 20:05:52  0h 0m
112.202.112.38    ctr-dub-sw01  2025-03-17 22:56:55  4h 27m         2025-03-18 03:24:17  0h 0m
112.202.112.38    ctr-dub-sw01  2025-03-16 19:03:47  1d 3h 52m      2025-03-17 22:56:44  0h 0m
112.202.112.38    ctr-dub-sw01  2025-03-15 16:08:48  1d 2h 54m      2025-03-16 19:03:42  0h 0m
112.202.112.38    ctr-dub-sw01  2025-03-09 19:04:00  5d 19h 33m     2025-03-15 14:37:51  1h 30m

The remaining probes (e.g. 1010001) are still not fixed:

Internet Address  Controller    Connected (UTC)      Connected for  Disconnected (UTC)   Disconnected for
112.202.102.169   ctr-dub-sw02  2025-03-16 03:21:21  3d 22h 22m     Still Connected
112.202.105.166   ctr-dub-sw02  2025-02-22 03:06:56  6d 15h 55m     2025-02-28 19:02:03  15d 8h 19m
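
The "Connected for" and "Disconnected for" figures can be cross-checked from the timestamps alone. Below is a minimal sketch using the two rows above for probe 1010001. `fmt_duration` is a hypothetical helper, and rounding down to whole minutes is an assumption that happens to match the figures shown in the log.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S"


def fmt_duration(start: str, end: str) -> str:
    """Format the time between two UTC timestamps as 'Xd Yh Zm'."""
    delta = datetime.strptime(end, FMT) - datetime.strptime(start, FMT)
    # Assumption: seconds are dropped (rounded down to whole minutes).
    minutes = int(delta.total_seconds()) // 60
    days, rem = divmod(minutes, 24 * 60)
    hours, mins = divmod(rem, 60)
    return f"{days}d {hours}h {mins}m" if days else f"{hours}h {mins}m"


# Older row: connected 2025-02-22 03:06:56, disconnected 2025-02-28 19:02:03.
connected_for = fmt_duration("2025-02-22 03:06:56", "2025-02-28 19:02:03")

# Gap between that disconnect and the next connect on 2025-03-16 03:21:21.
disconnected_for = fmt_duration("2025-02-28 19:02:03", "2025-03-16 03:21:21")

print(connected_for, "/", disconnected_for)
```

So the 15-day gap shown is simply the time between the recorded disconnect and the next recorded connect; if the recorded disconnect time is only an estimate, the whole gap inherits that error.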

So hopefully this can be fixed for all of the probes.

Thank you to whoever did the behind-the-scenes work!

I have applied the fix for #1010001. Over the course of today a similar fix will be applied to all remaining probes.

This issue should now be resolved for all probes. We are putting some monitoring in place to ensure that it doesn’t reoccur. Apologies for the inconvenience.
