SW Probe can't connect to ctr-dub-sw01.atlas.prod.ripe.net

Without any changes on my end, my SW probe started seeing a lot of disconnects over the past couple of weeks.

This probe had been running for almost 2 years without any issues. In an attempt to fix the problem, I tried to re-install the Atlas probe with its latest version. Starting the atlas-probe service for the first time, I am presented with the errors below.

Reboot count is now 1
RESULT 9000 done 1729692377 d83add1a7875 STARTING ATLAS system initialized (reboot count 1)
RESULT 9000 done 1729692377 d83add1a7875 STARTING TELNETD LOCALLY
eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 192.168.1.215  netmask 255.255.255.0  broadcast 192.168.1.255
        inet6 fd71:3312:3b0a:f74f:93a9:9881:f1b8:8e85  prefixlen 64  scopeid 0x0<global>
        inet6 fe80::2a66:f254:a9c5:4d2d  prefixlen 64  scopeid 0x20<link>
        ether d8:3a:dd:1a:78:75  txqueuelen 1000  (Ethernet)
        RX packets 2379  bytes 573627 (560.1 KiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 1024  bytes 284625 (277.9 KiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

lo: flags=73<UP,LOOPBACK,RUNNING>  mtu 65536
        inet 127.0.0.1  netmask 255.0.0.0
        inet6 ::1  prefixlen 128  scopeid 0x10<host>
        loop  txqueuelen 1000  (Local Loopback)
        RX packets 389  bytes 51822 (50.6 KiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 389  bytes 51822 (50.6 KiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

RESULT 9006 done 1729692377 d83add1a7875 no reginit.vol start registration
/run/ripe-atlas/status/reginit.vol does not exist try new reg
Ping failed
start reg
ATLAS registration starting
REASON_FOR_REGISTRATION NEW NO previous state files
REGHOSTS reg03.atlas.ripe.net 193.0.19.246 2001:67c:2e8:11::c100:13f6 reg04.atlas.ripe.net 193.0.19.247 2001:67c:2e8:11::c100:13f7
ssh -p 443 atlas@193.0.19.247 INIT
Got good controller info
check cached controller info from previous registration
NO cached controller info. NO REMOTE port info
Do a controller INIT
Controller init -p  443 atlas@ctr-dub-sw01.atlas.prod.ripe.net  INIT
255 controller INIT exit with error
condmv: not moving, destination '/var/spool/ripe-atlas/data/out/simpleping' exists
RESULT 9006 done 1729692590 d83add1a7875 no reginit.vol start registration
/run/ripe-atlas/status/reginit.vol does not exist try new reg
Ping failed
start reg
ATLAS registration starting
registration info is still valid till 1729695979, now 1729692590
check cached controller info from previous registration
NO cached controller info. NO REMOTE port info
Do a controller INIT
Controller init -p  443 atlas@ctr-dub-sw01.atlas.prod.ripe.net  INIT
255 controller INIT exit with error
condmv: not moving, destination '/var/spool/ripe-atlas/data/out/v6addr.txt' exists
condmv: not moving, destination '/var/spool/ripe-atlas/data/out/simpleping' exists
RESULT 9006 done 1729692797 d83add1a7875 no reginit.vol start registration
/run/ripe-atlas/status/reginit.vol does not exist try new reg
Ping failed
start reg
ATLAS registration starting
registration info is still valid till 1729695979, now 1729692797
check cached controller info from previous registration
NO cached controller info. NO REMOTE port info
Do a controller INIT
Controller init -p  443 atlas@ctr-dub-sw01.atlas.prod.ripe.net  INIT
255 controller INIT exit with error
condmv: not moving, destination '/var/spool/ripe-atlas/data/out/v6addr.txt' exists
condmv: not moving, destination '/var/spool/ripe-atlas/data/out/simpleping' exists
Moving reboot-count.txt
RESULT 9006 done 1729693012 d83add1a7875 no reginit.vol start registration
/run/ripe-atlas/status/reginit.vol does not exist try new reg
Ping failed
start reg
ATLAS registration starting
registration info is still valid till 1729695979, now 1729693012
check cached controller info from previous registration
NO cached controller info. NO REMOTE port info
Do a controller INIT
Controller init -p  443 atlas@ctr-dub-sw01.atlas.prod.ripe.net  INIT
255 controller INIT exit with error
condmv: not moving, destination '/var/spool/ripe-atlas/data/out/v6addr.txt' exists
condmv: not moving, destination '/var/spool/ripe-atlas/data/out/simpleping' exists
RESULT 9006 done 1729693219 d83add1a7875 no reginit.vol start registration
/run/ripe-atlas/status/reginit.vol does not exist try new reg
Ping failed
start reg
ATLAS registration starting
registration info is still valid till 1729695979, now 1729693219
check cached controller info from previous registration
NO cached controller info. NO REMOTE port info
Do a controller INIT
Controller init -p  443 atlas@ctr-dub-sw01.atlas.prod.ripe.net  INIT
255 controller INIT exit with error
condmv: not moving, destination '/var/spool/ripe-atlas/data/out/v6addr.txt' exists
condmv: not moving, destination '/var/spool/ripe-atlas/data/out/simpleping' exists
RESULT 9006 done 1729693435 d83add1a7875 no reginit.vol start registration
/run/ripe-atlas/status/reginit.vol does not exist try new reg
Ping failed
start reg
ATLAS registration starting
registration info is still valid till 1729695979, now 1729693436
check cached controller info from previous registration
NO cached controller info. NO REMOTE port info
Do a controller INIT
Controller init -p  443 atlas@ctr-dub-sw01.atlas.prod.ripe.net  INIT
255 controller INIT exit with error
condmv: not moving, destination '/var/spool/ripe-atlas/data/out/v6addr.txt' exists
condmv: not moving, destination '/var/spool/ripe-atlas/data/out/simpleping' exists
RESULT 9006 done 1729693659 d83add1a7875 no reginit.vol start registration
/run/ripe-atlas/status/reginit.vol does not exist try new reg
Ping failed
start reg
ATLAS registration starting
registration info is still valid till 1729695979, now 1729693659
check cached controller info from previous registration
NO cached controller info. NO REMOTE port info
Do a controller INIT
Controller init -p  443 atlas@ctr-dub-sw01.atlas.prod.ripe.net  INIT
255 controller INIT exit with error
condmv: not moving, destination '/var/spool/ripe-atlas/data/out/v6addr.txt' exists
condmv: not moving, destination '/var/spool/ripe-atlas/data/out/simpleping' exists
RESULT 9006 done 1729693868 d83add1a7875 no reginit.vol start registration
/run/ripe-atlas/status/reginit.vol does not exist try new reg
Ping failed
start reg
ATLAS registration starting
registration info is still valid till 1729695979, now 1729693868
check cached controller info from previous registration
NO cached controller info. NO REMOTE port info
Do a controller INIT
Controller init -p  443 atlas@ctr-dub-sw01.atlas.prod.ripe.net  INIT
255 controller INIT exit with error
condmv: not moving, destination '/var/spool/ripe-atlas/data/out/v6addr.txt' exists
condmv: not moving, destination '/var/spool/ripe-atlas/data/out/simpleping' exists
RESULT 9006 done 1729694073 d83add1a7875 no reginit.vol start registration
/run/ripe-atlas/status/reginit.vol does not exist try new reg
Ping failed
start reg
ATLAS registration starting
registration info is still valid till 1729695979, now 1729694074
check cached controller info from previous registration
NO cached controller info. NO REMOTE port info
Do a controller INIT
Controller init -p  443 atlas@ctr-dub-sw01.atlas.prod.ripe.net  INIT
255 controller INIT exit with error
condmv: not moving, destination '/var/spool/ripe-atlas/data/out/v6addr.txt' exists
condmv: not moving, destination '/var/spool/ripe-atlas/data/out/simpleping' exists

Hi. Would you be able to share your probe ID so I can take a closer look?

Also, can you tell me if running this command manually gives you a “Permission denied” (it should):

ssh -p 443 atlas@ctr-dub-sw01.atlas.prod.ripe.net

Hey Camin, my probe ID is 1006073.

Running the command manually, I get:

ssh -p 443 atlas@ctr-dub-sw01.atlas.prod.ripe.net
kex_exchange_identification: read: Connection reset by peer
Connection reset by 34.248.40.103 port 443

Also, one thing I’ve noticed: the error message says Ping failed, which is true. If I try to ping manually:

ping ctr-dub-sw01.atlas.prod.ripe.net
PING ctr-dub-sw01.atlas.prod.ripe.net (34.248.40.103): 56 data bytes
Request timeout for icmp_seq 0
Request timeout for icmp_seq 1
Request timeout for icmp_seq 2
Request timeout for icmp_seq 3
Request timeout for icmp_seq 4
Request timeout for icmp_seq 5
Request timeout for icmp_seq 6
Request timeout for icmp_seq 7
Request timeout for icmp_seq 8
Request timeout for icmp_seq 9
Request timeout for icmp_seq 10

I now seem to have had a connection again for the past 45 minutes, without me changing anything. (And of course I was able to connect to other sites and services on the internet while it was not working.)

Thanks for reporting this, I can replicate this issue for that IP address in particular - eventually your probe connected using one of the other A/AAAA records.

I will take a further look into it and provide an update here.
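
For anyone who wants to check from their own network whether only some of the controller's addresses are affected, here is a minimal Python sketch. It is not part of the probe software: the helper names (`resolve_all`, `probe_banner`, `check_host`) are made up for illustration, and it only assumes that a healthy controller sends an SSH banner on port 443, as the manual `ssh -p 443` test above suggests.

```python
import socket


def resolve_all(host, port, resolver=socket.getaddrinfo):
    """Return unique (family, sockaddr) pairs for every A/AAAA record of host."""
    results = []
    for family, _socktype, _proto, _canonname, sockaddr in resolver(
        host, port, type=socket.SOCK_STREAM
    ):
        entry = (family, sockaddr)
        if entry not in results:
            results.append(entry)
    return results


def probe_banner(family, sockaddr, timeout=5.0):
    """Open a TCP connection to one address and check for an SSH banner."""
    try:
        with socket.socket(family, socket.SOCK_STREAM) as sock:
            sock.settimeout(timeout)
            sock.connect(sockaddr)
            banner = sock.recv(64)
    except OSError as exc:
        # Covers resets, refusals and timeouts alike.
        return f"failed: {exc}"
    return "ok" if banner.startswith(b"SSH-") else "no SSH banner"


def check_host(host, port=443):
    """Print one status line per resolved address (hypothetical helper)."""
    for family, sockaddr in resolve_all(host, port):
        print(f"{sockaddr[0]:<40} {probe_banner(family, sockaddr)}")
```

Calling `check_host("ctr-dub-sw01.atlas.prod.ripe.net")` prints one line per address. An address that resets the connection the way 34.248.40.103 did should show up as `failed`, while the other records report `ok` if they answer with an SSH banner.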

This has been largely fixed for the time being, although it could still be an issue in edge cases. There is a permanent fix that will be applied in a couple of weeks' time.

The good news is that probes will always connect eventually, and once they’re connected (and if they stay connected) then there is no problem.

Thanks again for raising this!

The good news is that probes will always connect eventually, and once they’re connected (and if they stay connected) then there is no problem.

OK, got it. The problem here is that my ISP reconnects every 24 hours to assign a new IPv4 address. So until you apply a permanent fix, I’ll most likely keep seeing those issues.

Hello,

Following the controller deploy today I expect this issue to be permanently resolved. I noticed that after the deploy probe #1006073 reconnected quite quickly, so all looks good.

If you notice anything else weird then please let us know.

Cheers,
Chris

Thanks @camin for following up here. I see the 2 disconnects during the deployment on my end too. I’ll monitor the situation and will reach out if anything changes, but the connection was already pretty stable after your hotfix 2 weeks ago.

Cheers