Umbrel is unavailable after a few hours

Hi,
My Umbrel works for only a few hours, then it becomes unavailable. I tried all the solutions I could find in the discussions, but without result:

  • set a static IP on the router
  • set a static IP on Umbrel in dhcpcd.conf
  • re-flashed Umbrel onto the microSD card

I tried all these steps but none of them worked. After a few hours I lose the connection to my Umbrel. Ping to umbrel.local works, but the SSH connection is refused and access to umbrel.local from a web browser also doesn’t work. When I check the data traffic on the router, the Umbrel is connected but the traffic is 0 KB/s. Umbrel stops communicating and I don’t know why.
The only option is to unplug the power cable. After the reboot, my Umbrel is fine again for a while.
Can you please advise me where the problem might be?
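The symptom described above (ping answers, but SSH and the web page do not) can be checked service by service. Below is a minimal sketch, not an official Umbrel tool: the function name `probe` is made up for illustration, and ports 22 (SSH) and 80 (web dashboard) are assumed defaults.

```python
import socket

def probe(host: str, port: int, timeout: float = 3.0) -> str:
    """Try a TCP connection and describe the outcome in one word."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        return "refused"      # host is up, nothing is listening on the port
    except ConnectionResetError:
        return "reset"        # peer dropped the connection
    except socket.timeout:
        return "timeout"      # no answer at all within the timeout
    except OSError as exc:
        return f"error: {exc}"  # e.g. the hostname could not be resolved

# Example call (hypothetical host name and assumed default ports):
#   for name, port in (("ssh", 22), ("http", 80)):
#       print(name, probe("umbrel.local", port))
```

If ping succeeds while both ports report "refused" or "timeout", the network stack is probably still alive but the services behind it are not, which would fit a system whose storage has stopped responding.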


Please follow the instructions in the troubleshooting guide to generate the debug log, then read the second tab (DMESG) about your hardware. See if any hardware failure shows up there.
It seems to be a problem of insufficient power. Maybe your power adapter is failing.

How long does it usually take to run a debug log? Mine has been running for over 12 hours and is still not complete. Thank you.

I have some USB errors:

[   14.101367] usb 1-1: new high-speed USB device number 2 using xhci_hcd
[   14.229536] usb 1-1: device descriptor read/64, error -71
[   14.465534] usb 1-1: device descriptor read/64, error -71
[   14.701330] usb 1-1: new high-speed USB device number 3 using xhci_hcd
[   14.829533] usb 1-1: device descriptor read/64, error -71
[   15.065537] usb 1-1: device descriptor read/64, error -71

The next errors are:

[   16.197202] xhci_hcd 0000:01:00.0: Setup ERROR: setup address command for slot 2.
[   16.609328] usb 1-1: device not accepting address 4, error -22
[   17.153346] usb 1-1: device not accepting address 5, error -22

I have a 5V cooling fan. I tried switching it to 3V to save power. Can that help?
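To see at a glance how often each USB error occurs, the dmesg lines quoted above can be tallied. This is only a sketch: the regular expression and the function name `count_usb_errors` are my own assumptions about the line format, based on the excerpt in this thread. On Linux, error -71 corresponds to EPROTO and -22 to EINVAL.

```python
import re
from collections import Counter

# Matches lines such as "usb 1-1: device descriptor read/64, error -71"
USB_ERROR = re.compile(r"usb (?P<port>\d+-[\d.]+): .*error (?P<code>-\d+)")

def count_usb_errors(dmesg_text: str) -> Counter:
    """Count USB errors per (port, error code) pair in dmesg output."""
    counts = Counter()
    for line in dmesg_text.splitlines():
        match = USB_ERROR.search(line)
        if match:
            counts[(match.group("port"), int(match.group("code")))] += 1
    return counts

# Lines copied from the log excerpts in this thread
SAMPLE = """\
[   14.101367] usb 1-1: new high-speed USB device number 2 using xhci_hcd
[   14.229536] usb 1-1: device descriptor read/64, error -71
[   14.465534] usb 1-1: device descriptor read/64, error -71
[   14.829533] usb 1-1: device descriptor read/64, error -71
[   15.065537] usb 1-1: device descriptor read/64, error -71
[   16.609328] usb 1-1: device not accepting address 4, error -22
[   17.153346] usb 1-1: device not accepting address 5, error -22
"""

for (port, code), n in count_usb_errors(SAMPLE).items():
    print(f"usb {port}: error {code} seen {n} time(s)")
```

Repeated errors on a single port during enumeration are often a hint of a cable, enclosure, or power problem rather than a software one, though the count alone cannot tell these apart.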

It could help, yes, but that means your power adapter sucks. Remove anything that is not necessary and is connected to your Pi (e.g. keyboard, mouse, video).
And buy a good original power adapter.

Unfortunately, I already have the original power adapter (this one).

There aren’t any unnecessary peripherals connected to my node. It is only an RPi 4 (8GB), a 5V cooling fan and a 1TB NVMe SSD.

I’ll wait and see if connecting the fan to 3V instead of 5V helps. Thanks for your time!

What other errors do you see in the log? Also in the main log.
It could also be the SSD case/cable that is failing. Try replacing it, if you have another one.


These are the next errors I found.

Oh, so you are still in IBD (initial block download).
You should wait.

IBD finished 5 hours ago. Now the same issue is back: Umbrel is unavailable through SSH, but ping to umbrel.local is OK.

If I try to connect to Umbrel using SSH, it doesn’t work:
kex_exchange_identification: read: Connection reset by peer

I will try to replace the USB cable and start it again.

EDIT: After starting, I am not synchronized; I have to synchronize the blocks from the last 5 days again.

EDIT 2: Is it OK that the blockchain takes 417 GB and the system approx. 150 GB? I have a feeling that my Umbrel is broken overall.

Regarding EDIT 2: yes, that is OK.

I tried different things with my node all day. I tried replacing the USB cable, but without result. I don’t know how to fix my node. Disk usage increased by another 30 GB to about 600 GB in total before the crash, and I don’t know why. After unplugging and replugging the power cable, SSD usage decreased to approx. 570 GB, as in the screenshot above.

My Umbrel crashed 5 times today. Ping always works properly, but the SSH connection is refused and access through the browser at umbrel.local also doesn’t work. The Umbrel’s data traffic on the router is 0 KB/s. When the node becomes unavailable, the LED on the SSD case stays lit continuously for a while (a few minutes) and then turns off.

Here is my last log after the crash and reboot. I don’t see anything special, but maybe that is because I don’t know exactly what to look for.

Should I try formatting the SSD and the microSD card and starting from the beginning? Could that help?

I had more or less the same problem, although I did not try to ping the inactive device. I flashed the SD card with a new image and kept the SSD. Umbrel started normally and ran for a week and a half until it crashed again two days ago.
I have just restarted it and I’m going through the logs. I hope it’s a one-off.

Formatting is the last resort. So don’t jump straight to that as long as you don’t know what is really going on.
You could re-format everything, but what if it keeps doing the same shit because of some faulty part? You will just lose time.
First you need a good diagnosis, then you take drastic measures.

Looking at your logs, there don’t seem to be any software problems or data corruption.
Yes, the blockchain data was recently synced to 100%, but after that you still need to wait until electrs finishes indexing and compacting its database. That could take some time.
I would be patient for a few hours now.
I also looked in the hardware tab (DMESG) and couldn’t see any serious problems.

Maybe @mayank can take a look at this log?

@gilbycoyote I already tried reflashing twice, but with no result.

@DarthCoin I’ll wait, but I’m worried that the data may get corrupted by constantly disconnecting and reconnecting the power cable. Sometimes after plugging the cable back in I see a small rollback in the blockchain. I will see tomorrow.

If the data were corrupted, it would be reflected in that log. But I couldn’t see any sign of that.

I tried to investigate further and searched the log in /var/log/syslog.
During the night there is a gap between the crash and the restart. Just before the crash, some errors are visible, followed by the subsequent start of Umbrel.
Crash: 00:49:41
Start: 03:46:00

Sep 30 00:49:39 umbrel kernel: [   34.569625] br-1e3087ca8717: port 7(veth209b4c5) entered forwarding state
Sep 30 00:49:39 umbrel containerd[669]: time="2021-09-30T00:49:39.659115668Z" level=error msg="add cg to OOM monitor" error="cgroups: memory cgroup not supported on this system"
Sep 30 00:49:39 umbrel kernel: [   35.033918] br-1e3087ca8717: port 4(veth8bbbfbb) entered disabled state
Sep 30 00:49:39 umbrel kernel: [   35.040976] eth0: renamed from vethff5e54e
Sep 30 00:49:39 umbrel kernel: [   35.097651] eth0: renamed from vethe755872
Sep 30 00:49:40 umbrel kernel: [   35.161483] br-1e3087ca8717: port 4(veth8bbbfbb) entered blocking state
Sep 30 00:49:40 umbrel kernel: [   35.161515] br-1e3087ca8717: port 4(veth8bbbfbb) entered forwarding state
Sep 30 00:49:40 umbrel kernel: [   35.161671] IPv6: ADDRCONF(NETDEV_CHANGE): vethf042c81: link becomes ready
Sep 30 00:49:40 umbrel kernel: [   35.161806] br-1e3087ca8717: port 15(vethf042c81) entered blocking state
Sep 30 00:49:40 umbrel kernel: [   35.161813] br-1e3087ca8717: port 15(vethf042c81) entered forwarding state
Sep 30 00:49:40 umbrel containerd[669]: time="2021-09-30T00:49:40.103248982Z" level=error msg="add cg to OOM monitor" error="cgroups: memory cgroup not supported on this system"
Sep 30 00:49:40 umbrel containerd[669]: time="2021-09-30T00:49:40.196853834Z" level=error msg="add cg to OOM monitor" error="cgroups: memory cgroup not supported on this system"
Sep 30 00:49:40 umbrel kernel: [   35.398334] br-1e3087ca8717: port 5(veth66ba17f) entered disabled state
Sep 30 00:49:40 umbrel kernel: [   35.399092] eth0: renamed from veth2fdef11
Sep 30 00:49:40 umbrel containerd[669]: time="2021-09-30T00:49:40.380393241Z" level=error msg="add cg to OOM monitor" error="cgroups: memory cgroup not supported on this system"
Sep 30 00:49:40 umbrel kernel: [   35.542379] br-1e3087ca8717: port 5(veth66ba17f) entered blocking state
Sep 30 00:49:40 umbrel kernel: [   35.542404] br-1e3087ca8717: port 5(veth66ba17f) entered forwarding state
Sep 30 00:49:40 umbrel kernel: [   35.542619] br-1e3087ca8717: port 1(veth98a70c7) entered disabled state
Sep 30 00:49:40 umbrel kernel: [   35.542959] eth0: renamed from veth7b46a9c
Sep 30 00:49:40 umbrel kernel: [   35.566028] eth0: renamed from veth859e33f
Sep 30 00:49:40 umbrel kernel: [   35.585636] eth0: renamed from veth9de8235
Sep 30 00:49:40 umbrel kernel: [   35.608030] IPv6: ADDRCONF(NETDEV_CHANGE): vethf645a31: link becomes ready
Sep 30 00:49:40 umbrel kernel: [   35.608136] br-1e3087ca8717: port 8(vethf645a31) entered blocking state
Sep 30 00:49:40 umbrel kernel: [   35.608150] br-1e3087ca8717: port 8(vethf645a31) entered forwarding state
Sep 30 00:49:40 umbrel kernel: [   35.661015] eth0: renamed from veth1c2f520
Sep 30 00:49:40 umbrel kernel: [   35.681826] IPv6: ADDRCONF(NETDEV_CHANGE): veth0e13a45: link becomes ready
Sep 30 00:49:40 umbrel kernel: [   35.681952] br-1e3087ca8717: port 9(veth0e13a45) entered blocking state
Sep 30 00:49:40 umbrel kernel: [   35.681966] br-1e3087ca8717: port 9(veth0e13a45) entered forwarding state
Sep 30 00:49:40 umbrel kernel: [   35.682091] IPv6: ADDRCONF(NETDEV_CHANGE): vethd22fece: link becomes ready
Sep 30 00:49:40 umbrel kernel: [   35.682158] br-1e3087ca8717: port 10(vethd22fece) entered blocking state
Sep 30 00:49:40 umbrel kernel: [   35.682167] br-1e3087ca8717: port 10(vethd22fece) entered forwarding state
Sep 30 00:49:40 umbrel kernel: [   35.682476] eth0: renamed from veth1d61f39
Sep 30 00:49:40 umbrel kernel: [   35.734691] eth0: renamed from vethcade27d
Sep 30 00:49:40 umbrel kernel: [   35.763293] IPv6: ADDRCONF(NETDEV_CHANGE): vethb153ecb: link becomes ready
Sep 30 00:49:40 umbrel kernel: [   35.763418] br-1e3087ca8717: port 6(vethb153ecb) entered blocking state
Sep 30 00:49:40 umbrel kernel: [   35.763445] br-1e3087ca8717: port 6(vethb153ecb) entered forwarding state
Sep 30 00:49:40 umbrel kernel: [   35.763709] eth0: renamed from veth7bf880f
Sep 30 00:49:40 umbrel systemd[1]: systemd-fsckd.service: Succeeded.
Sep 30 00:49:40 umbrel kernel: [   35.850568] eth0: renamed from vethe9a04ce
Sep 30 00:49:40 umbrel kernel: [   35.885522] IPv6: ADDRCONF(NETDEV_CHANGE): veth3802f62: link becomes ready
Sep 30 00:49:40 umbrel kernel: [   35.885669] br-1e3087ca8717: port 13(veth3802f62) entered blocking state
Sep 30 00:49:40 umbrel kernel: [   35.885686] br-1e3087ca8717: port 13(veth3802f62) entered forwarding state
Sep 30 00:49:40 umbrel kernel: [   35.888165] eth0: renamed from veth14ac33d
Sep 30 00:49:40 umbrel kernel: [   35.929894] IPv6: ADDRCONF(NETDEV_CHANGE): veth43bd418: link becomes ready
Sep 30 00:49:40 umbrel kernel: [   35.930010] br-1e3087ca8717: port 12(veth43bd418) entered blocking state
Sep 30 00:49:40 umbrel kernel: [   35.930026] br-1e3087ca8717: port 12(veth43bd418) entered forwarding state
Sep 30 00:49:40 umbrel kernel: [   35.930162] IPv6: ADDRCONF(NETDEV_CHANGE): veth17d3425: link becomes ready
Sep 30 00:49:40 umbrel kernel: [   35.930230] br-1e3087ca8717: port 11(veth17d3425) entered blocking state
Sep 30 00:49:40 umbrel kernel: [   35.930245] br-1e3087ca8717: port 11(veth17d3425) entered forwarding state
Sep 30 00:49:40 umbrel kernel: [   35.934334] IPv6: ADDRCONF(NETDEV_CHANGE): vethc3d285b: link becomes ready
Sep 30 00:49:40 umbrel kernel: [   35.934449] br-1e3087ca8717: port 14(vethc3d285b) entered blocking state
Sep 30 00:49:40 umbrel kernel: [   35.934462] br-1e3087ca8717: port 14(vethc3d285b) entered forwarding state
Sep 30 00:49:40 umbrel kernel: [   35.936083] br-1e3087ca8717: port 1(veth98a70c7) entered blocking state
Sep 30 00:49:40 umbrel kernel: [   35.936148] br-1e3087ca8717: port 1(veth98a70c7) entered forwarding state
Sep 30 00:49:40 umbrel containerd[669]: time="2021-09-30T00:49:40.780879797Z" level=error msg="add cg to OOM monitor" error="cgroups: memory cgroup not supported on this system"
Sep 30 00:49:40 umbrel containerd[669]: time="2021-09-30T00:49:40.869203982Z" level=error msg="add cg to OOM monitor" error="cgroups: memory cgroup not supported on this system"
Sep 30 00:49:40 umbrel containerd[669]: time="2021-09-30T00:49:40.877238648Z" level=error msg="add cg to OOM monitor" error="cgroups: memory cgroup not supported on this system"
Sep 30 00:49:41 umbrel containerd[669]: time="2021-09-30T00:49:41.226659019Z" level=error msg="add cg to OOM monitor" error="cgroups: memory cgroup not supported on this system"
Sep 30 00:49:41 umbrel containerd[669]: time="2021-09-30T00:49:41.238293370Z" level=error msg="add cg to OOM monitor" error="cgroups: memory cgroup not supported on this system"
Sep 30 00:49:41 umbrel containerd[669]: time="2021-09-30T00:49:41.262155259Z" level=error msg="add cg to OOM monitor" error="cgroups: memory cgroup not supported on this system"
Sep 30 00:49:41 umbrel containerd[669]: time="2021-09-30T00:49:41.275733667Z" level=error msg="add cg to OOM monitor" error="cgroups: memory cgroup not supported on this system"
Sep 30 00:49:41 umbrel containerd[669]: time="2021-09-30T00:49:41.387725130Z" level=error msg="add cg to OOM monitor" error="cgroups: memory cgroup not supported on this system"
Sep 30 00:49:41 umbrel containerd[669]: time="2021-09-30T00:49:41.465520426Z" level=error msg="add cg to OOM monitor" error="cgroups: memory cgroup not supported on this system"
Sep 30 00:49:41 umbrel dockerd[721]: time="2021-09-30T00:49:41.499132852Z" level=info msg="Loading containers: done."
Sep 30 03:46:00 umbrel systemd-timesyncd[309]: Synchronized to time server for the first time 46.167.244.248:123 (2.debian.pool.ntp.org).
Sep 30 03:46:00 umbrel dockerd[721]: time="2021-09-30T03:46:00.596455583Z" level=info msg="Docker daemon" commit=75249d8 graphdriver(s)=overlay2 version=20.10.8
Sep 30 03:46:00 umbrel dockerd[721]: time="2021-09-30T03:46:00.600847675Z" level=info msg="Daemon has completed initialization"
Sep 30 03:46:00 umbrel systemd[1]: Started Docker Application Container Engine.
Sep 30 03:46:00 umbrel systemd[1]: Starting Status Server iptables Update...
Sep 30 03:46:00 umbrel systemd[1]: Starting External Storage SDcard Updater...
Sep 30 03:46:00 umbrel dockerd[721]: time="2021-09-30T03:46:00.800023694Z" level=info msg="API listen on /var/run/docker.sock"
Sep 30 03:46:00 umbrel status server iptables[3606]: Removed existing iptables entry.
Sep 30 03:46:00 umbrel external storage updater[3607]: Checking if SD card Umbrel is newer than external storage...
Sep 30 03:46:00 umbrel status server iptables[3606]: Appended new iptables entry.
Sep 30 03:46:00 umbrel systemd[1]: umbrel-status-server-iptables-update.service: Succeeded.
Sep 30 03:46:00 umbrel systemd[1]: Started Status Server iptables Update.
Sep 30 03:46:01 umbrel external storage updater[3607]: No, SD version is not newer, exiting.
Sep 30 03:46:01 umbrel systemd[1]: Started External Storage SDcard Updater.
Sep 30 03:46:01 umbrel systemd[1]: Starting Umbrel Startup Service...
Sep 30 03:46:01 umbrel umbrel startup[3723]: ======================================
Sep 30 03:46:01 umbrel umbrel startup[3723]: ============= STARTING ===============
Sep 30 03:46:01 umbrel umbrel startup[3723]: ============== UMBREL ================
Sep 30 03:46:01 umbrel umbrel startup[3723]: ======================================
Sep 30 03:46:01 umbrel umbrel startup[3723]: Setting environment variables...
Sep 30 03:46:01 umbrel umbrel startup[3723]: Starting karen...
Sep 30 03:46:01 umbrel umbrel startup[3723]: Starting status monitors...
Sep 30 03:46:01 umbrel umbrel startup[3723]: Starting memory monitor...
Sep 30 03:46:01 umbrel umbrel startup[3723]: Starting backup monitor...
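Jumps like the one between 00:49:41 and 03:46:00 can be located automatically by comparing consecutive syslog timestamps. The sketch below assumes the classic "Mon DD HH:MM:SS" prefix seen in the excerpt above and an English locale; the function name `find_gaps`, the 10-minute threshold and the `year` argument (syslog lines carry no year) are all illustrative choices.

```python
from datetime import datetime, timedelta

def find_gaps(lines, year=2021, threshold=timedelta(minutes=10)):
    """Return (before, after, delta) for each timestamp jump >= threshold."""
    gaps = []
    previous = None
    for line in lines:
        try:
            # The first 15 characters hold e.g. "Sep 30 00:49:41"
            stamp = datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S")
        except ValueError:
            continue  # not a syslog line, skip it
        if previous is not None and stamp - previous >= threshold:
            gaps.append((previous, stamp, stamp - previous))
        previous = stamp
    return gaps

# Two consecutive lines taken from the excerpt above (messages shortened)
sample = [
    "Sep 30 00:49:41 umbrel dockerd[721]: Loading containers: done.",
    "Sep 30 03:46:00 umbrel systemd-timesyncd[309]: Synchronized to time server",
]
for before, after, delta in find_gaps(sample):
    print(f"gap of {delta} between {before:%H:%M:%S} and {after:%H:%M:%S}")
```

One caveat, stated as an assumption rather than a diagnosis: the Raspberry Pi 4 has no battery-backed clock, so a jump that ends at a "Synchronized to time server" line may partly reflect the clock being corrected after boot rather than actual downtime.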

I tried watching syslog in real time. Umbrel breaks down after this line:
usb 2-2: Enable of device-initiated U1 failed

I spent a lot of time searching the forums and found that this problem is most likely caused by a lack of power for the SSD (as @DarthCoin wrote above). My SSD (Samsung SSD 970 EVO PLUS) probably draws too much power at peak moments and Umbrel fails.

I ordered a new SSD with lower power consumption. I’ll see if it helps. Unfortunately, I must go through IBD again :smiley:

If the data is intact and not corrupted, you can just copy the bitcoin folder from one drive to the other.
See the troubleshooting manual, which explains how to do it.

So, after a few days my node is working correctly. I replaced my SSD and I no longer have any issues. The node has now been running for 3 days straight.
