Red Umbrella of Death after Power Outage

Recently, I’ve been experiencing more power outages than usual. Almost every time an outage occurs, my Umbrel nodes come back to the Red Umbrella of Death screen.

If you are experiencing similar issues, I recommend the following steps. They usually fix the issue for me.

  1. SSH into umbrel:
    ssh -t umbrel@umbrel.local

  2. Try to run this command to check your disk:
    sudo e2fsck /dev/sda1
    Most of the time this fixes the issue. If e2fsck reports errors that need fixing, pressing y fixes a single issue, while pressing a fixes all remaining issues (i.e. yes to all).

  3. If you get this message:

e2fsck 1.44.5 (15-Dec-2018)
/dev/sda1 is mounted.
e2fsck: Cannot continue, aborting.

Then follow these additional steps…

  1. Stop all running services:
    sudo systemctl stop umbrel-startup

  2. Stop the Swap:
    sudo swapoff --all

  3. Unmount the Drive:
    sudo umount /dev/sda1

  4. Try again:
    sudo e2fsck /dev/sda1

  5. If you still get the message saying that your drive is mounted, then you need to investigate manually for running services that may be using your disk. Start by running:

    ps auxw

    then kill any processes that you believe may be using the device (this may take a few tries). Note the PID number(s), then run:

    sudo kill -9 <PID> <PID> <PID>

    Where <PID> is each of the processes you’d like to kill.

  6. Try unmounting the drive and running disk check again.

  7. If none of these steps worked and you are still seeing the red umbrella of death, your best bet is to shut down the node and reflash the SD card with a fresh Umbrel image (your data is stored on the SSD, so it should survive as long as the drive is still accessible).

  8. To restart Umbrel, execute:
    sudo systemctl start umbrel-startup
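For convenience, the steps above can be sketched as a single script. This is just a sketch, assuming the external drive is /dev/sda1 as in this guide; check with lsblk first if yours differs. The -y flag makes e2fsck answer yes to every prompt, equivalent to pressing a interactively.

```shell
#!/bin/bash
# Sketch of the recovery steps above in one script.
# Assumes the external drive is /dev/sda1 -- verify with `lsblk` first.

sudo systemctl stop umbrel-startup   # stop all Umbrel services
sudo swapoff --all                   # disable swap (it lives on the external drive)
sudo umount /dev/sda1                # unmount the drive so e2fsck can run
sudo e2fsck -y /dev/sda1             # check the filesystem, answering yes to all fixes
sudo systemctl start umbrel-startup  # bring Umbrel back up
```

Note that e2fsck exits with a non-zero status when it corrects errors, so don't be alarmed if the script reports a non-zero exit code from that step.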


Adding an extra step that is sometimes needed and can free up a lot of disk space:

sudo docker system prune -a -f
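If you want to see where that space is going before you prune, docker system df prints a breakdown:

```shell
# Summarize Docker disk usage: images, containers, local volumes, build cache.
sudo docker system df
```

Running it again after the prune shows how much was reclaimed.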

Any ideas on what to do if these steps still don’t work? I’ve tried killing most of the processes and I still get an error that the drive is busy and can’t be unmounted. Is there a way to force unmount the drive?

Update: I ended up needing to stop Docker with sudo systemctl stop docker

Killing it by PID just let it keep relaunching under a new process.
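For anyone hitting the same respawning-process problem, here is a rough sketch of the full sequence with the Docker stop added (again assuming /dev/sda1, as in the guide above):

```shell
sudo systemctl stop umbrel-startup  # stop Umbrel services
sudo swapoff --all                  # swap lives on the external drive
sudo systemctl stop docker          # containers respawn if you only kill them by PID
sudo umount /dev/sda1               # should succeed now that Docker is stopped
sudo e2fsck /dev/sda1               # run the disk check
```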

This is a good suggestion. Will include it in the steps.

This actually happens fairly frequently on some of my nodes. I wonder if there’s a way to automate an e2fsck disk check whenever the red umbrella of death is displayed. @louneskmt

@alphaazeta One thing I did on my Raspberry Pis: on every reboot, they automatically run fsck no matter what. You can set this up by going into the root of /dev/sda1’s partition and running touch forcefsck, as shown here: https://linuxconfig.org/how-to-force-fsck-to-check-filesystem-after-system-reboot-on-linux
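As a minimal sketch of that tip, assuming an init system that honors the forcefsck flag file as the linked article describes (and that /dev/sda1 is mounted at /home/umbrel/umbrel, as on a stock Umbrel):

```shell
# Flag the root filesystem for an fsck on the next boot.
sudo touch /forcefsck
# To flag the external drive instead, create the file in that
# filesystem's root (on Umbrel, /dev/sda1 mounts at /home/umbrel/umbrel):
sudo touch /home/umbrel/umbrel/forcefsck
sudo reboot
```

The flag file is removed automatically after the check runs.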


I didn’t have a power outage, but my Umbrel was turning off randomly and restarting. Then I got the “error: system service failed” screen. I reflashed the SD card to 4.7, and still got the “error: system service failed” screen. I reflashed to 4.8, and am still stuck at the same screen. When I ssh in, it doesn’t recognize my custom password anymore, only the default one. Sparrow can’t connect to Umbrel anymore. I tried the steps above, to no avail.

I am using a Raspberry Pi 4 with a 32 GB SD card. My Umbrel is connected through an Ethernet cable to the router. I am using an SSD drive. My hardware was assembled by CryptoCloaks.

Thank you; here are the debug logs:

=====================
= Umbrel debug info =
=====================

Umbrel version

0.4.6

Flashed OS version

v0.4.8

Raspberry Pi Model

Revision : d03114
Serial : 10000000682ffc21
Model : Raspberry Pi 4 Model B Rev 1.4

Firmware

Oct 29 2021 10:47:33
Copyright © 2012 Broadcom
version b8a114e5a9877e91ca8f26d1a5ce904b2ad3cf13 (clean) (release) (start)

Temperature

temp=40.9’C

Throttling

throttled=0x0

Memory usage

            total        used        free      shared  buff/cache   available
Mem:         7.8G        398M        7.0G        8.0M        428M        7.3G
Swap:        4.1G          0B        4.1G

total: 5.1%
electrs: 4.1%
system: 1%
vaultwarden: 0%
tor: 0%
thunderhub: 0%
sphinx-relay: 0%
specter-desktop: 0%
ride-the-lightning: 0%
mempool: 0%
lnd: 0%
lightning-terminal: 0%
btc-rpc-explorer: 0%
btcpay-server: 0%
bluewallet: 0%
bitcoin: 0%

Memory monitor logs

2021-11-09 08:49:16 Memory monitor running!
2021-11-09 13:54:56 Memory monitor running!
2021-11-09 14:08:56 Memory monitor running!
2021-11-09 14:29:34 Memory monitor running!
2021-11-09 14:41:57 Memory monitor running!
2021-11-09 14:53:43 Memory monitor running!
2021-11-09 15:24:27 Memory monitor running!
2021-11-09 15:33:28 Memory monitor running!
2021-11-09 15:33:30 Memory monitor running!
2021-11-11 02:38:47 Memory monitor running!

Filesystem information

Filesystem      Size  Used Avail Use% Mounted on
/dev/root        29G  3.0G   25G  11% /
/dev/sda1       916G  481G  389G  56% /home/umbrel/umbrel

Startup service logs

– Logs begin at Mon 2021-11-15 14:50:50 UTC, end at Mon 2021-11-15 17:21:56 UTC. –
Nov 15 14:51:22 umbrel systemd[1]: Dependency failed for Umbrel Startup Service.
Nov 15 14:51:22 umbrel systemd[1]: umbrel-startup.service: Job umbrel-startup.service/start failed with result ‘dependency’.

External storage service logs

– Logs begin at Mon 2021-11-15 14:50:50 UTC, end at Mon 2021-11-15 17:21:56 UTC. –
Nov 15 14:50:57 umbrel systemd[1]: Starting External Storage Mounter…
Nov 15 14:50:57 umbrel external storage mounter[516]: Running external storage mount script…
Nov 15 14:51:01 umbrel external storage mounter[516]: Found device “SanDisk SSD PLUS 1000GB”
Nov 15 14:51:01 umbrel external storage mounter[516]: Blacklisting USB device IDs against UAS driver…
Nov 15 14:51:01 umbrel external storage mounter[516]: Rebinding USB drivers…
Nov 15 14:51:02 umbrel external storage mounter[516]: Checking USB devices are back…
Nov 15 14:51:02 umbrel external storage mounter[516]: Waiting for USB devices…
Nov 15 14:51:03 umbrel external storage mounter[516]: Waiting for USB devices…
Nov 15 14:51:04 umbrel external storage mounter[516]: Checking if the device is ext4…
Nov 15 14:51:04 umbrel external storage mounter[516]: Yes, it is ext4
Nov 15 14:51:04 umbrel external storage mounter[516]: Checking if device contains an Umbrel install…
Nov 15 14:51:04 umbrel external storage mounter[516]: Yes, it contains an Umbrel install
Nov 15 14:51:04 umbrel external storage mounter[516]: Bind mounting external storage over local Umbrel installation…
Nov 15 14:51:04 umbrel external storage mounter[516]: Bind mounting external storage over local Docker data dir…
Nov 15 14:51:04 umbrel external storage mounter[516]: Bind mounting external storage to /swap
Nov 15 14:51:04 umbrel external storage mounter[516]: Bind mounting SD card root at /sd-card…
Nov 15 14:51:04 umbrel external storage mounter[516]: Checking Umbrel root is now on external storage…
Nov 15 14:51:07 umbrel external storage mounter[516]: Checking /var/lib/docker is now on external storage…
Nov 15 14:51:07 umbrel external storage mounter[516]: Checking /swap is now on external storage…
Nov 15 14:51:07 umbrel external storage mounter[516]: Setting up swapfile
Nov 15 14:51:08 umbrel external storage mounter[516]: Setting up swapspace version 1, size = 4 GiB (4294963200 bytes)
Nov 15 14:51:08 umbrel external storage mounter[516]: no label, UUID=2e50af29-d76d-442e-aabc-c4c771f3e7d2
Nov 15 14:51:08 umbrel external storage mounter[516]: Checking SD Card root is bind mounted at /sd-root…
Nov 15 14:51:08 umbrel external storage mounter[516]: Starting external drive mount monitor…
Nov 15 14:51:08 umbrel external storage mounter[516]: Mount script completed successfully!
Nov 15 14:51:08 umbrel systemd[1]: Started External Storage Mounter.

External storage SD card update service logs

– Logs begin at Mon 2021-11-15 14:50:50 UTC, end at Mon 2021-11-15 17:21:56 UTC. –
Nov 15 14:51:21 umbrel systemd[1]: Starting External Storage SDcard Updater…
Nov 15 14:51:21 umbrel external storage updater[1298]: Checking if SD card Umbrel is newer than external storage…
Nov 15 14:51:22 umbrel external storage updater[1298]: Yes, SD version is newer.
Nov 15 14:51:22 umbrel external storage updater[1298]: Checking if the external storage version “0.4.6” satisfies update requirement “>=0.2.1”…
Nov 15 14:51:22 umbrel external storage updater[1298]: Yes, it does, attempting an automatic update…
Nov 15 14:51:22 umbrel external storage updater[1298]: =======================================
Nov 15 14:51:22 umbrel external storage updater[1298]: =============== UPDATE ================
Nov 15 14:51:22 umbrel external storage updater[1298]: =======================================
Nov 15 14:51:22 umbrel external storage updater[1298]: ========== Stage: Download ============
Nov 15 14:51:22 umbrel external storage updater[1298]: =======================================
Nov 15 14:51:22 umbrel external storage updater[1298]: An update is already in progress. Exiting now.
Nov 15 14:51:22 umbrel systemd[1]: umbrel-external-storage-sdcard-update.service: Main process exited, code=exited, status=2/INVALIDARGUMENT
Nov 15 14:51:22 umbrel systemd[1]: umbrel-external-storage-sdcard-update.service: Failed with result ‘exit-code’.
Nov 15 14:51:22 umbrel systemd[1]: Failed to start External Storage SDcard Updater.

Karen logs

Stopping middleware …
Stopping bitcoin …
Stopping nginx …
Stopping lnd …
Stopping manager …
Stopping electrs …
Stopping umbrel_app_3_tor_1 …
Stopping tor …
Stopping umbrel_app_tor_1 …
Stopping dashboard …
Stopping umbrel_app_2_tor_1 …
Stopping umbrel_app_tor_1 … done
Stopping middleware … done
Stopping umbrel_app_3_tor_1 … done
Stopping umbrel_app_2_tor_1 … done
Stopping electrs … done
Stopping lnd … done
Stopping bitcoin … done
Stopping nginx … done
Stopping manager … done
Stopping dashboard … done
Stopping tor … done
Removing middleware …
Removing neutrino-switcher …
Removing bitcoin …
Removing nginx …
Removing lnd …
Removing manager …
Removing electrs …
Removing umbrel_app_3_tor_1 …
Removing tor …
Removing umbrel_app_tor_1 …
Removing dashboard …
Removing umbrel_app_2_tor_1 …
Removing neutrino-switcher … done
Removing dashboard … done
Removing lnd … done
Removing umbrel_app_3_tor_1 … done
Removing bitcoin … done
Removing electrs … done
Removing middleware … done
Removing umbrel_app_2_tor_1 … done
Removing nginx … done
Removing tor … done
Removing umbrel_app_tor_1 … done
Removing manager … done
Removing network umbrel_main_network
karen is running in /home/umbrel/umbrel/events
karen is running in /home/umbrel/umbrel/events
karen is running in /home/umbrel/umbrel/events

Docker containers

NAMES STATUS
electrs Restarting (1) 13 seconds ago

Umbrel logs

Attaching to

Bitcoin Core logs

Attaching to

LND logs

Attaching to

electrs logs

Attaching to electrs
electrs | [2021-11-15T17:19:33.519Z INFO electrs::chain] loading 708941 headers, tip=00000000000000000005e1c118d0b8a3e5917c693421d32442be8c63f1a81adc
electrs | [2021-11-15T17:19:37.444Z INFO electrs::chain] chain updated: tip=00000000000000000005e1c118d0b8a3e5917c693421d32442be8c63f1a81adc, height=708941
electrs | [2021-11-15T17:19:37.476Z INFO electrs::db] closing DB at /data/db/bitcoin
electrs | Error: electrs failed
electrs |
electrs | Caused by:
electrs | 0: failed to open bitcoind cookie file: /data/.bitcoin/.cookie
electrs | 1: No such file or directory (os error 2)
electrs | Starting electrs 0.9.1 on aarch64 linux with Config { network: Bitcoin, db_path: “/data/db/bitcoin”, daemon_dir: “/data/.bitcoin”, daemon_auth: CookieFile("/data/.bitcoin/.cookie"), daemon_rpc_addr: V4(10.21.21.8:8332), daemon_p2p_addr: V4(10.21.21.8:8333), electrum_rpc_addr: V4(0.0.0.0:50001), monitoring_addr: V4(127.0.0.1:4224), wait_duration: 10s, jsonrpc_timeout: 15s, index_batch_size: 10, index_lookup_limit: Some(200), reindex_last_blocks: 0, auto_reindex: true, ignore_mempool: false, sync_once: false, disable_electrum_rpc: false, server_banner: “Umbrel v0.4.6”, args: [] }
electrs | [2021-11-15T17:20:38.410Z INFO electrs::metrics::metrics_impl] serving Prometheus metrics on 127.0.0.1:4224
electrs | [2021-11-15T17:20:38.459Z INFO electrs::db] “/data/db/bitcoin”: 134 SST files, 32.606096919 GB, 4.035228625 Grows
electrs | [2021-11-15T17:20:42.654Z INFO electrs::chain] loading 708941 headers, tip=00000000000000000005e1c118d0b8a3e5917c693421d32442be8c63f1a81adc
electrs | [2021-11-15T17:20:46.589Z INFO electrs::chain] chain updated: tip=00000000000000000005e1c118d0b8a3e5917c693421d32442be8c63f1a81adc, height=708941
electrs | [2021-11-15T17:20:46.620Z INFO electrs::db] closing DB at /data/db/bitcoin
electrs | Error: electrs failed
electrs |
electrs | Caused by:
electrs | 0: failed to open bitcoind cookie file: /data/.bitcoin/.cookie
electrs | 1: No such file or directory (os error 2)
electrs | Starting electrs 0.9.1 on aarch64 linux with Config { network: Bitcoin, db_path: “/data/db/bitcoin”, daemon_dir: “/data/.bitcoin”, daemon_auth: CookieFile("/data/.bitcoin/.cookie"), daemon_rpc_addr: V4(10.21.21.8:8332), daemon_p2p_addr: V4(10.21.21.8:8333), electrum_rpc_addr: V4(0.0.0.0:50001), monitoring_addr: V4(127.0.0.1:4224), wait_duration: 10s, jsonrpc_timeout: 15s, index_batch_size: 10, index_lookup_limit: Some(200), reindex_last_blocks: 0, auto_reindex: true, ignore_mempool: false, sync_once: false, disable_electrum_rpc: false, server_banner: “Umbrel v0.4.6”, args: [] }
electrs | [2021-11-15T17:21:47.548Z INFO electrs::metrics::metrics_impl] serving Prometheus metrics on 127.0.0.1:4224
electrs | [2021-11-15T17:21:47.598Z INFO electrs::db] “/data/db/bitcoin”: 135 SST files, 32.606097723 GB, 4.035228626 Grows
electrs | [2021-11-15T17:21:51.846Z INFO electrs::chain] loading 708941 headers, tip=00000000000000000005e1c118d0b8a3e5917c693421d32442be8c63f1a81adc
electrs | [2021-11-15T17:21:55.995Z INFO electrs::chain] chain updated: tip=00000000000000000005e1c118d0b8a3e5917c693421d32442be8c63f1a81adc, height=708941
electrs | [2021-11-15T17:21:56.041Z INFO electrs::db] closing DB at /data/db/bitcoin
electrs | Error: electrs failed
electrs |
electrs | Caused by:
electrs | 0: failed to open bitcoind cookie file: /data/.bitcoin/.cookie
electrs | 1: No such file or directory (os error 2)

Tor logs

Attaching to tor, umbrel_app_tor_1, umbrel_app_2_tor_1
tor | Nov 13 00:14:07.000 [notice] Bootstrapped 66% (loading_descriptors): Loading relay descriptors
tor | Nov 13 00:14:07.000 [notice] Bootstrapped 72% (loading_descriptors): Loading relay descriptors
tor | Nov 13 00:14:07.000 [notice] Bootstrapped 75% (enough_dirinfo): Loaded enough directory info to build circuits
tor | Nov 13 00:14:07.000 [notice] Bootstrapped 80% (ap_conn): Connecting to a relay to build circuits
tor | Nov 13 00:14:07.000 [notice] Bootstrapped 85% (ap_conn_done): Connected to a relay to build circuits
tor | Nov 13 00:14:08.000 [notice] Bootstrapped 89% (ap_handshake): Finishing handshake with a relay to build circuits
tor | Nov 13 00:14:09.000 [notice] Bootstrapped 90% (ap_handshake_done): Handshake finished with a relay to build circuits
tor | Nov 13 00:14:09.000 [notice] Bootstrapped 95% (circuit_create): Establishing a Tor circuit
tor | Nov 13 00:14:10.000 [notice] Bootstrapped 100% (done): Done
tor | Nov 13 00:15:51.000 [notice] Catching signal TERM, exiting cleanly.
app_tor_1 | Nov 13 00:14:07.000 [notice] Bootstrapped 80% (ap_conn): Connecting to a relay to build circuits
app_tor_1 | Nov 13 00:14:07.000 [notice] Bootstrapped 85% (ap_conn_done): Connected to a relay to build circuits
app_tor_1 | Nov 13 00:14:08.000 [notice] Bootstrapped 89% (ap_handshake): Finishing handshake with a relay to build circuits
app_tor_1 | Nov 13 00:14:08.000 [notice] Bootstrapped 90% (ap_handshake_done): Handshake finished with a relay to build circuits
app_tor_1 | Nov 13 00:14:08.000 [notice] Bootstrapped 95% (circuit_create): Establishing a Tor circuit
app_tor_1 | Nov 13 00:14:09.000 [notice] Bootstrapped 100% (done): Done
app_tor_1 | Nov 13 00:14:24.000 [notice] Your network connection speed appears to have changed. Resetting timeout to 60s after 18 timeouts and 100 buildtimes.
app_tor_1 | Nov 13 00:14:25.000 [notice] Guard dragonhoard ($96B70F86B45623B4EC71E8E953DCD61A05D5BAAC) is failing more circuits than usual. Most likely this means the Tor network is overloaded. Success counts are 86/151. Use counts are 57/57. 86 circuits completed, 0 were unusable, 0 collapsed, and 0 timed out. For reference, your timeout cutoff is 60 seconds.
app_tor_1 | Nov 13 00:14:25.000 [warn] Guard dragonhoard ($96B70F86B45623B4EC71E8E953DCD61A05D5BAAC) is failing a very large amount of circuits. Most likely this means the Tor network is overloaded, but it could also mean an attack against you or potentially the guard itself. Success counts are 86/173. Use counts are 57/57. 86 circuits completed, 0 were unusable, 0 collapsed, and 0 timed out. For reference, your timeout cutoff is 60 seconds.
app_tor_1 | Nov 13 00:15:51.000 [notice] Catching signal TERM, exiting cleanly.
app_2_tor_1 | Nov 13 00:14:08.000 [notice] Bootstrapped 80% (ap_conn): Connecting to a relay to build circuits
app_2_tor_1 | Nov 13 00:14:08.000 [notice] Bootstrapped 85% (ap_conn_done): Connected to a relay to build circuits
app_2_tor_1 | Nov 13 00:14:09.000 [notice] Bootstrapped 89% (ap_handshake): Finishing handshake with a relay to build circuits
app_2_tor_1 | Nov 13 00:14:09.000 [notice] Bootstrapped 90% (ap_handshake_done): Handshake finished with a relay to build circuits
app_2_tor_1 | Nov 13 00:14:09.000 [notice] Bootstrapped 95% (circuit_create): Establishing a Tor circuit
app_2_tor_1 | Nov 13 00:14:10.000 [notice] Bootstrapped 100% (done): Done
app_2_tor_1 | Nov 13 00:14:26.000 [notice] Guard torhammer ($DA84D41783BBB6058CB9DF8C90697E8D5EA647C3) is failing more circuits than usual. Most likely this means the Tor network is overloaded. Success counts are 95/151. Use counts are 58/58. 95 circuits completed, 0 were unusable, 0 collapsed, and 0 timed out. For reference, your timeout cutoff is 60 seconds.
app_2_tor_1 | Nov 13 00:14:27.000 [warn] Guard torhammer ($DA84D41783BBB6058CB9DF8C90697E8D5EA647C3) is failing a very large amount of circuits. Most likely this means the Tor network is overloaded, but it could also mean an attack against you or potentially the guard itself. Success counts are 96/193. Use counts are 59/59. 96 circuits completed, 0 were unusable, 0 collapsed, and 0 timed out. For reference, your timeout cutoff is 60 seconds.
app_2_tor_1 | Nov 13 00:14:27.000 [notice] Your network connection speed appears to have changed. Resetting timeout to 60s after 18 timeouts and 122 buildtimes.
app_2_tor_1 | Nov 13 00:15:51.000 [notice] Catching signal TERM, exiting cleanly.

App logs

bluewallet

Attaching to

btc-rpc-explorer

Attaching to

btcpay-server

Attaching to

lightning-terminal

Attaching to

mempool

Attaching to

ride-the-lightning

Attaching to

specter-desktop

Attaching to

sphinx-relay

Attaching to

thunderhub

Attaching to

vaultwarden

Attaching to

==== Result ====

The debug script did not automatically detect any issues with your Umbrel.

When trying to SSH into umbrel, I am getting:

ssh -t umbrel@umbrel.local
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@ WARNING: POSSIBLE DNS SPOOFING DETECTED! @
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
The ECDSA host key for umbrel.local has changed,
and the key for the corresponding IP address 2601:243:c700:2b00::5524
is unknown. This could either mean that
DNS SPOOFING is happening or the IP address for the host
and its host key have changed at the same time.
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@ WARNING: REMOTE HOST IDENTIFICATION HAS CHANGED! @
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
IT IS POSSIBLE THAT SOMEONE IS DOING SOMETHING NASTY!
Someone could be eavesdropping on you right now (man-in-the-middle attack)!
It is also possible that a host key has just been changed.
The fingerprint for the ECDSA key sent by the remote host is
SHA256:nMVOAcDdDoXW20wr+3RwiJdTYFry3P0Wp7pQetMwwEA.
Please contact your system administrator.
Add correct host key in C:\Users\Glass/.ssh/known_hosts to get rid of this message.
Offending ECDSA key in C:\Users\Glass/.ssh/known_hosts:1
ECDSA host key for umbrel.local has changed and you have requested strict checking.
Host key verification failed.

Should I be worried about this, or is the IP address just changing automatically?
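If the host key changed because you reflashed the SD card, this warning is expected: a fresh image generates a new host key, which no longer matches the one saved in your known_hosts. You can clear the stale entry and reconnect:

```shell
# Remove the old umbrel.local entry from known_hosts,
# then reconnect and accept the new host key when prompted.
ssh-keygen -R umbrel.local
ssh -t umbrel@umbrel.local
```

If you did not reflash or change anything yourself, treat the warning seriously before accepting the new key.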

I am getting the error: Failed to start External Storage SDcard Updater.
From what I am reading in the forums, it may be that the updater is failing (considering I reflashed to the newest Umbrel OS available).

If I can’t SSH into the node, how can I stop it from checking for updates?

This thread, that image, and fathoming going through this issue scares the bejesus out of me.


I would like to suggest to the Umbrel development team that they implement an HDD / SSD check on reboot, or whenever an issue like the above is detected.

Raspiblitz has a script that seems to work well.

This is a great suggestion @alphaazeta.

Pinging @mayank, @lukechilds and @nevets963 on this.

Hi there, I ran all the commands and extracted the log.

The log shows everything working normally. It ends with:

Attaching to 
================
==== Result ====
================
The debug script did not automatically detect any issues with your Umbrel.

And reading through the details, there are no warnings or alerts. Yet I still get the Red Umbrella of Death.

Any idea what I could do? I was considering flashing the OS onto the SD card again, but since the logs show everything is fine, my hunch is that it’s not the OS.