Docker will not start

My Umbrel had been running perfectly for several weeks on a Raspberry Pi 4 with an SSD, and I had made no recent changes. Power to the Pi was disconnected and then reconnected. The Pi now boots normally, but Docker will not start and reports the following error:

docker.service - Docker Application Container Engine

Loaded: loaded (/lib/systemd/system/docker.service; enabled; vendor preset: enabled)
Active: failed (Result: exit-code) since Thu 2021-09-16 04:55:04 UTC; 27s ago
Docs: https://docs.docker.com
Process: 4652 ExecStart=/usr/bin/dockerd --containerd=/run/containerd/containerd.sock (code=exited, status=2)
Main PID: 4652 (code=exited, status=2)

Sep 16 04:55:04 umbrel systemd[1]: Failed to start Docker Application Container Engine.
Sep 16 04:55:07 umbrel systemd[1]: docker.service: Start request repeated too quickly.
Sep 16 04:55:07 umbrel systemd[1]: docker.service: Failed with result 'exit-code'.
Sep 16 04:55:07 umbrel systemd[1]: Failed to start Docker Application Container Engine.
Sep 16 04:55:19 umbrel systemd[1]: docker.service: Start request repeated too quickly.
Sep 16 04:55:19 umbrel systemd[1]: docker.service: Failed with result 'exit-code'.
Sep 16 04:55:19 umbrel systemd[1]: Failed to start Docker Application Container Engine.
Sep 16 04:55:31 umbrel systemd[1]: docker.service: Start request repeated too quickly.
Sep 16 04:55:31 umbrel systemd[1]: docker.service: Failed with result 'exit-code'.
Sep 16 04:55:31 umbrel systemd[1]: Failed to start Docker Application Container Engine.

From journalctl -xe

-- The unit docker.service has entered the 'failed' state with result 'exit-code'.

Sep 16 05:04:29 umbrel systemd[1]: Failed to start Docker Application Container Engine.
-- Subject: A start job for unit docker.service has failed
-- Defined-By: systemd
-- Support: https://www.debian.org/support
--
-- A start job for unit docker.service has finished with a failure.
--
-- The job identifier is 5832 and the job result is failed.
Sep 16 05:04:29 umbrel systemd[1]: docker.socket: Failed with result 'service-start-limit-hit'.
-- Subject: Unit failed
-- Defined-By: systemd
-- Support: https://www.debian.org/support
--
-- The unit docker.socket has entered the 'failed' state with result 'service-start-limit-hit'.
Sep 16 05:04:29 umbrel sudo[28137]: pam_unix(sudo:session): session closed for user root

Suggestions?

How to fix this issue:

  • Just in case, re-flash the microSD card with the latest version of Umbrel OS (exactly the steps you followed when first installing your node, using the instructions from getumbrel.com).
  • If that still does not help, run this command over SSH:

sudo systemctl stop umbrel-startup.service && docker system prune --force --all && sudo systemctl start umbrel-startup.service

Restart your node

sudo reboot

This suggestion did not work. Even after flashing new sdcard with umbrel and booting up, then sshing in and entering the commands above to stop service, prune docker and start service again, I still get the messages below from journalctl -xe regarding docker failing to start when I had not changed anything (only power was disconnected and reconnected).

Other suggestions? I really don't want to spend 5 to 7 days syncing the bitcoin network again using Umbrel if I don't understand why this happened or how to fix it. I already have a regular bitcoin node and wanted the convenience of having that node integrated with ThunderHub and Samourai Dojo. Dealing with Docker is something I am not used to, but I am willing to learn.

    Sep 16 21:24:51 umbrel umbrel startup[15903]:     'Error while fetching server API version: {0}'.format(e)
Sep 16 21:24:51 umbrel umbrel startup[15903]: docker.errors.DockerException: Error while fetching server API version: ('Connection abor
Sep 16 21:24:51 umbrel umbrel startup[15903]: Failed to start containers
Sep 16 21:24:51 umbrel systemd[1]: umbrel-startup.service: Control process exited, code=exited, status=1/FAILURE
-- Subject: Unit process exited
-- Defined-By: systemd
-- Support: https://www.debian.org/support
--
-- An ExecStart= process belonging to unit umbrel-startup.service has exited.
--
-- The process' exit code is 'exited' and its exit status is 1.
Sep 16 21:24:51 umbrel systemd[1]: umbrel-startup.service: Failed with result 'exit-code'.
-- Subject: Unit failed
-- Defined-By: systemd
-- Support: https://www.debian.org/support
--
-- The unit umbrel-startup.service has entered the 'failed' state with result 'exit-code'.
Sep 16 21:24:51 umbrel systemd[1]: Failed to start Umbrel Startup Service.
-- Subject: A start job for unit umbrel-startup.service has failed
-- Defined-By: systemd
-- Support: https://www.debian.org/support
--
-- A start job for unit umbrel-startup.service has finished with a failure.

docker.service - Docker Application Container Engine
 Loaded: loaded (/lib/systemd/system/docker.service; enabled; vendor preset: enabled)
 Active: failed (Result: exit-code) since Thu 2021-09-16 21:37:46 UTC; 30s ago
   Docs: https://docs.docker.com
Process: 13139 ExecStart=/usr/bin/dockerd -H fd:// --containerd=/run/containerd/containerd.sock (code=exited, status=2)
Main PID: 13139 (code=exited, status=2)

Sep 16 21:37:46 umbrel systemd[1]: Stopped Docker Application Container Engine.
Sep 16 21:37:46 umbrel systemd[1]: docker.service: Start request repeated too quickly.
Sep 16 21:37:46 umbrel systemd[1]: docker.service: Failed with result 'exit-code'.
Sep 16 21:37:46 umbrel systemd[1]: Failed to start Docker Application Container Engine.
Sep 16 21:37:57 umbrel systemd[1]: docker.service: Start request repeated too quickly.
Sep 16 21:37:57 umbrel systemd[1]: docker.service: Failed with result 'exit-code'.
Sep 16 21:37:57 umbrel systemd[1]: Failed to start Docker Application Container Engine.
Sep 16 21:38:09 umbrel systemd[1]: docker.service: Start request repeated too quickly.
Sep 16 21:38:09 umbrel systemd[1]: docker.service: Failed with result 'exit-code'.
Sep 16 21:38:09 umbrel systemd[1]: Failed to start Docker Application Container Engine.

I have learned that when I SSH in and try to run "docker system prune --force --all", the Docker daemon is not running. I am not experienced enough with Docker to know whether the Docker daemon and docker are the same thing. If they are different, then how to get the Docker daemon running is now also part of my question.
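For readers with the same question: docker is the command-line client and dockerd is the daemon it talks to (the ExecStart line in the status output above launches /usr/bin/dockerd), so every docker command, prune included, fails while the daemon is down. Below is a minimal sketch of a check to run before pruning. The helper name daemon_reachable is made up for illustration, and it assumes that docker info exits non-zero when it cannot reach the daemon.

```shell
#!/bin/sh
# Hypothetical helper: succeeds only if the probe command succeeds.
# With no arguments it runs `docker info`, which needs a running daemon.
# A different probe can be passed in, e.g. `daemon_reachable docker version`.
daemon_reachable() {
    if [ "$#" -eq 0 ]; then
        set -- docker info
    fi
    "$@" >/dev/null 2>&1
}

if daemon_reachable; then
    echo "Docker daemon is reachable; docker commands should work"
else
    echo "Docker daemon is not reachable; inspect it with: sudo systemctl status docker"
fi
```

If the check fails, the prune command cannot do anything useful yet; the daemon failure has to be fixed first.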

What does this error message mean?

Sep 16 20:33:33 umbrel systemd[1]: docker.service: Main process exited, code=exited, status=2/INVALIDARGUMENT

I think I am getting closer to the issue that prevents Docker from starting. In the system log, the line below appears and then Docker exits with an error code.

 Sep 16 23:26:17 umbrel dockerd[26041]: #011/go/src/github.com/docker/docker/cmd/dockerd/docker.go:97 +0x188

Sep 16 23:26:17 umbrel systemd[1]: docker.service: Main process exited, code=exited, status=2/INVALIDARGUMENT
Sep 16 23:26:17 umbrel systemd[1]: docker.service: Failed with result 'exit-code'.
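A note on reading these lines: systemd labels any exit status of 2 as INVALIDARGUMENT, a generic LSB-style name, so the label alone does not say what dockerd objected to. Go programs also exit with status 2 after an unhandled panic, and the docker.go:97 line looks like one frame of a Go stack trace. That points to a crash rather than a rejected option, though this is an inference from the excerpt. The reason usually sits in the first lines of the trace. Below is a rough sketch for pulling those lines out of log text. The function name and the match patterns are illustrative guesses, not an official list of dockerd messages.

```shell
#!/bin/sh
# Hypothetical helper: read log text on stdin and print the first few dockerd
# lines that look like the start of a failure. Lines from systemd itself
# (such as the rate-limit messages) are skipped.
# $1 = maximum number of lines to print (default 5).
first_dockerd_errors() {
    grep 'dockerd\[' \
        | grep -E -i 'panic:|fatal|level=error|failed to start daemon' \
        | head -n "${1:-5}"
}

# Example use on the node itself (needs the systemd journal):
#   sudo journalctl -u docker.service -b --no-pager | first_dockerd_errors
```

Running dockerd in the foreground (sudo dockerd --debug) prints the same information directly, which can be easier to read than the journal.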

Hi, I have the exact same issue. Did you manage to resolve?

@mayank or @lukechilds can you assist with this situation?
Do you think a re-flash of mSD could fix it?

I tried reflash of sdcard and it did not work. I finally had to start from scratch since I could not trace the problem to the source.

@cja1 can you please try starting from scratch as @socrates pointed out:

  1. Reflash Umbrel OS on your microSD card
  2. Format your SSD
  3. Plug everything back in

Let us know if that works!

@mayank I would like to point out that I am not willing to start from scratch more than 3 times (3 strikes rule), given that the entire bitcoin history must be downloaded again each time, wasting time and bandwidth. In my experience SD cards are much less stable than SSDs, and I have had far less corruption on SSDs than on SD cards. I would prefer to boot from the SSD and run all of Umbrel plus the bitcoin block data from it, then simply make copies of the entire SSD, which is easy to do.

I have not seen an easy way to get Umbrel and all its data booting off an SSD with a Raspberry Pi, but I am hoping the Umbrel community will make this happen eventually and give us some easy instructions for running it all from the SSD. I suspect my crash was caused by the power outage corrupting the SD card, which in turn knocked Docker out.

One of my nodes is an umbrel node on a standard linux Debian machine.
Never had any issues.
I also have a UPS that protects it from brief power outages.

Thanks @mayank, I’ve probably re-flashed the microSD card 8-10 times. I have 2 microSD cards and alternate between them. I’ve also wiped the SSD and retried too.

Looking for alternative ideas!

One of my nodes is experiencing the same issue.

While running debug, the following error message is repeated through all docker containers:

docker.errors.DockerException: Error while fetching server API version: ('Connection aborted.', ConnectionRefusedError(111, 'Connection refused'))

I've reflashed the SSD and it worked for a few days, then the same issues came back.

This specific node has been running for months without issues. I suspect something in a recent upgrade but can't be sure.

Running systemctl status docker returns:

Running journalctl -xe returns:

Oct 11 11:39:19 umbrel systemd[1]: docker.service: Scheduled restart job, restart counter is at 3.
-- Subject: Automatic restarting of a unit has been scheduled
-- Defined-By: systemd
-- Support: https://www.debian.org/support
-- 
-- Automatic restarting of the unit docker.service has been scheduled, as the result for
-- the configured Restart= setting for the unit.
Oct 11 11:39:19 umbrel systemd[1]: Stopped Docker Application Container Engine.
-- Subject: A stop job for unit docker.service has finished
-- Defined-By: systemd
-- Support: https://www.debian.org/support
-- 
-- A stop job for unit docker.service has finished.
-- 
-- The job identifier is 1320 and the job result is done.
Oct 11 11:39:19 umbrel systemd[1]: docker.service: Start request repeated too quickly.
Oct 11 11:39:19 umbrel systemd[1]: docker.service: Failed with result 'exit-code'.
-- Subject: Unit failed
-- Defined-By: systemd
-- Support: https://www.debian.org/support
-- 
-- The unit docker.service has entered the 'failed' state with result 'exit-code'.
Oct 11 11:39:19 umbrel systemd[1]: Failed to start Docker Application Container Engine.
-- Subject: A start job for unit docker.service has failed
-- Defined-By: systemd
-- Support: https://www.debian.org/support
-- 
-- A start job for unit docker.service has finished with a failure.
-- 
-- The job identifier is 1320 and the job result is failed.
Oct 11 11:39:19 umbrel systemd[1]: docker.socket: Failed with result 'service-start-limit-hit'.
-- Subject: Unit failed
-- Defined-By: systemd
-- Support: https://www.debian.org/support
-- 
-- The unit docker.socket has entered the 'failed' state with result 'service-start-limit-hit'.

FIXED

Upon further investigation I was able to fix this issue. The root cause was a damaged file system. Here are the steps I took.

  1. Unmount the drive:
sudo umount /dev/sda1

If you have issues unmounting (disk is in use), type:
ps -auxw
Inspect the running processes and kill every process running as user umbrel:
sudo kill -9 <PID>

  2. Check the drive:
sudo e2fsck /dev/sda1

  3. Reboot:
sudo reboot

I may have a bad SSD on this node :(
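The steps above can be sketched as a small pre-flight check that confirms the partition is unmounted before e2fsck runs, since checking a mounted file system can cause further damage. /dev/sda1 is the device from this post and may differ on other setups. The helper name is_mounted and its optional mounts-file argument are made up for illustration. The script only prints what to do next; it does not run umount or e2fsck itself.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if device $1 is listed as a mount source
# in the mounts table ($2, default /proc/mounts).
is_mounted() {
    awk -v dev="$1" '$1 == dev { found = 1 } END { exit !found }' "${2:-/proc/mounts}"
}

DEV="${DEV:-/dev/sda1}"   # device from the post; adjust for your setup

if is_mounted "$DEV"; then
    echo "$DEV is still mounted; unmount it first: sudo umount $DEV"
else
    echo "$DEV is not mounted; the check can run: sudo e2fsck $DEV"
fi
```

If the check keeps reporting the device as mounted, some process is still using it, which is the situation the ps and kill step above deals with.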