Majority of channels go offline and never come back… until reboot

Thanks for the instructions. The port was closed somehow, so I opened it to my node and then rebooted.
But the debug log keeps saying the same thing…

Bitcoin Core logs
-----------------

Attaching to bitcoin
bitcoin              | 2022-02-26T20:37:09Z New outbound peer connected: version: 70016, blocks=725047, peer=16 (outbound-full-relay)
bitcoin              | 2022-02-26T20:37:13Z New outbound peer connected: version: 70016, blocks=725047, peer=17 (block-relay-only)
bitcoin              | 2022-02-26T20:37:16Z New outbound peer connected: version: 70016, blocks=725047, peer=18 (block-relay-only)
bitcoin              | 2022-02-26T20:37:52Z UpdateTip: new best=0000000000000000000076476ffc3a097d2139a6274a26a98c08e9f7c884af0b height=725048 version=0x2000e000 log2_work=93.370910 tx=713598455 date='2022-02-26T20:37:33Z' progress=1.000000 cache=5.7MiB(42633txo)
bitcoin              | 2022-02-26T20:37:52Z BlockUntilSyncedToCurrentChain: txindex is catching up on block notifications
bitcoin              | 2022-02-26T20:38:26Z Socks5() connect to 18.162.133.34:8333 failed: general failure
bitcoin              | 2022-02-26T20:38:31Z Socks5() connect to 18.162.133.34:8333 failed: general failure
bitcoin              | 2022-02-26T20:43:40Z UpdateTip: new best=0000000000000000000610a44205b14c3b6239de23036057ce5b80bb16ab0ada height=725049 version=0x20400000 log2_work=93.370923 tx=713600546 date='2022-02-26T20:42:46Z' progress=1.000000 cache=6.6MiB(49326txo)
bitcoin              | 2022-02-26T20:45:30Z Socks5() connect to 54.37.194.43:8333 failed: InterruptibleRecv() timeout or other failure
bitcoin              | 2022-02-26T20:47:00Z Socks5() connect to 216.86.93.72:8333 failed: InterruptibleRecv() timeout or other failure
bitcoin              | 2022-02-26T20:47:22Z Socks5() connect to 188.72.203.144:8333 failed: InterruptibleRecv() timeout or other failure
bitcoin              | 2022-02-26T20:48:29Z Socks5() connect to 64.33.171.130:8333 failed: InterruptibleRecv() timeout or other failure
bitcoin              | 2022-02-26T20:49:03Z UpdateTip: new best=0000000000000000000222ea98a144e7dab0da6148236e6e4e19609611aa7b7b height=725050 version=0x27ffe000 log2_work=93.370937 tx=713602351 date='2022-02-26T20:48:31Z' progress=1.000000 cache=7.3MiB(55155txo)
bitcoin              | 2022-02-26T20:49:04Z Socks5() connect to 154.6.24.94:8333 failed: InterruptibleRecv() timeout or other failure
bitcoin              | 2022-02-26T20:50:15Z Socks5() connect to 194.14.246.8:8333 failed: InterruptibleRecv() timeout or other failure
bitcoin              | 2022-02-26T20:53:02Z UpdateTip: new best=00000000000000000009460df78eb438d4f2dcc435fc52415cd9b4067822122b height=725051 version=0x20600004 log2_work=93.370950 tx=713603985 date='2022-02-26T20:52:31Z' progress=1.000000 cache=7.7MiB(58849txo)
bitcoin              | 2022-02-26T20:55:53Z Socks5() connect to 178.112.81.138:8333 failed: general failure
bitcoin              | 2022-02-26T21:13:34Z UpdateTip: new best=000000000000000000099ac1af00057472da35b55aaaf3ca33ec3f948436c5a2 height=725052 version=0x26586004 log2_work=93.370964 tx=713606337 date='2022-02-26T21:13:23Z' progress=1.000000 cache=10.4MiB(76161txo)
bitcoin              | 2022-02-26T21:14:18Z UpdateTip: new best=00000000000000000000d20339779bee14ad1e05d6de5d4814ef1098ca7995f2 height=725053 version=0x2000e000 log2_work=93.370977 tx=713607662 date='2022-02-26T21:13:53Z' progress=1.000000 cache=10.7MiB(79189txo)
bitcoin              | 2022-02-26T21:18:58Z New outbound peer connected: version: 70016, blocks=725053, peer=23 (block-relay-only)
bitcoin              | 2022-02-26T21:21:57Z UpdateTip: new best=0000000000000000000945398a20809660a815ebf77e8f0f91a3abb919390f3b height=725054 version=0x27ffe000 log2_work=93.370991 tx=713609384 date='2022-02-26T21:21:37Z' progress=1.000000 cache=11.5MiB(85351txo)
bitcoin              | 2022-02-26T21:23:06Z UpdateTip: new best=00000000000000000004862060f857a003c0ebd27bb8ef273f5bb7e0650eaac1 height=725055 version=0x20a00004 log2_work=93.371004 tx=713609692 date='2022-02-26T21:21:56Z' progress=1.000000 cache=11.6MiB(86267txo)
bitcoin              | 2022-02-26T21:25:38Z Socks5() connect to 2600:1700:1851:6fe0::2b:8333 failed: InterruptibleRecv() timeout or other failure
bitcoin              | 2022-02-26T21:25:58Z Socks5() connect to 173.216.31.172:8333 failed: InterruptibleRecv() timeout or other failure
bitcoin              | 2022-02-26T21:26:19Z Socks5() connect to 188.255.85.37:8333 failed: InterruptibleRecv() timeout or other failure
bitcoin              | 2022-02-26T21:26:43Z Socks5() connect to 31.48.89.237:8333 failed: InterruptibleRecv() timeout or other failure
bitcoin              | 2022-02-26T21:32:56Z Socks5() connect to 87.122.9.150:8333 failed: general failure

LND logs
--------

Attaching to lnd
lnd                  |  ChainHash: (chainhash.Hash) (len=32 cap=32) 000000000019d6689c085ae165831e934ff763ae46a2a6c172b3f1b60a8ce26f,
lnd                  |  ShortChannelID: (lnwire.ShortChannelID) 633958:1809:0,
lnd                  |  Timestamp: (uint32) 1645878489,
lnd                  |  MessageFlags: (lnwire.ChanUpdateMsgFlags) 00000001,
lnd                  |  ChannelFlags: (lnwire.ChanUpdateChanFlags) 00000000,
lnd                  |  TimeLockDelta: (uint16) 40,
lnd                  |  HtlcMinimumMsat: (lnwire.MilliSatoshi) 1000 mSAT,
lnd                  |  BaseFee: (uint32) 1000,
lnd                  |  FeeRate: (uint32) 100,
lnd                  |  HtlcMaximumMsat: (lnwire.MilliSatoshi) 9900000000 mSAT,
lnd                  |  ExtraOpaqueData: (lnwire.ExtraOpaqueData) {
lnd                  |  }
lnd                  | })
lnd                  | )@6
lnd                  | 2022-02-26 21:33:57.472 [INF] SRVR: Established connection to: 033878501f9a4ce97dba9a6bba4e540eca46cb129a322eb98ea1749ed18ab67735@86.127.240.127:9735
lnd                  | 2022-02-26 21:33:57.472 [INF] SRVR: Finalizing connection to 033878501f9a4ce97dba9a6bba4e540eca46cb129a322eb98ea1749ed18ab67735@86.127.240.127:9735, inbound=false
lnd                  | 2022-02-26 21:33:57.511 [INF] PEER: NodeKey(02e9046555a9665145b0dbd7f135744598418df7d61d3660659641886ef1274844) loading ChannelPoint(b8ad58d411049040ebe8fe97986b0c095ec203bb2d34ca21259780e90084ffdf:0)
lnd                  | 2022-02-26 21:33:57.512 [INF] HSWC: Removing channel link with ChannelID(dfff8400e980972521ca342dbb03c25e090c6b9897fee8eb40900411d458adb8)
lnd                  | 2022-02-26 21:33:57.512 [INF] HSWC: ChannelLink(b8ad58d411049040ebe8fe97986b0c095ec203bb2d34ca21259780e90084ffdf:0): starting
lnd                  | 2022-02-26 21:33:57.512 [INF] HSWC: Trimming open circuits for chan_id=722806:2422:0, start_htlc_id=1429
lnd                  | 2022-02-26 21:33:57.512 [INF] HSWC: Adding live link chan_id=dfff8400e980972521ca342dbb03c25e090c6b9897fee8eb40900411d458adb8, short_chan_id=722806:2422:0
lnd                  | 2022-02-26 21:33:57.512 [INF] NTFN: New block epoch subscription
lnd                  | 2022-02-26 21:33:57.512 [INF] CNCT: Attempting to update ContractSignals for ChannelPoint(b8ad58d411049040ebe8fe97986b0c095ec203bb2d34ca21259780e90084ffdf:0)
lnd                  | 2022-02-26 21:33:57.512 [INF] HSWC: ChannelLink(b8ad58d411049040ebe8fe97986b0c095ec203bb2d34ca21259780e90084ffdf:0): HTLC manager started, bandwidth=4276891969 mSAT
lnd                  | 2022-02-26 21:33:57.512 [INF] HSWC: ChannelLink(b8ad58d411049040ebe8fe97986b0c095ec203bb2d34ca21259780e90084ffdf:0): attempting to re-synchronize
lnd                  | 2022-02-26 21:33:57.514 [INF] PEER: Negotiated chan series queries with 02e9046555a9665145b0dbd7f135744598418df7d61d3660659641886ef1274844
lnd                  | 2022-02-26 21:33:57.514 [INF] DISC: Creating new GossipSyncer for peer=02e9046555a9665145b0dbd7f135744598418df7d61d3660659641886ef1274844
lnd                  | 2022-02-26 21:33:57.520 [INF] HSWC: ChannelLink(b8ad58d411049040ebe8fe97986b0c095ec203bb2d34ca21259780e90084ffdf:0): received re-establishment message from remote side
lnd                  | 2022-02-26 21:33:57.780 [WRN] CRTR: Channel 714084423817756672 has zero cltv delta
lnd                  | 2022-02-26 21:33:58.238 [INF] PEER: disconnecting 033878501f9a4ce97dba9a6bba4e540eca46cb129a322eb98ea1749ed18ab67735@86.127.240.127:9735, reason: unable to start peer: unable to read init msg: EOF

Tor logs
--------

Attaching to umbrel_tor_server_1, tor
tor                  | Feb 26 21:33:53.000 [notice] Have tried resolving or connecting to address '[scrubbed]' at 3 different places. Giving up.
tor                  | Feb 26 21:33:53.000 [notice] Have tried resolving or connecting to address '[scrubbed]' at 3 different places. Giving up.
tor                  | Feb 26 21:33:53.000 [notice] Have tried resolving or connecting to address '[scrubbed]' at 3 different places. Giving up.
tor                  | Feb 26 21:33:55.000 [notice] Have tried resolving or connecting to address '[scrubbed]' at 3 different places. Giving up.
tor                  | Feb 26 21:33:56.000 [notice] Have tried resolving or connecting to address '[scrubbed]' at 3 different places. Giving up.
tor                  | Feb 26 21:34:02.000 [notice] Have tried resolving or connecting to address '[scrubbed]' at 3 different places. Giving up.
tor_server_1         | Feb 26 20:35:01.000 [notice] Guard Piratenpartei10 ($166850D169CC7956E77525A1A9228BC4563CFC8B) is failing more circuits than usual. Most likely this means the Tor network is overloaded. Success counts are 101/151. Use counts are 61/61. 101 circuits completed, 0 were unusable, 0 collapsed, and 0 timed out. For reference, your timeout cutoff is 60 seconds.
tor_server_1         | Feb 26 20:35:01.000 [warn] Guard Piratenpartei10 ($166850D169CC7956E77525A1A9228BC4563CFC8B) is failing a very large amount of circuits. Most likely this means the Tor network is overloaded, but it could also mean an attack against you or potentially the guard itself. Success counts are 101/203. Use counts are 61/61. 101 circuits completed, 0 were unusable, 0 collapsed, and 0 timed out. For reference, your timeout cutoff is 60 seconds.
tor_server_1         | Feb 26 20:35:04.000 [notice] Your network connection speed appears to have changed. Resetting timeout to 60000ms after 18 timeouts and 102 buildtimes.
tor_server_1         | Feb 26 20:35:09.000 [notice] Your network connection speed appears to have changed. Resetting timeout to 60000ms after 18 timeouts and 102 buildtimes.
tor_server_1         | Feb 26 20:35:10.000 [notice] Guard gbt2USicebeer06b ($D75510F5C9F356554AA47B3FB2283DA479B47574) is failing more circuits than usual. Most likely this means the Tor network is overloaded. Success counts are 100/151. Use counts are 60/60. 100 circuits completed, 0 were unusable, 0 collapsed, and 18 timed out. For reference, your timeout cutoff is 60 seconds.
tor_server_1         | Feb 26 20:35:10.000 [warn] Guard gbt2USicebeer06b ($D75510F5C9F356554AA47B3FB2283DA479B47574) is failing a very large amount of circuits. Most likely this means the Tor network is overloaded, but it could also mean an attack against you or potentially the guard itself. Success counts are 100/201. Use counts are 60/60. 100 circuits completed, 0 were unusable, 0 collapsed, and 18 timed out. For reference, your timeout cutoff is 60 seconds.
tor_server_1         | Feb 26 20:35:11.000 [notice] Guard veha ($7EDDD17E812AD07C3F0C48D5B3999BA6CB55CC2C) is failing more circuits than usual. Most likely this means the Tor network is overloaded. Success counts are 101/151. Use counts are 59/59. 101 circuits completed, 0 were unusable, 0 collapsed, and 3 timed out. For reference, your timeout cutoff is 60 seconds.
tor_server_1         | Feb 26 20:35:11.000 [warn] Guard gbt2USicebeer06b ($D75510F5C9F356554AA47B3FB2283DA479B47574) is failing an extremely large amount of circuits. This could indicate a route manipulation attack, extreme network overload, or a bug. Success counts are 99/331. Use counts are 60/60. 99 circuits completed, 0 were unusable, 0 collapsed, and 0 timed out. For reference, your timeout cutoff is 60 seconds.
tor_server_1         | Feb 26 20:35:11.000 [warn] Guard veha ($7EDDD17E812AD07C3F0C48D5B3999BA6CB55CC2C) is failing a very large amount of circuits. Most likely this means the Tor network is overloaded, but it could also mean an attack against you or potentially the guard itself. Success counts are 101/203. Use counts are 59/59. 101 circuits completed, 0 were unusable, 0 collapsed, and 3 timed out. For reference, your timeout cutoff is 60 seconds.
tor_server_1         | Feb 26 20:35:46.000 [notice] Your network connection speed appears to have changed. Resetting timeout to 60000ms after 18 timeouts and 118 buildtimes.

Can you add a screenshot of htop when the node is in the disconnected state (before the reboot)?

I had a memory leak in a process that made my Umbrel/lnd unresponsive.

Currently, 85 channels out of 200 are disconnected, and the debug log says the same thing.
I know the number of disconnected channels increases over time and they never come back, so I will reboot now…
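For reference, a minimal way to count them from the command line, assuming lncli is on the PATH and the wallet is unlocked (on Umbrel it may need to be run inside the lnd container):

#!/usr/bin/env python3
# Count active vs. inactive channels from `lncli listchannels` output.
import json
import subprocess

out = subprocess.run(["lncli", "listchannels"], capture_output=True, text=True, check=True)
channels = json.loads(out.stdout)["channels"]

# Each channel entry carries an "active" boolean.
inactive = [c for c in channels if not c.get("active")]
print(f"total channels:    {len(channels)}")
print(f"inactive channels: {len(inactive)}")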

Here’s a screenshot of htop before rebooting (cpu% and mem% panels).

lnd looks very bad: 11 GB of virtual RAM. The CPU load average of 4.85 also looks bad.
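In case a screenshot is hard to grab over SSH next time, here is a rough headless alternative; it assumes the psutil package is installed (pip install psutil):

#!/usr/bin/env python3
# Print the top memory consumers, roughly what the htop MEM column shows.
import psutil

procs = []
for p in psutil.process_iter(["pid", "name", "memory_info"]):
    mem = p.info.get("memory_info")
    if mem is None:  # process vanished or access was denied
        continue
    procs.append((mem.rss, p.info.get("name") or "?", p.info["pid"]))

# Sort by resident set size, largest first, and print the top ten.
for rss, name, pid in sorted(procs, reverse=True)[:10]:
    print(f"{rss / 1024**3:6.2f} GiB  pid={pid:<7} {name}")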

Do you know the size of your lnd DB (channel.db)? Have you ever compacted it?

Right now, channel.db is 7.6 GB. It used to be 20 GB+ at its maximum, mostly because dozens of channels had a large footprint. Closing and reopening the top 5 largest-footprint channels helped me reduce it by 12 GB, so I’m going to keep doing that.
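To find those largest-footprint channels without guessing, one option is to rank them by update count; a minimal sketch (num_updates from lncli listchannels is only a rough proxy for how much space a channel takes up in channel.db):

#!/usr/bin/env python3
# Rank channels by num_updates as a rough proxy for their channel.db footprint.
import json
import subprocess

out = subprocess.run(["lncli", "listchannels"], capture_output=True, text=True, check=True)
channels = json.loads(out.stdout)["channels"]

# Channels with the most state updates tend to occupy the most space in channel.db.
channels.sort(key=lambda c: int(c.get("num_updates", 0)), reverse=True)
for c in channels[:5]:
    print(f"{c['num_updates']:>12}  {c['channel_point']}  peer={c['remote_pubkey'][:16]}")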
Do you think the size is somewhat related to the disconnect/Tor issue?

If the majority of your peers are using that damn script charge-lnd, you are kinda fucked.
As I explained here, that script is affecting lots of nodes.


What about rebalance-lnd? Does it conflict with LNDg if both are installed on the same node?

I think LNDg is enough and better, but use it sparingly, not intensively.


It’s been a week since I migrated to more powerful hardware. I see some random disconnections, but the core issue hasn’t happened since then.

Well, it started happening once channel.db was around 14 GB. I decided to run a hybrid node with the help of this article, and it has worked pretty well so far.
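A quick sanity check you can run from a machine outside your LAN to confirm the clearnet listener actually answers; the host below is a hypothetical placeholder for your public IP or DDNS name:

#!/usr/bin/env python3
# Check that the node's clearnet P2P port accepts TCP connections.
import socket

NODE_HOST = "mynode.example.com"  # placeholder: replace with your public IP / DDNS name
NODE_PORT = 9735                  # default LND listening port

try:
    with socket.create_connection((NODE_HOST, NODE_PORT), timeout=10):
        print(f"{NODE_HOST}:{NODE_PORT} is reachable over clearnet")
except OSError as err:
    print(f"{NODE_HOST}:{NODE_PORT} is NOT reachable: {err}")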

Read this


I think a number of things influence it.

  • channel.db > 10 GB is too much for an RPi to handle. The 8 GB of RAM can cope, but it hits bottlenecks too, which is affecting connectivity.
    You addressed that with the hardware improvements and the channel compacting that Darthcoin referenced.
  • Tor starts acting up with connectivity issues when under heavy load, and 180 channels all via Tor is also a lot for a Pi to handle.
    You also addressed this by going hybrid.

So things should overall improve, I’d say.
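If you want to keep an eye on the first point, here is a small sketch that assumes the default Umbrel location of channel.db (adjust the path for other setups):

#!/usr/bin/env python3
# Warn when channel.db grows past the ~10 GB mark that seems to trouble a Pi.
from pathlib import Path

DB_PATH = Path.home() / "umbrel/lnd/data/graph/mainnet/channel.db"  # assumed Umbrel path
THRESHOLD_GB = 10

size_gb = DB_PATH.stat().st_size / 1024**3
print(f"channel.db is {size_gb:.1f} GB")
if size_gb > THRESHOLD_GB:
    print("Above ~10 GB: consider compacting or closing large-footprint channels.")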

I had the same problem. I have now uninstalled all the additional apps and only LND and RTL are running. Now everything works again.


Thanks Darth. In my case, just compacting with a reboot doesn’t do much to reduce the size of the DB. To achieve that, I have to close and re-open some of the largest-footprint channels and then reboot. Applying the process to the most active channel gets me a 1 GB+ reduction, but within 1-2 weeks the size is back to the previous level and keeps growing… it’s an endless process. I can’t wait for LND v0.15.

Thanks for the comment and the great guide. Yes, after going hybrid, things work perfectly.

Yes, indeed, closing and reopening the channels with lots of changes can help.
Yes, LND 0.15 is awaited, but I am not so sure that we will see big improvements right away. Maybe a 20% reduction, but no more.

This just happened to me. After changing a color setting in the terminal I did a reboot (not sure if I had to), and out of 50 channels only 13 came back. I uninstalled all the apps and all the channels came back online even without rebooting. You saved me, PHEW!!


I had this exact same issue as well. Looking at ‘top’, the tor process was going insane on CPU and memory. The only app I have installed besides RTL is Samourai Dojo, and I let it run to remix, which uses Tor as well.

I will see whether this issue resurfaces as long as I avoid using Samourai.


All,

I have the same issue, and I am thoroughly convinced it is a Tor issue.


tor | Aug 16 16:15:34.000 [notice] Tried for 120 seconds to get a connection to [scrubbed]:9736. Giving up. (waiting for circuit)
tor | Aug 16 16:15:35.000 [notice] Tried for 120 seconds to get a connection to [scrubbed]:9735. Giving up. (waiting for circuit)
tor | Aug 16 16:15:35.000 [notice] Tried for 120 seconds to get a connection to [scrubbed]:9735. Giving up. (waiting for circuit)
tor | Aug 16 16:15:35.000 [notice] Tried for 120 seconds to get a connection to [scrubbed]:9735. Giving up. (waiting for circuit)
tor | Aug 16 16:15:35.000 [notice] Tried for 120 seconds to get a connection to [scrubbed]:9735. Giving up. (waiting for circuit)
tor | Aug 16 16:15:35.000 [notice] Tried for 120 seconds to get a connection to [scrubbed]:9735. Giving up. (waiting for circuit)

tor_server_1 | Aug 16 15:41:18.000 [notice] Your system clock just jumped 4817 seconds forward; assuming established circuits no longer work.
tor_server_1 | Aug 16 15:42:30.000 [notice] No circuits are opened. Relaxed timeout for circuit 104 (a Hidden service: Uploading HS descriptor 4-hop circuit in state doing handshakes with channel state open) to 60000ms. However, it appears the circuit has timed out anyway.

I don’t know yet how to interact with Tor through the command line. There may be a solution in providing Tor with some guidance.
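One avenue could be Tor’s control port; here is a hedged sketch using the stem library (assumptions: stem is installed and the ControlPort is exposed on 9051 with cookie authentication, which the Umbrel tor container may not do by default):

#!/usr/bin/env python3
# Peek at Tor circuit health via the control port using stem.
from stem.control import Controller

with Controller.from_port(port=9051) as controller:
    controller.authenticate()  # cookie or password auth, depending on torrc

    established = controller.get_info("status/circuit-established")
    print(f"circuit established: {established}")

    circuits = controller.get_circuits()
    built = [c for c in circuits if c.status == "BUILT"]
    print(f"{len(built)} of {len(circuits)} circuits are in BUILT state")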

Thank you for your views.

Best, m

Small update:
I closed a few channels, including 5satoshi (very strange fee structure, automatic fee setting?) and sphinxrouting-fceb9955a9 (300+ channels, including many 20k-sat channels), and suddenly my node seems stable again.
Also, my Terminal Web score jumped up by several hundred places…

Conclusion - it seems node stability can be impacted by connected nodes - something for LND developers to build resilience against…
