1. Purpose

This page provides operational steps to:

Check system status remotely

Recover from common failure states

Roll back each hardening change if required

2. Remote availability checks (from Rowe or anywhere on tailnet)

2.1 SSH via Tailscale

ssh USER@<TAILSCALE IP ADDRESS>

2.2 Home Assistant

On-site LAN: http://<LOCAL IP ADDRESS>:8123

Via Tailscale/MagicDNS (if configured): http://<Tailscale MagicDNS>:8123

3. On-Pi quick status snapshot (first action after login)

findmnt / -o SOURCE,FSTYPE,OPTIONS

systemctl is-active jf-remount-root-rw
systemctl is-active docker
systemctl is-active tailscaled

docker ps

HA local health (HTTP 401 is OK: it means the API is answering and only rejecting the unauthenticated request):

curl -s -o /dev/null -w "HTTP %{http_code}\n" --max-time 5 http://127.0.0.1:8123/api/

journalctl -t jf_ha_healthcheck --no-pager | tail -n 50
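The curl check above can be wrapped so that the status code is interpreted the same way every time. This is a minimal sketch, not part of the installed tooling; the function names classify_ha_code and ha_local_health are made up for illustration.

```shell
# Map an HTTP status code from the HA API to a short verdict.
# 200 and 401 both mean the HA web server is answering;
# 000 is what curl prints when it received no HTTP response at all.
classify_ha_code() {
    case "$1" in
        200|401) echo "up" ;;
        000)     echo "unreachable" ;;
        *)       echo "unexpected" ;;
    esac
}

# Probe the local HA API (same URL and timeout as the command above)
# and print the raw code together with the verdict.
ha_local_health() {
    code=$(curl -s -o /dev/null -w "%{http_code}" --max-time 5 http://127.0.0.1:8123/api/)
    echo "HTTP $code: $(classify_ha_code "$code")"
}
```

Source the snippet in a shell on the Pi and run ha_local_health; anything other than "up" is a reason to continue with section 5.3.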

4. Normal recovery actions (least to most disruptive)

4.1 Restart Home Assistant container only

docker restart homeassistant
docker logs --tail 150 homeassistant

4.2 Restart Docker (brings all containers back)

sudo systemctl restart docker
docker ps

4.3 Restart Tailscale

sudo systemctl restart tailscaled
tailscale status

4.4 Soft reboot (preferred reboot method)

sudo reboot

5. Failure playbooks

5.1 Pi reachable on LAN but not on Tailscale

Actions:

SSH in from on-site using the LAN IP, then restart tailscaled and confirm that DNS resolves.

ssh USER@<LOCAL IP ADDRESS>
sudo systemctl restart tailscaled
tailscale status

getent hosts controlplane.tailscale.com
cat /etc/resolv.conf

If DNS looks wrong, check the header comment in /etc/resolv.conf to confirm the file is NetworkManager-generated (not Tailscale-generated).
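Who wrote /etc/resolv.conf can usually be read from its first comment line. The sketch below assumes the usual header texts ("Generated by NetworkManager" and a line containing "generated by tailscale"); the function name resolv_generator is hypothetical, and the patterns may need adjusting if your versions word the header differently.

```shell
# Report which tool wrote a resolv.conf-style file, based on its header comment.
# Assumption: the header contains "Generated by NetworkManager" or
# "generated by tailscale"; anything else is reported as "unknown".
resolv_generator() {
    file="${1:-/etc/resolv.conf}"
    if grep -qi "generated by networkmanager" "$file"; then
        echo "networkmanager"
    elif grep -qi "generated by tailscale" "$file"; then
        echo "tailscale"
    else
        echo "unknown"
    fi
}
```

Run resolv_generator with no argument to inspect /etc/resolv.conf. A result of "tailscale" suggests the DNS preference from the hardening step is no longer in effect.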

5.2 Docker failed / containers not running

sudo systemctl restart docker
systemctl status docker --no-pager | sed -n '1,120p'
docker ps -a
docker logs --tail 200 homeassistant
docker logs --tail 200 mosquitto

Note:

Docker's systemd start limits are disabled, so the service should not stay stuck in the failed state after repeated restart attempts.

5.3 Home Assistant is down but Linux is fine

First:

docker restart homeassistant

Then:

curl -s -o /dev/null -w "HTTP %{http_code}\n" --max-time 5 http://127.0.0.1:8123/api/

docker logs --tail 250 homeassistant

Also check what the healthcheck has logged and done:

journalctl -t jf_ha_healthcheck --no-pager | tail -n 80

5.4 Root filesystem came up RO again

Check:

findmnt / -o OPTIONS
sudo dmesg | grep -i -E "ext4|journal|I/O error|read-only|uas|usb" | tail -n 160

Immediate recovery (if SSH is possible):

sudo mount -o remount,rw /
findmnt / -o OPTIONS

If RO events repeat, treat as a storage/power/USB transport issue to investigate separately.
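The check-then-remount sequence can be combined so that the remount only runs when the root filesystem really is read-only. This is a rough sketch; the function names are illustrative and it is not the jf-remount-root-rw helper itself.

```shell
# Succeed if a comma-separated mount option string contains the "ro" option.
# Wrapping the string in commas avoids matching options such as
# errors=remount-ro.
root_opts_readonly() {
    case ",$1," in
        *,ro,*) return 0 ;;
        *)      return 1 ;;
    esac
}

# Remount / read-write only when it is currently mounted read-only.
remount_root_if_ro() {
    opts=$(findmnt -n -o OPTIONS /)
    if root_opts_readonly "$opts"; then
        echo "root is read-only, remounting rw"
        sudo mount -o remount,rw /
    else
        echo "root is already read-write"
    fi
}
```

Calling remount_root_if_ro is safe to repeat: on a healthy system it only prints a message.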

6. Automated recovery behaviour (what happens without you)

The hardware watchdog reboots the host if Linux stops responding (that is, when systemd stops feeding the watchdog).

HA healthcheck runs every minute:

  • restart HA container after 3 consecutive failures (10 min cooldown)
  • reboot host after ~15 minutes of continuous failure (with restart grace period + reboot cooldown + loop protection)
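The escalation rules above can be sketched as a small decision function. This is an illustration of the stated thresholds only and is not the contents of jf_ha_healthcheck.sh: the names are hypothetical, "~15 minutes" is approximated as 15 one-minute checks, and the restart grace period, reboot cooldown and loop protection are left out.

```shell
# Thresholds taken from the description above; the real script may differ.
RESTART_AFTER_FAILS=3      # consecutive failed checks before a container restart
RESTART_COOLDOWN_S=600     # 10 minutes between container restarts
REBOOT_AFTER_FAILS=15      # assumption: ~15 minutes = 15 one-minute checks

# Decide what a healthcheck run should do.
#   $1 = consecutive failed checks so far
#   $2 = seconds since the last container restart
decide_action() {
    fails=$1
    since_restart=$2
    if [ "$fails" -ge "$REBOOT_AFTER_FAILS" ]; then
        echo "reboot"
    elif [ "$fails" -ge "$RESTART_AFTER_FAILS" ] && [ "$since_restart" -ge "$RESTART_COOLDOWN_S" ]; then
        echo "restart"
    else
        echo "none"
    fi
}
```

For example, decide_action 5 120 prints "none" because the last restart was only two minutes ago, while decide_action 15 120 prints "reboot" because the reboot threshold takes priority over the restart cooldown.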

7. Rollback procedures

7.1 Undo Tailscale DNS preference

sudo tailscale set --accept-dns=true
sudo systemctl restart tailscaled

7.2 Remove Docker override

sudo rm -f /etc/systemd/system/docker.service.d/override.conf
sudo systemctl daemon-reload
sudo systemctl restart docker

7.3 Remove container restart policies

docker update --restart no homeassistant mosquitto

7.4 Disable/remove the root remount helper

sudo systemctl disable --now jf-remount-root-rw.service
sudo rm -f /etc/systemd/system/jf-remount-root-rw.service
sudo systemctl daemon-reload

7.5 Disable/remove HA healthcheck (timer, service, script, state)

sudo systemctl disable --now jf-ha-healthcheck.timer

sudo rm -f /etc/systemd/system/jf-ha-healthcheck.timer
sudo rm -f /etc/systemd/system/jf-ha-healthcheck.service
sudo rm -f /usr/local/sbin/jf_ha_healthcheck.sh
sudo rm -rf /var/lib/jf_ha_healthcheck

sudo systemctl daemon-reload

7.6 Disable watchdog

Actions:

Remove dtparam=watchdog=on from the boot config file.

Remove RuntimeWatchdogSec / RebootWatchdogSec from /etc/systemd/system.conf.

Reboot.
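The two manual edits above can be previewed before any file is changed. The sketch below prints a copy of a file with the watchdog lines removed and leaves the original untouched; the function name strip_watchdog_settings is hypothetical, and the path of the boot config file is deliberately not assumed here.

```shell
# Print FILE to stdout with the watchdog settings removed.
# Lines that do not match are passed through unchanged, and commented-out
# settings (lines starting with '#') are left alone.
strip_watchdog_settings() {
    sed -E \
        -e '/^[[:space:]]*dtparam=watchdog=on[[:space:]]*$/d' \
        -e '/^[[:space:]]*(RuntimeWatchdogSec|RebootWatchdogSec)=/d' \
        "$1"
}
```

A cautious way to use it is to write the output to a temporary file, compare it with the original using diff, keep a backup copy, and only then replace the original with sudo. Run it once against /etc/systemd/system.conf and once against the boot config file.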

8. Periodic maintenance checks (monthly)


Look for USB resets / IO errors that often precede RO boots

sudo dmesg | grep -i -E "I/O error|reset SuperSpeed|usb reset|uas|ext4|journal|read-only" | tail -n 200
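For a month-to-month comparison a single number is often easier to track than 200 lines of log. This small sketch counts the matching lines instead of printing them; the function name count_storage_events is illustrative, and the pattern is the same one used in the command above.

```shell
# Count kernel log lines on stdin that match the storage/USB patterns.
# grep -c exits non-zero when the count is 0, so "|| true" keeps the
# function usable in scripts that run with "set -e".
count_storage_events() {
    grep -i -c -E "I/O error|reset SuperSpeed|usb reset|uas|ext4|journal|read-only" || true
}
```

Pipe the kernel log into it (for example from dmesg) and note the result. Because the pattern also matches routine ext4 mount messages, a small non-zero count is expected; a rising count is the signal to look at the full output.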

Confirm watchdog armed

systemctl show -p RuntimeWatchdogUSec -p RebootWatchdogUSec

Confirm HA healthcheck running

systemctl list-timers --all | grep jf-ha-healthcheck
journalctl -t jf_ha_healthcheck --no-pager | tail -n 60



Page last modified on February 23, 2026, at 12:28 pm