1. Purpose
This runbook provides operational steps to:
Check system status remotely
Recover from common failure states
Roll back each hardening change if required
2. Remote availability checks (from Rowe or anywhere on tailnet)
2.1 SSH via Tailscale
ssh USER@<TAILSCALE IP ADDRESS>
2.2 Home Assistant
On-site LAN: http://<LOCAL IP ADDRESS>:8123
Via Tailscale/MagicDNS (if configured): http://<Tailscale MagicDNS>:8123
3. On-Pi quick status snapshot (first action after login)
findmnt / -o SOURCE,FSTYPE,OPTIONS
systemctl is-active jf-remount-root-rw
systemctl is-active docker
systemctl is-active tailscaled
docker ps
HA local health (HTTP 401 is OK: the API is up and is only rejecting the unauthenticated request):
curl -s -o /dev/null -w "HTTP %{http_code}\n" --max-time 5 http://127.0.0.1:8123/api/
journalctl -t jf_ha_healthcheck --no-pager | tail -n 50
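The "401 is OK" rule can be written down as a small helper so the probe result is not misread during an incident. This is a minimal sketch: ha_api_state is a hypothetical name, not part of the installed healthcheck, and the three labels are illustrative only.

```shell
# Sketch: classify the HTTP status code returned by the HA API probe.
# ha_api_state is a hypothetical helper, not part of the installed scripts.
ha_api_state() {
  case "$1" in
    200|401) echo "up" ;;           # 401 = API alive, request is just unauthenticated
    000)     echo "unreachable" ;;  # curl reports 000 when no HTTP response arrived
    *)       echo "degraded" ;;     # anything else deserves a look at the logs
  esac
}

# Example use on the Pi (left commented out so the sketch has no side effects):
# code=$(curl -s -o /dev/null -w '%{http_code}' --max-time 5 http://127.0.0.1:8123/api/)
# ha_api_state "$code"
```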
4. Normal recovery actions (least to most disruptive)
4.1 Restart Home Assistant container only
docker restart homeassistant
docker logs --tail 150 homeassistant
4.2 Restart Docker (brings all containers back)
sudo systemctl restart docker
docker ps
4.3 Restart Tailscale
sudo systemctl restart tailscaled
tailscale status
4.4 Soft reboot (preferred reboot method)
sudo reboot
5. Failure playbooks
5.1 Pi reachable on LAN but not on Tailscale
Actions:
SSH on-site using LAN IP, then restart tailscaled and confirm DNS.
ssh USER@<LOCAL IP ADDRESS>
sudo systemctl restart tailscaled
tailscale status
getent hosts controlplane.tailscale.com
cat /etc/resolv.conf
If DNS looks wrong, confirm that /etc/resolv.conf is generated by NetworkManager, not by Tailscale.
5.2 Docker failed / containers not running
sudo systemctl restart docker
systemctl status docker --no-pager | sed -n '1,120p'
docker ps -a
docker logs --tail 200 homeassistant
docker logs --tail 200 mosquitto
Note:
Docker's systemd start limits are disabled, so the service should not remain stuck in a failed state.
5.3 Home Assistant is down but Linux is fine
First:
docker restart homeassistant
Then:
curl -s -o /dev/null -w "HTTP %{http_code}\n" --max-time 5 http://127.0.0.1:8123/api/
docker logs --tail 250 homeassistant
Also check the healthcheck actions:
journalctl -t jf_ha_healthcheck --no-pager | tail -n 80
5.4 Root filesystem came up RO again
Check:
findmnt / -o OPTIONS
dmesg | grep -i -E "ext4|journal|I/O error|read-only|uas|usb" | tail -n 160
Immediate recovery (if SSH is possible):
sudo mount -o remount,rw /
findmnt / -o OPTIONS
If RO events repeat, treat as a storage/power/USB transport issue to investigate separately.
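When reading the findmnt output, note that an ext4 root often carries errors=remount-ro in its options even while mounted read-write, so a plain substring search for "ro" gives false positives. A minimal sketch of an exact-option check follows; root_is_ro is a hypothetical helper name.

```shell
# Sketch: succeed (exit 0) only if the comma-separated mount options
# contain "ro" as a whole option, not as part of e.g. errors=remount-ro.
# root_is_ro is a hypothetical helper, not part of the installed scripts.
root_is_ro() {
  case ",$1," in
    *,ro,*) return 0 ;;
    *)      return 1 ;;
  esac
}

# Example use on the Pi (left commented out so the sketch has no side effects):
# root_is_ro "$(findmnt -n -o OPTIONS /)" && echo "root is read-only"
```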
6. Automated recovery behaviour (what happens without you)
The hardware watchdog reboots the host if Linux stops responding (that is, systemd stops feeding the watchdog).
The HA healthcheck runs every minute and:
- restarts the HA container after 3 consecutive failures (10 min cooldown)
- reboots the host after ~15 minutes of continuous failure (with restart grace period + reboot cooldown + loop protection)
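The escalation rules above can be pictured as a small decision function. This is only a sketch under stated assumptions, not the installed jf_ha_healthcheck.sh: decide_action and its three arguments are hypothetical, the thresholds are taken from the list above (3 failures, 600 s cooldown, 900 s of continuous failure), and the restart grace period, reboot cooldown and loop protection are left out for brevity.

```shell
# Sketch of the escalation logic; NOT the real healthcheck script.
# Arguments (all hypothetical names):
#   $1 = consecutive failed probes
#   $2 = seconds HA has been failing continuously
#   $3 = seconds since the last container restart
decide_action() {
  if [ "$2" -ge 900 ]; then
    echo "reboot"      # ~15 minutes of continuous failure
  elif [ "$1" -ge 3 ] && [ "$3" -ge 600 ]; then
    echo "restart"     # 3 failures in a row and the 10 min cooldown has passed
  else
    echo "none"        # healthy, or still inside the cooldown
  fi
}
```

For example, decide_action 3 180 9999 selects a container restart, while decide_action 5 300 120 does nothing because the last restart was only two minutes ago.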
7. Rollback procedures
7.1 Undo Tailscale DNS preference
sudo tailscale set --accept-dns=true
sudo systemctl restart tailscaled
7.2 Remove Docker override
sudo rm -f /etc/systemd/system/docker.service.d/override.conf
sudo systemctl daemon-reload
sudo systemctl restart docker
7.3 Remove container restart policies
docker update --restart no homeassistant mosquitto
7.4 Disable/remove the root remount helper
sudo systemctl disable --now jf-remount-root-rw.service
sudo rm -f /etc/systemd/system/jf-remount-root-rw.service
sudo systemctl daemon-reload
7.5 Disable/remove HA healthcheck (timer, service, script, state)
sudo systemctl disable --now jf-ha-healthcheck.timer
sudo rm -f /etc/systemd/system/jf-ha-healthcheck.timer
sudo rm -f /etc/systemd/system/jf-ha-healthcheck.service
sudo rm -f /usr/local/sbin/jf_ha_healthcheck.sh
sudo rm -rf /var/lib/jf_ha_healthcheck
sudo systemctl daemon-reload
7.6 Disable watchdog
Actions:
Remove dtparam=watchdog=on from the boot config file.
Remove RuntimeWatchdogSec / RebootWatchdogSec from /etc/systemd/system.conf.
Reboot.
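Before editing /etc/systemd/system.conf by hand, it may help to preview which lines would go. The filter below is a minimal sketch that reads a config on stdin and prints it without active RuntimeWatchdogSec / RebootWatchdogSec settings; strip_watchdog_settings is a hypothetical helper name, and it assumes the settings sit on their own uncommented lines.

```shell
# Sketch: print stdin without active watchdog settings.
# Commented-out defaults (lines starting with '#') are left untouched.
# strip_watchdog_settings is a hypothetical helper, not an installed script.
strip_watchdog_settings() {
  sed -E '/^[[:space:]]*(RuntimeWatchdogSec|RebootWatchdogSec)[[:space:]]*=/d'
}

# Example preview on the Pi (left commented out so the sketch has no side effects):
# strip_watchdog_settings < /etc/systemd/system.conf | diff /etc/systemd/system.conf -
```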
8. Periodic maintenance checks (monthly)
Look for USB resets / I/O errors that often precede RO boots:
dmesg | grep -i -E "I/O error|reset SuperSpeed|usb reset|uas|ext4|journal|read-only" | tail -n 200
Confirm watchdog armed:
systemctl show -p RuntimeWatchdogUSec -p RebootWatchdogUSec
Confirm HA healthcheck running:
systemctl list-timers --all | grep jf-ha-healthcheck
journalctl -t jf_ha_healthcheck --no-pager | tail -n 60
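For a month-to-month comparison, a single count of matching kernel messages can be easier to track than a screenful of log lines. The helper below is a sketch that reuses the same grep pattern as the dmesg check; count_storage_warnings is a hypothetical name, and the pattern is broad (it also matches routine ext4 mount messages), so a rising trend is more meaningful than the absolute number.

```shell
# Sketch: count log lines on stdin that match the storage/USB warning pattern.
# count_storage_warnings is a hypothetical helper, not an installed script.
count_storage_warnings() {
  # grep -c prints the count even when it is 0; "|| true" hides its exit status 1.
  grep -c -i -E "I/O error|reset SuperSpeed|usb reset|uas|ext4|journal|read-only" || true
}

# Example use on the Pi (left commented out so the sketch has no side effects):
# dmesg | count_storage_warnings
```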
