Troubleshooting Guide

Last updated: 2026-01-05

This guide provides solutions to common issues encountered in this Docker-based infrastructure.

Issue: Container is restarting or won't start

Symptoms:

  • docker ps shows the container is restarting or exited.
  • docker-compose up -d command fails with an error.

Diagnosis:

  1. Check the logs: The first step is always to check the container's logs.

    docker-compose logs -f <service-name>
    

    Look for error messages, stack traces, or any indication of what might be wrong.

  2. Check dependencies: If the container depends on other services (e.g., a database), ensure those services are running and healthy.

    docker-compose ps
    
  3. Check configuration:

    • Environment variables: Ensure all required environment variables are set correctly in the .env file or docker-compose.yml.
    • Volumes: Verify that all volume paths are correct and that the files and directories on the host have the correct permissions. The user running the Docker container (often specified with PUID and PGID) needs to have read and write access to the volume paths.
    • Ports: Check for port conflicts. If another service on the host is using the same port, the container will fail to start. Use sudo lsof -i -P -n | grep LISTEN to check for listening ports.
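
To narrow the port check above to a single port, the lsof output can be filtered. The sketch below is an illustration only: `port_listeners` is a hypothetical helper name, and it assumes the usual `lsof -i -P -n` line format ending in `:<port> (LISTEN)`.

```shell
# Hypothetical helper: read `lsof -i -P -n` output on stdin and print only
# the LISTEN lines bound to the given port. The leading ":" and the space
# before "(LISTEN)" keep port 80 from matching 8080 or 180.
port_listeners() {
    grep -E ":$1 \(LISTEN\)"
}

# Usage (requires lsof on the host):
#   sudo lsof -i -P -n | port_listeners 8080
```

If the command prints a line, the first column names the process that already holds the port. No output means nothing on the host is listening there.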

Resolution:

  • Once the root cause is identified from the logs or configuration check, address the issue. This may involve:
    • Correcting an environment variable.
    • Fixing file permissions on a volume.
    • Changing a port mapping.
    • Restarting a dependency.
  • After applying the fix, try starting the container again:
    docker-compose up -d --force-recreate <service-name>
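
After recreating the container, it can help to confirm that it stays up rather than falling back into a restart loop. A minimal sketch, assuming the container name or ID is known (for example from `docker-compose ps`); `container_state` and `is_running` are hypothetical helper names, not part of Docker.

```shell
# Hypothetical helper: print status, exit code and restart count for one
# container. The argument is a container name or ID, not a Compose
# service name.
container_state() {
    docker inspect \
        --format '{{.State.Status}} exit={{.State.ExitCode}} restarts={{.RestartCount}}' \
        "$1"
}

# Succeeds only when the container currently reports "running".
is_running() {
    case "$(container_state "$1")" in
        running\ *) return 0 ;;
        *) return 1 ;;
    esac
}

# Usage:
#   container_state <container_name>
#   is_running <container_name> && echo "up"
```

A container in a restart loop can briefly report `running`, so running `container_state` twice a few seconds apart and comparing the `restarts` value is a more reliable check than a single call.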
    

Issue: 502 Bad Gateway from Traefik

Symptoms:

  • Accessing a service through its domain (e.g., https://books.3ddbrewery.com) results in a "502 Bad Gateway" error from Traefik.

Diagnosis:

  1. Check the Traefik dashboard: The Traefik dashboard (if accessible) provides a wealth of information about routers, services, and middleware. Look for any errors related to the service in question.

  2. Check Traefik's logs:

    docker logs traefik
    

    Look for errors related to the service, such as "no servers found".

  3. Check the service's logs:

    docker-compose logs -f <service-name>
    

    The service itself might be crashing or unhealthy.

  4. Check network connectivity:

    • Ensure the service is connected to the traefik_proxy network in its docker-compose.yml.
    • From the Traefik container, try to ping the service's container.
      docker exec -it traefik /bin/sh
      ping <container_name>
      
  5. Check Traefik labels:

    • Ensure the traefik.http.services.<service-name>.loadbalancer.server.port label in the docker-compose.yml file is set to the port the application listens on inside the container, not the port published on the host.
    • Verify that all Traefik labels are correctly formatted.
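
The network check in step 4 can also be done from the host without reading docker-compose.yml. The following is a sketch under the assumption that the container name is known; `container_networks` and `on_network` are hypothetical helper names.

```shell
# Hypothetical helper: list the Docker networks a container is attached
# to, one per line.
container_networks() {
    docker inspect \
        --format '{{range $name, $cfg := .NetworkSettings.Networks}}{{$name}}{{"\n"}}{{end}}' \
        "$1"
}

# Succeeds if the container is attached to the named network.
# -x matches the whole line and -F treats the name as a literal string.
on_network() {
    container_networks "$1" | grep -qxF "$2"
}

# Usage:
#   on_network <container_name> traefik_proxy || echo "not on traefik_proxy"
```

This reports what Docker actually attached at runtime, which can differ from the compose file if the container was not recreated after the file changed.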

Resolution:

  • Service not on traefik_proxy network: Add the service to the traefik_proxy network in its docker-compose.yml.
  • Incorrect port: Correct the port in the traefik.http.services.<service-name>.loadbalancer.server.port label.
  • Service not running: Troubleshoot the service using the "Container is restarting" guide above.

Issue: 404 Not Found from Traefik

Symptoms:

  • Accessing a service through its domain results in a "404 Not Found" error.

Diagnosis:

  1. Check the Traefik dashboard: Verify that a router has been created for the domain you are trying to access.
  2. Check the rule label: Ensure the traefik.http.routers.<service-name>.rule label is set to the correct Host(...).
  3. Check DNS: Make sure your DNS is correctly pointing the domain to the IP address of the Traefik server.
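
One way to separate a DNS problem from a router rule problem is to send the request straight to the Traefik host while keeping the original domain name. The sketch below assumes Traefik serves HTTPS on port 443; `status_via_ip` is a hypothetical helper name and the IP address is whatever the Traefik server uses.

```shell
# Hypothetical helper: request https://<domain>/ while forcing the domain
# to resolve to <traefik_ip>, and print only the HTTP status code.
# -k skips certificate verification, which is acceptable for a routing
# check but not for anything that sends credentials.
status_via_ip() {
    domain="$1"
    traefik_ip="$2"
    curl -sk -o /dev/null -w '%{http_code}\n' \
        --resolve "${domain}:443:${traefik_ip}" \
        "https://${domain}/"
}

# Usage:
#   status_via_ip <domain> <traefik_ip>
```

If this still returns 404, Traefik itself has no router matching the domain, which points at the rule label. If it returns the expected status while normal access fails, the DNS record is the more likely cause.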

Resolution:

  • Incorrect rule: Correct the Host(...) rule in the docker-compose.yml file.
  • DNS issue: Correct the DNS record for the domain.

Issue: Authentication Failures

Symptoms:

  • Being unable to log in to a service that is protected by Authelia.
  • Seeing "Unauthorized" or "Forbidden" errors.

Diagnosis:

  1. Check Authelia's logs:

    docker logs authelia
    

    Look for any errors related to the authentication attempt.

  2. Check the application's logs: The application might be rejecting the authentication for some reason.

    docker-compose logs -f <service-name>
    

    In the case of books_webv2, check the backend logs for any errors related to the Remote-User header.

  3. Check the Traefik middleware: Ensure the traefik.http.routers.<service-name>.middlewares label (note the plural) is correctly set to authelia-brewery or authelia-fails.
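
The middleware label that a running container actually carries can be read back with docker inspect. This is a sketch only: `router_middlewares` is a hypothetical helper name, and it assumes the router name used in the labels is known. Traefik's label key is the plural `middlewares`.

```shell
# Hypothetical helper: print the middlewares label for one router on one
# container. Prints an empty line when the label is not set.
router_middlewares() {
    container="$1"
    router="$2"
    docker inspect \
        --format "{{index .Config.Labels \"traefik.http.routers.${router}.middlewares\"}}" \
        "$container"
}

# Usage:
#   router_middlewares <container_name> <router_name>
```

An empty result means the router has no middleware attached, so requests reach the service without passing through Authelia.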

Resolution:

  • Restart Authelia: Authelia reads its configuration at startup, so a restart applies pending configuration changes and can clear transient failures.
    docker restart authelia
    
  • Check user credentials: Double-check the username and password.
  • Check Authelia configuration: Review Authelia's configuration.yml for any errors.

Issue: MariaDB/MySQL Replication Stopped

⚠️ CURRENT STATUS: As of January 2026, node database replication has been intentionally disabled. All applications connect directly to the primary server (192.168.1.251). This section is retained for reference if replication is re-enabled in the future.

Symptoms:

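  • Secondary database server shows Replica_IO_Running or Replica_SQL_Running as No (MariaDB and older MySQL versions report these fields as Slave_IO_Running and Slave_SQL_Running).
  • Seconds_Behind_Source (Seconds_Behind_Master on MariaDB and older MySQL versions) is NULL or shows a large or growing number.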
  • Applications using the secondary database have stale data.

Diagnosis:

  1. Check replication status on secondary server: Connect to the secondary database server using phpMyAdmin or MySQL client and run:

    SHOW REPLICA STATUS\G
    

    Or for older versions:

    SHOW SLAVE STATUS\G
    
  2. Check key fields:

    • Replica_IO_Running: Should be Yes
    • Replica_SQL_Running: Should be Yes
    • Seconds_Behind_Source: Should be 0
    • Last_Error: Should be empty - if there's an error here, it will indicate what went wrong
  3. Check primary server status:

    SHOW MASTER STATUS;
    

    Note the File and Position values.

  4. Check binary log settings: Ensure binary logging is enabled on the primary server:

    SHOW VARIABLES LIKE 'log_bin';
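
The key-field check in step 2 can be scripted by piping the status output through a small filter. A minimal sketch, assuming the vertical `\G` output format; `replication_healthy` is a hypothetical helper name, and it accepts both the Replica_* and the older Slave_* field names.

```shell
# Hypothetical helper: read `SHOW REPLICA STATUS\G` (or SHOW SLAVE STATUS\G)
# output on stdin; exit 0 only if both replication threads report "Yes".
replication_healthy() {
    awk -F': *' '
        { gsub(/^[ \t]+/, "", $1) }                       # strip \G indentation
        $1 ~ /^(Replica|Slave)_IO_Running$/  { io  = $2 }
        $1 ~ /^(Replica|Slave)_SQL_Running$/ { sql = $2 }
        END { exit !(io == "Yes" && sql == "Yes") }
    '
}

# Usage (connection options are placeholders):
#   mysql -h <secondary_host> -u <user> -p -e 'SHOW REPLICA STATUS\G' \
#       | replication_healthy || echo "replication is not running"
```

The filter only looks at the two thread flags. Lag and Last_Error still need to be read from the full output.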
    

Resolution:

Common Fix - Restart Replication:

-- On secondary server
STOP REPLICA;
START REPLICA;
SHOW REPLICA STATUS\G

If there's a specific error:

  • Skip one transaction (if error is known to be safe):
    STOP REPLICA;
    SET GLOBAL SQL_SLAVE_SKIP_COUNTER = 1;
    START REPLICA;
    
    ⚠️ Warning: Only use this if you understand the error and know it's safe to skip.

If replication is completely broken:

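  • Re-establish replication from current position. This only produces a consistent replica if the secondary's data already matches the primary at that position; otherwise, re-seed the secondary from a fresh dump of the primary first.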
    1. Get current position from primary:
      -- On primary
      SHOW MASTER STATUS;
      
    2. Reset and reconfigure replica:
      -- On secondary
      STOP REPLICA;
      CHANGE MASTER TO
          MASTER_LOG_FILE='<file from primary>',
          MASTER_LOG_POS=<position from primary>;
      START REPLICA;
      SHOW REPLICA STATUS\G
      

Prevention:

  • Monitor replication status regularly
  • Ensure both servers have sufficient disk space
  • Check network connectivity between primary and secondary servers
  • Review MariaDB error logs: /var/log/mysql/error.log
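
The disk space check above lends itself to a small scheduled script. The sketch below is one possible approach, not an existing part of this setup: `disk_over_threshold` is a hypothetical helper name and the 90 percent limit is an arbitrary example value.

```shell
# Hypothetical helper: read `df -P` output on stdin and print every
# filesystem whose usage is at or above the given percentage.
# Exits 1 if any filesystem is over the threshold, 0 otherwise.
disk_over_threshold() {
    awk -v limit="$1" '
        NR > 1 {                       # skip the df header line
            used = $5
            sub(/%$/, "", used)        # "91%" -> "91"
            if (used + 0 >= limit + 0) { print $6 " " $5; over = 1 }
        }
        END { exit (over ? 1 : 0) }
    '
}

# Usage:
#   df -P | disk_over_threshold 90 || echo "disk space is running low"
```

The non-zero exit status makes the function easy to wire into cron or any monitor that alerts on failure. Mount points that contain spaces would need extra handling.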