Tip

Check out the repository on GitHub

Check out the demo at: demo.webui.ansibleguy.net | Login: User demo, Password Ansible1337

Warning

DISCLAIMER: This is an unofficial community project! Do not confuse it with the vanilla Ansible product!

Warning

This project is still in early development! DO NOT USE IN PRODUCTION!

Troubleshooting

Topology

AnsibleGuy WebUI is made of a few main components.

Troubleshooting is easier once we know in which component the error occurs.

[Image: ts_sys_ov]


Debugging

You can enable debug mode on the System - Config page.

If that is not possible, you can alternatively set the AW_DEBUG environment variable.

Debug mode SHOULD ONLY BE ENABLED TEMPORARILY! It could open attack vectors.

You might need to restart the application to apply this setting.


Versions

You can find the versions of the software packages in use on the System - Environment page.

Alternatively, you can check them from the CLI: python3 -m ansibleguy-webui.cli --version


Job Execution

If you want to troubleshoot a job execution, you first have to find out whether it is an issue with Ansible or with the WebUI system.

The Ansible execution itself can fail because of some common issues:

  • Unable to connect

    • Network issue

    • Wrong credentials supplied

    • Target system is mis-configured

  • Controller dependencies

    • Ansible needs Python modules and, in some cases, Ansible Collections and Ansible Roles to function correctly.

      These need to be installed and should be up to date.

      You can find the versions currently used by your controller system on the System - Environment page.

    • If you are using Docker, you can install those dependencies using requirements files. See Usage - Docker

  • to be continued..


Common Issues


SSH Hostkey Verification

Error: While executing Ansible you see: Host key verification failed

Problem:

  • SSH has a security feature that should keep you safe from man-in-the-middle attacks which could allow the attacker to take over your SSH account/credentials.

    See also: Ansible Docs - Hostkey Verification

  • As this security feature is important, you SHOULD NOT DISABLE IT IN PRODUCTION by adding the environment variable ANSIBLE_HOST_KEY_CHECKING=False to your jobs!

  • In production you might want to either:


Python Module not installed

Error: While executing Ansible you see: No module named '<MODULE>'

Problem:

  • Your Ansible controller system is missing a required Python3 module!

  • If you are NOT using Docker, you can install it manually using pip: python3 -m pip install <MODULE>

    You could also find and install the module using your system's package manager: sudo apt install python3-<MODULE> (NOTE: these packages are often older versions)

  • If you are using Docker, you can create and mount a requirements.txt and restart your container. See also: Usage - Docker
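
Before installing anything, it can help to confirm that the module really cannot be imported by the python3 interpreter on the controller. The following is a minimal sketch; the helper name py_module_available and the placeholder module name are assumptions for illustration and not part of AW.

```shell
# Hypothetical helper: succeeds if the given module can be imported
# by the python3 interpreter found in PATH.
py_module_available() {
    python3 -c 'import importlib, sys; importlib.import_module(sys.argv[1])' "$1" 2>/dev/null
}

MODULE="json"  # placeholder; use the name from the error message
if py_module_available "$MODULE"; then
    echo "$MODULE is importable"
else
    echo "$MODULE is missing; try: python3 -m pip install $MODULE"
fi
```

Note that the import name in the error message does not always match the package name on PyPI, so the suggested install command may need adjusting.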


CSRF Failed

Error: After submitting a form you see: Forbidden (403) CSRF verification failed. Request aborted.

Problem:

  • The hostname you are using to access AW is probably not configured as/listed in AW_HOSTNAMES


SSH Shared connection

Error: While executing Ansible you see: Shared connection to <IP> closed

Problem:

  • This seems to be an issue with how Ansible calls SSH. I have seen it happen on a few systems, even when using vanilla Ansible via the CLI.

    The issue is that a mux process has not terminated gracefully.

    Search for the process: ps -aux | grep mux and kill it: kill -9 <PID> (the PID is the number in the second column)
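
The search step can be sketched as a small filter, assuming the stale process shows the marker [mux] in its command line (as SSH connection-multiplexing processes usually do). The helper name filter_mux_pids is hypothetical; it only prints PIDs and leaves the decision to kill anything to you.

```shell
# Hypothetical helper: reads "PID COMMAND..." lines on stdin and prints
# the PID of every line whose command contains the marker "[mux]".
# The marker is assembled from two strings so that this awk process
# does not match itself in the ps output.
filter_mux_pids() {
    awk 'index($0, "[" "mux]") > 0 { print $1 }'
}

# Usage (lists candidates, kills nothing):
#   ps -A -o pid= -o args= | filter_mux_pids
```

Trying a plain kill <PID> first and falling back to kill -9 only if the process survives gives it a chance to clean up its control socket.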


SAML Issues

To get more information - you can enable its logging by adding this block to the config file:

...

SAML:
    LOGGING:
        version: 1
        formatters:
            simple:
                format: '[%(asctime)s] [%(levelname)s] [%(name)s.%(funcName)s] %(message)s'
        handlers:
            stdout:
                class: 'logging.StreamHandler'
                stream: 'ext://sys.stdout'
                level: 'DEBUG'
                formatter: 'simple'
        loggers:
            saml2:
                level: 'DEBUG'
        root:
            level: 'DEBUG'
            handlers: ['stdout']

Note: The SAML config-file is only reloaded on restart.

Common errors you might encounter:

  • CSRF validation failed - the ACS URL may not be configured correctly

  • If you see a page with an error code, you can look up its reference here

    For example:

    • 1107 means you supplied an invalid SAML configuration or the xmlsec package is not installed

    • 1110 means you might need to check your IdP's metadata and modify the NAME_ID_FORMAT setting

    • 1113 and 1114 mean your attribute mappings are missing or misconfigured

Note: SAML testing has been done using the mocksaml.com service

Edge-Case Issues


Connection in use

Error: While starting AW you see: Connection in use: ('127.0.0.1', 8000)

Problem:

  • Make sure no other process is binding to port 8000: netstat -tulpn | grep 8000

    If that is the case - you can set the AW_PORT env-var to change the port to be used.

  • The app failed last time. There is still an old process running. If this happens repeatedly - open an issue!

    You can find and kill it:

    # find it
    pgrep -f ansibleguy-webui
    netstat -tulpn | grep 8000
    ps -aux | grep ansibleguy-webui | grep -v grep
    
    # kill it
    pkill -f ansibleguy-webui
    kill -9 <PID>
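
A plain grep 8000 also matches unrelated ports such as 18000. As a sketch, the listing can be piped through a stricter filter; the helper name port_listeners is hypothetical and simply matches ":<PORT>" followed by whitespace in each line.

```shell
# Hypothetical helper: reads netstat/ss style lines on stdin and prints
# only those that contain ":<PORT>" followed by a space or tab.
port_listeners() {
    awk -v port="$1" '$0 ~ (":" port "[ \t]") { print }'
}

# Usage:
#   netstat -tulpn | port_listeners 8000
```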
    

Database is locked

Error: The Web interface shows a plain Error 500 and the console shows django.db.utils.OperationalError: database is locked

Problem:

  • I’ve encountered this issue a few times. It occurs because the SQLite database is locked by a write-operation.

    Restarting the application is the easiest way of working around it.

    If it occurs more often - please open an issue!

  • If you are running many jobs - you could try to keep a minute between their scheduled executions.


Too Many Log Files exist

Error: Job logs are currently not cleaned automatically. You may want to clean them manually periodically.

Resolution:

  • You can easily remove all log-files older than N days with this command:

MAX_LOG_AGE=7  # days
# pass the directory to find directly, so a failed 'cd' can never delete logs somewhere else
find ~/.local/share/ansible-webui/ -type f -mtime +"${MAX_LOG_AGE}" -name "*.log" -delete
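
Since the deletion cannot be undone, a preview step may be useful. The following is a minimal sketch; the helper name aw_old_logs and its "delete" argument are assumptions for illustration. It relies on the same find options as the command above.

```shell
# Hypothetical helper: list (or, with a third argument "delete", remove)
# *.log files under a directory that are older than N days.
aw_old_logs() {
    dir="$1"; days="$2"; action="${3:-list}"
    [ -d "$dir" ] || { echo "no such directory: $dir" >&2; return 1; }
    if [ "$action" = "delete" ]; then
        find "$dir" -type f -name '*.log' -mtime +"$days" -delete
    else
        find "$dir" -type f -name '*.log' -mtime +"$days" -print
    fi
}

# Usage:
#   aw_old_logs ~/.local/share/ansible-webui 7          # preview only
#   aw_old_logs ~/.local/share/ansible-webui 7 delete   # actually remove
```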

Database Migration Issues

Note: This is a general guide on how to handle Django migration issues. It could also be helpful if you are running another Django app.

Error: After a version upgrade you see django.db.utils.OperationalError: no such column or even django.db.utils.OperationalError: no such table

Problem:

  • It seems the database schema was not upgraded. This is normally done automatically at application startup.

  • You can try to execute the migrations manually:

    • Stop the application

    • Enter the application context & try to upgrade

      # when running as local service-user
      su <SERVICE-USER> --login --shell /bin/bash
      
      # when running in docker
      docker exec -it ansible-webui /bin/sh
      
      # set the path to your database
      export AW_DB=<PATH-TO-YOUR-DB>
      
      # upgrade DB schema
      python3 -m ansibleguy-webui.manage migrate
      

Error: While running the database schema upgrade you see django.db.utils.OperationalError: duplicate column name or django.db.utils.OperationalError: duplicate table name

Problem:

  • This should never happen if you are running a release version (AW_ENV=prod) and did not already run migrations manually.

  • Make sure you set the AW_DB env-var correctly before running the migrations.

  • You will have to find out which migrations were already applied:

    python3 -m ansibleguy-webui.manage showmigrations

    • Or check your database manually:

      sqlite3 <PATH-TO-YOUR-DB>
      SELECT name,applied FROM django_migrations WHERE app = 'aw';
      
    • You can also check the current schema of the table you see mentioned in the error message

      sqlite3 <PATH-TO-YOUR-DB>
      .tables
      .schema <TABLE>
      
  • Check which migrations are available: python3 -m ansibleguy-webui.cli -a migrations.list

  • With that information you should be able to determine which migrations you can fake and which ones to apply.

    # migrations that are available and already applied to the database can be faked (only the last one needs to be named)
    python3 -m ansibleguy-webui.manage migrate --fake aw 0001_v0_0_12
    
    # you should then be able to apply the un-applied migrations
    python3 -m ansibleguy-webui.manage migrate aw 0002_v0_0_13
    

Database Startup Issue

Error: While starting AW - you see the error sqlite3.DatabaseError: database disk image is malformed

Problem:

  • The service may have been force-terminated without being able to close the database connection gracefully.

    You can try to move away or remove the aw.db-shm and aw.db-wal files, which are located in the same directory as your database file.
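
Moving the files is the safer of the two options, because the -wal file can still contain changes that were not yet written into the main database. The following is a minimal sketch to run while the application is stopped; the helper name aw_stash_sqlite_sidefiles and the backup directory name are assumptions for illustration.

```shell
# Hypothetical helper: move the SQLite side files (-shm, -wal) that belong
# to a database file into a timestamped backup directory next to it.
# Prints the backup directory on success.
aw_stash_sqlite_sidefiles() {
    db="$1"
    [ -f "$db" ] || { echo "database not found: $db" >&2; return 1; }
    backup="$(dirname "$db")/sidefiles-backup-$(date +%Y%m%d-%H%M%S)"
    mkdir -p "$backup" || return 1
    for suffix in -shm -wal; do
        if [ -e "${db}${suffix}" ]; then
            mv "${db}${suffix}" "$backup/" || return 1
        fi
    done
    echo "$backup"
}

# Usage:
#   aw_stash_sqlite_sidefiles <PATH-TO-YOUR-DB>
```

If the application starts cleanly afterwards and no data is missing, the backup directory can be deleted.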