Troubleshooting

This guide covers common issues encountered when running the eTeamups Platform and provides step-by-step resolutions for each scenario.

Service Startup Failures

When one or more services fail to start, begin by checking container status and logs.

Diagnose

Check the status of all containers:

docker compose ps

View logs for the failing service:

docker compose logs <service-name>

Common Causes

Cause	Symptom	Resolution
Missing environment variables	Service exits immediately with a configuration error	Ensure `docker.env` is complete and contains all required variables. Refer to the Environment Variables documentation.
Port conflicts	Bind error in logs (`address already in use`)	Identify the conflicting process with `lsof -i:<port>` and either stop the process or change the port mapping in `docker-compose.yml`.
Dependency not ready	Connection refused errors to MongoDB or Redis	Ensure MongoDB and Redis containers are healthy before starting application services. Run `docker compose ps` and verify both show a `healthy` status.
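
The first row of the table can be checked mechanically. The sketch below is a minimal helper, assuming `docker.env` uses plain `NAME=value` lines. `missing_env_vars` is a hypothetical name, and the variables in the usage example are only the ones this guide mentions, not the full required list from the Environment Variables documentation.

```shell
# Print the name of each variable that is absent from, or empty in, an env file.
# Usage: missing_env_vars <env-file> <VAR>...
# Returns 0 when every variable has a value, 1 otherwise.
missing_env_vars() {
  local file var missing
  file=$1
  shift
  missing=0
  for var in "$@"; do
    # Match "VAR=<at least one character>" at the start of a line.
    if ! grep -Eq "^${var}=.+" "$file"; then
      echo "$var"
      missing=1
    fi
  done
  return "$missing"
}
```

For example, `missing_env_vars docker.env MONGO_INITDB_ROOT_USERNAME MONGO_INITDB_ROOT_PASSWORD REDIS_PASSWORD BASE_URL CORS_ORIGIN` prints nothing when all five have values.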

Resolution Steps

Verify that docker.env exists and contains all required values.

Check for port conflicts:

lsof -i:27018   # MongoDB
lsof -i:6379    # Redis
lsof -i:9000    # Auth Service
lsof -i:9100    # Profile Service
lsof -i:9107    # Organisation Service
lsof -i:9102    # Media Service

Ensure infrastructure services are healthy before starting application services:

docker compose up -d mongodb redis
docker compose ps   # Wait until both show "healthy"
docker compose up -d
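
The wait in the second command can be scripted rather than checked by eye. This is a minimal sketch, assuming the containers define a Docker health check and use the names that appear elsewhere in this guide (`eteamups-mongodb`, `eteamups-redis`). `wait_healthy` is a hypothetical helper, not one of the platform scripts.

```shell
# Poll a container's health status until it reports "healthy" or the timeout expires.
# Usage: wait_healthy <container-name> [timeout-seconds] [poll-interval-seconds]
wait_healthy() {
  local name timeout interval elapsed status
  name=$1
  timeout=${2:-60}
  interval=${3:-2}
  elapsed=0
  status=unknown
  while [ "$elapsed" -le "$timeout" ]; do
    # .State.Health.Status is only present when the container has a health check.
    status=$(docker inspect --format '{{.State.Health.Status}}' "$name" 2>/dev/null) || status=unknown
    if [ "$status" = "healthy" ]; then
      echo "$name is healthy"
      return 0
    fi
    sleep "$interval"
    elapsed=$((elapsed + interval))
  done
  echo "$name is not healthy after ${timeout}s (last status: ${status:-unknown})" >&2
  return 1
}
```

A possible sequence is `docker compose up -d mongodb redis && wait_healthy eteamups-mongodb && wait_healthy eteamups-redis && docker compose up -d`. A container without a health check never reports `healthy`, so the helper times out for it.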

Database Connection Issues

Testing MongoDB Connectivity

Connect to MongoDB directly from the host:

mongosh "mongodb://admin:password@localhost:27018/eteamups?authSource=admin"

Check the MongoDB container logs for errors:

docker compose logs mongodb

Verify MongoDB is responding to commands:

docker exec eteamups-mongodb mongosh --eval "db.adminCommand('ping')"

Common MongoDB Issues

Issue	Symptom	Resolution
Authentication failure	`MongoServerError: Authentication failed`	Verify that `MONGO_INITDB_ROOT_USERNAME` and `MONGO_INITDB_ROOT_PASSWORD` in `docker.env` match the credentials used in connection strings. These variables typically take effect only when MongoDB initializes an empty data directory, so if you changed credentials after initial setup, you may need to remove the MongoDB volume and reinitialize: `docker compose down -v` (warning: this deletes the data in every volume defined in the Compose file, not only MongoDB's).
Connection refused	`ECONNREFUSED 127.0.0.1:27018`	Confirm the MongoDB container is running with `docker compose ps`. Verify that port `27018` is exposed in `docker-compose.yml`. Check that no firewall rules are blocking the port.
Database not initialized	Collections missing or empty	Run the application seed scripts or verify that the init scripts in the MongoDB container executed successfully by checking `docker compose logs mongodb`.

Redis Connection Issues

Testing Redis Connectivity

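Test the Redis connection from within the container. The host shell expands $REDIS_PASSWORD before the command reaches the container, so export it first with the value from docker.env: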

docker exec eteamups-redis redis-cli -a "$REDIS_PASSWORD" ping

A successful connection returns PONG.

Check Redis memory usage:

docker exec eteamups-redis redis-cli -a "$REDIS_PASSWORD" info memory

Common Redis Issues

Issue	Symptom	Resolution
Connection refused	`ECONNREFUSED` in application logs	Redis is not running or is bound to the wrong port. Check with `docker compose ps` and verify the Redis container is healthy.
Authentication error	`NOAUTH Authentication required` or `ERR invalid password`	The password configured in `docker.env` does not match the password the Redis container was initialized with. Verify `REDIS_PASSWORD` is consistent across all service configurations.
Message queue not processing	Jobs stuck in queue, workers idle	Check the message-queue worker logs with `docker compose logs message-queue`. Verify the Redis connection string used by the worker matches the running Redis instance. Restart the worker if needed: `docker compose restart message-queue`.

Docker and Container Issues

Image Pull Failures

If Docker cannot pull images from GitHub Container Registry (GHCR), re-authenticate:

echo "$GITHUB_TOKEN" | docker login ghcr.io -u USERNAME --password-stdin

Replace USERNAME with your GitHub username, and ensure the token in GITHUB_TOKEN has the read:packages scope.

Out of Memory

Check current resource consumption:

docker stats

If the host is running low on memory, add swap space:

sudo fallocate -l 2G /swapfile
sudo chmod 600 /swapfile
sudo mkswap /swapfile
sudo swapon /swapfile

To make the swap permanent, add an entry to /etc/fstab:

/swapfile swap swap defaults 0 0
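
Appending the entry by hand more than once leaves duplicate lines in the file. The sketch below is one way to make the step repeatable. `ensure_fstab_entry` is a hypothetical helper that takes the file path as an argument so it can be tried on a copy before touching the real `/etc/fstab`.

```shell
# Append an entry to an fstab-style file only if no line for that device exists yet.
# Usage: ensure_fstab_entry <fstab-file> <entry>
ensure_fstab_entry() {
  local file entry device
  file=$1
  entry=$2
  device=${entry%% *}   # first space-separated field, e.g. /swapfile
  # awk exits 0 only if some line already has this device in its first field.
  if awk -v d="$device" '$1 == d { found = 1 } END { exit !found }' "$file"; then
    echo "entry for $device already present in $file"
  else
    printf '%s\n' "$entry" >> "$file"
    echo "added entry for $device to $file"
  fi
}
```

Run it from a root shell against the real file, for example `ensure_fstab_entry /etc/fstab "/swapfile swap swap defaults 0 0"`, after confirming the result on a copy.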

Container Restart Loop

When a container continuously restarts:

Check the logs for the root cause:

docker compose logs <service-name> --tail 50

Verify all required environment variables are set in docker.env.
Check resource limits – the container may be getting killed by the OOM killer. Review docker stats output.

Inspect the container’s exit code:

docker inspect <container-name> --format='{{.State.ExitCode}}'
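
The exit code usually narrows the cause quickly. The sketch below maps the common shell and signal conventions (codes above 128 mean 128 plus a signal number) to a short hint. `explain_exit_code` is a hypothetical helper, and the hints are general conventions rather than anything specific to the platform services.

```shell
# Translate a container exit code into a short diagnostic hint.
# Usage: explain_exit_code <code>
explain_exit_code() {
  local code
  code=$1
  case "$code" in
    0)   echo "exited cleanly; check whether the main process is expected to stay running" ;;
    1)   echo "general application error; check the service logs" ;;
    126) echo "command found but not executable" ;;
    127) echo "command not found in the image" ;;
    137) echo "killed by SIGKILL (128+9); often the OOM killer or a forced stop" ;;
    143) echo "terminated by SIGTERM (128+15); usually a normal stop request" ;;
    *)
      # Any other code above 128 conventionally means "terminated by signal (code - 128)".
      if [ "$code" -gt 128 ] 2>/dev/null; then
        echo "terminated by signal $((code - 128))"
      else
        echo "application-specific exit code; check the service logs"
      fi
      ;;
  esac
}
```

For an exit code of 137, `docker inspect <container-name> --format='{{.State.OOMKilled}}'` reports whether Docker recorded an out-of-memory kill.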

Volume Permission Issues

If a service reports permission denied errors when accessing mounted volumes:

Check file ownership inside the container:

docker exec <container-name> ls -la /path/to/volume

Adjust ownership on the host if needed, matching the UID/GID used inside the container.
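
To see whether host ownership and the container user actually differ, the two can be compared as numeric IDs. This is a minimal sketch. `path_owner` and `check_volume_owner` are hypothetical helpers, and the usage line assumes the image provides `sh` and `id`.

```shell
# Print the numeric owner and group of a path as "uid:gid".
path_owner() {
  # ls -n prints numeric IDs; fields 3 and 4 are the owner UID and group GID.
  ls -ldn "$1" | awk '{ print $3 ":" $4 }'
}

# Compare a host path's ownership with the UID:GID a container process runs as.
# Usage: check_volume_owner <host-path> <uid:gid>
check_volume_owner() {
  local path expected actual
  path=$1
  expected=$2
  actual=$(path_owner "$path")
  if [ "$actual" = "$expected" ]; then
    echo "ok: $path is owned by $actual"
  else
    echo "mismatch: $path is owned by $actual, container runs as $expected"
    return 1
  fi
}
```

The container side can be read with `docker exec <container-name> sh -c 'echo "$(id -u):$(id -g)"'`. On a mismatch, `sudo chown -R <uid>:<gid> <host-path>` aligns the host directory with the container user.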

Network Issues

If services cannot communicate with each other:

List Docker networks:

docker network ls

Verify all services are attached to the same network:

docker network inspect <network-name>

Test connectivity between containers:

docker exec <container-name> wget -qO- http://<target-service>:<port>/health

SSL/TLS Certificate Issues

Verifying Certificate Files

Check that the required certificate files exist:

ls -la nginx/ssl/

The directory must contain:

fullchain.pem – The full certificate chain (server certificate plus intermediate certificates).
privkey.pem – The private key.

Generating Self-Signed Certificates for Testing

For local development and testing environments:

./scripts/generate-ssl.sh

Common SSL Issues

Issue	Symptom	Resolution
Missing certificate files	Nginx fails to start with `cannot load certificate`	Ensure `fullchain.pem` and `privkey.pem` exist in `nginx/ssl/`. Generate self-signed certificates for testing if needed.
Permission denied	Nginx cannot read certificate files	Certificates must be readable by the Nginx user. Check permissions with `ls -la nginx/ssl/` and adjust: `chmod 644 nginx/ssl/fullchain.pem` and `chmod 600 nginx/ssl/privkey.pem`.
Certificate expired	Browser shows `NET::ERR_CERT_DATE_INVALID`	Replace the expired certificate files with renewed ones and restart Nginx: `docker compose restart nginx`.
Mixed content warnings	Browser console shows mixed content errors	Ensure `BASE_URL` and `CORS_ORIGIN` in `docker.env` use `https://` rather than `http://`. All API endpoints and frontend URLs must use HTTPS in production.
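
Expiry can be caught before browsers start rejecting the certificate. The sketch below wraps `openssl x509 -checkend`, which exits 0 when the certificate does not expire within the given number of seconds. `cert_valid_for_days` is a hypothetical helper, and the 30-day window in the usage line is an arbitrary example.

```shell
# Succeed if the certificate is readable and will still be valid in <days> days.
# Usage: cert_valid_for_days <cert-file> <days>
# Returns 0 if valid for the whole window, 1 if it expires sooner, 2 if unreadable.
cert_valid_for_days() {
  local cert days
  cert=$1
  days=$2
  if [ ! -r "$cert" ]; then
    echo "cannot read $cert" >&2
    return 2
  fi
  # -checkend N exits 0 if the certificate does not expire within N seconds.
  if openssl x509 -checkend "$((days * 86400))" -noout -in "$cert" >/dev/null; then
    echo "$cert is valid for at least $days more days"
  else
    echo "$cert expires within $days days (or has already expired)"
    return 1
  fi
}
```

For example, `cert_valid_for_days nginx/ssl/fullchain.pem 30`. The exact expiry date is printed by `openssl x509 -enddate -noout -in nginx/ssl/fullchain.pem`.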

Performance Issues

High Memory Usage

Monitor memory consumption across all containers:

docker stats --no-stream --format "table {{.Name}}\t{{.MemUsage}}\t{{.MemPerc}}"

Each application service is limited to 512 MB by default (see resource limits in Monitoring). If a service consistently approaches its limit, investigate for memory leaks in the application logs.
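
Spotting services close to their limit can be automated by filtering the `docker stats` output. This is a minimal sketch. `flag_high_memory` is a hypothetical helper, and the 80% threshold is an arbitrary example rather than a platform setting.

```shell
# Read "name percent%" lines on stdin and print containers at or above a threshold.
# Usage: docker stats --no-stream --format '{{.Name}} {{.MemPerc}}' | flag_high_memory 80
flag_high_memory() {
  awk -v limit="${1:-80}" '
    {
      pct = $2
      sub(/%$/, "", pct)            # strip the trailing percent sign
      if (pct + 0 >= limit + 0) {   # adding 0 forces a numeric comparison
        printf "%s is at %s%% of its memory limit\n", $1, pct
      }
    }'
}
```

For a container with a memory limit, `MemPerc` is relative to that limit. For a container without one, it is relative to total host memory, so the message is only meaningful for limited containers.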

Slow API Responses

Query the apilogs collection in MongoDB to identify slow endpoints:

// Find requests with processing time greater than 1 second
db.apilogs.find({ processingTime: { $gt: 1000 } }).sort({ startTime: -1 }).limit(20).pretty()

// Average processing time by endpoint
db.apilogs.aggregate([
  { $group: { _id: "$url", avgTime: { $avg: "$processingTime" }, count: { $sum: 1 } } },
  { $sort: { avgTime: -1 } },
  { $limit: 20 }
])

MongoDB Slow Queries

Enable the MongoDB profiler to capture slow queries:

// Enable profiling for queries slower than 100ms
db.setProfilingLevel(1, { slowms: 100 })

// Review slow queries
db.system.profile.find().sort({ ts: -1 }).limit(10).pretty()

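Check that the fields used in the filters and sorts of the slow queries are covered by indexes; db.<collection>.getIndexes() lists the indexes on a collection. Profiling adds overhead, so turn it off with db.setProfilingLevel(0) once you have the data you need.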

Redis Memory Full

Check current Redis memory usage:

docker exec eteamups-redis redis-cli -a "$REDIS_PASSWORD" INFO memory

If Redis memory is approaching the maxmemory limit:

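Review the eviction policy (maxmemory-policy). This guide lists allkeys-lru as the default, whereas an unconfigured Redis uses noeviction, so confirm the running value with redis-cli CONFIG GET maxmemory-policy.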
Increase maxmemory in the Redis configuration if the host has available resources.

Identify large keys consuming excessive memory:

docker exec eteamups-redis redis-cli -a "$REDIS_PASSWORD" --bigkeys
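
How close Redis is to the limit can be computed from the `used_memory` and `maxmemory` fields of the `INFO memory` output. This is a minimal sketch, and `redis_mem_percent` is a hypothetical helper name.

```shell
# Read `INFO memory` output on stdin and print used memory as a percentage of maxmemory.
# Usage: docker exec eteamups-redis redis-cli -a "$REDIS_PASSWORD" INFO memory | redis_mem_percent
redis_mem_percent() {
  # redis-cli output may use CRLF line endings; strip the carriage returns first.
  tr -d '\r' | awk -F ':' '
    $1 == "used_memory" { used = $2 }
    $1 == "maxmemory"   { max = $2 }
    END {
      if (max + 0 == 0) {
        print "maxmemory is 0 (no limit configured)"
      } else {
        printf "%.1f%% of maxmemory used\n", used * 100 / max
      }
    }'
}
```

A `maxmemory` of 0 means Redis has no configured limit and the eviction policy never triggers, so memory is then bounded only by the container or host.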

Too Many Connections / Rate Limiting

The platform enforces rate limits at two levels:

Layer	Limit	Scope
Application	100 requests per 15 minutes	Per IP, applied by Express middleware
Nginx	10 requests per second	Per IP, applied at the reverse proxy level

If legitimate traffic is being rate-limited, review and adjust the limits in the application rate limiter configuration and the Nginx limit_req_zone directives.

Common Error Codes

Status Code	Meaning	Typical Cause	Resolution
`401 Unauthorized`	Missing or invalid authentication	The request does not include a valid `Bearer` token in the `Authorization` header.	Ensure the client sends a valid access token. Obtain a new token via the `/auth/login` endpoint.
`403 Forbidden`	Token expired or insufficient permissions	The access token has expired, or the refresh token is invalid.	Use the `/auth/refresh-token` endpoint to obtain a new access token. If the refresh token is also expired, the user must log in again.
`404 Not Found`	Resource does not exist	The requested account, profile, or resource was not found, or the route is invalid.	Verify the request URL is correct. Check that the referenced resource ID exists in the database.
`429 Too Many Requests`	Rate limit exceeded	The client has sent too many requests in the allowed time window.	Wait for the rate limit window to reset before retrying. Implement exponential backoff in the client.
`500 Internal Server Error`	Unhandled server error	An unexpected error occurred in the application.	Check the service logs for the full stack trace: `docker compose logs <service-name> --tail 100`.
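
The exponential backoff suggested for `429 Too Many Requests` can be sketched as a small retry loop around `curl`. This is an illustration under stated assumptions, not the platform's client code. `request_with_backoff` is a hypothetical helper, and the attempt count and starting delay are arbitrary defaults.

```shell
# Request a URL, retrying with a doubling delay while the server answers 429.
# Usage: request_with_backoff <url> [max-attempts] [initial-delay-seconds]
# Prints the final HTTP status code; returns 1 if every attempt was rate-limited.
request_with_backoff() {
  local url attempts delay attempt code
  url=$1
  attempts=${2:-5}
  delay=${3:-1}
  attempt=1
  while [ "$attempt" -le "$attempts" ]; do
    # -w '%{http_code}' prints only the status code; the body is discarded.
    code=$(curl -s -o /dev/null -w '%{http_code}' "$url")
    if [ "$code" != "429" ]; then
      echo "$code"
      return 0
    fi
    sleep "$delay"
    delay=$((delay * 2))   # 1s, 2s, 4s, ...
    attempt=$((attempt + 1))
  done
  echo "429"
  return 1
}
```

Only 429 responses are retried; any other status, including errors, is returned immediately so that the caller can handle it.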

Log Analysis for Debugging

Application Logs

View the most recent application logs for a specific service:

docker compose logs <service-name> --tail 100 -f

Nginx Logs

Use the built-in log analysis scripts:

# View logs interactively
./scripts/view-logs.sh

# Monitor logs in real time
./scripts/log-monitor.sh -r

# Analyze errors
./scripts/log-monitor.sh -e

# Analyze performance
./scripts/log-monitor.sh -p

API Request Logs in MongoDB

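Query the apilogs collection for detailed request-level debugging. If MongoDB authentication is enabled, add -u <username> -p <password> --authenticationDatabase admin after the database name: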

docker exec eteamups-mongodb mongosh eteamups --eval "db.apilogs.find().sort({startTime:-1}).limit(10).pretty()"

For more targeted queries, connect to MongoDB directly:

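// Find all requests in the last hour, excluding health checks.
// To narrow this to failed requests (5xx), add a filter on the field that stores the response status code.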
db.apilogs.find({
  startTime: { $gte: new Date(Date.now() - 3600000) },
  url: { $not: /health/ }
}).sort({ startTime: -1 })

// Find requests for a specific endpoint
db.apilogs.find({ url: { $regex: /\/auth\/login/ } }).sort({ startTime: -1 }).limit(10)

Useful Diagnostic Commands

A quick reference of commands for diagnosing platform issues:

# Run the full platform health check
./scripts/health-check.sh

# Check Docker resource usage (one-time snapshot)
docker stats --no-stream

# Check disk space on the host
df -h

# Check running containers and their status
docker compose ps

# Kill processes occupying specific ports
./scripts/kill-ports.sh

# View recent API logs from MongoDB (add -u, -p and --authenticationDatabase admin if authentication is enabled)
docker exec eteamups-mongodb mongosh eteamups --eval "db.apilogs.find().sort({startTime:-1}).limit(10).pretty()"

# Inspect a specific container's configuration
docker inspect <container-name>

# Check Docker disk usage
docker system df

# View Docker network configuration
docker network ls