5xx Server Error Troubleshooting Guide
A systematic approach to diagnosing and resolving server-side HTTP errors. From unhandled exceptions to gateway timeouts, this guide covers each 5xx code with real-world solutions.
Understanding 5xx Errors
When a client receives a 5xx status code, it means the server acknowledged that it has a problem and cannot fulfill the request. Unlike 4xx errors, which indicate something wrong with the client's request, 5xx errors point squarely at the server infrastructure, application code, or the communication between services. The server knows the request was valid; it simply could not handle it.
The challenge with 5xx errors is that a single status code can have dozens of root causes. A 500 Internal Server Error might be caused by an unhandled null pointer exception, a database connection pool exhaustion, a misconfigured environment variable, a corrupted file upload, or any number of other issues. Effective troubleshooting requires a systematic approach that narrows down the layer where the failure occurs: infrastructure, network, application, or data.
This guide walks through each 5xx code with a structured diagnostic process. For a quick lookup of any HTTP status code, use the HTTP Status Code Reference tool.
500 Internal Server Error
The 500 Internal Server Error is the catch-all server error. It tells the client "something went wrong on our end, but we are not going to be more specific about what." This is both the most common and the most frustrating 5xx code to debug because it covers any unhandled server-side failure.
Common Causes
- Unhandled exceptions: The most frequent cause. An application throws an error that is not caught by any error handler, and the framework returns a generic 500. This includes null reference errors, type errors, division by zero, and assertion failures.
- Database errors: Connection pool exhaustion, query timeouts, deadlocks, schema mismatches after a migration, or the database server itself being unreachable.
- Configuration errors: Missing or incorrect environment variables, invalid configuration files, or misconfigured service credentials that only manifest at runtime.
- Dependency failures: A third-party API or internal microservice that the application depends on returns an unexpected response or is unreachable, and the error is not handled gracefully.
- File system issues: Permission errors when trying to read or write files, full disk preventing log writes, or missing template files.
- Memory exhaustion: The application runs out of memory (OOM) while processing a request, particularly with large file uploads, unbounded data processing, or memory leaks over time.
Diagnostic Steps
- Check application logs first. The stack trace in your application logs is the single most valuable piece of information. Look at the error message, the file and line number, and the call stack. In most cases, the logs tell you exactly what went wrong.
- Reproduce the error. Try to trigger the exact same request. If the error is intermittent, check whether it correlates with specific input data, time of day (traffic load), or a recent deployment.
- Check recent deployments. If the error started suddenly, the most likely cause is a recent code change. Review the last few commits or releases for the root cause.
- Inspect database connectivity. Run a health check query against the database. Check connection pool metrics for exhaustion. Review slow query logs for queries that might be timing out.
- Verify environment variables. A missing or incorrect environment variable is a surprisingly common cause. Verify that all required variables are set in the running environment (not just your local setup).
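The environment-variable step above can be enforced in code: validate required variables at startup so a misconfiguration fails fast and visibly, rather than surfacing later as a 500 on some request path. This is a minimal sketch; the variable names are illustrative, not from any particular application.

```javascript
// Fail fast at startup if required environment variables are missing.
// The names below are example placeholders; list your app's real ones.
const REQUIRED_ENV = ['DATABASE_URL', 'SESSION_SECRET'];

function checkEnv(env = process.env) {
  const missing = REQUIRED_ENV.filter((name) => !env[name]);
  if (missing.length > 0) {
    throw new Error(`Missing required environment variables: ${missing.join(', ')}`);
  }
}

// Call before the server starts listening, e.g.:
// checkEnv();
// app.listen(3000);
```

Crashing at boot with a clear message is far easier to diagnose than an intermittent 500 deep inside a request handler.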
Example: Debugging a 500 in Node.js
```javascript
// Add a global error handler to catch unhandled errors
app.use((err, req, res, next) => {
  // Log the full error with stack trace
  console.error('Unhandled error:', {
    message: err.message,
    stack: err.stack,
    path: req.path,
    method: req.method,
    query: req.query,
    timestamp: new Date().toISOString(),
  });
  // Return a structured error response
  res.status(500).json({
    error: 'Internal Server Error',
    requestId: req.id, // Include for support reference
  });
});

// Handle unhandled promise rejections
process.on('unhandledRejection', (reason, promise) => {
  console.error('Unhandled Rejection:', reason);
});
```

502 Bad Gateway
A 502 Bad Gateway means that a server acting as a gateway or proxy received an invalid response from an upstream server. This is a network-layer error that occurs between servers, not within your application code.
Common Causes
- Upstream server crashed: The application server (Node.js, Python, Java, etc.) behind Nginx or a load balancer has crashed and is not responding.
- Upstream server is not running: The application process was not started, failed to bind to the expected port, or was killed by the OS (OOM killer on Linux).
- Proxy misconfiguration: The reverse proxy (Nginx, Apache, HAProxy) is configured to forward requests to the wrong host, port, or socket.
- SSL/TLS mismatch: The proxy expects HTTP but the upstream speaks HTTPS, or vice versa. Protocol mismatches produce invalid responses.
- DNS resolution failure: The proxy cannot resolve the hostname of the upstream server, common in containerized environments where service names change during deployments.
Diagnostic Steps
- Check if the upstream server is running. SSH into the application server and verify the process is alive: `systemctl status your-app` or `docker ps`.
- Test the upstream directly. Bypass the proxy and make a request directly to the application server: `curl http://localhost:3000/health`.
- Check proxy error logs. Nginx logs to `/var/log/nginx/error.log` by default. Look for "upstream prematurely closed connection" or "connect() failed" messages.
- Verify proxy configuration. Confirm the upstream block in your Nginx config points to the correct host and port where your application is actually listening.
- Check network connectivity between servers. In cloud environments, security groups, network ACLs, or firewall rules may block traffic between the proxy and application servers.
503 Service Unavailable
A 503 Service Unavailable indicates that the server is temporarily unable to handle the request. Unlike 500 (which implies something broke unexpectedly), 503 communicates a known, usually temporary condition such as overload or scheduled maintenance.
Common Causes
- Server overload: The server is receiving more requests than it can handle. This can be caused by traffic spikes, DDoS attacks, or insufficient scaling.
- Scheduled maintenance: The server is intentionally taken offline for updates, migrations, or other maintenance tasks.
- Resource exhaustion: CPU, memory, or file descriptor limits have been reached, preventing the server from accepting new connections.
- Dependency unavailability: A critical dependency (database, cache, external API) is down, and the server cannot function without it.
- Deployment in progress: During rolling deployments, old instances are being shut down and new ones have not yet passed health checks.
Diagnostic Steps
- Check server resource utilization. Use `top`, `htop`, or cloud monitoring dashboards to check CPU, memory, and disk usage.
- Check for recent scaling events. If you use auto-scaling, check whether the scaling policy has been triggered and whether new instances are being provisioned.
- Look for the Retry-After header. A well-configured 503 response includes a `Retry-After` header that tells the client when to try again. Check if your application sets this header.
- Review load balancer health checks. Instances failing health checks are removed from the load balancer, reducing capacity and potentially causing more 503s in a cascading failure.
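On the client side, the Retry-After header is only useful if callers honor it. A sketch of a retrying client, assuming Node 18+ for the global `fetch` and handling only the delay-in-seconds form of the header (it can also be an HTTP date):

```javascript
// Retry a request when the server answers 503, waiting as long as
// the Retry-After header asks (seconds form only, for brevity).
async function fetchWithRetry(url, maxRetries = 3) {
  for (let attempt = 0; attempt <= maxRetries; attempt += 1) {
    const res = await fetch(url);
    if (res.status !== 503 || attempt === maxRetries) return res;
    const retryAfter = parseInt(res.headers.get('Retry-After') ?? '1', 10);
    await new Promise((resolve) => setTimeout(resolve, retryAfter * 1000));
  }
}
```

A production client would also cap the total wait and add jitter so many clients do not retry in lockstep.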
Best Practice: Graceful Degradation
```javascript
// Return 503 with Retry-After during maintenance or overload
app.use((req, res, next) => {
  if (isMaintenanceMode()) {
    res.set('Retry-After', '300'); // Retry after 5 minutes
    return res.status(503).json({
      error: 'Service Unavailable',
      message: 'We are performing scheduled maintenance.',
      retryAfter: 300,
    });
  }
  next();
});
```

504 Gateway Timeout
A 504 Gateway Timeout occurs when a server acting as a gateway or proxy does not receive a timely response from the upstream server. The key distinction from 502 is that with 504, the upstream server is reachable but did not respond within the configured timeout period.
Common Causes
- Slow backend processing: The application is taking too long to generate the response. This is often caused by expensive database queries, external API calls, or complex computations.
- Database query timeout: A query is locked waiting for another transaction, is scanning too many rows without an index, or the database server is overloaded.
- External API latency: The application makes a synchronous call to an external API that is slow or unresponsive, blocking the response.
- Proxy timeout too short: The proxy's timeout is configured shorter than the expected response time for certain endpoints (e.g., report generation, file processing).
- Network latency: High latency between the proxy and application servers, particularly in multi-region or cross-cloud deployments.
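One common mitigation for the slow-external-API cause is to bound the call with your own timeout, shorter than the proxy's, so the application fails fast with a clear error instead of hanging until the gateway gives up. A sketch assuming Node 18+ (`fetch` and `AbortSignal.timeout` are built in); the URL and timeout value are illustrative:

```javascript
// Bound an external API call so a slow dependency produces a clear,
// fast failure rather than holding the request open into a 504.
async function callExternalApi(url, timeoutMs = 2000) {
  try {
    return await fetch(url, { signal: AbortSignal.timeout(timeoutMs) });
  } catch (err) {
    if (err.name === 'TimeoutError' || err.name === 'AbortError') {
      throw new Error(`Upstream ${url} timed out after ${timeoutMs}ms`);
    }
    throw err;
  }
}
```

The caller can then map this error to a meaningful response (for example 502 or 504 with details) instead of letting the proxy time out silently.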
Diagnostic Steps
- Identify which requests are timing out. Check access logs for requests with long response times. Look for patterns: specific endpoints, certain users, or particular input sizes.
- Check slow query logs. Enable and review database slow query logs to find queries that take longer than the proxy timeout.
- Review proxy timeout configuration. In Nginx, check `proxy_read_timeout`, `proxy_connect_timeout`, and `proxy_send_timeout`. Increase them for endpoints that legitimately need more time.
- Profile the application. Use profiling tools (Node.js: `clinic.js`; Python: `cProfile`; Java: `async-profiler`) to identify slow code paths.
- Consider asynchronous processing. For long-running operations, return a 202 Accepted immediately and process the work in the background, allowing the client to poll for results.
General Troubleshooting Checklist
When you encounter any 5xx error, work through this checklist in order. The goal is to quickly narrow down the layer where the failure occurs:
- Check application logs for stack traces and error messages
- Check infrastructure metrics (CPU, memory, disk, network)
- Check recent deployments and configuration changes
- Check dependency health (database, cache, external APIs)
- Check proxy/load balancer logs for upstream connection errors
- Check DNS resolution between services
- Check TLS certificates for expiration or misconfiguration
- Reproduce with a minimal request to isolate the trigger
For a complete reference of all HTTP status codes including the full 5xx range, see our HTTP Status Code Cheat Sheet. To look up any individual code with code snippets and RFC references, use the HTTP Status Code Reference tool.
Further Reading
- RFC 9110 — HTTP Semantics (Server Error)
The IETF specification defining 5xx server error status codes.
- MDN HTTP status codes
Complete reference for all HTTP status codes with descriptions.
- Cloudflare 5xx errors guide
Troubleshooting guide for 5xx errors behind Cloudflare's reverse proxy.