A 503 error with the message “The engine is currently overloaded, please try again later” means the Serverless Inference server is experiencing high traffic and cannot process your request right now.

Why this happens

During periods of high demand, the inference engine may become temporarily overloaded. This is a transient condition that typically resolves on its own as traffic subsides.

What you can do

  1. Retry after a short delay
    • Wait a few seconds before retrying your request
    • Use exponential backoff (for example, doubling the wait after each failed attempt) so your retries don't add to the congestion
  2. Spread out requests
    • If you’re sending many requests, consider spacing them out over time
    • Implement request queuing to smooth traffic spikes
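
The retry and spacing advice above can be sketched in Python as follows. This is a minimal illustration, not part of any W&B client library. `EngineOverloadedError`, `retry_with_backoff`, and `send_spaced` are hypothetical names, and the sketch assumes your own client code raises `EngineOverloadedError` when it receives the 503 "engine is currently overloaded" response. The delay values (1 s base, 30 s cap, 0.5 s spacing) are illustrative choices, not documented limits.

```python
import functools
import random
import time
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")


class EngineOverloadedError(Exception):
    """Hypothetical exception for an HTTP 503 'engine is currently overloaded' response."""


def retry_with_backoff(
    send: Callable[[], T],
    max_retries: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `send`, retrying on EngineOverloadedError with exponential backoff and full jitter."""
    if max_retries < 0:
        raise ValueError("max_retries must be >= 0")
    for attempt in range(max_retries + 1):
        try:
            return send()
        except EngineOverloadedError:
            if attempt == max_retries:
                raise  # give up and surface the 503 after the final attempt
            # The delay cap doubles each attempt (1s, 2s, 4s, ...) up to max_delay.
            cap = min(max_delay, base_delay * (2 ** attempt))
            # Full jitter: wait a random time in [0, cap] so clients don't retry in lockstep.
            sleep(random.uniform(0, cap))
    raise AssertionError("unreachable: the loop always returns or raises")


def send_spaced(
    requests: Iterable[T],
    send_one: Callable[[T], R],
    min_interval: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> List[R]:
    """Send requests one at a time, at least `min_interval` seconds apart, each with retries."""
    results: List[R] = []
    last_start = None
    for req in requests:
        if last_start is not None:
            # Pause only for whatever remains of the minimum spacing.
            wait = min_interval - (clock() - last_start)
            if wait > 0:
                sleep(wait)
        last_start = clock()
        results.append(retry_with_backoff(functools.partial(send_one, req), sleep=sleep))
    return results


# Usage: a stand-in request that is "overloaded" twice, then succeeds.
calls = {"n": 0}


def flaky_request() -> str:
    calls["n"] += 1
    if calls["n"] < 3:
        raise EngineOverloadedError("503: The engine is currently overloaded")
    return "ok"


print(retry_with_backoff(flaky_request, base_delay=0.1))  # prints "ok" after two retries
```

Full jitter (a random delay between zero and the current cap) keeps many clients that failed at the same moment from all retrying at once. If the client library you use already offers configurable retries, enabling those may be simpler than hand-rolling this loop.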
