DeepInfra
DeepInfra provides serverless inference for hundreds of open-source models, including Llama, Mistral, and Falcon, with pay-per-token pricing and an OpenAI-compatible API. There is no infrastructure to manage: you call the API and capacity scales automatically.
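Because the API is OpenAI-compatible, a request is just a standard chat-completion payload sent to DeepInfra's endpoint. The sketch below builds such a request with the standard library only; the endpoint path and the model ID are assumptions based on DeepInfra's public docs, so check the current model catalog before use. The network call itself is deliberately left commented out.

```python
import json
import os
import urllib.request

# Assumed OpenAI-compatible endpoint; verify against DeepInfra's docs.
DEEPINFRA_URL = "https://api.deepinfra.com/v1/openai/chat/completions"

def build_request(prompt: str,
                  model: str = "meta-llama/Meta-Llama-3-8B-Instruct"):
    """Build an OpenAI-style chat-completion request for DeepInfra.

    The model ID is a placeholder; substitute any model from the catalog.
    Returns the urllib Request object and the JSON body (for inspection).
    """
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    req = urllib.request.Request(
        DEEPINFRA_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Token is read from the environment; empty if unset.
            "Authorization": f"Bearer {os.environ.get('DEEPINFRA_API_KEY', '')}",
        },
    )
    return req, body

req, body = build_request("Say hello in one word.")
# To actually send it (requires a valid DEEPINFRA_API_KEY):
# with urllib.request.urlopen(req) as resp:
#     reply = json.load(resp)["choices"][0]["message"]["content"]
```

Because the payload shape matches OpenAI's, existing OpenAI client libraries can usually be pointed at this endpoint by overriding the base URL, rather than hand-rolling requests as above.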
Fireworks AI
Fireworks AI offers a high-performance inference API with native function calling, structured outputs, and fine-tuning for open-source models.
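Function calling on Fireworks follows the familiar OpenAI-style "tools" convention: the request carries JSON-schema descriptions of callable functions, and the model may respond with a tool call instead of plain text. The payload below is a hedged sketch of that shape; the model ID is a placeholder and the exact endpoint should be taken from Fireworks' documentation. No request is sent here.

```python
import json

# JSON-schema description of one callable function, OpenAI "tools" style.
# get_weather is a hypothetical function used purely for illustration.
weather_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}

payload = {
    # Placeholder model ID; pick a function-calling-capable model
    # from the Fireworks catalog.
    "model": "accounts/fireworks/models/llama-v3p1-8b-instruct",
    "messages": [{"role": "user", "content": "What's the weather in Paris?"}],
    "tools": [weather_tool],
}

# Round-trip through JSON to confirm the payload is serializable.
encoded = json.dumps(payload)
decoded = json.loads(encoded)
```

A response that chooses the tool would carry a `tool_calls` entry naming `get_weather` with JSON arguments, which your code executes before sending the result back in a follow-up message.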