DeepInfra
DeepInfra provides serverless inference for hundreds of open-source models including Llama, Mistral, and Falcon, with pay-per-token pricing and an OpenAI-compatible API. There is no infrastructure to manage: you call the API and it scales automatically.
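Because the API is OpenAI-compatible, a request is just an OpenAI-style chat-completions POST against DeepInfra's base URL. A minimal sketch using only the standard library is below; it builds the request without sending it, and the model slug and API key are placeholders, not guaranteed identifiers.

```python
import json
import urllib.request

# DeepInfra's OpenAI-compatible base URL (per its documented pattern).
DEEPINFRA_BASE = "https://api.deepinfra.com/v1/openai"

def build_chat_request(api_key: str, model: str, messages: list) -> urllib.request.Request:
    """Build (but do not send) an OpenAI-style chat-completions request."""
    payload = json.dumps({"model": model, "messages": messages}).encode()
    return urllib.request.Request(
        url=f"{DEEPINFRA_BASE}/chat/completions",
        data=payload,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = build_chat_request(
    api_key="YOUR_API_KEY",  # placeholder
    model="meta-llama/Meta-Llama-3-8B-Instruct",  # example slug (assumption)
    messages=[{"role": "user", "content": "Hello"}],
)
```

Sending the request with `urllib.request.urlopen(req)` (or pointing the official `openai` client at this base URL) returns the usual chat-completion JSON.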
Together AI
Together AI offers an inference API serving 200+ open-source models at competitive speeds, and is popular for running Llama, Mistral, and other open models at scale.
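Assuming Together AI also exposes an OpenAI-compatible endpoint (as its documentation describes), the same request body works against either provider; only the base URL and model slug change. A small sketch, with URLs following each provider's documented pattern (verify against current docs):

```python
import json

# OpenAI-compatible base URLs for each provider (assumed patterns).
OPENAI_COMPATIBLE_BASES = {
    "deepinfra": "https://api.deepinfra.com/v1/openai",
    "together": "https://api.together.xyz/v1",
}

def endpoint(provider: str) -> str:
    """Return the chat-completions URL for the named provider."""
    return f"{OPENAI_COMPATIBLE_BASES[provider]}/chat/completions"

# One payload, portable across both providers (model slug is an example).
payload = json.dumps({
    "model": "mistralai/Mistral-7B-Instruct-v0.2",
    "messages": [{"role": "user", "content": "Hi"}],
})
```

This portability is the practical upside of OpenAI compatibility: switching providers is a configuration change, not a code change.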