Replicate
Cloud platform for running thousands of open-source ML models through a hosted HTTP API and official client libraries. Supports LLMs as well as image-generation, audio, and video models.
Fal.ai
Developer API platform for running generative image, video, and audio models, plus speech recognition (Flux, SDXL, Whisper, and more), at low latency. Popular as a serverless GPU layer for multimodal AI apps, with Python and JavaScript SDKs and pay-per-use pricing.