Google's OSS vision-language model
Google's open-source multimodal model combining the SigLIP vision encoder with a Gemma language model. Strong at document understanding, OCR, image captioning, and visual question answering. Available via Hugging Face.
Vision-language models for image understanding, captioning, visual QA, and document parsing
AIchitect's Genome scanner detects PaliGemma in your project via these signals:
transformers, HF_TOKEN
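As an illustration, checking for these two signals (a `transformers` dependency and an `HF_TOKEN` environment variable) could be sketched as below. The function name and the inline sample data are hypothetical, not AIchitect's actual implementation:

```python
import re

def detect_paligemma(requirements_text: str, env: dict) -> dict:
    """Hypothetical sketch: report the two PaliGemma signals."""
    # Signal 1: `transformers` listed as a dependency,
    # matching the bare name or any version/extras specifier
    has_transformers = any(
        re.match(r"transformers([=<>!~\[]|$)", line.strip())
        for line in requirements_text.splitlines()
    )
    # Signal 2: a Hugging Face token present in the environment
    has_hf_token = "HF_TOKEN" in env
    return {"transformers": has_transformers, "HF_TOKEN": has_hf_token}

# Usage with inline sample data; a real scanner would read
# requirements.txt from disk and inspect os.environ instead.
signals = detect_paligemma("transformers==4.44.0\ntorch>=2.0\n", {"HF_TOKEN": "hf_xxx"})
print(signals)  # {'transformers': True, 'HF_TOKEN': True}
```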
Explore the full AI landscape
See how PaliGemma fits into the bigger picture — browse all 207 tools and their relationships.