Google's OSS vision-language model
Google's open-source multimodal model combining the SigLIP vision encoder with a Gemma language model. Strong at document understanding, OCR, image captioning, and visual question answering. Available via Hugging Face.
Vision-language models for image understanding, captioning, visual QA, and document parsing
AIchitect's Genome scanner detects PaliGemma in your project via these signals:
transformers, HF_TOKEN
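As an illustration, checking for these two signals (a `transformers` dependency and an `HF_TOKEN` environment variable) could be sketched as below. The function name and the inline sample data are hypothetical, not AIchitect's actual implementation:

```python
import re

def detect_paligemma(requirements_text: str, env: dict) -> dict:
    """Hypothetical sketch: report the two PaliGemma signals."""
    # Signal 1: `transformers` listed as a dependency,
    # matching the bare name or any version/extras specifier
    has_transformers = any(
        re.match(r"transformers([=<>!~\[]|$)", line.strip())
        for line in requirements_text.splitlines()
    )
    # Signal 2: a Hugging Face token present in the environment
    has_hf_token = "HF_TOKEN" in env
    return {"transformers": has_transformers, "HF_TOKEN": has_hf_token}

# Usage with inline sample data; a real scanner would read
# requirements.txt from disk and inspect os.environ instead.
signals = detect_paligemma("transformers==4.44.0\ntorch>=2.0\n", {"HF_TOKEN": "hf_xxx"})
print(signals)  # {'transformers': True, 'HF_TOKEN': True}
```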
Explore the full AI landscape
See how PaliGemma fits into the bigger picture — browse all 207 tools and their relationships.