Large Language and Vision Assistant: connects a vision encoder to an LLM for instruction following with images. An open-source research model widely used as a multimodal base; runs locally via Ollama.
Choose LLaVA when…
• You want an open-source multimodal model for self-hosted deployment (see the sketch after this list)
• You're doing research on vision-language instruction following
• You need a well-documented baseline for multimodal tasks
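A minimal visual QA sketch against a self-hosted LLaVA model, assuming the Ollama server is running, the model has been pulled with "ollama pull llava", and the official ollama Python client is installed ("pip install ollama"); the image path and question are placeholders.

```python
# Minimal visual QA sketch against a locally running Ollama server.
# Assumes "ollama pull llava" has already been run and the "ollama"
# Python client is installed.
import ollama

response = ollama.chat(
    model="llava",
    messages=[
        {
            "role": "user",
            "content": "What is shown in this image? Answer in one sentence.",
            # Local image path (placeholder); the client sends it to the
            # model alongside the text prompt.
            "images": ["./example.jpg"],
        }
    ],
)

print(response["message"]["content"])
```

The same call pattern covers captioning and document-parsing prompts; only the prompt text and image change.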
Builder Slot
How does your AI see and understand images?
Optional for most stacks
Vision-language models for image understanding, captioning, visual QA, and document parsing