⚠ This tool appears inactive — no commits in 90+ days. Consider an alternative.
Multimodal · Open Source · ✦ Free Tier

LLaVA

Open-source multimodal LLM assistant

22,000 stars · Health: 40 (Slowing) · App Infrastructure

About

LLaVA (Large Language and Vision Assistant) connects a vision encoder to a large language model for instruction following with images. It is an open-source research model widely used as a multimodal base, and it can be run locally via Ollama.
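If you want to try the model locally, the sketch below sends a visual question-answering request to LLaVA through Ollama's HTTP generate API. It assumes Ollama is running on its default port (11434) and that the llava model has already been pulled; the image path and prompt are illustrative, not taken from this page.

```python
import base64
import requests

# Minimal sketch, assuming a local Ollama server on the default port
# with the llava model already pulled (`ollama pull llava`).
IMAGE_PATH = "photo.jpg"  # illustrative path; replace with your own image

with open(IMAGE_PATH, "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "llava",
        "prompt": "What is shown in this image?",
        "images": [image_b64],  # Ollama accepts base64-encoded images
        "stream": False,        # ask for a single JSON response
    },
    timeout=120,
)
resp.raise_for_status()
print(resp.json()["response"])  # the model's answer as plain text
```

Because everything runs behind a local Ollama instance, the same request works in a self-hosted or air-gapped deployment; only the host name changes.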

Choose LLaVA when…

  • You want an open-source multimodal model for self-hosted deployment
  • You're doing research on vision-language instruction following
  • You need a well-documented baseline for multimodal tasks

Builder Slot

How does your AI see and understand images? (Optional for most stacks)

Vision-language models for image understanding, captioning, visual QA, and document parsing

Dev Tools: Not applicable
App Infra: Optional
Hybrid: Optional

Integrates with (1)

Ollama (LLM Infrastructure)

Pricing

✦ Free tier available

Badge

Add to your GitHub README

LLaVA on AIchitect

[![LLaVA](https://aichitect.dev/badge/tool/llava)](https://aichitect.dev/tool/llava)

Explore the full AI landscape

See how LLaVA fits into the bigger picture — browse all 207 tools and their relationships.
