DEV Community

# llamacpp

Posts

Benchmarking the Claude Agent SDK on a local LLM: Haiku and Sonnet tier performance

6 min read
Qwen 3.6 27B and 35B MTP vs Standard on 16GB GPU

8 min read
GGUF Quantization Explained: Q4_K_M vs Q5_K_M vs Q8 — Which to Pick (2026)

4 min read
Self-Hosted AI Agent Systems: Why Local Inference Matters More Than You Think

4 min read
Fixing Qwen 3.6 4090 llama.cpp Bug: 18 tok/s on My RTX 4090

8 min read
First Words: LLM Inference on RISC-V

9 min read
Running a 70B LLM on Pure RISC-V: The MilkV Pioneer Deployment Journey

17 min read
Ollama vs llama.cpp vs vLLM: Which Should You Use in 2026?

5 min read
llama.cpp Settings Make a 5x Performance Difference on 8GB: Finding the Optimal Values for the Key Options

4 min read
Parameter Count Is the Worst Way to Pick a Model on 8GB VRAM

5 min read
How to Run Gemma 4 Locally With Ollama, llama.cpp, and vLLM

9 min read