ML From the Command Line — Classify, Embed, and Search Without Writing Code
Run sentiment analysis, generate embeddings, detect toxicity, and search documents from your terminal. Pipes into any script or CI pipeline. No Python, no API keys.
$ kjarni classify "I love this product!" --model roberta-sentiment
Input "I love this product!"
✓ positive ████████████████████ 98.5%
neutral ░░░░░░░░░░░░░░░░░░░░ 1.1%
negative ░░░░░░░░░░░░░░░░░░░░ 0.5%
$ kjarni similarity doctor physician
█████████████████░░░ 86.0% highly similar
↔ "doctor"
↔ "physician"
Kjarni is a single binary. Install it, run it. Models download on first use and cache locally.
Install
curl -fsSL https://kjarni.ai/install.sh | sh
No runtime to install and no third-party dependencies. The binary links only against standard system libraries (glibc, including libm, and the GCC runtime library libgcc_s):
$ ldd $(which kjarni)
linux-vdso.so.1
libgcc_s.so.1
libm.so.6
libc.so.6
/lib64/ld-linux-x86-64.so.2
Sentiment Analysis
The default model (distilbert-sentiment) does binary positive/negative classification:
$ kjarni classify "Best purchase I've made this year"
✓ POSITIVE ████████████████████ 100.0%
NEGATIVE ░░░░░░░░░░░░░░░░░░░░ 0.0%
For three-class sentiment (positive/negative/neutral), use roberta-sentiment:
$ kjarni classify "It's okay I guess" --model roberta-sentiment
✓ positive ██████████░░░░░░░░░░ 52.9%
neutral ████████░░░░░░░░░░░░ 38.2%
negative █░░░░░░░░░░░░░░░░░░░ 8.9%
The model picks up on hedging — "okay I guess" is barely positive at 52.9%.
Toxicity Detection
Switch to toxic-bert for content moderation:
$ kjarni classify "i hate mondays" --model toxic-bert
✓ toxic ██████████████░░░░░░ 69.8%
obscene ░░░░░░░░░░░░░░░░░░░░ 1.1%
insult ░░░░░░░░░░░░░░░░░░░░ 0.9%
threat ░░░░░░░░░░░░░░░░░░░░ 0.5%
identity_hate ░░░░░░░░░░░░░░░░░░░░ 0.4%
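toxic-bert is a multi-label classifier: each category is scored independently, so the scores do not need to sum to 100% and a comment can be both toxic and an insult. Set a threshold (say 80%) and flag content above it. At that threshold, the example above (69.8% toxic) would not be flagged.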
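The thresholding step can be sketched as a small shell helper. This is a minimal sketch, not part of Kjarni: flag_labels is a hypothetical name, and the usage comment assumes toxic-bert's JSON output has the same predictions array of label/score pairs that the sentiment example in the JSON section shows.

```shell
# flag_labels THRESHOLD   (hypothetical helper, not a kjarni command)
# Reads "label score" pairs on stdin, one per line, and prints every label
# whose score is strictly above THRESHOLD. awk does the comparison, so
# scores printed in scientific notation (e.g. 1.5e-05) compare correctly.
flag_labels() {
  awk -v t="$1" '$2 + 0 > t + 0 { print $1 }'
}

# Assumed usage (requires kjarni and jq):
#   kjarni classify "$text" --model toxic-bert --format json \
#     | jq -r '.predictions[] | "\(.label) \(.score)"' \
#     | flag_labels 0.8
```

Because the labels are independent, the helper prints zero, one, or several labels per input rather than a single winner.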
JSON Output for Scripting
Add --format json to get structured output:
$ kjarni classify "Great service" --format json
{
"label": "POSITIVE",
"label_index": 1,
"predictions": [
{
"label": "POSITIVE",
"score": 0.9998435
},
{
"label": "NEGATIVE",
"score": 0.00015648185
}
],
"score": 0.9998435,
"text": "Great service"
}
Pipe into jq for extraction:
$ kjarni classify "Great service" --format json | jq '.label'
"POSITIVE"
$ kjarni classify "Great service" --format json | jq '.predictions'
[
{ "label": "POSITIVE", "score": 0.9998435 },
{ "label": "NEGATIVE", "score": 0.00015648185 }
]
Batch Processing
Classify a file of reviews, one per line:
$ while IFS= read -r line; do
  echo "$line → $(kjarni classify "$line" --format json | jq -r '.label')"
done < reviews.txt
Fast shipping, great product → POSITIVE
Arrived damaged, no response from support → NEGATIVE
Best purchase I've made this year → POSITIVE
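For a larger file, a count per label is often more useful than the line-by-line listing. The helper below is a sketch using only sort, uniq, and awk; tally_labels is a hypothetical name, and it assumes one label per input line, as produced by jq -r '.label'.

```shell
# tally_labels   (hypothetical helper, not a kjarni command)
# Reads one label per line on stdin and prints "label<TAB>count",
# most frequent first; ties are broken alphabetically by label.
tally_labels() {
  LC_ALL=C sort | uniq -c | LC_ALL=C sort -k1,1nr -k2,2 | awk '{ print $2 "\t" $1 }'
}

# Assumed usage (requires kjarni and jq):
#   while IFS= read -r line; do
#     kjarni classify "$line" --format json | jq -r '.label'
#   done < reviews.txt | tally_labels
```

For the three reviews above, this would report POSITIVE 2 and NEGATIVE 1.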
Embeddings
Generate a 384-dimensional vector from any text:
$ kjarni embed "hello world"
0.16229372 0.042872876 0.067300394 0.22431187 -0.12369463 ...
Space-separated floats — one vector, ready to store or compare.
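If you store vectors yourself, comparing two of them takes a few lines of awk. The sketch below computes cosine similarity, the usual metric for sentence embeddings; whether kjarni similarity uses exactly this metric internally is an assumption. cosine is a hypothetical helper name, and it expects each vector on its own line in the space-separated format shown above.

```shell
# cosine   (hypothetical helper, not a kjarni command)
# Reads exactly two vectors from stdin, one per line, as space-separated
# floats, and prints their cosine similarity to four decimal places.
cosine() {
  awk '
    NR == 1 { n = NF; for (i = 1; i <= NF; i++) a[i] = $i; next }
    NR == 2 {
      if (NF != n) { print "dimension mismatch" > "/dev/stderr"; exit 1 }
      for (i = 1; i <= n; i++) {
        dot += a[i] * $i      # running dot product
        na  += a[i] * a[i]    # squared norm of the first vector
        nb  += $i * $i        # squared norm of the second vector
      }
      if (na == 0 || nb == 0) { print "zero vector" > "/dev/stderr"; exit 1 }
      printf "%.4f\n", dot / (sqrt(na) * sqrt(nb))
    }'
}

# Assumed usage (requires kjarni):
#   { kjarni embed "doctor"; kjarni embed "physician"; } | cosine
```

The result ranges from -1 to 1, so it will not necessarily match the percentage that the similarity command prints.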
Semantic Similarity
Compare two texts by meaning:
$ kjarni similarity doctor physician
█████████████████░░░ 86.0% highly similar
↔ "doctor"
↔ "physician"
$ kjarni similarity doctor banana
██████░░░░░░░░░░░░░░ 33.8% somewhat similar
↔ "doctor"
↔ "banana"
The model recognizes that "doctor" and "physician" mean nearly the same thing even though the two words look nothing alike.
Index and Search
Create an index from a folder of text files:
$ kjarni index create ./my-index.idx ./docs/
Indexed 15 documents
✓ Index created: ./my-index.idx
Documents: 15
Dimension: 384
Size: 39.52 KB
Search by meaning:
$ kjarni search ./my-index.idx "war"
Results for "war"
1. ./docs/romanempire.txt
████████████████████ 100.0%
"The Roman Empire collapsed in 476 AD after centuries of political insta…"
2. ./docs/industrialrevolution.txt
█████████████████░░░ 87.5%
"The Industrial Revolution began in Britain with mechanized textile prod…"
3. ./docs/blackholes.txt
███████████████░░░░░ 75.3%
"Black holes form when massive stars exhaust their nuclear fuel and unde…"
The index combines BM25 keyword matching with semantic vector search. Add a reranker for more precise results:
$ kjarni search ./my-index.idx "artificial intelligence" --rerank-model minilm-l6-v2-cross-encoder
Reranking top 15 results with 'minilm-l6-v2-cross-encoder'...
Results for "artificial intelligence"
1. ./docs/neuralnetworks.txt
████████████████████ 100.0%
"Neural networks consist of interconnected layers of artificial neurons …"
2. ./docs/renaissance.txt
█████████░░░░░░░░░░░ 43.2%
"During the Renaissance, Florence became a center of artistic innovation…"
The reranker reads the query and each document together (cross-encoder), producing a more precise relevance ranking than embeddings alone.
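How the index merges its BM25 and vector results is not described here. One common way to combine two ranked lists is reciprocal rank fusion (RRF), sketched below purely as an illustration; it is not a claim about Kjarni's implementation. The rrf helper, the file names, and the constant k=60 (a conventional default) are all assumptions.

```shell
# rrf FILE...   (hypothetical helper, illustrating reciprocal rank fusion)
# Each FILE holds one ranked list of document ids, best first, one per line.
# A document's fused score is the sum over lists of 1 / (k + rank).
# Prints "score<TAB>document", highest score first.
rrf() {
  awk -v k=60 '
    NF { score[$0] += 1 / (k + FNR) }   # FNR is the rank within the current file
    END { for (d in score) printf "%.6f\t%s\n", score[d], d }
  ' "$@" | LC_ALL=C sort -t "$(printf '\t')" -k1,1nr -k2
}

# Assumed usage, with two hand-made rankings:
#   rrf bm25_ranking.txt vector_ranking.txt
```

A document that appears near the top of both lists outranks one that tops only a single list, which is the behavior a hybrid keyword-plus-semantic search is after.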
Text Generation
Complete text with base language models:
$ kjarni generate "The future of AI is" --model gpt2
For instruction-following and Q&A, use chat instead.
Interactive Chat
Chat with instruct-tuned LLMs locally:
$ kjarni chat --model qwen2.5-0.5b-instruct
Kjarni Chat: qwen2.5-0.5b-instruct
Device: Cpu
Type '/help' for commands, '/quit' to exit.
> hello
Hello! How can I assist you today?
This is a local chatbot running from a single binary: no API key, and no internet connection required once the model has downloaded. Models range from 490M parameters (qwen2.5-0.5b-instruct) to 8B parameters (llama3.1-8b-instruct), so pick the size that fits your hardware.
Transcription
Transcribe audio files to text using Whisper:
$ kjarni transcribe recording.wav
Supports wav, mp3, flac, and ogg formats. Auto-detects language, or specify with --language en. Add --timestamps for timed output, or --translate to translate to English.
Model Management
List all available models with download status:
$ kjarni model list
Cache: ~/.cache/kjarni
Models: 22/28 downloaded
LLM (DECODER)
✓ qwen2.5-0.5b-instruct 490M Tiny logic engine...
✓ llama3.2-1b-instruct 1.2B Official Meta edge model...
✓ llama3.2-3b-instruct 3.2B The 3B standard...
phi3.5-mini 3.8B Microsoft's 3.8B reasoning...
llama3.1-8b-instruct 8.0B The open source standard...
SEQ2SEQ
✓ flan-t5-base 250M General purpose instruction...
✓ whisper-small 244M OpenAI Whisper small...
EMBEDDING
✓ minilm-l6-v2 22M Fastest sentence embedding...
✓ mpnet-base-v2 110M High-quality sentence embed...
CLASSIFIER
✓ distilbert-sentiment 66M Fast binary sentiment...
✓ roberta-sentiment 125M 3-class sentiment...
✓ toxic-bert 110M Toxic comment classifier...
✓ distilroberta-emotion 82M 7 emotions...
Download, inspect, or remove models:
$ kjarni model download llama3.2-1b-instruct
$ kjarni model info minilm-l6-v2
$ kjarni model remove gpt2
Filter by task or download status:
$ kjarni model list --task chat
$ kjarni model list --task embedding
$ kjarni model list --downloaded
All Commands
$ kjarni
Kjarni: The SQLite of AI
Commands:
classify Classify text using a classification model
embed Generate embeddings for text
similarity Compute similarity between two texts
index Create or manage search indexes
search Search an index
rerank Rerank documents by relevance to a query
generate Generate text from a prompt
summarize Summarize text
translate Translate text between languages
transcribe Transcribe audio to text
chat Interactive chat mode
model Manage models (list, download, info)
Every command reads from arguments or stdin and outputs human-readable text by default or JSON with --format json. Standard UNIX tool behavior — pipe it, script it, cron it.
Practical Recipes
CI Pipeline: Scan PR Comments for Toxicity
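# Flatten each comment to one line so multi-line comments are classified whole.
gh pr view "$PR_NUMBER" --json comments -q '.comments[].body | gsub("[\r\n]+"; " ")' | \
while IFS= read -r comment; do
  score=$(kjarni classify "$comment" --model toxic-bert --format json | \
    jq -r '.predictions[] | select(.label == "toxic") | .score')
  # awk, unlike bc, also accepts scores printed in scientific notation (e.g. 1.5e-05).
  if awk -v s="$score" 'BEGIN { exit !(s + 0 > 0.8) }'; then
    echo "⚠️ Toxic comment detected: $comment"
  fi
done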
Batch Classify a CSV Column
# Assumes the text is in column 3 and that no field contains commas or quotes.
tail -n +2 reviews.csv | cut -d',' -f3 | \
while IFS= read -r text; do
  kjarni classify "$text" --format json
done | jq -s '.' > results.json
Quick Sentiment Check on Logs
grep "customer feedback" app.log | \
sed 's/.*feedback: //' | \
while IFS= read -r line; do
  echo "$(kjarni classify "$line" --format json | jq -r '.label') | $line"
done
How It Works
The CLI is a thin wrapper around the same Rust inference engine that powers the C# NuGet package. Same models, same accuracy, same local execution. The binary is self-contained: its only dynamic dependencies are the standard system libraries listed in the ldd output above (glibc and libgcc_s).
For the full technical story, see Why I Built a Native ML Inference Engine in Rust.
Install: curl -fsSL https://kjarni.ai/install.sh | sh
GitHub: https://github.com/olafurjohannsson/kjarni
Next Steps
- Semantic Search in C# — use the same engine from C# code
- Sentiment Analysis in C# — classification with the NuGet package
- Build a Document Search Engine in C# — full hybrid search with indexing and reranking