ML From the Command Line — Classify, Embed, and Search Without Writing Code

Run sentiment analysis, generate embeddings, detect toxicity, and search documents from your terminal. Pipes into any script or CI pipeline. No Python, no API keys.

$ kjarni classify "I love this product!" --model roberta-sentiment
  Input "I love this product!"
         positive  ████████████████████   98.5%
           neutral  ░░░░░░░░░░░░░░░░░░░░    1.1%
          negative  ░░░░░░░░░░░░░░░░░░░░    0.5%
$ kjarni similarity doctor physician
  █████████████████░░░   86.0%  highly similar
  "doctor"
  "physician"

Kjarni is a single binary. Install it, run it. Models download on first use and cache locally.

Install

curl -fsSL https://kjarni.ai/install.sh | sh

No runtime, no dependencies. The binary links only against standard system libraries (libc, libm, and libgcc_s):

$ ldd $(which kjarni)
    linux-vdso.so.1
    libgcc_s.so.1
    libm.so.6
    libc.so.6
    /lib64/ld-linux-x86-64.so.2

Sentiment Analysis

The default model (distilbert-sentiment) does binary positive/negative classification:

$ kjarni classify "Best purchase I've made this year"
         POSITIVE  ████████████████████  100.0%
          NEGATIVE  ░░░░░░░░░░░░░░░░░░░░    0.0%

For three-class sentiment (positive/negative/neutral), use roberta-sentiment:

$ kjarni classify "It's okay I guess" --model roberta-sentiment
         positive  ██████████░░░░░░░░░░   52.9%
           neutral  ████████░░░░░░░░░░░░   38.2%
          negative  █░░░░░░░░░░░░░░░░░░░    8.9%

The model picks up on hedging — "okay I guess" is barely positive at 52.9%.

Toxicity Detection

Switch to toxic-bert for content moderation:

$ kjarni classify "i hate mondays" --model toxic-bert
            toxic  ██████████████░░░░░░   69.8%
           obscene  ░░░░░░░░░░░░░░░░░░░░    1.1%
            insult  ░░░░░░░░░░░░░░░░░░░░    0.9%
            threat  ░░░░░░░░░░░░░░░░░░░░    0.5%
     identity_hate  ░░░░░░░░░░░░░░░░░░░░    0.4%

Multi-label — each category is scored independently. A comment can be both toxic and an insult. Set a threshold (say 80%) and flag content above it.
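
The thresholding step can be sketched in plain shell. This is a minimal sketch rather than anything Kjarni ships: flag_labels is a hypothetical helper that reads "label score" pairs on stdin and prints every label whose score is above the threshold. The scores are the ones from the run above, written as fractions.

```shell
# Hypothetical helper, not part of Kjarni: read "label score" pairs on stdin
# and print each label whose score is above the threshold given as $1.
# awk does the comparison because POSIX sh has no floating-point arithmetic.
flag_labels() {
  awk -v threshold="$1" '$2 + 0 > threshold + 0 { print $1 }'
}

# Scores from the "i hate mondays" run above, written as fractions.
scores='toxic 0.698
obscene 0.011
insult 0.009
threat 0.005
identity_hate 0.004'

printf '%s\n' "$scores" | flag_labels 0.8   # prints nothing: 0.698 is below 0.8
printf '%s\n' "$scores" | flag_labels 0.5   # prints "toxic"
```

At an 80% threshold, "i hate mondays" would pass moderation. Because the categories are scored independently, more than one label can be printed for a single input.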

JSON Output for Scripting

Add --format json to get structured output:

$ kjarni classify "Great service" --format json
{
  "label": "POSITIVE",
  "label_index": 1,
  "predictions": [
    {
      "label": "POSITIVE",
      "score": 0.9998435
    },
    {
      "label": "NEGATIVE",
      "score": 0.00015648185
    }
  ],
  "score": 0.9998435,
  "text": "Great service"
}

Pipe into jq for extraction:

$ kjarni classify "Great service" --format json | jq '.label'
"POSITIVE"

$ kjarni classify "Great service" --format json | jq '.predictions'
[
  { "label": "POSITIVE", "score": 0.9998435 },
  { "label": "NEGATIVE", "score": 0.00015648185 }
]

Batch Processing

Classify a file of reviews, one per line:

$ while IFS= read -r line; do
    echo "$line → $(kjarni classify "$line" --format json | jq -r '.label')"
  done < reviews.txt
Fast shipping, great product → POSITIVE
Arrived damaged, no response from support → NEGATIVE
Best purchase I've made this year → POSITIVE
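
To summarize a batch, the labels can be tallied with standard tools. A minimal sketch, assuming the "text → LABEL" line format shown in the output above; tally_labels is a hypothetical helper name.

```shell
# Hypothetical helper: count lines per label, assuming each line ends with
# the label as its last whitespace-separated word ("text → LABEL").
tally_labels() {
  awk '{ count[$NF]++ } END { for (label in count) print label, count[label] }' | sort
}

# The three sample lines from the output above.
tally_labels <<'EOF'
Fast shipping, great product → POSITIVE
Arrived damaged, no response from support → NEGATIVE
Best purchase I've made this year → POSITIVE
EOF
```

For the three sample reviews this prints one line per label with its count: NEGATIVE 1 and POSITIVE 2.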

Embeddings

Generate a 384-dimensional vector from any text:

$ kjarni embed "hello world"
0.16229372 0.042872876 0.067300394 0.22431187 -0.12369463 ...

Space-separated floats — one vector, ready to store or compare.
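
One common way to compare two such vectors is cosine similarity. The sketch below is an illustration, not Kjarni's implementation (the article does not say which metric kjarni similarity uses). cosine_similarity is a hypothetical helper that expects two vectors of equal length on stdin, one per line.

```shell
# Hypothetical helper: cosine similarity of two space-separated vectors,
# read from stdin as line 1 and line 2. Assumes equal lengths.
cosine_similarity() {
  awk '
    NR == 1 { n = NF; for (i = 1; i <= n; i++) a[i] = $i }
    NR == 2 {
      for (i = 1; i <= n; i++) { dot += a[i] * $i; na += a[i] * a[i]; nb += $i * $i }
    }
    END {
      # A zero vector has no direction; report 0 instead of dividing by zero.
      if (na == 0 || nb == 0) { print "0.0000"; exit }
      printf "%.4f\n", dot / (sqrt(na) * sqrt(nb))
    }
  '
}

# Toy 3-dimensional vectors; real embeddings here have 384 dimensions.
printf '1 2 3\n2 4 6\n' | cosine_similarity    # parallel vectors: 1.0000
printf '1 0 0\n0 1 0\n' | cosine_similarity    # orthogonal vectors: 0.0000
```

With Kjarni installed, { kjarni embed "doctor"; kjarni embed "physician"; } | cosine_similarity would feed two real embeddings through it, assuming each embed call prints its vector on a single line.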

Semantic Similarity

Compare two texts by meaning:

$ kjarni similarity doctor physician
  █████████████████░░░   86.0%  highly similar
  "doctor"
  "physician"
$ kjarni similarity doctor banana
  ██████░░░░░░░░░░░░░░   33.8%  somewhat similar
  "doctor"
  "banana"

The model knows "doctor" and "physician" mean the same thing even though the two words look nothing alike.

Semantic Search

Create an index from a folder of text files:

$ kjarni index create ./my-index.idx ./docs/
  Indexed 15 documents
 Index created: ./my-index.idx
  Documents: 15
  Dimension: 384
  Size: 39.52 KB

Search by meaning:

$ kjarni search ./my-index.idx "war"
  Results for "war"
    1. ./docs/romanempire.txt
       ████████████████████  100.0%
       "The Roman Empire collapsed in 476 AD after centuries of political insta…"
    2. ./docs/industrialrevolution.txt
       █████████████████░░░   87.5%
       "The Industrial Revolution began in Britain with mechanized textile prod…"
    3. ./docs/blackholes.txt
       ███████████████░░░░░   75.3%
       "Black holes form when massive stars exhaust their nuclear fuel and unde…"

The index combines BM25 keyword matching with semantic vector search. Add a reranker for more precise results:

$ kjarni search ./my-index.idx "artificial intelligence" --rerank-model minilm-l6-v2-cross-encoder
Reranking top 15 results with 'minilm-l6-v2-cross-encoder'...
  Results for "artificial intelligence"
    1. ./docs/neuralnetworks.txt
       ████████████████████  100.0%
       "Neural networks consist of interconnected layers of artificial neurons …"
    2. ./docs/renaissance.txt
       █████████░░░░░░░░░░░   43.2%
       "During the Renaissance, Florence became a center of artistic innovation…"

The reranker reads the query and each document together (cross-encoder), producing a more precise relevance ranking than embeddings alone.
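
The article does not say how the index merges its BM25 and vector rankings. One widely used technique for combining two ranked lists is reciprocal rank fusion (RRF), sketched below purely as an illustration; it should not be read as Kjarni's actual method. The helper name rrf_fuse, the input format (document, BM25 rank, vector rank), and the ranks themselves are all made up for the example.

```shell
# Reciprocal rank fusion: score(doc) = 1/(k + bm25_rank) + 1/(k + vector_rank).
# Input lines: "<document> <bm25_rank> <vector_rank>" (ranks start at 1,
# document names must not contain spaces). k = 60 is the conventional constant.
rrf_fuse() {
  awk -v k=60 '{ printf "%.6f %s\n", 1 / (k + $2) + 1 / (k + $3), $1 }' |
    LC_ALL=C sort -rn |
    awk '{ print $2 }'
}

# Illustrative ranks only, not real Kjarni output.
rrf_fuse <<'EOF'
a.txt 1 3
b.txt 2 1
c.txt 3 2
EOF
```

Here b.txt comes out first: it is ranked near the top by both lists, which beats a.txt's single first place. That is the general appeal of hybrid search, since a document that both keyword and semantic matching agree on tends to outrank one that only a single method favors.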

Text Generation

Complete text with base language models:

$ kjarni generate "The future of AI is" --model gpt2

For instruction-following and Q&A, use chat instead.

Interactive Chat

Chat with instruct-tuned LLMs locally:

$ kjarni chat --model qwen2.5-0.5b-instruct
Kjarni Chat: qwen2.5-0.5b-instruct
Device: Cpu
Type '/help' for commands, '/quit' to exit.
> hello
Hello! How can I assist you today?

A local chatbot running from a single binary. No API key, and no internet connection required after the model downloads. Models range from 490M parameters (qwen2.5-0.5b-instruct) to 8B parameters (llama3.1-8b-instruct); pick the size that fits your hardware.

Transcription

Transcribe audio files to text using Whisper:

$ kjarni transcribe recording.wav

Supports wav, mp3, flac, and ogg formats. Auto-detects language, or specify with --language en. Add --timestamps for timed output, or --translate to translate to English.

Model Management

List all available models with download status:

$ kjarni model list
Cache: ~/.cache/kjarni
Models: 22/28 downloaded

LLM (DECODER)
    qwen2.5-0.5b-instruct            490M  Tiny logic engine...
    llama3.2-1b-instruct             1.2B  Official Meta edge model...
    llama3.2-3b-instruct             3.2B  The 3B standard...
    phi3.5-mini                      3.8B  Microsoft's 3.8B reasoning...
    llama3.1-8b-instruct             8.0B  The open source standard...

SEQ2SEQ
  ✓ flan-t5-base                     250M  General purpose instruction...
  ✓ whisper-small                    244M  OpenAI Whisper small...

EMBEDDING
  ✓ minilm-l6-v2                      22M  Fastest sentence embedding...
  ✓ mpnet-base-v2                    110M  High-quality sentence embed...

CLASSIFIER
  ✓ distilbert-sentiment              66M  Fast binary sentiment...
  ✓ roberta-sentiment                125M  3-class sentiment...
  ✓ toxic-bert                       110M  Toxic comment classifier...
  ✓ distilroberta-emotion             82M  7 emotions...

Download, inspect, or remove models:

$ kjarni model download llama3.2-1b-instruct
$ kjarni model info minilm-l6-v2
$ kjarni model remove gpt2

Filter by task or download status:

$ kjarni model list --task chat
$ kjarni model list --task embedding
$ kjarni model list --downloaded

All Commands

$ kjarni
Kjarni: The SQLite of AI

Commands:
  classify    Classify text using a classification model
  embed       Generate embeddings for text
  similarity  Compute similarity between two texts
  index       Create or manage search indexes
  search      Search an index
  rerank      Rerank documents by relevance to a query
  generate    Generate text from a prompt
  summarize   Summarize text
  translate   Translate text between languages
  transcribe  Transcribe audio to text
  chat        Interactive chat mode
  model       Manage models (list, download, info)

Every command reads from arguments or stdin and outputs human-readable text by default or JSON with --format json. Standard UNIX tool behavior — pipe it, script it, cron it.

Practical Recipes

CI Pipeline: Scan PR Comments for Toxicity

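gh pr view "$PR_NUMBER" --json comments -q '.comments[].body' | \
  while IFS= read -r comment; do
    # Skip blank lines; multi-line comments are scored one line at a time.
    [ -z "$comment" ] && continue
    score=$(kjarni classify "$comment" --model toxic-bert --format json | \
      jq '.predictions[] | select(.label == "toxic") | .score')
    # Flag anything scored above 0.8; treat a missing score as 0.
    if (( $(echo "${score:-0} > 0.8" | bc -l) )); then
      echo "⚠️  Toxic comment detected: $comment"
    fi
  done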
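
The loop above only prints a warning, while a CI job typically needs a non-zero exit status to fail the build. A minimal sketch of that gating step, assuming the scores have already been extracted: toxicity_gate is a hypothetical helper that reads tab-separated "score<TAB>comment" lines, prints the offenders, and returns 1 if any score is above the threshold.

```shell
# Hypothetical helper, not part of Kjarni. $1 is the threshold (e.g. 0.8).
# Reads "score<TAB>comment" lines; exits 1 if any score is above the threshold.
toxicity_gate() {
  awk -F '\t' -v threshold="$1" '
    $1 + 0 > threshold + 0 { print "Toxic comment detected: " $2; flagged = 1 }
    END { exit (flagged + 0) }
  '
}

# Made-up scores for illustration.
if printf '0.93\tfirst comment\n0.02\tsecond comment\n' | toxicity_gate 0.8; then
  echo "gate passed"
else
  echo "gate failed"
fi
```

Doing the check inside awk also sidesteps a common pitfall: a variable set inside a piped while loop lives in a subshell and is lost once the loop ends, so it cannot carry the result out to the rest of the script.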

Batch Classify a CSV Column

# Assumes column 3 holds the review text and contains no embedded commas.
cut -d',' -f3 reviews.csv | tail -n +2 | \
  while IFS= read -r text; do
    kjarni classify "$text" --format json
  done | jq -s '.' > results.json

Quick Sentiment Check on Logs

grep "customer feedback" app.log | \
  sed 's/.*feedback: //' | \
  while read -r line; do
    echo "$(kjarni classify "$line" --format json | jq -r '.label') | $line"
  done

How It Works

The CLI is a thin wrapper around the same Rust inference engine that powers the C# NuGet package. Same models, same accuracy, same local execution. The binary is self-contained: its only system dependencies are glibc and libgcc_s.

For the full technical story, see Why I Built a Native ML Inference Engine in Rust.

Install:  curl -fsSL https://kjarni.ai/install.sh | sh
GitHub:   https://github.com/olafurjohannsson/kjarni

Next Steps