Inference
Bede E. Hampo

Inference = using a trained model to get answers.

  • Training = teaching the model.
  • Inference = asking it questions after it has learned.

Analogy: A student studies for months (training). You ask them a question in an exam (inference).
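The analogy can be sketched in a few lines of plain Python (a toy illustration, not an LLM): "training" fits one parameter from example data, and "inference" reuses that learned parameter to answer a new question.

```python
# Toy illustration of the training/inference split, with no ML library:
# "training" fits a line y = w * x to data, "inference" reuses the
# learned w to answer new questions.

def train(xs, ys):
    """Learn the slope w of y = w * x by least squares (the 'studying')."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)

def infer(w, x):
    """Use the already-learned w to answer a new question (the 'exam')."""
    return w * x

w = train([1, 2, 3], [2, 4, 6])   # training: learns w = 2.0
print(infer(w, 10))               # inference: prints 20.0
```

Training happens once and is expensive; inference reuses the result and is cheap. The same split applies to LLMs, just with billions of parameters instead of one.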



Mini-task: Install Ollama or Hugging Face Transformers → run a local LLM → ask it “What’s 2+2?”. That’s inference.

Option 1: Using Ollama (Easiest, GUI + CLI)

  1. Go to Ollama’s website: https://ollama.com
  2. Download and install the Ollama app for your computer (Windows, macOS, or Linux).
  3. Open Ollama.
  4. Install a model (LLM) inside Ollama:

ollama pull llama2        

(This downloads the LLaMA 2 model to your computer.)

5. Run the model:

ollama run llama2        

6. Ask a question:

  • After the model starts, type:

What’s 2+2?        

7. See the answer. 🎉

  • The model will respond with 4.
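If you'd rather ask from code than type into the CLI, Ollama also serves a local REST API at http://localhost:11434 while the app is running. A minimal sketch (it assumes the llama2 model from step 4 is already pulled):

```python
# Ask Ollama's local REST API the same question programmatically.
# Requires the Ollama app to be running and the llama2 model pulled.
import json
import urllib.request

payload = {
    "model": "llama2",
    "prompt": "What's 2+2?",
    "stream": False,   # return one JSON object instead of a token stream
}

req = urllib.request.Request(
    "http://localhost:11434/api/generate",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)

try:
    with urllib.request.urlopen(req, timeout=120) as resp:
        print(json.loads(resp.read())["response"])
except OSError:
    print("Ollama is not running; open the app and try again.")
```

This is the same inference as the CLI session above, just driven from a script instead of a keyboard.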

Option 2: Using Hugging Face Transformers (Python way)

  1. Install Python (if you don’t already have it): https://www.python.org/downloads/
  2. Open Terminal / Command Prompt.
  3. Install Transformers library:

pip install transformers torch        

4. Write a small Python script (llm_test.py):

from transformers import pipeline

# Load an instruction-tuned model (note: the weights are a multi-gigabyte download)
generator = pipeline('text-generation', model='tiiuae/falcon-7b-instruct')

# Ask a question; max_new_tokens caps the length of the generated continuation
output = generator("What is 2+2?", max_new_tokens=10)
print(output[0]['generated_text'])
        

5. Run your script:

python llm_test.py        

6. See the answer. 🎉

  • The printed generated_text includes your prompt followed by the model’s continuation, which should contain 4.
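One caveat worth knowing: text-generation pipelines return your prompt followed by the model's continuation in generated_text. A small helper (hypothetical, not part of Transformers) strips the echoed prompt so you see only the answer:

```python
# text-generation pipelines echo the prompt at the start of
# generated_text; this helper keeps only the model's continuation.

def strip_prompt(generated_text, prompt):
    """Return only the model's continuation, dropping the echoed prompt."""
    if generated_text.startswith(prompt):
        return generated_text[len(prompt):].strip()
    return generated_text.strip()

# Example with a made-up model output:
print(strip_prompt("What is 2+2? 2+2 is 4.", "What is 2+2?"))  # prints: 2+2 is 4.
```

You would call it as strip_prompt(output[0]['generated_text'], "What is 2+2?") after running the script above.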
