Inference
Inference = using an already-trained model to get answers. No learning happens at this stage; the model just applies what it has already learned.
Analogy: A student studies for months (training). You ask them a question in an exam (inference).
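To make that split concrete, here's a toy sketch in plain Python (a made-up one-parameter model, not an LLM; the rule y = 2x is assumed just for illustration): training fits the parameter once, and inference merely reuses it.

# Toy model: learn the slope w in y = w * x from examples (training),
# then reuse w to answer new questions (inference)
def train(xs, ys):
    # Least-squares slope through the origin: w = sum(x*y) / sum(x*x)
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)

def infer(w, x):
    # No learning here, just apply the stored parameter
    return w * x

w = train([1, 2, 3], [2, 4, 6])   # "studying for months"
print(infer(w, 10))               # "the exam question" -> 20.0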
Mini-task: Install Ollama or Hugging Face Transformers → run a local LLM → ask it “What’s 2+2?”. That’s inference.
Option 1: Using Ollama (Easiest, GUI + CLI)
4. Pull the model:
ollama pull llama2
(This downloads the LLaMA 2 weights, several GB, to your computer.)
5. Run the model:
ollama run llama2
6. Ask a question:
What’s 2+2?
7. See the answer. 🎉
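Ollama also runs a local REST API (on http://localhost:11434 by default), so you can get the same answer from Python. A minimal sketch using the requests library; the model name llama2 assumes you pulled it in the step above:

import requests

# Ask the locally running Ollama server for a completion.
# stream=False returns a single JSON object instead of a token stream.
resp = requests.post(
    "http://localhost:11434/api/generate",
    json={"model": "llama2", "prompt": "What is 2+2?", "stream": False},
)
print(resp.json()["response"])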
Option 2: Using Hugging Face Transformers (the Python way)
3. Install the libraries:
pip install transformers torch
4. Write a small Python script (llm_test.py):
from transformers import pipeline

# Load an instruction-tuned model (Falcon-7B needs roughly 16 GB of
# RAM/VRAM; swap in a smaller model like 'distilgpt2' if it doesn't fit)
generator = pipeline('text-generation', model='tiiuae/falcon-7b-instruct')

# Ask a question; max_new_tokens limits the answer itself, whereas
# max_length would count the prompt too and cut the answer short
output = generator("What is 2+2?", max_new_tokens=20)
print(output[0]['generated_text'])
5. Run your script:
python llm_test.py
6. See the answer. 🎉
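One gotcha: by default the pipeline echoes your prompt back in front of the answer. Text-generation pipelines accept a return_full_text flag to return only the completion; a small variation, reusing the generator from llm_test.py:

# Return only the newly generated text, not prompt + answer together
output = generator("What is 2+2?", max_new_tokens=20, return_full_text=False)
print(output[0]['generated_text'])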
That's it: two ways to run inference on a local LLM. This is awesome 👌