How to Run Large Language Models Locally Without a GPU: A Step-by-Step Guide

Running large language models (LLMs) on your local machine has never been easier, thanks to Ollama. This blog post will guide you through the process of downloading and configuring a large language model, specifically Mistral, on your laptop. No GPU required!

Step 1: Download Ollama

To get started, you'll need to download Ollama's app, which simplifies the entire process of setting up LLMs on your device.

1. Visit https://ollama.com.

2. Click the download button for your operating system (macOS, Linux, or Windows).

3. Follow the installation instructions for your OS.
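
To confirm the installation succeeded, you can check the installed version from a terminal (the exact version string will vary by release):

   ollama --version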

Step 2: Download and Run Mistral

Once you have Ollama installed, you can proceed to download the Mistral model.

1. Open your terminal (iTerm, Terminal, or any terminal emulator you prefer).

2. Run the following command to download and run Mistral:

   ollama run mistral

3. The first run downloads the model weights (several gigabytes, so it may take a while). Once the download finishes, you can interact with the model.
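
If you only want to fetch the model without starting a chat session, you can pull it separately:

   ollama pull mistral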

Step 3: Using Mistral

After the model has loaded, you can start using it right away. For example, you can type:

What is the capital of India?

The model will respond accordingly. Press Ctrl+C to interrupt a response while it is still generating; to leave the session entirely, type /bye (or press Ctrl+D).
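
You can also pass a prompt directly on the command line to get a one-off answer without entering the interactive session:

   ollama run mistral "What is the capital of India?"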

Clearing the Session Context

If you want to clear the session context, use the following command:

/clear

This command resets the model's memory, so it won't remember the previous conversation.

Saving and Loading Sessions

Ollama lets you save the current session as a named model and load it again later. Both commands take a name of your choosing (a concrete example follows the list):

- Save the current session:

  /save <name>

- Load a saved session:

  /load <name>
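
For instance, with a hypothetical session name my-chat:

  /save my-chat
  /load my-chat

Because /save stores the session as a regular model, you can also resume it later from the shell with ollama run my-chat.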

Understanding Context Length

Context length is the number of tokens the model can attend to at once, effectively its working memory for the conversation. Recent Mistral releases, for example, support a context window of up to 32,000 tokens. Once a conversation grows past the window, the earliest tokens fall out of view, so the model's recall of earlier parts of the conversation degrades. Note that Ollama may load a model with a smaller default context than the model's maximum; you can raise it with the num_ctx parameter.
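
A minimal sketch of raising the context window from inside a running session (assuming your machine has enough memory for the larger window):

   /set parameter num_ctx 8192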

Step 4: Model Parameters and Options

Ollama provides several parameters that you can adjust to control the model's behavior (a sketch of setting them follows the list):

- seed: Sets the random-number seed so the same prompt produces the same output across runs.

- num_predict: Caps the number of tokens generated in a response.

- top_k: Limits sampling to the k most likely next tokens.

- repeat_penalty: Penalizes tokens that have already appeared, discouraging repetition.

- temperature: Scales the randomness of sampling; lower values give more predictable output, higher values more varied output.

- stop: Defines one or more text sequences that end the response when the model generates them.
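
Inside an interactive session, these are set with /set parameter; the values below are arbitrary examples:

   /set parameter seed 42
   /set parameter temperature 0.2
   /set parameter num_predict 256

The same options can also be passed to Ollama's local REST API, which serves on port 11434 by default:

   curl http://localhost:11434/api/generate -d '{
     "model": "mistral",
     "prompt": "What is the capital of India?",
     "stream": false,
     "options": {"temperature": 0.2, "seed": 42, "top_k": 40}
   }'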

Managing Model Storage

Models are stored under Ollama's data directory (~/.ollama/models on macOS and Linux, and under your user profile on Windows). If you need to reclaim disk space, remove models with the CLI rather than deleting files by hand, as shown below.
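
For example, to list the installed models and remove the Mistral model:

   ollama list
   ollama rm mistral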

Running Mistral Again

To run the Mistral model in the future, simply use:

ollama run mistral

Trying Other Models

Ollama offers a variety of models, including those for coding and multi-language support. Feel free to explore and try out different models to see which ones work best for your needs.
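
For instance, you might try a code-focused model (model names are examples from the Ollama library, and availability may change over time):

   ollama run codellama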

Running large language models locally is now more accessible thanks to Ollama. You can download, run, and manage these models on your laptop without requiring a GPU. Play around with different models and settings to find what works best for you. Enjoy the process, and feel free to share your experiences and any questions in the comments below.

Happy experimenting!
