Importance of Reference Counters (krefs) For Kernel Objects in Linux

Vishnu Santhosh

Published Nov 30, 2025

When you are writing code to handle concurrency, you might think:

“But I never free the object while it’s still in use. I’m careful.”

The thing is:

In concurrent code, “I’m careful” is not a strategy.

If your object can be shared, passed around, enqueued, handed to another thread, or looked up in a table, then without ref counts you are relying on human memory to enforce object lifetime.

And that does not scale.

This is exactly where krefs come in.

struct kref was created to provide a simple and hopefully failproof way of adding proper reference counting to any kernel data structure.

You embed this reference counter inside your object:

struct my_data {
    /* ... your fields ... */
    struct kref refcount;
    /* ... more fields ... */
};

That’s it.

It can live anywhere in the struct.

Then, right after you allocate the object:

struct my_data *data;

data = kmalloc(sizeof(*data), GFP_KERNEL);
if (!data)
    return -ENOMEM;

kref_init(&data->refcount);

That call to kref_init() sets the reference count to 1.

From this point on, you no longer “guess” who owns the object.

Instead, you follow a simple discipline:

Whoever holds a usable pointer has a reference.
When you share that pointer, you increment the refcount.
When you’re done, you decrement it.
And when the count hits zero… one well-defined cleanup function runs and frees the object.

No more “I hope this isn’t freed yet.”

It’s either referenced… or it’s gone.

And the API is tiny:

kref_init() – start with 1 ref
kref_get() – add a reference
kref_put() – drop a reference; if it was the last one, your release function runs
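To make those three calls concrete, here is a minimal userspace sketch of the same idea. It is not the kernel's implementation: struct ref, ref_init(), ref_get(), ref_put() and my_data_new() are hypothetical stand-ins built on C11 atomics, and the released counter exists only to make the behaviour visible.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical userspace stand-in for struct kref. */
struct ref {
    atomic_int count;
};

static void ref_init(struct ref *r)
{
    atomic_init(&r->count, 1);      /* the creator holds the first reference */
}

static void ref_get(struct ref *r)
{
    int old = atomic_fetch_add(&r->count, 1);

    assert(old > 0);                /* caller must already hold a reference */
}

/* Returns 1 if this call dropped the last reference and ran release(). */
static int ref_put(struct ref *r, void (*release)(struct ref *))
{
    if (atomic_fetch_sub(&r->count, 1) == 1) {
        release(r);
        return 1;
    }
    return 0;
}

struct my_data {
    struct ref refcount;    /* first member, so the cast in release is valid */
    int value;
};

static int released;        /* demonstration only: counts release calls */

static void my_data_release(struct ref *r)
{
    released++;
    free((struct my_data *)r);
}

/* Allocate an object that starts life with exactly one reference. */
static struct my_data *my_data_new(int value)
{
    struct my_data *d = malloc(sizeof(*d));

    if (!d)
        return NULL;
    ref_init(&d->refcount);
    d->value = value;
    return d;
}
```

With one extra ref_get(), the first ref_put() returns 0 and frees nothing; only the second one returns 1 and runs the release function.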

But the magic is not in the functions.

It’s in the rules you follow around them.

Core Rules to Follow:

Rule 1: If you hand off a pointer, you must get a ref first

If you make a non-temporary copy of a pointer, especially one that can be:

Stored somewhere, or
Used by another thread, or
Enqueued on a list or queue,

you must call:

kref_get(&data->refcount);

before handing it off.

Why “before”?

Because the moment you pass that pointer away, you’ve promised:

“This object will stay alive at least until that other code drops its ref.”

Instead, if you do:

task = kthread_run(more_data_handling, data, "more_data");
if (!IS_ERR(task))
    kref_get(&data->refcount);   // BAD: after handoff

you’ve introduced a race: the new thread might already be using or dropping the pointer before you’ve bumped the refcount.

So the pattern should be:

kref_get(&data->refcount);      // take extra ref for the new thread
task = kthread_run(more_data_handling, data, "more_data");
if (IS_ERR(task)) {
    kref_put(&data->refcount, data_release);  // thread never started: give the ref back
    return PTR_ERR(task);                     // propagate the actual error code
}

Now the new thread owns that extra reference. When it’s done, it calls:

kref_put(&data->refcount, data_release);

…and your original code later does its own kref_put().

When the last reference is dropped and the count hits zero, and only then, data_release() runs and frees the memory.
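The same ordering applies to any handoff, not just kthread_run(). Here is a rough userspace sketch, assuming a hypothetical single-slot mailbox in place of the second thread and an atomic_int in place of struct kref. The names data_get(), data_put(), hand_off() and consume() are made up for illustration.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

struct my_data {
    atomic_int refcount;    /* stand-in for struct kref */
    int value;
};

static int frees;           /* demonstration only: counts release calls */

static void data_get(struct my_data *d)
{
    atomic_fetch_add(&d->refcount, 1);
}

static void data_put(struct my_data *d)
{
    if (atomic_fetch_sub(&d->refcount, 1) == 1) {   /* last reference gone */
        frees++;
        free(d);
    }
}

/* Hypothetical single-slot mailbox standing in for the other thread. */
static struct my_data *mailbox;

/* Producer: take the consumer's reference first, then publish the pointer. */
static int hand_off(struct my_data *d)
{
    data_get(d);            /* this reference now belongs to the consumer */
    if (mailbox) {          /* handoff failed: nobody took ownership */
        data_put(d);        /* so give the extra reference back */
        return -1;
    }
    mailbox = d;
    return 0;
}

/* Consumer: use the object, then drop the reference it was handed. */
static int consume(void)
{
    struct my_data *d = mailbox;
    int v;

    if (!d)
        return -1;          /* nothing was handed off */
    mailbox = NULL;
    v = d->value;
    data_put(d);
    return v;
}
```

Because the reference is taken before the pointer is published, the producer can drop its own reference at any time and the consumer still sees a live object.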

Rule 2: When you’re done, you must call kref_put()

Whenever your code is finished with an object, you call:

kref_put(&data->refcount, data_release);

If you’re not the last one, nothing is freed.
If you are the last one, data_release() runs.

A typical release function looks like this:

void data_release(struct kref *ref)
{
    struct my_data *data = container_of(ref, struct my_data, refcount);
    kfree(data);
}

This is the only place that kfree() should ever be called for that object.

Everywhere else, you just call kref_put() and trust the refcount.
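The container_of() step is what lets the counter live anywhere in the struct. Below is a simplified userspace sketch of that pointer arithmetic. The macro is a reduced version without the kernel's type checking, struct ref is a non-atomic placeholder for struct kref, and released_value is there only so the result can be observed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Simplified container_of; the kernel macro adds type checking. */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

struct ref {
    int count;              /* placeholder for struct kref; not atomic */
};

struct my_data {
    int value;
    struct ref refcount;    /* deliberately not the first member */
    char name[16];
};

static int released_value;  /* demonstration only */

/* Gets a pointer to the embedded counter, recovers the outer object. */
static void data_release(struct ref *r)
{
    struct my_data *data = container_of(r, struct my_data, refcount);

    released_value = data->value;   /* prove we found the right object */
    free(data);
}
```

The release function only ever receives the address of the embedded member; subtracting the member's offset gets it back to the start of the enclosing object.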

Rule 3: If you don’t already have a valid ref, you must serialize get vs. put

This one is the trickiest, and it’s where people usually get bitten.

If you don’t already hold a valid reference, you must hold a lock across both the lookup and the kref_get(); otherwise the object might be freed between the two.

Imagine This Scenario:

We have one global pointer to a shared object:

struct my_data {
    struct kref refcount;
    int value;
};

static struct my_data *global_obj;
static DEFINE_MUTEX(obj_lock);

Only one object.

No lists.

No kthread handoff.

Just a global pointer that can be set to NULL.

What we want to do

We want a function that:

Finds the object
Gets a reference to it (so it cannot be freed while in use)
Returns it

Let’s try the WRONG version first.

struct my_data *get_global_obj_wrong(void)
{
    struct my_data *obj = global_obj;   // We grab the pointer…
    if (obj)
        kref_get(&obj->refcount);      // …then try to get a ref

    return obj;
}

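Looks harmless, but here's the race:

Thread A reads global_obj into obj and is preempted before kref_get().
Thread B sets global_obj to NULL and calls kref_put(), dropping the last ref.
data_release() runs and frees the object.
Thread A resumes and calls kref_get() on freed memory.

The problem:

You looked up the pointer without ensuring it stayed alive.

At that moment, you did not own a ref yet → Rule #3 violated.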

CORRECT method - Protect Lookup + Get Together

You must hold a lock across both the lookup and the get:

struct my_data *get_global_obj(void)
{
    struct my_data *obj = NULL;

    mutex_lock(&obj_lock);
    if (global_obj) {
        obj = global_obj;
        kref_get(&obj->refcount);   // Safe: cannot be freed right now
    }
    mutex_unlock(&obj_lock);

    return obj;
}

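This works as long as global_obj itself counts as a reference, and whoever clears global_obj does so under obj_lock before dropping that reference. Then no one can:

drop the last ref
free the object

between lookup and increment.

So when you return obj, you definitely own a reference.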

And when you’re done?

void put_global_obj(struct my_data *obj)
{
    kref_put(&obj->refcount, data_release);
}
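The article's snippets do not show the teardown side, so here is one way it could look, as a userspace sketch. It assumes global_obj holds its own reference while set, uses a pthread mutex in place of the kernel mutex and an atomic_int in place of struct kref, and clear_global_obj() and put_obj() are hypothetical names.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdlib.h>

struct my_data {
    atomic_int refcount;    /* stand-in for struct kref */
    int value;
};

static struct my_data *global_obj;      /* holds one reference while set */
static pthread_mutex_t obj_lock = PTHREAD_MUTEX_INITIALIZER;
static int frees;                       /* demonstration only */

static void put_obj(struct my_data *obj)
{
    if (atomic_fetch_sub(&obj->refcount, 1) == 1) {
        frees++;
        free(obj);
    }
}

/* Lookup and get happen under the same lock. */
static struct my_data *get_global_obj(void)
{
    struct my_data *obj;

    pthread_mutex_lock(&obj_lock);
    obj = global_obj;
    if (obj)
        atomic_fetch_add(&obj->refcount, 1);
    pthread_mutex_unlock(&obj_lock);
    return obj;
}

/* Hypothetical teardown: unpublish under the lock, then drop that ref. */
static void clear_global_obj(void)
{
    struct my_data *obj;

    pthread_mutex_lock(&obj_lock);
    obj = global_obj;
    global_obj = NULL;      /* later lookups now see NULL */
    pthread_mutex_unlock(&obj_lock);

    if (obj)
        put_obj(obj);       /* drop the reference global_obj was holding */
}
```

A lookup either sees the pointer while global_obj's reference still exists, or sees NULL. A caller that got its reference before the teardown keeps the object alive until its own put.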

So, let’s recap the mental model:

Your kernel objects aren’t “owned” by one magical place.
They’re owned by whoever holds a reference.
kref_get() extends that ownership to one more holder.
kref_put() ends it, and when the last reference drops, your release function runs and only then is the object really gone.

If you follow the three rules…

Get a ref before you hand off a pointer,
Put your ref when you’re done,
Serialize lookup + get when you don’t already hold a ref,

…you dramatically reduce use-after-free bugs, weird crashes, and subtle race conditions.

Now here’s your challenge:

Look at one subsystem you’ve worked on - kernel or user space - and ask yourself:

“Where am I secretly relying on people to remember object lifetimes instead of using refcounts?”
