Importance of Reference Counters (krefs) For Kernel Objects in Linux
When you are writing code to handle concurrency, you might think:
“But I never free the object while it’s still in use. I’m careful.”
The thing is:
In concurrent code, “I’m careful” is not a strategy.
If your object can be shared, passed around, enqueued, handed to another thread, or looked up in a table, then without ref counts you are relying on human memory to enforce object lifetime.
And that does not scale.
This is exactly where krefs come in.
struct kref was created to provide a simple and hopefully failproof method of adding proper reference counting to any kernel data structure.
You embed this reference counter inside your object:
struct my_data {
	/* ... your fields ... */
	struct kref refcount;
	/* ... more fields ... */
};
That’s it.
It can live anywhere in the struct.
Then, right after you allocate the object:
struct my_data *data;

data = kmalloc(sizeof(*data), GFP_KERNEL);
if (!data)
	return -ENOMEM;
kref_init(&data->refcount);
That call to kref_init() sets the reference count to 1.
From this point on, you no longer “guess” who owns the object.
Instead, you follow a simple discipline: take a reference whenever you keep a pointer, and drop it when you’re done.
No more “I hope this isn’t freed yet.”
It’s either referenced… or it’s gone.
And the API is tiny:
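For reference, the heart of it is three calls: kref_init(), kref_get(), and kref_put(). Here is a minimal userspace model of them, assuming C11 atomics as a stand-in for the kernel's own counter type. It is a sketch to show the semantics, not the kernel implementation; release_calls and demo_release() are hypothetical helpers added only so the behavior can be observed.

```c
/* Userspace model of the kref API using C11 atomics.
 * Illustration only; the kernel's struct kref is built on refcount_t. */
#include <assert.h>
#include <stdatomic.h>

struct kref {
	atomic_int refcount;
};

int release_calls; /* hypothetical counter, bumped by demo_release() */

/* Start life with exactly one reference, owned by the creator. */
void kref_init(struct kref *kref)
{
	atomic_init(&kref->refcount, 1);
}

/* Take one more reference; the caller must already hold a valid one. */
void kref_get(struct kref *kref)
{
	int old = atomic_fetch_add(&kref->refcount, 1);

	assert(old > 0); /* taking a ref on a dead object is a bug */
	(void)old;
}

/* Drop one reference. If it was the last, call release() and return 1;
 * otherwise return 0. */
int kref_put(struct kref *kref, void (*release)(struct kref *kref))
{
	if (atomic_fetch_sub(&kref->refcount, 1) == 1) {
		release(kref);
		return 1;
	}
	return 0;
}

/* Hypothetical release callback that only counts how often it runs. */
void demo_release(struct kref *kref)
{
	(void)kref;
	release_calls++;
}
```

Note that kref_put() takes the release callback as an argument: the counter itself knows nothing about how your object is freed.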
But the magic is not in the functions.
It’s in the rules you follow around them.
Core Rules to Follow:
Rule 1: If you hand off a pointer, you must get a ref first
If you make a non-temporary copy of a pointer, especially one that can be passed to another thread, enqueued, or stored in a lookup table,
you must call:
kref_get(&data->refcount);
before handing it off.
Why “before”?
Because the moment you pass that pointer away, you’ve promised:
“This object will stay alive at least until that other code drops its ref.”
Instead, if you do:
task = kthread_run(more_data_handling, data, "more_data");
if (!IS_ERR(task))
	kref_get(&data->refcount); // BAD: after handoff
you’ve introduced a race: the new thread might already be using or dropping the pointer before you’ve bumped the refcount.
So the pattern should be:
kref_get(&data->refcount); // take extra ref for the new thread
task = kthread_run(more_data_handling, data, "more_data");
if (IS_ERR(task)) {
	kref_put(&data->refcount, data_release); // thread never started: drop its ref
	return PTR_ERR(task); // propagate the real error, not a guessed one
}
Now the new thread owns that extra reference. When it’s done, it calls:
kref_put(&data->refcount, data_release);
…and your original code later does its own kref_put().
When the last kref_put() drops the count to zero, and only then, data_release() runs and frees the memory.
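To watch the counts move, here is a small userspace sketch of the same ordering. It is built on assumptions: a C11-atomics stand-in for kref, malloc()/free() instead of kmalloc()/kfree(), and a worker that is called directly rather than started as a thread so the result is deterministic. handoff_demo(), its start_fails flag, and release_calls are hypothetical names used only for this illustration.

```c
/* Userspace sketch of "get before handoff"; not kernel code. */
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdlib.h>

struct kref { atomic_int refcount; };

void kref_init(struct kref *k) { atomic_init(&k->refcount, 1); }

void kref_get(struct kref *k)
{
	int old = atomic_fetch_add(&k->refcount, 1);

	assert(old > 0); /* caller must already hold a valid ref */
	(void)old;
}

int kref_put(struct kref *k, void (*release)(struct kref *))
{
	if (atomic_fetch_sub(&k->refcount, 1) == 1) {
		release(k);
		return 1;
	}
	return 0;
}

struct my_data {
	struct kref refcount;
	int value;
};

int release_calls; /* hypothetical: counts data_release() invocations */

void data_release(struct kref *ref)
{
	struct my_data *data =
		(struct my_data *)((char *)ref - offsetof(struct my_data, refcount));

	free(data);
	release_calls++;
}

/* Stand-in for the thread body: use the object, then drop the worker's ref. */
void more_data_handling(struct my_data *data)
{
	data->value++;
	kref_put(&data->refcount, data_release);
}

/* Returns 0 if the object was released exactly once, by the final put. */
int handoff_demo(int start_fails)
{
	struct my_data *data = malloc(sizeof(*data));
	int early;

	if (!data)
		return -1;
	data->value = 0;
	release_calls = 0;
	kref_init(&data->refcount);	/* count = 1: creator's ref */

	kref_get(&data->refcount);	/* count = 2: worker's ref, taken BEFORE handoff */
	if (start_fails)
		kref_put(&data->refcount, data_release); /* handoff failed: give it back */
	else
		more_data_handling(data); /* worker drops its own ref when done */

	early = release_calls;		/* still 0: creator's ref keeps it alive */
	kref_put(&data->refcount, data_release); /* count = 0: release runs now */
	return (early == 0 && release_calls == 1) ? 0 : -1;
}
```

Either way the handoff goes, the object survives until the creator's own put, and data_release() runs exactly once.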
Rule 2: When you’re done, you must call kref_put()
Whenever your code is finished with an object, you call:
kref_put(&data->refcount, data_release);
A typical release function looks like this:
void data_release(struct kref *ref)
{
	struct my_data *data = container_of(ref, struct my_data, refcount);

	kfree(data);
}
This is the only place that kfree() should ever be called for that object.
Everywhere else, you just call kref_put() and trust the refcount.
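If container_of() looks like magic, it is just pointer arithmetic: subtract the member's offset from the member's address and you are back at the start of the enclosing struct. That is also why the kref can live anywhere in your struct. A minimal userspace sketch, assuming a simplified container_of() without the kernel's type checks and a hypothetical helper data_from_kref():

```c
/* Userspace sketch of how container_of() recovers the enclosing object. */
#include <assert.h>
#include <stddef.h>

struct kref { int refcount; }; /* simplified stand-in for the kernel type */

/* Simplified version of the kernel macro (no type checking). */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct my_data {
	int value;
	struct kref refcount;	/* deliberately NOT the first member */
	char name[16];
};

/* Hypothetical helper: map the embedded kref back to its owner. */
struct my_data *data_from_kref(struct kref *ref)
{
	assert(ref != NULL);
	return container_of(ref, struct my_data, refcount);
}
```

The release callback only ever receives a struct kref pointer, so this step is how it finds the object it is supposed to free.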
Rule 3: If you don’t already have a valid ref, you must serialize get vs. put
This one is the trickiest, and it’s where people usually get bitten.
If you don’t already hold a valid reference, you must protect the kref_get() with a lock while finding the object; otherwise the object might disappear during lookup.
Imagine This Scenario:
We have one global pointer to a shared object:
struct my_data {
	struct kref refcount;
	int value;
};

static struct my_data *global_obj;
static DEFINE_MUTEX(obj_lock);
Only one object.
No lists.
No threads.
Just a global pointer that can be set to NULL.
What we want to do
We want a function that looks up the global object, takes a reference on it if it exists, and returns it to the caller.
Let’s try the WRONG version first.
struct my_data *get_global_obj_wrong(void)
{
	struct my_data *obj = global_obj; // We grab the pointer…

	if (obj)
		kref_get(&obj->refcount); // …then try to get a ref
	return obj;
}
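Looks harmless, but here's the race: one thread reads global_obj into obj; before it reaches kref_get(), another thread sets global_obj to NULL and drops the last reference, so data_release() frees the object; the first thread then calls kref_get() on freed memory.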
The problem:
You looked up the pointer without ensuring it stayed alive.
At that moment, you did not own a ref yet → Rule #3 violated.
CORRECT method - Protect Lookup + Get Together
You must hold a lock during the entire critical action:
struct my_data *get_global_obj(void)
{
	struct my_data *obj = NULL;

	mutex_lock(&obj_lock);
	if (global_obj) {
		obj = global_obj;
		kref_get(&obj->refcount); // Safe: cannot be freed right now
	}
	mutex_unlock(&obj_lock);
	return obj;
}
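Now no one can clear global_obj and drop the last reference between lookup and increment, provided every path that clears global_obj also takes obj_lock.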
So when you return obj, you definitely own a reference.
And when you’re done?
void put_global_obj(struct my_data *obj)
{
	kref_put(&obj->refcount, data_release);
}
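The lock only helps if the code that removes the object plays along. Here is a userspace sketch of both sides, assuming a pthread mutex in place of the kernel mutex, a C11-atomics stand-in for kref, and malloc()/free() for allocation. publish_global_obj(), clear_global_obj(), and release_calls are hypothetical names that are not part of the original example; treat this as one possible shape, not the definitive one.

```c
/* Userspace sketch: lookup and removal serialized by the same lock. */
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdlib.h>

struct kref { atomic_int refcount; };

void kref_init(struct kref *k) { atomic_init(&k->refcount, 1); }

void kref_get(struct kref *k)
{
	int old = atomic_fetch_add(&k->refcount, 1);

	assert(old > 0);
	(void)old;
}

int kref_put(struct kref *k, void (*release)(struct kref *))
{
	if (atomic_fetch_sub(&k->refcount, 1) == 1) {
		release(k);
		return 1;
	}
	return 0;
}

struct my_data {
	struct kref refcount;
	int value;
};

static struct my_data *global_obj;
static pthread_mutex_t obj_lock = PTHREAD_MUTEX_INITIALIZER;
int release_calls; /* hypothetical: counts data_release() invocations */

void data_release(struct kref *ref)
{
	struct my_data *data =
		(struct my_data *)((char *)ref - offsetof(struct my_data, refcount));

	free(data);
	release_calls++;
}

/* Publish a new object; the initial ref now belongs to global_obj. */
int publish_global_obj(int value)
{
	struct my_data *obj = malloc(sizeof(*obj));

	if (!obj)
		return -1;
	obj->value = value;
	kref_init(&obj->refcount);
	pthread_mutex_lock(&obj_lock);
	assert(global_obj == NULL); /* sketch assumes nothing is published yet */
	global_obj = obj;
	pthread_mutex_unlock(&obj_lock);
	return 0;
}

/* Lookup + get under the lock, mirroring get_global_obj() above. */
struct my_data *get_global_obj(void)
{
	struct my_data *obj = NULL;

	pthread_mutex_lock(&obj_lock);
	if (global_obj) {
		obj = global_obj;
		kref_get(&obj->refcount);
	}
	pthread_mutex_unlock(&obj_lock);
	return obj;
}

/* Unpublish under the SAME lock, then drop the ref global_obj held. */
void clear_global_obj(void)
{
	struct my_data *obj;

	pthread_mutex_lock(&obj_lock);
	obj = global_obj;
	global_obj = NULL;
	pthread_mutex_unlock(&obj_lock);
	if (obj)
		kref_put(&obj->refcount, data_release);
}
```

A caller that got the object before clear_global_obj() ran still holds a valid reference; the memory is freed only when that caller does its own kref_put().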
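So, let’s recap the mental model: take a ref before you hand off a pointer, call kref_put() when you’re done, and hold a lock across lookup and kref_get() when you don’t already have a ref.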
If you follow the three rules…
…you dramatically reduce use-after-free bugs, weird crashes, and subtle race conditions.
Now here’s your challenge:
Look at one subsystem you’ve worked on - kernel or user space - and ask yourself:
“Where am I secretly relying on people to remember object lifetimes instead of using refcounts?”