Linux Network Concepts and Data Structures: sock, socket, and Their Relationships

Understanding how Linux handles network communication involves delving into several key concepts and data structures. At the heart of it all are the ideas of sockets, which provide an interface for applications to interact with the network, and the underlying kernel structures that manage these interactions.

1. The socket System Call and File Descriptors

Linux exposes most I/O resources through file descriptors, and network connections are no exception. When an application wants to communicate over the network, it typically starts by calling the socket() system call.

  • socket() System Call: This call creates an endpoint for communication. It returns a file descriptor (an integer) that the application can then use to refer to this communication endpoint, just like it would a regular file.
  • File Descriptor (fd): This is a small, non-negative integer used to access an I/O resource. For sockets, the file descriptor acts as a handle to the kernel's internal representation of the socket.

2. The struct socket (Kernel-side Representation)

While the application sees a file descriptor, the kernel maintains a more complex data structure to represent the socket. This is primarily the struct socket.

struct socket: This is a core kernel data structure that represents a network socket. It acts as an abstraction layer, providing a generic interface for various network protocols (TCP, UDP, raw IP, etc.).

Key Fields within struct socket:

  • type: Specifies the socket type (e.g., SOCK_STREAM for TCP, SOCK_DGRAM for UDP, SOCK_RAW for raw IP).
  • state: Represents the current state of the socket (e.g., SS_UNCONNECTED, SS_CONNECTING, SS_CONNECTED).
  • flags: Various flags related to the socket's behavior.
  • ops: A pointer to a struct proto_ops, which defines the protocol-specific operations for this socket type.
  • sk: A pointer to a struct sock, which links the generic socket to protocol-specific data.
  • file: A pointer to the struct file associated with this socket's file descriptor, linking the socket to the VFS layer.
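The sk and file pointers can be pictured with a small user-space model. This is only a sketch: the toy_ types are hypothetical stand-ins that keep just the linking fields, while the real kernel structures carry many more members. The field names private_data and sk_socket follow the kernel's naming for the corresponding back-pointers.

```c
#include <stddef.h>

/* Simplified stand-ins for the kernel types; only linking fields are kept. */
struct toy_sock;
struct toy_file;

struct toy_socket {
    short            type;   /* e.g. SOCK_STREAM */
    int              state;  /* e.g. SS_UNCONNECTED */
    struct toy_file *file;   /* back-pointer to the VFS file object */
    struct toy_sock *sk;     /* protocol-side state */
};

struct toy_sock {
    struct toy_socket *sk_socket;  /* back-pointer to the generic socket */
};

struct toy_file {
    void *private_data;  /* for a socket file, points at the socket */
};

/* Wire the three objects together the way socket creation would. */
void toy_link(struct toy_file *f, struct toy_socket *so, struct toy_sock *sk)
{
    f->private_data = so;
    so->file = f;
    so->sk = sk;
    sk->sk_socket = so;
}

/* Walk file -> socket -> sock, the path an operation on the fd follows. */
struct toy_sock *toy_file_to_sock(const struct toy_file *f)
{
    const struct toy_socket *so = f->private_data;

    return so ? so->sk : NULL;
}
```

The model shows why the file descriptor is enough for the kernel to find protocol state: the descriptor resolves to a file object, the file object to the socket, and the socket to its sock.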

3. The struct sock (Protocol-Specific Representation)

The struct sock is arguably the most important data structure for understanding network communication within the Linux kernel. It holds the state that the network stack itself works on. There is one struct sock per open socket, whether it represents an active connection or a listening endpoint.

struct sock: This structure contains the state common to all protocols (addresses, queues, buffer accounting). Protocol-specific structures such as struct inet_sock and struct tcp_sock embed it as their first member and add their own fields after it.

Key Fields within struct sock:

  • sk_family: Address family (e.g., AF_INET for IPv4, AF_INET6 for IPv6).
  • sk_protocol: Protocol (e.g., IPPROTO_TCP, IPPROTO_UDP).
  • sk_daddr, sk_rcv_saddr: Destination (peer) and bound local IPv4 addresses.
  • sk_dport, sk_num: Destination port (network byte order) and local port (host byte order).
  • sk_receive_queue: Queue for incoming data waiting to be read.
  • sk_write_queue: Queue for outgoing data waiting to be sent.
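  • sk_backlog: Queue for packets that arrive while the socket is locked by a process; they are processed when the lock is released. Pending connections on a listening socket are kept in a separate accept queue (icsk_accept_queue in struct inet_connection_sock).
  • sk_state: Current connection state (e.g., TCP_ESTABLISHED, TCP_LISTEN).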
  • sk_rcvbuf, sk_sndbuf: Receive and send buffer size limits, in bytes.
  • sk_rmem_alloc, sk_wmem_alloc: Memory currently charged against the receive and send buffers.

4. struct proto and struct proto_ops (Protocol Operations)

Linux's networking stack is highly modular. The struct proto and struct proto_ops structures facilitate this modularity.

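  • struct proto: This structure defines the transport-level operations for a specific protocol (e.g., tcp_prot for TCP, udp_prot for UDP). It contains function pointers for operations such as connect, sendmsg, recvmsg, and close, and is reached through the sk_prot pointer of a struct sock.
  • struct proto_ops: This structure defines the socket-layer operations (bind, connect, accept, sendmsg, and so on) for an address family and socket type, for example inet_stream_ops for TCP sockets or inet_dgram_ops for UDP sockets. Its handlers typically forward to the struct proto of the underlying struct sock.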
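The two tables form a two-step dispatch, which the following user-space model sketches. Every toy_ name is hypothetical; the real kernel tables have many more entries and different signatures, so this only illustrates the shape of the indirection.

```c
#include <stddef.h>

struct toy_sock;
struct toy_socket;

/* Transport-level table, in the role of struct proto (e.g. tcp_prot). */
struct toy_proto {
    const char *name;
    int (*sendmsg)(struct toy_sock *sk, const char *buf, size_t len);
};

/* Socket-level table, in the role of struct proto_ops (e.g. inet_stream_ops). */
struct toy_proto_ops {
    int (*sendmsg)(struct toy_socket *sock, const char *buf, size_t len);
};

struct toy_sock {
    const struct toy_proto *sk_prot;
    size_t bytes_queued;
};

struct toy_socket {
    const struct toy_proto_ops *ops;
    struct toy_sock *sk;
};

/* Transport handler: this toy stream protocol only counts queued bytes. */
static int toy_tcp_sendmsg(struct toy_sock *sk, const char *buf, size_t len)
{
    (void)buf;
    sk->bytes_queued += len;
    return (int)len;
}

/* Family-level handler: forwards to whichever transport the sock uses. */
static int toy_inet_sendmsg(struct toy_socket *sock, const char *buf, size_t len)
{
    return sock->sk->sk_prot->sendmsg(sock->sk, buf, len);
}

const struct toy_proto toy_tcp_prot = { "TOY-TCP", toy_tcp_sendmsg };
const struct toy_proto_ops toy_stream_ops = { toy_inet_sendmsg };

/* What a send on the descriptor reduces to in this model. */
int toy_send(struct toy_socket *sock, const char *buf, size_t len)
{
    return sock->ops->sendmsg(sock, buf, len);
}
```

Because the generic layer only ever calls through ops, and the family layer only ever calls through sk_prot, a new transport can be added by supplying another table without touching either caller.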

5. sk_buff (Socket Buffer)

sk_buff is the fundamental data structure used to pass network packets through the Linux kernel's networking stack. It's a highly optimized structure for managing packet data and metadata.

sk_buff: Represents a network packet. It contains:

  • data: Pointer to the actual packet data (headers and payload).
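  • protocol: The packet's protocol as identified by the driver (e.g., ETH_P_IP), stored in network byte order.
  • dev: The network device it originated from or is destined for.
  • len: Number of bytes of packet data currently in the buffer.
  • cb: A 48-byte control buffer that each layer can use for its own private per-packet data.
  • next, prev: Link pointers for chaining sk_buffs in queues.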
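Much of the efficiency of sk_buff comes from moving pointers instead of copying data as headers are added or removed. The sketch below models that idea in user space. It assumes the kernel's head, data, tail and end boundaries (only data is listed above), and the toy_skb_ functions are hypothetical analogues of the kernel helpers skb_reserve, skb_put, skb_push and skb_pull, with simple bounds checks added.

```c
#include <stddef.h>

#define TOY_SKB_SIZE 128

struct toy_skb {
    unsigned char  buf[TOY_SKB_SIZE];
    unsigned char *head;  /* start of the buffer */
    unsigned char *data;  /* start of the current packet contents */
    unsigned char *tail;  /* end of the current packet contents */
    unsigned char *end;   /* end of the buffer */
    size_t         len;   /* always tail - data */
};

void toy_skb_init(struct toy_skb *skb)
{
    skb->head = skb->data = skb->tail = skb->buf;
    skb->end = skb->buf + TOY_SKB_SIZE;
    skb->len = 0;
}

/* Leave n bytes of headroom on an empty buffer; 0 on success, -1 if too big. */
int toy_skb_reserve(struct toy_skb *skb, size_t n)
{
    if (skb->len != 0 || (size_t)(skb->end - skb->tail) < n)
        return -1;
    skb->data += n;
    skb->tail += n;
    return 0;
}

/* Append n bytes at the tail; returns where to write them, or NULL. */
unsigned char *toy_skb_put(struct toy_skb *skb, size_t n)
{
    unsigned char *old_tail = skb->tail;

    if ((size_t)(skb->end - skb->tail) < n)
        return NULL;
    skb->tail += n;
    skb->len += n;
    return old_tail;
}

/* Prepend n bytes (a header) using headroom; returns new data, or NULL. */
unsigned char *toy_skb_push(struct toy_skb *skb, size_t n)
{
    if ((size_t)(skb->data - skb->head) < n)
        return NULL;
    skb->data -= n;
    skb->len += n;
    return skb->data;
}

/* Strip n bytes (a header) from the front; returns new data, or NULL. */
unsigned char *toy_skb_pull(struct toy_skb *skb, size_t n)
{
    if (skb->len < n)
        return NULL;
    skb->data += n;
    skb->len -= n;
    return skb->data;
}
```

On transmit, a sender would reserve headroom, put the payload, then push the transport and network headers in turn. On receive, each layer pulls its own header before handing the buffer upward.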


Summary of Relationships:

  • socket() is the system call through which user space asks the kernel to create a struct socket and its associated struct sock.
  • File Descriptors are the user-space handles to kernel sockets.
  • struct socket is the generic kernel representation of a socket, linking to the VFS and the protocol-specific data.
  • struct sock is the core, protocol-specific data structure holding all the details of a network connection or listening endpoint. It's the "engine room" of the socket.
  • struct proto_ops provides the interface (function pointers) for generic socket operations, allowing the struct socket to interact with the underlying protocol.
  • struct proto defines the transport protocol's implementation of those operations, reached through the struct sock.
  • sk_buff is the data unit that flows through the network stack and is held on the struct sock's receive and write queues.

This layered approach allows Linux to support a wide variety of network protocols and configurations efficiently, while providing a consistent interface to user-space applications.

