Capture Ethernet frames using an AF_PACKET socket in C

Tested on

Ubuntu (Lucid, Trusty, Xenial)

Objective

To capture all frames received by a given Ethernet interface using an AF_PACKET socket

Background

Ethernet is a link layer protocol. Most networking programs interact with the network stack at the transport layer or above, so have no need to deal with Ethernet frames directly, but there are some circumstances where monitoring at a lower level may be necessary. These include:

Scenario

Suppose you wish to capture all frames received by all interfaces.

(Capture of specific EtherTypes, or from specific interfaces, is considered separately.)

Method

Overview

The method described here has two steps:

  1. Create the AF_PACKET socket.
  2. Receive and handle Ethernet frames as they arrive.

The following header files are used:

Header Used by
<stdlib.h> exit
<stdio.h> perror, fprintf
<string.h> memcpy, strlen
<arpa/inet.h> htons
<net/ethernet.h> ETH_P_*
<net/if.h> struct ifreq
<linux/if_packet.h> struct sockaddr_ll, struct packet_mreq, PACKET_MR_PROMISC, PACKET_ADD_MEMBERSHIP
<sys/ioctl.h> SIOCGIFINDEX, ioctl
<sys/socket.h> struct sockaddr, struct iovec, struct msghdr, struct cmsghdr, AF_PACKET, SOCK_RAW, SOCK_DGRAM, socket, bind, setsockopt, recvfrom, recvmsg, SOL_SOCKET, SOL_PACKET, SO_TIMESTAMP, SIOCGSTAMP, MSG_TRUNC, CMSG_*
<sys/time.h> struct timeval

Older versions of the packet(7) manpage specify inclusion of the header <netpacket/packet.h>, however this has since been changed to <linux/if_packet.h>. The latter has historically been kept more up to date than the former, and is the better choice under most circumstances.

AF_PACKET sockets are specific to Linux. Programs that make use of them need elevated privileges in order to run: specifically the CAP_NET_RAW capability, which processes running as root have by default.

Create the AF_PACKET socket

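The socket that will be used to capture the Ethernet frames should be created using the socket function. This takes three arguments: the domain (AF_PACKET for a packet socket), the socket type (SOCK_RAW to receive frames with their link-layer headers, or SOCK_DGRAM to have the headers removed), and the protocol (an EtherType in network byte order, or htons(ETH_P_ALL) to receive all protocols).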

In this instance the intention is to capture packets with their headers and with no filtering by EtherType, in which case the second argument should be set to SOCK_RAW and the third argument to htons(ETH_P_ALL):

int fd = socket(AF_PACKET, SOCK_RAW, htons(ETH_P_ALL));
if (fd == -1) {
    perror("socket");
    exit(1);
}

Receive and handle Ethernet frames as they arrive

Frames can be received using any function that is capable of reading from a file descriptor, however if you have opted to discard the frame headers then it will be necessary to use either recvfrom or recvmsg if you wish to have visibility of the source address.
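
When recvfrom or recvmsg is used, the source address arrives in a struct sockaddr_ll, with the hardware address in sll_addr and its length in sll_halen. The following is a minimal sketch of turning that into printable form. The helper name format_ll_addr is hypothetical and not part of any library.

```c
#include <stdio.h>
#include <stddef.h>
#include <linux/if_packet.h>

/* Hypothetical helper: write the link-layer address held in a sockaddr_ll
 * into out as colon-separated hex (for example "00:11:22:33:44:55").
 * Returns 0 on success, or -1 if the address length is implausible or
 * out is too small to hold the result. */
int format_ll_addr(const struct sockaddr_ll *addr, char *out, size_t out_len)
{
    size_t halen = addr->sll_halen;
    /* Each byte needs two hex digits plus a separator or the terminator. */
    if (halen == 0 || halen > sizeof(addr->sll_addr) || out_len < halen * 3) {
        return -1;
    }
    char *p = out;
    for (size_t i = 0; i < halen; i++) {
        p += snprintf(p, out_len - (size_t)(p - out),
                      (i == 0) ? "%02x" : ":%02x", addr->sll_addr[i]);
    }
    return 0;
}
```

After a successful recvfrom, passing the filled-in src_addr and a buffer of at least 24 bytes to this function would yield the sender's MAC address as text.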

Regardless of which function you choose you will need to supply a buffer to receive the data. If this is too small to accommodate a complete frame then any excess is discarded. That means you need not be concerned about tracking frame boundaries, because the first byte returned by a read operation will always be the start of a frame. However it does raise two issues: how the buffer size should be chosen, and how any overflow can be detected.

A standard Ethernet frame has a maximum length of 1500 bytes (payload only) or 1514 bytes (header plus payload, excluding the 4-byte frame check sequence, which is not normally passed to the socket), however most modern Ethernet interfaces are capable of sending larger frames if configured to do so. Furthermore, if capturing from all interfaces then this will include the loopback interface. This usually has a significantly larger MTU: Linux defaults to 65536 bytes as of version 3.6, or 16436 bytes previously. For general-purpose packet capture it is therefore prudent to default to a buffer size of at least 65536 bytes, and to make the size user-configurable.

The recvmsg function explicitly reports truncation by setting the MSG_TRUNC flag in the msg_flags member of the message header. Alternatively, truncation can be detected when using any of the available functions by providing a buffer that is one byte longer than the largest payload that you actually wish to receive, then interpreting a full buffer as a truncated frame. If required, it would be possible to receive arbitrary-length frames with assistance from the MSG_PEEK option.
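
As a rough sketch of the MSG_PEEK approach: on Linux, passing MSG_TRUNC as an input flag makes recv return the full length of the queued frame even when the supplied buffer is smaller, and MSG_PEEK leaves the frame in the queue so that it can be read again. This relies on Linux-specific behaviour documented in recv(2). The helper name recv_whole_frame is hypothetical.

```c
#include <stdlib.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Hypothetical helper: read the next frame into a freshly allocated buffer
 * of exactly the right size. On success returns the buffer (which the
 * caller must free) and stores its length in *len. Returns NULL on error. */
unsigned char *recv_whole_frame(int fd, size_t *len)
{
    unsigned char probe;
    /* Peek at the frame without removing it; the return value is the
     * real frame length because MSG_TRUNC was requested. */
    ssize_t full = recv(fd, &probe, sizeof(probe), MSG_PEEK | MSG_TRUNC);
    if (full == -1) {
        return NULL;
    }
    unsigned char *buffer = malloc(full > 0 ? (size_t)full : 1);
    if (buffer == NULL) {
        return NULL;
    }
    /* Now read the frame for real, removing it from the queue. */
    ssize_t count = recv(fd, buffer, (size_t)full, 0);
    if (count != full) {
        free(buffer);
        return NULL;
    }
    *len = (size_t)count;
    return buffer;
}
```

This costs two system calls and an allocation per frame, so it is better suited to occasional oversized frames than to high-rate capture.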

Receive and handle frames as they arrive using recvfrom

To call recvfrom you need a buffer for the frame and a buffer for the remote address:

char buffer[65537];
struct sockaddr_ll src_addr;
socklen_t src_addr_len = sizeof(src_addr);
ssize_t count = recvfrom(fd, buffer, sizeof(buffer), 0, (struct sockaddr*)&src_addr, &src_addr_len);
if (count == -1) {
    perror("recvfrom");
    exit(1);
} else if ((size_t)count == sizeof(buffer)) {
    /* a completely full buffer is treated as a truncated frame */
    fprintf(stderr, "frame too large for buffer: truncated\n");
} else {
    handle_frame(buffer, count);
}

The fourth argument is for specifying flags which modify the behaviour of recvfrom, none of which are needed in this example.

The value returned by recvfrom is the number of bytes received, or -1 if there was an error. Truncation is detected in this example using the technique described above of providing a slightly over-sized frame buffer.
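
The examples leave handle_frame undefined. One possible body is sketched below, assuming the socket was created with SOCK_RAW so that each frame begins with a 14-byte Ethernet header. It uses struct ether_header from <net/ethernet.h>; the helper parse_ethernet_header is hypothetical, and what a real handler does with the frame is up to the application.

```c
#include <stdio.h>
#include <string.h>
#include <arpa/inet.h>
#include <net/ethernet.h>

/* Hypothetical helper: copy the Ethernet header out of a captured frame.
 * Returns 0 on success, or -1 if the frame is too short to contain one. */
int parse_ethernet_header(const char *frame, size_t len,
                          struct ether_header *hdr)
{
    if (len < sizeof(*hdr)) {
        return -1;
    }
    /* Copying avoids any alignment assumptions about the frame buffer. */
    memcpy(hdr, frame, sizeof(*hdr));
    return 0;
}

/* One possible handle_frame: print source, destination and EtherType. */
void handle_frame(const char *frame, size_t len)
{
    struct ether_header hdr;
    if (parse_ethernet_header(frame, len, &hdr) == -1) {
        fprintf(stderr, "frame too short: %zu bytes\n", len);
        return;
    }
    const unsigned char *s = hdr.ether_shost;
    const unsigned char *d = hdr.ether_dhost;
    /* ether_type is stored in network byte order. */
    printf("%02x:%02x:%02x:%02x:%02x:%02x -> %02x:%02x:%02x:%02x:%02x:%02x "
           "ethertype 0x%04x, %zu bytes\n",
           s[0], s[1], s[2], s[3], s[4], s[5],
           d[0], d[1], d[2], d[3], d[4], d[5],
           (unsigned)ntohs(hdr.ether_type), len);
}
```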

Receive and handle frames as they arrive using recvmsg

To call recvmsg, in addition to buffers for the frame and remote address you must also construct an iovec array and a msghdr structure:

char buffer[65536];
struct sockaddr_ll src_addr;

struct iovec iov[1];
iov[0].iov_base = buffer;
iov[0].iov_len = sizeof(buffer);

struct msghdr message;
message.msg_name = &src_addr;
message.msg_namelen = sizeof(src_addr);
message.msg_iov = iov;
message.msg_iovlen = 1;
message.msg_control = 0;
message.msg_controllen = 0;

ssize_t count = recvmsg(fd, &message, 0);
if (count == -1) {
    perror("recvmsg");
    exit(1);
} else if (message.msg_flags & MSG_TRUNC) {
    fprintf(stderr, "frame too large for buffer: truncated\n");
} else {
    handle_frame(buffer, count);
}

The purpose of the iovec array is to provide a scatter/gather capability so that the frame need not be stored in a contiguous region of memory. In this example the entire payload is stored in a single buffer, therefore only one array element is needed.
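
To illustrate the scatter/gather capability, the sketch below uses two array elements so that the Ethernet header and the payload land in separate buffers. It assumes a SOCK_RAW socket delivering plain Ethernet frames. The helper name recv_split is hypothetical.

```c
#include <string.h>
#include <net/ethernet.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/uio.h>

/* Hypothetical helper: receive one frame, placing the first 14 bytes in
 * *hdr and the remainder in payload. Returns the number of bytes stored
 * (header plus payload), or -1 on error. On success *truncated is set to
 * 1 if the frame did not fit and 0 otherwise. */
ssize_t recv_split(int fd, struct ether_header *hdr,
                   void *payload, size_t payload_len, int *truncated)
{
    struct iovec iov[2];
    iov[0].iov_base = hdr;           /* filled first */
    iov[0].iov_len = sizeof(*hdr);
    iov[1].iov_base = payload;       /* receives whatever follows */
    iov[1].iov_len = payload_len;

    struct msghdr message;
    memset(&message, 0, sizeof(message));
    message.msg_iov = iov;
    message.msg_iovlen = 2;

    ssize_t count = recvmsg(fd, &message, 0);
    if (count != -1) {
        *truncated = (message.msg_flags & MSG_TRUNC) != 0;
    }
    return count;
}
```

The kernel fills the array elements in order, so a frame shorter than the header would leave part of *hdr unwritten. The return value should be checked against sizeof(struct ether_header) before the header is used.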

The msghdr structure exists to bring the number of arguments to recvmsg and sendmsg down to a manageable number. On entry to recvmsg it specifies where the source address, the frame and any ancillary data should be stored. In this example no ancillary data has been requested, therefore no provision has been made for receiving any.

The msg_flags field of the msghdr structure is used by recvmsg to return flags to the caller. These include the MSG_TRUNC flag, which on exit will be set if the frame was truncated or clear if it was not. If you wish to pass any flags into recvmsg then this cannot be done using msg_flags, which is ignored on entry. Instead you must pass them using the third argument to recvmsg (which is zero in this example).

Variations

Capture only from a particular network interface

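By default an AF_PACKET socket will capture packets that arrive via any network interface. If instead you only wish to capture packets from one particular interface then it is necessary to obtain the index number of that interface, construct a sockaddr_ll structure which refers to it, and bind the socket to that address.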

Network interfaces are usually identified by name in user-facing contexts, but for some low-level APIs like the one used here a number is used instead. You can obtain the index from the name by means of the ioctl command SIOCGIFINDEX:

struct ifreq ifr;
size_t if_name_len = strlen(if_name);
if (if_name_len < sizeof(ifr.ifr_name)) {
    memcpy(ifr.ifr_name, if_name, if_name_len);
    ifr.ifr_name[if_name_len] = 0;
} else {
    fprintf(stderr, "interface name is too long\n");
    exit(1);
}
if (ioctl(fd,SIOCGIFINDEX, &ifr) == -1) {
    perror("ioctl");
    exit(1);
}
int ifindex=ifr.ifr_ifindex;

For further details of this method see the microHOWTO Get the index number of a Linux network interface in C using SIOCGIFINDEX.

The sockaddr_ll structure needs to contain the address family, the interface number, and the protocol (which should match the protocol specified when the socket was created, so ETH_P_ALL for no filtering):

struct sockaddr_ll addr = {0};
addr.sll_family = AF_PACKET;
addr.sll_ifindex = ifindex;
addr.sll_protocol = htons(ETH_P_ALL);

Given this socket address, binding is performed in the usual way:

if (bind(fd, (struct sockaddr*)&addr, sizeof(addr)) == -1) {
    perror("bind");
    exit(1);
}

Should you wish to explicitly bind to all interfaces then this can be done by setting the sll_ifindex field to zero.

Capture only a particular EtherType

The EtherType of an Ethernet frame specifies the type of payload that it contains. AF_PACKET sockets have the ability to filter by EtherType when capturing. This is helpful in the case where only one particular EtherType is of interest, as it reduces the volume of data that must be passed from the kernel to userspace.

The desired EtherType should be used in place of ETH_P_ALL both when the socket is first created, and if applicable, when the socket is bound to an interface (remembering that in both cases it should be converted to network byte order).
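
As a minimal sketch of this variation, the helpers below take the EtherType in host byte order (for example ETH_P_IP or ETH_P_ARP) and apply htons in both places where it is needed. The names open_filtered_socket and make_bind_address are hypothetical.

```c
#include <string.h>
#include <arpa/inet.h>
#include <net/ethernet.h>
#include <linux/if_packet.h>
#include <sys/socket.h>

/* Hypothetical helper: open a packet socket that only receives frames of
 * one EtherType. Returns the descriptor, or -1 with errno set (typically
 * EPERM if the process lacks the necessary privileges). */
int open_filtered_socket(unsigned short ethertype)
{
    return socket(AF_PACKET, SOCK_RAW, htons(ethertype));
}

/* Hypothetical helper: build the address used to bind such a socket to
 * one interface, repeating the same EtherType. */
struct sockaddr_ll make_bind_address(int ifindex, unsigned short ethertype)
{
    struct sockaddr_ll addr;
    memset(&addr, 0, sizeof(addr));
    addr.sll_family = AF_PACKET;
    addr.sll_ifindex = ifindex;
    addr.sll_protocol = htons(ethertype);
    return addr;
}
```

To capture only ARP traffic on one interface, for instance, the socket would be opened with open_filtered_socket(ETH_P_ARP) and then bound to the address returned by make_bind_address(ifindex, ETH_P_ARP).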

Put the interface into promiscuous mode

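By default, Ethernet interfaces filter out any frame which is not addressed to the interface's own unicast MAC address, the broadcast address, or a multicast address that the interface has been configured to receive.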

To receive all frames, regardless of destination, it is necessary to put the interface into promiscuous mode. This is done using the PACKET_ADD_MEMBERSHIP socket option, which accepts a structure of type packet_mreq:

struct packet_mreq mreq = {0};
mreq.mr_ifindex = ifindex;
mreq.mr_type = PACKET_MR_PROMISC;
if (setsockopt(fd, SOL_PACKET, PACKET_ADD_MEMBERSHIP, &mreq, sizeof(mreq)) == -1) {
    perror("setsockopt");
    exit(1);
}

Note that it is only possible to put specific network interfaces into promiscuous mode using this call: you cannot set the mr_ifindex field to zero to select all interfaces.
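
A membership added in this way is released when the socket is closed. If the socket needs to stay open after promiscuous reception is no longer wanted, packet(7) provides the matching PACKET_DROP_MEMBERSHIP option. The following is a minimal sketch; the helper name leave_promiscuous is hypothetical.

```c
#include <string.h>
#include <linux/if_packet.h>
#include <sys/socket.h>

/* Hypothetical helper: undo an earlier PACKET_ADD_MEMBERSHIP request for
 * promiscuous mode on the given interface. Returns 0 on success, or -1
 * with errno set on failure. */
int leave_promiscuous(int fd, int ifindex)
{
    struct packet_mreq mreq;
    memset(&mreq, 0, sizeof(mreq));
    mreq.mr_ifindex = ifindex;
    mreq.mr_type = PACKET_MR_PROMISC;
    return setsockopt(fd, SOL_PACKET, PACKET_DROP_MEMBERSHIP,
                      &mreq, sizeof(mreq));
}
```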

A similar effect can be achieved using the ioctl command SIOCSIFFLAGS to set the IFF_PROMISC flag, however this has the undesirable characteristic that this flag does not automatically revert to its previous state when the capture is over (and cannot be safely reverted using a second ioctl in the case where two processes might capture from the same interface).

Note that promiscuous mode will not allow you to capture traffic which was filtered elsewhere before it reached the network interface, therefore in a switched environment this option will typically make little difference to the volume of traffic seen.

Obtain the time of capture

The time when a packet was captured can be obtained by calling the ioctl command SIOCGSTAMP after the packet has been read from the file descriptor. This writes the timestamp into a struct timeval passed to the ioctl as its argument:

struct timeval ts;
if (ioctl(fd, SIOCGSTAMP, &ts) == -1) {
    perror("ioctl");
    exit(1);
}

The timestamp is the time elapsed since the UNIX epoch, expressed as a number of whole seconds in the tv_sec field plus a number of microseconds in the tv_usec field.
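
For display purposes the two fields can be combined into a single string. The sketch below formats the time in UTC with microsecond precision, using the POSIX functions gmtime_r and strftime. The helper name format_timestamp is hypothetical.

```c
#include <stdio.h>
#include <time.h>
#include <sys/time.h>

/* Hypothetical helper: format a capture timestamp as
 * "YYYY-MM-DD HH:MM:SS.uuuuuu" in UTC. Returns 0 on success, or -1 if
 * the conversion fails or out is too small. */
int format_timestamp(const struct timeval *ts, char *out, size_t out_len)
{
    time_t secs = ts->tv_sec;
    struct tm tm;
    if (gmtime_r(&secs, &tm) == NULL) {
        return -1;
    }
    char date[32];
    if (strftime(date, sizeof(date), "%Y-%m-%d %H:%M:%S", &tm) == 0) {
        return -1;
    }
    /* Append the microseconds, zero-padded to six digits. */
    int n = snprintf(out, out_len, "%s.%06ld", date, (long)ts->tv_usec);
    return (n < 0 || (size_t)n >= out_len) ? -1 : 0;
}
```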

An alternative method is to enable the SO_TIMESTAMP or SO_TIMESTAMPNS socket option, for microsecond or nanosecond resolution respectively:

int enable=1;
if (setsockopt(fd, SOL_SOCKET, SO_TIMESTAMP, &enable, sizeof(enable)) == -1) {
    perror("setsockopt");
    exit(1);
}

Once this has been done, the packet must be read using recvmsg with a control buffer supplied. The simplest way to parse the control buffer is to use the macros provided in <sys/socket.h>:

char control[CMSG_SPACE(sizeof(struct timeval))];

struct msghdr message;
/* ... */
message.msg_control = control;
message.msg_controllen = sizeof(control);

ssize_t count = recvmsg(fd, &message, 0);
/* ... */

struct cmsghdr *cmsg;
struct timeval *ts = 0;

for (cmsg = CMSG_FIRSTHDR(&message); cmsg; cmsg=CMSG_NXTHDR(&message,cmsg)) {
    if (cmsg->cmsg_level == SOL_SOCKET && cmsg->cmsg_type == SO_TIMESTAMP) {
        ts = (struct timeval*)CMSG_DATA(cmsg);
        break;
    }
}
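
The nanosecond variant follows the same pattern. The sketch below assumes that SO_TIMESTAMPNS has been enabled with setsockopt in place of SO_TIMESTAMP, and that the control buffer is sized with CMSG_SPACE(sizeof(struct timespec)). The helper name find_timestamp_ns is hypothetical. It copies the value out with memcpy so that no assumption is made about the alignment of the ancillary data.

```c
#include <string.h>
#include <time.h>
#include <sys/socket.h>

/* Hypothetical helper: search the ancillary data of a received message
 * for an SO_TIMESTAMPNS timestamp. Returns 0 and fills *ts if one is
 * found, or -1 otherwise. */
int find_timestamp_ns(struct msghdr *message, struct timespec *ts)
{
    struct cmsghdr *cmsg;
    for (cmsg = CMSG_FIRSTHDR(message); cmsg != NULL;
         cmsg = CMSG_NXTHDR(message, cmsg)) {
        if (cmsg->cmsg_level == SOL_SOCKET &&
            cmsg->cmsg_type == SO_TIMESTAMPNS) {
            memcpy(ts, CMSG_DATA(cmsg), sizeof(*ts));
            return 0;
        }
    }
    return -1;
}
```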

There is also a socket option SO_TIMESTAMPING to provide more detailed control of how packets are timestamped.

Note that SIOCGSTAMP will not function as intended if any of the above socket options are enabled.

Alternatives

Using libpcap

libpcap is a cross-platform library for capturing traffic from network interfaces. It also has the ability to send, so provides broadly the same functionality as a packet socket (and on Linux, is implemented using a packet socket).

The main advantage of using libpcap is that it abstracts away differences between the operating systems that it supports, thereby allowing relatively portable code to be written. This involves some loss of functionality, and that may make libpcap unsuitable for use in some circumstances, but otherwise it is recommended in preference to AF_PACKET sockets on the grounds of portability.

Using a raw socket

Raw sockets differ from packet sockets in that they operate at the network layer as opposed to the link layer. For this reason they are limited to network protocols for which raw socket support has been explicitly built into the network stack, but they also have a number of advantages which result from operating at a higher level of abstraction:

For these reasons, use of a raw socket is recommended unless you specifically need the extra functionality provided by working at the link layer.

Using a ring buffer

AF_PACKET sockets are capable of writing packets directly into a memory-mapped ring buffer. This requires fewer system calls and less copying of data, making it possible to achieve higher throughput with less risk of packet loss. Ring buffers will be the subject of a future microHOWTO.
