Capture Ethernet frames using an AF_PACKET socket in C
Tested on: Ubuntu (Lucid, Trusty, Xenial)
Objective
To capture all frames received by a given Ethernet interface using an AF_PACKET socket
Background
Ethernet is a link layer protocol. Most networking programs interact with the network stack at the transport layer or above, so have no need to deal with Ethernet frames directly, but there are some circumstances where monitoring at a lower level may be necessary. These include:
- troubleshooting (for example, using tools such as tcpdump and Wireshark),
- network intrusion detection (for example, using tools such as Snort), and
- implementation of Ethernet-based protocols that are not built in to the network stack.
Scenario
Suppose you wish to capture all frames received by all interfaces.
(Capture of specific EtherTypes, or from specific interfaces, is considered separately.)
Method
Overview
The method described here has two steps:
- Create the AF_PACKET socket.
- Receive and handle Ethernet frames as they arrive.
The following header files are used:
Header | Used by |
---|---|
<stdlib.h> | exit |
<stdio.h> | perror, fprintf |
<string.h> | memcpy, strlen |
<arpa/inet.h> | htons |
<net/ethernet.h> | ETH_P_* |
<net/if.h> | struct ifreq |
<linux/if_packet.h> | struct sockaddr_ll, struct packet_mreq, PACKET_MR_PROMISC, PACKET_ADD_MEMBERSHIP |
<sys/ioctl.h> | SIOCGIFINDEX, ioctl |
<sys/socket.h> | struct sockaddr, struct iovec, struct msghdr, AF_PACKET, SOCK_RAW, SOCK_DGRAM, socket, bind, setsockopt, recvfrom, recvmsg, SOL_SOCKET, SOL_PACKET, SO_TIMESTAMP, SIOCGSTAMP, CMSG_* |
<sys/time.h> | struct timeval |
Older versions of the packet(7) manpage specify inclusion of the header <netpacket/packet.h>, but this has since been changed to <linux/if_packet.h>. The latter has historically been kept more up to date than the former, and is the better choice under most circumstances.
AF_PACKET sockets are specific to Linux. Programs that make use of them need elevated privileges (root, or the CAP_NET_RAW capability) in order to run.
Create the AF_PACKET socket
The socket that will be used to capture the Ethernet frames should be created using the socket function. This takes three arguments:
- the domain (AF_PACKET for a packet socket);
- the socket type (SOCK_RAW if you want to capture the Ethernet headers or SOCK_DGRAM if not); and
- the protocol (equal to the required EtherType, converted to network byte order), which is used for filtering inbound packets.
In this instance the intention is to capture packets with their headers and with no filtering by EtherType, in which case the second argument should be set to SOCK_RAW and the third argument to htons(ETH_P_ALL):

    int fd = socket(AF_PACKET, SOCK_RAW, htons(ETH_P_ALL));
    if (fd == -1) {
        perror("socket");
        exit(1);
    }

Receive and handle Ethernet frames as they arrive
Frames can be received using any function that is capable of reading from a file descriptor. However, if you have opted to discard the frame headers then it will be necessary to use either recvfrom or recvmsg if you wish to have visibility of the source address.
Regardless of which function you choose you will need to supply a buffer to receive the data. If this is too small to accommodate a complete frame then any excess is discarded. That means you need not be concerned about tracking frame boundaries, because the first byte returned by a read operation will always be the start of a frame. However it does raise two issues: how the buffer size should be chosen, and how any overflow can be detected.
A standard Ethernet frame has a maximum length of 1500 bytes (payload only) or 1514 bytes (header plus payload, excluding the frame check sequence). However, most modern Ethernet interfaces are capable of sending larger frames if configured to do so. Furthermore, capturing from all interfaces includes the loopback interface, which usually has a significantly larger MTU: Linux defaults to 65536 bytes as of version 3.7, or 16436 bytes previously. For general-purpose packet capture it is therefore prudent to default to a buffer size of at least 65536 bytes, and to make the size user-configurable.
The recvmsg function explicitly reports truncation by setting the MSG_TRUNC flag in the msg_flags member of the message header. Alternatively, truncation can be detected when using any of the available functions by providing a buffer that is one byte longer than the largest payload that you actually wish to receive, then interpreting a full buffer as a truncated frame. If required, it would be possible to receive arbitrary-length frames with assistance from the MSG_PEEK option.
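As a rough illustration of the MSG_PEEK approach, the sketch below first peeks at the next frame with MSG_PEEK combined with MSG_TRUNC (on Linux, passing MSG_TRUNC as an input flag makes recv report the full length of the frame rather than the number of bytes copied), then allocates a buffer of that size and reads the frame for real. The name recv_whole_frame is hypothetical, and the code assumes a socket that delivers exactly one frame per read, as a packet socket does.

```c
#include <stdlib.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Hypothetical helper, not part of any library: receive one complete frame
 * of arbitrary length from fd. Returns a malloc'd buffer that the caller
 * must free and stores the frame length in *len, or returns NULL on error. */
unsigned char *recv_whole_frame(int fd, size_t *len)
{
    unsigned char probe;

    /* MSG_PEEK leaves the frame queued; MSG_TRUNC asks Linux to return
     * the real frame length even though only one byte is copied. */
    ssize_t full = recv(fd, &probe, sizeof(probe), MSG_PEEK | MSG_TRUNC);
    if (full == -1) {
        return NULL;
    }

    /* Allocate at least one byte so that malloc(0) is never called. */
    unsigned char *frame = malloc(full > 0 ? (size_t)full : 1);
    if (frame == NULL) {
        return NULL;
    }

    /* The second call, without MSG_PEEK, removes the frame from the queue. */
    ssize_t count = recv(fd, frame, (size_t)full, 0);
    if (count == -1) {
        free(frame);
        return NULL;
    }
    *len = (size_t)count;
    return frame;
}
```

This costs two system calls and one allocation per frame, so a fixed 65536-byte buffer is likely to remain the better choice unless frames larger than that are genuinely expected.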
Receive and handle frames as they arrive using recvfrom
To call recvfrom you need a buffer for the frame and a buffer for the remote address:

    /* One byte larger than the largest frame wanted, to detect truncation. */
    char buffer[65537];
    struct sockaddr_ll src_addr;
    socklen_t src_addr_len = sizeof(src_addr);
    ssize_t count = recvfrom(fd, buffer, sizeof(buffer), 0,
        (struct sockaddr*)&src_addr, &src_addr_len);
    if (count == -1) {
        perror("recvfrom");
        exit(1);
    } else if (count == sizeof(buffer)) {
        fprintf(stderr, "frame too large for buffer: truncated\n");
    } else {
        handle_frame(buffer, count);
    }

The fourth argument is for specifying flags which modify the behaviour of recvfrom, none of which are needed in this example.
The value returned by recvfrom is the number of bytes received, or -1 if there was an error. Truncation is detected in this example using the technique described above of providing a slightly over-sized frame buffer.
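The examples in this section pass each frame to handle_frame without defining it. As a minimal sketch of what such a handler might do first, the code below splits the 14-byte Ethernet header into its destination address, source address and EtherType. The names eth_summary and parse_eth_header are hypothetical, and the code assumes a SOCK_RAW socket (so the header is present) and an untagged frame.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define FRAME_HEADER_LEN 14 /* 6 destination + 6 source + 2 EtherType */

/* Hypothetical summary of an Ethernet header. */
struct eth_summary {
    uint8_t dst[6];     /* destination MAC address */
    uint8_t src[6];     /* source MAC address */
    uint16_t ethertype; /* EtherType in host byte order */
};

/* Parses the header at the start of frame into *out.
 * Returns 0 on success, or -1 if the frame is too short to hold a header. */
int parse_eth_header(const unsigned char *frame, size_t len,
    struct eth_summary *out)
{
    if (len < FRAME_HEADER_LEN) {
        return -1;
    }
    memcpy(out->dst, frame, 6);
    memcpy(out->src, frame + 6, 6);
    /* The EtherType is big-endian on the wire, so assemble it by hand. */
    out->ethertype = (uint16_t)((frame[12] << 8) | frame[13]);
    return 0;
}
```

A handle_frame implementation could call parse_eth_header on the buffer and byte count it is given, then dispatch on the ethertype field.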
Receive and handle frames as they arrive using recvmsg
To call recvmsg, in addition to buffers for the frame and remote address you must also construct an iovec array and a msghdr structure:

    char buffer[65536];
    struct sockaddr_ll src_addr;

    struct iovec iov[1];
    iov[0].iov_base = buffer;
    iov[0].iov_len = sizeof(buffer);

    struct msghdr message;
    message.msg_name = &src_addr;
    message.msg_namelen = sizeof(src_addr);
    message.msg_iov = iov;
    message.msg_iovlen = 1;
    message.msg_control = 0;
    message.msg_controllen = 0;

    /* ssize_t (signed) so that the -1 error return can be detected. */
    ssize_t count = recvmsg(fd, &message, 0);
    if (count == -1) {
        perror("recvmsg");
        exit(1);
    } else if (message.msg_flags & MSG_TRUNC) {
        fprintf(stderr, "frame too large for buffer: truncated\n");
    } else {
        handle_frame(buffer, count);
    }

The purpose of the iovec array is to provide a scatter/gather capability so that the frame need not be stored in a contiguous region of memory. In this example the entire payload is stored in a single buffer, therefore only one array element is needed.
The msghdr structure exists to bring the number of arguments to recvmsg and sendmsg down to a manageable number. On entry to recvmsg it specifies where the source address, the frame and any ancillary data should be stored. In this example no ancillary data has been requested, therefore no provision has been made for receiving any.
The msg_flags field of the msghdr structure is used by recvmsg to return flags to the caller. These include the MSG_TRUNC flag, which on exit will be set if the frame was truncated or clear if it was not. If you wish to pass any flags into recvmsg then this cannot be done using msg_flags, which is ignored on entry. Instead you must pass them using the third argument to recvmsg (which is zero in this example).
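The fragments above each receive a single frame, whereas a capture program would normally repeat the call in a loop. The following is one possible shape for such a loop, offered as a sketch only: the names frame_handler, count_frame and capture_frames are hypothetical, recv is used because the source address is not needed here, and truncation is detected with the over-sized buffer technique described earlier.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Hypothetical callback type invoked once per complete frame. */
typedef void (*frame_handler)(const unsigned char *frame, size_t len,
    void *ctx);

/* Example handler: counts frames in the unsigned long that ctx points to. */
void count_frame(const unsigned char *frame, size_t len, void *ctx)
{
    (void)frame;
    (void)len;
    ++*(unsigned long *)ctx;
}

/* Receives frames from fd until max_frames have been handled.
 * Returns the number of frames handled, or -1 if recv fails. */
long capture_frames(int fd, long max_frames, frame_handler handler,
    void *ctx)
{
    unsigned char buffer[65537]; /* one byte spare to detect truncation */
    long handled = 0;

    while (handled < max_frames) {
        ssize_t count = recv(fd, buffer, sizeof(buffer), 0);
        if (count == -1) {
            if (errno == EINTR) {
                continue; /* interrupted by a signal: retry */
            }
            return -1;
        }
        if ((size_t)count == sizeof(buffer)) {
            continue; /* buffer full, so treat the frame as truncated */
        }
        handler(buffer, (size_t)count, ctx);
        ++handled;
    }
    return handled;
}
```

Passing the handler as a function pointer keeps the loop independent of what is done with each frame. A real program might prefer to log truncated frames rather than skip them silently.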
Variations
Capture only from a particular network interface
By default an AF_PACKET socket will capture packets that arrive via any network interface. If instead you only wish to capture packets from one particular interface then it is necessary to:
- Determine the index number of the Ethernet interface to be used.
- Construct a sockaddr_ll structure containing that interface number.
- Bind the AF_PACKET socket to that address.
Network interfaces are usually identified by name in user-facing contexts, but for some low-level APIs like the one used here a number is used instead. You can obtain the index from the name by means of the ioctl command SIOCGIFINDEX:

    struct ifreq ifr;
    size_t if_name_len = strlen(if_name);
    if (if_name_len < sizeof(ifr.ifr_name)) {
        memcpy(ifr.ifr_name, if_name, if_name_len);
        ifr.ifr_name[if_name_len] = 0;
    } else {
        fprintf(stderr, "interface name is too long\n");
        exit(1);
    }
    if (ioctl(fd, SIOCGIFINDEX, &ifr) == -1) {
        perror("ioctl");
        exit(1);
    }
    int ifindex = ifr.ifr_ifindex;

For further details of this method see the microHOWTO Get the index number of a Linux network interface in C using SIOCGIFINDEX.
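For comparison, the same lookup can be sketched with if_nametoindex, a POSIX function declared in <net/if.h> which does not need an ifreq structure or an existing socket. This is an alternative to the ioctl shown above rather than the method this microHOWTO uses, and the wrapper name ifindex_from_name is hypothetical.

```c
#include <net/if.h>

/* Hypothetical wrapper: returns the index of the named interface,
 * or -1 if no interface with that name exists. */
int ifindex_from_name(const char *if_name)
{
    /* if_nametoindex returns 0 on failure, since valid indices start at 1. */
    unsigned int index = if_nametoindex(if_name);
    return (index == 0) ? -1 : (int)index;
}
```

The result can be assigned to sll_ifindex in exactly the same way as the value obtained from SIOCGIFINDEX.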
The sockaddr_ll structure needs to contain the address family, the interface number, and the protocol (which should match the protocol specified when the socket was created, so ETH_P_ALL for no filtering):

    struct sockaddr_ll addr = {0};
    addr.sll_family = AF_PACKET;
    addr.sll_ifindex = ifindex;
    addr.sll_protocol = htons(ETH_P_ALL);

Given this socket address, binding is performed in the usual way:

    if (bind(fd, (struct sockaddr*)&addr, sizeof(addr)) == -1) {
        perror("bind");
        exit(1);
    }

Should you wish to explicitly bind to all interfaces then this can be done by setting the sll_ifindex field to zero.
Capture only a particular EtherType
The EtherType of an Ethernet frame specifies the type of payload that it contains. AF_PACKET sockets have the ability to filter by EtherType when capturing. This is helpful in the case where only one particular EtherType is of interest, as it reduces the volume of data that must be passed from the kernel to userspace.
- The header file <linux/if_ether.h> provides constants for most commonly-used EtherTypes. Examples include ETH_P_IP for the Internet Protocol (0x0800), ETH_P_ARP for the Address Resolution Protocol (0x0806) and ETH_P_8021Q for IEEE 802.1Q VLAN tags (0x8100).
- The IEEE maintains the definitive list of registered EtherTypes.
- A semi-official list is maintained by IANA.
The desired EtherType should be used in place of ETH_P_ALL both when the socket is first created, and if applicable, when the socket is bound to an interface (remembering that in both cases it should be converted to network byte order).
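The byte-order point applies in the other direction too: the ETH_P_* constants are in host byte order, so an EtherType read out of a captured frame must be converted with ntohs before it is compared with one of them. The sketch below shows that comparison as a userspace check. The function name frame_has_ethertype is hypothetical, and the code assumes an untagged frame captured with SOCK_RAW so that the EtherType sits at offset 12.

```c
#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>
#include <net/ethernet.h>

/* Hypothetical check: returns 1 if the frame carries the given EtherType
 * (in host byte order, for example ETH_P_ARP), or 0 otherwise. */
int frame_has_ethertype(const unsigned char *frame, size_t len,
    uint16_t ethertype)
{
    uint16_t wire;

    if (len < ETH_HLEN) {
        return 0; /* too short to contain an Ethernet header */
    }
    /* Copy out the two EtherType bytes exactly as they appear on the wire. */
    memcpy(&wire, frame + 12, sizeof(wire));
    /* Convert from network to host byte order before comparing. */
    return ntohs(wire) == ethertype;
}
```

Filtering in the kernel, as described above, remains preferable when only one EtherType is wanted. A check of this kind is mainly useful with an ETH_P_ALL socket that handles several EtherTypes.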
Put the interface into promiscuous mode
Ethernet interfaces normally filter out any frame which is not addressed to:
- the MAC address of that interface,
- the broadcast MAC address, or
- a multicast address which the interface has been configured to receive.
To receive all frames, regardless of destination, it is necessary to put the interface into promiscuous mode. This is done using the PACKET_ADD_MEMBERSHIP socket option, which accepts a structure of type packet_mreq:

    struct packet_mreq mreq = {0};
    mreq.mr_ifindex = ifindex;
    mreq.mr_type = PACKET_MR_PROMISC;
    if (setsockopt(fd, SOL_PACKET, PACKET_ADD_MEMBERSHIP, &mreq, sizeof(mreq)) == -1) {
        perror("setsockopt");
        exit(1);
    }

Note that it is only possible to put specific network interfaces into promiscuous mode using this call: you cannot set the mr_ifindex field to zero to select all interfaces.
A similar effect can be achieved using the ioctl command SIOCSIFFLAGS to set the IFF_PROMISC flag, however this has the undesirable characteristic that this flag does not automatically revert to its previous state when the capture is over (and cannot be safely reverted using a second ioctl in the case where two processes might capture from the same interface).
Note that promiscuous mode will not allow you to capture traffic which was filtered elsewhere before it reached the network interface, therefore in a switched environment this option will typically make little difference to the volume of traffic seen.
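A membership added with PACKET_ADD_MEMBERSHIP lapses when the socket is closed, but it can also be withdrawn earlier using the matching PACKET_DROP_MEMBERSHIP option documented in packet(7). The sketch below wraps both directions in one function. The name set_promiscuous is hypothetical, and the caller is assumed to hold an open AF_PACKET socket and a valid interface index.

```c
#include <string.h>
#include <sys/socket.h>
#include <linux/if_packet.h>

/* Hypothetical helper: requests (enable != 0) or withdraws (enable == 0)
 * promiscuous mode for one interface on behalf of the socket fd.
 * Returns 0 on success, or -1 with errno set as for setsockopt. */
int set_promiscuous(int fd, int ifindex, int enable)
{
    struct packet_mreq mreq;
    memset(&mreq, 0, sizeof(mreq));
    mreq.mr_ifindex = ifindex;
    mreq.mr_type = PACKET_MR_PROMISC;

    /* The same structure is used for adding and for dropping. */
    int option = enable ? PACKET_ADD_MEMBERSHIP : PACKET_DROP_MEMBERSHIP;
    return setsockopt(fd, SOL_PACKET, option, &mreq, sizeof(mreq));
}
```

A program that captures briefly but keeps its socket open could call set_promiscuous(fd, ifindex, 0) once the capture is finished.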
Obtain the time of capture
The time when a packet was captured can be obtained by calling the ioctl command SIOCGSTAMP after the packet has been read from the file descriptor. This writes the timestamp into a struct timeval passed to the ioctl as its argument:

    struct timeval ts;
    if (ioctl(fd, SIOCGSTAMP, &ts) == -1) {
        perror("ioctl");
        exit(1);
    }

The timestamp is the time elapsed since the UNIX epoch, expressed as a number of whole seconds plus a number of microseconds in the tv_sec and tv_usec fields respectively.
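When printing such a timestamp the microseconds need to be zero-padded to six digits, otherwise a value such as 5 microseconds would read as half a second. A minimal sketch, with the hypothetical function name format_timestamp:

```c
#include <stdio.h>
#include <sys/time.h>

/* Hypothetical helper: writes ts into out as "seconds.microseconds".
 * Returns the value returned by snprintf. */
int format_timestamp(const struct timeval *ts, char *out, size_t out_len)
{
    /* The casts avoid assuming the exact types of time_t and suseconds_t. */
    return snprintf(out, out_len, "%lld.%06ld",
        (long long)ts->tv_sec, (long)ts->tv_usec);
}
```

For example, a timestamp with tv_sec equal to 1 and tv_usec equal to 5 is formatted as 1.000005.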
An alternative method is to enable the SO_TIMESTAMP or SO_TIMESTAMPNS socket option, for microsecond or nanosecond resolution respectively:

    int enable = 1;
    if (setsockopt(fd, SOL_SOCKET, SO_TIMESTAMP, &enable, sizeof(enable)) == -1) {
        perror("setsockopt");
        exit(1);
    }

Once this has been done, the packet must be read using recvmsg with a control buffer supplied. The simplest way to parse the control buffer is to use the macros provided in <sys/socket.h>:

    char control[CMSG_SPACE(sizeof(struct timeval))];
    struct msghdr message;
    /* ... */
    message.msg_control = control;
    message.msg_controllen = sizeof(control);
    ssize_t count = recvmsg(fd, &message, 0);
    /* ... */
    struct cmsghdr *cmsg;
    struct timeval *ts = 0;
    for (cmsg = CMSG_FIRSTHDR(&message); cmsg; cmsg = CMSG_NXTHDR(&message, cmsg)) {
        /* Timestamp control messages arrive at the SOL_SOCKET level. */
        if (cmsg->cmsg_level == SOL_SOCKET && cmsg->cmsg_type == SO_TIMESTAMP) {
            ts = (struct timeval*)CMSG_DATA(cmsg);
            break;
        }
    }

There is also a socket option SO_TIMESTAMPING to provide more detailed control of how packets are timestamped.
Note that SIOCGSTAMP will not function as intended if any of the above socket options are enabled.
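The control-buffer search used with SO_TIMESTAMP can be packaged as a small function, sketched below under the hypothetical name extract_timestamp. It copies the value out with memcpy instead of casting the result of CMSG_DATA, because the data area of a control message is not guaranteed to be suitably aligned for direct access.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/time.h>

/* Hypothetical helper: searches the ancillary data returned by recvmsg
 * for an SO_TIMESTAMP control message and copies its value into *out.
 * Returns 0 if a timestamp was found, or -1 if there was none. */
int extract_timestamp(struct msghdr *message, struct timeval *out)
{
    struct cmsghdr *cmsg;

    for (cmsg = CMSG_FIRSTHDR(message); cmsg != NULL;
        cmsg = CMSG_NXTHDR(message, cmsg)) {
        if (cmsg->cmsg_level == SOL_SOCKET &&
            cmsg->cmsg_type == SO_TIMESTAMP) {
            /* memcpy avoids relying on the alignment of the data area. */
            memcpy(out, CMSG_DATA(cmsg), sizeof(*out));
            return 0;
        }
    }
    return -1;
}
```

It would be called after a successful recvmsg, passing the same msghdr structure. A return value of -1 would suggest that the socket option had not been enabled or that the control buffer was too small.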
Alternatives
Using libpcap
libpcap is a cross-platform library for capturing traffic from network interfaces. It also has the ability to send, so provides broadly the same functionality as a packet socket (and on Linux, is implemented using a packet socket).
The main advantage of using libpcap is that it abstracts away differences between the operating systems that it supports, thereby allowing relatively portable code to be written. This involves some loss of functionality, and that may make libpcap unsuitable for use in some circumstances, but otherwise it is recommended in preference to AF_PACKET sockets on the grounds of portability.
Using a raw socket
Raw sockets differ from packet sockets in that they operate at the network layer as opposed to the link layer. For this reason they are limited to network protocols for which raw socket support has been explicitly built into the network stack, but they also have a number of advantages which result from operating at a higher level of abstraction:
- You can write code that will work with any suitable type of network interface.
- The raw socket API has been partially standardised by POSIX, whereas AF_PACKET sockets are specific to Linux.
For these reasons, use of a raw socket is recommended unless you specifically need the extra functionality provided by working at the link layer.
Using a ring buffer
AF_PACKET sockets are capable of writing packets directly into a memory-mapped ring buffer. This requires fewer system calls and less copying of data, making it possible to achieve higher throughput with less risk of packet loss. Ring buffers will be the subject of a future microHOWTO.
Further reading
- packet(7) (Linux manpage)