Rate this page

Flattr this

Listen for and receive UDP datagrams in C

Tested on

Debian (Lenny)
Ubuntu (Lucid)

Objective

To listen for and receive inbound UDP datagrams in C

Scenario

Suppose that you wish to write a server that implements the UDP-based variant of the Daytime Protocol, as defined by RFC 867

This is a very simple protocol whereby the client sends a datagram to the server, then the server responds with a datagram containing a human-readable copy of the current date and time. The datagram from the client is not required to have any particular content.

Method

Overview

The method described here has four steps:

  1. Construct the local socket address.
  2. Create the socket.
  3. Bind the local address to the socket.
  4. Receive and handle datagrams as they arrive.

This is the appropriate procedure when listening for unsolicited datagrams, as in the scenario described above. See below for how it can be adapted to:

The following header files will be needed:

#include <errno.h>
#include <string.h>
#include <unistd.h>
#include <netdb.h>
#include <sys/socket.h>
#include <netinet/in.h>

and if using recvmsg to receive datagrams:

#include <sys/uio.h>

Construct the local socket address

In order to listen for UDP datagrams it is necessary to choose a port number and, optionally, a local IP address on which to listen. The combination of these two values is treated as a single entity called the socket address, which is represented by a struct sockaddr_in for IPv4 or a struct sockaddr_in6 for IPv6.

Most common network services have an assigned port number on which they are normally expected to listen. While it makes sense to use this as the default, it is good practice to make the port number configurable. Possible reasons for wanting to override the assigned port number include:

The local IP address should normally default to either the the wildcard address or the loopback address, but like the port number it is good practice to make it configurable. When a service is bound to a particular IP address it will only accept connections directed to that address, whereas when bound to the wildcard address it will accept connections to any local address. Binding to the loopback address has the effect of prohibiting connections from other machines.

For most purposes the best way to construct the socket address is by calling getaddrinfo. This takes a string containing the IP address and a string containing the port number, and converts them into a sockaddr_in or a sockaddr_in6 as appropriate. It is also able to resolve hostnames and service names:

const char* hostname=0; /* wildcard */
const char* portname="daytime";
struct addrinfo hints;
memset(&hints,0,sizeof(hints));
hints.ai_family=AF_UNSPEC;
hints.ai_socktype=SOCK_DGRAM;
hints.ai_protocol=0;
hints.ai_flags=AI_PASSIVE|AI_ADDRCONFIG;
struct addrinfo* res=0;
int err=getaddrinfo(hostname,portname,&hints,&res);
if (err!=0) {
    die("failed to resolve local socket address (err=%d)",err);
}

The hints argument contains additional information to help guide the conversion. In this example:

The res argument is used to return a linked list of addrinfo structures containing the address or addresses that were found. If the network service daemon has the ability to listen on multiple sockets then it should open one for each address in the list. Otherwise it is considered acceptable to use the first result and discard the remainder.

The memory occupied by the result list should be released by calling freeaddrinfo once it is no longer needed, however this cannot be done until after the socket has been created and bound.

Create the socket.

The socket that will be used to listen for inbound datagrams should be created using the socket function. This takes three arguments:

  1. the domain (AF_INET or AF_INET6 in this case, corresponding to IPv4 or IPv6 respectively),
  2. the socket type (SOCK_DGRAM in this case, meaning that the socket should provide connectionless and potentially unreliable transfer of datagrams), and
  3. the protocol (IPROTO_UDP in this case, corresponding to UDP).

A value of 0 for the protocol requests the default for the given address family and socket type, which for AF_INET or AF_INET6 and SOCK_DGRAM would be IPPROTO_UDP. It is equally acceptable for the protocol to be deduced in this manner or specified explicitly.

Assuming you previously used getaddrinfo to construct the remote address then the required values can be obtained from the addrinfo structure:

int fd=socket(res->ai_family,res->ai_socktype,res->ai_protocol);
if (fd==-1) {
    die("%s",strerror(errno));
}

Bind the local address to the socket

As noted previously, the server socket must be bound to a local address before it can listen for inbound datagrams. This should be done using the bind function:

if (bind(fd,res->ai_addr,res->ai_addrlen)==-1) {
    die("%s",strerror(errno));
}

The first argument is the socket descriptor. The second and third arguments are the local address and its length.

If the local address was constructed using getaddrinfo then the memory occupied by the address list can now be released:

freeaddrinfo(res);

(If the address list has been searched or filtered then take care that it is the head of the list that is released, not the address that you have chosen to use.)

Receive and handle datagrams as they arrive

Datagrams can be received using any function that is capable of reading from a file descriptor, however if you are listening for unsolicited datagrams (as in this example) then you will normally want to know where each datagram originated from so that it can be replied to. This information is provided by the functions recvfrom and recvmsg. Of these recvmsg is the more flexible option, but at the cost of a significantly more complex interface. Details for each function are given below.

Regardless of which function you choose you will need to supply a buffer to receive the data. If this is too small to accommodate a complete datagram then any excess is discarded. That means you need not be concerned about tracking datagram boundaries, because the first byte returned by a read operation will always be the start of a datagram. However it does raise two issues: how the buffer size should be chosen, and how any overflow can be detected.

UDP-based application-layer protocols often limit the size of datagram that can be sent in order to provide an solution to the first issue. For example, TFTP and DNS each have a fixed maximum payload size of 512 bytes. For DHCP the limit defaults to 548 bytes, but a larger value can be negotiated if both parties are willing to support it.

In the absence of such guidance it is necessary to consider what the transport, network and link layer protocols are likely to support. The maximum payload size for UDP over IPv4 is 65507 bytes, and for IPv6 with jumbogram support it is close to 4 gigabytes. However, the largest payload that an implementation is required to support is 548 bytes for IPv4 and 1452 bytes for IPv6. On an Ethernet with the standard MTU of 1500 bytes, the largest payload that can be sent without fragmentation is 1472 bytes. On this basis, 1472 bytes would be a reasonable choice if you have no reason to believe that a larger buffer is needed or that a smaller buffer would suffice.

It is possible to receive arbitrary-length datagrams with assistance from the MSG_PEEK option, however if you choose to do this then it would be prudent to set an upper limit in order to prevent denial of service attacks.

The recvmsg function explicitly reports truncation by setting the MSG_TRUNC flag in the msg_flags member of the message header. Alternatively, truncation can be detected when using any of the available functions by providing a buffer that is one byte longer than the largest payload that you actually wish to receive, then interpreting a full buffer as a truncated datagram.

Receive and handle datagrams as they arrive using recvfrom

To call recvfrom you need a buffer for the datagram and a buffer for the remote address:

char buffer[549];
struct sockaddr_storage src_addr;
socklen_t src_addr_len=sizeof(src_addr);
ssize_t count=recvfrom(fd,buffer,sizeof(buffer),0,(struct sockaddr*)&src_addr,&src_addr_len);
if (count==-1) {
    die("%s",strerror(errno));
} else if (count==sizeof(buffer)) {
    warn("datagram too large for buffer: truncated");
} else {
    handle_datagram(buffer,count);
}

The fourth argument is for specifying flags which modify the behaviour of recvfrom, none of which are needed in this example.

The value returned by recvfrom is the number of bytes received, or -1 if there was an error. Truncation is detected in this example using the technique described above of providing a slightly over-sized datagram buffer.

Receive and handle datagrams as they arrive using recvmsg

To call recvmsg, in addition to buffers for the datagram and remote address you must also construct an iovec array and a msghdr structure:

char buffer[548];
struct sockaddr_storage src_addr;

struct iovec iov[1];
iov[0].iov_base=buffer;
iov[0].iov_len=sizeof(buffer);

struct msghdr message;
message.msg_name=&src_addr;
message.msg_namelen=sizeof(src_addr);
message.msg_iov=iov;
message.msg_iovlen=1;
message.msg_control=0;
message.msg_controllen=0;

ssize_t count=recvmsg(fd,&message,0);
if (count==-1) {
    die("%s",strerror(errno));
} else if (message.msg_flags&MSG_TRUNC) {
    warn("datagram too large for buffer: truncated");
} else {
    handle_datagram(buffer,count);
}

The purpose of the iovec array is to provide a scatter/gather capability so that the datagram payload need not be stored in a contiguous region of memory. In this example the entire payload is stored in a single buffer, therefore only one array element is needed.

The msghdr structure exists to bring the number of arguments to recvmsg and sendmsg down to a managable number. On entry to recvmsg it specifies where the source address, the datagram payload and any ancillary data should be stored. In this example no ancillary data has been requested, therefore no provision has been made for receiving any.

The msg_flags field of the msghdr structure is used by recvmsg to return flags to the caller. These include the MSG_TRUNC flag, which on exit will be set if the datagram was truncated or clear if it was not. If you wish to pass any flags into recvmsg then this cannot be done using msg_flags, which is ignored on entry. Instead you must pass them using the third argument to recvmsg (which is zero in this example).

Variations

Listening for a reply

When listening for a reply to a datagram that you have sent then three of the four steps listed above may be omitted:

Connecting to a remote host

When exchanging many datagrams from a particular remote host it may be beneficial for a UDP socket to be connected to that host. This removes the need for the remote address to be explicitly checked every time a datagram is received, and for the address to be specified every time one is sent. The connection is made using the connect function:

if (connect(fd,remote_addr,sizeof(remote_addr))==-1) {
    die("%s",strerror(errno));
}

This is superficially identical to the call that would be made to establish a TCP connection, however unlike TCP there is no handshake. This has two notable consequences:

A UDP socket in the connected state will only receive datagrams that originate from the given remote address. It is therefore feasible to use functions such as read or recv in place of recvfrom. Similarly the given remote address becomes the default for outgoing datagrams, therefore it is feasible to use write or send in place of sendto. (Being connected does not, however, prevent you from sending datagrams to arbitrary destinations using sendto if you so wish.)

Determining the local address

When replying to a datagram on a multihomed host, RFC 1123 recommends that the source address of the reply should match the destination address of the corresponding request. Unfortunately the POSIX API does not provide a satisfactory way to achieve this in a portable manner. Briefly, the available options include:

This is a substantial topic in its own right and will be the subject of a future microHOWTO.

See also

Further Reading

Tags: c | posix | socket