
\section{Domain Name System}
\label{sec:DNS}

The \gls{dns} is one of the cornerstones of the internet as it is known today, and nearly every device connected to the internet uses it. Initial designs were proposed in 1983 and evolved over the following four years into the first globally adopted standard, RFC 1034 \fsCite{rfc1034} (see also RFC 1035 for implementation and specification details \fsCite{rfc1035}). The main purpose of the \gls{dns} is to translate human-readable domain names into network addresses. Many extensions to the initial design have been added since, including numerous security-related features and enhancements as well as support for \gls{ipv6} in 1995.

In order to understand how the \gls{dns} is misused for malicious activities and how to prevent these attacks, it is necessary to explain some basic mechanisms.


\subsection{Basics}
\label{subsec:basics}

In the early days of the internet, the mapping between host names and IP addresses was stored in a single file, \texttt{HOSTS.TXT}. This file was maintained by a central instance, the \gls{sri-nic}, and distributed to all hosts on the internet via \gls{ftp}. As this file grew and more machines were connected to the internet, the cost of distributing the mappings rose to an unacceptable level. Additionally, the original model of the internet, the \gls{arpanet} connecting multiple hosts into one network, became outdated. The new challenge was to connect multiple local networks (each of which itself contains many machines) into a global, interactive and \gls{tcp/ip} based network. With the number of machines quickly increasing and the cost of distributing the \texttt{HOSTS.TXT} file rising rapidly, a new system for the reliable and fast resolution of host names to addresses had to be developed.

\citeauthor{mockapetris1988development} proposed five conditions that had to be met by the base design of \gls{dns} \fsCite[p. 124]{mockapetris1988development}:

\begin{itemize}
\item Provide at least all of the same information as HOSTS.TXT.
\item Allow the database to be maintained in a distributed manner.
\item Have no obvious size limits for names, name components, data associated with a name, etc.
\item Interoperate across the DARPA Internet and in as many other environments as possible.
\item Provide tolerable performance.
\end{itemize}

For the \gls{dns} to be globally accepted, it should furthermore not impose too many restrictions on how the distributed local networks and hosts are designed and operated. This includes, e.g., not limiting the system to a single \gls{os} or software architecture, supporting different network topologies, and supporting the encapsulation of other name spaces.
In general, the design should avoid as many constraints and support as many implementation structures as possible.


\subsubsection{Architecture}
\label{subsubsec:architecture}

The \gls{dns} is primarily built on two types of components: name servers and resolvers. A name server holds information that is used to answer incoming requests, e.g., to resolve a domain name into an IP address. Although resolving domain names into IP addresses is probably the primary use case, name servers can hold arbitrary information and provide a service to retrieve it. A resolver interacts with client software and implements the algorithms to find a name server that holds the information requested by the client. Depending on the functionality needed, these two components may be distributed across different machines and locations or run on the same machine. While in former days a workstation may not have been powerful enough to run a resolver, today a shared resolver is mainly attractive for performance reasons, as it can answer repeated requests from its cache. In a company network it is common to have multiple resolvers, e.g., one per organizational unit.
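The division of labour between the two components can be illustrated with a deliberately simplified, in-memory model. The classes \texttt{NameServer} and \texttt{CachingResolver} below are hypothetical and do not implement the \gls{dns} wire protocol; the sketch only assumes that a name server answers from its own data and that a resolver asks the known name servers in turn and caches what it finds.

```python
from dataclasses import dataclass, field


@dataclass
class NameServer:
    """Holds records as a mapping (name, type) -> list of values."""
    records: dict[tuple[str, str], list[str]]

    def answer(self, name: str, rtype: str) -> list[str]:
        # An empty list means this server holds no such record.
        return self.records.get((name, rtype), [])


@dataclass
class CachingResolver:
    """Finds a name server holding the requested data and caches the result."""
    servers: list[NameServer]
    cache: dict[tuple[str, str], list[str]] = field(default_factory=dict)
    upstream_queries: int = 0  # number of requests sent to name servers

    def resolve(self, name: str, rtype: str = "A") -> list[str]:
        key = (name, rtype)
        if key in self.cache:  # answered from the cache
            return self.cache[key]
        for server in self.servers:  # ask the known name servers in turn
            self.upstream_queries += 1
            result = server.answer(name, rtype)
            if result:
                self.cache[key] = result
                return result
        return []  # no server holds the requested information


if __name__ == "__main__":
    ns = NameServer({("www.example.com", "A"): ["192.0.2.10"]})
    resolver = CachingResolver([ns])
    print(resolver.resolve("www.example.com"))  # asks the name server
    print(resolver.resolve("www.example.com"))  # served from the cache
    print(resolver.upstream_queries)
```

Resolving the same name twice contacts the name server only once, since the second answer is served from the resolver's cache. A real resolver would additionally honour the time-to-live of each record, which is omitted in this sketch.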


\subsubsection{Name space}
\label{subsubsec:name_space}

The \gls{dns} is based on a naming system with a hierarchical, logical tree structure called the domain name space. It contains a single, unnamed root node and an arbitrary number of nodes on subordinate levels of variable depth: the nodes directly below the root are the \textit{top-level domains} (\textit{TLDs}), followed by second-level domains, third-level domains, and so forth. Each node is uniquely identified by its \gls{fqdn} and usually represents a domain, machine or service in the network. Furthermore, every domain can be subdivided into more fine-grained domains, called subdomains, which can again represent specific machines or further domains. This subdivision is an important concept that allows the internet to keep growing: the owner of a domain (e.g., a company or cooperative) is responsible for maintaining and further subdividing it.
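The hierarchy can be made tangible with a few string operations on an \gls{fqdn}. The following sketch assumes plain ASCII names in presentation format (labels separated by dots, with an optional trailing dot for the root) and ignores escaped dots inside labels; the function names are illustrative only.

```python
def labels(fqdn: str) -> list[str]:
    """Split an FQDN into its labels, starting with the most specific one."""
    name = fqdn.rstrip(".")  # the trailing dot stands for the unnamed root
    return name.split(".") if name else []


def level(fqdn: str) -> int:
    """0 = root, 1 = top-level domain, 2 = second-level domain, and so forth."""
    return len(labels(fqdn))


def path_to_root(fqdn: str) -> list[str]:
    """List the node and all of its ancestors up to the root of the tree."""
    parts = labels(fqdn)
    return [".".join(parts[i:]) + "." for i in range(len(parts))] + ["."]


if __name__ == "__main__":
    print(path_to_root("www.example.com."))
    print(level("example.com"))
```

For \texttt{www.example.com.} the path consists of the node itself, its parent domain \texttt{example.com.}, the top-level domain \texttt{com.} and finally the root \texttt{.}; each of these ancestors may delegate the maintenance of its subtree.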


\subsubsection{\gls{dns} Resource Records}
\label{subsubsec:dns_resource_records}

See Table~\ref{tab:resource_record_types} for a list of common built-in resource record types of the \gls{dns}. These record types serve different purposes and are used with varying frequency.


\begin{table}[!htbp]
\centering
\caption{Resource Record Types}
\label{tab:resource_record_types}
\begin{tabular}{@{}llll@{}}
\toprule
Value & Text Code & Type                                                         & Description                                                                                                                                                                                                                                                                                                              \\ \midrule
1     & A         & Address                                                      & \begin{tabular}[c]{@{}l@{}}Returns the 32 bit IPv4 address of a host. \\ Most commonly used for name resolution \\ of a host.\end{tabular}                                                                                                                                                                               \\
28    & AAAA      & IPv6 address                                                 & \begin{tabular}[c]{@{}l@{}}Similar to the A record, this returns the \\ address of an host. For IPv6 this has 128 bit.\end{tabular}                                                                                                                                                                                      \\
2     & NS        & \begin{tabular}[c]{@{}l@{}}Name\\ Server\end{tabular}        & \begin{tabular}[c]{@{}l@{}}Specifies the name of a \gls{dns} name server \\ that is authoritative for the zone. Each \\ zone must have at least one NS record \\ that points to its primary name server.\end{tabular}                                                                                                          \\
5     & CNAME     & \begin{tabular}[c]{@{}l@{}}Canonical\\ Name\end{tabular}     & \begin{tabular}[c]{@{}l@{}}A CNAME record defines an alias \\ that points to the real canonical \\ name of the node. This can e.g. be used\\  to hide internal \gls{dns} structures and \\ provide a stable interface for outside users.\end{tabular}                                                        \\
6     & SOA       & \begin{tabular}[c]{@{}l@{}}Start of\\ Authority\end{tabular} & \begin{tabular}[c]{@{}l@{}}The SOA record marks the start of a \gls{dns} \\ zone and provides important information \\ about the zone. Every zone must have \\ exactly one SOA record containing \\ e.g. the name of the zone, the primary \\ authoritative server name and the \\ administrator's email address.\end{tabular} \\
12    & PTR       & Pointer                                                      & \begin{tabular}[c]{@{}l@{}}Points to another domain name in the \\ name space. Mostly used for reverse \\ lookups (IP address to host name).\end{tabular}                                                                                                                                                                                                                   \\
15    & MX        & Mail Exchange                                                & \begin{tabular}[c]{@{}l@{}}Returns the host that is responsible for\\  handling emails sent to this domain.\end{tabular}                                                                                                                                                                                                 \\
16    & TXT       & Text String                                                  & \begin{tabular}[c]{@{}l@{}}Record which allows arbitrary \\ additional texts to be stored that are\\  related to the domain.\end{tabular}                                                                                                                                                                                \\ \bottomrule
\end{tabular}
\end{table}
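In a program, the numeric values of Table~\ref{tab:resource_record_types} map naturally onto an enumeration. The following sketch additionally models a small zone as a list of (name, type, data) tuples and shows how a lookup follows a CNAME alias to the canonical name. The zone content is made up and uses the documentation address ranges 192.0.2.0/24 and 2001:db8::/32; the helper \texttt{lookup} is hypothetical.

```python
from enum import IntEnum


class RRType(IntEnum):
    """Numeric type values as listed in the resource record table."""
    A = 1
    NS = 2
    CNAME = 5
    SOA = 6
    PTR = 12
    MX = 15
    TXT = 16
    AAAA = 28


# Illustrative zone content: (owner name, record type, record data)
ZONE = [
    ("example.com.", RRType.NS, "ns1.example.com."),
    ("example.com.", RRType.MX, "10 mail.example.com."),
    ("www.example.com.", RRType.CNAME, "web.example.com."),
    ("web.example.com.", RRType.A, "192.0.2.10"),
    ("web.example.com.", RRType.AAAA, "2001:db8::10"),
]


def lookup(name: str, rtype: RRType, zone=ZONE, max_aliases: int = 8) -> list[str]:
    """Return the data of all matching records, following CNAME aliases."""
    for _ in range(max_aliases + 1):
        matches = [data for owner, t, data in zone if owner == name and t == rtype]
        if matches:
            return matches
        aliases = [data for owner, t, data in zone
                   if owner == name and t == RRType.CNAME]
        if not aliases:
            return []
        name = aliases[0]  # continue with the canonical name
    return []  # give up on overly long (or looping) alias chains
```

A query for the A record of \texttt{www.example.com.} finds no direct match, follows the alias to \texttt{web.example.com.} and returns its address; the \texttt{max\_aliases} limit guards against alias loops in this sketch.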


\subsubsection{Payload}
\label{subsubsec:payload}

This section introduces the message format on which both \gls{dns} requests and responses are built. The format of the messages exchanged between a resolver and a \gls{dns} server was initially defined in RFC 1035 \fsCite{rfc1035} and subsequently extended with new opcodes, response codes, etc. This general format applies to both requests and responses and consists of five sections:

\begin{enumerate}
    \item Message Header
    \item Question Section
    \item Answer Section
    \item Authority Section
    \item Additional Section
\end{enumerate}

\paragraph{Message Header:}
\label{par:message_header}
The Message Header is mandatory for every message and must not be empty. It contains several flags and fields that control the transaction. The header specifies, e.g., whether the message is a query or a response, the kind of query (opcode), and how many entries each of the following sections contains.

\begin{table}[!htbp]
\centering
\caption{Message Header}
\label{tab:message_header}
\begin{tabular}{@{}cccccccccccccccc@{}}
\toprule
0  & 1     & 2    & 3    & 4    & 5  & 6  & 7  & 8  & 9    & 10   & 11   & 12   & 13   & 14   & 15   \\ \midrule
\multicolumn{16}{c}{Message ID}                                                                      \\
QR & \multicolumn{4}{c}{OPCODE} & AA & TC & RD & RA & Z & AD & CD & \multicolumn{4}{c}{RCODE} \\
\multicolumn{16}{c}{QDCOUNT}                                                                         \\
\multicolumn{16}{c}{ANCOUNT}                                                                         \\
\multicolumn{16}{c}{NSCOUNT}                                                                         \\
\multicolumn{16}{c}{ARCOUNT}                                                                         \\ \bottomrule
\end{tabular}
\end{table}

Table~\ref{tab:message_header} shows the layout of a \gls{dns} message header, where each row corresponds to 16 bits. The individual fields and flags are explained in the following list:

\begin{itemize}
    \item \textbf{Message ID:} A 16 bit identifier assigned by the requester (any kind of software that generates a query) and copied unchanged into the reply by the responder. It identifies the transaction and enables the requester to match replies to outstanding requests.

    \item \textbf{QR:} Query/Response Flag – a one bit field indicating whether this message is a query (0) or a response (1).

    \item \textbf{OPCODE:} A four bit field that specifies the kind of query in this message. It is set by the requester and copied into the response. The possible values of the opcode field are listed in Table~\ref{tab:message_header_opcodes}.
    \begin{table}[!htbp]
    \centering
    \caption{Message Header Opcodes}
    \label{tab:message_header_opcodes}
    \begin{tabular}{@{}lll@{}}
    \toprule
    Opcode & Type       & Description                                                                                                                                                                                                                     \\ \midrule
    0      & QUERY      & Standard Query.                                                                                                                                                                                                                 \\ \midrule
    1      & IQUERY     & \begin{tabular}[c]{@{}l@{}}Inverse Query: Find domain name by IP address. \\ Deprecated with RFC 3425 in favor of the more widely \\ used in-addr.arpa reverse lookup.\end{tabular}                                             \\ \midrule
    2      & STATUS     & Request server status.                                                                                                                                                                                                          \\ \midrule
    3      & (reserved) & not in use                                                                                                                                                                                                                      \\ \midrule
    4      & NOTIFY     & \begin{tabular}[c]{@{}l@{}}Server to server message type added by RFC 1996. \\ Primary servers (master, authoritative) notify secondary \\ servers to initiate a zone transfer due to updated records \\ in the zone.\end{tabular} \\ \midrule
    5      & UPDATE     & \begin{tabular}[c]{@{}l@{}}Special message type to allow dynamic additions, updates \\ and removals of selected resource records. Basically \\ implements what is known as ``dynamic DNS''.\end{tabular}                          \\ \midrule
    6-15   & (reserved) & reserved for future use                                                                                                                                                                                                         \\ \bottomrule
    \end{tabular}
    \end{table}

    \item \textbf{AA:} Authoritative Answer – this flag is set to 1 by the responding server if it is an authority for the domain name in the question section. If set to 0 this usually means that a cached record is returned.

    \item \textbf{TC:} The Truncated bit is set to 1 if the response is larger than the permitted length of the transmission channel and has therefore been truncated. This usually indicates that \gls{dns} over \gls{udp} is used and the response payload exceeds the maximum of 512 bytes. The client may then either repeat the query over \gls{tcp} (which allows messages of up to 65,535 bytes) or ignore the truncation if the truncated data was part of the Additional section. Set on all truncated messages except for the last one.

    \item \textbf{RD:} Recursion Desired – this bit may be set in a query to ask the name server to pursue the query recursively and is copied into the response. If the name server does not offer recursion, e.g., because it has been configured as authoritative only, it indicates this by leaving the RA flag in the response unset. Recursive query support is optional.

    \item \textbf{RA:} The recursion available flag can be set in responses by the server to denote whether it is capable of processing recursive queries (1) or not (0).

    \item \textbf{Z:} One bit reserved for future extensions; it must be 0 in all queries and responses.

    \item \textbf{AD:} The authenticated data flag is used by \gls{dnssec} to indicate that the data returned has been verified by the providing server. Always 0 if \gls{dnssec} is not available on the server.

    \item \textbf{CD:} Checking Disabled – also used by \gls{dnssec} and may be set in a request to indicate that non-verified data is acceptable to the requester. If \gls{dnssec} is not available in the resolver, this is always set to 0.

    \item \textbf{RCODE:} Response Code – set in responses only, these four bits indicate whether an error occurred while processing the query. The available codes are listed in Table~\ref{tab:message_header_response_codes}. Codes 0 to 5 were defined initially, whereas codes 6 to 10 were introduced for dynamic \gls{dns} updates in RFC 2136 \fsCite{rfc2136}.

    \begin{table}[!htbp]
    \centering
    \caption{Message Header Response Codes}
    \label{tab:message_header_response_codes}
    \begin{tabular}{@{}lll@{}}
    \toprule
    RCode & Type            & Description                                                                                                                                       \\ \midrule
    0     & No Error        & Request was successfully processed.                                                                                                               \\ \midrule
    1     & Format Error    & \begin{tabular}[c]{@{}l@{}}The server was unable to respond due to the format \\ of the request.\end{tabular}                                     \\ \midrule
    2     & Server Failure  & \begin{tabular}[c]{@{}l@{}}The server was unable to respond due to an internal \\ server error.\end{tabular}                                      \\ \midrule
    3     & Name Error      & \begin{tabular}[c]{@{}l@{}}The queried domain name does not exist \\ (commonly referred to as NXDOMAIN).\end{tabular}                           \\ \midrule
    4     & Not Implemented & \begin{tabular}[c]{@{}l@{}}The name server does not support the requested kind \\ of query.\end{tabular}                                          \\ \midrule
    5     & Refused         & \begin{tabular}[c]{@{}l@{}}The server refused to answer the request, usually for \\ policy reasons. E.g. unauthorized zone transfer.\end{tabular} \\ \midrule
    6     & YX Domain       & Domain name exists when it should not.                                                                                                            \\ \midrule
    7     & YX RR Set       & A resource record set exists that should not.                                                                                                     \\ \midrule
    8     & NX RR Set       & A resource record set that should exist does not.                                                                                                 \\ \midrule
    9     & Not Auth        & The server is not authoritative for the requested zone.                                                                                           \\ \midrule
    10    & Not Zone        & \begin{tabular}[c]{@{}l@{}}A name specified in the request is not contained \\ within the zone declared in the message.\end{tabular}              \\ \bottomrule
    \end{tabular}
    \end{table}
\end{itemize}
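Since the header has a fixed size of 12 bytes (six 16 bit fields in network byte order, as defined in RFC 1035), it can be packed and parsed with a few bit operations. The following sketch uses Python's \texttt{struct} module and assumes the bit numbering of Table~\ref{tab:message_header}, where bit 0 is the most significant bit of the flags field; the class \texttt{Header} and its field names are illustrative.

```python
import struct
from dataclasses import dataclass

HEADER_FORMAT = "!6H"  # six unsigned 16-bit integers, big-endian (network order)


@dataclass
class Header:
    msg_id: int
    qr: int = 0      # 0 = query, 1 = response
    opcode: int = 0  # 0 = QUERY, see the opcode table
    aa: int = 0
    tc: int = 0
    rd: int = 0
    ra: int = 0
    z: int = 0       # reserved, must be 0
    ad: int = 0
    cd: int = 0
    rcode: int = 0   # 0 = No Error, see the response code table
    qdcount: int = 0
    ancount: int = 0
    nscount: int = 0
    arcount: int = 0

    def pack(self) -> bytes:
        # Bit 0 of the table (QR) is bit 15 of the 16-bit flags word.
        flags = (self.qr << 15 | self.opcode << 11 | self.aa << 10
                 | self.tc << 9 | self.rd << 8 | self.ra << 7 | self.z << 6
                 | self.ad << 5 | self.cd << 4 | self.rcode)
        return struct.pack(HEADER_FORMAT, self.msg_id, flags, self.qdcount,
                           self.ancount, self.nscount, self.arcount)

    @classmethod
    def unpack(cls, data: bytes) -> "Header":
        msg_id, flags, qd, an, ns, ar = struct.unpack(HEADER_FORMAT, data[:12])
        return cls(msg_id=msg_id, qr=flags >> 15, opcode=(flags >> 11) & 0xF,
                   aa=(flags >> 10) & 1, tc=(flags >> 9) & 1,
                   rd=(flags >> 8) & 1, ra=(flags >> 7) & 1,
                   z=(flags >> 6) & 1, ad=(flags >> 5) & 1,
                   cd=(flags >> 4) & 1, rcode=flags & 0xF,
                   qdcount=qd, ancount=an, nscount=ns, arcount=ar)
```

For instance, \texttt{Header(msg\_id=0x1234, rd=1, qdcount=1).pack()} yields the twelve bytes \texttt{12 34 01 00 00 01 00 00 00 00 00 00}: a standard query with the RD flag set and one entry in the question section.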


\paragraph{Question Section:}
\label{par:question_section}

\begin{table}[!htbp]
\centering
\caption{Question Section}
\label{tab_question_section}
\begin{tabular}{@{}cccccccc@{}}
\toprule
0    & 4    & 8    & 12    & 16   & 20      & 24     & 28     \\ \midrule
\multicolumn{8}{c}{Question Name}                                      \\
\multicolumn{4}{c}{Question Type} & \multicolumn{4}{c}{Question Class} \\ \bottomrule
\end{tabular}
\end{table}


\begin{itemize}
    \item \textbf{Question Name:} Contains the variable-length domain name, zone name or general object that is the subject of the query, encoded in the standard \gls{dns} name notation. Its content depends on the Question Type: a query for an A record typically requires a host part, such as \texttt{www.domain.tld}, whereas an MX query usually only contains the base domain name (\texttt{domain.tld}).

    \item \textbf{Question Type:} Specifies the type of question being asked. This field contains the numeric code of the requested resource record type; see Table~\ref{tab:resource_record_types} for common types.

    \item \textbf{Question Class:} The class of the requested resource records (an unsigned 16 bit value), which is usually IN (Internet). Question classes are assigned by the IANA, which maintains a list of all registered classes (\fsCite{IANADNSClassesOnline}).
\end{itemize}
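The encoding of the Question Section can be sketched in the same way. In the standard \gls{dns} name notation of RFC 1035, each label is preceded by a length byte and the name is terminated by a zero byte representing the root; Question Type and Question Class follow as 16 bit values in network byte order. The sketch below assumes ASCII labels, does not use message compression, and its function names are illustrative.

```python
import struct

CLASS_IN = 1  # Internet class


def encode_name(name: str) -> bytes:
    """Encode a domain name as length-prefixed labels ending with the root."""
    name = name.rstrip(".")
    out = bytearray()
    if name:
        for label in name.split("."):
            raw = label.encode("ascii")
            if not 1 <= len(raw) <= 63:  # empty or overlong labels are invalid
                raise ValueError(f"invalid label length: {label!r}")
            out.append(len(raw))
            out += raw
    out.append(0)  # zero-length label: the root
    if len(out) > 255:
        raise ValueError("encoded name exceeds 255 bytes")
    return bytes(out)


def encode_question(name: str, qtype: int, qclass: int = CLASS_IN) -> bytes:
    """Question Name followed by 16-bit Question Type and Question Class."""
    return encode_name(name) + struct.pack("!HH", qtype, qclass)
```

Encoding an A query for \texttt{www.domain.tld} yields the bytes \texttt{03 77 77 77 06 64 6f 6d 61 69 6e 03 74 6c 64 00} for the name, followed by \texttt{00 01} (type A) and \texttt{00 01} (class IN).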

Further parameters can be specified when requesting a resource, but they are of no further relevance here.


\subsection{Domain Names}
\label{subsec:domain_names}

The structure of domain names is generally governed by the responsible registry, e.g., DENIC eG (\fsCite{DENICOnline}) for .de domains. This includes, for example, which characters are allowed in second-level domains and the overall registration process. In the .de space, a second-level domain must contain between one and 63 characters; besides the letters of the Latin alphabet, digits and the hyphen, all 93 additional characters permitted for internationalized domain names can be used. Additionally, the first, third, fourth and last characters must not be a hyphen. Since many registries apply similar but not identical rules, it is hard to distinguish valid from invalid domain names in general.
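The rules summarized above can be expressed as a simple check. The sketch below encodes them exactly as stated in this section and only covers the ASCII subset: the 93 additional IDN characters are deliberately left out, and \texttt{is\_valid\_de\_label} is a hypothetical helper, not DENIC's own validation.

```python
import string

# ASCII subset only; the 93 additional IDN characters are omitted in this sketch.
ALLOWED = frozenset(string.ascii_lowercase + string.digits + "-")


def is_valid_de_label(label: str) -> bool:
    """Check a .de second-level label against the rules stated in the text."""
    label = label.lower()
    if not 1 <= len(label) <= 63:
        return False
    if any(ch not in ALLOWED for ch in label):
        return False
    # No hyphen at the first, third, fourth or last position (0-based: 0, 2, 3, len-1).
    forbidden = {0, 2, 3, len(label) - 1}
    return not any(pos < len(label) and label[pos] == "-" for pos in forbidden)
```

Even this small example shows why a generic validity check across registries is difficult: a label such as \texttt{ab-cd} is rejected here because of the hyphen in the third position, whereas other registries may accept it.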


\subsection{Resolution}
\label{subsec:resolution}

Figure~\ref{fig:address_resolution} outlines how a domain name is resolved from the perspective of a requesting machine. Each step assumes that the name has not been requested before and is therefore not available in any cache. In the first step, the \textit{Operating System} contacts the local resolver, e.g., a router in a private network or a dedicated resolver server in a larger company. As the \textit{DNS Resolver} knows nothing about the domain, it asks the \textit{Root NS} for the address of the responsible top-level domain server (\textit{TLD NS} for .com in this example). The resolver then asks the \textit{TLD NS} for the address of the name server in charge of the requested second-level zone (e.g., google.com). Finally, the resolver queries the \textit{Google NS} for the IP address of the \textit{Google Webserver} and returns it to the \textit{Operating System}, which can then establish a connection to the \textit{Google Webserver}.

Two different types of \gls{dns} requests are involved in this process. The \textit{Operating System} sends a recursive request to the \textit{DNS Resolver}, which in turn sends iterative requests to the name servers along the hierarchy. Most publicly reachable name servers do not accept recursive queries from arbitrary clients, as open recursive servers pose a security risk, e.g., through their abuse in denial-of-service attacks.
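The interplay of the recursive request and the iterative requests can be simulated without any network access. In the following sketch, each name server is modelled as a dictionary that contains either the final answer or a referral to the server responsible for a more specific zone. All server names and the address 203.0.113.5 (a documentation address) are made up and do not reflect the real delegation of google.com.

```python
from typing import Optional

# Hypothetical delegation data mirroring the figure: root -> .com -> google.com
SERVERS = {
    "root-ns": {"referrals": {"com.": "tld-ns-com"}, "answers": {}},
    "tld-ns-com": {"referrals": {"google.com.": "google-ns"}, "answers": {}},
    "google-ns": {"referrals": {},
                  "answers": {"www.google.com.": "203.0.113.5"}},
}


def iterative_query(server: str, name: str) -> tuple[str, Optional[str]]:
    """Ask one server; it answers, refers to another server, or gives up."""
    data = SERVERS[server]
    if name in data["answers"]:
        return "answer", data["answers"][name]
    for zone, next_server in data["referrals"].items():
        if name == zone or name.endswith("." + zone):
            return "referral", next_server
    return "nxdomain", None


def resolve(name: str, trace: Optional[list[str]] = None,
            max_steps: int = 16) -> Optional[str]:
    """What the DNS Resolver does for a recursive request from the client."""
    server = "root-ns"  # without cached data, start at the root
    for _ in range(max_steps):
        if trace is not None:
            trace.append(server)
        kind, value = iterative_query(server, name)
        if kind != "referral":
            return value  # final answer, or None if the name does not exist
        server = value  # follow the referral one level down the tree
    return None
```

The caller of \texttt{resolve} issues a single request and receives the final answer, whereas the resolver itself contacts three servers in turn, each of which only answers with what it knows.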


\begin{figure}[!htbp]
\centering
\includegraphics[scale=.5, clip=true]{content/Technical_Background/DNS/DNS_address-resolution.pdf}
\caption{Address Resolution}
\label{fig:address_resolution}
\end{figure}

\subsection{Passive DNS}
\label{subsec:passive_dns}

A passive DNS database contains a history of the DNS resolutions observed in a network. The traffic can be captured at any suitable location in a network, e.g., at a resolver. Such a database can be used in a variety of ways to protect a network against different threats. Projects like the Security Information Exchange (SIE) collect passive DNS data from multiple sources and analyse it to find, e.g., inconsistencies in the resolutions (\fsCite{SIEOnline}). Passive DNS databases can also be used by researchers or service providers to find performance issues, identify anomalies or generate usage statistics \fsCite{Deri:2012:TPD:2245276.2245396}.
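A common way to organize such a database is to aggregate the observed resolutions instead of storing every single packet. The following sketch stores each distinct combination of name, record type and value once, together with first-seen and last-seen timestamps and a counter; this schema is an assumption for illustration and not the format used by SIE or any particular product.

```python
from dataclasses import dataclass


@dataclass
class Observation:
    first_seen: int  # timestamp of the first observed resolution
    last_seen: int   # timestamp of the most recent observed resolution
    count: int = 1   # how often the resolution has been observed


class PassiveDNS:
    """Aggregates observed resolutions keyed by (name, record type, value)."""

    def __init__(self) -> None:
        self.db: dict[tuple[str, str, str], Observation] = {}

    def observe(self, name: str, rtype: str, value: str, ts: int) -> None:
        """Record one resolution seen in the traffic, e.g. at a resolver."""
        key = (name.lower().rstrip("."), rtype, value)
        obs = self.db.get(key)
        if obs is None:
            self.db[key] = Observation(first_seen=ts, last_seen=ts)
        else:
            obs.first_seen = min(obs.first_seen, ts)
            obs.last_seen = max(obs.last_seen, ts)
            obs.count += 1

    def history(self, name: str, rtype: str = "A") -> list[tuple[str, Observation]]:
        """All values a name has resolved to, ordered by first appearance."""
        name = name.lower().rstrip(".")
        rows = [(value, obs) for (n, t, value), obs in self.db.items()
                if n == name and t == rtype]
        return sorted(rows, key=lambda row: row[1].first_seen)

    def names_for(self, value: str) -> list[str]:
        """Reverse direction: which names have pointed to the given value?"""
        return sorted({n for (n, _, v) in self.db if v == value})
```

The reverse direction offered by \texttt{names\_for} illustrates the kind of query that is useful for security analyses: it reveals which other domain names have resolved to the same address.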