rush hour

2018-01-29 22:52:15 +01:00
parent 817b68b025
commit ece9b4afcf
14 changed files with 284 additions and 110 deletions
--- a/Thesis/content/Technical_Background/Benchmarks/Benchmarks.tex
+++ b/Thesis/content/Technical_Background/Benchmarks/Benchmarks.tex
@@ -1,7 +1,7 @@
 \section{Benchmarks}
 \label{sec:benchmarks}

-To get a better understanding of performance related challenges, some benchmarks are performed and described in this section. All benchmarks are performed on the same machine with 16 GB of DD3 RAM with a clock speed of 1600 MT/s in dual channel, an Intel i7-3520M CPU @ 2900 MHz and a Samsung SSD 850 EVO with 250 GB (where not otherwise specified). Linux 4.13.12-1 has been used and Python scripts are executed with Python interpreter in version 3.6.3. For consistency, no other software is running at the time of the benchmark execution (e.g. a desktop environment or heavy background processes) \todo{list of what is running}. All benchmark are run ten times and outliers that show a run time of 10\% above the statistical median are ignored. Although considering the mentioned actions, it is not safe to assume completely equal initial situations at the time of execution on non real-time operating systems (like the one used). So these figures have to be treated with care and should only give a fundamental understanding of how long tasks are about to run.
+To get a better understanding of performance-related challenges, some benchmarks are performed and described in this section. All benchmarks are performed on the same machine with 16 GB of DDR3 RAM with a clock speed of 1600 MT/s in dual channel, an Intel i7-3520M CPU @ 2900 MHz and a Samsung SSD 850 EVO with 250 GB (where not otherwise specified). Linux 4.13.12-1 has been used and Python scripts are executed with the Python interpreter in version 3.6.3. For consistency, no other software is running at the time of the benchmark execution (e.g. a desktop environment or heavy background processes). All benchmarks are run ten times and outliers that show a run time of more than 10\% above the statistical median are ignored. Despite these precautions, it is not safe to assume completely equal initial conditions at the time of execution on non-real-time operating systems (like the one used). These figures therefore have to be treated with care and should only give a basic understanding of how long tasks take to run.

 \begin{lstlisting}[language={bash}, caption={Benchmark: Load and iterate one day of compressed pdns logs}, label={lst:load_and_iterate_one_day_of_compressed_pdns_logs}]
 start_z = time.time()
@@ -19,4 +19,4 @@ print('iterating day took: ' + str(time.time() - start_z) + ' s')

 cleaned results: [155.0667760372162, 148.00951623916626, 147.8429672718048, 147.2554485797882, 147.1039183139801, 147.26967453956604, 147.13052105903625, 147.33162689208984, 147.20316672325134, 147.29751586914062]
 average: 148.15111315250397 seconds
-\end{lstlisting}
+\end{lstlisting}
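
The outlier rule described in the benchmark setup (ignore runs more than 10\% above the median, then average the remaining runs) could be sketched as follows. This is only an illustration: the function name `clean_and_average` and the `tolerance` parameter are assumptions and are not taken from the thesis scripts; the sample values are the cleaned results listed in the listing above.

```python
from statistics import mean, median


def clean_and_average(run_times, tolerance=0.10):
    """Drop runs more than `tolerance` above the median, then average the rest.

    Hypothetical helper; the name and signature are illustrative assumptions.
    """
    limit = median(run_times) * (1.0 + tolerance)
    cleaned = [t for t in run_times if t <= limit]
    return cleaned, mean(cleaned)


# Run times in seconds, copied from the "cleaned results" line of the listing.
runs = [
    155.0667760372162, 148.00951623916626, 147.8429672718048,
    147.2554485797882, 147.1039183139801, 147.26967453956604,
    147.13052105903625, 147.33162689208984, 147.20316672325134,
    147.29751586914062,
]

cleaned, average = clean_and_average(runs)
print(f"kept {len(cleaned)} of {len(runs)} runs, average {average:.2f} s")
```

Using the median rather than the mean as the reference keeps a single very slow run from shifting the threshold it is compared against.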
--- a/Thesis/content/Technical_Background/DNS/DNS.tex
+++ b/Thesis/content/Technical_Background/DNS/DNS.tex
@@ -1,9 +1,9 @@
 \section{Domain Name System}
 \label{sec:DNS}

-The \gls{dns} is one of the cornerstone of the internet as it is known today. \todo{statistic about usage}. Initial designs have been proposed in 1983 and evolved over the following four years into the first globally adapted standard RFC 1034 \fsCite{rfc1034} (RFC 1035 for implementation and specification details \fsCite{rfc1035}). The main idea of the \gls{dns} is translating human readable domain names to network addresses. There are many extensions to the initial design including many security related features and enhancements or the support for \gls{ipv6} in 1995. 
+The \gls{dns} is one of the cornerstones of the internet as it is known today. Nearly every device connected to the internet uses DNS. Initial designs were proposed in 1983 and evolved over the following four years into the first globally adopted standard RFC 1034 \fsCite{rfc1034} (see also RFC 1035 for implementation and specification details \fsCite{rfc1035}). The main idea of the \gls{dns} is translating human-readable domain names to network addresses. There are many extensions to the initial design, including many security-related features and enhancements or the support for \gls{ipv6} in 1995.

-In order to understand how the \gls{dns} is misused for hostile activities and how to prevent these attacks, it is necessary to explain some basic mechanisms.
+In order to understand how the \gls{dns} is misused for malicious activities and how to prevent these attacks, it is necessary to explain some basic mechanisms.


 \subsection{Basics}
@@ -35,15 +35,16 @@ The \gls{dns} primarily builds on two types of components: name servers and reso
 \subsubsection{Name space}
 \label{subsubsec:name_space}

-The \gls{dns} is based on a naming system that consists of a hierarchical and logical tree structure and is called the domain namespace. It contains a single root node and an arbitrary amount of nodes in subordinate levels in variable depths. Each node is uniquely identifiable through a \gls{fqdn} and usually represents a domain, machine or service in the network. Furthermore, every domain can be subdivided into more fine-grained domains. These can again be specific machines or domains, called subdomains. This subdividing is an important concept for the internet to continue to grow and each responsible instance of a domain (e.g. a company or cooperative) is responsible for the maintenance and subdivision of the domain. 
+The \gls{dns} is based on a naming system that consists of a hierarchical and logical tree structure and is called the domain namespace. It contains a single root node and an arbitrary number of nodes in subordinate levels of variable depth (in descending order called \textit{top-level domain} or \textit{TLD}, second-level domain, third-level domain, and so forth). Each node is uniquely identifiable through a \gls{fqdn} and usually represents a domain, machine or service in the network. Furthermore, every domain can be subdivided into more fine-grained domains. These can again be specific machines or domains, called subdomains. This subdivision is an important concept that allows the internet to continue to grow, and the entity in charge of a domain (e.g. a company or cooperative) is responsible for its maintenance and subdivision.


 \subsubsection{\gls{dns} Resource Records}
 \label{subsubsec:dns_resource_records}

-\todo{TODO}
+See Table~\ref{tab:resource_record_types} for a list of built-in resource record types in the DNS. These built-in resource records serve different purposes and are used more or less frequently.

-\begin{table}[]
+
+\begin{table}[!htbp]
 \centering
 \caption{Resource Record Types}
 \label{tab:resource_record_types}
@@ -79,7 +80,7 @@ In this section we will introduce the actual payload a \gls{dns} request as well
 \label{par:message_header}with
 The Message Header is obligatory for all types of communication and may not be empty. It contains different types of flags that are used to control the transaction. The header specifies e.g. which further sections are present, whether the message is a query or a response and more specific opcodes.

-\begin{table}[h!]
+\begin{table}[!htbp]
 \centering
 \caption{Message Header}
 \label{tab:message_header}
@@ -103,7 +104,7 @@ Table~\ref{tab:message_header} shows the template of a \gls{dns} message header.
    \item \textbf{QR:} Query/Response Flag – one bit field whether this message is a query(0) or a response(1)
    
    \item \textbf{OPCODE:} Four bit field that specifies the kind of query for this message. This is set by the requester and copied into the response. Possible values for the opcode field can be found in Table~\ref{tab:message_header_opcodes}
-    \begin{table}[h!]
+    \begin{table}[!htbp]
    \centering
    \caption{Message Header Opcodes}
    \label{tab:message_header_opcodes}
@@ -136,7 +137,7 @@ Table~\ref{tab:message_header} shows the template of a \gls{dns} message header.
    
    \item \textbf{RCODE:} Response Code – only available in response messages, these four bits are used to reveal errors while processing the query. Available error codes are listed in Table~\ref{tab:message_header_response_codes}. Error codes 0 to 5 have been initially available whereas error codes 6 to 10 are used for dynamic \gls{dns} defined in RFC 2136 \fsCite{rfc2136}.
    
-    \begin{table}[h!]
+    \begin{table}[!htbp]
    \centering
    \caption{Message Header Response Codes}
    \label{tab:message_header_response_codes}
@@ -156,24 +157,13 @@ Table~\ref{tab:message_header} shows the template of a \gls{dns} message header.
    10    & Not Zone        & \begin{tabular}[c]{@{}l@{}}A name specified in the request is not contained \\ within the zone declared in the message.\end{tabular}              \\ \bottomrule
    \end{tabular}
    \end{table}
-    
-    \todo{do something with this}
-    There are more response codes available that could be added (due to size restrictions) after \gls{edns} has been introduced.
-    
-    \item \textbf{QDCOUNT:} Unsigned 16 bit integer specifying the number of entries in the Question Section.
-    
-    \item \textbf{ANCOUNT:} Unsigned 16 bit integer specifying the number of resource records in the answer section.
-    
-    \item \textbf{NSCOUNT:} Unsigned 16 bit integer specifying the number of name server resource records in the authority records section.
-    
-    \item \textbf{ARCOUNT:} Unsigned 16 bit integer specifying the number of resource records in the additional records section.
 \end{itemize}
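
As a minimal sketch of how the header fields described above could be decoded, the following Python snippet unpacks the fixed 12-byte header. It assumes the layout from RFC 1035 (a 16-bit ID, a 16-bit flags word, and the four 16-bit counters); the function name `parse_dns_header` and the example bytes are illustrative assumptions, not part of the thesis code.

```python
import struct


def parse_dns_header(message: bytes) -> dict:
    """Decode the fixed 12-byte DNS message header (layout per RFC 1035).

    Hypothetical helper for illustration only.
    """
    if len(message) < 12:
        raise ValueError("a DNS message header is 12 bytes long")
    # Six unsigned 16-bit integers in network byte order.
    ident, flags, qdcount, ancount, nscount, arcount = struct.unpack(
        "!6H", message[:12]
    )
    return {
        "id": ident,
        "qr": (flags >> 15) & 0x1,      # 0 = query, 1 = response
        "opcode": (flags >> 11) & 0xF,  # kind of query
        "aa": (flags >> 10) & 0x1,
        "tc": (flags >> 9) & 0x1,
        "rd": (flags >> 8) & 0x1,
        "ra": (flags >> 7) & 0x1,
        "rcode": flags & 0xF,           # response code
        "qdcount": qdcount,
        "ancount": ancount,
        "nscount": nscount,
        "arcount": arcount,
    }


# Made-up example: a response (QR=1) to a standard query with RCODE 3.
example = struct.pack("!6H", 0x1234, 0x8183, 1, 0, 0, 0)
header = parse_dns_header(example)
print(header["qr"], header["opcode"], header["rcode"], header["qdcount"])
```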


 \paragraph{Question Section:}
 \label{par:question_section}

-\begin{table}[]
+\begin{table}[!htbp]
 \centering
 \caption{Question Section}
 \label{tab_question_section}
@@ -187,43 +177,28 @@ Table~\ref{tab:message_header} shows the template of a \gls{dns} message header.


 \begin{itemize}
-    \item \textbf{Question Name:} Contains a variably sized payload payload including the domain, zone name or general object that is subject of the query. Encoded using standard \gls{dns} name notation. Depending on the Question Type, for example requesting an A Record will typically require an host part, such as www.domain.tld. A MX query will usually only contain a base domain name (domain.tld).
-    \todo{\url{http://www.tcpipguide.com/free/t_DNSNameNotationandMessageCompressionTechnique.htm}}
+    \item \textbf{Question Name:} Contains a variably sized payload including the domain, zone name or general object that is the subject of the query. It is encoded using standard \gls{dns} name notation. The content depends on the Question Type: for example, requesting an A record typically requires a host part, such as www.domain.tld, whereas an MX query usually contains only a base domain name (domain.tld).
    
-    \item \textbf{Question Type:} Specifies the type of question being asked. This field may contain a code number corresponding to a particular type of resource being requested, see Table~\ref{tab:resource_record_types} for common resource types. TODO continue here (special values)
+    \item \textbf{Question Type:} Specifies the type of question being asked. This field may contain a code number corresponding to a particular type of resource being requested, see Table~\ref{tab:resource_record_types} for common resource types.
    
-    \item \textbf{Question Class} \todo{TODO}
+    \item \textbf{Question Class:} The class of the resource records that are being requested (unsigned 16 bit value). This is usually Internet (IN). Question classes are assigned by the IANA, which publishes the complete list \fsCite{IANADNSClassesOnline}.
 \end{itemize}

-\todo{all tables h!}
+Further parameters can be specified when requesting a resource, but they are not relevant here.

-\begin{table}[h!]
-\centering
-\caption{Question Section Format}
-\label{tab:question_section_format}
-\begin{tabular}{@{}lll@{}}
-\toprule
-QType & Type  & Description                                                  \\ \midrule
-251   & IXFR  & Request for a incremental Zone transfer (RFC 1995 \fsCite{rfc1995}) \\
-252   & AXFR  & Request for a Zone Transfer                                  \\
-253   & MAILB & Request for mailbox like resources (obsolete now)            \\
-254   & MAILA & Request for mail agent (obsolete, MX records used instead)   \\
-255   & *     & Request for all records                                      \\ \bottomrule
-\end{tabular}
-\end{table}

 \subsection{Domain Names}
 \label{subsec:domain_names}
-\todo{TODO structure of a domain, etc. top-level, second-level, third-level}
+
+The structure of domain names is generally managed by the corresponding registrar, e.g. the DENIC e.G. (\fsCite{DENICOnline}) for .de domains. This includes, for example, which characters are allowed in second-level domains and the overall registration process. In the .de space, the second-level domain must contain between one and 63 characters; all characters of the Latin alphabet can be used in addition to digits, the hyphen and the 93 additional characters permitted for internationalized domain names. In addition, the first, third, fourth and last characters must not be hyphens. Many registrars use rules similar to this example, which makes it hard to easily distinguish valid from invalid domain names.


 \subsection{Resolution}
 \label{subsec:resolution}

-\subsubsection{Recursive}
-\label{TODO subsubsec:recursive}
+Figure~\ref{fig:address_resolution} briefly illustrates how domain names are resolved from the perspective of a requesting machine. Each step here assumes that the request has not been performed before and is thus not available in any cache. In the first step, the \textit{Operating System} contacts the local resolver, e.g. a router in a private network or a dedicated resolver in a larger company. As the \textit{DNS Resolver} knows nothing about the domain, it asks the \textit{Root NS} for the address of the responsible top-level domain server (\textit{TLD NS} for .com in this example). The resolver then asks the \textit{TLD NS} server for the address of the second-level domain server that is in charge of the requested zone (e.g. google.com). Finally, the resolver queries the \textit{Google NS} server for the IP address of the \textit{Google Webserver} and sends it back to the \textit{Operating System}, which can then establish a connection to the \textit{Google Webserver}.

-\todo{explain delegation (e.g. of TLDs) somewhere here}
+Two main types of DNS requests are performed here. The \textit{Operating System} sends a recursive request to the \textit{DNS Resolver}, which in turn successively sends iterative requests to the higher-level DNS servers. Most public servers do not allow recursive queries due to security risks (denial of service attacks).


 \begin{figure}[!htbp]
@@ -232,10 +207,8 @@ QType & Type  & Description                                                  \\
 \caption{Address Resolution}
 \label{fig:address_resolution}
 \end{figure}
-\todo{not referenced atm}
-

 \subsection{Passive DNS}
 \label{subsec:passive_dns}

-
+A Passive DNS database is a database that contains a history of all resolved DNS queries in a network. The traffic can be observed at any appropriate location in a network, e.g. on a resolver. A Passive DNS database can be used in various ways to harden a network against different threats. Projects like the Security Information Exchange (SIE) collect passive DNS data from multiple sources and analyse the databases to find e.g. inconsistencies in the resolutions (\fsCite{SIEOnline}). Passive DNS databases can also be used by researchers or service providers to find performance issues, identify anomalies or generate usage statistics \fsCite{Deri:2012:TPD:2245276.2245396}.
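
To make the idea of a resolution history more concrete, the following is a minimal in-memory sketch of a passive DNS store. It is not the format used by SIE or by the thesis; the class `PassiveDNSStore`, the `Observation` record with first-seen, last-seen and count fields, and the example addresses are all assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Observation:
    """Aggregated sightings of one (name, type, data) resolution."""
    first_seen: float
    last_seen: float
    count: int = 1


class PassiveDNSStore:
    """Hypothetical in-memory history of observed DNS resolutions."""

    def __init__(self):
        self._records = {}

    def observe(self, qname, rtype, rdata, timestamp):
        # Normalise case and the trailing root dot so equal names share a key.
        key = (qname.lower().rstrip("."), rtype.upper(), rdata)
        entry = self._records.get(key)
        if entry is None:
            self._records[key] = Observation(timestamp, timestamp)
        else:
            entry.first_seen = min(entry.first_seen, timestamp)
            entry.last_seen = max(entry.last_seen, timestamp)
            entry.count += 1

    def history(self, qname):
        """Return all resolutions ever observed for `qname`."""
        name = qname.lower().rstrip(".")
        return {
            (rtype, rdata): obs
            for (n, rtype, rdata), obs in self._records.items()
            if n == name
        }


store = PassiveDNSStore()
store.observe("example.com.", "A", "192.0.2.1", 1000.0)
store.observe("example.com", "a", "192.0.2.1", 2000.0)
store.observe("example.com", "A", "198.51.100.7", 3000.0)

for (rtype, rdata), obs in store.history("example.com").items():
    print(rtype, rdata, obs.first_seen, obs.last_seen, obs.count)
```

A name that suddenly resolves to a new address, as in the third observation, is the kind of inconsistency such a history makes visible.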