partially finished exposure; started kopis

2018-01-16 18:30:45 +01:00
parent 0f10314bc8
commit 3586762494
12 changed files with 224 additions and 18 deletions
--- a/Thesis/content/Evaluation_of_existing_Systems/Kopis/Kopis.tex
+++ b/Thesis/content/Evaluation_of_existing_Systems/Kopis/Kopis.tex
@@ -0,0 +1,74 @@
+\section{Kopis}
+\label{sec:kopis}
+
+\subsection{General}
+\label{subsec:kopis_general}
+
+The last evaluated system is called \textit{Kopis} and was proposed in 2011 by \fsAuthor{Antonakakis:2011:DMD:2028067.2028094}, the authors who also published \nameref{sec:notos}, at the Georgia Institute of Technology and the University of Georgia. \textit{Kopis} follows a slightly different approach than the previous two systems, \textit{Notos} and \textit{Exposure}. Instead of passively monitoring DNS traffic at a (limited) number of recursive DNS servers in various locations, \textit{Kopis} uses requests observed in the upper DNS layers, e.g. at top-level domain servers and authoritative name servers. See Figure~\ref{fig:kopis_data_sources} for an overview of where the three systems collect the logs used for traffic analysis. Operating in the upper DNS layers, \textit{Kopis} is not only able to extract significantly different classes of features compared to \textit{Notos} and \textit{Exposure} but also has to deal with different challenges such as DNS caching. The biggest 
+
+\begin{figure}[!htbp]
+    \centering
+    \includegraphics[width=.9\textwidth, clip=true]{content/Evaluation_of_existing_Systems/Kopis/kopis_data_sources.png}
+    \caption{Overview of the levels at which Kopis, Notos, and Exposure perform DNS monitoring. \fsCite[Figure 1]{Antonakakis:2011:DMD:2028067.2028094}}
+    \label{fig:kopis_data_sources}
+\end{figure}
+
+
+\subsection{Architecture}
+\label{subsec:kopis_architecture}
+
+\begin{figure}[!htbp]
+    \centering
+    \includegraphics[width=.9\textwidth, clip=true]{content/Evaluation_of_existing_Systems/Kopis/kopis_system_overview.png}
+    \caption{Kopis: system overview \fsCite[Figure 3]{Antonakakis:2011:DMD:2028067.2028094}}
+    \label{fig:kopis_system_overview}
+\end{figure}
+
+The overall system architecture is shown in Figure~\ref{fig:kopis_system_overview}. The first step in the reputation system is to gather all (streamed) DNS queries and responses and to divide this traffic into fixed epochs (e.g. one day in \textit{Kopis}). After collecting the traffic of each epoch \(E_i\), the \textit{Feature Computation} function extracts different statistics about a domain \textit{d} into a feature vector \(v_d^i\). The features used are described in Section~\ref{subsec:kopis_features}. \textit{Kopis} tries to separate benign from malicious domains by characteristics such as the volume of DNS requests to domain \textit{d}, the diversity of the IP addresses of the querying machines, and historical information about the IP space that \textit{d} points to. Like the first two investigated systems, \textit{Kopis} operates in two different modes. In training mode, the reputation model is built offline (\textit{Learning Module}); in operational mode, this model is used by the \textit{Statistical Classifier} to assign \textit{d} a reputation score in a streamed fashion. The \textit{Learning Module} takes as input the feature vectors of a period of \textit{m} days that are generated by the \textit{Feature Computation} function and uses the \textit{Knowledge Base (KB)} to label each sample in that training set as either a malicious or a legitimate domain (training set: \(V_{train} = \{v_d^i\}_{i=1,\dots,m}, \forall d \in \textit{KB}\)). The \textit{KB} consists of various public and undisclosed sources: \\
+
+\textbf{Malicious domain sources: } 
+\begin{itemize}
+    \item Information about malware from a commercial feed with a volume between 400\,MB and 2\,GB a day
+    \item Malware captured from two corporate networks
+    \item Public blacklists, e.g., malwaredomains.com \fsCite{malwaredomainsInformationOnline} and the Zeus Block List \fsCite{zeusblocklistInformationOnline} \\
+\end{itemize}
+
+
+\textbf{Benign domain sources: }
+\begin{itemize}
+    \item Domain and IP whitelists from DNSWL \fsCite{DNSWLOnline}
+    \item Address space of the top 30 Alexa domains \fsCite{AlexaWebInformationOnline}
+    \item Dihe's IP-Index Browser \fsCite{DIHEOnline}
+\end{itemize}
+
+The operational mode first captures all DNS traffic streams. At the end of each epoch \(E_j\), the feature vector \(v_{d'}^j\) is extracted for every unknown domain \(d' \notin \textit{KB}\), and the \textit{Statistical Classifier} assigns a label \(l_{d', j}\) (either malicious or legitimate) and a confidence score \(c(l_{d', j})\). While the label states whether the domain \textit{d'} is expected to be malicious or legitimate, the confidence score expresses the probability that this label is correct. For the final reputation score, \textit{Kopis} first computes a series of label/confidence tuples for \textit{m} epochs starting at epoch \(E_t\): \(S(v_{d'}^j) = \{l_{d', j}, c(l_{d', j})\}, j = t, \dots, (t + m)\). By averaging the confidence scores of the malicious labels (\textit{M}), the reputation score can then be expressed as \(\overline{C}_M = \mathrm{avg}_j\{c(l_{d', j})\}\).
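+
+As a worked illustration of this averaging step, consider the following hypothetical label/confidence series. The values are invented for illustration and are not taken from the evaluation of \textit{Kopis}; it is also assumed here that the average runs only over those epochs in which \textit{d'} was labeled malicious. With \(m = 3\), i.e. epochs \(E_t, \dots, E_{t+3}\), suppose the \textit{Statistical Classifier} outputs
+\[
+    (M, 0.9), \quad (L, 0.6), \quad (M, 0.8), \quad (M, 0.7),
+\]
+where \textit{M} denotes a malicious and \textit{L} a legitimate label. Under these assumptions, only the three malicious epochs contribute, and the reputation score would be
+\[
+    \overline{C}_M = \frac{0.9 + 0.8 + 0.7}{3} = 0.8 .
+\]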
+
+
+\subsection{Features}
+\label{subsec:kopis_features}
+
+Much like the previously investigated systems, \textit{Kopis} extracts features that are grouped into three sets. Two of those groups, the \textit{Requester Diversity} and the \textit{Requester Profile} features, had not been proposed in prior research and, owing to the system architecture, differ from those used in \textit{Notos} and \textit{Exposure}. In contrast to \textit{Notos} and \textit{Exposure}, which use traffic monitored at recursive DNS servers in the lower DNS layers, \textit{Kopis} operates on data from two large authoritative name servers (AuthNS) as well as a country-level TLD server (.ca space) in the upper DNS layers (see Figure~\ref{fig:kopis_data_sources}). Operating at this level of the DNS hierarchy brings different challenges as well. A top-level domain server rarely answers a request itself; most of the time it only delegates the request to a more specific server, e.g. a server responsible for the zone of a second-level domain in a company. For this reason, to obtain the actual resolved record (IP), either the delegated name server can be queried directly or a passive DNS database (e.g. from the Security Information Exchange \fsCite{SIEOnline}) can be consulted.
+
+The first step in extracting features from the captured traffic is to determine, for each DNS query \(q_j\) (to resolve a domain \textit{d}), the epoch \(T_j\) in which the request was made, the IP address \(R_j\) of the machine that issued the query, and the resolved records \(IPs_j\). Using these raw values, \textit{Kopis} extracts the following specific features:
+
+\subsubsection{Requester Diversity (RD)}
+\label{subsubsec:kopis_requester_diversity}
+
+\subsubsection{Requester Profile (RP)}
+\label{subsubsec:kopis_requester_profile}
+
+\subsubsection{Resolved-IPs Reputation (IPR)}
+\label{subsubsec:kopis_resolved-ips_reputation}
+
+
+\subsection{Reputation Engine}
+\label{subsec:kopis_reputation_engine}
+
+
+\subsection{Results}
+\label{subsec:kopis_results}
+
+
+Using the \textit{KB}, a sample of 225,429 unique RRs (corresponding to 28,915 unique domain names) could be split into two groups of 27,317 malicious and 1,598 benign domains.
+
+\todo{see section one for contributions}
--- a/Thesis/content/Evaluation_of_existing_Systems/Kopis/kopis_data_sources.png
+++ b/Thesis/content/Evaluation_of_existing_Systems/Kopis/kopis_data_sources.png
--- a/Thesis/content/Evaluation_of_existing_Systems/Kopis/kopis_system_overview.png
+++ b/Thesis/content/Evaluation_of_existing_Systems/Kopis/kopis_system_overview.png