\subsection{General}
\label{subsec:notos_general}
\textit{Notos} was published in 2010 by \fsAuthor{Antonakakis:2010:BDR:1929820.1929844} at the Georgia Institute of Technology. It was introduced as ``being the first [system] to create a comprehensive dynamic reputation system around domain names'' \fsCite[Section 1]{Antonakakis:2010:BDR:1929820.1929844}. \textit{Notos} is based on the observation that malicious use of DNS can usually be distinguished from legitimate, professionally provisioned DNS services by unique characteristics, since fraudulent activities typically utilize techniques to evade security countermeasures \fsCite{Antonakakis:2010:BDR:1929820.1929844}. The approach mainly uses passive historical DNS information obtained from multiple recursive resolvers distributed across the Internet. To build a model of how resources are typically used in legitimate and malicious applications, information about malicious IP addresses and domain names is collected from different sources such as honeypots, malware analysis services and spam-traps. Using this model, domains that have never been seen before can dynamically be assigned a reputation score expressing how likely the new domain is involved in malicious activities. Malicious activities in the context of \textit{Notos} are roughly described as: ``if it [a domain] has been involved with botnet C\&C servers, spam campaigns, malware propagation, etc.'' \fsCite[Section 3]{Antonakakis:2010:BDR:1929820.1929844}
\textit{Notos} uses some basic terminology which is briefly introduced here:
\begin{itemize}
\label{subsec:notos_architecture}
The main goal of \textit{Notos} is to assign a dynamic reputation score to domain names. Domains that are likely to be involved in malicious activities are tagged with a low reputation score, whereas legitimate Internet services are assigned a high reputation score.
\textit{Notos'} primary source of information is a database that contains historical data about domains and resolved IP addresses. This database is built from DNS traffic of two recursive ISP DNS servers (RDNS) and pDNS logs collected by the Security Information Exchange (SIE), which covers authoritative name servers in North America and Europe. To build a list of known malicious domain names, several honeypots and spam-traps have been deployed. A large list of known good domains has been gathered from the top sites list on \textit{alexa.com}, which ranks the most popular websites in several regions. These two lists are referred to as the \textit{knowledge base} and are used to train the reputation model.
To assign a reputation score to a domain \textit{d}, the most current set of IP addresses \(A_{c}(d) = \left\{a_{i}\right\}_{i=1..m}\) to which \textit{d} points is fetched first. Afterwards, the pDNS database is queried for several pieces of information about this domain \textit{d}. The set of \textit{Related Historic IPs (RHIPs)} contains all IP addresses that ever pointed to this domain. In case domain \textit{d} is a third-level domain, all IP addresses that pointed to the corresponding second-level domain are also included (see Chapter~\ref{subsec:domain_names} for more information on the structure of domain names). If \textit{d} is a second-level domain, then all IPs that are pointed to from any of its third-level subdomains are added to the RHIPs as well. In the next step, the set of \textit{Related Historic Domains (RHDNs)} is queried, which covers all domains that are related to the currently processed domain \textit{d}; specifically, all domains which ever resolved to an IP address residing in any of the ASNs of those IPs that \textit{d} currently resolves to.
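The RHIP and RHDN construction described above can be sketched in a few lines. This is a minimal illustration over a toy pDNS snapshot; the domain names, IPs, ASNs and the simplistic second-level-domain splitting are assumptions for demonstration, not Notos' actual data handling.

```python
# Hypothetical passive-DNS snapshot: domain -> set of IPs ever observed,
# plus an IP -> ASN mapping. Real Notos queries a large pDNS database.
PDNS = {
    "www.example.com": {"198.51.100.7"},
    "example.com": {"198.51.100.7", "203.0.113.9"},
    "cdn.example.net": {"203.0.113.10"},
}
IP_TO_ASN = {"198.51.100.7": 64500, "203.0.113.9": 64501, "203.0.113.10": 64501}

def second_level(domain: str) -> str:
    """Naive 2LD extraction, e.g. 'www.example.com' -> 'example.com'."""
    return ".".join(domain.split(".")[-2:])

def rhips(d: str) -> set:
    """Related Historic IPs: all IPs that ever pointed to d, plus the IPs
    of the corresponding 2LD (or, for a 2LD, of all its subdomains)."""
    ips = set(PDNS.get(d, set()))
    sld = second_level(d)
    if d != sld:                       # d is a lower-level domain
        ips |= PDNS.get(sld, set())
    else:                              # d is a 2LD: add all subdomain IPs
        for name, addrs in PDNS.items():
            if second_level(name) == d:
                ips |= addrs
    return ips

def rhdns(d: str) -> set:
    """Related Historic Domains: all domains that ever resolved into one
    of the ASNs of d's current IPs."""
    asns = {IP_TO_ASN[ip] for ip in PDNS.get(d, set()) if ip in IP_TO_ASN}
    return {name for name, addrs in PDNS.items()
            if any(IP_TO_ASN.get(ip) in asns for ip in addrs)}

print(sorted(rhips("www.example.com")))
print(sorted(rhdns("example.com")))
```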
There are three types of features extracted from the database for \textit{Notos} that are used for training the reputation model (quotation from \fsCite[Section 3.1]{Antonakakis:2010:BDR:1929820.1929844}):
\end{enumerate}
\end{quote}
Figure~\ref{fig:notos_system_overview} shows the overall system architecture of \textit{Notos}. After all features are extracted from the passive DNS database and prepared for the following steps, the reputation engine is initialized. \textit{Notos'} reputation engine operates in two modes. In offline mode, the reputation model is constructed for a set of domains using the feature set of each domain and the classification that can be derived from the \textit{knowledge base} with its black- and whitelist (also referred to as training). This model can later be used in the online mode to dynamically assign a reputation score. In online mode, the same features that are used for the initial training are extracted for a new domain (resource record or RR, see Section~\nameref{subsubsec:dns_resource_records}) and \textit{Notos} queries the trained reputation engine for the dynamic reputation rating (see Figure~\ref{fig:notos_online_offline_mode}). The data for labeling domains and IPs originates from various sources: the blacklist primarily consists of filter lists from malware services like malwaredomainlist.com and malwaredomains.com. Additional blacklists for labeling IPs and domains are the Sender Policy Block list from Spamhaus and the ZeuS blocklist from ZeuS Tracker. The knowledge base was downloaded before the main analysis period (fifteen days from the first of August 2009), and as filter lists usually lag behind state-of-the-art malware, the blacklists have been updated continuously. The whitelist was built from the top 500 popular Alexa websites, the 18 most common second-level domains of various content delivery networks for classifying the CDN clusters, and a list of 464 dynamic DNS second-level domains for identifying domains and IPs in dynamic DNS zones.
\todo{better explain EV, NM, DC}
\begin{figure}[!htbp]
\centering
\includegraphics[scale=.3, clip=true]{content/Evaluation_of_existing_Systems/Notos_System_overview.png}
\end{tabularx}
\end{table}
The second group, the zone-based features, is extracted from the RHDNs. In contrast to the network-based features, which compare characteristics of the historic IPs, the zone-based features capture characteristics of all historically involved domains. While legitimate services often involve many domains, these usually share similarities. ``For example, google.com, googlesyndication.com, googlewave.com, etc., are all related to Internet services provided by Google, and contain the string 'google' in their name.'' In contrast, randomly generated domains used in spam campaigns rarely share similarities. By calculating the mean, median and standard deviation for some keys, the shape of their distribution is summarized \fsCite[Section 3.2.2]{Antonakakis:2010:BDR:1929820.1929844}. To capture this level of diversity, seventeen features are extracted, which can be found in Table~\ref{tab:notos_zone-based_features}:
\begin{table}[!htbp]
\centering
\caption{Notos: Zone-based features}
\label{tab:notos_zone-based_features}
For the evidence-based features, public information and exclusively collected data from honeypots and spam-traps is used. This part of the \textit{knowledge base} primarily helps to discover whether a domain \textit{d} is in some way interacting with known malicious IPs and domains. As domain names are much cheaper than IP addresses, malware authors tend to reuse IPs with updated domain names. The blacklist features detect the reuse of known malicious resources like IP addresses, \gls{bgp} prefixes and \glspl{as}.
\begin{table}[!htbp]
\centering
\caption{Notos: Evidence-based features}
\label{tab:notos_evidence-based_features}
\multirow{3}{*}{\textit{Blacklist}} & \# of IP addresses in \(A(d)\) that are listed in public IP blacklists \\ \cline{2-2}
& \# of IPs in \(BGP(A(d))\) that are listed in public IP blacklists \\ \cline{2-2}
& \# of IPs in \(AS(A(d))\) that are listed in public IP blacklists \\ \hline
\end{tabularx}
\end{table}
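The three blacklist features from the table above reduce to simple set intersections. The following sketch uses made-up IP sets for \(A(d)\), \(BGP(A(d))\), \(AS(A(d))\) and the blacklist; it only illustrates the counting, not Notos' actual data sources.

```python
# Hypothetical data: d's current IPs, the IPs sharing their BGP prefixes,
# the IPs sharing their ASes, and a public IP blacklist.
A_D = {"198.51.100.7", "198.51.100.8"}            # A(d)
BGP_A_D = A_D | {"198.51.100.9"}                  # BGP(A(d))
AS_A_D = BGP_A_D | {"203.0.113.5"}                # AS(A(d))
BLACKLIST = {"198.51.100.8", "203.0.113.5"}

def bl_count(ips: set) -> int:
    """Number of IPs from the given set found on the blacklist."""
    return len(ips & BLACKLIST)

# The three evidence-based blacklist features; larger counts indicate
# reuse of known-bad infrastructure.
features = (bl_count(A_D), bl_count(BGP_A_D), bl_count(AS_A_D))
print(features)
```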
\begin{figure}[!htbp]
\subsection{Reputation Engine}
\label{subsec:notos_reputation_engine}
The reputation engine is used to dynamically assign a reputation score to a domain \textit{d}. In the first step, the engine has to be trained with the available training set (the corresponding time span is referred to as the \textit{training period}). The training is performed in an offline fashion, which means all data is statically available at the beginning of this step. The training mode consists of three modules: The \textit{Network Profile Model} is a model of how known good domains use their resources. This model uses popular content delivery networks (e.g. Akamai, Amazon CloudFront) and large sites (e.g. google.com, yahoo.com) as a base. In total, the \textit{Network Profile Model} consists of five classes of domains: \textit{Popular Domains}, \textit{Common Domains}, \textit{Akamai Domains}, \textit{CDN Domains} and \textit{Dynamic DNS Domains}. The second module, \textit{Domain Name Clusters}, performs a general clustering of all domains (respectively their statistical feature vectors) of the training set. There are two consecutive clustering processes: The \textit{network-based} clustering aims to group domains with similar agility characteristics. To refine those clusters, a \textit{zone-based} clustering is performed which groups domains that are similar in terms of their RHDNs (see the explanation of the \textit{zone-based features}). These clusters of domains with similar characteristics can then be used to identify mostly benign and malicious sets of domains. In the last step of the offline mode, the \textit{Reputation Function} is built. As seen in Figure~\ref{fig:notos_online_offline_mode}, this module takes the results of the \textit{Network Profile Model} (\(NM(d_i)\)) and the \textit{Domain Name Clusters} (\(DC(d_i)\)) for each domain \(d_i, i = 1..n\) as inputs, calculates an \textit{Evidence Features Vector} \(EV(d_i)\), which essentially checks whether \(d_i\) or any of its resolved IPs is known to be benign or malicious, and builds a model that can assign a reputation score between zero and one to \textit{d}. This \textit{Reputation Function} is implemented as a statistical classifier. These three modules form the reputation model that is used in the last step to compute the reputation score. A rebuild of the training model can be done at any time, for example given an updated training set.
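The two consecutive clustering passes can be illustrated schematically: a coarse grouping by network characteristics, refined per group by zone characteristics. Real Notos clusters statistical feature vectors; the categorical bucketing keys and domain names below are stand-ins chosen for brevity.

```python
from collections import defaultdict

# Toy domains with a coarse network-based key and a zone-based key.
domains = {
    "a.example": {"net": "low-agility",  "zone": "branded"},
    "b.example": {"net": "low-agility",  "zone": "branded"},
    "c.example": {"net": "high-agility", "zone": "random"},
    "d.example": {"net": "low-agility",  "zone": "random"},
}

network_clusters = defaultdict(list)      # pass 1: network-based clustering
for name, feats in domains.items():
    network_clusters[feats["net"]].append(name)

refined = defaultdict(list)               # pass 2: zone-based refinement
for net_key, members in network_clusters.items():
    for name in members:
        refined[(net_key, domains[name]["zone"])].append(name)

print(dict(refined))
```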
The final stage of the reputation engine is the online (streaming-like) mode. Any considered domain \textit{d} is first supplied to the \textit{network profiles} module, which returns a probability vector \(NM(d) = \{c_1, c_2, ..., c_5\}\) of how likely \textit{d} belongs to each of the five classes (e.g. probability \(c_1\) that \textit{d} belongs to \textit{Popular Domains}). \(DC(d)\) is the resulting vector of the \textit{domain clusters} module and is computed as follows: For the domain \textit{d} of interest, the network-based features are extracted and the closest network-based cluster \(C_d\), generated in the training mode by the \textit{Domain Name Clusters} module, is determined. The following step takes all zone-based feature vectors \(v_j \in C_d\) and eliminates those vectors that fulfill neither \(dist(z_d , v_j ) < R\), where \(z_d\) is the zone-based feature vector for \textit{d} and \textit{R} is a predefined radius, nor \(v_j \in KNN(z_d)\), with \(KNN(z_d)\) being the k nearest neighbors of \(z_d\). Each vector \(v_i\) of the resulting subset \(V_d \subseteq C_d\) is then assigned one of these eight labels: \textit{Popular Domains}, \textit{Common Domains}, \textit{Akamai}, \textit{CDN}, \textit{Dynamic DNS}, \textit{Spam Domains}, \textit{Flux Domains}, and \textit{Malware Domains}. The next step is to calculate the five statistical features that form the resulting vector \(DC(d) = \{l_1, l_2, ..., l_5\}\).
\begin{enumerate}
\item \(l_1\) the \textit{majority class label} \textit{L}, i.e. the most common label in \(v_i \in V_d\) (e.g. \textit{Spam Domains})
\item \(l_2\) the standard deviation of the occurrence frequency of each label
\item \(l_3\) mean of the distribution of distances between \(z_d\) and the vectors \(v_j \in V_{d}^{(L)}\), where \(V_{d}^{(L)} \subseteq V_d\) is the subset of those vectors, associated with the \textit{majority class label} \textit{L}
\end{enumerate}
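The radius/k-NN filtering and the statistics \(l_1\) to \(l_3\) can be sketched with toy vectors. The labels, vectors, radius \(R\) and \(k\) below are illustrative assumptions; only the mechanics of the step are shown.

```python
import math
from collections import Counter
from statistics import mean, pstdev

R, K = 2.0, 3                 # assumed radius and k for the k-NN filter
z_d = (1.0, 1.0)              # zone-based feature vector of d (toy data)
cluster = [((0.9, 1.1), "Spam Domains"),     # C_d: labeled zone-based
           ((1.2, 0.8), "Spam Domains"),     # vectors of the closest
           ((5.0, 5.0), "CDN"),              # network-based cluster
           ((1.1, 1.0), "Malware Domains")]

def dist(a, b):
    return math.dist(a, b)    # Euclidean distance

# Keep v_j if dist(z_d, v_j) < R or v_j is among the k nearest neighbors.
knn = sorted(cluster, key=lambda v: dist(z_d, v[0]))[:K]
V_d = [(v, lab) for v, lab in cluster if dist(z_d, v) < R or (v, lab) in knn]

counts = Counter(lab for _, lab in V_d)
L = counts.most_common(1)[0][0]                       # l1: majority label
l2 = pstdev(counts.values())                          # l2: std-dev of label frequencies
l3 = mean(dist(z_d, v) for v, lab in V_d if lab == L) # l3: mean distance to majority-label vectors
print(L, round(l2, 3), round(l3, 3))
```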
Having the \textit{Network Profile Model} output \(NM(d)\), the \textit{Domain Name Clusters} output \(DC(d)\), and the \textit{Evidence Features Vector} \(EV(d)\), these vectors are combined into a sixteen-dimensional feature vector \(v(d)\) which is then fed into the trained reputation function. This results in a reputation score \textit{S} in the range of [0, 1], where values close to zero represent a low reputation and thus more likely indicate malicious usage of the domain.
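The combination step amounts to concatenating the three vectors and applying the trained classifier. In this sketch the vector contents and the scorer are placeholders (a fixed averaging function), not Notos' actual statistical classifier.

```python
# Hypothetical component vectors (5 + 5 + 6 = 16 dimensions).
NM = [0.1, 0.2, 0.1, 0.1, 0.5]       # network-profile class probabilities
DC = [0.0, 0.3, 0.2, 0.1, 0.4]       # domain-cluster statistics
EV = [0.9, 0.8, 0.7, 1.0, 0.6, 0.5]  # evidence features
v = NM + DC + EV                     # v(d), the combined feature vector

def reputation(v):
    """Stand-in reputation function mapping v(d) to [0, 1];
    low scores indicate likely malicious use."""
    s = sum(v) / len(v)
    return max(0.0, min(1.0, s))

score = reputation(v)
print(round(score, 3))
```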
\subsection{Results}
\label{subsec:notos_results}
In this last section of the evaluation of \textit{Notos}, the published experimental results are listed. This covers metrics about the usage of raw data, lessons learned in the analysis process (e.g. examined algorithms) and final results like precision and accuracy of the classification.
As \textit{Notos} is the first dynamic reputation system in the context of domain names, it is able to identify malicious domain names before they appear in public filter lists. To be able to assign reputation scores to new domains, \fsAuthor{Antonakakis:2010:BDR:1929820.1929844} used historic passive DNS logs over a time span of 68 days with a total volume of 27,377,461 unique, successful A-type resolutions, mainly from two recursive ISP DNS servers in North America (plus pDNS logs from various networks, aggregated by the SIE, see Section~\ref{subsec:notos_architecture}). Figure~\ref{fig:notos_volume_new_rr} shows that after a few days, the number of new domains (RRs) stabilizes at about 100,000 to 150,000 new domains a day, compared to a much higher total load of unique resource records (about 94.7\% duplicates, see Figure~\ref{fig:notos_total_volume_unique_rr}). The number of new IPs is analogously nearly constant. After a few weeks, even big content delivery networks with a large (but nearly constant) number of IP addresses will be covered completely, in contrast to botnets where continuously new machines are infected. The authors conclude that a relatively small pDNS database is therefore sufficient for \textit{Notos} to produce good results.
\begin{figure}[!htbp]
\centering
\includegraphics[scale=.3, clip=true]{content/Evaluation_of_existing_Systems/Notos_new-RR.png}
\caption{Notos: Volume of new unseen RRs \fsCite[Figure 7b]{Antonakakis:2010:BDR:1929820.1929844}}
\label{fig:notos_volume_new_rr}
\end{figure}
\begin{figure}[!htbp]
\centering
\includegraphics[scale=.3, clip=true]{content/Evaluation_of_existing_Systems/Notos_total-RR.png}
\caption{Notos: Total volume of unique RRs \fsCite[Figure 7a]{Antonakakis:2010:BDR:1929820.1929844}}
\label{fig:notos_total_volume_unique_rr}
\end{figure}
To get optimal results with the \textit{Reputation Function}, several classifiers have been tested and evaluated for the given circumstances (time complexity, detection results and precision [true positives over all positives]). A decision tree with the LogitBoost strategy has shown to provide the best results with a low false positive rate (FP) of 0.38\% and a high true positive rate (TP) of 96.8\%. These results have been verified using a 10-fold cross-validation with a reputation score threshold of 0.5. For this validation, a dataset of 20,249 domains with 9,530 known bad RRs has been used. As the list of known good domains, the Alexa top 500 websites have been used. Using a larger number of Alexa popular sites has shown to decrease the accuracy of the overall system, e.g. 100,000 entries resulted in a TP rate of 80.6\% and an FP rate of 0.6\%. To compare \textit{Notos'} performance with static filter lists, a pre-trained instance has been fed with 250,000 unique domains collected on August 1, 2009. 10,294 distinct entries have been reported with a reputation score below 0.5. 7,984 of these 10,294 entries, or 77.6\%, could be found in at least one blacklist (see Section~\nameref{subsec:notos_architecture} for a list of included blacklists). The remaining 22.4\% could not be conclusively verified. It is worth stating that 7,980 of the 7,984 confirmed bad domain names were assigned a reputation score of less than or equal to 0.15.
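The evaluation metric itself is straightforward: apply the 0.5 threshold to the reputation scores (low score means flagged as malicious) and compute true- and false-positive rates against the ground truth. The scores and labels below are toy data, not the published dataset.

```python
# Hypothetical scores and labels (1 = known bad RR from the blacklist).
scores = [0.05, 0.12, 0.48, 0.55, 0.91, 0.97]
labels = [1,    1,    1,    0,    0,    0]

def tp_fp_rates(scores, labels, threshold=0.5):
    """TP rate over all known-bad entries, FP rate over all known-good ones."""
    pred_bad = [s < threshold for s in scores]  # low score => flagged malicious
    tp = sum(p and y for p, y in zip(pred_bad, labels))
    fp = sum(p and not y for p, y in zip(pred_bad, labels))
    tpr = tp / sum(labels)
    fpr = fp / (len(labels) - sum(labels))
    return tpr, fpr

tpr, fpr = tp_fp_rates(scores, labels)
print(tpr, fpr)
```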
\section{Exposure}
\label{sec:exposure}
\section{Kopis}
\label{sec:kopis}
\section{Results and Comparison}
\label{sec:results_and_comparison}