partially finished exposure; started kopis

This commit is contained in:
2018-01-16 18:30:45 +01:00
parent 0f10314bc8
commit 3586762494
12 changed files with 224 additions and 18 deletions


@@ -60,6 +60,9 @@ Figure~\ref{fig:notos_system_overview} shows the overall system architecture of
This section lists all statistical features and briefly explains why each of them has been chosen.
\subsubsection{Network-based features}
\label{subsubsec:notos_network-based_features}
The first group of features handles network-related keys. This group mostly describes how the operators owning \textit{d} allocate network resources to achieve different goals. While most legitimate and professionally operated internet services have a rather stable network profile, malicious usage usually involves short-lived domain names and IP addresses with high agility to circumvent blacklisting and other simple types of resource blocking. Botnets usually contain machines in many different networks (\glspl{as} and \glspl{bgp}) operated by different organizations in different countries. Legitimate companies, in contrast, mostly acquire larger IP blocks and thus use consecutive IPs in the same address space for their services. This homogeneity also applies to other registration-related information such as registrars and registration dates. To measure this level of agility and homogeneity, eighteen statistical network-based features are extracted from the RHIPs (see Table~\ref{tab:notos_network-based_features}).
\begin{table}[!htbp]
@@ -90,6 +93,9 @@ The first group of features handles network-related keys. This group mostly desc
\end{tabularx}
\end{table}
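The idea of measuring agility versus homogeneity can be illustrated with a minimal sketch. It does not reproduce the eighteen features of the table; it only counts distinct values per attribute over a set of RHIPs. The record layout (\texttt{IpRecord}) and all field names are assumptions made for this illustration, and the sample addresses are taken from documentation ranges.

```python
from collections import namedtuple

# Hypothetical record for one related historic IP (RHIP) of a domain d.
# The field names are assumptions for this sketch, not part of the original system.
IpRecord = namedtuple("IpRecord", ["ip", "bgp_prefix", "asn", "country"])


def network_diversity(rhips):
    """Count the distinct values of each attribute over the given RHIPs."""
    return {
        "distinct_ips": len({r.ip for r in rhips}),
        "distinct_bgp_prefixes": len({r.bgp_prefix for r in rhips}),
        "distinct_asns": len({r.asn for r in rhips}),
        "distinct_countries": len({r.country for r in rhips}),
    }


# A stable profile: consecutive IPs in one prefix, one AS, one country.
stable = [
    IpRecord("192.0.2.10", "192.0.2.0/24", 64500, "DE"),
    IpRecord("192.0.2.11", "192.0.2.0/24", 64500, "DE"),
]

# An agile profile: every IP lives in a different prefix, AS and country.
agile = [
    IpRecord("192.0.2.10", "192.0.2.0/24", 64500, "DE"),
    IpRecord("198.51.100.7", "198.51.100.0/24", 64501, "US"),
    IpRecord("203.0.113.99", "203.0.113.0/24", 64502, "BR"),
]

stable_profile = network_diversity(stable)
agile_profile = network_diversity(agile)
```

Under these assumptions, the stable profile yields a single prefix, AS and country, whereas the agile profile yields three of each. Low counts hint at homogeneity, high counts at agility.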
\subsubsection{Zone-based features}
\label{subsubsec:notos_zone-based_features}
The second group consists of zone-based features, which are extracted from the RHDNs. In contrast to the network-based features, which compare characteristics of the historic IPs, the zone-based features handle characteristics of all historically involved domains. While legitimate services often involve many domains, these domains usually share similarities: ``For example, google.com, googlesyndication.com, googlewave.com, etc., are all related to Internet services provided by Google, and contain the string 'google' in their name.'' In contrast, randomly generated domains used in spam campaigns rarely share similarities. Calculating the mean, median and standard deviation of some key serves to ``summarize the shape of its distribution'' \fsCite[Section 3.2.2]{Antonakakis:2010:BDR:1929820.1929844}. To calculate this level of diversity, seventeen features are extracted, which are listed in Table~\ref{tab:notos_zone-based_features}.
\begin{table}[!htbp]
@@ -119,6 +125,9 @@ The second group is about zone-based features and is extracted from the RHDNs. I
\end{tabularx}
\end{table}
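As a sketch of the summary statistics used for the zone-based features, assume that some key (for example the length of a domain name) takes the values $x_1, \dots, x_n$ over the RHDNs. This notation is introduced here for illustration only. Mean and standard deviation are then
\[
	\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i, \qquad
	s = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left( x_i - \bar{x} \right)^2},
\]
while the median is the middle value of the sorted sequence, or the average of the two middle values if $n$ is even. The population form of the standard deviation (division by $n$) is an assumption; the sample form divides by $n - 1$ instead.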
\subsubsection{Evidence-based features}
\label{subsubsec:notos_evidence-based_features}
For the evidence-based features, public information is combined with data collected exclusively from honeypots and spam traps. This \textit{knowledge base} primarily helps to discover whether a domain \textit{d} is in some way interacting with known malicious IPs and domains. As domain names are much cheaper than IP addresses, malware authors tend to reuse IPs with updated domain names. The blacklist features detect the reuse of known malicious resources like IP addresses, \gls{bgp} prefixes and \glspl{as}.
\begin{table}[!htbp]