\chapter{Technical Background}
\label{cha:technical_background}
\input{content/Technical_Background/DNS/DNS}
\section{Machine Learning}
\label{sec:machine_learning}
Machine learning is a broad field of computer science that aims to give computers the ability to learn without being explicitly programmed for a specific task. Many different approaches exist, each with advantages and disadvantages in areas such as object recognition in images, self-driving cars, or forecasting. Machine learning in this work is mostly limited to decision tree learning, an approach modelled on the way humans make decisions: given a set of attributes, a human is able to decide, for example, whether to buy one product or another. Machine learning algorithms use a technique called training to build a model, which can later be used to make decisions, e.g. to classify a dataset.

A decision tree consists of three components:
\begin{itemize}
	\item a \textit{node} represents the test of a certain attribute and splits up the tree,
	\item a \textit{leaf} is a terminal node and represents a prediction (class/label) based on all attributes along the path from the root node to the leaf, and
	\item an \textit{edge} corresponds to one outcome of a test and connects a node to the next node or leaf.
\end{itemize}

The training is performed in multiple steps. Its input is an arbitrarily large dataset (the training set) in which every sample has a fixed number of features (attributes) and a known label. The number of labels or classes is arbitrary (but finite); in a binary classification there are exactly two labels (e.g. malicious or benign in the case of this work). In the first step, the whole training set is iterated and, whenever a set of samples can be separated by a single attribute (with respect to the assigned labels), the tree branches out and a new leaf is created. Each branch is then split into more fine-grained subtrees as long as there is an \textit{information gain}, i.e. as long as not all samples of a subset belong to the same class. The finished model can later be queried with an unlabeled data sample and returns the probability with which that sample belongs to each class.
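
The splitting criterion can be made precise with the standard entropy-based definition of information gain used by \textit{C4.5}-style algorithms. For a set of samples $S$ and a set of classes $C$, the entropy of $S$ is
\begin{equation}
	H(S) = - \sum_{c \in C} p_c \log_2 p_c ,
\end{equation}
where $p_c$ is the proportion of samples in $S$ that carry label $c$. Splitting $S$ on an attribute $A$ then yields the information gain
\begin{equation}
	IG(S, A) = H(S) - \sum_{v \in \mathit{values}(A)} \frac{|S_v|}{|S|} \, H(S_v) ,
\end{equation}
where $S_v$ denotes the subset of samples for which $A$ takes the value $v$. A branch is only split further if $IG(S, A) > 0$, and at each node the attribute with the highest gain is tested.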
This way, given a labeled training set of limited size, unlabeled data can be classified by learning the characteristics of the labeled training samples. A widely used decision tree algorithm is \textit{C4.5} \fsCite{Salzberg1994}; \textit{J48} is an open-source reimplementation of it, while many current libraries implement the closely related \textit{CART} algorithm (Classification and Regression Trees \fsCite{SciKitOnline}).
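
To illustrate the workflow described above, the following minimal sketch trains the scikit-learn \textit{CART} implementation on a toy binary classification problem; the feature values and labels are illustrative placeholders, not data from this work.
\begin{verbatim}
# Minimal sketch (illustrative data): training a CART decision
# tree for binary classification, labels 0 = benign, 1 = malicious.
from sklearn.tree import DecisionTreeClassifier

# Training set: each sample has a fixed number of features and
# a known label.
X_train = [[3, 0.2], [1, 0.9], [4, 0.1], [2, 0.8]]
y_train = [0, 1, 0, 1]

# "entropy" selects splits by information gain; branches are
# grown as long as a split still yields a gain.
clf = DecisionTreeClassifier(criterion="entropy")
clf.fit(X_train, y_train)

# Querying the model with an unlabeled sample returns the
# probability for each class.
print(clf.predict_proba([[2, 0.7]]))   # e.g. [[0. 1.]]
\end{verbatim}
In the terminology introduced above, \texttt{fit} performs the training and builds the tree, while \texttt{predict\_proba} corresponds to querying the model with an unlabeled sample.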