What is a Decision Tree?
A decision tree is a machine learning method used for both regression and classification; in practice, however, decision trees are mainly used for classification problems. The basic goal of a decision tree is to predict a target variable (categorical or numerical) using simple decision rules derived from the data. The name "decision tree" comes from the model's structure, which resembles an upside-down tree.
When building decision trees, node splitting is crucial. A split is optimal when the resulting nodes are as "pure" as possible, i.e., when each node contains samples of mostly one class. Nodes are therefore split further and further until they can no longer be made purer by additional splitting. To measure the quality of a split, there are various mathematical criteria such as Gini impurity.
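To make the purity idea concrete, here is a minimal sketch of Gini impurity in plain Python. The function name and the toy label lists are illustrative, not from the original post: impurity is 1 minus the sum of squared class proportions, so a pure node scores 0 and a 50/50 node scores 0.5.

```python
def gini_impurity(labels):
    """Gini impurity of a node: 1 - sum(p_k^2) over the classes k in the node."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())

# A pure node has impurity 0.0; an evenly mixed node has impurity 0.5.
print(gini_impurity(["Yes", "Yes", "Yes"]))       # 0.0
print(gini_impurity(["Yes", "No", "Yes", "No"]))  # 0.5
```

A splitting algorithm such as CART picks, at each node, the split whose child nodes have the lowest (weighted) impurity.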
The following explains how a decision tree works using a simple example.
In our example, we want to predict whether to go jogging based on the weather. The decision whether or not to go jogging is our categorical target variable. The top node of a decision tree is the root node. Formulated as a question, the first decision rule at the root node in our example is: according to the weather forecast, will it be sunny, cloudy, or rainy? Depending on the answer, one of the branches leads to the next node. If it is supposed to be sunny or rainy, you reach a so-called internal node, where another decision has to be made. If it is cloudy, you reach a leaf node directly. A leaf node gives us the value of the target variable, in our example "Yes" or "No". For a simple tree like this, a decision can also be expressed verbally: if the weather forecast is sunny and the predicted humidity is high, you should not go jogging.
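Such a small tree can be written out directly as nested decisions. The sketch below is an assumption-laden reading of the example: the post only states that sunny plus high humidity means "No" and that cloudy reaches a leaf directly; that the cloudy leaf is "Yes" and that the rainy branch splits on wind follows the classic play-tennis example, not the original text.

```python
def go_jogging(outlook, humidity=None, wind=None):
    """Walk the example decision tree from the root to a leaf."""
    if outlook == "cloudy":
        return "Yes"  # leaf node reached directly (assumed value, as in play-tennis)
    if outlook == "sunny":
        # internal node: second decision on humidity (high humidity -> "No" per the text)
        return "No" if humidity == "high" else "Yes"
    if outlook == "rainy":
        # internal node: second decision on wind (assumed attribute)
        return "No" if wind == "strong" else "Yes"
    raise ValueError(f"unknown outlook: {outlook}")

print(go_jogging("sunny", humidity="high"))  # No
print(go_jogging("cloudy"))                  # Yes
```

A learned decision tree is exactly this kind of if/else cascade, except that the algorithm chooses the questions and thresholds from the data instead of a human writing them by hand.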
As with any machine learning algorithm, using decision trees has advantages and disadvantages. One of the biggest advantages is that, unlike many other machine learning algorithms, they are easy to understand, interpret, and visualize. One of their biggest disadvantages is a tendency to overfit.
Some popular decision tree algorithms are CART, ID3 and C4.5.
#RoadToRevolution #MachineLearning #ai #decisiontree
Sources (translated): Towards Data Science and R2D3