# Decision Tree Induction

• A decision tree is a tree-structured model that supports decision-making. It builds classification or regression models in the form of a tree.
• It splits a data set into progressively smaller subsets while the tree is incrementally developed. A decision node has at least two branches; a leaf node represents a classification or decision. Decision trees can handle both categorical and numerical data.

## Entropy

• Entropy is a common measure of impurity. It quantifies the randomness or impurity in a data set.

## Information Gain

• Information gain is the decrease in entropy after a data set is split. It is also called entropy reduction.
• A decision tree resembles a flowchart whose terminal nodes show the decisions.
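
The two measures above can be made concrete with a short sketch. It assumes the usual Shannon definition of entropy, H = -Σ p_i log2(p_i) over the class proportions p_i, and computes information gain as the parent's entropy minus the size-weighted entropy of the subsets. The function names and the toy labels are illustrative assumptions, not part of the original notes.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def information_gain(labels, subsets):
    """Entropy of the parent minus the size-weighted entropy of its subsets."""
    total = len(labels)
    weighted = sum(len(s) / total * entropy(s) for s in subsets)
    return entropy(labels) - weighted

# Hypothetical toy labels, chosen only to illustrate the two extremes.
parent = ["yes", "yes", "no", "no"]
pure_split = [["yes", "yes"], ["no", "no"]]    # each subset has one class
mixed_split = [["yes", "no"], ["yes", "no"]]   # each subset is as mixed as the parent

gain_pure = information_gain(parent, pure_split)
gain_mixed = information_gain(parent, mixed_split)
```

A split that produces pure subsets removes all of the parent's entropy, while a split that leaves every subset as mixed as the parent gains nothing.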

## Why are decision trees useful?

• It enables us to analyze the possible consequences.
• It provides us a framework to measure the values of outcomes.
• It helps us to make the best decisions based on existing data.
• A decision tree model comprises a set of rules for partitioning a large heterogeneous population into smaller, more homogeneous, mutually exclusive classes. Given data consisting of attributes together with their class, a decision tree creates a set of rules that can be used to identify the class. The rules are applied one after another, resulting in a hierarchy of segments within segments.
• The hierarchy is known as the tree, and each segment is called a node. With each successive division, the members of the resulting sets become more and more similar to each other. The algorithm used to build a decision tree is referred to as recursive partitioning. A well-known algorithm of this kind is CART (Classification and Regression Trees).
• Consider the example of a factory whose management team needs to make a data-driven decision on whether or not to expand, based on the given data.
• Net Expand = (0.6 * 8 + 0.4 * 6) - 3 = \$4.2M
Net Not Expand = (0.6 * 4 + 0.4 * 2) - 0 = \$3.2M
Since \$4.2M > \$3.2M, the factory should be expanded.
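
The factory calculation is an expected-value comparison, which can be sketched as follows. The sketch assumes that 0.6 and 0.4 are the probabilities of two scenarios, that the numbers they multiply are payoffs in \$M, and that the subtracted term is the up-front cost of each option; the notes do not label these quantities, so the names `outcomes` and `cost` are illustrative.

```python
def expected_net_value(outcomes, cost):
    """Probability-weighted payoff minus the up-front cost (all in $M)."""
    return sum(p * payoff for p, payoff in outcomes) - cost

# (probability, payoff in $M) pairs and costs copied from the example above.
expand = expected_net_value([(0.6, 8), (0.4, 6)], cost=3)
not_expand = expected_net_value([(0.6, 4), (0.4, 2)], cost=0)

# Pick the option with the larger expected net value.
best = "expand" if expand > not_expand else "do not expand"
print(f"expand: {expand:.1f}, do not expand: {not_expand:.1f}, choose: {best}")
```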

## Decision Tree Algorithm

• The algorithm takes three parameters: D, attribute_list, and Attribute_selection_method. We refer to D as a data partition.
• D - Initially, the entire set of training tuples and their associated class labels.
• attribute_list - The set of attributes describing the tuples.
• Attribute_selection_method - A heuristic procedure for choosing the attribute that "best" discriminates the given tuples according to class. This procedure applies an attribute selection measure.
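
How the three parameters fit together can be shown with a minimal recursive-partitioning sketch. It is an ID3-style illustration that uses information gain as the attribute selection measure and splits on every distinct attribute value; CART itself typically uses Gini impurity and binary splits, so this is not a CART implementation. The helper names (`partition`, `info_gain_selection`, `build_tree`), the dictionary-based tree layout, and the training tuples are all hypothetical.

```python
import math
from collections import Counter

def entropy(rows):
    """Shannon entropy (bits) of the class labels in a list of (features, label) rows."""
    total = len(rows)
    counts = Counter(label for _, label in rows)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def partition(D, attribute):
    """Group the tuples in D by their value for one attribute."""
    groups = {}
    for features, label in D:
        groups.setdefault(features[attribute], []).append((features, label))
    return groups

def info_gain_selection(D, attribute_list):
    """One possible Attribute_selection_method: highest information gain wins."""
    def gain(attribute):
        parts = partition(D, attribute).values()
        return entropy(D) - sum(len(p) / len(D) * entropy(p) for p in parts)
    return max(attribute_list, key=gain)

def build_tree(D, attribute_list, attribute_selection_method):
    """Recursively partition D; leaves are class labels, internal nodes are dicts."""
    labels = [label for _, label in D]
    majority = Counter(labels).most_common(1)[0][0]
    # Stop when the partition is pure or no attributes remain.
    if len(set(labels)) == 1 or not attribute_list:
        return majority
    best = attribute_selection_method(D, attribute_list)
    remaining = [a for a in attribute_list if a != best]
    return {
        "attribute": best,
        "branches": {
            value: build_tree(subset, remaining, attribute_selection_method)
            for value, subset in partition(D, best).items()
        },
    }

# Hypothetical training tuples: (attribute values, class label).
D = [
    ({"outlook": "sunny", "windy": "no"}, "play"),
    ({"outlook": "sunny", "windy": "yes"}, "play"),
    ({"outlook": "overcast", "windy": "yes"}, "play"),
    ({"outlook": "rainy", "windy": "no"}, "play"),
    ({"outlook": "rainy", "windy": "yes"}, "stay"),
]
tree = build_tree(D, ["outlook", "windy"], info_gain_selection)
```

On this toy data the root splits on `outlook`, and only the `rainy` branch needs a further split on `windy`. Passing the selection method as a parameter means a different measure can be swapped in without touching the recursion.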