## Python Program to Implement Decision Tree ID3 Algorithm

Exp. No. 3. Write a program to demonstrate the working of the decision tree based ID3 algorithm. Use an appropriate data set for building the decision tree and apply this knowledge to classify a new sample.

### Decision Tree ID3 Algorithm Machine Learning

ID3(Examples, Target_attribute, Attributes)

Examples are the training examples. Target_attribute is the attribute whose value is to be predicted by the tree. Attributes is a list of other attributes that may be tested by the learned decision tree. Returns a decision tree that correctly classifies the given Examples.

- Create a Root node for the tree.
- If all Examples are positive, return the single-node tree Root, with label = +.
- If all Examples are negative, return the single-node tree Root, with label = -.
- If Attributes is empty, return the single-node tree Root, with label = most common value of Target_attribute in Examples.
- Otherwise, begin:
    - A ← the attribute from Attributes that best\* classifies Examples.
    - The decision attribute for Root ← A.
    - For each possible value, vi, of A:
        - Add a new tree branch below Root, corresponding to the test A = vi.
        - Let Examples_vi be the subset of Examples that have value vi for A.
        - If Examples_vi is empty, then below this new branch add a leaf node with label = most common value of Target_attribute in Examples.
        - Else, below this new branch add the subtree ID3(Examples_vi, Target_attribute, Attributes – {A}).
- End.
- Return Root.

\* The best attribute is the one with the highest information gain.

**ENTROPY:**

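Entropy measures the impurity of a collection of examples. For a collection S containing positive and negative examples, it is defined as

$$\mathrm{Entropy}(S) = -p_{+} \log_2 p_{+} - p_{-} \log_2 p_{-}$$

where $p_{+}$ is the proportion of positive examples in S and $p_{-}$ is the proportion of negative examples in S. By convention, $0 \log_2 0$ is taken to be 0.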
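
As a quick check of the definition, the sketch below computes the two-class entropy of a list of labels in plain Python, that is, minus the sum of each class proportion times its base-2 logarithm. The function name `binary_entropy` and the label strings are illustrative choices, not names used by the program later in this tutorial.

```python
import math


def binary_entropy(labels, positive="Yes"):
    """Entropy (in bits) of a list of two-class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    p_pos = sum(1 for label in labels if label == positive) / total
    p_neg = 1.0 - p_pos
    result = 0.0
    for p in (p_pos, p_neg):
        # 0 * log2(0) is treated as 0, so zero proportions are skipped.
        if p > 0:
            result -= p * math.log2(p)
    return result


# The PlayTennis data has 9 "Yes" and 5 "No" examples.
labels = ["Yes"] * 9 + ["No"] * 5
print(round(binary_entropy(labels), 3))
```

For the 9 positive and 5 negative PlayTennis examples this comes to roughly 0.940 bits. A collection with only one class has entropy 0, and an even split has entropy 1.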

**INFORMATION GAIN:**

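**Information gain** is the expected reduction in entropy caused by partitioning the examples according to a given attribute.

The information gain, Gain(S, A), of an attribute A, relative to a collection of examples S, is defined as

$$\mathrm{Gain}(S, A) = \mathrm{Entropy}(S) - \sum_{v \in \mathrm{Values}(A)} \frac{|S_v|}{|S|}\,\mathrm{Entropy}(S_v)$$

where Values(A) is the set of all possible values of attribute A, and $S_v$ is the subset of S for which attribute A has value v.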
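
To make the definition concrete, here is a minimal sketch that computes the gain of each attribute on the 14 PlayTennis examples, using only the standard library. The names `ROWS`, `ATTRS`, `entropy`, and `gain` are chosen for this illustration, and `entropy` is written for any number of classes, which reduces to the two-class formula here.

```python
import math
from collections import Counter

# PlayTennis rows: (Outlook, Temperature, Humidity, Wind, PlayTennis)
ROWS = [
    ("Sunny", "Hot", "High", "Weak", "No"),
    ("Sunny", "Hot", "High", "Strong", "No"),
    ("Overcast", "Hot", "High", "Weak", "Yes"),
    ("Rain", "Mild", "High", "Weak", "Yes"),
    ("Rain", "Cool", "Normal", "Weak", "Yes"),
    ("Rain", "Cool", "Normal", "Strong", "No"),
    ("Overcast", "Cool", "Normal", "Strong", "Yes"),
    ("Sunny", "Mild", "High", "Weak", "No"),
    ("Sunny", "Cool", "Normal", "Weak", "Yes"),
    ("Rain", "Mild", "Normal", "Weak", "Yes"),
    ("Sunny", "Mild", "Normal", "Strong", "Yes"),
    ("Overcast", "Mild", "High", "Strong", "Yes"),
    ("Overcast", "Hot", "Normal", "Weak", "Yes"),
    ("Rain", "Mild", "High", "Strong", "No"),
]
ATTRS = ["Outlook", "Temperature", "Humidity", "Wind"]


def entropy(labels):
    """Entropy in bits of a list of class labels."""
    total = len(labels)
    return sum(-(n / total) * math.log2(n / total)
               for n in Counter(labels).values())


def gain(rows, attr_index):
    """Entropy of all rows minus the weighted entropy of each subset."""
    labels = [row[-1] for row in rows]
    remainder = 0.0
    for value in {row[attr_index] for row in rows}:
        subset = [row[-1] for row in rows if row[attr_index] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return entropy(labels) - remainder


gains = {name: gain(ROWS, i) for i, name in enumerate(ATTRS)}
for name, g in gains.items():
    print(f"{name}: {g:.3f}")
```

Outlook has the largest gain (about 0.247), followed by Humidity (about 0.152), Wind (about 0.048), and Temperature (about 0.029), so ID3 would test Outlook at the root.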

### Dataset:

The PlayTennis dataset is saved as a .csv (comma-separated values) file in the current working directory. If it is stored elsewhere, use the complete path to the dataset in the program.

| Day | Outlook | Temperature | Humidity | Wind | PlayTennis |
|-----|----------|-------------|----------|--------|------------|
| D1 | Sunny | Hot | High | Weak | No |
| D2 | Sunny | Hot | High | Strong | No |
| D3 | Overcast | Hot | High | Weak | Yes |
| D4 | Rain | Mild | High | Weak | Yes |
| D5 | Rain | Cool | Normal | Weak | Yes |
| D6 | Rain | Cool | Normal | Strong | No |
| D7 | Overcast | Cool | Normal | Strong | Yes |
| D8 | Sunny | Mild | High | Weak | No |
| D9 | Sunny | Cool | Normal | Weak | Yes |
| D10 | Rain | Mild | Normal | Weak | Yes |
| D11 | Sunny | Mild | Normal | Strong | Yes |
| D12 | Overcast | Mild | High | Strong | Yes |
| D13 | Overcast | Hot | Normal | Weak | Yes |
| D14 | Rain | Mild | High | Strong | No |
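
The program in this tutorial refers to a class column named `answer` and compares it with the value `yes`, whereas the table above uses `PlayTennis` with `Yes`/`No` and has a `Day` column. If your CSV mirrors the table, a sketch like the following could adapt it with pandas. The inline `csv_text` stands in for the file contents (only the first four rows are shown), and the target layout is an assumption read off the program, not a documented file format.

```python
import io

import pandas as pd

# First rows of the table above, written as comma-separated text.
# In practice this would be the contents of your CSV file.
csv_text = """Day,Outlook,Temperature,Humidity,Wind,PlayTennis
D1,Sunny,Hot,High,Weak,No
D2,Sunny,Hot,High,Strong,No
D3,Overcast,Hot,High,Weak,Yes
D4,Rain,Mild,High,Weak,Yes
"""

data = pd.read_csv(io.StringIO(csv_text))

# Day is a row identifier, not a feature: it is unique per example, so it
# would trivially get the highest information gain if left in.
data = data.drop(columns=["Day"])

# Assumed target layout: a class column called "answer" holding "yes"/"no".
data = data.rename(columns={"PlayTennis": "answer"})
data["answer"] = data["answer"].str.lower()

print(list(data.columns))
print(data["answer"].tolist())
```

Attribute values such as `Sunny` are left untouched, since the program only compares them with each other.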

### Python Program to Implement and Demonstrate Decision Tree ID3 Algorithm

```python
import math

import numpy as np
import pandas as pd

# Assumes the CSV has one column per attribute plus a class column named
# "answer" holding "yes"/"no", and no row-identifier column such as Day.
data = pd.read_csv("3-dataset.csv")
features = [feat for feat in data.columns if feat != "answer"]


class Node:
    def __init__(self):
        self.children = []
        self.value = ""      # attribute name, or attribute value on a branch
        self.isLeaf = False
        self.pred = ""       # predicted class (leaf nodes only)


def entropy(examples):
    """Two-class entropy of the "answer" column."""
    pos = float((examples["answer"] == "yes").sum())
    neg = float(len(examples)) - pos
    if pos == 0.0 or neg == 0.0:
        return 0.0
    p = pos / (pos + neg)
    n = neg / (pos + neg)
    return -(p * math.log(p, 2) + n * math.log(n, 2))


def info_gain(examples, attr):
    """Expected reduction in entropy from splitting examples on attr."""
    gain = entropy(examples)
    for u in np.unique(examples[attr]):
        subdata = examples[examples[attr] == u]
        gain -= (len(subdata) / len(examples)) * entropy(subdata)
    return gain


def ID3(examples, attrs):
    root = Node()

    # Pick the attribute with the highest information gain. Starting below
    # zero guarantees an attribute is chosen even when every gain is 0.
    max_gain = -1.0
    max_feat = ""
    for feature in attrs:
        gain = info_gain(examples, feature)
        if gain > max_gain:
            max_gain = gain
            max_feat = feature
    root.value = max_feat

    new_attrs = [a for a in attrs if a != max_feat]
    for u in np.unique(examples[max_feat]):
        subdata = examples[examples[max_feat] == u]
        child = Node()
        child.value = u
        if entropy(subdata) == 0.0 or not new_attrs:
            # Pure subset, or no attributes left to test:
            # label the leaf with the most common class.
            child.isLeaf = True
            child.pred = subdata["answer"].mode()[0]
        else:
            child.children.append(ID3(subdata, new_attrs))
        root.children.append(child)
    return root


def printTree(root: Node, depth=0):
    print("\t" * depth, end="")
    print(root.value, end="")
    if root.isLeaf:
        print(" -> ", root.pred)
    print()
    for child in root.children:
        printTree(child, depth + 1)


root = ID3(data, features)
printTree(root)
```
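
The exercise also asks to classify a new sample, which the program above does not do. The sketch below shows one possible way to walk a tree of the same shape: attribute nodes whose children are value branches, each branch being either a leaf or the holder of one further attribute node. To stay self-contained it rebuilds, by hand, the tree commonly derived for the PlayTennis data. The helpers `branch` and `attribute` and the function `classify` are hypothetical names introduced for this illustration. Here `pred` is a plain string; if `pred` holds a one-element array, take its first element.

```python
class Node:
    """Same fields as the tree nodes used by the ID3 program."""

    def __init__(self, value="", is_leaf=False, pred=""):
        self.children = []
        self.value = value
        self.isLeaf = is_leaf
        self.pred = pred


def branch(value, subtree=None, pred=""):
    """Hypothetical helper: a branch node for one attribute value."""
    node = Node(value, is_leaf=subtree is None, pred=pred)
    if subtree is not None:
        node.children.append(subtree)
    return node


def attribute(name, *branches):
    """Hypothetical helper: an attribute node with its value branches."""
    node = Node(name)
    node.children.extend(branches)
    return node


def classify(root, sample, default=None):
    """Follow the branch matching the sample until a leaf is reached."""
    for child in root.children:
        if child.value == sample.get(root.value):
            if child.isLeaf:
                return child.pred
            # A non-leaf branch holds exactly one child: the next attribute node.
            return classify(child.children[0], sample, default)
    return default  # attribute value not present in the tree


tree = attribute(
    "Outlook",
    branch("Overcast", pred="Yes"),
    branch("Rain", attribute("Wind",
                             branch("Strong", pred="No"),
                             branch("Weak", pred="Yes"))),
    branch("Sunny", attribute("Humidity",
                              branch("High", pred="No"),
                              branch("Normal", pred="Yes"))),
)

sample = {"Outlook": "Sunny", "Temperature": "Cool",
          "Humidity": "High", "Wind": "Strong"}
print(classify(tree, sample))
```

The sample follows Outlook = Sunny and then Humidity = High, so it is classified as `No`. Returning a `default` for unseen attribute values is a design choice of this sketch; another option would be to fall back to the most common class in the training data.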

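### Output:

For the PlayTennis data above (without the Day column), ID3 selects Outlook as the root because it has the highest information gain. The Overcast branch is a pure Yes leaf, the Sunny branch is split on Humidity, and the Rain branch is split on Wind. `printTree` prints this tree one node per line, indented by depth.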

## Summary

This tutorial discusses how to implement and demonstrate the Decision Tree ID3 algorithm in Python. The training data is read from a .csv file.