Pre-requisites
- Recursion
- Object Oriented Programming
- Linked List (optional)
Introduction
Trees, particularly binary trees, are a fundamental data structure in computer science, renowned for their efficiency in various operations.
- Efficient Insertion, Deletion, and Search: One of the primary reasons for using binary trees is their efficiency in adding, deleting, and finding items. In a balanced binary tree, these operations can be performed in O(log n) time, making them significantly faster than linear data structures like arrays or linked lists for large data sets.
- Ordered Structure: Binary trees have an inherent ordered nature, especially binary search trees (BSTs). In a BST, the left subtree of a node contains only nodes with keys less than the node’s key, and the right subtree contains only nodes with keys greater than the node’s key. This property ensures that items are stored in an ordered manner, facilitating operations like in-order traversal, which retrieves items in a sorted sequence.
- Cost Efficiency: Binary trees are also memory-efficient. They dynamically allocate memory for nodes, unlike arrays, which may allocate more memory than needed. This dynamic allocation makes them suitable for systems where memory efficiency is crucial.
Limitations of Binary Trees
While binary trees offer several advantages, they are not without limitations, especially in certain configurations.
- Performance in Unbalanced Trees: A major limitation arises with unbalanced binary trees. In the worst case, where the tree takes a form similar to a linked list (e.g., each node has only one child), the time complexity of operations like search, insert, and delete degrades to O(n). This situation effectively nullifies the primary purpose of using a binary tree by losing the log(n) efficiency.
- Inefficiency with Sorted Data: When inserting already sorted data into a regular BST, the tree becomes unbalanced, forming a structure similar to a linked list. This leads to inefficient operations, contrary to the desired quick access times of a BST.
Solving the Problem: Self-Balancing Binary Trees
To address these limitations, self-balancing binary trees like AVL (Adelson-Velsky and Landis) trees were developed.
- What are Self-Balancing Binary Trees?: Self-balancing binary trees are a type of binary tree that automatically adjusts their structure to remain balanced at all times.
- AVL Trees: AVL trees are a popular type of self-balancing binary tree. They maintain a balance factor for each node (the difference in heights between the left and right subtrees) and perform rotations to ensure that this balance factor remains -1, 0, or +1. This balancing ensures that the tree remains approximately balanced, maintaining O(log n) complexity for insertions, deletions, and searches.
Where binary trees are used?
Binary trees are versatile data structures extensively used across various domains in computer science. Here’s an overview of some key areas where binary trees find significant application
Search and Sorting Algorithms: Binary Search Trees (BST)
Binary Search Trees (BSTs) are a cornerstone in the world of search and sorting algorithms. In a BST, each node is arranged in a manner where it holds a value greater than all the values in its left subtree and less than those in its right subtree. This unique property allows operations like search, insertion, and deletion to be performed in O(log n) time complexity in the best case, making it highly efficient compared to linear data structures. BSTs are extensively used in various software systems where data organization and quick retrieval are crucial.
Database Indices: AVL and Red-Black Trees
Databases frequently utilize binary trees, particularly balanced forms such as AVL and Red-Black Trees, to enhance the efficiency of data retrieval. These trees maintain a balance through rotations and color changes (in the case of Red-Black Trees), ensuring that the time complexity for insertion, deletion, and search remains at O(log n). This balance is key to maintaining high performance in database indexing, enabling quick access to data and efficient handling of queries.
Priority Queues and Heaps
Heaps, a type of complete binary tree, are crucial for implementing priority queues. In a heap, the tree is organized so that each parent node’s value is either greater than or equal to (in a max heap) or less than or equal to (in a min heap) its children’s values. This structure allows for efficient extraction of the minimum or maximum element, a process central to algorithms like heap sort and is also used in priority scheduling in operating systems.
Expression Parsing
Expression parsing, a fundamental aspect of compilers, uses binary trees to evaluate mathematical expressions or logical conditions. Each node in these trees typically represents an operator, while the leaves represent operands. By traversing these trees, compilers can effectively interpret and evaluate complex expressions, making them indispensable in software development, especially in programming language compilers and calculators.
Huffman Coding Trees
In the data compression, Huffman Coding Trees play a pivotal role. They are used to create a compression scheme where each leaf of the tree represents a character in the data, and the path from the root to that leaf represents its unique binary code. This method reduces the overall size of the data, making it more efficient for storage and transmission.
Routing Algorithms in Networking
Binary trees find their application in networking, particularly in routing algorithms. They help in organizing network paths, thereby facilitating efficient routing of data packets between various nodes. The structured nature of binary trees allows for manageable and scalable routing mechanisms, enhancing the speed and efficiency of data transmission across complex networks.
Abstract Syntax Trees (ASTs) in Compilers
ASTs are a special type of binary tree used extensively in programming language compilers. They represent the syntactical structure of the source code in a hierarchical manner, where each node denotes a construct in the code. ASTs play a crucial role in both compiling and interpreting code, forming the backbone of the syntax analysis phase in compilers.
Decision Trees in Machine Learning
In machine learning, binary trees form the basis of decision trees used for tasks like classification and regression. These trees split the dataset into subsets based on certain decision criteria, forming a tree-like model of decisions. Decision trees are valued for their clarity and ease of interpretation, making them popular in various machine learning applications.
Graphical Rendering: Binary Space Partitioning (BSP) Trees
BSP Trees are a type of binary tree used in computer graphics, particularly in 3D rendering and video games. They help in organizing objects in a scene, which optimizes the rendering process by deciding the order in which these objects need to be rendered. This improves the efficiency of graphical rendering, which is crucial for real-time applications.
File Systems
In file systems, variations of binary trees, such as B-trees and B+ trees, are employed for efficient storage and retrieval of files and directories. These tree structures are particularly effective in handling large-scale storage systems, enhancing the performance of file operations by ensuring balanced trees and thus maintaining O(log n) time complexity for most operations.
Terminologies
Root
The root of a binary tree is the topmost node and does not have a parent. It’s the starting point of the tree.
Leaf Node
A leaf node is a node with no children. It’s at the end of a branch in the tree.
Internal Node
An internal node (or non-leaf node) is any node in the tree that has one or more children.
Sibling Nodes
Sibling nodes are nodes that share the same parent.
Ancestors and Descendants
Ancestors of a node include its parent, the parent’s parent, and so on up to the root. Descendants are any nodes for which it is an ancestor.
Subtree
A subtree is a portion of a tree that consists of a node and all its descendants.
Depth of a Node
The depth of a node is the number of edges from the root to the node.
Height of a Node
The height of a node is the number of edges on the longest path from the node to a leaf.
Height of the Tree
The height of the tree is the height of its root node.
Level
The level of a node is defined by 1 + the number of connections between the node and the root.
Root (1)[Height of Tree: 3] ----> Level 0
/ \
/ \
/ \
Sibling Nodes (2) (3) Sibling Nodes ----> Level 1
/ \ /
/ \ /
Internal Nodes (4) (5) (6) Internal Nodes ----> Level 2
/ \
/ \
Leaf Nodes (7) (8) Leaf Nodes ----> Level 3
Terminologies:
-------------
- Root: Top node without a parent (Node 1).
- Leaf Nodes: Nodes with no children (Nodes 7, 8).
- Internal Nodes: Nodes with at least one child (Nodes 2, 3, 4, 6).
- Sibling Nodes: Nodes sharing the same parent (e.g., Nodes 2 and 3).
- Ancestors of Node 8: Nodes 6, 3, 1.
- Descendants of Node 2: Nodes 4, 5, 7.
- Subtree: Node 2 and all its descendants (Nodes 4, 5, 7).
- Depth of Node 6: 2 (Edges from Root to Node 6).
- Height of Node 2: 2 (Edges on longest path from Node 2 to Leaf).
- Level: Defined by distance from the root (Root is Level 0).
Types of binary trees
Full Binary Tree
- Every node in the tree has either 0 or 2 children.
- No node has only one child.
(A)
/ \
(B) (C)
/ \
(D) (E)
Complete Binary Tree
- All levels are completely filled except possibly the last level.
- The last level has all keys as left as possible.
(A)
/ \
(B) (C)
/ \ /
(D) (E)(F)
Perfect Binary Tree
- All internal nodes have two children and all leaves are at the same level.
- A perfect binary tree of height
h
has2^(h+1) – 1
nodes. (Geometric progression) - Total leaf nodes in a perfect binary tree of height
**h**
is**2^h**
- Total internal nodes in a perfect binary tree of height
**h**
is**2^(h+1) - 1 - 2^h**
(A)
/ \
(B) (C)
/ \ / \
(D) (E)(F) (G)
- All levels are completely filled.
Balanced Binary Tree
The height of the left and right subtrees of any node at the same level differ by no more than 1.
(A)
/ \
(B) (C)
/ / \
(D) (E) (F)
- Differences in height are within 1 for all nodes.
Degenerate (or Pathological) Tree
- Every internal node has one child.
- Resembles a linked list.
(A)
\
(B)
\
(C)
\
(D)
- Resembles a straight line (linked list).
Binary Search Tree (BST)
- Left subtree of a node contains only nodes with keys less than the node’s key.
- Right subtree of a node contains only nodes with keys greater than the node’s key.
- The left and right subtree each must also be a binary search tree.
(5)
/ \
(3) (7)
/ \ \
(2) (4) (8)
- Follows BST properties.
Binary Tree Implementation (code)
Lets start by implementing a simple binary tree. In a binary tree, each node has at most two children, referred to as the left and right children. The Tree
class is designed to hold a value and pointers to these two children.
class Tree:
def __init__(self, key):
self.left = None
self.right: Tree = None
self.value: Tree = key
def insert(root, key) -> Tree:
if root is None:
return Tree(key)
else:
position = input(f'Do you want to insert {key} in left or right? [l/r]')
if position == 'l':
root.left = insert(root.left, key)
else:
root.right = insert(root.right, key)
return root
root = None
root = insert(root, 10)
root = insert(root, 5)
root = insert(root, 15)
root = insert(root, 3)
root = insert(root, 7)
root = insert(root, 12)
root = insert(root, 18)
In this snippet, the Tree
class initializes with a key
, and left
and right
are set to None
, indicating no children initially.
The insert
function takes a root
node and a key
. If root
is None
, it creates a new Tree
node with the key
. Otherwise, it prompts the user to decide whether to insert the new key to the left or right of the current node. This function uses recursion to navigate through the tree and insert the new node at the correct position.
Next, lets add a height attribute to each node. The height of a node is nothing but the number of edges on the longest path from the node to a leaf.
class Tree:
def __init__(self, key):
self.left = None
self.right: Tree = None
self.value: Tree = key
self.height = 0
def insert(root, key) -> Tree:
if root is None:
return Tree(key)
else:
position = input(f'Do you want to insert {key} in left or right? [l/r]')
if position == 'l':
root.left = insert(root.left, key)
else:
root.right = insert(root.right, key)
left_height = root.left.height if root.left else 0
right_height = root.right.height if root.right else 0
root.height = 1 + max(left_height, right_height)
return root
def inorder_traversal(root):
if root:
inorder_traversal(root.left)
print(f"({root.value}, h={root.height})", end=" ")
inorder_traversal(root.right)
root = None
root = insert(root, 10)
root = insert(root, 5)
root = insert(root, 15)
root = insert(root, 3)
root = insert(root, 7)
root = insert(root, 12)
root = insert(root, 18)
inorder_traversal(root)
In the revised insert
function, after inserting a node, we update the height
attribute. The height of a node is calculated as 1 plus the maximum height of its left and right children. This recursive approach ensures that each node’s height is updated for every insert.
Can We Convert this to a Binary Search Tree?
Yes, converting our existing binary tree to a Binary Search Tree is fairly straightforward. The key step involves a simple tweaking in the insert
method. Specifically, we need to ensure that any key less than a node’s value is directed to the left subtree, while keys greater than the node’s value are placed in the right subtree. This adjustment aligns our tree with the fundamental ordering principle of a BST.
Binary Search Tree Implementation (code)
class TreeNode:
def __init__(self, key):
self.left = None
self.right = None
self.val = key
self.height = 0
def insert(node, key):
if node is None:
return TreeNode(key)
else:
if key < node.val:
node.left = insert(node.left, key)
else:
node.right = insert(node.right, key)
left_height = node.left.height if node.left else 0
right_height = node.right.height if node.right else 0
node.height = 1 + max(left_height, right_height)
return node
def inorder_traversal(node):
if node:
inorder_traversal(node.left)
print(node.val, end=" ")
inorder_traversal(node.right)
def preorder_traversal(node):
if node:
print(node.val, end=" ")
preorder_traversal(node.left)
preorder_traversal(node.right)
def postorder_traversal(node):
if node:
postorder_traversal(node.left)
postorder_traversal(node.right)
print(node.val, end=" ")
root = None
root = insert(root, 8)
root = insert(root, 3)
root = insert(root, 10)
root = insert(root, 1)
root = insert(root, 6)
inorder_traversal(root)
print("\n")
preorder_traversal(root)
print("\n")
postorder_traversal(root)
Ensuring Tree Balance A balanced binary tree is one where the difference in heights of the left and right subtrees of any node at same level is no more than 1. This property is essential for maintaining the efficiency of operations in the tree.
Now that we have height of each node, lets check if the tree is balanced.
class TreeNode:
def __init__(self, key):
self.left = None
self.right = None
self.val = key
self.height = 0
def insert(node, key):
if node is None:
return TreeNode(key)
else:
if key < node.val:
node.left = insert(node.left, key)
else:
node.right = insert(node.right, key)
left_height = node.left.height if node.left else 0
right_height = node.right.height if node.right else 0
node.height = 1 + max(left_height, right_height)
return node
def inorder_traversal(node):
if node:
inorder_traversal(node.left)
print(node.val, end=" ")
inorder_traversal(node.right)
def preorder_traversal(node):
if node:
print(node.val, end=" ")
preorder_traversal(node.left)
preorder_traversal(node.right)
def postorder_traversal(node):
if node:
postorder_traversal(node.left)
postorder_traversal(node.right)
print(node.val, end=" ")
def is_balanced(node) -> bool:
if node is None:
return True
left_height = node.left.height if node.left else 0
right_height = node.right.height if node.right else 0
if abs(left_height - right_height) > 1:
return False
return is_balanced(node.left) and is_balanced(node.right)
root = None
root = insert(root, 8)
root = insert(root, 3)
root = insert(root, 10)
root = insert(root, 1)
root = insert(root, 6)
inorder_traversal(root)
print("\n")
preorder_traversal(root)
print("\n")
postorder_traversal(root)
print("\n")
print(is_balanced(root))
Binary Tree Traversal Methods and Their Uses
1. Pre-order Traversal (N → L → R)
Process: Visit the node first, then traverse the left and right subtrees.
Uses:
- Tree Copying or Cloning: In scenarios where an exact replica of the tree structure is required, pre-order traversal is ideal. By visiting nodes before their children, it preserves the tree’s structure in the copy.
- Serialization and Deserialization: For converting a tree into a linear sequence of values (like for storing in a file or transmitting over a network), pre-order traversal is effective. It captures the tree structure in a way that allows for accurate reconstruction later.
- Expression Trees: In compiler design, pre-order traversal is used to evaluate prefix expressions in expression trees.
Example:
Pre-order 8 [8,3,1,6,10] / \ | | | | / \ | ----- | 3 10 N L R / \ / \ 1 6
2. In-order Traversal (L → N → R)
Process: Traverse the left subtree, visit the node, then traverse the right subtree.
Uses:
- Sorted Traversal in BSTs: In a Binary Search Tree, in-order traversal retrieves the elements in ascending order. This property is crucial for operations like displaying sorted data, range queries, and database indexing.
- Debugging: It’s a simple way to check if a binary tree is a valid BST or not, as the output should be in sorted order for BSTs.
Example:
In-order 8 [1,3,6,8,10] / \ | | | | | / \ L N R | | 3 10 | | | | / \ ----- | | / \ | | | 1 6 L N R
3. Post-order Traversal (L → R → N)
Process: Traverse the left and right subtrees, then visit the node.
Uses:
- Tree Deletion: For safely deleting nodes from a tree, post-order traversal ensures that children are processed before the parent, which is crucial for clean, recursive deletion.
- Bottom-Up Calculations: In algorithms that require processing from leaf nodes upwards, like calculating the height of nodes, subtree sizes, or dynamic programming on trees, post-order traversal is essential.
- Expression Trees: In evaluating postfix expressions in expression trees, used in certain types of calculators and interpreters.
Example:
Post-order 8 [1,6,3,10,8] / \ | | | | | / \ L R N | | 3 10 | | | | | / \ ----- | | / \ | | | 1 6 L R N