What are Data Structures?
Data structures are specialized formats for organizing, processing, retrieving, and storing data. They provide a way to manage large amounts of data efficiently for uses such as large databases and internet indexing services.
Think of data structures as different types of containers, each designed to store data in a specific way that makes certain operations more efficient.
Why are Data Structures Important?
def find_in_list(arr, target):
for i in range(len(arr)):
if arr[i] == target:
return i
return -1
def find_in_set(data_set, target):
return target in data_set
import time
large_list = list(range(1000000))
large_set = set(range(1000000))
start = time.time()
find_in_list(large_list, 999999)
print(f"List search time: {time.time() - start}")
start = time.time()
find_in_set(large_set, 999999)
print(f"Set search time: {time.time() - start}")
Types of Data Structures
1. Linear Data Structures
Data elements arranged in sequential order.
Arrays (Lists in Python)
arr = [1, 2, 3, 4, 5]
arr.append(6)
arr.insert(0, 0)
arr.pop()
arr[2] = 10
Linked Lists
class Node:
def __init__(self, data):
self.data = data
self.next = None
class LinkedList:
def __init__(self):
self.head = None
def append(self, data):
if not self.head:
self.head = Node(data)
return
current = self.head
while current.next:
current = current.next
current.next = Node(data)
Stacks
class Stack:
def __init__(self):
self.items = []
def push(self, item):
self.items.append(item)
def pop(self):
if not self.is_empty():
return self.items.pop()
def is_empty(self):
return len(self.items) == 0
def peek(self):
if not self.is_empty():
return self.items[-1]
Queues
from collections import deque
class Queue:
def __init__(self):
self.items = deque()
def enqueue(self, item):
self.items.append(item)
def dequeue(self):
if not self.is_empty():
return self.items.popleft()
def is_empty(self):
return len(self.items) == 0
def front(self):
if not self.is_empty():
return self.items[0]
2. Non-Linear Data Structures
Trees
class TreeNode:
def __init__(self, data):
self.data = data
self.left = None
self.right = None
class BinarySearchTree:
def __init__(self):
self.root = None
def insert(self, data):
if not self.root:
self.root = TreeNode(data)
else:
self._insert_recursive(self.root, data)
def _insert_recursive(self, node, data):
if data < node.data:
if node.left is None:
node.left = TreeNode(data)
else:
self._insert_recursive(node.left, data)
else:
if node.right is None:
node.right = TreeNode(data)
else:
self._insert_recursive(node.right, data)
Graphs
class Graph:
def __init__(self):
self.graph = {}
def add_vertex(self, vertex):
if vertex not in self.graph:
self.graph[vertex] = []
def add_edge(self, vertex1, vertex2):
if vertex1 in self.graph:
self.graph[vertex1].append(vertex2)
else:
self.graph[vertex1] = [vertex2]
Abstract Data Types (ADTs)
Abstract Data Types are mathematical models for data types where a data type is defined by its behavior from the point of view of a user of the data.
Common ADTs include:
- List ADT
- Stack ADT
- Queue ADT
- Dictionary ADT
- Tree ADT
- Graph ADT
Choosing the Right Data Structure
The choice of data structure depends on:
- What needs to be stored
- Cost of operations
- Memory usage
- Ease of implementation
Data Structure | Advantages | Disadvantages | Best Used When |
---|
Array | Fast access | Fixed size | Size is known |
Linked List | Dynamic size | Slow access | Frequent insertions |
Stack | LIFO access | Limited access | Parsing, backtracking |
Queue | FIFO access | Limited access | Order processing |
Tree | Hierarchical | Complex | Hierarchical data |
Graph | Relationships | Complex | Network modeling |
Hash Table | Fast access | Space overhead | Fast lookup needed |
Complexity Analysis
Understanding time and space complexity is crucial for choosing the right data structure:
def constant_time(arr):
return arr[0] if arr else None
def linear_time(arr):
total = 0
for num in arr:
total += num
return total
def quadratic_time(arr):
n = len(arr)
for i in range(n):
for j in range(n):
print(arr[i], arr[j])
Memory Considerations
Different data structures have different memory requirements:
import sys
numbers_list = list(range(1000))
numbers_set = set(range(1000))
numbers_dict = {i: i for i in range(1000)}
print(f"List size: {sys.getsizeof(numbers_list)}")
print(f"Set size: {sys.getsizeof(numbers_set)}")
print(f"Dict size: {sys.getsizeof(numbers_dict)}")