What are Data Structures?

Data structures are specialized formats for organizing, processing, retrieving, and storing data. They provide a way to manage large amounts of data efficiently for uses such as large databases and internet indexing services.

Think of data structures as different types of containers, each designed to store data in a specific way that makes certain operations more efficient.

Why are Data Structures Important?

# Example showing importance of data structure choice
# Finding an element:

# Using List (Linear Search) - O(n)
def find_in_list(arr, target):
    for i in range(len(arr)):
        if arr[i] == target:
            return i
    return -1

# Using Set (Hash Table) - O(1)
def find_in_set(data_set, target):
    return target in data_set

# Performance comparison
import time

large_list = list(range(1000000))
large_set = set(range(1000000))

start = time.time()
find_in_list(large_list, 999999)
print(f"List search time: {time.time() - start}")

start = time.time()
find_in_set(large_set, 999999)
print(f"Set search time: {time.time() - start}")

Types of Data Structures

1. Linear Data Structures

Data elements arranged in sequential order.

Arrays (Lists in Python)

# Basic array operations
arr = [1, 2, 3, 4, 5]
arr.append(6)       # Add element at end
arr.insert(0, 0)    # Add element at index
arr.pop()          # Remove last element
arr[2] = 10        # Update element

# Time Complexities
# Access: O(1)
# Search: O(n)
# Insert/Delete at end: O(1)
# Insert/Delete at middle: O(n)

Linked Lists

class Node:
    def __init__(self, data):
        self.data = data
        self.next = None

class LinkedList:
    def __init__(self):
        self.head = None
    
    def append(self, data):
        if not self.head:
            self.head = Node(data)
            return
        current = self.head
        while current.next:
            current = current.next
        current.next = Node(data)

# Time Complexities
# Access: O(n)
# Search: O(n)
# Insert/Delete at beginning: O(1)
# Insert/Delete at end: O(n)

Stacks

class Stack:
    def __init__(self):
        self.items = []
    
    def push(self, item):
        self.items.append(item)
    
    def pop(self):
        if not self.is_empty():
            return self.items.pop()
    
    def is_empty(self):
        return len(self.items) == 0
    
    def peek(self):
        if not self.is_empty():
            return self.items[-1]

# Time Complexities
# Push: O(1)
# Pop: O(1)
# Peek: O(1)

Queues

from collections import deque

class Queue:
    def __init__(self):
        self.items = deque()
    
    def enqueue(self, item):
        self.items.append(item)
    
    def dequeue(self):
        if not self.is_empty():
            return self.items.popleft()
    
    def is_empty(self):
        return len(self.items) == 0
    
    def front(self):
        if not self.is_empty():
            return self.items[0]

# Time Complexities
# Enqueue: O(1)
# Dequeue: O(1)
# Front: O(1)

2. Non-Linear Data Structures

Trees

class TreeNode:
    def __init__(self, data):
        self.data = data
        self.left = None
        self.right = None

class BinarySearchTree:
    def __init__(self):
        self.root = None
    
    def insert(self, data):
        if not self.root:
            self.root = TreeNode(data)
        else:
            self._insert_recursive(self.root, data)
    
    def _insert_recursive(self, node, data):
        if data < node.data:
            if node.left is None:
                node.left = TreeNode(data)
            else:
                self._insert_recursive(node.left, data)
        else:
            if node.right is None:
                node.right = TreeNode(data)
            else:
                self._insert_recursive(node.right, data)

# Time Complexities (Balanced BST)
# Search: O(log n)
# Insert: O(log n)
# Delete: O(log n)

Graphs

class Graph:
    def __init__(self):
        self.graph = {}
    
    def add_vertex(self, vertex):
        if vertex not in self.graph:
            self.graph[vertex] = []
    
    def add_edge(self, vertex1, vertex2):
        if vertex1 in self.graph:
            self.graph[vertex1].append(vertex2)
        else:
            self.graph[vertex1] = [vertex2]

# Time Complexities
# Add Vertex: O(1)
# Add Edge: O(1)
# Storage: O(V + E)

Abstract Data Types (ADTs)

Abstract Data Types are mathematical models for data types where a data type is defined by its behavior from the point of view of a user of the data.

Common ADTs include:

  1. List ADT
  2. Stack ADT
  3. Queue ADT
  4. Dictionary ADT
  5. Tree ADT
  6. Graph ADT

Choosing the Right Data Structure

The choice of data structure depends on:

  1. What needs to be stored
  2. Cost of operations
  3. Memory usage
  4. Ease of implementation
Data StructureAdvantagesDisadvantagesBest Used When
ArrayFast accessFixed sizeSize is known
Linked ListDynamic sizeSlow accessFrequent insertions
StackLIFO accessLimited accessParsing, backtracking
QueueFIFO accessLimited accessOrder processing
TreeHierarchicalComplexHierarchical data
GraphRelationshipsComplexNetwork modeling
Hash TableFast accessSpace overheadFast lookup needed

Complexity Analysis

Understanding time and space complexity is crucial for choosing the right data structure:

# Examples of different time complexities

def constant_time(arr):  # O(1)
    return arr[0] if arr else None

def linear_time(arr):    # O(n)
    total = 0
    for num in arr:
        total += num
    return total

def quadratic_time(arr): # O(n²)
    n = len(arr)
    for i in range(n):
        for j in range(n):
            print(arr[i], arr[j])

Memory Considerations

Different data structures have different memory requirements:

import sys

# Memory usage examples
numbers_list = list(range(1000))
numbers_set = set(range(1000))
numbers_dict = {i: i for i in range(1000)}

print(f"List size: {sys.getsizeof(numbers_list)}")
print(f"Set size: {sys.getsizeof(numbers_set)}")
print(f"Dict size: {sys.getsizeof(numbers_dict)}")