Core Implementation
Python strings are immutable sequences of Unicode characters:
# Basic string creation
str1 = "Hello, World!"
str2 = 'Hello, World!' # Single quotes
str3 = '''Multiline
string''' # Triple quotes
# String methods
upper_case = str1.upper()
lower_case = str1.lower()
stripped = str1.strip()
split_words = str1.split(',')
String Buffer Implementation (for mutable strings)
from io import StringIO
class StringBuffer:
def __init__(self):
self.buffer = StringIO()
def append(self, string):
self.buffer.write(string)
def toString(self):
return self.buffer.getvalue()
def clear(self):
self.buffer.seek(0)
self.buffer.truncate(0)
Implementation Variations
1. Rope Data Structure (for large strings)
class RopeNode:
def __init__(self, text=""):
self.weight = len(text)
self.text = text
self.left = None
self.right = None
class Rope:
def __init__(self, text=""):
self.root = self._build_rope(text)
def _build_rope(self, text):
if len(text) <= 8: # Threshold for leaf nodes
return RopeNode(text)
mid = len(text) // 2
node = RopeNode()
node.weight = mid
node.left = self._build_rope(text[:mid])
node.right = self._build_rope(text[mid:])
return node
2. Trie (for string prefix operations)
class TrieNode:
def __init__(self):
self.children = {}
self.is_end = False
class Trie:
def __init__(self):
self.root = TrieNode()
def insert(self, word):
node = self.root
for char in word:
if char not in node.children:
node.children[char] = TrieNode()
node = node.children[char]
node.is_end = True
Time Complexities
Operation | String | StringBuffer | Rope | Trie |
---|
Access by Index | O(1) | O(n) | O(log n) | N/A |
Append | O(n) | O(1)* | O(log n) | O(m)** |
Concatenation | O(n+m) | O(1)* | O(log n) | N/A |
Substring | O(k) | O(n) | O(log n) | N/A |
Find/Search | O(nm)** | O(n*m) | O(n) | O(m) |
Insert | O(n) | O(1)* | O(log n) | O(m) |
Delete | O(n) | O(n) | O(log n) | O(m) |
* Amortized time complexity
** Where m is the length of the word
*** Using naive search. KMP algorithm improves this to O(n+m)
Space Complexities
Implementation | Space Complexity | Additional Notes |
---|
String | O(n) | Immutable, requires new copy for changes |
StringBuffer | O(n) | Mutable, grows dynamically |
Rope | O(n) | Additional O(log n) for tree structure |
Trie | O(ALPHABET_SIZE * m * n) | Where m is average string length |
Common Data Structure Patterns
-
Sliding Window
- Fixed size window
- Variable size window
- Window with constraints
-
Two Pointers
- Start and end
- Same direction
- Opposite directions
-
String Matching Patterns
- KMP Algorithm
- Rabin-Karp
- Boyer-Moore
-
State Machine
- String parsing
- Pattern matching
-
Dynamic Programming
- Longest common substring
- Edit distance
- String alignment
-
Hash-based Patterns
- Rolling hash
- Substring hash
Edge Cases to Consider
- Empty String
s = ""
# Handle: if not s: return None
- Single Character
s = "a"
# Handle: if len(s) == 1: return s
- Unicode Characters
s = "Hello, 世界" # Mixed ASCII and Unicode
s = "🌟" # Emoji
# Use s.encode().decode('utf-8') for proper handling
- Special Characters
s = "Hello\n\tWorld" # Escape sequences
s = r"Hello\n\tWorld" # Raw string
- Case Sensitivity
s1 = "Hello"
s2 = "hello"
# Consider: s1.lower() == s2.lower()
Optimization Techniques
- String Interning
# Python automatically interns small strings
a = 'hello'
b = 'hello'
# id(a) == id(b) is True
# Force interning
from sys import intern
c = intern('hello'*100)
d = intern('hello'*100)
# id(c) == id(d) is True
- Join vs Concatenation
# Bad
result = ""
for s in string_list:
result += s # Creates new string each time
# Good
result = "".join(string_list)
- Format String Optimization
# Bad
s = "Hello " + name + ", you are " + str(age) + " years old"
# Good
s = f"Hello {name}, you are {age} years old"
# or
s = "Hello {}, you are {} years old".format(name, age)
Memory Management
- String Deduplication
# Using weakref for string deduplication
from weakref import WeakValueDictionary
string_cache = WeakValueDictionary()
def deduplicate(s):
if s not in string_cache:
string_cache[s] = s
return string_cache[s]
- Memory-Efficient String Storage
# Using __slots__ for string containers
class StringContainer:
__slots__ = ['value']
def __init__(self, value):
self.value = value
- Garbage Collection
import gc
# Force garbage collection after string operations
large_string = "x" * 10**6
del large_string
gc.collect()
- String Methods vs Regular Expressions
# Slower (regex for simple case)
import re
result = re.sub(r'\s+', ' ', text)
# Faster (string methods)
result = ' '.join(text.split())
- String Views
# Memory efficient string slicing
from memory_view import memoryview
text = bytearray(b'Hello, World!')
view = memoryview(text)
- Batch Processing
def process_large_string(s, chunk_size=1024):
for i in range(0, len(s), chunk_size):
chunk = s[i:i + chunk_size]
# Process chunk
Common Pitfalls
- String Immutability
# Common mistake
s = "hello"
s[0] = 'H' # TypeError: strings are immutable
# Correct way
s = 'H' + s[1:]
- Character vs String Comparison
# Pitfall
if c == "a": # c is a single character
pass
# Better
if c == 'a': # Single quotes for characters
pass
- Unicode Handling
# Potential issue
s = "Hello, 世界"
length = len(s) # May not give expected length
# Better
import unicodedata
length = len(unicodedata.normalize('NFC', s))
- Memory Usage in String Formatting
# Memory inefficient
large_list = ["item"] * 1000000
result = ", ".join(large_list) # Creates large temporary string
# Better (using generator)
def generate_items():
for item in large_list:
yield item
result = ", ".join(generate_items())
- String Pool References
# Unexpected behavior
a = "hello" * 100
b = "hello" * 100
print(a is b) # False, different objects
# Expected behavior
a = "hello"
b = "hello"
print(a is b) # True, same object from string pool