What is the difference between a dictionary and a defaultdict?
A dict is Python’s built-in hash map: key-value pairs where accessing a missing key raises KeyError. A collections.defaultdict is a subclass of dict that automatically creates a default value the first time a missing key is read.
How defaultdict differs: you pass a zero-argument factory callable at construction. When you access a key that isn’t there, the factory is called, the result is stored under that key, and that value is returned — all in one step.
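The call-store-return sequence described above can be observed directly. The sketch below is illustrative only: make_default and MiniDefaultDict are hypothetical names, and the subclass is a simplified imitation built on the dict.__missing__ hook, not the real implementation.

```python
from collections import defaultdict

calls = []  # records every time the factory runs


def make_default():
    """Hypothetical zero-argument factory that returns a fresh list."""
    calls.append("called")
    return []


d = defaultdict(make_default)

first = d["a"]   # miss: factory runs and its result is stored under "a"
second = d["a"]  # hit: the stored value is returned, factory does not run

print(len(calls))       # 1
print(first is second)  # True


class MiniDefaultDict(dict):
    """Simplified imitation of defaultdict using the __missing__ hook."""

    def __init__(self, factory):
        super().__init__()
        self.factory = factory

    def __missing__(self, key):
        # dict.__getitem__ calls this only when the key is absent
        value = self.factory()
        self[key] = value
        return value


m = MiniDefaultDict(int)
m["x"] += 1
print(m)  # {'x': 1}
```

The factory runs once per missing key, and later lookups return the same stored object.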
from collections import defaultdict

counts = defaultdict(int)  # factory = int, produces 0
for word in "the quick brown fox".split():
    counts[word] += 1  # no KeyError; first access creates 0

words = "the quick brown fox".split()
groups = defaultdict(list)  # factory = list, produces []
for word in words:
    groups[len(word)].append(word)  # auto-creates the list

The equivalent with a plain dict is doable but noisier: you use d.setdefault(key, factory()) or if k not in d: d[k] = [].
groups = {}
for word in words:
    groups.setdefault(len(word), []).append(word)

Common factories:
- int: counters (0 default).
- list: grouping values into buckets.
- set: collecting unique values per key.
- lambda: 0.0 or lambda: "N/A": custom scalar defaults.
- A class: lazily instantiate rich objects.
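As a rough illustration of the less common factories, the sketch below uses made-up data and a hypothetical Stats class; none of these names come from the text above.

```python
from collections import defaultdict

# set: collect unique values per key (duplicates collapse)
tags_by_user = defaultdict(set)
for user, tag in [("ann", "py"), ("ann", "py"), ("bob", "go")]:
    tags_by_user[user].add(tag)
print(dict(tags_by_user))  # {'ann': {'py'}, 'bob': {'go'}}

# lambda: custom scalar default
scores = defaultdict(lambda: 0.0)
scores["ann"] += 1.5
print(scores["bob"])  # 0.0


# class: lazily instantiate a richer object per key
class Stats:
    """Hypothetical accumulator used only for this example."""

    def __init__(self):
        self.count = 0
        self.total = 0.0


stats = defaultdict(Stats)
for name, value in [("a", 2.0), ("a", 4.0), ("b", 1.0)]:
    entry = stats[name]  # creates Stats() on first access
    entry.count += 1
    entry.total += value
print(stats["a"].count, stats["a"].total)  # 2 6.0
```

Any callable that takes no arguments works as a factory, which is why a class can be passed directly.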
Behavior to know:
- Reading a missing key with d[key] inserts and returns the factory result. This can surprise code that expects a “pure” read to leave the dict unchanged.
- in and .get() do not trigger the factory; use them if you want to probe without auto-insert.
- The factory is stored in .default_factory; setting it to None makes a defaultdict behave like a normal dict again.
- Pickling/unpickling works as long as the factory itself can be pickled: a type such as int or list, or a named module-level function, is fine, but a lambda is not. json.dumps also works and serializes the current contents as a regular object.
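These behaviors are easy to check in a short session. The following is a minimal sketch that uses list as the factory so the pickle round trip succeeds.

```python
import json
import pickle
from collections import defaultdict

d = defaultdict(list)

# Probing with `in` and .get() leaves the dict empty
print("x" in d)    # False
print(d.get("x"))  # None
print(len(d))      # 0

# Indexing a missing key inserts the factory result
d["x"]
print(len(d))      # 1

# A pickle round trip keeps both the contents and the factory
restored = pickle.loads(pickle.dumps(d))
print(restored.default_factory is list)  # True

# Clearing default_factory restores plain-dict behavior
d.default_factory = None
try:
    d["y"]
except KeyError:
    print("KeyError for 'y'")

# json.dumps sees only the current contents
print(json.dumps(d))  # {"x": []}
```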
When to reach for each:
- Use dict for general key-value storage or when you explicitly want missing-key errors to surface bugs.
- Use defaultdict when you’re building per-key collections (counters, groupings, adjacency lists) and want to drop the if-exists boilerplate.
- For pure counting, collections.Counter is even more direct: Counter(words).
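To make the three cases concrete, here is a small sketch with a hypothetical config dict and edge list: a plain dict that should fail loudly, a defaultdict(set) adjacency list, and a Counter for node degrees.

```python
from collections import Counter, defaultdict

# Plain dict: a missing key is a bug we want to hear about
config = {"host": "localhost"}  # hypothetical settings
try:
    config["port"]
except KeyError:
    print("port is not configured")

# defaultdict: adjacency list for an undirected graph
edges = [("a", "b"), ("a", "c"), ("b", "c")]  # hypothetical edges
graph = defaultdict(set)
for u, v in edges:
    graph[u].add(v)
    graph[v].add(u)
print(sorted(graph["a"]))  # ['b', 'c']

# Counter: pure counting, here the degree of each node
degree = Counter(node for edge in edges for node in edge)
print(degree["a"])  # 2
```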
Gotcha to watch: because reading inserts, logging d["missing"] in an error path changes the dict. Use d.get("missing") for pure reads.
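One way to contain this gotcha, offered as a suggestion rather than a rule, is to build with a defaultdict and hand callers a plain dict. The helper name group_by_length below is hypothetical.

```python
from collections import defaultdict


def group_by_length(words):
    """Hypothetical helper: build with defaultdict, return a plain dict."""
    groups = defaultdict(list)
    for word in words:
        groups[len(word)].append(word)
    return dict(groups)  # shallow copy without the auto-insert behavior


groups = group_by_length(["the", "quick", "brown", "fox"])
print(groups)  # {3: ['the', 'fox'], 5: ['quick', 'brown']}

# Callers now get an error instead of a silent insert
try:
    groups[7]
except KeyError:
    print("missing key raised KeyError")

print(groups.get(7, []))  # []
print(7 in groups)        # False
```

Setting default_factory to None after the build has a similar effect without the copy.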
Interview-ready summary: a plain dict raises on missing keys; a defaultdict(factory) auto-creates and stores a default value. Use defaultdict to simplify grouping, counting, and bucketing code — and use Counter when all you’re doing is counting.
from collections import defaultdict

# Regular dict - KeyError on missing key
regular = {}
# regular['missing']  # KeyError!
value = regular.get('missing', 0)  # Safe but verbose

# Counting with regular dict (verbose)
words = ['apple', 'banana', 'apple', 'cherry', 'banana', 'apple']
counts = {}
for word in words:
    if word not in counts:  # Must check!
        counts[word] = 0
    counts[word] += 1

# Counting with defaultdict (clean)
counts = defaultdict(int)  # int() returns 0
for word in words:
    counts[word] += 1  # Auto-creates 0 if missing
# {'apple': 3, 'banana': 2, 'cherry': 1}

# Grouping with defaultdict
students = [('Math', 'Alice'), ('Science', 'Bob'), ('Math', 'Charlie')]
by_subject = defaultdict(list)
for subject, name in students:
    by_subject[subject].append(name)  # Auto-creates [] if missing
# {'Math': ['Alice', 'Charlie'], 'Science': ['Bob']}

# Counter (even cleaner for counting)
from collections import Counter
counts = Counter(words)  # Counter({'apple': 3, 'banana': 2, 'cherry': 1})
print(counts.most_common(2))  # [('apple', 3), ('banana', 2)]

A regular dict requires a key-existence check before incrementing. defaultdict(int) auto-creates 0 for missing keys, which reduces the counting loop body to a single line. defaultdict(list) auto-creates empty lists for grouping. Counter is even more concise for counting: it counts elements directly from an iterable and provides most_common.
Show the before/after: verbose dict counting vs clean defaultdict counting. Know the three common factories: int (counting), list (grouping), set (unique grouping).
Mention Counter as the purpose-built solution for counting.