Tips & Tricks for Python Lists - Part 1
Contents of this post:
- Naming Conventions
- Dynamic Storage vs Pre-allocated Storage
- Overwriting the built-in name “list”
- Using List Comprehensions
- Printing a List
- Flatten a List of Lists
- Remove Duplicates from a List
- Find the Most Frequent Element in a List
- Extend a List
- Lists and Memory
- Copying a List
- Shallow Copy vs Deep Copy
- Finding the longest string in a list
- List Unpacking
- Sorting Lists
1. Naming Conventions
It is good practice to give lists plural names (for example, numbers or user_names) so that fellow programmers can tell at a glance that the variable holds a collection. Unlike JSON, Python allows a trailing comma after the last element of a list. Production code often uses this when a list spans several lines, because adding a new element then changes only one line in a git diff.
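A minimal sketch of both conventions. The names and values (fruits, "apple", and so on) are made up for illustration:

```python
# A plural name signals that the variable holds a collection.
fruits = [
    "apple",
    "banana",
    "cherry",  # trailing comma: adding a new item later touches only one line
]

# A singular name then reads naturally for the loop variable.
for fruit in fruits:
    print(fruit)

print(len(fruits))  # 3
```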
2. Dynamic Storage vs Pre-allocated Storage
Python lists are dynamic: they grow as you add elements to them. When working with large data sets, it helps to understand the difference between dynamic storage and pre-allocated storage. With dynamic storage, you start with an empty list and it grows as you append elements. With pre-allocated storage, you create a list of the required length up front (filled with a placeholder such as None) and then assign to each index. A pre-allocated list is still an ordinary list, so nothing stops it from growing later.
# Dynamic Storage
numbers = []
for num in range(1000000):
    # CPython over-allocates, so most appends are cheap; only occasionally
    # does the list have to be resized (and possibly moved) to make room
    numbers.append(num)
# Pre-allocated Storage
# memory for all 1000000 slots is allocated up front
numbers = [None] * 1000000
for num in range(1000000):
    numbers[num] = num
# Pre-allocation avoids the resizes, but the difference is usually modest
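In practice the speed differences are often small and depend on the Python version and on the work done inside the loop. A list comprehension is frequently at least as fast as the other two approaches, because it avoids the repeated lookup and call of append(). Measure with timeit before optimizing.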
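One way to check the timings on your own machine is the standard timeit module. This is a rough sketch: the function names are hypothetical, and the numbers it prints will differ between machines and Python versions.

```python
import timeit


def build_with_append(n):
    # Dynamic storage: start empty and append.
    numbers = []
    for num in range(n):
        numbers.append(num)
    return numbers


def build_preallocated(n):
    # Pre-allocated storage: create all slots, then assign by index.
    numbers = [None] * n
    for num in range(n):
        numbers[num] = num
    return numbers


def build_with_comprehension(n):
    # List comprehension.
    return [num for num in range(n)]


n = 100_000
for build in (build_with_append, build_preallocated, build_with_comprehension):
    seconds = timeit.timeit(lambda: build(n), number=10)
    print(f"{build.__name__}: {seconds:.4f} s for 10 runs")
```

All three functions return the same list, so whichever is fastest for your workload can be swapped in without changing the results.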
Pre-allocating with the * operator does not work for lists of lists (nested lists). Multiplying the outer list does not copy the inner list; it repeats a reference to the same inner list n times, so a change made through one row shows up in every row. Use a list comprehension instead, which builds a separate inner list on each iteration.
# Pre-allocated Storage does not work for lists of lists
n = 3
nested_list = [[0]*n]*n
print(nested_list) # [[0, 0, 0], [0, 0, 0], [0, 0, 0]]
nested_list[0][0] = 1
print(nested_list) # [[1, 0, 0], [1, 0, 0], [1, 0, 0]]
# Use list comprehension instead
n = 3
nested_list = [[0]*n for _ in range(n)]
print(nested_list) # [[0, 0, 0], [0, 0, 0], [0, 0, 0]]
nested_list[0][0] = 1
print(nested_list) # [[1, 0, 0], [0, 0, 0], [0, 0, 0]]
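3. Overwriting the built-in name “list”
list is not a keyword but a built-in name, so Python lets you reassign it. Avoid using it as a variable name: once it is overwritten, calls such as list(range(3)) stop working in that scope.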
# Bad
list = 4
print(type(list)) # <class 'int'>
# Good
var = 4
print(type(var)) # <class 'int'>
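To see why shadowing the name is a problem, here is a small sketch. The function name shadowed is hypothetical, and the shadowing is kept inside a function so the built-in stays usable everywhere else:

```python
def shadowed():
    list = 4  # this local name hides the built-in inside the function
    try:
        return list(range(3))  # fails: an int cannot be called
    except TypeError as error:
        return str(error)


print(shadowed())      # 'int' object is not callable
print(list(range(3)))  # [0, 1, 2] (the built-in is untouched out here)
```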
4. Using List Comprehensions
List comprehensions are a concise way to create a new list by applying an operation to each element in an existing list. They are often more efficient than using a for loop to iterate over the list and append the results to a new list. Here’s an example of using a list comprehension to square each element in a list:
original_list = [1, 2, 3, 4, 5]
squared_list = [x ** 2 for x in original_list]
print(squared_list)
# Output: [1, 4, 9, 16, 25]
# x will not be available outside the list comprehension
print(x) # NameError: name 'x' is not defined
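A comprehension can also filter while it transforms. The sketch below shows the for-loop version and the equivalent comprehension side by side (the variable names are illustrative):

```python
original_list = [1, 2, 3, 4, 5]

# Loop version: square only the even numbers.
even_squares_loop = []
for number in original_list:
    if number % 2 == 0:
        even_squares_loop.append(number ** 2)

# Comprehension version: the same logic in one expression.
even_squares = [number ** 2 for number in original_list if number % 2 == 0]

print(even_squares_loop)  # [4, 16]
print(even_squares)       # [4, 16]
```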
5. Printing a List
Want to print all the elements of a list? These are some of the ways you can do it.
my_list = [1, 2, 3, 4, 5]
print(my_list) # [1, 2, 3, 4, 5]
print(*my_list) # 1 2 3 4 5
print(*my_list, sep=', ') # 1, 2, 3, 4, 5
print(' '.join(map(str, my_list))) # 1 2 3 4 5
6. Flatten a List of Lists
# Method 1
my_list = [[1, 2], [3, 4], [5, 6]]
flattened_list = [item for sublist in my_list for item in sublist]
print(flattened_list) # [1, 2, 3, 4, 5, 6]
# Method 2
from itertools import chain
my_list = [[1, 2], [3, 4], [5, 6]]
flattened_list = list(chain.from_iterable(my_list))
print(flattened_list) # [1, 2, 3, 4, 5, 6]
flattened_list = list(chain(*my_list))
print(flattened_list) # [1, 2, 3, 4, 5, 6]
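Both methods above remove exactly one level of nesting. For lists nested to an arbitrary depth, one possible approach is a small recursive helper. flatten_deep is a hypothetical name, not a standard library function, and this sketch only descends into list objects:

```python
def flatten_deep(items):
    """Recursively flatten nested lists into a single flat list."""
    flat = []
    for item in items:
        if isinstance(item, list):
            # Descend into the nested list and add its flattened contents.
            flat.extend(flatten_deep(item))
        else:
            flat.append(item)
    return flat


nested = [1, [2, [3, 4]], [[5], 6]]
print(flatten_deep(nested))  # [1, 2, 3, 4, 5, 6]
```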
7. Remove Duplicates from a List
The simplest way to remove duplicates from a list is to convert it to a set and then back to a list. This will remove all duplicate elements from the list. However, the order of the elements can be lost since sets are unordered.
# Method 1
my_list = [1, 2, 3, 4, 5, 5, 6, 7, 8, 9, 9]
unique_list = list(set(my_list))
print(unique_list) # [1, 2, 3, 4, 5, 6, 7, 8, 9]
# Method 2
my_list = [1, 2, 3, 4, 5, 5, 6, 7, 8, 9, 9]
unique_list = []
for item in my_list:
    if item not in unique_list:
        unique_list.append(item)
print(unique_list) # [1, 2, 3, 4, 5, 6, 7, 8, 9]
Example of how the order of elements can be lost:
my_list = [3, 2, 1, 2, 3, 4, 5, 4, 6]
unique_list = list(set(my_list))
print("Original List: ", my_list) # [3, 2, 1, 2, 3, 4, 5, 4, 6]
print("Unique List: ", unique_list) # typically [1, 2, 3, 4, 5, 6]; set order is not guaranteed
# you may have expected [3, 2, 1, 4, 5, 6]
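If the original order matters, one common alternative is dict.fromkeys(), which relies on dictionaries preserving insertion order (guaranteed since Python 3.7). Like the set approach, this sketch assumes the elements are hashable:

```python
my_list = [3, 2, 1, 2, 3, 4, 5, 4, 6]

# Dictionary keys are unique and keep the order of first appearance.
unique_list = list(dict.fromkeys(my_list))

print(unique_list)  # [3, 2, 1, 4, 5, 6]
```

This keeps the first occurrence of each element and avoids the repeated "not in" scans of the loop version.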
8. Find the Most Frequent Element in a List
from collections import Counter
my_list = [1, 2, 3, 4, 5, 5, 6, 7, 8, 9, 9]
counter = Counter(my_list)
print(counter.most_common(1)) # [(5, 2)]
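In the list above, 5 and 9 both appear twice, and most_common(1) reports only one of them. A sketch of how to pull out the element itself, and how to collect every tied element (the names counts, top_count, and modes are illustrative):

```python
from collections import Counter

my_list = [1, 2, 3, 4, 5, 5, 6, 7, 8, 9, 9]
counts = Counter(my_list)

# Unpack the single (element, count) pair returned by most_common(1).
value, count = counts.most_common(1)[0]
print(value, count)  # 5 2

# Collect every element that shares the highest count.
top_count = max(counts.values())
modes = [item for item, item_count in counts.items() if item_count == top_count]
print(modes)  # [5, 9]
```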
9. Extend a List
No need to use a for loop to extend a list. You can use the extend() method to add all the elements of a list to another list.
my_list = [1, 2, 3, 4, 5]
another_list = [6, 7, 8, 9, 10]
# Method 1
my_list.extend(another_list)
print(my_list) # [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
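# Method 2
my_list = [1, 2, 3, 4, 5] # start again from the original list
my_list = my_list + another_list # builds a new list and rebinds the name
print(my_list) # [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]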
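The two methods differ in one important way: extend() modifies the existing list in place, while + builds a new list. The difference becomes visible when a second name refers to the same list. A small sketch with illustrative names:

```python
first = [1, 2, 3]
alias = first  # second name for the same list object

first.extend([4, 5])  # in place: the change is visible through alias
print(alias)  # [1, 2, 3, 4, 5]

first = first + [6]  # new list: first is rebound, alias is not affected
print(first)  # [1, 2, 3, 4, 5, 6]
print(alias)  # [1, 2, 3, 4, 5]
print(first is alias)  # False
```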
10. Lists and Memory
lst1 = [1, 2, 3]
lst2 = [1, 2, 3]
print(lst1 == lst2) # True, because they have same elements
print(lst1 is lst2) # False, because they are not stored at same memory location
lst3 = [1, 2, 3, 4, 5]
lst4 = lst3
print(lst3 is lst4) # True, because they are stored at same memory location
# This means lst4 is just another name for lst3 or pointer to lst3
lst4[0] = 10
print(lst3) # [10, 2, 3, 4, 5]
What happens if you do the same assignment with an integer?
# integers are immutable: b = 20 does not change the object 10, it rebinds the name b to a different object
a = 10
b = a
print(a is b) # True, because they are stored at same memory location
b = 20
print(a) # 10
a = 30
print(b) # 20
11. Copying a List
# Method 1
lst1 = [1, 2, 3]
lst2 = lst1.copy()
print(lst1 is lst2) # False, because they are not stored at same memory location
lst2[0] = 10
print(lst1) # [1, 2, 3]
print(lst2) # [10, 2, 3]
# Method 2
lst1 = [1, 2, 3]
lst2 = list(lst1)
print(lst1 is lst2) # False, because they are not stored at same memory location
lst2[0] = 10
print(lst1) # [1, 2, 3]
print(lst2) # [10, 2, 3]
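A third option is a full slice, [:], which also produces a new list containing the same elements. A minimal sketch:

```python
# Method 3
lst1 = [1, 2, 3]
lst2 = lst1[:]  # a slice of the whole list is a new list object

print(lst1 is lst2)  # False
print(lst1 == lst2)  # True

lst2[0] = 10
print(lst1)  # [1, 2, 3]
print(lst2)  # [10, 2, 3]
```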
12. Shallow Copy vs Deep Copy
A shallow copy creates a new list, but its elements are references to the same objects as the original. The outer list is new, while any nested lists are still shared between the two. A shallow copy is made with the copy() method, the list() constructor, or a full slice, [:].
A deep copy creates a new list and recursively copies the nested objects inside it, so the result is completely independent of the original. A deep copy is made with the deepcopy() function in the copy module.
# Shallow Copy
lst1 = [[1, 2], [3, 4], [5, 6]]
lst2 = lst1.copy()
print(lst1 is lst2) # False, because they are not stored at same memory location
print(lst1[0] is lst2[0]) # True, because they are stored at same memory location
lst2[0][0] = 10
print(lst1) # [[10, 2], [3, 4], [5, 6]]
print(lst2) # [[10, 2], [3, 4], [5, 6]]
# Deep Copy
import copy
lst1 = [[1, 2], [3, 4], [5, 6]]
lst2 = copy.deepcopy(lst1)
print(lst1 is lst2) # False, because they are not stored at same memory location
print(lst1[0] is lst2[0]) # False, because they are not stored at same memory location
lst2[0][0] = 10
print(lst1) # [[1, 2], [3, 4], [5, 6]]
print(lst2) # [[10, 2], [3, 4], [5, 6]]
13. Finding the longest string in a list
Can we do this without a for loop? Yes, we can use the max() function with the key parameter.
my_list = ['a', 'aa', 'aaa', 'aaaa', 'aaaaa']
longest_string = max(my_list, key=len)
print(longest_string) # aaaaa
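Two edge cases are worth knowing. When several strings share the maximum length, max() returns the first one, and on an empty list it raises ValueError unless a default is given. A sketch with made-up data:

```python
words = ["kiwi", "pear", "fig"]

# Ties: max() returns the first element with the largest key.
print(max(words, key=len))  # kiwi

# Collect every string that has the maximum length.
longest_length = len(max(words, key=len))
all_longest = [word for word in words if len(word) == longest_length]
print(all_longest)  # ['kiwi', 'pear']

# Empty list: supply a default instead of getting a ValueError.
fallback = max([], key=len, default="")
print(repr(fallback))  # ''
```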
14. List Unpacking
my_list = [1, 2, 3, 4, 5]
a, b, c, d, e = my_list
print(a) # 1
print(b) # 2
print(c) # 3
print(d) # 4
print(e) # 5
m, n, *other = my_list
print(m) # 1
print(n) # 2
print(other) # [3, 4, 5]
x, *_, y, z = my_list
print(x) # 1
print(y) # 4
print(z) # 5
15. Sorting Lists
my_list = [3, 1, 5, 2, 4]
# in-place sorting
my_list.sort()
print(my_list) # [1, 2, 3, 4, 5]
my_list = [5, 4, 3, 2, 1]
# create a new sorted list; sorted() is a built-in function, not a list method
new_list = sorted(my_list)
print(my_list) # [5, 4, 3, 2, 1]
print(new_list) # [1, 2, 3, 4, 5]
Both sorted() and .sort() accept a reverse argument for sorting in descending order. The separate .reverse() method does not sort at all; it only reverses the current order of the elements in place.
my_list = [3, 2, 1, 4, 5]
my_list.reverse()
print(my_list) # [5, 4, 1, 2, 3]
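For comparison, here is a short sketch of the reverse argument, along with the key argument that both sorted() and .sort() also accept. The word list is made up for illustration:

```python
my_list = [3, 2, 1, 4, 5]

# Descending sort into a new list; my_list itself is unchanged.
descending = sorted(my_list, reverse=True)
print(descending)  # [5, 4, 3, 2, 1]
print(my_list)     # [3, 2, 1, 4, 5]

# Descending sort in place.
my_list.sort(reverse=True)
print(my_list)  # [5, 4, 3, 2, 1]

# key chooses what to sort by; here, the length of each word.
words = ["banana", "fig", "apple"]
by_length = sorted(words, key=len)
print(by_length)  # ['fig', 'apple', 'banana']
```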