The course below was translated into English by ChatGPT (version 3.5) from the initial course. Some inconsistencies may remain. The discussion with ChatGPT is accessible here (he had difficulty 🥵)
Note: This is only a summary of the course and by no means replaces the complete study of the two algorithms, which has been done in the activity and exercises.
The goal is to sort an array T
of non-empty integers in ascending order.
T
of size $n$ consisting of integers (indices ranging from 0
to n-1
).T
in ascending order.T
is non-empty.T
is sorted, meaning that each element of T
is less than or equal to the next element: for any integer $i$ between $0$ and $n-2$, we have $T[i] \leq T[i+1]$.Idea of the insertion sort algorithm:
This is the algorithm we apply to sort cards we hold in our hands.
Algorithm
for i from 1 to n-1 do
x ← T[i]
j ← i
while j > 0 and x < T[j-1] do
T[j] ← T[j-1]
j ← j – 1
end while
T[j] ← x
end for
Example: We want to sort the array T = [5, 1, 2, 1, 4]
.
i = 1 : |
|
🡆 |
| ||||||||||
i = 2 : |
|
🡆 |
| ||||||||||
i = 3 : |
|
🡆 |
| ||||||||||
i = 4 : |
|
🡆 |
|
In Python
def insertion_sort(T):
for i in range(1, len(T)):
x = T[i]
j = i
while j > 0 and x < T[j-1]:
T[j] = T[j-1]
j = j - 1
T[j] = x
>>> t1 = [5, 1, 2, 1, 4]
>>> insertion_sort(t1)
>>> t1
[1, 1, 2, 4, 5]
The time cost of the algorithm defines its efficiency. Determining the cost of an algorithm is equivalent to evaluating the number of elementary operations in the worst case.
To simplify the study, we only count the comparisons between two elements of the array (corresponding to the line x < T[j-1]
).
The worst case for insertion sort is an array sorted in descending order because each element needs to be inserted in the first position, requiring comparison with all preceding elements.
For a descending order sorted array of size $n$ (worst case), we have:
T[1]
is compared to T[0]
);T[2]
is successively compared to T[1]
and T[0]
);i
= $n – 1$): there are $n – 1$ comparisons to make (T[n-1]
is successively compared to T[n-2]
, T[n-3]
, ..., T[0]
).Conclusion: There are a total of $1+2+3+⋯+(n-2)+(n-1)$ comparisons in the worst case.
Now, $1+2+3+⋯+(n-2)+(n-1)=\dfrac{n(n-1)}{2}=\dfrac{n^2}{2}-\dfrac{n}{2}$, so we can say that the cost of the algorithm is on the order of $n^2$ (taking the highest power of $n$), and it is called quadratic cost.
Insertion sort has a quadratic cost in the worst case. This means that if you multiply the size of the array to be sorted by $k$, then the time for sorting is multiplied by $k^2$ (this property is experimentally illustrated in the last paragraph).
Idea of the selection sort algorithm:
This means performing a minimum search in the right part of the array at each step.
Algorithm
for i from 0 to n-2 do
ind_min ← i
for j from i+1 to n-1 do
if T[j] < T[ind_min] then
ind_min ← j
end if
end for
swap(T, i, ind_min) # swap T[i] and T[ind_min]
end for
Example: We want to sort the array T = [5, 1, 2, 1, 4]
.
i = 1 : |
|
🡆 |
| ||||||||||
i = 2 : |
|
🡆 |
| ||||||||||
i = 3 : |
|
🡆 |
| ||||||||||
i = 4 : |
|
🡆 |
|
In Python
def swap(T, i, j):
"""swap T[i] and T[j] in the array T"""
temp = T[i]
T[i] = T[j]
T[j] = temp
def selection_sort(T):
for i in range(len(T)-1):
ind_min = i
for j in range(i+1, len(T)):
if T[j] < T[ind_min]:
ind_min = j
swap(T, i, ind_min)
>>> t1 = [5, 1, 2, 1, 4]
>>> selection_sort(t1)
>>> t1
[1, 1, 2, 4, 5]
As before, we only study the number of comparisons (line T[j] < T[ind_min]
) in the worst case.
Regardless of the initial array, there is always the same number of comparisons to make because we know the number of loop iterations since there are two for
loops. There is no worst or best case in the case of selection sort (or all cases are worst cases).
More specifically, for an array T
of size $n$:
i
= 0), there are $n-1$ comparisons in the search for the smallest element (search for the minimum among the last $n-1$ values);i
= 1), there are $n-2$ comparisons (search for the minimum among the last $n-2$ values);i
= $n-2$), there is only 1 comparison left (search for the minimum among the last 2 values).Conclusion: There are a total of $(n-1)+(n-2)+ ⋯ +2+1=\dfrac{n(n-1)}{2}=\dfrac{n^2}{2}-\dfrac{n}{2}$ comparisons in the worst case (the same number as for insertion sort), so the cost is also on the order of $n^2$.
Selection sort also has a quadratic cost in the worst case. This means that if you multiply the size of the array to be sorted by $k$, then the time for sorting is multiplied by $k^2$ (this property is experimentally illustrated in the last paragraph).
There are other sorting algorithms, and some are much more efficient (for reference, their cost is on the order of $n \log_2(n)$, which is much less than $n^2$.)
Python provides two functions for sorting an array efficiently, depending on whether you want to modify the initial array or not.
The sorted
function takes an array as an argument and returns a new sorted array containing the same elements.
>>> t = [12, 5, 3, 6, 8, 10]
>>> sorted(t)
[3, 5, 6, 8, 10, 12]
>>> t
[12, 5, 3, 6, 8, 10] # the original array t has not been modified!
On the other hand, the t.sort()
method modifies the t
array to sort it without returning anything.
>>> t = [12, 5, 3, 6, 8, 10]
>>> t.sort()
>>> t
[3, 5, 6, 8, 10, 12]
We can experimentally measure the efficiency of insertion sort, selection sort, and the sorting functions provided by Python.
The time values measured below obviously depend on the power of the computer executing the algorithms. However, the general idea and orders of magnitude of the cost remain the same with current standard computers.
We create a function to construct a decreasing array, which will be a worst-case scenario.
def create_decreasing_array(size):
array = [size - k for k in range(size)]
return array
>>> create_decreasing_array(10)
[10, 9, 8, 7, 6, 5, 4, 3, 2, 1]
The time measurement protocol is proposed below:
import time
def measure_worst_case_time(size):
# selection sort
array = create_decreasing_array(size)
t0 = time.time()
selection_sort(array)
t1 = time.time()
selection_sort_time = t1 - t0
# insertion sort
array = create_decreasing_array(size)
t2 = time.time()
insertion_sort(array)
t3 = time.time()
insertion_sort_time = t3 - t2
print(f"selection sort time for an array of size {size}: {selection_sort_time}")
print(f"insertion sort time for an array of size {size}: {insertion_sort_time}")
Trials
>>> measure_worst_case_time(100)
selection sort time for an array of size 100: 0.0009999275207519531
insertion sort time for an array of size 100: 0.002000093460083008
>>> measure_worst_case_time(1000)
selection sort time for an array of size 1000: 0.10900020599365234
insertion sort time for an array of size 1000: 0.22899985313415527
We see that by multiplying the size by 10, the time for sorting is multiplied by about $10^2=100$.
>>> measure_worst_case_time(2000)
selection sort time for an array of size 2000: 0.43900012969970703
insertion sort time for an array of size 2000: 0.9839999675750732
We multiplied the size by 2 compared to the previous example, and the time is approximately multiplied by $2^2=4$.
To illustrate the point, we take the previous value, namely that it takes about 0.1 seconds to sort a selection array of size 1000.
Since the cost of the algorithm is quadratic, to sort an array of size one million ($10^6$), which is 1000 times larger, it would take multiplying the time by $1000^2=10^6$. Therefore, it would take $0.1\times 10^6=10^5$ seconds, which is 1 day, 3 hours, 46 minutes, and 20 seconds.
And for an array of size $10^9$, it would take more than 3170 years...
It is clear that an algorithm with a quadratic cost is not efficient. Moreover, the sorting functions provided by Python are efficient. Indeed, it takes only a few milliseconds to sort an array of size $10^6$:
def measure_worst_case_sort(size):
array = create_decreasing_array(size)
# sort the array using Python's provided methods and measure the time
t0 = time.time()
array.sort() # or sorted(array)
t1 = time.time()
python_sort_time = t1 - t0
print(f"time to sort with functions provided by Python: {python_sort_time}")
>>> measure_worst_case_sort(1_000_000)
time to sort with functions provided by Python: 0.009999990463256836
sort
and sorted
functions.References:
Germain BECKER & Sébastien POINT, Lycée Mounier, ANGERS
Voir en ligne : info-mounier.fr/premiere_nsi/algorithmique/tris-insertion-selection-english