The input is a list of integers from 1 to n in a random order. What matters is the average time taken, not best or worst case. It does not have to be stable. Anthropic effects and other cheesy special cases are ignored.

Resolves YES when one is provided. Resolves NO when this is proven impossible.

## Related questions

I think this question is ill-posed, because complexity analysis is always done with reference to a specific model, and you haven't specified a model.

For arbitrary lists that are sorted by comparisons, O(n * log(n)) is the best you can do. The explanation is a bit technical, but this write up is pretty good: https://www.geeksforgeeks.org/lower-bound-on-comparison-based-sorting-algorithms/amp/. However, since we know the range of values is 1-N, we can use that assumption to let us sort without comparisons, allowing us to sort in O(n) time.

Counting sort is a well known algorithm that can be used to get O(n + k) time, where k is the max value in the array:

https://en.m.wikipedia.org/wiki/Counting_sort. It works by counting the number of times each element is in the list, and then generates a sorted list from the counts. It only works if the list has a known range of values, but if k is small it can be faster than most algorithms. In this case where k=n, the time complexity is just O(2n) = O(n). We can actually do it even more efficiently by never reading the values in the list.

Because the list only contains values 1 through N, with no repeats or gaps in the values, the solution is the same for all lists of the same length. So if we have the length, we can generate the sorted list from scratch. A simple Python solution would look like this:

def sort(lst):

return list(range(len(lst)))

Which runs in O(n) time.

@TonyPepperoni While what you're saying is accurate, I'm not sure if that was their intention when they asked the question haha. I interpreted the prompt as sorting an arbitrary collection of integers, rather than sorting specifically the list consisting of the values 1 to N, because as you say that is quite trivial. But I can definitely see why you read the question that way

@TonyPepperoni Why does counting sort not run afoul of the proof of n * log(n) as a lower bound for comparison sorts?

@IsaacKing counting sort is not a comparison sort, so the proof does not apply to it. A comparison sort can operate on any array of comparable values, while a counting sort operates on an array of values from a known finite set (e.g. positive integers up to some maximum).

@IsaacKing I agree with jcb, the lower bound for comparison sorts is based on the assumption that the sorting algorithm can be modeled as a tree of decisions based on the result of the comparisons. It also assumes that each compare can only have two inputs (any two values in the list), and two outcomes (true / false).

That assumption is useful for designers of a programming language, since they want their sorting algorithms to generalize to things like floats, strings, and objects. But if we're able to make assumptions about the values in the list, there's room to optimize. Specifically, counting sort gets around this by converting the list to an alternate representation that is inherently sorted.

If we have the list [1, 3, 0, 4, 4], counting sort will do a pass over all values and count how many of each it sees, in this case it's [1, 1, 0, 1, 2]. Then to make the sorted list it does a pass over the counts, outputting a value once for each time it was first seen, which would give us the result of [0, 1, 3, 4, 4]. The first step is O(n), and the second step is O(k), so the resulting speed is O(n + k).

It skipped doing any comparisons by using array indexing to sort, instead of comparisons. Once the list has been converted into the counts, comparing isn't necessary because the array of counts are already in order.

Ok, so if I understand correctly, O(n * log(n)) is the best possible average time for a sort that can operate on arbitrary values, and O(n) is the best possible average time when operating on a finite list of values that are known in advance. Is that correct?

And why is counting sort able to compare numbers at a constant speed rather than proportional to log(k)? Surely it takes longer for a computer to check if two 1 million digit numbers are equal than to check if two 1 digit numbers are equal.

@IsaacKing well if we are bring technical, you could achieve O(1) on a finite list of values known in advance e.g. numbers from 1 to N inclusive. All you need to do is have a lookup table by index N.

@IsaacKing If one is not being careful, such judgements are generally made assuming that memory access is completely free and the computer can perform arithmetic operations on arbitrarily large integers in one time-step. But if one is being careful, those assumptions are obvious nonsense:

Memory access is not free. There's only so much data you can fit within a cubic metre before you become a black hole, and the constant speed of light sets an upper bound on how fast you can get data to the processor from outside your cubic metre.

As you point out, no computer can perform arithmetic operations on arbitrarily large integers in one time-step.

Similarly, consider looking an element up in a hashset. This is generally thought of as average constant-time when there are sufficiently many buckets relative to the number of entries in the table, degenerating to linear time when there are few buckets. Of course, it's not constant/linear! Hashing an integer can be done in time at best logarithmic in the size of that integer, essentially by consuming it one bit at a time, so there's an extra logarithmic factor people conveniently omit.

If you're happy that indexing into a hashset "is" constant time, then you should probably also be happy that comparing integers "is" constant time. If you're not happy with that, then I'd suggest elaborating in your question on the degree to which you care about this kind of thing.

In fact this question seems designed to probe that kind of edge, assuming the upper-case N (length of list) and lower-case n (complexity of algorithm) are meant to denote the same quantity. Maybe that's what you intended!

@IsaacKing it's because counting sort never compares the values of the list against one another, it replaces comparisons with linear time transformations. Here's some rough pseudocode for how it works:

counting_sort(list, k):

# Count items in array

count = array of k + 1 zeros

for v in list:

count[v] += 1

output = []

# Iterate over each entry in count

for i from 0 to len(count):

# Add i to output, count[i] times

for j from 0 to count[i]:

output.append(count[i])

return output

There's a better implementation described on Wikipedia which avoids the nested for loop and is generally more efficient, but also harder to follow. I hope my rough implementation shows how sorting can be done without any comparisons. The original list is transformed into an intermediate representation (the counts) which can then be transformed into a sorted list. The linear time transformations replace the need for comparisons between values.

@TonyPepperoni You saw there where you indexed into `count`

using an element from the input list? Strictly speaking that's logarithmic, and you seem to be treating it as constant.

And why is counting sort able to compare numbers at a constant speed rather than proportional to log(k)? Surely it takes longer for a computer to check if two 1 million digit numbers are equal than to check if two 1 digit numbers are equal.

To be clear, counting sort is not *comparing* numbers, it's using them to index into an array of counts.

One of the assumptions for counting sort is that the values can be used to index into an array; either they numbers are in the range 0-n for some suitable n, or they can be bijectively mapped onto that range in constant time.

@PatrickStevens array access is generally considered to be constant time. see e.g. https://cs.stackexchange.com/a/152615

@jcb in this case the items to be sorted are integers from 1 to N, so the bijective map is just index = entry - 1

@jcb Sure, that's why I suggested Isaac clarify exactly how correct to be. After all, if one is concerned that comparing numbers isn't constant-time, then presumably one is not working in a transdichotomous model!