Does there exist a sorting algorithm with average time complexity better than O(n * log(n)) on a list of integers?
Basic
28
5.8k
2100
97%
chance

The input is a list of integers from 1 to n in a random order. What matters is the average time taken, not best or worst case. It does not have to be stable. Anthropic effects and other cheesy special cases are ignored.

Resolves YES when one is provided. Resolves NO when this is proven impossible.

Get Ṁ600 play money
Sort by:

Seems there were some definitional issues here. Should I resolve YES or N/A?

@IsaacKing I don't mind if you resolve however you wish. Let me give some background: I assume that definitional issues refer to the fact that you have not specified the computational model. In algorithmic practice, if you do not specify model, it means that you implicitly work in a model that's called WordRAM. This model is basically the C language, except that variables are not 32 bits, but their length is a parameter of the model called w. Once you are in this model, the word "integer" is well defined, it is an integer encoded in a w-bit length word that you can operate with in constant time. The WordRAM model has several variants, but you can sort integers faster than O(n log n) in all of them. The currenct fastest sorting is by Han & Thorup in time O(n * sqrt(loglog(n))). Whether you can sort integers in linear time is a big open question -- actually there should be a market on it!

Realistically, other models of computations like Turing machines or Game of life seem like a weird interpretation of your question. For example, you can't implement standard n*log(n) algorithms on Turing machine, I think some standard communication complexity argument should imply that any sorting on Turing machine has to take at least n^2 steps.

bought Ṁ150 YES

For standard models of computation, o(n log n) sorting algorithms are known, see e.g. this: https://cs.stackexchange.com/questions/32064/about-sorting-numbers-in-linear-time

It would be nice if the question could be made clearer—it seems like the issue is not so much that any part is unsettled, but rather that it isn't sufficiently specified to resolve. In particular:

  1. What computational model are we working with?

  2. "a list of integers from 1 to n in a random order" This suggests that the list contains each of the numbers exactly once. If so, there's an easy answer, but I doubt that's the intent. "a list of integers between 1 and n inclusive" would be much clearer.

  3. This is a question of mathematics, what do "anthropic effects" refer to?

@jesyspa I think "anthropic affects" refers to approaches like "shuffle the list, then check if it's sorted, then end your life if the list is not sorted. In worlds where you survive, the list has been sorted in O(n) time"

I think this question is ill-posed, because complexity analysis is always done with reference to a specific model, and you haven't specified a model.

@tfae I'm not familiar enough with the field to know what that means. Is there a "normal" model?

Radix sort is O(n * log(N)) so it technically count.

predicts NO

@MarisaKirisame This market is only about sorts that are faster than that.

@IsaacKing oh, i misread the question and thought n and N is different.

predicts NO

@MarisaKirisame Ah, sorry, I've fixed the description.

For arbitrary lists that are sorted by comparisons, O(n * log(n)) is the best you can do. The explanation is a bit technical, but this write up is pretty good: https://www.geeksforgeeks.org/lower-bound-on-comparison-based-sorting-algorithms/amp/. However, since we know the range of values is 1-N, we can use that assumption to let us sort without comparisons, allowing us to sort in O(n) time.

Counting sort is a well known algorithm that can be used to get O(n + k) time, where k is the max value in the array:

https://en.m.wikipedia.org/wiki/Counting_sort. It works by counting the number of times each element is in the list, and then generates a sorted list from the counts. It only works if the list has a known range of values, but if k is small it can be faster than most algorithms. In this case where k=n, the time complexity is just O(2n) = O(n). We can actually do it even more efficiently by never reading the values in the list.

Because the list only contains values 1 through N, with no repeats or gaps in the values, the solution is the same for all lists of the same length. So if we have the length, we can generate the sorted list from scratch. A simple Python solution would look like this:

def sort(lst):

return list(range(len(lst)))

Which runs in O(n) time.

@TonyPepperoni While what you're saying is accurate, I'm not sure if that was their intention when they asked the question haha. I interpreted the prompt as sorting an arbitrary collection of integers, rather than sorting specifically the list consisting of the values 1 to N, because as you say that is quite trivial. But I can definitely see why you read the question that way

predicts NO

@TonyPepperoni Why does counting sort not run afoul of the proof of n * log(n) as a lower bound for comparison sorts?

@IsaacKing counting sort is not a comparison sort, so the proof does not apply to it. A comparison sort can operate on any array of comparable values, while a counting sort operates on an array of values from a known finite set (e.g. positive integers up to some maximum).

predicts YES

@IsaacKing I agree with jcb, the lower bound for comparison sorts is based on the assumption that the sorting algorithm can be modeled as a tree of decisions based on the result of the comparisons. It also assumes that each compare can only have two inputs (any two values in the list), and two outcomes (true / false).

That assumption is useful for designers of a programming language, since they want their sorting algorithms to generalize to things like floats, strings, and objects. But if we're able to make assumptions about the values in the list, there's room to optimize. Specifically, counting sort gets around this by converting the list to an alternate representation that is inherently sorted.

If we have the list [1, 3, 0, 4, 4], counting sort will do a pass over all values and count how many of each it sees, in this case it's [1, 1, 0, 1, 2]. Then to make the sorted list it does a pass over the counts, outputting a value once for each time it was first seen, which would give us the result of [0, 1, 3, 4, 4]. The first step is O(n), and the second step is O(k), so the resulting speed is O(n + k).

It skipped doing any comparisons by using array indexing to sort, instead of comparisons. Once the list has been converted into the counts, comparing isn't necessary because the array of counts are already in order.

predicts NO

Ok, so if I understand correctly, O(n * log(n)) is the best possible average time for a sort that can operate on arbitrary values, and O(n) is the best possible average time when operating on a finite list of values that are known in advance. Is that correct?

And why is counting sort able to compare numbers at a constant speed rather than proportional to log(k)? Surely it takes longer for a computer to check if two 1 million digit numbers are equal than to check if two 1 digit numbers are equal.

predicts NO

@IsaacKing well if we are bring technical, you could achieve O(1) on a finite list of values known in advance e.g. numbers from 1 to N inclusive. All you need to do is have a lookup table by index N.

@IsaacKing If one is not being careful, such judgements are generally made assuming that memory access is completely free and the computer can perform arithmetic operations on arbitrarily large integers in one time-step. But if one is being careful, those assumptions are obvious nonsense:

  • Memory access is not free. There's only so much data you can fit within a cubic metre before you become a black hole, and the constant speed of light sets an upper bound on how fast you can get data to the processor from outside your cubic metre.

  • As you point out, no computer can perform arithmetic operations on arbitrarily large integers in one time-step.

Similarly, consider looking an element up in a hashset. This is generally thought of as average constant-time when there are sufficiently many buckets relative to the number of entries in the table, degenerating to linear time when there are few buckets. Of course, it's not constant/linear! Hashing an integer can be done in time at best logarithmic in the size of that integer, essentially by consuming it one bit at a time, so there's an extra logarithmic factor people conveniently omit.

If you're happy that indexing into a hashset "is" constant time, then you should probably also be happy that comparing integers "is" constant time. If you're not happy with that, then I'd suggest elaborating in your question on the degree to which you care about this kind of thing.

In fact this question seems designed to probe that kind of edge, assuming the upper-case N (length of list) and lower-case n (complexity of algorithm) are meant to denote the same quantity. Maybe that's what you intended!

predicts YES

@IsaacKing it's because counting sort never compares the values of the list against one another, it replaces comparisons with linear time transformations. Here's some rough pseudocode for how it works:

counting_sort(list, k):

# Count items in array

count = array of k + 1 zeros

for v in list:

count[v] += 1

output = []

# Iterate over each entry in count

for i from 0 to len(count):

# Add i to output, count[i] times

for j from 0 to count[i]:

output.append(count[i])

return output

There's a better implementation described on Wikipedia which avoids the nested for loop and is generally more efficient, but also harder to follow. I hope my rough implementation shows how sorting can be done without any comparisons. The original list is transformed into an intermediate representation (the counts) which can then be transformed into a sorted list. The linear time transformations replace the need for comparisons between values.

@TonyPepperoni You saw there where you indexed into count using an element from the input list? Strictly speaking that's logarithmic, and you seem to be treating it as constant.

And why is counting sort able to compare numbers at a constant speed rather than proportional to log(k)? Surely it takes longer for a computer to check if two 1 million digit numbers are equal than to check if two 1 digit numbers are equal.

To be clear, counting sort is not comparing numbers, it's using them to index into an array of counts.

One of the assumptions for counting sort is that the values can be used to index into an array; either they numbers are in the range 0-n for some suitable n, or they can be bijectively mapped onto that range in constant time.

@PatrickStevens array access is generally considered to be constant time. see e.g. https://cs.stackexchange.com/a/152615

@jcb in this case the items to be sorted are integers from 1 to N, so the bijective map is just index = entry - 1

@jcb Sure, that's why I suggested Isaac clarify exactly how correct to be. After all, if one is concerned that comparing numbers isn't constant-time, then presumably one is not working in a transdichotomous model!

@IsaacKing I don’t think we can argue about complexity of such elementary operations as comparing numbers. Otherwise just reading or rewriting array should be n*log(k) which you can consider as a proof for non existence of such algorithm. As far as I understand provided algorithm is not what you were looking for, however it completely fits provided description. So should be resolved as yes

Why can't any sort be turned into a counting sort by defining a mapping from the elements into the integers?

@IsaacKing because it is not always possible or practical to define such a mapping. For the reals, obviously, this can't be done. (Of course also a computer can't truly represent arbitrary real numbers, but neither can it use 128-bit integers as array indexes for a counting sort.)

I think if you want to really answer your question, you need to get more concrete about what model of computation you are working under.

@jcb Ok, sure. But assuming the set of possible array values is countable?

I don't think the model of computation matters if I'm allowed to precompute them, right?

@IsaacKing it seems, that your question about mapping is basically “Why can’t we first sort values, and then sort them faster”. Sorting can be pretty much defined as generating of mapping to integers 1-n that holds x < y <=> f(x) < f(y)

@026f Hmm, fair enough.