20

4

This is a question that's been lingering in my mind for some time ...

Suppose I have a list of items and an equivalence relation on them, and comparing two items takes constant time. I want to return a partition of the items, e.g. a list of linked lists, each containing all equivalent items.

One way of doing this is to extend the equivalence to an ordering on the items and order them (with a sorting algorithm); then all equivalent items will be adjacent.

But can it be done more efficiently than with sorting? Is the time complexity of this problem lower than that of sorting? If not, why not?

1Any hash value which does not yield equal hashes for equal objects is broken. Part of the definition of a hash function (indeed, practically the only requirement) is that equal values must yield equal hashes. – supercat – 2010-07-15T15:48:19.247

The latter is exactly what I was looking for. Thanks! – reinierpost – 2010-07-15T15:51:50.453

@supercat: If we had such a hash would we even be having this question posted? My guess is no. Given an arbitrary equality/compare function on objects, it might be hard to come up with such a hash and solving that might well involve solving the partition problem. So just claiming that hashing is a solution is not really helpful. – None – 2010-07-15T16:06:13.617

@reinierpost: It helped? No upvote then? ;-) Really though, if you think an answer (not just for this question) is correct and useful, you should consider upvoting it, especially when there are many upvoted misleading answers about (in general, not making any claims about this question). The whole premise of this site is that correct and useful answers get upvoted and that the signal/noise ratio is high. – None – 2010-07-15T16:17:34.083

1+1 for seeing and succinctly stating the salient points. – andand – 2010-07-15T16:55:17.323

+1, but I really think you meant "unequal objects need not have unequal hashes". Supercat is right. (Though it may be hard to find such a hash.) – ShreevatsaR – 2010-07-15T18:01:13.013

2@supercat: "Any hash value which does not yield equal hashes for equal objects is broken." -- that's true, but in this case the hash also needs to yield equal hashes for

equivalentobjects. There's a difference between equivalence and equality. "John Smith" is not equal to "Fred Smith" but they may be equivalent if you have defined equivalence to only consider last names. – Dan – 2010-07-15T18:04:19.243@ShreevatsaR: I meant, you just can't pick any hash function and expect it to work. It has to be based on equal objects having equal hashes. I will edit the answer to clarify. – None – 2010-07-15T18:44:32.740

+1 [what andand said] – Dave O. – 2010-07-15T22:27:32.333

@Moron: I'm not sure what makes you think I didn't upvote your answer. But all answers have been useful, and they complement each other, so it's hard to pick a winner. If you can suggest a criterion for ordering by usefulness, be my guest :-) – reinierpost – 2010-07-15T22:36:00.583

@rei: I saw your comment, and saw zero upvotes and so just assumed :-) I was kidding, btw, you are not under any obligations, even if it is your own question. The criterion for ordering is the number of votes! So if people are diligent in voting the right stuff, the ordering is automatic and that is what the voting system is for. Anyway, we are getting way off-topic. – None – 2010-07-15T23:19:44.073

"Whether to use hashing or sorting completely depends on other constraints not available in the question". Well, yes, like in the ~0.000...01% of the cases where you cannot come up with a reasonable hash function for your elements. Otherwise, a hashtable beats sorting hands-down in this problem. – Dimitris Andreou – 2010-07-16T10:43:09.820

@Dimitris: Where did you pull that figure from? btw, did you read the question? – None – 2010-07-16T13:32:37.910

@Moron, might you show (or pull from somewhere, heh, good one!) an example where sorting would be better than hashing for this problem? Just saying. – Dimitris Andreou – 2010-07-16T14:59:15.760

@Dimit: I added an answer to the question Dan posted. I didn't think much about it, so might be wrong, though. In any case, the burden of proof is on

you! If you make a claim, you better prove it. I never claimed sorting isalways(or even usually) better than hashing. I only claimed that in thecontextof the question ("is partition easier than sorting") talking about hashing is noise. People seem to have misunderstood what I was saying. – None – 2010-07-16T15:03:13.053In fact

youmade a claim that somehow deciding between sorting and hashing for this problem depends on some other, hidden factors. I merely asked you to show an example where the best answer would not be clear (and in particular, would be sorting rather than hashing). Still waiting for that. – Dimitris Andreou – 2010-07-16T16:13:37.050@Dim: I stand by it. Choosing whether hashing or sorting is better depends on the actual situation! There is too little information provided in the question to say hashing is better or not. For an answer: http://stackoverflow.com/questions/3261782/puzzle-need-an-example-of-a-complicated-equivalence-relation-partitioning-th/3266142#3266142. (In fact, I already referred to this answer in my previous comment, if you happened to miss the first sentence of that comment!)

– None – 2010-07-16T16:15:33.767What I'm getting at is that under any remotely reasonable hash function, it is exceedingly improbable to get a degenerate case. I would choose the expected O(N) algorithm all times, even with the risk that it might degenerate to Ω(N^2) once in a blue moon, rather that pay the cost of O(NlogN) cost each and every time. (This argument also applies in using hashtables rather than search trees.) Theoretically? Υes, you are right, in the worst case hashing is rubbish (it is!). But in practice, I'd suggest hashing with hardly any discussion over it. – Dimitris Andreou – 2010-07-16T17:49:16.320

1@Dimi: Without even knowing if a suitable hash function exists you want to suggest hashing to a question whose title is "Is Partitioning easier than Sorting"!? No one is claiming sorting is better (I am repeating myself here) than hashing in practice. If you have a decent hash function sure, use it. If you notice, the question was a theoretical one and the better or worse was only in terms of the worst case complexity. My answer does mention that hashing is

expectedO(n) with a decent hash. If you see the answer I linked to earlier, I hope you will agree that sorting might be better there. – None – 2010-07-16T18:06:30.067...I am not sure where you got that I am suggesting always use sorting or sorting is always better! Please read the answer in the context of the question. Also, I disagree that you can blindly always go with the hash. Depends on the situation! I do agree that hashing will probably win, though... :-) – None – 2010-07-16T18:07:03.250