## How to pair socks from a pile efficiently?

3 561

1 505

Yesterday I was pairing the socks from the clean laundry and figured out the way I was doing it is not very efficient. I was doing a naive search — picking one sock and "iterating" the pile in order to find its pair. This requires iterating over n/2 * n/4 = n2/8 socks on average.

As a computer scientist I was thinking what I could do? Sorting (according to size/color/...) of course came to mind to achieve an O(NlogN) solution.

Hashing or other not-in-place solutions are not an option, because I am not able to duplicate my socks (though it could be nice if I could).

So, the question is basically:

Given a pile of `n` pairs of socks, containing `2n` elements (assume each sock has exactly one matching pair), what is the best way to pair them up efficiently with up to logarithmic extra space? (I believe I can remember that amount of info if needed.)

I will appreciate an answer that addresses the following aspects:

• A general theoretical solution for a huge number of socks.
• The actual number of socks is not that large, I don't believe my spouse and I have more than 30 pairs. (And it is fairly easy to distinguish between my socks and hers; can this be used as well?)
• Is it equivalent to the element distinctness problem?

405I use pigeon hole principle to pair exactly one from the laundry pile. I have 3 different colors of socks (Red,Blue and Green) and 2 pairs of each color. I pick up 4 number of socks each time and I always make up a pair and get to work. – Srinivas – 2013-01-19T15:37:34.983

2I would think you could simplify it somewhat by assuming that some subset of pairs of socks are fungible; I have about six pairs of socks that are identical. Also: neat question! – Dave – 2013-01-19T15:37:45.150

1Pick your favourite ordering principle (colour, texture, thickness), sort by that, pick adjacent pairs – sehe – 2013-01-19T15:42:31.183

52Yet another pigeon hole principle: if you take a subset of n/2 +1 socks, there must be at least one pair in this subset. – wildplasser – 2013-01-19T15:57:54.087

6also, you discard hashes beccause you cant make copies, but note that its easily possible to make a hash set that doesnt need copies, both in computing and in laundry. – Mooing Duck – 2013-01-19T17:42:42.563

2@MooingDuck: If you have something specific in mind, please post it, note that I do not "discard it" - this is only my initial thaughts - that might be wrong, the question itself does not forbid hashing, it only requires the algorithm to be in-place (and efficient). As said, I will also appreciate an answer that deals with the other aspects (small/large scale, and equivalence to the distinctness problem - that will show O(nlogn) is basically the best I can get without extra space) – amit – 2013-01-19T17:54:19.250

6

remark that if you had an infinite number of different symmetric pairs of socks (i.e. left=right), then, in fact, just picking out one of each is not possible unless you accept the axiom of choice (http://en.wikipedia.org/wiki/Axiom_of_choice). What consequence does this have on pairing? how is this relevant to computability? http://math.stackexchange.com/questions/269902/what-is-the-relationship-between-zfc-and-turing-machine

– thang – 2013-01-20T04:52:15.153

1

By the way, this is better known (sort of) as the game of Concentration) - making pairs from a large random group.

– Izkata – 2013-01-20T06:13:47.860

2A good algorithm to do by hand might be a shift-reduce algorithm. Add a new sock to the pile randomly and as soon as you have a pair on top, "reduce" and take the pair away. Eventually you'll get most of them. Remainder can be manually handled or algorithm restarted. Not computer science in the slightest but reasonably good since you can cheat a bit and pick up the right sock sometimes. – doug65536 – 2013-01-20T06:56:45.130

34

Great question! You might be interested in my article on a related problem, which is a discussion of the probability of pulling two matched socks out of the pile: http://blogs.msdn.com/b/ericlippert/archive/2010/03/22/socks-birthdays-and-hash-collisions.aspx

– Eric Lippert – 2013-01-20T19:08:45.353

Pigeonhole principle : http://en.wikipedia.org/wiki/Pigeonhole_principle

– Nabil Kadimi – 2013-01-21T20:34:28.120

9Add an extra pointer to the sock class that will point to the sock's pair. – AJMansfield – 2013-07-11T20:49:21.137

266Why not spawn a child and `waitpid` so that, as the parent, you're not even sorting any socks yourself? – Mxyk – 2013-09-06T16:48:54.320

101I solved this problem by only owning white knee-high socks. They all match. I could simply grab any two socks at random from the pile and they would match. I further simplify the problem by NOT pairing the socks. I have a sock drawer that I simply throw all my socks into, unpaired. I grab two at random from the drawer every morning. I've simplified it down to O(0). Can't get any simpler than that. :) – Lee – 2013-09-06T20:18:28.037

14This is O(1) anyway - there's a constant limit to the number of socks that will fit in any particular washing machine or sock drawer. – Steve314 – 2013-09-07T01:51:31.923

1@Steve314 So parallelize and use multiple sock drawers. – Thomas – 2013-09-14T04:07:38.053

5If you'd be able to duplicate socks, "hashing" wouldn't be the most efficient answer ;) – sfussenegger – 2013-09-19T13:46:27.587

1

This paper discusses a related problem : http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.178.4654&rep=rep1&type=pdf

– Erel Segal-Halevi – 2013-10-16T12:20:55.043

1

Here is another related link: http://www.mail-archive.com/[email protected]/msg00084.html

– Erel Segal-Halevi – 2013-10-16T12:26:50.103

10I know this doesn't answer your question in a theoretical sense, but practically speaking, nobody's looking at your socks. Not pairing them is O(k). – sqykly – 2013-11-10T11:25:32.130

7Laundry day is boring as it is. To eliminate the drag of sock sorting you should be doing it lazily: pour all (unmatched) socks in drawer, each morning pick two that look (kind of) the same. – Marcus – 2013-12-06T11:36:48.900

1I think in this case avoidance is the best solution: I have only one type of socks and therefore it's O(n) and requires a minimum of memory... :-) – Francois Bourgeois – 2013-12-24T14:24:29.693

14This question is chock full of engineering tunnel vision. Not everything's a nail. Sorting real life socks is very dependent on the performance limitations of the human visual cognitive system and her manipulators. We can, to an extent, pattern-match socks using our parallel vision processing (visual cortex FTW). We can also, to an extent, do motion planning in parallel with acquiring the next pair of socks to pick out of the pile. The theoretic description of the algorithmic complexity of real life sock searching is nothing like what you describe. – Kuba Ober – 2014-03-25T21:05:03.443

8IOW: Real-life sock sorts are completely insensitive to what underlying algorithm you use, and you can't really choose an algorithm because your visual and motor cortex has already chosen one for you. An optimal one, as far as I can tell. The manipulation time trumps everything else. My finding is (and damn I've spent a couple of hours recording myself and analyzing the recordings): you can easily saturate your manipulators, and that's the end of it. All you need is a sufficiently big table. The CS stuff comes in only if your load doesn't fit on one table. – Kuba Ober – 2014-03-25T21:05:56.270

@thang, you don't need to assume the Axiom of Choice if the number of socks is countable. – Apprentice Queue – 2014-04-02T23:21:06.587

@ApprenticeQueue say you're right, what would the choice function look like? I think that you are (provably) wrong, but to give a hint, have a look at this: http://en.wikipedia.org/wiki/Axiom_of_countable_choice (a weaker version of AC). To prove that even axiom of countable choice is independent of ZF is difficult and uses forcing.

– thang – 2014-04-03T08:38:15.763

1You can save memory by immediately stuffing the pile from the basket to the drawer instead of spreading them out on a surface (to find matching pairs). Just look for a matching pair from the drawer when you need it. – user1164937 – 2014-04-30T03:13:19.507

9I am sure someone has already posted this, but you are inferring from the incorrect assumption that there is a pair for each sock. I mean - come on, do you really think that each sock that you are laundering has a matching one... – Vasil Dininski – 2014-04-30T22:33:42.937

3The optimum solution to the sock pairing problem is to put them into the laundry as a pair. This reduces sort time to zero and resolves the single sock problem. – Keith – 2014-05-02T06:12:27.043

@amit you wrote using your naive technique you were doing This requires iterating over n/2 n/4 = n2/8 socks on average.*. What is the reasoning for this? How are the terms n/2 and n/4 appearing? – Geek – 2014-05-07T15:02:45.187

@Geek You need to pair n/2 socks (this is where n/2 came from). by taking one sock, and finding its match. On average, you need to go through half of the left socks. At the first iteration this number is n/2, at second, (n/2)/2, ... the average of n/2,(n-2)/2,(n-4)/2,...,2/2 is n/4. – amit – 2014-05-07T15:38:12.210

3The other problem that everyone else seemed to overlook is the problem of finding a sock's exact match (not just same color and size). I always keep socks with their respective partners so that they wear the same, so if I don't find the exact sock I have always worn with the other sock, they will feel different (unacceptable to my OCD programmer mind) even though they originated from the exact same package. – cnsumner – 2014-07-07T20:25:15.940

5I have also solved the practical problem in O(1) time, by only buying black socks in packs of 50 pairs. It's interesting how many Stack Overflow posters have found the same solution! – Persixty – 2014-11-19T19:29:18.260

3How do you account for the sock without its pair that the dryer monster ate? – woot – 2015-02-24T05:07:57.817

My solution is to only have 1 type of socks of 1 color. This way I can pick any two and they always match. – user985366 – 2017-03-05T14:17:01.283

For socks, I think using a bucket sort would be the most efficient method. have your pile in one spot and your buckets in another. pull a sock, put it in a new pile of socks that match that sock. as long as you have enough table space, you can sort your socks in O(n) time, which is the lower limit, if you're doing this yourself. adding extra workers allows you to push towards O(log n) – Chris Phillips – 2017-09-29T13:23:33.700

your assumption about matching pairs is soooo removed from reality. even with two `parallel.task(child).waitpid.all` the performance gain the accepted answer promises is outweighed by the exponential decline of the number of matching specimen in the pile over time. – dlatikay – 2018-01-22T22:03:07.570

1I do not have any matching pairs of socks. When you put a set of matching pairs of socks into the washing machine, what comes out is a random number of socks (usually odd), none of which match each other. This is a well-known property of washing machines. – Michael Kay – 2018-03-05T15:26:45.410

## Answers

2 210

Sorting solutions have been proposed, but sorting is a little too much: We don't need order; we just need equality groups.

So hashing would be enough (and faster).

1. For each color of socks, form a pile. Iterate over all socks in your input basket and distribute them onto the color piles.
2. Iterate over each pile and distribute it by some other metric (e.g. pattern) into the second set of piles
3. Recursively apply this scheme until you have distributed all socks onto very small piles that you can visually process immediately

This kind of recursive hash partitioning is actually being done by SQL Server when it needs to hash join or hash aggregate over huge data sets. It distributes its build input stream into many partitions which are independent. This scheme scales to arbitrary amounts of data and multiple CPUs linearly.

You don't need recursive partitioning if you can find a distribution key (hash key) that provides enough buckets that each bucket is small enough to be processed very quickly. Unfortunately, I don't think socks have such a property.

If each sock had an integer called "PairID" one could easily distribute them into 10 buckets according to `PairID % 10` (the last digit).

The best real-world partitioning I can think of is creating a rectangle of piles: one dimension is color, the other is the pattern. Why a rectangle? Because we need O(1) random-access to piles. (A 3D cuboid would also work, but that is not very practical.)

Update:

What about parallelism? Can multiple humans match the socks faster?

1. The simplest parallelization strategy is to have multiple workers take from the input basket and put the socks onto the piles. This only scales up so much - imagine 100 people fighting over 10 piles. The synchronization costs (manifesting themselves as hand-collisions and human communication) destroy efficiency and speed-up (see the Universal Scalability Law!). Is this prone to deadlocks? No, because each worker only needs to access one pile at a time. With just one "lock" there cannot be a deadlock. Livelocks might be possible depending on how the humans coordinate access to piles. They might just use random backoff like network cards do that on a physical level to determine what card can exclusively access the network wire. If it works for NICs, it should work for humans as well.
2. It scales nearly indefinitely if each worker has its own set of piles. Workers can then take big chunks of socks from the input basket (very little contention as they are doing it rarely) and they do not need to synchronise when distributing the socks at all (because they have thread-local piles). At the end, all workers need to union their pile-sets. I believe that can be done in O(log (worker count * piles per worker)) if the workers form an aggregation tree.

What about the element distinctness problem? As the article states, the element distinctness problem can be solved in `O(N)`. This is the same for the socks problem (also `O(N)`, if you need only one distribution step (I proposed multiple steps only because humans are bad at calculations - one step is enough if you distribute on `md5(color, length, pattern, ...)`, i.e. a perfect hash of all attributes)).

Clearly, one cannot go faster than `O(N)`, so we have reached the optimal lower bound.

Although the outputs are not exactly the same (in one case, just a boolean. In the other case, the pairs of socks), the asymptotic complexities are the same.

62This is exactly what I do! I make piles dependent on the style of the opening of the sock (I only have white), that gives me enough "buckets" to quickly match each of those up. – Scott Chamberlain – 2013-01-20T00:00:45.880

1Whether sorting is beneficial or not is dependent on the problem domain, specifically on the topography of the phase space of the primary grouping criterion. As a counterexample to your statement, almost all my socks are in some variation of black, so the primary grouping criterion is shape instead - and a quick way to determine a black sock's shape is to examine the length. In this case, I would argue, sorting adds to the efficiency. – mikołak – 2013-01-20T11:26:23.270

1I addressed parallelism and the element distinctness problem. – usr – 2013-01-20T11:45:14.870

1@TheTerribleSwiftTomato I believe sorting by length would only help here because humans are not good at "hashing on length". One could form piles for each millimeter of length, though. Each pile is addressable in `O(1)` so our distribution scheme still works and should be faster than sorting (actually, we are almost sorting but only by accident). – usr – 2013-01-20T11:46:57.660

@usr: well, I would say it's because a) humans put a lot of emphasis on ordering (as e.g. the Ultimatum Game experiments show), b) humans have very powerful, parallelized optical recognition abilities - seeing a row of socks pulled out of a pile, sorted by length, you probably would be immediately able to place the next sock out of the pile in the "right" spot. The problem is, the question as currently posted lacks clarity in one regard: do we treat humans as actual biological entities (with all the possible optimizations), or as bio-robots without regard to their capabilities? – mikołak – 2013-01-20T12:05:09.727

How many times will you have to "Recursively apply this scheme" "until you have distributed all socks onto very small piles that you can visually process immediately"? I would calculate that as O(n log n). If my calculation is correct, how is this better than sorting? – emory – 2013-01-20T12:44:55.620

@emory read the paragraph about the "element distinctness problem". I address these concerns there. One might well need only one step of distribution.; Asymptotic complexity aside, I'd argue that distribution onto piles is easier for humans than sorting along some arbitrary dimension and order ("Is red smaller than blue? Don't remember anymore..."). – usr – 2013-01-20T12:59:54.490

24I've tried this with my socks (I've got easily 30+ pairs) and man it is FAST. One problem I've found is when I can't have a good enough hash algorithm (I've got lots of white socks without any pattern) so it becomes hard. In that case, what would be the optimal way to do it? – NothingsImpossible – 2013-01-20T14:14:40.937

50@NothingsImpossible that's how hash collision attacks feel like for a poor web-server! Are the white socks distinguishable by some attribute? There must be something you can distribute them on. Otherwise, you could just form pairs arbitrarily. – usr – 2013-01-20T14:22:03.417

6@usr They are distinguishable but not on a fast glance. Comparing with file hashing, it is like the colored socks are a 100KB file and the white ones are 1GB files, it takes much longer to analyze. What I do on practice is to use different socks anyway, because nobody will notice it. – NothingsImpossible – 2013-01-20T14:29:45.073

1Whether this satisfies the logarithmic space limit depends on the quality of the hash function. Let's say there is a perfect hash...it would take O(n) memory to remember which trait is associated with which bucket. This isn't an issue with an integer hash as you don't need to keep a table of values to buckets (you just apply some arithmetic) but it's an issue here. If you don't keep a lookup table in memory you need to do a scan of all the buckets, which (again, depending on the quality if the hash) could be no better than just scanning the pile. – Mark Peters – 2013-01-20T17:32:16.060

2@MarkPeters good point. One could argue that we are using the human brain as an associative array: we can immediately spot the appropriate pile that has a certain color. In that sense the brain is a parallel hardware machine. – usr – 2013-01-20T18:12:29.923

For white tube socks, another frequent differentiator is the toe - different colored thread, an embroidered word, etc. – GalacticCowboy – 2013-01-20T19:01:13.500

31This is a Radix Sort, which I agree is the right answer. @MarkPeters I don't think you need a lookup table. A single linear pass over the socks can convert the socks to number vectors, making the mapping of "sock segment" to bucket trivial. The socks can be tied to the vectors with string so that you don't need another linear pass at the end. – Pointy – 2013-01-20T21:31:36.190

(@MarkPeters though I guess during that pass you'd need a mapping mechanism of some sort.) – Pointy – 2013-01-20T21:32:12.897

cryptographic hash functions such as md5 are unnecessary here. – Janus Troelsen – 2013-01-21T13:51:38.030

@JanusTroelsen true. It was an example for a practically perfect hash function. Any good hash function would do, but I had to name one for clarity. – usr – 2013-01-21T14:06:13.103

I assumed everybody did it like this. Of course, I normally break the loop after discovering the first pair, since I'm lucky if sock draw filling results in a set of just socks. – Jodrell – 2013-01-22T14:31:47.927

38A guy I went to college with actually had PairIDs. It was sewn on each pair of socks with thread: 1, 2, 3, 4... – Ryan Lundy – 2013-01-24T22:51:29.457

2For a huge number of socks this might be the solution but it would probably be overkill to keep piling recursively. Given that the piling process will be much more difficult each time you try to distinguish socks in a pile since the difference would not be immediate for the human eye/brain. So i would think just one or two rounds of piling would be enough to start matching in a pile. – harun – 2013-01-25T13:29:54.873

7Or you could do what I do and buy 100% of the same sock, and then you can guarantee O(1) time for search AND sort. – davethegr8 – 2013-09-06T17:24:47.043

`If each sock had an integer called "PairID" one could easily distribute them into 10 buckets according to […] the last digit` My first thought was to buy a big blacklight (pun not intended), and make markings on all the socks with that dye that can only be seen under ultraviolet light, such that any two socks with the same marking match. – Blacklight Shining – 2014-01-23T21:58:54.453

I think this solution glosses over some harder questions... – mattismyname – 2014-02-08T08:59:37.823

5Sigh. When you have socks spread out on a table, you don't actually need to do anything special. With good lighting and good vision, you instantly pick out pairs. It's completely automatic, and our brain processes the image of each sock in parallel and also indicates the pair that has highest certainity of being matched. If the table is sufficiently large, you may need a couple of visual scan targets (pegs sticking out from a workbench, say) to enforce a visual scanning pattern that goes over the entire table. The time cost of this is proportional in number of socks. – Kuba Ober – 2014-03-25T21:13:36.700

+1 For the parallel solution, you can create a thread for each Color with its own list, if they go in sync, they can read the same data-structure with no wait – Khaled.K – 2014-05-25T08:37:03.637

@Pointy: This technically is not hashing or radix sort but bucket sort (which just basically means a generalization of radix sort.) http://en.wikipedia.org/wiki/Bucket_sort

– ldog – 2014-06-27T06:49:54.133

About parallelism, I think we could also introduce thread affinity , so the involved multiple workers are in fact the different sock owners. Each of them should be more efficient at selecting their own socks, provided their tastes are not similar. – rupps – 2014-12-07T02:05:13.233

@davethegr8 but pairing would still be `O(1)` :-) – k_g – 2015-04-03T01:44:27.383

1@usr I am going to ask this question to become un-community wiki. I'll be glad if you can clean it up a bit if it got any messy during the time it was community wiki. Thanks. – amit – 2015-05-01T12:58:31.727

4"you instantly pick out pairs" -- Sigh indeed. No, it is obviously not "instant", and the brain's processing time goes up non-linearly with the number of socks ... as does the time to physically reach random positions on the table. Anyone who has actually sorted socks knows that it goes much faster if you use a number piles that you can quickly reach, and avoid random access. – Jim Balter – 2015-05-27T12:17:21.197

I can't believe you can score 19K on SO answering a question about socks – Mick – 2016-08-31T01:02:39.553

@Mick Isn't http://stackoverflow.com/questions/927358/how-to-undo-last-commits-in-git more egregious? :) Btw, the daily gain is capped at 200 points.

– usr – 2016-08-31T13:17:55.717

I have many very similar looking socks, so I can't sort them on a color/pattern gradient just by looking at one, I have to compare it to others. Would this method still work for me? – Fabian Röling – 2017-07-21T11:25:03.070

@Fabian no, hashing requires an equality comparison. You could make it a little easier by sorting the socks by some continuous attribute such as primary color. You could put the socks into an R-tree if you want to make use of continuous attributes. That might make sense if you have > 1 million socks. – usr – 2017-07-26T15:55:15.763

1I am color blind... "Step 1 - for each color..." Oh, crud! – AgapwIesu – 2017-08-04T14:12:51.570

@usr oh my! this is how a `HashMap` works in java pretty much... – Eugene – 2018-03-08T21:06:40.777

Note there is a SIMD instruction for when all workers need to union their pile-sets, assuming you have a fairly even floor to slide the sock piles together. – TamaMcGlinn – 2018-11-02T12:12:43.657

528

As the architecture of the human brain is completely different than a modern CPU, this question makes no practical sense.

Humans can win over CPU algorithms using the fact that "finding a matching pair" can be one operation for a set that isn't too big.

My algorithm:

``````spread_all_socks_on_flat_surface();
while (socks_left_on_a_surface()) {
// Thanks to human visual SIMD, this is one, quick operation.
pair = notice_any_matching_pair();
remove_socks_pair_from_surface(pair);
}
``````

At least this is what I am using in real life, and I find it very efficient. The downside is it requires a flat surface, but it's usually abundant.

200as the number of socks increases, human's SIMD become no better than a CPU. – Lie Ryan – 2013-01-20T15:42:16.030

20The best answer, IMO. While it's fun and clever (and appropriate for SO) to reduce a day-to-day problem to a computer algorithm, it makes much more sense to use the resolution power of man's eye/brain for a set as small as ~60 socks. – drug_user841417 – 2013-01-20T18:27:06.010

11@LieRyan If the socks are uniformily distributed, you will end up noticing a pair in any sufficiently small set of socks due to the birthday paradox (unless you can distinguish colors to arbitrary precision, which I doubt) so the bottleneck here wouldn't be the human color matching algorithm but the spreading step. – Thomas – 2013-01-21T13:21:08.027

2@Christian: Same for me. Therefore i always use the "deal with it" algorithm: It's the only usable one in this situation... – Askaga – 2013-01-21T14:30:59.563

4@Christian: If you only have black socks, wouldn't any sock match any other sock? ;) – dpc.pw – 2013-01-21T16:06:46.360

3Yep. At the very least, after you're matched all the readily identifiable pairs, you're left with a much smaller number. I don't think I have any actual pairs left, so I tend to just iterate on "find two socks that look less dissimilar than the average random pairing." – Steve Bennett – 2013-01-22T05:30:31.930

5I disagree to the claim that the brain is completely different architecture. The `"finding a matching pair" can be one operation for a set that isn't too big.` is equivalent to searching in a machine cache - which is MUCH faster then RAM look-up, but you are going to load these socks into "cache" first, and thus the suggested approach will still be `O(n^2)` both for human and machine. – amit – 2013-01-22T06:18:06.663

10@dpc.ucore.info No, because they have different woven cuff patterns, cuff lengths, overall lengths and shades of black (my wife would probably physically hurt me for that last one). – Christian – 2013-01-22T15:07:10.860

2@Christian Even among my black socks I can easily identify differences in length, patterns and shades to find matching pairs. It's not as easy as noticing my only pair of bright orange socks, but it's still quite easy. I always spread my socks so they point in the same direction side by side, this seems to make it easier to locate pairs and to pair them. – Annan – 2013-01-22T23:05:12.883

3@Annan Well, lucky you then, because I can't :) – Christian – 2013-01-23T11:22:27.053

172You had better hope you have an even number of socks, otherwise you are going to be folding socks for a long time... – Patrick James McDougle – 2013-01-23T15:14:17.427

3Similarly, selection sort is faster than merge sort for sufficiently small samples – Brian Webster – 2013-01-25T18:15:06.060

1Funny, I had read that question when it popped in news hacker, with no answers yet. I was pairing my socks, and - giving a little attention at how I managed to be 'efficient' - came to the same conclusion! Cooooooooooool ! ;-) – Vinzzz – 2013-01-31T19:53:46.040

I don't think you need to make your horizontal surface single-layered, and thus prohibitively large, given a large pile of randomly-assorted socks. Nor is preemptively sorting necessary, since through the recursive SIMD matching process you will be sorting. If you have a pile, you are likely to have at least one match visible. When you no longer have a visible match, remove items from the pile into another of its 'type'. You will eventually find another match visible on the pile, or will have made one when transferring to the bespoke 'type' pile. How fast is this? that's for a mathematician – JoeBrockhaus – 2013-02-13T21:03:26.273

That'd work for a hundred socks. Now try and scale that up to a few million. – Matt – 2013-11-24T23:10:09.383

@Christian If you have only black socks, matching socks is not a problem. – OJFord – 2014-03-19T19:55:55.563

@OllieFord You honestly have no idea :) – Christian – 2014-03-19T22:28:41.610

1@Christian You need good, task-specific lighting, probably at an oblique angle so that the patterns stand out more. Machine vision people should chime in here with real-life recommendations. – Kuba Ober – 2014-03-25T21:17:27.427

2@KubaOber An easier solution overall might be to just replace the pairs of Sock objects by instances of a WashablePair class. There are socks that come with built-in buttons to enable this but there are also solutions to retrofit conventional socks with the ability to remain paired throughout the washing process. I just personally never got around to refactor my sock collection. – Christian – 2014-03-25T22:14:36.183

1I suggest changing this to `while(pair = notice_any_matching_pair()) {` for a failsafe in case no match can be found (i.e. if there is no match, or if the number of socks that can fit in memory is not enough to find a match). – Brilliand – 2014-04-15T21:46:19.177

2There are some serious limitations to flat-surface-visual-scan performance. If there are a lot of relatively unique pairs (no plain white socks here), you quickly exceed the visual scan capability of the human brain. Requiring more brain attention for matching means you can't watch TV while you pair your socks. Someone else mentioned the practical problem of finding a large enough flat surface. Finally, as the area of the surface increases, you start having problems with distance and visual angle when viewing socks closer to the horizon, and physically traversing your sort area. – Mnebuerquo – 2014-04-29T15:55:17.203

1spreading out the socks takes a lot of space and the question says only logarithmic space – Akshay L Aradhya – 2014-08-01T04:24:04.020

Is it only me who finds this got all the humour votes..or at best, the pseudocode ones..? – Nirav Bhatt – 2015-10-07T10:39:30.447

yes, actually pattern matching takes O(1) with human eyes if N is small. – ramgorur – 2016-06-03T05:08:06.397

231

Case 1: All socks are identical (this is what I do in real life by the way).

Pick any two of them to make a pair. Constant time.

Case 2: There are a constant number of combinations (ownership, color, size, texture, etc.).

Use radix sort. This is only linear time since comparison is not required.

Case 3: The number of combinations is not known in advance (general case).

We have to do comparison to check whether two socks come in pair. Pick one of the `O(n log n)` comparison-based sorting algorithms.

However in real life when the number of socks is relatively small (constant), these theoretically optimal algorithms wouldn't work well. It might take even more time than sequential search, which theoretically requires quadratic time.

I think strictly speaking radix sort won't do because "assume each sock has exactly one matching pair." Though in most practical applications it would work out well. – emory – 2013-01-20T03:08:41.753

7> It might take even more time than sequential search, which requires quadratic time in theory.

Yeah that is why I hate doing this, maybe I should throw away all my socks and start with case 1. – Nils – 2013-01-21T12:19:11.650

4I find it easier to have all the same socks as well. Every couple years I buy 10 of those 6 packs of socks when they are on sale and throw out all my old socks. Just easier to match identical socks and they look better then old holy socks. With this, just a simple first off the top of the pile is the fastest for me. – Michael D. Kirkpatrick – 2013-01-21T13:59:07.307

43the down side of having all identical socks is that they tend to age at different rates. So you still end up trying to match them based on how worn they are. (which is harder than simply matching by pattern) – SDC – 2013-01-22T13:45:32.520

89The problem with having 60 pairs of identical socks "because it makes pairing easier" is that it gives people the impression you work with computers. – Steve Ives – 2013-04-24T12:17:16.240

9Case 1 is not constant time when there's an operation involved, such as folding pairs together. In this case, it's linear time with the smallest constant factor (the proof of which is left as an exercise for the reader). One can't possibly take the same time folding one pair and a bucket full of socks. However, it scales linearly. By Amdahl's law, it has unlimited speedup, ignoring overhead. By Gustafson's law, you can fold as many pairs as it takes to fold one pair given enough workers (the amount of which is left as an exercise for the reader), ignoring overhead. – acelent – 2013-09-04T13:00:06.357

5@PauloMadeira The sorting is constant time - you just take the pile and put it in your drawer. The only operation in this case is actually putting the socks on your feet which is also constant. Performance is gained by deferred execution of the sock wearing, possibly with some sacrifice in space (consumed space of non-folded socks is larger than folded). I argue that this is worth it; I usually lose this argument with my wife. – Travis – 2013-09-12T19:59:21.690

1@Travis, you are correct. But the problem states pairing socks (possibly resulting in a pile of folded/stuffed socks), not sorting a pile of socks (resulting in just another pile). That's what I wanted to point out. – acelent – 2013-09-13T20:38:22.340

4@TerryLi Case 4: you have all the same socks, your wife has a lot of different ones. Solution: make 2 piles, use 'case 1' for the pile of your socks, let your wife deal with her own mess. That's basically how I do it :-) – atlaste – 2014-02-11T09:34:14.013

144

Non-algorithmic answer, yet "efficient" when I do it:

• step 1) discard all your existing socks

• step 2) go to Walmart and buy them by packets of 10 - n packet of white and m packets of black. No need for other colors in everyday's life.

Yet times to times, I have to do this again (lost socks, damaged socks, etc.), and I hate to discard perfectly good socks too often (and I wished they kept selling the same socks reference!), so I recently took a different approach.

Algorithmic answer:

Consider than if you draw only one sock for the second stack of socks, as you are doing, your odds of finding the matching sock in a naive search is quite low.

• So pick up five of them at random, and memorize their shape or their length.

Why five? Usually humans are good are remembering between five and seven different elements in the working memory - a bit like the human equivalent of a RPN stack - five is a safe default.

• Pick up one from the stack of 2n-5.

• Now look for a match (visual pattern matching - humans are good at that with a small stack) inside the five you drew, if you don't find one, then add that to your five.

• Keep randomly picking socks from the stack and compare to your 5+1 socks for a match. As your stack grows, it will reduce your performance but raise your odds. Much faster.

Feel free to write down the formula to calculate how many samples you have to draw for a 50% odds of a match. IIRC it's an hypergeometric law.

I do that every morning and rarely need more than three draws - but I have `n` similar pairs (around 10, give or take the lost ones) of `m` shaped white socks. Now you can estimate the size of my stack of stocks :-)

BTW, I found that the sum of the transaction costs of sorting all the socks every time I needed a pair were far less than doing it once and binding the socks. A just-in-time works better because then you don't have to bind the socks, and there's also a diminishing marginal return (that is, you keep looking for that two or three socks that when somewhere in the laundry and that you need to finish matching your socks and you lose time on that).

20Upvote for 'non-algorithmic' answer. This is exactly what I do and it works wonderfully. The replacement issue is not a problem if you 'rotate' your sock stock by placing washed socks in back and pulling from the front of the drawer in the morning. All socks wear evenly. When I start noticing some wear on one, I put on the shopping list to completely replace that entire class of socks. For the old socks, I give the best 20% to Goodwill (tied in a grocery sac so they don't get mixed back in) and pitch the rest. You're not wasting socks, at this point, the 80% only have 6 months left anyway. – FastAl – 2013-01-22T16:51:03.640

1BTW (1) Binding your socks results in the elastic one one being stored stretched and the will fail much more quickly. Limiting the kinds of unique socks you have makes binding unneded. (2) A disadvantage of limiting unique socks is that for people with certain fashion concerns, the method may be unsuitable. – FastAl – 2013-01-22T16:53:36.540

2I came here specifically to post your "non-algorithmic" answer. As in true computer science, most people never pay enough attention to the data and its structure. – bkconrad – 2013-09-06T23:39:31.573

I use this algorithmic approach every morning and it works like a charm! Additionally, I put worn out socks to a different pile to throw away later (unfortunately they manage to get to the original pile again before I find the time to trash it). – Donatas Olsevičius – 2013-09-11T13:35:41.107

in my case, I don't need step (1), as they automatically discard themselves, unfortunately not in pairs. Socks tend to disappear at a constant rate, normally about 2 months are enough to lose track of 10 pairs. – rupps – 2014-12-07T01:54:46.027

A lot of people tend to discount the non-algorithmic answer as a joke. "Sure, it works for socks, but this is really a discussion about programming and you're dodging the question."

But I think it's still relevant.

In my experience building actual software products for customers, if someone spends days working on a problem that looks like it came out of a college textbook, there's almost always a simpler (non-algorithmic) solution they're overlooking. – DrShaffopolis – 2015-04-23T13:52:11.713

2«n packet of white and m packets of black. No need for other colors in everyday's life» A good standard rule for easy sock selection is actually that they should match either the color of your trousers or the color of your belt. For this reason, the most commonly used colors will likely be black, blue, gray and some brown. It's hard to believe one needs many white socks. – Andrea Lazzarotto – 2015-12-13T14:46:52.850

92

What I do is that I pick up the first sock and put it down (say, on the edge of the laundry bowl). Then I pick up another sock and check to see if it's the same as the first sock. If it is, I remove them both. If it's not, I put it down next to the first sock. Then I pick up the third sock and compare that to the first two (if they're still there). Etc.

This approach can be fairly easily be implemented in an array, assuming that "removing" socks is an option. Actually, you don't even need to "remove" socks. If you don't need sorting of the socks (see below), then you can just move them around and end up with an array that has all the socks arranged in pairs in the array.

Assuming that the only operation for socks is to compare for equality, this algorithm is basically still an n2 algorithm, though I don't know about the average case (never learned to calculate that).

Sorting, of course improves efficiency, especially in real life where you can easily "insert" a sock between two other socks. In computing the same could be achieved by a tree, but that's extra space. And, of course, we're back at NlogN (or a bit more, if there are several socks that are the same by sorting criteria, but not from the same pair).

Other than that, I cannot think of anything, but this method does seem to be pretty efficient in real life. :)

6This is also what I do, (note that if you simply leave spaces then inserts are also O(1) ), but it scales poorly with theoretically large numbers of socks. – Mooing Duck – 2013-01-19T17:40:42.800

14scales poorly with theoretically large numbers of types of socks – Steven Lu – 2013-01-20T05:14:19.320

@StevenLu - as I said - it's n*n or nLogn, depending on whether you sort it or not. So it scales about as poorly as any sorting algorithm. If you want faster, number them and use radix sort. – Vilx- – 2013-01-20T09:34:20.663

This is essentially storing found-but-not-matched socks in a hash-based lookup. With an ideal hash it is O(n), but if you've enough socks stored that the hash begins to degenerate, it becomes more complex accordingly. – Jon Hanna – 2013-01-20T14:34:54.103

I cannot imagine sorting, which sock is "bigger" or "smaller" than other? :D – graywolf – 2013-01-20T17:50:23.583

@Paladin - sew a unique number on each pair. :) – Vilx- – 2013-01-21T09:21:15.717

This method is what I use, because it makes use of a nice heuristic: Since the "pile" sock-structure has several top elements, it is possible to select the one sock at the top that is most likely to match any of your previously picked socks. – evilcandybag – 2013-01-21T10:56:32.400

I spread out the socks first and skip the hanging on the edge of the laundry bowl. But then I'm good at seeing the wood in the trees. – hplbsh – 2013-01-21T21:11:04.570

@lttlrck - that works if your socks are different enough that you can find some pairs in a glance. Mine mostly require deeper inspection, so spreading them out doesn't help much. – Vilx- – 2013-01-22T10:59:45.320

2what value does inserting a sock between 2 other socks provide to the goal of pairing socks? there is no cardinality of socks. :-x – JoeBrockhaus – 2013-02-13T21:06:20.653

@JoeBrockhaus - `Sorting, of course improves efficiency, especially in real life where you can easily "insert" a sock between two other socks.` What I mean is - if you can compare socks, then you can sort them, and insertion helps speed up sorting (the easiest sort for humans is Insertion Sort, which benefits greatly from this operation). – Vilx- – 2013-02-14T07:55:23.930

This is what I do as well, and since I only ever have 4 to 5 different types of socks, I can always quickly eyeball the unmatched socks, and so for all practically purposes, it functions as the ideal Hash @JonHanna described and scales as O(n). – Jacob Eggers – 2013-09-07T01:26:10.147

"Sorting, of course improves efficiency" -- No, it of course does the opposite. – Jim Balter – 2015-05-27T12:21:50.670

@JimBalter - Hm? I was comparing it to the naive algorithm where you compare each sock with each other sock. Shouldn't sorting improve that? – Vilx- – 2015-05-27T12:34:32.953

Classic strawman fallacy. – Jim Balter – 2015-05-27T12:36:15.550

@JimBalter - OK, wait, what? Let's analyze this. Suppose you have N pairs of socks (that's 2N socks). The worst case scenario is - the first N socks you pull are each of a different pair. So you've got N socks lined up and only the N+1 sock actuall removes something. In the naive scenario, the first sock would require 0 comparisons, the second 1 comparison, etc. until the Nth sock would require N-1 comparisons. Altogether that's N(N-1)/2 comparisons. For removing, the worst case is pretty much the same. So this algorithm is O(NN). – Vilx- – 2015-05-27T15:24:51.670

@JimBalter - OK, so now take sorting. Suppose each pair of socks has a number printed on both of them. The numbers are from 1 to N, of course. Now whenever I take a new sock out, I keep the line in front of me sorted. I can find the place to insert the new sock in O(logN), and I can insert in O(1). Since this is Real Life and I can just shove it in place while shifting all other socks at the same time. On a computer you could use a balanced tree to the same effect (trees are harder to work with in Real Life). – Vilx- – 2015-05-27T15:33:18.233

@JimBalter - All in all, taking the first N socks will be an O(NlogN) operation, and then taking the remaining socks too. So, as far as I can see, sorting DOES help - at least, from a theoretical standpoint. Did I make a mistake somewhere? – Vilx- – 2015-05-27T15:34:59.903

50

This is asking the wrong question. The right question to ask is, why am I spending time sorting socks? How much does it cost on yearly basis, when you value your free time for X monetary units of your choice?

And more often than not, this is not just any free time, it's morning free time, which you could be spending in bed, or sipping your coffee, or leaving a bit early and not being caught in the traffic.

It's often good to take a step back, and think a way around the problem.

And there is a way!

Find a sock you like. Take all relevant features into account: colour in different lighting conditions, overall quality and durability, comfort in different climatic conditions, and odour absorption. Also important is, they should not lose elasticity in storage, so natural fabrics are good, and they should be available in a plastic wrapping.

It's better if there's no difference between left and right foot socks, but it's not critical. If socks are left-right symmetrical, finding a pair is O(1) operation, and sorting the socks is approximate O(M) operation, where M is the number of places in your house, which you have littered with socks, ideally some small constant number.

If you chose a fancy pair with different left and right sock, doing a full bucket sort to left and right foot buckets take O(N+M), where N is the number of socks and M is same as above. Somebody else can give the formula for average iterations of finding the first pair, but worst case for finding a pair with blind search is N/2+1, which becomes astronomically unlikely case for reasonable N. This can be sped up by using advanced image recognition algorithms and heuristics, when scanning the pile of unsorted socks with Mk1 Eyeball.

So, an algorithm for achieving O(1) sock pairing efficiency (assuming symmetrical sock) is:

1. You need to estimate how many pairs of socks you will need for the rest of your life, or perhaps until you retire and move to warmer climates with no need to wear socks ever again. If you are young, you could also estimate how long it takes before we'll all have sock-sorting robots in our homes, and the whole problem becomes irrelevant.

2. You need to find out how you can order your selected sock in bulk, and how much it costs, and do they deliver.

3. Order the socks!

4. Get rid of your old socks.

An alternative step 3 would involve comparing costs of buying the same amount of perhaps cheaper socks a few pairs at a time over the years and adding the cost of sorting socks, but take my word for it: buying in bulk is cheaper! Also, socks in storage increase in value at the rate of stock price inflation, which is more than you would get on many investments. Then again there is also storage cost, but socks really do not take much space on the top shelf of a closet.

Problem solved. So, just get new socks, throw/donate your old ones away, and live happily ever after knowing you are saving money and time every day for the rest of your life.

A lifetime (assuming 75 years) supply of socks (assuming you exhaust 4 pairs/month, that makes 3600 pairs) would take up (assuming a new pair of socks takes up 20 cubic inches) a total of 1 1/2 cubic yards. That is an enormous amount of space. Assuming they deliver it to you in a box that is roughly a cube, that crate will be about 3 feet 4 inches on a side. – AJMansfield – 2013-07-11T20:42:21.013

1@AJMansfield valid concern. However, I disagree with a few of your numbers. I'd take a timespan of just 40 years (25...65) (time between not living at parents/dorm/etc and retiring, see above). Also, I think one pair takes more like 0,5x4x6 inches in original packaging. These numbers bring your space estime down quite a bit! – hyde – 2013-07-14T06:40:27.113

Step 4 is unnecessarily wasteful, -1. – Dan Bechard – 2013-11-04T15:49:02.547

1Guide for others who might be confused by AJMansfield's measurements, a translation into metric: »would take up (assuming a new pair of socks takes up 327 cm³) a total of 1.14 m³. That is an enormous amount of space. Assuming they deliver it to you in a box that is roughly a cube, that crate will be about 1.04 m on a side.« – Joey – 2013-11-28T10:51:38.263

47

The theoretical limit is O(n) because you need to touch each sock (unless some are already paired somehow).

You can achieve O(n) with radix sort. You just need to pick some attributes for the buckets.

1. First you can choose (hers, mine) - split them into 2 piles,
2. then use colors (can have any order for the colors, e.g. alphabetically by color name) - split them into piles by color (remember to keep the initial order from step 1 for all socks in the same pile),
3. then length of the sock,
4. then texture, ....

If you can pick a limited number of attributes, but enough attributes that can uniquely identify each pair, you should be done in O(k * n), which is O(n) if we can consider k is limited.

2Socks often come in 4-packs and larger, since that is cheaper, but that also makes them indistinguishable. To counter this, my wife sews a tiny mark onto each new pair of socks I buy. The mark is of a different color for each pair, or of a different shape, if she runs out of colors. With this approach you don't even need a limited set of attributes. Just sew a unique number on each pair. :) For extra points, use binary. – Vilx- – 2013-01-19T21:45:58.807

26@Vilx- WHY?!? Isn't the whole point that they be indistinguishable? – flup – 2013-01-20T11:10:28.870

2@flup - I think the whole point is to sell in larger bundles. :) As for me, this helps to wear them down in pairs. Otherwise I can end up with three very worn socks and one brand new one. Kinda silly. – Vilx- – 2013-01-20T12:39:20.687

12I disagree with the calculation of O(n). What is \$k\$? \$k\$ is the number of attributes. I would argue \$k\$ is \$O(log n)\$ because it has to be enough to uniquely identify each pair. If you have 2 pairs (black and white), then color (\$k=1, n=2\$) is enough. If you have one pair of black, short; one pair of black, long; one pair of white, short; and one pair of white, long - then \$k=2, n=4\$. Then if we limit \$k\$, we at the same time limit \$n\$. If we are going to limit \$n\$ then order calculation does not make sense anymore. – emory – 2013-01-20T13:25:31.980

2@emory, I think that you're looking for the backtick, not the `\$` character, to make your stuff look code-y. – Xymostech – 2013-01-21T02:46:28.733

1

You're abusing Landau notation... Big Oh is used for the upper bound: see Family of Bachmann-Landau notations. Also: "You can achieve O(n) with radix sort.". O(n) for what? Building? Insertion?

– Janus Troelsen – 2013-01-21T13:47:14.560

@Xymostech you are correct. It does not seem I can edit my comment to correct that mistake. – emory – 2013-01-22T19:10:59.197

@Xymostech LaTeX tags anyone? StackOverflow really should have them, they do come in really handy sometimes. – Thomas – 2013-01-26T07:06:40.700

1For emory: I disagree with the calculation of O(n). What is `k`? `k` is the number of attributes. I would argue `k` is `O(log n)` because it has to be enough to uniquely identify each pair. If you have 2 pairs (black and white), then color (`k=1, n=2`) is enough. If you have one pair of black, short; one pair of black, long; one pair of white, short; and one pair of white, long - then `k=2, n=4`. Then if we limit `k`, we at the same time limit `n`. If we are going to limit `n` then order calculation does not make sense anymore. – Jim Balter – 2013-04-01T19:01:25.927

1@Vilx- But how do you deal with lost socks? When I have one sock that wears out, I just throw it away and throw the other into the laundry, hoping that it will match up with some other orphaned socks. – Jacob Eggers – 2013-09-07T01:27:46.997

@JacobEggers - I always throw out the whole pair. Arguably suboptimal, yes. – Vilx- – 2013-09-07T17:53:43.087

@Xymostech: Some SE sites (like Mathematics) let you use the `\$` to mark up LaTeX. – rvighne – 2014-04-13T05:07:54.037

"As for me, this helps to wear them down in pairs. Otherwise I can end up with three very worn socks and one brand new one. Kinda silly." -- What's "kinda silly" is that you get the same amount of total wear time either way, so it's stupid to sew tags into identical socks to distinguish them ... and if the point is to pair socks of equal degrees of wear, you can do that visually (to the precision needed). "I always throw out the whole pair." -- More stupidity stemming from the stupid tagging. – Jim Balter – 2015-05-27T12:25:35.390

31

As a practical solution:

1. Quickly make piles of easily distinguishable socks. (Say by color)
2. Quicksort every pile and use the length of the sock for comparison. As a human you can make a fairly quick decision which sock to use to partition that avoids worst case. (You can see multiple socks in parallel, use that to your advantage!)
3. Stop sorting piles when they reached a threshold at which you are comfortable to find spot pairs and unpairable socks instantly

If you have 1000 socks, with 8 colors and an average distribution, you can make 4 piles of each 125 socks in c*n time. With a threshold of 5 socks you can sort every pile in 6 runs. (Counting 2 seconds to throw a sock on the right pile it will take you little under 4 hours.)

If you have just 60 socks, 3 colors and 2 sort of socks (yours / your wife's) you can sort every pile of 10 socks in 1 runs (Again threshold = 5). (Counting 2 seconds it will take you 2 min).

The initial bucket sorting will speed up your process, because it divides your n socks into k buckets in `c*n` time so than you will only have to do `c*n*log(k)` work. (Not taking into account the threshold). So all in all you do about `n*c*(1 + log(k))` work, where c is the time to throw a sock on a pile.

This approach will be favourable compared to any `c*x*n + O(1)` method roughly as long as `log(k) < x - 1`.

In computer science this can be helpful: We have a collection of n things, an order on them (length) and also an equivalence relation (extra information, for example the color of socks). The equivalence relation allows us to make a partition of the original collection, and in every equivalence class our order is still maintained. The mapping of a thing to it's equivalence class can be done in O(1), so only O(n) is needed to assign each item to a class. Now we have used our extra information and can proceed in any manner to sort every class. The advantage is that the data sets are already significantly smaller.

The method can also be nested, if we have multiple equivalence relations -> make colour piles, than within every pile partition on texture, than sort on length. Any equivalence relation that creates a partition with more than 2 elements that have about even size will bring a speed improvement over sorting (provided we can directly assign a sock to its pile), and the sorting can happen very quickly on smaller data sets.

2Human optimisation: I'd argue that as a human, for step 2, you should plonk the socks down in roughly ascending order, then repeat with finer and finer granularity until sorted, a bit like shell sort. This would be much faster for a human (visual estimation) than a comparison-swap based approach. – AndrewC – 2013-01-21T01:16:43.070

25

This question is actually deeply philosophical. At heart it's about whether the power of people to solve problems (the "wetware" of our brains) is equivalent to what can be accomplished by algorithms.

An obvious algorithm for sock sorting is:

``````Let N be the set of socks that are still unpaired, initially empty
for each sock s taken from the dryer
if s matches a sock t in N
remove t from N, bundle s and t together, and throw them in the basket
else
add s to N
``````

Now the computer science in this problem is all about the steps

1. "if s pairs with a sock t in N". How quickly can we "remember" what we've seen so far?
2. "remove t from N" and "add s to N". How expensive is keeping track of what we've seen so far?

Human beings will use various strategies to effect these. Human memory is associative, something like a hash table where feature sets of stored values are paired with the corresponding values themselves. For example, the concept of "red car" maps to all the red cars a person is capable of remembering. Someone with a perfect memory has a perfect mapping. Most people are imperfect in this regard (and most others). The associative map has a limited capacity. Mappings may bleep out of existence under various circumstances (one beer too many), be recorded in error ("I though her name was Betty, not Nettie"), or never be overwritten even though we observe that the truth has changed ("dad's car" evokes "orange Firebird" when we actually knew he'd traded that in for the red Camaro).

In the case of socks, perfect recall means looking at a sock `s` always produces the memory of its sibling `t`, including enough information (where it is on the ironing board) to locate `t` in constant time. A person with photographic memory accomplishes both 1 and 2 in constant time without fail.

Someone with less than perfect memory might use a few commonsense equivalence classes based on features within his capability to track: size (papa, mama, baby), color (greenish, redish, etc.), pattern (argyle, plain, etc.), style (footie, knee-high, etc.). So the ironing board would be divided into sections for the categories. This usually allows the category to be located in constant time by memory, but then a linear search through the category "bucket" is needed.

Someone with no memory or imagination at all (sorry) will just keep the socks in one pile and do a linear search of the whole pile.

A neat freak might use numeric labels for pairs as someone suggested. This opens the door to a total ordering, which allows the human to use exactly the same algorithms we might with a CPU: binary search, trees, hashes, etc.

So the "best" algorithm depends on the qualities of the wetware/hardware/software that is running it and our willingness to "cheat" by imposing a total order on pairs. Certainly a "best" meta-algorithm is to hire the worlds best sock-sorter: a person or machine that can aquire and quickly store a huge set N of sock attribute sets in a 1-1 associative memory with constant time lookup, insert, and delete. Both people and machines like this can be procured. If you have one, you can pair all the socks in O(N) time for N pairs, which is optimal. The total order tags allow you to use standard hashing to get the same result with either a human or hardware computer.

Ok, that's better, although it's still quite wrong ... this question is not about that. Whether or not the Church-Turing thesis is correct, both humans and our computers can sort socks. (The reality is that, humans, being highly finite entities, have far less computational power than Turing Machines ... and the same is true of our computers, but the limitations are different.) – Jim Balter – 2013-04-01T23:52:46.267

I disagree. Of course any of our current computers is essentially and enormous DFA (modulo i/o differences) rather than a TM. Any analog device, however, such as our bodies, is capable of emulating an infinite tape. We don't yet have a useful characterization of the way our minds compute. – Gene – 2013-04-02T01:10:09.570

No infinite tape for humans or other physical devices because nothing in the human brain has infinite resolution, nor could it. It would also help to learn some neuroscience. In any case, there was no deep philosophical question here, regardless of your desire to inject one. But believe what you will ... this isn't the place for this sort of debate and I've had it too many times before. But I'm always amused by people who can barely solve the simplest problems (that's all of us) imagining that they are TM-equivalent. – Jim Balter – 2013-04-02T02:43:23.090

24

You are trying to solve the wrong problem.

Solution 1: Each time you put dirty socks in your laundry basket, tie them in a little knot. That way you will not have to do any sorting after the washing. Think of it like registering an index in a Mongo database. A little work ahead for some CPU savings in the future.

Solution 2: If it's winter, you don't have to wear matching socks. We are programmers. Nobody needs to know, as long as it works.

Solution 3: Spread the work. You want to perform such a complex CPU process asynchronously, without blocking the UI. Take that pile of socks and stuff them in a bag. Only look for a pair when you need it. That way the amount of work it takes is much less noticeable.

Hope this helps!

2Tying socks (or any clothes) in a knot reduces the capability of the washer to wash the clothes, and makes untying them to wear much more difficult. Solution 2 makes maintenance more difficult the longer the state of affairs progresses; after 6 months, when you need two black ankle socks to wear with a pair of shorts and sneakers, 6 months of doing whatever works is going to make finding that pair in the same condition (dirty/clean, similar wear) much less likely. Solution 3 is less "asynchronous" and more straight-up "lazy"; do the minimum work you need exactly when you need to. – KeithS – 2017-04-19T22:15:55.267

20

Here's an Omega(n log n) lower bound in comparison based model. (The only valid operation is comparing two socks.)

Suppose that you know that your 2n socks are arranged this way:

p1 p2 p3 ... pn pf(1) pf(2) ... pf(n)

where f is an unknown permutation of the set {1,2,...,n}. Knowing this cannot make the problem harder. There are n! possible outputs (matchings between first and second half), which means you need log(n!) = Omega(n log n) comparisons. This is obtainable by sorting.

Since you are interested in connections to element distinctness problem: proving the Omega(n log n) bound for element distinctness is harder, because the output is binary yes/no. Here, the output has to be a matching and the number of possible outputs suffices to get a decent bound. However, there's a variant connected to element distinctness. Suppose you are given 2n socks and wonder if they can be uniquely paired. You can get a reduction from ED by sending (a1, a2, ..., an) to (a1, a1, a2, a2, ..., an, an). (Parenthetically, the proof of hardness of ED is very interesting, via topology.)

I think that there should be an Omega(n2) bound for the original problem if you allow equality tests only. My intuition is: Consider a graph where you add an edge after a test, and argue that if the graph is not dense the output is not uniquely determined.

19

Cost: Moving socks -> high, finding/search socks in line -> small

What we want to do is reduce the number of moves, and compensate with the number of searches. Also, we can utilize the multithreded environment of the Homo Sapiens to hold more things in the descision cache.

X = Yours, Y = Your spouses

From pile A of all socks:

Pick two socks, place corresponding X sock in X line, and Y sock in Y line at next available position.

Do until A is empty.

For each line X and Y

1. Pick the first sock in line, search along the line until it finds the corresponding sock.

2. Put into the corresponding finished line of socks.

3. Optional While you are searching the line and and the current sock you are looking at is identical to the previous, do step 2 for these socks.

Optionally to step one, you pick up two sock from that line instead of two, as the caching memory is large enough we can quickly identify if either sock matches the current one on the line you are observing. If you are fortunate enough to have three arms, you could possibly parse three socks at the same time given that the memory of the subject is large enough.

Do until both X and Y is empty.

Done

However, as this have simillar complexity as selection sort, the time taken is far less due to the speeds of I/O(moving socks) and search(searching the line for a sock).

17

This is how I actually do it, for p pairs of socks (n = 2p individual socks):

• Grab a sock at random from the pile.
• For the first sock, or if all previously-chosen socks have been paired, simply place the sock into the first "slot" of an "array" of unpaired socks in front of you.
• If you have one or more selected unpaired socks, check your current sock against all the unpaired socks in the array.
• It is possible to separate socks into general classes or types (white/black, ankle/crew, athletic/dress) when building your array, and "drill-down" to only compare like-for-like.
• If you find an acceptable match, put both socks together and remove them from the array.
• If you do not, put the current sock into the first open slot in the array.
• Repeat with every sock.

The worst-case scenario of this scheme is that every pair of socks is different enough that it must be matched exactly, and that the first n/2 socks you pick are all different. This is your O(n2) scenario, and it's extremely unlikely. If the number of unique types of sock t is less than the number of pairs p = n/2, and the socks in each type are alike enough (usually in wear-related terms) that any sock of that type can be paired with any other, then as I inferred above, the maximum number of socks you will ever have to compare to is t, after which the next one you pull will match one of the unpaired socks. This scenario is much more likely in the average sock drawer than the worst-case, and reduces the worst-case complexity to O(n*t) where usually t << n.

1This is probably pretty close to my mental process. I have an added layer of pre-sort optimization. My athletic socks get washed with the whites and my dress socks get washed with colors. This means that as long as I don't dump two loads of laundry together, my socks are already grouped by type. The white load goes really fast (many identical socks) but the dress socks take longer. Other key tip--make more available memory for the sort (fold and remove all non-socks first and THEN run the pairing algorithm) – orh – 2013-01-23T22:39:05.803

16

Real world approach:

As rapidly as possible, remove socks from the unsorted pile one at a time and place in piles in front of you. The piles should be arranged somewhat space-efficiently, with all socks pointing the same direction; the number of piles is limited by the distance you can easily reach. The selection of a pile on which to put a sock should be -- as rapidly as possible -- by putting a sock on a pile of apparently like socks; the occasional type I (putting a sock on a pile it doesn't belong to) or type II (putting a sock in its own pile when there's an existing pile of like socks) error can be tolerated -- the most important consideration is speed. Once all the socks are in piles, rapidly go through the multi-sock piles creating pairs and removing them (these are heading for the drawer). If there are non-matching socks in the pile, re-pile them to their best (within the as-fast-as-possible constraint) pile. When all the multi-sock piles have been processed, match up remaining pairable socks that weren't paired due to type II errors. Whoosh, you're done -- and I have a lot of socks and don't wash them until a large fraction are dirty. Another practical note: I flip the top of one of a pair of socks down over the other, taking advantage of their elastic properties, so they stay together while being transported to the drawer and while in the drawer.

14

From your question it is clear you don't have much actual experience with laundry :). You need an algorithm that works well with a small number of non-pairable socks.

The answers till now don't make good use of our human pattern recognition capabilities. The game of Set provides a clue of how to do this well: put all socks in a two-dimensional space so you can both recognize them well and easily reach them with your hands. This limits you to an area of about 120 * 80 cm or so. From there select the pairs you recognize and remove them. Put extra socks in the free space and repeat. If you wash for people with easily recognizable socks (small kids come to mind), you can do a radix sort by selecting those socks first. This algorithm works well only when the number of single socks is low

That is usually how I do it. Works much better than iterating through all the remaining socks each time. – yu_ominae – 2013-01-22T23:46:13.280

Nice approach and I think it can be applied to some real CS problems as well. Can you please add an example of such (a CS problem where we could use a similar approach to solve problems)? Also, how does this solution scales for millions of socks? – amit – 2013-01-23T12:08:44.953

I think this is basically th same as the other answer here, http://stackoverflow.com/a/14423956, from Jan 20. Both +1. Human vision system is massively parallel.

– Will Ness – 2013-02-07T16:44:32.073

13

Pick up a first sock and place it on a table. Now pick another sock; if it matches the first picked, place it on top of the first. If not, place it on the table a small distance from the first. Pick a third sock; if it matches either of the previous two, place it on top of them or else place it a small distance from the third. Repeat until you have picked up all the socks.

1This is the only valid answer. All the others disregard the fact that the most time is spent distinguishing between similar socks (so lumping them all together by physical appearance makes it even worse). – entonio – 2014-05-05T09:43:09.283

For fun I wrote this method of piling socks up into a little python program https://gist.github.com/justinfay/53b574cf0a492f6795ef

– justinfay – 2014-07-04T08:05:03.350

12

``````List<Sock> UnSearchedSocks = getAllSocks();
List<Sock> UnMatchedSocks = new list<Sock>();
List<PairOfSocks> PairedSocks = new list<PairOfSocks>();

foreach (Sock newSock in UnsearchedSocks)
{
Sock MatchedSock = null;
foreach(Sock UnmatchedSock in UnmatchedSocks)
{
if (UnmatchedSock.isPairOf(newSock))
{
MatchedSock = UnmatchedSock;
break;
}
}
if (MatchedSock != null)
{
UnmatchedSocks.remove(MatchedSock);
PairedSocks.Add(new PairOfSocks(MatchedSock, NewSock));
}
else
{
UnmatchedSocks.Add(NewSock);
}
}
``````

12

I came out with another solution which would not promise fewer operations, neither less time consumption, but it should be tried to see if it can be a good-enough heuristic to provide less time consumption in huge series of sock pairing.

Preconditions: There is no guarantee that there are the same socks. If they are of the same color it doesn't mean they have the same size or pattern. Socks are randomly shuffled. There can be odd number of socks (some are missing, we don't know how many). Prepare to remember a variable "index" and set it to 0.

The result will have one or two piles: 1. "matched" and 2. "missing"

Heuristic:

1. Find most distinctive sock.
2. Find its match.
3. If there is no match, put it on the "missing" pile.
4. Repeat from 1. until there are no more most distinctive socks.
5. If there are less then 6 socks, go to 11.
6. Pair blindly all socks to its neighbor (do not pack it)
7. Find all matched pairs, pack it and move packed pairs to "matched" pile; If there were no new matches - increment "index" by 1
8. If "index" is greater then 2 (this could be value dependent on sock number because with greater number of socks there are less chance to pair them blindly) go to 11
9. Shuffle the rest
10. Go to 1
11. Forget "index"
12. Pick a sock
13. Find its pair
14. If there is no pair for the sock, move it to the "missing" pile
15. If match found pair it, pack pair and move it to the "matched" pile
16. If there are still more then one socks go to 12
17. If there is just one left go to 14
18. Smile satisfied :)

Also, there could be added check for damaged socks also, as if the removal of those. It could be inserted between 2 and 3, and between 13 and 14.

I'm looking forward to hear about any experiences or corrections.

After I wrote this, I use it every time. It helped me to became a bit more efficient and the job is less boring now. – Sasa – 2015-02-12T09:48:34.523

12

In order to say how efficient it is to pair socks from a pile, we have to define the machine first, because the pairing isn't done whether by a turing nor by a random access machine, which are normally used as the basis for an algorithmic analysis.

# The machine

The machine is an abstraction of a the real world element called human being. It is able to read from the environment via a pair of eyes. And our machine model is able to manipulate the environment by using 2 arms. Logical and arithmetic operations are calculated using our brain (hopefully ;-)).

We also have to consider the intrinsic runtime of the atomic operations that can be carried out with these instruments. Due to physical constraints, operations which are carried out by an arm or eye have non constant time complexity. This is because we can't move an endlessly large pile of socks with an arm nor can an eye see the top sock on an endlessly large pile of socks.

However mechanical physics give us some goodies as well. We are not limited to move at most one sock with an arm. We can move a whole couple of them at once.

So depending on the previous analysis following operations should be used in descending order:

• logical and arithmetic operations
• environmental reads
• environmental modifications

We can also make use of the fact that people only have a very limited amount of socks. So an environmental modification can involve all socks in the pile.

# The algorithm

So here is my suggestion:

1. Spread all socks in the pile over the floor.
2. Find a pair by looking at the socks on the floor.
3. Repeat from 2 until no pair can be made.
4. Repeat from 1 until there are no socks on the floor.

Operation 4 is necessary, because when spreading socks over the floor some socks may hide others. Here is the analysis of the algorithm:

# The analysis

The algorithm terminates with high probability. This is due to the fact that one is unable to find pairs of socks in step number 2.

For the following runtime analysis of pairing `n` pairs of socks, we suppose that at least half of the `2n` socks aren't hidden after step 1. So in the average case we can find `n/2` pairs. This means that the loop is step 4 is executed `O(log n)` times. Step 2 is executed `O(n^2)` times. So we can conclude:

• The algorithm involves `O(ln n + n)` environmental modifications (step 1 `O(ln n)` plus picking every pair of sock from the floor)
• The algorithm involves `O(n^2)` environmental reads from step 2
• The algorithm involves `O(n^2)` logical and arithmetic operations for comparing a sock with another in step 2

So we have a total runtime complexity of `O(r*n^2 + w*(ln n + n))` where `r` and `w` are the factors for environmental read and environmental write operations respectively for a reasonable amount of socks. The cost of the logical and arithmetical operations are omitted, because we suppose that it takes a constant amount of logical and arithmetical operations to decide whether 2 socks belong to the same pair. This may not be feasible in every scenario.

1

this is the same as http://stackoverflow.com/a/14423956 and http://stackoverflow.com/a/14468913 I think.

– Will Ness – 2013-02-07T16:54:19.827

@WillNess Yep, with a bit more of explanation – SpaceTrucker – 2013-02-07T19:19:52.797

10

Socks, whether real ones or some analogous data structure, would be supplied in pairs.

The simplest answer is prior to allowing the pair to be separated, a single data structure for the pair should have been initialized that contained a pointer to the left and right sock, thus enabling socks to be referred to directly or via their pair. A sock may also be extended to contain a pointer to its partner.

This solves any computational pairing problem by removing it with a layer of abstraction.

Applying the same idea to the practical problem of pairing socks, the apparent answer is: don't allow your socks to ever be unpaired. Socks are provided as a pair, put in the drawer as a pair (perhaps by balling them together), worn as a pair. But the point where unpairing is possible is in the washer, so all that's required is a physical mechanism that allows the socks to stay together and be washed efficiently.

There are two physical possibilities:

For a 'pair' object that keeps a pointer to each sock we could have a cloth bag that we use to keep the socks together. This seems like massive overhead.

But for each sock to keep a reference to the other, there is a neat solution: a popper (or a 'snap button' if you're American), such as these:

http://www.aliexpress.com/compare/compare-invisible-snap-buttons.html

Then all you do is snap your socks together right after you take them off and put them in your washing basket, and again you've removed the problem of needing to pair your socks with a physical abstraction of the 'pair' concept.

It does not answer the question, because handling with already paired data is easy, the question is what to do when the data is UNPAIRED and you want to pair it. – amit – 2015-05-01T12:56:35.467

10

When I sort socks, I do an approximate radix sort, dropping socks near other socks of the same colour/pattern type. Except in the case when I can see an exact match at/near the location I'm about to drop the sock I extract the pair at that point.

Almost all the other algorithms (including the top scoring answer by usr) sort, then remove pairs. I find that, as a human, it is better to minimize the number of socks being considered at one time.

I do this by:

1. Picking a distinctive sock (whatever catches my eye first in the pile).
2. Starting a radix sort from that conceptual location by pulling socks from the pile based on similarity to that one.
3. Place the new sock near into the current pile, with a distance based on how different it is. If you find yourself putting the sock on top of another because it is identical, form the pair there, and remove them. This means that future comparisons take less effort to find the correct place.

This takes advantage of the human ability to fuzzy-match in O(1) time, which is somewhat equivalent to the establishment of a hash-map on a computing device.

By pulling the distinctive socks first, you leave space to "zoom" in on the features which are less distinctive, to begin with.

After eliminating the fluro coloured, the socks with stripes, and the three pairs of long socks, you might end up with mostly white socks roughly sorted by how worn they are.

At some point, the differences between socks are small enough that other people won't notice the difference, and any further matching effort is not needed.

I didn't answer, just edited the answer. Please correct. – Peter Mortensen – 2013-10-30T19:33:18.640

9

Whenever you pick up a sock, put it in one place. Then the next sock you pick up, if it doesn't match the first sock, set it beside the first one. If it does, there's a pair. This way it doesn't really matter how many combinations there are, and there are only two possibilities for each sock you pick up -- either it has a match that's already in your array of socks, or it doesn't, which means you add it to a place in the array.

This also means that you will almost certainly never have all your socks in the array, because socks will get removed as they're matched.

This is what I do ... O(n) – Pykler – 2013-01-20T05:55:13.610

2@Pykler - It's O(n) in the best case and O(n*n) in the worst case. – Vilx- – 2013-01-20T12:41:35.083

2Thats assuming that you cannot create a fully unique hash in your mind of all the socks you already seen, which for me is a O(1) to match a sock that I have seen and previously and placed in the waiting for matching hash – Pykler – 2013-01-21T02:22:06.203

9

Consider a hash-table of size 'N'.

If we assume normal distribution, then the estimated number of 'insertions' to have atleast one sock mapped to one bucket is NlogN (ie, all buckets are full)

I had derived this as a part of another puzzle,but I would be happy to be proven wrong. Here's my blog article on the same

Let 'N' correspond to an approximate upper-bound on the number of number of unique colors/pattern of socks that you have.

Once you have a collision(a.k.a : a match) simply remove that pair of socks. Repeat the same experiment with the next batch of NlogN socks. The beauty of it is that you could be making NlogN parallel comparisons(collision-resolution) because of the way the human mind works. :-)

8

I hope I can contribute something new to this problem. I noticed that all of the answers neglect the fact that there are two points where you can perform preprocessing, without slowing down your overall laundry performance.

Also, we don't need to assume a large number of socks, even for large families. Socks are taken out of the drawer and are worn, and they are tossed in a place (maybe a bin) where they stay before being laundered. While I wouldn't call said bin a LIFO-Stack, I'd say it is safe to assume that

1. people toss both of their socks roughly in the same area of the bin,
2. the bin is not randomized at any point, and therefore
3. any subset took from the top of this bin generally contains both socks of a pair.

Since all washing machines I know about are limited in size (regardless of how many socks you have to wash), and the actual randomizing occurs in the washing machine, no matter how many socks we have, we always have small subsets which contain almost no singletons.

Our two preprocessing stages are "putting the socks on the clothesline" and "Taking the socks from the clothesline", which we have to do, in order to get socks which are not only clean but also dry. As with washing machines, clotheslines are finite, and I assume that we have the whole part of the line where we put our socks in sight.

Here's the algorithm for put_socks_on_line():

``````while (socks left in basket) {
take_sock();
if (cluster of similar socks is present) {
Add sock to cluster (if possible, next to the matching pair)
} else {
Hang it somewhere on the line, this is now a new cluster of similar-looking socks.
Leave enough space around this sock to add other socks later on
}
}
``````

Don't waste your time moving socks around or looking for the best match, this all should be done in O(n), which we would also need for just putting them on the line unsorted. The socks aren't paired yet, we only have several similarity clusters on the line. It's helpful that we have a limited set of socks here, as this helps us to create "good" clusters (for example, if there are only black socks in the set of socks, clustering by colours would not be the way to go)

Here's the algorithm for take_socks_from_line():

``````while(socks left on line) {
take_next_sock();
if (matching pair visible on line or in basket) {
Take it as well, pair 'em and put 'em away
} else {
put the sock in the basket
}
``````

I should point out that in order to improve the speed of the remaining steps, it is wise not to randomly pick the next sock, but to sequentially take sock after sock from each cluster. Both preprocessing steps don't take more time than just putting the socks on the line or in the basket, which we have to do no matter what, so this should greatly enhance the laundry performance.

After this, it's easy to do the hash partitioning algorithm. Usually, about 75% of the socks are already paired, leaving me with a very small subset of socks, and this subset is already (somewhat) clustered (I don't introduce much entropy into my basket after the preprocessing steps). Another thing is that the remaining clusters tend to be small enough to be handled at once, so it is possible to take a whole cluster out of the basket.

Here's the algorithm for sort_remaining_clusters():

``````while(clusters present in basket) {
Take out the cluster and spread it
Process it immediately
Leave remaining socks where they are
}
``````

After that, there are only a few socks left. This is where I introduce previously unpaired socks into the system and process the remaining socks without any special algorithm - the remaining socks are very few and can be processed visually very fast.

For all remaining socks, I assume that their counterparts are still unwashed and put them away for the next iteration. If you register a growth of unpaired socks over time (a "sock leak"), you should check your bin - it might get randomized (do you have cats which sleep in there?)

I know that these algorithms take a lot of assumptions: a bin which acts as some sort of LIFO stack, a limited, normal washing machine, and a limited, normal clothesline - but this still works with very large numbers of socks.

About parallelism: As long as you toss both socks into the same bin, you can easily parallelize all of those steps.

Socks are only metaphor for pairing arbitrary objects in some database. – amit – 2015-05-01T12:44:27.230

No, you can clearly see from the questions and the answers that this question is in fact about socks. – Philipp Flenker – 2015-05-01T14:24:15.453

As author of the question, socks are metaphors. – amit – 2015-05-01T14:36:18.473

1Got it, didn't see that you are the author. If you wanted a generic solution, you should really have said so. Anyway, there is nothing wrong with taking any information you have into account, unless you have to come up with a general solution - giving up the reusability of the solution could result in considerably better performance. In this case, considering the use case and the available data base as a whole is be beneficial. However, this special answer to your special question has issues with similar-looking socks, e.g. black socks in different sizes, so it's not applicable in some cases. – Philipp Flenker – 2015-05-01T14:46:30.900

1Also, you did not get >2k upvotes because you asked a question about pairing arbitrary objects in the database. You specifically constrained the question due to the very nature of socks (which you cannot duplicate, as opposed to data), you even encouraged to use the fact that you can easily distinguish your socks from the socks of your spouse. If you ask a question about socks, don't expect the answers to be about databases ;-) – Philipp Flenker – 2015-05-01T15:13:30.333

I specifically asked in the first guideline for answers I am looking for a theoretical answer `A general theoretical solution for a huge number of socks.` This is the first guideline I wrote I expect an answer to address. So if you think this answer can scale for huge number of "socks", please provide information how./ – amit – 2015-05-01T15:29:24.813

1There are a few assumptions: a normal washing mashine, a, normal clothesline, and the fact that you toss both socks in the bin at the same time, which means that in most cases both socks are in the same machine, and the number of leftover socks to be sorted is therefore small. But since you really wanted an answer about storing arbitrary objects in the database, is it really useful discussing my solution any futher? – Philipp Flenker – 2015-05-01T15:40:50.347

I am trying to clean up the thread a bit, and would appritiate if you either (1) address the points addressed in the question, or (2) remove the answer. Of course, it's up to you - but it's a request to make sure the thread is not a complete mess. – amit – 2015-05-01T15:45:11.093

1As I said, I think that I addressed everything you asked for, except for the element distinctness problem, which has been answered by other people. I'm not trying to be a douche here, but I have put a lot of effort in this answer a while back, and am mildly disappointed that you now go through some of the answers and claim that they didn't answer the original question. Why don't you just leave the whole thread alone - it's still an interesting read, over 2 years after you asked it? – Philipp Flenker – 2015-05-01T16:03:48.497

8

If the "move" operation is fairly expensive, and the "compare" operation is cheap, and you need to move the whole set anyway, into a buffer where search is much faster than in original storage... just integrate sorting into the obligatory move.

I found integrating the process of sorting into hanging to dry makes it a breeze. I need to pick up each sock anyway, and hang it (move) and it costs me about nothing to hang it in a specific place on the strings. Now just not to force search of the whole buffer (the strings) I choose to place socks by color/shade. Darker left, brighter right, more colorful front etc. Now before I hang each sock, I look in its "right vicinity" if a matching one is there already - this limits "scan" to 2-3 other socks - and if it is, I hang the other one right next to it. Then I roll them into pairs while removing from the strings, when dry.

Now this may not seem all that different from "forming piles by color" suggested by top answers but first, by not picking discrete piles but ranges, I have no problem classifying whether "purple" goes to "red" or "blue" pile; it just goes between. And then by integrating two operations (hang to dry and sort) the overhead of sorting while hanging is like 10% of what separate sorting would be.

This approach has two other advantages: line-drying loses many fewer socks IME than the tumble dryer does, and the sort process can be extended to the rest of the laundry, so (e.g.) all towels are near each other to be folded off the line and binned and taken straight to their storage. It also works in two low-effort passes, putting the clothes up and taking them down again. – cphlewis – 2015-04-10T23:13:08.430

7

What about doing some preprocessing? I would stitch a mark or id number in every sock in a way that every pair has the same mark/id number. This process might be done every time you buy a new pair of socks. Then, you could do a radix sort to get O(n) total cost. Find a place for every mark/id number and just pick all the socks one by one and put them into the correct place.

7

I have taken simple steps to reduce my effort into a process taking O(1) time.

By reducing my inputs to one of two types of socks (white socks for recreation, black socks for work), I only need to determine which of two socks I have in hand. (Technically, since they are never washed together, I have reduced the process to O(0) time)

Some upfront effort is required to find desirable socks, and to purchase in sufficient quantity as to eliminate need for your existing socks. As I'd done this before my need for black socks, my effort was minimal, but mileage may vary.

Such an upfront effort has been seen many times in very popular and effective code. Examples include #DEFINE'ing pi to several decimals (other examples exist, but that's the one that comes to mind right now).

7

Create a hash table which will be used for unmatched socks, using the pattern as the hash. Iterate over the socks one by one. If the sock has a pattern match in the hash table, take the sock out of the table and make a pair. If the sock does not have a match, put it into the table.

How to do it not in-place, as specifically mentioned in the question? – amit – 2015-05-01T12:42:29.663

7

The problem of sorting your n pairs of socks is O(n). Before you throw them in the laundry basket, you thread the left one to the right one. On taking them out, you cut the thread and put each pair into your drawer - 2 operations on n pairs, so O(n).

Now the next question is simply whether you do your own laundry and your wife does hers. That is a problem likely in an entirely different domain of problems. :)

This does not answer the question, where the socks are only a metaphor. – amit – 2015-05-01T12:43:11.260

The question was how to pair the socks from an unpaired pile, not how to avoid needing to pair. – amit – 2015-05-02T10:08:25.953

7

I've finished pairing my socks just right now, and I found that the best way to do it is the following:

• Choose one of the socks and put it away (create a 'bucket' for that pair)
• If the next one is the pair of the previous one, then put it to the existing bucket, otherwise create a new one.

In the worst case it means that you will have n/2 different buckets, and you will have n-2 determinations about that which bucket contains the pair of the current sock. Obviously, this algorithm works well if you have just a few pairs; I did it with 12 pairs.

It is not so scientific, but it works well:)

This is still an O(n^2) algorithm since you have to iterate over each bucket whenever you pull out a new sock. But, considering the fact that even the socks bought within the same batch have minor differences which render them effectively pair-unique (or even single-unique), there's no better way anyway – Semisonic – 2018-01-24T17:04:24.557

Agree, but my algorithm is assuming that human is doing the pairing. Therefore, there will be a kind-of cache in your mind when you are searching for the matching bucket, so you don't really need to iterate over the buckets anyway. Not sure what kind of data structure is built for this caching mechanism in my head during the pairing. – maestro – 2018-01-24T18:06:50.670

7

My solution does not exactly correspond to your requirements, as it formally requires `O(n)` "extra" space. However, considering my conditions it is very efficient in my practical application. Thus I think it should be interesting.

## Combine with Other Task

The special condition in my case is that I don't use drying machine, just hang my cloths on an ordinary cloth dryer. Hanging cloths requires `O(n)` operations (by the way, I always consider bin packing problem here) and the problem by its nature requires the linear "extra" space. When I take a new sock from the bucket I to try hang it next to its pair if the pair is already hung. If its a sock from a new pair I leave some space next to it.

## Oracle Machine is Better ;-)

It obviously requires some extra work to check if there is the matching sock already hanging somewhere and it would render solution `O(n^2)` with coefficient about `1/2` for a computer. But in this case the "human factor" is actually an advantage -- I usually can very quickly (almost `O(1)`) identify the matching sock if it was already hung (probably some imperceptible in-brain caching is involved) -- consider it a kind of limited "oracle" as in Oracle Machine ;-) We, the humans have these advantages over digital machines in some cases ;-)

## Have it Almost `O(n)`!

Thus connecting the problem of pairing socks with the problem of hanging cloths I get `O(n)` "extra space" for free, and have a solution that is about `O(n)` in time, requires just a little more work than simple hanging cloths and allows to immediately access complete pair of socks even in a very bad Monday morning... ;-)

5

I thought about this very often during my PhD (in computer science). I came up with multiple solutions, depending on the ability to distinguish socks and thus find correct pairs as fast as possible.

Suppose the cost of looking at socks and memorizing their distinctive patterns is negligible (ε). Then the best solution is simply to throw all socks on a table. This involves those steps:

1. Throw all socks on a table (1) and create a hashmap {pattern: position} (ε)
2. While there are remaining socks (n/2):
1. Pick up one random sock (1)
2. Find position of corresponding sock (ε)
3. Retrieve sock (1) and store pair

This is indeed the fastest possibility and is executed in n + 1 = O(n) complexity. But it supposes that you perfectly remember all patterns... In practice, this is not the case, and my personal experience is that you sometimes don't find the matching pair at first attempt:

1. Throw all socks on a table (1)
2. While there are remaining socks (n/2):
1. Pick up one random sock (1)
2. while it is not paired (1/P):
1. Find sock with similar pattern
2. Take sock and compare both (1)
3. If ok, store pair

This now depends on our ability to find matching pairs. This is particularly true if you have dark/grey pairs or white sports socks that often have very similar patterns! Let's admit that you have a probability of P of finding the corresponding sock. You'll need, on average, 1/P tries before finding the corresponding sock to form a pair. The overall complexity is 1 + (n/2) * (1 + 1/P) = O(n).

Both are linear in the number of socks and are very similar solutions. Let's slightly modify the problem and admit you have multiple pairs of similar socks in the set, and that it is easy to store multiple pairs of socks in one move (1+ε). For K distinct patterns, you may implement:

1. For each sock (n):
1. Pick up one random sock (1)
2. Put it on its pattern's cluster
2. For each cluster (K):
1. Take cluster and store pairs of socks (1+ε)

The overall complexity becomes n+K = O(n). It is still linear, but choosing the correct algorithm may now greatly depend on the values of P and K! But one may object again that you may have difficulties to find (or create) cluster for each sock.

Besides, you may also loose time by looking on websites what is the best algorithm and proposing your own solution :)

1

My proposed solution assumes that all socks are identical in details, except by color. If there are more details to defer between socks, these details can be used to define different types of socks instead of colors in my example ..

Given that we have a pile of socks, a sock can come in three colors: Blue, red, or green.

Then we can create a parallel worker for each color; it has its own list to fill corresponding colors.

``````At time i:

Blue  read  Pile[i]    : If Blue  then Blue.Count++  ; B=TRUE  ; sync

Red   read  Pile[i+1]  : If Red   then Red.Count++   ; R=TRUE  ; sync

Green read  Pile [i+2] : If Green then Green.Count++ ; G=TRUE  ; sync
``````

With synchronization process:

``````Sync i:

i++

If R is TRUE:
i++
If G is TRUE:
i++
``````

This requires an initialization:

``````Init:

If Pile[0] != Blue:
If      Pile[0] = Red   : Red.Count++
Else if Pile[0] = Green : Green.Count++

If Pile[1] != Red:
If Pile[0] = Green : Green.Count++
``````

Where

``````Best Case: B, R, G, B, R, G, .., B, R, G

Worst Case: B, B, B, .., B

Time(Worst-Case) = C * n ~ O(n)

Time(Best-Case) = C * (n/k) ~ O(n/k)

n: number of sock pairs
k: number of colors
C: sync overhead
``````

1

# Towards an efficient algorithm for pairing socks from a pile

Preconditions

1. There must be at least one sock in the pile
2. The table must be large enough to accommodate N/2 socks (worst case), where N is the total number of socks.

Algorithm

Try:

1. Pick the first sock
2. Put it down on the table
3. Pick next sock, and look at it (may throw 'no more socks in the pile' exception)
4. Now scan the socks on the table (throws exception if there are no socks left on the table)
5. Is there any match? a) yes => remove the matching sock from the table b) no => put the sock on the table (may throw 'the table is not large enough' exception)

Except:

• The table is not large enough:
carefully mix all unpaired socks together, then resume operation
// this operation will result in a new pile and an empty table

• No socks left on the table:
throw (the last unpairable sock)

• No socks left in the pile:
exit laundry room

Finally:

• If there are still socks in the pile:
goto 3

Known issues

The algorithm will enter an infinite loop if there is no table around or there is not enough place on the table to accommodate at least one sock.

Possible improvement

Depending on the number of socks to be sorted, throughput could be increased by sorting the socks on the table, provided there's enough space.

In order for this to work, an attribute is needed that has a unique value for each pair of socks. Such an attribute can be easily synthesized from the visual properties of the socks.

Sort the socks on the table by said attribute. Let's call that attribute ' colour'. Arrange the socks in a row, and put darker coloured socks to the right (i.e. .push_back()) and lighter coloured socks to the left (i.e. .push_front())

For huge piles and especially previously unseen socks, attribute synthesis might require significant time, so throughput will apparently decrease. However, these attributes can be persisted in memory and reused.

Some research is needed to evaluate the efficiency of this possible improvement. The following questions arise:

• What is the optimal number of socks to be paired using above improvement?
• For a given number of socks, how many iterations are needed before throughput increases?
a) for the last iteration
b) for all iterations overall

PoC in line with the MCVE guidelines:

``````#include <iostream>
#include <vector>
#include <string>
#include <time.h>

using namespace std;

struct pileOfsocks {
pileOfsocks(int pairCount = 42) :
elemCount(pairCount<<1) {
srand(time(NULL));
socks.resize(elemCount);

vector<int> used_colors;
vector<int> used_indices;

auto getOne = [](vector<int>& v, int c) {
int r;
do {
r = rand() % c;
} while (find(v.begin(), v.end(), r) != v.end());
v.push_back(r);
return r;
};

for (auto i = 0; i < pairCount; i++) {
auto sock_color = getOne(used_colors, INT_MAX);
socks[getOne(used_indices, elemCount)] = sock_color;
socks[getOne(used_indices, elemCount)] = sock_color;
}
}

void show(const string& prompt) {
cout << prompt << ":" << endl;
for (auto i = 0; i < socks.size(); i++){
cout << socks[i] << " ";
}
cout << endl;
}

void pair() {
for (auto i = 0; i < socks.size(); i++) {
std::vector<int>::iterator it = find(unpaired_socks.begin(), unpaired_socks.end(), socks[i]);
if (it != unpaired_socks.end()) {
unpaired_socks.erase(it);
paired_socks.push_back(socks[i]);
paired_socks.push_back(socks[i]);
}
else
unpaired_socks.push_back(socks[i]);
}

socks = paired_socks;
paired_socks.clear();
}

private:
int elemCount;
vector<int> socks;
vector<int> unpaired_socks;
vector<int> paired_socks;
};

int main() {
pileOfsocks socks;

socks.show("unpaired socks");
socks.pair();
socks.show("paired socks");

system("pause");
return 0;
}
``````

1

Two lines of thinking, the speed it takes to find any match, versus the speed it takes to find all matches compared to the storage.

For the second case, I wanted to point out a GPU paralleled version which queries the socks for all matches.

If you have multiple properties for which to match, you can make use of grouped tuples and fancier zip iterators and the transform functions of thrust, for simplicity sake though here is a simple GPU based query:

``````//test.cu
#include <thrust/device_vector.h>
#include <thrust/sequence.h>
#include <thrust/copy.h>
#include <thrust/count.h>
#include <thrust/remove.h>
#include <thrust/random.h>
#include <iostream>
#include <iterator>
#include <string>

// Define some types for pseudo code readability
typedef thrust::device_vector<int> GpuList;
typedef GpuList::iterator          GpuListIterator;

template <typename T>
struct ColoredSockQuery : public thrust::unary_function<T,bool>
{
ColoredSockQuery( int colorToSearch )
{ SockColor = colorToSearch; }

int SockColor;

__host__ __device__
bool operator()(T x)
{
return x == SockColor;
}
};

struct GenerateRandomSockColor
{
float lowBounds, highBounds;

__host__ __device__
GenerateRandomSockColor(int _a= 0, int _b= 1) : lowBounds(_a), highBounds(_b) {};

__host__ __device__
int operator()(const unsigned int n) const
{
thrust::default_random_engine rng;
thrust::uniform_real_distribution<float> dist(lowBounds, highBounds);
rng.discard(n);
return dist(rng);
}
};

template <typename GpuListIterator>
void PrintSocks(const std::string& name, GpuListIterator first, GpuListIterator last)
{
typedef typename std::iterator_traits<GpuListIterator>::value_type T;

std::cout << name << ": ";
thrust::copy(first, last, std::ostream_iterator<T>(std::cout, " "));
std::cout << "\n";
}

int main()
{
int numberOfSocks = 10000000;
GpuList socks(numberOfSocks);
thrust::transform(thrust::make_counting_iterator(0),
thrust::make_counting_iterator(numberOfSocks),
socks.begin(),
GenerateRandomSockColor(0, 200));

clock_t start = clock();

GpuList sortedSocks(socks.size());
GpuListIterator lastSortedSock = thrust::copy_if(socks.begin(),
socks.end(),
sortedSocks.begin(),
ColoredSockQuery<int>(2));
clock_t stop = clock();

PrintSocks("Sorted Socks: ", sortedSocks.begin(), lastSortedSock);

double elapsed = (double)(stop - start) * 1000.0 / CLOCKS_PER_SEC;
std::cout << "Time elapsed in ms: " << elapsed << "\n";

return 0;
}

//nvcc -std=c++11 -o test test.cu
``````

Run time for 10 million socks: 9 ms

-5

We can use hashing to our leverage if you can abstract a single pair of socks as the key itself and its other pair as value.

1. Make two imaginary sections behind you on the floor, one for you and another for your spouse.

2. Take one from the pile of socks.

3. Now place the socks on the floor, one by one, following the below rule.

• Identify the socks as yours or hers and look at the relevant section on the floor.

• If you can spot the pair on the floor pick it up and knot them up or clip them up or do whatever you would do after you find a pair and place it in a basket (remove it from the floor).

• Place it in the relevant section.

4. Repeat 3 until all socks are over from the pile.

Explanation:

Hashing and Abstraction

Abstraction is a very powerful concept that has been used to improve user experience (UX). Examples of abstraction in real-life interactions with computers include:

• Folder icons used for navigation in a GUI (graphical user interface) to access an address instead of typing the actual address to navigate to a location.
• GUI sliders used to control various levels of volume, document scroll position, etc..

Hashing or other not-in-place solutions are not an option because I am not able to duplicate my socks (though it could be nice if I could).

I believe the asker was thinking of applying hashing such that the slot to which either pair of sock goes should be known before placing them.

That's why I suggested abstracting a single sock that is placed on the floor as the hash key itself (hence there is no need to duplicate the socks).

How to define our hash key?

The below definition for our key would also work if there are more than one pair of similar socks. That is, let's say there are two pairs of black men socks PairA and PairB, and each sock is named PairA-L, PairA-R, PairB-L, PairB-R. So PairA-L can be paired with PairB-R, but PairA-L and PairB-L cannot be paired.

Let say any sock can be uniqly identified by,

`Attribute[Gender] + Attribute[Colour] + Attribute[Material] + Attribute[Type1] + Attribute[Type2] + Attribute[Left_or_Right]`

This is our first hash function. Let's use a short notation for this `h1(G_C_M_T1_T2_LR)`. h1(x) is not our location key.

Another hash function eliminating the Left_or_Right attribute would be `h2(G_C_M_T1_T2)`. This second function h2(x) is our location key! (for the space on the floor behind you).

• To locate the slot, use h2(G_C_M_T1_T2).
• Once the slot is located then use h1(x) to check their hashes. If they don't match, you have a pair. Else throw the sock into the same slot.

NOTE: Since we remove a pair once we find one, it's safe to assume that there would only be at a maximum one slot with a unique h2(x) or h1(x) value.

In case we have each sock with exactly one matching pair then use h2(x) for finding the location and if no pair, a check is required, since it's safe to assume they are a pair.

Why is it important to lay the socks on the floor

Let's consider a scenario where the socks are stacked over one another in a pile (worst case). This means we would have no other choice but to do a linear search to find a pair.

Spreading them on the floor gives more visibility which improves the chance of spotting the matching sock (matching a hash key). When a sock was placed on the floor in step 3, our mind had subconsciously registered the location. - So in case, this location is available in our memory we can directly find the matching pair. - In case the location is not remembered, don't worry, then we can always revert back to linear search.

Why is it important to remove the pair from the floor?

• Short-term human memory works best when it has fewer items to remember. Thus increasing the probability of us resorting to hashing to spot the pair.
• It will also reduce the number of items to be searched through when linear searching for the pair is being used.

Analysis

1. Case 1: Worst case when Derpina cannot remember or spot the socks on the floor directly using the hashing technique. Derp does a linear search through the items on the floor. This is not worse than the iterating through the pile to find the pair.
• Upper bound for comparison: O(n^2).
• Lower bound for comparison: (n/2). (when every other sock Derpina picks up is the pair of previous one).
2. Case 2: Derp remembers each the location of every sock that he placed on the floor and each sock has exactly one pair.
• Upper bound for comparison: O(n/2).
• Lower bound for comparison: O(n/2).

I am talking about comparison operations, picking the socks from the pile would necessarily be n number of operations. So a practical lower bound would be n iterations with n/2 comparisons.

Speeding up things

To achieve a perfect score so Derp gets O(n/2) comparisons, I would recommend Derpina to,

• spend more time with the socks to get familiarize with it. Yes, that means spending more time with Derp's socks too.
• Playing memory games like spot the pairs in a grid can improve short-term memory performance, which can be highly beneficial.

Is this equivalent to the element distinctness problem?

The method I suggested is one of the methods used to solve element distinctness problem where you place them in the hash table and do the comparison.

Given your special case where there exists only one exact pair, it has become very much equivalent to the element distinct problem. Since we can even sort the socks and check adjacent socks for pairs (another solution for EDP).

However, if there is a possibility of more than one pair that can exist for given sock then it deviates from EDP.

2So, basically other then splitting the problem into 2 subproblems (without resplitting it again later on) - it offers to "cache" as much elements I can (the top of each "spot"), while piling it up, and repeat while there are still elements. Can you provide complexity analysis for it? My gut tells me it is going to be worse then O(n^2) at average case (though I cannot prove it yet), and you cannot bound the number of iterations you make. You are also going to need some randomization to guarantee you take the elements at different order each time. Or am I missing something here? – amit – 2013-01-21T15:07:40.800

worst case (assuming all pairs are men's and are different) would be n^2 and on the extreme other side the number of linear searches you would need would be n/2. I would improve my answer later today to explain how the iterations would be performed on reducing sets. – D34dman – 2013-01-22T14:42:18.753

@amit EDIT NOTE: originally i wanted to point out that hashing is possible. But due to human mind behavior being sporadic hashing is not totally reliable and thus a mixture of hashing and linear searching have been suggested. I am in favor of linear searching as against any other form of searching since it involves least stress on human mind. Since hashing method might prove quite stressful linear searching would be quite a relief. IMHO, Efficiency should be measured with respect to time required to complete this operation rather than iterations required. – D34dman – 2013-01-29T16:35:50.453