<?xml version='1.0' encoding='UTF-8'?><?xml-stylesheet href="http://www.blogger.com/styles/atom.css" type="text/css"?><feed xmlns='http://www.w3.org/2005/Atom' xmlns:openSearch='http://a9.com/-/spec/opensearchrss/1.0/' xmlns:georss='http://www.georss.org/georss' xmlns:gd='http://schemas.google.com/g/2005' xmlns:thr='http://purl.org/syndication/thread/1.0'><id>tag:blogger.com,1999:blog-5313430225166146055</id><updated>2011-07-07T17:15:49.889-07:00</updated><category term='math'/><category term='algorithms'/><title type='text'>Strange Quark</title><subtitle type='html'>Ruminations on computer science, philosophy, and mathematics.</subtitle><link rel='http://schemas.google.com/g/2005#feed' type='application/atom+xml' href='http://adrianquark.blogspot.com/feeds/posts/default'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5313430225166146055/posts/default?max-results=100'/><link rel='alternate' type='text/html' href='http://adrianquark.blogspot.com/'/><link rel='hub' href='http://pubsubhubbub.appspot.com/'/><author><name>Adrian Quark</name><uri>http://www.blogger.com/profile/08774936184238473409</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><generator version='7.00' uri='http://www.blogger.com'>Blogger</generator><openSearch:totalResults>1</openSearch:totalResults><openSearch:startIndex>1</openSearch:startIndex><openSearch:itemsPerPage>100</openSearch:itemsPerPage><entry><id>tag:blogger.com,1999:blog-5313430225166146055.post-9200479527485237823</id><published>2008-09-22T20:09:00.000-07:00</published><updated>2008-09-23T13:29:08.215-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='algorithms'/><category scheme='http://www.blogger.com/atom/ns#' term='math'/><title type='text'>How to Shuffle an Array (Correctly)</title><content type='html'>&lt;p&gt;Today at a career fair, an Amazon recruiter asked me how to shuffle an array. Sounds like an easy problem, right? It turns out a good majority of people posting code to the internet get it wrong. But before reading on, try it out yourself, see what you come up with, and then Google for "array shuffle" or "randomize array" to see what other people think.&lt;br /&gt;&lt;/p&gt;&lt;p&gt;. . .&lt;br /&gt;&lt;/p&gt;&lt;p&gt;Two approaches came to mind: iterate through all of the elements, swapping each element with some random location in the array, or iterate through all of the elements, swapping each element with some random current-or-later location in the array. If you're like me, something bugs you about both these approaches. The first one may swap a given element several times, so it seems like it does too much work. The second one swaps over an ever-decreasing range of possible positions, so it seems like it might treat later elements differently than earlier ones. On the spot at the career fair, I waffled and ended up picking the first approach just because I couldn't convince myself that the second one was valid, and I thought, you can never have &lt;span style="font-style: italic;"&gt;too&lt;/span&gt; much shuffling.&lt;/p&gt;&lt;p&gt;Only later did I prove to myself that the current-or-later algorithm works. Think about the last element in an array of size &lt;span style="font-style: italic;"&gt;n&lt;/span&gt;. What are the odds that it ends up at index 1? Obviously, 1/&lt;span style="font-style: italic;"&gt;n&lt;/span&gt;. What about index 2? This is a bit trickier, since it's a compound probability: we have to multiply the odds of not swapping with 1 by the odds of then swapping with 2, to find the probability of the element actually ending up at 2. In other words, (&lt;span style="font-style: italic;"&gt;n&lt;/span&gt;-1)/&lt;span style="font-style: italic;"&gt;n&lt;/span&gt; * 1/(&lt;span style="font-style: italic;"&gt;n&lt;/span&gt;-1) = 1/&lt;span style="font-style: italic;"&gt;n&lt;/span&gt;. Similar math shows that there is an equal probability of swapping with 3, 4, etc, and you can do the same for other elements in the array besides the last.&lt;br /&gt;&lt;/p&gt;&lt;p&gt;What about the alternative, just swapping each element with any old element? It's a popular solution; see, for example: &lt;a href="http://www.codecodex.com/wiki/index.php?title=Shuffle_an_array"&gt;http://www.codecodex.com/wiki/index.php?title=Shuffle_an_array&lt;/a&gt;, which recommends implementations of this algorithm in several languages. Unfortunately, the algorithm doesn't work. Here's a mathematical argument for why, although I'd love to hear a more succinct one which avoids the number theory.&lt;/p&gt;&lt;p&gt;Let's start with the fact that the number of possible permutations of an &lt;span style="font-style: italic;"&gt;n&lt;/span&gt;-element array is &lt;span style="font-style: italic;"&gt;n&lt;/span&gt;!. So for a good shuffle, any given permutation must be chosen with probability 1/&lt;span style="font-style: italic;"&gt;n&lt;/span&gt;!. What happens when we apply the swap-with-anything algorithm? On the first iteration, we can end up with &lt;span style="font-style: italic;"&gt;n&lt;/span&gt; possible permutations (the results of swapping the first element with each of &lt;span style="font-style: italic;"&gt;n&lt;/span&gt; locations, including itself). For every one of of these possibilities, we generate &lt;span style="font-style: italic;"&gt;n&lt;/span&gt; possibilities for the location of the second element. And so on. We will end up with &lt;span style="font-style: italic;"&gt;n&lt;/span&gt;*&lt;span style="font-style: italic;"&gt;n&lt;/span&gt;*...*&lt;i&gt;n&lt;/i&gt; = &lt;i&gt;n&lt;/i&gt;&lt;sup&gt;&lt;i&gt;n&lt;/i&gt;&lt;/sup&gt; possible results. Obviously this is bigger than &lt;span style="font-style: italic;"&gt;n&lt;/span&gt;!, but that's ok because we expect that some permutations can be produced in more than one way, as elements are swapped more than once. The important thing is that each of the &lt;span style="font-style: italic;"&gt;n&lt;/span&gt;! permutations must appear the same number of times in our  &lt;i&gt;n&lt;/i&gt;&lt;sup&gt;&lt;i&gt;n&lt;/i&gt;&lt;/sup&gt; possible results, to ensure that they each are chosen with equal possibility. In other words, for some integer &lt;span style="font-style: italic;"&gt;k&lt;/span&gt;: &lt;span style="font-style: italic;"&gt;k*n&lt;/span&gt;! =&lt;i&gt; n&lt;/i&gt;&lt;sup&gt;&lt;i&gt;n&lt;/i&gt;&lt;/sup&gt;.&lt;/p&gt;&lt;p&gt;But this equation is not satisfiable for all values of &lt;span style="font-style: italic;"&gt;n&lt;/span&gt;. Consider &lt;span style="font-style: italic;"&gt;n&lt;/span&gt;=3. Then &lt;span style="font-style: italic;"&gt;n&lt;/span&gt;! = 6 and &lt;i&gt;n&lt;/i&gt;&lt;sup&gt;&lt;i&gt;n&lt;/i&gt;&lt;/sup&gt; = 27. Since 6 does not divide 27, there is no way the 27 possible results of the algorithm can be evenly distributed amongst the 6 possible permutations, so some will be chosen more than others. In fact, the only number &lt;span style="font-style: italic;"&gt;n&lt;/span&gt; such that &lt;span style="font-style: italic;"&gt;n&lt;/span&gt;! divides &lt;i&gt;n&lt;/i&gt;&lt;sup&gt;&lt;i&gt;n&lt;/i&gt;&lt;/sup&gt; is 2. The reason is that since &lt;span style="font-style: italic;"&gt;n&lt;/span&gt;! includes as prime factors every prime number less than &lt;span style="font-style: italic;"&gt;n&lt;/span&gt;, and &lt;i&gt;n&lt;/i&gt;&lt;sup&gt;&lt;i&gt;n&lt;/i&gt;&lt;/sup&gt; includes only those prime factors in &lt;span style="font-style: italic;"&gt;n&lt;/span&gt;, for &lt;span style="font-style: italic;"&gt;n&lt;/span&gt;! to divide &lt;i&gt;n&lt;/i&gt;&lt;sup&gt;&lt;i&gt;n&lt;/i&gt;&lt;/sup&gt;, &lt;span style="font-style: italic;"&gt;n&lt;/span&gt; must include every smaller prime number as a factor. However, a product of prime numbers +1 is itself a prime number [&lt;span style="font-weight: bold;"&gt;CORRECTION&lt;/span&gt;: it's divisible by some prime number not in the product], so (hand waving) &lt;span style="font-style: italic;"&gt;n&lt;/span&gt; can't have any prime factors, so &lt;span style="font-style: italic;"&gt;n&lt;/span&gt; must be 2&lt;br /&gt;&lt;/p&gt;&lt;p&gt;The moral of this story is, don't blindly cut-and-paste code you find with Google! Unless you are implementing an online poker server, in which case, please send me the URL.&lt;/p&gt;&lt;p&gt;&lt;span style="font-weight: bold;"&gt;UPDATES:&lt;/span&gt;&lt;/p&gt;&lt;p&gt;As was pointed out to me by several people, the correct algorithm described above is known as the "Knuth Shuffle", or "&lt;a href="http://en.wikipedia.org/wiki/Fisher-Yates_shuffle"&gt;Fisher-Yates Shuffle&lt;/a&gt;", and Wikipedia has an explanation &lt;a href="http://en.wikipedia.org/wiki/Shuffling#Implementation_of_shuffling_algorithms"&gt;very similar to mine&lt;/a&gt; about the dangers of mis-implementing this algorithm. I joked about online poker servers, but apparently at least one online poker server really did implement the broken algorithm described above. They were &lt;a href="http://www.cigital.com/papers/download/developer_gambling.php"&gt;exploited by some white hats&lt;/a&gt;, who give an excellent explanation of why the algorithm is broken and also discuss issues with random number generation.&lt;br /&gt;&lt;/p&gt;&lt;p&gt;In the comments &lt;span dir="ltr"&gt;&lt;a href="http://www.blogger.com/profile/06766959286958583427" onclick="" rel="nofollow"&gt;mook&lt;/a&gt;&lt;/span&gt; suggests an interesting variation: pick two elements at random and swap them. Repeat some large number of times. Unfortunately this approach is wrong for the same reason as the bad iterative approach. Supposing you are shuffling a deck of 52 cards: each time you choose a random element with 1/52 probability, you multiply the number of possible outcomes of the algorithm by 52. By choosing two random elements on each iteration rather than picking one deterministically, you are just squaring the number of possible outcomes. So you'll still end up with 52&lt;sup&gt;2&lt;i&gt;k&lt;/i&gt;&lt;/sup&gt; possible outcomes for some &lt;i&gt;k&lt;/i&gt;. And it's still the case that you can't divide the 52! possible permutations of the deck evenly amongst these 52&lt;sup&gt;2&lt;i&gt;k&lt;/i&gt;&lt;/sup&gt; possible outcomes (check the prime factors) -- so the method must be biased.&lt;/p&gt;&lt;p&gt;Finally, I should mention that there is yet another broken alternative in common use: generate a random number for each element and sort the elements by these numbers. Oleg Kislyov explains with typical lucidity &lt;a href="http://okmij.org/ftp/Haskell/perfect-shuffle.txt"&gt;why this doesn't work&lt;/a&gt;. &lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5313430225166146055-9200479527485237823?l=adrianquark.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://adrianquark.blogspot.com/feeds/9200479527485237823/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5313430225166146055&amp;postID=9200479527485237823' title='18 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5313430225166146055/posts/default/9200479527485237823'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5313430225166146055/posts/default/9200479527485237823'/><link rel='alternate' type='text/html' href='http://adrianquark.blogspot.com/2008/09/how-to-shuffle-array-correctly.html' title='How to Shuffle an Array (Correctly)'/><author><name>Adrian Quark</name><uri>http://www.blogger.com/profile/08774936184238473409</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>18</thr:total></entry></feed>
