Feynman's restaurant problem, by Michael A Gottlieb

This problem and its solution were invented by Michael Gottlieb in 2004, based on a story told by Ralph Leighton.

FEYNMAN'S ORIGINAL RESTAURANT PROBLEM

has recently been deciphered from his notes.

Richard Feynman and Ralph Leighton had a hard time at restaurants deciding whether to order the best dish they had tried so far or something new, which - who knows - might be even better. So, naturally, they decided to formalize this as a mathematical problem:

Assume that a restaurant has N dishes on its menu that are rated from worst to best, 1 to N (according to your personal preferences). You, however, don't know the ratings of the dishes, and when you try a new dish, all you learn is whether it is the best (highest rated) dish you have tried so far, or not. Each time you eat a meal at the restaurant you either order a new dish or you order the best dish you have tried so far. Your goal is to maximize the average total ratings of the dishes you eat in M meals (where M is less than or equal to N).

The average total ratings in a sequence of meals that includes n "new" dishes and b "best so far" dishes can be no higher than the average total ratings in the sequence having all n "new" dishes followed by all b "best so far" dishes. Thus a successful strategy requires you to order some number of new dishes and thereafter only order the best dish so far. The problem then reduces to the following:

Given N (dishes on the menu) and M <= N (meals to be eaten at the restaurant), how many new dishes D should you try before switching to ordering the best of them for all the remaining (M–D) meals, in order to maximize the average total ratings of the dishes consumed?

Solution by Michael A. Gottlieb (based on a letter to Chris Siegert)

If you randomly select an integer from the range [1...N], with all integers in that range being equiprobable, the expected value of the integer is just the average value of the range, (N+1)/2. Similarly if you select D (unique) integers from the range, the expected value of each one is (N+1)/2. [So, for example if D=1 (you try only one new dish) your average score for that day is (N+1)/2, while if D=N (ie you try ALL the dishes on the menu) then your average score for those N days is N*(N+1)/2 = 1+2+...+N, which, of course, is your score on every N-day trial, because you have tried each dish once.] Thus your score for the first D days averages D*(N+1)/2. Your score for the remaining M-D days, when you switch to eating the best dish you have tried in the first D days, is (M-D)*E[N,D], where E[N,D] is the expected (average) maximum integer from amongst D (unique) integers selected from the range [1...N]. Your total average score is equal to,

    D*(N+1)/2 + (M-D)*E[N,D].

To solve the problem you take the derivative of this expression with respect to D, set it to zero, and solve for D. That gives the answer posted with the problem on The Feynman Lectures website Thus the problem really reduces to finding an explicit expression for E[N,D].

It is not too hard to guess what E[N,D] has to be, without making many computations. We know that E[N,k] has to be strictly increasing with k (because the average highest will always be higher when you select more). We know that

   E[N,1] = (N+1)/2,

because if you only sample one dish, it's most likely average. We also know that

   E[N,N] = N.

That is to say, if you sample all the dishes, N is always the highest rating, so that is also the average highest rating. Now consider E[N,N-1], the average largest integer selected when you select all but one integer in the range [1...N]. That is easy to compute because for all cases where the unselected number is not N, the largest integer selected is N, there are N-1 such cases, and in the one remaining case, the largest integer selected is N-1, so the average is ((N-1)*N + (N-1))/N = ((N-1)/N) * (N+1). So,

   E[N,N-1] = ((N-1)/N) * (N+1).

In a similar manner you can easily deduce that

   E[N,N-2] = ((N-2)/(N-1)) * (N+1).

It is also not very difficult to compute E[N,2], since there's 1 possible pair where 2 is maximum, 2 where 3 is maximum, 3 where 4 is maximum, etc., and all together there are N*(N-1)/2 possible pairs, so

    E[N,2] = (Sum m=2 to N of (m-1)*m) / (N*(N-1)/2)

                 = ((N^3- N)/3) / (N*(N-1)/2)

                 = (2/3)(N+1).

From all these clues you might guess that

   E[N,k] = (k/(k+1)) * (N+1),

and you would be right!

But here is a more formal derivation of E[N,k] :

Define P[N,k,m] = probability that the largest of k (unique) integers selected in the range [1...N] equals m. Then

   E[N,k] = Sum m=1 to N of m*P[N,k,m] .

P[N,k,m] equals the number of ways you can select k unique integers from the range [1...N] with m being the maximum, divided by the total number of ways you can select k unique integers from the range [1...N]. If you have k unique integers with m being the maximum, then the remaining k-1 integers must be selected from the range [1...m-1], so

   P[N,k,m] = Binomial[m-1,k-1]/Binomial[N,k]

with m>=k. P[N,k,m] = 0 if m<k, since amongst k unique integers selected from [1...N], the smallest possible maximum is k. Thus,

   E[N,k] = Sum m=k to N of m*P[N,k,m]

We have

    P[N,k,m] = ((m-1)!/((m-k)!(k-1)!)) / (N!/((N-k)!k!)),

and

E[N,k] = Sum m=k to N of m*((m-1)!/((m-k)!(k-1)!)) / (N!/((N-k)!k!))

           = Sum m=k to N of (m!/((m-k)!(k-1)!)) / (N!/((N-k)!k!)).

           =    ((N-k)!k!)/(N!(k-1)!) * Sum m=k to N of m!/(m-k)!.

Now,

Sum m=k to N of m!/(m-k)! = k! * Sum m=k to N of m!/(k!(m-k))

         = k! * Sum m=k to N Binomial[m,k]

         = k! * Binomial[N+1,k+1].

         = k! (N+1)!/((N-k)!(k+1)!)

So,

E[N,k] = ((N-k)!k!)/(N!(k-1)!) * k! (N+1)!/((N-k)!(k+1)!)

          = (k/(k+1)) * (N+1).

Finally, returning to the first formula above, we find for the expected average rating

   D*(N+1)/2 + (M-D)*E[N,D]

   = D*(N+1)/2 + (M-D)*(D/(D+1))*(N+1).

Taking the derivative with respect to D and setting it to 0,

(D^2 + 2D - (2M+1))*(N+1)/2(D+1)^2 = 0,

and noting that (N+1)/2(D+1)^2 > 0, we end up with the quadratic equation,

D^2 + 2D - (2M+1) = 0,

having positive solution,

D = Sqrt[2(M+1)] - 1.