The Feynman Lectures on Physics Vol. I Ch. 6: Probability

6 Probability


“The true logic of this world is in the calculus of probabilities.”
—James Clerk Maxwell

6–1 Chance and likelihood

“Chance” is a word which is in common use in everyday living. The radio reports speaking of tomorrow’s weather may say: “There is a sixty percent chance of rain.” You might say: “There is a small chance that I shall live to be one hundred years old.” Scientists also use the word chance. A seismologist may be interested in the question: “What is the chance that there will be an earthquake of a certain size in Southern California next year?” A physicist might ask the question: “What is the chance that a particular geiger counter will register twenty counts in the next ten seconds?” A politician or statesman might be interested in the question: “What is the chance that there will be a nuclear war within the next ten years?” You may be interested in the chance that you will learn something from this chapter.

By chance, we mean something like a guess. Why do we make guesses? We make guesses when we wish to make a judgment but have incomplete information or uncertain knowledge. We want to make a guess as to what things are, or what things are likely to happen. Often we wish to make a guess because we have to make a decision. For example: Shall I take my raincoat with me tomorrow? For what earth movement should I design a new building? Shall I build myself a fallout shelter? Shall I change my stand in international negotiations? Shall I go to class today?

Sometimes we make guesses because we wish, with our limited knowledge, to say as much as we can about some situation. Really, any generalization is in the nature of a guess. Any physical theory is a kind of guesswork. There are good guesses and there are bad guesses. The theory of probability is a system for making better guesses. The language of probability allows us to speak quantitatively about some situation which may be highly variable, but which does have some consistent average behavior.

Let us consider the flipping of a coin. If the toss—and the coin—are “honest,” we have no way of knowing what to expect for the outcome of any particular toss. Yet we would feel that in a large number of tosses there should be about equal numbers of heads and tails. We say: “The probability that a toss will land heads is $0.5$.”

We speak of probability only for observations that we contemplate being made in the future. By the “probability” of a particular outcome of an observation we mean our estimate for the most likely fraction of a number of repeated observations that will yield that particular outcome. If we imagine repeating an observation—such as looking at a freshly tossed coin—$N$ times, and if we call $N_A$ our estimate of the most likely number of our observations that will give some specified result $A$, say the result “heads,” then by $P(A)$, the probability of observing $A$, we mean \begin{equation} \label{Eq:I:6:1} P(A)=N_A/N. \end{equation}

Our definition requires several comments. First of all, we may speak of a probability of something happening only if the occurrence is a possible outcome of some repeatable observation. It is not clear that it would make any sense to ask: “What is the probability that there is a ghost in that house?”

You may object that no situation is exactly repeatable. That is right. Every different observation must at least be at a different time or place. All we can say is that the “repeated” observations should, for our intended purposes, appear to be equivalent. We should assume, at least, that each observation was made from an equivalently prepared situation, and especially with the same degree of ignorance at the start. (If we sneak a look at an opponent’s hand in a card game, our estimate of our chances of winning is different than if we do not!)

We should emphasize that $N$ and $N_A$ in Eq. (6.1) are not intended to represent numbers based on actual observations. $N_A$ is our best estimate of what would occur in $N$ imagined observations. Probability depends, therefore, on our knowledge and on our ability to make estimates. In effect, on our common sense! Fortunately, there is a certain amount of agreement in the common sense of many things, so that different people will make the same estimate. Probabilities need not, however, be “absolute” numbers. Since they depend on our ignorance, they may become different if our knowledge changes.

You may have noticed another rather “subjective” aspect of our definition of probability. We have referred to $N_A$ as “our estimate of the most likely number …” We do not mean that we expect to observe exactly $N_A$, but that we expect a number near $N_A$, and that the number $N_A$ is more likely than any other number in the vicinity. If we toss a coin, say, $30$ times, we should expect that the number of heads would not be very likely to be exactly $15$, but rather only some number near to $15$, say $12$, $13$, $14$, $15$, $16$, or $17$. However, if we must choose, we would decide that $15$ heads is more likely than any other number. We would write $P(\text{heads})=0.5$.

Why did we choose $15$ as more likely than any other number? We must have argued with ourselves in the following manner: If the most likely number of heads is $N_H$ in a total number of tosses $N$, then the most likely number of tails $N_T$ is $(N-N_H)$. (We are assuming that every toss gives either heads or tails, and no “other” result!) But if the coin is “honest,” there is no preference for heads or tails. Until we have some reason to think the coin (or toss) is dishonest, we must give equal likelihoods for heads and tails. So we must set $N_T=N_H$. It follows that $N_T=N_H=N/2$, or $P(H)=P(T)=0.5$.

We can generalize our reasoning to any situation in which there are $m$ different but “equivalent” (that is, equally likely) possible results of an observation. If an observation can yield $m$ different results, and we have reason to believe that any one of them is as likely as any other, then the probability of a particular outcome $A$ is $P(A)=1/m$.

If there are seven different-colored balls in an opaque box and we pick one out “at random” (that is, without looking), the probability of getting a ball of a particular color is $\tfrac{1}{7}$. The probability that a “blind draw” from a shuffled deck of $52$ cards will show the ten of hearts is $\tfrac{1}{52}$. The probability of throwing a double-one with dice is $\tfrac{1}{36}$.
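The counting rule $P(A)=1/m$ can be checked by listing the outcomes explicitly. The following Python sketch is an illustration only: the helper name `probability` and the encoding of the deck as (rank, suit) pairs are assumptions of this example, not part of the lecture. The helper counts any number of favorable outcomes, a slight generalization of the $1/m$ rule.

```python
from fractions import Fraction
from itertools import product


def probability(outcomes, is_favorable):
    """Fraction of equally likely outcomes for which is_favorable is true."""
    outcomes = list(outcomes)
    favorable = sum(1 for outcome in outcomes if is_favorable(outcome))
    return Fraction(favorable, len(outcomes))


# Two dice: 36 equally likely ordered pairs, only (1, 1) is a double-one.
two_dice = list(product(range(1, 7), repeat=2))
p_double_one = probability(two_dice, lambda roll: roll == (1, 1))

# Hypothetical encoding of a 52-card deck; (10, "H") stands for the ten of hearts.
deck = [(rank, suit) for suit in "SHDC" for rank in range(1, 14)]
p_ten_of_hearts = probability(deck, lambda card: card == (10, "H"))

print(p_double_one, p_ten_of_hearts)  # 1/36 1/52
```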

In Chapter 5 we described the size of a nucleus in terms of its apparent area, or “cross section.” When we did so we were really talking about probabilities. When we shoot a high-energy particle at a thin slab of material, there is some chance that it will pass right through and some chance that it will hit a nucleus. (Since the nucleus is so small that we cannot see it, we cannot aim right at a nucleus. We must “shoot blind.”) If there are $n$ atoms in our slab and the nucleus of each atom has a cross-sectional area $\sigma$, then the total area “shadowed” by the nuclei is $n\sigma$. In a large number $N$ of random shots, we expect that the number of hits $N_C$ of some nucleus will be in the ratio to $N$ as the shadowed area is to the total area of the slab: \begin{equation} \label{Eq:I:6:2} N_C/N=n\sigma/A. \end{equation} We may say, therefore, that the probability that any one projectile particle will suffer a collision in passing through the slab is \begin{equation} \label{Eq:I:6:3} P_C=\frac{n}{A}\,\sigma, \end{equation} where $n/A$ is the number of atoms per unit area in our slab.
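As a numerical illustration of Eq. (6.3), the sketch below evaluates $P_C$ for one slab. The numbers are hypothetical, chosen only to give round values, and the function name `collision_probability` is an assumption of this example. The formula treats the shadows of different nuclei as non-overlapping, which is reasonable only when the result is much smaller than $1$.

```python
def collision_probability(n_atoms, sigma, slab_area):
    """Eq. (6.3): P_C = (n / A) * sigma for a thin slab."""
    return n_atoms * sigma / slab_area


# Hypothetical values, for illustration only (not taken from the lecture).
n_atoms = 1.0e19    # number of atoms in the slab
sigma = 1.0e-28     # cross-sectional area of one nucleus, in m^2
slab_area = 1.0e-4  # face area of the slab, in m^2 (1 cm^2)

p_c = collision_probability(n_atoms, sigma, slab_area)
print(p_c)  # about 1e-5: roughly one projectile in 100,000 hits a nucleus
```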

6–2 Fluctuations

Fig. 6–1. Observed sequences of heads and tails in three games of 30 tosses each.

We would like now to use our ideas about probability to consider in some greater detail the question: “How many heads do I really expect to get if I toss a coin $N$ times?” Before answering the question, however, let us look at what does happen in such an “experiment.” Figure 6–1 shows the results obtained in the first three “runs” of such an experiment in which $N=30$. The sequences of “heads” and “tails” are shown just as they were obtained. The first game gave $11$ heads; the second also $11$; the third $16$. In three trials we did not once get $15$ heads. Should we begin to suspect the coin? Or were we wrong in thinking that the most likely number of “heads” in such a game is $15$? Ninety-seven more runs were made to obtain a total of $100$ experiments of $30$ tosses each. The results of the experiments are given in Table 6–1.¹

Table 6–1. Number of heads in successive trials of 30 tosses of a coin (100 trials).
$11$	$16$	$17$	$15$	$17$	$16$	$19$	$18$	$15$	$13$
$11$	$17$	$17$	$12$	$20$	$23$	$11$	$16$	$17$	$14$
$16$	$12$	$15$	$10$	$18$	$17$	$13$	$15$	$14$	$15$
$16$	$12$	$11$	$22$	$12$	$20$	$12$	$15$	$16$	$12$
$16$	$10$	$15$	$13$	$14$	$16$	$15$	$16$	$13$	$18$
$14$	$14$	$13$	$16$	$15$	$19$	$21$	$14$	$12$	$15$
$16$	$11$	$16$	$14$	$17$	$14$	$11$	$16$	$17$	$16$
$19$	$15$	$14$	$12$	$18$	$15$	$14$	$21$	$11$	$16$
$17$	$17$	$12$	$13$	$14$	$17$	$9$	$13$	$19$	$13$
$14$	$12$	$15$	$17$	$14$	$10$	$17$	$17$	$12$	$11$

Looking at the numbers in Table 6–1, we see that most of the results are “near” $15$, in that they are between $12$ and $18$. We can get a better feeling for the details of these results if we plot a graph of the distribution of the results. We count the number of games in which a score of $k$ was obtained, and plot this number for each $k$. Such a graph is shown in Fig. 6–2. A score of $15$ heads was obtained in $13$ games. A score of $14$ heads was also obtained $13$ times. Scores of $16$ and $17$ were each obtained more than $13$ times. Are we to conclude that there is some bias toward heads? Was our “best estimate” not good enough? Should we conclude now that the “most likely” score for a run of $30$ tosses is really $16$ heads? But wait! In all the games taken together, there were $3000$ tosses. And the total number of heads obtained was $1493$. The fraction of tosses that gave heads is $0.498$, very nearly, but slightly less than half. We should certainly not assume that the probability of throwing heads is greater than $0.5$! The fact that one particular set of observations gave $16$ heads most often, is a fluctuation. We still expect that the most likely number of heads is $15$.
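The tallies quoted above can be reproduced directly from Table 6–1. The short Python sketch below is only a bookkeeping aid; the variable names are assumptions of this example, and the list simply copies the table row by row.

```python
from collections import Counter

# Number of heads in each of the 100 games of 30 tosses (Table 6-1, row by row).
HEADS_PER_GAME = [
    11, 16, 17, 15, 17, 16, 19, 18, 15, 13,
    11, 17, 17, 12, 20, 23, 11, 16, 17, 14,
    16, 12, 15, 10, 18, 17, 13, 15, 14, 15,
    16, 12, 11, 22, 12, 20, 12, 15, 16, 12,
    16, 10, 15, 13, 14, 16, 15, 16, 13, 18,
    14, 14, 13, 16, 15, 19, 21, 14, 12, 15,
    16, 11, 16, 14, 17, 14, 11, 16, 17, 16,
    19, 15, 14, 12, 18, 15, 14, 21, 11, 16,
    17, 17, 12, 13, 14, 17, 9, 13, 19, 13,
    14, 12, 15, 17, 14, 10, 17, 17, 12, 11,
]

total_tosses = 30 * len(HEADS_PER_GAME)
total_heads = sum(HEADS_PER_GAME)
fraction_heads = total_heads / total_tosses

# games_with_score[k] is the height of the bar at score k in a plot like Fig. 6-2.
games_with_score = Counter(HEADS_PER_GAME)

print(total_tosses, total_heads, round(fraction_heads, 3))  # 3000 1493 0.498
for k in (14, 15, 16, 17):
    print(k, games_with_score[k])  # 13, 13, 15, 14 games respectively
```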

Fig. 6–2. Summary of the results of 100 games of 30 tosses each. The vertical bars show the number of games in which a score of $k$ heads was obtained. The dashed curve shows the expected numbers of games with the score $k$ obtained by a probability computation.

We may ask the question: “What is the probability that a game of $30$ tosses will yield $15$ heads—or $16$, or any other number?” We have said that in a game of one toss, the probability of obtaining one head is $0.5$, and the probability of obtaining no head is $0.5$. In a game of two tosses there are four possible outcomes: $HH$, $HT$, $TH$, $TT$. Since each of these sequences is equally likely, we conclude that (a) the probability of a score of two heads is $\tfrac{1}{4}$, (b) the probability of a score of one head is $\tfrac{2}{4}$, (c) the probability of a zero score is $\tfrac{1}{4}$. There are two ways of obtaining one head, but only one of obtaining either zero or two heads.

Consider now a game of $3$ tosses. The third toss is equally likely to be heads or tails. There is only one way to obtain $3$ heads: we must have obtained $2$ heads on the first two tosses, and then heads on the last. There are, however, three ways of obtaining $2$ heads. We could throw tails after having thrown two heads (one way) or we could throw heads after throwing only one head in the first two tosses (two ways). So for scores of $3$-$H$, $2$-$H$, $1$-$H$, $0$-$H$ we have that the number of equally likely ways is $1$, $3$, $3$, $1$, with a total of $8$ different possible sequences. The probabilities are $\tfrac{1}{8}$, $\tfrac{3}{8}$, $\tfrac{3}{8}$, $\tfrac{1}{8}$.
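The counting for two and three tosses can be verified by brute force. The sketch below lists every sequence of heads and tails and groups the sequences by score; the function name `ways_by_score` is an assumption of this example.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def ways_by_score(n_tosses):
    """Count how many of the 2**n equally likely sequences give each number of heads."""
    return Counter(seq.count("H") for seq in product("HT", repeat=n_tosses))


for n in (2, 3):
    ways = ways_by_score(n)
    for k in range(n, -1, -1):
        # Each sequence is equally likely, so P = ways / 2**n.
        print(n, k, ways[k], Fraction(ways[k], 2**n))
```

For $n=3$ this prints the ways $1$, $3$, $3$, $1$ and the probabilities $1/8$, $3/8$, $3/8$, $1/8$.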

Fig. 6–3. A diagram for showing the number of ways a score of 0, 1, 2, or 3 heads can be obtained in a game of 3 tosses.

Fig. 6–4. A diagram like that of Fig. 6–3, for a game of 6 tosses.

The argument we have been making can be summarized by a diagram like that in Fig. 6–3. It is clear how the diagram should be continued for games with a larger number of tosses. Figure 6–4 shows such a diagram for a game of $6$ tosses. The number of “ways” to any point on the diagram is just the number of different “paths” (sequences of heads and tails) which can be taken from the starting point. The vertical position gives us the total number of heads thrown. The set of numbers which appears in such a diagram is known as Pascal’s triangle. The numbers are also known as the binomial coefficients, because they also appear in the expansion of $(a+b)^n$. If we call $n$ the number of tosses and $k$ the number of heads thrown, then the numbers in the diagram are usually designated by the symbol $\tbinom{n}{k}$. We may remark in passing that the binomial coefficients can also be computed from \begin{equation} \label{Eq:I:6:4} \binom{n}{k}=\frac{n!}{k!(n-k)!}, \end{equation} where $n!$, called “$n$-factorial,” represents the product $(n)(n-1)(n-2)\dotsm(3)(2)(1)$.
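The path-counting rule of the diagrams (each entry is the sum of the two entries that lead into it) and the factorial formula of Eq. (6.4) should give the same numbers. The following sketch checks this for small $n$; `pascal_row` is a name chosen for this example.

```python
from math import factorial


def pascal_row(n):
    """Row n of Pascal's triangle, built by adding the paths from the row before."""
    row = [1]
    for _ in range(n):
        row = [1] + [row[i] + row[i + 1] for i in range(len(row) - 1)] + [1]
    return row


def binomial_from_factorials(n, k):
    """Eq. (6.4): n! / (k! (n-k)!)."""
    return factorial(n) // (factorial(k) * factorial(n - k))


print(pascal_row(3))  # [1, 3, 3, 1]
print(pascal_row(6))  # [1, 6, 15, 20, 15, 6, 1]
print([binomial_from_factorials(6, k) for k in range(7)])  # same as the row above
```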

We are now ready to compute the probability $P(k,n)$ of throwing $k$ heads in $n$ tosses, using our definition Eq. (6.1). The total number of possible sequences is $2^n$ (since there are $2$ outcomes for each toss), and the number of ways of obtaining $k$ heads is $\tbinom{n}{k}$, all equally likely, so we have \begin{equation} \label{Eq:I:6:5} P(k,n)=\frac{\tbinom{n}{k}}{2^n}. \end{equation}

Since $P(k,n)$ is the fraction of games which we expect to yield $k$ heads, then in $100$ games we should expect to find $k$ heads $100\cdot P(k,n)$ times. The dashed curve in Fig. 6–2 passes through the points computed from $100\cdot P(k,30)$. We see that we expect to obtain a score of $15$ heads in $14$ or $15$ games, whereas this score was observed in $13$ games. We expect a score of $16$ in $13$ or $14$ games, but we obtained that score in $15$ games. Such fluctuations are “part of the game.”
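The points on the dashed curve can be computed from Eq. (6.5). This sketch uses Python's `math.comb` for the binomial coefficient; the function name `p_heads` is an assumption of this example.

```python
from math import comb


def p_heads(k, n):
    """Eq. (6.5): probability of exactly k heads in n tosses of an honest coin."""
    return comb(n, k) / 2**n


# Expected number of games, out of 100, that give each score k in 30 tosses.
expected_games = {k: 100 * p_heads(k, 30) for k in range(31)}

print(round(expected_games[15], 1))  # 14.4, between 14 and 15 games
print(round(expected_games[16], 1))  # 13.5, between 13 and 14 games
```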

The method we have just used can be applied to the most general situation in which there are only two possible outcomes of a single observation. Let us designate the two outcomes by $W$ (for “win”) and $L$ (for “lose”). In the general case, the probability of $W$ or $L$ in a single event need not be equal. Let $p$ be the probability of obtaining the result $W$. Then $q$, the probability of $L$, is necessarily $(1-p)$. In a set of $n$ trials, the probability $P(k,n)$ that $W$ will be obtained $k$ times is \begin{equation} \label{Eq:I:6:6} P(k,n)=\tbinom{n}{k}p^kq^{n-k}. \end{equation} This probability function is called the Bernoulli or, also, the binomial probability.
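Eq. (6.6) can be written out as a short function. In the sketch below the name `binomial_probability` and the choice $p=1/6$ (a "win" meaning that a single die shows a six) are assumptions made for illustration. Setting $p=0.5$ should reproduce Eq. (6.5).

```python
from math import comb


def binomial_probability(k, n, p):
    """Eq. (6.6): probability of exactly k wins in n trials, each won with probability p."""
    q = 1 - p
    return comb(n, k) * p**k * q ** (n - k)


# With p = q = 0.5 the result reduces to comb(n, k) / 2**n, as in Eq. (6.5).
print(binomial_probability(15, 30, 0.5))

# Hypothetical unequal case: probability of exactly two sixes in ten throws of a die.
print(binomial_probability(2, 10, 1 / 6))

# The probabilities of all possible scores add up to 1.
print(sum(binomial_probability(k, 10, 1 / 6) for k in range(11)))
```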

6–3 The random walk

There is another interesting problem in which the idea of probability is required. It is the problem of the “random walk.” In its simplest version, we imagine a “game” in which a “player” starts at the point $x=0$ and at each “move” is required to take a step either forward (toward $+x$) or backward (toward $-x$). The choice is to be made randomly, determined, for example, by the toss of a coin. How shall we describe the resulting motion? In its general form the problem is related to the motion of atoms (or other particles) in a gas—called Brownian motion—and also to the combination of errors in measurements. You will see that the random-walk problem is closely related to the coin-tossing problem we have already discussed.

First, let us look at a few examples of a random walk. We may characterize the walker’s progress by the net distance $D_N$ traveled in $N$ steps. We show in the graph of Fig. 6–5 three examples of the path of a random walker. (We have used for the random sequence of choices the results of the coin tosses shown in Fig. 6–1.)

Fig. 6–5. The progress made in a random walk. The horizontal coordinate $N$ is the total number of steps taken; the vertical coordinate $D_N$ is the net distance moved from the starting position.

What can we say about such a motion? We might first ask: “How far does he get on the average?” We must expect that his average progress will be zero, since he is equally likely to go either forward or backward. But we have the feeling that as $N$ increases, he is more likely to have strayed farther from the starting point. We might, therefore, ask what is his average distance traveled in absolute value, that is, what is the average of $\abs{D}$. It is, however, more convenient to deal with another measure of “progress,” the square of the distance: $D^2$ is positive for either positive or negative motion, and is therefore a reasonable measure of such random wandering.

We can show that the expected value of $D_N^2$ is just $N$, the number of steps taken. By “expected value” we mean the probable value (our best guess), which we can think of as the expected average behavior in many repeated sequences. We represent such an expected value by $\expval{D_N^2}$, and may refer to it also as the “mean square distance.” After one step, $D^2$ is always $+1$, so we have certainly $\expval{D_1^2}=1$. (All distances will be measured in terms of a unit of one step. We shall not continue to write the units of distance.)

The expected value of $D_N^2$ for $N>1$ can be obtained from $D_{N-1}$. If, after $(N-1)$ steps, we have $D_{N-1}$, then after $N$ steps we have $D_N=D_{N-1}+1$ or $D_N=D_{N-1}-1$. For the squares, \begin{equation} \label{Eq:I:6:7} D_N^2= \begin{cases} D_{N-1}^2+2D_{N-1}+1,\\[2ex] \kern{3.7em}\textit{or}\\[2ex] D_{N-1}^2-2D_{N-1}+1. \end{cases} \end{equation} In a number of independent sequences, we expect to obtain each value one-half of the time, so our average expectation is just the average of the two possible values. The expected value of $D_N^2$ is then $D_{N-1}^2+1$. In general, we should expect for $D_{N-1}^2$ its “expected value” $\expval{D_{N-1}^2}$ (by definition!). So \begin{equation} \label{Eq:I:6:8} \expval{D_N^2}=\expval{D_{N-1}^2}+1. \end{equation}

We have already shown that $\expval{D_1^2}=1$; it follows then that \begin{equation} \label{Eq:I:6:9} \expval{D_N^2}=N, \end{equation} a particularly simple result!

If we wish a number like a distance, rather than a distance squared, to represent the “progress made away from the origin” in a random walk, we can use the “root-mean-square distance” $D_{\text{rms}}$: \begin{equation} \label{Eq:I:6:10} D_{\text{rms}}=\sqrt{\expval{D^2}}=\sqrt{N}. \end{equation}
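For small $N$ the result $\expval{D_N^2}=N$ can be checked exactly, without any random numbers, by averaging over all $2^N$ equally likely sequences of steps. The sketch below does this; the function name `walk_averages` is an assumption of this example.

```python
from itertools import product
from math import sqrt


def walk_averages(n_steps):
    """Return (mean of D, mean of D**2) over all 2**n equally likely walks of unit steps."""
    total_d = 0
    total_d_squared = 0
    for steps in product((+1, -1), repeat=n_steps):
        d = sum(steps)  # net distance after n_steps
        total_d += d
        total_d_squared += d * d
    n_walks = 2**n_steps
    return total_d / n_walks, total_d_squared / n_walks


for n in (1, 2, 5, 10):
    mean_d, mean_d_squared = walk_averages(n)
    # mean_d is 0, mean_d_squared equals n, and the rms distance is sqrt(n).
    print(n, mean_d, mean_d_squared, sqrt(mean_d_squared))
```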

We have pointed out that the random walk is closely similar in its mathematics to the coin-tossing game we considered at the beginning of the chapter. If we imagine the direction of each step to be in correspondence with the appearance of heads or tails in a coin toss, then $D$ is just $N_H-N_T$, the difference in the number of heads and tails. Since $N_H+N_T=N$, the total number of steps (and tosses), we have $D=2N_H-N$. We have derived earlier an expression for the expected distribution of $N_H$ (also called $k$) and obtained the result of Eq. (6.5). Since $N$ is just a constant, we have the corresponding distribution for $D$. (Since for every head more than $N/2$ there is a tail “missing,” we have the factor of $2$ between $N_H$ and $D$.) The graph of Fig. 6–2 represents the distribution of distances we might get in $30$ random steps (where $k=15$ is to be read $D=0$; $k=16$, $D=2$; etc.).

The variation of $N_H$ from its expected value $N/2$ is \begin{equation} \label{Eq:I:6:11} N_H-\frac{N}{2}=\frac{D}{2}. \end{equation} The rms deviation is \begin{equation} \label{Eq:I:6:12} \biggl(N_H-\frac{N}{2}\biggr)_{\text{rms}}=\tfrac{1}{2}\sqrt{N}. \end{equation}

According to our result for $D_{\text{rms}}$, we expect that the “typical” distance in $30$ steps ought to be $\sqrt{30} \approx 5.5$, or a typical $k$ should be about $5.5/2 = 2.75$ units from $15$. We see that the “width” of the curve in Fig. 6–2, measured from the center, is just about $3$ units, in agreement with this result.
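The same width can be obtained directly from the distribution of Eq. (6.5), by averaging $(k-N/2)^2$ over all scores $k$. The sketch below is a consistency check under that reading of Eq. (6.12); the function name is an assumption of this example.

```python
from math import comb, sqrt


def rms_deviation_of_heads(n):
    """Root-mean-square deviation of the number of heads from n/2, using Eq. (6.5)."""
    mean = n / 2
    mean_square = sum(comb(n, k) / 2**n * (k - mean) ** 2 for k in range(n + 1))
    return sqrt(mean_square)


print(rms_deviation_of_heads(30))  # about 2.74
print(sqrt(30) / 2)                # Eq. (6.12) gives the same value
```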

We are now in a position to consider a question we have avoided until now. How shall we tell whether a coin is “honest” or “loaded”? We can give now at least a partial answer. For an honest coin, we expect the fraction of the times heads appears to be $0.5$, that is, \begin{equation} \label{Eq:I:6:13} \frac{\expval{N_H}}{N}=0.5. \end{equation} We also expect an actual $N_H$ to deviate from $N/2$ by about $\sqrt{N}/2$, or the fraction to deviate by \begin{equation*} \frac{1}{N}\,\frac{\sqrt{N}}{2}=\frac{1}{2\sqrt{N}}. \end{equation*} The larger $N$ is, the closer we expect the fraction $N_H/N$ to be to one-half.

Fig. 6–6. The fraction of the tosses that gave heads in a particular sequence of $N$ tosses of a penny.

In Fig. 6–6 we have plotted the fraction $N_H/N$ for the coin tosses reported earlier in this chapter. We see the tendency for the fraction of heads to approach $0.5$ for large $N$. Unfortunately, for any given run or combination of runs there is no guarantee that the observed deviation will be even near the expected deviation. There is always the finite chance that a large fluctuation—a long string of heads or tails—will give an arbitrarily large deviation. All we can say is that if the deviation is near the expected $1/2\sqrt{N}$ (say within a factor of $2$ or $3$), we have no reason to suspect the honesty of the coin. If it is much larger, we may be suspicious, but cannot prove, that the coin is loaded (or that the tosser is clever!).

We have also not considered how we should treat the case of a “coin” or some similar “chancy” object (say a stone that always lands in either of two positions) that we have good reason to believe should have a different probability for heads and tails. We have defined $P(H)=\expval{N_H}/N$. How shall we know what to expect for $N_H$? In some cases, the best we can do is to observe the number of heads obtained in large numbers of tosses. For want of anything better, we must set $\expval{N_H}=N_H(\text{observed})$. (How could we expect anything else?) We must understand, however, that in such a case a different experiment, or a different observer, might conclude that $P(H)$ was different. We would expect, however, that the various answers should agree within the deviation $1/2\sqrt{N}$ [if $P(H)$ is near one-half]. An experimental physicist usually says that an “experimentally determined” probability has an “error,” and writes \begin{equation} \label{Eq:I:6:14} P(H)=\frac{N_H}{N}\pm\frac{1}{2\sqrt{N}}. \end{equation} There is an implication in such an expression that there is a “true” or “correct” probability which could be computed if we knew enough, and that the observation may be in “error” due to a fluctuation. There is, however, no way to make such thinking logically consistent. It is probably better to realize that the probability concept is in a sense subjective, that it is always based on uncertain knowledge, and that its quantitative evaluation is subject to change as we obtain more information.
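Applied to the $3000$ tosses reported in Table 6–1, Eq. (6.14) gives the following numbers. The sketch is only an arithmetic illustration; the function name `estimate_probability` is an assumption, and the error formula is the one stated above for $P(H)$ near one-half.

```python
from math import sqrt


def estimate_probability(n_heads, n_tosses):
    """Eq. (6.14): observed fraction of heads and its expected deviation 1/(2 sqrt(N))."""
    return n_heads / n_tosses, 1 / (2 * sqrt(n_tosses))


p_observed, p_error = estimate_probability(1493, 3000)
print(round(p_observed, 4), round(p_error, 4))  # 0.4977 0.0091

# The observed fraction differs from 0.5 by less than the expected deviation,
# so these tosses give no reason to suspect the coin.
print(abs(p_observed - 0.5) < p_error)  # True
```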

6–4 A probability distribution

Let us return now to the random walk and consider a modification of it. Suppose that in addition to a random choice of the direction ($+$ or $-$) of each step, the length of each step also varied in some unpredictable way, the only condition being that on the average the step length was one unit. This case is more representative of something like the thermal motion of a molecule in a gas. If we call the length of a step $S$, then $S$ may have any value at all, but most often will be “near” $1$. To be specific, we shall let $\expval{S^2}=1$ or, equivalently, $S_{\text{rms}}=1$. Our derivation for $\expval{D^2}$ would proceed as before except that Eq. (6.8) would be changed now to read \begin{equation} \label{Eq:I:6:15} \expval{D_N^2}=\expval{D_{N-1}^2}+\expval{S^2}=\expval{D_{N-1}^2}+1. \end{equation} We have, as before, that \begin{equation} \label{Eq:I:6:16} \expval{D_N^2}=N. \end{equation}
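A small simulation can make Eq. (6.16) plausible for steps of varying length. The sketch below is one possible setup, not the lecture's: it assumes step lengths spread uniformly between $0.5$ and an upper limit chosen so that $\expval{S^2}=1$, with a random sign for each step. The function names, the seed, and the number of walks are all arbitrary choices of this example.

```python
import random
from math import sqrt


def step_length_bounds(shortest=0.5):
    """Return (a, b) such that a length uniform on [a, b] has mean square 1.

    For a uniform length, <S^2> = (a^2 + a*b + b^2) / 3; this solves for b given a.
    """
    longest = (-shortest + sqrt(12 - 3 * shortest**2)) / 2
    return shortest, longest


def mean_square_distance(n_steps, n_walks, rng):
    """Average of D**2 over n_walks simulated walks of n_steps steps each."""
    a, b = step_length_bounds()
    total = 0.0
    for _ in range(n_walks):
        d = 0.0
        for _ in range(n_steps):
            d += rng.choice((-1.0, 1.0)) * rng.uniform(a, b)
        total += d * d
    return total / n_walks


rng = random.Random(0)  # fixed seed so that the run is repeatable
msd = mean_square_distance(30, 20_000, rng)
print(msd)  # should come out close to 30, up to a statistical fluctuation
```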

What would we expect now for the distribution of distances $D$? What is, for example, the probability that $D=0$ after $30$ steps? The answer is zero! The probability is zero that $D$ will be any particular value, since there is no chance at all that the sum of the backward steps (of varying lengths) would exactly equal the sum of forward steps. We cannot plot a graph like that of Fig. 6–2.

We can, however, obtain a representation similar to that of Fig. 6–2, if we ask, not what is the probability of obtaining $D$ exactly equal to $0$, $1$, or $2$, but instead what is the probability of obtaining $D$ near $0$, $1$, or $2$. Let us define $P(x,\Delta x)$ as the probability that $D$ will lie in the interval $\Delta x$ located at $x$ (say from $x$ to $x+\Delta x$). We expect that for small $\Delta x$ the chance of $D$ landing in the interval is proportional to $\Delta x$, the width of the interval. So we can write \begin{equation} \label{Eq:I:6:17} P(x,\Delta x)=p(x)\,\Delta x. \end{equation} The function $p(x)$ is called the probability density.

The form of $p(x)$ will depend on $N$, the number of steps taken, and also on the distribution of individual step lengths. We cannot demonstrate the proofs here, but for large $N$, $p(x)$ is the same for all reasonable distributions in individual step lengths, and depends only on $N$. We plot $p(x)$ for three values of $N$ in Fig. 6–7. You will notice that the “half-widths” (typical spread from $x=0$) of these curves are $\sqrt{N}$, as we have shown they should be.

Fig. 6–7. The probability density for ending up at the distance $D$ from the starting place in a random walk of $N$ steps. ($D$ is measured in units of the rms step length.)

You may notice also that the value of $p(x)$ near zero is inversely proportional to $\sqrt{N}$. This comes about because the curves are all of a similar shape and their areas under the curves must all be equal. Since $p(x)\,\Delta x$ is the probability of finding $D$ in $\Delta x$ when $\Delta x$ is small, we can determine the chance of finding $D$ somewhere inside an arbitrary interval from $x_1$ to $x_2$, by cutting the interval in a number of small increments $\Delta x$ and evaluating the sum of the terms $p(x)\,\Delta x$ for each increment. The probability that $D$ lands somewhere between $x_1$ and $x_2$, which we may write $P(x_1 < D < x_2)$, is equal to the shaded area in Fig. 6–8. The smaller we take the increments $\Delta x$, the more correct is our result. We can write, therefore, \begin{equation} \label{Eq:I:6:18} P(x_1 < D < x_2)=\sum p(x)\,\Delta x=\int_{x_1}^{x_2}p(x)\,dx. \end{equation}

Fig. 6–8. The probability that the distance $D$ traveled in a random walk is between $x_1$ and $x_2$ is the area under the curve of $p(x)$ from $x_1$ to $x_2$.

The area under the whole curve is the probability that $D$ lands somewhere (that is, has some value between $x=-\infty$ and $x=+\infty$). That probability is surely $1$. We must have that \begin{equation} \label{Eq:I:6:19} \int_{-\infty}^{+\infty}p(x)\,dx=1. \end{equation} Since the curves in Fig. 6–7 get wider in proportion to $\sqrt{N}$, their heights must be proportional to $1/\sqrt{N}$ to maintain the total area equal to $1$.

The probability density function we have been describing is one that is encountered most commonly. It is known as the normal or Gaussian probability density. It has the mathematical form \begin{equation} \label{Eq:I:6:20} p(x)=\frac{1}{\sigma\sqrt{2\pi}}\,e^{-x^2/2\sigma^2}, \end{equation} where $\sigma$ is called the standard deviation and is given, in our case, by $\sigma=\sqrt{N}$ or, if the rms step size is different from $1$, by $\sigma=\sqrt{N}S_{\text{rms}}$.
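The sum over small increments in Eq. (6.18) can be carried out numerically for the Gaussian density of Eq. (6.20). The sketch below uses a midpoint sum; the function names, the number of slices, and the cutoff at ten standard deviations in place of infinity are assumptions of this example.

```python
from math import erf, exp, pi, sqrt


def gaussian_density(x, sigma):
    """Eq. (6.20): normal probability density with standard deviation sigma."""
    return exp(-x * x / (2 * sigma * sigma)) / (sigma * sqrt(2 * pi))


def interval_probability(x1, x2, sigma, n_slices=10_000):
    """Eq. (6.18): sum of p(x) * dx over small increments between x1 and x2."""
    dx = (x2 - x1) / n_slices
    return sum(gaussian_density(x1 + (i + 0.5) * dx, sigma) for i in range(n_slices)) * dx


sigma = sqrt(30)  # random walk of N = 30 steps with unit rms step length

# Eq. (6.19): the total area is 1 (the tails beyond 10 sigma are negligible).
print(interval_probability(-10 * sigma, 10 * sigma, sigma))

# Chance of ending within one standard deviation of the start, about 0.68;
# the error function gives the same number in closed form.
print(interval_probability(-sigma, sigma, sigma), erf(1 / sqrt(2)))

# The height at x = 0 falls as 1/sqrt(N): height * sqrt(N) is the same for every N.
for n in (100, 1_000, 10_000):
    print(n, gaussian_density(0.0, sqrt(n)) * sqrt(n))
```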

We remarked earlier that the motion of a molecule, or of any particle, in a gas is like a random walk. Suppose we open a bottle of an organic compound and let some of its vapor escape into the air. If there are air currents, so that the air is circulating, the currents will also carry the vapor with them. But even in perfectly still air, the vapor will gradually spread out—will diffuse—until it has penetrated throughout the room. We might detect it by its color or odor. The individual molecules of the organic vapor spread out in still air because of the molecular motions caused by collisions with other molecules. If we know the average “step” size, and the number of steps taken per second, we can find the probability that one, or several, molecules will be found at some distance from their starting point after any particular passage of time. As time passes, more steps are taken and the gas spreads out as in the successive curves of Fig. 6–7. In a later chapter, we shall find out how the step sizes and step frequencies are related to the temperature and pressure of a gas.

Earlier, we said that the pressure of a gas is due to the molecules bouncing against the walls of the container. When we come later to make a more quantitative description, we will wish to know how fast the molecules are going when they bounce, since the impact they make will depend on that speed. We cannot, however, speak of the speed of the molecules. It is necessary to use a probability description. A molecule may have any speed, but some speeds are more likely than others. We describe what is going on by saying that the probability that any particular molecule will have a speed between $v$ and $v+\Delta v$ is $p(v)\,\Delta v$, where $p(v)$, a probability density, is a given function of the speed $v$. We shall see later how Maxwell, using common sense and the ideas of probability, was able to find a mathematical expression for $p(v)$. The form² of the function $p(v)$ is shown in Fig. 6–9. Velocities may have any value, but are most likely to be near the most probable value $v_p$.

Fig. 6–9. The distribution of velocities of the molecules in a gas.

We often think of the curve of Fig. 6–9 in a somewhat different way. If we consider the molecules in a typical container (with a volume of, say, one liter), then there are a very large number $N$ of molecules present ($N\approx10^{22}$). Since $p(v)\,\Delta v$ is the probability that one molecule will have its velocity in $\Delta v$, by our definition of probability we mean that the expected number $\expval{\Delta N}$ to be found with a velocity in the interval $\Delta v$ is given by \begin{equation} \label{Eq:I:6:21} \expval{\Delta N}=N\,p(v)\,\Delta v. \end{equation} We call $N\,p(v)$ the “distribution in velocity.” The area under the curve between two velocities $v_1$ and $v_2$, for example the shaded area in Fig. 6–9, represents [for the curve $N\,p(v)$] the expected number of molecules with velocities between $v_1$ and $v_2$. Since with a gas we are usually dealing with large numbers of molecules, we expect the deviations from the expected numbers to be small (like $1/\sqrt{N}$), so we often neglect to say the “expected” number, and say instead: “The number of molecules with velocities between $v_1$ and $v_2$ is the area under the curve.” We should remember, however, that such statements are always about probable numbers.

6–5 The uncertainty principle

The ideas of probability are certainly useful in describing the behavior of the $10^{22}$ or so molecules in a sample of a gas, for it is clearly impractical even to attempt to write down the position or velocity of each molecule. When probability was first applied to such problems, it was considered to be a convenience—a way of dealing with very complex situations. We now believe that the ideas of probability are essential to a description of atomic happenings. According to quantum mechanics, the mathematical theory of particles, there is always some uncertainty in the specification of positions and velocities. We can, at best, say that there is a certain probability that any particle will have a position near some coordinate $x$.

We can give a probability density $p_1(x)$, such that $p_1(x)\,\Delta x$ is the probability that the particle will be found between $x$ and $x+\Delta x$. If the particle is reasonably well localized, say near $x_0$, the function $p_1(x)$ might be given by the graph of Fig. 6–10(a). Similarly, we must specify the velocity of the particle by means of a probability density $p_2(v)$, with $p_2(v)\,\Delta v$ the probability that the velocity will be found between $v$ and $v+\Delta v$.

Fig. 6–10. Probability densities for observation of the position and velocity of a particle.

It is one of the fundamental results of quantum mechanics that the two functions $p_1(x)$ and $p_2(v)$ cannot be chosen independently and, in particular, cannot both be made arbitrarily narrow. If we call the typical “width” of the $p_1(x)$ curve $[\Delta x]$, and that of the $p_2(v)$ curve $[\Delta v]$ (as shown in the figure), nature demands that the product of the two widths be at least as big as the number $\hbar/2m$, where $\hbar$ is Planck's constant divided by $2\pi$ and $m$ is the mass of the particle. We may write this basic relationship as \begin{equation} \label{Eq:I:6:22} [\Delta x]\cdot[\Delta v]\geq\hbar/2m. \end{equation} This equation is a statement of the Heisenberg uncertainty principle that we mentioned earlier.

Since the right-hand side of Eq. (6.22) is a constant, this equation says that if we try to “pin down” a particle by forcing it to be at a particular place, it ends up by having a high speed. Or if we try to force it to go very slowly, or at a precise velocity, it “spreads out” so that we do not know very well just where it is. Particles behave in a funny way!
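To get a feeling for the sizes involved, here is a minimal sketch that evaluates the smallest velocity width permitted by Eq. (6.22) for a given position width. The numerical values of $\hbar$ and the electron mass are the standard ones; the dust grain (its mass and how well it is located) is a hypothetical example, and the function name is an assumption made for illustration.

```python
HBAR = 1.054571817e-34            # J s, Planck's constant divided by 2*pi
ELECTRON_MASS = 9.1093837015e-31  # kg

def min_velocity_width(position_width, mass):
    """Smallest [Delta v] allowed by [Delta x][Delta v] >= hbar/2m, in m/s."""
    return HBAR / (2.0 * mass * position_width)

# An electron confined to about the size of an atom (1e-10 m).
electron_dv = min_velocity_width(1e-10, ELECTRON_MASS)

# A hypothetical dust grain: 1e-9 kg located to within 1e-6 m (assumed values).
grain_dv = min_velocity_width(1e-6, 1e-9)

print(f"electron: [dv] >= {electron_dv:.2e} m/s")
print(f"grain:    [dv] >= {grain_dv:.2e} m/s")
```

The electron's velocity is uncertain by hundreds of kilometers per second, while for the grain the limit is far below anything we could measure. That is why the funny behavior shows up for atomic particles and not for everyday objects.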

The uncertainty principle describes an inherent fuzziness that must exist in any attempt to describe nature. Our most precise description of nature must be in terms of probabilities. There are some people who do not like this way of describing nature. They feel somehow that if they could only tell what is really going on with a particle, they could know its speed and position simultaneously. In the early days of the development of quantum mechanics, Einstein was quite worried about this problem. He used to shake his head and say, “But, surely God does not throw dice in determining how electrons should go!” He worried about that problem for a long time and he probably never really reconciled himself to the fact that this is the best description of nature that one can give. There are still one or two physicists who are working on the problem who have an intuitive conviction that it is possible somehow to describe the world in a different way and that all of this uncertainty about the way things are can be removed. No one has yet been successful.

The necessary uncertainty in our specification of the position of a particle becomes most important when we wish to describe the structure of atoms. In the hydrogen atom, which has a nucleus of one proton with one electron outside of the nucleus, the uncertainty in the position of the electron is as large as the atom itself! We cannot, therefore, properly speak of the electron moving in some “orbit” around the proton. The most we can say is that there is a certain chance $p(r)\,\Delta V$ of observing the electron in an element of volume $\Delta V$ at the distance $r$ from the proton. The probability density $p(r)$ is given by quantum mechanics. For an undisturbed hydrogen atom, $p(r)=Ae^{-2r/a}$, where $A$ is a constant chosen so that the total probability is one. The number $a$ is the “typical” radius, where the function is decreasing rapidly. Since there is a small probability of finding the electron at distances from the nucleus much greater than $a$, we may think of $a$ as “the radius of the atom,” about $10^{-10}$ meter.
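One way to see how much of the probability lies near the nucleus is to add up $p(r)\,\Delta V$ over thin spherical shells of volume $4\pi r^2\,\Delta r$. The sketch below does this under two assumptions that are not stated in the text: that the constant in $p(r)=Ae^{-2r/a}$ is fixed by requiring the total probability to be one, which gives $A=1/\pi a^3$, and that $a$ is taken as exactly $10^{-10}$ meter. The function names are chosen for illustration only.

```python
import math

def hydrogen_density(r, a):
    """p(r) = A exp(-2r/a), with A = 1/(pi a^3) from total probability one."""
    return math.exp(-2.0 * r / a) / (math.pi * a ** 3)

def probability_within(radius, a):
    """Closed-form chance of observing the electron at r < radius."""
    x = 2.0 * radius / a
    return 1.0 - math.exp(-x) * (1.0 + x + 0.5 * x * x)

def probability_within_numeric(radius, a, steps=100_000):
    """Same quantity, summing p(r) * 4 pi r^2 dr over thin shells."""
    dr = radius / steps
    total = 0.0
    for i in range(steps):
        r = (i + 0.5) * dr
        total += hydrogen_density(r, a) * 4.0 * math.pi * r * r
    return total * dr

a = 1.0e-10  # meters, the rough "radius of the atom" quoted in the text
for multiple in (1, 2, 4):
    print(multiple, probability_within(multiple * a, a))
```

The chance of finding the electron inside $r=a$ is $1-5e^{-2}$, about one third, and it climbs toward one within a few multiples of $a$. This is the sense in which $a$ serves as a radius even though the cloud has no sharp edge.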

Fig. 6–11. A way of visualizing a hydrogen atom. The density (whiteness) of the cloud represents the probability density for observing the electron.

We can form an image of the hydrogen atom by imagining a “cloud” whose density is proportional to the probability density for observing the electron. A sample of such a cloud is shown in Fig. 6–11. Thus our best “picture” of a hydrogen atom is a nucleus surrounded by an “electron cloud” (although we really mean a “probability cloud”). The electron is there somewhere, but nature permits us to know only the chance of finding it at any particular place.

In its efforts to learn as much as possible about nature, modern physics has found that certain things can never be “known” with certainty. Much of our knowledge must always remain uncertain. The most we can know is in terms of probabilities.

After the first three games, the experiment was actually done by shaking $30$ pennies violently in a box and then counting the number of heads that showed.
Maxwell’s expression is $p(v)=Cv^2e^{-av^2}$, where $a$ is a constant related to the temperature and $C$ is chosen so that the total probability is one.