Pig in the Python: Stitching Monte Carlo Methods with Buffon's Needle

Monday, September 9, 2013

Stitching Monte Carlo Methods with Buffon's Needle

Buffon's Needle

The problem is to randomly drop a needle on the unit square and count the number of times that the needle touches (or "cuts") the edge of the square. This is a well-studied problem, but instead of the usual tack of going directly for the analytical result (see appendix), we instead write a quick simulation that we'll later refine to obtain more precise results more efficiently. This methodology is common in Monte Carlo methods and we will work through this example in careful detail.

Buffon's needle is a classic problem that is easy to understand, yet complex enough to be characteristic of much more difficult problems encountered in practice. The overall idea is that when a problem is too hard to solve analytically (at least in the general case), we write representative computer models that can drive a numerical solution and motivate analytical solutions, even if only for special cases.

Setting up the Simulation

The following code-block sets up the random needle position and orientation and creates the corresponding matplotlib primitives to be drawn in the figure. As shown, the simulation chooses a random center point for the needle in the unit square and then chooses a random angle $\theta \in [0,\pi)$ as the needle's orientation. Because the needle pivots at its center, we only need $\theta$ in this range.

In [4]:
from __future__ import division
from numpy import pi, cos, sin
from numpy.random import rand
import numpy as np
from matplotlib.pyplot import subplots
from matplotlib.lines import Line2D
from matplotlib.patches import Rectangle

def needle_gen(L=0.5):
    '''Drop a needle on the unit square. Returns a dictionary describing needle position and 
    orientation. The `cut` key in the dictionary is True when the needle intersects 
    an edge of the unit square and is False otherwise.
    
    L := length of needle (default = 0.5)
    '''
    assert 0<L<1
    xc,yc = rand(2) # uniform random in [0,1) for each coordinate dimension
    ang = rand()*pi # uniform random angle in [0,pi)
    x0,y0 = xc-L/2*cos(ang),yc-L/2*sin(ang) # coordinates for one end of needle 
    xe,ye = xc+L/2*cos(ang),yc+L/2*sin(ang) # coordinates for the other end
    not_cut = (0<x0<1) and (0<y0<1) and (0<xe<1) and (0<ye<1) # both ends strictly inside?
    return dict(x0=x0,y0=y0,xc=xc,yc=yc,xe=xe,ye=ye,ang=ang,cut=not not_cut)


def draw_needle(ax,d):
    '''
    Draw needle symbol on given axis.
    
    If red, then the line cuts an edge of the unit square. If blue, then it does not. 
    The center of the needle is marked with an `o` symbol.
    
    ax := matplotlib axes to draw needle symbol
    d := dictionary output from needle_gen
    '''
    x0,y0,xe,ye = d['x0'],d['y0'],d['xe'],d['ye'] # needle endpoints
    xc,yc = d['xc'],d['yc'] # needle center
    if d['cut']: 
        line = [Line2D([x0,xe],[y0,ye],color='r',alpha=0.3), 
                Line2D([xc],[yc],marker='o',color='r')] 
    else:   
        line = [Line2D([x0,xe],[y0,ye],alpha=0.3), 
                Line2D([xc],[yc],marker='o',alpha=0.3)] 
    for i in line: 
        ax.add_line(i)
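
As a quick sanity check, we can drop a single needle and inspect the resulting dictionary; because the endpoints are computed from the center, they should always be a distance L apart. This is a minimal sketch using the functions just defined:

d = needle_gen(0.5)
print(d['cut']) # True if the needle crosses an edge of the unit square
print(np.hypot(d['xe']-d['x0'], d['ye']-d['y0'])) # distance between endpoints; should print 0.5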

Now, we can simulate the dropping events on the square (yellow background). The red needles cut the edge of the square and the blue ones are completely contained therein. The circle marker shows the center of each needle.

In [5]:
L=0.5

samples = [needle_gen(L) for i in range(500)]

def draw_sim(samples,L=0.5):
    'Draw simulation results. Package this plot for reuse later'
    fig,ax=subplots()
    fig.set_size_inches(5,5)
    
    red_count=0
    n=len(samples)
    for k in samples:
        red_count+=k['cut']
        draw_needle(ax,k)
        
    ax.set_aspect(1) # set aspect ratio to 1
    ax.add_patch(Rectangle((0,0),1,1,alpha=0.3,color='y')) # unit square background is light yellow
    ax.set_xlabel('x-direction')
    ax.set_ylabel('y-direction')
    ax.axis([-.1,1.1,-.1,1.1]) # add some space around the unit square
    ax.set_title(r'$\hat{\mathbb{P}}(cut)=%3.2f$'% (red_count/n))
    ax.grid()

draw_sim(samples)    

The figure above is a good visual check of our simulation and helps motivate our reasoning. Now, we want to concentrate on counting the number of cuts and then use that to estimate the probability of a cut.

In [6]:
fig,axs = subplots(1,2)
fig.set_size_inches((12,2))
ax=axs[0]
samples=np.array([[needle_gen(L)['cut'] for i in range(300)] for k in range(100)])

ax.plot(np.mean(samples,axis=1),'o-',alpha=0.3)
ax.axis(xmax=samples.shape[0])
ax.set_xlabel(r'$trial$',fontsize=18)
ax.set_ylabel(r'$\hat{\mathbb{P}}(cut)$',fontsize=18)
ax.set_title(r'$\overline{\hat{\mathbb{P}}}(cut) = %3.3f,\hat{\sigma}=%3.3f$'%
             (samples.mean(),(samples.mean(1)).std()))
ax.grid()

ax = axs[1]
ax.hist( samples.mean(1));
ax.set_title('Distribution of Mean Estimates');

The figure above shows some statistical history of our Monte Carlo simulation. The samples array in the prior code-block is a 100x300 matrix of True/False values indicating cut or no-cut. The overall average of this array is our estimate of the probability of a cut. In other words, we count the number of cuts and divide by the total number of needle drops to obtain the estimated probability (approximately 55% in this case of $L=1/2$). The plot on the left shows the average of each row of 300 samples. Thus, there are 100 such row-wise averages (recall the samples matrix is 100x300). The title shows the estimated standard deviation across these 100 row-wise averages.

The plot on the right shows a histogram of the row-wise averages. Why did we partition the simulated samples like this and compute row-wise averages instead of just using all the data at once? Because we want to get a sense of the spread of the estimates by computing the standard error. If we used all our samples in one big batch, we could not do this, even though we still use the average of these averages as our ultimate estimate of the probability of a cut!

Naturally, if we wanted to improve the standard deviation of our ensemble of row-wise means, we could simply run a larger simulation, but remember that the standard error shrinks only at the rate of $1/\sqrt{N}$ (where $N$ is the number of samples), so we would need a sample run four times as large to improve the standard error by a factor of two. What we really want is more accuracy per unit of computing time, which requires understanding the mechanics of this problem better.
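
To see this scaling concretely, here is a minimal sketch (reusing needle_gen from above; the helper spread_of_means is only for illustration) that quadruples the batch size and checks that the spread of the batch means roughly halves:

def spread_of_means(n_per_batch, n_batches=100, L=0.5):
    'Standard deviation of the batch means for the given batch size.'
    cuts = np.array([[needle_gen(L)['cut'] for i in range(n_per_batch)]
                     for k in range(n_batches)])
    return cuts.mean(axis=1).std()

print(spread_of_means(300))   # spread of means with 300 drops per batch
print(spread_of_means(1200))  # roughly half the spread with 4x the drops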

Relative Errors and Using Weighting for Better Efficiency

There's nothing wrong with the way we have been using the simulation so far, but in the first figure all of those blue needles did not help us compute the probability we were after because they did not touch the edges of the square. Yes, we could reverse the reasoning and use the blue instead of the red for our estimation, but that does not escape the basic problem. For example, if the needle were very short compared to the square, then there would have been a lot of blue in our initial plot (try this using this IPython notebook!). In the extreme case, we could run the simulation for a long time without observing a single cut. We want to use our valuable computer time to create samples that will improve our estimated solution.
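
For example, the following quick check (a sketch with an illustratively short needle, L=0.01) shows how rare cuts become:

n = 1000
cuts = sum(needle_gen(L=0.01)['cut'] for i in range(n))
print('%d cuts out of %d drops' % (cuts, n)) # frequently zero or close to it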

To recap, we computed a matrix of (100x300) cases of needle drops. Each element of samples in the code above is a True/False value as to whether or not the needle touched the edge of the square. We averaged over the 300 samples in each row to compute the estimated probability of a cut (the mean value of the boolean array in this case). The estimated standard deviation is taken over the ensemble of these estimated means.

A common way to evaluate the quality of the simulated result is to compute the relative error $R_e$, which is the ratio of the estimated standard deviation of the mean to the estimated mean using all of the samples. This is computed in the title of the figure below.
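
In symbols, writing $\hat{\sigma}$ for the estimated standard deviation of the row-wise means and $\hat{\mu}$ for the overall estimated mean,

$$R_e = \frac{\hat{\sigma}}{\hat{\mu}}$$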

In [7]:
fig,ax=subplots()
fig.set_size_inches(12,2)
ax.plot(samples.mean(1))
ax.set_title(r'$\hat{\sigma}=%2.3f,R_e=%2.3f$'% (samples.mean(1).std(),samples.mean(1).std()/samples.mean()), fontsize=18)
ax.set_xlabel('sample history index',fontsize=16)
ax.set_ylabel(r'$\hat{\mathbb{P}}(cut)$',fontsize=18);

So far, our relative error is pretty good (the rule of thumb is below 5%), but let's see how we do when the needle length is smaller. The code-block below draws the corresponding figure for the simulation in this case.

In [8]:
L=0.1

samples_d=[needle_gen(L) for i in range(300*100)]
samples = np.array([k['cut'] for k in samples_d]).reshape(100,300)

def draw_sample_history(samples):
    fig,ax=subplots()
    fig.set_size_inches(12,2)
    mn = samples.mean(1)
    ax.plot(mn)
    ax.set_title(r'$\hat{\sigma}=%2.3f,R_e=%2.3f$'% (mn.std(),
                                                     mn.std()/mn.mean()), fontsize=18)
    ax.set_xlabel('sample history index',fontsize=16)
    ax.set_ylabel(r'$\hat{\mathbb{P}}(cut)$',fontsize=18);

draw_sample_history(samples)    

The figure above shows that for shorter needles, the relative error is much worse. This situation is illustrated in the figure below.

In [9]:
draw_sim(samples_d[:500])