    In [1]:
    %matplotlib inline
    from pylab import *     # the cells below use pylab-style names (linspace, plot, hist, ...) directly
    import numpy as np
    from IPython.display import Image
    from IPython.html.widgets import interact
    
    In [2]:
    Image('Lighthouse_schematic.jpg',width=500)
    
    Out[2]:

    The following is a classic estimation problem called the "lighthouse problem". The figure shows a set of receivers distributed at coordinates $x_k$ along the shore and a lighthouse located at some position $(\alpha,\beta)$ offshore. The idea is that the coastline is equipped with a continuous strip of photodetectors. The lighthouse flashes the shore $N$ times at random angles $\theta_k$. The strip of photodetectors registers the $k^{th}$ flash position $x_k$, but the angle $\theta_k$ of the flash is unknown. Furthermore, the lighthouse beam is laser-like, so there is no smearing along the strip of photodetectors. In other words, the lighthouse is actually more of a disco ball in a dark nightclub.

    The problem is how to estimate $\alpha$ given that we already know $\beta$.

    From basic trigonometry, we have the following:

    $$ \beta \tan(\theta_k) = x_k - \alpha $$

    The density function for the angle is assumed to be the following:

    $$ f_{\alpha}(\theta_k) = \frac{1}{\pi} $$

    This means that the angle is uniformly distributed between $\pm \pi/2$. Now, what we really want is the density function for $x_k$, which will tell us the probability that the $k^{th}$ flash will be recorded at position $x_k$. After a change of variables, we obtain the following:

    $$ f_{\alpha}(x_k) = \frac{\beta}{\pi(\beta ^2 +(x_k-\alpha)^2)} $$
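
    For completeness, the change-of-variables step behind this density is short, using only the relation and the uniform angle density above. Since $x_k = \alpha + \beta\tan(\theta_k)$,

    $$ \frac{dx_k}{d\theta_k} = \beta\sec^2(\theta_k) = \beta\left(1+\tan^2(\theta_k)\right) = \frac{\beta^2+(x_k-\alpha)^2}{\beta} $$

    and therefore

    $$ f_{\alpha}(x_k) = f_{\alpha}(\theta_k)\left|\frac{d\theta_k}{dx_k}\right| = \frac{1}{\pi}\cdot\frac{\beta}{\beta^2+(x_k-\alpha)^2} $$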

    This is a Cauchy (Lorentzian) distribution centered at $\alpha$ with scale $\beta$, which we plot below for $\alpha=1$ and $\beta=1$.

    In [3]:
    xi = linspace(-10,10,150)
    alpha = 1
    f = lambda x: 1/(pi*(1+(x-alpha)**2))   # f_alpha(x) with beta = 1
    plot(xi,f(xi))
    xlabel('$x$',fontsize=24)
    ylabel(r'$f_{\alpha}(x)$',fontsize=24);
    vlines(alpha,0,.35,linestyles='--',lw=4.)   # mark the true alpha
    grid()
    

    As shown in the figure, the peak of this distribution is at $x_k=\alpha$. Because there is no coherent processing, the recording of a signal at one position does not influence what we can infer about the position of another measurement. Thus, we can justify assuming independence between the individual $x_k$ measurements. The encouraging thing about this distribution is that it is centered at the variable we are trying to estimate, namely $\alpha$.

    The temptation here is to simply average the $x_k$ measurements, since the distribution is concentrated around $\alpha$. Later, we will see why this is a bad idea.

    Using Maximum Likelihood Estimation

    Given $N$ measurements, we form the likelihood as the following:

    $$ L(\alpha) = \prod_{k=1}^N f_{\alpha}(x_k)$$

    And this is the function we want to maximize for our maximum likelihood estimator. Let's work with its logarithm instead, which turns the product into a sum and does not change the location of the maximizing $\alpha$. Then,

    $$\mathcal{L}(\alpha)=\sum_{k=1}^N \ln f_{\alpha}(x_k) = \text{constant}- \sum_{k=1}^N \ln (\beta ^2 +(x_k-\alpha)^2) $$

    Taking the first derivative gives us the equation we would have to solve in order to compute the estimator for $ \alpha $,

    $$ \frac{d \mathcal{L}}{d \alpha} = 2 \sum_{k=1}^N \frac{x_k-\alpha}{\beta^2+(x_k-\alpha)^2}=0$$

    Unfortunately, there is no closed-form solution for the optimal $ \alpha $ in this equation. However, we are not defenseless at this point, because Python has all the tools we need to overcome this. Let's start by getting a quick look at the histogram of the $x_k$ measurements.

    In [4]:
    beta = alpha = 1
    # draw 250 flash angles uniformly from [-pi/2, pi/2] and map them to shore positions
    theta_samples = (2*np.random.rand(250)-1)*pi/2
    x_samples = alpha + beta*np.tan(theta_samples)
    hist(x_samples);
    

    The histogram above shows that although many of the measurements fall in one bin, even for relatively few samples we still observe measurements that are displaced very far from the main group. This is because we initially assumed that the angle $\theta_k$ could be anywhere in $[-\pi/2,\pi/2]$, which is basically expressing our ignorance of the length of the coastline; an angle of $\theta_k=0.499\pi$, for example, maps to $x_k \approx \alpha + 318\,\beta$.

    With that in mind, let's turn to the maximum likelihood estimation. We will need some tools from sympy.

    In [5]:
    import sympy as S
    a=S.symbols('alpha',real=True)
    x = S.symbols('x',real=True)
    # form the derivative of the log-likelihood (beta=1; the constant factor of 2
    # is dropped since it does not move the zero-crossing)
    dL=sum((xk-a)/(1+(xk-a)**2)  for xk in x_samples)
    S.plot(dL,(a,-15,15),xlabel=r'$\alpha$',ylabel=r'$dL(\alpha)/d\alpha$')
    
    Out[5]:
    <sympy.plotting.plot.Plot at 0x94adb90>

    The above figure shows that the derivative of the log-likelihood crosses zero near the value of $\alpha$ we want to estimate. Thus, our next task is to solve for that zero-crossing, which will then reveal the maximum likelihood estimate of $\alpha$ given the set of measurements $\lbrace x_k \rbrace$.

    There are tools in scipy.optimize that can help us compute the zero-crossing as demonstrated in the cell below.

    In [6]:
    from scipy.optimize import fmin_l_bfgs_b
    
    # convert the sympy expression to a numpy function with lambdify, then minimize the
    # squared derivative so that its zero-crossing becomes the minimum
    alpha_x  = fmin_l_bfgs_b(S.lambdify(a,(dL)**2),0,bounds=[(-3,3)],approx_grad=True)
    
    print alpha_x
    
    (array([ 0.99435905]), array([  3.76209349e-14]), {'warnflag': 2, 'task': 'ABNORMAL_TERMINATION_IN_LNSRCH', 'grad': array([  1.64970679e-05]), 'nit': 5, 'funcalls': 150})
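
    As a cross-check, here is a minimal pure-numpy/scipy sketch of the same estimate that skips sympy entirely: it minimizes the negative log-likelihood $\sum_k \ln(\beta^2+(x_k-\alpha)^2)$ directly over the same bracket. It assumes the x_samples array from the earlier cell and $\beta=1$; neg_log_likelihood is just a helper name for this sketch.

    import numpy as np
    from scipy.optimize import minimize_scalar
    
    def neg_log_likelihood(al, x, beta=1.0):
        # negative log-likelihood of the Cauchy measurement model (additive constants dropped)
        return np.sum(np.log(beta**2 + (x - al)**2))
    
    res = minimize_scalar(neg_log_likelihood, bounds=(-3, 3), args=(x_samples,), method='bounded')
    print(res.x)   # should land close to the fmin_l_bfgs_b answer above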
    

    Comparing the Maximum Likelihood Estimator to Just Plain Averaging

    Whew. That was a lot of work to compute the maximum likelihood estimate. Earlier, we observed that the density function is peaked around the $\alpha$ we are trying to estimate. Why not just take the average of the $\lbrace x_k \rbrace$ and use that to estimate $\alpha$?

    Let's try computing the average in the cell below.

    In [7]:
    print 'alpha using average =',x_samples.mean()
    print 'maximum likelihood estimate = ', alpha_x[0]
    
    alpha using average = -14.7935661293
    maximum likelihood estimate =  [ 0.99435905]
    

    If you run this notebook a few times, you will see that the estimate using the average has enormous variance. This is a consequence of the fact that we can get very large absolute values of $x_k$ for values of $\theta_k$ near the edges of the $[-\pi/2,\pi/2]$ interval.
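
    The underlying density here is the Cauchy distribution, which has no finite mean, so the usual law-of-large-numbers intuition does not apply. A quick sketch of this effect, assuming only numpy (the sampling scheme mirrors the earlier cell with $\alpha=\beta=1$, and the seed is arbitrary):

    import numpy as np
    np.random.seed(0)                                    # arbitrary seed, just for illustration
    th = (2*np.random.rand(5000) - 1)*np.pi/2
    xs = 1 + np.tan(th)                                  # alpha = beta = 1
    running_mean = np.cumsum(xs)/np.arange(1, xs.size + 1)
    print(running_mean[[9, 99, 999, 4999]])              # jumps around instead of settling near 1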

    In [8]:
    def run_trials(n=100):
        # generate n independent sets of 250 lighthouse measurements
        o=[]
        for i in range(n):
            theta_samples=((2*np.random.rand(250)-1)*pi/2)
            x_samples = alpha+beta*np.tan(theta_samples)
            o.append(x_samples)
        return np.array(o)
    
    In [9]:
    o= run_trials()
    

    The following figure shows the histogram of the measurements. As shown, there are many measurements far away from the central part. This is the cause of our widely varying average. What if we just trimmed away the excess outliers? Would that leave us with an easier-to-implement procedure for estimating $\alpha$?

    In [10]:
    hist(o[np.where(abs(o)<200)]);
    

    The following graph shows what happens when we include only a neighborhood around zero in our calculation of the average value. Note that the figure shows a wide spread of average values depending on how big a neighborhood around zero we decide to keep. This is an indication that the average is not a good estimator for our problem, because it is very sensitive to outliers.

    In [11]:
    plot(range(100,10000,100),[o[np.where(abs(o)<i)].mean() for i in range(100,10000,100)],'-o')
    xlabel('width of neighborhood around zero')
    ylabel('value of average estimate')
    
    Out[11]:
    <matplotlib.text.Text at 0x9a27430>

    For some perspective, we can wrap our maximum likelihood estimator code in one function and then examine the variance of the estimator over our set of synthetic trials. Note that this takes a long time to run!

    In [12]:
    def ML_estimator(x_samples):
        # maximum likelihood estimate of alpha for one trial: minimize the squared
        # derivative of the log-likelihood, as in the cell above
        dL=sum((xk-a)/(1+(xk-a)**2)  for xk in x_samples)
        a_x  = fmin_l_bfgs_b(S.lambdify(a,(dL)**2),0,bounds=[(-3,3)],approx_grad=True)
        return a_x[0]
    
    In [13]:
    # run maximum likelihood estimator on synthetic data we generated earlier
    # Beware this may take a long time!
    v= np.hstack([ML_estimator(o[i,:]) for i in range(o.shape[0])])
    
    In [14]:
    vmed= np.hstack([np.median(o[i,:]) for i in range(o.shape[0])])
    vavg =  np.hstack([np.mean(o[i,:]) for i in range(o.shape[0])])
    fig,ax = subplots()
    ax.plot(v,'-o',label='ML')
    ax.plot(vmed,'gs-',label='median')
    ax.plot(vavg,'r^-',label='mean')
    ax.axis(ymax=2,ymin=0)
    ax.legend(loc=(1,0))
    ax.set_xlabel('Trial Index')
    ax.set_ylabel('Value of Estimator')
    
    Out[14]:
    <matplotlib.text.Text at 0x9a2ca30>

    The above chart shows that the mean-based estimator jumps all over the place, while the maximum likelihood (ML) and median-based estimators are less volatile. The next chart explores the relationship between the ML and median-based estimators and checks whether one is biased relative to the other. The figure below shows that (1) there is only a small numerical difference between the two estimators and (2) neither is systematically different from the other (otherwise, the diagonal would not split them so evenly).

    In [15]:
    fig,ax = subplots()
    ii= np.argsort(v)
    ax.plot(v[ii],vmed[ii],'o',alpha=.3)
    axs = ax.axis()
    ax.plot(linspace(0,2,10),linspace(0,2,10))
    ax.axis(axs)
    ax.set_aspect(1)
    ax.set_xlabel('ML estimate')
    ax.set_ylabel('Median estimate')
    
    Out[15]:
    <matplotlib.text.Text at 0x9a45d30>
    In [16]:
    fig,ax = subplots()
    ax.hist(v,10,alpha=.3,label='ML')
    ax.hist(vmed,10,alpha=.3,label='median')
    ax.legend(loc=(1,0))
    
    Out[16]:
    <matplotlib.legend.Legend at 0x1aa258f0>
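
    To put rough numbers on the spread of the three estimators, we can compare their standard deviations across the trials. A quick check, assuming the v, vmed, and vavg arrays computed in the cells above:

    print('ML estimator std:     %s' % v.std())
    print('median estimator std: %s' % vmed.std())
    print('mean estimator std:   %s' % vavg.std())   # expect this to be far larger than the other two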

    Maximum A-Posteriori (MAP) Estimation

    Let's use a uniform distribution over some bracketed interval as the prior for $\alpha$:

    $$ f(\alpha) = \frac{1}{\alpha_{max}-\alpha_{min}} \quad \text{where} \quad \alpha_{min} \le \alpha \le \alpha_{max}$$

    and zero otherwise. We can fold in the measurements sample-by-sample to see how the posterior develops under this prior.
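
    Concretely, the quantity built up in the cells below, one flash at a time, is the unnormalized posterior

    $$ p(\alpha \mid x_1,\ldots,x_n) \propto f(\alpha)\prod_{k=1}^{n} f_{\alpha}(x_k) $$

    where $f(\alpha)$ is the flat prior above and each new sample multiplies in one more Cauchy factor $f_{\alpha}(x_k)$.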

    In [17]:
    alpha = a       # rebind alpha to the sympy symbol, so f() now returns expressions in alpha
    alphamx,alphamn = 3,-3
    g = f(x_samples[0])       # likelihood of the first flash as a function of alpha
    xi = linspace(alphamn,alphamx,100)
    mxval = S.lambdify(a,g)(xi).max()
    plot(xi,S.lambdify(a,g)(xi),x_samples[0],mxval*1.1,'o')
    
    Out[17]:
    [<matplotlib.lines.Line2D at 0x1aacd610>,
     <matplotlib.lines.Line2D at 0x1aacd770>]
    In [21]:
    xi = linspace(alphamn,alphamx,100)
    palpha=S.Piecewise((1,abs(x)<3),(0,True))   # flat prior on [-3,3] (unnormalized)
    
    def slide_figure(n=0):
        # plot the (unnormalized) posterior in alpha after the first n flashes
        fig,ax=subplots()
        palpha=S.Piecewise((1,abs(x)<3),(0,True))
        if n==0:
            ax.plot(xi,[S.lambdify(x,palpha)(i) for i in xi])
            ax.plot(x_samples[0],1.1,'o',ms=10.)
        else:
            g = S.prod(map(f,x_samples[:n]))
            mxval = S.lambdify(a,g)(xi).max()*1.1
            ax.plot(xi,S.lambdify(a,g)(xi))
            ax.plot(x_samples[:n],mxval*ones(n),'o',color='gray',alpha=0.3)
            ax.plot(x_samples[n],mxval,'o',ms=10.)
            ax.axis(ymax=mxval*1.1)
        ax.set_xlabel(r'$\alpha$',fontsize=18)
        ax.axis(xmin=-17,xmax=17)
    
    In [22]:
    interact(slide_figure, n=(0,15,1));
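
    If the notebook widget machinery is not available, the same panels can be drawn one at a time by calling the function defined above directly, for example:

    slide_figure(5)   # static version: unnormalized posterior after the first five flashes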
    

    Does the order matter?

    In [23]:
    fig,axs = subplots(2,6,sharex=True)
    fig.set_size_inches((10,3))
    for n,ax in enumerate(axs.flatten()):
        if n==0:
            ax.plot(xi,[S.lambdify(x,palpha)(i) for i in xi])
            ax.plot(x_samples[0],1.1,'o',ms=10.)
        else:
            g = S.prod(map(f,x_samples[:n]))
            mxval = S.lambdify(a,g)(xi).max()*1.1
            ax.plot(xi,S.lambdify(a,g)(xi))
            ax.plot(x_samples[:n],mxval*ones(n),'o',color='gray',alpha=0.3)
            ax.plot(x_samples[n],mxval,'o',ms=10.)
            ax.axis(ymax=mxval*1.1)
            ax.set_yticklabels([''])
            ax.tick_params(labelsize=6)
    
    In [24]:
    # use random  order for first 12 samples
    x_samples[:12]= np.random.permutation(x_samples[:12])
    fig2,axs = subplots(2,6,sharex=True)
    fig2.set_size_inches((10,3))
    
    for n,ax in enumerate(axs.flatten()):
        if n==0:
            ax.plot(xi,[S.lambdify(x,palpha)(i) for i in xi])
            ax.plot(x_samples[0],1.1,'o',ms=10.)
        else:
            g = S.prod(map(f,x_samples[:n]))
            mxval = S.lambdify(a,g)(xi).max()*1.1
            ax.plot(xi,S.lambdify(a,g)(xi))
            ax.plot(x_samples[:n],mxval*ones(n),'o',color='gray',alpha=0.3)
            ax.plot(x_samples[n],mxval,'o',ms=10.)
            ax.axis(ymax=mxval*1.1)
            ax.set_yticklabels([''])
            ax.tick_params(labelsize=6)
    
    In [25]:
    fig   # redisplay the first (in-order) grid for comparison with the shuffled one
    
    Out[25]:
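
    The intermediate panels differ, but the posterior that uses all of the samples does not depend on the order in which they are folded in: the joint likelihood is a product of per-sample factors, and multiplication is commutative. A minimal check of this, assuming f, x_samples, and sympy as S from the cells above:

    # products of the same per-sample likelihood factors, taken in two different orders
    g_ordered  = S.prod([f(xk) for xk in x_samples[:12]])
    g_shuffled = S.prod([f(xk) for xk in np.random.permutation(x_samples[:12])])
    print(S.simplify(g_ordered - g_shuffled))   # 0: the order of the flashes does not matter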

    Using Monte Carlo Methods

    In [26]:
    import pymc as pm
    
    WARNING (theano.configdefaults): g++ not detected ! Theano will be unable to execute optimized C-implementations (for both CPU and GPU) and will default to Python implementations. Performance will be severely degraded.
    
    The model is just the Cauchy measurement density derived earlier, now written with an explicit scale $\beta$,

    $$f(x \mid \alpha, \beta) = \frac{1}{\pi \beta \left[1 + \left(\frac{x-\alpha}{\beta}\right)^2\right]}$$

    so we hand it directly to pymc: a uniform prior on $\alpha$, a Cauchy likelihood for the observed $x_k$ (with $\beta=1$), and NUTS sampling for the posterior.
    In [27]:
    with pm.Model() as model:
        th = pm.Uniform('th',lower=-pi/2,upper=pi/2)   # nuisance angle; not used by the likelihood below
        alpha= pm.Uniform('a',lower=-3,upper=3)        # same prior bracket on alpha as before
        xk = pm.Cauchy('x',beta=1,alpha=alpha,observed=x_samples)   # Cauchy measurement model
        start = pm.find_MAP()
        step = pm.NUTS()
        trace = pm.sample(300, step, start)
    
    c:\Python27\lib\site-packages\theano-0.6.0-py2.7.egg\theano\gradient.py:512: UserWarning: grad method was asked to compute the gradient with respect to a variable that is not part of the computational graph of the cost, or is used only by a non-differentiable operator: th
      handle_disconnected(elem)
    c:\Python27\lib\site-packages\theano-0.6.0-py2.7.egg\theano\gradient.py:532: UserWarning: grad method was asked to compute the gradient with respect to a variable that is not part of the computational graph of the cost, or is used only by a non-differentiable operator: <DisconnectedType>
      handle_disconnected(rval[i])
    c:\Python27\lib\site-packages\theano-0.6.0-py2.7.egg\theano\tensor\subtensor.py:110: FutureWarning: comparison to `None` will result in an elementwise object comparison in the future.
      start in [None, 0] or
    
     [-----------------100%-----------------] 300 of 300 complete in 424.9 sec
    In [28]:
    pm.traceplot(trace);
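
    The trace can also be summarized numerically. A short sketch, assuming the trace object from the cell above and pymc3-style indexing by variable name:

    a_post = trace['a']                          # posterior samples of alpha
    print(a_post.mean())                         # posterior mean, should sit near 1
    print(np.percentile(a_post, [2.5, 97.5]))    # rough 95% credible interval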
    