# CS229 Homework Solutions

## CS229 Problem Set #4 Solutions

**(a)** Suppose $x$, $y$, and $z$ are all observed, so that we obtain a training set $\{(x^{(1)}, y^{(1)}, z^{(1)}), \ldots, (x^{(m)}, y^{(m)}, z^{(m)})\}$. Write the log-likelihood of the parameters, and derive the maximum likelihood estimates for $\phi$, $\theta_0$, and $\theta_1$. Note that because $p(z|x)$ is a logistic regression model, there will not exist a closed-form estimate of $\phi$. In this case, derive the gradient and the Hessian of the likelihood with respect to $\phi$; in practice, these quantities can be used to numerically compute the ML estimate.

**Answer:** The log-likelihood is given by

$$
\begin{aligned}
\ell(\phi, \theta_0, \theta_1) &= \log \prod_{i=1}^{m} p(y^{(i)} \mid x^{(i)}, z^{(i)}; \theta_0, \theta_1)\, p(z^{(i)} \mid x^{(i)}; \phi) \\
&= \sum_{i: z^{(i)} = 0} \log\left( (1 - g(\phi^T x^{(i)}))\, \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left( \frac{-(y^{(i)} - \theta_0^T x^{(i)})^2}{2\sigma^2} \right) \right) \\
&\quad + \sum_{i: z^{(i)} = 1} \log\left( g(\phi^T x^{(i)})\, \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left( \frac{-(y^{(i)} - \theta_1^T x^{(i)})^2}{2\sigma^2} \right) \right)
\end{aligned}
$$

Differentiating with respect to $\theta_0$ and setting it to 0,

$$
0 \overset{\text{set}}{=} \nabla_{\theta_0} \ell(\phi, \theta_0, \theta_1) = \nabla_{\theta_0} \sum_{i: z^{(i)} = 0} -(y^{(i)} - \theta_0^T x^{(i)})^2
$$

But this is just a least-squares problem on a subset of the data. In particular, if we let $X_0$ and $\vec{y}_0$ be the design matrix and target vector formed by considering only those examples with $z^{(i)} = 0$, then the usual normal equations give $\theta_0 = (X_0^T X_0)^{-1} X_0^T \vec{y}_0$, and $\theta_1$ is obtained in the same way from the examples with $z^{(i)} = 1$.
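Since $z$ is observed, each $\theta_j$ is fit by ordinary least squares on its own subset of the data. Below is a minimal NumPy sketch of that subset fit; the synthetic data-generating setup and all variable names are illustrative, not part of the assignment:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical synthetic data: m examples, n features, latent label z observed.
m, n = 200, 3
X = rng.normal(size=(m, n))
z = rng.integers(0, 2, size=m)
theta0_true, theta1_true = rng.normal(size=n), rng.normal(size=n)
y = np.where(z == 0, X @ theta0_true, X @ theta1_true) + 0.1 * rng.normal(size=m)

# Because z is observed, the MLE for theta_0 is ordinary least squares
# restricted to the examples with z = 0 (and similarly theta_1 for z = 1).
X0, y0 = X[z == 0], y[z == 0]
X1, y1 = X[z == 1], y[z == 1]
theta0_hat = np.linalg.solve(X0.T @ X0, X0.T @ y0)
theta1_hat = np.linalg.solve(X1.T @ X1, X1.T @ y1)
```

With enough examples in each subset, the two estimates recover the parameters that generated the corresponding half of the data.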

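For $\phi$ there is no closed form, but the $\phi$-dependent terms of the log-likelihood are exactly a standard logistic regression objective, so its gradient is $X^T(z - g(X\phi))$ and its Hessian is $-X^T \,\mathrm{diag}(g(1-g))\, X$, which Newton's method can use directly. A NumPy sketch under those standard formulas (the synthetic check and all names are illustrative):

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def fit_phi_newton(X, z, n_iter=15):
    """Maximize the logistic log-likelihood in phi by Newton's method.

    Gradient:  X^T (z - g(X phi))
    Hessian:  -X^T diag(g (1 - g)) X
    """
    m, n = X.shape
    phi = np.zeros(n)
    for _ in range(n_iter):
        g = sigmoid(X @ phi)
        grad = X.T @ (z - g)
        # Solve with the negative Hessian, which is positive definite.
        H = X.T @ (X * (g * (1 - g))[:, None])
        phi = phi + np.linalg.solve(H, grad)
    return phi

# Hypothetical synthetic check: recover a known phi from sampled labels.
rng = np.random.default_rng(1)
m, n = 5000, 3
X = rng.normal(size=(m, n))
phi_true = np.array([0.5, -1.0, 2.0])
z = (rng.uniform(size=m) < sigmoid(X @ phi_true)).astype(float)
phi_hat = fit_phi_newton(X, z)
```

Newton's method is attractive here because the objective is concave, so the iteration typically converges in a handful of steps.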
## CS229 Problem Set #1 Solutions

The $-\frac{\lambda}{2} \theta^T \theta$ here is what is known as a regularization term, which will be discussed in a future lecture, but which we include here because it is needed for Newton's method to perform well on this task. For the entirety of this problem you can use the value $\lambda = 0.0001$.

Using this definition, the gradient of $\ell(\theta)$ is given by

$$
\nabla_\theta \ell(\theta) = X^T z - \lambda\theta
$$

where $z \in \mathbb{R}^m$ is defined by

$$
z_i = w^{(i)} \left( y^{(i)} - h_\theta(x^{(i)}) \right)
$$

and the Hessian is given by

$$
H = X^T D X - \lambda I
$$

where $D \in \mathbb{R}^{m \times m}$ is a diagonal matrix with

$$
D_{ii} = -w^{(i)}\, h_\theta(x^{(i)}) \left( 1 - h_\theta(x^{(i)}) \right)
$$

For the sake of this problem you can just use the above formulas, but you should try to derive these results for yourself as well.

Given a query point $x$, we compute the weights

$$
w^{(i)} = \exp\left( -\frac{\|x - x^{(i)}\|^2}{2\tau^2} \right).
$$

Much like the locally weighted linear regression that was discussed in class, this weighting scheme gives more weight to the "nearby" points when predicting the class of a new example.

(a) Implement the Newton-Raphson algorithm for optimizing $\ell(\theta)$ for a new query point $x$, and use this to predict the class of $x$. The `q2/` directory contains data and code for this problem. You should implement the `y = lwlr(X_train, y_train, x, tau)` function in the `lwlr.m` file. This function takes as input the training set (the `X_train` and `y_train` matrices, in the form described in the class notes), a new query point `x`, and the weight bandwidth `tau`.
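The assignment asks for a MATLAB implementation in `lwlr.m`; as a sketch of the same algorithm, here is a NumPy version built directly from the gradient and Hessian formulas above (the iteration count and default $\lambda$ are illustrative choices):

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def lwlr(X_train, y_train, x, tau, lam=1e-4, n_iter=20):
    """Locally weighted logistic regression: predict the class of query x.

    Newton-Raphson on the regularized weighted log-likelihood, with
    gradient X^T z - lam*theta and Hessian X^T D X - lam*I.
    """
    m, n = X_train.shape
    # Weights fall off with squared distance from the query point.
    w = np.exp(-np.sum((X_train - x) ** 2, axis=1) / (2 * tau ** 2))
    theta = np.zeros(n)
    for _ in range(n_iter):
        h = sigmoid(X_train @ theta)
        zvec = w * (y_train - h)
        grad = X_train.T @ zvec - lam * theta
        D = -w * h * (1 - h)                       # diagonal of D
        H = X_train.T @ (X_train * D[:, None]) - lam * np.eye(n)
        theta = theta - np.linalg.solve(H, grad)   # Newton step
    return float(sigmoid(x @ theta) > 0.5)
```

Note that a fresh $\theta$ is fit for every query point, since the weights $w^{(i)}$ depend on $x$; this is what makes the model "locally weighted".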
