Steepest Descent is a well-known iterative minimization method. It
can be applied to any surface for which the gradient may be computed.
The method of Steepest Descent computes the gradient at its current
location, then travels in the opposite direction of the gradient
until it reaches a minimum in this direction. Elementary calculus
shows that at this minimum the new gradient is perpendicular to the
previous gradient. When applied to a quadratic form
with initial
guess
, Steepest Descent may be summarized as
Figure:
Steepest Descent worst-case path. The large dot is
, and the smaller
dots are the iterates
. Steepest Descent follows
until it finds a perpendicular gradient. Then it repeats the process with
the new gradient.
|
Figure:
CG worst-case path. The large dot is
, and the smaller
dots are the iterates
. Note that CG stops its first iteration
such that
, but it does not follow the
gradient for its second iteration.
|
Because we are only concerned with quadratic forms having a global
minimum and no local minima, Steepest Descent is guaranteed to
converge. However there is no guarantee that Steepest Descent will
converge quickly. For example, Figure
2.1 shows
Steepest Descent's path when started from a worst-case starting point

[
41].
The minimum point in Figure
2.1 is

, which by definition satisfies

.
Let

represent the error vector at
iteration

. We cannot compute

since we do not know

; this requires the inverse or pseudo-inverse of

.
However we can easily compute the image of

under

,
since
Define the residual vector

, and note that Equation
2.3 implies

. Since this negative gradient is the
direction Steepest Descent will travel next, we may characterize
Steepest Descent as greedily reducing the size of the residual.
Because Steepest Descent follows the gradient

until it reaches a point

for which

, the algorithm can only make square
turns. When applied to a two dimensional quadratic form with
symmetric positive semidefinite Hessian

, this property implies
that search directions must be repeated if more than two iterations
are necessary. The only way to guarantee two or fewer iterations is
to choose a starting point

for which

or

. This cannot be done
without computing

, which requires knowing the answer

.