Math Refresher: 2007

Saturday, December 08, 2007

Irreducible Polynomials

Definition 1: Irreducible Polynomials

A polynomial P ∈ F[X] is irreducible in F[X] if deg P > 0 and P is not divisible by any polynomial Q ∈ F[X] with 0 < deg Q < deg P.

NOTE: See Definition 1, here, for the definition of a polynomial and Definition 4, here, for the definition of the degree (deg) of a polynomial.

Lemma 1:

If a polynomial D divides an irreducible polynomial P,

Then, either D is a constant or deg D = deg P.

Proof:

(1) Let P be an irreducible polynomial

(2) Let D be a polynomial that divides P.

(3) Assume that D is not a constant and that deg D is less than deg P.

(4) Then D cannot divide P [by Definition 1 above]

(5) So we have a contradiction and we must reject our assumption at step #3. Since D divides P, we also have deg D ≤ deg P, so either D is a constant or deg D = deg P.

QED

Corollary 1.1: Every polynomial of degree 1 is irreducible

Proof:

(1) Assume that there exists a polynomial D of degree 1 that is not irreducible.

(2) Then there exists a polynomial P that divides D where deg P is less than deg D and deg P is greater than 0.

(3) But this is impossible since deg D = 1 and there is no integer strictly between 0 and 1.

(4) So we reject our assumption at step #1.

QED

Theorem 2: Euclid's Lemma for Polynomials

For polynomials S, T, U: if S divides TU and S is relatively prime to T, then S divides U.

Proof:

(1) Since GCD(S,T)=1, there exist polynomials V, W such that [See Corollary 3.1, here]:

SV + TW = 1

(2) Multiplying both sides by U gives us:

S(UV) + (TU)W = U

(3) S divides S(UV), and S divides (TU)W since S divides TU. So S divides their sum, which is U.

QED

Corollary 2.1: Generalized Euclid's Lemma for Polynomials

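If a polynomial S divides a product of r factors and is relatively prime to each of the first r-1 factors, then it divides the last one.

Proof:

(1) Let T be the first factor and let U be the product of the remaining r-1 factors.

(2) Then S divides TU and GCD(S,T)=1, so using Theorem 2 above, we can conclude that S divides U.

(3) U is a product of r-1 factors and S is relatively prime to each of its first r-2 factors, so we can repeat steps #1 and #2. After r-1 applications of Theorem 2, we conclude that S divides the last factor.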

QED

Corollary 2.2: If a polynomial is divisible by pairwise relatively prime polynomials, then it is divisible by their product

Proof:

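(1) Let P₁, ..., P_r be pairwise relatively prime polynomials which divide a polynomial P. We use induction on r.

(2) For r=1, the statement is just that P₁ divides P, which holds by assumption.

(3) Assume that the statement is true for r-1 polynomials.

(4) Then P₁*...*P_{r-1} divides P, so we have:

P = P₁*...*P_{r-1}*Q for some polynomial Q.

(5) P_r divides P = P₁*...*P_{r-1}*Q and is relatively prime to each of P₁, ..., P_{r-1}, so by Corollary 2.1 above, P_r must divide Q.

(6) Writing Q = P_rQ', we get P = P₁*...*P_r*Q', so P₁*...*P_r divides P.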

QED

Theorem 3: Every non-constant polynomial P ∈ F[X] is a finite product:

P = cP₁P₂...P_n

where c ∈ F^× (= F - {0}) and P₁, P₂, ..., P_n are monic irreducible polynomials. This factorization is unique up to the order of the factors, and the same irreducible polynomial may appear more than once.

Proof:

(1) If P is irreducible, then P = cP₁ where c is the leading coefficient of P and P₁ = c⁻¹P is irreducible and monic. [See Definition 1 above and Theorem 2, here]

(2) If P is reducible, then it can be written as a product of two polynomials, each of degree greater than 0 and less than deg P.

(3) If these two polynomials are irreducible, we stop. If not, we break each reducible factor into a product of two polynomials of smaller positive degree. Since the degrees decrease at every step, this process stops after finitely many steps, leaving P as a product of irreducible polynomials. Applying step #1 to each factor and collecting the constants into c gives the desired form.

(4) The next step is to prove that this factorization is unique up to the order of the factors.

(5) Assume that:

P = cP₁*...*P_n = dQ₁*...*Q_m

where c, d ∈ F^× and P₁, ..., P_n, Q₁, ..., Q_m are monic irreducible polynomials.

(6) c = d since both equal the leading coefficient of P (a product of monic polynomials is monic).

(7) So, we have:

P₁*...*P_n = Q₁*...*Q_m

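(8) P₁ divides Q₁*...*Q_m. If P₁ were relatively prime to each of Q₁, ..., Q_{m-1}, then by Corollary 2.1 above it would divide Q_m, so it could not be relatively prime to Q_m (since deg P₁ > 0). So P₁ cannot be relatively prime to all Q_i.

(9) So there exists a Q_i such that GCD(P₁,Q_i) ≠ 1. After renumbering the Q's, we may assume it is Q₁.

(10) Let D be the unique monic GCD of P₁,Q₁. [See Theorem 2, here]

(11) Since D is monic and D ≠ 1, deg D > 0. Since D divides the irreducible polynomial P₁, Lemma 1 above gives deg D = deg P₁, so P₁ = aD for some constant a ∈ F^×.

(12) Since P₁ and D are both monic, a = 1 and P₁ = D.

(13) Since Q₁ is monic and irreducible, the same argument as in steps #11 and #12 shows that Q₁ = D.

(14) Therefore, it follows that Q₁ = P₁.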

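(15) Since F[X] has no zero divisors, we can cancel P₁ = Q₁ from both sides of the equation in step #7 to get P₂*...*P_n = Q₂*...*Q_m. Repeating the argument (renumbering the remaining Q's each time), we get P₂ = Q₂, P₃ = Q₃, ..., P_n = Q_n.

(16) Since each P_i is matched with a different Q_i, it follows that n ≤ m.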

(17) But we can also make the same line of argument for Q_j so that we also know that m ≤ n

(18) Therefore n = m.

QED
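The existence part of the proof above (keep splitting reducible factors until only irreducible ones remain) can be carried out mechanically when the field is finite. The sketch below is only an illustration under assumptions that are not in the original: it works over GF(p), the integers modulo a prime p, instead of a general field F; a polynomial is stored as its coefficient list [c₀, c₁, ...] with the lowest degree first; and it uses plain trial division by every monic polynomial of degree 1, 2, and so on. All function names are hypothetical.

```python
from itertools import product


def trim(f):
    """Drop trailing zero coefficients; [] stands for the zero polynomial."""
    f = list(f)
    while f and f[-1] == 0:
        f.pop()
    return f


def divmod_mod_p(a, b, p):
    """Divide a by the monic polynomial b over GF(p); return (quotient, remainder)."""
    r = trim([c % p for c in a])
    q = [0] * max(len(r) - len(b) + 1, 0)
    while len(r) >= len(b):
        shift = len(r) - len(b)
        c = r[-1]  # b is monic, so this is the next quotient coefficient
        q[shift] = c
        for i, bc in enumerate(b):
            r[i + shift] = (r[i + shift] - c * bc) % p
        r = trim(r)
    return trim(q), r


def monic_polys(d, p):
    """Yield every monic polynomial of degree d over GF(p)."""
    for low in product(range(p), repeat=d):
        yield list(low) + [1]


def factor_mod_p(f, p):
    """Return (c, factors) with f = c * (product of monic irreducibles) over GF(p), p prime."""
    f = trim([x % p for x in f])
    if len(f) < 2:
        raise ValueError("expected a non-constant polynomial")
    c = f[-1]
    inv = pow(c, -1, p)  # inverse of c modulo p (Python 3.8+)
    f = [(x * inv) % p for x in f]  # make f monic
    factors, d = [], 1
    while len(f) - 1 >= 2 * d:  # a reducible f has a factor of degree <= deg f / 2
        for g in monic_polys(d, p):
            while True:
                q, r = divmod_mod_p(f, g, p)
                if r:
                    break
                factors.append(g)  # irreducible: all smaller-degree factors are already gone
                f = q
        d += 1
    if len(f) > 1:  # no factor of degree <= deg f / 2 is left, so f is irreducible
        factors.append(f)
    return c, factors
```

For example, factor_mod_p([1, 0, 0, 1], 2) returns (1, [[1, 1], [1, 1, 1]]), that is, X³ + 1 = (X + 1)(X² + X + 1) over GF(2). Trial division takes time exponential in the degree; it mirrors the proof rather than aiming for efficiency.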

References

Jean-Pierre Tignol, Galois' Theory of Algebraic Equations, World Scientific, 2001

Friday, December 07, 2007

Greatest Common Divisor for Polynomials

Definition 1: Divisor for Polynomials

Let P₁, P₂ ∈ F[X].

We say that P₂ divides P₁ if there exists Q ∈ F[X] such that P₁ = P₂Q

Definition 2: GCD for Polynomials

A greatest common divisor (GCD) of P₁, P₂ is a polynomial D ∈ F[X] which has the following properties:

(a) D divides P₁ and P₂
(b) If S is a polynomial which divides P₁ and P₂, then S divides D.

Definition 3: Relatively prime polynomials

Two polynomials P₁, P₂ are said to be relatively prime if the only polynomials that divide both of them are of degree 0 (nonzero constants).

Definition 4: degree: deg

The deg of a nonzero polynomial P is the greatest integer n for which the coefficient of Xⁿ in the expression of P is not zero.

Theorem 1: Euclid's Algorithm for Greatest Common Divisor for Polynomials

For any two polynomials P₁, P₂, there exists a GCD

Proof:

(1) Let P₁, P₂ be any two polynomials such that deg P₁ ≥ deg P₂

(2) If P₂ = 0, then P₁ is the GCD of P₁, P₂ [See Definition 2 above]

(3) Otherwise, we divide P₁ by P₂ using the Euclidean Division Algorithm for Polynomials. [See Theorem, here]

(4) Then there exist two polynomials Q₁, R₁ such that:

P₁ = Q₁P₂ + R₁

and deg R₁ is less than deg P₂.

(5) If R₁ = 0, then P₂ is the GCD of P₁, P₂

(6) Next, we divide P₂ by R₁ to get:

P₂ = Q₂R₁ + R₂

and deg R₂ is less than deg R₁

(7) If R₂ ≠ 0, then we can set up the following equations:

(a) R₁ = Q₃R₂ + R₃

(b) R₂ = Q₄R₃ + R₄

...

(c) R_{n-2} = Q_nR_{n-1} + R_n

(d) R_{n-1} = Q_{n+1}R_n + R_{n+1}

(8) As long as the remainders are nonzero, deg P₂ > deg R₁ > deg R₂ > ... > deg R_n > deg R_{n+1}, and the degree of a nonzero polynomial is a nonnegative integer, so this sequence cannot continue indefinitely.

(9) Therefore, R_{n+1} = 0 for some n; take the first such n, so that R_n ≠ 0.

(10) R_n divides P₁, P₂ since:

Because R_{n+1} = 0, equation 7d gives R_{n-1} = Q_{n+1}R_n, so R_n divides R_{n-1}.

Because R_n divides R_{n-1} (and itself), equation 7c shows that R_n divides R_{n-2}.

We can now proceed up each of these equations in the same way until we get to 7b, which shows that R_n divides R₂.

Since R_n divides R₃ and R₂, it is clear from 7a that R_n divides R₁.

It is clear from step #6 that R_n divides P₂, and from step #4 that R_n divides P₁.

(11) Assume that P₁ and P₂ are both divisible by a polynomial S.

(12) Then by step #4, S must divide R₁.

(13) By step #6, S must divide R₂

(14) We can now use the same argument to go through the equations in step #7 to conclude that S must likewise divide R_n.

(15) This proves that R_n is the GCD for P₁,P₂ [See Definition 2 above]

QED
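Euclid's algorithm from Theorem 1 translates almost line for line into code. The sketch below assumes F = Q, using exact rational arithmetic from Python's fractions module, and stores a polynomial as its coefficient list [c₀, c₁, ...] (lowest degree first); trim, divmod_poly and gcd_poly are hypothetical helper names. As a final step it divides the last nonzero remainder R_n by its leading coefficient, which gives the monic GCD.

```python
from fractions import Fraction


def trim(p):
    """Drop trailing zero coefficients; [] stands for the zero polynomial."""
    p = list(p)
    while p and p[-1] == 0:
        p.pop()
    return p


def divmod_poly(a, b):
    """Euclidean division over Q: return (q, r) with a = q*b + r and deg r < deg b."""
    a, b = trim(a), trim(b)
    if not b:
        raise ZeroDivisionError("division by the zero polynomial")
    q = [Fraction(0)] * max(len(a) - len(b) + 1, 0)
    r = [Fraction(c) for c in a]
    while len(r) >= len(b):
        shift = len(r) - len(b)
        factor = r[-1] / b[-1]  # cancels the current leading term of r
        q[shift] = factor
        for i, c in enumerate(b):
            r[i + shift] -= factor * c
        r = trim(r)
    return trim(q), r


def gcd_poly(p1, p2):
    """Euclid's algorithm as in Theorem 1, normalized to the monic GCD."""
    a, b = trim(p1), trim(p2)
    while b:  # replace (a, b) by (b, remainder of a divided by b)
        _, r = divmod_poly(a, b)
        a, b = b, r
    if not a:
        raise ValueError("GCD(0, 0) is not covered here")
    lead = Fraction(a[-1])
    return [Fraction(c) / lead for c in a]  # divide R_n by its leading coefficient
```

For example, gcd_poly([-1, 0, 1], [1, -2, 1]) returns the coefficients (as Fractions) of X - 1, the monic GCD of X² - 1 and X² - 2X + 1. The loop does not need deg P₁ ≥ deg P₂: if the order is reversed, the first division has quotient 0 and simply swaps the two polynomials.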

Definition 5: monic polynomial

A polynomial is monic if and only if its leading coefficient is 1.

Theorem 2:

Any two polynomials P₁, P₂ ∈ F[X] which are not both 0 have a unique monic greatest common divisor D₁, and a polynomial D ∈ F[X] is a greatest common divisor of P₁, P₂ if and only if D = cD₁ for some c ∈ F^× (= F - {0}).

Proof:

(1) Let P₁, P₂ ∈ F[X] be polynomials, not both 0, labeled so that deg P₁ ≥ deg P₂ (or P₂ = 0).

(2) Using Theorem 1 above, we know that there exists R_n such that R_n is the GCD of P₁, P₂

(3) Dividing R_n by its leading coefficient gives us a monic GCD of P₁,P₂

(4) Assume that D, D' are GCD's of P₁, P₂

(5) Then D divides D' and D' divides D. [See Definition 2 above]

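(6) Then there exist polynomials Q,Q' ∈ F[X] such that:

D' = DQ
D = D'Q'

(7) Substituting the first equation into the second gives:

D = DQQ'

so D(1 - QQ') = 0. Since D ≠ 0 (a GCD of two polynomials that are not both 0 cannot be 0) and F[X] has no zero divisors, QQ' = 1.

(8) Since QQ' = 1, deg Q + deg Q' = deg 1 = 0, so Q and Q' must both be nonzero constants which are inverses of each other.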

(9) So, if D and D' are monic, then Q = Q' = 1 and D = D' [Otherwise, if Q ≠ 1, then D' = DQ would have leading coefficient Q ≠ 1, so D' would not be monic]

(10) Suppose D is any GCD of P₁, P₂

(11) Let D' be the unique monic GCD of P₁, P₂

(12) Let a = the leading coefficient of D.

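(13) (1/a)D is a GCD of P₁, P₂ (multiplying by a nonzero constant does not affect divisibility), and it is monic, so (1/a)D = D' by the uniqueness shown in step #9. Hence D = aD'. Conversely, for any c ∈ F^×, cD' divides P₁, P₂ and is divisible by every common divisor of P₁, P₂, so cD' is a GCD.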

QED

Theorem 3:

If D is a GCD of P₁, P₂, then there exist polynomials U₁, U₂ such that:

D = P₁U₁ + P₂U₂ where U₁, U₂ ∈ F[X]

Proof #1:

(1) Using step #7c from Theorem 1 above, we have:

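R_{n-2} = Q_nR_{n-1} + R_n

which gives us:

R_n = R_{n-2} - Q_nR_{n-1}

and likewise (from the equation one step earlier):

R_{n-1} = R_{n-3} - Q_{n-1}R_{n-2}

(2) We can then substitute this expression for R_{n-1} to get:

R_n = R_{n-2} - Q_n(R_{n-3} - Q_{n-1}R_{n-2}) =

= R_{n-2} - Q_nR_{n-3} + Q_nQ_{n-1}R_{n-2} =

= -Q_nR_{n-3} + (1 + Q_nQ_{n-1})R_{n-2}

(3) We've now put R_n in terms of R_{n-3} and R_{n-2}.

(4) In this way, we can keep replacing the remainder with the largest index by its expression in terms of the two remainders before it, until only R₁ and R₂ remain. Finally, R₂ = P₂ - Q₂R₁ and R₁ = P₁ - Q₁P₂ (steps #6 and #4 of Theorem 1) express everything in terms of P₁ and P₂.

(5) Eventually we get an expression for R_n such that:

R_n = P₁U₁ + P₂U₂

(6) By Theorem 2 above, any GCD D satisfies D = cR_n for some c ∈ F^×, so D = P₁(cU₁) + P₂(cU₂).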

QED

Proof #2:

(1) From step #4 of Theorem 1 above, we have:

P₁ = Q₁P₂ + R₁

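(2) Using Matrix Theory (see here for review of 2 x 2 matrices), this gives us:

$$\begin{pmatrix} P_1 \\ P_2 \end{pmatrix} = \begin{pmatrix} Q_1 & 1 \\ 1 & 0 \end{pmatrix} \begin{pmatrix} P_2 \\ R_1 \end{pmatrix}$$

(3) From step #6 of Theorem 1 above we have:

P₂ = Q₂R₁ + R₂

which gives us:

$$\begin{pmatrix} P_2 \\ R_1 \end{pmatrix} = \begin{pmatrix} Q_2 & 1 \\ 1 & 0 \end{pmatrix} \begin{pmatrix} R_1 \\ R_2 \end{pmatrix}$$

(4) We can continue this approach until step #7d:

R_{n-1} = Q_{n+1}R_n + R_{n+1}

which gives us:

$$\begin{pmatrix} R_{n-1} \\ R_n \end{pmatrix} = \begin{pmatrix} Q_{n+1} & 1 \\ 1 & 0 \end{pmatrix} \begin{pmatrix} R_n \\ R_{n+1} \end{pmatrix}$$

(5) Combining the matrix equations gives us:

$$\begin{pmatrix} P_1 \\ P_2 \end{pmatrix} = \begin{pmatrix} Q_1 & 1 \\ 1 & 0 \end{pmatrix} \begin{pmatrix} Q_2 & 1 \\ 1 & 0 \end{pmatrix} \cdots \begin{pmatrix} Q_{n+1} & 1 \\ 1 & 0 \end{pmatrix} \begin{pmatrix} R_n \\ R_{n+1} \end{pmatrix}$$

(6) Now, since we know that R_{n+1} = 0 (step #9 of Theorem 1), the vector on the right is:

$$\begin{pmatrix} R_n \\ 0 \end{pmatrix}$$

(7) We see that each:

$$\begin{pmatrix} Q_k & 1 \\ 1 & 0 \end{pmatrix}$$

is invertible (see here for review of invertibility of matrices), since its determinant is Q_k*0 - 1*1 = -1, a nonzero constant. Its inverse, which again has polynomial entries, is:

$$\begin{pmatrix} 0 & 1 \\ 1 & -Q_k \end{pmatrix}$$

(8) We can then rearrange the equation in step #5 to be:

$$\begin{pmatrix} R_n \\ 0 \end{pmatrix} = \begin{pmatrix} 0 & 1 \\ 1 & -Q_{n+1} \end{pmatrix} \cdots \begin{pmatrix} 0 & 1 \\ 1 & -Q_2 \end{pmatrix} \begin{pmatrix} 0 & 1 \\ 1 & -Q_1 \end{pmatrix} \begin{pmatrix} P_1 \\ P_2 \end{pmatrix}$$

(9) Now, we can define U₁, U₂, U₃, U₄ such that:

$$\begin{pmatrix} U_1 & U_2 \\ U_3 & U_4 \end{pmatrix} = \begin{pmatrix} 0 & 1 \\ 1 & -Q_{n+1} \end{pmatrix} \cdots \begin{pmatrix} 0 & 1 \\ 1 & -Q_1 \end{pmatrix}$$

(10) It now follows that:

$$\begin{pmatrix} R_n \\ 0 \end{pmatrix} = \begin{pmatrix} U_1 & U_2 \\ U_3 & U_4 \end{pmatrix} \begin{pmatrix} P_1 \\ P_2 \end{pmatrix}$$

(11) This then gives us that: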

U₁*P₁ + U₂*P₂ = R_n

and

U₃*P₁ + U₄*P₂ = 0

QED
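The matrix bookkeeping of Proof #2 is what the usual "extended" Euclidean algorithm does. In the sketch below, which assumes F = Q (exact arithmetic via fractions.Fraction), coefficient lists with the lowest degree first, and hypothetical helper names, the pairs (s0, t0) and (s1, t1) are the rows of the running product of the inverse matrices [[0, 1], [1, -Q_k]]. When the loop ends they are the rows (U₁, U₂) and (U₃, U₄) of the matrix defined in step #9; the first row is then rescaled so that the GCD it produces is monic.

```python
from fractions import Fraction


def trim(p):
    """Drop trailing zero coefficients; [] stands for the zero polynomial."""
    p = list(p)
    while p and p[-1] == 0:
        p.pop()
    return p


def add(p, q):
    n = max(len(p), len(q))
    return trim([(p[k] if k < len(p) else 0) + (q[k] if k < len(q) else 0) for k in range(n)])


def mul(p, q):
    if not p or not q:
        return []
    out = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return trim(out)


def divmod_poly(a, b):
    """Euclidean division over Q: return (q, r) with a = q*b + r and deg r < deg b."""
    a, b = trim(a), trim(b)
    if not b:
        raise ZeroDivisionError("division by the zero polynomial")
    q = [Fraction(0)] * max(len(a) - len(b) + 1, 0)
    r = [Fraction(c) for c in a]
    while len(r) >= len(b):
        shift = len(r) - len(b)
        factor = r[-1] / b[-1]
        q[shift] = factor
        for i, c in enumerate(b):
            r[i + shift] -= factor * c
        r = trim(r)
    return trim(q), r


def ext_gcd_poly(p1, p2):
    """Return (D, U1, U2, U3, U4): U1*P1 + U2*P2 = D (monic GCD) and U3*P1 + U4*P2 = 0."""
    # Invariant: r0 = s0*P1 + t0*P2 and r1 = s1*P1 + t1*P2.
    r0, s0, t0 = trim(p1), [Fraction(1)], []
    r1, s1, t1 = trim(p2), [], [Fraction(1)]
    while r1:
        q, r = divmod_poly(r0, r1)
        # Left-multiply [[s0, t0], [s1, t1]] by the inverse matrix [[0, 1], [1, -q]].
        r0, r1 = r1, r
        s0, s1 = s1, add(s0, [-c for c in mul(q, s1)])
        t0, t1 = t1, add(t0, [-c for c in mul(q, t1)])
    if not r0:
        raise ValueError("GCD(0, 0) is not covered here")
    k = 1 / Fraction(r0[-1])  # rescale so that the GCD is monic
    return [c * k for c in r0], [c * k for c in s0], [c * k for c in t0], s1, t1
```

For P₁ = X² + 1 and P₂ = X + 1 this yields D = 1 with U₁ = 1/2 and U₂ = (1 - X)/2, and indeed (1/2)(X² + 1) + ((1 - X)/2)(X + 1) = 1.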

Corollary 3.1:

If P₁, P₂ are relatively prime polynomials in F[X], then there exist polynomials U₁, U₂ such that:

P₁U₁ + P₂U₂ = 1

Proof:

The only common divisors of relatively prime polynomials are nonzero constants, and each of these divides 1, so 1 is a GCD of P₁, P₂. The result now follows directly from Theorem 3 above.

QED

References

Jean-Pierre Tignol, Galois' Theory of Algebraic Equations, World Scientific, 2001

Tuesday, December 04, 2007

Polynomials defined

A polynomial expression is a mathematical construct that is often used but rarely defined. The most common form is a polynomial in one variable, which can be represented as:
a₀xⁿ + a₁x^{n-1} + ... + a_{n-1}x + a_n

In today's blog, I will present a more formal definition and then show the very fundamental result that the polynomials with coefficients in a ring themselves form a ring. If you are not familiar with rings, then please start here.

The content in today's blog is taken from Jean-Pierre Tignol's excellent Galois' Theory of Algebraic Equations.

Definition 1: Polynomial in one indeterminate with coefficients in ring A

A polynomial in one indeterminate with coefficients in a ring A is a map P: N → A, written n ↦ P_n, such that the set { n ∈ N : P_n ≠ 0 } is finite. The set of all polynomials in one indeterminate with coefficients in A is denoted A[X].

Example 1:

Let us consider the polynomial:

a₀ + a₁X + ... + a_nXⁿ

In this case, each a_i is a coefficient in ring A.

This polynomial corresponds to the map P: N → A given by:

P_i = a_i if i ≤ n

P_i = 0 if i is greater than n

so only finitely many P_i are nonzero.

Definition 2: Addition of Polynomials

(P + Q)_n = P_n + Q_n

In other words, coefficients with the same index are added together to form the coefficients of P + Q.

Definition 3: Multiplication of Polynomials

(PQ)_n = ∑ (i+j=n) P_i*Q_j

In other words, the coefficient of index n in PQ is the sum of all products P_i*Q_j whose indices add up to n.

Lemma 1: If A is a ring, then A[X] is a ring which is commutative if and only if A is commutative

Proof:

(1) A[X] is a ring since:

(a) A[X] has a commutative operation of addition

(P + Q)_i = P_i + Q_i for all i.

(Q + P)_i = Q_i + P_i for all i.

Since addition in the ring A is commutative, P_i + Q_i = Q_i + P_i, so P + Q = Q + P.

(b) A[X] has an associative rule for addition

((P + Q) + R)_i = (P_i + Q_i) + R_i for all i

(P + (Q + R))_i = P_i + (Q_i + R_i) for all i

(P_i + Q_i) + R_i = P_i + (Q_i + R_i) since A is a ring, so (P + Q) + R = P + (Q + R).

(c) A[X] has an additive identity

Let 0 denote the polynomial with 0_i = 0 for all i.

Then for any polynomial P, (P + 0)_i = P_i + 0 = P_i for all i, so P + 0 = P.

(d) A[X] has an additive inverse

For any polynomial P:

Let Q be the polynomial derived from P such that for all i: Q_i = -P_i

Then

(P + Q)_i = P_i + (-P_i) = 0 for all i, so P + Q = 0.

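(e) A[X] has an associative rule for multiplication

(PQ)_m = ∑ (i+j=m) P_i*Q_j for all m

((PQ)R)_n = ∑ (m+k=n) (PQ)_m*R_k = ∑ (i+j+k=n) (P_i*Q_j)R_k for all n =

= ∑ (i+j+k=n) P_i*(Q_jR_k) for all n [since multiplication in A is associative]

(QR)_m = ∑ (j+k=m) Q_j*R_k for all m

(P(QR))_n = ∑ (i+m=n) P_i*(QR)_m = ∑ (i+j+k=n) P_i*(Q_jR_k) for all n

So (PQ)R = P(QR).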

(f) A[X] has distributive rules

(Q + R)_j = Q_j + R_j for all j

(P(Q+R))_n = ∑ (i+j=n) P_i*(Q_j + R_j) =

= ∑ (i+j=n) (P_iQ_j + P_iR_j)

(PQ)_n = ∑ (i+j=n) P_i*Q_j for all n

(PR)_n = ∑ (i+j=n) P_i*R_j for all n.

(PQ + PR)_n = ∑ (i+j=n) P_iQ_j + ∑ (i+j=n) P_iR_j =

= ∑ (i+j=n) (P_iQ_j + P_iR_j) for all n, so P(Q+R) = PQ + PR.

Likewise, ((Q+R)P)_n = ∑ (i+j=n) (Q_j + R_j)*P_i =

= ∑ (i+j=n) (Q_jP_i + R_jP_i) = (QP + RP)_n for all n, so (Q+R)P = QP + RP.

(2) If A is a commutative ring, then A[X] is a commutative ring.

(a) Assume that A is a commutative ring

(b) (PQ)_n = ∑ (i+j=n) P_i*Q_j = ∑ (i+j=n) Q_j*P_i = (QP)_n for all n, since multiplication in A is commutative. So PQ = QP.

(3) If A[X] is a commutative ring, then A is a commutative ring

(a) Assume that A[X] is a commutative ring, and let a, b ∈ A.

(b) Let P, Q be the constant polynomials with P_0 = a, Q_0 = b, and P_i = Q_i = 0 for all i > 0.

(c) Then (PQ)_0 = P_0Q_0 = ab and (QP)_0 = Q_0P_0 = ba.

(d) Since PQ = QP, it follows that ab = ba, so A is commutative.

QED
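Definitions 2 and 3 can be tried out directly on coefficient lists. The sketch below is only a spot check of some of the ring laws in Lemma 1 on a few examples, not a proof. It assumes integer coefficients (so A = Z, a commutative ring) and stores P as the list [P₀, P₁, ...]; the helper names are hypothetical, and the non-commutative case is not exercised.

```python
from itertools import zip_longest


def trim(p):
    """Drop trailing zero coefficients so that equal polynomials have equal lists."""
    p = list(p)
    while p and p[-1] == 0:
        p.pop()
    return p


def poly_add(p, q):
    """Definition 2: (P + Q)_n = P_n + Q_n."""
    return trim([a + b for a, b in zip_longest(p, q, fillvalue=0)])


def poly_mul(p, q):
    """Definition 3: (PQ)_n is the sum of P_i*Q_j over all i + j = n."""
    if not p or not q:
        return []
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return trim(out)


P, Q, R = [1, 2], [0, -1, 3], [4, 0, 0, 1]
print(poly_mul(P, poly_add(Q, R)) == poly_add(poly_mul(P, Q), poly_mul(P, R)))  # distributivity
print(poly_mul(poly_mul(P, Q), R) == poly_mul(P, poly_mul(Q, R)))  # associativity
print(poly_mul(P, Q) == poly_mul(Q, P))  # commutativity, since Z is commutative
```

For instance, poly_mul([1, 1], [1, -1]) returns [1, 0, -1], i.e. (1 + X)(1 - X) = 1 - X².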

References

Jean-Pierre Tignol, Galois' Theory of Algebraic Equations, World Scientific, 2001

Breakdown of rational fractions as sums of partial fractions

Lemma 1: Rational fraction as sum of relatively prime polynomials

If P,Q,S₁,S₂ are polynomials such that:

Q = S₁S₂

S₁,S₂ are relatively prime polynomials

Then, there exist polynomials P₀, P₁, P₂ such that:

P/Q = P₀ + P₁/S₁ + P₂/S₂

and deg P_i is less than deg S_i for both i=1 and i=2

Proof:

(1) Since GCD(S₁,S₂) = 1, there exist polynomials T₁, T₂ such that [See Corollary 3.1, here]:

1 = S₁T₁ + S₂T₂

(2) Multiply each side by P/Q to get:

P/Q = (PS₁T₁)/Q + (PS₂T₂)/Q

(3) Since Q=S₁S₂, we now have:

P/Q = (PS₁T₁)/(S₁S₂) + (PS₂T₂)/(S₁S₂) = (PT₁)/S₂ + (PT₂)/S₁

(4) By the Euclidean Division Algorithm for Polynomials (see Theorem, here), there exist U₁, U₂, R₁, R₂ such that:

PT₁ = S₂U₂ + R₂

where deg R₂ is less than deg S₂

PT₂ = S₁U₁ + R₁

where deg R₁ is less than deg S₁

(5) Replacing PT₁ and PT₂ in step #3 gives us:

P/Q = (S₂U₂ + R₂)/S₂ + (S₁U₁ + R₁)/S₁ =

= (S₂U₂)/S₂ + R₂/S₂ + (S₁U₁)/S₁ + R₁/S₁ =

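= (U₂ + U₁) + R₁/S₁ + R₂/S₂

(6) So we can take P₀ = U₁ + U₂, P₁ = R₁ and P₂ = R₂; by step #4, deg P₁ is less than deg S₁ and deg P₂ is less than deg S₂.
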
QED
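The proof of Lemma 1 is constructive, so it can be followed step by step. This sketch assumes rational coefficients (fractions.Fraction) and coefficient lists with the lowest degree first, and it finds T₁, T₂ of step #1 with the extended Euclidean algorithm. The helper names are hypothetical.

```python
from fractions import Fraction


def trim(p):
    """Drop trailing zero coefficients; [] stands for the zero polynomial."""
    p = list(p)
    while p and p[-1] == 0:
        p.pop()
    return p


def add(p, q):
    n = max(len(p), len(q))
    return trim([(p[k] if k < len(p) else 0) + (q[k] if k < len(q) else 0) for k in range(n)])


def mul(p, q):
    if not p or not q:
        return []
    out = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return trim(out)


def divmod_poly(a, b):
    """Euclidean division over Q: return (q, r) with a = q*b + r and deg r < deg b."""
    a, b = trim(a), trim(b)
    if not b:
        raise ZeroDivisionError("division by the zero polynomial")
    q = [Fraction(0)] * max(len(a) - len(b) + 1, 0)
    r = [Fraction(c) for c in a]
    while len(r) >= len(b):
        shift = len(r) - len(b)
        factor = r[-1] / b[-1]
        q[shift] = factor
        for i, c in enumerate(b):
            r[i + shift] -= factor * c
        r = trim(r)
    return trim(q), r


def bezout(s1, s2):
    """Return (T1, T2) with S1*T1 + S2*T2 = 1, assuming GCD(S1, S2) = 1."""
    r0, a0, b0 = trim(s1), [Fraction(1)], []
    r1, a1, b1 = trim(s2), [], [Fraction(1)]
    while r1:  # invariant: r0 = a0*S1 + b0*S2 and r1 = a1*S1 + b1*S2
        q, r = divmod_poly(r0, r1)
        r0, r1 = r1, r
        a0, a1 = a1, add(a0, [-c for c in mul(q, a1)])
        b0, b1 = b1, add(b0, [-c for c in mul(q, b1)])
    if len(r0) != 1:
        raise ValueError("S1 and S2 are not relatively prime")
    k = 1 / Fraction(r0[0])  # the last nonzero remainder is a nonzero constant
    return [c * k for c in a0], [c * k for c in b0]


def split_fraction(p, s1, s2):
    """Lemma 1: return (P0, P1, P2) with P/(S1*S2) = P0 + P1/S1 + P2/S2."""
    t1, t2 = bezout(s1, s2)               # step #1: S1*T1 + S2*T2 = 1
    u2, r2 = divmod_poly(mul(p, t1), s2)  # step #4: P*T1 = S2*U2 + R2
    u1, r1 = divmod_poly(mul(p, t2), s1)  # step #4: P*T2 = S1*U1 + R1
    return add(u1, u2), r1, r2            # step #6: P0 = U1 + U2, P1 = R1, P2 = R2
```

For Q = X(X - 1), split_fraction([1], [0, 1], [-1, 1]) gives P₀ = 0, P₁ = -1 and P₂ = 1, that is, 1/(X(X - 1)) = -1/X + 1/(X - 1).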

Lemma 2:

For polynomials Q,P:

If Q is irreducible and deg P is less than deg Q^m, then:

P/Q^m = P₁/Q + P₂/Q² + ... + P_m/Q^m

where deg P_i is less than deg Q for i = 1, ..., m

Proof:

(1) Using Euclidean Division for Polynomials (see Theorem, here), there exists P₁,R₁ such that:

P = P₁Q^{m-1} + R₁ where deg R₁ is less than deg Q^{m-1}

(2) deg P₁ is less than deg Q since:

(a) Assume deg P₁ ≥ deg Q

(b) Then deg(P₁Q^{m-1}) = deg P₁ + deg Q^{m-1} ≥ deg Q + deg Q^{m-1} = deg Q^m

(c) But this is impossible since P₁Q^{m-1} = P - R₁, and both deg P and deg R₁ are less than deg Q^m, so deg(P₁Q^{m-1}) is less than deg Q^m.

(d) So we reject our assumption at step #2a.

(3) We can now repeat this same step: dividing R₁ by Q^{m-2} gives R₁ = P₂Q^{m-2} + R₂ with deg R₂ less than deg Q^{m-2} and, by the same argument, deg P₂ less than deg Q; and so on.

(4) Eventually (with P_m = R_{m-1}, whose degree is less than deg Q), we get to:

P = P₁Q^{m-1} + P₂Q^{m-2} + ... + P_{m-1}Q + P_m

(5) If we now divide both sides by Q^m, we get:

P/Q^m = P₁/Q + P₂/Q² + ... + P_{m-1}/Q^{m-1} + P_m/Q^m

QED
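Lemma 2 amounts to writing P "in base Q". The sketch below assumes rational coefficients (fractions.Fraction), coefficient lists with the lowest degree first, and hypothetical function names. It takes a slightly different route from the proof: it divides repeatedly by Q itself, collecting P_m first and P₁ last, instead of dividing by Q^{m-1} first. Both routes give the same P_i, because an expansion P = P₁Q^{m-1} + ... + P_m with every deg P_i < deg Q is unique.

```python
from fractions import Fraction


def trim(p):
    """Drop trailing zero coefficients; [] stands for the zero polynomial."""
    p = list(p)
    while p and p[-1] == 0:
        p.pop()
    return p


def divmod_poly(a, b):
    """Euclidean division over Q: return (q, r) with a = q*b + r and deg r < deg b."""
    a, b = trim(a), trim(b)
    if not b:
        raise ZeroDivisionError("division by the zero polynomial")
    q = [Fraction(0)] * max(len(a) - len(b) + 1, 0)
    r = [Fraction(c) for c in a]
    while len(r) >= len(b):
        shift = len(r) - len(b)
        factor = r[-1] / b[-1]
        q[shift] = factor
        for i, c in enumerate(b):
            r[i + shift] -= factor * c
        r = trim(r)
    return trim(q), r


def expand_in_powers(p, q, m):
    """Return [P_1, ..., P_m] with P = P_1*Q^(m-1) + ... + P_(m-1)*Q + P_m, deg P_i < deg Q."""
    pieces = []
    rest = trim(p)
    for _ in range(m):
        rest, digit = divmod_poly(rest, q)  # rest becomes the quotient, digit the remainder
        pieces.append(digit)                # collects P_m first, then P_(m-1), ...
    if rest:
        raise ValueError("expected deg P < deg Q^m")
    return pieces[::-1]
```

For example, with Q = X + 1 and m = 2, expand_in_powers([0, 1], [1, 1], 2) gives [[1], [-1]], so X/(X + 1)² = 1/(X + 1) - 1/(X + 1)².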

References

Jean-Pierre Tignol, Galois' Theory of Algebraic Equations, World Scientific, 2001

Tuesday, November 27, 2007

Derivative of tan x

Theorem: d(tan x)/dx = sec²x

Proof:

(1) d(sin x)/dx = cos x [See Theorem 1, here]

(2) d(cos x)/dx = -sin x [See Theorem 2, here]

(3) Now, tan x = (sin x)/(cos x)

(4) Let f(x) = sin x, let g(x) = cos x

(5) f(x) and g(x) are differentiable for all values of x. If we further assume that -π/2 < x < π/2, then g(x) = cos x ≠ 0. [See here for review of cos, sin if needed] (The same computation works at any x where cos x ≠ 0.)

(6) From step #5, we can now apply the Quotient Rule (see Lemma 6, here) to get:

d(tan x)/dx = [f'(x)g(x) - f(x)g'(x)]/[g(x)]²= [cos x*cos x - sin x*(- sin x)]/[cos x]² = [cos² x + sin² x]/(cos² x)

(7) Now, using sin²x + cos²x = 1 [See Corollary 2, here], we have:

d(tan x)/dx = 1/(cos² x)

(8) Since sec x = 1/cos x, we now have:

d(tan x)/dx = sec² x

QED
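A quick numerical sanity check of the theorem (not a proof): a symmetric difference quotient of tan should agree closely with sec²x at points between -π/2 and π/2. The step size h = 1e-6 and the sample points are arbitrary choices made for this sketch.

```python
import math


def central_difference(f, x, h=1e-6):
    """Symmetric difference quotient, an O(h^2) approximation of f'(x)."""
    return (f(x + h) - f(x - h)) / (2 * h)


def sec_squared(x):
    return 1 / math.cos(x) ** 2


for x in (-1.2, -0.5, 0.0, 0.7, 1.3):
    approx = central_difference(math.tan, x)
    print(f"x = {x:5.2f}   numeric = {approx:.8f}   sec^2 x = {sec_squared(x):.8f}")
```

Near ±π/2 both sides grow without bound, so the agreement there is only relative.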

Monday, November 26, 2007

Inverse Tangent

Definition 1: Inverse Tangent

y = tan^-1 x if and only if tan y = x and -π/2 < y < π/2.

Lemma 1: sec²x = 1 + tan²x (whenever cos x ≠ 0)

Proof:

(1) By definition sec x = 1/cos x

(2) So, sec²x = 1/(cos² x)

(3) sin²x + cos²x = 1 [See Corollary 2, here]

(4) 1/(cos² x) = (cos²x + sin²x)/(cos²x) =

= cos²x/(cos²x) + (sin²x)/(cos²x) = 1 + tan²x.

QED

Theorem: D tan^-1 x = 1/(1 + x²)

Proof:

(1) Let y = tan^-1 x

(2) Then tan y = x [See Definition 1 above]

(3) d(tan y)/dx = d(x)/dx = 1

(4) d(tan y)/dx = (sec²y)dy/dx = 1 [See Theorem 1, here]

(5) dy/dx = 1/sec²y

(6) Using Lemma 1 above, we have:

dy/dx =1/(1 + tan²y)

(7) Using step #2, this gives us:

dy/dx = 1/(1 + tan²y) = 1/(1 + x²) [Since x = tan y]

QED
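As with the derivative of tan, a numerical spot check (not a proof) can make the result concrete: the difference quotient of arctan should match 1/(1 + x²), and sec²y should match 1 + tan²y as in Lemma 1. The step size and sample points are arbitrary choices for this sketch.

```python
import math


def central_difference(f, x, h=1e-6):
    """Symmetric difference quotient, an O(h^2) approximation of f'(x)."""
    return (f(x + h) - f(x - h)) / (2 * h)


for x in (-3.0, -1.0, 0.0, 0.5, 2.0):
    numeric = central_difference(math.atan, x)
    exact = 1 / (1 + x * x)  # the formula from the theorem
    print(f"x = {x:5.2f}   numeric = {numeric:.8f}   1/(1 + x^2) = {exact:.8f}")

for y in (-1.4, -0.3, 0.0, 0.9):  # Lemma 1 at a few points of (-pi/2, pi/2)
    print(f"y = {y:5.2f}   sec^2 y = {1 / math.cos(y) ** 2:.10f}   1 + tan^2 y = {1 + math.tan(y) ** 2:.10f}")
```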

References

Edwards & Penney, Calculus and Analytic Geometry

Sunday, October 14, 2007

A Shorter Proof of Euler's Formula

Euler's Formula is the very famous equation:

e^{ix} = cos x + i sin x

In a previous blog, I showed how it can be derived using the Taylor Series. In today's blog, I will show how it can be derived in an even simpler way using concepts from calculus.

Theorem: Euler's Formula

e^{ix} = cos x + i sin x

Proof:

(1) For some number x:

let y = cos x + i sin x

(2) Taking the first derivative of both sides gives us:

dy/dx = -sin x + i cos x

[For details if needed:

(a) dy/dx = d(cos x + i sin x)/dx

(b) d(cos x + i sin x)/dx = d(cos x)/dx + d(i sin x)/dx (see Lemma 3, here)

(c) d(cos x)/dx = -sin x (see Theorem 2, here)

(d) d(i sin x)/dx = i cos x (see Theorem 1, here) ]

(3) Since i² = -1 by definition, we have

-sin x + i cos x = i(i sin x + cos x)

(4) Combining step #1, #2, and #3, we get:

dy/dx = iy

[Since:

(a) y = i sin x + cos x [from step #1 above]

(b) dy/dx = -sin x + i cos x [from step #2 above]

(c) dy/dx = i(i sin x + cos x) [from step #3 above]

(d) dy/dx = i(y)

]

(5) Since |y|² = cos²x + sin²x = 1, y ≠ 0, so we can multiply each side by dx/y to get:

(1/y)dy = (i)dx

(6) Now, if we take the integral of each side we get:

ln y = ix + C

[For details if needed:

(a) d(ln x)/dx = 1/x [See Lemma 1, here for proof]

(b) So using the Fundamental Theorem of Calculus (see Theorem 2, here), we know that:

∫ (1/x)dx = ln x + C

(c) So, ∫ (1/y)dy = ln y + C

(d) ∫ i(dx) = ix + C [Since d(ix + C)/dx = i, see Lemma 2, here]

(e) So putting this together gives us ln y = ix + C for a single constant C.

(f) At x = 0, y = cos 0 + i sin 0 = 1, so ln y = 0 and C = 0. This gives us:

ln y = ix

]

(7) Now raising e to the power of each side, we get:

y = e^{ix} [Since e^{ln y} = y]

(8) Now combining step #1 with step #7 we get:

e^{ix} = cos x + i sin x

QED
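The key step of the proof is that y = cos x + i sin x satisfies dy/dx = iy with y(0) = 1. As a numerical illustration (not part of the proof), the sketch below integrates that differential equation with the classical fourth-order Runge-Kutta method, a technique the post itself does not use, and compares the result with cos x + i sin x and with Python's cmath.exp(1j * x). The step count of 1000 is an arbitrary choice.

```python
import cmath
import math


def rk4(f, y0, x_end, steps):
    """Classical fourth-order Runge-Kutta for y' = f(x, y) on [0, x_end]."""
    h = x_end / steps
    x, y = 0.0, y0
    for _ in range(steps):
        k1 = f(x, y)
        k2 = f(x + h / 2, y + h / 2 * k1)
        k3 = f(x + h / 2, y + h / 2 * k2)
        k4 = f(x + h, y + h * k3)
        y += h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
        x += h
    return y


def rhs(x, y):
    return 1j * y  # the differential equation dy/dx = iy from step #4


for x_end in (0.5, 1.0, math.pi):
    y = rk4(rhs, 1 + 0j, x_end, 1000)  # y(0) = cos 0 + i sin 0 = 1
    target = complex(math.cos(x_end), math.sin(x_end))
    print(x_end, abs(y - target), abs(cmath.exp(1j * x_end) - target))
```

Both printed differences should be tiny, consistent with e^{ix} and cos x + i sin x solving the same initial value problem.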

References

"Proof of Euler's Formula", August 15, 2007, Those Who Can Teach Blog.
"Proof of Euler's Formula (II)", August 15, 2007, Those Who Can Teach Blog.

Monday, September 17, 2007

Cramer's Rule

In today's blog, I go over the classic result known as Cramer's Rule.

Theorem: Cramer's Rule

Let an n x n matrix A represent a system of linear equations such that AX = B.

Then, if det(A) ≠ 0, the system AX = B has exactly one solution X, given by x_i = det(A_{(C_i ↔ B)})/det(A), where A_{(C_i ↔ B)} is the matrix obtained from A by replacing its column i with B.

Proof:

(1) Assume that Det(A) ≠ 0

(2) Then A is invertible [see Theorem 4, here]

(3) Now, A⁻¹B is the unique solution to AX = B [See Lemma 2, here]

(4) From Corollary 4.1, here, we have:

X = A⁻¹B = (1/det(A))(adj A)B

(5) Therefore, we have:

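x_i = (1/det(A))*∑(j=1,n) ent_i,j(adj A)*b_j

[See Definition 1 here for definition of matrix multiplication if needed]

= (1/det(A))*∑(j=1,n) b_j*cof_j,i(A) [See Definition 1 here for definition of adj A]

(6) The sum ∑(j=1,n) b_j*cof_j,i(A) is the cofactor expansion of det(A_{(C_i ↔ B)}) along column i, since replacing column i of A with B does not change the cofactors of the entries in column i. Using Corollary 4.1 here, this gives us:

(1/det(A))*∑(j=1,n) b_j*cof_j,i(A) = (1/det(A))*det(A_{(C_i ↔ B)}).

(7) Putting it all together gives us:

x_i = det(A_{(C_i ↔ B)})/det(A).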

QED
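Cramer's Rule can be coded directly from the statement. The sketch below uses exact rational arithmetic (fractions.Fraction) and computes determinants by cofactor expansion along the first row, which takes exponential time and is only reasonable for small n. The function names are hypothetical.

```python
from fractions import Fraction


def det(m):
    """Determinant by cofactor expansion along the first row (fine for small n)."""
    n = len(m)
    if n == 1:
        return Fraction(m[0][0])
    total = Fraction(0)
    for j in range(n):
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * m[0][j] * det(minor)
    return total


def replace_column(a, i, b):
    """A_(C_i <-> B): a copy of A with column i replaced by the vector B."""
    return [row[:i] + [b[r]] + row[i + 1:] for r, row in enumerate(a)]


def cramer(a, b):
    """Solve AX = B by x_i = det(A_(C_i <-> B)) / det(A)."""
    d = det(a)
    if d == 0:
        raise ValueError("det(A) = 0: Cramer's Rule does not apply")
    return [det(replace_column(a, i, b)) / d for i in range(len(a))]
```

For the system 2x₁ + x₂ = 3, x₁ + 3x₂ = 5, cramer([[2, 1], [1, 3]], [3, 5]) gives x₁ = 4/5 and x₂ = 7/5.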

References

Charles G. Cullen, Matrices and Linear Transformations, Dover Publications, Inc., 1972.
"Cramer's Rule", PlanetMath.org

Sunday, September 16, 2007

Homogeneous System of Linear Equations

In today's blog, I talk about homogeneous systems of linear equations. In essence, this extends a previous blog on systems of linear equations and their representation through matrices.

If you are not familiar with how matrices can represent systems of linear equations, start here. If you are not familiar with reduced echelon form, start here. If you are not familiar with the idea that each matrix is uniquely correlated with a specific matrix in reduced echelon form, start here.

Let Ax = b be a system of linear equations. This system is called homogeneous if and only if b=0.

In other words:

Definition 1: Homogeneous System of Linear Equations

Let Ax = b be a system of linear equations. This system of equations is called a homogeneous system of linear equations if and only if b = 0.

The important idea behind homogeneous systems of linear equations is that they always have at least one solution which is called the trivial solution.

For all matrices M, it is clear that if X is a vector of 0's, then MX = 0 where 0 is a vector of 0's. In other words, we are stating M0 = 0.

Definition 2: Trivial Solution of a Homogeneous System of Linear Equations

If MX=0 is a homogeneous system of linear equations, then it is clear that 0 is a solution. Since 0 is a solution to all homogeneous systems of linear equations, this solution is known as the trivial solution.

A nontrivial solution of a homogeneous system of linear equations is any solution to MX=0 where X ≠ 0.

The main conclusion that I will establish in today's blog is that a homogeneous system of linear equations AX = 0, where A is an n x n matrix, has a nontrivial solution if and only if det(A) = 0.

To establish this point, let's consider some lemmas:

Lemma 1:

Let A be an n x n matrix that represents a homogeneous system of linear equations.

If A has an all-zero column, then there exists a nontrivial solution.

Proof:

(1) Assume that column j of A is an all-zero column, so that for all i, a_i,j = 0.

(2) Let X be the n x 1 column vector such that for all i ≠ j, x_i,1 = 0 and x_j,1 = 1 [Note: x_j,1 can equal any nonzero value and the argument will still hold]

(3) Let B = AX

(4) For any row i, Row_i(B) = a_i,1*0 + ... + a_i,j*x_j,1 + ... + a_i,n*0 = 0*x_j,1 = 0

(5) So, the X defined above is a nontrivial solution.

QED

Definition 3: Free Entry

For any nonzero row, a free entry is any nonzero entry that follows the leading entry; that is, the free entry and the leading entry are in the same row, but the free entry is in a column that comes after the leading entry's column.

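Example 1: Leading and Free Entries

Let A be a matrix of the form:

$$A = \begin{pmatrix} 1 & 0 & a \\ 0 & 1 & b \\ 0 & 0 & 0 \end{pmatrix}$$

where a and b are nonzero.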

A is in reduced echelon form. In this case, row 1 has a leading entry at column 1 and a free entry at column 3. Row 2 has a leading entry at column 2 and a free entry at column 3. Row 3 does not have a leading entry or a free entry.

Lemma 2:

Let A be an n x n matrix in reduced echelon form that represents a homogeneous system of linear equations.

If A has a free entry, then there exists a nontrivial solution.

Proof:

(1) We can assume that A does not have any zero columns. If it did, then it would have a nontrivial solution by Lemma 1 above.

(2) By the definition of reduced echelon form (see Definition 2, here), each column either contains a leading entry (one row has a 1 at this column and all other rows have a 0) or contains no leading entry. In the second case, since the column is not all zero, its nonzero entries are free entries.

(3) We can now build a vector X in the following way:

(a) If column j contains no leading entry (so it contains at least one free entry), then let x_j = 1.

(b) If column j contains the leading entry of row t, then let x_j = -(a_t,j+1 + ... + a_t,n)

NOTE: By the definition of reduced echelon form, if any row of a column is a leading entry, then all other rows of that column are 0.

(4) Let B = AX.

(5) Then, for any nonzero row i in B:

(a) Let li(i) be the column which is the leading index for row i.

(b) Row_i(B) = a_i,1*x₁ + ... + a_i,li(i)*[-(a_i,li(i)+1 + ... + a_i,n)] + ... + a_i,n*x_n

(c) Now, for row i, we know that all columns before li(i) are zero and the entry at li(i)=1, so we have:

Row_i(B) = 0*x₁ + ... + a_i,li(i)*[-(a_i,li(i)+1 + ... + a_i,n)] + ... + a_i,n*x_n =

= 1*[-(a_i,li(i)+1 + ... + a_i,n)] + ... + a_i,n*x_n

(d) For any column j after li(i) that contains the leading entry of another row, a_i,j = 0 (from the definition of reduced echelon form), and for every other column j after li(i), x_j = 1. So the terms for all columns after li(i) add up to:

a_i,li(i)+1 + ... + a_i,n

[Note: This is because for each of these columns, either a_i,j = 0 or x_j = 1]

(e) So, we get that:

Row_i(B) = -(a_i,li(i)+1 + ... + a_i,n) + (a_i,li(i)+1 + ... + a_i,n) = 0

(6) X is not the trivial solution: by assumption A has a free entry, and if j is the column containing it, then x_j = 1, so X ≠ 0.

(7) Therefore, it follows that A has a nontrivial solution.

QED
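The construction in step #3 of Lemma 2 can be run on an actual matrix. The sketch below assumes rational entries (fractions.Fraction), brings the matrix to reduced echelon form with Gauss-Jordan elimination (rref is a hypothetical helper, not part of the post), and then builds X as in steps #3a and #3b. It gives every column without a leading entry the value 1, including any all-zero column, which is also harmless by the argument of Lemma 1.

```python
from fractions import Fraction


def rref(rows):
    """Reduced echelon form by Gauss-Jordan elimination over the rationals."""
    a = [[Fraction(x) for x in row] for row in rows]
    n_rows, n_cols = len(a), len(a[0])
    top = 0  # index of the next row that should receive a leading entry
    for col in range(n_cols):
        pivot = next((r for r in range(top, n_rows) if a[r][col] != 0), None)
        if pivot is None:
            continue  # no leading entry in this column
        a[top], a[pivot] = a[pivot], a[top]
        lead = a[top][col]
        a[top] = [x / lead for x in a[top]]  # make the leading entry 1
        for r in range(n_rows):
            if r != top and a[r][col] != 0:  # clear the rest of the column
                factor = a[r][col]
                a[r] = [x - factor * y for x, y in zip(a[r], a[top])]
        top += 1
        if top == n_rows:
            break
    return a


def lemma2_vector(r):
    """Build X as in step #3 of Lemma 2 from a matrix r in reduced echelon form."""
    n = len(r[0])
    x = [Fraction(1)] * n  # step #3a: columns without a leading entry get 1
    for row in r:
        lead_col = next((j for j, v in enumerate(row) if v != 0), None)
        if lead_col is not None:  # step #3b: the leading column of this row
            x[lead_col] = -sum(row[lead_col + 1:], Fraction(0))
    return x
```

For A = [[1, 2, 3], [4, 5, 6], [7, 8, 9]], the reduced echelon form is [[1, 0, -1], [0, 1, 2], [0, 0, 0]] and the vector built this way is X = (1, -2, 1), which satisfies AX = 0. If every column has a leading entry (A_r = I_n), the same construction returns the zero vector, in line with Lemma 4.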

Lemma 3:

If an n x n matrix A is in reduced echelon form and A has n nonzero rows, then A = I_n.

Proof:

(1) If A has n nonzero rows, then each row has a leading entry, and these n leading entries lie in n different columns.

(2) So all n columns contain a leading entry. Since every other entry in a column containing a leading entry is 0, the only nonzero entry in each row is its leading entry, which is 1.

(3) But we also know that the leading entry of the first row must be in a column to the left of the leading entry of the second row, and so on.

(4) The only way that this can occur is if the leading entry for row 1 is in column 1, the leading entry for row 2 is in column 2, and so on, since this is the only way to place n leading entries in increasing column order among n columns.

(5) Thus, A must equal I_n (see Definition 1, here for definition of the Identity Matrix).

QED

Lemma 4:

Let A be an n x n matrix in reduced echelon form that represents a homogeneous system of linear equations.

If A has n nonzero rows, then there is only one solution: the trivial solution.

Proof:

(1) If A has n nonzero rows and A is in reduced echelon form, then by Lemma 3 above, A = I_n.

(2) But then we have:

I_nX = 0

(3) By the definition of the I_n (See Definition 1, here), we know that I_nX = X

(4) Thus we have:

0 = I_nX = X

QED

Lemma 5:

If an n x n matrix A is in reduced echelon form and A has a zero row, then A has a nontrivial solution.

Proof:

(1) If A has a zero row, then at most n-1 columns have leading entries, so at least one column does not have a leading entry, since:

(a) Assume that all columns have a leading entry.

(b) Then there are necessarily n leading entries

(c) But then since each leading entry must be on its own row, there must be n nonzero rows.

(d) But this is impossible since there is at least one zero row so we have a contradiction and we reject our assumption in (a).

(2) But if a column does not have a leading entry, then it is necessarily an all-zero column or it contains free entries.

(3) If it is an all-zero column, then A has a nontrivial solution by Lemma 1 above. If it contains free entries, then A has a nontrivial solution by Lemma 2 above. Either way, A has a nontrivial solution.

QED

Theorem 6:

An n x n matrix that represents a homogeneous system of linear equations has a nontrivial solution if and only if its determinant = 0

Proof:

(1) Assume that the system AX = 0 has a nontrivial solution.

(2) Assume that det(A) ≠ 0

(3) By Cramer's Rule (see Theorem, here), if det(A) ≠ 0, then the system has a unique solution. But the trivial solution is always a solution, so it must be this unique solution, and there is no nontrivial solution.

(4) Therefore we have a contradiction and we reject our assumption at step #2 and conclude that determinant = 0.

(5) Assume that det(A) = 0

(6) Then it follows that A is not invertible. [See Theorem 4, here]

(7) Since every matrix has a reduced echelon form (see Theorem 1, here), let A_r be the reduced echelon form for A.

(8) Since A is not invertible, A_r cannot be invertible since:

(a) Assume that A_r is invertible.

(b) Since A is row equivalent to A_r, there exists a matrix P that is a product of elementary matrices such that A = PA_r [See Theorem 5, here]

(c) Since each of the elementary matrices are invertible [see Lemma 4, here] and P is the product of elementary matrices, P is invertible. [See Corollary 3.1, here]

(d) Since P is invertible and A_r is invertible by assumption, it follows that A must be invertible [See Corollary 3.1, here]

(e) But A is not invertible so we have a contradiction and we reject our assumption in step #8a.

(9) Since A_r is not invertible, it is not equal to I_n (since I_n is invertible: it is its own inverse), and A_r must have a zero row since:

(a) Assume A_r did not have a zero row.

(b) Then A_r = I_n [See Lemma 3, above]

(c) But A_r ≠ I_n so we have a contradiction and we reject our assumption at step #9a.

(10) But if A_r has a zero row, then A_r has a nontrivial solution. [See Lemma 5 above]

(11) And if A_r has a nontrivial solution, then A has a nontrivial solution since:

(a) By the properties of row equivalence, A_r = EA for some matrix E that is a product of elementary matrices [See Theorem 5, here]

(b) Let X be a nontrivial solution for A_r such that:

A_rX = 0

(c) Then applying step #11a, we have:

A_rX = EAX = 0

(d) Now E is invertible (for the same reason as P in step #8c above), so we can multiply both sides by E⁻¹ to get:

E⁻¹EAX = AX = E⁻¹0 = 0

(e) So, if X is a nontrivial solution for A_r, it is also necessarily a nontrivial solution for A.

QED
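Theorem 6 and Corollary 6.1 can be spot-checked on a few examples (this illustrates the statements; it proves nothing). The sketch below computes det(A) independently by cofactor expansion and decides whether AX = 0 has a nontrivial solution by counting leading entries after Gaussian elimination: by Lemmas 4 and 5, a nontrivial solution exists exactly when some column has no leading entry. The example matrices and function names are hypothetical.

```python
from fractions import Fraction


def det(m):
    """Determinant by cofactor expansion along the first row (fine for small n)."""
    n = len(m)
    if n == 1:
        return Fraction(m[0][0])
    total = Fraction(0)
    for j in range(n):
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * m[0][j] * det(minor)
    return total


def leading_entries(rows):
    """Number of leading entries after Gaussian elimination over Q."""
    a = [[Fraction(x) for x in row] for row in rows]
    n_rows, n_cols = len(a), len(a[0])
    top = 0
    for col in range(n_cols):
        pivot = next((r for r in range(top, n_rows) if a[r][col] != 0), None)
        if pivot is None:
            continue  # this column gets no leading entry
        a[top], a[pivot] = a[pivot], a[top]
        for r in range(top + 1, n_rows):
            factor = a[r][col] / a[top][col]
            a[r] = [x - factor * y for x, y in zip(a[r], a[top])]
        top += 1
        if top == n_rows:
            break
    return top


def has_nontrivial_solution(rows):
    """True when AX = 0 has a solution X != 0, i.e. some column has no leading entry."""
    return leading_entries(rows) < len(rows[0])


examples = [
    [[1, 2, 3], [4, 5, 6], [7, 8, 9]],  # singular
    [[2, 1], [1, 3]],                   # invertible
    [[0, 0], [5, 1]],                   # has a zero row
    [[1, 0, 0], [0, 0, 1], [0, 1, 0]],  # a permutation matrix
]
for m in examples:
    print(det(m), has_nontrivial_solution(m))
```

In each case det(A) = 0 exactly when a nontrivial solution is reported, as Theorem 6 predicts.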

Corollary 6.1:

An n x n matrix that represents a homogeneous system of linear equations has only the trivial solution if and only if its determinant ≠ 0

Proof:

(1) Assume that det(A)≠0

(2) Then A is invertible. [See Theorem 4, here]

(3) Multiplying both sides of AX = 0 by A⁻¹, we have:

A⁻¹AX = A⁻¹0

(4) Now, A⁻¹AX = I_nX = X [See Definition 1, here]

(5) Likewise A⁻¹0 = 0

(6) So, X = 0

(7) Assume that the only solution is the trivial solution.

(8) Assume that Det(A) = 0

(9) But by Theorem 6 above, AX=0 must have a nontrivial solution.

(10) But this is a contradiction so we reject our assumption in step #8.

(11) Therefore Det(A) ≠ 0

QED
