Thursday, May 03, 2007

Dimension of a Vector Space

The content in today's blog is taken from Hans Schneider and George Phillip Barker's Matrices and Linear Algebra.

Definition 1: Finite-dimensional space

A vector space V is called finite-dimensional if and only if there is a finite family of vectors spanning V.

[For background: see Definition 2 here for vector space; see Definition 4 here for family of vectors; see Definition 3 here for spanning a vector space.]

Theorem 1:

Let V be a nonzero finite-dimensional vector space. Then:
(1) There is a finite basis for V.
(2) All bases for V have the same number of elements.

Proof:

(1) Let V be a nonzero finite-dimensional vector space.

[see Definition 2 here for vector space; see Definition 1 above for finite-dimensional]

(2) Then, there exists a finite family (x1, ..., xn) that spans V.

[See Definition 1 above; see Definition 4 here for family of vectors; see Definition 3 here for spanning a vector space.]

(3) Then there is a finite basis for V.

[See Corollary 1.1, here; see Definition 2 here for basis]

(4) Let X = the family of vectors (x1, ..., xm)

(5) Let Y = the family of vectors (y1, ..., yn)

(6) Let X,Y be two bases for V.

(7) X is linearly independent [see Definition 2, here]

(8) Y spans V. [see Definition 2, here]

(9) Using Theorem 2 here, we can conclude that m ≤ n.

(10) Since Y is linearly independent and X spans V, we can also conclude that n ≤ m.

(11) Hence, we have shown that m = n.

(12) This proves that all bases for V have the same number of elements.

QED

From Theorem 1, since all bases have the same number of elements, it makes sense to define a notation for this number of elements.

Definition 2: Dimension

Let V ≠ {0} be a finite-dimensional vector space and let X be a basis for V. Then |X| = n is called the dimension of V. We write dim V = n. If V = {0}, we define dim V = 0.
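
To make Definition 2 concrete, here is a small illustration of my own (not from Schneider and Barker), assuming Python's sympy library: the dimension of the subspace spanned by a family of vectors is the rank of the matrix whose rows are those vectors.

```python
# Dimension of a spanned subspace via matrix rank (sympy, exact arithmetic).
from sympy import Matrix

# Three vectors in R^3; the third is the sum of the first two,
# so the family spans a 2-dimensional subspace.
X = Matrix([[1, 0, 1],
            [0, 1, 1],
            [1, 1, 2]])

# The rank of the matrix whose rows are the vectors equals the
# dimension of the subspace they span.
print(X.rank())  # 2
```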

Lemma 2:

Let V be a nonzero finite-dimensional vector space.

If the family of vectors (x1, ..., xm) is linearly independent, then there exists a family of vectors (y1, ..., yn) such that:

(x1, ..., xm, y1, ..., yn) is a basis.

Proof:

(1) Since V is a nonzero finite-dimensional space, there exists a family of vectors Y = (y1, ..., yp) such that Y is a basis for V. [see Theorem 1 above]

(2) Since X = (x1, ..., xm) is linearly independent, we can apply Theorem 1 here to show that there exists a family of vectors (x1, ..., xm, y1, ..., yn) which is a basis.

QED
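
Lemma 2 is constructive in spirit: a linearly independent family can be completed to a basis by adjoining, one at a time, vectors that enlarge the span. Here is a sketch for R^n assuming sympy; the helper name complete_to_basis is mine.

```python
# Completing a linearly independent family to a basis of R^n (Lemma 2 sketch).
from sympy import Matrix, eye

def complete_to_basis(rows, n):
    """Greedily adjoin standard basis vectors that raise the rank."""
    family = [list(r) for r in rows]
    for e in eye(n).tolist():                     # candidates e1, ..., en
        if Matrix(family + [e]).rank() > Matrix(family).rank():
            family.append(e)
    return family

print(complete_to_basis([[1, 1, 0]], 3))
# [[1, 1, 0], [1, 0, 0], [0, 1, 0]] -- a basis of R^3
```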

Lemma 3:

Let n = dim V where n > 0.

(1) Then every linearly independent family in V has at most n members.
(2) Every family spanning V has at least n members.

Proof:

(1) Let the family of vectors (x1, ..., xt) be a linearly independent family of vectors.

(2) By Lemma 2 above, it can be completed to a basis (x1, ..., xt, y1, ..., yr).

(3) By Theorem 1 above, t + r = n, since the completed family is a basis with t + r elements and all bases for V have the same number of elements, namely n.

(4) Hence, t ≤ n.

(5) Now let (y1, ..., yu) be a family spanning V, that is, [[ y1, ..., yu ]] = V.

(6) By a previous result (see Corollary 1.1, here), we can reduce (y1, ..., yu) to a basis (y1, ..., ys)

(7) So that s ≤ u, since (y1, ..., ys) is a subfamily of (y1, ..., yu).

(8) Using Theorem 1 above, s = n which means that n ≤ u.

QED
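
As a quick numeric check of both bounds in Lemma 3 for V = R^3 (so n = 3), with sample vectors of my own choosing:

```python
# Spot-checking Lemma 3 in R^3 with sympy.
from sympy import Matrix

# (1) Any 4 vectors in R^3 are linearly dependent: rank < number of rows.
A = Matrix([[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 2, 3]])
print(A.rank() < A.rows)   # True: at most 3 independent members

# (2) A family spanning R^3 needs at least 3 members: 2 rows give rank <= 2.
B = Matrix([[1, 2, 3], [4, 5, 6]])
print(B.rank() < 3)        # True: 2 vectors cannot span R^3
```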

Lemma 4:

Let W be a subspace of a finite-dimensional vector space V.

Then dim W ≤ dim V

Proof:

(1) Let n = dim V

(2) If W = {0}, then dim W = 0 ≤ n and we are done. So assume that W is a nonzero subspace of V. [See Definition 3 here for definition of subspace]

(3) Since W is nonzero, we know that there exists x such that x ≠ 0 and x ∈ W.

(4) Let (x1, x2, ..., xr) be a linearly independent family of elements of W. [See Definition 1 here for linearly independent]

(5) Then, we know that r ≤ n by Lemma 3 above.

(6) Let m be the largest integer for which there is a linearly independent family (x1, ..., xm) with each xi ∈ W.

(7) Using Lemma 3 above, we can see that m ≤ n.

(8) I will now show that (x1, ..., xm) is a basis for W.

(9) Let y be any element of W (such an element exists since W is nonzero).

(10) Let Y = the family of vectors (x1, x2, ..., xm, y).

(11) By the choice of m, this family is linearly dependent, since m is the largest size of a linearly independent family in W and Y has m + 1 members.

(12) Using Lemma 4 here, we know that some element of Y is a linear combination of the preceding elements.

(13) But no xj is a linear combination of the xi with i < j, since (x1, x2, ..., xm) is linearly independent by step #6. [See Corollary 3.1, here]

(14) Hence, y is a linear combination of (x1, ..., xm).

(15) Thus, (x1, ..., xm) spans W, since y was an arbitrary element of W. [See step #9 above]

(16) Since (x1, ..., xm) is linearly independent and spans W this shows that it is a basis. [See Definition 2 here for definition of basis]

QED
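
The proof of Lemma 4 is essentially a greedy algorithm: grow a linearly independent family inside W until no element of W enlarges it, at which point the family is a basis of W. A sketch of that procedure, assuming sympy, with W presented by a spanning family of my own choosing:

```python
# Extracting a basis of a subspace W by greedy selection (Lemma 4 sketch).
from sympy import Matrix

def greedy_basis(spanning_rows):
    """Pick a maximal linearly independent subfamily, as in the proof."""
    basis = []
    for v in spanning_rows:
        if Matrix(basis + [v]).rank() > len(basis):
            basis.append(v)
    return basis

W = [[1, 0, 1], [2, 0, 2], [0, 1, 1], [1, 1, 2]]   # spans a plane in R^3
print(greedy_basis(W))     # [[1, 0, 1], [0, 1, 1]], so dim W = 2 <= 3 = dim V
```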

Vector spaces

Today's blog is taken straight from Matrices and Linear Algebra by Hans Schneider and George Phillip Barker.

Definition 1: Vector

A vector is any matrix of the form 1 x n or m x 1. [See Definition 1, here for review of matrices if needed]

Definition 2: Vector Space

A vector space V is a set of vectors such that the following 10 properties are met:

(1) V is nonempty.

(2) V is closed on addition:

If x,y ∈ V, then x + y ∈ V.

(3) V is closed on scalar multiplication:

For any number α, if x ∈ V, then αx ∈ V.

(4) Addition is associative:

For x,y,z ∈ V, (x + y) + z = x + (y + z)

(5) Existence of 0:

There exists 0 ∈ V such that for all x ∈ V, x + 0 = x = 0 + x.

(6) Existence of negative:

For each vector x ∈ V, there is a vector y ∈ V such that:

x + y = 0 = y + x

(7) Addition is commutative:

For all vectors x,y ∈ V, x + y = y + x.

(8) Scalar multiplication is distributive:

For all numbers α and for all x,y ∈ V:
α(x + y) = αx + αy

For all numbers α, β and for all x ∈ V:
(α + β)x = αx + βx

(9) Scalar multiplication is associative:

For all numbers α, β and for all x ∈ V:
(αβ)x = α(βx)

(10) Existence of 1:

For all x ∈ V, 1x = x.

This definition is very important. For example, it is used to establish Cramer's Rule regarding matrices.
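
As a sanity check, here is a minimal sketch of mine that spot-checks several of the ten properties for vectors in R^3, represented as tuples of floats. It only samples a few vectors, so it is evidence rather than proof:

```python
# Spot-checking vector space axioms for R^3 (tuples of floats).
def add(x, y):             # vector addition, componentwise
    return tuple(a + b for a, b in zip(x, y))

def smul(a, x):            # scalar multiplication
    return tuple(a * c for c in x)

x, y, z = (1.0, 2.0, 3.0), (4.0, 5.0, 6.0), (7.0, 8.0, 9.0)
zero = (0.0, 0.0, 0.0)

assert add(add(x, y), z) == add(x, add(y, z))              # (4) associativity
assert add(x, zero) == x == add(zero, x)                   # (5) existence of 0
assert add(x, smul(-1, x)) == zero                         # (6) negatives
assert smul(2, add(x, y)) == add(smul(2, x), smul(2, y))   # (8) distributivity
print("sampled axioms hold")
```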

Definition 3: subspace

A vector space W is a subspace of a vector space V if and only if W is a nonempty subset of V.

We can use the following to determine if a nonempty subset of a vector space is a subspace:

Theorem 1: Criteria for a subspace

Let V be a vector space and let W be a nonempty subset of V. Then W is a subspace of V if and only if W is closed under the addition and under scalar multiplication.

Proof:

(1) Assume that W is a subspace.

(2) Then by Definition 3 above, W is a vector space.

(3) Then, W is closed under addition and under scalar multiplication. [See Definition 2, #2 and #3].

(4) Assume that W is closed under addition and under scalar multiplication.

(5) Then W is a vector space by Definition 2 above since:

(a) W is nonempty [from the given]

(b) W is closed under addition. [from the assumption in step #4]

(c) W is closed under scalar multiplication. [from the assumption in step #4]

(d) Addition in W is associative since:

Assume x,y,z ∈ W

Then x,y,z ∈ V [since W ⊆ V from the given]

Then (x + y) + z = x + (y + z) [since V is vector space, from definition 2, #4]

(e) Existence of 0 since:

Assume that x ∈ W

Then 0x ∈ W, since W is closed under scalar multiplication and 0 is a scalar.

In V we have 0x = (0 + 0)x = 0x + 0x, and adding the negative of 0x to both sides gives 0x = 0. Hence the zero vector lies in W, and x + 0 = x = 0 + x.

(f) Existence of negative since:

Assume that x ∈ W

Then (-1)*x ∈ W since W is closed under scalar multiplication and (-1) is a number.

And x + (-1)*x = (1 + (-1))*x = 0*x = 0, so (-1)*x is the required negative y (and likewise (-1)*x + x = 0).

(g) Addition is commutative since:

Assume that x,y ∈ W

Then x,y ∈ V [since W ⊆ V from the given]

Then x + y = y + x [since V is a vector space, from definition 2, #7]

(h) W is distributive since:

Assume that x,y ∈ W and that α, β are numbers.

Then x,y ∈ V [since W ⊆ V from the given]

Then α(x + y) = αx + αy and (α + β)x = αx + βx [since V is a vector space, from Definition 2, #8]

(i) Scalar multiplication on W is associative since:

Assume x ∈ W

Then x ∈ V [since W ⊆ V from the given]

Then for any numbers α, β, (αβ)x = α(βx) [since V is a vector space, from Definition 2, #9]

(j) Existence of 1 since:

Assume x ∈ W

Then x ∈ V [since W ⊆ V from the given]

Then 1*x = x [from Definition 2, #10]

QED
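
Theorem 1 reduces subspace-checking to the two closure conditions. Here is a small sketch (my own example) that samples those closure tests for W = {(a, b, a + b)} in R^3; again, sampling is illustration, not proof:

```python
# Sampling the two closure tests of Theorem 1 for W = {(a, b, a + b)} in R^3.
def in_W(v):
    return v[2] == v[0] + v[1]

samples = [(1, 2, 3), (0, 1, 1), (-1, 4, 3)]
for x in samples:
    for y in samples:
        s = tuple(a + b for a, b in zip(x, y))
        assert in_W(s)                            # closed under addition
    for a in (0, -2, 5):
        assert in_W(tuple(a * c for c in x))      # closed under scalar mult.
print("closure holds on these samples")
```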

Definition 4: family

A family is an ordered sequence (v1, v2, v3, ..., vi) of vectors from a given vector space. Both order and repetition matter: for example, if u ≠ v, then (u,u,v), (u,v,u), (v,u,u), and (u,v) are all distinct families.

And here is the last definition:

Definition 5: linear combination

Let (x1, ..., xt) be a family of vectors in a vector space V. We call u in V a linear combination of (x1, ..., xt) if and only if there exist scalars α1, ..., αt such that:

u = α1x1 + ... + αtxt = ∑ (i=1,t) αixi.

Theorem 2: Let V be a vector space and let x1, ..., xt be in V. Then the set of all linear combinations of (x1, ..., xt) is a subspace of V.

Proof:

(1) Let W be the set of all linear combinations of (x1, ..., xt). [See Definition 5 above]

(2) For all x ∈ W, it is clear that x ∈ V. [from Definition 5 above]

(3) Thus, W ⊆ V

(4) W is nonempty since, if we take each αi = 1,

then x1 + ... + xt ∈ W [since W includes all possible linear combinations of (x1, ..., xt).]

(5) W is closed on addition since:

(a) Assume u,v ∈ W

(b) Then, there exist α1, ..., αt such that:

u = α1x1 + ... + αtxt = ∑ (i=1,t) αixi. [since W is the set of all linear combinations of (x1, ..., xt).]

(c) Then, there exist β1, ..., βt such that:

v = β1x1 + ... + βtxt = ∑ (i=1,t) βixi. [since W is the set of all linear combinations of (x1, ..., xt).]

(d) And then (u+v) ∈ W since:

(u + v) = (α1 + β1)x1 + ... + (αt + βt)xt = ∑ (i=1,t) (αi + βi)xi. [since W is the set of all linear combinations of (x1, ..., xt).]

(6) Likewise, W is closed under scalar multiplication since:

(a) Assume that u ∈ W.

(b) Then, there exist α1, ..., αt such that:

u = α1x1 + ... + αtxt = ∑ (i=1,t) αixi. [since W is the set of all linear combinations of (x1, ..., xt).]

(c) But then for any scalar γ, γu ∈ W since:

γu = (γα1)x1 + ... + (γαt)xt = ∑ (i=1,t) (γαi)xi. [since W is the set of all linear combinations of (x1, ..., xt).]

(7) We can now use Theorem 1 above to establish that W is a subspace of V since:

(a) V is a vector space [given]

(b) W is a nonempty subset of V. [see step #3 and step #4]

(c) W is closed under addition [see step #5] and W is closed under scalar multiplication [see step #6]

QED
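
Since the linear combinations of (x1, ..., xt) form a subspace W, membership in W can be tested by a rank comparison: u is a linear combination exactly when appending u to the family does not raise the rank. A sketch assuming sympy; the function name in_span is mine.

```python
# Testing membership in the span of a family via rank (sympy).
from sympy import Matrix

def in_span(u, family):
    return Matrix(family + [u]).rank() == Matrix(family).rank()

family = [[1, 0, 1], [0, 1, 1]]
print(in_span([2, 3, 5], family))   # True: equals 2*x1 + 3*x2
print(in_span([0, 0, 1], family))   # False: not a linear combination
```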

Row Space

Today's blog is taken straight from Matrices and Linear Algebra by Hans Schneider and George Phillip Barker.

Definition 1: Row Space

The row space of an m x n matrix A is the subspace of F1,n spanned by the rows of A. I will denote this row space as R(A).

[For review: see Definition 3 here for definition of subspace; see Definition 3 here for definition of spanning a vector space; see Definition 2 here for definition of a vector space.]
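
As an illustration of my own (assuming sympy), a spanning family for R(A) can be read off from the reduced echelon form of A: its nonzero rows span R(A), and Lemma 2 below shows they are in fact a basis.

```python
# Reading a basis of the row space R(A) off the reduced echelon form.
from sympy import Matrix

A = Matrix([[1, 2, 3],
            [2, 4, 6],
            [1, 0, 1]])

R, pivots = A.rref()                              # reduced echelon form
basis = [R.row(i) for i in range(len(pivots))]    # the nonzero rows
print(basis)   # [Matrix([[1, 0, 1]]), Matrix([[0, 1, 1]])]
```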

Lemma 1: For any matrix P for which PA is defined:
(1) R(PA) ⊆ R(A)
(2) R(PA) = R(A) if P is invertible.

Proof:

(1) Let B = PA

(2) The rows of B are linear combinations of the rows of A. [See Lemma 2, here]

(3) Since the rows of B are linear combinations of the rows of A:

R(B) ⊆ R(A) [See Corollary 1.1, here]

(4) Using step #1 above gives us:

R(PA) ⊆ R(A)

(5) If P is invertible, then there exists a matrix P-1 such that P-1P = 1, the identity matrix.

[Review: see Definition 1 here for definition of invertible.]

(6) So that:

P-1B = P-1PA = (P-1P)A = 1*A = A

(7) From this, the rows of A are linear combinations of the rows of B. [See Lemma 2, here]

(8) So that:

R(A) ⊆ R(B) [See Corollary 1.1, here]

(9) Using step #1 above gives us:

R(A) ⊆ R(PA)

(10) Thus, if P is invertible, then:

R(PA) ⊆ R(A) (step #4) and R(A) ⊆ R(PA) (step #9), which gives us R(PA)=R(A)

QED
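
A quick numeric check of part (2) of my own, comparing the two row spaces through their reduced echelon forms (unique by Corollary 3.1 below), assuming sympy:

```python
# Checking R(PA) = R(A) for an invertible P (Lemma 1, part 2).
from sympy import Matrix

A = Matrix([[1, 2, 0], [0, 1, 1]])
P = Matrix([[1, 1], [0, 1]])          # invertible (det = 1)

print(P.det() != 0)                   # True: P is invertible
print((P * A).rref() == A.rref())     # True: same reduced form, same row space
```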

Corollary 1.1: If B is row equivalent to A, then R(B) = R(A)

Proof:

(1) If B is row equivalent to A, then B = PA and P is invertible. [See Theorem 5, here]

(2) Using Lemma 1 above, we can then conclude that R(PA) = R(A)

(3) But since B=PA, this gives us that R(B) = R(A)

QED

Definition 2: Row rank

The row rank of a matrix A is the dimension of the row space, that is, dim R(A).

[Review: see Definition 2 here for definition of the dimension of a vector space which is abbreviated as dim V.]
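
In computational terms (a sketch of mine, assuming sympy), the row rank is what Matrix.rank() returns, and it agrees with the number of pivots in the reduced echelon form, anticipating Lemma 2 below:

```python
# Row rank = dim R(A): matrix rank and pivot count agree.
from sympy import Matrix

A = Matrix([[1, 2, 3],
            [2, 4, 6],
            [1, 0, 1]])
print(A.rank())           # 2
print(len(A.rref()[1]))   # 2: the pivot columns of the reduced form
```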

Corollary 1.2:

For any matrix P for which PA is defined:
(1) Row rank PA ≤ row rank A
(2) If P is invertible, then row rank PA = row rank A

Proof:

(1) Using Lemma 1 above, we have:
R(PA) ⊆ R(A)

(2) Then Row rank PA ≤ row rank A [See Lemma 4, here]

(3) If P is invertible, then R(PA) = R(A)

(4) This then gives us that Row rank PA = row rank A.

QED

Lemma 2:

Let C be any matrix in reduced echelon form with r nonzero rows. Then the row rank of C is r.

Proof:

(1) The row rank of C is at most r, since the nonzero rows c1,*, ..., cr,* span R(C) (the zero rows contribute nothing to the row space).

(2) We shall show that c1,*, ..., cr,* form a basis for R(C).

[Review: See Definition 2 here for definition of basis].

(3) For suppose that

∑ (i=1,r) αici,* = [ 0 0 ... 0 ]

(4) Since C is in reduced echelon form, we know that each row's leading entry lies in a column whose entries in all other rows are 0.

[Review: See Definition 2 here for definition of reduced echelon form].

(5) If we let l(k) be the column index of the leading entry of row k, we can be sure that:

if i ≠ k, ci,l(k) = 0 and if i = k, then ci,l(k) = 1. [From the Definition of Reduced Echelon Form, see Definition 2, here]

(6) Evaluating the relation in step #3 at column l(k) therefore gives us:

For each k, 1 ≤ k ≤ r,

∑ (i=1,r) αici,l(k) = αk

[This is true since all ci,l(k) are 0 except for i = k, where ci,l(k) = 1.]

(7) Hence each αk = 0, combining step #3 with step #6.

(8) Since the relation in step #3 forces every αk = 0, the family (c1,*, ..., cr,*) is linearly independent. [See Definition 1 here for definition of linearly independent]

(9) Since (c1,*, ..., cr,*) spans R(C) and is linearly independent, it is a basis for R(C) [See Definition 2 here for definition of basis], and so the row rank of C is r.

QED

Lemma 3: Uniqueness of Reduced Echelon Form

If R(A) = R(B), and C and D are matrices in reduced echelon form such that C is row equivalent to A and D is row equivalent to B, then C = D.

Proof:

(1) Since C is row equivalent to A, there exists P such that C = PA and P is invertible [See Theorem 5, here]

(2) Likewise, since D is row equivalent to B, there exists Q such that D = QB and Q is invertible.

(3) Using Corollary 1.2 above, we can conclude:

row rank C = row rank A

row rank D = row rank B

(4) Since R(A) = R(B), then row rank A = row rank B. [see Definition 2 above; see Theorem 1 here for proof that equal vector spaces have equal dimension.]

(5) Using Lemma 2 above, we can also conclude that C and D have the same number of nonzero rows since:

(a) C and D have the same row rank since row rank C = row rank A = row rank B = row rank D.

(b) Using Lemma 2 above, we know that the row rank of a matrix in reduced echelon form is the number of its nonzero rows.

(c) Therefore, since C,D are in reduced echelon form and have the same row rank, C,D must have the same number of nonzero rows.

(6) Let r = the number of nonzero rows in C,D.

(7) Our proof will consist of two stages:
(i) Show that the leading coefficients of C and D occur in the same place.
(ii) Show that the remaining coefficients are equal.

(8) Let l(i) be the column index of the leading entry of ci,* and let m(i) be the column index of the leading coefficient of di,*

(9) We prove the first part by induction.

(a) By Corollary 1.1 above, R(C) = R(A) = R(B) = R(D). Hence dk,* ∈ R(C) for all k, 1 ≤ k ≤ r, with:

dk,* = ∑ (h=1,r) (αk,h)ch,* [see Definition 1 above for R(D); see Definition 3 here for definition of spanning a vector space.]

(b) First, consider:

d1,* = ∑ (h=1,r) α1,hch,* [This follows from step (a) above]

If j < l(1), then j < l(h) for h = 1, 2, ..., r. [see Definition 2 here, property (3)]

Hence ch,j = 0 when j < l(1).

Now, since d1,* = ∑ (h=1,r) α1,hch,*

We know that for each j < l(1), we have:

d1,j = ∑ (h=1,r) α1,hch,j = 0

So, j < l(1) implies that d1,j = 0. Since m(1) is the column of the first nonzero entry of d1,*, it follows that:

l(1) ≤ m(1)

Similarly, we consider c1,* ∈ R(D) and by reversing the roles of C and D we obtain m(1) ≤ l(1).

Since we have m(1) ≤ l(1) and l(1) ≤ m(1), it follows that:

m(1) = l(1)

(c) We next turn to the induction step.

Now assume that 1 < k ≤ r and that we have already proved l(i) = m(i) for 1 ≤ i < k.

Since C is in reduced echelon form, column l(i) contains the leading 1 of row i and is 0 in every other row. This gives us:

if h = i, ch,l(i) = 1

if h ≠ i, ch,l(i) = 0 [See Definition 2 here]

Using the equation in step #9a, we can conclude:

dk,l(i) = ∑ (h=1,r) αk,hch,l(i) = αk,i

[Since in all but one case, ch,l(i) = 0, and in that one case ch,l(i) = 1.]

Now let i < k. By the induction hypothesis, l(i) = m(i), so αk,i = dk,l(i) = dk,m(i). But m(i) is the leading column of row i of D, so column m(i) of D is 0 in every row other than row i; since k ≠ i, this gives dk,m(i) = 0. Hence αk,i = 0 for all i < k, and the equation in step #9a reduces to:

dk,* = ∑ (h=k,r) αk,hch,*

Assume that h ≥ k and j < l(k).

Then j < l(k) ≤ l(h), and therefore ch,j = 0, since all values preceding the first nonzero value in a row must be 0.

But then, since ch,j = 0 for all h ≥ k whenever j < l(k), the reduced equation above gives us:

dk,j = ∑ (h=k,r) αk,hch,j = 0

for all j < l(k).

Finally, since dk,j = 0 for every column j < l(k), the first nonzero value of row k of D, at column m(k), cannot occur before column l(k). In other words:

l(k) ≤ m(k)

But all the arguments we have made with respect to D and C can be reversed since R(D)=R(C). We can then make the same arguments substituting D for C to prove that m(k) ≤ l(k).

Hence, l(k) = m(k) and the induction step is proved.

(d) It follows from (b) and (c) by induction that l(i) = m(i) for all i where 1 ≤ i ≤ r.

(10) We now turn to the second part of the proof that C = D.

(11) Suppose again that 1 ≤ k ≤ r, and that:

dk,* = ∑ (h=1,r) αk,hch,*.

As before, dk,l(i) = αk,i. [See #9c above]

But since m(i) = l(i), we now obtain dk,m(i) = αk,i. [This follows directly from dk,l(i) = αk,i]

Taking i = k, it follows that αk,k = dk,m(k).

Since dk,m(k) = 1 (it is the leading entry of row k of D), it follows that αk,k = 1.

Now, if i ≠ k, then it follows that αk,i = dk,m(i).

But in this case, dk,m(i) = 0, since column m(i) of D is 0 in every row other than row i and k ≠ i.

Thus, since we have shown that:

i = k → αk,i=1 and i ≠ k → αk,i=0, it follows that:

dk,* = ck,* since dk,* = ∑ (h=1,r) αk,hch,*.

It therefore follows that C = D.

QED

Corollary 3.1:

Let A be any matrix. Then there is just one matrix in reduced echelon form that is row equivalent to A.

Proof:

(1) We know that every matrix A can be reduced to a reduced echelon form C. [See Theorem 1, here]

(2) Suppose that D is also a reduced echelon form that is row equivalent to A.

(3) Let B = A, so that R(A) = R(B) trivially.

(4) Then using Lemma 3 above, we obtain that C = D.

QED
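
A small spot-check of the corollary (my own example, assuming sympy): applying explicit invertible row operations to A changes the matrix but not its reduced echelon form.

```python
# Row-equivalent matrices share one reduced echelon form (Corollary 3.1).
from sympy import Matrix

A = Matrix([[1, 2, 3], [4, 5, 6]])

B = A.copy()
B.row_op(0, lambda v, j: v + 3 * B[1, j])   # R1 <- R1 + 3*R2 (invertible)
B.row_op(1, lambda v, j: 2 * v)             # R2 <- 2*R2      (invertible)

print(A == B)                               # False: different matrices
print(A.rref()[0] == B.rref()[0])           # True: identical reduced form
```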
