# Assignment 5: Latent Semantic Analysis / Indexing

Computer Science 252
Neuromorphic Computing

## Objectives

1. Understand the LSA (LSI) algorithm by coding it from scratch in Python.
2. Apply LSA to a real-world problem like semantic web search.

Read this four-page tutorial to see what you will need to implement for this assignment. The tutorial uses technical vocabulary from Natural Language Processing, such as stemming (removing suffixes) and stop words (articles, prepositions), but is otherwise straightforward.

## Part II: Code up LSA

As usual, we’ll complete the project one step at a time.  The tutorial lays out the steps very clearly, but to give you a more concrete goal to aim for, here is the output of my solution:

```
Step 1: Set term weights and construct the term-document matrix A and query matrix

a        [1. 1. 1.] 0
arrived  [0. 1. 1.] 0
damaged  [1. 0. 0.] 0
delivery [0. 1. 0.] 0
fire     [1. 0. 0.] 0
gold     [1. 0. 1.] 1
in       [1. 1. 1.] 0
of       [1. 1. 1.] 0
shipment [1. 0. 1.] 0
silver   [0. 2. 0.] 1
truck    [0. 1. 1.] 1
```
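One way Step 1 might be coded (a sketch, not the official solution: the three example documents and the query "gold silver truck" are taken from the tutorial, the variable names are mine, and term weights here are raw counts with no stemming or stop-word removal):

```python
import numpy as np

# The three example documents and the query from the tutorial.
docs = [
    "Shipment of gold damaged in a fire",
    "Delivery of silver arrived in a silver truck",
    "Shipment of gold arrived in a truck",
]
query = "gold silver truck"

# Vocabulary: every distinct term, sorted alphabetically.
terms = sorted({w for d in docs for w in d.lower().split()})

# A[i, j] = number of times terms[i] occurs in document j;
# q[i]    = number of times terms[i] occurs in the query.
A = np.array([[d.lower().split().count(t) for d in docs] for t in terms],
             dtype=float)
q = np.array([query.split().count(t) for t in terms], dtype=float)

for t, row, qi in zip(terms, A, q):
    print(f"{t:<8} {row} {qi:g}")
```

With these toy documents the counts reproduce the table above, including the weight of 2 for "silver" in d2.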

```
Step 2: Decompose the matrix A and find the U, S and V matrices,
where A = USV^T

U=
[[-0.4201 -0.0748 -0.046 ]
 [-0.2995  0.2001  0.4078]
 [-0.1206 -0.2749 -0.4538]
 [-0.1576  0.3046 -0.2006]
 [-0.1206 -0.2749 -0.4538]
 [-0.2626 -0.3794  0.1547]
 [-0.4201 -0.0748 -0.046 ]
 [-0.4201 -0.0748 -0.046 ]
 [-0.2626 -0.3794  0.1547]
 [-0.3151  0.6093 -0.4013]
 [-0.2995  0.2001  0.4078]]

S=
[4.0989 2.3616 1.2737]

V^T=
[[-0.4945 -0.6458 -0.5817]
 [-0.6492  0.7194 -0.2469]
 [-0.578  -0.2556  0.775 ]]
```
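Step 2 is a single call in NumPy. A sketch (assuming the term-document matrix from Step 1; note that an SVD is only unique up to flipping the sign of each singular-vector pair, so some columns of your U and V may be negated relative to the output above, which does not affect the final rankings):

```python
import numpy as np

# Term-document matrix A from Step 1 (rows = terms in alphabetical order).
A = np.array([
    [1, 1, 1], [0, 1, 1], [1, 0, 0], [0, 1, 0], [1, 0, 0], [1, 0, 1],
    [1, 1, 1], [1, 1, 1], [1, 0, 1], [0, 2, 0], [0, 1, 1]], dtype=float)

# Thin SVD: U is 11x3, S holds the three singular values in decreasing
# order, and Vt is V^T, so that A = U @ np.diag(S) @ Vt.
U, S, Vt = np.linalg.svd(A, full_matrices=False)

print("U=\n", U.round(4))
print("S=\n", S.round(4))    # ~ [4.0989 2.3616 1.2737]
print("V^T=\n", Vt.round(4))

# Sanity check: the three factors reproduce A.
assert np.allclose(U @ np.diag(S) @ Vt, A)
```

Note that `np.linalg.svd` returns S as a 1-D vector of singular values and V already transposed.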

```
Step 3: Implement a rank-2 approximation by keeping the first two columns of
U and V and the first two rows and columns of S.

U=
[[-0.4201 -0.0748]
 [-0.2995  0.2001]
 [-0.1206 -0.2749]
 [-0.1576  0.3046]
 [-0.1206 -0.2749]
 [-0.2626 -0.3794]
 [-0.4201 -0.0748]
 [-0.4201 -0.0748]
 [-0.2626 -0.3794]
 [-0.3151  0.6093]
 [-0.2995  0.2001]]

S=
[4.0989 2.3616]

V^T=
[[-0.4945 -0.6458 -0.5817]
 [-0.6492  0.7194 -0.2469]]
```

```
Step 4: Find the new document vector coordinates in this reduced 2-dimensional
space.

d1: [-0.4945 -0.6492]
d2: [-0.6458  0.7194]
d3: [-0.5817 -0.2469]
```
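In the reduced space each document's coordinates are a column of the truncated V^T (equivalently, a row of the reduced V). A sketch (signs may be flipped relative to the output above, per the SVD sign ambiguity):

```python
import numpy as np

A = np.array([
    [1, 1, 1], [0, 1, 1], [1, 0, 0], [0, 1, 0], [1, 0, 0], [1, 0, 1],
    [1, 1, 1], [1, 1, 1], [1, 0, 1], [0, 2, 0], [0, 1, 1]], dtype=float)
U, S, Vt = np.linalg.svd(A, full_matrices=False)
U_k, S_k, Vt_k = U[:, :2], S[:2], Vt[:2, :]

# Column j of the truncated V^T holds document j's 2-D coordinates.
doc_coords = [Vt_k[:, j] for j in range(A.shape[1])]
for j, d in enumerate(doc_coords, start=1):
    print(f"d{j}: {d.round(4)}")
```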

```
Step 5: Find the new query vector coordinates in the reduced 2-dimensional
space.

q = [-0.214  0.1821]
```
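The query is projected into the same space via q_k = q^T U_k S_k^{-1}. A sketch (again, up to sign):

```python
import numpy as np

A = np.array([
    [1, 1, 1], [0, 1, 1], [1, 0, 0], [0, 1, 0], [1, 0, 0], [1, 0, 1],
    [1, 1, 1], [1, 1, 1], [1, 0, 1], [0, 2, 0], [0, 1, 1]], dtype=float)
# Query "gold silver truck" as a term-count vector (same term order as A).
q = np.array([0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 1], dtype=float)

U, S, Vt = np.linalg.svd(A, full_matrices=False)
U_k, S_k = U[:, :2], S[:2]

# q_k = q^T U_k S_k^{-1}: dividing elementwise by S_k applies the
# inverse of the diagonal matrix S_k.
q_k = (q @ U_k) / S_k
print("q =", q_k.round(4))
```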

```
Step 6: Rank documents in decreasing order of query-document cosine
similarities.

sim(q,d1) = -0.054
sim(q,d2) = +0.991
sim(q,d3) = +0.448
```
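Step 6 is plain cosine similarity in the 2-D space. A sketch pulling the previous steps together (the similarities are invariant to the SVD sign ambiguity, so they should match the values above):

```python
import numpy as np

A = np.array([
    [1, 1, 1], [0, 1, 1], [1, 0, 0], [0, 1, 0], [1, 0, 0], [1, 0, 1],
    [1, 1, 1], [1, 1, 1], [1, 0, 1], [0, 2, 0], [0, 1, 1]], dtype=float)
q = np.array([0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 1], dtype=float)

U, S, Vt = np.linalg.svd(A, full_matrices=False)
U_k, S_k, Vt_k = U[:, :2], S[:2], Vt[:2, :]
q_k = (q @ U_k) / S_k

def cos_sim(a, b):
    """Cosine of the angle between vectors a and b."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Rank documents by decreasing cosine similarity to the query.
sims = {f"d{j + 1}": cos_sim(q_k, Vt_k[:, j]) for j in range(A.shape[1])}
for name in sorted(sims, key=sims.get, reverse=True):
    print(f"sim(q,{name}) = {sims[name]:+.3f}")
```

The ranking d2 > d3 > d1 makes intuitive sense: d2 is the only document containing "silver", and it shares "truck" with the query as well.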