Text Processing of CiteSeer UMD

This program preprocesses a collection of documents to calculate the frequency of the most common terms and to identify the keywords of each document. The first pass (Q2) does this without stemming and without removing stop words. The second pass (Q3) applies both techniques. The approach is: first remove the stop words, then apply stemming. If the stemmed form of a word is itself a stop word, it is removed as well.
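The two passes described above can be sketched roughly as follows. This is only an illustrative sketch, not the project's actual code: the class name `TextProcessSketch`, the method names, the tiny stop-word list, and the toy suffix-stripping `stem` method are all assumptions made for the example. A real run would presumably use a full stop-word list and a proper stemmer (for example a Porter stemmer).

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

public class TextProcessSketch {

    // Tiny illustrative stop-word list (assumption; the project's real list is not shown here).
    static final Set<String> STOP_WORDS =
            Set.of("the", "a", "an", "and", "of", "is", "are", "be", "to", "in");

    // Toy suffix stripper standing in for a real stemmer (assumption, NOT the Porter algorithm).
    static String stem(String word) {
        for (String suffix : new String[] {"ing", "ed", "es", "s"}) {
            // Strip the first matching suffix, keeping at least two characters.
            if (word.endsWith(suffix) && word.length() - suffix.length() >= 2) {
                return word.substring(0, word.length() - suffix.length());
            }
        }
        return word;
    }

    // Counts term frequencies. With useTechniques == false this mirrors the Q2 pass
    // (raw tokens); with true it mirrors the Q3 pass (stop words + stemming).
    static Map<String, Integer> termFrequencies(String text, boolean useTechniques) {
        Map<String, Integer> counts = new HashMap<>();
        for (String token : text.toLowerCase().split("[^a-z0-9]+")) {
            if (token.isEmpty()) {
                continue; // split can yield an empty leading token
            }
            if (useTechniques) {
                if (STOP_WORDS.contains(token)) {
                    continue; // step 1: drop stop words
                }
                token = stem(token); // step 2: stem what is left
                if (STOP_WORDS.contains(token)) {
                    continue; // step 3: drop stems that are themselves stop words
                }
            }
            counts.merge(token, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        String doc = "The documents are being indexed and the document is indexed";
        System.out.println("Q2 (raw):       " + termFrequencies(doc, false));
        System.out.println("Q3 (processed): " + termFrequencies(doc, true));
    }
}
```

In this sketch the word "being" is not in the stop-word list, but the toy stemmer reduces it to "be", which is, so the second stop-word check discards it. That second check is the reason for filtering again after stemming.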

The project is named HW1_TextProcess and was created using the Eclipse IDE. The program answers all the questions in a single execution (run).

Click here to see the project and the repository.