One of the strongest areas of AINetSolutions' research is Natural Language Processing, the discipline to which Information Retrieval (IR) belongs. Here we present a library written in Java that supports Information Retrieval tasks.
The original IR tool was written by Raymond Mooney (http://www.cs.utexas.edu/users/mooney/ir-course/) for his IR course at the University of Texas. We have modified that library to address several points:
» Allow recursive indexing of documents.
» Allow upper-case words in the stop list.
» Allow Experiment objects to be created from an already built index, including for classes that extend InvertedIndex.
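The first two points can be illustrated with a minimal sketch. It is not the library's code: the class RecursiveIndexSketch and its index method are hypothetical names, and it uses only the standard Java API. It walks a directory tree recursively, lower-cases both the tokens and the stop list so that upper-case stop words still match, and builds a small inverted index in memory.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical illustration, not part of the AINetSolutions or Mooney libraries.
final class RecursiveIndexSketch {

    /** Builds term -> (document path relative to root -> occurrence count). */
    static Map<String, Map<String, Integer>> index(Path root, Set<String> stopWords)
            throws IOException {
        // Normalize the stop list so entries such as "THE" still filter "the".
        Set<String> stops = new HashSet<>();
        for (String word : stopWords) {
            stops.add(word.toLowerCase(Locale.ROOT));
        }

        // Files.walk descends into every subdirectory of root.
        List<Path> files;
        try (Stream<Path> walk = Files.walk(root)) {
            files = walk.filter(Files::isRegularFile).sorted().collect(Collectors.toList());
        }

        Map<String, Map<String, Integer>> inverted = new TreeMap<>();
        for (Path file : files) {
            // Assumption: documents are UTF-8 plain text.
            String text = new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
            String doc = root.relativize(file).toString();
            // Split on anything that is not a letter or a digit.
            for (String raw : text.split("[^\\p{L}\\p{Nd}]+")) {
                String token = raw.toLowerCase(Locale.ROOT);
                if (token.isEmpty() || stops.contains(token)) {
                    continue;
                }
                inverted.computeIfAbsent(token, k -> new TreeMap<>())
                        .merge(doc, 1, Integer::sum);
            }
        }
        return inverted;
    }
}
```

Calling RecursiveIndexSketch.index(Paths.get("corpus"), stopWords) on a directory named corpus (an example path) would return one entry per remaining term, each listing the documents that contain it and how often.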
In addition, we have created an extension (com.ainetsolutions.nlp.utils.ir.WekaIndex) that recursively indexes documents stored in directories and then converts the inverted index into a direct index in Weka format (ARFF). Every document is assigned a category, either from a Hashtable that maps each file to its class or from the directory in which the document is stored (the recommended option).
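The conversion described above can be sketched as follows. This is only an illustration of the idea, not the WekaIndex implementation: the class ArffSketch, its methods, the "t_" attribute prefix and the "category" attribute name are all assumptions made for the example, and the ARFF text is written by hand rather than through the Weka API. The sketch turns an inverted index (term to documents) into a direct index (document to terms), takes each document's category from the name of the directory that contains it, and emits a dense ARFF file.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical illustration, not the actual WekaIndex class.
final class ArffSketch {

    /** Category of a document: the name of the directory that contains it. */
    static String categoryOf(String docPath) {
        Path parent = Paths.get(docPath).getParent();
        if (parent == null || parent.getFileName() == null) {
            return "unknown";
        }
        return parent.getFileName().toString();
    }

    /**
     * Converts term -> (document -> count) into ARFF text.
     * Assumption: terms and directory names contain only letters and digits,
     * so no ARFF quoting is needed.
     */
    static String toArff(String relation, Map<String, Map<String, Integer>> inverted) {
        // Direct index: document -> (term -> count).
        Map<String, Map<String, Integer>> direct = new TreeMap<>();
        for (Map.Entry<String, Map<String, Integer>> term : inverted.entrySet()) {
            for (Map.Entry<String, Integer> posting : term.getValue().entrySet()) {
                direct.computeIfAbsent(posting.getKey(), k -> new TreeMap<>())
                      .put(term.getKey(), posting.getValue());
            }
        }

        SortedSet<String> terms = new TreeSet<>(inverted.keySet());
        SortedSet<String> categories = new TreeSet<>();
        for (String doc : direct.keySet()) {
            categories.add(categoryOf(doc));
        }

        StringBuilder arff = new StringBuilder();
        arff.append("@relation ").append(relation).append("\n\n");
        for (String term : terms) {
            // The "t_" prefix keeps term attributes from clashing with "category".
            arff.append("@attribute t_").append(term).append(" numeric\n");
        }
        arff.append("@attribute category {")
            .append(String.join(",", categories)).append("}\n\n@data\n");

        // One row per document: term counts in attribute order, then the category.
        for (Map.Entry<String, Map<String, Integer>> doc : direct.entrySet()) {
            List<String> cells = new ArrayList<>();
            for (String term : terms) {
                cells.add(String.valueOf(doc.getValue().getOrDefault(term, 0)));
            }
            cells.add(categoryOf(doc.getKey()));
            arff.append(String.join(",", cells)).append('\n');
        }
        return arff.toString();
    }
}
```

Taking the category from the directory name means the corpus layout itself carries the labels, so no separate file-to-class table has to be kept in sync with the documents. A Hashtable lookup could replace categoryOf when the directory structure does not reflect the classes.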