Innovation
Simple Text Mining
University of Michigan
posted on 08/27/2010
Innovation Details
Detailed Description
Text and network mining software that the typical analyst end user can operate is scarce. Generally available text mining algorithms require extensive programming to implement, have a steep learning curve, and demand a long-term commitment of professional software developer resources. As a result, such solutions are usually out of reach for the typical analyst or small business.
The University of Michigan has developed an Excel-based tool and algorithm for text mining. The tool scans blocks of unstructured text for each word in a user-supplied lexicon and assembles the words found into a common network analysis data structure called an "edge list." The output also includes descriptive data on the weight of each lexicon word found, which supports analysis of the terms themselves. The network output supports analysis of term "adjacency" (terms appearing together in the same block of unstructured text), the computation of network analysis measures, and the production of network visualizations. User-specified data dimensions are carried over from the text input to the output, so results are more descriptive and easier to cross-reference.
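The lexicon scan and edge-list assembly described above can be illustrated with a short Python sketch. This is not the tool's embedded Excel algorithm, which is not shown in this listing. The function name `mine_blocks`, the use of a simple occurrence count as the "weight," whole-word case-insensitive matching, and the sample blocks are all assumptions made for illustration.

```python
import re
from collections import Counter
from itertools import combinations


def mine_blocks(blocks, lexicon):
    """Scan text blocks for lexicon terms (hypothetical helper).

    Returns:
      weights: list of (block_id, term, count) rows, one per term found.
      edges:   Counter mapping (term_a, term_b) to the number of blocks
               in which both terms appear (co-occurrence edge list).
    """
    # Assumption: terms match as whole words, case-insensitively.
    patterns = {
        term: re.compile(r"\b" + re.escape(term.lower()) + r"\b")
        for term in lexicon
    }
    weights = []
    edges = Counter()
    for block_id, text in blocks.items():
        lowered = text.lower()
        found = {}
        for term, pattern in patterns.items():
            count = len(pattern.findall(lowered))
            if count:
                found[term] = count
                # block_id stands in for a user-specified data dimension
                # carried over from the input.
                weights.append((block_id, term, count))
        # Every pair of terms found in the same block yields one edge.
        for a, b in combinations(sorted(found), 2):
            edges[(a, b)] += 1
    return weights, edges


# Made-up sample input for illustration only.
blocks = {
    "doc1": "Network analysis of text networks. Text mining is useful.",
    "doc2": "Text mining with a lexicon.",
}
lexicon = ["text", "mining", "lexicon", "network"]
weights, edges = mine_blocks(blocks, lexicon)
```

The `edges` counter can be written out as three columns (term A, term B, shared block count), a form that most network analysis and visualization packages accept as an edge list.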
Applications
• Analysis of unstructured text for a large number of known lexical terms
• Analysis of occurrence and adjacency (co-occurrence) of terms in papers, abstracts, etc.
Advantages
• Approachability/ease-of-use (single-click processing of input text)
• Easy copy/paste of input/output data
File Number: 4730
IP Protection
License Online
4730 Opensource License
Item type: Software
This is an Excel document with embedded algorithms. It is licensed for use under a type of Opensource License. By continuing to download and use, y...
