Piranha (software)

Piranha is a text mining system. It was developed for the United States Department of Energy (DOE) by Oak Ridge National Laboratory (ORNL). The software processes free-text documents and shows relationships among them, a technique valuable across numerous data domains, from health care fraud to national security. The results are presented in clusters of prioritized relevance. Piranha uses the term frequency/inverse corpus frequency (TF-ICF) term weighting method, which lends itself to parallel processing of textual information and thus to the analysis of large document sets.

Piranha has six main elements:

* Collecting and extracting: Millions of documents from sources such as databases and social media can be collected, and text can be extracted from hundreds of file formats. This information can be translated to other languages.
* Storing and indexing: Documents in search servers, relational databases, etc. can be stored and indexed.
* Recommending: The system can highlight the most valuable information for specific users.
* Categorizing: Items are grouped via supervised and semi-supervised machine learning methods and targeted search lists.
* Clustering: Similarity is used to group documents hierarchically.
* Visualizing: Relationships among documents are shown so that users can quickly recognize connections.

This work has resulted in eight patents (9,256,649, 8,825,710, 8,473,314, 7,937,389, 7,805,446, 7,693,903, 7,315,858, 7,072,883), commercial licenses (including TextOre and Pro2Serve), a spin-off company called VortexT Analytics formed with the inventors, Covenant Health, and Pro2Serve, two R&D 100 Awards, and scores of peer-reviewed research publications.
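The term frequency/inverse corpus frequency weighting mentioned above can be illustrated with a minimal Python sketch. It is not Piranha's actual code. It assumes a weight of the form log(1 + tf) × log((N + 1) / (n + 1)), where N is the size of a fixed reference corpus and n is the number of reference documents containing the term. The function names, the whitespace tokenizer, and the sample texts are hypothetical. Because the corpus statistics are fixed in advance, each document can be weighted independently of the others, which is what makes this kind of scheme easy to run in parallel. Cosine similarity is used here as one common way to compare the resulting vectors before grouping documents.

```python
import math
from collections import Counter


def build_corpus_stats(reference_docs):
    """Count, for each term, how many reference documents contain it."""
    doc_freq = Counter()
    for doc in reference_docs:
        # set() so a term is counted once per document
        doc_freq.update(set(doc.lower().split()))
    return doc_freq, len(reference_docs)


def tf_icf_vector(text, doc_freq, corpus_size):
    """Weight the terms of one document using only the fixed corpus statistics.

    Assumed form: log(1 + tf) * log((N + 1) / (n + 1)).
    """
    counts = Counter(text.lower().split())
    return {
        term: math.log(1 + tf)
        * math.log((corpus_size + 1) / (doc_freq.get(term, 0) + 1))
        for term, tf in counts.items()
    }


def cosine_similarity(a, b):
    """Cosine similarity between two sparse term-weight dictionaries."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical reference corpus and incoming documents.
reference = [
    "the energy lab studies energy policy",
    "the hospital filed the claim",
    "the lab published the report",
]
doc_freq, corpus_size = build_corpus_stats(reference)

doc_a = tf_icf_vector("the fraud claim was filed", doc_freq, corpus_size)
doc_b = tf_icf_vector("the hospital reviewed the fraud claim", doc_freq, corpus_size)
print(cosine_similarity(doc_a, doc_b))
```

A term that appears in every reference document (such as "the" in this toy corpus) receives a weight of zero, while a term unseen in the reference corpus receives the largest inverse-corpus factor. A hierarchical clustering step could then be applied to the pairwise similarities, but that step is omitted here.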


References

* Cui, X., Beaver, J., St. Charles, J., Potok, T. (September 2008). "Dimensionality Reduction for High Dimensional Particle Swarm Clustering". Proceedings of the IEEE Swarm Intelligence Symposium, St. Louis, Mo.
* Yasin, Rutrell (Nov 29, 2012). "Energy lab's Piranha puts teeth into text analysis". GCN.
* Franklin Jr., Curtis (Nov 30, 2012). "Piranha Brings Affordable Big-Data to Government". Enterprise Efficiency.
* Breeden II, John (Dec 7, 2012). "Swimming with Piranha: Testing Oak Ridge's text analysis tool". GCN.
* Kirby, Bob (Summer 2013). "Big Data Can Help the Federal Government Move Mountains. Here's How." FedTech.
* R. M. Patton, B. G. Beckerman, T. E. Potok, G. Tourassi, "A Recommender System for Web-Based Discovery and Refinement of Information Radiologists Seek", Radiological Society of North America (RSNA), 2012 Annual Meeting, Nov. 2012, Chicago, IL, USA.
* R. M. Patton, T. E. Potok, B. A. Worley, "Discovery & Refinement of Scientific Information via a Recommender System", The Second International Conference on Advanced Communications and Computation, Oct. 2012, Venice, Italy.
* J. W. Reed, T. E. Potok, and R. M. Patton, "A multi-agent system for distributed cluster analysis", in Proceedings of Third International Workshop on Software Engineering for Large-Scale Multi-Agent Systems (SELMAS'04), W16L Workshop, 26th International Conference on Software Engineering, Edinburgh, Scotland, UK: IEE, 2004, pp. 152-5.
* J. Reed, Y. Jiao, T. E. Potok, B. Klump, M. Elmore, and A. R. Hurson, "TF-ICF: A New Term Weighting Scheme for Clustering Dynamic Data Streams", in Proceedings of 5th International Conference on Machine Learning and Applications (ICMLA'06), vol. 0, Orlando, FL, 2006, pp. 258–263.


Awards

* 2007 R&D 100 Magazine's Award: Piranha (software)


Patents

* ''System for gathering and summarizing internet information''
* ''Method for gathering and summarizing internet information''
* ''Agent-based method for distributed clustering of textual information''
* ''Dynamic reduction of dimensions of a document vector in a document search and retrieval system''
* US patent 8,473,314: ''Method and system for determining precursors of health abnormalities from processing medical records''


External links

* DOE Energy Innovation Portal (2014). "Agent-Based Software for Gathering and Summarizing Textual and Internet Information".
* ORNL Piranha website