KNIME

KNIME, the Konstanz Information Miner, is a data analytics, reporting and integration platform. KNIME integrates various components for machine learning and data mining through its modular data pipelining "Building Blocks of Analytics" concept. A graphical user interface and the use of Java Database Connectivity (JDBC) allow the assembly of nodes that blend different data sources, including preprocessing (extract, transform, load (ETL)), for modeling, data analysis and visualization with minimal or no programming. It is free and open-source software released under a GNU General Public License. Since 2006, KNIME has been used in pharmaceutical research and in other areas, including customer relationship management (CRM) and data analysis, business intelligence, text mining and financial data analysis. Recently, attempts have been made to use KNIME as a robotic process automation (RPA) tool. KNIME's headquarters are in Zurich, with other offices in Konstanz, Berlin, and Austin (USA).


History

Development of KNIME began in January 2004 as an open-source platform, with a team of software engineers at the University of Konstanz. The original team, headed by Michael Berthold, came from a Silicon Valley software company for the pharmaceutical industry. The initial goal was to create a modular, highly scalable and open data processing platform that allows easy integration of different data loading, processing, transformation, analysis, and visual exploration modules, without a focus on any one application area. The platform was intended for collaboration, research, and the integration of various other data analysis projects.

In 2006, the first version of KNIME was released. Several pharmaceutical companies began using KNIME, and several life science software vendors began integrating their tools into the platform. Later that year, after an article in the German magazine ''c't'', users from a number of other areas came on board. As of 2012, KNIME was in use by over 15,000 actual users (i.e. not counting downloads, but users regularly retrieving updates) in the life sciences and at banks, publishers, car manufacturers, telcos, consulting firms, and various other industries, as well as by a large number of research groups worldwide.

The latest updates to KNIME Server and KNIME Big Data Extensions provide support for Apache Spark 2.3, Parquet and HDFS-type storage. For the sixth year in a row, KNIME has been placed as a leader for ''data science'' and ''machine learning'' platforms in Gartner's Magic Quadrant.


Design philosophy, features

KNIME software follows these design principles and features:
* Visual, Interactive Framework: KNIME Software prioritizes a user-friendly and intuitive approach to data analysis. This is achieved through a visual and interactive framework in which data flows can be combined using a drag-and-drop interface. Users can develop customized and interactive applications by creating simple to advanced, highly automated data pipelines. These may include, for example, access to databases, machine learning libraries, logic for workflow control (e.g., loops, switches), abstraction (e.g., interactive widgets), invocation, dynamic data apps, integrated deployment, or error handling.
* Modularity: Processing units and data containers should remain independent of each other. This design choice enables easy distribution of computation and allows for the independent development of different algorithms. Data types within KNIME are encapsulated, meaning no types are predefined. This facilitates adding new data types and integrating them with existing types, along with type-specific renderers and comparators. This principle also enables inspecting results at the end of each single data operation.
* Extensibility: KNIME Software is designed to be extensible. Adding new processing nodes or views is made simple through a plug-in mechanism. This mechanism ensures that users can distribute their custom functionalities without the need for complicated install or uninstall procedures.
* Interleaving No-Code with Code: The platform supports combining visual programming (no-code) with script-based programming (e.g., Python, R, JavaScript) in data analysis. This design principle is termed low-code.
* Automation and Scalability: For example, parameterization via flow variables and the encapsulation of workflow segments in components help reduce manual work and errors in analyses. Further, the scheduling of workflow execution (available in KNIME Business Hub and KNIME Community Hub for Teams) reduces dependency on human resources. In terms of scalability, examples include the ability to handle large datasets (millions of rows), execute multiple processes simultaneously out of the box, and reuse workflow segments.
* Full Usability: Because it is open source, KNIME Analytics Platform provides free, full usability with no limited trial periods.
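
The modularity and flow-variable ideas above can be illustrated with a small conceptual sketch. It is written in plain Python and is not KNIME's node API: the `Node` and `Workflow` classes, the `upto` parameter, and the example node functions are all hypothetical names chosen for illustration. The sketch only shows processing units that stay independent of the data they handle, parameters supplied through shared workflow variables, and results that can be inspected after every step.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

Row = Dict[str, object]
Table = List[Row]
Variables = Dict[str, object]


@dataclass
class Node:
    """A processing unit; it knows nothing about the other nodes (hypothetical class)."""
    name: str
    func: Callable[[Table, Variables], Table]
    output: Optional[Table] = None  # kept so the result can be inspected later


class Workflow:
    """A linear chain of nodes sharing a set of flow variables (hypothetical class)."""

    def __init__(self, flow_variables: Optional[Variables] = None) -> None:
        self.nodes: List[Node] = []
        self.flow_variables: Variables = dict(flow_variables or {})

    def add(self, name: str, func: Callable[[Table, Variables], Table]) -> "Workflow":
        self.nodes.append(Node(name, func))
        return self

    def execute(self, source: Table, upto: Optional[str] = None) -> Table:
        """Run the nodes in order, optionally stopping after the node named `upto`."""
        table = source
        for node in self.nodes:
            table = node.func(table, self.flow_variables)
            node.output = table  # every intermediate result stays inspectable
            if node.name == upto:
                break
        return table


def row_filter(table: Table, variables: Variables) -> Table:
    # Keep rows whose amount reaches the threshold given as a flow variable.
    return [r for r in table if r["amount"] >= variables["min_amount"]]


def add_tax(table: Table, variables: Variables) -> Table:
    # Append a derived column; the rate is again a flow variable, not a constant.
    rate = variables["tax_rate"]
    return [{**r, "gross": round(r["amount"] * (1 + rate), 2)} for r in table]


rows = [
    {"id": 1, "amount": 50.0},
    {"id": 2, "amount": 200.0},
    {"id": 3, "amount": 120.0},
]

wf = Workflow({"min_amount": 100.0, "tax_rate": 0.2})
wf.add("Row Filter", row_filter).add("Add Tax", add_tax)

partial = wf.execute(rows, upto="Row Filter")  # selective execution of the first step
result = wf.execute(rows)                      # full execution
```

Changing `min_amount` or `tax_rate` in the workflow's variables changes the behaviour of the nodes without editing them, which is the kind of parameterization the flow-variable principle describes.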


Internals

KNIME allows users to visually create data flows (or pipelines), selectively execute some or all analysis steps, and later inspect the results and models using interactive widgets and views. KNIME is written in Java and based on Eclipse. It makes use of an extension mechanism to add plug-ins providing added functions. The core version includes hundreds of modules for data integration (file input/output (I/O), database nodes supporting all common database management systems through JDBC or native connectors: SQLite, MS-Access, SQL Server, MySQL, Oracle, PostgreSQL, Vertica and H2), data transformation (filter, converter, splitter, combiner, joiner), and the commonly used methods of statistics, data mining, analysis and text analytics. Visualization is supported with the Report Designer extension. KNIME workflows can be used as data sets to create report templates that can be exported to document formats such as doc, ppt, xls, pdf and others.

Other KNIME abilities are:
* KNIME's core architecture allows processing of large data volumes that are limited only by the available hard disk space (not by the available RAM). For example, KNIME allows analyzing 300 million customer addresses, 20 million cell images, and 10 million molecular structures.
* Added plug-ins allow integrating methods for text mining, image mining, time series analysis, and networking.
* KNIME integrates various other open-source projects, e.g., machine learning algorithms from Weka, H2O.ai, Keras, Spark, the R project and LIBSVM, as well as plotly, JFreeChart, ImageJ, and the Chemistry Development Kit.

KNIME is implemented in Java and allows for wrappers calling other code, in addition to providing nodes that run Java, Python, R, Ruby and other code fragments.


License

In 2024, KNIME version 5.3 was released under the same GPLv3 license as previous versions. Since version 2.1, KNIME has been released under the GPLv3 license, with an exception that allows others to use the well-defined node application programming interface (API) to add proprietary extensions. This allows commercial software vendors to add wrappers calling their tools from KNIME.


Courses

KNIME allows data analysis to be performed without programming skills. Several free online courses are provided.


See also

* Weka – machine-learning algorithms that can be integrated in KNIME
* ELKI – data mining framework with many clustering algorithms
* Keras – neural network library
* Orange – an open-source data visualization, machine learning and data mining toolkit with a similar visual programming front-end
* List of free and open-source software packages


External links

* KNIME Hub - Official community platform to search and find nodes, components, workflows and collaborate on new solutions
* Nodepit - KNIME node collection supporting versioning and node installation