The Collective Knowledge (CK) project is an open-source framework and repository to enable collaborative, reproducible and sustainable research and development of complex computational systems.
CK is a small, portable, customizable and decentralized infrastructure that helps researchers and practitioners:
* share their code, data and models as reusable Python components and automation actions with a unified JSON API, JSON meta information, and a UID based on FAIR principles
* assemble portable workflows from shared components (such as multi-objective autotuning and design space exploration)
* automate, crowdsource and reproduce benchmarking of complex computational systems
* unify predictive analytics (scikit-learn, R, DNN)
* enable reproducible and interactive papers
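The idea of sharing components through a unified JSON API can be illustrated with a short Python sketch. It is not CK's actual code or schema: the meta fields `alias`, `tags` and `uid`, the 16-hex-digit UID, the function names `make_component`, `action_find` and `call`, and the convention that every action returns a dictionary with an integer `return` code (0 for success) are all assumptions chosen for illustration.

```python
import json
import uuid


# Hypothetical component: a JSON-serializable meta description plus a UID.
# The field names ("alias", "tags", "uid") are illustrative, not CK's schema.
def make_component(alias, tags):
    meta = {"alias": alias, "tags": sorted(tags)}
    # Assumed UID format: 16 hex digits taken from a random UUID.
    meta["uid"] = uuid.uuid4().hex[:16]
    return meta


# Hypothetical action with a unified dict-in/dict-out API:
# "return" == 0 means success; otherwise "error" explains the failure.
def action_find(inp):
    components = inp.get("components", [])
    wanted = set(inp.get("tags", []))
    if not wanted:
        return {"return": 1, "error": "no tags given"}
    # A component matches if it carries every requested tag.
    found = [c for c in components if wanted <= set(c["tags"])]
    return {"return": 0, "found": found}


def call(action, request_json):
    """Decode a JSON request, run the action, and encode the reply as JSON."""
    return json.dumps(action(json.loads(request_json)))


# Usage: register two components and look one up by tag.
lib = make_component("dataset-imagenet-val", ["dataset", "imagenet", "val"])
model = make_component("model-resnet50", ["model", "resnet", "onnx"])
req = json.dumps({"components": [lib, model], "tags": ["model"]})
resp = json.loads(call(action_find, req))
print(resp["return"], [c["alias"] for c in resp["found"]])  # prints: 0 ['model-resnet50']
```

Passing plain JSON in and out keeps every action callable from the command line, from another language, or over a network in the same way, which is the kind of uniformity a shared component repository relies on.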
Notable usages
* ARM uses CK to accelerate computer engineering
* The Association for Computing Machinery evaluates CK, with sponsorship from the Sloan Foundation, for possible integration with the ACM Digital Library and for reproducible research
* Several ACM-sponsored conferences use CK for the Artifact Evaluation process
* Imperial College (London) uses CK to automate and crowdsource compiler bug detection
* Researchers from the University of Cambridge used CK to help the community reproduce the results of their paper at the International Symposium on Code Generation and Optimization (CGO'17) during Artifact Evaluation
* General Motors (USA) uses CK to crowd-benchmark convolutional neural network optimizations
* The Raspberry Pi Foundation and the cTuning foundation released a CK workflow with a reproducible "live" paper to enable collaborative research into multi-objective autotuning and machine learning techniques (Grigori Fursin, Anton Lokhmotov, Dmitry Savenko, Eben Upton. "A Collective Knowledge workflow for collaborative research into multi-objective autotuning and machine learning techniques", arXiv:1801.08024, January 2018, with an accompanying interactive report of reproducible experiments)
* IBM uses CK to reproduce quantum computing results published in Nature
* CK is used to automate MLPerf benchmarking
Portable package manager for portable workflows
CK has an integrated cross-platform package manager that uses Python scripts, a JSON API and JSON meta-descriptions to automatically rebuild, on a user's machine, the software environment required to run a given research workflow.
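How JSON meta-descriptions can drive the rebuilding of a software environment may be sketched as a small dependency resolver. This is a hypothetical illustration, not CK's package manager: the package aliases, the `tags` and `deps` fields, and the helpers `find_by_tag` and `install_order` are invented for the example, and the sketch only computes an installation order; it does not download or build anything.

```python
import json

# Hypothetical JSON meta-descriptions; "deps" lists the tags a package needs.
# Field names and aliases are illustrative assumptions, not CK's schema.
PACKAGES_JSON = """
[
  {"alias": "lib-python-numpy", "tags": ["lib", "numpy"], "deps": ["python"]},
  {"alias": "tool-python", "tags": ["python"], "deps": []},
  {"alias": "lib-tflite", "tags": ["lib", "tflite"], "deps": ["numpy", "python"]}
]
"""


def find_by_tag(packages, tag):
    """Return the first package whose meta-description provides the tag."""
    for pkg in packages:
        if tag in pkg["tags"]:
            return pkg
    raise KeyError(f"no package provides tag {tag!r}")


def install_order(packages, target_tag):
    """Return aliases in dependency-first order using a depth-first search."""
    order, visiting, done = [], set(), set()

    def visit(tag):
        pkg = find_by_tag(packages, tag)
        alias = pkg["alias"]
        if alias in done:
            return
        if alias in visiting:
            raise ValueError(f"dependency cycle at {alias}")
        visiting.add(alias)
        for dep in pkg["deps"]:
            visit(dep)  # resolve every dependency before the package itself
        visiting.remove(alias)
        done.add(alias)
        order.append(alias)

    visit(target_tag)
    return order


packages = json.loads(PACKAGES_JSON)
print(install_order(packages, "tflite"))
# prints: ['tool-python', 'lib-python-numpy', 'lib-tflite']
```

Resolving dependencies by tags rather than by fixed names lets the same workflow pick up whichever compatible package is already present on a given machine, which is one plausible way to keep workflows portable across environments.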
Reproducibility of experiments
CK enables reproducibility of experimental results through community involvement, much as in Wikipedia or in physics. Whenever a new workflow with all its components is shared via GitHub, anyone can try it on a different machine, with a different environment and with slightly different choices (compilers, libraries, data sets). Whenever unexpected or wrong behavior is encountered, the community explains it, fixes the components and shares them back as described in.
References
External links
* Development site
* Documentation
* Public repository with crowdsourced experiments
* International Workshop on Adaptive Self-tuning Computing Systems (ADAPT) uses CK to enable public reviewing of publications and artifacts via Reddit