Doug Cutting

Douglass Read Cutting is a software designer and an advocate for, and creator of, open-source search technology. He created Lucene and, with Mike Cafarella, founded Nutch. The Apache Software Foundation now manages both projects. Cutting and Cafarella were also co-founders of Apache Hadoop.


Education and early career

Cutting graduated from Stanford University in 1985 with a bachelor's degree. Prior to developing Lucene, Cutting held search technology positions at Xerox PARC, where he worked on the Scatter/Gather algorithm (Cutting, Karger, Pedersen, and Tukey, "Scatter/Gather: A cluster-based approach to browsing large document collections," SIGIR '92, reprinted in ACM SIGIR Forum, vol. 51, no. 2, pp. 148-159, 2017; U.S. Patent 5,442,778, issued August 15, 1995) and on computational stylistics. He also worked at Excite, where he was one of the chief designers of the search engine, and at Apple Inc., where he was the primary author of the V-Twin text search framework.


Open source projects

Lucene, a search indexer, and Nutch, a spider or crawler, are the two key components of an open-source general search platform that first crawls the Web for content and then structures it into a searchable index. Cutting's leadership of these two projects extended the concepts and capabilities of general open-source software projects such as Linux and MySQL into the vertical domain of search. In a 2017 article, Cutting was quoted as saying, "Open source is a requirement for business."


Use of MapReduce paradigm

In December 2004, Google Research published a paper on the MapReduce programming model, which allows very large-scale computations to be parallelized across large clusters of servers. Cutting and Mike Cafarella, realizing the importance of this paper to extending Lucene into the realm of extremely large search problems, created the open-source Hadoop framework. This framework allows applications based on the MapReduce paradigm to be run on large clusters of commodity hardware. Cutting was an employee of Yahoo!, where he led the Hadoop project full-time; he later went on to work for Cloudera.
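
The MapReduce paradigm described above can be illustrated with a small word-count sketch. This is not Hadoop's API and not code from the Google paper; it runs in a single Python process, and the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical, chosen only for illustration. It shows the three conceptual steps: map each input to key/value pairs, group the pairs by key, and reduce each group to one result.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values under their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle, and reduce in sequence (illustrative only)."""
    mapped = chain.from_iterable(map_phase(doc) for doc in documents)
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


counts = word_count(["the quick fox", "the lazy dog", "The fox"])
print(counts)
```

Because each call to `map_phase` depends only on its own document, and each call to `reduce_phase` only on its own key, a framework is free to run those calls on different machines. That independence is what makes the model easy to parallelize across a cluster.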


Open source foundations and awards

In July 2009, Cutting was elected to the board of directors of the Apache Software Foundation, and in September 2010 he was elected chairman. In 2015, Cutting was awarded the O'Reilly Open Source Award.


Articles


Blog post by Tom White about Doug Cutting creating Hadoop. Note that this post was written while Hadoop was still an unnamed spinoff of Nutch. Tom White later updated his earlier post with the Hadoop name.

Article co-authored by Doug Cutting in ACM Queue, 'Building Nutch: Open Source Search'

