BIGTOP-1082. spark package tests are missing
[bigtop.git] / bigtop-tests / test-artifacts / package / src / main / resources / package_data.xml
1 <?xml version="1.0" encoding="UTF-8" standalone="no"?>
2 <!--
3 Licensed to the Apache Software Foundation (ASF) under one or more
4 contributor license agreements. See the NOTICE file distributed with
5 this work for additional information regarding copyright ownership.
6 The ASF licenses this file to You under the Apache License, Version 2.0
7 (the "License"); you may not use this file except in compliance with
8 the License. You may obtain a copy of the License at
9
10 http://www.apache.org/licenses/LICENSE-2.0
11
12 Unless required by applicable law or agreed to in writing, software
13 distributed under the License is distributed on an "AS IS" BASIS,
14 WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
15 See the License for the specific language governing permissions and
16 limitations under the License.
17 -->
18 <packages>
19 <bigtop-utils>
20 <metadata>
21 <summary>Collection of useful tools for Bigtop</summary>
22 <description>This includes a collection of useful tools and files for Bigtop</description>
23 <url>http://bigtop.apache.org/</url>
24 </metadata>
25 </bigtop-utils>
26 <bigtop-jsvc>
27 <metadata>
28 <summary>Application to launch java daemon</summary>
29 <description>jsvc executes classfile that implements a Daemon interface.</description>
30 <url>http://commons.apache.org/daemon/</url>
31 </metadata>
32 </bigtop-jsvc>
33 <bigtop-tomcat>
34 <metadata>
35 <summary>Apache Tomcat</summary>
36 <description>Apache Tomcat is an open source software implementation of the
37 Java Servlet and JavaServer Pages technologies.</description>
38 <url>http://tomcat.apache.org/</url>
39 </metadata>
40 </bigtop-tomcat>
41 <mahout>
42 <metadata>
43 <summary>A set of Java libraries for scalable machine learning.</summary>
44 <description>Mahout's goal is to build scalable machine learning libraries.
45 With scalable we mean:
46 .
47 Scalable to reasonably large data sets. Our core algorithms for clustering,
48 classfication and batch based collaborative filtering are implemented on top of
49 Apache Hadoop using the map/reduce paradigm. However we do not restrict
50 contributions to Hadoop based implementations: Contributions that run on a
51 single node or on a non-Hadoop cluster are welcome as well. The core libraries
52 are highly optimized to allow for good performance also for non-distributed
53 algorithms.
54 Scalable to support your business case. Mahout is distributed under a
55 commercially friendly Apache Software license.
56 Scalable community. The goal of Mahout is to build a vibrant, responsive,
57 diverse community to facilitate discussions not only on the project itself but
58 also on potential use cases. Come to the mailing lists to find out more.</description>
59 <url>http://mahout.apache.org</url>
60 </metadata>
61 <deps>
62 <hadoop/>
63 <bigtop-utils/>
64 </deps>
65 <alternatives>
66 <mahout-conf>
67 <status>auto</status>
68 <link>/etc/mahout/conf</link>
69 <value>/etc/mahout/conf.dist</value>
70 <alt>/etc/mahout/conf.dist</alt>
71 </mahout-conf>
72 </alternatives>
73 </mahout>
74 <giraph>
75 <metadata>
76 <summary>Giraph is a BSP inspired graph processing platform that runs on Hadoop</summary>
77 <description>Giraph implements a graph processing platform to run large scale algorithms (such as page rank, shared connections, personalization-based popularity, etc.) on top of Hadoop infrastructure. Giraph builds upon the graph-oriented nature of Pregel but additionally adds fault-tolerance to the coordinator process with the use of ZooKeeper as its centralized coordination service.</description>
78 <url>http://incubator.apache.org/giraph/</url>
79 </metadata>
80 <deps>
81 <hadoop-client/>
82 <bigtop-utils/>
83 </deps>
84 <alternatives>
85 <giraph-conf>
86 <status>auto</status>
87 <link>/etc/giraph/conf</link>
88 <value>/etc/giraph/conf.dist</value>
89 <alt>/etc/giraph/conf.dist</alt>
90 </giraph-conf>
91 </alternatives>
92 </giraph>
93 <spark>
94 <metadata>
95 <summary>Lightning-Fast Cluster Computing</summary>
96 <description>Spark is a MapReduce-like cluster computing framework designed to support
97 low-latency iterative jobs and interactive use from an interpreter. It is
98 written in Scala, a high-level language for the JVM, and exposes a clean
99 language-integrated syntax that makes it easy to write parallel jobs.
100 Spark runs on top of the Apache Mesos cluster manager.</description>
101 <url>http://incubator.apache.org/spark/</url>
102 </metadata>
103 <deps>
104 <bigtop-utils/>
105 </deps>
106 <alternatives>
107 <spark-conf>
108 <status>auto</status>
109 <link>/etc/spark/conf</link>
110 <value>/etc/spark/conf.dist</value>
111 <alt>/etc/spark/conf.dist</alt>
112 </spark-conf>
113 </alternatives>
114 </spark>
115 <whirr>
116 <metadata>
117 <summary>Scripts and libraries for running software services on cloud infrastructure.</summary>
118 <description>Whirr provides
119 .
120 * A cloud-neutral way to run services. You don't have to worry about the
121 idiosyncrasies of each provider.
122 * A common service API. The details of provisioning are particular to the
123 service.
124 * Smart defaults for services. You can get a properly configured system
125 running quickly, while still being able to override settings as needed.
126 </description>
127 <url>http://whirr.apache.org/</url>
128 </metadata>
129 <deps>
130 <bigtop-utils/>
131 </deps>
132 </whirr>
133 <flume>
134 <metadata>
135 <summary>Flume is a reliable, scalable, and manageable distributed log collection application for collecting data such as logs and delivering it to data stores such as Hadoop's HDFS.</summary>
136 <description>Flume is a reliable, scalable, and manageable distributed data collection
137 application for collecting data such as logs and delivering it to data stores
138 such as Hadoop's HDFS. It can efficiently collect, aggregate, and move large
139 amounts of log data. It has a simple, but flexible, architecture based on
140 streaming data flows. It is robust and fault tolerant with tunable reliability
141 mechanisms and many failover and recovery mechanisms. The system is centrally
142 managed and allows for intelligent dynamic management. It uses a simple
143 extensible data model that allows for online analytic applications.</description>
144 <url>http://incubator.apache.org/projects/flume.html</url>
145 </metadata>
146 <deps>
147 <zookeeper/>
148 <hadoop/>
149 <bigtop-utils/>
150 </deps>
151 <groups>
152 <flume>
153 <user>flume</user>
154 </flume>
155 </groups>
156 <alternatives>
157 <flume-conf>
158 <status>auto</status>
159 <link>/etc/flume/conf</link>
160 <value>/etc/flume/conf.empty</value>
161 <alt>/etc/flume/conf.empty</alt>
162 </flume-conf>
163 </alternatives>
164 </flume>
165 <flume-agent>
166 <metadata>
167 <summary>The flume agent daemon is a core element of flume's data path and is responsible for generating, processing, and delivering data.</summary>
168 <description>Flume is a reliable, scalable, and manageable distributed data collection application for collecting data such as logs and delivering it to data stores such as Hadoop's HDFS. It can efficiently collect, aggregate, and move large amounts of log data. It has a simple, but flexible, architecture based on streaming data flows. It is robust and fault tolerant with tunable reliability mechanisms and many failover and recovery mechanisms. The system is centrally managed and allows for intelligent dynamic management. It uses a simple extensible data model that allows for online analytic applications.</description>
169 <url>http://incubator.apache.org/projects/flume.html</url>
170 </metadata>
171 <deps>
172 <flume>/self</flume>
173 </deps>
174 </flume-agent>
175 <solr>
176 <metadata>
177 <summary>Apache Solr is the popular, blazing fast open source enterprise search platform</summary>
178 <description>Solr is the popular, blazing fast open source enterprise search platform from
179 the Apache Lucene project. Its major features include powerful full-text
180 search, hit highlighting, faceted search, dynamic clustering, database
181 integration, rich document (e.g., Word, PDF) handling, and geospatial search.
182 Solr is highly scalable, providing distributed search and index replication,
183 and it powers the search and navigation features of many of the world's
184 largest internet sites.
185
186 Solr is written in Java and runs as a standalone full-text search server within
187 a servlet container such as Tomcat. Solr uses the Lucene Java search library at
188 its core for full-text indexing and search, and has REST-like HTTP/XML and JSON
189 APIs that make it easy to use from virtually any programming language. Solr's
190 powerful external configuration allows it to be tailored to almost any type of
191 application without Java coding, and it has an extensive plugin architecture
192 when more advanced customization is required.</description>
193 <url>http://lucene.apache.org/solr</url>
194 </metadata>
195 <deps>
196 <bigtop-tomcat/>
197 <bigtop-utils/>
198 </deps>
199 <alternatives>
200 </alternatives>
201 </solr>
202 <solr-doc>
203 <metadata>
204 <summary>Documentation for Apache Solr</summary>
205 <description>Documentation for Apache Solr</description>
206 <url>http://lucene.apache.org/solr</url>
207 </metadata>
208 <deps>
209 </deps>
210 <alternatives>
211 </alternatives>
212 </solr-doc>
213 <solr-server>
214 <metadata>
215 <summary>The Solr server</summary>
216 <description>This package starts the Solr server on startup</description>
217 <url>http://lucene.apache.org/solr</url>
218 </metadata>
219 <deps>
220 </deps>
221 <alternatives>
222 </alternatives>
223 </solr-server>
224 <sqoop>
225 <metadata>
226 <summary>Tool for easy imports and exports of data sets between databases and the Hadoop ecosystem</summary>
227 <description>Sqoop is a tool that provides the ability to import and export data sets between the Hadoop Distributed File System (HDFS) and relational databases.</description>
228 <url>http://sqoop.apache.org</url>
229 </metadata>
230 <deps>
231 <bigtop-tomcat/>
232 <bigtop-utils/>
233 <hadoop-client/>
234 <sqoop-client>/self</sqoop-client>
235 </deps>
236 <alternatives>
237 <sqoop-conf>
238 <status>auto</status>
239 <link>/etc/sqoop/conf</link>
240 <value>/etc/sqoop/conf.dist</value>
241 <alt>/etc/sqoop/conf.dist</alt>
242 </sqoop-conf>
243 </alternatives>
244 <groups>
245 <sqoop>
246 <user>sqoop</user>
247 </sqoop>
248 </groups>
249 </sqoop>
250 <sqoop-server>
251 <metadata>
252 <summary>Server for Sqoop.</summary>
253 <description>Centralized server for Sqoop.</description>
254 <url>http://sqoop.apache.org</url>
255 </metadata>
256 <deps>
257 <sqoop>/self</sqoop>
258 </deps>
259 </sqoop-server>
260 <sqoop-client>
261 <metadata>
262 <summary>Client for Sqoop.</summary>
263 <description>Lightweight client for Sqoop.</description>
264 <url>http://sqoop.apache.org</url>
265 </metadata>
266 </sqoop-client>
267 <oozie>
268 <metadata>
269 <summary>Oozie is a system that runs workflows of Hadoop jobs.</summary>
270 <description> Oozie is a system that runs workflows of Hadoop jobs.
271 Oozie workflows are actions arranged in a control dependency DAG (Direct
272 Acyclic Graph).
273
274 Oozie coordinator functionality allows to start workflows at regular
275 frequencies and when data becomes available in HDFS.
276
277 An Oozie workflow may contain the following types of actions nodes:
278 map-reduce, map-reduce streaming, map-reduce pipes, pig, file-system,
279 sub-workflows, java, hive, sqoop and ssh (deprecated).
280
281 Flow control operations within the workflow can be done using decision,
282 fork and join nodes. Cycles in workflows are not supported.
283
284 Actions and decisions can be parameterized with job properties, actions
285 output (i.e. Hadoop counters) and HDFS file information (file exists,
286 file size, etc). Formal parameters are expressed in the workflow definition
287 as ${VARIABLE NAME} variables.
288
289 A Workflow application is an HDFS directory that contains the workflow
290 definition (an XML file), all the necessary files to run all the actions:
291 JAR files for Map/Reduce jobs, shells for streaming Map/Reduce jobs, native
292 libraries, Pig scripts, and other resource files.
293
294 Running workflow jobs is done via command line tools, a WebServices API
295 or a Java API.
296
297 Monitoring the system and workflow jobs can be done via a web console, the
298 command line tools, the WebServices API and the Java API.
299
300 Oozie is a transactional system and it has built in automatic and manual
301 retry capabilities.
302
303 In case of workflow job failure, the workflow job can be rerun skipping
304 previously completed actions, the workflow application can be patched before
305 being rerun.</description>
306 <url>http://incubator.apache.org/oozie/</url>
307 </metadata>
308 <deps>
309 <oozie-client>/self</oozie-client>
310 </deps>
311 <groups>
312 <oozie>
313 <user>oozie</user>
314 </oozie>
315 </groups>
316 <alternatives>
317 <oozie-conf>
318 <status>auto</status>
319 <link>/etc/oozie/conf</link>
320 <value>/etc/oozie/conf.dist</value>
321 <alt>/etc/oozie/conf.dist</alt>
322 </oozie-conf>
323 </alternatives>
324 </oozie>
325 <oozie-client>
326 <metadata>
327 <summary>Client for Oozie Workflow Engine</summary>
328 <description>Oozie client is a command line client utility that allows remote
329 administration and monitoring of worflows. Using this client
330 utility you can submit worflows, start/suspend/resume/kill
331 workflows and find out their status at any instance. Apart from
332 such operations, you can also change the status of the entire
333 system, get vesion information. This client utility also allows
334 you to validate any worflows before they are deployed to the Oozie
335 server.</description>
336 <url>http://incubator.apache.org/oozie/</url>
337 </metadata>
338 <deps>
339 <hadoop/>
340 <bigtop-utils/>
341 </deps>
342 </oozie-client>
343 <zookeeper>
344 <metadata>
345 <summary>A high-performance coordination service for distributed applications.</summary>
346 <description>ZooKeeper is a centralized service for maintaining configuration information, naming, providing distributed synchronization, and providing group services. All of these kinds of services are used in some form or another by distributed applications. Each time they are implemented there is a lot of work that goes into fixing the bugs and race conditions that are inevitable. Because of the difficulty of implementing these kinds of services, applications initially usually skimp on them ,which make them brittle in the presence of change and difficult to manage. Even when done correctly, different implementations of these services lead to management complexity when the applications are deployed.
347 </description>
348 <url>http://zookeeper.apache.org/</url>
349 </metadata>
350 <deps>
351 <bigtop-utils/>
352 </deps>
353 <groups>
354 <zookeeper>
355 <user>zookeeper</user>
356 </zookeeper>
357 </groups>
358 <alternatives>
359 <zookeeper-conf>
360 <status>auto</status>
361 <link>/etc/zookeeper/conf</link>
362 <value>/etc/zookeeper/conf.dist</value>
363 <alt>/etc/zookeeper/conf.dist</alt>
364 </zookeeper-conf>
365 </alternatives>
366 </zookeeper>
367 <zookeeper-server>
368 <metadata>
369 <summary>The Hadoop Zookeeper server</summary>
370 <description>This package starts the zookeeper server on startup</description>
371 <url>http://zookeeper.apache.org/</url>
372 </metadata>
373 <deps>
374 <zookeeper>/self</zookeeper>
375 </deps>
376 </zookeeper-server>
377 <crunch>
378 <metadata>
379 <summary>Simple and Efficient MapReduce Pipelines.</summary>
380 <description>Apache Crunch (incubating) is a Java library for writing, testing, and running
381 MapReduce pipelines, based on Google's FlumeJava. Its goal is to make
382 pipelines that are composed of many user-defined functions simple to write,
383 easy to test, and efficient to run.</description>
384 <url>http://incubator.apache.org/crunch/</url>
385 </metadata>
386 </crunch>
387 <crunch-doc>
388 <metadata>
389 <summary>Apache Crunch (incubating) documentation</summary>
390 <description>Apache Crunch (incubating) documentation</description>
391 <url>http://incubator.apache.org/crunch/</url>
392 </metadata>
393 </crunch-doc>
394 <hcatalog>
395 <metadata>
396 <summary>Apache Hcatalog (incubating) is a data warehouse infrastructure built on top of Hadoop</summary>
397 <description>
398 Apache HCatalog (incubating) is a table and storage management service for data created using Apache Hadoop.
399 This includes:
400 * Providing a shared schema and data type mechanism.
401 * Providing a table abstraction so that users need not be concerned with where or how their data is stored.
402 * Providing interoperability across data processing tools such as Pig, Map Reduce, Streaming, and Hive.
403 </description>
404 <url>http://incubator.apache.org/hcatalog</url>
405 </metadata>
406 <deps>
407 <hadoop/>
408 <bigtop-utils/>
409 <hive/>
410 </deps>
411 <alternatives>
412 <hcatalog-conf>
413 <status>auto</status>
414 <value>/etc/hcatalog/conf.dist</value>
415 <link>/etc/hcatalog/conf</link>
416 <alt>/etc/hcatalog/conf.dist</alt>
417 </hcatalog-conf>
418 </alternatives>
419 </hcatalog>
420 <hcatalog-server>
421 <metadata>
422 <summary>Server for HCatalog.</summary>
423 <description>Server for HCatalog.</description>
424 <url>http://incubator.apache.org/hcatalog</url>
425 </metadata>
426 <deps>
427 <hcatalog>/self</hcatalog>
428 </deps>
429 </hcatalog-server>
430 <pig>
431 <metadata>
432 <summary>Pig is a platform for analyzing large data sets</summary>
433 <description>Pig is a platform for analyzing large data sets that consists of a high-level language
434 for expressing data analysis programs, coupled with infrastructure for evaluating these
435 programs. The salient property of Pig programs is that their structure is amenable
436 to substantial parallelization, which in turns enables them to handle very large data sets.
437 .
438 At the present time, Pig's infrastructure layer consists of a compiler that produces
439 sequences of Map-Reduce programs, for which large-scale parallel implementations already
440 exist (e.g., the Hadoop subproject). Pig's language layer currently consists of a textual
441 language called Pig Latin, which has the following key properties:
442 .
443 * Ease of programming
444 It is trivial to achieve parallel execution of simple, "embarrassingly parallel" data
445 analysis tasks. Complex tasks comprised of multiple interrelated data transformations
446 are explicitly encoded as data flow sequences, making them easy to write, understand,
447 and maintain.
448 * Optimization opportunities
449 The way in which tasks are encoded permits the system to optimize their execution
450 automatically, allowing the user to focus on semantics rather than efficiency.
451 * Extensibility
452 Users can create their own functions to do special-purpose processing.</description>
453 <url>http://pig.apache.org/</url>
454 </metadata>
455 <deps>
456 <hadoop/>
457 <bigtop-utils/>
458 </deps>
459 <alternatives>
460 <pig-conf>
461 <status>auto</status>
462 <link>/etc/pig/conf</link>
463 <value>/etc/pig/conf.dist</value>
464 <alt>/etc/pig/conf.dist</alt>
465 </pig-conf>
466 </alternatives>
467 </pig>
468 <pig-udf-datafu>
469 <metadata>
470 <summary>A collection of user-defined functions for Hadoop and Pig.</summary>
471 <description> DataFu is a collection of user-defined functions for working with large-scale
472 data in Hadoop and Pig. This library was born out of the need for a stable,
473 well-tested library of UDFs for data mining and statistics. It is used
474 at LinkedIn in many of our off-line workflows for data derived products like
475 "People You May Know" and "Skills".
476
477 It contains functions for: PageRank, Quantiles (median), variance, Sessionization,
478 Convenience bag functions (e.g., set operations, enumerating bags, etc),
479 Convenience utility functions (e.g., assertions, easier writing of EvalFuncs)
480 and more...</description>
481 <url>https://github.com/linkedin/datafu</url>
482 </metadata>
483 <deps>
484 <pig/>
485 </deps>
486 </pig-udf-datafu>
487 <hive-jdbc>
488 <metadata>
489 <summary>Provides libraries necessary to connect to Apache Hive via JDBC</summary>
490 <description>This package provides libraries necessary to connect to Apache Hive via JDBC</description>
491 <url>http://hive.apache.org/</url>
492 </metadata>
493 <deps>
494 <hadoop-client/>
495 </deps>
496 </hive-jdbc>
497 <hive>
498 <metadata>
499 <summary>Hive is a data warehouse infrastructure built on top of Hadoop</summary>
500 <description>Hive is a data warehouse infrastructure built on top of Hadoop that
501 provides tools to enable easy data summarization, adhoc querying and
502 analysis of large datasets data stored in Hadoop files. It provides a
503 mechanism to put structure on this data and it also provides a simple
504 query language called Hive QL which is based on SQL and which enables
505 users familiar with SQL to query this data. At the same time, this
506 language also allows traditional map/reduce programmers to be able to
507 plug in their custom mappers and reducers to do more sophisticated
508 analysis which may not be supported by the built-in capabilities of
509 the language.</description>
510 <url>http://hive.apache.org/</url>
511 </metadata>
512 <deps>
513 <hadoop/>
514 <bigtop-utils/>
515 <hive-jdbc>/self</hive-jdbc>
516 </deps>
517 <alternatives>
518 <hive-conf>
519 <status>auto</status>
520 <value>/etc/hive/conf.dist</value>
521 <link>/etc/hive/conf</link>
522 <alt>/etc/hive/conf.dist</alt>
523 </hive-conf>
524 </alternatives>
525 </hive>
526 <hive-metastore>
527 <metadata>
528 <summary>Shared metadata repository for Hive.</summary>
529 <description>This optional package hosts a metadata server for Hive clients across a network to use.</description>
530 <url>http://hive.apache.org/</url>
531 </metadata>
532 <deps>
533 <hive>/self</hive>
534 </deps>
535 <groups>
536 <hive>
537 <user>hive</user>
538 </hive>
539 </groups>
540 </hive-metastore>
541 <hive-server>
542 <metadata>
543 <summary>Provides a Hive Thrift service.</summary>
544 <description>This optional package hosts a Thrift server for Hive clients across a network to use.</description>
545 <url>http://hive.apache.org/</url>
546 </metadata>
547 <deps>
548 <hive>/self</hive>
549 </deps>
550 <groups>
551 <hive>
552 <user>hive</user>
553 </hive>
554 </groups>
555 </hive-server>
556 <hive-hbase>
557 <metadata>
558 <summary>Provides integration between Apache HBase and Apache Hive</summary>
559 <description>This optional package provides integration between Apache HBase and Apache Hive</description>
560 <url>http://hive.apache.org/</url>
561 </metadata>
562 <deps>
563 <hive>/self</hive>
564 <hbase/>
565 </deps>
566 </hive-hbase>
567 <webhcat>
568 <metadata>
569 <summary>WEBHcat provides a REST-like web API for HCatalog and related Hadoop components.</summary>
570 <description>
571 WEBHcat provides a REST-like web API for HCatalog and related Hadoop components.
572 </description>
573 <url>http://incubator.apache.org/hcatalog</url>
574 </metadata>
575 <deps>
576 <hcatalog>/self</hcatalog>
577 </deps>
578 <alternatives>
579 <webhcat-conf>
580 <status>auto</status>
581 <value>/etc/webhcat/conf.dist</value>
582 <link>/etc/webhcat/conf</link>
583 <alt>/etc/webhcat/conf.dist</alt>
584 </webhcat-conf>
585 </alternatives>
586 </webhcat>
587 <webhcat-server>
588 <metadata>
589 <summary>Server for WEBHcat.</summary>
590 <description>Server for WEBHcat.</description>
591 <url>http://incubator.apache.org/hcatalog</url>
592 </metadata>
593 <deps>
594 <webhcat>/self</webhcat>
595 </deps>
596 </webhcat-server>
597 <hbase>
598 <metadata>
599 <summary>HBase is the Hadoop database. Use it when you need random, realtime read/write access to your Big Data. This project's goal is the hosting of very large tables -- billions of rows X millions of columns -- atop clusters of commodity hardware.</summary>
600 <description>HBase is an open-source, distributed, column-oriented store modeled after Google' Bigtable: A Distributed Storage System for Structured Data by Chang et al. Just as Bigtable leverages the distributed data storage provided by the Google File System, HBase provides Bigtable-like capabilities on top of Hadoop. HBase includes:
601
602 * Convenient base classes for backing Hadoop MapReduce jobs with HBase tables
603 * Query predicate push down via server side scan and get filters
604 * Optimizations for real time queries
605 * A high performance Thrift gateway
606 * A REST-ful Web service gateway that supports XML, Protobuf, and binary data encoding options
607 * Cascading source and sink modules
608 * Extensible jruby-based (JIRB) shell
609 * Support for exporting metrics via the Hadoop metrics subsystem to files or Ganglia; or via JMX</description>
610 <url>http://hbase.apache.org/</url>
611 </metadata>
612 <deps>
613 <zookeeper/>
614 <hadoop/>
615 <bigtop-utils/>
616 </deps>
617 <alternatives>
618 <hbase-conf>
619 <status>auto</status>
620 <value>/etc/hbase/conf.dist</value>
621 <link>/etc/hbase/conf</link>
622 <alt>/etc/hbase/conf.dist</alt>
623 </hbase-conf>
624 </alternatives>
625 <groups>
626 <hbase>
627 <user>hbase</user>
628 </hbase>
629 </groups>
630 </hbase>
631 <hbase-doc>
632 <metadata>
633 <summary>Hbase Documentation</summary>
634 <description>Documentation for Hbase</description>
635 <url>http://hbase.apache.org/</url>
636 </metadata>
637 </hbase-doc>
638 <hbase-master>
639 <metadata>
640 <summary>The Hadoop HBase master Server.</summary>
641 <description>HMaster is the "master server" for a HBase. There is only one HMaster for a single HBase deployment.</description>
642 <url>http://hbase.apache.org/</url>
643 </metadata>
644 <deps>
645 <hbase>/self</hbase>
646 </deps>
647 </hbase-master>
648 <hbase-regionserver>
649 <metadata>
650 <summary>The Hadoop HBase RegionServer server.</summary>
651 <description>HRegionServer makes a set of HRegions available to clients. It checks in with the HMaster. There are many HRegionServers in a single HBase deployment.</description>
652 <url>http://hbase.apache.org/</url>
653 </metadata>
654 <deps>
655 <hbase>/self</hbase>
656 </deps>
657 </hbase-regionserver>
658 <hbase-thrift>
659 <metadata>
660 <summary>The Hadoop HBase Thrift Interface</summary>
661 <description>ThriftServer - this class starts up a Thrift server which implements the Hbase API specified in the Hbase.thrift IDL file.
662 "Thrift is a software framework for scalable cross-language services development. It combines a powerful software stack with a code generation engine to build services that work efficiently and seamlessly between C++, Java, Python, PHP, and Ruby. Thrift was developed at Facebook, and we are now releasing it as open source." For additional information, see http://developers.facebook.com/thrift/. Facebook has announced their intent to migrate Thrift into Apache Incubator.</description>
663 <url>http://hbase.apache.org/</url>
664 </metadata>
665 <deps>
666 <hbase>/self</hbase>
667 </deps>
668 </hbase-thrift>
669 <hbase-rest>
670 <metadata>
671 <summary>The Apache HBase REST gateway</summary>
672 <description>The Apache HBase REST gateway</description>
673 <url>http://hbase.apache.org/</url>
674 </metadata>
675 <deps>
676 <hbase>/self</hbase>
677 </deps>
678 </hbase-rest>
679 <hadoop>
680 <metadata>
681 <summary>Hadoop is a software platform for processing vast amounts of data</summary>
682 <description>Hadoop is a software platform that lets one easily write and
683 run applications that process vast amounts of data.
684
685 Here's what makes Hadoop especially useful:
686 * Scalable: Hadoop can reliably store and process petabytes.
687 * Economical: It distributes the data and processing across clusters
688 of commonly available computers. These clusters can number
689 into the thousands of nodes.
690 * Efficient: By distributing the data, Hadoop can process it in parallel
691 on the nodes where the data is located. This makes it
692 extremely rapid.
693 * Reliable: Hadoop automatically maintains multiple copies of data and
694 automatically redeploys computing tasks based on failures.
695
696 Hadoop implements MapReduce, using the Hadoop Distributed File System (HDFS).
697 MapReduce divides applications into many small blocks of work. HDFS creates
698 multiple replicas of data blocks for reliability, placing them on compute
699 nodes around the cluster. MapReduce can then process the data where it is
700 located.</description>
701 <url>http://hadoop.apache.org/core/</url>
702 </metadata>
703 <deps>
704 <bigtop-utils/>
705 </deps>
706 <alternatives>
707 <hadoop-conf>
708 <status>auto</status>
709 <link>/etc/hadoop/conf</link>
710 <value>/etc/hadoop/conf.empty</value>
711 <alt>/etc/hadoop/conf.empty</alt>
712 </hadoop-conf>
713 </alternatives>
714 </hadoop>
715 <hadoop-hdfs>
716 <metadata>
717 <summary>The Hadoop Distributed File System</summary>
718 <description>Hadoop Distributed File System (HDFS) is the primary storage system used by
719 Hadoop applications. HDFS creates multiple replicas of data blocks and distributes
720 them on compute nodes throughout a cluster to enable reliable, extremely rapid
721 computations.</description>
722 <url>http://hadoop.apache.org/core/</url>
723 </metadata>
724 <deps>
725 <hadoop>/self</hadoop>
726 <bigtop-utils/>
727 </deps>
728 <groups>
729 <hdfs>
730 <user>hdfs</user>
731 </hdfs>
732 </groups>
733 </hadoop-hdfs>
734 <hadoop-yarn>
735 <metadata>
736 <summary>The Hadoop NextGen MapReduce (YARN)</summary>
737 <description>YARN (Hadoop NextGen MapReduce) is a general purpose data-computation framework.
738 The fundamental idea of YARN is to split up the two major functionalities of the
739 JobTracker, resource management and job scheduling/monitoring, into separate daemons:
740 ResourceManager and NodeManager.
741
742 The ResourceManager is the ultimate authority that arbitrates resources among all
743 the applications in the system. The NodeManager is a per-node slave managing allocation
744 of computational resources on a single node. Both work in support of per-application
745 ApplicationMaster (AM).
746
747 An ApplicationMaster is, in effect, a framework specific library and is tasked with
748 negotiating resources from the ResourceManager and working with the NodeManager(s) to
749 execute and monitor the tasks.</description>
750 <url>http://hadoop.apache.org/core/</url>
751 </metadata>
752 <deps>
753 <hadoop>/self</hadoop>
754 <bigtop-utils/>
755 </deps>
756 <groups>
757 <yarn>
758 <user>yarn</user>
759 </yarn>
760 </groups>
761 </hadoop-yarn>
762 <hadoop-mapreduce>
763 <metadata>
764 <summary>The Hadoop MapReduce (MRv2)</summary>
765 <description>Hadoop MapReduce is a programming model and software framework for writing applications
766 that rapidly process vast amounts of data in parallel on large clusters of compute nodes.</description>
767 <url>http://hadoop.apache.org/core/</url>
768 </metadata>
769 <deps>
770 <hadoop-yarn>/self</hadoop-yarn>
771 <bigtop-utils/>
772 </deps>
773 <groups>
774 <mapred>
775 <user>mapred</user>
776 </mapred>
777 </groups>
778 </hadoop-mapreduce>
779 <hadoop-httpfs>
780 <metadata>
781 <summary>HTTPFS for Hadoop</summary>
782 <description>The server providing HTTP REST API support for the complete FileSystem/FileContext
783 interface in HDFS.</description>
784 <url>http://hadoop.apache.org/core/</url>
785 </metadata>
786 <deps>
787 <hadoop-hdfs>/self</hadoop-hdfs>
788 <bigtop-utils/>
789 </deps>
790 <groups>
791 <httpfs>
792 <user>httpfs</user>
793 </httpfs>
794 </groups>
795 </hadoop-httpfs>
796 <hadoop-hdfs-namenode>
797 <metadata>
798 <summary>The Hadoop namenode manages the block locations of HDFS files</summary>
799 <description>The Hadoop Distributed Filesystem (HDFS) requires one unique server, the
800 namenode, which manages the block locations of files on the filesystem.</description>
801 <url>http://hadoop.apache.org/core/</url>
802 </metadata>
803 <deps>
804 <hadoop-hdfs>/self</hadoop-hdfs>
805 </deps>
806 </hadoop-hdfs-namenode>
807 <hadoop-hdfs-zkfc>
808 <metadata>
809 <summary>Hadoop HDFS failover controller</summary>
810 <description>The Hadoop HDFS failover controller is a ZooKeeper client which also
811 monitors and manages the state of the NameNode. Each of the machines
812 which runs a NameNode also runs a ZKFC, and that ZKFC is responsible
813 for: Health monitoring, ZooKeeper session management, ZooKeeper-based
814 election.</description>
815 <url>http://hadoop.apache.org/core/</url>
816 </metadata>
817 <deps>
818 <hadoop-hdfs>/self</hadoop-hdfs>
819 </deps>
820 </hadoop-hdfs-zkfc>
821 <hadoop-hdfs-journalnode>
822 <metadata>
823 <summary>Hadoop HDFS JournalNode</summary>
824 <description>The HDFS JournalNode is responsible for persisting NameNode edit logs.
825 In a typical deployment the JournalNode daemon runs on at least three
826 separate machines in the cluster.</description>
827 <url>http://hadoop.apache.org/core/</url>
828 </metadata>
829 <deps>
830 <hadoop-hdfs>/self</hadoop-hdfs>
831 <hadoop>/self</hadoop>
832 </deps>
833 </hadoop-hdfs-journalnode>
834 <hadoop-hdfs-secondarynamenode>
835 <metadata>
836 <summary>Hadoop Secondary namenode</summary>
837 <description>The Secondary Name Node periodically compacts the Name Node EditLog
838 into a checkpoint. This compaction ensures that Name Node restarts
839 do not incur unnecessary downtime.</description>
840 <url>http://hadoop.apache.org/core/</url>
841 </metadata>
842 <deps>
843 <hadoop-hdfs>/self</hadoop-hdfs>
844 </deps>
845 </hadoop-hdfs-secondarynamenode>
846 <hadoop-hdfs-datanode>
847 <metadata>
848 <summary>Hadoop Data Node</summary>
849 <description>The Data Nodes in the Hadoop Cluster are responsible for serving up
850 blocks of data over the network to Hadoop Distributed Filesystem
851 (HDFS) clients.</description>
852 <url>http://hadoop.apache.org/core/</url>
853 </metadata>
854 <deps>
855 <hadoop-hdfs>/self</hadoop-hdfs>
856 </deps>
857 </hadoop-hdfs-datanode>
858 <hadoop-yarn-resourcemanager>
859 <metadata>
860 <summary>YARN Resource Manager</summary>
861 <description>The resource manager manages the global assignment of compute resources to applications</description>
862 <url>http://hadoop.apache.org/core/</url>
863 </metadata>
864 <deps>
865 <hadoop-yarn>/self</hadoop-yarn>
866 </deps>
867 </hadoop-yarn-resourcemanager>
868 <hadoop-yarn-nodemanager>
869 <metadata>
870 <summary>YARN Node Manager</summary>
871 <description>The NodeManager is the per-machine framework agent who is responsible for
872 containers, monitoring their resource usage (cpu, memory, disk, network) and
873 reporting the same to the ResourceManager/Scheduler.</description>
874 <url>http://hadoop.apache.org/core/</url>
875 </metadata>
876 <deps>
877 <hadoop-yarn>/self</hadoop-yarn>
878 </deps>
879 </hadoop-yarn-nodemanager>
880 <hadoop-yarn-proxyserver>
881 <metadata>
882 <summary>YARN Web Proxy</summary>
883 <description>The web proxy server sits in front of the YARN application master web UI.</description>
884 <url>http://hadoop.apache.org/core/</url>
885 </metadata>
886 <deps>
887 <hadoop-yarn>/self</hadoop-yarn>
888 </deps>
889 </hadoop-yarn-proxyserver>
890 <hadoop-mapreduce-historyserver>
891 <metadata>
892 <summary>MapReduce History Server</summary>
893 <description>The History server keeps records of the different activities being performed on a Apache Hadoop cluster</description>
894 <url>http://hadoop.apache.org/core/</url>
895 </metadata>
896 <deps>
897 <hadoop-mapreduce>/self</hadoop-mapreduce>
898 </deps>
899 </hadoop-mapreduce-historyserver>
900 <hadoop-conf-pseudo>
901 <metadata>
902 <summary>Pseudo-distributed Hadoop configuration</summary>
903 <description>Contains configuration files for a "pseudo-distributed" Hadoop deployment.
904 In this mode, each of the hadoop components runs as a separate Java process,
905 but all on the same machine.</description>
906 <url>http://hadoop.apache.org/core/</url>
907 </metadata>
908 <deps>
909 <hadoop>/self</hadoop>
910 <hadoop-hdfs-namenode>/self</hadoop-hdfs-namenode>
911 <hadoop-hdfs-datanode>/self</hadoop-hdfs-datanode>
912 <hadoop-hdfs-secondarynamenode>/self</hadoop-hdfs-secondarynamenode>
913 <hadoop-yarn-resourcemanager>/self</hadoop-yarn-resourcemanager>
914 <hadoop-yarn-nodemanager>/self</hadoop-yarn-nodemanager>
915 <hadoop-mapreduce-historyserver>/self</hadoop-mapreduce-historyserver>
916 </deps>
917 </hadoop-conf-pseudo>
918 <hadoop-doc>
919 <metadata>
920 <summary>Hadoop Documentation</summary>
921 <description>Documentation for Hadoop</description>
922 <url>http://hadoop.apache.org/core/</url>
923 </metadata>
924 </hadoop-doc>
925 <hadoop-client>
926 <metadata>
927 <summary>Hadoop client side dependencies</summary>
928 <description>Installation of this package will provide you with all the dependencies for Hadoop clients.</description>
929 <url>http://hadoop.apache.org/core/</url>
930 </metadata>
931 <deps>
932 <hadoop>/self</hadoop>
933 <hadoop-hdfs>/self</hadoop-hdfs>
934 <hadoop-yarn>/self</hadoop-yarn>
935 <hadoop-mapreduce>/self</hadoop-mapreduce>
936 </deps>
937 </hadoop-client>
938 <hadoop-hdfs-fuse>
939 <metadata>
940 <summary>Mountable HDFS</summary>
941 <description>These projects (enumerated below) allow HDFS to be mounted (on most flavors of Unix) as a standard file system using</description>
942 <url>http://hadoop.apache.org/core/</url>
943 </metadata>
944 <deps>
945 <hadoop-client>/self</hadoop-client>
946 <hadoop>/self</hadoop>
947 <hadoop-libhdfs>/self</hadoop-libhdfs>
948 </deps>
949 </hadoop-hdfs-fuse>
950 <hue-common>
951 <metadata>
952 <summary>A browser-based desktop interface for Hadoop</summary>
953 <description>Hue is a browser-based desktop interface for interacting with Hadoop.
954 It supports a file browser, job tracker interface, cluster health monitor, and more.</description>
955 <url>http://github.com/cloudera/hue</url>
956 </metadata>
957 <deps>
958 <hue-server>/self</hue-server>
959 <hue-beeswax>/self</hue-beeswax>
960 </deps>
961 <alternatives>
962 <hue-conf>
963 <status>auto</status>
964 <link>/etc/hue/conf</link>
965 <value>/etc/hue/conf.empty</value>
966 <alt>/etc/hue/conf.empty</alt>
967 </hue-conf>
968 </alternatives>
969 </hue-common>
970 <hue-server>
971 <metadata>
972 <summary>Service Scripts for Hue</summary>
973 <description>This package provides the service scripts for Hue server.</description>
974 <url>http://github.com/cloudera/hue</url>
975 </metadata>
976 <deps>
977 <hue-common>/self</hue-common>
978 </deps>
979 </hue-server>
980 <hue-beeswax>
981 <metadata>
982 <summary>A UI for Hive on Hue</summary>
983 <description>Beeswax is a web interface for Hive.
984
985 It allows users to construct and run queries on Hive, manage tables,
986 and import and export data.</description>
987 <url>http://github.com/cloudera/hue</url>
988 </metadata>
989 <deps>
990 <hue-common>/self</hue-common>
991 <hive/>
992 <make/>
993 </deps>
994 </hue-beeswax>
995 <hue-pig>
996 <metadata>
997 <summary>A UI for Pig on Hue</summary>
998 <description>A web interface for Pig.
999
1000 It allows users to construct and run Pig jobs.</description>
1001 <url>http://github.com/cloudera/hue</url>
1002 </metadata>
1003 <deps>
1004 <hue-common>/self</hue-common>
1005 <make/>
1006 <pig/>
1007 </deps>
1008 </hue-pig>
1009 <hue>
1010 <metadata>
1011 <summary>The hue metapackage</summary>
1012 <description>Hue is a browser-based desktop interface for interacting with Hadoop. It supports a file browser, job tracker interface, cluster health monitor, and more.</description>
1013 <url>http://github.com/cloudera/hue</url>
1014 </metadata>
1015 <deps>
1016 <hue-server>/self</hue-server>
1017 <hue-beeswax>/self</hue-beeswax>
1018 </deps>
1019 </hue>
1020 </packages>