Josh Wills [Tue, 2 Feb 2021 17:20:15 +0000 (09:20 -0800)]
Merge pull request #34 from noslowerdna/CRUNCH-698
CRUNCH-698: Inclusion of local patch for AVRO-2944
Andrew Olson [Tue, 2 Feb 2021 16:38:53 +0000 (10:38 -0600)]
CRUNCH-698: Inclusion of local patch for AVRO-2944
Josh Wills [Tue, 12 May 2020 18:41:24 +0000 (11:41 -0700)]
Merge pull request #33 from ben-roling/CRUNCH-696
CRUNCH-696 update FormatBundle.readFields() compatibility
Ben Roling [Tue, 12 May 2020 17:24:21 +0000 (12:24 -0500)]
Update crunch-core/src/main/java/org/apache/crunch/io/FormatBundle.java
Co-authored-by: Andrew Olson <930946+noslowerdna@users.noreply.github.com>
Ben Roling [Mon, 11 May 2020 17:02:19 +0000 (12:02 -0500)]
CRUNCH-696 update FormatBundle.readFields() compatibility
Make FormatBundle.readFields() compatible with FormatBundles serialized
with an older version of Crunch. This ensures jobs don't fail during an
upgrade to a cluster-provided Crunch dependency. Without this, some jobs
are submitted without the filesystem field in the serialized
FormatBundle and then hit an EOFException when the job is
scheduled to run and the newer Crunch deserializes the FormatBundle.
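The failure mode described above can be illustrated with a minimal sketch. This is not the actual Crunch patch: CompatBundle, formatName, fileSystemUri, writeOld, writeNew and read are hypothetical names, and the sketch assumes the bundle is the last item in its stream, so a missing trailing field shows up as an EOFException that the reader can tolerate.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.UncheckedIOException;

/** Hypothetical stand-in for a serialized bundle; not the real FormatBundle. */
class CompatBundle {
    String formatName;    // present in both the old and the new layout
    String fileSystemUri; // added in the new layout; null when absent

    /** Reads the old fields, then tolerates a missing trailing field. */
    void readFields(DataInput in) throws IOException {
        formatName = in.readUTF();
        try {
            boolean hasFileSystem = in.readBoolean();
            fileSystemUri = hasFileSystem ? in.readUTF() : null;
        } catch (EOFException e) {
            // The payload came from an older writer that never wrote this field.
            fileSystemUri = null;
        }
    }

    /** Serializes in the old layout, without the trailing field. */
    static byte[] writeOld(String formatName) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeUTF(formatName);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    /** Serializes in the new layout, with a presence flag and the new field. */
    static byte[] writeNew(String formatName, String fileSystemUri) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeUTF(formatName);
            out.writeBoolean(fileSystemUri != null);
            if (fileSystemUri != null) {
                out.writeUTF(fileSystemUri);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    /** Deserializes a bundle from either layout. */
    static CompatBundle read(byte[] data) {
        CompatBundle bundle = new CompatBundle();
        try {
            bundle.readFields(new DataInputStream(new ByteArrayInputStream(data)));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bundle;
    }
}
```

With this shape, CompatBundle.read(CompatBundle.writeOld("avro")) yields a bundle whose fileSystemUri is null instead of failing, while payloads written in the new layout round-trip unchanged.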
Andrew Olson [Wed, 25 Mar 2020 16:14:37 +0000 (11:14 -0500)]
CRUNCH-695: Fix NullPointerException in RegionLocationTable (#32)
Co-authored-by: Andrew Olson <aolson1@cerner.com>
Jan Van Besien [Fri, 21 Feb 2020 13:22:59 +0000 (14:22 +0100)]
Update to Kafka 2.2.1
Remove duplication of KafkaData, KafkaSource, KafkaInputFormat in order to only retain
the variants from org.apache.crunch.kafka.record that were already mostly compatible
with Kafka 2.2.1. Fix some remaining incompatibilities, in particular related to reading
offset information from the broker.
Signed-off-by: Josh Wills <jwills@apache.org>
Josh Wills [Thu, 16 Jan 2020 00:57:42 +0000 (16:57 -0800)]
Merge pull request #30 from apache/jwills_great_version_upgrade
The great version upgrade PR
Josh Wills [Tue, 14 Jan 2020 23:14:17 +0000 (15:14 -0800)]
Fixup duplicate hadoop-hdfs dep
Josh Wills [Tue, 14 Jan 2020 23:08:40 +0000 (15:08 -0800)]
Merge pull request #31 from apache/CRUNCH-693
CRUNCH-693: Make text parsing locale-independent
Gabriel Reid [Sat, 11 Jan 2020 15:35:20 +0000 (16:35 +0100)]
CRUNCH-693: Make text parsing locale-independent
Standardize on US-based locale for number formatting (which is
backwards-compatible with historical behavior).
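As a rough illustration of what standardizing on a US-based locale means in Java, the sketch below pins both parsing and formatting to Locale.US so the result does not depend on the JVM default locale. UsLocaleNumbers and its methods are hypothetical names chosen for this example and are not taken from the Crunch code.

```java
import java.text.NumberFormat;
import java.text.ParseException;
import java.util.Locale;

/** Hypothetical helper that fixes number handling to the US locale. */
class UsLocaleNumbers {
    /** Parses text such as "1,234.5" using US grouping and decimal separators. */
    static double parse(String text) {
        try {
            return NumberFormat.getInstance(Locale.US).parse(text).doubleValue();
        } catch (ParseException e) {
            throw new IllegalArgumentException("Not a US-formatted number: " + text, e);
        }
    }

    /** Formats with two decimals and a '.' separator regardless of the default locale. */
    static String format(double value) {
        return String.format(Locale.US, "%.2f", value);
    }
}
```

Passing the locale explicitly is the key design choice here: the same calls without a Locale argument fall back to the JVM default, which can differ between a developer machine and cluster nodes.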
Josh Wills [Fri, 15 Nov 2019 00:48:33 +0000 (16:48 -0800)]
Fix unnecessary stubbings in the kafka test suite
Josh Wills [Fri, 15 Nov 2019 00:34:52 +0000 (16:34 -0800)]
oops should have fixed that one
Josh Wills [Fri, 15 Nov 2019 00:22:49 +0000 (16:22 -0800)]
mostly kafka fixes; some jackson fixes
Josh Wills [Thu, 14 Nov 2019 21:50:59 +0000 (13:50 -0800)]
and more fixes
Josh Wills [Thu, 14 Nov 2019 21:50:15 +0000 (13:50 -0800)]
Ever more fixes
Josh Wills [Wed, 13 Nov 2019 23:14:51 +0000 (15:14 -0800)]
WIP for modernizing Crunch deps
Josh Wills [Tue, 8 Oct 2019 23:24:41 +0000 (16:24 -0700)]
[maven-release-plugin] prepare for next development iteration
Josh Wills [Tue, 8 Oct 2019 23:24:23 +0000 (16:24 -0700)]
[maven-release-plugin] prepare branch apache-crunch-1.0
Josh Wills [Tue, 8 Oct 2019 17:19:03 +0000 (10:19 -0700)]
Wire up crunch-kafka to work as part of the distribution/release
process.
Josh Wills [Mon, 23 Sep 2019 17:57:52 +0000 (10:57 -0700)]
CRUNCH-670: Make AvroPathPerKeyTarget work with the Spark Runtime.
Josh Wills [Fri, 2 Aug 2019 23:12:03 +0000 (16:12 -0700)]
Merge pull request #27 from noslowerdna/CRUNCH-688
CRUNCH-688: Fix HFile node affinity for non-default namespace HBase t…
Andrew Olson [Fri, 2 Aug 2019 21:47:09 +0000 (16:47 -0500)]
CRUNCH-688: Fix HFile node affinity for non-default namespace HBase tables
Andrew Olson [Mon, 15 Jul 2019 16:42:30 +0000 (11:42 -0500)]
CRUNCH-679: Improvements for usage of DistCp (#20)
* CRUNCH-679: Improvements for usage of DistCp
* CRUNCH-679: Fix NPE bug by preserving IOUtils.cleanup logic
* CRUNCH-679: CrunchRenameCopyListing's constructor needs to be public
* CRUNCH-679: Unset rename configuration after loading into copy listing
* CRUNCH-679: Reduce default max distcp map tasks from 1000 to 100
* CRUNCH-679: Update log message formatting
Andrew Olson [Fri, 12 Jul 2019 21:43:19 +0000 (16:43 -0500)]
CRUNCH-681: Updating HFileUtils to accept a filesystem parameter for … (#22)
* CRUNCH-681: Updating HFileUtils to accept a filesystem parameter for targets and sources
* CRUNCH-681: Add and update javadoc
Ben Roling [Fri, 12 Jul 2019 21:36:57 +0000 (16:36 -0500)]
CRUNCH-685 Use whitelist and blacklist for .fileSystem() properties (#25)
* CRUNCH-685 Use whitelist and blacklist for .fileSystem() properties
* CRUNCH-685 fix noisy logging
* CRUNCH-686 Fix FormatBundle to hide redacted properties
Ben Roling [Fri, 12 Jul 2019 21:30:24 +0000 (16:30 -0500)]
CRUNCH-683 avoid unnecessary listStatus() calls from getPathSize() (#26)
Suyash Agarwal [Wed, 3 Jul 2019 02:35:47 +0000 (08:05 +0530)]
CRUNCH-635: Output path per key for Text target
Signed-off-by: Josh Wills <jwills@apache.org>
Andrew Olson [Thu, 2 May 2019 12:36:34 +0000 (07:36 -0500)]
CRUNCH-684: Fix NullPointerException
Signed-off-by: Josh Wills <jwills@apache.org>
Andrew Olson [Wed, 1 May 2019 21:20:17 +0000 (16:20 -0500)]
CRUNCH-684: Fix .equals and .hashCode for Targets
Signed-off-by: Josh Wills <jwills@apache.org>
Andrew Olson [Thu, 18 Apr 2019 20:54:47 +0000 (15:54 -0500)]
CRUNCH-681: Add and update javadoc
Signed-off-by: Josh Wills <jwills@apache.org>
Andrew Olson [Thu, 18 Apr 2019 15:26:48 +0000 (10:26 -0500)]
CRUNCH-681: Updating HFileUtils to accept a filesystem parameter for targets and sources
Signed-off-by: Josh Wills <jwills@apache.org>
Micah Whitacre [Fri, 1 Mar 2019 16:40:48 +0000 (10:40 -0600)]
Merge pull request #21 from noslowerdna/CRUNCH-680
CRUNCH-680: Kafka Source should split very large partitions
Andrew Olson [Fri, 22 Feb 2019 19:34:32 +0000 (13:34 -0600)]
CRUNCH-680: Kafka Source should split very large partitions
Micah Whitacre [Tue, 26 Feb 2019 16:27:37 +0000 (10:27 -0600)]
Merge pull request #19 from ben-roling/CRUNCH-677_master2
CRUNCH-677 Source and Target accept FileSystem
Ben Roling [Thu, 21 Feb 2019 17:17:25 +0000 (11:17 -0600)]
CRUNCH-677 fix merge mistakes
Ben Roling [Wed, 20 Feb 2019 17:42:24 +0000 (11:42 -0600)]
CRUNCH-677 Source and Target accept FileSystem
Andrew Olson [Tue, 19 Feb 2019 22:46:20 +0000 (16:46 -0600)]
CRUNCH-678: Avoid unnecessary last modified time retrieval
Signed-off-by: Josh Wills <jwills@apache.org>
Andrew Olson [Wed, 23 Jan 2019 17:23:57 +0000 (11:23 -0600)]
CRUNCH-660, CRUNCH-675: Use DistCp instead of FileUtils.copy when source and destination paths are in different filesystems
Signed-off-by: Josh Wills <jwills@apache.org>
Jun He [Thu, 9 Aug 2018 05:49:09 +0000 (05:49 +0000)]
CRUNCH-671: Failed to generate reports using "mvn site"
The Crunch build failed due to "ClassNotFound" in doxia.
This is caused by maven-project-info-reports-plugin being updated to 3.0.0, which depends on
doxia-site-renderer 1.8 (which has the class
org.apache.maven.doxia.siterenderer.DocumentContent), while maven-site-plugin:3.3
depends on doxia-site-renderer:1.4 (which
doesn't have org.apache.maven.doxia.siterenderer.DocumentContent).
Setting maven-site-plugin to 3.7 resolves this.
Signed-off-by: Jun He <jun.he@linaro.org>
Signed-off-by: Josh Wills <jwills@apache.org>
Josh Wills [Mon, 23 Jul 2018 20:31:00 +0000 (13:31 -0700)]
CRUNCH-619: Update to HBase 2.0.1. Contributed by Attila Sasvari.
Josh Wills [Mon, 30 Apr 2018 18:47:15 +0000 (11:47 -0700)]
CRUNCH-669: Add an option to disable temp dir deletion in the finalize() method of a DistributedPipeline
Clément MATHIEU [Tue, 27 Mar 2018 15:55:15 +0000 (17:55 +0200)]
CRUNCH-668: Support globbing patterns in From#avroFile
Signed-off-by: Josh Wills <jwills@apache.org>
Clément MATHIEU [Tue, 6 Mar 2018 16:47:48 +0000 (17:47 +0100)]
Fix HCatSourceITSpec.testBasic
Signed-off-by: Josh Wills <jwills@apache.org>
Clément MATHIEU [Wed, 7 Mar 2018 09:13:51 +0000 (10:13 +0100)]
CRUNCH-665: Add crunch.max.poll.interval property
Signed-off-by: Josh Wills <jwills@apache.org>
Nathan Schile [Mon, 5 Feb 2018 15:08:46 +0000 (09:08 -0600)]
CRUNCH-664 Fixes HBase configuration properties being overwritten
Signed-off-by: Josh Wills <jwills@apache.org>
Ben Roling [Wed, 24 Jan 2018 16:40:18 +0000 (10:40 -0600)]
Expose combine file split file path via Hadoop config
Signed-off-by: Josh Wills <jwills@apache.org>
Bryan Baugher [Wed, 24 Jan 2018 20:14:31 +0000 (14:14 -0600)]
CRUNCH-662: Updated KafkaRecordReader to better handle errors, empty reads and appropriately retry
Signed-off-by: Josh Wills <jwills@apache.org>
Josh Wills [Thu, 18 Jan 2018 21:11:26 +0000 (13:11 -0800)]
CRUNCH-661: Make DataBaseSource.Builder methods public
Josh Wills [Mon, 11 Dec 2017 17:56:38 +0000 (09:56 -0800)]
CRUNCH-654: KafkaSource should use the new Kafka Consumer API instead of the SimpleConsumer. Contributed by Bryan Baugher.
Stephen Durfey [Mon, 4 Dec 2017 16:49:59 +0000 (10:49 -0600)]
CRUNCH-340: added HCatSource & HCatTarget
Signed-off-by: Josh Wills <jwills@apache.org>
Stephen Durfey [Thu, 7 Dec 2017 15:55:56 +0000 (09:55 -0600)]
CRUNCH-659: updated hive dependency to 2.1
Signed-off-by: Micah Whitacre <mkwhit@gmail.com>
Josh Wills [Fri, 27 Oct 2017 04:09:27 +0000 (21:09 -0700)]
CRUNCH-652: Fix to make the SourceTargetHelperTest less flaky on Hadoop 3.0.0. Contributed by Gergo Repas.
Bryan Baugher [Wed, 16 Aug 2017 21:19:42 +0000 (16:19 -0500)]
CRUNCH-653: Created KafkaSource that provides ConsumerRecord messages
Signed-off-by: Josh Wills <jwills@apache.org>
Josh Wills [Fri, 12 May 2017 16:52:49 +0000 (09:52 -0700)]
CRUNCH-647: Remove obsolete jackson dependencies
Gabriel Reid [Thu, 27 Apr 2017 12:52:16 +0000 (14:52 +0200)]
CRUNCH-644 Supply preferred node for HFile writes
Designate the preferred HDFS data node when creating HFiles for
bulk load to improve data locality of the created HFiles.
Tom White [Thu, 13 Apr 2017 15:10:23 +0000 (16:10 +0100)]
CRUNCH-618: Run on Spark 2. Contributed by Gergő Pásztor.
Xavier Talpe [Thu, 13 Apr 2017 05:52:43 +0000 (07:52 +0200)]
CRUNCH-642 Enable GroupingOptions for Distinct operations.
This fixes the existing call for numReducers, which was not working as
intended for non-memory PCollections because it used an invalid
numReducers value. To increase flexibility when using the API,
another call was added that allows the GroupingOptions to be passed directly.
Signed-off-by: Josh Wills <jwills@apache.org>
Tom White [Wed, 12 Apr 2017 14:03:41 +0000 (15:03 +0100)]
CRUNCH-641: Wrong decimal format in dot files. Contributed by Gergő Pásztor.
Xavier Talpe [Mon, 10 Apr 2017 13:51:32 +0000 (15:51 +0200)]
CRUNCH-642 Enable numReducers option for Distinct operations.
Signed-off-by: Josh Wills <jwills@apache.org>
Attila Sasvari [Thu, 23 Mar 2017 20:35:36 +0000 (21:35 +0100)]
CRUNCH-636: amend Make replication factor for temporary files configurable
Signed-off-by: Josh Wills <jwills@apache.org>
Attila Sasvari [Mon, 20 Mar 2017 10:17:55 +0000 (11:17 +0100)]
CRUNCH-636: Make replication factor for temporary files configurable
Signed-off-by: Josh Wills <jwills@apache.org>
Tom White [Tue, 7 Mar 2017 14:38:52 +0000 (14:38 +0000)]
CRUNCH-638: Improve dot file generation for better supportability. Contributed by Gergő Pásztor.
Tom White [Mon, 20 Feb 2017 10:28:05 +0000 (10:28 +0000)]
CRUNCH-633: Remove the commons-httpclient:commons-httpclient dependency. Contributed by Gergő Pásztor.
Gabriel Reid [Mon, 13 Feb 2017 18:57:32 +0000 (19:57 +0100)]
CRUNCH-634 Fix typo in log message
Contributed by Attila Sasvari
Micah Whitacre [Wed, 7 Dec 2016 02:50:02 +0000 (21:50 -0500)]
CRUNCH-628: Upgraded to Kafka 0.10.0.x
Josh Wills [Sun, 5 Feb 2017 19:22:06 +0000 (11:22 -0800)]
[maven-release-plugin] prepare for next development iteration
Josh Wills [Sun, 5 Feb 2017 19:22:05 +0000 (11:22 -0800)]
[maven-release-plugin] prepare branch apache-crunch-0.15
Micah Whitacre [Thu, 12 Jan 2017 02:51:26 +0000 (20:51 -0600)]
CRUNCH-632: Added support for compressed CSVSource files.
CRUNCH-632: Wrote a simple test showing that it now works on a compressed CSV file.
Signed-off-by: Josh Wills <jwills@apache.org>
Micah Whitacre [Thu, 12 Jan 2017 01:53:05 +0000 (19:53 -0600)]
Merge branch 'CRUNCH-630'
Brian Tieman [Tue, 13 Dec 2016 15:01:08 +0000 (09:01 -0600)]
CRUNCH-629: Kafka source pulling is aggressive
Added some parentheses to force proper order of operations in KafkaRecordReader.
Signed-off-by: Micah Whitacre <mkwhit@gmail.com>
Micah Whitacre [Tue, 3 Jan 2017 17:39:31 +0000 (11:39 -0600)]
CRUNCH-630: set a better default for the situation where offsets are out of range.
Dimitry Goldin [Fri, 14 Oct 2016 16:39:41 +0000 (18:39 +0200)]
Quick and Dirty Workaround for Crunch DistCache
Signed-off-by: Micah Whitacre <mkwhit@gmail.com>
Signed-off-by: Josh Wills <jwills@apache.org>
Josh Wills [Sat, 3 Dec 2016 19:56:59 +0000 (11:56 -0800)]
CRUNCH-622: From.avroFile fails if path not on default filesystem. Contributed by Micah Whitacre.
Stefan Mendoza [Tue, 13 Sep 2016 03:38:41 +0000 (22:38 -0500)]
CRUNCH-620: Reduce "isn't a known config" warnings by slimming down ConsumerConfig properties
Resolved by tagging the Kafka connection properties so that the Kafka Consumers can be built with slimmer ConsumerConfig properties.
Signed-off-by: Micah Whitacre <mkwhit@gmail.com>
David Whiting [Thu, 20 Oct 2016 14:17:00 +0000 (16:17 +0200)]
CRUNCH-625: Add missing .union implementations for LTables with LTables and PTables
Micah Whitacre [Tue, 13 Sep 2016 15:35:35 +0000 (10:35 -0500)]
CRUNCH-621: Added a check to hasPendingData for a large number of requests that return no data, to confirm that data is still available.
Nathan Schile [Thu, 29 Sep 2016 21:24:14 +0000 (16:24 -0500)]
CRUNCH-623: Improves Javadoc of PTable#cogroup
Signed-off-by: Josh Wills <jwills@apache.org>
Micah Whitacre [Tue, 6 Sep 2016 20:55:56 +0000 (15:55 -0500)]
CRUNCH-617: Defensively handle null when the partition leader cannot be found.
Signed-off-by: Micah Whitacre <mkwhit@gmail.com>
Tom White [Thu, 8 Sep 2016 13:12:30 +0000 (14:12 +0100)]
CRUNCH-616: Replace (possibly copyrighted) Maugham text with Dickens. Contributed by Sean Owen.
Remove non-applicable Project Gutenberg license. Adjust lots of tests to match new text.
Josh Wills [Wed, 24 Aug 2016 17:59:14 +0000 (10:59 -0700)]
CRUNCH-601: Handle empty PCollections correctly in Crunch-on-Spark. Created by Micah Whitacre,
Mikael Goldmann, and Josh Wills.
Josh Wills [Wed, 24 Aug 2016 03:07:23 +0000 (20:07 -0700)]
CRUNCH-519: Add more detail to plan dot file. Contributed by Ron Hashimshony.
Micah Whitacre [Tue, 2 Aug 2016 21:29:55 +0000 (16:29 -0500)]
CRUNCH-604: Avoid expensive Writables.reloadWritableComparableCodes
Micah Whitacre [Tue, 2 Aug 2016 20:58:29 +0000 (15:58 -0500)]
CRUNCH-611: Corrected files that were missing the APL headers.
Micah Whitacre [Wed, 13 Jul 2016 15:18:17 +0000 (10:18 -0500)]
CRUNCH-611: Added API for Offset reading/writing along with a simple implementation that supports doing it from hdfs.
Signed-off-by: Micah Whitacre <mkwhit@apache.org>
Josh Wills [Sat, 30 Jul 2016 00:47:09 +0000 (17:47 -0700)]
CRUNCH-614: Fix HFileUtils.writeToHFilesForIncrementalLoad slowed dramatically by copying KeyValue byte array. Contributed by Ben Roling.
Clément MATHIEU [Tue, 19 Jul 2016 19:30:35 +0000 (21:30 +0200)]
CRUNCH-613: Fix FileTargetImplTest.testHandleOutputsMovesFilesToDestination instability
Signed-off-by: Micah Whitacre <mkwhit@gmail.com>
CRUNCH-613: Fixed up the test to consolidate constants used.
Clément MATHIEU [Tue, 19 Jul 2016 19:01:20 +0000 (21:01 +0200)]
CRUNCH-612: Add support of private ctors to AvroDeepCopier
Signed-off-by: Micah Whitacre <mkwhit@gmail.com>
Micah Whitacre [Tue, 28 Jun 2016 20:44:15 +0000 (15:44 -0500)]
CRUNCH-609: Improved KafkaRecordReader to keep retrying when the range of offsets has not been fully consumed.
Micah Whitacre [Mon, 23 May 2016 20:13:02 +0000 (15:13 -0500)]
CRUNCH-606: Handle setting version correctly and removed stray System.out in test.
Micah Whitacre [Mon, 11 Apr 2016 14:47:33 +0000 (09:47 -0500)]
CRUNCH-606: Kafka Source for Crunch which supports reading data as BytesWritable
* Some of the code contributed by Bryan Baugher and Andrew Olson
Gabriel Reid [Tue, 10 May 2016 09:02:11 +0000 (11:02 +0200)]
CRUNCH-608 Write Bloom filters in HFiles
Use a correctly-configured StoreFile.Writer (instead of HFile.Writer)
for writing HFiles so that Bloom filter data is also included in
the written HFiles.
Gabriel Reid [Mon, 2 May 2016 15:31:20 +0000 (17:31 +0200)]
CRUNCH-607 Allow collection reuse in MemPipeline
Prevent SingleUseIterable from throwing an IllegalArgumentException
when PGroupedCollections are legally reused with the
MemPipeline.
This is done by deferring materialization of the transformed contents of
a MemCollection until it is iterated over.
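The general idea of deferring a transformation until iteration can be sketched as follows. This is an illustration of the technique only, not the MemPipeline implementation: LazyMapped is a hypothetical class, and it assumes the underlying source can itself be iterated more than once.

```java
import java.util.Iterator;
import java.util.function.Function;

/** Hypothetical lazy view: applies fn to each element only while iterating. */
class LazyMapped<S, T> implements Iterable<T> {
    private final Iterable<S> source;
    private final Function<S, T> fn;

    LazyMapped(Iterable<S> source, Function<S, T> fn) {
        // Nothing is computed here; the source is only remembered.
        this.source = source;
        this.fn = fn;
    }

    @Override
    public Iterator<T> iterator() {
        // Each call starts a fresh pass over the source.
        final Iterator<S> it = source.iterator();
        return new Iterator<T>() {
            @Override
            public boolean hasNext() {
                return it.hasNext();
            }

            @Override
            public T next() {
                return fn.apply(it.next());
            }
        };
    }
}
```

Because no work happens at construction time, building such a view never consumes the source, and each iteration re-applies the function to a new pass over it.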
Josh Wills [Sun, 24 Apr 2016 02:23:02 +0000 (19:23 -0700)]
[maven-release-plugin] prepare for next development iteration
Josh Wills [Sun, 24 Apr 2016 02:23:01 +0000 (19:23 -0700)]
[maven-release-plugin] prepare branch apache-crunch-0.14
mkwhitacre [Mon, 23 Nov 2015 00:07:30 +0000 (18:07 -0600)]
CRUNCH-579: Supported access to counters from original TaskContext
Signed-off-by: Micah Whitacre <mkwhit@gmail.com>
Igor Bernstein [Sun, 10 Apr 2016 19:42:10 +0000 (15:42 -0400)]
CRUNCH-600: pass job credentials when building multiple outputs
Signed-off-by: Micah Whitacre <mkwhit@gmail.com>
David Whiting [Thu, 31 Mar 2016 10:06:45 +0000 (12:06 +0200)]
CRUNCH-599: Fix increment and incrementIf methods in crunch-lambda so they also emit the incoming element
Josh Wills [Thu, 24 Mar 2016 16:55:16 +0000 (09:55 -0700)]
CRUNCH-597: Upgrade to Parquet 1.8.1
tworec [Fri, 4 Mar 2016 17:58:01 +0000 (18:58 +0100)]
CRUNCH-596 Support right-outer bloom join
Signed-off-by: Gabriel Reid <greid@apache.org>