Samarth Jain [Fri, 6 Nov 2020 21:21:30 +0000 (13:21 -0800)]
Parquet: Fix vectorized reads for negative decimals (#1736)
yyanyy [Fri, 6 Nov 2020 20:12:45 +0000 (12:12 -0800)]
API: Fix Expressions.notNull operation (#1722)
Ryan Blue [Fri, 6 Nov 2020 01:48:18 +0000 (17:48 -0800)]
Core: Fix NullPointerException in ManifestReader (#1730)
This fixes a NullPointerException that is thrown by a ManifestReader for delete files when there is a query filter. The DeleteFileIndex projects all fields of a delete manifest, so it doesn't call select to select specific columns, unlike ManifestGroup, which selects * by default. When select is not called, methods that check whether to add stats columns fail, but only if there is a row filter because stats columns are not needed if there is no row filter.
Existing tests either called select to configure the reader, or didn't pass a row filter and projected all rows. This adds a test that uses DeleteFileIndex and a test for ManifestReader. This also fixes dropStats in addition to requireStatsProjection.
Co-authored-by: 钟保罗 <zhongbaoluo@shxgroup.net>
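The failure mode described in the ManifestReader fix above can be illustrated with a small sketch. This is not the actual Iceberg code: the class name StatsProjectionSketch, the boolean hasRowFilter parameter, and the column names are assumptions made for illustration. It only models the idea that a null column list (select was never called) must be guarded before checking whether stats columns are needed, and that the check is only reached when a row filter is present.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for the reader's stats-projection checks.
final class StatsProjectionSketch {
  // Assumed column names, for illustration only.
  static final List<String> ALL_COLUMNS = Arrays.asList("*");
  static final Set<String> STATS_COLUMNS = new HashSet<>(Arrays.asList(
      "value_counts", "null_value_counts", "lower_bounds", "upper_bounds"));

  private StatsProjectionSketch() {}

  // columns == null models a reader on which select was never called.
  static boolean requireStatsProjection(boolean hasRowFilter, Collection<String> columns) {
    // Short-circuiting means columns is only inspected when a row filter exists,
    // which is why an unguarded version fails only for filtered reads.
    return hasRowFilter
        && columns != null // the guard: without it, containsAll would throw an NPE
        && !columns.containsAll(ALL_COLUMNS)
        && !columns.containsAll(STATS_COLUMNS);
  }

  // Stats added only for filtering can be dropped if the caller selected no stats columns.
  static boolean dropStats(boolean hasRowFilter, Collection<String> columns) {
    if (hasRowFilter && columns != null && !columns.containsAll(ALL_COLUMNS)) {
      Set<String> projectedStats = new HashSet<>(columns);
      projectedStats.retainAll(STATS_COLUMNS);
      return projectedStats.isEmpty();
    }
    return false;
  }

  public static void main(String[] args) {
    // No select call plus a row filter: the guarded checks return false instead of throwing.
    System.out.println(requireStatsProjection(true, null));
    System.out.println(dropStats(true, null));
    // An explicit projection without stats columns needs them added for filtering.
    System.out.println(requireStatsProjection(true, Arrays.asList("file_path")));
  }
}
```

In this sketch a null column list is treated the same as projecting everything, so no stats columns are added or dropped in that case.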
Jungtaek Lim [Fri, 6 Nov 2020 01:02:10 +0000 (10:02 +0900)]
Core: Fix NPE in SnapshotManager.validateCurrentSnapshot (#1725)
Anton Okolnychyi [Tue, 3 Nov 2020 23:50:30 +0000 (15:50 -0800)]
Build: Ignore json files in licence checks
Jungtaek Lim [Tue, 3 Nov 2020 20:19:26 +0000 (05:19 +0900)]
Docs: Add type compatibility tables for Spark (#1611)
Ryan Blue [Tue, 3 Nov 2020 18:28:00 +0000 (10:28 -0800)]
Revert "Spark: Handle null literals in predicates (#1709)"
This reverts commit 552b76713fca5d34a951d2794d74d3d7b8829ff3.
Ryan Blue [Tue, 3 Nov 2020 17:07:26 +0000 (09:07 -0800)]
Spark: Handle null literals in predicates (#1709)
Marton Bod [Tue, 3 Nov 2020 00:52:58 +0000 (01:52 +0100)]
Hive: Run StorageHandler tests in both MR and Tez (#1695)
Co-authored-by: Marton Bod <mbod@cloudera.com>
Shardul Mahadik [Mon, 2 Nov 2020 22:00:16 +0000 (14:00 -0800)]
ORC: Fix non-vectorized reader incorrectly skipping rows (#1706)
Marton Bod [Mon, 2 Nov 2020 21:53:41 +0000 (22:53 +0100)]
Revert "Hive: Fix missing table schema in Hive 1.1 query (#1557)" (#1707)
This reverts commit 13d94bc2e4e2dcae3d964b98dbb3aaeee19b46c2.
Co-authored-by: Marton Bod <mbod@cloudera.com>
yyanyy [Sun, 1 Nov 2020 20:19:41 +0000 (12:19 -0800)]
Core: Check required v2 fields in table metadata parser (#1614)
Shardul Mahadik [Sun, 1 Nov 2020 19:27:53 +0000 (11:27 -0800)]
ORC: Enable predicate pushdown and remove metrics workaround for Timestamps (#1696)
Shardul Mahadik [Sun, 1 Nov 2020 19:20:11 +0000 (11:20 -0800)]
ORC: OrcFileAppender#length should return file size on disk (#1697)
Anton Okolnychyi [Fri, 30 Oct 2020 19:17:27 +0000 (12:17 -0700)]
Build: Fix Javadoc in SnapshotSummary
Anton Okolnychyi [Fri, 30 Oct 2020 16:57:59 +0000 (09:57 -0700)]
Build: Fix source-release.sh (#1694)
Russell Spitzer [Thu, 29 Oct 2020 18:51:18 +0000 (13:51 -0500)]
Core: Fix entries metadata tables with delete manifests (#1673)
Ryan Blue [Thu, 29 Oct 2020 18:29:11 +0000 (11:29 -0700)]
Core: Do not create an Evaluator in ManifestGroup unless the file filter is used (#1678)
Ryan Blue [Thu, 29 Oct 2020 18:25:32 +0000 (11:25 -0700)]
Core: Fix replaced partitions validation in SnapshotManager (#1677)
Ryan Blue [Thu, 29 Oct 2020 18:14:30 +0000 (11:14 -0700)]
Core: Fix ArrayIndexOutOfBounds for tables with migrated partition specs (#1676)
Ryan Blue [Thu, 29 Oct 2020 18:11:10 +0000 (11:11 -0700)]
Spark: Enable testInsertOverwrite, passing with Spark 3.0.1 (#1679)
Kyle Bendickson [Thu, 29 Oct 2020 17:14:14 +0000 (10:14 -0700)]
Build: Change labeller to more secure pull_request_target (#1692)
Kyle Bendickson [Thu, 29 Oct 2020 16:31:23 +0000 (09:31 -0700)]
Build: Move PR labeler to GitHub action (#1686)
László Pintér [Thu, 29 Oct 2020 16:30:11 +0000 (18:30 +0200)]
Core: Update FS table version-hint.txt atomically (#1559)
Anton Okolnychyi [Thu, 29 Oct 2020 16:07:07 +0000 (09:07 -0700)]
Build: Support builds without .git directory (#1681)
Anton Okolnychyi [Thu, 29 Oct 2020 01:03:58 +0000 (18:03 -0700)]
Build: Add ASF licence header to .asf.yaml
Ryan Blue [Wed, 28 Oct 2020 23:10:06 +0000 (16:10 -0700)]
Spark: Remove cache expiration in HiveCatalogs cache (#1674)
Anton Okolnychyi [Wed, 28 Oct 2020 22:57:56 +0000 (15:57 -0700)]
Fix IN predicate performance (#1672)
Ryan Blue [Wed, 28 Oct 2020 22:40:22 +0000 (15:40 -0700)]
Spec: Update the spec for row-level deletes (#1499)
Anton Okolnychyi [Wed, 28 Oct 2020 16:40:05 +0000 (09:40 -0700)]
Parquet: Optimize IN predicates in ParquetDictionaryRowGroupFilter (#1664)
JunZhang [Wed, 28 Oct 2020 04:19:30 +0000 (12:19 +0800)]
Core: Extract BaseRewriteDataFilesAction for implementing both Flink and Spark rewrite actions
openinx [Wed, 28 Oct 2020 04:06:53 +0000 (12:06 +0800)]
Flink: Maintain the complete data files in a manifest before the checkpoint finishes (#1477)
Marton Bod [Tue, 27 Oct 2020 21:29:55 +0000 (22:29 +0100)]
Hive: Add TestHiveShell for parameterized StorageHandler tests (#1631)
Co-authored-by: Marton Bod <mbod@cloudera.com>
Fokko Driesprong [Tue, 27 Oct 2020 18:24:22 +0000 (19:24 +0100)]
Spark: Bump Spark 3 version to 3.0.1 (#1656)
Russell Spitzer [Tue, 27 Oct 2020 18:23:29 +0000 (13:23 -0500)]
Docs: Fix docs typo in ParquetVectorizedReader (#1658)
Marton Bod [Tue, 27 Oct 2020 18:22:25 +0000 (19:22 +0100)]
Hive: Update test code for Hive4 (#1645)
Co-authored-by: Marton Bod <mbod@cloudera.com>
Samarth Jain [Fri, 23 Oct 2020 21:30:49 +0000 (14:30 -0700)]
Parquet: Add test for Arrow buffer reallocation (#1480)
openinx [Fri, 23 Oct 2020 21:28:49 +0000 (05:28 +0800)]
Flink: Load hive-site.xml for HiveCatalog (#1586)
Sunitha Kambhampati [Fri, 23 Oct 2020 18:24:09 +0000 (11:24 -0700)]
Docs: Document property behavior for Spark REPLACE TABLE (#1644)
Fokko Driesprong [Fri, 23 Oct 2020 18:22:32 +0000 (20:22 +0200)]
Spark: Bump Spark 2 module to 2.4.7 (#1646)
Fokko Driesprong [Fri, 23 Oct 2020 18:19:43 +0000 (20:19 +0200)]
Lint: Fix small issues (#1650)
Russell Spitzer [Thu, 22 Oct 2020 17:14:49 +0000 (12:14 -0500)]
Spark: Split Actions for Spark 2 and 3 using reflection (#1616)
Russell Spitzer [Thu, 22 Oct 2020 00:07:37 +0000 (19:07 -0500)]
Parquet: Fix row group filtering with old CDH stats (#1638)
openinx [Wed, 21 Oct 2020 01:09:35 +0000 (09:09 +0800)]
Flink: Support configurable parallelism for write tasks (#1619)
Steven Zhen Wu [Tue, 20 Oct 2020 02:49:42 +0000 (19:49 -0700)]
Flink: move convertConstant method from DataIterator to RowDataUtil class (#1625)
Marton Bod [Mon, 19 Oct 2020 19:33:50 +0000 (21:33 +0200)]
Hive: Select ObjectInspectors based on classpath (#1632)
pvary [Mon, 19 Oct 2020 10:07:23 +0000 (12:07 +0200)]
Hive: Make the TestHiveMetastore connection pool size configurable (#1629)
pvary [Fri, 16 Oct 2020 16:55:17 +0000 (18:55 +0200)]
Hive: Fix TestHiveMetastore worker exhaustion (#1620)
Shardul Mahadik [Thu, 15 Oct 2020 21:49:21 +0000 (14:49 -0700)]
Docs: Add ORC to table and writer format configs (#1615)
Chen, Junjie [Wed, 14 Oct 2020 23:11:41 +0000 (07:11 +0800)]
Flink: Apply row-level deletes when reading (#1517)
Russell Spitzer [Wed, 14 Oct 2020 00:07:29 +0000 (19:07 -0500)]
Spark: Move Action tests to the correct package (#1609)
Chen, Junjie [Tue, 13 Oct 2020 19:11:16 +0000 (14:11 -0500)]
Spark: Fix benchmark docs and temp files (#1606)
Chen, Junjie [Tue, 13 Oct 2020 17:58:51 +0000 (12:58 -0500)]
Spark: Reuse containers when reading Parquet (#1522)
Shardul Mahadik [Tue, 13 Oct 2020 16:32:19 +0000 (09:32 -0700)]
ORC: Remove workarounds after 1.6.5 upgrade (#1561)
Kyle Bendickson [Tue, 13 Oct 2020 16:16:44 +0000 (09:16 -0700)]
Hive: Fix filter conversion with Date (#1579)
Shardul Mahadik [Tue, 13 Oct 2020 16:11:12 +0000 (09:11 -0700)]
MR: Fix NPE when InputSplit.getLocations is called on mappers (#1582)
Kyle Bendickson [Tue, 13 Oct 2020 16:10:08 +0000 (09:10 -0700)]
Spark: Add parameter name to TestPartitionPruning (#1600)
pvary [Tue, 13 Oct 2020 16:08:25 +0000 (18:08 +0200)]
Hive: Add HiveMetaHook to support Hive DDL commands (#1495)
Jingsong Lee [Tue, 13 Oct 2020 06:25:58 +0000 (14:25 +0800)]
Flink: move hadoop configuration to Loaders from Source/Sink API (#1565)
zhangdove [Mon, 12 Oct 2020 17:33:16 +0000 (01:33 +0800)]
API: Validate bucket and truncate function parameters (#1569)
Marton Bod [Mon, 12 Oct 2020 17:25:12 +0000 (19:25 +0200)]
Hive: Run StorageHandler tests with Avro (#1585)
Co-authored-by: Marton Bod <mbod@cloudera.com>
Kyle Bendickson [Mon, 12 Oct 2020 17:22:52 +0000 (10:22 -0700)]
Replace Table.toString calls with Table.name (#1580)
Kyle Bendickson [Sun, 11 Oct 2020 05:46:37 +0000 (22:46 -0700)]
Flink: Fix IntLongMath warnings (#1581)
Kyle Bendickson [Sat, 10 Oct 2020 23:53:34 +0000 (16:53 -0700)]
Build: Update autolabeler configuration for new Hive modules (#1577)
Marton Bod [Sat, 10 Oct 2020 22:20:41 +0000 (00:20 +0200)]
Hive: Run storage handler tests for Parquet and ORC (#1570)
Co-authored-by: Marton Bod <mbod@cloudera.com>
mickjermsurawong-stripe [Sat, 10 Oct 2020 21:13:27 +0000 (14:13 -0700)]
Spark: Close final reader in BaseDataIterator (#1563)
Sushant Raikar [Sat, 10 Oct 2020 21:09:18 +0000 (14:09 -0700)]
Hive: Avoid loading catalog to initialize a serde (#1564)
jbirtman [Sat, 10 Oct 2020 20:51:31 +0000 (13:51 -0700)]
Docs: Move URI clarification earlier (#1571)
Ryan Blue [Fri, 9 Oct 2020 15:46:12 +0000 (08:46 -0700)]
Parquet: Remove hard-coded file paths from tests (#1562)
* Remove hard-coded file paths from tests.
* Fix checkstyle in tests.
Sushant Raikar [Thu, 8 Oct 2020 17:15:15 +0000 (10:15 -0700)]
Hive: Fix missing table schema in Hive 1.1 query (#1557)
Marton Bod [Wed, 7 Oct 2020 16:53:48 +0000 (18:53 +0200)]
Hive: Add Hive3 module and testing (#1478)
Hive 3 classes are included in the iceberg-hive-runtime Jar.
Ryan Blue [Wed, 7 Oct 2020 16:35:43 +0000 (09:35 -0700)]
Core: Add row-level delete validations (#1469)
Ryan Blue [Tue, 6 Oct 2020 23:57:30 +0000 (16:57 -0700)]
Add TableMetadata.updateSortOrder. (#1551)
Ryan Blue [Tue, 6 Oct 2020 23:17:18 +0000 (16:17 -0700)]
Parallelize reading manifest list files in metadata tables. (#1440)
Anton Okolnychyi [Tue, 6 Oct 2020 21:29:58 +0000 (14:29 -0700)]
Spark3: Refresh state in SparkTable for all scans when not caching (#1545)
Steven Zhen Wu [Tue, 6 Oct 2020 19:36:32 +0000 (12:36 -0700)]
BaseMetastoreTableOperations shouldn't disable refresh upon NoSuchTableException. Otherwise it might cause unfriendly NPE for callers that don't check null return value from current() method. (#1553)
Co-authored-by: Steven Wu <stevenwu@netflix.com>
Ryan Blue [Tue, 6 Oct 2020 19:35:34 +0000 (12:35 -0700)]
Core: Add partition summaries in SnapshotSummary builder (#1367)
Jungtaek Lim [Tue, 6 Oct 2020 16:55:48 +0000 (01:55 +0900)]
Spark: Add end-to-end test for partition pruning (#1487)
Ashish Mehta [Tue, 6 Oct 2020 16:34:32 +0000 (09:34 -0700)]
Core: Fix error when a DeleteFile is used twice in a task (#1514)
mickjermsurawong-stripe [Tue, 6 Oct 2020 02:38:48 +0000 (19:38 -0700)]
Core: Allow LocationProvider to be customized using reflection (#1531)
pvary [Mon, 5 Oct 2020 23:03:59 +0000 (01:03 +0200)]
Hive: Make HiveCatalog based tables readable from Hive (#1505)
Anton Okolnychyi [Mon, 5 Oct 2020 18:46:36 +0000 (11:46 -0700)]
Build: Ignore OverloadMethodsDeclarationOrder rule (#1550)
Chen, Junjie [Mon, 5 Oct 2020 17:02:02 +0000 (12:02 -0500)]
MR: Apply row-level delete files when reading (#1497)
Dongjoon Hyun [Fri, 2 Oct 2020 19:55:23 +0000 (12:55 -0700)]
ORC: Upgrade to 1.6.5 (#1546)
Kyle Bendickson [Fri, 2 Oct 2020 18:17:04 +0000 (11:17 -0700)]
Tests: Add names to parameterized tests (#1539)
Anton Okolnychyi [Thu, 1 Oct 2020 23:46:16 +0000 (16:46 -0700)]
Spark: Implement equals and hashCode in SparkTable (#1530)
Chen, Junjie [Thu, 1 Oct 2020 21:37:39 +0000 (16:37 -0500)]
MR: Use encryption manager for input files (#1532)
Anton Okolnychyi [Thu, 1 Oct 2020 20:16:31 +0000 (13:16 -0700)]
API: Add name to Table and Catalog APIs (#1537)
Resolves #658.
openinx [Thu, 1 Oct 2020 17:51:59 +0000 (01:51 +0800)]
Docs: Add Flink sink docs (#1464)
Adrian Woodhead [Thu, 1 Oct 2020 17:51:06 +0000 (18:51 +0100)]
Docs: Add Hive read docs (#1490)
Marton Bod [Thu, 1 Oct 2020 17:18:26 +0000 (19:18 +0200)]
Hive: Fix temporary folder cleanup in TestCatalogs (#1535)
Co-authored-by: Marton Bod <mbod@cloudera.com>
Kyle Bendickson [Wed, 30 Sep 2020 22:58:17 +0000 (15:58 -0700)]
Spark: Add null check for catalog in SparkTestBase (#1529)
Shardul Mahadik [Wed, 30 Sep 2020 20:35:16 +0000 (13:35 -0700)]
ORC: Fix predicate pushdown for notIn and notEqual (#1536)
Ryan Blue [Wed, 30 Sep 2020 00:17:46 +0000 (17:17 -0700)]
Spark: Implement equals and hashCode in SparkBatchScan (#1512)
Russell Spitzer [Tue, 29 Sep 2020 18:33:10 +0000 (13:33 -0500)]
Core: Replace partition field id literal with reference (#1528)
Jungtaek Lim [Tue, 29 Sep 2020 17:14:24 +0000 (02:14 +0900)]
Docs: Add section on writing to a partitioned table in Spark (#1523)
Jingsong Lee [Tue, 29 Sep 2020 17:02:03 +0000 (01:02 +0800)]
Flink: Integrate Flink reader to SQL (#1509)
Kyle Bendickson [Tue, 29 Sep 2020 00:30:46 +0000 (17:30 -0700)]
API: Add null check to PruneColumns (#1491)
Holden Karau [Tue, 29 Sep 2020 00:18:05 +0000 (17:18 -0700)]
MR: Use HadoopFileIO instead of HadoopInputFile (#1526)
Kyle Bendickson [Mon, 28 Sep 2020 22:33:43 +0000 (15:33 -0700)]
Core: Suppress UnnecessaryAnonymousClass warnings (#1493)