drill.git
2 years ago[maven-release-plugin] prepare release drill-1.17.0 1.17.0 drill-1.17.0
Volodymyr Vysotskyi [Sun, 22 Dec 2019 15:32:27 +0000 (17:32 +0200)] 
[maven-release-plugin] prepare release drill-1.17.0

2 years agoDRILL-7494: Unable to connect to Drill using JDBC driver when using custom authenticator
Volodymyr Vysotskyi [Sat, 21 Dec 2019 14:14:55 +0000 (16:14 +0200)] 
DRILL-7494: Unable to connect to Drill using JDBC driver when using custom authenticator

closes #1938

2 years agoDRILL-7490: LIMIT is not pushed to JDBC storage plugin
Volodymyr Vysotskyi [Wed, 18 Dec 2019 17:16:08 +0000 (19:16 +0200)] 
DRILL-7490: LIMIT is not pushed to JDBC storage plugin

closes #1936

2 years agoDRILL-7485: NPE on PCAP Batch Reader
Charles Givre [Mon, 16 Dec 2019 13:16:53 +0000 (08:16 -0500)] 
DRILL-7485: NPE on PCAP Batch Reader

closes #1932

2 years agoDRILL-7486: Refactor row set reader builders
Paul Rogers [Fri, 13 Dec 2019 01:37:53 +0000 (17:37 -0800)] 
DRILL-7486: Refactor row set reader builders

Moves reader building code into a shared location, independent
of the RowSet class. Allows readers to be built from a
VectorContainer in addition to a row set.

closes #1928

2 years agoDRILL-7484: Malware found in the Drill test folder
Charles Givre [Mon, 16 Dec 2019 13:14:48 +0000 (08:14 -0500)] 
DRILL-7484: Malware found in the Drill test folder

closes #1934

2 years agoDRILL-7483: Add support for 12 and 13 java versions
Volodymyr Vysotskyi [Wed, 11 Dec 2019 13:11:31 +0000 (15:11 +0200)] 
DRILL-7483: Add support for 12 and 13 java versions

closes #1935

2 years agoDRILL-7482: Fix missing artifact and overlapping classes warnings in Drill build
Volodymyr Vysotskyi [Wed, 11 Dec 2019 11:38:09 +0000 (13:38 +0200)] 
DRILL-7482: Fix missing artifact and overlapping classes warnings in Drill build

closes #1927

2 years agoDRILL-7479: Partial fixes for metadata parameterized type issues
Paul Rogers [Wed, 11 Dec 2019 04:13:30 +0000 (20:13 -0800)] 
DRILL-7479: Partial fixes for metadata parameterized type issues

See DRILL-7479 and DRILL-7480 for an explanation. Adds generic
type parameters where needed to avoid the need to suppress
warnings. However, type parameters are probably not needed
at all and should be removed in the future for reasons explained
in DRILL-7480.

closes #1923

2 years agoDRILL-7481: Fix raw type warnings in Iceberg Metastore and related classes
Arina Ielchiieva [Wed, 11 Dec 2019 10:39:29 +0000 (12:39 +0200)] 
DRILL-7481: Fix raw type warnings in Iceberg Metastore and related classes

closes #1924

2 years agoDRILL-7476: Set lastSet on TransferPair copies
Paul Rogers [Wed, 11 Dec 2019 04:15:32 +0000 (20:15 -0800)] 
DRILL-7476: Set lastSet on TransferPair copies

Variable-width nullable vectors maintain a "lastSet" field
in the mutator. This field is used in "fill empties" logic
when setting the vector's value count. This is true even
if the vector is read-only, or has been transferred from
another (read-only) vector. LastSet must be set to the
row count or the code will helpfully overwrite existing
offsets with 0.
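
A minimal sketch of the failure mode described above, using a plain int[] in place of an offset vector. The class and method names are hypothetical and are not Drill's API, and the sketch assumes lastSet holds the index of the last written value (so -1 means "nothing written") and is smaller than the value count.

```java
import java.util.Arrays;

/** Toy model of a variable-width vector's offsets; not Drill's actual classes. */
final class LastSetSketch {
    /**
     * Mimics "fill empties": every position after lastSet is treated as unset,
     * so its end offset is overwritten with the end offset of position lastSet
     * (or with 0 when lastSet is -1). offsets[i + 1] is the end offset of value i.
     */
    static int[] setValueCount(int[] offsets, int lastSet, int valueCount) {
        int[] result = Arrays.copyOf(offsets, valueCount + 1);
        int fill = lastSet < 0 ? 0 : result[lastSet + 1];
        for (int i = lastSet + 1; i < valueCount; i++) {
            result[i + 1] = fill;
        }
        return result;
    }
}
```

In this model, a transferred vector whose lastSet was left at -1 has all of its offsets zeroed when the value count is set, while a lastSet carried over from the source leaves the offsets untouched.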

closes #1922

2 years agoDRILL-7474: Reduce size of Drill's tar.gz file
Anton Gozhiy [Tue, 10 Dec 2019 18:35:13 +0000 (20:35 +0200)] 
DRILL-7474: Reduce size of Drill's tar.gz file

- Excluded aws-java-sdk-bundle dependency, only required aws libraries added instead.
- Cleared format-excel module from unused dependencies.

closes #1926

2 years agoDRILL-7473: Parquet reader failed to get field of repeated map
Bohdan Kazydub [Fri, 13 Dec 2019 14:27:07 +0000 (16:27 +0200)] 
DRILL-7473: Parquet reader failed to get field of repeated map

closes #1933

2 years agoDRILL-7472: Fix ser / de for sys and information_schema schemas queries
Arina Ielchiieva [Wed, 11 Dec 2019 15:26:48 +0000 (17:26 +0200)] 
DRILL-7472: Fix ser / de for sys and information_schema schemas queries

closes #1925

2 years agoDRILL-7470: Remove conflicting logback-classic dependency in drill-yarn
Anton Gozhiy [Fri, 6 Dec 2019 17:32:50 +0000 (19:32 +0200)] 
DRILL-7470: Remove conflicting logback-classic dependency in drill-yarn

closes #1920

2 years agoDRILL-6332: Allow to provide two-component Kerberos principals
Stefan Hammer [Fri, 13 Dec 2019 12:38:35 +0000 (13:38 +0100)] 
DRILL-6332: Allow to provide two-component Kerberos principals

closes #1931

2 years agoDRILL-7471: DESCRIBE TABLE command fails with ClassCastException when Metastore is...
Volodymyr Vysotskyi [Fri, 6 Dec 2019 21:26:19 +0000 (23:26 +0200)] 
DRILL-7471: DESCRIBE TABLE command fails with ClassCastException when Metastore is enabled

2 years agoDRILL-7468: Metastore unit tests may fail when used sources from the release archive
Volodymyr Vysotskyi [Fri, 6 Dec 2019 11:19:44 +0000 (13:19 +0200)] 
DRILL-7468: Metastore unit tests may fail when used sources from the release archive

closes #1917

2 years agoDRILL-7469: Disable doclint for maven-javadoc-plugin
Volodymyr Vysotskyi [Thu, 5 Dec 2019 18:43:00 +0000 (20:43 +0200)] 
DRILL-7469: Disable doclint for maven-javadoc-plugin

closes #1918

2 years agoAdd Volodymyr's PGP key
Volodymyr Vysotskyi [Wed, 4 Dec 2019 11:51:49 +0000 (13:51 +0200)] 
Add Volodymyr's PGP key

2 years agoDRILL-7221: Exclude debug files generated by maven debug option from jar
Volodymyr Vysotskyi [Mon, 2 Dec 2019 16:46:51 +0000 (18:46 +0200)] 
DRILL-7221: Exclude debug files generated by maven debug option from jar

closes #1915

2 years agoDRILL-6904: Update maven-javadoc-plugin, maven-compiler-plugin and maven-assembly...
Volodymyr Vysotskyi [Fri, 29 Nov 2019 14:27:52 +0000 (16:27 +0200)] 
DRILL-6904: Update maven-javadoc-plugin, maven-compiler-plugin and maven-assembly-plugin to the latest version

2 years agoDRILL-7208: Reuse root git.properties file
Volodymyr Vysotskyi [Thu, 28 Nov 2019 18:25:57 +0000 (20:25 +0200)] 
DRILL-7208: Reuse root git.properties file

- Generate git.properties for root module only and copy it to child modules when required

closes #1911

2 years agoDRILL-7463: Apache license is not added to the generated classes
Volodymyr Vysotskyi [Wed, 4 Dec 2019 09:43:29 +0000 (11:43 +0200)] 
DRILL-7463: Apache license is not added to the generated classes

closes #1916

2 years agoDRILL-7324: Final set of "batch count" fixes
Paul Rogers [Sat, 30 Nov 2019 02:58:59 +0000 (18:58 -0800)] 
DRILL-7324: Final set of "batch count" fixes

Final set of fixes for batch count/record count issues. Enables
vector checking for all operators.

closes #1912

2 years agoDRILL-7450: Improve performance for ANALYZE command
Volodymyr Vysotskyi [Fri, 22 Nov 2019 17:53:08 +0000 (19:53 +0200)] 
DRILL-7450: Improve performance for ANALYZE command

- Implement two-phase aggregation for the lowest metadata aggregate to optimize performance
- Allow using complex functions with hash aggregate
- Use hash aggregation for PHASE_1of2 for ANALYZE to reduce memory usage and avoid sorting non-aggregated data
- Add sort above hash aggregation to fix correctness of merge exchange and stream aggregate
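
The two-phase shape listed above can be sketched roughly as follows. This is an illustration with plain Java collections and a simple count aggregate, not Drill's operators; the class and method names are hypothetical.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Illustrative two-phase aggregation over string keys; not Drill code. */
final class TwoPhaseAggSketch {
    /** Phase 1: hash-aggregate one fragment's keys into partial counts; the input is not sorted. */
    static Map<String, Long> phaseOne(List<String> keys) {
        Map<String, Long> partial = new HashMap<>();
        for (String key : keys) {
            partial.merge(key, 1L, Long::sum);
        }
        return partial;
    }

    /** Sort above the hash aggregate: orders the (already reduced) partial result by key. */
    static TreeMap<String, Long> sortPartial(Map<String, Long> partial) {
        return new TreeMap<>(partial);
    }

    /** Phase 2: combine the sorted partial results into final counts. */
    static TreeMap<String, Long> phaseTwo(List<TreeMap<String, Long>> partials) {
        TreeMap<String, Long> result = new TreeMap<>();
        for (TreeMap<String, Long> partial : partials) {
            partial.forEach((key, count) -> result.merge(key, count, Long::sum));
        }
        return result;
    }
}
```

The point of the arrangement is that only the reduced partial results are sorted, rather than the raw, non-aggregated rows.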

closes #1907

2 years agoDRILL-5844: Incorrect values of TABLE_TYPE returned from method DatabaseMetaData...
Arjun Gupta [Wed, 20 Nov 2019 10:07:33 +0000 (15:37 +0530)] 
DRILL-5844: Incorrect values of TABLE_TYPE returned from method DatabaseMetaData.getTables of JDBC API

closes #1904

2 years agoDRILL-6540: Updated Hadoop and HBase libraries to the latest versions
Anton Gozhiy [Mon, 4 Nov 2019 12:08:22 +0000 (14:08 +0200)] 
DRILL-6540: Updated Hadoop and HBase libraries to the latest versions

Hadoop: 3.2.1
HBase: 2.2.2

closes #1895

2 years agoDRILL-6540: Upgrade to HADOOP-3.0.3 libraries
Vitalii Diravka [Tue, 4 Sep 2018 17:02:43 +0000 (20:02 +0300)] 
DRILL-6540: Upgrade to HADOOP-3.0.3 libraries

- accommodate apache and mapr profiles with hadoop 3.0 libraries
- update HBase version
- fix jdbc-all woodstox dependency
- unban Apache commons-logging dependency

2 years agoDRILL-7393: Revisit Drill tests to ensure that patching is executed before any test run
Anton Gozhiy [Thu, 28 Nov 2019 12:04:22 +0000 (14:04 +0200)] 
DRILL-7393: Revisit Drill tests to ensure that patching is executed before any test run

- Added BaseTest with patchers and extended all tests from it.
- Added a test to java-exec module to ensure that all tests there are inherited from BaseTest.
- Revised exception handling in the patchers, now it's individual for each patching method.

closes #1910

2 years agoUpdate Slack Link in README.md
Charles S. Givre [Tue, 3 Dec 2019 19:36:11 +0000 (14:36 -0500)] 
Update Slack Link in README.md

This commit updates the slack link in the `README.md` file so that people can join the channel directly.

JIRA: https://issues.apache.org/jira/browse/DRILL-7462

2 years agoDRILL-7456: Batch count fixes for 12 operators
Paul Rogers [Sat, 23 Nov 2019 01:28:24 +0000 (17:28 -0800)] 
DRILL-7456: Batch count fixes for 12 operators

Enables batch validation for 12 additional operators:

* MergingRecordBatch
* OrderedPartitionRecordBatch
* RangePartitionRecordBatch
* TraceRecordBatch
* UnionAllRecordBatch
* UnorderedReceiverBatch
* UnpivotMapsRecordBatch
* WindowFrameRecordBatch
* TopNBatch
* HashJoinBatch
* ExternalSortBatch
* WriterRecordBatch

Fixes issues found with those checks so that this set of
operators passes all checks.

Includes code cleanup in many files touched during this
work.

closes #1906

2 years agoDRILL-7443: Enable PCAP Plugin to Reassemble TCP Streams
Charles Givre [Fri, 22 Nov 2019 21:04:29 +0000 (16:04 -0500)] 
DRILL-7443: Enable PCAP Plugin to Reassemble TCP Streams

closes #1898

2 years agoDRILL-7453: Update joda-time to 2.10.5 to have correct time zone info
Bohdan Kazydub [Thu, 21 Nov 2019 16:43:45 +0000 (18:43 +0200)] 
DRILL-7453: Update joda-time to 2.10.5 to have correct time zone info

2 years agoDRILL-7445: Create batch copier based on result set framework
Paul Rogers [Mon, 11 Nov 2019 00:17:49 +0000 (16:17 -0800)] 
DRILL-7445: Create batch copier based on result set framework

The result set framework now provides both a reader and writer.
This PR provides a copier that copies batches using this
framework. Such a copier can:

- Copy selected records
- Copy all records, such as for an SV2 or SV4

The copier uses the result set loader to create uniformly-sized
output batches from input batches of any size. It does this
by merging or splitting input batches as needed.
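
As a rough illustration of the merge-or-split idea, the sketch below regroups int[] "batches" into fixed-size outputs. It is a toy model under stated assumptions: rows are plain ints, and the class and method names are hypothetical rather than part of the result set framework.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Toy rebatcher: merges or splits input batches into uniformly sized outputs. */
final class RebatchSketch {
    static List<int[]> rebatch(List<int[]> inputs, int targetSize) {
        if (targetSize <= 0) {
            throw new IllegalArgumentException("targetSize must be positive");
        }
        List<int[]> outputs = new ArrayList<>();
        int[] current = new int[targetSize];
        int filled = 0;
        for (int[] batch : inputs) {
            for (int row : batch) {
                current[filled++] = row;
                if (filled == targetSize) {
                    // Output batch is full: emit it and start a new one.
                    outputs.add(current);
                    current = new int[targetSize];
                    filled = 0;
                }
            }
        }
        if (filled > 0) {
            // The final batch may be smaller than the target size.
            outputs.add(Arrays.copyOf(current, filled));
        }
        return outputs;
    }
}
```

Small inputs are merged into one output and large inputs are split across several; only the last output may be short.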

Since the result set reader handles both SV2 and SV4s, the
copier can filter or reorder rows based on the SV associated
with the input batch.

This version assumes a single stream of input batches, and handles
any schema changes in that input by creating output batches
that track the input schema. This would be used in, say, the
selection vector remover. A different design is needed for merging,
such as in the merging receiver.

Adds a "copy" method to the column writers. Copy is implemented
by doing a direct memory copy from source to destination vectors.

A unit test verifies functionality for various use cases
and data types.

closes #1899

2 years agoDRILL-7441: Fix issues with fillEmpties, offset vectors
Paul Rogers [Fri, 8 Nov 2019 04:56:13 +0000 (20:56 -0800)] 
DRILL-7441: Fix issues with fillEmpties, offset vectors

Fixes subtle issues with offset vectors and "fill empties"
logic.

Drill has an informal standard that if a batch has no rows, then
offset vectors within that batch should have zero size. Contrast
this with batches of size 1 that should have offset vectors of
size 2. Changed to enforce this rule throughout.
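
The sizing rule can be illustrated with a small sketch that builds an offset array from value lengths. The helper below is hypothetical and uses a plain int[] rather than a Drill vector.

```java
/** Toy offset-vector builder illustrating the zero-row sizing rule. */
final class OffsetVectorSketch {
    /**
     * Returns the offsets for the given value lengths: an empty array for
     * zero rows, otherwise rowCount + 1 entries starting at 0.
     */
    static int[] buildOffsets(int[] valueLengths) {
        if (valueLengths.length == 0) {
            return new int[0]; // zero rows: zero-size offset vector
        }
        int[] offsets = new int[valueLengths.length + 1];
        for (int i = 0; i < valueLengths.length; i++) {
            offsets[i + 1] = offsets[i] + valueLengths[i];
        }
        return offsets;
    }
}
```

A batch with one 4-byte value therefore gets the two offsets 0 and 4, while an empty batch gets no offsets at all.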

Nullable, repeated and variable-width vectors have "fill empties"
logic that is used in two places: when setting the value count and
when preparing to write a new value. The current logic is not
quite right for either case. Added tests and fixed the code to
properly handle each case.

Revised the batch validator to enforce the rule that offset vectors have
length 0 for 0-sized batches. The result was much simpler code.

Added tools to easily print a batch, restoring some code that
was recently lost when the RowSet classes were moved.

Code cleanup in all files touched.

Added logic to "dirty" allocated buffers when testing to ensure
logic is not sensitive to the "pristine" state of new buffers.

Added logic to the column writers to enforce the zero-size-batch rule
for offset vectors. Added unit tests for this case.

Fixed the column writers to set the "lastSet" mutator value for
nullable types since other code relies on this value.

Removed the "setCount" field in nullable vectors: turns out
it is not actually used.

closes #1896

2 years agoDRILL-7448: Fix warnings when running Drill memory tests
Bohdan Kazydub [Mon, 18 Nov 2019 17:51:55 +0000 (19:51 +0200)] 
DRILL-7448: Fix warnings when running Drill memory tests

closes #1902

2 years agoDRILL-7388: Kafka improvements
Arina Ielchiieva [Fri, 15 Nov 2019 14:01:52 +0000 (16:01 +0200)] 
DRILL-7388: Kafka improvements

1. Upgraded Kafka libraries to 2.3.1 (DRILL-6739).
2. Added new options to support the same features as native JSON reader:
  a. store.kafka.reader.skip_invalid_records, default: false (DRILL-6723);
  b. store.kafka.reader.allow_nan_inf, default: true;
  c. store.kafka.reader.allow_escape_any_char, default: false.
3. Fixed issue when Kafka topic contains only one message (DRILL-7388).
4. Replaced Gson parser with Jackson to parse JSON in the same manner as Drill's native JSON reader.
5. Performance improvements: Kafka consumers are now closed asynchronously, fixed a resource leak (DRILL-7290), and moved unnecessary info logging to debug level.
6. Updated bootstrap-storage-plugins.json to reflect actual Kafka connection properties.
7. Added unit tests.
8. Refactoring and code clean up.

closes #1901

2 years agoDRILL-7446: Fix Eclipse compilation issue in AbstractParquetGroupScan
Paul Rogers [Sun, 17 Nov 2019 05:58:12 +0000 (21:58 -0800)] 
DRILL-7446: Fix Eclipse compilation issue in AbstractParquetGroupScan

Adds dummy parameter types to several files to avoid compilation errors
when loading Drill into Eclipse.

2 years agoDRILL-7273: Introduce operators for handling metadata
Volodymyr Vysotskyi [Wed, 26 Jun 2019 15:11:59 +0000 (18:11 +0300)] 
DRILL-7273: Introduce operators for handling metadata

closes #1886

2 years agoDRILL-7372: MethodAnalyzer consumes too much memory
Volodymyr Vysotskyi [Wed, 30 Oct 2019 14:49:45 +0000 (16:49 +0200)] 
DRILL-7372: MethodAnalyzer consumes too much memory

closes #1887

2 years agoDRILL-7409: Moving test with huge test data to the drill-test-framework.
Denys Ordynskiy [Tue, 5 Nov 2019 12:55:09 +0000 (14:55 +0200)] 
DRILL-7409: Moving test with huge test data to the drill-test-framework.

closes #1891

2 years agoDRILL-7440: Failure during loading of RepeatedCount functions
Bohdan Kazydub [Thu, 7 Nov 2019 17:12:26 +0000 (19:12 +0200)] 
DRILL-7440: Failure during loading of RepeatedCount functions

closes #1894

2 years agoDRILL-7442: Create multi-batch row set reader
Paul Rogers [Mon, 11 Nov 2019 00:17:49 +0000 (16:17 -0800)] 
DRILL-7442: Create multi-batch row set reader

Adds a ResultSetReader that works across multiple batches
in a result set. Reuses the same row set and readers if
schema is unchanged, creates a new set if the schema changes.

Adds a unit test for the result set reader.

Adds a "rebind" capability to the row set readers to focus
on new buffers under an existing set of vectors. Used when
a new batch arrives, if the schema is unchanged.

Extends row set classes to be aware of the BatchAccessor class
which encapsulates a container and optional selection vector,
and tracks schema changes.

Moves row set tests into the same package as the row sets.
(Row set classes were moved a while back, but the tests were
not moved.)

Renames some BatchAccessor methods.

closes #1897

2 years agoDRILL-7439: Batch count fixes for six additional operators
Paul Rogers [Tue, 5 Nov 2019 21:50:56 +0000 (13:50 -0800)] 
DRILL-7439: Batch count fixes for six additional operators

Enables vector checks, and fixes batch count and vector issues for:

* StreamingAggBatch
* RuntimeFilterRecordBatch
* FlattenRecordBatch
* MergeJoinBatch
* NestedLoopJoinBatch
* LimitRecordBatch

Also fixes a zero-size batch validity issue for the CSV reader when
all files contain no data.

Includes code cleanup for files touched in this PR.

closes #1893

2 years agoDRILL-7436: Fix record count, vector structure issues in several operators
Paul Rogers [Mon, 4 Nov 2019 00:21:54 +0000 (16:21 -0800)] 
DRILL-7436: Fix record count, vector structure issues in several operators

Adds additional vector checks to the BatchValidator.

Enables checking for the following operators:

* FilterRecordBatch
* PartitionLimitRecordBatch
* UnnestRecordBatch
* HashAggBatch
* RemovingRecordBatch

Fixes vector count issues for each of these.

Fixes empty-batch (record count = 0) handling in several of the
above operators. Added a method to VectorContainer to correctly
create an empty batch. (An empty batch, counter-intuitively,
needs vectors allocated to hold the 0 value in the first
position of each offset vector.)

Disables verbose logging for MongoDB tests. Details are written to
the log rather than the console.

Disables two invalid Mongo tests. See DRILL-7428.

Adjusts the expression tree materializer to not add the LATE type
to Union vectors. (See DRILL-7435.)

Ensures that Union vectors contain valid vectors for each subtype.
The present fix is a work-around, see DRILL-7434 for a better
long-term fix.

Cleans up code formatting and other minor issues in each file touched
during the fixes in this PR.

2 years agoDRILL-7391: Wrong result when doing left outer join on CSV table
Volodymyr Vysotskyi [Tue, 29 Oct 2019 10:49:34 +0000 (12:49 +0200)] 
DRILL-7391: Wrong result when doing left outer join on CSV table

2 years agoDRILL-7397: Fix logback errors when building the project
Bohdan Kazydub [Mon, 28 Oct 2019 14:19:00 +0000 (16:19 +0200)] 
DRILL-7397: Fix logback errors when building the project

2 years agoDRILL-7418: MetadataDirectGroupScan improvements
Arina Ielchiieva [Tue, 22 Oct 2019 14:33:04 +0000 (17:33 +0300)] 
DRILL-7418: MetadataDirectGroupScan improvements

1. Replaced files listing with selection root information to reduce query plan size in MetadataDirectGroupScan.
2. Fixed MetadataDirectGroupScan ser / de issues.
3. Added PlanMatcher to QueryBuilder for more convenient plan matching.
4. Rewrote TestConvertCountToDirectScan to use ClusterTest.
5. Refactoring and code clean up.

2 years agoDRILL-7351: Added tokens to Web forms to prevent CSRF attacks
Anton Gozhiy [Fri, 13 Sep 2019 14:21:05 +0000 (17:21 +0300)] 
DRILL-7351: Added tokens to Web forms to prevent CSRF attacks

2 years agoDRILL-7424: Project operator fails to set the container row count
Paul Rogers [Sun, 27 Oct 2019 07:23:25 +0000 (00:23 -0700)] 
DRILL-7424: Project operator fails to set the container row count

Enabled the "batch validator" for the Project operator. Ran tests.
Exceptions occurred because, in some paths, the Project operator
fails to set the container row count.

Fixes the project operator. Cleans up formatting issues in files
touched during the investigation. Cleaned up batch-related issues
in Project.

2 years agoDRILL-7347: Upgrade Apache Iceberg to released version
Arina Ielchiieva [Mon, 28 Oct 2019 11:51:21 +0000 (13:51 +0200)] 
DRILL-7347: Upgrade Apache Iceberg to released version

2 years agoDRILL-4303: ESRI Shapefile (shp) Format Plugin
Charles Givre [Mon, 28 Oct 2019 14:39:20 +0000 (10:39 -0400)] 
DRILL-4303: ESRI Shapefile (shp) Format Plugin

2 years agoDRILL-7177: Format Plugin for Excel Files
Charles Givre [Mon, 28 Oct 2019 11:30:15 +0000 (07:30 -0400)] 
DRILL-7177: Format Plugin for Excel Files

closes #1749

2 years agoDRILL-1709: Add desc alias for describe command
Arina Ielchiieva [Thu, 24 Oct 2019 12:11:24 +0000 (15:11 +0300)] 
DRILL-1709: Add desc alias for describe command

closes #1881

2 years agoDRILL-7417: Add user logged in/out event in info level logs 1880/head
Sorabh Hamirwasia [Tue, 22 Oct 2019 21:16:52 +0000 (14:16 -0700)] 
DRILL-7417: Add user logged in/out event in info level logs

2 years agoDRILL-7413: Test and fix scan operator vectors
Paul Rogers [Sun, 20 Oct 2019 19:03:28 +0000 (12:03 -0700)] 
DRILL-7413: Test and fix scan operator vectors

Enables vector validation tests for the ScanBatch and all
EasyFormat plugins. Fixes a bug in scan batch that failed to set
the record count in the output container.

Fixes a number of formatting and other issues found while adding
the tests.

2 years agoDRILL-5674: Support ZIP compression
Arina Ielchiieva [Fri, 18 Oct 2019 15:22:15 +0000 (18:22 +0300)] 
DRILL-5674: Support ZIP compression

1. Added ZipCodec implementation which can read / write single file.
2. Revisited Drill plugin formats to ensure 'openPossiblyCompressedStream' method is used in those which support compression.
3. Added unit tests.
4. General refactoring.
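
For readers unfamiliar with single-file ZIP handling, the sketch below shows the general idea using only java.util.zip. It is not the ZipCodec added by this commit; the class and method names are hypothetical, and reading only the first entry is an assumption made for the illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

/** Writes and reads a ZIP archive that holds exactly one file. */
final class SingleFileZipSketch {
    static byte[] compress(String entryName, byte[] data) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
            zip.putNextEntry(new ZipEntry(entryName));
            zip.write(data);
            zip.closeEntry();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    static byte[] decompress(byte[] archive) {
        try (ZipInputStream zip = new ZipInputStream(new ByteArrayInputStream(archive))) {
            if (zip.getNextEntry() == null) {
                throw new IOException("archive contains no entries");
            }
            // Reads up to the end of the first entry; later entries are ignored.
            return zip.readAllBytes();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```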

2 years agoDRILL-7414: EVF incorrectly sets buffer writer index after rollover
Paul Rogers [Sun, 20 Oct 2019 21:09:26 +0000 (14:09 -0700)] 
DRILL-7414: EVF incorrectly sets buffer writer index after rollover

Enabling the vector validator on the "new" scan operator, in cases
in which overflow occurs, identified that the DrillBuf writer index
was not properly set for repeated vectors.

Enables such checking, adds unit tests, and fixes the writer index
issue.

closes #1878

2 years agoDRILL-7403: Validate batch checks, vector integrity in unit tests
Paul Rogers [Sun, 13 Oct 2019 19:41:03 +0000 (12:41 -0700)] 
DRILL-7403: Validate batch checks, vector integrity in unit tests

Enhances the existing record batch checks to check all the various
batch record counts, and to more fully validate all vector types.

This code revealed that virtually all record batches have
problems: they omit setting some record count or other, or they
introduce some form of vector corruption.

Since we want things to work as we make fixes, this change enables
the checks for only one record batch: the "new" scan. Others are
to come as they are fixed.

closes #1871

2 years agoDRILL-6096: Provide mechanism to configure text writer configuration
Arina Ielchiieva [Thu, 10 Oct 2019 12:43:33 +0000 (15:43 +0300)] 
DRILL-6096: Provide mechanism to configure text writer configuration

1. Format plugin configuration can specify line and field delimiters, quotes and escape characters.
2. System / session options can specify whether the writer should add headers or force quotes.

closes #1873

2 years agoDRILL-7405: Avoiding download of TPC-H data
Abhishek Girish [Thu, 17 Oct 2019 03:24:27 +0000 (20:24 -0700)] 
DRILL-7405: Avoiding download of TPC-H data

closes #1874

2 years agoDRILL-7412: Minor unit test improvements
Paul Rogers [Sun, 20 Oct 2019 06:48:39 +0000 (23:48 -0700)] 
DRILL-7412: Minor unit test improvements

Many tests intentionally trigger errors. A debug-only log setting
sent those errors to stdout. The resulting stack dumps simply cluttered
the test output, so disabled error output to the console.

Drill can apply bounds checks to vectors. Tests run via Maven
enable bounds checking. Now, bounds checking is also enabled in
"debug mode" (when assertions are enabled, as in an IDE.)

Drill contains two test frameworks. The older BaseTestQuery was
marked as deprecated, but many tests still use it and are unlikely
to be changed soon. So, removed the deprecated marker to reduce the
number of spurious warnings.

Also includes a number of minor clean-ups.
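
The "debug mode" bounds checking mentioned above relies on detecting whether assertions are enabled. A common Java idiom for that is sketched below; the class and method names are hypothetical, and how Drill actually wires this into its bounds-check setting is not shown here.

```java
/** Detects whether the JVM runs with assertions enabled (-ea). */
final class DebugModeSketch {
    static boolean assertionsEnabled() {
        boolean enabled = false;
        // The assignment inside the assert executes only when assertions are on.
        assert enabled = true;
        return enabled;
    }

    /**
     * Hypothetical policy: check bounds when the build asks for it
     * or when running in debug mode.
     */
    static boolean boundsCheckingEnabled(boolean requestedByBuild) {
        return requestedByBuild || assertionsEnabled();
    }
}
```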

closes #1876

2 years agoDRILL-7402: Suppress batch dumps for expected failures in tests
Paul Rogers [Sun, 13 Oct 2019 21:43:27 +0000 (14:43 -0700)] 
DRILL-7402: Suppress batch dumps for expected failures in tests

Drill provides a way to dump the last few batches when an error
occurs. However, in tests, we often deliberately cause something
to fail. In this case, the batch dump is unnecessary.

This enhancement adds a config property, disabled in tests, that
controls the dump activity. The option is enabled in the one test
that needs it enabled.

closes #1872

2 years agoDRILL-7401: Upgrade to SqlLine 1.9.0
Arina Ielchiieva [Wed, 16 Oct 2019 15:10:20 +0000 (18:10 +0300)] 
DRILL-7401: Upgrade to SqlLine 1.9.0

closes #1875

2 years agoDRILL-7385: Convert PCAP Format Plugin to EVF
Charles Givre [Sat, 12 Oct 2019 23:46:13 +0000 (19:46 -0400)] 
DRILL-7385: Convert PCAP Format Plugin to EVF

2 years agoDRILL-7377: Nested schemas for dynamic EVF columns
Paul Rogers [Mon, 7 Oct 2019 05:09:44 +0000 (22:09 -0700)] 
DRILL-7377: Nested schemas for dynamic EVF columns

The Result Set Loader (part of EVF) allows adding columns up-front
before reading rows (so-called "early schema.") Such schemas allow
nested columns (maps with members, repeated lists with a type, etc.)

The Result Set Loader also allows adding columns dynamically
while loading data (so-called "late schema".) Previously, the code
assumed that columns would be added top-down: first the map, then
the map's contents, etc.

Charles found a need to allow adding a nested column (a repeated
list with a declared list type.)

This patch revises the code to use the same mechanism in both the
early- and late-schema cases, allowing adding nested columns at
any time.

Testing: Added a new unit test case for the repeated list late
schema with content case.

2 years agoDRILL-7254: Read Hive union w/o nulls
Igor Guzenko [Wed, 25 Sep 2019 15:58:39 +0000 (18:58 +0300)] 
DRILL-7254: Read Hive union w/o nulls

2 years agoDRILL-7358: Fix COUNT(*) for empty text files
Paul Rogers [Sun, 6 Oct 2019 01:57:14 +0000 (18:57 -0700)] 
DRILL-7358: Fix COUNT(*) for empty text files

Fixes a subtle error when a text file has a header (and so has a
schema), but is in a COUNT(*) query, so that no columns are
projected. Ensures that, in this case, an empty schema is
treated as a valid result set.

Tests: updated CSV tests to include this case.

closes #1867

2 years agoDRILL-5983: Add missing nullable Parquet readers for INT and UINT logical types
Arina Ielchiieva [Fri, 4 Oct 2019 11:36:01 +0000 (14:36 +0300)] 
DRILL-5983: Add missing nullable Parquet readers for INT and UINT logical types

closes #1866

2 years agoDRILL-7387: Failed to get value by int key from map nested into struct
Igor Guzenko [Wed, 25 Sep 2019 10:49:51 +0000 (13:49 +0300)] 
DRILL-7387: Failed to get value by int key from map nested into struct

2 years agoDRILL-7374: Support for IPV6 address
Arjun Gupta [Wed, 18 Sep 2019 05:43:44 +0000 (11:13 +0530)] 
DRILL-7374: Support for IPV6 address

closes #1857

2 years agoDRILL-7174: Expose complex to Json control in the Drill C++ Client
Arjun Gupta [Tue, 25 Jun 2019 09:51:19 +0000 (15:21 +0530)] 
DRILL-7174: Expose complex to Json control in the Drill C++ Client

closes #1814

2 years agoDRILL-7357: Expose Drill Metastore data through information_schema
Arina Ielchiieva [Fri, 20 Sep 2019 16:11:31 +0000 (19:11 +0300)] 
DRILL-7357: Expose Drill Metastore data through information_schema

1. Add additional columns to TABLES and COLUMNS tables.
2. Add PARTITIONS table.
3. General refactoring to adjust information_schema data retrieval from multiple sources.

closes #1860

2 years agoDRILL-7170: Ignore uninitialized vector containers for OOM error messages
Ben-Zvi [Thu, 26 Sep 2019 00:27:13 +0000 (17:27 -0700)] 
DRILL-7170: Ignore uninitialized vector containers for OOM error messages

2 years agoDRILL-7380: Query of a field inside of an array of structs returns null
Igor Guzenko [Fri, 20 Sep 2019 16:41:23 +0000 (19:41 +0300)] 
DRILL-7380: Query of a field inside of an array of structs returns null

1. Fixed parquet reader projection for Logical lists (DrillParquetReader.java)
2. Fixed projection pushdown for RexFieldAccess (ProjectFieldsVisitor.java)
3. DrillParquetReader.getProjection(...) split into several methods
4. Added javadocs for PathSegment and SchemaPath

2 years agoDRILL-7252: Read Hive map using Dict<K,V> vector
Igor Guzenko [Thu, 5 Sep 2019 15:05:53 +0000 (18:05 +0300)] 
DRILL-7252: Read Hive map using Dict<K,V> vector

2 years agoDRILL-7373: Fix problems involving reading from DICT type
Bohdan Kazydub [Thu, 12 Sep 2019 17:10:25 +0000 (20:10 +0300)] 
DRILL-7373: Fix problems involving reading from DICT type

- Fixed FieldIdUtil to resolve reading from DICT for some complex cases;
- optimized reading from DICT given a key by passing an appropriate Object type to DictReader#find(...) and DictReader#read(...) methods when the schema is known (e.g. when reading from Hive tables) instead of generating it on the fly based on int or String path and key type;
- fixed error when accessing a value by a nonexistent key in an Avro table.

2 years agoDRILL-7376: Drill ignores Hive schema for MaprDB tables when group scan has star...
Volodymyr Vysotskyi [Wed, 11 Sep 2019 17:51:07 +0000 (20:51 +0300)] 
DRILL-7376: Drill ignores Hive schema for MaprDB tables when group scan has star column

2 years agoDRILL-7368: Fix Iceberg Metastore failure when filter column contains nulls
Arina Ielchiieva [Tue, 10 Sep 2019 12:30:10 +0000 (15:30 +0300)] 
DRILL-7368: Fix Iceberg Metastore failure when filter column contains nulls

2 years agoDRILL-7168: Implement ALTER SCHEMA ADD / REMOVE commands
Arina Ielchiieva [Thu, 29 Aug 2019 13:15:43 +0000 (16:15 +0300)] 
DRILL-7168: Implement ALTER SCHEMA ADD / REMOVE commands

2 years agoDRILL-7369: Schema for MaprDB tables is not used for the case when several fields...
Volodymyr Vysotskyi [Fri, 6 Sep 2019 11:57:41 +0000 (14:57 +0300)] 
DRILL-7369: Schema for MaprDB tables is not used for the case when several fields are queried

closes #1852

2 years agoDRILL-7367: Remove Server details from response headers
Arina Ielchiieva [Thu, 5 Sep 2019 14:04:23 +0000 (17:04 +0300)] 
DRILL-7367: Remove Server details from response headers

closes #1851

2 years agoDRILL-7362: COUNT(*) on JSON with outer list results in JsonParse error
ozinoviev [Thu, 29 Aug 2019 12:14:17 +0000 (15:14 +0300)] 
DRILL-7362: COUNT(*) on JSON with outer list results in JsonParse error

closes #1849

2 years agoDRILL-7343: Add User-Agent UDFs to Drill
Charles Givre [Thu, 5 Sep 2019 14:29:01 +0000 (10:29 -0400)] 
DRILL-7343: Add User-Agent UDFs to Drill

closes #1840

2 years agoDRILL-7096: Develop vector for canonical Map<K,V>
Bohdan Kazydub [Mon, 25 Mar 2019 14:40:32 +0000 (16:40 +0200)] 
DRILL-7096: Develop vector for canonical Map<K,V>

- Added new type DICT;
- Created value vectors for the type for single and repeated modes;
- Implemented corresponding FieldReaders and FieldWriters;
- Made changes in EvaluationVisitor to be able to read values from the map by key;
- Made changes to DrillParquetGroupConverter to be able to read Parquet's MAP type;
- Added an option `store.parquet.reader.enable_map_support` to disable reading MAP type as DICT from Parquet files;
- Updated AvroRecordReader to use new DICT type for Avro's MAP;
- Added support of the new type to ParquetRecordWriter.

2 years agoDRILL-7253: Read Hive struct w/o nulls
Igor Guzenko [Wed, 19 Jun 2019 09:39:11 +0000 (12:39 +0300)] 
DRILL-7253: Read Hive struct w/o nulls

2 years agoDRILL-7360: Refactor WatchService in Drillbit class and fix concurrency issues
Arina Ielchiieva [Fri, 23 Aug 2019 13:30:30 +0000 (16:30 +0300)] 
DRILL-7360: Refactor WatchService in Drillbit class and fix concurrency issues

2 years agoDRILL-7353: Wrong driver class is written to the java.sql.Driver
Anton Gozhiy [Mon, 19 Aug 2019 17:33:14 +0000 (20:33 +0300)] 
DRILL-7353: Wrong driver class is written to the java.sql.Driver

closes #1845

2 years agoDRILL-7222: Visualize estimated and actual row counts for a query
Kunal Khatua [Thu, 22 Aug 2019 17:00:58 +0000 (10:00 -0700)] 
DRILL-7222: Visualize estimated and actual row counts for a query

With statistics in place, it is useful to show the estimated row count alongside the actual row count in the query profile's operator overview. A toggle button enables this, with the estimated rows hidden by default.

We can extract this from the Physical Plan section of the profile.
Added a toggle-ready table-column header

closes #1779

2 years agoDRILL-7339: Iceberg commit upgrade and Metastore tests categorization
Arina Ielchiieva [Wed, 14 Aug 2019 16:17:46 +0000 (19:17 +0300)] 
DRILL-7339: Iceberg commit upgrade and Metastore tests categorization

1. Upgraded Iceberg commit to fix issue with deletes in transaction
2. Categorized Metastore tests

closes #1842

2 years agoDRILL-7326: Support repeated lists for CTAS parquet format
Igor Guzenko [Mon, 19 Aug 2019 17:02:51 +0000 (20:02 +0300)] 
DRILL-7326: Support repeated lists for CTAS parquet format

closes #1844

2 years agoDRILL-7356: Introduce session options for the Drill Metastore
Volodymyr Vysotskyi [Thu, 22 Aug 2019 16:01:15 +0000 (19:01 +0300)] 
DRILL-7356: Introduce session options for the Drill Metastore

closes #1846

2 years agoDRILL-7156: Support empty Parquet files creation
Oleg Zinoviev [Sun, 16 Jun 2019 18:21:46 +0000 (21:21 +0300)] 
DRILL-7156: Support empty Parquet files creation

closes #1836

2 years agoDRILL-7350: Move RowSet related classes from test folder
Volodymyr Vysotskyi [Thu, 15 Aug 2019 12:25:10 +0000 (15:25 +0300)] 
DRILL-7350: Move RowSet related classes from test folder

2 years agoDRILL-4517: Support reading empty Parquet files
Arina Ielchiieva [Tue, 6 Aug 2019 20:16:36 +0000 (23:16 +0300)] 
DRILL-4517: Support reading empty Parquet files

1. Modified flat and complex parquet readers to output schema only when requested number of records to read is 0. In this case readers are not initialized to improve performance.
2. Allowed reading requested number of rows instead of all rows in the row group (DRILL-6528).
3. Fixed determination of the number of nulls in the row group (fixed IsPredicate#isAllNulls method).
4. Allowed reading empty parquet files by adding an empty / fake row group.
5. General refactoring and unit tests.
6. Parquet tests categorization.

closes #1839

2 years agoDRILL-6961: Handle exceptions during queries to information_schema
Anton Gozhiy [Wed, 10 Jul 2019 16:59:15 +0000 (19:59 +0300)] 
DRILL-6961: Handle exceptions during queries to information_schema

closes #1833

2 years agoDRILL-7338: REST API calls to Drill fail due to insufficient heap memory
Kunal Khatua [Thu, 8 Aug 2019 19:34:12 +0000 (12:34 -0700)] 
DRILL-7338: REST API calls to Drill fail due to insufficient heap memory

This PR makes the 85% heap-usage threshold configurable; a value of 0 disables the check.
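
A minimal sketch of such a threshold check follows. The class and method names are hypothetical, and measuring usage through java.lang.Runtime is an assumption for illustration, not necessarily how Drill measures heap usage.

```java
/** Heap-usage guard with a configurable threshold; 0 disables the check. */
final class HeapGuardSketch {
    /** Returns true when the used fraction of the heap exceeds the threshold. */
    static boolean shouldReject(long usedBytes, long maxBytes, double threshold) {
        if (threshold <= 0.0 || maxBytes <= 0) {
            return false; // check disabled, or maximum heap size unknown
        }
        return (double) usedBytes / maxBytes > threshold;
    }

    /** Applies the check to the current JVM's heap figures. */
    static boolean shouldRejectNow(double threshold) {
        Runtime runtime = Runtime.getRuntime();
        long used = runtime.totalMemory() - runtime.freeMemory();
        return shouldReject(used, runtime.maxMemory(), threshold);
    }
}
```

With a threshold of 0.85, a request arriving at 90% heap usage would be rejected, while the same request passes when the threshold is set to 0.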

closes #1837

2 years agoDRILL-7341: Vector reAlloc may fail after exchange
ozinoviev [Wed, 7 Aug 2019 13:38:01 +0000 (16:38 +0300)] 
DRILL-7341: Vector reAlloc may fail after exchange

closes #1838

2 years agoDRILL-7337: Add vararg UDFs support
Volodymyr Vysotskyi [Fri, 2 Aug 2019 14:34:33 +0000 (17:34 +0300)] 
DRILL-7337: Add vararg UDFs support