Arina Ielchiieva [Mon, 24 Jul 2017 13:47:07 +0000 (13:47 +0000)]
[maven-release-plugin] prepare release drill-1.11.0
chunhui-shi [Wed, 8 Mar 2017 07:39:32 +0000 (23:39 -0800)]
DRILL-5165: For limit all case, no need to push down limit to scan
Arina Ielchiieva [Thu, 29 Jun 2017 13:08:33 +0000 (16:08 +0300)]
DRILL-4720: Fix SchemaPartitionExplorer.getSubPartitions method implementations to return only Drill file system directories
1. Added file system util helper classes to standardize list directory and file statuses usage in Drill with appropriate unit tests.
2. Fixed SchemaPartitionExplorer.getSubPartitions method implementations to return only directories that can be partitions according to Drill file system rules
(excluded all files and directories that start with dot or underscore).
3. Added unit test for directory explorers UDFs with and without metadata cache presence.
4. Minor refactoring.
closes #864
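The filtering rule from item 2 above can be sketched as follows. This is only an illustration of the rule as described in the commit message (directories only, skipping names that start with a dot or underscore); `PartitionDirFilter`, `isPartitionCandidate` and `subPartitions` are hypothetical names, not the helper classes added by the commit.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; not the file system util classes added by DRILL-4720.
final class PartitionDirFilter {

  // A name qualifies only if it is a directory and does not start with '.' or '_'.
  static boolean isPartitionCandidate(String name, boolean isDirectory) {
    return isDirectory
        && !name.isEmpty()
        && !name.startsWith(".")
        && !name.startsWith("_");
  }

  // Keeps the qualifying entries (name -> isDirectory), preserving input order.
  static List<String> subPartitions(Map<String, Boolean> entries) {
    List<String> result = new ArrayList<>();
    for (Map.Entry<String, Boolean> e : entries.entrySet()) {
      if (isPartitionCandidate(e.getKey(), e.getValue())) {
        result.add(e.getKey());
      }
    }
    return result;
  }

  public static void main(String[] args) {
    Map<String, Boolean> entries = new LinkedHashMap<>();
    entries.put("2017", true);          // regular directory: kept
    entries.put("_tmp", true);          // underscore directory: skipped
    entries.put(".hidden", true);       // dot directory: skipped
    entries.put("data.parquet", false); // file: skipped
    System.out.println(subPartitions(entries));
  }
}
```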
Roman Kulyk [Thu, 20 Jul 2017 13:33:49 +0000 (16:33 +0300)]
DRILL-5083: status.getOutcome() returns FAILURE if one of the batches has STOP status (to avoid infinite loop in Merge Join).
closes #881
Boaz Ben-Zvi [Tue, 18 Jul 2017 22:18:08 +0000 (15:18 -0700)]
DRILL-5665: Add the option planner.force_2phase_aggr to override small inputs
closes #872
Boaz Ben-Zvi [Mon, 17 Jul 2017 21:21:29 +0000 (14:21 -0700)]
DRILL-5669: Add a configurable option for minimum memory allocation to buffered ops
closes #879
Boaz Ben-Zvi [Tue, 11 Jul 2017 00:53:28 +0000 (17:53 -0700)]
DRILL-5616: Add memory checks, plus minor metrics changes
closes #871
Rob Wu [Wed, 19 Jul 2017 05:55:52 +0000 (22:55 -0700)]
DRILL-5678: Undefined behavior due to un-initialized values in ServerMetaContext
closes #880
Roman Kulyk [Thu, 6 Jul 2017 15:27:34 +0000 (18:27 +0300)]
DRILL-4511: Add unit tests for "Table does not exist" situation in case of empty directory or incorrect table name
closes #869
Volodymyr Vysotskyi [Tue, 4 Jul 2017 18:19:23 +0000 (18:19 +0000)]
DRILL-4755: Fix IOBE for convert_from/convert_to functions with incorrect encoding type
closes #867
cgivre [Mon, 3 Jul 2017 03:13:02 +0000 (23:13 -0400)]
DRILL-5634: Add Crypto Functions
closes #865
Parth Chandra [Wed, 12 Jul 2017 21:39:17 +0000 (14:39 -0700)]
DRILL-5659: Fix error code checking in reading from socket
This closes #876
Arina Ielchiieva [Sun, 16 Jul 2017 14:04:27 +0000 (14:04 +0000)]
Add Arina's PGP key.
Laurent Goujon [Wed, 12 Jul 2017 05:25:44 +0000 (22:25 -0700)]
DRILL-5668: Fix C++ connector crash on error
Fix C++ connector crash when receiving error messages exceeding
a given size.
closes #873
Volodymyr Vysotskyi [Thu, 29 Jun 2017 15:28:05 +0000 (15:28 +0000)]
DRILL-4970: Prevent changing the negative value of input holder for cast functions
closes #863
Kunal Khatua [Thu, 29 Jun 2017 00:39:08 +0000 (17:39 -0700)]
DRILL-5420: ParquetAsyncPgReader goes into infinite loop during cleanup
PageQueue is now cleaned up using poll() instead of take(); take() was constantly being interrupted, causing CPU churn.
During a columnReader shutdown, a flag is set to block any new page reading tasks from being submitted.
closes #862
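The cleanup pattern described in DRILL-5420 can be sketched roughly as below. This is not the Drill reader code: `PageQueueCleanup`, `submit` and `close` are hypothetical names, and pages are represented by plain strings. It only shows the two ideas named in the message: a shutdown flag that rejects new work, and a drain loop based on the non-blocking `poll()` rather than the blocking `take()`.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical sketch of the cleanup pattern; not the async Parquet page reader.
final class PageQueueCleanup {
  private final BlockingQueue<String> pageQueue = new LinkedBlockingQueue<>();
  private final AtomicBoolean shuttingDown = new AtomicBoolean(false);

  // Rejects new work once shutdown has begun.
  boolean submit(String page) {
    if (shuttingDown.get()) {
      return false;
    }
    return pageQueue.offer(page);
  }

  // Drains with poll(), which returns null on an empty queue instead of blocking,
  // so the loop terminates without relying on thread interruption.
  int close() {
    shuttingDown.set(true);
    int released = 0;
    while (pageQueue.poll() != null) {
      released++;
    }
    return released;
  }

  public static void main(String[] args) {
    PageQueueCleanup reader = new PageQueueCleanup();
    reader.submit("page-1");
    reader.submit("page-2");
    System.out.println("released: " + reader.close());
    System.out.println("accepted after close: " + reader.submit("page-3"));
  }
}
```

The sketch leaves a small window between the flag check and `offer`; a real reader would need to close that gap, for example by draining again after in-flight tasks finish.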
Volodymyr Vysotskyi [Mon, 26 Jun 2017 18:12:34 +0000 (18:12 +0000)]
DRILL-4722: Fix EqualityVisitor for interval day expressions with millis
closes #861
Vlad Storona [Thu, 11 May 2017 13:53:08 +0000 (13:53 +0000)]
DRILL-5432: Added pcap-format support as format plugin
closes #831
Paul Rogers [Tue, 16 May 2017 22:55:41 +0000 (15:55 -0700)]
DRILL-5518: Test framework enhancements
* Create a SubOperatorTest base class to do routine setup and shutdown.
* Additional methods to simplify creating complex schemas with field
widths.
* Define a test workspace with plugin-specific options (as for the CSV
storage plugin)
* When verifying row sets, add methods to verify and release just the
"actual" batch in addition to the existing method to verify and free
both the actual and expected batches.
* Allow reading of row set values as object for generic comparisons.
* "Column builder" within schema builder to simplify building a single
MaterializedField for tests.
* Misc. code cleanup.
closes #851
Paul Rogers [Tue, 16 May 2017 20:20:32 +0000 (13:20 -0700)]
DRILL-5517: Size-aware set methods in value vectors
Please see DRILL-5517 for an explanation.
Also includes a workaround for DRILL-5529.
Implements a setEmpties method for repeated and non-nullable
variable-width types in support of the revised column accessors.
Unit test included. Without the setEmpties call, the tests fail with
vector corruption. With the call, things work properly.
closes #840
Arina Ielchiieva [Thu, 15 Jun 2017 13:01:54 +0000 (16:01 +0300)]
DRILL-5538: Create TopProject with validatedNodeType after PHYSICAL phase
closes #844
Padma Penumarthy [Thu, 15 Jun 2017 18:43:04 +0000 (11:43 -0700)]
DRILL-5587: Validate Parquet blockSize and pageSize configured with SYSTEM/SESSION option
close #852
Arina Ielchiieva [Tue, 20 Jun 2017 09:18:27 +0000 (12:18 +0300)]
DRILL-5599: Notify StatusHandler that batch sending has failed even if channel is still open
close #857
Paul Rogers [Fri, 16 Jun 2017 05:46:56 +0000 (22:46 -0700)]
DRILL-5590: Bugs in CSV field matching, null columns
Please see the problem and solution descriptions in DRILL-5590.
Also cleaned up some dead code left over from DRILL-5498.
close #855
Arina Ielchiieva [Thu, 15 Jun 2017 15:03:34 +0000 (18:03 +0300)]
DRILL-5130: Implement DrillValuesRel and ValuesPrel as Calcite Values sub-classes
Sorabh Hamirwasia [Mon, 5 Jun 2017 20:45:27 +0000 (13:45 -0700)]
DRILL-5568: Include hadoop-common jars inside drill-jdbc-all.jar
1) Introduce a new file inside exec/jdbc-all resource package which contains the property key/value pair for prefix.
2) At build time, based upon the profile chosen, this property value is set inside the file.
3) It is later consumed by SecurityConfiguration class to rename classpath for classes inside hadoop package.
DRILL-5568: Code review changes
close apache/drill#849
Vitalii Diravka [Fri, 14 Apr 2017 18:57:13 +0000 (18:57 +0000)]
DRILL-3867: Metadata Caching : Moving a directory which contains a cache file causes subsequent queries to fail
- change absolute to relative path in the parquet metadata cache files;
- add converting of the relative paths in the metadata to absolute ones;
- test case when table is moved to other place after creating meta cache files.
Changes according to the review
Minor changes according to the review
close apache/drill#824
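The relative/absolute conversion described in DRILL-3867 can be illustrated with `java.nio.file.Path`. This is a sketch under assumptions: `MetadataPaths`, `toRelative` and `toAbsolute` are hypothetical names, and the actual Drill code works with Hadoop paths and its own metadata classes, which are not shown here.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical sketch; the real change operates on Drill's Parquet metadata classes.
final class MetadataPaths {

  // Stored form: the file's path relative to the table root at cache-creation time.
  static String toRelative(Path tableRoot, Path file) {
    return tableRoot.relativize(file).toString();
  }

  // Read form: resolve the stored relative path against the table's current root.
  static Path toAbsolute(Path currentRoot, String relative) {
    return currentRoot.resolve(relative).normalize();
  }

  public static void main(String[] args) {
    Path oldRoot = Paths.get("/data/t1");
    Path newRoot = Paths.get("/archive/t1"); // table directory moved after caching
    String stored = toRelative(oldRoot, Paths.get("/data/t1/2017/a.parquet"));
    System.out.println(toAbsolute(newRoot, stored));
  }
}
```

Because only the relative part is persisted, the cache file stays valid when the whole table directory is moved, which is the scenario covered by the test case mentioned in the commit.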
Paul Rogers [Thu, 6 Apr 2017 20:57:19 +0000 (13:57 -0700)]
DRILL-5325: Unit tests for the managed sort
Uses the sub-operator test framework (DRILL-5318), including the test
row set abstraction (DRILL-5323) to enable unit testing of the
“managed” external sort. This PR allows early review of the code, but
cannot be pulled until the dependencies (mentioned above) are pulled.
Refactors the external sort code into small chunks that can be unit
tested, then “wraps” that code in tests for all interesting data types,
record batch sizes, and so on.
Refactors some of the operator definitions to more easily allow
programmatic setup in the unit tests.
Fixes a number of bugs discovered by the unit tests. The biggest
changes were in the new code: the code that computes spilling and
merging based on memory levels.
Otherwise, although GitHub will show many files changed, most of the
changes are simply moving blocks of code around to create smaller units
that can be tested independently.
Includes a refactoring of the code that does spilling, along with a
complete set of low-level unit tests.
Excludes long-running sort tests.
Defines a test category for long-running tests.
First attempt to provide a way to run such tests from Maven.
closes #808
Boaz Ben-Zvi [Tue, 20 Jun 2017 02:04:30 +0000 (19:04 -0700)]
DRILL-5457: Spill implementation for Hash Aggregate
closes #822
Paul Rogers [Mon, 15 May 2017 22:59:35 +0000 (15:59 -0700)]
DRILL-5514: Enhance VectorContainer to merge two row sets
Adds ability to merge two schemas and to merge two vector containers,
in each case producing a new, merged result. See DRILL-5514 for details.
Also provides a handy constructor to create a vector container given a
pre-defined schema.
closes #837
Vitalii Diravka [Thu, 25 May 2017 17:10:55 +0000 (17:10 +0000)]
DRILL-5544: Out of heap running CTAS against text delimited
The Parquet version of PageWriter does not allow direct memory to be used for allocating ByteBuffers,
so this PR introduces another version of PageWriter and PageWriteStore.
See more: https://issues.apache.org/jira/browse/PARQUET-1006.
closes #846
Sorabh Hamirwasia [Thu, 15 Jun 2017 18:00:21 +0000 (11:00 -0700)]
DRILL-5589: JDBC client crashes after successful authentication if trace logging is enabled
closes #854
Parth Chandra [Wed, 17 May 2017 23:56:13 +0000 (16:56 -0700)]
DRILL-5545: Fix issues reported by findbugs in Async Parquet Reader.
This closes #847
Parth Chandra [Tue, 23 May 2017 18:04:15 +0000 (11:04 -0700)]
DRILL-5545: Update POM to add support for running findbugs
Rob Wu [Mon, 5 Jun 2017 21:06:33 +0000 (14:06 -0700)]
DRILL-5541: C++ Client Crashes During Simple "Man in the Middle" Attack Test with Exploitable Write AV
This closes #850
Padma Penumarthy [Tue, 30 May 2017 17:17:48 +0000 (10:17 -0700)]
DRILL-5560: Create configuration file for distribution specific configuration
This closes #848
Sorabh Hamirwasia [Sat, 22 Apr 2017 01:34:19 +0000 (18:34 -0700)]
DRILL-5485: Remove WebServer dependency on DrillClient
1. Added WebUserConnection/AnonWebUserConnection and their providers for Authenticated and Anonymous web users.
2. Updated to store the UserSession, BufferAllocator and other session states inside the HttpSession of Jetty instead
of storing in DrillUserPrincipal. For each request now a new instance of WebUserConnection will be created. However
for authenticated users the UserSession and other states will be re-used whereas for Anonymous Users it will be created
for each request and later recycled after query execution.
close #829
Arina Ielchiieva [Thu, 25 May 2017 13:23:43 +0000 (13:23 +0000)]
DRILL-5537: Display columns alias for queries with sum() when RDBMS storage plugin is enabled
close #845
Volodymyr Vysotskyi [Wed, 12 Apr 2017 16:07:39 +0000 (16:07 +0000)]
DRILL-5140: Fix CompileException in run-time generated code when record batch has large number of fields.
- Changed estimation of max index value and added comments.
close #818
eskabetxe [Sat, 6 May 2017 11:41:36 +0000 (13:41 +0200)]
DRILL-5229: Update kudu-client to 1.3.0
closes #828
Paul Rogers [Wed, 15 Mar 2017 20:49:07 +0000 (13:49 -0700)]
DRILL-5356: Refactor Parquet Record Reader
The Parquet reader is Drill's premier data source and has worked very well
for many years. As with any piece of code, it has grown in complexity over
that time and has become hard to understand and maintain.
In work in another project, we found that Parquet is accidentally creating
"low density" batches: record batches with little actual data compared to
the amount of memory allocated. We'd like to fix that.
However, the current complexity of the reader code creates a barrier to
making improvements: the code is so complex that it is often better to
leave bugs unfixed, or risk spending large amounts of time struggling to
make small changes.
This commit offers to help revitalize the Parquet reader. Functionality is
identical to the code in master; but code has been pulled apart into
various classes each of which focuses on one part of the task: building
up a schema, keeping track of read state, a strategy for reading various
combinations of records, etc. The idea is that it is easier to understand
several small, focused classes than one huge, complex class. Indeed, the
idea of small, focused classes is common in the industry; it is nothing new.
Unit tests pass with the change. Since no logic has changed and we only moved
lines of code, that is a good indication that everything still works.
Also includes fixes based on review comments.
closes #789
Padma Penumarthy [Thu, 20 Apr 2017 00:25:20 +0000 (17:25 -0700)]
DRILL-5379: Set Hdfs Block Size based on Parquet Block Size
Provide an option to specify blocksize during file creation.
This will help create parquet files with single block on HDFS, helping improve performance when we read those files.
See DRILL-5379 for details.
closes #826
Kunal Khatua [Mon, 15 May 2017 20:33:49 +0000 (13:33 -0700)]
DRILL-5481: Allow to persist profiles in-memory only with a max capacity
1. Introduced an InMemoryStoreProvider with the ability to maintain a max capacity
2. DrillbitContext now explicitly has a profileStoreProvider that, by default, re-uses the general PersistentStoreProvider, unless it is InMemory, which is when #1 is used.
3. Cleanly separated out QueryProfileStoreContext
4. Converted literal values to constants within ExecConstants
5. Updated drill-module.conf for default capacity
closes #834
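A capacity-bounded in-memory store, as introduced by DRILL-5481, could look roughly like the sketch below. The eviction policy is an assumption: the commit message only states that a max capacity is maintained, so oldest-first eviction via `LinkedHashMap.removeEldestEntry` is used here for illustration. `BoundedProfileStore` is a hypothetical name and not Drill's InMemoryStoreProvider.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch; eviction order (oldest inserted first) is an assumption.
final class BoundedProfileStore<V> {
  private final Map<String, V> store;

  BoundedProfileStore(final int maxCapacity) {
    // Insertion-ordered map that drops its eldest entry once capacity is exceeded.
    this.store = new LinkedHashMap<String, V>() {
      @Override
      protected boolean removeEldestEntry(Map.Entry<String, V> eldest) {
        return size() > maxCapacity;
      }
    };
  }

  synchronized void put(String key, V value) {
    store.put(key, value);
  }

  synchronized V get(String key) {
    return store.get(key);
  }

  synchronized int size() {
    return store.size();
  }

  public static void main(String[] args) {
    BoundedProfileStore<String> profiles = new BoundedProfileStore<>(2);
    profiles.put("q1", "profile-1");
    profiles.put("q2", "profile-2");
    profiles.put("q3", "profile-3"); // pushes out q1
    System.out.println(profiles.size() + " profiles kept, q1 present: " + (profiles.get("q1") != null));
  }
}
```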
Paul Rogers [Thu, 11 May 2017 19:46:15 +0000 (12:46 -0700)]
DRILL-5504: Add vector validator to diagnose offset vector issues
Validates offset vectors in VarChar and repeated vectors. Validates the
special case of repeated VarChar vectors (two layers of offsets.)
Provides two new session variables to turn on validation. One enables
the existing operator (iterator) validation, the other adds vector
validation. This allows validation to occur in a “production” Drill
(without restarting Drill with assertions, as previously required.)
Unit tests validate the validator. Another test validates the
integration, but requires manual steps, so is ignored by default.
This version is first-cut: all work is done within a single class.
Allows back-porting to an earlier version to solve a specific issue. A
revision should move some of the work into generated code (or refactor
vectors to allow outside access), since offset vectors appear for each
subclass; not on a base class that would allow generic operations.
* Added boot-time options to allow enabling vector validation in Maven
unit tests.
* Code cleanup per suggestions.
* Additional (manual) tests for boot-time options and default options.
closes #832
Paul Rogers [Mon, 15 May 2017 22:00:21 +0000 (15:00 -0700)]
DRILL-5512: Standardize error handling in ScanBatch
Standardizes error handling to throw a UserException. Prior code threw
various exceptions, called the fail() method, or returned a variety of
status codes.
closes #838
Arina Ielchiieva [Mon, 22 May 2017 14:49:31 +0000 (17:49 +0300)]
DRILL-5533: Fix flag assignment in FunctionInitializer.checkInit() method
Changes:
1. Fixed DCL in FunctionInitializer.checkInit() method (update flag parameter when function body is loaded).
2. Fixed ImportGrabber.getImports() method to return the list with imports.
3. Added unit tests for FunctionInitializer.
4. Minor refactoring (renamed methods, added javadoc).
closes #843
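The double-checked locking (DCL) pattern mentioned in item 1 of DRILL-5533 can be sketched as below. This is a generic illustration, not the FunctionInitializer source: `LazyFunctionBody`, `getBody` and the `loads` counter are hypothetical. The points it shows are that the flag is volatile and that it is updated once the body has been loaded, so later calls skip the lock.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch of double-checked locking; not Drill's FunctionInitializer.
final class LazyFunctionBody {
  private final Object lock = new Object();
  private volatile boolean ready;   // volatile so the unlocked first check is safe
  private String body;
  final AtomicInteger loads = new AtomicInteger(); // counts how often loading ran

  private void checkInit() {
    if (ready) {
      return;                       // first check, no lock taken
    }
    synchronized (lock) {
      if (ready) {
        return;                     // second check, under the lock
      }
      body = "function body";       // stand-in for the expensive load
      loads.incrementAndGet();
      ready = true;                 // update the flag once the body is loaded
    }
  }

  String getBody() {
    checkInit();
    return body;
  }

  public static void main(String[] args) {
    LazyFunctionBody f = new LazyFunctionBody();
    f.getBody();
    f.getBody();
    System.out.println("loads: " + f.loads.get());
  }
}
```

If the flag were never set after loading, every call would take the lock and reload the body, which is the kind of defect a flag-assignment fix addresses.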
Parth Chandra [Fri, 2 Jun 2017 23:54:20 +0000 (16:54 -0700)]
Updated README to include export control section
Sorabh Hamirwasia [Mon, 6 Mar 2017 08:19:50 +0000 (00:19 -0800)]
DRILL-4335: Apache Drill should support network encryption.
NOTE: This pull request provides support for on-wire encryption using SASL framework. Communication channel covered is:
1) C++ Drill Client and Drillbit channel.
close apache/drill#809
Sorabh Hamirwasia [Thu, 2 Feb 2017 02:44:21 +0000 (18:44 -0800)]
DRILL-4335: Apache Drill should support network encryption.
NOTE: This pull request provides support for on-wire encryption using SASL framework. The communication channels covered are:
1) Between Drill JDBC client and Drillbit.
2) Between Drillbit to Drillbit i.e. control/data channels.
3) It has a UI change to show on which network channels encryption is enabled and the number of encrypted/unencrypted connections for
user/control/data connections.
close apache/drill#773
Arina Ielchiieva [Thu, 18 May 2017 10:55:48 +0000 (10:55 +0000)]
DRILL-5523: Revert if condition in UnionAllRecordBatch changed in DRILL-5419
close apache/drill#842
Paul Rogers [Wed, 10 May 2017 23:17:24 +0000 (16:17 -0700)]
DRILL-5498: Better handling of CSV column headers
See DRILL-5498 for details.
Replaced the repeated varchar reader for reading columns with a purpose
built column parser. Implemented rules to recover from invalid column
headers.
Added missing test method
Changes re code review comments
Back out testing-only change
close apache/drill#830
Arina Ielchiieva [Mon, 15 May 2017 15:51:02 +0000 (15:51 +0000)]
DRILL-5516: Limit memory usage for Hbase reader
close apache/drill#839
Paul Rogers [Fri, 12 May 2017 18:01:54 +0000 (11:01 -0700)]
DRILL-5496: Fix for failed Hive connection
If the Hive server restarts, Drill either hangs or continually reports
errors when retrieving schemas. The problem is that the Hive plugin
tries to handle connection failures, but does not do so correctly for
the secure connection case. The problem is complex, see DRILL-5496 for
details.
This is a workaround: we discard the entire Hive schema cache when we
encounter an unhandled connection exception, then we rebuild a new one.
This is not a proper fix; for that we'd have to restructure the code.
This will, however, solve the immediate problem until we do the needed
restructuring.
Volodymyr Vysotskyi [Fri, 12 May 2017 18:21:10 +0000 (18:21 +0000)]
DRILL-5399: Fix race condition in DrillComplexWriterFuncHolder
Vitalii Diravka [Tue, 16 May 2017 09:53:08 +0000 (09:53 +0000)]
DRILL-3250: Drill fails to compare multi-byte characters from hive table
- A small refactoring of original fix of this issue (DRILL-4039);
- Added test for the fix.
Jinfeng Ni [Sat, 22 Apr 2017 00:34:15 +0000 (17:34 -0700)]
DRILL-5459: Extend physical operator test framework to test mini plans consisting of multiple operators.
This closes #823
Arina Ielchiieva [Thu, 27 Apr 2017 11:44:18 +0000 (11:44 +0000)]
DRILL-5450: Fix initcap function to convert upper case characters correctly
This closes #821
Padma Penumarthy [Tue, 11 Apr 2017 23:34:14 +0000 (16:34 -0700)]
DRILL-5429: Improve query performance for MapR DB JSON Tables
Cache and reuse table and tabletInfo per query instead of fetching them multiple times.
Compute rowCount from tabletInfo instead of expensive tableStats call.
This closes #817
Arina Ielchiieva [Thu, 6 Apr 2017 10:44:26 +0000 (13:44 +0300)]
DRILL-5419: Calculate return string length for literals & some string functions
1. Revisited calculation logic for string literals and some string functions
(cast, upper, lower, initcap, reverse, concat, concat operator, rpad, lpad, case statement,
coalesce, first_value, last_value, lag, lead).
Synchronized return type length calculation logic between limit 0 and regular queries.
2. Deprecated width and changed it to precision for string types in MajorType.
3. Revisited FunctionScope and split it into FunctionScope and ReturnType.
FunctionScope will indicate only function usage in terms of number of in / out rows, (n -> 1, 1 -> 1, 1->n).
New annotation in UDFs ReturnType will indicate which return type strategy should be used.
4. Changed MAX_VARCHAR_LENGTH from 65536 to 65535.
5. Updated calculation of precision and display size for INTERVALYEAR & INTERVALDAY.
6. Refactored part of function code-gen logic (ValueReference, WorkspaceReference, FunctionAttributes, DrillFuncHolder).
This closes #819
Paul Rogers [Sun, 9 Apr 2017 03:52:04 +0000 (20:52 -0700)]
DRILL-5423: Refactor ScanBatch to allow unit testing record readers
Refactors ScanBatch to allow unit testing of record reader
implementations, especially the “writer” classes.
See JIRA for details.
closes #811
liyun Liu [Thu, 4 May 2017 04:46:58 +0000 (12:46 +0800)]
DRILL-4039: Query fails when non-ascii characters are used in string literals
closes #825
Paul Rogers [Fri, 10 Mar 2017 23:56:18 +0000 (15:56 -0800)]
DRILL-5344: External sort priority queue copier fails with an empty batch
Unit tests showed that the “priority queue copier” does not handle an
empty batch. This has not been an issue because code elsewhere in the
sort specifically works around this issue. This fix resolves the issue
at the source to avoid the need for future work-arounds.
closes #778
Paul Rogers [Sun, 26 Mar 2017 02:51:43 +0000 (19:51 -0700)]
DRILL-5385: Vector serializer fails to read saved SV2
Unit testing revealed that the VectorAccessorSerializable class claims
to serialize SV2s, but, in fact, does not. Actually, it writes them,
but does not read them, resulting in corrupted data on read.
Fortunately, no code appears to serialize sv2s at present. Still, it is
a bug and needs to be fixed.
First task is to add serialization code for the sv2.
That revealed that the recently-added code to save DrillBufs using a
shared buffer had a bug: it relied on the writer index to know how much
data is in the buffer. Turns out sv2 buffers don’t set this index. So,
new versions of the write function take a write length.
Then, closer inspection of the read code revealed duplicated code. So,
DrillBuf allocation moved into a version of the read function that now
does reading and DrillBuf allocation.
Turns out that value vectors, but not SV2s, can be built from a
DrillBuf. Added a matching constructor to the SV2 class.
Finally, cleaned up the code a bit to make it easier to follow. Also
allowed test code to access the handy timer already present in the code.
closes #800
Paul Rogers [Tue, 11 Apr 2017 21:42:57 +0000 (14:42 -0700)]
DRILL-5428: submit_plan fails after Drill 1.8 script revisions
When the other scripts were updated, submit_plan was not corrected.
After Drill 1.8, drill-config.sh consumes all command line arguments,
finds the --config and --site options, removes them, and places the rest
in the new args array.
This PR updates submit_plan to use the new args array.
The fix was tested on a test cluster: we verified that a physical plan
was submitted and ran.
closes #816
Arina Ielchiieva [Wed, 26 Apr 2017 13:27:19 +0000 (13:27 +0000)]
DRILL-5391: CTAS: make folder and file permission configurable
close #820
Paul Rogers [Tue, 14 Mar 2017 23:18:24 +0000 (16:18 -0700)]
DRILL-5318: Sub-operator test fixture
This commit depends on:
* DRILL-5323
This PR cannot be accepted (or built) until the above are pulled and
this PR is rebased on top of them. The PR is issued now so that reviews
can be done in parallel.
Provides the following:
* A new OperatorFixture to set up all the objects needed to test at the
sub-operator level. This relies on the refactoring to create the
required interfaces.
* Pulls the config builder code out of the cluster fixture builder so
that configs can be built for sub-operator tests.
* Modifies the QueryBuilder test tool to run a query and get back one
of the new row set objects to allow direct inspection of data returned
from a query.
* Modifies the cluster fixture to create a JDBC connection to the test
cluster. (Use requires putting the Drill JDBC project on the test class
path since exec does not depend on JDBC.)
Created a common subclass for the cluster and operator fixtures to
abstract out the allocator and config. Also provides temp directory
support to the operator fixture.
Merged with DRILL-5415 (Improve Fixture Builder to configure client
properties)
Moved row set tests here from DRILL-5323 so that DRILL-5323 is self
contained. (The tests depend on the fixtures defined here.)
Added comments where needed.
Puts code back as it was prior to a code review comment. The code is
redundant, but necessarily so due to code which is specific to several
primitive types.
closes #788
Paul Rogers [Tue, 14 Mar 2017 23:18:24 +0000 (16:18 -0700)]
DRILL-5323: Test tools for row sets
Provide test tools to create, populate and compare row sets
To simplify tests, we need a TestRowSet concept that wraps a
VectorContainer and provides easy ways to:
- Define a schema for the row set.
- Create a set of vectors that implement the schema.
- Populate the row set with test data via code.
- Add an SV2 to the row set.
- Pass the row set to operator components (such as generated code
blocks.)
- Examine the contents of a row set
- Compare the results of the operation with an expected result set.
- Dispose of the underlying direct memory when work is done.
This code builds on that in DRILL-5324 to provide a complete row set
API. See DRILL-5318 for the spec.
Note: this code can be reviewed as-is, but cannot be committed until
after DRILL-5324 is committed: this code has compile-time dependencies
on that code. This PR will be rebased once DRILL-5324 is pulled into
master.
Handles maps and intervals
The row set schema is refined to provide two forms of schema. A
physical schema shows the nested structure of the data with maps
expanding into their contents.
Updates the row set schema builder to easily build a schema with maps.
An access schema shows the row “flattened” to include just scalar
(non-map) columns, with all columns at a single level, with dotted
names identifying nested fields. This form makes for very simple access.
Then, provides tools for reading and writing batches with maps by
presenting the flattened view to the row reader and writer.
HyperVectors have a very complex structure for maps. The hyper row set
implementation takes a first crack at mapping that structure into the
standardized row set format.
Also provides a handy way to set an INTERVAL column from an int. There
is no good mapping from an int to an interval, so an arbitrary
convention is used. This convention is not generally useful, but is
very handy for quickly generating test data.
As before, this is a partial PR. The code here still depends on
DRILL-5324 to provide the column accessors needed by the row reader and
writer.
All this code is getting rather complex, so this commit includes a unit
test of the schema and row set code.
Revisions to support arrays
Arrays require a somewhat different API. Refactored to allow arrays to
appear as a field type.
While refactoring, moved interfaces to more logical locations.
Added more comments.
Rejiggered the row set schema to provide both a physical and flattened
(access) schema, both driven from the original batch schema.
Pushed some accessor and writer classes into the accessor layer.
Added tests for arrays.
Also added more comments where needed.
Moved tests to DRILL-5318
The test classes previously here depend on the new “operator fixture”.
To provide a non-cyclic checkin order, moved the tests to the PR with
the fixtures so that this PR is clear of dependencies. The tests were
reviewed in the context of DRILL-5318.
Also pulls in batch sizer support for map fields which are required by
the tests.
closes #785
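The "access schema" described in DRILL-5323 (maps flattened to scalar columns with dotted names) can be illustrated with a small sketch. The representation is an assumption made for the example: a nested schema is modelled as a `Map` whose values are either a type name or another map. `FlattenSchema` is a hypothetical name and not part of the row set API.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; the real row set schema classes are not shown here.
final class FlattenSchema {

  // Returns scalar column names, with nested map members given dotted names.
  static List<String> flatten(Map<String, Object> schema) {
    List<String> out = new ArrayList<>();
    collect("", schema, out);
    return out;
  }

  @SuppressWarnings("unchecked")
  private static void collect(String prefix, Map<String, Object> schema, List<String> out) {
    for (Map.Entry<String, Object> e : schema.entrySet()) {
      String name = prefix.isEmpty() ? e.getKey() : prefix + "." + e.getKey();
      if (e.getValue() instanceof Map) {
        // Map column: descend instead of emitting the map itself.
        collect(name, (Map<String, Object>) e.getValue(), out);
      } else {
        out.add(name);
      }
    }
  }

  public static void main(String[] args) {
    Map<String, Object> inner = new LinkedHashMap<>();
    inner.put("b", "VARCHAR");
    Map<String, Object> schema = new LinkedHashMap<>();
    schema.put("a", "INT");
    schema.put("m", inner);
    System.out.println(flatten(schema));
  }
}
```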
Paul Rogers [Sat, 11 Mar 2017 07:03:23 +0000 (23:03 -0800)]
Test-specific column accessor implementation. Provides a simplified, unified set of access methods for value vectors specifically for writing simple, compact unit test code.
* Interfaces for column readers and writers
* Interfaces for tuple (row and map) readers and writers
* Generated implementations
* Base implementation used by the generated code
* Factory class to create the proper reader or writer given a major
type (type and cardinality)
* Utilities for generic access, type conversions, etc.
Many vector types can be mapped to an int for get and set. One key
exception is the decimal types: decimals, by definition, require a
different representation. In Java, that is `BigDecimal`. Added get, set
and setSafe accessors as required for each decimal type that uses
`BigDecimal` to hold data.
The generated code builds on the `valueVectorTypes.tdd` file, adding
additional properties needed to generate the accessors.
The PR also includes a number of code cleanups done while reviewing
existing code. In particular `DecimalUtility` was very roughly
formatted and thus hard to follow.
Supports Drill’s interval types (INTERVAL, INTERVALDAY,
INTERVALYEAR) in the form of the Joda interval class.
Adds support for Map vectors. Maps are treated as nested tuples and are
expanded out to create a flattened row in the schema. The accessors
then access rows using the flattened column index or the combined name
(“a.b”).
Supports arrays via a writer interface that appends values as written,
and an indexed, random-access reader interface.
Removed HTTP log parser from JDBC jar to keep the JDBC jar from getting
too big.
close apache/drill#783
Volodymyr Vysotskyi [Mon, 10 Apr 2017 13:16:52 +0000 (13:16 +0000)]
DRILL-5424: Fix IOBE for reverse function
close apache/drill#815
Sorabh Hamirwasia [Wed, 5 Apr 2017 18:04:58 +0000 (11:04 -0700)]
DRILL-5415: Improve Fixture Builder to configure client properties and keep collection type properties for server
Updated with review feedback
close apache/drill#807
Patrick Wong [Mon, 10 Apr 2017 22:45:14 +0000 (15:45 -0700)]
DRILL-5409 - update MapR version to 5.2.1
close apache/drill#813
Vitalii Diravka [Mon, 10 Apr 2017 18:54:21 +0000 (18:54 +0000)]
DRILL-5213: Prepared statement for actual query is missing the query text
close apache/drill#812
Paul Rogers [Sun, 19 Mar 2017 02:52:52 +0000 (19:52 -0700)]
DRILL-5319: Refactor "contexts" for unit testing
closes #787
This PR is purely a refactoring: no functionality is added or changed.
The refactoring splits various context and related classes into a set
of new interfaces with the methods needed for operator-level unit tests. The other,
Drillbit-related methods are left in the original interfaces. Most code
need not change.
The changes here allow operator-level unit tests to mock up the
exec-time methods so we can use them without firing up a Drillbit (or
using mocking libraries).
A later PR will provide the sub-operator test framework that uses this
refactoring.
Changes include:
* The OptionManager is split, with read-only methods moving to a new
OptionSet interface.
* The FragmentContext is split, with an exec-only FragmentExecContext
providing low-level methods.
* OperatorStats is split, with a new OperatorStatReceiver class
providing write-only support to operators.
* Several places that accepted an OperatorContext or FragmentContext,
but needed only an allocator, are changed to accept the allocator
directly.
Includes fixes for code review comments
Adds more comments. Postpones the suggested rename until all affected
code is in master, else it will be difficult to synchronize the rename
across multiple branches.
Sudheesh Katkam [Fri, 7 Apr 2017 15:59:50 +0000 (08:59 -0700)]
DRILL-5387: Ignore TestBitBitKerberos and TestUserBitKerberos
closes #810
Padma Penumarthy [Tue, 28 Mar 2017 22:08:54 +0000 (15:08 -0700)]
DRILL-5395: Query on MapR-DB table fails with NPE due to an issue with assignment logic
closes #803
Paul Rogers [Sun, 26 Mar 2017 01:27:40 +0000 (18:27 -0700)]
DRILL-5234: External sort's spilling functionality does not work when the spilled columns contain a map type column
closes #799
Paul Rogers [Tue, 14 Mar 2017 22:07:41 +0000 (15:07 -0700)]
DRILL-5355: Misc. code cleanup
closes #784
Rob Wu [Mon, 6 Mar 2017 22:56:14 +0000 (14:56 -0800)]
DRILL-5315: Address small typo in the comment in drillClient.hpp
closes #771
Arina Ielchiieva [Wed, 22 Mar 2017 15:07:23 +0000 (15:07 +0000)]
DRILL-5375: Nested loop join: return correct result for left join
closes #794
Vitalii Diravka [Wed, 5 Apr 2017 17:59:32 +0000 (17:59 +0000)]
DRILL-5413: DrillConnectionImpl.isReadOnly() throws NullPointerException
change is in CALCITE-843.
update drill's calcite version to 1.4.0-drill-r21
close #806
Vitalii Diravka [Thu, 16 Mar 2017 13:45:36 +0000 (13:45 +0000)]
DRILL-5373: Drill JDBC error in the process of connection via SQuirrel
- java.lang.NoClassDefFoundError: javax/validation/constraints/NotNull
Vitalii Diravka [Fri, 20 May 2016 20:11:33 +0000 (20:11 +0000)]
DRILL-3510: Add ANSI_QUOTES option so that Drill's SQL Parser will recognize ANSI_SQL identifiers
- added support for quoting identifiers with DOUBLE_QUOTES or BRACKETS via setting new
sys/sess EnumString option QUOTING_IDENTIFIERS;
- added possibility of setting QUOTING_IDENTIFIERS by the jdbc connection URL string;
- added relevant unit tests;
close #520
Vitalii Diravka [Fri, 17 Mar 2017 11:41:46 +0000 (11:41 +0000)]
DRILL-4971: Query encounters system error, when there aren't eval subexpressions of any function in boolean and/or expressions
- New evaluated blocks for boolean operators should be with braces always, since they use labels.
close #792
chunhui-shi [Sat, 25 Mar 2017 01:40:15 +0000 (18:40 -0700)]
DRILL-5297: When the generated plan mismatches, PlanTest prints the generated plan along with the expected pattern
close #798
Parth Chandra [Sat, 11 Feb 2017 01:40:25 +0000 (17:40 -0800)]
DRILL-5351: Minimize bounds checking in var len vectors for Parquet reader
close #781
Laurent Goujon [Mon, 20 Mar 2017 17:55:17 +0000 (10:55 -0700)]
DRILL-5368: Fix memory leak issue in DrillClientImpl::processServerMetaResult
Fix a small memory leak by doing local allocation instead since the
object doesn't escape the function.
close #790
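A minimal sketch of the pattern described in DRILL-5368, not the actual DrillClientImpl code. The type MetaResult and the function process_local are hypothetical names used only for illustration. Since the object never escapes the function, it can live on the stack and is destroyed on every return path, so there is nothing to forget to delete.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical stand-in for a result object; it counts live instances
// so that a leak would be observable.
struct MetaResult {
    static int live;
    std::string name;
    explicit MetaResult(const std::string& n) : name(n) { ++live; }
    MetaResult(const MetaResult&) = delete;
    MetaResult& operator=(const MetaResult&) = delete;
    ~MetaResult() { --live; }
};
int MetaResult::live = 0;

// Local (stack) allocation: the destructor runs automatically when the
// function returns. A heap allocation with `new` at this point would need
// a matching `delete` on every exit path to avoid a leak.
std::size_t process_local(const std::string& n) {
    MetaResult result(n);
    return result.name.size();
}
```

After a call such as process_local("drill"), MetaResult::live is back to zero.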
Laurent Goujon [Mon, 20 Mar 2017 18:46:58 +0000 (11:46 -0700)]
DRILL-5369: Add initializer for ServerMetaContext
ServerMetaContext had no default constructor. The lack of it
might cause m_done to be set to true, same for other variables.
Add a default constructor to explicitly initialize its members.
close #791
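A minimal sketch of the idea in DRILL-5369, assuming a simplified struct. Only m_done is named in the commit message; ServerMetaContextSketch and m_status are hypothetical. Without a user-provided constructor, a default-initialized object of a type like this leaves its scalar members with indeterminate values.

```cpp
#include <cassert>

// Hypothetical simplification of a context object with plain scalar members.
struct ServerMetaContextSketch {
    bool m_done;    // named in the commit message
    int  m_status;  // hypothetical member, for illustration only

    // Explicit default constructor: every member gets a defined value.
    ServerMetaContextSketch() : m_done(false), m_status(0) {}
};
```

With the constructor in place, `ServerMetaContextSketch ctx;` always starts with m_done set to false.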
Jinfeng Ni [Wed, 22 Mar 2017 22:28:22 +0000 (15:28 -0700)]
DRILL-5378: Put more information for schema change exception in hash join, hash agg, streaming agg and sort operator.
close #801
Padma Penumarthy [Thu, 16 Mar 2017 18:40:13 +0000 (11:40 -0700)]
DRILL-5394: Optimize query planning for MapR-DB tables by caching row counts
close #802
Serhii-Harnyk [Fri, 3 Mar 2017 15:24:26 +0000 (15:24 +0000)]
DRILL-4678: Tune metadata by generating a dispatcher at runtime
main code changes are in Calcite library.
update drill's calcite version to 1.4.0-drill-r20.
close #793
Jinfeng Ni [Thu, 16 Mar 2017 21:44:35 +0000 (14:44 -0700)]
DRILL-5359: Fix ClassCastException when Drill pushes down filter on the output of flatten operator.
- Move findItemOrFlatten as a static method in DrillRelOptUtil.
- Exclude filter conditions if they contain item/flatten operator.
close apache/drill#786
Paul Rogers [Tue, 14 Mar 2017 03:43:25 +0000 (20:43 -0700)]
DRILL-5352: Profile parser printing for multi fragments
Enhances the recently added ProfileParser to display run times for
queries that contain multiple fragments. (The original version handled
just a single fragment.)
Prints the query in “classic” mode if it is linear, or in the new
semi-indented mode if the query forms a tree.
Also cleans up formatting - removing spaces between parens.
Fixes from review
* Fixed process time percent.
* Added support for getting operator profiles in a multi-fragment query.
close apache/drill#782
Parth Chandra [Fri, 10 Mar 2017 22:38:30 +0000 (14:38 -0800)]
DRILL-5349: Fix TestParquetWriter unit tests when synchronous parquet reader is used.
close apache/drill#780
Paul Rogers [Fri, 10 Mar 2017 19:55:13 +0000 (11:55 -0800)]
DRILL-5330: NPE in FunctionImplementationRegistry
Fixes:
* DRILL-5330: NPE in
FunctionImplementationRegistry.functionReplacement()
* DRILL-5331:
NPE in FunctionImplementationRegistry.findDrillFunction() if dynamic
UDFs disabled
When running in a unit test, the dynamic UDF (DUDF) mechanism is not
available. When running in production, the DUDF mechanism is available,
but may be disabled.
One confusing aspect of this code is that the function registry
is given the option manager, but the option manager is not yet valid
(not yet initialized) in the function registry constructor. So, we
cannot access the option manager in the function registry constructor.
In any event, the existing system options cannot be used to disable DUDF
support. For obscure reasons, DUDF support is always enabled, even when
disabled by the user.
Instead, for DRILL-5331, we added a config option to "really" disable DUDFs.
The property, which is set only for tests, disables DUDF support.
Note that, in the future, this option could be generalized to
"off, read-only, on" to capture the full set of DUDF modes.
But, for now, just turning this off is sufficient.
For DRILL-5330, we use an existing option validator rather than
accessing the raw option directly.
Also includes a bit of code cleanup in the class in question.
The result is that the code now works when used in a sub-operator unit
test.
close apache/drill#777
Rob Wu [Tue, 7 Mar 2017 02:17:25 +0000 (18:17 -0800)]
DRILL-5316: Check drillbits size before we attempt to access the vector element
close apache/drill#772
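A minimal sketch of the kind of guard DRILL-5316 describes, not the actual client code. The helper first_drillbit and the use of strings for drillbit endpoints are assumptions made for illustration. Indexing an empty std::vector with operator[] is undefined behavior, so the size is checked first.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: copies the first entry into `out` and returns true,
// or returns false (leaving `out` untouched) when the list is empty.
bool first_drillbit(const std::vector<std::string>& drillbits, std::string& out) {
    if (drillbits.empty()) {
        return false;  // nothing to access; the caller reports the error
    }
    out = drillbits[0];
    return true;
}
```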
Laurent Goujon [Fri, 3 Mar 2017 04:38:05 +0000 (20:38 -0800)]
DRILL-5311: Check handshake result in C++ connector
In C++ client connector, DrillClientImpl::recvHandshake always
returns success, even in case of a connection error (like a TCP
timeout issue). Only on WIN32 platform would the error code be
checked.
Remove the restriction to only check on WIN32, plus add some logging.
close apache/drill#770
Paul Rogers [Fri, 3 Mar 2017 00:09:01 +0000 (16:09 -0800)]
DRILL-5226: Managed external sort fixes
* Memory leak in managed sort if OOM during sv2 allocation
* "Record batch sizer" does not include overhead for variable-sized
vectors
* Paranoid double-checking of merge batch sizes to prevent OOM when the
sizes differ from expectations
* Revised logging
Addresses review comments
close apache/drill#767
Jinfeng Ni [Tue, 14 Mar 2017 21:54:30 +0000 (14:54 -0700)]
Update version to 1.11.0-SNAPSHOT
Vitalii Diravka [Tue, 7 Mar 2017 20:53:03 +0000 (20:53 +0000)]
DRILL-5326: Unit tests failures related to the SERVER_METADTA
- adding of the sql type name for the "GENERIC_OBJECT";
- changing "NullCollation" in the "ServerMetaProvider" to the correct default value;
- changing RpcType to GET_SERVER_META in the appropriate ServerMethod
close #775
Boaz Ben-Zvi [Sat, 25 Feb 2017 00:48:42 +0000 (16:48 -0800)]
DRILL-5293: Change seed for distribution hash function to differ from that of the hash table
close #765
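A minimal sketch of why the two seeds should differ, under stated assumptions. seeded_hash is a generic splitmix64-style mixer, not Drill's hash function, and the partition and bucket counts are illustrative. If rows are routed to a partition by hash % partitions and the hash table on that partition reuses the same hash, every row it receives shares the same low bits, so only a fraction of the buckets can ever be used.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <set>

// Generic 64-bit mixer (splitmix64 finalizer); a stand-in, not Drill's hash.
std::uint64_t mix64(std::uint64_t x) {
    std::uint64_t z = x + 0x9e3779b97f4a7c15ULL;
    z = (z ^ (z >> 30)) * 0xbf58476d1ce4e5b9ULL;
    z = (z ^ (z >> 27)) * 0x94d049bb133111ebULL;
    return z ^ (z >> 31);
}

// Hypothetical seeded hash built on the mixer above.
std::uint64_t seeded_hash(std::uint64_t key, std::uint64_t seed) {
    return mix64(key ^ mix64(seed));
}

// Counts how many hash-table buckets are used by the keys that the
// distribution hash routes to partition 0.
std::size_t buckets_used_on_partition0(std::uint64_t dist_seed,
                                       std::uint64_t table_seed) {
    const std::uint64_t partitions = 4;
    const std::uint64_t buckets = 16;  // a multiple of `partitions`
    std::set<std::uint64_t> used;
    for (std::uint64_t key = 0; key < 10000; ++key) {
        if (seeded_hash(key, dist_seed) % partitions != 0) {
            continue;  // row goes to another partition
        }
        used.insert(seeded_hash(key, table_seed) % buckets);
    }
    return used.size();
}
```

With equal seeds, at most buckets / partitions = 4 of the 16 buckets can be occupied on partition 0; with different seeds the keys spread over more buckets.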