drill.git
Jinfeng Ni [Wed, 8 Mar 2017 21:30:49 +0000 (13:30 -0800)] 
[maven-release-plugin] prepare release drill-1.10.0

Vitalii Diravka [Tue, 7 Mar 2017 20:53:03 +0000 (20:53 +0000)] 
DRILL-5326: Unit test failures related to SERVER_METADATA

- add the SQL type name for "GENERIC_OBJECT";
- change "NullCollation" in "ServerMetaProvider" to the correct default value;
- change RpcType to GET_SERVER_META in the appropriate ServerMethod

close #775

Boaz Ben-Zvi [Sat, 25 Feb 2017 00:48:42 +0000 (16:48 -0800)] 
DRILL-5293: Change seed for distribution hash function to differ from that of the hash table

close #765
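A small sketch can show why the seeds must differ. Suppose the exchange routes a row to receiver `hash % receivers`, and the receiving hash table picks a bucket with `hash % buckets` using the same hash. Every row at one receiver then shares the same residue, so only a fraction of the buckets are ever used. The mixer below is a hypothetical MurmurHash3-style finalizer, not Drill's hash function. `SeededHashDemo` and its methods are illustrative names.

```java
import java.util.HashSet;
import java.util.Set;

/** Illustrative only: why partitioning and hash-table hashing should use different seeds. */
final class SeededHashDemo {
  // Hypothetical 64-bit mixer (MurmurHash3 fmix64 finalizer on key ^ seed); not Drill's hash.
  static long hash(long key, long seed) {
    long h = key ^ seed;
    h ^= h >>> 33;
    h *= 0xff51afd7ed558ccdL;
    h ^= h >>> 33;
    h *= 0xc4ceb9fe1a85ec53L;
    h ^= h >>> 33;
    return h;
  }

  static int bucket(long hash, int n) {
    return (int) Math.floorMod(hash, (long) n);
  }

  /** Counts how many table buckets are used by the keys routed to one receiver. */
  static int bucketsUsed(long partitionSeed, long tableSeed, int receivers, int receiverId, int tableBuckets) {
    Set<Integer> used = new HashSet<>();
    for (long key = 0; key < 100_000; key++) {
      if (bucket(hash(key, partitionSeed), receivers) != receiverId) {
        continue; // this row is sent to a different receiver
      }
      used.add(bucket(hash(key, tableSeed), tableBuckets));
    }
    return used.size();
  }
}
```

With the same seed, rows routed to receiver 0 of 4 can reach at most 16 of 64 buckets (those congruent to 0 mod 4). A different seed for the table spreads them out.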

Laurent Goujon [Fri, 3 Mar 2017 04:03:42 +0000 (20:03 -0800)] 
DRILL-5313: Fix compilation issue in C++ connector

DRILL-5301 and DRILL-5167 have conflicting changes, which cause
the C++ connector to fail to compile: the static symbol for the search
escape string has been removed, as the server might use a different one.

Fix the issue by using the current search escape string (injected from the
meta to the internal drill client when querying metadata).

close #769

Paul Rogers [Fri, 24 Feb 2017 22:53:23 +0000 (14:53 -0800)] 
DRILL-5208: Finding path to java executable should be deterministic

See DRILL-5208 for background. Instead of using “find” to locate the
java command, we use any information available, resorting to find
only if the “usual suspects” fail. As a result, we use the JDK
java when available, instead of randomly choosing the JDK or JRE java.

close #763
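The lookup order can be sketched in Java, although the actual change lives in Drill's shell launch scripts. The order shown here is an assumption about what the "usual suspects" are: $JAVA_HOME first, then the first `java` on $PATH, then an expensive search. `JavaLocator` is a hypothetical name. The file-system check and the fallback search are passed in, so the logic can be exercised without touching disk.

```java
import java.io.File;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Map;
import java.util.Optional;
import java.util.function.Predicate;
import java.util.function.Supplier;

final class JavaLocator {
  /**
   * Resolves the java executable deterministically (assumed order):
   * 1. $JAVA_HOME/bin/java, 2. first "java" on $PATH, 3. the expensive fallback search.
   */
  static Optional<Path> locate(Map<String, String> env,
                               Predicate<Path> isExecutable,
                               Supplier<Optional<Path>> fallbackSearch) {
    String javaHome = env.get("JAVA_HOME");
    if (javaHome != null && !javaHome.isEmpty()) {
      Path candidate = Paths.get(javaHome, "bin", "java");
      if (isExecutable.test(candidate)) {
        return Optional.of(candidate);
      }
    }
    for (String dir : env.getOrDefault("PATH", "").split(File.pathSeparator)) {
      if (dir.isEmpty()) {
        continue;
      }
      Path candidate = Paths.get(dir, "java");
      if (isExecutable.test(candidate)) {
        return Optional.of(candidate); // first match on PATH wins, so the result is repeatable
      }
    }
    return fallbackSearch.get(); // only reached if the usual suspects fail
  }
}
```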

Padma Penumarthy [Tue, 21 Feb 2017 21:20:57 +0000 (13:20 -0800)] 
DRILL-5287: Provide option to skip updates of ephemeral state changes in Zookeeper

close #758

Padma Penumarthy [Wed, 22 Feb 2017 18:31:01 +0000 (10:31 -0800)] 
DRILL-5290: Provide an option to build operator table once for built-in static functions and reuse it across queries.

close #757

Vitalii Diravka [Mon, 14 Nov 2016 21:13:28 +0000 (21:13 +0000)] 
DRILL-5034: Select timestamp from Hive-generated Parquet always returns UTC

- TIMESTAMP_IMPALA function is reverted to retain local timezone
- TIMESTAMP_IMPALA_LOCALTIMEZONE is deleted
- Retain local timezone for the INT96 timestamp values in the parquet files while
  PARQUET_READER_INT96_AS_TIMESTAMP option is on

Minor changes according to the review

Fix for the test that relies on a particular timezone

close #656
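For reference, a Hive/Impala INT96 timestamp stores 8 little-endian bytes of nanoseconds within the day, followed by a 4-byte little-endian Julian day number. The sketch below decodes such a value without applying any time-zone shift. That is one way to read "retain local timezone", but whether Drill's reader does exactly this is an assumption. `Int96Timestamps` is an illustrative class, not Drill code.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.time.LocalDateTime;
import java.time.ZoneOffset;

final class Int96Timestamps {
  static final long JULIAN_DAY_OF_UNIX_EPOCH = 2_440_588L; // Julian day number of 1970-01-01
  static final long MILLIS_PER_DAY = 86_400_000L;

  /** Decodes a 12-byte INT96 value into milliseconds since the Unix epoch. */
  static long toEpochMillis(byte[] int96) {
    ByteBuffer buf = ByteBuffer.wrap(int96).order(ByteOrder.LITTLE_ENDIAN);
    long nanosOfDay = buf.getLong(); // bytes 0-7
    int julianDay = buf.getInt();    // bytes 8-11
    return (julianDay - JULIAN_DAY_OF_UNIX_EPOCH) * MILLIS_PER_DAY + nanosOfDay / 1_000_000L;
  }

  /** Wall-clock view of the stored value, with no time-zone conversion applied. */
  static LocalDateTime toLocalDateTime(byte[] int96) {
    long millis = toEpochMillis(int96);
    return LocalDateTime.ofEpochSecond(Math.floorDiv(millis, 1000L),
        (int) Math.floorMod(millis, 1000L) * 1_000_000, ZoneOffset.UTC);
  }

  /** Builds an INT96 value, for testing. */
  static byte[] encode(int julianDay, long nanosOfDay) {
    return ByteBuffer.allocate(12).order(ByteOrder.LITTLE_ENDIAN)
        .putLong(nanosOfDay).putInt(julianDay).array();
  }
}
```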

Paul Rogers [Thu, 16 Feb 2017 04:51:17 +0000 (20:51 -0800)] 
DRILL-5266: Parquet returns low-density batches

Fixes one glaring problem related to bit/byte confusion.

Includes a few clean-up items found along the way.

Additional fixes from code review comments

More code clean up from code review

close #749

jc@lifove.net [Sat, 11 Feb 2017 02:08:00 +0000 (21:08 -0500)] 
DRILL-5252: Fix a condition that always returns true

close #745

Arina Ielchiieva [Tue, 20 Dec 2016 16:57:15 +0000 (16:57 +0000)] 
DRILL-4963: Fix issues with dynamically loaded overloaded functions

close #701

Paul Rogers [Fri, 24 Feb 2017 18:31:25 +0000 (10:31 -0800)] 
DRILL-5284: Roll-up of final fixes for managed sort

See subtasks for details.

* Provide detailed, accurate estimate of size consumed by a record batch
* Managed external sort spills too often with Parquet data
* Managed External Sort fails with OOM
* External sort refers to the deprecated HDFS fs.default.name param
* Config param drill.exec.sort.external.batch.size is not used
* NPE in managed external sort while spilling to disk
* External Sort BatchGroup leaks memory if an OOM occurs during read
* DRILL-5294: Under certain low-memory conditions, force the sort to merge
  two batches to make progress, even though this is a bit more than
  comfortably fits into memory.

close #761

Padma Penumarthy [Tue, 28 Feb 2017 02:32:24 +0000 (18:32 -0800)] 
DRILL-5304: Queries fail intermittently when there is skew in data distribution

close #766

Paul Rogers [Tue, 14 Feb 2017 18:02:13 +0000 (10:02 -0800)] 
DRILL-5258: Access mock data definition from SQL

Extends the mock data source to allow using the full power of the mock
data source from an SQL query by referencing the JSON definition
file. See JIRA and package-info for details.

Adds a boolean data generator and a varying-length string generator.

Adds “mock” table stats for use in the planner.

Revisions based on code review comments

close #752

Laurent Goujon [Wed, 25 Jan 2017 02:47:47 +0000 (18:47 -0800)] 
DRILL-5221: Send cancel message as soon as possible in C++ connector

In the C++ connector, try to send the cancel request to the server as soon as
possible, that is, when receiving the queryId or, if the queryId has already
been received, when requested by the user.

close #733
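The "as soon as possible" rule amounts to a small state machine. A cancel requested before the query id arrives is remembered and sent the moment the id is received. Below is a minimal Java sketch of that logic. The connector itself is C++, and `CancellableQuery` is a hypothetical name.

```java
import java.util.function.Consumer;

/** Hypothetical client-side query handle; the C++ connector's structure differs. */
final class CancellableQuery {
  private String queryId;          // null until the server assigns one
  private boolean cancelRequested; // user asked to cancel
  private boolean cancelSent;      // cancel already sent to the server
  private final Consumer<String> sendCancel;

  CancellableQuery(Consumer<String> sendCancel) {
    this.sendCancel = sendCancel;
  }

  /** Called by the user at any time. */
  synchronized void cancel() {
    cancelRequested = true;
    trySend();
  }

  /** Called when the server response carrying the query id arrives. */
  synchronized void onQueryId(String id) {
    queryId = id;
    trySend();
  }

  // Sends exactly once, as soon as both the request and the id are known.
  private void trySend() {
    if (cancelRequested && queryId != null && !cancelSent) {
      cancelSent = true;
      sendCancel.accept(queryId);
    }
  }
}
```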

Laurent Goujon [Thu, 29 Dec 2016 01:03:37 +0000 (17:03 -0800)] 
DRILL-5167: Send escape character for metadata queries

The escape character was not sent when doing metadata queries, which caused
the server to return incorrect results, as the pattern was interpreted
differently from what the user asked for.

close #712
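Metadata calls take LIKE-style search patterns, as in JDBC's `DatabaseMetaData.getTables`, whose escape string is reported by `getSearchStringEscape`. The sketch below uses a hypothetical `LikePatterns` helper. It shows how one pattern matches different names depending on whether an escape character is honored. With `\` as escape, `my\_table` matches only `my_table`. Without an escape, `_` stays a single-character wildcard.

```java
import java.util.regex.Pattern;

final class LikePatterns {
  /** Converts a SQL LIKE pattern to a regex; escape == null means no escape character. */
  static Pattern toRegex(String like, Character escape) {
    StringBuilder re = new StringBuilder();
    for (int i = 0; i < like.length(); i++) {
      char c = like.charAt(i);
      if (escape != null && c == escape && i + 1 < like.length()) {
        re.append(Pattern.quote(String.valueOf(like.charAt(++i)))); // next char is a literal
      } else if (c == '%') {
        re.append(".*"); // any sequence
      } else if (c == '_') {
        re.append('.');  // any single character
      } else {
        re.append(Pattern.quote(String.valueOf(c)));
      }
    }
    return Pattern.compile(re.toString(), Pattern.DOTALL);
  }
}
```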

Laurent Goujon [Sun, 26 Feb 2017 18:23:59 +0000 (10:23 -0800)] 
DRILL-5301: Add C++ client support for Server metadata API

Add support for the Server metadata API to the C++ client, if
available. If the server does not support the API, fall back
to the previous hard-coded values.

Update the querySubmitter example program to query the information.

close #764

Laurent Goujon [Fri, 24 Feb 2017 23:41:07 +0000 (15:41 -0800)] 
DRILL-5301: Server metadata API

Add a Server metadata API to the User protocol, to query server support
of various SQL features.

Add support to the client (DrillClient) for querying this information.

Add support to the JDBC driver for querying this information if the server supports
the new API, or fall back to the previous behaviour (rely on Avatica defaults) otherwise.

close #764

Laurent Goujon [Fri, 4 Nov 2016 20:32:44 +0000 (13:32 -0700)] 
DRILL-4730: Update JDBC DatabaseMetaData implementation to use new Metadata APIs

Update JDBC driver to use Metadata APIs instead of executing SQL queries

close #613

Laurent Goujon [Sat, 5 Nov 2016 00:36:42 +0000 (17:36 -0700)] 
DRILL-4994: Add back JDBC prepared statement for older servers

When connected to an older Drill server, the JDBC client always
attempted to use a server-side prepared statement, with no fallback.

With this change, the client checks the server version and falls back to the
previous client-side prepared statement (which is still limited to executing
queries and does not provide metadata).

close #613
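The fallback decision reduces to a version comparison. The sketch assumes dotted numeric versions with an optional `-SUFFIX`. The minimum server version for server-side prepared statements is not stated here, so it is left as a parameter. `ServerVersions` is a hypothetical helper, not the driver's code.

```java
final class ServerVersions {
  /** Compares dotted versions such as "1.9.0" numerically; a "-SNAPSHOT"-style suffix is ignored. */
  static int compare(String a, String b) {
    String[] x = a.split("-", 2)[0].split("\\.");
    String[] y = b.split("-", 2)[0].split("\\.");
    for (int i = 0; i < Math.max(x.length, y.length); i++) {
      int xi = i < x.length ? Integer.parseInt(x[i]) : 0; // missing parts count as 0
      int yi = i < y.length ? Integer.parseInt(y[i]) : 0;
      if (xi != yi) {
        return Integer.compare(xi, yi);
      }
    }
    return 0;
  }

  /** True if the server is new enough; minServerSideVersion is a hypothetical threshold. */
  static boolean useServerSidePreparedStatement(String serverVersion, String minServerSideVersion) {
    return serverVersion != null && compare(serverVersion, minServerSideVersion) >= 0;
  }
}
```

Comparing numerically rather than as strings matters here: as strings, "1.10.0" would sort before "1.9.0".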

Laurent Goujon [Fri, 4 Nov 2016 20:31:19 +0000 (13:31 -0700)] 
DRILL-4994: Refactor DrillCursor

Refactor DrillCursor to be more self-contained.

Jinfeng Ni [Thu, 2 Mar 2017 02:01:13 +0000 (18:01 -0800)] 
Bump maxsize of jdbc-all jar to accommodate the increased size of jar file due to new code.

Sudheesh Katkam [Fri, 24 Feb 2017 02:47:04 +0000 (18:47 -0800)] 
DRILL-4280: CORE (user to bit authentication, C++)

closes #578

Sudheesh Katkam [Tue, 31 Jan 2017 02:55:16 +0000 (18:55 -0800)] 
DRILL-4280: CORE (C++ protocol)

Sudheesh Katkam [Fri, 24 Feb 2017 03:00:54 +0000 (19:00 -0800)] 
DRILL-4280: CORE (unit tests)

+ Modify existing tests to use new authentication configuration
+ Add TestUserBitKerberos and TestBitBitKerberos using Apache Kerby library

Sudheesh Katkam [Thu, 26 Jan 2017 03:05:23 +0000 (19:05 -0800)] 
DRILL-4280: CORE (web client)

+ Disable web server when authentication is enabled but PLAIN mechanism
  is not configured; log a warning

Sudheesh Katkam [Thu, 26 Jan 2017 03:04:33 +0000 (19:04 -0800)] 
DRILL-4280: CORE (bit to bit authentication, data)

+ Support authentication in DataServer and DataClient
+ Add AuthenticationCommand as an initial command after handshake
  and before the command that initiates a connection
+ Add DataConnectionConfig to encapsulate configuration
+ Add DataServerRequestHandler to encapsulate all handling of
  requests to DataServer

data

Sudheesh Katkam [Thu, 26 Jan 2017 03:03:52 +0000 (19:03 -0800)] 
DRILL-4280: CORE (bit to bit authentication, control)

+ Support authentication in ControlServer and ControlClient
+ Add AuthenticationCommand as an initial command after handshake
  and before the command that initiates a connection
+ Add ControlConnectionConfig to encapsulate configuration
+ ControlMessageHandler now implements RequestHandler

control

Sudheesh Katkam [Thu, 26 Jan 2017 03:02:45 +0000 (19:02 -0800)] 
DRILL-4280: CORE (user to bit authentication, Java)

+ Add logic for authentication in UserClient and UserServer with
  backward compatibility in both directions
+ Add abstract extension to ServerConnection and ClientConnection
+ Add concrete extensions to abstract connections:
  BitToUserConnection and UserToBitConnection
+ Add ConnectionConfig interface with abstract and concrete
  implementations to encapsulate configuration for server-side
  connections
+ Encapsulate all requests handled by UserServer in
  UserServerRequestHandler

+ Clear UserSession when connection is closed either by user or
  bit
+ Add DrillProperties to encapsulate all connection properties
  used during connection time

Sudheesh Katkam [Thu, 26 Jan 2017 02:57:35 +0000 (18:57 -0800)] 
DRILL-4280: CORE (security package)

+ Add AuthenticatorFactory interface
+ Kerberos implementation
  + includes SaslServer and SaslClient wrappers
+ Plain implementation
  + PlainServer implements SaslServer (unavailable in Java)
    for username/password based authentication
  + retrofit user authenticator
  + add logic for backward compatibility

+ Add AuthenticatorProvider interface to provide authenticator
  factories, and add two implementations:
  + DrillConfig and ScanResult based AuthenticatorProviderImpl
  + Default and system property based ClientAuthenticatorProvider

+ FastSaslServerFactory caches SaslServer factories
+ FastSaslClientFactory caches SaslClient factories

+ ServerAuthenticationHandler handles authentication on server-side
+ FailingRequestHandler to fail any message received
+ AuthenticationOutcomeListener handles authentication on client-side

security
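The JDK provides no server-side PLAIN mechanism, so a `SaslServer` for it has to parse the RFC 4616 message itself. That message is an optional authorization id, a NUL byte, the user name, a NUL byte, and the password. Below is a minimal sketch, not Drill's `PlainServer`. The password check is a hypothetical `BiPredicate`, and a request to act as a different authorization id is simply rejected.

```java
import java.nio.charset.StandardCharsets;
import java.util.function.BiPredicate;
import javax.security.sasl.SaslException;
import javax.security.sasl.SaslServer;

/** Minimal PLAIN (RFC 4616) SaslServer sketch; not Drill's PlainServer. */
final class SimplePlainServer implements SaslServer {
  private final BiPredicate<String, String> passwordCheck; // hypothetical user/password validator
  private boolean complete;
  private String authorizationId;

  SimplePlainServer(BiPredicate<String, String> passwordCheck) {
    this.passwordCheck = passwordCheck;
  }

  @Override public String getMechanismName() { return "PLAIN"; }

  @Override public byte[] evaluateResponse(byte[] response) throws SaslException {
    if (complete) {
      throw new SaslException("PLAIN exchange already completed");
    }
    // message = [authzid] NUL authcid NUL passwd
    String[] parts = new String(response, StandardCharsets.UTF_8).split("\0", -1);
    if (parts.length != 3 || parts[1].isEmpty()) {
      throw new SaslException("Malformed PLAIN message");
    }
    String authzid = parts[0], authcid = parts[1], password = parts[2];
    if (!authzid.isEmpty() && !authzid.equals(authcid)) {
      throw new SaslException("Acting as another user is not supported in this sketch");
    }
    if (!passwordCheck.test(authcid, password)) {
      throw new SaslException("Authentication failed for user " + authcid);
    }
    authorizationId = authcid;
    complete = true;
    return null; // PLAIN sends no final challenge
  }

  @Override public boolean isComplete() { return complete; }

  @Override public String getAuthorizationID() {
    if (!complete) throw new IllegalStateException("Authentication not complete");
    return authorizationId;
  }

  // PLAIN negotiates no security layer, so wrap/unwrap are unavailable.
  @Override public byte[] unwrap(byte[] incoming, int offset, int len) {
    throw new IllegalStateException("PLAIN has no security layer");
  }

  @Override public byte[] wrap(byte[] outgoing, int offset, int len) {
    throw new IllegalStateException("PLAIN has no security layer");
  }

  @Override public Object getNegotiatedProperty(String propName) {
    if (!complete) throw new IllegalStateException("Authentication not complete");
    return null;
  }

  @Override public void dispose() {
    complete = false;
    authorizationId = null;
  }
}
```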

Sudheesh Katkam [Thu, 26 Jan 2017 02:52:28 +0000 (18:52 -0800)] 
DRILL-4280: HYGIENE

+ Do not recreate DrillConfig object in PamUserAuthenticator
+ Add new factory method to CaseInsensitiveMap

+ Clean documentation

Sudheesh Katkam [Thu, 26 Jan 2017 02:51:32 +0000 (18:51 -0800)] 
DRILL-4280: CORE (service login)

+ Support Drillbit login to KDC using Hadoop's
  UserGroupInformation library

+ Set hostname in BootstrapContext
+ Use process user's short name in ImpersonationUtil
+ Add KerberosUtil class

Sudheesh Katkam [Sat, 25 Feb 2017 01:18:27 +0000 (17:18 -0800)] 
DRILL-4280: CORE (revert DRILL-3242)

+ DRILL-3242 aims to offload request handling to a secondary thread, but this feature is disabled by default due to concurrency issues

+ One implication of the feature was that exceptions not of type UserRpcException were ignored. But exceptions must not be ignored; they should be handled properly, especially in the context of security

Sudheesh Katkam [Thu, 26 Jan 2017 02:45:37 +0000 (18:45 -0800)] 
DRILL-4280: REFACTOR

+ Extract RemoteConnection interface, and add AbstractRemoteConnection
+ Add ServerConnection and ClientConnection interfaces
+ Add RequestHandler interface to decouple connections from how requests are handled
+ Add NonTransientRpcException

+ Remove unused classes and methods
+ Code style changes

Sudheesh Katkam [Thu, 26 Jan 2017 02:41:15 +0000 (18:41 -0800)] 
DRILL-4280: CORE (Java protocol)

+ Define SaslStatus and SaslMessage messages in protocol
+ Add "authenticationMechanisms" field to all handshakes
+ Add "saslSupport" field to UserToBitHandshake

Sudheesh Katkam [Thu, 26 Jan 2017 02:38:53 +0000 (18:38 -0800)] 
DRILL-4280: HYGIENE

+ Ignore files generated by IntelliJ in RAT plugin

Kunal Khatua [Sat, 25 Feb 2017 01:18:40 +0000 (17:18 -0800)] 
DRILL-5195: Publish Operator and MajorFragment Stats in Profile page

Improved UI
1. Introduction of tooltips
2. Share of each operator as a percentage of the major fragment and of the query
  - This helps identify the most CPU-intensive operators within a fragment and across the query
3. Rows emitted by each operator
4. For a running query, 'last update' and 'last progress' now show the elapsed time since that event.

closes #756

Kunal Khatua [Wed, 22 Feb 2017 07:06:54 +0000 (23:06 -0800)] 
DRILL-5190: Display planning and queued time for a query's profile page

Modified the UserBitShared protobuf to mark the end times of planning and of waiting in the queue. This allows accurate reporting of the planning, queued and actual execution times of a query.
Planning Time:
For older profiles, which lack the planning end time, the root fragment's (i.e. SCREEN operator) start time is taken as the estimated end of planning and as the estimated start of the execution phase.
QueueWait Time:
We do not estimate the queue time if the planning end time is not available.
Execution Time:
We calculate the execution time from whichever of these timestamps is available. The computation is done as follows, in decreasing order of accuracy:
1. Execution time = [endTime(Query) - end(QueueWait)]
2. Execution time = [endTime(Query) - end(Planning)]
3. Execution time = [endTime(Query) - start(rootFragment)] - {Estimated}

closes #738
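The fallback order for execution time can be written as a short function. `QueryTimings` and its parameters are illustrative names, and all timestamps are assumed to be epoch milliseconds taken from the profile.

```java
import java.util.OptionalLong;

final class QueryTimings {
  /**
   * Execution time in ms, measured from the most accurate start point available:
   * end of queue wait, else end of planning, else the root fragment's start (an estimate).
   */
  static long executionMillis(OptionalLong queueWaitEnd, OptionalLong planningEnd,
                              long rootFragmentStart, long queryEnd) {
    long start;
    if (queueWaitEnd.isPresent()) {
      start = queueWaitEnd.getAsLong();
    } else if (planningEnd.isPresent()) {
      start = planningEnd.getAsLong();
    } else {
      start = rootFragmentStart; // older profiles: estimated
    }
    return queryEnd - start;
  }
}
```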

chunhui-shi [Sat, 14 Jan 2017 01:20:46 +0000 (17:20 -0800)] 
DRILL-5196: Init MongoDB cluster when running a single test case directly through the command line or IDE

Other fixes include:
+ Sync mongo-java-driver versions to the newer 3.2.0
+ Update the flapdoodle package to the latest version accordingly

closes #741

chunhui-shi [Sun, 18 Dec 2016 08:27:50 +0000 (00:27 -0800)] 
DRILL-5088: Set client's codec for toJson

closes #702

Arina Ielchiieva [Wed, 22 Feb 2017 18:13:14 +0000 (18:13 +0000)] 
DRILL-5255: Remove default temporary workspace check at drillbit start up

closes #759

Paul Rogers [Mon, 13 Feb 2017 03:50:35 +0000 (19:50 -0800)] 
DRILL-5257: Run-time control of query profiles

Adds a run-time option to save (default) or not save query profiles.

Adds a run-time option to save query profiles in "debug" mode:
that is, after returning the last client response. (Normal mode is
to return the response before writing the profile.)

Tests for normal case are normal unit tests. Tests for debug mode
case are unit tests using the new framework that parse profiles.
The test framework is extended to save query profiles using this
new option.

Modifies the test framework to use the new options when a test
asks to save query profiles.

closes #747

Kunal Khatua [Tue, 21 Feb 2017 19:07:01 +0000 (11:07 -0800)] 
DRILL-5259: Allow listing a user-defined number of profiles

Allow changing the default number of finished queries listed in the web UI when starting up Drillbits.
The option is set in drill-override.conf (default=100; defined in drill-module.conf).
Alternatively, the number can be overridden when loading the page, via the max parameter,
e.g.
https://<hostname>:8047/profiles?max=100

closes #751

Paul Rogers [Sat, 18 Feb 2017 01:39:20 +0000 (17:39 -0800)] 
DRILL-5260: Extend "Cluster Fixture" test framework

- Config option to suppress printing of CSV and other output. (Allows
printing for single tests, not printing when running from Maven.)

- Parsing of query profiles to extract plan and run time information.

- Fix bug in log fixture when enabling logging for a package.

- Improved ZK support.

- Set up the new CTTAS default temporary workspace for tests.

- Clean up persistent storage files on disk to avoid CTTAS startup
failures.

- Provides a set of examples for how to use the cluster fixture.

closes #753

Paul Rogers [Fri, 17 Feb 2017 17:24:22 +0000 (09:24 -0800)] 
DRILL-5273: CompliantTextReader exhausts 4 GB memory when reading 5000 small files

Please see JIRA for details of problem and fix.

closes #750

Arina Ielchiieva [Fri, 24 Feb 2017 12:06:50 +0000 (14:06 +0200)] 
DRILL-5274: Exception thrown in Drillbit shutdown in UDF cleanup code

closes #760

Paul Rogers [Mon, 20 Feb 2017 01:53:31 +0000 (17:53 -0800)] 
DRILL-5275: Sort spill is slow due to repeated allocations

Rather than create a heap buffer per vector when writing and reading,
the revised code creates a single, shared buffer used for all I/O
within a particular container. This improves performance by reducing GC
and CPU costs during I/Os.

Move the I/O buffer and its methods to the allocator

Allows the buffer to be shared. This is especially important in the sort,
which may have many serializations open at once.

closes #754
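Using one buffer per container rather than one per vector can be sketched as a buffer that grows only when a larger vector comes along. `SharedIoBuffer` is hypothetical. Its `byte[]` input stands in for the vector's direct memory, which real code would copy into the heap buffer before writing.

```java
import java.io.IOException;
import java.io.OutputStream;

/** Sketch: one reusable heap buffer for copying vector data to a stream. */
final class SharedIoBuffer {
  private byte[] buf = new byte[0];
  private int allocations; // how often the buffer had to grow

  /** Returns a buffer of at least {@code size} bytes, allocating only when it must grow. */
  byte[] get(int size) {
    if (buf.length < size) {
      buf = new byte[Math.max(size, buf.length * 2)];
      allocations++;
    }
    return buf;
  }

  void write(byte[] vectorData, OutputStream out) throws IOException {
    byte[] b = get(vectorData.length);
    // Stands in for copying the vector's off-heap memory into the shared heap buffer.
    System.arraycopy(vectorData, 0, b, 0, vectorData.length);
    out.write(b, 0, vectorData.length);
  }

  int allocations() {
    return allocations;
  }
}
```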

Paul Rogers [Wed, 28 Dec 2016 01:21:09 +0000 (17:21 -0800)] 
DRILL-5157: Multiple Snappy versions on class path

Multiple Snappy versions on class path; causes unit test failures.

This fix updates the Snappy library and adds dependency management to
exclude older versions brought in by Avro and Parquet.

Kunal Khatua [Sat, 4 Feb 2017 01:29:47 +0000 (17:29 -0800)] 
DRILL-5242: The UI breaks when trying to render profiles having unknown metrics

Skip any metrics whose metric ID is unknown. This prevents an ArrayIndexOutOfBoundsException from being thrown and breaking the UI rendering.

Kunal Khatua [Mon, 30 Jan 2017 07:08:12 +0000 (23:08 -0800)] 
DRILL-5230: Translation of millisecond duration into hours is incorrect

Fixed invalid representation of readable elapsed time using `TimeUnit` class in JDK.
e.g. 4545 sec is now correctly translated as `1h15m` instead of `17h15m`
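Splitting the duration with `java.util.concurrent.TimeUnit` keeps the hour and minute components consistent. The sketch produces the `1h15m` form from the example and drops seconds. The exact format Drill renders may differ, and `Durations` is an illustrative name.

```java
import java.util.concurrent.TimeUnit;

final class Durations {
  /** Formats a duration as hours and remaining minutes, e.g. 4,545,000 ms -> "1h15m". */
  static String hoursMinutes(long millis) {
    long hours = TimeUnit.MILLISECONDS.toHours(millis);
    // Subtract the whole hours so that minutes stay in 0..59.
    long minutes = TimeUnit.MILLISECONDS.toMinutes(millis) - TimeUnit.HOURS.toMinutes(hours);
    return hours + "h" + minutes + "m";
  }
}
```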

Serhii-Harnyk [Mon, 13 Feb 2017 14:30:27 +0000 (14:30 +0000)] 
DRILL-5263: Prevent left NLJoin with non-scalar subqueries

Paul Rogers [Fri, 16 Dec 2016 03:54:05 +0000 (19:54 -0800)] 
DRILL-5080: Memory-managed version of external sort

Please see JIRA entry for reasons for revision, design spec and list of
changes.

This PR covers the changes to the external sort itself. Tests for this
operator require the test framework in DRILL-5126 and the mock data
source in DRILL-5152. Tests for this operator will be issued as a
separate PR once those two dependencies are committed.

Until then, the new operator is disabled by default. It can be enabled
using drill.sort.external.disable_managed: false.

The operator now spills before receiving a new batch. Revised memory calcs and
merge calcs to make them a bit clearer and provide more margin of error
for the power-of-two allocations used when allocating vectors.

We have two external sort implementations, but only one operator code
for both. They can use only one Metrics enum between them. When new
metrics were added to the new version, matching metrics were not added
to the old one. This fixes that issue. (The issue will go away once the
old one is retired.)

Revised memory calculations to reflect limit of 16 MB per vector.
Current revision limits to 16 MB per output batch to be safe. Next
revision will enforce per-vector limits to allow the overall batch to
be larger when possible.

Also simplified the merge-time calculations.

Original code provided only crude methods to learn the size of a record
batch. Adds a "RecordBatchSizer" to provide detailed analysis so the
sort can know the amount of memory used to buffer a batch, the number
of rows, and the expected row width once the rows are copied to a
spill file or the output.

Moved generic spill classes to a separate package.

Created parameters for spill batch size and merge batch size. Separated
these values in code. Deprecated the min, max spill parameters as they
no longer add much value. Minor code rearranging.

Bug fix

Fixes a corner case of merging spilled files in a low-memory condition.

Fixes from code review

close apache/drill#717
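Two of the calculations described above can be sketched directly. First, vector allocations round up to a power of two, so a buffer estimate must budget for the rounded size. Second, the rows per output or spill batch are bounded both by the batch budget and by the 16 MB per-vector limit. `SortMemoryMath` and its method names are hypothetical; this is not the `RecordBatchSizer` API.

```java
final class SortMemoryMath {
  static final long MAX_VECTOR_BYTES = 16L * 1024 * 1024; // the 16 MB per-vector limit cited above

  /** Smallest power of two >= n; vector allocations are assumed to round up like this. */
  static long roundUpToPowerOfTwo(long n) {
    return n <= 1 ? 1 : Long.highestOneBit(n - 1) << 1;
  }

  /** Average row width in bytes, a batch-sizer style estimate. */
  static int rowWidth(long batchDataBytes, int rowCount) {
    return rowCount == 0 ? 0 : (int) Math.ceil((double) batchDataBytes / rowCount);
  }

  /** Rows per batch so the batch fits targetBytes and the widest column stays under 16 MB. */
  static int rowsPerBatch(long targetBytes, int rowWidth, int widestColumnWidth) {
    long byBatch = targetBytes / Math.max(1, rowWidth);
    long byVector = MAX_VECTOR_BYTES / Math.max(1, widestColumnWidth);
    return (int) Math.max(1, Math.min(byBatch, byVector));
  }
}
```

For example, a vector holding 5 MB of data is assumed to occupy an 8 MB allocation. That gap is the kind of margin the revised calculations leave room for.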

Serhii-Harnyk [Thu, 1 Sep 2016 14:48:02 +0000 (17:48 +0300)] 
DRILL-4864: Add ANSI format for date/time functions

DRILL-4864: Add ANSI format for date/time functions (review changes)

close apache/drill#581

Arina Ielchiieva [Mon, 6 Feb 2017 13:11:02 +0000 (13:11 +0000)] 
DRILL-5040: Parquet writer unable to delete table folder on abort

close apache/drill#744

Laurent Goujon [Wed, 25 Jan 2017 18:32:33 +0000 (10:32 -0800)] 
DRILL-5219: Relax user properties validation in C++ client

Unlike the Java client, the C++ client only allows user properties present in a
whitelist. Relax this restriction so that users can add extra properties.

This closes #727

Arina Ielchiieva [Mon, 6 Feb 2017 13:51:49 +0000 (13:51 +0000)] 
DRILL-5243: Fix TestContextFunctions.sessionIdUDFWithinSameSession unit test

This closes #743

David Haller [Fri, 20 Jan 2017 20:22:52 +0000 (21:22 +0100)] 
DRILL-5241: JDBC proxy driver: Do not put null value in map

This closes #724

Padma Penumarthy [Sat, 21 Jan 2017 01:57:10 +0000 (17:57 -0800)] 
DRILL-5223: Drill should ensure balanced workload assignment at node level in order to get better query performance

This closes #730

Parth Chandra [Thu, 2 Feb 2017 04:14:52 +0000 (20:14 -0800)] 
DRILL-5240: Parquet - fix unnecessary object creation while checking for null values in nullable var length columns

This closes #740

Arina Ielchiieva [Thu, 2 Feb 2017 11:47:19 +0000 (11:47 +0000)] 
DRILL-5238: CTTAS: unable to resolve temporary table if workspace is indicated without schema

This closes #736

Serhii-Harnyk [Fri, 27 Jan 2017 15:36:10 +0000 (15:36 +0000)] 
DRILL-5237: FlattenRecordBatch loses nested fields from the schema when it returns empty batches for the first time

This closes #735

Arina Ielchiieva [Thu, 26 Jan 2017 18:14:28 +0000 (20:14 +0200)] 
DRILL-5224: CTTAS: fix errors connected with system path delimiters (Windows)

This closes #731

Laurent Goujon [Wed, 25 Jan 2017 18:52:31 +0000 (10:52 -0800)] 
DRILL-5220: Provide API to set application/client names in C++ connector

Add method to DrillClientConfig to set the client and the application names
in the C++ connector.

Allow the ODBC driver (or any user of the C++ connector) to provide more
specific information, such as the application using the client.

This closes #728

Arina Ielchiieva [Tue, 24 Jan 2017 12:33:11 +0000 (12:33 +0000)] 
DRILL-5215: CTTAS: disallow temp tables in view expansion logic

This closes #725

Parth Chandra [Wed, 14 Dec 2016 20:08:20 +0000 (12:08 -0800)] 
DRILL-5207: Improve Parquet Scan pipelining

- Add a configurable AsyncPageReader queue.
- Enforce the total size of a Parquet row group.
- Do not initialize the BufferedDirectBufInputStream buffer in init; wait for the first read.
- Change the default size of BufferedDirectBufInputStream.
- Do not invoke getOptions too many times in the Parquet reader.
- Add metrics for processing time, and decoding time for varlen and fixedlen columns.

This closes #723
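A configurable queue between page reading and decoding can be sketched with a bounded `BlockingQueue`. The reader thread blocks once the queue holds `queueSize` pages, so read-ahead memory stays bounded while I/O overlaps decoding. This sketch assumes reading and decoding run on separate threads. `AsyncPagePipeline` is a hypothetical name. Error propagation from the reader thread is omitted for brevity: if the reader failed, the consumer here would block forever.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/** Sketch: a reader thread fills a bounded queue of pages; the caller decodes them. */
final class AsyncPagePipeline {
  private static final byte[] END = new byte[0]; // sentinel (compared by identity) marking the end

  static List<Integer> readAll(List<byte[]> pages, int queueSize) throws InterruptedException {
    BlockingQueue<byte[]> queue = new ArrayBlockingQueue<>(queueSize); // the configurable bound
    ExecutorService io = Executors.newSingleThreadExecutor();
    try {
      io.submit(() -> {
        for (byte[] page : pages) {
          queue.put(page); // blocks when the decoder falls behind
        }
        queue.put(END);
        return null;
      });
      List<Integer> decodedSizes = new ArrayList<>();
      for (byte[] page = queue.take(); page != END; page = queue.take()) {
        decodedSizes.add(page.length); // stands in for decoding the page
      }
      return decodedSizes;
    } finally {
      io.shutdownNow();
    }
  }
}
```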

Nagarajan Chinnasamy [Thu, 15 Dec 2016 14:18:39 +0000 (19:48 +0530)] 
DRILL-5043: Function that returns a unique id per session/connection similar to MySQL's CONNECTION_ID() #685

Serhii-Harnyk [Tue, 20 Dec 2016 16:55:41 +0000 (16:55 +0000)] 
DRILL-3562: Query fails when using flatten on JSON data where some documents have an empty array

closes #713

Serhii-Harnyk [Thu, 24 Nov 2016 13:24:03 +0000 (13:24 +0000)] 
DRILL-4764: Parquet file with INT_16, etc. logical types not supported by simple SELECT

closes #673

Paul Rogers [Tue, 13 Dec 2016 21:41:23 +0000 (13:41 -0800)] 
DRILL-5126: Provide simplified, unified "cluster fixture" for test

Drill provides a robust selection of test frameworks that have evolved to satisfy the needs of a variety of test cases.
However, some do part of what a given test needs, while others do other parts. Also, the various frameworks make
assumptions (in the form of boot-time configuration) that differ from what some tests may need, forcing the test
to start, then stop, then restart a Drillbit, which is an expensive operation.

Also, many ways exist to run queries, but each does only part of the job. Several ways exist to change
runtime options.

This checkin shamelessly grabs the best parts from existing frameworks, adds a fluent builder facade
and provides a complete, versatile test framework for new tests. Old tests are unaffected by this
new code.

An adjustment was made to allow use of the existing TestBuilder mechanism. TestBuilder used to
depend on static members of BaseTestQuery. A "shim" allows the same code to work in the old
way for old tests, but with the new ClusterFixture for new tests.

Details are in the org.apache.drill.test.package-info.java file.

This commit modifies a single test case, TestSimpleExternalSort, to use the new framework.
More cases will follow once this framework itself is committed.

Also, the framework will eventually allow use of the extended mock data source
from SQL. However, that change must await checkin of the mock data source changes.

Includes a LogFixture that allows setting logger options per test to simplify debugging via tests.

Also includes a “summary listener” to run a query and return a summary of the
run. Handy to simply verify that a query runs and to time it.

Added an async query runner for tests that want to run multiple
concurrent queries.

closes #710

Sudheesh Katkam [Wed, 25 Jan 2017 22:43:41 +0000 (14:43 -0800)] 
DRILL-5218: Support optionally disabling heartbeats from C++ client

closes #726

Serhii-Harnyk [Tue, 27 Dec 2016 16:20:37 +0000 (16:20 +0000)] 
DRILL-5164: Equi-join query results in CompileException when inputs have large number of columns.

close apache/drill#711

Paul Rogers [Tue, 13 Dec 2016 22:36:42 +0000 (14:36 -0800)] 
DRILL-5104: Foreman should not set external sort memory for a physical plan

Physical plans include a plan for memory allocations. However, the code
path in Foreman replans external sort memory, even for a physical plan.
This makes it impossible to use a physical plan to test memory
configuration.

This change avoids changing memory settings in a physical plan, while
preserving the adjustments for logical plans or SQL queries.

Revised to put a property in the plan itself. Old plans, and those
generated from SQL, will have memory allocations applied. Plans
marked as already "resource management" planned will be used as-is.

Includes a unit test that demonstrates the new behavior.

close apache/drill#703

Vitalii Diravka [Wed, 14 Dec 2016 16:24:08 +0000 (16:24 +0000)] 
DRILL-5097: Using store.parquet.reader.int96_as_timestamp gives IOOB whereas convert_from works

close apache/drill#697

Arina Ielchiieva [Thu, 3 Nov 2016 16:55:38 +0000 (16:55 +0000)] 
DRILL-4956: Temporary tables support

close apache/drill#666

Kunal Khatua [Thu, 12 Jan 2017 00:45:15 +0000 (16:45 -0800)] 
DRILL-5172: Display elapsed time for queries in the UI

Displays the elapsed time for running queries and the total duration of completed, failed, or cancelled queries, both in the list of query profiles and on each query's profile page.
The query runtime is displayed as '[hr] [min] sec'.
For example, a duration of 25,254,321 ms is displayed as 7 hr 00 min 54.321 sec.
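A minimal sketch of the duration format, written to reproduce the worked example above. Only the '[hr] [min] sec' layout and that single example come from the commit message; how shorter durations are rendered (hours and minutes omitted when zero, minutes zero-padded only after hours) is an assumption, and `DurationFormat` is a hypothetical name rather than the Web UI's actual code.

```java
import java.util.Locale;

final class DurationFormat {
    /** Formats a duration in milliseconds as "[h hr] [mm min] s.SSS sec". */
    static String format(long millis) {
        long hours = millis / 3_600_000L;
        long minutes = (millis / 60_000L) % 60;
        long seconds = (millis / 1_000L) % 60;
        long ms = millis % 1_000L;

        StringBuilder sb = new StringBuilder();
        if (hours > 0) {
            // Pad minutes to two digits once hours are shown, e.g. "7 hr 00 min".
            sb.append(hours).append(" hr ");
            sb.append(String.format(Locale.ROOT, "%02d min ", minutes));
        } else if (minutes > 0) {
            sb.append(minutes).append(" min ");
        }
        // Integer arithmetic avoids locale-dependent decimal separators.
        sb.append(String.format(Locale.ROOT, "%d.%03d sec", seconds, ms));
        return sb.toString();
    }
}
```

For the example, 7 * 3,600,000 ms accounts for 25,200,000 ms, leaving 54,321 ms, so `DurationFormat.format(25_254_321L)` yields "7 hr 00 min 54.321 sec".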

This closes #721

5 years agoDRILL-4868: fix how hive function set DrillBuf.
chunhui-shi [Tue, 13 Dec 2016 23:16:40 +0000 (15:16 -0800)] 
DRILL-4868: fix how hive function set DrillBuf.

This closes #695

5 years agoDRILL-4558: BSonReader should prepare buffer size as actual need
chunhui-shi [Wed, 14 Dec 2016 02:00:49 +0000 (18:00 -0800)] 
DRILL-4558: BSonReader should prepare buffer size as actual need

This closes #696

5 years agoDRILL-4919: Fix select count(1) / count(*) on csv with header
Arina Ielchiieva [Thu, 29 Dec 2016 15:42:53 +0000 (15:42 +0000)] 
DRILL-4919: Fix select count(1) / count(*) on csv with header

This closes #714

5 years agoDRILL-5152: Enhance the mock data source: better data, SQL access
Paul Rogers [Thu, 22 Dec 2016 05:47:20 +0000 (21:47 -0800)] 
DRILL-5152: Enhance the mock data source: better data, SQL access

Provides an enhanced version of the mock data source. See the JIRA
entry for motivation and package-info.java for details of operation.

Includes revisions suggested by code review.

Also includes additional comments and a few more compiler-warning
cleanups.

This closes #708

5 years agoDRILL-5105: comment out unecessary recursive buffer size check
chunhui-shi [Wed, 4 Jan 2017 01:39:49 +0000 (17:39 -0800)] 
DRILL-5105: comment out unecessary recursive buffer size check

This closes #715

5 years agoDRILL-4996: Parquet Date auto-correction is not working in auto-partitioned parquet...
Vitalii Diravka [Mon, 12 Dec 2016 04:41:49 +0000 (04:41 +0000)] 
DRILL-4996: Parquet Date auto-correction is not working in auto-partitioned parquet files generated by drill-1.6

- Changed how corrupted date values are detected in Parquet files generated by Drill:
  the corruption status is now determined by looking at the min/max values in the metadata;
- Refactored TestCorruptParquetDateCorrection accordingly.
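A minimal sketch of a min/max-based check, under stated assumptions: corrupted values are taken to be shifted forward by a fixed number of days (`SHIFT_DAYS`, here twice the Julian day number of the Unix epoch), and any date after the year 5000 is treated as implausible. Both constants, the `Status` values, and the class name are illustrative assumptions, not Drill's definitions. Dates are expressed as days since 1970-01-01.

```java
import java.time.LocalDate;

final class CorruptDateCheck {
    enum Status { CORRUPT, NOT_CORRUPT, UNCLEAR }

    // Assumption: corrupted dates are shifted forward by twice the Julian day
    // number of the Unix epoch (2,440,588).
    static final long SHIFT_DAYS = 2L * 2_440_588L;

    // Assumption: dates on or after 5000-01-01 are treated as implausible.
    static final long THRESHOLD_DAYS = LocalDate.of(5000, 1, 1).toEpochDay();

    /** Classifies a column from its metadata statistics (days since 1970-01-01). */
    static Status fromStatistics(long minDays, long maxDays) {
        if (minDays >= THRESHOLD_DAYS) {
            return Status.CORRUPT;      // every value lies past the threshold
        }
        if (maxDays < THRESHOLD_DAYS) {
            return Status.NOT_CORRUPT;  // every value is plausible
        }
        return Status.UNCLEAR;          // mixed range: statistics alone cannot decide
    }

    /** Undoes the assumed shift for a value known to be corrupted. */
    static long correct(long days) {
        return days - SHIFT_DAYS;
    }
}
```

In this sketch, reading only the min/max statistics lets a reader classify a whole file without scanning its values; only the mixed case would need further inspection.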

This closes #687

5 years agoDRILL-5116: Enable generated code debugging in each Drill operator
Paul Rogers [Tue, 13 Dec 2016 01:30:56 +0000 (17:30 -0800)] 
DRILL-5116: Enable generated code debugging in each Drill operator

DRILL-5052 added the ability to debug generated code. The reviewer suggested
permitting the technique to be used for all Drill operators. This PR provides
the required fixes. Most were small changes; others dealt with the rather
clever way that the existing byte-code merge converted static nested classes
to non-static inner classes, with the way that constructors were inserted
at the byte-code level, and so on. See the JIRA for the details.

This code passed the unit tests twice: once with the traditional byte-code
manipulations, and a second time using "plain-old Java" code compilation.
Plain-old Java is turned off by default, but can be turned on for all
operators with a single config change; see the JIRA for info. Consider
the plain-old Java option experimental: very handy for debugging, but
perhaps not yet tested enough for production use.

close apache/drill#716

5 years agoDRILL-5039: NPE - CTAS PARTITION BY (<char-type-column>)
Arina Ielchiieva [Fri, 23 Dec 2016 17:51:38 +0000 (17:51 +0000)] 
DRILL-5039: NPE - CTAS PARTITION BY (<char-type-column>)

close apache/drill#706

5 years agoDRILL-5121 Fix for memory leak. Changes fieldVectorMap in ScanBatch to a CaseInsensit...
karthik [Mon, 14 Nov 2016 18:36:53 +0000 (10:36 -0800)] 
DRILL-5121 Fix for memory leak. Changes fieldVectorMap in ScanBatch to a CaseInsensitiveMap
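One plausible reading of the fix, offered as an assumption: Drill treats column names case-insensitively, so a case-sensitive map can hold two entries (and therefore two allocated vectors) for `name` and `NAME`, of which only one may be released. The sketch uses `TreeMap` with `String.CASE_INSENSITIVE_ORDER` as a stand-in for Drill's `CaseInsensitiveMap`; `FieldVectorMaps` and `getOrCreate` are hypothetical helpers.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Supplier;

final class FieldVectorMaps {
    /** A map whose keys compare case-insensitively; a stand-in for Drill's CaseInsensitiveMap. */
    static <V> Map<String, V> newFieldVectorMap() {
        return new TreeMap<>(String.CASE_INSENSITIVE_ORDER);
    }

    /** Returns the existing vector for a field, allocating one only if none exists under any casing. */
    static <V> V getOrCreate(Map<String, V> map, String field, Supplier<V> allocate) {
        return map.computeIfAbsent(field, k -> allocate.get());
    }
}
```

With this map, looking up `Name` and then `NAME` returns the same object and allocates only once.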

close apache/drill#690

5 years agoDRILL-5127: Revert the fix for DRILL-4831
Padma Penumarthy [Tue, 3 Jan 2017 22:01:00 +0000 (14:01 -0800)] 
DRILL-5127: Revert the fix for DRILL-4831

close apache/drill#718

5 years agoDRILL-5159: Drill's ProjectMergeRule should operate on RelNodes with same convention...
Jinfeng Ni [Thu, 22 Dec 2016 02:00:46 +0000 (18:00 -0800)] 
DRILL-5159: Drill's ProjectMergeRule should operate on RelNodes with same convention trait.

close apache/drill#705

5 years agoDRILL-5052: Option to debug generated Java code using an IDE
Paul Rogers [Sun, 20 Nov 2016 02:29:24 +0000 (18:29 -0800)] 
DRILL-5052: Option to debug generated Java code using an IDE

Provides a second compilation path for generated code: “plain old Java”,
in which generated classes inherit from their templates. Such code can be
compiled directly, allowing easy debugging of generated code.

Also shows how to generate two classes in the External Sort Batch as “plain
old Java” to enable IDE debugging of that generated code. This required
minor clean-up of the templates.
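A minimal sketch of the “plain old Java” idea, with hypothetical class names: the template is an ordinary abstract class holding the fixed logic, and the generated code is a normal subclass that supplies the per-query pieces. Because the subclass is plain Java, it can be compiled directly and stepped through in an IDE, rather than having its byte code merged into a copy of the template.

```java
// Hypothetical template: fixed logic plus an abstract hook for per-query code.
abstract class FilterTemplate {
    /** Copies the values accepted by the generated predicate; returns how many were copied. */
    int copyAccepted(int[] in, int[] out) {
        int count = 0;
        for (int value : in) {
            if (accept(value)) {
                out[count++] = value;
            }
        }
        return count;
    }

    // Filled in by generated code.
    protected abstract boolean accept(int value);
}

// What "plain old Java" generated code could look like: an ordinary subclass
// that is compiled directly and can be stepped through in a debugger.
final class GeneratedFilter extends FilterTemplate {
    @Override
    protected boolean accept(int value) {
        return value > 10;
    }
}
```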

Fixes some broken toString() methods in code generation classes
Fixes a variety of small compilation warnings
Adds Javadoc to a few classes

Includes clean-up from code review comments.

close apache/drill#660

5 years agoDRILL-5032: Drill query on hive parquet table failed with OutOfMemoryError: Java...
Serhii-Harnyk [Thu, 27 Oct 2016 19:20:27 +0000 (19:20 +0000)] 
DRILL-5032: Drill query on hive parquet table failed with OutOfMemoryError: Java heap space

close apache/drill#654

5 years agoDRILL-5051: Fix incorrect computation of 'fetch' in LimitRecordBatch when 'offset...
hongze.zhz [Fri, 18 Nov 2016 12:11:38 +0000 (20:11 +0800)] 
DRILL-5051: Fix incorrect computation of 'fetch' in LimitRecordBatch when 'offset' is specified
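For reference, the usual semantics of OFFSET and FETCH over a stream of batches can be sketched as follows: the first `offset` rows are skipped, and `fetch` counts rows after the offset, so the last row emitted is row `offset + fetch - 1` (zero-based). This is a generic illustration with a hypothetical `LimitWindow` class, not the code of `LimitRecordBatch`.

```java
final class LimitWindow {
    private long toSkip;    // rows still to skip because of OFFSET
    private long remaining; // rows still to emit because of FETCH

    LimitWindow(long offset, long fetch) {
        this.toSkip = offset;
        this.remaining = fetch;
    }

    /** Returns {start, count}: the rows to keep from the next batch of the given size. */
    long[] nextBatch(long batchSize) {
        long skip = Math.min(toSkip, batchSize);
        toSkip -= skip;
        long keep = Math.min(remaining, batchSize - skip);
        remaining -= keep;
        return new long[] {skip, keep};
    }

    /** True once FETCH rows have been emitted. */
    boolean done() {
        return remaining == 0;
    }
}
```

With OFFSET 5 and FETCH 7 over batches of 4 rows, the first batch is skipped entirely, the second keeps its last 3 rows, and the third keeps all 4.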

close apache/drill#662

5 years agoDRILL-5098: Improving fault tolerance for connection between client and foreman node.
Sorabh Hamirwasia [Thu, 1 Dec 2016 22:58:00 +0000 (14:58 -0800)] 
DRILL-5098: Improving fault tolerance for connection between client and foreman node.

Adds a "tries" config option to the connection string, improving fault tolerance when the Drill client makes its first connection to a foreman. The client will try up to min(tries, num_drillbits) unique drillbits, stopping as soon as a connection succeeds.
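A minimal sketch of the retry rule described above. Only the min(tries, num_drillbits) bound over unique drillbits comes from the commit message; shuffling the candidates, the `ConnectWithTries` class, and the `tryConnect` callback are assumptions made for illustration, not Drill client code.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Random;
import java.util.function.Predicate;

final class ConnectWithTries {
    /**
     * Tries up to min(tries, number of unique drillbits) distinct endpoints, stopping at the
     * first success. Returns the endpoint that accepted the connection, or null if none did.
     */
    static String connect(List<String> drillbits, int tries,
                          Predicate<String> tryConnect, Random random) {
        // Deduplicate while keeping order, then shuffle (an assumption) to spread the load.
        List<String> candidates = new ArrayList<>(new LinkedHashSet<>(drillbits));
        Collections.shuffle(candidates, random);
        int attempts = Math.min(tries, candidates.size());
        for (int i = 0; i < attempts; i++) {
            String endpoint = candidates.get(i);
            if (tryConnect.test(endpoint)) {
                return endpoint;
            }
        }
        return null;
    }
}
```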

HYGIENE: Refactoring BasicClient::close to call RemoteConnection::close

close apache/drill#679

5 years agoDRILL-5117: Compile error when query a json file with 1000+columns
Serhii-Harnyk [Thu, 8 Dec 2016 20:08:34 +0000 (20:08 +0000)] 
DRILL-5117: Compile error when query a json file with 1000+columns

close apache/drill#686

5 years agoDRILL-5123: Write query profile after sending final response to client to improve...
Paul Rogers [Mon, 12 Dec 2016 17:18:38 +0000 (09:18 -0800)] 
DRILL-5123: Write query profile after sending final response to client to improve latency

In testing a particular query, I used a test setup that does not write
to the "persistent store", so query profiles were not saved. I
then changed the config to save them (to local disk). This produced
about a 200ms difference in query run time as perceived by the client.

I then moved the query profile write to after the final message is
sent to the client. This saved approximately 100ms, as perceived by
the client, in query run time on short (~3 sec.) queries.
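The reordering amounts to the sketch below: the client is notified before the profile is persisted, so the persistent-store write no longer adds to the latency the client sees. `Client`, `ProfileStore`, and `QueryCompletion` are hypothetical interfaces, not Drill's Foreman API.

```java
final class QueryCompletion {
    // Hypothetical interfaces standing in for the client connection and the profile store.
    interface Client { void sendFinalResponse(String queryId); }
    interface ProfileStore { void writeProfile(String queryId); }

    /** Notifies the client first, then persists the profile. */
    static void complete(String queryId, Client client, ProfileStore store) {
        client.sendFinalResponse(queryId); // the client perceives completion here
        store.writeProfile(queryId);       // the slower write no longer delays the client
    }
}
```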

close apache/drill#692

5 years agoDRILL-4938: Report UserException when constant expression reduction fails
Serhii-Harnyk [Fri, 18 Nov 2016 16:02:04 +0000 (16:02 +0000)] 
DRILL-4938: Report UserException when constant expression reduction fails

closes #689

5 years agoDRILL-5065: Optimize count(*) queries on MapR-DB JSON Tables
Smidth Panchamia [Mon, 28 Nov 2016 21:59:34 +0000 (13:59 -0800)] 
DRILL-5065: Optimize count(*) queries on MapR-DB JSON Tables

In MapR-DB v5.2.0, we enabled '_id' only projection for JSON
tables. Hence, we can now optimize the following queries:

a. count(*) by projecting only the '_id' column.

b. '_id' only projections, including count(_id)

Change the format plugin config parameter name.

Fix setter of config parameter `disableCountOptimization` for drill-maprdb plugin

closes #678

5 years agoDRILL-5081: Lower logging level for corrupt dates message
Sudheesh Katkam [Tue, 29 Nov 2016 21:07:36 +0000 (13:07 -0800)] 
DRILL-5081: Lower logging level for corrupt dates message

* introduced in DRILL-4203

closes #691

5 years agoDRILL-4987: Use ImpersonationUtil to get process user’s groups in RemoteFunctionRegistry
Sudheesh Katkam [Tue, 1 Nov 2016 20:42:52 +0000 (13:42 -0700)] 
DRILL-4987: Use ImpersonationUtil to get process user’s groups in RemoteFunctionRegistry

closes #642

5 years agoDRILL-4812: Fix FileSelection#handleWildCard to use normalized path separator
Mike Lavender [Fri, 21 Oct 2016 21:05:43 +0000 (14:05 -0700)] 
DRILL-4812: Fix FileSelection#handleWildCard to use normalized path separator

closes #627

5 years agoDRILL-5119: Update MapR version to 5.2.0.40963-mapr
Patrick Wong [Fri, 9 Dec 2016 22:26:31 +0000 (14:26 -0800)] 
DRILL-5119: Update MapR version to 5.2.0.40963-mapr

closes #688

5 years agoDRILL-5056: Fix UserException to write full message to log
Paul Rogers [Mon, 21 Nov 2016 18:19:15 +0000 (10:19 -0800)] 
DRILL-5056: Fix UserException to write full message to log

A case occurred in which the External Sort failed during spilling.
All that was written to the log was:

2016-11-18 ... INFO  o.a.d.e.p.i.xsort.ExternalSortBatch - User Error Occurred

Modified the logging code to provide more information to aid in tracking down problems that occur in the field.
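The gist of the fix, as a hedged sketch: instead of logging only a generic summary such as "User Error Occurred", include the underlying error and its chain of causes in the message. `ErrorMessages.fullMessage` is a hypothetical helper, not the actual `UserException` API; logging frameworks that accept a `Throwable` argument can serve the same purpose.

```java
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Set;

final class ErrorMessages {
    /** Builds one log message: the summary, then the error and each cause with class and message. */
    static String fullMessage(String summary, Throwable error) {
        StringBuilder sb = new StringBuilder(summary);
        // Guard against cyclic cause chains.
        Set<Throwable> seen = Collections.newSetFromMap(new IdentityHashMap<>());
        boolean first = true;
        for (Throwable t = error; t != null && seen.add(t); t = t.getCause()) {
            sb.append(System.lineSeparator())
              .append(first ? "  error: " : "  caused by: ")
              .append(t.getClass().getName())
              .append(": ")
              .append(t.getMessage());
            first = false;
        }
        return sb.toString();
    }
}
```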

closes #665

5 years agoDRILL-5112: Fix config in PopUnitTestBase
Paul Rogers [Wed, 7 Dec 2016 05:11:56 +0000 (21:11 -0800)] 
DRILL-5112: Fix config in PopUnitTestBase

Tests rely on command-line settings in the pom.xml file. Those
settings are not available when tests are run in Eclipse.
Replicated required settings into the base test class (as in
BaseTestQuery).

closes #681