Raghunandan S [Fri, 2 Aug 2019 09:13:40 +0000 (14:43 +0530)]
[maven-release-plugin] prepare release apache-CarbonData-1.6.0-rc2
ravipesala [Fri, 2 Aug 2019 05:45:05 +0000 (11:15 +0530)]
[HOTFIX] Removed the hive-exec and commons dependency from hive module
Removed the hive-exec and commons dependencies from the hive module, as Spark ships its own hive-exec.
Because of the external hive-exec dependency, some tests were failing.
This closes #3347
ajantha-bhat [Thu, 25 Jul 2019 13:20:19 +0000 (18:50 +0530)]
[CARBONDATA-3481] Multi-thread pruning fails when datamaps count is just near numOfThreadsForPruning
Cause: When the datamap count is just near numOfThreadsForPruning,
the code checks '>=', so the last thread may not get any datamaps to prune.
Hence an ArrayIndexOutOfBoundsException is thrown in this scenario.
There is no issue with a higher number of datamaps.
Solution: In this scenario, launch threads based on the distribution value,
not on the hardcoded value.
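As a rough illustration of the intended distribution (names here are illustrative, not the actual CarbonData identifiers), sizing the pool by the actual slicing avoids the empty last slice:

    // Scala sketch: slice datamaps across at most numOfThreadsForPruning threads;
    // launch one thread per non-empty slice instead of a fixed thread count.
    // Assumes datamaps is non-empty.
    def sliceForPruning[T](datamaps: List[T], numOfThreadsForPruning: Int): Seq[List[T]] = {
      val perThread = math.ceil(datamaps.size.toDouble / numOfThreadsForPruning).toInt
      datamaps.grouped(perThread).toSeq  // may yield fewer slices than threads
    }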
This closes #3336
Indhumathi27 [Fri, 26 Jul 2019 11:21:32 +0000 (16:51 +0530)]
[CARBONDATA-3478]Fix ArrayIndexOutOfBound Exception on compaction after alter operation
Problem:
In case of an alter add, drop, or rename operation, restructuredBlockExists will be true.
Currently, to get the RawResultIterator for a block, we check whether the block has ColumnDrift
by comparing SegmentProperties and the columndrift columns.
SegmentProperties is formed based on restructuredBlockExists:
if restructuredBlockExists is true, we take the current column schema to form SegmentProperties;
else, we use the datafilefooter column schema to form SegmentProperties.
In the example given in CARBONDATA-3478, for both blocks we use the current column
schema to form SegmentProperties, as restructuredBlockExists is true.
Hence, while iterating block 1, it throws an ArrayIndexOutOfBound exception,
as it uses RawResultIterator instead of ColumnDriftRawResultIterator.
Solution:
Use the schema from the datafilefooter of each block to check whether it has columndrift.
This closes #3337
ravipesala [Thu, 1 Aug 2019 12:42:50 +0000 (18:12 +0530)]
[HOTFIX] CLI test case failed during release because of space differences
The CLI test case fails if the release name is short and has no SNAPSHOT suffix, since extra padding spaces are added.
Therefore the test is changed to check individual 'contains' assertions instead of a batch of lines.
This closes #3344
ravipesala [Mon, 15 Jul 2019 16:00:37 +0000 (21:30 +0530)]
[HOTFIX] Fixed date filter issue for fileformat
Problem:
In fileformat, Spark converts the date to a Date object and sends it to Carbon; Carbon converts it to a string and
then generates milliseconds. But there is a millisecond gap between the Spark-generated and Carbon-generated values.
This causes date filters to not work properly.
Solution:
Convert the date to millis on the Spark side before giving it to Carbon.
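A minimal sketch of the fix's direction, assuming a java.sql.Date arrives from Spark (the method name is illustrative):

    import java.sql.Date
    // Take the millis directly from the Date object Spark produced, instead of
    // formatting it to a string and re-parsing it on the Carbon side.
    def dateToMillis(sparkDate: Date): Long = sparkDate.getTime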
This closes #3327
kunal642 [Mon, 29 Jul 2019 16:16:41 +0000 (21:46 +0530)]
[HOTFIX] Fix failing CI test cases
Problem: The bloom and lucene dependencies were removed, due to which mvn was downloading the old jars.
Solution: Add the bloom and lucene dependencies to the main pom.
This closes #3341
Indhumathi27 [Tue, 16 Jul 2019 10:17:13 +0000 (15:47 +0530)]
[CARBONDATA-3474]Fix validate mvQuery having filter expression and correct error message
Problem:
1. Create an mv datamap with a select query having a predicate expression like &&, AND, OR, with the predicate not present in the projection.
Executing the same query throws an exception.
2. For UDF functions like histogram_numeric, collect_set, collect_list etc., the return type is a complex type.
Creating an mv with the above functions throws an improper exception.
Solution:
1. Validate mv queries having the predicate as an expression, and throw an exception on create datamap if the predicate is not given in the projection.
2. Correct the unsupported-operation error message for functions like histogram_numeric, collect_set, collect_list etc.,
whose return type is a complex data type, with mv.
This closes #3329
kunal642 [Mon, 15 Jul 2019 12:11:48 +0000 (17:41 +0530)]
[HOTFIX] Fixed sk/ak not found for datasource table
Fixed sk/ak not found for datasource table
This closes #3326
ravipesala [Tue, 16 Jul 2019 09:29:31 +0000 (14:59 +0530)]
[HOTFIX] Included MV module in assembly jar
Currently the MV module is not included in the assembly jar and
needs the mv profile to be added. Since Carbon supports MV for
all Spark versions, we remove the profile and
add the module to the assembly jar as well.
This closes #3328
ravipesala [Sat, 13 Jul 2019 10:43:53 +0000 (16:13 +0530)]
[HOTFIX] Added taskid as UUID while writing files in fileformat to avoid corrupting.
Problem: In FileFormat write, Carbon uses System.nanoTime() as the task id.
When multiple tasks are launched concurrently, there is a rare chance that two tasks
get the same id. Due to this, two Spark tasks launched for one
insert will have the same carbondata file name; when both tasks write to one file,
the file is likely to be corrupted, which leads to query failure.
Solution: Use a unique UUID as the task id instead of nanoseconds.
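A sketch of the change, with illustrative variable names:

    import java.util.UUID
    // Before: nanoTime can collide when two tasks start in the same nanosecond,
    // producing identical carbondata file names.
    val taskIdOld = System.nanoTime().toString
    // After: a random UUID is unique across concurrently launched tasks.
    val taskIdNew = UUID.randomUUID().toString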
This closes #3325
Jacky Li [Thu, 11 Jul 2019 07:33:12 +0000 (15:33 +0800)]
[HOTFIX] Fix json to carbon writer
When using SDK to write carbon files with json input, there is a NullPointerException.
This closes #3323
ravipesala [Tue, 9 Jul 2019 13:59:19 +0000 (19:29 +0530)]
[HOTFIX] Reset the hive catalog table stats to none even after refresh lookup relation.
Problem:
The Spark plan picks up the catalog table stats when the relation is looked up again after the modification time is updated.
Solution:
Set the stats to none even when the relation is looked up again.
This closes #3321
kunal642 [Tue, 18 Jun 2019 21:35:34 +0000 (03:05 +0530)]
[CARBONDATA-3462][DOC]Added documentation for index server
Added documentation for index server
This closes #3294
kunal642 [Wed, 3 Jul 2019 05:24:08 +0000 (10:54 +0530)]
[CARBONDATA-3460] Fixed EOFException in CarbonScanRDD
Problem: Delete delta information was not written properly to the OutputStream due to the flag-based writing.
Solution: Always write the delete delta info; the size of the array is the deciding factor for whether to read further.
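A sketch of length-prefixed writing in this spirit (illustrative method names, not the actual Carbon code):

    import java.io.{DataInputStream, DataOutputStream}
    // Always write the delta; a zero length tells the reader to stop.
    def writeDeleteDelta(out: DataOutputStream, delta: Array[Byte]): Unit = {
      out.writeInt(delta.length)        // the size, not a flag, drives reading
      out.write(delta)
    }
    def readDeleteDelta(in: DataInputStream): Array[Byte] = {
      val len = in.readInt()
      val buf = new Array[Byte](len)
      if (len > 0) in.readFully(buf)
      buf
    }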
This closes #3316
kunal642 [Tue, 2 Jul 2019 19:23:43 +0000 (00:53 +0530)]
[CARBONDATA-3459] Fixed id based distribution for showcache command
Problem: Currently tasks are not being fired based on the executor ID because getPreferredLocations was not overridden.
Solution: override getPreferredLocations in the ShowCache and InvalidateCacheRDD to fire tasks at the appropriate location
This closes #3315
Indhumathi27 [Tue, 9 Jul 2019 08:31:07 +0000 (14:01 +0530)]
[HOTFIX] Fixed MinMax Based Pruning for Measure column in case of Legacy store
This closes #3320
Indhumathi27 [Tue, 9 Jul 2019 03:40:24 +0000 (09:10 +0530)]
[CARBONDATA-3467] Fix count(*) with filter on string column
Problem:
count(*) with filter on string column throws Unresolved Exception
Solution:
Added check for UnresolvedAlias in MVAnalyzer
This closes #3319
Indhumathi27 [Thu, 27 Jun 2019 11:39:20 +0000 (17:09 +0530)]
[CARBONDATA-3457][MV] Fix Column not found issue with Query having Cast Expression
Problem:
For Cast(exp), the alias reference is not included; hence a column-not-found exception is thrown for a column given inside a cast expression.
Solution:
An AliasMap has to be created for CAST(EXP) as well, and replaced with the subsumer alias map references.
This closes #3312
Indhumathi27 [Thu, 27 Jun 2019 12:46:04 +0000 (18:16 +0530)]
[CARBONDATA-3456] Fix DataLoading to MV table when Yarn-Application is killed
Problem:
When a data load is triggered on the datamap table, a new LoadMetaDetail with SegmentStatus InsertInProgress and segment mapping info is created, and then the yarn application is killed. On the next load, the stale LoadMetaDetail is still in the InsertInProgress state, and the main table segments mapped to that LoadMetaDetail are not considered for the next load, resulting in a data mismatch between the main table and the datamap table.
Solution:
Clean up the old invalid segment before creating a new entry for the new load.
This closes #3310
shivamasn [Tue, 2 Jul 2019 10:15:54 +0000 (15:45 +0530)]
[CARBONDATA-3458] Setting Spark Execution Id to null only for Spark version 2.2 and below.
Problem: Spark EXECUTION_ID should not be set to null in Spark 2.3.
Solution: Set EXECUTION_ID to null only in Spark version 2.2 and below.
In version 2.3, EXECUTION_ID is set by Spark code.
This closes #3313
manishnalla1994 [Tue, 2 Jul 2019 05:59:19 +0000 (11:29 +0530)]
[DOCUMENTATION] Document update for new configurations.
Added documentation for new configurations.
This closes #3314
dhatchayani [Thu, 27 Jun 2019 10:29:13 +0000 (15:59 +0530)]
[CARBONDATA-3455] Job Group ID is not displayed for the IndexServer Jobs
Job Group ID is not displayed for the IndexServer Jobs as it is not set.
This closes #3309
ajantha-bhat [Thu, 27 Jun 2019 05:01:30 +0000 (10:31 +0530)]
[CARBONDATA-3453] Fix set segment issue in adaptive execution
Cause: For set segments, the driver checks the carbon property, and
the carbon property looks for segments in the session params, which
are not set in the current thread in case of adaptive execution.
Solution: Use the session params from the RDD's session info,
where they are set correctly.
This closes #3307
manishnalla1994 [Fri, 14 Jun 2019 07:17:11 +0000 (12:47 +0530)]
[CARBONDATA-3437] Changed implementation of Map datatype
Problem: Insert into a map should override the old value when a duplicate key arrives, which it was not doing.
Solution: Changed the implementation to show the latest value in case of duplicate keys inside a map.
This closes #3288
dhatchayani [Wed, 19 Jun 2019 07:36:19 +0000 (13:06 +0530)]
[CARBONDATA-3443] Update hive guide with Read from hive
This closes #3296
kunal642 [Tue, 11 Jun 2019 14:27:23 +0000 (19:57 +0530)]
[CARBONDATA-3440] Updated alter table DDL to accept upgrade_segments as a compaction type
Updated alter table DDL to accept upgrade_segments as a compaction type.
Made legacy segment distribution round-robin based.
This closes #3277
manishnalla1994 [Mon, 24 Jun 2019 09:22:29 +0000 (14:52 +0530)]
[TESTCASE] Test Case fix for Spark 2.1
Problem: The test case was not handled separately for Spark 2.1/2.3.
This closes #3305
namanrastogi [Thu, 13 Jun 2019 14:05:42 +0000 (19:35 +0530)]
[CARBONDATA-3435] Show-Metacache behaviour change
The behaviour of SHOW METACACHE ON TABLE tblName is not the same for the Driver
and the Index Server; the same behaviour is needed for both.
Old behaviour: show all the entries for all the child tables,
even if the size is zero.
New behaviour: do not show any entry with size 0, in both the Driver and the Index Server.
This closes #3286
manishnalla1994 [Sat, 22 Jun 2019 05:57:41 +0000 (11:27 +0530)]
[CARBONDATA-3449] Synchronize the initialization of listeners in concurrent scenarios
Problem: Initialization of listeners in concurrent scenarios is not synchronized.
Solution: Changed the function to a val, due to which the synchronization is handled by Scala and the init occurs only once.
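A sketch of the pattern (illustrative object, not the actual Carbon listener code):

    // A Scala lazy val body runs at most once, and its first evaluation is
    // lock-guarded by the compiler, so concurrent callers cannot double-init.
    object ListenerInit {
      // before: def init(): Unit = register()   // executed on every call
      lazy val init: Unit = {
        // register the CarbonData event listeners here; runs exactly once
      }
    }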
This closes #3304
ajantha-bhat [Fri, 21 Jun 2019 05:05:06 +0000 (10:35 +0530)]
[CARBONDATA-3448] Fix wrong results in preaggregate query with spark adaptive execution
Problem: Wrong results in a preaggregate query with Spark adaptive execution:
Spark2TestQueryExecutor.conf.set(SQLConf.ADAPTIVE_EXECUTION_ENABLED.key, "true")
Cause: For preaggregate, the segment info is set into a threadLocal. When adaptive execution is enabled, Spark calls getInternalPartition in
another thread, where the updated segment conf is not set. Hence it does not use the updated segments.
Solution: CarbonScanRdd already has the sessionInfo; use it instead of taking the session info from the current thread.
This closes #3303
manhua [Wed, 12 Jun 2019 01:47:17 +0000 (09:47 +0800)]
[CARBONDATA-3427] Beautify DAG by showing less text
beautify DAG by showing less text
This closes #3278
kunal642 [Tue, 28 May 2019 10:00:49 +0000 (15:30 +0530)]
[CARBONDATA-3398] Added a new Column to show Cache Location
Handled show cache for index server and MV
This closes #3259
kunal642 [Tue, 4 Jun 2019 12:16:59 +0000 (17:46 +0530)]
[CARBONDATA-3412] Empty results are displayed for non-transactional tables
Problem:
Empty results are displayed for non-transactional tables.
Solution:
Added a check for non-transactional tables when generating the read committed scope.
Also fixed TableNotFoundException for embedded mode.
This closes #3255
lamber-ken [Sun, 12 May 2019 15:53:19 +0000 (23:53 +0800)]
[CARBONDATA-3380] Fix missing appName and AnalysisException bug in DirectSQLExample
Fix missing appName and AnalysisException bug in DirectSQLExample
This closes #3213
dhatchayani [Tue, 18 Jun 2019 05:09:27 +0000 (10:39 +0530)]
[CARBONDATA-3441] Aggregate queries are failing on Reading from Hive
Problem:
Aggregate queries are failing on reading from Hive, as the table_name and db_name are not set in the conf.
Solution:
Set table_name and db_name in the conf and handle the NullPointerException.
This closes #3292
Indhumathi27 [Mon, 10 Jun 2019 07:08:16 +0000 (12:38 +0530)]
[CARBONDATA-3398]Fix drop metacache on table having mv datamap
Fixed drop metacache on table having mv datamap
This closes #3274
akashrn5 [Wed, 19 Jun 2019 08:08:20 +0000 (13:38 +0530)]
[CARBONDATA-3444]Fix MV query failure when projection has cast expression with alias
Problem:
MV datamap creation fails when the projection column is a cast expression with multiple arithmetic functions on
one of the main table columns with an alias; it throws a "field does not exist" error.
Also, when the create datamap DDL has the DM provider name in capital letters, the query was not hitting the MV table.
Solution:
When building the fieldRelationMap, handling of the above case was missed; added a case to handle this scenario.
When loading the datamap catalogs, convert the provider name to lower case.
This closes #3298
kunal642 [Tue, 28 May 2019 10:00:49 +0000 (15:30 +0530)]
[CARBONDATA-3398] Handled show cache for index server and MV
Added support to show/drop metacache information from the index server.
Fixed a TableNotFoundException when dbName and tableName contain '_' in their names: while splitting on '_', the dbName was extracted wrongly. Now dbName and tableName are separated by '-' internally for show cache.
This closes #3245
Indhumathi27 [Tue, 18 Jun 2019 15:01:45 +0000 (20:31 +0530)]
[CARBONDATA-3442]Fix creating mv datamap with column name having length more than 128
Problem:
Creating an mv datamap with a column name longer than 128 characters fails.
Solution:
If the column name is longer than 128 characters, take a substring and append a counter.
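A sketch of the renaming rule (the cut-off point and counter format are illustrative):

    // Keep generated MV column names within the 128-character limit while
    // staying unique by appending a running counter to a truncated prefix.
    object MvColumnNames {
      private var counter = 0
      def shorten(name: String): String =
        if (name.length <= 128) name
        else { counter += 1; name.substring(0, 120) + "_" + counter }
    }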
This closes #3290
manishnalla1994 [Thu, 13 Jun 2019 04:41:41 +0000 (10:11 +0530)]
[CARBONDATA-3432] Added property for enable/disable Range column compaction and broadcast all splits
Added a property to enable/disable Range column compaction, with default value true.
Instead of sending all splits to all executors one by one, broadcast all splits from the driver.
This closes #3284
qiuchenjian [Fri, 14 Jun 2019 14:49:25 +0000 (22:49 +0800)]
[CARBONDATA-3247] Support to select all columns when creating MV datamap
[Cause]
The ColumnPruning rule does not generate a Project LogicalPlan when all
columns are selected, so Carbon cannot generate a SELECT node when transforming the LogicalPlan
to a ModularPlan, and therefore cannot create the mv datamap.
[Solution]
Add an executor rule to change the logical plan to support this scenario.
This closes #3072
akashrn5 [Wed, 19 Jun 2019 07:22:30 +0000 (12:52 +0530)]
[CARBONDATA-3444]Fix MV query failure when column name and table name is same in case of join scenario
Problem:
When there are columns with the same name in different tables, after sql generation the projection column will be like gen_subsumer_0.product;
it fails during logical plan generation from the rewritten query, as the column names are ambiguous.
Solution:
Update the output list when there are duplicate columns present in the query. Here we can form the qualified name for the attribute reference.
When a qualifier is defined for the column, the qualified name will be like <col_qualifier_name>_<col.name>;
if a qualifier is not defined, it will be <col_exprId_id>_<col.name>. Update all the nodes, like the groupby and select nodes,
so that ambiguity in columns is handled.
This closes #3297
manishnalla1994 [Wed, 19 Jun 2019 15:13:59 +0000 (20:43 +0530)]
[CARBONDATA-3445] Aggregate query empty list error fix.
Problem:
In an aggregate query, CountStarPlan throws a head-of-empty-list error.
When the right node of a join node has an Aggregate node with empty aggregate and group expressions, it matches the count(*) plan,
and we try to get the head of the aggregate expressions, assuming only one aggregate expression is present.
This happens only in Spark 2.3, so added an empty-list check: if the aggregate expressions are empty, it is not treated as a count(*) plan.
This closes #3300
ajantha-bhat [Tue, 18 Jun 2019 13:01:05 +0000 (18:31 +0530)]
[CARBONDATA-3365] SDK Arrow integration document update
SDK Arrow integration document update
This closes #3293
manhua [Wed, 19 Jun 2019 03:15:30 +0000 (11:15 +0800)]
[CARBONDATA-3373] Fixed measure column filter perf
When the sql has an IN clause with numbers and spark.sql.codegen.wholeStage is false, the query is slow.
The reason is that the carbon scan row-level filter's time complexity is O(n^2); we can replace the list with a hashset to improve query performance.
SQL example: select * from xx where field in (1,2,3,4,5,6)
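The gist of the change, sketched with illustrative names:

    // List.contains is O(n) per row, O(n^2) over the scan; a Set probe is O(1).
    val inValues = List(1, 2, 3, 4, 5, 6)
    val inSet = inValues.toSet                    // built once per query
    def rowMatches(fieldValue: Int): Boolean = inSet.contains(fieldValue)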
This closes #3295
Indhumathi27 [Fri, 21 Jun 2019 03:44:58 +0000 (09:14 +0530)]
[CARBONDATA-3398]Block show/drop metacache directly on child table
Problem:
show/drop metacache directly on an mv child table is not blocked.
Solution:
Blocked show/drop metacache directly on mv child tables.
This closes #3302
kumarvishal09 [Wed, 12 Jun 2019 15:22:00 +0000 (20:52 +0530)]
[CARBONDATA-3447]Index server performance improvement
Problem:
When the number of splits is high, index server performance is slow
compared to the old flow (driver caching), because more data is transferred
over the network, causing a performance bottleneck.
Solution:
1. If the data to transfer is small, it can be sent through the network; when it
grows, write it to a file and send only the file name, and the main driver
will read the file and construct the input splits.
2. Use snappy to compress the data, so the data transferred through the network/written
to file is smaller, and IO time won't impact performance (see the sketch after this list).
3. In the main driver, pruning is done in multiple threads; the same is added for
the index executor, as the index executor will now do the pruning.
4. In case of block cache, there is no need to send the BlockletDetailInfo object, as
its size is large and it can be constructed in the executor from the file footer.
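A minimal sketch of point 2, assuming the snappy-java API (org.xerial.snappy):

    import org.xerial.snappy.Snappy
    // Compress the serialized splits before shipping them over the network or
    // writing them to a file; decompress on the receiving side.
    val serialized = "serialized input splits".getBytes("UTF-8")
    val compressed = Snappy.compress(serialized)
    val restored   = Snappy.uncompress(compressed)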
This closes #3281
Co-authored-by: kunal642 <kunalkapoor642@gmail.com>
QiangCai [Wed, 10 Apr 2019 03:39:52 +0000 (11:39 +0800)]
[DOC] Improve java doc for DataSkewRangePartitioner
Improve java doc for DataSkewRangePartitioner
This closes #3175
lamber-ken [Mon, 10 Jun 2019 07:34:51 +0000 (15:34 +0800)]
[CARBONDATA-3422] Fix missing complex dimensions when preparing the data from the raw object
Fix missing complex dimensions when preparing the data from the raw object.
This closes #3270
Indhumathi27 [Tue, 4 Jun 2019 12:38:20 +0000 (18:08 +0530)]
[CARBONDATA-3425] Added documentation for mv
This closes #3275
akashrn5 [Thu, 13 Jun 2019 08:12:19 +0000 (13:42 +0530)]
[CARBONDATA-3433]Fix MV issues related to duplicate columns, limit and constant columns
Problem:
MV has the below issues:
when there are duplicate columns in the select query, MV creation fails, though the select is a valid query;
when a constant column is used in the ctas for datamap creation, it fails;
when limit is used in the ctas for datamap creation, it fails.
Solution:
Since duplicate columns in a query are valid, MV should support them; when creating columns, take distinct columns.
Handle getting the field relation map when there is a constant column in the query.
Block MV creation for a limit ctas query, as it is not a valid case for an MV datamap.
This closes #3285
ajantha-bhat [Wed, 29 May 2019 10:16:34 +0000 (15:46 +0530)]
[CARBONDATA-3436] update insert into rule as per spark
Problem: Carbon is not strictly following hive syntax for single-row insert into.
Cause: For insert into, Carbon has its own rule, CarbonAnalysisRule[CarbonPreInsertionCasts].
In CarbonPreInsertionCasts, the data type cast is missed. Hence Carbon was not following hive syntax.
Solution: Add a data type cast rule for insert into.
This closes #3244
Co-authored-by: shivamasn <shivamasn17@gmail.com>
lamber-ken [Mon, 6 May 2019 06:51:39 +0000 (14:51 +0800)]
[HOTFIX] Fix erroneous describe of decimal
Fix erroneous describe of decimal.
This closes #3204
Indhumathi27 [Wed, 12 Jun 2019 11:07:31 +0000 (16:37 +0530)]
[CARBONDATA-3434] Fix Data Mismatch between MainTable and MV DataMap table during compaction
Fix Data Mismatch between MainTable and DataMap table during compaction
Problem:
The checkIfSegmentsToBeReloaded method of DataMapProvider was ignoring one main table segment to be loaded, considering it as already loaded.
Solution:
Get all segments merged into the given segment and check whether it contains the full list of segments stored in SegmentMapInfo. If true, there is no need to load the segment; only update the segment mapping.
Also block the delete operation on the datamap table.
This closes #3282
dhatchayani [Thu, 6 Jun 2019 10:34:05 +0000 (16:04 +0530)]
[CARBONDATA-3415] Merge index is not working for partition table. Merge index for partition table is taking significantly longer time than normal table.
Problem:
(1) Merge index is not working for partition tables.
(2) Merge index for a partition table takes significantly longer than for a normal carbon table.
Root cause:
(1) The merge index event listener was moved to preStatusUpdateEvent in #3221, but preStatusUpdateEvent is not triggered for partition tables. The test case to validate merge index on a partition table was also wrong, so this was not caught by the test builders.
(2) Currently, the merge index job triggers one task per segment. But a partition table has partitions within a segment, and the merge index is per partition, so per segment it has to iterate and merge the index files inside the partitions; because of this, the time is somewhat longer when the number of partitions is high. Number of tasks = number of segments.
Solution:
(1) Correct the test case and trigger the merge index listener for partition tables.
(2) Parallelize the tasks launched across the partitions. Number of tasks = number of partitions in a segment.
This closes #3262
ravipesala [Tue, 21 May 2019 12:10:56 +0000 (17:40 +0530)]
[HOTFIX] Fixed count(*) issue when MV is created with simple projection
Problem:
When an MV is created with a simple projection and select count(*) is fired
on the main table, it wrongly uses the MV table with wrong projections.
Solution:
A simple-projection MV should not be selected when count(*) is fired.
This closes #3229
xubo245 [Mon, 10 Jun 2019 08:32:03 +0000 (16:32 +0800)]
[CARBONDATA-3423] Validate dictionary for binary data type
Added validation that dictionary include does not support the binary data type.
This closes #3271
akashrn5 [Wed, 5 Jun 2019 03:48:50 +0000 (09:18 +0530)]
[CARBONDATA-3309]Fix MV modular plan generation failure for spark 2.1
Problem:
In Spark 2.1, we have MetastoreRelation instead of HiveTableRelation
as in current Spark, and this case is not handled.
Solution:
When it is a MetastoreRelation, get the catalogTable and use its info
to form the ModularRelation and the modular plan. Reflection is used to fix this.
This closes #3257
akashrn5 [Thu, 6 Jun 2019 10:45:49 +0000 (16:15 +0530)]
[CARBONDATA-3416]Correct the preparing of carbon analyzer with custom rules with spark analyzer
Problem:
When a new analyzer rule is added in Spark, it is not reflected in Carbon.
Carbon prepares the session state builder by extending the
hive session state builder and creates a new analyzer by overriding
all the rules added by Spark; so when a new rule is added in Spark,
it is not reflected in Carbon, as we have overridden the complete analyzer.
Solution:
While building the new analyzer on the Carbon side, it is better to get all the
rules from the super class and then add the Carbon rules to the analyzer,
so that when new rules are added on the Spark side, since we take super.rules,
we get all the updated rules from Spark before adding the Carbon custom rules.
This closes #3261
xubo245 [Mon, 29 Apr 2019 03:00:14 +0000 (11:00 +0800)]
[CARBONDATA-3356] Support decimal for json schema and provide a better exception
for users to solve the problem when the CarbonData DataSource reads SDK files with varchar
Support decimal in the json schema and refactor exceptions.
This closes #3181
ajantha-bhat [Mon, 10 Jun 2019 10:22:31 +0000 (15:52 +0530)]
[CARBONDATA-3426] Fix load performance by fixing task distribution issue
Problem: Consider a 3-node cluster (host names a, b, c with IP1, IP2, IP3 as IP addresses).
To launch a load task, the host name is required from NewCarbonDataLoadRDD in getPreferredLocations().
But if the driver is 'a' (IP1),
the result is IP1, b, c instead of a, b, c. Hence the task was not launched on the one executor that has the same IP as the driver.
Solution: Revert the change in getLocalhostIPs, as it is not used in any other flow.
This closes #3276
namanrastogi [Wed, 12 Jun 2019 14:50:18 +0000 (20:20 +0530)]
[CARBONDATA-3429] Updated error for CarbonCli when path is wrong
Problem: Executing CarbonCli with an invalid path outputs "unsorted"
even for the invalid path, which is wrong.
Solution: It should report that the segment path is invalid.
This closes #3280
xubo245 [Wed, 12 Jun 2019 07:13:46 +0000 (15:13 +0800)]
[CARBONDATA-3424] Carbon should throw a proper exception when a select query
uses the average function on a substring of a binary column
Carbon should throw a proper exception when a select query uses the
average function on a substring of a binary column.
This closes #3272
ajantha-bhat [Tue, 4 Jun 2019 13:30:22 +0000 (19:00 +0530)]
[CARBONDATA-3413] Fix io.netty out of direct memory exception in arrow integration
Problem: io.netty out-of-direct-memory exception in the arrow integration.
Cause: In ArrowConverter, the allocator is not closed.
Solution: Close the allocator in ArrowConverter.
Also handle the problems in the test utility API.
This closes #3256
xubo245 [Fri, 31 May 2019 12:33:25 +0000 (20:33 +0800)]
[CARBONDATA-3410] Add UDF, Hex/Base64 SQL functions for binary
Add UDF, Hex/Base64 SQL functions for binary
This closes #3253
jack86596 [Mon, 10 Jun 2019 01:53:49 +0000 (09:53 +0800)]
[CARBONDATA-3421] Fix create table without column with properties failed, but throw incorrect exception
Problem:
Create table without columns but with properties fails, but throws an incorrect exception: "Invalid table properties". The exception should be "create table without column".
Solution:
In CarbonSparkSqlParserUtil.createCarbonTable, we do some validations such as checking tblproperties and whether columns are provided for an external table.
We can add one more validation here to check whether columns are provided for a normal table; if not, throw MalformedCarbonCommandException.
This closes #3268
manishnalla1994 [Fri, 7 Jun 2019 05:43:16 +0000 (11:13 +0530)]
[CARBONDATA-3419] Desc Formatted not showing Range Column
Desc Formatted not showing Range Column
This closes #3265
manishnalla1994 [Thu, 6 Jun 2019 12:20:46 +0000 (17:50 +0530)]
[CARBONDATA-3417] Changed the number of cores for load and compaction in Range Column
Changed the number of cores for load and compaction in Range Column
This closes #3263
xubo245 [Fri, 31 May 2019 09:14:40 +0000 (17:14 +0800)]
[CARBONDATA-3408] CarbonSession partition support binary data type
CarbonSession partition support binary data type
This closes #3251
Indhumathi27 [Thu, 6 Jun 2019 13:12:12 +0000 (18:42 +0530)]
[CARBONDATA-3418] Inherit the Column Compressor Property from the parent table to its child tables
Inherited the column compressor property from the parent table to its child tables.
Fixed the describe formatted command to show inverted_index columns even if sort_scope is 'no_sort'.
Fixed inheriting sort_scope to child tables when sort_columns is provided but sort_scope is not.
Alter set sort_columns="" when sort_scope is not no_sort is not supported; the same
behavior is added for create table with sort_columns="" when sort_scope is not no_sort.
This closes #3264
ajantha-bhat [Thu, 6 Jun 2019 07:28:28 +0000 (12:58 +0530)]
[HOTFIX] Fix compatibility issue with page size
Problem: ArrayOutOfBound exception when an old store is read with the latest code.
Cause: #3239 partially fixed the problem. After that change,
for an old store, the length is written as zero, so in BlockletInfo
numberOfRowsPerPage will be an array of zero elements. Hence the ArrayOutOfBound exception.
Solution: Fill numberOfRowsPerPage only when the length is non-zero;
when the length is zero, it will be filled from setNumberOfRowsPerPage in
BlockletDataRefNode, as numberOfRowsPerPage is null.
This closes #3260
ajantha-bhat [Tue, 4 Jun 2019 05:47:20 +0000 (11:17 +0530)]
[CARBONDATA-3411] [CARBONDATA-3414] Fix clear datamaps logs an exception in SDK
Problem: In the SDK, when datamaps are cleared, the below exception is logged:
java.io.IOException: File does not exist: ../carbondata/store/sdk/testWriteFiles/
771604793030370/Metadata/schema
Cause: CarbonTable is required only for launching the job, and in the SDK there is no need to launch a job; so there is no need to build a carbon table.
Solution: Build the carbon table only when a job needs to be launched.
Problem [CARBONDATA-3411]: When insert into a partition table fails, the exception doesn't print the reason.
Cause: The exception was caught, but the error message was not taken from that exception.
Solution: Throw the exception directly.
This closes #3254
qiuchenjian [Sun, 20 Jan 2019 13:33:20 +0000 (21:33 +0800)]
[CARBONDATA-3258] Add more test cases for mv datamap
This closes #3084
Indhumathi27 [Fri, 31 May 2019 10:23:01 +0000 (15:53 +0530)]
[CARBONDATA-3409] Fix Concurrent dataloading Issue with mv
Problem:
While performing concurrent data loading to an MV datamap, if any of the loads was not able to get the TableStatusLock, then, because newLoadName and segmentMap were empty, it was doing a full rebuild.
Solution:
If the load was not able to take the table status lock, disable the datamap and return.
This closes #3252
akashrn5 [Thu, 30 May 2019 08:45:44 +0000 (14:15 +0530)]
[CARBONDATA-3407]Fix distinct, count, Sum query failure when MV is created on single projection column
Problem:
When an MV datamap is created on a single column as a simple projection, sum, distinct, and count queries fail during sql conversion of the modular plan. Basically, there is no case to handle the modular plan when we have a group-by node without alias info that has a rewritten select child node.
Solution:
The sql generation cases should handle this case too; otherwise the rewritten query would be wrong, as the alias would be present inside the count or aggregate function.
The rewritten query should actually be like:
SELECT count(limit_fail_dm1_table.limit_fail_designation) AS count(designation) FROM default.limit_fail_dm1_table
This closes #3249
KanakaKumar [Wed, 29 May 2019 07:09:06 +0000 (12:39 +0530)]
[CARBONDATA-3404] Support CarbonFile API through FileTypeInterface to use custom FileSystem
Currently CarbonData supports a small set of file systems, like the HDFS, S3, and VIEWFS schemes.
If the user configures a table path from a file system other than the supported ones, FileFactory takes CarbonLocalFile as the default, which causes errors.
This PR proposes an API for the user to extend CarbonFile, overriding the required methods from AbstractCarbonFile if specific handling is required for operations like renameForce.
This closes #3246
QiangCai [Wed, 15 May 2019 08:46:20 +0000 (16:46 +0800)]
[CARBONDATA-3350] Enhance custom compaction to resort old single segment by new sort_columns
This closes #3202
ajantha-bhat [Wed, 29 May 2019 12:14:11 +0000 (17:44 +0530)]
[CARBONDATA-3405] Fix getSplits() should clear the cache in SDK
Problem: when getsplits is called back to back once with blocklet
and once with block cache, block cache is not set.
Cause: cache key was dbname_tableName, but table_name was always hardcoded to null.
Solution: set the table name in cache key, clear cache after
getting splits in the getsplits()
This closes #3247
akashrn5 [Tue, 28 May 2019 06:25:13 +0000 (11:55 +0530)]
[CARBONDATA-3403]Fix MV is not working for like and filter AND and OR queries
Problem:
The MV table is not hit during queries for like filters and filter AND and OR queries.
When we have like or filter queries, the queries have literals which are case sensitive for fetching the data.
But during MV modular plan generation, we register the schema for the datamap, where we convert the complete datamap query to lower case, which converts even the literals.
So after modular plan generation of the user query, during the matching phase of the modular plans of the datamap and the user query, the semantic equals fails for literals, that is, the attribute reference type.
Solution: Do not convert the query to lower case when registering the schema, that is, when adding the preagg function to the query; this handles MV.
For preaggregate, instead of converting the complete query to lower case, convert to lower case during ColumnTableRelation generation and createField for preaggregate generation; this handles preaggregate.
This closes #3242
kunal642 [Mon, 27 May 2019 07:11:54 +0000 (12:41 +0530)]
[CARBONDATA-3399] Implement executor id based distribution for indexserver
This closes #3237
dhatchayani [Thu, 30 May 2019 12:25:16 +0000 (17:55 +0530)]
[CARBONDATA-3406] Support Binary, Boolean,Varchar, Complex data types read and Dictionary columns read
1. Support Read for Binary, Boolean, Varchar, Complex data types.
2. Support Read for Dictionary columns.
This closes #3250
Indhumathi27 [Thu, 30 May 2019 05:42:38 +0000 (11:12 +0530)]
[CARBONDATA-3402] Added Inherit Inverted Index Property from Parent Table for Preagg & MV
1. Inherit the inverted index property from the parent table for preagg & MV datamaps.
2. When both preaggregate and mv datamaps are present, skip applying the mv plan
while loading data to the preaggregate table.
This closes #3248
QiangCai [Wed, 15 May 2019 08:16:54 +0000 (16:16 +0800)]
[CARBONDATA-3349] support carboncli show sort_columns for a segment folder
carboncli supports showing sort_columns for a segment folder.
carboncli -cmd sort_columns -p <segment_folder>
This closes #3183
xubo245 [Tue, 23 Apr 2019 07:45:25 +0000 (15:45 +0800)]
[CARBONDATA-3336] Support configurable decode for loading binary data, supporting base64 and hex decode
Support configurable decode for loading binary data, with base64 and hex decode.
1. Support configurable decode for loading.
2. Test datamaps: mv, preaggregate, timeseries, bloomfilter, lucene.
3. Test datamaps with configurable decode.
By default no decoder is used for loading binary data; this PR supports base64 and hex decoders.
This closes #3188
RamKrishnan77 [Mon, 13 May 2019 15:07:46 +0000 (20:37 +0530)]
[CARBONDATA-3382] Fixed compressor type display in desc formatted
Problem :
Describe formatted table does not display the correct compression
type when it is configured in carbon.properties; instead it always
shows the default compressor type.
Solution :
Get the compressor type from the instance of CarbonProperties.
This closes #3214
manishnalla1994 [Mon, 27 May 2019 09:22:03 +0000 (14:52 +0530)]
[HOTFIX] Fixed Compatibilty Issue
Problem: An EOF exception was thrown for an old store while reading the blocklet info.
Solution: Fixed by writing the number of rows per page in case of an old store.
This closes #3239
BJangir [Tue, 7 May 2019 13:36:36 +0000 (19:06 +0530)]
[CARBONDATA-3378]Display original query in Indexserver Job
When any query is fired in the main JDBC server, there is no mapping
of it in the index server.
It is difficult to find which job in the index server belongs to which
query, especially for concurrent queries.
This PR displays the query in the index server as well, along with the execution id.
This closes #3208
akashrn5 [Mon, 27 May 2019 06:54:33 +0000 (12:24 +0530)]
[CARBONDATA-3394]Clean files command optimization
Problem
Clean files takes a lot of time to finish, even when there are no segments to delete.
Tested with 5000 segments, clean files takes 15 minutes to finish.
Root cause and solution
Lots of table status read operations were happening during clean files,
and lots of listing operations were happening even though they are not required.
Read and list operations are reduced to cut the overall time for clean files.
After the changes, for the same store, it takes 35 seconds on the same 3-node cluster.
This closes #3227
Indhumathi27 [Mon, 27 May 2019 13:14:33 +0000 (18:44 +0530)]
[CARBONDATA-3402] Fix block complex data type and validate dmproperties for MV
This PR includes:
Blocked complex data types with mv.
Fixed the to_date function while creating an mv datamap.
Added inheriting the global dictionary from the parent table to the child table for preaggregate & mv.
Validated DM properties for MV.
This closes #3241
dhatchayani [Tue, 28 May 2019 13:59:46 +0000 (19:29 +0530)]
[CARBONDATA-3393] Merge Index Job Failure should not trigger the merge index job again. Exception should be propagated to the caller.
Problem:
If the merge index job fails, the same job is triggered again.
Solution:
The merge index job exception has to be propagated to the caller; it should not trigger the same job again.
Changes:
(1) Merge index job failure will not be propagated to the caller and will only be logged.
(2) Implement a new method to write the SegmentFile based on the current load timestamp. This helps in case of merge index failures and for writing the merge index for an old store.
This closes #3226
manishnalla1994 [Mon, 27 May 2019 06:39:04 +0000 (12:09 +0530)]
[DOCUMENTATION] Document change for GLOBAL_SORT_PARTITIONS
Documentation change done for Global Sort Partitions during Range Column DataLoad/Compaction.
This closes #3234
manishnalla1994 [Mon, 27 May 2019 06:11:10 +0000 (11:41 +0530)]
[CARBONDATA-3396] Range Compaction Data Mismatch Fix
Problem: When we compact the data a second time and the ranges made the first time have data in more than one file/blocklet, then while compacting the second time, if the first blocklet does not contain any record, the other files are also skipped. Also, Global Sort and Local Sort with a Range Column were taking different times for the same data load and compaction, as during the write step only 1 core is given to Global Sort.
Solution: For the first issue, we read all the blocklets of a given range and break only when the batch size is full. For the second issue, in the case of a range column both sort scopes now take the same number of cores and behave similarly.
Also changed the number of tasks launched during compaction, now based on the number of tasks during load.
This closes #3233
BJangir [Mon, 27 May 2019 09:25:39 +0000 (14:55 +0530)]
[CARBONDATA-3397]Remove SparkUnknown Expression to Index Server
Problem
If a query has a UDF that is registered in the main JDBC server, the UDF function will not be available in the index server, and the query will fail in the index server (with NoClassDefFoundError).
Solution 1
UDFs are SparkUnknownFilters (RowLevelFilterExecuterImpl), so remove the SparkUnknown expression, because for pruning we select all blocks anyway: org.apache.carbondata.core.scan.filter.executer.RowLevelFilterExecuterImpl#isScanRequired.
Solution 2
Supply all the UDF functions and their related lambda expressions to the index server as well. But this has the below issues:
a. Spark FunctionRegistry is not writable.
b. Sending all functions from the main server to the index server would be costly (in size), and there is no way to distinguish implicit functions from explicit user-created functions.
So Solution 1 is adopted.
This closes #3238
BJangir [Mon, 27 May 2019 09:56:21 +0000 (15:26 +0530)]
[CARBONDATA-3400] Support IndexSever for Spark-Shell in secure Mode(kerberos)
Problem
In spark-shell or spark-submit mode, the application user and the IndexServer user are different.
The application user is based on the kinit user or the spark.yarn.principal user, whereas the IndexServer user is based on spark.carbon.indexserver.principal. It is possible that the two differ, as the IndexServer should have its own authentication principal and should not depend on the application principal, so that any application's query (Thriftserver, spark-shell, spark-sql, spark-submit) can be served by the IndexServer.
Solution
Authenticate the IndexServer with its own principal and keytab.
The keytab is required so that long-running applications (client and IndexServer) are not impacted by token expiry.
Note: spark-defaults.conf of the Thriftserver (beeline), spark-submit, and spark-sql should have both spark.carbon.indexserver.principal and spark.carbon.indexserver.keytab.
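A hypothetical spark-defaults.conf fragment (the principal and keytab values are placeholders, not defaults):

    spark.carbon.indexserver.principal  indexserver/_HOST@EXAMPLE.COM
    spark.carbon.indexserver.keytab     /etc/security/keytabs/indexserver.keytab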
This closes #3240
dhatchayani [Mon, 29 Apr 2019 13:22:57 +0000 (18:52 +0530)]
[CARBONDATA-3364] Support Read from Hive. Queries are giving empty results from hive.
This closes #3192
ajantha-bhat [Fri, 24 May 2019 14:20:57 +0000 (19:50 +0530)]
[CARBONDATA-3395] Fix Exception when concurrent readers built with same split object
Problem: An exception occurs when concurrent readers are built with the same split object.
Cause: In CarbonInputSplit, BlockletDetailInfo and BlockletInfo are made lazy, so BlockletInfo is prepared during reader building.
When two readers work on the same split object, the state of this object is changed, leading to an array-out-of-bounds issue.
Solution: a) Synchronize BlockletInfo creation.
b) Load BlockletDetailInfo before passing it to the reader, inside the getSplit() API itself.
c) In the failure case, get the proper identifier to clean up the datamaps.
d) For build_with_splits, handle default projection filling if not configured.
This closes #3232
akashrn5 [Mon, 27 May 2019 06:58:00 +0000 (12:28 +0530)]
[HOTFIX]Fix select * failure when MV datamap is enabled
Problem:
When select * is executed with limit, the ColumnPruning rule removes the project node from the plan during optimization, so the child of the limit node is a relation, and it fails in modular plan generation.
Solution:
If the child of Limit is a relation, then make the select node and build the modular plan.
This closes #3235
Indhumathi27 [Mon, 13 May 2019 05:38:31 +0000 (11:08 +0530)]
[CARBONDATA-3387] Support Partition with MV datamap & Show DataMap Status
This PR includes,
Support Partition with MV Datamap [datamap with a single parent table].
Show DataMap status and parent-table-to-datamap-table segment sync information with the SHOW DATAMAP ddl.
Optimization for incremental data load:
In the below scenario we can avoid reloading the MV.
Main table segments: 0,1,2
MV: 0 => 0,1,2
After main table compaction, it would reload the 0.1 segment of the main table into the MV; this is avoided by changing the mapping to {0,1,2} => {0.1}.
This closes #3216
kunal642 [Wed, 15 May 2019 11:10:28 +0000 (16:40 +0530)]
[CARBONDATA-3392] Make LRU mandatory for index server
Background:
Currently LRU is optional for the user to configure, but this raises some concerns in the case of the index server, because invalid segments have to be constantly removed from the cache in update/delete/compaction scenarios.
If the clear segment job fails, the main job would not fail, but there has to be a mechanism to prevent that segment from staying in the cache forever.
To prevent the above-mentioned scenario, the LRU cache size for the executor is a mandatory property for the index server application.
This closes #3222