[CARBONDATA-4317] Fix TPCDS performance issues carbon3.0
author Indhumathi27 <indhumathim27@gmail.com>
Tue, 7 Dec 2021 15:02:05 +0000 (20:32 +0530)
committer kunal642 <kunalkapoor642@gmail.com>
Wed, 22 Dec 2021 14:27:35 +0000 (19:57 +0530)
commit 0f1d2a45e5f614fd123bd734ab37d7e453c21344
tree efcff93d34fe2ff69f54bfd6f7a208b0c71148cf
parent d629dc0b894a64bfbef762736775a182e40827fe
[CARBONDATA-4317] Fix TPCDS performance issues

Why is this PR needed?
The following issues degraded TPCDS query performance:
1. If a dynamic filter is not present in the partitionFilters set, that filter is skipped and not pushed down to Spark.
2. In some cases, nodes such as Exchange/Shuffle are not reused because the CarbonDataSourceScan plans do not match.
3. Accessing the metadata of a canonicalized plan throws an NPE.
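Issue 2 depends on how Spark decides that two subplans are interchangeable: roughly, it compares their canonicalized forms (`sameResult`), in which session-specific details such as expression IDs are normalized away. The toy model below sketches why two logically identical scans only match after that normalization. `Attr`, `Expr`, `ScanNode`, and its `canonicalized`/`sameResult` members are hypothetical simplifications for illustration, not Spark's or CarbonData's actual classes.

```scala
// Hypothetical, simplified model of plan canonicalization; not Spark's real classes.
final case class Attr(name: String, exprId: Long)

sealed trait Expr
final case class Ref(attr: Attr) extends Expr
final case class Lit(value: Int) extends Expr
final case class GreaterThan(left: Expr, right: Expr) extends Expr

final case class ScanNode(table: String, output: Seq[Attr], filters: Seq[Expr]) {

  // Erase attribute names and replace each exprId with the attribute's ordinal in
  // `output`, so scans that differ only in session-specific IDs become equal.
  def canonicalized: ScanNode = {
    val ordinal: Map[Long, Int] = output.map(_.exprId).zipWithIndex.toMap
    def norm(a: Attr): Attr = Attr("none", ordinal.getOrElse(a.exprId, -1).toLong)
    def normExpr(e: Expr): Expr = e match {
      case Ref(a)            => Ref(norm(a))
      case GreaterThan(l, r) => GreaterThan(normExpr(l), normExpr(r))
      case lit: Lit          => lit
    }
    ScanNode(table, output.map(norm), filters.map(normExpr))
  }

  // Two scans are interchangeable (e.g. for Exchange reuse) if their canonical forms match.
  def sameResult(other: ScanNode): Boolean = canonicalized == other.canonicalized
}

object ReuseDemo {
  def main(args: Array[String]): Unit = {
    val a1 = Attr("ss_item_sk", 10L)
    val a2 = Attr("ss_item_sk", 42L)
    val scan1 = ScanNode("store_sales", Seq(a1), Seq(GreaterThan(Ref(a1), Lit(5))))
    val scan2 = ScanNode("store_sales", Seq(a2), Seq(GreaterThan(Ref(a2), Lit(5))))
    println(scan1 == scan2)          // false: the raw plans differ only in exprIds
    println(scan1.sameResult(scan2)) // true: the canonical forms are equal
  }
}
```

Comparing the raw plans fails, which mirrors the missed Exchange reuse; comparing canonical forms succeeds, while a scan with a different filter literal would still (correctly) not match.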

What changes were proposed in this PR?
1. Check whether each dynamic filter is present in the partitionFilters set. If it is not, push the filter down.
2. Match plans by canonicalizing them and normalising their expressions.
3. Move the variables used in metadata() so that comparing plans no longer throws an NPE.
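Change 1 can be pictured as a difference in which filters reach the scan. The sketch below contrasts the behaviour described as the bug (a dynamic filter absent from partitionFilters is dropped) with the fixed behaviour (it is pushed down). `Filter`, `StaticFilter`, `DynamicPruningFilter`, and `PushdownDemo` are hypothetical names for illustration; this is a minimal model under those assumptions and does not reproduce the code in the files changed by this commit.

```scala
// Hypothetical, simplified model of filter pushdown; names are illustrative only.
sealed trait Filter
final case class StaticFilter(sql: String) extends Filter
final case class DynamicPruningFilter(sql: String) extends Filter

object PushdownDemo {

  // Before the fix: dynamic filters were only honoured through partitionFilters,
  // so a dynamic filter missing from that set was silently skipped.
  def pushedDownBefore(filters: Seq[Filter], partitionFilters: Set[Filter]): Seq[Filter] =
    filters.filter {
      case _: DynamicPruningFilter => false
      case f                       => !partitionFilters.contains(f)
    }

  // After the fix: every filter not already in partitionFilters, dynamic or static,
  // is pushed down to the scan.
  def pushedDownAfter(filters: Seq[Filter], partitionFilters: Set[Filter]): Seq[Filter] =
    filters.filterNot(partitionFilters.contains)

  def main(args: Array[String]): Unit = {
    val dpp  = DynamicPruningFilter("ss_sold_date_sk IN (dynamic subquery)")
    val part = StaticFilter("part_col = 1")
    val data = StaticFilter("ss_quantity > 5")
    val filters: Seq[Filter] = Seq(dpp, part, data)
    val partitionFilters: Set[Filter] = Set(part)
    println(pushedDownBefore(filters, partitionFilters)) // only the static data filter
    println(pushedDownAfter(filters, partitionFilters))  // the dynamic filter and the data filter
  }
}
```

In this model, a dynamic filter that is already in partitionFilters is still excluded from the pushed-down set, since it is handled by partition pruning instead.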

This closes #4241
integration/spark/src/main/scala/org/apache/spark/sql/execution/strategy/CarbonDataSourceScan.scala
integration/spark/src/main/scala/org/apache/spark/sql/execution/strategy/CarbonSourceStrategy.scala
integration/spark/src/main/spark2.3/org/apache/spark/sql/CarbonToSparkAdapter.scala
integration/spark/src/main/spark2.4/org/apache/spark/sql/CarbonToSparkAdapter.scala
integration/spark/src/main/spark3.1/org/apache/spark/sql/CarbonToSparkAdapter.scala