incubator-doris-spark-connector.git
3 days ago fix unit test compile error (#32) master
smallhibiscus [Fri, 13 May 2022 03:08:35 +0000 (11:08 +0800)] 
fix unit test compile error (#32)

fix unit test compile error

3 days ago [fix] Fix doris.read.field configuration does not take effect. (#20)
smallhibiscus [Thu, 12 May 2022 08:47:23 +0000 (16:47 +0800)] 
[fix] Fix doris.read.field configuration does not take effect. (#20)

* Fix doris.read.field configuration does not take effect.

3 days ago add quick start steps (#31)
LOVEGISER [Thu, 12 May 2022 08:46:26 +0000 (16:46 +0800)] 
add quick start steps (#31)

add quick start steps

8 days ago [feature] Support Spark3.2 compilation (#24)
cxzl25 [Sat, 7 May 2022 09:22:27 +0000 (17:22 +0800)] 
[feature] Support Spark3.2 compilation (#24)

* support spark3.2

9 days ago optimize log, change retry log from warn to debug (#26)
qiye [Fri, 6 May 2022 12:36:35 +0000 (20:36 +0800)] 
optimize log, change retry log from warn to debug (#26)

2 weeks ago Remove redundant variable and functions which could lead to compile fail (#19)
lide [Mon, 25 Apr 2022 10:12:29 +0000 (18:12 +0800)] 
Remove redundant variable and functions which could lead to compile fail (#19)

4 weeks ago [improvement] stream load data is converted to json format (#15)
smallhibiscus [Thu, 14 Apr 2022 01:48:46 +0000 (09:48 +0800)] 
[improvement] stream load data is converted to json format (#15)

* [improvement] stream load data is converted to json format

* Add unit test and add keysType property to Schema.java

* modify doris read kafka only with jsonobject format

* format code

Co-authored-by: smallhibiscus <844981280>

5 weeks ago git commit -m '[fix] Deserialize failed caused by the new field:keysType' (#17)
Kikyou1997 [Mon, 11 Apr 2022 02:39:25 +0000 (10:39 +0800)] 
git commit -m '[fix] Deserialize failed caused by the new field:keysType' (#17)

Deserialize failed caused by the new field:keysType

2 months ago [chore] fix name bug in build.sh (#13)
Mingyu Chen [Thu, 10 Mar 2022 05:04:48 +0000 (13:04 +0800)] 
[chore] fix name bug in build.sh (#13)

fix name bug in build.sh

2 months ago Docs: Change http to https (#11)
Cheng Pan [Sun, 6 Mar 2022 14:00:44 +0000 (22:00 +0800)] 
Docs: Change http to https (#11)

Change http to https (#11)

2 months ago [chore] Change GAV (#12)
Mingyu Chen [Sun, 6 Mar 2022 03:10:03 +0000 (11:10 +0800)] 
[chore] Change GAV (#12)

1. modify GAV to "spark-doris-connector-{spark.minor.version}_${scala_version}"

2 months ago [chore] add release plugin (#8)
Mingyu Chen [Tue, 1 Mar 2022 02:41:21 +0000 (10:41 +0800)] 
[chore] add release plugin (#8)

2 months ago [chore] modify some script for building connector (#7)
Mingyu Chen [Tue, 1 Mar 2022 02:00:20 +0000 (10:00 +0800)] 
[chore] modify some script for building connector (#7)

2 months ago [Bug-Fix][Spark-Doris-Connector] resolve the problem of writing Chinese garbled...
jiafeng.zhang [Sun, 13 Feb 2022 08:50:29 +0000 (16:50 +0800)] 
[Bug-Fix][Spark-Doris-Connector] resolve the problem of writing Chinese garbled characters (#6)

Resolve the problem of Chinese characters being garbled when written

3 months ago [fix](spark connector) fix spark connector unsupport STRING type. (#2)
haocean [Thu, 10 Feb 2022 14:43:09 +0000 (22:43 +0800)] 
[fix](spark connector) fix spark connector unsupport STRING type. (#2)

fix spark connector unsupported STRING type.

3 months ago [init] do some init work
morningman [Fri, 11 Feb 2022 15:26:38 +0000 (23:26 +0800)] 
[init] do some init work

3 months ago [init] init commit
morningman [Fri, 11 Feb 2022 15:22:38 +0000 (23:22 +0800)] 
[init] init commit

Move spark-doris-connector from incubator-doris@df2c756

3 months ago [fix](httpv2) make http v2 and v1 interface compatible (#7848)
jiafeng.zhang [Mon, 31 Jan 2022 14:12:34 +0000 (22:12 +0800)] 
[fix](httpv2) make http v2 and v1 interface compatible (#7848)

http v2 TableSchemaAction adds the return value of aggregation_type,
and modifies the corresponding code of Flink/Spark Connector

3 months ago [chore][fix][doc](fe-plugin)(mysqldump) fix build auditlog plugin error (#7804)
Zhengguo Yang [Wed, 26 Jan 2022 01:11:23 +0000 (09:11 +0800)] 
[chore][fix][doc](fe-plugin)(mysqldump) fix build auditlog plugin error (#7804)

1. fix problems when building fe_plugins
2. format
3. add docs about dumping data using mysqldump

3 months ago Flink / Spark connector compilation problem (#7725)
jiafeng.zhang [Fri, 14 Jan 2022 14:14:48 +0000 (22:14 +0800)] 
Flink / Spark connector compilation problem (#7725)

Flink / Spark connector compilation problem

4 months ago [improvement](spark-connector) Throw an exception when the data push fails and there...
董涛 [Tue, 11 Jan 2022 07:03:06 +0000 (15:03 +0800)] 
[improvement](spark-connector) Throw an exception when the data push fails and there are too many retries (#7531)

4 months ago [refactor](spark-connector) delete useless maven dependencies and some code variable...
jiafeng.zhang [Sun, 9 Jan 2022 08:58:16 +0000 (16:58 +0800)] 
[refactor](spark-connector) delete useless maven dependencies and some code variable definition issues (#7655)

4 months ago [improvement](spark-connector) Stream load http exception handling (#7514)
jiafeng.zhang [Sun, 9 Jan 2022 08:54:55 +0000 (16:54 +0800)] 
[improvement](spark-connector) Stream load http exception handling (#7514)

Stream load http exception handling

4 months ago [chore][docs] add deploy spark/flink connectors to maven release repo docs (#7616)
Zhengguo Yang [Thu, 6 Jan 2022 15:23:33 +0000 (23:23 +0800)] 
[chore][docs] add deploy spark/flink connectors to maven release repo docs (#7616)

4 months ago [refactor] update parent pom version and optimize build scripts (#7548)
Zhengguo Yang [Wed, 5 Jan 2022 02:45:11 +0000 (10:45 +0800)] 
[refactor] update parent pom version and optimize build scripts (#7548)

4 months ago [refactor] Standardize the writing of pom files, prepare for deployment to maven...
Zhengguo Yang [Thu, 30 Dec 2021 02:16:37 +0000 (10:16 +0800)] 
[refactor] Standardize the writing of pom files, prepare for deployment to maven (#7477)

4 months ago [improvement](spark-connector)(flink-connector) Modify the max num of batch written...
jiafeng.zhang [Sun, 26 Dec 2021 03:13:47 +0000 (11:13 +0800)] 
[improvement](spark-connector)(flink-connector) Modify the max num of batch written by Spark/Flink connector each time. (#7485)

Increase the default batch size and flush interval

4 months ago [chore][community](github) Remove travis and add github action (#7380)
Mingyu Chen [Wed, 15 Dec 2021 05:27:37 +0000 (13:27 +0800)] 
[chore][community](github) Remove travis and add github action (#7380)

1. Remove travis
2. Add github action to build extension:
    1. docs
    2. fs_broker
    3. flink/spark/connector

5 months ago [Improvement](spark-connector) Add 'sink.batch.size' and 'sink.max-retries' options...
wei zhao [Mon, 6 Dec 2021 02:29:33 +0000 (10:29 +0800)] 
[Improvement](spark-connector) Add 'sink.batch.size' and 'sink.max-retries' options in spark-connector (#7281)

Add the `sink.batch.size` and `sink.max-retries` options to `Doris Spark-connector`,
to be consistent with the `flink-connector` options.
e.g.:
```scala
   df.write
      .format("doris")
      // specify maximum number of lines in a single flushing
      .option("sink.batch.size",2048)
      // specify number of retries after writing failed
      .option("sink.max-retries",3)
      .save()
```
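The two options above can be read as a batching rule plus a bounded retry loop. The following is a minimal sketch of that reading in plain Scala, not the connector's actual implementation: `SinkSketch`, `writeAll`, `flushWithRetry`, and the `send` callback are hypothetical names introduced for illustration, and throwing once the retries are exhausted is an assumption.

```scala
import scala.util.{Failure, Success, Try}

// Hypothetical sketch; names and behavior are assumptions, not connector code.
object SinkSketch {

  // Calls `send` once, then retries up to `maxRetries` more times on failure.
  // When every attempt fails, the last error is rethrown wrapped in a RuntimeException.
  def flushWithRetry[A](batch: Seq[A], maxRetries: Int)(send: Seq[A] => Unit): Unit = {
    var attempt = 0
    var done = false
    while (!done) {
      Try(send(batch)) match {
        case Success(_) => done = true
        case Failure(_) if attempt < maxRetries => attempt += 1
        case Failure(e) =>
          throw new RuntimeException(s"write failed after $maxRetries retries", e)
      }
    }
  }

  // Splits the rows into groups of at most `batchSize` and flushes each group.
  def writeAll[A](rows: Iterator[A], batchSize: Int, maxRetries: Int)(send: Seq[A] => Unit): Unit =
    rows.grouped(batchSize).foreach(batch => flushWithRetry(batch, maxRetries)(send))
}
```

With `batchSize = 2`, five rows would be flushed as three batches of sizes 2, 2, and 1; with `maxRetries = 3`, a batch is attempted at most four times in this sketch.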

5 months ago [License] Add License header for missing files (#7130)
Mingyu Chen [Tue, 16 Nov 2021 10:37:54 +0000 (18:37 +0800)] 
[License] Add License header for missing files (#7130)

1. Add License header for missing files
2. Modify the spark pom.xml to correct the location of `thrift`

6 months ago [Feature] Support Flink and Spark connector support String type (#7075)
wudi [Sat, 13 Nov 2021 09:10:22 +0000 (17:10 +0800)] 
[Feature] Support Flink and Spark connector support String type (#7075)

Support String type for Flink and Spark connector

6 months ago [SparkConnector] Add thrift dir for spark connector (#7074)
tinkerrrr [Sat, 13 Nov 2021 09:09:52 +0000 (17:09 +0800)] 
[SparkConnector] Add thrift dir for spark connector (#7074)

Add thrift dir for spark connector, to fix error when building spark-doris-connector

6 months ago [Compile] Fix spark-connector compile problem (#7048)
wei zhao [Thu, 11 Nov 2021 07:42:30 +0000 (15:42 +0800)] 
[Compile] Fix spark-connector compile problem (#7048)

Use `thrift` in thirdparty

6 months ago [Build]Compile and output the jar file, add Spark, Flink version and Scala version...
jiafeng.zhang [Tue, 9 Nov 2021 02:02:08 +0000 (10:02 +0800)] 
[Build]Compile and output the jar file, add Spark, Flink version and Scala version (#7051)

The jar files built for the Flink and Spark connectors now carry the Flink or Spark version
and the Scala version used at compile time in their names, so that users can check whether the versions match when using them.

Example of output file name: doris-spark-1.0.0-spark-3.2.0_2.12.jar
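As an illustration of how a user might check a jar against their environment, here is a small sketch that parses a file name shaped like the example above. The pattern is inferred from that single example, so it is an assumption, and `JarVersionCheck`, `JarVersions`, `parse`, and `matches` are hypothetical names rather than anything shipped with the connector.

```scala
// Hypothetical helper; the file-name pattern is inferred from one example only.
object JarVersionCheck {

  // doris-spark-<connector version>-spark-<spark version>_<scala binary version>.jar
  private val JarName = """doris-spark-(.+)-spark-(.+)_(\d+\.\d+)\.jar""".r

  final case class JarVersions(connector: String, spark: String, scalaBinary: String)

  // Returns the versions encoded in the file name, or None if the name has another shape.
  def parse(fileName: String): Option[JarVersions] = fileName match {
    case JarName(connector, spark, scalaBinary) => Some(JarVersions(connector, spark, scalaBinary))
    case _ => None
  }

  // True when the jar was built for exactly this Spark version and this Scala binary line.
  def matches(fileName: String, sparkVersion: String, scalaVersion: String): Boolean =
    parse(fileName).exists { v =>
      v.spark == sparkVersion && scalaVersion.startsWith(v.scalaBinary + ".")
    }
}
```

For instance, `JarVersionCheck.matches("doris-spark-1.0.0-spark-3.2.0_2.12.jar", "3.2.0", "2.12.15")` holds in this sketch, while the same jar checked against Scala `2.11.12` does not.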

6 months ago [HTTP][API] Add backends info API for spark/flink connector (#6984)
Mingyu Chen [Fri, 5 Nov 2021 01:43:06 +0000 (09:43 +0800)] 
[HTTP][API] Add backends info API for spark/flink connector (#6984)

Doris should provide an HTTP API that returns the list of backends so that connectors can submit stream loads.
The API does no privilege checking, so ordinary users can call it.

6 months ago [Revert] Revert RestService.java (#6994)
wei zhao [Thu, 4 Nov 2021 04:13:18 +0000 (12:13 +0800)] 
[Revert] Revert RestService.java (#6994)

6 months ago [Feature] Spark connector supports to specify fields to write (#6973)
wei zhao [Tue, 2 Nov 2021 08:35:29 +0000 (16:35 +0800)] 
[Feature] Spark connector supports to specify fields to write (#6973)

1. By default, the Spark connector must write values for all fields of the `Doris` table.
With this feature, users can specify a subset of the fields to write, and even the order in which they are written.

e.g.:
Suppose a table named `student` has three columns (name, gender, age),
created with the following SQL:
```sql
create table student (name varchar(255), gender varchar(10), age int) duplicate key (name) distributed by hash(name) buckets 2;
```
Now, suppose only two columns are to be written: name and gender.
The code is as follows:
```scala
    val df = spark.createDataFrame(Seq(
      ("m", "zhangsan"),
      ("f", "lisi"),
      ("m", "wangwu")
    ))
    df.write
      .format("doris")
      .option("doris.fenodes", dorisFeNodes)
      .option("doris.table.identifier", dorisTable)
      .option("user", dorisUser)
      .option("password", dorisPwd)
      //specify your fields or the order
      .option("doris.write.field", "gender,name")
      .save()
```

6 months ago [Optimize] Spark connector supports multiple spark versions:2.1.x/2.3.x/2.4.x/3.x...
wei zhao [Fri, 29 Oct 2021 09:06:05 +0000 (17:06 +0800)] 
[Optimize] Spark connector supports multiple spark versions:2.1.x/2.3.x/2.4.x/3.x (#6956)

* Spark connector supports multiple spark versions:2.1.x/2.3.x/2.4.x/3.x
Co-authored-by: wei.zhao <wei.zhao@aispeech.com>

6 months ago Fix spark connector build error (#6948)
jiafeng.zhang [Fri, 29 Oct 2021 06:59:05 +0000 (14:59 +0800)] 
Fix spark connector build error (#6948)

pom.xml error

7 months ago [Dependency] Upgrade thirdparty libs (#6766)
Zhengguo Yang [Fri, 15 Oct 2021 05:03:04 +0000 (13:03 +0800)] 
[Dependency] Upgrade thirdparty libs (#6766)

Upgrade the following dependencies:

libevent -> 2.1.12
OpenSSL 1.0.2k -> 1.1.1l
thrift 0.9.3 -> 0.13.0
protobuf 3.5.1 -> 3.14.0
gflags 2.2.0 -> 2.2.2
glog 0.3.3 -> 0.4.0
googletest 1.8.0 -> 1.10.0
snappy 1.1.7 -> 1.1.8
gperftools 2.7 -> 2.9.1
lz4 1.7.5 -> 1.9.3
curl 7.54.1 -> 7.79.0
re2 2017-05-01 -> 2021-02-02
zstd 1.3.7 -> 1.5.0
brotli 1.0.7 -> 1.0.9
flatbuffers 1.10.0 -> 2.0.0
apache-arrow 0.15.1 -> 5.0.0
CRoaring 0.2.60 -> 0.3.4
orc 1.5.8 -> 1.6.6
libdivide 4.0.0 -> 5.0
brpc 0.97 -> 1.0.0-rc02
librdkafka 1.7.0 -> 1.8.0

After this PR, Doris should be compiled with build-env:1.4.0.

7 months ago [Feature] support spark connector sink data using sql (#6796)
wei zhao [Sat, 9 Oct 2021 07:47:36 +0000 (15:47 +0800)] 
[Feature] support spark connector sink data using sql (#6796)

Co-authored-by: wei.zhao <wei.zhao@aispeech.com>

7 months ago [Feature] support spark connector sink stream data to doris (#6761)
chovy [Tue, 28 Sep 2021 09:46:19 +0000 (17:46 +0800)] 
[Feature] support spark connector sink stream data to doris (#6761)

* [Feature] support spark connector sink stream data to doris

* [Doc] Add spark-connector batch/stream writing instructions

* add license and remove meaningless blanks code

Co-authored-by: wei.zhao <wei.zhao@aispeech.com>

8 months ago Spark 2.x and 3.x version compilation instructions (#6503)
jiafeng.zhang [Fri, 27 Aug 2021 02:55:29 +0000 (10:55 +0800)] 
Spark 2.x and 3.x version compilation instructions (#6503)

Spark 2.x and 3.x version compilation instructions

8 months ago [Improve]The connector supports spark 3.0, flink 1.13 (#6449)
jiafeng.zhang [Wed, 18 Aug 2021 07:57:50 +0000 (15:57 +0800)] 
[Improve]The connector supports spark 3.0, flink 1.13 (#6449)

Modify the flink/spark compilation documentation

8 months ago [Doc] flink/spark connector: add sources/javadoc plugins (#6435)
wunan1210 [Mon, 16 Aug 2021 14:41:24 +0000 (22:41 +0800)] 
[Doc] flink/spark connector: add sources/javadoc plugins (#6435)

Add plugins to spark-doris-connector and flink-doris-connector that generate javadoc and sources jars,
making them easier to distribute and debug.

8 months ago [Feature] Support spark connector sink data to Doris (#6256)
huzk [Mon, 16 Aug 2021 14:40:43 +0000 (22:40 +0800)] 
[Feature] Support spark connector sink data to Doris (#6256)

Support writing a DataFrame to Doris with the Spark connector

11 months ago [Bug] Modify spark, flink doris connector to send request to FE, fix the problem...
jiafeng.zhang [Wed, 19 May 2021 01:28:21 +0000 (09:28 +0800)] 
[Bug] Modify spark, flink doris connector to send request to FE, fix the problem of POST method, it should be the same as the method when sending the request (#5788)

Modify spark, flink doris connector to send request to FE, fix the problem of POST method,
it should be the same as the method when sending the request

14 months ago [Spark-Doris-Connector][Bug-Fix] Resolve deserialize exception when Spark Doris Conne...
924060929 [Thu, 4 Mar 2021 09:48:59 +0000 (17:48 +0800)] 
[Spark-Doris-Connector][Bug-Fix] Resolve deserialize exception when Spark Doris Connector in aync deserialize mode (#5336)

Resolve deserialize exception when Spark Doris Connector is in async deserialize mode
Co-authored-by: lanhuajian <lanhuajian@sankuai.com>

14 months ago Fix file licences (#5414)
Zhengguo Yang [Wed, 24 Feb 2021 08:37:17 +0000 (16:37 +0800)] 
Fix file licences (#5414)

Add license to files
For Doris 0.14

15 months ago [Bug] Spark doris connector http v2 authentication fails, and HTTP v2 interface retur...
张家锋 [Sun, 7 Feb 2021 01:28:55 +0000 (09:28 +0800)] 
[Bug] Spark doris connector http v2 authentication fails, and HTTP v2 interface returns json nesting problem (#5366)

1. Deal with the problem of inconsistent data format returned by http v1 and v2
2. Deal with user authentication failure

16 months ago [Spark on Doris] fix the encode of varchar when convertArrowToRowBatch (#5202)
HuangWei [Sun, 10 Jan 2021 12:48:46 +0000 (20:48 +0800)] 
[Spark on Doris] fix the encode of varchar when convertArrowToRowBatch (#5202)

`convertArrowToRowBatch` uses the default charset to encode String values.
Set it to UTF_8, because we use `arrow::utf8` on the Backends.
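To show why an explicit charset matters here, the sketch below decodes the same UTF-8 bytes twice. It is a standalone illustration, not the code of `convertArrowToRowBatch`; `CharsetSketch` is a hypothetical name, and `ISO_8859_1` merely stands in for a platform default charset that happens not to be UTF-8.

```scala
import java.nio.charset.StandardCharsets

// Hypothetical illustration; not taken from the connector.
object CharsetSketch {

  // Two CJK characters, written with escapes so the example does not depend
  // on the encoding of the source file itself.
  val original: String = "\u4e2d\u6587"

  // The bytes as a UTF-8 producer would emit them (three bytes per character here).
  val utf8Bytes: Array[Byte] = original.getBytes(StandardCharsets.UTF_8)

  // Decoding with an explicit UTF-8 charset round-trips the text.
  val decodedUtf8: String = new String(utf8Bytes, StandardCharsets.UTF_8)

  // Decoding the same bytes with another charset produces garbled text:
  // every byte becomes its own character.
  val decodedLatin1: String = new String(utf8Bytes, StandardCharsets.ISO_8859_1)
}
```

Passing the charset explicitly keeps the result independent of the JVM's `file.encoding` setting on the machine that runs the job.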

23 months ago [Spark on Doris] Shade and provide the thrift lib in spark-doris-connector (#3631)
Mingyu Chen [Tue, 19 May 2020 06:20:21 +0000 (14:20 +0800)] 
[Spark on Doris] Shade and provide the thrift lib in spark-doris-connector (#3631)

Main changes:
1. Shade and provide the thrift lib in spark-doris-connector
2. Add a `build.sh` for spark-doris-connector
3. Move the README.md of spark-doris-connector to `docs/`
4. Change the line delimiter of `fe/src/test/java/org/apache/doris/analysis/AggregateTest.java`

2 years ago [License] Add License to codes (#3272)
lichaoyong [Tue, 7 Apr 2020 08:35:13 +0000 (16:35 +0800)] 
[License] Add License to codes (#3272)

2 years ago [Spark] Support convert Arrow data to RowBatch asynchronously in Spark-Doris-Connect...
Youngwb [Thu, 26 Mar 2020 13:34:37 +0000 (21:34 +0800)] 
[Spark] Support convert Arrow data to RowBatch asynchronously in Spark-Doris-Connector (#3186)

Currently, in the Spark-Doris-Connector, when Spark iteratively obtains each row of data,
it needs to synchronously convert the Arrow format data into the row format required by Spark.
In order to speed up the conversion process, we can add an asynchronous thread in the Connector,
which is responsible for obtaining the Arrow format data from BE and converting it into the row
format required by Spark for computation.

In our test environment, the Doris cluster used 1 FE and 7 BEs (32C+128G). When using Spark-Doris-Connector
to query a table containing 67 columns, the original query, which returned 69 million rows of data,
took about 2.5 min; after the improvement it took about 1.6 min, a reduction of about 30%.
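The idea of moving the conversion onto a separate thread can be sketched with a bounded queue between a producer and the consuming iterator. This is a simplified illustration under assumptions, not the connector's code: `AsyncConvertSketch` and `asyncMap` are hypothetical names, `convert` stands in for the Arrow-to-row conversion, and the queue capacity is an arbitrary parameter.

```scala
import java.util.concurrent.ArrayBlockingQueue
import scala.util.{Success, Try}

// Hypothetical sketch of producer/consumer conversion; not connector code.
object AsyncConvertSketch {

  // Runs `convert` on a background thread and hands results to the caller
  // through a bounded queue, so conversion overlaps with consumption.
  def asyncMap[A, B](source: Iterator[A], capacity: Int)(convert: A => B): Iterator[B] = {
    // Success(Some(x)) carries a value, Success(None) marks the end,
    // and a Failure carries an error raised while producing.
    val queue = new ArrayBlockingQueue[Try[Option[B]]](capacity)

    val producer = new Thread(() => {
      val end: Try[Option[B]] = Try {
        source.foreach(a => queue.put(Success(Some(convert(a)))))
        None
      }
      queue.put(end)
    })
    producer.setDaemon(true)
    producer.start()

    // `.get` on a Failure rethrows the producer's error in the consumer thread.
    Iterator.continually(queue.take().get).takeWhile(_.isDefined).map(_.get)
  }
}
```

The bounded queue keeps memory use limited: when the consumer falls behind, `put` blocks the producer instead of letting converted batches pile up.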

2 years ago Remove unused KUDU codes (#3175)
lichaoyong [Tue, 24 Mar 2020 05:54:05 +0000 (13:54 +0800)] 
Remove unused KUDU codes (#3175)

KUDU tables have not been supported for a long time. Remove the code related to them.

2 years ago Support param exec_mem_limit for spark-doris-connctor (#2775)
Youngwb [Fri, 17 Jan 2020 16:14:39 +0000 (00:14 +0800)] 
Support param exec_mem_limit for spark-doris-connctor (#2775)

2 years ago Update arrow's version to 0.15.1 and shaded it in spark-doris-connector (#2769)
vinson0526 [Wed, 15 Jan 2020 13:08:34 +0000 (21:08 +0800)] 
Update arrow's version to 0.15.1 and shaded it in spark-doris-connector (#2769)

2 years ago Convert from arrow to rowbatch (#2723)
Youngwb [Fri, 10 Jan 2020 06:11:15 +0000 (14:11 +0800)] 
Convert from arrow to rowbatch (#2723)

For #2722
In our test environment, the Doris cluster used 1 FE and 7 BEs (32C+128G). When using spark-doris-connector to query a table containing 67 columns, it took about 1 hour for the query to return 69 million rows of data. After the improvement, the same query took 2.5 minutes, a significant improvement in query performance.

2 years ago Spark return error to users when spark on doris query failed (#2531)
Youngwb [Mon, 30 Dec 2019 13:58:13 +0000 (21:58 +0800)] 
Spark return error to users when spark on doris query failed (#2531)

2 years ago Fix npe in spark-doris-connector when query is complex (#2503)
vinson0526 [Thu, 19 Dec 2019 06:53:29 +0000 (14:53 +0800)] 
Fix npe in spark-doris-connector when query is complex (#2503)

2 years ago Fix bug when spark on doris run long time (#2485)
Youngwb [Wed, 18 Dec 2019 05:08:21 +0000 (13:08 +0800)] 
Fix bug when spark on doris run long time (#2485)

2 years ago Fix document bugs in spark-doris-connector (#2275)
vinson0526 [Fri, 22 Nov 2019 10:05:36 +0000 (18:05 +0800)] 
Fix document bugs in spark-doris-connector (#2275)

2 years ago Add spark-doris-connector extension (#2228)
vinson0526 [Fri, 22 Nov 2019 07:38:05 +0000 (15:38 +0800)] 
Add spark-doris-connector extension (#2228)