DmitryGrb [Sat, 12 Dec 2020 19:16:46 +0000 (22:16 +0300)]
[BAHIR-233] Add SNS message support for SQS streaming source (#97)
Added messageWrapper option for SQS streaming connector
which says if this is pure s3 notification event or it is coming
from SNS topic
abhishekd0907 [Mon, 19 Oct 2020 19:58:00 +0000 (01:28 +0530)]
Properly propagate exceptions in SQSClient (#99)
abhishekd0907 [Thu, 9 Jul 2020 01:34:50 +0000 (07:04 +0530)]
[MINOR] Updating sql-streaming sqs Readme (#98)
Fixing dependency name and adding useInstanceProfileCredentials
in the example on Readme because it is frequently used.
Luciano Resende [Sun, 19 Jan 2020 20:17:48 +0000 (12:17 -0800)]
[MINOR] Add license headers and section titles to component README.md (#95)
abhishekd0907 [Mon, 30 Dec 2019 21:27:39 +0000 (02:57 +0530)]
[BAHIR-222] Add SQL Streaming SQS connector to Readme (#96)
abhishekd0907 [Sat, 28 Dec 2019 20:40:26 +0000 (02:10 +0530)]
[BAHIR-213] Faster S3 file Source for Structured Streaming with SQS (#91)
Using FileStreamSource to read files from a S3 bucket has problems
both in terms of costs and latency:
Latency: Listing all the files in S3 buckets every micro-batch can be both
slow and resource-intensive.
Costs: Making List API requests to S3 every micro-batch can be costly.
The solution is to use Amazon Simple Queue Service (SQS) which lets
you find new files written to S3 bucket without the need to list all the
files every micro-batch.
S3 buckets can be configured to send a notification to an Amazon SQS Queue
on Object Create / Object Delete events. For details see AWS documentation
here Configuring S3 Event Notifications
Spark can leverage this to find new files written to S3 bucket by reading
notifications from SQS queue instead of listing files every micro-batch.
This PR adds a new SQSSource which uses Amazon SQS queue to find
new files every micro-batch.
Usage
val inputDf = spark .readStream
.format("s3-sqs")
.schema(schema)
.option("fileFormat", "json")
.option("sqsUrl", "https://QUEUE_URL")
.option("region", "us-east-1")
.load()
Luciano Resende [Fri, 6 Dec 2019 20:47:58 +0000 (14:47 -0600)]
[HOTFIX] Fix maven version requirement based on Travis deployment
Luciano Resende [Fri, 13 Sep 2019 16:44:13 +0000 (09:44 -0700)]
[maven-release-plugin] prepare for next development iteration
Luciano Resende [Fri, 13 Sep 2019 16:43:52 +0000 (09:43 -0700)]
[maven-release-plugin] prepare release v2.4.0-rc1
Luciano Resende [Thu, 12 Sep 2019 20:27:57 +0000 (13:27 -0700)]
Syncronyze maven version requirement with Spark 2.4
Like [Tue, 3 Sep 2019 00:01:49 +0000 (08:01 +0800)]
[BAHIR-172 ] Replace FileInputStream with Files.newInputStream (#92)
abhishekd0907 [Fri, 30 Aug 2019 18:43:02 +0000 (00:13 +0530)]
[BAHIR-217] Installation of Oracle JDK8 is Failing in Travis CI (#93)
Install of Oracle JDK 8 Failing in Travis CI and as a result,
build is failing for new pull requests.
We just need to add `dist: trusty` in the .travis.yml file
as mentioned in the issue below:
https://travis-ci.community/t/install-of-oracle-jdk-8-failing/3038
Luciano Resende [Wed, 12 Jun 2019 20:36:47 +0000 (22:36 +0200)]
Add new SQL Streaming JDBC extension to Readme
Luciano Resende [Tue, 11 Jun 2019 12:49:37 +0000 (14:49 +0200)]
Fix warnings about undeclared variables on javadoc
Luciano Resende [Tue, 11 Jun 2019 11:47:17 +0000 (13:47 +0200)]
Fix warnings around depecrated usage of shouldRunTest
Esteban Laver [Wed, 12 Jun 2019 20:26:19 +0000 (16:26 -0400)]
Updated sql-cloudant dependencies (#90)
Bumped java-cloudant to 2.17.0 and okhttp to 3.12.2
Wang Yanlin [Tue, 11 Jun 2019 11:04:39 +0000 (19:04 +0800)]
[BAHIR-192] Add jdbc sink for structured streaming. (#81)
Luciano Resende [Sun, 19 May 2019 20:41:25 +0000 (22:41 +0200)]
[MINOR] Add support for release from branch
Luciano Resende [Sun, 19 May 2019 16:49:29 +0000 (18:49 +0200)]
Update pom scm configuration
Grzegorz Lyczba [Thu, 16 May 2019 22:53:05 +0000 (00:53 +0200)]
[MINOR] Handle case when no messages from pubsub
Method getReceivedMessages returns NULL when there is no message in
a subscription. Store for processing in Spark and prepare the ACK request
only when, at least, one message is ready for processing.
Lukasz Antoniak [Thu, 11 Apr 2019 06:14:02 +0000 (08:14 +0200)]
[BAHIR-203] Manual acknowledge PubSub messages
Esteban Laver [Tue, 11 Dec 2018 02:46:33 +0000 (21:46 -0500)]
[BAHIR-187] Sppedup tests by reducing test data files
Reduced the JSON test files to shorten time required
to complete test cases
Closes #75
Lukasz Antoniak [Thu, 24 Jan 2019 12:06:57 +0000 (04:06 -0800)]
[BAHIR-141] Support GCP JSON key type as binary array
Closes #82
Closes #53
Lukasz Antoniak [Tue, 11 Dec 2018 14:57:46 +0000 (06:57 -0800)]
[BAHIR-107] Upgrade to Scala 2.12 and Spark 2.4.0
Closes #76
Lukasz Antoniak [Wed, 19 Dec 2018 21:23:58 +0000 (13:23 -0800)]
[BAHIR-175] Fix MQTT recovery after checkpoint
Closes #79
Lukasz Antoniak [Thu, 20 Dec 2018 18:39:24 +0000 (10:39 -0800)]
[BAHIR-65] Twitter integration test
Closes #80
Luciano Resende [Sat, 15 Dec 2018 21:36:07 +0000 (18:36 -0300)]
[MINOR] Configure Hadoop dependency version as property
wangyanlin01 [Sun, 2 Dec 2018 03:00:21 +0000 (11:00 +0800)]
[BAHIR-183] Using HDFS for saving message for mqtt source.
Closes #78
Lukasz Antoniak [Wed, 12 Dec 2018 17:17:38 +0000 (09:17 -0800)]
[MINOR] Ignore flaky PubNub integration test
Closes #77
Lukasz Antoniak [Wed, 5 Dec 2018 20:44:28 +0000 (12:44 -0800)]
[BAHIR-186] SSL support in MQTT structured streaming
Closes #74
Lukasz Antoniak [Mon, 3 Dec 2018 16:52:10 +0000 (08:52 -0800)]
[BAHIR-103] New module with common utilities and test classes
Closes #73
Luciano Resende [Fri, 30 Nov 2018 19:37:10 +0000 (20:37 +0100)]
[maven-release-plugin] prepare for next development iteration
Luciano Resende [Fri, 30 Nov 2018 19:36:52 +0000 (20:36 +0100)]
[maven-release-plugin] prepare release v2.3.2-rc1
Luciano Resende [Fri, 30 Nov 2018 19:29:15 +0000 (20:29 +0100)]
Set Spark version to 2.3.2 preparing for release
Luciano Resende [Fri, 30 Nov 2018 15:30:31 +0000 (16:30 +0100)]
[maven-release-plugin] prepare for next development iteration
Luciano Resende [Fri, 30 Nov 2018 15:30:13 +0000 (16:30 +0100)]
[maven-release-plugin] prepare release v2.3.1-rc1
Luciano Resende [Fri, 30 Nov 2018 14:56:22 +0000 (15:56 +0100)]
Set Spark version to 2.3.1 preparing for release
Luciano Resende [Fri, 30 Nov 2018 14:54:26 +0000 (15:54 +0100)]
[MINOR] Update release script usage documentation
Luciano Resende [Fri, 30 Nov 2018 13:34:43 +0000 (14:34 +0100)]
[maven-release-plugin] prepare for next development iteration
Luciano Resende [Fri, 30 Nov 2018 13:34:25 +0000 (14:34 +0100)]
[maven-release-plugin] prepare release v2.3.0-rc1
Luciano Resende [Fri, 30 Nov 2018 12:26:57 +0000 (13:26 +0100)]
[MINOR] Fix compilation/build warnings
Disable zinc to avoid build
Replace deprecated Timeouts with TimeLimits
Add imports for scala postfix function support
Lukasz Antoniak [Thu, 22 Nov 2018 22:28:44 +0000 (14:28 -0800)]
[BAHIR-182] Spark Streaming PubNub connector
Implement new connector for PubNub (https://www.pubnub.com/)
which is increasing in popularity as a cloud messaging infrastructure.
Closes #70
Lukasz Antoniak [Tue, 27 Nov 2018 14:58:42 +0000 (06:58 -0800)]
[BAHIR-66] Switch to Java binding for ZeroMQ
Initially, I just wanted to implement integration test for BAHIR-66.
Google pointed me to JeroMQ, which provides official ZeroMQ binding
for Java and does not require native libraries. I have decided to give
it a try, but quickly realized that akka-zeromq module (transient
dependency from current Bahir master) is not compatible with JeroMQ.
Actually Akka team also wanted to move to JeroMQ (akka/akka#13856),
but in the end decided to remove akka-zeromq project completely
(akka/akka#15864, https://www.lightbend.com/blog/akka-roadmap-update-2014).
Having in mind that akka-zeromq does not support latest version of ZeroMQ
protocol and further development may come delayed, I have decided to refactor
streaming-zeromq implementation and leverage JeroMQ. With the change we receive
various benefits, such as support for PUB-SUB and PUSH-PULL messaging patterns
and the ability to bind the socket on whatever end of communication channel
(see test cases), subscription to multiple channels, etc. JeroMQ seems pretty
reliable and reconnection is handled out-of-the-box. Actually, we could even
start the ZeroMQ subscriber trying to connect to remote socket before other
end created and bound the socket. While I tried to preserve backward compatibility
of method signatures, there was no easy way to support Akka API and business
logic that users could put there (e.g. akka.actor.ActorSystem).
Closes #71
Lukasz Antoniak [Mon, 9 Jul 2018 05:42:09 +0000 (07:42 +0200)]
[BAHIR-49] Sink for SQL Streaming MQTT module
Closes #68
zhankeyu [Thu, 1 Nov 2018 09:26:19 +0000 (17:26 +0800)]
[BAHIR-181] Add username and password in MQTTUtils for pyspark
Closes #69
shimamoto [Thu, 31 May 2018 08:15:04 +0000 (17:15 +0900)]
[BAHIR-166] Migrate akka sql streaming source to DataSource v2 API
Migrate akka sql streaming source to DataSource v2 API.
Closes #67
Prashant Sharma [Fri, 27 Apr 2018 07:09:35 +0000 (12:39 +0530)]
[BAHIR-164][BAHIR-165] Port Mqtt sql source to datasource v2 API
Migrating Mqtt spark structured streaming connector to DatasourceV2 API.
Closes #65
Luciano Resende [Thu, 8 Nov 2018 02:09:34 +0000 (18:09 -0800)]
[maven-release-plugin] prepare for next development iteration
Luciano Resende [Thu, 8 Nov 2018 02:09:19 +0000 (18:09 -0800)]
[maven-release-plugin] prepare release v2.2.2-rc1
Luciano Resende [Thu, 8 Nov 2018 01:41:55 +0000 (17:41 -0800)]
Set Spark version to 2.2.2 preparing for release
Luciano Resende [Thu, 8 Nov 2018 01:09:52 +0000 (17:09 -0800)]
[maven-release-plugin] prepare for next development iteration
Luciano Resende [Thu, 8 Nov 2018 01:09:37 +0000 (17:09 -0800)]
[maven-release-plugin] prepare release v2.1.3-rc1
Luciano Resende [Thu, 8 Nov 2018 01:02:41 +0000 (17:02 -0800)]
Set Spark version to 2.1.3 preparing for release
Luciano Resende [Fri, 14 Sep 2018 01:58:14 +0000 (18:58 -0700)]
Update to Scala 2.11.12
Luciano Resende [Wed, 6 Jun 2018 15:23:06 +0000 (17:23 +0200)]
[maven-release-plugin] prepare for next development iteration
Luciano Resende [Wed, 6 Jun 2018 15:22:49 +0000 (17:22 +0200)]
[maven-release-plugin] prepare release v2.2.1-rc1
Luciano Resende [Wed, 6 Jun 2018 13:05:37 +0000 (15:05 +0200)]
Set Spark version to 2.2.1 preparing for release
Luciano Resende [Fri, 1 Jun 2018 04:53:24 +0000 (21:53 -0700)]
[maven-release-plugin] prepare for next development iteration
Luciano Resende [Fri, 1 Jun 2018 04:53:09 +0000 (21:53 -0700)]
[maven-release-plugin] prepare release v2.1.2-rc1
Luciano Resende [Wed, 30 May 2018 18:25:02 +0000 (11:25 -0700)]
Set Spark version to 2.1.2 preparing for release
Update the Spark version to Spark 2.1.2 and update
necessary code to properly compile with the cited
spark version.
Milad [Sat, 19 May 2018 08:10:09 +0000 (12:40 +0430)]
Update twitter4j-streaming to latest version 4.0.5
Closes #66
Luciano Resende [Wed, 25 Apr 2018 19:22:03 +0000 (12:22 -0700)]
[BAHIR-162] Stop publishing MD5 hash with releases
Luciano Resende [Wed, 25 Apr 2018 18:49:20 +0000 (11:49 -0700)]
[BAHIR-163] Enable builds using Travis CI
Esteban Laver [Sun, 17 Dec 2017 18:30:04 +0000 (13:30 -0500)]
[BAHIR-154] Refactor sql-cloudant to use cloudant-client library
- Use java-cloudant’s executeRequest for HTTP requests against
_all_docs endpoint
- Added HTTP 429 backoff with default settings
- Simplified caught exception and message for schema size
- Replaced scala http library with okhttp library for changes receiver
- Updated streaming CloudantReceiver class to use improved
ChangesRowScanner method
- Replaced Play JSON with GSON library
- Updated save operation to use java-cloudant bulk API
- Use _changes feed filter option for Cloudant/CouchDB 2.x and greater
Closes #61
Esteban Laver [Fri, 12 Jan 2018 05:26:29 +0000 (00:26 -0500)]
[BAHIR-138] Fix deprecation warning messages
- Imported ‘spark.implicits._’ to convert Spark RDD to Dataset
- Replaced deprecated `json(RDD[String])` with `json(Dataset[String])`
Closes #63
Esteban Laver [Mon, 2 Oct 2017 15:09:28 +0000 (11:09 -0400)]
[BAHIR-137] CouchDB/Cloudant _changes feed receiver improvements
Adds batchInterval option for tuning _changes receiver’s streaming batch interval
Throw a CloudantException if the final schema for the _changes receiver is empty
Call stop method in streaming receiver when there’s an error
Closes #60
Zubair Nabi [Wed, 29 Nov 2017 05:45:28 +0000 (10:45 +0500)]
[BAHIR-104] Multi-topic MQTT DStream in Python is now a PairRDD.
Closes #55
Esteban Laver [Mon, 2 Oct 2017 20:18:40 +0000 (16:18 -0400)]
[BAHIR-138] fix deprecated warnings in sql-cloudant
Fix warnings in DefaultSource class, and in CloudantStreaming
and CloudantStreamingSelector examples.
How
Imported spark.implicits._ to convert Spark RDD to Dataset
Replaced deprecated json(RDD[String]) with json(Dataset[String])
Improved streaming examples:
Replaced registerTempTable with preferred createOrReplaceTempView
Replaced !isEmpty with nonEmpty
Use an accessible sales database so users can run the example without any setup
Fixed error message when stopping tests by adding logic to streaming
receiver to not store documents in Spark memory when stream has stopped
Closes #59
Esteban Laver [Fri, 8 Sep 2017 14:33:26 +0000 (10:33 -0400)]
[BAHIR-128] Improve sql-cloudant _changes receiver
This change improves the stability of _changes receiver and
fix the intermitent failing test in sql-cloudant's
CloudantChangesDFSuite.
How
Improve performance and decrease testing time by setting batch size
to 8 seconds and using seq_interval _changes feed option.
Use getResource to load json files path
Added Mike Rhodes's ChangesRowScanner for reading each _changes line
and transforming to GSON's JSON object
Added Mike Rhodes's ChangesRow representing a row in the changes feed
Closes #57
Christian Kadner [Sun, 10 Dec 2017 11:15:04 +0000 (03:15 -0800)]
Enforce License Header in Java Sources
Add a "Header" rule to the checkstyle configuration
to enforce proper Apache license headers in Java
source files.
A similar rule ("HeaderMatchesChecker") already exists
in the scalastyle configuration.
Closes #58
Luciano Resende [Tue, 5 Dec 2017 22:08:52 +0000 (23:08 +0100)]
[BAHIR-149] Update Cloudant dependency to release 2.11.0
Luciano Resende [Tue, 5 Dec 2017 21:55:42 +0000 (22:55 +0100)]
[BAHIR-148] Use consistent MQTT client dependency version
Create a property to use a consistent version of the MQTT
client across all extensions based on MQTT.
For now, use org.eclipse.paho.client.mqttv3:1.1.0
Christian Kadner [Sat, 14 Oct 2017 02:16:23 +0000 (19:16 -0700)]
[BAHIR-139] Force scala-maven-plugin to use java.version
Make sure the scala-maven-plugin uses java.version 1.8 instead of the
default source and target version which is Java 1.6.
Upgrade the scala-maven-plugin version from 3.2.2 to 3.3.1
Closes #51
Esteban Laver [Wed, 4 Oct 2017 19:58:50 +0000 (15:58 -0400)]
[BAHIR-123] Upgrade to play-json 2.6.6
Fixed breaking API changes between play-json 2.5.x and 2.6.x
in sql-cloudant by replacing deprecated methods.
Closes #50
Luciano Resende [Thu, 17 Aug 2017 04:50:16 +0000 (22:50 -0600)]
[MINOR] Clean any sha signature files during release process
Luciano Resende [Thu, 17 Aug 2017 04:36:35 +0000 (22:36 -0600)]
[maven-release-plugin] prepare for next development iteration
Luciano Resende [Thu, 17 Aug 2017 04:36:17 +0000 (22:36 -0600)]
[maven-release-plugin] prepare release v2.2.0-rc1
Luciano Resende [Thu, 17 Aug 2017 04:28:14 +0000 (22:28 -0600)]
[MINOR] Fix javadoc issues when performing release:prepare
The maven-javadoc-plugin was complaining with 'javadoc: error -
No public or protected classes found to document.' when running
maven release:prepare.
Luciano Resende [Thu, 17 Aug 2017 04:19:35 +0000 (22:19 -0600)]
[MINOR] Skip Scala 2.10 when publishing release artifacts
Skip Scala 2.10 as a few of the extension dependencies are
not published in Scala 2.10 binaries.
Luciano Resende [Thu, 17 Aug 2017 04:18:40 +0000 (22:18 -0600)]
[MINOR] Update signature algorithm in release script
Luciano Resende [Wed, 16 Aug 2017 16:22:13 +0000 (10:22 -0600)]
[BAHIR-126] Update Akka to version 2.4.20
Address akka vulnerability: CVE-2017-5643
Luciano Resende [Tue, 15 Aug 2017 19:27:19 +0000 (13:27 -0600)]
[MINOR] Use jekyll template to describe Spark and Scala version
Ire Sun [Tue, 1 Aug 2017 09:00:58 +0000 (02:00 -0700)]
[BAHIR-122][PubSub] Make "ServiceAccountCredentials" really broadcastable
Instead of requiring key files on each instance of the cluster, we read
the key file content on the driver node and store the binary in the
ServiceAccountCredentials. When the provider is called, it retrieves the
credential with the in-memory key file.
Closes #48
Luciano Resende [Wed, 26 Jul 2017 17:33:04 +0000 (10:33 -0700)]
[BAHIR-125] Update Bahir parent pom
- Default build using JAVA 8
- Update dependencies to align with Spark 2.2.0
Luciano Resende [Wed, 26 Jul 2017 17:34:27 +0000 (10:34 -0700)]
[BAHIR-124] Update Spark depedency to version 2.2.0
Esteban Laver [Wed, 26 Jul 2017 05:53:58 +0000 (22:53 -0700)]
[BAHIR-110] Implement _changes API for sql-cloudant
- support loading Cloudant data into Spark DataFrames and SQL tables
using '_changes' endpoint
- update README to explain the new config options and differences
between '_all_docs' and '_changes' endpoints when loading data
- Add test suite to test Spark DataFrames using the '_all_docs' and
'_changes' endpoint, assert Cloudant config options, and test Spark
SQL temporary views
Closes #45
Luciano Resende [Thu, 20 Jul 2017 01:24:04 +0000 (18:24 -0700)]
[MINOR] Fix copy+paste typo
drosenst [Wed, 5 Jul 2017 20:41:02 +0000 (23:41 +0300)]
[BAHIR-100] Enhance MQTT connector to support byte arrays
Closes #47
Dheeraj Dwivedi [Sat, 24 Jun 2017 07:49:40 +0000 (13:19 +0530)]
[MINOR] Fix data file path in the streaming-twitter sample app
Closes #46
Luciano Resende [Thu, 8 Jun 2017 04:36:50 +0000 (21:36 -0700)]
[maven-release-plugin] prepare for next development iteration
Luciano Resende [Thu, 8 Jun 2017 04:36:35 +0000 (21:36 -0700)]
[maven-release-plugin] prepare release v2.1.1-rc2
Luciano Resende [Thu, 8 Jun 2017 04:32:39 +0000 (21:32 -0700)]
[BAHIR-88] Add release:prepare statement back to script
Luciano Resende [Thu, 8 Jun 2017 04:28:58 +0000 (21:28 -0700)]
[BAHIR-88] Additional fixes to produce proper rc distribution
Luciano Resende [Thu, 8 Jun 2017 03:18:58 +0000 (20:18 -0700)]
[maven-release-plugin] prepare for next development iteration
Luciano Resende [Thu, 8 Jun 2017 03:18:43 +0000 (20:18 -0700)]
[maven-release-plugin] prepare release v2.1.1-rc1
Luciano Resende [Thu, 8 Jun 2017 03:13:45 +0000 (20:13 -0700)]
[BAHIR-88] Produce distributions without release temp files
Luciano Resende [Wed, 7 Jun 2017 04:10:27 +0000 (21:10 -0700)]
[MINOR] Add checkpoint directory to git ignore configuration
Chen Bin [Thu, 27 Apr 2017 09:18:32 +0000 (17:18 +0800)]
[BAHIR-116] Add spark streaming connector to Google Cloud Pub/Sub
Cloases #42.
Subhobrata Dey [Wed, 31 May 2017 06:43:39 +0000 (23:43 -0700)]
[BAHIR-120] Akka SQL Streaming build fails with Apache Spark 2.1.1
Closes #44.
Clemens Wolff [Thu, 4 May 2017 19:52:54 +0000 (12:52 -0700)]
[BAHIR-117] Expand filtering options for TwitterInputDStream
Adds a new method to TwitterUtils that enables users to pass
an arbitrary FilterQuery down to the TwitterReceiver.
This enables use-cases like receiving Tweets based on location,
based on handle, etc. Previously users were only able to receive
Tweets based on disjunctive keyword queries.
Closes #43.