14 hours ago ARROW-2585: [C++] Add Decimal::FromBigEndian, which was formerly a static method... master
Joshua Storck [Tue, 22 May 2018 12:24:35 +0000 (14:24 +0200)] 
ARROW-2585: [C++] Add Decimal::FromBigEndian, which was formerly a static method in parquet-cpp/src/parquet/arrow/

Author: Joshua Storck <>

Closes #2036 from joshuastorck/decimal_from_big_endian and squashes the following commits:

e970c87 <Joshua Storck> Fixing lint errors
4cb4d89 <Joshua Storck> Adding Decimal::FromBigEndian, which was formerly a static method in parquet-cpp/src/parquet/arrow/

32 hours ago ARROW-2613: [Docs] Update the gen_apidocs docker script
Korn, Uwe [Mon, 21 May 2018 18:33:10 +0000 (14:33 -0400)] 
ARROW-2613: [Docs] Update the gen_apidocs docker script

Author: Korn, Uwe <>

Closes #2068 from xhochy/ARROW-2613 and squashes the following commits:

ec955c4a <Korn, Uwe> ARROW-2613:  Update the gen_apidocs docker script

37 hours ago ARROW-2614: Remove 'group: deprecated' in Travis
Korn, Uwe [Mon, 21 May 2018 13:42:24 +0000 (15:42 +0200)] 
ARROW-2614: Remove 'group: deprecated' in Travis

Author: Korn, Uwe <>

Closes #2069 from xhochy/remove-group-deprecated and squashes the following commits:

9b9e6191 <Korn, Uwe> ARROW-2614: Remove 'group: deprecated' in Travis

39 hours ago ARROW-2615: [Rust] Post refactor cleanup
Andy Grove [Mon, 21 May 2018 12:12:54 +0000 (14:12 +0200)] 
ARROW-2615: [Rust] Post refactor cleanup

Just a couple of trivial changes that got missed in the refactor:

- Derive Eq trait for DataType and Field (because I rely on that as a user of this library)
- ArrowPrimitiveType should NOT be implemented for strings

Author: Andy Grove <>

Closes #2070 from andygrove/post_refactor_cleanup and squashes the following commits:

6b245a4e <Andy Grove> add accessor methods to ListArray
3289e3bc <Andy Grove> Update examples
236fde8e <Andy Grove> Minor post-refactor cleanup
82765f93 <Andy Grove> Merge remote-tracking branch 'upstream/master'
d1bfdca5 <Andy Grove> Merge branch 'master' of
52de6a10 <Andy Grove> Merge branch 'master' of
0e2606b2 <Andy Grove> Merge remote-tracking branch 'upstream/master'
d883da2f <Andy Grove> Merge remote-tracking branch 'upstream/master'
589ef71d <Andy Grove> Merge remote-tracking branch 'upstream/master'
bd4fbb55 <Andy Grove> Merge remote-tracking branch 'upstream/master'
9c8a10a4 <Andy Grove> Merge remote-tracking branch 'upstream/master'
05592f8c <Andy Grove> Merge remote-tracking branch 'upstream/master'
8c0e6982 <Andy Grove> Merge remote-tracking branch 'upstream/master'
31ef90ba <Andy Grove> Merge remote-tracking branch 'upstream/master'
2f87c703 <Andy Grove> Fix build - add missing import

3 days ago ARROW-2597: [Plasma] remove UniqueIDHasher
Zhijun Fu [Sat, 19 May 2018 19:10:56 +0000 (12:10 -0700)] 
ARROW-2597: [Plasma] remove UniqueIDHasher

Replace UniqueIDHasher with std::hash so that STL containers keyed by ObjectID don't need to specify the hash function explicitly. This has already been done for Ray; this change applies it to Plasma.

Author: Zhijun Fu <>

Closes #2059 from zhijunfu/remove-UniqueIDHasher and squashes the following commits:

2498635a <Zhijun Fu> resolve review comments: remove const version of hash()
d5b51690 <Zhijun Fu>  remove UniqueIDHasher

4 days ago ARROW-2612: [Plasma] Fix deprecated PLASMA_DEFAULT_RELEASE_DELAY
Philipp Moritz [Sat, 19 May 2018 02:25:10 +0000 (19:25 -0700)] 
ARROW-2612: [Plasma] Fix deprecated PLASMA_DEFAULT_RELEASE_DELAY

Author: Philipp Moritz <>

Closes #2063 from pcmoritz/fix-plasma-deprecated-const and squashes the following commits:

b6e92f67 <Philipp Moritz> fix test
5e1d82b7 <Philipp Moritz> add test
469b59a6 <Philipp Moritz> fix deprecated PLASMA_DEFAULT_RELEASE_DELAY

5 days ago ARROW-2611: [Python] Fix Python 2 integer serialization
Peter Schafhalter [Fri, 18 May 2018 00:53:34 +0000 (17:53 -0700)] 
ARROW-2611: [Python] Fix Python 2 integer serialization

Fixes an issue where serialization turns integers into longs in Python 2.

In [1]: import pyarrow as pa

In [2]: value = 1

In [3]: type(value)
Out[3]: int

In [4]: serialized = pa.serialize(value)

In [5]: deserialized = serialized.deserialize()

In [6]: type(deserialized)
Out[6]: long

Author: Peter Schafhalter <>

Closes #2055 from pschafhalter/fix-python2-int-serialization and squashes the following commits:

7b96b679 <Peter Schafhalter> Fix bug with Python 3 C++ API
5d8ff268 <Peter Schafhalter> Add type checking in assert_equal
d5e5e5db <Peter Schafhalter> Fix python2 integer serialization bug

5 days ago [GLib] Use the default directory of GTK-Doc (#2058)
Kouhei Sutou [Thu, 17 May 2018 20:30:23 +0000 (05:30 +0900)] 
[GLib] Use the default directory of GTK-Doc (#2058)

We don't need to customize it.

5 days ago ARROW-2594: [Java] When realloc Vectors, zero out all unfilled bytes of new buffer
Bryan Cutler [Thu, 17 May 2018 18:20:45 +0000 (11:20 -0700)] 
ARROW-2594: [Java] When realloc Vectors, zero out all unfilled bytes of new buffer

Currently, when reallocating vectors, only the second half of the new buffer is zeroed out, on the assumption that the buffer has doubled in size from the previous one and that the first half is already populated or cleaned. This isn't the case if the vector has been cleared and the buffer is empty, which causes incorrect values in the new buffer if it was recycled from an old one.

Added a new test with a ListVector that should reuse a previous buffer after being cleared.
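
The failure mode described above can be illustrated with a minimal Python sketch. This is not the Java implementation: `realloc_half_zeroed`, `realloc_fully_zeroed`, and the recycled-buffer setup are hypothetical names chosen for illustration, and a `bytearray` stands in for an Arrow buffer.

```python
def realloc_half_zeroed(old: bytearray, recycled: bytearray) -> bytearray:
    """Sketch of the old behaviour: copy the old data, zero only the second half."""
    new = recycled  # buffer handed back by the allocator; may hold stale bytes
    new[:len(old)] = old
    half = len(new) // 2
    new[half:] = bytes(len(new) - half)
    return new


def realloc_fully_zeroed(old: bytearray, recycled: bytearray) -> bytearray:
    """Sketch of the fix: zero every byte that the old data did not fill."""
    new = recycled
    new[:len(old)] = old
    new[len(old):] = bytes(len(new) - len(old))
    return new


# A cleared vector has an empty old buffer, so nothing overwrites the first half.
stale = bytearray(b"\xff" * 8)
buggy = realloc_half_zeroed(bytearray(), bytearray(stale))
fixed = realloc_fully_zeroed(bytearray(), bytearray(stale))
```

With an empty old buffer, `buggy` keeps the stale bytes in its first half, while `fixed` is zeroed throughout.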

Author: Bryan Cutler <>

Closes #2054 from BryanCutler/java-vector-realloc-clear-buffer-ARROW-2594 and squashes the following commits:

28b8095 <Bryan Cutler> added a comment about clear
be3ee8f <Bryan Cutler> remove extra spaces
5a39790 <Bryan Cutler> zero out any newly allocated buffer bytes

5 days ago ARROW-2521: [Rust] Refactor Rust API to use traits and generic to represent Array...
Andy Grove [Thu, 17 May 2018 16:15:44 +0000 (18:15 +0200)] 
ARROW-2521: [Rust] Refactor Rust API to use traits and generic to represent Array instead of enum

Author: Andy Grove <>

Closes #1971 from andygrove/refactor_rust_api_v2 and squashes the following commits:

a04d66a9 <Andy Grove> cargo fmt with 1.26.0
f3f71dda <Andy Grove> Rename BufferArray to PrimitiveArray
10714a1f <Andy Grove> cargo fmt
b2d9e42e <Andy Grove> add assertions to RecordBatch
d577510c <Andy Grove> Remove need to clone array
be3a981d <Andy Grove> cargo fmt
22f907ab <Andy Grove> Renaming structs and traits and adding documentation
4add4f05 <Andy Grove> Revert "Add type coercion helper method"
51270de5 <Andy Grove> Add type coercion helper method
cc40ba45 <Andy Grove> Removing macros, implemented min/max for arrays of primitives
01bc9538 <Andy Grove> implement min/max for primitive array
b2659b10 <Andy Grove> run cargo fmt with stable rust
66c016e3 <Andy Grove> use usize instead of i32 (except for list offsets)
dbe49a74 <Andy Grove> Rebase
d1bfdca5 <Andy Grove> Merge branch 'master' of
2bae169e <Andy Grove> Refactor Rust API to use traits and generic to represent Array instead of enum
52de6a10 <Andy Grove> Merge branch 'master' of
0e2606b2 <Andy Grove> Merge remote-tracking branch 'upstream/master'
d883da2f <Andy Grove> Merge remote-tracking branch 'upstream/master'
589ef71d <Andy Grove> Merge remote-tracking branch 'upstream/master'
bd4fbb55 <Andy Grove> Merge remote-tracking branch 'upstream/master'
9c8a10a4 <Andy Grove> Merge remote-tracking branch 'upstream/master'
05592f8c <Andy Grove> Merge remote-tracking branch 'upstream/master'
8c0e6982 <Andy Grove> Merge remote-tracking branch 'upstream/master'
31ef90ba <Andy Grove> Merge remote-tracking branch 'upstream/master'
2f87c703 <Andy Grove> Fix build - add missing import

5 days ago ARROW-2574: [Python] Add Cython and Python code coverage
Antoine Pitrou [Thu, 17 May 2018 14:21:47 +0000 (16:21 +0200)] 
ARROW-2574: [Python] Add Cython and Python code coverage

After spending a non-trivial amount of time wrestling with Cython and our build system, we're now able to generate and upload Python and Cython coverage results as part of a Travis-CI run (in addition to C++ coverage).

Author: Antoine Pitrou <>

Closes #2050 from pitrou/ARROW-2574-cython-coverage and squashes the following commits:

4553185 <Antoine Pitrou> Remove leftover
b1212a4 <Antoine Pitrou> Silence "unknown warning option" error on clang
e1a5b4a <Antoine Pitrou> Disable ORC when building benchmarks
06b0665 <Antoine Pitrou> Try to fix Sphinx doc building
9b41d24 <Antoine Pitrou> Add nogil tracing
4014951 <Antoine Pitrou> ARROW-2574:  Add Cython and Python code coverage

5 days ago ARROW-2486: [C++/Python] Provide a Docker image that contains all dependencies for...
Aneesh Karve [Thu, 17 May 2018 13:50:17 +0000 (15:50 +0200)] 
ARROW-2486: [C++/Python] Provide a Docker image that contains all dependencies for development

Open items
- [x] Why is `py.test pyarrow` failing on plasma deps when script follows [docs](
- [x] Should `/script/*.sh` use the same code as developer docs to avoid denormalization?
- [x] Move docker image to Apache registry?
- [x] Multiple container strategy possible, but overly complex. Requires exposing volume on one container as a mount point for a second container. Only speeds up user's first build.
- [x] Are gcc/g++ 4.8 the ideal versions?
- [x] Unit tests needed?
- [x] Update README per resolution of above

Author: Aneesh Karve <>

Closes #2016 from akarve/master and squashes the following commits:

5aec17a8 <Aneesh Karve> final PR feedback; README indentation

5 days ago ARROW-2595: [Plasma] Use map.find instead of operator[] to avoid producing garbage...
senlin.zsl [Thu, 17 May 2018 04:07:15 +0000 (21:07 -0700)] 
ARROW-2595: [Plasma] Use map.find instead of operator[] to avoid producing garbage data

- Problem
  * Using object_get_requests_[object_id] in PlasmaStore::return_from_get produces a lot of garbage data, because operator[] inserts a default-constructed entry whenever the key is missing. While measuring memory usage, we found significant memory growth at this point.

- Solution
  * Use map.find (an iterator lookup) instead of operator[]
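
Python's `collections.defaultdict` has the same pitfall, which makes for a compact illustration. The sketch below is an analogy, not the Plasma C++ code: the `object_get_requests` name mirrors the member mentioned above, and the keys and values are made up.

```python
from collections import defaultdict

object_get_requests = defaultdict(list)
object_get_requests["obj-1"].append("request-a")

# Indexing inserts an empty list for every missing key, like C++ operator[].
for object_id in ("obj-2", "obj-3"):
    _ = object_get_requests[object_id]
size_after_indexing = len(object_get_requests)

# A lookup that never inserts, analogous to map.find().
lookups = defaultdict(list)
lookups["obj-1"].append("request-a")
for object_id in ("obj-2", "obj-3"):
    requests = lookups.get(object_id)  # None for a missing key, nothing inserted
size_after_get = len(lookups)
```

After the first loop the map holds two extra empty entries; after the second it is unchanged.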

Author: senlin.zsl <>

Closes #2056 from wumuzi520/dev_slz and squashes the following commits:

ccaab502 <senlin.zsl> Use map.find instead of operator to avoid producing garbage data

6 days ago ARROW-2561: [C++] Fix double free in cuda-test under code coverage
Antoine Pitrou [Wed, 16 May 2018 20:29:40 +0000 (22:29 +0200)] 
ARROW-2561: [C++] Fix double free in cuda-test under code coverage

As far as I can understand, the problem is due to both shared and static linking with libarrow.  Some static std::string in would be destroyed twice at shutdown.  Linking entirely statically seems to fix the issue.

Author: Antoine Pitrou <>

Closes #2048 from pitrou/ARROW-2561 and squashes the following commits:

7a9d1b5e <Antoine Pitrou> Add comment and do not mention arrow_shared in static link libs
0b40b802 <Antoine Pitrou> ARROW-2561:  Fix double free in cuda-test under code coverage

6 days ago ARROW-2589: [Python] Workaround regression in Pandas 0.23.0
Antoine Pitrou [Wed, 16 May 2018 16:13:05 +0000 (18:13 +0200)] 
ARROW-2589: [Python] Workaround regression in Pandas 0.23.0

There is a regression (*) in Pandas 0.23.0 that breaks
Pandas does not have an actual "str" dtype anyway, so pass "object" instead.

(*) pandas-dev/pandas#21083

Author: Antoine Pitrou <>

Closes #2051 from pitrou/ARROW-2589 and squashes the following commits:

b581ef36 <Antoine Pitrou> ARROW-2589:  Workaround regression in Pandas 0.23.0

7 days ago ARROW-2582: [GLib] Add negate functions for Decimal128
yosuke shiro [Tue, 15 May 2018 23:50:50 +0000 (08:50 +0900)] 
ARROW-2582: [GLib] Add negate functions for Decimal128

Add garrow_decimal128_negate().

Author: yosuke shiro <>

Closes #2047 from shiro615/ARROW-2576-add-negate-functions and squashes the following commits:

7a63d005 [yosuke shiro] rename absolute_value to positive_value
815f4492 [yosuke shiro] Add negate functions

7 days ago ARROW-2558: [Plasma] avoid walk through all the objects when a client disconnects
Zhijun Fu [Tue, 15 May 2018 22:54:25 +0000 (15:54 -0700)] 
ARROW-2558: [Plasma] avoid walk through all the objects when a client disconnects

Currently Plasma stores a list of clients in each ObjectTableEntry to track which clients are using a given object. This serves two purposes:
- Determining whether an object is in use.
- Determining whether the client trying to abort an object is the one that created it.

A problem with the list-of-clients approach is that when a client disconnects, we need to walk through all the objects and remove the client pointer from the list for each object.

Instead, we could add a reference count to ObjectTableEntry and store a list of object IDs in the client structure. This achieves both goals of the original approach, and when a client disconnects it only needs to walk through its own object IDs and decrement the reference count of each ObjectTableEntry; there is no need to walk through all objects.
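
A minimal Python sketch of the reference-counting idea, under stated assumptions: `Store`, `Client`, and the field names are hypothetical, only `ObjectTableEntry` is taken from the description above, and the sketch covers only the in-use tracking, not the creator check.

```python
class ObjectTableEntry:
    def __init__(self):
        self.ref_count = 0  # number of clients currently using the object


class Client:
    def __init__(self):
        self.object_ids = set()  # IDs of the objects this client is using


class Store:
    def __init__(self):
        self.objects = {}

    def add_object(self, object_id):
        self.objects[object_id] = ObjectTableEntry()

    def use(self, client, object_id):
        # Count each (client, object) pair once.
        if object_id not in client.object_ids:
            client.object_ids.add(object_id)
            self.objects[object_id].ref_count += 1

    def disconnect(self, client):
        # Touch only the objects this client holds, not the whole table.
        for object_id in client.object_ids:
            self.objects[object_id].ref_count -= 1
        client.object_ids.clear()
```

The cost of `disconnect` is proportional to the number of objects the client holds rather than to the size of the object table.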

Author: Zhijun Fu <>

Closes #2015 from zhijunfu/client_object_ids and squashes the following commits:

d8db8f75 <Zhijun Fu> Address comments from pcmoritz
8a439e88 <Zhijun Fu> Trigger
a0475725 <Zhijun Fu>  use list-of-object-ids in client instead of list-of-clients in object

7 days ago ARROW-2584: [JS] Fixes for node v10
ptaylor [Tue, 15 May 2018 21:20:22 +0000 (17:20 -0400)] 
ARROW-2584: [JS] Fixes for node v10

Fixes for node v10 found by @trxcllnt

Author: ptaylor <>
Author: Brian Hulette <>

Closes #2049 from TheNeuralBit/fix-node-v10 and squashes the following commits:

1eec12a2 <Brian Hulette> Bump CI node version to 10.1
60f4c644 <ptaylor> fix instanceof ArrayBuffer in jest/node 10
ba0416fd <ptaylor> fix @std/esm options for node10

7 days ago Removing extraneous debug print statement from (#2045)
joshuastorck [Tue, 15 May 2018 08:28:39 +0000 (04:28 -0400)] 
Removing extraneous debug print statement from (#2045)

7 days ago ARROW-2332: Add Feather Dataset class
Dhruv Madeka [Tue, 15 May 2018 08:09:23 +0000 (10:09 +0200)] 
ARROW-2332: Add Feather Dataset class

Added a class to read a list of `feather` files into a PyArrow table or a pandas DataFrame

Author: Dhruv Madeka <>

Closes #2040 from dmadeka/feather-dataset and squashes the following commits:

044e2c63 <Dhruv Madeka> Add Feather Dataset class

7 days ago Serialize tensors in PyTorch 0.4 (#2033)
Alok Singh [Tue, 15 May 2018 05:35:22 +0000 (22:35 -0700)] 
Serialize tensors in PyTorch 0.4 (#2033)

As of PyTorch 0.4.0, `type(t)` where `t` is a tensor will always equal `torch.Tensor`. To get the original types, you need to call `t.type()`. This change fixes serialization for torch tensors. Otherwise, they're truncated (at least upon deserialization in Ray).

8 days ago ARROW-2577: [Plasma] Add asv benchmarks for plasma
Philipp Moritz [Mon, 14 May 2018 23:56:58 +0000 (16:56 -0700)] 
ARROW-2577: [Plasma] Add asv benchmarks for plasma

This adds some initial ASV benchmarks for plasma:

- Put latency
- Get latency
- Put throughput for 1KB, 10KB, 100KB, 1MB, 10MB, 100MB

It also includes some minor code restructuring to expose the start_plasma_store method.

Author: Philipp Moritz <>

Closes #2038 from pcmoritz/plasma-asv and squashes the following commits:

34a06845 <Philipp Moritz> measure wallclock time instead of process cpu time
c89256f7 <Philipp Moritz> parametrize tests
3567ddc7 <Philipp Moritz> fix windows build
eca17675 <Philipp Moritz> build plasma in asv
47671b34 <Philipp Moritz> fix
1261177e <Philipp Moritz> fix linting errors
7d4d6854 <Philipp Moritz> Add asv benchmarks for plasma

8 days ago ARROW-2580: [GLib] Fix abs functions for Decimal128
yosuke shiro [Mon, 14 May 2018 21:19:13 +0000 (06:19 +0900)] 
ARROW-2580: [GLib] Fix abs functions for Decimal128

I fixed about

Author: yosuke shiro <>

Closes #2044 from shiro615/ARROW-2580-fix-abs-functions and squashes the following commits:

d3c7d8ab [yosuke shiro] Fix abs functions

8 days ago ARROW-2563: [Rust] Poor caching in Travis-CI
Chao Sun [Mon, 14 May 2018 20:46:28 +0000 (22:46 +0200)] 
ARROW-2563: [Rust] Poor caching in Travis-CI

Author: Chao Sun <>

Closes #2021 from sunchao/arrow-2563 and squashes the following commits:

a65dd84 <Chao Sun> fix merge coverage report issue
43aa206 <Chao Sun> ARROW-2563:  Poor caching in Travis-CI

8 days ago ARROW-2578: [Plasma] Use mersenne twister to generate random number
Philipp Moritz [Mon, 14 May 2018 08:02:35 +0000 (01:02 -0700)] 
ARROW-2578: [Plasma] Use mersenne twister to generate random number

This gets rid of the std::random_device, which is slow and causes errors in valgrind. Instead we use the std::mt19937 Mersenne Twister.
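
As a rough illustration of the pattern (seed a Mersenne Twister once, then draw all random bytes from it), here is a Python sketch. Python's `random.Random` is itself an MT19937 generator. The time-based seeding, the 20-byte ID size, and the `random_object_id` helper are assumptions made for this example and are not taken from the Plasma code.

```python
import random
import time

# Seed once up front; every later draw comes from the Mersenne Twister state.
_rng = random.Random(time.time_ns())


def random_object_id(rng=_rng, size=20):
    """Return `size` pseudo-random bytes drawn from `rng`."""
    return bytes(rng.getrandbits(8) for _ in range(size))
```

Like std::mt19937, this generator is fast but not cryptographically secure, so it suits ID generation only where unpredictability is not a requirement.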

Author: Philipp Moritz <>

Closes #2039 from pcmoritz/new-rng and squashes the following commits:

21d0e3f7 <Philipp Moritz> fixes
be4bb84d <Philipp Moritz> fix
beb5bab8 <Philipp Moritz> update
83740b5c <Philipp Moritz> update
f60bd99c <Philipp Moritz> more valgrind fixes
62d412f3 <Philipp Moritz> fix on older versions of macOS
841a67f0 <Philipp Moritz> fix linting
cd95cf15 <Philipp Moritz> use mersenne twister to generate random number

9 days ago ARROW-2576: [GLib] Add abs functions for Decimal128
yosuke shiro [Sun, 13 May 2018 17:32:54 +0000 (19:32 +0200)] 
ARROW-2576: [GLib] Add abs functions for Decimal128

Add garrow_decimal128_abs().

Author: yosuke shiro <>

Closes #2037 from shiro615/ARROW-2576-add-abs-functions and squashes the following commits:

c34a5de9 <yosuke shiro> Add abs functions test case
0b9a9d4f <yosuke shiro> Add abs functions

9 days ago ARROW-2571: [C++] Lz4Codec doesn't properly handle empty data
Dmitry Kalinkin [Sun, 13 May 2018 16:26:36 +0000 (18:26 +0200)] 
ARROW-2571: [C++] Lz4Codec doesn't properly handle empty data

From the lz4 manual [1]:

int LZ4_compress_default(const char* src, char* dst, int srcSize, int
return : the number of bytes written into buffer 'dst' (necessarily <= dstCapacity)
         or 0 if compression fails

int LZ4_decompress_safe (const char* src, char* dst, int compressedSize,
      int dstCapacity);
return : the number of bytes decompressed into destination buffer (necessarily <= dstCapacity)
         If destination buffer is not large enough, decoding will stop and output an error code (negative value).
         If the source stream is detected malformed, the function will stop decoding and return a negative result.
         This function is protected against malicious data packets.

Fixes: 83a4405ea0 ('ARROW-599: [C++] Lz4 compression codec support')


Author: Dmitry Kalinkin <>

Closes #2032 from veprbl/pr/ARROW-2571 and squashes the following commits:

e768eded <Dmitry Kalinkin> ARROW-2571:  use instead of &data, as latter has side effects
2a512864 <Dmitry Kalinkin> ARROW-2571:  add a test case
e263bf3b <Dmitry Kalinkin> ARROW-2571:  fix Lz4Codec to properly handle empty data

9 days ago ARROW-2569: [C++] Improve thread pool size heuristic
Antoine Pitrou [Sun, 13 May 2018 15:18:03 +0000 (17:18 +0200)] 
ARROW-2569: [C++] Improve thread pool size heuristic

The heuristic goes this way:
- if the OMP_NUM_THREADS environment variable exists, it defines the baseline
  number of available threads
- otherwise, the baseline is the value returned by std::thread::hardware_concurrency()
- the OMP_THREAD_LIMIT environment variable, if it exists, defines the upper bound
  for the final value, i.e. we return min(baseline, limit), otherwise we just
  return the baseline.

This is the heuristic used by other packages such as the GNU "nproc" utility.
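
The heuristic above can be sketched in Python as follows. This is an illustration rather than the C++ implementation: the function names are hypothetical, `os.cpu_count()` stands in for std::thread::hardware_concurrency(), and ignoring unparsable or non-positive values is an assumption of this sketch.

```python
import os


def parse_positive_int(name, environ):
    """Return the variable as a positive int, or None if unset or invalid."""
    value = environ.get(name)
    if value is None:
        return None
    try:
        parsed = int(value)
    except ValueError:
        return None
    return parsed if parsed > 0 else None


def default_thread_count(environ=None, hardware_concurrency=None):
    environ = os.environ if environ is None else environ
    # Baseline: OMP_NUM_THREADS if set, otherwise the hardware concurrency.
    baseline = parse_positive_int("OMP_NUM_THREADS", environ)
    if baseline is None:
        baseline = hardware_concurrency or os.cpu_count() or 1
    # Upper bound: OMP_THREAD_LIMIT, if set.
    limit = parse_positive_int("OMP_THREAD_LIMIT", environ)
    return min(baseline, limit) if limit is not None else baseline
```

For example, with `OMP_NUM_THREADS=8` and `OMP_THREAD_LIMIT=2`, the sketch returns 2.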

Author: Antoine Pitrou <>

Closes #2026 from pitrou/ARROW-2569-thread-pool-heuristic and squashes the following commits:

5572b075 <Antoine Pitrou> Factor out environment variable helpers
2dd0d12e <Antoine Pitrou> ARROW-2569:  Improve thread pool size heuristic

10 days ago ARROW-2567: [C++] Not only compare type ids on Array equality
Korn, Uwe [Sat, 12 May 2018 07:20:11 +0000 (09:20 +0200)] 
ARROW-2567: [C++] Not only compare type ids on Array equality

Author: Korn, Uwe <>

Closes #2025 from xhochy/ARROW-2567 and squashes the following commits:

2db252ae <Korn, Uwe> ARROW-2567:  Not only compare type ids on Array equality

11 days ago ARROW-2517: [Java] Add list<decimal> writer (#1965)
Teddy Choi [Fri, 11 May 2018 23:04:54 +0000 (08:04 +0900)] 
ARROW-2517: [Java] Add list<decimal> writer (#1965)

* ARROW-2368: [Java] Add list<decimal> writer

* Fixed TestComplexWriter.listDecimalType test failure

* Support more use cases of list<decimal>

* Remove AbstractPromotableFieldWriter.getWriter(MinorType type, ArrowType arrowType)

11 days ago ARROW-2207: [GLib] Support GArrowDecimal128
yosuke shiro [Fri, 11 May 2018 21:31:15 +0000 (06:31 +0900)] 
ARROW-2207: [GLib] Support GArrowDecimal128

Support GArrowDecimal128.

- Add garrow_decimal128_to_string_scale()
- Add garrow_decimal128_to_string()

Author: yosuke shiro <>

Closes #2012 from shiro615/support-garrow-decimal128 and squashes the following commits:

6ef73317 [yosuke shiro] change documents
a878a8bd [yosuke shiro] update arrow-glib/
ee29ef8f [yosuke shiro] fix test case
1ba81765 [yosuke shiro] rename to garrow_decimal128_to_string from garrow_decimal128_to_integer_string
96e3f287 [yosuke shiro] rename to garrow_decimal128_to_string_scale from garrow_decimal128_to_string
d849002a [yosuke shiro] fix indent and release version
ed18e560 [yosuke shiro] fix build errors
5ca31c40 [yosuke shiro] [GLib] support GArrowDecimal128

11 days ago ARROW-2500: [Java] IPC Writers/readers are not always setting validity bits correctly
bomeng [Fri, 11 May 2018 17:04:26 +0000 (10:04 -0700)] 
ARROW-2500: [Java] IPC Writers/readers are not always setting validity bits correctly

Fix the issue where getNullCount() does not return the correct result in certain cases.

For example, when validityBuffer is 0b10110 and valueCount=3, the null count should be 1, but currently it returns 0.

Fix approach:
1. Based on the valueCount, modify the last byte by masking the remaining (unused) bits to be all 1's.
0b10110 will become 0b11111110
2. Count the number of 1 bits in each byte by using bitCount()
3. Use 8 * sizeInBytes - count to get the total number of 0's
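
The three steps can be traced with a short Python sketch. It assumes the usual Arrow convention that validity bits are stored least-significant-bit first and that the buffer holds at least `ceil(value_count / 8)` bytes; the `null_count` function is a hypothetical stand-in for the Java method.

```python
def null_count(validity: bytes, value_count: int) -> int:
    if value_count == 0:
        return 0
    size_in_bytes = (value_count + 7) // 8
    data = bytearray(validity[:size_in_bytes])

    # Step 1: set the unused high bits of the last byte to 1
    # so that they are never counted as nulls.
    used_bits = value_count % 8
    if used_bits:
        data[-1] |= (0xFF << used_bits) & 0xFF

    # Step 2: count the 1 bits in every byte.
    ones = sum(bin(byte).count("1") for byte in data)

    # Step 3: the remaining bits are the nulls.
    return 8 * size_in_bytes - ones
```

For the example above, `null_count(bytes([0b10110]), 3)` masks the byte to 0b11111110, counts seven 1 bits, and returns 8 - 7 = 1.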

Added 2 tests to the existing test classes.
Created 1 new file to purposely test, since it has some public static methods and we may add more tests in the future.

Author: bomeng <>

Closes #2008 from bomeng/2500 and squashes the following commits:

7f914e5 <bomeng> improvement
ccc250d <bomeng> improvement
317d62a <bomeng> improvement
72ceedf <bomeng> improvement based on comments
03fa5d5 <bomeng> fix ARROW-2500: getNullCount() returns incorrect result

11 days ago ARROW-2570: [Python] Add support for writing parquet files with LZ4 compression
Dmitry Kalinkin [Fri, 11 May 2018 10:44:00 +0000 (12:44 +0200)] 
ARROW-2570: [Python] Add support for writing parquet files with LZ4 compression

Author: Dmitry Kalinkin <>

Closes #2030 from veprbl/pr/ARROW-2570 and squashes the following commits:

c8af8b1 <Dmitry Kalinkin> ARROW-2570:  Add support for writing parquet files with LZ4 compression

12 days ago ARROW-2565: [Plasma] new subscriber cannot receive notifications about existing objects
Zhijun Fu [Thu, 10 May 2018 19:17:54 +0000 (12:17 -0700)] 
ARROW-2565: [Plasma] new subscriber cannot receive notifications about existing objects

When a client subscribes to the plasma store, we need to add its file descriptor to the pending_notifications_ map, so that push_notifications() can find the new client and push notifications about existing objects to it.
Also added a unit test case to cover this.

@pcmoritz  may you kindly help to take a look please? thanks:)

Author: Zhijun Fu <>

Closes #2022 from zhijunfu/refactor-code and squashes the following commits:

398354ef <Zhijun Fu>  Fix issue that new subscriber can't receive notifications about existing objects, and add unit test

12 days ago ARROW-2479: [C++] Add ThreadPool class
Antoine Pitrou [Thu, 10 May 2018 15:47:14 +0000 (17:47 +0200)] 
ARROW-2479: [C++] Add ThreadPool class

* A ThreadPool class with future-returning task submission, and the ability to change number of worker threads on-the-fly
* Tests for the ThreadPool class, including stress tests
* A singleton thread pool for cpu-bound tasks, configured based on hardware capacity
* A public API to change global thread pool capacity
* Migrated the Arrow codebase to using the global thread pool (except APIs taking a `nthreads`, see below)
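
For readers unfamiliar with future-returning task submission, the pattern looks roughly like this in Python's standard `concurrent.futures` module. This is an analogy only, not the Arrow C++ API: the `square` task is made up, and unlike the ThreadPool described above, Python's executor cannot change its number of workers after creation.

```python
from concurrent.futures import ThreadPoolExecutor


def square(x):
    return x * x


with ThreadPoolExecutor(max_workers=4) as pool:
    # submit() returns immediately with a future for each task.
    futures = [pool.submit(square, n) for n in range(5)]
    # result() blocks until the corresponding task has finished.
    results = [future.result() for future in futures]
```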

Remaining open question:
* [ ] what do we do with APIs that take a user-facing `nthreads` argument? (the Pandas conversion routines, which are able to convert/copy different columns in parallel)

Author: Antoine Pitrou <>

Closes #1953 from pitrou/ARROW-2479-threadpool and squashes the following commits:

cea94b4 <Antoine Pitrou> Fix typo
1a96830 <Antoine Pitrou> Explicitly expose std::__once_call* in SO files.
154860c <Antoine Pitrou> Adjust
ab41c7c <Antoine Pitrou> Use global thread pool in Plasma
60f1c62 <Antoine Pitrou> Add process-global thread pool
ad8fa41 <Antoine Pitrou> ARROW-2479:  Add ThreadPool class

12 days ago ARROW-1964: [Python] Expose StringBuilder to Python
Donal Simmie [Thu, 10 May 2018 15:28:53 +0000 (17:28 +0200)] 
ARROW-1964: [Python] Expose StringBuilder to Python

* Partial implementation of ARROW-1964
* Only implements StringBuilder not the other builders, notably the DictionaryBuilder mentioned in issue 1964

Author: Donal Simmie <>

Closes #1930 from dsimmie/ARROW-1964-Partial and squashes the following commits:

e4e54ec <Donal Simmie> Fixed review comments from #1930
be2bd39 <Donal Simmie> Exposed StringBuilder to Cython - partial implementation of ARROW-1964

12 days ago ARROW-2566: [CI] Add badge
Antoine Pitrou [Thu, 10 May 2018 15:26:50 +0000 (17:26 +0200)] 
ARROW-2566: [CI] Add badge

Author: Antoine Pitrou <>

Closes #2024 from pitrou/ARROW-2566 and squashes the following commits:

8e243498 <Antoine Pitrou> ARROW-2566:  Add badge

12 days ago ARROW-2562: [CI] C++ and Rust code coverage using
Antoine Pitrou [Thu, 10 May 2018 15:09:19 +0000 (17:09 +0200)] 
ARROW-2562: [CI] C++ and Rust code coverage using

Collect C++ and Rust coverage info on Travis-CI build, and upload it to
The C++ coverage info is gathered from both the C++ and Python test suites.

Author: Antoine Pitrou <>

Closes #2023 from pitrou/ARROW-2562-codecov-io and squashes the following commits:

8804ab5 <Antoine Pitrou> ARROW-2562:  C++ and Rust code coverage using

12 days ago ARROW-2564: [C++] Replace deprecated method in documentation
Kendall Willets [Thu, 10 May 2018 09:38:08 +0000 (11:38 +0200)] 
ARROW-2564: [C++] Replace deprecated method in documentation

Tutorial update to keep working code.

Author: Kendall Willets <>

Closes #2020 from KWillets/master and squashes the following commits:

e6da7f1f <Kendall Willets> fit coding style (?)
e173ae43 <Kendall Willets> remove deprecated method

12 days ago ARROW-2557: [Rust] Add badge for code coverage in README
Chao Sun [Thu, 10 May 2018 09:32:20 +0000 (11:32 +0200)] 
ARROW-2557: [Rust] Add badge for code coverage in README

Author: Chao Sun <>

Closes #2014 from sunchao/code-coverage-badge and squashes the following commits:

c39e91c8 <Chao Sun> ARROW-2557:  Add badge for code coverage in README

12 days ago ARROW-2552: [Plasma] Fix memory error
Philipp Moritz [Thu, 10 May 2018 04:03:36 +0000 (21:03 -0700)] 
ARROW-2552: [Plasma] Fix memory error

I reran Travis 12 times and the test failure didn't happen (the fix is in

Author: Philipp Moritz <>

Closes #2019 from pcmoritz/fix-memcheck and squashes the following commits:

3540d139 <Philipp Moritz> bring back zero initialization
8dce7721 <Philipp Moritz> initialize struct
51d2588f <Philipp Moritz> fix
84b6ca75 <Philipp Moritz> cleanups
7e03ac83 <Philipp Moritz> fix memory error

2 weeks ago ARROW-2491: [Python] raise NotImplementedError on from_buffers with nested types
Korn, Uwe [Tue, 8 May 2018 17:15:40 +0000 (19:15 +0200)] 
ARROW-2491: [Python] raise NotImplementedError on from_buffers with nested types

Author: Korn, Uwe <>

Closes #1927 from xhochy/ARROW-2491 and squashes the following commits:

f2300dbf <Korn, Uwe> Test for NotImplementedError
0dbcc34f <Korn, Uwe> ARROW-2491: raise NotImplementedError on from_buffers with nested types

2 weeks ago ARROW-2549: [GLib] Apply arrow::StatusCode changes to GArrowError
Kouhei Sutou [Tue, 8 May 2018 07:57:58 +0000 (09:57 +0200)] 
ARROW-2549: [GLib] Apply arrow::StatusCode changes to GArrowError

Author: Kouhei Sutou <>

Closes #2009 from kou/glib-update-status and squashes the following commits:

8be78984 <Kouhei Sutou>  Apply arrow::StatusCode changes to GArrowError

2 weeks ago ARROW-2550: [C++] Add missing status codes into arrow::Status::CodeAsString()
Kouhei Sutou [Tue, 8 May 2018 07:57:16 +0000 (09:57 +0200)] 
ARROW-2550: [C++] Add missing status codes into arrow::Status::CodeAsString()

Author: Kouhei Sutou <>

Closes #2010 from kou/cpp-status-code-as-string-update and squashes the following commits:

dd12f56f <Kouhei Sutou>  Add missing status codes into arrow::Status::CodeAsString()

2 weeks ago ARROW-2546: [JS] Update to npm>=5.7.1 to fight EINTEGRITY problems
Korn, Uwe [Mon, 7 May 2018 21:45:52 +0000 (17:45 -0400)] 
ARROW-2546: [JS] Update to npm>=5.7.1 to fight EINTEGRITY problems

Author: Korn, Uwe <>

Closes #2006 from xhochy/ARROW-2546 and squashes the following commits:

39fec71 <Korn, Uwe> ARROW-2546:  Update to npm>=5.7.1 to fight EINTEGRITY problems

2 weeks ago ARROW-2540: [Plasma] Create constructors & destructors for ObjectTableEntry
Zhijun Fu [Mon, 7 May 2018 19:30:47 +0000 (12:30 -0700)] 
ARROW-2540: [Plasma] Create constructors & destructors for ObjectTableEntry

This makes sure dlfree() is called for pointer field automatically

Author: Zhijun Fu <>

Closes #1996 from zhijunfu/dlfree and squashes the following commits:

9363b4c5 <Zhijun Fu> Trigger travis build
9f56a850 <Zhijun Fu> re-trigger travis build
3246b843 <Zhijun Fu> fix format check
a8c67b84 <Zhijun Fu>  Create constructors & destructors for ObjectTableEntry

2 weeks ago ARROW-2545: [Python] Link against required system libraries
Antoine Pitrou [Mon, 7 May 2018 18:59:43 +0000 (20:59 +0200)] 
ARROW-2545: [Python] Link against required system libraries

Python can require system libraries such as "libutil".  When linking dynamically against, this is not a problem since those libraries are automatically depended on.  When linking statically
against libpythonXX.a (this is the case when Python itself is statically linked and therefore does not provide a .so), you need to specify those libraries explicitly.

Author: Antoine Pitrou <>

Closes #2005 from pitrou/ARROW-2545-python-other-libs and squashes the following commits:

773ad168 <Antoine Pitrou> ARROW-2545:  Link against required system libraries

2 weeks ago ARROW-2477: [Rust] Set up code coverage in CI
Chao Sun [Mon, 7 May 2018 18:58:36 +0000 (20:58 +0200)] 
ARROW-2477: [Rust] Set up code coverage in CI

It may require some setup on the I'm just giving it a blind try right now.

Author: Chao Sun <>

Closes #1942 from sunchao/rust-code-coverage and squashes the following commits:

3c351ca8 <Chao Sun> fix
061b66f2 <Chao Sun> try again
0142a71a <Chao Sun> try again
bd53614f <Chao Sun> try again
697adc7b <Chao Sun> more fix
81996e83 <Chao Sun> more fix
14e7a590 <Chao Sun> use travis-cargo
5af1bd07 <Chao Sun> ARROW-2477:  Set up code coverage in CI

2 weeks ago ARROW-2285: [C++/Python] Can't convert Numpy string arrays
Krisztián Szűcs [Mon, 7 May 2018 18:55:20 +0000 (20:55 +0200)] 
ARROW-2285: [C++/Python] Can't convert Numpy string arrays

Author: Krisztián Szűcs <>

Closes #1998 from kszucs/ARROW-2285 and squashes the following commits:

32ae6ff3 <Krisztián Szűcs> match on both utf8 and utf-8 error msg
aa2bb6dc <Krisztián Szűcs> fix raise assertion
e15ed680 <Krisztián Szűcs> test convert unicode array
448ee49b <Krisztián Szűcs> convert numpy string array to fixed sized binary

2 weeks ago ARROW-2548: Clarify `List<Char>` Array example
Frank Wessels [Mon, 7 May 2018 18:53:55 +0000 (20:53 +0200)] 
ARROW-2548: Clarify `List<Char>` Array example

`joemark` only spans bytes 0 through to 6, so starting from byte position 7 (not 8) the contents can be unspecified.

(or is byte 7 nulled out or something?)

Author: Frank Wessels <>

Closes #1999 from fwessels/fwessels-patch-1 and squashes the following commits:

b7e746b5 <Frank Wessels> Clarify `List<Char>` Array example

2 weeks agoARROW-2547: Fix off-by-one in `List<List<byte>>` example
Frank Wessels [Mon, 7 May 2018 18:51:11 +0000 (20:51 +0200)] 
ARROW-2547: Fix off-by-one in `List<List<byte>>` example

Nested Offsets buffer occupies bytes 0-27, not 0-28.
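
The corrected range can be checked with simple arithmetic. The sketch below assumes the nested offsets buffer holds seven 32-bit offsets; the specific offset values are hypothetical and serve only to produce a buffer of the right size.

```python
import struct


def offsets_byte_range(num_offsets, width=4):
    """Return the inclusive (first, last) byte positions of an offsets buffer."""
    if num_offsets <= 0:
        raise ValueError("an offsets buffer needs at least one entry")
    return 0, num_offsets * width - 1


# Hypothetical offsets for a nested list array with six slots (seven offsets).
offsets = [0, 2, 4, 7, 7, 8, 10]
packed = struct.pack(f"<{len(offsets)}i", *offsets)  # little-endian int32 values

print(len(packed), offsets_byte_range(len(offsets)))
```

Seven 4-byte offsets take 28 bytes, so the last occupied byte position is 27.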

Author: Frank Wessels <>

Closes #2000 from fwessels/fwessels-patch-2 and squashes the following commits:

d44794da <Frank Wessels> `List<List<byte>>` example

2 weeks agoARROW-2389: [C++] Add CapacityError
Antoine Pitrou [Mon, 7 May 2018 11:37:14 +0000 (13:37 +0200)] 
ARROW-2389: [C++] Add CapacityError

This error signals an attempt to exceed capacity of a buffer or container. I initially thought I'd call this `OverflowError` but `CapacityError` makes it clearer that it's not about overflow on arithmetic operations, IMHO.

Author: Antoine Pitrou <>

Closes #1991 from pitrou/ARROW-2389-capacity-error and squashes the following commits:

afc037d <Antoine Pitrou> ARROW-2389:  Add CapacityError

2 weeks agoARROW-2544: [CI] Run the C++ tests with two jobs
Antoine Pitrou [Mon, 7 May 2018 10:19:40 +0000 (12:19 +0200)] 
ARROW-2544: [CI] Run the C++ tests with two jobs

Travis-CI provides workers with two cores. We should use them.
This should make testing our C++ implementation faster.

Author: Antoine Pitrou <>
Author: Omer Katz <>

Closes #1899 from thedrow/patch-1 and squashes the following commits:

ae8ad52 <Antoine Pitrou> Use `--output-on-failure`
84c9ee1 <Omer Katz> Run the tests with two jobs.

2 weeks agoARROW-2543: [Rust] Cache dependencies when building our rust library
Omer Katz [Mon, 7 May 2018 08:13:37 +0000 (10:13 +0200)] 
ARROW-2543: [Rust] Cache dependencies when building our rust library

Because why not?

Author: Omer Katz <>

Closes #2003 from thedrow/patch-2 and squashes the following commits:

dbb418ed <Omer Katz>  Cache dependencies when building our rust library.

2 weeks ago[Website] Update SciDB in "Powered By" (#2004)
rvernica [Mon, 7 May 2018 07:22:55 +0000 (00:22 -0700)] 
[Website] Update SciDB in "Powered By" (#2004)

2 weeks agoARROW-2273: [Python] Raise NotImplementedError when pandas Sparse types serializing
Licht-T [Sat, 5 May 2018 08:47:20 +0000 (10:47 +0200)] 
ARROW-2273: [Python] Raise NotImplementedError when pandas Sparse types serializing

This fixes ARROW-2273.

`pandas` Sparse types are planned to be deprecated in future pandas releases.
`SparseDataFrame` and `SparseSeries` are naive implementations and have many bugs. IMO, this is not the right time to support these in `pyarrow`.

Author: Licht-T <>

Closes #1997 from Licht-T/add-pandas-sparse-unsupported-msg and squashes the following commits:

64e24cee <Licht-T> ENH: Raise NotImplementedError when pandas Sparse types serializing pandas Sparse types are planned to be deprecated in pandas future releases.

2 weeks agoARROW-2541: [Plasma] Replace macros with constexpr
Philipp Moritz [Sat, 5 May 2018 08:37:48 +0000 (10:37 +0200)] 
ARROW-2541: [Plasma] Replace macros with constexpr

Author: Philipp Moritz <>

Closes #2001 from pcmoritz/remove-macros and squashes the following commits:

cc3a7d24 <Philipp Moritz> fix comments
3ac436b1 <Philipp Moritz> fix linting
de7b0b63 <Philipp Moritz> more cleanups
9024214e <Philipp Moritz> fix linting
a59e1b5b <Philipp Moritz> fix documentation
176b7c66 <Philipp Moritz> clean up macros

2 weeks agoARROW-2539: [Plasma] Use unique_ptr instead of raw pointer
Zhijun Fu [Fri, 4 May 2018 19:26:29 +0000 (12:26 -0700)] 
ARROW-2539: [Plasma] Use unique_ptr instead of raw pointer

Use unique_ptr instead of a raw pointer so that allocated memory is freed automatically.

Author: Zhijun Fu <>

Closes #1993 from zhijunfu/improve-code and squashes the following commits:

3c69ada9 <Zhijun Fu> fix format check
6bfebc2d <Zhijun Fu> fix lint
b5b2fac2 <Zhijun Fu> fix build on travis-ci
d4d64b02 <Zhijun Fu> Merge branch 'master' of into improve-code
84b7e371 <Zhijun Fu>  Use unique_ptr instead of raw pointer

2 weeks agoARROW-2478: [C++] Introduce a checked_cast function that performs a dynamic_cast...
Phillip Cloud [Fri, 4 May 2018 08:21:14 +0000 (10:21 +0200)] 
ARROW-2478: [C++] Introduce a checked_cast function that performs a dynamic_cast in debug mode

Author: Phillip Cloud <>

Closes #1937 from cpcloud/ARROW-2478 and squashes the following commits:

afa88af2 <Phillip Cloud> ARROW-2478:  Introduce a checked_cast function that performs a dynamic_cast in debug mode

2 weeks agoARROW-2516: [CI] Filter changes in AppVeyor builds
Antoine Pitrou [Thu, 3 May 2018 14:46:18 +0000 (16:46 +0200)] 
ARROW-2516: [CI] Filter changes in AppVeyor builds

In AppVeyor PR builds, jobs can be exited early if they test something that the PR doesn't affect.

Author: Antoine Pitrou <>

Closes #1989 from pitrou/ARROW-2516-appveyor-filter-changes and squashes the following commits:

635cd74 <Antoine Pitrou> ARROW-2516:  Filter changes in AppVeyor builds

2 weeks agoARROW-1886: [C++/Python] Flatten struct columns in table
Antoine Pitrou [Thu, 3 May 2018 13:12:02 +0000 (15:12 +0200)] 
ARROW-1886: [C++/Python] Flatten struct columns in table

Add C++ and Python APIs to flatten struct fields and struct columns.
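
To illustrate what flattening a struct column means, here is a plain-Python sketch that does not use the Arrow API. The function name `flatten_struct_column` and the `parent.child` naming of the resulting columns are assumptions made for this example.

```python
def flatten_struct_column(name, rows):
    """Split a column of struct values (dicts) into one column per field.

    A null struct (None) produces a null in every child column.
    """
    field_names = []
    for row in rows:
        if row is not None:
            for key in row:
                if key not in field_names:
                    field_names.append(key)
    return {
        f"{name}.{field}": [None if row is None else row.get(field) for row in rows]
        for field in field_names
    }


flat = flatten_struct_column("s", [{"a": 1, "b": "x"}, None, {"a": 3, "b": "z"}])
print(flat)
```

Each struct field becomes its own column of the same length as the original.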

Based on PR #1755.

Author: Antoine Pitrou <>

Closes #1768 from pitrou/ARROW-1886-flatten-table and squashes the following commits:

a821b77 <Antoine Pitrou> Add test for empty column
b8335af <Antoine Pitrou> ARROW-1886:  Flatten struct columns in table

2 weeks agoARROW-2522: [C++] Version shared library files
Antoine Pitrou [Thu, 3 May 2018 11:39:25 +0000 (20:39 +0900)] 
ARROW-2522: [C++] Version shared library files

Author: Antoine Pitrou <>

Closes #1975 from pitrou/ARROW-2522-version-so-files and squashes the following commits:

b57311c0 [Antoine Pitrou] Use cmake's project versioning logic
d6073132 [Antoine Pitrou] Version the same way is
d6d02f5a [Antoine Pitrou] Generate SO version from full Arrow version
e079f57d [Antoine Pitrou] ARROW-2522: [C++] Version shared library files

2 weeks ago[C++] Fix a typo in cpplint (#1986)
284km [Thu, 3 May 2018 06:55:12 +0000 (15:55 +0900)] 
[C++] Fix a typo in cpplint (#1986)

This PR imports google/styleguide#348.

2 weeks agoARROW-2536: [Rust] optimize capacity allocation for ListBuilder
Kane [Thu, 3 May 2018 06:53:50 +0000 (08:53 +0200)] 
ARROW-2536: [Rust] optimize capacity allocation for ListBuilder

Supposed to fix #1983 @andygrove

Author: Kane <>

Closes #1985 from Kane-Sendgrid/fix-builder-capacity-allocation and squashes the following commits:

39bee1ec <Kane> optimize capacity allocation for ListBuilder

2 weeks agoARROW-2511: [Java] Fix BaseVariableWidthVector.allocateNew to not swallow exception...
Venki Korukanti [Wed, 2 May 2018 16:13:50 +0000 (09:13 -0700)] 
ARROW-2511: [Java] Fix BaseVariableWidthVector.allocateNew to not swallow exception (#1947)

+ Also remove the e.printStackTrace() calls

2 weeks agoARROW-2505: [C++] Disable MSVC warning C4800
Antoine Pitrou [Wed, 2 May 2018 10:02:33 +0000 (12:02 +0200)] 
ARROW-2505: [C++] Disable MSVC warning C4800

Author: Antoine Pitrou <>

Closes #1980 from pitrou/ARROW-2505-disable-msvc-warning-c4800 and squashes the following commits:

f586da2 <Antoine Pitrou> Fix command-line option
bb064db <Antoine Pitrou> ARROW-2505:  Disable MSVC warning C4800

2 weeks agoARROW-2493: [Python] Add support for pickling to buffers and arrays
Korn, Uwe [Wed, 2 May 2018 09:27:53 +0000 (11:27 +0200)] 
ARROW-2493: [Python] Add support for pickling to buffers and arrays

Author: Korn, Uwe <>

Closes #1928 from xhochy/ARROW-2493 and squashes the following commits:

e3600f99 <Korn, Uwe> Add pickling support for Arrays
17ec8055 <Korn, Uwe> ARROW-2493:  Add support for pickling to buffers

2 weeks agoARROW-2531: [C++] Update clang bits to 6.0
Korn, Uwe [Wed, 2 May 2018 05:04:27 +0000 (07:04 +0200)] 
ARROW-2531: [C++] Update clang bits to 6.0

Author: Korn, Uwe <>

Closes #1977 from xhochy/ARROW-2531 and squashes the following commits:

cd18d57b <Korn, Uwe> Update llvm install command
cbe94bc3 <Korn, Uwe> Support non-standard homebrew location
651c7e2c <Korn, Uwe> ARROW-2531:  Update clang bits to 6.0

2 weeks agoARROW-2466: [C++] Fix "append" flag to FileOutputStream
Antoine Pitrou [Wed, 2 May 2018 05:02:28 +0000 (07:02 +0200)] 
ARROW-2466: [C++] Fix "append" flag to FileOutputStream

The "append" flag only meant "don't truncate", and would write at the start of the file.
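
The difference between "don't truncate" and a real append can be reproduced with plain file descriptors. This sketch uses Python's `os.open` rather than Arrow's FileOutputStream, and the helper names are hypothetical.

```python
import os
import tempfile


def write_without_truncate(path, data):
    """Open without truncating: the position starts at 0, so old bytes are overwritten."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT)
    try:
        os.write(fd, data)
    finally:
        os.close(fd)


def write_append(path, data):
    """Open with O_APPEND: every write lands at the end of the file."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND)
    try:
        os.write(fd, data)
    finally:
        os.close(fd)


def demo():
    """Return the file contents after each kind of write."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "out.bin")
        with open(path, "wb") as f:
            f.write(b"abcdef")
        write_without_truncate(path, b"XY")
        with open(path, "rb") as f:
            overwritten = f.read()
        write_append(path, b"ZZ")
        with open(path, "rb") as f:
            appended = f.read()
    return overwritten, appended


print(demo())
```

The first write clobbers the start of the existing data, which is the behavior the commit describes; the second adds to the end.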

Author: Antoine Pitrou <>

Closes #1978 from pitrou/ARROW-2466-file-output-stream-append and squashes the following commits:

4147bb1d <Antoine Pitrou> ARROW-2466:  Fix "append" flag to FileOutputStream

2 weeks agoARROW-2332: Feather Reader option to return Table
Dhruv Madeka [Wed, 2 May 2018 04:55:21 +0000 (06:55 +0200)] 
ARROW-2332: Feather Reader option to return Table

Author: Dhruv Madeka <>

Closes #1960 from dmadeka/feather-table and squashes the following commits:

cfb4c204 <Dhruv Madeka> Create read_table function
1ae2edd9 <Dhruv Madeka> Deprecate read and move to read_table
a12e8b77 <Dhruv Madeka> Fix Pep8 Issues causing build fails
14afeec6 <Dhruv Madeka> ARROW-2332 Table Read

3 weeks agoARROW-2533: [CI] Fast finish failing AppVeyor builds
Korn, Uwe [Tue, 1 May 2018 21:16:24 +0000 (23:16 +0200)] 
ARROW-2533: [CI] Fast finish failing AppVeyor builds

Author: Korn, Uwe <>

Closes #1982 from xhochy/ARROW-2533 and squashes the following commits:

4b2329aa <Korn, Uwe> ARROW-2533:  Fast finish failing AppVeyor builds

3 weeks agoARROW-2534: [C++] Hide all zlib symbols from
Antoine Pitrou [Tue, 1 May 2018 19:40:14 +0000 (21:40 +0200)] 
ARROW-2534: [C++] Hide all zlib symbols from

Author: Antoine Pitrou <>

Closes #1981 from pitrou/ARROW-2534-hide-zlib-symbols and squashes the following commits:

867dc05 <Antoine Pitrou> ARROW-2534:  Hide all zlib symbols from

3 weeks agoARROW-2499: [C++] Factor out Python iteration routines
Antoine Pitrou [Tue, 1 May 2018 18:55:22 +0000 (20:55 +0200)] 
ARROW-2499: [C++] Factor out Python iteration routines

Speeds up list to Arrow conversions by up to 15%. Also fixes a bug where creating a list array would not check that all input items are sequences.

Based on PR #1935.

Author: Antoine Pitrou <>

Closes #1940 from pitrou/ARROW-2499-python-iteration-refactor and squashes the following commits:

ac31c6c <Antoine Pitrou> Fix Ndarray1DIndexer::is_strided (unused)
91c5af1 <Antoine Pitrou> Add TODO for performance issue
00cab9a <Antoine Pitrou> ARROW-2499:  Refactor Python iteration

3 weeks agoARROW-2417: [Rust] Fix API safety issues
Andy Grove [Tue, 1 May 2018 17:03:25 +0000 (19:03 +0200)] 
ARROW-2417: [Rust] Fix API safety issues

I reviewed all uses of unsafe in the API implementation and added appropriate assertions where needed to guarantee that the API we expose is safe. Also added tests to verify in some cases.

Author: Andy Grove <>

Closes #1957 from andygrove/api_safety and squashes the following commits:

4d99cdfe <Andy Grove> changes based on PR feedback
36927881 <Andy Grove> rust fmt
5983b6df <Andy Grove> review builder api for safety, add tests
ce143025 <Andy Grove> review buffer api for safety, add tests
0e2606b2 <Andy Grove> Merge remote-tracking branch 'upstream/master'
d883da2f <Andy Grove> Merge remote-tracking branch 'upstream/master'
589ef71d <Andy Grove> Merge remote-tracking branch 'upstream/master'
bd4fbb55 <Andy Grove> Merge remote-tracking branch 'upstream/master'
9c8a10a4 <Andy Grove> Merge remote-tracking branch 'upstream/master'
05592f8c <Andy Grove> Merge remote-tracking branch 'upstream/master'
8c0e6982 <Andy Grove> Merge remote-tracking branch 'upstream/master'
31ef90ba <Andy Grove> Merge remote-tracking branch 'upstream/master'
2f87c703 <Andy Grove> Fix build - add missing import

3 weeks agoARROW-2509: Build for node 9.8
Korn, Uwe [Tue, 1 May 2018 16:52:50 +0000 (18:52 +0200)] 
ARROW-2509: Build for node 9.8

This pins the node version and gets us a green build again. We probably need to update some dependencies to get the code working on node 10 (I can reproduce the crash locally)

Author: Korn, Uwe <>

Closes #1976 from xhochy/ARROW-2509 and squashes the following commits:

7474623f <Korn, Uwe> ARROW-2509: Build for node 9.8

3 weeks agoARROW-2503: [Python] Prevent trailing space character for string statistics
Julius Neuffer [Tue, 1 May 2018 14:19:55 +0000 (16:19 +0200)] 
ARROW-2503: [Python] Prevent trailing space character for string statistics

The trailing space is added in `parquet-cpp`: `pyarrow` calls the function `FormatStatValue`, which adds the trailing space. A separate `parquet-cpp` change is about fixing this behavior. Once the corresponding PR is merged into `parquet-cpp` and `pyarrow` is synced, the string statistics test will break for `str`. This PR fixes that breakage.

Author: Julius Neuffer <>

Closes #1945 from jneuff/fix-trailing-space-in-string-statistics and squashes the following commits:

f702f18 <Julius Neuffer> ARROW-2503:  Fix string statistics test

3 weeks agoARROW-2484: [C++] Document ABI compliance checking
Korn, Uwe [Tue, 1 May 2018 13:35:57 +0000 (15:35 +0200)] 
ARROW-2484: [C++] Document ABI compliance checking

Author: Korn, Uwe <>

Closes #1922 from xhochy/ARROW-2484 and squashes the following commits:

a03cce9d <Korn, Uwe> ARROW-2484:  Document ABI compliance checking

3 weeks agoARROW-2485: Re-write of, such that it outputs the diffs of th…
Joshua Storck [Tue, 1 May 2018 13:34:16 +0000 (15:34 +0200)] 
ARROW-2485: Re-write of, such that it outputs the diffs of th…

…e original file with the formatted file so that it's easier to visualize what's wrong in the formatting
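
A sketch of how such a diff can be produced. It uses Python's standard `difflib` as a stand-in; the actual script may compute its diffs differently, and `format_diff` is a hypothetical helper name.

```python
import difflib


def format_diff(original, formatted, filename):
    """Return a unified diff between the original and formatted text, or '' if equal."""
    return "".join(
        difflib.unified_diff(
            original.splitlines(keepends=True),
            formatted.splitlines(keepends=True),
            fromfile=filename,
            tofile=filename + " (formatted)",
        )
    )


diff = format_diff("int  x;\nint y;\n", "int x;\nint y;\n", "example.cc")
print(diff)
```

An empty result means the file is already formatted; otherwise the diff shows exactly which lines the formatter would change.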

Author: Joshua Storck <>

Closes #1918 from joshuastorck/run_clang_format_rewrite and squashes the following commits:

7e8e48bf <Joshua Storck> Handling subprocess.check_output output in a uniform manner between Python 2 and Python 3. Also changing the code that outputs the 'Formatting {}' string to a join/map call instead of a loop
ca4bd571 <Joshua Storck> Re-write of, such that it outputs the diffs of the original file with the formatted file so that it's easier to visualize what's wrong in the formatting

3 weeks agoARROW-2530: [GLib] Support out-of-source directory build again
Kouhei Sutou [Tue, 1 May 2018 13:30:13 +0000 (15:30 +0200)] 
ARROW-2530: [GLib] Support out-of-source directory build again

Author: Kouhei Sutou <>

Closes #1974 from kou/glib-support-out-of-source-directory-build-again and squashes the following commits:

c83d2ba1 <Kouhei Sutou>  Support out-of-source directory build again

3 weeks agoARROW-2422: Support more operators for partition filtering
Markus Klein [Tue, 1 May 2018 10:20:18 +0000 (12:20 +0200)] 
ARROW-2422: Support more operators for partition filtering

This extends partition filtering by adding support for the '<', '>', '<=', and '>=' comparison operators in filters.
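
A rough sketch of how comparison operators in filters can be evaluated against partition values. The `(column, operator, value)` tuple shape, the presence of '=' and '!=', and the function name `partition_matches` are assumptions for illustration and not the pyarrow implementation.

```python
import operator

# Map operator strings to comparison functions (assumed operator set).
_OPS = {
    "=": operator.eq,
    "!=": operator.ne,
    "<": operator.lt,
    ">": operator.gt,
    "<=": operator.le,
    ">=": operator.ge,
}


def partition_matches(partition, filters):
    """Return True if the partition's key values satisfy every filter."""
    for column, op, value in filters:
        if op not in _OPS:
            raise ValueError(f"invalid predicate operator: {op!r}")
        if not _OPS[op](partition[column], value):
            return False
    return True


partitions = [{"year": 2016}, {"year": 2017}, {"year": 2018}]
selected = [p for p in partitions if partition_matches(p, [("year", ">=", 2017)])]
print(selected)
```

Partitions that fail any filter can be skipped before reading their files.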

Author: Markus Klein <>
Author: Julius Neuffer <>

Closes #1861 from jneuff/extend-partition-filters and squashes the following commits:

8c8cca91 <Markus Klein> test invalid predicat operator
849a5bbf <Markus Klein> formatting
f99554c9 <Markus Klein> remove nested TestParquetFilter
942fd330 <Julius Neuffer> Merge branch 'master' of into extend-partition-filters
b5313a61 <Markus Klein> Merge branch 'master' into extend-partition-filters
7887d64b <Markus Klein> pep8 formatting
ed3a1760 <Markus Klein> fix tests
e2f4ad43 <Markus Klein> replace foo -> integers
af6c13a6 <Julius Neuffer> Extend filter tests to boolean and datetime
7efa79c6 <Markus Klein> extra test for boolean, failing test for dates
36bd4f05 <Julius Neuffer> Support more operators for partition filtering

3 weeks agoARROW-2507: [Rust] Don't take a reference when not needed.
Bruce Mitchener [Tue, 1 May 2018 10:13:06 +0000 (12:13 +0200)] 
ARROW-2507: [Rust] Don't take a reference when not needed.

Since this is already a reference value, we don't need to borrow it again.

Author: Bruce Mitchener <>

Closes #1941 from waywardmonkeys/remove-useless-reference and squashes the following commits:

6f782ad1 <Bruce Mitchener> ARROW-2507:  Don't take a reference when not needed.

3 weeks agoARROW-2482: [Format] Clarify struct field alignment
Andy Grove [Tue, 1 May 2018 10:06:25 +0000 (12:06 +0200)] 
ARROW-2482: [Format] Clarify struct field alignment

Author: Andy Grove <>

Closes #1959 from andygrove/clarify_struct_field_alignment and squashes the following commits:

7a907589 <Andy Grove> change wording as suggested by Wes
475be395 <Andy Grove> Make spec specific about alignment of nested arrays in a struct
0e2606b2 <Andy Grove> Merge remote-tracking branch 'upstream/master'
d883da2f <Andy Grove> Merge remote-tracking branch 'upstream/master'
589ef71d <Andy Grove> Merge remote-tracking branch 'upstream/master'
bd4fbb55 <Andy Grove> Merge remote-tracking branch 'upstream/master'
9c8a10a4 <Andy Grove> Merge remote-tracking branch 'upstream/master'
05592f8c <Andy Grove> Merge remote-tracking branch 'upstream/master'
8c0e6982 <Andy Grove> Merge remote-tracking branch 'upstream/master'
31ef90ba <Andy Grove> Merge remote-tracking branch 'upstream/master'
2f87c703 <Andy Grove> Fix build - add missing import

3 weeks agoARROW-2525: [GLib] Add garrow_struct_array_flatten()
Kouhei Sutou [Tue, 1 May 2018 09:58:35 +0000 (11:58 +0200)] 
ARROW-2525: [GLib] Add garrow_struct_array_flatten()

garrow_struct_array_get_fields() is deprecated.

Version related macros are also added for deprecating

Author: Kouhei Sutou <>

Closes #1962 from kou/glib-struct-array-flatten and squashes the following commits:

e4988a77 <Kouhei Sutou>  Add garrow_struct_array_flatten()

3 weeks agoARROW-2527: [GLib] Enable GPU document
Kouhei Sutou [Tue, 1 May 2018 09:55:23 +0000 (11:55 +0200)] 
ARROW-2527: [GLib] Enable GPU document

If the built Apache Arrow doesn't support GPU, it just generates empty

Author: Kouhei Sutou <>

Closes #1964 from kou/glib-enable-cuda-document and squashes the following commits:

dccfff04 <Kouhei Sutou>  Enable GPU document

3 weeks agoARROW-2474: [Rust] Add windows support for memory pool abstraction
Paddy Horan [Tue, 1 May 2018 09:47:31 +0000 (11:47 +0200)] 
ARROW-2474: [Rust] Add windows support for memory pool abstraction

@andygrove @liurenjie1024 I updated to use `*const` instead of `*mut` as this is the convention in and it is that is used elsewhere.

I also stuck to the way when it came to the `Result` enum used.  I had issues getting the generic version in to work.

Feel free to submit patches to address either of these if you see fit.  Either way we can start to use the memory pool abstraction elsewhere in the code base without breaking windows.

Windows CI should be done soon.

Author: Paddy Horan <>

Closes #1955 from paddyhoran/ARROW-2474 and squashes the following commits:

782a221f <Paddy Horan> Moved alignment into test scope.
99ded86a <Paddy Horan> Ran `cargo fmt`
88a12f6b <Paddy Horan> Updated memory_pool to be windows compatible.

3 weeks agoARROW-2526: [GLib] Update .gitignore
Kouhei Sutou [Tue, 1 May 2018 09:46:29 +0000 (11:46 +0200)] 
ARROW-2526: [GLib] Update .gitignore

We should have done this when we updated document build configuration
for Meson and removed Go examples.

Author: Kouhei Sutou <>

Closes #1963 from kou/glib-update-gitignore and squashes the following commits:

add6631f <Kouhei Sutou>  Update .gitignore

3 weeks agoARROW-2302: [GLib] Unify GNU Autotools build and Meson build into one Travis CI job
Kouhei Sutou [Tue, 1 May 2018 09:43:45 +0000 (11:43 +0200)] 
ARROW-2302: [GLib] Unify GNU Autotools build and Meson build into one Travis CI job

Author: Kouhei Sutou <>

Closes #1967 from kou/glib-ci-unify and squashes the following commits:

fe61c8ab <Kouhei Sutou>  Unify GNU Autotools build and Meson build into one Travis CI job

3 weeks agoARROW-2462: [C++] Fix Segfault in UnpackBinaryDictionary
Matt Topol [Tue, 1 May 2018 09:20:37 +0000 (11:20 +0200)] 
ARROW-2462: [C++] Fix Segfault in UnpackBinaryDictionary

Discovered this through using pyarrow and dealing with RecordBatch Streams and parquet. The issue can be replicated as follows:

import pyarrow as pa
import pyarrow.parquet as pq

# create record batch with 1 dictionary column
indices = pa.array([1,0,1,1,0])
dictionary = pa.array(['Foo', 'Bar'])
dict_array = pa.DictionaryArray.from_arrays(indices, dictionary)
rb = pa.RecordBatch.from_arrays( [ dict_array ], [ 'd0' ] )

# write out using RecordBatchStreamWriter
sink = pa.BufferOutputStream()
writer = pa.RecordBatchStreamWriter(sink, rb.schema)
writer.write_batch(rb)  # write the batch so the stream actually contains data
writer.close()
buf = sink.get_result()

# read in and try to write parquet table
reader = pa.open_stream(buf)
tbl = reader.read_all()
pq.write_table(tbl, 'dict_table.parquet') # SEGFAULTS

When writing record batch streams, if there are no nulls in an array, Arrow puts a placeholder nullptr instead of the full bitmap of 1s. When that stream is deserialized, the null bitmap isn't populated and is left as a nullptr. When attempting to write this table via pyarrow.parquet, you end up in the parquet writer code that attempts to Cast the dictionary to a non-dictionary representation. Since the null count isn't checked before creating a BitmapReader, the BitmapReader is constructed with a nullptr for the bitmap_data but a non-zero length, which then segfaults in the constructor because `bitmap` is null.

So a simple check of the null count before constructing the BitmapReader avoids the segfault.
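
The shape of that check can be sketched in plain Python, with `None` standing in for the nullptr bitmap. The function name `valid_flags` is hypothetical and this is not the C++ code from the fix.

```python
def valid_flags(bitmap, length, null_count):
    """Return one boolean per slot, True where the slot is valid (non-null)."""
    if null_count == 0:
        # No nulls: the bitmap may be absent, so never touch it.
        return [True] * length
    if bitmap is None:
        raise ValueError("null_count > 0 requires a validity bitmap")
    # Validity bits are stored least-significant bit first within each byte.
    return [bool((bitmap[i // 8] >> (i % 8)) & 1) for i in range(length)]


print(valid_flags(None, 5, 0))
print(valid_flags(bytes([0b00001101]), 4, 1))
```

Checking the null count first means the bitmap is only dereferenced when it is guaranteed to exist.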

Author: Matt Topol <>
Author: Matthew Topol <>

Closes #1896 from zeroshade/fix_cast and squashes the following commits:

631b95c <Matthew Topol> Adding check in the unit test to validate the data itself too
e085476 <Matt Topol> Add unit test for unpacking
8b86540 <Matt Topol> ARROW-2462:  Fix Segfault in UnpackBinaryDictionary / UnpackFixedSizeBinaryDictionary

3 weeks agoARROW-2436: [Rust] Add windows CI
Paddy Horan [Tue, 1 May 2018 09:15:57 +0000 (11:15 +0200)] 
ARROW-2436: [Rust] Add windows CI

Author: Paddy Horan <>
Author: Paddy <>
Author: Antoine Pitrou <>

Closes #1949 from paddyhoran/ARROW-2436 and squashes the following commits:

bad1f18 <Paddy Horan> Restoring order of %PATH%
82f68d8 <Paddy Horan> Trying to confirm source of error
7f8b050 <Paddy Horan> Wrapping MINICONDA in double quotes
2405086 <Antoine Pitrou> Try adding quotes when setting PATH
c26100f <Paddy Horan> Restoring quotes.
25d3747 <Paddy Horan> Testing variable change.
7a3ef5a <Paddy Horan> Testing branches of if statement.
3c94bdf <Paddy Horan> Updating scripts to assume root directory.
9e8574c <Paddy Horan> Moving Rust build to end.
c6723d3 <Paddy Horan> Cleaned up build and install scripts.
e976fa9 <Paddy Horan> Cleaned up appveyor.yml
0b5099f <Paddy> Removed `BUILD_SCRIPT` variable
e94ff0e <Paddy> Testing multi-line if block
68c183b <Paddy> Temporarily disabling other build jobs.
9ec5d57 <Paddy Horan> Added license
f00d517 <Paddy Horan> Disable CLCACHE for Rust builds
17adfad <Paddy> Move into rust folder before testing
24e839c <Paddy> Updated install section.
6233c96 <Paddy> Updated windows ci to include Rust

3 weeks agoARROW-2529: [C++] Update mention of clang-format to 5.0 in the docs
Alessandro Andrioni [Tue, 1 May 2018 08:27:45 +0000 (10:27 +0200)] 
ARROW-2529: [C++] Update mention of clang-format to 5.0 in the docs

Also add docs on how to install on macOS via Homebrew.

Author: Alessandro Andrioni <>

Closes #1972 from andrioni/patch-1 and squashes the following commits:

4ecc95f <Alessandro Andrioni> clang-format 5.0 is needed now

3 weeks agoARROW-2513: [Python] DictionaryType should give access to index type and dictionary...
Marco Neumann [Mon, 30 Apr 2018 07:47:04 +0000 (09:47 +0200)] 
ARROW-2513: [Python] DictionaryType should give access to index type and dictionary array

Implement missing Python properties for DictionaryType

Author: Marco Neumann <>

Closes #1951 from crepererum/ARROW-2513 and squashes the following commits:

788179c <Marco Neumann> implement DictionaryType.dictionary
b703ecf <Marco Neumann> implement DictionaryType.index_type

3 weeks agoARROW-2515 [Python] Add DictionaryValue class, fixing bugs with nested dictionaries
Brent Kerby [Mon, 30 Apr 2018 07:43:08 +0000 (09:43 +0200)] 
ARROW-2515 [Python] Add DictionaryValue class, fixing bugs with nested dictionaries

This introduces a scalar value class, DictionaryValue, which fixes a couple of bugs involving dictionaries nested inside ListArrays or inside other DictionaryArrays. It also includes a new test, which failed prior to this commit but now passes.

This is my first time contributing, so feedback would be most welcome.

Author: Brent Kerby <>

Closes #1954 from blkerby/DictionaryValue and squashes the following commits:

1e06963 <Brent Kerby> ARROW-2515:  Add DictionaryValue class, fixing bugs with nested dictionaries

3 weeks ago[GLib] Fix a typo
Kouhei Sutou [Mon, 30 Apr 2018 06:15:37 +0000 (15:15 +0900)] 
[GLib] Fix a typo

3 weeks agoARROW-2452: [TEST] Spark integration test fails with permission error
Krisztián Szűcs [Sat, 28 Apr 2018 01:48:44 +0000 (18:48 -0700)] 
ARROW-2452: [TEST] Spark integration test fails with permission error

The build itself fails too:

[info] Compiling 103 Scala sources and 6 Java sources to /apache-arrow/spark/streaming/target/scala-2.11/classes...
[error] Compile failed at Apr 13, 2018 9:50:50 AM [1:05.284s]
[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary:
[INFO] Spark Project Parent POM ........................... SUCCESS [ 17.349 s]
[INFO] Spark Project Tags ................................. SUCCESS [ 20.587 s]
[INFO] Spark Project Sketch ............................... SUCCESS [ 12.047 s]
[INFO] Spark Project Local DB ............................. SUCCESS [  8.086 s]
[INFO] Spark Project Networking ........................... SUCCESS [ 18.759 s]
[INFO] Spark Project Shuffle Streaming Service ............ SUCCESS [ 10.423 s]
[INFO] Spark Project Unsafe ............................... SUCCESS [ 19.453 s]
[INFO] Spark Project Launcher ............................. SUCCESS [ 17.220 s]
[INFO] Spark Project Core ................................. SUCCESS [12:40 min]
[INFO] Spark Project ML Local Library ..................... SUCCESS [ 32.734 s]
[INFO] Spark Project GraphX ............................... SUCCESS [01:02 min]
[INFO] Spark Project Streaming ............................ FAILURE [01:09 min]
[INFO] Spark Project Catalyst ............................. SKIPPED
[INFO] Spark Project SQL .................................. SKIPPED
[INFO] Spark Project ML Library ........................... SKIPPED
[INFO] Spark Project Tools ................................ SKIPPED
[INFO] Spark Project Hive ................................. SKIPPED
[INFO] Spark Project REPL ................................. SKIPPED
[INFO] Spark Project Assembly ............................. SKIPPED
[INFO] Spark Integration for Kafka 0.10 ................... SKIPPED
[INFO] Kafka 0.10 Source for Structured Streaming ......... SKIPPED
[INFO] Spark Project Examples ............................. SKIPPED
[INFO] Spark Integration for Kafka 0.10 Assembly .......... SKIPPED
[INFO] ------------------------------------------------------------------------
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 17:29 min
[INFO] Finished at: 2018-04-13T09:50:50Z
[INFO] Final Memory: 59M/741M
[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal net.alchim31.maven:scala-maven-plugin:3.2.2:compile (scala-compile-first) on project spark-streaming_2.11: Execution scala-compile-first of goal net.alchim31.maven:scala-maven-plugin:3.2.2:compile failed.: CompileFailed -> [Help 1]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1]
[ERROR] After correcting the problems, you can resume the build with the command
[ERROR]   mvn <goals> -rf :spark-streaming_2.11

Should I create a JIRA ticket?

Author: Krisztián Szűcs <>

Closes #1890 from kszucs/ARROW-2452 and squashes the following commits:

224e9dd <Krisztián Szűcs> forward all arguments to docker-compose
1d34704 <Krisztián Szűcs> fix order of arguments
2641080 <Krisztián Szűcs> don't pass group- and userid to docker-compose

3 weeks agoARROW-2518: [Java] Re-instate JDK tests in matrix, but with JDK 8 instead of JDK 7
Andy Grove [Sat, 28 Apr 2018 01:46:43 +0000 (18:46 -0700)] 
ARROW-2518: [Java] Re-instate JDK tests in matrix, but with JDK 8 instead of JDK 7

Author: Andy Grove <>

Closes #1956 from agrove-rms/restore_java_tests and squashes the following commits:

69309d0 <Andy Grove> re-instate JDK tests in matrix, but with JDK 8 instead of JDK 7

3 weeks agoARROW-2498: [Java] Use java 1.8 instead of java 1.7
Andy Grove [Thu, 26 Apr 2018 22:04:56 +0000 (15:04 -0700)] 
ARROW-2498: [Java] Use java 1.8 instead of java 1.7

Author: Andy Grove <>

Closes #1936 from agrove-rms/jdk8 and squashes the following commits:

d5dca81 <Andy Grove> remove jdk7 from CI matrix
ef01df4 <Andy Grove> use java 1.8 instead of java 1.7

3 weeks agoARROW-2286: [C++/Python] Allow subscripting pyarrow.lib.StructValue
Krisztián Szűcs [Thu, 26 Apr 2018 13:16:36 +0000 (15:16 +0200)] 
ARROW-2286: [C++/Python] Allow subscripting pyarrow.lib.StructValue

Author: Krisztián Szűcs <>

Closes #1943 from kszucs/ARROW-2286 and squashes the following commits:

bbd496c <Krisztián Szűcs> fix review issues
f848858 <Krisztián Szűcs> implement StructValue.__getitem__
708e78f <Krisztián Szűcs> cpp unittests
f4a9bab <Krisztián Szűcs> GetChildByName and GetChildIndex for StructType

3 weeks agoARROW-2448: [Plasma] Reference counting for PlasmaClient::Impl
Philipp Moritz [Wed, 25 Apr 2018 18:41:25 +0000 (11:41 -0700)] 
ARROW-2448: [Plasma] Reference counting for PlasmaClient::Impl

This is a followup to which does reference counting of the PlasmaClient held by PlasmaBuffers to avoid the segfault in ARROW-2448.

Author: Philipp Moritz <>

Closes #1939 from pcmoritz/autoget-sharedptr and squashes the following commits:

f1e6e8b7 <Philipp Moritz> fix test
2da395f9 <Philipp Moritz> update
13b12049 <Philipp Moritz> fixes
b68b15e4 <Philipp Moritz> fix ObjectStatus
6d560db4 <Philipp Moritz> remove headers
94cdfd7c <Philipp Moritz> add test
6798ed01 <Philipp Moritz> Give shared_ptr of PlasmaClient::Impl to PlasmaBuffer

3 weeks agoARROW-2074: [Python] Infer lists of dicts as struct arrays
Antoine Pitrou [Wed, 25 Apr 2018 16:39:35 +0000 (18:39 +0200)] 
ARROW-2074: [Python] Infer lists of dicts as struct arrays

Also refactor the type inference visitor and remove the superfluous separate SeqVisitor; improve inference visitor performance by 30%; and add a struct type inference benchmark.
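
As an illustration of the idea, the following plain-Python sketch infers a struct type from a list of dicts. It is not the C++ inference visitor; the function name `infer_type` and the type labels it returns are assumptions for this example.

```python
def infer_type(values):
    """Infer a simple type description for a list of Python values.

    Returns "null", "int64", "string", or ("struct", {field_name: field_type}).
    """
    non_null = [v for v in values if v is not None]
    if not non_null:
        return "null"
    if all(isinstance(v, dict) for v in non_null):
        # Collect field names in first-seen order, then infer each field's type.
        field_names = []
        for v in non_null:
            for key in v:
                if key not in field_names:
                    field_names.append(key)
        return (
            "struct",
            {name: infer_type([v.get(name) for v in non_null]) for name in field_names},
        )
    if all(isinstance(v, int) and not isinstance(v, bool) for v in non_null):
        return "int64"
    if all(isinstance(v, str) for v in non_null):
        return "string"
    raise TypeError("unsupported or mixed value types")


inferred = infer_type([{"a": 1, "b": "x"}, None, {"a": 2}])
print(inferred)
```

A field missing from some dicts is treated as null there, so it does not change the inferred field type.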

Author: Antoine Pitrou <>

Closes #1935 from pitrou/ARROW-2074-infer-dict-lists and squashes the following commits:

13ed6c30 <Antoine Pitrou> Fix tests on 2.7
3baa2eac <Antoine Pitrou> ARROW-2074:  Infer lists of dicts as struct arrays

3 weeks agoARROW-2508: [Python] Fix pytest.raises msg to message
Philipp Moritz [Wed, 25 Apr 2018 11:50:45 +0000 (13:50 +0200)] 
ARROW-2508: [Python] Fix pytest.raises msg to message

Author: Philipp Moritz <>
Author: Antoine Pitrou <>

Closes #1944 from pcmoritz/fix-pytest-msg and squashes the following commits:

00b2cd4 <Antoine Pitrou> Use `match` argument as intended by the test
6d6cc68 <Philipp Moritz> fix pytest.raises msg to message