Make ParseExceptions more informative (#12259)
author Laksh Singla <30999375+LakshSingla@users.noreply.github.com>
Mon, 28 Feb 2022 17:01:15 +0000 (22:31 +0530)
committer GitHub <noreply@github.com>
Mon, 28 Feb 2022 17:01:15 +0000 (22:31 +0530)
commit 3f709db173d779db1466e57a1564a5f557b4b0cf
tree a6030204f956bc46a57676898147369ab10dcaf4
parent d105519558951aa289992d05043194b8b7ceaee4

This PR makes the ParseExceptions in Druid more informative by attaching metadata to the ParseException that describes where the error came from. For example, the metadata can include the path of the file that produced the error and, where it can be fetched easily (as in CsvReader), the line number.
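
To illustrate the idea, here is a minimal sketch of an exception that carries such metadata and folds it into its message. `MetadataParseException`, its constructor, and the keys `Path` and `Line` are hypothetical and used only for illustration; this is not Druid's actual `ParseException` API, which may differ.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: NOT org.apache.druid.java.util.common.parsers.ParseException.
// Carries a metadata map (e.g. source path, line number) and appends it to the message.
class MetadataParseException extends RuntimeException {
  private final Map<String, Object> metadata;

  MetadataParseException(String message, Map<String, Object> metadata, Throwable cause) {
    super(buildMessage(message, metadata), cause);
    // Copy preserving insertion order, so the reported metadata is stable.
    this.metadata = new LinkedHashMap<>(metadata);
  }

  Map<String, Object> getMetadata() {
    return Collections.unmodifiableMap(metadata);
  }

  // Renders "message (key1: value1, key2: value2)", or just "message" if there is no metadata.
  private static String buildMessage(String message, Map<String, Object> metadata) {
    if (metadata.isEmpty()) {
      return message;
    }
    StringBuilder sb = new StringBuilder(message).append(" (");
    boolean first = true;
    for (Map.Entry<String, Object> e : metadata.entrySet()) {
      if (!first) {
        sb.append(", ");
      }
      sb.append(e.getKey()).append(": ").append(e.getValue());
      first = false;
    }
    return sb.append(')').toString();
  }
}
```

Keeping the metadata as a map, rather than as fixed fields, lets each reader contribute whatever context it can cheaply obtain (a path for any file-backed entity, a line number only for text formats).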

The following changes are made in this PR:

A new class, CloseableIteratorWithMetadata, has been added. It is like CloseableIterator, but it also has a metadata method that returns a context Map<String, Object> describing the current element returned by next().
IntermediateRowParsingReader#read() now attaches the InputEntity and the "record number" that caused the exception during parsing, while IntermediateRowParsingReader#sample attaches only the InputEntity (not the record number).
TextReader (and its subclasses), a specific implementation of IntermediateRowParsingReader, also includes the line number that caused the error.
This also helps in triaging issues when an InputSourceReader generates a ParseException, because the exception can point to the specific InputEntity that caused it while it was being read.
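
The mechanism described above can be sketched as follows. This is a simplified illustration, not Druid's code: `IteratorWithMetadata` is a hypothetical stand-in for CloseableIteratorWithMetadata (its method name `currentMetadata()` is assumed), `LineIterator` and `RowReader` are hypothetical helpers, and the metadata keys are illustrative. The text iterator reports the line number of the element last returned by next(), and the reading loop adds the entity path and record number before rethrowing a parse failure.

```java
import java.io.BufferedReader;
import java.io.Closeable;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.NoSuchElementException;
import java.util.function.Function;

// Hypothetical stand-in for CloseableIteratorWithMetadata: an iterator that can
// also describe the element most recently returned by next().
interface IteratorWithMetadata<T> extends Iterator<T>, Closeable {
  Map<String, Object> currentMetadata();
}

// Text iterator that skips blank lines and reports the 1-based physical line
// number of the last line returned by next().
class LineIterator implements IteratorWithMetadata<String> {
  private final BufferedReader reader;
  private String nextLine;          // next non-blank line, or null at end of input
  private long nextLineNumber = 0;  // physical line number of nextLine
  private long currentLineNumber = 0;

  LineIterator(BufferedReader reader) {
    this.reader = reader;
    advance();
  }

  private void advance() {
    try {
      do {
        nextLine = reader.readLine();
        if (nextLine != null) {
          nextLineNumber++;
        }
      } while (nextLine != null && nextLine.isEmpty());
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  @Override public boolean hasNext() {
    return nextLine != null;
  }

  @Override public String next() {
    if (nextLine == null) {
      throw new NoSuchElementException();
    }
    String line = nextLine;
    currentLineNumber = nextLineNumber;
    advance();
    return line;
  }

  @Override public Map<String, Object> currentMetadata() {
    return Collections.<String, Object>singletonMap("Line", currentLineNumber);
  }

  @Override public void close() throws IOException {
    reader.close();
  }
}

class RowReader {
  // Parses every record; on failure, rethrows with the entity path, the record
  // number, and whatever metadata the iterator reports (here, the line number).
  static <R> List<R> readAll(String entityPath, IteratorWithMetadata<String> it, Function<String, R> parser) {
    List<R> rows = new ArrayList<>();
    long recordNumber = 0;
    while (it.hasNext()) {
      String raw = it.next();
      recordNumber++;
      try {
        rows.add(parser.apply(raw));
      } catch (RuntimeException e) {
        Map<String, Object> metadata = new LinkedHashMap<>();
        metadata.put("Path", entityPath);
        metadata.put("Record", recordNumber);
        metadata.putAll(it.currentMetadata());
        throw new IllegalStateException("Unable to parse [" + raw + "] " + metadata, e);
      }
    }
    return rows;
  }
}
```

In this sketch the record number and the line number diverge as soon as blank lines are skipped, which is one reason to report both.
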
16 files changed:
core/src/main/java/org/apache/druid/data/input/IntermediateRowParsingReader.java
core/src/main/java/org/apache/druid/data/input/TextReader.java
core/src/main/java/org/apache/druid/data/input/impl/JsonReader.java
core/src/main/java/org/apache/druid/java/util/common/parsers/CloseableIteratorWithMetadata.java [new file with mode: 0644]
extensions-core/avro-extensions/src/main/java/org/apache/druid/data/input/avro/AvroOCFReader.java
extensions-core/avro-extensions/src/main/java/org/apache/druid/data/input/avro/AvroStreamReader.java
extensions-core/kafka-indexing-service/src/test/java/org/apache/druid/indexing/kafka/KafkaIndexTaskTest.java
extensions-core/kinesis-indexing-service/src/test/java/org/apache/druid/indexing/kinesis/KinesisIndexTaskTest.java
extensions-core/orc-extensions/src/main/java/org/apache/druid/data/input/orc/OrcReader.java
extensions-core/parquet-extensions/src/main/java/org/apache/druid/data/input/parquet/ParquetReader.java
extensions-core/protobuf-extensions/src/main/java/org/apache/druid/data/input/protobuf/ProtobufReader.java
indexing-service/src/main/java/org/apache/druid/indexing/input/DruidSegmentReader.java
indexing-service/src/test/java/org/apache/druid/indexing/common/task/IndexTaskTest.java
indexing-service/src/test/java/org/apache/druid/indexing/common/task/batch/parallel/SinglePhaseParallelIndexingTest.java
indexing-service/src/test/java/org/apache/druid/indexing/overlord/sampler/InputSourceSamplerTest.java
server/src/main/java/org/apache/druid/metadata/input/SqlReader.java