Skip to content

Conversation

@abridgett
Copy link

I don't know Scala (at all!) so there's almost certainly cleaner ways - my apologies. The logging at the moment is sometimes unhelpful as it's hard to see the real issue - with DROPMALFORMED you see the line, with another parsing mode you get the error but not the line.

tokenRdd(schemaFields.map(_.name)).flatMap { tokens =>
if (dropMalformed && schemaFields.length != tokens.size) {
logger.warn(s"Dropping malformed line: ${tokens.mkString(delimiter.toString)}")
logger.warn(s"Dropping malformed line (wrong length): ${tokens.mkString(delimiter.toString)}")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please check travis. [error] /home/travis/build/databricks/spark-csv/src/main/scala/com/databricks/spark/csv/CsvRelation.scala:176: File line length exceeds 100 characters

@HyukjinKwon
Copy link
Member

For me I think this is reasonable because I sometimes see some users have some difficulties to find out why it was dropped or malformed.

@codecov-io
Copy link

Current coverage is 85.86%

Merging #258 into master will decrease coverage by -0.21% as of 26a1447

@@            master    #258   diff @@
======================================
  Files           12      12       
  Stmts          517     552    +35
  Branches       149     160    +11
  Methods          0       0       
======================================
+ Hit            445     474    +29
  Partial          0       0       
- Missed          72      78     +6

Review entire Coverage Diff as of 26a1447

Powered by Codecov. Updated on successful CI builds.

@abridgett
Copy link
Author

Thanks Hyukjin, I've fixed that overlong line

_: IllegalArgumentException if dropMalformed =>
logger.warn("Number format exception. " +
case e: java.lang.NumberFormatException if dropMalformed =>
logger.warn("Number format exception (" + e + ")." +
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Personal opinion: since this is an warning message, shouldn't we maybe just print out the message only by e.getMessage() excluding the class name of the exception?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In addition, you can write s"Number format exception (${a.getMessage})." instead of concatenating them with + which is not used generally in this library and Spark codes as well.

}

assert(exception.getMessage.contains("Malformed line in FAILFAST mode: 2015,Chevy,Volt"))
assert(exception.getMessage.contains("Malformed line in FAILFAST mode (expected 5 tokens but received 3 tokens): 2015,Chevy,Volt"))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This line exceeds 100 characters as well.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants