Thanks @kevrasm for solving the clock issue.
I tried the new jar, but I am now facing another issue on Databricks 5.2 ML.
After successfully creating a clock, I tried to use a summarizer with summarizeIntervals, but it failed with the following error:
```
/local_disk0/spark-34261885-5939-47e4-b37c-fc95545a6b47/userFiles-25527d91-086d-4a90-839f-09b97f09c196/addedFile5376141714691461041dbfs__FileStore_jars_785cdf36_8307_41eb_9f3d_a9d1a89ab416_flint_0_6_0_databricks-7358e.jar/ts/flint/dataframe.py in summarizeIntervals(self, clock, summarizer, key, inclusion, rounding)
1071 else:
1072 with traceback_utils.SCCallSiteSync(self._sc) as css:
-> 1073 return self._summarizeIntervals_builtin(clock, summarizer, key, inclusion, rounding)
1074
1075 def _summarizeIntervals_udf(self, clock, columns,
/local_disk0/spark-34261885-5939-47e4-b37c-fc95545a6b47/userFiles-25527d91-086d-4a90-839f-09b97f09c196/addedFile5376141714691461041dbfs__FileStore_jars_785cdf36_8307_41eb_9f3d_a9d1a89ab416_flint_0_6_0_databricks-7358e.jar/ts/flint/dataframe.py in _summarizeIntervals_builtin(self, clock, summarizer, key, inclusion, rounding)
1093 scala_key,
1094 inclusion,
-> 1095 rounding)
1096
1097 return TimeSeriesDataFrame._from_tsrdd(tsrdd, self.sql_ctx)
/databricks/spark/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py in call(self, *args)
1255 answer = self.gateway_client.send_command(command)
1256 return_value = get_return_value(
-> 1257 answer, self.gateway_client, self.target_id, self.name)
1258
1259 for temp_arg in temp_args:
/databricks/spark/python/pyspark/sql/utils.py in deco(*a, **kw)
61 def deco(*a, **kw):
62 try:
---> 63 return f(*a, **kw)
64 except py4j.protocol.Py4JJavaError as e:
65 s = e.java_exception.toString()
/databricks/spark/python/lib/py4j-0.10.7-src.zip/py4j/protocol.py in get_return_value(answer, gateway_client, target_id, name)
326 raise Py4JJavaError(
327 "An error occurred while calling {0}{1}{2}.\n".
--> 328 format(target_id, ".", name), value)
329 else:
330 raise Py4JError(
Py4JJavaError: An error occurred while calling o557.summarizeIntervals.
: java.lang.NoClassDefFoundError: Could not initialize class com.twosigma.flint.rdd.function.group.Intervalize$
at com.twosigma.flint.rdd.OrderedRDD.intervalize(OrderedRDD.scala:560)
at com.twosigma.flint.timeseries.TimeSeriesRDDImpl.summarizeIntervals(TimeSeriesRDD.scala:1605)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:380)
at py4j.Gateway.invoke(Gateway.java:295)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:251)
at java.lang.Thread.run(Thread.java:748)
```
The same error occurs with groupByInterval.
I also tried to run the example at https://github.com/twosigma/flint/tree/master/example, without success. It failed at the summarizer step:
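For context on what the failing call is meant to compute: summarizeIntervals (and groupByInterval) bucket rows into the intervals defined by a clock and then aggregate each bucket. The pure-Python sketch below illustrates that idea for a sum over a uniform clock. It is not Flint's implementation; the helper name `summarize_intervals_sum` is hypothetical, and the choice of left-closed intervals `[begin, end)` keyed by the interval end is an assumption meant to mirror `inclusion='begin'` and `rounding='end'`.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def summarize_intervals_sum(
    rows: List[Tuple[int, float]],
    start: int,
    step: int,
) -> Dict[int, float]:
    """Hypothetical helper: sum values per interval [start + k*step, start + (k+1)*step).

    Assumption: each output row is keyed by its interval's END time,
    roughly analogous to inclusion='begin', rounding='end' in Flint.
    Rows earlier than `start` are ignored.
    """
    sums: Dict[int, float] = defaultdict(float)
    for t, value in rows:
        if t < start:
            continue  # before the first clock tick: not in any interval
        k = (t - start) // step       # index of the interval containing t
        end = start + (k + 1) * step  # timestamp assigned to the output row
        sums[end] += value
    return dict(sorted(sums.items()))


rows = [(0, 1.0), (5, 2.0), (10, 3.0), (19, 4.0), (25, 5.0)]
print(summarize_intervals_sum(rows, start=0, step=10))
# {10: 3.0, 20: 7.0, 30: 5.0}
```

Note that a row exactly on a tick (t=10) falls into the interval that starts there, which is what left-closed intervals imply.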
```python
sp500_decayed_return = sp500_joined_return.summarizeWindows(
    window=windows.past_absolute_time('7day'),
    summarizer=summarizers.ewma('previous_day_return', alpha=0.5)
)
```
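The summarizer in that example is an exponentially weighted moving average with `alpha=0.5`. As a reference point only, the sketch below shows the textbook recursive EWMA over a plain list. Flint's `summarizers.ewma` may additionally decay weights by elapsed time and normalize differently, so this is not a reimplementation of it; the function name `ewma` here is hypothetical.

```python
from typing import List


def ewma(values: List[float], alpha: float) -> List[float]:
    """Textbook recursive EWMA: s_0 = x_0, s_t = alpha*x_t + (1 - alpha)*s_{t-1}.

    Illustrative only; not Flint's summarizers.ewma.
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    out: List[float] = []
    for x in values:
        # The first observation seeds the average; later ones are blended in.
        s = x if not out else alpha * x + (1.0 - alpha) * out[-1]
        out.append(s)
    return out


print(ewma([1.0, 2.0, 3.0], alpha=0.5))  # [1.0, 1.5, 2.25]
```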
What is different about Databricks that makes the Two Sigma version incompatible?