Conversation

@garlandz-db (Contributor) commented Feb 13, 2025

  • As per title.
  • Reason: we want to create an (experimental) template that makes it easy to start creating a Scala job with a locally assembled JAR.

Instructions:
  1. databricks bundle init https://github.com/garlandz-db/bundle-examples --template-dir contrib/templates/scala-job
  2. databricks bundle deploy -t dev
  3. databricks bundle run

Run URL: https://e2-dogfood.staging.cloud.databricks.com/?o=6051921418418893#job/636237385400036/run/484758087823306

2025-02-14 12:50:21 "[dev garland_zhang] project_name" RUNNING 
2025-02-14 13:01:15 "[dev garland_zhang] project_name" TERMINATED SUCCESS 
Hello, World foo123!
Running in a Databricks cluster
Showing range ...
+---+
| id|
+---+
|  0|
|  1|
|  2|
+---+
Showing nyctaxi trips ...
+--------------------+---------------------+-------------+-----------+----------+-----------+-----------+
|tpep_pickup_datetime|tpep_dropoff_datetime|trip_distance|fare_amount|pickup_zip|dropoff_zip| testresult|
+--------------------+---------------------+-------------+-----------+----------+-----------+-----------+
| 2016-02-16 22:40:45|  2016-02-16 22:59:25|         5.35|       18.5|     10003|      11238|test: 11238|
| 2016-02-05 16:06:44|  2016-02-05 16:26:03|          6.5|       21.5|     10282|      10001|test: 10001|
| 2016-02-08 07:39:25|  2016-02-08 07:44:14|          0.9|        5.5|     10119|      10003|test: 10003|
| 2016-02-29 22:25:33|  2016-02-29 22:38:09|          3.5|       13.5|     10001|      11222|test: 11222|
| 2016-02-03 17:21:02|  2016-02-03 17:23:24|          0.3|        3.5|     10028|      10028|test: 10028|
| 2016-02-10 00:47:44|  2016-02-10 00:53:04|          0.0|        5.0|     10038|      10005|test: 10005|
| 2016-02-19 03:24:25|  2016-02-19 03:44:56|         6.57|       21.5|     10001|      11377|test: 11377|
| 2016-02-02 14:05:23|  2016-02-02 14:23:07|         1.08|       11.5|     10103|      10167|test: 10167|
| 2016-02-20 15:42:20|  2016-02-20 15:50:40|          0.8|        7.0|     10003|      10011|test: 10011|
| 2016-02-14 16:19:53|  2016-02-14 16:32:10|          1.3|        9.0|     10199|      10020|test: 10020|
+--------------------+---------------------+-------------+-----------+----------+-----------+-----------+
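As a side note, the testresult column in the output above looks like it is derived from dropoff_zip by prefixing it with "test: ". A minimal plain-Scala sketch of that mapping (TestResult and testResult are illustrative names, not from the template):

```scala
object TestResult {
  // Derive the `testresult` column value from a dropoff zip code,
  // matching rows like dropoff_zip = 11238 -> testresult = "test: 11238".
  def testResult(dropoffZip: Int): String = s"test: $dropoffZip"

  def main(args: Array[String]): Unit =
    println(testResult(11238)) // prints "test: 11238"
}
```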

"default": "2.12",
"order": 12
},
"scala_maintenance_version": {
Contributor Author:
Spent too long on this... there are no functions, AFAICT, in the template language, so I can't extract from the user string :/

Contributor:

What's going on? Can you explain?

Contributor:

Isn't it okay to just have hardcoded defaults here anyway? If you do need to do things with versions, there are things you can do with regexes and other helpers, see e.g. https://github.com/databricks/cli/blob/main/libs/template/templates/dbt-sql/template/%7B%7B.project_name%7D%7D/profile_template.yml.tmpl#L8 and https://github.com/databricks/cli/blob/main/libs/template/templates/dbt-sql/template/%7B%7B.project_name%7D%7D/dbt_profiles/profiles.yml.tmpl#L20. Or you could ask ChatGPT for some help; just indicate that you're working with Go text/template.

Comment on lines 9 to 11
"jar_dest_path": {
"type": "string",
"description": "Destination path in Databricks where the JAR will be stored. Note: You must create a Volumes first if you plan to store the JAR there.",
Contributor:

Maybe better to call this "artifacts_dest_path" to make this more than just JARs.

Add an example to the description

"description": "Destination path in Databricks where the JAR will be stored. Note: You must create a Volumes first if you plan to store the JAR there.",
"order": 3
},
"cluster_key": {
Contributor:

Suggested change
"cluster_key": {
"existing_cluster_id": {

"default": "2.12",
"order": 12
},
"scala_maintenance_version": {
Contributor:

What's going on? Can you explain?

Comment on lines 45 to 50
"scala_version": {
"type": "string",
"description": "Scala version (e.g., 2.12). Run scala -version to find it. Note: Only support 2.12 and 2.13",
"default": "2.12",
"order": 12
},
Contributor:

We know the Scala version they MUST use. Should this be offered as an option?

name: {{.project_name}}

workspace:
host: {{workspace_host}}
Contributor:

Make this an option in the list. Default can be {{workspace_host}}

build: sbt package && sbt assembly
path: .
files:
- source: {{template `jar_path` .}}
Contributor:

this needs to include ${workspace.current_user.short_name}

Contributor Author:

This specifies the local path where the target is generated; we don't need the name here.


object Main {
def main(args: Array[String]): Unit = {
println("Hello, World foo123!")
Contributor:

Suggested change
println("Hello, World foo123!")
println("Hello, World!")

@garlandz-db requested a review from nija-at February 19, 2025 13:37
@garlandz-db changed the title "Dabs test" to "Add DABs template for scala job" Feb 19, 2025
"custom_workspace_host": {
"type": "string",
"default": "{{workspace_host}}",
"description": "Workspace host url",
Contributor:

Add an example for clarity and state what will be the default if not specified.

Contributor Author:

This seems redundant though because the default already shows up in the text prompt

Contributor:

I see. Ok.

},
"artifacts_dest_path": {
"type": "string",
"description": "Destination path in Databricks where the JAR and other artifacts will be stored (e.g.; /Volumes/main/{{short_name}}/scala_job_test). Note: If you use Volumes, You must create a Volumes first and the path should start with /Volumes",
Contributor:

What does {{short_name}} in the description here do?

Contributor Author:

It's a parameter later replaced with their name. It's not needed if we append their name at the end.

},
"artifacts_dest_path": {
"type": "string",
"description": "Destination path in Databricks where the JAR and other artifacts will be stored (e.g.; /Volumes/main/{{short_name}}/scala_job_test). Note: If you use Volumes, You must create a Volumes first and the path should start with /Volumes",
Contributor:

Can it be anything else other than a volume? If not, adjust the text.

Contributor Author:

I believe they can also upload to WSFS

Contributor:

Do we want to offer this option now? Have we tested it?

Contributor Author:

I just tested it again. This does work with upload not currently running: https://e2-dogfood.staging.cloud.databricks.com/?o=6051921418418893#job/96250107323625/run/23505530243179. The caveat though is it needs to start with / for example /Workspace/Users/garland.zhang@databricks.com/...

Contributor Author:

update: Oh the upload works but the run fails

Jar Libraries from /Workspace is not allowed on shared UC clusters.

val cp = (assembly / fullClasspath).value
cp filter { _.data.getName.matches("scala-.*") } // remove Scala libraries
}
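For context, a filter like the one above would typically sit inside sbt-assembly's assemblyExcludedJars setting; a minimal build.sbt sketch, assuming that setting key (the enclosing key is outside the quoted diff):

```scala
// build.sbt (sketch): exclude Scala standard-library jars from the assembly,
// since the Databricks runtime already provides them.
assembly / assemblyExcludedJars := {
  val cp = (assembly / fullClasspath).value
  cp filter { _.data.getName.matches("scala-.*") } // remove Scala libraries
}
```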

Contributor:

What do you think about adding this option as an option into sbt run.

Contributor Author:

If you mean running the assembly JAR locally, then yes.

Contributor:

Sorry, I was not clear. I meant adding the "--add-opens" option as a JVM parameter for sbt run.

https://www.scala-sbt.org/1.x/docs/Forking.html#Forked+JVM+options
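Per the sbt forking docs linked above, this would look roughly as follows in build.sbt (a sketch; without fork := true, javaOptions are ignored and the code runs in sbt's own JVM):

```scala
// build.sbt (sketch): fork `sbt run` so the added JVM option reaches
// the forked JVM rather than being silently ignored.
run / fork := true
run / javaOptions += "--add-opens=java.base/java.nio=ALL-UNNAMED"
```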

@lennartkats-db (Contributor) left a comment

Great first version!!! I added feedback inline, happy to discuss further

@@ -0,0 +1 @@
This is where all experimental DABs go. Experimental is anything new that is not fully recommended yet for users to try out.
Contributor:

Could you move it up one directory and indicate in the template's README that it's "experimental"? contrib already indicates that it's "unauthorized" and I'd rather avoid adding another level

},
"custom_workspace_host": {
"type": "string",
"default": "{{workspace_host}}",
Contributor:

Could you follow the convention of https://github.com/databricks/cli/blob/main/libs/template/templates/default-sql/databricks_template_schema.json#L2 here, where the URL is just printed and not editable? When it's editable, we can't really derive other properties from the current session anymore, like the current user name, catalog support, or cloud

},
"artifacts_dest_path": {
"type": "string",
"description": "Destination path in Databricks where the JAR and other artifacts will be stored. You can use /Workspace or /Volumes in Dedicated clusters but only /Volumes in Standard clusters.",
Contributor:

Given that some of these questions need some extra guidance, could you use the multi-line conventions as seen in the latest templates, i.e. https://github.com/databricks/cli/blob/main/libs/template/templates/default-sql/databricks_template_schema.json#L16

},
"existing_cluster_id": {
"type": "string",
"description": "Enter the cluster id for an existing cluster or leave empty to create a new cluster",
Contributor:

Could we avoid this question?

  • We don't recommend using all-purpose clusters for production. For development, users are always free to use all-purpose and they can follow docs (or README.md guidance to do so)
  • We want to bias toward a minimal number of questions in the templates

"type": "string",
"enum": ["Standard", "Dedicated"],
"default": "Standard",
"description": "Select cluster type: Dedicated or Standard. If Standard, is the JAR allowlisted by the admin for your workspace? (If not, inform admin: https://docs.databricks.com/en/data-governance/unity-catalog/manage-privileges/allowlist.html)",
Contributor:

Could we just standardize on 'standard' clusters and offer README.md and/or comment guidance on how to change this after the fact? That avoids the setup-time overhead in understanding and making this decision.

Contributor:

Re. informing the admin: that advice seems appropriate to give as part of the volume path question.

(Also, if we didn't use a 'standard' cluster then we wouldn't need to ask about a volume path, saving a lot of setup steps for customers.)

- source: {{template `jar_path` .}}

resources:
jobs:
Contributor:

This should be in a separate resources/{{.project_name}}.job.yml.tmpl file as seen in https://github.com/databricks/cli/blob/main/libs/template/templates/default-python/template/%7B%7B.project_name%7D%7D/resources/%7B%7B.project_name%7D%7D.job.yml.tmpl#L1. That folder should also have a .gitkeep.

node_type_id: i3.xlarge # Default instance type (can be changed)
autoscale:
max_workers: 2
min_workers: 2
Contributor:

Why not min 1, max 4?

@@ -0,0 +1 @@
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "2.0.0")
Contributor:

File should have a comment at the top, also explaining what the project/ folder is about. (A separate README for that seems like overkill.)

@@ -0,0 +1,41 @@
package com.examples
Contributor:

File should have a comment at the top


// to run with new jvm options, a fork is required otherwise it uses same options as sbt process
run / fork := true
run / javaOptions += "--add-opens=java.base/java.nio=ALL-UNNAMED"

# This is a Databricks asset bundle definition for {{.project_name}}.
# See https://docs.databricks.com/dev-tools/bundles/index.html for documentation.
bundle:
name: {{.project_name}}
Contributor:

Suggested change
name: {{.project_name}}
name: {{.project_name}}
uuid: {{bundle_uuid}}

All the newer templates include a uuid, which can help for diagnostic and metrics use cases.

@lennartkats-db (Contributor) left a comment

A few more nits, otherwise this looks great!

{{- end}}

{{ define `organization` -}}
com.examples
Contributor:

Nit: this should probably just be hardcoded? Since it's also hardcoded in the src/main.scala/com/examples path below?

Contributor Author:

There's another use in build.sbt.tmpl, so this reduces the hardcoding as much as possible.

garlandz-db and others added 10 commits February 28, 2025 02:13
Co-authored-by: Lennart Kats (databricks) <lennart.kats@databricks.com>
@garlandz-db (Contributor Author):

Seems like the user still needs to pass in the VM options, since IntelliJ's run button doesn't seem to pick up these Java options.

@garlandz-db (Contributor Author):

jenkins merge

@andrewnester merged commit 7679a84 into databricks:main Feb 28, 2025
1 check passed