Conversation

@garlandz-db (Contributor) commented Feb 13, 2025

  • As per title.
  • Reason: we want to create an (experimental) template that makes it easy to start creating a Scala job with a locally assembled JAR.

Instructions:
  1. databricks bundle init https://github.com/garlandz-db/bundle-examples --template-dir contrib/templates/scala-job
  2. databricks bundle deploy -t dev
  3. databricks bundle run

Run URL: https://e2-dogfood.staging.cloud.databricks.com/?o=6051921418418893#job/636237385400036/run/484758087823306

2025-02-14 12:50:21 "[dev garland_zhang] project_name" RUNNING 
2025-02-14 13:01:15 "[dev garland_zhang] project_name" TERMINATED SUCCESS 
Hello, World foo123!
Running in a Databricks cluster
Showing range ...
+---+
| id|
+---+
|  0|
|  1|
|  2|
+---+
Showing nyctaxi trips ...
+--------------------+---------------------+-------------+-----------+----------+-----------+-----------+
|tpep_pickup_datetime|tpep_dropoff_datetime|trip_distance|fare_amount|pickup_zip|dropoff_zip| testresult|
+--------------------+---------------------+-------------+-----------+----------+-----------+-----------+
| 2016-02-16 22:40:45|  2016-02-16 22:59:25|         5.35|       18.5|     10003|      11238|test: 11238|
| 2016-02-05 16:06:44|  2016-02-05 16:26:03|          6.5|       21.5|     10282|      10001|test: 10001|
| 2016-02-08 07:39:25|  2016-02-08 07:44:14|          0.9|        5.5|     10119|      10003|test: 10003|
| 2016-02-29 22:25:33|  2016-02-29 22:38:09|          3.5|       13.5|     10001|      11222|test: 11222|
| 2016-02-03 17:21:02|  2016-02-03 17:23:24|          0.3|        3.5|     10028|      10028|test: 10028|
| 2016-02-10 00:47:44|  2016-02-10 00:53:04|          0.0|        5.0|     10038|      10005|test: 10005|
| 2016-02-19 03:24:25|  2016-02-19 03:44:56|         6.57|       21.5|     10001|      11377|test: 11377|
| 2016-02-02 14:05:23|  2016-02-02 14:23:07|         1.08|       11.5|     10103|      10167|test: 10167|
| 2016-02-20 15:42:20|  2016-02-20 15:50:40|          0.8|        7.0|     10003|      10011|test: 10011|
| 2016-02-14 16:19:53|  2016-02-14 16:32:10|          1.3|        9.0|     10199|      10020|test: 10020|
+--------------------+---------------------+-------------+-----------+----------+-----------+-----------+
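As a side note, the testresult column in the output above looks like it is derived from dropoff_zip by prefixing it with "test: ". A minimal plain-Scala sketch of that mapping (TestResult and testResult are illustrative names, not from the template):

```scala
object TestResult {
  // Derive the `testresult` column value from a dropoff zip code,
  // matching rows like dropoff_zip = 11238 -> testresult = "test: 11238".
  def testResult(dropoffZip: Int): String = s"test: $dropoffZip"

  def main(args: Array[String]): Unit =
    println(testResult(11238)) // prints "test: 11238"
}
```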

"default": "2.12",
"order": 12
},
"scala_maintenance_version": {
Contributor Author:
Spent too long on this... there are no functions, AFAICT, in the template language, so I can't extract from the user string :/

Contributor:

What's going on? Can you explain?

Contributor:

Isn't it okay to just have hardcoded defaults here anyway? If you do need to do things with versions, there are things you can do with regexes and other helpers, see e.g. https://github.com/databricks/cli/blob/main/libs/template/templates/dbt-sql/template/%7B%7B.project_name%7D%7D/profile_template.yml.tmpl#L8 and https://github.com/databricks/cli/blob/main/libs/template/templates/dbt-sql/template/%7B%7B.project_name%7D%7D/dbt_profiles/profiles.yml.tmpl#L20. Or you could ask ChatGPT for some help; just indicate that you're working with Go text/template.

Comment on lines 9 to 11
"jar_dest_path": {
"type": "string",
"description": "Destination path in Databricks where the JAR will be stored. Note: You must create a Volumes first if you plan to store the JAR there.",
Contributor:

Maybe better to call this "artifacts_dest_path" to make this more than just JARs.

Add an example to the description

"description": "Destination path in Databricks where the JAR will be stored. Note: You must create a Volumes first if you plan to store the JAR there.",
"order": 3
},
"cluster_key": {
Contributor:

Suggested change
"cluster_key": {
"existing_cluster_id": {

"default": "2.12",
"order": 12
},
"scala_maintenance_version": {
Contributor:

What's going on? Can you explain?

Comment on lines 45 to 50
"scala_version": {
"type": "string",
"description": "Scala version (e.g., 2.12). Run scala -version to find it. Note: Only support 2.12 and 2.13",
"default": "2.12",
"order": 12
},
Contributor:

We know the Scala version they MUST use. Should this be offered as an option?

name: {{.project_name}}

workspace:
host: {{workspace_host}}
Contributor:

Make this an option in the list. Default can be {{workspace_host}}

build: sbt package && sbt assembly
path: .
files:
- source: {{template `jar_path` .}}
Contributor:

this needs to include ${workspace.current_user.short_name}

Contributor Author:

This specifies the local path where the target is generated; we don't need the name here.


object Main {
def main(args: Array[String]): Unit = {
println("Hello, World foo123!")
Contributor:

Suggested change
println("Hello, World foo123!")
println("Hello, World!")

@garlandz-db requested a review from nija-at February 19, 2025 13:37
@garlandz-db changed the title "Dabs test" to "Add DABs template for scala job" Feb 19, 2025
"custom_workspace_host": {
"type": "string",
"default": "{{workspace_host}}",
"description": "Workspace host url",
Contributor:

Add an example for clarity and state what will be the default if not specified.

Contributor Author:

This seems redundant though because the default already shows up in the text prompt

Contributor:

I see. Ok.

},
"artifacts_dest_path": {
"type": "string",
"description": "Destination path in Databricks where the JAR and other artifacts will be stored (e.g.; /Volumes/main/{{short_name}}/scala_job_test). Note: If you use Volumes, You must create a Volumes first and the path should start with /Volumes",
Contributor:

What does {{short_name}} in the description here do?

Contributor Author:

It's a parameter later replaced with their name. It's not needed if we append their name at the end.

},
"artifacts_dest_path": {
"type": "string",
"description": "Destination path in Databricks where the JAR and other artifacts will be stored (e.g.; /Volumes/main/{{short_name}}/scala_job_test). Note: If you use Volumes, You must create a Volumes first and the path should start with /Volumes",
Contributor:

Can it be anything else other than a volume? If not, adjust the text.

Contributor Author:

I believe they can also upload to WSFS

Contributor:

Do we want to offer this option now? Have we tested it?

Contributor Author:

I just tested it again. This does work with upload not currently running: https://e2-dogfood.staging.cloud.databricks.com/?o=6051921418418893#job/96250107323625/run/23505530243179. The caveat though is it needs to start with / for example /Workspace/Users/garland.zhang@databricks.com/...

Contributor Author:

update: Oh the upload works but the run fails

Jar Libraries from /Workspace is not allowed on shared UC clusters.

val cp = (assembly / fullClasspath).value
cp filter { _.data.getName.matches("scala-.*") } // remove Scala libraries
}
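For context, a filter like the one above would typically sit inside sbt-assembly's assemblyExcludedJars setting; a minimal build.sbt sketch, assuming that setting key (the enclosing key is outside the quoted diff):

```scala
// build.sbt (sketch): exclude Scala standard-library jars from the assembly,
// since the Databricks runtime already provides them.
assembly / assemblyExcludedJars := {
  val cp = (assembly / fullClasspath).value
  cp filter { _.data.getName.matches("scala-.*") } // remove Scala libraries
}
```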

Contributor:

What do you think about adding this option as an option into sbt run.

Contributor Author:

If you mean running the assembly JAR locally, then yes.

Contributor:

Sorry, I was not clear. I meant adding the "--add-opens" option as a JVM parameter for sbt run.

https://www.scala-sbt.org/1.x/docs/Forking.html#Forked+JVM+options
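Per the sbt forking docs linked above, this would look roughly as follows in build.sbt (a sketch; without fork := true, javaOptions are ignored and the code runs in sbt's own JVM):

```scala
// build.sbt (sketch): fork `sbt run` so the added JVM option reaches
// the forked JVM rather than being silently ignored.
run / fork := true
run / javaOptions += "--add-opens=java.base/java.nio=ALL-UNNAMED"
```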

@lennartkats-db (Contributor) left a comment

Great first version!!! I added feedback inline, happy to discuss further

@@ -0,0 +1 @@
This is where all experimental DABs go. Experimental is anything new that is not fully recommended yet for users to try out.
Contributor:

Could you move it up one directory and indicate in the template's README that it's "experimental"? contrib already indicates that it's "unauthorized" and I'd rather avoid adding another level

},
"custom_workspace_host": {
"type": "string",
"default": "{{workspace_host}}",
Contributor:

Could you follow the convention of https://github.com/databricks/cli/blob/main/libs/template/templates/default-sql/databricks_template_schema.json#L2 here, where the URL is just printed and not editable? When it's editable, we can't really derive other properties from the current session anymore, like the current user name, catalog support, or cloud

},
"artifacts_dest_path": {
"type": "string",
"description": "Destination path in Databricks where the JAR and other artifacts will be stored. You can use /Workspace or /Volumes in Dedicated clusters but only /Volumes in Standard clusters.",
Contributor:

Given that some of these questions need some extra guidance, could you use the multi-line conventions as seen in the latest templates, i.e. https://github.com/databricks/cli/blob/main/libs/template/templates/default-sql/databricks_template_schema.json#L16

},
"existing_cluster_id": {
"type": "string",
"description": "Enter the cluster id for an existing cluster or leave empty to create a new cluster",
Contributor:

Could we avoid this question?

  • We don't recommend using all-purpose clusters for production. For development, users are always free to use all-purpose and they can follow docs (or README.md guidance to do so)
  • We want to bias toward a minimal number of questions in the templates

"type": "string",
"enum": ["Standard", "Dedicated"],
"default": "Standard",
"description": "Select cluster type: Dedicated or Standard. If Standard, is the JAR allowlisted by the admin for your workspace? (If not, inform admin: https://docs.databricks.com/en/data-governance/unity-catalog/manage-privileges/allowlist.html)",
Contributor:

Could we just standardize on 'standard' clusters and offer README.md and/or comment guidance on how to change this after the fact? That avoids the setup-time overhead in understanding and making this decision.

Contributor:

Re. informing the admin: that advice seems appropriate to give as part of the volume path question.

(Also, if we didn't use a 'standard' cluster then we wouldn't need to ask about a volume path, saving a lot of setup steps for customers.)

- source: {{template `jar_path` .}}

resources:
jobs:
Contributor:

This should be in a separate resources/{{.project_name}}.job.yml.tmpl file as seen in https://github.com/databricks/cli/blob/main/libs/template/templates/default-python/template/%7B%7B.project_name%7D%7D/resources/%7B%7B.project_name%7D%7D.job.yml.tmpl#L1. That folder should also have a .gitkeep.

node_type_id: i3.xlarge # Default instance type (can be changed)
autoscale:
max_workers: 2
min_workers: 2
Contributor:

Why not min 1, max 4?

@@ -0,0 +1 @@
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "2.0.0")
Contributor:

File should have a comment at the top, also explaining what the project/ folder is about. (A separate README for that seems like overkill.)

@@ -0,0 +1,41 @@
package com.examples
Contributor:

File should have a comment at the top


// to run with new jvm options, a fork is required otherwise it uses same options as sbt process
run / fork := true
run / javaOptions += "--add-opens=java.base/java.nio=ALL-UNNAMED"

# This is a Databricks asset bundle definition for {{.project_name}}.
# See https://docs.databricks.com/dev-tools/bundles/index.html for documentation.
bundle:
name: {{.project_name}}
Contributor:

Suggested change
name: {{.project_name}}
name: {{.project_name}}
uuid: {{bundle_uuid}}

All the newer templates include a uuid, which can help for diagnostic and metrics use cases.

@lennartkats-db (Contributor) left a comment

A few more nits, otherwise this looks great!

{{- end}}

{{ define `organization` -}}
com.examples
Contributor:

Nit: this should probably just be hardcoded? Since it's also hardcoded in the src/main.scala/com/examples path below?

Contributor Author:

There's another use in build.sbt.tmpl, so this reduces the hardcoding as much as possible.

garlandz-db and others added 10 commits February 28, 2025 02:13
Co-authored-by: Lennart Kats (databricks) <lennart.kats@databricks.com>
@garlandz-db (Contributor Author):

Seems like the user still needs to pass in the VM options, since IntelliJ's run button doesn't seem to pick up these Java options.

@garlandz-db (Contributor Author):

jenkins merge

@andrewnester merged commit 7679a84 into databricks:main Feb 28, 2025
1 check passed