Add DABs template for scala job #66
Conversation
Force-pushed from d3817fc to 5f79aec
| "default": "2.12", | ||
| "order": 12 | ||
| }, | ||
| "scala_maintenance_version": { |
Spent too long on this... there are no functions AFAICT in the template language, so I can't extract it from the user's string :/
What's going on? Can you explain?
Isn't it okay to just have hardcoded defaults here anyway? If you do need to do things with versions, there are things you can do with regexes and other helpers, see e.g. https://github.com/databricks/cli/blob/main/libs/template/templates/dbt-sql/template/%7B%7B.project_name%7D%7D/profile_template.yml.tmpl#L8 and https://github.com/databricks/cli/blob/main/libs/template/templates/dbt-sql/template/%7B%7B.project_name%7D%7D/dbt_profiles/profiles.yml.tmpl#L20. Or you could ask ChatGPT for some help; just indicate that you're working with Go text/template.
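For reference, a minimal sketch of how regex-style helpers work in Go text/template in general. The `majorMinor` helper below is hypothetical and is registered by the hosting program through a FuncMap; the actual helper set exposed by the Databricks CLI template engine may differ (see the dbt-sql template linked above).

```go
package main

import (
	"os"
	"regexp"
	"text/template"
)

func main() {
	// Plain text/template has no built-in string functions; the program that
	// executes the template registers helpers through a FuncMap.
	funcs := template.FuncMap{
		// Hypothetical helper: extract the "major.minor" prefix of a version string.
		"majorMinor": func(v string) string {
			return regexp.MustCompile(`^\d+\.\d+`).FindString(v)
		},
	}
	t := template.Must(template.New("scala").Funcs(funcs).Parse("{{ majorMinor . }}\n"))
	_ = t.Execute(os.Stdout, "2.12.18") // prints "2.12"
}
```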
| "jar_dest_path": { | ||
| "type": "string", | ||
| "description": "Destination path in Databricks where the JAR will be stored. Note: You must create a Volumes first if you plan to store the JAR there.", |
Maybe better to call this "artifacts_dest_path" to make this more than just JARs.
Add an example to the description
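For instance, a rough sketch of the renamed entry with an example path added to the description (the property name follows the suggestion above; the example path and wording are illustrative):

```json
"artifacts_dest_path": {
  "type": "string",
  "description": "Destination path in Databricks where the JAR and other artifacts will be stored, e.g. /Volumes/main/default/my_artifacts. Note: when using a Volume, it must already exist.",
  "order": 3
}
```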
| "description": "Destination path in Databricks where the JAR will be stored. Note: You must create a Volumes first if you plan to store the JAR there.", | ||
| "order": 3 | ||
| }, | ||
| "cluster_key": { |
| "cluster_key": { | |
| "existing_cluster_id": { |
| "default": "2.12", | ||
| "order": 12 | ||
| }, | ||
| "scala_maintenance_version": { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's going on? Can you explain?
| "scala_version": { | ||
| "type": "string", | ||
| "description": "Scala version (e.g., 2.12). Run scala -version to find it. Note: Only support 2.12 and 2.13", | ||
| "default": "2.12", | ||
| "order": 12 | ||
| }, |
We know the Scala version they MUST use. Should this be offered as an option?
| name: {{.project_name}}
|
| workspace:
| host: {{workspace_host}}
Make this an option in the list. Default can be {{workspace_host}}
| build: sbt package && sbt assembly
| path: .
| files:
| - source: {{template `jar_path` .}}
this needs to include ${workspace.current_user.short_name}
This specifies the local path where the target is generated; we don't need the name here.
| object Main {
| def main(args: Array[String]): Unit = {
| println("Hello, World foo123!")
Suggested change:
| - println("Hello, World foo123!")
| + println("Hello, World!")
contrib/templates/scala-job/template/local_dependencies/dbconnect-scala-16.2.0-dist.tar.gz (outdated; resolved)
| "custom_workspace_host": { | ||
| "type": "string", | ||
| "default": "{{workspace_host}}", | ||
| "description": "Workspace host url", |
Add an example for clarity and state what will be the default if not specified.
This seems redundant though because the default already shows up in the text prompt
I see. Ok.
| },
| "artifacts_dest_path": {
| "type": "string",
| "description": "Destination path in Databricks where the JAR and other artifacts will be stored (e.g.; /Volumes/main/{{short_name}}/scala_job_test). Note: If you use Volumes, You must create a Volumes first and the path should start with /Volumes",
What does {{short_name}} in the description here do?
It's a parameter that is later replaced with their name. It's not needed if we append their name at the end.
| },
| "artifacts_dest_path": {
| "type": "string",
| "description": "Destination path in Databricks where the JAR and other artifacts will be stored (e.g.; /Volumes/main/{{short_name}}/scala_job_test). Note: If you use Volumes, You must create a Volumes first and the path should start with /Volumes",
Can it be anything else other than a volume? If not, adjust the text.
I believe they can also upload to WSFS
Do we want to offer this option now? Have we tested it?
I just tested it again. This does work with upload (not currently running): https://e2-dogfood.staging.cloud.databricks.com/?o=6051921418418893#job/96250107323625/run/23505530243179. The caveat though is that it needs to start with /, for example /Workspace/Users/garland.zhang@databricks.com/...
Update: oh, the upload works but the run fails with "Jar Libraries from /Workspace is not allowed on shared UC clusters."
| val cp = (assembly / fullClasspath).value
| cp filter { _.data.getName.matches("scala-.*") } // remove Scala libraries
| }
What do you think about adding this option as an option to sbt run?
If you mean running the assembly JAR locally, then yes.
Sorry, I was not clear. I meant adding the "--add-opens" option as a JVM parameter for sbt run:
https://www.scala-sbt.org/1.x/docs/Forking.html#Forked+JVM+options
lennartkats-db left a comment
Great first version!!! I added feedback inline, happy to discuss further
| @@ -0,0 +1 @@
| This is where all experimental DABs go. Experimental is anything new that is not fully recommended yet for users to try out.
Could you move it up one directory and indicate in the template's README that it's "experimental"? contrib already indicates that it's "unauthorized" and I'd rather avoid adding another level
| },
| "custom_workspace_host": {
| "type": "string",
| "default": "{{workspace_host}}",
Could you follow the convention of https://github.com/databricks/cli/blob/main/libs/template/templates/default-sql/databricks_template_schema.json#L2 here, where the URL is just printed and not editable? When it's editable, we can't really derive other properties from the current session anymore, like the current user name, catalog support, or cloud
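For reference, a rough sketch of that convention as an assumption based on the newer templates (not copied from the linked file): the selected host is printed in the schema's welcome_message rather than asked as a question.

```json
"welcome_message": "\nWelcome to the Scala job template for Databricks Asset Bundles!\n\nA workspace was selected based on your current profile: {{workspace_host}}"
```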
| },
| "artifacts_dest_path": {
| "type": "string",
| "description": "Destination path in Databricks where the JAR and other artifacts will be stored You can use /Workspace or /Volumes in Dedicated clusters but only /Volumes in Standard clusters.",
Given that some of these questions need some extra guidance, could you use the multi-line conventions as seen in the latest templates, i.e. https://github.com/databricks/cli/blob/main/libs/template/templates/default-sql/databricks_template_schema.json#L16
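A rough sketch of that multi-line style (exact conventions should be taken from the linked default-sql schema; the wording here is illustrative):

```json
"artifacts_dest_path": {
  "type": "string",
  "description": "\nDestination path in Databricks for the JAR and other artifacts.\nUse /Workspace or /Volumes with Dedicated clusters, but only /Volumes with Standard clusters.\nExample: /Volumes/main/default/my_artifacts",
  "order": 3
}
```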
| },
| "existing_cluster_id": {
| "type": "string",
| "description": "Enter the cluster id for an existing cluster or leave empty to create a new cluster",
Could we avoid this question?
- We don't recommend using all-purpose clusters for production. For development, users are always free to use all-purpose and they can follow docs (or README.md guidance to do so)
- We want to bias toward a minimal number of questions in the templates
| "type": "string", | ||
| "enum": ["Standard", "Dedicated"], | ||
| "default": "Standard", | ||
| "description": "Select cluster type: Dedicated or Standard. If Standard, is the JAR allowlisted by the admin for your workspace? (If not, inform admin: https://docs.databricks.com/en/data-governance/unity-catalog/manage-privileges/allowlist.html)", |
Could we just standardize on 'standard' clusters and offer README.md and/or comment guidance on how to change this after the fact? That avoids the setup-time overhead in understanding and making this decision.
Re. informing the admin: that advice seems appropriate to give as part of the volume path question.
(Also, if we didn't use a 'standard' cluster then we wouldn't need to ask about a volume path, saving a lot of setup steps for customers.)
| - source: {{template `jar_path` .}}
|
| resources:
| jobs:
This should be in a separate resources/{{.project_name}}.job.yml.tmpl file as seen in https://github.com/databricks/cli/blob/main/libs/template/templates/default-python/template/%7B%7B.project_name%7D%7D/resources/%7B%7B.project_name%7D%7D.job.yml.tmpl#L1. That folder should also have a .gitkeep.
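A rough sketch of that layout, assuming a single Spark JAR task (the task key, main class, and JAR reference below are illustrative placeholders, not the template's actual values):

```yaml
# resources/{{.project_name}}.job.yml.tmpl
resources:
  jobs:
    {{.project_name}}_job:
      name: {{.project_name}}_job
      tasks:
        - task_key: main_task
          spark_jar_task:
            main_class_name: com.examples.Main   # illustrative
          libraries:
            - jar: {{template `jar_path` .}}     # illustrative; reuse the existing JAR reference
          # job cluster settings as currently defined in databricks.yml.tmpl
```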
| node_type_id: i3.xlarge # Default instance type (can be changed)
| autoscale:
| max_workers: 2
| min_workers: 2
Why not min 1, max 4?
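That is, something like:

```yaml
autoscale:
  min_workers: 1
  max_workers: 4
```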
| @@ -0,0 +1 @@
| addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "2.0.0")
File should have a comment at the top, also explaining what the project/ folder is about. (A separate README for that seems like overkill.)
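For example, something along these lines (the comment wording is only a suggestion):

```scala
// project/ holds sbt's build definition support files (plugins, sbt version, helpers).
// This file declares sbt plugins; sbt-assembly builds the fat JAR that is deployed to Databricks.
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "2.0.0")
```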
| @@ -0,0 +1,41 @@
| package com.examples
File should have a comment at the top
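For example (wording is only a suggestion):

```scala
// Example Scala job for this bundle.
// Main.main runs on the job cluster; replace its body with your own logic.
package com.examples
```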
| // to run with new jvm options, a fork is required otherwise it uses same options as sbt process
| run / fork := true
| run / javaOptions += "--add-opens=java.base/java.nio=ALL-UNNAMED"
This folder should also have a .gitignore file. Basis for that: https://github.com/databricks/cli/blob/main/libs/template/templates/default-python/template/%7B%7B.project_name%7D%7D/.gitignore
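A rough sketch of what that .gitignore could contain, adapted for sbt (entries are illustrative, not copied from the linked file):

```
# Databricks CLI state
.databricks/

# sbt / Scala build output
target/
project/target/
.bsp/

# IDE files
.idea/
.vscode/
```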
| # This is a Databricks asset bundle definition for {{.project_name}}.
| # See https://docs.databricks.com/dev-tools/bundles/index.html for documentation.
| bundle:
| name: {{.project_name}}
Suggested change:
|   name: {{.project_name}}
| + uuid: {{bundle_uuid}}
All the newer templates include a uuid, which can help for diagnostic and metrics use cases.
lennartkats-db left a comment
A few more nits, otherwise this looks great!
| {{- end}}
|
| {{ define `organization` -}}
| com.examples
Nit: this should probably just be hardcoded? Since it's also hardcoded in the src/main.scala/com/examples path below?
There's another use in build.sbt.tmpl, so this just reduces the hardcoding as much as possible.
Outdated, resolved review threads:
- contrib/templates/scala-job/template/{{.project_name}}/README.md.tmpl (3 threads)
- contrib/templates/scala-job/template/{{.project_name}}/databricks.yml.tmpl (2 threads)
Seems like the user still needs to pass in the VM options, since IntelliJ's run button doesn't seem to pick up these Java options.
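For reference, the option from build.sbt that would need to be pasted into IntelliJ's run configuration under VM options (assuming this is the option in question):

```
--add-opens=java.base/java.nio=ALL-UNNAMED
```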
jenkins merge |
Instructions:
databricks bundle init https://github.com/garlandz-db/bundle-examples --template-dir contrib/templates/scala-job
databricks bundle deploy -t dev
databricks bundle run
Run URL: https://e2-dogfood.staging.cloud.databricks.com/?o=6051921418418893#job/636237385400036/run/484758087823306