Add a new script to run BigQuery queries with python client #694
Changes from all commits
ea02d17
c20bb97
fa9d63d
1a92778
1ad61f1
db478ef
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1,13 +1,4 @@ | ||
| As of 2025, Google BigQuery allows publishing benchmark results, which was not the case earlier. | ||
|
|
||
| It's very difficult to find, how to create a database. | ||
| Databases are named "datasets". You need to press on `⋮` near project. | ||
|
|
||
| Create dataset `test`. | ||
| Go to the query editor and paste the contents of `create.sql`. | ||
| It will take two seconds to create a table. | ||
|
|
||
| Download Google Cloud CLI: | ||
| Download Google Cloud CLI and configure your project settings. You can skip this step if you are using [Cloud shell](https://docs.cloud.google.com/shell/docs/launching-cloud-shell): | ||
| ``` | ||
| wget --continue --progress=dot:giga https://dl.google.com/dl/cloudsdk/channels/rapid/downloads/google-cloud-cli-linux-x86_64.tar.gz | ||
| tar -xf google-cloud-cli-linux-x86_64.tar.gz | ||
|
|
| @@ -16,7 +7,12 @@ source .bashrc | ||
| ./google-cloud-sdk/bin/gcloud init | ||
| ``` | ||
|
|
||
| Load the data: | ||
| Create the dataset and table in BigQuery: | ||
| ``` | ||
| ./create.sh | ||
|
Member
I followed the steps on l. 3-7 (all successful), then What do I need to fix this?
Author
Did |
||
| ``` | ||
|
|
||
| Load the data in the table: | ||
| ``` | ||
| wget --continue --progress=dot:giga 'https://datasets.clickhouse.com/hits_compatible/hits.csv.gz' | ||
| gzip -d -f hits.csv.gz | ||
|
|
| @@ -26,13 +22,7 @@ command time -f '%e' bq load --source_format CSV --allow_quoted_newlines=1 test. | ||
| ``` | ||
|
|
||
| Run the benchmark: | ||
|
|
||
| ``` | ||
| ./run.sh 2>&1 | tee log.txt | ||
|
|
||
| cat log.txt | | ||
| grep -P '^real|^Error' | | ||
| sed -r -e 's/^Error.*$/null/; s/^real\s*([0-9.]+)m([0-9.]+)s$/\1 \2/' | | ||
| awk '{ if ($2 != "") { print $1 * 60 + $2 } else { print $1 } }' | | ||
| awk '{ if ($1 == "null") { skip = 1 } else { if (i % 3 == 0) { printf "[" }; printf skip ? "null" : $1; if (i % 3 != 2) { printf "," } else { print "]," }; ++i; skip = 0; } }' | ||
| pip install google-cloud-bigquery | ||
| python3 run_queries.py > results.txt 2> log.txt | ||
| ``` | ||
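For reference, one reading of the `grep`/`sed`/`awk` pipeline shown in this hunk is sketched below in Python. It is a hedged illustration, not part of the PR: the function name `parse_log` and the constant `TRIES` are hypothetical, and the input is assumed to be `time`-style `real XmY.YYYs` lines with an `Error...` line preceding the timing of a failed run. As in the second `awk` stage, an `Error` line marks the next timing as `null` rather than adding an entry of its own.

```python
import json
import re

TRIES = 3  # assumption: three timings per query, matching the `i % 3` grouping

REAL_LINE = re.compile(r"^real\s*([0-9.]+)m([0-9.]+)s$")


def parse_log(lines):
    """Group `real` timings (in seconds) into rows of TRIES; None marks a failed run."""
    values = []
    failed = False
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("Error"):
            failed = True  # the timing that follows belongs to a failed run
            continue
        match = REAL_LINE.match(line)
        if match:
            seconds = float(match.group(1)) * 60 + float(match.group(2))
            values.append(None if failed else seconds)
            failed = False
    return [values[i:i + TRIES] for i in range(0, len(values), TRIES)]


if __name__ == "__main__":
    sample = ["real\t0m1.500s", "Error in query", "real\t0m0.200s", "real\t1m0.000s"]
    for row in parse_log(sample):
        print(json.dumps(row) + ",")
```

`json.dumps` renders `None` as `null`, which keeps the row format the benchmark results expect.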
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,5 @@ | ||
| #!/bin/bash | ||
|
|
||
| bq mk --dataset test | ||
|
|
||
| bq query --use_legacy_sql=false < create.sql |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,68 @@ | ||
| from google.cloud import bigquery | ||
| from google.cloud.bigquery.enums import JobCreationMode | ||
|
|
||
| import sys | ||
| from typing import TextIO, Any | ||
| from datetime import datetime | ||
|
|
||
| def log(*objects: Any, sep: str = ' ', end: str = '\n', file: TextIO = sys.stderr, severity: str = 'INFO') -> None: | ||
| """ | ||
| Mimics the built-in print() function signature but prepends a | ||
| timestamp and a configurable severity level to the output. | ||
|
|
||
| Args: | ||
| *objects: The objects to be printed (converted to strings). | ||
| sep (str): Separator inserted between values, default a space. | ||
| end (str): String appended after the last value, default a newline. | ||
| file (TextIO): Object with a write(string) method, default sys.stderr. | ||
| severity (str): The log level (e.g., "INFO", "WARNING", "ERROR"). | ||
| """ | ||
| # 1. Prepare the standard print content | ||
| # Join the string form of each object with the specified separator | ||
| message = sep.join(str(obj) for obj in objects) | ||
|
|
||
| # 2. Prepare the log prefix | ||
| timestamp = datetime.now().strftime("%Y-%m-%d %H:%M:%S") | ||
| prefix = f"[{timestamp}] [{severity.upper()}]: " | ||
|
|
||
| # 3. Combine the prefix and the message | ||
| full_message = prefix + message | ||
|
|
||
| # 4. Use the file.write method to output the content | ||
| # The 'end' argument is handled explicitly here | ||
| file.write(full_message + end) | ||
|
|
||
| # Ensure the buffer is flushed (important for file/stream output) | ||
| if file is not sys.stdout and file is not sys.stderr: | ||
| file.flush() | ||
|
|
||
|
|
||
| job_config = bigquery.QueryJobConfig() | ||
| job_config.use_query_cache = False | ||
| client = bigquery.Client( | ||
| default_job_creation_mode=JobCreationMode.JOB_CREATION_OPTIONAL | ||
| ) | ||
|
|
||
| file = open('queries.sql', 'r') | ||
| TRIES = 3 | ||
| for query in file: | ||
| query = query.strip() | ||
| print("[", end='') | ||
| for i in range(TRIES): | ||
| log(f"\n[{i}]: {query}") | ||
| try: | ||
| client_start_time = datetime.now() | ||
| results = client.query_and_wait(query, job_config=job_config) | ||
| client_end_time = datetime.now() | ||
|
|
||
| client_time = client_end_time - client_start_time | ||
| client_time_secs = client_time.total_seconds() | ||
| endstr = "],\n" if i == TRIES - 1 else "," | ||
| print(f"{client_time_secs}", end=endstr) | ||
|
|
||
| log(f"Job ID: **{results.job_id}**") | ||
| log(f"Query ID: **{results.query_id}**") | ||
| log(f"Client time: **{client_time}**") | ||
|
|
||
| except Exception as e: | ||
| log(f"Job failed with error: {e}", severity="ERROR") |
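A possible variation on the timing loop above, offered as a sketch rather than as the PR's code. In the script as shown, a failed try logs the error but prints nothing to the results stream, so a row can end up with fewer than `TRIES` entries and no closing bracket. The sketch writes `null` for a failed try (the same convention the shell pipeline removed from the README used for `Error` lines) and measures elapsed time with `time.perf_counter()`. The names `run_benchmark`, `run_query`, and `fake_runner` are hypothetical; in the real script the callable would wrap `client.query_and_wait(query, job_config=job_config)`.

```python
import sys
import time
from typing import Callable, Iterable, TextIO

TRIES = 3  # same number of tries per query as the script above


def run_benchmark(queries: Iterable[str],
                  run_query: Callable[[str], object],
                  out: TextIO = sys.stdout,
                  err: TextIO = sys.stderr) -> None:
    """Write one `[t1,t2,t3],` row per query; a failed try becomes `null`."""
    for query in queries:
        query = query.strip()
        if not query:
            continue  # assumption: blank lines in the query file are skipped
        timings = []
        for i in range(TRIES):
            start = time.perf_counter()  # monotonic clock, suited to intervals
            try:
                run_query(query)
                timings.append(f"{time.perf_counter() - start:.3f}")
            except Exception as exc:  # keep going so the row stays complete
                err.write(f"[{i}] failed: {exc}\n")
                timings.append("null")
        out.write("[" + ",".join(timings) + "],\n")


if __name__ == "__main__":
    # Stand-in runner so the sketch runs without BigQuery credentials.
    def fake_runner(query: str) -> None:
        if "bad" in query:
            raise RuntimeError("simulated failure")

    run_benchmark(["SELECT 1;\n", "SELECT bad;\n"], fake_runner)
```

Passing the query runner in as a callable keeps the loop testable offline; reading the queries inside `with open('queries.sql') as f:` would also close the file, which the script above leaves open.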
L. 6: That should be `source ~/.bashrc` please