Add geothermal electricity extraction support #400

Open
bpulluta wants to merge 23 commits into main from feature/geothermal-extraction-pr

Collaborator

@bpulluta bpulluta commented Mar 20, 2026

Overview

This PR adds geothermal electricity as a supported extraction technology in COMPASS, including the extraction schema and plugin configuration needed to discover, retrieve, and extract structured ordinance data from jurisdictions governing utility-scale geothermal electricity generation.

Additional improvements:

  • Clarifies the output directory policy for the CLI, ensuring users understand that --out_dir_exists must be set via the CLI
  • Fixes two bugs in the retrieval layer affecting all technologies.

New: Geothermal Electricity Extraction

Files added:

  • compass/extraction/geothermal_electricity/geothermal_schema.json — Defines 29 extractable features including setbacks, permitting, noise limits, zoning classifications, decommissioning, and drilling requirements.
  • compass/extraction/geothermal_electricity/geothermal_plugin_config.yaml — Configures search queries, website scoring keywords, heuristic filters, and document collection behavior tuned for geothermal electricity ordinances.

The schema follows the standard COMPASS one-shot extraction format and is compatible with the existing compass process pipeline.


CLI Improvements

  • A new --out_dir_exists option controls what happens when the output directory already exists: fail, overwrite it, or create a new incremented directory automatically.
  • When the option is not set, interactive runs prompt the user and non-interactive (automated) runs default to fail.

Bug Fixes

Bug Fix 1 — PDF URLs with spaces failed to download

  • crawl4ai can return document URLs with raw spaces in the path. These are now percent-encoded before download, ensuring correct retrieval.
  • File: compass/scripts/download.py

Bug Fix 2 — Anchor text was never used in link scoring

  • Anchor text is now correctly populated and used in link scoring, improving document discovery.
  • File: compass/web/website_crawl.py

bpulluta and others added 10 commits March 17, 2026 12:11
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
…rmatting (#399)

* Initial plan

* Fix all review comments in skills documentation

Co-authored-by: bpulluta <115118857+bpulluta@users.noreply.github.com>

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: bpulluta <115118857+bpulluta@users.noreply.github.com>
…eval

- percent-encode raw spaces in crawl4ai PDF source URLs before downstream use
- populate link text field from anchor text so ELMLinkScorer can score link labels
- add two regression tests covering both fixes
Copilot AI review requested due to automatic review settings March 20, 2026 23:49

bpulluta and others added 11 commits March 20, 2026 23:06
… test (#401)

* Initial plan

* Extract shared _sanitize_url to url_utils.py, simplify to space-only encoding, fix test robustness

Co-authored-by: bpulluta <115118857+bpulluta@users.noreply.github.com>
Agent-Logs-Url: https://github.com/NatLabRockies/COMPASS/sessions/ceb782b4-c312-41d1-b4eb-eccbbef67097

* fix failing test

* ruff error fix

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: bpulluta <115118857+bpulluta@users.noreply.github.com>
Bumps [release-drafter/release-drafter](https://github.com/release-drafter/release-drafter) from 7.0.0 to 7.1.1.
- [Release notes](https://github.com/release-drafter/release-drafter/releases)
- [Commits](release-drafter/release-drafter@3a7fb5c...139054a)

---
updated-dependencies:
- dependency-name: release-drafter/release-drafter
  dependency-version: 7.1.1
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* Add COMPASS workflow skills

* Added one-shot skills

* update one-shot SKILL.md structure and trigger contracts

* Initial plan (#398)

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>

* Fix skills documentation: correct paths, caching behavior, and tab formatting (#399)

* Initial plan

* Fix all review comments in skills documentation

Co-authored-by: bpulluta <115118857+bpulluta@users.noreply.github.com>

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: bpulluta <115118857+bpulluta@users.noreply.github.com>

* renamed skills and fixed minor comments

* udpated skills Paul review march 26

---------

Co-authored-by: Copilot <198982749+Copilot@users.noreply.github.com>
Bumps [codecov/codecov-action](https://github.com/codecov/codecov-action) from 5 to 6.
- [Release notes](https://github.com/codecov/codecov-action/releases)
- [Changelog](https://github.com/codecov/codecov-action/blob/main/CHANGELOG.md)
- [Commits](codecov/codecov-action@v5...v6)

---
updated-dependencies:
- dependency-name: codecov/codecov-action
  dependency-version: '6'
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Collaborator

@ppinchuk ppinchuk left a comment

I didn't dive deep into the schema yet, but there are a few code things I'd like to address before focusing on the schema.

I really like the idea of adding a configurable post-processing pipeline to the schema. I envision that we could build up a "library" of post-processing functions that are generic enough where different schemas can use them. Very cool idea and we should definitely use it.

I'm quite concerned about parsing the summary for drilling hour windows instead of relying on the LLM output. Are you sure this is required? Can you explain why you went with that choice @bpulluta? Is there not a way to shape the output of the LLM instead? I'm worried about the case where the summary has multiple windows, which we already saw in the sample documents you had.

Comment thread compass/_cli/process.py
Comment on lines +72 to +77
if out_dir_exists is not None:
    out_dir_policy = out_dir_exists
elif sys.stdin.isatty():
    out_dir_policy = "prompt"
else:
    out_dir_policy = "fail"

Please move this to be inside of _resolve_out_dir_conflict. This helps keep things organized
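
One way to act on this suggestion is to keep the default-policy choice in a single small helper that the conflict resolver calls first. The sketch below is illustrative only: `resolve_out_dir_policy` and its `is_tty` parameter are hypothetical names, not code from this PR. The policy strings ("prompt", "fail") are taken from the diff above.

```python
import sys


def resolve_out_dir_policy(out_dir_exists=None, is_tty=None):
    """Pick the policy for an existing output directory

    An explicit ``out_dir_exists`` value always wins. Otherwise
    interactive sessions get "prompt" and non-interactive ones "fail".
    ``is_tty`` is a hypothetical hook so the choice can be tested
    without a real terminal.
    """
    if out_dir_exists is not None:
        return out_dir_exists
    if is_tty is None:
        is_tty = sys.stdin.isatty()
    return "prompt" if is_tty else "fail"
```

Passing `is_tty` explicitly keeps the function testable in CI, where stdin is usually not a terminal.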

Comment thread compass/_cli/process.py
Comment on lines +164 to +168
if not out_dir.exists():
    return out_dir

if policy == "fail":
    return out_dir

please combine these into if not out_dir.exists() or policy == "fail"

Comment thread compass/_cli/process.py
Comment on lines +184 to +214
if not sys.stdin.isatty():
    msg = (
        "Cannot use out_dir_exists='prompt' in non-interactive mode. "
        "Use one of: fail, increment, overwrite."
    )
    raise click.ClickException(msg)

create_incremented = click.confirm(
    f"Output directory '{out_dir!s}' already exists. "
    "Create a new incremented directory automatically?",
    default=True,
)
if create_incremented:
    new_out_dir = _next_versioned_directory(out_dir)
    click.echo(f"Using incremented directory: {new_out_dir!s}")
    return new_out_dir

overwrite = click.confirm(
    f"Overwrite '{out_dir!s}' by deleting it and continuing?",
    default=False,
)
if overwrite:
    click.echo(f"Overwriting existing output directory: {out_dir!s}")
    shutil.rmtree(out_dir)
    return out_dir

msg = (
    "Run cancelled. Please update out_dir in config, or rerun with "
    "--out_dir_exists increment/overwrite."
)
raise click.ClickException(msg)

please move this logic into a separate function for code clarity

Comment thread compass/_cli/process.py
Comment on lines +229 to +233
while True:
    candidate = out_dir.parent / f"{out_dir.name}_v{idx}"
    if not candidate.exists():
        return candidate
    idx += 1

Please add a guard (e.g. if index goes over 1k or even 1M) to prevent infinite loops
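
A bounded loop is one way to add such a guard. This is a minimal sketch, not the PR's implementation: the starting index, the `max_index` default, and the choice of `RuntimeError` are all assumptions (the diff does not show where `idx` starts, and the CLI might prefer `click.ClickException`).

```python
from pathlib import Path

MAX_VERSION_INDEX = 1_000  # assumed cap; the review suggests 1k or 1M


def next_versioned_directory(out_dir, start=1, max_index=MAX_VERSION_INDEX):
    """Return the first ``<out_dir>_v<idx>`` path that does not exist

    Raises ``RuntimeError`` instead of looping forever if every
    candidate up to ``max_index`` is already taken.
    """
    out_dir = Path(out_dir)
    for idx in range(start, max_index + 1):
        candidate = out_dir.parent / f"{out_dir.name}_v{idx}"
        if not candidate.exists():
            return candidate
    msg = f"No free versioned directory for '{out_dir!s}' up to v{max_index}"
    raise RuntimeError(msg)
```

Using `range` instead of `while True` makes the upper bound part of the loop itself, so there is no separate counter check to forget.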

Comment on lines +309 to +310
"""
Optional short description of the type of data being extracted

Please undo this change. The style of this repo is to start a docstring immediately after the triple quote

Comment on lines +489 to +529
@staticmethod
def _extract_times_from_text(text):
    """
    Extract times from text as normalized 24-hour HH:MM
    strings
    """
    if not text:
        return []
    ampm_pattern = re.compile(
        r"(?<!\d)(\d{1,2})(?::(\d{2}))?\s*(a\.?m\.?|p\.?m\.?)",
        re.IGNORECASE,
    )
    hhmm_pattern = re.compile(r"\b([01]\d|2[0-4]):([0-5]\d)\b")
    out = []
    for match in ampm_pattern.finditer(text):
        hour = int(match.group(1))
        minute = int(match.group(2) or 0)
        ampm = match.group(3).lower().replace(".", "")
        if (
            hour < 1
            or hour > _MAGIC_HOUR_12
            or minute < 0
            or minute > _MAGIC_MINUTE_59
        ):
            continue
        if ampm == "am":
            hour = 0 if hour == _MAGIC_HOUR_12 else hour
        else:
            hour = (
                _MAGIC_HOUR_12
                if hour == _MAGIC_HOUR_12
                else hour + _MAGIC_HOUR_12
            )
        out.append(f"{hour:02d}:{minute:02d}")
    out.extend(
        [
            f"{int(match.group(1)):02d}:{int(match.group(2)):02d}"
            for match in hhmm_pattern.finditer(text)
        ]
    )
    return sorted(set(out))

Please move this to a regular module-level function instead of a static method

And once you do, could you please add some tests for this function specifically? I think it's critical to ensure that it does what we expect it to do. Try to think of typical edge cases we might see in the LLM output for this field and use those for testing
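
A possible starting point for those tests is sketched below, with the helper rewritten as a module-level function. The names `extract_times_from_text`, `MAX_12H_HOUR`, and `MAX_MINUTE` are hypothetical, and two behaviors differ from the diff and are assumptions rather than the PR's logic. First, reading the diff as shown, a string such as "10:00 PM" appears to match both patterns and would yield both "10:00" and "22:00", so this sketch skips HH:MM matches that overlap an AM/PM match. Second, the 24-hour pattern here stops at hour 23.

```python
import re

MAX_12H_HOUR = 12  # hypothetical replacement for _MAGIC_HOUR_12
MAX_MINUTE = 59  # hypothetical replacement for _MAGIC_MINUTE_59

_AMPM_PATTERN = re.compile(
    r"(?<!\d)(\d{1,2})(?::(\d{2}))?\s*(a\.?m\.?|p\.?m\.?)",
    re.IGNORECASE,
)
_HHMM_PATTERN = re.compile(r"\b([01]\d|2[0-3]):([0-5]\d)\b")


def extract_times_from_text(text):
    """Extract times from text as sorted, unique 24-hour HH:MM strings"""
    if not text:
        return []
    times = set()
    ampm_spans = []
    for match in _AMPM_PATTERN.finditer(text):
        hour = int(match.group(1))
        minute = int(match.group(2) or 0)
        if not 1 <= hour <= MAX_12H_HOUR or minute > MAX_MINUTE:
            continue
        ampm_spans.append(match.span())
        is_pm = match.group(3).lower().startswith("p")
        # 12 AM maps to 00, 12 PM stays 12, other PM hours gain 12
        hour = hour % MAX_12H_HOUR + (MAX_12H_HOUR if is_pm else 0)
        times.add(f"{hour:02d}:{minute:02d}")
    for match in _HHMM_PATTERN.finditer(text):
        # Assumption: a time already read as AM/PM is not re-read as 24-hour
        if any(start <= match.start() < end for start, end in ampm_spans):
            continue
        times.add(f"{int(match.group(1)):02d}:{int(match.group(2)):02d}")
    return sorted(times)


EDGE_CASES = {
    "": [],
    "7 a.m. to 7 p.m.": ["07:00", "19:00"],
    "between 07:00 and 19:00": ["07:00", "19:00"],
    "until 10:00 PM": ["22:00"],
    "12 AM to 12:30 pm": ["00:00", "12:30"],
    "7 a.m. to 7 p.m. weekdays; 9 a.m. to 5 p.m. Saturdays": [
        "07:00", "09:00", "17:00", "19:00",
    ],
}

for text, expected in EDGE_CASES.items():
    assert extract_times_from_text(text) == expected, text
```

The last case returns four sorted times with no indication of which start pairs with which end, which is the multiple-window situation raised in the review.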

Comment on lines +300 to +302
_MAGIC_NEIGHBOR_CHUNK_COUNT = 2
_MAGIC_HOUR_12 = 12
_MAGIC_MINUTE_59 = 59

I think we can be better about the names here

feature = (row.get("feature") or "").casefold()
source_field = step.get("source_field", "summary")
source_text = row.get(source_field) or ""
time_values = self._extract_times_from_text(source_text)

I don't love the idea of doing manual processing on the summary text instead of relying on the LLM output. The summary might have multiple pairs of start/end times that need to be interpreted using the text context instead of just heuristic guessing.

Can you explain why you chose not to go with the LLM outputs here?

Comment thread compass/web/url_utils.py
Comment on lines +6 to +12
def _sanitize_url(url):
    """Encode unsafe URL characters while preserving URL semantics"""
    parsed = urlsplit(url)
    path = quote(parsed.path, safe="/:@-._~!$&'()*+,;=")
    query = quote(parsed.query, safe="=&;%:@-._~!$&'()*+,;/?:")
    fragment = quote(parsed.fragment, safe="")
    return urlunsplit((parsed.scheme, parsed.netloc, path, query, fragment))

I like that you moved this to be its own utility function, but please make it public (no leading underscore) and document it properly
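
A public, documented version could look roughly like the sketch below. The name `sanitize_url` simply drops the underscore, and the docstring wording is a suggestion. One deliberate difference from the diff is an assumption: "%" is added to the safe characters for the path so that already-encoded sequences such as "%20" are not encoded a second time.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def sanitize_url(url):
    """Percent-encode unsafe characters in a URL

    Parameters
    ----------
    url : str
        URL that may contain raw spaces or other unsafe characters
        in its path, query, or fragment.

    Returns
    -------
    str
        URL with unsafe characters percent-encoded. The scheme and
        network location are returned unchanged.
    """
    parsed = urlsplit(url)
    # "%" is kept safe (assumption) to avoid double-encoding "%20"
    path = quote(parsed.path, safe="/:@-._~!$&'()*+,;=%")
    query = quote(parsed.query, safe="=&;%:@-._~!$'()*+,/?")
    fragment = quote(parsed.fragment, safe="")
    return urlunsplit((parsed.scheme, parsed.netloc, path, query, fragment))
```

For example, `sanitize_url("https://example.com/my docs/file name.pdf?x=1")` gives `"https://example.com/my%20docs/file%20name.pdf?x=1"`, and a URL that is already encoded passes through unchanged.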

return {
    _Link(
        title=title,
        text=title,

Please remove this change. The title is already given in the link - there should be no reason to duplicate it as text, right?

@ppinchuk

OK on second look, it looks like the summary is only being used as a fallback. That might be ok actually. I think you can disregard the comments about the summary parsing

Comment thread tox.ini

@ppinchuk ppinchuk left a comment

A few more comments specifically for the plugin and schema configs. Excited to get this merged in!

Comment on lines +13 to +14
- "{jurisdiction} geothermal conditional use permit"
- "{jurisdiction} geothermal special use permit"

Have you found these to be useful? I imagine if a permit application is found, it would just eventually get filtered out by our content filter, which seemingly makes these unused?

Comment on lines +15 to +18
- "{jurisdiction} geothermal drilling permit regulations"
- "{jurisdiction} geothermal resource development statute"
- "Where can I find the legal text for geothermal power plant zoning ordinances in {jurisdiction}?"
- "What is the specific legal information regarding zoning ordinances for geothermal electricity generation facilities in {jurisdiction}?"

Keep in mind that the default number of docs that will be parsed is 5. If all your queries return a different URL, then the results from some of these will not be considered. It might be worthwhile to reduce these to the top 5 most important queries so that we don't eat up our search quota as fast, unless you have found that each one of these uniquely returns a document you are interested in.

Comment on lines +50 to +94
good_tech_keywords:
- "wellfield"
- "well field"
- "production well"
- "geothermal exploration"
- "geothermal generating"
- "geothermal generation"
- "geothermal power"
- "geothermal production"
- "geothermal project"
- "geothermal overlay zone"
- "geothermal power plant"
- "geothermal facility"
- "geothermal electric"
- "geothermal energy facility"
- "steam turbine"
- "binary cycle"
- "flash steam"
- "dry steam"
- "enhanced geothermal"
- "reservoir temperature"
- "brine"
- "reinjection well"
- "production zone"
- "geothermal resource"
- "geothermal production project"
- "geothermal drilling"
- "exploratory well"
- "injection well"
- "geothermal lease"
- "drilling permit"
- "plan of utilization"
- "known geothermal resource"
- "geothermal development"
- "geothermal well"
- "geothermal reservoir"
- "geothermal permit"
- "geothermal ordinance"
- "geothermal zoning"
- "code of ordinances"
- "land use code"
- "use table"
- "zoning ordinance"
- "special use permit"
- "conditional use permit"

Couple of notes here:

  • Most of these should be under good_tech_phrases. The reason for that is that keywords are a direct lookup, while phrases use ngram comparisons. The latter is more robust when you have spaces, since your phrase may be broken up by things like newlines (e.g. geothermal\nwell). I would move anything with a space into good_tech_phrases and keep good_tech_keywords restricted to single words.

  • Keep in mind that the way the heuristic works is that it checks for matches across good_tech_keywords, good_tech_acronyms, and good_tech_phrases. The matches across all three categories are totaled, and if you find at least two matches (by default), the document passes the heuristic check. Here is what that means in practice:

    • If you repeat words or phrases across categories, then a single match will actually count for two or more, which would automatically qualify the document.
    • If you keep more "generic" terms like "land use code", "use table", "zoning ordinance", etc, your heuristic will pass through non-geothermal ordinance documents.

    Both of these basically mean your heuristic becomes much less effective. While it's ok (and maybe even preferred) for the heuristic to be permissive, if it just lets most kinds of documents through, you will still have to do a lot of filtering afterwards, and in the case of webcrawling, you might miss the correct document entirely.
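
To make the counting behavior described above concrete, here is a toy model of it. This is not the COMPASS or ELM implementation: the function names, the tokenizer, and the exact n-gram comparison are assumptions chosen only to mirror the description (direct lookup for keywords and acronyms, n-gram comparison for phrases, pass at two total matches by default).

```python
import re


def _words(text):
    """Split case-folded text into word tokens (assumed tokenizer)"""
    return re.findall(r"[a-z0-9-]+", text.casefold())


def count_heuristic_matches(text, keywords=(), acronyms=(), phrases=()):
    """Total the matches across all three term categories"""
    words = _words(text)
    word_set = set(words)
    # Keywords and acronyms: direct single-token lookup
    count = sum(term in word_set for term in keywords)
    count += sum(term in word_set for term in acronyms)
    # Phrases: compare against word n-grams, so newlines do not matter
    for phrase in phrases:
        target = tuple(_words(phrase))
        size = len(target)
        ngrams = {
            tuple(words[i:i + size]) for i in range(len(words) - size + 1)
        }
        count += target in ngrams
    return count


def passes_heuristic(text, min_matches=2, **term_lists):
    """Return True if the document clears the match threshold"""
    return count_heuristic_matches(text, **term_lists) >= min_matches
```

In this model, "geothermal well" listed as a keyword never matches a document containing "geothermal\nwell", while the same entry listed as a phrase does. A term repeated in two categories counts twice, and two generic phrases such as "zoning ordinance" and "land use code" are enough to pass a document that never mentions geothermal.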

My suggestion for both of these is to keep the heuristic as focused as possible on geothermal electricity. You can kind of see how I did this for GHPs here:

heuristic_keywords:
  good_tech_acronyms:
    - "ghp"
    - "gshp"
  good_tech_keywords:
    - "geoexchange"
    - "geo-exchange"
    - "wellfield"
    - "direct-use"
    - "closed-loop"
  good_tech_phrases:
    - "well field"
    - "geothermal resource"
    - "geothermal drilling"
    - "geothermal well"
    - "geothermal reservoir"
    - "geothermal permit"
    - "geothermal ordinance"
    - "geothermal zoning"
    - "closed loop"
    - "open loop"
    - "vertical loop"
    - "horizontal loop"
    - "heating and cooling"
    - "space heating"
    - "direct use"
    - "district heating"
    - "greenhouse heating"
    - "residential geothermal"
    - "heat pump"
    - "geothermal heat pump"
    - "ground source heat pump"
    - "ground-source heat pump"
    - "ground heat pump"
    - "ground-coupled heat pump"
    - "ground coupled heat pump"
    - "earth-coupled heat pump"
    - "earth-source heat pump"
    - "closed loop ground source"
    - "open loop ground source"
  not_tech_words:
    - "production well"
    - "geothermal exploration"
    - "geothermal generating"
    - "geothermal generation"
    - "geothermal power"
    - "geothermal production"
    - "geothermal project"
    - "geothermal overlay zone"
    - "geothermal power plant"
    - "geothermal facility"
    - "geothermal electric"
    - "geothermal energy facility"
    - "geothermal lease"
    - "geothermal development"
    - "steam turbine"
    - "binary cycle"
    - "flash steam"
    - "dry steam"
    - "enhanced geothermal"
    - "reservoir temperature"
    - "reinjection well"
    - "production zone"
    - "geothermal production project"
    - "exploratory well"
    - "injection well"

- "biomass"
- "cannabis"
- "cannabis cultivation"
- "commercial cannabis"

Please add

collection_prompts: True

at the bottom of this config. Without it, you will always pass the full document into the extraction context, which could explain why there were some rate limit issues as well as why the API cost for the ~200 counties was so high.

"description": "Prohibitions, bans, or moratoria on geothermal electricity exploration, drilling, facility siting, or deployment.",
"properties": {
"prohibitions": {
"description": "Extract currently effective bans, moratoria, or explicit prohibitions on geothermal electricity exploration, drilling, well development, plant construction, or facility siting. Include fracking or hydraulic-fracturing bans only when the ordinance explicitly uses them to ban, limit, or condition geothermal electricity activity. If there are carve-outs, exceptions, or conditional permitting paths that still allow the project, do not classify the rule as a prohibition."

Let's add something like "For moratoria, include start/end timing details in summary; if the ordinance states an end date that has already passed, omit this feature."

The idea is that we want details about the date range for the moratorium, and we don't want to report it if it's expired.

Comment on lines +304 to +383
"$instructions": {
"general": [
"Use direct text excerpts and quotes in summary whenever possible.",
"Each feature may appear at most once in outputs; do not emit multiple rows for the same feature. If multiple ordinance lines map to one feature, build a temporary map keyed by feature, aggregate all evidence clauses under that feature key, consolidate into one row, and keep the controlling most restrictive value in value while listing alternatives in summary.",
"Feature IDs are strict canonical keys. Do not output aliases, prefixed variants, or paraphrased feature names not present in the enum.",
"For any numeric feature, the summary must support the same requirement that produced value and units for that row. Never pair a numeric value from one clause with qualitative-only language from another clause that has no numeric threshold.",
"Standardize units in the units field using this schema's canonical vocabulary, while preserving ordinance-specific wording in summary.",
"Summary is the primary data carrier for all features in this schema; every row must have a non-null, non-empty string for summary.",
"Every row must include an explanation that briefly justifies why the cited summary evidence matches the selected feature under this schema's rules.",
"Emit only positively matched features. Never emit a row to explain why a feature does not apply.",
"The outputs array is a sparse long-form extraction table and does not need to contain every enumerated feature.",
"Tables, table footnotes, and labeled graphics count as valid ordinance evidence when they state the controlling requirement; preserve the relevant table cell or footnote context in summary.",
"Preserve exact local or state regulatory terminology in summary and, where applicable, in value. Do not rename district categories, permit names, or agency approvals into a preferred local template.",
"If ordinance text shows an amended or superseding requirement, extract the current operative requirement as written rather than a superseded historical value unless the ordinance text clearly keeps both rules active.",
"If text is suggestive but not explicit for the target feature, omit the feature.",
"If the text references a different chapter or external document for the controlling value but does not restate that value here, omit the feature instead of outputting blanks or placeholders.",
"If a provision is written for renewable or energy facilities generally, extract it only when the same provided evidence clearly ties that provision to geothermal electricity."
],
"setbacks": [
"Setbacks should be extracted as minimum separation distances.",
"Prefer numeric values with units such as 'feet' or 'meters'.",
"Setback rows must contain numeric value and non-null units; never emit qualitative-only setback rows.",
"If both general and condition-specific setbacks are provided, select the controlling most restrictive value for the geothermal electricity scenario and describe conditions in summary.",
"Do not infer one setback feature from another. A property-line setback is not a structures setback, and a roads setback is not a railroad setback, unless the ordinance text explicitly says so.",
"When one setback clause explicitly names multiple target features and provides one shared numeric threshold, emit one row per explicitly named feature using the same threshold and units.",
"Treat distances to official plan lines or specific plan lines for public highways as roads distance unless the ordinance explicitly defines them as property boundaries for the same requirement."
],
"numerical": [
"Numerical features in this schema are the eleven distance features plus noise, maximum height, and minimum lot size.",
"For noise, maximum height, and minimum lot size, extract only explicit numeric thresholds. If the ordinance gives only narrative standards or references other codes without restating the threshold, omit the feature.",
"For drilling schedule requirements, extract explicit start and end times into 'drilling start time' and 'drilling end time' using 24-hour HH:MM format and units 'HH:MM (24-hour)'."
],
"qualitative": [
"For qualitative features, output only when an explicit enforceable requirement is present.",
"For fencing, color requirements, lighting requirements, visual impact assessment, seismic monitoring plan, bond requirement, and decommissioning, prefer value=null and units=null unless the ordinance states a specific numeric threshold or an explicit list that should be preserved in value.",
"For required permits, always use an array of strings in value, even when only one permit is required.",
"For drilling start and end times, if both a broad 24-hour allowance and a narrower recurring non-emergency drilling window are present, extract the narrower recurring window and mention the 24-hour exception in summary.",
"For bond requirement, preserve formulas, engineer estimates, inflation adjustments, agency-set amounts, and similar non-fixed sizing logic in summary instead of forcing a numeric amount.",
"Do not map generic application materials, narrative findings, or descriptive recitals into these features unless the ordinance explicitly makes them enforceable requirements."
],
"districts": [
"For all district features, use an array of district or zone names and set units to null.",
"Use the exact district names or codes as they appear in the ordinance text whenever possible.",
"Use 'primary use districts' for by-right or principal-use authorization, 'special use districts' for conditional or discretionary authorization, 'accessory use districts' for accessory-only authorization, and 'prohibited use districts' for unconditional district-level bans.",
"Preserve the legal approval posture in summary, but keep only the district names in value.",
"If the jurisdiction is unzoned, statewide, or otherwise does not use district-style land-use categories for the operative geothermal rule, omit district features rather than inventing a district mapping.",
"If the ordinance does not explicitly list district names, omit the feature rather than paraphrasing a generic zoning statement."
],
"prohibitions": [
"Classify prohibitions as currently effective bans or moratoria on geothermal electricity exploration, drilling, well development, facility siting, or deployment.",
"If no active prohibition is found, omit the feature rather than using placeholder values.",
"Distinguish between complete prohibition and conditional permitting. Conditional permitting is not a ban.",
"Do not treat ordinary operational, environmental, design, monitoring, or permit conditions as prohibitions when the ordinance still allows the project to proceed subject to compliance.",
"A fracking ban belongs here only when the ordinance explicitly uses it to regulate geothermal electricity development."
]
},
"$qa_checklist": [
"Enforce uniqueness by feature with len(outputs) == len(unique(feature values)); if duplicates exist, merge or drop invalid rows until equality is true.",
"Every row must have non-null, non-empty strings for summary and explanation.",
"Explanation must explicitly tie summary evidence to the selected feature and must not contradict feature inclusion or exclusion criteria.",
"For every numeric feature row, require numeric value and non-null units.",
"For 'required permits', require value to be a non-empty array of strings and units to be null.",
"For drilling schedule rows, require both 'drilling start time' and 'drilling end time' when the ordinance states an explicit allowed window.",
"For drilling schedule rows with both 24-hour and narrower recurring windows, keep the narrower recurring window values and retain 24-hour language in summary only as an exception.",
"Reject any district row where summary language indicates conditional, special, discretionary, or permit-only approval but feature is 'primary use districts'.",
"Remove any numeric-feature row derived only from qualitative language when no numeric threshold is quoted.",
"If summary or explanation indicates the feature is not applicable, omit the row.",
"If a feature fails any check, omit it rather than returning a partial row."
],
"$qualitative_features": [
"fencing",
"color requirements",
"lighting requirements",
"visual impact assessment",
"seismic monitoring plan",
"bond requirement",
"decommissioning",
"prohibitions"
]
}

Nice, thank you for matching the structure of the GHP schema! It's nice to have some cohesion 😄
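
Several items in the $qa_checklist above are mechanical enough to check in code after extraction. The sketch below is a hypothetical post-processing filter, not part of this PR: `apply_qa_checklist` and its `numeric_features` parameter are invented names. It covers only the uniqueness, non-empty summary/explanation, numeric-row, and 'required permits' checks, and it drops later duplicates rather than merging them.

```python
def _is_nonempty_str(value):
    return isinstance(value, str) and bool(value.strip())


def _is_number(value):
    # bool is a subclass of int, so exclude it explicitly
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def apply_qa_checklist(rows, numeric_features=frozenset()):
    """Keep only rows that pass a subset of the schema's QA checks"""
    kept, seen = [], set()
    for row in rows:
        feature = row.get("feature")
        value, units = row.get("value"), row.get("units")
        if feature in seen:
            continue  # each feature may appear at most once
        if not _is_nonempty_str(row.get("summary")):
            continue
        if not _is_nonempty_str(row.get("explanation")):
            continue
        if feature in numeric_features:
            if not _is_number(value) or units is None:
                continue
        if feature == "required permits":
            is_str_list = (
                isinstance(value, list)
                and bool(value)
                and all(_is_nonempty_str(item) for item in value)
            )
            if not is_str_list or units is not None:
                continue
        seen.add(feature)
        kept.append(row)
    return kept
```

A caller would pass the set of numeric feature names from the schema, for example `apply_qa_checklist(rows, numeric_features={"noise", "maximum height"})`.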

6 participants