Skip to content

Conversation

@lindsay-stevens
Copy link
Contributor

@lindsay-stevens lindsay-stevens commented Dec 16, 2025

Closes #747

Why is this the best possible solution? Were any other approaches considered?

The majority of these commits are organising error/warning messages into the errors.py module, which is what I had in mind for the ErrorCode pattern which was added to pyxform a few months ago. After this PR about 30 error messages are organised there, but pyxform raises an error at about 170 locations and many of those have distinct messages.

The goal of this stage of organisation is to try and identify all the errors/warnings which deal with identifiers, so that the validation approach and message content can be more consistent with each other and with the documentation. Also to take the opportunity to move over error/warning messages that are already using the errors.Detail class or seemed easy to move. There are still some name-related errors that have not been moved (e.g. Audits must always be named 'audit.', etc), but they don't specifically deal with identifier grammar. There is also still a name-like regex check for the "app" parameter but that requires an android package name which has different rules to XML names.

What are the regression risks?

If library or script usages were looking for specific error strings then these changes would break them. But putting the errors into an enum/object should make it easier to avoid looking for specific strings - as is shown in the tests that use ErrorCode.

Does this change require updates to documentation? If so, please file an issue here and include the link below.

Maybe? #630 says ODK tools don't support having a colon as the first character in identifier names but that doesn't seem to be stated in the ODK XForms Specification. On one hand maybe it is not a hard requirement from a spec perspective, but on the other hand it is functionally required via pyxform validation and not being supported by ODK form clients.

Before submitting this PR, please make sure you have:

  • included test cases for core behavior and edge cases in tests
  • run python -m unittest and verified all tests pass
  • run ruff format pyxform tests and ruff check pyxform tests to lint code
  • verified that any code or assets from external sources are properly credited in comments

- organising error messages
- add test to verify error raised, since no other test seems to check
  for this error case or message.
- the modified tests pass anyway but would have asserted that each
  letter in the parameter was in the error e.g. ["M", "i", "s", "s"]
- added type check and tests for error/warning contains/not-contains
- organising error messages
- reworded error message to be consistent with the underlying XML
  "Name" rule/token, except that ODK form clients generally don't
  support having a colon for the first character, and not mentioning
  non-ascii unicode letters since that's probably too technical.
- preceding code applies the additional entity dataset name rules for
  not allowing names starting with a period or two underscores.
- organising error messages
- preceding code applies additional entity save_to name rules for
  not allowing reserved words or names starting two underscores.
- organising error messages
- the replaced validation regex is pretty much the XML name rule and
  the error message sounds like that was the intent (although the value
  is put into an attribute like `<value ref="VAL"/>` not a tag).
- organising error messages
- in survey_element.py the code that shows the exact offending character
  seems to have been added for the purpose of checking the form_name,
  which corresponds to the Survey.name attribute, which is used as the
  tag name of the primary instance element. When uploading a XLSForm to
  an online converter this name defaults to "data", and it is only set
  to the basename the XLS/X file name when using the CLI, or some other
  value when using pyxform as a library. This code would not apply to
  most other names since workbook_to_json (or other) code checks names
  before survey_element.py sees them. Also the regex
  INVALID_XFORM_TAG_REGEXP is different to the rule in is_xml_tag. So
  the per-character check was removed since it seems unlikely to be
  useful and is inconsistent with other name validation.
- not caught by relevant test since it only requires the substring
  "must also specify an image".
@lindsay-stevens lindsay-stevens marked this pull request as ready for review December 17, 2025 16:11
- it's possible to set the primary instance root node name via the
  setting "name" (which overrides other ways to set the name), so check
  the settings code path as well.
@lindsay-stevens
Copy link
Contributor Author

Realised there's a name setting so added check/tests for that. It's previously untested, only documented on xlsform.org, and kind of sneaks in there in via json_dict.update(settings) on xls2json.py L352.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Update identifier validation message now that leading colons aren't allowed

1 participant