Skip to content

Fix #454: Add end-to-end tests for the plugin-related SoftwareMetadata API#456

Open
sdruskat wants to merge 65 commits intorefactor/423-implement-public-apifrom
feature/454-e2e-test-plugin-api
Open

Fix #454: Add end-to-end tests for the plugin-related SoftwareMetadata API#456
sdruskat wants to merge 65 commits intorefactor/423-implement-public-apifrom
feature/454-e2e-test-plugin-api

Conversation

@sdruskat
Copy link
Contributor

No description provided.

@sdruskat sdruskat changed the base branch from develop to refactor/data-model November 21, 2025 10:12
@sdruskat sdruskat changed the base branch from refactor/data-model to refactor/423-implement-public-api November 21, 2025 10:17
@sdruskat sdruskat closed this Nov 21, 2025
@sdruskat sdruskat force-pushed the feature/454-e2e-test-plugin-api branch from f45d9b4 to 9be8041 Compare November 21, 2025 10:23
@sdruskat sdruskat deleted the feature/454-e2e-test-plugin-api branch November 21, 2025 10:24
@sdruskat sdruskat restored the feature/454-e2e-test-plugin-api branch November 21, 2025 10:24
@sdruskat sdruskat reopened this Nov 21, 2025
@sdruskat
Copy link
Contributor Author

Re-opening after rebase on correct branch.

The full code and structure is available at [hermes-plugin-git](https://github.com/softwarepub/hermes-plugin-git).
This tutorial will present the basic steps for writing additional plugins.

The full code and structure of a harvest plugin is available at [hermes-plugin-git](https://github.com/softwarepub/hermes-plugin-git).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Made an Issue to update the plugin for later: softwarepub/hermes-plugin-git#3

# That would be able to replace _search_license_info.
# FIXME: Some licenses in valid_licenses["hits"]["hits"]["props"]["url"] are only http although
# https://spdx.org/licenses/license.json lists them in crossRef as https

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes that should be possible.
We should fix the licenses to https.

if license_info is not None:
break
else:
# FIXME: Why is this only raised here and not always when license_info is None?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think, it was originally because that was were the deposit step broke two years ago, when Zenodo implemented the need of the correct license identifier. But you are right, we should fix that.


@pytest.fixture
def sandbox_auth():
path = Path("./../auth.txt")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to tell users how to provide auth or skip the test for ci

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Especially for our testsuite ci

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, implementing a way to provide auth seems reasonable.
(But the test is already skipped if the file doesn't exist.)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should now be possible to pass the auth token when running the tests.
To do this pass the custom command line option --sandbox_auth your_auth_token to the pytest command.

We will have to get a ZenodoSanbox account and auth token to put it into the GitHub secrets and then adapt the test workflow.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe also test for 3-way-merges. And test strategies in different files would be great.

@SKernchen
Copy link
Contributor

SKernchen commented Mar 20, 2026

Findings from user test (tested in poetry environment on command line):
process step:

  • process step fails when one harvester failed in the harvest step
    (maybe process should not look into toml?)

deposit step:
error:

  • 'ld_list' object has no attribute 'startswith'

  • for file add log.info where to find file would be good

@notactuallyfinn
Copy link
Collaborator

Could you please provide more information on this error?

'ld_list' object has no attribute 'startswith'

@SKernchen
Copy link
Contributor

Could you please provide more information on this error?

'ld_list' object has no attribute 'startswith'

That is the only information provided for this error:
1774021073.908437: hermes.cli: ERROR | An error occurred during execution of deposit (Find details in './hermes.log') 1774021073.910363: hermes.cli: DEBUG | Original exception was: 'ld_list' object has no attribute 'startswith'
I do not know where it is coming from. Everything with startswith seemed fine to me 🤷‍♀️

@notactuallyfinn
Copy link
Collaborator

notactuallyfinn commented Mar 23, 2026

Okay, I think I found the error and hopefully fixed it. Please let me know if the commit didn't fix it.
(Can't verify, I never ran into that error.)

@SKernchen
Copy link
Contributor

Yes, this error is gone. Now I'm getting
1774263836.359856: hermes.cli: DEBUG | Original exception was: The given license in CodeMeta must be of type str. Licenses of type 'CreativeWork' are not supported.
Though the license is correct.

@SKernchen
Copy link
Contributor

Yes, this error is gone. Now I'm getting 1774263836.359856: hermes.cli: DEBUG | Original exception was: The given license in CodeMeta must be of type str. Licenses of type 'CreativeWork' are not supported. Though the license is correct.

Ok so this error comes from invenio.py line 216:

        if not isinstance(license_url, str):
            raise RuntimeError(
                "The given license in CodeMeta must be of type str. "
                "Licenses of type 'CreativeWork' are not supported."
            )

where license_url is in my case {"@id": "{license}"}, so this error triggers

@SKernchen
Copy link
Contributor

SKernchen commented Mar 23, 2026

After giving the correct input (using license_url["@id]) it worked, but if your case has different input, we should find the source for this problem.

@notactuallyfinn
Copy link
Collaborator

notactuallyfinn commented Mar 23, 2026

According to schema the values for the license property are URLs or CreativeWork objects.
You passed the URL as {"@id": your_url} which is not a value for URLs (that should be {"@value": your_url}, (shouldn't it?), which then gets replaced by your_url by the ld_container). Therefore {"@id": your_url} is handled as a value of type CreativeWork, which is rejected because the conversion is non-trivial.
That is the reason for the new error.
I'll add an exemption for objects of this format {"@id": your_url} for now.
(I'm aware that in tests the format {"@id": your_url} is required. That should/ will be changed.)

@SKernchen
Copy link
Contributor

Yeah but why is it passed like that, is the real question right.

"http://schema.org/license": [{"@id": "https://spdx.org/licenses/Apache-2.0"}]

is it in the expanded.json from the curate step

@notactuallyfinn
Copy link
Collaborator

notactuallyfinn commented Mar 23, 2026

I don't know why this was (and still is) a requirement for the harvest and process steps.
It worked previously because {"@id": value} was returned by ld_dicts as value, which is not good (the information that this is an id is lost). This was changed so that now an ld_dict with this data_dict ({"@id": value}) is returned.
But that leads to the problem that {"@id": your_url} is no longer returned as a string leading to the error you saw.
I think the options now are to either always try if the id of a CreativeWork object is a valid url for a license or just disallow the use of CreativeWork objects and expect the license to be passed as a value of a URL (i.e. {"@value": your_url}).

@led02
Copy link
Member

led02 commented Mar 23, 2026

It worked previously because {"@id": value} was returned by ld_dicts as value, which is not good (the information that this is an id is lost).

This information about the license being an ID is actually contained in the codemeta context: https://github.com/codemeta/codemeta/blob/13bb230d250775026691a102a24b97dea7669bf6/codemeta.jsonld#L44

@SKernchen
Copy link
Contributor

Nice now it works:
More points I found:

  • With the postprocess plugin (invenio_rdm) in hermes.toml removes license header and sorts postprocess above deposit (and add "," to every list)
  • harvest: breaks if one plugin is not found and don't run the others (if broken plugin is first)
  • process broke with 'str' object has no attribute 'append' with a codemeta with conflicts in it
  • deposit without --initial broke with Error while executing invenio_rdm: Found two different record ids or conflicting metadata.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants