Script update-metrics using service scopus-person does not update all persons inside CRIS #508

@jorgeltd

Description

Describe the bug
DSpace-CRIS version: 2024.02.00

Running the script update-metrics using the service scopus-person does not add metrics for all persons who have the metadata person.identifier.scopus-author-id.

When I run the script on an instance with 100 persons who have valid identifiers, it only updates 57 of them.

The script has a parameter named --limit with a default value of 1750, but it does not seem to be related to this behavior.

To Reproduce
Steps to reproduce the behavior:

  1. Create 100 persons with a valid person.identifier.scopus-author-id inside the system (easily done with the CSV importer; all persons can have the same Scopus ID; see the sample CSV sketched after the log output below)
  2. Run the process update-metrics -service scopus-person.
  3. The script output log will say it only found 57 items:
2025-04-12 17:19:17.876 INFO update-metrics - 215 @ The script has started
2025-04-12 17:19:17.905 INFO update-metrics - 215 @ Update start
2025-04-12 17:19:43.715 INFO update-metrics - 215 @ Found 57 items
2025-04-12 17:19:43.715 INFO update-metrics - 215 @ Updated 57 metrics
2025-04-12 17:19:43.715 INFO update-metrics - 215 @ Update end
2025-04-12 17:19:43.717 INFO update-metrics - 215 @ The script has completed
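
For reference, step 1 can be done with DSpace's batch metadata import (metadata-import) against a collection configured for Person entities. The sketch below is only illustrative: the collection handle, the titles and the Scopus ID are placeholders, and the exact column set may differ per installation; person.identifier.scopus-author-id is the only field the script relies on.

id,collection,dc.title,person.identifier.scopus-author-id
+,123456789/42,Person 001,0000000001
+,123456789/42,Person 002,0000000001
... (98 more rows, all with the same Scopus ID)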

Expected behavior
The script should update all persons with a Scopus ID in the system, while respecting the --limit parameter.

Some personal findings

The scopus-person service uses a loop to update the found items, but there seems to be a problem with the item iterator: in my case, it always stops after 57 items.

private void performUpdateWithSingleFetches(MetricsExternalServices metricsServices,
        Iterator<Item> itemIterator) throws SQLException {
    handler.logInfo("Update start");
    int count = 0;
    int countFoundItems = 0;
    int countUpdatedItems = 0;
    while (itemIterator.hasNext()) {
        Item item = itemIterator.next();
        countFoundItems++;
        final boolean updated = metricsServices.updateMetric(context, item, param);
        if (updated) {
            countUpdatedItems++;
        }
        metricsServices.setLastImportMetadataValue(context, item);
        count++;
        if (count == 20) {
            context.commit();
            count = 0;
        }
    }
    context.commit();
    getLogsFromMetricService(metricsServices);
    handler.logInfo("Found " + countFoundItems + " items");
    handler.logInfo("Updated " + countUpdatedItems + " metrics");
    handler.logInfo("Update end");
}

I might be wrong here, but the itemIterator appears to be composed of two chained DiscoverResultIterators. DiscoverResultIterator executes a Solr query to retrieve the items to update, using a paginator that fetches 20 items at a time. However, because the loop commits every 20 items, the query results change between page fetches and contain fewer items each time, while the paginator still advances its start offset as if the original number of results were unchanged. As a result, there comes a point where the paginator's start value is greater than the number of items that still need to be updated, and the remaining items are never fetched.
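
To make the suspected interaction concrete, below is a small toy model (plain Java, not DSpace code) of the behavior I am describing: a page offset that keeps growing by the page size while the underlying "needs update" result set shrinks after every committed page. The page size of 20 matches the snippet above; the list stands in for the Solr query results.

import java.util.ArrayList;
import java.util.List;

// Toy model of the suspected pagination problem; it does not use any DSpace classes.
public class PaginationSkipDemo {

    private static final int PAGE_SIZE = 20;

    public static void main(String[] args) {
        // 100 hypothetical items that initially match the "needs update" query.
        List<Integer> needsUpdate = new ArrayList<>();
        for (int i = 1; i <= 100; i++) {
            needsUpdate.add(i);
        }

        int processed = 0;
        int start = 0;

        // Each iteration simulates fetching the next page from the live query.
        while (start < needsUpdate.size()) {
            int end = Math.min(start + PAGE_SIZE, needsUpdate.size());
            List<Integer> page = new ArrayList<>(needsUpdate.subList(start, end));

            // "Update" and commit the page: these items drop out of the live query.
            needsUpdate.removeAll(page);
            processed += page.size();

            // The offset still advances as if the result set had not shrunk.
            start += PAGE_SIZE;
        }

        // Prints "Processed 60 of 100 items": the offset and the shrinking
        // result set drift apart and the remaining items are never fetched.
        System.out.println("Processed " + processed + " of 100 items");
    }
}

In this toy run 60 of the 100 items are processed; the exact number in a real instance (57 in mine) will depend on how quickly the Solr index reflects the commits, but the mechanism is the same.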

I deleted the if statement that commits the results every 20 items, and all persons were updated in my DSpace instance.
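
If the periodic commits are worth keeping, one possible direction (untested, just to illustrate the idea) would be to materialize the result set before updating anything, for example by draining the iterator into a list of UUIDs, so that later commits can no longer shift the pagination. The sketch assumes the surrounding class still provides context, handler, param and getLogsFromMetricService, plus an injected ItemService field (called itemService here) and imports for java.util.List, java.util.ArrayList and java.util.UUID.

private void performUpdateWithSingleFetches(MetricsExternalServices metricsServices,
        Iterator<Item> itemIterator) throws SQLException {
    handler.logInfo("Update start");

    // Materialize the query results before any commit can alter them.
    List<UUID> itemIds = new ArrayList<>();
    while (itemIterator.hasNext()) {
        itemIds.add(itemIterator.next().getID());
    }

    int count = 0;
    int countUpdatedItems = 0;
    for (UUID id : itemIds) {
        // Reload the item so it is attached to the current session even after a commit.
        Item item = itemService.find(context, id);
        if (item == null) {
            continue;
        }
        if (metricsServices.updateMetric(context, item, param)) {
            countUpdatedItems++;
        }
        metricsServices.setLastImportMetadataValue(context, item);
        // Periodic commits no longer interfere with pagination.
        if (++count % 20 == 0) {
            context.commit();
        }
    }
    context.commit();

    getLogsFromMetricService(metricsServices);
    handler.logInfo("Found " + itemIds.size() + " items");
    handler.logInfo("Updated " + countUpdatedItems + " metrics");
    handler.logInfo("Update end");
}

Simply removing the commit inside the loop, as in my test above, also works, though for very large result sets some form of periodic commit may still be desirable.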

This bug might also affect other services that rely on the same function.
