
[#746] establish default order for replicas listed by an iRODSDataObject #815

Open
d-w-moore wants to merge 10 commits into irods:main from d-w-moore:746.m

Conversation

@d-w-moore
Collaborator

@d-w-moore d-w-moore commented Apr 15, 2026

The parent data object's modify_time and replica_status fields, as well as some others, actually pertain more to individual replicas than to the data object as a whole.

#747 was an earlier PR meant to address the issue, and it contains much related discussion as well.

On reflection, I think a minor release is the proper place to address this, and I'm doing so by:

  • setting a default sorter that, for data_objects.get (or any time the iRODSDataObject constructor runs), sorts the data object's replicas first by replica "goodness" and second by reverse chronological order of replica modify_time (i.e., most recent first). The replica at array position [0] then determines the values of the fields discussed above.
  • deciding, for the time being, not to deprecate anything yet. To me it makes natural sense to allow modify_time and replica_status to be accessed from the "head" object.
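A minimal sketch of the two-level ordering described in the list above, assuming a hypothetical `Replica` record with an integer `status` and a timezone-aware `modify_time` (the real PRC replica class and the PR's actual sorter may differ). It reuses the `_REPL_STATUSES` ranking and `_REFERENCE_DATETIME` constant that appear in the review comments on irods/data_object.py.

```python
import datetime
from typing import NamedTuple


# Hypothetical stand-in for a replica record; the real PRC replica class differs.
class Replica(NamedTuple):
    number: int
    status: int  # assumed to be an integer status code
    modify_time: datetime.datetime


# Preference order copied from _REPL_STATUSES in the PR: status 1 first, then 0, 2, 3, 4.
_REPL_STATUSES = (1, 0, 2, 3, 4)
_REFERENCE_DATETIME = datetime.datetime(1970, 1, 1, tzinfo=datetime.timezone.utc)


def replica_sort_key(r: Replica):
    # Primary key: rank of the status in the preference order ("goodness").
    # Secondary key: negated seconds since the epoch, so newer replicas come first.
    age_key = -(r.modify_time - _REFERENCE_DATETIME).total_seconds()
    return (_REPL_STATUSES.index(r.status), age_key)


utc = datetime.timezone.utc
replicas = [
    Replica(0, 0, datetime.datetime(2026, 4, 15, 12, 0, 5, tzinfo=utc)),  # stale, newest
    Replica(1, 1, datetime.datetime(2026, 4, 15, 12, 0, 0, tzinfo=utc)),  # good, older
    Replica(2, 1, datetime.datetime(2026, 4, 15, 12, 0, 3, tzinfo=utc)),  # good, newer
]
ordered = sorted(replicas, key=replica_sort_key)
print([r.number for r in ordered])  # [2, 1, 0]
```

Negating the timestamp lets a single ascending sort combine an ascending key (status rank) with a descending one (recency), so `ordered[0]` is the most recently modified good replica.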

So this PR replaces the old one, #747, since it is new work based on top of source code that has conveniently been ruff-formatted.

@korydraughn
Contributor

> • setting a default sorter that, for data_objects.get (or any time the iRODSDataObject constructor runs), sorts the data object's replicas first by replica "goodness" and second by reverse chronological order of replica modify_time (i.e., most recent first). The replica at array position [0] then determines the values of the fields discussed above.

Keep in mind that for a minor release, we cannot change the behavior of any public APIs. If the default sorter produces different output, that's a no-go. The default sorter must mirror the original behavior.

> • deciding, for the time being, not to deprecate anything yet.

What are you referring to in regard to deprecation?

> To me it makes natural sense to allow modify_time and replica_status to be accessed from the "head" object.

What does this mean?

@d-w-moore
Collaborator Author

>> • setting a default sorter that, for data_objects.get (or any time the iRODSDataObject constructor runs), sorts the data object's replicas first by replica "goodness" and second by reverse chronological order of replica modify_time (i.e., most recent first). The replica at array position [0] then determines the values of the fields discussed above.

> Keep in mind that for a minor release, we cannot change the behavior of any public APIs. If the default sorter produces different output, that's a no-go. The default sorter must mirror the original behavior.

>> • deciding, for the time being, not to deprecate anything yet.

> What are you referring to in regard to deprecation?

We'd discussed in the old issue/PR conversations whether we might just deprecate the iRODSDataObject fields, such as replica_status and modify_time, that are really just a reflection of the corresponding attributes of replicas[0].

>> To me it makes natural sense to allow modify_time and replica_status to be accessed from the "head" object.

> What does this mean?

Just that .replicas[0].FIELD is mirrored in .FIELD, which is pretty natural.
I guess we could instead make them properties rather than duplicating the data, but that is low priority.
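One way to picture the "properties" idea: a hypothetical `DataObjectSketch` class (not the PRC's iRODSDataObject) whose `modify_time` and `replica_status` are computed from `replicas[0]` on each access rather than copied at construction. The `Replica` record is likewise an illustrative stand-in.

```python
import datetime
from typing import List, NamedTuple


class Replica(NamedTuple):  # hypothetical minimal replica record
    number: int
    status: int
    modify_time: datetime.datetime


class DataObjectSketch:
    """Hypothetical stand-in for iRODSDataObject, not the PRC class itself."""

    def __init__(self, replicas: List[Replica]):
        # Assumes the list is already in the desired order; replicas[0] is the "head".
        self.replicas = list(replicas)

    @property
    def modify_time(self) -> datetime.datetime:
        # Read through to the head replica instead of storing a copy.
        return self.replicas[0].modify_time

    @property
    def replica_status(self) -> int:
        return self.replicas[0].status


utc = datetime.timezone.utc
obj = DataObjectSketch([
    Replica(0, 1, datetime.datetime(2026, 4, 15, 12, 0, 0, tzinfo=utc)),
    Replica(1, 0, datetime.datetime(2026, 4, 15, 12, 0, 5, tzinfo=utc)),
])
print(obj.replica_status)  # 1

# Re-sorting the list changes what the head-object fields report.
obj.replicas.sort(key=lambda r: r.number, reverse=True)
print(obj.replica_status)  # 0
```

Because nothing is duplicated, the head-object fields can never drift out of sync with `replicas[0]`.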

@d-w-moore
Collaborator Author

@korydraughn I'm fine with changing the default order back to sorting on replica number for this minor release, even though attributes such as dataObject.modify_time will then continue to misrepresent the "information advertised". It's only a minor code change to let the application writer sort differently if they so desire.
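A sketch of the "original order by default, opt-in sorter" arrangement, assuming a hypothetical `order_replicas` helper whose default key is the replica number. The PR's real keyword argument (referred to later in this thread as `replica_sort_function`) and where it is accepted may differ.

```python
import datetime
from typing import Callable, Iterable, List, NamedTuple


class Replica(NamedTuple):  # hypothetical minimal replica record
    number: int
    status: int
    modify_time: datetime.datetime


def by_replica_number(r: Replica) -> int:
    # Default key: ascending replica number, i.e. the pre-existing order.
    return r.number


def order_replicas(replicas: Iterable[Replica],
                   sort_key: Callable[[Replica], object] = by_replica_number) -> List[Replica]:
    # Hypothetical helper: callers opt into another order by passing their own key.
    return sorted(replicas, key=sort_key)


utc = datetime.timezone.utc
reps = [
    Replica(1, 1, datetime.datetime(2026, 4, 15, 12, 0, 5, tzinfo=utc)),
    Replica(0, 0, datetime.datetime(2026, 4, 15, 12, 0, 0, tzinfo=utc)),
]
default_order = [r.number for r in order_replicas(reps)]
newest_first = [r.number for r in order_replicas(reps, sort_key=lambda r: -r.modify_time.timestamp())]
print(default_order, newest_first)  # [0, 1] [1, 0]
```

Keeping the replica-number key as the default preserves existing output, which is the minor-release constraint raised earlier in the thread.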

Comment thread irods/test/data_obj_test.py
Comment thread irods/data_object.py
Comment thread irods/test/data_obj_test.py Outdated
Comment thread irods/test/data_obj_test.py Outdated

def test_default_sorting_of_replicas__issue_647(self):
@unittest.skipIf(irods.version.version_as_tuple() < (4,), 'too soon for this test.')
def test_modified_default_sorting_of_replicas__issue_647(self):
Contributor

Looks like we need another test for the sorter option?

Alternatively, you can change the behavior of the test such that it covers PRC 3 and PRC 4. For example:

if irods.version.version_as_tuple() < (4,):
    data = self.sess.data_objects.get(data.path, sorter=<fn>)
else:
    data = self.sess.data_objects.get(data.path)

Doing that implies the name of the test would need to change as well.

Collaborator Author

We're now testing both ways, with and without the replica_sort_function.

@korydraughn
Contributor

> We'd discussed in the old issue/PR conversations whether we might just deprecate the iRODSDataObject fields, such as replica_status and modify_time, that are really just a reflection of the corresponding attributes of replicas[0].

Oh right. That still sounds like an acceptable approach.

> Just that .replicas[0].FIELD is mirrored in .FIELD, which is pretty natural. I guess we could instead make them properties rather than duplicating the data, but that is low priority.

I'm not yet convinced that is the proper approach. It feels like it should be handled via support functions that simplify the find-replica step.

Do instances of iRODSDataObject always have the list of replicas? If so, callers can sort/search the list of replicas for what they need. Perhaps that's how the iRODSDataObject constructor works in this PR?
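A possible shape for the kind of support function suggested here. `newest_good_replica` and the `Replica` record are hypothetical, and status code 1 is assumed to mean "good" (consistent with `_REPL_STATUSES` ranking it first). It searches the replica list directly, so it returns the same answer whatever order the constructor chose.

```python
import datetime
from typing import Iterable, NamedTuple, Optional


class Replica(NamedTuple):  # hypothetical minimal replica record
    number: int
    status: int
    modify_time: datetime.datetime


GOOD = 1  # assumption: status code 1 marks a good replica


def newest_good_replica(replicas: Iterable[Replica]) -> Optional[Replica]:
    """Hypothetical support function: the most recently modified good replica, or None."""
    good = [r for r in replicas if r.status == GOOD]
    return max(good, key=lambda r: r.modify_time, default=None)


utc = datetime.timezone.utc
reps = [
    Replica(0, 1, datetime.datetime(2026, 4, 15, 12, 0, 0, tzinfo=utc)),
    Replica(1, 0, datetime.datetime(2026, 4, 15, 12, 0, 9, tzinfo=utc)),  # stale, newest overall
    Replica(2, 1, datetime.datetime(2026, 4, 15, 12, 0, 4, tzinfo=utc)),
]
print(newest_good_replica(reps).number)  # 2
print(newest_good_replica(reps[1:2]))    # None
```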

@d-w-moore d-w-moore force-pushed the 746.m branch 2 times, most recently from 0cc7227 to a9c4e99, April 27, 2026 13:38
# Ensure that one of the replicas is stale, to test proper sorting.
with data.open('a', **{kw.RESC_NAME_KW: newResc1}) as f:
    f.write(b'.')
time.sleep(2)
Contributor

Let's add a comment explaining that this sleep ensures the replicas on newResc1 and newResc2 have different modify times.
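A self-contained illustration of why the sleep matters, assuming the catalog stores replica modify times with one-second resolution: two writes less than a second apart can be recorded with the same timestamp, leaving the recency tiebreaker nothing to distinguish. The helper `to_whole_seconds` is hypothetical and only models that truncation.

```python
import datetime


def to_whole_seconds(dt: datetime.datetime) -> datetime.datetime:
    # Assumption: modify times are recorded with one-second resolution,
    # so any sub-second part of the write time is lost.
    return dt.replace(microsecond=0)


utc = datetime.timezone.utc
first_write = datetime.datetime(2026, 4, 27, 13, 38, 0, 200000, tzinfo=utc)
quick_second_write = first_write + datetime.timedelta(milliseconds=500)  # no sleep
slow_second_write = first_write + datetime.timedelta(seconds=2)          # after time.sleep(2)

# Without the sleep, both replicas can end up with identical stored modify times.
print(to_whole_seconds(first_write) == to_whole_seconds(quick_second_write))  # True
# With the sleep, the stored times differ, so the reverse-chronological key has an effect.
print(to_whole_seconds(first_write) < to_whole_seconds(slow_second_write))    # True
```

Any gap of at least one second guarantees distinct whole-second stamps; two seconds leaves some margin for timing differences between client and server.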

Comment thread irods/data_object.py


_REPL_STATUSES = (1, 0, 2, 3, 4)
_REFERENCE_DATETIME = datetime.datetime(1970, 1, 1, 0, 0, 0, tzinfo=datetime.timezone.utc)
Contributor

Could we use this instead to get this value?

datetime.fromtimestamp(0, timezone.utc)

https://docs.python.org/3/library/datetime.html#datetime.datetime.fromtimestamp
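A quick check that the suggested expression yields the same value as the PR's explicit constant. Note that the reviewer's spelling assumes `from datetime import datetime, timezone`.

```python
from datetime import datetime, timezone

# The PR's constant, spelled out field by field.
explicit = datetime(1970, 1, 1, 0, 0, 0, tzinfo=timezone.utc)
# The suggested alternative: the aware UTC datetime at POSIX timestamp 0.
from_epoch = datetime.fromtimestamp(0, timezone.utc)

print(explicit == from_epoch)  # True
print(from_epoch.isoformat())  # 1970-01-01T00:00:00+00:00
```

With the module-style import used in the PR (`datetime.datetime`, `datetime.timezone`), the equivalent expression is `datetime.datetime.fromtimestamp(0, datetime.timezone.utc)`.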

Comment thread irods/data_object.py
return "<{}.{} {}>".format(self.__class__.__module__, self.__class__.__name__, self.resource_name)


_REPL_STATUSES = (1, 0, 2, 3, 4)
Contributor

As a possible future improvement, it may be beneficial to represent replica statuses as a proper enumeration with names and all that.
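A sketch of such an enumeration, using `enum.IntEnum` so members still compare equal to the raw integer codes. The member names are assumptions based on the iRODS replica status meanings (0 stale, 1 good, 2 intermediate, 3 read-locked, 4 write-locked) and should be checked against the server documentation; `ReplicaStatus`, `REPLICA_STATUS_PREFERENCE`, and `status_rank` are hypothetical names.

```python
import enum


class ReplicaStatus(enum.IntEnum):
    # Assumed names for iRODS replica status codes; hypothetical, not part of the PRC.
    STALE = 0
    GOOD = 1
    INTERMEDIATE = 2
    READ_LOCKED = 3
    WRITE_LOCKED = 4


# Same preference order as _REPL_STATUSES = (1, 0, 2, 3, 4) in the PR.
REPLICA_STATUS_PREFERENCE = (
    ReplicaStatus.GOOD,
    ReplicaStatus.STALE,
    ReplicaStatus.INTERMEDIATE,
    ReplicaStatus.READ_LOCKED,
    ReplicaStatus.WRITE_LOCKED,
)


def status_rank(code) -> int:
    # Accepts an int or numeric string (e.g. '1') and returns its preference rank.
    return REPLICA_STATUS_PREFERENCE.index(ReplicaStatus(int(code)))


print(status_rank('1'), status_rank(0))  # 0 1
```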

