
Bug: endaq.ide.get_doc() returns data outside requested start/end when the window is out-of-bounds #233

@XanthanGum

Summary
When endaq.ide.get_doc() is called with timezone-aware datetime.datetime values for start and end that do not overlap the recording interval of the IDE file, it still returns a document whose channels are non-empty and contain data outside the requested window.
The expected behavior is a document in which every channel contains 0 samples, because a window that lies outside the recording interval should yield no data.

Steps to Reproduce
Parse a filename to obtain timezone-aware start/end datetimes (AWST, UTC+08):

from datetime import datetime
from zoneinfo import ZoneInfo
import endaq

filename = "20251018_RC63_LH_SAM4_BATCH_1_202510180930_202510181830"
tz = ZoneInfo("Australia/Perth")

start_time = datetime.strptime(filename.split("_")[-2], "%Y%m%d%H%M").replace(tzinfo=tz)
end_time   = datetime.strptime(filename.split("_")[-1], "%Y%m%d%H%M").replace(tzinfo=tz)

print(start_time)  # 2025-10-18 09:30:00+08:00
print(end_time)    # 2025-10-18 18:30:00+08:00
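For comparison with the UTC timestamps that endaq reports later, the requested window can also be expressed in UTC. This is a minimal standard-library sketch. It assumes that a fixed UTC+08:00 `timezone` is equivalent to `ZoneInfo("Australia/Perth")` for these dates, which holds because Western Australia does not observe daylight saving time. The fixed offset also avoids needing the tzdata package on Windows.

```python
from datetime import datetime, timedelta, timezone

# Assumption: AWST is a fixed UTC+08:00 offset (no DST in Western Australia),
# so this matches ZoneInfo("Australia/Perth") for these dates.
AWST = timezone(timedelta(hours=8))

filename = "20251018_RC63_LH_SAM4_BATCH_1_202510180930_202510181830"
start_str, end_str = filename.split("_")[-2:]

start_time = datetime.strptime(start_str, "%Y%m%d%H%M").replace(tzinfo=AWST)
end_time = datetime.strptime(end_str, "%Y%m%d%H%M").replace(tzinfo=AWST)

# Same instants, shown in UTC for comparison with endaq's timestamps
start_utc = start_time.astimezone(timezone.utc)
end_utc = end_time.astimezone(timezone.utc)
print(start_utc)  # 2025-10-18 01:30:00+00:00
print(end_utc)    # 2025-10-18 10:30:00+00:00
```

In UTC, the requested window is therefore Oct 18 01:30 to 10:30.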

Request a trimmed document using start/end that do not overlap the IDE’s actual data (the file below appears to contain data on Oct 19 UTC, not Oct 18 AWST):

doc = endaq.ide.get_doc(
    r"C:\...\20251018_RC63_LH_SAM4_BATCH_2_202510180930_202510181830.IDE",
    start=start_time,
    end=end_time,
)

Inspect the channel table:

endaq.ide.get_channel_table(doc)

Output includes non-empty channels and a warning:

...\endaq\ide\info.py:201: RuntimeWarning: divide by zero encountered in scalar divide
  rate = samples / (duration / 10 ** 6)

Example (excerpt):

| channel | name | type | units | start | end | duration | samples | rate |
|---|---|---|---|---|---|---|---|---|
| 8.0 | X (2000g) | Acceleration | g | 00:00.0008 | 00:00.0552 | 00:00.0543 | 2720 | 5000.78 Hz |
| 8.1 | Y (2000g) | Acceleration | g | 00:00.0008 | 00:00.0552 | 00:00.0543 | 2720 | 5000.78 Hz |
| 8.2 | Z (2000g) | Acceleration | g | 00:00.0008 | 00:00.0552 | 00:00.0543 | 2720 | 5000.78 Hz |
| 80.0 | X (40g) | Acceleration | g | 00:00.0015 | 00:00.0506 | 00:00.0491 | 248 | 504.87 Hz |
| 80.1 | Y (40g) | Acceleration | g | 00:00.0015 | 00:00.0506 | 00:00.0491 | 248 | 504.87 Hz |
| 80.2 | Z (40g) | Acceleration | g | 00:00.0015 | 00:00.0506 | 00:00.0491 | 248 | 504.87 Hz |
| 20.0 | Internal Pressure | Pressure | Pa | 00:00.0039 | 00:06.0112 | 00:06.0073 | 62 | 10.21 Hz |
| 20.1 | Internal Temperature | Temperature | °C | 00:00.0039 | 00:06.0112 | 00:06.0073 | 62 | 10.21 Hz |
| 65.0 | X | Quaternion | q | 00:00.0111 | 00:01.0122 | 00:01.0010 | 102 | 100.94 Hz |
| 65.1 | Y | Quaternion | q | 00:00.0111 | 00:01.0122 | 00:01.0010 | 102 | 100.94 Hz |
| 65.2 | Z | Quaternion | q | 00:00.0111 | 00:01.0122 | 00:01.0010 | 102 | 100.94 Hz |
| 65.3 | W | Quaternion | q | 00:00.0111 | 00:01.0122 | 00:01.0010 | 102 | 100.94 Hz |
| 65.4 | Acc | Quaternion | q | 00:00.0111 | 00:01.0122 | 00:01.0010 | 102 | 100.94 Hz |
| 88.0 | Latitude | Location | Degrees | 01:33:55.0979 | 01:34:27.0322 | 00:31.0342 | 32 | 1.02 Hz |
| 88.1 | Longitude | Location | Degrees | 01:33:55.0979 | 01:34:27.0322 | 00:31.0342 | 32 | 1.02 Hz |
| 88.2 | Time | Unix Epoch | s | 01:33:55.0979 | 01:34:27.0322 | 00:31.0342 | 32 | 1.02 Hz |
| 88.3 | Ground Speed | GNSS Speed | m/s | 01:33:55.0979 | 01:34:27.0322 | 00:31.0342 | 32 | 1.02 Hz |
| 102.0 | GNSS Time:00 | Unix Epoch Reference | s | 01:33:59.0322 | 01:33:59.0322 | 00:00.0000 | 1 | inf Hz |
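The inf Hz rate for channel 102.0 (one sample, zero duration) is consistent with the divide-by-zero warning above. The minimal sketch below reproduces that warning and shows one possible guard. It assumes that `duration` is an integer number of microseconds held in a NumPy scalar, which is inferred from the `/ 10 ** 6` in info.py:201 and not confirmed from endaq's source. `safe_rate_hz` is a hypothetical helper, not part of endaq.

```python
import math
import warnings

import numpy as np

# Assumption: duration is in microseconds and arrives as a NumPy scalar.
# Values mirror channel 102.0: one sample, zero duration.
samples = np.int64(1)
duration = np.int64(0)

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    rate = samples / (duration / 10 ** 6)  # same expression as info.py:201

print(rate)                         # inf
print(caught[0].category.__name__)  # RuntimeWarning


def safe_rate_hz(samples, duration_us):
    """Hypothetical guard: return NaN instead of inf for zero-duration channels."""
    duration_s = duration_us / 10 ** 6
    if duration_s == 0:
        return math.nan
    return float(samples / duration_s)


print(safe_rate_hz(1, 0))          # nan
print(safe_rate_hz(100, 1_000_000))  # 100.0
```

Returning NaN is only one option. It marks the rate as undefined instead of infinite, and it would not address the out-of-window data itself.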

Retrieve primary sensor data and check timestamps:

df = endaq.ide.get_primary_sensor_data(doc=doc)
print(df.head())
print(len(df))

Example timestamps are UTC on Oct 19, not within the requested Oct 18 AWST window:

                                    X (2000g)  Y (2000g)  Z (2000g)
timestamp
2025-10-19 09:57:15.008941+00:00      3.853685  -1.895579  -1.323955
2025-10-19 09:57:15.009141042+00:00   3.853685  -1.998095  -0.949977
...
2720
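To make the mismatch concrete, the sketch below tests whether the requested window and the returned data span overlap at all, treating both as half-open intervals. `overlaps` is a hypothetical helper. The data start is the first timestamp printed above, and the data end is an approximation (2720 samples at about 5 kHz is roughly half a second). With a real document, `df.index[0]` and `df.index[-1]` could be used instead, since pandas Timestamps compare with aware datetimes.

```python
from datetime import datetime, timedelta, timezone


def overlaps(a_start, a_end, b_start, b_end) -> bool:
    """True if half-open intervals [a_start, a_end) and [b_start, b_end) intersect."""
    return a_start < b_end and b_start < a_end


AWST = timezone(timedelta(hours=8))  # fixed UTC+08:00, as in Australia/Perth
req_start = datetime(2025, 10, 18, 9, 30, tzinfo=AWST)
req_end = datetime(2025, 10, 18, 18, 30, tzinfo=AWST)

# First timestamp from the df.head() output; the end is an approximation.
data_start = datetime(2025, 10, 19, 9, 57, 15, 8941, tzinfo=timezone.utc)
data_end = data_start + timedelta(seconds=0.55)

print(overlaps(req_start, req_end, data_start, data_end))  # False
print(data_start - req_end)  # 23:27:15.008941 (gap after the requested end)
```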

Manually filter with the same start_time/end_time:

df_trimmed = df.loc[(df.index > start_time) & (df.index < end_time)]
print(df_trimmed.head())
print(len(df_trimmed))

This yields an empty DataFrame (as expected when the window is out of bounds):

Empty DataFrame
Columns: [X (2000g), Y (2000g), Z (2000g)]
Index: []
0

Expected Behavior
If start/end are outside the recording interval of the IDE file, endaq.ide.get_doc() should return a document whose channels each have 0 samples and no data, reflecting the requested empty interval.
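A regression check for this expectation could count how many returned samples fall outside the requested window. With the fix, the count should be 0 and the DataFrame empty. The following is a minimal sketch that assumes pandas and a half-open [start, end) window. `count_outside_window` is a hypothetical helper, and the index is a synthetic stand-in for `df.index`, not data from the IDE file.

```python
from datetime import datetime, timedelta, timezone

import pandas as pd


def count_outside_window(index: pd.DatetimeIndex, start: datetime, end: datetime) -> int:
    """Number of timestamps in `index` that fall outside [start, end)."""
    inside = (index >= start) & (index < end)
    return int((~inside).sum())


AWST = timezone(timedelta(hours=8))  # fixed UTC+08:00, as in Australia/Perth
start = datetime(2025, 10, 18, 9, 30, tzinfo=AWST)
end = datetime(2025, 10, 18, 18, 30, tzinfo=AWST)

# Synthetic stand-in for df.index: 5 samples at 5 kHz starting at the first
# timestamp printed in this issue (not real data from the file).
index = pd.date_range("2025-10-19 09:57:15.008941", periods=5, freq="200us", tz="UTC")

print(count_outside_window(index, start, end))  # 5 with the current behavior
```

Against a real document, the check would be `count_outside_window(endaq.ide.get_primary_sensor_data(doc=doc).index, start_time, end_time) == 0`.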

Actual Behavior
endaq.ide.get_doc() returns a document containing non-empty channels and data outside the requested start/end window when that window does not overlap the recording interval.
Calling get_primary_sensor_data(doc=doc) on the result returns 2720 samples, all with timestamps outside the requested range.
A RuntimeWarning (divide by zero encountered in scalar divide) is emitted at endaq\ide\info.py:201. It most likely comes from channel 102.0, which has a single sample and a duration of 00:00.0000, so rate = samples / (duration / 10 ** 6) evaluates to inf Hz. This may be related to edge handling, but it is not the primary issue.

Metadata

Labels: bug (Something isn't working), endaq.ide (Related to `endaq.ide`)
