No by argument when merging lists of data.tables #7676

@popovs

Hello, I'd like to request support for a by argument when merging lists of data.tables. Here is a minimal reproducible example of the desired behavior:

set.seed(42) # Fix the seed so the sampled values are reproducible
dt1 <- data.table::data.table(idcol = c(1:10),
                              spp = c("a", "b", "a", "b", "c", "c", "d", "a", "d", "c"),
                              value = sample(1:100, 10))
dt2 <- data.table::data.table(idcol = c(1:10),
                              temp_c = sample(5:15, 10),
                              elevation_m = sample(0:500, 10))
dt3 <- data.table::data.table(idcol = c(1, 5, 2, 6, 8),
                              tree_spp = c("fir", "pine", "pine", "fir", "cedar"))

# Option 1 - Correct
# This works fine - we get the expected output
dt.x <- merge(dt1, dt2)
dt.y <- merge(dt.x, dt3, all = TRUE)

dt.y # All records merged; tree_spp is NA for records where no tree_spp was recorded

# Option 2 - Incorrect
# This runs without error, but the merge is wrong: missing values are recycled
some_dts <- list(dt2, dt3)
merge(dt1, some_dts, by = "idcol") # Incorrect output: tree_spp is recycled instead of filled with NA, so dt3 is not matched on "idcol"

# Option 3 - Incorrect
# This fails outright with a recycling error
all_dts <- list(dt1, dt2, dt3)
data.table::cbindlist(all_dts) # Fails with error

# Ideal situation:
# merge(dt.x, list_of_dts, by = "idcol") # == dt.y
# OR
# data.table::cbindlist(all_dts, by = "idcol") # == dt.y

Thank you to those who maintain this excellent package.

EDIT: In the meantime, I have discovered this workaround, so perhaps this is low priority:

dt.y <- purrr::reduce(all_dts, merge, by = "idcol", all.x = TRUE)

I will leave this up in case anyone has a similar issue down the line.
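For anyone who would rather not add purrr as a dependency, the same fold can probably be written with base R's Reduce. The following is only a minimal sketch under a few assumptions: merge_list is a hypothetical helper name (it is not part of data.table), the seed is arbitrary, and all = TRUE is used to mirror Option 1 rather than the all.x = TRUE in the workaround above.

```r
library(data.table)

set.seed(42) # arbitrary seed, only so the sampled values are repeatable

dt1 <- data.table(idcol = 1:10,
                  spp = c("a", "b", "a", "b", "c", "c", "d", "a", "d", "c"),
                  value = sample(1:100, 10))
dt2 <- data.table(idcol = 1:10,
                  temp_c = sample(5:15, 10),
                  elevation_m = sample(0:500, 10))
dt3 <- data.table(idcol = c(1, 5, 2, 6, 8),
                  tree_spp = c("fir", "pine", "pine", "fir", "cedar"))
all_dts <- list(dt1, dt2, dt3)

# Hypothetical helper (not a data.table function): merge the tables
# pairwise from left to right, forwarding `by` and any other merge()
# arguments to every step.
merge_list <- function(dts, by, ...) {
  Reduce(function(x, y) merge(x, y, by = by, ...), dts)
}

# Full outer merge on idcol; rows with no match in dt3 get NA in tree_spp
dt.y <- merge_list(all_dts, by = "idcol", all = TRUE)
dt.y
```

Because each step is an ordinary merge() call, arguments such as all.x or suffixes can be passed through the dots unchanged.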
