Open
Description
Hello, I'd like to request support for a by argument when merging a list of data.tables. Here is a minimal reproducible example of the desired behavior:
dt1 <- data.table::data.table(idcol = c(1:10),
                              spp = c("a", "b", "a", "b", "c", "c", "d", "a", "d", "c"),
                              value = sample(1:100, 10))
dt2 <- data.table::data.table(idcol = c(1:10),
                              temp_c = sample(5:15, 10),
                              elevation_m = sample(0:500, 10))
dt3 <- data.table::data.table(idcol = c(1, 5, 2, 6, 8),
                              tree_spp = c("fir", "pine", "pine", "fir", "cedar"))
# Option 1 - Correct
# This works fine - we get the expected output
dt.x <- merge(dt1, dt2)
dt.y <- merge(dt.x, dt3, all = TRUE)
dt.y # All records merged; tree_spp is NA wherever no tree_spp was recorded for a given idcol
# Option 2 - Incorrect
# This runs without error, but the merge is wrong: the tree_spp values are recycled instead of being filled with NA
some_dts <- list(dt2, dt3)
merge(dt1, some_dts, by = "idcol") # Incorrect output - the tree_spp col recycles values instead of giving NAs, so rows are not matched on "idcol"
# Option 3 - Incorrect
# This one fails outright :( we get the recycling error
all_dts <- list(dt1, dt2, dt3)
data.table::cbindlist(all_dts) # Fails with error
# Ideal situation:
# merge(dt.x, list_of_dts, by = "idcol") # == dt.y
# OR
# data.table::cbindlist(all_dts, by = "idcol") # == dt.y
Thank you to those who maintain this excellent package.
EDIT: In the meantime, I have discovered this solution, so perhaps this is low priority:
dt.y <- purrr::reduce(all_dts, merge, by = "idcol", all.x = TRUE)
I will leave this up in case anyone has a similar issue down the line.
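For anyone who would rather not depend on purrr, the same fold can be sketched with base R's Reduce. This is only a sketch of the workaround above, not a data.table feature: merge_list_by is a hypothetical helper name, and the set.seed call is an added assumption so the sampled columns are repeatable.

```r
library(data.table)

set.seed(42)  # assumption: fixed seed so the sampled columns are repeatable

dt1 <- data.table(idcol = 1:10,
                  spp = c("a", "b", "a", "b", "c", "c", "d", "a", "d", "c"),
                  value = sample(1:100, 10))
dt2 <- data.table(idcol = 1:10,
                  temp_c = sample(5:15, 10),
                  elevation_m = sample(0:500, 10))
dt3 <- data.table(idcol = c(1, 5, 2, 6, 8),
                  tree_spp = c("fir", "pine", "pine", "fir", "cedar"))
all_dts <- list(dt1, dt2, dt3)

# Hypothetical helper (not part of data.table): fold a list of tables into one
# by left-joining each table onto the accumulated result, matching on `by`.
merge_list_by <- function(dts, by) {
  Reduce(function(x, y) merge(x, y, by = by, all.x = TRUE), dts)
}

dt.y <- merge_list_by(all_dts, by = "idcol")
dt.y  # one row per idcol in dt1; tree_spp is NA where dt3 has no match
```

Because each step uses all.x = TRUE, the first table in the list decides which rows survive, so the table with the full set of ids should come first. Switching to all = TRUE inside the helper would give a full outer join instead.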