So I have a data frame:
ID YearMon Var Count
1  012007  H   1
1  012007  D   2
1  022007      NA
1  032007  H   1
2  012007  H   1
2  022007      NA
2  022007  D   1
2  032007      NA
How would I go about getting just the max Count for each unique ID and YearMon combination? Ideally it would return:
1 012007 D 2
1 022007 NA
1 032007 H 1
2 012007 H 1
2 022007 D 1
2 032007 NA
Using plyr this is easily achieved. This groups by ID and YearMon and returns the max Count, along with the ID and YearMon, in a data frame.
library(plyr)
ddply(dat1, .(ID, YearMon), function(x) {
  Count <- max(x$Count)
  data.frame(Count = Count, Var = x[x$Count == Count, "Var"])
})
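One caveat worth noting (my addition, not part of the original answer): for the all-NA groups, max() returns NA and the x$Count == Count comparison then breaks the Var lookup; a small guard along these lines handles it:
ddply(dat1, .(ID, YearMon), function(x) {
  # all-NA group: return a single NA row rather than comparing against NA
  if (all(is.na(x$Count))) return(data.frame(Count = NA, Var = NA))
  Count <- max(x$Count, na.rm = TRUE)
  # which() drops the NA comparisons, so mixed NA/value groups work too
  data.frame(Count = Count, Var = x[which(x$Count == Count), "Var"])
})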
In order to return all columns:
df[is.na(df$Count), "Count"] <- -9999
df2 <- ddply(df, .(ID, YearMon), function(x) {
  index <- which(x$Count == max(x$Count))
  x[index, ]
})
df2[df2$Count == -9999, "Count"] <- NA
The last line turns the sentinel values back into NA. (Pick a sentinel like -9999 that cannot collide with a real Count value.)
Using data.table, if you have a data table called dt, you can first calculate the max of Count by group, and then just keep the rows where Count is equal to the max for that group:
newdt <- dt[, max.count := max(Count), by=.(ID, YearMon)][Count==max.count,.(ID, YearMon, Var, Count)]
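A side note of mine (not from the original answer): := adds max.count to dt itself by reference, so if you don't want the helper column to stick around, drop it afterwards:
dt[, max.count := NULL]  # remove the helper column that := added by reference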
library(dplyr)
dt %>%
  group_by(ID, YearMon) %>%
  slice(which.max(Count))
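One caveat (my addition): which.max() returns integer(0) for a group whose Count is all NA, so slice() silently drops those groups. A hedged variant that keeps them (the max() call emits a harmless "no non-missing arguments" warning for the all-NA groups):
dt %>%
  group_by(ID, YearMon) %>%
  # all-NA groups pass the first condition; NA rows in mixed groups fail both
  filter(all(is.na(Count)) | Count == max(Count, na.rm = TRUE))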
Let's not forget about aggregate!
## Clean up the data: grouping variables need to be factors and Count needs to be numeric
dat1$Var.[dat1$Var. == 1] <- ""
dat1$Count. <- as.numeric(dat1$Count.)
dat1$ID. <- as.factor(dat1$ID.)
dat1$YearMon. <- as.factor(dat1$YearMon.)
dat1 <- dat1[, -3]  # drop the Var column as you're not using it
aggregate(. ~ ID. + YearMon., data = dat1, FUN = max)  # simple and short
ID. YearMon. Count.
1 1 12007 2
2 2 12007 1
3 2 22007 1
4 1 32007 1
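Note (my addition, not part of the original answer): aggregate()'s formula interface silently drops NA rows via its default na.action, which is why the all-NA groups (e.g. ID 1, 022007) are missing from this output. Passing na.action = na.pass keeps those rows, but max() then returns NA for any group containing an NA, including the mixed ones:
aggregate(. ~ ID. + YearMon., data = dat1, FUN = max, na.action = na.pass)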
Another data.table option using if/else. We convert the 'data.frame' to a 'data.table' (setDT(df1)). Grouped by 'ID' and 'YearMon', if all of the 'Count' values are NA we return the Subset of Data.table (.SD); otherwise we get the index of the maximum 'Count' and subset with it (.SD[which.max(Count)]).
library(data.table)
setDT(df1)[, if(all(is.na(Count))) .SD else .SD[which.max(Count)],.(ID, YearMon)]
# ID YearMon Var Count
#1: 1 12007 D 2
#2: 1 22007 NA
#3: 1 32007 H 1
#4: 2 12007 H 1
#5: 2 22007 D 1
#6: 2 32007 NA
Or another option: grouped by the same variables, concatenate the which.max index with a logical that is TRUE when 'Count' is entirely NA (which then selects the whole group), extract the row indices (.I), and use them to subset the 'data.table'.
setDT(df1)[df1[, .I[c(which.max(Count), all(is.na(Count)))], .(ID, YearMon)]$V1]
# ID YearMon Var Count
#1: 1 12007 D 2
#2: 1 22007 NA
#3: 1 32007 H 1
#4: 2 12007 H 1
#5: 2 22007 D 1
#6: 2 32007 NA
Or we replace the NAs with -Inf inside which.max and subset. The replacement happens only within which.max, so the Count column keeps its NAs; for an all-NA group, which.max over a vector of -Inf returns 1, retaining the group's first row:
setDT(df1)[, .SD[which.max(replace(Count, is.na(Count),-Inf ))], .(ID, YearMon)]
data
df1 <- structure(list(ID = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L),
YearMon = c(12007L,
12007L, 22007L, 32007L, 12007L, 22007L, 22007L, 32007L), Var = c("H",
"D", "", "H", "H", "", "D", ""), Count = c(1L, 2L, NA, 1L, 1L,
NA, 1L, NA)), .Names = c("ID", "YearMon", "Var", "Count"),
class = "data.frame", row.names = c(NA, -8L))
Related
I have data like this:
stim1 stim2
1: 2 3
2: 1 3
3: 2 1
4: 1 2
5: 3 1
structure(list(stim1 = c(2L, 1L, 2L, 1L, 3L),
               stim2 = c(3L, 3L, 1L, 2L, 1L)),
          row.names = c(NA, -5L), class = c("data.table", "data.frame"))
My objective is to add two columns: one for 'stim1' and one for 'stim2'. For each row of both columns, I want to find the previous occurrence of its value, in either column, and then grab the value in the opposite column.
For example, on row 3 'stim1' is 2. The previous occurrence of 2 is in 'stim1' on row 1. The value in the other column of that row is 3. So Prev1[3] is 3.
Another example: On row 4 'stim1' is 1. The previous occurrence of 1 is in 'stim2' on row 3. The value in the other column on that row is 2. So Prev1[4] is 2.
Desired output:
stim1 stim2 Prev1 Prev2
1: 2 3 <NaN> <NaN>
2: 1 3 <NaN> 2
3: 2 1 3 3
4: 1 2 2 1
5: 3 1 1 2
The tricky thing is that the OP wants to find the previous occurrence of a value in either column.
Therefore, the idea is to reshape the data into long format and to find the matching rows by aggregating in a non-equi self join.
library(data.table)
long <- melt(DT, measure.vars = patterns("^stim"), value.name = "stim")[
, rn := rowid(variable)][
, opposite := rev(stim), keyby = rn][]
long[, prev := long[long, on = c("stim", "rn < rn"),
.(max(x.rn), x.opposite[which.max(x.rn)]), by = .EACHI]$V2][]
dcast(long, rn ~ rowid(rn), value.var = c("stim", "prev"))
rn stim_1 stim_2 prev_1 prev_2
1: 1 2 3 NA NA
2: 2 1 3 NA 2
3: 3 2 1 3 3
4: 4 1 2 2 1
5: 5 3 1 1 2
Explanation
- Reshape DT to long format.
- Create an additional column rn which identifies the row numbers in the original dataset DT using rowid(variable).
- Create an additional column opposite which contains the values of the opposite column. In long format this means reversing the order of values within each rn group.
- Now, join long with itself. The non-equi join condition looks for all occurrences of the current stim value in rows before the current row. As there might be more than one match, aggregating by max(rn) within the .EACHI groups picks the row number of the previous occurrence of the value as well as the corresponding opposite value. So,
long[long, on = c("stim", "rn < rn"), .(max(x.rn), x.opposite[which.max(x.rn)]), by = .EACHI]
returns
stim rn V1 V2
1: 2 1 NA NA
2: 3 1 NA NA
3: 1 2 NA NA
4: 3 2 1 2
5: 2 3 1 3
6: 1 3 2 3
7: 1 4 3 2
8: 2 4 3 1
9: 3 5 2 1
10: 1 5 4 2
- Create an additional column prev in long which contains the previous opposite value V2.
- Finally, reshape long back to wide format, using both measure columns stim and prev.
Edit: Alternative solution
In case DT contains more columns than just stim1 and stim2, DT can alternatively be updated by reference:
long <- melt(DT, measure.vars = patterns("^stim"), value.name = "stim")[
, rn := rowid(variable)][
, opposite := rev(stim), keyby = rn][]
DT[, c("prev1", "prev2") := dcast(
long[long, on = c("stim", "rn < rn"),
.(max(x.rn), x.opposite[which.max(x.rn)]), by = .EACHI],
rn ~ rowid(rn), value.var = "V2")[, rn := NULL]][]
stim1 stim2 prev1 prev2
1: 2 3 NA NA
2: 1 3 NA 2
3: 2 1 3 3
4: 1 2 2 1
5: 3 1 1 2
Data
library(data.table)
DT <- data.table(stim1 = c(2L, 1L, 2L, 1L, 3L),
stim2 = c(3L, 3L, 1L, 2L, 1L))
A quick helper function to iterate through the data:
func <- function(mtx) {
  na <- mtx[1][NA]  # an NA of the same class as the matrix elements
  c(NA, sapply(seq_len(nrow(mtx))[-1], function(ind) {
    v <- mtx[ind, 1]       # current value
    s <- seq_len(ind - 1)  # all earlier rows
    m <- cbind(v == mtx[s, 1], v == mtx[s, 2])  # matches in either column
    if (any(m)) {
      m <- which(m, arr.ind = TRUE)
      row <- which.max(m[, 1])  # most recent earlier occurrence
      mtx[m[row, 1], m[row, 2] %% 2 + 1]  # value from the opposite column
    } else na
  }))
}
Demonstration:
dat[, Prev1 := func(cbind(stim1, stim2)) ][, Prev2 := func(cbind(stim2, stim1)) ]
# stim1 stim2 Prev1 Prev2
# <int> <int> <int> <int>
# 1: 2 3 NA NA
# 2: 1 3 NA 2
# 3: 2 1 3 3
# 4: 1 2 2 1
# 5: 3 1 1 2
Alternative, using zoo::rollapply:
func2 <- function(mtx) {
  na <- mtx[1][NA]  # an NA of the same class as the matrix elements
  if (!is.matrix(mtx)) return(na)  # we're on the first row
  v <- mtx[nrow(mtx), 1]       # current value (last row of the window)
  s <- seq_len(nrow(mtx) - 1)  # all earlier rows in the window
  m <- cbind(v == mtx[s, 1], v == mtx[s, 2])  # matches in either column
  if (any(m)) {
    m <- which(m, arr.ind = TRUE)
    row <- which.max(m[, 1])  # most recent earlier occurrence
    mtx[m[row, 1], m[row, 2] %% 2 + 1]  # value from the opposite column
  } else na
}
dat[, Prev1 := zoo::rollapplyr(.SD, .N, FUN = func2, by.column = FALSE, partial = TRUE),
.SDcols = c("stim1", "stim2")
][, Prev2 := zoo::rollapplyr(.SD, .N, FUN = func2, by.column = FALSE, partial = TRUE),
.SDcols = c("stim2", "stim1") ]
It's not shorter, and in fact is slower (with a 5-row dataset), but if you prefer to think of this in a rolling fashion, this produces the same results. (It's possible the newer slider package might be clearer, faster, or neither compared with this.)
Note:
I assign na as a class-specific NA (there are at least six types of NA). I do this defensively: if there is at least one match, then the remainder of the NA values will be coerced into the correct class; however, if there are no matches, then the class returned by func will be logical which may not be the same as the original data, and data.table will complain.
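For the curious, a quick illustration of the class-specific NAs (my addition); the mtx[1][NA] trick yields an NA whose class matches the data:
class(NA)            # "logical"
class(NA_integer_)   # "integer"
class(NA_real_)      # "numeric"
class(NA_character_) # "character"
class(NA_complex_)   # "complex"
m <- matrix(1:4, 2)
class(m[1][NA])      # "integer" -- same class as the matrix elements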
Just in case there was interest in a tidyverse approach (probably not as fast!). You could add row numbers to your data.frame, and put it into long form. Then, grouping by each value, get the previous row number and stim to reference in Prev. With a left_join you can obtain the appropriate value for Prev.
library(tidyverse)
df <- mutate(as.data.frame(df), rn = row_number())
df_long <- pivot_longer(df,
cols = -rn,
names_to = "stim",
names_pattern = "stim(\\d+)",
names_transform = list(stim = as.numeric))
df_long %>%
group_by(value) %>%
mutate(match_rn = lag(rn), match_stim = 3 - lag(stim)) %>%
left_join(df_long, by = c("match_rn" = "rn", "match_stim" = "stim")) %>%
pivot_wider(id_cols = rn,
names_from = stim,
values_from = value.y,
names_prefix = "Prev") %>%
right_join(df) %>%
arrange(rn)
Output
rn Prev1 Prev2 stim1 stim2
<int> <int> <int> <int> <int>
1 1 NA NA 2 3
2 2 NA 2 1 3
3 3 3 3 2 1
4 4 2 1 1 2
5 5 1 2 3 1
Here is another option:
setDT(DT)[, rn := .I]
dt1 <- DT[DT, on=.(stim1, rn<rn), mult="last", .(x.rn, v=x.stim2)]
dt2 <- DT[DT, on=.(stim2=stim1, rn<rn), mult="last", .(x.rn, v=x.stim1)]
DT[, Prev1 := fcoalesce(fifelse(dt1$x.rn > dt2$x.rn, dt1$v, dt2$v), dt1$v, dt2$v)]
# same logic with the columns flipped; seems to work for this minimal example -- please let me know if there are cases where Prev2 is wrong
dt1 <- DT[DT, on=.(stim2, rn<rn), mult="last", .(x.rn, v=x.stim1)]
dt2 <- DT[DT, on=.(stim1=stim2, rn<rn), mult="last", .(x.rn, v=x.stim2)]
DT[, Prev2 := fcoalesce(fifelse(dt1$x.rn > dt2$x.rn, dt1$v, dt2$v), dt1$v, dt2$v)]
output:
stim1 stim2 rn Prev1 Prev2
1: 2 3 1 NA NA
2: 1 3 2 NA 2
3: 2 1 3 3 3
4: 1 2 4 2 1
5: 3 1 5 1 2
data:
library(data.table)
DT <- structure(list(stim1 = c(2L, 1L, 2L, 1L, 3L),
                     stim2 = c(3L, 3L, 1L, 2L, 1L)),
                row.names = c(NA, -5L), class = c("data.table", "data.frame"))
I have a dataset with names, dates, and several categorical columns. Let's say
data <- data.table(name = c('Anne', 'Ben', 'Cal', 'Anne', 'Ben', 'Cal', 'Anne', 'Ben', 'Ben', 'Ben', 'Cal'),
period = c(1,1,1,1,1,1,2,2,2,3,3),
category = c("A","A","A","B","B","B","A","B","A","A","B"))
Which looks like this:
name period category
Anne 1 A
Ben 1 A
Cal 1 A
Anne 1 B
Ben 1 B
Cal 1 B
Anne 2 A
Ben 2 B
Ben 2 A
Ben 3 A
Cal 3 B
I want to compute, for each period, how many names were present in the past period, for every group of my categorical variables. The output should be as follows:
period category recurrence_count
2 A 2 # due to Anne and Ben being on A, period 1
2 B 1 # due to Ben being on B, period 1
3 A 1 # due to Ben being on A, period 2
3 B 0 # no match from B, period 2
I am aware of the .I and .GRP operators in data.table, but I have no idea how to write the notion of 'next group' in the j entry of my statement. I imagine something like this might be a reasonable path, but I can't figure out the correct syntax:
data[, .(recurrence_count = length(intersect(name, name[last(.GRP)]))), by = .(category, period)]
You can first summarize your data by category and period.
previous_period_names <- data[, .(names = list(name)), .(category, period)]
previous_period_names[, next_period := period + 1]
Join your summary back onto the original data, matching each row's category and period against the summary's next_period (the category must be part of the join, otherwise names from other categories leak in):
data[previous_period_names, names := i.names, on = c('category', 'period==next_period')]
Now count, per period and category, how many names appear in the summarized names from the previous period:
data[, .(recurrence_count = sum(name %in% unlist(names))), by = .(period, category)]
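One detail to note (my addition): period-1 groups have no previous period, so they come back with a recurrence_count of 0; drop them afterwards if you want exactly the desired output:
data[, .(recurrence_count = sum(name %in% unlist(names))),
     by = .(period, category)][period != 1]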
Another data.table alternative. For rows that can have a previous period (period != 1), create such a variable (prev_period := period - 1).
Join the original data with the subset that has values for 'prev_period' (data[data[!is.na(prev_period)]]). Join on 'category', 'period = prev_period' and 'name'.
In the resulting data set, for each 'period' and 'category' (by = .(period = i.period, category)), count the number of names from original data (x.name) that had a match with previous period (length(na.omit(x.name))).
data[period != 1, prev_period := period - 1]
data[data[!is.na(prev_period)], on = c("category", period = "prev_period", "name"),
.(category, i.period, x.name)][
, .(n = length(na.omit(x.name))), by = .(period = i.period, category)]
# period category n
# 1: 2 A 2
# 2: 2 B 1
# 3: 3 A 1
# 4: 3 B 0
One option in base R is to split the 'data' by 'category' and loop over the list with lapply. For each subset, use Reduce with intersect on the 'name' values split by 'period' (with accumulate = TRUE), take the lengths of the result, and drop the first period, which has no predecessor. Build a data.frame pairing those counts with the remaining unique 'period' values, use Map to attach the 'category' from the names of the list output, and rbind the list of data.frames into a single dataset:
library(data.table)
lst1 <- lapply(split(data, data$category), function(x)
data.frame(period = unique(x$period)[-1],
recurrence_count = lengths(Reduce(intersect,
split(x$name, x$period), accumulate = TRUE)[-1])))
rbindlist(Map(cbind, category = names(lst1), lst1))[
order(period), .(period, category, recurrence_count)]
# period category recurrence_count
#1: 2 A 2
#2: 2 B 1
#3: 3 A 1
#4: 3 B 0
Or, using the same logic within data.table: grouped by 'category', split 'name' by 'period' and apply Reduce with intersect. The duplicated(category) filter then drops each category's first period, which has no predecessor:
setDT(data)[, .(period = unique(period),
recurrence_count = lengths(Reduce(intersect,
split(name, period), accumulate = TRUE))), .(category)][duplicated(category)]
# category period recurrence_count
#1: A 2 2
#2: A 3 1
#3: B 2 1
#4: B 3 0
Or a similar option in the tidyverse:
library(dplyr)
library(purrr)
data %>%
group_by(category) %>%
summarise(recurrence_count = lengths(accumulate(split(name, period),
intersect)), period = unique(period), .groups = 'drop') %>%
filter(duplicated(category))
# A tibble: 4 x 3
# category recurrence_count period
# <chr> <int> <int>
#1 A 2 2
#2 A 1 3
#3 B 1 2
#4 B 0 3
data
data <- structure(list(name = c("Anne", "Ben", "Cal", "Anne", "Ben",
"Cal", "Anne", "Ben", "Ben", "Ben", "Cal"), period = c(1L, 1L,
1L, 1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L), category = c("A", "A", "A",
"B", "B", "B", "A", "B", "A", "A", "B")), class = "data.frame",
row.names = c(NA,
-11L))
A data.table option
setDT(data)[
,
{
u <- split(name, period)
data.table(
period = unique(period)[-1],
recurrence_count = lengths(
Map(
intersect,
head(u, -1),
tail(u, -1)
)
)
)
},
category
]
gives
category period recurrence_count
1: A 2 2
2: A 3 1
3: B 2 1
4: B 3 0
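The head(u, -1) / tail(u, -1) pairing is what lines up each period with its successor; a tiny standalone illustration (the vectors here are mine):
u <- split(c("Anne", "Ben", "Cal", "Anne", "Ben", "Ben"),
           c(1, 1, 1, 2, 2, 3))
Map(intersect, head(u, -1), tail(u, -1))
# $`1`: "Anne" "Ben"   (period 1 vs 2)
# $`2`: "Ben"          (period 2 vs 3)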
Given this data, where the first 4 columns are rowid, order, line and special, I need to create the column numSpecial as such:
rowid order line special numSpecial
1 A 01 X 1
2 B 01 0
3 B 02 X 2
4 B 03 X 2
5 C 01 X 1
6 C 02 0
numSpecial is the count of special rows (special = X) within each order, assigned only to rows that are special themselves; otherwise it's 0.
I first tried adding a column, call it orderX, that simply concatenates 'order' with 'X', which would look like:
orderX
AX
BX
BX
BX
CX
CX
Then counting how many of the order/special pastes appear in orderX:
df$numSpecial <- sum(paste(order, special, sep = "") %in% orderx)
But that doesn't work; it returns the overall total for every row:
numSpecial
4
4
4
4
4
4
I then tried as.data.table, but I'm not getting the expected results using:
as.data.table(mydf)[, numSpecial := sum(paste(order, special, sep = "") %in% orderx), by = rowid]
However, that returns just 1 or 0 for each row, not group sums:
numSpecial
1
0
1
1
1
0
Where am I going wrong with these? I don't think I should need to create that orderX column either, but I can't figure out how to get this count right. It's similar to a COUNTIF in Excel, which is easy there.
There are probably several ways, but you could just multiply the group count by a TRUE/FALSE flag of "X" being present:
dat[, numSpecial := sum(special == "X") * (special == "X"), by=order]
dat
# rowid order line special numSpecial
#1: 1 A 1 X 1
#2: 2 B 1 0
#3: 3 B 2 X 2
#4: 4 B 3 X 2
#5: 5 C 1 X 1
#6: 6 C 2 0
You could also do it a bit differently like:
dat[, numSpecial := 0L][special == "X", numSpecial := .N, by=order]
Where dat was:
library(data.table)
dat <- structure(list(rowid = 1:6, order = c("A", "B", "B", "B", "C",
"C"), line = c(1L, 1L, 2L, 3L, 1L, 2L), special = c("X", "",
"X", "X", "X", "")), .Names = c("rowid", "order", "line", "special"
), row.names = c(NA, -6L), class = "data.frame")
setDT(dat)
You could use ave with a dummy variable (just filled with 1s):
df$numSpecial <- ifelse(df$special == "X", ave(rep(1,nrow(df)), df$order, df$special, FUN = length), 0)
df
# rowid order line special numSpecial
#1 1 A 1 X 1
#2 2 B 1 0
#3 3 B 2 X 2
#4 4 B 3 X 2
#5 5 C 1 X 1
#6 6 C 2 0
Note I read in your data without the numSpecial column.
Using the dplyr package:
library(dplyr)
df %>% group_by(order) %>%
mutate(numSpecial = ifelse(special=="X", sum(special=="X"), 0))
rowid order special numSpecial
1 1 A X 1
2 2 B 0
3 3 B X 2
4 4 B X 2
5 5 C X 1
6 6 C 0
One other option using base R only would be to use aggregate:
# Your data
df <- data.frame(rowid = 1:6, order = c("A", "B", "B", "B", "C", "C"),
                 special = c("X", "", "X", "X", "X", ""))
# Make the counts
dat <- with(df, aggregate(x = list(answer = special),
                          by = list(order = order, special = special),
                          FUN = function(x) sum(x == "X")))
# Merge back to the original dataset:
dat.fin <- merge(df, dat, by = c('order', 'special'))
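A small follow-up of mine: merge() sorts the result by the join keys, so the original row order is lost; re-sorting by rowid restores it:
dat.fin <- dat.fin[order(dat.fin$rowid), ]  # back to the original row order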
I need to delete all rows that contain a value of 2 or -2 in any column except column one.
Example dataframe:
df
a b c d
zzz 2 2 -1
yyy 1 1 1
xxx 1 -1 -2
Desired output:
df
a b c d
yyy 1 1 1
I have tried
df <- df[!grepl(-2 | 2, df),]
df <- subset(df, !df[-1] == 2 |!df[-1] == -2)
My actual dataset has over 300 rows and 70 variables. I believe I need to use some sort of apply function, but I am not sure. Any help is appreciated; please let me know if you need more info.
We can create a logical index by comparing the absolute values of the dataset against 2 and taking the row-wise sum. Rows with no match sum to 0, and negating with ! turns those 0s into TRUE and everything else into FALSE, which we use to subset:
df[!rowSums(abs(df[-1])==2),]
# a b c d
#2 yyy 1 1 1
Or another option is to compare within each column using lapply, collapse it to a logical vector with | and use that to subset the rows
df[!Reduce(`|`,lapply(abs(df[-1]), `==`, 2)),]
# a b c d
#2 yyy 1 1 1
We could also do this with tidyverse
library(tidyverse)
df %>%
select(-1) %>% #to remove the first column
map(~abs(.) ==2) %>% #do the columnwise comparison
reduce(`|`) %>% #reduce it to logical vector
`!` %>% #negate to convert TRUE/FALSE to FALSE/TRUE
df[., ] #subset the rows of original dataset
# a b c d
# 2 yyy 1 1 1
data
df <- structure(list(a = c("zzz", "yyy", "xxx"), b = c(2L, 1L, 1L),
c = c(2L, 1L, -1L), d = c(-1L, 1L, -2L)), .Names = c("a",
"b", "c", "d"), class = "data.frame", row.names = c(NA, -3L))
Option with dplyr:
library(dplyr)
a <- c("zzz","yyy","xxx")
b <- c(2,1,1)
c <- c(2,1,-1)
d <- c(-1,1,-2)
df <- data.frame(a,b,c,d)
filter(df,((abs(b) != 2) & (abs(c) != 2) & (abs(d) != 2)))
a b c d
1 yyy 1 1 1
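With 70 variables you won't want to spell out every column. A hedged sketch using dplyr's if_all() (available in dplyr >= 1.0.4), applying the same abs() test to every column except the first:
library(dplyr)
df %>% filter(if_all(-1, ~ abs(.x) != 2))  # keep a row only if no column hits 2 or -2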
I would like to drop every occurrence of a factor level (here, ID) for which any row contains a missing value.
Example:
ID var1 var2
1 1 2
1 NA 3
2 1 2
2 2 4
So, in this hypothetical, what would be left would be:
ID var1 var2
2 1 2
2 2 4
Here's a possible data.table solution (sorry @rawr):
library(data.table)
setDT(df)[, if (all(!is.na(.SD))) .SD, ID]
# ID var1 var2
# 1: 2 1 2
# 2: 2 2 4
If you only want to check var1 then
df[, if (all(!is.na(var1))) .SD, ID]
# ID var1 var2
# 1: 2 1 2
# 2: 2 2 4
Assuming that NAs can occur in either var column,
df[with(df, !ave(!!rowSums(is.na(df[,-1])), ID, FUN=any)),]
# ID var1 var2
#3 2 1 2
#4 2 2 4
Or if it is only specific to var1
df[with(df, !ave(is.na(var1), ID, FUN=any)),]
# ID var1 var2
#3 2 1 2
#4 2 2 4
Or using dplyr
library(dplyr)
df %>%
group_by(ID) %>%
filter(all(!is.na(var1)))
# ID var1 var2
#1 2 1 2
#2 2 2 4
data
df <- structure(list(ID = c(1L, 1L, 2L, 2L), var1 = c(1L, NA, 1L, 2L
), var2 = c(2L, 3L, 2L, 4L)), .Names = c("ID", "var1", "var2"
), class = "data.frame", row.names = c(NA, -4L))
Here's one more option in base R. It will check all columns for NAs.
df[!df$ID %in% df$ID[rowSums(is.na(df)) > 0],]
# ID var1 var2
#3 2 1 2
#4 2 2 4
If you only want to check in column "var1" you can do:
df[!with(df, ID %in% ID[is.na(var1)]),]
# ID var1 var2
#3 2 1 2
#4 2 2 4
In the current development version of data.table, there's a new implementation of na.omit for data.tables which takes cols = and invert = arguments.
cols = allows you to specify the columns in which to look for NAs, and invert = TRUE returns the NA rows instead of omitting them.
You can install the devel version by following these instructions. Or you can wait for 1.9.6 on CRAN at some point. Using that, we can do:
require(data.table) ## 1.9.5+
setkey(setDT(df), ID)
df[!na.omit(df, invert = TRUE)]
# ID var1 var2
# 1: 2 1 2
# 2: 2 2 4
How this works:
- setDT converts the data.frame to a data.table by reference.
- setkey sorts the data.table by the columns provided and marks those columns as key columns so that we can perform a join.
- na.omit(df, invert = TRUE) gives just those rows that have an NA anywhere.
- X[!Y] does an anti-join on the key column ID and returns all the rows that don't match ID = 1 (from Y). Check this post to read in detail about data.table's joins.
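For this data, the inverted na.omit() step returns just the NA-carrying row, and the anti-join then removes every row sharing its ID (my walk-through of the intermediate, not part of the original answer):
na.omit(df, invert = TRUE)
#    ID var1 var2
# 1:  1   NA    3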
HTH