I would like to add a counter column to a data frame based on sets of identical rows. To do this, I used the data.table package. In my case, rows need to be compared on the combination of columns "z" AND ("x" OR "y").
I tested:
DF[ , Index := .GRP, by = c("x","y","z") ]
but the result is the combination of "z" AND "x" AND "y".
How can I have the combination of "z" AND ("x" OR "y") ?
Here is a data example:
DF = data.frame(x=c("a","a","a","b","c","d","e","f","f"), y=c(1,3,2,8,8,4,4,6,0), z=c("M","M","M","F","F","M","M","F","F"))
DF <- data.table(DF)
I would like to have this output:
> DF
x y z Index
1: a 1 M 1
2: a 3 M 1
3: a 2 M 1
4: b 8 F 2
5: c 8 F 2
6: d 4 M 3
7: e 4 M 3
8: f 6 F 4
9: f 0 F 4
A new group starts when the value of z changes, or when the values of both x and y change.
Try this example.
require(data.table)
DF <- data.table(x = c("a","a","a","b","c","d","e","f","f"),
y = c(1,3,2,8,8,4,4,6,0),
z=c("M","M","M","F","F","M","M","F","F"))
# Helper function: TRUE where a value differs from the previous one (the first element is always TRUE)
is.not.eq.with.lag <- function(x) c(TRUE, tail(x, -1) != head(x, -1))
DF[, x1 := is.not.eq.with.lag(x)]
DF[, y1 := is.not.eq.with.lag(y)]
DF[, z1 := is.not.eq.with.lag(z)]
DF
DF[, Index := cumsum(z1 | (x1 & y1))]
DF
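The same logic can be written a bit more compactly with data.table::shift (a sketch of mine, not part of the answer above): shift() gives the previous row's value, so we flag rows where z changes or where both x and y change, and take a cumulative sum. Index2 should reproduce Index.
chg <- function(v) v != shift(v, fill = v[1])  # TRUE where the value differs from the row above
DF[, Index2 := cumsum(chg(z) | (chg(x) & chg(y))) + 1L]
DF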
I know a lot of people warn against for loops in R, but in this instance I think it is a very direct way of approaching the problem. Plus, the result isn't growing in size, so performance shouldn't be a major concern. The for loop approach would be:
dt$grp <- rep(NA, nrow(dt))
for (i in 1:nrow(dt)) {
  if (i == 1) {
    dt$grp[i] <- 1
  } else {
    if (dt$z[i-1] == dt$z[i] & (dt$x[i-1] == dt$x[i] | dt$y[i-1] == dt$y[i])) {
      dt$grp[i] <- dt$grp[i-1]
    } else {
      dt$grp[i] <- dt$grp[i-1] + 1
    }
  }
}
Trying this on the OP's original problem, the result is:
DF = data.frame(x=c("a","a","a","b","c","d","e","f","f"), y=c(1,3,2,8,8,4,4,6,0), z=c("M","M","M","F","F","M","M","F","F"))
dt <- data.table(DF)
dt$grp <- rep(NA, nrow(dt))
for (i in 1:nrow(dt)) {
  if (i == 1) {
    dt$grp[i] <- 1
  } else {
    if (dt$z[i-1] == dt$z[i] & (dt$x[i-1] == dt$x[i] | dt$y[i-1] == dt$y[i])) {
      dt$grp[i] <- dt$grp[i-1]
    } else {
      dt$grp[i] <- dt$grp[i-1] + 1
    }
  }
}
dt
x y z grp
1: a 1 M 1
2: a 3 M 1
3: a 2 M 1
4: b 8 F 2
5: c 8 F 2
6: d 4 M 3
7: e 4 M 3
8: f 6 F 4
9: f 0 F 4
Trying this on the data.table in #Frank's comment gives the expected result as well:
dt<-data.table(x = c("b", "a", "a"), y = c(1, 1, 2), z = c("F", "F", "F"))
dt$grp <- rep(NA, nrow(dt))
for (i in 1:nrow(dt)) {
  if (i == 1) {
    dt$grp[i] <- 1
  } else {
    if (dt$z[i-1] == dt$z[i] & (dt$x[i-1] == dt$x[i] | dt$y[i-1] == dt$y[i])) {
      dt$grp[i] <- dt$grp[i-1]
    } else {
      dt$grp[i] <- dt$grp[i-1] + 1
    }
  }
}
dt
x y z grp
1: b 1 F 1
2: a 1 F 1
3: a 2 F 1
EDITED TO ADD: This solution is in some ways a more verbose version of the one advocated by djhurio above. I think it shows what is happening a bit more clearly, so I'll leave it.
I think this task is easier if it is broken down a little. The code below first creates TWO indices, one for changes in x (nested in z) and one for changes in y (nested in z). We then flag the first row of each of these indices. Taking the cumulative sum of the rows where both FIRST.x and FIRST.y are TRUE gives the desired index.
library(data.table)
dt_example <- data.table(x = c("a","a","a","b","c","d","e","f","f"),
y = c(1,3,2,8,8,4,4,6,0),
z = c("M","M","M","F","F","M","M","F","F"))
dt_example[,Index_x := .GRP,by = c("z","x")]
dt_example[,Index_y := .GRP,by = c("z","y")]
dt_example[,FIRST.x := !duplicated(Index_x)]
dt_example[,FIRST.y := !duplicated(Index_y)]
dt_example[,Index := cumsum(FIRST.x & FIRST.y)]
dt_example
x y z Index_x Index_y FIRST.x FIRST.y Index
1: a 1 M 1 1 TRUE TRUE 1
2: a 3 M 1 2 FALSE TRUE 1
3: a 2 M 1 3 FALSE TRUE 1
4: b 8 F 2 4 TRUE TRUE 2
5: c 8 F 3 4 TRUE FALSE 2
6: d 4 M 4 5 TRUE TRUE 3
7: e 4 M 5 5 TRUE FALSE 3
8: f 6 F 6 6 TRUE TRUE 4
9: f 0 F 6 7 FALSE TRUE 4
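The same idea can also be condensed into a single expression (my sketch, not part of the original answer), using duplicated() with data.table's by argument to flag the first appearance of each (z, x) and each (z, y) combination:
dt_example[, Index2 := cumsum(!duplicated(dt_example, by = c("z", "x")) &
                              !duplicated(dt_example, by = c("z", "y")))]
dt_example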
This approach looks for changes in (x & z) | (y & z). The extra columns are left in the data.table to show the calculations.
DF[, c("Ix", "Iy", "Iz", "dx", "dy", "min.change", "Index") :=
#Create index of values based on consecutive order
list(ix <- rleid(x), iy <- rleid(y), iz <- rleid(z),
#Determine if combinations of x+z OR y+z change
ix1 <- c(0, diff(rleid(ix+iz))),
iy1 <- c(0, diff(rleid(iy+iz))),
#Either combination is constant (no change)?
change <- pmin(ix1, iy1),
#New index based on change
cumsum(change) + 1
)]
x y z Ix Iy Iz dx dy min.change Index
1: a 1 M 1 1 1 0 0 0 1
2: a 3 M 1 2 1 0 1 0 1
3: a 2 M 1 3 1 0 1 0 1
4: b 8 F 2 4 2 1 1 1 2
5: c 8 F 3 4 2 1 0 0 2
6: d 4 M 4 5 3 1 1 1 3
7: e 4 M 5 5 3 1 0 0 3
8: f 6 F 6 6 4 1 1 1 4
9: f 0 F 6 7 4 0 1 0 4
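A closely related sketch (my addition, not from the answer above): rleid() also accepts several vectors at once, so the two change flags can be built without summing run-length ids, which avoids any chance of two simultaneous changes cancelling each other out in the sum.
DF[, Index2 := {
  cx <- c(0L, diff(rleid(x, z)) != 0L)  # did the x+z combination change?
  cy <- c(0L, diff(rleid(y, z)) != 0L)  # did the y+z combination change?
  cumsum(pmin(cx, cy)) + 1L
}]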
I have two data frames (df1, df2) and performed a full_join using the common column of interest, col1.
df1 <- data.frame(col1=c('A','D','C','C','E','E','I'),col2=c(4,7,8,3,2,4,9))
df2 <- data.frame(col1=c('A','A','B','C','C','E','E','I'),col2=c(4,1,6,8,3,2,1,9))
df1 %>% full_join(df2, by = "col1")
# col1 col2.x col2.y
# 1 A 4 4
# 2 A 4 1
# 3 D 7 NA
# 4 C 8 8
# 5 C 8 3
# 6 C 3 8
# 7 C 3 3
# 8 E 2 2
# 9 E 2 1
# 10 E 4 2
# 11 E 4 1
# 12 I 9 9
# 13 B NA 6
As expected, the full_join produces a multiplicity of rows for repeated values of the joining column, which I wish to avoid. I would like to arrive at the following output. What kind of post-processing approaches do you suggest?
# col1 col2.x col2.y
# 1 A 4 4
# 2 A NA 1
# 3 D 7 NA
# 4 C 8 8
# 5 C 3 3
# 6 E 2 2
# 7 E 4 1
# 8 I 9 9
# 9 B NA 6
More information:
Case 1: I do not want four rows in the output when the same value occurs twice in both input objects:
# 4 C 8 8
# 5 C 8 3
# 6 C 3 8
# 7 C 3 3
instead, I want only two as:
# 4 C 8 8
# 5 C 3 3
Case 2: Similarly, I want differing values paired up on the same row:
# 8 E 2 2
# 9 E 2 1
# 10 E 4 2
# 11 E 4 1
instead, I want only two rows as below:
# 8 E 2 2
# 9 E 4 1
A possible solution in two steps using the data.table package:
0) load package & convert to data.table's
library(data.table)
setDT(df1)
setDT(df2)
1) define helper function
unlistSD <- function(x) {
  l <- length(x)                        # number of (list-)columns
  ls <- sapply(x, lengths)              # length of the vector stored in each column
  m <- max(ls)                          # the longest one determines the number of rows
  newSD <- vector(mode = "list", length = l)
  for (i in 1:l) {
    u <- unlist(x[[i]])
    lu <- length(u)
    if (lu < m) {
      u <- c(u, rep(NA_real_, m - lu))  # pad shorter vectors with NA
    }
    newSD[[i]] <- u
  }
  return(setNames(as.list(newSD), names(x)))
}
2) merge and apply helper function
merge(df1[, .(col2 = list(col2)), by = col1],
df2[, .(col2 = list(col2)), by = col1],
by = "col1", all = TRUE
)[, unlistSD(.SD), by = col1]
which gives the following result:
col1 col2.x col2.y
1: A 4 4
2: A NA 1
3: C 8 8
4: C 3 3
5: D 7 NA
6: E 2 2
7: E 4 1
8: I 9 9
9: B NA 6
Another possibility with base R:
unlistDF <- function(d, groupcols) {
  # split the non-grouping (list-)columns by the grouping columns
  ds <- split(d[, setdiff(names(d), groupcols)], d[, groupcols])
  # rows needed per group = length of the longest value vector in that group
  ls <- lapply(ds, function(x) max(sapply(x, lengths)))
  dl <- lapply(ds, function(x) lapply(as.list(x), unlist))
  # pad the shorter vectors with NA
  du <- Map(function(x, y) {
    lapply(x, function(i) {
      if (length(i) < y) {
        c(i, rep(NA_real_, y - length(i)))
      } else i
    })
  }, x = dl, y = ls)
  ld <- lapply(du, as.data.frame)
  # repeat the grouping columns as needed and bind the unnested values back on
  cbind(d[rep(1:nrow(d), ls), groupcols, drop = FALSE],
        do.call(rbind.data.frame, c(ld, make.row.names = FALSE)),
        row.names = NULL)
}
Now you can use this function as follows in combination with merge:
df <- merge(aggregate(col2 ~ col1, df1, as.list),
aggregate(col2 ~ col1, df2, as.list),
by = "col1", all = TRUE)
unlistDF(df, "col1")
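Another possibility (a sketch of my own, not taken from the answers above): give the duplicates of col1 a within-group counter with dplyr::row_number() and join on both columns, so that values are paired positionally. The row order may differ from the output shown in the question.
library(dplyr)
df1_idx <- df1 %>% group_by(col1) %>% mutate(idx = row_number()) %>% ungroup()
df2_idx <- df2 %>% group_by(col1) %>% mutate(idx = row_number()) %>% ungroup()
df1_idx %>%
  full_join(df2_idx, by = c("col1", "idx")) %>%  # pair the n-th A with the n-th A, etc.
  select(-idx)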
I need some help creating a special kind of subtraction.
I have a data table x and I must subtract two columns, say a and b.
However, either column may not exist.
If a column does not exist, its value in the subtraction should be set to zero.
So far, I have approached this problem by trying to define a new subtraction operator, %-%.
Thus, for example, if x = data.table(a = 5, b = 2), then a %-% b should be 3, whereas a %-% d should be 5.
I have tried to define this subtraction operator as shown below. However, for some reason, my subtraction always yields zero! Can anyone help me understand what I am doing wrong and how I can correct my code?
library(data.table)
x = data.table(a = floor(10 * runif(5)), b = floor(10 * runif(5)), c =floor(10 * runif(5)))
`%-%` <- function(e1,e2, DT = x){
ifelse(is.numeric(substitute(e1, DT)), e1 <- substitute(e1, DT), e1 <- 0)
ifelse(is.numeric(substitute(e2, DT)), e2 <- substitute(e2, DT), e2 <- 0)
return(e1 - e2)
}
x[, d := a %-% b]
x
x[, d := a %-% d]
x
Many thanks!
We can create a function that uses intersect to pass only the existing column names into .SDcols, and then Reduce with - to subtract the corresponding rows of each column in .SD (Subset of Data.table).
f1 <- function(dat, .x, .y) intersect(names(dat), c(.x, .y))
x[, d := Reduce('-', .SD), .SDcols = f1(x, 'a', 'b')]
x[, e := Reduce(`-`, .SD), .SDcols = f1(x, 'a', 'f')]
x
# a b c d e
#1: 7 0 8 7 7
#2: 3 6 4 -3 3
#3: 9 9 8 0 9
#4: 3 6 2 -3 3
#5: 0 2 3 -2 0
Or, if we want to change the OP's function so that it takes unquoted arguments, use enquo to convert each argument to a quosure and then back to a string with quo_name. Create an intersection vector with the column names of the dataset, and use - in the Reduce.
library(dplyr)
`%-%` <- function(e1,e2, DT){
e1 <- quo_name(enquo(e1))
e2 <- quo_name(enquo(e2))
nm1 <- intersect(names(DT), c(e1, e2))
DT[, Reduce(`-`, .SD), .SDcols = nm1]
}
x[, d := `%-%`(a, b, .SD)]
x[, e := `%-%`(a, f, .SD)]
data
x <- structure(list(a = c(7L, 3L, 9L, 3L, 0L), b = c(0L, 6L, 9L, 6L,
2L), c = c(8L, 4L, 8L, 2L, 3L)), .Names = c("a", "b", "c"), row.names = c("1:",
"2:", "3:", "4:", "5:"), class = "data.frame")
setDT(x)
`%-%` <- function(a, b) {
  # grab the data.table from the call stack (assumes %-% is called inside DT[...])
  DT <- eval(sys.status()$sys.calls[[2]][[2]])
  a <- substitute(a)
  b <- substitute(b)
  stopifnot(is.name(a), is.name(b), is.data.table(DT))
  a <- deparse(a)
  b <- deparse(b)
  d <- numeric(nrow(DT))                      # a column of zeros as the fallback
  a <- if (!exists(a, DT)) d else get(a, DT)  # missing column -> zeros
  b <- if (!exists(b, DT)) d else get(b, DT)
  a - b
}
set.seed(5)
x = data.table(a = floor(10 * runif(5)), b = floor(10 * runif(5)), c =floor(10 * runif(5)))
x
a b c
1: 2 7 2
2: 6 5 4
3: 9 8 3
4: 2 9 5
5: 1 1 2
x[,a%-%b]
[1] -5 1 1 -7 0
x[,a%-%f] # f is treated as a column of zeros since it does not exist:
[1] 2 6 9 2 1
Or you can just do:
x[,c("d","e","f"):=.(a%-%b,a%-%h,g%-%h)]
x
a b c d e f
1: 2 7 2 -5 2 0
2: 6 5 4 1 6 0
3: 9 8 3 1 9 0
4: 2 9 5 -7 2 0
5: 1 1 2 0 1 0
This function is written to work on a data.table only. For example:
setDF(x)[,a%-%b]
Error: is.data.table(DT) is not TRUE
setDT(x)[,a%-%b]
[1] -5 1 1 -7 0
EDIT: This answer gives the correct value with regard to the order of the operands. (Most of the other answers do not pass this test.)
setDT(x)[,a%-%b]#Column subtract another
[1] -5 1 1 -7 0
setDT(x)[,b%-%a]#Reversing the order
[1] 5 -1 -1 7 0
setDT(x)[,b%-%b]#Column Subtract itself
[1] 0 0 0 0 0
setDT(x)[,a%-%f]#Column subtract a non-existing column
[1] 2 6 9 2 1
setDT(x)[,f%-%a]#a non-existing column subtract an existing column
[1] -2 -6 -9 -2 -1
x[,g%-%f] #subtract two non-existing columns
[1] 0 0 0 0 0
IIUC, you can try it this way. We use the exists function to check whether a column is available in the data.
# helper function
do_sub <- function(df, col1 = 'a', col2='b')
{
ans <- integer()
if (exists(col1, df) & exists(col2, df)){
ans <- append(ans, df[[col1]] - df[[col2]])
} else if (exists(col1, df)){
ans <- append(ans, df[[col1]] - 0)
} else {
ans <- append(ans, 0 - df[[col2]])
}
return (ans)
}
# compute new columns
df[, d := do_sub(.SD, col1 = 'a', col2 = 'b')]
df[, e := do_sub(.SD, col1 = 'a', col2 = 'f')]
print(df)
a b c d e
1: 7 0 8 7 7
2: 3 6 4 -3 3
3: 9 9 8 0 9
4: 3 6 2 -3 3
5: 0 2 3 -2 0
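A small variant of the helper (my sketch, not part of the answer above): treating any missing column as a vector of zeros also covers the case where both columns are absent, which do_sub() above does not handle. The names g and h below are just examples of non-existing columns.
col_or_zero <- function(dt, col) {
  if (exists(col, dt)) dt[[col]] else rep(0, nrow(dt))  # zeros if the column is missing
}
df[, d2 := col_or_zero(.SD, "a") - col_or_zero(.SD, "b")]
df[, e2 := col_or_zero(.SD, "g") - col_or_zero(.SD, "h")]  # both missing -> all zeros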
I want to be able to extract specific characters from a character vector in a data frame and return a new data frame. The information I want to extract is the auditors' remarks on a specific company's income and balance sheet. My problem is that the auditors' remarks are stored in vectors containing the different remarks. For instance:
vec = c("A C G H D E"). Since "A" %in% vec won't return TRUE, I have to use strsplit to break up each character vector in the data frame, hence "A" %in% unlist(strsplit(dat[i, 2], " ")). This returns TRUE.
Here is a MWE:
dat <- data.frame(orgnr = c(1, 2, 3, 4), rat = as.character(c("A B C")))
dat$rat <- as.character(dat$rat)
dat[2, 2] <- as.character(c("A F H L H"))
dat[3, 2] <- as.character(c("H X L O"))
dat[4, 2] <- as.character(c("X Y Z A B C"))
Now, to extract information about every single letter in the rat column, I've tried several approaches, such as Roland's answer to a similar question (How to split a character vector into data frame?):
DF <- data.frame(do.call(rbind, strsplit(dat$rat, " ", fixed = TRUE)))
DF
X1 X2 X3 X4 X5 X6
1 A B C A B C
2 A F H L H A
3 H X L O H X
4 X Y Z A B C
This returns the following warning:
Warning message:
In (function (..., deparse.level = 1) :
number of columns of result is not a multiple of vector length (arg 2)
It would be a desirable approach since it's fast, but I can't use DF since it recycles.
Is there a way to insert NA instead of the recycling because of the different length of the vectors?
So far I've found a solution to the problem by using for loops in combination with ifelse statements. However, with 3 million observations this approach takes forever!
dat$A <- 0
for(i in seq(1, nrow(dat), 1)) {
print(i)
dat[i, 3] <- ifelse("A" %in% unlist(strsplit(dat[i, 2], " ")), 1, 0)
}
dat$B <- 0
for(i in seq(1, nrow(dat), 1)) {
print(i)
dat[i, 4] <- ifelse("B" %in% unlist(strsplit(dat[i, 2], " ")), 1, 0)
}
This gives the results I want:
dat
orgnr rat A B
1 1 A B C 1 1
2 2 A F H L H 1 0
3 3 H X L O 0 0
4 4 X Y Z A B C 1 1
I've searched through most of the relevant questions I could find here on StackOverflow. This one is really close to my problem: How to convert a list consisting of vector of different lengths to a usable data frame in R?, but I don't know how to implement strsplit with that approach.
We can use a for loop with grepl to achieve this task. The + 0 converts the column from TRUE/FALSE to 1/0.
for (col in c("A", "B")){
dat[[col]] <- grepl(col, dat$rat) + 0
}
dat
# orgnr rat A B
# 1 1 A B C 1 1
# 2 2 A F H L H 1 0
# 3 3 H X L O 0 0
# 4 4 X Y Z A B C 1 1
If performance is an issue, try this data.table approach.
library(data.table)
# Convert to data.table
setDT(dat)
# Create a helper function
dummy_fun <- function(col, vec){
grepl(col, vec) + 0
}
# Apply the function to A and B
dat[, c("A", "B") := lapply(c("A", "B"), dummy_fun, vec = rat)]
dat
# orgnr rat A B
# 1: 1 A B C 1 1
# 2: 2 A F H L H 1 0
# 3: 3 H X L O 0 0
# 4: 4 X Y Z A B C 1 1
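One caveat worth noting (my addition): grepl() does substring matching, which is fine while the remark codes are single letters. If codes could ever be longer than one character, matching the split-up values exactly is a defensive alternative; a sketch:
splits <- strsplit(dat$rat, " ", fixed = TRUE)
for (code in c("A", "B")) {
  # exact membership test instead of a substring match
  set(dat, j = code, value = vapply(splits, function(v) as.integer(code %in% v), integer(1)))
}
dat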
Using base R:
a <- strsplit(dat$rat, " ")
b <- data.frame(x = rep(dat$orgnr, lengths(a)), y = unlist(a), z = 1)
cbind(dat, as.data.frame.matrix(xtabs(z ~ x + y, b)))
orgnr rat A B C F H L O X Y Z
1 1 A B C 1 1 1 0 0 0 0 0 0 0
2 2 A F H L H 1 0 0 1 2 1 0 0 0 0
3 3 H X L O 0 0 0 0 1 1 1 1 0 0
4 4 X Y Z A B C 1 1 1 0 0 0 0 1 1 1
From here you can just select the columns that you want:
d <- as.data.frame.matrix(xtabs(z ~ x + y, b))
cbind(dat, d[c("A", "B")])
orgnr rat A B
1 1 A B C 1 1
2 2 A F H L H 1 0
3 3 H X L O 0 0
4 4 X Y Z A B C 1 1
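Note (my addition): xtabs() counts occurrences, so a repeated code such as the two H's in "A F H L H" shows up as 2 rather than 1. If strict 0/1 indicators are wanted, the counts can be capped:
d <- as.data.frame.matrix(xtabs(z ~ x + y, b))
d[d > 1] <- 1  # cap counts at 1 to get pure indicators
cbind(dat, d[c("A", "B")])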
Hypothetical data:
a <- c(400,500,600,700,100,600,700,100)
b <- c(2,2,1,2,2,1,2,1)
c <- c('NA','R','NA','G','NA','R','NA','G')
data <- data.frame(a,b,c)
Output:
a b c
1 400 2 NA
2 500 2 R
3 600 1 NA
4 700 2 G
5 100 2 NA
6 600 1 R
7 700 2 NA
8 100 1 G
You can easily subset if it is in the same row:
subset(data, b== '1' & c =='R')
Output:
a b c
6 600 1 R
My question is how do I subset between rows? For example, how do I find all values of c = 'R' when b = '2' in the row above?
a b c
2 500 2 R
6 600 1 R
How do I find all values of c = 'R' when b = '2' in the row above?
How about
b2above <- which(data$b == 2) + 1L
cR <- which(data$c == "R")
id <- cR[cR %in% b2above] ## or `id <- intersect(cR, b2above)`
data[id, ]
# a b c
#2 500 2 R
#6 600 1 R
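An equivalent sketch with dplyr (my addition, assuming the same data frame): lag(b) looks at the row above, so we keep rows where c is "R" and the previous row has b equal to 2.
library(dplyr)
data %>% filter(c == "R", lag(b) == 2)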
You can try this too:
indices.b <- which(data$b == 2)
indices.c <- which(data$c == 'R')
if ((length(indices.b) > 0) && (length(indices.c) > 0)) { # if such rows exist
indices <- which((indices.c - 1) %in% indices.b) # check if consecutive rows
if(length(indices)>0) data[indices.c[indices],] # if consecutive rows exist
}
# a b c
# 2 500 2 R
# 6 600 1 R
I have a data set:
ID <- c(1,1,2,2,2,2,3,3,3,3,3,4,4,4)
Eval <- c("A","A","B","B","A","A","A","A","B","B","A","A","A","B")
med <- c("c","d","k","k","h","h","c","d","h","h","h","c","h","k")
df <- data.frame(ID,Eval,med)
> df
ID Eval med
1 1 A c
2 1 A d
3 2 B k
4 2 B k
5 2 A h
6 2 A h
7 3 A c
8 3 A d
9 3 B h
10 3 B h
11 3 A h
12 4 A c
13 4 A h
14 4 B k
I am trying to create variables x and y, grouped by ID and Eval. For each ID, if Eval = A and med = "h" or "k", I set x = 1, otherwise x = 0; if Eval = B and med = "h" or "k", I set y = 1, otherwise y = 0. I used the approach below, which works, but it doesn't seem that great:
df <- data.table(df)
setDT(df)[, count := uniqueN(med) , by = .(ID,Eval)]
setDT(df)[Eval == "A", x:= ifelse(count == 1 & med %in% c("k","h"),1,0), by=ID]
setDT(df)[Eval == "B", y:= ifelse(count == 1 & med %in% c("k","h"),1,0), by=ID]
ID Eval med count x y
1: 1 A c 2 0 NA
2: 1 A d 2 0 NA
3: 2 B k 1 NA 1
4: 2 B k 1 NA 1
5: 2 A h 1 1 NA
6: 2 A h 1 1 NA
7: 3 A c 3 0 NA
8: 3 A d 3 0 NA
9: 3 B h 1 NA 1
10: 3 B h 1 NA 1
11: 3 A h 3 0 NA
12: 4 A c 2 0 NA
13: 4 A h 2 0 NA
14: 4 B k 1 NA 1
Then I need to collapse the rows to get one row per unique ID, but I don't know how to collapse rows. Any ideas?
The desired output:
ID x y
1 0 0
2 1 1
3 0 1
4 0 1
We create the 'x' and 'y' variables grouped by 'ID', avoiding the NA elements by directly coercing the logical vector to binary (as.integer)
df[, x := as.integer(Eval == "A" & count ==1 & med %in% c("h", "k")) , by = ID]
and similarly for 'y'
df[, y := as.integer(Eval == "B" & count ==1 & med %in% c("h", "k")) , by = ID]
and summarise it, using any after grouping by "ID"
df[, lapply(.SD, function(x) as.integer(any(x))) , ID, .SDcols = x:y]
# ID x y
#1: 1 0 0
#2: 2 1 1
#3: 3 0 1
#4: 4 0 1
If we need a compact approach, instead of assigning (:=), we summarise the output grouped by 'ID' and 'Eval' based on the conditions, and then, grouped by 'ID', we check whether there are any TRUE values in 'x' and 'y' by looping over the columns specified in .SDcols.
setDT(df)[, if(any(uniqueN(med)==1 & med %in% c("h", "k"))) {
.(x= Eval=="A", y= Eval == "B") } else .(x=FALSE, y=FALSE),
by = .(ID, Eval)][, lapply(.SD, any) , by = ID, .SDcols = x:y]
# ID x y
#1: 1 FALSE FALSE
#2: 2 TRUE TRUE
#3: 3 FALSE TRUE
#4: 4 FALSE TRUE
If needed, we can convert to binary similarly to the approach shown in the first solution.
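For example, assuming the chained result above has been saved in a variable (res is just my name for it), one extra line converts the logical columns (a sketch):
# res <- setDT(df)[, ...][, lapply(.SD, any), by = ID, .SDcols = x:y]
res[, c("x", "y") := lapply(.SD, as.integer), .SDcols = c("x", "y")]  # TRUE/FALSE -> 1/0
res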
The OP's goal...
"I try to create variable x and y, group by ID and Eval. For each ID, if Eval = A, and med = "h" or "k", I set x = 1, other wise x = 0, if Eval = B and med = "h" or "k", I set y = 1, other wise y = 0. [...] Then I need to collapse the row to get unique ID"
can be simplified to...
For each ID and Eval, flag if all med values are h or all med values are k.
setDT(df) # only do this once
df[, all(med=="k") | all(med=="h"), by=.(ID,Eval)][, dcast(.SD, ID ~ Eval, fun=any)]
ID A B
1: 1 FALSE FALSE
2: 2 TRUE TRUE
3: 3 FALSE TRUE
4: 4 FALSE TRUE
To see what dcast is doing, read ?dcast and try running just the first part on its own, df[, all(med=="k") | all(med=="h"), by=.(ID,Eval)].
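For reference, running just that first part should produce the per-(ID, Eval) flags in an unnamed V1 column, roughly like this (my sketch of the intermediate output):
df[, all(med == "k") | all(med == "h"), by = .(ID, Eval)]
#    ID Eval    V1
# 1:  1    A FALSE
# 2:  2    B  TRUE
# 3:  2    A  TRUE
# 4:  3    A FALSE
# 5:  3    B  TRUE
# 6:  4    A FALSE
# 7:  4    B  TRUE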
The change to use x and y instead of A and B is straightforward but ill-advised (since unnecessary renaming can be confusing and lead to extra work when there are new Eval values); and ditto the change for 1/0 instead of TRUE/FALSE (since the values captured are actually boolean).
Here is my dplyr solution since I find it more readable than data.table.
library(dplyr)
df %>%
group_by(ID, Eval) %>%
mutate(
count = length(unique(med)),
x = ifelse(Eval == "A" &
count == 1 & med %in% c("h", "k"), 1, 0),
y = ifelse(Eval == "B" &
count == 1 & med %in% c("h", "k"), 1, 0)
) %>%
group_by(ID) %>%
summarise(x1 = max(unique(x)),
y1 = max(unique(y)))
A one-liner solution for collapsing the rows of your result:
df[, lapply(.SD, function(i) ifelse(1 %in% i, ifelse(!0 %in% i, 1, 0), 0)), .SDcols = x:y, by = ID]
ID x y
1: 1 0 0
2: 2 1 1
3: 3 0 1
4: 4 0 1