Creating a new variable by detecting max value for each id - r

My data set contains three variables:
id <- c(1,1,1,1,1,1,2,2,2,2,5,5,5,5,5,5)
ind <- c(0,0,0,0,0,0,1,1,1,1,1,1,1,1,1,1)
price <- c(1,2,3,4,5,6,1,2,3,4,1,2,3,4,5,6)
mdata <- data.frame(id,ind,price)
I need to create a new variable (ind2) as follows: if ind = 0, then ind2 = 0.
If ind = 1, then ind2 = 0 as well, unless the price is the maximum for that id, in which case ind2 = 1.
The new data looks like:
id ind ind2 price
1 0 0 1
1 0 0 2
1 0 0 3
1 0 0 4
1 0 0 5
1 0 0 6
2 1 0 1
2 1 0 2
2 1 0 3
2 1 1 4
5 1 0 1
5 1 0 2
5 1 0 3
5 1 0 4
5 1 0 5
5 1 1 6

library(dplyr)
mdata %>%
group_by(id) %>%
mutate(ind2 = +(ind == 1L & price == max(price)))
# id ind price ind2
# 1 1 0 1 0
# 2 1 0 2 0
# 3 1 0 3 0
# 4 1 0 4 0
# 5 1 0 5 0
# 6 1 0 6 0
# 7 2 1 1 0
# 8 2 1 2 0
# 9 2 1 3 0
# 10 2 1 4 1
# 11 5 1 1 0
# 12 5 1 2 0
# 13 5 1 3 0
# 14 5 1 4 0
# 15 5 1 5 0
# 16 5 1 6 1
Or, if you prefer data.table:
library(data.table)
setDT(mdata)[, ind2 := +(ind == 1L & price == max(price)), by = id]
Or with base R
mdata$ind2 <- unlist(lapply(split(mdata,mdata$id),
function(x) +(x$ind == 1L & x$price == max(x$price))))
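One caveat for the base R version above: unlist(lapply(split(...))) returns values in group order, so it lines up with mdata only when the rows are already grouped by sorted id, as they are in this example. A minimal sketch of an order-preserving alternative with ave(), assuming the same mdata as in the question (grp_max is just an illustrative helper name):

```r
# Same sample data as in the question
id <- c(1,1,1,1,1,1,2,2,2,2,5,5,5,5,5,5)
ind <- c(0,0,0,0,0,0,1,1,1,1,1,1,1,1,1,1)
price <- c(1,2,3,4,5,6,1,2,3,4,1,2,3,4,5,6)
mdata <- data.frame(id, ind, price)

# ave() returns the per-id maximum aligned with the original row order
grp_max <- ave(mdata$price, mdata$id, FUN = max)

# 1 only where ind is 1 and the price equals the maximum for that id
mdata$ind2 <- as.integer(mdata$ind == 1 & mdata$price == grp_max)
```

Because ave() keeps the result in the original row positions, this should also work when the ids are interleaved.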

Related

Recoding by an order in r

I have a data recoding puzzle. Here is what my sample data looks like:
df <- data.frame(
id = c(1,1,1,1,1,1,1, 2,2,2,2,2,2, 3,3,3,3,3,3,3),
scores = c(0,1,1,0,0,-1,-1, 0,0,1,-1,-1,-1, 0,1,0,1,1,0,1),
position = c(1,2,3,4,5,6,7, 1,2,3,4,5,6, 1,2,3,4,5,6,7),
cat = c(1,1,1,1,1,0,0, 1,1,1,0,0,0, 1,1,1,1,1,1,1))
id scores position cat
1 1 0 1 1
2 1 1 2 1
3 1 1 3 1
4 1 0 4 1
5 1 0 5 1
6 1 -1 6 0
7 1 -1 7 0
8 2 0 1 1
9 2 0 2 1
10 2 1 3 1
11 2 -1 4 0
12 2 -1 5 0
13 2 -1 6 0
14 3 0 1 1
15 3 1 2 1
16 3 0 3 1
17 3 1 4 1
18 3 1 5 1
19 3 0 6 1
20 3 1 7 1
There are three ids in the dataset, and the rows are ordered by the position variable. For each id, in the first row where scores is -1, scores needs to be set to 0 and cat needs to be set to 1. For example, for id=1 that row is at the 6th position, so in that row scores should be 0 and cat should be 1. Ids that have no scores=-1 should stay as they are.
The desired output should look like below:
id scores position cat
1 1 0 1 1
2 1 1 2 1
3 1 1 3 1
4 1 0 4 1
5 1 0 5 1
6 1 0 6 1
7 1 -1 7 0
8 2 0 1 1
9 2 0 2 1
10 2 1 3 1
11 2 0 4 1
12 2 -1 5 0
13 2 -1 6 0
14 3 0 1 1
15 3 1 2 1
16 3 0 3 1
17 3 1 4 1
18 3 1 5 1
19 3 0 6 1
20 3 1 7 1
Any recommendations??
Thanks
This may be what you are after:
library(dplyr)
df %>%
group_by(id) %>%
mutate(i = which(scores == -1)[1]) %>% # index of the first row with scores == -1 (NA if there is none)
mutate(scores = case_when(position == i & scores != 0 ~ 0, TRUE ~ scores), # update the score using position & i
cat = ifelse(scores == -1, 0, 1)) %>% # then update cat
select(-i) # remove the helper column i
After trying a few things and getting ideas from @Ricky and @e.matt, I came up with a solution.
df %>%
filter(scores == -1) %>% # keep rows where scores == -1
distinct(id, .keep_all = TRUE) %>% # keep the first such row per id
mutate(first = 1) %>% # create first column
right_join(df, by=c("id","scores","position","cat")) %>% # join back original dataset
mutate(first = coalesce(first, 0)) %>% # replace NAs with 0
mutate(scores = case_when(
first == 1 ~ 0,
TRUE~scores)) %>%
mutate(cat = case_when(
first == 1 ~ 1,
TRUE~cat))
This provides my desired output.
id scores position cat first
1 1 0 1 1 0
2 1 1 2 1 0
3 1 1 3 1 0
4 1 0 4 1 0
5 1 0 5 1 0
6 1 0 6 1 1
7 1 -1 7 0 0
8 2 0 1 1 0
9 2 0 2 1 0
10 2 1 3 1 0
11 2 0 4 1 1
12 2 -1 5 0 0
13 2 -1 6 0 0
14 3 0 1 1 0
15 3 1 2 1 0
16 3 0 3 1 0
17 3 1 4 1 0
18 3 1 5 1 0
19 3 0 6 1 0
20 3 1 7 1 0
Here is a data.table one-liner:
library( data.table )
setDT(df)
df[ df[, .(cumsum( scores == -1 ) == 1), by = .(id)]$V1, `:=`( scores = 0, cat = 1) ]
# id scores position cat
# 1: 1 0 1 1
# 2: 1 1 2 1
# 3: 1 1 3 1
# 4: 1 0 4 1
# 5: 1 0 5 1
# 6: 1 0 6 1
# 7: 1 -1 7 0
# 8: 2 0 1 1
# 9: 2 0 2 1
# 10: 2 1 3 1
# 11: 2 0 4 1
# 12: 2 -1 5 0
# 13: 2 -1 6 0
# 14: 3 0 1 1
# 15: 3 1 2 1
# 16: 3 0 3 1
# 17: 3 1 4 1
# 18: 3 1 5 1
# 19: 3 0 6 1
# 20: 3 1 7 1
You could do something along these lines using the dplyr package:
library(dplyr)
df = mutate(df, cat = ifelse(scores == -1, 1, cat),
scores = ifelse(scores == -1, 0, scores))
Using the mutate() function, I am re-assigning the values for the scores and cat fields according to ifelse() conditional statements. For scores, if the score is -1, the value is replaced by 0, otherwise it keeps the score as is. For cat, it also checks if scores is equal to -1, but would assign a value of 1 when the condition is met, or the already existing value of cat when the condition is not met.
EDIT
After our discussion in the comments, I think something along these lines should be helpful (you may have to modify the logic since I don't exactly follow what the desired output is here):
for(i in seq_len(nrow(df) - 1)){ # stop one row early so that i+1 stays within the data
# Check if score is -1
if(df[i, 'scores'] == -1){
# Update values for the next row
df[i+1, 'scores'] <- 0
df[i+1, 'cat'] <- 1
}
}
Sorry that I don't really follow the desired output, hopefully this is helpful in getting you to your answer!
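For reference, the rule stated in the question (only the first scores == -1 row of each id is changed) can also be sketched in base R. This is a minimal sketch on the question's sample data; is_neg, neg_count and first_neg are illustrative names, not part of any answer above:

```r
# Sample data from the question
df <- data.frame(
  id = c(1,1,1,1,1,1,1, 2,2,2,2,2,2, 3,3,3,3,3,3,3),
  scores = c(0,1,1,0,0,-1,-1, 0,0,1,-1,-1,-1, 0,1,0,1,1,0,1),
  position = c(1,2,3,4,5,6,7, 1,2,3,4,5,6, 1,2,3,4,5,6,7),
  cat = c(1,1,1,1,1,0,0, 1,1,1,0,0,0, 1,1,1,1,1,1,1))

is_neg <- df$scores == -1

# Running count of -1 rows within each id, aligned with the original rows
neg_count <- ave(as.integer(is_neg), df$id, FUN = cumsum)

# The first -1 row of an id is a -1 row whose running count is exactly 1
first_neg <- is_neg & neg_count == 1

df$scores[first_neg] <- 0
df$cat[first_neg] <- 1
```

Ids without any -1 (id 3 here) never satisfy the condition, so their rows are left untouched.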

Generate a dummy variable satisfying a condition for the same individual in a panel dataframe

I have a dataframe of this form
ID panelid dummy1 dummy2
1 1 0 1
1 2 1 0
2 1 1 0
2 2 0 1
3 1 1 0
3 2 1 0
4 1 0 1
4 2 0 1
I want to generate a dummy variable that equals one in the panelid==2 row, but only if the same individual has dummy1 equal to 1 in panelid==1 and dummy2 equal to 1 in panelid==2. Thus I want to obtain something like this:
ID panelid dummy1 dummy2 result
1 1 0 1 0
1 2 1 0 0
2 1 1 0 0
2 2 0 1 1
3 1 1 0 0
3 2 1 0 0
4 1 0 1 0
4 2 0 1 0
Can someone help me with these?
Many thanks to everyone
This is an almost identical solution to @Cole's solution.
dataset <- read.table(text = 'ID panelid dummy1 dummy2
1 1 0 1
1 2 1 0
2 1 1 0
2 2 0 1
3 1 1 0
3 2 1 0
4 1 0 1
4 2 0 1',
header = TRUE)
temp_ID <- dataset$ID[(dataset$panelid == 1) & (dataset$dummy1 == 1)]
dataset$result <- as.integer(x = ((dataset$panelid == 2) & (dataset$dummy2 == 1) & (dataset$ID %in% temp_ID)))
dataset
ID panelid dummy1 dummy2 result
1 1 1 0 1 0
2 1 2 1 0 0
3 2 1 1 0 0
4 2 2 0 1 1
5 3 1 1 0 0
6 3 2 1 0 0
7 4 1 0 1 0
8 4 2 0 1 0
Here's a base R approach:
dummy1_in_panelid <- with(df, ID[panelid == 1 & dummy1 == 1])
#initialize
df$result <- 0
df$result[with(df, which(panelid == 2 & ID %in% dummy1_in_panelid & dummy2 == 1))] <- 1
df
ID panelid dummy1 dummy2 result
1 1 1 0 1 0
2 1 2 1 0 0
3 2 1 1 0 0
4 2 2 0 1 1
5 3 1 1 0 0
6 3 2 1 0 0
7 4 1 0 1 0
8 4 2 0 1 0
And the data...
df <- as.data.frame(data.table::fread('
ID panelid dummy1 dummy2
1 1 0 1
1 2 1 0
2 1 1 0
2 2 0 1
3 1 1 0
3 2 1 0
4 1 0 1
4 2 0 1'))
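The same check can also be written as a per-ID flag instead of a lookup vector of IDs. A minimal sketch in base R, assuming the data frame from the question (flag_p1 and id_has_p1 are illustrative names):

```r
df <- data.frame(
  ID      = c(1, 1, 2, 2, 3, 3, 4, 4),
  panelid = c(1, 2, 1, 2, 1, 2, 1, 2),
  dummy1  = c(0, 1, 1, 0, 1, 0 + 1, 0, 0),
  dummy2  = c(1, 0, 0, 1, 0, 0, 1, 1))

# 1 in the rows where panelid == 1 and dummy1 == 1
flag_p1 <- as.integer(df$panelid == 1 & df$dummy1 == 1)

# Spread that flag to every row of the same ID (max over the group)
id_has_p1 <- ave(flag_p1, df$ID, FUN = max) == 1

# result is 1 only in panelid == 2 rows with dummy2 == 1 for flagged IDs
df$result <- as.integer(df$panelid == 2 & df$dummy2 == 1 & id_has_p1)
```

This gives the same result as the %in% versions above on this data; the grouped form may be easier to extend if the condition later depends on more than one panel.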

Row-wise operation by group over time R

Problem:
I am trying to create a variable x2 that equals 1 in the rows within each ID group where, over time, x1 switches from 1 to 0.
Additionally, x2 is set to 1 for every consecutive 0 in the run that follows the switch.
I tried to do this using library(dplyr), but could not figure out how to look at previous records within the group.
Input Data:
ID<-c("1","1","1","1","1","2","2","2","2","3","3","3","4","4","5","5","5")
time<-c("1","2","3","4","5","1","2","3","4","1","2","3","1","2","1","2","3")
x1<-c("0","1","1","1","1","0","0","0","0","1","0","0","1","1","1","0","1")
df<-data.frame(ID,time,x1)
Required Output:
ID time x1 x2
1 1 0 0
1 2 1 0
1 3 1 0
1 4 1 0
1 5 1 0
2 1 0 0
2 2 0 0
2 3 0 0
2 4 0 0
3 1 1 0
3 2 0 1
3 3 0 1
4 1 1 0
4 2 1 0
5 1 1 0
5 2 0 1
5 3 1 0
It is better to have 'x1' as a numeric column:
library(data.table)
setDT(df)[, x2 := (cumsum(x1) < 2)*cumsum(c(FALSE, diff(x1) < 0)), ID]
df
# ID time x1 x2
# 1: 1 1 0 0
# 2: 1 2 1 0
# 3: 1 3 1 0
# 4: 1 4 1 0
# 5: 1 5 1 0
# 6: 2 1 0 0
# 7: 2 2 0 0
# 8: 2 3 0 0
# 9: 2 4 0 0
#10: 3 1 1 0
#11: 3 2 0 1
#12: 3 3 0 1
#13: 4 1 1 0
#14: 4 2 1 0
#15: 5 1 1 0
#16: 5 2 0 1
#17: 5 3 1 0
data
ID<-c("1","1","1","1","1","2","2","2","2","3","3","3","4","4","5","5","5")
time<-c("1","2","3","4","5","1","2","3","4","1","2","3","1","2","1","2","3")
x1<- as.integer(c("0","1","1","1","1","0","0","0","0","1","0","0","1","1","1","0","1"))
df<-data.frame(ID,time,x1)
If you want a dplyr answer, you can use @akrun's code in mutate after grouping by ID
library(dplyr)
ID<-c("1","1","1","1","1","2","2","2","2","3","3","3","4","4","5","5","5")
time<-c("1","2","3","4","5","1","2","3","4","1","2","3","1","2","1","2","3")
x1<- as.integer(c("0","1","1","1","1","0","0","0","0","1","0","0","1","1","1","0","1"))
df<-data.frame(ID,time,x1)
df <- df %>%
group_by(ID) %>%
mutate(x2 = (cumsum(x1) < 2)*cumsum(c(FALSE, diff(x1) < 0)))
df
# ID time x1 x2
# 1 1 0 0
# 1 2 1 0
# 1 3 1 0
# 1 4 1 0
# 1 5 1 0
# 2 1 0 0
# 2 2 0 0
# 2 3 0 0
# 2 4 0 0
# 3 1 1 0
# 3 2 0 1
# 3 3 0 1
# 4 1 1 0
# 4 2 1 0
# 5 1 1 0
# 5 2 0 1
# 5 3 1 0
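The formula above reproduces the required output for the sample data, but the (cumsum(x1) < 2) factor appears to zero out any group that has more than one 1 before the switch: for x1 = c(1, 1, 0) it returns 0, 0, 0. Under one reading of the question (x2 is 1 for every 0 that comes after at least one 1 within the same ID), a hedged alternative is to combine x1 == 0 with a running maximum. A minimal sketch in base R; seen_one and x2_rule are illustrative names:

```r
ID <- c("1","1","1","1","1","2","2","2","2","3","3","3","4","4","5","5","5")
time <- c("1","2","3","4","5","1","2","3","4","1","2","3","1","2","1","2","3")
x1 <- c(0,1,1,1,1, 0,0,0,0, 1,0,0, 1,1, 1,0,1)
df <- data.frame(ID, time, x1)

# Rule for a single group: a 0 counts only once a 1 has been seen earlier
x2_rule <- function(x) as.integer(x == 0 & cummax(x) == 1)

# cummax() per ID tells us whether a 1 has occurred so far in the group
seen_one <- ave(df$x1, df$ID, FUN = cummax)
df$x2 <- as.integer(df$x1 == 0 & seen_one == 1)
```

On the sample data this matches the required output, and x2_rule(c(1, 1, 0)) gives 0, 0, 1. Whether that is the intended behaviour for such groups is an assumption, since the question's data does not include one.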

Convert continuous dataframe into binary dataframe in R

I have the following data frame:
i39<-c(5,3,5,4,4,3)
i38<-c(5,3,5,3,4,1)
i37<-c(5,3,5,3,4,3)
i36<-c(5,4,5,5,4,2)
ndat1<-as.data.frame(cbind(i39,i38,i37,i36))
> ndat1
i39 i38 i37 i36
1 5 5 5 5
2 3 3 3 4
3 5 5 5 5
4 4 3 3 5
5 4 4 4 4
6 3 1 3 2
My goal is to convert any value that is a 4 or a 5 into a 1, and anything else into a 0 to yield the following:
> ndat1
i39 i38 i37 i36
1 1 1 1 1
2 0 0 0 1
3 1 1 1 1
4 1 0 0 1
5 1 1 1 1
6 0 0 0 0
With your data set I would just do
ndat1[] <- +(ndat1 >= 4)
# i39 i38 i37 i36
# 1 1 1 1 1
# 2 0 0 0 1
# 3 1 1 1 1
# 4 1 0 0 1
# 5 1 1 1 1
# 6 0 0 0 0
Though a more general solution will be
ndat1[] <- +(ndat1 == 4 | ndat1 == 5)
# i39 i38 i37 i36
# 1 1 1 1 1
# 2 0 0 0 1
# 3 1 1 1 1
# 4 1 0 0 1
# 5 1 1 1 1
# 6 0 0 0 0
Some data.table alternative
library(data.table)
setDT(ndat1)[, names(ndat1) := lapply(.SD, function(x) +(x %in% 4:5))]
And I'll let the dplyr guys have fun with mutate_each
I used the following to solve this issue:
recode<-function(ndat1){
ifelse((as.data.frame(ndat1)==4|as.data.frame(ndat1)==5),1,0)
}
sum_dc1<-as.data.frame(sapply(as.data.frame(ndat1),recode),drop=FALSE)
> sum_dc1
i39 i38 i37 i36
1 1 1 1 1
2 0 0 0 1
3 1 1 1 1
4 1 0 0 1
5 1 1 1 1
6 0 0 0 0
I was just wondering if anyone else had any thoughts, but overall I am satisfied with this way of solving the issue. Thank you.
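A shorter base R variant of the same recoding, shown here as a sketch rather than a replacement for the answers above, applies %in% column by column. The set of values to keep (keep) is an illustrative name:

```r
i39 <- c(5,3,5,4,4,3)
i38 <- c(5,3,5,3,4,1)
i37 <- c(5,3,5,3,4,3)
i36 <- c(5,4,5,5,4,2)
ndat1 <- as.data.frame(cbind(i39, i38, i37, i36))

# Values that should become 1; everything else becomes 0
keep <- c(4, 5)

# ndat1[] <- ... keeps the data frame structure and column names
ndat1[] <- lapply(ndat1, function(x) as.integer(x %in% keep))
```

One design difference worth knowing: x %in% keep returns FALSE for NA, so missing values become 0 here, whereas the == comparisons would leave them as NA.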

create a new data frame with existing ones

Suppose I have the following data frames
treatment1<-data.frame(id=c(1,2,7))
treatment2<-data.frame(id=c(3,7,10))
control<-data.frame(id=c(4,5,8,9))
I want to create a new data frame that is the union of those 3 and have an indicator column that takes the value 1 for each one.
experiment<-data.frame(id=c(1:10),treatment1=0, treatment2=0, control=0)
where experiment$treatment1[1]=1 etc etc
What is the best way of doing this in R?
Thanks!
Updated as per @Flodel:
kk<-rbind(treatment1,treatment2,control)
var1<-c("treatment1","treatment2","control")
kk$df<-rep(var1,c(dim(treatment1)[1],dim(treatment2)[1],dim(control)[1]))
kk
id df
1 1 treatment1
2 2 treatment1
3 7 treatment1
4 3 treatment2
5 7 treatment2
6 10 treatment2
7 4 control
8 5 control
9 8 control
10 9 control
If you want it in the form of 1s and 0s, you can use table:
ll<-table(kk)
ll
df
id control treatment1 treatment2
1 0 1 0
2 0 1 0
3 0 0 1
4 1 0 0
5 1 0 0
7 0 1 1
8 1 0 0
9 1 0 0
10 0 0 1
If you want it as a data.frame, then you can use reshape:
kk2<-reshape(data.frame(ll),timevar = "df",idvar = "id",direction = "wide")
names(kk2)[-1]<-sort(var1)
> kk2
id control treatment1 treatment2
1 1 0 1 0
2 2 0 1 0
3 3 0 0 1
4 4 1 0 0
5 5 1 0 0
6 7 0 1 1
7 8 1 0 0
8 9 1 0 0
9 10 0 0 1
df.bind <- function(...) {
df.names <- all.names(substitute(list(...)))[-1L]
ids.list <- setNames(lapply(list(...), `[[`, "id"), df.names)
num.ids <- max(unlist(ids.list))
tabs <- lapply(ids.list, tabulate, num.ids)
data.frame(id = seq(num.ids), tabs)
}
df.bind(treatment1, treatment2, control)
# id treatment1 treatment2 control
# 1 1 1 0 0
# 2 2 1 0 0
# 3 3 0 1 0
# 4 4 0 0 1
# 5 5 0 0 1
# 6 6 0 0 0
# 7 7 1 1 0
# 8 8 0 0 1
# 9 9 0 0 1
# 10 10 0 1 0
(Notice how it does include a row for id == 6.)
Taking
treatment1<-data.frame(id=c(1,2,7))
treatment2<-data.frame(id=c(3,7,10))
control<-data.frame(id=c(4,5,8,9))
You can use this:
x <- c("treatment1", "treatment2", "control")
f <- function(s) within(get(s), assign(s, 1))
r <- Reduce(function(x,y) merge(x,y,all=TRUE), lapply(x, f))
r[is.na(r)] <- 0
Result:
> r
id treatment1 treatment2 control
1 1 1 0 0
2 2 1 0 0
3 3 0 1 0
4 4 0 0 1
5 5 0 0 1
6 7 1 1 0
7 8 0 0 1
8 9 0 0 1
9 10 0 1 0
This illustrates what I was imagining to be the rbind strategy:
alldf <- rbind(treatment1,treatment2,control)
alldf$grps <- model.matrix( ~ factor( c( rep(1,nrow(treatment1)),
rep(2,nrow(treatment2)),
rep(3,nrow(control) ) ))-1)
dimnames( alldf[[2]])[2]<- list(c("trt1","trt2","ctrl"))
alldf
#-------------------
id grps.trt1 grps.trt2 grps.ctrl
1 1 1 0 0
2 2 1 0 0
3 7 1 0 0
4 3 0 1 0
5 7 0 1 0
6 10 0 1 0
7 4 0 0 1
8 5 0 0 1
9 8 0 0 1
10 9 0 0 1
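If one row per id with 0/1 indicator columns is the goal, another option is to test membership directly with %in%. This is a minimal sketch assuming the three data frames from the question; groups and ids are illustrative names:

```r
treatment1 <- data.frame(id = c(1, 2, 7))
treatment2 <- data.frame(id = c(3, 7, 10))
control <- data.frame(id = c(4, 5, 8, 9))

# Named list of id vectors; the names become the indicator column names
groups <- list(treatment1 = treatment1$id,
               treatment2 = treatment2$id,
               control = control$id)

# Union of all ids that appear in at least one group
ids <- sort(unique(unlist(groups)))

# One 0/1 column per group: is each id a member of that group?
experiment <- data.frame(id = ids,
                         lapply(groups, function(g) as.integer(ids %in% g)))
```

Unlike the tabulate() version, this only includes ids that occur in some group, so there is no row for id 6. Replacing ids with 1:10 would give the full range sketched in the question.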
