I have a dataset with an ID column and multiple visits for every ID. I am trying to create a new variable, Status, based on the Visit and Value columns. The conditions are as follows:
For visits 1, 2 and 3, if the values are 1,1,1 then Status is 1
For visits 1, 2 and 3, if the values are 0,1,1 then Status is 0
For visits 1, 2 and 3, if the values are 0,0,0 then Status is 0
How do I specify this condition in R?
Below is a sample dataset:
ID Visit Value
 1     1     1
 1     2     1
 1     3     1
 2     1     1
 2     2     0
 2     3     0
 3     1     0
 3     2     0
 3     3     0
 4     1     0
 4     2     1
 4     3     1
Result dataset:
ID Visit Value Status
 1     1     1      1
 1     2     1      1
 1     3     1      1
 2     1     1      0
 2     2     0      0
 2     3     0      0
 3     1     0      0
 3     2     0      0
 3     3     0      0
 4     1     0      0
 4     2     1      0
 4     3     1      0
I'd have tried something like this (suppose your initial table is called df):
status = c()
for(i in 1:4){ # 1:4 are the IDs in your sample data
  # all three values are 1, so this ID gets status 1
  if(sum(df[df$ID == i,'Value'])==3) status=c(status,rep(1,3))
  # otherwise this ID gets status 0
  if(sum(df[df$ID == i,'Value'])!=3) status=c(status,rep(0,3))
}
df = cbind(df,Status=status)
I hope this helps.
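The same idea can probably be written without the loop and without hard-coding the IDs. The following is only a sketch in base R; it assumes that Status should be 1 exactly when every Value for an ID is 1 (the question does not list every possible pattern), and it rebuilds the sample data as df for illustration.

```r
# Sample data from the question (rebuilt here so the sketch is self-contained)
df <- data.frame(
  ID    = rep(1:4, each = 3),
  Visit = rep(1:3, times = 4),
  Value = c(1, 1, 1,  1, 0, 0,  0, 0, 0,  0, 1, 1)
)

# ave() applies the function within each ID and recycles the single
# result over all rows of that ID.
# Assumption: Status is 1 only when all values of the ID equal 1.
df$Status <- ave(df$Value, df$ID, FUN = function(v) as.numeric(all(v == 1)))
df
```

Because the grouping comes from df$ID itself, this should also cope with IDs that are not numbered 1 to 4 or that have a different number of visits.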
I believe that case_when from the dplyr package is what you need. Here are more details on that function: https://dplyr.tidyverse.org/reference/case_when.html
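As a rough illustration of how case_when could be combined with grouping, here is a minimal sketch. It assumes the sample data is stored in a data frame called df and that Status is 1 only when all values for an ID are 1; any other pattern falls through to 0.

```r
library(dplyr)

# Sample data from the question (rebuilt here so the sketch is self-contained)
df <- data.frame(
  ID    = rep(1:4, each = 3),
  Visit = rep(1:3, times = 4),
  Value = c(1, 1, 1,  1, 0, 0,  0, 0, 0,  0, 1, 1)
)

result <- df %>%
  group_by(ID) %>%                 # evaluate the conditions per ID
  mutate(Status = case_when(
    all(Value == 1) ~ 1,           # every visit has Value 1
    TRUE            ~ 0            # assumption: every other pattern gives 0
  )) %>%
  ungroup()
result
```

With only two outcomes a plain as.numeric(all(Value == 1)) would do the same job; case_when becomes more useful if further patterns need their own Status codes.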
I have "Y maze" sequence data containing the characters A, B and C. I am trying to quantify the number of times those three values are found together. The data looks like this:
Animal=c(1,2,3,4,5)
VisitedZones=c(1,2,3,4,5)
data=data.frame(Animal, VisitedZones)
data[1,2]=("A,C,B,A,C,A,B,A,C,A,C,A,C,B,B,C,A,C,C,C")
data[2,2]=("A,C,B,A,C,A,B,A,C,A,C,A,C,B")
data[3,2]=("A,C,B,A,C,A,B,A,C,A")
data[4,2]=("A,C,B,A,C,A,A,A,B,A,C,A,C,A,C,B")
data[5,2]=("A,C,B,A,C,A,A,A,B,")
The tricky part is that I also have to consider the reading frame so that I can find all instances of ABC combinations. There are three reading frames. For example:
Here is the working example I have so far.
Split <- strsplit(data$VisitedZones, ",", fixed = TRUE)
## How long is each list element?
Ncol <- vapply(Split, length, 1L)
## Create an empty character matrix to store the results
M <- matrix(NA_character_, nrow = nrow(data), ncol = max(Ncol),
            dimnames = list(NULL, paste0("V", sequence(max(Ncol)))))
## Use matrix indexing to figure out where to put the results
M[cbind(rep(1:nrow(data), Ncol), sequence(Ncol))] <- unlist(Split,
                                                            use.names = FALSE)
# Bind the values back together, here as a "data.table" (faster)
v2=data.table(Animal = data$Animal, M)
# I get error here
df=mutate(as.data.frame(v2),trio=paste0(v2,lead(v2),lead(v2,2)))
table(df$trio[1:(length(v2)-2)])
It would be great if I could get something like this:
Animal VisitedZones    ABC ACB BCA BAC CAB CBA
1      A,B,C,A,B.C...    2   0   1   0   1   0
2      A,B,C,C...        1   0   0   0   0   0
3      A,C,B,A...        0   1   0   0   0   1
library(dplyr) # provides mutate() and lead()
df<-mutate(as.data.frame(v2),trio=paste0(v2,lead(v2),lead(v2,2)))
table(df$trio[1:(length(v2)-2)])
Using dplyr, I generate for every letter in your vector the three-letter combination that starts from it, then create a table of frequencies of all found combinations (minus the last two, which are incomplete).
Result:
AAB ABC BCA CAA CAB
  1   6   5   1   4
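To see why a window that slides one letter at a time covers all three reading frames, here is a small base R sketch. The sequence is animal 3 from the question; the names z, starts and windows are only illustrative.

```r
# One sequence from the question (animal 3), split into single letters
z <- strsplit("A,C,B,A,C,A,B,A,C,A", ",", fixed = TRUE)[[1]]
n <- length(z)

# Every position that can start a complete three-letter window
starts <- seq_len(n - 2)

windows <- data.frame(
  start = starts,
  frame = (starts - 1) %% 3 + 1,   # reading frame 1, 2 or 3 of each window
  trio  = paste0(z[starts], z[starts + 1], z[starts + 2]),
  stringsAsFactors = FALSE
)
windows

# Frequency of each trio across all three frames
table(windows$trio)
```

Windows starting at positions 1, 4, 7 belong to the first frame, 2, 5, 8 to the second, and 3, 6 to the third, so counting all of them counts every frame at once.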
Your revised question is basically completely different, so I'll answer it here.
First, I would say your data structure doesn't make much sense to me, so I'll start out by reshaping it into something I can work with:
v2<-as.data.frame(t(v2))
Flip it over so the letters are in columns, not rows.
v2<-tidyr::gather(v2,"v","letter",na.rm=TRUE)
Melt the table so it's long data (so that I'll be able to use lead etc.).
v2<-group_by(v2,v)
df=mutate(v2,trio=paste0(letter,lead(letter),lead(letter,2)))
This brings us back basically to where we were at the end of the last question, only the data is grouped by the "animal" variable (here called "v" and represented by V1 thru V5).
df<-df[!grepl("NA",df$trio),]
Even though we removed the unnecessary NA's, we still end up having those pesky ABNA and ANANA etc at the end of each group, so this line will remove anything with an NA in it.
tt<-table(df$v,df$trio)
And finally, we create the table but also break it by "v". The result is this:
   AAA AAB ABA ACA ACB ACC BAC BBC BCA CAA CAB CAC CBA CBB CCC
V1   0   0   1   3   2   1   2   1   1   0   1   3   1   1   1
V2   0   0   1   3   2   0   2   0   0   0   1   2   1   0   0
V3   0   0   1   2   1   0   2   0   0   0   1   0   1   0   0
V4   1   1   1   3   2   0   2   0   0   1   0   2   1   0   0
V5   1   1   0   1   1   0   1   0   0   1   0   0   1   0   0
You can now cbind it to your original data to get something like what you described, but it requires just an additional step, because of the way table saves its results:
data<-cbind(data,tidyr::spread(as.data.frame(tt),Var2,Freq))[,-3]
Which ends up looking like this:
  Animal                            VisitedZones AAA AAB ABA ACA ACB ACC BAC BBC BCA CAA CAB CAC CBA CBB CCC
1      1 A,C,B,A,C,A,B,A,C,A,C,A,C,B,B,C,A,C,C,C   0   0   1   3   2   1   2   1   1   0   1   3   1   1   1
2      2             A,C,B,A,C,A,B,A,C,A,C,A,C,B   0   0   1   3   2   0   2   0   0   0   1   2   1   0   0
3      3                     A,C,B,A,C,A,B,A,C,A   0   0   1   2   1   0   2   0   0   0   1   0   1   0   0
4      4         A,C,B,A,C,A,A,A,B,A,C,A,C,A,C,B   1   1   1   3   2   0   2   0   0   1   0   2   1   0   0
5      5                      A,C,B,A,C,A,A,A,B,   1   1   0   1   1   0   1   0   0   1   0   0   1   0   0
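If only the six orderings of A, B and C from the question's desired output are of interest, the counting could also be done per animal directly on the original strings. This is a hedged base R sketch, not the method used above; count_wanted, wanted and zones are hypothetical names introduced for illustration.

```r
# The five sequences from the question
zones <- c("A,C,B,A,C,A,B,A,C,A,C,A,C,B,B,C,A,C,C,C",
           "A,C,B,A,C,A,B,A,C,A,C,A,C,B",
           "A,C,B,A,C,A,B,A,C,A",
           "A,C,B,A,C,A,A,A,B,A,C,A,C,A,C,B",
           "A,C,B,A,C,A,A,A,B,")

# The six trios shown in the question's desired output
wanted <- c("ABC", "ACB", "BCA", "BAC", "CAB", "CBA")

# Count overlapping trios in one comma-separated string
count_wanted <- function(s) {
  z <- strsplit(s, ",", fixed = TRUE)[[1]]
  z <- z[nzchar(z)]                          # drop empty pieces, if any
  n <- length(z)
  trios <- if (n >= 3) {
    paste0(z[1:(n - 2)], z[2:(n - 1)], z[3:n])
  } else {
    character(0)
  }
  # Fixed factor levels keep a column for trios that never occur
  as.integer(table(factor(trios, levels = wanted)))
}

counts <- t(vapply(zones, count_wanted, integer(length(wanted)),
                   USE.NAMES = FALSE))
colnames(counts) <- wanted

result <- data.frame(Animal = seq_along(zones), VisitedZones = zones,
                     counts, stringsAsFactors = FALSE)
result
```

Fixing the factor levels means every animal gets the same six columns, with 0 where a trio never appears, so no reshaping step is needed afterwards.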
I have a data frame where each Item has three categories (a, b, c) and a numeric Answer (either 0 or 1) is recorded for each category. I would like to create a new column contingent on the rows in the Answer column. This is what my data frame looks like:
Item <- rep(c(1:3), each=3)
Option <- rep(c('a','b','c'), times=3)
Answer <- c(1,1,0,1,0,1,1,1,1)
df <- data.frame(Item, Option, Answer)
  Item Option Answer
1    1      a      1
2    1      b      1
3    1      c      0
4    2      a      1
5    2      b      0
6    2      c      1
7    3      a      1
8    3      b      1
9    3      c      1
What is needed: whenever all three Answer values for an Item are 1, the New column should be 1 for that Item's rows. In any other case, it should be 0. The desired output should look like this:
  Item Option Answer New
1    1      a      1   0
2    1      b      1   0
3    1      c      0   0
4    2      a      1   0
5    2      b      0   0
6    2      c      1   0
7    3      a      1   1
8    3      b      1   1
9    3      c      1   1
I tried to achieve this without using a loop, but I got stuck because I don't know how to make a new column contingent on a group of rows, not just a single one. I have tried this solution but it doesn't work if the rows are not grouped in pairs.
Do you have any suggestions? Thanks a bunch!
This should work:
library(dplyr)
df %>%
  group_by(Item) %>%
  # all() is TRUE only when every Answer in the Item is 1
  mutate(New = as.numeric(all(as.logical(Answer))))
Using data.table:
library(data.table)
DT <- data.table(Item, Option, Answer)
DT[, Index := as.numeric(all(as.logical(Answer))), by = Item]
DT
   Item Option Answer Index
1:    1      a      1     0
2:    1      b      1     0
3:    1      c      0     0
4:    2      a      1     0
5:    2      b      0     0
6:    2      c      1     0
7:    3      a      1     1
8:    3      b      1     1
9:    3      c      1     1
Or using only base R:
df$Index <- with(df, +(ave(!!Answer, Item, FUN = all)))
df$Index
#[1] 0 0 0 0 0 0 1 1 1