Creating a new column based on the condition of others - r

Still very new to coding and R, I am working with some healthcare data in a data frame. There are 3 outcomes that I am interested in - Mobilised_D1, Diet_D1 and Catheter_rm_D1. I wish to create a fourth column called AnyTwo whereby if any 2 of the 3 outcomes are Y or all three outcomes are Y, then it will be T for AnyTwo.
I've managed to do this by using [] as below:
ERAS_limited[ERAS_limited$Mobilised_D1 == "Y" & ERAS_limited$Catheter_rm_D1 == "Y", "AnyTwo"] <- T
ERAS_limited[ERAS_limited$Diet_D1 == "Y" & ERAS_limited$Catheter_rm_D1 == "Y", "AnyTwo"] <- T
ERAS_limited[ERAS_limited$Diet_D1 == "Y" & ERAS_limited$Catheter_rm_D1 == "Y" & ERAS_limited$Mobilised_D1 == "Y", "AnyTwo"] <- T
dput(head(ERAS_limited))
structure(list(Mobilised_D1 = structure(c(2L, 2L, 1L, 1L, 1L,
2L), .Label = c("N", "Y"), class = "factor"), Diet_D1 = structure(c(2L,
2L, 2L, 2L, 1L, 2L), .Label = c("N", "Y"), class = "factor"),
Catheter_rm_D1 = structure(c(2L, 2L, 1L, 1L, 1L, 2L), .Label = c("N",
"Y"), class = "factor"), AnyTwo = c(TRUE, TRUE, FALSE, FALSE,
FALSE, TRUE)), row.names = c(NA, 6L), class = "data.frame")
However, I would be keen to see if there is a more elegant way of doing this e.g. by writing a loop for my own education and curiosity.

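We can use rowSums to create the logical vector: comparing the outcome columns with "Y" gives a logical matrix, and rowSums counts the TRUE values in each row.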
library(dplyr)
ERAS_limited %>%
mutate(AnyTwo = rowSums(.[-4] == "Y") >= 2)
In base R, it would be
ERAS_limited$AnyTwo <- rowSums(ERAS_limited[-4] == "Y") >= 2
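A self-contained sketch of the same idea, rebuilt from the dput output in the question. Here the outcome columns are selected by name rather than with [-4], on the assumption that AnyTwo may not exist yet; the vector name outcomes and the counter n_yes are illustrative and not from the original post. Since the question asks about loops, a loop over the three columns is included for comparison.

```r
# Sample data rebuilt from the dput() output in the question
ERAS_limited <- data.frame(
  Mobilised_D1   = c("Y", "Y", "N", "N", "N", "Y"),
  Diet_D1        = c("Y", "Y", "Y", "Y", "N", "Y"),
  Catheter_rm_D1 = c("Y", "Y", "N", "N", "N", "Y"),
  stringsAsFactors = TRUE
)

# Illustrative name: the three outcome columns, selected by name
outcomes <- c("Mobilised_D1", "Diet_D1", "Catheter_rm_D1")

# Vectorised version: logical matrix, then count the TRUEs in each row
ERAS_limited$AnyTwo <- rowSums(ERAS_limited[outcomes] == "Y") >= 2

# Loop version: add up the "Y" flags one column at a time
n_yes <- integer(nrow(ERAS_limited))
for (col in outcomes) {
  n_yes <- n_yes + (ERAS_limited[[col]] == "Y")
}
AnyTwo_loop <- n_yes >= 2
```

Both versions flag a row when at least two of the three outcomes are "Y", which also covers the case where all three are.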


Finding best (or worst) synergizing champions in LoL pro games using R

I have some data that looks like this: https://i.imgur.com/hzEd7bT.png
These will be pro League of Legends matches once they occur over the course of the next few months. I filled out a few as examples.
Rows 6-10 are champions that each team banned. Rows 11-15 are champions that each team picked.
Each week has about 10 games and there are 9 weeks.
The B and R at the top are Blue (side) and Red (side) in the game. Blue side always gets first choice of champion and red side always gets last choice.
I want to find the best (or worst) synergizing champions.
To clarify what I mean by this, in my screenshot the team with Brand and Yuumi won both times while the team with Aurelion Sol and Azir lost both times.
Optimally, I want to know how many times each combination of 2, 3, 4, or 5 champions was picked together, and the corresponding win rate.
Edit: I am not sure exactly how the data needs to look in R because I have never done this before, but I made two different versions of inputting it below.
LoLGames <- matrix(c('W','Annie','Ezreal','Yuumi','Camille','Brand',
'L','Nasus', 'Aurelion Sol', 'Azir', 'Blitzcrank', 'Caitlyn',
'L','Nasus', 'Aurelion Sol', 'Blitzcrank', 'Ezreal', 'Camille',
'W','Bard', 'Ashe', 'Yuumi', 'Kogmaw', 'Brand'),
ncol = 6, byrow = TRUE)
colnames(LoLGames) <- c("Result","Champ1","Champ2","Champ3","Champ4","Champ5")
rownames(LoLGames) <- c("Game1","Game2","Game3","Game4")
LoLGames <- as.table(LoLGames)
*Corresponding dput
structure(c("W", "L", "L", "W", "Annie", "Nasus", "Nasus", "Bard",
"Ezreal", "Aurelion Sol", "Aurelion Sol", "Ashe", "Yuumi", "Azir",
"Blitzcrank", "Yuumi", "Camille", "Blitzcrank", "Ezreal", "Kogmaw",
"Brand", "Caitlyn", "Camille", "Brand"), .Dim = c(4L, 6L), .Dimnames = list(
c("Game1", "Game2", "Game3", "Game4"), c("Result", "Champ1",
"Champ2", "Champ3", "Champ4", "Champ5")), class = "table")
Result <- c('W','L','L','W',NA)
G1W <- c('Annie','Ezreal','Yuumi','Camille','Brand')
G1L <- c('Nasus', 'Aurelion Sol', 'Blitzcrank', 'Ezreal', 'Camille')
G2L <- c('Nasus', 'Aurelion Sol', 'Blitzcrank', 'Ezreal', 'Camille')
G2W <- c('Bard', 'Ashe', 'Yuumi', 'Kogmaw', 'Brand')
LoLDf <- data.frame(Result, G1W, G1L, G2L, G2W)
*Corresponding dput
structure(list(Result = structure(c(2L, 1L, 1L, 2L, NA), .Label = c("L",
"W"), class = "factor"), G1W = structure(c(1L, 4L, 5L, 3L, 2L
), .Label = c("Annie", "Brand", "Camille", "Ezreal", "Yuumi"), class = "factor"),
G1L = structure(c(5L, 1L, 2L, 4L, 3L), .Label = c("Aurelion Sol",
"Blitzcrank", "Camille", "Ezreal", "Nasus"), class = "factor"),
G2L = structure(c(5L, 1L, 2L, 4L, 3L), .Label = c("Aurelion Sol",
"Blitzcrank", "Camille", "Ezreal", "Nasus"), class = "factor"),
G2W = structure(c(2L, 1L, 5L, 4L, 3L), .Label = c("Ashe",
"Bard", "Brand", "Kogmaw", "Yuumi"), class = "factor")), class = "data.frame", row.names = c(NA,
-5L))
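One possible way to get the counts and win rates described above is sketched here, using the first (one row per team per game) layout with the four example rows from the question. The helper name combo_winrate and the output column names picked and winrate are hypothetical, and the approach (enumerating combinations per row with combn) is only one option; it may become slow for large numbers of games.

```r
# The four example rows from the question, as a plain data frame
games <- data.frame(
  Result = c("W", "L", "L", "W"),
  Champ1 = c("Annie", "Nasus", "Nasus", "Bard"),
  Champ2 = c("Ezreal", "Aurelion Sol", "Aurelion Sol", "Ashe"),
  Champ3 = c("Yuumi", "Azir", "Blitzcrank", "Yuumi"),
  Champ4 = c("Camille", "Blitzcrank", "Ezreal", "Kogmaw"),
  Champ5 = c("Brand", "Caitlyn", "Camille", "Brand"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: for every combination of k champions on a team,
# count how often it was picked and the share of those games that were won
combo_winrate <- function(games, k) {
  champ_cols <- grep("^Champ", names(games))
  per_game <- lapply(seq_len(nrow(games)), function(i) {
    # Sort so the same set of champions always produces the same label
    picks <- sort(unlist(games[i, champ_cols], use.names = FALSE))
    combos <- combn(picks, k, FUN = paste, collapse = " + ")
    data.frame(combo = combos, win = games$Result[i] == "W",
               stringsAsFactors = FALSE)
  })
  all_combos <- do.call(rbind, per_game)
  picked  <- tapply(all_combos$win, all_combos$combo, length)
  winrate <- tapply(all_combos$win, all_combos$combo, mean)
  out <- data.frame(combo = names(picked), picked = as.vector(picked),
                    winrate = as.vector(winrate), stringsAsFactors = FALSE)
  # Most frequently picked combinations first, then highest win rate
  out[order(-out$picked, -out$winrate), ]
}

pairs <- combo_winrate(games, 2)
head(pairs)
```

Calling combo_winrate(games, 3), and so on up to 5, gives the larger combinations in the same format.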

For loop & if else working for less data but not working for more data

The calculation inside my for loop and if/else works when I have 100-200 rows, but not when I have 20000 rows.
Can someone check whether something is wrong with the for loop and if/else, or whether RStudio times out when running them on that much data?
Code:
#FROM HERE ON IT DOES NOT WORK WHEN THE FINAL DATA FRAME HAS 20000 ROWS.
#WE CREATE FINAL_V1, WHICH ENDS UP WITH ONLY 1 ROW
#New Dataframe with Null values
Final <- structure(list(Item = structure(c(1L, 1L, 1L, 1L, 1L, 1L), .Label = "0S1576", class = "factor"),
LC = structure(1:6, .Label = c("MW92", "OY01", "RM11", "RS11",
"WK14", "WK15"), class = "factor"), Fiscal.Week = structure(c(1L,
1L, 1L, 1L, 1L, 1L), .Label = "2019-W24", class = "factor"),
SS = c(15L, 7L, 5L, 9L, 2L, 2L), Freq = c(3, 6, 1, 2, 1,
1), agg = c(1, 1, 1, 1, 0, 0)), row.names = c(NA, -6L), class = "data.frame")
lctolc <- structure(list(Item = structure(c(1L, 1L, 1L, 1L, 1L), .Label = "0S1576", class = "factor"),
LC = structure(c(1L, 2L, 2L, 3L, 3L), .Label = c("MW92",
"OY01", "RM11"), class = "factor"), ToLC = structure(1:5, .Label = c("OY01",
"RM11", "RS11", "WK14", "WK15"), class = "factor")), row.names = c(NA,
-5L), class = "data.frame")
df <- as.data.frame(unique(Final$Item))
Final_v1 <- NA
j <- 1
i <- 1
#SS computations
#Loop over the rows of df (one row per unique item)
for(j in 1:nrow(df)) {
#Copy the rows of the current item from Final into Final_v1 (comparing the item as character)
Final_v1 <- Final[Final$Item == as.character(df[j,1]),]
#Loop over the rows of Final_v1
for(i in 1:nrow(Final_v1)) {
if(Final_v1[i,6] <= 0)
{
Final_v1[i,7] = Final_v1[i,4]}
else
{
if(Final_v1[i,5] == '1')
{
Final_v1[i,7]=0
}
else
{
Final_v1[i,7]=Final_v1[i,4]
}
SSNew <- Final_v1[i,7]
#Leftover distribution
LCS <- lctolc$ToLC[Final_v1$Item[i] == lctolc$Item & Final_v1$LC[i] == lctolc$LC]
inds <- Final_v1$LC %in% LCS
if (any(inds))
{ Final_v1$SS[inds]<- if (SSNew == 0) {Final_v1$SS[inds]==0} else {Final_v1$SS[inds]=Final_v1$SS[inds]} }
}
}
names(Final_v1)[7] <- "SSNew"
}
Can someone help me understand why it does not work for 20000 rows?
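One thing visible in the posted loop is that Final_v1 is reassigned at the top of every pass of the outer loop, so after the loop it holds only the rows of the last item. That may be why the result looks truncated once the data contains many items. The sketch below is a simplified illustration under stated assumptions: the toy data and the helper name compute_ssnew are hypothetical, and only the SSNew branching is reproduced, not the leftover distribution step. It shows one way to keep the result for every item by processing each item separately and binding the pieces together.

```r
# Toy data with two items (hypothetical values, same column meaning as Final)
Final <- data.frame(
  Item = c("A", "A", "B", "B"),
  SS   = c(15, 7, 5, 9),
  Freq = c(3, 1, 1, 2),
  agg  = c(1, 1, 0, 1),
  stringsAsFactors = FALSE
)

# Hypothetical helper: same branching as the inner loop in the question
# (agg <= 0 keeps SS; otherwise Freq == 1 gives 0, anything else keeps SS)
compute_ssnew <- function(d) {
  d$SSNew <- ifelse(d$agg <= 0, d$SS, ifelse(d$Freq == 1, 0, d$SS))
  d
}

# Process each item separately and keep every piece instead of overwriting
pieces <- lapply(split(Final, Final$Item), compute_ssnew)
Final_v1 <- do.call(rbind, pieces)
rownames(Final_v1) <- NULL
```

Because ifelse works on whole columns, this also avoids the row-by-row inner loop, which tends to be slow on data frames with tens of thousands of rows.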

Match a pattern within any element of the data using data table rather than plyr

I have a very big data set and have not used data.table before. I am finding the syntax a bit difficult to follow. My main question is: how can I reproduce the 'apply' function for a data table?
My data is as follows
dat1 <- structure(list(id = c(1L, 1L, 2L, 3L), diag1 = structure(1:4, .Label = c("I20.1","I21.3", "I48", "I60.8"), class = "factor"), diag2 = structure(c(3L,2L, 1L, 1L), .Label = c("", "I50", "I60.9"), class = "factor"), diag3 = structure(c(1L, 2L, 1L, 1L), .Label = c("", "I38.1"), class = "factor")), .Names = c("id", "diag1", "diag2", "diag3"), row.names = c(NA, -4L), class = "data.frame")
I want to add a variable that flags all records with a diagnostic code of I20, I21 or I60 in any of the columns diag1, diag2 or diag3. Using apply and a regex, I have done the following.
code.list <- c("I20","I21","I60")
dat1$index <- apply(dat1[2:4],1, function(i) any(grep(paste(code.list,
collapse="|"), i)))
The final dataset that I want is shown below
structure(list(id = c(1L, 1L, 2L, 3L), diag1 = structure(1:4, .Label = c("I20.1","I21.3", "I48", "I60.8"), class = "factor"), diag2 = structure(c(3L,2L, 1L, 1L), .Label = c("", "I50", "I60.9"), class = "factor"),diag3 = structure(c(1L, 2L, 1L, 1L), .Label = c("", "I38.1"), class = "factor"), index = c(TRUE, TRUE, FALSE, TRUE)), .Names = c("id","diag1", "diag2", "diag3", "index"), row.names = c(NA, -4L), class = "data.frame")
However, this is going to take far too long using plyr. I was hoping to get the equivalent syntax for a data table. Would anybody be able to help?
Thanks in advance
A
We can do this with data.table
library(data.table)
setDT(dat1)[, index := Reduce(`|`, lapply(.SD, grepl,
pattern = paste(code.list, collapse="|"))), .SDcols = 2:4]
dat1
#    id diag1 diag2 diag3 index
# 1:  1 I20.1 I60.9        TRUE
# 2:  1 I21.3   I50 I38.1  TRUE
# 3:  2   I48             FALSE
# 4:  3 I60.8              TRUE
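To make the steps inside that one-liner easier to follow, here is the same logic broken out in base R on the sample data (rebuilt with character columns instead of factors; the intermediate names pattern and hits are illustrative). It is a sketch of the technique, grepl applied to each column followed by an element-wise OR, not a replacement for the data.table version.

```r
# Sample data from the question, with character columns
dat1 <- data.frame(
  id    = c(1L, 1L, 2L, 3L),
  diag1 = c("I20.1", "I21.3", "I48", "I60.8"),
  diag2 = c("I60.9", "I50", "", ""),
  diag3 = c("", "I38.1", "", ""),
  stringsAsFactors = FALSE
)
code.list <- c("I20", "I21", "I60")

# Step 1: build a single regular expression, "I20|I21|I60"
pattern <- paste(code.list, collapse = "|")

# Step 2: one logical vector per diagnosis column (whole column at once)
hits <- lapply(dat1[2:4], grepl, pattern = pattern)

# Step 3: combine the three vectors with element-wise OR
dat1$index <- Reduce(`|`, hits)
```

The speed-up over apply comes from step 2: grepl is called once per column rather than once per row.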

change the names for certain columns in a data frame [duplicate]

I want to change the column names from the second column to the last one. Why does my command not work?
fredTable <- structure(list(Symbol = structure(c(3L, 1L, 4L, 2L, 5L), .Label = c("CASACBM027SBOG",
"FRPACBW027SBOG", "TLAACBM027SBOG", "TOTBKCR", "USNIM"), class = "factor"),
Name = structure(1:5, .Label = c("bankAssets", "bankCash",
"bankCredWk", "bankFFRRPWk", "bankIntMargQtr"), class = "factor"),
Category = structure(c(1L, 1L, 1L, 1L, 1L), .Label = "Banks", class = "factor"),
Country = structure(c(1L, 1L, 1L, 1L, 1L), .Label = "USA", class = "factor"),
Lead = structure(c(1L, 1L, 3L, 3L, 2L), .Label = c("Monthly",
"Quarterly", "Weekly"), class = "factor"), Freq = structure(c(2L,
1L, 3L, 3L, 4L), .Label = c("1947-01-01", "1973-01-01", "1973-01-03",
"1984-01-01"), class = "factor"), Start = structure(c(1L,
1L, 1L, 1L, 1L), .Label = "Current", class = "factor"), End = c(TRUE,
TRUE, TRUE, TRUE, FALSE), SeasAdj = c(FALSE, FALSE, FALSE,
FALSE, TRUE), Percent = structure(c(1L, 1L, 1L, 1L, 1L), .Label = "Fed", class = "factor"),
Source = structure(c(1L, 1L, 1L, 1L, 1L), .Label = "Res", class = "factor"),
Series = structure(c(1L, 1L, 1L, 1L, 2L), .Label = c("Level",
"Ratio"), class = "factor")), .Names = c("Symbol", "Name",
"Category", "Country", "Lead", "Freq", "Start", "End", "SeasAdj",
"Percent", "Source", "Series"), row.names = c("1", "2", "3",
"4", "5"), class = "data.frame")
To change the names from the second column to the end, I use the following commands, but they do not work
names(fredTable[,-1]) = paste("case", 1:ncol(fredTable[,-1]), sep = "")
or
names(fredTable)[,-1] = paste("case", 1:ncol(fredTable)[,-1], sep = "")
In general, how can one change the names of specific columns (for example columns 2 to the end, or 2 to 7) and set them to any names one likes?
Replace specific column names by subsetting on the outside of the function, not within the names function as in your first attempt:
> names(fredTable)[-1] <- paste("case", 1:ncol(fredTable[,-1]), sep = "")
Explanation
If we save the new names in a vector newnames we can investigate what is going on under the hood with replacement functions.
#These are the names that will replace the old names
newnames <- paste("case", 1:ncol(fredTable[,-1]), sep = "")
We should always replace specific column names with the format:
#The right way to replace the second name only
names(df)[2] <- "newvalue"
#The wrong way
names(df[2]) <- "newvalue"
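The problem with the wrong form is that names() is called on a subset, df[2], which is only a temporary copy; renaming that copy does not rename the column in df. In the correct form, the subsetting is applied to the vector of names, and the modified vector is assigned back to the data frame in a single step.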
The right way [Internal]
We can expand the function call with:
#We enter this:
names(fredTable)[-1] <- newnames
#This is carried out on the inside
`names<-`(fredTable, `[<-`(names(fredTable), -1, newnames))
The wrong way [Internal]
The internals of replacement the wrong way are like this:
#Wrong way
names(fredTable[-1]) <- newnames
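#Wrong way Internal
`[<-`(fredTable, -1, value = `names<-`(fredTable[-1], newnames))
Notice that `names<-` is applied only to the temporary subset fredTable[-1], not to fredTable itself. The renamed subset is then written back with `[<-`, which replaces the column values by position and keeps the existing column names, so the names of fredTable never change.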
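A quick check of the two forms on a toy data frame may help. The objects df, wrong and right below are illustrative and not part of the original question.

```r
# Toy data frame with three columns
df <- data.frame(a = 1:2, b = 3:4, c = 5:6)
newnames <- paste0("case", seq_len(ncol(df) - 1))

# Wrong way: runs without an error, but the names stay as they were
wrong <- df
names(wrong[-1]) <- newnames
names(wrong)

# Right way: subset the names vector, then assign
right <- df
names(right)[-1] <- newnames
names(right)
```

The same pattern works for any range of columns, for example names(right)[2:3] <- newnames.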

plotting 3 variables on a single plot in ggplot2

I have an experiment which consists of three variables, and I would like to plot them all on a single plot.
This is my df:
AB <- data.frame(block=c("A", "A", "A", "A", "B", "B", "B", "B" ),
familiarity=c("fam", "fam", "unfam", "unfam" ),
prime=c("P", "UP" ),
RT=c(570.6929, 628.7446, 644.6268, 607.4312, 556.3581, 645.4821, 623.5624, 604.4113))
Right now I can only break one of the variables into two separate plots, like this where A and B are the two levels of the third variable:
library(ggplot2)
A <- AB[which(AB$block == "A"),]
B <- AB[which(AB$block == "B"),]
pa <- ggplot(data=A, aes(x=prime, y=RT, group=familiarity)) +
geom_line(aes(linetype=familiarity), size=1) +
expand_limits(y=c(500,650))
pb <- ggplot(data=B, aes(x=prime, y=RT, group=familiarity)) +
geom_line(aes(linetype=familiarity), size=1) +
expand_limits(y=c(500,650))
I would like to superimpose plot A over plot B, and have this third variable identified by color.
Any ideas?
Is this what you mean?
p_all <- ggplot(AB, aes(x=prime,y=RT,group=interaction(familiarity,block))) +
geom_line(aes(linetype=familiarity,color=block))
Data used:
AB <- structure(list(block = structure(c(1L, 1L, 1L, 1L, 2L, 2L, 2L,
2L), .Label = c("A", "B"), class = "factor"), familiarity = structure(c(1L,
1L, 2L, 2L, 1L, 1L, 2L, 2L), class = "factor", .Label = c("fam",
"unfam")), prime = structure(c(1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L
), class = "factor", .Label = c("P", "UP")), RT = c(570.6929,
628.7446, 644.6268, 607.4312, 556.3581, 645.4821, 623.5624, 604.4113
)), .Names = c("block", "familiarity", "prime", "RT"), row.names = c(NA,
-8L), class = "data.frame")
If you have separate data sets for the levels of that variable (such as A and B above), then you can specify the data in each layer
ggplot()+
geom_line(data=A, aes(x=prime, y=RT, group=familiarity,linetype=familiarity), size=1) +
geom_line(data=B, aes(x=prime, y=RT, group=familiarity,linetype=familiarity), size=1)+
expand_limits(y=c(500,650))
