I have the following R statement. It goes through the entire matchesData data frame and checks whether the conditions are met for each row.
If it matches, put a '1' at matchesData$isRedPreferredLineup.
matchesData$isRedPreferredLineup <- ifelse((matchesData$redTop==red_poplist[1] &
matchesData$redADC==red_poplist[2] &
matchesData$redJungle==red_poplist[3] &
matchesData$redSupport==red_poplist[4] &
matchesData$redMiddle==red_poplist[5] &
matchesData$YearSeason==Season), 1,
matchesData$isRedPreferredLineup)
However, now I need the condition to be flexible. Meaning, if
matchesData$redTop==red_poplist[1]
matchesData$redADC==red_poplist[2]
matchesData$redJungle==red_poplist[3]
conditions are matched, or if
matchesData$redJungle==red_poplist[3]
matchesData$redSupport==red_poplist[4]
matchesData$redMiddle==red_poplist[5]
conditions are matched, or any other combination of 3 or more of the following conditions is matched, I would like to put '1' at matchesData$isRedPreferredLineup.
(matchesData$redTop==red_poplist[1] &
matchesData$redADC==red_poplist[2] &
matchesData$redJungle==red_poplist[3] &
matchesData$redSupport==red_poplist[4] &
matchesData$redMiddle==red_poplist[5] &
matchesData$YearSeason==Season)
How can I do so in a vectorized ifelse statement like this?
Or is there a better way to do this?
Please bear with me, I am pretty new to R. Thanks.
Maybe this could work:
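selectIndex <- apply(matchesData, 1, function(row) {
  # apply() passes each row as a character vector, so compare as strings.
  # Count how many of the six conditions hold and require at least 3.
  sum(c(row['redTop'] == red_poplist[1],
        row['redADC'] == red_poplist[2],
        row['redJungle'] == red_poplist[3],
        row['redSupport'] == red_poplist[4],
        row['redMiddle'] == red_poplist[5],
        row['YearSeason'] == Season)) >= 3
})
# selectIndex is a logical vector with one entry per row
matchesData$isRedPreferredLineup[selectIndex] <- 1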
You could vectorise the TRUE/FALSE statements like this:
my.conditions <- cbind(matchesData$redTop==red_poplist[1], matchesData$redADC==red_poplist[2],
matchesData$redJungle==red_poplist[3], matchesData$redSupport==red_poplist[4],
matchesData$redMiddle==red_poplist[5], matchesData$YearSeason==Season)
Then S1 <- rowSums(my.conditions) gives you the number of TRUEs in each row of my.conditions. Your final condition boils down to ifelse(S1 > 2, 1, ...), or equivalently:
matchesData$isRedPreferredLineup[which(S1 > 2)] <- 1
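To make the rowSums idea concrete, here is a small sketch on made-up data. The three rows, the champion letters and the season values are invented purely for illustration; only the column names come from the question.

```r
# Toy data (hypothetical values, not from the question)
matchesData <- data.frame(
  redTop     = c("A", "A", "X"),
  redADC     = c("B", "B", "B"),
  redJungle  = c("C", "Y", "Y"),
  redSupport = c("D", "Z", "Z"),
  redMiddle  = c("E", "E", "W"),
  YearSeason = c("2016", "2016", "2015"),
  isRedPreferredLineup = 0,
  stringsAsFactors = FALSE
)
red_poplist <- c("A", "B", "C", "D", "E")
Season <- "2016"

# One logical column per condition, one row per match
my.conditions <- cbind(
  matchesData$redTop     == red_poplist[1],
  matchesData$redADC     == red_poplist[2],
  matchesData$redJungle  == red_poplist[3],
  matchesData$redSupport == red_poplist[4],
  matchesData$redMiddle  == red_poplist[5],
  matchesData$YearSeason == Season
)

# Number of conditions met in each row
S1 <- rowSums(my.conditions)

# Flag rows where 3 or more conditions hold; other rows keep their old value
matchesData$isRedPreferredLineup[which(S1 > 2)] <- 1
print(matchesData[, c("redTop", "isRedPreferredLineup")])
```

Note that this counts the season as one of the six conditions, as in the question. If the season must always match, you would probably keep that test outside the count and combine it with & instead.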
How can I identify a column in an R data frame using a variable? In the following code, I used paste0 to build the column name from a variable. Is there any alternative?
if ((leadsnp4[[paste0('Z_in_',trait1)]] > 0) & (leadsnp4[[paste0('Z_in_',trait2)]] > 0))
{leadsnp4$ConcordEffect='Yes'} else if ((leadsnp4[[paste0('Z_in_',trait1)]] < 0) & (leadsnp4[[paste0('Z_in_',trait2)]] < 0))
{leadsnp4$ConcordEffect='Yes'} else if ((leadsnp4[[paste0('Z_in_',trait1)]] > 0) & (leadsnp4[[paste0('Z_in_',trait2)]] < 0))
{leadsnp4$ConcordEffect='No'} else if ((leadsnp4[[paste0('Z_in_',trait1)]] < 0) & (leadsnp4[[paste0('Z_in_',trait2)]] > 0))
{leadsnp4$ConcordEffect='No'}
leadsnp4 is a data frame. trait1 and trait2 are user-defined variables. The above code gives me the warning: The condition has length > 1 and only the first element will be used. I am also not getting the expected output.
Not sure what is wrong here. Maybe there are other alternatives for the above if else statements. Any help?
The way you're selecting columns is fine. Using df[[col_name]] (list context) is the same as df[, col_name] -- each returns a vector copy of column col_name. You can save the column name as a variable instead of using paste0 directly in the selection.
The reason you're getting the warning is that if is not vectorized and you're giving it a vector with length > 1. In this case, if uses only the first value in the vector, but warns that it's doing so (since R 4.2.0 this is an error rather than a warning). ifelse is the vectorized version in base R (there's also dplyr::if_else). If I understand your code, the below should be close to what you're looking for.
t1 <- paste0('Z_in_', trait1)
t2 <- paste0('Z_in_', trait2)
# a single boolean vector indicating if trait1 and trait2 are
# both positive or both negative
same_sign <- ((leadsnp4[, t1] > 0) & (leadsnp4[, t2] > 0)) |
((leadsnp4[, t1] < 0) & (leadsnp4[, t2] < 0))
leadsnp4$ConcordEffect <- ifelse(same_sign, "Yes", "No")
Note that if the trait1 and/or trait2 value in a row is equal to 0, same_sign is FALSE for that row and it will be labelled "No". You'll need to modify the logic if this is not the desired behavior.
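If zeros should not be labelled "No", one possible variation is to multiply the signs and leave the zero case as NA. This is only a sketch: the trait names "bmi" and "height" and the Z values are made up, and whether NA is the right label for a zero is an assumption you would need to check against your analysis.

```r
# Hypothetical data: two Z-score columns named after user-defined traits
leadsnp4 <- data.frame(
  Z_in_bmi    = c(1.2, -0.5,  2.0, 0.0),
  Z_in_height = c(0.3, -1.1, -0.7, 1.5)
)
trait1 <- "bmi"
trait2 <- "height"

t1 <- paste0("Z_in_", trait1)
t2 <- paste0("Z_in_", trait2)

# sign() returns -1, 0 or 1, so the product is 1 for the same sign,
# -1 for opposite signs and 0 if either value is exactly 0
sign_product <- sign(leadsnp4[[t1]]) * sign(leadsnp4[[t2]])

leadsnp4$ConcordEffect <- ifelse(sign_product > 0, "Yes",
                          ifelse(sign_product < 0, "No", NA_character_))
print(leadsnp4)
```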
Here is an explanation for why pasting will not work for creating a column reference and one suggestion for what you can do instead: Dynamically select data frame columns using $ and a character value
B <- 10000
results <- replicate(B, {
hand <- sample(hands1, 2)
(hand[1] %in% aces & hand[2] %in% facecard) | (hand[2] %in% aces & hand[1] %in% facecard)
})
mean(results)
This piece of code works perfectly and does the desired thing.
This is a Monte Carlo simulation. I don't understand the way they put curly brackets {} in the replicate function. I can understand what the code does, but I can't understand why it is written this way.
The reason is that we have multiple expressions
hand <- sample(hands1, 2)
is the first expression and the second is
(hand[1] %in% aces & hand[2] %in% facecard) | (hand[2] %in% aces & hand[1] %in% facecard)
i.e. if there is only a single expression, we don't need to block with {}
It is a general case and not related to replicate i.e. if we use a for loop with a single expression, it doesn't need any {}
for(i in 1:5)
print(i)
and similarly, something like if/else
n <- 5
if(n == 5)
print(n)
The braces are only needed when there is more than one expression.
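A small sketch of the same point with replicate, using dice rolls instead of the card example (the dice are my own illustration, not from the question). A braced block evaluates its expressions in order and returns the value of the last one, which is what replicate collects.

```r
set.seed(1)  # only so the run is reproducible

# Single expression: no braces needed
single <- replicate(5, sample(1:6, 1))

# Two expressions: wrap them in {} so they are passed as one block;
# the value of the block is its last expression, sum(roll)
multi <- replicate(5, {
  roll <- sample(1:6, 2)
  sum(roll)
})

# The same rule outside replicate: a block evaluates to its last expression
block_value <- {
  a <- 2
  a * 10
}
print(block_value)
```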
I’m having trouble putting two conditions into a subset. The result is a whole bunch of NA.
> df[(df$col > 0) && (df$col < 4), ]
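You only need one '&' (the space after ',' is optional). '&&' is not vectorized: it expects a single TRUE/FALSE on each side (and is an error for longer vectors since R 4.3.0), so it cannot be used to build a row index.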
df[df$col > 0 & df$col < 4,]
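If you still get NA rows, df$col probably contains NA values, since an NA in a logical index produces an NA row. Also check whether you want OR (|) instead of AND (&).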
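Here is a short sketch of how NA values in the column interact with the subset, on a made-up five-row data frame. Wrapping the condition in which() is one common way to drop the NA rows; subset() is another.

```r
# Hypothetical data with one missing value
df <- data.frame(col = c(-1, 2, NA, 3, 5))

# Element-wise AND: one TRUE/FALSE/NA per row
keep <- df$col > 0 & df$col < 4

# An NA in the logical index produces a row of NA
with_na <- df[keep, , drop = FALSE]

# which() keeps only the positions that are TRUE, so the NA row disappears
clean <- df[which(keep), , drop = FALSE]

# subset() also treats NA in the condition as FALSE
clean2 <- subset(df, col > 0 & col < 4)
print(clean)
```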
I'm currently researching a matching-to-sample task in monkeys. I want to evaluate how often a certain stimulus was chosen, regardless of correctness of the choice.
To do so, I have a dataframe df with 6288 rows and 6 columns ("Monkey", "Session", "Sample", "Match", "Foil", "Success"), of which only the last three are important now.
The data in df$Match and df$Foil are the names of the stimuli (strings) and df$Success is binary. df$Match and df$Foil are made up of 65 distinct stimulus names, which I included in a vector Match.Foil.
Now I want to count how often a picture (an element of the vector Match.Foil) is clicked in all 6288 trials. That is, every time the name is either in df$Match with df$Success == "1" OR in df$Foil with df$Success == "0".
I tried to build a vector with the number of times clicked for each part of Match.Foil like this:
Pic.clicked= vector(mode="numeric", length= length(Match.Foil))
for (i in 1:length(Match.Foil)){
Pic.clicked[i] = ifelse(
(df$Match == Match.Foil[i] & df$Success == "1")|
(df$Foil== Match.Foil[i] & df$Success == "0"),
Pic.clicked[i] +1,
Pic.clicked[i] +0)
}
So, as you see, I wanted to use the expressions Pic.clicked + 1 and Pic.clicked + 0 as the return values for when the condition is TRUE or FALSE. It does not work and gives me the warning:
In Pic.clicked[i] = ifelse((df$Match == Match.Foil[i] & ... :
number of items to replace is not a multiple of replacement length
Does anybody have an idea, how to build an appropriate counter? I thought about using switch, but I don't have any experience with that function and it seems not to work like I need it. I also tried running it for 6288 loops, but that produces the same warning.
You can use sum(), which on a boolean vector makes each TRUE count as 1:
for (i in 1:length(Match.Foil)) {
  # number of trials in which stimulus i was the one clicked
  Pic.clicked[i] = sum((df$Match == Match.Foil[i] & df$Success == "1") |
                       (df$Foil == Match.Foil[i] & df$Success == "0"))
}
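The loop can also be avoided by first working out which picture was clicked in each trial and then tabulating. This is just a sketch on four invented trials with three stimuli; it assumes df$Success is always "1" or "0" with no missing values.

```r
# Hypothetical trials (stimulus names invented for illustration)
df <- data.frame(
  Match   = c("cat", "dog",  "cat",  "bird"),
  Foil    = c("dog", "bird", "bird", "cat"),
  Success = c("1",   "0",    "0",    "1"),
  stringsAsFactors = FALSE
)
Match.Foil <- c("bird", "cat", "dog")

# On a success the match was clicked, otherwise the foil was clicked
clicked <- ifelse(df$Success == "1", df$Match, df$Foil)

# Tabulate against the full stimulus list so unclicked pictures get a 0
Pic.clicked <- as.vector(table(factor(clicked, levels = Match.Foil)))
names(Pic.clicked) <- Match.Foil
print(Pic.clicked)
```

Using factor(..., levels = Match.Foil) keeps the counts in the same order as Match.Foil, which matters if you later bind them to other per-stimulus data.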
I am trying to get my head around daa.R, one of the functions in the matchingMarkets R library (links are to GitHub repositories). On lines 134-135, one finds the following if statement
if (0 %in% (c.hist[[j]] & any(c.prefs[ ,j]==proposers[k]))){ # if no history and proposer is on preference list
c.hist[[j]][c.hist[[j]]==0][1] <- proposers[k] # then accept
}
where c.hist and proposers are a list and c.prefs a matrix.
I am puzzled by the parentheses in the conditional statement. Instead of the above syntax, I would have opted for
if (0 %in% c.hist[[j]] & any(c.prefs[ ,j]==proposers[k]))
I don't understand how the original condition may work. How could R possibly check whether 0 is in (c.hist[[j]] & any(c.prefs[ ,j]==proposers[k]))?
I am a beginner in R, so I wanted to make sure I was not missing something and tried to replicate a similar syntax with other conditions such as,
> x = list(4,3)
> y = list(5,2)
> if (3 %in% (x & any(y == 5))){z = 8}
As I expected, I got an error message
Error in x & any(y == 5) : operations are possible only for numeric, logical or complex types
whereas things go just fine when I write
if (3 %in% x & any(y == 5)){z = 8}
instead.
What am I missing? Why would the kind of conditional syntax I am puzzled by work in daa.R and not with the other conditions I tried?
When you ask R if 0 %in% x where x is a logical vector, R will first convert x to a numeric vector where FALSE becomes 0 and TRUE becomes 1. So essentially, asking if 0 %in% x is like asking if x contains any FALSE. This is arguably pretty bad practice. A better approach would be to test if any(!x) or !all(x). Simpler still, if x has length 1, as seems to be the case here, you would just test if !x.
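A quick sketch of that equivalence on two made-up logical vectors:

```r
x <- c(TRUE, FALSE, TRUE)
y <- c(TRUE, TRUE)

# %in% coerces the logical vector to numeric (FALSE -> 0, TRUE -> 1),
# so this asks whether x contains a FALSE
zero_in_x <- 0 %in% x
zero_in_y <- 0 %in% y

# Clearer ways to ask the same question
any_false_x <- any(!x)
not_all_x   <- !all(x)
any_false_y <- any(!y)

print(c(zero_in_x, any_false_x, not_all_x))
print(c(zero_in_y, any_false_y))
```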
In light of the contorted usage, you are raising a very good question: is the code doing what it really meant to do? In R, the %in% operator has higher precedence than & (see ?Syntax), thus these two statements are not the same:
0 %in% (c.hist[[j]] & any(c.prefs[ ,j]==proposers[k])) # original code
0 %in% c.hist[[j]] & any(c.prefs[ ,j]==proposers[k]) # what you suggested
and we would need to look closely at what the code is supposed to be doing to decide if it is correct or wrong. I will just point out that you did not test your assumption properly: the error you got ("unexpected '{'") is because you forgot a closing parenthesis:
if (3 %in% (x & any(y == 5)){z = 8}
should be
if (3 %in% (x & any(y == 5))){z = 8}
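To see that the two groupings really can give different answers, here is a small sketch. The values of c_hist and on_list are invented stand-ins for c.hist[[j]] and any(c.prefs[ ,j]==proposers[k]); they are not taken from daa.R.

```r
# Stand-ins for the real objects (hypothetical values)
c_hist  <- c(1, 2)   # history with no free (0) slot
on_list <- FALSE     # proposer is not on the preference list

# Grouping used in the original code: & is applied first, giving
# c(FALSE, FALSE), and 0 %in% that is TRUE
original_form <- 0 %in% (c_hist & on_list)

# Without the parentheses %in% binds tighter than &:
# (0 %in% c_hist) & on_list is FALSE & FALSE
suggested_form <- 0 %in% c_hist & on_list

print(c(original_form, suggested_form))

# With lists, & itself fails, which is why the list experiment errors
x <- list(4, 3)
list_and <- tryCatch(x & TRUE, error = function(e) e)
print(inherits(list_and, "error"))
```

So with these values the original form would accept the proposer while the suggested form would not, which is why it is worth checking what the condition is meant to express.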