Conditional statement in daa.R of the R "matchingMarkets" library

I am trying to get my head around daa.R, one of the functions in the matchingMarkets R library (links are to GitHub repositories). On lines 134-135, one finds the following if statement
if (0 %in% (c.hist[[j]] & any(c.prefs[ ,j]==proposers[k]))){ # if no history and proposer is on preference list
c.hist[[j]][c.hist[[j]]==0][1] <- proposers[k] # then accept
}
where c.hist and proposers are a list and c.prefs a matrix.
I am puzzled by the parentheses in the conditional statement. Instead of the above syntax, I would have opted for
if (0 %in% c.hist[[j]] & any(c.prefs[ ,j]==proposers[k]))
I don't understand how the original condition may work. How could R possibly check whether 0 is in (c.hist[[j]] & any(c.prefs[ ,j]==proposers[k]))?
I am a beginner in R, so I wanted to make sure I was not missing something. I tried to replicate a similar syntax with other conditions, such as:
> x = list(4,3)
> y = list(5,2)
> if (3 %in% (x & any(y == 5))){z = 8}
As I expected, I got an error message
Error in x & any(y == 5) : operations are possible only for numeric, logical or complex types
whereas things go just fine when I write
if (3 %in% x & any(y == 5)){z = 8}
instead.
What am I missing? Why would the kind of conditional syntax I am puzzled by work in daa.R and not with the other conditions I tried?

When you ask R if 0 %in% x where x is a logical vector, R will first convert x to a numeric vector where FALSE becomes 0 and TRUE becomes 1. So essentially, asking if 0 %in% x is like asking if x contains any FALSE. This is arguably pretty bad practice. A better approach would be to test any(!x) or !all(x). And if x has length 1, as seems to be the case here, you could simply test !x.
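A quick check of the coercion described above, using made-up logical vectors. Here x and y are illustrative only, not the daa.R objects:

```r
x <- c(TRUE, TRUE, FALSE)

# %in% (via match()) compares on a common type, so TRUE -> 1 and FALSE -> 0
print(as.numeric(x))   # 1 1 0
print(0 %in% x)        # TRUE: x contains a FALSE

# Clearer ways to ask "does x contain any FALSE?"
print(any(!x))         # TRUE
print(!all(x))         # TRUE

y <- c(TRUE, TRUE)
print(0 %in% y)        # FALSE: no FALSE in y
print(any(!y))         # FALSE
```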
In light of the contorted usage, you are raising a very good question: is the code doing what it really meant to do? In R, the %in% operator has higher precedence than & (see ?Syntax), thus these two statements are not the same:
0 %in% (c.hist[[j]] & any(c.prefs[ ,j]==proposers[k])) # original code
0 %in% c.hist[[j]] & any(c.prefs[ ,j]==proposers[k]) # what you suggested
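and we would need to look closely at what the code is supposed to be doing to decide if it is correct or wrong. I will just point out that your test does not reproduce the daa.R situation: there, c.hist[[j]] uses [[ to extract a numeric vector from the list, whereas your x is itself a list, and & is not defined for lists, hence the error "operations are possible only for numeric, logical or complex types". Note also that a missing closing parenthesis produces a different error ("unexpected '{'"):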
if (3 %in% (x & any(y == 5)){z = 8}
should be
if (3 %in% (x & any(y == 5))){z = 8}
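A minimal sketch of why the precedence matters, with made-up values: slots stands in for c.hist[[j]] (a numeric vector where 0 marks an empty slot) and on_list stands in for any(c.prefs[ ,j]==proposers[k]). With no empty slot and a proposer who is not on the list, the two forms disagree: the original form evaluates to TRUE even though the proposer is not on the list. The last lines show that & accepts the vector extracted with [[ but not a list such as the x = list(4,3) in the question.

```r
slots   <- c(7, 3)   # stand-in for c.hist[[j]]: no empty slot (no 0)
on_list <- FALSE     # stand-in for any(c.prefs[ ,j]==proposers[k])

# Original daa.R form: & is evaluated first, then %in%
# slots & FALSE is c(FALSE, FALSE), which contains a 0 after coercion
orig <- 0 %in% (slots & on_list)
print(orig)          # TRUE

# Suggested form: %in% binds tighter than &, so this is (0 %in% slots) & on_list
suggested <- 0 %in% slots & on_list
print(suggested)     # FALSE

# & works on a numeric vector extracted from a list with [[ ...
lst <- list(c(7, 3), c(0, 5))
print(lst[[1]] & TRUE)   # TRUE TRUE

# ... but not on the list itself
res <- tryCatch(lst & TRUE, error = function(e) conditionMessage(e))
print(res)
```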

Related

Access column using variable name in R in if statement, condition has length > 1

How could I identify a column in an R data frame using a variable? In the following code, I used paste0 to build the column name from a variable. Is there any alternative?
if ((leadsnp4[[paste0('Z_in_',trait1)]] > 0) & (leadsnp4[[paste0('Z_in_',trait2)]] > 0))
{leadsnp4$ConcordEffect='Yes'} else if ((leadsnp4[[paste0('Z_in_',trait1)]] < 0) & (leadsnp4[[paste0('Z_in_',trait2)]] < 0))
{leadsnp4$ConcordEffect='Yes'} else if ((leadsnp4[[paste0('Z_in_',trait1)]] > 0) & (leadsnp4[[paste0('Z_in_',trait2)]] < 0))
{leadsnp4$ConcordEffect='No'} else if ((leadsnp4[[paste0('Z_in_',trait1)]] < 0) & (leadsnp4[[paste0('Z_in_',trait2)]] > 0))
{leadsnp4$ConcordEffect='No'}
leadsnp4 is a data frame. trait1 and trait2 are user-defined variables. The above code gives me the warning: the condition has length > 1 and only the first element will be used. I am also not getting the expected output.
Not sure what is wrong here. Maybe there are other alternatives for the above if else statements. Any help?
The way you're selecting columns is fine. For a data frame, using df[[col_name]] (list context) is the same as df[, col_name]: each returns a vector copy of column col_name. You can save the column name as a variable instead of using paste0 directly in the selection.
The reason you're getting the warning is that if is not vectorized and you're giving it a vector with length > 1. In this case, if uses only the first value in the vector, but warns that it's doing so (since R 4.2.0 this is an error instead). ifelse is the vectorized version in base R (there's also dplyr::if_else). If I understand your code, the below should be close to what you're looking for.
t1 <- paste0('Z_in_', trait1)
t2 <- paste0('Z_in_', trait2)
# a single boolean vector indicating if trait1 and trait2 are
# both positive or both negative
same_sign <- ((leadsnp4[, t1] > 0) & (leadsnp4[, t2] > 0)) |
((leadsnp4[, t1] < 0) & (leadsnp4[, t2] < 0))
leadsnp4$ConcordEffect <- ifelse(same_sign, "Yes", "No")
Note that if the Z value for trait1 and/or trait2 is exactly 0, same_sign is FALSE and the row gets "No". You'll need to modify the logic if this is not the desired behavior.
Here is an explanation of why pasting will not work with $ for creating a column reference, and one suggestion for what you can do instead: Dynamically select data frame columns using $ and a character value
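A self-contained sketch of the same approach on a made-up data frame. The column names Z_in_height and Z_in_weight and the trait values are hypothetical stand-ins for the real leadsnp4. It uses [[ ]] for the column lookup, which for a data frame is equivalent to [, t1]. The fourth row shows the zero case:

```r
# Hypothetical stand-in for leadsnp4
leadsnp4 <- data.frame(Z_in_height = c(1.2, -0.5, 2.0, 0),
                       Z_in_weight = c(0.8, -1.1, -0.3, 1.5))
trait1 <- "height"   # hypothetical trait names
trait2 <- "weight"

t1 <- paste0("Z_in_", trait1)
t2 <- paste0("Z_in_", trait2)

# Element-wise comparisons give one TRUE/FALSE per row
same_sign <- (leadsnp4[[t1]] > 0 & leadsnp4[[t2]] > 0) |
             (leadsnp4[[t1]] < 0 & leadsnp4[[t2]] < 0)

# ifelse() is vectorised, so every row gets its own "Yes"/"No"
leadsnp4$ConcordEffect <- ifelse(same_sign, "Yes", "No")
print(leadsnp4$ConcordEffect)   # "Yes" "Yes" "No" "No"
```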

R - Flexible conditions

I have the following R statement. Basically, it goes through the entire matchesData data frame and checks whether the conditions are met for each row.
If they are, it puts a '1' in matchesData$isRedPreferredLineup.
matchesData$isRedPreferredLineup <- ifelse((matchesData$redTop==red_poplist[1] &
matchesData$redADC==red_poplist[2] &
matchesData$redJungle==red_poplist[3] &
matchesData$redSupport==red_poplist[4] &
matchesData$redMiddle==red_poplist[5] &
matchesData$YearSeason==Season), 1,
matchesData$isRedPreferredLineup)
However, I now need the condition to be flexible. That is, if
matchesData$redTop==red_poplist[1]
matchesData$redADC==red_poplist[2]
matchesData$redJungle==red_poplist[3]
conditions are matched, or if
matchesData$redJungle==red_poplist[3]
matchesData$redSupport==red_poplist[4]
matchesData$redMiddle==red_poplist[5]
conditions are matched, or any other combination of 3 or more of the following conditions is matched, I would like to put '1' in matchesData$isRedPreferredLineup.
(matchesData$redTop==red_poplist[1] &
matchesData$redADC==red_poplist[2] &
matchesData$redJungle==red_poplist[3] &
matchesData$redSupport==red_poplist[4] &
matchesData$redMiddle==red_poplist[5] &
matchesData$YearSeason==Season)
How can I do so in a vectorized ifelse statement like this?
Or is there a better way to do this?
Please bear with me, I am pretty new to R. Thanks.
Maybe this could work:
selectIndex <- apply(matchesData, 1, function(row){
# count how many of the six conditions hold for this row, then require 3 or more
sum(c(row['redTop'] == red_poplist[1],
row['redADC'] == red_poplist[2],
row['redJungle'] == red_poplist[3],
row['redSupport'] == red_poplist[4],
row['redMiddle'] == red_poplist[5],
row['YearSeason'] == Season)) >= 3
})
matchesData$isRedPreferredLineup[selectIndex] <- 1
You could vectorise the TRUE/FALSE statements like this:
my.conditions <- cbind(matchesData$redTop==red_poplist[1], matchesData$redADC==red_poplist[2],
matchesData$redJungle==red_poplist[3], matchesData$redSupport==red_poplist[4],
matchesData$redMiddle==red_poplist[5], matchesData$YearSeason==Season)
Then S1 <- rowSums(my.conditions) gives you the number of TRUEs in each row of my.conditions. Your final condition then boils down to ifelse(S1 > 2, 1, ...); more directly, consider the following:
matchesData$isRedPreferredLineup[which(S1 > 2)] <- 1
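A runnable sketch of the rowSums() approach on a tiny made-up matchesData. All values here are hypothetical. Since S1 counts TRUEs, S1 > 2 means "3 or more conditions met":

```r
# Hypothetical toy data: column names follow the question, values are made up
matchesData <- data.frame(
  redTop     = c("A", "A", "X", "A"),
  redADC     = c("B", "X", "X", "B"),
  redJungle  = c("C", "C", "X", "C"),
  redSupport = c("D", "X", "X", "D"),
  redMiddle  = c("E", "X", "X", "E"),
  YearSeason = c(2016, 2016, 2016, 2015),
  isRedPreferredLineup = 0
)
red_poplist <- c("A", "B", "C", "D", "E")
Season <- 2016

# One logical column per condition, one row per match
my.conditions <- cbind(matchesData$redTop == red_poplist[1],
                       matchesData$redADC == red_poplist[2],
                       matchesData$redJungle == red_poplist[3],
                       matchesData$redSupport == red_poplist[4],
                       matchesData$redMiddle == red_poplist[5],
                       matchesData$YearSeason == Season)

# Number of conditions met in each row (TRUE counts as 1)
S1 <- rowSums(my.conditions)
print(S1)   # 6 3 1 5

# Flag rows meeting 3 or more conditions; other rows keep their old value
matchesData$isRedPreferredLineup[which(S1 > 2)] <- 1
print(matchesData$isRedPreferredLineup)   # 1 1 0 1
```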

debug the if statement

I am trying to understand for loops and if statements in R, so I wrote code that, for each row, returns 1 if the sum of the row is bigger than 3 and 0 otherwise:
Here is the code
set.seed(2)
x = rnorm(20)
y = 2*x
a = cbind(x,y)
hold = c()
Now comes the if-statement
for (i in nrow(a)) {
if ([i,1]+ [i,2] > 3) hold[i,] == 1
else ([i,1]+ [i,2]) <- hold[i,] == 0
return (cbind(a,hold)
}
I know that maybe combining for and if may not be ideal, but I just want to understand what is going wrong. Please keep the explanation at a dummy level:) Thanks
You've got some issues. @mnel covered a better way to go about doing this; I'll focus on understanding what went wrong in this attempt (but don't do it this way at all: use a vectorized solution).
Line 1
for (i in nrow(a)) {
a has 20 rows. nrow(a) is 20. Thus your code is equivalent to for (i in 20), which means i will only ever be 20.
Fix:
for (i in 1:nrow(a)) {
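A small sketch that makes the difference visible by recording which values of i each loop actually visits. It rebuilds a as in the question. It also shows seq_len(nrow(a)), a common alternative to 1:nrow(a) that behaves correctly when a has zero rows:

```r
set.seed(2)
x <- rnorm(20)
a <- cbind(x, y = 2 * x)   # 20 x 2 matrix, as in the question

# nrow(a) is the single number 20, so the loop body runs once, with i = 20
visited_once <- c()
for (i in nrow(a)) visited_once <- c(visited_once, i)
print(visited_once)          # 20

# 1:nrow(a) is the sequence 1, 2, ..., 20, so the body runs once per row
visited_all <- c()
for (i in 1:nrow(a)) visited_all <- c(visited_all, i)
print(length(visited_all))   # 20

# With zero rows, 1:nrow() counts down to c(1, 0); seq_len() gives an empty sequence
empty <- a[0, , drop = FALSE]
print(1:nrow(empty))         # 1 0
print(seq_len(nrow(empty)))  # integer(0)
```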
Line 2
if ([i,1]+ [i,2] > 3) hold[i,] == 1
[i,1] isn't anything, it's the ith row and first column of... nothing. You need to reference your data: a[i,1]
You initialized hold as a vector, c(), so it only has one dimension, not rows and columns. So we want to assign to hold[i], not hold[i,].
== is used for equality testing. = or <- are for assignment. Right now, if the >3 condition is met, then you check if hold[i,] is equal to 1. (And do nothing with the result).
Fix:
if (a[i,1]+ a[i,2] > 3) hold[i] <- 1
Line 3
else ([i,1]+ [i,2]) <- hold[i,] == 0
As above for assignment vs equality testing. (Here you used an arrow assignment, but put it in the wrong place - as if you're trying to assign to the else)
else happens whenever the if condition isn't met; you don't need to repeat the condition.
Fix:
else hold[i] <- 0
Fixed code together:
for (i in 1:nrow(a)) {
if (a[i,1] + a[i,2] > 3) hold[i] <- 1
else hold[i] <- 0
}
You aren't using curly braces for your if and else expressions. They are not required for single-line expressions (if something, do this one line), but they are required for multi-line expressions (if something, do a bunch of stuff), and I think they're a good idea to use always. Also, in R, it's good practice to put the else on the same line as the } that closes the preceding if. Inside a for loop or a function it doesn't matter, but at the top level R would treat the if as complete and the else would be a syntax error, so it's good to get in the habit of always doing it. I would recommend this reformatted code:
for (i in 1:nrow(a)) {
if (a[i, 1] + a[i, 2] > 3) {
hold[i] <- 1
} else {
hold[i] <- 0
}
}
Using ifelse
ifelse() is a vectorized if-else statement in R. It is appropriate when you want to test a vector of conditions and get a result out for each one. In this case you could use it like this:
hold <- ifelse(a[, 1] + a[, 2] > 3, 1, 0)
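ifelse will take care of the looping for you. If you want it as a column in your data, bind it on directly (no need to initialize hold first). Since a was created with cbind() it is a matrix, not a data frame, so use cbind() rather than $:
a <- cbind(a, hold = ifelse(a[, 1] + a[, 2] > 3, 1, 0))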
Such operations in R are nicely vectorised.
You haven't included a reference to the dataset you wish to index in your call to [ (e.g. a[i,1]).
Using rowSums:
h <- rowSums(a) > 3
I am going to assume that you are new to R and trying to learn about the basic function of the for loop itself. R has functions from the "apply" family (apply, sapply, etc.) that run a function over each row or element of an object, but I am not going to talk about these.
You want to do the following on each row of the array:
1. Sum the elements of the row.
2. Test that the sum is greater than 3.
3. Return a value of 1 or 0 representing the result of step 2.
For 1, luckily "sum" is a built-in function. It pays off to check out the built-in functions of every programming language because they save you time. To sum the elements of a row, just use sum(a[row_number,]).
For 2, you are evaluating a logical statement "is x >3?" where x is the result from 1. The ">3" statement returns a value of true or false. The logical expression is a fancy "if then" statement without the "if then".
> 4>3
[1] TRUE
> 2>3
[1] FALSE
For 3, a true or false value is of class "logical" in R. A 1 or 0 value is of class "numeric" in R. By converting the "logical" into a "numeric", you change TRUE to 1 and FALSE to 0.
> class(4>3)
[1] "logical"
> as.numeric(4>3)
[1] 1
> class(as.numeric(4>3))
[1] "numeric"
A for loop in R steps a counter through a sequence of values and runs its body once for each value. Here the sequence is 1:nrow(a), so you start at the first row and go to the last row. Putting all the elements together looks like this:
for (i in 1:nrow(a)){
hold[i] <- as.numeric(sum(a[i,])>3)
}
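To tie the answers together, here is a sketch that computes hold three ways on the question's data and checks that they agree: the explicit loop, ifelse(), and rowSums(). Pre-allocating hold with numeric() instead of growing c() is a choice made here, not part of the original answers. Because a is a matrix, the result is attached with cbind() rather than $:

```r
set.seed(2)
x <- rnorm(20)
y <- 2 * x
a <- cbind(x, y)

# 1. Explicit loop (the fixed version of the question's code)
hold_loop <- numeric(nrow(a))          # pre-allocated vector of zeros
for (i in seq_len(nrow(a))) {
  if (a[i, 1] + a[i, 2] > 3) {
    hold_loop[i] <- 1
  } else {
    hold_loop[i] <- 0
  }
}

# 2. Vectorised ifelse()
hold_ifelse <- ifelse(a[, 1] + a[, 2] > 3, 1, 0)

# 3. rowSums() gives a logical; as.numeric() turns TRUE/FALSE into 1/0
hold_rowsums <- as.numeric(rowSums(a) > 3)

# a is a matrix, so add the result as a new column with cbind()
a <- cbind(a, hold = hold_ifelse)
```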

Logical "Except" operator for If statements in R

FYI, I'm new to using R so my code is likely quite clunky. I've done my homework on this but haven't been able to find an "Except" logical operator for R and really need something like that in my code. My input data is a .csv containing integers and null values with 12 columns and 1440 rows.
oneDayData <- read.csv("data.csv") # Loading data
oneDayMatrix <- data.matrix(oneDayData, rownames.force = NA) #turning data frame into a matrix
rowBefore <- data.frame(oneDayData[i-1,10], stringsAsFactors=FALSE) # Creating a variable to be used in the if statement, represents cell before the cell in the loop
ctr <- 0 # creating a counter and zeroing it
for (i in 1:nrow(oneDayMatrix)) {
if ((oneDayMatrix[i,10] == -180) & (oneDayMatrix[i,4] == 0)) { # Makes sure that there is missing data matched with a zero in activityIn
impute1 <- replace(oneDayMatrix[ ,10], oneDayMatrix[i,10], rowBefore)
ctr <- (ctr + 1) # Populating the counter with how many rows get changed
}
else{
print("No data fit this criteria.")
}
}
print(paste(ctr, "rows have been changed.")) # Printing the counter and number of rows that got changed
I would like to add some kind of EXCEPT condition to my if statement or equivalent that says something like: employ the two previous conditions (see if statement in code) EXCEPT when oneDayMatrix[i-1, 4] > 0. I would really appreciate any help with this and thank you in advance!
"Except" is equivalent to "if not". The "not" operator in R is !. So to add that oneDayMatrix[i-1, 4] > 0 exception, you just need to modify your if statement as follows:
if ((oneDayMatrix[i, 10] == -180) &
(oneDayMatrix[i, 4] == 0) &
!(oneDayMatrix[i-1, 4] > 0)) { ... }
or equivalently:
if ((oneDayMatrix[i, 10] == -180) &
(oneDayMatrix[i, 4] == 0) &
(oneDayMatrix[i-1, 4] <= 0)) { ... }
This goes on top of a couple fixes that need to be made to your code:
as I pointed out, rowBefore is not defined properly: it is written in terms of i, which does not exist yet at that point. Inside your for loop, just replace rowBefore with oneDayMatrix[i-1, 10]
as @noah pointed out, you need to start your loop at the second index: for (i in 2:nrow(oneDayMatrix)).
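Putting the "except" condition and both fixes together, here is a minimal sketch on a made-up 5-row matrix. Only columns 4 and 10 are filled in, and the values are hypothetical. It assigns the previous row's value directly with oneDayMatrix[i, 10] <- oneDayMatrix[i - 1, 10] instead of calling replace() as in the question. This is a swap: replace()'s second argument is an index, so passing the value oneDayMatrix[i,10] (here -180) would not target row i.

```r
# Hypothetical toy data: 5 rows, 10 columns; only columns 4 and 10 matter here
oneDayMatrix <- matrix(0, nrow = 5, ncol = 10)
oneDayMatrix[, 10] <- c(12, -180, 15, -180, 20)  # -180 marks missing data
oneDayMatrix[, 4]  <- c(0, 0, 3, 0, 0)           # activityIn

ctr <- 0
for (i in 2:nrow(oneDayMatrix)) {          # start at 2 so that i - 1 is a valid row
  if ((oneDayMatrix[i, 10] == -180) &
      (oneDayMatrix[i, 4] == 0) &
      !(oneDayMatrix[i - 1, 4] > 0)) {     # "except" = "and not"
    oneDayMatrix[i, 10] <- oneDayMatrix[i - 1, 10]  # copy the previous row's value
    ctr <- ctr + 1
  }
}
print(oneDayMatrix[, 10])  # 12 12 15 -180 20: row 4 is skipped by the exception
print(paste(ctr, "rows have been changed."))
```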

if function and length of the logical vector

I have a data frame where the dates are given as hydrological years (October to September). To change this, I am trying to use an if statement:
if(cet$month== 10|cet$month==11|cet$month==12)
cet$year <- substr(as.character(cet[,2]),1,4) else
cet$year <- substr(as.character(cet[,2]),6,9)
but I get an error:
the condition has length > 1 and only the first element will be used
Reading the "if" help file I realized that the condition has to be a length-one logical vector. Is there no way of using an "or" with an "if"? All I want is to apply that expression if the month is October, November or December.
ifelse is the vectorised version. You can also use %in% to reduce the number of statements.
cet$year <- ifelse(cet$month%in%(10:12), substr(as.character(cet[,2]),1,4), substr(as.character(cet[,2]),6,9))
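A runnable sketch with a made-up cet. The column layout is an assumption inferred from the substr() positions: column 1 holds the month and column 2 holds hydrological years written like "1990/1991".

```r
# Hypothetical data: column 2 is the hydrological year as "YYYY/YYYY"
cet <- data.frame(month = c(10, 11, 1, 9),
                  hyear = c("1990/1991", "1990/1991", "1990/1991", "1990/1991"))

# %in% 10:12 gives one TRUE/FALSE per row; ifelse() picks the matching substring
cet$year <- ifelse(cet$month %in% 10:12,
                   substr(as.character(cet[, 2]), 1, 4),   # Oct-Dec: first year
                   substr(as.character(cet[, 2]), 6, 9))   # Jan-Sep: second year
print(cet$year)   # "1990" "1990" "1991" "1991"
```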
Ok, here's a reproducible example that should help to clarify things:
# generate some vector
x <- c(1,2,4,4,5,5,6,6,6)
# have a check using OR, return values
x[x == 2 | x == 1]
## or return TRUE / FALSE
(x == 2 | x == 1)
or check ?ifelse
EDIT: Note that for characters you need to use "", like x == "yourchars" | x == "someotherchars"
Here is also a simple reference on how to work with operators: QuickR
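Inside if(), the OR instruction is the double pipe:
| => || in the if()
Note, however, that || only handles a single TRUE/FALSE value (an operand of length > 1 is an error since R 4.3.0), so it does not fix the length > 1 problem here; for a whole column, use the vectorised ifelse.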
