I am trying to understand the for loop and the if statement in R, so I ran some code that says: if the sum of a row is bigger than 3, return 1, else 0.
Here is the code:
set.seed(2)
x = rnorm(20)
y = 2*x
a = cbind(x,y)
hold = c()
Now comes the if-statement
for (i in nrow(a)) {
if ([i,1]+ [i,2] > 3) hold[i,] == 1
else ([i,1]+ [i,2]) <- hold[i,] == 0
return (cbind(a,hold)
}
I know that combining for and if may not be ideal, but I just want to understand what is going wrong. Please keep the explanation at a beginner level :) Thanks
You've got some issues. @mnel covered a better way to go about this; I'll focus on understanding what went wrong in this attempt (but don't do it this way at all, use a vectorized solution).
Line 1
for (i in nrow(a)) {
a has 20 rows, so nrow(a) is 20. Thus your code is equivalent to for (i in 20), which means i only ever takes the value 20 and the loop body runs just once.
Fix:
for (i in 1:nrow(a)) {
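To see the difference concretely, this small sketch counts how many times each loop body runs on a 20-row matrix built like the one in the question. The zero-row check at the end is an extra: it suggests why seq_len(nrow(a)) is a safer habit than 1:nrow(a) when a might be empty.

```r
set.seed(2)
x <- rnorm(20)
a <- cbind(x, y = 2 * x)   # 20 rows, as in the question

runs_wrong <- 0
for (i in nrow(a)) runs_wrong <- runs_wrong + 1     # i takes only the value 20

runs_right <- 0
for (i in 1:nrow(a)) runs_right <- runs_right + 1   # i takes 1, 2, ..., 20

c(runs_wrong, runs_right)

# With zero rows, 1:nrow() becomes 1:0, i.e. c(1, 0), so a loop would still run twice;
# seq_len(nrow()) gives an empty sequence instead.
empty <- a[0, , drop = FALSE]
length(1:nrow(empty))          # 2
length(seq_len(nrow(empty)))   # 0
```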
Line 2
if ([i,1]+ [i,2] > 3) hold[i,] == 1
On its own, [i,1] isn't anything (R stops with a syntax error): it's the ith row and first column of... nothing. You need to reference your data: a[i,1]
You initialized hold as a vector, c(), so it only has one dimension, not rows and columns. So we want to assign to hold[i], not hold[i,].
== is used for equality testing; = or <- are for assignment. Right now, if the > 3 condition is met, you check whether hold[i,] is equal to 1 (and do nothing with the result).
Fix:
if (a[i,1]+ a[i,2] > 3) hold[i] <- 1
Line 3
else ([i,1]+ [i,2]) <- hold[i,] == 0
As above, this mixes up assignment and equality testing. (Here you used an arrow assignment, but put it in the wrong place, as if you were trying to assign to the else.)
else runs whenever the if condition isn't met, so you don't need to repeat the condition.
Fix:
else hold[i] <- 0
Fixed code together:
for (i in 1:nrow(a)) {
if (a[i,1] + a[i,2] > 3) hold[i] <- 1
else hold[i] <- 0
}
You aren't using curly braces for your if and else expressions. They are not required for single-line expressions (if something, do this one line). They are required for multi-line expressions (if something, do a bunch of stuff), but I think they're a good idea to use regardless. Also, in R it's good practice to put the else on the same line as the } that closes the preceding if. Inside a for loop or a function it doesn't matter, but at the top level R treats the if as finished at the end of its line and then fails on the dangling else, so it's good to get in the habit of always doing it. I would recommend this reformatted code:
for (i in 1:nrow(a)) {
if (a[i, 1] + a[i, 2] > 3) {
hold[i] <- 1
} else {
hold[i] <- 0
}
}
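Putting the fixes together with the question's setup gives a self-contained run. This sketch pre-allocates hold with numeric(nrow(a)) instead of growing c() (a small change from the question), binds the result onto a with cbind(), which is what the question's return(cbind(a, hold)) line was aiming for, and checks that the loop agrees with the vectorized ifelse() version.

```r
set.seed(2)
x <- rnorm(20)
y <- 2 * x
a <- cbind(x, y)

hold <- numeric(nrow(a))   # one slot per row, filled in by the loop
for (i in 1:nrow(a)) {
  if (a[i, 1] + a[i, 2] > 3) {
    hold[i] <- 1
  } else {
    hold[i] <- 0
  }
}

result <- cbind(a, hold)   # columns x, y, hold
head(result)

# The loop gives the same 1/0 vector as the vectorized ifelse()
identical(hold, ifelse(a[, 1] + a[, 2] > 3, 1, 0))
```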
Using ifelse
ifelse() is a vectorized if-else statement in R. It is appropriate when you want to test a vector of conditions and get a result out for each one. In this case you could use it like this:
hold <- ifelse(a[, 1] + a[, 2] > 3, 1, 0)
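ifelse will take care of the looping for you. If you want the result as a column alongside your data, you can add it directly (no need to initialize hold first). Note that a is a matrix (built with cbind), and $ only works on lists and data frames, so a$hold won't do what you want; bind the column on instead:
a <- cbind(a, hold = ifelse(a[, 1] + a[, 2] > 3, 1, 0))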
Such operations in R are nicely vectorised.
You haven't included a reference to the dataset you wish to index with your call to [ (e.g. a[i,1]).
Using rowSums (as.numeric() turns TRUE/FALSE into 1/0):
h <- as.numeric(rowSums(a) > 3)
I am going to assume that you are new to R and trying to learn about the basic function of the for loop itself. R has fancy functions called "apply" functions (apply(), sapply() and friends) that run a function over each row, column, or element of an object. I am not going to talk about these.
You want to do the following on each row of the array:
1. Sum the elements of the row.
2. Test whether the sum is greater than 3.
3. Return a value of 1 or 0 representing the result of step 2.
For 1, luckily sum() is a built-in function. It pays off to check out the built-in functions of every programming language, because they save you time. To sum the elements of a row, just use sum(a[row_number, ]).
For 2, you are evaluating the logical statement "is x > 3?", where x is the result from 1. The > 3 comparison returns TRUE or FALSE. The logical expression is a fancy "if then" statement without the "if then".
> 4>3
[1] TRUE
> 2>3
[1] FALSE
For 3, a TRUE or FALSE value is called a "logical" value in R, and a 1 or 0 value is a "numeric" value. By converting the logical into a numeric with as.numeric(), you change TRUE to 1 and FALSE to 0.
> class(4>3)
[1] "logical"
> as.numeric(4>3)
[1] 1
> class(as.numeric(4>3))
[1] "numeric"
In R, a for loop has a counter and a body, and it walks through the elements of a vector: the counter takes each value of the vector in turn, and the body runs once for each value. Here the vector is 1:nrow(a), so you start at the first row and go to the last row. Putting all the elements together looks like this:
for (i in 1:nrow(a)){
hold[i] <- as.numeric(sum(a[i,])>3)
}
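As a self-contained check of the loop above, this sketch rebuilds a 20-row matrix like the question's, runs the sum()-based loop, and compares it with the vectorized as.numeric(rowSums(a) > 3). Pre-allocating hold with numeric() is an assumption added here so the snippet runs on its own.

```r
set.seed(2)
x <- rnorm(20)
a <- cbind(x, y = 2 * x)

hold <- numeric(nrow(a))
for (i in 1:nrow(a)) {
  # sum(a[i, ]) adds up every column of row i, > 3 gives TRUE/FALSE,
  # and as.numeric() turns that into 1/0
  hold[i] <- as.numeric(sum(a[i, ]) > 3)
}

# Vectorized equivalent: rowSums() sums all rows at once
hold_vec <- as.numeric(rowSums(a) > 3)
all.equal(hold, hold_vec)
```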
I am trying to fix dates (years) using a function:
change_century <- function(x){
a <- year(x)
ifelse(test = a >2020,yes = year(x) <- (year(x)-100),no = year(x) <- a)
return(x)
}
The function works for a specific row, or in a loop over one column (here, date of birth):
for (i in c(1:nrow(Df))){
Df_recode$DOB[i] <- change_century(Df$DOB[i])
}
Then I tried to use mutate/across:
Df_recode <- Df %>% mutate(across(list_variable_date,~change_century(.)))
It does not work. Is there something I am getting wrong? Thank you!
Try:
change_century <- function(x){
a <- year(x)
newx <- ifelse(test = a > 2020, yes = a - 100, no = a)
return(newx)
}
(Frankly, using newx as temporary storage and then returning it was done solely to make minimal changes to your code. In general, one does not need return here; in fact it adds an unnecessary function call to the evaluation stack. I would tend to have two lines in that function: a <- year(x) and ifelse(...), without assignment. By default, R returns the value of the last expression, which here is the result of ifelse, and that is what we want. Assigning it to newx and then calling return(newx), or even just ending with newx, has exactly the same effect.) Note that, unlike your original, this version returns the corrected year as a number rather than a modified date.
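If you want to keep whole dates rather than just the year, one option is a base-R sketch (no lubridate needed) that rebuilds each affected date with the same month and day, 100 years earlier. It assumes the column is already of class Date (convert with as.Date() first if it is character), and change_century_date is a hypothetical name.

```r
# Hypothetical helper: move dates whose year is after 2020 back by 100 years,
# keeping the Date class. Assumes x is a Date vector.
change_century_date <- function(x) {
  yr <- as.integer(format(x, "%Y"))        # calendar year of each date
  fix <- !is.na(yr) & yr > 2020            # which dates lie in the "future"
  x[fix] <- as.Date(paste0(yr[fix] - 100,  # same month and day, 100 years earlier
                           format(x[fix], "-%m-%d")))
  x                                        # the last expression is the return value
}

dob <- as.Date(c("2045-03-01", "1999-12-31", NA))
change_century_date(dob)
```

Here the result holds 1945-03-01, 1999-12-31 and NA. Because the function works on whole vectors, it should slot into the same mutate(across(...)) call used for change_century.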
Rationale
ifelse should not contain variable assignments. That's not to say that it is a syntax error (it is not), but it runs counter to its intent. You are asking the function to go through each condition found in test= and return a value based on it. yes= and no= are evaluated as whole vectors, not element by element (yes= whenever any element of test= is TRUE, no= whenever any is FALSE), and then ifelse joins them together as needed.
For demonstration,
ifelse(test = c(TRUE, FALSE, TRUE), yes = 1:3, no = 11:13)
The return value is something like:
c(
if (test[1]) yes[1] else no[1],
if (test[2]) yes[2] else no[2],
if (test[3]) yes[3] else no[3]
)
# c(1, 12, 3)
To capture the zipped-together yeses and nos, c(1, 12, 3), one must assign the return value of ifelse itself, not assign inside the call to ifelse.
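The element-by-element expansion above can be run as-is once test, yes and no are defined. This small check uses the values from the demonstration call and confirms that the expansion matches what ifelse() returns, and that the result is only available by capturing that return value.

```r
test <- c(TRUE, FALSE, TRUE)
yes  <- 1:3
no   <- 11:13

# What ifelse() computes, spelled out one element at a time
by_hand <- c(
  if (test[1]) yes[1] else no[1],
  if (test[2]) yes[2] else no[2],
  if (test[3]) yes[3] else no[3]
)

vectorized <- ifelse(test, yes, no)   # capture the return value itself

by_hand                               # 1 12 3
identical(by_hand, vectorized)
```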
Another point that may be relevant: ifelse(cond, yes, no) is not at all a shortcut for if (cond) { yes } else { no }. Some key differences:
In if, the condition must always be exactly length 1, no more, no less.
In R < 4.2, length 0 gives the error "argument is of length zero" (see ref), while length 2 or more produces the warning "the condition has length > 1 and only the first element will be used" (see ref1, ref2).
In R >= 4.2, both cases produce an error (no warnings).
ifelse is intended to be vectorized, so its test= can be any length. The result has the same length as test=; yes= and no= are recycled (or truncated) to that length without any warning, so they should normally be length 1 or the same length as test=.
if evaluates only the branch that is taken, and inside its condition the && and || operators short-circuit, meaning that if (TRUE || stop("quux")) 1 will never attempt to evaluate stop. This can be very useful when one condition would fail (logically or with a literal error) on a NULL object, such as if (!is.null(quux) && quux > 5) ...
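Conversely, ifelse always evaluates test= in full and treats yes= and no= as whole vectors: whenever test= contains both TRUE and FALSE, both are computed for every element, including the elements that end up discarded, so there is no per-element short-circuiting.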
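A small demonstration of the difference: sqrt() of a negative number warns "NaNs produced", so it shows which values actually get computed. The safe_sqrt() helper is hypothetical, written just for this comparison.

```r
x <- c(4, -1)

# ifelse(): yes = sqrt(x) is computed for the whole vector, including -1,
# so R warns "NaNs produced" even though that element is then discarded
res <- ifelse(x >= 0, sqrt(x), NA)
res                                   # 2 NA

# if: only the branch that is taken is evaluated, so nothing warns here
safe_sqrt <- function(v) if (v >= 0) sqrt(v) else NA
sapply(x, safe_sqrt)                  # 2 NA

# When every element of test is TRUE, no = is never evaluated at all
ifelse(c(TRUE, TRUE), 1, stop("not reached"))   # 1 1
```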
Not sure why I am getting this error. Can we not execute this statement?
asd <- c()
asd1 <- c()
if(asd == asd1)
{
pr <- 0
} else{
pr <- 1
}
Error in if (asd == asd1) { : argument is of length zero
Can we not compare NULL values here? asd is empty (0) and asd1 is also empty (0). Also, I do not want to use length(asd) == length(asd1). But asd == asd1 should work, right?
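No: you are trying to compare nothing to nothing. asd == asd1 returns logical(0), a zero-length result, and if() needs exactly one TRUE or FALSE. If you don't want to use length but actually want to check whether both vectors are the same: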
identical(asd,asd1)
Use the all() function:
asd <- c()
asd1 <- c()
if(all(asd == asd1)){
pr <- 0
} else{
pr <- 1
}
> pr
[1] 0
Like most R functions, == is vectorized: `==`(x, y) (that is, x == y) compares the elements of x and y pairwise and returns a logical vector; if either input has length zero, so does the result.
As stated by dvd280, identical() tests strict equality between objects: everything must be exactly the same (every element of the structure, plus attributes such as names and class).
Note: it doesn't check that the two objects are stored at the same place in memory.
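all.equal() tests near equality: numbers are compared with a tolerance (set via its tolerance argument), and it returns either TRUE or a character vector describing the differences, so wrap it in isTRUE() when using it in an if.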
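To make the three kinds of comparison concrete, this sketch uses the question's empty vectors plus two extra cases chosen for illustration: one shows that all() of an empty comparison is TRUE even when only one side is empty, the other that identical() is strict about floating-point rounding where all.equal() is not.

```r
asd  <- c()
asd1 <- c()

asd == asd1                       # logical(0): nothing that if() can use
identical(asd, asd1)              # TRUE: both are NULL

# all() of an empty comparison is TRUE, even when only one side is empty
all(c(1, 2) == numeric(0))        # TRUE
identical(c(1, 2), numeric(0))    # FALSE

# all.equal() allows for rounding error; identical() does not
isTRUE(all.equal(0.1 + 0.2, 0.3)) # TRUE
identical(0.1 + 0.2, 0.3)         # FALSE
```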
Vectorized logic is common in R; for example,
c(TRUE,TRUE)&c(TRUE,FALSE)
returns
c(TRUE,FALSE)
That is fine, because R is a language made for statistics: it works on vectors rather than scalar variables; in fact, in R a single value is just a length-1 vector.
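A short sketch of the point above: a "single number" in R is a length-1 vector, & combines whole vectors element by element, and && is meant for the single TRUE/FALSE values that if() needs (recent R versions signal an error if && or || gets a longer vector).

```r
x <- 5
length(x)                        # 1: a single number is a length-1 vector
is.vector(x)                     # TRUE

# & works element by element and returns a vector
c(TRUE, TRUE) & c(TRUE, FALSE)   # TRUE FALSE

# && combines single TRUE/FALSE values, the kind if() expects
TRUE && FALSE                    # FALSE
```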
I'm trying to write a function that identifies if a number within a numerical vector is odd or even. The numerical vector has a length of 1000.
I know that the for loop works fine, and I just wanted to generalize it into a function that takes a vector of any length:
out<-vector()
f3<- function(arg){
for(i in 1:length(arg)){
if((arg[i]%%2==0)==TRUE){
out[i]<-1
}else{out[i]<-0
}
}
}
When run within a function, however, it just returns NULL. Why is that, and what do I need to do to make the function work with any numerical vector?
As already mentioned by PKumar in the comments: your function doesn't return anything, which means the vector out exists only in the environment of your function.
To change this, you can add return(out) at the end of your function. You should also create out inside the function, before the loop. So your function would look like the one outlined below.
Note that I assume you want to pass a vector of a certain length to your function and get back a vector of the same length, containing 1 for even numbers and 0 for odd numbers. f3(c(1,1,2)) would return 0 0 1.
f3 <- function(arg){
out <- vector(length = length(arg), mode = "integer")
for(i in seq_along(arg)){ # seq_along() also copes with a zero-length arg, unlike 1:length(arg)
if((arg[i]%%2==0)==TRUE){ # note that arg[i]%%2==0 will suffice
out[i]<-1
} else {out[i]<-0
}
}
return(out) # calling out without return is enough and more in line with the tidyverse style guide
}
However, as sebastiann also pointed out in the comments, some_vector %% 2 yields almost the same result. The difference is that odd numbers yield 1 and even numbers 0. You can put this into a function too, subtracting 1 from arg to swap the 0s and 1s:
f3 <- function(arg){
(arg-1) %% 2
}
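A quick way to convince yourself that the loop and the vectorized one-liner agree is to run both on the same input. The names f3_loop and f3_vec are hypothetical, chosen so that both versions of f3 can coexist in one session.

```r
# Loop version: 1 for even, 0 for odd
f3_loop <- function(arg) {
  out <- numeric(length(arg))
  for (i in seq_along(arg)) {
    if (arg[i] %% 2 == 0) out[i] <- 1 else out[i] <- 0
  }
  out
}

# Vectorized version: (arg - 1) %% 2 flips the odd = 1 / even = 0 pattern of arg %% 2
f3_vec <- function(arg) (arg - 1) %% 2

v <- c(1, 1, 2, 7, 10)
f3_loop(v)   # 0 0 1 0 1
f3_vec(v)    # 0 0 1 0 1
```

Because R's %% returns a result with the sign of the divisor, the one-liner also handles negative numbers.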
A few things to note about your code:
A function returns the value of its last expression; in yours that is the for loop, which returns NULL, so return out explicitly.
The logical if((arg[i]%%2==0)==TRUE) is redundant; if(arg[i]%%2==0) is enough. But if you call the function with the single number 1000, as below, arg[i] does not exist (it is NA) for any i > 1.
length(arg) is then length(1000), which returns 1.
You should replace arg[i] with i, and let i take all the values from 1:1000, as follows:
out <-vector()
f3 <- function(arg){
for(i in 1:arg){
if(i %% 2 == 0){ # test the counter i itself, since arg is a single number
out[i] <- 1
}
else{
out[i] <- 0
}
}
return(out)
}
f3(1000)
I have a command that generates a variable every 10 loops in R (index1, index2, index3... and so on). The command I have is functional, but I am thinking of a smarter way to write this command. Here's what my command looks like:
for (counter in 1:10){
for (i in 1:100){
if (counter == 1){
index1 <- data1 ## some really long command here, I just changed it to this simple command to illustrate the idea
}
if (counter == 2){
index2 <- data2
}
.
.
.
# until I reach index10
} ## indexing closure
} ## counter closure
Is there a way to write this without having to write the conditional if commands? I would like to generate index1, index2, ... I am sure there is some easy way to do this, but I just cannot think of it.
Thanks.
What you need inside the inner loop is the modulo operator %%, together with the assign() function. For example, 100 %% 10 returns 0, 101 %% 10 returns 1, and 92 %% 10 returns 2; in other words, if i is a multiple of 10, i %% 10 is 0.
Note: You no longer need the outer loop used in your example.
So to create a variable every 10 iterations, do something like this:
for(i in 1:100){
#check if i is multiple of 10
if(i%%10==0){
myVar<-log(i)
assign(paste("index",i/10,sep=""), myVar)
}
}
ls() #shows that index1, index2, ...index10 objects have been created.
index1 #returns 2.302585
update:
Alternatively, you can store results in a vector
index<-vector(length=10)
for(i in 1:100){
#check if i is multiple of 10
if(i%%10==0){
index[i/10]<-log(i)
}
}
index #returns a vector with 10 elements, each a result at end of an iteration that is a multiple of 10.
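If the per-iteration work is itself vectorized, as log() is in this illustration, the loop and the %% test can be dropped by generating the multiples of 10 directly. This assumes the real "long command" can take a vector; if it cannot, the loop above is the way to go, and a list (index <- vector("list", 10)) can hold results that are not single numbers.

```r
# Loop version from the answer above
index <- numeric(10)
for (i in 1:100) {
  if (i %% 10 == 0) {
    index[i / 10] <- log(i)   # i / 10 runs 1, 2, ..., 10
  }
}

# Vectorized version: 10, 20, ..., 100 in one go
index_vec <- log(seq(10, 100, by = 10))

all.equal(index, index_vec)
```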
FYI, I'm new to using R so my code is likely quite clunky. I've done my homework on this but haven't been able to find an "Except" logical operator for R and really need something like that in my code. My input data is a .csv containing integers and null values with 12 columns and 1440 rows.
oneDayData <- read.csv("data.csv") # Loading data
oneDayMatrix <- data.matrix(oneDayData, rownames.force = NA) #turning data frame into a matrix
rowBefore <- data.frame(oneDayData[i-1,10], stringsAsFactors=FALSE) # Creating a variable to be used in the if statement, represents cell before the cell in the loop
ctr <- 0 # creating a counter and zeroing it
for (i in 1:nrow(oneDayMatrix)) {
if ((oneDayMatrix[i,10] == -180) & (oneDayMatrix[i,4] == 0)) { # Makes sure that there is missing data matched with a zero in activityIn
impute1 <- replace(oneDayMatrix[ ,10], oneDayMatrix[i,10], rowBefore)
ctr <- (ctr + 1) # Populating the counter with how many rows get changed
}
else{
print("No data fit this criteria.")
}
}
print(paste(ctr, "rows have been changed.")) # Printing the counter and number of rows that got changed
I would like to add some kind of EXCEPT condition to my if statement or equivalent that says something like: employ the two previous conditions (see if statement in code) EXCEPT when oneDayMatrix[i-1, 4] > 0. I would really appreciate any help with this and thank you in advance!
"Except" is equivalent to "if not". The "not" operator in R is !. So to add that oneDayMatrix[i-1, 4] > 0 exception, you just need to modify your if statement as follows:
if ((oneDayMatrix[i, 10] == -180) &
(oneDayMatrix[i, 4] == 0) &
!(oneDayMatrix[i-1, 4] > 0)) { ... }
or equivalently:
if ((oneDayMatrix[i, 10] == -180) &
(oneDayMatrix[i, 4] == 0) &
(oneDayMatrix[i-1, 4] <= 0)) { ... }
This goes on top of a couple of fixes that need to be made to your code:
As I pointed out, rowBefore is not defined properly: it is defined in terms of i, which does not exist yet. Inside your for loop, just replace rowBefore with oneDayMatrix[i-1, 10].
As @noah pointed out, you need to start your loop at the second index, so that i-1 is a valid row: for (i in 2:nrow(oneDayMatrix)).
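To see the "except" condition and the loop starting at 2 working together, here is a sketch on a small made-up matrix. The data are hypothetical (column 4 stands in for activityIn, column 10 uses -180 for missing), and copying the previous row's value forward is an assumption about the intended imputation, based on the question's rowBefore.

```r
# Hypothetical toy data: 5 rows, 12 columns, only columns 4 and 10 matter here
m <- matrix(0, nrow = 5, ncol = 12)
m[, 10] <- c(12, -180, -180, 7, -180)
m[, 4]  <- c(0,     0,    0, 3,    0)

ctr <- 0
for (i in 2:nrow(m)) {               # start at 2 so that i - 1 is a valid row
  if ((m[i, 10] == -180) &
      (m[i, 4] == 0) &
      !(m[i - 1, 4] > 0)) {          # "except when the previous row's column 4 is > 0"
    m[i, 10] <- m[i - 1, 10]         # carry the previous value forward
    ctr <- ctr + 1
  }
}
ctr       # 2
m[, 10]   # 12 12 12 7 -180
```

Row 5 is left alone because the row before it has 3 in column 4, which is exactly the exception being added.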