Loop for deleting rows with column criteria - r

I have a problem with my loop. It deletes only some of the rows that have 0 or NA values in my desired column, and I don't know why:
for (i in 1:105) {
for (j in 1:l[i+1]){
if(m[[i]][j,12]==0 | is.na(m[[i]][j,12])) {
m[[i]]=m[[i]][-j,]
}
}
}
Searching on the web I saw that maybe I could use apply function... something like:
for( i in 1:105){m[[i]]<-m[[i]][!apply(is.na(m[[i]]), 1, any),]}
for( i in 1:105){
as.null(0)
m[[i]]<-m[[i]][!apply(is.null(m[[i]]), 1, any),]
}
This throws a dim(x) error... I want to treat zero values as NULL.
I was thinking of something like the following, but clearly it isn't right; it's just the idea. I really don't know how to use the apply function well.
for( i in 1:105){as.null(0) m[[i]]<-!apply(m[[i]],1,is.null(m[[i]])) }
Thanks a lot for your useful help !

You use apply to apply a function over a margin of an array, but I think it is not the best idea here, since you only need to subset the matrix properly. Let's focus on just one matrix m.
ind = m[,12] == 0 | is.na(m[,12])
ind will be TRUE where appropriate, and then you can do
m = m[!ind, ] # m is a matrix, not the list
to remove the rows. You can put this inside the loop, or use lapply (to apply a function over a list), but first you need a function to be applied to every element in the list (all 105 of your matrices), so
removeRows = function(m) {
ind = m[,12] == 0 | is.na(m[,12])
m = m[!ind, ]
return(m)
}
m = lapply(m, FUN=removeRows)
That should work.
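As for why the original loop only deletes some rows: once row j is removed, the row that used to be j+1 moves up to position j, and the loop then moves on to j+1, so the second of two consecutive 0/NA rows is never tested. Subsetting with a logical index avoids this. Here is a minimal sketch on made-up data (the helper make_mat and the example values are assumptions for illustration only); it also adds drop = FALSE, which is not in the answer above, so that a result with a single remaining row stays a matrix.

```r
# Hypothetical helper: build a small 12-column matrix whose 12th column
# holds the values we filter on.
make_mat <- function(col12) {
  mat <- matrix(1, nrow = length(col12), ncol = 12)
  mat[, 12] <- col12
  mat
}

# Two example matrices standing in for the list of 105.
m <- list(make_mat(c(5, 0, 0, NA, 7)), make_mat(c(0, 3)))

removeRows <- function(mat) {
  ind <- mat[, 12] == 0 | is.na(mat[, 12])
  mat[!ind, , drop = FALSE]  # drop = FALSE keeps a one-row result a matrix
}

cleaned <- lapply(m, removeRows)
sapply(cleaned, nrow)
# [1] 2 1
```

Note that the two consecutive zeros in the first matrix are both removed, which is exactly the case the row-by-row loop misses.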

Related

How to write a function containing a 'for' loop that uses a different function within? Applying this function to vectors?

I think I am misunderstanding some fundamental part of how 'for' loops and functions work. This function:
even.odd <- function(x) {
if (x != round(x)) {
y <- NA
} else if (x %% 2 == 0) {
y <- "even"
} else {
y <- "odd"
}
return(y)
}
works perfectly fine, returning "even", "odd", or NA given a number. However, I am given two vectors:
test1 <- c(-1, 0, 2, 5)
test2 <- c(0, 2.7, 9.1)
and need to create a 'for' loop containing the even.odd function and test it using these vectors. I have read all the recommended reading/lecture notes, and tried for hours unsuccessfully to produce the desired result, using empty vectors, indexing new objects, just putting even.odd(num.vec). I'm not sure where I've gone so wrong.
We were given a starting point for the new function:
even.odd.vec <- function(num.vec) {
#write your code here
}
So far this is what I have come up with:
#creating intermediate function
intermediate <- function(num.vec) {
if (even.odd(num.vec) == "odd") {
return("odd")
} else if(even.odd(num.vec) == "even") {
return("even")
} else ifelse(is.na(num.vec), NA, "False")
}
#creating new desired function
even.odd.vec <- function(num.vec) {
for (i in seq_along(num.vec)) {
intermediate(num.vec)
print(intermediate(num.vec))
}
}
The intermediate function was the result of me running into various errors when trying to create a simpler body for even.odd.vec. But now when I try to use even.odd.vec with one of the test vectors I get this error:
the condition has length > 1 and only the first element will be used
the condition has length > 1 and only the first element will be used
the condition has length > 1 and only the first element will be used
the condition has length > 1 and only the first element will be used
I am very stuck at this point and dying to know how to make something like this work. I've had a lot of fun working on it but I think I'm digging myself into a hole and/or making things much more complicated than necessary. Any help is unbelievably appreciated as my professor is out of town and the TA seems overwhelmed.
There are definitely ways to do this without a for loop, but since this exercise explicitly asks for one, we can keep the even.odd function, which works for a single number, and write a new function that loops over the vector and calls even.odd on each element individually.
even.odd.vec <- function(x) {
result <- character(length = length(x))
for (i in seq_along(x)) {
result[i] <- even.odd(x[i])
}
return(result)
}
You can then pass vectors test1, test2 to this function.
even.odd.vec(test1)
#[1] "odd" "even" "even" "odd"
even.odd.vec(test2)
#[1] "even" NA NA
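For comparison, here is one possible loop-free version, sketched with nested ifelse calls. The name even.odd.vectorized is made up for this example and is not part of the exercise; the logic mirrors even.odd but tests the whole vector at once.

```r
# Hypothetical vectorized counterpart of even.odd.vec (name is an assumption).
even.odd.vectorized <- function(num.vec) {
  ifelse(num.vec != round(num.vec),
         NA_character_,                               # non-integers become NA
         ifelse(num.vec %% 2 == 0, "even", "odd"))    # integers are classified
}

test1 <- c(-1, 0, 2, 5)
test2 <- c(0, 2.7, 9.1)

even.odd.vectorized(test1)
# [1] "odd"  "even" "even" "odd"
even.odd.vectorized(test2)
# [1] "even" NA     NA
```

This also shows why the intermediate function in the question warned: if() expects a single TRUE or FALSE, whereas ifelse() accepts a whole vector of conditions.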

Why does this work as a for loop but not within a function?

I'm trying to write a function that identifies if a number within a numerical vector is odd or even. The numerical vector has a length of 1000.
I know that the for loop works fine, and I just wanted to generalize it in the form of a function that takes a vector of any length
out<-vector()
f3<- function(arg){
for(i in 1:length(arg)){
if((arg[i]%%2==0)==TRUE){
out[i]<-1
}else{out[i]<-0
}
}
}
When run within a function, however, it just returns NULL. Why is that, and what do I need to do to make the function work with any numerical vector?
As already mentioned by PKumar in the comments: Your function doesn't return anything, which means, the vector out exists only in the environment of your function.
To change this you can add return(out) to the end of your function. And you should also start your function with creating out before the loop. So your function would look like outlined below.
Note, that I assume you want to pass a vector of a certain length to your function, and get as a result a vector of the same length which contains 1 for even numbers and 0 for odd numbers. f3(c(1,1,2)) would return 0 0 1.
f3 <- function(arg){
out <- vector(length = length(arg), mode = "integer")
for(i in 1:length(arg)){
if((arg[i]%%2==0)==TRUE){ # note that arg[i]%%2==0 will suffice
out[i]<-1
} else {out[i]<-0
}
}
return(out) # a bare out as the last line is enough and more in line with the tidyverse style guide
}
However, as also pointed out by sebastiann in the comments, some_vector %% 2 yields almost the same result. The difference is, that odd numbers yield 1 and even numbers 0. You can also put this into a function and subtract 1 from arg to reverse 0 and 1 :
f3 <- function(arg){
(arg-1) %% 2
}
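As a quick check that the loop and the vectorized approach agree, here is a small sketch. The names f3_loop and f3_vec are assumptions made up for this comparison, and the loop uses seq_along instead of 1:length(arg), which is a swapped-in detail rather than part of the answer above.

```r
# Loop version: 1 for even numbers, 0 for odd numbers.
f3_loop <- function(arg) {
  out <- integer(length(arg))
  for (i in seq_along(arg)) {
    out[i] <- if (arg[i] %% 2 == 0) 1L else 0L
  }
  out
}

# Vectorized version: the comparison works on the whole vector at once.
f3_vec <- function(arg) as.integer(arg %% 2 == 0)

x <- c(1, 1, 2, 10, 7)
f3_loop(x)
# [1] 0 0 1 1 0
identical(f3_loop(x), f3_vec(x))
# [1] TRUE

# seq_along() also handles an empty input, where 1:length(arg) would
# iterate over c(1, 0) instead of not iterating at all.
f3_loop(numeric(0))
# integer(0)
```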
A few things to note about your code:
A function must return something.
The test if((arg[i]%%2==0)==TRUE) is redundant; if(arg[i]%%2==0) is enough. It is still wrong if you call the function with the single number 1000, because arg[i] does not exist for i > 1.
In that case length(arg) is length(1000), which returns 1.
If you want to pass the number 1000 rather than a vector, replace arg[i] with i and let i run over 1:arg, as follows:
f3 <- function(arg){
out <- vector()
for(i in 1:arg){
if(i %% 2 == 0){
out[i] <- 1
}
else{
out[i] <- 0
}
}
return(out)
}
f3(1000)

debug the if statement

I am trying to understand the for loop and the if statement in R, so I wrote some code that says: if the sum of a row is bigger than 3, return 1, otherwise 0:
Here is the code
set.seed(2)
x = rnorm(20)
y = 2*x
a = cbind(x,y)
hold = c()
Now comes the if-statement
for (i in nrow(a)) {
if ([i,1]+ [i,2] > 3) hold[i,] == 1
else ([i,1]+ [i,2]) <- hold[i,] == 0
return (cbind(a,hold)
}
I know that maybe combining for and if may not be ideal, but I just want to understand what is going wrong. Please keep the explanation at a dummy level:) Thanks
You've got some issues. @mnel covered a better way to go about doing this, I'll focus on understanding what went wrong in this attempt (but don't do it this way at all, use a vectorized solution).
Line 1
for (i in nrow(a)) {
a has 20 rows. nrow(a) is 20. Thus your code is equivalent to for (i in 20), which means i will only ever be 20.
Fix:
for (i in 1:nrow(a)) {
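To see the difference concretely, here is a small sketch that records which values of i each loop header visits. The matrix below is made-up data with the same number of rows, and seq_len(nrow(a)) is used as an equivalent of 1:nrow(a) that also behaves sensibly for a matrix with zero rows.

```r
a <- cbind(x = 1:20, y = 2 * (1:20))  # hypothetical 20-row matrix

# for (i in nrow(a)) loops over the single value 20.
visited_wrong <- integer(0)
for (i in nrow(a)) visited_wrong <- c(visited_wrong, i)

# for (i in seq_len(nrow(a))) loops over 1, 2, ..., 20.
visited_right <- integer(0)
for (i in seq_len(nrow(a))) visited_right <- c(visited_right, i)

visited_wrong
# [1] 20
length(visited_right)
# [1] 20
```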
Line 2
if ([i,1]+ [i,2] > 3) hold[i,] == 1
[i,1] isn't anything, it's the ith row and first column of... nothing. You need to reference your data: a[i,1]
You initialized hold as a vector, c(), so it only has one dimension, not rows and columns. So we want to assign to hold[i], not hold[i,].
== is used for equality testing. = or <- are for assignment. Right now, if the >3 condition is met, then you check if hold[i,] is equal to 1. (And do nothing with the result).
Fix:
if (a[i,1]+ a[i,2] > 3) hold[i] <- 1
Line 3
else ([i,1]+ [i,2]) <- hold[i,] == 0
As above for assignment vs equality testing. (Here you used an arrow assignment, but put it in the wrong place - as if you're trying to assign to the else)
else happens whenever the if condition isn't met, you don't need to try to repeat the condition
Fix:
else hold[i] <- 0
Fixed code together:
for (i in 1:nrow(a)) {
if (a[i,1] + a[i,2] > 3) hold[i] <- 1
else hold[i] <- 0
}
You aren't using curly braces for your if and else expressions. They are not required for single-line expressions (if something, do this one line). They are required for multi-line expressions (if something, do a bunch of stuff), but I think they're a good idea to use in either case. Also, in R, it's good practice to put the else on the same line as the } that closes the preceding if (inside a for loop or a function it doesn't matter, but at the top level it does, so it's good to get in the habit of always doing it). I would recommend this reformatted code:
for (i in 1:nrow(a)) {
if (a[i, 1] + a[i, 2] > 3) {
hold[i] <- 1
} else {
hold[i] <- 0
}
}
Using ifelse
ifelse() is a vectorized if-else statement in R. It is appropriate when you want to test a vector of conditions and get a result out for each one. In this case you could use it like this:
hold <- ifelse(a[, 1] + a[, 2] > 3, 1, 0)
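ifelse will take care of the looping for you. If you want it as a column in your data, add it directly (no need to initialize first). Since a is a matrix here, bind the new column on with cbind; the a$hold <- ... form is only appropriate when a is a data frame.
a <- cbind(a, hold = ifelse(a[, 1] + a[, 2] > 3, 1, 0))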
Such operations in R are nicely vectorised.
You haven't included a reference to the dataset you wish to index in your calls to [ (e.g. a[i,1]).
Using rowSums:
h <- rowSums(a) > 3
I am going to assume that you are new to R and trying to learn about the basic function of the for loop itself. R has fancy functions called "apply" functions that are specifically for doing basic math on each row of a data frame. I am not going to talk about these.
You want to do the following on each row of the array.
1. Sum the elements of the row.
2. Test that the sum is greater than 3.
3. Return a value of 1 or 0 representing the result of 2.
For 1, luckily "sum" is a built in function. It pays off to check out the built in functions within every programming language because they save you time. To sum the elements of a row, just use sum(a[row_number,]).
For 2, you are evaluating a logical statement "is x >3?" where x is the result from 1. The ">3" statement returns a value of true or false. The logical expression is a fancy "if then" statement without the "if then".
> 4>3
[1] TRUE
> 2>3
[1] FALSE
For 3, a true or false value is a data structure called a "logical" value in R. A 1 or 0 value is a data structure called a "numeric" value in R. By converting the "logical" into a "numeric", you can change the TRUE to 1's and FALSE to 0's.
> class(4>3)
[1] "logical"
> as.numeric(4>3)
[1] 1
> class(as.numeric(4>3))
[1] "numeric"
A for loop has a min, a max, a counter, and an executable. The counter starts at the min, and increments until it goes to the max. The executable will run for each run of the counter. You are starting at the first row and going to the last row. Putting all the elements together looks like this.
for (i in 1:nrow(a)){
hold[i] <- as.numeric(sum(a[i,])>3)
}
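Putting the pieces together with the data from the question, the sketch below runs the loop and then checks it against the vectorized rowSums form mentioned in the other answer. The only additions here are pre-allocating hold with numeric(nrow(a)) and the name hold_vec, both of which are choices made for this example.

```r
set.seed(2)
x <- rnorm(20)
y <- 2 * x
a <- cbind(x, y)

# Loop version: one 0/1 value per row.
hold <- numeric(nrow(a))
for (i in seq_len(nrow(a))) {
  hold[i] <- as.numeric(sum(a[i, ]) > 3)
}

# Vectorized version: rowSums() computes every row sum in one call.
hold_vec <- as.numeric(rowSums(a) > 3)

# Both approaches should give the same vector of 0s and 1s.
identical(hold, hold_vec)
```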

Assign output from a loop to a list

This might be quite a strange question, but...
I have 3 vectors:
myseq=seq(8,22,1)
myseqema3=seq(3,4,1)
myseqema15=seq(10,20,1)
And I want to assign the results to my list:
SLResultsloop=vector(mode="list")
With this loop:
for (i in myseq){
for(j in myseqema3){
for( k in myseqema15){
SLResultsloop[[i-7]]= StopLoss(data=mydata,n=i,EMA3=j,EMA15=k)
names(SLResultsloop[[i-7]])=rep(paste("RSI=",i,"EMA3=",j,"EMA15=",k,sep="|"),
length=length(SLResultsloop[[i-7]]))
}
}
}
The problem is as follows: the loop above overwrites the list elements, because the index i-7 depends only on i, so every combination of j and k writes to the same slot. So does anyone have a clever solution for assigning the loop results to unique list elements (without overwriting previous results)?
One solution could be to assign the output to different lists but it is a bit of an ugly solution...
Best Regards
You can skip the loops entirely by using expand.grid and apply (or something similar):
g <-
expand.grid(myseq = myseq,
myseqema3 = myseqema3,
myseqema15 = myseqema15)
apply(g, 1, function(a) {
StopLoss(data=mydata, n=a[1], EMA3=a[2], EMA15=a[3])
})
You can then build your names for each element of the return value from apply using something like:
paste("RSI=",g[,1], "EMA3=", g[,2],"EMA15=", g[,3], sep="|")
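Here is a self-contained sketch of that idea. Since StopLoss and mydata are not shown in the question, the example uses a hypothetical stand-in, StopLossStub, that simply echoes its parameters; it also uses Map instead of apply, which is a swapped-in choice so that the result is always a plain list with one element per parameter combination.

```r
myseq <- seq(8, 22, 1)
myseqema3 <- seq(3, 4, 1)
myseqema15 <- seq(10, 20, 1)

# Hypothetical stand-in for the asker's StopLoss(); the real function is not shown.
StopLossStub <- function(n, EMA3, EMA15) {
  c(n = n, EMA3 = EMA3, EMA15 = EMA15)
}

# One row per combination of the three parameters.
g <- expand.grid(n = myseq, EMA3 = myseqema3, EMA15 = myseqema15)

# Call the function once per row; each result gets its own list element.
results <- Map(StopLossStub, g$n, g$EMA3, g$EMA15)
names(results) <- paste("RSI=", g$n, "EMA3=", g$EMA3, "EMA15=", g$EMA15, sep = "|")

length(results)  # 15 * 2 * 11 combinations
# [1] 330
names(results)[1]
# [1] "RSI=|8|EMA3=|3|EMA15=|10"
```

Because every combination has its own row in g, nothing is overwritten, and the names can be used to look up a specific parameter set later.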

How to select an element of a list based on its name in a function?

Consider this list:
l <- list(a=1:10,b=1:10,c=rep(0,10),d=11:20)
Then consider this example code (representative of the real larger code).
It simply selects the right element in the list based on name.
Parameters:
object: a list with at most four elements (i.e., sometimes fewer than four). The elements are always called a, b, c and d but do not always appear in the same order in the list.
x: name of element to select (i.e, a,b,c or d)
slct <- function(object,x) {
if (x=="a") {
object$a
} else if (x=="b") {
object$b
} else if (x=="c") {
object$c
} else if (x=="d") {
object$d
}
}
slct(l,"d")
That approach becomes impractical when you have not a mere 4 elements, but hundreds.
Moreover, I cannot select based on a number (e.g., object[[1]]) because the elements don't come in the same order each time. So how can I make the above code shorter?
I was thinking about the macro approach in SAS, but of course this doesn't work in R.
slct <- function(object,x) {
object$x
}
object$a
slct(object=l,x="a")
What do I have to replace object$x with to make it work but with less code than in the above code?
Simply refer to the element in the list using double brackets.
l[['a']]
l[['b']]
etc...
Alternatively, you could use regular expressions to build a function!
select <- function(object, x) {
index <- grep(x, names(object))
return(object[[index]])
}
Hope this helps!
You don't even need the grep here. The function above will result in an error if you try, for example, select(l, "f"), whereas modifying the function in this manner will simply return NULL, which you can check with is.null(.):
select <- function(object, x) {
return(object[[x]])
}
select(l, "a")
# [1] 1 2 3 4 5 6 7 8 9 10
select(l, "f")
# NULL
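The reason object$x does not work in the question is that $ takes the name literally, while [[ evaluates its argument. A short sketch using the list from the question (the variable name x and the multi-element selection at the end are just for illustration):

```r
l <- list(a = 1:10, b = 1:10, c = rep(0, 10), d = 11:20)
x <- "d"

l$x      # looks for an element literally named "x", which does not exist
# NULL
l[[x]]   # uses the value stored in x, i.e. the element named "d"
#  [1] 11 12 13 14 15 16 17 18 19 20

# Single brackets select several elements by name and return a list.
names(l[c("a", "d")])
# [1] "a" "d"
```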
