R - How do you populate a vector inside a function?

I have a complex function, and the function works. Inside it there is a vector called csvcols: the function reads a CSV and builds csvcols as the list of columns in that CSV that have a quality I am looking for.
After the function finishes, csvcols is NULL, even though when I include print(csvcols) inside the function, it shows the columns, e.g. (1, 3, 5, 6, 7).
How can I get a list of all of these columns after I have looped over my strings, as in for(xx in loop){SumofGeneration(xx)}?
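SumofGeneration <- function(xx) {
  csvcols <- c()
  for (i in seq_along(xx)) {
    # keep the column(s) named xx[i] only when func2() returns TRUE
    if (isTRUE(func2(xx[i]))) {
      csvcols <- c(csvcols, which(colnames(filename) %in% xx[i]))
    }
  }
  # drop = FALSE keeps a data frame even when only one column matches
  xxgeneration <- rowSums(filename[, csvcols, drop = FALSE])
  return(xxgeneration)
}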
#Generates list of relevant cols (csvcols vector) in file (filename)
#Sums values in those cols and returns xxgeneration
loop <- c( 14 strings here)
for (xx in loop) {
  SumofGeneration(xx)
}
I have tried saving my csvcols as:
csvcols2 <- c(csvcols2, csvcols)
My loop runs SumofGeneration on the 14 strings in my list (loop). Inside the function, csvcols gets filled, but when the function ends those values are lost and csvcols is NULL.
csvcols2 is always NULL as well. How can I stop losing my values?
I want my output to be a single vector, csvcols2, containing the csvcols values from all 14 calls.
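A minimal sketch of one common fix (not from the original thread): variables created inside an R function are local and disappear when the function returns, so the function can return csvcols alongside the row sums, and the loop can collect it. The data frame filename, the test func2(), and the four-string loop below are hypothetical stand-ins for the question's objects.

```r
# Hypothetical stand-ins for the question's `filename` data frame and `func2`
filename <- data.frame(a = 1:3, b = 4:6, c = 7:9, d = 10:12)
func2 <- function(name) name %in% c("a", "c", "d")  # hypothetical test

SumofGeneration <- function(xx) {
  csvcols <- integer(0)
  for (i in seq_along(xx)) {
    if (isTRUE(func2(xx[i]))) {
      csvcols <- c(csvcols, which(colnames(filename) %in% xx[i]))
    }
  }
  # Return both results: local variables vanish when the function exits
  list(cols = csvcols,
       generation = rowSums(filename[, csvcols, drop = FALSE]))
}

loop <- c("a", "b", "c", "d")  # hypothetical; the question has 14 strings
csvcols2 <- integer(0)
for (xx in loop) {
  res <- SumofGeneration(xx)
  csvcols2 <- c(csvcols2, res$cols)  # accumulate outside the function
}
print(csvcols2)
```

Returning a named list keeps both results available: res$generation still holds the row sums from each call, while csvcols2 grows in the caller's environment, where it survives the loop.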

Related

Delete items from nested lists which contains a specific string

I would like to delete the nested items of an R list that contain the string "available". However, I need to keep the list size (i.e. NULL if all items are deleted). The code below generates a possible input from which all items should be removed:
nested.list <- list()
for(lop in 1:4){
nested.list[[lop]] <- c("available","available")}
The expected output is:
nested.list.out <- list()
for(lop in 1:4){
nested.list.out[lop] <- list(NULL)}
However, if the item is other than available, it should be kept. Let's assume the following input:
nested.list[[1]][[2]] <- "hold"
The expected output would be:
nested.list.out[[1]] <- "hold"
It is important to highlight that the string "hold" is only an example. In my real data set, each item of each nested list can contain arbitrary strings, and all of them should be kept in my output. Is there a clean way to do this?
The simplest way is to use lapply to loop over the list. On each iteration, remove all "available" strings and return NULL if the resulting vector is empty.
nested.list.out <- lapply(nested.list, function(x) {
x <- x[x != "available"] # Remove "available" from vector
if (length(x) < 1){
# Here resulting vector is empty, so return NULL
return(NULL)
} else {
return(x)
}
})
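A quick check of this approach on the question's sample input. The helper name drop_available is hypothetical; its logic is the same as the anonymous function in the lapply call above, just written more compactly.

```r
# Build the question's sample input: four vectors of "available",
# then replace one entry with "hold"
nested.list <- list()
for (lop in 1:4) {
  nested.list[[lop]] <- c("available", "available")
}
nested.list[[1]][[2]] <- "hold"

# Hypothetical helper: drop "available", return NULL if nothing is left
drop_available <- function(x) {
  x <- x[x != "available"]
  if (length(x) == 0) NULL else x
}

# lapply keeps NULL results as list elements, so the length stays 4
nested.list.out <- lapply(nested.list, drop_available)
str(nested.list.out)
```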

Loops and filling empty vectors in R

I'm looking to fill in this empty vector:
empty_vec <- rep("", times=length(number_vec))
with sequential numbers from this loop:
for (numbers in number_vec) {
sqrt <- sqrt(numbers)
empty_vec[numbers] <- sqrt
}
where number_vec is c(16:49).
However, when I do this, the first positions (1-15) of my empty_vec are not filled. Why?
You can address this in two ways:
First, you can create a counter that records which step of the loop you are on, and use it as the index into empty_vec, like this:
empty_vec <- rep("", times=length(number_vec))
counter=0
for (numbers in number_vec) {
counter=counter+1
sqrt<-sqrt(numbers)
empty_vec[counter]<-sqrt
}
Or you can just create an empty vector and concatenate each new value, like this:
empty_vec <- c()
for (numbers in number_vec) {
sqrt<-sqrt(numbers)
empty_vec <- c(empty_vec,sqrt)
}
The way you were doing it, you started filling your vector at the 16th position; that's why the first 15 positions stayed empty.
First you need to understand how a for loop works.
The general form of a for loop is
for(var in seq) expr
var = A syntactical name for a variable
seq = An expression evaluating to a vector (including a list and an expression) or to a pairlist or NULL. A factor value will be coerced to a character vector.
Note that on each iteration, var takes the next value of seq, not its position.
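A tiny illustration of that rule (not part of the original answer): the loop variable receives the elements of the sequence, so seq_along() is what you need when you want positions instead.

```r
number_vec <- 16:49

# The loop variable takes the *values* of number_vec...
seen_values <- c()
for (numbers in number_vec[1:3]) {
  seen_values <- c(seen_values, numbers)
}

# ...while seq_along() yields the *positions* 1, 2, 3, ...
seen_positions <- c()
for (k in seq_along(number_vec[1:3])) {
  seen_positions <- c(seen_positions, k)
}

print(seen_values)     # 16 17 18
print(seen_positions)  # 1 2 3
```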
In your example, you wrote
for (numbers in number_vec)
where,
numbers = Name of the variable
number_vec = c(16:49)
So here, the initial value of numbers will be the first value of number_vec, which is 16.
Later in the loop, the expression
empty_vec[numbers] <- sqrt
where
empty_vec[numbers] refers to position 16 of empty_vec, because numbers starts at 16.
Because you start at position 16, the previous 15 positions remain empty.
A possible solution to your problem:
number_vec = c(16:49)
empty_vec <- rep("", times=length(number_vec))
for (numbers in seq_along(number_vec)) {
sqrt<-sqrt(number_vec[numbers])
empty_vec[numbers]<-sqrt
}
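Not part of either answer, but worth noting: sqrt() is vectorized in R, so the loop can be replaced by a single call. Also, rep("", ...) creates a character vector, so the versions that start from it store the roots as strings such as "4"; preallocating with numeric() keeps them as numbers. A short sketch of both points:

```r
number_vec <- 16:49

# sqrt() is vectorized: one call handles the whole vector, in order
roots <- sqrt(number_vec)

# If a loop is really needed, preallocate a numeric vector (not rep(""),
# which is character and would turn every root into a string)
roots_loop <- numeric(length(number_vec))
for (i in seq_along(number_vec)) {
  roots_loop[i] <- sqrt(number_vec[i])
}

head(roots)
```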

use of double brackets unclear

I'm new to R and am reading The Book of R by Tilman Davies. It gives an example of an externally defined helper function that happens to use double square brackets [[]]. Please explain what helper.call[[1]] and helper.call[[2]] are doing, and why double brackets are used here.
multiples_helper_ext <- function(x = foo, matrix.flags, mat = diag(2)) {
  indexes <- which(matrix.flags)
  counter <- 0
  result <- list()
  for (i in indexes) {
    temp <- x[[i]]
    if (ncol(temp) == nrow(mat)) {
      counter <- counter + 1
      result[[counter]] <- temp %*% mat
    }
  }
  return(list(result, counter))
}
multiples4 <- function(x, mat = diag(2), str1 = "no valid matrices", str2 = str1) {
  matrix.flags <- sapply(x, FUN = is.matrix)
  if (!any(matrix.flags)) {
    return(str1)
  }
  # pass the caller's mat through to the helper
  helper.call <- multiples_helper_ext(x, matrix.flags, mat = mat)
  result <- helper.call[[1]] #I dont understand this use of double bracket
  counter <- helper.call[[2]] #and here either
  if (counter == 0) {
    return(str2)
  } else {
    return(result)
  }
}
foo <- list(matrix(1:4,2,2),"not a matrix","definitely not a matrix",matrix(1:8,2,4),matrix(1:8,4,2))
In R there are two basic kinds of container objects: lists and (atomic) vectors. The items of a list can be any other objects; the items of an atomic vector are single values such as numbers, strings, etc.
To access an item in a list, you use double brackets [[]]. This returns the object stored at that position in the list.
So
x <- 1:10
x is now a vector of integers
L <- list( x, x, "hello" )
L is a list whose first item is the vector x, its second item is the vector x, and its third item is the string "hello".
L[[2]]
This gives back the vector 1:10, which is stored in the 2nd place in L.
L[2]
This is a bit confusing, but this gives back a list whose only item is 1:10, i.e. it contains only L[[2]].
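A short demonstration of the difference between L[[2]] and L[2] (not part of the original answer). The last lines apply the same idea to the question's helper.call, which is list(result, counter); the values used there are hypothetical stand-ins.

```r
x <- 1:10
L <- list(x, x, "hello")

item  <- L[[2]]  # the element itself: an integer vector
slice <- L[2]    # a sub-list of length 1 that wraps that element

class(item)                  # "integer"
class(slice)                 # "list"
identical(slice[[1]], item)  # TRUE

# Hypothetical stand-in for helper.call: list(result, counter)
helper.call <- list(list(diag(2)), 1)
result  <- helper.call[[1]]  # the list of matrices
counter <- helper.call[[2]]  # the number 1
```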
In R, when you want to return multiple values, you usually do this with a list. So you might end your function with
f <- function() {
return( list( result1="hello", result2=1:10) )
}
x = f()
Now you can access the two results with
print( x[["result1"]] )
print( x[["result2"]] )
You can also access named items of a list with $, so instead you can write
print( x$result1 )
print( x$result2 )
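The [[]] syntax extracts a single element from a list, much like list[i] does in Python (except that R counts from 1). Your helper.call is a list (of result and counter), so helper.call[[1]] returns the first element of this list (result) and helper.call[[2]] returns the second (counter).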
Have a look here: Understanding list indexing and bracket conventions in R

R: output from for loop into vector/dataframe - object not found

I am trying to compile two vectors from my for loop, to then cbind into a table. (I was trying to do this all in one step, but because of the below issue I'm trying to simplify.)
I set up three vectors: id_name, count_rows, and id_test.
Going through my new_dat (a pre-existing data frame), I'm setting result to the number of rows where the ID is i.
I'm then printing the output, which works fine.
But when I try to push the values into the vectors, I get Error: object 'id_name' not found. And the same for the other two.
Here's my code:
id_name <- c()
count_rows <- c()
id_test <- c()
for (i in id) {
result <- sum(new_dat$ID == i)
id_test <- c("hello", "world")
id_name <- c(id_name, i)
count_rows <-c(count_rows, result)
print(result)
print(i)
}
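Initialize your vectors differently. See below; you may need to change numeric if your data aren't numeric, and substitute the data's actual length for N (if you know it beforehand). Once the vectors are preallocated, fill them by position (e.g. id_name[k] <- i, with k counting the iterations) rather than with c(), which would append after the N placeholder values.
id_name <- vector('numeric', length=N)
What you are doing now, id_name <- c(), creates a NULL object (of class NULL).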
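A quick check, not part of the original answer and using made-up new_dat and id values: appending to a vector that starts as c() does work, because c(NULL, x) is just x. So an "object not found" error most likely means the initialization lines were not run in the same session or environment as the loop. The second loop shows the preallocated style, filled by position.

```r
# Hypothetical stand-ins for the question's `new_dat` and `id`
new_dat <- data.frame(ID = c(1, 1, 2, 3, 3, 3))
id <- unique(new_dat$ID)

# Growing from c() works: c(NULL, x) is simply x
id_name <- c()
count_rows <- c()
for (i in id) {
  id_name <- c(id_name, i)
  count_rows <- c(count_rows, sum(new_dat$ID == i))
}

# Preallocated version: assign by position instead of appending
id_name2 <- numeric(length(id))
count_rows2 <- numeric(length(id))
for (k in seq_along(id)) {
  id_name2[k] <- id[k]
  count_rows2[k] <- sum(new_dat$ID == id[k])
}

# Combine into a table, as the question intends
counts <- cbind(id_name, count_rows)
print(counts)
```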

Using lapply to subset rows from data frames -- incorrect number of dimensions error

I have a list called "scenbase" that contains 40 data frames, which are each 326 rows by 68 columns. I would like to use lapply() to subset the data frames so they only retain rows 33-152. I've written a simple function called trim() (below), and am attempting to apply it to the list of data frames, but I am getting an error message. The function and my attempt at using it with lapply are below:
trim <- function(i)
{ (i <- i[33:152,]) }
lapply(scenbase, trim)
Error in i[33:152, ] : incorrect number of dimensions
When I try to do the same thing to one of the individual data frames (soil11base.txt) that are included in the list (below), it works as expected:
soil11base.txt <- soil11base.txt[33:152,]
Any idea what I need to do to get the dimensions correct?
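You have two options. You can either
(a) assign to a new list: newList = lapply(scenbase, function(x) { x[33:152, , drop = FALSE] })
(b) use the <<- operator, which assigns your trimmed data in place: lapply(seq_along(scenbase), function(x) { scenbase[[x]] <<- scenbase[[x]][33:152, , drop = FALSE] })
Note that lapply() never modifies scenbase itself; it returns a new list, so without (a) or (b) the trimmed data frames are lost. The <<- operator gets around this by assigning to the first variable of that name it finds in successive parent environments. Separately, an "incorrect number of dimensions" error means that trim() received something that is not two-dimensional, so it is worth checking that every element of scenbase really is a data frame, for example with sapply(scenbase, class).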
Here is some code that reproduces solution (a):
listOfDfs = list()
for (i in 1:10) {
  listOfDfs[[i]] = data.frame(x = sample(letters, 200, replace = TRUE), y = sample(letters, 200, replace = TRUE))
}
choppedList = lapply(listOfDfs, function(x) { x[33:152, , drop = FALSE] })
Here is some code that reproduces solution (b):
listOfDfs = list()
for (i in 1:10) {
  listOfDfs[[i]] = data.frame(x = sample(letters, 200, replace = TRUE), y = sample(letters, 200, replace = TRUE))
}
lapply(seq_along(listOfDfs), function(x) { listOfDfs[[x]] <<- listOfDfs[[x]][33:152, , drop = FALSE] })
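A sketch (not from the original answer) showing one way this exact error can arise: the question's trim() works on every data frame, but a single element without two dimensions makes the whole lapply() call fail. The list dfs and the stray character element "soil11base.txt" are hypothetical.

```r
trim <- function(i) i[33:152, ]

dfs <- list(
  data.frame(x = 1:200, y = 201:400),
  data.frame(x = 1:200, y = 201:400)
)
trimmed <- lapply(dfs, trim)  # works; dfs itself is unchanged

# Hypothetical: a character element sneaks into the list
bad <- c(dfs, list("soil11base.txt"))
err <- tryCatch(lapply(bad, trim),
                error = function(e) conditionMessage(e))

# Locate the elements that are not two-dimensional
not_2d <- which(!vapply(bad, function(el) length(dim(el)) == 2, logical(1)))
print(err)
print(not_2d)
```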
