How to put many loaded numbered dataframes into a list? - r

How can I efficiently refer to a larger range of numbered data frames, i.e. T1, T2, T3, ..., without explicitly having to write them all out? I tried the code below, but it doesn't work:
lrange <- 1:10
tseries <- as.list(paste0("T", lrange,sep = ""))
This gives me a list of character strings ("T1", "T2", ...) rather than the data frames themselves, so it can't be used with, e.g., do.call("rbind", tseries).
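A minimal sketch, assuming the data frames T1, T2, ... already exist in the current (here, global) environment: mget() looks up several objects by name and returns the objects themselves as a named list, which can go straight into do.call(). The three small data frames are made-up stand-ins for the loaded data.

```r
# Made-up stand-ins for the loaded data frames T1, T2, T3
T1 <- data.frame(x = 1:2, y = c("a", "b"))
T2 <- data.frame(x = 3:4, y = c("c", "d"))
T3 <- data.frame(x = 5:6, y = c("e", "f"))

lrange <- 1:3
# mget() returns the objects, not their names, in a list named "T1", "T2", "T3"
tseries <- mget(paste0("T", lrange))

# The list can now be stacked into one data frame
combined <- do.call("rbind", tseries)
nrow(combined)   # 6
```

If the range is not known in advance, mget(ls(pattern = "^T[0-9]+$")) also works, but note that ls() sorts names alphabetically, so T10 comes before T2.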

Related

Output of a function in R change with the number of inputs

I am trying to download data from the USGS website using the dataRetrieval package and a function I created called getstreamflow. The code is:
siteNumber <- c("094985005","09498501","09489500","09489499","09498502")
Streamflow <- sapply(siteNumber, function(siteNumber) tryCatch(getstreamflow(siteNumber), error = function(e) message(paste("Error in station ", siteNumber))))
Streamflow <- Filter(NROW,Streamflow) #to delete empty data frames
This gives the output I want: several data frames inside one list.
However, when I run the same code with more stations in siteNumber, the output changes: instead of producing several data frames inside one list, it generates a separate list for each data frame.
Does anyone know why this happens? It is the same function; only the number of stations in siteNumber changes.
Based on the image of the new output, each element in the list is itself nested in a list. We can extract that list element (of length 1) with [[1]] and then apply Filter:
out <- Filter(NROW, lapply(Streamflow, function(x) x[[1]]))
Because we used NROW, the nested lists passed the test as well: for a list, NROW returns its length (here 1), so every element met the condition and evaluated to TRUE. Also, in the previous step the OP uses sapply, which can simplify the output depending on what each call returns. Use lapply instead (or specify simplify = FALSE).
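A small sketch of both points. It uses a hypothetical fake_streamflow() in place of getstreamflow() (whose code is not shown) so it runs without downloading anything: sapply() turns data frames with equal numbers of columns into a matrix of mode list, lapply() always returns a plain list, and NROW() is 0 for NULL (a failed station) but equals the length for a list.

```r
# Hypothetical stand-in for getstreamflow(): returns a small data frame,
# or throws an error for the site "bad"
fake_streamflow <- function(site) {
  if (site == "bad") stop("no data")
  data.frame(site = site, flow = c(1.5, 2.5))
}

# Every result has 2 columns, so sapply() simplifies them into a
# 2 x 3 matrix of mode "list" instead of a list of data frames
good_sites <- c("s1", "s2", "s3")
simplified <- sapply(good_sites, fake_streamflow)
is.matrix(simplified)   # TRUE

# lapply() always returns a list; failed stations become NULL
all_sites <- c("s1", "bad", "s3")
flows <- lapply(all_sites, function(s)
  tryCatch(fake_streamflow(s), error = function(e) {
    message("Error in station ", s)
    NULL  # message() returns NULL too; made explicit here
  }))
names(flows) <- all_sites

# NROW(NULL) is 0, so Filter() drops the failed station...
flows <- Filter(NROW, flows)
# ...but a list wrapping an empty data frame still counts as 1
NROW(list(data.frame()))   # 1
```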

Problems with renaming columns via variables in R

I'm having issues with a specific problem. I have a dataset of many matrices that all have V1 as their column name, which is essentially NULL. I'm trying to write a loop to replace all of these with column names from a list, but I'm running into some issues.
To break this down to its simplest form, this code doesn't behave as I'd expect:
nameofmatrix <- paste('column_', i, sep = "")
colnames(eval(as.name(nameofmatrix))) <- c("test")
I would expect this to take the object column_1, for example, and (in the second line) set its column name to "test".
I tried breaking this down further. If I run print(eval(as.name(nameofmatrix))) I get the object's columns and rows printed as expected, and if I run print(colnames(eval(as.name(nameofmatrix)))) I get NULL for the column header, as expected (since it was set as V1).
I've even tried typing the name in manually, e.g. colnames(column_1) <- c("test"), and that successfully renames the column. But once the variable is put in place of the literal name, as shown above, it doesn't work the same way. I'm having difficulty finding a way to rename the columns of several matrices after they have been created this way. Does anyone have any advice or suggestions?
The error I get when running this is:
Error in eval(as.name(nameofmatrix)) <- `*vtmp*` : could not find function "eval<-"
We can return the value of an object with get (for multiple objects, use mget to get them as a list), rename the columns of each object in the list, and then update those objects in the global environment with list2env:
list2env(lapply(mget(nameofmatrix), function(x) {
  colnames(x) <- newnames  # newnames: vector of new column names
  x
}), .GlobalEnv)
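A self-contained version of the mget()/list2env() approach, using two made-up one-column matrices and the single new name "test"; newnames needs one entry per column of each matrix.

```r
# Made-up matrices without column names
column_1 <- matrix(1:4, ncol = 1)
column_2 <- matrix(5:8, ncol = 1)

nameofmatrix <- paste0("column_", 1:2)   # "column_1" "column_2"
newnames <- "test"                        # one name per column

# mget() fetches the objects by name into a named list, lapply() renames
# the columns of each copy, and list2env() writes the copies back under
# their original names
invisible(list2env(
  lapply(mget(nameofmatrix, envir = .GlobalEnv), function(x) {
    colnames(x) <- newnames
    x
  }),
  envir = .GlobalEnv
))

colnames(column_1)   # "test"
```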
It can also be done with assign
data(mtcars)
nameofobject <- 'mtcars'
assign(nameofobject, `colnames<-`(get(nameofobject),
c('mpg1', names(mtcars)[-1])))
Now, check the names of 'mtcars'
names(mtcars)[1]
#[1] "mpg1"

get() not working for column in a data frame in a list in R (phew)

I have a list of data frames. I want to use lapply on a specific column of each of those data frames, but I keep getting errors when I try methods from similar answers:
The setup is something like this:
a <- list(*a series of data frames that each have a column named DIM*)
dim_loc <- lapply(1:length(a), function(x){paste0("a[[", x, "]]$DIM")})
Eventually, I'll want to write something like results <- lapply(dim_loc, *some function on the DIMs*)
However, when I try get(dim_loc[[1]]), say, I get an error: Error in get(dim_loc[[1]]) : object 'a[[1]]$DIM' not found
But I can return values from function(a[[1]]$DIM) all day long. It's there.
I've tried working around this by using as.name() in the dim_loc assignment, but that doesn't seem to do the trick either.
I'm curious (1) what's going on with get(), and (2) whether there's a better solution. I'm limiting myself to the apply family of functions because I want to get out of the for-loop habit, and this name-as-list method seems to be preferred based on questions like "R- how to dynamically name data frames?", but I'd be interested in other, more elegant solutions too.
If you want to modify an object in place, you are better off using a for loop, since lapply would require the <<- assignment operator (an ordinary <- inside the function passed to lapply only modifies a local copy). Like so:
set.seed(1)
aList <- list(cars = mtcars, iris = iris)
for(i in seq_along(aList)){
aList[[i]][["newcol"]] <- runif(nrow(aList[[i]]))
}
As opposed to...
invisible(
lapply(seq_along(aList), function(x){
aList[[x]][["newcol"]] <<- runif(nrow(aList[[x]]))
})
)
You need invisible(), otherwise lapply prints its return value to the console. The <<- operator assigns the vector runif(...) to the newly created column of aList in the enclosing (here, global) environment.
If you want to produce another set of data.frames using lapply then you do:
lapply(seq_along(aList), function(x){
aList[[x]][["newcol"]] <- runif(nrow(aList[[x]]))
return(aList[[x]])
})
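If the goal is to update aList itself, a third option, sketched here with the same setup, avoids <<- by assigning the result of lapply() back to the list. Iterating over the list itself rather than over seq_along() also keeps the element names.

```r
set.seed(1)
aList <- list(cars = mtcars, iris = iris)

# Each call gets a copy of one data frame, adds the column, and returns it;
# the returned list replaces aList
aList <- lapply(aList, function(df) {
  df[["newcol"]] <- runif(nrow(df))
  df
})

names(aList)   # "cars" "iris"
```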
Also, may I suggest the use of seq_along(list) in lapply and for loops as opposed to 1:length(list) since it avoids unexpected behavior such as:
# zero-length list
seq_along(list())  # returns integer(0)
1:length(list())   # returns 1 0
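The answer above does not cover the get() part of the question. A short sketch with a made-up list a: get() looks up a single object by its name, so "a[[1]]$DIM" is treated as the name of a variable, and no variable has that name. Rather than building strings, the columns can be taken from the list directly.

```r
# Made-up list of data frames, each with a DIM column
a <- list(data.frame(DIM = c(1, 2, 3)), data.frame(DIM = c(10, 20)))

# get() wants an object name; "a[[1]]$DIM" is an expression, not a name
exists("a[[1]]$DIM")   # FALSE, which is why get() fails

# Extract the DIM column from every data frame
dims <- lapply(a, `[[`, "DIM")

# Or apply a function to each DIM column directly, e.g. its mean
results <- lapply(a, function(df) mean(df$DIM))
```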

colMeans failing within for loop R

I've been scraping data from the web and manipulating it in R, but I'm having trouble once I introduce the dreaded for loop. I'm working with some arbitrary sports statistics, the idea being to calculate each player's per-game average for various stats.
Each of these pieces works outside the loop but falls apart inside it. Ideally, my code will do three things:
1) Scrape the data. I have a table of player names, "names", with one player per row and the unique part of each player's URL in column 2. The website has a table named "stats" on each page, which is nice of them.
library(XML)
statMean <- matrix(ncol = 8, nrow = 20)
for(h in 1:20){
webname <- names[h,2]
vurl <- paste("http://www.pro-football-reference.com/players/",
webname, "/gamelog/2015")
tables <- readHTMLTable(vurl)
t1 <- tables$stats
2) Pick the columns I want and turn the values into numeric values.
temp <- t1[, c(9,10,12,13,14,15,17)]
temp <- sapply(temp, function(x) as.numeric(as.character(x)))
3) Calculate the mean of each column, bind the unique player name to the column means, and rbind that vector to a full table.
statMean <- rbind(statMean, c(webname, colMeans(temp)))
When I run through these steps outside of the loop, they seem to work, but when I run the loop I always get:
Error in colMeans(temp) : 'x' must be an array of at least two dimensions
I've looked at a number of for-loop questions on this site, and my code now looks very different from how it started. Unfortunately, every new thing I try ends with a version of the above error.
Any help would be fantastic. Thanks.
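No answer to this question is included here, but the error message points to a likely cause: colMeans() needs something two-dimensional, and sapply() over the columns of a data frame with only one row simplifies to a plain named vector. A player whose stats table has a single row (an assumption; the real pages were not checked) would therefore trigger the error inside the loop. The sketch below uses a made-up one-row table and also shows paste0() for the URL, since paste() as used above puts a space between each piece.

```r
# Made-up one-row stand-in for tables$stats, values stored as text
t1 <- data.frame(g = "1", yds = "45", td = "2", stringsAsFactors = FALSE)

# sapply() over the columns of a one-row data frame returns a named vector
# with no dim attribute, so colMeans(temp_bad) would fail
temp_bad <- sapply(t1, function(x) as.numeric(as.character(x)))
dim(temp_bad)   # NULL

# Converting the columns in place keeps a data frame, which stays
# two-dimensional however many rows it has
temp <- t1
temp[] <- lapply(temp, function(x) as.numeric(as.character(x)))
colMeans(temp)  # g = 1, yds = 45, td = 2

# paste() separates its pieces with a space by default; paste0() does not.
# "X/Xxxx00" is a hypothetical URL fragment
webname <- "X/Xxxx00"
vurl <- paste0("http://www.pro-football-reference.com/players/",
               webname, "/gamelog/2015")
```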

Loop within a function and automatically create objects in R

I'm trying to calculate the column means for different groups in R. There are several methods to assign groups, so two columns were created that contain different groupings.
# create a test df
df.abcd.2<-data.frame(Grouping1=c("a","f","a","d","d","f","a"),Grouping2=c("y","y","z","z","x","x","q"),Var1=sample(1:7),Var2=sample(1:7),Var3=rnorm(1:7))
df.abcd.2
Now I created a loop with assign, lapply, split and colMeans to get my results and store them in different data frames. The loop works fine.
#Loop to create the colmeans and store them in dataframes
for (i in 1:2){
nam <- paste("RRRRRR",deparse(i), sep=".")
assign(nam, as.data.frame(
lapply(
split(df.abcd.2[,3:5], df.abcd.2[,i]), colMeans)
)
)
}
Now I would like to create a function to apply this method to different data frames. My attempt looked like this:
# 1. function to calculate colMeans for different groups
# df = desired data frame,
# a = starting column: beginning of the columns that contain the groups, b = end of the columns that contain the groups
# c = starting column: beginning of the columns to be analyzed, d = end of the columns to be analyzed
function.split.colMeans<-function(df,a,b,c,d)
{for (i in a:b){
nam <- paste("OOOOO",deparse(i), sep=".")
assign(nam, as.data.frame(
lapply(
split(df[,c:d], df[,i]), colMeans)
)
)
}
}
#test the function
function.split.colMeans(df.abcd.2,1,2,3,5)
So when I test this function I get neither an error message nor results... Can anyone help me out, please?
It's working perfectly. Read the help for assign. Learn about frames and environments.
In other words, it's creating the variables inside your function, but they don't leak out into the environment you see when you run ls() at the command line. If you put print(ls()) inside your function's loop you'll see them, but when the function ends, they disappear.
Normally, the only way functions interact with their calling environment is by their return value. Any other method is entering a whole world of pain.
DON'T use assign to create things with sequential or informative names. Ever. Unless you know what you are doing, which you don't... Stick them in lists; then you can index the parts for looping and so on.
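Following that advice, a sketch of the function rewritten to return its results in a named list instead of calling assign(). The name split_colMeans and its arguments are illustrative, not from the question; the test data mirrors df.abcd.2.

```r
# Returns one data frame of column means per grouping column, in a named list.
# group_cols: columns holding the groups; value_cols: columns to average
split_colMeans <- function(df, group_cols, value_cols) {
  out <- lapply(group_cols, function(g) {
    # drop = FALSE keeps a data frame even if value_cols has length 1
    as.data.frame(lapply(split(df[, value_cols, drop = FALSE], df[, g]),
                         colMeans))
  })
  names(out) <- names(df)[group_cols]
  out
}

set.seed(1)
df.abcd.2 <- data.frame(Grouping1 = c("a", "f", "a", "d", "d", "f", "a"),
                        Grouping2 = c("y", "y", "z", "z", "x", "x", "q"),
                        Var1 = sample(1:7), Var2 = sample(1:7),
                        Var3 = rnorm(7))

res <- split_colMeans(df.abcd.2, 1:2, 3:5)
res$Grouping1   # one column per group (a, d, f), one row per variable
```

The results come back as the function's return value, so nothing needs to leak into the global environment.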
