I am trying to run a function to download data from the USGS website using the dataRetrieval package in R and a function I have created called getstreamflow. The code is the following:
siteNumber <- c("094985005","09498501","09489500","09489499","09498502")
Streamflow <- sapply(siteNumber, function(siteNumber) tryCatch(getstreamflow(siteNumber), error = function(e) message(paste("Error in station ", siteNumber))))
Streamflow <- Filter(NROW,Streamflow) #to delete empty data frames
I get the output I want, which is the one shown in the image below:
However, when I run the same code with more stations in the input siteNumber, the output changes: instead of producing several data frames inside one list, it generates a separate list for each data frame.
Does anyone know why this happens? It is the same function; only the number of stations in siteNumber changes.
Based on the image shown for the new data, each element in the list is itself nested in a list. We can extract the inner element (of length 1) with [[1]] and then apply Filter:
out <- Filter(NROW, lapply(Streamflow, function(x) x[[1]]))
Because NROW was used, the nested lists passed the test as well: for a list, NROW returns its length (here 1), so the condition is TRUE for every element and nothing is filtered out. Also, in the previous step the OP uses sapply, which tries to simplify its output and can therefore return different structures for different inputs. Use lapply instead of sapply (or specify simplify = FALSE).
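A minimal sketch of the simplify = FALSE approach. Here fake_getstreamflow and the station ids are hypothetical stand-ins (the real getstreamflow is not shown in the question); the failing station returns NULL so that it can be dropped afterwards:

```r
# Hypothetical stand-in for getstreamflow(): returns a small data frame,
# or fails for one made-up station id.
fake_getstreamflow <- function(id) {
  if (id == "bad") stop("no data for this station")
  data.frame(site = id, flow = c(1.5, 2.5))
}

ids <- c("a", "bad", "b")

# Wrap the call so that a failing station yields NULL instead of an error.
safe_fetch <- function(id) {
  tryCatch(fake_getstreamflow(id), error = function(e) NULL)
}

# simplify = FALSE keeps the result a plain list, named by station id.
res <- sapply(ids, safe_fetch, simplify = FALSE)

# Keep only the elements that really are data frames.
res <- Filter(is.data.frame, res)
```

Filtering on is.data.frame rather than NROW avoids the case where a nested list slips through the filter.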
I am using the function textstat_keyness, which looks at the most frequently appearing words for a specific group of documents in comparison with all the other groups of documents. Basically, you input the target group of documents and the output is a data frame containing the words ordered from the most important to the least important, plus some other columns with statistics.
I have a character vector with the names of all the document groups I want to apply keyness analysis to:
interest_list <- c(unique(data$interest))
(it looks like: chr "0", "340", "456" etc.; basically each number corresponds to a group of documents)
I can easily apply textstat_keyness to a single group of documents as follows:
keyness <- dfm(dfmat_data, groups = "group_interest")
# Calculate keyness and determine audience as target group, compare frequencies of words
# between target and reference documents.
result_keyness <- textstat_keyness(keyness, target = "17627")
The problem is that I don't want to run textstat_keyness for each group individually, as I have around 100 groups.
I was thinking of using a for loop, but I am not sure how to create a list of all the data frames generated by textstat_keyness.
I wrote this so far, but I don't know how to store all the results I would obtain:
for(i in interest_list) {textstat_keyness(keyness, target = i )
}
I also tried lapply, but it doesn't work:
keylist <- lapply(keyness, textstat_keyness(keyness, target = interest_list ))
Any idea how I can obtain my list of data frames in an efficient way?
thank you very much,
Carlo
An alternative to the for loop provided by JaiPizGon is a solution with lapply:
keylist <- lapply(interest_list, function(i) textstat_keyness(keyness, target = i))
Note that lapply is essentially a for loop that always returns a list.
The notation used by JaiPizGon is also correct, but you should be careful about growing objects in R; see chapter 2 in "The R Inferno".
So if you are more comfortable using a for loop, I suggest specifying the size of the list prior to assignment, i.e.:
keylist <- vector("list", length(interest_list))
for(i in seq_along(interest_list)) {
keylist[[i]] <- textstat_keyness(keyness, target = interest_list[i])
}
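As a small extension of the lapply approach, the list can be named by group so that each result is easy to look up later. This is only a sketch: fake_keyness is a hypothetical stand-in for textstat_keyness (so that the example runs without quanteda), and the feature and score columns are made up:

```r
# Hypothetical stand-in for textstat_keyness(keyness, target = i):
# returns a small data frame regardless of the target.
fake_keyness <- function(target) {
  data.frame(feature = c("w1", "w2"), score = c(2, 1))
}

groups <- c("0", "340", "456")

# One data frame per group, stored in a list named by group.
keylist <- lapply(groups, fake_keyness)
names(keylist) <- groups

# Optionally stack everything into one data frame with a group column.
combined <- do.call(
  rbind,
  Map(function(df, g) cbind(group = g, df), keylist, groups)
)
```

With names in place, keylist[["340"]] returns the result for group "340".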
Have you tried initializing a list and assigning the result of textstat_keyness function?
Code:
keylist <- list()
for (i in 1:length(interest_list)) {
keylist[[i]] <- textstat_keyness(keyness, target = interest_list[i])
}
Let's say that I have these vectors:
time <- c(306,455,1010,210,883,1022,310,361,218,166)
status <- c(1,1,0,1,1,0,1,1,1,1)
gender <- c(1,1,1,1,1,1,2,2,1,1)
And I turn them into this data frame:
dataset <- data.frame(time, status, gender)
I want to list the factors in the third column using this function (p/s: pardon the immaturity. I'm still learning):
getFactor<-function(dataset){
result <- list()
result["Factors"] <- unique(dataset[[3]])
return(result)
}
And all I get is this:
getFactor(dataset)
$Factors
[1] 1
Warning message:
In result["Factors"] <- unique(dataset[[3]]) :
number of items to replace is not a multiple of replacement length
I tried using levels, but all I get is an empty list. My question is (1) why does this happen? and (2) is there any other way that I can get the list of the factor in a function?
Solution is simple, you just need double brackets around "Factors" :)
In the function
result[["Factors"]] <- unique(dataset[[3]])
That should be the line.
The double brackets return an element, single brackets return that selection as a list.
Sounds silly, but try this:
test <- list()
class(test["Factors"])
class(test[["Factors"]])
The first class will be of type 'list'. The second will be of type 'NULL'. This is because the single brackets returns a subset as a list, and the double brackets return the element itself. It's useful depending on the scenario. The element in this case is "NULL" because nothing has been assigned to it.
The warning "number of items to replace is not a multiple of replacement length" appears because you've asked it to put two values (unique(dataset[[3]]) returns 1 and 2) into a single list element using single brackets, so only the first value is kept. When you use double brackets, the whole vector is assigned as that one element, so it works!
Hope that makes sense!
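A short sketch of the two assignment forms, using the two values that unique(dataset[[3]]) returns for the data in the question (the names result and result2 are just for illustration):

```r
values <- c(1, 2)

# Double brackets: the whole vector becomes one list element.
result <- list()
result[["Factors"]] <- values

# Single brackets replace a sub-list, so the value has to be wrapped
# in a list to be stored as a single element.
result2 <- list()
result2["Factors"] <- list(values)
```

Both forms end up with a list holding one element named Factors that contains the full vector.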
Currently, when you create your data frame, dataset$gender is a double vector (which R will automatically do if everything in it is numbers). If you want it to be a factor, you can declare it that way at the beginning:
dataset <- data.frame(time, status, gender = as.factor(gender))
Or coerce it to be a factor later:
dataset$gender <- as.factor(gender)
Then getting a vector of the levels is simple, without writing a function:
level_vector <- levels(dataset$gender)
level_vector
You're also subsetting the list incorrectly in your function: a single element of a list is accessed with double brackets, e.g. list[[1]]. The third column of dataset can be extracted with dataset[,3] (dataset[[3]] also works).
How can I efficiently call a larger range of numbered data frames, i.e. T1, T2, T3, ..., without explicitly having to write them down? I tried the code below, but that doesn't work.
lrange <- 1:10
tseries <- as.list(paste0("T", lrange,sep = ""))
I obtain something that can't be used e.g. by do.call("rbind",tseries)
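One possible approach, sketched under the assumption that T1, T2, ... are data frames with identical columns: paste0 only builds the names as strings, whereas mget looks up the objects with those names and returns them in a named list. The three small data frames below are made up for illustration:

```r
# Made-up example data frames standing in for T1, T2, T3, ...
T1 <- data.frame(x = 1:2, y = c("a", "b"))
T2 <- data.frame(x = 3:4, y = c("c", "d"))
T3 <- data.frame(x = 5:6, y = c("e", "f"))

lrange <- 1:3
tnames <- paste0("T", lrange)

# mget() fetches the objects by name and returns a named list.
tseries <- mget(tnames, envir = environment())

# The list of data frames can now be passed to do.call().
combined <- do.call("rbind", tseries)
```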
I have a stack made of rasters
s<-stack(list of ASCI files)
I am trying to perform this operation
df<-as.data.frame(c(s[[1]],s[[2]],s[[2]],s[["bathymetry"]]))
but I get this error
Error in as.data.frame.default(x[[i]], optional = TRUE) :
cannot coerce class "structure("RasterLayer", package = "raster")" to a data.frame
When I perform this operation on a single raster such as
df<-as.data.frame(s[[1]])
everything works fine. But I have to extract many rasters and combine them in one data frame. The only solution I see now is to extract them individually and then combine them. Is there a better solution? I am working with hundreds of rasters at a time.
EDIT: I should also add that this function goes inside a loop and I am only extracting a subset of the raster on each loop.
Or use...
data.frame( rasterToPoints( s ) )
Drop the columns you don't want afterwards.
To apply a function to each element of a list, the apply family commands are helpful:
lapply( s, as.data.frame )
That returns a list of data.frames.
To restrict it to only the elements that you want, just subset to a smaller list beforehand.
s_small <- s[c(1,2,3,5)]
lapply( s_small, as.data.frame )
Sometimes I have code which references a specific dataset based on some variable ID. I have then been creating lines of code using paste0, and then eval(parse(...)) that line to execute the code. This seems to be getting sloppy as the length of the code increases. Are there any cleaner ways to have dynamic data reference?
Example:
dataset <- "dataRef"
execute <- paste0("data.frame(", dataset, "$column1, ", dataset, "$column2)")
eval(parse(text = execute))
But now imagine a scenario where dataRef would be called for 1000 lines of code, and sometimes needs to be changed to dataRef2 or dataRefX.
Combining the comments of Jack Maney and G.Grothendieck:
It is better to store your data frames that you want to access by a variable in a list. The list can be created from a vector of names using get:
mynames <- c('dataRef','dataRef2','dataRefX')
# or mynames <- paste0( 'dataRef', 1:10 )
# name the list elements so they can be looked up by name
mydfs <- setNames( lapply( mynames, get ), mynames )
Then your example becomes:
dataset <- 'dataRef'
mydfs[[dataset]][,c('column1','column2')]
Or you can process them all at once using lapply, sapply, or a loop:
mydfs2 <- lapply( mydfs, function(x) x[,c('column1','column2')] )
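A self-contained sketch of the list-based lookup. The two data frames and their contents are made up for illustration; mget is used here as a shortcut that fetches the objects and names the list in one step:

```r
# Made-up data frames standing in for dataRef, dataRef2, ...
dataRef  <- data.frame(column1 = 1:3, column2 = 4:6, other = 7:9)
dataRef2 <- data.frame(column1 = 11:13, column2 = 14:16, other = 17:19)

mynames <- c("dataRef", "dataRef2")

# Named list of data frames, looked up by their object names.
mydfs <- mget(mynames, envir = environment())

# Switching data sets now only means changing one string.
dataset <- "dataRef2"
subset_df <- mydfs[[dataset]][, c("column1", "column2")]
```

No eval(parse(...)) is needed: the variable dataset is simply used as an index into the list.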
@G.Grothendieck has shown you how to use get and [ to take a character value, return the value of the object with that name, and then reference named elements within that object. I don't know what your code was intended to accomplish, since the result of executing that code would be to deliver values to the console; they would not have been assigned to a name and would have been garbage collected. If you wanted to use three character values (dataset, colname1 and colname2) and assign those columns to an object named by a fourth character value:
newname <- "newdf"
assign( newname, get(dataset)[ c(colname1, colname2) ] )
The lesson to learn is that assign and get are capable of taking character values and accessing or creating named objects, which can be either data objects or functions. Carl_Witthoft mentions do.call, which can construct function calls from character values.
do.call("data.frame", setNames(list( dfrm$x, dfrm$y), c('x2','y2') ) )
do.call("mean", dfrm[1])
# second argument must be a list of arguments to `mean`
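A runnable sketch of these calls on a small made-up data frame (dfrm and its values are assumptions for illustration):

```r
# Made-up data frame for illustration.
dfrm <- data.frame(x = c(1, 2, 3), y = c(4, 5, 6))

# Build a data frame whose column names come from character values.
out <- do.call("data.frame", setNames(list(dfrm$x, dfrm$y), c("x2", "y2")))

# The second argument is a list of arguments passed to mean().
m <- do.call("mean", list(dfrm$x))

# assign() and get() work with object names given as strings.
newname <- "newdf"
assign(newname, get("dfrm")[c("x", "y")])
```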