I've read [this][1] and [this][2], but I'm unable to adapt them to my use case. I am trying to use the openxlsx function writeData to write several data frames, which I saved in a vector, along with the names of the workbook sheets, also saved in a vector:
names_of_worksheets <- c(
"total_suppliers",
"nato_fvey_suppliers",
"us_suppliers",
"sole_sourcing",
"single_sourcing",
"geographic_risk_us_only",
"foreign_dependence",
"exposure_to_non_nato_fvey"
)
names_of_dataframes <- c(
total_suppliers,
nato_fvey_suppliers,
us_suppliers,
sole_sourcing,
single_sourcing,
geographic_risk_us_only,
foreign_dependence,
exposure_to_non_nato_fvey
)
The pseudocode I'd like to write is a for loop (which I think I could do in Python) that iterates over two lists/vectors at once:
for (name_of_worksheet, name_of_dataframe in names_of_worksheets, names_of_dataframes) {
writeData(workbook, name_of_worksheet, name_of_dataframe)
}
That, of course, doesn't work. I've tried map2 but got an error I can't resolve:
map2(names_of_worksheets, names_of_dataframes, writeData, workbook)
Error: Mapped vectors must have consistent lengths:
* `.x` has length 8
* `.y` has length 25
Any thoughts? Thanks!
[1]: Looping over multiple lists with base R
[2]: R Loop Iterating Over Two Lists
I think you want to use list() instead of c() to gather your data frames.
The latter (c()) takes all the columns of the data frames and puts them in a list as independent elements, which is why you get the inconsistent-length error (your 8 data frames have 25 columns in total). list() keeps each data frame as a single element, so map2 should then work for simultaneous iteration.
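A quick base-R check of the difference, using two small made-up data frames (df_a and df_b are hypothetical stand-ins for the eight data frames in the question):

```r
# Two small hypothetical data frames: 2 + 3 = 5 columns in total
df_a <- data.frame(x = 1:3, y = 4:6)
df_b <- data.frame(p = 7:9, q = 10:12, r = 13:15)

# c() flattens the data frames into a plain list of their columns
combined <- c(df_a, df_b)
length(combined)  # 5, one element per column
names(combined)   # "x" "y" "p" "q" "r"

# list() keeps each data frame as a single element
kept <- list(df_a, df_b)
length(kept)      # 2, one element per data frame
```

The same arithmetic explains the error in the question: 8 worksheet names against 25 flattened columns.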
writeData takes the workbook as its first argument, but your map2 call passes it as the third argument (map2 calls .f(.x[[i]], .y[[i]], ...)), so you probably want to rewrite it as:
map2(names_of_worksheets, names_of_dataframes, function(x,y) writeData(workbook, x, y))
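For comparison, here is a sketch of the plain for loop from the question made runnable with openxlsx. It assumes the worksheets still have to be created with addWorksheet(), and it uses two hypothetical data frames and a temporary output file; in real code the data frames would be gathered with list() as described above.

```r
library(openxlsx)

# Hypothetical stand-ins for the real data frames
total_suppliers <- data.frame(supplier = c("A", "B"), n = c(3, 5))
us_suppliers    <- data.frame(supplier = "C", n = 2)

names_of_worksheets <- c("total_suppliers", "us_suppliers")
# list(), not c(), so each data frame stays one element
list_of_dataframes  <- list(total_suppliers, us_suppliers)

workbook <- createWorkbook()
# Iterate over positions so both vectors are walked in parallel
for (i in seq_along(names_of_worksheets)) {
  # Assumption: the sheet does not exist yet, so create it before writing
  addWorksheet(workbook, names_of_worksheets[i])
  writeData(workbook, names_of_worksheets[i], list_of_dataframes[[i]])
}

out_file <- tempfile(fileext = ".xlsx")
saveWorkbook(workbook, out_file, overwrite = TRUE)
```

Indexing by position with seq_along() is the usual R substitute for Python's zip()-style iteration. Because writeData() is called for its side effect on the workbook, purrr::walk2() would also work in place of map2() without printing a list of return values.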
I am trying to run a function to download data from the USGS website using the dataRetrieval R package and a function I created called getstreamflow. The code is the following:
siteNumber <- c("094985005","09498501","09489500","09489499","09498502")
Streamflow <- sapply(siteNumber, function(siteNumber) tryCatch(getstreamflow(siteNumber), error = function(e) message(paste("Error in station ", siteNumber))))
Streamflow <- Filter(NROW,Streamflow) #to delete empty data frames
I got the output I want, a list containing several data frames, as shown in the image below:
However, when I run the same code with more stations in siteNumber,
the output changes: instead of producing several data frames inside a list, it generates a list for each data frame.
Does anyone know why this happens? The function is the same; only the number of stations in siteNumber changes.
Based on the image of the new output, each element of the list is itself nested in a list. We can extract the inner element (the nested list has length 1) with [[1]] and then apply Filter:
out <- Filter(NROW, lapply(Streamflow, function(x) x[[1]]))
Because NROW() of a list returns its length (here 1), the nested lists also pass the test: every element evaluates to TRUE and nothing is filtered out. Also, in the previous step the OP uses sapply, which can simplify the output into a different structure depending on what the function returns. Use lapply instead of sapply (or specify simplify = FALSE).
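A small base-R illustration of why sapply() is fragile here. fake_flow() and safe_flow() are hypothetical stand-ins for getstreamflow() that return one-row data frames; the station IDs are taken from the question. When every call returns a data frame with the same number of columns, sapply() simplifies the results into a "list matrix" instead of a list of data frames, whereas lapply() always returns a plain list:

```r
# Hypothetical stand-in for getstreamflow(): a one-row data frame per station
fake_flow <- function(id) data.frame(site = id, flow = nchar(id))

ids <- c("094985005", "09498501", "09489500")

s <- sapply(ids, fake_flow)  # all results have 2 columns, so sapply simplifies
is.matrix(s)                 # TRUE: a 2 x 3 list matrix, not a list of data frames

l <- lapply(ids, fake_flow)  # always a list, one data frame per station

# Hypothetical failing station: the error handler returns NULL
safe_flow <- function(id) tryCatch(
  if (id == "09489500") stop("no data") else fake_flow(id),
  error = function(e) { message("Error in station ", id); NULL }
)

# NROW(NULL) is 0, so Filter() drops the failed station
res <- Filter(NROW, lapply(ids, safe_flow))
length(res)  # 2
```

Because sapply()'s output shape depends on whether all results have the same length, the same code can return different structures for different inputs; lapply() avoids that.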
In R:
I tried to make a list of data frames ordered by their names (p_text_tm_list_1, p_text_tm_list_2, ..., p_text_tm_list_892)
using a loop (for i in 1:892),
but the result was ordered like a binary sequence (1, 10, 100, 101, ...), as you can see in the second captured console screen.
Why was the result ordered like a binary sequence?
How can I order the data frames in decimal (numeric) order?
Thanks for reading.
Here is a way to solve your problem.
First, create the list p_text_top10_list without resorting to assign. The list is created with its final length so that it does not have to be extended on every iteration, which is inefficient.
p_text_top10_list <- vector("list", length = length(p_text_tm_list))
for(i in seq_along(p_text_tm_list)){
p_text_top10_list[[i]] <- head(p_text_tm_list[[i]], 10)
}
Another much simpler way is to use lapply.
p_text_top10_list <- lapply(p_text_tm_list, head, 10)
That's it. This one-liner does exactly the same as the previous for loop.
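Now assign names with 3-digit, zero-padded numbers so they sort in the proper order. The order you saw is not binary: names are sorted alphabetically as character strings, so "10" comes before "2". With zero padding, alphabetical order matches numeric order.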
names(p_text_top10_list) <- sprintf("p_text_top10_list_%03d", seq_along(p_text_top10_list))
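A small check of the ordering issue, using four illustrative index values rather than all 892. Character names sort alphabetically, so unpadded numbers come out as 1, 10, 100, 2, while zero-padded names sort in numeric order:

```r
# Names as paste0() builds them, without padding
unpadded <- paste0("p_text_top10_list_", c(1L, 2L, 10L, 100L))
sort(unpadded)
# "p_text_top10_list_1" "p_text_top10_list_10" "p_text_top10_list_100" "p_text_top10_list_2"

# Zero-padded to 3 digits, as in the answer
padded <- sprintf("p_text_top10_list_%03d", c(1L, 2L, 10L, 100L))
sort(padded)
# "p_text_top10_list_001" "p_text_top10_list_002" "p_text_top10_list_010" "p_text_top10_list_100"
```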
I want to use a function to repeatedly create sets with different names.
For example, suppose I have 5 random vectors:
number1<-sample(1:10, 3)
number2<-sample(1:10, 3)
number3<-sample(1:10, 3)
number4<-sample(1:10, 3)
number5<-sample(1:10, 3)
Then I use these vectors to select rows from a raw data set (i.e. a data frame):
testset1<-raw[number1,]
testset2<-raw[number2,]
testset3<-raw[number3,]
testset4<-raw[number4,]
testset5<-raw[number5,]
Writing out each command takes a lot of space in the manuscript. I'm trying to shorten these commands by using a function.
However, I find it hard to use a variable to build an object name (a 'text argument') inside a function. Using a variable as an ordinary argument is easy, like this:
mean_function<-function(x){
mean(x)
}
But I want to use a function like this (pseudocode):
testset "number with 1-5" <-raw[number"number 1-5",]
I would really appreciate your help.
You don't need to create a function for this task. Simply use lapply to loop over the list of vectors returned by mget(), then set the names, and finally put all the results in the global environment:
rowSelected <-lapply(mget(paste0("number", 1:5)), function(x) raw[x, ])
names(rowSelected) <- paste0("testset", 1:5)
list2env(rowSelected, envir = .GlobalEnv)
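An alternative sketch that skips numbered variables entirely: draw the index vectors straight into a list and keep the selected rows in a named list instead of copying them into the global environment. The small raw data frame here is a hypothetical stand-in for the real data.

```r
set.seed(1)
# Hypothetical stand-in for the real raw data (10 rows)
raw <- data.frame(id = 1:10, value = letters[1:10])

# Five index vectors of 3 rows each, kept in one list instead of number1..number5
row_indices <- replicate(5, sample(1:10, 3), simplify = FALSE)

# Select the rows for each index vector and name the results testset1..testset5
testsets <- lapply(row_indices, function(idx) raw[idx, ])
names(testsets) <- paste0("testset", 1:5)

testsets$testset3  # same rows as raw[row_indices[[3]], ]
```

Keeping the sets in one list means later steps can loop over them with lapply() as well, rather than rebuilding names with paste0() and mget().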
I have a data.frame with dim = c(200, 500).
I want to run shapiro.test on each column of my data frame and append the result to a list. This is what I'm trying:
colstoremove <- list();
for (i in range(dim(I.df.nocov)[2])) {
x <- shapiro.test(I.df.nocov[1:200,i])
colstoremove[[i]] <- x[2]
}
However, this is failing. Any pointers? (My background is mainly Python; I'm not much of an R user.)
Consider lapply(): when you pass it a data frame, it applies the function to each column, and the returned list has one element per column:
colstoremove <- lapply(I.df.nocov, function(col) shapiro.test(col)[2])
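If only the p-values are needed, a sketch using vapply() returns a named numeric vector instead of a list. The data frame below is a hypothetical stand-in for I.df.nocov (200 rows of random normal data, 5 columns):

```r
set.seed(42)
# Hypothetical stand-in for I.df.nocov: 200 rows, 5 numeric columns (V1..V5)
I.df.nocov <- as.data.frame(matrix(rnorm(200 * 5), nrow = 200))

# One Shapiro-Wilk p-value per column, named after the columns
p_values <- vapply(I.df.nocov, function(col) shapiro.test(col)$p.value, numeric(1))
p_values
```

vapply() checks that every call returns exactly one number, so a column that produces something unexpected fails loudly instead of silently changing the result's shape.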
Here is what happens in
for (i in range(dim(I.df.nocov)[2]))
For the sake of example, I assume that I.df.nocov contains 100 rows and 5 columns.
dim(I.df.nocov) is the vector of I.df.nocov dimensions, i.e. c(100, 5)
dim(I.df.nocov)[2] is the 2nd dimension of I.df.nocov, i.e. 5
range(x) is a 2-element vector containing the minimum and maximum values of x. For example, range(c(4,10,1)) is c(1,10). So range(dim(I.df.nocov)[2]) is c(5,5).
Therefore, the loop runs twice, both times with i=5, so only the last column is ever tested. No wonder it fails!
The problem is that R's range and Python's range do completely different things. The closest R equivalent of Python's range is seq, although R sequences start at 1 and include the end point (Python's range(5) gives 0 to 4). For example, seq(5) is c(1,2,3,4,5), seq(3,5) is c(3,4,5), and seq(1,10,2) is c(1,3,5,7,9). You can also write 1:n, which is the same as seq(n), and m:n, which is the same as seq(m,n) (but ':' has very high precedence, so 1:2*x is interpreted as (1:2)*x). When n might be 0, prefer seq_len(n): 1:0 is c(1,0), whereas seq_len(0) is empty.
Generally, if something does not work in R, print the subexpressions from the innermost outward. If a subexpression is too big to print, use str(x) (str stands for "structure"). And never assume that functions in Python and R are the same: a function with the same name often does something different.
On a side note, instead of dim(I.df.nocov)[2] you could just write ncol(I.df.nocov) (there is also a function nrow).
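A sketch of the question's loop with the indexing fixed, using seq_len(ncol(...)) in place of range(dim(...)[2]). The data frame is a hypothetical stand-in with 4 columns, and the loop stores p-values rather than x[2]:

```r
range(5)    # 5 5: the minimum and maximum, not a sequence
seq_len(5)  # 1 2 3 4 5

set.seed(7)
# Hypothetical stand-in for I.df.nocov: 200 rows, 4 numeric columns
I.df.nocov <- as.data.frame(matrix(rnorm(200 * 4), nrow = 200))

# Preallocate one slot per column, then visit every column index once
colstoremove <- vector("list", ncol(I.df.nocov))
for (i in seq_len(ncol(I.df.nocov))) {
  colstoremove[[i]] <- shapiro.test(I.df.nocov[, i])$p.value
}
```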
I have a stack made of rasters
s<-stack(list of ASCII files)
I am trying to perform this operation
df<-as.data.frame(c(s[[1]],s[[2]],s[[2]],s[["bathymetry"]]))
but I get this error
Error in as.data.frame.default(x[[i]], optional = TRUE) :
cannot coerce class "structure("RasterLayer", package = "raster")" to a data.frame
When I perform this operation on a single raster such as
df<-as.data.frame(s[[1]])
everything works fine. But I have to extract many rasters and combine them into one data frame. The only solution I see now is to extract them individually and then combine them; is there a better solution? I am working with hundreds of rasters at a time.
EDIT: I should also add that this function goes inside a loop and I am only extracting a subset of the raster on each loop.
Alternatively, convert the whole stack in one call:
data.frame( rasterToPoints( s ) )
The result has x and y coordinate columns followed by one column per layer. Drop the columns you don't want afterwards.
To apply a function to each element of a list, the apply family commands are helpful:
lapply( s, as.data.frame )
That returns a list of data.frames.
To restrict it to only the layers you want, subset the stack beforehand. Use double brackets, because single brackets on a Raster* object extract cell values rather than layers:
s_small <- s[[c(1,2,3,5)]]
lapply( s_small, as.data.frame )
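A sketch of getting several layers into one data frame without c(), assuming the raster package. The three 2 x 2 layers and their names are made up for illustration; in the question they would come from the stack of ASCII files. Subsetting with [[ ]] keeps the result a RasterStack, and as.data.frame() on a stack returns one column per layer:

```r
library(raster)

# Three tiny in-memory layers standing in for the ASCII grids (hypothetical data)
r1 <- raster(nrows = 2, ncols = 2, vals = 1:4)
r2 <- raster(nrows = 2, ncols = 2, vals = 5:8)
r3 <- raster(nrows = 2, ncols = 2, vals = 9:12)
s <- stack(r1, r2, r3)
names(s) <- c("sst", "chl", "bathymetry")

# [[ ]] with several indices returns a smaller RasterStack, not a list
s_small <- s[[c(1, 3)]]

# One row per cell, one column per selected layer
df <- as.data.frame(s_small)
head(df)
```

Inside a loop, the index vector passed to [[ ]] can change on each iteration, which covers extracting a different subset of layers each time.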