I am making some plots in R in a for-loop and would like to store them using a name to describe the function being plotted, but also which data it came from.
So when I have a list of 2 data sets "x" and "y" and the loop has a structure like this:
x = matrix(
c(1,2,4,5,6,7,8,9),
nrow=3,
ncol=2)
y = matrix(
c(20,40,60,80,100,120,140,160,180),
nrow=3,
ncol=2)
data <- list(x,y)
for (i in data){
??? <- boxplot(i)
}
I would like ??? to be a name built from a prefix, a "_" separator, and the name of the data set. In this case the 2 plots would be called "plot_x" and "plot_y".
I tried some stuff with paste("plot", names(i), sep = "_") but I'm not sure if this is what to use, and where and how to use it in this scenario.
We can create an empty list with the same length as 'data' and then store the output of each iteration in it by looping over the sequence of 'data':
out <- vector('list', length(data))
for(i in seq_along(data)) {
out[[i]] <- boxplot(data[[i]])
}
str(out)
#List of 2
# $ :List of 6
# ..$ stats: num [1:5, 1:2] 1 1.5 2 3 4 5 5.5 6 6.5 7
# ..$ n : num [1:2] 3 3
# ..$ conf : num [1:2, 1:2] 0.632 3.368 5.088 6.912
# ..$ out : num(0)
# ..$ group: num(0)
# ..$ names: chr [1:2] "1" "2"
# $ :List of 6
# ..$ stats: num [1:5, 1:2] 20 30 40 50 60 80 90 100 110 120
# ..$ n : num [1:2] 3 3
# ..$ conf : num [1:2, 1:2] 21.8 58.2 81.8 118.2
# ..$ out : num(0)
# ..$ group: num(0)
# ..$ names: chr [1:2] "1" "2"
If required, set the names of the list elements from the object names:
names(out) <- paste0("plot_", c("x", "y"))
It is better not to create multiple objects in the global environment. Instead, as shown above, place the objects in a list.
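As a minimal sketch of the same idea without hard-coding the names: if the input list is itself named, the output names can be derived from it. The six-value matrices and `plot = FALSE` (which computes the statistics without drawing) are assumptions made here to keep the example short and device-free.

```r
# Stand-in data: the first six values of each matrix from the question
x <- matrix(c(1, 2, 4, 5, 6, 7), nrow = 3, ncol = 2)
y <- matrix(c(20, 40, 60, 80, 100, 120), nrow = 3, ncol = 2)

# Name the inputs so the names travel with the data
data <- list(x = x, y = y)

# lapply keeps the names of 'data'; plot = FALSE skips the drawing step
out <- lapply(data, boxplot, plot = FALSE)

# Prefix every name with "plot_"
names(out) <- paste0("plot_", names(data))
names(out)
# [1] "plot_x" "plot_y"
```

Individual results are then available as `out$plot_x` or `out[["plot_y"]]`.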
akrun is right: you should try to avoid creating named objects in the global environment. But if you really have to, you can try this:
> y = matrix(c(20,40,60,80,100,120,140,160,180),ncol=1)
> .GlobalEnv[[paste0("plot_","y")]] <- boxplot(y)
> str(plot_y)
List of 6
$ stats: num [1:5, 1] 20 60 100 140 180
$ n : num 9
$ conf : num [1:2, 1] 57.9 142.1
$ out : num(0)
$ group: num(0)
$ names: chr "1"
You can read up on .GlobalEnv by typing ?.GlobalEnv at the R command prompt.
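A middle ground, sketched here under the assumption that you want objects addressable by a constructed name but not loose in the global environment, is a dedicated environment. The name `plots` is hypothetical, and `plot = FALSE` is used only so the example runs without a graphics device.

```r
# Same single-column matrix as in the example above
y <- matrix(c(20, 40, 60, 80, 100, 120, 140, 160, 180), ncol = 1)

# 'plots' is a hypothetical container environment
plots <- new.env()

# Store the boxplot statistics under a name built with paste0()
assign(paste0("plot_", "y"), boxplot(y, plot = FALSE), envir = plots)

ls(plots)
# [1] "plot_y"
get("plot_y", envir = plots)$n
# [1] 9
```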
I have looked through various Stack Overflow pages with similar questions (some linked) but haven't found anything that seems to help with this complicated task.
I have a series of data frames in my workspace and I would like to loop the same function (rollmean or some version of that) over all of them, then save the results to new data frames.
I have written a couple of lines to generate a list of all data frames and a for loop that should iterate an apply statement over each data frame; however, I'm having problems trying to accomplish everything I'm hoping to achieve (my code and some sample data are included below):
1) I would like to restrict the rollmean function to all columns, except the 1st (or first several), so that the column(s) 'info' does not get averaged. I would also like to add this column(s) back to the output data frame.
2) I want to save the output as a new data frame (with a unique name). I do not care if it is saved to the workspace or exported as an xlsx, as I already have batch import codes written.
3) Ideally, I would like the resultant data frame to have the same number of observations as the input, whereas rollmean shrinks your data. I also do not want the missing rows to become NA, so I don't want to use fill = NA. This could be accomplished by writing a new function, by passing type = "partial" in rollmean (though that still shrinks my data by 1 in my hands), or by starting the rolling mean on the nth+2 term and binding the non-averaged nth and nth+1 terms to the resulting data frame. Any way is fine.
(see picture for detail; it illustrates what the latter would look like)
My code only accomplishes parts of these things and I cannot get the for loop to work together but can get parts to work if I run them on single data frames.
Any input is greatly appreciated because I'm out of ideas.
#reproducible data frames
a = as.data.frame(cbind(info = 1:10, matrix(rexp(200), 10)))
b = as.data.frame(cbind(info = 1:10, matrix(rexp(200), 10)))
c = as.data.frame(cbind(info = 1:10, matrix(rexp(200), 10)))
colnames(a) = c("info", 1:20)
colnames(b) = c("info", 1:20)
colnames(c) = c("info", 1:20)
#identify all dataframes for looping rollmean
dflist = as.list(ls()[sapply(mget(ls(), .GlobalEnv), is.data.frame)])
#for loop to create rolling average and save as new dataframe
for (j in 1:length(dflist)){
list = as.list(ls()[sapply(mget(ls(), .GlobalEnv), is.data.frame)])
new.names = as.character(unique(list))
smoothed = as.data.frame(
apply(
X = names(list), MARGIN = 1, FUN = rollmean, k = 3, align = 'right'))
assign(new.names[i], smoothed)
}
I also tried a nested apply approach but couldn't get it to call the rollmean/rollapply function (similar to the issue here), so I went back to for loops. If someone can make this work with nested applies, I'm down!
Picture is ideal output: Top is single input dataframe with colored boxes demonstrating a rolling average across all columns, to be iterated over each column; bottom is ideal output with colors reflecting the location of output for each colored window above
To approach this, think about one column, then one frame (which is just a list of columns), then a list of frames.
(My data used is at the bottom of the answer.)
One Column
If you don't like the reduction of zoo::rollmean, then write your own:
myrollmean <- function(x, k, ..., type=c("normal","rollin","keep"), na.rm=FALSE) {
type <- match.arg(type)
out <- zoo::rollmean(x, k, ...)
aug <- c()
if (type == "rollin") {
# effectively:
# c(mean(x[1]), mean(x[1:2]), ..., mean(x[1:j]))
# for the j=k-1 elements that precede the first from rollmean,
# when it'll become something like:
# c(mean(x[3:5]), mean(x[4:6]), ...)
aug <- sapply(seq_len(k-1), function(i) mean(x[seq_len(i)], na.rm=na.rm))
} else if (type == "keep") {
aug <- x[seq_len(k-1)]
}
out <- c(aug, out)
out
}
myrollmean(1:8, k=3) # "normal", default behavior
# [1] 2 3 4 5 6 7
myrollmean(1:8, k=3, type="rollin")
# [1] 1.0 1.5 2.0 3.0 4.0 5.0 6.0 7.0
myrollmean(1:8, k=3, type="keep")
# [1] 1 2 2 3 4 5 6 7
I caution that this implementation is a bit naïve at best and needs to be hardened. Make sure that you understand what it is doing when you pick a type other than "normal" (which will not work for you; I'm just defaulting to the normal zoo::rollmean behavior). The same approach could easily be applied to other zoo::roll* functions.
On one column of the data:
rbind(
dflist[[1]][,2], # for comparison
myrollmean(dflist[[1]][,2], k=3, type="keep")
)
# [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
# [1,] 1.865352 0.4047481 0.1466527 1.7307097 0.08952618 0.6668976 1.0743669 1.511629 1.314276 0.1565303
# [2,] 1.865352 0.4047481 0.8055844 0.7607035 0.65562952 0.8290445 0.6102636 1.084298 1.300091 0.9941452
One "frame"
Simple use of lapply, omitting the first column:
str(dflist[[1]][1:4, 1:3])
# 'data.frame': 4 obs. of 3 variables:
# $ info: num 1 2 3 4
# $ 1 : num 1.865 0.405 0.147 1.731
# $ 2 : num 0.745 1.243 0.674 1.59
dflist[[1]][-1] <- lapply(dflist[[1]][-1], myrollmean, k=3, type="keep")
str(dflist[[1]][1:4, 1:3])
# 'data.frame': 4 obs. of 3 variables:
# $ info: num 1 2 3 4
# $ 1 : num 1.865 0.405 0.806 0.761
# $ 2 : num 0.745 1.243 0.887 1.169
(For validation, column $ 1 matches the second row in the "one column" example above.)
List of "frames"
(I reset the data to what it was before I modified it above ... see the "data" code at the bottom of the answer.)
We nest the previous technique into another lapply:
dflist2 <- lapply(dflist, function(ldf) {
ldf[-1] <- lapply(ldf[-1], myrollmean, k=3, type="keep")
ldf
})
str(lapply(dflist2, function(a) a[1:4, 1:3]))
# List of 3
# $ :'data.frame': 4 obs. of 3 variables:
# ..$ info: num [1:4] 1 2 3 4
# ..$ 1 : num [1:4] 1.865 0.405 0.806 0.761
# ..$ 2 : num [1:4] 0.745 1.243 0.887 1.169
# $ :'data.frame': 4 obs. of 3 variables:
# ..$ info: num [1:4] 1 2 3 4
# ..$ 1 : num [1:4] 0.271 3.611 2.36 3.095
# ..$ 2 : num [1:4] 0.127 0.722 0.346 0.73
# $ :'data.frame': 4 obs. of 3 variables:
# ..$ info: num [1:4] 1 2 3 4
# ..$ 1 : num [1:4] 1.278 0.346 1.202 0.822
# ..$ 2 : num [1:4] 0.341 1.296 1.244 1.528
(Again, for simple validation, see that the first frame's $ 1 row shows the same rolled means as the second row of the "one column" example, above.)
PS:
If you need to skip more than just the first column, then inside the outer lapply, use instead ldf[-(1:n)] <- lapply(ldf[-(1:n)], myrollmean, k=3, type="keep") to skip the first n columns.
To use a window function other than zoo::rollmean, you'll want to change the special cases of myrollmean, though it should be straightforward enough given this example.
I use a concocted str(...) to shorten the output for display here. You should verify on all of your data that it is doing what you expect for the whole of each frame.
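For the first point, here is a minimal sketch of a frame-level helper with a configurable number of skipped columns. To stay free of package dependencies it swaps zoo::rollmean for base R's stats::filter, so `roll_keep` and `smooth_frame` are hypothetical names for illustration and not the myrollmean function above; the "keep" behaviour (first k-1 values left untouched) is the same.

```r
# Hypothetical helper: right-aligned rolling mean that keeps the first k-1 values.
# stats::filter with sides = 1 averages the current value and the k-1 before it.
roll_keep <- function(x, k = 3) {
  out <- as.numeric(stats::filter(x, rep(1 / k, k), sides = 1))
  head_idx <- seq_len(k - 1)
  out[head_idx] <- x[head_idx]  # overwrite the leading NAs with the raw values
  out
}

# Hypothetical helper: smooth every column except the first n_skip
smooth_frame <- function(df, n_skip = 1, k = 3) {
  cols <- seq_len(ncol(df)) > n_skip
  df[cols] <- lapply(df[cols], roll_keep, k = k)
  df
}

df <- data.frame(info = 1:8, id = letters[1:8], v = 1:8)
res <- smooth_frame(df, n_skip = 2)
res$v
# [1] 1 2 2 3 4 5 6 7
```

The result for `v` matches the type="keep" output shown earlier for 1:8, and the two skipped columns come back unchanged.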
Reproducible Data
set.seed(2)
a = as.data.frame(cbind(info = 1:10, matrix(rexp(200), 10)))
b = as.data.frame(cbind(info = 1:10, matrix(rexp(200), 10)))
c = as.data.frame(cbind(info = 1:10, matrix(rexp(200), 10)))
colnames(a) = c("info", 1:20)
colnames(b) = c("info", 1:20)
colnames(c) = c("info", 1:20)
dflist <- list(a,b,c)
str(lapply(dflist, function(a) a[1:3, 1:4]))
# List of 3
# $ :'data.frame': 3 obs. of 4 variables:
# ..$ info: num [1:3] 1 2 3
# ..$ 1 : num [1:3] 1.865 0.405 0.147
# ..$ 2 : num [1:3] 0.745 1.243 0.674
# ..$ 3 : num [1:3] 0.356 0.689 0.833
# $ :'data.frame': 3 obs. of 4 variables:
# ..$ info: num [1:3] 1 2 3
# ..$ 1 : num [1:3] 0.271 3.611 3.198
# ..$ 2 : num [1:3] 0.127 0.722 0.188
# ..$ 3 : num [1:3] 1.99 2.74 4.78
# $ :'data.frame': 3 obs. of 4 variables:
# ..$ info: num [1:3] 1 2 3
# ..$ 1 : num [1:3] 1.278 0.346 1.981
# ..$ 2 : num [1:3] 0.341 1.296 2.094
# ..$ 3 : num [1:3] 1.1159 3.05877 0.00506
Below dfnames is the names of the data frames in env, the global environment -- we have named it env in case you want to later change where they are located. Note that ls has a pattern= argument and if the data frame names have a distinct pattern then dfnames <- ls(pattern=whatever) could be used instead where whatever is a suitable regular expression.
Now define make_new, which calls rollapplyr with a new mean function, mean3, that returns the last value of its input if the input vector has a length less than 3 and the mean otherwise. Then loop over the names, using rollapplyr with FUN=mean3 and partial=TRUE.
library(zoo)
env <- .GlobalEnv
dfnames <- Filter(function(x) is.data.frame(get(x, env)), ls(env))
# make_new - first version
mean3 <- function(x, k = 3) if (length(x) < k) tail(x, 1) else mean(x)
make_new <- function(df) replace(df, -1, rollapplyr(df[-1], 3, mean3, partial = TRUE))
for(nm in dfnames) env[[paste(nm, "new", sep = "_")]] <- make_new(get(nm, env))
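To see what mean3 does on the short leading windows, the following sketch emulates right-aligned partial windows by hand in base R. The explicit indexing stands in for rollapplyr(..., partial = TRUE) and is an assumption made for illustration; it is not how zoo implements it.

```r
mean3 <- function(x, k = 3) if (length(x) < k) tail(x, 1) else mean(x)

x <- c(10, 20, 30, 40, 50)

# Window i covers the current element and up to two elements before it
res <- sapply(seq_along(x), function(i) mean3(x[max(1, i - 2):i]))
res
# [1] 10 20 20 30 40
```

The first two windows are shorter than 3, so they return their last (that is, current) value; from the third element on, the ordinary mean is taken.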
Alternative version of make_new
An alternative to the first version of make_new shown above is the following second version. In the second version instead of defining mean3 we use just plain mean but specify a vector of widths w in rollapplyr such that w equals c(1, 1, 3, 3, ..., 3). Thus it takes the mean of just the last element for the first two input components and the mean of the 3 last elements for the rest. Note that now that we specify the widths explicitly we no longer need to specify partial= .
# make_new -- second version
make_new <- function(df) {
w <- replace(rep(3, nrow(df)), 1:2, 1)
replace(df, -1, rollapplyr(df[-1], w, mean))
}
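The width vector is easy to inspect on its own. This sketch builds w for a hypothetical 6-row input and again emulates the right-aligned windows with plain indexing, as an illustration of the idea and not of zoo's internals.

```r
n <- 6  # hypothetical number of rows
w <- replace(rep(3, n), 1:2, 1)
w
# [1] 1 1 3 3 3 3

x <- c(10, 20, 30, 40, 50, 60)

# Element i is the mean of the last w[i] values ending at position i
res <- sapply(seq_along(x), function(i) mean(x[(i - w[i] + 1):i]))
res
# [1] 10 20 20 30 40 50
```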
Note
Normally when writing R and manipulating a set of objects one stores the objects in a list rather than leaving them loose in the global environment. We could create such a list L like this and then use lapply to create a second list L2 containing the new versions. Either version of make_new would work here.
L <- mget(dfnames, env)
L2 <- lapply(L, make_new)
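If the "_new" suffix is still wanted, it can be applied to the list names instead of to loose variables. In this sketch `make_new_demo` is a hypothetical stand-in that simply doubles the non-info columns, so that the example runs without zoo; either version of make_new could take its place.

```r
# Small stand-in frames; names play the role of dfnames
L <- list(
  a = data.frame(info = 1:3, v = c(1, 2, 3)),
  b = data.frame(info = 1:3, v = c(10, 20, 30))
)

# Hypothetical stand-in for make_new: transform every column but the first
make_new_demo <- function(df) {
  df[-1] <- lapply(df[-1], function(col) col * 2)
  df
}

L2 <- lapply(L, make_new_demo)
names(L2) <- paste(names(L2), "new", sep = "_")

names(L2)
# [1] "a_new" "b_new"
L2$b_new$v
# [1] 20 40 60
```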
I have a list of values which I would like to use as names for separate tables scraped from separate URLs on a certain website.
> Fac_table
[[1]]
[1] "fulltime_fac_table"
[[2]]
[1] "parttime_fac_table"
[[3]]
[1] "honorary_fac_table"
[[4]]
[1] "retired_fac_table"
I would like to loop through the list to automatically generate 4 tables with the respective names.
The result should look like this:
> fulltime_fac_table
職稱
V1 "教授兼系主任"
V2 "教授"
V3 "教授"
V4 "教授"
V5 "特聘教授"
> parttime_fac_table
職稱 姓名
V1 "教授" "XXX"
V2 "教授" "XXX"
V3 "教授" "XXX"
V4 "教授" "XXX"
V5 "教授" "XXX"
V6 "教授" "XXX"
I have another list, named 'headers', containing column headings of the respective tables online.
> headers
[[1]]
[1] "職稱" "姓名" " 研究領域"
[4] "聯絡方式"
[[2]]
[1] "職稱" "姓名" "研究領域" "聯絡方式"
I was able to assign values to the respective tables with this code:
> assign(eval(parse(text="Fac_table[[i]]")), as_tibble(matrix(fac_data,
+ nrow = length(headers[[i]]))))
This results in a populated table, without column headings, like this one:
> honorary_fac_table
[,1] [,2]
V1 "名譽教授" "XXX"
V2 "名譽教授" "XXX"
V3 "名譽教授" "XXX"
V4 "名譽教授" "XXX"
But was unable to assign column names to each table.
Neither of the code below worked:
> assign(colnames(eval(parse(text="Fac_table[1]"))), c(gsub("\\s", "", headers[[1]])))
Error in assign(colnames(eval(parse(text = "Fac_table[1]"))), c(gsub("\\s", :
invalid first argument
> colnames(eval(parse(text="Fac_table[i]"))) <- c(gsub("\\s", "", headers[[i]]))
Error in colnames(eval(parse(text = "Fac_table[i]"))) <- c(gsub("\\s", :
target of assignment expands to non-language object
> do.call("<-", colnames(eval(parse(text="Fac_table[i]"))), c(gsub("\\s", "", headers[[i]])))
Error in do.call("<-", colnames(eval(parse(text = "Fac_table[i]"))), c(gsub("\\s", :
second argument must be a list
To simplify the issue, a reproducible example is as follows:
> varNamelist <- list(c("tbl1","tbl2","tbl3","tbl4"))
> colHeaderlist <- list(c("col1","col2","col3","col4"))
> tableData <- matrix(1:12, ncol=4)
This works:
> assign(eval(parse(text="varNamelist[[1]][1]")), matrix(tableData,
+ ncol = length(colHeaderlist[[1]])))
But this doesn't:
> colnames(as.name(varNamelist[[1]][1])) <- colHeaderlist[[1]]
Error in `colnames<-`(`*tmp*`, value = c("col1", "col2", "col3", "col4" :
attempt to set 'colnames' on an object with less than two dimensions
It seems like the colnames() function in R is unable to treat the strings as represented by "Fac_table[i]" as variable names, in which independent data (separate from Fac_table) can be stored.
> colnames(as.name(Fac_table[[1]])) <- headers[[1]]
Error in `colnames<-`(`*tmp*`, value = c("a", "b", "c", :
attempt to set 'colnames' on an object with less than two dimensions
Substituting for 'fulltime_fac_table' directly works fine.
> colnames(fulltime_fac_table) <- headers[[1]]
Is there any way around this issue?
Thanks!
There is a solution to this, but I think the current set up may be more complex than necessary if I understand correctly. So I'll try to make this task easier.
If you're working with one-dimensional data, I'd recommend using vectors, as they're more appropriate than lists for that purpose. So for this project, I'd begin by storing the names of tables and headers, like this:
varNamelist <- c("tbl1","tbl2","tbl3","tbl4")
colHeaderlist <- c("col1","col2","col3","col4")
It's still difficult to determine from your question what the format and origin of the input data for these tables are, but in general a data frame can be easier to work with than a matrix, as long as you're not working with Big Data. The assign function is also typically not necessary for these sorts of steps. Instead, when setting up a data frame, we can set the name of the data frame, the names of the columns, and the data contents all at once, like this:
tbl1 <- data.frame("col1"=c(1,2,3),
"col2"=c(4,5,6),
"col3"=c(7,8,9),
"col4"=c(10,11,12))
Again, we're using vectors, indicated by c() instead of list(), to fill each column, since each column is its own single dimension.
To check the output of tbl1, we can then use print():
print(tbl1)
col1 col2 col3 col4
1 1 4 7 10
2 2 5 8 11
3 3 6 9 12
If it's an option to create the tables in a way closer to the one shown, that might be easier than using so many lists and assign calls, which quickly becomes overly complicated.
But if you want at the end to store all the tables in a single place, you could put them in a list:
tableList <- list(tbl1=tbl1,tbl2=tbl2,tbl3=tbl3,tbl4=tbl4)
str(tableList)
List of 4
$ tbl1:'data.frame': 3 obs. of 4 variables:
..$ col1: num [1:3] 1 2 3
..$ col2: num [1:3] 4 5 6
..$ col3: num [1:3] 7 8 9
..$ col4: num [1:3] 10 11 12
$ tbl2:'data.frame': 3 obs. of 4 variables:
..$ col1: num [1:3] 1 2 3
..$ col2: num [1:3] 4 5 6
..$ col3: num [1:3] 7 8 9
..$ col4: num [1:3] 10 11 12
$ tbl3:'data.frame': 3 obs. of 4 variables:
..$ col1: num [1:3] 1 2 3
..$ col2: num [1:3] 4 5 6
..$ col3: num [1:3] 7 8 9
..$ col4: num [1:3] 10 11 12
$ tbl4:'data.frame': 3 obs. of 4 variables:
..$ col1: num [1:3] 1 2 3
..$ col2: num [1:3] 4 5 6
..$ col3: num [1:3] 7 8 9
..$ col4: num [1:3] 10 11 12
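To connect this back to the original problem of setting column names when the table name is only known as a string, here is a rough sketch. It assumes one header vector per table; the second header vector and the `tbl_env` environment are made up for illustration. The first part keeps everything in a named list. The second part shows the fetch, modify, write-back pattern with get() and assign(), which is needed because colnames<- cannot assign into the result of a function call such as as.name().

```r
varNames  <- c("tbl1", "tbl2")
headers   <- list(c("col1", "col2", "col3", "col4"),
                  c("w", "x", "y", "z"))   # second header set is hypothetical
tableData <- matrix(1:12, ncol = 4)

# Preferred: one matrix per header vector, collected in a named list
tables <- lapply(headers, function(h) {
  m <- tableData
  colnames(m) <- h
  m
})
names(tables) <- varNames
colnames(tables$tbl2)
# [1] "w" "x" "y" "z"

# If a variable addressed by a string is unavoidable: fetch, modify, write back
tbl_env <- new.env()                       # hypothetical container
assign("tbl1", tableData, envir = tbl_env)
tmp <- get("tbl1", envir = tbl_env)
colnames(tmp) <- headers[[1]]
assign("tbl1", tmp, envir = tbl_env)
colnames(get("tbl1", envir = tbl_env))
# [1] "col1" "col2" "col3" "col4"
```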
I've found a workaround based on @Ryan's recommendation, given by this code:
for (i in seq_along(url)){
webpage <- read_html(url[i]) #loop through URL list to access html data
fac_data <- html_nodes(webpage,'.tableunder') %>% html_text()
fac_data1 <- html_nodes(webpage,'.tableunder1') %>% html_text()
fac_data <- c(fac_data, fac_data1) #Store table data on each URL in a variable
x <- fac_data %>% matrix(ncol = length(headers[[i]]), byrow=TRUE) #make matrix to extract column data
for (j in seq_along(headers[[i]])){
y <- cbind(x[,j]) #extract column data and store in temporary variable
colnames(y) <- as.character(headers[[i]][j]) #add column name
print(cbind(y)) #loop through headers list to print column data in sequence. ** cbind(y) will be overwritten when I try to store the result on a list with 'z <- cbind(y)'.
}
}
I am now able to print out all values, complete with headers of the data in question.
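To keep the tables instead of only printing them, one option is to set all column names on the matrix at once and store it in a list slot per URL. The sketch below replaces the scraping step with made-up character vectors (`scraped` and its contents are assumptions), so it shows only the reshaping and storing part.

```r
# Stand-ins for headers and for the text scraped from each URL
headers <- list(c("title", "name"),
                c("title", "name", "field"))
scraped <- list(c("Prof", "A", "Prof", "B"),
                c("Prof", "C", "x", "Prof", "D", "y"))

results <- vector("list", length(headers))
for (i in seq_along(headers)) {
  # One row per person, one column per header
  x <- matrix(scraped[[i]], ncol = length(headers[[i]]), byrow = TRUE)
  colnames(x) <- headers[[i]]   # all column names in one step
  results[[i]] <- x             # indexed by i, so nothing is overwritten
}

results[[2]][, "field"]
# [1] "x" "y"
```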
Follow-up questions have been posted here.
The final code solved this problem as well.
I have some output from the vegan function specaccum. It is a list of 8 objects of varying lengths;
> str(SPECIES)
List of 8
$ call : language specaccum(comm = PRETEND.DATA, method = "rarefaction")
$ method : chr "rarefaction"
$ sites : num [1:5] 1 2 3 4 5
$ richness : num [1:5] 20.9 34.5 42.8 47.4 50
$ sd : num [1:5] 1.51 2.02 1.87 1.35 0
$ perm : NULL
$ individuals: num [1:5] 25 50 75 100 125
$ freq : num [1:50] 1 2 3 2 4 3 3 3 4 2 ...
- attr(*, "class")= chr "specaccum"
I want to extract three of the lists ('richness', 'sd' and 'individuals') and convert them to columns in a data frame. I have developed a workaround;
SPECIES.rich <- data.frame(SPECIES[["richness"]])
SPECIES.sd <- data.frame(SPECIES[["sd"]])
SPECIES.individuals <- data.frame(SPECIES[["individuals"]])
SPECIES.df <- cbind(SPECIES.rich, SPECIES.sd, SPECIES.individuals)
But this seems clumsy and protracted. I wonder if anyone could suggest a neater solution? (Should I be looking at something with lapply??) Thanks!
Example data to generate the specaccum output;
set.seed(100)
PRETEND.DATA <- matrix(sample(0:1, 250, replace = TRUE), 5, 50)
library(vegan)
SPECIES <- specaccum(PRETEND.DATA, method = "rarefaction")
We can concatenate the names in a vector and extract it
SPECIES.df <- data.frame(SPECIES[c("richness", "sd", "individuals")])
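A quick way to try this without vegan installed is a plain list with the same element names. `SPECIES_demo` below is a hypothetical stand-in filled with the values printed by str() in the question.

```r
# Stand-in for the specaccum output (values copied from the str() listing)
SPECIES_demo <- list(
  method      = "rarefaction",
  sites       = c(1, 2, 3, 4, 5),
  richness    = c(20.9, 34.5, 42.8, 47.4, 50),
  sd          = c(1.51, 2.02, 1.87, 1.35, 0),
  individuals = c(25, 50, 75, 100, 125)
)

# Subsetting a list by a character vector keeps the names,
# so data.frame() turns them into column names
demo_df <- data.frame(SPECIES_demo[c("richness", "sd", "individuals")])
dim(demo_df)
# [1] 5 3
names(demo_df)
# [1] "richness"    "sd"          "individuals"
```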
Another alternative, similar to akrun, is:
ctoc1 = as.data.frame(cbind(SPECIES$richness, SPECIES$sd, SPECIES$individuals))
Please note that in both cases (my answer and akrun's) you will get an error if the lengths of the columns do not match.
e.g.: SPECIES.df <- data.frame(SPECIES[c( "sd", "freq")])
Error in data.frame(richness = c(20.5549865665613, 33.5688503093388, 41.4708434700877, :
arguments imply differing number of rows:7, 47
If so, remember to use the length() function:
length(SPECIES$sd) <- 47 # this will add NAs to increase the column length.
SPECIES.df <- data.frame(SPECIES[c("sd", "freq")])
SPECIES.df # data frame with 2 columns and 47 rows (sd is padded with NA from row 8 on).
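The padding can also be done for all selected elements in one go, without hard-coding the target length. This is a minimal sketch on made-up vectors; `lst` and its contents are assumptions.

```r
# Two elements of different lengths
lst <- list(sd = c(1.5, 2.0, 1.9), freq = 1:5)

# Pad every element with NA up to the longest length
n <- max(lengths(lst))
padded <- lapply(lst, function(v) {
  length(v) <- n   # extending a vector this way fills with NA
  v
})

pad_df <- data.frame(padded)
nrow(pad_df)
# [1] 5
pad_df$sd
# [1] 1.5 2.0 1.9  NA  NA
```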
I have a function that I apply to a column and that puts its results in another column; it sometimes gives me integer(0) as output. So my output column will be something like:
45
64
integer(0)
78
How can I detect these integer(0)'s and replace them by NA? Is there something like is.na() that will detect them ?
Edit: Ok I think I have a reproducible example:
df1 <-data.frame(c("267119002","257051033",NA,"267098003","267099020","267047006"))
names(df1)[1]<-"ID"
df2 <-data.frame(c("257051033","267098003","267119002","267047006","267099020"))
names(df2)[1]<-"ID"
df2$vals <-c(11,22,33,44,55)
fetcher <-function(x){
y <- df2$vals[which(match(df2$ID,x)==TRUE)]
return(y)
}
sapply(df1$ID,function(x) fetcher(x))
The output from this sapply is the source of the problem.
> str(sapply(df1$ID,function(x) fetcher(x)))
List of 6
$ : num 33
$ : num 11
$ : num(0)
$ : num 22
$ : num 55
$ : num 44
I don't want this to be a list - I want a vector, and instead of num(0) I want NA (note in this toy data it gives num(0) - in my real data it gives (integer(0)).
Here's a way to (a) replace integer(0) with NA and (b) transform the list into a vector.
# a regular data frame
> dat <- data.frame(x = 1:4)
# add a list including integer(0) as a column
> dat$col <- list(45,
+ 64,
+ integer(0),
+ 78)
> str(dat)
'data.frame': 4 obs. of 2 variables:
$ x : int 1 2 3 4
$ col:List of 4
..$ : num 45
..$ : num 64
..$ : int
..$ : num 78
# find zero-length values
> idx <- !(sapply(dat$col, length))
# replace these values with NA
> dat$col[idx] <- NA
# transform list to vector
> dat$col <- unlist(dat$col)
# now the data frame contains vector columns only
> str(dat)
'data.frame': 4 obs. of 2 variables:
$ x : int 1 2 3 4
$ col: num 45 64 NA 78
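The same steps can be written a little more compactly with lengths(), which returns the length of each list element. A minimal sketch on a bare list instead of a data frame column:

```r
col <- list(45, 64, integer(0), 78)

# lengths() gives 1 1 0 1, so the comparison flags the empty element
col[lengths(col) == 0] <- NA

vec <- unlist(col)
vec
# [1] 45 64 NA 78
```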
It is best to do that inside your function. I'll call it myFunctionForApply, but it stands for your current function. Before you return, check the length, and if it is 0, return NA:
myFunctionForApply <- function(x, ...) {
# Do your processing
# Let's say it ends up in variable 'ret':
if (length(ret) == 0)
return(NA)
return(ret)
}
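For the lookup in the question specifically, a vectorised match() may avoid the problem altogether, because it already returns NA for IDs that are not found. This is a sketch of that alternative using the question's toy data, not a drop-in replacement for whatever the real function does.

```r
df1 <- data.frame(ID = c("267119002", "257051033", NA,
                         "267098003", "267099020", "267047006"))
df2 <- data.frame(ID = c("257051033", "267098003", "267119002",
                         "267047006", "267099020"),
                  vals = c(11, 22, 33, 44, 55))

# match() gives the position of each df1 ID in df2 (NA if absent);
# indexing with NA yields NA, so the result is a plain vector
res <- df2$vals[match(df1$ID, df2$ID)]
res
# [1] 33 11 NA 22 55 44
```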