Calculate sd and mean for many data frames in R

I have data frames called bmw_1, bmw_2, ..., bmw_9 and I want to calculate the standard deviation and mean for each data frame, but I don’t want to write
mean(bmw_1)
mean(bmw_2)
mean(bmw_3)
...
mean(bmw_9)
many times. Any help, please?

As mentioned in the comment, the best way is to get the data frames into a list so you can apply a function over each.
Get all dfs into a list by name pattern:
ls_bmw <- mget(ls(pattern = "bmw_"))
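Then apply the mean. Calling mean() directly on a whole data frame returns NA with a warning in current versions of R, so flatten each data frame first (this assumes every column is numeric):
result <- lapply(ls_bmw, function(d) mean(unlist(d)))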
Difficult to go much further without a data example, but to get the results alongside the data frame name use:
names(ls_bmw)
... to get a vector of the df names and:
unlist(result)
... to get a vector of the results. The order of the names and the results will match, so you can combine them into a single result data frame.
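Since the question also asks for the standard deviation, here is a minimal sketch of the whole idea. The three small data frames and the column name "price" are made up for illustration; the real bmw_ data frames will have their own columns, so adjust the column being summarised.

```r
# Hypothetical stand-ins for bmw_1 ... bmw_9; "price" is an assumed column name.
bmw_1 <- data.frame(price = c(1, 2, 3))
bmw_2 <- data.frame(price = c(10, 20, 30))
bmw_3 <- data.frame(price = c(5, 5, 5))

# Collect the data frames by name; the anchored pattern avoids matching other objects.
ls_bmw <- mget(ls(pattern = "^bmw_[0-9]+$"))

# One row per data frame, with the mean and sd of the assumed "price" column.
summary_df <- data.frame(
  df   = names(ls_bmw),
  mean = sapply(ls_bmw, function(d) mean(d$price)),
  sd   = sapply(ls_bmw, function(d) sd(d$price)),
  row.names = NULL
)
print(summary_df)
```

Keeping the name, mean and sd side by side in one data frame avoids having to match separate vectors by position afterwards.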

Related

How to create a matrix/data frame from a high number of single objects by using a loop?

I have a high number of single objects each one containing a mean value for a year. They are called cddmean1950, cddmean1951, ... ,cddmean2019.
Now I would like to put them together into a matrix or data frame with the first column being the year (1950 - 2019) and the second column being the single mean values.
This is a very long way to do it without looping:
matrix <- rbind(cddmean1950,cddmean1951,cddmean1952,...,cddmean2019)
Afterwards you transform the matrix to a data frame, create a vector with the years and add it to the data frame.
I am sure there must be a smarter and faster way to do this by using a loop or anything else?
I think this could be an easy way to do it, provided all those single objects are in your current environment.
First we create a character vector of the variable names using the paste0 function:
YearRange <- 1950:2019
ObjectName <- paste0("cddmean", YearRange)
Then we can use lapply with get to retrieve the values of all these variables as a list.
Then, using do.call and rbind, we bind all these values into a single-column matrix and finally create the data frame you requested.
ListofSingleObjects <- lapply(ObjectName, get)
MeanValues <- do.call(rbind,ListofSingleObjects)
df <- data.frame( year = YearRange , Mean = MeanValues )
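A slightly shorter variant of the same idea, as a sketch: mget() can look up all the names in one call, and unlist() flattens the result. The three example objects and the shortened year range are made up for illustration, and this assumes each cddmean object holds exactly one numeric value.

```r
# Hypothetical stand-ins for cddmean1950 ... cddmean2019 (only three years here).
cddmean1950 <- 12.5
cddmean1951 <- 14.0
cddmean1952 <- 11.25

YearRange  <- 1950:1952
ObjectName <- paste0("cddmean", YearRange)

# mget() looks up every name at once and returns a named list;
# unlist() flattens it into a numeric vector in the same order as ObjectName.
MeanValues <- unlist(mget(ObjectName), use.names = FALSE)

df <- data.frame(year = YearRange, Mean = MeanValues)
print(df)
```

If an object held more than one value, unlist() would produce a longer vector than YearRange and data.frame() would fail or recycle, so it is worth checking the lengths first.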

How to get only a few columns from several data frames obtained from using the lapply function?

I have this function "get_animals" that retrieves data for several specimens of different species of animals. It works by giving a vector with several species names, and it retrieves the data regarding those species (location, dna sequences ...). The thing is that the data base I'm using can't handle a query with too many species names in a single line of code, so I'm trying to use lapply to get one by one.
I tried this:
species_list<-as.list(as.character(unique(df$species_name)))
e<-lapply(species_list, function (x) get_animals(animal_names=x))
The problem is that lapply returns a list of data frames, one per species name in "species_list", each with too many columns. What I want is only two columns from each data frame, and then to combine all of those data frames into a single one.
I tried to unlist the result from the lapply function:
e<-unlist(e)
But it didn't work because it just returned all the occurrences for the first column of each data frame.
Thanks in advance for any answers
If we need to subset the columns, use either the column index
lapply(species_list, function (x) get_animals(animal_names=x)[c(1, 5)])
Or column name
lapply(species_list, function (x)
  get_animals(animal_names=x)[c("species_name", "location")])
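To also combine the per-species results into a single data frame, the subsetted list can be stacked with do.call(rbind, ...). The sketch below uses a made-up get_animals() that returns a small data frame, because the real function and its columns are not shown in the question; only the lapply and rbind steps are the point.

```r
# Hypothetical stand-in for get_animals(); the real function queries a database.
get_animals <- function(animal_names) {
  data.frame(
    species_name = animal_names,
    sequence     = "ACGT",
    location     = "unknown",
    stringsAsFactors = FALSE
  )
}

species_list <- list("Canis lupus", "Felis catus")

# Query one species at a time and keep only the two columns of interest.
e <- lapply(species_list, function(x) {
  get_animals(animal_names = x)[c("species_name", "location")]
})

# Stack the per-species data frames into a single data frame.
combined <- do.call(rbind, e)
print(combined)
```

This works because every element of e has the same two columns, which is what rbind needs.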

Applying a function to a dataframe to trim empty columns within a list environment R

I am a naive user of R and am attempting to come to terms with the 'apply' series of functions which I now need to use due to the complexity of the data sets.
I have a large, ragged data frame that I wish to reshape before conducting a sequence of regression analyses. It is further complicated by having interlaced rows of descriptive data (characters).
My approach to date has been to use a factor to split the data frame into sets with equal row lengths (i.e. a list), then attempt to remove the trailing empty columns, make two new, matching lists, one of data and one of chars and then use reshape to produce a common column number, then recombine the sets in each list. e.g. a simplified example:
myDF <- as.data.frame(rbind(c("v1",as.character(1:10)),
c("v1",letters[1:10]),
c("v2",c(as.character(1:6),rep("",4))),
c("v2",c(letters[1:6], rep("",4)))))
myDF[,1] <- as.factor(myDF[,1])
myList <- split(myDF, myDF[,1])
myList[[1]]
I can remove the empty columns for an individual set and can split the data frame into two sets from the interlacing rows but have been stumped with the syntax in writing a function to apply the following function to the list - though 'lapply' with 'seq_along' should do it?
Thus for the individual set:
DF <- myList[[2]]
DF <- DF[,!sapply(DF, function(x) all(x==""))]
DF
(from an earlier answer to a similar, but simpler example on this site). I have a large data set and would like an elegant solution (I could use a loop but that would not use the capabilities of R effectively). Once I have done that I ought to be able to use the same rationale to reshape the frames and then recombine them.
regards
jac
Try this, which keeps only the columns that contain no empty strings within each group:
lapply(split(myDF, myDF$V1), function(x) x[!colSums(x=='')])
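If the rule from the question is preferred (drop a column only when every entry in it is empty), the single-set code can be wrapped in a small function and passed to lapply. This is a sketch; drop_empty_cols is a name made up here, and stringsAsFactors = FALSE is set only to keep the example columns as plain character.

```r
myDF <- as.data.frame(rbind(c("v1", as.character(1:10)),
                            c("v1", letters[1:10]),
                            c("v2", c(as.character(1:6), rep("", 4))),
                            c("v2", c(letters[1:6], rep("", 4)))),
                      stringsAsFactors = FALSE)

# Drop a column only when every entry in it is the empty string.
# drop = FALSE keeps the result a data frame even if one column remains.
drop_empty_cols <- function(d) {
  d[, !sapply(d, function(x) all(x == "")), drop = FALSE]
}

trimmed <- lapply(split(myDF, myDF$V1), drop_empty_cols)

# Number of columns left in each group.
print(sapply(trimmed, ncol))
```

On this example both rules give the same result; they differ only when a column is partly empty within a group.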

how to make groups of variables from a data frame in R?

Dear friends, I would appreciate it if someone could help me with a question in R.
I have a data frame with 8 variables, let's say (v1, v2, ..., v8). I would like to produce groups of datasets based on all possible combinations of these variables. That is, with a set of 8 variables I am able to produce 2^8-1=255 subsets of variables like {v1}, {v2}, ..., {v8}, {v1,v2}, ..., {v1,v2,v3}, ..., {v1,v2,...,v8}.
My goal is to produce a specific statistic based on these groupings and then compare which subset produces a better statistic. My problem is how to produce these combinations.
Thanks in advance.
You need the function combn. It creates all the combinations of a vector that you provide it. For instance, in your example:
names(yourdataframe) <- c("V1","V2","V3","V4","V5","V6","V7","V8")
varnames <- names(yourdataframe)
combn(x = varnames,m = 3)
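This gives you all combinations (order does not matter, so these are not permutations) of V1-V8 taken 3 at a time. Vary m from 1 to 8 to cover every subset size.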
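To get every non-empty subset rather than a single size, combn can be called once per size and the results flattened into one list. This is a sketch: the random data frame is made up, and the overall mean is only a placeholder for whatever statistic is being compared.

```r
varnames <- paste0("V", 1:8)

# For each subset size m = 1..8, list every combination of m variable names,
# then flatten the per-size lists into one list of character vectors.
all_subsets <- unlist(
  lapply(seq_along(varnames), function(m) combn(varnames, m, simplify = FALSE)),
  recursive = FALSE
)
length(all_subsets)  # 2^8 - 1 subsets

# Hypothetical data and a placeholder statistic (overall mean of the subset).
yourdataframe <- as.data.frame(
  matrix(rnorm(80), ncol = 8, dimnames = list(NULL, varnames))
)
subset_means <- sapply(all_subsets, function(v) mean(as.matrix(yourdataframe[v])))
```

Each element of all_subsets is a character vector of column names, so yourdataframe[all_subsets[[i]]] gives the i-th subsetted data frame.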
I'll use data.table instead of data.frame;
I'll include an extraneous variable for robustness.
This will get you your subsetted data frames:
library(data.table)
nn<-8L
dt<-setnames(as.data.table(cbind(1:100,matrix(rnorm(100*nn),ncol=nn))),
             c("id",paste0("V",1:nn)))
#should be a smarter (read: more easily generalized) way to produce this,
# but it's eluding me for now...
#basically, this generates the indices to include when subsetting
x<-cbind(rep(c(0,1),each=128),
rep(rep(c(0,1),each=64),2),
rep(rep(c(0,1),each=32),4),
rep(rep(c(0,1),each=16),8),
rep(rep(c(0,1),each=8),16),
rep(rep(c(0,1),each=4),32),
rep(rep(c(0,1),each=2),64),
rep(c(0,1),128)) *
t(matrix(rep(1:nn),2^nn,nrow=nn))
#now get the correct column names for each subset
# by subscripting the nonzero elements
incl<-lapply(1:(2^nn),function(y){paste0("V",1:nn)[x[y,][x[y,]!=0]]})
#now subset the data.table for each subset
ans<-lapply(1:(2^nn),function(y){dt[,incl[[y]],with=FALSE]})
You said you wanted some statistics from each subset, in which case it may be more useful to instead specify the last line as:
ans2<-lapply(1:(2^nn),function(y){unlist(dt[,incl[[y]],with=FALSE])})
#exclude the first subset, which is empty (no variables selected)
means<-lapply(2:(2^nn),function(y){mean(ans2[[y]])})
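On the comment about a more easily generalised way to build the index matrix: one possibility, sketched here in base R, is expand.grid over nn copies of c(FALSE, TRUE), which yields all 2^nn inclusion patterns for any nn. Note that the row order differs from the hand-built x above (the first variable varies fastest here), although the first row is still the empty subset.

```r
nn <- 8L

# Every row is one TRUE/FALSE inclusion pattern over the nn variables
# (2^nn rows in total; row 1 is all FALSE, i.e. the empty subset).
grid <- as.matrix(expand.grid(rep(list(c(FALSE, TRUE)), nn)))

# Column names selected by each pattern.
incl <- lapply(seq_len(nrow(grid)), function(y) paste0("V", 1:nn)[grid[y, ]])

length(incl)
incl[[2]]
```

The resulting incl list can be used in the same subsetting calls as before.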

How can I use pattern to combine data frames using a wildcard?

A series of functions generates a varying number of data frames (a minimum of 1 and a maximum of 11).
I'd like to combine them using rbind. If I knew the names, I could easily just rbind(d1,d2...) but can't do that since I have to combine a different number of data frames each time.
So lags=rbind(pattern("lags_2_Y*")) didn't work.
I can get the list of the generated lag names into a vector like so: lag_names=ls(pattern="lags_2_Y*")
If I do: lags=llply(lag_names,rbind), I just get a list with the lag names. I want to rbind the contents of those data frames.
Ideas?
Try:
library(plyr)
lags = ldply(lag_names, get)
Edit:
If you give lag_names names, ldply() will add an id column identifying the source data frame of each row:
names(lag_names) <- lag_names
lags = ldply(lag_names, get)
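For completeness, the same thing can be done without plyr. This is a base R sketch with two made-up data frames standing in for the generated ones. Note that the pattern argument of ls() is a regular expression, not a shell wildcard, so "^lags_2_Y" is used here to match names that start with lags_2_Y.

```r
# Hypothetical stand-ins for the generated lag data frames.
lags_2_Y1 <- data.frame(lag = 1:2, value = c(0.1, 0.2))
lags_2_Y2 <- data.frame(lag = 1:2, value = c(0.3, 0.4))

# "^lags_2_Y" is a regular expression anchored at the start of the name.
lag_names <- ls(pattern = "^lags_2_Y")

# mget() fetches all the data frames into a named list; rbind stacks them.
lags <- do.call(rbind, mget(lag_names))
print(lags)
```

Because the list passed to rbind is named, the row names of the result carry the source data frame name, which serves a similar purpose to the id column from ldply().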

Resources