I am attempting to loop over both data frames and columns to produce multiple plots. I have a list of data frames, and for each, I want to plot the response against one of several predictors.
For example, I can easily loop across data frames:
df1=data.frame(response=rpois(10,1),value1=rpois(10,1),value2=rpois(10,1))
df2=data.frame(response=rpois(10,1),value1=rpois(10,1),value2=rpois(10,1))
#Looping across data frames
lapply(list(df1,df2), function(i) ggplot(i,aes(y=response,x=value1))+geom_point())
But I am having trouble looping across columns within a data frame:
lapply(list("value1","value2"), function(i) ggplot(df1,aes_string(x=i,y=response))+geom_point())
I suspect it has something to do with the way I am treating the aesthetics.
Ultimately I want to string together lapply to generate all combinations of data frames and columns.
Any help is appreciated!
EDIT: Joran has it! One must put the non-list responses in quotes when using aes_string
lapply(list("value1","value2"), function(i) ggplot(df1,aes_string(x=i,y="response"))+geom_point())
For reference, here is stringing the lapply functions to generate all combinations:
lapply(list(df1,df2), function(x)
lapply(list("value1","value2"), function(i) ggplot(x,aes_string(x=i,y="response"))+geom_point() ) )
Inside of aes_string(), all variables need to be represented as character. Add quotes around "response".
lapply(list("value1","value2"),
function(i) ggplot(df1, aes_string(x=i, y="response")) + geom_point())
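As a side note, aes_string() has been soft-deprecated since ggplot2 3.0.0, and the .data pronoun is the suggested replacement. Below is a minimal sketch of the same nested loop, assuming a ggplot2 version that supports .data inside aes(). The helper plot_one() and the list names are hypothetical, made up for illustration:

```r
library(ggplot2)

set.seed(1)
dfs <- list(
  df1 = data.frame(response = rpois(10, 1), value1 = rpois(10, 1), value2 = rpois(10, 1)),
  df2 = data.frame(response = rpois(10, 1), value1 = rpois(10, 1), value2 = rpois(10, 1))
)
predictors <- c("value1", "value2")

# Hypothetical helper: scatter plot of response against a column given by name.
plot_one <- function(df, col) {
  ggplot(df, aes(x = .data[[col]], y = response)) +
    geom_point() +
    labs(x = col)  # label the axis with the column name
}

# Outer loop over data frames, inner loop over predictor names.
# setNames() makes the inner list carry the predictor names.
plots <- lapply(dfs, function(df) {
  lapply(setNames(predictors, predictors), function(col) plot_one(df, col))
})

names(plots)      # one entry per data frame
names(plots$df1)  # one entry per predictor
```

Naming both levels of the list means a single plot can be pulled out as plots$df2$value1 instead of by position.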
I have a list of lists of matrices. All lists have the same number of matrices, all matrices have the same number of columns, and all matrices within a list have the same number of rows; however, the number of rows differs between matrices of different lists. I tried to recreate a small example dataset below:
set.seed(100)
biglist <- list(
a=list(foo=matrix(sample(1:10,30,replace=TRUE),ncol=3 ),
bar=matrix(sample(1:10,30,replace=TRUE),ncol=3),
puppy=matrix(sample(1:10,30,replace=TRUE),ncol=3)
),
b=list(foo=matrix(sample(1:10,24,replace=TRUE),ncol=3),
bar=matrix(sample(1:10,24,replace=TRUE),ncol=3),
puppy=matrix(sample(1:10,24,replace=TRUE),ncol=3)
)
)
I am trying to create a boxplot for each matrix, where each column gets its own box and all boxes for a matrix sit on the same plot. I can do this for a single matrix, but I am having trouble applying it to the entire list of lists.
Here is the code I have written for a single boxplot
yoptions=c('foo','bar','puppy')
mynames=c('first','second','third')
titleoptions=c('a','b')
plotdata=biglist$a$foo
colnames(plotdata)=mynames
afoo=ggplot(melt(as.data.table(plotdata)),aes(x=variable,y=value))+
geom_boxplot()+ggtitle(paste(mynames[1],titleoptions[1],sep=" "))+
ylab(paste(yoptions[1]))
This gives me what I want for the foo matrix in list a. I want to now be able to apply this to each matrix in the list, changing the plotdata, and titles and labels to match the corresponding matrix in the list, and then saving it to a variable that combines the list and the matrix. This would result in the following variables, all with a different plot attached to that variable:
afoo
abar
apuppy
bfoo
bbar
bpuppy
I know it is not very R-like, but the first idea I had was some sort of nested for-loop, though I am not sure how that would work in this case. I know lapply is used for lists and I've looked into nested lapplys (lapplies?), but I am not sure how to make that produce multiple variables that correspond to the figures I want.
Any advice is most appreciated! Thanks!
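Something like this (two nested loops):
library(ggplot2)
library(data.table)
# seq_along() is safer than 1:length(), which misbehaves on empty lists
for (i in seq_along(biglist)) {
  sublist <- biglist[[i]]
  for (j in seq_along(sublist)) {
    plotdata <- sublist[[j]]
    # variable name such as "first_a"; requires the english package
    name <- paste(english::ordinal(j), letters[i], sep = "_")
    assign(name, ggplot(melt(as.data.table(plotdata)),
                        aes(x = variable, y = value)) +
             geom_boxplot() + ggtitle(name) +
             ylab(names(sublist)[j]))
  }
}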
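An alternative to assign() is to collect the plots in one named list, which gives the afoo, abar, ... names asked for without creating loose variables. This is only a rough sketch, assuming ggplot2 is installed. It uses base stack() instead of melt() to keep dependencies down, and the plots list is my own naming:

```r
library(ggplot2)

set.seed(100)
biglist <- list(
  a = list(foo   = matrix(sample(1:10, 30, replace = TRUE), ncol = 3),
           bar   = matrix(sample(1:10, 30, replace = TRUE), ncol = 3),
           puppy = matrix(sample(1:10, 30, replace = TRUE), ncol = 3)),
  b = list(foo   = matrix(sample(1:10, 24, replace = TRUE), ncol = 3),
           bar   = matrix(sample(1:10, 24, replace = TRUE), ncol = 3),
           puppy = matrix(sample(1:10, 24, replace = TRUE), ncol = 3))
)
mynames <- c("first", "second", "third")

plots <- list()
for (outer in names(biglist)) {
  for (inner in names(biglist[[outer]])) {
    m <- biglist[[outer]][[inner]]
    colnames(m) <- mynames
    # stack() reshapes to long form with columns `values` and `ind`
    long <- stack(as.data.frame(m))
    plots[[paste0(outer, inner)]] <-
      ggplot(long, aes(x = ind, y = values)) +
      geom_boxplot() +
      ggtitle(paste(inner, outer)) +
      ylab(inner)
  }
}

names(plots)
```

A single figure is then available as plots$afoo or plots[["bpuppy"]], and the whole set can be passed to another lapply(), for example to save each plot.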
I want to combine all rows of several data sets. The names of all the data sets start with test, and all data sets have the same number of observations. I know I can combine them with rbind(), but typing the name of every data set takes a lot of time. Is there a better approach?
rbind(test1,test2,test3,test4)
Try first obtaining a list of all matching objects, using ls() with the pattern ^test to find their names and get() to fetch them:
library(data.table)
dfs <- lapply(ls(pattern="^test"), function(x) get(x))
result <- rbindlist(dfs)
I am taking the suggestion by @Rohit to use rbindlist, which makes it easier to rbind together a list of data frames.
The second line of the code above works only if the data sets are data.tables or data frames. If the data sets are in xts/zoo format, a slight change is needed: use do.call() with rbind() instead.
## First make a list of all your data sets as suggested above
list_xts <- lapply(ls(pattern="^test"), function(x) get(x))
## then use do call and rbind()
xts_results<-do.call(rbind,list_xts)
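For plain data frames the same idea works in base R alone. This is a small sketch with made-up data; the test1..test3 contents and the tightened pattern are my assumptions, not part of the question:

```r
# Toy data sets whose names all start with "test"
test1 <- data.frame(id = 1:2, x = c(0.1, 0.2))
test2 <- data.frame(id = 3:4, x = c(0.3, 0.4))
test3 <- data.frame(id = 5:6, x = c(0.5, 0.6))

# mget() fetches several objects by name and returns a named list.
# The stricter pattern avoids picking up unrelated objects such as "tester".
dfs <- mget(ls(pattern = "^test[0-9]+$"))

# do.call() passes the list elements to rbind() as separate arguments
combined <- do.call(rbind, dfs)
rownames(combined) <- NULL  # drop the "test1.1"-style row names

combined
```

Note that ls() returns names in alphabetical order, so test10 would sort before test2. If row order matters, sort the names numerically before calling mget().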
I have n data frames, each corresponding to data from a city.
There are 3 variables per data frame and currently they are all factor variables.
I want to transform all of them into numeric variables.
I have started by creating a vector with the names of all the data frames in order to use in a for loop.
cities <- as.vector(objects())
for ( i in cities){
i <- as.data.frame(lapply(i, function(x) as.numeric(levels(x))[x]))
}
Although the code runs and I get no error, I don't see any changes to my data frames: all three variables remain factor variables.
The strangest thing is that when doing them one by one (as below) it works:
df <- as.data.frame(lapply(df, function(x) as.numeric(levels(x))[x]))
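What you're essentially trying to do is convert each field to a numeric type if it is a factor. Since cities holds the names of the data frames rather than the data frames themselves, fetch them with mget() first. Also note that as.numeric() applied directly to a factor returns the internal integer codes, so convert through as.character(). One approach using purrr would be:
library(purrr)
map(mget(cities), ~ modify_if(.x, is.factor, function(f) as.numeric(as.character(f))))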
Note that modify() in itself is like lapply() but it doesn't change the underlying data structure of the objects you are modifying (in this case, dataframes). modify_if() simply takes a predicate as an additional argument.
For anyone who's interested in my question, I worked out the answer:
for ( i in cities){
assign(i, as.data.frame(lapply(get(i), function(x) as.numeric(levels(x))[x])))
}
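The original loop changes nothing because i is only a character string (the name of a data frame), so lapply(i, ...) never touches the data frame itself. Hence the need for get() and assign(). A sketch of an alternative that keeps the data frames in a named list instead of separate variables; the city names, columns, and the to_numeric() helper are invented for illustration:

```r
# Why the levels() idiom is needed: as.numeric() on a factor gives codes
f <- factor(c("10", "20", "10", "35"))
as.numeric(f)             # internal integer codes
as.numeric(levels(f))[f]  # the values the labels represent

# Hypothetical data: one data frame per city, all columns stored as factors
cities <- list(
  lisbon = data.frame(temp = factor(c("18.5", "21")), rain = factor(c("3", "0"))),
  oslo   = data.frame(temp = factor(c("4", "7.5")),   rain = factor(c("12", "9")))
)

# Hypothetical helper: convert every factor column, leave other columns alone.
# Assigning into df[] keeps the result a data frame.
to_numeric <- function(df) {
  df[] <- lapply(df, function(x) {
    if (is.factor(x)) as.numeric(levels(x))[x] else x
  })
  df
}

cities <- lapply(cities, to_numeric)
str(cities$oslo)
```

With the data in a list there is no need for assign(), and later steps can again be a single lapply() call.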
I am a naive user of R and am attempting to come to terms with the 'apply' series of functions which I now need to use due to the complexity of the data sets.
I have a large, ragged data frame that I wish to reshape before conducting a sequence of regression analyses. It is further complicated by having interlaced rows of descriptive data (characters).
My approach to date has been to use a factor to split the data frame into sets with equal row lengths (i.e. a list), then attempt to remove the trailing empty columns, make two new, matching lists, one of data and one of chars and then use reshape to produce a common column number, then recombine the sets in each list. e.g. a simplified example:
myDF <- as.data.frame(rbind(c("v1",as.character(1:10)),
c("v1",letters[1:10]),
c("v2",c(as.character(1:6),rep("",4))),
c("v2",c(letters[1:6], rep("",4)))))
myDF[,1] <- as.factor(myDF[,1])
myList <- split(myDF, myDF[,1])
myList[[1]]
I can remove the empty columns for an individual set and can split the data frame into two sets from the interlacing rows but have been stumped with the syntax in writing a function to apply the following function to the list - though 'lapply' with 'seq_along' should do it?
Thus for the individual set:
DF <- myList[[2]]
DF <- DF[,!sapply(DF, function(x) all(x==""))]
DF
(from an earlier answer to a similar, but simpler example on this site). I have a large data set and would like an elegant solution (I could use a loop but that would not use the capabilities of R effectively). Once I have done that I ought to be able to use the same rationale to reshape the frames and then recombine them.
regards
jac
Try
lapply(split(myDF, myDF$V1), function(x) x[!colSums(x=='')])
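A slightly more explicit variant of the same idea, for reference. Here a column is dropped only when every cell in it is empty, which matches the sapply() line in the question. The helper name drop_empty_cols() is hypothetical:

```r
myDF <- as.data.frame(rbind(c("v1", as.character(1:10)),
                            c("v1", letters[1:10]),
                            c("v2", c(as.character(1:6), rep("", 4))),
                            c("v2", c(letters[1:6], rep("", 4)))))

# Hypothetical helper: keep only columns that have at least one non-empty cell.
# drop = FALSE guards against a one-column result collapsing to a vector.
drop_empty_cols <- function(df) {
  keep <- !vapply(df, function(col) all(col == ""), logical(1))
  df[, keep, drop = FALSE]
}

# Split by the grouping column, then clean each piece
cleaned <- lapply(split(myDF, myDF$V1), drop_empty_cols)

sapply(cleaned, ncol)  # number of columns left in each set
```

The two versions differ when a column is only partly empty: !colSums(x == '') drops it, while the all() test keeps it.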
I want to split a large dataframe into a list of dataframes according to the values in two columns. I then want to apply a common data transformation on all dataframes (lag transformation) in the resulting list. I'm aware of the split command but can only get it to work on one column of data at a time.
You need to put all the factors you want to split by in a list, eg:
split(mtcars,list(mtcars$cyl,mtcars$gear))
Then you can use lapply on this to do what else you want to do.
If you want to avoid having zero row dataframes in the results, there is a drop parameter whose default is the opposite of the drop parameter in the "[" function.
split(mtcars,list(mtcars$cyl,mtcars$gear), drop=TRUE)
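To illustrate the lapply step, here is a short sketch that adds a lagged column to every piece. One caveat: base stats::lag() only changes the time-series attributes of a plain vector and does not shift its values, so the example defines its own small helper. lag1() and the mpg_lag column are hypothetical names:

```r
# Split by two columns; drop = TRUE removes empty combinations
groups <- split(mtcars, list(mtcars$cyl, mtcars$gear), drop = TRUE)

# Hypothetical helper: shift a vector down by one, padding with NA
lag1 <- function(x) c(NA, head(x, -1))

# Apply the same transformation to every data frame in the list
lagged <- lapply(groups, function(d) {
  d$mpg_lag <- lag1(d$mpg)
  d
})

names(lagged)                         # "cyl.gear" labels
lagged[["6.3"]][, c("mpg", "mpg_lag")]
```

The pieces can be put back together afterwards with do.call(rbind, lagged) if a single data frame is needed.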
How about this one:
library(plyr)
ddply(df, .(category1, category2), summarize, value1 = lag(value1), value2=lag(value2))
This seems like an excellent job for the plyr package and its ddply() function. If there are still open questions, please provide some sample data. Splitting should work on several columns as well:
df<- data.frame(value=rnorm(100), class1=factor(rep(c('a','b'), each=50)), class2=factor(rep(c('1','2'), 50)))
g <- list(df$class1, df$class2)  # a list of factors; c() would concatenate them into one long vector
split(df$value, g)
You can also do the following (the formula interface to split() for data frames requires R 4.1.0 or later):
split(x = df, f = ~ var1 + var2...)
This splits the data frame by several variables without wrapping them in a list for the f argument.