Using dplyr for exploratory plots - r

I regularly used d_ply to produce exploratory plots.
A trivial example:
require(plyr)
require(ggplot2)  # qplot() comes from ggplot2
plot_species <- function(species_data){
p <- qplot(data=species_data,
x=Sepal.Length,
y=Sepal.Width)
print(p)
}
d_ply(.data=iris,
.variables="Species",
function(x)plot_species(x))
This produces three separate plots, one for each species.
I would like to reproduce this behaviour using functions in dplyr.
This seems to require the reassembly of the data.frame within the function called by summarise, which is often impractical.
require(dplyr)
require(ggplot2)  # qplot() comes from ggplot2
iris_by_species <- group_by(iris,Species)
plot_species <- function(Sepal.Length,Sepal.Width){
species_data <- data.frame(Sepal.Length,Sepal.Width)
p <- qplot(data=species_data,
x=Sepal.Length,
y=Sepal.Width)
print(p)
}
summarise(iris_by_species, plot_species(Sepal.Length,Sepal.Width))
Can parts of the data.frame be passed to the function called by summarise directly, rather than passing columns?

I believe you can use do() for this task with the same function you used in d_ply. It prints directly to the plotting window and, if you use a named argument, also saves the plots as a list-column in the resulting data frame (see the help page; this is essentially like using dlply). I don't fully grasp everything do() can do, but without a named argument I get an error message (unnamed expressions in do() must return a data frame, and print() returns the ggplot object), although the plots still print to the plotting window (in RStudio).
library(dplyr)
library(ggplot2)
plot_species <- function(species_data){
p <- qplot(data=species_data,
x=Sepal.Length,
y=Sepal.Width)
print(p)
}
group_by(iris, Species) %>%
do(plot = plot_species(.))
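For comparison, the same "pass the whole per-group data frame to a function" pattern can be sketched in base R with split() and lapply(), without plyr or dplyr. This is a minimal sketch, not the answer's method: plot_species_base is a hypothetical helper that uses base graphics instead of qplot() so the block needs no extra packages, and it returns the row count only so the result can be inspected.

```r
# split() yields one complete data frame per species; lapply() then calls
# the function on each sub-data-frame, just as d_ply does.
plot_species_base <- function(species_data) {
  # Hypothetical helper: base-graphics scatterplot of one species
  plot(species_data$Sepal.Length, species_data$Sepal.Width,
       main = as.character(species_data$Species[1]),
       xlab = "Sepal.Length", ylab = "Sepal.Width")
  # Return the number of rows plotted so the caller has something to check
  nrow(species_data)
}

by_species <- split(iris, iris$Species)        # named list of 3 data frames
n_plotted  <- lapply(by_species, plot_species_base)
```

Swapping the body of plot_species_base for the original qplot()/print() code gives the ggplot2 version of the same loop.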

Related

ggplot within a function does not return a scatterplot with datapoints, instead a plot with dataframe values. How to fix this?

I'm writing a function that should return two ggplot objects to me in RStudio, based on two different data frames generated within the function. Instead, I get a plot with all the data frame values "printed" in it, not a normal scatterplot.
I tried:
return(list(df1, df2))
Plots<- list(df1, df2), return(Plots)
View(df1) View(df2)
calling ggplot() without storing it in an object
returning a single ggplot instead of using list() to return two
print() instead of return() or View()
Every attempt has the same outcome:
As you can see in the plot pane on the bottom right, I do not get a scatterplot. The console does show output [1] and [[2]], but nothing else. The code itself runs without problems.
I ran debug and got no errors; moreover, when I replaced ggplot with plot(), this DID return the preferred scatterplot to me. So I assume the problem is not related to the code itself.
However, I am much more familiar with customizing ggplot than plot(), so if anyone knows how to solve this issue it would be amazing. Below I added some sample data and sample code, although I'm not sure whether they are relevant to this issue.
The code I used within my function to create and return the ggplots is:
MD_filter_trial<- function(dataframe, mz_col, a = 0.00112, b = 0.01953){
MZ<- mz_col
MZR<- trunc(mz_col, digits = 0)#Either floor() or trunc() can be used for this part.
MD<- as.numeric(MZ-MZR)
MD.limit<- b + a*mz_col
dataframe<- dataframe%>%
dplyr::mutate(MD, MZ, MD.limit)%>%
dplyr::select(MD, MZ, MD.limit)
highlight_df <- dataframe %>% filter(MD >= MD.limit) #Notice how this is the exact opposite from the
MD_plot<- ggplot(data=dataframe, aes(x=MZ, y=MD))+
geom_point()+
geom_point(data=highlight_df, aes(x=MZ,y=MD), color='red')+#I added this one, so the data which will be removed will be highlighted in red.
ggtitle(paste("Unfiltered MD data - ", dataframe))
filtered<- dataframe%>%
filter(MD <= MD.limit)# As I understood: Basically all are coordinates. The maxima equation basically gives coordinates
MD_plot_2<- ggplot(data=filtered, aes(x=MZ, y=MD))+ #Filtered is basically the second dataframe, #which subsets datapoints with an Y value (which is the MD), below the linear equation MD...
geom_point()+
ggtitle(paste("Filtered MD data - ", dataframe))
N_Removed_datapoints <- nrow(dataframe) - nrow(filtered)
print(paste("Number of peaks removed:", N_Removed_datapoints))
MD_PLOTS<-list(dataframe, filtered, MD_plot, MD_plot_2)
return(MD_PLOTS)
}
Sample data:
structure(list(mz_col= c(99.0001, 99.0056, 99.0079, 99.0097, 99.0105,
99.0116, 99.0158, 99.0169, 99.019, 99.0196, 99.0207, 99.0215,
99.0239, 99.0252, 99.026, 99.0269, 99.0288, 99.0295, 99.0302,
99.0311, 99.0318, 99.0332, 99.034, 99.0346, 99.0355, 99.0376,
99.039, 99.04, 99.0405, 99.0414, 99.0421, 99.043, 99.0444, 99.0473,
99.048, 99.0517, 99.0536, 99.0547, 99.0556, 99.057, 99.0575,
99.0586, 99.0599, 99.0606, 99.0621, 99.0637, 99.0652, 99.0661,
99.0668, 99.0686, 99.0694, 99.0699, 99.0707, 99.0714, 99.072,
99.075, 99.0762, 99.0794, 99.0808, 99.0836, 99.0888, 99.0901,
99.0911, 99.092, 99.095, 99.0962, 99.1001, 99.1064, 99.1173,
99.4889, 99.5059, 99.5084, 99.5126, 99.5158, 99.5165, 99.5173,
99.5183, 99.526, 99.5266, 99.5315, 99.5345, 99.5358, 99.5402,
99.543, 99.5472, 99.548, 99.5529, 99.5572, 99.5577, 99.9408,
99.9551, 99.9599, 99.9646, 99.9718, 99.9887)), row.names = c(NA,
-95L), class = c("tbl_df", "tbl", "data.frame"))
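In your ggtitle calls, perhaps you mean:
ggtitle(paste("Filtered MD data -", deparse(substitute(dataframe))))
Within a function, this takes the name of the object passed to the dataframe argument and pastes it into a string, rather than pasting the whole data frame in. Note that substitute() only returns the caller's expression while the argument is unmodified: your function reassigns dataframe (dataframe<- dataframe%>% ...), after which deparse(substitute(dataframe)) would deparse the data itself. So capture the name in the first line of the function, e.g. df_name <- deparse(substitute(dataframe)), and use paste("Filtered MD data -", df_name) in both ggtitle calls.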
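The timing issue with substitute() can be checked in a few lines of base R. This is a minimal sketch under assumptions: name_early_vs_late and my_df are hypothetical, and base subsetting stands in for the dplyr::filter() step that reassigns the argument in MD_filter_trial.

```r
# Hypothetical helper that reassigns its argument, like MD_filter_trial does.
name_early_vs_late <- function(dataframe) {
  # Argument is still an unevaluated promise: substitute() returns the caller's symbol
  early <- deparse(substitute(dataframe))
  # Reassign the argument (stand-in for dataframe <- dataframe %>% filter(...))
  dataframe <- dataframe[dataframe$mz_col > 99.5, , drop = FALSE]
  # Now an ordinary local variable: substitute() returns its value, i.e. the data
  late <- deparse(substitute(dataframe))
  list(early = early, late = late,
       title = paste("Filtered MD data -", early))
}

my_df <- data.frame(mz_col = c(99.0001, 99.5059, 99.9408))
res <- name_early_vs_late(my_df)
res$title
```

Only the name captured before the reassignment is usable in a plot title; the later one is the deparsed contents of the data frame.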

Apply a function to different dataframes

I am trying to run a function over different datasets, and can't seem to get it to work. The variable names x and y are the same across datasets, but the dataset (argument z in my custom function) is different.
I have tried lapply, but it is not working.
Running the function over individual datasets works fine:
resultsmadrid <- customfunction (x=types, y=score, z=madrid)
resultsnavarra <- customfunction (x=types, y=score, z=navarra)
resultsaragon <- customfunction (x=types, y=score, z=aragon)
Trying to do it in one go is not working:
regiones <- list(madrid, navarra, aragon) #Creates the list
resultregiones <- lapply(regiones, customfunction(x=types, y=score, z=regiones)) #Applies that to the list (?)
It's not looping the analysis across the dataframes in the list, the error message says there is a missing argument in the function.
I am not clear on how to call each dataframe from the function argument that does that (z, in my case). It seems the name of the comprehensive list object is not the right approach. Thanks for the help!
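Since types and score are the same across datasets, you need to 'loop' through the elements of your regiones list and pass each one as z. The second argument of lapply must be a function; customfunction(x=types, y=score, z=regiones) is instead evaluated as soon as lapply starts, with the whole list as z. Wrap the call in an anonymous function instead. Try it like this: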
regiones <- list(madrid, navarra, aragon) #Creates the list
resultregiones <- lapply(regiones,function(X) customfunction(x=types, y=score, z=X))
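A self-contained sketch of the same fix, under stated assumptions: the real customfunction is not shown in the question, so the version below is hypothetical (it computes the mean score per type and looks up the unquoted column names inside z), and the three small data frames are made up. Naming the list elements also names the results.

```r
# Hypothetical stand-in for the asker's customfunction: mean score per type.
customfunction <- function(x, y, z) {
  grp <- eval(substitute(x), z)  # column named by `x`, looked up inside data frame z
  val <- eval(substitute(y), z)  # column named by `y`
  tapply(val, grp, mean)
}

# Made-up example data with the same column names in every data frame
madrid  <- data.frame(types = c("a", "a", "b"), score = c(1, 3, 5))
navarra <- data.frame(types = c("a", "b", "b"), score = c(2, 4, 6))
aragon  <- data.frame(types = c("b", "b", "a"), score = c(7, 9, 8))

# Named list, so resultregiones$madrid etc. work afterwards
regiones <- list(madrid = madrid, navarra = navarra, aragon = aragon)
resultregiones <- lapply(regiones, function(X) customfunction(x = types, y = score, z = X))
```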

Can I create value name from function argument, and assign a function output to it?

I have created a function which cleans up my data and plots using ggplot. I want to name the cleaned data and plot with a suffix so that it can be recalled easily.
For example:
data_frame
data_frame_cleaned
data_frame_plot
I haven't managed to find anything that might pull this off.
I read about using deparse(substitute(x)) to turn the variable into a string, so I gave it a shot together with paste().
# import a new data frame
my_data <- read.csv("my_data.csv")
analyze_data(my_data)
Then I want analyze_data() to store the cleaned data and the plot in the environment. Here is the function (using dplyr and ggplot) that I thought might work, but it doesn't:
analyze_data <- function(x){
x_data <- x %>%
filter()%>%
group_by() %>%
summarize() %>%
mutate()
x_plot <- ggplot(x_data)
x_name <- deparse(substitute(x))
assign(paste(x_name,"cleaned",sep="_"),x_data)
assign(paste(x_name,"plot",sep="_"),x_plot)
}
I got a warning message instead:
Warning messages:
1: In assign(paste(x_name, "cost_plot", sep = "_"), campg_data) :
only the first element is used as variable name
Using assign to assign variables is not the best idea. You can litter your environment with lots of variables, which can become confusing, and makes it difficult to handle them programmatically. It's better to store your objects in something like a list, which allows you to extract data easily or modify it in sequence using the *apply or map_* functions. That said…
I cannot replicate the warning when I run your function more or less as it is above. Nevertheless, although the function seems to run just fine, it doesn't do what is desired, i.e. no new variables appear in .GlobalEnv. The issue is that you haven't specified the environment in which the variables should be assigned, so they are assigned within the function's own local environment and vanish when the function completes.
You can use pos = 1 (equivalently envir = .GlobalEnv) to assign your variables within .GlobalEnv. The following code creates the variables mtcars_cleaned and mtcars_plot in my .GlobalEnv:
library(dplyr)
library(ggplot2)
analyze_data <- function(x){
x_data <- x %>%
filter(cyl > 4)
x_plot <- ggplot(x_data, aes(mpg, disp)) + geom_point()
x_name <- deparse(substitute(x))
assign(paste(x_name,"cleaned", sep="_"), x_data, pos = 1)
assign(paste(x_name,"plot", sep="_"), x_plot, pos = 1)
}
analyze_data(mtcars)
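Following the list-based advice at the start of this answer, a minimal sketch of the alternative: return a named list and only copy it into the global environment if separate variables are really wanted. Assumptions: analyze_data_list is hypothetical, base subsetting stands in for the dplyr steps, and an aggregate() summary stands in for the ggplot object so the block runs with base R only.

```r
# Hypothetical list-returning version of analyze_data().
analyze_data_list <- function(x) {
  x_name <- deparse(substitute(x))        # capture the caller's name before touching x
  x_data <- x[x$cyl > 4, , drop = FALSE]  # stand-in for the dplyr cleaning steps
  x_summary <- aggregate(mpg ~ cyl, data = x_data, FUN = mean)  # stand-in for the plot
  out <- list(x_data, x_summary)
  names(out) <- paste(x_name, c("cleaned", "summary"), sep = "_")
  out
}

res <- analyze_data_list(mtcars)
# Optional: turn the list elements into separate variables in .GlobalEnv
invisible(list2env(res, envir = .GlobalEnv))
```

Keeping res as a list means the results can still be processed together with lapply() or the map_* functions; list2env() is a single explicit step instead of several assign() calls.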

Creating a function that names objects created based on a function in R

I have a question about creating a function that creates ggplots. I want to create my own function to graph values in multiple data frames quickly instead of writing a whole ggplot with each argument filled out each time. What I want to do is to input a vector of the names of the data frames, have the function create the graphs and have each saved as a new object with a different name. Example of my idea is…
myfunction <- function(x) {
ggplot(x, aes(x = time, y = result)) +
geom_point()
}
I want to be able to do something like
myfunction(c(testtype1, testtype2, testtype3))
and have the function create objects plot1, plot2, plot3. As of now, I can only do
plot1 <- myfunction(testtype1)
plot2 <- myfunction(testtype2)
plot3 <- myfunction (testtype3)
I don’t want to keep typing that over and over, especially if I have a lot of test types. Is there a way that the function can be modified to use the function to name the objects according to some formula?
With the wrapper below, you can pass any number of (appropriate) data frames, and l_my_fun returns a list containing the plots.
l_my_fun <- function(x, ...) {
l <- list(x, ...)
ps <- lapply(l, myfunction)
ps
}
out <- l_my_fun(testtype1, testtype2, testtype3)
For example, now access the second plot as
out[[2]]
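If you would rather access the plots by data frame name than by position, one option is to build a named list with mget() and lapply() over it, since lapply() keeps the names. This is a sketch under assumptions: the data frames are made up, and myfunction is replaced by a hypothetical stand-in returning range(x$result) so the block runs without ggplot2; with ggplot2 loaded, the original myfunction works the same way.

```r
# Hypothetical stand-in for myfunction: anything that takes one data frame.
myfunction <- function(x) range(x$result)

# Made-up test data with the time/result columns used in the question
testtype1 <- data.frame(time = 1:3, result = c(2, 5, 3))
testtype2 <- data.frame(time = 1:3, result = c(1, 4, 9))
testtype3 <- data.frame(time = 1:3, result = c(7, 6, 8))

dfs   <- mget(paste0("testtype", 1:3))  # named list: testtype1, testtype2, testtype3
plots <- lapply(dfs, myfunction)        # names carry over from dfs
plots$testtype2
```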

Use of ggplot() within a for-loop in another function in R without returning a graph object?

All, I have been given the task of writing a function that returns a list as output and, at the same time, plots something.
Say my function is as follows, using the mtcars data:
library(ggplot2)
data(mtcars)
myfunc<-function(mtcars){
for(i in 1:ncol(mtcars)){
g1<- ggplot(mtcars, aes(x=mtcars[,i]))
g1 + geom_histogram()+
geom_vline(xintercept=mean(mtcars[,i]),col="red")
}
return (list(mtcars))
}
myfunc(mtcars)
How can I modify the above code so that it returns the list as wanted and also displays the ggplots?
If your question is "why does this not display any plots?", then the answer is this:
In the R command line, just typing a variable name or an expression invokes the print method. This does not happen in functions, or in loops, or when using source(...), so to cause anything to display (print or plot), you need to do that explicitly. But this is only part of your problem.
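Using an index in the aes(...) call is a monumentally bad idea. Rather, extract the column name and use that in a call to aes_string(...) (in current ggplot2 versions aes_string() is deprecated, and aes(x = .data[[col]]) is the recommended equivalent):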
library(ggplot2)
myfunc<-function(mtcars){
for(i in 1:ncol(mtcars)){
col <- names(mtcars)[i]
ggp <- ggplot(mtcars, aes_string(x=col))+
geom_histogram()+
geom_vline(xintercept=mean(mtcars[[col]]),col="red")
plot(ggp)
}
return (list(mtcars))
}
myfunc(mtcars)
This will plot the histograms.
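The auto-printing rule described in this answer can be checked directly with capture.output(). This minimal sketch uses plain numbers instead of ggplot objects, and f is a hypothetical function; the same rule is why plot(ggp) (or print(ggp)) is needed inside the loop.

```r
# Values computed inside a for loop are not auto-printed...
out_loop  <- capture.output(for (i in 1:2) i * 10)
# ...but an explicit print() inside the loop is
out_print <- capture.output(for (i in 1:2) print(i * 10))

f <- function() {
  "not printed"  # intermediate value inside a function body: silently discarded
  "returned"     # only the visible return value is printed at top level
}
out_fun <- capture.output(f())
```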
