Two ggplot with subset in pipe - r

I would like to plot two lines in one plot (both have the same axes), but one of the lines uses a subset of the values in the data frame.
I tried this:
DF%>% ggplot(subset(., Cars == "A"), aes(Dates, sold_A)) +geom_line()+ ggplot(., (Dates, sold_ALL))
but this error occurred
object '.' not found

(1) You can't add a ggplot object to a ggplot object.
(2) Try taking the subset out of the call to ggplot.
DF %>%
subset(Cars == "A") %>%
ggplot(aes(Dates, sold_A)) +
geom_line() +
geom_line(data = DF, aes(Dates, sold_ALL))

I think you are misunderstanding how ggplot works. If we were to do it your way, we could write:
DF %>% {ggplot(subset(., Cars == "A"), aes(Dates, sold_A)) +
geom_line(colour = "red") +
geom_line(data = subset(., Cars == "B"), colour = "blue") +
lims(y = c(0, 60))}
But it would be easier and better to map the variable Cars to the colour aesthetic, so your plot would be as simple as:
DF %>% ggplot(aes(Dates, sold_A, color = Cars)) + geom_line() + lims(y = c(0, 60))
Note that as well as being simpler code, we get the legend for free.
Data
Obviously, we didn't have your data for this question, but here is a constructed data set with the same name and same column variables:
set.seed(1)
Dates <- rep(seq(as.Date("2020-01-01"), by = "day", length = 20), 2)
Cars <- rep(c("A", "B"), each = 20)
sold_A <- rpois(40, rep(c(20, 40), each = 20))
DF <- data.frame(Dates, Cars, sold_A)

If you want only one plot, you need to replace ggplot(., aes(Dates, sold_ALL)) with a layer such as geom_line(data=., aes(Dates, sold_ALL)) and wrap the whole expression in braces so that . is available throughout. Then, use the sage advice from @MrFlick. Here is an example using the iris data:
library(ggplot2)
library(dplyr)
#Example
iris %>%
{ggplot(subset(., Species == "setosa"), aes(Sepal.Length, Sepal.Width)) +
geom_point()+
geom_point(data=.,aes(Petal.Length, Petal.Width),color='blue')}
Output: (plot image not shown)
The ggplot(., aes(Dates, sold_ALL)) call creates a new canvas and a new plot instead of adding a layer to the existing one.
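A related option, sketched here under assumptions, is to pass the full data frame to ggplot() once and let a single layer do the subsetting by giving that layer's data argument a function. The column name sold and the toy data below are made up for illustration; they are not the asker's data.

```r
library(ggplot2)

# Hypothetical data: one row per date and car type.
set.seed(1)
DF <- data.frame(
  Dates = rep(seq(as.Date("2020-01-01"), by = "day", length.out = 20), 2),
  Cars  = rep(c("A", "B"), each = 20),
  sold  = rpois(40, rep(c(20, 40), each = 20))
)

# The first layer draws every car; the second layer receives the plot data
# and keeps only the rows for car "A".
p <- ggplot(DF, aes(Dates, sold)) +
  geom_line(aes(group = Cars), colour = "grey60") +
  geom_line(data = function(d) subset(d, Cars == "A"), colour = "red")
p
```

Because the subsetting happens inside the layer, no pipe and no . placeholder are needed at all.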

Related

How can you plot `geom_point()` with `facet_wrap()` using per-group row number as x?

Is there a way to plot geom_point() so that it implicitly uses the row number as x in a facet? Just like plot(y) but also for multiple facets.
The following fails with Error: geom_point requires the following missing aesthetics: x:
df = data.frame(y = rnorm(60), group = rep(c("A", "B", "C"), 20))
ggplot(df, aes(y = y)) +
geom_point() +
facet_wrap(~group)
Naturally, you can do it using something like the following, but it is quite cumbersome.
df = df %>%
group_by(group) %>%
mutate(row = row_number())
ggplot(df, aes(x = row, y = y)) +
geom_point() +
facet_wrap(~group)
You can try this:
ggplot(df, aes(x=seq(y),y = y))+geom_point() + facet_wrap(~group)
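In that way you can avoid creating an index variable, as you mentioned. Note that seq(y) is evaluated on the whole data frame, so x is the overall row number rather than one that restarts at 1 in each facet; the points keep their order within each group, but their x positions are not the per-group row numbers.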
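If the x position should restart at 1 in every facet without adding a column to the data, one possibility is to compute the per-group index inside aes() with base R's ave(). This is only a sketch; the seed and the axis label are arbitrary choices.

```r
library(ggplot2)

set.seed(42)
df <- data.frame(y = rnorm(60), group = rep(c("A", "B", "C"), 20))

# ave() applies seq_along() within each level of `group`, so the result
# counts 1, 2, 3, ... separately for A, B and C.
p <- ggplot(df, aes(x = ave(y, group, FUN = seq_along), y = y)) +
  geom_point() +
  facet_wrap(~group) +
  xlab("row within group")
p
```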

How do I facet by geom / layer in ggplot2?

I'm hoping to recreate the gridExtra output below with ggplot's facet_grid, but I'm unsure of what variable ggplot identifies with the layers in the plot. In this example, there are two geoms...
require(tidyverse)
a <- ggplot(mpg)
b <- geom_point(aes(displ, cyl, color = drv))
c <- geom_smooth(aes(displ, cyl, color = drv))
d <- a + b + c
# output below
gridExtra::grid.arrange(
a + b,
a + c,
ncol = 2
)
# Equivalent with gg's facet_grid
# needs a categorical var to iter over...
d$layers
#d + facet_grid(. ~ d$layers??)
The gridExtra output that I'm hoping to recreate is:
A hacky way of doing this is to make as many copies of the existing data frame as you need, each tagged with a value that is used later for faceting and filtering. Combine the copies into one data frame with rbind (a union). Then set up the ggplot and the geoms, filtering the data for each geom down to its tag, and facet on the same tag column to split the plots.
This can be seen below:
df1 <- data.frame(
graph = "point_plot",
mpg
)
df2 <- data.frame(
graph = "spline_plot",
mpg
)
df <- rbind(df1, df2)
ggplot(df, mapping = aes(x = displ, y = hwy, color = class)) +
geom_point(data = filter(df, graph == "point_plot")) +
geom_smooth(data = filter(df, graph == "spline_plot"), se=FALSE) +
facet_grid(. ~ graph)
If you really want to show different plots on different facets, one hacky way would be to make separate copies of the data and subset those...
mpg2 <- mpg %>% mutate(facet = 1) %>%
bind_rows(mpg %>% mutate(facet = 2))
ggplot(mpg2, aes(displ, cyl, color = drv)) +
geom_point(data = subset(mpg2, facet == 1)) +
geom_smooth(data = subset(mpg2, facet == 2)) +
facet_wrap(~facet)
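A variation on the same idea, offered as a sketch rather than a definitive recipe, avoids building the stacked data frame by hand: each layer gets a function as its data argument, and that function tags the plot data with a facet label. The column name layer and its two values are hypothetical, and hwy with a linear smoother is used here only to keep the example quiet.

```r
library(ggplot2)

# Each layer tags its own copy of the plot data with a (hypothetical) column
# called `layer`, which facet_wrap() then uses to separate the geoms.
p <- ggplot(mpg, aes(displ, hwy, colour = drv)) +
  geom_point(data = function(d) transform(d, layer = "points")) +
  geom_smooth(data = function(d) transform(d, layer = "smooth"),
              method = "lm", formula = y ~ x, se = FALSE) +
  facet_wrap(~layer)
p
```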

R: Unexplainable behavior of ggplot inside a function

I have composed a function that develops histograms using ggplot2 on the numerical columns of a dataframe that will be passed to it. The function stores these plots into a list and then returns the list.
However when I run the function I get the same plot again and again.
My code is the following and I provide also a reproducible example.
hist_of_columns = function(data, class, variables_to_exclude = c()){
library(ggplot2)
library(ggthemes)
data = as.data.frame(data)
variables_numeric = names(data)[unlist(lapply(data, function(x){is.numeric(x) | is.integer(x)}))]
variables_not_to_plot = c(class, variables_to_exclude)
variables_to_plot = setdiff(variables_numeric, variables_not_to_plot)
indices = match(variables_to_plot, names(data))
index_of_class = match(class, names(data))
plots = list()
for (i in (1 : length(variables_to_plot))){
p = ggplot(data, aes(x= data[, indices[i]], color= data[, index_of_class], fill=data[, index_of_class])) +
geom_histogram(aes(y=..density..), alpha=0.3,
position="identity", bins = 100)+ theme_economist() +
geom_density(alpha=.2) + xlab(names(data)[indices[i]]) + labs(fill = class) + guides(color = FALSE)
name = names(data)[indices[i]]
plots[[name]] = p
}
plots
}
data(mtcars)
mtcars$am = factor(mtcars$am)
data = mtcars
variables_to_exclude = 'mpg'
class = 'am'
plots = hist_of_columns(data, class, variables_to_exclude)
If you check the list plots you will discover that it contains the same plot repeated.
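Simply use aes_string to pass string variables into the ggplot() call. Right now, the expressions inside aes() index data directly instead of naming columns of ggplot's data argument. aes() does not evaluate them when the plot is created; they are evaluated when the plot is drawn, and by then the loop has finished and i holds its final value, so every plot in the list shows the last variable. Below, x, color, and fill are separate vectors that ggplot treats as unrelated to its data argument, even though they derive from the same source: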
ggplot(data, aes(x= data[, indices[i]], color= data[, index_of_class], fill=data[, index_of_class]))
However, with aes_string, passing string names to x, color, and fill will point to data:
ggplot(data, aes_string(x= names(data)[indices[i]], color= class, fill= class))
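Newer ggplot2 releases mark aes_string() as deprecated in favour of the .data pronoun, so the same idea can be sketched as follows. The helper name hist_by_class and its arguments are made up for this illustration. lapply() is used instead of a for loop so that each plot keeps its own column name; inside a for loop, .data[[v]] would still look up v only when the plot is drawn.

```r
library(ggplot2)

# Hypothetical helper: one histogram per column named in `vars`,
# filled by the column named in `class`.
hist_by_class <- function(data, class, vars) {
  plots <- lapply(vars, function(v) {
    force(v)  # fix the column name for this plot
    ggplot(data, aes(x = .data[[v]], fill = .data[[class]])) +
      geom_histogram(bins = 10, alpha = 0.3, position = "identity") +
      labs(x = v, fill = class)
  })
  names(plots) <- vars
  plots
}

cars <- mtcars
cars$am <- factor(cars$am)
plots <- hist_by_class(cars, "am", c("hp", "wt"))
```

Printing plots$hp and plots$wt should now show two different histograms.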
Here is a strategy using tidyeval that does what you are after:
library(rlang)
library(tidyverse)
hist_of_cols <- function(data, class, drop_vars) {
# tidyeval overhead
class_enq <- enquo(class)
drop_enqs <- enquo(drop_vars)
data %>%
group_by(!!class_enq) %>% # keep the 'class' column always
select(-!!drop_enqs) %>% # drop any 'drop_vars'
select_if(is.numeric) %>% # keep only numeric columns
gather("key", "value", -!!class_enq) %>% # go to long form
split(.$key) %>% # make a list of data frames
map(~ ggplot(., aes(value, fill = !!class_enq)) + # plot as usual
geom_histogram() +
geom_density(alpha = .5) +
labs(x = unique(.$key)))
}
hist_of_cols(mtcars, am, mpg)
hist_of_cols(mtcars, am, c(mpg, wt))

Select specific column data to geom point

I have this data:
Date ID Value
10-Apr-17 12:02:30 A 4.107919756
10-Apr-17 12:02:31 A 4.107539119
10-Apr-17 12:02:32 A 5.503949115
10-Apr-17 12:02:33 B 5.842728032
10-Apr-17 12:02:34 B 8.516053634
10-Apr-17 12:02:35 B 1.515112486
10-Apr-17 12:02:36 B 5.224667007
I want to plot geom_point using only the rows where ID == 'A'.
library(ggplot2)
library(lubridate)
library(magrittr)
thedata <- read.csv("~/Downloads/Vel.csv", header = TRUE)
thedata$newDate <- dmy_hms(thedata$Date)
ggplot(thedata, aes(newDate, Value)) +
geom_point(thedata=thedata$ID %>% filter(thedata$ID == "A"))
But it plots all points (A and B IDs).
And it gives me
"Warning: Ignoring unknown parameters: thedata"
when using ggplot.
UPDATE
Using:
thedata <- read.csv("~/Downloads/Vel.csv", header = TRUE)
thedata <- as.data.frame(thedata)
thedata$newDate <- dmy_hms(thedata$Date)
ggplot(thedata, aes(newDate, Value)) +
geom_point(data=thedata$ID %>% filter(thedata$ID == "A"))
Hence, converting thedata to a data frame and using geom_point(data=thedata$ID %>% instead of geom_point(thedata=thedata$ID %>%, as @aosmith pointed out,
results in:
Error: ggplot2 doesn't know how to deal with data of class ts
I think this is how you should do it:
ggplot(thedata %>% dplyr::filter(ID == "A"), aes(newDate, Value)) +
geom_point()
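The point is that filter() needs the whole data frame, not the single column thedata$ID; without dplyr loaded, filter() is stats::filter(), which returns a ts object, hence the error. You could also leave ggplot() empty and pass the filtered data frame to the geom: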
ggplot() +
geom_point(data = thedata %>% dplyr::filter(ID == "A"), aes(newDate, Value))
Edit:
I updated the second code block so it should work now.
About the filter() function: you don't need to pipe thedata in your case. This works just as well and is easier to read: geom_point(data = filter(thedata, ID == "A"), aes(newDate, Value))
Also, it's only my opinion, but it may be more interesting to plot the whole data and colour by ID, like this:
ggplot() +
geom_point(data = thedata, aes(newDate, Value, colour = ID))
To finish on the question of feeding ggplot() with a data frame, note that you can give each geom its own data, as in this example with the mtcars dataset:
ggplot() +
geom_point(data = mtcars, aes(mpg, disp, colour = cyl)) +
geom_point(data = filter(mtcars, cyl == 6), aes(qsec, drat))
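To make the cause of the "class ts" error concrete, here is a small sketch built from the sample rows in the question. The timestamps are rebuilt by hand as an assumption instead of being parsed from the CSV, and base subset() stands in for dplyr::filter() so that only ggplot2 is needed.

```r
library(ggplot2)

# Sample rows from the question; timestamps rebuilt by hand (assumption).
thedata <- data.frame(
  newDate = as.POSIXct("2017-04-10 12:02:30", tz = "UTC") + 0:6,
  ID      = c("A", "A", "A", "B", "B", "B", "B"),
  Value   = c(4.107919756, 4.107539119, 5.503949115, 5.842728032,
              8.516053634, 1.515112486, 5.224667007)
)

# stats::filter() is a time-series filter and returns a "ts" object,
# which is not something a ggplot layer can use as data.
ts_result <- stats::filter(thedata$Value, 1)
class(ts_result)

# Subsetting the data frame itself gives the layer what it needs.
p <- ggplot(thedata, aes(newDate, Value)) +
  geom_point(data = subset(thedata, ID == "A"))
p
```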

How to mark minimum point from ggplot line plot [duplicate]

I am using the built-in economics dataset (from the ggplot2 package) in R, and have plotted a time series for each variable in the same graph using the following code:
library(reshape2)
library(ggplot2)
me <- melt(economics, id = c("date"))
ggplot(data = me) +
geom_line(aes(x = date, y = value)) +
facet_wrap(~variable, ncol = 1, scales = 'free_y')
Now I want to refine my graph further: for each series, I want to display a red point at the smallest and the largest value.
So I thought that if I could find the coordinates of the min and max of each time series, I could find a way to plot a red dot at those points in each time series. For this I used the following code:
which(pce == min(economics$pce), arr.ind = TRUE)
which(pca == max(pca), arr.ind = TRUE)
This doesn't really lead me anywhere.
Thank you:)
Method 1: Using Joins
This can be nice when you want to save the filtered subsets
library(reshape2)
library(ggplot2)
library(dplyr)
me <- melt(economics, id=c("date"))
me %>%
group_by(variable) %>%
summarise(min = min(value),
max = max(value)) -> me.2
left_join(me, me.2) %>%
mutate(color = value == min | value == max) %>%
filter(color == TRUE) -> me.3
ggplot(data=me, aes(x = date, y = value)) +
geom_line() +
geom_point(data=me.3, aes(x = date, y = value), color = "red") +
facet_wrap(~variable, ncol=1, scales='free_y')
Method 2: Simplified without Joins
Thanks @Gregor
me.2 <- me %>%
group_by(variable) %>%
mutate(color = (min(value) == value | max(value) == value))
ggplot(data=me.2, aes(x = date, y = value)) +
geom_line() +
geom_point(aes(color = color)) +
facet_wrap(~variable, ncol=1, scales="free_y") +
scale_color_manual(values = c(NA, "red"))
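For readers who prefer tidyr to reshape2, a third variant can be sketched as below. It reshapes with pivot_longer() and keeps the extreme rows with a grouped filter(), so no join and no NA colour are needed. The object names long and extremes are arbitrary.

```r
library(ggplot2)
library(dplyr)
library(tidyr)

# Long format: one row per date and variable.
long <- pivot_longer(economics, -date,
                     names_to = "variable", values_to = "value")

# Keep only the rows holding each variable's minimum or maximum.
extremes <- long %>%
  group_by(variable) %>%
  filter(value == min(value) | value == max(value)) %>%
  ungroup()

p <- ggplot(long, aes(date, value)) +
  geom_line() +
  geom_point(data = extremes, colour = "red") +
  facet_wrap(~variable, ncol = 1, scales = "free_y")
p
```

If a minimum or maximum occurs on several dates, every tied row is kept and gets its own red point.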
