Select specific column data to geom point


I have this data:
Date ID Value
10-Apr-17 12:02:30 A 4.107919756
10-Apr-17 12:02:31 A 4.107539119
10-Apr-17 12:02:32 A 5.503949115
10-Apr-17 12:02:33 B 5.842728032
10-Apr-17 12:02:34 B 8.516053634
10-Apr-17 12:02:35 B 1.515112486
10-Apr-17 12:02:36 B 5.224667007
I want to plot geom_point using only the rows where ID == 'A'.
library(ggplot2)
library(lubridate)
library(magrittr)
thedata <- read.csv("~/Downloads/Vel.csv", header = TRUE)
thedata$newDate <- dmy_hms(thedata$Date)
ggplot(thedata, aes(newDate, Value)) +
geom_point(thedata=thedata$ID %>% filter(thedata$ID == "A"))
But it plots all points (A and B IDs).
And it gives me
"Warning: Ignoring unknown parameters: thedata"
when using ggplot.
UPDATE
Using :
thedata <- read.csv("~/Downloads/Vel.csv", header = TRUE)
thedata <- as.data.frame(thedata)
thedata$newDate <- dmy_hms(thedata$Date)
ggplot(thedata, aes(newDate, Value)) +
geom_point(data=thedata$ID %>% filter(thedata$ID == "A"))
So, after converting the data to a data frame and using geom_point(data = thedata$ID %>% ...) instead of geom_point(thedata = thedata$ID %>% ...), as @aosmith pointed out, the result is:
Error: ggplot2 doesn't know how to deal with data of class ts

I think this is how you should do it :
ggplot(thedata %>% dplyr::filter(ID == "A"), aes(newDate, Value)) +
geom_point()
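The point is that the data argument of a geom has to be a data frame. thedata$ID is a single column (a vector), and because dplyr is not loaded, filter() here is most likely stats::filter(), which returns a ts object; that would explain the error. Filter the whole data frame instead. You can also pass the filtered data to the geom rather than to ggplot():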
ggplot() +
geom_point(data = thedata %>% dplyr::filter(ID == "A"), aes(newDate, Value))
edit :
I updated the second code block so it should work now.
About the filter() function: you don't need to pipe thedata in your case. This works just as well and is easier to read: geom_point(data = filter(thedata, ID == "A"), aes(newDate, Value))
Also, it's only my opinion, but it would probably be more interesting to plot the whole data and colour by ID, like this:
ggplot() +
geom_point(data = thedata, aes(newDate, Value, colour = ID))
To finish on the question of feeding ggplot() with a dataframe, note that you can specify different data for each geom, as in this example with the mtcars dataset (filter() here is dplyr::filter()):
ggplot() +
geom_point(data = mtcars, aes(mpg, disp, colour = cyl)) +
geom_point(data = filter(mtcars, cyl == 6), aes(qsec, drat))
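As a small check of the per-layer data idea, here is a minimal sketch that uses only ggplot2 and base R subsetting. The data frame is a hypothetical stand-in for Vel.csv (values copied from the question, timestamps built directly as POSIXct), and only_a is a name introduced for illustration.

```r
library(ggplot2)

# Hypothetical stand-in for Vel.csv: values copied from the question,
# timestamps built directly as POSIXct to avoid locale-dependent parsing.
thedata <- data.frame(
  newDate = as.POSIXct("2017-04-10 12:02:30", tz = "UTC") + 0:6,
  ID = c("A", "A", "A", "B", "B", "B", "B"),
  Value = c(4.107919756, 4.107539119, 5.503949115, 5.842728032,
            8.516053634, 1.515112486, 5.224667007)
)

# Subset the rows of the data frame (not the ID column) before
# handing them to the layer.
only_a <- thedata[thedata$ID == "A", ]

# The layer's own data argument replaces the plot-level data for this layer.
p <- ggplot(thedata, aes(newDate, Value)) +
  geom_point(data = only_a)
```

Printing p should draw only the three ID "A" points, while the aesthetics are still inherited from ggplot().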

Related

Two ggplot with subset in pipe

I would like to plot two lines in one plot (both have the same axes), but one of the lines should use only a subset of the data frame.
I tried this:
DF%>% ggplot(subset(., Cars == "A"), aes(Dates, sold_A)) +geom_line()+ ggplot(., (Dates, sold_ALL))
but this error occurred
object '.' not found

(1) You can't add a ggplot object to a ggplot object.
(2) Try taking the subset out of the call to ggplot.
DF %>%
subset(Cars == "A") %>%
ggplot(aes(Dates, sold_A)) +
geom_line() +
geom_line(data = DF, aes(Dates, sold_ALL))
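To try the pattern above without the asker's data, here is a minimal sketch on made-up data. The values of sold_A and sold_ALL are random placeholders, and group = Cars in the second layer is an added assumption so that each car gets its own line.

```r
library(ggplot2)
library(magrittr)  # provides %>%

set.seed(1)
# Constructed data; sold_A and sold_ALL are random placeholders.
DF <- data.frame(
  Dates = rep(seq(as.Date("2020-01-01"), by = "day", length.out = 10), 2),
  Cars = rep(c("A", "B"), each = 10),
  sold_A = rpois(20, 20),
  sold_ALL = rpois(20, 40)
)

# The subset feeds ggplot(); the second layer gets the full data explicitly.
p <- DF %>%
  subset(Cars == "A") %>%
  ggplot(aes(Dates, sold_A)) +
  geom_line() +
  geom_line(data = DF, aes(Dates, sold_ALL, group = Cars), colour = "blue")
```

Because %>% binds more tightly than +, the piped subset reaches ggplot() first and the layers are then added to the resulting plot object.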

I think you are misunderstanding how ggplot works. If we are attempting to do it your way, we could do:
DF %>% {ggplot(subset(., Cars == "A"), aes(Dates, sold_A)) +
geom_line(colour = "red") +
geom_line(data = subset(., Cars == "B"), colour = "blue") +
lims(y = c(0, 60))}
But it would be easier and better to map the variable Cars to the colour aesthetic, so your plot would be as simple as:
DF %>% ggplot(aes(Dates, sold_A, color = Cars)) + geom_line() + lims(y = c(0, 60))
Note that as well as being simpler code, we get the legend for free.
Data
Obviously, we didn't have your data for this question, but here is a constructed data set with the same name and same column variables:
set.seed(1)
Dates <- rep(seq(as.Date("2020-01-01"), by = "day", length = 20), 2)
Cars <- rep(c("A", "B"), each = 20)
sold_A <- rpois(40, rep(c(20, 40), each = 20))
DF <- data.frame(Dates, Cars, sold_A)

If you want only one plot, you need to replace ggplot(., aes(Dates, sold_ALL)) with a layer such as geom_line(data = ., aes(Dates, sold_ALL)). Then use the sage advice from @MrFlick. Here is an example using the iris data:
library(ggplot2)
library(dplyr)
#Example
iris %>%
{ggplot(subset(., Species == "setosa"), aes(Sepal.Length, Sepal.Width)) +
geom_point()+
geom_point(data=.,aes(Petal.Length, Petal.Width),color='blue')}
The second call, ggplot(., aes(Dates, sold_ALL)), creates a new canvas and a new plot, which is why it cannot be added to the first one.

How can you plot `geom_point()` with `facet_wrap()` using per-group row number as x?

Is there a way to plot geom_point() so that it implicitly uses the row number as x in a facet? Just like plot(y) but also for multiple facets.
The following fails with Error: geom_point requires the following missing aesthetics: x:
df = data.frame(y = rnorm(60), group = rep(c("A", "B", "C"), 20))
ggplot(df, aes(y = y)) +
geom_point() +
facet_wrap(~group)
Naturally, you can do it using something like the following, but it is quite cumbersome.
df = df %>%
group_by(group) %>%
mutate(row = row_number())
ggplot(df, aes(x = row, y = y)) +
geom_point() +
facet_wrap(~group)

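You can try this:
ggplot(df, aes(x = seq(y), y = y)) + geom_point() + facet_wrap(~group)
This avoids creating an index variable, as you mentioned. Note, however, that seq(y) is evaluated over the whole data frame before faceting, so x is the overall row position (1 to 60), not a row number that restarts at 1 within each group.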
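If the index should restart at 1 inside each facet, one option (a sketch, not part of the original answer) is to compute it inside aes() with base R's ave(), since aes() expressions are evaluated over the whole data frame rather than per panel. The seed is arbitrary and only there for reproducibility.

```r
library(ggplot2)

set.seed(42)  # arbitrary seed, only for reproducibility
df <- data.frame(y = rnorm(60), group = rep(c("A", "B", "C"), 20))

# ave() applies seq_along() within each level of group,
# so x runs from 1 to 20 separately for A, B and C.
p <- ggplot(df, aes(x = ave(y, group, FUN = seq_along), y = y)) +
  geom_point() +
  facet_wrap(~group)
```

This keeps df unchanged, at the cost of a slightly less readable aes() call and a default x-axis label that you may want to override with xlab().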

How do I facet by geom / layer in ggplot2?

I'm hoping to recreate the gridExtra output below with ggplot's facet_grid, but I'm unsure of what variable ggplot identifies with the layers in the plot. In this example, there are two geoms...
require(tidyverse)
a <- ggplot(mpg)
b <- geom_point(aes(displ, cyl, color = drv))
c <- geom_smooth(aes(displ, cyl, color = drv))
d <- a + b + c
# output below
gridExtra::grid.arrange(
a + b,
a + c,
ncol = 2
)
# Equivalent with gg's facet_grid
# needs a categorical var to iter over...
d$layers
#d + facet_grid(. ~ d$layers??)
The gridExtra output that I'm hoping to recreate shows the point layer and the smooth layer side by side in two panels.

A hacky way of doing this is to make as many copies of the existing data frame as you need panels, each tagged with a value that is used for faceting and filtering later on. Combine the copies (with rbind or a union) into one data frame. Then set up the ggplot and the geoms, filter the data for each geom by that tag, and facet on the same tag to split the plots.
This can be seen below:
df1 <- data.frame(
graph = "point_plot",
mpg
)
df2 <- data.frame(
graph = "spline_plot",
mpg
)
df <- rbind(df1, df2)
ggplot(df, mapping = aes(x = displ, y = hwy, color = class)) +
geom_point(data = filter(df, graph == "point_plot")) +
geom_smooth(data = filter(df, graph == "spline_plot"), se=FALSE) +
facet_grid(. ~ graph)

If you really want to show different plots on different facets, one hacky way would be to make separate copies of the data and subset those...
mpg2 <- mpg %>% mutate(facet = 1) %>%
bind_rows(mpg %>% mutate(facet = 2))
ggplot(mpg2, aes(displ, cyl, color = drv)) +
geom_point(data = subset(mpg2, facet == 1)) +
geom_smooth(data = subset(mpg2, facet == 2)) +
facet_wrap(~facet)

R: Unexplainable behavior of ggplot inside a function

I have composed a function that develops histograms using ggplot2 on the numerical columns of a dataframe that will be passed to it. The function stores these plots into a list and then returns the list.
However when I run the function I get the same plot again and again.
My code is the following and I provide also a reproducible example.
hist_of_columns = function(data, class, variables_to_exclude = c()){
library(ggplot2)
library(ggthemes)
data = as.data.frame(data)
variables_numeric = names(data)[unlist(lapply(data, function(x){is.numeric(x) | is.integer(x)}))]
variables_not_to_plot = c(class, variables_to_exclude)
variables_to_plot = setdiff(variables_numeric, variables_not_to_plot)
indices = match(variables_to_plot, names(data))
index_of_class = match(class, names(data))
plots = list()
for (i in (1 : length(variables_to_plot))){
p = ggplot(data, aes(x= data[, indices[i]], color= data[, index_of_class], fill=data[, index_of_class])) +
geom_histogram(aes(y=..density..), alpha=0.3,
position="identity", bins = 100)+ theme_economist() +
geom_density(alpha=.2) + xlab(names(data)[indices[i]]) + labs(fill = class) + guides(color = FALSE)
name = names(data)[indices[i]]
plots[[name]] = p
}
plots
}
data(mtcars)
mtcars$am = factor(mtcars$am)
data = mtcars
variables_to_exclude = 'mpg'
class = 'am'
plots = hist_of_columns(data, class, variables_to_exclude)
If you check the list plots you will discover that it contains the same plot repeated.

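Simply use aes_string to pass string variables into the ggplot() call. Right now, the aesthetics are not column names looked up in ggplot's data argument but expressions such as data[, indices[i]]. aes() does not evaluate them when the plot is created; they are evaluated only when the plot is drawn, and by then the loop has finished and i holds its last value, so every plot shows the same column. Below, x, color, and fill are separate vectors that ggplot cannot relate to data: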
ggplot(data, aes(x= data[, indices[i]], color= data[, index_of_class], fill=data[, index_of_class]))
However, with aes_string, passing string names to x, color, and fill will point to data:
ggplot(data, aes_string(x= names(data)[indices[i]], color= class, fill= class))
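Since aes_string() is deprecated in recent ggplot2 releases, here is a hedged sketch of the same idea with the .data pronoun. The helper name hist_by_name and its arguments are hypothetical, not from the question. lapply() is used instead of a for loop so that each plot keeps its own column name when aes() is evaluated later.

```r
library(ggplot2)

# Hypothetical helper (name and arguments are illustrative only).
hist_by_name <- function(data, class, vars) {
  plots <- lapply(vars, function(v) {
    # Each call has its own `v`, so the lazily evaluated aes()
    # still refers to the right column when the plot is drawn.
    ggplot(data, aes(x = .data[[v]], fill = .data[[class]])) +
      geom_histogram(alpha = 0.3, position = "identity", bins = 30) +
      labs(x = v, fill = class)
  })
  names(plots) <- vars
  plots
}

mt <- mtcars
mt$am <- factor(mt$am)
plots <- hist_by_name(mt, "am", c("disp", "hp"))
```

Here plots$disp and plots$hp should show different histograms; a plain for loop with .data[[v]] would run into the same late-evaluation problem as the original code.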

Here is a strategy using tidyeval that does what you are after:
library(rlang)
library(tidyverse)
hist_of_cols <- function(data, class, drop_vars) {
# tidyeval overhead
class_enq <- enquo(class)
drop_enqs <- enquo(drop_vars)
data %>%
group_by(!!class_enq) %>% # keep the 'class' column always
select(-!!drop_enqs) %>% # drop any 'drop_vars'
select_if(is.numeric) %>% # keep only numeric columns
gather("key", "value", -!!class_enq) %>% # go to long form
split(.$key) %>% # make a list of data frames
map(~ ggplot(., aes(value, fill = !!class_enq)) + # plot as usual
geom_histogram() +
geom_density(alpha = .5) +
labs(x = unique(.$key)))
}
hist_of_cols(mtcars, am, mpg)
hist_of_cols(mtcars, am, c(mpg, wt))

Don't drop zero count: dodged barplot

I am making a dodged barplot in ggplot2 and one grouping has a zero count that I want to display. I remember seeing this here a while back and figured that scale_x_discrete(drop=F) would work. It does not appear to work with dodged bars. How can I make the zero counts show?
For instance, (code below) in the plot below, type8~group4 has no examples. I would still like the plot to display the empty space for the zero count instead of eliminating the bar. How can I do this?
mtcars2 <- data.frame(type=factor(mtcars$cyl),
group=factor(mtcars$gear))
m2 <- ggplot(mtcars2, aes(x=type , fill=group))
p2 <- m2 + geom_bar(colour="black", position="dodge") +
scale_x_discrete(drop=F)
p2

Here's how you can do it without making summary tables first.
It did not work in my CRAN version (2.2.1), but in the latest development version of ggplot (2.2.1.900) I had no issues.
ggplot(mtcars, aes(factor(cyl), fill = factor(vs))) +
geom_bar(position = position_dodge(preserve = "single"))
http://ggplot2.tidyverse.org/reference/position_dodge.html

Updated geom_bar() needs stat = "identity"
For what it's worth: The table of counts, dat, above contains NA. Sometimes, it is useful to have an explicit 0 instead; for instance, if the next step is to put counts above the bars. The following code does just that, although it's probably no simpler than Joran's. It involves two steps: get a crosstabulation of counts using dcast, then melt the table using melt, followed by ggplot() as usual.
library(ggplot2)
library(reshape2)
mtcars2 = data.frame(type=factor(mtcars$cyl), group=factor(mtcars$gear))
dat = dcast(mtcars2, type ~ group, fun.aggregate = length)
dat.melt = melt(dat, id.vars = "type", measure.vars = c("3", "4", "5"))
dat.melt
ggplot(dat.melt, aes(x = type,y = value, fill = variable)) +
geom_bar(stat = "identity", colour = "black", position = position_dodge(width = .8), width = 0.7) +
ylim(0, 14) +
geom_text(aes(label = value), position = position_dodge(width = .8), vjust = -0.5)

The only way I know of is to pre-compute the counts and add a dummy row:
dat <- rbind(ddply(mtcars2,.(type,group),summarise,count = length(group)),c(8,4,NA))
ggplot(dat,aes(x = type,y = count,fill = group)) +
geom_bar(colour = "black",position = "dodge",stat = "identity")
I thought that using stat_bin(drop = FALSE,geom = "bar",...) instead would work, but apparently it does not.

I asked this same question, but I only wanted to use data.table, as it's a faster solution for much larger data sets. I included notes on the data so that those that are less experienced and want to understand why I did what I did can do so easily. Here is how I manipulated the mtcars data set:
library(data.table)
library(scales)
library(ggplot2)
mtcars <- data.table(mtcars)
mtcars$Cylinders <- as.factor(mtcars$cyl) # Creates new column with data from cyl called Cylinders as a factor. This allows ggplot2 to automatically use the name "Cylinders" and recognize that it's a factor
mtcars$Gears <- as.factor(mtcars$gear) # Just like above, but with gears to Gears
setkey(mtcars, Cylinders, Gears) # Set key for 2 different columns
mtcars <- mtcars[CJ(unique(Cylinders), unique(Gears)), .N, by = .EACHI, allow.cartesian = TRUE] # Uses CJ to create a complete list of all unique combinations of Cylinders and Gears, then counts how many rows match each combination and reports it in a column called "N". by = .EACHI is needed in current data.table versions to get one count per combination
And here is the call that produced the graph
ggplot(mtcars, aes(x=Cylinders, y = N, fill = Gears)) +
geom_bar(position="dodge", stat="identity") +
ylab("Count") + theme(legend.position="top") +
scale_x_discrete(drop = FALSE)
Furthermore, if there is continuous data, like that in the diamonds data set (thanks to mnel):
library(data.table)
library(scales)
library(ggplot2)
diamonds <- data.table(diamonds) # I modified the diamonds data set in order to create gaps for illustrative purposes
setkey(diamonds, color, cut)
diamonds[J("E",c("Fair","Good")), carat := 0]
diamonds[J("G",c("Premium","Good","Fair")), carat := 0]
diamonds[J("J",c("Very Good","Fair")), carat := 0]
diamonds <- diamonds[carat != 0]
Then using CJ would work as well.
data <- data.table(diamonds)[,list(mean_carat = mean(carat)), keyby = c('cut', 'color')] # This step defines our data set as the combinations of cut and color that exist and their means. However, the problem with this is that it doesn't have all combinations possible
data <- data[CJ(unique(cut),unique(color))] # This functions exactly the same way as it did in the discrete example. It creates a complete list of all possible unique combinations of cut and color
ggplot(data, aes(color, mean_carat, fill=cut)) +
geom_bar(stat = "identity", position = "dodge") +
ylab("Mean Carat") + xlab("Color")

Use count() from dplyr and complete() from tidyr to do this.
library(tidyverse)
mtcars %>%
mutate(
type = as.factor(cyl),
group = as.factor(gear)
) %>%
count(type, group) %>%
complete(type, group, fill = list(n = 0)) %>%
ggplot(aes(x = type, y = n, fill = group)) +
geom_bar(colour = "black", position = "dodge", stat = "identity")

You can exploit the feature of the table() function, which computes the number of occurrences of a factor for all its levels
# load plyr package to use ddply
library(plyr)
# compute the counts using ddply, including zero occurrences for some factor levels
df <- ddply(mtcars2, .(group), summarise,
types = as.numeric(names(table(type))),
counts = as.numeric(table(type)))
# plot the results
ggplot(df, aes(x = types, y = counts, fill = group)) +
geom_bar(stat='identity',colour="black", position="dodge")
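The same table() idea can be sketched without plyr: calling table() on both factors at once returns every type/group combination, zeros included, and as.data.frame() turns it into a long data frame with a Freq column. This is an alternative sketch, not the answer's own code; counts is a name introduced here.

```r
library(ggplot2)

mtcars2 <- data.frame(type = factor(mtcars$cyl), group = factor(mtcars$gear))

# table() on two factors keeps every combination of levels,
# so type 8 / group 4 appears with a count of 0.
counts <- as.data.frame(table(type = mtcars2$type, group = mtcars2$group))

p <- ggplot(counts, aes(x = type, y = Freq, fill = group)) +
  geom_bar(stat = "identity", colour = "black", position = "dodge")
```

Because the zero row exists in counts, the dodged bars keep an empty slot for it instead of widening the remaining bars.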
