ggplot scatter creating uniform points - r

I am trying to make a scatter plot with ggplot to show time watching TV on x axis and immigrant sentiment on y axis.
The code I am using is
ggplot(totalTV,
aes(x = dfnew.TV.watching..total.time.on.average.weekday,
y = dfnew.Immigrant.Sentiment)) +
geom_point()
I am getting this output
My table looks like this; the first variable is character and the other two are numeric:
Any idea on how to produce a representative scatter of the outcome?
Cheers

Here are some examples using the mtcars dataset.
library(ggplot2)
# Original
ggplot(mtcars,aes(factor(cyl),mpg)) +
geom_point()
# Jitter
ggplot(mtcars,aes(factor(cyl),mpg)) +
geom_jitter(width = .2) # Control spread with width
# Violin plot
ggplot(mtcars,aes(factor(cyl),mpg)) +
geom_violin()
# Boxplot
ggplot(mtcars,aes(factor(cyl),mpg)) +
geom_boxplot()
# Remember that different geoms can be combined
ggplot(mtcars,aes(factor(cyl),mpg)) +
geom_violin() +
geom_jitter(width = .2)
# Or something more exotic ala Raincloud-plots
# https://micahallen.org/2018/03/15/introducing-raincloud-plots/
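The mtcars examples above put a factor on the x axis. If both variables are numeric but only take a few distinct values (which is what a grid of evenly spaced points usually means), the same ideas still apply. Below is a minimal sketch on made-up data; the data frame `toy` and its columns `tv` and `sentiment` are hypothetical stand-ins, not the asker's actual table.

```r
library(ggplot2)

# Hypothetical stand-in for the asker's data: two numeric survey items
# that only take a few distinct values, so many rows share a coordinate.
set.seed(1)
toy <- data.frame(
  tv        = sample(0:7, 500, replace = TRUE),
  sentiment = sample(0:10, 500, replace = TRUE)
)

# Option 1: nudge every point by a small random amount in both directions
# and make points semi-transparent so dense areas look darker.
p_jitter <- ggplot(toy, aes(tv, sentiment)) +
  geom_jitter(width = 0.2, height = 0.2, alpha = 0.4)

# Option 2: draw one point per coordinate, sized by the number of rows there.
p_count <- ggplot(toy, aes(tv, sentiment)) +
  geom_count()

p_jitter
p_count
```

Jitter keeps one mark per observation, while geom_count() keeps the exact coordinates and encodes the overlap as point size, so which one reads better depends on how many rows pile up at each position.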

Related

Draw interval on geom_density

How do I draw a horizontal line indicating the Highest (Posterior) Density interval for faceted density plots in ggplot2? This is what I have tried:
# Functions to calculate lower and upper part of HPD.
hpd_lower = function(x) coda::HPDinterval(coda::as.mcmc(x))[1]
hpd_upper = function(x) coda::HPDinterval(coda::as.mcmc(x))[2]
# Data: two groups with different means
df = data.frame(value=c(rnorm(500), rnorm(500, mean=5)), group=rep(c('A', 'B'), each=500))
# Plot it
ggplot(df, aes(x=value)) +
geom_density() +
facet_wrap(~group) +
geom_segment(aes(x=hpd_lower(value), xend=hpd_upper(value), y=0, yend=0), size=3)
As you can see, geom_segment computes on all data for both facets whereas I would like it to respect the faceting. I would also like a solution where HPDinterval is only run once per facet.
Pre-calculate the HPD intervals. ggplot evaluates expressions inside aes() on the entire data frame of a layer, without splitting it by facet, so hpd_lower(value) sees both groups at once.
# Plot it
library(dplyr)
df_hpd <- group_by(df, group) %>% summarize(x=hpd_lower(value), xend=hpd_upper(value))
ggplot(df, aes(x=value)) +
geom_density() +
facet_wrap(~group) +
geom_segment(data = df_hpd, aes(x=x, xend=xend, y=0, yend=0), size=3)
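The question also asked for HPDinterval to run only once per facet, whereas calling separate lower and upper helpers runs it twice per group. Here is a sketch of one way to do that with base R's split() and lapply(); the name `hpd_by_group` is mine, and the seed is only there to make the example reproducible.

```r
library(ggplot2)
library(coda)

set.seed(42)
df <- data.frame(value = c(rnorm(500), rnorm(500, mean = 5)),
                 group = rep(c("A", "B"), each = 500))

# One HPDinterval() call per group; the result is a 1 x 2 matrix
# with columns "lower" and "upper".
hpd_by_group <- lapply(split(df, df$group), function(d) {
  iv <- HPDinterval(as.mcmc(d$value))
  data.frame(group = d$group[1],
             x     = unname(iv[1, "lower"]),
             xend  = unname(iv[1, "upper"]))
})
df_hpd <- do.call(rbind, hpd_by_group)

# linewidth was called size before ggplot2 3.4.0
p <- ggplot(df, aes(x = value)) +
  geom_density() +
  facet_wrap(~ group) +
  geom_segment(data = df_hpd,
               aes(x = x, xend = xend, y = 0, yend = 0),
               linewidth = 3)
p
```

Because df_hpd carries the group column, facet_wrap() sends each interval to its own panel.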

Connect points within x values for ggplot2?

I am plotting a series of points that are grouped by two factors. I would like to add lines within one group across the other and within the x value (across the position-dodge distance) to visually highlight trends within the data.
geom_line(), geom_segment(), and geom_path() all seem to plot only to the actual x value rather than the position-dodge place of the data points. Is there a way to add a line connecting points within the x value?
Here is a structurally analogous sample:
# Create a sample data set
d <- data.frame(expand.grid(x=letters[1:3],
g1=factor(1:2),
g2=factor(1:2)),
y=rnorm(12))
# Load ggplot2
library(ggplot2)
# Define position dodge
pd <- position_dodge(0.75)
# Define the plot
p <- ggplot(d, aes(x=x, y=y, colour=g1, group=interaction(g1,g2))) +
geom_point(aes(shape = factor(g2)), position=pd) +
geom_line()
# Look at the figure
p
# How to plot the line instead across g1, within g2, and within x?
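Pass the same position_dodge object to geom_line() so that the lines follow the dodged points. I am simply trying to close this question (@Axeman, please feel free to take over my answer).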
p <- ggplot(d, aes(x=x, y=y, colour=g1, group=interaction(g1,g2))) +
geom_point(aes(shape = factor(g2)), position=pd) +
geom_line(position = pd)
# Look at the figure
p
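If the goal is a line that stays inside a single x value and joins the two g1 points for each g2, one possible workaround is to compute the dodged positions by hand and plot on a continuous x axis. This is only a sketch: the column `x_dodged` and the offset formula are my own assumptions and only roughly mimic position_dodge(0.75) for four groups.

```r
library(ggplot2)

set.seed(1)
d <- data.frame(expand.grid(x  = letters[1:3],
                            g1 = factor(1:2),
                            g2 = factor(1:2)),
                y = rnorm(12))

# Hypothetical manual dodge: each g1/g2 combination gets a fixed offset
# inside a band of total width 0.75 centred on the integer x position.
combo <- as.integer(interaction(d$g1, d$g2))        # 1..4
d$x_dodged <- as.integer(d$x) + (combo - 2.5) * 0.75 / 4

p <- ggplot(d, aes(x = x_dodged, y = y, colour = g1)) +
  geom_point(aes(shape = g2)) +
  # one short line per x value and g2 level, joining the two g1 points
  geom_line(aes(group = interaction(x, g2)), colour = "grey40") +
  scale_x_continuous(breaks = 1:3, labels = levels(d$x)) +
  labs(x = "x")
p
```

The trade-off is that the x axis is no longer a true discrete scale, so the breaks and labels have to be restored manually as above.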

How to use free scales but keep a fixed reference point in ggplot?

I am trying to create a plot with facets. Each facet should have its own scale, but for ease of visualization I would like each facet to show a fixed y point. Is this possible with ggplot?
This is an example using the mtcars dataset. I plot the weight (wt) as a function of the number of miles per gallon (mpg). The facets represent the number of cylinders of each car. As you can see, I would like the y scales to vary across facets, but still have a reference point (3, in the example) at the same height across facets. Any suggestions?
library(ggplot2)
data(mtcars)
ggplot(mtcars, aes(mpg, wt)) + geom_point() +
geom_hline (yintercept=3, colour="red", lty=6, lwd=1) +
facet_wrap( ~ cyl, scales = "free_y")
[EDIT: in my actual data, the fixed reference point should be at y = 0. I used y = 3 in the example above because 0 didn't make sense for the range of the data points in the example]
It's unclear where the line should be, so let's assume it should sit in the middle of each panel. You could compute the limits outside ggplot and add a dummy layer to set the scales:
library(ggplot2)
library(plyr)
# reflect each facet's wt range around 3, so that 3 ends up in the middle
# of the combined limits: 3 = (min + max) / 2
dummy <- ddply(mtcars, "cyl", summarise,
min = 6 - max(wt),
max = 6 - min(wt))
ggplot(mtcars, aes(mpg, wt)) + geom_point() +
geom_blank(data=dummy, aes(y=min, x=Inf)) +
geom_blank(data=dummy, aes(y=max, x=Inf)) +
geom_hline (yintercept=3, colour="red", lty=6, lwd=1) +
facet_wrap( ~ cyl, scales = "free_y")
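For the case mentioned in the edit, where the reference point is y = 0, the same reflection trick needs no pre-computed data frame, because reflecting around 0 is just negating y. A minimal sketch on made-up data (`toy` and `panel` are hypothetical names):

```r
library(ggplot2)

set.seed(1)
toy <- data.frame(
  x     = rep(1:10, times = 3),
  y     = c(rnorm(10, 1), rnorm(10, -5, 3), rnorm(10, 20, 5)),
  panel = rep(c("a", "b", "c"), each = 10)
)

p <- ggplot(toy, aes(x, y)) +
  geom_point() +
  # invisible layer holding the mirrored values, so that each panel's
  # limits become symmetric around 0
  geom_blank(aes(y = -y)) +
  geom_hline(yintercept = 0, colour = "red", lty = 6, lwd = 1) +
  facet_wrap(~ panel, scales = "free_y")
p
```

Each panel then spans from -max(|y|) to +max(|y|), so the red line sits at mid-height everywhere. The cost is that up to half of each panel may be empty.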

How to use scale from previous plot in current plot with ggplot2?

I am using ggplot2 to produce a plot that has 3 facets. Because I am comparing two different data sets, I would like to then be able to plot a second data set using the same y scale for the facets as in the first plot. However, I cannot find a simple way to save the settings of the first plot to then re-use them with the second plot. Since each facet has its own y scale, it will be a pain to specify them by hand for the second plot. Does anyone know of a quick way of re-using scales? To make this concrete, here is how I am generating first my plot:
p <- ggplot(mtcars, aes(mpg, wt)) + geom_point()
p + facet_wrap(~ cyl, scales = "free_y")
EDIT
When applying one of the suggestions below, I found out that my problem was more specific than described in the original post: it has to do with the scaling of the error bars. Concretely, the error bars look weird when I rescale the second plot as suggested. Does anyone have any suggestions on how to keep the same scale for both plots and still display the error bars correctly? I am attaching an example below for concreteness:
#Create sample data
d1 <- data.frame(fixtype=c('ff','ff','fp','fp'), detype=c('det','pro','det','pro'),
diffscore=c(-1,-15,3,-17),se=c(2,3,1,2))
d2 <- data.frame(fixtype=c('ff','ff','fp','fp'), detype=c('det','pro','det','pro'),
diffscore=c(-1,-3,-2,-1),se=c(4,3,5,3))
#Plot for data frame 1, this is the scale I want to keep
lim_d1 <- aes(ymax = diffscore + se, ymin=diffscore - se)
ggplot(d1, aes(colour=detype, y=diffscore, x=detype)) +
geom_point(aes(size=1), shape=15) +
geom_errorbar(lim_d1, width=0.2,size=1) +
facet_wrap(~fixtype, nrow=2, ncol=2, scales = "free_y")
#Plot for data frame 2 original scale
lim_d2 <- aes(ymax = diffscore + se, ymin=diffscore - se)
ggplot(d2, aes(colour=detype, y=diffscore, x=detype)) +
geom_point(aes(size=1), shape=15) +
geom_errorbar(lim_d2, width=0.2,size=1) +
facet_wrap(~fixtype, nrow=2, ncol=2, scales = "free_y")
#Plot for data frame 2 adjusted scale. This is where things go wrong!
#As suggested below, first I plot the first plot, then I draw a blank screen and try
#to plot the second data frame on top.
lim_d2 <- aes(ymax = diffscore + se, ymin=diffscore - se)
ggplot(d1, aes(colour=detype, y=diffscore, x=detype)) +
geom_blank() +
geom_point(data=d2, aes(size=1), shape=15) +
geom_errorbar(lim_d2, width=0.2,size=1) +
facet_wrap(~fixtype, nrow=2, ncol=2, scales = "free_y")
#If the error bars are fixed, by adding data=d2 to geom_errorbar(), then
#the error bars are displayed correctly but the scale gets distorted again
lim_d2 <- aes(ymax = diffscore + se, ymin=diffscore - se)
ggplot(d1, aes(colour=detype, y=diffscore, x=detype)) +
geom_blank() +
geom_point(data=d2, aes(size=1), shape=15) +
geom_errorbar(data=d2,lim_d2, width=0.2,size=1) +
facet_wrap(~fixtype, nrow=2, ncol=2, scales = "free_y")
You may first call ggplot on your original data where you add a geom_blank as a first layer. This sets up a plot area, with axes and legends based on the data provided in ggplot.
Then add geoms which use data other than the original data. In the example, I use a simple subset of the original data.
From ?geom_blank: "The blank geom draws nothing, but can be a useful way of ensuring common scales between different plots.".
ggplot(data = mtcars, aes(mpg, wt)) +
geom_blank() +
geom_point(data = subset(mtcars, wt < 3)) +
facet_wrap(~ cyl, scales = "free_y")
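For the error-bar case in the question's edit, a plain geom_blank() only trains the y scale on d1's diffscore values, not on the ends of d1's error bars. One possible fix, sketched below, is to add blank layers for diffscore - se and diffscore + se from d1, and to pass data = d2 to both visible layers. Note the assumption: this guarantees that each panel covers at least d1's range, but a panel still grows if d2 extends beyond it.

```r
library(ggplot2)

d1 <- data.frame(fixtype = c('ff', 'ff', 'fp', 'fp'),
                 detype  = c('det', 'pro', 'det', 'pro'),
                 diffscore = c(-1, -15, 3, -17), se = c(2, 3, 1, 2))
d2 <- data.frame(fixtype = c('ff', 'ff', 'fp', 'fp'),
                 detype  = c('det', 'pro', 'det', 'pro'),
                 diffscore = c(-1, -3, -2, -1), se = c(4, 3, 5, 3))

p <- ggplot(d1, aes(x = detype, y = diffscore, colour = detype)) +
  # invisible layers: train the scales on the ends of d1's error bars
  geom_blank(aes(y = diffscore - se)) +
  geom_blank(aes(y = diffscore + se)) +
  # visible layers: draw only d2
  geom_point(data = d2, shape = 15, size = 3) +
  geom_errorbar(data = d2,
                aes(ymin = diffscore - se, ymax = diffscore + se),
                width = 0.2) +
  facet_wrap(~ fixtype, nrow = 2, scales = "free_y")
p
```

With these numbers the 'ff' panel runs from -18 (from d1) to 3 (from d2), which illustrates the caveat above.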
Here is an ugly hack that assumes you have an identical facetting layout in both plots.
It replaces the panel element of the ggplot build.
p <- ggplot(mtcars, aes(mpg, wt)) + geom_point()
p1 <- p + facet_wrap(~ cyl, scales = "free_y") + labs(title = 'original')
# create "other" data.frame
n <- nrow(mtcars)
set.seed(201405)
mtcars2 <- mtcars[sample(seq_len(n ),n-15),]
# create this second plot
p2 <- p1 %+% mtcars2 + labs(title = 'new data')
# and a copy so we can attempt to fix
p3 <- p2 + labs(title = 'new data original scale')
# use ggplot_build to construct the plots for rendering
p1b <- ggplot_build(p1)
p3b <- ggplot_build(p3)
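# replace the 'panel' information in plot 2 with that
# from plot 1 (ggplot2 2.2.0 and later store this under 'layout'
# instead of 'panel', so the element name needs adjusting there)
p3b[['panel']] <- p1b[['panel']]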
# render the revised plot
# for comparison
library(gridExtra)
grid.arrange(p1 , p2, ggplot_gtable(p3b))

converting boxplots to densities in ggplot2 in R

I have the following ggplot2 plot:
ggplot(iris) + geom_boxplot(aes(x=Species, y=Petal.Length, fill=Species)) + coord_flip()
I would like to instead plot this as horizontal density plots or histograms, meaning have density line plots for each species or histograms instead of boxplots. This does not do the trick:
> ggplot(iris) + geom_density(aes(x=Species, y=Petal.Length, fill=Species)) + coord_flip()
Error in eval(expr, envir, enclos) : object 'y' not found
For simplicity I used Species as both the x variable and the fill, but in my actual data the x axis represents one set of conditions and the fill represents another, though that should not matter for plotting purposes. I want the x axis to represent different conditions, with the values of y for each condition shown as a density or histogram instead of a boxplot.
Edit: this is better illustrated with a dataset that has two factor-like variables. In the mpg dataset, I want to make a density plot for each manufacturer, plotting the distribution of displ for each cyl value. The x axis (which is vertical in flipped coordinates) represents each manufacturer, and the value being histogrammed is displ, but for each manufacturer I want as many histograms as there are cyl values for that manufacturer. I hope this is clearer. I know that the following doesn't work, because y= expects counts.
ggplot(mpg, aes(x=manufacturer, fill=cyl, y=displ)) +
geom_density(position="identity") + coord_flip()
The closest I get is:
ggplot(mpg, aes(x=displ, fill=cyl)) +
geom_density(position="identity") + facet_grid(manufacturer ~ .)
But I don't want different grids, I'd like them to be different entries in the same plot like in the histogram case.
Something like this? For both histograms and density plots, the y variable is computed from the data (a count or a density), so you don't map it yourself. Instead, map x = Petal.Length; its frequency (for the given binwidth) is plotted on the y-axis. Add fill=Species alongside x=Petal.Length to colour by Species.
For histogram:
ggplot(iris, aes(x=Petal.Length, fill=Species)) +
geom_histogram(position="identity") + coord_flip()
For density:
ggplot(iris, aes(x=Petal.Length, fill=Species)) +
geom_density(position="identity") + coord_flip()
Edit: Maybe you're looking for faceting?
ggplot(mpg, aes(x=displ, fill=factor(cyl))) +
geom_density(position="identity") +
facet_wrap( ~ manufacturer, ncol=3)
Edit: Since you don't want faceting, the only other way I can think of is to create a separate group by pasting manufacturer and cyl together:
dd <- mpg
dd$grp <- factor(paste(dd$manufacturer, dd$cyl))
ggplot(dd, aes(x=displ)) +
geom_density(aes(fill=grp), position="identity")
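Another option that stays close to the original boxplot call is geom_violin(), which keeps the condition on the x axis and draws a mirrored density of y for each one. This is a swapped-in alternative rather than what the answer above proposes, so treat it as a sketch:

```r
library(ggplot2)

# Same aesthetics as the boxplot in the question; only the geom changes.
p <- ggplot(iris, aes(x = Species, y = Petal.Length, fill = Species)) +
  geom_violin() +
  coord_flip()
p
```

With a second factor mapped to fill, the violins are dodged side by side within each x value, much like dodged boxplots.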
