Two plots in one plot with ggplot - r

I need to create "two plots" in "one plot" with ggplot. I managed to do it with base R as follows:
x=rnorm(10)
y=rnorm(10)*20+100
plot(1:10,rev(sort(x)),cex=2,col='red',ylim=c(0,2.2))
segments(x0=1:10, x1=1:10, y0=1.8,y1=1.8+y/max(y)*.2,lwd=3,col='dodgerblue')
However, I am struggling with ggplot, how can it be done?

Here's one possible translation of that code.
ggplot(data.frame(idx=seq_along(x), x,y)) +
geom_point(aes(idx, rev(sort(x))), col="red") +
geom_segment(aes(x=idx, xend=idx, y=1.8, yend=1.8+y/max(y)*.2), color="dodgerblue")
In general with ggplot2, you can add multiple views of data to a plot by adding additional layers (geoms).
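As a small illustration of the layering idea, here is a sketch with simulated data (the names `df`, `p`, and `n_layers` are placeholders, not from the question). Each `geom_*()` call appends one layer to the plot object:

```r
library(ggplot2)

set.seed(1)  # simulated stand-in for the question's x and y
df <- data.frame(idx = 1:10, x = rnorm(10), y = rnorm(10) * 20 + 100)

p <- ggplot(df) +
  geom_point(aes(idx, rev(sort(x))), colour = "red") +
  geom_segment(aes(x = idx, xend = idx, y = 1.8, yend = 1.8 + y / max(y) * 0.2),
               colour = "dodgerblue")

n_layers <- length(p$layers)  # one entry per geom added to the plot
```

Printing `p` draws both layers on the same panel.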

My solution is similar to @MrFlick's.
I would always recommend having a plot data frame and referring to the variables from there as you can more easily relate variables to plot aesthetics.
library(tidyverse)
plot_df <- data.frame(x, y) %>%
arrange(-x) %>%
mutate(id = 1:10)
ggplot(plot_df) +
geom_point(aes(id, x), color = "red", pch = 1, size = 5) +
geom_segment(aes(x = id, xend = id, y = 1.8, yend = 1.8+y/max(y)*.2),
lwd = 2, color = 'dodgerblue') +
scale_y_continuous(limits = c(0,2.2)) +
theme_light()
Ultimately, the idea of ggplot is to add layers (in this case, the points and the segments), each mapping variables to aesthetics, to form the final plot.
If you'd like to learn more, check out the ggplot cheat sheet and read more on the ideas behind ggplot: https://ggplot2.tidyverse.org/

Related

'lines' function in R not showing

I am trying to add lines for confidence intervals in R but lines() isn't working. In the following code b is a dataframe, 100 observations of 2 variables 'pred' and 'se'.
plot(c(1:300),b$pred,type="l",lwd=1.5)
lines(c(1:300),b$pred+2*b$se,type="l",lty=2,col='red')
The first line is working but the second is not. I have tried it with and without the x values (plot works with or without, lines works for neither). I can get lines to work for different dataframes, but not this one.
It seems very fragile to me to use 1:300 when also referencing b; it might work when b has 300 rows, but any other time it will either fail with an error about mismatched lengths or, in functions that recycle, silently produce a misleading or meaningless plot. In general, avoid hard-coded numbers when working programmatically like this; seq_len(nrow(b)) is a safer choice than 1:300.
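A quick sketch of that fragility, using a simulated `b` with 100 rows (the data and the names `res`, `idx`, and `ok` are made up for illustration). `xy.coords()` is the helper that `plot()` uses to pair up x and y, so calling it directly shows the length check without opening a graphics device:

```r
set.seed(42)
b <- data.frame(pred = rnorm(100), se = runif(100))  # simulated, 100 rows

# A hard-coded 1:300 does not match the 100 rows, so the pairing fails
res <- tryCatch(xy.coords(1:300, b$pred),
                error = function(e) conditionMessage(e))

# An index derived from the data always has the right length
idx <- seq_len(nrow(b))
ok <- xy.coords(idx, b$pred)
```

Here `res` holds the error message instead of coordinates, while `ok` contains 100 matched pairs.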
The bounds (x/y limits) for the plot are defined by the first plot command. After that, in base R graphics, no other plotting command will alter the limits. It is therefore highly likely that all of pred+2*se is greater than max(pred): R does draw the lines, but they fall outside the plotting region and are clipped, so nothing appears.
For this, you need to set the limits up front, perhaps:
ylims <- with(b, range(c(pred, pred+2*se), na.rm = TRUE))
plot(seq_len(nrow(b)), b$pred, type="l", lwd=1.5, ylim=ylims)
lines(seq_len(nrow(b)), b$pred+2*b$se, type="l", lty=2, col='red')
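If you want to confirm that a line really is outside the plotting region, one option is to compare it with `par("usr")`, which reports the region fixed by `plot()`. This is only a sketch: `b` is simulated with deliberately large `se` values, and `pdf(NULL)` is used as a throwaway device so nothing is written to disk.

```r
set.seed(42)
# Simulated data; se is large on purpose so the upper band sits far above pred
b <- data.frame(pred = rnorm(100), se = runif(100, 5, 6))

pdf(NULL)  # throwaway graphics device
plot(seq_len(nrow(b)), b$pred, type = "l")
usr <- par("usr")  # c(x1, x2, y1, y2): the region fixed by plot()
dev.off()

upper <- b$pred + 2 * b$se
all_off_canvas <- all(upper > usr[4])  # TRUE when every point is above the top edge
```

When `all_off_canvas` is TRUE, a later `lines()` call for `upper` has nothing visible to draw.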
That should address your question. Continue reading if you want to consider migration to ggplot2 ... not a one-for-one migration, not trivial, and perhaps premature at this point, but still something to think about.
While the above should fix the problem you cited, you might also consider migrating to ggplot2: it allows many other things (too many to discuss here), including the feature of updating the x/y limits with every "layer" you add to it. For instance, I suspect the following will work:
library(ggplot2)
ggplot(b, aes(x = seq_along(pred), y = pred)) +
geom_line(linewidth = 1.5) + # this is doing what your first 'plot' is doing
geom_line(aes(y = pred + 2*se), linetype = 2, color = "red") # your call to lines (lty = 2)
(Notice no need to handle the x/y limits manually, ggplot2 figures it out for you with each layer added.)
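To see the limits being worked out across layers, you can inspect the trained y scale with `layer_scales()`. This is a sketch on simulated data (`b`, `p`, and `y_range` are illustrative names), not a check on the asker's actual data frame:

```r
library(ggplot2)

set.seed(42)
b <- data.frame(pred = rnorm(100), se = runif(100, 5, 6))  # simulated

p <- ggplot(b, aes(x = seq_along(pred), y = pred)) +
  geom_line() +
  geom_line(aes(y = pred + 2 * se), colour = "red")

# The y scale is trained on the data of every layer, not just the first
y_range <- layer_scales(p)$y$range$range
```

The resulting range spans from the lowest `pred` to the highest `pred + 2 * se`, so both lines are visible.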
I'm going to infer that you'll want to add a pred - 2*se as well, in which case it'll be another call to geom_line, as in
ggplot(b, aes(x = seq_along(pred), y = pred)) +
geom_line(linewidth = 1.5) +
geom_line(aes(y = pred + 2*se), linetype = 2, color = "red") +
geom_line(aes(y = pred - 2*se), linetype = 2, color = "blue")
Note that ggplot2 would actually prefer that you handle this with "long" data ... in that case, we can do something like below:
library(dplyr)
library(tidyr) # pivot_longer
b %>%
mutate(
x = row_number(), # b has no x column, so build the index here
sehigh = pred + 2*se,
selow = pred - 2*se
) %>%
select(-se) %>% # keep only the three series to draw
pivot_longer(-x, names_to = "type", values_to = "val") %>%
ggplot(aes(x, val, group = type, color = type)) +
geom_line() +
scale_color_manual(values = c(pred = "black", sehigh = "red", selow = "blue"))
In this case, only one call to geom_line, and ggplot will handle colors automatically (based on the new categorical variable type that we created in a previous step).
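For reference, this is roughly what the "long" shape looks like on a tiny simulated `b` (five rows; the data and the name `long` are illustrative only). Each original row turns into three rows, one per series:

```r
library(dplyr)
library(tidyr)

set.seed(42)
b <- data.frame(pred = rnorm(5), se = runif(5))  # simulated, 5 rows

long <- b %>%
  mutate(x = row_number(),
         sehigh = pred + 2 * se,
         selow = pred - 2 * se) %>%
  select(-se) %>%  # se itself is not drawn
  pivot_longer(-x, names_to = "type", values_to = "val")
```

The `type` column then holds "pred", "sehigh", or "selow", which is what the colour aesthetic is mapped to.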

R code of scatter plot for three variables

Hi I am trying to code for a scatter plot for three variables in R:
Race= [0,1]
YOI= [90,92,94]
ASB_mean = [1.56, 1.59, 1.74]
Antisocial <- read.csv(file = 'Antisocial.csv')
Table_1 <- ddply(Antisocial, "YOI", summarise, ASB_mean = mean(ASB))
Table_1
Race <- unique(Antisocial$Race)
Race
ggplot(data = Table_1, aes(x = YOI, y = ASB_mean, group_by(Race))) +
geom_point(colour = "Black", size = 2) + geom_line(data = Table_1, aes(YOI,
ASB_mean), colour = "orange", size = 1)
Image of plot: https://drive.google.com/file/d/1E-ePt9DZJaEr49m8fguHVS0thlVIodu9/view?usp=sharing
Data file: https://drive.google.com/file/d/1UeVTJ1M_eKQDNtvyUHRB77VDpSF1ASli/view?usp=sharing
Can someone help me understand where I am making mistake? I want to plot mean ASB vs YOI grouped by Race. Thanks.
I am not sure what your desired output is. If I understood your question correctly, I think you want something like this.
library(tidyverse) # dplyr, ggplot2, and forcats::as_factor
g_Antisocial <- Antisocial %>%
group_by(Race) %>%
summarise(ASB = mean(ASB),
YOI = mean(YOI))
Antisocial %>%
ggplot(aes(x = YOI, y = ASB, color = as_factor(Race), shape = as_factor(Race))) +
geom_point(alpha = .4) +
geom_point(data = g_Antisocial, size = 4) +
theme_bw() +
guides(color = guide_legend("Race"), shape = guide_legend("Race"))
and this is the output:
@Maninder: there are a few things you need to look at.
First of all: The grammar of graphics of ggplot() works with layers. You can add layers with different data (frames) for the different geoms you want to plot.
The reason your code is not working is that you mix up the layer calls and never really specify which data should be drawn as the scatter plot and which as the line.
(I) Use ggplot() + geom_point() for a scatter plot
The ultimate first layer is: ggplot(). Think of this as your drawing canvas.
You then speak about adding a scatter plot layer, but you actually do not do it.
For example:
# plotting the Antisocial data set
ggplot() +
geom_point(data = Antisocial, aes(x = YOI, y = ASB, colour = as.factor(Race)))
will plot your Antisocial data set using the scatter layer, i.e. geom_point().
Note that I converted Race to a factor to get a categorical colour scheme; otherwise you might end up with a continuous palette.
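A small sketch of that difference, on a made-up stand-in for the data (`toy` and its values are hypothetical): the same numeric column gives different colours depending on whether it is mapped as a number or as a factor.

```r
library(ggplot2)

# Hypothetical stand-in for the Antisocial data
toy <- data.frame(YOI  = c(90, 92, 94, 90, 92, 94),
                  ASB  = c(1, 2, 3, 2, 3, 4),
                  Race = c(0, 0, 0, 1, 1, 1))

p_num <- ggplot(toy) + geom_point(aes(YOI, ASB, colour = Race))             # continuous scale
p_fac <- ggplot(toy) + geom_point(aes(YOI, ASB, colour = as.factor(Race)))  # discrete scale

cols_num <- unique(layer_data(p_num)$colour)  # the two ends of a gradient
cols_fac <- unique(layer_data(p_fac)$colour)  # two distinct hues
```

With the numeric mapping the legend is a colour bar; with the factor it lists the two groups.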
(II) line plot
In analogy to above, you would get for the line plot the following:
# plotting Table_1
ggplot() +
geom_line(data = Table_1, aes(x = YOI, y = ASB_mean))
I'll skip showing the plot of the line.
(III) combining different layers
# putting both together
ggplot() +
geom_point(data = Antisocial, aes(x = YOI, y = ASB, colour = as.factor(Race))) +
geom_line(data = Table_1, aes(x = YOI, y = ASB_mean)) +
## this is to set the legend title and have a nice(r) name in your colour legend
labs(colour = "Race")
This yields:
That should explain how ggplot layering works. Keep an eye on the datasets and geoms that you want to use. Before working with inheritance in aes(), I recommend keeping the data= and aes() calls inside each geom_xxxx. This avoids confusion.
You may want to explore with geom_jitter() instead of geom_point() to get a bit of a better presentation of your dataset. The "few" points plotted are the result of many datapoints in the same position (and overplotted).
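To get a feel for how much overplotting there is, you can count the distinct positions before plotting. The sketch below uses simulated values (`toy` is hypothetical, with 200 rows spread over at most 18 positions), and the jitter widths are arbitrary choices:

```r
library(ggplot2)

set.seed(1)
# Simulated: 200 observations but only 3 x 6 possible (YOI, ASB) positions
toy <- data.frame(YOI = sample(c(90, 92, 94), 200, replace = TRUE),
                  ASB = sample(0:5, 200, replace = TRUE))

n_positions <- nrow(unique(toy))  # distinct points geom_point() could show

p <- ggplot(toy, aes(YOI, ASB)) +
  geom_jitter(width = 0.3, height = 0.2, alpha = 0.5)  # spreads overlapping points
```

Comparing `n_positions` with `nrow(toy)` shows how many observations are hidden behind each other.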
Moving away from plotting to your question "I want to plot mean ASB vs YOI grouped by Race."
I know too little about your research to fully comprehend what you mean with that.
I take it that the mean ASB you calculated over the whole population is your reference (aka your Table_1), and you would like to see how the Race groups feature vs this population mean.
One option is to group your race data points and show them as boxplots for each YOI.
This might be what you want. The boxplot gives you the median and quartiles, and you can compare this per group against the calculated ASB mean.
For presentation purposes, I highlighted the line by increasing its size and changing its linetype. You can play around with the colours, etc. to get the look you are aiming for.
Please note that for the grouped boxplot you also have to treat your integer variable YOI as categorical, so I coerced it into a factor. Boxplots use fill for the body (colour sets only the outline). In this setup, you also need to supply a group value to geom_line() (I just set it to 1, but that is arbitrary; in other contexts you can assign another variable here).
ggplot() +
geom_boxplot(data = Antisocial, aes(x = as.factor(YOI), y = ASB, fill = as.factor(Race))) +
geom_line(data = Table_1, aes(x = as.factor(YOI), y = ASB_mean, group = 1)
, size = 2, linetype = "dashed") +
labs(x = "YOI", fill = "Race")
Hope this gets you going!

R ggplot2 could not add legend to graph

I'm using visual studio with R version 3.5.1 where I tried to plot legend to the graph.
f1 = function(x) {
return(x+1)}
x1 = seq(0, 1, by = 0.01)
data1 = data.frame(x1 = x1, f1 = f1(x1), F1 = cumtrapz(x1, f1(x1)) )
However, when I tried to plot it, it never gave me a legend!
For example, I used the same code in this (Missing legend with ggplot2 and geom_line )
ggplot(data = data1, aes(x1)) +
geom_line(aes(y = f1), color = "1") +
geom_line(aes(y = F1), color = "2") +
scale_color_manual(values = c("red", "blue"))
I also looked into (How to add legend to ggplot manually? - R) and many other pages on Stack Overflow, and I have tried every single function in https://www.rstudio.com/wp-content/uploads/2016/11/ggplot2-cheatsheet-2.1.pdf
i.e.
theme(legend.position = "bottom")
scale_fill_discrete(...)
group
guides()
show.legend=TRUE
I even tried to use the original plot() and legend() function. Neither worked.
I thought there might be something wrong with the dataframe, but I split them(x2,f1,F1) apart, it still didn't work.
I thought there might be something wrong with the IDE, but the code given by kohske actually plotted a legend!
d<-data.frame(x=1:5, y1=1:5, y2=2:6)
ggplot(d, aes(x)) +
geom_line(aes(y=y1, colour="1")) +
geom_line(aes(y=y2, colour="2")) +
scale_colour_manual(values=c("red", "blue"))
What's wrong with the code?
As far as I know, you only have X and Y variables in your aesthetics: in your code, color is set outside aes(), so it is a fixed setting rather than a mapped aesthetic. ggplot2 builds legends only for mapped aesthetics, so there is nothing to draw a legend for (kohske's code works because colour sits inside aes()). If you want a legend, put the grouping in the aesthetics, which might require recoding your dataset:
d<- data.frame(x=c(1:5, 1:5), y=c(1:5, 2:6), colorGroup = c(rep("redGroup", 5),
rep("blueGroup", 5)))
ggplot(d, aes(x, y, color = colorGroup )) + geom_line()
This should give you two lines and a legend.
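One way to see the difference between setting and mapping colour is to look at what each layer records. This is a sketch reusing kohske's toy data from the question; the names `p_fixed`, `p_mapped`, and the two `*_has_mapping` flags are mine:

```r
library(ggplot2)

d <- data.frame(x = 1:5, y1 = 1:5, y2 = 2:6)

# Colour set as a constant outside aes(): no colour scale, hence no legend
p_fixed <- ggplot(d, aes(x)) +
  geom_line(aes(y = y1), colour = "red")

# Colour mapped inside aes(): the labels "1" and "2" become legend entries
p_mapped <- ggplot(d, aes(x)) +
  geom_line(aes(y = y1, colour = "1")) +
  geom_line(aes(y = y2, colour = "2")) +
  scale_colour_manual(values = c("1" = "red", "2" = "blue"))

fixed_has_mapping  <- "colour" %in% names(p_fixed$layers[[1]]$mapping)
mapped_has_mapping <- "colour" %in% names(p_mapped$layers[[1]]$mapping)
```

Only the second plot has a colour mapping for `scale_colour_manual()` to act on, which is why only it gets a legend.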

How to plot a line with color vector in R Plotly

Say I have the following data frame:
ret <- rnorm(100, 0, 5)
df <- data.frame(
x = seq(1, 100, 1),
ret = ret,
y = 100 + cumsum(ret),
col = c(ifelse(ret > 0, "forestgreen", "red"), NA)[-1]
)
Here I'm simulating the returns of some fictional financial asset using rnorm named 'ret', and am defining a color vector named 'col' where upticks are green and downticks are red.
What I want to produce is something like the following:
library(ggplot2)
ggplot(df, aes(x=x, y=y)) + geom_line(aes(colour=col, group=1))
But I want to make a similar image using plotly so that I can zoom in on sections of the plot. My first thought was to try simply using the ggplotly() function around the code that produced the desired image:
library(plotly)
ggplotly(ggplot(df, aes(x=x, y=y)) + geom_line(aes(colour=col, group=1)))
But the plot is no longer grouped. Additionally, I tried using plot_ly() but can't seem to make the line segments get their color according to the 'col' attribute that I'm specifying:
plot_ly(data=df, x = ~x) %>% add_lines(y = ~y, line = list(color=~col))
But my color argument doesn't affect the color of the line. I've tried various other things but keep ending up with one of the two undesired plots. Any help would be much appreciated!
Note: I've already made candlestick and OHLC charts with plot_ly(), but I can't work with them because the y axis doesn't scale when you zoom in to a subsection of the plot.
I was able to get the desired behaviour from ggplotly by using geom_segment and making each segment link up to the next (x, y) value, regardless of colour:
library(dplyr)
df = df %>%
arrange(x) %>%
mutate(x_next = lead(x), y_next = lead(y))
p = ggplot(df, aes(x=x, y=y)) +
geom_segment(aes(xend = x_next, yend = y_next, colour=col))
ggplotly(p)
That said, I don't have a good answer for why ggplotly doesn't produce the desired output in the first place.

How to add gaussian curve to histogram created with qplot?

I have question probably similar to Fitting a density curve to a histogram in R. Using qplot I have created 7 histograms with this command:
qplot(V1, data=data, binwidth=10, facets=V2~.)
For each slice, I would like to add a fitted Gaussian curve. When I try to use the lines() function, I get this error:
Error in plot.xy(xy.coords(x, y), type = type, ...) :
plot.new has not been called yet
What is the command to do it correctly?
Have you tried stat_function?
+ stat_function(fun = dnorm)
You'll probably want to plot the histograms using aes(y = ..density..) in order to plot the density values rather than the counts.
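To see why the density scale matters, here is a sketch on simulated data (`dat`, `bars`, and `area` are illustrative names). It uses `after_stat(density)`, which I am assuming is available in your ggplot2 version (3.3.0 or later) as the newer spelling of `..density..`. On the density scale the bars have a total area of 1, which is the same scale `dnorm` lives on:

```r
library(ggplot2)

set.seed(1)
dat <- data.frame(x = rnorm(500))  # simulated

p <- ggplot(dat, aes(x)) +
  geom_histogram(aes(y = after_stat(density)), bins = 30) +
  stat_function(fun = dnorm, colour = "red")

bars <- layer_data(p, 1)  # computed values of the histogram layer
area <- sum(bars$density * (bars$xmax - bars$xmin))  # total area of the bars
```

With counts on the y axis instead, the bars would tower over the curve and the overlay would look flat.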
A lot of useful information can be found in this question, including some advice on plotting different normal curves on different facets.
Here are some examples:
dat <- data.frame(x = c(rnorm(100),rnorm(100,2,0.5)),
a = rep(letters[1:2],each = 100))
Overlay a single normal density on each facet:
ggplot(data = dat,aes(x = x)) +
facet_wrap(~a) +
geom_histogram(aes(y = ..density..)) +
stat_function(fun = dnorm, colour = "red")
From the question I linked to, create a separate data frame with the different normal curves:
library(plyr) # provides ddply
grid <- with(dat, seq(min(x), max(x), length = 100))
normaldens <- ddply(dat, "a", function(df) {
data.frame(
predicted = grid,
density = dnorm(grid, mean(df$x), sd(df$x))
)
})
And plot them separately using geom_line:
ggplot(data = dat,aes(x = x)) +
facet_wrap(~a) +
geom_histogram(aes(y = ..density..)) +
geom_line(data = normaldens, aes(x = predicted, y = density), colour = "red")
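If you would rather not depend on plyr, the same per-facet curves can be built with base R. This is a sketch of an alternative, not the approach from the linked question; it rebuilds `dat` with a seed so it runs on its own:

```r
set.seed(1)
dat <- data.frame(x = c(rnorm(100), rnorm(100, 2, 0.5)),
                  a = rep(letters[1:2], each = 100))

grid <- seq(min(dat$x), max(dat$x), length.out = 100)

# One block of fitted-normal densities per level of `a`
pieces <- lapply(split(dat, dat$a), function(df) {
  data.frame(a = df$a[1],
             predicted = grid,
             density = dnorm(grid, mean(df$x), sd(df$x)))
})
normaldens <- do.call(rbind, pieces)  # same columns as the ddply result
```

The resulting `normaldens` can be passed to `geom_line(data = normaldens, ...)` exactly as above.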
ggplot2 uses a different graphics paradigm than base graphics. (Although you can use grid graphics with it, the best way is to add a new stat_function layer to the plot.) The ggplot2 code is the following.
Note that I couldn't get this to work using qplot, but the transition to ggplot is reasonably straightforward; the most important difference is that your data must be in data.frame format.
Also note the explicit mapping of the y aesthetic, aes(y=..density..). This is slightly unusual, but it puts the histogram on the density scale so that it is comparable with the curve drawn by stat_function:
library(ggplot2)
data <- data.frame(V1 = rnorm(700), V2=sample(LETTERS[1:7], 700, replace=TRUE))
ggplot(data, aes(x=V1)) +
stat_bin(aes(y=..density..)) +
stat_function(fun=dnorm) +
facet_grid(V2~.)
