Scatterplot with single regression line despite two groups using ggplot2 - r

I would like to produce a scatter plot with ggplot2 that has a single regression line through all data points (regardless of which group they are from) while varying the shape of the markers by the grouping variable. The code below produces the group markers but draws TWO regression lines, one for each group.
#model <- lm(ParamY ~ ParamX, data = df)
p1<-ggplot(df,aes(x=ParamX,y=ParamY,shape=group)) + geom_point() + stat_smooth(method=lm)
How can I program that?

You shouldn't have to repeat your full aes in geom_point or add another layer; just move the shape aesthetic into the geom_point call:
library(ggplot2)
# two groups of ten points each; all three columns have the same length
df <- data.frame(x=1:20,y=1:20+5,grouping = c(rep("a",10),rep("b",10)))
ggplot(df,aes(x=x,y=y)) +
  geom_point(aes(shape=grouping)) +
  stat_smooth(method=lm)
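One way to check that this really leaves a single fit is to inspect the built plot and count the groups the smooth layer was computed on. This is only a minimal sketch: the noisy data and the helper name `n_smooth_groups` are my own assumptions, not part of the question.

```r
library(ggplot2)

set.seed(1)
# Made-up data (assumption): two groups of ten points with a little noise
df <- data.frame(x = 1:20,
                 y = 1:20 + 5 + rnorm(20),
                 grouping = rep(c("a", "b"), each = 10))

# shape mapped globally: the smooth layer inherits the grouping
p_two <- ggplot(df, aes(x = x, y = y, shape = grouping)) +
  geom_point() +
  stat_smooth(method = "lm", formula = y ~ x)

# shape mapped only in geom_point(): the smooth layer treats all rows as one group
p_one <- ggplot(df, aes(x = x, y = y)) +
  geom_point(aes(shape = grouping)) +
  stat_smooth(method = "lm", formula = y ~ x)

# Hypothetical helper: number of groups in the smooth layer (layer 2)
n_smooth_groups <- function(p) length(unique(ggplot_build(p)$data[[2]]$group))

n_groups_two <- n_smooth_groups(p_two)
n_groups_one <- n_smooth_groups(p_one)
```

Comparing `n_groups_two` with `n_groups_one` shows the effect of where the shape mapping lives without having to eyeball the plot.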
EDIT:
To help with your comment:
annotate can end up, for me anyway, putting the same labels on each facet. I like to make a mini data.frame that has my faceting variable and the facet levels, with another column representing the labels I want to use. In this case the label data frame is called dflabs.
Then use this label data frame to label the facets individually, e.g.
df <- data.frame(x=1:10,y=1:10,grouping =
c(rep("a",5),rep("b",5)),faceting=c(rep(c("oneR2","twoR2"),5)))
dflabs <- data.frame(faceting=c("oneR2","twoR2"),posx=c(7.5,7.5),posy=c(2.5,2.5))
ggplot(df,aes(x=x,y=y,group=faceting)) +
geom_point(aes(shape=grouping),size=5) +
stat_smooth(method=lm) +
facet_wrap( ~ faceting) +
geom_text(data=dflabs,aes(x=posx,y=posy,label=faceting))
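If the labels you want are per-facet R² values (the facet names hint at that, but it is a guess on my part), the same label data frame can carry them. A rough sketch under that assumption, with made-up noisy data and a `label` column that is not in the code above:

```r
library(ggplot2)

set.seed(42)
# Made-up data (assumption): noise added so that R^2 is not trivially 1
df <- data.frame(x = rep(1:10, 2),
                 faceting = rep(c("oneR2", "twoR2"), each = 10))
df$y <- df$x + rnorm(nrow(df), sd = ifelse(df$faceting == "oneR2", 0.5, 3))

# One R^2 per facet level, from a separate lm() fit on each subset
r2 <- sapply(split(df, df$faceting),
             function(d) summary(lm(y ~ x, data = d))$r.squared)

# Label data frame: one row per facet, keyed by the faceting variable
dflabs <- data.frame(faceting = names(r2),
                     posx = 7.5, posy = 2.5,
                     label = sprintf("R2 = %.2f", r2))

p <- ggplot(df, aes(x = x, y = y)) +
  geom_point() +
  stat_smooth(method = "lm", formula = y ~ x) +
  facet_wrap(~ faceting) +
  geom_text(data = dflabs, aes(x = posx, y = posy, label = label))
```

Because dflabs contains the faceting column, each row of labels is drawn only in its own panel.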

Related

Scatter plot with ggplot2

I am trying to make scatter plot with ggplot2. Below you can see data and my code.
data=data.frame(
gross_i.2019=seq(1,101),
Prediction=seq(21,121))
ggplot(data=data, aes(x=gross_i.2019, y=Prediction, group=1)) +
geom_point()
This code produces a basic scatter plot.
Now I want the values on the scatter plot in two different colors, one for gross_i.2019 and one for Prediction. I tried the code below with a different color, but it only changes the previous color to the new one.
sccater <- ggplot(data=data, aes(x=gross_i.2019, y=Prediction))
sccater + geom_point(color = "#00AFBB")
Can anybody help me make this plot with two different colors (e.g. black and red), one for gross_i.2019 and one for Prediction?
I may be confused about what you are trying to accomplish, but it doesn't seem like you have two groups of data to plot in two different colors. You have one dependent (Prediction) and one independent (gross_i.2019) variable, and you are plotting the relationship between them. If Prediction and gross_i.2019 are both supposed to be dependent variables of different groups, you need a common independent variable to plot them against separately (time, for example). Then you can do something like geom_point(aes(color = group)).
Edit1: If you wanted the index (the row count of the dataset) to be your independent x axis, then you could do the following:
library(tidyverse)
data=data.frame(gross_i.2019=seq(1,101),Prediction=seq(21,121))
#create a column for the index numbers
data$index <- c(1:101)
#using tidyr pivot your dataset to a tidy dataset (long not wide)
data <- data %>% pivot_longer(!index, names_to="group",values_to="count")
#assign the groups to colors
p<- ggplot(data=data, aes(x=index, y=count, color=group))
p1<- p + geom_point()
p1
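To get the specific black and red colors asked for, a manual colour scale can be added once the data is long. A minimal sketch of that idea; the names `wide` and `long` are mine, and the named vector is just one way to pin each variable to a colour:

```r
library(ggplot2)
library(tidyr)

wide <- data.frame(gross_i.2019 = seq(1, 101), Prediction = seq(21, 121))
wide$index <- seq_len(nrow(wide))

# Wide to long: one row per (index, variable) pair
long <- pivot_longer(wide, !index, names_to = "group", values_to = "count")

# Named vector so each variable keeps its colour regardless of level order
p <- ggplot(long, aes(x = index, y = count, colour = group)) +
  geom_point() +
  scale_colour_manual(values = c(gross_i.2019 = "black", Prediction = "red"))
```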
This type of problem generally has to do with reshaping the data. The data should be in long format, but it is in wide format. See this post on how to reshape the data from wide to long format.
long <- reshape(data,
ids = row.names(data),
varying = c("gross_i.2019", "Prediction"),
v.names = "line",
direction = "long")
long$time <- names(data)[long$time]
long$id <- as.numeric(long$id)
library(ggplot2)
ggplot(long, aes(id, line, color = time)) +
geom_point() +
scale_color_manual(values = c("#000000", "#00AFBB"))

R: ggplot - YX relationship with 2 groups - how to get only one (whole sample) slope? [duplicate]

I have multiple sources of data over three decades.
The data is discontiguous and overlaps in multiple places. I would like to plot the points for each data source in a different color but then add a single trendline that uses all of the data sources.
The included code has some sample data and two plot examples. The first call to ggplot, plots a single trendline for all of the data. the second ggplot call, plots each source distinctly in different colors with its own trendline.
library(ggplot2)
the.data <- read.table( header=TRUE, sep=",",
text="source,year,value
S1,1976,56.98
S1,1977,55.26
S1,1978,68.83
S1,1979,59.70
S1,1980,57.58
S1,1981,61.54
S1,1982,48.65
S1,1983,53.45
S1,1984,45.95
S1,1985,51.95
S1,1986,51.85
S1,1987,54.55
S1,1988,51.61
S1,1989,52.24
S1,1990,49.28
S1,1991,57.33
S1,1992,51.28
S1,1993,55.07
S1,1994,50.88
S2,1993,54.90
S2,1994,51.20
S2,1995,52.10
S2,1996,51.40
S3,2002,57.95
S3,2003,47.95
S3,2004,48.15
S3,2005,37.80
S3,2006,56.96
S3,2007,48.91
S3,2008,44.00
S3,2009,45.35
S3,2010,49.40
S3,2011,51.19")
ggplot( the.data, aes( the.data$year, the.data$value ) ) + geom_point() + geom_smooth()
#ggplot( the.data, aes( the.data$year, the.data$value, color=the.data$source ) ) + geom_point() + geom_smooth()
The second call displays the colored data points and I would like to add a single contiguous trendline representing all of the years.
Like this:
ggplot(the.data, aes( x = year, y = value ) ) +
geom_point(aes(colour = source)) +
geom_smooth(aes(group = 1))
A few notes:
Don't map aesthetics to an isolated vector like the.data$year. (Until you really know what you're doing, and know when to break that rule.) Just use the column names.
Map the aesthetics that you want in separate layers in their respective geom calls. In this case, I want the points colored differently, but for the smooth line, I want the data grouped all together (group = 1).
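If a straight whole-sample trendline is wanted rather than the default smoother, `method = "lm"` can be added to the same layer, and the drawn slope can be compared with a pooled `lm()` fit. This is only a sketch: `toy` is made-up data shaped like the sample above, not the real values.

```r
library(ggplot2)

set.seed(1)
# Made-up stand-in for the.data (assumption): three overlapping sources
toy <- data.frame(
  source = rep(c("S1", "S2", "S3"), times = c(19, 4, 10)),
  year   = c(1976:1994, 1993:1996, 2002:2011)
)
toy$value <- 60 - 0.3 * (toy$year - 1976) + rnorm(nrow(toy), sd = 4)

p <- ggplot(toy, aes(x = year, y = value)) +
  geom_point(aes(colour = source)) +
  geom_smooth(aes(group = 1), method = "lm", formula = y ~ x)

# Fitted line as computed for the smooth layer (layer 2)
fit_line <- ggplot_build(p)$data[[2]]
n <- nrow(fit_line)
slope_plot <- (fit_line$y[n] - fit_line$y[1]) / (fit_line$x[n] - fit_line$x[1])

# Slope of an ordinary regression on all rows, ignoring source
slope_pooled <- coef(lm(value ~ year, data = toy))[["year"]]
```

The two slopes agreeing is a quick confirmation that the line ignores `source` and uses the whole sample.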

ggplot2: Is there a way to overlay a single plot to all facets in a ggplot

I would like to use ggplot and faceting to construct a series of density plots grouped by a factor. Additionally, I would like to layer another density plot on each of the facets that is not subject to the constraints imposed by the facet.
For example, the faceted plot would look like this:
require(ggplot2)
ggplot(diamonds, aes(price)) + facet_grid(.~clarity) + geom_density()
and then I would like to have the following single density plot layered on top of each of the facets:
ggplot(diamonds, aes(price)) + geom_density()
Furthermore, is ggplot with faceting the best way to do this, or is there a preferred method?
One way to achieve this would be to make a new data frame diamonds2 that contains just the column price, and then use two geom_density() calls: one that uses the original diamonds and a second that uses diamonds2. Because diamonds2 has no clarity column, all of its values will be used in all facets.
diamonds2<-diamonds["price"]
ggplot(diamonds, aes(price)) + geom_density()+facet_grid(.~clarity) +
geom_density(data=diamonds2,aes(price),colour="blue")
UPDATE - as suggested by @BrianDiggs, the same result can be achieved without making a new data frame, by transforming the data inside geom_density().
ggplot(diamonds, aes(price)) + geom_density()+facet_grid(.~clarity) +
geom_density(data=transform(diamonds, clarity=NULL),aes(price),colour="blue")
Another approach would be to plot the data without faceting. Add two calls to geom_density(): in one, add aes(color=clarity) to get density lines in different colors for each level of clarity, and leave the second geom_density() empty so that it adds an overall black density line.
ggplot(diamonds,aes(price))+geom_density(aes(color=clarity))+geom_density()
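The facet overlay trick can be checked on a smaller data set by looking at which panels the second layer ends up in. A minimal sketch, using mtcars and cyl purely as stand-ins for diamonds and clarity (my choice, to keep it quick):

```r
library(ggplot2)

p <- ggplot(mtcars, aes(mpg)) +
  geom_density() +
  facet_grid(. ~ cyl) +
  # The overlay data has no cyl column, so it is not split by facet
  geom_density(data = transform(mtcars, cyl = NULL), aes(mpg), colour = "blue")

built <- ggplot_build(p)

# Panels that received the overlay layer (layer 2)
n_panels_overlay <- length(unique(built$data[[2]]$PANEL))
n_facet_levels   <- length(unique(mtcars$cyl))
```

If the overlay reaches every facet, `n_panels_overlay` matches `n_facet_levels`.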

ggplot boxplots with scatterplot overlay (same variables)

I'm an undergrad researcher and I've been teaching myself R over the past few months. I just started trying ggplot, and have run into some trouble. I've made a series of boxplots looking at the depth of fish at different acoustic receiver stations. I'd like to add a scatterplot that shows the depths of the receiver stations. This is what I have so far:
data <- read.csv(".....MPS.csv", header=TRUE)
df <- data.frame(f1=factor(data$Tagging.location),
f2=factor(data$Station),data$Detection.depth)
df2 <- data.frame(f2=factor(data$Station), data$depth)
df$f1f2 <- interaction(df$f1, df$f2)
plot1 <- ggplot(aes(y = data$Detection.depth, x = f2, fill = f1), data = df) +
geom_boxplot() + stat_summary(fun.data = give.n, geom = "text",
position = position_dodge(height = 0, width = 0.75), size = 3)
plot1+xlab("MPS Station") + ylab("Depth(m)") +
theme(legend.title=element_blank()) + scale_y_reverse() +
coord_cartesian(ylim=c(150, -10))
plot2 <- ggplot(aes(y=data$depth, x=f2), data=df2) + geom_point()
plot2+scale_y_reverse() + coord_cartesian(ylim=c(150,-10)) +
xlab("MPS Station") + ylab("Depth (m)")
Unfortunately, since I'm a new user in this forum, I'm not allowed to upload images of these two plots. My x-axis is "Stations" (which has 12 options) and my y-axis is "Depth" (0-150 m). The boxplots are colour-coded by tagging site (which has 2 options). The depths are coming from two different columns in my spreadsheet, and they cannot be combined into one.
My goal is to combine those two plots by adding "plot2" (the station depth scatterplot) to the "plot1" boxplots (detection depths). They both look at the same variables (depth and station) and must share the same y-axis scale.
I think I could figure out a messy workaround if I were using the R base program, but I would like to learn ggplot properly, if possible. Any help is greatly appreciated!
Update: I was confused by the language used in the original post, and wrote a slightly more complicated answer than necessary. Here is the cleaned up version.
Step 1: Setting up. Here, we make sure the depth values in both data frames have the same variable name (for readability).
df <- data.frame(f1=factor(data$Tagging.location), f2=factor(data$Station), depth=data$Detection.depth)
df2 <- data.frame(f2=factor(data$Station), depth=data$depth)
Step 2: Now you can plot this with the 'ggplot' function and split the data by using the `col=f1` argument. We'll plot the detection data as boxplots, and then we'll plot the depths of the stations as colored points (assuming each station only has one depth). We specify the two different layers by passing the data inside the 'geom' functions, instead of specifying the data inside the main 'ggplot' call. It should look something like this:
ggplot() +
  geom_boxplot(data=df, aes(x=f2, y=depth, col=f1)) +
  geom_point(data=df2, aes(x=f2, y=depth), colour="blue") +
  scale_y_reverse()
In this plot example, we use boxplots to represent the detection data and color those boxplots by the site label. The stations, however, we plot separately using a specific color of points, so we will be able to see them clearly in relation to the boxplots.
You should be able to adjust the plot from here to suit your needs.
I've created some dummy data and loaded it into the chart to show you what it would look like. Keep in mind that this is purely random data and doesn't really make sense.
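The original dummy data is not available, so here is a rough sketch of what such a test could look like. Everything in it is invented for illustration: the station names, the site labels and all depth values.

```r
library(ggplot2)

set.seed(1)
stations <- factor(paste0("St", 1:4))

# Invented detection data: 200 detections spread over 4 stations and 2 tagging sites
df <- data.frame(f1 = factor(sample(c("SiteA", "SiteB"), 200, replace = TRUE)),
                 f2 = sample(stations, 200, replace = TRUE),
                 depth = runif(200, min = 0, max = 150))

# Invented station depths: one value per station
df2 <- data.frame(f2 = stations, depth = c(40, 80, 120, 150))

p <- ggplot() +
  geom_boxplot(data = df, aes(x = f2, y = depth, colour = f1)) +
  geom_point(data = df2, aes(x = f2, y = depth), colour = "blue") +
  scale_y_reverse()

built <- ggplot_build(p)
n_station_points <- nrow(built$data[[2]])
```

Each layer reads its own data frame, but both share the x (station) and reversed y (depth) scales, which is what keeps the points aligned with the boxplots.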
