How to add a trendline to a ggplot graph in R

I have a graph with 7 dots and I want to add a trendline.
Thanks for the help!
percentage.no.work <- cleanData %>% group_by(AREA) %>%
summarise(percentage = mean(ESTIMATED.CITY.UNEMPLOYMENT))
ggplot() +
geom_point(data=percentage.no.work, aes(x=AREA, y=percentage), alpha=0.6, color="purple", size=2) +
geom_smooth(method = "lm") +
theme_minimal() + ggtitle("Percentage Estimated City Unemployment") +
ylab("Percentage")
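One likely reason no line shows up in the code above is that geom_smooth() receives no data or aesthetics, because both are set only inside geom_point(). A minimal sketch of one way around that, assuming AREA is a categorical column and using made-up values in place of the real percentage.no.work (which is not shown):

```r
library(ggplot2)

# Made-up stand-in for percentage.no.work (the real data is not shown in the question)
percentage.no.work <- data.frame(
  AREA = c("a", "b", "c", "d", "e", "f", "g"),
  percentage = c(0.20, 0.35, 0.30, 0.45, 0.40, 0.55, 0.50)
)

# Put data and aes() in ggplot() so every layer inherits them.
# group = 1 treats all points as one series, so a single line is fitted
# across the discrete x axis.
p <- ggplot(percentage.no.work, aes(x = AREA, y = percentage, group = 1)) +
  geom_point(alpha = 0.6, color = "purple", size = 2) +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +
  theme_minimal() +
  ggtitle("Percentage Estimated City Unemployment") +
  ylab("Percentage")

p
```

Note that with a categorical x axis the fitted slope depends on the order of the categories, so the line is only meaningful if that order is.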

Related

How to add text for the dots in a ggplot graph in R

I'm trying to add the numbers for the dots in my graph. Thanks for the help!
percentage.no.work <- cleanData %>% group_by(AREA) %>%
summarise(percentage = mean(ESTIMATED.CITY.UNEMPLOYMENT))
ggplot() +
geom_point(data=percentage.no.work, aes(x=AREA, y=percentage), alpha=0.6, color="purple", size=2) +
geom_smooth(method = "lm") +
theme_minimal() + ggtitle("Percentage Estimated City Unemployment") +
ylab("Percentage")
Using a little made-up data, you can add text labels like this. Note that the aes() needs to go in ggplot() rather than geom_point() so that geom_smooth() inherits it, and you need group = 1 so that a single line is fitted across the discrete x axis.
library(tidyverse)
tribble(
~AREA, ~percentage,
"a", 0.2,
"b", 0.4
) |>
ggplot(aes(AREA, percentage, group = 1)) +
geom_point(alpha = 0.6, color = "purple", size = 2) +
geom_text(aes(label = percentage), nudge_x = 0.1) +
geom_smooth(method = "lm") +
theme_minimal() +
ggtitle("Percentage Estimated City Unemployment") +
ylab("Percentage")
Created on 2022-06-04 by the reprex package (v2.0.1)

Trying to only display two decades on axis of box-plot in R

Hi!
This is how my plot looks at the moment...
Want to try and only display decades 1980 and 2010 in my box plots.
This is how my code looks now:
gender_race_income <- ggplot(data = gender_pay_gap, aes(x = factor(decade), y = income, colour = sex)) +
geom_boxplot() +
coord_flip() +
facet_grid(rows = vars(race)) +
scale_y_log10() +
theme_bw() +
scale_color_brewer(palette = "Set2")
Thank you :)
Welcome to Stack Overflow. You can subset your data using filter() from dplyr and then pass the result on to ggplot() using the %>% pipe.
gender_pay_gap %>%
dplyr::filter(decade == 1980 | decade == 2010) %>%
ggplot(aes(x = factor(decade), y = income, colour = sex)) +
geom_boxplot() +
coord_flip() +
facet_grid(rows = vars(race)) +
scale_y_log10() +
theme_bw() +
scale_color_brewer(palette = "Set2")

Adding a legend + slope of trend line in ggplot()

Using ggplot and the following code I created the plot below:
df_have %>%
ggplot(aes(date, num_visits)) +
geom_line(colour = "blue") +
ggtitle("Number of Customers Visiting Store by Month") +
geom_smooth(method = "lm", fit = num_visits ~ date, se=F, color="red") +
xlab("Date") +
ylab("Number of Visits")
However, I am wondering how to:
1. add a legend differentiating between the blue line (number of visits) and the red line (line of best fit), and
2. annotate this chart with the equation of the red line in the form y = mx + c
Thanks!
Edit: Sample data
df_have
num_visits date
1 19.825 2021.000
2 25.025 2021.083
3 27.950 2021.167
4 25.425 2021.250
5 29.575 2021.333
Update: One way could be to use the ggpmisc package:
library(tidyverse)
library(ggpmisc)
my.formula <- y ~ x
df_have %>%
ggplot(aes(date, num_visits)) +
geom_line(aes(colour = "blue")) +
ggtitle("Number of Customers Visiting Store by Month") +
geom_smooth(method = "lm", formula = my.formula, se = FALSE, aes(color = "red")) +
stat_poly_eq(formula = my.formula,
aes(label = paste(after_stat(eq.label), after_stat(rr.label), sep = "~~~")),
parse = TRUE) +
xlab("Date") +
ylab("Number of Visits")+
scale_color_manual(values = c('blue', 'red'), labels=c('num_visits', 'y = mx + c'))
First answer:
You mean this kind of output?
df_have %>%
ggplot(aes(date, num_visits)) +
geom_line(aes(colour = "blue")) +
ggtitle("Number of Customers Visiting Store by Month") +
geom_smooth(method = "lm", formula = y ~ x, se = FALSE, aes(color = "red")) +
xlab("Date") +
ylab("Number of Visits")+
scale_color_manual(values = c('blue', 'red'), labels=c('num_visits', 'y = mx + c'))
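If you would rather not add a package, the equation can also be computed with lm() and placed with annotate(). This is only a sketch under some assumptions: it uses the five sample rows from the question, and the label position, the two-decimal rounding and the legend names are arbitrary choices of mine rather than anything from the original post.

```r
library(ggplot2)

# Sample data copied from the question
df_have <- data.frame(
  num_visits = c(19.825, 25.025, 27.950, 25.425, 29.575),
  date = c(2021.000, 2021.083, 2021.167, 2021.250, 2021.333)
)

# Fit the same straight line that geom_smooth(method = "lm") draws
fit <- lm(num_visits ~ date, data = df_have)
slope <- unname(coef(fit)["date"])
intercept <- unname(coef(fit)["(Intercept)"])

# Build the "y = mx + c" text; two decimal places is an arbitrary choice
eq_label <- sprintf("y = %.2fx %s %.2f", slope,
                    ifelse(intercept < 0, "-", "+"), abs(intercept))

p <- ggplot(df_have, aes(date, num_visits)) +
  # Mapping colour to a constant string inside aes() is what creates the legend
  geom_line(aes(colour = "num_visits")) +
  geom_smooth(aes(colour = "line of best fit"),
              method = "lm", formula = y ~ x, se = FALSE) +
  # Place the equation in the top-left corner of the data range
  annotate("text", x = min(df_have$date), y = max(df_have$num_visits),
           label = eq_label, hjust = 0) +
  scale_colour_manual(
    name = NULL,
    values = c("num_visits" = "blue", "line of best fit" = "red")
  ) +
  labs(title = "Number of Customers Visiting Store by Month",
       x = "Date", y = "Number of Visits")

p
```

Naming the values in scale_colour_manual() ties each colour to its legend entry explicitly, instead of relying on the alphabetical order of the strings.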

Adding smoother to ggplot 2

I have used the following code to create a plot in R using ggplot2:
g <- ggplot(newdata, aes(MVPAper, FMI) +
geom_smooth(method = 'lm'))
I then added the following:
p <- g + geom_point(aes(color = Age)) +
facet_grid(Age ~ .) +
stat_smooth(method = 'lm') +
theme_bw(base_family = 'Times')
I want a smoother for each of the four graphs I have created, using facet_grid() to split the graph by the four ages (8, 9, 12, and 15). Can anyone assist with my code?
You don't need both geom_smooth() and stat_smooth(). Try this:
library(tidyverse)
df <- diamonds %>% filter(price < 10000, carat < 2.5)
g <- ggplot(df, aes(carat, price, color = cut))
g +
geom_point() +
geom_smooth(method = 'lm') +
facet_grid(cut ~ .) +
theme_bw()
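Applied to the column names from the question, the same idea might look like the sketch below. The data here is made up (only the names newdata, MVPAper, FMI and Age come from the question), and I left out base_family = 'Times' so the example does not depend on installed fonts. Also note that in the question's first snippet the closing parenthesis of ggplot() sits after geom_smooth(), so the smoother ends up inside the ggplot() call; the sketch closes ggplot() first.

```r
library(ggplot2)

# Made-up stand-in for newdata; only the column names come from the question
set.seed(1)
newdata <- data.frame(
  Age = rep(c(8, 9, 12, 15), each = 25),
  MVPAper = runif(100, min = 0, max = 20)
)
newdata$FMI <- 8 - 0.1 * newdata$MVPAper + rnorm(100)

# Close ggplot() before adding layers with +
g <- ggplot(newdata, aes(MVPAper, FMI))

# One smoother call is enough: facet_grid() splits the data,
# so the lm line is fitted separately within each panel
p <- g +
  geom_point(aes(color = factor(Age))) +
  geom_smooth(method = "lm", formula = y ~ x) +
  facet_grid(Age ~ .) +
  labs(color = "Age") +
  theme_bw()

p
```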

r: Automatically stagger overlapping labels in ggplot slopegraph

While creating a slopegraph with ggplot2, as below, I find that many of my labels overlap when their data points are close together. How can I change the labelling to automatically stagger my labels if there is overlap?
library(ggplot2)
library(scales)
install.packages("Lock5Data", repos = "http://cran.us.r-project.org") # you might need this
library(Lock5Data)
data("NBAStandings1e")
data("NBAStandings2016")
colnames(NBAStandings1e)[4] <- "year1" # 2010-2011
colnames(NBAStandings2016)[4] <- "year2" # 2015-2016
nba_df <- merge(NBAStandings1e[,c('Team','year1')], NBAStandings2016[,c('Team','year2')])
scale <- dim(nba_df)[1]
a<-nba_df
p<-ggplot(nba_df) + geom_segment(aes(x=0,xend=scale,y=year1,yend=year2),size=.75)
# clear junk
p<-p + theme(panel.background = element_blank())
p<-p + theme(panel.grid=element_blank())
p<-p + theme(axis.ticks=element_blank())
# p<-p + theme(axis.text=element_blank())
p<-p + theme(panel.border=element_blank())
# p<-p + theme(panel.grid.major = element_line(linetype = "dashed", fill = NA))
p<-p + theme(panel.grid.major = element_line(linetype = "dashed",color = "grey80"))
p<-p + theme(panel.grid.major.x = element_blank())
p<-p + theme(axis.text.x = element_blank())
# annotate
p<-p + xlab("") + ylab("Percentage Wins")
p<-p + xlim((-5),(scale+12))
p<-p + geom_text(label="2010-2011 Season", x=0, y=(1.1*(max(a$year2,a$year1))),hjust= 1.2,size=3)
p<-p + geom_text(label="2015-2016 Season", x=scale,y=(1.1*(max(a$year2,a$year1))),hjust=-0.1,size=3)
p<-p + geom_text(label=nba_df$Team, y=nba_df$year2, x=rep.int(scale,dim(a)[1]),hjust=-0.2,size=2)
p<-p + geom_text(label=nba_df$Team, y=nba_df$year1, x=rep.int( 0,dim(a)[1]),hjust=1.2,size=2)
p
Since the teams that overlap have the same winning percentage, you can deal with overlap more simply by combining the labels for teams with the same winning percentage. I've also made a few other changes to your code intended to streamline the process.
library(Lock5Data)
library(tidyverse)
library(scales)
data("NBAStandings1e")
data("NBAStandings2016")
colnames(NBAStandings1e)[4] <- "2010-11" # 2010-2011
colnames(NBAStandings2016)[4] <- "2015-16" # 2015-2016
nba_df <- merge(NBAStandings1e[,c('Team','2010-11')], NBAStandings2016[,c('Team','2015-16')])
# Convert data to long format
dat = gather(nba_df, Season, value, -Team)
# Combine labels for teams with same winning percentage (see footnote * below)
dat_lab = dat %>% group_by(Season, value) %>%
summarise(Team = paste(Team, collapse="\U2014")) # \U2014 is the emdash character
ggplot(dat, aes(Season, value, group=Team)) +
geom_line() +
theme_minimal() + theme(panel.grid.minor=element_blank()) +
labs(y="Winning Percentage") +
scale_y_continuous(limits=c(0,1), labels=percent) +
geom_text(data=subset(dat_lab, Season=="2010-11"), aes(label=Team, x=0.98), hjust=1, size=2) +
geom_text(data=subset(dat_lab, Season=="2015-16"), aes(label=Team, x=2.02), hjust=0, size=2)
Here's a closeup of what the labels look like:
* If there are teams that overlap due to having very close, but unequal, winning percentages, you can still group them by rounding. For example, if you wanted to group teams with winning percentages that are the same when rounded to the nearest 2 percent, you could do:
dat_lab = dat %>% group_by(Season, group=round(value/0.02)*0.02) %>%
summarise(Team = paste(Team, collapse="\U2014"),
value = mean(value))
This would result in the labels being placed at the mean value for their group.
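If you want the labels staggered automatically rather than combined, a different technique is to swap geom_text() for geom_text_repel() from the ggrepel package. This is a sketch only, not part of the approach above: the data is made up with the same column names as dat, and the nudge and size values are guesses you would need to tune.

```r
library(ggplot2)
library(ggrepel)

# Made-up long-format data with the same columns as dat above
dat <- data.frame(
  Team = rep(c("A", "B", "C", "D"), times = 2),
  Season = rep(c("2010-11", "2015-16"), each = 4),
  value = c(0.50, 0.51, 0.70, 0.30, 0.60, 0.61, 0.40, 0.41)
)

p <- ggplot(dat, aes(Season, value, group = Team)) +
  geom_line() +
  # direction = "y" lets labels move only vertically, so they stay in their column
  geom_text_repel(data = subset(dat, Season == "2010-11"), aes(label = Team),
                  direction = "y", nudge_x = -0.05, hjust = 1, size = 2,
                  segment.size = 0.2) +
  geom_text_repel(data = subset(dat, Season == "2015-16"), aes(label = Team),
                  direction = "y", nudge_x = 0.05, hjust = 0, size = 2,
                  segment.size = 0.2) +
  theme_minimal() +
  labs(y = "Winning Percentage")

p
```

The repelled positions are worked out when the plot is drawn, so the exact placement can change with the size of the plotting device.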
