Using ggplot and the following code I created the plot below:
df_have %>%
ggplot(aes(date, num_visits)) +
geom_line(colour = "blue") +
ggtitle("Number of Customers Visiting Store by Month") +
geom_smooth(method = "lm", formula = y ~ x, se = FALSE, color = "red") +
xlab("Date") +
ylab("Number of Visits")
However, I am wondering how to:
add a legend differentiating between the blue line (number of visits) and red line (line of best fit) and
annotate this chart with the equation of the red line in the form of y = mx + c
Thanks!
Edit: Sample data
df_have
num_visits date
1 19.825 2021.000
2 25.025 2021.083
3 27.950 2021.167
4 25.425 2021.250
5 29.575 2021.333
Update: One way is to use the ggpmisc package:
library(tidyverse)
library(ggpmisc)
my.formula <- y ~ x
df_have %>%
ggplot(aes(date, num_visits)) +
geom_line(aes(colour = "blue")) +
ggtitle("Number of Customers Visiting Store by Month") +
geom_smooth(method = "lm", formula = my.formula, se = FALSE, aes(color = "red")) +
stat_poly_eq(formula = my.formula,
aes(label = paste(after_stat(eq.label), after_stat(rr.label), sep = "~~~")),
parse = TRUE) +
xlab("Date") +
ylab("Number of Visits")+
scale_color_manual(values = c('blue', 'red'), labels=c('num_visits', 'y = mx + c'))
First_answer:
You mean this kind of output?
df_have %>%
ggplot(aes(date, num_visits)) +
geom_line(aes(colour = "blue")) +
ggtitle("Number of Customers Visiting Store by Month") +
geom_smooth(method = "lm", formula = y ~ x, se = FALSE, aes(color = "red")) +
xlab("Date") +
ylab("Number of Visits")+
scale_color_manual(values = c('blue', 'red'), labels=c('num_visits', 'y = mx + c'))
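If you would rather not depend on ggpmisc, the equation can also be built by hand from lm() and placed with annotate(). This is only a sketch using the sample data from the question; the legend labels, the label position and the number formatting are my own choices, not something from the original post.

```r
library(ggplot2)

# Sample data copied from the question
df_have <- data.frame(
  num_visits = c(19.825, 25.025, 27.950, 25.425, 29.575),
  date       = c(2021.000, 2021.083, 2021.167, 2021.250, 2021.333)
)

# Fit the same straight line that geom_smooth(method = "lm") draws
fit <- lm(num_visits ~ date, data = df_have)
cf  <- coef(fit)  # cf[1] is the intercept c, cf[2] is the slope m

# Build the "y = mx + c" label; "%+.2f" prints the intercept with its sign
eq_label <- sprintf("y = %.2fx %+.2f", cf[2], cf[1])

p <- ggplot(df_have, aes(date, num_visits)) +
  # Mapping colour to a constant string inside aes() is what creates the legend
  geom_line(aes(colour = "Number of visits")) +
  geom_smooth(aes(colour = "Line of best fit"),
              method = "lm", formula = y ~ x, se = FALSE) +
  # Place the equation in the top-left corner of the data range (assumed position)
  annotate("text", x = min(df_have$date), y = max(df_have$num_visits),
           label = eq_label, hjust = 0) +
  # A named vector ties each legend entry to its colour explicitly
  scale_colour_manual(
    name   = NULL,
    values = c("Number of visits" = "blue", "Line of best fit" = "red")
  ) +
  labs(title = "Number of Customers Visiting Store by Month",
       x = "Date", y = "Number of Visits")
p
```

Using a named vector in scale_colour_manual() avoids relying on the alphabetical order of "blue" and "red" to match colours to lines.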
Related
I'm trying to add the numbers for the dots in my graph. Thanks for the help!
percentage.no.work <- cleanData %>% group_by(AREA) %>%
summarise(percentage = mean(ESTIMATED.CITY.UNEMPLOYMENT))
ggplot() +
geom_point(data=percentage.no.work, aes(x=AREA, y=percentage), alpha=0.6, color="purple", size=2) +
geom_smooth(method = "lm") +
theme_minimal() + ggtitle("Percentage Estimated City Unemployment") +
ylab("Percentage")
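Using a little made-up data, you can add text labels like this. Note that the aes() mapping needs to go in ggplot() rather than geom_point() so that geom_smooth() inherits it, and that group = 1 is needed because AREA is discrete; without it, geom_smooth() has only one point per group and draws nothing.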
library(tidyverse)
tribble(
~AREA, ~percentage,
"a", 0.2,
"b", 0.4
) |>
ggplot(aes(AREA, percentage, group = 1)) +
geom_point(alpha = 0.6, color = "purple", size = 2) +
geom_text(aes(label = percentage), nudge_x = 0.1) +
geom_smooth(method = "lm") +
theme_minimal() +
ggtitle("Percentage Estimated City Unemployment") +
ylab("Percentage")
Created on 2022-06-04 by the reprex package (v2.0.1)
I have a graph with 7 dots and I want to add a trendline.
Thanks for the help!
percentage.no.work <- cleanData %>% group_by(AREA) %>%
summarise(percentage = mean(ESTIMATED.CITY.UNEMPLOYMENT))
ggplot() +
geom_point(data=percentage.no.work, aes(x=AREA, y=percentage), alpha=0.6, color="purple", size=2) +
geom_smooth(method = "lm") +
theme_minimal() + ggtitle("Percentage Estimated City Unemployment") +
ylab("Percentage")
I am currently working with a dataset of "world bank islands". With it, I am trying to plot population vs. country for each year. Below is the code that I have written.
library(ggplot2)
options(scipen = 999)
bank <- read.csv("C:/Users/True Gamer/OneDrive/Desktop/world_bank_international_arrivals_islands.csv")
bank[bank == "" | bank == "."] <- NA
bank$country <- as.numeric(bank$country)
bank$year <- as.numeric(bank$year)
bank$areakm2 <- as.numeric(bank$areakm2)
bank$pop <- as.numeric(bank$pop)
bank$gdpnom <- as.numeric(bank$gdpnom)
bank$flights...WB <- as.numeric(bank$flights...WB)
bank$hotels <- as.numeric(bank$hotels)
bank$hotrooms <- as.numeric(bank$hotrooms)
bank$receipt <- as.numeric(bank$receipt)
bank$ovnarriv <- as.numeric(bank$ovnarriv)
bank$dayvisit <- as.numeric(bank$dayvisit)
bank$arram <- as.numeric(bank$arram)
bank$arreur <- as.numeric(bank$arreur)
bank$arraus <- as.numeric(bank$arraus)
str(bank)
plot1 <- ggplot(bank, aes(x=country,y=pop)) + geom_bar(stat = "identity",aes(fill=year)) + ggtitle("Population of each country yearwise") + xlab("Countries") + ylab("Population")
plot1
However, when I do this, the y values shown on the graph are different from the actual y values. This is the link to the dataset
The problem is that you are stacking the bars (this is default behaviour). Also, geom_bar(stat = "identity") is just a long way of writing geom_col. One further point to note is that since all your columns are numeric, the single line:
bank <- as.data.frame(lapply(bank, as.numeric))
replaces all your individual numeric conversions.
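As a quick check of that one-liner, here is a sketch on a tiny made-up data frame (bank_toy is hypothetical and only stands in for the real CSV, with every column read in as text):

```r
# Hypothetical stand-in for the real CSV: every column read in as text
bank_toy <- data.frame(
  country = c("1", "2"),
  year    = c("2000", "2001"),
  pop     = c("1000", NA),
  stringsAsFactors = FALSE
)

# Convert every column in one step
bank_toy <- as.data.frame(lapply(bank_toy, as.numeric))

sapply(bank_toy, class)  # every column is now "numeric"
```

Missing values stay NA after the conversion, so the earlier replacement of "" and "." with NA still does its job.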
The plot you are trying to create would be something like this:
ggplot(bank, aes(x = country, y = pop)) +
geom_col(aes(fill = factor(year)), position = "dodge") +
ggtitle("Population of each country yearwise") +
xlab("Countries") +
ylab("Population") +
labs(fill = "Year") +
scale_y_continuous(labels = scales::comma) +
scale_x_continuous(breaks = 1:27)
However, it would probably be best to present your data in a different way. Perhaps, if you are comparing population growth, something like this would be better:
ggplot(bank, aes(x = year, y = pop)) +
geom_line(aes(color = factor(country))) +
ggtitle("Population of each country yearwise") +
xlab("Year") +
ylab("Population") +
facet_wrap(.~country, scales = "free_y", nrow = 6) +
scale_y_continuous(labels = scales::comma) +
scale_x_continuous(breaks = c(0, 5, 10)) +
theme_minimal() +
theme(legend.position = "none")
Or with bars:
ggplot(bank, aes(x = year, y = pop)) +
geom_col(aes(fill = factor(country)), position = "dodge") +
ggtitle("Population of each country yearwise") +
xlab("Year") +
ylab("Population") +
facet_wrap(.~country, scales = "free_y", nrow = 6) +
scale_y_continuous(labels = scales::comma) +
scale_x_continuous(breaks = c(0, 5, 10)) +
theme_minimal() +
theme(legend.position = "none")
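To see why the stacked version shows y values that differ from the data, it can help to inspect what ggplot2 actually draws. The following is a minimal sketch on made-up numbers (the toy data frame is an assumption, not the real dataset), using layer_data() to pull out the computed bar heights:

```r
library(ggplot2)

# Hypothetical toy data: one country observed in two years
toy <- data.frame(
  country = c(1, 1),
  year    = c(2000, 2001),
  pop     = c(100, 120)
)

stacked <- ggplot(toy, aes(country, pop, fill = factor(year))) +
  geom_col()                      # default position = "stack"
dodged <- ggplot(toy, aes(country, pop, fill = factor(year))) +
  geom_col(position = "dodge")    # bars side by side

# layer_data() returns the values ggplot2 actually draws
max(layer_data(stacked)$ymax)  # 220: the two years are summed
max(layer_data(dodged)$ymax)   # 120: each bar keeps its own value
```

With stacking, the top of the bar is the sum over all years for that country, which is why the axis appears to disagree with the individual values.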
In the following plot, time is on the x-axis but tick marks do not show for every year:
ggplot(mm, aes(x = time, y = value)) +
geom_point(aes(color = variable)) +
geom_line(stat = "smooth", method = "lm", alpha = 0.5) +
facet_grid(variable ~ ., scales = "free_y") +
theme(legend.position="none") +
coord_fixed(ratio = 10)
In order to fix this, I have converted the time variable to a factor, which works but then the linear regression disappears:
ggplot(mm, aes(x = factor(time), y = value)) +
geom_point(aes(color = variable)) +
geom_line(stat = "smooth", method = "lm", alpha = 0.5) +
facet_grid(variable ~ ., scales = "free_y") +
theme(legend.position = "none") +
coord_fixed(ratio = 10)
Is there a workaround for this with geom_line?
I think that scale_x_date is what you are looking for.
First, some reproducible data:
df <-
data.frame(
y = 99:117
, x = seq(as.Date("1999-01-01")
, as.Date("2017-01-01")
, "year")
)
Then, this is one way to set "pretty" break points for the labels while still getting a tick at each year. If you want every year labelled, use date_breaks = "1 year" instead of the breaks and date_minor_breaks arguments used here.
ggplot(df, aes(x = x, y = y) ) +
geom_smooth(method = "lm") +
geom_point() +
scale_x_date(breaks = pretty(df$x)
, date_minor_breaks = "1 year"
, date_labels = "%Y")
gives
Or, if your years are just numeric (and not dates), you can use scale_x_continuous for a similar effect:
df <-
data.frame(
y = 99:117
, x = 1999:2017
)
ggplot(df, aes(x = x, y = y) ) +
geom_smooth(method = "lm") +
geom_point() +
scale_x_continuous(breaks = pretty(df$x)
, minor_breaks = unique(df$x)) +
theme_gray()
Gives a plot that is indistinguishable from above.
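If, as in the question, the years are plain numbers and every year should get a labelled tick, one option is to pass all years as breaks. This is just a sketch reusing the made-up numeric data from above; it keeps the question's geom_line(stat = "smooth") layer, and the rotated axis labels are my own addition to avoid overlap. Because the x-axis stays continuous, the regression line is still drawn.

```r
library(ggplot2)

# Same made-up data as above, with numeric years
df <- data.frame(y = 99:117, x = 1999:2017)

p <- ggplot(df, aes(x = x, y = y)) +
  # Regression line drawn the way the question does it
  geom_line(stat = "smooth", method = "lm", formula = y ~ x, alpha = 0.5) +
  geom_point() +
  # One labelled break per year, without converting x to a factor
  scale_x_continuous(breaks = unique(df$x)) +
  # Rotate the labels so 19 years fit on the axis
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5))
p
```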
ggplot(all, aes(x=area, y=nq)) +
geom_point(size=0.5) +
geom_abline(data = levelnew, aes(intercept=log10(exp(interceptmax)), slope=fslope)) + #shifted regression line
scale_y_log10(labels = function(y) format(y, scientific = FALSE)) +
scale_x_log10(labels = function(x) format(x, scientific = FALSE)) +
facet_wrap(~levels) +
theme_bw() +
theme(panel.grid.major = element_line(colour = "#808080"))
And I get this figure
Now I want to add a line (geom_line) to just one of the facets. Specifically, I want a dotted vertical line (say, at x = 10,000) in the major panel only. How can I do this?
I don't have your data, so I made some up:
df <- data.frame(x=rnorm(100),y=rnorm(100),z=rep(letters[1:4],each=25))
ggplot(df,aes(x,y)) +
geom_point() +
theme_bw() +
facet_wrap(~z)
To add a vertical line at x = 1 we can use geom_vline() with a dataframe that has the same faceting variable (in my case z='b', but yours will be levels='major'):
ggplot(df,aes(x,y)) +
geom_point() +
theme_bw() +
facet_wrap(~z) +
geom_vline(data = data.frame(xint=1,z="b"), aes(xintercept = xint), linetype = "dotted")
Another way to express this which is possibly easier to generalize (and formatting stuff left out):
ggplot(df, aes(x,y)) +
geom_point() +
facet_wrap(~ z) +
geom_vline(data = subset(df, z == "b"), aes(xintercept = 1))
The key points are: facet first, then decorate individual facets by subsetting the original data frame, and put the details in a new aes() if possible. Other examples of a similar idea:
ggplot(df, aes(x,y)) +
geom_point() +
facet_wrap(~ z) +
geom_vline(data = subset(df, z == "b"), aes(xintercept = 1)) +
geom_smooth(data = subset(df, z == "c"), aes(x, y), method = lm, se = FALSE) +
geom_text(data = subset(df, z == "d"), aes(x = -2, y=0, label = "Foobar"))
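One thing to be aware of with the subset approach is that geom_text() draws one label per row of the data it is given, so the text is overplotted once for each row of the subset. A possible variation, sketched here with a hypothetical one-row data frame (label_df is my own name, not from the answers above), is to build the annotation data by hand:

```r
library(ggplot2)

set.seed(1)  # made-up data, as in the answer above
df <- data.frame(x = rnorm(100), y = rnorm(100),
                 z = rep(letters[1:4], each = 25))

# One row per annotation; the z column decides which facet it appears in
label_df <- data.frame(x = -2, y = 0, z = "d", label = "Foobar")

p <- ggplot(df, aes(x, y)) +
  geom_point() +
  facet_wrap(~ z) +
  geom_text(data = label_df, aes(label = label))

nrow(layer_data(p, 2))  # 1: a single label is drawn
p
```

The same pattern works for the original question by giving the one-row data frame a levels column set to "major".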