I have a simple plot of some data from an experiment.
plot(x=sample95$PositionA, y=sample95$AbsA, xlab=expression(position (mm)), ylab=expression(A[260]), type='l')
I would like to shade a particular area under the line, let's say from 35-45mm. From what I've searched so far, I think I need to use the polygon function, but I'm unsure how to assign vertices from a big dataset like this. Every example I've seen so far uses a normal curve.
Any help is appreciated, I am very new to R/RStudio!
Here is a solution using tidyverse tools including ggplot2. I use the built-in airquality dataset as an example.
This first part just puts the data in a format that we can plot, by combining the month and the day into a single date. With your data, you would use PositionA wherever date appears below.
library(tidyverse)
df <- airquality %>%
  as_tibble() %>%
  magrittr::set_colnames(str_to_lower(colnames(.))) %>%
  mutate(date = as.Date(str_c("1973-", month, "-", day)))
This is the plot code. In ggplot2, we start with the function ggplot() and add geom functions to it with + to create the plot in layers.
The first function, geom_line, joins up all observations in the order that they appear based on the x variable, so it makes the line that we see. Each geom needs a particular mapping to an aesthetic, so here we want date on the x axis and temp on the y axis, so we write aes(x = date, y = temp).
The second function, geom_ribbon, is designed to plot bands at particular x values between a ymax and a ymin. This lets us shade the area underneath the line by choosing a constant ymin = 55 (a value lower than the minimum temperature) and setting ymax = temp.
We shade a specific part of the chart by specifying the data argument. Normally geom functions act on the dataset inherited from ggplot(), but you can override it by passing data to an individual geom. Here we use filter inside geom_ribbon so that only the points where the date is in June are shaded.
ggplot(df) +
  geom_line(aes(x = date, y = temp)) +
  geom_ribbon(
    # >= so that 1 June itself is included in the shaded band
    data = filter(df, date < as.Date("1973-07-01") & date >= as.Date("1973-06-01")),
    mapping = aes(x = date, ymax = temp, ymin = 55)
  )
This gives the chart below:
Created on 2018-02-20 by the reprex package (v0.2.0).
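For base graphics, as in the question's plot() call, the same shading can probably be done with polygon(). The following is only a sketch: x and y are made-up stand-ins for sample95$PositionA and sample95$AbsA, and the 35-45 mm limits come from the question. The idea is to take the points inside the range as the top edge of the polygon and close it along the bottom of the plot region.

```r
# Made-up stand-in data; replace x and y with sample95$PositionA and sample95$AbsA.
x <- seq(0, 80, by = 0.5)
y <- 0.2 + exp(-(x - 40)^2 / 50)

plot(x, y, type = "l", xlab = "position (mm)", ylab = expression(A[260]))

# Points that fall inside the region to shade.
keep <- x >= 35 & x <= 45

# Bottom edge of the plot region, used as the baseline of the polygon.
baseline <- par("usr")[3]

# Vertices: up from the baseline, along the curve, back down to the baseline.
poly_x <- c(x[keep][1], x[keep], x[keep][sum(keep)])
poly_y <- c(baseline, y[keep], baseline)
polygon(poly_x, poly_y, col = "grey80", border = NA)

# Redraw the line so it sits on top of the shading.
lines(x, y)
```

This assumes the data are already sorted by position; if not, order both vectors by x first.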
I would really appreciate some insight on the zigzagging I get when using the following code in R:
tbi_military %>%
  ggplot(aes(x = year, y = diagnosed, color = service)) +
  geom_line() +
  facet_wrap(vars(severity))
The dataset is comprised of 5 variables (3 character, 2 numerical). Any insight would be so appreciated.
This is just an illustration with a standard dataset. Let's say we're interested in plotting the weight of chicks over time depending on a diet. We would attempt to plot this like so:
library(ggplot2)
ggplot(ChickWeight, aes(Time, weight, colour = factor(Diet))) +
  geom_line()
You can see the zigzag pattern appear, because per diet/time point, there are multiple observations. Because geom_line sorts the data depending on the x-axis, this shows up as a vertical line spanning the range of datapoints at that time per diet.
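One way to check this diagnosis on your own data is to count the rows per line and x value. This is a small sketch using ChickWeight and base R's aggregate(); the name obs_per_point is made up for illustration. Any count above 1 means geom_line has several y values to connect at the same x position.

```r
# Count observations for every Diet/Time combination.
obs_per_point <- aggregate(
  weight ~ Diet + Time,
  data = as.data.frame(ChickWeight),
  FUN = length
)

# The weight column now holds the number of rows per combination.
head(obs_per_point)

# TRUE if at least one combination has more than one observation.
any(obs_per_point$weight > 1)
```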
The data has an additional variable called 'Chick' that separates out individual chicks. Including that in the grouping resolves the zigzag pattern and every line is the weight over time per individual chick.
ggplot(ChickWeight, aes(Time, weight, colour = factor(Diet))) +
  geom_line(aes(group = interaction(Chick, Diet)))
If you don't have an extra variable that separates out individual trends, you could instead choose to summarise the data per timepoint by, for example, taking the mean at every timepoint.
ggplot(ChickWeight, aes(Time, weight, colour = factor(Diet))) +
  geom_line(stat = "summary", fun = mean)
Created on 2021-08-30 by the reprex package (v1.0.0)
I have data saved in multiple datasets, each consisting of four variables. Imagine something like a data.table dt consisting of the variables Country, Male/Female, Birthyear, Weighted Average Income. I would like to create a graph where you see only one country's weighted average income by birthyear and split by male/female. I've used the facet_grid() function to get a grid of graphs for all countries as below.
ggplot() +
geom_line(data = dt,
aes(x = Birthyear,
y = Weighted Average Income,
colour = 'Weighted Average Income'))+
facet_grid(Country ~ Male/Female)
However, I've tried isolating the graphs for just one country, but the below code doesn't seem to work. How can I subset the data correctly?
ggplot() +
geom_line(data = dt[Country == 'Germany'],
aes(x = Birthyear,
y = Weighted Average Income,
colour = 'Weighted Average Income'))+
facet_grid(Country ~ Male/Female)
For your specific case the problem is that you are not quoting Male/Female and Weighted Average Income. Because these names contain spaces or a slash, R cannot read them as plain variable names. Also, your data and basic aesthetics should likely be part of ggplot and not geom_line. Putting them in geom_line isolates them to that single layer, so you would have to repeat them in every layer of your plot if you were to add, for example, geom_smooth.
So to fix your problem you could do
library(tidyverse)
library(data.table)  # dt[Country == 'Germany'] is data.table syntax
plot <- ggplot(data = dt[Country == 'Germany'],
               aes(x = Birthyear,
                   y = !!sym("Weighted Average Income"),     # could use `x` instead of !!sym("x")
                   col = !!sym("Weighted Average Income"))) +
  geom_line() +
  facet_grid(Country ~ `Male/Female`)  # backticks quote the non-syntactic name
plot
Now ggplot2 actually has a (lesser-known) built-in feature for changing the data of an existing plot, so if you wanted to compare this to the plot with all of your countries included you could do:
plot %+% dt # `%+%` is used to change the data used by one or more layers. See help("+.gg")
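Since the original dt is not available, here is a minimal sketch of the backtick quoting on made-up data. The data frame toy and its values are invented for illustration; only the column names follow the question. check.names = FALSE is needed so that data.frame() keeps the names with spaces and the slash.

```r
library(ggplot2)

# Invented data with the same non-syntactic column names as in the question.
toy <- data.frame(
  Country = rep(c("Germany", "France"), each = 4),
  `Male/Female` = rep(c("Male", "Female"), times = 4),
  Birthyear = rep(c(1950, 1950, 1960, 1960), times = 2),
  `Weighted Average Income` = c(30, 25, 35, 31, 28, 24, 33, 30),
  check.names = FALSE
)

# Subset to one country, then refer to the awkward names in backticks.
germany <- toy[toy$Country == "Germany", ]
p <- ggplot(germany, aes(x = Birthyear, y = `Weighted Average Income`)) +
  geom_line() +
  facet_grid(Country ~ `Male/Female`)
p
```

Renaming the columns to syntactic names once, right after loading the data, is often less error-prone than quoting them in every call.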
I have this data representing the value of a variable Q1 along time.
The time is not represented by dates, it is represented by the number of days since one event.
https://www.mediafire.com/file/yfzbx67yivvvkgv/dat.xlsx/file
I'm trying to plot the mean value of Q1 over time, as in this question:
Plotting average of multiple variables in time-series using ggplot
I'm using this code
library(Hmisc)
ggplot(dat, aes(x = days, y = Q1, colour = type, group = type)) +
  stat_summary(fun.data = "mean_cl_boot", geom = "smooth")
Besides the code, which does not appear to work with the new ggplot2 version, you also have the problem that your data is not really suited for that kind of plot. This code achieves what you wanted to do:
dat <- rio::import("dat.xlsx")
library(ggplot2)
library(dplyr)
dat %>%
  ggplot(aes(x = days, y = Q1, colour = type, group = type)) +
  geom_smooth(stat = 'summary', fun.data = mean_cl_boot)
But the plot doesn't really tell you anything, simply because there aren't enough values in your data. Most often there seems to be only one value per day, the values jump quickly up and down, and the gaps between days are sometimes quite big.
You can see this when you group the values into timespans instead. Here I used round(days, -2) which will round to the nearest 100 (e.g., 756 is turned into 800, 301 becomes 300, 49 becomes 0):
dat %>%
  mutate(days = round(days, -2)) %>%
  ggplot(aes(x = days, y = Q1, colour = type, group = type)) +
  geom_smooth(stat = 'summary', fun.data = mean_cl_boot)
This should be the same plot as the linked one, but with huge confidence intervals. That is not surprising since, as mentioned, the values quickly alternate between 1 and 5. I hope that helps.
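The binning by round(days, -2) used above can be checked on the three example values. This is just a quick base R illustration; the vector name days_example is made up.

```r
# A negative digits argument rounds to the left of the decimal point:
# -2 means "to the nearest 100".
days_example <- c(756, 301, 49)
binned <- round(days_example, -2)
binned
```

A different bin width would need a different expression, for example round(days / 50) * 50 for bins of 50 days.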
I want to plot a chart in R that shows vertical lines for each type in a facet.
df is a data frame giving the time in minutes that each person (X, Y, Z) takes to get from A to B, and so on.
I have tried the code below but was not able to get the result.
df<-data.frame(type =c("X","Y","Z"), "A_to_B"= c(20,56,57), "B_to_C"= c(10,35,50), "C_to_D"= c(53,20,58))
ggplot(df, aes(x = 1,y = df$type)) + geom_line() + facet_grid(type~.)
I have attached an image from Excel showing the desired output, but I need only vertical lines where the segments join instead of the entire horizontal bar.
I would not use facets in your case, because there are only 3 variables.
So, to get a similar plot in R using ggplot2, you first need to reformat the data frame using gather() from tidyr (part of the tidyverse). Then it's in long or tidy format.
To my knowledge, there is no geom that does what you want in standard ggplot2, so some fiddling is necessary.
However, it's possible to produce the plot using geom_segment() and cumsum():
library(tidyverse)
# First reformat and calculate cumulative sums by type.
# This works because the route names begin with A, B, C
# and are thus ordered correctly.
df <- df %>%
  gather(-type, key = "route", value = "time") %>%
  group_by(type) %>%
  mutate(cummulative_time = cumsum(time))
segment_length <- 0.2
# type is a character column in R >= 4.0, so convert it to a factor first;
# fct_rev() gives Z = 1, Y = 2, X = 3, matching the labels below.
df %>%
  mutate(route = fct_rev(route)) %>%
  ggplot(aes(color = route)) +
  geom_segment(aes(x = as.numeric(fct_rev(factor(type))) + segment_length,
                   xend = as.numeric(fct_rev(factor(type))) - segment_length,
                   y = cummulative_time, yend = cummulative_time)) +
  scale_x_discrete(limits = c("1", "2", "3"), labels = c("Z", "Y", "X")) +
  coord_flip() +
  ylim(0, max(df$cummulative_time)) +
  labs(x = "type")
EDIT
This solution works because it assigns values to X, Y, Z in scale_x_discrete. Be careful to assign the correct labels! Also compare this answer.
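To see where the vertical lines end up, the cumulative sums can also be worked out in base R on the question's data. This is only an illustrative sketch; the names df_wide and joins are made up.

```r
# The data from the question, in its original wide format.
df_wide <- data.frame(
  type = c("X", "Y", "Z"),
  A_to_B = c(20, 56, 57),
  B_to_C = c(10, 35, 50),
  C_to_D = c(53, 20, 58)
)

# Row-wise cumulative sums: each entry is the time at which a leg ends,
# which is the position of one vertical line for that type.
joins <- t(apply(df_wide[, -1], 1, cumsum))
rownames(joins) <- df_wide$type
joins
```

For X, for example, the legs of 20, 10 and 53 minutes end at 20, 30 and 83 minutes.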
I'm trying to create a graph in R like this:
I have three columns (online, offline and routes). However, when I add the following code:
library(ggplot2)
ggplot(coefroute, aes(routes,offline)) + geom_line()
I get the following message:
geom_path: Each group consists of only one observation. Do you need to adjust the group aesthetic?
sample of coefroute:
routes online offline
(Intercept) 210.4372 257.215
route10 7.543 30.0182
route100 18.3794 1.5313
route11 38.6537 78.8655
route12 66.501 94.8838
route13 -22.2391 -25.8448
route14 24.3652 177.7728
route15 48.5464 51.126 ...
routes: char, online and offline: num
Can anybody help me with putting strings in x-axis in R?
Thank you!
In the absence of sample data, here's some toy data that has the same structure as yours:
coefroute <- data.frame(routes = c("A","B","C","D","E"),
                        online = c(21,26,30,15,20),
                        offline = c(15,20,7,12,15))
To replicate your example graph in ggplot2 you would want your data in a long format, so that you can group on offline/online. See more here: Plotting multiple lines from a data frame with ggplot2 and http://ggplot2.tidyverse.org/reference/aes.html.
You can rearrange your data into a long format very easily with lots of different functions or packages, but a standard approach is to use gather from tidyr and group your series for online and offline into something called, say, status or whatever you want.
library(tidyr)
coefroute <- gather(coefroute, key = status, value = coef, online:offline)
Then you can plot this easily in ggplot:
library(ggplot2)
ggplot(coefroute, aes(x = routes, y = coef, group = status, colour = status)) +
  geom_line() + scale_x_discrete()
That should create something like your example graph. You may want to modify the colours, captions, etc. There's lots of documentation about these things that's easy enough to find. I've added scale_x_discrete() here so that ggplot knows to treat your x variable as a discrete one.
My suspicion, though, is that a line plot may be less effective than other geoms at communicating what you're trying to show here. I would perhaps use geom_bar(stat = "identity", position = "dodge") in place of geom_line. That would create a vertical bar chart with the offline and online coefficients for each route side by side.
ggplot(coefroute, aes(x = routes, y = coef, group = status, fill = status)) +
  geom_bar(stat = "identity", position = "dodge") + scale_x_discrete()
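In current versions of tidyr, gather() is superseded by pivot_longer(). A sketch of the same wide-to-long reshape with the toy data might look like this; the object names coefroute_wide and coefroute_long are made up so that nothing above is overwritten.

```r
library(tidyr)

# Toy data in wide format: one column per series.
coefroute_wide <- data.frame(
  routes = c("A", "B", "C", "D", "E"),
  online = c(21, 26, 30, 15, 20),
  offline = c(15, 20, 7, 12, 15)
)

# One row per route/status pair; the old column names go into status
# and the values go into coef.
coefroute_long <- pivot_longer(
  coefroute_wide,
  cols = c(online, offline),
  names_to = "status",
  values_to = "coef"
)
coefroute_long
```

The rows come out in a different order than with gather(), but that makes no difference for plotting.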
There are two approaches:
Plotting the data in wide format (quick & dirty, not recommended)
Plotting the data after reshaping from wide to long format (as shown by dshkol, but using a different approach)
Plotting the data in wide format
# using dshkol's toy data
coefroute <- data.frame(routes = c("A","B","C","D","E"),
                        online = c(21,26,30,15,20),
                        offline = c(15,20,7,12,15))
library(ggplot2)
# plotting data in wide format (not recommended)
ggplot(coefroute, aes(x = routes, group = 1L)) +
  geom_line(aes(y = online), colour = "blue") +
  geom_line(aes(y = offline), colour = "orange")
This approach has several drawbacks. Each variable needs its own call to geom_line() and there is no legend.
Plotting reshaped data
For reshaping, melt() is used, which is available from the reshape2 package (the predecessor of the tidyr/dplyr packages) or, in a faster implementation, from the data.table package.
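# data.table::melt() only has a method for data.tables, so convert first
ggplot(data.table::melt(data.table::as.data.table(coefroute), id.vars = "routes"),
       aes(x = routes, y = value, group = variable, colour = variable)) +
  geom_line()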
Note that in both cases the group aesthetic has to be specified because the x-axis is discrete. This tells ggplot to consider the data points belonging to one series despite the discrete x values.