R ggplot multiple series curved line

I am plotting multiple series of data on one plot.
I have data that looks like this:
count_id AMV Hour duration_in_traffic AMV_norm
1 16012E 4004 14 99 0
2 16012E 4026 12 94 22
3 16012E 4099 15 93 95
4 16012E 4167 11 100 163
5 16012E 4239 10 97 235
I am plotting in R using:
ggplot(td_results, aes(AMV,duration_in_traffic)) + geom_line(aes(colour=count_id))
This is giving me:
However, rather than straight lines linking the points, I would like curved lines.
I found the question "Equivalent of curve() for ggplot", but got an unexpected output when I tried it.
I used: ggplot(td_results, aes(AMV,duration_in_traffic)) + geom_line(aes(colour=count_id)) + stat_function(fun=sin)
Thus giving:
How can I get a curve with some form of higher order polynomial?

As @MrFlick mentions in the comments, there are serious statistical ways of getting curved lines, which are probably off topic here.
If you just want your graph to look nicer however, you could try interpolating your data with spline, then adding it on as another layer.
First we make some spline data, using 10 times the number of data points you had (you can increase or decrease this as desired):
library(dplyr)
dat2 <- td_results %>% select(count_id, AMV, duration_in_traffic) %>%
group_by(count_id) %>%
do(as.data.frame(spline(x= .[["AMV"]], y= .[["duration_in_traffic"]], n = nrow(.)*10)))
Then we plot, using your original data for points, but then using lines from the spline data (dat2):
library(ggplot2)
ggplot(td_results, aes(AMV, duration_in_traffic)) +
geom_point(aes(colour = factor(count_id))) +
geom_line(data = dat2, aes(x = x, y = y, colour = factor(count_id)))
This gives me the following graph from your test data:
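If a fitted polynomial is closer to what the question asks for than an interpolating spline, one possible sketch is geom_smooth() with a polynomial formula. The data frame below copies the five sample rows from the question; the degree (2) is an arbitrary choice for illustration, not something the original answer recommends.

```r
library(ggplot2)

# Sample rows copied from the question (a single count_id)
td_results <- data.frame(
  count_id = "16012E",
  AMV = c(4004, 4026, 4099, 4167, 4239),
  duration_in_traffic = c(99, 94, 93, 100, 97)
)

# Fit one least-squares polynomial per colour group.
# Degree 2 is an assumption made for this sketch only.
p <- ggplot(td_results, aes(AMV, duration_in_traffic, colour = count_id)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ poly(x, 2), se = FALSE)

p
```

Unlike the spline, the fitted curve does not have to pass through every point, so it smooths the data rather than just rounding the corners.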

Related

Grouped Bar Plot Extra Variable in R

I have the following data frame in R:
> data <- data.frame(tbi_military[0:4])
> data
Severity Active Guard Reserve
1 Penetrating 189 33 12
2 Severe 102 26 11
3 Moderate 709 177 63
4 Mild 5896 1332 541
5 Not Classifiable 122 29 12
And when I do barplot(as.matrix(data)) I get the following output:
Barplot Image
Is there a way for me to get rid of the severity on the x-axis to only have Active, Guard, Reserve? Thanks
One option is to send only the data you want to plot to the plotting function. In this case you want all columns from the second to the last (number four), so a small adjustment to your function call does the job:
barplot(as.matrix(data[, 2:4]))
A solution within the tidyverse (dplyr, tidyr and ggplot2) would be this:
library(dplyr)
library(tidyr)
library(ggplot2)
data %>%
# get data in tidy format to be able to use ggplot2 efficiently
tidyr::pivot_longer(-Severity, names_to = "Type", values_to = "Value") %>%
# set up the plot by assigning variable to plot
ggplot2::ggplot(aes(Type, Value, fill = Severity)) +
# put out a bar chart with stat parameter set for stacked barchart
ggplot2::geom_bar(stat = "identity")
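Since the question title asks for a grouped bar plot, here is a hedged base R sketch that builds on the barplot(as.matrix(data[, 2:4])) idea. The data frame is retyped from the printout in the question; using the Severity column as row names so that the legend is labelled is my own addition.

```r
# Data retyped from the question's printout
data <- data.frame(
  Severity = c("Penetrating", "Severe", "Moderate", "Mild", "Not Classifiable"),
  Active   = c(189, 102, 709, 5896, 122),
  Guard    = c(33, 26, 177, 1332, 29),
  Reserve  = c(12, 11, 63, 541, 12)
)

# Keep only the numeric columns and label the rows by severity
m <- as.matrix(data[, 2:4])
rownames(m) <- data$Severity

# beside = TRUE draws side-by-side (grouped) bars instead of stacked ones;
# legend.text = TRUE takes the legend labels from the row names
barplot(m, beside = TRUE, legend.text = TRUE)
```

Dropping beside = TRUE gives the stacked version with the same legend.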

Chart showing current value and historic value relative to a range

I would like to recreate the following chart in R using ggplot. My data is in a table like the one shown, where for each code (A, B, C, etc.) I have a current value, a value 12M ago, and the respective range (max, min) over the period.
My chart needs to show the current value in red, the value 12M ago in blue, and a line showing the max and min range.
I can produce this painstakingly in Excel using error bars, but I would like to reproduce it in R.
Any ideas on how I can do this using ggplot? Thanks.
Here's what I came up with, but just a note: please if you post your dataset, don't post an image, but instead post the result of dput(your.data.frame). The result of that is easily copy-pasted into the console in order to replicate your dataset, whereas I recreated your data frame manually. :/
A few points first regarding your data as is and the intended plot:
The red and blue hash marks used to indicate 12 months ago and today are not a geom I know of off the top of my head, so I'm using geom_point here to show them (easiest way). You can pick another geom if you wish to show them differently.
The ranges for high and low are already specified by those column names. I'll use those values for the required aesthetics in geom_errorbar.
You can use your data as is to plot and use two separate geom_point calls (one for "today" and one for "12M ago"), but that's going to make creating the legend more difficult than it needs to be, so the better option is to adjust the dataset to support having the legend created automatically. For that, we'll use the gather function from tidyr, being sure to just "gather together" the information in "today" and "12M ago" (my column name for that was different b/c you need to start with a letter in the data frame), but leave alone the columns for "high", "low", and the letters (called "category" in my dataframe).
Where df is the original data frame:
df1 <- df %>% gather(time, value, -category, -high, -low)
The new dataframe (df1) looks like this (18 observations total):
category high low time value
1 A 82 28 M12.ago 81
2 B 82 54 M12.ago 80
3 C 80 65 M12.ago 75
4 D 76 34 M12.ago 70
5 E 94 51 M12.ago 93
6 F 72 61 M12.ago 65
where "time" has "M12.ago" or "today".
For the plot, you apply category to x and value to y, and specify ymax and ymin with high and low, respectively for the geom_errorbar:
ggplot(df1, aes(x=category, y=value)) +
geom_errorbar(aes(ymin=low, ymax=high), width=0.2) + ylim(0,100) +
geom_point(aes(color=time), size=2) +
scale_color_manual(values=c('M12.ago'='blue', 'today'='red')) +
theme_bw() + labs(color="") + theme(legend.position='bottom')
Giving you this:
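For anyone who wants to run the whole thing end to end, here is a self-contained sketch. The category, high, low and M12.ago values are taken from the printout above; the today values are placeholders I made up, since the original numbers are not shown. It also uses pivot_longer() from tidyr, which does the same reshaping as the gather() call.

```r
library(tidyr)
library(ggplot2)

df <- data.frame(
  category = c("A", "B", "C", "D", "E", "F"),
  high     = c(82, 82, 80, 76, 94, 72),
  low      = c(28, 54, 65, 34, 51, 61),
  M12.ago  = c(81, 80, 75, 70, 93, 65),
  today    = c(60, 70, 68, 50, 80, 63)  # placeholder values, NOT from the question
)

# Stack the two value columns into time/value pairs, keeping high and low as-is
df1 <- pivot_longer(df, cols = c(M12.ago, today),
                    names_to = "time", values_to = "value")

p <- ggplot(df1, aes(x = category, y = value)) +
  geom_errorbar(aes(ymin = low, ymax = high), width = 0.2) +
  geom_point(aes(color = time), size = 2) +
  scale_color_manual(values = c(M12.ago = "blue", today = "red")) +
  ylim(0, 100) + theme_bw() + labs(color = "") +
  theme(legend.position = "bottom")

p
```

Each category ends up with two rows that share the same high and low, so the error bar is simply drawn twice in the same place.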

How to add a shaded rectangle to the part of a plot meeting a certain condition?

I have a dataset of the body temperature of a subject at different time points since a bacterial challenge.
Temprature time since challenge(in hours)
36 9
36.5 12
37 24
38 36
38.4 49
37 60
38.3 72
If the body temperature is above 38 for at least 12 hours, it means the person has become ill, so I would like to add a shaded rectangle and a segment to the part of the plot that meets this condition.
I am using ggplot to plot the data,
p<-ggplot(data, aes(factor(x=time,levels=time), y=temprature, group=1)) +geom_line()+ geom_point()+
geom_hline(yintercept=38,color = "blue")
q<-p+annotate("rect", xmin="132:35", xmax="180:35", ymin=38, ymax=38.5, alpha=.1, fill="blue")
s<-q+annotate("segment", x="132:35", xend="180:35", y=38.35, yend=38.35, arrow=arrow(ends="both",angle=90, length=unit(.2,"cm")))
p1<-s+annotate("text",x="157:35", y=38.5, label=">12 h")+xlab("Time since challenge") + ylab("Temprature")
p1
This code adds the rectangle to the plot manually, but I would like to write code that adds the shaded rectangle and the segment automatically, using the condition of having a temperature of >38 for more than 12 h, because this code will be reused for all the subjects.
Do you know how one can do that?
I took the liberty of adding a likely more complicated case, when there are more measurements that sum up to intervals of 12h with temperature >=38 degrees.
There might be more elegant solutions, but this one works.
df=data.frame(Temperature=c(36,36.5,37,38,38.4,37,38.3,39,38.3,35,36),
time_since_challenge=c(9,12,24,36,49,60,72,78,84,90,96))
#define variables
df$interval_start=NA
df$interval_end=NA
df$Temp_interval_start=NA
df$Temp_interval_end=NA
for(i in 1:(nrow(df)-1)){ #-1 so it doesn't run again in the last df-line
k=0
if(df$Temperature[i]>=38& df$Temperature[i+1]>=38){
while(i+k<=nrow(df) && df$Temperature[i+k]>=38){ #handles runs of several measurements at or above 38 degrees; the bounds check stops at the last row
interval_end=i+k
k=k+1
}
df$interval_start[i]=df$time_since_challenge[i]
df$interval_end[i]=df$time_since_challenge[i+k-1]
df$Temp_interval_start[i]=df$Temperature[i]
df$Temp_interval_end[i]=df$Temperature[i+k-1]
}
}
df_intervals=df[(!is.na(df$interval_end)),] #take only cases that should get the rectangle
df_intervals=df_intervals[!(duplicated(df_intervals$interval_end)),] #remove overlapping intervals
#added time above 38 degrees
df_intervals$Time_above_38degrees=df_intervals$interval_end-df_intervals$interval_start
ggplot(df, aes(x=time_since_challenge, y=Temperature)) +
geom_rect(data=df_intervals,
aes(xmin=interval_start,
xmax=interval_end,
ymin=Temp_interval_start, #could be hardcoded to 38
ymax=Temp_interval_end), #could be hardcoded to Inf ,then it would always go up to the yaxis end
fill="blue",alpha=0.5)+ #added alpha for shading
geom_line(aes(group=1))+
geom_point()+
geom_hline(yintercept=38,color = "blue")+
geom_segment(data=df_intervals,aes(x=interval_start,xend=interval_end,y=38,yend=38),
color="red",arrow=arrow(ends="both",angle=90, length=unit(.2,"cm")))
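As one of the possibly more elegant alternatives, the runs of consecutive readings at or above 38 can also be found with rle() from base R, and then filtered on the 12 h condition. This is only a sketch: it reuses the extended example data from above, treats 38 itself as "above" (as the loop does), and measures duration from the first to the last reading in a run, which is an assumption about how the 12 h should be counted.

```r
df <- data.frame(
  Temperature = c(36, 36.5, 37, 38, 38.4, 37, 38.3, 39, 38.3, 35, 36),
  time_since_challenge = c(9, 12, 24, 36, 49, 60, 72, 78, 84, 90, 96)
)

# Run-length encode the "at or above 38" flag
runs   <- rle(df$Temperature >= 38)
ends   <- cumsum(runs$lengths)        # last row index of each run
starts <- ends - runs$lengths + 1     # first row index of each run

# Keep only the runs where the flag is TRUE
intervals <- data.frame(
  start = df$time_since_challenge[starts[runs$values]],
  end   = df$time_since_challenge[ends[runs$values]]
)
intervals$duration <- intervals$end - intervals$start

# Apply the "at least 12 hours" condition
ill <- intervals[intervals$duration >= 12, ]
print(ill)
```

The resulting ill data frame can be passed to geom_rect() and geom_segment() in the same way as df_intervals, for example with aes(xmin = start, xmax = end, ymin = 38, ymax = Inf) and inherit.aes = FALSE.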

Generating a histogram and density plot from binned data

I've binned some data and currently have a dataframe that consists of two columns, one that specifies a bin range and another that specifies the frequency like this:-
> head(data)
binRange Frequency
1 (0,0.025] 88
2 (0.025,0.05] 72
3 (0.05,0.075] 92
4 (0.075,0.1] 38
5 (0.1,0.125] 20
6 (0.125,0.15] 16
I want to plot a histogram and density plot using this but I can't seem to find a way of doing so without having to generate new bins etc. Using this solution here I tried to do the following:-
p <- ggplot(data, aes(x= binRange, y=Frequency)) + geom_histogram(stat="identity")
but it crashes. Anyone know of how to deal with this?
Thank you
The problem is that ggplot doesn't understand the data the way you input it; you need to reshape it like so (I am not a regex master, so surely there are better ways to do this):
df <- read.table(header = TRUE, text = "
binRange Frequency
1 (0,0.025] 88
2 (0.025,0.05] 72
3 (0.05,0.075] 92
4 (0.075,0.1] 38
5 (0.1,0.125] 20
6 (0.125,0.15] 16")
library(stringr)
library(splitstackshape)
library(ggplot2)
# extract the numbers out,
df$binRange <- str_extract(df$binRange, "[0-9].*[0-9]+")
# split the data at the comma into two columns:
# one for the start-point and one for the end-point
df <- cSplit(df, "binRange")
# plot it, you actually don't need the second column
# (width is set on the geom; geom_bar has no breaks argument)
ggplot(df, aes(x = binRange_1, y = Frequency)) +
geom_bar(stat = "identity", width = 0.025)
or if you don't want the data to be interpreted numerically, you can just simply do the following:
df <- read.table(header = TRUE, text = "
binRange Frequency
1 (0,0.025] 88
2 (0.025,0.05] 72
3 (0.05,0.075] 92
4 (0.075,0.1] 38
5 (0.1,0.125] 20
6 (0.125,0.15] 16")
library(ggplot2)
ggplot(df, aes(x = binRange, y = Frequency)) + geom_bar(stat = "identity")
You won't be able to draw a density plot from your data as it stands, given that it's not continuous but rather categorical; that's why I actually prefer the second way of showing it.
You can try
library(ggplot2)
ggplot(df, aes(x = binRange, y = Frequency)) + geom_col()
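If you do want numeric bins, another possible sketch is to parse the bin edges with base R and rescale the counts so the bar areas sum to 1, which gives a histogram on the density scale. The column names lower, upper, mid and density are names I introduced for illustration, and only the six rows shown by head(data) are used.

```r
library(ggplot2)

df <- data.frame(
  binRange = c("(0,0.025]", "(0.025,0.05]", "(0.05,0.075]",
               "(0.075,0.1]", "(0.1,0.125]", "(0.125,0.15]"),
  Frequency = c(88, 72, 92, 38, 20, 16)
)

# Strip the brackets, then split "lower,upper" on the comma
edges <- do.call(rbind, strsplit(gsub("\\(|\\]", "", df$binRange), ","))
df$lower <- as.numeric(edges[, 1])
df$upper <- as.numeric(edges[, 2])
df$mid   <- (df$lower + df$upper) / 2

# Density scale: relative frequency divided by bin width
df$density <- df$Frequency / (sum(df$Frequency) * (df$upper - df$lower))

# Bars centred on the bin midpoints, as wide as the bins
p <- ggplot(df, aes(x = mid, y = density)) + geom_col(width = 0.025)
p
```

This is still a histogram of the binned counts, not a kernel density estimate; the original observations would be needed for that.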

The difference between geom_density in ggplot2 and density in base R

I have a data in R like the following:
bag_id location_type event_ts
2 155 sorter 2012-01-02 17:06:05
3 305 arrival 2012-01-01 07:20:16
1 155 transfer 2012-01-02 15:57:54
4 692 arrival 2012-03-29 09:47:52
10 748 transfer 2012-01-08 17:26:02
11 748 sorter 2012-01-08 17:30:02
12 993 arrival 2012-01-23 08:58:54
13 1019 arrival 2012-01-09 07:17:02
14 1019 sorter 2012-01-09 07:33:15
15 1154 transfer 2012-01-12 21:07:50
where class(event_ts) is POSIXct.
I wanted to find the density of bags at each location in different times.
I used geom_density (ggplot2) and could plot it very nicely. I wonder if there is any difference between density (base) and this command: any difference in the methods they use, the default bandwidth they use, and the like.
I need to add the densities to my data frame. If I had used the function density(base), I knew how I can use the function approxfun to add these values to my data frame, but I wonder if it is the same when I use geom_density(ggplot2) .
A quick perusal of the ggplot2 documentation for geom_density() reveals that it wraps up the functionality in stat_density(). The documentation there notes that the adjust parameter comes from the base function density(). So, to your direct question: they are built off of the same function, though the exact parameters used may be different. You have some control over setting those parameters, but you may not have as much flexibility as you want.
One alternative to using geom_density() is to calculate the density that you want outside of ggplot() and then plot it with geom_line(). For example:
library(ggplot2)
#100 random variables
x <- data.frame(x = rnorm(100))
#Calculate own density, set parameters as you desire
d <- density(x$x)
x2 <- data.frame(x = d$x, y = d$y)
#Using geom_density()
ggplot(x, aes(x)) + geom_density()
#Using home grown density
ggplot(x2, aes(x,y)) + geom_line(colour = "red")
Here, they give nearly identical plots, though they may vary more significantly with your data and your settings.
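Since the question also asks about adding the densities back to the data frame, here is a minimal sketch of the approxfun() route mentioned there, using the home-grown density. The column name dens is made up for this example; for a POSIXct column such as event_ts you would presumably need to convert with as.numeric() first, which I have not tested on your data.

```r
set.seed(1)  # only so the example is reproducible

# 100 random values, as in the example above
x <- data.frame(x = rnorm(100))

# Kernel density estimate with the default settings of density()
d <- density(x$x)

# Turn the estimated curve into a function by linear interpolation
f <- approxfun(d$x, d$y)

# Evaluate the estimated density at each original observation
x$dens <- f(x$x)
head(x)
```

Because density() evaluates the curve on a grid that extends beyond the range of the data, every observation falls inside the interpolation range and gets a value.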
