I have a small table of summary data with the odds ratio and the lower and upper confidence limits for four categories (drugs), with six levels (intervals) within each category. I'd like to produce a chart using ggplot2 that looks similar to the one you usually get when you fit an lm and display its standard error band, but I'd like R to use the pre-specified values in my table. I've managed to create a line graph with error bars, but the bars overlap and make the plot unclear. The data look like this:
interval OR Drug lower upper
14 0.004 a 0.002 0.205
30 0.022 a 0.001 0.101
60 0.13 a 0.061 0.23
90 0.22 a 0.14 0.34
180 0.25 a 0.17 0.35
365 0.31 a 0.23 0.41
14 0.84 b 0.59 1.19
30 0.85 b 0.66 1.084
60 0.94 b 0.75 1.17
90 0.83 b 0.68 1.01
180 1.28 b 1.09 1.51
365 1.58 b 1.38 1.82
14 1.9 c 0.9 4.27
30 2.91 c 1.47 6.29
60 2.57 c 1.52 4.55
90 2.05 c 1.31 3.27
180 2.422 c 1.596 3.769
365 2.83 c 1.93 4.26
14 0.29 d 0.04 1.18
30 0.09 d 0.01 0.29
60 0.39 d 0.17 0.82
90 0.39 d 0.2 0.7
180 0.37 d 0.22 0.59
365 0.34 d 0.21 0.53
I have tried this:
limits <- aes(ymax=upper, ymin=lower)
dodge <- position_dodge(width=0.9)
ggplot(data, aes(y=OR, x=interval, colour=Drug)) +
geom_line(stat="identity") +
geom_errorbar(limits, position=dodge)
and searched for a suitable answer to create a pretty plot, but I'm flummoxed!
Any help greatly appreciated!
You need the following lines:
p <- ggplot(data = data, aes(x = interval, y = OR, colour = Drug)) + geom_point() + geom_line()
# Refer to columns by name inside aes(); data$lower bypasses ggplot's data handling
p <- p + geom_ribbon(aes(ymin = lower, ymax = upper), linetype = 2, alpha = 0.1)
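A self-contained sketch of the same idea, with the question's table typed in by hand as `dat`. Mapping `fill = Drug` so that each ribbon takes its drug's colour, and the log-scaled y-axis, are my own assumptions about what reads well for odds ratios; neither was asked for in the question.

```r
library(ggplot2)

# Summary table from the question, entered by hand
dat <- data.frame(
  interval = rep(c(14, 30, 60, 90, 180, 365), times = 4),
  Drug     = rep(c("a", "b", "c", "d"), each = 6),
  OR    = c(0.004, 0.022, 0.13, 0.22, 0.25, 0.31,
            0.84, 0.85, 0.94, 0.83, 1.28, 1.58,
            1.9, 2.91, 2.57, 2.05, 2.422, 2.83,
            0.29, 0.09, 0.39, 0.39, 0.37, 0.34),
  lower = c(0.002, 0.001, 0.061, 0.14, 0.17, 0.23,
            0.59, 0.66, 0.75, 0.68, 1.09, 1.38,
            0.9, 1.47, 1.52, 1.31, 1.596, 1.93,
            0.04, 0.01, 0.17, 0.2, 0.22, 0.21),
  upper = c(0.205, 0.101, 0.23, 0.34, 0.35, 0.41,
            1.19, 1.084, 1.17, 1.01, 1.51, 1.82,
            4.27, 6.29, 4.55, 3.27, 3.769, 4.26,
            1.18, 0.29, 0.82, 0.7, 0.59, 0.53)
)

p <- ggplot(dat, aes(x = interval, y = OR, colour = Drug, fill = Drug)) +
  # Shaded band between the pre-computed limits; no outline on the band
  geom_ribbon(aes(ymin = lower, ymax = upper), alpha = 0.15, colour = NA) +
  geom_line() +
  geom_point() +
  scale_y_log10()  # assumption: a log axis suits odds ratios
print(p)
```

If the bands still overlap too much, adding `facet_wrap(~ Drug)` gives each drug its own panel.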
Here is a base R approach using polygon(), since @jmb requested one in the comments. Note that polygon() draws the outer perimeter of the shape, so I have to supply the x-values twice (forwards, then reversed) together with the matching lower and upper y-values. I set type = 'n' in plot() and call points() separately so that the points are drawn on top of the polygon. My personal preference is the ggplot solution above when possible, since polygon() is pretty clunky.
library(tidyverse)
data('mtcars') #built in dataset
mean.mpg = mtcars %>%
group_by(cyl) %>%
summarise(N = n(),
avg.mpg = mean(mpg),
SE.low = avg.mpg - (sd(mpg)/sqrt(N)),
SE.high =avg.mpg + (sd(mpg)/sqrt(N)))
plot(avg.mpg ~ cyl, data = mean.mpg, ylim = c(10,30), type = 'n')
#note I have defined c(x1, x2) and c(y1, y2)
polygon(c(mean.mpg$cyl, rev(mean.mpg$cyl)),
c(mean.mpg$SE.low,rev(mean.mpg$SE.high)), density = 200, col ='grey90')
points(avg.mpg ~ cyl, data = mean.mpg, pch = 19, col = 'firebrick')
This is my first question on this platform, though I have thoroughly used it to solve many problems in R programming.
(1) I am stuck with SPI plots. The current SPI plot from SPEI package does not allow nice plots and I am not able to add the years along the x-axis. Kindly if anyone can help me to solve it.
(2) I have reworked the SPI data and created a data frame for different stations. However, when I use ggplot to make a similar plot as in (1), the chart is totally different. It appears that ggplot is not plotting the data continuously.
> head(s1)
year month rrP rrV rrPp rrL rrR rrM rrF rrBC rrA rrStM
1 1971 1 0.34 0.81 0.97 0.36 1.06 0.87 0.87 0.53 0.77 0.15
2 1971 2 0.80 1.96 1.07 0.64 1.59 1.29 0.85 0.66 1.76 0.96
3 1971 3 0.42 -0.43 -0.34 -0.46 -0.38 -0.01 0.04 -0.02 -0.46 -0.18
4 1971 4 0.65 0.93 1.69 1.83 0.82 1.54 1.02 0.94 0.64 0.68
5 1971 5 0.48 0.66 1.24 1.04 0.83 1.17 0.88 1.08 -0.45 -0.23
6 1971 6 0.19 -0.90 -0.75 -0.46 -1.25 -1.24 -0.46 -0.10 -0.50 -0.18
Plot I obtained using the code below
s1<-data.frame (s1)
s1 = as.data.table(s1)
ggplot(data = s1, aes(x = year, y = rrP)) +
geom_col(data = s1[Mau <= 0], fill = "red") +
geom_col(data = s1[Mau >= 0], fill = "blue") +
theme_bw()
I am looking to plot figures like this
Thanking you in advance for your replies.
Vimal
To get years on the x-axis, convert the data into a ts() object first, as in the following code:
library(SPEI)
data(wichita)
#calculate 6-month SPI
plot(spi(ts(wichita$PRCP,freq=12,start=c(1980,1)),scale = 6))
Or you can follow this question
How to format the x-axis of the hard coded plotting function of SPEI package in R?
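For part (2), one likely cause is that `x = year` gives all twelve monthly values of a year the same x position, so the bars pile up instead of forming a continuous series. Below is a minimal sketch that builds one date per month first. The six `rrP` values are copied from `head(s1)` in the question; the `date` and `sign` column names are made up for this example.

```r
library(ggplot2)

# First six rows of the question's data (station rrP only)
s1 <- data.frame(
  year  = rep(1971, 6),
  month = 1:6,
  rrP   = c(0.34, 0.80, 0.42, 0.65, 0.48, 0.19)
)

# One date per month, so every bar gets its own x position
s1$date <- as.Date(sprintf("%d-%02d-01", s1$year, s1$month))
# Hypothetical helper column used only to pick the fill colour
s1$sign <- ifelse(s1$rrP >= 0, "wet", "dry")

p <- ggplot(s1, aes(x = date, y = rrP, fill = sign)) +
  geom_col() +
  scale_fill_manual(values = c(wet = "blue", dry = "red")) +
  scale_x_date(date_labels = "%Y-%m") +
  theme_bw()
print(p)
```

With the full multi-decade record, something like `scale_x_date(date_breaks = "5 years", date_labels = "%Y")` should label the axis by year.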
I have the following code to plot a line graph:
df %>% pivot_longer(-Client) %>%
ggplot(aes(x=name,y=value,color=factor(Client),group=factor(Client)))+
geom_line()+
xlab('Client')+
theme_bw()+
labs(color='Client')
It plots a line for each of my clients, but since I have too many clients, plotting all of them in one graph gets pretty messy. I've been trying to use the facet_wrap() function to split the clients into separate graphs but couldn't figure out how to do it, so here I am...
There is a sample of my data:
Client Model_1 Model_2 Model_3 Model_4 Model_5
1 10.34 0.22 0.62 0.47 1.96
2 0.97 0.60 0.04 0.78 0.19
3 2.01 0.15 0.27 0.49 0.00
4 0.57 0.94 0.11 0.66 0.00
5 0.68 0.65 0.26 0.41 0.50
6 0.55 3.59 0.06 0.01 5.50
7 10.68 1.08 0.07 0.16 0.20
Try creating a grouping variable from the client number using the modulo operator, like this:
library(ggplot2)
library(dplyr)
library(tidyr)
#Code
df %>% pivot_longer(-Client) %>%
mutate(Group=ifelse(Client %% 2==0,'G1','G2')) %>%
ggplot(aes(x=name,y=value,color=factor(Client),group=factor(Client)))+
geom_line()+
xlab('Client')+
theme_bw()+
labs(color='Client')+
facet_wrap(.~Group,scales = 'free')
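The modulo trick gives two panels. With many clients, integer division into fixed-size blocks may scale better. This is a sketch under that assumption: `per_panel` is a hypothetical setting, and `df` is the sample data from the question typed in by hand.

```r
library(ggplot2)
library(dplyr)
library(tidyr)

# Sample data from the question
df <- data.frame(
  Client  = 1:7,
  Model_1 = c(10.34, 0.97, 2.01, 0.57, 0.68, 0.55, 10.68),
  Model_2 = c(0.22, 0.60, 0.15, 0.94, 0.65, 3.59, 1.08),
  Model_3 = c(0.62, 0.04, 0.27, 0.11, 0.26, 0.06, 0.07),
  Model_4 = c(0.47, 0.78, 0.49, 0.66, 0.41, 0.01, 0.16),
  Model_5 = c(1.96, 0.19, 0.00, 0.00, 0.50, 5.50, 0.20)
)

per_panel <- 3  # hypothetical: how many clients to draw in each panel

long <- df %>%
  pivot_longer(-Client) %>%
  # Clients 1-3 go to panel 1, 4-6 to panel 2, 7 to panel 3
  mutate(Group = paste("Panel", (Client - 1) %/% per_panel + 1))

p <- ggplot(long, aes(x = name, y = value,
                      color = factor(Client), group = factor(Client))) +
  geom_line() +
  xlab("Model") +
  theme_bw() +
  labs(color = "Client") +
  facet_wrap(~ Group, scales = "free_y")
print(p)
```

Setting `per_panel <- 1` is equivalent to `facet_wrap(~ Client)`, i.e. one panel per client.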
I have a weather dataset. I fitted a simple linear model to two of its columns, Temperature and Humidity, plotted the histogram of its residuals, and calculated their mean and standard deviation.
model <- lm(Temperature..C. ~ Humidity, data = inputData)
model.res = resid(model)
hist(model.res)
mean(model.res)
sd(model.res)
I need to plot a QQ-plot of the residuals against a zero-mean normal distribution with the estimated standard deviation. I used the Kolmogorov-Smirnov test to compare the sample with that reference distribution, but I don't know how to plot the two together:
ks<-ks.test(model.res, "pnorm", mean=0, sd=sd(model.res))
qqnorm(model.res, main="qqnorm")
qqline(model.res)
Data example:
Temperature..C. Humidity
1 9.472222 0.89
2 9.355556 0.86
3 9.377778 0.89
4 8.288889 0.83
5 8.755556 0.83
6 9.222222 0.85
7 7.733333 0.95
8 8.772222 0.89
9 10.822222 0.82
10 13.772222 0.72
11 16.016667 0.67
12 17.144444 0.54
13 17.800000 0.55
14 17.333333 0.51
15 18.877778 0.47
16 18.911111 0.46
17 15.388889 0.60
18 15.550000 0.63
19 14.255556 0.69
20 13.144444 0.70
Here is a solution using ggplot2
ggplot(model, aes(sample = rstandard(model))) +
geom_qq() +
stat_qq_line(dparams=list(sd=sd(model.res)), color="red") +
stat_qq_line()
The red line is the qqline computed with your sample sd; the black line uses the default sd of 1.
You did not ask for that, but you could also add a smoothed qqplot:
data_model <- model
data_model$theo <- unlist(qqnorm(data_model$residuals)[1])
ggplot(data_model, aes(sample = rstandard(data_model))) +
geom_qq() +
stat_qq_line(dparams=list(sd=sd(model.res)), color="red") +
geom_smooth(aes(x=data_model$theo, y=data_model$residuals), method = "loess")
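For comparison, here is a base R sketch that plots the residual quantiles directly against the quantiles of a normal distribution with mean 0 and the estimated sd, which is the reference distribution the question describes. It uses only the 20 example rows from the question; `theo` and `s` are names chosen for this example.

```r
# The 20 example rows from the question
inputData <- data.frame(
  Temperature..C. = c(9.472222, 9.355556, 9.377778, 8.288889, 8.755556,
                      9.222222, 7.733333, 8.772222, 10.822222, 13.772222,
                      16.016667, 17.144444, 17.800000, 17.333333, 18.877778,
                      18.911111, 15.388889, 15.550000, 14.255556, 13.144444),
  Humidity = c(0.89, 0.86, 0.89, 0.83, 0.83, 0.85, 0.95, 0.89, 0.82, 0.72,
               0.67, 0.54, 0.55, 0.51, 0.47, 0.46, 0.60, 0.63, 0.69, 0.70)
)

model <- lm(Temperature..C. ~ Humidity, data = inputData)
res   <- resid(model)
s     <- sd(res)

# Theoretical quantiles of N(0, s) at the usual plotting positions
theo <- qnorm(ppoints(length(res)), mean = 0, sd = s)

plot(theo, sort(res),
     xlab = "Quantiles of N(0, sd(residuals))",
     ylab = "Sample quantiles of residuals",
     main = "QQ plot of residuals")
abline(0, 1, col = "red")  # points on this line match the reference exactly
```

Because both axes are on the same scale here, the reference is simply the line y = x rather than a fitted qqline.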
I want to integrate a one-dimensional vector in R. How should I do that?
Let's say I have:
d=hist(p, breaks=100, plot=FALSE)$density
where p is a sample like:
p=rnorm(1e5)
How can I calculate an integral over d?
If we assume that the values in d are the y-values of a function sampled at evenly spaced x-values, then we can approximate the integral numerically, for example with the trapezium rule or Simpson's rule. We also need to supply the stepsize, i.e. the spacing between consecutive x-values, in order to approximate the area under the curve.
Discrete integration functions defined below:
p=rnorm(1e5)
d=hist(p,breaks=100,plot=FALSE)$density
discreteIntegrationTrapeziumRule <- function(v,lower=1,upper=length(v),stepsize=1)
{
if(upper > length(v))
upper=length(v)
if(lower < 1)
lower=1
integrand <- v[lower:upper]
l <- length(integrand)
stepsize*(0.5*integrand[1]+sum(integrand[2:(l-1)])+0.5*integrand[l]) # last term must index integrand, not v
}
discreteIntegrationSimpsonRule <- function(v,lower=1,upper=length(v),stepsize=1)
{
if(upper > length(v))
upper=length(v)
if(lower < 1)
lower=1
integrand <- v[lower:upper]
l <- length(integrand)
a = seq(from=2,to=l-1,by=2);
b = seq(from=3,to=l-1,by=2)
(stepsize/3)*(integrand[1]+4*sum(integrand[a])+2*sum(integrand[b])+integrand[l])
}
As an example, let's approximate the complete area under the curve while assuming discrete x steps of size 1 and then do the same for the second half of d while we assume x-steps of size 0.2.
> plot(1:length(d),d) # stepsize one on x-axis
> resultTrapeziumRule <- discreteIntegrationTrapeziumRule(d) # integrate over complete interval, assume x-stepsize = 1
> resultSimpsonRule <- discreteIntegrationSimpsonRule(d) # integrate over complete interval, assume x-stepsize = 1
> resultTrapeziumRule
[1] 9.9999
> resultSimpsonRule
[1] 10.00247
> plot(seq(from=-10,to=(-10+(length(d)*0.2)-0.2),by=0.2),d) # stepsize 0.2 on x-axis
> resultTrapeziumRule <- discreteIntegrationTrapeziumRule(d,ceiling(length(d)/2),length(d),0.2) # integrate over second part of vector, x-stepsize=0.2
> resultSimpsonRule <- discreteIntegrationSimpsonRule(d,ceiling(length(d)/2),length(d),0.2) # integrate over second part of vector, x-stepsize=0.2
> resultTrapeziumRule
[1] 1.15478
> resultSimpsonRule
[1] 1.11678
In general, the Simpson rule offers better approximations of the integral. The more y-values you have (and the smaller the x-axis stepsize), the better your approximations will become.
Small EDIT for clarity:
In this particular case the stepsize should obviously be 0.1. The complete area under the density curve is then (approximately) equal to 1, as expected.
> d=hist(p,breaks=100,plot=FALSE)$density
> hist(p,breaks=100,plot=FALSE)$mids # stepsize = 0.1
[1] -4.75 -4.65 -4.55 -4.45 -4.35 -4.25 -4.15 -4.05 -3.95 -3.85 -3.75 -3.65 -3.55 -3.45 -3.35 -3.25 -3.15 -3.05 -2.95 -2.85 -2.75 -2.65 -2.55
[24] -2.45 -2.35 -2.25 -2.15 -2.05 -1.95 -1.85 -1.75 -1.65 -1.55 -1.45 -1.35 -1.25 -1.15 -1.05 -0.95 -0.85 -0.75 -0.65 -0.55 -0.45 -0.35 -0.25
[47] -0.15 -0.05 0.05 0.15 0.25 0.35 0.45 0.55 0.65 0.75 0.85 0.95 1.05 1.15 1.25 1.35 1.45 1.55 1.65 1.75 1.85 1.95 2.05
[70] 2.15 2.25 2.35 2.45 2.55 2.65 2.75 2.85 2.95 3.05 3.15 3.25 3.35 3.45 3.55 3.65 3.75 3.85 3.95 4.05 4.15
> resultTrapeziumRule <- discreteIntegrationTrapeziumRule(d,stepsize=0.1)
> resultTrapeziumRule
[1] 0.999985
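Since hist() also returns the bin edges, the stepsize does not have to be guessed. The sketch below takes the widths from `breaks` and compares a rectangle sum with a trapezium rule over the bin midpoints. `trapz` is a small helper defined here, not a base R function, and the seed is arbitrary.

```r
set.seed(1)                      # arbitrary seed, only for reproducibility
p <- rnorm(1e5)
h <- hist(p, breaks = 100, plot = FALSE)

# Rectangle sum: density times bin width, exact for a histogram
area_rect <- sum(h$density * diff(h$breaks))

# Trapezium rule on the (mids, density) points; works for uneven spacing too
trapz <- function(x, y) sum(diff(x) * (head(y, -1) + tail(y, -1)) / 2)
area_trap <- trapz(h$mids, h$density)

area_rect
area_trap
```

The rectangle sum is 1 up to floating-point error by construction of `density`. The trapezium value comes out slightly below 1 because it only spans the first to the last midpoint and so drops half of each end bin.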
I have a data frame in the following form:
Data <- data.frame(X = sample(1:10), Y = sample(1:10))
I would like to color the dots obtained with
plot(Data$X,Data$Y)
using the values from another data frame:
X1 X2 X3 X4 X5
1 0.57 0.40 0.64 0.07 0.57
2 0.40 0.45 0.49 0.21 0.39
3 0.72 0.65 0.74 0.61 0.71
4 0.73 0.54 0.76 0.39 0.64
5 0.88 0.81 0.89 0.75 0.64
6 0.70 0.65 0.78 0.51 0.66
7 0.84 0.91 0.89 0.86 0.83
8 -0.07 0.39 -0.02 0.12 -0.01
9 0.82 0.83 0.84 0.81 0.79
10 0.82 0.55 0.84 0.51 0.59
The goal is five different graphs, each using one of the five columns of the second data frame to color the dots. I managed to do this by looking here (Colour points in a plot differently depending on a vector of values), but I can't figure out how to use the same color scale for all five plots.
The columns in the second data frame can have different minima and maxima, so if I generate the colors by applying the cut function to the first column, the resulting factor levels, and therefore the colors, are relative to that column only.
Hope this is clear,
Thanks.
You need your color ramp to include all values so you likely want to get them in the same vector. I would probably melt the data, then make the color ramp, then use the facet function in ggplot to get multiple plots. Alternately if you don't want to use ggplot you could cast the data back to multiple columns with 5 extra columns for your colors.
require(reshape2)
require(ggplot2)
Data.m <- melt(Data, id = "Y") # the id column name must be quoted
rbPal <- colorRampPalette(c('red','blue'))
Data.m$Col <- rbPal(10)[as.numeric(cut(Data.m$value,breaks = 10))]
ggplot(Data.m, aes(value, Y, col = Col)) +
geom_point() +
scale_colour_identity() + # use the colour strings in Col as they are
facet_grid(variable ~ .)
Your Data object has two variables, X and Y, but then you talk about making 5 graphs, so that part is a little unclear, but I think the melt function will help getting a comprehensive color ramp and the facet_grid function may make it easier to do 5 graphs at once if that is what you want.
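If you would rather stay in base graphics, here is a sketch of the shared-scale idea: compute one set of breaks over all five columns together, then cut every column with those same breaks. `vals` is the second data frame from the question typed in by hand; the names `brks` and `cols` and the seed are my own choices.

```r
set.seed(42)                                   # arbitrary, for reproducibility
Data <- data.frame(X = sample(1:10), Y = sample(1:10))

# Second data frame from the question
vals <- data.frame(
  X1 = c(0.57, 0.40, 0.72, 0.73, 0.88, 0.70, 0.84, -0.07, 0.82, 0.82),
  X2 = c(0.40, 0.45, 0.65, 0.54, 0.81, 0.65, 0.91, 0.39, 0.83, 0.55),
  X3 = c(0.64, 0.49, 0.74, 0.76, 0.89, 0.78, 0.89, -0.02, 0.84, 0.84),
  X4 = c(0.07, 0.21, 0.61, 0.39, 0.75, 0.51, 0.86, 0.12, 0.81, 0.51),
  X5 = c(0.57, 0.39, 0.71, 0.64, 0.64, 0.66, 0.83, -0.01, 0.79, 0.59)
)

# One set of breaks spanning every column, so a value maps to the same
# colour in all five plots
brks  <- seq(min(vals), max(vals), length.out = 11)
rbPal <- colorRampPalette(c("red", "blue"))(10)
cols  <- sapply(vals, function(v)
  rbPal[cut(v, breaks = brks, include.lowest = TRUE, labels = FALSE)])

op <- par(mfrow = c(2, 3))
for (j in seq_along(vals)) {
  plot(Data$X, Data$Y, pch = 19, col = cols[, j], main = names(vals)[j])
}
par(op)
```

Because the breaks are shared, the value 0.84 gets the same colour whether it appears in X1 or in X3.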