How to make SPI plots using ggplot2? - r

This is my first question on this platform, though I have thoroughly used it to solve many problems in R programming.
(1) I am stuck with SPI plots. The plot method in the SPEI package does not produce nice plots, and I am not able to add the years along the x-axis. I would appreciate any help with this.
(2) I have reworked the SPI data and created a data frame for different stations. However, when I use ggplot to make a plot similar to the one in (1), the chart is totally different. It appears that ggplot is not plotting the data continuously.
> head(s1)
year month rrP rrV rrPp rrL rrR rrM rrF rrBC rrA rrStM
1 1971 1 0.34 0.81 0.97 0.36 1.06 0.87 0.87 0.53 0.77 0.15
2 1971 2 0.80 1.96 1.07 0.64 1.59 1.29 0.85 0.66 1.76 0.96
3 1971 3 0.42 -0.43 -0.34 -0.46 -0.38 -0.01 0.04 -0.02 -0.46 -0.18
4 1971 4 0.65 0.93 1.69 1.83 0.82 1.54 1.02 0.94 0.64 0.68
5 1971 5 0.48 0.66 1.24 1.04 0.83 1.17 0.88 1.08 -0.45 -0.23
6 1971 6 0.19 -0.90 -0.75 -0.46 -1.25 -1.24 -0.46 -0.10 -0.50 -0.18
Plot I obtained using the code below
library(data.table)
library(ggplot2)
s1 <- as.data.table(s1)
ggplot(data = s1, aes(x = year, y = rrP)) +
geom_col(data = s1[rrP <= 0], fill = "red") +
geom_col(data = s1[rrP >= 0], fill = "blue") +
theme_bw()
I am looking to plot figures like this
Thanking you in advance for your replies.
Vimal

To get years on the x-axis, convert the data into a ts() object before calling spi(), as in the following code:
library(SPEI)
data(wichita)
#calculate 6-month SPI
plot(spi(ts(wichita$PRCP,freq=12,start=c(1980,1)),scale = 6))
Or you can follow this question
How to format the x-axis of the hard coded plotting function of SPEI package in R?
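For the ggplot2 part of the question, the discontinuous look most likely comes from mapping x to year alone: all twelve monthly values of a year then share one x position and geom_col stacks them. A minimal sketch of one way around this is to build a monthly date column and colour the bars by sign. The data frame below is a made-up stand-in for s1 (random values, one station), and the column names date and sign are introduced here only for illustration.

```r
library(ggplot2)

# Hypothetical stand-in for s1: 10 years of monthly SPI-like values
set.seed(42)
s1 <- data.frame(
  year  = rep(1971:1980, each = 12),
  month = rep(1:12, times = 10),
  rrP   = round(rnorm(120), 2)
)

# One date per month, so every bar gets its own x position
s1$date <- as.Date(paste(s1$year, s1$month, 1, sep = "-"))
# Label each value as wet (>= 0) or dry (< 0) for the fill colour
s1$sign <- ifelse(s1$rrP >= 0, "wet", "dry")

p <- ggplot(s1, aes(x = date, y = rrP, fill = sign)) +
  geom_col(width = 31) +                        # width in days: bars touch
  scale_fill_manual(values = c(dry = "red", wet = "blue"), guide = "none") +
  scale_x_date(date_breaks = "1 year", date_labels = "%Y") +
  labs(x = "Year", y = "SPI") +
  theme_bw()
print(p)
```

With the real data, the same two helper columns can be added to s1 directly and rrP swapped for any other station column.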

Related

Include facet_wrap in R line plot

I have the following code to plot a line graph:
df %>% pivot_longer(-Client) %>%
ggplot(aes(x=name,y=value,color=factor(Client),group=factor(Client)))+
geom_line()+
xlab('Client')+
theme_bw()+
labs(color='Client')
It plots a line for each of my clients, but since I have too many clients, plotting all of them in one graph gets pretty messy. I've been trying to use the facet_wrap() function to divide the clients into separate graphs but couldn't figure out how to do this, so here I am...
There is a sample of my data:
Client Model_1 Model_2 Model_3 Model_4 Model_5
1 10.34 0.22 0.62 0.47 1.96
2 0.97 0.60 0.04 0.78 0.19
3 2.01 0.15 0.27 0.49 0.00
4 0.57 0.94 0.11 0.66 0.00
5 0.68 0.65 0.26 0.41 0.50
6 0.55 3.59 0.06 0.01 5.50
7 10.68 1.08 0.07 0.16 0.20
Try creating a grouping variable from the client number using the modulo operator, like this:
library(ggplot2)
library(dplyr)
library(tidyr)
#Code
df %>% pivot_longer(-Client) %>%
mutate(Group=ifelse(Client %% 2==0,'G1','G2')) %>%
ggplot(aes(x=name,y=value,color=factor(Client),group=factor(Client)))+
geom_line()+
xlab('Client')+
theme_bw()+
labs(color='Client')+
facet_wrap(.~Group,scales = 'free')
Output:
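If the goal is one panel per client rather than two groups, a variation on the same idea is to facet on Client directly. This is only a sketch: it rebuilds the sample table from the question, and the free y-scale is an assumption about what reads best when clients differ a lot in magnitude.

```r
library(ggplot2)
library(tidyr)

# Sample data copied from the question
df <- data.frame(
  Client  = 1:7,
  Model_1 = c(10.34, 0.97, 2.01, 0.57, 0.68, 0.55, 10.68),
  Model_2 = c(0.22, 0.60, 0.15, 0.94, 0.65, 3.59, 1.08),
  Model_3 = c(0.62, 0.04, 0.27, 0.11, 0.26, 0.06, 0.07),
  Model_4 = c(0.47, 0.78, 0.49, 0.66, 0.41, 0.01, 0.16),
  Model_5 = c(1.96, 0.19, 0.00, 0.00, 0.50, 5.50, 0.20)
)

# Long format: one row per client and model
long <- pivot_longer(df, -Client)

p <- ggplot(long, aes(x = name, y = value, group = Client)) +
  geom_line() +
  facet_wrap(~ Client, scales = "free_y") +  # one panel per client
  xlab("Model") +
  theme_bw()
print(p)
```

With many clients, the ncol argument of facet_wrap() controls how the panels are laid out.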

R Function to get Confidence Interval of Difference Between Means

I am trying to find a function that allows me to easily get the confidence interval of the difference between two means.
I am pretty sure t.test has this functionality, but I haven't been able to make it work. Below is a screenshot of what I have tried so far:
Image
This is the dataset I am using
Indoor Outdoor
1 0.07 0.29
2 0.08 0.68
3 0.09 0.47
4 0.12 0.54
5 0.12 0.97
6 0.12 0.35
7 0.13 0.49
8 0.14 0.84
9 0.15 0.86
10 0.15 0.28
11 0.17 0.32
12 0.17 0.32
13 0.18 1.55
14 0.18 0.66
15 0.18 0.29
16 0.18 0.21
17 0.19 1.02
18 0.20 1.59
19 0.22 0.90
20 0.22 0.52
21 0.23 0.12
22 0.23 0.54
23 0.25 0.88
24 0.26 0.49
25 0.28 1.24
26 0.28 0.48
27 0.29 0.27
28 0.34 0.37
29 0.39 1.26
30 0.40 0.70
31 0.45 0.76
32 0.54 0.99
33 0.62 0.36
and I have been trying to use t.test function that has been installed from
install.packages("ggpubr")
I am pretty new to R, so sorry if there is a simple answer to this question. I have searched around quite a bit and haven't been able to find anything that I am looking for.
Note: The output I am looking for is Between -1.224 and 0.376
Edit:
The CI of difference between means I am looking for is if a random 34th datapoint was added to the chart by picking a random value in the Indoor column and a random value in the Outdoor column and duplicating it. Running the t.test will output the correct CI for the difference of means for the given sample size of 33.
How can I go about doing this pretending the sample size is 34?
There's probably something more convenient in the standard library, but it's pretty easy to calculate. Given your df variable, we can just do:
# calculate mean of difference
d_mu <- mean(df$Indoor) - mean(df$Outdoor)
# calculate SD of difference
d_sd <- sqrt(var(df$Indoor) + var(df$Outdoor))
# calculate 95% CI of this
d_mu + d_sd * qt(c(0.025, 0.975), nrow(df)*2)
giving me: -1.2246 0.3767
Mostly for @AkselA: I often find it helpful to check my work by sampling simpler distributions; in this case I'd do something like:
a <- mean(df$Indoor) + sd(df$Indoor) * rt(1000000, nrow(df)-1)
b <- mean(df$Outdoor) + sd(df$Outdoor) * rt(1000000, nrow(df)-1)
quantile(a - b, c(0.025, 0.975))
which gives me answers much closer to the CI I gave in the comment
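For comparison, the conventional confidence interval for a difference between two means is what t.test() returns in its conf.int element. The sketch below rebuilds the table from the question and shows both the default Welch version and a paired version; whether the rows are really matched pairs is an assumption, not something the question states. Because t.test() works with the standard error of the means rather than the standard deviations of the observations, its interval is considerably narrower than the one computed above.

```r
# Data copied from the question (33 rows)
df <- data.frame(
  Indoor = c(0.07, 0.08, 0.09, 0.12, 0.12, 0.12, 0.13, 0.14, 0.15, 0.15,
             0.17, 0.17, 0.18, 0.18, 0.18, 0.18, 0.19, 0.20, 0.22, 0.22,
             0.23, 0.23, 0.25, 0.26, 0.28, 0.28, 0.29, 0.34, 0.39, 0.40,
             0.45, 0.54, 0.62),
  Outdoor = c(0.29, 0.68, 0.47, 0.54, 0.97, 0.35, 0.49, 0.84, 0.86, 0.28,
              0.32, 0.32, 1.55, 0.66, 0.29, 0.21, 1.02, 1.59, 0.90, 0.52,
              0.12, 0.54, 0.88, 0.49, 1.24, 0.48, 0.27, 0.37, 1.26, 0.70,
              0.76, 0.99, 0.36)
)

# Welch two-sample t-test (the default): 95% CI for mean(Indoor) - mean(Outdoor)
res <- t.test(df$Indoor, df$Outdoor)
res$conf.int

# Paired version, only appropriate if each row is one matched measurement
res_paired <- t.test(df$Indoor, df$Outdoor, paired = TRUE)
res_paired$conf.int
```

The conf.level argument changes the coverage, for example conf.level = 0.99.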
Even though I always find the approach of manually calculating the results, as shown by @Sam Mason, the most insightful, there are some who want a shortcut. And sometimes, it's also ok to be lazy :)
So among the different ways to calculate CIs, this is imho the most comfortable:
DescTools::MeanDiffCI(Indoor, Outdoor)
Here's a reprex:
IV <- ggplot2::diamonds$price  # diamonds ships with ggplot2
DV <- rnorm(length(IV), mean = mean(IV), sd = sd(IV))
DescTools::MeanDiffCI(IV, DV)
gives
meandiff lwr.ci upr.ci
-18.94825 -66.51845 28.62195
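As far as I can tell from the documentation, the default method is the classic t-based interval. If you pick one of the bootstrap methods instead, it uses 999 bootstrap samples by default; if you want 1000 or more, you can set that with the argument R:
DescTools::MeanDiffCI(IV, DV, method = "perc", R = 1000)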

Logarithmic scale in x-axis

I have the following code:
S = [100 200 500 1000 10000];
H = [0.14 0.15 0.17 0.19 0.28;0.14 0.16 0.18 0.20 0.29;0.15 0.17 0.19 0.21 0.31;0.16 0.17 0.20 0.22 0.32;0.23 0.22 0.28 0.30 0.44;0.23 0.23 0.29 0.3 0.5;0.33 0.32 0.4 0.42 0.63;0.32 0.31 0.39 0.40 0.61;0.23 0.23 0.30 0.30 0.50];
for i = 1:9
hold on
plot(S, H(i,:));
legend('GHM01','GHM02','GHM03','GHM04','GHM05','GHM06','GHM07','GHM08','GHM09'); %legend not correctly
axis([100 10000 0.1 1])
end
set(gca,'xscale','log')
The x-axis looks like this:
Because the S-values are very far from each other, I used a logarithmic x-axis (and a linear y-axis).
I have 5 values on the axis (see S), and I only want those 5 values visible on the x-axis, with equidistant spacing between them. How do I do this? Or is there a better alternative for displaying my x-axis than a logarithmic scale?
If you want the x-axis ticks to be equally spaced even though the values are not (neither on a linear nor on a log scale), then you are basically treating this axis as categorical, and each value should get a temporary ordinal position (say 1:5) that determines the distance between them.
Here is a quick implementation of your comment above:
S = {'100' '200' '500' '1000' '10000'};
H = [0.14 0.15 0.17 0.19 0.28;...
0.14 0.16 0.18 0.20 0.29;
0.15 0.17 0.19 0.21 0.31;
0.16 0.17 0.20 0.22 0.32;
0.23 0.22 0.28 0.30 0.44;
0.23 0.23 0.29 0.3 0.5;
0.33 0.32 0.4 0.42 0.63;
0.32 0.31 0.39 0.40 0.61;
0.23 0.23 0.30 0.30 0.50];
f = figure;
plot(1:length(S),H);
f.Children.XTick = 1:length(S);
f.Children.XTickLabel = S;
IMHO this is the most straightforward way to solve this problem ;)

How to add shaded confidence intervals to line plot with specified values

I have a small table of summary data with the odds ratio and the upper and lower confidence limits for four categories, with six levels within each category. I'd like to produce a chart using ggplot2 that looks similar to the usual one created when you specify an lm and its se, but I'd like R just to use the pre-specified values I have in my table. I've managed to create the line graph with error bars, but these overlap and make it unclear. The data look like this:
interval OR Drug lower upper
14 0.004 a 0.002 0.205
30 0.022 a 0.001 0.101
60 0.13 a 0.061 0.23
90 0.22 a 0.14 0.34
180 0.25 a 0.17 0.35
365 0.31 a 0.23 0.41
14 0.84 b 0.59 1.19
30 0.85 b 0.66 1.084
60 0.94 b 0.75 1.17
90 0.83 b 0.68 1.01
180 1.28 b 1.09 1.51
365 1.58 b 1.38 1.82
14 1.9 c 0.9 4.27
30 2.91 c 1.47 6.29
60 2.57 c 1.52 4.55
90 2.05 c 1.31 3.27
180 2.422 c 1.596 3.769
365 2.83 c 1.93 4.26
14 0.29 d 0.04 1.18
30 0.09 d 0.01 0.29
60 0.39 d 0.17 0.82
90 0.39 d 0.2 0.7
180 0.37 d 0.22 0.59
365 0.34 d 0.21 0.53
I have tried this:
limits <- aes(ymax=upper, ymin=lower)
dodge <- position_dodge(width=0.9)
ggplot(data, aes(y=OR, x=days, colour=Drug)) +
geom_line(stat="identity") +
geom_errorbar(limits, position=dodge)
and searched for a suitable answer to create a pretty plot, but I'm flummoxed!
Any help greatly appreciated!
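You need the following lines:
# points and a line per drug, using the table columns directly
p <- ggplot(data = data, aes(x = interval, y = OR, colour = Drug)) + geom_point() + geom_line()
# shaded band from the pre-computed limits; inside aes() refer to columns by name, not as data$lower
p <- p + geom_ribbon(aes(ymin = lower, ymax = upper), linetype = 2, alpha = 0.1)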
Here is a base R approach using polygon() since @jmb requested a solution in the comments. Note that I have to define two sets of x-values and associated y-values for the polygon to plot. It works by plotting the outer perimeter of the polygon. I define plot type = 'n' and use points() separately to get the points on top of the polygon. My personal preference is the ggplot solutions above when possible since polygon() is pretty clunky.
library(tidyverse)
data('mtcars') #built in dataset
mean.mpg = mtcars %>%
group_by(cyl) %>%
summarise(N = n(),
avg.mpg = mean(mpg),
SE.low = avg.mpg - (sd(mpg)/sqrt(N)),
SE.high =avg.mpg + (sd(mpg)/sqrt(N)))
plot(avg.mpg ~ cyl, data = mean.mpg, ylim = c(10,30), type = 'n')
#note I have defined c(x1, x2) and c(y1, y2)
polygon(c(mean.mpg$cyl, rev(mean.mpg$cyl)),
c(mean.mpg$SE.low,rev(mean.mpg$SE.high)), density = 200, col ='grey90')
points(avg.mpg ~ cyl, data = mean.mpg, pch = 19, col = 'firebrick')

How to color point in R with the same scale

I have a data frame in the following form:
Data <- data.frame(X = sample(1:10), Y = sample(1:10))
I would like to color the dots obtained with
plot(Data$X,Data$Y)
using the values from another data frame:
X1 X2 X3 X4 X5
1 0.57 0.40 0.64 0.07 0.57
2 0.40 0.45 0.49 0.21 0.39
3 0.72 0.65 0.74 0.61 0.71
4 0.73 0.54 0.76 0.39 0.64
5 0.88 0.81 0.89 0.75 0.64
6 0.70 0.65 0.78 0.51 0.66
7 0.84 0.91 0.89 0.86 0.83
8 -0.07 0.39 -0.02 0.12 -0.01
9 0.82 0.83 0.84 0.81 0.79
10 0.82 0.55 0.84 0.51 0.59
So to have five different graphs using the five columns from the second data frame to color the dots. I manage to do this by looking here (Colour points in a plot differently depending on a vector of values), but I'm not able to figure out how to set the same color scale for all the five different plots.
The columns in the second data frame could have different minima and maxima, so if I generate the colors using the cut function on the first column, this will generate factors, and later colors, that are relative to that column only.
Hope this is clear,
Thanks.
You need your color ramp to include all values so you likely want to get them in the same vector. I would probably melt the data, then make the color ramp, then use the facet function in ggplot to get multiple plots. Alternately if you don't want to use ggplot you could cast the data back to multiple columns with 5 extra columns for your colors.
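require(reshape2)
require(ggplot2)
# id.vars takes the column name as a string
Data.m <- melt(Data, id.vars = "Y")
rbPal <- colorRampPalette(c('red','blue'))
# one set of 10 bins over all melted values, so every facet shares the same scale
Data.m$Col <- rbPal(10)[as.numeric(cut(Data.m$value,breaks = 10))]
ggplot(Data.m, aes(value, Y,col=Col)) +
geom_point() +
scale_colour_identity() + # use the colour strings in Col as-is
facet_grid(variable~.)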
Your Data object has two variables, X and Y, but then you talk about making 5 graphs, so that part is a little unclear, but I think the melt function will help getting a comprehensive color ramp and the facet_grid function may make it easier to do 5 graphs at once if that is what you want.
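If you would rather stay in base R, the same idea can be sketched by computing the colour breaks once from the range of all five columns and reusing them for every plot. The second data frame below (called Vals here, a hypothetical name) is filled with random numbers as a stand-in for the real values; only the approach is meant to carry over.

```r
set.seed(1)
Data <- data.frame(X = sample(1:10), Y = sample(1:10))

# Hypothetical stand-in for the second data frame (10 rows, 5 columns)
Vals <- as.data.frame(matrix(runif(50, -0.1, 0.95), nrow = 10,
                             dimnames = list(NULL, paste0("X", 1:5))))

n_col <- 10
rbPal <- colorRampPalette(c("red", "blue"))

# Breaks come from the range of ALL columns, so bins mean the same in each plot
rng    <- range(unlist(Vals))
breaks <- seq(rng[1], rng[2], length.out = n_col + 1)

op <- par(mfrow = c(2, 3))
for (v in names(Vals)) {
  # Bin index (1..n_col) of each value on the shared scale
  bin <- cut(Vals[[v]], breaks = breaks, include.lowest = TRUE, labels = FALSE)
  plot(Data$X, Data$Y, col = rbPal(n_col)[bin], pch = 19, main = v)
}
par(op)
```

Because the breaks are shared, a given colour corresponds to the same value interval in all five panels, whatever the minimum and maximum of an individual column.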

Resources