Include facet_wrap in R line plot - r

I have the following code to plot a line graph:
df %>% pivot_longer(-Client) %>%
ggplot(aes(x=name,y=value,color=factor(Client),group=factor(Client)))+
geom_line()+
xlab('Client')+
theme_bw()+
labs(color='Client')
It plots a line for each of my clients, but since I have too many clients, plotting all of them in one graph gets pretty messy. I've been trying to use the facet_wrap() function to split the clients into separate graphs but couldn't figure out how to do this, so here I am...
Here is a sample of my data:
Client Model_1 Model_2 Model_3 Model_4 Model_5
1 10.34 0.22 0.62 0.47 1.96
2 0.97 0.60 0.04 0.78 0.19
3 2.01 0.15 0.27 0.49 0.00
4 0.57 0.94 0.11 0.66 0.00
5 0.68 0.65 0.26 0.41 0.50
6 0.55 3.59 0.06 0.01 5.50
7 10.68 1.08 0.07 0.16 0.20

Try creating a grouping variable from the client number using the modulo operator, like this:
library(ggplot2)
library(dplyr)
library(tidyr)
#Code
df %>% pivot_longer(-Client) %>%
mutate(Group=ifelse(Client %% 2==0,'G1','G2')) %>%
ggplot(aes(x=name,y=value,color=factor(Client),group=factor(Client)))+
geom_line()+
xlab('Client')+
theme_bw()+
labs(color='Client')+
facet_wrap(.~Group,scales = 'free')
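If one panel per client is what you are after, a minimal sketch could facet directly on Client instead of on a derived group. The data frame below is rebuilt from the sample in the question so the example is self-contained; the free y scale and the 'Model' axis label are my own assumptions, not something the question asked for:

```r
library(ggplot2)
library(dplyr)
library(tidyr)

# Sample data copied from the question
df <- data.frame(
  Client  = 1:7,
  Model_1 = c(10.34, 0.97, 2.01, 0.57, 0.68, 0.55, 10.68),
  Model_2 = c(0.22, 0.60, 0.15, 0.94, 0.65, 3.59, 1.08),
  Model_3 = c(0.62, 0.04, 0.27, 0.11, 0.26, 0.06, 0.07),
  Model_4 = c(0.47, 0.78, 0.49, 0.66, 0.41, 0.01, 0.16),
  Model_5 = c(1.96, 0.19, 0.00, 0.00, 0.50, 5.50, 0.20)
)

# One row per Client/model pair
long <- df %>% pivot_longer(-Client)

# One panel per client; each panel gets its own y range
p <- ggplot(long, aes(x = name, y = value, group = Client)) +
  geom_line() +
  facet_wrap(~ Client, scales = "free_y") +
  xlab("Model") +
  theme_bw()
print(p)
```

With many clients you may want to set ncol in facet_wrap() or combine this with the grouping idea above so that each panel holds a handful of clients.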

How to make SPI plots using ggplot2?

This is my first question on this platform, though I have thoroughly used it to solve many problems in R programming.
(1) I am stuck with SPI plots. The default SPI plot from the SPEI package does not give nice plots, and I am not able to add the years along the x-axis. I would appreciate any help with this.
(2) I have reworked the SPI data and created a data frame for different stations. However, when I use ggplot to make a similar plot as in (1), the chart is totally different. It appears that ggplot is not plotting the data continuously.
> head(s1)
year month rrP rrV rrPp rrL rrR rrM rrF rrBC rrA rrStM
1 1971 1 0.34 0.81 0.97 0.36 1.06 0.87 0.87 0.53 0.77 0.15
2 1971 2 0.80 1.96 1.07 0.64 1.59 1.29 0.85 0.66 1.76 0.96
3 1971 3 0.42 -0.43 -0.34 -0.46 -0.38 -0.01 0.04 -0.02 -0.46 -0.18
4 1971 4 0.65 0.93 1.69 1.83 0.82 1.54 1.02 0.94 0.64 0.68
5 1971 5 0.48 0.66 1.24 1.04 0.83 1.17 0.88 1.08 -0.45 -0.23
6 1971 6 0.19 -0.90 -0.75 -0.46 -1.25 -1.24 -0.46 -0.10 -0.50 -0.18
Plot I obtained using the code below
s1<-data.frame (s1)
s1 = as.data.table(s1)
ggplot(data = s1, aes(x = year, y = rrP)) +
geom_col(data = s1[Mau <= 0], fill = "red") +
geom_col(data = s1[Mau >= 0], fill = "blue") +
theme_bw()
I am looking to plot figures like this
Thanking you in advance for your replies.
Vimal
To have years on the x-axis, you have to convert the data into a ts() object, as in the following code:
library(SPEI)
data(wichita)
#calculate 6-month SPI
plot(spi(ts(wichita$PRCP,freq=12,start=c(1980,1)),scale = 6))
Or you can follow this question
How to format the x-axis of the hard coded plotting function of SPEI package in R?
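For part (2), one likely reason the ggplot2 chart looks discontinuous is that x = year puts all twelve months of a year at the same x position. A minimal sketch, assuming the year and month columns shown in head(s1), is to build a monthly date first and map the fill colour to the sign of the index. The date and sign columns are names I made up for illustration, and I use the first six rrV values from the question because they contain both signs:

```r
library(ggplot2)

# First six rows of the question's data (year, month, rrV)
s1 <- data.frame(
  year  = rep(1971, 6),
  month = 1:6,
  rrV   = c(0.81, 1.96, -0.43, 0.93, 0.66, -0.90)
)

# One x position per month instead of one per year
s1$date <- as.Date(sprintf("%d-%02d-01", s1$year, s1$month))

# Hypothetical helper column: label each value by its sign
s1$sign <- ifelse(s1$rrV >= 0, "wet", "dry")

p <- ggplot(s1, aes(x = date, y = rrV, fill = sign)) +
  geom_col() +
  scale_fill_manual(values = c(wet = "blue", dry = "red")) +
  scale_x_date(date_labels = "%Y") +
  theme_bw()
print(p)
```

On the full series you would probably also want date_breaks (for example "5 years") in scale_x_date() so that the year labels are evenly spaced.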

Logarithmic scale in x-axis

I have the following code:
S = [100 200 500 1000 10000];
H = [0.14 0.15 0.17 0.19 0.28;0.14 0.16 0.18 0.20 0.29;0.15 0.17 0.19 0.21 0.31;0.16 0.17 0.20 0.22 0.32;0.23 0.22 0.28 0.30 0.44;0.23 0.23 0.29 0.3 0.5;0.33 0.32 0.4 0.42 0.63;0.32 0.31 0.39 0.40 0.61;0.23 0.23 0.30 0.30 0.50];
for i = 1:9
hold on
plot(S, H(i,:));
legend('GHM01','GHM02','GHM03','GHM04','GHM05','GHM06','GHM07','GHM08','GHM09'); %legend not correctly
axis([100 10000 0.1 1])
end
set(gca,'xscale','log')
The x-axis looks like this:
Because the S-values are very far from each other, I used a logarithmic x-axis (and a linear y-axis).
I have 5 values on the axis (see S), and I want only those 5 values visible on the x-axis, with equidistant spacing between them. How do I do this? Or is there a better alternative for displaying my x-axis than a logarithmic scale?
If you want the x-axis ticks to be equally spaced although the values are not (neither on a linear nor on a log scale), then you are basically treating this axis as categorical, and it should get a temporary ordinal value (say 1:5) to determine the distance between the ticks.
Here is a quick implementation of your comment above:
S = {'100' '200' '500' '1000' '10000'};
H = [0.14 0.15 0.17 0.19 0.28;...
0.14 0.16 0.18 0.20 0.29;
0.15 0.17 0.19 0.21 0.31;
0.16 0.17 0.20 0.22 0.32;
0.23 0.22 0.28 0.30 0.44;
0.23 0.23 0.29 0.3 0.5;
0.33 0.32 0.4 0.42 0.63;
0.32 0.31 0.39 0.40 0.61;
0.23 0.23 0.30 0.30 0.50];
f = figure;
plot(1:length(S),H);
f.Children.XTick = 1:length(S);
f.Children.XTickLabel = S;
IMHO this is the most straightforward way to solve this problem ;)

How to add shaded confidence intervals to line plot with specified values

I have a small table of summary data with the odds ratio and the upper and lower confidence limits for four categories, with six levels within each category. I'd like to produce a chart using ggplot2 that looks similar to the usual one created when you specify an lm and its se, but I'd like R to just use the pre-specified values I have in my table. I've managed to create the line graph with error bars, but these overlap and make it unclear. The data look like this:
interval OR Drug lower upper
14 0.004 a 0.002 0.205
30 0.022 a 0.001 0.101
60 0.13 a 0.061 0.23
90 0.22 a 0.14 0.34
180 0.25 a 0.17 0.35
365 0.31 a 0.23 0.41
14 0.84 b 0.59 1.19
30 0.85 b 0.66 1.084
60 0.94 b 0.75 1.17
90 0.83 b 0.68 1.01
180 1.28 b 1.09 1.51
365 1.58 b 1.38 1.82
14 1.9 c 0.9 4.27
30 2.91 c 1.47 6.29
60 2.57 c 1.52 4.55
90 2.05 c 1.31 3.27
180 2.422 c 1.596 3.769
365 2.83 c 1.93 4.26
14 0.29 d 0.04 1.18
30 0.09 d 0.01 0.29
60 0.39 d 0.17 0.82
90 0.39 d 0.2 0.7
180 0.37 d 0.22 0.59
365 0.34 d 0.21 0.53
I have tried this:
limits <- aes(ymax=upper, ymin=lower)
dodge <- position_dodge(width=0.9)
ggplot(data, aes(y=OR, x=days, colour=Drug)) +
geom_line(stat="identity") +
geom_errorbar(limits, position=dodge)
and searched for a suitable answer to create a pretty plot, but I'm flummoxed!
Any help greatly appreciated!
You need the following lines:
p<-ggplot(data=data, aes(x=interval, y=OR, colour=Drug)) + geom_point() + geom_line()
p<-p+geom_ribbon(aes(ymin=lower, ymax=upper), linetype=2, alpha=0.1)
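As a slightly fuller sketch of the same idea, the ribbon can be filled per drug and drawn without an outline so that the bands stay readable where they overlap. This assumes the table from the question; only drugs a and b are reproduced here to keep the example short, and the alpha value is an arbitrary choice:

```r
library(ggplot2)

# Rows for drugs a and b, copied from the question
data <- data.frame(
  interval = rep(c(14, 30, 60, 90, 180, 365), 2),
  OR    = c(0.004, 0.022, 0.13, 0.22, 0.25, 0.31,
            0.84, 0.85, 0.94, 0.83, 1.28, 1.58),
  Drug  = rep(c("a", "b"), each = 6),
  lower = c(0.002, 0.001, 0.061, 0.14, 0.17, 0.23,
            0.59, 0.66, 0.75, 0.68, 1.09, 1.38),
  upper = c(0.205, 0.101, 0.23, 0.34, 0.35, 0.41,
            1.19, 1.084, 1.17, 1.01, 1.51, 1.82)
)

p <- ggplot(data, aes(x = interval, y = OR, colour = Drug)) +
  # shaded band from the pre-computed limits, no border line
  geom_ribbon(aes(ymin = lower, ymax = upper, fill = Drug),
              alpha = 0.2, colour = NA) +
  geom_line() +
  geom_point()
print(p)
```

Because the ribbon layer is added first, the lines and points are drawn on top of the shading.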
Here is a base R approach using polygon(), since @jmb requested a solution in the comments. Note that I have to define two sets of x values and associated y values for the polygon to plot. It works by plotting the outer perimeter of the polygon. I define plot type = 'n' and use points() separately to get the points on top of the polygon. My personal preference is the ggplot solution above when possible, since polygon() is pretty clunky.
library(tidyverse)
data('mtcars') #built in dataset
mean.mpg = mtcars %>%
group_by(cyl) %>%
summarise(N = n(),
avg.mpg = mean(mpg),
SE.low = avg.mpg - (sd(mpg)/sqrt(N)),
SE.high =avg.mpg + (sd(mpg)/sqrt(N)))
plot(avg.mpg ~ cyl, data = mean.mpg, ylim = c(10,30), type = 'n')
#note I have defined c(x1, x2) and c(y1, y2)
polygon(c(mean.mpg$cyl, rev(mean.mpg$cyl)),
c(mean.mpg$SE.low,rev(mean.mpg$SE.high)), density = 200, col ='grey90')
points(avg.mpg ~ cyl, data = mean.mpg, pch = 19, col = 'firebrick')

How to color point in R with the same scale

I have a data frame in the following form:
Data <- data.frame(X = sample(1:10), Y = sample(1:10))
I would like to color the dots obtained with
plot(Data$X,Data$Y)
using the values from another data frame:
X1 X2 X3 X4 X5
1 0.57 0.40 0.64 0.07 0.57
2 0.40 0.45 0.49 0.21 0.39
3 0.72 0.65 0.74 0.61 0.71
4 0.73 0.54 0.76 0.39 0.64
5 0.88 0.81 0.89 0.75 0.64
6 0.70 0.65 0.78 0.51 0.66
7 0.84 0.91 0.89 0.86 0.83
8 -0.07 0.39 -0.02 0.12 -0.01
9 0.82 0.83 0.84 0.81 0.79
10 0.82 0.55 0.84 0.51 0.59
So to have five different graphs using the five columns from the second data frame to color the dots. I manage to do this by looking here (Colour points in a plot differently depending on a vector of values), but I'm not able to figure out how to set the same color scale for all the five different plots.
The columns in the second data frame can have different minimums and maximums, so if I generate the colors using the cut function on the first column, this will generate factors, and later colors, that are relative to that column only.
Hope this is clear,
Thanks.
You need your color ramp to include all values so you likely want to get them in the same vector. I would probably melt the data, then make the color ramp, then use the facet function in ggplot to get multiple plots. Alternately if you don't want to use ggplot you could cast the data back to multiple columns with 5 extra columns for your colors.
require(reshape2)
require(ggplot2)
Data.m <- melt(Data, id="Y") # the id column name must be quoted
rbPal <- colorRampPalette(c('red','blue'))
Data.m$Col <- rbPal(10)[as.numeric(cut(Data.m$value,breaks = 10))]
ggplot(Data.m, aes(value, Y,col=Col)) +
geom_point() +
scale_color_identity() + # use the hex codes in Col as the actual colours
facet_grid(variable~.)
Your Data object has two variables, X and Y, but then you talk about making 5 graphs, so that part is a little unclear, but I think the melt function will help getting a comprehensive color ramp and the facet_grid function may make it easier to do 5 graphs at once if that is what you want.
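If you would rather stay in base R, here is a minimal sketch of the shared-scale idea: compute the cut() breaks once from the range of all columns together, then reuse them for every column. The names vals, n_col and breaks are hypothetical, and only the first two columns of the second data frame are reproduced:

```r
set.seed(1)
Data <- data.frame(X = sample(1:10), Y = sample(1:10))

# First two columns of the second data frame from the question
vals <- data.frame(
  X1 = c(0.57, 0.40, 0.72, 0.73, 0.88, 0.70, 0.84, -0.07, 0.82, 0.82),
  X2 = c(0.40, 0.45, 0.65, 0.54, 0.81, 0.65, 0.91, 0.39, 0.83, 0.55)
)

rbPal <- colorRampPalette(c("red", "blue"))
n_col <- 10

# Breaks span the range of ALL columns, so every plot shares one scale
rng <- range(unlist(vals))
breaks <- seq(rng[1], rng[2], length.out = n_col + 1)

# Map each column to colours using the common breaks
cols <- lapply(vals, function(v) {
  rbPal(n_col)[as.integer(cut(v, breaks = breaks, include.lowest = TRUE))]
})

# One plot per column, side by side
op <- par(mfrow = c(1, ncol(vals)))
for (nm in names(vals)) {
  plot(Data$X, Data$Y, col = cols[[nm]], pch = 19, main = nm)
}
par(op)
```

With common breaks, the same value gets the same colour no matter which column it comes from, which is the property the per-column cut() call was missing.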

Multiple boxplots with predefined statistics using lattice-like graphs in r

I have a dataset which looks like this
VegType 87MIN 87MAX 87Q25 87Q50 87Q75 96MIN 96MAX 96Q25 96Q50 96Q75 00MIN 00MAX 00Q25 00Q50 00Q75
1 0.02 0.32 0.11 0.12 0.13 0.02 0.26 0.08 0.09 0.10 0.02 0.28 0.10 0.11 0.12
2 0.02 0.45 0.12 0.13 0.13 0.02 0.20 0.09 0.10 0.11 0.02 0.26 0.11 0.12 0.12
3 0.02 0.29 0.13 0.14 0.14 0.02 0.27 0.11 0.11 0.12 0.02 0.26 0.12 0.13 0.13
4 0.02 0.41 0.13 0.13 0.14 0.02 0.58 0.10 0.11 0.12 0.02 0.34 0.12 0.13 0.13
5 0.02 0.42 0.12 0.13 0.14 0.02 0.46 0.10 0.11 0.11 0.02 0.28 0.12 0.12 0.13
6 0.02 0.32 0.13 0.14 0.14 0.02 0.52 0.12 0.12 0.13 0.02 0.29 0.13 0.14 0.14
7 0.02 0.55 0.12 0.13 0.14 0.02 0.24 0.10 0.11 0.11 0.02 0.37 0.12 0.12 0.13
8 0.02 0.55 0.12 0.13 0.14 0.02 0.19 0.10 0.11 0.12 0.02 0.22 0.11 0.12 0.13
In reality I have 26 variables and 5 years (87, 96 and 00 in the column names are years). In an ideal world I would like to have a lattice-like graph with 26 plots, one per variable, with each plot containing 5 boxes, i.e. one per year. I understand that it is not possible to do this in lattice because lattice won't accept predefined statistics. Is there a fairly painless way to do this in R with predefined stats? I have used bxp for simple boxplots plotting all the variables for one year in a single plot, e.g.
Yr01 = read.csv('dat.csv',header=T)
dat01=t(Yr01[,c("01Min","01Q25","01Mean","01Q75","01Max")])
bxp(list(stats=dat01, n=rep(26, ncol(dat01))),ylim=c(0.07,0.2))
but I don't know how to go from there to what I need.
Thanks.
This can be done, at least using ggplot2, but you'll have to reshape your data quite a bit. You also need data where the quantiles actually make sense. Your quantile values are inconsistent: for example, Var1 has 01Max = 0.26 and 01Q75 = .67!
First, I'll recreate some valid data:
n <- c("01Min", "01Max", "01Med", "01Q25", "01Q75", "02Min",
"02Max", "02Med", "02Q25", "02Q75")
v1 <- c(0.03, 0.76, 0.41, 0.13, 0.67, 0.10, 0.43, 0.27, 0.2, 0.33)
v2 <- c(0.03, 0.28, 0.14, 0.08, 0.20, 0.02, 0.77, 0.13, 0.06, 0.44)
df <- data.frame(v1=v1, v2=v2)
df <- as.data.frame(t(df))
names(df) <- n
df <- cbind(var=c("v1","v2"), df)
> df
# var 01Min 01Max 01Med 01Q25 01Q75 02Min 02Max 02Med 02Q25 02Q75
# v1 v1 0.03 0.76 0.41 0.13 0.67 0.10 0.43 0.27 0.20 0.33
# v2 v2 0.03 0.28 0.14 0.08 0.20 0.02 0.77 0.13 0.06 0.44
Next, we'll reshape the data:
require(reshape2)
df.m <- melt(df, id="var")
# match the run of digits at the start of the string and capture it in the
# first group: () captures a pattern. Then replace the whole string with
# the first captured group, referred to as "\\1"
df.m$year <- gsub("^([0-9]+)(.*$)", "\\1", df.m$variable)
# the same, but this time keep the pattern captured by the second pair of
# parentheses, referred to as "\\2"
df.m$quan <- gsub("^([0-9]+)(.*)$", "\\2", df.m$variable)
df.f <- dcast(df.m, var+year ~ quan, value.var="value")
To get to this format:
> df.f
# var year Max Med Min Q25 Q75
# 1 v1 01 0.76 0.41 0.03 0.13 0.67
# 2 v1 02 0.43 0.27 0.10 0.20 0.33
# 3 v2 01 0.28 0.14 0.03 0.08 0.20
# 4 v2 02 0.77 0.13 0.02 0.06 0.44
Now, we can plot by directly providing the quantile values to corresponding parameters using the corresponding column names as follows:
require(ggplot2)
require(scales)
p <- ggplot(df.f, aes(x=var, ymin=`Min`, lower=`Q25`, middle=`Med`,
upper=`Q75`, ymax=`Max`))
p <- p + geom_boxplot(aes(fill=year), stat="identity")
p
# if you want facetting:
p + facet_wrap( ~ var, scales="free")
You can now plot all years for each var in a separate plot by subsetting inside lapply, as follows:
# unique() also works when var is a character column rather than a factor
lapply(unique(as.character(df.f$var)), function(x) {
p <- ggplot(df.f[df.f$var == x, ],
aes(x=var, ymin=`Min`, lower=`Q25`,
middle=`Med`, upper=`Q75`, ymax=`Max`))
p <- p + geom_boxplot(aes(fill=year), stat="identity")
# pass the plot explicitly instead of relying on last_plot()
ggsave(paste0(x, ".pdf"), plot = p)
})
Edit: Your data is different from the earlier data you provided in some aspects. So, here's the version of the code for your new data:
# change var to VegType everywhere
require(reshape2)
df.m <- melt(df, id="VegType")
df.m$year <- gsub("^X([0-9]+)(.*$)", "\\1", df.m$variable) # pattern has an X
df.m$quan <- gsub("^X([0-9]+)(.*)$", "\\2", df.m$variable) # pattern has an X
df.f <- dcast(df.m, VegType+year ~ quan, value.var="value")
df.f$VegType <- factor(df.f$VegType) # convert integer to factor
require(ggplot2)
require(scales)
p <- ggplot(df.f, aes(x=VegType, ymin=`MIN`, lower=`Q25`, middle=`Q50`,
upper=`Q75`, ymax=`MAX`))
p <- p + geom_boxplot(aes(fill=year), stat="identity")
p
You can facet/write as separate plots using same code as before.
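As an alternative to the melt/gsub/dcast steps, here is a minimal sketch that does the reshape in one call with tidyr's pivot_longer(), using the special ".value" name so that the statistic part of each column name becomes its own column. This is not the method used above, just a swapped-in technique. It assumes column names like 87MIN, with an optional leading X as added by read.csv; only two vegetation types and two years from the question are reproduced:

```r
library(tidyr)
library(ggplot2)

# Two rows and two years copied from the question's table
df <- data.frame(
  VegType = 1:2,
  `87MIN` = c(0.02, 0.02), `87MAX` = c(0.32, 0.45),
  `87Q25` = c(0.11, 0.12), `87Q50` = c(0.12, 0.13), `87Q75` = c(0.13, 0.13),
  `96MIN` = c(0.02, 0.02), `96MAX` = c(0.26, 0.20),
  `96Q25` = c(0.08, 0.09), `96Q50` = c(0.09, 0.10), `96Q75` = c(0.10, 0.11),
  check.names = FALSE
)

# First group -> year; second group -> one output column per statistic
df.f <- pivot_longer(
  df, -VegType,
  names_to = c("year", ".value"),
  names_pattern = "^X?([0-9]+)(.*)$"
)

# One panel per VegType, one box per year, drawn from the precomputed stats
p <- ggplot(df.f, aes(x = year, ymin = MIN, lower = Q25, middle = Q50,
                      upper = Q75, ymax = MAX)) +
  geom_boxplot(stat = "identity") +
  facet_wrap(~ VegType)
print(p)
```

Putting year on the x-axis and faceting by VegType gives the layout described in the question: one plot per variable with one box per year.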
