Error: could not find function "stat_sum_single" [ggplot2] - r

I'm trying to use "stat_sum_single" with a factor variable but I get the error:
Error: could not find function "stat_sum_single"
I tried converting the factor variable to a numeric but it doesn't seem to work - any ideas?
Full code:
ggplot(sn, aes(x = person,y = X, group=Plan, colour = Plan)) +
geom_line(size=0.5) +
scale_y_continuous(limits = c(0, 1.5)) +
scale_x_discrete(breaks = c(0,50,100), labels= c(0,50,100)) +
labs(x = "X",y = "%") +
stat_sum_single(mean, geom = 'line', aes(x = as.numeric(as.character(person))), size = 3, colour = 'red')
Data:
Plan person X m mad mmad
1 1 95 0.323000 0.400303 0.12
1 2 275 0.341818 0.400303 0.12
1 3 2 0.618000 0.400303 0.12
1 4 75 0.320000 0.400303 0.12
1 5 13 0.399000 0.400303 0.12
1 6 20 0.400000 0.400303 0.12
2 7 219 0.393000 0.353350 0.45
2 8 50 0.060000 0.353350 0.45
2 9 213 0.390000 0.353350 0.45
2 15 204 0.496100 0.353350 0.45
2 19 19 0.393000 0.353350 0.45
2 24 201 0.388000 0.353350 0.45
3 30 219 0.567 0.1254 0.89
3 14 50 0.679 0.1254 0.89
3 55 213 0.1234 0.1254 0.89
3 18 204 0.6135 0.1254 0.89
3 59 19 0.39356 0.1254 0.89
3 101 201 0.300 0.1254 0.89
Person is a factor variable.

stat_sum_single() is not a function exported by ggplot2. It is a small helper defined in the examples of the stat_summary() help page, so you have to define it yourself before you can use it:
stat_sum_single <- function(fun, geom="point", ...) {
  # ggplot2 >= 3.3.0 renamed fun.y to fun; on older versions use fun.y=fun
  stat_summary(fun=fun, colour="red", geom=geom, size = 3, ...)
}

The ggplot2 reference manual on CRAN is here:
http://cran.r-project.org/web/packages/ggplot2/ggplot2.pdf
On page 185 there is an example that uses stat_sum_single.
I believe you need to define it first, as a wrapper around stat_summary().
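A self-contained sketch of the helper in use, assuming ggplot2 >= 3.3.0 (where stat_summary() takes fun instead of fun.y) and using small made-up data (toy) rather than the question's sn. Because the helper already fixes colour = "red" and size = 3, there is no need to pass them again; mapping group = 1 makes the summary average over all plans at each x value.

```r
library(ggplot2)

# Helper from the stat_summary() examples, with fun.y renamed to fun
stat_sum_single <- function(fun, geom = "point", ...) {
  stat_summary(fun = fun, colour = "red", geom = geom, size = 3, ...)
}

# Hypothetical toy data: two plans measured at the same three numeric x values
toy <- data.frame(
  person = rep(c(1, 2, 3), times = 2),
  X = c(0.3, 0.4, 0.6, 0.5, 0.2, 0.4),
  Plan = factor(rep(c(1, 2), each = 3))
)

p <- ggplot(toy, aes(x = person, y = X, colour = Plan)) +
  geom_line() +
  # group = 1 pools both plans, so each red point is the mean across plans at that x
  stat_sum_single(mean, mapping = aes(group = 1))
print(p)
```

Keeping x numeric with a continuous scale avoids mixing a discrete scale (scale_x_discrete) with a summary layer that maps x to numbers.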

Related

Rotate Line graph in R

I am trying to create a line graph with rotated x and y axes.
This is what my graph looks like, but
this is what I want
I am using the basic plot function in R as I am unfamiliar with ggplot2.
My code thus far is:
mytab=read.csv("stratotyperidge.csv")
plot(mytab$meters,mytab$d180,lwd=2,col="darkblue",bty='n',type='b',xlab="Height above base (m)",ylab="d180",main="Stratotype Ridge", horiz=TRUE)
But horiz=TRUE returns an error, although I have used it with barplot. I do not want to save my plot as an image and simply rotate it. I want to plot it like the picture linked above. This specific question has not been answered on SO.
This is what my data looks like:
ID# Identifier 1 d180 d13C meters
1 JEM 1 -6.5 1.09 0.5
2 JEM 2 -6.99 0.38 0.85
4 JEM 4 -6.94 0.66 10
5 JEM 5 -6.39 0.75 30.8
6 JEM 6 -7.15 0.38 50.2
7 JEM 7 -8.14 0.03 62.15
8 JEM 8A -7.4 0.33 71
8.5 JEM 8B -7.21 -0.05 71.4
10 JEM 10 -7.39 0.14 82.4
12 JEM 12 -7.27 1.22 87.5
I assume you are after something like this?
library(tidyverse)
df %>%
gather(what, value, d180, d13C) %>%
ggplot(aes(meters, value)) +
geom_point() +
geom_line() +
facet_wrap(~ what, scales = "free_x") +
coord_flip()
Sample data
df <- read.table(text =
"ID 'Identifier 1' d180 d13C meters
1 'JEM 1' -6.5 1.09 0.5
2 'JEM 2' -6.99 0.38 0.85
4 'JEM 4' -6.94 0.66 10
5 'JEM 5' -6.39 0.75 30.8
6 'JEM 6' -7.15 0.38 50.2
7 'JEM 7' -8.14 0.03 62.15
8 'JEM 8A' -7.4 0.33 71
8.5 'JEM 8B' -7.21 -0.05 71.4
10 'JEM 10' -7.39 0.14 82.4
12 'JEM 12' -7.27 1.22 87.5", header = TRUE)
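With ggplot2, another option (a sketch, not the answer's code) is to map the isotope values to x and the height to y directly instead of using coord_flip(). geom_line() always connects points in order of x, which would zigzag here, so geom_path() is used to connect them in row order, i.e. by increasing meters. The long table is built with base R instead of gather(); the names strat, long and what are hypothetical.

```r
library(ggplot2)

# Columns from the question's data (meters, d180, d13C)
strat <- data.frame(
  meters = c(0.5, 0.85, 10, 30.8, 50.2, 62.15, 71, 71.4, 82.4, 87.5),
  d180 = c(-6.5, -6.99, -6.94, -6.39, -7.15, -8.14, -7.4, -7.21, -7.39, -7.27),
  d13C = c(1.09, 0.38, 0.66, 0.75, 0.38, 0.03, 0.33, -0.05, 0.14, 1.22)
)

# Long format: one row per height and isotope, still ordered by meters
long <- data.frame(
  meters = rep(strat$meters, times = 2),
  what = rep(c("d180", "d13C"), each = nrow(strat)),
  value = c(strat$d180, strat$d13C)
)

p <- ggplot(long, aes(x = value, y = meters)) +
  geom_point() +
  geom_path() +  # joins points in row order, not sorted by x
  facet_wrap(~ what, scales = "free_x") +
  labs(y = "Height above base (m)")
print(p)
```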
Instead of this:
plot(mytab$meters, mytab$d180, type="l")
Try this:
plot(mytab$d180, mytab$meters, type="l")
You should get something like this:

How to build a nonlinear approximation?

I need to fit the data with the formula
y = a(exp(x/b) - 1) (code below).
library("ggplot2")
df <- read.table(file='vah_p_1',header =TRUE)
p <- ggplot(df, aes(x = x, y = y)) + geom_point() +
geom_smooth(data = df, method = "nls",size=0.4, se=FALSE,color ='cyan2',
formula = y ~ a(exp^(x*b)-1),method.args = list(start=c(a=1.0,b=0.0)))
p
Unfortunately, the fitted curve is not drawn. I think the problem is in method.args = list(start=c(a=1.0,b=0.0)). How do I find a and b?
The file vah_p_1 contains:
x y
0 4
0.25 5
0.27 6
0,29 7
0.31 8
0.33 10
0.34 13
0.36 16
0.37 20
0.38 23
0.39 28
0.4 37
0.41 43
0.42 55
0.43 67
0.44 81
0.45 94
0.46 118
0.47 143
0.48 187
0.49 225
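A possible fix, as a sketch not tested against the original file. The formula in the question cannot be evaluated: a(...) calls a function named a, exp^(...) tries to raise the exp function to a power, and it uses x*b while the text says x/b. A start value of b = 0 is also a poor choice. In addition, one row of the posted data reads 0,29; if the file really contains a comma there, read.table() reads x as text. The sketch assumes 0.29, writes the model as y ~ a * (exp(x / b) - 1), and gets start values from the fact that for a fixed b the model is linear in a, so a has a closed-form least-squares value and only b needs a one-dimensional search with optimize(). The search interval and the helper names best_a and rss are assumptions.

```r
library(ggplot2)

# Data as posted, with the "0,29" entry assumed to be 0.29
df <- data.frame(
  x = c(0, 0.25, 0.27, 0.29, 0.31, 0.33, 0.34, 0.36, 0.37, 0.38, 0.39,
        0.40, 0.41, 0.42, 0.43, 0.44, 0.45, 0.46, 0.47, 0.48, 0.49),
  y = c(4, 5, 6, 7, 8, 10, 13, 16, 20, 23, 28, 37, 43, 55, 67, 81, 94,
        118, 143, 187, 225)
)

# For fixed b, y = a * g with g = exp(x / b) - 1, so the best a is sum(g*y)/sum(g^2)
best_a <- function(b) {
  g <- exp(df$x / b) - 1
  sum(g * df$y) / sum(g^2)
}
# Residual sum of squares with a profiled out: a function of b only
rss <- function(b) {
  g <- exp(df$x / b) - 1
  sum((df$y - best_a(b) * g)^2)
}

# Assumed search interval for b, based on the range of x
b0 <- optimize(rss, interval = c(0.01, 1))$minimum
start <- list(a = best_a(b0), b = b0)

fit <- nls(y ~ a * (exp(x / b) - 1), data = df, start = start)
print(coef(fit))

p <- ggplot(df, aes(x = x, y = y)) +
  geom_point() +
  geom_smooth(method = "nls", formula = y ~ a * (exp(x / b) - 1),
              method.args = list(start = as.list(coef(fit))),
              se = FALSE, colour = "cyan2")
print(p)
```

The model gives 0 at x = 0, so it cannot pass through the first point (0, 4); the fit is dominated by the large y values.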

Area under the curve

I have my data in long-format like this with 20 different variables (but they all have the same Time points):
Time variable value
1 0 P1 0.07
2 1 P1 0.02
3 2 P1 0.12
4 3 P1 0.17
5 4 P1 0.10
6 5 P1 0.17
66 0 P12 0.02
67 1 P12 0.11
68 2 P12 0.20
69 3 P12 0.19
70 4 P12 0.07
71 5 P12 0.20
72 6 P12 0.19
73 7 P12 0.19
74 8 P12 0.12
75 10 P12 0.13
76 12 P12 0.08
77 14 P12 NA
78 24 P12 0.07
79 0 P13 0.14
80 1 P13 0.17
81 2 P13 0.24
82 3 P13 0.24
83 4 P13 0.26
84 5 P13 0.25
85 6 P13 0.21
86 7 P13 0.21
87 8 P13 NA
88 10 P13 0.19
89 12 P13 0.14
90 14 P13 NA
91 24 P13 0.12
I would like to calculate the area under the curve for each variable between time=0 and time=24. Ideally I would also like to calculate area under the curve where y>0.1.
I have tried the pracma package but it just comes out with NA.
trapz(x=P2ROKIlong$Time, y=P2ROKIlong$value)
Do I have to split my data into lots of different vectors and then do it manually or is there a way of getting it out of the long-format data?
The following code runs fine for me:
library(pracma)
df = data.frame(Time =c(0,1,2,3,4,5),value=c(0.07,0.02,0.12,0.17,0.10,0.17))
AUC = trapz(df$Time,df$value)
Is there anything strange (NAs?) in the rest of your data frame?
EDIT: New code based on comments
May not be the most efficient, but the size of your data seems limited. This returns a vector AUC_result with the AUC per variable. Does this solve your issue?
library(pracma)
df = data.frame(Time =c(0,1,2,3,4,5),value=c(0.07,0.02,0.12,0.17,NA,0.17),variable = c("P1","P1","P1","P2","P2","P2"))
# trapz() returns NA if any value is NA, so drop those rows first
df=df[!is.na(df$value),]
unique_groups = as.character(unique(df$variable))
AUC_result = c()
for(i in seq_along(unique_groups))
{
  # rows belonging to the i-th variable
  df_subset = df[df$variable %in% unique_groups[i],]
  AUC = trapz(df_subset$Time,df_subset$value)
  AUC_result[i] = AUC
  names(AUC_result)[i] = unique_groups[i]
}
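A sketch that works directly on the long-format data without pracma: split() the rows by variable and apply a trapezoid-rule helper to each piece. For the second part of the question it assumes that "area under the curve where y > 0.1" means the area between the curve and the line y = 0.1 wherever the curve is above it; segments that cross 0.1 are cut at the crossing point. The helper names auc and auc_above are hypothetical, and only the P1 and P13 rows from the question are used.

```r
# Trapezoidal AUC of a piecewise-linear curve (same rule as pracma::trapz)
auc <- function(x, y) {
  sum(diff(x) * (head(y, -1) + tail(y, -1)) / 2)
}

# Area between the curve and a horizontal threshold, only where y > threshold
auc_above <- function(x, y, threshold) {
  d1 <- head(y, -1) - threshold
  d2 <- tail(y, -1) - threshold
  h <- diff(x)
  area <- ifelse(d1 >= 0 & d2 >= 0, h * (d1 + d2) / 2,
          ifelse(d1 <= 0 & d2 <= 0, 0,
                 # segment crosses the threshold: keep the triangle above it
                 h * pmax(d1, d2)^2 / (2 * abs(d1 - d2))))
  sum(area)
}

# Rows for P1 and P13 copied from the question (NA rows included)
long <- data.frame(
  Time = c(0:5, 0:8, 10, 12, 14, 24),
  variable = rep(c("P1", "P13"), times = c(6, 13)),
  value = c(0.07, 0.02, 0.12, 0.17, 0.10, 0.17,
            0.14, 0.17, 0.24, 0.24, 0.26, 0.25, 0.21, 0.21, NA,
            0.19, 0.14, NA, 0.12)
)

long <- long[!is.na(long$value), ]               # drop missing readings
long <- long[order(long$variable, long$Time), ]  # make sure Time is increasing
by_var <- split(long, long$variable)

auc_total <- sapply(by_var, function(d) auc(d$Time, d$value))
auc_over <- sapply(by_var, function(d) auc_above(d$Time, d$value, 0.1))
print(auc_total)
print(auc_over)
```

For P1 this gives 0.53, the same value as the trapz() call above; dropping the NA rows means the curve is interpolated linearly across the missing time points.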

ggplot2 with two geoms: remove space between axis/plot area for ONE geom only (or equivalent)

I am producing a plot with 4 facets.
I thought I would attempt to produce just a plot of one part of the data first, and then facet it.
But I am having issues setting up the plot that I want. I think this is primarily because my x-axis was a factor, but there are issues I cannot get around even after converting it to numeric.
The data has a placeholder name right now, HOLD (columns transformation and replicate are factors):
transformation replicate calibration validation difference X1 X2 X3 X4 x1min x1max x2min x2max x3min x3max x4min x4max
1 NSE 1 0.847 0.794 0.053 185.67 0.53 1063.31 1.02 100 1200 -5 3 20 300 1.1 2.9
2 NSE 2 0.758 0.760 -0.002 552.53 0.95 235.70 1.05 100 1200 -5 3 20 300 1.1 2.9
3 NSE 3 0.813 0.817 -0.004 953.37 0.65 225.88 1.01 100 1200 -5 3 20 300 1.1 2.9
4 NSE 4 0.916 0.802 0.114 1232.67 0.86 141.11 1.01 100 1200 -5 3 20 300 1.1 2.9
5 NSE 5 0.787 0.799 -0.012 888.91 1.29 239.85 0.99 100 1200 -5 3 20 300 1.1 2.9
6 NSE 6 0.846 0.760 0.086 996.63 1.93 201.67 0.95 100 1200 -5 3 20 300 1.1 2.9
7 sqrt 1 0.864 0.817 0.047 190.57 0.57 1064.22 1.00 100 1200 -5 3 20 300 1.1 2.9
8 sqrt 2 0.793 0.763 0.030 482.99 1.07 284.29 1.03 100 1200 -5 3 20 300 1.1 2.9
9 sqrt 3 0.820 0.829 -0.009 862.64 0.71 244.69 1.01 100 1200 -5 3 20 300 1.1 2.9
10 sqrt 4 0.922 0.805 0.117 1195.74 0.88 146.52 1.02 100 1200 -5 3 20 300 1.1 2.9
11 sqrt 5 0.805 0.807 -0.002 862.64 1.49 270.43 0.96 100 1200 -5 3 20 300 1.1 2.9
12 sqrt 6 0.855 0.751 0.104 915.67 2.40 248.72 0.93 100 1200 -5 3 20 300 1.1 2.9
13 log 1 0.870 0.802 0.068 192.48 0.49 1085.72 0.99 100 1200 -5 3 20 300 1.1 2.9
14 log 2 0.817 0.734 0.083 186.41 -1.19 746.40 1.03 100 1200 -5 3 20 300 1.1 2.9
15 log 3 0.808 0.812 -0.004 820.57 0.70 247.15 1.02 100 1200 -5 3 20 300 1.1 2.9
16 log 4 0.912 0.780 0.132 1224.15 0.77 130.32 1.03 100 1200 -5 3 20 300 1.1 2.9
17 log 5 0.812 0.793 0.019 828.82 1.66 298.87 0.95 100 1200 -5 3 20 300 1.1 2.9
18 log 6 0.857 0.718 0.139 787.60 2.86 296.08 0.92 100 1200 -5 3 20 300 1.1 2.9
19 inv 1 0.854 0.659 0.195 202.73 0.24 1135.53 0.98 100 1200 -5 3 20 300 1.1 2.9
20 inv 2 0.765 0.622 0.143 186.83 -0.03 689.33 0.97 100 1200 -5 3 20 300 1.1 2.9
21 inv 3 0.689 0.684 0.005 962.95 0.27 175.91 0.98 100 1200 -5 3 20 300 1.1 2.9
22 inv 4 0.867 0.670 0.197 1436.55 0.44 91.84 0.92 100 1200 -5 3 20 300 1.1 2.9
23 inv 5 0.781 0.683 0.098 743.07 1.78 364.78 0.94 100 1200 -5 3 20 300 1.1 2.9
24 inv 6 0.773 0.626 0.147 711.62 2.78 285.22 0.92 100 1200 -5 3 20 300 1.1 2.9
Code for plots:
ggplot(data = HOLD, aes(x = as.numeric(replicate))) +
geom_ribbon(aes(ymin = x1min-1, ymax = x1max+1), alpha = 0.25) +
geom_jitter(aes(y = X1, color = transformation), size = 3, width = 0.125, height = 0) +
scale_x_continuous(breaks = 1:6) +
theme(panel.grid.minor = element_blank())
The plots are essentially x = replicate and y = X#. I'm representing this using geom_jitter, with the colouration from the factor transformation. This all works fine.
However, I need to plot over the 80% confidence interval range of these X values; these are in the columns labelled with min and max. I was told that geom_hline() isn't clear enough so I opted to use geom_ribbon(). I'm aware that ribbon only works for a continuous variable so I have converted my replicate factor into numeric.
This does work, but there are gaps at the side. I know I can get rid of them by using expand() but then my values on the jitter geom will be at the edges. Is there some way I can have the ribbon go to the edges of the plot, but not the jitter? Or is there an alternative to using geom_ribbon? I have added some images to illustrate below...
You can use geom_rect instead and set xmin and xmax to -Inf/Inf, but as lots of rectangles will be plotted on top of each other (one for each row), you need to decrease alpha to get the transparency.
ggplot(data = HOLD, aes(x = as.numeric(replicate))) +
geom_rect(aes(ymin = x1min-1, ymax = x1max+1, xmin = -Inf, xmax = Inf), alpha = 0.01) +
geom_jitter(aes(y = X1, color = transformation), size = 3, width = 0.125, height = 0) +
scale_x_continuous(breaks = 1:6) +
theme(panel.grid.minor = element_blank())
You can probably try to get geom_ribbon to work, if you do some transformation to the x-axis coordinates, but the easiest way to achieve your result is to use geom_rect, because it understands the xmin and xmax aesthetics. Setting xmin = -Inf and xmax = Inf ensures that the rectangle will span the whole x-axis.
As x1min and x1max are the same in every row of the dataset, you only need to draw a single rectangle, so it's better to add annotate("rect", ...) to your plot rather than geom_rect(...).
So all you have to do is change the geom_ribbon line to
annotate("rect", ymin = HOLD$x1min[1]-1, ymax = HOLD$x1max[1]+1,
xmin = -Inf, xmax = Inf, alpha = .25)
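Since the question mentions 4 facets: annotate() draws the same rectangle in every panel, so for a different interval per facet one option is geom_rect() with a small data frame of bounds that contains the facet variable. This is a sketch that assumes X1 to X4 are stacked into one value column with a parameter column for faceting, and it uses only the first six rows of HOLD. The names bounds, long_hold and parameter are made up for the example, and the bounds are taken from the xNmin/xNmax columns without the ±1 padding used in the question.

```r
library(ggplot2)

# First six rows of HOLD from the question (only the columns used here)
HOLD <- data.frame(
  transformation = factor(rep(c("NSE", "sqrt"), each = 3)),
  replicate = factor(rep(1:3, times = 2)),
  X1 = c(185.67, 552.53, 953.37, 190.57, 482.99, 862.64),
  X2 = c(0.53, 0.95, 0.65, 0.57, 1.07, 0.71),
  X3 = c(1063.31, 235.70, 225.88, 1064.22, 284.29, 244.69),
  X4 = c(1.02, 1.05, 1.01, 1.00, 1.03, 1.01)
)

# One row of interval bounds per facet
bounds <- data.frame(
  parameter = c("X1", "X2", "X3", "X4"),
  ymin = c(100, -5, 20, 1.1),
  ymax = c(1200, 3, 300, 2.9)
)

# Stack X1..X4 into long format: one row per replicate and parameter
long_hold <- do.call(rbind, lapply(bounds$parameter, function(p) {
  data.frame(transformation = HOLD$transformation,
             replicate = as.numeric(HOLD$replicate),
             parameter = p,
             value = HOLD[[p]])
}))

p <- ggplot(long_hold, aes(x = replicate, y = value)) +
  # one rectangle per facet, spanning the full panel width
  geom_rect(data = bounds, aes(xmin = -Inf, xmax = Inf, ymin = ymin, ymax = ymax),
            inherit.aes = FALSE, alpha = 0.25) +
  geom_jitter(aes(colour = transformation), size = 3, width = 0.125, height = 0) +
  scale_x_continuous(breaks = 1:6) +
  facet_wrap(~ parameter, scales = "free_y") +
  theme(panel.grid.minor = element_blank())
print(p)
```

Because bounds has only one row per facet, no rectangles are stacked on top of each other, so a normal alpha such as 0.25 works.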

How to scale the dots of a graph based on their p-value in R?

I have a data.frame named df.ordered that looks like:
labels gvs order color pvals
1 Adygei -2.3321916 1 1 0.914
2 Basque -0.8519079 2 1 0.218
3 French -0.9298674 3 1 0.000
4 Italian -2.8859587 4 1 0.024
5 Orcadian -1.4996229 5 1 0.148
6 Russian -1.5597359 6 1 0.626
7 Sardinian -1.4494841 7 1 0.516
8 Tuscan -2.4279528 8 1 0.420
9 Bedouin -3.1717421 9 2 0.914
10 Druze -0.5058627 10 2 0.220
11 Mozabite -2.6491331 11 2 0.200
12 Palestinian -0.7819299 12 2 0.552
13 Balochi -1.4095947 13 3 0.158
14 Brahui -1.2534511 14 3 0.162
15 Burusho 1.7958170 15 3 0.414
16 Hazara 2.2810477 16 3 0.152
17 Kalash -0.9258497 17 3 0.974
18 Makrani -0.9007551 18 3 0.226
19 Pathan 2.5543214 19 3 0.112
20 Sindhi 2.6614486 20 3 0.338
21 Uygur -1.2207974 21 3 0.652
22 Cambodian 2.3706977 22 4 0.118
23 Dai -0.9441980 23 4 0.686
24 Daur -1.0325107 24 4 0.932
25 Han -0.7381369 25 4 0.794
26 Hezhen -2.7590587 26 4 0.182
27 Japanese -0.5644325 27 4 0.366
28 Lahu -0.8449225 28 4 0.560
29 Miao -0.7237586 29 4 0.194
30 Mongola -0.9452944 30 4 0.768
31 Naxi -0.1625003 31 4 0.554
32 Oroqen -1.2035258 32 4 0.782
33 She -2.7758460 33 4 0.912
34 Tu -0.7703779 34 4 0.254
35 Tujia -1.0265275 35 4 0.912
36 Xibo -1.1163019 36 4 0.292
37 Yakut -3.2102686 37 4 0.030
38 Yi -0.9614190 38 4 0.838
39 Colombian -1.9659984 39 5 0.166
40 Karitiana -0.9195156 40 5 0.660
41 Maya 2.1239768 41 5 0.818
42 Pima -3.0895998 42 5 0.818
43 Surui -0.9377928 43 5 0.536
44 Melanesian -1.6961014 44 6 0.414
45 Papuan -0.7037952 45 6 0.386
46 BantuKenya -1.9311354 46 7 0.484
47 BantuSouthAfrica -1.8515908 47 7 0.016
48 BiakaPygmy -1.7657017 48 7 0.538
49 Mandenka -0.5423822 49 7 0.076
50 MbutiPygmy -1.6244801 50 7 0.054
51 San -0.9049735 51 7 0.478
52 Yoruba 2.0949378 52 7 0.904
I have made the following graph
I used the code:
jpeg("test3.jpg", 700,700)
df.ordered$color <- as.factor(df.ordered$color)
levels(df.ordered$color) <- c("blue","yellow3","red","pink","purple","green","orange")
plot(df.ordered$gvs, pch = 19, cex=2, col = as.character(df.ordered$color), xaxt="n")
axis(1, at=1:52, col=as.character(df.ordered$color),labels=df.ordered$labels, las=2)
dev.off()
I now want to scale the dots of the graph to the pvals column. I want the low pvalues to be larger dots, and the higher p-value to be the smaller dots. One issue is that some pvalues are 0. I was thinking of turning all pvals values that are 0.000 to 0.001 to fix this. Does anyone know how to do this? I want the graph to look similar to the graph in figure 5 here: http://journals.plos.org/plosgenetics/article?id=10.1371/journal.pgen.1004412
The cex argument is vectorized, i.e., you can pass in a vector of the same length as the data you plot. Take this as a simple example:
plot(1:5, cex = 1:5)
Now, it is completely up to you to define a relationship between cex and pvals. How about a + (1 - pvals) * (b - a)? This will map 1-pvals from [0,1] to [a,b]. For example, with a = 1, b = 5, you can try:
cex <- 1 + (1 - df.ordered$pvals) * (5 - 1)
I'm looking to have the p-values between 0.000 and 0.0010 to have cex = ~10, p-values between 0.010 and 0.20 to have cex = ~5, and p-values from 0.20-1.00 to have cex = ~0.5.
I recommend using cut():
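# labels = FALSE returns the bin number (1, 2, 3) directly;
# include.lowest = TRUE keeps p = 1 in the last bin instead of giving NA
bin <- cut(df.ordered$pvals, breaks = c(0, 0.001, 0.2, 1),
           right = FALSE, include.lowest = TRUE, labels = FALSE)
cex <- c(10, 5, 0.5)[bin]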
Adding to @zheyuan-li's answer, here is a normalization that gives points with p-values equal to 0 a size of 2 and points with p-values equal to 1 a size of zero:
plot(df.ordered$gvs, pch = 19,
cex=2 * (1-df.ordered$pvals)/(df.ordered$pvals +1),
col = as.character(df.ordered$color), xaxt="n")
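To follow the idea of replacing zero p-values by 0.001, here is a sketch that sizes points on a -log10 scale, which spreads out small p-values more than the linear mappings above. The cap of 0.001, the size range 0.5 to 5 and the helper name scale_cex are assumptions, and only the first five rows of df.ordered are used.

```r
# First five rows of df.ordered from the question
gvs   <- c(-2.3321916, -0.8519079, -0.9298674, -2.8859587, -1.4996229)
pvals <- c(0.914, 0.218, 0.000, 0.024, 0.148)

# Replace p-values of 0 by 0.001 so that -log10() stays finite
p <- pmax(pvals, 0.001)

# Map -log10(p), which runs from 0 (p = 1) to 3 (p = 0.001), linearly onto
# [min_cex, max_cex]; small p-values get large dots
scale_cex <- function(p, min_cex = 0.5, max_cex = 5) {
  s <- -log10(p) / 3
  min_cex + s * (max_cex - min_cex)
}

cex <- scale_cex(p)
plot(gvs, pch = 19, cex = cex)
```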
