Area under the curve - r

I have my data in long-format like this with 20 different variables (but they all have the same Time points):
Time variable value
1 0 P1 0.07
2 1 P1 0.02
3 2 P1 0.12
4 3 P1 0.17
5 4 P1 0.10
6 5 P1 0.17
66 0 P12 0.02
67 1 P12 0.11
68 2 P12 0.20
69 3 P12 0.19
70 4 P12 0.07
71 5 P12 0.20
72 6 P12 0.19
73 7 P12 0.19
74 8 P12 0.12
75 10 P12 0.13
76 12 P12 0.08
77 14 P12 NA
78 24 P12 0.07
79 0 P13 0.14
80 1 P13 0.17
81 2 P13 0.24
82 3 P13 0.24
83 4 P13 0.26
84 5 P13 0.25
85 6 P13 0.21
86 7 P13 0.21
87 8 P13 NA
88 10 P13 0.19
89 12 P13 0.14
90 14 P13 NA
91 24 P13 0.12
I would like to calculate the area under the curve for each variable between time=0 and time=24. Ideally I would also like to calculate area under the curve where y>0.1.
I have tried the pracma package but it just comes out with NA.
trapz(x=P2ROKIlong$Time, y=P2ROKIlong$value)
Do I have to split my data into lots of different vectors and then do it manually or is there a way of getting it out of the long-format data?

The following code runs fine for me:
require(pracma)
df = data.frame(Time =c(0,1,2,3,4,5),value=c(0.07,0.02,0.12,0.17,0.10,0.17))
AUC = trapz(df$Time,df$value)
Is there anything strange (NA's?) in your the rest of your dataframe?
EDIT: New code based on comments
May not be the most efficient, but the size of your data seems limited. This returns a vector AUC_result with the AUC per variable. Does this solve your issue?
require(pracma)
df = data.frame(Time =c(0,1,2,3,4,5),value=c(0.07,0.02,0.12,0.17,NA,0.17),variable = c("P1","P1","P1","P2","P2","P2"))
df=df[!is.na(df$value),]
unique_groups = as.character(unique(df$variable))
AUC_result = c()
for(i in 1:length(unique_groups))
{
df_subset = df[df$variable %in% unique_groups[i],]
AUC = trapz(df_subset$Time,df_subset$value)
AUC_result[i] = AUC
names(AUC_result)[i] = unique_groups[i]
}

Related

How build a nonlinear approximation?

There was a need to build an approximation of data using the formula
y = a(exp(x/b) - 1) (below the code).
library("ggplot2")
df <- read.table(file='vah_p_1',header =TRUE)
p <- ggplot(df, aes(x = x, y = y)) + geom_point() +
geom_smooth(data = df, method = "nls",size=0.4, se=FALSE,color ='cyan2',
formula = y ~ a(exp^(x*b)-1),method.args = list(start=c(a=1.0,b=0.0)))
p
Unfortunately the approximation line is not being built.I think the problem is in method.args = list(start=c(a=1.0,b=0.0). How to find a, b?
In vah_p_1 is located:
x y
0 4
0.25 5
0.27 6
0,29 7
0.31 8
0.33 10
0.34 13
0.36 16
0.37 20
0.38 23
0.39 28
0.4 37
0.41 43
0.42 55
0.43 67
0.44 81
0.45 94
0.46 118
0.47 143
0.48 187
0.49 225

post code areas plotting in R - how to add legend

I am new to R and need some help.
could you please help me with the below ? I would like to add gradient legend next to the plot from 0 to 1 showing different color as value change, but this is best I was able to get. As well please some tips how to add text with the post code inside of the map ? Thanks.
rm(list=ls())
library(tidyverse)
library(maptools)
library(raster)
library(plotrix)
# collection data set load + post codes lo la - 2016
df2016 <- read.csv('C:/Users/thomas/desktop/coll2016WORKINGFILE.csv')
colnames(df2016) <- c('name','value','amount')
df2016$amount <- NULL
df2016$name <- as.character(df2016$name)
# OPTIONAL: Depending on your data, you may need to rescale it for the color ramp to work
df2016$value <- rescale(df2016$value, newrange = c(0, 1))
# Download a shapefile of postal codes into your working directory
download.file(
"http://www.opendoorlogistics.com/wp-content/uploads/Data/UK-postcode-boundaries-Jan-2015.zip",
"postal_shapefile"
)
# Unzip the shapefile
unzip("postal_shapefile")
# Read the shapefile
postal <- readShapeSpatial("./Distribution/Areas")
postal.df <- fortify(postal, region = "name")
# Join your data to the shapefile
postal <- raster::merge(postal, df2016, by = "name")
postal$value[is.na(postal$value)] <- 0.50
# Get centroids of spatialPolygonDataFrame and convert to dataframe
# for use in plotting area names.
postal.centroids.df <- data.frame(long = coordinates(postal)[, 1],
lat = coordinates(postal)[, 2],
id=postal$name,
ratio = postal$value)
plot(postal, col = gray(postal$value))
title("UK Success Rate")
legend("right",NULL,legend = postal$value,col = gray(postal$value))
Original dataset from csv has below 3 columns:
Row Labels Success/Failed ratio N of coll
LD 1 3
ZE 1 2
WS 0.79 19
ML 0.75 12
HS 0.75 4
TQ 0.74 38
WN 0.73 15
CA 0.71 28
HU 0.7 33
FY 0.69 16
HG 0.69 16
IV 0.68 19
DL 0.68 25
CB 0.68 115
TS 0.67 46
IP 0.67 87
AB 0.67 66
NP 0.67 45
FK 0.67 18
IM 0.67 9
SM 0.66 50
HD 0.66 32
EN 0.66 61
CO 0.65 52
ME 0.65 54
PE 0.64 266
EX 0.64 81
WV 0.63 49
JE 0.63 24
NE 0.62 148
YO 0.62 47
DE 0.62 78
LN 0.61 36
SN 0.61 109
IG 0.6 63
NR 0.6 90
SP 0.59 37
BA 0.59 93
UB 0.59 127
TN 0.59 95
BT 0.59 180
BD 0.59 51
HP 0.59 126
TA 0.59 46
PO 0.58 113
DH 0.58 55
WD 0.58 102
BH 0.57 96
DG 0.57 14
CV 0.57 225
RG 0.57 255
BN 0.56 158
DY 0.56 48
HA 0.56 148
W 0.56 359
WA 0.56 77
DA 0.55 38
CT 0.55 62
GU 0.55 231
RH 0.55 132
BL 0.55 33
HX 0.55 11
BS 0.54 184
SS 0.54 46
EH 0.54 185
DT 0.54 37
G 0.54 137
B 0.54 283
LU 0.54 41
NG 0.54 97
OX 0.53 208
S 0.53 179
CM 0.53 100
DD 0.53 17
GL 0.53 87
AL 0.53 89
HR 0.53 38
LS 0.52 122
TF 0.52 21
RM 0.52 44
SL 0.52 155
MK 0.52 136
SY 0.52 46
DN 0.52 81
N 0.52 191
M 0.52 226
SR 0.52 29
SK 0.52 64
BB 0.51 140
KY 0.51 41
WF 0.51 51
PR 0.51 63
L 0.51 81
KT 0.5 185
CF 0.5 118
ST 0.5 84
TR 0.5 46
CW 0.5 44
TD 0.5 12
P 0.5 2
SW 0.5 317
LL 0.49 49
CH 0.49 43
E 0.49 275
EC 0.48 364
PA 0.48 27
SO 0.48 157
CR 0.48 84
PL 0.48 61
SG 0.47 59
KA 0.47 15
LA 0.47 43
SA 0.46 78
LE 0.46 194
TW 0.45 125
OL 0.44 41
SE 0.44 297
NN 0.43 143
NW 0.42 236
WC 0.41 138
WR 0.38 73
BR 0.37 62
GY 0.26 35
PH 0.23 13
Here you go. Use sf with new ggplot or stuff from my misc package for base graphs.
# collection data set load + post codes lo la - 2016
df2016 <- read.table(stringsAsFactors=FALSE, header=TRUE, text="
name value amount
LD 1 3
ZE 1 2
WS 0.79 19
# YOUR OTHER VALUES FROM ABOVE
PH 0.23 13")
if(FALSE){ # don't run when sourcing file
# Download a shapefile of postal codes into your working directory
download.file(
"http://www.opendoorlogistics.com/wp-content/uploads/Data/UK-postcode-boundaries-Jan-2015.zip",
"postal_shapefile.zip"
)
# Unzip and read the shapefile
unzip("postal_shapefile.zip")
}
# install.packages("sf")
postal <- sf::st_read("Distribution/Areas.shp")
# Join your data to the shapefile
postal2 <- merge(postal, df2016, by="name")
#devtools::install_github("tidyverse/ggplot2") # need newer ggplot2 version for geom_sf
library(ggplot2)
ggplot(postal2) + geom_sf(aes(fill = value))
# Want to remain in base graphs?
#install.packages("berryFunctions")
library(berryFunctions)
cols <- seqPal(n=100)
cls <- classify(postal2$value, breaks=100)$index
plot(postal2[,c("value","geometry")], col=cols[cls], graticule=TRUE, axes=TRUE) # ?sf::plot_sf
colPointsLegend(postal2$value, colors=cols, horizontal=FALSE, title="UK value")

How to fill out white/missing parts of the map in R?

This code below creates map of UK postcodes using ggplot, however leaves some of the parts white/missing from the map, could you please advise how to make sure that whole map is filled and that the postcode areas have a border ? Thanks.
MAP OF UK from the below code
rm(list=ls())
library(tidyverse)
library(maptools)
library(raster)
library(plotrix)
library(ggrepel)
df2016 <- read.table(stringsAsFactors=FALSE, header=TRUE, text="
name value amount
LD 1 3
ZE 1 2
WS 0.79 19
ML 0.75 12
HS 0.75 4
TQ 0.74 38
WN 0.73 15
CA 0.71 28
HU 0.7 33
FY 0.69 16
HG 0.69 16
IV 0.68 19
DL 0.68 25
CB 0.68 115
TS 0.67 46
IP 0.67 87
AB 0.67 66
NP 0.67 45
FK 0.67 18
IM 0.67 9
SM 0.66 50
HD 0.66 32
EN 0.66 61
CO 0.65 52
ME 0.65 54
PE 0.64 266
EX 0.64 81
WV 0.63 49
JE 0.63 24
NE 0.62 148
YO 0.62 47
DE 0.62 78
LN 0.61 36
SN 0.61 109
IG 0.6 63
NR 0.6 90
SP 0.59 37
BA 0.59 93
UB 0.59 127
TN 0.59 95
BT 0.59 180
BD 0.59 51
HP 0.59 126
TA 0.59 46
PO 0.58 113
DH 0.58 55
WD 0.58 102
BH 0.57 96
DG 0.57 14
CV 0.57 225
RG 0.57 255
BN 0.56 158
DY 0.56 48
HA 0.56 148
W 0.56 359
WA 0.56 77
DA 0.55 38
CT 0.55 62
GU 0.55 231
RH 0.55 132
BL 0.55 33
HX 0.55 11
BS 0.54 184
SS 0.54 46
EH 0.54 185
DT 0.54 37
G 0.54 137
B 0.54 283
LU 0.54 41
NG 0.54 97
OX 0.53 208
S 0.53 179
CM 0.53 100
DD 0.53 17
GL 0.53 87
AL 0.53 89
HR 0.53 38
LS 0.52 122
TF 0.52 21
RM 0.52 44
SL 0.52 155
MK 0.52 136
SY 0.52 46
DN 0.52 81
N 0.52 191
M 0.52 226
SR 0.52 29
SK 0.52 64
BB 0.51 140
KY 0.51 41
WF 0.51 51
PR 0.51 63
L 0.51 81
KT 0.5 185
CF 0.5 118
ST 0.5 84
TR 0.5 46
CW 0.5 44
TD 0.5 12
P 0.5 2
SW 0.5 317
LL 0.49 49
CH 0.49 43
E 0.49 275
EC 0.48 364
PA 0.48 27
SO 0.48 157
CR 0.48 84
PL 0.48 61
SG 0.47 59
KA 0.47 15
LA 0.47 43
SA 0.46 78
LE 0.46 194
TW 0.45 125
OL 0.44 41
SE 0.44 297
NN 0.43 143
NW 0.42 236
WC 0.41 138
WR 0.38 73
BR 0.37 62
GY 0.26 35
PH 0.23 13
")
#df2016$amount <- NULL
df2016$name <- as.character(df2016$name)
# Download a shapefile of postal codes into your working directory
download.file(
"http://www.opendoorlogistics.com/wp-content/uploads/Data/UK-postcode-boundaries-Jan-2015.zip",
"postal_shapefile"
)
# Unzip the shapefile
unzip("postal_shapefile")
# Read the shapefile
postal <- readShapeSpatial("./Distribution/Areas")
postal.df <- fortify(postal, region = "name")
# Join your data to the shapefile
colnames(postal.df)[colnames(postal.df) == "id"] <- "name"
postal.df <- raster::merge(postal.df, df2016, by = "name")
postal.df$value[is.na(postal.df$value)] <- 0.50
# Get centroids of spatialPolygonDataFrame and convert to dataframe
# for use in plotting area names.
postal.centroids.df <- data.frame(long = coordinates(postal)[, 1],
lat = coordinates(postal)[, 2],
id=postal$name)
p <- ggplot(postal.df, aes(x = long, y = lat, group = group)) + geom_polygon(aes(fill = cut(value,5))) +
geom_text_repel(data = postal.centroids.df, aes(label = id, x = long, y = lat, group = id), size = 3, check_overlap = T) +
labs(x=" ", y=" ") +
theme_bw() + scale_fill_brewer('Success Rate 2016', palette = 15) +
coord_map() +
theme(panel.grid.minor=element_blank(), panel.grid.major=element_blank()) +
theme(axis.ticks = element_blank(), axis.text.x = element_blank(), axis.text.y = element_blank()) +
theme(panel.border = element_blank())
p
Try arranging the postal code by name or number just before plotting
postal.centroids.df %>%
arrange(id)
My county maps of the US did the same thing when they weren't in order. If that doesn't work try by lat or long as well.
Solution was to use left_join from dplyr instead of merge:
rm(list=ls())
library(tidyverse)
library(maptools)
library(raster)
library(plotrix)
library(ggrepel)
df2016 <- read.table(stringsAsFactors=FALSE, header=TRUE, text="
name value amount
LD 1 3
ZE 1 2
WS 0.79 19
ML 0.75 12
HS 0.75 4
TQ 0.74 38
WN 0.73 15
CA 0.71 28
HU 0.7 33
FY 0.69 16
HG 0.69 16
IV 0.68 19
DL 0.68 25
CB 0.68 115
TS 0.67 46
IP 0.67 87
AB 0.67 66
NP 0.67 45
FK 0.67 18
IM 0.67 9
SM 0.66 50
HD 0.66 32
EN 0.66 61
CO 0.65 52
ME 0.65 54
PE 0.64 266
EX 0.64 81
WV 0.63 49
JE 0.63 24
NE 0.62 148
YO 0.62 47
DE 0.62 78
LN 0.61 36
SN 0.61 109
IG 0.6 63
NR 0.6 90
SP 0.59 37
BA 0.59 93
UB 0.59 127
TN 0.59 95
BT 0.59 180
BD 0.59 51
HP 0.59 126
TA 0.59 46
PO 0.58 113
DH 0.58 55
WD 0.58 102
BH 0.57 96
DG 0.57 14
CV 0.57 225
RG 0.57 255
BN 0.56 158
DY 0.56 48
HA 0.56 148
W 0.56 359
WA 0.56 77
DA 0.55 38
CT 0.55 62
GU 0.55 231
RH 0.55 132
BL 0.55 33
HX 0.55 11
BS 0.54 184
SS 0.54 46
EH 0.54 185
DT 0.54 37
G 0.54 137
B 0.54 283
LU 0.54 41
NG 0.54 97
OX 0.53 208
S 0.53 179
CM 0.53 100
DD 0.53 17
GL 0.53 87
AL 0.53 89
HR 0.53 38
LS 0.52 122
TF 0.52 21
RM 0.52 44
SL 0.52 155
MK 0.52 136
SY 0.52 46
DN 0.52 81
N 0.52 191
M 0.52 226
SR 0.52 29
SK 0.52 64
BB 0.51 140
KY 0.51 41
WF 0.51 51
PR 0.51 63
L 0.51 81
KT 0.5 185
CF 0.5 118
ST 0.5 84
TR 0.5 46
CW 0.5 44
TD 0.5 12
P 0.5 2
SW 0.5 317
LL 0.49 49
CH 0.49 43
E 0.49 275
EC 0.48 364
PA 0.48 27
SO 0.48 157
CR 0.48 84
PL 0.48 61
SG 0.47 59
KA 0.47 15
LA 0.47 43
SA 0.46 78
LE 0.46 194
TW 0.45 125
OL 0.44 41
SE 0.44 297
NN 0.43 143
NW 0.42 236
WC 0.41 138
WR 0.38 73
BR 0.37 62
GY 0.26 35
PH 0.23 13
")
# Download a shapefile of postal codes into your working directory
download.file(
"http://www.opendoorlogistics.com/wp-content/uploads/Data/UK-postcode-boundaries-Jan-2015.zip",
"postal_shapefile"
)
# Unzip the shapefile
unzip("postal_shapefile")
# Read the shapefile
postal <- readShapeSpatial("./Distribution/Areas")
postal.df <- fortify(postal, region = "name")
# Join your data to the shapefile
colnames(postal.df)[colnames(postal.df) == "id"] <- "name"
library(dplyr)
test <- left_join(postal.df, df2016, by = "name", copy = FALSE)
#postal.df <- raster::merge(postal.df, df2016, by = "name")
test$value[is.na(test$value)] <- 0.50
# for use in plotting area names.
postal.centroids.df <- data.frame(long = coordinates(postal)[, 1],
lat = coordinates(postal)[, 2],
id=postal$name)
p <- ggplot(test, aes(x = long, y = lat, group = group)) + geom_polygon(aes(fill = cut(value,5))) +
geom_text_repel(data = postal.centroids.df, aes(label = id, x = long, y = lat, group = id), size = 3, check_overlap = T) +
labs(x=" ", y=" ") +
theme_bw() + scale_fill_brewer('Success Rate 2016', palette = 15) +
coord_map() +
theme(panel.grid.minor=element_blank(), panel.grid.major=element_blank()) +
theme(axis.ticks = element_blank(), axis.text.x = element_blank(), axis.text.y = element_blank()) +
theme(panel.border = element_blank())
p

How to barplot in R using the first column as data labels

Below is my data, with headers.
Using R, I would like to barplot() this data using the value in the S column as the label.
S Value
10 0.00
20 0.00
30 0.00
40 0.01
50 0.71
60 4.97
70 13.22
80 22.95
90 32.93
100 42.93
I'm scouring the help files, but I can't seem to find an example of this seemingly simple task.
This will quickly resolve your problem, but then you'll have to add details to set the graph details layout:
Your example:
S <- c(10, 20,30,40,50,6,70,80,90,100)
Value <- c(0.00,0.00,0.00,0.01,0.71,4.97,13.22,22.95,32.93,42.93)
df <- do.call(rbind, Map(data.frame, S=S, Value=Value))
df
S Value
1 10 0.00
2 20 0.00
3 30 0.00
4 40 0.01
5 50 0.71
6 6 4.97
7 70 13.22
8 80 22.95
9 90 32.93
10 100 42.93
barplot(df$Value, names.arg = df$S)

Error: could not find function "stat_sum_single" [ggplot2]

I'm trying to use "stat_sum_single" with a factor variable but I get the error:
Error: could not find function "stat_sum_single"
I tried converting the factor variable to a numeric but it doesn't seem to work - any ideas?
Full code:
ggplot(sn, aes(x = person,y = X, group=Plan, colour = Plan)) +
geom_line(size=0.5) +
scale_y_continuous(limits = c(0, 1.5)) +
scale_x_discrete(breaks = c(0,50,100), labels= c(0,50,100)) +
labs(x = "X",y = "%") +
stat_sum_single(mean, geom = 'line', aes(x = as.numeric(as.character(person))), size = 3, colour = 'red')
Data:
Plan person X m mad mmad
1 1 95 0.323000 0.400303 0.12
1 2 275 0.341818 0.400303 0.12
1 3 2 0.618000 0.400303 0.12
1 4 75 0.320000 0.400303 0.12
1 5 13 0.399000 0.400303 0.12
1 6 20 0.400000 0.400303 0.12
2 7 219 0.393000 0.353350 0.45
2 8 50 0.060000 0.353350 0.45
2 9 213 0.390000 0.353350 0.45
2 15 204 0.496100 0.353350 0.45
2 19 19 0.393000 0.353350 0.45
2 24 201 0.388000 0.353350 0.45
3 30 219 0.567 0.1254 0.89
3 14 50 0.679 0.1254 0.89
3 55 213 0.1234 0.1254 0.89
3 18 204 0.6135 0.1254 0.89
3 59 19 0.39356 0.1254 0.89
3 101 201 0.300 0.1254 0.89
Person is a factor variable.
Function stat_sum_single() isn't directly implemented in library ggplot2 but this function should be defined before using as shown in the help file of function stat_summary().
stat_sum_single <- function(fun, geom="point", ...) {
stat_summary(fun.y=fun, colour="red", geom=geom, size = 3, ...)
}
Here is the ggplot2 cran package:
http://cran.r-project.org/web/packages/ggplot2/ggplot2.pdf
on page 185, there is an example of using stat_sum_single.
I believe you need to somehow define it first in stat_summary.

Resources