I have the following R code, which draws a multi-line graph where each line corresponds to a category of data. In the code, the categories are given by the variable nk.
My dataset looks like this:
k precision recall
0.25 0.02 1.011
0.25 0.04 1.011
0.5 0.15 0.941
0.5 0.17 0.931
0.5 0.18 0.921
0.5 0.19 0.911
1.0 0.36 0.831
1.0 0.39 0.811
1.0 0.41 0.801
The problem is that it only draws the line for k = 1.0 and not the lines for k = 0.5 and k = 0.25.
My question: how can I use an nk variable that is not an integer in order to draw the lines for k = 0.5 and 0.25?
dtf$k <- as.numeric(dtf$k)
nk <- max(dtf$k)
xrange <- range(dtf$precision)
yrange <- range(dtf$recall)
plot(xrange, yrange,
type="n",
xlab="Precision",
ylab="Recall"
)
colors <- rainbow(nk)
linetype <- c(1:nk)
plotchar <- seq(18, 18+nk, 1)
for (i in 1:nk) {
Ki <- subset(dtf, k==i)
lines(Ki$precision, Ki$recall,
type="b",
lwd=2,
lty=linetype[i],
col=colors[i],
pch=plotchar[i]
)
}
title("Methods varying K", "Precision Recall")
legend(xrange[1], yrange[2],
1:nk,
cex=1.0,
col=colors,
inset=c(-0.2,0),
pch=plotchar,
lty=linetype,
title="k"
)
Data
dtf <- read.table(header = TRUE, text = 'k precision recall
0.25 0.02 1.011
0.25 0.04 1.011
0.5 0.15 0.941
0.5 0.17 0.931
0.5 0.18 0.921
0.5 0.19 0.911
1.0 0.36 0.831
1.0 0.39 0.811
1.0 0.41 0.801')
dtf$k <- factor(dtf$k)
ggplot2 solution
require(ggplot2)
ggplot(dtf, aes(x = precision, y = recall, col = k)) +
geom_line()
base solution
plot(recall ~ precision, data = dtf, type = 'n')
cols = c('red', 'blue', 'green')
levs <- levels(dtf$k) # dtf$k was converted to a factor above
for(i in seq_along(levs)){
take <- dtf[dtf$k == levs[i], ] # rows belonging to the i-th level of k
lines(take$precision, take$recall, col = cols[i])
}
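For context on why the original loop drew only one line: nk <- max(dtf$k) evaluates to 1, so 1:nk contains only i = 1 and subset(dtf, k == i) matches only the rows with k = 1.0. The sketch below keeps the structure of the question's code but loops over the distinct values of k instead. The name ks and the "topright" legend position are my own choices for illustration, not part of the original code.

```r
# Data copied from the question; k is kept numeric here.
dtf <- read.table(header = TRUE, text = "k precision recall
0.25 0.02 1.011
0.25 0.04 1.011
0.5 0.15 0.941
0.5 0.17 0.931
0.5 0.18 0.921
0.5 0.19 0.911
1.0 0.36 0.831
1.0 0.39 0.811
1.0 0.41 0.801")

ks <- sort(unique(dtf$k))   # the distinct categories (hypothetical name)
nk <- length(ks)            # number of categories, not the largest k
colors <- rainbow(nk)

# Empty plot spanning all the data, then one line per category.
plot(range(dtf$precision), range(dtf$recall), type = "n",
     xlab = "Precision", ylab = "Recall")
for (i in seq_len(nk)) {
  Ki <- subset(dtf, k == ks[i])   # index by position, compare by value
  lines(Ki$precision, Ki$recall, type = "b", lwd = 2,
        lty = i, col = colors[i], pch = 17 + i)
}
legend("topright", legend = ks, col = colors,
       lty = seq_len(nk), pch = 17 + seq_len(nk), title = "k")
```

The key change is that the loop index counts categories while the comparison uses the actual k value, so non-integer categories work.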
Related
My code plots NDVI against time. Here it is:
ggplot(data = EdinburghNDVI, aes(x = EdinburghNDVIDate, y = NDVI)) +
geom_point(color = "blue") +
labs(title = "Edinburgh NDVI",
x = "Date",
y = "NDVI")
When I try to add ylim I get an error saying that there is a discrete value supplied to continuous scale.
Representative data
> head(EdinburghNDVIDate)
[1] "2000-02-24" "2000-02-25" "2000-02-26" "2000-02-27" "2000-02-28" "2000-02-29"
> head(EdinburghNDVI$NDVI)
[1] 0.39 0.48 0.47 0.47 0.47 0.47
82 Levels: -0.07 -0.08 -0.23 -0.24 -0.35 #DIV/0! 0.00 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09 0.10 0.11 ... 0.76
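The "Levels:" line in that output indicates that NDVI is stored as a factor, presumably because the spreadsheet error string #DIV/0! prevented the column from being read as numbers. A factor is a discrete variable, which would explain the ylim error. A minimal sketch of the usual conversion, using a few values taken from the printed levels (the vector names ndvi and ndvi_num are hypothetical):

```r
# A small factor standing in for EdinburghNDVI$NDVI.
ndvi <- factor(c("0.39", "0.48", "#DIV/0!", "-0.07"))

# Convert via character; going straight to as.numeric() would return
# the integer level codes rather than the values. The "#DIV/0!" entry
# becomes NA (with a coercion warning, suppressed here).
ndvi_num <- suppressWarnings(as.numeric(as.character(ndvi)))
ndvi_num  # 0.39 0.48 NA -0.07
```

After a conversion like this the column is continuous, so a continuous y scale and ylim should apply to it.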
Working with levelplot in lattice, I have figured out how to display the corresponding value of each cell. For a matrix m:
myPanel <- function(x,y,z, ...){
panel.levelplot(x,y,z,...)
panel.text(x,y, round(m,2),col=bw[col.m])
}
levelplot(m, col.regions=col.range, colorkey=NULL, xlab=NULL, ylab=NULL,
scales = list(x = list(draw = FALSE), y = list(draw = FALSE)),
panel= myPanel)
The rounded matrix values are
round(m,2)
13 14 15 16 17 18
GDcsp -0.44 -0.34 -0.39 -0.35 -0.53 -0.60
GDsor 0.14 0.07 0.03 0.01 0.06 0.09
GDdup 0.43 0.36 0.34 0.36 0.46 0.52
GDhsw 0.22 0.05 0.11 0.00 0.20 0.26
Gdwpa 0.17 0.25 0.32 0.37 0.46 0.47
The problem is that -0.60 and 0.00 are displayed in the corresponding cell as 0.6 and 0, respectively, while I would like to have all numbers with two decimals. Any idea to solve this would be most welcome.
myPanel <- function(x,y,z, ...){
panel.levelplot(x,y,z,...)
panel.text(x,y, sprintf("%.2f", m))
}
levelplot(m, colorkey=NULL, xlab=NULL, ylab=NULL,
scales = list(x = list(draw = FALSE), y = list(draw = FALSE)),
panel= myPanel)
You can use sprintf to force the output to be 2 decimal places.
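To see the difference in isolation, here is a small sketch with three of the matrix values (the vector name vals is just for illustration). round() returns numbers, which lose trailing zeros when converted to text, whereas sprintf() returns strings with a fixed number of decimals:

```r
vals <- c(-0.60, 0.00, 0.14)

# Numbers converted to text drop trailing zeros.
as.character(round(vals, 2))   # "-0.6" "0" "0.14"

# sprintf formats each value with exactly two decimals.
sprintf("%.2f", vals)          # "-0.60" "0.00" "0.14"
```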
I have a small table of summary data with the odds ratio and the upper and lower confidence limits for four categories, with six levels within each category. I'd like to produce a chart using ggplot2 that looks similar to the usual one created when you specify an lm and its se, but I'd like R to just use the pre-specified values I have in my table. I've managed to create the line graph with error bars, but these overlap and make the plot unclear. The data look like this:
interval OR Drug lower upper
14 0.004 a 0.002 0.205
30 0.022 a 0.001 0.101
60 0.13 a 0.061 0.23
90 0.22 a 0.14 0.34
180 0.25 a 0.17 0.35
365 0.31 a 0.23 0.41
14 0.84 b 0.59 1.19
30 0.85 b 0.66 1.084
60 0.94 b 0.75 1.17
90 0.83 b 0.68 1.01
180 1.28 b 1.09 1.51
365 1.58 b 1.38 1.82
14 1.9 c 0.9 4.27
30 2.91 c 1.47 6.29
60 2.57 c 1.52 4.55
90 2.05 c 1.31 3.27
180 2.422 c 1.596 3.769
365 2.83 c 1.93 4.26
14 0.29 d 0.04 1.18
30 0.09 d 0.01 0.29
60 0.39 d 0.17 0.82
90 0.39 d 0.2 0.7
180 0.37 d 0.22 0.59
365 0.34 d 0.21 0.53
I have tried this:
limits <- aes(ymax=upper, ymin=lower)
dodge <- position_dodge(width=0.9)
ggplot(data, aes(y=OR, x=interval, colour=Drug)) +
geom_line(stat="identity") +
geom_errorbar(limits, position=dodge)
and searched for a suitable answer to create a pretty plot, but I'm flummoxed!
Any help greatly appreciated!
You need the following lines:
p <- ggplot(data = data, aes(x = interval, y = OR, colour = Drug)) +
geom_point() +
geom_line()
# refer to the columns by name inside aes() rather than via data$...
p <- p + geom_ribbon(aes(ymin = lower, ymax = upper), linetype = 2, alpha = 0.1)
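As a variation on the ribbon idea, the sketch below also maps fill to Drug so that each band takes the colour of its line, and turns off the ribbon outline. This is an assumption about what reads well rather than part of the answer above. It uses only the rows for drugs a and b from the question, and the object names dat and p are my own.

```r
library(ggplot2)

# Rows for drugs a and b, copied from the question's table.
dat <- read.table(header = TRUE, text = "interval OR Drug lower upper
14 0.004 a 0.002 0.205
30 0.022 a 0.001 0.101
60 0.13 a 0.061 0.23
90 0.22 a 0.14 0.34
180 0.25 a 0.17 0.35
365 0.31 a 0.23 0.41
14 0.84 b 0.59 1.19
30 0.85 b 0.66 1.084
60 0.94 b 0.75 1.17
90 0.83 b 0.68 1.01
180 1.28 b 1.09 1.51
365 1.58 b 1.38 1.82")

p <- ggplot(dat, aes(x = interval, y = OR, colour = Drug, fill = Drug)) +
  # band from the pre-computed limits; colour = NA removes the outline
  geom_ribbon(aes(ymin = lower, ymax = upper), alpha = 0.2, colour = NA) +
  geom_line() +
  geom_point()
print(p)
```

Drawing the ribbon first keeps the lines and points on top of the shaded bands.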
Here is a base R approach using polygon(), since @jmb requested one in the comments. Note that I have to define two sets of x-values and their associated y-values for the polygon, because it works by tracing the polygon's outer perimeter. I set type = 'n' in plot() and call points() separately so that the points are drawn on top of the polygon. My personal preference is the ggplot solutions above when possible, since polygon() is pretty clunky.
library(tidyverse)
data('mtcars') #built in dataset
mean.mpg = mtcars %>%
group_by(cyl) %>%
summarise(N = n(),
avg.mpg = mean(mpg),
SE.low = avg.mpg - (sd(mpg)/sqrt(N)),
SE.high =avg.mpg + (sd(mpg)/sqrt(N)))
plot(avg.mpg ~ cyl, data = mean.mpg, ylim = c(10,30), type = 'n')
#note I have defined c(x1, x2) and c(y1, y2)
polygon(c(mean.mpg$cyl, rev(mean.mpg$cyl)),
c(mean.mpg$SE.low,rev(mean.mpg$SE.high)), density = 200, col ='grey90')
points(avg.mpg ~ cyl, data = mean.mpg, pch = 19, col = 'firebrick')
I have been trying to produce a scatter plot with two levels of alpha applied to dots that are above or below a score threshold. To do so, I am storing the alpha value for each point in a vector, item_alpha, within the data frame and supplying this vector as the argument for alpha in my call to geom_point:
library( ggplot2 );
library( scales );
one.data <- read.table("test.data", header = TRUE)
p1 <- ggplot( data = one.data )
p1 <- p1 + geom_point( aes( plot_X, plot_Y, colour = log10_p_value, size = plot_size, alpha = item_alpha ) )
p1 <- p1 + scale_colour_gradientn( colours = c("red", "yellow", "green", "blue"), limits = c( min(one.data$log10_p_value), max(one.data$log10_p_value)));
p1 <- p1 + geom_point( aes(plot_X, plot_Y, size = plot_size), shape = 21, fill = "transparent", colour = I (alpha ("black", 0.6) ));
p1 <- p1 + scale_size( range=c(5, 30)) + theme_bw();
one.x_range = max(one.data$plot_X) - min(one.data$plot_X);
one.y_range = max(one.data$plot_Y) - min(one.data$plot_Y);
p1 <- p1 + xlim(min(one.data$plot_X)-one.x_range/10,max(one.data$plot_X)+one.x_range/10);
p1 <- p1 + ylim(min(one.data$plot_Y)-one.y_range/10,max(one.data$plot_Y)+one.y_range/10);
p1
However, it seems alpha is only being set properly for the eight points with the smaller value, while the remaining points remain opaque. I've consulted the ggplot documentation, played with the examples and tried some other variations which have mostly produced various errors and I'm really hoping someone will have some insight on this! Thanks in advance!
Contents of test.data:
"plot_X" "plot_Y" "plot_size" "log10_p_value" "item_alpha"
5.326 3.194 4.411 -27.3093 0.6
-2.148 7.469 3.434 -12.3487 0.6
-6.14 -2.796 3.062 -22.8069 0.6
3.648 6.091 3.597 -15.5032 0.6
0.356 -6.925 3.95 -10.4754 0.6
5.532 -0.135 3.246 -19.2883 0.6
3.794 -2.279 3.557 -16.4438 0.6
-3.784 1.42 2.914 -17.9687 0.6
-7.645 -1.571 3.163 -12.4498 0.6
-1.526 -4.756 3.509 -10.8972 0.6
-6.461 2.293 2.962 -13.4306 0.6
-5.806 0.983 4.38 -24.5422 0.6
-3.592 0.769 2.971 -17.8119 0.6
0.127 3.572 3.603 -11.4277 0.6
-0.566 0.706 3.77 -13.0952 0.3
2.25 -2.604 0.845 -11.7949 0.3
-7.845 -0.927 3.21 -12.6408 0.3
1.084 -6.691 3.654 -10.7319 0.3
-3.546 6.46 2.994 -11.6777 0.3
-5.478 -0.645 4.256 -17.7344 0.3
-6.251 -0.418 4.273 -19.29 0.3
-3.855 5.969 3.236 -10.9057 0.3
0.345 0.971 3.383 -11.5973 0.6
0.989 0.345 2.959 -10.8252 0.6
You're using a distinctly base plotting approach with ggplot2, which is obviously not the right way to go. Here are two options:
dat <- read.table(text = "plot_X plot_Y plot_size log10_p_value item_alpha
5.326 3.194 4.411 -27.3093 0.6
-2.148 7.469 3.434 -12.3487 0.6
-6.14 -2.796 3.062 -22.8069 0.6
3.648 6.091 3.597 -15.5032 0.6
0.356 -6.925 3.95 -10.4754 0.6
5.532 -0.135 3.246 -19.2883 0.6
3.794 -2.279 3.557 -16.4438 0.6
-3.784 1.42 2.914 -17.9687 0.6
-7.645 -1.571 3.163 -12.4498 0.6
-1.526 -4.756 3.509 -10.8972 0.6
-6.461 2.293 2.962 -13.4306 0.6
-5.806 0.983 4.38 -24.5422 0.6
-3.592 0.769 2.971 -17.8119 0.6
0.127 3.572 3.603 -11.4277 0.6
-0.566 0.706 3.77 -13.0952 0.3
2.25 -2.604 0.845 -11.7949 0.3
-7.845 -0.927 3.21 -12.6408 0.3
1.084 -6.691 3.654 -10.7319 0.3
-3.546 6.46 2.994 -11.6777 0.3
-5.478 -0.645 4.256 -17.7344 0.3
-6.251 -0.418 4.273 -19.29 0.3
-3.855 5.969 3.236 -10.9057 0.3
0.345 0.971 3.383 -11.5973 0.6
0.989 0.345 2.959 -10.8252 0.6",header = TRUE)
dat$alpha_grp <- ifelse(dat$item_alpha == 0.6,'High','Low')
#If you want a legend; although you can suppress the legend
# here if you want.
ggplot(data = dat,aes(x = plot_X,y = plot_Y)) +
geom_point(aes(alpha = alpha_grp)) +
scale_alpha_manual(values = c(High = 0.6, Low = 0.3)) # named, so High gets 0.6
#If you don't care about a legend
ggplot() +
geom_point(data = dat[dat$alpha_grp == 'High',],
aes(x = plot_X,y = plot_Y),alpha = 0.6) +
geom_point(data = dat[dat$alpha_grp == 'Low',],
aes(x = plot_X,y = plot_Y),alpha = 0.3)
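A third possibility, offered as a sketch rather than as part of the answer above: as far as I understand it, a numeric column mapped to alpha inside aes() goes through a continuous scale that rescales it to the scale's own output range, so the stored 0.3 and 0.6 are not used literally. scale_alpha_identity() asks ggplot2 to use the stored values as they are. The example uses four rows from test.data, and the names dat, p and used_alpha are hypothetical.

```r
library(ggplot2)

# Four rows from test.data: two at alpha 0.6 and two at 0.3.
dat <- data.frame(
  plot_X     = c(5.326, -2.148, -0.566, 2.25),
  plot_Y     = c(3.194, 7.469, 0.706, -2.604),
  item_alpha = c(0.6, 0.6, 0.3, 0.3)
)

p <- ggplot(dat, aes(x = plot_X, y = plot_Y)) +
  geom_point(aes(alpha = item_alpha), size = 5) +
  scale_alpha_identity()   # use item_alpha values unchanged

# layer_data() shows the alpha values the layer will actually draw with.
used_alpha <- layer_data(p)$alpha
print(p)
```

Identity scales draw no legend by default, so this fits the case where a legend is not needed.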
I have estimates of odds ratios with corresponding 95% CIs for six pollutants over 4 lag periods. How can I create a vertical plot similar to the attached figure in R? The figure below was created in SPSS.
Sample data that produced the figure is the following:
lag pollut or lcl ucl
0 CO 0.97 0.90 1.06
0 PM10 1.00 0.91 1.09
0 NO 0.97 0.92 1.02
0 NO2 1.01 0.89 1.15
0 SO2 0.97 0.85 1.11
0 Ozone 1.00 0.87 1.15
1 CO 1.03 0.95 1.10
1 PM10 0.93 0.86 1.01
1 NO 1.01 0.97 1.06
1 NO2 1.08 0.97 1.20
1 SO2 0.94 0.84 1.04
1 Ozone 0.94 0.84 1.04
2 CO 1.09 1.02 1.16
2 PM10 1.04 0.96 1.13
2 NO 1.04 1.00 1.08
2 NO2 1.07 0.96 1.18
2 SO2 1.05 0.95 1.17
2 Ozone 0.93 0.84 1.03
3 CO 0.98 0.91 1.06
3 PM10 1.14 1.05 1.24
3 NO 0.99 0.95 1.04
3 NO2 1.01 0.91 1.12
3 SO2 1.11 1.00 1.23
3 Ozone 1.00 0.90 1.11
You can also do this with ggplot2. The code is somewhat shorter:
dat <- read.table("clipboard", header = TRUE)
dat$lag <- paste0("L", dat$lag)
library(ggplot2)
ggplot(dat, aes(x = pollut, y = or, ymin = lcl, ymax = ucl)) + geom_pointrange(aes(col = factor(lag)), position=position_dodge(width=0.30)) +
ylab("Odds ratio & 95% CI") + geom_hline(aes(yintercept = 1)) + scale_color_discrete(name = "Lag") + xlab("")
EDIT: Here is a version that is closer to the SPSS figure:
ggplot(dat, aes(x = pollut, y = or, ymin = lcl, ymax = ucl)) + geom_linerange(aes(col = factor(lag)), position=position_dodge(width=0.30)) +
geom_point(aes(shape = factor(lag)), position=position_dodge(width=0.30)) + ylab("Odds ratio & 95% CI") + geom_hline(aes(yintercept = 1)) + xlab("")
Assuming your data are in datf...
I'd first sort it into the order you want it drawn in.
datf <- datf[order(datf$pollut, datf$lag), ]
You want a space before and after each group of lags (one group per pollutant), so I'd add some extra rows that are NA. That makes it easier, because then you'll automatically get blanks in your plot calls.
datfPlusNA <- lapply(split(datf, datf$pollut), function(x) rbind(NA, x, NA))
datf <- do.call(rbind, datfPlusNA)
Now that you have your data.frame sorted and with the extra NAs the plotting is easy.
nr <- nrow(datf) # find out how many rows all together
with(datf, {# this allows entering your commands more succinctly
# first you could set up the plot so you can select the order of drawing
plot(1:nr, or, ylim = c(0.8, 1.3), type = 'n', xaxt = 'n', xlab = '', ylab = 'Odds Ratio and 95% CI', frame.plot = TRUE, panel.first = grid(nx = NA, ny = NULL))
# arrows(1:nr, lcl, 1:nr, ucl, length = 0.02, angle = 90, code = 3, col = factor(lag))
# you could use arrows above but you don't want ends so segments is easier
segments(1:nr, lcl, 1:nr, ucl, col = factor(lag))
# add your points
points(1:nr, or, pch = 19, cex = 0.6)
xLabels <- na.omit(unique(pollut))
axis(1, seq(4, 34, by = 6) - 0.5, xLabels)
})
abline(h = 1.0)
There are packages that make this kind of thing easier but if you can do it like this you can start doing any graphs that you can imagine.
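One detail in the code above is the hard-coded tick positions, seq(4, 34, by = 6) - 0.5, which assume six pollutants with four lags each plus two NA rows per group. The sketch below shows the padding step on a four-row subset of the data and one way the label positions could be computed instead. The names padded, keep and centers are hypothetical, and this is an assumption about how to generalise the approach, not the original author's code.

```r
# Two pollutants and two lags, copied from the sample data.
datf <- read.table(header = TRUE, text = "lag pollut or lcl ucl
0 CO 0.97 0.90 1.06
1 CO 1.03 0.95 1.10
0 NO 0.97 0.92 1.02
1 NO 1.01 0.97 1.06")

# Same padding as above: one NA row before and after each pollutant.
padded <- do.call(rbind, lapply(split(datf, datf$pollut),
                                function(x) rbind(NA, x, NA)))

# Mean row position of the non-NA rows in each pollutant group,
# which is the centre under which the group label belongs.
keep <- !is.na(padded$pollut)
centers <- tapply(which(keep), padded$pollut[keep], mean)
centers  # CO at 2.5, NO at 6.5
```

The result could then be passed to axis(1, centers, names(centers)) in place of the fixed sequence.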