R: interpolate a value from a dataframe based on two inputs

I have a data frame that looks like this:
Teff logg M_div_H U B V R I J H K L Lprime M
1 2000 4.0 -0.1 -13.443 -11.390 -7.895 -4.464 -1.831 1.666 3.511 2.701 4.345 4.765 5.680
2 2000 4.5 -0.1 -13.402 -11.416 -7.896 -4.454 -1.794 1.664 3.503 2.728 4.352 4.772 5.687
3 2000 5.0 -0.1 -13.358 -11.428 -7.888 -4.431 -1.738 1.664 3.488 2.753 4.361 4.779 5.685
4 2000 5.5 -0.1 -13.220 -11.079 -7.377 -4.136 -1.483 1.656 3.418 2.759 4.355 4.753 5.638
5 2200 3.5 -0.1 -11.866 -9.557 -6.378 -3.612 -1.185 1.892 3.294 2.608 3.929 4.289 4.842
6 2200 4.5 -0.1 -11.845 -9.643 -6.348 -3.589 -1.132 1.874 3.310 2.648 3.947 4.305 4.939
...
Let's say I have two values:
input_Teff = 4.8529282904170595E+003
input_log_g = 1.9241934741026787E+000
Notice how every V value has a unique Teff, logg combination. From the input values, I would like to interpolate a value for V. Is there a way to do this in R?
Edit 1: Here is the link to the full data frame: https://www.dropbox.com/s/prbceabxmd25etx/lcb98cor.dat?dl=0

Building on Ian Campbell's observation that you can treat your data as points on a two-dimensional plane, you can use spatial interpolation methods. The simplest approach is inverse-distance weighting, which you can implement like this:
library(data.table)
d <- fread("https://www.dropbox.com/s/prbceabxmd25etx/lcb98cor.dat?dl=1")
setnames(d,"#Teff","Teff")
First rescale the data as appropriate (not shown here, see Ian's answer)
library(gstat)
# fit model
idw <- gstat(id = "V", formula = V ~ 1, locations = ~ Teff + logg,
             data = d, nmax = 7, set = list(idp = .5))
# new "points" to predict to
newd <- data.frame(Teff=c(4100, 4852.928), logg=c(1.5, 1.9241934741026787))
p <- predict(idw, newd)
#[inverse distance weighted interpolation]
p$V.pred
#[1] -0.9818571 -0.3602857
For higher dimensions you could use fields::Tps (I think you can force it to be an exact method, that is, to exactly honor the observations, by making each observation a node).
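For reference, a minimal sketch of that thin-plate-spline route, assuming the same d and newd created above (lambda = 0 asks Tps for an exact interpolator rather than a smoother):
library(fields)
# fit a thin plate spline to V over the (Teff, logg) plane
tps <- Tps(cbind(d$Teff, d$logg), d$V, lambda = 0)
# predict at the new locations
predict(tps, cbind(newd$Teff, newd$logg))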

We can imagine that Teff and logg exist in a 2-dimensional plane. We can see that your input point exists in that same space:
library(tidyverse)
ggplot(data, aes(x = Teff, y = logg)) +
  geom_point() +
  geom_point(data = data.frame(Teff = 4.8529282904170595e3, logg = 1.9241934741026787),
             color = "orange")
However, we can see the scale of Teff and logg are not the same. Simply taking log(Teff) gets us pretty close, but not quite. So we can rescale between 0 and 1 instead. We can create a custom rescale function. It will become clear why we can't use scales::rescale in a moment.
rescale = function(x,y){(x - min(y))/(max(y)-min(y))}
We can now rescale the data:
data %>%
  mutate(Teff.scale = rescale(Teff, Teff),
         logg.scale = rescale(logg, logg)) -> data
From here, we might use raster::pointDistance to calculate the distance from the input point to all of the scaled values:
raster::pointDistance(cbind(rescale(input_Teff, data$Teff), rescale(input_log_g, data$logg)),
                      data[, c("Teff.scale", "logg.scale")],
                      lonlat = FALSE)
We can use which.min to find the row with the minimum distance:
data[which.min(raster::pointDistance(cbind(rescale(input_Teff, data$Teff), rescale(input_log_g, data$logg)),
                                     data[, c("Teff.scale", "logg.scale")],
                                     lonlat = FALSE)), ]
Teff logg M_div_H U B V R I J H K L Lprime M Teff.scale logg.scale
1: 4750 2 -0.1 -2.447 -1.438 -0.355 0.159 0.589 1.384 1.976 1.881 2.079 2.083 2.489 0.05729167 0.4631902
Here we can visualize the result:
ggplot(data, aes(x = Teff.scale, y = logg.scale)) +
  geom_point() +
  geom_point(data = data[which.min(raster::pointDistance(cbind(rescale(input_Teff, data$Teff), rescale(input_log_g, data$logg)),
                                                          data[, c("Teff.scale", "logg.scale")], lonlat = FALSE)), ],
             color = "blue") +
  geom_point(data = data.frame(Teff.scale = rescale(input_Teff, data$Teff), logg.scale = rescale(input_log_g, data$logg)),
             color = "orange")
And access the appropriate value for V:
data[which.min(raster::pointDistance(cbind(rescale(input_Teff, data$Teff), rescale(input_log_g, data$logg)),
                                     data[, c("Teff.scale", "logg.scale")], lonlat = FALSE)), "V"]
V
1: -0.355
Data:
library(data.table)
data <- fread("https://www.dropbox.com/s/prbceabxmd25etx/lcb98cor.dat?dl=1")
setnames(data,"#Teff","Teff")
input_Teff = 4.8529282904170595E+003
input_log_g = 1.9241934741026787E+000

Related

stat_density2d - What does the legend mean?

I have a map done in R with stat_density2d. This is the code:
ggplot(data, aes(x=Lon, y=Lat)) +
stat_density2d(aes(fill = ..level..), alpha=0.5, geom="polygon",show.legend=FALSE)+
geom_point(colour="red")+
geom_path(data=map.df,aes(x=long, y=lat, group=group), colour="grey50")+
scale_fill_gradientn(colours=rev(brewer.pal(7,"Spectral")))+
xlim(-10,+2.5) +
ylim(+47,+60) +
coord_fixed(1.7) +
theme_void()
And it produces the map I want. Great, it works. However, I do not know what the legend means. I did find this Wikipedia page:
https://en.wikipedia.org/wiki/Multivariate_kernel_density_estimation
And the example they used (which contains red, orange and yellow) stated:
The coloured contours correspond to the smallest region which contains
the respective probability mass: red = 25%, orange + red = 50%, yellow
+ orange + red = 75%
However, using stat_density2d, I have 11 contours in my map. Does anyone know how stat_density2d works and what the legend means? Ideally I wanted to be able to state something like "the red contour contains 25% of the points", etc.
I have read this: https://ggplot2.tidyverse.org/reference/geom_density_2d.html and I am still none the wiser.
Let's take the faithful example from ggplot2:
ggplot(faithful, aes(x = eruptions, y = waiting)) +
stat_density_2d(aes(fill = factor(stat(level))), geom = "polygon") +
geom_point() +
xlim(0.5, 6) +
ylim(40, 110)
(apologies in advance for not making this prettier)
The level is the height at which the 3D "mountains" were sliced. I don't know of a way (others might) to translate that directly to a percentage, but I do know how to get you said percentages.
If we look at that chart, level 0.002 contains the vast majority of the points (all but 2). Level 0.004 is actually 2 polygons, and they contain all but about a dozen of the points. If I'm getting the gist of what you're asking, that's what you want to know, except not the count but the percentage of points encompassed by the polygons at a given level. That's straightforward to compute using the methodology from the various ggplot2 "stats" involved.
Note that while we're importing the tidyverse and sp packages, we'll use some other functions fully-qualified. Now, let's reshape the faithful data a bit:
library(tidyverse)
library(sp)
xdf <- select(faithful, x = eruptions, y = waiting)
(easier to type x and y)
Now, we'll compute the two-dimensional kernel density estimation the way ggplot2 does:
h <- c(MASS::bandwidth.nrd(xdf$x), MASS::bandwidth.nrd(xdf$y))
dens <- MASS::kde2d(
  xdf$x, xdf$y, h = h, n = 100,
  lims = c(0.5, 6, 40, 110)
)
zdf <- data.frame(expand.grid(x = dens$x, y = dens$y), z = as.vector(dens$z))
breaks <- pretty(range(zdf$z), 10)
z <- tapply(zdf$z, zdf[c("x", "y")], identity)
cl <- grDevices::contourLines(
  x = sort(unique(dens$x)), y = sort(unique(dens$y)), z = dens$z,
  levels = breaks
)
I won't clutter the answer with str() output but it's kinda fun looking at what happens there.
We can use spatial ops to figure out how many points fall within given polygons, then we can group the polygons at the same level to provide counts and percentages per-level:
SpatialPolygons(
  lapply(1:length(cl), function(idx) {
    Polygons(
      srl = list(Polygon(
        matrix(c(cl[[idx]]$x, cl[[idx]]$y), nrow = length(cl[[idx]]$x), byrow = FALSE)
      )),
      ID = idx
    )
  })
) -> cont
coordinates(xdf) <- ~x+y
data_frame(
  ct = sapply(over(cont, geometry(xdf), returnList = TRUE), length),
  id = 1:length(ct),
  lvl = sapply(cl, function(x) x$level)
) %>%
  count(lvl, wt = ct) %>%
  mutate(
    pct = n / length(xdf),
    pct_lab = sprintf("%s of the points fall within this level", scales::percent(pct))
  )
## # A tibble: 12 x 4
## lvl n pct pct_lab
## <dbl> <int> <dbl> <chr>
## 1 0.002 270 0.993 99.3% of the points fall within this level
## 2 0.004 259 0.952 95.2% of the points fall within this level
## 3 0.006 249 0.915 91.5% of the points fall within this level
## 4 0.008 232 0.853 85.3% of the points fall within this level
## 5 0.01 206 0.757 75.7% of the points fall within this level
## 6 0.012 175 0.643 64.3% of the points fall within this level
## 7 0.014 145 0.533 53.3% of the points fall within this level
## 8 0.016 94 0.346 34.6% of the points fall within this level
## 9 0.018 81 0.298 29.8% of the points fall within this level
## 10 0.02 60 0.221 22.1% of the points fall within this level
## 11 0.022 43 0.158 15.8% of the points fall within this level
## 12 0.024 13 0.0478 4.8% of the points fall within this level
I only spelled it out to avoid blathering more but the percentages will change depending on how you modify the various parameters to the density computation (same holds true for my ggalt::geom_bkde2d() which uses a different estimator).
If there is a way to tease out the percentages without re-performing the calculations there's no better way to have that pointed out than by letting other SO R folks show how much more clever they are than the person writing this answer (hopefully in more diplomatic ways than seem to be the mode of late).

group and average a large numeric vector to plot

I have an R matrix which is very data dense. It has 500,000 rows. If I plot 1:500000 (the x axis) against the third column of the matrix, mat[, 3], it takes too long to plot and sometimes even crashes. I've tried plot, matplot, and ggplot, and all of them take very long.
I am looking to group the data by 10 or 20: i.e., take the first 10 elements from the vector, average them, and use that as a data point.
Is there a fast and efficient way to do this?
We can use cut and aggregate to reduce the number of points plotted:
generate some data
set.seed(123)
xmat <- data.frame(x = 1:5e5, y = runif(5e5))
use cut and aggregate
xmat$cutx <- as.numeric(cut(xmat$x, breaks = 5e5/10))
xmat.agg <- aggregate(y ~ cutx, data = xmat, mean)
make plot
plot(xmat.agg, pch = ".")
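As an aside, when you only need a single column and its length is an exact multiple of the block size (as it is here: 5e5 points in blocks of 10), the block means can also be computed directly with a matrix trick; a minimal sketch using the xmat created above:
# fold the vector into a 10-row matrix and take the mean of each column (block)
y.agg <- colMeans(matrix(xmat$y, nrow = 10))
plot(y.agg, pch = ".")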
more than 1 column solution:
Here, we use the data.table package to group and summarize:
generate some more data
set.seed(123)
xmat <- data.frame(x = 1:5e5,
                   u = runif(5e5),
                   z = rnorm(5e5),
                   p = rpois(5e5, lambda = 5),
                   g = rbinom(n = 5e5, size = 1, prob = 0.5))
use data.table
library(data.table)
xmat$cutx <- as.numeric(cut(xmat$x, breaks = 5e5/10))
setDT(xmat) #convert to data.table
#for each level of cutx, take the mean of each column
xmat[,lapply(.SD, mean), by = cutx] -> xmat.agg
# xmat.agg
# cutx x u z p g
# 1: 1 5.5 0.5782475 0.372984058 4.5 0.6
# 2: 2 15.5 0.5233693 0.032501186 4.6 0.8
# 3: 3 25.5 0.6155837 -0.258803746 4.6 0.4
# 4: 4 35.5 0.5378580 0.269690334 4.4 0.8
# 5: 5 45.5 0.3453964 0.312308395 4.8 0.4
# ---
# 49996: 49996 499955.5 0.4872596 0.006631221 5.6 0.4
# 49997: 49997 499965.5 0.5974486 0.022103345 4.6 0.6
# 49998: 49998 499975.5 0.5056578 -0.104263093 4.7 0.6
# 49999: 49999 499985.5 0.3083803 0.386846148 6.8 0.6
# 50000: 50000 499995.5 0.4377497 0.109197095 5.7 0.6
plot it all
par(mfrow = c(2,2))
for(i in 3:6) plot(xmat.agg[,c(1,i), with = F], pch = ".")

How to simplify the code using Apply Function

I have this script:
library(plyr)
library(gstat)
library(sp)
library(dplyr)
library(ggplot2)
library(scales)
a<-c(10,20,30,40,50,60,70,80,90,100)
b<-c(15,25,35,45,55,65,75,85,95,105)
x<-rep(a,3)
y<-rep(b,3)
E<-sample(30)
freq<-rep(c(100,200,300),10)
data<-data.frame(x,y,freq,E)
data<-arrange(data,x,y,freq)
df <- ddply(data, "freq", function(h) {
  dim_h <- length(h$x)
  perc_max <- 0.9
  perc_min <- 0.8
  u <- round(seq(perc_max, perc_min, by = -0.1) * dim_h)
  dim_u <- length(u)
  perc_punti <- percent(seq(perc_max, perc_min, by = -0.1))
  # preallocate the result vectors filled in below
  time <- sqmm <- numeric(dim_u)
  sqm <- numeric(2)
  for (i in 1:dim_u) {
    t <- u[i]
    time[i] <- system.time(
      for (j in 1:2) {
        df_tass <- sample_n(h, t)
        df_residuo <- slice(h, -as.numeric(rownames(df_tass)))
        coordinates(df_tass) <- ~x + y
        x.range <- range(h$x)
        y.range <- range(h$y)
        grid <- expand.grid(x = seq(from = x.range[1], to = x.range[2], by = 1),
                            y = seq(from = y.range[1], to = y.range[2], by = 1))
        coordinates(grid) <- ~x + y
        gridded(grid) <- TRUE
        nearest <- krige(E ~ 1, df_tass, grid, nmax = 1)
        nearest_df <- as.data.frame(nearest)
        names(nearest_df) <- c("x", "y", "E")
        # error of prediction on the held-out points
        df_pred <- inner_join(nearest_df[1:3], select(df_residuo, x, y, E), by = c("x", "y"))
        names(df_pred) <- c("x", "y", "E_pred", "E")
        sqm[j] <- mean((df_pred[, 4] - df_pred[, 3])^2)
      }
    )[3]
    sqmm[i] <- mean(sqm)
  }
  df_finale <- data.frame(sqmm, time, perc_punti)
})
df
I measured, at several points with coordinates (x, y), the value of the electromagnetic field (the E value) at different frequencies (the freq value). For each frequency value, I use once 90% of the points and once 80% (the for loop over i) to interpolate the value of the electromagnetic field (E) inside grid with nearest-neighbour interpolation (the krige function), and I repeat this 2 times (the for loop over j). The remaining points are then used to calculate the prediction error. I hope that's clear.
The script above is a simplified case. Unfortunately, in my real case the script takes too long because of the two nested for loops.
I want to ask if it's possible to simplify the code in some way, for instance by using the apply function family. Thanks.
Reply to @clemlaflemme: ok, it works! Thanks... Now I have a little problem with the final dataframe, it looks like this:
freq 1 2
1 100 121.00 338.00
2 100 0.47 0.85
3 200 81.00 462.50
4 200 0.74 0.73
5 300 36.00 234.00
6 300 0.82 0.76
but i want something like this:
freq sqmm time
1 100 121.0 0.47
2 100 338.0 0.85
3 200 81.0 0.74
4 200 462.5 0.73
5 300 36.0 0.82
6 300 234.0 0.76
How can I do that?
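For reference, a minimal sketch of that last reshaping step, assuming the wide result printed above is stored in a data frame called df_wide (a hypothetical name), with the odd rows holding sqmm and the even rows holding time:
# split the alternating rows, then interleave each pair of value columns row-wise
sqmm_rows <- df_wide[seq(1, nrow(df_wide), by = 2), ]
time_rows <- df_wide[seq(2, nrow(df_wide), by = 2), ]
data.frame(freq = rep(sqmm_rows$freq, each = 2),
           sqmm = as.vector(t(as.matrix(sqmm_rows[, -1]))),
           time = as.vector(t(as.matrix(time_rows[, -1]))))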

Generate multiple serial graphs/scatterplots from data in two dataframes

I have 2 dataframes, Tg and Pf, each with 127 columns. All columns have at least one row and can have up to thousands of them. All the values are between 0 and 1, and there are some missing values (empty cells). Here is a little subset:
Tg
Tg1 Tg2 Tg3 ... Tg127
0.9 0.5 0.4 0
0.9 0.3 0.6 0
0.4 0.6 0.6 0.3
0.1 0.7 0.6 0.4
0.1 0.8
0.3 0.9
0.9
0.6
0.1
Pf
Pf1 Pf2 Pf3 ...Pf127
0.9 0.5 0.4 1
0.9 0.3 0.6 0.8
0.6 0.6 0.6 0.7
0.4 0.7 0.6 0.5
0.1 0.6 0.5
0.3
0.3
0.3
Note that some cells are empty, and the vectors for the same subset (i.e. 1 to 127) can be of very different lengths and are rarely exactly the same length.
I want to generate 127 graphs, one for each of the 127 vectors (i.e. graph 1 is for column 1 from each dataframe, graph 2 is for column 2 from each dataframe, etc.).
Hope that makes sense. I'm looking forward to your assistance, as I don't want to make those graphs one by one...
Thanks!
Here is an example to get you started (data at https://gist.github.com/1349300). For further tweaking, check out the excellent ggplot2 documentation that is all over the web.
library(ggplot2)
library(reshape2)  # melt() used below
library(plyr)      # ddply() and join() used further down
# Load data
Tg = read.table('Tg.txt', header=T, fill=T, sep=' ')
Pf = read.table('Pf.txt', header=T, fill=T, sep=' ')
# Format data
Tg$x = as.numeric(rownames(Tg))
Tg = melt(Tg, id.vars='x')
Tg$source = 'Tg'
Tg$variable = factor(as.numeric(gsub('Tg(.+)', '\\1', Tg$variable)))
Pf$x = as.numeric(rownames(Pf))
Pf = melt(Pf, id.vars='x')
Pf$source = 'Pf'
Pf$variable = factor(as.numeric(gsub('Pf(.+)', '\\1', Pf$variable)))
# Stack data
data = rbind(Tg, Pf)
# Plot
dev.new(width=5, height=4)
p = ggplot(data=data, aes(x=x)) + geom_line(aes(y=value, group=source, color=source)) + facet_wrap(~variable)
p
Highlighting the area between the lines
First, interpolate the data onto a finer grid. This way the ribbon will follow the actual envelope of the lines, rather than just where the original data points were located.
data = ddply(data, c('variable', 'source'), function(x) data.frame(approx(x$x, x$value, xout=seq(min(x$x), max(x$x), length.out=100))))
names(data)[4] = 'value'
Next, calculate the data needed for geom_ribbon - namely ymax and ymin.
ribbon.data = ddply(data, c('variable', 'x'), summarize, ymin=min(value), ymax=max(value))
Now it is time to plot. Notice how we've added a new ribbon layer, for which we've substituted our new ribbon.data frame.
dev.new(width=5, height=4)
p + geom_ribbon(aes(ymin=ymin, ymax=ymax), alpha=0.3, data=ribbon.data)
Dynamic coloring between the lines
The trickiest variation is if you want the coloring to vary based on the data. For that, you currently must create a new grouping variable to identify the different segments. Here, for example, we might use a function that indicates when the "Tg" group is on top:
GetSegs <- function(x) {
  segs = x[x$source == 'Tg', ]$value > x[x$source == 'Pf', ]$value
  segs.rle = rle(segs)
  on.top = ifelse(segs, 'Tg', 'Pf')
  on.top[is.na(on.top)] = 'Tg'
  group = rep.int(1:length(segs.rle$lengths), times = segs.rle$lengths)
  group[is.na(segs)] = NA
  data.frame(x = unique(x$x), group, on.top)
}
Now we apply it and merge the results back with our original ribbon data.
groups = ddply(data, 'variable', GetSegs)
ribbon.data = join(ribbon.data, groups)
For the plot, the key is that we now specify a grouping aesthetic to the ribbon geom.
dev.new(width=5, height=4)
p + geom_ribbon(aes(ymin=ymin, ymax=ymax, group=group, fill=on.top), alpha=0.3, data=ribbon.data)
Code is available together at: https://gist.github.com/1349300
Here is a three-liner to do the same :-). We first use reshape from base R to convert the data into long form. Then, it is melted to suit ggplot2. Finally, we generate the plot!
mydf <- reshape(cbind(Tg, Pf), varying = 1:8, direction = 'long', sep = "")
mydf_m <- melt(mydf, id.var = c(1, 4), variable = 'source')
qplot(id, value, colour = source, data = mydf_m, geom = 'line') +
facet_wrap(~ time, ncol = 2)
NOTE. The reshape function in base R is extremely powerful, albeit very confusing to use. It is used to transform data between long and wide formats.
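For instance, a toy wide-to-long example (hypothetical data, unrelated to the Tg/Pf data above):
wide <- data.frame(id = 1:2, Tg1 = c(0.9, 0.4), Tg2 = c(0.5, 0.6))
# sep = "" lets reshape split "Tg1"/"Tg2" into the value name "Tg" and the times 1 and 2
reshape(wide, varying = c("Tg1", "Tg2"), direction = "long", sep = "")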
Kudos for automating something you used to do in Excel using R! That's exactly how I got started with R and a common path to R enlightenment :)
All you really need is a little looping. Here's an example, most of which is creating example data that represents your data structure:
## create some example data
Tg <- data.frame(Tg1 = rnorm(10))
for (i in 2:10) {
  vec <- rep(NA, 8)
  vec <- c(rnorm(sample(5:10, 1)), vec)
  Tg[paste("Tg", i, sep = "")] <- vec[1:10]
}
Pf <- data.frame(Pf1 = rnorm(10))
for (i in 2:10) {
  vec <- rep(NA, 8)
  vec <- c(rnorm(sample(5:10, 1)), vec)
  Pf[paste("Pf", i, sep = "")] <- vec[1:10]
}
## ok, sample data created
## now lets loop through all the columns
## if you didn't know how many columns there are you could
## use ncol(Tg) to figure out
for (i in 1:10) {
  plot(1:10, Tg[, i], type = "l", col = "blue", lwd = 5, ylim = c(-3, 3),
       xlim = c(1, max(length(na.omit(Tg[, i])), length(na.omit(Pf[, i])))))
  lines(1:10, Pf[, i], type = "l", col = "red", lwd = 5, ylim = c(-3, 3))
  dev.copy(png, paste('rplot', i, '.png', sep = ""))
  dev.off()
}
This will result in 10 graphs in your working directory, each overlaying the corresponding Tg (blue) and Pf (red) columns.

using R.zoo to plot multiple series with error bars

I have data that looks like this:
> head(data)
groupname ob_time dist.mean dist.sd dur.mean dur.sd ct.mean ct.sd
1 rowA 0.3 61.67500 39.76515 43.67500 26.35027 8.666667 11.29226
2 rowA 60.0 45.49167 38.30301 37.58333 27.98207 8.750000 12.46176
3 rowA 120.0 50.22500 35.89708 40.40000 24.93399 8.000000 10.23363
4 rowA 180.0 54.05000 41.43919 37.98333 28.03562 8.750000 11.97061
5 rowA 240.0 51.97500 41.75498 35.60000 25.68243 28.583333 46.14692
6 rowA 300.0 45.50833 43.10160 32.20833 27.37990 12.833333 14.21800
Each groupname is a data series. Since I want to plot each series separately, I've separated them like this:
> A <- zoo(data[which(groupname=='rowA'),3:8],data[which(groupname=='rowA'),2])
> B <- zoo(data[which(groupname=='rowB'),3:8],data[which(groupname=='rowB'),2])
> C <- zoo(data[which(groupname=='rowC'),3:8],data[which(groupname=='rowC'),2])
ETA:
Thanks to gd047: Now I'm using this:
z <- dlply(data,.(groupname),function(x) zoo(x[,3:8],x[,2]))
The resulting zoo objects look like this:
> head(z$rowA)
dist.mean dist.sd dur.mean dur.sd ct.mean ct.sd
0.3 61.67500 39.76515 43.67500 26.35027 8.666667 11.29226
60 45.49167 38.30301 37.58333 27.98207 8.750000 12.46176
120 50.22500 35.89708 40.40000 24.93399 8.000000 10.23363
180 54.05000 41.43919 37.98333 28.03562 8.750000 11.97061
240 51.97500 41.75498 35.60000 25.68243 28.583333 46.14692
300 45.50833 43.10160 32.20833 27.37990 12.833333 14.21800
So if I want to plot dist.mean against time and include error bars equal to +/- dist.sd for each series:
how do I combine A,B,C dist.mean and dist.sd?
how do I make a bar plot, or perhaps better, a line graph of the resulting object?
I don't see the point of breaking up the data into three pieces only to have to combine it together for a plot. Here is a plot using the ggplot2 library:
library(ggplot2)
qplot(ob_time, dist.mean, data=data, colour=groupname, geom=c("line","point")) +
geom_errorbar(aes(ymin=dist.mean-dist.sd, ymax=dist.mean+dist.sd))
This spaces the time values along the natural scale; you can use scale_x_continuous to define the tick marks at the actual time values. Having them equally spaced is trickier: you can convert ob_time to a factor, but then qplot refuses to connect the points with a line.
Solution 1 - bar graph:
qplot(factor(ob_time), dist.mean, data=data, geom=c("bar"), fill=groupname,
colour=groupname, position="dodge") +
geom_errorbar(aes(ymin=dist.mean-dist.sd, ymax=dist.mean+dist.sd), position="dodge")
Solution 2 - add lines manually using the 1,2,... recoding of the factor:
qplot(factor(ob_time), dist.mean, data=data, geom=c("line","point"), colour=groupname) +
geom_errorbar(aes(ymin=dist.mean-dist.sd, ymax=dist.mean+dist.sd)) +
geom_line(aes(x=as.numeric(factor(ob_time))))
This is a hint of the way I would try to do it. I have ignored grouping, so you'll have to modify it to include more than one series. Also, I haven't used zoo because I don't know it well.
g <- (nrow(data)-1)/(3*nrow(data))
plot(data[,"dist.mean"],col=2, type='o',lwd=2,cex=1.5, main="This is the title of the graph",
xlab="x-Label", ylab="y-Label", xaxt="n",
ylim=c(0,max(data[,"dist.mean"])+max(data[,"dist.sd"])),
xlim=c(1-g,nrow(data)+g))
axis(side=1,at=c(1:nrow(data)),labels=data[,"ob_time"])
for (i in 1:nrow(data)) {
lines(c(i,i),c(data[i,"dist.mean"]+data[i,"dist.sd"],data[i,"dist.mean"]-data[i,"dist.sd"]))
lines(c(i-g,i+g),c(data[i,"dist.mean"]+data[i,"dist.sd"], data[i,"dist.mean"]+data[i,"dist.sd"]))
lines(c(i-g,i+g),c(data[i,"dist.mean"]-data[i,"dist.sd"], data[i,"dist.mean"]-data[i,"dist.sd"]))
}
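As an aside, the three lines() calls per point can be collapsed into a single arrows() call with flat heads at both ends; a sketch using the same data:
for (i in 1:nrow(data)) {
  # angle = 90 turns the arrowheads into flat caps; code = 3 draws one at each end
  arrows(i, data[i, "dist.mean"] - data[i, "dist.sd"],
         i, data[i, "dist.mean"] + data[i, "dist.sd"],
         angle = 90, code = 3, length = 0.05)
}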
Read the data in using read.zoo with the split= argument to split it by groupname. Then bind together the dist, lower and upper lines. Finally plot them.
Lines <- "groupname ob_time dist.mean dist.sd dur.mean dur.sd ct.mean ct.sd
rowA 0.3 61.67500 39.76515 43.67500 26.35027 8.666667 11.29226
rowA 60.0 45.49167 38.30301 37.58333 27.98207 8.750000 12.46176
rowA 120.0 50.22500 35.89708 40.40000 24.93399 8.000000 10.23363
rowA 180.0 54.05000 41.43919 37.98333 28.03562 8.750000 11.97061
rowB 240.0 51.97500 41.75498 35.60000 25.68243 28.583333 46.14692
rowB 300.0 45.50833 43.10160 32.20833 27.37990 12.833333 14.21800"
library(zoo)
# next line is only needed until next version of zoo is released
source("http://r-forge.r-project.org/scm/viewvc.php/*checkout*/pkg/zoo/R/read.zoo.R?revision=719&root=zoo")
z <- read.zoo(textConnection(Lines), header = TRUE, split = 1, index = 2)
# pick out the dist and sd columns binding dist with lower & upper
z.dist <- z[, grep("dist.mean", colnames(z))]
z.sd <- z[, grep("dist.sd", colnames(z))]
zz <- cbind(z = z.dist, lower = z.dist - z.sd, upper = z.dist + z.sd)
# plot using N panels
N <- ncol(z.dist)
ylab <- sub("dist.mean.", "", colnames(z.dist))
plot(zz, screen = 1:N, type = "l", lty = rep(1:2, N*1:2), ylab = ylab)
I don't think you need to create zoo objects for this type of plot, I would do it directly from the data frame. Of course, there may be other reasons to use zoo objects, such a smart merging, aggregation, etc.
One option is the segplot function from latticeExtra
library(latticeExtra)
segplot(ob_time ~ (dist.mean + dist.sd) + (dist.mean - dist.sd) | groupname,
        data = data, centers = dist.mean, horizontal = FALSE)
## and with the latest version of latticeExtra (from R-forge):
trellis.last.object(segments.fun = panel.arrows, ends = "both", angle = 90, length = .1) +
  xyplot(dist.mean ~ ob_time | groupname, data, col = "black", type = "l")
Using Gabor's nicely reproducible dataset, this produces a segment plot of dist.mean with +/- dist.sd error bars, one panel per groupname.
