Making a 3D surface from time series data in R - r

I have a large data set which I would like to make a 3D surface from. I would like the x-axis to be the date, the y-axis to be the time (24h) and the z-axis (height) to be a value I have ($). I am a beginner with R, so the simpler the better!
http://www.quantmod.com/examples/chartSeries3d/ has a nice example, but the code is way to complicated for my skill level!
Any help would be much appreciated - anything I have researched so far needs to have the data sorted, which is not suitable I think.

Several options present themselves, persp() and wireframe(), the latter in package lattice.
First some dummy data:
set.seed(3)
dat <- data.frame(Dates = rep(seq(Sys.Date(), Sys.Date() + 9, by = 1),
each = 24),
Times = rep(0:23, times = 10),
Value = rep(c(0:12,11:1), times = 10) + rnorm(240))
persp() needs the data as the x and y grid locations and a matrix z of observations.
new.dates <- with(dat, sort(unique(Dates)))
new.times <- with(dat, sort(unique(Times)))
new.values <- with(dat, matrix(Value, nrow = 10, ncol = 24, byrow = TRUE))
and can be plotted using:
persp(new.dates, new.times, new.values, ticktype = "detailed", r = 10,
theta = 35, scale = FALSE)
The facets can be coloured using the col argument. You could do a lot worse than study the code for chartSeries3d0() at the page you linked to. Most of the code is just drawing proper axes as neither persp() nor wireframe() handle Date objects easily.
As for wireframe(), we
require(lattice)
wireframe(Value ~ as.numeric(Dates) + Times, data = dat, drape = TRUE)
You'll need to do a bit or work to sort out the axis labelling as wireframe() doesn't work with objects of class "Date" at the moment (hence the cast as numeric).

Related

R-package beeswarm generates same x-coordinates

I am working on a script where I need to calculate the coordinates for a beeswarm plot without immediately plotting. When I use beeswarm, I get x-coordinates that aren't swarmed, and more or less the same value:
But if I generate the same plot again it swarms correctly:
And if I use dev.off() I again get no swarming:
The code I used:
n <- 250
df = data.frame(x = floor(runif(n, 0, 5)),
y = rnorm(n = n, mean = 500, sd = 100))
#Plot 1:
A = with(df, beeswarm(y ~ x, do.plot = F))
plot(x = A$x, y=A$y)
#Plot 2:
A = with(df, beeswarm(y ~ x, do.plot = F))
plot(x = A$x, y=A$y)
dev.off()
#Plot 3:
A = with(df, beeswarm(y ~ x, do.plot = F))
plot(x = A$x, y=A$y)
It seems to me like beeswarm uses something like the current plot parameters (or however it is called) to do the swarming and therefore chokes when a plot isn't showing. I have tried to play around with beeswarm parameters such as spacing, breaks, corral, corralWidth, priority, and xlim, but it does not make a difference. FYI: If do.plot is set to TRUE the x-coordinates are calculated correctly, but this is not helpful as I don't want to plot immediately.
Any tips or comments are greatly appreciated!
You're right; beeswarm uses the current plot parameters to calculate the amount of space to leave between points. It seems that setting "do.plot=FALSE" does not do what one would expect, and I'm not sure why I included this parameter.
If you want to control the parameters manually, you could use the functions swarmx or swarmy instead. These functions must be applied to each group separately, e.g.
dfsplitswarmed <- by(df, df$x, function(aa) swarmx(aa$x, aa$y, xsize = 0.075, ysize = 7.5, cex = 1, log = ""))
dfswarmed <- do.call(rbind, dfsplitswarmed)
plot(dfswarmed)
In this case, I set the xsize and ysize values based on what the function would default to for this particular data set. If you can find a set of xsize/ysize values that work for your data, this approach might work for you.
Otherwise, perhaps a simpler approach would be to leave do.plot=TRUE, and then discard the plots.

Add elements to a previous subplot within an active base R graphics device?

Let's say I generate 9 groups of data in a list data and plot them each with a for loop. I could use *apply here too, whichever you prefer.
data = list()
layout(mat = matrix(1:9, nrow = 3))
for(i in 1:9){
data[[i]] = rnorm(n = 100, mean = i, sd = 1)
plot(data[[i]])
}
After creating all the data, I want to decide which one is best:
best_data = which.min(sapply(data, sd))
Now I want to highlight that best data on the plot to distinguish it. Is there a plotting function that lets me go back to a specified sub-plot in the active device and add an element (maybe a title)?
I know I could make a second for loop: for loop 1 generates the data, then I assess which is best, then for loop 2 creates the plots, but this seems less efficient and more verbose.
Does such a plotting function exist for base R graphics?
#rawr's answer is simple and easy. But I thought I'd point out another option that allows you to select the "best" data set before you plot, in case you want more flexibility to plot the "best" data set differently from the rest.
For example:
# Create the data
data = lapply(1:9, function(i) rnorm(n = 100, mean = i, sd = 1))
par(mar=c(4,4,1,1))
layout(mat = matrix(1:9, nrow = 3))
rng = range(data)
# Plot each data frame
lapply(1:9, function(i) {
# Select data frame with lowest SD
best = which.min(sapply(data, sd))
# Highlight data frame with lowest SD by coloring points red
plot(data[[i]], col=ifelse(best==i,"red","black"), pch=ifelse(best==i, 3, 1), ylim=rng)
})

Creating Hexbins with Dates in R hexbin()

I am trying to create hexbins where the x-axis is a date using the hexbin function in the hexbin package in R. When I feed in my data, it seems to convert the dates into a numeric, which gets displayed on the x-axis. I want it force the x-axis to be a date.
#Create Hex Bins
hbin <- hexbin(xData$Date, xData$YAxis, xbins = 80)
#Plot using rBokeh
figure() %>%
ly_hexbin(hbin)
This gives me:
Here's a brute force approach using the underlying grid plotting package. The axes are ugly; maybe someone with better grid skills than I could pretty them up.
# make some data
x = seq.Date(as.Date("2015-01-01"),as.Date("2015-12-31"),by='days')
y = sample(x)
# make the plot and capture the plot
p <- plot(hexbin(x,y),yaxt='n',xaxt='n')
# calculate the ticks
x_ticks_date <-
x_ticks <- axTicks(1, log = FALSE, usr = as.numeric(range(x)),
axp=c(as.numeric(range(x)) ,5))
class(x_ticks_date) <- 'Date'
y_ticks_date <-
y_ticks <- axTicks(1, log = FALSE, usr = as.numeric(range(y)),
axp=c(as.numeric(range(y)) ,5))
class(y_ticks_date) <- 'Date'
# push the ticks to the view port.
pushViewport(p$plot.vp#hexVp.off)
grid.xaxis(at=x_ticks, label = format(y_ticks_date))
grid.yaxis(at=y_ticks, label = format(y_ticks_date))

R: Create a more readable X-axis after binning data in ggplot2. Turn bins into whole numbers

I have a dummy variable call it "drink" and a corresponding age variable that represents a precise age estimate (several decimal points) for each person in a dataset. I want to first "bin" the age variable, extracting the mean value for each bin based on the "drink" dummy, and then graph the result. My code to do so looks like this:
df$bins <- cut(df$age, seq(from = 17, to = 31, by = .2), include.lowest = TRUE)
df.plot <- ddply(df, .(bins), summarise, avg.drink = mean(drinks_alcohol))
qplot(bins, avg.drink, data = df.plot)
This works well enough, but the x-axis in the graph is unreadable because it corresponds to the length size of the bins. Is there a way to make the modify the X-axis to show, for example, ages 19-23 only, with the "ticks" still aligning with the correct bins? For example, in my current code there is a bin for (19, 19.2] and another bin for (20, 20.2]. I would want only the bins that start in whole numbers to be identified on the X-axis with the first number (19, 20), not the second (19.2, 20.2) shown.
Is there any straightforward way to do this?
The most direct way to specify axis labels is with the appropriate scale function... in the case of factors on the x axis, scale_x_discrete. It will use whatever labels you give it with the labels argument, or you can give it a function that formats things as you like.
To "manually" specify the labels, you just need to create a vector of appropriate length. In this case, if you factor values go are intervals beginning with seq(17, 31.8, by = 0.2) and you want to label bins beginning with integers, then your labels vector will be
bin_starts = seq(17, 31.8, by = 0.2)
bin_labels = ifelse(bin_starts - trunc(bin_starts) < 0.0001, as.character(bin_starts), "")
(I use the a - b < 0.0001 in case of precision problems, though it shouldn't be a problem in this particular case).
A more robust solution would to label the factor levels with the number at the start of the interval from the beginning. cut also has a labels argument.
my_breaks = seq(17, 32, by = 0.2)
df$bins <- cut(df$age, breaks = my_breaks, labels = head(my_breaks, -1),
include.lowest = TRUE)
You could then fairly easily write a formatter (following templates from the scales package) to print only the ones you want:
int_only = function(x) {
# test if we can coerce to numeric, if not do nothing
if (any(is.na(as.numeric(x)))) return(x)
# otherwise convert to numeric and return integers and blanks as labels
x = as.numeric(x)
return(ifelse(x - trunc(x) < 1e-10, as.character(x), ""))
}
Then, using the nicely formatted data created above, you should be able to pass int_only as a formatter function to labels to get the labels you want. (Note: untested! necessary tweaks left as an exercise for the reader, though I'll gladly accept edits :) )

Visualize data using histogram in R

I am trying to visualize some data and in order to do it I am using R's hist.
Bellow are my data
jancoefabs <- as.numeric(as.vector(abs(Janmodelnorm$coef)))
jancoefabs
[1] 1.165610e+00 1.277929e-01 4.349831e-01 3.602961e-01 7.189458e+00
[6] 1.856908e-04 1.352052e-05 4.811291e-05 1.055744e-02 2.756525e-04
[11] 2.202706e-01 4.199914e-02 4.684091e-02 8.634340e-01 2.479175e-02
[16] 2.409628e-01 5.459076e-03 9.892580e-03 5.378456e-02
Now as the more cunning of you might have guessed these are the absolute values of some model's coefficients.
What I need is an histogram that will have for axes:
x will be the number (count or length) of coefficients which is 19 in total, along with their names.
y will show values of each column (as breaks?) having a ylim="" set, according to min and max of those values (or something similar).
Note that Janmodelnorm$coef simply produces the following
(Intercept) LON LAT ME RAT
1.165610e+00 -1.277929e-01 -4.349831e-01 -3.602961e-01 -7.189458e+00
DS DSA DSI DRNS DREW
-1.856908e-04 1.352052e-05 4.811291e-05 -1.055744e-02 -2.756525e-04
ASPNS ASPEW SI CUR W_180_270
-2.202706e-01 -4.199914e-02 4.684091e-02 -8.634340e-01 -2.479175e-02
W_0_360 W_90_180 W_0_180 NDVI
2.409628e-01 5.459076e-03 -9.892580e-03 -5.378456e-02
So far and consulting ?hist, I am trying to play with the code bellow without success. Therefore I am taking it from scratch.
# hist(jancoefabs, col="lightblue", border="pink",
# breaks=8,
# xlim=c(0,10), ylim=c(20,-20), plot=TRUE)
When plot=FALSE is set, I get a bunch of somewhat useful info about the set. I also find hard to use breaks argument efficiently.
Any suggestion will be appreciated. Thanks.
Rather than using hist, why not use a barplot or a standard plot. For example,
## Generate some data
set.seed(1)
y = rnorm(19, sd=5)
names(y) = c("Inter", LETTERS[1:18])
Then plot the cofficients
barplot(y)
Alternatively, you could use a scatter plot
plot(1:19, y, axes=FALSE, ylim=c(-10, 10))
axis(2)
axis(1, 1:19, names(y))
and add error bars to indicate the standard errors (see for example Add error bars to show standard deviation on a plot in R)
Are you sure you want a histogram for this? A lattice barchart might be pretty nice. An example with the mtcars built-in data set.
> coef <- lm(mpg ~ ., data = mtcars)$coef
> library(lattice)
> barchart(coef, col = 'lightblue', horizontal = FALSE,
ylim = range(coef), xlab = '',
scales = list(y = list(labels = coef),
x = list(labels = names(coef))))
A base R dotchart might be good too,
> dotchart(coef, pch = 19, xlab = 'value')
> text(coef, seq(coef), labels = round(coef, 3), pos = 2)

Resources