Display multiple time series with rCharts hPlot - r

Using a simple data frame to illustrate this problem:
df <- data.frame(x=c(1,2,3), y1=c(1,2,3), y2=c(3,4,5))
Single time series plot is easy:
hPlot(y="y1", x="x", data=df)
I cannot figure out how to plot both y1 and y2 together. I tried this, but it returns an error:
> hPlot(x='x', y=c('y1','y2'), data=df)
Error in .subset2(x, i, exact = exact) : subscript out of bounds
Looking at the code of hPlot, it uses [[ to extract a single column from the input data.frame. Does that mean it only works for a single time series?
hPlot <- highchartPlot <- function(..., radius = 3, title = NULL, subtitle = NULL, group.na = NULL){
rChart <- Highcharts$new()
# Get layers
d <- getLayer(...)
data <- data.frame(
x = d$data[[d$x]],
y = d$data[[d$y]]
)
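The error message can be reproduced without rCharts. As a minimal sketch in base R only (it does not call hPlot), passing a character vector of length two to [[ on a data frame is treated as recursive indexing, roughly df[["y1"]][["y2"]], and the second lookup fails on the plain numeric column:

```r
df <- data.frame(x = c(1, 2, 3), y1 = c(1, 2, 3), y2 = c(3, 4, 5))

# A single name extracts one column as a vector
single <- df[["y1"]]

# Two names are applied one after the other (recursive indexing),
# so "y2" is looked up inside the numeric column y1 and fails
msg <- tryCatch(
  df[[c("y1", "y2")]],
  error = function(e) conditionMessage(e)
)
print(msg)
```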

Try using long-format data with group:
hPlot(x = "x", y = "value", group = "variable", data = reshape2::melt(df, id.vars = "x"))
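To make the long format concrete, here is a small sketch that builds the same kind of table with base R only, so it runs without reshape2 or rCharts. The name long is just an illustration, and the hPlot call is left as a comment because it needs rCharts installed:

```r
df <- data.frame(x = c(1, 2, 3), y1 = c(1, 2, 3), y2 = c(3, 4, 5))

# Stack y1 and y2 into one value column, with a label column saying
# which series each row belongs to (a layout similar to melt() output)
long <- data.frame(
  x        = rep(df$x, times = 2),
  variable = rep(c("y1", "y2"), each = nrow(df)),
  value    = c(df$y1, df$y2)
)
print(long)

# With rCharts installed, this long data frame could then be passed on:
# hPlot(x = "x", y = "value", group = "variable", data = long)
```

Each series becomes a block of rows, and the variable column is what group uses to tell the series apart.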


How to create a list in R for reproducible example question

I am trying to create a list for use in a reproducible example here which has structure (Delta.Tmax) that looks like this:
This is the code I've come up with so far:
#create data
tmintest=array(1:100, c(512,256,12))
#create the list
Variable <- list(varName = rep("tmin", 12), level = rep(NA, 12))
Data <- list(Data = tmintest)
xyCoords <- list(x = seq(-40.37,64.37,length.out=420), y = seq(25.37,72.37,length.out=189))
Dates <- list(start = seq(as.Date("2012-01-01"), as.Date("2015-12-31"), by="days"), end=seq(as.Date("2012-01-01"), as.Date("2015-12-31"), by="days"))
Delta <- list(Variable = Variable,Data=Data, xyCoords=xyCoords,Dates=Dates)
But as you can see, it is not the same:
Delta$Data is expandable (which it shouldn't be). I have also tried adding Data like this:
Variable <- list(Data = tmintest, varName = rep("tmin", 12), level = rep(NA, 12))
But then Data is at the same level as varName and level. Also, it doesn't have the extra dimension that I see in Delta.Tmax$Data.
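The target structure of Delta.Tmax is not shown here, so the following is only a sketch under the assumption that Data should be a plain array sitting directly inside Delta. Wrapping the array in list(Data = ...) adds one more nesting level (Delta$Data$Data), which would make it expandable; storing the array itself avoids that. The dimensions are reduced purely to keep the example light.

```r
# Smaller dimensions than in the question, purely for illustration
tmintest <- array(1:100, c(8, 4, 12))

Variable <- list(varName = rep("tmin", 12), level = rep(NA, 12))

# Store the array itself under Data rather than a list containing it
Delta <- list(Variable = Variable, Data = tmintest)

# Data now shows up as an int array, not as a nested list
str(Delta, max.level = 2)
```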

How to create multiple boxplots in one by only choosing certain rows from a data frame

What I would like to do is create several boxplots (all displayed in a single plot) from only certain rows of my original data frame.
My data frame looks as follows:
So now I want R to visualise Parameter ~ Station (Parameter are all variables coloured green and Station is the "station id")
Is there a way to tell R that I want all my Parameters on the x-axis ONLY for BB0028 for example, which would mean that I only take the first 6 values of mean_area, mean_area_exc, esd, feret, min and max into account in the boxplot?
That would look like this:
I tried it in a very complicated way, adding single boxplots one by one, but I am sure there must be a simpler way.
This is what I tried:
bb28 <- df[c(1:6),]
bb28area <- boxplot(bb28$mean_area ~ bb28$BBnr)
bb28area_exc <- boxplot(bb28$mean_area_exc ~ bb28$BBnr)
bb28esd <- boxplot(bb28$mean_esd ~ bb28$BBnr)
bb28feret <- boxplot(bb28$mean_feret ~ bb28$BBnr)
bb28min <- boxplot(bb28$mean_min ~ bb28$BBnr)
bb28max <- boxplot(bb28$mean_max ~ bb28$BBnr)
boxplot(bb28$mean_area ~ bb28$BBnr)
boxplot(bb28$mean_area_exc ~ bb28$BBnr, add=TRUE, at = 1:1+0.45)
It also doesn't look very nice, because the x-axis does not adjust to the added boxplot, which is then cut off.
I hope you can help me with simple, proper code to get my plot.
Thank you!
Cheers, Merle
Maybe the function multi.boxplot below is what you are looking for. It uses base R only.
Data.
First, make up a dataset, since you have not provided us with one in a copy&paste friendly format.
set.seed(1234)
n <- 50
BBnr <- sort(sprintf("BB%04d", sample(28:30, n, TRUE)))
bb28 <- data.frame(col1 = 1:n, col2 = n:1, BBnr = BBnr)
tmp <- matrix(runif(3*n), ncol = 3)
colnames(tmp) <- paste("mean", c("this", "that", "other"), sep = "_")
bb28 <- cbind(bb28, tmp)
rm(BBnr, tmp)
Code.
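multi.boxplot <- function(x, by, col = 0, ...){
  x <- as.data.frame(x)
  # Group labels in the sorted order in which boxplot() draws the groups
  uniq.by <- levels(factor(by))
  len <- length(uniq.by) - 1
  n <- ncol(x)
  n1 <- n + 1   # width of one group of boxes, including a gap
  col <- rep(col, n)[seq_len(n)]
  # First variable: sets up the axes for all groups
  boxplot(x[[1]] ~ by, at = 0:len*n1 + 1,
          xlim = c(0, (len + 1)*n1), ylim = range(unlist(x)), xaxt = "n", col = col[1], ...)
  # Remaining variables: added next to the first within each group
  for(i in seq_len(n)[-1])
    boxplot(x[[i]] ~ by, at = 0:len*n1 + i, xaxt = "n", add = TRUE, col = col[i], ...)
  # One tick label per group, centred under its boxes
  axis(1, at = 0:len*n1 + n1/2, labels = uniq.by, tick = TRUE)
}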
inx <- grep("mean", names(bb28))
multi.boxplot(bb28[inx], by = bb28$BBnr, col = rainbow(length(inx)))
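If only one station is needed, a simpler route may be enough. The sketch below assumes a made-up data frame (the column names and values are hypothetical stand-ins for the question's data): it keeps the rows of one station and the parameter columns, then relies on boxplot() drawing one box per column when given a data frame.

```r
set.seed(1)
# Hypothetical stand-in for the question's data: 6 rows per station
df <- data.frame(
  BBnr      = rep(c("BB0028", "BB0029"), each = 6),
  mean_area = runif(12),
  mean_esd  = runif(12),
  mean_min  = runif(12)
)

# Keep only the rows of one station and only the parameter columns
params <- grep("^mean_", names(df), value = TRUE)
one_station <- df[df$BBnr == "BB0028", params]

# boxplot() on a data frame draws one box per column on a shared axis
b <- boxplot(one_station, main = "BB0028")
```

All boxes share one y-axis, so parameters on very different scales may need rescaling or separate panels.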

Memory leakage in using `ggplot` on large binned datasets

I am making various ggplots on a very large dataset (much larger than the one in the example). I wrote a binning function for both the x- and y-axes to make plotting such a large dataset feasible.
In the following example, memory.size() is recorded at the start. The large dataset is then simulated as dt, and dt's x2 is plotted against x1 with binning. Plotting is repeated with different subsets of dt. The size of the plot objects is checked with object.size() and stored. After the plot objects have been created, rm(dt) is executed, followed by a double gc(). At this point, memory.size() is recorded again, and the values from the beginning and the end are printed for comparison.
Given the small size of the plot objects, I expected memory.size() at the end to be similar to the value at the beginning. It is not: memory.size() does not go down until I restart the R session.
REPRODUCIBLE EXAMPLE
library(data.table)
library(ggplot2)
library(magrittr)
# The binning function
# x = column name for x-axis (character)
# y = column name for y-axis (character)
# xNItv = Number of bin for x-axis
# yNItv = Number of bin for y-axis
# Value: A binned data.table
tab_by_bin_idxy <- function(dt, x, y, xNItv, yNItv) {
#Binning
xBreaks = dt[, seq(min(get(x), na.rm = TRUE), max(get(x), na.rm = TRUE), length.out = xNItv + 1)]
yBreaks = dt[, seq(min(get(y), na.rm = TRUE), max(get(y), na.rm = TRUE), length.out = yNItv + 1)]
xbinCode = dt[, .bincode(get(x), breaks = xBreaks, include.lowest = TRUE)]
xbinMid = sapply(seq(xNItv), function(i) {return(mean(xBreaks[c(i, i+1)]))})[xbinCode]
ybinCode = dt[, .bincode(get(y), breaks = yBreaks, include.lowest = TRUE)]
ybinMid = sapply(seq(yNItv), function(i) {return(mean(yBreaks[c(i, i+1)]))})[ybinCode]
#Creating table
tab_match = CJ(xbinCode = seq(xNItv), ybinCode = seq(yNItv))
tab_plot = data.table(xbinCode, xbinMid, ybinCode, ybinMid)[
tab_match, .(xbinMid = xbinMid[1], ybinMid = ybinMid[1], N = .N), keyby = .EACHI, on = c("xbinCode", "ybinCode")
]
#Returning table
return(tab_plot)
}
before.mem.size <- memory.size()
# Simulation of dataset
nrow <- 6e5
ncol <- 60
dt <- do.call(data.table, lapply(seq(ncol), function(i) {return(runif(nrow))}) %>% set_names(paste0("x", seq(ncol))))
# Graph plotting
dummyEnv <- new.env()
with(dummyEnv, {
fcn <- function(tab) {
binned.dt <- tab_by_bin_idxy(dt = tab, x = "x1", y = "x2", xNItv = 50, yNItv = 50)
plot <- ggplot(binned.dt, aes(x = xbinMid, y = ybinMid)) + geom_point(aes(size = N))
return(plot)
}
lst_plots <- list(
plot1 = fcn(dt),
plot2 = fcn(dt[x1 <= 0.7]),
plot3 = fcn(dt[x5 <= 0.3])
)
assign("size.of.plots", object.size(lst_plots), envir = .GlobalEnv)
})
rm(dummyEnv)
# After use, remove and clean up of dataset
rm(dt)
gc();gc()
after.mem.size <- memory.size()
# Memory reports
print(paste0("before.mem.size = ", before.mem.size))
print(paste0("after.mem.size = ", after.mem.size))
print(paste0("plot.objs.size = ", size.of.plots / 1000000))
I have tried the following modifications to the code:
Inside fcn, removing ggplot and returning a NULL instead of a plot object: The memory leakage is totally gone. But this is not a solution. I need the plot.
The fewer plots requested, or the fewer columns or rows passed to fcn, the smaller the memory leakage.
Memory leakage also exists if I do not make any subset and make only one plot object (In the examples, I plotted 3).
After the process, even after I call rm(list = ls()), the memory is still non-recoverable.
I wish to know why this happens and how to get rid of it without compromising my need to do binned plots and subset dt to make different plots.
Thanks for your attention!
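One point worth checking, offered as a sketch and not as a confirmed diagnosis of the behaviour above: object.size() is documented not to count environments associated with an object, so a small reported size does not by itself show that an object keeps little memory reachable. The base R example below illustrates this with a plain closure; make_plot_like is a hypothetical name, and no ggplot2 code is involved.

```r
make_plot_like <- function() {
  big <- runif(1e6)          # about 8 MB, local to this call
  function() "small result"  # the closure keeps the call's environment alive
}

f <- make_plot_like()

# object.size() measures the function itself, not its environment
size_reported <- as.numeric(object.size(f))
size_retained <- as.numeric(object.size(environment(f)$big))

print(c(reported = size_reported, retained = size_retained))

# Dropping the last reference lets gc() reclaim the vector
rm(f)
invisible(gc())
```

Whether something similar applies to the plot objects in the example would need to be verified separately, for instance by inspecting what each plot object still references.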

Create R Function with flexibility to reference different datasets

I am trying to create a simple function in R that can reference multiple datasets and multiple variable names. Using the following code, I get an error, which I believe is due to referencing:
set.seed(123)
dat1 <- data.frame(x = sample(10), y = sample(10), z = sample(10))
dat2 <- data.frame(x = sample(10), y = sample(10), z = sample(10))
table(dat1$x, dat1$y)
table(dat2$x, dat2$y)
fun <- function(dat, sig, range){print(table(dat$sig, dat$range))}
fun(dat = dat1, sig = x, range = y)
fun(dat = dat2, sig = x, range = y)
Any idea how to adjust this code so that it can return the table appropriately?
The [[ ]] operator on a data frame is similar to $, but it lets you pass an object and looks up the column named by its value. So outside the function you pass the string "x" as sig; without the quotes, R will look for an object called x.
fun <- function(dat, sig, range){print(table(dat[[sig]], dat[[range]]))}
fun(dat = dat1, sig = "x", range = "y")
fun(dat = dat2, sig = "x", range = "y")
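The difference between the two operators can be seen in isolation. This is a small sketch with a made-up data frame, where sig is an ordinary variable holding a column name:

```r
dat1 <- data.frame(x = 1:3, y = c(1, 1, 2), z = 4:6)
sig <- "x"

# $ looks for a column literally named "sig", which does not exist
by_dollar <- dat1$sig

# [[ ]] evaluates sig first and then looks up the column "x"
by_brackets <- dat1[[sig]]

print(is.null(by_dollar))
print(by_brackets)
```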

R: Find intersection of two vectors

I have two vectors. I need to find the intersection between these two, and do a nice plot of it.
So, here is a very simple data frame example:
df <- data.frame( id <- c(1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,2),
p <-c(5,7,9,11,13,15,17,19,21,23,20,18,16,14,12,10,8,6,4,2 ),
q <-c(3,5,7,13,19,31,37,53,61,67,6,18,20,24,40,46,66,70,76,78))
colnames(df) <- c("id","price","quantity")
supply <- df[df$id == 1,]
demand <- df[df$id == 2,]
plot( x = supply$quantity, y = supply$price, type = "l", ylab = "price", xlab = "quantity")
lines(x = demand$quantity , y = demand$price, type = "l")
grid()
Now, I can plot them and find the intersection manually, but can you make R calculate the intersection between these two lines?
The data can make huge jumps, and the lines can go from very steep to nearly horizontal.
Be careful creating your data frame. You want =, not <-. Also, make id a factor, for clarity.
df <- data.frame(
id = factor(rep(c("supply", "demand"), each = 10)),
price = c(5,7,9,11,13,15,17,19,21,23,20,18,16,14,12,10,8,6,4,2 ),
quantity = c(3,5,7,13,19,31,37,53,61,67,6,18,20,24,40,46,66,70,76,78)
)
First we define common, frequent points to evaluate the quantity at.
quantity_points <- with(
df,
seq(min(quantity), max(quantity), length.out = 500)
)
Now split the dataset into supply/demand parts.
by_id <- split(df[, c("price", "quantity")], df$id)
Then we use approx to calculate the price at each of these quantities, for supply and demand separately.
interpolated_price <- lapply(
by_id,
function(x)
{
with(
x,
approx(
quantity,
price,
xout = quantity_points
)
)$y
}
)
Finally, the crossing point is where the absolute value of the supply price minus the demand price is minimised.
index_of_equality <- with(interpolated_price, which.min(abs(supply - demand)))
quantity_points[index_of_equality]
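The grid search above returns the nearest of the 500 evaluation points. As an alternative sketch (a different technique, not part of the answer above), the crossing of the two piecewise-linear curves can be located with approxfun() and uniroot(). This assumes the curves cross exactly once within the quantity range they share, so that the price difference changes sign over the search interval:

```r
supply <- data.frame(
  price    = c(5, 7, 9, 11, 13, 15, 17, 19, 21, 23),
  quantity = c(3, 5, 7, 13, 19, 31, 37, 53, 61, 67)
)
demand <- data.frame(
  price    = c(20, 18, 16, 14, 12, 10, 8, 6, 4, 2),
  quantity = c(6, 18, 20, 24, 40, 46, 66, 70, 76, 78)
)

# Piecewise-linear price as a function of quantity for each curve
supply_price <- approxfun(supply$quantity, supply$price)
demand_price <- approxfun(demand$quantity, demand$price)

# Search only where both curves are defined
lower <- max(min(supply$quantity), min(demand$quantity))
upper <- min(max(supply$quantity), max(demand$quantity))

# Root of the price difference = quantity at which the curves cross
crossing <- uniroot(
  function(q) supply_price(q) - demand_price(q),
  interval = c(lower, upper),
  tol = 1e-8
)

eq_quantity <- crossing$root
eq_price <- supply_price(eq_quantity)
print(c(quantity = eq_quantity, price = eq_price))
```

If the curves could cross more than once, uniroot() would report only one of the crossings, and the grid approach over each segment would be the safer choice.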
