I have a function that uses matplot to plot some data. Data structure is like this:
test = data.frame(x = 1:10, a = 1:10, b = 11:20)
matplot(test[,-1])
matlines(test[,1], test[,-1])
So far so good. However, if there are missing values in the data set, then there are gaps in the resulting plot, and I would like to avoid those by connecting the edges of the gaps.
test$a[3:4] = NA
test$b[7] = NA
matplot(test[,-1])
matlines(test[,1], test[,-1])
In the real situation this is inside a function, the dimension of the matrix is bigger and the number of rows, columns and the position of the non-overlapping missing values may change between different calls, so I'd like to find a solution that could handle this in a flexible way. I also need to use matlines
I was thinking maybe filling in the gaps with intrapolated data, but maybe there is a better solution.
I came across this exact situation today, but I didn't want to interpolate values - I just wanted the lines to "span the gaps", so to speak. I came up with a solution that, in my opinion, is more elegant than interpolating, so I thought I'd post it even though the question is rather old.
The problem causing the gaps is that there are NAs between consecutive values. So my solution is to 'shift' the column values so that there are no NA gaps. For example, a column consisting of c(1,2,NA,NA,5) would become c(1,2,5,NA,NA). I do this with a function called shift_vec_na() in an apply() loop. The x values also need to be adjusted, so we can make the x values into a matrix using the same principle, but using the columns of the y matrix to determine which values to shift.
Here's the code for the functions:
# x -> vector
# bool -> boolean vector; must be same length as x. The values of x where bool
# is TRUE will be 'shifted' to the front of the vector, and the back of the
# vector will be all NA (i.e. the number of NAs in the resulting vector is
# sum(!bool))
# returns the 'shifted' vector (will be the same length as x)
shift_vec_na <- function(x, bool){
n <- sum(bool)
if(n < length(x)){
x[1:n] <- x[bool]
x[(n + 1):length(x)] <- NA
}
return(x)
}
# x -> vector
# y -> matrix, where nrow(y) == length(x)
# returns a list of two elements ('x' and 'y') that contain the 'adjusted'
# values that can be used with 'matplot()'
adj_data_matplot <- function(x, y){
y2 <- apply(y, 2, function(col_i){
return(shift_vec_na(col_i, !is.na(col_i)))
})
x2 <- apply(y, 2, function(col_i){
return(shift_vec_na(x, !is.na(col_i)))
})
return(list(x = x2, y = y2))
}
Then, using the sample data:
test <- data.frame(x = 1:10, a = 1:10, b = 11:20)
test$a[3:4] <- NA
test$b[7] <- NA
lst <- adj_data_matplot(test[,1], test[,-1])
matplot(lst$x, lst$y, type = "b")
You could use the na.interpolation function from the imputeTS package:
test = data.frame(x = 1:10, a = 1:10, b = 11:20)
test$a[3:4] = NA
test$b[7] = NA
matplot(test[,-1])
matlines(test[,1], test[,-1])
library('imputeTS')
test <- na.interpolation(test, option = "linear")
matplot(test[,-1])
matlines(test[,1], test[,-1])
Had also the same issue today. In my context I was not permitted to interpolate. I am providing here a minimal, but sufficiently general working example of what I did. I hope it helps someone:
mymatplot <- function(data, main=NULL, xlab=NULL, ylab=NULL,...){
#graphical set up of the window
plot.new()
plot.window(xlim=c(1,ncol(data)), ylim=range(data, na.rm=TRUE))
mtext(text = xlab,side = 1, line = 3)
mtext(text = ylab,side = 2, line = 3)
mtext(text = main,side = 3, line = 0)
axis(1L)
axis(2L)
#plot the data
for(i in 1:nrow(data)){
nin.na <- !is.na(data[i,])
lines(x=which(nin.na), y=data[i,nin.na], col = i,...)
}
}
The core 'trick' is in x=which(nin.na). It aligns the data points of the line consistently with the indices of the x axis.
The lines
plot.new()
plot.window(xlim=c(1,ncol(data)), ylim=range(data, na.rm=TRUE))
mtext(text = xlab,side = 1, line = 3)
mtext(text = ylab,side = 2, line = 3)
mtext(text = main,side = 3, line = 0)
axis(1L)
axis(2L)`
draw the graphical part of the window.
range(data, na.rm=TRUE) adapts the plot to a proper size being able to include all data points.
mtext(...) is used to label the axes and provides the main title. The axes themselves are drawn by the axis(...) command.
The following for-loop plots the data.
The function head of mymatplot provides the ... argument for an optional passage of typical plot parameters as lty, lwt, cex etc. via . Those will be passed on to the lines.
At last word on the choice of colors - they are up to your flavor.
Related
When plotting the ratio between two variables, their relative order is often of no concern, yet depending on which variable is in the numerator, its relative size is constrained either to (0,1) or (1, Inf), which is somewhat unintuitive and breaks symmetry. I want to plot ratios "symmetrically", without resorting to symmetric log-scale, by having a y-axis that goes like 1/4, 1/3, 1/2, 1, 2, 3, 4 or, equivalently, 4^-1, 3^-1, 2^-1, 1, 2, 3, 4 in regular intervals. I've come up with the following:
symmult <- function(x){
isf <- is.finite(x) & (x>0)
xf <- x[isf]
xf <- ifelse(xf>=1,
xf-1,
1-(1/xf))
x[isf] <- xf
x[!isf] <- NA
x[!is.finite(x)] <- NA
return(x)
}
symmultinv <- function(x){
isf <- is.finite(x)
xf <- x[isf]
xf <- ifelse(x[isf]>=0,
x[isf]+1,
-1/(x[isf]-1))
x[isf] <- xf
x[!isf] <- NA
x[!is.finite(x)] <- NA
return(x)
}
sym_mult_trans = function(){trans_new("sym_mult", symmult, symmultinv )}
x <- c(-4:-2, 1:4)
x[x<1] <- 1/abs(x[x<1])
ggplot() +
geom_point(aes(x=x, y=x)) +
scale_y_continuous(trans="sym_mult")
The transformation works, but I cannot get the axis labels etc. to work for any 0<x<1, without setting them manually. Any help would be greatly appreciated.
You can create bespoke 'breaks' and 'format' functions that you can use inside trans_new (or pass to scale_y_continuous directly via its breaks and labels parameters).
For the breaks function, remember it will take as input a length-two numeric vector representing the range of the y axis. You must then convert this to a number of appropriate breaks. Here, if the minimum of the range is less than one, we take its reciprocal, find the pretty breaks between one and that number, then take the reciprocal of the output. We concatenate that onto pretty breaks between 1 and our range maximum:
# Define breaks function
symmult_breaks <- function(x) {
c(1 / extended_breaks(5)(c(1/x[x < 1], 1)),
extended_breaks(5)(c(1, x[x >= 1])))
}
For the labelling function, remember, it needs to take as input the vector of numbers produced by our breaks function. We can paste a 1/ in front of the reciprocal of numbers less than one, but leave numbers of 1 or more unaltered:
# Define labelling function
symmult_labs <- function(x) {
labs <- character(length(x))
labs[x >= 1] <- as.character(x[x >= 1])
labs[x < 1] <- paste("1", as.character(1/x[x < 1]), sep = "/")
labs
}
So your full new transformation becomes:
# Use our four functions to define the whole transformation:
sym_mult_trans <- function() {
trans_new(name = "sym_mult",
transform = symmult,
inverse = symmultinv,
breaks = symmult_breaks,
format = symmult_labs)
}
And your plot becomes:
ggplot() +
geom_point(aes(x = x, y = x)) +
scale_y_continuous(trans = "sym_mult")
I have the following code which creates a plot for, the data is located here
Data
data<-lidar
x<-lidar$range
y<-lidar$logratio
h<-20
par(mfrow=c(2,2))
r<-max(x)-min(x)
bn<-ceiling(r/h)
binwidth=c(5,10,30,100)
#Creates a matrix to handle the data of same length
W<-matrix(nrow=length(x),ncol=bn)
for (j in 1:bn){
for (i in 1:length(x)){
if (x[i]>=(min(x)+(j-1)*h) && x[i]<=(min(x)+(j)*h)){W[i,j]=1}
else {W[i,j]=0}
}
}
#Sets up the y-values of the bins
fit<-rep(0,bn)
for (j in 1:bn){
fit[j]<- sum(y*W[,j]/sum(W[,j]))
}
#Sets up the x values of the bins
t<-numeric(bn)
for (j in 1:bn){
t[j]=(min(x)+0.5*h)+(j-1)*h
}
plot(x,y)
lines(t,fit,type = "S", col = 1, lwd = 2)
This creates a single plot in the left corner of a page since I have
par(mfrow=c(2,2))
Is there a way to create a for statement that will plot 4 graphs for me on that one page using h values of 5,10,30,100 (The values provided by the variable binwidth) so I don't have to manually change my h value every time to reproduce a new plot so my final result appears like this,
Essentially I want to run the code 4 times with different values of h using another for statement that plots all 4 results without me changing h all the time. Any help or hints are greatly appreciated.
Here's a fully reproducible example that loads the data directly from the url then uses the apply family to iterate through the different plots
lidar <- read.table(paste0("http://www.stat.cmu.edu/%7Elarry",
"/all-of-nonpar/=data/lidar.dat"),
header = TRUE)
par(mfrow = c(2, 2))
breaks <- lapply(c(5, 10, 30, 100), function(i) {
val <- seq(min(lidar$range), max(lidar$range), i)
c(val, max(val) + i)})
means <- lapply(breaks, function(i) {
vals <- tapply(lidar$logratio,
cut(lidar$range, breaks = i, include.lowest = TRUE), mean)
c(vals[1], vals)})
invisible(mapply(function(a, b) {
plot(lidar$range, lidar$logratio)
lines(a, b, type = "S", lwd = 2)
}, breaks, means))
Created on 2020-09-25 by the reprex package (v0.3.0)
Answering directly your question: keep the same other parameters:
data<-lidar
x<-lidar$range
y<-lidar$logratio
h<-20
par(mfrow=c(2,2))
r<-max(x)-min(x)
bn<-ceiling(r/h)
binwidth=c(5,10,30,100)
do a plot function (not necessary, but good practice)
doplot = function(h){
#Creates a matrix to handle the data of same length
W<-matrix(nrow=length(x),ncol=bn)
for (j in 1:bn){
for (i in 1:length(x)){
if (x[i]>=(min(x)+(j-1)*h) && x[i]<=(min(x)+(j)*h)){W[i,j]=1}
else {W[i,j]=0}
}
}
#Sets up the y-values of the bins
fit<-rep(0,bn)
for (j in 1:bn){
fit[j]<- sum(y*W[,j]/sum(W[,j]))
}
#Sets up the x values of the bins
t<-numeric(bn)
for (j in 1:bn){
t[j]=(min(x)+0.5*h)+(j-1)*h
}
plot(x,y)
lines(t,fit,type = "S", col = 1, lwd = 2)
}
and then loop on the h parameter
for(h in c(5,10,30,100)){
doplot(h)
}
A general comment: you could gain a lot learning how to use the data.frames, a bit of dplyr or data.table and ggplot2 to do that. I feels that you could replicate your entire code + plots in 10 more comprehensible lines.
I am using the multiplot function of this website
This function takes as argument an indefinite number of plots and plot them together. With the code below I can plot for example 4 plots together in 2 columns.
library(ggplot2)
n = 4
p = lapply(1:n, function(i) {
ggplot(data.frame(x = 1:10,y = rep(i,10)), aes(x = x, y = y)) +
geom_point()
})
multiplot(p[[1]],p[[2]],p[[3]],p[[4]],cols = 2)
How can I do if I have an undetermined number of plots n ?
I already tried things like
multiplot(p,cols = 2)
do.call(multiplot, list(p,cols = 2))
but it doesn't give the result that I want
With the second option you were also close:
do.call(multiplot, c(p, cols = 2))
does the job since
length(list(p, cols = 2))
# [1] 2
length(c(p, cols = 2))
# [1] 5
That is, list(p, cols = 2) makes a list of two components: a list p and an integer 2, while what you want is to extend p by adding cols = 2, and we need for that. (Somewhat not immediately obvious perhaps, but the title of ?c does indeed say Combine Values into a Vector or List.)
maybe multiplot(plotlist = p) !
cheers
I have a data frame with 3 columns of data that I would like to plot separately - 3 plots. The data has NA in it (in different places within the 3 columns). I basically want to interpolate the missing values and plot that segment of the line (multiple sections) in red and the remainder of the line black.
I have managed to use 'zoo' to create the interpolated data but am unsure how then to plot this data point a different colour. I have found the following Elegant way to select the color for a particular segment of a line plot?
but was thinking I could use a for loop with if else statement to create the colour column as advised in the link - I would need 3 separate colour columns as I have 3 datasets.
Appreciate any help - cannot really provide an example as I'm unsure where to start! Thanks
This is my solution. It assumes that the NAs are still present in the original data. These will be omitted in the first plot() command. The function then loops over just the NAs.
You will probably get finer control if you take the plot() command out of the function. As written, "..." gets passed to plot() and a type = "b" graph is mimicked - but it's trivial to change it to whatever you want.
# Function to plot interpolated valules in specified colours.
PlotIntrps <- function(exxes, wyes, int_wyes, int_pt = "red", int_ln = "grey",
goodcol = "darkgreen", ptch = 20, ...) {
plot(exxes, wyes, type = "b", col = goodcol, pch = ptch, ...)
nas <- which(is.na(wyes))
enn <- length(wyes)
for (idx in seq(nas)) {
points(exxes[nas[idx]], int_wyes[idx], col = int_pt, pch = ptch)
lines(
x = c(exxes[max(nas[idx] - 1, 1)], exxes[nas[idx]],
exxes[min(nas[idx] + 1, enn)]),
y = c(wyes[max(nas[idx] - 1, 1)], int_wyes[idx],
wyes[min(nas[idx] + 1, enn)]),
col = int_ln, type = "c")
# Only needed if you have 2 (or more) contiguous NAs (interpolations)
wyes[nas[idx]] <- int_wyes[idx]
}
}
# Dummy data (jitter() for some noise)
x_data <- 1:12
y_data <- jitter(c(12, 11, NA, 9:7, NA, NA, 4:1), factor = 3)
interpolations <- c(10, 6, 5)
PlotIntrps(exxes = x_data, wyes = y_data, int_wyes = interpolations,
main = "Interpolations in pretty colours!",
ylab = "Didn't manage to get all of these")
Cheers.
I am trying to implement Chebyshev filter to smooth a time series but, unfortunately, there are NAs in the data series.
For example,
t <- seq(0, 1, len = 100)
x <- c(sin(2*pi*t*2.3) + 0.25*rnorm(length(t)),NA, cos(2*pi*t*2.3) + 0.25*rnorm(length(t)))
I am using Chebyshev filter: cf1 = cheby1(5, 3, 1/44, type = "low")
I am trying to filter the time series exclude NAs, but not mess up the orders/position. So, I have already tried na.rm=T, but it seems there's no such argument.
Then
z <- filter(cf1, x) # apply filter
Thank you guys.
Try using x <- x[!is.na(x)] to remove the NAs, then run the filter.
You can remove the NAs beforehand using the compelete.cases function. You also might consider imputing the missing data. Check out the mtsdi or Amelia II packages.
EDIT:
Here's a solution with Rcpp. This might be helpful is speed is important:
require(inline)
require(Rcpp)
t <- seq(0, 1, len = 100)
set.seed(7337)
x <- c(sin(2*pi*t*2.3) + 0.25*rnorm(length(t)),NA, cos(2*pi*t*2.3) + 0.25*rnorm(length(t)))
NAs <- x
x2 <- x[!is.na(x)]
#do something to x2
src <- '
Rcpp::NumericVector vecX(vx);
Rcpp::NumericVector vecNA(vNA);
int j = 0; //counter for vx
for (int i=0;i<vecNA.size();i++) {
if (!(R_IsNA(vecNA[i]))) {
//replace and update j
vecNA[i] = vecX[j];
j++;
}
}
return Rcpp::wrap(vecNA);
'
fun <- cxxfunction(signature(vx="numeric",
vNA="numeric"),
src,plugin="Rcpp")
if (identical(x,fun(x2,NAs)))
print("worked")
# [1] "worked"
I don't know if ts objects can have missing values, but if you just want to re-insert the NA values, you can use ?insert from R.utils. There might be a better way to do this.
install.packages(c('R.utils', 'signal'))
require(R.utils)
require(signal)
t <- seq(0, 1, len = 100)
set.seed(7337)
x <- c(sin(2*pi*t*2.3) + 0.25*rnorm(length(t)), NA, NA, cos(2*pi*t*2.3) + 0.25*rnorm(length(t)), NA)
cf1 = cheby1(5, 3, 1/44, type = "low")
xex <- na.omit(x)
z <- filter(cf1, xex) # apply
z <- as.numeric(z)
for (m in attributes(xex)$na.action) {
z <- insert(z, ats = m, values = NA)
}
all.equal(is.na(z), is.na(x))
?insert
Here is a function you can use to filter a signal with NAs in it.
The NAs are ignored rather than replaced by zero.
You can then specify a maximum percentage of weight which the NAs may take at any point of the filtered signal. If there are too many NAs (and too few actual data) at a specific point, the filtered signal itself will be set to NA.
# This function applies a filter to a time series with potentially missing data
filter_with_NA <- function(x,
window_length=12, # will be applied centrally
myfilter=rep(1/window_length,window_length), # a boxcar filter by default
max_percentage_NA=25) # which percentage of weight created by NA should not be exceeded
{
# make the signal longer at both sides
signal <- c(rep(NA,window_length),x,rep(NA,window_length))
# see where data are present and not NA
present <- is.finite(signal)
# replace the NA values by zero
signal[!is.finite(signal)] <- 0
# apply the filter
filtered_signal <- as.numeric(filter(signal,myfilter, sides=2))
# find out which percentage of the filtered signal was created by non-NA values
# this is easy because the filter is linear
original_weight <- as.numeric(filter(present,myfilter, sides=2))
# where this is lower than one, the signal is now artificially smaller
# because we added zeros - compensate that
filtered_signal <- filtered_signal / original_weight
# but where there are too few values present, discard the signal
filtered_signal[100*(1-original_weight) > max_percentage_NA] <- NA
# cut away the padding to left and right which we previously inserted
filtered_signal <- filtered_signal[((window_length+1):(window_length+length(x)))]
return(filtered_signal)
}