I'm trying to plot a dataset over time (timeframe of ms/s). I need to show the order of events, the type of event and the duration of each event + the time between events. The dataset consists of a start time, end time and category.
I got close with this code someone used to answer a similar question back in '11 but found that I couldn't get it to colour the events according to the category, and I don't understand what the code is doing well enough to fix the issue.
zucchini <- function(st, en, mingap=1)
{
i <- order(st, en-st);
st <- st[i];
en <- en[i];
last <- r <- 1
while( sum( ok <- (st > (en[last] + mingap)) ) > 0 )
{
last <- which(ok)[1];
r <- append(r, last);
}
if( length(r) == length(st) )
return( list(c = list(st[r], en[r]), n = 1 ));
ne <- zucchini( st[-r], en[-r]);
return(list( c = c(list(st[r], en[r]), ne$c), n = ne$n+1));
}
coliflore <- function(st, en)
{
zu <- zucchini(st, en, mingap = 1);
plot.new();
plot.window( xlim=c(min(st), max(en)), ylim = c(0, zu$n+1));
box(); axis(1);
for(i in seq(1, 2*zu$n, 2))
{
x1 <- zu$c[[i]];
x2 <- zu$c[[i+1]];
for(j in 1:length(x1))
rect( x1[j], (i+1)/2, x2[j], (i+1)/2+0.5,col=data$Type, border="black",
);
legend('bottomright', legend = levels(data$Type), col = 1:10, cex = 0.8, pch = 1)}
}
st <- data$Time
en <- data$End
coliflore(st,en)
The current code outputs a plot in which, as best as I can tell, all boxes are assigned the same colour, that of the category of the first data point.
Does anyone know either: how to get this code to assign colours to the boxes based on a category, or how to accomplish this kind of plotting another way?
It's a little hard for me to see what's going on without a toy dataset for your example. For maximum control over colouring in plots, I like to add a colour column to the data frame, or create a vector that stores the colour values, instead of using the factor levels (e.g. data$Type) to generate colours. For instance, if I want factors 1:3 to be red, green, and blue:
# create data frame with X,Y coordinates and 3 factor levels
toy_data<- data.frame (X= 1:9, Y=9:1, Factor = rep(1:3, times=3))
# create a vector of colors to use for plotting
# color function
colFxn<-function(val){
cw_df<-data.frame(value=1:3, color = c("red", "green", "blue"))
return(cw_df[cw_df$value %in% val,]$color)
}
col_vec<-sapply (toy_data$Factor, colFxn)
#plot
plot(toy_data$X, toy_data$Y, col=col_vec)
I prefer this option because of the control it gives me over my colours. It can also be extended to transparent colours by setting the alpha value in rgb(), or by using one of the colour palettes available in many packages.
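Applied to the event plot in the question, the same idea might look like the sketch below. The events data frame, its column names and the colours are made up for illustration. It relies on rect() being vectorised: given one colour per rectangle, each box takes its own colour, whereas passing the whole data$Type vector while drawing a single rectangle uses only its first element, which would explain the single colour seen in the question. Note that zucchini() reorders the events, so the colour vector would have to be reordered the same way.

```r
# Hypothetical toy events: start time, end time and category
events <- data.frame(
  start = c(0, 2, 5, 9),
  end   = c(1, 4, 8, 10),
  type  = factor(c("a", "b", "a", "c"))
)

# Named lookup vector: one colour per category (colours chosen arbitrarily)
type_cols <- c(a = "red", b = "green", c = "blue")

# One colour per event, in the same order as the rows
event_cols <- unname(type_cols[as.character(events$type)])

plot.new()
plot.window(xlim = range(events$start, events$end), ylim = c(0, 2))
box(); axis(1)

# rect() is vectorised, so each rectangle takes the matching element of col
rect(events$start, 0.5, events$end, 1.5, col = event_cols, border = "black")
legend("bottomright", legend = names(type_cols), fill = type_cols, cex = 0.8)
```

Indexing the lookup vector by as.character() of the factor, rather than by the factor itself, keeps the mapping tied to the category names instead of the internal level order.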
I need to scan nearly a million datapoints and determine whether they lie below or above a threshold. I have the threshold defined globally and I have a simple predefined function
function.lower.penalty <- function(i,j){
  if( i < j ){
    #if gate condition is met, flip the gate flag:
    n <- 1
  }else{
    n <- 0
  }
  return(n)
}
that I call with mapply, which will write a 0/1 flag column in my dataframe:
df[, paste0("outside.highpass")] <- mapply(function.lower.penalty,i="somesignal.found.in.df", j="*some.threshold.found.in.df*" )
This is pretty straightforward, I can flag dozens of signals with their respective thresholds like this in a second big dataframe. Also, given how the threshold is written, the code will either flag the signals as below/above the threshold (meaning I got also a function.higher.penalty).
Now I was asked to make a more complex threshold that has the shape of a multisegmented line.
What is the fastest way to flag datapoints, given that I have only the corner points of the multisegmented line (I can guess them from how the line was drawn)?
Until now I had a predefined threshold (gray 0.2) and used mapply to scan the signal drawn on the x-axis. I just used a function to return 0 or 1 if datapoint was smaller or bigger than the threshold. Now I need a multisegmented line like the one drawn in red to do the same job.
Edit: Using the suggestion from det, I was able to flag datapoints in the dataframe. However, it seems that some datapoints close to the defined line are wrongly assigned. I am wondering how to work around this, or whether it is a drawing error.
You can create a function which returns a piecewise linear function based on the points:
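library(tidyverse)  # between(), tribble(), tibble(), mutate(), %>% and ggplot()
picewiseLinear <- function(x.var, y.var){
  stopifnot(length(x.var) == length(y.var), sum(duplicated(x.var)) == 0)
  # sort the corner points by x
  p <- order(x.var)
  x.var <- x.var[p]
  y.var <- y.var[p]
  # slope and intercept of each segment
  k <- diff(y.var) / diff(x.var)
  l <- -1 * k * head(x.var, -1) + head(y.var, -1)
  function(x){
    # index of the segment containing each x; rightmost.closed = TRUE keeps
    # x == max(x.var) in the last segment instead of raising an error
    ind <- findInterval(x, x.var, rightmost.closed = TRUE)
    if(!all(between(ind, 1, length(x.var) - 1))) stop("wrong input")
    x * k[ind] + l[ind]
  }
}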
For example:
point_df <- tribble(
~x, ~y,
3, 0,
5, 2,
3, 3,
5, 4
)
f <- picewiseLinear(point_df$y, point_df$x)
(In your picture the line is piecewise linear when x is treated as the dependent variable, which is why the function is built with y as its input.)
On an example dataset you get something like this:
set.seed(123)
tibble(
x = runif(1000, 0, 6),
y = runif(1000, 0, 4)
) %>%
mutate(color = ifelse(x > f(y), "red", "blue")) %>%
ggplot(aes(x, y)) +
geom_point(aes(color = color)) +
scale_color_identity() +
geom_path(data = point_df)
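If you would rather stay in base R, a similar flagging step can be sketched with approxfun(), which builds a vectorised linear interpolator from the corner points. This swaps in a different technique from the function above. The names corner_x, corner_y, threshold and signal are made up for illustration, and the corner points are the ones from the example. With rule = 2 the end values are held constant outside the corner range instead of raising an error.

```r
# Corner points of the threshold line, taken from the example above
corner_y <- c(0, 2, 3, 4)   # independent variable (vertical axis)
corner_x <- c(3, 5, 3, 5)   # threshold position at each corner

# approxfun() returns a vectorised linear interpolator; rule = 2 holds the
# end values constant outside the corner range instead of returning NA
threshold <- approxfun(corner_y, corner_x, rule = 2)

set.seed(123)
signal <- data.frame(x = runif(1000, 0, 6), y = runif(1000, 0, 4))

# 1 if the point lies to the right of the line, 0 otherwise
signal$outside <- as.integer(signal$x > threshold(signal$y))
```

Because the comparison is vectorised, it handles the whole column in one call and avoids the per-element overhead of mapply.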
The data I have generated is as follows:
library(MASS)  # for mvrnorm()
set.seed(100)
n = 100
c1_prob = 0.8
X = matrix(0, nrow = n, ncol = 2)
y = matrix(0,nrow=n,ncol=1)
for (i in 1:n){
if(runif(1) < c1_prob){
X[i,] = mvrnorm(1,mu=c(2,2),Sigma=matrix(c(1,0,0,1),2,2))
y[i] = 1;
} else {
X[i,] = mvrnorm(1,mu=c(-2,-2),Sigma=matrix(c(1,0,0,1),2,2))
y[i] = 0;
}
}
I want to plot X and then colour the points using the class labels of 1 or 0 from y. I tried to create a dataframe merging X and y, then plot the first two columns and colour them based on the third column (originally y).
df = data.frame(cbind(X,y))
plot(df$X1, df$X2, col = df$X3)
This is not working however, and I was wondering if there is another way to do this. Specifically, is there a way to plot the data that does not require me to merge the two matrices into a dataframe. Thanks
Try ggplot2; this is the general format. If you make a data frame and name its columns, then instead of X, Y just type in the names of your columns.
library(ggplot2)
ggplot(your_data, aes(X, Y, color = Y)) + geom_point()
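A base R alternative that does not need a merged data frame could look like the sketch below. It assumes the likely cause of the problem is that colour 0 is the background colour in base graphics, so the points labelled 0 are drawn but invisible. To stay self-contained it generates similar data with rnorm() instead of mvrnorm(), which is equivalent here only because the covariance matrix is the identity. The colours and the names centres and point_cols are arbitrary.

```r
set.seed(100)
n <- 100

# Class labels: 1 with probability 0.8, otherwise 0
y <- as.integer(runif(n) < 0.8)

# With an identity covariance matrix, independent rnorm() draws are
# equivalent to mvrnorm(); class 1 is centred at (2, 2), class 0 at (-2, -2)
centres <- ifelse(y == 1, 2, -2)
X <- cbind(rnorm(n, mean = centres), rnorm(n, mean = centres))

# Map the labels to explicit colours rather than passing 0/1 to col
point_cols <- ifelse(y == 1, "blue", "red")

plot(X[, 1], X[, 2], col = point_cols, xlab = "X1", ylab = "X2")
legend("topleft", legend = c("y = 1", "y = 0"),
       col = c("blue", "red"), pch = 1)
```

The colour vector only needs to be in the same order as the rows of X, so the matrix and the labels can stay separate objects.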
I have the following code, which creates a plot; the data is located here
Data
data<-lidar
x<-lidar$range
y<-lidar$logratio
h<-20
par(mfrow=c(2,2))
r<-max(x)-min(x)
bn<-ceiling(r/h)
binwidth=c(5,10,30,100)
#Creates a matrix to handle the data of same length
W<-matrix(nrow=length(x),ncol=bn)
for (j in 1:bn){
for (i in 1:length(x)){
if (x[i]>=(min(x)+(j-1)*h) && x[i]<=(min(x)+(j)*h)){W[i,j]=1}
else {W[i,j]=0}
}
}
#Sets up the y-values of the bins
fit<-rep(0,bn)
for (j in 1:bn){
fit[j]<- sum(y*W[,j]/sum(W[,j]))
}
#Sets up the x values of the bins
t<-numeric(bn)
for (j in 1:bn){
t[j]=(min(x)+0.5*h)+(j-1)*h
}
plot(x,y)
lines(t,fit,type = "S", col = 1, lwd = 2)
This creates a single plot in the left corner of a page since I have
par(mfrow=c(2,2))
Is there a way to write a for loop that plots 4 graphs on that one page, using h values of 5, 10, 30 and 100 (the values stored in the variable binwidth), so that I don't have to change h manually every time I want a new plot?
Essentially I want to run the code 4 times with different values of h using another for statement that plots all 4 results without me changing h all the time. Any help or hints are greatly appreciated.
Here's a fully reproducible example that loads the data directly from the URL, then uses the apply family to iterate through the different plots:
lidar <- read.table(paste0("http://www.stat.cmu.edu/%7Elarry",
"/all-of-nonpar/=data/lidar.dat"),
header = TRUE)
par(mfrow = c(2, 2))
breaks <- lapply(c(5, 10, 30, 100), function(i) {
val <- seq(min(lidar$range), max(lidar$range), i)
c(val, max(val) + i)})
means <- lapply(breaks, function(i) {
vals <- tapply(lidar$logratio,
cut(lidar$range, breaks = i, include.lowest = TRUE), mean)
c(vals[1], vals)})
invisible(mapply(function(a, b) {
plot(lidar$range, lidar$logratio)
lines(a, b, type = "S", lwd = 2)
}, breaks, means))
Created on 2020-09-25 by the reprex package (v0.3.0)
To answer your question directly, keep the other parameters the same:
data<-lidar
x<-lidar$range
y<-lidar$logratio
h<-20
par(mfrow=c(2,2))
r<-max(x)-min(x)
bn<-ceiling(r/h)
binwidth=c(5,10,30,100)
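Wrap the plotting code in a function (not necessary, but good practice):
doplot = function(h){
#Recompute the number of bins for this h (the global bn was computed for h = 20)
bn<-ceiling(r/h)
#Creates a matrix to handle the data of same length
W<-matrix(nrow=length(x),ncol=bn)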
for (j in 1:bn){
for (i in 1:length(x)){
if (x[i]>=(min(x)+(j-1)*h) && x[i]<=(min(x)+(j)*h)){W[i,j]=1}
else {W[i,j]=0}
}
}
#Sets up the y-values of the bins
fit<-rep(0,bn)
for (j in 1:bn){
fit[j]<- sum(y*W[,j]/sum(W[,j]))
}
#Sets up the x values of the bins
t<-numeric(bn)
for (j in 1:bn){
t[j]=(min(x)+0.5*h)+(j-1)*h
}
plot(x,y)
lines(t,fit,type = "S", col = 1, lwd = 2)
}
and then loop over the h parameter:
for(h in c(5,10,30,100)){
doplot(h)
}
A general comment: you could gain a lot by learning how to use data.frames, a bit of dplyr or data.table, and ggplot2 for this. I feel that you could replicate your entire code and plots in 10 more comprehensible lines.
Put simply, I want to color outliers, but only if they belong to specific category, i.e. I want
boxplot(mydata[,2:3], col=c("chartreuse","gold"), outcol="red")
but with red only for those elements for which mydata[,1] is M.
It appears that outcol only specifies one color per variable (box). However, you can use points to overplot individual points any way that you want. You need to figure out the relevant x and y coordinates to use for plotting. When you make a boxplot with a statement like boxplot(mydata[,2:3]) the first variable (column 2) is plotted at x=1 and the second variable (column 3) is plotted at x=2. By capturing the return value of boxplot you can figure out the y values. Since you do not provide any data, I will illustrate with randomly generated data.
## Data
set.seed(42)
NumPts = 400
a = rnorm(NumPts)
b = rnorm(NumPts)
c = rnorm(NumPts)
CAT = sample(c("M", "N"), NumPts, replace=T)
mydata = data.frame(a,b,c, CAT)
## Find outliers
BP = boxplot(mydata[,2:3], col=c("chartreuse","gold"))
OUT2 = which(mydata[,2] %in% BP$out)
OUT3 = which(mydata[,3] %in% BP$out)
## Find outliers with category == M
M_OUT2 = OUT2[which(mydata$CAT[OUT2] == "M")]
M_OUT3 = OUT3[which(mydata$CAT[OUT3] == "M")]
## Plot desired points
points(rep(1, length(M_OUT2)),mydata[M_OUT2, 2], col="red")
points(rep(2, length(M_OUT3)),mydata[M_OUT3, 3], col="red")
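The value returned by boxplot() also has a group component recording which box each outlier belongs to, so the two lookups can be kept separate per box. The sketch below is a variation on the code above under that approach. The data frame is made up, with the category in column 1 as in the question, and m_out is a hypothetical name for the list of highlighted values.

```r
set.seed(42)
num_pts <- 400
mydata <- data.frame(
  CAT = sample(c("M", "N"), num_pts, replace = TRUE),
  b = rnorm(num_pts),
  c = rnorm(num_pts)
)

BP <- boxplot(mydata[, 2:3], col = c("chartreuse", "gold"))

# For each box g, keep only the outliers that belong to that box
# (BP$group == g) and whose row has category "M", then overplot them in red
m_out <- lapply(seq_along(BP$names), function(g) {
  vals <- mydata[[BP$names[g]]]
  keep <- vals %in% BP$out[BP$group == g] & mydata$CAT == "M"
  points(rep(g, sum(keep)), vals[keep], col = "red")
  vals[keep]
})
```

Filtering by BP$group avoids matching a value in one column against an outlier that was reported for the other column.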
I have a data frame with 3 columns of data that I would like to plot separately - 3 plots. The data has NA in it (in different places within the 3 columns). I basically want to interpolate the missing values and plot that segment of the line (multiple sections) in red and the remainder of the line black.
I have managed to use 'zoo' to create the interpolated data but am unsure how to then plot these data points in a different colour. I found the question "Elegant way to select the color for a particular segment of a line plot?", but was thinking I could use a for loop with an if/else statement to create the colour column as advised there. I would need 3 separate colour columns as I have 3 datasets.
Appreciate any help - cannot really provide an example as I'm unsure where to start! Thanks
This is my solution. It assumes that the NAs are still present in the original data. These will be omitted in the first plot() command. The function then loops over just the NAs.
You will probably get finer control if you take the plot() command out of the function. As written, "..." gets passed to plot() and a type = "b" graph is mimicked - but it's trivial to change it to whatever you want.
# Function to plot interpolated values in specified colours.
PlotIntrps <- function(exxes, wyes, int_wyes, int_pt = "red", int_ln = "grey",
goodcol = "darkgreen", ptch = 20, ...) {
plot(exxes, wyes, type = "b", col = goodcol, pch = ptch, ...)
nas <- which(is.na(wyes))
enn <- length(wyes)
for (idx in seq_along(nas)) {
points(exxes[nas[idx]], int_wyes[idx], col = int_pt, pch = ptch)
lines(
x = c(exxes[max(nas[idx] - 1, 1)], exxes[nas[idx]],
exxes[min(nas[idx] + 1, enn)]),
y = c(wyes[max(nas[idx] - 1, 1)], int_wyes[idx],
wyes[min(nas[idx] + 1, enn)]),
col = int_ln, type = "c")
# Only needed if you have 2 (or more) contiguous NAs (interpolations)
wyes[nas[idx]] <- int_wyes[idx]
}
}
# Dummy data (jitter() for some noise)
x_data <- 1:12
y_data <- jitter(c(12, 11, NA, 9:7, NA, NA, 4:1), factor = 3)
interpolations <- c(10, 6, 5)
PlotIntrps(exxes = x_data, wyes = y_data, int_wyes = interpolations,
main = "Interpolations in pretty colours!",
ylab = "Didn't manage to get all of these")
Cheers.
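If the interpolated values are not already at hand, they could be computed with base R's approx(), as in the sketch below. This is linear interpolation over the non-missing points, used here as a stand-in for whatever the zoo call produced. The y values are the dummy data above without the jitter, and the names missing and y_filled are made up for illustration.

```r
x_data <- 1:12
y_data <- c(12, 11, NA, 9, 8, 7, NA, NA, 4, 3, 2, 1)
missing <- is.na(y_data)

# Linear interpolation at the missing x positions, using only the
# observed points as input
interpolations <- approx(x_data[!missing], y_data[!missing],
                         xout = x_data[missing])$y
interpolations
# 10 6 5

# Fill the gaps and colour the interpolated points differently
y_filled <- y_data
y_filled[missing] <- interpolations
plot(x_data, y_filled, type = "l")
points(x_data, y_filled, col = ifelse(missing, "red", "black"), pch = 20)
```

The logical vector missing doubles as the colour indicator, so no loop is needed to build a colour column, and the same three lines can be repeated for each of the 3 data columns.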