When plotting the ratio between two variables, their relative order is often of no concern, yet depending on which variable is in the numerator, its relative size is constrained either to (0,1) or (1, Inf), which is somewhat unintuitive and breaks symmetry. I want to plot ratios "symmetrically", without resorting to symmetric log-scale, by having a y-axis that goes like 1/4, 1/3, 1/2, 1, 2, 3, 4 or, equivalently, 4^-1, 3^-1, 2^-1, 1, 2, 3, 4 in regular intervals. I've come up with the following:
symmult <- function(x){
isf <- is.finite(x) & (x>0)
xf <- x[isf]
xf <- ifelse(xf>=1,
xf-1,
1-(1/xf))
x[isf] <- xf
x[!isf] <- NA
x[!is.finite(x)] <- NA
return(x)
}
symmultinv <- function(x){
isf <- is.finite(x)
xf <- x[isf]
xf <- ifelse(x[isf]>=0,
x[isf]+1,
-1/(x[isf]-1))
x[isf] <- xf
x[!isf] <- NA
x[!is.finite(x)] <- NA
return(x)
}
sym_mult_trans = function(){trans_new("sym_mult", symmult, symmultinv )}
x <- c(-4:-2, 1:4)
x[x<1] <- 1/abs(x[x<1])
ggplot() +
geom_point(aes(x=x, y=x)) +
scale_y_continuous(trans="sym_mult")
The transformation works, but I cannot get the axis labels etc. to work for any 0<x<1, without setting them manually. Any help would be greatly appreciated.
You can create bespoke 'breaks' and 'format' functions that you can use inside trans_new (or pass to scale_y_continuous directly via its breaks and labels parameters).
For the breaks function, remember it will take as input a length-two numeric vector representing the range of the y axis. You must then convert this to a number of appropriate breaks. Here, if the minimum of the range is less than one, we take its reciprocal, find the pretty breaks between one and that number, then take the reciprocal of the output. We concatenate that onto pretty breaks between 1 and our range maximum:
# Define breaks function
symmult_breaks <- function(x) {
c(1 / extended_breaks(5)(c(1/x[x < 1], 1)),
extended_breaks(5)(c(1, x[x >= 1])))
}
For the labelling function, remember, it needs to take as input the vector of numbers produced by our breaks function. We can paste a 1/ in front of the reciprocal of numbers less than one, but leave numbers of 1 or more unaltered:
# Define labelling function
symmult_labs <- function(x) {
labs <- character(length(x))
labs[x >= 1] <- as.character(x[x >= 1])
labs[x < 1] <- paste("1", as.character(1/x[x < 1]), sep = "/")
labs
}
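As a quick illustration of what the labelling function returns, here is a minimal sketch in base R. The function body is copied from above so the snippet runs on its own; the input vector and the name `example_labels` are made up for illustration.

```r
# Copy of the labelling function above, repeated so the snippet is self-contained
symmult_labs <- function(x) {
  labs <- character(length(x))
  labs[x >= 1] <- as.character(x[x >= 1])
  labs[x < 1] <- paste("1", as.character(1 / x[x < 1]), sep = "/")
  labs
}

# Breaks below 1 are shown as fractions, breaks of 1 or more are left unchanged
example_labels <- symmult_labs(c(0.25, 0.5, 1, 2, 4))
print(example_labels)
# [1] "1/4" "1/2" "1"   "2"   "4"
```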
So your full new transformation becomes:
# Use our four functions to define the whole transformation:
sym_mult_trans <- function() {
trans_new(name = "sym_mult",
transform = symmult,
inverse = symmultinv,
breaks = symmult_breaks,
format = symmult_labs)
}
And your plot becomes:
ggplot() +
geom_point(aes(x = x, y = x)) +
scale_y_continuous(trans = "sym_mult")
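To check that the transformation really spaces 1/4, 1/3, 1/2, 1, 2, 3, 4 at regular intervals, a small base R sketch can help. The two functions below are compact restatements of the question's `symmult` and `symmultinv`, assumed equivalent for finite positive input only (they skip the NA handling).

```r
# Compact restatement of the transformation pair (finite x > 0 assumed)
symmult <- function(x) ifelse(x >= 1, x - 1, 1 - 1 / x)
symmultinv <- function(y) ifelse(y >= 0, y + 1, -1 / (y - 1))

ratios <- c(1/4, 1/3, 1/2, 1, 2, 3, 4)
transformed <- symmult(ratios)
print(transformed)  # evenly spaced: -3 -2 -1 0 1 2 3 (up to floating-point error)

# The inverse should recover the original ratios
recovered <- symmultinv(transformed)
stopifnot(isTRUE(all.equal(recovered, ratios)))
```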
I'm going to depict a continuous graph on R using ggplot2 library, which is based on piecewise function.
fun1 <- function(x){
ifelse(x<=-2, -1.5*(-x)^2, NA)
}
fun2 <- function(x){
ifelse(x>-2, 6*x-5, NA)
}
gg <- ggplot(data.frame(x = c(-5, 5)),
aes(x = x))+
stat_function(fun = fun1, n=1001)+
stat_function(fun = fun2, n=1001)
print(gg)
This shows the correct graph of the piecewise function.
But if I change the conditions slightly, to ifelse(x<-2, -1.5*(-x)^2, NA) and ifelse(x>=-2, 6*x-5, NA), the first part of the graph is suddenly no longer drawn correctly.
It seems that the ifelse function now returns -6 for every value of x, regardless of whether it is less than -2 or not.
But why does this change happen? I tried to draw the curve using the standard plot function but still got the same result...
This is what happens when you don't leave spaces between operators. As well as making your code less readable, you can sometimes confuse the parser. In your case, the parser interprets x<-2 as x <- 2, i.e. "assign 2 to x", not x < -2, i.e. "x is less than minus 2".
When you use an assignment as the first argument of ifelse, it assesses the "truthiness" of the assigned value, and since 2 is a non-zero number, it always evaluates to TRUE. Since x is 2 by the time the clause is evaluated, your output will be -1.5 * (-2)^2, i.e. -6, for all input values of the function.
If you use spaces (or parentheses), this doesn't happen:
fun1 <- function(x){
ifelse(x < -2, -1.5*(-x)^2, NA)
}
fun2 <- function(x){
ifelse(x >= -2, 6*x-5, NA)
}
gg <- ggplot(data.frame(x = c(-5, 5)),
aes(x = x))+
stat_function(fun = fun1, n=1001)+
stat_function(fun = fun2, n=1001)
print(gg)
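The parsing difference described above can be inspected directly. This is a small sketch using base R's `quote()`, which captures an expression without evaluating it; the variable names are only illustrative.

```r
# quote() captures the parsed expression without evaluating it
no_space   <- quote(x<-2)
with_space <- quote(x < -2)

# The first element of a call is the function being called
print(no_space[[1]])    # `<-` : an assignment
print(with_space[[1]])  # `<`  : a comparison

# Parentheses also remove the ambiguity
with_parens <- quote(x<(-2))
print(with_parens[[1]]) # `<`
```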
I need to scan nearly a million datapoints and determine if they lie under or above a threshold. I have the threshold defined globally and I have a simple predefined function
function.lower.penalty <- function(i, j) {
  if (i < j) {
    # if gate condition is met, flip the gate flag:
    n <- 1
  } else {
    n <- 0
  }
  return(n)
}
that I call with mapply, which will write a 0/1 flag column in my dataframe:
df[, paste0("outside.highpass")] <- mapply(function.lower.penalty,i="somesignal.found.in.df", j="*some.threshold.found.in.df*" )
This is pretty straightforward, I can flag dozens of signals with their respective thresholds like this in a second big dataframe. Also, given how the threshold is written, the code will either flag the signals as below/above the threshold (meaning I got also a function.higher.penalty).
Now I was asked to make a more complex threshold that has the shape of a multisegmented line.
What is the fastest way to flag datapoints given that you have only the corner points of the multisegmented line? (I can guess them from how the line was drawn.)
Until now I had a predefined threshold (gray 0.2) and used mapply to scan the signal drawn on the x-axis. I just used a function to return 0 or 1 if the datapoint was smaller or bigger than the threshold. Now I need a multisegmented line like the one drawn in red to do the same job.
Edit: Using the suggestion from det I was able to flag datapoints in the dataframe. However, it seems that some datapoints close to the defined line are wrongly assigned. I am wondering how to work around this, or whether it is a drawing error?
You can create a function that returns a piecewise linear function based on points:
library(tidyverse)  # between(), tribble(), tibble(), %>%, mutate() and ggplot() are used below
picewiseLinear <- function(x.var, y.var){
stopifnot(length(x.var) == length(y.var), sum(duplicated(x.var)) == 0)
p <- order(x.var)
x.var <- x.var[p]
y.var <- y.var[p]
k <- diff(y.var) / diff(x.var)
l <- -1 * k * head(x.var, -1) + head(y.var, -1)
function(x){
ind <- findInterval(x, x.var)
if(!all(between(ind, 1, length(x.var) - 1))) stop("wrong input")
x * k[ind] + l[ind]
}
}
For example:
point_df <- tribble(
~x, ~y,
3, 0,
5, 2,
3, 3,
5, 4
)
f <- picewiseLinear(point_df$y, point_df$x)
(In your picture the line is a piecewise linear function with x as the dependent variable, which is why y and x are swapped in the call above.)
On an example dataset you get something like this:
set.seed(123)
tibble(
x = runif(1000, 0, 6),
y = runif(1000, 0, 4)
) %>%
mutate(color = ifelse(x > f(y), "red", "blue")) %>%
ggplot(aes(x, y)) +
geom_point(aes(color = color)) +
scale_color_identity() +
geom_path(data = point_df)
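For comparison, here is a hedged base R sketch of the same idea using `approx()` instead of the closure above. It is not the method from this answer, only an alternative under the same assumption that x is treated as a function of y; `threshold_at` and `flag_right_of_line` are hypothetical helper names, and the sample points are made up. Because everything is vectorised, no `mapply` call is needed.

```r
# Corner points of the threshold line (same values as point_df above)
corner_x <- c(3, 5, 3, 5)
corner_y <- c(0, 2, 3, 4)

# Hypothetical helper: threshold x-value at height y, by linear interpolation
threshold_at <- function(y) approx(x = corner_y, y = corner_x, xout = y)$y

# Vectorised flagging: 1 if the point lies to the right of the line, 0 otherwise
flag_right_of_line <- function(x, y) as.integer(x > threshold_at(y))

signal_x <- c(2.0, 4.5, 4.5, 3.2)
signal_y <- c(1.0, 1.0, 2.5, 3.0)
thresholds <- threshold_at(signal_y)
flags <- flag_right_of_line(signal_x, signal_y)
print(thresholds)  # 4 4 4 3
print(flags)       # 0 1 1 1
```

With the default settings of `approx()`, points whose y lies outside the range of the corner points get an NA threshold and therefore an NA flag, and points exactly on the line are flagged 0 because the comparison is strict.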
I have a function that uses matplot to plot some data. Data structure is like this:
test = data.frame(x = 1:10, a = 1:10, b = 11:20)
matplot(test[,-1])
matlines(test[,1], test[,-1])
So far so good. However, if there are missing values in the data set, then there are gaps in the resulting plot, and I would like to avoid those by connecting the edges of the gaps.
test$a[3:4] = NA
test$b[7] = NA
matplot(test[,-1])
matlines(test[,1], test[,-1])
In the real situation this is inside a function, the dimension of the matrix is bigger and the number of rows, columns and the position of the non-overlapping missing values may change between different calls, so I'd like to find a solution that could handle this in a flexible way. I also need to use matlines.
I was thinking of filling in the gaps with interpolated data, but maybe there is a better solution.
I came across this exact situation today, but I didn't want to interpolate values - I just wanted the lines to "span the gaps", so to speak. I came up with a solution that, in my opinion, is more elegant than interpolating, so I thought I'd post it even though the question is rather old.
The problem causing the gaps is that there are NAs between consecutive values. So my solution is to 'shift' the column values so that there are no NA gaps. For example, a column consisting of c(1,2,NA,NA,5) would become c(1,2,5,NA,NA). I do this with a function called shift_vec_na() in an apply() loop. The x values also need to be adjusted, so we can make the x values into a matrix using the same principle, but using the columns of the y matrix to determine which values to shift.
Here's the code for the functions:
# x -> vector
# bool -> boolean vector; must be same length as x. The values of x where bool
# is TRUE will be 'shifted' to the front of the vector, and the back of the
# vector will be all NA (i.e. the number of NAs in the resulting vector is
# sum(!bool))
# returns the 'shifted' vector (will be the same length as x)
shift_vec_na <- function(x, bool){
n <- sum(bool)
if(n < length(x)){
x[seq_len(n)] <- x[bool]  # seq_len() also handles n == 0 (an all-NA column)
x[(n + 1):length(x)] <- NA
}
return(x)
}
# x -> vector
# y -> matrix, where nrow(y) == length(x)
# returns a list of two elements ('x' and 'y') that contain the 'adjusted'
# values that can be used with 'matplot()'
adj_data_matplot <- function(x, y){
y2 <- apply(y, 2, function(col_i){
return(shift_vec_na(col_i, !is.na(col_i)))
})
x2 <- apply(y, 2, function(col_i){
return(shift_vec_na(x, !is.na(col_i)))
})
return(list(x = x2, y = y2))
}
Then, using the sample data:
test <- data.frame(x = 1:10, a = 1:10, b = 11:20)
test$a[3:4] <- NA
test$b[7] <- NA
lst <- adj_data_matplot(test[,1], test[,-1])
matplot(lst$x, lst$y, type = "b")
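To make the shifting step concrete, here is a minimal base R sketch on the single vector mentioned in the explanation. `shift_na_to_end` is a hypothetical one-liner with the same effect as `shift_vec_na(x, !is.na(x))`; it is not part of the functions above.

```r
# Hypothetical one-liner: move non-NA values to the front, pad the end with NA
shift_na_to_end <- function(x) c(x[!is.na(x)], rep(NA, sum(is.na(x))))

y <- c(1, 2, NA, NA, 5)
x <- 1:5

y_shifted <- shift_na_to_end(y)
# x positions are shifted using the NA pattern of y, so (x, y) pairs stay aligned
x_shifted <- c(x[!is.na(y)], rep(NA, sum(is.na(y))))

print(y_shifted)  # 1 2 5 NA NA
print(x_shifted)  # 1 2 5 NA NA
```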
You could use the na.interpolation function from the imputeTS package (called na_interpolation in newer versions of imputeTS):
test = data.frame(x = 1:10, a = 1:10, b = 11:20)
test$a[3:4] = NA
test$b[7] = NA
matplot(test[,-1])
matlines(test[,1], test[,-1])
library('imputeTS')
test <- na.interpolation(test, option = "linear")
matplot(test[,-1])
matlines(test[,1], test[,-1])
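If installing a package is not an option, linear interpolation can also be sketched with base R's `approx()`. This is an alternative to the imputeTS call, not a description of how that package works; `fill_gaps` is a hypothetical helper name.

```r
test <- data.frame(x = 1:10, a = 1:10, b = 11:20)
test$a[3:4] <- NA
test$b[7] <- NA

# Hypothetical helper: linearly interpolate the NAs of one column against x
fill_gaps <- function(col, x) {
  ok <- !is.na(col)
  approx(x[ok], col[ok], xout = x)$y
}

# Apply the helper to every column except x
filled <- test
filled[-1] <- lapply(test[-1], fill_gaps, x = test$x)
print(filled)
```

The filled columns can then be passed to matplot() and matlines() as before. With the default settings of `approx()`, NAs at the very start or end of a column stay NA, since there is nothing to interpolate between.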
I also had the same issue today. In my context I was not permitted to interpolate. Here is a minimal but sufficiently general working example of what I did. I hope it helps someone:
mymatplot <- function(data, main=NULL, xlab=NULL, ylab=NULL,...){
#graphical set up of the window
plot.new()
plot.window(xlim=c(1,ncol(data)), ylim=range(data, na.rm=TRUE))
mtext(text = xlab,side = 1, line = 3)
mtext(text = ylab,side = 2, line = 3)
mtext(text = main,side = 3, line = 0)
axis(1L)
axis(2L)
#plot the data
for(i in 1:nrow(data)){
nin.na <- !is.na(data[i,])
lines(x=which(nin.na), y=data[i,nin.na], col = i,...)
}
}
The core 'trick' is in x=which(nin.na). It aligns the data points of the line consistently with the indices of the x axis.
The lines
plot.new()
plot.window(xlim=c(1,ncol(data)), ylim=range(data, na.rm=TRUE))
mtext(text = xlab,side = 1, line = 3)
mtext(text = ylab,side = 2, line = 3)
mtext(text = main,side = 3, line = 0)
axis(1L)
axis(2L)
draw the graphical part of the window.
range(data, na.rm=TRUE) adapts the plot to a proper size being able to include all data points.
mtext(...) is used to label the axes and provides the main title. The axes themselves are drawn by the axis(...) command.
The following for-loop plots the data.
The signature of mymatplot includes the ... argument, so typical plot parameters such as lty, lwd, cex etc. can optionally be supplied. They are passed on to lines().
A last word on the choice of colors: they are up to your taste.
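The effect of `which(nin.na)` can be seen on a single made-up row. This is only an illustrative sketch; the variable names are not taken from the function above.

```r
row_values <- c(5, NA, 7, NA, 9)
not_na <- !is.na(row_values)

# x positions keep their original indices, so the line spans the gaps
x_pos <- which(not_na)
y_val <- row_values[not_na]

print(x_pos)  # 1 3 5
print(y_val)  # 5 7 9
```

Passing `x_pos` and `y_val` to lines() connects the points at positions 1, 3 and 5 directly, without inventing values at positions 2 and 4.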
I am trying to plot a figure, but all the columns (lines) are drawn on top of each other, so the plot is not readable. Below I simulate some data, show how I plot it, and show what I want.
I don't know how to simulate data like the example shown below, but here is what I do:
set.seed(1)
M <- matrix(rnorm(20),20,5)
x <- as.matrix(sort(runif(20, 5.0, 7.5)))
df <- as.data.frame(cbind(x,M))
After making the data frame, I will plot all columns versus the first one by melting it and using ggplot
require(ggplot2)
require(reshape)
dff <- melt(df , id.vars = 'V1')
b <- ggplot(dff, aes(V1,value)) + geom_line(aes(colour = variable))
I want a specific distance between the lines (in this case we have 6), something like below. One dimension is V1, the other is the column number. I don't care about the function, I just want the picture.
This solution uses rgl and produces this plot:
It uses this function that accepts 3 arguments:
df : a data.frame just like your 'M' above
x : a numeric vector (or a 1-column data.frame) for the x-axis
cols : (optional) a vector of colours to repeat. If missing, black lines are drawn
Here is the function:
nik_plot <- function(df, x, cols){
require(rgl)
# if x is a 1-column data.frame, convert it to a numeric vector
if (is.data.frame(x) && ncol(x)==1)
x <- as.numeric(x[, 1])
# prepare a vector of colors, one per column (i.e. per line drawn)
if (missing(cols))
cols <- rep_len("#000000", ncol(df))
else
cols <- rep_len(cols, ncol(df))
# initialize an empty 3D plot
plot3d(NA, xlim=range(x), ylim=c(1, ncol(df)-1), zlim=range(df), xlab="Mass/Charge (M/Z)", ylab="Time", zlab="Ion Spectra", box=FALSE)
# draw lines, silently
silence_please <- sapply(1:ncol(df), function(i) lines3d(x=x, y=i, z=df[, i], col=cols[i]))
}
Note that you can remove require(rgl) from the function and instead call library(rgl) somewhere in your script, e.g. at the beginning.
If you don't have rgl installed, then install.packages("rgl").
Black lines, the default, may produce some moiré effect, but a repeating color palette is worse. This may be brain-dependent. A single colour would also avoid introducing an artificial dimension (and a strong one).
An example below:
# black lines
nik_plot(M, x)
# as in the image above
nik_plot(M, x, "grey40")
# an unreadable rainbow
nik_plot(M, x, rainbow(12))
The 3D window can be navigated with the mouse.
Do you need something else?
EDIT
You can build your second plot with the function below. The range of your data is very large, and I think the whole idea of shifting every line upwards prevents having a y-axis with a reliable scale. Here I have normalized all signals (0 <= signal <= 1). The gap parameter can also be used to play with this. We could decouple the two behaviors, but I think it's nice this way. Try different values of gap and see the examples below.
df : a data.frame just like your 'M' above
x : a numeric vector (or a 1-column data.frame) for the x-axis
cols : (optional) a vector of colours to repeat. If missing, black lines are drawn
gap : gap factor between individual lines
more_gap_each: every n lines, a bigger gap is produced...
more_gap_relative: ... and will be gap x more_gap_relative wide
Here is the function:
nik_plot2D <- function(df, x, cols, gap=10, more_gap_each=1, more_gap_relative=0){
if (is.data.frame(x) && ncol(x)==1)
x <- as.numeric(x[, 1])
# we normalize ( 0 <= signal <= 1)
df <- df-min(df)
df <- (df/max(df))
# we prepare a vector of colors
if (missing(cols))
cols <- rep_len("#00000055", ncol(df))
else
cols <- rep_len(cols, ncol(df))
# we prepare gap handling. there is probably a more elegant way
gaps <- 1
for (i in 2:ncol(df))
gaps[i] <- gaps[i - 1] + 1/gap + ifelse((i %% more_gap_each) == 0, (1/gap)*more_gap_relative, 0)
# we initialize the plot
plot(NA, xlim=range(x), ylim=c(min(df), 1+max(gaps)), xlab="Time", ylab="", axes=FALSE, mar=rep(0, 4))
axis(1)
# finally, the lines
silent <- lapply(1:ncol(df), function(i) lines(x, df[, i] + gaps[i], col=cols[i]))
}
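To see how gap, more_gap_each and more_gap_relative interact, the vertical offsets can be computed on their own. The sketch below is a hypothetical vectorised helper (`line_offsets` is not part of the answer's code) that is meant to reproduce the values the for loop stores in `gaps`.

```r
# Hypothetical helper reproducing the offsets built by the for loop in nik_plot2D
line_offsets <- function(n_lines, gap = 10, more_gap_each = 1, more_gap_relative = 0) {
  i <- seq_len(n_lines)[-1]  # lines 2..n
  # regular step, plus an extra step on every more_gap_each-th line
  step <- 1 / gap + ifelse(i %% more_gap_each == 0,
                           (1 / gap) * more_gap_relative, 0)
  c(1, 1 + cumsum(step))
}

default_offsets <- line_offsets(5)
grouped_offsets <- line_offsets(5, more_gap_each = 2, more_gap_relative = 1)
print(default_offsets)  # 1.0 1.1 1.2 1.3 1.4
print(grouped_offsets)  # 1.0 1.2 1.3 1.5 1.6
```

A larger gap value therefore moves the lines closer together, since each step is 1/gap.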
We can use it with (default):
nik_plot2D(M, x) # gap=10
And you obtain this plot:
or:
nik_plot2D(M, x, 50)
or, with colors:
nik_plot2D(M, x, gap=20, cols=1:3)
nik_plot2D(M, x, gap=20, cols=rep(1:3, each=5))
or, still with colours but with larger gaps:
nik_plot2D(M, x, gap=20, cols=terrain.colors(10), more_gap_each = 1, more_gap_relative = 0) # no gap by default
nik_plot2D(M, x, gap=20, cols=terrain.colors(10), more_gap_each = 10, more_gap_relative = 4) # large gaps every 10 lines
nik_plot2D(M, x, gap=20, cols=terrain.colors(10), more_gap_each = 5, more_gap_relative = 2) # small gaps every 5 lines
As others have pointed out, your data have very large peaks and it's not clear whether you want to allow some curves to overlap,
m <- read.table("~/Downloads/M.txt", head=T)
fudge <- 0.05
shifty <- function(m, fudge=1){
shifts <- fudge * max(abs(apply(m, 2, diff))) * seq(0, ncol(m)-1)
m + matrix(shifts, nrow=nrow(m), ncol=ncol(m), byrow=TRUE)
}
par(mfrow=c(1,2), mar=c(0,0,1,0))
cols <- colorRampPalette(blues9[4:9])(ncol(m))
matplot(shifty(m), t="l", lty=1, bty="n", yaxt="n", xaxt="n", ylab="", col=cols)
title("no overlap")
matplot(shifty(m, 0.05), t="l", lty=1, bty="n", yaxt="n", xaxt="n", ylab="", col=cols)
title("some overlap")
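The effect of fudge is easier to see on a tiny matrix. The sketch below repeats `shifty` so it runs on its own and applies it to made-up data (`toy` is not the asker's file).

```r
# Same function as above, repeated so the snippet is self-contained
shifty <- function(m, fudge = 1) {
  shifts <- fudge * max(abs(apply(m, 2, diff))) * seq(0, ncol(m) - 1)
  m + matrix(shifts, nrow = nrow(m), ncol = ncol(m), byrow = TRUE)
}

toy <- matrix(c(0, 1, 0,
                0, 2, 0), nrow = 3)  # two curves, one per column

no_overlap <- shifty(toy)         # second column raised by 2, the largest jump
some_overlap <- shifty(toy, 0.5)  # second column raised by 1, so the curves touch
print(no_overlap)
print(some_overlap)
```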
Alternatively, some outlier/peak detection scheme could be used to filter them out before calculating the shift between curves,
library(outliers)
shifty2 <- function(m, outliers = 10){
tmp <- m
for(ii in seq_len(outliers)) tmp <- rm.outlier(tmp, median = TRUE)
shifts <- max(abs(apply(tmp, 2, diff))) * seq(0, ncol(m)-1)
m + matrix(shifts, nrow=nrow(m), ncol=ncol(m), byrow=TRUE)
}
matplot(shifty2(m), t="l", lty=1, bty="n", yaxt="n", xaxt="n", ylab="", col=cols)
(there are probably good algorithms to decide which points to remove, but I don't know them)
For 3D plotting I prefer the rgl package. This should be close to your desired solution.
The color of each scan changes on every third one.
library(rgl)
M<-read.table("M.txt", sep="\t", header = TRUE, colClasses = "numeric")
x<-read.table("x.txt", sep="\t", header = TRUE)
n<-ncol(M)
M[M<1]<-1
plot3d(x='', xlim=range(x$Time), ylim=c(1, n), zlim=(range(M)), box=FALSE)
sapply(seq(1,n), function(t){lines3d(x$Time, y=t*10, z=(M[,t])/10000, col=t/3+1)})
title3d(xlab="scan", ylab="time", zlab="intensity")
title3d(main ="Extracted Spectra Subset")
axes3d()
#axis3d(edge="x")
#axis3d(edge="y")
#axis3d(edge="z")
Due to the huge differences in magnitude of the data points, I needed to scale some factors to make a readable graph. The intensity goes from 0 to nearly 1,000,000, thus distorting the graph. I attempted to normalize by taking the ln, but the plot became unreadable.
I need to modify the scale of the y-axis in a ggplot2 graphic : I want to express the y-axis in thousands and not in units. For example, the labels have to be 0 ; 1,000 ; 2,000 ; 3,000 instead of 0 ; 1000000 ; 2000000 ; 3000000.
Please, don't tell me to divide my data by 1000 !
My question is the same as ggplot2 axis transformation by constant factor. But the solution provided there modifies the labels parameter of the scale_y_continuous function, whereas I need this parameter to be set to comma. With that solution I get the following breaks: 0 ; 1000 ; 2000 ; 3000 ... Breaks are expressed in thousands and not in millions, which is good, but I lose the comma labels. I want to see 1,000 ; 2,000 ; 3,000 and not 1000 ; 2000 ; 3000...
So modifying the labels parameter of the scale_y_continuous function isn't useful. That's why I think I have to work with the trans parameter of the scale_y_continuous function instead of the labels parameter.
There are a lot of built-in transformations in the scales package that match the trans parameter and solve similar problems (look at log_trans for example). So I tried to build my own homothetic transformation, with the code below.
library(ggplot2)
var0 <- c(1:100)
var1 <- 1000000*rnorm(100)
homothetic_breaks<- function (n = 5, base = 1000)
{
function(x) {
rng <- (range(x, na.rm = TRUE)/base)
min <- floor(rng[1])
max <- ceiling(rng[2])
if (max == min)
return(base*min)
by <- floor((max - min)/n) + 1
base*seq(min, max, by = by)
}
}
homothetic_trans <- function(base = 1000) {
trans <- function(x) x/base
inv <- function(x) x*base
trans_new(paste0("diviseur_par_", format(base)), trans, inv,
homothetic_breaks(base=base), domain = c(-Inf, Inf))
}
data <- data.frame(var0,var1)
p <- ggplot(data,aes(var0,var1))+geom_path()
p <- p + scale_y_continuous(trans=homothetic_trans,labels = comma)
p
When I run this code I get the following message:
"Error: Input to str_c should be atomic vectors", and the breaks of the y axis are the same as the ones I get when I run the following code:
library(ggplot2)
var1 <- 1000*rnorm(100)
var0 <- c(1:100)
data <- data.frame(var0,var1)
p <- ggplot(data,aes(var0,var1))+geom_path()
p
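One possible direction, sketched here in base R only and not tested against the plot above, is a single labelling function that both rescales to thousands and inserts the comma separators, so the data stay untouched. The name `thousands_label` is hypothetical.

```r
# Hypothetical labelling function: rescale to thousands, then add comma separators
thousands_label <- function(x) {
  format(x / 1000, big.mark = ",", scientific = FALSE, trim = TRUE)
}

example_labels <- thousands_label(c(0, 1e6, 2e6, 3e6))
print(example_labels)
# [1] "0"     "1,000" "2,000" "3,000"
```

Under that assumption it could be supplied as scale_y_continuous(labels = thousands_label), which uses the labels parameter rather than trans, so it sidesteps the custom transformation instead of repairing it.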