R: Change Vector Output to Several Ranges - r

I am using Jenks Natural Breaks via the BAMMtools package to segment my data in RStudio Version 1.0.153. The output is a vector that shows where the natural breaks occur in my data set, as such:
[1] 14999 41689 58415 79454 110184 200746
I would like to take the output above and create the ranges inferred by the breaks. Ex: 14999-41689, 41690-58415, 58416-79454, 79455-110184, 110185-200746
Are there any functions that I can use in R Studio to accomplish this? Thank you in advance!

Input data
x <- c(14999, 41689, 58415, 79454, 110184, 200746)
If you want the ranges as characters you can do
y <- x; y[1] <- y[1] - 1 # First range given in question doesn't follow the pattern. Adjusting for that
paste(head(y, -1) + 1, tail(y, -1), sep = '-')
#[1] "14999-41689" "41690-58415" "58416-79454" "79455-110184" "110185-200746"
If you want a list of the actual sets of numbers in each range you can do
seqs <- Map(seq, head(y, -1) + 1, tail(y, -1))

You can definitely create your own function that produces the exact output you're looking for, but you can use the cut function that will give you something like this:
# example vector
x = c(14999, 41689, 58415, 79454, 110184, 200746)
# use the vector and its values as breaks
ranges = cut(x, x, dig.lab = 6)
# see the levels
levels(ranges)
#[1] "(14999,41689]" "(41689,58415]" "(58415,79454]" "(79454,110184]" "(110184,200746]"

Related

Missing ticks in custom axis transformation

When plotting the ratio between two variables, their relative order is often of no concern, yet depending on which variable is in the numerator, its relative size is constrained either to (0,1) or (1, Inf), which is somewhat unintuitive and breaks symmetry. I want to plot ratios "symmetrically", without resorting to symmetric log-scale, by having a y-axis that goes like 1/4, 1/3, 1/2, 1, 2, 3, 4 or, equivalently, 4^-1, 3^-1, 2^-1, 1, 2, 3, 4 in regular intervals. I've come up with the following:
symmult <- function(x){
isf <- is.finite(x) & (x>0)
xf <- x[isf]
xf <- ifelse(xf>=1,
xf-1,
1-(1/xf))
x[isf] <- xf
x[!isf] <- NA
x[!is.finite(x)] <- NA
return(x)
}
symmultinv <- function(x){
isf <- is.finite(x)
xf <- x[isf]
xf <- ifelse(x[isf]>=0,
x[isf]+1,
-1/(x[isf]-1))
x[isf] <- xf
x[!isf] <- NA
x[!is.finite(x)] <- NA
return(x)
}
sym_mult_trans = function(){trans_new("sym_mult", symmult, symmultinv )}
x <- c(-4:-2, 1:4)
x[x<1] <- 1/abs(x[x<1])
ggplot() +
geom_point(aes(x=x, y=x)) +
scale_y_continuous(trans="sym_mult")
The transformation works, but I cannot get the axis labels etc. to work for any 0<x<1, without setting them manually. Any help would be greatly appreciated.
You can create bespoke 'breaks' and 'format' functions that you can use inside trans_new (or pass to scale_y_continuous directly via its breaks and labels parameters).
For the breaks function, remember it will take as input a length-two numeric vector representing the range of the y axis. You must then convert this to a number of appropriate breaks. Here, if the minimum of the range is less than one, we take its reciprocal, find the pretty breaks between one and that number, then take the reciprocal of the output. We concatenate that onto pretty breaks between 1 and our range maximum:
# Define breaks function
symmult_breaks <- function(x) {
c(1 / extended_breaks(5)(c(1/x[x < 1], 1)),
extended_breaks(5)(c(1, x[x >= 1])))
}
For the labelling function, remember, it needs to take as input the vector of numbers produced by our breaks function. We can paste a 1/ in front of the reciprocal of numbers less than one, but leave numbers of 1 or more unaltered:
# Define labelling function
symmult_labs <- function(x) {
labs <- character(length(x))
labs[x >= 1] <- as.character(x[x >= 1])
labs[x < 1] <- paste("1", as.character(1/x[x < 1]), sep = "/")
labs
}
So your full new transformation becomes:
# Use our four functions to define the whole transformation:
sym_mult_trans <- function() {
trans_new(name = "sym_mult",
transform = symmult,
inverse = symmultinv,
breaks = symmult_breaks,
format = symmult_labs)
}
And your plot becomes:
ggplot() +
geom_point(aes(x = x, y = x)) +
scale_y_continuous(trans = "sym_mult")

Is there an R function to find local mins and maxes?

Is there a function in R that can take a vector as input and output a vector with local mins and maxes, and where they occur in the original vector?
Let v be a plain vector and define a peak as an element which is strictly larger than the elements to either side and similarly for trough.
1) rollapply Then the following gives two logical vectors each having the same length as v. One indicates positions of peaks using TRUE and FALSE for other positions and the other indicates troughs in the same manner. which(peaks) and which(troughs) can be used to get the index numbers if that representation is preferred.
library(zoo)
peaks <- rollapply(v, 3, function(x) x[2] > max(x[-2]), fill = FALSE)
troughs <- rollapply(v, 3, function(x) x[2] < min(x[-2]), fill = FALSE)
We could combine them like this where each output component is 1 if it is a peak, -1 if it is a trough and 0 otherwise.
extreme <- function(x) (x[2] > max(x[-2])) - (x[2] < min(x[-2]))
rollapply(v, 3, extreme, fill = FALSE)
2) Base R A base R method would be:
prev <- c(NA, v[-length(v)])
post <- c(v[-1], NA)
(v > pmax(prev, post)) - (v < pmin(prev, post))

How to avoid gaps due to missing values in matplot in R?

I have a function that uses matplot to plot some data. Data structure is like this:
test = data.frame(x = 1:10, a = 1:10, b = 11:20)
matplot(test[,-1])
matlines(test[,1], test[,-1])
So far so good. However, if there are missing values in the data set, then there are gaps in the resulting plot, and I would like to avoid those by connecting the edges of the gaps.
test$a[3:4] = NA
test$b[7] = NA
matplot(test[,-1])
matlines(test[,1], test[,-1])
In the real situation this is inside a function, the dimension of the matrix is bigger and the number of rows, columns and the position of the non-overlapping missing values may change between different calls, so I'd like to find a solution that could handle this in a flexible way. I also need to use matlines
I was thinking maybe filling in the gaps with intrapolated data, but maybe there is a better solution.
I came across this exact situation today, but I didn't want to interpolate values - I just wanted the lines to "span the gaps", so to speak. I came up with a solution that, in my opinion, is more elegant than interpolating, so I thought I'd post it even though the question is rather old.
The problem causing the gaps is that there are NAs between consecutive values. So my solution is to 'shift' the column values so that there are no NA gaps. For example, a column consisting of c(1,2,NA,NA,5) would become c(1,2,5,NA,NA). I do this with a function called shift_vec_na() in an apply() loop. The x values also need to be adjusted, so we can make the x values into a matrix using the same principle, but using the columns of the y matrix to determine which values to shift.
Here's the code for the functions:
# x -> vector
# bool -> boolean vector; must be same length as x. The values of x where bool
# is TRUE will be 'shifted' to the front of the vector, and the back of the
# vector will be all NA (i.e. the number of NAs in the resulting vector is
# sum(!bool))
# returns the 'shifted' vector (will be the same length as x)
shift_vec_na <- function(x, bool){
n <- sum(bool)
if(n < length(x)){
x[1:n] <- x[bool]
x[(n + 1):length(x)] <- NA
}
return(x)
}
# x -> vector
# y -> matrix, where nrow(y) == length(x)
# returns a list of two elements ('x' and 'y') that contain the 'adjusted'
# values that can be used with 'matplot()'
adj_data_matplot <- function(x, y){
y2 <- apply(y, 2, function(col_i){
return(shift_vec_na(col_i, !is.na(col_i)))
})
x2 <- apply(y, 2, function(col_i){
return(shift_vec_na(x, !is.na(col_i)))
})
return(list(x = x2, y = y2))
}
Then, using the sample data:
test <- data.frame(x = 1:10, a = 1:10, b = 11:20)
test$a[3:4] <- NA
test$b[7] <- NA
lst <- adj_data_matplot(test[,1], test[,-1])
matplot(lst$x, lst$y, type = "b")
You could use the na.interpolation function from the imputeTS package:
test = data.frame(x = 1:10, a = 1:10, b = 11:20)
test$a[3:4] = NA
test$b[7] = NA
matplot(test[,-1])
matlines(test[,1], test[,-1])
library('imputeTS')
test <- na.interpolation(test, option = "linear")
matplot(test[,-1])
matlines(test[,1], test[,-1])
Had also the same issue today. In my context I was not permitted to interpolate. I am providing here a minimal, but sufficiently general working example of what I did. I hope it helps someone:
mymatplot <- function(data, main=NULL, xlab=NULL, ylab=NULL,...){
#graphical set up of the window
plot.new()
plot.window(xlim=c(1,ncol(data)), ylim=range(data, na.rm=TRUE))
mtext(text = xlab,side = 1, line = 3)
mtext(text = ylab,side = 2, line = 3)
mtext(text = main,side = 3, line = 0)
axis(1L)
axis(2L)
#plot the data
for(i in 1:nrow(data)){
nin.na <- !is.na(data[i,])
lines(x=which(nin.na), y=data[i,nin.na], col = i,...)
}
}
The core 'trick' is in x=which(nin.na). It aligns the data points of the line consistently with the indices of the x axis.
The lines
plot.new()
plot.window(xlim=c(1,ncol(data)), ylim=range(data, na.rm=TRUE))
mtext(text = xlab,side = 1, line = 3)
mtext(text = ylab,side = 2, line = 3)
mtext(text = main,side = 3, line = 0)
axis(1L)
axis(2L)`
draw the graphical part of the window.
range(data, na.rm=TRUE) adapts the plot to a proper size being able to include all data points.
mtext(...) is used to label the axes and provides the main title. The axes themselves are drawn by the axis(...) command.
The following for-loop plots the data.
The function head of mymatplot provides the ... argument for an optional passage of typical plot parameters as lty, lwt, cex etc. via . Those will be passed on to the lines.
At last word on the choice of colors - they are up to your flavor.

Multiply unique pairs of values in a vector and sum the result

I want to multiply and then sum the unique pairs of a vector, excluding pairs made of the same element, such that for c(1:4):
(1*2) + (1*3) + (1*4) + (2*3) + (2*4) + (3*4) == 35
The following code works for the example above:
x <- c(1:4)
bar <- NULL
for( i in 1:length(x)) { bar <- c( bar, i * c((i+1) : length(x)))}
sum(bar[ 1 : (length(bar) - 2)])
However, my actual data is a vector of rational numbers, not integers, so the (i+1) portion of the loop will not work. Is there a way to look at the next element of the set after i, e.g. j, so that I could write i * c((j : length(x))?
I understand that for loops are usually not the most efficient approach, but I could not think of how to accomplish this via apply etc. Examples of that would be welcome, too. Thanks for your help.
An alternative to a loop would be to use combn and multiply the combinations using the FUN argument. Then sum the result:
sum(combn(x = 1:4, m = 2, FUN = function(x) x[1] * x[2]))
# [1] 35
Even better to use prod in FUN, as suggested by #bgoldst:
sum(combn(x = 1:4, m = 2, FUN = prod))

Select an existing variable by using readline

I'd like to allow the user of my script to pick an existing object (a vector).
I thought something like this
...
message("Select a vector of y values")
nwd <- readLines(n = 1)
return(mean(nwd))
...
but the result is NA because nwd is seen as a character.
How can I solve?
Thanks.
A bit safer than eval(parse(...)):
x <- 1:10
message("Select a vector of y values")
nwd <- readLines(n = 1)
#input x
mean(get(nwd))
#[1] 5.5

Resources