Is there a function in R that can take a vector as input and output a vector with local mins and maxes, and where they occur in the original vector?
Let v be a plain vector, and define a peak as an element that is strictly larger than the elements on either side (and similarly a trough as one strictly smaller).
1) rollapply The following gives two logical vectors, each the same length as v: one marks peak positions with TRUE (and FALSE elsewhere), the other marks troughs in the same manner. which(peaks) and which(troughs) can be used to get the index numbers if that representation is preferred.
library(zoo)
peaks <- rollapply(v, 3, function(x) x[2] > max(x[-2]), fill = FALSE)
troughs <- rollapply(v, 3, function(x) x[2] < min(x[-2]), fill = FALSE)
We could combine them like this where each output component is 1 if it is a peak, -1 if it is a trough and 0 otherwise.
extreme <- function(x) (x[2] > max(x[-2])) - (x[2] < min(x[-2]))
rollapply(v, 3, extreme, fill = FALSE)
2) Base R A base R method would be:
prev <- c(NA, v[-length(v)])
post <- c(v[-1], NA)
(v > pmax(prev, post)) - (v < pmin(prev, post))
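As a quick check of the base R version, here is a small worked example on an arbitrary sample vector:

```r
v <- c(1, 3, 2, 5, 4)

prev <- c(NA, v[-length(v)])
post <- c(v[-1], NA)
res <- (v > pmax(prev, post)) - (v < pmin(prev, post))
res
# [1] NA  1 -1  1 NA
```

The endpoints are NA because they only have one neighbour; which(res == 1) and which(res == -1) recover the peak and trough positions (2 and 4, and 3, here).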
I have a matrix similar to this:
m <- matrix(rnorm(100 * 50), 100, 50)
and I'd like to change all the values row by row, so that any value above twice its row's standard deviation becomes 1, and every other value becomes 0 (basically a threshold).
I tried something like this:
cutoff <- function(x){
  threshold <- 2 * sd(x)  # compute once, so the first replacement doesn't change sd(x)
  x[x < threshold] <- 0
  x[x > threshold] <- 1
  return(x)
}
mT <- apply(m, 1, cutoff)
but it's giving me something different. Any help would be very appreciated.
Your code is correct; you just need to transpose the result, as apply builds its output column by column, so row-wise results come back transposed (see Why apply() returns a transposed xts matrix?).
mT <- t(apply(m, 1, cutoff))
You can also reduce the cutoff function to -
cutoff <- function(x){
  as.integer(x > 2*sd(x))
}
x > 2*sd(x) returns logical values (TRUE/FALSE); converting them to integer turns TRUE into 1 and FALSE into 0.
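To see the whole pipeline at work, here is a tiny example; the rows are made-up values chosen so that exactly one entry per row exceeds twice the row's standard deviation:

```r
cutoff <- function(x){
  as.integer(x > 2*sd(x))
}

m <- rbind(c(0, 0, 0, 0, 10),   # 2*sd is about 8.94, so only the 10 passes
           c(5, 0, 0, 0, 0))    # 2*sd is about 4.47, so only the 5 passes
mT <- t(apply(m, 1, cutoff))
mT
#      [,1] [,2] [,3] [,4] [,5]
# [1,]    0    0    0    0    1
# [2,]    1    0    0    0    0
```

Without the outer t(), the result would be a 5x2 matrix instead of 2x5.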
Will this work:
t(apply(m, 1, function(x) +(x > 2*sd(x))))
We can use dapply from collapse, which works row-wise without transposing and is very efficient:
library(collapse)
dapply(m, cutoff, MARGIN = 1)
I am using Jenks Natural Breaks via the BAMMtools package to segment my data in RStudio Version 1.0.153. The output is a vector that shows where the natural breaks occur in my data set, as such:
[1] 14999 41689 58415 79454 110184 200746
I would like to take the output above and create the ranges inferred by the breaks. Ex: 14999-41689, 41690-58415, 58416-79454, 79455-110184, 110185-200746
Are there any functions that I can use in R Studio to accomplish this? Thank you in advance!
Input data
x <- c(14999, 41689, 58415, 79454, 110184, 200746)
If you want the ranges as characters you can do
y <- x; y[1] <- y[1] - 1 # First range given in question doesn't follow the pattern. Adjusting for that
paste(head(y, -1) + 1, tail(y, -1), sep = '-')
#[1] "14999-41689" "41690-58415" "58416-79454" "79455-110184" "110185-200746"
If you want a list of the actual sets of numbers in each range you can do
seqs <- Map(seq, head(y, -1) + 1, tail(y, -1))
You can definitely create your own function that produces the exact output you're looking for, but the cut function will give you something like this:
# example vector
x = c(14999, 41689, 58415, 79454, 110184, 200746)
# use the vector and its values as breaks
ranges = cut(x, x, dig.lab = 6)
# see the levels
levels(ranges)
#[1] "(14999,41689]" "(41689,58415]" "(58415,79454]" "(79454,110184]" "(110184,200746]"
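A side benefit of the cut representation is that new values can be labelled with their range directly (the values 20000 and 150000 below are hypothetical examples, not from the question):

```r
x <- c(14999, 41689, 58415, 79454, 110184, 200746)
new_vals <- c(20000, 150000)
as.character(cut(new_vals, breaks = x, dig.lab = 6))
# [1] "(14999,41689]"   "(110184,200746]"
```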
Let's say I have a vector full of zeros:
x <- rep(0, 100)
I want to set the values in certain ranges to 1:
starts <- seq(10, 90, 10)
stops <- starts + round(runif(length(starts), 1, 5))
I can do this with a for loop:
for(i in seq_along(starts)) x[starts[i]:stops[i]] <- 1
But I know this is frowned upon in R. How can I do this in a vectorized way, ideally without an external package?
You can use Map() to get all of the indices, Reduce(union, ...) to drop that list down to an atomic vector of the unique indices and then [<- or replace() to replace.
replace(x, Reduce(union, Map(":", starts, stops)), 1L)
Or
x[Reduce(union, Map(":", starts, stops))] <- 1L
Additionally, for() loops are not necessarily "frowned upon" in R. It depends on the situation. Many times for() loops turn out to be the most efficient route.
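To convince yourself the two approaches agree, here is a sketch comparing the loop with the replace() version on the question's example data (the seed is fixed only because stops is random):

```r
set.seed(1)
x <- rep(0, 100)
starts <- seq(10, 90, 10)
stops <- starts + round(runif(length(starts), 1, 5))

# loop version
x_loop <- x
for (i in seq_along(starts)) x_loop[starts[i]:stops[i]] <- 1

# vectorized version
x_vec <- replace(x, Reduce(union, Map(":", starts, stops)), 1L)

identical(x_loop, x_vec)
# [1] TRUE
```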
A solution that uses apply:
x[unlist(apply(cbind(starts, stops), 1, function(x) x[[1]]:x[[2]]))] <- 1
If the stops were deterministic (equivalent to stops <- starts + 5, rather than the random offsets above), you could avoid the loop entirely with modular arithmetic:
starts <- seq(10, 90, 1)
change_index <- starts[starts %% 10 <= 5]
x[change_index] <- 1
I want to multiply and then sum the unique pairs of a vector, excluding pairs made of the same element, such that for c(1:4):
(1*2) + (1*3) + (1*4) + (2*3) + (2*4) + (3*4) == 35
The following code works for the example above:
x <- c(1:4)
bar <- NULL
for( i in 1:length(x)) { bar <- c( bar, i * c((i+1) : length(x)))}
sum(bar[ 1 : (length(bar) - 2)])
However, my actual data is a vector of rational numbers, not integers, so the (i+1) portion of the loop will not work. Is there a way to look at the next element of the set after i, e.g. j, so that I could write i * c(j : length(x))?
I understand that for loops are usually not the most efficient approach, but I could not think of how to accomplish this via apply etc. Examples of that would be welcome, too. Thanks for your help.
An alternative to a loop would be to use combn and multiply the combinations using the FUN argument. Then sum the result:
sum(combn(x = 1:4, m = 2, FUN = function(x) x[1] * x[2]))
# [1] 35
Even better to use prod in FUN, as suggested by #bgoldst:
sum(combn(x = 1:4, m = 2, FUN = prod))
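For long vectors there is also a closed form worth knowing: since (sum(x))^2 = sum(x^2) + 2 * (sum of all pairwise products), the pairwise-product sum is (sum(x)^2 - sum(x^2)) / 2, and this works equally well for non-integer x:

```r
x <- 1:4
(sum(x)^2 - sum(x^2)) / 2
# [1] 35
```

This matches sum(combn(x, 2, FUN = prod)) without enumerating the pairs.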
In a large dataframe (1 million+ rows), I am counting the number of elements (rows) that are within a particular range and satisfy a third criteria. I have 33 of those ranges and use a very slow for loop to get me the answer, no problem.
As speed is of massive concern, I would appreciate any help to get this to run faster. Can I get rid of the for loop and "vectorise" or any sort of "apply" solution?
Thanks in advance
Code:
N.data<-c(1:33)
Lower<-c(0,100000,125000,150000,175000,200000,225000,250000,275000,300000,325000,350000,375000,400000,425000,450000,475000,500000,550000,600000,650000,700000,750000,800000,850000,900000,950000,1000000,1100000,1200000,1300000,1400000,1500000)
Upper<-c(100000,125000,150000,175000,200000,225000,250000,275000,300000,325000,350000,375000,400000,425000,450000,475000,500000,550000,600000,650000,700000,750000,800000,850000,900000,950000,1000000,1100000,1200000,1300000,1400000,1500000, 5000000)
for (i in 1:length(N.data)) {
  N.data[i] <- nrow(dataset[dataset$Z == c & dataset$X > Lower[i] & dataset$X < Upper[i], ])
}
A more efficient approach:
# first logical index (vector)
idx1 <- dataset$Z == c
# second logical index (matrix)
idx2 <- mapply(function(l, u) dataset$X > l & dataset$X < u, Lower, Upper)
# combine both indices and count number of rows
N.data <- colSums(idx1 & idx2)
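On a small mock dataset (the sizes and values below are made up purely for illustration), the loop and the mapply version agree:

```r
set.seed(42)
dataset <- data.frame(X = runif(1000, 0, 2e6),
                      Z = sample(1:3, 1000, replace = TRUE))
c <- 2  # the question uses c as a variable name; R still finds the function c()
Lower <- c(0,   5e5, 1e6)
Upper <- c(5e5, 1e6, 2e6)

# original loop
N.loop <- integer(length(Lower))
for (i in seq_along(Lower)) {
  N.loop[i] <- nrow(dataset[dataset$Z == c &
                            dataset$X > Lower[i] &
                            dataset$X < Upper[i], ])
}

# vectorized version: one logical vector, one logical matrix, then column sums
idx1 <- dataset$Z == c
idx2 <- mapply(function(l, u) dataset$X > l & dataset$X < u, Lower, Upper)
N.vec <- colSums(idx1 & idx2)

all(N.loop == N.vec)
# [1] TRUE
```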
apply functions are not vectorized; they are essentially loops under the hood. To achieve what you seek using vectorization, here is one approach.
# Create a dummy dataset and breaks
dataset = data.frame(
  X = rpois(100, 10),
  Z = rpois(100, 20)
)
breaks = seq(0, max(dataset$X), length = 5)
# Add a column assigning each X to its break interval
dataset = transform(dataset, X2 = cut(X, breaks, labels = FALSE))
# Use aggregate to compute the count for each value of X2
c = 20
aggregate(X ~ X2, data = dataset, length, subset = (Z == c))
This should be more efficient than using mapply, as cut assigns every row to its range in a single vectorized pass instead of testing each range separately.