Missing last value in seq() in R

I have this example data
by<-200
to<-seq(from=by,to=35280,by=by)
Problem is that to ends at 35200 and ignores the last 80, which I need to include as the last value.
Is there any straightforward way to achieve it?
I have tried the along.with and length.out parameters, but I cannot get it to work.

You can place if statement for the last element of the vector such as in the following function :
seqlast <- function(from, to, by) {
  vec <- do.call(what = seq, args = list(from, to, by))
  # append `to` only if the regular sequence stopped short of it
  if (tail(vec, 1) != to) {
    return(c(vec, to))
  } else {
    return(vec)
  }
}
Then
by <- 200
to <- seqlast(from=by,to=35280,by=by)
will return
> head(to)
[1] 200 400 600 800 1000 1200
> tail(to)
[1] 34400 34600 34800 35000 35200 35280
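With a non-integer by, floating-point rounding can leave the last generated value a hair away from to (for example 0.1 + 3 * 0.3 is not exactly 1 in double precision), so the exact != test in seqlast() may append a near-duplicate of to. A minimal sketch that compares within a tolerance instead; seqlast_tol and its tol default are hypothetical names, not part of base R:

```r
# Hypothetical variant of seqlast(): treat `to` as reached when the last
# generated value is within `tol` of it (absorbs floating-point error).
seqlast_tol <- function(from, to, by, tol = sqrt(.Machine$double.eps)) {
  vec <- seq(from = from, to = to, by = by)
  last <- length(vec)
  if (abs(vec[last] - to) <= tol) {
    vec[last] <- to   # snap the near-miss onto `to` exactly
    vec
  } else {
    c(vec, to)        # genuinely short of `to`: append it
  }
}

tail(seqlast_tol(200, 35280, 200))
seqlast_tol(0.1, 1, 0.3)
```

The tolerance is absolute, which assumes values of moderate magnitude; for very large numbers a relative tolerance would be the safer choice.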

In seq(), "The second form generates from, from+by, ..., up to the sequence value less than or equal to to." Since 35280 is not in the requested sequence, it is not returned.
But you can use a calculation in the arguments if you want to include the next value. Since you know the to value, assign it a name and use it.
by <- 200
out <- 35280
x <- seq(from = by, to = (out + by - out %% by), by = by)
length(x)
# [1] 177
x[length(x)]
# [1] 35400
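The modulus formula above assumes from is itself a multiple of by (here from = by). A hedged generalization, assuming (to - from) / by is computed without troublesome rounding for your inputs (safe for integers), counts the steps needed to reach or pass to with ceiling(); seq_cover is a hypothetical name:

```r
# Hypothetical helper: extend the end point to the first grid value
# (from, from + by, ...) that reaches or passes `to`.
seq_cover <- function(from, to, by) {
  n_steps <- ceiling((to - from) / by)   # steps needed to cover `to`
  seq(from = from, to = from + n_steps * by, by = by)
}

x <- seq_cover(200, 35280, 200)
length(x)        # 177
x[length(x)]     # 35400
seq_cover(0, -20, -6)
```

Because the step count is derived from to - from, the same call also works when from is not a multiple of by, and for negative by.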
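If you want to include the to value, even if it is not in the requested sequence, you can write a little function to add it back on:
seqil <- function(..., include.last = TRUE) {
  args <- list(...)
  x <- do.call(seq.default, args)
  # append args$to unless the sequence already ends on it (assumes to= is passed by name)
  if (include.last && tail(x, 1) != args$to) c(x, args$to) else x
}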
by <- 200
x <- seqil(from = by, to = 35280, by = by)
tail(x)
# [1] 34400 34600 34800 35000 35200 35280

First of all, seq() is behaving as it should in your example. You want something that seq() by itself will simply not deliver.
One solution (there are certainly many) is to check whether "there was anything left" at the end of your sequence and, if so, add another element to it. (Or modify the last element of your sequence; it is not exactly clear what you are trying to achieve.) Like so:
step <- 200
end <- 35280
to<-seq(from=step,to=end,by=step)
# the modulus of your end point by step
m = end%%step
# if not zero, you have something to add to your sequence
if(m != 0) {
# add the end point
to = c(to, end)
}
tail(to,2)
# [1] 35200 35280

Whilst not solving the exact issue raised, my preferred solution is to extend the sequence by one more step so that the to value falls within its range, rather than just appending the to value at the end.
This builds on the answers by both @djas and @Etienne Kintzler.
seq0 <- function(from = 1, to = 1, by = 1, incLast = TRUE){
  out <- do.call(what = seq, args = list(from, to, by))
  # add one more step if `to` is not reached exactly when stepping from `from`
  if (incLast && (to - from) %% by != 0){
    out <- c(out, tail(out, 1) + by)
  }
  return(out)
}
Example outputs:
> seq0(from = 0, to = 20, by = 6, incLast = FALSE)
[1] 0 6 12 18
> seq0(from = 0, to = 20, by = 6, incLast = TRUE)
[1] 0 6 12 18 24
> seq0(from = 0, to = -20, by = -6, incLast = FALSE)
[1] 0 -6 -12 -18
> seq0(from = 0, to = -20, by = -6, incLast = TRUE)
[1] 0 -6 -12 -18 -24

This is a modification of Rich Scriven's answer that does not add an extra number when it is not necessary (as with 20):
by <- 2
out <- 20  # with out <- 21 the end point is extended to 22
x <- seq(from = 0, to = out + by*ifelse(out %% by>0,1,0) - out %% by , by = by)
x
# [1]  0  2  4  6  8 10 12 14 16 18 20

You can use c() to include the last number.
by<-200
c(seq(from=by,to=35280,by=by), 35280)
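Appending with c() always adds the value, so if to already lies on the grid (say to = 35200) it ends up in the vector twice. Wrapping the call in unique() is one hedged way to guard against that, relying on seq() with a nonzero by never producing repeated values itself:

```r
by <- 200
# unique() drops the appended end point when seq() already produced it
x1 <- unique(c(seq(from = by, to = 35280, by = by), 35280))
x2 <- unique(c(seq(from = by, to = 35200, by = by), 35200))
tail(x1, 2)   # 35200 35280
tail(x2, 2)   # 35000 35200
```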

Related

Randomly remove some numeric data from a matrix in R?

I have a large data matrix with many numeric values (counts) in it. I would like to remove 10% of all counts. So, for example, a matrix which looks like this:
30 10
0 20
The sum of all counts here is 60. 10% of 60 is 6. So I want to randomly remove 6. A correct output could be:
29 6
0 19
(As you can see, it removed 1 from 30, 4 from 10 and 1 from 20.) There cannot be negative values.
How could I program this in R?
Here is a way. It subtracts 1 from randomly chosen positive matrix elements until the total to remove has been reached.
subtract_int <- function(X, n){
  inx <- which(X != 0, arr.ind = TRUE)   # positions of the non-zero cells
  N <- nrow(inx)
  while(n > 0){
    i <- sample(N, 1)
    if(X[ inx[i, , drop = FALSE] ] > 0){
      X[ inx[i, , drop = FALSE] ] <- X[ inx[i, , drop = FALSE] ] - 1
      n <- n - 1
    }
    # refresh the candidate cells once any of them has dropped to zero
    if(any(X[inx] == 0)){
      inx <- which(X != 0, arr.ind = TRUE)
      N <- nrow(inx)
    }
  }
  X
}
set.seed(2021)
to_remove <- round(sum(A)*0.10)
subtract_int(A, to_remove)
# [,1] [,2]
#[1,] 30 6
#[2,] 0 18
Data
A <- structure(c(30, 0, 10, 20), .Dim = c(2L, 2L))
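Another way to think about it, sketched here under the assumption that the matrix holds non-negative integer counts small enough to expand into memory: treat each count as an individual unit, draw n units without replacement, and subtract how many were drawn from each cell. Because draws are without replacement, no cell can go negative and exactly n counts are removed. remove_counts is a hypothetical name:

```r
# Hypothetical helper: remove n individual counts uniformly at random.
remove_counts <- function(X, n) {
  # one entry per count, labelled by the (linear) index of its cell
  units <- rep(seq_along(X), times = as.vector(X))
  stopifnot(n <= length(units))
  # sample.int() avoids sample()'s 1:x behaviour when length(units) == 1
  drawn <- units[sample.int(length(units), n)]
  # tabulate() counts draws per cell, with zeros for undrawn cells
  X - tabulate(drawn, nbins = length(X))
}

A <- structure(c(30, 0, 10, 20), .Dim = c(2L, 2L))
set.seed(2021)
B <- remove_counts(A, round(sum(A) * 0.10))
B
sum(A) - sum(B)   # 6
```

Expanding to one entry per count costs memory proportional to sum(X), so for very large counts a loop like the one above may be preferable.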
Maybe this helps you at least to get on the right track. It's nothing more than a draft though:
randomlyRemove <- function(matrix) {
  sum_mat <- sum(matrix)
  while (sum_mat > 0) {
    sum_mat <- sum_mat - runif(1, min = 0, max = sum_mat)
    x <- round(runif(1, 1, dim(matrix)[1]), digits = 0)
    y <- round(runif(1, 1, dim(matrix)[2]), digits = 0)
    matrix[x,y] <- matrix[x,y] - sum_mat
  }
  return(matrix)
}
You might want to play with the random number generator process to get more evenly distributed subtractions.
edit: added round(digits = 0) to get only integer (dimension) values and modified the random (dimension) value generation to start from 1 (not zero).
I think we can make it work using sample(). This solution is a lot more compact.
The data
A <- structure(c(30, 0, 11, 20), .Dim = c(2L, 2L))
sum(A)
#> [1] 61
The logic
UseThese <- which(A > 0) # Choose indices to be modified because > 0
Sample <- sample(UseThese, sum(A)*0.1, replace = TRUE) # Draw a sample of indices
A[UseThese] <- A[UseThese] - as.vector(table(factor(Sample, levels = UseThese))) # Count draws per index (0 for undrawn ones) and subtract
Check the result
A
#> [,1] [,2]
#> [1,] 28 8
#> [2,] 0 19
sum(A) # should be the value above minus 6
#> [1] 55
One disadvantage of this solution is that it could lead to negative values, so check with:
any(A < 0)
#> [1] FALSE

Centered moving average in R (without using packages)

I have been writing a function for a centered moving average in R (without using any packages) and have run into the challenge below.
As you know, the centered moving average incorporates the 'incomplete portions' (i.e. at the beginning and the end of the data). For example, consider the vector p below:
p <- c(10,20,30,40,50,60,70,80,90)
In this case, the centered moving average that I am interested in looks like this:
x <- ((10+20)/2, (10+20+30)/3, (20+30+40)/3 ..... (70+80+90)/3, (80+90)/2)
To achieve the above, I tried a function with if statements, as below
(wd means window size):
mov_avg <- function(p, wd) {
  x <- c(0, cumsum(p))
  if ((p > p[1])&(p < p[length(p)])) {
    neut <- 1:(length(p)-(wd-1))
    upper <- neut+(wd-1)
    x <- (x[upper]-x[neut])/(upper-neut)
  } else if (p==p[1]) {
    neut <- 0
    upper <- neut+3
    x <- (x[upper]-x[neut])/(upper-1-neut)
  } else if (p==p[length(p)]) {
    upper <-(length(p)+1)
    neut <- (length(p)-(wd-2))
    x <- (x[upper]-x[neut])/(upper-neut)
  }
  return(x)
}
Then I entered the line below to execute it:
mov_avg(p, 3)
I got the output and warnings below:
numeric(0)
Warning messages:
1: In if ((p > p[1]) & (p < p[length(p)])) { :
the condition has length > 1 and only the first element will be used
2: In if (p == p[1]) { :
the condition has length > 1 and only the first element will be used
Could someone help me out in making this a working function?
Thank you!
How about something like this in base R:
window <- 3
p <- c(10,20,30,40,50,60,70,80,90)
x <- c(NA, p, NA)
sapply(seq_along(x[-(1:(window - 1))]), function(i)
mean(x[seq(i, i + window - 1)], na.rm = T))
#[1] 15 20 30 40 50 60 70 80 85
The trick is to add flanking NAs and then use mean with na.rm = T.
I know you said "without using packages", but the same is even shorter using zoo::rollapply
library(zoo)
rollapply(c(NA, p, NA), 3, mean, na.rm = T)
#[1] 15 20 30 40 50 60 70 80 85
We could also use rowMeans
rowMeans(embed(c(NA, p, NA), 3)[, 3:1], na.rm = TRUE)
#[1] 15 20 30 40 50 60 70 80 85
Another method is to create a function that accepts a variable window size:
mov_avg <- function(p, window) {
mean_number = numeric()
index = 1
while(index < length(p)) {
if (index == 1 | index == length(p) - 1)
mean_number = c(mean_number, mean(p[index:(index + window - 2)]))
else
mean_number = c(mean_number, mean(p[index:(index + window - 1)]))
index = index + 1
}
mean_number
}
mov_avg(p, 3)
#[1] 15 30 40 50 60 70 80 85
mov_avg(p, 2)
#[1] 10 25 35 45 55 65 75 80
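Note that the mov_avg(p, 3) output above has eight values and skips 20, the mean of the first full window (10, 20, 30), because each window starts at index rather than being centred on it. A vectorised sketch in the spirit of the cumsum() idea from the question, which centres the window on every element and shrinks it at the edges; it assumes an odd window size, and mov_avg_centered is a hypothetical name:

```r
mov_avg_centered <- function(p, window) {
  stopifnot(window %% 2 == 1)        # centring needs an odd window
  n <- length(p)
  half <- (window - 1) %/% 2
  cs <- c(0, cumsum(p))              # cs[k + 1] is sum(p[1:k])
  lo <- pmax(1, seq_len(n) - half)   # clamp window start at the first element
  hi <- pmin(n, seq_len(n) + half)   # clamp window end at the last element
  (cs[hi + 1] - cs[lo]) / (hi - lo + 1)
}

p <- c(10, 20, 30, 40, 50, 60, 70, 80, 90)
mov_avg_centered(p, 3)
#[1] 15 20 30 40 50 60 70 80 85
mov_avg_centered(p, 5)
#[1] 20 25 30 40 50 60 70 75 80
```

Differences of cumulative sums avoid an explicit loop; for very long non-integer series they can accumulate a little rounding error compared with computing each mean directly.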
Take row means of a three-column matrix whose columns are x, x shifted down by one (with the mean of the first two elements in front), and x shifted up by one (with the mean of the last two elements at the end):
x <- p  # the data vector from the question
apply(matrix(c(x,
               c((x[1] + x[2]) / 2, head(x, -1)),
               c(tail(x, -1), sum(tail(x, 2)) / 2)),
             ncol = 3),
      1, mean)
#[1] 15 20 30 40 50 60 70 80 85

Represent a numeric vector as a set of characters using :

In R, you can define an arbitrary integer sequence using :, e.g.
a = c(1:3, 12:14)
print(a)
## 1 2 3 12 13 14
I'm looking for a way to do the inverse operation, e.g. given a vector of integers I want to produce a character (or character vector) that collapses the integer sequence(s) to the equivalent expressions using :, e.g.
some_function (a)
## "1:3" "12:14"
Bonus if the stride can be detected, e.g. c(1, 3, 5) becomes "1:2:5" or something like that.
Motivation: generate an integer sequence in R based on some data manipulation to identify database row selection, and pass the most concise representation of that sequence to an external program in the proper format.
We can use the rle of the differences to find the runs, then paste each run's range together along with its stride (the sequence distance).
fun <- function(s){
  m <- c(0, diff(s))
  b <- rle(m)
  b$values[b$lengths == 1 & b$values != 1] <- 0
  l <- cumsum(!inverse.rle(b))
  d <- function(x) paste0(range(x[, 1]),
                          collapse = paste0(":", unique(x[-1, -1]), ":"))
  f <- c(by(cbind(s, m), l, d))
  sub("::.*", "", sub(":1:", ":", f))
}
fun(c(1,1:3,12:14,c(1,3,5)))
1 2 3 4
"1" "1:3" "12:14" "1:2:5"
fun(c(1, 3, 5, 8:10, 14, 17, 20))
1 2 3
"1:2:5" "8:10" "14:3:20"
fun(1)
1
"1"
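Since the motivation is to hand these strings to another program, it may help to check that the encoding round-trips. A minimal decoder sketch for the "from:to" and "from:stride:to" strings produced above; decode_colon is a hypothetical name, and it assumes every piece is a plain number:

```r
decode_colon <- function(s) {
  pieces <- strsplit(s, ":", fixed = TRUE)
  unlist(lapply(pieces, function(p) {
    v <- as.numeric(p)
    if (length(v) == 1) v                        # single value
    else if (length(v) == 2) seq(v[1], v[2])     # from:to, stride 1
    else seq(v[1], v[3], by = v[2])              # from:stride:to
  }), use.names = FALSE)
}

decode_colon(c("1:2:5", "8:10", "14:3:20"))
```

With fun() from this answer, identical(decode_colon(fun(v)), v) should then hold for a numeric v such as c(1, 3, 5, 8:10, 14, 17, 20).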
Ah, nerd heaven. Here's a first shot. You could even use this for encoding within R.
Needs testing; code always prints the stride out.
encode_ranges <- function (x) {
rle_diff <- list(
start = x[1],
rled = rle(diff(x))
)
class(rle_diff) <- "rle_diff"
rle_diff
}
decode_ranges <- function (x) {
stopifnot(inherits(x, "rle_diff"))
cumsum(c(x$start, inverse.rle(x$rled)))
}
format.rle_diff <- function (x, ...) {
  stopifnot(inherits(x, "rle_diff"))
  output <- character(length(x$rled$values))
  start <- x$start
  for (j in seq_along(x$rled$values)) {
    stride <- x$rled$values[j]
    len <- x$rled$lengths[j]
    end <- start + stride * len
    # a run containing a single difference is only a jump between ranges
    if (len > 1L) {
      output[j] <- paste(start, end, stride, sep = ":")
    }
    # the next run starts where this one ended
    start <- end
  }
  output <- output[nchar(output) > 0]
  paste(output, collapse = ", ")
}
print.rle_diff <- function (x, ...) {
  cat(format(x, ...), "\n", sep = "")
  invisible(x)
}
encode_ranges(c(1:3, 12:14))
## 1:3:1, 12:14:1
encode_ranges(c(1, 3, 5, 8:10, 14, 17, 20))
## 1:5:2, 8:10:1, 14:20:3
We create a grouping variable with diff and cumsum, then use a group-by function (tapply) to paste the range of values in each group:
f1 <- function(vec) {
unname(tapply(vec, cumsum(c(TRUE, diff(vec) != 1)),
FUN = function(x) paste(range(x), collapse=":")))
}
f1(a)
#[1] "1:3" "12:14"
For the second case:
b <- c(1, 3, 5)
un1 <- unique(diff(b))
paste(b[1], un1, b[length(b)], sep=":")
#[1] "1:2:5"

Replacing the value in between vectors when there is a defined difference

I have a question about replacing values within a vector.
The algorithm should find the number to replace when a certain condition is met: in this case, the number that makes a difference of -20 with the previous number. So I prefer to use the diff function.
Here is what I mean
x <- c(20,20,0,20,0,5)
> diff(x)
[1] 0 -20 20 -20 5
So in this case 0 makes the difference -20 and I want to change those 0s to 20.
I know the easiest solution is to assign directly, x[3] <- 20 or x[5] <- 20.
However, the 0 location is always different so I need an automated process that can do that. Thanks!
EDIT
What if we need to do this in a grouped data.frame?
> df
x gr
1 20 1
2 20 1
3 0 1
4 20 1
5 0 1
6 5 1
7 33 2
8 0 2
9 20 2
10 0 2
11 20 2
12 0 2
How can we implement this ?
modify <- function(x){
value_search = c(0, 33)
value_replacement = c(20, 44)
for (k in 1:length(value_search)) {
index_position = which(x %in% value_search[k])
replacement = value_replacement[k]
for (i in index_position) {
x[i] = replacement
}
}
}
df%>%
group_by(gr)%>%
mutate(modif_x=modify(x))
Error in mutate_impl(.data, dots) :
Evaluation error: 'match' requires vector arguments.
You can do it using which to get the position, i.e.
x[which(diff(x) == -20)+1] <- 20
x
#[1] 20 20 20 20 20 5
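For the grouped data.frame in the edit, the same diff() rule can be applied within each group so that differences never cross group boundaries. A base-R sketch using ave(); fill_drop is a hypothetical helper name, and it assumes the replacement value is 20 as in the question:

```r
df <- data.frame(x  = c(20, 20, 0, 20, 0, 5, 33, 0, 20, 0, 20, 0),
                 gr = rep(1:2, each = 6))

# Hypothetical helper: set to 20 every element that is 20 lower
# than its predecessor within the same vector.
fill_drop <- function(v) {
  v[which(diff(v) == -20) + 1] <- 20
  v
}

# ave() calls fill_drop() once per group and returns results in row order
df$modif_x <- ave(df$x, df$gr, FUN = fill_drop)
df$modif_x
#[1] 20 20 20 20 20  5 33  0 20 20 20 20
```

The 0 in row 8 is kept because it follows 33 (a difference of -33). Since fill_drop() returns the modified vector, it should also work inside dplyr's mutate() after group_by(gr).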
If you want a generic way to replace values of a vector based on particular values, I would approach it this way:
x = c(20,20,0,20,0,5)
value_search = 0
value_replacement = 20
index_position = which(x %in% value_search)
for (i in index_position) {
x[i] = value_replacement
}
But this works for single values. If you want to look for multiple values, you can use a nested loop as below:
x = c(20,20,0,20,0,5,33)
value_search = c(0, 33)
value_replacement = c(20, 44)
for (k in 1:length(value_search)) {
index_position = which(x %in% value_search[k])
replacement = value_replacement[k]
for (i in index_position) {
x[i] = replacement
}
}
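The nested loop can also be replaced by a single vectorised lookup with match(), which returns, for each element of x, its position in value_search (NA if absent). A sketch using the same value_search / value_replacement pair:

```r
x <- c(20, 20, 0, 20, 0, 5, 33)
value_search <- c(0, 33)
value_replacement <- c(20, 44)

pos <- match(x, value_search)   # NA where x is not a searched value
hit <- !is.na(pos)
x[hit] <- value_replacement[pos[hit]]
x
#[1] 20 20 20 20 20  5 44
```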
In response to OP's edits:
There are any number of ways to do this:
x = c(20,20,0,20,0,5,33)
gr = c(1,1,1,1,2,2,2)
df = data.frame(x, gr)
func_replace <- function(source, value_search, value_replacement) {
  for (k in seq_along(value_search)) {
    # look up positions in `source`, not in a global x
    index_position <- which(source %in% value_search[k])
    source[index_position] <- value_replacement[k]
  } # for k loop
  return(source)
} # func_replace
value_search = c(0, 33)
value_replacement = c(20, 44)
gr_value = 1
df$replacement = with(df, ifelse(gr == gr_value, func_replace(x, value_search, value_replacement), NA))

pick a random number, always with increasing value over last random number picked

How would I efficiently go about taking a 1-by-1 ascending random sample of the values 1:n, making sure that each randomly sampled value is always higher than the previous value?
e.g.:
For the values 1:100, get a random number, say which is 61. (current list=61)
Then pick another number between 62 and 100, say which is 90 (current list=61,90)
Then pick another number between 91 and 100, say which is 100.
Stop the process as the max value has been hit (final list=61,90,100)
I have been stuck in loop land, thinking in this clunky manner:
a1 <- sample(1:100,1)
if(a1 < 100) {
  a2 <- sample((a1+1):100,1)
}
etc etc...
I want to report a final vector being the concatenation of a1,a2,a(n):
result <- c(a1,a2)
Even though this sounds like a homework question, it is not. I thankfully left the days of homework many years ago.
Coming late to the party, but I think this is gonna rock your world:
unique(cummax(sample.int(100)))
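Why this works: cummax() turns a random permutation into its running maximum, and unique() keeps each new maximum once. The first value of a random permutation is uniform over 1..n, and given the current maximum m, every larger value still lies ahead in uniformly random order, so the next maximum is uniform over m + 1, ..., n, which is the step-by-step process in the question. A minimal wrapper with n as a parameter (increasing_sample is a hypothetical name):

```r
increasing_sample <- function(n) {
  # running maxima of a random permutation of 1:n, each kept once
  unique(cummax(sample.int(n)))
}

set.seed(1)
r <- increasing_sample(100)
r
all(diff(r) > 0)   # TRUE: strictly increasing
tail(r, 1)         # 100: always ends at n
```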
This uses a while loop and is wrapped in a function
# from ?sample
resample <- function(x, ...) x[sample.int(length(x), ...)]
sample_z <- function(n){
  z <- numeric(n)
  new <- 0
  count <- 1
  while(new < n){
    from <- seq(new+1, n, by=1)
    new <- resample(from, size = 1)
    z[count] <- new
    if(new < n) count <- count+1
  }
  z[1:count]
}
set.seed(1234)
sample_z(100)
## [1] 12 67 88 96 100
Edit
Note the change to deal with the case when the new sample is 100, and with the way sample() treats a single number x (sampling from 1:x) as opposed to a vector.
Edit 2
Actually, reading the help for sample gave the useful resample function, which avoids the pitfalls when length(x) == 1.
Not particularly efficient but:
X <- 0
samps <- c()
while (X < 100) {
  if(is.null(samps)) {z <- 1 } else {z <- 1 + samps[length(samps)]}
  if (z == 100) {
    samps <- c(samps, z)
  } else {
    samps <- c(samps, sample(z:100, 1))
  }
  X <- samps[length(samps)]
}
samps
EDIT: Trimming a little fat from it:
samps <- c()
while (is.null(samps[length(samps)]) || samps[length(samps)] < 100 ) {
  if(is.null(samps)) {z <- 1 } else {z <- 1 + samps[length(samps)]}
  if (z == 100) {
    samps <- c(samps, z)
  } else {
    samps <- c(samps, sample(z:100, 1))
  }
}
samps
Even later to the party, but just for kicks:
X <- Y <- sample(100L)
while(length(X <- Y) != length(Y <- X[c(TRUE, diff(X)>0)])) {}
> print(X)
[1] 28 44 60 98 100
Sorting Random Vectors
Create a vector of random integers and sort it afterwards.
sort(sample(1:1000, size = 10, replace = FALSE),decreasing = FALSE)
Gives 10 random integers between 1 and 1000.
> sort(sample(1:1000, size = 10, replace = FALSE),decreasing = FALSE)
[1] 44 88 164 314 617 814 845 917 944 995
This of course also works with random decimal (floating-point) values.
