round but .5 should be floored - r

From R help function: Note that for rounding off a 5, the IEC 60559 standard is expected to be used, ‘go to the even digit’. Therefore round(0.5) is 0 and round(-1.5) is -2.
> round(0.5)
[1] 0
> round(1.5)
[1] 2
> round(2.5)
[1] 2
> round(3.5)
[1] 4
> round(4.5)
[1] 4
But I need all values ending with .5 to be rounded down. All other values should be rounded as they are by the round() function.
Example:
round(3.5) = 3
round(8.6) = 9
round(8.1) = 8
round(4.5) = 4
Is there a fast way to do it?

Per Dietrich Epp's comment, you can use the ceiling() function with an offset to get a fast, vectorized, correct solution:
round_down <- function(x) ceiling(x - 0.5)
round_down(seq(-2, 3, by = 0.5))
## [1] -2 -2 -1 -1 0 0 1 1 2 2 3
I think this is faster and much simpler than many of the other solutions shown here.
As noted by Carl Witthoft, this adds much more bias to your data than simple rounding. Compare:
mean(round_down(seq(-2, 3, by = 0.5)))
## [1] 0.2727273
mean(round(seq(-2, 3, by = 0.5)))
## [1] 0.4545455
mean(seq(-2, 3, by = 0.5))
## [1] 0.5
What is the application for such a rounding procedure?
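As a quick sanity check of the ceiling() answer above, the sketch below runs it on the question's own examples and on a few negative halves. The name round_half_down is only an illustrative alias for round_down and does not come from the original answers.

```r
# Illustrative alias for the ceiling-based answer above (the name is hypothetical)
round_half_down <- function(x) ceiling(x - 0.5)

# The examples from the question
q_res <- round_half_down(c(3.5, 8.6, 8.1, 4.5))
q_res
## [1] 3 9 8 4

# Negative halves move toward minus infinity, so "down" here means floor
neg_res <- round_half_down(c(-0.5, -1.5, -2.5))
neg_res
## [1] -1 -2 -3
```

Note that this differs from round() for every value ending in .5, not only for the ones round() would send up to an even digit.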

Check if the remainder of x %% 1 is equal to .5 and then floor or round the numbers:
x <- seq(1, 3, 0.1)
ifelse(x %% 1 == 0.5, floor(x), round(x))
## [1] 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 3 3 3 3 3

I'll join the circus too:
rndflr <- function(x) {
sel <- vapply(x - floor(x), function(y) isTRUE(all.equal(y, 0.5)), FUN.VALUE=logical(1))
x[sel] <- floor(x[sel])
x[!sel] <- round(x[!sel])
x
}
rndflr(c(3.5,8.6,8.1,4.5))
#[1] 3 9 8 4

This function works by finding elements whose decimal part equals 0.5 and adding a small negative number to them before rounding, which ensures that they are rounded downwards. (It relies, harmlessly but in a slightly obfuscated manner, on the fact that a Boolean vector in R is converted to a vector of 0's and 1's when multiplied by a numeric vector.)
f <- function(x) {
round(x - .1*(x%%1 == .5))
}
x <- c(0.5,1,1.5,2,2.5,2.01,2.99)
f(x)
[1] 0 1 1 2 2 2 3
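To make the Boolean-to-numeric coercion mentioned above visible, here is a small step-by-step sketch. The intermediate name is_half is introduced only for illustration.

```r
x <- c(0.5, 1, 1.5, 2.01)

# Logical mask: TRUE where the fractional part is exactly 0.5
is_half <- x %% 1 == 0.5
is_half
## [1]  TRUE FALSE  TRUE FALSE

# Multiplying by a number coerces TRUE/FALSE to 1/0,
# so only the halves get nudged down by 0.1
0.1 * is_half
## [1] 0.1 0.0 0.1 0.0

res <- round(x - 0.1 * is_half)
res
## [1] 0 1 1 2
```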

The function (not golfed) is very simple and checks whether the decimals that are left are .5 or less. In effect you could easily make it more useful and take 0.5 as an argument:
nice.round <- function(x, myLimit = 0.5) {
bX <- x
intX <- as.integer(x)
decimals <- x%%intX
if(is.na(decimals)) {
decimals <- 0
}
if(decimals <= myLimit) {
x <- floor(x)
} else {
x <- round(x)
}
if (bX > 0.5 & bX < 1) {
x <- 1
}
return(x)
}
Tests
Currently, this function does not work properly with values between 0.5 and 1.0.
> nice.round(1.5)
[1] 1
> nice.round(1.6)
[1] 2
> nice.round(10000.624541)
[1] 10001
> nice.round(0.4)
[1] 0
> nice.round(0.6)
[1] 1

Related

Limit rpois distribution?

I'm looking to use rpois, but I'm having trouble with out-of-range values causing errors in my model.
A toy example:
a <- 1
repeat{
if (a >= 1) {
x <- rpois(a, 1 * .15)
} else {
x <- 0
}
print(x)
if (a - x < 0){
break
}
}
Basically, I can't have a - x be negative. Is there a way to set a min/max limit on rpois?
You have found a solution in extraDistr::rtpois
This would have worked as a base R alternative:
rlimpois <- function(n, lambda, lowlimit, toplimit){
sample(x=lowlimit:toplimit, size=n,
prob=dpois(lowlimit:toplimit, lambda), replace=TRUE)
}
So, for example, a sample of size 20 from a Poisson distribution with parameter 2, where none of the values exceed 3, might be
set.seed(1)
rlimpois(20, 2, 0, 3)
# 1 2 2 0 1 0 0 3 2 1 1 1 3 2 3 2 3 0 2 3
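A rough way to convince yourself that this behaves like a truncated Poisson is to compare the empirical frequencies with the renormalized dpois probabilities. The sketch below repeats the sampler so it runs on its own; the seed, sample size, and variable names are arbitrary choices for illustration.

```r
# Same sampler as above, repeated so this snippet is self-contained
rlimpois <- function(n, lambda, lowlimit, toplimit) {
  support <- lowlimit:toplimit
  sample(x = support, size = n, prob = dpois(support, lambda), replace = TRUE)
}

set.seed(42)
draws <- rlimpois(1e4, lambda = 2, lowlimit = 0, toplimit = 3)

# Every draw stays inside the requested limits
range(draws)

# Empirical frequencies vs. Poisson probabilities renormalized over 0:3
target    <- dpois(0:3, 2) / sum(dpois(0:3, 2))
empirical <- as.vector(table(factor(draws, levels = 0:3))) / length(draws)
round(rbind(empirical, target), 3)
```

One caveat: sample() treats a single number x >= 1 as 1:x, so a call with lowlimit equal to toplimit would need special handling.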

loop and modular operation in R

I want to make such a loop using R.
for i=1 output will be
1
2
3
for i=2 output will be
2
3
1
for i=3 output will be
3
1
2
Namely the outputs are successive integers. It is just when the integer reaches 4 it returns 1 and goes on. I guess I must use modular operations, how can I do that?
If you have
a <- 1:3
for a value of i, you get the sequence with
f <- function(i) (a+i+1) %% length(a) +1
f(1)
# [1] 1 2 3
f(2)
# [1] 2 3 1
f(3)
# [1] 3 1 2
f(4)
# [1] 1 2 3
Note that it starts over again at 4
This is my solution:
f <- function(i) { x <- i:(i+2) %% 3; x[x==0] <- 3; x }
for (i in 1:5) print(f(i))
Here is a second solution:
r <- matrix(c(3,1,2, 1,2,3, 2,3,1),3)
for (i in 1:5) print(r[i %% 3 + 1,])
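The same modular idea can be written for any cycle length. This is only a sketch; the helper name cycle_from and its n argument are made up for illustration. Shifting to 0-based values before %% and back to 1-based afterwards avoids the special case for 0.

```r
# Hypothetical helper: cyclic sequence of length n starting at i
cycle_from <- function(i, n = 3) (i - 1 + 0:(n - 1)) %% n + 1

cycle_from(1)
## [1] 1 2 3
cycle_from(2)
## [1] 2 3 1
cycle_from(3)
## [1] 3 1 2
cycle_from(4, n = 5)
## [1] 4 5 1 2 3
```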

Generating a sequence in R

I am working with a list in R that looks like this
[1] 0 0 4 4 0 4 0
now suppose that I want to generate a list of numbers corresponding to this list like this
[[1]]
[1] 1 1
[[2]]
[1] 2 2
[[3]]
[1] 3 6
[[4]]
[1] 7 10
[[5]]
[1] 11 11
[[6]]
[1] 12 15
[[7]]
[1] 16 16
So in other words, the first list gives the groups of a list of a sequence of numbers from 1 to 16 and the second list gives the start and end of each group.
This is probably easier to see if you consider the sequence
1 2 3-6 7-10 11 12-15 16
Is there an easy way to do this? I think I could do it using some sort of global index and lapply but I wanted to see if there were an easier way.
Here's one way
# alternate input suggested by @MichaelChirico
d = c(0,0,3,3,0,3,0)
# my preferred approach
library(data.table) # version 1.9.5+
Map(c,
seq_along(d)+shift(cumsum(d), type="lag", fill=0),
seq_along(d)+cumsum(d)
)
A similar variation by @akrun:
# alternate input, starting from OP's
d2 = c(0, 0, 4, 4, 0, 4, 0)
d2 = replace( d2, !d2, 1)
# @akrun's answer
Map(c, cumsum(d2)-d2+1, cumsum(d2))
And some more:
# my original answer
start = c(1,head(cumsum(d+1)+1,-1))
Map(c, start, start + d)
# another way
s = sequence(d+1)
Map(c, seq_along(s)[s==1], seq_along(s)[c(diff(s) < 1, TRUE)] )
Here's a slightly different approach:
x <- c(0,0,3,3,0,3,0)
f <- function(x) {
ee <- split(seq_len(sum(x+1)), rep.int(seq_along(x), x+1))
lapply(ee, range)
}
f(x)
Here's a function that'll do it, nowhere near as elegant as @Frank's answer:
mygenerator <- function(vec){
counter <- 1
outlist <- list()
for(i in 1:length(vec)){
if(vec[i] == 0){
outlist[[i]] <- c(counter, counter)
counter <- counter + 1
} else {
outlist[[i]] <- c(counter, counter + vec[i] - 1)
counter <- counter + vec[i]
}
}
outlist
}
mygenerator(c(0, 0, 4, 4, 0, 4, 0))
[[1]]
[1] 1 1
[[2]]
[1] 2 2
[[3]]
[1] 3 6
[[4]]
[1] 7 10
[[5]]
[1] 11 11
[[6]]
[1] 12 15
[[7]]
[1] 16 16
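For comparison, the loop can be replaced by a couple of cumsum calls on the question's original input. This is a sketch of the same idea as the cumsum-based answers, kept as a two-column matrix; the names len, start, and end are chosen here for illustration.

```r
# Input from the question; a 0 stands for a group of one element
vec <- c(0, 0, 4, 4, 0, 4, 0)
len <- pmax(vec, 1)

end   <- cumsum(len)      # last index of each group
start <- end - len + 1    # first index of each group

cbind(start, end)
##      start end
## [1,]     1   1
## [2,]     2   2
## [3,]     3   6
## [4,]     7  10
## [5,]    11  11
## [6,]    12  15
## [7,]    16  16
```

If a list is needed, as in the question, Map(c, start, end) converts the two vectors into that shape.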

insert elements in a vector in R

I have a vector in R,
a = c(2,3,4,9,10,2,4,19)
let us say I want to efficiently insert the following vectors, b and d,
b = c(2,1)
d = c(0,1)
right after the 3rd and 7th positions (the "4" entries), resulting in,
e = c(2,3,4,2,1,9,10,2,4,0,1,19)
How would I do this efficiently in R, without recursively using cbind or something similar?
I found a package, R.basic, but it's not on CRAN, so I would prefer a supported alternative.
Try this:
result <- vector("list",5)
result[c(TRUE,FALSE)] <- split(a, cumsum(seq_along(a) %in% (c(3,7)+1)))
result[c(FALSE,TRUE)] <- list(b,d)
f <- unlist(result)
identical(f, e)
#[1] TRUE
EDIT: generalization to arbitrary number of insertions is straightforward:
insert.at <- function(a, pos, ...){
dots <- list(...)
stopifnot(length(dots)==length(pos))
result <- vector("list",2*length(pos)+1)
result[c(TRUE,FALSE)] <- split(a, cumsum(seq_along(a) %in% (pos+1)))
result[c(FALSE,TRUE)] <- dots
unlist(result)
}
> insert.at(a, c(3,7), b, d)
[1] 2 3 4 2 1 9 10 2 4 0 1 19
> insert.at(1:10, c(4,7,9), 11, 12, 13)
[1] 1 2 3 4 11 5 6 7 12 8 9 13 10
> insert.at(1:10, c(4,7,9), 11, 12)
Error: length(dots) == length(pos) is not TRUE
Note the bonus error checking if the number of positions and insertions do not match.
You can use the following function (it handles exactly two insertions),
ins(a, list(b, d), pos=c(3, 7))
# [1] 2 3 4 2 1 9 10 2 4 0 1 19
where:
ins <- function(a, to.insert=list(), pos=c()) {
c(a[seq(pos[1])],
to.insert[[1]],
a[seq(pos[1]+1, pos[2])],
to.insert[[2]],
# elements after the second insertion point
a[seq(pos[2]+1, length(a))]
)
}
Here's another function, using Ricardo's syntax, Ferdinand's split and @Arun's interleaving trick from another question:
ins2 <- function(a,bs,pos){
as <- split(a,cumsum(seq(a)%in%(pos+1)))
idx <- order(c(seq_along(as),seq_along(bs)))
unlist(c(as,bs)[idx])
}
The advantage is that this should extend to more insertions. However, it may produce weird output when passed invalid arguments, e.g., with any(pos > length(a)) or length(bs)!=length(pos).
You can change the last line to unname(unlist(... if you don't want a's items named.
The straightforward approach:
b.pos <- 3
d.pos <- 7
c(a[1:b.pos],b,a[(b.pos+1):d.pos],d,a[(d.pos+1):length(a)])
[1] 2 3 4 2 1 9 10 2 4 0 1 19
Note the importance of parentheses for the boundaries of the : operator.
After using Ferdinand's function, I tried to write my own and surprisingly it is far more efficient.
Here's mine :
insertElems = function(vect, pos, elems) {
l = length(vect)
j = 0
for (i in 1:length(pos)){
if (pos[i]==1)
vect = c(elems[j+1], vect)
else if (pos[i] == length(vect)+1)
vect = c(vect, elems[j+1])
else
vect = c(vect[1:(pos[i]-1+j)], elems[j+1], vect[(pos[i]+j):(l+j)])
j = j+1
}
return(vect)
}
tmp = c(seq(1:5))
insertElems(tmp, c(2,4,5), c(NA,NA,NA))
# [1] 1 NA 2 3 NA 4 NA 5
insert.at(tmp, c(2,4,5), c(NA,NA,NA))
# [1] 1 NA 2 3 NA 4 NA 5
And there's the benchmark result :
> microbenchmark(insertElems(tmp, c(2,4,5), c(NA,NA,NA)), insert.at(tmp, c(2,4,5), c(NA,NA,NA)), times = 10000)
Unit: microseconds
expr min lq mean median uq max neval
insertElems(tmp, c(2, 4, 5), c(NA, NA, NA)) 9.660 11.472 13.44247 12.68 13.585 1630.421 10000
insert.at(tmp, c(2, 4, 5), c(NA, NA, NA)) 58.866 62.791 70.36281 64.30 67.923 2475.366 10000
my code works even better for some cases :
> insert.at(tmp, c(1,4,5), c(NA,NA,NA))
# [1] 1 2 3 NA 4 NA 5 NA 1 2 3
# Warning message:
# In result[c(TRUE, FALSE)] <- split(a, cumsum(seq_along(a) %in% (pos))) :
# number of items to replace is not a multiple of replacement length
> insertElems(tmp, c(1,4,5), c(NA,NA,NA))
# [1] NA 1 2 3 NA 4 NA 5
Here's an alternative that uses append. It's fine for small vectors, but I can't imagine it being efficient for large vectors since a new vector is created upon each iteration of the loop (which is, obviously, bad). The trick is to reverse the vector of things that need to be inserted to get append to insert them in the correct place relative to the original vector.
a = c(2,3,4,9,10,2,4,19)
b = c(2,1)
d = c(0,1)
pos <- c(3, 7)
z <- setNames(list(b, d), pos)
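# sort by numeric position (names are strings, so "10" would otherwise sort before "3")
z <- z[order(as.numeric(names(z)), decreasing=TRUE)]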
for (i in seq_along(z)) {
a <- append(a, z[[i]], after = as.numeric(names(z)[[i]]))
}
a
# [1] 2 3 4 2 1 9 10 2 4 0 1 19
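Another loop-free option, sketched here under the assumption that every insertion goes right after an existing position, is to build a sort key and let order() do the interleaving. The names ins_list, key, and res are illustrative only. It relies on order() leaving ties in their original order, so inserted elements land after the element they follow.

```r
a <- c(2, 3, 4, 9, 10, 2, 4, 19)
b <- c(2, 1)
d <- c(0, 1)
pos <- c(3, 7)
ins_list <- list(b, d)

# Sort key: original elements keep their own index; inserted elements get
# the index they follow, and ties keep the original (a first) ordering
key <- c(seq_along(a), rep(pos, lengths(ins_list)))
res <- c(a, unlist(ins_list))[order(key)]
res
## [1]  2  3  4  2  1  9 10  2  4  0  1 19
```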

Create counter within consecutive runs of certain values

I have an hourly value. I want to count how many consecutive hours the value has been zero since the last time it was not zero. This is an easy job for a spreadsheet or for loop, but I am hoping for a snappy vectorized one-liner to accomplish the task.
x <- c(1, 0, 1, 0, 0, 0, 1, 1, 0, 0)
df <- data.frame(x, zcount = NA)
df$zcount[1] <- ifelse(df$x[1] == 0, 1, 0)
for(i in 2:nrow(df))
df$zcount[i] <- ifelse(df$x[i] == 0, df$zcount[i - 1] + 1, 0)
Desired output:
R> df
x zcount
1 1 0
2 0 1
3 1 0
4 0 1
5 0 2
6 0 3
7 1 0
8 1 0
9 0 1
10 0 2
William Dunlap's posts on R-help are the place to look for all things related to run lengths. His f7 from this post is
f7 <- function(x){ tmp<-cumsum(x);tmp-cummax((!x)*tmp)}
For the current problem, call it as f7(!x). In terms of performance:
> x <- sample(0:1, 1000000, TRUE)
> system.time(res7 <- f7(!x))
user system elapsed
0.076 0.000 0.077
> system.time(res0 <- cumul_zeros(x))
user system elapsed
0.345 0.003 0.349
> identical(res7, res0)
[1] TRUE
Here's a way, building on Joshua's rle approach: (EDITED to use seq_len and lapply as per Marek's suggestion)
> (!x) * unlist(lapply(rle(x)$lengths, seq_len))
[1] 0 1 0 1 2 3 0 0 1 2
UPDATE. Just for kicks, here's another way to do it, around 5 times faster:
cumul_zeros <- function(x) {
x <- !x
rl <- rle(x)
len <- rl$lengths
v <- rl$values
cumLen <- cumsum(len)
z <- x
# replace the 0 at the end of each zero-block in z by the
# negative of the length of the preceding 1-block....
iDrops <- c(0, diff(v)) < 0
z[ cumLen[ iDrops ] ] <- -len[ c(iDrops[-1],FALSE) ]
# ... to ensure that the cumsum below does the right thing.
# We zap the cumsum with x so only the cumsums for the 1-blocks survive:
x*cumsum(z)
}
Try an example:
> cumul_zeros(c(1,1,1,0,0,0,0,0,1,1,1,0,0,1,1))
[1] 0 0 0 1 2 3 4 5 0 0 0 1 2 0 0
Now compare times on a million-length vector:
> x <- sample(0:1, 1000000,T)
> system.time( z <- cumul_zeros(x))
user system elapsed
0.15 0.00 0.14
> system.time( z <- (!x) * unlist( lapply( rle(x)$lengths, seq_len)))
user system elapsed
0.75 0.00 0.75
Moral of the story: one-liners are nicer and easier to understand, but not always the fastest!
rle will "count how many consecutive hours the value has been zero since the last time it was not zero", but not in the format of your "desired output".
Note the lengths for the elements where the corresponding values are zero:
rle(x)
# Run Length Encoding
# lengths: int [1:6] 1 1 1 3 2 2
# values : num [1:6] 1 0 1 0 1 0
A simple base R approach:
ave(!x, cumsum(x), FUN = cumsum)
#[1] 0 1 0 1 2 3 0 0 1 2
One-liner, not exactly super elegant:
x <- c(1, 0, 1, 0, 0, 0, 1, 1, 0, 0)
unlist(lapply(split(x, c(0, cumsum(abs(diff(!x == 0))))), function(x) (x[1] == 0) * seq(length(x))))
Using purrr::accumulate() is very straightforward, so this tidyverse solution may add some value here. I must acknowledge it is definitely not the fastest, as it calls the same function length(x) times.
library(purrr)
accumulate(x==0, ~ifelse(.y!=0, .x+1, 0))
[1] 0 1 0 1 2 3 0 0 1 2
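Since several quite different approaches are shown above, a small consistency check can help when choosing one. The sketch below compares the explicit loop from the question (wrapped in a helper named loop_count, a name chosen here) with two of the vectorized answers on the question's data.

```r
x <- c(1, 0, 1, 0, 0, 0, 1, 1, 0, 0)

# Reference: the explicit loop from the question, wrapped in a function
loop_count <- function(x) {
  out <- numeric(length(x))
  run <- 0
  for (i in seq_along(x)) {
    run <- if (x[i] == 0) run + 1 else 0  # reset on any non-zero value
    out[i] <- run
  }
  out
}

# Two of the vectorized answers above
f7 <- function(x) { tmp <- cumsum(x); tmp - cummax((!x) * tmp) }
via_f7  <- f7(!x)
via_rle <- (!x) * unlist(lapply(rle(x)$lengths, seq_len))

loop_count(x)
## [1] 0 1 0 1 2 3 0 0 1 2
all(loop_count(x) == via_f7, loop_count(x) == via_rle)
## [1] TRUE
```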

Resources