It is possible to round results to two significant digits using signif:
> signif(12500,2)
[1] 12000
> signif(12501,2)
[1] 13000
But are there equally handy functions, like the fictitious signif.floor and signif.ceiling below, so that I could get two or more significant digits with flooring or ceiling?
> signif.ceiling(12500,2)
[1] 13000
> signif.floor(12501,2)
[1] 12000
EDIT:
The existing signif function works with negative numbers and decimal numbers, so a possible solution should preferably also work with negative numbers:
> signif(-125,2)
[1] -120
> signif.floor(-125,2)
[1] -130
and decimal numbers:
> signif(1.23,2)
[1] 1.2
> signif.ceiling(1.23,2)
[1] 1.3
As a special case, also 0 should return 0:
> signif.floor(0,2)
[1] 0
I think this approach works for all types of numbers (integers, negatives, decimals).
The floor function:
signif.floor <- function(x, n) {
  pow <- floor(log10(abs(x))) + 1 - n
  y <- floor(x / 10^pow) * 10^pow
  # handle the x = 0 case
  y[x == 0] <- 0
  y
}
The ceiling function:
signif.ceiling <- function(x, n) {
  pow <- floor(log10(abs(x))) + 1 - n
  y <- ceiling(x / 10^pow) * 10^pow
  # handle the x = 0 case
  y[x == 0] <- 0
  y
}
Both work the same way: first find the power of ten corresponding to the last significant digit, then apply the standard floor/ceiling function at that scale. Check if it works for you.
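For example, checking against the cases from the question (these calls should reproduce the requested outputs, given the definitions above):
signif.ceiling(12500, 2)  # 13000
signif.floor(12501, 2)    # 12000
signif.floor(-125, 2)     # -130
signif.ceiling(1.23, 2)   # 1.3
signif.floor(0, 2)        # 0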
Edit 1: Added the handler for the x = 0 case, as suggested in the comments by Heikki.
Edit 2: Again following Heikki, I add some examples:
Testing different values of x
# for negative values
> values <- -0.12151 * 10^(0:4); values
# [1] -0.12151 -1.21510 -12.15100 -121.51000 -1215.10000
> sapply(values, function(x) signif.floor(x, 2))
# [1] -0.13 -1.30 -13.00 -130.00 -1300.00
> sapply(values, function(x) signif.ceiling(x, 2))
# [1] -0.12 -1.20 -12.00 -120.00 -1200.00
# for positive values
> sapply(-values, function(x) signif.floor(x, 2))
# [1] 0.12 1.20 12.00 120.00 1200.00
> sapply(-values, function(x) signif.ceiling(x, 2))
# [1] 0.13 1.30 13.00 130.00 1300.00
Testing different values of n
> sapply(1:5, function(n) signif.floor(-121.51,n))
# [1] -200.00 -130.00 -122.00 -121.60 -121.51
> sapply(1:5, function(n) signif.ceiling(-121.51,n))
# [1] -100.00 -120.00 -121.00 -121.50 -121.51
Edit: Nowhere near as nice as @storaged's answer, but I'd started, so I might as well finish.
Basically it runs through each case (positive or negative, decimal or not):
signif.floor <- function(x, n) {
  if (x == 0) out <- 0
  if (x %% round(x) == 0 & sign(x) == 1)  { out <- as.numeric(paste0(el(strsplit(as.character(x), ''))[1:n], collapse = '')) * 10^(nchar(x) - n) }
  if (x %% round(x)  > 0 & sign(x) == 1)  { out <- as.numeric(paste0(el(strsplit(as.character(x), ''))[1:(n+1)], collapse = '')) }
  if (x %% round(x) == 0 & sign(x) == -1) { out <- (as.numeric(paste0(el(strsplit(as.character(x), ''))[1:(n+1)], collapse = '')) - 1) * 10^(nchar(x) - n - 1) }
  if (x %% round(x)  < 0 & sign(x) == -1) { out <- as.numeric(paste0(el(strsplit(as.character(x), ''))[1:(n+2)], collapse = '')) - 10^(-n+1) }
  return(out)
}
signif.ceiling <- function(x, n) {
  if (x == 0) out <- 0
  if (x %% round(x) == 0 & sign(x) == 1)  { out <- (as.numeric(paste0(el(strsplit(as.character(x), ''))[1:n], collapse = '')) + 1) * 10^(nchar(x) - n) }
  if (x %% round(x)  > 0 & sign(x) == 1)  { out <- as.numeric(paste0(el(strsplit(as.character(x), ''))[1:(n+1)], collapse = '')) + 10^(-n+1) }
  if (x %% round(x) == 0 & sign(x) == -1) { out <- as.numeric(paste0(el(strsplit(as.character(x), ''))[1:(n+1)], collapse = '')) * 10^(nchar(x) - n - 1) }
  if (x %% round(x)  < 0 & sign(x) == -1) { out <- as.numeric(paste0(el(strsplit(as.character(x), ''))[1:(n+2)], collapse = '')) }
  return(out)
}
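A quick check against the examples from the question (bear in mind this string-based version is fairly fragile, as it assumes the printed representation of x lines up with its significant digits):
signif.floor(12501, 2)    # 12000
signif.ceiling(12500, 2)  # 13000
signif.floor(-125, 2)     # -130
signif.ceiling(1.23, 2)   # 1.3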
I have a vector with variable elements in it, and I want to check whether its last two elements are of the same digit order.
For example, if the last two elements are 0.0194 and 0.0198, the function should return TRUE, because their digit order after zero is the same (order 0.01, i.e. 10^-2). As another example, the numbers could be 0.00014 and 0.00012; their precision is still around the same, so the function should also return TRUE.
How can we build a logical statement or function to check this?
x <- c(0.817104, 0.241665, 0.040581, 0.022903, 0.019478, 0.019846)
I may be over-thinking this, but you can test that the order of magnitude and first non-zero digit are identical for each.
x <- c(0.817104, 0.241665, 0.040581, 0.022903, 0.019478, 0.019846)
oom <- function(x, base = 10) as.integer(ifelse(x == 0, 0, floor(log(abs(x), base))))
oom(x)
# [1] -1 -1 -2 -2 -2 -2
(tr <- trunc(x / 10 ** oom(x, 10)))
# [1] 8 2 4 2 1 1
So for the last two, the order of magnitude for both is -2 and the first non-zero digit is 1 for both.
Put into a function:
f <- function(x) {
  oom <- function(x, base = 10) as.integer(ifelse(x == 0, 0, floor(log(abs(x), base))))
  x <- tail(x, 2)
  oo <- oom(x)
  tr <- trunc(x / 10 ** oo)
  (oo[1] == oo[2]) & (tr[1] == tr[2])
}
## more test cases
x1 <- c(0.019, 0.011)
x2 <- c(0.01, 0.001)
f(x) ## TRUE
f(x1) ## TRUE
f(x2) ## FALSE
Here is a more general version of the above function that checks the last n elements instead of the last 2:
g <- function(x, n = 2) {
  oom <- function(x, base = 10) as.integer(ifelse(x == 0, 0, floor(log(abs(x), base))))
  x <- tail(x, n)
  oo <- oom(x)
  tr <- trunc(x / 10 ** oo)
  Reduce(`==`, oo) & Reduce(`==`, tr)
}
g(c(.24, .15, .14), 2) ## TRUE
g(c(.24, .15, .14), 3) ## FALSE
@rawr worries about over-thinking. I guess I should as well. This is what I came up with, and do note that it handles the fact that printed representations of floating point numbers are sometimes deceiving.
orddig <- function(x) which(sapply(0:16, function(n) {
  isTRUE(all.equal(x * 10^n, round(x * 10^n, 0)))
}))[1]
> sapply( c(0.00014 , 0.00012 ), orddig)
[1] 6 6
My original efforts were with the signif function but that's a different numerical thought trajectory, since 0.01 and 0.001 have the same number of significant digits. Also notice that:
> sapply( 10^5*c(0.00014 , 0.00012 ), trunc, 4)
[1] 13 12
That is why we need the isTRUE(all.equal(..., ...)) construction.
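A small demonstration of the deceptive printing (assuming standard IEEE-754 doubles; the exact trailing digits may vary by platform):
print(10^5 * 0.00014, digits = 17)     # 13.999999999999998, though it prints as 14 by default
10^5 * 0.00014 == 14                   # FALSE
isTRUE(all.equal(10^5 * 0.00014, 14))  # TRUE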
I am searching for a (simple) function in R to remove duplicated elements, like unique() or duplicated(), but one that allows for "near equality" of numerical values, as all.equal() does:
unique( c(0, 0))
[1] 0
works fine, but
unique( c(0, cos(pi/2)) )
[1] 0.000000e+00 6.123234e-17
does not remove the second element, although a comparison with all.equal returns TRUE:
all.equal( 0, cos(pi/2) )
[1] TRUE
The same holds for duplicated:
duplicated( c(0, cos(pi/2)))
[1] FALSE FALSE
Any suggestions? Thanks!
You might also consider the zapsmall function:
x <- rep(c(1,2), each=5) + rnorm(10)/(10^rep(1:5,2))
unique(x)
# [1] 1.0571484 1.0022854 1.0014347 0.9998829 0.9999985 2.1095720 1.9888208 2.0002687 1.9999723 2.0000078
unique(zapsmall(x, digits=4))
# [1] 1.0571 1.0023 1.0014 0.9999 1.0000 2.1096 1.9888 2.0003 2.0000
unique(zapsmall(x, digits=2))
# [1] 1.06 1.00 2.11 1.99 2.00
unique(zapsmall(x, digits=0))
# [1] 1 2
If you'd like to take into account the absolute error, and not the relative error (as all.equal does), try:
x <- c(0, cos(pi/2), 1, 1+1e-16)
unique(x)
## [1] 0.000000e+00 6.123234e-17 1.000000e+00
(x <- x[!duplicated(round(x, 10))])
## [1] 0 1
Here we remove the elements that are the same w.r.t. a fixed (10 above) number of decimal digits.
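If you need this often, the idea can be wrapped in a small helper (a hypothetical convenience function, not part of base R):
# keep only elements that differ within the first `digits` decimal places
unique_approx <- function(x, digits = 10) x[!duplicated(round(x, digits))]
unique_approx(c(0, cos(pi/2), 1, 1 + 1e-16))
## [1] 0 1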
You could try this code (disclaimer: from my package cgwtools)
approxeq <- function(x, y, tolerance = .Machine$double.eps^0.5, ...)
{
  if (length(x) != length(y))
    warning("x, y lengths differ. Will recycle.")
  checkit <- abs(x - y) < tolerance
  return(invisible(checkit))
}
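For example (the result is returned invisibly, so wrap the call in print() to see it):
print(approxeq(c(0, 1), c(cos(pi/2), 1 + 1e-16)))
## [1] TRUE TRUE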
I have the following sorted vector:
> v
[1] -1 0 1 2 4 5 2 3 4 5 7 8 5 6 7 8 10 11
How can I remove the -1, 0, and 11 entries without looping over the whole vector, either with a user loop or implicitly with a language keyword? That is, I want to trim the vector at each edge and only at each edge, such that the sorted sequence is within my min,max parameters 1 and 10. The solution should assume that the vector is sorted to avoid checking every element.
This kind of solution can come in handy in vectorized operations on very large vectors, when we want to use the items of the vector as indexes into another object. For one application, see this thread.
To include elements in a vector by index:
v[2:10]
to exclude certain elements:
v[-c(1, 11)]
to only include a certain range:
v <- v[v >= 1 & v <= 10]
If I'm allowed to assume that, like in your example, the number of elements to be trimmed << the number of elements in the vector, then I think I can beat the binary search:
> n<-1e8
> v<--3:(n+3)
>
> min <- 1
> max <- length(v)
>
> calcMin <- function(v, minVal){
+ while(v[min] < minVal){
+ min <- min + 1
+ }
+ min
+ }
>
> calcMax <- function(v, maxVal){
+ while(v[max] > maxVal){
+ max <- max - 1
+ }
+ max
+ }
>
> #Compute the min and max indices and create a sequence
> system.time(a <- v[calcMin(v, 1):calcMax(v,n)])
user system elapsed
1.030 0.269 1.298
>
> #do a binary search to find the elements (as suggested by @nograpes)
> system.time(b <- v[do.call(seq,as.list(findInterval(c(1,n),v)))])
user system elapsed
2.208 0.631 2.842
>
> #use negative indexing to remove elements
> system.time(c <- v[-c(1:(calcMin(v, 1)-1), (calcMax(v,n)+1):length(v))])
user system elapsed
1.449 0.256 1.704
>
> #use head and tail to trim the vector
> system.time(d <- tail(head(v, n=(calcMax(v,n)-length(v))), n=-calcMin(v, 1)+1))
user system elapsed
2.994 0.877 3.871
>
> identical(a, b)
[1] TRUE
> identical(a, c)
[1] TRUE
> identical(a, d)
[1] TRUE
There are many ways to do it; here are some (note that setdiff and intersect return unique values, so they would drop any duplicates in v):
> v <- -1:11 # creating your vector
> v[v %in% 1:10]
[1] 1 2 3 4 5 6 7 8 9 10
> setdiff(v, c(-1,0,11))
[1] 1 2 3 4 5 6 7 8 9 10
> intersect(v, 1:10)
[1] 1 2 3 4 5 6 7 8 9 10
Two more options, not so elegant (match returns positions in the table 1:10, which here happen to coincide with the values themselves):
> na.omit(match(v, 1:10))
> na.exclude(match(v, 1:10))
All of the previous solutions implicitly check every element of the vector. As @Robert Kubrick points out, this does not take advantage of the fact that the vector is already sorted.
To take advantage of the sorted nature of the vector, you can use binary search (through findInterval) to find the start and end indexes without looking at every element:
n <- 1e9
v <- -3:(n+3)
system.time(a <- v[v >= 1 & v <= n])                                 # 68 s
system.time(b <- v[do.call(seq, as.list(findInterval(c(1, n), v)))]) # 15 s
identical(a, b) # TRUE
It is a little clumsy, and there is some discussion that the binary search in findInterval may not be entirely efficient, but the general idea is there.
As was pointed out in the comments, the above only works when the index is in the vector. Here is a function that I think will work:
in.range <- function(x, lo = -Inf, hi = +Inf) {
  lo.idx <- findInterval(lo, x, all.inside = TRUE)
  hi.idx <- findInterval(hi, x)
  # step past values below lo; the parentheses matter, since without them
  # `+` binds tighter than the comparison and the whole line becomes a logical
  lo.idx <- lo.idx + (x[lo.idx] < lo)
  x[seq(lo.idx, hi.idx)]
}
system.time(b <- in.range(v, 1, n)) # 15 s
You can also use %in%:
vv <- c(-1, 0 ,1 ,2 ,4 ,5, 2 ,3 ,4, 5, 7 ,8, 5, 6, 7, 8, 10, 11)
vv[vv %in% 1:10]
[1] 1 2 4 5 2 3 4 5 7 8 5 6 7 8 10
I am writing a function to plot data. I would like to specify a nice round number for the y-axis max that is greater than the max of the dataset.
Specifically, I would like a function foo that performs the following:
foo(4) == 5
foo(6.1) == 10 #maybe 7 would be better
foo(30.1) == 40
foo(100.1) == 110
I have gotten as far as
foo <- function(x) ceiling(max(x)/10)*10
for rounding to the nearest 10, but this does not work for arbitrary rounding intervals.
Is there a better way to do this in R?
The plyr package has a function round_any that is pretty generic for all kinds of rounding. For example:
library(plyr)
round_any(132.1, 10) # returns 130
round_any(132.1, 10, f = ceiling) # returns 140
round_any(132.1, 5, f = ceiling) # returns 135
If you just want to round up to the nearest power of 10, then just define:
roundUp <- function(x) 10^ceiling(log10(x))
This actually also works when x is a vector:
> roundUp(c(0.0023, 3.99, 10, 1003))
[1] 1e-02 1e+01 1e+01 1e+04
...but if you want to round to a "nice" number, you first need to define what a "nice" number is. The following lets us define "nice" as a vector with nice base values from 1 to 10. The default is set to the even numbers plus 5.
roundUpNice <- function(x, nice = c(1, 2, 4, 5, 6, 8, 10)) {
  if (length(x) != 1) stop("'x' must be of length 1")
  10^floor(log10(x)) * nice[[which(x <= 10^floor(log10(x)) * nice)[[1]]]]
}
The above doesn't work when x is a vector - too late in the evening right now :)
> roundUpNice(0.0322)
[1] 0.04
> roundUpNice(3.22)
[1] 4
> roundUpNice(32.2)
[1] 40
> roundUpNice(42.2)
[1] 50
> roundUpNice(422.2)
[1] 500
EDIT: If the question is how to round to a specified nearest value (like 10 or 100), then James's answer seems most appropriate. My version lets you take any value and automatically round it to a reasonably "nice" value. Some other good choices for the "nice" vector above are: 1:10, c(1,5,10), seq(1, 10, 0.1).
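For instance, the choice of the nice vector changes the granularity:
roundUpNice(0.0322)                          # 0.04 (default nice values)
roundUpNice(0.0322, nice = c(1, 5, 10))      # 0.05
roundUpNice(0.0322, nice = seq(1, 10, 0.1))  # 0.033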
If you have a range of values in your plot, for example [3996.225, 40001.893] then the automatic way should take into account both the size of the range and the magnitude of the numbers. And as noted by Hadley, the pretty() function might be what you want.
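In that case a common idiom is to let pretty() pick the breakpoints and take the largest one as the axis maximum (a sketch; pretty() computes "round" values that cover the range of the data):
x <- c(3996.225, 40001.893)
breaks <- pretty(x)   # "nice" breakpoints covering the range of x
y.max <- max(breaks)  # a round upper limit at or above max(x)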
The round function in R assigns special meaning to the digits parameter if it is negative.
round(x, digits = 0)
Rounding to a negative number of digits means rounding to a power of ten, so for example round(x, digits = -2) rounds to the nearest hundred.
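For example:
round(1234, digits = -2)  # 1200
round(1278, digits = -2)  # 1300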
This means a function like the following gets pretty close to what you are asking for.
foo <- function(x)
{
  round(x + 5, -1)
}
The output looks like the following
foo(4)
[1] 10
foo(6.1)
[1] 10
foo(30.1)
[1] 40
foo(100.1)
[1] 110
If you add a negative number as the digits argument of round(), R will round to multiples of 10, 100, etc.
round(9, digits = -1)
[1] 10
round(89, digits = -1)
[1] 90
round(89, digits = -2)
[1] 100
How about:
roundUp <- function(x, to = 10)
{
  to * (x %/% to + as.logical(x %% to))
}
Which gives:
> roundUp(c(4,6.1,30.1,100.1))
[1] 10 10 40 110
> roundUp(4,5)
[1] 5
> roundUp(12,7)
[1] 14
Round ANY number Up/Down to ANY interval
You can easily round numbers to a specific interval using the modulo operator %%.
The function:
round.choose <- function(x, roundTo, dir = 1) {
  if (dir == 1) {        ## ROUND UP
    x + (roundTo - x %% roundTo)
  } else if (dir == 0) { ## ROUND DOWN
    x - (x %% roundTo)
  }
}
Examples:
> round.choose(17,5,1) #round 17 UP to the next 5th
[1] 20
> round.choose(17,5,0) #round 17 DOWN to the next 5th
[1] 15
> round.choose(17,2,1) #round 17 UP to the next even number
[1] 18
> round.choose(17,2,0) #round 17 DOWN to the next even number
[1] 16
How it works:
The modulo operator %% determines the remainder of dividing the first number by the second. Adding or subtracting this remainder to or from your number of interest essentially 'rounds' the number to an interval of your choosing.
> 7 + (5 - 7 %% 5) #round UP to the nearest 5
[1] 10
> 7 + (10 - 7 %% 10) #round UP to the nearest 10
[1] 10
> 7 + (2 - 7 %% 2) #round UP to the nearest even number
[1] 8
> 7 + (100 - 7 %% 100) #round UP to the nearest 100
[1] 100
> 7 + (4 - 7 %% 4) #round UP to the nearest interval of 4
[1] 8
> 7 + (4.5 - 7 %% 4.5) #round UP to the nearest interval of 4.5
[1] 9
> 7 - (7 %% 5) #round DOWN to the nearest 5
[1] 5
> 7 - (7 %% 10) #round DOWN to the nearest 10
[1] 0
> 7 - (7 %% 2) #round DOWN to the nearest even number
[1] 6
Update:
The convenient 2-argument version:
rounder <- function(x, y) {
  if (y >= 0) { x + (y - x %% y) }
  else { x - (x %% abs(y)) }
}
Positive y values round up, while negative y values round down:
# rounder(7, -4.5) = 4.5, while rounder(7, 4.5) = 9.
Or....
Function that automatically rounds UP or DOWN based on standard rounding rules:
Round <- function(x, y) {
  if ((y - x %% y) <= x %% y) { x + (y - x %% y) }
  else { x - (x %% y) }
}
Automatically rounds up if the x value is > halfway between subsequent instances of the rounding value y:
# Round(1.3,1) = 1 while Round(1.6,1) = 2
# Round(1.024,0.05) = 1 while Round(1.03,0.05) = 1.05
Regarding rounding up to a multiple of an arbitrary number, e.g. 10, here is a simple alternative to James's answer.
It works for any real number to be rounded up (from) and any positive real number to round up to (to):
> RoundUp <- function(from,to) ceiling(from/to)*to
Example:
> RoundUp(-11,10)
[1] -10
> RoundUp(-0.1,10)
[1] 0
> RoundUp(0,10)
[1] 0
> RoundUp(8.9,10)
[1] 10
> RoundUp(135,10)
[1] 140
> RoundUp(from=c(1.3,2.4,5.6),to=1.1)
[1] 2.2 3.3 6.6
If you always want to round a number up to the nearest X, you can use the ceiling function:
#Round 354 up to the nearest 100:
> X=100
> ceiling(354/X)*X
[1] 400
#Round 47 up to the nearest 30:
> Y=30
> ceiling(47/Y)*Y
[1] 60
Similarly, if you always want to round down, use the floor function. If you want to simply round up or down to the nearest Z, use round instead.
> Z=5
> round(367.8/Z)*Z
[1] 370
> round(367.2/Z)*Z
[1] 365
I think your code just works great with a small modification:
foo <- function(x, round = 10) ceiling(max(x + 10^-9) / round) * round  # the 1e-9 nudges exact multiples up to the next interval
And your examples run:
> foo(4, round=1) == 5
[1] TRUE
> foo(6.1) == 10 #maybe 7 would be better
[1] TRUE
> foo(6.1, round=1) == 7 # you got 7
[1] TRUE
> foo(30.1) == 40
[1] TRUE
> foo(100.1) == 110
[1] TRUE
> # ALL in one:
> foo(c(4, 6.1, 30.1, 100))
[1] 110
> foo(c(4, 6.1, 30.1, 100), round=10)
[1] 110
> foo(c(4, 6.1, 30.1, 100), round=2.3)
[1] 101.2
I altered your function in two ways:
added a second argument (for your specified X)
added a small value (= 1e-9, feel free to modify!) to the max(x), so that you always get a strictly bigger number
This rounds x up to the nearest integer multiple of y when y is positive and down when y is negative:
rom=\(x,y)x+(y-x%%y)%%y
rom(8.69,.1) # 8.7
rom(8.69,-.1) # 8.6
rom(8.69,.25) # 8.75
rom(8.69,-.25) # 8.5
rom(-8.69,.25) # -8.5
This always rounds to the nearest multiple like round_any in plyr (https://github.com/hadley/plyr/blob/34188a04f0e33c4115304cbcf40e5b1c7b85fedf/R/round-any.r):
rnm=\(x,y)round(x/y)*y
rnm(8.69,.25) # 8.75
plyr::round_any(8.69,.25) # 8.75
round_any can also be given ceiling as the third argument to always round up or floor to always round down:
plyr::round_any(8.51,.25,ceiling) # 8.75
plyr::round_any(8.69,.25,floor) # 8.5
Here is an upgraded version of Tommy's answer that takes into account several cases:
Choosing between the lower or higher bound
Taking into account negative and zero values
Two different "nice" scales, in case you want the function to round small and big numbers differently. For example: 4 would be rounded to 0, while 400 would be rounded to 400.
Below is the code:
round.up.nice <- function(x, lower_bound = TRUE, nice_small = c(0, 5, 10), nice_big = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)) {
  if (abs(x) > 100) {
    nice = nice_big
  } else {
    nice = nice_small
  }
  if (lower_bound == TRUE) {
    if (x > 0) {
      return(10^floor(log10(x)) * nice[[max(which(x >= 10^floor(log10(x)) * nice))[[1]]]])
    } else if (x < 0) {
      return(- 10^floor(log10(-x)) * nice[[min(which(-x <= 10^floor(log10(-x)) * nice))[[1]]]])
    } else {
      return(0)
    }
  } else {
    if (x > 0) {
      return(10^floor(log10(x)) * nice[[min(which(x <= 10^floor(log10(x)) * nice))[[1]]]])
    } else if (x < 0) {
      return(- 10^floor(log10(-x)) * nice[[max(which(-x >= 10^floor(log10(-x)) * nice))[[1]]]])
    } else {
      return(0)
    }
  }
}
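A few illustrative calls (outputs under the definition above):
round.up.nice(4)                           # 0 (small number, lower bound)
round.up.nice(4, lower_bound = FALSE)      # 5
round.up.nice(400)                         # 400
round.up.nice(422.2, lower_bound = FALSE)  # 500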
I tried this without using any external library or cryptic features and it works!
Hope it helps someone.
ceil <- function(val, multiple) {
  div <- val / multiple
  int_div <- as.integer(div)
  return(int_div * multiple + ceiling(div - int_div) * multiple)
}
> ceil(2.1, 2.2)
[1] 2.2
> ceil(3, 2.2)
[1] 4.4
> ceil(5, 10)
[1] 10
> ceil(0, 10)
[1] 0
Might be missing something but is it not as easy as:
some_number = 789
1000 * round(some_number/1000, 0)
to produce something rounded to 1000s?
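Note that round() gives the nearest multiple of 1000, not necessarily a larger one; swapping in ceiling() always rounds up:
some_number <- 789
1000 * round(some_number / 1000, 0)  # 1000
1000 * round(499 / 1000, 0)          # 0 (rounds down)
1000 * ceiling(some_number / 1000)   # 1000 (always rounds up)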
I have a vector, such as c(1, 3, 4, 5, 9, 10, 17, 29, 30) and I would like to group together the 'neighboring' elements that form a regular, consecutive sequence, i.e. an increase by 1, in a ragged vector resulting in:
L1: 1
L2: 3,4,5
L3: 9,10
L4: 17
L5: 29,30
Naive code (of an ex-C programmer):
partition.neighbors <- function(v)
{
  result <<- list()    # jagged array
  currentList <<- v[1] # current series
  for (i in 2:length(v))
  {
    if (v[i] - v[i-1] == 1)
    {
      currentList <<- c(currentList, v[i])
    }
    else
    {
      result <<- c(result, list(currentList))
      currentList <<- v[i] # next series
    }
  }
  result <<- c(result, list(currentList)) # append the final series, which the loop never reaches
  return(result)
}
Now I understand that a) R is not C (despite the curly brackets), b) global variables are pure evil, and c) this is a horribly inefficient way of achieving the result, so any better solutions are welcome.
Making heavy use of some R idioms:
> split(v, cumsum(c(1, diff(v) != 1)))
$`1`
[1] 1
$`2`
[1] 3 4 5
$`3`
[1] 9 10
$`4`
[1] 17
$`5`
[1] 29 30
daroczig writes "you could write a lot neater code based on diff"...
Here's one way:
split(v, cumsum(diff(c(-Inf, v)) != 1))
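To see the mechanics, here are the intermediate steps for the example vector:
v <- c(1, 3, 4, 5, 9, 10, 17, 29, 30)
diff(c(-Inf, v))               # Inf  2  1  1  4  1  7 12  1
diff(c(-Inf, v)) != 1          # TRUE TRUE FALSE FALSE TRUE FALSE TRUE TRUE FALSE
cumsum(diff(c(-Inf, v)) != 1)  # 1 2 2 2 3 3 4 5 5  <- the group ids passed to split()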
EDIT (added timings):
Tommy discovered this could be faster by being careful with types; the reason it got faster is that split is faster on integers, and is actually faster still on factors.
Here's Joshua's solution; the result of the cumsum is numeric (double) because it is c'd with the double 1, so it's the slowest.
system.time({
a <- cumsum(c(1, diff(v) != 1))
split(v, a)
})
# user system elapsed
# 1.839 0.004 1.848
Just c()ing with 1L instead, so the result is an integer, speeds it up considerably.
system.time({
a <- cumsum(c(1L, diff(v) != 1))
split(v, a)
})
# user system elapsed
# 0.744 0.000 0.746
This is Tommy's solution, for reference; it's also splitting on an integer.
system.time({
a <- cumsum(c(TRUE, diff(v) != 1L))
split(v, a)
})
# user system elapsed
# 0.742 0.000 0.746
Here's my original solution; it also is splitting on an integer.
system.time({
a <- cumsum(diff(c(-Inf, v)) != 1)
split(v, a)
})
# user system elapsed
# 0.750 0.000 0.754
Here's Joshua's, with the result converted to an integer before the split.
system.time({
a <- cumsum(c(1, diff(v) != 1))
a <- as.integer(a)
split(v, a)
})
# user system elapsed
# 0.736 0.002 0.740
All the versions that split on an integer vector are about the same; it could be even faster if that integer vector were already a factor, as the conversion from integer to factor actually takes about half the time. Here I make it into a factor directly; this is not recommended in general because it depends on the internal structure of the factor class. It's done here for comparison purposes only.
system.time({
a <- cumsum(c(1L, diff(v) != 1))
a <- structure(a, class = "factor", levels = 1L:a[length(a)])
split(v,a)
})
# user system elapsed
# 0.356 0.000 0.357
Joshua and Aaron were spot on. However, their code can still be made more than twice as fast by careful use of the correct types, integers and logicals:
split(v, cumsum(c(TRUE, diff(v) != 1L)))
v <- rep(c(1:5, 19), len = 1e6) # Huge vector...
system.time( split(v, cumsum(c(1, diff(v) != 1))) ) # Joshua's code
# user system elapsed
# 2.64 0.00 2.64
system.time( split(v, cumsum(c(TRUE, diff(v) != 1L))) ) # Modified code
# user system elapsed
# 1.09 0.00 1.12
You could define the cut-points easily:
which(diff(v) != 1)
Based on that try:
v <- c(1,3,4,5,9,10,17,29,30)
cutpoints <- c(0, which(diff(v) != 1), length(v))
ragged.vector <- vector("list", length(cutpoints)-1)
for (i in 2:length(cutpoints)) ragged.vector[[i-1]] <- v[(cutpoints[i-1]+1):cutpoints[i]]
Which results in:
> ragged.vector
[[1]]
[1] 1
[[2]]
[1] 3 4 5
[[3]]
[1] 9 10
[[4]]
[1] 17
[[5]]
[1] 29 30
This algorithm is not a nice one but you could write a lot neater code based on diff :) Good luck!
You can create a data.frame and assign the elements to groups using diff, ifelse and cumsum, then aggregate using tapply:
v.df <- data.frame(v = v)
v.df$group <- cumsum(ifelse(c(1, diff(v) - 1), 1, 0))
tapply(v.df$v, v.df$group, function(x) x)
$`1`
[1] 1
$`2`
[1] 3 4 5
$`3`
[1] 9 10
$`4`
[1] 17
$`5`
[1] 29 30