I've got some sort of index, like:
index <- 1:100
I've also got a list of "exclusion intervals" / ranges
exclude <- data.frame(start = c(5,50, 90), end = c(10,55, 95))
start end
1 5 10
2 50 55
3 90 95
I'm looking for an efficient way (in R) to remove all the indexes that belong in the ranges in the exclude data frame
so the desired output would be:
1,2,3,4, 11,12,...,48,49, 56,57,...,88,89, 96,97,98,99,100
I could do this iteratively: go over every exclusion interval (using ddply) and iteratively remove indexes that fall in each interval. But is there a more efficient way (or function) that does this?
I'm using library(intervals) to calculate my intervals, but I could not find a built-in function that does this.
Another approach that looks valid could be:
starts = findInterval(index, exclude[["start"]])
ends = findInterval(index, exclude[["end"]] + 1L)  ## 1 is added so that the upper bounds
                                                   ## are removed from the integer 'index' too
index[starts != (ends + 1L)]  ## a value at or above a lower bound and
                              ## at or below an upper bound is inside that interval
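The main advantage here is that no vector containing every element of every interval needs to be created, and that it also handles arbitrary (non-integer) values that fall inside an interval. In the example below the upper bounds are not shifted by 1, because the values are not integers: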
set.seed(101); x = round(runif(15, 1, 100), 3)
x
# [1] 37.848 5.339 71.259 66.111 25.736 30.705 58.902 34.013 62.579 55.037 88.100 70.981 73.465 93.232 46.057
x[findInterval(x, exclude[["start"]]) != (findInterval(x, exclude[["end"]]) + 1L)]
# [1] 37.848 71.259 66.111 25.736 30.705 58.902 34.013 62.579 55.037 88.100 70.981 73.465 46.057
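If values can coincide exactly with an upper bound, one possible variant (a sketch, not part of the answer above) is to let findInterval treat the 'end' breaks as left-open, so that no "+ 1L" is needed. This relies on the left.open argument, available in R >= 3.3.0. The helper name drop_in_ranges is made up for illustration, and the intervals are assumed to be sorted and non-overlapping.

```r
# Hypothetical helper: drop every value of x that lies in a closed interval
# [start, end]. Assumes the intervals are sorted and do not overlap.
drop_in_ranges <- function(x, ranges) {
  # number of lower bounds that are <= x
  starts <- findInterval(x, ranges[["start"]])
  # number of upper bounds that are strictly < x (left.open needs R >= 3.3.0)
  ends <- findInterval(x, ranges[["end"]], left.open = TRUE)
  # outside every interval: as many lower bounds passed as upper bounds
  x[starts == ends]
}

exclude <- data.frame(start = c(5, 50, 90), end = c(10, 55, 95))
kept <- drop_in_ranges(1:100, exclude)
range(kept[kept > 10 & kept < 56])     # 11 49
drop_in_ranges(c(5, 10, 10.5), exclude)  # 10.5
```

Both bounds are treated as inclusive here, so 5 and 10 are dropped while 10.5 is kept.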
We can use Map to get the sequence for the corresponding elements in 'start' 'end' columns, unlist to create a vector and use setdiff to get the values of 'index' that are not in the vector.
setdiff(index,unlist(with(exclude, Map(`:`, start, end))))
#[1] 1 2 3 4 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
#[20] 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44
#[39] 45 46 47 48 49 56 57 58 59 60 61 62 63 64 65 66 67 68 69
#[58] 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88
#[77] 89 96 97 98 99 100
Or we can use rep and then use setdiff.
i1 <- with(exclude, end-start) +1L
setdiff(index,with(exclude, rep(start, i1)+ sequence(i1)-1))
NOTE: Both methods build the vector of index positions that need to be excluded. In the above case, the original vector ('index') is a sequence, so its values equal its positions and I used setdiff. If it contains arbitrary elements, use the position vector for subsetting instead, i.e.
index[-unlist(with(exclude, Map(`:`, start, end)))]
or
index[setdiff(seq_along(index), unlist(with(exclude,
Map(`:`, start, end))))]
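To make the difference between excluding by value and excluding by position concrete, here is a small sketch. The vector vals is made-up data, and the %in% test is only meaningful for whole-number values.

```r
vals <- c(12, 7, 93, 51, 8, 60)   # made-up, unordered data
exclude <- data.frame(start = c(5, 50, 90), end = c(10, 55, 95))
in_range <- unlist(with(exclude, Map(`:`, start, end)))

# Exclude by value: drop elements whose value lies in a range
by_value <- vals[!vals %in% in_range]
by_value      # 12 60

# Exclude by position: drop elements whose position lies in a range
by_position <- vals[setdiff(seq_along(vals), in_range)]
by_position   # 12 7 93 51
```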
Another approach
> index[-do.call(c, lapply(1:nrow(exclude), function(x) exclude$start[x]:exclude$end[x]))]
[1] 1 2 3 4 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30
[25] 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 56 57 58 59 60
[49] 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84
[73] 85 86 87 88 89 96 97 98 99 100
Related
I stumbled upon something weird, where I don't know why r behaves in the way it does.
I have a vector with values from 1:100 (vec100). Now I only want to get the values lower than 50. Normally, I'd write vec100[vec100<50] and would be happy. Today, however, I assigned the logical vec100<50 to an object x for demonstration purposes. To show that this approach is dangerous, as it selects positions and not the values per se, I used x to subset another vector, vec2 <- c(vec100,20,50,100,10). Funnily, it also returns those added values although they are out of the range of x, and I can't find an explanation for why it does that.
vec <- 1:100
x <- vec<50 #logical vector of length 100
vec[x] #selects the values just as
vec[vec<50] #does
#now let's add some values
vec2 <- c(vec, 20,50,100,10)
vec2[x]
#returns:
# [1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29
#[30] 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 20 50 100 10
# so adds 20, 50, 100, 10 although those positions are not in x
#though it omits those new positions (not in x) when I look at the values which are not TRUE in x
vec2[x==F]
# [1] 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78
#[30] 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100
confused greetings,
Lea
This is due to recycling: when a logical index is shorter than the vector being subset, R reuses it from the beginning. Here 'x' has the same length as 'vec', but 'vec2' has four more elements, so x[1:4] is recycled for positions 101 to 104. Those four values are TRUE, which is why the added elements are returned by vec2[x] and dropped by vec2[x==F]. It can be checked by
v1 <- vec2[x]
v2 <- vec2[c(x, x[1:4])]
identical(v1, v2)
#[1] TRUE
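The same recycling can be seen on a smaller vector. The sketch below also shows one possible safeguard; safe_subset is a hypothetical helper, not a base R function.

```r
v <- c(10, 20, 30, 40, 50)
short <- c(TRUE, FALSE)   # length 2, recycled to TRUE FALSE TRUE FALSE TRUE
picked <- v[short]
picked                    # 10 30 50

# Hypothetical safeguard: refuse a logical index whose length differs
# from the length of the vector being subset
safe_subset <- function(vec, lgl) {
  stopifnot(is.logical(lgl), length(lgl) == length(vec))
  vec[lgl]
}
safe_subset(v, v < 35)    # 10 20 30
```

Calling safe_subset(v, short) stops with an error instead of silently recycling.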
I noticed a strange behavior of data.table that I don't understand:
library(data.table)
df <- as.data.table(matrix(ncol = 100,nrow = 3,data = sample(letters,300,replace = T)))
If I want to swap the first two columns, I could do:
df[,c(2,1,3:100L)]
which works fine. But if I do:
df[,c(2,1,3:ncol(df))]
[1] 2 1 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32
[33] 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64
[65] 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96
[97] 97 98 99 100
and I don't understand it, because ncol(df) is 100 and is an integer. Why does it do that ?
You need to use with=FALSE as follows:
df[,c(2,1,3:ncol(df)),with=FALSE]
From ?data.table, under the Arguments for with
When j is a character vector of column names, a numeric vector of column positions to select or of the form startcol:endcol, and the value returned is always a data.table. with=FALSE is not necessary anymore to select columns dynamically. Note that x[, cols] is equivalent to x[, ..cols] and to x[, cols, with=FALSE] and to x[, .SD, .SDcols=cols].
Since c(2,1,3:100L) is a numeric vector of literal column positions, with=FALSE is not required and the columns are returned automatically. When it is c(2,1,3:ncol(df)), the expression is evaluated as an ordinary j expression and its value is returned as a vector.
Is there actually an easy solution to reordering a vector like
first element, last element, second element, second last element, etc.
So I expect for c(1,2,3,4,5) to get c(1,5,2,4,3).
The reason is that I have a color palette with 16 colours, where color 1 is very similar to color 2 but not to color 16. In my plots, however, the dots coloured with color 1 are close to the ones coloured with color 2.
For my color palette I use Set 1 from color brewer and also use colorRampPalette to calculate colours in between, so they get a bit similar.
One solution would be to just sample(my_colors), but I would actually like to reorder them as described above.
This will do what you need:
a <- c(1,2,3,4,5)
b <- rbind(a,a[5:1])
c <- b[1:5]
Hope this helps
You can generalise this with
rbind(a,rev(a))[1:length(a)]
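Why this works: rbind stacks the vector on top of its reverse, and R stores matrices column by column, so reading the first length(a) cells walks through first, last, second, second last, and so on. A small sketch (the names m and zigzag are only for illustration):

```r
a <- c(1, 2, 3, 4, 5)
m <- rbind(a, rev(a))   # 2 x 5 matrix: row 1 is a, row 2 is a reversed

# Column-major storage means the cells are read as 1 5 2 4 3 3 4 2 5 1;
# the first length(a) of them form the desired order
zigzag <- m[seq_along(a)]
zigzag                  # 1 5 2 4 3
```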
Here is an easy way to do this:
a<-seq(1,100)
b<-a-median(a)
names(a)=b
a<-order(-abs(b))
print(a)
[1] 1 100 2 99 3 98 4 97 5 96 6 95 7 94 8 93 9 92 10 91 11 90 12 89 13 88 14 87
[29] 15 86 16 85 17 84 18 83 19 82 20 81 21 80 22 79 23 78 24 77 25 76 26 75 27 74 28 73
[57] 29 72 30 71 31 70 32 69 33 68 34 67 35 66 36 65 37 64 38 63 39 62 40 61 41 60 42 59
[85] 43 58 44 57 45 56 46 55 47 54 48 53 49 52 50 51
From the comments:
1: From @bgoldst: A better (one line) approach that doesn't involve vector names:
a[order(-abs(a-median(a)))]
2: (Also from bgoldst) For dealing with non-numeric (alphabetical order) values:
letters[order(-abs(seq_along(letters)-(length(letters)+1)/2))]
I am trying to write a function in R which will print every 3rd number in [1,100]; this is what I have tried, but this doesn't produce every third number, it produces every number
x <- c(100)
question.1 <- function (x){
out <- seq(x)
return(out)
}
question.1(x)
Am I missing something? Any help you can offer would be greatly appreciated!
Use the modulo operator (%%) to obtain every nth value from 1:100 like this:
nth <- function(x,n){
x[x%%n==0]
}
For example:
x <- 1:100
nth(x,7)
[1] 7 14 21 28 35 42 49 56 63 70 77 84 91 98
Use indexing with [ and the recycling of short vectors:
seq(100)[c(F,F,T)]
## [1] 3 6 9 12 15 18 21 24 27 30 33 36 39 42 45 48 51 54 57 60 63 66 69 72 75 78 81 84 87 90 93 96 99
Excellent answers have already been posted; this is just one further simple alternative:
start <- 1 # defines the initial number you want to select
step <- 3 # difference between subsequent numbers
seq(start, 100, by=step)
#[1] 1 4 7 10 13 16 19 22 25 28 31 34 37 40 43 46 49 52 55 58 61 64 67 70 73 76 79 82 85 88 91 94 97 100
Just wrapping Matthew's solution in a general nth function:
nth <- function(x, n) x[c(rep(FALSE, n-1), TRUE)]
nth(1:100, 5)
# [1] 5 10 15 20 25 30 35 40 45 50 55 60 65 70 75 80 85 90 95 100
Or even using an operator-style:
`%nth%` <- function(x, n) x[c(rep(FALSE, n-1), TRUE)]
seq(100) %nth% 5
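One thing worth noting about the answers above: the modulo version selects by value, while the recycling version selects by position. They agree on 1:100 but not on other vectors, as this small sketch with a made-up input shows:

```r
x <- 11:20

# Selecting by value: elements that are divisible by 3
by_value <- x[x %% 3 == 0]
by_value      # 12 15 18

# Selecting by position: the 3rd, 6th and 9th elements
by_position <- x[c(FALSE, FALSE, TRUE)]
by_position   # 13 16 19
```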
I am working on a project where I need to enter a number of "T score" tables into R. These are tables used to convert raw test scores into standardized values. They generally follow a specific pattern, but not one that is simple. For instance, one pattern is:
34,36,39,42,44,47,50,52,55,58,60,63,66,68,
71,74,76,79,82,84,87,90,92,95,98,100,103,106
I'd prefer to use a simple function to fill these in, rather than typing them by hand. I know that the seq() function can create a simple sequence, like:
R> seq(1,10,2)
[1] 1 3 5 7 9
Is there any way to create more complex sequences based on specific patterns? For instance, the above data could be done as:
c(34,seq(36:106,c(3,3,2)) # The pattern goes 36,39,42,44,47,50,52 (+3,+3,+2)
...however, this results in an error. I thought there would be a function that should do this, but all my Google-fu has just brought me back to the original seq().
This could be done using the cumsum (cumulative sum) function and rep:
> 31 + cumsum(rep(c(3, 2, 3), 9))
[1] 34 36 39 42 44 47 50 52 55 58 60 63 66 68 71 74 76 79 82
[20] 84 87 90 92 95 98 100 103
To make sure the sequence stops at the right place:
> (31 + cumsum(rep(c(3, 2, 3), 10)))[1:28]
[1] 34 36 39 42 44 47 50 52 55 58 60 63 66 68 71 74 76 79 82
[20] 84 87 90 92 95 98 100 103 106
Here is a custom function that should work in most cases. It uses the cumulative sum (cumsum()) of a sequence, and integer division to calculate the length of the desired sequence.
cseq <- function(from, to, by){
times <- (to-from) %/% sum(by)
x <- cumsum(c(from, rep(by, times+1)))
x[x<=to]
}
Try it:
> cseq(36, 106, c(3,3,2))
[1] 36 39 42 44 47 50 52 55 58 60 63 66 68 71 74 76 79 82 84 87 90 92 95 98
[25] 100 103 106
> cseq(36, 109, c(3,3,2))
[1] 36 39 42 44 47 50 52 55 58 60 63 66 68 71 74 76 79 82 84 87 90 92 95 98
[25] 100 103 106 108
Here is a non-iterative solution, in case you need a specific element of the sequence
f <- function(x){
d <- (x) %/% 3
r <- x %% 3
31 + d*8 + c(0,3,5)[r+1]
}
> f(1:10)
[1] 34 36 39 42 44 47 50 52 55 58
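The same closed-form idea could be generalised to any repeating pattern of increments. The following is only a sketch: the name nth_term and its arguments are made up here, with 'first' being the first value of the sequence and 'by' the repeating increments.

```r
# Hypothetical generalisation: k-th term of a sequence that starts at 'first'
# and repeatedly applies the increments in 'by'
nth_term <- function(k, first, by) {
  p <- length(by)
  steps <- k - 1                 # increments applied before the k-th term
  full <- steps %/% p            # completed cycles of the pattern
  part <- steps %% p             # increments used from the current cycle
  first + full * sum(by) + c(0, cumsum(by))[part + 1]
}

nth_term(1:10, 34, c(2, 3, 3))
# 34 36 39 42 44 47 50 52 55 58
```

Like f above, this is vectorised over k and does not need to build the whole sequence to get a single element.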