Scope of Aggregation Functions when nesting which(min()) inside apply() - r

Edited original post to clarify question
Background
I'm learning R and saw this scenario and don't understand how R handles (what I'll call) implied context transitions. The script I am trying to understand simply iterates through each row of a matrix and prints the index of the column(s) within that row that contain the minimum value of that row. What I don't understand is how R handles the context transition as different functions are applied to the dependent variable x:
x (when defined as an argument to function(x)) is an atomic vector because of the apply() function with a MARGIN = 1 argument
The which() function then iterates over the individual elements within the atomic vector x to see which ones == min(x)
This is the part that truly confuses me: despite the fact that which() is iterating over elements of the atomic vector x, you can call min(x) within the which() call and R somehow switches x back to being the entire atomic vector for calculating min() across the vector, rather than within the scope of a single element
Example Data Matrix
a <- matrix(c(5, 2, 7, 1, 2, 8, 4, 5, 6), 3, 3)
     [,1] [,2] [,3]
[1,]    5    1    4
[2,]    2    2    5
[3,]    7    8    6
This is the script that returns the column indexes that I am struggling to understand
apply(a, 1, function(x) which(x == min(x)))
My question:
Within the which() function, why does min(x) return the minimum of the atomic vector (as is desired) and not the minimum within the scope of an individual element within that vector, since which() is iterating over each individual element within the atomic vector x?

Edit: discussion about which and x:
The first comment on your question ("x is anonymous function, lambda") is incorrect: x is just a variable, nothing fancy. function(x) declares it as the first (and only) argument of the anonymous function, and every reference to x after that refers to whatever is passed to that anonymous function.
The code uses an anonymous function; normally, almost everything you do in R uses named functions (e.g., mean, min). In some cases (e.g., in apply and related functions), it makes sense to define a whole function as an argument and not name it, as in
## anonymous (unnamed) function
apply(m, 1, function(x) which(x == min(x)))
## equivalently, with a named function
myfunc <- function(x) which(x == min(x))
apply(m, 1, myfunc)
In the first case, function(x) which(x == min(x)) is not named, so it is "anonymous". The results of the two apply calls are identical.
Given that context, x is the first argument to the function (myfunc or the anonymous function in your case). With the rest of the apply/MARGIN discussion below,
x (in this case) contains the whole row (when MARGIN=1);
min(x) returns the lowest value within x, and its result is always length 1; and
which(x == min(x)) returns the index (or indices) of that lowest value within x. In this case it will always have length 1 or more, because at least one element is always equal to the minimum of the vector. In general, though, which() has no guarantee of finding any matches, so the length of its return value can be anywhere between 0 and the length of its input. Examples:
which(11:15 == 13)
# [1] 3
which(11:15 == 1:5)
# integer(0)
which(11:15 == 11:15)
# [1] 1 2 3 4 5
which(11:15 %in% c(12, 14))
# [1] 2 4
apply works on one or more dimensions at a time. For now, I'll stick with a 2d matrix, in which case MARGIN= selects rows or columns. (There is a caveat; see below.)
I'm going to use a verbose function to show each step. I'll name it anonfunc, but mentally replace apply(a, 1, anonfunc) below with apply(a, 1, function(x) { ... }) and you will see what I'm intending to do. I also have a dematrix helper function to show what is being passed to anonfunc.
dematrix <- function(m, label = "") {
  if (!is.matrix(m)) m <- matrix(m, nrow = 1)
  out <- capture.output(print(m))[-1]
  out <- gsub("^[][,0-9]+", "", out)
  paste(paste0(c(label, rep(strrep(" ", nchar(label)), length(out) - 1)), out),
        collapse = "\n")
}
anonfunc <- function(x) {
  message(dematrix(x, "Input: "))
  step1 <- x == min(x)
  message(dematrix(step1, "Step1: "))
  step2 <- which(step1)
  message("Step2: ", paste(step2, collapse = ","), "\n#\n")
  step2
}
2d arrays
I'm going to modify your sample data a little by adding a column. This helps visualize how many function calls there are and how big the function's input is.
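(The statement that builds the modified matrix is not shown; judging from the Input: lines below, where the added column is 11:13, it was presumably something like the following.)
## Presumed reconstruction: append a fourth column so a becomes 3x4,
## matching the Input: lines printed below.
a <- cbind(a, c(11, 12, 13))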
apply(a, 1, anonfunc)
# Input: 5 1 4 11
# Step1: FALSE TRUE FALSE FALSE
# Step2: 2
# #
# Input: 2 2 5 12
# Step1: TRUE TRUE FALSE FALSE
# Step2: 1,2
# #
# Input: 7 8 6 13
# Step1: FALSE FALSE TRUE FALSE
# Step2: 3
# #
# [[1]]
# [1] 2
# [[2]]
# [1] 1 2
# [[3]]
# [1] 3
Our anonymous function is called three times, once for each row. In each call, it is passed a vector of length 4, which is the size of one row in the matrix.
Note that we get a list in return. Normally apply returns a vector or matrix: the result has the dimensions of the MARGIN= axes, plus an added leading dimension for the length of the individual return values. That is, a has dims 3x4; if each call to the anon-func returns length 1, the return value is "sort of" 1x3, but R simplifies that to a vector of length 3 (this might be construed as mathematically inconsistent; I don't disagree); if each anon-func call returned length 10, the output would be a 10x3 matrix.
However, when any of the anon-func returns has a different length/size/class than the others, apply will return a list. (This is the same behavior as sapply, and it can be frustrating when it changes unexpectedly. There is allegedly a patch in R-devel that allows us to force a list with apply(..., simplify=FALSE).)
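As an aside (an addition, not part of the original answer): current R has this built in; since R 4.1.0, apply() takes a simplify= argument, so a list can be forced directly.
## Assumes R >= 4.1.0, where apply() gained a simplify= argument.
apply(a, 1, anonfunc, simplify = FALSE)   # always returns a list, never simplified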
If we instead use MARGIN=2, we'll be operating on columns:
apply(a, 2, anonfunc)
# Input: 5 2 7
# Step1: FALSE TRUE FALSE
# Step2: 2
# #
# Input: 1 2 8
# Step1: TRUE FALSE FALSE
# Step2: 1
# #
# Input: 4 5 6
# Step1: TRUE FALSE FALSE
# Step2: 1
# #
# Input: 11 12 13
# Step1: TRUE FALSE FALSE
# Step2: 1
# #
# [1] 2 1 1 1
Now, one call for each column (4 calls) and x is a vector of length 3 (number of rows in the source matrix).
It is possible to operate on more than one axis at a time; while it seems meaningless to do it with a matrix (2d array), it makes more sense with larger-dimensioned arrays.
apply(a, 1:2, anonfunc)
# Input: 5
# Step1: TRUE
# Step2: 1
# #
# Input: 2
# Step1: TRUE
# Step2: 1
# #
# Input: 7
# Step1: TRUE
# Step2: 1
# #
# ...truncated... total of 12 calls to `anonfunc`
# #
# [,1] [,2] [,3] [,4]
# [1,] 1 1 1 1
# [2,] 1 1 1 1
# [3,] 1 1 1 1
From the discussion of output dimensions, MARGIN=1:2 means the output dimensions will be the dimensions of the margins -- 3x4 -- combined with the length of each per-call output. Since the output here is always length 1, that is technically 3x4x1, which in R-speak is a matrix of dim 3x4.
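A quick way to verify that bookkeeping (an added check, using the 3x4 a from above): return something longer than length 1 and look at the dimensions; the per-call result length becomes the leading dimension.
## Each cell is passed individually (MARGIN = 1:2) and the function returns length 2,
## so the result has dim c(2, 3, 4): per-call length first, then the margin dimensions.
dim(apply(a, 1:2, function(x) c(x, x^2)))
# [1] 2 3 4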
[Figure: illustration of which part of a matrix each MARGIN= selection uses]
3d array
Let's go slightly larger to see some of the "plane" operations.
a3 <- array(1:24, dim = c(3,4,2))
a3
# , , 1
# [,1] [,2] [,3] [,4]
# [1,] 1 4 7 10
# [2,] 2 5 8 11
# [3,] 3 6 9 12
# , , 2
# [,1] [,2] [,3] [,4]
# [1,] 13 16 19 22
# [2,] 14 17 20 23
# [3,] 15 18 21 24
Starting with MARGIN=1. While you have both arrays visible, look at the first Input: and see which "plane" is being used from the original a3 array. It appears transposed, sure ...
For the sake of brevity (too late!), I'll abbreviate the third and subsequent iterations of anonfunc to show just the first line (inner-matrix row) of the verbose output.
apply(a3, 1, anonfunc)
# Input: 1 13
# 4 16
# 7 19
# 10 22
# Step1: TRUE FALSE
# FALSE FALSE
# FALSE FALSE
# FALSE FALSE
# Step2: 1
# #
# Input: 2 14
# 5 17
# 8 20
# 11 23
# Step1: TRUE FALSE
# FALSE FALSE
# FALSE FALSE
# FALSE FALSE
# Step2: 1
# #
# Input: 3 15 ...
# #
# [1] 1 1 1
Similarly, MARGIN=2. I'll show a3 again so you can see which "plane" is being used:
a3
# , , 1
# [,1] [,2] [,3] [,4]
# [1,] 1 4 7 10
# [2,] 2 5 8 11
# [3,] 3 6 9 12
# , , 2
# [,1] [,2] [,3] [,4]
# [1,] 13 16 19 22
# [2,] 14 17 20 23
# [3,] 15 18 21 24
apply(a3, 2, anonfunc)
# Input: 1 13
# 2 14
# 3 15
# Step1: TRUE FALSE
# FALSE FALSE
# FALSE FALSE
# Step2: 1
# #
# Input: 4 16
# 5 17
# 6 18
# Step1: TRUE FALSE
# FALSE FALSE
# FALSE FALSE
# Step2: 1
# #
# Input: 7 19 ...
# Input: 10 22 ...
# #
# [1] 1 1 1 1
MARGIN=3 is not very exciting: anonfunc is only called twice, once for each of the front-facing "planes" (no abbreviation necessary here):
apply(a3, 3, anonfunc)
# Input: 1 4 7 10
# 2 5 8 11
# 3 6 9 12
# Step1: TRUE FALSE FALSE FALSE
# FALSE FALSE FALSE FALSE
# FALSE FALSE FALSE FALSE
# Step2: 1
# #
# Input: 13 16 19 22
# 14 17 20 23
# 15 18 21 24
# Step1: TRUE FALSE FALSE FALSE
# FALSE FALSE FALSE FALSE
# FALSE FALSE FALSE FALSE
# Step2: 1
# #
# [1] 1 1
One can use multiple dimensions here as well, and this is where I think the Input: string becomes a little clarifying:
a3
# , , 1
# [,1] [,2] [,3] [,4]
# [1,] 1 4 7 10
# [2,] 2 5 8 11
# [3,] 3 6 9 12
# , , 2
# [,1] [,2] [,3] [,4]
# [1,] 13 16 19 22
# [2,] 14 17 20 23
# [3,] 15 18 21 24
apply(a3, 2:3, anonfunc)
# Input: 1 2 3
# Step1: TRUE FALSE FALSE
# Step2: 1
# #
# Input: 4 5 6
# Step1: TRUE FALSE FALSE
# Step2: 1
# #
# Input: 7 8 9 ...
# Input: 10 11 12 ...
# Input: 13 14 15 ...
# Input: 16 17 18 ...
# Input: 19 20 21 ...
# Input: 22 23 24 ...
# #
# [,1] [,2]
# [1,] 1 1
# [2,] 1 1
# [3,] 1 1
# [4,] 1 1
And since the dimensions of a3 are 3,4,2, and we're looking at margins 2:3, and each call to anonfunc returns length 1, our returned matrix is 4x2x1 (where the x1 is silently dropped by R).
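The same check for the 3d case (an added verification): a per-call result of length 2 shows up as a leading dimension in front of the 4x2 margin dims.
## range() returns length 2 for every column slice, so the result is 2 x 4 x 2.
dim(apply(a3, 2:3, range))
# [1] 2 4 2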
[Figures: illustrations of which elements each MARGIN= selection uses from the array]

"Lexical scoping looks up symbol values based on how functions were nested when they were created, not how they are nested when they are called. With lexical scoping, you don’t need to know how the function is called to figure out where the value of a variable will be looked up. You just need to look at the function’s definition."**
**Source: http://adv-r.had.co.nz/Functions.html#lexical-scoping
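To tie that quote back to the question: x is bound exactly once per call of the anonymous function, so every reference to x in the body (including the one inside min()) sees the same whole row; x == min(x) is one vectorized comparison, not an element-by-element loop. A minimal sketch of that binding (not from the original answer):
f <- function(x) {
  m   <- min(x)    # min() sees the entire vector bound to x
  cmp <- x == m    # vectorized comparison; the length-1 m is recycled
  which(cmp)       # indices where the comparison is TRUE
}
f(c(5, 1, 4))
# [1] 2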

Related

Generating Possible Sequences in R

I am trying to solve the following problem in R. Generically, given a sequence [a,b], I am to generate lists from this sequence that have a length n, whose elements pairwise at least have a difference of d.
I was thinking of using seq() but you can only create evenly-spaced sequences using this function.
This may be what you are after: generate all permutations of the possible differences that could exist in a sequence of size n, and then check which satisfy your requirement that the terminal value be b.
This is quite intensive and slow for larger vectors, but it should return all possible valid sequences (unless I've made a mistake).
# sequence length of n which includes a, b
# therefore need to find n - 1 values (then check that last val of cumsum == b)
# vals must be greater than or equal to d
# vals have an upper bound: if all but one value was d, it is b - ((n - 1) * d)
library(gtools)
library(matrixStats)
# parameters
a = 1
b = 20
n = 5
d = 2
# possible values that differences can be
poss_diffs <- d:(b - ((n - 1) * d))
# generate all possible permutations of differences
diff_perms_n <- permutations(n = length(poss_diffs), r = n - 1, v = poss_diffs)
# turn differences into sequences, add column for the a value
seqs_n <- matrixStats::rowCumsums(cbind(a, diff_perms_n))
# filter to only valid sequences, last column == b
valid_seqs <- seqs_n[seqs_n[, ncol(seqs_n)] == b, ]
# check that diffs are all greater than d
valid_seqs_diffs <- matrixStats::rowDiffs(valid_seqs)
print(head(valid_seqs))
print(head(valid_seqs_diffs))
# > print(head(valid_seqs))
# [,1] [,2] [,3] [,4] [,5]
# [1,] 1 3 6 10 20
# [2,] 1 3 6 11 20
# [3,] 1 3 6 12 20
# [4,] 1 3 6 14 20
# [5,] 1 3 6 15 20
# [6,] 1 3 6 16 20
# > print(head(valid_seqs_diffs))
# [,1] [,2] [,3] [,4]
# [1,] 2 3 4 10
# [2,] 2 3 5 9
# [3,] 2 3 6 8
# [4,] 2 3 8 6
# [5,] 2 3 9 5
# [6,] 2 3 10 4

Getting all the combination of numbers from a list that would sum to a specific number

I have the following list of numbers (1,3,4,5,7,9,10,12,15) and I want to find out all the possible combinations of 3 numbers from this list that would sum to 20.
My research on stackoverflow has led me to this post:
Finding all possible combinations of numbers to reach a given sum
There is a solution provided by Mark which stands as follows:
subset_sum = function(numbers, target, partial = 0) {
  if (any(is.na(partial))) return()
  s = sum(partial)
  if (s == target) print(sprintf("sum(%s)=%s", paste(partial[-1], collapse = "+"), target))
  if (s > target) return()
  for (i in seq_along(numbers)) {
    n = numbers[i]
    remaining = numbers[(i+1):length(numbers)]
    subset_sum(remaining, target, c(partial, n))
  }
}
However I am having a hard time trying to tweak this set of codes to match my problem. Or may be there is a simpler solution?
I want the output in R to show me the list of numbers.
Any help would be appreciated.
You can use the combn function and filter to meet your criteria. I have performed the calculation below in 2 steps, but it can be done in a single step too.
v <- c(1,3,4,5,7,9,10,12,15)
AllComb <- combn(v, 3) #generates all combination taking 3 at a time.
PossibleComb <- AllComb[,colSums(AllComb) == 20] #filter those with sum == 20
#Result: 6 sets of 3 numbers (column-wise)
PossibleComb
# [,1] [,2] [,3] [,4] [,5] [,6]
# [1,] 1 1 1 3 3 4
# [2,] 4 7 9 5 7 7
# [3,] 15 12 10 12 10 9
#
# Result in list
split(PossibleComb, col(PossibleComb))
# $`1`
# [1] 1 4 15
#
# $`2`
# [1] 1 7 12
#
# $`3`
# [1] 1 9 10
#
# $`4`
# [1] 3 5 12
#
# $`5`
# [1] 3 7 10
#
# $`6`
# [1] 4 7 9
combn also has a FUN argument, which we can set to list to get the output as a list, and then Filter the list elements based on the sum condition:
Filter(function(x) sum(x) == 20, combn(v, 3, FUN = list))
#[[1]]
#[1] 1 4 15
#[[2]]
#[1] 1 7 12
#[[3]]
#[1] 1 9 10
#[[4]]
#[1] 3 5 12
#[[5]]
#[1] 3 7 10
#[[6]]
#[1] 4 7 9
data
v <- c(1,3,4,5,7,9,10,12,15)

Find similar groups of numbers across rows R

I'm trying to find similar patterns of numbers across a dataframe. I have a dataframe with 5 columns and some columns have a random number between 3 and 50. However, for some rows 2 or 3 columns don't have a number.
 A  B  C  D  E
 5 23  6
 9 33  7  8 12
33  7 14
 6 18 23 48
 8 44 33  7  9
I want to know what are the recurring numbers, so I'm interested in:
Rows 1 and 4, which have the numbers 23 and 6;
Rows 2 and 5, which have the numbers 9, 33 and 8;
Rows 2, 3 and 5, which have the numbers 33 and 7.
Basically I'm trying to get the number of different combinations.
I'm a bit stuck about how to do this. I've tried to join the numbers in a list.
for (i in 1:dim(knots_all)[1]) {
  knots_all$list_knots <- list(sort(knots_all[i, 1:5]))
}
I've also tried intersect but it doesn't seem very efficient as R also considers the NAs which I want to disregard.
I would like to hear some ideas about the best way to achieve this. I've been thinking about this problem but I'm not able to understand how to get to the answer. My mind is stuck so any idea is much appreciated!
Thank you!
There's no specific/target pattern you want to capture. It seems like you need a process to identify the numbers that appear more often in your dataset and then see in which rows they appear.
I'll modify your example dataset to have number 23 appearing twice in the same row in order to illustrate some useful differences in counts.
df = read.table(text = "
A  B  C  D  E
5  23 6  23 NA
9  33 7  8  12
33 7  14 NA NA
6  18 23 48 NA
8  44 33 7  9
", header = T)
library(dplyr)
library(tidyr)
df %>%
  mutate(row_id = row_number()) %>%                    # add a row flag
  gather(col_name, value, -row_id) %>%                 # reshape to long format
  filter(!is.na(value)) %>%                            # exclude NAs
  group_by(value) %>%                                  # for each number value
  summarise(NumOccurences = n(),                       # count occurrences
            rows = paste(sort(row_id), collapse = "_"),            # capture rows
            NumRowOccurences = n_distinct(row_id),                 # count occurrences in distinct rows
            unique_rows = paste(sort(unique(row_id)), collapse = "_")) %>%  # capture distinct rows
  arrange(desc(NumOccurences))                         # order by number popularity (occurrences)
# # A tibble: 12 x 5
# value NumOccurences rows NumRowOccurences unique_rows
# <int> <int> <chr> <int> <chr>
# 1 7 3 2_3_5 3 2_3_5
# 2 23 3 1_1_4 2 1_4
# 3 33 3 2_3_5 3 2_3_5
# 4 6 2 1_4 2 1_4
# 5 8 2 2_5 2 2_5
# 6 9 2 2_5 2 2_5
# 7 5 1 1 1 1
# 8 12 1 2 1 2
# 9 14 1 3 1 3
# 10 18 1 4 1 4
# 11 44 1 5 1 5
# 12 48 1 4 1 4
Make a list of lists:
List = [1[], 2[], ..., n[]].
Loop through your data frame and, for your example, add A to the list at index 5 (because the value 5 appears in column A), giving List = [1[], 2[], ..., 5[A], ..., n[]]. And so on for every column.
After this loop, check which of the inner lists are filled and span multiple columns.
This should get you started (a rough R sketch follows below).
Good luck
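For concreteness, a rough R sketch of that idea (a translation of the pseudocode, recording row numbers rather than column letters since the question asks about rows; it assumes a data frame df like the one read in above):
## Index a list by value; for every non-NA cell, append the row it appears in.
value_rows <- list()
for (r in seq_len(nrow(df))) {
  for (v in na.omit(unlist(df[r, ]))) {
    key <- as.character(v)
    value_rows[[key]] <- c(value_rows[[key]], r)
  }
}
## Keep only the values that occur in more than one row.
Filter(function(rows) length(unique(rows)) > 1, value_rows)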
This is an algorithm which can detect numbers present in two columns.
df <- data.frame(A = c(5, 23, 6, NA, NA),
                 B = c(9, 33, 7, 8, 12),
                 C = c(33, 7, 14, NA, NA),
                 D = c(6, 18, 23, 48, NA),
                 E = c(8, 44, 33, 7, 9))

L <- as.list(df)
LL <- rep(list(rep(list(NA), length(L))), length(L))
for (i in 1:length(L)) {
  for (j in 1:length(L))
    LL[[i]][[j]] <- intersect(L[[i]], L[[j]])
}
To see the overlapping numbers in columns 1 and 4:
LL[[1]][[4]]
[1] 23 6 NA
To see all overapping numbers:
unique(unlist(LL))
[1] 5 23 6 NA 9 33 7 8 12 14 18 48 44
It could be changed a little bit (by adding another level to the nested loop) to see the presence in 3 different columns, etc.; a rough sketch follows.
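A rough sketch of that extension (an added illustration reusing L from above), checking all triples of columns:
## Three-way intersections: numbers present in all of columns i, j and k.
LLL <- list()
for (i in seq_along(L)) {
  for (j in seq_along(L)) {
    for (k in seq_along(L)) {
      LLL[[paste(i, j, k, sep = "_")]] <-
        Reduce(intersect, list(L[[i]], L[[j]], L[[k]]))
    }
  }
}
LLL[["2_3_5"]]   # numbers shared by columns 2, 3 and 5
# [1] 33  7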
One way of dealing with the NAs would be to temporarily fill them with randomly generated numbers:
# data
df <- data.frame(A = c(5, 9, 33, 6, 8),
                 B = c(23, 33, 7, 18, 44),
                 C = c(6, 7, 14, 23, 33),
                 D = c(NA, 8, NA, 48, 7),
                 E = c(NA, 12, NA, NA, 9))

# fill NA with random numbers (one random fill value per column)
set.seed(1)
df2 <- as.data.frame(do.call(cbind, lapply(df, function(x) ifelse(is.na(x), rnorm(1), x))))
> df2
A B C D E
1 5 23 6 -0.6264538 0.1836433
2 9 33 7 8.0000000 12.0000000
3 33 7 14 -0.6264538 0.1836433
4 6 18 23 48.0000000 0.1836433
5 8 44 33 7.0000000 9.0000000
# split data by rows
df2 <- split(df2, seq_along(df2))
# compare rows with each other
temp <- lapply(lapply(df2, function(x) lapply(df2, function(y) x %in% y)),
               function(x) do.call(rbind, x))
# delete self comparisons
output <- lapply(1:5, function(x) temp[[x]] <- temp[[x]][-x, ])
Result:
[[1]]
[,1] [,2] [,3] [,4] [,5]
2 FALSE FALSE FALSE FALSE FALSE
3 FALSE FALSE FALSE TRUE TRUE
4 FALSE TRUE TRUE FALSE TRUE
5 FALSE FALSE FALSE FALSE FALSE
[[2]]
[,1] [,2] [,3] [,4] [,5]
1 FALSE FALSE FALSE FALSE FALSE
3 FALSE TRUE TRUE FALSE FALSE
4 FALSE FALSE FALSE FALSE FALSE
5 TRUE TRUE TRUE TRUE FALSE
[[3]]
[,1] [,2] [,3] [,4] [,5]
1 FALSE FALSE FALSE TRUE TRUE
2 TRUE TRUE FALSE FALSE FALSE
4 FALSE FALSE FALSE FALSE TRUE
5 TRUE TRUE FALSE FALSE FALSE
[[4]]
[,1] [,2] [,3] [,4] [,5]
1 TRUE FALSE TRUE FALSE TRUE
2 FALSE FALSE FALSE FALSE FALSE
3 FALSE FALSE FALSE FALSE TRUE
5 FALSE FALSE FALSE FALSE FALSE
[[5]]
[,1] [,2] [,3] [,4] [,5]
1 FALSE FALSE FALSE FALSE FALSE
2 TRUE FALSE TRUE TRUE TRUE
3 FALSE FALSE TRUE TRUE FALSE
4 FALSE FALSE FALSE FALSE FALSE

check whether matrix rows equal a vector in R , vectorized

I'm very surprised this question has not been asked, maybe the answer will clear up why. I want to compare rows of a matrix to a vector and return whether the row == the vector everywhere. See the example below. I want a vectorized solution, no apply functions because the matrix is too large for slow looping. Suppose there are many rows as well, so I would like to avoid repping the vector.
set.seed(1)
M = matrix(rpois(50, 5), 5, 10)
v = c(3, 2, 7, 7, 4, 4, 7, 4, 5, 6)
M
     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[1,]    4    8    3    5    9    4    5    6    7     7
[2,]    4    9    3    6    3    1    5    7    6     1
[3,]    5    6    6   11    6    4    5    2    7     5
[4,]    8    6    4    4    3    8    3    6    5     6
[5,]    3    2    7    7    4    4    7    4    5     6
Output should be
FALSE FALSE FALSE FALSE TRUE
One possibility is
rowSums(M == v[col(M)]) == ncol(M)
## [1] FALSE FALSE FALSE FALSE TRUE
Or similarly
rowSums(M == rep(v, each = nrow(M))) == ncol(M)
## [1] FALSE FALSE FALSE FALSE TRUE
Or
colSums(t(M) == v) == ncol(M)
## [1] FALSE FALSE FALSE FALSE TRUE
v[col(M)] is just a shorter version of rep(v, each = nrow(M)), which creates a vector the same size as M (a matrix is just a vector; try c(M)) and then compares each element against its corresponding one using ==. Fortunately == is a generic function that has an array method (see methods("Ops") and is.array(M)), which allows us to run rowSums (or colSums) on the result in order to check that the number of matches equals ncol(M).
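A quick way to see that recycling at work (an added check, using M and v from above):
## col(M) holds the column index of every element of M, so v[col(M)] repeats each
## element of v down an entire column -- exactly what rep(v, each = nrow(M)) builds.
identical(v[col(M)], rep(v, each = nrow(M)))
# [1] TRUE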
Using De Morgan's rule (not all = some not), so that all equal = not (some not equal), we also have
!colSums(t(M) != v)
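To unpack that (an added note): colSums(t(M) != v) counts, for each row of M, how many positions differ from v; a row equals v exactly when that count is 0, and ! turns 0 into TRUE and any positive count into FALSE.
mismatch_counts <- colSums(t(M) != v)   # per-row count of positions differing from v
!mismatch_counts                        # zero mismatches -> TRUE
# [1] FALSE FALSE FALSE FALSE  TRUE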
The package prodlim has a function called row.match, which is easy to use and ideal for your problem. First install and load the library: library(prodlim). In our example, row.match will return '5' because the 5th row in M is equal to v. We can then convert this into a logical vector.
m <- row.match(v, M)
m == 1:NROW(M)
# [1] FALSE FALSE FALSE FALSE  TRUE

Is there a way to extract continuous feature in an 2D array

Say I have an array of number
a <- c(1,2,3,6,7,8,9,10,20)
is there a way to tell R to output just the ranges of the continuous sequences in "a"?
e.g., the continuous sequences in "a" are the following
1,3
6,10
20
Thanks a lot!
Derek
I don't think there is a built-in way, but you could create two logical vectors telling you whether the next/previous element is 1 greater/less. E.g.:
data.frame(
  a,
  is_first = c(TRUE, diff(a) != 1),
  is_last  = c(diff(a) != 1, TRUE)
)
# Gives you:
a is_first is_last
1 1 TRUE FALSE
2 2 FALSE FALSE
3 3 FALSE TRUE
4 6 TRUE FALSE
5 7 FALSE FALSE
6 8 FALSE FALSE
7 9 FALSE FALSE
8 10 FALSE TRUE
9 20 TRUE TRUE
So ranges are:
cbind(a[c(TRUE,diff(a)!=1)], a[c(diff(a)!=1,TRUE)])
[1,] 1 3
[2,] 6 10
[3,] 20 20
I did this (not so elegant I admit) in case you want all the numbers of each sequence in a list
a <- c(1,2,3,6,7,8,9,10,20)
z <- c(1, which(c(1, diff(a)) != 1))
g <- lapply(seq(1:length(z)), function(i) {
  if (i < length(z)) a[z[i]:(z[i+1] - 1)]
  else a[z[i]:length(a)]
})
g
[[1]]
[1] 1 2 3
[[2]]
[1] 6 7 8 9 10
[[3]]
[1] 20
Then you can get a 2D array with something like this
sapply(g,function(x) c(x[1],x[length(x)]))
[,1] [,2] [,3]
[1,] 1 6 20
[2,] 3 10 20
> a <- c(1,2,3,6,7,8,9,10,20)
> N<-length(a)
> k<-2:(N-1)
> z<-(a[k-1]+1)!=a[k] | (a[k+1]-1)!=a[k]
> c(a[1],a[k][z],a[N])
[1] 1 3 6 10 20
