How to find missing numbers in a sequence? - r

I have a vector containing a list of numbers. How do I find numbers that are missing from the vector?
For example:
sequence <- c(12:17,1:4,6:10,19)
The missing numbers are 5, 11 and 18.

sequence <- c(12:17,1:4,6:10,19)
seq2 <- min(sequence):max(sequence)
seq2[!seq2 %in% sequence]
...and the output:
[1] 5 11 18

You can use the setdiff() function to compute set differences. You want the difference between the complete sequence (from min(sequence) to max(sequence)) and the sequence with missing values.
setdiff(min(sequence):max(sequence), sequence)
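The same idea can be wrapped in a small function. This is only a sketch: find_missing is a hypothetical name, and it assumes the vector holds whole numbers, with NA values dropped before the range is built.

```r
# Hypothetical helper (not from the answers above): whole numbers absent
# between the smallest and largest value of x.
find_missing <- function(x) {
  x <- x[!is.na(x)]                      # drop NA so min()/max() are defined
  if (length(x) == 0) return(integer(0)) # nothing to compare against
  setdiff(seq(min(x), max(x)), x)        # full range minus what is present
}

sequence <- c(12:17, 1:4, 6:10, 19)
find_missing(sequence)
# [1]  5 11 18
```

Because the range is built from min(x) rather than from 1, this should also work for vectors that do not start at 1.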

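This answer builds the full range from the lowest to the highest value in the sequence, then asks which positions in that range are not present in the original sequence. which() returns positions rather than values, so min(sequence) - 1 is added to turn them back into values; here the range starts at 1, so the offset is 0.
which(!(seq(min(sequence), max(sequence)) %in% sequence)) + min(sequence) - 1
[1] 5 11 18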

c(1:max(sequence))[!duplicated(c(sequence,1:max(sequence)))[-(1:length(sequence))]]
[1] 5 11 18
Not a particularly elegant solution, I admit. It appends 1:max(sequence) to sequence, uses duplicated() to flag which elements of 1:max(sequence) already occur in sequence, and then keeps the elements that were not flagged. Note that it counts from 1 rather than from min(sequence).

Related

Repeating patterns in a vector in R

Suppose a vector is produced by repeating a shorter vector, of unknown length and with unique elements, an unknown number of times:
small_v <- c("as","d2","GI","Worm")
big_v <- rep(small_v, 3)
How can I determine how long the original vector was and how many times it was repeated?
So in this example the original length was 4 and it repeats 3 times.
Realistically in my case the vectors will be fairly small and will be repeated only a few times.
1) Assuming that there is at least one unique element in small_v (which is the case in the question since it assumes all elements in small_v are unique):
min(table(big_v))
## [1] 3
or using pipes
big_v |> table() |> min()
## [1] 3
Here is a more difficult test but it still works because small_v2[2] is unique in small_v2 even though the other elements of small_v2 are not unique.
# test data
small_v2 <- c(small_v, small_v[-2])
big_v2 <- rep(small_v2, 3)
min(table(big_v2))
## [1] 3
2) If we knew that the first element of small_v were unique (which is the case in the question since it assumes all elements in small_v are unique) then this would work:
sum(big_v[1] == big_v)
## [1] 3
1) If the elements are all repeating and no other values are there, then use
length(big_v)/length(unique(big_v))
[1] 3
2) Or use
library(data.table)
max(rowid(big_v))
[1] 3
Alternatively, we could use rle() together with with() to count the repeats:
with(rle(sort(big_v)), max(lengths))
[1] 3
Created on 2023-02-04 with reprex v2.0.2
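The question asks for both the original length and the repeat count. A minimal sketch that returns both, without assuming any element is unique, is to look for the shortest leading block that reproduces the whole vector when repeated. find_period is a hypothetical name, not something from the answers above.

```r
# Hypothetical helper: shortest block length p such that x equals
# rep(x[1:p], length(x) / p), plus the number of repeats.
find_period <- function(x) {
  n <- length(x)
  for (p in seq_len(n)) {
    # Only block lengths that divide n can tile the vector exactly
    if (n %% p == 0 && identical(x, rep(x[seq_len(p)], n / p))) {
      return(c(length = p, times = n / p))
    }
  }
  c(length = 0, times = 1)  # only reached for an empty vector
}

small_v <- c("as", "d2", "GI", "Worm")
big_v <- rep(small_v, 3)
find_period(big_v)
# length = 4, times = 3
```

This tries every candidate length in turn, which should be fine for the small vectors described in the question.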

Compare value in R data frame after certain index

I have a data.frame as given below. I want to get the index/row number where (b-a)>8, but only looking at rows after row 7 rather than starting from row 1. The code I have written finds the row number where b-a>8, but it checks from row 1. How can I make it check from row 7?
a <- c(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16)
b <- c(2,12,4,5,2,5,8,5,7,19,6,7,4,23,1,2)
df <- data.frame(a,b)
which((df$b-df$a)>8)[1]
Desired output: Row number 10 not 2.
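One can subset both vectors starting at row 7 and then add the offset of 6 back to get row numbers in the original data frame:
which((df$b[7:nrow(df)]-df$a[7:nrow(df)])>8) + 6
#[1] 10 14
Take the first element with [1] to get 10.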
This is just index arithmetic: drop the first 7 rows, then add 7 back to the positions found.
(which(with(df[-(1:7),],b-a>8))+7)[1]
[1] 10
(idx<-which((df$b-df$a)>8))[idx>7][1]
[1] 10
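The offset arithmetic in these answers can be put into a small function. This is a sketch under assumptions: first_row_after is a hypothetical name, cond is a logical vector with one entry per row, and after is the number of leading rows to skip.

```r
a <- 1:16
b <- c(2, 12, 4, 5, 2, 5, 8, 5, 7, 19, 6, 7, 4, 23, 1, 2)
df <- data.frame(a, b)

# Hypothetical helper: first row number beyond `after` where cond is TRUE,
# or NA when there is no such row.
first_row_after <- function(cond, after) {
  if (after >= length(cond)) return(NA_integer_)
  # Positions here are relative to the shortened vector ...
  tail_idx <- which(cond[(after + 1):length(cond)])
  if (length(tail_idx) == 0) return(NA_integer_)
  # ... so the skipped rows are added back to get the original row number
  tail_idx[1] + after
}

first_row_after(df$b - df$a > 8, 7)
# [1] 10
```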

How to compute all possible combinations of multiple vectors/matrices of different sizes and sum up columns simultaneously?

Assume I have three matrices...
A=matrix(c("a",1,2),nrow=1,ncol=3)
B=matrix(c("b","c",3,4,5,6),nrow=2,ncol=3)
C=matrix(c("d","e","f",7,8,9,10,11,12),nrow=3,ncol=3)
I want to find all possible combinations of column 1 (characters or names) while summing up columns 2 and 3. The result would be a single matrix with one row per possible combination, in this case 6 rows. The result would look like the following matrix...
Result <- matrix(c("abd","abe","abf","acd","ace","acf",11,12,13,12,13,14,17,18,19,18,19,20),nrow=6,ncol=3)
I do not know how to add a table in to this question, otherwise I would show it more descriptively. Thank you in advance.
You are mixing character and numeric values in a matrix and this will coerce all elements to character. Much better to define your matrix as numeric and keep the character values as the row names:
A <- matrix(c(1,2),nrow=1,dimnames=list("a",NULL))
B <- matrix(c(3,4,5,6),nrow=2,dimnames=list(c("b","c"),NULL))
C <- matrix(c(7,8,9,10,11,12),nrow=3,dimnames=list(c("d","e","f"),NULL))
#put all the matrices in a list
mlist<-list(A,B,C)
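Then we build every combination of row indices with expand.grid, use Map to pull the matching rows out of each matrix, and add the resulting matrices together with Reduce:
res <- Reduce("+", Map(function(x,y) y[x,],
                       expand.grid(lapply(mlist,function(x) seq_len(nrow(x)))),
                       mlist))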
Finally, we build the rownames
rownames(res)<-do.call(paste0,expand.grid(lapply(mlist,rownames)))
#    [,1] [,2]
#abd   11   17
#acd   12   18
#abe   12   18
#ace   13   19
#abf   13   19
#acf   14   20
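For readers who find the Map/Reduce call hard to follow, the same computation can be written as an explicit loop. This is a sketch of one way to unpack it, not the answer's own code; the names idx and k are introduced only for illustration.

```r
A <- matrix(c(1, 2), nrow = 1, dimnames = list("a", NULL))
B <- matrix(c(3, 4, 5, 6), nrow = 2, dimnames = list(c("b", "c"), NULL))
C <- matrix(c(7, 8, 9, 10, 11, 12), nrow = 3, dimnames = list(c("d", "e", "f"), NULL))
mlist <- list(A, B, C)

# One row per combination; column k holds the row index to take from matrix k
idx <- expand.grid(lapply(mlist, function(m) seq_len(nrow(m))))

# Start from zeros and add the selected rows of each matrix in turn
res <- matrix(0, nrow = nrow(idx), ncol = ncol(mlist[[1]]))
for (k in seq_along(mlist)) {
  # drop = FALSE keeps a matrix even when only one row is selected
  res <- res + mlist[[k]][idx[[k]], , drop = FALSE]
}

# Row labels follow the same combination order as idx
rownames(res) <- do.call(paste0,
                         expand.grid(lapply(mlist, rownames),
                                     stringsAsFactors = FALSE))
res
```

The rows come out in the order abd, acd, abe, ace, abf, acf, because expand.grid varies its first argument fastest.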

How to Count the Number of non-NA's between NA in Vector R

You have the following vector which has NA's mixed in. There are no consecutive NA's and the vector is of unknown length. There are always the same number of NA's.
#Data
testvector <- c(NA,rnorm(round(abs(rnorm(1))*10)),NA,rnorm(round(abs(rnorm(1))*10)),NA,rnorm(round(abs(rnorm(1))*10)),NA,rnorm(round(abs(rnorm(1))*10)),NA,rnorm(round(abs(rnorm(1))*10)))
You need to find the number of non-NA values that exist after each NA. This needs to be returned as a vector. The length of this vector will equal the number of NA's.
For this vector.
thisvector <- c(NA,rnorm(4),NA,rnorm(5),NA,rnorm(9),NA,rnorm(2),NA,rnorm(6))
What you want is
somefunction(thisvector)
[1] 4 5 9 2 6
How can this be done?
Use rle on the output of is.na to get what you want:
x <- rle(is.na(testvector))
x$lengths[!x$values]
## [1] 1 2 5 9 4
## (testvector is built from random draws, so these counts change from run to run)
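A repeatable check of the rle approach, using thisvector from the question, where the block sizes are fixed. The seed is arbitrary and only there so the values are the same on every run. The second variant, based on diff(), is not from the answer; it is a sketch of an alternative that also reports a 0 for an NA that is followed directly by another NA or by the end of the vector.

```r
set.seed(1)  # arbitrary seed; the counts do not depend on it
thisvector <- c(NA, rnorm(4), NA, rnorm(5), NA, rnorm(9), NA, rnorm(2), NA, rnorm(6))

# Run-length encode the NA mask, then keep the lengths of the non-NA runs
runs <- rle(is.na(thisvector))
runs$lengths[!runs$values]
# [1] 4 5 9 2 6

# Alternative sketch: gaps between consecutive NA positions, with the
# position one past the end standing in for a final NA
na_pos <- which(is.na(thisvector))
diff(c(na_pos, length(thisvector) + 1)) - 1
# [1] 4 5 9 2 6
```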

Determining minimum values in a vector in R

I need some help in determining more than one minimum value in a vector. Let's suppose, I have a vector x:
x<-c(1,10,2, 4, 100, 3)
and would like to determine the indexes of the smallest 3 elements, i.e. 1, 2 and 3. I need the indexes because I will be using them to access the corresponding elements in another vector. Of course, sorting will provide the minimum values, but I want to know the indexes of their actual occurrence prior to sorting.
In order to find the index try this
which(x %in% sort(x)[1:3]) # this gives you an index vector
[1] 1 3 6
This says that the first, third and sixth elements are the three lowest values in your vector. To see which values these are, try:
x[ which(x %in% sort(x)[1:3])] # this gives the vector of values
[1] 1 2 3
or just
x[c(1,3,6)]
[1] 1 2 3
If you have any duplicated values, you may want to select the unique values first and then sort them in order to find the index, like this (suggested by @Jeffrey Evans in his answer):
which(x %in% sort(unique(x))[1:3])
I think you mean you want to know what are the indices of the bottom 3 elements? In that case you want order(x)[1:3]
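A short sketch of how order() differs from the which() approach, and how the result can index a second vector. The labels vector and x2 are made up for illustration.

```r
x <- c(1, 10, 2, 4, 100, 3)
labels <- c("a", "b", "c", "d", "e", "f")  # hypothetical companion vector

idx <- order(x)[1:3]  # positions of the three smallest values
idx
# [1] 1 3 6
labels[idx]
# [1] "a" "c" "f"

# order() lists positions from smallest value to largest, while
# which(... %in% ...) lists them in the order they occur in the vector
x2 <- c(5, 1, 3, 2)
order(x2)[1:3]
# [1] 2 4 3
which(x2 %in% sort(x2)[1:3])
# [1] 2 3 4
```

With tied values, order()[1:3] always returns exactly three positions, whereas the %in% version returns every position that matches one of the three smallest values.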
You can use unique to account for duplicate minimum values.
x<-c(1,10,2,4,100,3,1)
which(x %in% sort(unique(x))[1:3])
Here's another way with rank that includes duplicates.
x <- c(1, 10, 2, 4, 100, 3, 3) # the original x with a duplicate 3 appended
# [1] 1 10 2 4 100 3 3
which(rank(x, ties.method='min') <= 3)
# [1] 1 3 6 7
