How to automatically move from e.g. x[1] to x[2] - r

I have a random vector (of numbers 1:5) of length 20. I need to count the number of runs of 1 (i.e. each number that is not followed by the same number), 2 (i.e. 2 consecutive numbers the same), 3 and 4.
I'm trying to write a function that takes x[1] and x[2] and compares them, if they are the same then + 1 to a counting variable. After that, x[1] becomes x[2] and x[2] should become x[3] so it keeps on repeating. How do I make x[2] change to x[3] without assigning it again? Sorry if that doesn't make much sense
This is my first day learning R so please simplify as much as you can so I understand lol..
{
startingnumber <- x[1]
nextnumber <- x[2]
count <- 0
repeat {
if (startingnumber == nextnumber) {
count <- count + 1
startingnumber <- nextnumber
nextnumber <- x[3]
} else {
if (startingnumber != nextnumber) {
break
........
}
}
}
}

As mentioned in the comments, using table() on the rle() lengths is probably the most concise solution.
For example:
x <- c(3, 1, 1, 3, 4, 5, 3, 1, 5, 4, 2, 4, 2, 3, 2, 3, 2, 4, 5, 4)
table(rle(x)$lengths)
# 1 2
# 18 1
# or
v <- c(1, 1, 2, 4, 5, 5, 4, 5, 5, 3, 3, 2, 2, 2, 1, 4, 4, 4, 2, 1)
table(rle(v)$lengths)
# 1 2 3
# 6 4 2
In the first example there are 18 singles and one double (the two 1s near the beginning), for a total of 1*18 + 2*1 = 20 values.
In the second example there are 6 singles, 4 doubles, and 2 triples, giving a total of 1*6 + 2*4 + 3*2 = 20 values.
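To see why this works, it can help to look at what rle() returns for a short vector. The sketch below uses a made-up vector v_small (not from the question). It also shows one way to keep zero counts for run lengths that never occur, by converting the lengths to a factor with levels 1:4, the run lengths asked about in the question.

```r
# v_small is a hypothetical toy vector used only for illustration
v_small <- c(1, 1, 2, 4, 5, 5)

r <- rle(v_small)
r$lengths  # length of each run: 2 1 1 2
r$values   # value repeated in each run: 1 2 4 5

# Tabulate run lengths, keeping lengths 1 to 4 even when their count is zero
run_counts <- table(factor(r$lengths, levels = 1:4))
run_counts
```

Here run_counts has two runs of length 1, two of length 2, and zero of lengths 3 and 4.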
But if computational speed is of more importance than concise code, we can do better, as both table() and rle() do computations internally that we don't really need. Instead we can assemble a function that only does the bare minimum.
runlengths <- function(x) {
n <- length(x)
r <- which(x[-1] != x[-n])
rl <- diff(c(0, r, n))
rlu <- sort(unique(rl))
rlt <- tabulate(match(rl, rlu))
names(rlt) <- rlu
as.table(rlt)
}
runlengths(x)
# 1 2
# 18 1
runlengths(v)
# 1 2 3
# 6 4 2
Bonus:
You already know that you can compare individual elements of a vector like this
x[1] == x[2]
x[2] == x[3]
but did you know that you can compare whole vectors with each other, and that you can select multiple elements from a vector by specifying multiple indices? Together, that means that instead of doing
x[1] == x[2]
x[2] == x[3]
.
.
.
x[18] == x[19]
x[19] == x[20]
do
x[1:19] == x[2:20]
# Or even
x[-length(x)] == x[-1]
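For completeness, here is a minimal loop-based sketch of what the question was attempting. Instead of reassigning x[2] to x[3] by hand, an index i walks along the vector and each element is compared with the one before it. The function name count_runs_loop is made up for this illustration, and it assumes x has at least one element.

```r
count_runs_loop <- function(x) {
  counts <- integer(0)  # finished run lengths, in order of appearance
  current <- 1L         # length of the run we are currently inside
  for (i in seq_along(x)[-1]) {   # i = 2, 3, ..., length(x)
    if (x[i] == x[i - 1]) {
      current <- current + 1L      # same value: the run continues
    } else {
      counts <- c(counts, current) # value changed: store the finished run
      current <- 1L                # and start a new run
    }
  }
  counts <- c(counts, current)     # store the final run
  table(counts)
}

x <- c(3, 1, 1, 3, 4, 5, 3, 1, 5, 4, 2, 4, 2, 3, 2, 3, 2, 4, 5, 4)
res <- count_runs_loop(x)
res  # 18 runs of length 1 and 1 run of length 2, as with table(rle(x)$lengths)
```

Growing counts with c() inside a loop is slow for long vectors, so this is meant for understanding rather than speed.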

Related

R function to find count of elements before sum is above a threshold

I'm trying to recreate a function from Sum of first n elements of a vector, but where this solution took an argument to sum first n elements of vector, I'd like an argument which is the threshold (including a default) the elements sum up to (or over).
After trying different for and/or while possibilities and searching StackOverflow, I've ended up here: unclear how to implement the threshold and set the n_elements.
I have this logic which returns 0 for the given vector. It doesn't seem the n_elements = x[i] + 1 part is correct.
theFunc <- function(x, threshold = 5){
n_elements = 0
while (sum(head(x)) < threshold){
n_elements = x[i] + 1
}
return(n_elements)
}
Call:
x <- c(0, 0, 1, 1, 2, 3, 6, 7)
theFunc(x)
[1] 0
If the input is as above and the threshold is 5, then the function should return 6 (number of elements) because 0+0+1+1+2+3 = 7 and is above the threshold.
A simple function without a loop is as follows:
theFunc <- function(x, threshold = 5){
sum(cumsum(x) < threshold) + 1
}
x <- c(0, 0, 1, 1, 2, 3, 6, 7)
theFunc(x)
[1] 6
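One caveat with the one-liner: if the elements never add up to the threshold, it returns length(x) + 1. A hedged variant, assuming you would rather get NA in that case, looks for the first position where the running total reaches the threshold. The name n_to_threshold is hypothetical.

```r
n_to_threshold <- function(x, threshold = 5) {
  # which() lists the positions where the running total has reached the
  # threshold; [1] picks the first one, or gives NA if there is none
  which(cumsum(x) >= threshold)[1]
}

x <- c(0, 0, 1, 1, 2, 3, 6, 7)
n_to_threshold(x)       # 6
n_to_threshold(x, 100)  # NA, because the total (20) never reaches 100
```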

finding values in a range in r and sum the number of values

I have a question. I have the following data:
c(1, 2, 4, 5, 1, 8, 9)
I set l = 2 and u = 6
I want to find all the values in the range (3,7)
How can I do this?
In base R, we can use comparison operators to create a logical vector and use that to subset the original vector:
x[x > 2 & x <= 6]
#[1] 3 5 6
Or, using a for loop: initialize an empty vector, loop through the elements of 'x', and if a value is greater than 2 and at most 6, append it to that vector.
v1 <- c()
for(i in x) {
  # && is the scalar "and", which is what if() expects
  if(i > 2 && i <= 6) v1 <- c(v1, i)
}
v1
#[1] 3 5 6
data
x <- c(3, 5, 6, 8, 1, 2, 1)
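Since the title also asks for the number of values in the range, here is a small sketch: summing a logical vector counts its TRUE entries. The names l and u follow the question. Whether each bound is inclusive is an assumption here, chosen to match the answer above (greater than l, at most u).

```r
x <- c(3, 5, 6, 8, 1, 2, 1)
l <- 2
u <- 6

in_range <- x > l & x <= u  # TRUE where l < x <= u
x[in_range]                 # the values themselves: 3 5 6
sum(in_range)               # how many there are: 3
```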

How to make a generalized function update the value of a vector?

I have been trying to write a generalized function that multiplies each value in each row of a matrix by the corresponding value of a vector in terms of their position (i.e. matrix[1,1]*vector[1], matrix[1,2]*vector[2], etc.) and then sums them together. It is important to note that the lengths of the vector and the rows of the matrix are always the same, which means that in each row the first value of the vector is multiplied with the first value of the matrix row. Also important to note, I think, is that the rows and columns of the matrix are of equal length. The end sum for each row should be assigned to a different existing vector, the length of which is equal to the number of rows.
This is the matrix and vector:
a <- c(4, -9, 2, -1)
b <- c(-1, 3, -8, 2)
c <- c(5, 2, 6, 3)
d <- c(7, 9, -2, 5)
matrix <- cbind(a,b,c,d)
      a  b c  d
[1,]  4 -1 5  7
[2,] -9  3 2  9
[3,]  2 -8 6 -2
[4,] -1  2 3  5
vector <- c(1, 2, 3, 4)
These are the basic functions that I have to generalize for the rows and columns of a matrix and a vector of length "n":
f.1 <- function() {
(matrix[1,1]*vector[1]
+ matrix[1,2]*vector[2]
+ matrix[1,3]*vector[3]
+ matrix[1,4]*vector[4])
}
f.2 <- function() {
(matrix[2,1]*vector[1]
+ matrix[2,2]*vector[2]
+ matrix[2,3]*vector[3]
+ matrix[2,4]*vector[4])
}
and so on...
This is the function I have written:
ncells = 4
f = function(x) {
i = x
result = 0
for(j in 1:ncells) {
result = result + vector[j] * matrix[i][j]
}
return(result)
}
Calling the function:
result.cell = function() {
for(i in 1:ncells) {
new.vector[i] = f(i)
}
}
The vector to which this result should be assigned (i.e. new.vector) has been defined beforehand:
new.vector <- c()
I expected that the end sum for each row would be assigned to the vector in a corresponding manner (e.g. if the sums for all rows were 1, 2, 3, 4, etc., then new.vector would hold 1, 2, 3, 4, etc.), but it did not happen.
(Edit) When I do this with the basic functions, the assignment works:
new.vector[1] <- f.1()
new.vector[2] <- f.2()
This does not however work with the generalized function:
new.vector[1:ncells] <- result cell[1:ncells]
(End Edit)
I have also tried setting the length of new.vector to be equal to ncells, but I don't think it did any good:
length(new.vector) = ncells
My question is how can I make the new vector take the resulting sums of the multiplied elements of a row of a matrix by the corresponding value of a vector.
I hope I have been clear and thanks in advance!
There is no need for a loop here. We can multiply the matrix element-wise by a matrix whose rows each repeat the vector, and then sum the rows with rowSums. Note that m and v are used as names for the matrix and vector to avoid conflict with the matrix() and vector() functions.
nr <- nrow(m)
rowSums(m * matrix(rep(v, nr), nr, byrow = TRUE))
# [1] 45 39 -4 32
However, if the vector v is always going to be the column number, we can simply use the col function as our multiplier.
rowSums(m * col(m))
# [1] 45 39 -4 32
Data:
a <- c(4, -9, 2, -1)
b <- c(-1, 3, -8, 2)
c <- c(5, 2, 6, 3)
d <- c(7, 9, -2, 5)
m <- cbind(a, b, c, d)
v <- 1:4
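Because each row sum here is the dot product of a matrix row with v, the same result can also be obtained with the matrix product operator %*%. The sketch below rebuilds the m and v from the data above; drop() only turns the one-column result into a plain vector. It also shows a loop-style version with vapply that returns the new vector instead of trying to modify one defined outside the function. Assignments inside a function are local, which is likely one reason new.vector was never updated in the question.

```r
m <- cbind(a = c(4, -9, 2, -1),
           b = c(-1, 3, -8, 2),
           c = c(5, 2, 6, 3),
           d = c(7, 9, -2, 5))
v <- 1:4

# Matrix product: element i of the result is sum(m[i, ] * v)
by_product <- drop(m %*% v)
by_product
# [1] 45 39 -4 32

# Loop-style equivalent that returns its result rather than assigning globally
row_sums <- vapply(seq_len(nrow(m)), function(i) sum(m[i, ] * v), numeric(1))
row_sums
# [1] 45 39 -4 32
```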

Extract first continuous sequence in vector

I have a vector:
as <- c(1,2,3,4,5,9)
I need to extract the first continuous sequence in the vector, starting at index 1, such that the output is the following:
1 2 3 4 5
Is there a smart function for doing this, or do I have to do something not so elegant like this:
a <- c(1,2,3,4,5,9)
is_continunous <- c()
for (i in 1:length(a)) {
if(a[i+1] - a[i] == 1) {
is_continunous <- c(is_continunous, i)
} else {
break
}
}
continunous_numbers <- c()
if(is_continunous[1] == 1) {
is_continunous <- c(is_continunous, length(is_continunous)+1)
continunous_numbers <- a[is_continunous]
}
It does the trick, but I would expect that there is a function that can already do this.
It isn't clear whether you need the indices of the continuous sequence only if it starts at index one, or the first continuous sequence wherever it begins.
In both cases, you need to start by checking the difference between adjacent elements:
d_as <- diff(as)
If you need the first sequence only if it starts at index 1:
if(d_as[1]==1) 1:(rle(d_as)$lengths[1]+1) else NULL
# [1] 1 2 3 4 5
rle gives the lengths and values of each run of consecutive equal values.
If you need the first continuous sequence, whatever the starting index is:
rle_d_as <- rle(d_as)
which(d_as==1)[1]+(0:(rle_d_as$lengths[rle_d_as$values==1][1]))
Examples (for the second option):
as <- c(1,2,3,4,5,9)
d_as <- diff(as)
rle_d_as <- rle(d_as)
which(d_as==1)[1]+(0:(rle_d_as$lengths[rle_d_as$values==1][1]))
#[1] 1 2 3 4 5
as <- c(4,3,1,2,3,4,5,9)
d_as <- diff(as)
rle_d_as <- rle(d_as)
which(d_as==1)[1]+(0:(rle_d_as$lengths[rle_d_as$values==1][1]))
# [1] 3 4 5 6 7
as <- c(1, 2, 3, 6, 7, 8)
d_as <- diff(as)
rle_d_as <- rle(d_as)
which(d_as==1)[1]+(0:(rle_d_as$lengths[rle_d_as$values==1][1]))
# [1] 1 2 3
A simple way to catch the sequence would be to find the diff of your vector and grab all elements with diff == 1 plus the very next element, i.e.
d1<- which(diff(as) == 1)
as[c(d1, d1[length(d1)]+1)]
NOTE
This will only work if you have only one sequence in your vector. However, if we want to make it more general, then I'd suggest creating a function like so:
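get_seq <- function(vec){
  # indices where the next element is exactly one larger
  # (uses the argument vec, not the global variable as)
  d1 <- which(diff(vec) == 1)
  if(all(diff(d1) == 1)){
    # a single run: return its indices plus the index of its final element
    return(c(d1, d1[length(d1)]+1))
  }else{
    # several runs: keep only the first group of adjacent indices
    d2 <- split(d1, cumsum(c(1, diff(d1) != 1)))[[1]]
    return(c(d2, d2[length(d2)]+1))
  }
}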
#testing it
as <- c(3, 5, 1, 2, 3, 4, 9, 7, 5, 4, 5, 6, 7, 8)
get_seq(as)
#[1] 3 4 5 6
as <- c(8, 9, 10, 11, 1, 2, 3, 4, 7, 8, 9, 10)
get_seq(as)
#[1] 1 2 3 4
as <- c(1, 2, 3, 4, 5, 6, 11)
get_seq(as)
#[1] 1 2 3 4 5 6
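If the goal is the values rather than the indices, and the run must start at index 1 as in the question, one compact alternative is to label the runs with cumsum and keep the first label. This is a sketch, not part of the answers above. The name first_run is made up, and note that it returns just the first element when the second element does not continue the sequence.

```r
first_run <- function(vec) {
  # TRUE marks the start of a new run: the first element, and any element
  # that is not exactly one more than its predecessor
  starts <- c(TRUE, diff(vec) != 1)
  # cumsum turns the start markers into run labels 1, 2, 3, ...
  vec[cumsum(starts) == 1]
}

first_run(c(1, 2, 3, 4, 5, 9))  # 1 2 3 4 5
first_run(c(1, 2, 3, 6, 7, 8))  # 1 2 3
```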

Finding the closest index to a value in R

I have a question about finding index values in a vector.
Let's say I have a vector as follows:
vector <- c(1,2,4,6,8,10)
And, let's say I have the value '5'. I would like to find the maximum index in "vector" such that it is less than or equal to the value 5. In the case of the example above, this index would be 3 (since 4 is less than or equal to 5). Similarly, if instead I had a vector such as:
vector <- c(1,2,4,5,6,8,10)
Then if I were to find a value less than or equal to 5, this index would now be 4 instead of 3.
However, I also want to find the first and last time this index occurs. For example, if I had a vector such as:
vector <- c(1,1,2,2,4,5,5,5,5,6,8,10)
Then the first time this index occurs would be 6 and the last time this index occurs would be 9.
Is there a short, one-line method which would allow me to perform this task? Up until now I have been using the function max(which(....)), however I find that this method is extremely inefficient for large datasets since it will literally list hundreds/thousands of values, so I would like to find a more efficient method if possible which can fit in one line.
Thanks in advance.
You can use the following code:
min(max(which(vector <= 5)), min(which(vector == 5)))
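First, it uses which() to find all indices where vector is less than or equal to 5, then takes the largest one.
Second, it finds all indices where vector is equal to 5 and takes the smallest.
Third, it returns the smaller of these two indices. If 5 does not occur in the vector, min() of the empty result is Inf (with a warning), so the index from the first step is returned.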
Thanks to all those who replied. I actually found an extremely short, one-line method to do this by downloading the BBmisc package. It has functions called which.last and which.first, and they perform the actions I need. Thanks again for taking the time to reply, I appreciate it.
You can use:
my_ind <- function(vec, num){
ind <- which.max(vec == num) # Check for equality first
if(ind == 1L && vec[1L] != num){
ind <- which.min(vec < num) - 1L
}
ind
}
my_ind(c(1,2,4,6,8,10), 5L) # 3
my_ind(c(1,2,4,5,6,8,10), 5L) # 4
my_ind(c(1,1,2,2,4,5,5,5,5,6,8,10), 5L) # 6
my_ind(c(5,8,10), 5L) # 1
my_ind(c(6,8,10), 5L) # 0 - returns 0 if all(vec > 5L)
I don't see a need for packages here. It seems like the construct which(x == max(x[x <= 5])) would work for you.
x <- c(1, 2, 4, 6, 8, 10)
which(x == max(x[x <= 5]))
# [1] 3
x <- c(1, 2, 4, 5, 6, 8, 10)
which(x == max(x[x <= 5]))
# [1] 4
x <- c(1, 1, 2, 2, 4, 5, 5, 5, 5, 6, 8, 10)
which(x == max(x[x <= 5]))
# [1] 6 7 8 9
And to find the min/max index among multiple indices, use head/tail.
head(which(x == max(x[x <= 5])), 1)
# [1] 6
tail(which(x == max(x[x <= 5])), 1)
# [1] 9
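If the vector is sorted in non-decreasing order, as in all the examples in the question, base R's findInterval() may be a faster option for large inputs, since it avoids building a full logical vector for the comparison. This is a sketch under that sortedness assumption; the function name closest_idx is hypothetical.

```r
closest_idx <- function(vec, num) {
  # Assumes vec is sorted in non-decreasing order
  last <- findInterval(num, vec)  # largest index with vec[index] <= num, 0 if none
  if (last == 0L) return(c(first = NA_integer_, last = NA_integer_))
  first <- match(vec[last], vec)  # first position holding that same value
  c(first = first, last = last)
}

closest_idx(c(1, 2, 4, 6, 8, 10), 5)                    # first 3, last 3
closest_idx(c(1, 1, 2, 2, 4, 5, 5, 5, 5, 6, 8, 10), 5)  # first 6, last 9
```

When every element is greater than num, this sketch returns NA for both positions rather than 0.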

Resources