counting the larger value - r

I'm completely new to R and am trying to count how many numbers in a list are larger than the one right before them.
This is what I have so far:
count <- 0
number <- function(value) {
for (i in 1:length(value))
{ if(value[i+1] > value[i])
{count <- count + 1}
}
}
x <- c(1,2,1,1,3,5)
number(x)
The output should be 3 based on the list.
Any help or advice would be greatly appreciated!

A base R alternative would be diff
sum(diff(x) > 0)
#[1] 3
Or we can drop the first value from one copy of the vector and the last value from another, then compare the two element-wise.
sum(x[-1] > x[-length(x)])
#[1] 3
where
x[-1]
#[1] 2 1 1 3 5
x[-length(x)]
#[1] 1 2 1 1 3

You can lag your vector and count how many times your initial vector is greater than your lagged vector
library(dplyr)
sum(x>lag(x), na.rm = TRUE)
In detail, lag(x) returns:
> lag(x)
[1] NA 1 2 1 1 3
so x > lag(x) gives
> x>lag(x)
[1] NA TRUE FALSE FALSE TRUE TRUE
With na.rm = TRUE the leading NA is dropped, so the sum of the above is 3.
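For comparison, here is a minimal sketch of how the loop from the question could be repaired. The function name count_increases is made up for illustration. The original version stops one element too late (value[i+1] is NA on the last iteration), updates a local copy of count, and never returns it.

```r
# Hypothetical helper: count elements that are larger than their predecessor.
count_increases <- function(value) {
  count <- 0
  # Stop at length - 1 so that value[i + 1] always exists;
  # max(..., 0) keeps seq_len() valid for empty input.
  for (i in seq_len(max(length(value) - 1, 0))) {
    if (value[i + 1] > value[i]) count <- count + 1
  }
  count  # return the result instead of relying on a global variable
}

x <- c(1, 2, 1, 1, 3, 5)
count_increases(x)
#[1] 3
```

The vectorised answers above remain the more idiomatic choice; the loop is only shown to make the original bugs visible.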

Related

Writing a function in R

I am doing an exercise to practice writing functions.
I'm trying to figure out the general code before writing the function that reproduces the output from the table function. So far, I have the following:
set.seed(111)
vec <- as.integer(runif(10, 5, 20))
x <- sort(unique(vec))
for (i in x) {
c <- length(x[i] == vec[i])
print(c)
}
But this gives me the following output:
[1] 1
[1] 1
[1] 1
[1] 1
[1] 1
[1] 1
[1] 1
[1] 1
[1] 1
I don't think I'm subsetting correctly in my loop. I've been watching videos, but I'm not quite sure where I'm going wrong. Would appreciate any insight!
Thanks!
We can sum the logical vector and concatenate the result to count
count <- c()
for(number in x) count <- c(count, sum(vec == number))
count
#[1] 3 1 4 1 5 4 3 2 7
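The OP's for loop iterates over the values of 'x' rather than over its indices, so x[i] and vec[i] do not pick out the intended elements. In addition, length() of a single comparison is always 1, whatever the comparison returns.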
If we do
for(number in x) count <- c(count, length(vec[vec == number]))
it should work as well
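To illustrate the difference between length() and sum() on a comparison, here is a small sketch using a made-up vector (demo_vec is not part of the question):

```r
demo_vec <- c(5, 7, 5, 9)   # illustrative data only

# length() reports how many comparisons were made, not how many are TRUE
n_compared <- length(demo_vec == 5)     # 4

# sum() counts the TRUE values, which is what a frequency count needs
n_matches <- sum(demo_vec == 5)         # 2

# comparing a single element always gives a result of length 1
n_single <- length(demo_vec[1] == 5)    # 1
```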
You can try sapply + setNames to achieve the same result like table, i.e.,
count <- sapply(x, function(k) setNames(sum(k==vec),k))
or
count <- sapply(x, function(k) setNames(length(na.omit(match(vec,k))),k))
such that
> count
1 2 3 4 5 6 7 8 9
3 1 4 1 5 4 3 2 7
Here is a solution without using unique and with one pass through the vector (if only R was fast with for loops!):
count = list()
for (i in vec) {
val = as.character(i)
if (is.null(count[[val]]))
count[[val]] = 1
else
count[[val]] = count[[val]] + 1
}
unlist(count)
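Since the exercise is to reproduce table(), one possible way to wrap the counting step in a function and check it against table() is sketched below. The name count_values is an assumption made for this example.

```r
# Hypothetical helper: frequency count of each distinct value, like table()
count_values <- function(v) {
  vals <- sort(unique(v))
  # one count per distinct value; sum() of a logical vector is an integer
  counts <- vapply(vals, function(k) sum(v == k), integer(1))
  setNames(counts, vals)
}

set.seed(111)
vec <- as.integer(runif(10, 5, 20))
res <- count_values(vec)

# Compare counts and labels with the built-in table()
same_counts <- all(res == as.vector(table(vec)))
same_names  <- identical(names(res), names(table(vec)))
```

Both same_counts and same_names should be TRUE if the function behaves like table() for this input.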

Find all subsequences with specific length in sequence of numbers in R

I want to find all subsequences within a sequence with a (minimum) length of n. Let's assume I have this sequence
sequence <- c(1,2,3,2,5,3,2,6,7,9)
and I want to find the increasing subsequences with a minimum length of 3. The output should be a data frame with the start and end position of each subsequence found.
df =data.frame(c(1,7),c(3,10))
colnames(df) <- c("start", "end")
Can somebody give a hint how to solve my problem?
Thanks in advance!
One way using only base R
n <- 3
do.call(rbind, sapply(split(1:length(sequence), cumsum(c(0, diff(sequence)) < 1)),
function(x) if (length(x) >= n) c(start = x[1], end = x[length(x)])))
# start end
#1 1 3
#4 7 10
This splits the indices of sequence into groups of consecutive increasing values; for each group whose length is greater than or equal to n, it returns the start and end index of that group.
To understand it, let's break this down step by step.
Using diff we can find the difference between consecutive elements
diff(sequence)
#[1] 1 1 -1 3 -2 -1 4 1 2
We check which of them are not increases (difference less than 1)
diff(sequence) < 1
#[1] FALSE FALSE TRUE FALSE TRUE TRUE FALSE FALSE FALSE
and, after prepending 0 so that the first element starts a group, take the cumulative sum to create group ids
cumsum(c(0, diff(sequence)) < 1)
#[1] 1 1 1 2 2 3 4 4 4 4
Based on these groups, we split the indices 1:length(sequence)
split(1:length(sequence), cumsum(c(0, diff(sequence)) < 1))
#$`1`
#[1] 1 2 3
#$`2`
#[1] 4 5
#$`3`
#[1] 6
#$`4`
#[1] 7 8 9 10
Using sapply we loop over this list and return the start and end index of the list if the length of the list is >= n (3 in this case)
sapply(split(1:length(sequence), cumsum(c(0, diff(sequence)) < 1)),
function(x) if (length(x) >= n) c(start = x[1], end = x[length(x)]))
#$`1`
#start end
# 1 3
#$`2`
#NULL
#$`3`
#NULL
#$`4`
#start end
# 7 10
Finally, rbind all of them together using do.call. NULL elements are automatically ignored.
do.call(rbind, sapply(split(1:length(sequence), cumsum(c(0, diff(sequence)) < 1)),
function(x) if (length(x) >= n) c(start = x[1], end = x[length(x)])))
# start end
#1 1 3
#4 7 10
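The steps above can also be collected into a small function that returns a data frame, as the question asked for. This is only a sketch: the name find_increasing_runs and its arguments are made up here, and it uses seq_along and vapply instead of 1:length and sapply so that empty results are handled cleanly.

```r
# Hypothetical wrapper around the split/cumsum approach
find_increasing_runs <- function(s, n = 3) {
  # a new group starts wherever the value does not increase
  groups <- cumsum(c(0, diff(s)) < 1)
  idx <- split(seq_along(s), groups)
  # keep only groups with at least n elements
  keep <- Filter(function(i) length(i) >= n, idx)
  data.frame(
    start = vapply(keep, function(i) i[1], integer(1), USE.NAMES = FALSE),
    end   = vapply(keep, function(i) i[length(i)], integer(1), USE.NAMES = FALSE)
  )
}

sequence <- c(1, 2, 3, 2, 5, 3, 2, 6, 7, 9)
res <- find_increasing_runs(sequence, n = 3)
res
#  start end
#1     1   3
#2     7  10
```

If no group is long enough, the function returns a data frame with zero rows rather than NULL.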
Here is another solution using base R. I tried to comment it well but it may still be hard to follow. It seems like you wanted direction / to learn, more than an outright answer so definitely follow up with questions if anything is unclear (or doesn't work for your actual application).
Also, for your data, I added a 12 on the end to make sure it was returning the correct position for repeated increases greater than n (3 in this case):
# Data (I added 12 on the end)
sequence <- c(1,2,3,2,5,3,2,6,7,9, 12)
# Create indices for whether or not the numbers in the sequence increased
indices <- c(1, diff(sequence) >= 1)
indices
[1] 1 1 1 0 1 0 0 1 1 1 1
Now that we have the indices, we need to get the start and end positions for repeats >= 3
# Finding increasing sequences of n length using rle
n <- 3
n <- n - 1
# Examples
rle(indices)$lengths
[1] 3 1 1 2 4
rle(indices)$values
[1] 1 0 1 0 1
# Finding repeated TRUE (1) in our indices vector
reps <- rle(indices)$lengths >= n & rle(indices)$values == 1
reps
[1] TRUE FALSE FALSE FALSE TRUE
# Creating a vector of positions for the end of a sequence
# Because our indices are true false, we can use cumsum along
# with rle to create the positions of the end of the sequences
rle_positions <- cumsum(rle(indices)$lengths)
rle_positions
[1] 3 4 5 7 11
# Creating start sequence vector and subsetting start / end using reps
start <- c(1, head(rle_positions, -1))[reps]
end <- rle_positions[reps]
data.frame(start, end)
start end
1 1 3
2 7 11
Or, concisely:
n <- 3
n <- n-1
indices <- c(1, diff(sequence) >= 1)
reps <- rle(indices)$lengths >= n & rle(indices)$values == 1
rle_positions <- cumsum(rle(indices)$lengths)
data.frame(start = c(1, head(rle_positions, -1))[reps],
end = rle_positions[reps])
start end
1 1 3
2 7 11
EDIT: @Ronak's update made me realize I should be using diff instead of sapply with an anonymous function for my first step. I updated the answer because it was not catching an increase at the end of the vector (e.g., sequence <- c(1,2,3,2,5,3,2,6,7,9,12, 11, 11, 20, 100)); I also needed to add one more line under n <- 3. This should work as intended now.
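One edge case may be worth checking with the rle approach: because the first entry of indices is an artificial 1, a vector that rises only once at the very start (for example c(1, 2, 1, 1)) produces an initial run of length 2, which passes the n - 1 threshold even though only two elements are increasing. The sketch below is a variant, not the answerer's code. It pads with 0 instead, so every run of 1s of length k corresponds to k + 1 increasing elements; the function name increasing_runs_rle is an assumption.

```r
# Hypothetical variant of the rle approach, padded with 0 instead of 1
increasing_runs_rle <- function(s, n = 3) {
  # 1 = this position is larger than the previous one; the first
  # element has no predecessor, so it is marked 0
  inc <- c(0, diff(s) >= 1)
  r <- rle(inc)
  ends <- cumsum(r$lengths)
  # an increasing run starts at the last position of the preceding run
  starts <- c(1, head(ends, -1))
  # k consecutive increases span k + 1 elements, hence n - 1
  keep <- r$values == 1 & r$lengths >= n - 1
  data.frame(start = starts[keep], end = ends[keep])
}

res_rle <- increasing_runs_rle(c(1, 2, 3, 2, 5, 3, 2, 6, 7, 9, 12))
res_rle
#  start end
#1     1   3
#2     7  11

# returns zero rows: the only increasing stretch has two elements
short_case <- increasing_runs_rle(c(1, 2, 1, 1))
```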

R find row identificators of matrix given an element value of a specific column [duplicate]

I want to get the indices of the non-zero elements in a matrix. For example
X <- matrix(c(1,0,3,4,0,5), byrow=TRUE, nrow=2);
should give me something like this
row col
1 1
1 3
2 1
2 3
Can anyone please tell me how to do that?
which(X!=0,arr.ind = T)
row col
[1,] 1 1
[2,] 2 1
[3,] 1 3
[4,] 2 3
If arr.ind == TRUE and X is an array, the result is a matrix whose rows each are the indices of the elements of X
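Note that which() walks the matrix column by column, so its rows come out in a different order from the one shown in the question. If the row-wise order matters, one option (a sketch, not the only way) is to sort the index matrix afterwards:

```r
X <- matrix(c(1, 0, 3, 4, 0, 5), byrow = TRUE, nrow = 2)

idx <- which(X != 0, arr.ind = TRUE)

# sort by row first, then by column, to match the order in the question;
# drop = FALSE keeps a matrix even if only one element is non-zero
idx_sorted <- idx[order(idx[, "row"], idx[, "col"]), , drop = FALSE]
idx_sorted
#     row col
#[1,]   1   1
#[2,]   1   3
#[3,]   2   1
#[4,]   2   3
```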
There's an error in your example code - True is not defined, use TRUE.
X <-matrix(c(1,0,3,4,0,5), byrow = TRUE, nrow = 2)
which should do it:
which(!X == 0)
X[ which(!X == 0)]
#[1] 1 4 3 5
to get the row/col indices:
row(X)[which(!X == 0)]
col(X)[which(!X == 0)]
to use those to index back into the matrix:
X[cbind(row(X)[which(!X == 0)], col(X)[which(!X == 0)])]
#[1] 1 4 3 5

Comparison of two vectors of unequal length

I was trying this out, trying to subset a data frame based on the values in one vector being in another vector:
x <- c( 1,2,3,1,2,3 )
df <- data.frame(x=x,y=x)
df[ df$x == c(1,2), ]
expecting to get this:
x y
1 1 1
2 2 2
4 1 1
5 2 2
but I didn't, I got this:
x y
1 1 1
2 2 2
Disregarding the fact that I really wanted this (occurred to me a minute later):
df[ df$x %in% c(1,2), ]
What is the logic behind the result of this:
x == c(1,2)
being this:
[1] TRUE TRUE FALSE FALSE FALSE FALSE
I don't really get it. I am aware that this is likely a duplicate, but I couldn't find one.
This is due to recycling: c(1,2) is repeated until it has the same length as 'x', i.e. we are comparing df$x with
rep(c(1,2),length.out= nrow(df))
#[1] 1 2 1 2 1 2
df$x ==rep(c(1,2),length.out= nrow(df))
#[1] TRUE TRUE FALSE FALSE FALSE FALSE
That is, each element of 'x' is compared with the element at the same position in the recycled c(1,2), instead of checking whether each element of 'x' is contained in c(1,2).
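The two behaviours can be put side by side in a short sketch (the variable names recycled, by_position and by_membership are chosen for illustration):

```r
x <- c(1, 2, 3, 1, 2, 3)

# == compares position by position against the recycled right-hand side
recycled <- rep(c(1, 2), length.out = length(x))
by_position <- x == c(1, 2)
identical(by_position, x == recycled)
#[1] TRUE

# %in% asks, for each element of x, whether it occurs anywhere in c(1, 2)
by_membership <- x %in% c(1, 2)
by_membership
#[1]  TRUE  TRUE FALSE  TRUE  TRUE FALSE
```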

How to combine two vectors with missing values?

I have two vectors of the same length and I'm trying to combine them such that they fill in each other's missing values. For example:
a=c("",1,2,"")
b=c(5,"","",6)
I'm looking for this output:
5 1 2 6
Thanks much
In this case, the normally numeric comparison via pmax also works:
as.numeric(pmax(a,b))
#[1] 5 1 2 6
This is because R will resort to alphanumeric sorting when max/min etc are applied to character data:
max(c("b","a"))
#[1] "b"
And:
as.numeric(paste(a,b))
#[1] 5 1 2 6
Or:
a[a==""] <- b[b!=""]
as.numeric(a)
# [1] 5 1 2 6
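Or, starting again from the original a and b, replace the empty strings with 0 and add the two vectors:
a[a == ""] <- 0
b[b == ""] <- 0
a <- as.numeric(a)
b <- as.numeric(b)
output <- a + b
Or, again on the original character vectors:
as.numeric(ifelse(a != "", a, b))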
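As a sketch of how the ifelse idea could be made a little more defensive, the hypothetical helper below (combine_fill is not from any answer) checks the lengths and returns NA where both vectors are empty, instead of relying on the gaps being exactly complementary:

```r
# Hypothetical helper: take a where it is non-empty, otherwise b
combine_fill <- function(a, b) {
  stopifnot(length(a) == length(b))
  out <- ifelse(a != "", a, b)
  # positions that are empty in both inputs become NA
  out[out == ""] <- NA
  as.numeric(out)
}

a <- c("", 1, 2, "")
b <- c(5, "", "", 6)
combine_fill(a, b)
#[1] 5 1 2 6
```

When both vectors have a value at the same position, this version keeps the one from a; that is a design choice of the sketch, not something the question specifies.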

Resources