Extract an increasing subsequence - r

I wish to extract an increasing subsequence of a vector, starting from the first element. For example, from this vector:
a = c(2, 5, 4, 0, 1, 6, 8, 7)
...I'd like to return:
res = c(2, 5, 6, 8).
I thought I could use a loop, but I want to avoid it. Another attempt with sort:
a = c(2, 5, 4, 0, 1, 6, 8, 7)
ind = sort(a, index.return = TRUE)$ix
mat = (t(matrix(ind))[rep(1, length(ind)), ] - matrix(ind)[ , rep(1, length(ind))])
mat = ((mat*upper.tri(mat)) > 0) %*% rep(1, length(ind)) == (c(length(ind):1) - 1)
a[ind][mat]
Basically I sort the input vector and check if the indices verify the condition "no indices at the right hand side are lower" which means that there were no greater values beforehand.
But it seems a bit complicated and I wonder if there are easier/quicker solutions, or a pre-built function in R.
Thanks

One possibility would be to find the cumulative maxima of the vector, and then extract unique elements:
unique(cummax(a))
# [1] 2 5 6 8
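To see why this works, it can help to look at the intermediate result. The sketch below shows the running maximum and also recovers the positions of the selected elements; the names running_max and keep are only for illustration.

```r
a <- c(2, 5, 4, 0, 1, 6, 8, 7)

# running maximum: each position holds the largest value seen so far
running_max <- cummax(a)
running_max
# [1] 2 5 5 5 5 6 8 8

# a new maximum appears exactly where the running maximum takes a new value
keep <- which(!duplicated(running_max))
a[keep]
# [1] 2 5 6 8
```

Because repeated maxima are dropped, the result is strictly increasing.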

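The other answer is better, but I wrote this iterative function, which also works. It repeatedly drops every element that is not larger than its predecessor until all consecutive differences are > 0:
increasing <- function(input_vec) {
  while (!all(diff(input_vec) > 0)) {
    # keep the first element and every element that exceeds its predecessor
    input_vec <- input_vec[c(1, diff(input_vec)) > 0]
  }
  input_vec
}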

Related

Failure in Calling a Function in R

I'm trying to create a function that compares two matrices. It will compare the element of both matrices at a certain position, and returns "greater than" "equal to" or "less than". Below is the code I have right now. However, when I tried calling the function, R does not return anything, not even an error message. I'm wondering why that is the case. Any suggestions would be helpful. Thanks.
fxn <- function(x, y) {
emptymatrix <- matrix( , nrow = dim(x)[1], ncol = dim(x)[2])
for (i in 1:dim(emptymatrix)[1]) {
for (j in 1:dim(emptymatrix)[2]) {
if (x[i, j] < y[i, j]) {
emptymatrix[i, j] <- "Less Than"
}else if (x[i, j] == y[i, j]) {
emptymatrix[i, j] <- "Equal to"
}else {
emptymatrix[i, j] <- "Greater than"
}
}
}
}
#trying to test the function
vecc1 <- c(1, 2, 3, 4, 5, 6, 7, 8, 9)
vecc2 <- c(4, 5, 2, 3, 1, 1, 8, 9, 10)
matrix1 <- matrix(vecc1, nrow = 3, byrow = T)
matrix2 <- matrix (vecc2, nrow=3, byrow = T)
fxn(matrix1, matrix2)
Hi, as SamR pointed out in his comment, your function doesn't return anything because it has no return statement or object at the end. He is also right about the loop: R is designed around vectors and matrices, so it can do a lot of the work for you under the hood. This is a good example of some of R's design principles. First, we don't need a for loop, because the comparisons (greater, equal, less) can be evaluated on all elements at once (vectorized). The result is a logical matrix of TRUE / FALSE values with the same dimensions as the inputs. We can use this matrix to index our new matrix at all TRUE positions. Then we only need to assign a single string ("Equal", "Less Than", or "Greater Than"), which gets recycled to fill every selected position.
vecc1 <- c(1, 2, 3, 4, 5, 6, 7, 8, 9)
vecc2 <- c(4, 5, 2, 3, 1, 1, 8, 9, 10)
matrix1 <- matrix(vecc1, nrow = 3, byrow = T)
matrix2 <- matrix (vecc2, nrow=3, byrow = T)
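# run this to see how the comparison works
matrix1 == matrix2
foo <- function(x, y) {
  # nrow() and ncol() each give a single number; dim(x) would give both at once
  m_new <- matrix(NA, nrow = nrow(x), ncol = ncol(x))
  m_new[x == y] <- "Equal"
  m_new[x < y] <- "Less Than"
  m_new[x > y] <- "Greater Than"
  m_new # the last value is returned implicitly
  # return(m_new) also works, but is marginally less efficient
}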
foo(matrix1,matrix2)
You missed returning emptymatrix from your function.
In R, the result of the last statement in a function is returned automatically. In the original function, the last statement was the for loop, whose value is NULL. It was returned, marked "invisible", so it didn't print.
The usual convention in R is to type the name of the object you want to return when it isn't already the last value produced. So just add one line at the end of your function, containing emptymatrix.
You can also call return(emptymatrix), but that's marginally less efficient.
And if you like returning things invisibly like for loops do, you can call invisible(emptymatrix) as the last line. Then it won't automatically print, but you can still assign it to another variable.
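As a minimal sketch of the points above, here is the original loop with the result as the last expression, plus a quick check of what a function ending in a for loop returns. The names fxn_fixed and loop_only are made up for this illustration; withVisible() is a base R function that reports whether a value would auto-print.

```r
fxn_fixed <- function(x, y) {
  emptymatrix <- matrix(NA_character_, nrow = nrow(x), ncol = ncol(x))
  for (i in seq_len(nrow(x))) {
    for (j in seq_len(ncol(x))) {
      if (x[i, j] < y[i, j]) {
        emptymatrix[i, j] <- "Less Than"
      } else if (x[i, j] == y[i, j]) {
        emptymatrix[i, j] <- "Equal to"
      } else {
        emptymatrix[i, j] <- "Greater than"
      }
    }
  }
  emptymatrix  # last expression, so this is the return value
}

matrix1 <- matrix(c(1, 2, 3, 4, 5, 6, 7, 8, 9), nrow = 3, byrow = TRUE)
matrix2 <- matrix(c(4, 5, 2, 3, 1, 1, 8, 9, 10), nrow = 3, byrow = TRUE)
result <- fxn_fixed(matrix1, matrix2)
print(result)

# a function whose last statement is a for loop returns NULL invisibly
loop_only <- function() for (i in 1:3) i
loop_value <- withVisible(loop_only())
loop_value$value    # NULL
loop_value$visible  # FALSE
```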

Processing a data_frame: Defining when the value of a column change one unit

I have the following data structure:
iid<-c(rep("I1",5),rep("I2",5),rep("I3",5),rep("I4",5))
days<-rep(c(0,2,5,7,14),4)
estatus<-c(4,4,4,3,3,
5,4,4,4,3,
4,4,4,4,4,
5,4,4,3,2)
data<-as.data.frame(cbind(iid,days,estatus))
I'm interested in obtaining several outcomes, all related to changes in the variable "estatus".
First, I want to know how many individuals (iid) have changed their status by 1 unit by day 5. I don't want to treat days as a factor: this is a simple example, but in the real dataset the days can differ between individuals, so I don't always have the same days.
The first outcome would look like this:
iid<-c("I1","I2","I3","I4")
res_5<-c(0,1,0,1)
results_1<-as.data.frame(cbind(iid,res_5))
I1 and I3 did not experience a change in their status of 1 unit.
The second outcome I'm interested in is the day on which each individual's status first changes by 1 unit. The outcome would look like:
iid<-c("I1","I2","I3","I4")
res_d<-c(7,2,NA,2)
results_2<-as.data.frame(cbind(iid,res_d))
I think that I got the first part of the problem, as I know how to aggregate by iid with tidyverse or dplyr. However, I don't know how to check if a certain row is 1, 2 or n units above or below the previous row.
You can use by() to apply a function to each id:
(i) find the index where days == 5 and check the difference from the first element
(ii) use diff() to compute the differences between consecutive elements of the vector, then look for the first difference of 1 or -1
iid <- c(rep("I1", 5), rep("I2", 5), rep("I3", 5), rep("I4", 5))
days <- rep(c(0, 2, 5, 7, 14), 4)
estatus <- c(
4, 4, 4, 3, 3,
5, 4, 4, 4, 3,
4, 4, 4, 4, 4,
5, 4, 4, 3, 2
)
data <- data.frame(iid = iid, days = days, estatus = estatus)
my_func1 <- function(x) {
  ind5 <- which(x$days == 5)
  d <- x$estatus[ind5] - x$estatus[1]
  return((d == 1) | (d == -1))
}
by(data, data$iid, my_func1)
my_func2 <- function(x) {
  d <- diff(x$estatus)
  hasChangeOf1 <- (d == 1) | (d == -1)
  return(x$days[which(hasChangeOf1)[1] + 1])
}
by(data, data$iid, my_func2)
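Since the question mentions that the observation days can differ between individuals, an exact match on days == 5 may find nothing. A possible variation, sketched here under the assumptions that rows are sorted by days within each id and that every id has at least one observation on or before the cutoff, compares the last status recorded by the cutoff with the first one. The helper name changed_by and its cutoff argument are hypothetical.

```r
data <- data.frame(
  iid = rep(c("I1", "I2", "I3", "I4"), each = 5),
  days = rep(c(0, 2, 5, 7, 14), 4),
  estatus = c(4, 4, 4, 3, 3,
              5, 4, 4, 4, 3,
              4, 4, 4, 4, 4,
              5, 4, 4, 3, 2)
)

# TRUE if the status at the last visit on or before `cutoff` differs
# from the first recorded status by exactly one unit
changed_by <- function(x, cutoff = 5) {
  early <- x$estatus[x$days <= cutoff]
  abs(early[length(early)] - early[1]) == 1
}

res_5 <- sapply(split(data, data$iid), changed_by)
as.integer(res_5)
# [1] 0 1 0 1
```

On the example data this agrees with my_func1; it only behaves differently when an individual has no row at exactly day 5.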

I have a joint PMF in matrix form. How to use R to find $P(N > G)$?

I've just started learning R, and I'm attempting to do some calculations involving a joint PMF in R.
The following matrix holds the joint PMF $p_{NG}(n,g)$:
(pNG <- matrix(c(16, 0, 0, 0, 0, 8, 8, 0, 0, 0, 4, 8, 4,
0, 0, 2, 6, 6, 2, 0, 1, 4, 6, 4, 1)/80,
ncol = 5, nrow = 5, byrow = TRUE))
colnames(pNG) <- rownames(pNG) <- 0:4
The marginal PMFs of $N$ and $G$ are found as follows:
(pN <- rowSums(pNG))
(pG <- colSums(pNG))
The expected value and variance of $N$ are found as follows:
(EN <- sum(0:4 * pN))
(VarN <- sum((0:4 - EN)^2 * pN))
The conditional PMFs of $N$ given $G = 0, 1, 2, 3, 4$ are found as follows:
(pNgG <- sweep(pNG, 2, pG, "/"))
The expected value of $N$ given $G$ is found as follows:
(ENgG <- colSums(0:4 * pNgG))
The variance of $N$ given $G$ is found as follows:
(VarNgG <- colSums(outer(0:4, ENgG, "-")^2 * pNgG))
With all this said and done, I want to find $P(N > G)$, but I'm unsure how to do this. I suspect there is a pattern involving the parts of the matrix below or above the diagonal, since this is where $i > j$ or $j > i$; on the diagonal, we have $i = j$.
So you need to add up all the cells of the matrix where the row number is greater than the column number. This is the "lower triangular" sub-matrix, which you can access using R's lower.tri() function:
sum(pNG[lower.tri(pNG)])
You can use upper.tri() for the opposite. (And diag() if you need the diagonal, where the row number equals the column number.)
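The triangle shortcut relies on the rows and columns both running over the same values in the same order. As a sketch of a slightly more general variant, the comparison can be built explicitly from the values of $N$ and $G$ with outer(); the names n_vals, g_vals, mask and p_N_gt_G are introduced here only for illustration.

```r
pNG <- matrix(c(16, 0, 0, 0, 0, 8, 8, 0, 0, 0, 4, 8, 4,
                0, 0, 2, 6, 6, 2, 0, 1, 4, 6, 4, 1) / 80,
              ncol = 5, nrow = 5, byrow = TRUE)

n_vals <- 0:4  # values of N (rows)
g_vals <- 0:4  # values of G (columns)

# mask[i, j] is TRUE where the i-th value of N exceeds the j-th value of G
mask <- outer(n_vals, g_vals, ">")
p_N_gt_G <- sum(pNG[mask])
p_N_gt_G
# [1] 0.6125
```

For this matrix the mask coincides with lower.tri(pNG), so both approaches give 49/80 = 0.6125.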

How to create chain from pairs in R

edit: added current solution
I am dabbling with the Travelling Salesman Problem and am using a solver to calculate the optimal tour. The output of my linear solver gives me a table with the arcs in a route; however, to plot the tour I require a vector with all the locations chained in the right order. Is there an elegant way to chain these arcs into a single tour?
One solution would be a series of (nested) joins/matches, however that is not an elegant solution in my opinion.
# output of solver (where i = 'from' and j = 'to')
solution = data.frame(i = c(6, 4, 10, 7, 1, 9, 3, 2, 8, 5),
j = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10))
# transformation
??
# required output
tour = c(6, 1, 5, 10, 3, 7, 4, 2, 8, 9)
So the output I am looking for is a single chain of connected arcs (from i to j) in the tour.
My current solution uses for loops and match and looks as follows:
# number of cities to visit (one row per arc)
nCities = nrow(solution)
# empty matrix
tour = matrix(0, nCities, 2)
# first location to visit picked manually
tour[1, ] = unlist(solution[1, ])
# for loop to find index of next arc in tour
for (k in 2:nCities) {
  ind = match(tour[k - 1, 2], solution[, 1])
  tour[k, ] = unlist(solution[ind, ])
}
# output 'tour' holds the rows of the solution, reordered along the route.
# I then take only the first column, which is the tour
tour = tour[, 1]
However, it looks clunky, and as I try to avoid for loops as much as possible, I am not too happy with it. Also, my suspicion is that there are more elegant solutions out there, preferably using base R functions.
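One possible base R approach, sketched here under the assumptions that the cities are labelled 1 to n and that the solver output forms a single closed tour, is to store the successor of each city in a lookup vector and then follow it with Reduce(). The names nxt and start are introduced only for this illustration.

```r
solution <- data.frame(i = c(6, 4, 10, 7, 1, 9, 3, 2, 8, 5),
                       j = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10))

# successor lookup: nxt[city] is the city visited right after `city`
nxt <- numeric(nrow(solution))
nxt[solution$i] <- solution$j

# start at the 'from' city of the first arc and follow the successors
start <- solution$i[1]
tour <- Reduce(function(city, step) nxt[city],
               seq_len(nrow(solution) - 1),
               accumulate = TRUE, init = start)
tour
# [1]  6  1  5 10  3  7  4  2  8  9
```

Reduce() still iterates internally, so this is mainly a matter of style; the lookup vector replaces the repeated match() calls.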

Sampling until a condition is met on a list

I'm looking to take the following vector:
v1 = c(2, 5, 7, 9, 1)
I want to run a loop of iterative sampling, placing the sampled values into a new vector v2, and then break out of this process when the sum of these values is greater than 12.
This is what I have so far:
v2 = c()
while (sum(v2) > 12) {
sample(v1, 1, replace = FALSE)
if(sum(v2) > 12))
break
}
Not sure if I'm on the right track. Appreciate the help.
I think your syntax has a problem and the use of break makes more sense with a repeat loop:
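v1 = c(2, 5, 7, 9, 1)
v2 <- c()
repeat {
  # draw one value that has not been sampled yet and append it
  v2 <- c(v2, sample(v1[!v1 %in% v2], 1))
  if (sum(v2) > 12)
    break
}
print(v2)
# [1] 5 7 9   (one possible result; the sampling is random)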
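As an alternative sketch without an explicit loop: drawing values one at a time without replacement is the same as taking a prefix of a random permutation, so one can shuffle v1 once and cut it where the running sum first exceeds the threshold. This assumes sum(v1) is larger than the threshold; the names threshold, shuffled and n_needed are made up for this example.

```r
v1 <- c(2, 5, 7, 9, 1)
threshold <- 12

# random order, no repeats
shuffled <- sample(v1)
# first position where the running sum exceeds the threshold
n_needed <- which(cumsum(shuffled) > threshold)[1]
v2 <- shuffled[seq_len(n_needed)]
print(v2)
```

The output differs from run to run; call set.seed() beforehand if a reproducible result is needed.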
