I have a random 10x10 DF (in reality it's a few million rows):
df <- replicate(10, sample(0:5, 10, rep=T))
I need to add a column at the end of my df that holds the maximum length of a run of consecutive values equal to or above a set number, e.g. 3 or more.
Therefore, a single row that contained the values 2,4,3,3,4,5,1,0,5,1 would return 5, as the values 4,3,3,4,5 are all 3 or more and are consecutive.
A 5 does occur again later in the row, but that run has length 1, which is shorter than the run of 5 consecutive values of 3 or more earlier in the row.
Any help appreciated.
# condition is that x should be larger or equal to 3
condition <- function(x) x >= 3
# example row
row = c(2,4,3,3,4,5,1,0,5,1)
# we can use condition on row:
condition(row)
# and we can employ rle on that:
rle(condition(row))
# we need to filter those rle results for TRUE:
r <- rle(condition(row))
r$lengths[r$values]
# The answer is the max of the latter
max(r$lengths[r$values])
Or, for your dataframe example:
# condition is that x should be larger or equal to 3
condition <- \(x) x >= 3
number <- function(row, condition) {
  r <- row |>
    condition() |>
    rle()
  # longest TRUE run; 0 when no value in the row meets the condition
  max(r$lengths[r$values], 0)
}
df <- replicate(10, sample(0:5, 10, rep=T))
apply(df, 1, number, condition)
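The question asks for the result as a new column at the end of the data. A minimal sketch of that last step, assuming the data is a data frame of numeric columns; `longest_run` and `max_run` are illustrative names, not part of the answer above:

```r
# Hypothetical helper: longest run of values >= lim in one row (0 if none).
longest_run <- function(x, lim = 3) {
  r <- rle(x >= lim)
  max(r$lengths[r$values], 0L)
}

set.seed(1)
df <- as.data.frame(replicate(10, sample(0:5, 10, replace = TRUE)))

# Compute on the original columns first, then attach the result,
# so the new column is never fed back into the calculation.
df$max_run <- apply(as.matrix(df), 1, longest_run)
head(df)
```

Computing before assigning matters here: calling `apply()` again after `max_run` exists would treat that column as data.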
Use rle here for run-length encoding.
vec <- c(2,4,3,3,4,5,1,0,5,1)
r <- rle(vec >= 3)
r
# Run Length Encoding
# lengths: int [1:5] 1 5 2 1 1
# values : logi [1:5] FALSE TRUE FALSE TRUE FALSE
ind <- which.max(r$lengths * r$values)  # index of the longest TRUE run
ind
# [1] 2
r$lengths[ind]
# [1] 5
### to see what those five values are ...
r$values[-ind] <- FALSE
vec[inverse.rle(r)]
# [1] 4 3 3 4 5
That gets us the longest length within the row. To apply this row-wise to a frame,
set.seed(42)
df <- replicate(10, sample(0:5, 10, rep=T))
df
# [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
# [1,] 0 0 3 1 3 5 1 3 5 3
# [2,] 4 4 4 5 4 5 4 0 1 1
# [3,] 0 5 4 2 3 1 0 2 1 1
# [4,] 0 3 4 5 1 3 0 2 0 2
# [5,] 1 1 3 1 1 2 3 4 1 4
# [6,] 3 1 1 3 2 5 4 4 4 4
# [7,] 1 2 3 3 0 4 1 3 5 5
# [8,] 1 0 2 5 4 1 0 5 4 2
# [9,] 0 0 1 1 1 5 4 4 3 5
# [10,] 3 2 0 4 1 1 3 3 0 3
func <- function(x, lim = 3) {
  r <- rle(x >= lim)
  # longest TRUE run; which() drops NA runs, and 0 covers rows with no match
  max(r$lengths[which(r$values)], 0L)
}
apply(df, 1, func)
# [1] 3 7 2 3 2 5 3 2 5 2
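With a few million rows, a row-wise `apply()` may become the bottleneck. One possible alternative, sketched here under the assumption that the data fits in memory as a numeric matrix, is to loop over the columns instead and keep a running count per row. `max_run_rows` is a hypothetical name, not something from the answers above:

```r
# Hypothetical helper: longest run of values >= lim per row.
# The loop runs ncol(m) times; each step is vectorised over all rows.
max_run_rows <- function(m, lim = 3) {
  hit  <- m >= lim                 # logical matrix, same shape as m
  cur  <- integer(nrow(m))         # length of the run ending at the current column
  best <- integer(nrow(m))         # longest run seen so far
  for (j in seq_len(ncol(m))) {
    cur  <- (cur + 1L) * hit[, j]  # extend the run, or reset it to 0
    best <- pmax(best, cur)
  }
  best
}

max_run_rows(matrix(c(2,4,3,3,4,5,1,0,5,1), nrow = 1))
# [1] 5
```

This sketch does not handle NA values; whether it is actually faster than the `rle()` version depends on the shape of the data and is worth measuring.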
I have adjacency matrix as below:
> matrix(c(0,1,0,0,1,0,1,0,0,1,0,1,0,0,1,0),ncol=4,byrow=T)
[,1] [,2] [,3] [,4]
[1,] 0 1 0 0
[2,] 1 0 1 0
[3,] 0 1 0 1
[4,] 0 0 1 0
Question 1: how can I get the corresponding information like:
2 5 7 10 12 15 from R?
Question 2: how can I get the location information of '1's in each row like:
2
1 3
2 4
3
or 2 1 3 2 4 3 from R?
Thanks!
Just use which on a logical matrix
which(m1 == 1)
#[1] 2 5 7 10 12 15
If we need the column index of the 1s in each row as a list
sapply(split(!!m1, row(m1)), which)
Or as a vector (na_if() comes from dplyr)
library(dplyr)
na.omit(na_if(c(t(m1 * col(m1))), 0))
#[1] 2 1 3 2 4 3
data
m1 <- matrix(c(0,1,0,0,1,0,1,0,0,1,0,1,0,0,1,0),ncol = 4,byrow = TRUE)
m <- matrix(c(0,1,0,0,1,0,1,0,0,1,0,1,0,0,1,0),ncol=4,byrow=T)
mm <- m == 1
which(mm)
#[1] 2 5 7 10 12 15
apply(mm, 1, which)
#[[1]]
#[1] 2
#
#[[2]]
#[1] 1 3
#
#[[3]]
#[1] 2 4
#
#[[4]]
#[1] 3
perhaps also see raster::adjacency
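Another base R route, not mentioned in the answers above, is `which()` with `arr.ind = TRUE`, which returns the row and column of every match in one call. A small sketch using the matrix from the question (the names `idx` and `per_row` are illustrative):

```r
m <- matrix(c(0,1,0,0,1,0,1,0,0,1,0,1,0,0,1,0), ncol = 4, byrow = TRUE)

# One (row, col) pair per cell equal to 1
idx <- which(m == 1, arr.ind = TRUE)

# which() walks the matrix column by column, so sort by row first
idx <- idx[order(idx[, "row"], idx[, "col"]), ]

# Column positions of the 1s, grouped by row (Question 2, list form)
per_row <- split(unname(idx[, "col"]), idx[, "row"])

# The same positions as one flat vector
unlist(per_row, use.names = FALSE)
# [1] 2 1 3 2 4 3
```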
I want to generate a symmetric matrix around a diagonal of zeroes and a predetermined sequence around them. In theory the lines should show as
0 1 3 5 7 9
1 0 3 5 7 9
I've tried tweaking with the conditionals, but I suspect that it's wonky because of indexing, which I am nowhere near skilled enough to fix.
bend <- function(n){
m <- seq(1, n, by=2)
a <- length(m)
y <- matrix(nrow= a, ncol = a, byrow= TRUE)
y <- ifelse(row(y) == col(y), 0, m)
y
}
Assuming that the input is a 9, expected output is
0 1 3 5 7 9
1 0 3 5 7 9
1 3 0 5 7 9
1 3 5 0 7 9
1 3 5 7 0 9
1 3 5 7 9 0
Actual output is
0 3 5 7 9 1
3 0 7 9 1 3
5 7 0 1 3 5
7 9 1 0 5 7
9 1 3 5 0 9
1 3 5 7 9 0
There's a simpler way to do what you need. You can start off by creating a matrix of length(x) + 1 columns and rows with all elements as a logical TRUE. Then make the diagonal FALSE using diag(). Now you can replace the TRUEs with your desired vector. The diagonal being FALSE is not affected. Since the values are replaced column-wise you need a final transpose t() to get correct result.
This way, you don't need to worry about tracking indices.
x <- c(1,3,5,7,9)
make_matrix <- function(x) {
m <- matrix(TRUE, ncol = length(x) + 1, nrow = length(x) + 1)
diag(m) <- FALSE
m[m] <- x
t(m)
}
make_matrix(x)
[,1] [,2] [,3] [,4] [,5] [,6]
[1,] 0 1 3 5 7 9
[2,] 1 0 3 5 7 9
[3,] 1 3 0 5 7 9
[4,] 1 3 5 0 7 9
[5,] 1 3 5 7 0 9
[6,] 1 3 5 7 9 0
Here's another way with sapply. This creates the necessary row elements in each iteration and puts them in a matrix by column. Again, you need a t() to get correct results. The %>% pipe comes from magrittr.
library(magrittr)
sapply(0:length(x), function(a) append(x, 0, after = a)) %>% t()
[,1] [,2] [,3] [,4] [,5] [,6]
[1,] 0 1 3 5 7 9
[2,] 1 0 3 5 7 9
[3,] 1 3 0 5 7 9
[4,] 1 3 5 0 7 9
[5,] 1 3 5 7 0 9
[6,] 1 3 5 7 9 0
Benchmarks
sapply is slower, likely because it's creating the matrix elements one row at a time and calls append for every row. All this overhead is avoided in the make_matrix() approach.
library(microbenchmark)
x <- sample(100)
microbenchmark(
make_matrix = make_matrix(x),
sapply = t(sapply(0:length(x), function(a) append(x, 0, after = a))),
akrun_forloop = {
n <- length(x) + 1
m1 <- matrix(0, n, n)
for(i in seq_len(nrow(m1))) m1[i, -i] <- x
},
times = 1000
)
Unit: microseconds
expr min lq mean median uq max neval
make_matrix 111.495 117.5610 128.3135 126.890 135.7540 225.323 1000
sapply 520.620 551.1765 592.2642 573.335 602.2585 10477.221 1000
akrun_forloop 3380.292 3526.3080 3837.1570 3648.765 3812.5075 20943.245 1000
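Timings are only comparable if the three approaches produce the same matrix. A quick self-contained check, restating the three versions from this thread on the small example vector:

```r
x <- c(1, 3, 5, 7, 9)

# Logical-mask approach
make_matrix <- function(x) {
  m <- matrix(TRUE, ncol = length(x) + 1, nrow = length(x) + 1)
  diag(m) <- FALSE
  m[m] <- x
  t(m)
}

# sapply approach
by_sapply <- t(sapply(0:length(x), function(a) append(x, 0, after = a)))

# for-loop approach
n <- length(x) + 1
by_loop <- matrix(0, n, n)
for (i in seq_len(n)) by_loop[i, -i] <- x

# Stops with an error if any pair differs
stopifnot(
  isTRUE(all.equal(make_matrix(x), by_sapply)),
  isTRUE(all.equal(make_matrix(x), by_loop))
)
```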
Using a simple for loop
n <- length(x) + 1
m1 <- matrix(0, n, n)
for(i in seq_len(nrow(m1))) m1[i, -i] <- x
m1
# [,1] [,2] [,3] [,4] [,5] [,6]
#[1,] 0 1 3 5 7 9
#[2,] 1 0 3 5 7 9
#[3,] 1 3 0 5 7 9
#[4,] 1 3 5 0 7 9
#[5,] 1 3 5 7 0 9
#[6,] 1 3 5 7 9 0
data
x <- c(1,3,5,7,9)
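If you would rather keep the original `bend(n)` interface, one option is to work out, for every off-diagonal cell, which element of the sequence belongs there. This is only a sketch of that idea (the helper variables `pos` and `off` are illustrative), not a replacement for the answers above:

```r
bend <- function(n) {
  x <- seq(1, n, by = 2)             # 1, 3, 5, ...
  k <- length(x) + 1                 # one extra slot per row for the diagonal zero
  y <- matrix(0, nrow = k, ncol = k)
  i <- row(y)
  j <- col(y)
  # Left of the diagonal, column j holds x[j];
  # right of it, the zero has shifted everything by one, so it holds x[j - 1].
  pos <- ifelse(j < i, j, j - 1)
  off <- i != j                      # TRUE for every off-diagonal cell
  y[off] <- x[pos[off]]
  y
}

bend(9)
```

The `ifelse()` in the question failed because its `no` argument was recycled over every cell, diagonal included, so the sequence drifted out of step with the columns.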
Is it possible to extend the sample function in R to not return more than say 2 of the same element when replace = TRUE?
Suppose I have a list:
l = c(1,1,2,3,4,5)
To sample 3 elements with replacement, I would do:
sample(l, 3, replace = TRUE)
Is there a way to constrain its output so that only a maximum of 2 of the same elements are returned? So (1,1,2) or (1,3,3) is allowed, but (1,1,1) or (3,3,3) is excluded?
set.seed(0)
The basic idea is to convert sampling with replacement to sampling without replacement.
ll <- unique(l) ## unique values
#[1] 1 2 3 4 5
pool <- rep.int(ll, 2) ## replicate each unique so they each appear twice
#[1] 1 2 3 4 5 1 2 3 4 5
sample(pool, 3) ## draw 3 samples without replacement
#[1] 4 3 5
## replicate it a few times
## each column is one sample, because `replicate` simplifies the result to a matrix
replicate(5, sample(pool, 3))
# [,1] [,2] [,3] [,4] [,5]
#[1,] 1 4 2 2 3
#[2,] 4 5 1 2 5
#[3,] 2 1 2 4 1
If you want different values to appear up to a different number of times, we can do, for example,
pool <- rep.int(ll, c(2, 3, 3, 4, 1))
#[1] 1 1 2 2 2 3 3 3 4 4 4 4 5
## draw 9 samples; replicate 5 times
oo <- replicate(5, sample(pool, 9))
# [,1] [,2] [,3] [,4] [,5]
# [1,] 5 1 4 3 2
# [2,] 2 2 4 4 1
# [3,] 4 4 1 1 1
# [4,] 4 2 3 2 5
# [5,] 1 4 2 5 2
# [6,] 3 4 3 3 3
# [7,] 1 4 2 2 2
# [8,] 4 1 4 3 3
# [9,] 3 3 2 2 4
We can call tabulate on each column to count the frequency of 1, 2, 3, 4, 5:
## set `nbins` in `tabulate` so frequency table of each column has the same length
apply(oo, 2L, tabulate, nbins = 5)
# [,1] [,2] [,3] [,4] [,5]
#[1,] 2 2 1 1 2
#[2,] 1 2 3 3 3
#[3,] 2 1 2 3 2
#[4,] 3 4 3 1 1
#[5,] 1 0 0 1 1
The counts in all columns meet the frequency upper bound c(2, 3, 3, 4, 1) we have set.
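The pool idea can be wrapped in a small function. A sketch, where `sample_capped` and its `cap` argument are hypothetical names introduced here; note that `unique()` discards the extra weight that repeated values such as the two 1s in `l` would otherwise have:

```r
# Hypothetical helper: draw `size` values so that no value appears more than `cap` times.
sample_capped <- function(values, size, cap = 2) {
  pool <- rep.int(unique(values), cap)    # each distinct value appears `cap` times
  stopifnot(size <= length(pool))         # cannot draw more than the pool holds
  pool[sample.int(length(pool), size)]    # draw positions without replacement
}

l <- c(1, 1, 2, 3, 4, 5)
set.seed(0)
sample_capped(l, 3)
```

Drawing positions with `sample.int()` rather than calling `sample(pool, size)` avoids the special case where `sample()` treats a single number as `1:n`.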
Would you explain the difference between rep and rep.int?
rep.int is not the "integer" method for rep. It is just a faster, simplified version of rep with less functionality. You can get more details of rep, rep.int and rep_len from the doc page ?rep.
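A few lines that show the practical differences described in ?rep; the vector `x` is just an illustration:

```r
x <- c(a = 1, b = 2)

names(rep(x, 2))       # rep keeps the names
names(rep.int(x, 2))   # rep.int drops attributes, so this is NULL

rep(x, each = 2)       # `each` is only available in rep

rep_len(1:3, 7)        # rep_len recycles to a fixed output length
# [1] 1 2 3 1 2 3 1
```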
Let n be a positive integer. We have a matrix B that has n columns, whose entries are integers between 1 and n. The aim is to match the rows of B with the rows of permutations(n), storing the indices in a vector v.
For example, let us consider the following. If
permutations(3)=
[,1] [,2] [,3]
[1,] 1 2 3
[2,] 1 3 2
[3,] 2 1 3
[4,] 2 3 1
[5,] 3 1 2
[6,] 3 2 1
and
B=
[,1] [,2] [,3]
[1,] 1 2 3
[2,] 1 2 3
[3,] 3 1 2
[4,] 2 3 1
[5,] 3 1 2
Then the vector v is
1 1 5 4 5
because the first two rows of B are equal to the row number 1 of permutations(3), the third row of B is the row number 5 of permutations(3), and so on.
I tried to apply the command
row.match
but the latter returns the error:
Error in do.call("paste", c(x[, , drop = FALSE], sep = "\r")) :
second argument must be a list
One way is to use match, where m1 holds the permutation matrix (e.g. m1 <- permutations(3)),
match(do.call(paste, data.frame(B)), do.call(paste, data.frame(m1)))
#[1] 1 1 5 4 5
One possible way is to turn your matrices into dataframes and join them:
A = read.table(text = "
1 2 3
1 3 2
2 1 3
2 3 1
3 1 2
3 2 1
")
B = read.table(text = "
1 2 3
1 2 3
3 1 2
2 3 1
3 1 2
")
library(dplyr)
# join from B so the result keeps B's row order
B %>%
  left_join(mutate(A, row_id = row_number()), by = c("V1", "V2", "V3")) %>%
  pull(row_id)
# [1] 1 1 5 4 5
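The key-and-match idea also works in base R without any packages. A self-contained sketch with the permutation matrix typed out by hand, so it does not depend on a particular `permutations()` implementation; `key` is a hypothetical helper name:

```r
# Rows of permutations(3), written out explicitly
A <- matrix(c(1,2,3, 1,3,2, 2,1,3, 2,3,1, 3,1,2, 3,2,1), ncol = 3, byrow = TRUE)
B <- matrix(c(1,2,3, 1,2,3, 3,1,2, 2,3,1, 3,1,2), ncol = 3, byrow = TRUE)

# Collapse each row into one string so whole rows can be compared at once
key <- function(m) apply(m, 1, paste, collapse = "\r")

v <- match(key(B), key(A))
v
# [1] 1 1 5 4 5
```

A separator is needed once entries can have more than one digit; otherwise rows such as (1, 12) and (11, 2) would collapse to the same key.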
I'm using R and
I want to fill a vector/table of length 8 with integers from 1 to 4, subject to the conditions below:
vector[i] <= vector[i+1]
all integers should be present
example:
1 1 1 1 2 2 3 4 may be a solution
1 2 1 1 2 3 3 4 isn't a solution to my problem
I am also wondering if there is a way to list all solutions
To get all solutions, reserve four slots for the numbers 1:4 (since every number must appear at least once), and consider all possible length-4 sequences of 1:4 to fill the remaining slots. Sorting and removing duplicates leaves you with 35 non-decreasing sequences:
# The sequences will be the rows of a matrix. First, the 'reserved' slots:
reserved = matrix(1:4, 256, 4, byrow=TRUE)
# Add all combinations of 1:4 to fill the remaining four slots:
result = cbind(reserved,
unname(as.matrix(expand.grid(1:4, 1:4, 1:4, 1:4))) )
# Now simply sort and de-duplicate along rows:
result = t(apply(result, 1, sort))
result = unique(result)
head(result)
# [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
# [1,] 1 1 1 1 1 2 3 4
# [2,] 1 1 1 1 2 2 3 4
# [3,] 1 1 1 1 2 3 3 4
# [4,] 1 1 1 1 2 3 4 4
# [5,] 1 1 1 2 2 2 3 4
# [6,] 1 1 1 2 2 3 3 4
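The count of 35 can be cross-checked with a different construction. A non-decreasing sequence that uses every value is fully determined by where it steps from 1 to 2, 2 to 3 and 3 to 4, i.e. by choosing 3 of the 7 gaps between the 8 slots. A sketch of that idea (`cuts` and `sols` are illustrative names); the rows come out in a different order than in `result`:

```r
# Each column of `cuts` picks 3 of the 7 gaps between the 8 slots
cuts <- combn(7, 3)

# diff(c(0, cc, 8)) turns the cut positions into how often 1, 2, 3, 4 occur
sols <- t(apply(cuts, 2, function(cc) rep(1:4, times = diff(c(0, cc, 8)))))

dim(sols)
# [1] 35  8
```

This avoids building and sorting all 256 candidate rows, which starts to matter for longer vectors or more distinct values.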