I have 2 columns of different sizes, A and B.
A | B
5 | 4
1 | 2
3 |
How do I say: if CellA < CellB, then give me CellB - CellA? So it calculates:
"5 (A1) is bigger than 4 (B1) and 2 (B2)", so no results
"1 (A2) is smaller than 4 (B1) and 2 (B2)", so we get 3 and 1
"3 (A3) is smaller than 4 (B1) but bigger than 2 (B2)", so we get 1
Result = (3, 1, 1)
The closest I got was:
if (A < B) {B - A}
But that only works with columns of the same size. I want each individual cell of column A to interact with each individual cell of column B. How can I do that?
Since the columns are of different sizes, the data must be stored in a list rather than a data.frame. Hence the solution:
listAB <- list(A = c(5,1,3) , B = c(4,2))
equ <- function(li){
  result <- vector("numeric")
  for (x in li$A){
    result <- append(result, sapply(li$B, function(y) if (x < y) y - x))
  }
  unlist(result)
}
equ(listAB)
#> [1] 3 1 1
Created on 2022-05-29 by the reprex package (v2.0.1)
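The same pairwise comparison can also be written without the explicit loop; here is a minimal vectorized sketch using the same inputs (for each element x of A, keep the differences B - x for the elements of B that exceed x):

```r
A <- c(5, 1, 3)
B <- c(4, 2)

# for each element of A, keep B - x wherever x < B
result <- unlist(lapply(A, function(x) B[x < B] - x))
result
#> [1] 3 1 1
```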
Related
I would like to know: when B becomes bigger than A, get the first value of B that is bigger than A (while B stays bigger than A, subsequent values do not count, until B becomes smaller than A again); after that, when B becomes smaller than A, get the first value of B that is smaller than A (while B stays smaller than A, subsequent values do not count, until B becomes bigger than A again). My code below does not work:
>A
[1] 1 2 3 4 5 8 10 15
>B
[1] 0 3 5 3 6 11 14 13
fn <- function(m, n){
  count <- 0
  for (i in 1:length(m)){
    for (j in i+1: length(m)+1){
      if ((m[i] < n[i] && m[j] > n[j]) || (m[i] > n[i] && m[j] < n[j])){
        count <- count + 1
      }
    }
  }
}
fn(A,B)
the output should be (3 > 2, count: 1; 3 < 4, count: 2; 6 > 5, count: 3; 13 < 15, count: 4):
>count
4
Try something like this:
d <- sign(A - B)
sum(head(d, -1) * tail(d, -1) < 0)
The idea is to count how many times the sign of the difference changes between adjacent positions.
If you are interested in the positions, you can do:
s <- which(head(d, -1) * tail(d, -1) < 0)
s <- s + 1  # shift to the position just after each sign change (also fine when s is empty)
A_ <- A[s]
B_ <- B[s]
count <- length(s)
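Putting the pieces together with the example data from the question (a quick check; the positions and B values match the four changes listed in the question):

```r
A <- c(1, 2, 3, 4, 5, 8, 10, 15)
B <- c(0, 3, 5, 3, 6, 11, 14, 13)

d <- sign(A - B)                               # sign of the difference at each position
count <- sum(head(d, -1) * tail(d, -1) < 0)    # a negative adjacent product marks a sign change
count
#> [1] 4

s <- which(head(d, -1) * tail(d, -1) < 0) + 1  # positions just after each sign change
B[s]
#> [1]  3  3  6 13
```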
I want to evaluate the distance between non-zero data points. For example, if I have 50 data points and only the first and last are non-zero, I want the result to be 49.
For example, my data is:
1. 0
2. 0
3. 5
4. 6
5. 0
6. 1
7. 0
Based on my data above, I want to get 4 variables:
v0 = 3 (because the distance from the start (position 0) to the 3rd data point is 3 jumps)
v1 = 1 (because the distance between the 3rd and 4th data points is 1 jump)
v2 = 2 (because the distance between the 4th and 6th data points is 2 jumps)
v3 = 1 (because the distance between the 6th and 7th data points is 1 jump)
This is my code:
data = c(0, 0, 5, 6, 0, 1, 0)
t = 1
for (i in data) {
  if (i == 0) {
    t[i] = t + 1
  } else {
    t[i] = 1
  }
}
t
The result is:
[1] 1 NA NA NA 1 1
Could you help me figure out this problem? I also hope the code uses some kind of loop, so that it can be applied to any other data.
The general rule is not clear from the question, but if x is the input we assume that:
the input is non-negative
the first element of the output is the position of the first positive element in x
subsequent elements of the output are the distances between successive positive elements of x
if that results in a vector whose sum is less than length(x), the remainder is appended
To do that, determine the positions of the positive elements of c(1, x), calculate the differences between successive positions using diff, and then, if they don't sum to length(x), append the remainder.
dists <- function(x) {
  d <- diff(which(c(1, x) > 0))
  if (sum(d) < length(x)) c(d, length(x) - sum(d)) else d
}
# distance to 5 is 3 and then to 6 is 1 and then to 1 is 2 and 1 is left
x1 <- c(0, 0, 5, 6, 0, 1, 0)
dists(x1)
## [1] 3 1 2 1
# distance to first 1 is 1 and from that to second 1 is 3
x2 <- c(1, 0, 0, 1)
dists(x2)
## [1] 1 3
Here it is redone using a loop:
dists2 <- function(x) {
  pos <- 0
  out <- numeric(0)
  for (i in seq_along(x)) {
    if (x[i]) {
      out <- c(out, i - pos)
      pos <- i
    }
  }
  if (sum(out) < length(x)) out <- c(out, length(x) - sum(out))
  out
}
dists2(x1)
## [1] 3 1 2 1
dists2(x2)
## [1] 1 3
Updates
Simplified based on comments below the answer; added the loop approach.
Suppose that I have a set of 10 elements and that my code is able to choose only 3 elements at a time. Then, I would like it to choose another 3 elements, but without selecting the elements that were already selected.
x <- c(4,3,5,6,-2,7,-4,10,22,-12)
Then, suppose that my condition is to select 3 elements that are less than 5. Then,
new_x <- c(4, 3, -2)
Then, I would like to select another 3 elements that are less than 5 but were not selected the first time. If fewer than 3 such elements remain, the missing entries should have value zero.
Hence,
new_xx <- c(-4,-12,0)
Any help, please?
Here is an option using split
f <- function(x, max = 5, n = 3) {
  x <- x[x < max]
  ret <- split(x, rep(1:(length(x) / n + 1), each = n)[1:length(x)])
  lapply(ret, function(w) replace(rep(0, n), 1:length(w), w))
}
f(x)
#$`1`
#[1] 4 3 -2
#
#$`2`
#[1] -4 -12 0
Explanation: We define a custom function that first selects entries < 5, then splits the resulting vector into chunks of length 3 and stores the result in a list, and finally 0-pads those list elements that are vectors of length < 3.
Sample data
x <- c(4,3,5,6,-2,7,-4,10,22,-12)
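If the batches really need to be drawn one at a time rather than split all at once, a stateful variant can track the indices already used. This is only a sketch; take_batch is a made-up helper name, and the condition and batch size mirror the question:

```r
x <- c(4, 3, 5, 6, -2, 7, -4, 10, 22, -12)

# take_batch is a hypothetical helper: it returns the next n values of x
# that satisfy the condition and were not returned before,
# 0-padded when fewer than n remain
take_batch <- function(x, used, max = 5, n = 3) {
  candidates <- setdiff(which(x < max), used)
  picked <- head(candidates, n)
  vals <- replace(rep(0, n), seq_along(picked), x[picked])
  list(values = vals, used = c(used, picked))
}

b1 <- take_batch(x, used = integer(0))
b1$values
#> [1]  4  3 -2
b2 <- take_batch(x, used = b1$used)
b2$values
#> [1]  -4 -12   0
```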
Given a vector v of F non-negative integers, I want to create, one by one, all possible sets of K vectors of size F whose sum is v. I call C the matrix whose columns are these K vectors; the row sums of C give v.
For instance, the vector (1,2) of size F=2, with K=2, can be decomposed into:
# all sets of K vectors such that their sum is (1,2)
C_1 = 1,0    C_2 = 1,0    C_3 = 1,0    C_4 = 0,1    C_5 = 0,1    C_6 = 0,1
      2,0          1,1          0,2          2,0          1,1          0,2
The goal is to apply some function to each possible C. Currently, I use this code, where I pre-compute all possible C and then go through them.
library(partitions)
K <- 3
F <- 5
v <- 1:F
partitions <- list()
for (f in 1:F) {
  partitions[[f]] <- compositions(n = v[f], m = K)
}
# Each v[f] has multiple partitions. Now we create an index to consider
# all possible combinations of partitions for the whole vector v.
npartitions <- sapply(partitions, ncol)
indices <- lapply(npartitions, function(x) 1:x)
grid <- as.matrix(do.call(expand.grid, indices)) # breaks if too big
for (n in 1:nrow(grid)) {
  selected <- c(grid[n, ])
  C <- t(sapply(1:F, function(f) partitions[[f]][, selected[f]]))
  # Do something with C
  # ...
  print(C)
}
However, when the dimensions are too big, i.e. when F or K is large, the number of combinations explodes and expand.grid can't deal with that.
I know that, for a given position v[f], I can generate one composition at a time:
partition <- firstcomposition(n=v[f],m=K)
nextcomposition(partition, v[f],m=K)
But how can I use this to generate all possible C as in the above code?
npartitions <- ......
indices <- lapply(npartitions, function(x) 1:x)
grid <- as.matrix(do.call(expand.grid, indices))
You can avoid the generation of grid and successively generate its rows thanks to a Cantor expansion.
Here is the function returning the Cantor expansion of the integer n:
aryExpansion <- function(n, sizes){
  l <- c(1, cumprod(sizes))
  nmax <- tail(l, 1) - 1
  if (n > nmax) {
    stop(sprintf("n cannot exceed %d", nmax))
  }
  epsilon <- numeric(length(sizes))
  while (n > 0) {
    k <- which.min(l <= n)
    e <- floor(n / l[k - 1])
    epsilon[k - 1] <- e
    n <- n - e * l[k - 1]
  }
  return(epsilon)
}
For example:
expand.grid(1:2, 1:3)
## Var1 Var2
## 1 1 1
## 2 2 1
## 3 1 2
## 4 2 2
## 5 1 3
## 6 2 3
aryExpansion(0, sizes = c(2,3)) + 1
## [1] 1 1
aryExpansion(1, sizes = c(2,3)) + 1
## [1] 2 1
aryExpansion(2, sizes = c(2,3)) + 1
## [1] 1 2
aryExpansion(3, sizes = c(2,3)) + 1
## [1] 2 2
aryExpansion(4, sizes = c(2,3)) + 1
## [1] 1 3
aryExpansion(5, sizes = c(2,3)) + 1
## [1] 2 3
So, instead of generating grid:
npartitions <- ......
indices <- lapply(npartitions, function(x) 1:x)
grid <- as.matrix(do.call(expand.grid, indices))
for (n in 1:nrow(grid)) {
  selected <- grid[n, ]
  ......
}
you can do:
npartitions <- ......
for (n in seq_len(prod(npartitions))) {
  selected <- 1 + aryExpansion(n - 1, sizes = npartitions)
  ......
}
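As an end-to-end sketch with small dimensions: the two composition matrices are written out by hand here so the snippet is self-contained; they match what partitions::compositions(1, 2) and partitions::compositions(2, 2) would return for v = (1, 2) and K = 2:

```r
# mixed-radix expansion, as in the answer above
aryExpansion <- function(n, sizes) {
  l <- c(1, cumprod(sizes))
  nmax <- tail(l, 1) - 1
  if (n > nmax) stop(sprintf("n cannot exceed %d", nmax))
  epsilon <- numeric(length(sizes))
  while (n > 0) {
    k <- which.min(l <= n)
    e <- floor(n / l[k - 1])
    epsilon[k - 1] <- e
    n <- n - e * l[k - 1]
  }
  epsilon
}

# compositions of 1 and of 2 into K = 2 parts, written out by hand
partitions <- list(
  cbind(c(1, 0), c(0, 1)),
  cbind(c(2, 0), c(1, 1), c(0, 2))
)
npartitions <- sapply(partitions, ncol)  # c(2, 3)

# iterate over all 6 combinations without materializing expand.grid's output
Cs <- lapply(seq_len(prod(npartitions)), function(n) {
  selected <- 1 + aryExpansion(n - 1, sizes = npartitions)
  t(sapply(1:2, function(f) partitions[[f]][, selected[f]]))
})
length(Cs)
#> [1] 6
Cs[[1]]
#>      [,1] [,2]
#> [1,]    1    0
#> [2,]    2    0
```

Every generated C has row sums (1, 2), and all six are distinct, matching the enumeration shown for the (1,2) example in the question.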
It is easy to generate a random subset of the powerset if we are able to compute all elements of the powerset first and then randomly draw a sample out of it:
set.seed(12)
x = 1:4
n.samples = 3
library(HapEstXXR)
power.set = HapEstXXR::powerset(x)
sample(power.set, size = n.samples, replace = FALSE)
# [[1]]
# [1] 2
#
# [[2]]
# [1] 3 4
#
# [[3]]
# [1] 1 3 4
However, if the length of x is large, there will be too many elements for the powerset. I am therefore looking for a way to directly compute a random subset.
One possibility is to first draw a "random length" and then draw random subset of x using the "random length":
len = sample(1:length(x), size = n.samples, replace = TRUE)
len
# [1] 2 1 1
lapply(len, function(l) sort(sample(x, size = l)))
# [[1]]
# [1] 1 2
#
# [[2]]
# [1] 1
#
# [[3]]
# [1] 1
This, however, generates duplicates. Of course, I could now remove the duplicates and repeat the previous sampling using a while loop until I end up with n.samples non-duplicate random subsets of the powerset:
drawSubsetOfPowerset = function(x, n) {
  ret = list()
  while (length(ret) < n) {
    # draw a "random length" with some meaningful prob to reduce number of loops
    len = sample(0:n, size = n, replace = TRUE, prob = choose(n, 0:n)/2^n)
    # draw random subset of x using the "random length" and sort it to better identify duplicates
    random.subset = lapply(len, function(l) sort(sample(x, size = l)))
    # remove duplicates
    ret = unique(c(ret, random.subset))
  }
  return(ret)
}
drawSubsetOfPowerset(x, n.samples)
Of course, I could now try to optimize several components of my drawSubsetOfPowerset function, e.g. (1) trying to avoid the copying of the object ret in each iteration of the loop, (2) using a faster sort, (3) using a faster way to remove duplicates of the list, ...
My question is: Is there maybe a different way (which is more efficient) of doing this?
How about using binary representation? This way we can draw a random sample of integers out of 2^length(v), the total number of subsets in the power set. From there we can make use of intToBits along with indexing to guarantee that we generate unique random subsets of the power set in an ordered fashion.
randomSubsetOfPowSet <- function(v, n, mySeed) {
  set.seed(mySeed)
  lapply(sample(2^length(v), n) - 1, function(x) v[intToBits(x) > 0])
}
Taking x = 1:4, n.samples = 5, and a random seed of 42, we have:
randomSubsetOfPowSet(1:4, 5, 42)
[[1]]
[1] 2 3 4
[[2]]
[1] 1 2 3 4
[[3]]
[1] 3
[[4]]
[1] 2 4
[[5]]
[1] 1 2 3
Explanation
What does binary representation have to do with power sets?
It turns out that given a set, we can find all subsets by turning to bits (yes, 0s and 1s). By viewing the elements in a subset as on elements in the original set and the elements not in that subset as off, we now have a very tangible way of thinking about how to generate each subset. Observe:
Original set:          {a,   b,   c,   d}
Existence in subset:   1/0  1/0  1/0  1/0   (for the subset below, b and d are "on")

Example subset: {b, d} gets mapped to {0, 1, 0, 1}

Thus, {b, d} is mapped to the integer 0*2^0 + 1*2^1 + 0*2^2 + 1*2^3 = 10
This is now a problem of combinations of bits of length n. If you map this out for every subset of A = {a, b, c, d}, you will obtain 0:15. Therefore, to obtain a random subset of the power set of A, we simply generate a random subset of 0:15 and map each integer to a subset of A. How might we do this?
sample comes to mind.
Now, it is very easy to go the other way as well (i.e. from an integer back to a subset of our original set). Observe:
Given the integer 10 and the set A above (i.e. {a, b, c, d}), we have:
10 in bits is -->> {0, 1, 0, 1}
Which indices are greater than 0? Answer: 2 and 4
Taking the 2nd and 4th elements of our set gives {b, d}, et voilà!
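The round trip can be checked directly in R (a small sketch; note that intToBits returns 32 bits, least significant first):

```r
A <- c("a", "b", "c", "d")

bits <- intToBits(10)   # raw vector of 32 bits, least significant first
which(bits > 0)
#> [1] 2 4
A[bits > 0]             # same indexing trick as in the answer
#> [1] "b" "d"

# and back again: {b, d} -> 0*2^0 + 1*2^1 + 0*2^2 + 1*2^3 = 10
sum(2^(which(A %in% c("b", "d")) - 1))
#> [1] 10
```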