Take value from list and use it as index - r

I have a matrix (A) like this (the row and column names are identification codes, IDs):
     1   3  10  38  46
1    0 0.4   0   0   0
3    0   0   0   0   0
10   0   0 0.9 0.8   0
38   0   0   0   0   0
46   0 0.1   0   0   0
And another matrix (B) like this:
              a            b            c
1  2.676651e-04 4.404911e-06 9.604227e-06
3  6.073389e-10 3.273222e-05 3.360321e-04
10 4.156392e-08 1.269607e-06 7.509217e-06
38 4.200699e-08 3.227431e-02 8.286920e-11
46 9.352353e-05 3.318948e-20 8.694981e-06
I would like to get the indices of the elements of A that are > 0, so I used this command:
temp <- apply(A, 1, FUN = function(x) which(x > 0))
It returned a list with the correct indices of the elements > 0.
Next, I would like to multiply the elements of matrix B selected by those indices. In particular, I would like to do something like this for each row:
1: 6.073389e-10*3.273222e-05*3.360321e-04
Here I used the information in matrix A (the first row has a value > 0 in its second column, ID 3) as the index of the row to take from matrix B.
For the second row, I should obtain 0 because there are no elements > 0 in A[2,].
For the third row, I would like to obtain something like the first row, but summing the two products:
10: 4.156392e-08*1.269607e-06*7.509217e-06 + 4.200699e-08*3.227431e-02*8.286920e-11
I have tried to unlist the list, but that gives a plain vector and loses the correspondence with the IDs.

A <- matrix(
  c(0, 0.4, 0, 0, 0,
    0, 0, 0, 0, 0,
    0, 0, 0.9, 0.8, 0,
    0, 0, 0, 0, 0,
    0, 0.1, 0, 0, 0),
  nrow = 5, ncol = 5, byrow = TRUE
)
B <- matrix(
  c(2.676651e-04, 4.404911e-06, 9.604227e-06,
    6.073389e-10, 3.273222e-05, 3.360321e-04,
    4.156392e-08, 1.269607e-06, 7.509217e-06,
    4.200699e-08, 3.227431e-02, 8.286920e-11,
    9.352353e-05, 3.318948e-20, 8.694981e-06),
  nrow = 5, ncol = 3, byrow = TRUE
)
# (row, column) positions of the positive entries of A
idx <- which(A > 0, arr.ind = TRUE)
result <- 0
for (i in 1:nrow(idx)) {
  cat(A[idx[i, 1], idx[i, 2]], sep = "\n")  # the positive value in A
  cat(B[idx[i, 2], ], sep = "\n")           # the row of B it points to
  # accumulate one overall total: A value times the elements of that B row
  result <- result + sum(A[idx[i, 1], idx[i, 2]] * B[idx[i, 2], ])
}
cat("result=")
cat(result)
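The loop above accumulates a single overall total weighted by the values of A. For the per-row sums of products described in the question, a minimal sketch could look like the following. It assumes that A and B carry the IDs as row and column names; the dimnames and the names ids, row_prod and res are illustrative and not part of the original post.

```r
ids <- c("1", "3", "10", "38", "46")
A <- matrix(c(0, 0.4, 0,   0,   0,
              0, 0,   0,   0,   0,
              0, 0,   0.9, 0.8, 0,
              0, 0,   0,   0,   0,
              0, 0.1, 0,   0,   0),
            nrow = 5, byrow = TRUE, dimnames = list(ids, ids))
B <- matrix(c(2.676651e-04, 4.404911e-06, 9.604227e-06,
              6.073389e-10, 3.273222e-05, 3.360321e-04,
              4.156392e-08, 1.269607e-06, 7.509217e-06,
              4.200699e-08, 3.227431e-02, 8.286920e-11,
              9.352353e-05, 3.318948e-20, 8.694981e-06),
            nrow = 5, byrow = TRUE, dimnames = list(ids, c("a", "b", "c")))

# Product of the three elements in each row of B, one value per ID
row_prod <- apply(B, 1, prod)

# (A > 0) * 1 is a 0/1 indicator matrix. Multiplying it by row_prod sums,
# for each row of A, the products of the B rows selected by that row.
res <- drop(((A > 0) * 1) %*% row_prod)
res
```

Because the result is a named vector, the correspondence with the IDs is kept: res[["10"]] is the sum of the products for IDs 10 and 38, and rows of A without positive entries give 0.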

Related

Merging 2 Vectors to 1 Vector that satisfies certain criteria

I have two vectors that can be written as follows:
aa <- c(0, 0, 0, 0, 1, 0, 0, 0)
bb <- c(0, 2, 0, 0, 3, 1, 1, 1)
I want to merge these vectors so that, once vector aa takes the value 1, all later elements of bb are set to zero. In this example the result should look like:
cc <- c(0, 2, 0, 0, 3, 0, 0, 0)
What is the fastest and most efficient way to do this in R?
We may do
library(dplyr)
ifelse(lag(cummax(aa), default = 0) == 0, bb, aa)
[1] 0 2 0 0 3 0 0 0
Or another way is
bb * !c(0, head(cummax(aa), -1))
[1] 0 2 0 0 3 0 0 0
Or another option
ind <- (which.max(aa) + 1):length(aa)
bb[ind] <- aa[ind]
> bb
[1] 0 2 0 0 3 0 0 0
This is maybe too much for this task. At least for me it is easier to follow:
library(dplyr)
cc <- tibble(aa,bb) %>%
group_by(id_group=lag(cumsum(aa==1), default = 0)) %>%
mutate(cc = ifelse(id_group == 0, coalesce(bb,aa), coalesce(aa,bb))) %>%
pull(cc)
output:
[1] 0 2 0 0 3 0 0 0
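The which.max() option above relies on aa containing a 1 somewhere before its last position. A minimal base-R sketch of the cummax() idea wrapped in a function, which also behaves sensibly when aa has no 1 or when the 1 is the final element, might look like this. The function name merge_after_first_one and the variable seen_before are hypothetical.

```r
# Hypothetical helper: zero out bb after the first 1 in aa
merge_after_first_one <- function(aa, bb) {
  # seen_before[i] is TRUE once a 1 has occurred in aa strictly before position i
  seen_before <- c(FALSE, head(cummax(as.integer(aa == 1)), -1) == 1)
  bb * !seen_before
}

aa <- c(0, 0, 0, 0, 1, 0, 0, 0)
bb <- c(0, 2, 0, 0, 3, 1, 1, 1)
merge_after_first_one(aa, bb)
# [1] 0 2 0 0 3 0 0 0
```

If aa contains no 1 at all, bb is returned unchanged.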

Linear programming with conditional constraints in R

I have a linear programming problem where I'm trying to select from a number of binary resources to optimize value, basically a knapsack problem. The issue I'm having is that the different resources have characteristics in common and I want to ensure that my final solution has either 0 or 2 of resources with a specific characteristic. Is there some way to accomplish this? I haven't been able to think of one or find one despite extensive searching. In my data, the decision variables are resources and the constraints are characteristics of those resources. Consider the following code:
library(lpSolve)
const_mat_so<-matrix(c(
c(0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0,1,0,0,1,0,1)
,c(0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0,1,1,0,0,1,1)
,c(0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 1,0,1,0,1,0,0)
,c(1, 1, 0, 1, 1, 0, 0, 0, 1, 0, 0,0,0,0,0,0,0)
,c(8800, 8500, 7600, 8600, 8400, 7500, 7000, 8500, 8800, 7700, 6700,5500,1200,6700,9500,8700,6500)
,c(0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0,0,0,0,0,0,0)
,c(0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0,0,0,0,0,0,0)
,c(0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0,0,0,0,0,0,0)
,c(0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1,0,0,1,0,1,0)
,c(0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0,1,1,0,0,0,0)
,c(0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0,0,0,0,0,0,0)
,c(0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0,1,1,1,0,1,0)
,c(0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0,0,0,0,0,1,0)
,c(0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0,0,0,0,1,0,0)
,c(0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1,0,0,0,0,0,0)
),nrow=15,byrow = TRUE)
const_dir_so<-c("=","=","=","=","<=","<=","<=","<=","<=","<=","<=","<=","<=","<=","<=")
max_cost_so = 25000
objective_so = c(21.0, 19.3, 19.2, 18.8, 18.5, 16.6, 16.4, 16.4, 16.0, 16.0, 14.9, 14.6, 14.0, 13.9,12.0,5.5,24.6)
const_rhs_so<-c(1,1,1,1,25000,3,3,3,2,2,2,2,2,2,2)
x = lp("max", objective_so, const_mat_so, const_dir_so, const_rhs_so, all.bin = TRUE, all.int = TRUE)
> x
Success: the objective function is 68.1
> x$solution
[1] 1 0 1 0 0 0 0 0 0 0 0 0 1 1 0 0 0
While the above produces a solution, it is not the solution I want because I actually want the last seven constraints to be >=2 or 0. I have no clue how to code this or whether it's possible. Any help would be appreciated. I'm not a linear programming whiz so please forgive any misconceptions regarding the approach.
My understanding is that each of the last 7 constraints is to be either greater than or equal to 2, or equal to zero, i.e. not 1.
1) There are only 7 such constraints, so there are 2^7 = 128 possibilities, which is few enough that we can simply run every one using the formulation given in the question without excessive runtime, and then take the maximum of those.
dec2bin takes a base 10 (i.e. decimal) number and converts it to a binary vector of 0s and 1s. Running it on each number between 0 and 127 gives binary numbers such that the 1s correspond to constraints which are >= 2 (with the rest equal to 0).
dec2bin <- function(dec, digits = 7) {
# see http://stackoverflow.com/questions/6614283/converting-decimal-to-binary-in-r
tail(rev(as.integer(intToBits(dec))), digits)
}
runLP <- function(i) {
bin <- dec2bin(i)
n <- length(const_rhs_so) # 15
ix <- seq(to = n, length = length(bin)) # indexes of last 7 constraints, i.e. 9:15
const_dir_so[ix] <- ifelse(bin, ">=", "=")
const_rhs_so[ix] <- 2*bin
lp("max", objective_so, const_mat_so, const_dir_so, const_rhs_so, all.bin = TRUE)
}
lpout <- lapply(0:127, runLP)
ixmax <- which.max(sapply(lpout, "[[", "objval"))
ans <- lpout[[ixmax]]
ans
ans$solution
tail(c(const_mat_so %*% ans$solution), 7)
giving:
> ans
Success: the objective function is 62
> ans$solution
[1] 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1
> tail(c(const_mat_so %*% ans$solution), 7) # last 7 constraint values
[1] 0 0 0 0 0 0 0
2) @Erwin Kalvelagen's second alternative refers to constraining variables, but I think what was meant is that x in his answer is the value of the LHS of one of the last 7 constraints. That is, if C is the matrix of the original last 7 constraints, then replace those 7 constraints with these 14 constraints:
Cx + D1 y <= 0
Cx + D2 y >= 0
where D1 is a diagonal matrix whose diagonal elements are any sufficiently large negative number and D2 is a diagonal matrix whose diagonal elements are all -2. Here we are optimizing over the vectors x and y of binary variables. The x variables are as in the question, and there are 7 new binary y variables: y[i] = 0 constrains the ith of the last 7 original constraints to 0, and y[i] = 1 constrains it to 2 or more. The y variables are called bin in (1). The coefficients of the y variables in the objective are all zero.
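As a quick sanity check of this pair of constraints, the following sketch enumerates a single constraint value s (standing for one element of Cx) together with its binary y, and lists which values of s remain feasible. The names M, grid, feasible and allowed are illustrative, and M = 100 is only an assumed "sufficiently large" number.

```r
M <- 100                                  # assumed big-M value
grid <- expand.grid(s = 0:5, y = 0:1)     # candidate constraint values and y

# s - M*y <= 0 forces s = 0 when y = 0; s - 2*y >= 0 forces s >= 2 when y = 1
feasible <- with(grid, s - M * y <= 0 & s - 2 * y >= 0)

allowed <- sort(unique(grid$s[feasible]))
allowed
# [1] 0 2 3 4 5
```

The value 1 never appears, which is exactly the "0 or at least 2" behaviour the two constraints are meant to enforce.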
In terms of lpSolve R code:
objective_so2 <- c(objective_so, numeric(7))
const_mat_so2 <- cbind(rbind(const_mat_so, const_mat_so[9:15, ]),
rbind(matrix(0, 8, 7), diag(-100, 7), diag(-2, 7)))
const_dir_so2 <- c(const_dir_so, rep(">=", 7))
const_rhs_so2 <- c(const_rhs_so[1:8], numeric(14))
x2 = lp ("max", objective_so2, const_mat_so2, const_dir_so2, const_rhs_so2, all.bin = TRUE)
giving the same value of 62 as in (1). The y variables (last 7) are all 0 which also corresponds to (1). This also provides a double check as two methods have now given consistent answers.
> x2
Success: the objective function is 62
> x2$solution
[1] 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0
I believe LpSolve supports semi-continuous variables. A semi-continuous variable with lower bound L and upper bound U can assume the values 0 or between L and U. I am not sure the R package lpSolve supports this variable type.
However we can simulate this with an extra binary variable y and extra constraints. So you need to make your x variable continuous (or integer if you want only integer values) and add the constraints:
2*y <= x <= U*y
where U is an upper bound for x.
The lpSolveAPI package provides a more advanced interface to "lp_solve". As @Erwin Kalvelagen mentioned, "lp_solve" and lpSolveAPI support semi-continuous variables (a semi-continuous decision variable can take values between its lower and upper bounds, as well as zero). A constraint matrix also lets you transfer the outputs of the 9th-15th constraint formulas into the 18th-24th variables. For example, for the 9th constraint, x6 + x11 + x14 + x16 - x18 = 0 means x6 + x11 + x14 + x16 = x18. So I think you can control x6 + x11 + x14 + x16 via the semi-continuous variable x18.
library(lpSolveAPI)
## add 18-24th cols to define the 18-24th variables
const_mat_so2 <- cbind(const_mat_so, rbind(matrix(0, nrow = 8, ncol = 7), diag(-1, 7)))
## [EDITED] make a model and set a constraint matrix and objective coefs
model <- make.lp(nrow(const_mat_so2), 0)
for(i in 1:ncol(const_mat_so2)) add.column(model, const_mat_so2[,i])
set.constr.type(model, c(const_dir_so[-c(9:15)], rep("=", 7)))
set.rhs(model, c(const_rhs_so[-c(9:15)], rep(0, 7))) # each original output - 18-24th = 0
set.objfn(model, c(objective_so, rep(0, 7))) # 18-24th are 0
## define semi-continuous and bounds.
set.semicont(model, col = 18:24)
set.bounds(model, lower = rep(1.9, 7), col = 18:24) # default upper is Inf.
## define other things
set.type(model, col = 1:17, type = "binary") # original variable
set.type(model, col = 18:24, type = "integer") # outputs of original constraint formulas
lp.control(model, sense = "max") # do maximize
# write.lp(model, "filename.lp", "lp") # if you want to watch the whole model
solve(model)
get.variables(model)
# [1] 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1
# [18] 0 0 0 0 0 0 0
get.objective(model)
# [1] 62
t(const_mat_so %*% get.variables(model)[1:17])
# [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13] [,14] [,15]
# [1,] 1 1 1 1 22300 1 0 0 0 0 0 0 0 0 0

how to remove one data in r

In R I have some vector.
x <- c(0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0)
I want to remove only one "0" from the vector x, but my code removes every '0' in the vector.
Example
x = x[!x %in% 0]
All zeros were removed from the vector x.
For example, in Python:
x = [0,1,0,1,0,0,0,1]
x.remove(0)
x
[1, 0, 1, 0, 0, 0, 1]
x.remove(0)
x
[1, 1, 0, 0, 0, 1]
We can use match to remove the first occurrence of a particular number
x <- c(1, 0, 1, 0, 0, 0, 1)
x[-match(1, x)]
#[1] 0 1 0 0 0 1
To remove the first occurrence of some other number, for example 5 in the case below:
x <- c(1, 0, 5, 5, 0, 0, 1)
x[-match(5, x)]
#[1] 1 0 5 0 0 1
You may need which.min(),
which determines the index of the first minimum of a vector:
x <- c(0,1,0,1,0,0,0,1)
x <- x[-which.min(x)]
x
# [1] 1 0 1 0 0 0 1
If your vector contains elements other than 0 or 1: x <- x[-which.min(x != 0)]
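One caveat with x[-match(value, x)] is that match() returns NA when the value is absent, and indexing with -NA returns a vector of NAs rather than x. A small sketch of a guarded helper, loosely mirroring Python's list.remove() except that it returns x unchanged instead of raising an error, could be written as follows. The name remove_first is hypothetical.

```r
# Hypothetical helper: drop only the first occurrence of `value` from `x`
remove_first <- function(x, value) {
  i <- match(value, x)      # position of the first match, or NA if absent
  if (is.na(i)) x else x[-i]
}

x <- c(0, 1, 0, 1, 0, 0, 0, 1)
x <- remove_first(x, 0)
x
# [1] 1 0 1 0 0 0 1
x <- remove_first(x, 0)
x
# [1] 1 1 0 0 0 1
```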

make data frame with binaries to sum to 1

I have a data frame with only zeros and ones, e.g.
df <- data.frame(v1 = rbinom(100, 1, 0.5),
v2 = rbinom(100, 1, 0.2),
v3 = rbinom(100, 1, 0.4))
Now I want to modify this data set so that each row sums to 1.
So this
1 0 0
1 1 0
0 0 1
1 1 1
0 0 0
should become this:
1 0 0
0.5 0.5 0
0 0 1
0.33 0.33 0.33
0 0 0
edit: rows with all zeros should be left as is
As already pointed out by @lmo, the data.frame (or matrix) can be modified with
df <- df / rowSums(df)
In the case of rows containing only zeros this will lead to rows containing only NaN. Since these rows should be kept as they were, the easiest way is probably to correct for this afterwards with
df[is.na(df)] <- 0
Here is a quick method:
# create matrix
temp <- matrix(c(1, 0, 0, 1, 1, 0, 0, 0, 1, 1, 1, 1), ncol=3, byrow=T)
temp / rowSums(temp)
This exploits the fact that matrices are stored column-wise, so the recycled rowSums vector lines up with the rows during the element-by-element division.
In the case that all elements in a row are zero, and you don't want a NaN (from 0/0), another method, from @RHertel, is the following:
# save rowSum:
mySums <- rowSums(temp)
temp / ifelse(mySums != 0, mySums, 1)
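Applied to the five example rows from the question, including the all-zero row, the same idea can be sketched on a data frame as follows. The names row_totals and normalised are illustrative.

```r
# The five example rows from the question
df <- data.frame(v1 = c(1, 1, 0, 1, 0),
                 v2 = c(0, 1, 0, 1, 0),
                 v3 = c(0, 0, 1, 1, 0))

row_totals <- rowSums(df)

# Divide by the row total, but by 1 where the total is 0 so that
# all-zero rows stay all zero instead of becoming NaN
normalised <- df / ifelse(row_totals != 0, row_totals, 1)
normalised
```

Every row of normalised now sums to 1, except the all-zero row, which is left as is.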

string index matching alternate 1 and -1

I am trying to write a function that takes in a vector of integers and returns the indexes where 1 and -1 alternate. I thought this would be a simple function but it is proving devilishly hard to write!
Example:
index: 1 2 3 4 5 6 7 8 9 10
string: 1 0 0 0 1 -1 -1 0 1 -1 #this is input to function
returns index of -1 and 1 so that -1 index is listed first
[6,1] #from 1 to -1
[6,9] #from -1 to 1
[10,9] #from 1 to -1
My (non-working) attempt:
x<-c(1,0,0,0,1,-1,-1,0,1)
matchVals(x)
matchVals<-function(x){
current.index<-getStart(x)
#next VALUE to search for should be:
next.val<-x[current.index]*-1
next.index<-getNextVal(x[current.index:length(x)],next.val) + current.index #adding in offset
return(current.index,next.index)
}
#gets index of first value that isnt a 0
getStart<-function(x){
lapply(1:length(x),function(i){if(x[i]!=0)return(i)})
return(NA)
}
#gets FIRST index of the value specified (so dont feed it the entire string). Must add offset for truncated portion of string
getNextVal<-function(x,v){
lapply(1:length(x),function(i){if(x[i]==v)return(i)})
return(NA)
}
One way would be to iterate, alternately, over the indices of 1 and -1 and save the indices progressively.
Having:
x = c(1, 0, 0, 0, 1, -1, -1, 0, 1, -1)
and computing the indices:
i1 = which(x == 1)
i2 = which(x == -1)
find the progressive sequence recursively:
ff = function(x, y, acc = integer())
{
if(!length(x)) return(acc)
if(!length(y)) return(c(acc, x[[1L]]))
Recall(y[(findInterval(x[[1L]], y) + 1L):length(y)], x[-1L], c(acc, x[[1L]]))
}
ans = if(i1[[1]] < i2[[1]]) ff(i1, i2) else ff(i2, i1)
ans
#[1] 1 6 9 10
To get the exact desired output (among alternative ways):
tmp = embed(ans, 2)
i = (seq_len(nrow(tmp)) %% 2) == (if(i1[[1]] < i2[[1]]) 0 else 1)
tmp[i, ] = t(apply(tmp[i, , drop = FALSE], 1, rev))
tmp
# [,1] [,2]
#[1,] 6 1
#[2,] 6 9
#[3,] 10 9
Testing on other data:
X = c(0, 0, -1, 0, 0, -1, 1, 1, 1, -1, -1, 0, 0, -1, 0, 0, 1, -1,
-1, -1, 0, 1, 0, 1, 1, 1, -1, 0, 0, 1, 0, 1, 0, -1, 1, 1, 1,
-1, 0, 0, 1, 0, 1, 0, -1, 1, 1, 1)
i1 = which(X == 1)
i2 = which(X == -1)
if(i1[[1]] < i2[[1]]) ff(i1, i2) else ff(i2, i1)
# [1] 3 7 10 17 18 22 27 30 34 35 38 41 45 46
#..and proceed as necessary
This is an incomplete answer, but I think it's in the right direction.
Test case:
x <- c(1,0,0,0,1,-1,-1,0,1)
We basically want to ignore zero values, so let's (1) replace them by NAs and (2) use zoo::na.locf ("last observation carried forward") to replace them by the preceding non-NA value
x2 <- x
x2[x2==0] <- NA
x2 <- zoo::na.locf(x2)
Now use rle() to identify runs/breakpoints:
(r <- rle(x2))
## Run Length Encoding
## lengths: int [1:3] 5 3 1
## values : num [1:3] 1 -1 1
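The following statement gives (6, 9, 10), the positions just after each run ends: 6 and 9 are the switch locations you were interested in, while 10 is one past the end of this 9-element test vector. r$values gives the corresponding information about the direction of the switch.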
cumsum(r$lengths)+1
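The run-based idea can also be sketched in base R without zoo: drop the zeros, keep the first index of each run of equal non-zero values, then order each consecutive pair so that the index of the -1 comes first. This is only one possible way to finish the approach; the names alternating_starts, starts, from, to and pairs are hypothetical.

```r
# Hypothetical helper: first index of each run of equal non-zero values
alternating_starts <- function(x) {
  nz <- which(x != 0)                 # positions of the non-zero entries
  vals <- x[nz]
  # keep a position if it is the first one or its value differs from the previous
  nz[c(TRUE, head(vals, -1) != tail(vals, -1))]
}

x <- c(1, 0, 0, 0, 1, -1, -1, 0, 1, -1)
starts <- alternating_starts(x)
starts
# [1]  1  6  9 10

# Pair consecutive switch points, listing the index of the -1 first
from <- head(starts, -1)
to <- tail(starts, -1)
pairs <- cbind(neg = ifelse(x[from] == -1, from, to),
               pos = ifelse(x[from] == -1, to, from))
pairs
#      neg pos
# [1,]   6   1
# [2,]   6   9
# [3,]  10   9
```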

Resources