0-1 sequence without spaces - r

Spaces are redundant when reporting a binary sequence. This code
x <- '1 0 0 0 0 0 1 1 0 1 0 1 1 0 '
y<-gsub(' +', '', x)
does the job so I can copy and paste from R. How do I do the same for 0-1 sequences (and other one-digit data) in other formats, e.g.,
x <- c(1, 0, 0, 0, 0, 0, 1, 1, 0, 1, 0, 1, 1, 0)
or
toString(x)
or whatever (for the sake of learning various options)? Thanks.

For vectors, use the paste() function and specify the collapse argument:
x <- c(1, 0, 0, 0, 0, 0, 1, 1, 0, 1, 0, 1, 1, 0)
paste( x, collapse = '' )
[1] "10000011010110"
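For the toString() form mentioned in the question, one possible sketch in base R is to strip the ", " separator that toString() inserts. strsplit() goes the other way, from the compact string back to a numeric vector. The names s1, s2 and back are only illustrative.

```r
x <- c(1, 0, 0, 0, 0, 0, 1, 1, 0, 1, 0, 1, 1, 0)

# toString() joins the elements with ", ", so remove that literal separator
s1 <- gsub(", ", "", toString(x), fixed = TRUE)

# the same result straight from the vector
s2 <- paste(x, collapse = "")

# and back again: split into single characters and convert to numbers
back <- as.numeric(strsplit(s1, "")[[1]])
```

This only round-trips cleanly for one-digit data, since the split is per character.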

Have you tried
write.table(x,row.names=FALSE,col.names=FALSE,eol="\t")
1 0 0 0 0 0 1 1 0 1 0 1 1 0
By changing the eol (end of line) character, you can decide whether a separator is used and which one; eol="" should write the digits with no separator at all.
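If the goal is only to print the digits for copying, cat() with an empty sep is another option. This is a small sketch rather than a replacement for the answers above.

```r
x <- c(1, 0, 0, 0, 0, 0, 1, 1, 0, 1, 0, 1, 1, 0)

# cat() writes the elements to the console; sep = "" puts nothing between them
cat(x, sep = "")
cat("\n")  # finish the line
```

Unlike paste(), cat() returns nothing useful, so use it for display only.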

Related

Merging 2 Vectors to 1 Vector that satisfies certain criteria

I have two vectors that can be written as follows:
aa <- c(0, 0, 0, 0, 1, 0, 0, 0)
bb <- c(0, 2, 0, 0, 3, 1, 1, 1)
I want to merge these vectors so that every element of bb after the position where aa equals 1 is set to zero. In this example the result should look like:
cc <- c(0, 2, 0, 0, 3, 0, 0, 0)
What is the fastest and most efficient way to do this in R?
We may do
library(dplyr)
ifelse(lag(cummax(aa), default = 0) == 0, bb, aa)
[1] 0 2 0 0 3 0 0 0
Or another way is
bb * !c(0, head(cummax(aa), -1))
[1] 0 2 0 0 3 0 0 0
Or another option
ind <- (which.max(aa) + 1):length(aa)
bb[ind] <- aa[ind]
> bb
[1] 0 2 0 0 3 0 0 0
This may be overkill for the task, but at least for me it is easier to follow:
library(dplyr)
cc <- tibble(aa,bb) %>%
group_by(id_group=lag(cumsum(aa==1), default = 0)) %>%
mutate(cc = ifelse(id_group == 0, coalesce(bb,aa), coalesce(aa,bb))) %>%
pull(cc)
output:
[1] 0 2 0 0 3 0 0 0
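The cummax() idea can also be wrapped in a small base R helper. This is a sketch; the function name mask_after_first_one is made up for illustration, and it assumes aa contains only 0s and 1s.

```r
# Zero out every element of bb that comes strictly after the first 1 in aa.
mask_after_first_one <- function(aa, bb) {
  # 1 at position i if a 1 occurred in aa at any earlier position, else 0
  seen_before <- c(0, head(cummax(aa), -1))
  bb * (seen_before == 0)
}

aa <- c(0, 0, 0, 0, 1, 0, 0, 0)
bb <- c(0, 2, 0, 0, 3, 1, 1, 1)
cc <- mask_after_first_one(aa, bb)
```

If aa contains no 1 at all, bb comes back unchanged, which the which.max() variant above does not guarantee.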

R Lookback few days and assign new value if old value exists

I have two timeseries vectors as follows -
a <- c(1, 0, 0, 0, 1, 0, 1, 1, 1, 0, 0, 0)
b <- c(1, 0, 1, 0)
I want to look back 7 days and replace only 1's in vectors a and b with 2. It is important to check if there were any values 7 days before replacing.
The expected result is -
a = c(1, 0, 0, 0, 1, 0, 2, 1, 1, 0, 2, 0)
b = c(1, 0, 1, 0) - Since no value existed 7 days ago, nothing changes here.
Thanks!
We can create a condition with lag
library(dplyr)
f1 <- function(vec) replace(vec, lag(vec, 6) == 1, 2)
Output:
f1(a)
#[1] 1 0 0 0 1 0 2 1 1 0 2 0
f1(b)
#[1] 1 0 1 0
A base R option, defining a user function f:
f <- function(v) replace(v, (ind <- which(v == 1) + 6)[ind <= length(v)], 2)
such that
> f(a)
[1] 1 0 0 0 1 0 2 1 1 0 2 0
> f(b)
[1] 1 0 1 0
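Here is a slightly more explicit base R sketch of the same idea, with the offset as a parameter. The name mark_lookback and the arguments k, from and to are hypothetical, and k = 6 mirrors the offset used in the answers above.

```r
# Put `to` at every position that lies k steps after a `from` value.
mark_lookback <- function(v, k = 6, from = 1, to = 2) {
  n <- length(v)
  if (n <= k) return(v)  # vector too short to look back k steps
  # positions of `from` that still have a partner k steps later
  hit <- which(v[seq_len(n - k)] == from) + k
  v[hit] <- to
  v
}

a <- c(1, 0, 0, 0, 1, 0, 1, 1, 1, 0, 0, 0)
b <- c(1, 0, 1, 0)
res_a <- mark_lookback(a)
res_b <- mark_lookback(b)
```

The lookup uses the original values of v, so a newly written 2 never triggers further replacements.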

Recode a value in a vector based on surrounding values

I'm trying to programmatically change a variable from a 0 to a 1 if there are three 1s before and after a 0.
For example, if the numbers in a vector were 1, 1, 1, 0, 1, 1, and 1, then I want to change the 0 to a 1.
Here is the data, in the vector dummy_code in the data.frame original_df:
original_df <- data.frame(dummy_code = c(1, 0, 0, 1, 1, 1, 0, 1, 1, 1, 0, 0, 1))
Here is how I'm trying to have the values be recoded:
desired_df <- data.frame(dummy_code = c(1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1))
I tried to use the function fill in the package tidyr, but this fills in missing values, so it won't work. If I were to recode the 0 values to be missing, then that would not work either, because it would simply code every NA as 1, when I would only want to code every NA surrounded by three 1s as 1.
Is there a way to do this in an efficient way programmatically?
An rle alternative, using the x from @G. Grothendieck's answer:
r <- rle(x)
Find indexes of runs of three 1:
i1 <- which(r$lengths == 3 & r$values == 1)
Check which of these runs of 1s surround a single run of 0, and get the index of the 0 run to be replaced:
i2 <- i1[which(diff(i1) == 2)] + 1
Replace relevant 0 with 1:
r$values[i2] <- 1
Reverse the rle operation on the updated runs:
inverse.rle(r)
# [1] 1 0 0 1 1 1 1 1 1 1 0 0 1
A similar solution based on data.table::rleid, slightly more compact and perhaps easier to read:
library(data.table)
d <- data.table(x)
Calculate length of each run:
d[ , n := .N, by = rleid(x)]
For "x" values that are zero and whose preceding and subsequent runs of 1 are both of length 3, set "x" to 1:
d[x == 0 & shift(n) == 3 & shift(n, type = "lead") == 3, x := 1]
d$x
# [1] 1 0 0 1 1 1 1 1 1 1 0 0 1
Here is a one-liner using rollapply from zoo:
library(zoo)
rollapply(c(0, 0, 0, x, 0, 0, 0), 7, function(x) if (all(x[-4] == 1)) 1 else x[4])
## [1] 1 0 0 1 1 1 1 1 1 1 0 0 1
Note: Input used was:
x <- c(1, 0, 0, 1, 1, 1, 0, 1, 1, 1, 0, 0, 1)
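For comparison, a package-free sketch with an explicit loop checks the same condition as the rollapply one-liner (k ones immediately on each side of a 0). The function name fill_isolated_zero and the argument k are illustrative.

```r
# Set a 0 to 1 when the k values before it and the k values after it are all 1.
fill_isolated_zero <- function(x, k = 3) {
  n <- length(x)
  out <- x
  for (i in seq_len(n)) {
    has_room <- i > k && i + k <= n  # enough neighbours on both sides
    if (x[i] == 0 && has_room &&
        all(x[(i - k):(i - 1)] == 1) &&
        all(x[(i + 1):(i + k)] == 1)) {
      out[i] <- 1
    }
  }
  out
}

x <- c(1, 0, 0, 1, 1, 1, 0, 1, 1, 1, 0, 0, 1)
res <- fill_isolated_zero(x)
```

The checks read from x and write to out, so one replacement cannot influence another.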

Linear programming with conditional constraints in R

I have a linear programming problem where I'm trying to select from a number of binary resources to optimize value, basically a knapsack problem. The issue I'm having is that the different resources have characteristics in common and I want to ensure that my final solution has either 0 or 2 of resources with a specific characteristic. Is there some way to accomplish this? I haven't been able to think of one or find one despite extensive searching. In my data, the decision variables are resources and the constraints are characteristics of those resources. Consider the following code:
library(lpSolve)
const_mat_so<-matrix(c(
c(0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0,1,0,0,1,0,1)
,c(0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0,1,1,0,0,1,1)
,c(0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 1,0,1,0,1,0,0)
,c(1, 1, 0, 1, 1, 0, 0, 0, 1, 0, 0,0,0,0,0,0,0)
,c(8800, 8500, 7600, 8600, 8400, 7500, 7000, 8500, 8800, 7700, 6700,5500,1200,6700,9500,8700,6500)
,c(0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0,0,0,0,0,0,0)
,c(0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0,0,0,0,0,0,0)
,c(0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0,0,0,0,0,0,0)
,c(0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1,0,0,1,0,1,0)
,c(0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0,1,1,0,0,0,0)
,c(0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0,0,0,0,0,0,0)
,c(0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0,1,1,1,0,1,0)
,c(0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0,0,0,0,0,1,0)
,c(0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0,0,0,0,1,0,0)
,c(0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1,0,0,0,0,0,0)
),nrow=15,byrow = TRUE)
const_dir_so<-c("=","=","=","=","<=","<=","<=","<=","<=","<=","<=","<=","<=","<=","<=")
max_cost_so = 25000
objective_so = c(21.0, 19.3, 19.2, 18.8, 18.5, 16.6, 16.4, 16.4, 16.0, 16.0, 14.9, 14.6, 14.0, 13.9,12.0,5.5,24.6)
const_rhs_so<-c(1,1,1,1,25000,3,3,3,2,2,2,2,2,2,2)
x = lp ("max", objective_so, const_mat_so, const_dir_so, const_rhs_so, all.bin=TRUE, all.int=TRUE
)
> x
Success: the objective function is 68.1
> x$solution
[1] 1 0 1 0 0 0 0 0 0 0 0 0 1 1 0 0 0
While the above produces a solution, it is not the solution I want because I actually want the last seven constraints to be >=2 or 0. I have no clue how to code this or whether it's possible. Any help would be appreciated. I'm not a linear programming whiz so please forgive any misconceptions regarding the approach.
My understanding is that each of the last 7 constraints is to be either at least 2 or equal to zero, i.e. not 1.
1) There are only 7 such constraints, so there are 2^7 = 128 possibilities, which is small enough that we can just run every one using the formulation given in the question without excessive runtime and then take the maximum of those.
dec2bin takes a base 10 (i.e. decimal) number and converts it to a binary vector of 0s and 1s. Running it on each number between 0 and 127 gives binary numbers such that the 1s correspond to constraints which are >= 2 (with the rest equal to 0).
dec2bin <- function(dec, digits = 7) {
  # see http://stackoverflow.com/questions/6614283/converting-decimal-to-binary-in-r
  tail(rev(as.integer(intToBits(dec))), digits)
}
runLP <- function(i) {
  bin <- dec2bin(i)
  n <- length(const_rhs_so) # 15
  ix <- seq(to = n, length = length(bin)) # indexes of last 7 constraints, i.e. 9:15
  const_dir_so[ix] <- ifelse(bin, ">=", "=")
  const_rhs_so[ix] <- 2*bin
  lp("max", objective_so, const_mat_so, const_dir_so, const_rhs_so, all.bin = TRUE)
}
lpout <- lapply(0:127, runLP)
ixmax <- which.max(sapply(lpout, "[[", "objval"))
ans <- lpout[[ixmax]]
ans
ans$solution
tail(c(const_mat_so %*% ans$solution), 7)
giving:
> ans
Success: the objective function is 62
> ans$solution
[1] 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1
> tail(c(const_mat_so %*% ans$solution), 7) # last 7 constraint values
[1] 0 0 0 0 0 0 0
2) @Erwin Kalvelagen's second alternative refers to constraining variables, but I think what was meant was that x in his answer is the value of the LHS of one of the last 7 constraints. That is, if C is the matrix of the original last 7 constraints, then replace those original 7 constraints with these 14 constraints:
Cx + D1 y <= 0
Cx + D2 y >= 0
where D1 is a diagonal matrix whose diagonal elements are any sufficiently large negative number and D2 is a diagonal matrix whose diagonal elements are all -2. Here we are optimizing over x and y vectors of binary variables. The x variables are as in the question and there are 7 new y binary variables such that y[i] is 0 to constrain the ith of the last 7 original constraints to 0 or 1 to constrain it to 2 or more. The y variables are called bin in (1). The coefficients of the y variables in the objective are all zero.
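Written out row by row, the pair of constraints can be read as follows. Here c_i^T denotes the i-th row of C and M the magnitude of the diagonal entries of D1; both symbols are my notation rather than part of the answer, which leaves the size of those entries open.

```latex
\begin{aligned}
c_i^\top x - M\,y_i &\le 0, \\
c_i^\top x - 2\,y_i &\ge 0, \qquad y_i \in \{0,1\},\quad i = 1,\dots,7 .
\end{aligned}
```

With y_i = 0 the two inequalities force c_i^T x = 0, and with y_i = 1 they force 2 <= c_i^T x <= M, so M only needs to be at least as large as the biggest value c_i^T x can take.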
In terms of lpSolve R code:
objective_so2 <- c(objective_so, numeric(7))
const_mat_so2 <- cbind(rbind(const_mat_so, const_mat_so[9:15, ]),
rbind(matrix(0, 8, 7), diag(-100, 7), diag(-2, 7)))
const_dir_so2 <- c(const_dir_so, rep(">=", 7))
const_rhs_so2 <- c(const_rhs_so[1:8], numeric(14))
x2 = lp ("max", objective_so2, const_mat_so2, const_dir_so2, const_rhs_so2, all.bin = TRUE)
giving the same value of 62 as in (1). The y variables (last 7) are all 0 which also corresponds to (1). This also provides a double check as two methods have now given consistent answers.
> x2
Success: the objective function is 62
> x2$solution
[1] 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0
I believe LpSolve supports semi-continuous variables. A semi-continuous variable with lower bound L and upper bound U can assume the values 0 or between L and U. I am not sure the R package lpSolve supports this variable type.
However we can simulate this with an extra binary variable y and extra constraints. So you need to make your x variable continuous (or integer if you want only integer values) and add the constraints:
2*y <= x <= U*y
where U is an upper bound for x.
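To see why this pair of inequalities excludes exactly the value 1, here is a tiny brute-force check in base R. The bound U = 10 and the helper name feasible are assumptions chosen only for illustration.

```r
U <- 10  # assumed upper bound on x, for illustration only

# x is allowed if some binary y satisfies 2*y <= x <= U*y
feasible <- function(x) {
  any(sapply(0:1, function(y) 2 * y <= x && x <= U * y))
}

allowed <- sapply(0:4, feasible)
allowed
# [1]  TRUE FALSE  TRUE  TRUE  TRUE
```

Only x = 1 is ruled out: y = 0 pins x to 0, and y = 1 requires x to be at least 2.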
The lpSolveAPI package provides a more advanced interface to "lp_solve". As @Erwin Kalvelagen mentioned, "lp_solve" and lpSolveAPI support semi-continuous variables (a semi-continuous decision variable can take any allowed value between its lower and upper bounds, as well as zero). A constraint matrix also lets you transfer the outputs of the 9th-15th constraint formulas into the 18th-24th variables. For example, for the 9th constraint, x6 + x11 + x14 + x16 - x18 = 0 means x6 + x11 + x14 + x16 = x18. So I think you can control x6 + x11 + x14 + x16 via the semi-continuous variable x18.
library(lpSolveAPI)
## add 18-24th cols to define the 18-24th variables
const_mat_so2 <- cbind(const_mat_so, rbind(matrix(0, nrow = 8, ncol = 7), diag(-1, 7)))
## [EDITED] make a model and set a constraint matrix and objective coefs
model <- make.lp(nrow(const_mat_so2), 0)
for(i in 1:ncol(const_mat_so2)) add.column(model, const_mat_so2[,i])
set.constr.type(model, c(const_dir_so[-c(9:15)], rep("=", 7)))
set.rhs(model, c(const_rhs_so[-c(9:15)], rep(0, 7))) # each original output - 18-24th = 0
set.objfn(model, c(objective_so, rep(0, 7))) # 18-24th are 0
## define semi-continuous and bounds.
set.semicont(model, col = 18:24)
set.bounds(model, lower = rep(1.9, 7), col = 18:24) # default upper is Inf.
## define other things
set.type(model, col = 1:17, type = "binary") # original variable
set.type(model, col = 18:24, type = "integer") # outputs of original constraint formulas
lp.control(model, sense = "max") # do maximize
# write.lp(model, "filename.lp", "lp") # if you want to watch the whole model
solve(model)
res <- get.variables(model)
res
# [1] 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1
# [18] 0 0 0 0 0 0 0
get.objective(model)
# [1] 62
t(const_mat_so %*% res[1:17])
#      [,1] [,2] [,3] [,4]  [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13] [,14] [,15]
# [1,]    1    1    1    1 22300    1    0    0    0     0     0     0     0     0     0

how to remove one data in r

In R I have some vector.
x <- c(0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0)
I want to remove only one "0" from the vector x, but my code removes every 0.
Example
x = x[!x %in% 0]
This removes all the zeros from x.
For Example in Python
x = [0,1,0,1,0,0,0,1]
x.remove(0)
x
[1, 0, 1, 0, 0, 0, 1]
x.remove(0)
x
[1, 1, 0, 0, 0, 1]
We can use match to remove the first occurrence of a particular number
x <- c(1, 0, 1, 0, 0, 0, 1)
x[-match(1, x)]
#[1] 0 1 0 0 0 1
If you need to remove some other number from the vector, for example 5 in the case below,
x <- c(1, 0, 5, 5, 0, 0, 1)
x[-match(5, x)]
#[1] 1 0 5 0 0 1
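One caveat: match() returns NA when the value is not present, and indexing with -NA does not leave the vector unchanged. A small guarded helper avoids that. This is a sketch, and the name remove_first is made up.

```r
# Remove only the first occurrence of `value`; return x unchanged if absent.
remove_first <- function(x, value) {
  i <- match(value, x)  # index of the first match, or NA
  if (is.na(i)) x else x[-i]
}

x <- c(0, 1, 0, 1, 0, 0, 0, 1)
once  <- remove_first(x, 0)     # like one call of Python's x.remove(0)
twice <- remove_first(once, 0)  # like calling it a second time
```

Unlike Python's list.remove(), which raises an error for a missing value, this version silently returns the input.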
You may need which.min(),
which determines the index of the first minimum of a vector:
x <- c(0,1,0,1,0,0,0,1)
x <- x[-which.min(x)]
x
# [1] 1 0 1 0 0 0 1
If your vector contains elements other than 0 or 1: x <- x[-which.min(x != 0)]
