I have a dataframe "data" containing 10 variables A to J (which all contain 0s and 1s) and 500 rows:
I need to make a second set of 10 variables AY to JY based on the variables A to J such that:
for AY, if A==1 then AY takes the value 1 with 80% probability and if A==0 then AY takes the value 1 with 20% probability
for BY, if B==1 then BY takes the value 1 with 80% probability and if B==0 then BY takes the value 1 with 20% probability
And so on...
Right now, I have the variables A to J stored in the dataframe "data", and have the following as my code:
out <- paste0(LETTERS[1:10], "Y")
data2 <- data.frame(data)
colnames(data2) <- out
for (i in out) {
data2[i] <- ifelse(**???**, rbinom(length(out), 1, 0.8), rbinom(length(out), 1, 0.2))
}
What would I write instead of the question marks to denote "if any value in the list of variables AY:JY is equal to 1, execute the first argument, otherwise execute the second argument"?
Please find below one solution that should work.
Starting data
set.seed(4854)
df <- data.frame("A" = sample(c(0,1), 500, replace = TRUE),
"B" = sample(c(0,1), 500, replace = TRUE),
"C" = sample(c(0,1), 500, replace = TRUE),
"D" = sample(c(0,1), 500, replace = TRUE),
"E" = sample(c(0,1), 500, replace = TRUE),
"F" = sample(c(0,1), 500, replace = TRUE),
"G" = sample(c(0,1), 500, replace = TRUE),
"H" = sample(c(0,1), 500, replace = TRUE),
"I" = sample(c(0,1), 500, replace = TRUE),
"J" = sample(c(0,1), 500, replace = TRUE)
)
Saving original data
df2 <- df
Use apply to run a function on every cell that draws a single 0 or 1
(with size 1, replace = TRUE has no effect) using the probabilities you indicated,
depending on whether the original value is 0 or 1
df2 <- apply(df2, c(1,2), function (x)
ifelse(
x == 1,
sample(c(0, 1), 1, prob = c(0.2, 0.8), replace = TRUE),
sample(c(0, 1), 1, prob = c(0.8, 0.2), replace = TRUE)
))
Renaming of columns
colnames(df2) <- paste0(colnames(df),"Y")
Output
head(df2)
#> AY BY CY DY EY FY GY HY IY JY
#> [1,] 1 0 0 1 0 0 0 1 0 1
#> [2,] 0 1 0 0 0 0 0 0 1 1
#> [3,] 1 1 1 0 1 1 0 0 0 0
#> [4,] 1 0 1 0 1 1 1 1 1 0
#> [5,] 1 1 0 1 0 1 1 0 0 0
#> [6,] 0 0 0 1 1 1 1 1 0 1
Created on 2021-09-24 by the reprex package (v2.0.1)
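As a vectorised alternative to the cell-by-cell apply above, here is a minimal sketch that relies on rbinom accepting a vector of probabilities, one per draw. The simulated `data` frame and the names `m`, `p` and `data2` are assumptions for illustration, not part of the original post.

```r
set.seed(1)

# Assumed stand-in for the asker's "data": 500 rows, columns A to J of 0s and 1s
data <- as.data.frame(matrix(rbinom(500 * 10, 1, 0.5), nrow = 500,
                             dimnames = list(NULL, LETTERS[1:10])))

m <- as.matrix(data)

# Per-cell success probability: 0.8 where the source value is 1, 0.2 where it is 0
p <- ifelse(m == 1, 0.8, 0.2)

# One Bernoulli draw per cell; matrix() refills in the same column-major order as p
data2 <- as.data.frame(matrix(rbinom(length(p), 1, p), nrow = nrow(m),
                              dimnames = list(NULL, paste0(colnames(m), "Y"))))
```

Because all draws happen in one rbinom call, this avoids calling a function once per cell, which may matter for larger data.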
I have a function f(x) which I intend to minimize. "x" is a vector containing 50 parameters. This function has several constraints: first is that all parameters in x should be binary, so that x = (1,1,0,1,...); second is that the sum of "x" should be exactly 25, so that sum(x) = 25. The question can be illustrated as:
min f(x)
s.t. sum(x) = 25,
x = 0 or 1
However, when I try to solve this problem in R, I run into problems. Common functions such as optim and constrOptim from the stats package seem to accept only coefficients of the target function (in my case, the function is fairly complex and cannot simply be expressed as a coefficient matrix), and donlp2 from Rdonlp does not support restricting parameters to be binary. Does anyone have an idea of how to set binary constraints for this case?
Expanding my comment, here is an example of a Local Search, as implemented in package NMOF. (I borrow Stéphane's objective function).
library("NMOF")
library("neighbours")
## Stéphane's objective function
f <- function(x)
sum(1:20 * x)
nb <- neighbourfun(type = "logical", kmin = 10, kmax = 10)
x0 <- c(rep(FALSE, 10), rep(TRUE, 10))
sol <- LSopt(f, list(x0 = x0, neighbour = nb, nI = 1000))
## initial solution
as.numeric(x0)
## [1] 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1
## final solution
as.numeric(sol$xbest)
## [1] 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0
(Disclosure: I am the maintainer of packages NMOF and neighbours.)
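To illustrate the idea behind such a local search without any packages, here is a rough base-R sketch. The helper `swap_neighbour` and the loop are hypothetical illustrations of the technique, not the NMOF implementation: each step swaps one TRUE with one FALSE, so sum(x) never changes, and a candidate is accepted only if it is not worse.

```r
# Same toy objective as above
f <- function(x) sum(1:20 * x)

# Hypothetical neighbourhood: swap one randomly chosen TRUE with one FALSE,
# which keeps the number of TRUE elements constant
swap_neighbour <- function(x) {
  ones  <- which(x)
  zeros <- which(!x)
  i <- ones[sample.int(length(ones), 1)]
  j <- zeros[sample.int(length(zeros), 1)]
  x[c(i, j)] <- x[c(j, i)]
  x
}

set.seed(42)
x0 <- c(rep(FALSE, 10), rep(TRUE, 10))
x <- x0
for (step in 1:2000) {
  xn <- swap_neighbour(x)
  if (f(xn) <= f(x)) x <- xn   # accept only non-worsening moves
}

sum(x)   # still 10
f(x)     # never larger than f(x0)
```

For an objective with many local minima, a plain acceptance rule like this can get stuck, which is where the more refined heuristics in the packages above come in.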
You can try the amazing package rgenoud. Below is an example.
I take 20 binary variables instead of your 50 for easier reading. I take f(x) = sum(1:20 * x), this is a weighted sum with increasing weights so clearly the best solution (restricted to sum(x)=10) is 1, 1, ..., 1, 0, 0, ..., 0. And rgenoud brilliantly finds it.
library(rgenoud)
f <- function(x) { # the function to be minimized
sum(1:20 * x)
}
g <- function(x){
c(
ifelse(sum(x) == 10, 0, 1), # set the constraint (here sum(x)=10) in this way
f(x) # the objective function (to minimize/maximize)
)
}
solution <- genoud(
g,
pop.size = 3000,
lexical = 2, # see ?genoud for explanations
nvars = 20, # number of x_i's
starting.values = c(rep(0, 10), rep(1, 10)),
Domains = cbind(rep(0, 20), rep(1, 20)), # lower and upper bounds
data.type.int = TRUE # x_i's are integer
)
solution$par # the values of x
## [1] 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0
solution$value
## [1] 0 55 ; 0 is the value of ifelse(sum(x) == 10, 0, 1) and 55 is the value of f(x)
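For a toy problem of this size the result can be cross-checked by brute force. The following is only a sanity-check sketch (the names `all_values` and `best` are mine), and it does not scale to the 50 variables in the question: it enumerates every way of choosing 10 positions out of 20 with combn and evaluates the objective for each.

```r
# With weights 1:20, f(x) equals the sum of the chosen positions,
# so summing each combination of 10 indices evaluates f for every feasible x
all_values <- combn(20, 10, FUN = sum)

length(all_values)   # number of feasible solutions, choose(20, 10)
best <- min(all_values)
best
```

The minimum found this way should agree with the objective value reported by genoud above.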
I need to create a vector of length 4, such that 2 of the elements are 1, the other 2 are 0, and which number appears in a certain place is random.
For example, the first simulation is [1, 1, 0, 0]. The second simulation might be [1, 0, 1, 0] or something else.
I tried the code as follows:
sample(x = c(0,1), size = 4, replace = TRUE, prob = 0.5)
but it simply returned a list with random numbers of 1 and 0.
Is it possible to make such a list without using for or while loops?
You can pass two values to the prob argument, but this does not guarantee an equal number of 0s and 1s:
sample(x = c(0,1), size = 4, replace = TRUE, prob = c(0.5, 0.5))
#[1] 1 0 0 1
sample(x = c(0,1), size = 4, replace = TRUE, prob = c(0.5, 0.5))
#[1] 1 1 0 1
sample(x = c(0,1), size = 4, replace = TRUE, prob = c(0.5, 0.5))
#[1] 0 0 0 1
What you can do instead is create your vector using rep and then use sample for randomness
sample(rep(c(1, 0), 2))
#[1] 0 1 0 1
sample(rep(c(1, 0), 2))
#[1] 0 0 1 1
sample(rep(c(1, 0), 2))
#[1] 1 1 0 0
sample(rep(c(1, 0), 2))
#[1] 0 1 1 0
This also works for other vector lengths and counts, because the second argument of rep (times) sets how often each value is repeated
n <- 11
sample(rep(c(0, 1), c(n - 3, 3)))
#[1] 0 0 1 0 1 1 0 0 0 0 0
sample(rep(c(0, 1), c(n - 3, 3)))
#[1] 0 1 0 0 0 0 1 0 0 0 1
sample(rep(c(0, 1), c(n - 3, 3)))
#[1] 0 1 0 0 0 0 1 0 0 0 1
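Another way to look at it, sketched below with names of my own choosing (`n`, `k`, `x`, `sims`): pick the positions of the 1s at random with sample.int, or generate several simulations at once with replicate. Neither needs an explicit for or while loop.

```r
set.seed(123)
n <- 4   # vector length
k <- 2   # number of 1s

# Choose k random positions and set them to 1
x <- integer(n)
x[sample.int(n, k)] <- 1L
x

# Five simulations at once; each column is one shuffled vector
sims <- replicate(5, sample(rep(c(1, 0), c(k, n - k))))
sims
```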
I have a data set and would like to do two things:
Set certain row values in Col A to 0 based on values in Col B
Create a new column with values of either 0 or 1 based on the edited values in Col A
My current approach is shown below - the issue is I occasionally get an error:
Error in `[<-.data.frame`(`*tmp*`, "OCS_dose", value = 0) :
replacement has 1 row, data has 0
This happens because the numbers I generate are randomly selected, and on certain trials there are no rows to update in Col A based on the numbers in Col B.
Here is an example of my code that causes the error:
pbo_IFNlow_data[pbo_IFNlow_data$OCS_status == 0,]['OCS_dose'] <- 0
OCS_status is either a 0 or 1 that is generated using:
pbo_OCS_status_low <- sample(c(0,1), replace = TRUE,
size = pbo_n_IFNlow, prob=c(1-.863, 0.863))
Therefore on occasion, I have no 0's... In my mind R should then just not try to update anything.
Is there a better way to do what I am trying to do?
Here is a more complete segment of my code:
pbo_OCS_status_low <- sample(c(0,1), replace = TRUE, size = pbo_n_IFNlow, prob=c(1-.863, 0.863)) #on OCS = 1
#OCS dose
pbo_OCS_dose_low <- rtruncnorm(pbo_n_IFNlow, a=0, b=Inf, mean=12.8, sd=8.1)
#IFN boolean flag
pbo_IFN_low <- rep(0, pbo_n_IFNlow)
#SLEDAI score
pbo_SLEDAI_low <- rtruncnorm(pbo_n_IFNlow, a=0, b=Inf, mean=11.1, sd=4.4)
#Response criteria met for SRI score reduction
pbo_SRI_low <- sample(c(0,1), replace = TRUE, size = pbo_n_IFNlow, prob=c(1-0.423, 0.423))
pbo_IFNlow_data <- cbind(IFN_status=pbo_IFN_low,
OCS_status=pbo_OCS_status_low,
OCS_dose=pbo_OCS_dose_low,
SLEDAI=pbo_SLEDAI_low,
SRI_response=pbo_SRI_low)
pbo_IFNlow_data <- data.frame(pbo_IFNlow_data)
#set those off OCS to 0
pbo_IFNlow_data[pbo_IFNlow_data$OCS_status == 0,]['OCS_dose'] <- 0
#stratification factor for OCS dosage
pbo_IFNlow_data$OCS_lessthan10 <- "temp"
pbo_IFNlow_data[pbo_IFNlow_data$OCS_dose < 10, ]['OCS_lessthan10'] <- 1
pbo_IFNlow_data[pbo_IFNlow_data$OCS_dose >= 10, ]['OCS_lessthan10'] <- 0
#stratification factor for SLE score
pbo_IFNlow_data$SLE_lessthan10 <- "temp"
pbo_IFNlow_data[pbo_IFNlow_data$SLEDAI < 10, ]['SLE_lessthan10'] <- 1
pbo_IFNlow_data[pbo_IFNlow_data$SLEDAI >= 10, ]['SLE_lessthan10'] <- 0
It would be easier if we had a minimal reproducible example. If I understand your question correctly, you may want to try the ifelse function in R:
df <- data.frame(colA = seq(1, 10), colB = seq(11, 20))
# Set certain row values in Col A to 0 based on values in Col B
df$colA <- ifelse(df$colB > 15, 0, df$colA)
# Create a new column with values of either 0
# or 1 based on the edited values in Col A
df$colC <- ifelse(df$colA == 0, 1, 0)
print(df)
## colA colB colC
## 1 1 11 0
## 2 2 12 0
## 3 3 13 0
## 4 4 14 0
## 5 5 15 0
## 6 0 16 1
## 7 0 17 1
## 8 0 18 1
## 9 0 19 1
## 10 0 20 1
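Regarding the error itself: subsetting rows first and then assigning to a column fails when the subset is empty, whereas indexing the column with a logical vector simply does nothing in that case. Here is a small sketch with made-up data (the three-row `df` is an assumption; the column names follow the question) in which no row has OCS_status == 0.

```r
# Made-up data where every subject is on OCS, so there is nothing to reset
df <- data.frame(OCS_status = c(1, 1, 1),
                 OCS_dose   = c(12.5, 8.0, 15.2))

# Index the column, not the data frame: with zero matches this is a no-op
df$OCS_dose[df$OCS_status == 0] <- 0

# Build the 0/1 flag directly from the comparison, no "temp" placeholder needed
df$OCS_lessthan10 <- as.integer(df$OCS_dose < 10)
df
```

Creating the flag with as.integer also keeps the column numeric, whereas starting from "temp" turns it into character.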
I have a data set that looks as follows
df <- data.frame( name = c("a", "b", "c"),
judgement1= c(5, 0, NA),
judgement2= c(1, 1, NA),
judgement3= c(2, 1, NA))
I want to reshape the dataframe to look like this
# name judgement1 judgement2 judgement3
# a 1 0 0
# a 1 0 0
# a 1 0 0
# a 1 0 0
# a 1 0 0
# b 1 0 0
# b 0 1 0
# b 0 0 1
And so on. I have seen that untable is recommended in some other threads, but it does not appear to work with the current version of R. Is there a package that can convert summarised counts into individual observations?
You could try something like this:
df <- data.frame( name = c("a", "b", "c"),
judgement1= c(5, 0, NA),
judgement2= c(1, 1, NA),
judgement3= c(2, 1, NA))
rep.vec <- colSums(df[colnames(df) %in% paste0("judgement", 1:nrow(df))], na.rm = TRUE)
want <- data.frame(name=df$name, diag(nrow(df)))
colnames(want)[-1] <- paste0("judgement", 1:nrow(df))
(want <- want[rep(1:nrow(want), rep.vec), ])
I wrote a function that works to give you your desired output:
untabl <- function(df, id.col, count.cols) {
df[is.na(df)] <- 0 # replace NAs
out <- lapply(count.cols, function(x) { # for each column with counts
z <- df[rep(1:nrow(df), df[,x]), ] # replicate rows
z[, -c(id.col)] <- 0 # set all other columns to zero
z[, x] <- 1 # replace the count values with 1
z
})
out <- do.call(rbind, out) # combine the list
out <- out[order(out[,c(id.col)]),] # reorder (you can change this)
rownames(out) <- NULL # return to simple row numbers
out
}
untabl(df = df, id.col = 1, count.cols = c(2,3,4))
# name judgement1 judgement2 judgement3
#1 a 1 0 0
#2 a 1 0 0
#3 a 1 0 0
#4 a 1 0 0
#5 a 1 0 0
#6 a 0 1 0
#7 a 0 0 1
#8 a 0 0 1
#9 b 0 1 0
#10 b 0 0 1
And for your reference, reshape::untable consists of the following code:
function (df, num)
{
df[rep(1:nrow(df), num), ]
}
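The same rep-based idea can be applied cell by cell. The following is just a sketch under my own naming (`counts`, `cell`, `row_id`, `col_id`, `ind`, `long` are all illustrative): every (row, column) cell is repeated as many times as its count, and each repeat becomes one row with a single 1 in the matching judgement column.

```r
df <- data.frame(name = c("a", "b", "c"),
                 judgement1 = c(5, 0, NA),
                 judgement2 = c(1, 1, NA),
                 judgement3 = c(2, 1, NA))

counts <- as.matrix(df[-1])
counts[is.na(counts)] <- 0          # treat missing counts as zero

# Repeat each cell index as often as its count says
cell   <- rep(seq_along(counts), as.vector(counts))
row_id <- row(counts)[cell]         # which original row each observation came from
col_id <- col(counts)[cell]         # which judgement column it belongs to

# One indicator row per observation
ind <- diag(ncol(counts))[col_id, , drop = FALSE]
colnames(ind) <- colnames(counts)

long <- data.frame(name = df$name[row_id], ind)
long <- long[order(row_id), ]       # group observations by original row
rownames(long) <- NULL
long
```

Rows whose counts are all zero or NA (here "c") produce no observations and therefore drop out.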
I have two lists (one or more of the elements of these lists may itself be a list) and I am matching values between them only where the element names match. Based on the matching, I would like the values TRUE or FALSE written to a DF. I was using a loop in my code and ignoring the first column, which is an ID column. I have provided some code to reproduce the example and the desired output below; my own script is more specific to my data, but it should give you a better idea of what I'm trying to accomplish.
List1 <- list(ID = 123, L1 = 89, L2 = 2, L3 = c(2.5, 3.5, 4))
List2 <- list(income = c(1,2,3,4,5), L1 = c(0,0,0,89,0), L10 = c(3,3,3,3,3),
L2 = c(2,55,55,55,55), L3 = c(2.5, 8, 8, 4, 8))
Desired results:
DF1
income L1 L10 L2 L3
1 0 0 1 1
2 0 0 0 0
3 0 0 0 0
4 1 0 0 1
5 0 0 0 0
My code:
NBPmatrix <- function(x){
nbp <- matrix(0, nrow = nrow(black), ncol = ncol(black))
colnames(nbp) <- names(black)
index <- nbp[, match(colnames(x), colnames(nbp), nomatch = 0)]
for(i in 2:length(nbp))
nbp[index] <- grepl(black[[i]], x[i], ignore.case = TRUE)
return(nbp)
}
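For reference, the matching described above can be sketched on the small example lists as follows. This is only one possible reading of the desired output, not a corrected version of NBPmatrix; the names `cols`, `flags` and `DF1` and the exact-equality matching via %in% are assumptions.

```r
List1 <- list(ID = 123, L1 = 89, L2 = 2, L3 = c(2.5, 3.5, 4))
List2 <- list(income = c(1, 2, 3, 4, 5), L1 = c(0, 0, 0, 89, 0), L10 = c(3, 3, 3, 3, 3),
              L2 = c(2, 55, 55, 55, 55), L3 = c(2.5, 8, 8, 4, 8))

# Every element of List2 except the leading income column gets a 0/1 flag
cols <- setdiff(names(List2), "income")

flags <- lapply(cols, function(nm) {
  if (nm %in% names(List1)) {
    # 1 where the List2 value occurs among the List1 values of the same name
    as.integer(List2[[nm]] %in% List1[[nm]])
  } else {
    # no element of that name in List1: nothing can match
    integer(length(List2[[nm]]))
  }
})
names(flags) <- cols

DF1 <- data.frame(income = List2$income, flags)
DF1
```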