Possible combinations using R - r

I have edited my question and changed certain lines in my script to make clear that I want to find the number of times the output is 1 or 0.
I have 19 variables. I tried to generate all possible combinations of these 19 variables, each combination giving a binary output of 0 or 1, i.e. 2 to the power of 19 (524,288) combinations. But I couldn't display the truth table in R for all 524,288 combinations because of limited memory. Is there any way to find the number of combinations that give the output 1 and the number that give 0? Below is the script, in which I combine the inputs using the logical operators AND and OR. Kindly give me ideas or suggestions for counting how many times I get 0 or 1 as output.
n <- 19
l <- rep(list(0:1), n)
inputs <- expand.grid(l)
len <-dim(inputs)
len <-len[1]
output <- 1;
for(i in 1:len)
{
if((inputs[i,1] == 1 & inputs[i,2] == 1 & inputs[i,3] == 1 & (inputs[i,4] == 1 & inputs[i,5] == 1 | inputs[i,6] == 1 & inputs[i,7] == 0)) | (inputs[i,1] == 1 & inputs[i,2] == 1 & inputs[i,8] == 1 & inputs[i,9] == 1) | (inputs[i,1] == 1 & inputs[i,10] == 0 & inputs[i,11] == 0) |(inputs[i,1] == 1 & inputs[i,12] == 1 & inputs[i,13] == 1 & inputs[i,14] == 1) | (inputs[i,1] == 1 & inputs[i,15] == 1 & inputs[i,16] == 1) | (inputs[i,1] == 1 & inputs[i,17] == 0) | (inputs[i,1] == 1 & inputs[i,18] == 1 & inputs[i,19] == 1)){
output[i] <- 1
}
else
{
output[i] <- 0
}
}
data <- cbind(inputs, output)
write.csv(data, "data.csv", row.names=FALSE)

1048576 isn't absurdly big. If all you want are the 20 0/1 columns it takes about 80 Mb if you use integers:
x = replicate(n = 20, expr = c(0L, 1L), simplify = FALSE)
comb = do.call(expand.grid, args = x)
dim(comb)
# [1] 1048576 20
format(object.size(comb), units = "Mb")
# [1] "80 Mb"
In your question you use && a lot. && is good for comparing something of length 1. Use & for a vectorized comparison so you don't need a for loop.
For example:
y = matrix(c(1, 1, 0, 0, 1, 0, 1, 0), nrow = 4)
y[, 1] & y[, 2] # gives the truth table for & applied across columns
# no for loop needed
# R will interpret 0 as FALSE and non-zero numbers as TRUE
# so you don't even need the == 1 and == 0 parts.
It seems like you're really after the number of combinations where all the values are 1. (Or where they all have specific values.) I'm not going to give away the answer here because I suspect this is for homework, but I will say that you shouldn't need to program a single line of code to find that out. If you understand what the universe of 'all possible combinations' is, the answer will be quite clear logically.
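As a minimal sketch of the vectorized approach described above, the condition from the question can be evaluated on whole columns at once and the outputs counted with table. The helper v() is a hypothetical shorthand introduced here, and the expression is a transcription of the asker's if condition under the assumption that 1 means TRUE and 0 means FALSE.

```r
n <- 19
# Logical grid: one row per combination, one column per input
inputs <- expand.grid(rep(list(c(FALSE, TRUE)), n))

# Hypothetical helper: v(k) returns input column k as a logical vector
v <- function(k) inputs[[k]]

# Every term of the original condition requires input 1, so it is factored out
output <- v(1) & (
  (v(2) & v(3) & ((v(4) & v(5)) | (v(6) & !v(7)))) |
  (v(2) & v(8) & v(9)) |
  (!v(10) & !v(11)) |
  (v(12) & v(13) & v(14)) |
  (v(15) & v(16)) |
  !v(17) |
  (v(18) & v(19))
)

# Number of combinations giving FALSE (0) and TRUE (1)
counts <- table(output)
counts
```

No loop and no CSV file are needed if only the counts are of interest; sum(output) gives the number of 1s directly.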

I guess this is what you want:
key <- c(1,0,1,1,1,1,1,1,1,1,1,0,1,1,0,1,1,1,1,1) # based on your if condition
inputs <- expand.grid(rep(list(0:1), 20))
len <- nrow(inputs)
output <- sapply(1:len, function(i) all(inputs[i,]==key))
data <- cbind(inputs, as.numeric(output))
write.csv(data, "data.csv", row.names=FALSE)
Although, as stressed by others, key can be found only in one row out of all 1048576 rows.
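Because the key matches exactly one row, that row can also be located without comparing every row. The sketch below relies on the fact that expand.grid varies its first column fastest, so the row number follows from reading the key as a binary number with the first entry as the lowest bit. The short key is a hypothetical stand-in for the 20-entry key above.

```r
key <- c(1, 0, 1, 1)  # short hypothetical key; the answer's key has 20 entries
grid <- expand.grid(rep(list(0:1), length(key)))

# Column j flips every 2^(j - 1) rows, and row 1 is all zeros
row_index <- 1 + sum(key * 2^(seq_along(key) - 1))

# Check that the computed row really equals the key
match_found <- all(unlist(grid[row_index, ]) == key)
match_found
```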

Related

For loop with condition in R

I'm new to programming in R.
I have a list which contains numbers between 0 and 5. I want to count how many times 1 appears immediately before 5; my list is stored in results2. I have done this:
counting<-function(lista,n,m){
p=2
for (p in data_list){
if(results2[p]==n && results2[p-1]==m){
length(p)
}
p<-p+1
}
}
counting(results2,5,1)
Can anyone please provide me with some helpful advice to improve my code, since it does not work?
We loop over the list with sapply, find the indices of every 5 (which), drop an index of 1 because the first element has no predecessor, and then count how many of the elements just before those positions are 1 by building a logical vector with == and calling sum on it
sapply(data_list, function(x) {
i1 <- which(x == 5)
i2 <- i1[i1 > 1]
if(length(i2) > 0) {
sum(x[i2-1] == 1)
} else NA_real_
})
#[1] 3 3
Or in tidyverse, we can make use of lag
library(dplyr)
library(purrr)
map_dbl(data_list, ~ sum(.x == 5 & lag(.x) == 1, na.rm = TRUE))
#[1] 3 3
data
data_list <- list(c(3,4,1,5 ,2,3,1,5,4,1,5),
c(3,4,1,5 ,2,3,1,5,4,1,5))
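For a single vector, the same count can be sketched in base R by comparing the vector with a shifted copy of itself. The function name count_before is hypothetical, and the argument order follows the asker's counting(lista, n, m), i.e. it counts how often m sits immediately before n.

```r
# Count how many times m appears immediately before n in x
count_before <- function(x, n, m) {
  # head(x, -1) drops the last element, tail(x, -1) drops the first,
  # so the two vectors line up each element with its successor
  sum(head(x, -1) == m & tail(x, -1) == n)
}

x <- c(3, 4, 1, 5, 2, 3, 1, 5, 4, 1, 5)
count_before(x, n = 5, m = 1)
# [1] 3
```

For a list of vectors, sapply(data_list, count_before, n = 5, m = 1) applies it to each element.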

Trying to create a new column in a data frame using a function in R

I have a large data frame, and I would like to create a new column for the data frame in R but I am struggling.
I am a relative beginner and I would be very grateful for some help.
Essentially I am looking to create a new column of AKI stage, based on an individual's peak and baseline creatinine measurements and on whether they have been on renal-replacement therapy (RRT), according to the following criteria:
stage 1: Peak Cr/Baseline Cr = 1.5–1.9 OR Peak Cr ≥ Baseline Cr + 26.5mmol/l
stage 2: Peak Cr/Baseline Cr = 2.0–2.9
stage 3: Peak Cr/Baseline Cr ≥ 3 OR Peak Cr ≥ 353.6mmol/l OR Initiation of RRT
My data looks like this, in which I have 3 main variables.
head(data)
  Peak.Creatinine.1 baseline.Cr.within.12.months new.RRT
1               421                           82       1
2               659                           98       1
3               569                           89       1
4               533                          113       1
5               533                          212       1
6               396                           65       1
I would like to create a new column called "AKI.stage", which returns a number 0,1,2,3 or 4.
Which essentially uses this function:
akistage <- function(peak_cr, bl_cr, rrt=0) {
ratio <- peak_cr / bl_cr
if (rrt == "1"){return(3)}
else if (ratio >= 3){return(3)}
else if (peak_cr > 353.6){return(3)}
else if (ratio > 2 & ratio <3){return(2)}
else if (ratio > 1.5 & ratio <2){return(1)}
else if ((peak_cr >= bl_cr + 26.5)){return(1)}
else {return (0)}
}
The function works well when I test it on single values, but I can't seem to apply it to the data frame in order to create the new column.
I have attempted this in multiple ways, including apply, mapply, mutate, transform etc., but I just can't seem to get it to work.
Here are some of my failed attempts:
data2$Peak.Creatinine.1 <- as.numeric(data2$Peak.Creatinine.1)
data2$baseline.Cr.within.12.months <- as.numeric(data2$baseline.Cr.within.12.months)
data2$test <- apply(data2, 1, function(x){
ratio <- x[1] / x[2]
peak_cr <- x[1]
bl_cr <- x[2]
rrt <- x[3]
if (rrt == "1"){return(3)}
else if (ratio >= 3){return(3)}
else if (peak_cr > 353.6){return(3)}
else if (ratio > 2 & ratio <3){return(2)}
else if (ratio > 1.5 & ratio <2){return(1)}
else if ((peak_cr >= bl_cr + 26.5)){return(1)}
else {return (0)}
})
But this returns the following error message, despite being of class numerical:
Error in x[1]/x[2] : non-numeric argument to binary operator
Another attempt:
data2 %>%
mutate(test =
akistage(Peak.Creatinine.1,baseline.Cr.within.12.months,new.RRT))
Returns
Warning message:
In if (rrt == "1") { :
the condition has length > 1 and only the first element will be used
I have attempted it in lots of other ways, and I'm not sure why it's not working.
It does not seem very difficult to do, I would be extremely grateful if someone could come up with a solution!
Many thanks for your help!
The following vectorized function does what the question describes. It uses index vectors to assign the return values to a previously created vector AKI.stage. The stages are assigned from lowest to highest, so that a stage 3 criterion (RRT, ratio of 3 or more, or a peak above 353.6) is never overwritten by a lower stage.
akistage <- function(peak_cr, bl_cr, rrt = 0) {
  AKI.stage <- numeric(length(peak_cr))
  ratio <- peak_cr / bl_cr
  rrt1 <- rrt == 1
  # i is 1 for ratio in [0, 1.5), 2 for [1.5, 2), 3 for [2, 3), 4 for [3, Inf)
  i <- findInterval(ratio, c(0, 1.5, 2, 3, Inf))
  AKI.stage[i == 1 & peak_cr >= bl_cr + 26.5] <- 1
  AKI.stage[i == 2] <- 1
  AKI.stage[i == 3] <- 2
  AKI.stage[rrt1 | i == 4 | peak_cr > 353.6] <- 3
  AKI.stage
}
data %>%
mutate(test = akistage(Peak.Creatinine.1,baseline.Cr.within.12.months,new.RRT))
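An alternative sketch of the same staging logic uses nested ifelse calls, which some readers find easier to map onto the criteria. The name akistage_ifelse and the sample values are hypothetical, and the cut-offs are an assumption: the ratio bands are treated as inclusive at 1.5 and 2, while the 353.6 threshold is strict as in the asker's function.

```r
akistage_ifelse <- function(peak_cr, bl_cr, rrt = 0) {
  ratio <- peak_cr / bl_cr
  # Conditions are checked from the highest stage down, so the first match wins
  ifelse(rrt == 1 | ratio >= 3 | peak_cr > 353.6, 3,
    ifelse(ratio >= 2, 2,
      ifelse(ratio >= 1.5 | peak_cr >= bl_cr + 26.5, 1, 0)))
}

# Hypothetical test values covering each branch
peak <- c(421, 150, 250, 400, 100, 130)
base <- c( 82, 100, 100, 200, 100, 100)
rrt  <- c(  1,   0,   0,   0,   0,   0)
stages <- akistage_ifelse(peak, base, rrt)
stages
# [1] 3 1 2 3 0 1
```

Since ifelse works element by element, the function can be passed straight to mutate or assigned with data$AKI.stage <- akistage_ifelse(...).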
Here are a few different ways to add a new column to a data.frame using only base R:
df <- data.frame(v1 = rep(0, 100), v2 = seq(1, 100))
v3 <- rep(0, 100)
# first way with a $
df$v3 <- v3
# second way with cbind
df <- cbind(df, v3)
# third way, by column index
df[, 3] <- v3
EDIT 1
Your problem comes from the fact that your third column is a factor, so when you use apply, which first converts the data frame to a matrix, all your data is turned into character. The right way to do what you want is:
sapply(1:nrow(data2), function(i, df){
x <- df[i,]
ratio <- x[1] / x[2]
peak_cr <- x[1]
bl_cr <- x[2]
rrt <- x[3]
if (rrt == "1"){return(3)}
else if (ratio >= 3){return(3)}
else if (peak_cr > 353.6){return(3)}
else if (ratio > 2 & ratio <3){return(2)}
else if (ratio > 1.5 & ratio <2){return(1)}
else if ((peak_cr >= bl_cr + 26.5)){return(1)}
else {return (0)}
}, df = data2)
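The coercion described above can be checked on a small example. This is a sketch with a hypothetical two-row data frame: apply calls as.matrix on its input, and a matrix can hold only one type, so a single factor column turns every value into a character string.

```r
# Hypothetical data frame with two numeric columns and one factor column
df <- data.frame(peak = c(421, 150),
                 base = c(82, 100),
                 rrt  = factor(c("1", "0")))

typeof(as.matrix(df))          # "character": the factor forces everything to text
typeof(as.matrix(df[, 1:2]))   # "double": numeric columns only

# Inside apply, each row therefore arrives as a character vector
row_types <- apply(df, 1, function(x) typeof(x))

# Converting the factor to numeric first keeps the rows numeric
df$rrt <- as.numeric(as.character(df$rrt))
row_types_fixed <- apply(df, 1, function(x) typeof(x))
```

This is why x[1] / x[2] fails with "non-numeric argument to binary operator" even though the first two columns are numeric in the data frame.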

Cut elements from the beginning and end of an R vector

For time series analysis I handle data that often contains leading and trailing zero elements. In this example, there are 3 zeros at the beginning and 2 at the end. I want to get rid of these elements and keep the contents in the middle (which may also contain zeros).
vec <- c(0, 0, 0, 1, 2, 0, 3, 4, 0, 0)
I did this by looping from the beginning and end, and masking out the unwanted elements.
mask <- rep(TRUE, length(vec))
# from begin
i <- 1
while(vec[i] == 0 && i <= length(vec)) {
mask[i] <- FALSE
i <- i+1
}
# from end
i <- length(vec)
while(i >= 1 && vec[i] == 0) {
mask[i] <- FALSE
i <- i-1
}
cleanvec <- vec[mask]
cleanvec
[1] 1 2 0 3 4
This works, but I wonder if there is a more efficient way to do this, avoiding the loops.
vec[ min(which(vec != 0)) : max(which(vec != 0)) ]
Basically the which(vec != 0) part gives the positions of the numbers that are different from 0, and then you take the min and max of them.
We could use the range and Reduce to get the sequence
vec[Reduce(`:`, range(which(vec != 0)))]
#[1] 1 2 0 3 4
Take the cumsum forward and backward of abs(vec) and keep only the elements where both are > 0. If it were known that all elements of vec were non-negative, as in the question, then we could optionally omit abs.
vec[cumsum(abs(vec)) > 0 & rev(cumsum(rev(abs(vec)))) > 0]
## [1] 1 2 0 3 4
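The which-based answers above assume that the vector contains at least one non-zero element; on an all-zero vector, min and max of an empty index set give warnings and an invalid range. A minimal sketch of a wrapper that guards against this case follows (the name trim_zeros is hypothetical).

```r
# Drop leading and trailing zeros, keeping interior zeros
trim_zeros <- function(v) {
  nz <- which(v != 0)
  # No non-zero element at all: return an empty vector of the same type
  if (length(nz) == 0) return(v[0])
  v[nz[1]:nz[length(nz)]]
}

vec <- c(0, 0, 0, 1, 2, 0, 3, 4, 0, 0)
trim_zeros(vec)
# [1] 1 2 0 3 4
trim_zeros(c(0, 0))
# numeric(0)
```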

R: Remove repeated values and keep the first one in a binary vector

I would like to remove the repeated ones but keep the first in a binary vector:
x = c(0,0,1,1,0,1,0,1,1,1,0,1) # the input
y = c(0,0,1,0,1,0,1,0,1) # the desired output
i.e., one 1 and two 1's of the first and third set of 1's are removed, respectively, and the first in the set is kept.
I am trying to use rle with cumsum but have not yet figured it out. Any suggestion would be appreciated.
Using rle/inverse.rle
res <- rle(x)
res$lengths[res$values == 1] <- 1
inverse.rle(res)
## [1] 0 0 1 0 1 0 1 0 1
We can use diff:
x[c(1, diff(x)) == 1 | x == 0]
x = c(0,0,1,1,0,1,0,1,1,1,0,1)
x[!(x == 1 & #remove each value that is a 1
c(x[-1] == 1, FALSE) #followed by a 1 (never the case for the last value)
)]
#[1] 0 0 1 0 1 0 1 0 1
x = c(0,0,1,1,0,1,0,1,1,1,0,1)
x1 <- rle(x)
x1$lengths[x1$values==1] <- 1
inverse.rle(x1)
Depending on the vector size you could loop through it and use conditions for appending the value to the result. Here is a simple solution using your given input.
x <- c(0,0,1,1,0,1,0,1,1,1,0,1)
prev <- 0
y <- c()
for(i in x){
if (i == 1){
if (prev != 1){
y <- append(y,i)
}
}else{
y <- append(y,i)
}
prev <- i
}
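The loop above can also be written as a single vectorized expression that literally keeps the first 1 of each run, by dropping every 1 whose predecessor is also a 1. This is a sketch using only base R; for a binary vector it gives the same result as the other answers.

```r
x <- c(0, 0, 1, 1, 0, 1, 0, 1, 1, 1, 0, 1)

# prev_is_one[k] is TRUE when element k - 1 equals 1 (never for the first element)
prev_is_one <- c(FALSE, head(x, -1) == 1)

# Drop each 1 that directly follows another 1
y <- x[!(x == 1 & prev_is_one)]
y
# [1] 0 0 1 0 1 0 1 0 1
```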

find contiguous responses greater than x

I'm trying to find the point at which participants reach 8 contiguous responses in a row that are greater than 3. For example:
x <- c(2,2,4,4,4,4,5,5,5,5,7)
I want to return the value 10.
I tried this code (thanks @DWin):
which( rle(x)$values>3 & rle(x)$lengths >= 8)
sum(rle(x)$lengths[ 1:(min(which(rle(x)$lengths >= 8))-1) ]) + 8
The problem with the above code is that it only works if the responses are all identical and greater than 3; with the data above the code returns zero.
If:
x <- c(2,2,4,4,4,4,4,4,4,4,7)
the code works fine. But this isn't how my data looks.
Thanks in advance!
Why don't you create a new vector that contains the identical values that rle needs to work properly? You can use ifelse() for this and put everything into a function:
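FUN <- function(x, value, runlength) {
  x2 <- ifelse(x > value, 1, 0)
  r <- rle(x2)
  # first run of 1s (responses above value) that is long enough
  i <- which(r$values == 1 & r$lengths >= runlength)[1]
  if (is.na(i)) return(NA)
  ret <- sum(r$lengths[seq_len(i - 1)]) + runlength
  return(ret)
}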
> FUN(x, value = 3, runlength = 8)
[1] 10
You could recode your data so that the responses only distinguish the property of interest (greater than 3 or not); your code will then work as it is once x is replaced with x1.
x1 <- ifelse( x > 3, 4, 0 )
But if I were already doing this, I might rewrite the code slightly more clearly this way.
runl <- rle(x1)
i <- which( runl$lengths >= 8 & runl$values > 3 )[1]
sum( runl$lengths[seq_len(i - 1)] ) + 8
Here's a vectorized way of doing it with just cumsum and cummax. Let's take an example that has a short (less than length 8) sequence of elements greater than 3 as well as a long one, just to make sure it's doing the right thing.
> x <- c(2,2,4,5,6,7,2,2,4,9,8,7,6,5,4,5,6,9,2,2,9)
> x3 <- x > 3
> cumsum(x3) - cummax(cumsum(x3)*(!x3))
[1] 0 0 1 2 3 4 0 0 1 2 3 4 5 6 7 8 9 10 0 0 1
> which( cumsum(x3) - cummax(cumsum(x3)*(!x3)) == 8)[1]
[1] 16
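The cumsum and cummax idea above can be wrapped in a small function so that the threshold and the run length become parameters. This is a sketch; the name first_run_end and its arguments are hypothetical, and it returns NA when no run is long enough.

```r
# Position at which the first run of `runlength` consecutive values
# greater than `threshold` is completed
first_run_end <- function(x, threshold, runlength) {
  above <- x > threshold
  cs <- cumsum(above)
  # Subtract the count reached at the most recent failing element,
  # which leaves the length of the current run at each position
  run_pos <- cs - cummax(cs * (!above))
  which(run_pos == runlength)[1]
}

first_run_end(c(2,2,4,4,4,4,5,5,5,5,7), threshold = 3, runlength = 8)
# [1] 10
first_run_end(c(2,2,4,5,6,7,2,2,4,9,8,7,6,5,4,5,6,9,2,2,9), threshold = 3, runlength = 8)
# [1] 16
```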
