R vectorization without ifelse - r

a <- array(1:18, dim=c(3,3,2))
a1 <- which( a >= 17, arr.ind = T)
a1im <- cbind(a1[,1] - 1, a1[,c(2,3)])
#ADVDOMiM is a logical vector
I would like a better vectorization than this one:
a[a1im] <- ifelse( a[a1] >= 5 & ADVDOMiM, a[a1], a[a1im])
I would like something like this:
a[a[a1] >= 5 & ADVDOMiM] <- a[a1]
but it doesn't work (I need to change the result in a[a1im], but I don't know how to do it).

You can try to subset a1im and a1 with a[a1] >= 5 & ADVDOMiM to avoid the ifelse:
i <- a[a1] >= 5 & ADVDOMiM
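# drop = FALSE keeps the index a matrix even when only one row is selected
a[a1im[i, , drop = FALSE]] <- a[a1[i, , drop = FALSE]]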
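To check that the subsetting approach matches the ifelse version, here is a small self-contained sketch. The values of ADVDOMiM are made up for illustration (the question does not show them), and drop = FALSE is used so the index stays a matrix when a single row is selected.

```r
a <- array(1:18, dim = c(3, 3, 2))
a1 <- which(a >= 17, arr.ind = TRUE)
a1im <- cbind(a1[, 1] - 1, a1[, c(2, 3)])
ADVDOMiM <- c(TRUE, FALSE)  # hypothetical values, one per row of a1

# Reference result using ifelse, as in the question
expected <- a
expected[a1im] <- ifelse(a[a1] >= 5 & ADVDOMiM, a[a1], a[a1im])

# Same update by subsetting the index matrices with the logical vector i
i <- a[a1] >= 5 & ADVDOMiM
result <- a
result[a1im[i, , drop = FALSE]] <- result[a1[i, , drop = FALSE]]

identical(result, expected)
```

Only the rows of a1im where i is TRUE are written, so the cells that ifelse would have reassigned to themselves are simply left alone.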

Related

For loop with condition in R

I'm new to programming in R.
I have a list which contains numbers between 0 and 5. I want to count how many times a 1 appears immediately before a 5. My list is stored in results2. I have done this:
counting<-function(lista,n,m){
p=2
for (p in data_list){
if(results2[p]==n && results2[p-1]==m){
length(p)
}
p<-p+1
}
}
counting(results2,5,1)
Can anyone please provide me with some helpful advice to improve my code, since it does not work?
We loop over the list with sapply. For each element we find the positions of every 5 with which, drop a 5 in the first position because nothing precedes it, and then count how many of the values just before those positions equal 1 by creating a logical expression with == and using sum on it. If an element has no usable 5, NA is returned.
sapply(data_list, function(x) {
i1 <- which(x == 5)
i2 <- i1[i1 > 1]
if(length(i2) > 0) {
sum(x[i2-1] == 1)
} else NA_real_
})
#[1] 3 3
Or in tidyverse, we can make use of lag
library(dplyr)
library(purrr)
map_dbl(data_list, ~ sum(.x == 5 & lag(.x) == 1, na.rm = TRUE))
#[1] 3 3
data
data_list <- list(c(3,4,1,5 ,2,3,1,5,4,1,5),
c(3,4,1,5 ,2,3,1,5,4,1,5))
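The same count can also be sketched in base R without which or lag, by comparing each vector against a copy of itself shifted by one position. The helper name count_1_before_5 and the second test vector are made up for illustration.

```r
# Count positions where a 1 is immediately followed by a 5.
# head(x, -1) drops the last element, tail(x, -1) drops the first,
# so the two vectors line up each value with its successor.
count_1_before_5 <- function(x) sum(head(x, -1) == 1 & tail(x, -1) == 5)

data_list <- list(c(3, 4, 1, 5, 2, 3, 1, 5, 4, 1, 5),
                  c(5, 1, 1, 5))  # hypothetical second vector

sapply(data_list, count_1_before_5)
#[1] 3 1
```

Unlike the which-based version, this returns 0 rather than NA when no 5 has a predecessor.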

Trying to create a new column in a data frame using a function in R

I have a large data frame, and I would like to create a new column for the data frame in R but I am struggling.
I am a relative beginner and I would be very grateful for some help.
Essentially I am looking to create a new column of AKI stage, based on an individual's peak and baseline creatinine measurements, and whether they have been on renal-replacement therapy (RRT), according to the following criteria:
stage 1: Peak Cr/Baseline Cr = 1.5–1.9 OR Peak Cr ≥ Baseline Cr + 26.5 mmol/l
stage 2: Peak Cr/Baseline Cr = 2.0–2.9
stage 3: Peak Cr/Baseline Cr ≥ 3 OR Peak Cr ≥ 353.6 mmol/l OR Initiation of RRT
My data looks like this, in which I have 3 main variables.
head(data)
  Peak.Creatinine.1 baseline.Cr.within.12.months new.RRT
1               421                           82       1
2               659                           98       1
3               569                           89       1
4               533                          113       1
5               533                          212       1
6               396                           65       1
I would like to create a new column called "AKI.stage", which holds a number from 0 to 3.
It essentially uses this function:
akistage <- function(peak_cr, bl_cr, rrt=0) {
ratio <- peak_cr / bl_cr
if (rrt == "1"){return(3)}
else if (ratio >= 3){return(3)}
else if (peak_cr > 353.6){return(3)}
else if (ratio > 2 & ratio <3){return(2)}
else if (ratio > 1.5 & ratio <2){return(1)}
else if ((peak_cr >= bl_cr + 26.5)){return(1)}
else {return (0)}
}
The function works well when I test it, but I can't seem to apply it to the dataframe in order to create the new column.
I have attempted this in multiple ways including using apply,mapply,mutate,transform etc but I just can't seem to get it to work.
Here are some of my failed attempts:
data2$Peak.Creatinine.1 <- as.numeric(data2$Peak.Creatinine.1)
data2$baseline.Cr.within.12.months <- as.numeric(data2$baseline.Cr.within.12.months)
data2$test <- apply(data2, 1, function(x){
ratio <- x[1] / x[2]
peak_cr <- x[1]
bl_cr <- x[2]
rrt <- x[3]
if (rrt == "1"){return(3)}
else if (ratio >= 3){return(3)}
else if (peak_cr > 353.6){return(3)}
else if (ratio > 2 & ratio <3){return(2)}
else if (ratio > 1.5 & ratio <2){return(1)}
else if ((peak_cr >= bl_cr + 26.5)){return(1)}
else {return (0)}
})
But this returns the following error message, even though the columns are of class numeric:
Error in x[1]/x[2] : non-numeric argument to binary operator
Another attempt:
data2 %>%
mutate(test =
akistage(Peak.Creatinine.1,baseline.Cr.within.12.months,new.RRT))
Returns
Warning message:
In if (rrt == "1") { :
the condition has length > 1 and only the first element will be used
I have attempted it in lots of other ways, and I'm not sure why it's not working.
It does not seem very difficult to do, I would be extremely grateful if someone could come up with a solution!
Many thanks for your help!
The following vectorized function does what the question describes. It uses logical index vectors to assign the stage into a previously created vector AKI.stage, working from the lowest stage to the highest so that a higher stage always overrides a lower one (for example, a peak above 353.6 stays at stage 3 even when the ratio is between 2 and 3).
akistage <- function(peak_cr, bl_cr, rrt = 0) {
  AKI.stage <- numeric(length(peak_cr))
  ratio <- peak_cr / bl_cr
  # i is 1 for [0, 1.5), 2 for [1.5, 2), 3 for [2, 3), 4 for [3, Inf)
  i <- findInterval(ratio, c(0, 1.5, 2, 3, Inf))
  AKI.stage[i == 2 | peak_cr >= bl_cr + 26.5] <- 1
  AKI.stage[i == 3] <- 2
  AKI.stage[rrt == 1 | i == 4 | peak_cr > 353.6] <- 3
  AKI.stage
}
data %>%
mutate(test = akistage(Peak.Creatinine.1,baseline.Cr.within.12.months,new.RRT))
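If a vectorized rewrite is not convenient, a scalar function can still be applied row by row with mapply, which passes one element of each argument at a time. A minimal sketch: aki_scalar and the toy data frame are made up for illustration, and the thresholds follow the staging criteria as listed in the question.

```r
# Scalar staging function: expects one patient at a time
aki_scalar <- function(peak_cr, bl_cr, rrt = 0) {
  ratio <- peak_cr / bl_cr
  if (rrt == 1 || ratio >= 3 || peak_cr > 353.6) return(3)
  if (ratio >= 2) return(2)
  if (ratio >= 1.5 || peak_cr >= bl_cr + 26.5) return(1)
  0
}

# Hypothetical patients chosen to hit each branch
toy <- data.frame(
  peak = c(421, 533, 200, 150, 130, 100),
  base = c( 82, 212,  90,  90, 100,  95),
  rrt  = c(  1,   0,   0,   0,   0,   0)
)

# mapply walks the three columns in parallel and collects one stage per row
toy$AKI.stage <- mapply(aki_scalar, toy$peak, toy$base, toy$rrt)
toy$AKI.stage
#[1] 3 3 2 1 1 0
```

This avoids the "condition has length > 1" warning because each call to if sees a single value, at the cost of being slower than a vectorized function on large data.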
Here are several ways to add a new column to a data.frame using only base R:
df <- data.frame(v1 = rep(0, 100), v2 = seq(1, 100))
v3 <- rep(0, 100)
# first way with a $
df$v3 <- v3
# second way with cbind
df <- cbind(df, v3)
# third way, by column position
df[, 3] <- v3
EDIT 1
Your problem comes from the fact that your third column is a factor, so apply converts the whole data frame to a character matrix before passing each row to your function. Looping over the row indices instead keeps each column's type:
data2$test <- sapply(seq_len(nrow(data2)), function(i, df){
x <- df[i,]
# [[ ]] extracts the bare value instead of a one-cell data frame
peak_cr <- x[[1]]
bl_cr <- x[[2]]
rrt <- x[[3]]
ratio <- peak_cr / bl_cr
if (rrt == "1"){return(3)}
else if (ratio >= 3){return(3)}
else if (peak_cr > 353.6){return(3)}
else if (ratio > 2 & ratio <3){return(2)}
else if (ratio > 1.5 & ratio <2){return(1)}
else if ((peak_cr >= bl_cr + 26.5)){return(1)}
else {return (0)}
}, df = data2)

Change cell value in one raster based on another raster

I have two raster maps from two points in time (t1 and t2) with two land-cover categories in each (LC1, LC2). I want to impose a rule that an LC2 cell in t1 cannot change to an LC1 cell in t2, i.e., only LC1 can change to LC2 through time but not the other way around. I am having a hard time coming up with a rule for that in R. What I had in mind was something like this:
library(raster)
#create test rasters
r <- raster(nrows=25, ncols=25, vals=round(rnorm(625, 3), 0)) #land-use/cover raster
r[ r > 2 ] <- 2
r[ r < 1 ] <- 1
r2 <- r
plot(r2) #r2 is t2
r <- raster(nrows=25, ncols=25, vals=round(rnorm(625, 3), 0)) #land-use/cover raster
r[ r > 2 ] <- 2
r[ r < 1 ] <- 1
plot(r) #r is t1
r_fix <- overlay(r, r2, fun = function(x, y) {
if (x[ x==2 ] & y[ y==1 ]) { #1 is LC1, 2 is LC2
x[ x==2 ] <- 1 }
return(x)
})
But it returns an error (because of the way I am using the if statement with rasters?):
Error in (function (x, fun, filename = "", recycle = TRUE, forcefun = FALSE, :
cannot use this formula, probably because it is not vectorized
I wonder if there is a simple way to implement something similar to that that works with rasters? Thank you in advance.
You were really close,
overlay(r, r2, fun = function(x, y) {x[x == 2 & y == 1] <- 1; x})
seems to do the job.
In terms of your solution,
x[x == 2] <- 1
doesn't cause any errors, although it's not exactly what you want to use in your case either. However,
if (x[x == 2] & y[y == 1])
is a problem because x[x == 2] & y[y == 1] returns a logical vector with one element per selected cell, while if wants just a single logical input. Subsetting, on the other hand, can handle logical vectors of any length, which is exactly what is happening in x[x == 2 & y == 1].
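The masked assignment can be tried on plain vectors without the raster package. This is only a sketch: the cell values are made up, and it assumes that fun receives the cell values of each layer as ordinary vectors.

```r
# Hypothetical cell values for the two time points (1 = LC1, 2 = LC2)
t1 <- c(2, 1, 2, 1, 2)
t2 <- c(1, 1, 2, 2, 1)

# Same function body as passed to overlay above
fix_fun <- function(x, y) {
  x[x == 2 & y == 1] <- 1  # one logical mask, evaluated cell by cell
  x
}

fix_fun(t1, t2)
#[1] 1 1 2 1 1
```

Because x == 2 & y == 1 has the same length as x, it works as a mask, and no if is needed.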

Repeated conditional change with sapply or a loop in R

I am trying to do a conditional change for a list of 11 columns in R. My condition is always the same: survey$only0 == 1. I wrote the following code:
survey$w.house[survey$only0 == 1] <- 1
survey$w.inc[survey$only0 == 1] <- 1
survey$w.jobs[survey$only0 == 1] <- 1
survey$w.com[survey$only0 == 1] <- 1
survey$w.edu[survey$only0 == 1] <- 1
survey$w.env[survey$only0 == 1] <- 1
survey$w.health[survey$only0 == 1] <- 1
survey$w.satisf[survey$only0 == 1] <- 1
survey$w.safe[survey$only0 == 1] <- 1
survey$w.bal[survey$only0 == 1] <- 1
survey$w.civic[survey$only0 == 1] <- 1
My code works well, but I would like to shorten it using a loop or a function such as sapply or lapply. Does anyone know how to do it?
Thank you for your help !
David
We can do this easily with lapply by looping through the columns of interest ('nm1') and replacing their values with 1 where 'only0' is 1.
survey[nm1] <- lapply(survey[nm1], function(x) replace(x, survey$only0==1, 1))
Or, as @Vlo mentioned, the anonymous function call is not needed:
survey[nm1] <- lapply(survey[nm1], replace, list = survey$only0==1, values=1)
where
nm1 <- c("w.house", "w.inc", "w.jobs", "w.com", "w.edu", "w.env",
"w.health", "w.satisf", "w.safe", "w.bal", "w.civic")
You can try,
survey[survey$only0 == 1, cols] <- 1
where cols are the columns for which you want to check the condition.
cols <- c("w.house", "w.inc", "w.jobs", "w.com", "w.edu", "w.env",
"w.health", "w.satisf", "w.safe", "w.bal", "w.civic")

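Both answers can be compared on a small made-up data frame. The toy survey below has only two of the eleven columns and is purely illustrative.

```r
# Hypothetical miniature version of the survey data
survey <- data.frame(only0   = c(1, 0, 1, 0),
                     w.house = c(5, 6, 7, 8),
                     w.inc   = c(2, 3, 4, 5))
cols <- c("w.house", "w.inc")

# Approach 1: lapply + replace, column by column
by_lapply <- survey
by_lapply[cols] <- lapply(by_lapply[cols],
                          function(x) replace(x, by_lapply$only0 == 1, 1))

# Approach 2: one assignment using row and column indices
by_index <- survey
by_index[by_index$only0 == 1, cols] <- 1

all.equal(by_lapply, by_index)
by_index$w.house
#[1] 1 6 1 8
```

Note that if only0 contained NA values, the row-index form would need which(survey$only0 == 1) to avoid an error about missing values in the subscript.
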
Possible combinations using R

I have edited my question and changed certain lines in my script, to make it clear that I want to find the number of times I get the output 1 or 0.
I have 19 variables. I tried to enumerate all possible combinations of these 19 variables, each giving a binary output of 0 or 1, i.e. 2 to the power of 19 (524,288) combinations. But I couldn't display the truth table in R for all 524,288 combinations because of limited memory. Is there any way to find the number of combinations that give the output 1 and the number that give 0? Below is the script, in which the output is defined from the inputs using logical AND and OR. Kindly give me ideas or suggestions for finding how many times I get 0 or 1 as output.
n <- 19
l <- rep(list(0:1), n)
inputs <- expand.grid(l)
len <-dim(inputs)
len <-len[1]
output <- 1;
for(i in 1:len)
{
if((inputs[i,1] == 1 & inputs[i,2] == 1 & inputs[i,3] == 1 & (inputs[i,4] == 1 & inputs[i,5] == 1 | inputs[i,6] == 1 & inputs[i,7] == 0)) | (inputs[i,1] == 1 & inputs[i,2] == 1 & inputs[i,8] == 1 & inputs[i,9] == 1) | (inputs[i,1] == 1 & inputs[i,10] == 0 & inputs[i,11] == 0) |(inputs[i,1] == 1 & inputs[i,12] == 1 & inputs[i,13] == 1 & inputs[i,14] == 1) | (inputs[i,1] == 1 & inputs[i,15] == 1 & inputs[i,16] == 1) | (inputs[i,1] == 1 & inputs[i,17] == 0) | (inputs[i,1] == 1 & inputs[i,18] == 1 & inputs[i,19] == 1)){
output[i] <- 1
}
else
{
output[i] <- 0
}
}
data <- cbind(inputs, output)
write.csv(data, "data.csv", row.names=FALSE)
1048576 isn't absurdly big. If all you want are the 20 0/1 columns it takes about 80 Mb if you use integers:
x = replicate(n = 20, expr = c(0L, 1L), simplify = FALSE)
comb = do.call(expand.grid, args = x)
dim(comb)
# [1] 1048576 20
format(object.size(comb), units = "Mb")
# [1] "80 Mb"
In your question you use && a lot. && is good for comparing something of length 1. Use & for a vectorized comparison so you don't need a for loop.
For example:
y = matrix(c(1, 1, 0, 0, 1, 0, 1, 0), nrow = 4)
y[, 1] & y[, 2] # gives the truth table for & applied across columns
# no for loop needed
# R will interpret 0 as FALSE and non-zero numbers as TRUE
# so you don't even need the == 1 and == 0 parts.
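Applied to the condition in the question, the vectorized & and | let the whole truth table be evaluated in one expression. This is a sketch under the assumption that the if condition in the question is exactly what is wanted. Since every term there starts with inputs[i,1] == 1, that test is factored out. The name x is made up here.

```r
n <- 19
inputs <- expand.grid(rep(list(0:1), n))  # 2^19 rows, one column per variable
x <- as.matrix(inputs) == 1               # logical matrix: TRUE where input is 1

# Whole-column version of the question's if condition
output <- x[, 1] & (
  (x[, 2] & x[, 3] & ((x[, 4] & x[, 5]) | (x[, 6] & !x[, 7]))) |
  (x[, 2] & x[, 8] & x[, 9]) |
  (!x[, 10] & !x[, 11]) |
  (x[, 12] & x[, 13] & x[, 14]) |
  (x[, 15] & x[, 16]) |
  !x[, 17] |
  (x[, 18] & x[, 19])
)

counts <- table(output)  # number of combinations giving FALSE (0) and TRUE (1)
counts
```

table(output) gives the two counts directly, so the full truth table never has to be printed or written to disk.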
It seems like you're really after the number of combinations where all the values are 1. (Or where they all have specific values.) I'm not going to give away the answer here because I suspect this is for homework, but I will say that you shouldn't need to program a single line of code to find that out. If you understand what the universe of 'all possible combinations' is, the answer will be quite clear logically.
I guess this is what you want:
key <- c(1,0,1,1,1,1,1,1,1,1,1,0,1,1,0,1,1,1,1,1) # based on your if condition
inputs <- expand.grid(rep(list(0:1), 20))
len <- nrow(inputs)
output <- sapply(1:len, function(i) all(inputs[i,]==key))
data <- cbind(inputs, as.numeric(output))
write.csv(data, "data.csv", row.names=FALSE)
Although, as stressed by others, key can be found only in one row out of all 1048576 rows.
