How to properly avoid if-expressions by using vector indices in R?

x is a vector of integers ranging from 1 to 100.
I created a function that determines which category a number falls into:
x∈[1,20]: small
x∈[21,50]: med
x∈[51,100]: large
Here is the function:
x <- c(1:99)
vector.fun <- function(x) {
  x[x >= 1 & x <= 20] <- "small"
  x[x >= 21 & x <= 50] <- "med"
  x[x >= 51 & x <= 99] <- "large"
  return(x)
}
vector.fun(89)
However, as you can see, the function only covers 1:99 instead of 1:100. For some reason, when I change it to:
x <- c(1:100)
vector.fun <- function(x) {
  x[x >= 1 & x <= 20] <- "small"
  x[x >= 21 & x <= 50] <- "med"
  x[x >= 51 & x <= 100] <- "large"
  return(x)
}
vector.fun(100)
it doesn't recognise any number matched by the last line, x[x >= 51 & x <= 100] <- "large", and when it does match, it returns "med" instead of "large" as it should.
What am I doing wrong? What should I change in my function so that 100 is included in the range and returns "large"?

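It is indeed a coercion problem, as mentioned in the comments: the first assignment of "small" turns x into a character vector, so the later comparisons are made on strings rather than on numbers.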
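A minimal sketch of that coercion, using a made-up three-element vector (the values are assumptions chosen for illustration, not taken from the question):

```r
x <- c(5, 89, 100)               # made-up values for illustration
x[x >= 1 & x <= 20] <- "small"   # assigning a string coerces the whole vector
is.character(x)                  # TRUE: x is now c("small", "89", "100")

# The number on the right is converted to "100", and strings are compared
# character by character, so "89" sorts after "100".
x[2] <= 100                      # FALSE
89 <= 100                        # TRUE for the numeric comparison
```

This is why the 1:99 version appeared to work: every two-digit string still compares as expected against "99", but not against "100".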
If you want to keep your function structure the way you created it, you can alter it as follows:
vector.fun <- function(y) {
  x <- y
  x[y >= 1 & y <= 20] <- "small"
  x[y >= 21 & y <= 50] <- "med"
  x[y >= 51 & y <= 100] <- "large"
  return(x)
}
Although the solution suggested by @alexis_laz is more concise and elegant:
vector.fun <- function(x) {
  cut(x, c(0, 20, 50, 100), labels = c("small", "med", "large"))
}
Keep in mind, this second version will produce a factor type vector, while the first version will produce a character type vector.
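As a small sketch of that difference (the input values below are made up for illustration), the factor returned by cut can be converted with as.character if a character vector is needed:

```r
vector.fun <- function(x) {
  # Intervals are (0, 20], (20, 50], (50, 100] because cut() closes on the right by default
  cut(x, c(0, 20, 50, 100), labels = c("small", "med", "large"))
}

res <- vector.fun(c(1, 20, 21, 50, 51, 100))  # made-up inputs at the interval edges
is.factor(res)                                # TRUE
res_chr <- as.character(res)                  # plain character vector
res_chr
```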

Related

How to select values from a list within a range using while and if?

I have a list of values and I need to select the values from this list that are greater than or equal to 3 and less than or equal to 4, but I don't know how to do so using while and if. Could anyone give me a clue on how to solve this?
If I understood what you have in mind correctly, you can use the following solution. Imagine we have a vector of length 20 called vec:
We first create an empty vector out to store the result from each iteration (if any).
Then we set the iterator i to its first value (here, 1).
A while loop begins by testing a condition (i <= length(vec)); if it holds, the loop executes the body (our if clause, which appends any value that meets our requirement (>= 3 & <= 4) to out). It then adds one to the iterator and evaluates the condition again, and so forth.
vec <- sample(1:10, size = 20, replace = TRUE)
out <- c()
i <- 1
while (i <= length(vec)) {
  if (vec[i] <= 4 && vec[i] >= 3) {
    out <- c(out, vec[i])
  }
  i <- i + 1
}
out
[1] 4 3
Actually, you don't need any while/if statements; you can simply use
x[x >= 3 & x <= 4]
If you have to use while and if, below is one option
k <- 1
res <- c()
while (k <= length(x)) {  # <= so that the last element is also checked
  if (x[k] >= 3 && x[k] <= 4) {
    res <- append(res, x[k])
  }
  k <- k + 1
}
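As a quick check, a sketch on a made-up vector showing that the loop and the vectorised subsetting select the same values. The loop condition uses <= so that the last element is visited too:

```r
x <- c(1, 3, 4, 7, 3.5, 4)   # made-up values; the last element is in range

# Loop version
res <- c()
k <- 1
while (k <= length(x)) {
  if (x[k] >= 3 && x[k] <= 4) {
    res <- c(res, x[k])
  }
  k <- k + 1
}

# Vectorised version: a logical index picks the matching elements in one step
vectorised <- x[x >= 3 & x <= 4]

identical(res, vectorised)
```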

Trying to create a new column in a data frame using a function in R

I have a large data frame, and I would like to create a new column for the data frame in R but I am struggling.
I am a relative beginner and I would be very grateful for some help.
Essentially I am looking to create a new column of AKI stage, based on an individual's peak and baseline creatinine measurements, and whether they have been on renal-replacement therapy (RRT), according to the following criteria:
stage 1: Peak Cr/Baseline Cr = 1.5–1.9 OR Peak Cr ≥ Baseline Cr + 26.5 µmol/l
stage 2: Peak Cr/Baseline Cr = 2.0–2.9
stage 3: Peak Cr/Baseline Cr ≥ 3 OR Peak Cr ≥ 353.6 µmol/l OR initiation of RRT
My data looks like this, in which I have 3 main variables.
head(data)
  Peak.Creatinine.1 baseline.Cr.within.12.months new.RRT
1               421                           82       1
2               659                           98       1
3               569                           89       1
4               533                          113       1
5               533                          212       1
6               396                           65       1
I would like to create a new column called "AKI.stage", which returns a number: 0, 1, 2, or 3.
It essentially uses this function:
akistage <- function(peak_cr, bl_cr, rrt = 0) {
  ratio <- peak_cr / bl_cr
  if (rrt == "1") {return(3)}
  else if (ratio >= 3) {return(3)}
  else if (peak_cr > 353.6) {return(3)}
  else if (ratio > 2 & ratio < 3) {return(2)}
  else if (ratio > 1.5 & ratio < 2) {return(1)}
  else if ((peak_cr >= bl_cr + 26.5)) {return(1)}
  else {return(0)}
}
The function works well when I test it, but I can't seem to apply it to the dataframe in order to create the new column.
I have attempted this in multiple ways, including using apply, mapply, mutate, transform, etc., but I just can't seem to get it to work.
Here are some of my failed attempts:
data2$Peak.Creatinine.1 <- as.numeric(data2$Peak.Creatinine.1)
data2$baseline.Cr.within.12.months <- as.numeric(data2$baseline.Cr.within.12.months)
data2$test <- apply(data2, 1, function(x) {
  ratio <- x[1] / x[2]
  peak_cr <- x[1]
  bl_cr <- x[2]
  rrt <- x[3]
  if (rrt == "1") {return(3)}
  else if (ratio >= 3) {return(3)}
  else if (peak_cr > 353.6) {return(3)}
  else if (ratio > 2 & ratio < 3) {return(2)}
  else if (ratio > 1.5 & ratio < 2) {return(1)}
  else if ((peak_cr >= bl_cr + 26.5)) {return(1)}
  else {return(0)}
})
But this returns the following error message, despite the columns being of class numeric:
Error in x[1]/x[2] : non-numeric argument to binary operator
Another attempt:
data2 %>%
  mutate(test = akistage(Peak.Creatinine.1, baseline.Cr.within.12.months, new.RRT))
Returns
Warning message:
In if (rrt == "1") { :
the condition has length > 1 and only the first element will be used
I have attempted it in lots of other ways, and I'm not sure why it's not working.
It does not seem very difficult to do, I would be extremely grateful if someone could come up with a solution!
Many thanks for your help!
The following vectorized function does what the question describes. It uses logical index vectors to assign the return values to a previously created vector AKI.stage. The stages are assigned from lowest to highest, so that a stage 3 criterion always takes precedence over a lower stage.
akistage <- function(peak_cr, bl_cr, rrt = 0) {
  AKI.stage <- numeric(length(peak_cr))
  ratio <- peak_cr / bl_cr
  rrt1 <- rrt == 1
  # i is 1 for ratio in [0, 1.5), 2 for [1.5, 2), 3 for [2, 3), 4 for [3, Inf)
  i <- findInterval(ratio, c(0, 1.5, 2, 3, Inf))
  AKI.stage[i == 2 | peak_cr >= bl_cr + 26.5] <- 1
  AKI.stage[i == 3] <- 2
  AKI.stage[rrt1 | i == 4 | peak_cr > 353.6] <- 3
  AKI.stage
}
library(dplyr)
data %>%
  mutate(test = akistage(Peak.Creatinine.1, baseline.Cr.within.12.months, new.RRT))
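If you would rather keep a scalar if/else function, one possible sketch is to apply it row by row with mapply, which passes one element of each column per call. The function name akistage_scalar, the data frame toy, and all its values are made up for illustration, and the boundaries use >= to follow the criteria as stated in the question:

```r
# Hypothetical scalar version: expects one value per argument
akistage_scalar <- function(peak_cr, bl_cr, rrt = 0) {
  ratio <- peak_cr / bl_cr
  if (rrt == 1 || ratio >= 3 || peak_cr > 353.6) return(3)
  if (ratio >= 2) return(2)
  if (ratio >= 1.5 || peak_cr >= bl_cr + 26.5) return(1)
  0
}

# Made-up data with the same column names as in the question
toy <- data.frame(
  Peak.Creatinine.1            = c(421, 400, 250, 160, 110, 130),
  baseline.Cr.within.12.months = c(82, 200, 100, 100, 100, 100),
  new.RRT                      = c(1, 0, 0, 0, 0, 0)
)

# mapply calls the scalar function once per row
toy$AKI.stage <- mapply(akistage_scalar,
                        toy$Peak.Creatinine.1,
                        toy$baseline.Cr.within.12.months,
                        toy$new.RRT)
toy$AKI.stage
```

This is slower than a fully vectorized function on large data, but it avoids the "condition has length > 1" problem because each call to if sees a single value.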
Here are several ways to add a new column to a data.frame using only base R. They are alternatives: running more than one of them on the same df adds or overwrites the column again.
df <- data.frame(v1 = rep(0, 100), v2 = seq(1, 100))
v3 <- rep(0, 100)
# first way: with $
df$v3 <- v3
# second way: with cbind
df <- cbind(df, v3)
# third way: by column position
df[, 3] <- v3
EDIT 1
Your problem comes from the fact that your third column is a factor, so when you use apply, which first converts the data frame to a matrix, all your data is turned into character. The right way to do what you want is:
data2$test <- sapply(seq_len(nrow(data2)), function(i, df) {
  x <- df[i, ]
  peak_cr <- x[[1]]
  bl_cr <- x[[2]]
  rrt <- x[[3]]
  ratio <- peak_cr / bl_cr
  if (rrt == "1") {return(3)}
  else if (ratio >= 3) {return(3)}
  else if (peak_cr > 353.6) {return(3)}
  else if (ratio > 2 & ratio < 3) {return(2)}
  else if (ratio > 1.5 & ratio < 2) {return(1)}
  else if ((peak_cr >= bl_cr + 26.5)) {return(1)}
  else {return(0)}
}, df = data2)

Change cell value in one raster based on another raster

I have two raster maps from two points in time (t1 and t2) with two land-cover categories in each (LC1, LC2). I want to impose a rule that an LC2 cell in t1 cannot change to an LC1 cell in t2, i.e., only LC1 can change to LC2 through time but not the other way around. I am having a hard time coming up with a rule for that in R. What I had in mind was something like this:
library(raster)
#create test rasters
r <- raster(nrows = 25, ncols = 25, vals = round(rnorm(625, 3), 0)) #land-use/cover raster
r[r > 2] <- 2
r[r < 1] <- 1
r2 <- r
plot(r2) #r2 is t2
r <- raster(nrows = 25, ncols = 25, vals = round(rnorm(625, 3), 0)) #land-use/cover raster
r[r > 2] <- 2
r[r < 1] <- 1
plot(r) #r is t1
r_fix <- overlay(r, r2, fun = function(x, y) {
  if (x[x == 2] & y[y == 1]) { #1 is LC1, 2 is LC2
    x[x == 2] <- 1
  }
  return(x)
})
But it returns an error (because of the way I am using the if statement with rasters?):
Error in (function (x, fun, filename = "", recycle = TRUE, forcefun = FALSE, :
cannot use this formula, probably because it is not vectorized
I wonder if there is a simple way to implement something similar to that that works with rasters? Thank you in advance.
You were really close,
overlay(r, r2, fun = function(x, y) {x[x == 2 & y == 1] <- 1; x})
seems to do the job.
In terms of your solution,
x[x == 2] <- 1
doesn't cause any errors, although it's not exactly what you want to use in your case either. However,
if (x[x == 2] & y[y == 1])
is a problem because x[x == 2] & y[y == 1] returns a logical vector with many elements, while if expects a single logical value. Subsetting, on the other hand, can handle logical vectors of any length, which is exactly what is happening in x[x == 2 & y == 1].
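The same masking idea can be sketched on plain vectors, without the raster package. The cell values below are made up for illustration; x stands for the values handed in from r and y for those from r2:

```r
x <- c(1, 2, 2, 1)   # made-up cell values from the first raster
y <- c(1, 1, 2, 2)   # made-up cell values from the second raster

mask <- x == 2 & y == 1   # element-wise test, one TRUE/FALSE per cell
x_fixed <- x
x_fixed[mask] <- 1        # only the cells where both conditions hold are changed
x_fixed
```

Because the test is element-wise, no if statement is needed, and the function stays vectorized as overlay requires.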

Why are these functions different?

I am not sure why I get different results from these functions.
change_it1 <- function(x) {
x[x == 5] <- -10
}
change_it2 <- function(x) {
x[x == 5] <- -10
x
}
x <- 1:5
x <- change_it1(x)
x
x <- 1:5
x <- change_it2(x)
x
Why do the two functions not change x in the same way as the following?
x[x==5] <- -10
The assignment operator <- is really a function that has the side effect of changing a variable's value. But as a function, it also invisibly returns the value that was used on the right-hand side of the assignment. We can force the invisible value to be seen with print(). For example
x <- 1:2
print(names(x) <- c("a","b"))
# [1] "a" "b"
or again with subsetting
print(x[1] <- 10)
# [1] 10
print(x[2] <- 20)
# [1] 20
x
# a b
# 10 20
Notice that in each case the assignment returned the right-hand-side value and not the updated value of x. A function returns the value of the last expression it evaluates. In the first case, you are returning the value returned by the assignment (which is just the value -10) and in the second case you are explicitly returning the updated x value.
The functions both change x in the same way (at least in the scope of the function), but you are just not returning the updated x value in both cases.
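A short sketch that makes the two return values visible, reusing the two functions from the question:

```r
change_it1 <- function(x) {
  x[x == 5] <- -10   # last expression is the assignment, so -10 is returned (invisibly)
}
change_it2 <- function(x) {
  x[x == 5] <- -10
  x                  # last expression is x, so the modified vector is returned
}

r1 <- change_it1(1:5)
r2 <- change_it2(1:5)
r1   # just the right-hand side of the assignment
r2   # the updated vector
```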

Recoding Numeric Vector R

I have a numeric vector, let's say something like:
x <- rep(1:6, 300)
What I would like to do is recode the vector in place, such that 6=1, 5=2, 4=3, 3=4, 2=5, 1=6. I don't want to create a factor out of it.
Everything I have tried so far gives me the wrong counts because of the order, i.e.:
x[x == 6] <- 1
x[x == 5] <- 2 ## The lines that follow, such as x[x == 2] <- 5, then recode these values again and distort the counts.
Note: I'm aware of the car package, but would prefer to use base R for this problem.
Construct a map between the old and new values, and subset with the old,
(6:1)[x]
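A sketch of why this works: position i of the lookup vector holds the new value for old value i, so indexing with x translates every element in one step and no value is recoded twice. The name lookup is introduced here for illustration only:

```r
x <- rep(1:6, 300)

lookup <- 6:1          # lookup[1] is 6, lookup[2] is 5, ..., lookup[6] is 1
recoded <- lookup[x]   # each old value picks its replacement by position

# Every value still occurs 300 times, and the result matches 7 - x
table(recoded)
identical(recoded, 7L - x)
```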
Wouldn't something as simple as 7 - x give you what you are after?
See manual for car::recode. Otherwise, create variable y:
y <- numeric()
length(y) <- length(x)
y[x == 6] <- 1
y[x == 5] <- 2
## ad nauseam...
It's always considered a bad practice to recode variables in place, because if you mess things up, you're probably going to lose data. Be careful.
In your case, yes, just subtract. In general, match can be quite useful in cases like this. For example, suppose you wanted to recode the values in this x column to the values in the y column
> d <- data.frame(x = c(1, 3, 4, 5, 6), y = c(3, 4, 2.2, 1, 4.6))
> print(d, row.names = FALSE)
 x   y
 1 3.0
 3 4.0
 4 2.2
 5 1.0
 6 4.6
Then this would recode the values in a to the new values.
> a <- c(3,4,6,1,5)
> d$y[match(a,d$x)]
[1] 4.0 2.2 4.6 3.0 1.0
rev(x) ... at least when the length is an even multiple of the sequence.
If you want to recode multiple variables, you might take the following approach:
MapFunc <- function(x) {
  y <- NULL
  if (x %in% c("1", "2", "3")) {y <- 100}
  if (x %in% c("0", "4")) {y <- 200}
  if (x %in% c("5")) {y <- 100}
  y  # return the recoded value
}
MapFunc(x = 1); MapFunc(x = 0)  # works for scalars
#
X <- matrix(sample(0:5, 25, replace = TRUE), nrow = 5, ncol = 5)
apply(X, c(1, 2), MapFunc)  # works for matrices, element by element
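A possible vectorized alternative, sketched with a named lookup vector instead of per-element if statements. The name lookup and the small matrix are made up for illustration; the mapping is the same one MapFunc uses:

```r
# Names are the old values (as strings), entries are the new values
lookup <- c("0" = 200, "1" = 100, "2" = 100, "3" = 100, "4" = 200, "5" = 100)

X <- matrix(c(0, 1, 2, 3, 4, 5), nrow = 2)   # made-up input matrix

recoded <- X
# Indexing by name translates all cells at once; X[] keeps the matrix shape
recoded[] <- lookup[as.character(X)]
recoded
```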
