I have two raster maps from two points in time (t1 and t2), each with two land-cover categories (LC1, LC2). I want to impose a rule that an LC2 cell in t1 cannot change to an LC1 cell in t2, i.e., only LC1 can change to LC2 through time, not the other way around. I am having a hard time coming up with a rule for that in R. What I had in mind was something like this:
library(raster)
#create test rasters
r <- raster(nrows=25, ncols=25, vals=round(rnorm(625, 3), 0)) #land-use/cover raster
r[ r > 2 ] <- 2
r[ r < 1 ] <- 1
r2 <- r
plot(r2) #r2 is t2
r <- raster(nrows=25, ncols=25, vals=round(rnorm(625, 3), 0)) #land-use/cover raster
r[ r > 2 ] <- 2
r[ r < 1 ] <- 1
plot(r) #r is t1
r_fix <- overlay(r, r2, fun = function(x, y) {
  if (x[ x==2 ] & y[ y==1 ]) { #1 is LC1, 2 is LC2
    x[ x==2 ] <- 1 }
  return(x)
})
But it returns an error (because of the way I am using the if statement with rasters?):
Error in (function (x, fun, filename = "", recycle = TRUE, forcefun = FALSE, :
cannot use this formula, probably because it is not vectorized
Is there a simple way to implement something similar that works with rasters? Thank you in advance.
You were really close,
overlay(r, r2, fun = function(x, y) {x[x == 2 & y == 1] <- 1; x})
seems to do the job.
In terms of your solution,
x[x == 2] <- 1
doesn't cause any errors, although it's not exactly what you want to use in your case either. However,
if (x[x == 2] & y[y == 1])
is a problem because x[x == 2] & y[y == 1] returns a logical vector, usually with many elements, while if expects a single logical value. Subsetting, on the other hand, accepts a logical vector of any length, which is exactly what happens in x[x == 2 & y == 1].
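To see the difference without any rasters, here is a minimal sketch with plain vectors standing in for the cell values that overlay passes to fun. The vectors x and y are made-up example values, not the rasters from the question.

```r
# Plain vectors stand in for the cell values of the two rasters (hypothetical data)
x <- c(1, 2, 2, 1, 2)   # t1: 1 is LC1, 2 is LC2
y <- c(2, 1, 2, 1, 1)   # t2

cond <- x == 2 & y == 1  # one TRUE/FALSE per cell
length(cond)             # 5, so if (cond) would not receive a single value

x_fix <- x
x_fix[cond] <- 1         # logical subsetting changes only the matching cells
x_fix                    # 1 1 2 1 1
```

The same condition that fails inside if works as an index, because indexing is applied cell by cell.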
I have a large data frame, and I would like to create a new column for the data frame in R but I am struggling.
I am a relative beginner and I would be very grateful for some help.
Essentially I am looking to create a new column of AKI stage, based on an individual's peak and baseline creatinine measurements, and whether they have been on renal-replacement therapy (RRT), according to the following criteria:
stage 1: Peak Cr/Baseline Cr = 1.5–1.9 OR Peak Cr ≥ Baseline Cr + 26.5 µmol/l
stage 2: Peak Cr/Baseline Cr = 2.0–2.9
stage 3: Peak Cr/Baseline Cr ≥ 3 OR Peak Cr ≥ 353.6 µmol/l OR initiation of RRT
My data looks like this, in which I have 3 main variables.
head(data)
Peak.Creatinine.1 baseline.Cr.within.12.months new.RRT
1 421 82 1
2 659 98 1
3 569 89 1
4 533 113 1
5 533 212 1
6 396 65 1
I would like to create a new column called "AKI.stage", which holds a number from 0 to 3.
It essentially uses this function:
akistage <- function(peak_cr, bl_cr, rrt=0) {
  ratio <- peak_cr / bl_cr
  if (rrt == "1"){return(3)}
  else if (ratio >= 3){return(3)}
  else if (peak_cr > 353.6){return(3)}
  else if (ratio > 2 & ratio <3){return(2)}
  else if (ratio > 1.5 & ratio <2){return(1)}
  else if ((peak_cr >= bl_cr + 26.5)){return(1)}
  else {return (0)}
}
The function works well when I test it, but I can't seem to apply it to the dataframe in order to create the new column.
I have attempted this in multiple ways, including apply, mapply, mutate and transform, but I just can't seem to get it to work.
Here are some of my failed attempts:
data2$Peak.Creatinine.1 <- as.numeric(data2$Peak.Creatinine.1)
data2$baseline.Cr.within.12.months <- as.numeric(data2$baseline.Cr.within.12.months)
data2$test <- apply(data2, 1, function(x){
  ratio <- x[1] / x[2]
  peak_cr <- x[1]
  bl_cr <- x[2]
  rrt <- x[3]
  if (rrt == "1"){return(3)}
  else if (ratio >= 3){return(3)}
  else if (peak_cr > 353.6){return(3)}
  else if (ratio > 2 & ratio <3){return(2)}
  else if (ratio > 1.5 & ratio <2){return(1)}
  else if ((peak_cr >= bl_cr + 26.5)){return(1)}
  else {return (0)}
})
But this returns the following error message, even though both columns are of class numeric:
Error in x[1]/x[2] : non-numeric argument to binary operator
Another attempt:
data2 %>%
  mutate(test =
    akistage(Peak.Creatinine.1,baseline.Cr.within.12.months,new.RRT))
Returns
Warning message:
In if (rrt == "1") { :
the condition has length > 1 and only the first element will be used
I have attempted it in lots of other ways, and I'm not sure why it's not working.
It does not seem very difficult to do, so I would be extremely grateful if someone could come up with a solution!
Many thanks for your help!
The following vectorized function does what the question describes. It uses index vectors to assign the return values to a previously created vector AKI.stage.
akistage <- function(peak_cr, bl_cr, rrt = 0) {
  AKI.stage <- numeric(length(peak_cr))
  ratio <- peak_cr / bl_cr
  rrt1 <- rrt == 1
  i <- findInterval(ratio, c(0, 1.5, 2, 3, Inf))
  # assign from the lowest to the highest stage so that stage 3 always wins,
  # e.g. when peak_cr > 353.6 but the ratio is below 3
  AKI.stage[i == 1 & peak_cr >= bl_cr + 26.5] <- 1
  AKI.stage[i == 2] <- 1
  AKI.stage[i == 3] <- 2
  AKI.stage[rrt1 | i >= 4 | peak_cr > 353.6] <- 3
  AKI.stage
}
library(dplyr)
data %>%
  mutate(test = akistage(Peak.Creatinine.1, baseline.Cr.within.12.months, new.RRT))
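If you would rather keep an if/else function that handles one patient at a time, mapply can call it once per row. The following is only a sketch: akistage_scalar is a hypothetical rewrite of the function from the question with inclusive lower bounds (an assumption based on the stated criteria), and toy is made-up data.

```r
# Scalar version: handles one patient at a time (hypothetical helper)
akistage_scalar <- function(peak_cr, bl_cr, rrt = 0) {
  ratio <- peak_cr / bl_cr
  if (rrt == 1) return(3)
  if (ratio >= 3) return(3)
  if (peak_cr > 353.6) return(3)
  if (ratio >= 2) return(2)
  if (ratio >= 1.5) return(1)
  if (peak_cr >= bl_cr + 26.5) return(1)
  0
}

# Made-up example data, one row per patient
toy <- data.frame(peak = c(421, 120, 150, 100, 90),
                  base = c(82, 50, 90, 70, 85),
                  rrt  = c(1, 0, 0, 0, 0))

# mapply passes the i-th element of each column to the function
toy$AKI.stage <- mapply(akistage_scalar, toy$peak, toy$base, toy$rrt)
toy$AKI.stage   # 3 2 1 1 0
```

This is slower than the vectorized version on a large data frame, but it avoids the "condition has length > 1" warning because each call sees single values.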
Here are several ways to add a new column to a data.frame using only base R:
df <- data.frame(v1 = rep(0, 100), v2 = seq(1, 100))
v3 <- rep(0, 100)
# the three ways below are alternatives; use only one of them
# first way with a $
df$v3 <- v3
# second way with cbind
df <- cbind(df, v3)
# third way, indexing the column by name
df[, "v3"] <- v3
EDIT 1
Your problem comes from the fact that your third column is a factor: apply first converts the data frame to a matrix, and with a non-numeric column that matrix is character, so x[1] / x[2] fails. One way to do what you want is:
sapply(1:nrow(data2), function(i, df){
  x <- df[i,]
  ratio <- x[1] / x[2]
  peak_cr <- x[1]
  bl_cr <- x[2]
  rrt <- x[3]
  if (rrt == "1"){return(3)}
  else if (ratio >= 3){return(3)}
  else if (peak_cr > 353.6){return(3)}
  else if (ratio > 2 & ratio <3){return(2)}
  else if (ratio > 1.5 & ratio <2){return(1)}
  else if ((peak_cr >= bl_cr + 26.5)){return(1)}
  else {return (0)}
}, df = data2)
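The conversion that apply performs can be checked directly with as.matrix. This is a small sketch on a made-up two-row data frame d, not the data from the question.

```r
# Made-up data with a factor column, as in the question
d <- data.frame(peak = c(421, 659), base = c(82, 98), rrt = factor(c(1, 0)))

m <- as.matrix(d)   # apply() performs this conversion before looping over rows
typeof(m)           # "character": every value, including the numbers, is now a string

# Converting the factor to numeric first keeps the matrix numeric
d_num <- d
d_num$rrt <- as.numeric(as.character(d_num$rrt))
typeof(as.matrix(d_num))   # "double"
```

So another option is to convert the factor column before calling apply, rather than indexing rows with sapply.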
I am trying to do a conditional change for a list of 11 columns in R. My condition is always the same: survey$only0 == 1. I wrote the following code:
survey$w.house[survey$only0 == 1] <- 1
survey$w.inc[survey$only0 == 1] <- 1
survey$w.jobs[survey$only0 == 1] <- 1
survey$w.com[survey$only0 == 1] <- 1
survey$w.edu[survey$only0 == 1] <- 1
survey$w.env[survey$only0 == 1] <- 1
survey$w.health[survey$only0 == 1] <- 1
survey$w.satisf[survey$only0 == 1] <- 1
survey$w.safe[survey$only0 == 1] <- 1
survey$w.bal[survey$only0 == 1] <- 1
survey$w.civic[survey$only0 == 1] <- 1
My code works well, but I would like to shorten it using a loop or a function such as sapply or lapply. Does anyone know how to do it?
Thank you for your help!
David
We can do this easily with lapply by looping through the columns of interest ('nm1') and replacing their values with 1 where 'only0' is 1.
survey[nm1] <- lapply(survey[nm1], function(x) replace(x, survey$only0==1, 1))
Or, as @Vlo mentioned, the anonymous function call is not needed:
survey[nm1] <- lapply(survey[nm1], replace, list = survey$only0==1, values=1)
where
nm1 <- c("w.house", "w.inc", "w.jobs", "w.com", "w.edu", "w.env",
"w.health", "w.satisf", "w.safe", "w.bal", "w.civic")
You can try,
survey[survey$only0 == 1, cols] <- 1
where cols are the columns for which you want to check the condition.
cols <- c("w.house", "w.inc", "w.jobs", "w.com", "w.edu", "w.env",
"w.health", "w.satisf", "w.safe", "w.bal", "w.civic")
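As a quick check that the lapply approach and the direct row/column assignment agree, here is a sketch on a made-up three-row survey with only two of the eleven columns.

```r
# Made-up miniature version of the survey data
survey <- data.frame(only0   = c(1, 0, 1),
                     w.house = c(5, 6, 7),
                     w.inc   = c(8, 9, 10))
cols <- c("w.house", "w.inc")

# Approach 1: lapply with replace()
a <- survey
a[cols] <- lapply(a[cols], replace, list = a$only0 == 1, values = 1)

# Approach 2: assign to the selected rows and columns in one step
b <- survey
b[b$only0 == 1, cols] <- 1

a$w.house   # 1 6 1
b$w.inc     # 1 9 1
```

One caveat worth keeping in mind: if only0 contains NA, the direct assignment in approach 2 stops with an error, so the NAs would need to be handled first.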
x is a vector of integers ranging between 1 and 100.
I created a function that determines which category a number falls in:
x ∈ [1, 20]: small
x ∈ [21, 50]: med
x ∈ [51, 100]: large
Here is the function:
x <- c(1:99)
vector.fun <- function(x){
  x[x >= 1 & x <= 20] <- "small"
  x[x >= 21 & x <= 50] <- "med"
  x[x >= 51 & x <= 99] <- "large"
  return(x)
}
vector.fun(89)
However, as you can see, in the function my vector is 1:99 instead of 1:100. For some reason, when I change it to:
x <- c(1:100)
vector.fun <- function(x){
  x[x >= 1 & x <= 20] <- "small"
  x[x >= 21 & x <= 50] <- "med"
  x[x >= 51 & x <= 100] <- "large"
  return(x)
}
vector.fun(100)
it no longer recognises the numbers covered by the last line, x[x >= 51 & x <= 100] <- "large", and when it does assign a category it returns "med" instead of "large".
What am I doing wrong? What should I change in my function so that 100 is included and returns "large"?
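It is indeed a coercion problem, as mentioned in the comments above: the first assignment of "small" turns x into a character vector, so the later comparisons are string comparisons, and "89" <= "100" is FALSE.
If you want to keep your function structure the way you created it, you can alter it as follows, so that the comparisons always use the untouched numeric input y: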
vector.fun <- function(y){
  x <- y
  x[y >= 1 & y <= 20] <- "small"
  x[y >= 21 & y <= 50] <- "med"
  x[y >= 51 & y <= 100] <- "large"
  return(x)
}
Although the solution suggested by @alexis_laz is more concise and elegant:
vector.fun <- function(x){
  cut(x, c(0,20,50,100), labels = c("small", "med", "large"))
}
Keep in mind, this second version will produce a factor type vector, while the first version will produce a character type vector.
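The coercion can be seen in a few lines. This is a sketch with three made-up input values.

```r
x <- c(5, 89, 100)
x[x >= 1 & x <= 20] <- "small"
x   # "small" "89" "100": the whole vector is now character

# cut() keeps the numeric comparison and returns a factor
f <- cut(c(5, 89, 100), c(0, 20, 50, 100), labels = c("small", "med", "large"))
class(f)          # "factor"
as.character(f)   # "small" "large" "large"
```

After the first assignment, every later comparison such as x <= 100 is done on strings, which is why the results look arbitrary. Wrapping the cut result in as.character gives a character vector if a factor is not wanted.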
I am trying to write a function that takes a data.frame and a list (or a character vector) of variable names from that data.frame, and creates new variables whose names are derived from the names in the list and whose values are computed from the corresponding variables.
For example, if data.frame d has variables x, y, z, w and the list of names is c('x', 'z'), the output may be vectors named x.cat and z.cat with values based on d$x and d$z.
I can do this with a loop:
df <- data.frame(x = c(1 : 10), y = c(11 : 20), z = c(21 : 30), w = c(41: 50))
vnames <- c("x", "w")
loopfunc <- function(dat, vlst){
  s <- paste(vlst, "cat", sep = ".")
  for (i in 1:length(vlst)){
    dat[s[i]] <- NA
    dat[s[i]][dat[vlst[i]] %% 4 == 0 ] <- 0
    dat[s[i]][dat[vlst[i]] %% 4 == 1 | dat[vlst[i]] %%4 == 3] <- 1
    dat[s[i]][dat[vlst[i]] %% 4 == 2 ] <- 2
  }
  dat[s]
}
dout <- loopfunc(df, vnames)
This would output a 10x2 data.frame with columns x.cat and w.cat, the values of these are 0, 1, or 2 depending on the remainder of the corresponding values of df$x and df$w mod 4.
I would like to find a way to do something like this without a loop, maybe using the apply functions?
Here is a failed attempt:
noloopfunc <- function(dat, l){
  assign(l[2], NA)
  assign(l[2][d[l[1]] %% 4 == 0], 0)
  assign(l[2][d[l[1]] %% 4 == 2], 2)
  assign(l[2][(d[l[1]] %% 4 == 1) | (d[l[1]] %% 4 == 3)], 1)
  as.name(l[2])
}
newvnames <- sapply(vnames, function(x){paste(x, "cat", sep = ".")})
vpairs <- mapply(c, vnames, newvnames, SIMPLIFY = F)
lapply(vpairs, noloopfunc, d <- df)
Here the formal argument l is supposed to represent vpairs[[1]] or vpairs[[2]], both string vectors of length 2.
I found several threads on Stack Overflow on converting strings to variable names, but I couldn't find anything where this is done non-interactively and the variables then have to be referred to and assigned values.
Thanks for any help.
You can replace your loop with an apply variant:
dout <- as.data.frame(sapply(vnames, function(x) {
  out <- rep(NA, nrow(df))
  out[df[,x] %% 4 == 0] <- 0
  out[df[,x] %% 4 == 1 | df[,x] %% 4 == 3] <- 1
  out[df[,x] %% 4 == 2] <- 2
  out
}))
names(dout) <- paste(vnames, "cat", sep=".")
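Since the category depends only on the remainder mod 4, the three conditional assignments can also be replaced by a lookup vector. This is a sketch of that alternative, not part of the answer above; codes is a name introduced here for illustration.

```r
df <- data.frame(x = 1:10, y = 11:20, z = 21:30, w = 41:50)
vnames <- c("x", "w")

# codes[k + 1] is the category for remainder k (0, 1, 2, 3)
codes <- c(0, 1, 2, 1)

# lapply over the selected columns; each column is recoded in one indexing step
dout <- as.data.frame(lapply(df[vnames], function(v) codes[v %% 4 + 1]))
names(dout) <- paste(vnames, "cat", sep = ".")

dout$x.cat   # 1 2 1 0 1 2 1 0 1 2
```

Passing df[vnames] to lapply also avoids referring to the global df by column name inside the function.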
I have some code that identifies outliers in a data frame and then either removes or caps them. I'm trying to speed up the removal process using an apply() function (or perhaps another method).
Example data
https://github.com/crossfitAL/so_ex_data/blob/master/subset
# this is the contents of a csv file, you will need to load it into your R session.
# set up an example decision-matrix
# rm.mat is a {length(cols) x 4} matrix -- in this example 8 x 4
# rm.mat[,1:2] - identify the values for min/max outliers, respectively.
# rm.mat[,3:4] - identify if you wish to remove min/max outliers, respectively.
cols <- c(1, 6:12) # specify the columns you wish to examine
rm.mat <- matrix(nrow = length(cols), ncol= 4,
                 dimnames= list(names(fico2[cols]),
                                c("out.min", "out.max","rm outliers?", "rm outliers?")))
# add example decision criteria
rm.mat[, 1] <- apply(fico2[, cols], 2, quantile, probs= .05)
rm.mat[, 2] <- apply(fico2[, cols], 2, quantile, probs= .95)
rm.mat[, 3] <- replicate(4, c(0,1))
rm.mat[, 4] <- replicate(4, c(1,0))
Here's my current code for subsetting:
df2 <- fico2 # create a copy of the data frame
cnt <- 1 # add a count variable
for (i in cols) {
  # for each column of interest in the data frame. Determine if there are min/max
  # outliers that you wish to remove, remove them.
  if (rm.mat[cnt, 3] == 1 & rm.mat[cnt, 4] == 1) {
    # subset / remove min and max outliers
    df2 <- df2[df2[, i] >= rm.mat[cnt, 1] & df2[, i] <= rm.mat[cnt, 2], ]
  } else if (rm.mat[cnt, 3] == 1 & rm.mat[cnt, 4] == 0) {
    # subset / remove min outliers
    df2 <- df2[df2[, i] >= rm.mat[cnt, 1], ]
  } else if (rm.mat[cnt, 3] == 0 & rm.mat[cnt, 4] == 1) {
    # subset / remove max outliers
    df2 <- df2[df2[, i] <= rm.mat[cnt, 2], ]
  }
  cnt <- cnt + 1
}
Proposed solution:
I think I should be able to do this with an apply-type function, with the removal of the for loop (vectorization) speeding up the code. The problem I'm running into is that I want to apply a function if and only if the decision matrix indicates that I should, i.e. use a logical vector rm.mat[,3] or rm.mat[,4] to determine whether subsetting with "[" should be applied to the data frame df2.
Any help would be greatly appreciated! Also, please let me know if the example data and code are sufficient.
Here is an attempt at a solution, mainly to clarify your code. I hope others can use it to give a better one.
If I understand correctly, you have a decision matrix that looks like this:
rm.mat
                                      c1 c2 c3 c4
amount.funded.by.investors     27925.000 NA  0  1
monthly.income                 11666.670 NA  1  0
open.credit.lines                 18.000 NA  0  1
revolving.credit.balance       40788.750 NA  1  0
inquiries.in.the.last.6.months     3.000 NA  0  1
debt.to.inc                       28.299 NA  1  0
int.rate                          20.490 NA  0  1
fico.num                         775.000 NA  1  0
And you are trying to filter a large data frame according to the values of this matrix:
colnames(rm.mat) <- paste('c',1:4,sep='')
rm.mat <- as.data.frame(rm.mat)
apply(rm.mat,1,function(y){
  h <- paste(y['c3'],y['c4'],sep='')
  switch(h,
         '11'= apply(df2,2, function(x)
           df2[x >= y['c1'] & x <= y['c2'],]), ## we never have this!!
         '10'= apply(df2,2, function(x)
           df2[x >= y['c1'] , ]), ## here we apply by columns!
         '01'= apply(df2,2,function(x)
           df2[x <= y['c2'], ])) ## c2 is NA!! so !!!
}
)
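Because the thresholds are computed once up front, the repeated subsetting can be replaced by a single logical mask that is built column by column and applied at the end. The following is only a sketch on made-up data: dat, lims and the column names lo, hi, rm.lo and rm.hi are hypothetical stand-ins for fico2 and rm.mat.

```r
# Made-up data and decision table (stand-ins for fico2 and rm.mat)
dat <- data.frame(a = c(1, 5, 9, 3), b = c(10, 2, 6, 8))
lims <- data.frame(lo = c(2, 3), hi = c(8, 9),
                   rm.lo = c(1, 0), rm.hi = c(1, 1),
                   row.names = c("a", "b"))

keep <- rep(TRUE, nrow(dat))   # start by keeping every row
for (nm in rownames(lims)) {
  # drop low outliers only if the decision table asks for it
  if (lims[nm, "rm.lo"] == 1) keep <- keep & dat[[nm]] >= lims[nm, "lo"]
  # likewise for high outliers
  if (lims[nm, "rm.hi"] == 1) keep <- keep & dat[[nm]] <= lims[nm, "hi"]
}

dat2 <- dat[keep, ]   # subset the data frame once
keep                  # FALSE TRUE FALSE TRUE
```

There is still a loop over the columns, but each step is a vectorized comparison and the data frame is copied only once, which is usually where the time goes.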