I am trying to make a conditional change to a set of 11 columns in R. The condition is always the same: survey$only0 == 1. I wrote the following code:
survey$w.house[survey$only0 == 1] <- 1
survey$w.inc[survey$only0 == 1] <- 1
survey$w.jobs[survey$only0 == 1] <- 1
survey$w.com[survey$only0 == 1] <- 1
survey$w.edu[survey$only0 == 1] <- 1
survey$w.env[survey$only0 == 1] <- 1
survey$w.health[survey$only0 == 1] <- 1
survey$w.satisf[survey$only0 == 1] <- 1
survey$w.safe[survey$only0 == 1] <- 1
survey$w.bal[survey$only0 == 1] <- 1
survey$w.civic[survey$only0 == 1] <- 1
My code works, but I would like to shorten it using a loop or a function such as sapply or lapply. Does anyone know how to do it?
Thank you for your help!
David
We can do this easily with lapply by looping over the columns of interest ('nm1') and replacing their values with 1 wherever 'only0' is 1.
survey[nm1] <- lapply(survey[nm1], function(x) replace(x, survey$only0==1, 1))
Or, as @Vlo mentioned, the anonymous function is not needed:
survey[nm1] <- lapply(survey[nm1], replace, list = survey$only0==1, values=1)
where
nm1 <- c("w.house", "w.inc", "w.jobs", "w.com", "w.edu", "w.env",
"w.health", "w.satisf", "w.safe", "w.bal", "w.civic")
You can try,
survey[survey$only0 == 1, cols] <- 1
where cols are the columns for which you want to check the condition.
cols <- c("w.house", "w.inc", "w.jobs", "w.com", "w.edu", "w.env",
"w.health", "w.satisf", "w.safe", "w.bal", "w.civic")
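To check that the two answers above agree, here is a minimal sketch on a small made-up data frame. The toy `survey` contents and the shortened `cols` vector are assumptions for illustration only; the real data has 11 columns.

```r
# Toy stand-in for the real survey data (hypothetical values)
survey <- data.frame(only0   = c(1, 0, 1),
                     w.house = c(5, 6, 7),
                     w.inc   = c(8, 9, 10),
                     other   = c(1, 2, 3))
cols <- c("w.house", "w.inc")

# Approach 1: row/column assignment in a single step
by_rows <- survey
by_rows[by_rows$only0 == 1, cols] <- 1

# Approach 2: lapply + replace over the selected columns
by_lapply <- survey
by_lapply[cols] <- lapply(by_lapply[cols], replace,
                          list = survey$only0 == 1, values = 1)

# Both approaches leave untouched columns alone and give the same result
identical(by_rows, by_lapply)
```

If only0 can contain NA, wrapping the condition in which() (for example which(survey$only0 == 1)) keeps NA out of the row index.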
I'm new to programming in R.
I have a list that contains numbers between 0 and 5, stored in results2. I want to count how many times a 1 appears immediately before a 5. I have done this:
counting<-function(lista,n,m){
  p=2
  for (p in data_list){
    if(results2[p]==n && results2[p-1]==m){
      length(p)
    }
    p<-p+1
  }
}
counting(results2,5,1)
Can anyone please give me some helpful advice to improve my code, since it does not work?
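We loop over the list with sapply. For each element, we find the positions of every 5 (which), drop a 5 in the first position because it has no predecessor, and then count how many of the remaining 5s are immediately preceded by a 1 by using sum on the logical expression x[i2-1] == 1. If an element has no usable 5, the result is NA.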
sapply(data_list, function(x) {
  i1 <- which(x == 5)
  i2 <- i1[i1 > 1]
  if(length(i2) > 0) {
    sum(x[i2-1] == 1)
  } else NA_real_
})
#[1] 3 3
Or in tidyverse, we can make use of lag
library(dplyr)
library(purrr)
map_dbl(data_list, ~ sum(.x == 5 & lag(.x) == 1, na.rm = TRUE))
#[1] 3 3
data
data_list <- list(c(3,4,1,5 ,2,3,1,5,4,1,5),
c(3,4,1,5 ,2,3,1,5,4,1,5))
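The same count can also be sketched in base R by comparing each element with its successor. The helper name `count_before` and its argument names are made up for this illustration. Unlike the sapply version above, it returns 0 rather than NA when a vector contains no 5.

```r
# Count positions where `before` is immediately followed by `after`.
# count_before is a hypothetical helper name, not from the original posts.
count_before <- function(x, before = 1, after = 5) {
  if (length(x) < 2) return(0L)
  # head(x, -1) drops the last element, tail(x, -1) drops the first,
  # so the two vectors line up each value with its successor
  sum(head(x, -1) == before & tail(x, -1) == after, na.rm = TRUE)
}

data_list <- list(c(3, 4, 1, 5, 2, 3, 1, 5, 4, 1, 5),
                  c(3, 4, 1, 5, 2, 3, 1, 5, 4, 1, 5))

counts <- vapply(data_list, count_before, integer(1))
counts
```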
a <- array(1:18, dim=c(3,3,2))
a1 <- which( a >= 17, arr.ind = T)
a1im <- cbind(a1[,1] - 1, a1[,c(2,3)])
#ADVDOMiM is a logical vector
I would like a better vectorization than this one:
a[a1im] <- ifelse( a[a1] >= 5 & ADVDOMiM, a[a1], a[a1im])
I would like something like this:
a[a[a1] >= 5 & ADVDOMiM] <- a[a1]
but it doesn't work (I need to change the result in a[a1im], but I don't know how to do it).
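You can subset the rows of a1im and a1 with a[a1] >= 5 & ADVDOMiM to avoid the ifelse. Use drop = FALSE so that the index stays a matrix even when only one row is selected; otherwise it collapses to a plain vector and is treated as linear indices:
i <- a[a1] >= 5 & ADVDOMiM
a[a1im[i, , drop = FALSE]] <- a[a1[i, , drop = FALSE]]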
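A self-contained sketch comparing the ifelse version with the row-subset version follows. The value of ADVDOMiM here is invented for illustration (the question only says it is a logical vector), and it is chosen so that exactly one row is selected, which is the case where drop = FALSE matters.

```r
a <- array(1:18, dim = c(3, 3, 2))
a1 <- which(a >= 17, arr.ind = TRUE)           # index matrix, one row per match
a1im <- cbind(a1[, 1] - 1, a1[, c(2, 3)])      # same cells, one row up
ADVDOMiM <- c(TRUE, FALSE)                     # hypothetical logical vector

# Original approach: ifelse over all candidate cells
via_ifelse <- a
via_ifelse[a1im] <- ifelse(a[a1] >= 5 & ADVDOMiM, a[a1], a[a1im])

# Row-subset approach: only touch the cells where the condition holds
i <- a[a1] >= 5 & ADVDOMiM
via_subset <- a
via_subset[a1im[i, , drop = FALSE]] <- a[a1[i, , drop = FALSE]]

identical(via_ifelse, via_subset)
```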
I have two raster maps from two points in time (t1 and t2) with two land-cover categories in each (LC1, LC2). I want to impose a rule that an LC2 cell in t1 cannot change to an LC1 cell in t2, i.e., only LC1 can change to LC2 through time but not the other way around. I am having a hard time coming up with a rule for that in R. What I had in mind was something like this:
library(raster)
#create test rasters
r <- raster(nrows=25, ncols=25, vals=round(rnorm(625, 3), 0)) #land-use/cover raster
r[ r > 2 ] <- 2
r[ r < 1 ] <- 1
r2 <- r
plot(r2) #r2 is t2
r <- raster(nrows=25, ncols=25, vals=round(rnorm(625, 3), 0)) #land-use/cover raster
r[ r > 2 ] <- 2
r[ r < 1 ] <- 1
plot(r) #r is t1
r_fix <- overlay(r, r2, fun = function(x, y) {
  if (x[ x==2 ] & y[ y==1 ]) { #1 is LC1, 2 is LC2
    x[ x==2 ] <- 1 }
  return(x)
})
But it returns an error (because of the way I am using the if statement with rasters?):
Error in (function (x, fun, filename = "", recycle = TRUE, forcefun = FALSE, :
cannot use this formula, probably because it is not vectorized
I wonder if there is a simple way to implement something similar that works with rasters? Thank you in advance.
You were really close,
overlay(r, r2, fun = function(x, y) {x[x == 2 & y == 1] <- 1; x})
seems to do the job.
In terms of your solution,
x[x == 2] <- 1
doesn't cause any errors, although it's not exactly what you want to use in your case either. However,
if (x[x == 2] & y[y == 1])
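is a problem because x[x == 2] & y[y == 1] returns a logical object with (usually) more than one element, and its two operands generally have different lengths, while if expects a single logical value. Subsetting, on the other hand, accepts a logical index with one entry per cell, which is exactly what is happening in x[x == 2 & y == 1].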
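The masking logic can be tried without the raster package on two small matrices. This is only a sketch: the matrices `t1` and `t2` and the function name `fix_t1` are made up, and the function body is the same one passed to overlay above.

```r
# Toy land-cover grids (1 = LC1, 2 = LC2); hypothetical values
t1 <- matrix(c(2, 1, 2, 1), nrow = 2)
t2 <- matrix(c(1, 1, 2, 2), nrow = 2)

# Same function body as in the overlay() call above
fix_t1 <- function(x, y) {
  x[x == 2 & y == 1] <- 1   # one logical value per cell selects the cells to change
  x
}

fixed <- fix_t1(t1, t2)
fixed
```

Only the cell that is 2 in t1 and 1 in t2 is changed; every other cell keeps its t1 value.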
I am trying to write a function that will take a data.frame, a list (or a character vector) of variable names of the data.frame and create some new variables with names derived from the corresponding variable names in the list and values from the variables named in the list.
For example, if data.frame d has variables x, y, z, w and the list of names is c('x', 'z'), the output may be vectors named x.cat and z.cat with values based on the values of d$x and d$z.
I can do this with a loop
df <- data.frame(x = c(1 : 10), y = c(11 : 20), z = c(21 : 30), w = c(41: 50))
vnames <- c("x", "w")
loopfunc <- function(dat, vlst){
  s <- paste(vlst, "cat", sep = ".")
  for (i in 1:length(vlst)){
    dat[s[i]] <- NA
    dat[s[i]][dat[vlst[i]] %% 4 == 0 ] <- 0
    dat[s[i]][dat[vlst[i]] %% 4 == 1 | dat[vlst[i]] %%4 == 3] <- 1
    dat[s[i]][dat[vlst[i]] %% 4 == 2 ] <- 2
  }
  dat[s]
}
dout <- loopfunc(df, vnames)
This would output a 10x2 data.frame with columns x.cat and w.cat, the values of these are 0, 1, or 2 depending on the remainder of the corresponding values of df$x and df$w mod 4.
I would like to find a way to do something like this without a loop, maybe using the apply functions?
Here is a failed attempt
noloopfunc <- function(dat, l){
  assign(l[2], NA)
  assign(l[2][d[l[1]] %% 4 == 0], 0)
  assign(l[2][d[l[1]] %% 4 == 2], 2)
  assign(l[2][(d[l[1]] %% 4 == 1) | (d[l[1]] %% 4 == 3)], 1)
  as.name(l[2])
}
newvnames <- sapply(vnames, function(x){paste(x, "cat", sep = ".")})
vpairs <- mapply(c, vnames, newvnames, SIMPLIFY = F)
lapply(vpairs, noloopfunc, d <- df)
Here the formal argument l is supposed to represent vpairs[[1]] or vpairs[[2]], both string vectors of length 2.
I found several threads on Stack Overflow about converting strings to variable names, but I couldn't find anything where this is used in a way where the variables have to be referred to subsequently and assigned values non-interactively.
Thanks for any help.
You can replace your loop with an apply variant
dout <- as.data.frame(sapply(vnames, function(x) {
  out <- rep(NA, nrow(df))
  out[df[,x] %% 4 == 0] <- 0
  out[df[,x] %% 4 == 1 | df[,x] %% 4 == 3] <- 1
  out[df[,x] %% 4 == 2] <- 2
  out
}))
names(dout) <- paste(vnames, "cat", sep=".")
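Because the category depends only on the remainder mod 4, the three assignments can also be collapsed into a lookup vector. This is a sketch of that alternative, not the answer's method; the helper name `cat_mod4` is invented.

```r
df <- data.frame(x = 1:10, y = 11:20, z = 21:30, w = 41:50)
vnames <- c("x", "w")

# Remainders 0, 1, 2, 3 map to categories 0, 1, 2, 1.
# cat_mod4 is a hypothetical helper name.
cat_mod4 <- function(x) c(0, 1, 2, 1)[x %% 4 + 1]

dout <- as.data.frame(lapply(df[vnames], cat_mod4))
names(dout) <- paste(vnames, "cat", sep = ".")
dout
```

NA inputs stay NA, because an NA index into the lookup vector yields NA.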
I have some code that identifies outliers in a data frame and then either removes or caps them. I'm trying to speed up the removal process using an apply() function (or perhaps another method).
Example data
https://github.com/crossfitAL/so_ex_data/blob/master/subset
# this is the contents of a csv file, you will need to load it into your R session.
# set up an example decision-matrix
# rm.mat is a {length(cols) x 4} matrix -- in this example 8 x 4
# rm.mat[,1:2] - identify the values for min/max outliers, respectively.
# rm.mat[,3:4] - identify if you wish to remove min/max outliers, respectively.
cols <- c(1, 6:12) # specify the columns you wish to examine
rm.mat <- matrix(nrow = length(cols), ncol= 4,
dimnames= list(names(fico2[cols]),
c("out.min", "out.max","rm outliers?", "rm outliers?")))
# add example decision criteria
rm.mat[, 1] <- apply(fico2[, cols], 2, quantile, probs= .05)
rm.mat[, 2] <- apply(fico2[, cols], 2, quantile, probs= .95)
rm.mat[, 3] <- replicate(4, c(0,1))
rm.mat[, 4] <- replicate(4, c(1,0))
Here's my current code for subsetting:
df2 <- fico2 # create a copy of the data frame
cnt <- 1 # add a count variable
for (i in cols) {
  # for each column of interest in the data frame. Determine if there are min/max
  # outliers that you wish to remove, remove them.
  if (rm.mat[cnt, 3] == 1 & rm.mat[cnt, 4] == 1) {
    # subset / remove min and max outliers
    df2 <- df2[df2[, i] >= rm.mat[cnt, 1] & df2[, i] <= rm.mat[cnt, 2], ]
  } else if (rm.mat[cnt, 3] == 1 & rm.mat[cnt, 4] == 0) {
    # subset / remove min outliers
    df2 <- df2[df2[, i] >= rm.mat[cnt, 1], ]
  } else if (rm.mat[cnt, 3] == 0 & rm.mat[cnt, 4] == 1) {
    # subset / remove max outliers
    df2 <- df2[df2[, i] <= rm.mat[cnt, 2], ]
  }
  cnt <- cnt + 1
}
Proposed solution:
I think I should be able to do this with an apply-type function, where removing the for loop (vectorizing) speeds up the code. The problem I'm running into is that I want to apply a function if and only if the decision matrix indicates that I should, i.e., use the logical vectors rm.mat[,3] and rm.mat[,4] to determine whether subsetting with "[" should be applied to the data frame df2.
Any help you have would be greatly appreciated! Also, please let me know if the example data / code is sufficient.
Here is a partial solution, mainly to clarify your code. I hope others can build on it to give a better one.
If I understand correctly, you have a decision matrix that looks like this:
rm.mat
                                      c1 c2 c3 c4
amount.funded.by.investors     27925.000 NA  0  1
monthly.income                 11666.670 NA  1  0
open.credit.lines                 18.000 NA  0  1
revolving.credit.balance       40788.750 NA  1  0
inquiries.in.the.last.6.months     3.000 NA  0  1
debt.to.inc                       28.299 NA  1  0
int.rate                          20.490 NA  0  1
fico.num                         775.000 NA  1  0
and you are trying to filter a large data frame according to the values in this matrix:
colnames(rm.mat) <- paste('c',1:4,sep='')
rm.mat <- as.data.frame(rm.mat)
apply(rm.mat,1,function(y){
  h <- paste(y['c3'],y['c4'],sep='')
  switch(h,
         '11'= apply(df2,2, function(x)
           df2[x >= y['c1'] & x <= y['c2'],]), ## we never have this!!
         '10'= apply(df2,2, function(x)
           df2[x >= y['c1'] , ]), ## here we apply by columns!
         '01'= apply(df2,2,function(x)
           df2[x <= y['c2'], ])) ## c2 is NA!! so !!!
}
)
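Since the thresholds in the decision matrix are computed once up front, filtering column by column gives the same rows as combining one logical "keep" mask per column and subsetting once. Here is a minimal sketch of that idea on made-up data. The data frame `toy`, the helper `keep_col`, and the vectors `lo`, `hi`, `rm_lo` and `rm_hi` are all hypothetical stand-ins for fico2 and the columns of rm.mat, not the linked CSV.

```r
# Toy stand-in for fico2 (hypothetical values)
toy <- data.frame(a = 1:20, b = c(2:20, 1), c = rep(1:4, 5))
cols <- c("a", "b")

# Decision criteria, one entry per column of interest
lo <- sapply(toy[cols], quantile, probs = 0.05, names = FALSE)
hi <- sapply(toy[cols], quantile, probs = 0.95, names = FALSE)
rm_lo <- c(a = TRUE,  b = FALSE)   # remove low outliers?
rm_hi <- c(a = FALSE, b = TRUE)    # remove high outliers?

# Logical mask of rows to keep for one column (keep_col is a made-up helper)
keep_col <- function(nm) {
  x <- toy[[nm]]
  ok_lo <- if (rm_lo[[nm]]) x >= lo[[nm]] else rep(TRUE, length(x))
  ok_hi <- if (rm_hi[[nm]]) x <= hi[[nm]] else rep(TRUE, length(x))
  ok_lo & ok_hi
}

# Combine the per-column masks and subset the data frame once
keep <- Reduce(`&`, lapply(cols, keep_col))
filtered <- toy[keep, ]
nrow(filtered)
```

Subsetting once avoids copying the data frame on every iteration, which is likely where most of the loop's time goes.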