How to apply my own function using lapply in R?

I have created a custom function to replace values with NA to understand how functions work in R:
replacewithna <- function(x) {
  if (x == -99) {
    return(NA)
  }
  else {
    return(x)
  }
}
I have a data frame with several columns in which certain elements contain -99, and I want to apply the custom function I created to each element. I have been able to do this with a for loop:
for (i in 1:nrow(survey2)) {
  for (j in 1:ncol(survey2)) {
    survey2[i, j] <- replacewithna(survey2[i, j])
  }
}
However, I can't do the same with lapply. How can I use my replace function with a function from the apply family, like so:
survey1 <- lapply(survey1, replacewithna)
Currently I get the following error: "Error in if (x == -99) { : the condition has length > 1"

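lapply passes each whole column to your function, but if() can only test a single value, hence the error. Try a vectorized version of your function, using either ifelse or, as below, the replacement function is.na<-.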
replacewithna <- function(x) {
  is.na(x) <- x == -99
  x
}
With ifelse it is a one-liner:
replacewithna <- function(x) ifelse(x == -99, NA, x)
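A quick check on a plain numeric vector shows that both vectorized versions handle every element at once. This is a minimal sketch with made-up values; the names replacewithna_isna and replacewithna_ifelse are hypothetical and only keep the two variants apart:

```r
# Variant 1: the replacement function `is.na<-` sets the matching positions to NA
replacewithna_isna <- function(x) {
  is.na(x) <- x == -99
  x
}

# Variant 2: ifelse() evaluates the test for every element
replacewithna_ifelse <- function(x) ifelse(x == -99, NA, x)

v <- c(1, -99, 3, -99)  # hypothetical test values
replacewithna_isna(v)    # 1 NA 3 NA
replacewithna_ifelse(v)  # 1 NA 3 NA
```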
Note
With either function, if survey1 or survey2 is a data.frame, the correct way to lapply the function while keeping the dimensions (the tabular format) is
survey1[] <- lapply(survey1, replacewithna)
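The square brackets are very important: survey1[] <- assigns the list returned by lapply back into the existing data frame, column by column, whereas survey1 <- lapply(...) replaces the data frame with a plain list.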
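The difference the brackets make can be seen directly. In this sketch the contents of survey1 are made up for illustration; only the assignment pattern comes from the answer:

```r
replacewithna <- function(x) ifelse(x == -99, NA, x)

# Hypothetical example data standing in for survey1
survey1 <- data.frame(q1 = c(5, -99, 3), q2 = c(-99, 2, 1))

# Without brackets the result is a plain list
as_list <- lapply(survey1, replacewithna)
class(as_list)  # "list"

# With brackets the columns are written back into the data frame
survey1[] <- lapply(survey1, replacewithna)
class(survey1)  # "data.frame"
survey1
```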

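Here you can also use sapply (which simplifies the result to a vector or a matrix, and might be more appropriate here if a matrix is what you want) together with replace. Keep in mind that a matrix holds a single type, so columns of different types are coerced to a common type: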
sapply(survey2, function(x) replace(x, x == -99, NA))
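To illustrate the coercion caveat, here is a sketch with a hypothetical survey2 that mixes a numeric and a character column; the sapply result is a character matrix, so the numbers come back as strings:

```r
# Hypothetical data: one numeric and one character column
survey2 <- data.frame(score = c(5, -99, 3), group = c("a", "b", "c"),
                      stringsAsFactors = FALSE)

res <- sapply(survey2, function(x) replace(x, x == -99, NA))

is.matrix(res)    # TRUE: sapply() simplified the list of columns to a matrix
res[, "score"]    # "5" NA "3": the numeric column was coerced to character
```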

Related

Use if else statement for Dummy-Coding in R

I tried to create an if-else statement to recode my variable into a dummy variable.
I know there is the ifelse() function and the fastDummies package, but I tried it this way without success.
Why does this not work? I want to learn and understand R better.
if (df$iscd115 == 1) {
  df$iscd1151 <- 1
} else {
  df$iscd1151 <- 0
}
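Your version does not work because if() expects a single TRUE or FALSE, while df$iscd115 == 1 is a logical vector with one element per row (R 4.2.0 and later raise an error for this; older versions used only the first element and gave a warning). A row-by-row approach should be a reasonable solution.
First we find the positions of the two columns, and then we apply a function over the rows (MARGIN = 1) that checks whether the iscd115 value is 1 and sets iscd1151 to 1 or 0 accordingly.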
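col1 <- which(names(df) == "iscd115")
# The target column must exist before apply() runs; create it if needed
if (!"iscd1151" %in% names(df)) df$iscd1151 <- NA
col2 <- which(names(df) == "iscd1151")
mat <- apply(df, MARGIN = 1, function(x) {
  # x is one row, coerced to a single type (possibly character), so convert first
  if (as.numeric(x[col1]) == 1) {
    x[col2] <- 1
  } else {
    x[col2] <- 0
  }
  x
})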
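Unfortunately, apply returns each processed row as a column of a matrix, so the result is the transpose of the original data frame, and all columns are coerced to a single common type (character, if df has any non-numeric column). We can transpose the matrix back and turn it into a data frame with the following: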
new_df <- as.data.frame( t(mat))
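For comparison, the vectorized route mentioned in the question needs no apply at all. This is a minimal sketch assuming a numeric iscd115 column; the data values and the extra column name iscd1151_int are made up for illustration:

```r
# Hypothetical example data; only the name iscd115 comes from the question
df <- data.frame(iscd115 = c(1, 0, 1, 1, 0))

# ifelse() evaluates the condition element by element
df$iscd1151 <- ifelse(df$iscd115 == 1, 1, 0)

# Shortcut: convert the logical vector to integer (TRUE -> 1L, FALSE -> 0L)
df$iscd1151_int <- as.integer(df$iscd115 == 1)
df
```

Unlike the apply version, this keeps every column in its original type.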

Use mutate_at with nested ifelse

This converts the values in every column except columnA, setting them to NA under the given conditions (used in a %>% pipeline):
mutate_at(vars(-columnA), funs(((function(x) {
  if (is.logical(x))
    return(x)
  else if (!is.na(as.numeric(x)))
    return(as.numeric(x))
  else
    return(NA)
})(.))))
How can I achieve the same result using mutate_at and nested ifelse?
For example, this does not produce the same result:
mutate_at(vars(-columnA), funs(ifelse(is.logical(.), .,
                                     ifelse(!is.na(as.numeric(.)), as.numeric(.), NA))))
Update (2018-1-5)
The intent of the question is confusing, partly because of a misconception I had about what is passed to the function: the whole column, not a single element.
This is what I had intended to write:
mutate_at(vars(-columnA), funs(((function(x) {
  for (i in 1:length(x)) {
    if (!is.na(as.numeric(x[i])) && !is.logical(x[i])) {
      x[i] <- as.numeric(x[i])
    } else if (!is.na(x[i])) {
      x[i] <- NA
    }
  }
  return(x)
})(.))))
This is a better solution:
mutate_at(vars(-columnA), function(x) {
  if (is.logical(x))
    return(x)
  return(as.numeric(x))
})
ifelse may not be appropriate in this case, because it returns a value with the same shape as its test condition. Here the condition, is.logical(.), has length 1, so ifelse returns a single value (the first element of the selected branch), and that value is recycled down the whole column passed to the function.
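A small base-R illustration of that shape rule, using made-up values: a length-1 test yields a length-1 result, while a test with one element per value yields a full-length result.

```r
x <- c(10, 20, 30)  # hypothetical column values

# Length-1 test: only the first element of `yes` is returned
ifelse(is.numeric(x), x, NA)  # 10

# Length-3 test: one result per element
ifelse(x > 15, x, NA)         # NA 20 30
```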
Update (2018-1-6)
Using ifelse, this returns elements that are logical values (TRUE/FALSE) or NA as-is and applies as.numeric to the other elements:
mutate_at(vars(-columnA), funs(
  ifelse(. == TRUE | . == FALSE | is.na(.), ., as.numeric(.))))
The main issue is this part:
else if (!is.na(as.numeric(x)))
  return(as.numeric(x))
if/else works on a condition of length 1. If the vector or column the function is applied to has more than one element, it is better to use ifelse. In the code above, !is.na(as.numeric(x)) returns a logical vector with more than one element (assuming the dataset has more than one row); R 4.2.0 and later raise an error for such a condition, while earlier versions used only the first element and gave a warning. One way to make it work is to wrap the condition in all() or any(), depending on what we need:
f1 <- function(x) {
  if (is.logical(x))
    return(x)
  else if (all(!is.na(as.numeric(x))))
    return(as.numeric(x))
  else
    return(x) # change if needed
}
library(dplyr)
df1 %>%
  mutate_all(f1)
Data:
set.seed(24)
df1 <- data.frame(col1 = sample(c(TRUE, FALSE), 10, replace = TRUE),
                  col2 = c(1:8, "Good", 10), col3 = as.character(1:10),
                  stringsAsFactors = FALSE)
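The same column-wise idea also works in base R with the df[] <- lapply(...) pattern, without dplyr. This sketch uses f1_quiet, a hypothetical variant of f1 that only differs in silencing the "NAs introduced by coercion" warning from as.numeric:

```r
# Hypothetical variant of f1: same decision rule, warning suppressed
f1_quiet <- function(x) {
  if (is.logical(x)) return(x)
  num <- suppressWarnings(as.numeric(x))
  if (all(!is.na(num))) return(num)  # every value is numeric-like
  x                                   # otherwise leave the column unchanged
}

set.seed(24)
df1 <- data.frame(col1 = sample(c(TRUE, FALSE), 10, replace = TRUE),
                  col2 = c(1:8, "Good", 10), col3 = as.character(1:10),
                  stringsAsFactors = FALSE)

df1[] <- lapply(df1, f1_quiet)
sapply(df1, class)  # col1 "logical", col2 "character", col3 "numeric"
```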

Apply function in data frame

I have a data frame named Cat with multiple columns. One column, Jan.15_Transaction, contains values, and I want to apply the condition that if a value is greater than 0 it becomes 1, else 0. I do not want to write a separate if-else for each column, as there are 42 similar columns to which I want to apply the same logic.
Jan.15_Transaction Feb.15_Transaction
1 1
2 2
3 3
4 4
So I built this function:
myfunc <- function(x) {
  if (x > 0) {
    x = 1
  }
  else {
    x = 0
  }
  return(x)
}
When I use this code, the function is applied to the first element only:
Cat$Jan.15_Transaction.1<-myfunc(Cat$Jan.15_Transaction)
Warning message:
In if (x > 0) { :
the condition has length > 1 and only the first element will be used
So I tried sapply and got the error below:
sapply(Cat$Jan.15_Transaction.1, myfunction(Cat))
Error in match.fun(FUN) : argument "FUN" is missing, with no default
You can use the ifelse function to vectorise (= apply across a vector) an if statement:
myfunc = function (x)
ifelse(x > 0, 1, 0)
Alternatively, you could use the following, which is more efficient (but less readable):
myfunc = function (x)
as.integer(x > 0)
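Both vectorized versions give the same values; the only difference is the result type (double from ifelse, integer from as.integer). A sketch with made-up inputs, where the two function names are hypothetical:

```r
x <- c(-2, 0, 3.5, 7)  # hypothetical transaction values

myfunc_ifelse <- function(x) ifelse(x > 0, 1, 0)  # returns doubles
myfunc_int    <- function(x) as.integer(x > 0)     # returns integers

myfunc_ifelse(x)  # 0 0 1 1
myfunc_int(x)     # 0 0 1 1
```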
Coming back to your original function, your way of writing it is very un-R-like. A more R-like implementation would look like this:
myfunc = function (x)
if (x > 0) 1 else 0
There is no need for a temporary variable, assignments, or the return statement. Note that this version is still not vectorized, so it handles only one value at a time.
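The scalar version can still be applied element by element with sapply, as long as the function itself is passed rather than a call: sapply expects a function as its second argument, while myfunction(Cat) in the question is a call that gets evaluated first. A sketch, with jan as a hypothetical stand-in for Cat$Jan.15_Transaction:

```r
# Scalar (non-vectorized) version from the answer above
myfunc <- function(x) if (x > 0) 1 else 0

jan <- c(3, 0, -1, 2)  # hypothetical values

# Pass the function itself (myfunc), not a call such as myfunc(jan)
sapply(jan, myfunc)  # 1 0 0 1
```

For long vectors the vectorized ifelse or as.integer versions are usually preferable, since sapply calls the function once per element.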
I am assuming you want to apply the function to the columns whose names end with '_Transaction'. These can be selected with the base function grepl, anchoring the pattern with $ so it only matches at the end of a name.
vars <- grepl('_Transaction$', names(df))
df[, vars] <- ifelse(df[, vars] > 0, 1, 0)
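The grepl selection also combines with the lapply pattern from the first question, which keeps the data frame structure and each untouched column as-is. This is a sketch on hypothetical data modelled on the columns shown in the question (the id column is made up):

```r
# Hypothetical data modelled on the question's columns
Cat <- data.frame(Jan.15_Transaction = c(1, 0, 3, -2),
                  Feb.15_Transaction = c(0, 2, 0, 4),
                  id = 1:4)

# Logical index of columns whose names end in "_Transaction"
vars <- grepl("_Transaction$", names(Cat))

# Replace only the selected columns; Cat stays a data frame
Cat[vars] <- lapply(Cat[vars], function(x) as.integer(x > 0))
Cat
```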
You could also use dplyr as shown below. This generalizes to more complicated functions too.
library(dplyr)
binarizer <- function(x) ifelse(x > 0, 1, 0)
# across() replaces the deprecated mutate_each()/funs() and keeps the column order
df <- df %>%
  mutate(across(ends_with('_Transaction'), binarizer))

How to get a table for all the factor variables having less than a specific number of levels?

So I'm trying to make tables for all the factor variables with fewer than 3 levels in my data frame.
My current method is as follows:
df <- data.frame(a = 1:10, b = as.factor(c(rep(1, 5), rep(2, 5))), c = as.factor(c(rep(1, 3), rep(2, 5), rep(3, 2)))) ## Dataset
myfun <- function(x) {
  if (is.factor(x) && nlevels(x) < 3) {
    table(x)
  }
}
tab <- apply(df, 2, myfun)
While this is working, could I do the same thing using lapply?
Using one of the apply functions, we could do this:
myfun = function(x) {
  if (length(unique(x)) < 3) { table(x) }
}
Because the columns in the example had no factor variables (they were all numeric), we can loop through the columns with lapply and return the table if a column has fewer than 3 unique elements, or NA otherwise.
lapply(df, function(x) if(length(unique(x)) < 3) table(x) else NA)
NOTE: Based on the OP's first example.
Update
If there are in fact factor variables, apply is a poor choice: it converts the data frame to a matrix, and a matrix can hold only a single type (here character), so the factor columns are coerced to character and levels is useless on them. We can use lapply instead:
lapply(df, function(x) if (is.factor(x) && nlevels(x) < 3) table(x) else NA)
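On the question's own data the coercion is easy to confirm, and if only the qualifying tables are wanted, one option (an assumption about the desired output, not part of the answer) is to select those columns first with vapply:

```r
df <- data.frame(a = 1:10,
                 b = as.factor(c(rep(1, 5), rep(2, 5))),
                 c = as.factor(c(rep(1, 3), rep(2, 5), rep(3, 2))))

# apply() works on as.matrix(df), so no column is a factor any more
apply(df, 2, is.factor)  # all FALSE

# Keep only factor columns with fewer than 3 levels, then tabulate them
keep <- vapply(df, function(x) is.factor(x) && nlevels(x) < 3, logical(1))
tabs <- lapply(df[keep], table)
tabs  # only column b qualifies: two levels with 5 counts each
```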

Get index of character patterns in table

I have a table with two columns, both of type character:
"FTGS" "JKLP"
"CVVA" "CVVA"
"HGFF" "CVVD"
"CVVD" "HGFF"
"OPSF" "WQSR"
...
Can somebody tell me how to write a function that returns the index (row number) of a specific combination of strings across column 1 and column 2? If I pass (HGFF, CVVD) to the function, it should return 3 and 4 (whether HGFF or CVVD is in column 1 or 2 does not matter). If I pass (CVVA, CVVA), it should return 2. The problem is that it has to check across both columns. Is there a solution in R? Otherwise bash would also be fine.
A function like the following should work for you:
myFun <- function(v1, v2, indf) {
  x <- sort(c(v1, v2))
  which(apply(indf, 1, function(z) all(sort(z) == x)))
}
The usage would be like this (assuming your data are in a data.frame called "mydf"):
myFun("CVVA", "CVVD", indf = mydf)
myFun("HGFF", "CVVD", indf = mydf)
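A self-contained run of myFun on the five rows shown in the question (the remaining rows are elided there, and the column names col1 and col2 are assumed) gives the expected row numbers:

```r
myFun <- function(v1, v2, indf) {
  x <- sort(c(v1, v2))
  # Sorting each row makes the comparison independent of column order
  which(apply(indf, 1, function(z) all(sort(z) == x)))
}

mydf <- data.frame(col1 = c("FTGS", "CVVA", "HGFF", "CVVD", "OPSF"),
                   col2 = c("JKLP", "CVVA", "CVVD", "HGFF", "WQSR"))

myFun("HGFF", "CVVD", indf = mydf)  # 3 4
myFun("CVVA", "CVVA", indf = mydf)  # 2
myFun("CVVA", "CVVD", indf = mydf)  # integer(0): no matching row
```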
In R, the function it sounds like you are looking for is which, but on its own it won't do what you want directly.
This also seems to work
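fun1 <- function(v1, v2, mat) {
  # which() returns linear (column-major) indices; subtracting nrow(mat) from
  # every second match maps a column-2 hit back to its row number. This assumes
  # each value appears exactly once in column 1 and once in column 2.
  ind <- c(0, -nrow(mat))
  indx1 <- which(mat == v1) + ind
  indx2 <- which(mat == v2) + ind
  if (all(sort(indx1) == sort(indx2))) {
    indx1
  } else NULL
}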
fun1("HGFF","CVVD", mat) #mat is the matrix
#[1] 3 4
fun1("CVVA","CVVD", mat)
#NULL
