Apply function in data frame - r

I have a data frame named Cat with multiple columns. One column, Jan.15_Transaction, holds numeric values. I want to recode it so that any value greater than 0 becomes 1 and everything else becomes 0. I do not want to write a separate if/else for each column, because there are 42 similar columns that need the same logic.
Jan.15_Transaction Feb.15_Transaction
1 1
2 2
3 3
4 4
So I built this function:
myfunc <- function(x){
if(x > 0){
x=1
}
else {
x=0
}
return(x)
}
When I call it like this, it is applied to the first element only:
Cat$Jan.15_Transaction.1<-myfunc(Cat$Jan.15_Transaction)
Warning message:
In if (x > 0) { :
the condition has length > 1 and only the first element will be used
So I tried sapply and got this error below
sapply(Cat$Jan.15_Transaction.1, myfunction(Cat))
Error in match.fun(FUN) : argument "FUN" is missing, with no default

You can use the ifelse function to vectorise (= apply across a vector) an if statement:
myfunc = function (x)
ifelse(x > 0, 1, 0)
Alternatively, you could use the following which is more efficient (but less readable):
myfunc = function (x)
as.integer(x > 0)
Coming back to your original function, your way of writing it is very un-R-like. A more R-like implementation would look like this:
myfunc = function (x)
if (x > 0) 1 else 0
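There is no need for a temporary variable, assignments, or the return statement. Note that this version still handles a single value only; for a whole column, use one of the vectorised versions above.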
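To make the two vectorised versions concrete, here is a minimal sketch on a made-up vector. The values in `x` and the names `myfunc_ifelse` and `myfunc_int` are invented for illustration and are not part of the original question.

```r
# Made-up transaction counts, including a zero, a negative value and an NA
x <- c(3, 0, -2, NA, 7)

# Both functions take a whole vector and return a vector of the same length
myfunc_ifelse <- function(x) ifelse(x > 0, 1, 0)
myfunc_int    <- function(x) as.integer(x > 0)

res_ifelse <- myfunc_ifelse(x)  # numeric result, NA stays NA
res_int    <- myfunc_int(x)     # integer result, NA stays NA
```

Both versions leave missing values as NA; the only difference in the result is its type (numeric versus integer). In R 4.2.0 and later, calling the original `if (x > 0)` version on a whole column is an error rather than a warning.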

I am assuming you want to apply the function to the columns whose names end with '_Transaction'. The base function grepl can select them; the trailing $ anchors the match to the end of the name.
vars <- grepl('_Transaction$', names(df))
df[, vars] <- ifelse(df[, vars] > 0, 1, 0)
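You could also use dplyr as shown below. This generalizes to more complicated functions too. Note that mutate_each() and funs() are deprecated in current dplyr releases, where mutate(across(...)) is the replacement.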
binarizer <- function(x) ifelse(x > 0, 1, 0)
df <- bind_cols(
df %>% select(-ends_with('_Transaction')),
df %>% select(ends_with('_Transaction')) %>%
mutate_each(funs(binarizer))
)
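The same column-wise recoding can also be sketched in base R with lapply, which avoids any package dependency. The data frame below is a toy stand-in with invented values, and `binarizer` here uses `as.integer` rather than `ifelse`; treat it as one possible approach, not the only one.

```r
# Toy stand-in for the real data; column names and values are invented
df <- data.frame(
  id = 1:4,
  Jan.15_Transaction = c(1, 0, 3, 4),
  Feb.15_Transaction = c(0, 2, 0, 4)
)

binarizer <- function(x) as.integer(x > 0)

# "$" anchors the pattern so only names that END in _Transaction match
vars <- grepl("_Transaction$", names(df))

# lapply() runs binarizer on each selected column; assigning into df[vars]
# replaces those columns and leaves the others (here: id) untouched
df[vars] <- lapply(df[vars], binarizer)
```

Because only the matching columns are overwritten, there is no need to split the data frame and bind it back together.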

Related

How to apply own function using lapply?

I have created a custom function to replace values with NA to understand how functions work in R:
replacewithna <- function(x) {
if(x == -99) {
return(NA)
}
else {
return(x)
}
}
I have a dataframe with several columns and values which contain "-99" in certain elements and want to apply the custom function I created to each element. I have been able to do this with a for loop:
for (i in 1:nrow(survey2)) {
for (j in 1:ncol(survey2)) {
survey2[i,j] <- replacewithna2(survey2[i,j], NA)
}
}
However, I can't do the same with a lapply. How can I use my replace function with a function from the apply family like so:
survey1 <- lapply(survey1, replacewithna)
Currently I have the following error: "Error in if (x == -99) { : the condition has length > 1"
Try a vectorized version of your function with ifelse or, like below, with is.na<-.
replacewithna <- function(x) {
is.na(x) <- x == -99
x
}
With ifelse it is a one-liner:
replacewithna <- function(x) ifelse(x == -99, NA, x)
Note
With either function, if survey1 or survey2 is a data.frame, the way to lapply the function and keep the dimensions and the tabular format is
survey1[] <- lapply(survey1, replacewithna)
The square brackets are essential: without them, survey1 is overwritten with the plain list that lapply returns.
Here, you can also use sapply (which returns a vector or a matrix, and might be more appropriate here) with replace:
sapply(survey2, function(x) replace(x, x == -99, NA))
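The effect of the square brackets can be checked on a small example. This is a sketch with invented survey data (the column names `q1` and `q2` are hypothetical); it uses the `is.na<-` version of `replacewithna` from above.

```r
# Invented survey data; -99 marks a missing answer
survey1 <- data.frame(q1 = c(1, -99, 3), q2 = c(-99, 2, 5))

replacewithna <- function(x) {
  is.na(x) <- x == -99  # set to NA wherever the value equals -99
  x
}

# Without []: the result is a plain list of columns
as_list <- lapply(survey1, replacewithna)

# With []: the columns are replaced in place and survey1 stays a data frame
survey1[] <- lapply(survey1, replacewithna)
```

Here lapply calls the function once per column, so `x` is a whole vector each time; that is why the function has to be vectorised rather than built around a scalar `if`.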

Use if else statement for Dummy-Coding in R

I tried to write an if-else statement to recode my variable into a dummy variable.
I know there is the ifelse() function and the fastDummies package, but I tried it this way without success.
Why does this not work? I want to learn and understand R better.
if(df$iscd115==1){
df$iscd1151 <- 1
} else {
df$iscd1151 <- 0
}
This should be a reasonable solution.
First we find the positions of the two columns involved. Then we apply a function over the rows (MARGIN = 1) that checks whether the source column equals 1 and sets the other column to 1 or 0 accordingly.
col1 <- which(names(df) == "iscd115")
col2 <- which(names(df) == "iscd1151")
mat <- apply(df, MARGIN = 1, function(x) {
  if (x[col1] == 1) {
    x[col2] <- 1
  } else {
    x[col2] <- 0  # assign with <-; == would only compare
  }
  x
})
Unfortunately, this transforms the original data frame into a transposed matrix. We can re-transpose the matrix back and turn it back into a data frame with the following.
new_df <- as.data.frame( t(mat))
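As for why the original `if` fails: `if` expects a single TRUE or FALSE, but `df$iscd115 == 1` is a logical vector with one element per row. A vectorised comparison avoids both the `if` and the row-wise `apply`. The sketch below uses an invented `df` with made-up values.

```r
# Invented data; only the column name iscd115 comes from the question
df <- data.frame(iscd115 = c(1, 2, 1, 3, NA))

# The comparison yields TRUE/FALSE/NA per row; as.integer() maps
# TRUE to 1, FALSE to 0, and keeps NA as NA
df$iscd1151 <- as.integer(df$iscd115 == 1)
```

This keeps the data frame intact, so no transposing or conversion back from a matrix is needed.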

How can I use apply properly in R in this dataframe column?

I have a dataframe column that contains NA values, and I want to know how to use apply (or lapply, sapply, ...) on it.
I've tried apply and lapply, but both return an error.
The function I want to apply to the column is:
a.b <- function(x, y = 165){
if (x < y)
return('Good')
else if (x > y)
return('Bad')
}
the column of the dataframe is:
data$col = 180 170 NA NA 185 185
When I use apply I get:
apply(data$col, 2, a.b)
Error in apply(data$col, 2, a.b) :
dim(X) must have a positive length
I tried dim(data$col) and it returns NULL, which I think is because of the NA's.
I also use lapply and I get:
lapply(data$col, a.b)
Error in if (x < y) return("Good") else if (x > y) return("Bad") :
missing value where TRUE/FALSE needed
This is for a course of R for beginners that I am doing so I am sorry if I made some mistakes. Thanks for taking your time to read it and trying to help.
apply is used on a matrix, not a vector. Try:
a.b <- function(x, y = 165){
  if (is.na(x)){
    return("NA")
  } else if (x < y){
    return('Good')
  } else if (x > y){
    return('Bad')
  }
}
data$col=sapply(data$col,a.b)
You should be able to solve this with mapply by specifying the values to pass into your parameters:
mapply(a.b, x = data[,'col'], y = 165)
Note that you may need to modify your a.b() function in order to handle the NA's.
There are a few issues going on here:
apply is meant to run on something with dimensions to act over, which is what the MARGIN argument selects. A single column, which is what you're passing to apply, has no dimensions. See below:
> dim(mtcars)
[1] 32 11
> dim(mtcars$cyl)
NULL
apply and lapply are meant to run over all columns (or rows if you're using that margin for apply). If you want to just replace one column, you should not use apply. Do something like data$my_col <- my_func(data$my_col) if you want to replace my_col with the result of passing it to my_func
NA values do not return TRUE or FALSE when using an operator on them. Note that 7 < NA will return NA. Your if statement is looking for a TRUE or FALSE value but getting an NA value, hence the error in your second attempt. If you want to handle NA values, you may need to incorporate that into your function with is.na.
Your function should be vectorized. See circle 3 of the R-Inferno. Currently, it will just return length 1 vectors of "Good" or "Bad". My hunch is that what you want is similar to the following (although not exactly the same if x == y):
a.b <- function(x, y = 165){
ifelse(x < y, "Good", "Bad")
}
I believe the information above should get you where you want to be.
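Putting these points together, one possible vectorised version is sketched below. It is an assumption about the intended behaviour: values equal to `y` and missing values both come back as NA, since the original function defines no label for either case. The first six values of `col` are the ones from the question; the last two are added here to exercise the other branches.

```r
a.b <- function(x, y = 165) {
  # Nested ifelse(): "Good" below y, "Bad" above y, NA when x == y.
  # A missing x makes the comparison NA, so the result is NA as well.
  ifelse(x < y, "Good", ifelse(x > y, "Bad", NA_character_))
}

col <- c(180, 170, NA, NA, 185, 185, 160, 165)

# No apply-family call is needed: the function handles the whole vector
result <- a.b(col)
```

With a data frame, the call would simply be `data$col <- a.b(data$col)`.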

Optimize code to filter R dataframe

I have some R code that takes in the args string from the command line and then filters a dataframe based on values in a column; the args string contains the column names. Right now I'm doing it by looping through the vector but something tells me that there has to be a better way. Is there a way to optimize this code?
args = c("col1","col2")
for(i in args){
df = df[df[,i]==0,]
}
If I understand correctly, you want to keep the rows where all of the args are equal to 0 (or any other given value).
First get the indices of the columns you're interested in:
idx <- match(args, colnames(df))
Then you can simply do:
df <- df[apply(df[, idx], 1, function(x) all(x == 0)), ]
Another possibility:
df <- df[rowSums(df[, idx] != 0) == 0, ]
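One detail worth guarding against: if `args` holds a single column name, `df[, idx]` drops to a plain vector and both `apply` and `rowSums` fail. The sketch below, on an invented data frame, uses `drop = FALSE` to keep the selection two-dimensional and indexes by name directly instead of going through `match`.

```r
# Invented data; col3 is an extra column that is not filtered on
df <- data.frame(
  col1 = c(0, 1, 0, 0),
  col2 = c(0, 0, 2, 0),
  col3 = c(5, 6, 7, 8)
)
args <- c("col1", "col2")

# Count the non-zero entries per row across the selected columns;
# a row is kept only when that count is zero
keep <- rowSums(df[, args, drop = FALSE] != 0) == 0
filtered <- df[keep, ]

# The same expression still works with one column thanks to drop = FALSE
keep_one <- rowSums(df[, "col1", drop = FALSE] != 0) == 0
```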

R - using apply on numeric matrix with shapiro.test() gives error: all 'x' values are identical

I have a data.frame df with more than 110 000 rows. It looks like this:
traking_id A1_CTRL A2_CTRL A3_CTRL A4_CTRL A5_CTRL A1_DEX A2_DEX A3_DEX A4_DEX A5_DEX
1 ENSMUST00000000001 1.35358e+01 1.03390e+01 1.03016e+01 1.12654e+01 1.22707e+01 1.40684e+01 9.15279e+00 1.17276e+01 1.14550e+01 1.46256e+01
2 ENSMUST00000000003 5.01868e-06 5.59107e-06 1.60922e-01 2.45402e-01 2.18614e-01 2.24124e-01 2.88035e-01 7.18876e-06 1.74746e-06 0.00000e+00
...
I want to run shapiro.test twice for each row: once on the values in columns 2:6 and once on the values in columns 7:11.
I want to end up with two lists of the objects that shapiro.test returns, so that I can extract the p.value element from each. I am trying to do this with apply, but my code
shapiro.test_CTRL <- apply(data.matrix(df[,2:6]), 1, shapiro.test)
returns an error
Error in FUN(newX[, i], ...) : all 'x' values are identical
However, when I use pearson.test everything works fine:
pearson.test_CTRL <- apply(data.matrix(df[,2:6]), 1, pearson.test)
Calculating shapiro.test just for one row also works fine:
shapiro.test(data.matrix(x[1,2:6]))
I would like to know why using apply with shapiro.test the way I did resulted in error and how to correctly do it?
If you look at the source for shapiro.test, it contains these lines:
...
x <- sort(x[complete.cases(x)])
n <- length(x)
if (is.na(n) || n < 3L || n > 5000L)
stop("sample size must be between 3 and 5000")
rng <- x[n] - x[1L]
if (rng == 0)
stop("all 'x' values are identical")
...
This error is triggered when the values in a row are all the same. The same error can be reproduced with this code:
mtcars[2,] <- 1
apply(mtcars[,2:5], 1, shapiro.test)
You can avoid this error by testing for that condition and returning something else:
f <- function(x) {
if (diff(range(x)) == 0) list() else shapiro.test(x)
}
apply(mtcars[,2:5], 1, f)
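Since the goal in the question is the p-value, a variant of this idea is to return the p-value directly and use NA for rows that cannot be tested. The sketch below is one way to do that; the helper name `safe_shapiro_p` and the small matrix `m` are invented for illustration.

```r
# Hypothetical helper: p-value of shapiro.test(), or NA when the row has
# fewer than 3 non-missing values or all values are identical
safe_shapiro_p <- function(x) {
  x <- x[!is.na(x)]
  if (length(x) < 3 || diff(range(x)) == 0) return(NA_real_)
  shapiro.test(x)$p.value
}

# Invented data: one row with varied values, one constant row
m <- rbind(
  varied   = c(1.2, 3.4, 2.2, 5.1, 4.0),
  constant = c(1, 1, 1, 1, 1)
)

# One p-value per row, returned as a named numeric vector
p_values <- apply(m, 1, safe_shapiro_p)
```

Returning a number for every row means apply gives back a plain numeric vector, which is easier to attach to the original data frame than a list of test objects.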
