This sets the values in every column other than columnA to NA when the conditions below are not met (used inside a %>% pipeline):
mutate_at(vars(-columnA), funs(((function(x) {
if (is.logical(x))
return(x)
else if (!is.na(as.numeric(x)))
return(as.numeric(x))
else
return(NA)
})(.))))
How can I achieve the same result using mutate_at and nested ifelse?
For example, this does not produce the same result:
mutate_at(vars(-columnA),funs(ifelse(is.logical(.),.,
ifelse(!is.na(as.numeric(.)),as.numeric(.),NA))))
Update (2018-1-5)
The intent of the question is confusing, in part, due to a misconception I had in regard to what was being passed to the function.
This is what I had intended to write:
mutate_at(vars(-columnA), funs(((function(x) {
for(i in 1:length(x))
{
if(!is.na(as.numeric(x[i])) && !is.logical(x[i]))
{
x[i] <- as.numeric(x[i]);
}
else if(!is.na(x[i]))
{
x[i] <- NA
}
}
return(x)
})(.))))
This is a better solution:
mutate_at(vars(-columnA), function(x) {
if(is.logical(x))
return(x)
return(as.numeric(x))
})
ifelse may not be appropriate in this case, because it returns a value with the same shape as the condition. Here the condition, is.logical(.), has length 1, so the result is a single element (the first element of whichever branch is selected) rather than the whole column that is passed to the function.
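A minimal sketch of that length-1 behaviour, using a made-up character vector x that is not part of the original question:

```r
# Hypothetical column: a character vector with one non-numeric entry.
x <- c("1", "2", "abc")

# is.logical(x) is a single FALSE, so ifelse() returns a single value:
# the first element of the `no` argument.
res <- suppressWarnings(ifelse(is.logical(x), x, as.numeric(x)))

length(res)  # 1, not 3
res          # 1
```

The other two elements of x are silently dropped, which is why the nested ifelse version does not match the if/else version.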
Update (2018-1-6)
Using ifelse, this will return columns that contain logical values or NA as-is and it will apply as.numeric to columns otherwise.
mutate_at(vars(-columnA),funs(
ifelse(. == TRUE | . == FALSE | is.na(.),.,as.numeric(.))))
The main issue is the
else if (!is.na(as.numeric(x)))
return(as.numeric(x))
if/else expects a condition of length 1. When the test has to be evaluated element by element on a vector or column longer than 1, ifelse is the better tool. In the code above, !is.na(as.numeric(x)) returns a logical vector with one element per row (assuming the dataset has more than one row), so it cannot serve as an if condition directly. The way to make it work is to wrap it in all or any (depending on what is needed):
f1 <- function(x) {
if (is.logical(x))
return(x)
else if (all(!is.na(as.numeric(x))))
return(as.numeric(x))
else
return(x) #change if needed
}
df1 %>%
mutate_all(f1)
data
set.seed(24)
df1 <- data.frame(col1 = sample(c(TRUE, FALSE), 10, replace = TRUE),
col2 = c(1:8, "Good", 10), col3 = as.character(1:10),
stringsAsFactors = FALSE)
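As a rough check of what f1 does to each column type, here is a self-contained sketch. It uses a lightly adapted copy of f1 (the conversion is computed once and its warning suppressed), a small hypothetical data frame named type_demo, and base R lapply in place of mutate_all so that it runs without dplyr:

```r
# Adapted copy of f1, repeated so the sketch is self-contained.
f1 <- function(x) {
  if (is.logical(x)) return(x)
  num <- suppressWarnings(as.numeric(x))  # non-numeric strings become NA
  if (all(!is.na(num))) num else x        # convert only if every value parses
}

# Hypothetical data in the same shape as df1 (col1 fixed instead of sampled).
type_demo <- data.frame(col1 = c(TRUE, FALSE, TRUE),
                        col2 = c("1", "Good", "3"),
                        col3 = c("1", "2", "3"),
                        stringsAsFactors = FALSE)

# Base-R counterpart of type_demo %>% mutate_all(f1)
type_demo[] <- lapply(type_demo, f1)
sapply(type_demo, class)
# col1 stays logical, col2 stays character (because of "Good"), col3 becomes numeric
```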
Related
I have created a custom function to replace values with NA to understand how functions work in R:
replacewithna <- function(x) {
if(x == -99) {
return(NA)
}
else {
return(x)
}
}
I have a dataframe with several columns and values which contain "-99" in certain elements and want to apply the custom function I created to each element. I have been able to do this with a for loop:
for (i in 1:nrow(survey2)) {
for (j in 1:ncol(survey2)) {
survey2[i,j] <- replacewithna(survey2[i,j])
}
}
However, I can't do the same with a lapply. How can I use my replace function with a function from the apply family like so:
survey1 <- lapply(survey1, replacewithna)
Currently I have the following error: "Error in if (x == -99) { : the condition has length > 1"
Try a vectorized version of your function with ifelse or, like below, with is.na<-.
replacewithna <- function(x) {
is.na(x) <- x == -99
x
}
With ifelse it is a one-liner:
replacewithna <- function(x) ifelse(x == -99, NA, x)
Note
With either function, if survey1 or survey2 is a data.frame, the way to lapply the function while keeping the dimensions (the tabular format) is
survey1[] <- lapply(survey1, replacewithna)
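The square brackets are very important: assigning into survey1[] keeps the data.frame structure, whereas a plain assignment would replace it with a list.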
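A small sketch of the difference, using a made-up data frame survey_demo (an assumption for illustration, not the asker's data):

```r
replacewithna <- function(x) {
  is.na(x) <- x == -99
  x
}

# Hypothetical survey data with -99 as the missing-value code.
survey_demo <- data.frame(q1 = c(1, -99, 3), q2 = c(-99, 2, 5))

as_list <- lapply(survey_demo, replacewithna)  # plain list, tabular shape lost

kept_df <- survey_demo
kept_df[] <- lapply(kept_df, replacewithna)    # still a data.frame

class(as_list)  # "list"
class(kept_df)  # "data.frame"
```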
Here, you can also use sapply (which returns a vector or a matrix, and might be more appropriate here) with replace:
sapply(survey2, function(x) replace(x, x == -99, NA))
I tried to write an if/else statement to recode my variable into a dummy variable.
I know there is the ifelse() function and the fastDummies package, but I tried it this way without success.
Why does this not work? I want to learn and understand R better.
if(df$iscd115==1){
df$iscd1151 <- 1
} else {
df$iscd1151 <- 0
}
This should be a reasonable solution.
First we find the positions of the two relevant columns. Then we apply a function over the rows (MARGIN = 1) that checks whether the source column is 1 and sets the other column accordingly.
col1 <- which(names(df) == "iscd115")
col2 <- which(names(df) == "iscd1151")
mat <- apply(df, MARGIN = 1, function(x) {
  if (x[col1] == 1) {
    x[col2] <- 1
  } else {
    x[col2] <- 0  # assign with <-, not compare with ==
  }
  x
})
Unfortunately, this transforms the original data frame into a transposed matrix. We can re-transpose the matrix back and turn it back into a data frame with the following.
new_df <- as.data.frame( t(mat))
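The original if fails because df$iscd115 == 1 is a whole vector, while if expects a single TRUE or FALSE. A vectorized sketch that avoids the row-wise apply and the transpose, using a made-up data frame dummy_demo:

```r
# Hypothetical data; only the column name iscd115 comes from the question.
dummy_demo <- data.frame(iscd115 = c(1, 0, 2, 1))

# The comparison is evaluated for every row at once.
dummy_demo$iscd1151 <- ifelse(dummy_demo$iscd115 == 1, 1, 0)

dummy_demo$iscd1151  # 1 0 0 1
```

as.integer(dummy_demo$iscd115 == 1) would give the same 0/1 coding as integers.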
R has problems when reading .csv files with column names that begin with a number; it changes these names by putting an "X" as the first character.
I am trying to write a function which simply solves this problem (although: is this the easiest way?)
As an example file, I simply created two new (non-sensical) columns in iris:
iris$X12.0 <- iris$Sepal.Length
iris$X18.0 <- iris$Petal.Length
remv.X <- function(x){
if(substr(colnames(x), 1, 1) == "X"){
colnames(x) <- substr(colnames(x), 2, 100)
}
else{
colnames(x) <- substr(colnames(x), 1, 100)
}
}
remv.X(iris)
When printing, I get a warning, and nothing changes.
What am I doing wrong?
check.names=FALSE
Use the read.table/read.csv argument check.names = FALSE to turn off column name mangling.
For example,
read.csv(text = "1x,2x\n10,20", check.names = FALSE)
giving:
  1x 2x
1 10 20
Removing X using sub
If for some reason you did have an unwanted X character at the beginning of some column names, it could be removed like this. This only removes an X at the beginning of column names for which the next character is a digit. If the next character is not a digit, or if there is no next character, the column name is left unchanged.
names(iris) <- sub("^X(\\d.*)", "\\1", names(iris))
or as a function:
rmX <- function(data) setNames(data, sub("^X(\\d.*)", "\\1", names(data)))
# test
iris <- rmX(iris)
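To see which names the pattern touches, here is a short sketch on a hypothetical vector of names (the vector nms is made up for illustration):

```r
# Hypothetical column names covering each case described above.
nms <- c("X12.0", "X18.0", "X", "Xylophone", "Sepal.Length")

cleaned <- sub("^X(\\d.*)", "\\1", nms)
cleaned
# "12.0" "18.0" "X" "Xylophone" "Sepal.Length"
```

Only the names where a digit follows the leading X are changed.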
Problem with code in question
There are two problems with the code in the question.
1. In if (condition) ... the condition is a vector but must be a scalar.
2. The data frame is never returned.
Here it is fixed up. We have also factored out the LHS of the two legs of the if.
remv.X2 <- function(x) {
for (i in seq_along(x)) {
colnames(x)[i] <- if (substr(colnames(x)[i], 1, 1) == "X") {
substr(colnames(x)[i], 2, 100)
} else {
substr(colnames(x)[i], 1, 100)
}
}
x
}
iris <- remv.X2(iris)
or maybe even:
remv.X3 <- function(x) {
setNames(x, substr(colnames(x), (substr(colnames(x), 1, 1) == "X") + 1, 100))
}
iris <- remv.X3(iris)
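The one-liner in remv.X3 relies on a logical vector being coerced to 0/1 when 1 is added to it, and on substr accepting a vector of start positions. A step-by-step sketch on a made-up vector of names:

```r
# Hypothetical column names.
nms <- c("X12.0", "Sepal.Length", "X18.0", "Xylophone")

starts_with_X <- substr(nms, 1, 1) == "X"  # TRUE FALSE TRUE TRUE
start <- starts_with_X + 1                 # 2 1 2 2
trimmed <- substr(nms, start, 100)
trimmed
# "12.0" "Sepal.Length" "18.0" "ylophone"
```

Unlike the sub version, this drops any leading X, whether or not a digit follows it.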
There is a table which has two columns with each column having the type character. It is:
"FTGS" "JKLP"
"CVVA" "CVVA"
"HGFF" "CVVD"
"CVVD" "HGFF"
"OPSF" "WQSR"
...
Can somebody tell me how I would write a function that returns the index (row number) of a specific combination of strings in columns 1 and 2? If I pass (HGFF, CVVD) to the function, it should return 3 and 4 (it does not matter whether HGFF or CVVD is in column 1 or column 2). If I pass (CVVA, CVVA), it should return 2. The problem is that it has to check across two columns. Is there a solution in R? Otherwise bash would also be fine.
A function like the following should work for you:
myFun <- function(v1, v2, indf) {
x <- sort(c(v1, v2))
which(apply(indf, 1, function(z) all(sort(z) == x)))
}
The usage would be like this (assuming your data are in a data.frame called "mydf"):
myFun("CVVA", "CVVD", indf = mydf)
myFun("HGFF", "CVVD", indf = mydf)
In R, the function that it sounds like you are looking for is which, but it won't do what you are looking for directly.
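One way which can be combined with vectorized comparisons to get there is sketched below. The helper name find_pair and the column names V1 and V2 are assumptions made for the example; the data are the five rows shown in the question.

```r
# Sample rows from the question; column names are hypothetical.
mydf <- data.frame(V1 = c("FTGS", "CVVA", "HGFF", "CVVD", "OPSF"),
                   V2 = c("JKLP", "CVVA", "CVVD", "HGFF", "WQSR"),
                   stringsAsFactors = FALSE)

# Hypothetical helper: match the pair in either column order.
find_pair <- function(v1, v2, indf) {
  a <- indf[[1]]
  b <- indf[[2]]
  which((a == v1 & b == v2) | (a == v2 & b == v1))
}

find_pair("HGFF", "CVVD", mydf)  # 3 4
find_pair("CVVA", "CVVA", mydf)  # 2
find_pair("CVVA", "CVVD", mydf)  # integer(0), no such row
```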
This also seems to work
fun1 <- function(v1, v2, mat) {
ind <- c(0, -nrow(mat))
indx1 <- which(mat == v1) + ind
indx2 <- which(mat == v2) + ind
if (all(sort(indx1) == sort(indx2))) {
indx1
} else NULL
}
fun1("HGFF","CVVD", mat) #mat is the matrix
#[1] 3 4
fun1("CVVA","CVVD", mat)
#NULL
Basically, I have a matrix, and for each row that contains "A" I want to append a "1" to a list; otherwise I want to append a "0".
The code is as follows:
is.there.A <- function(a,b,c,d,e) {
library(combinat)
x <- c(a,b,c,d,e)
matrix <- matrix(combn(x,3), ncol=3, byrow=T)
row <- nrow(matrix)
list <- list()
for (i in seq(row)) {
if (matrix[i,] %in% "A") {c(list, "1")}
else {c(list, "0")}
print(list)
}
}
But it doesn't work and this shows up.
Warning messages:
1: In if (matrix[i, ] %in% "A") { :
the condition has length > 1 and only the first element will be used
The question is how to overcome this to achieve the objective
You can avoid your explicit loop by using apply
is.there.A <- function(a,b,s,d,e) {
library(combinat)
x <- c(a,b,s,d,e)
.matrix <- matrix(combn(x,3), ncol=3, byrow=T)
any_A <- apply(.matrix, 1, `%in%`, x = 'A')
as.list(as.numeric(any_A))
}
Never grow an object within a for loop; pre-allocate, then fill.
Avoid naming objects after existing functions (e.g. c, matrix or list).
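If a loop is kept anyway, the pre-allocate-then-fill advice could look roughly like this. It is a sketch that assumes utils::combn (shipped with R) rather than the combinat package, and the names combos and has_A are made up:

```r
x <- c("A", "B", "C", "D", "E")
combos <- t(combn(x, 3))  # one 3-element combination per row

# Pre-allocate the result list, then fill it by index.
has_A <- vector("list", nrow(combos))
for (i in seq_len(nrow(combos))) {
  has_A[[i]] <- if ("A" %in% combos[i, ]) "1" else "0"
}

unlist(has_A)
# "1" "1" "1" "1" "1" "1" "0" "0" "0" "0"
```

Here the condition "A" %in% combos[i, ] has length 1, so if does not complain.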
You meant to test for "A" %in% matrix[i,], not the other way around. However, note that
row <- nrow(matrix)
list <- list()
for (i in seq(row)) {
if ("A" %in% matrix[i,]) {c(list, "1")}
else {c(list, "0")}
}
can be rewritten as
rowSums(matrix == "A") > 0
It returns a vector of logicals (TRUE/FALSE) which is the most appropriate output for your function. However, if you really need a list of '1' or '0', you can wrap it as follows:
as.list(ifelse(rowSums(matrix == "A") > 0, "1", "0"))
Also note that it is a bad idea to name an object matrix since it is also the name of a function in R.