Use if else statement for Dummy-Coding in R


I tried to write an if/else statement to recode my variable into a dummy variable.
I know there is the ifelse() function and the fastDummies package, but I tried it this way without success.
Why does this not work? I want to learn and understand R better.
if(df$iscd115==1){
df$iscd1151 <- 1
} else {
df$iscd1151 <- 0
}

The original code does not work because if() expects a single TRUE or FALSE, while df$iscd115 == 1 produces one value per row. The following should be a reasonable solution.
First we find the positions of the two relevant columns. Then we apply a function over the rows (MARGIN = 1) that checks whether the first column is 1 and sets the second column to 1 or 0 accordingly.
col1 <- which(names(df) == "iscd115")
col2 <- which(names(df) == "iscd1151")  # this column must already exist in df
mat <- apply(df, MARGIN = 1, function(x) {
  if (x[col1] == 1) {
    x[col2] <- 1
  } else {
    x[col2] <- 0  # assignment, not a comparison
  }
  x
})
Unfortunately, apply returns the result as a transposed matrix rather than a data frame. We can transpose it back and convert it to a data frame as follows.
new_df <- as.data.frame( t(mat))
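For comparison, here is a minimal vectorized sketch that avoids apply altogether. The example data are hypothetical; only the column names iscd115 and iscd1151 come from the question (iscd1151_alt is an illustrative extra name).

```r
# Hypothetical example data; only the column name iscd115 is from the question.
df <- data.frame(iscd115 = c(1, 0, 2, 1, NA))

# The comparison yields one TRUE/FALSE (or NA) per row;
# as.integer() maps TRUE to 1 and FALSE to 0, and keeps NA as NA.
df$iscd1151 <- as.integer(df$iscd115 == 1)

# The same recode with ifelse(), which also works element by element.
df$iscd1151_alt <- ifelse(df$iscd115 == 1, 1, 0)
```

Both versions leave missing values as NA, which may or may not be what you want for a dummy variable.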

Related

Use mutate_at with nested ifelse

The following sets values in every column except columnA to NA when they do not meet the conditions (it is used in a %>% pipeline).
mutate_at(vars(-columnA), funs(((function(x) {
if (is.logical(x))
return(x)
else if (!is.na(as.numeric(x)))
return(as.numeric(x))
else
return(NA)
})(.))))
How can I achieve the same result using mutate_at and nested ifelse?
For example, this does not produce the same result:
mutate_at(vars(-columnA),funs(ifelse(is.logical(.),.,
ifelse(!is.na(as.numeric(.)),as.numeric(.),NA))))
Update (2018-1-5)
The intent of the question is confusing, in part, due to a misconception I had in regard to what was being passed to the function.
This is what I had intended to write:
mutate_at(vars(-columnA), funs(((function(x) {
for(i in 1:length(x))
{
if(!is.na(as.numeric(x[i])) && !is.logical(x[i]))
{
x[i] <- as.numeric(x[i]);
}
else if(!is.na(x[i]))
{
x[i] <- NA
}
}
return(x)
})(.))))
This is a better solution:
mutate_at(vars(-columnA), function(x) {
if(is.logical(x))
return(x)
return(as.numeric(x))
})
ifelse may not be appropriate in this case, because it returns a value with the same shape as the condition. Here the condition is.logical(.) has length 1 (a single TRUE or FALSE for the whole column), so the return value is only the first element of the column that is passed to the function.
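A small base R sketch of this length-1 behaviour, using a hypothetical character vector x in place of a data frame column:

```r
# Hypothetical column of numbers stored as text.
x <- c("1", "2", "3")

# The test is a single FALSE, so ifelse() returns a single value:
# the first element of as.numeric(x).
res_scalar <- ifelse(is.logical(x), x, as.numeric(x))

# With an element-wise test, the result has one value per element.
res_vector <- ifelse(x == "2", "two", x)
```

Here res_scalar has length 1, while res_vector has the same length as x.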
Update (2018-1-6)
Using ifelse, this will return columns that contain logical values or NA as-is and it will apply as.numeric to columns otherwise.
mutate_at(vars(-columnA),funs(
ifelse(. == TRUE | . == FALSE | is.na(.),.,as.numeric(.))))

The main issue is this branch:
else if (!is.na(as.numeric(x)))
return(as.numeric(x))
if/else works on a condition of length 1. When the function is applied to a column with more than one element, !is.na(as.numeric(x)) returns a logical vector with one element per row, which if cannot handle properly (older versions of R use only the first element and warn; R 4.2.0 and later raise an error). For element-wise decisions, ifelse is the better tool. To make one decision for the whole column, wrap the condition in all or any (depending on what we need):
f1 <- function(x) {
if (is.logical(x))
return(x)
else if (all(!is.na(as.numeric(x))))
return(as.numeric(x))
else
return(x) #change if needed
}
library(dplyr)
df1 %>%
  mutate_all(f1)
Data:
set.seed(24)
df1 <- data.frame(col1 = sample(c(TRUE, FALSE), 10, replace = TRUE),
col2 = c(1:8, "Good", 10), col3 = as.character(1:10),
stringsAsFactors = FALSE)
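The same idea can be sketched in base R without dplyr. This is an illustration under assumptions, not the answer's exact code: the small data frame is hypothetical, and suppressWarnings is added only to silence the coercion warning that as.numeric emits for text such as "Good".

```r
# Convert a column to numeric only if every element converts cleanly.
f1 <- function(x) {
  if (is.logical(x)) return(x)
  num <- suppressWarnings(as.numeric(x))  # non-numeric strings become NA
  if (all(!is.na(num))) num else x        # one decision for the whole column
}

# Hypothetical data: a logical, a mixed, and an all-numeric text column.
df1 <- data.frame(col1 = c(TRUE, FALSE, TRUE),
                  col2 = c("1", "Good", "3"),
                  col3 = c("1", "2", "3"),
                  stringsAsFactors = FALSE)

# Apply f1 to every column while keeping the data frame structure.
df1[] <- lapply(df1, f1)
col_classes <- sapply(df1, class)
```

After this, col1 stays logical, col2 stays character because "Good" cannot be converted, and col3 becomes numeric.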

Optimize code to filter R dataframe

I have some R code that takes in the args string from the command line and then filters a dataframe based on values in a column; the args string contains the column names. Right now I'm doing it by looping through the vector but something tells me that there has to be a better way. Is there a way to optimize this code?
args = c("col1","col2")
for(i in args){
df = df[df[,i]==0,]
}

If I understand correctly, you want to keep the rows where all of the args are equal to 0 (or any other given value).
First get the indices of the columns you're interested in:
idx <- match(args, colnames(df))
Then you can simply do:
# drop = FALSE keeps a data frame even when args names a single column
df <- df[apply(df[, idx, drop = FALSE], 1, function(x) all(x == 0)), ]
Another possibility:
df <- df[rowSums(df[, idx, drop = FALSE] != 0) == 0, ]
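As a quick check, the sketch below runs the rowSums approach and the original loop on a small hypothetical data frame (the values and the extra column col3 are made up for illustration) and keeps both results for comparison.

```r
# Hypothetical data; col1 and col2 are the columns named in args.
df <- data.frame(col1 = c(0, 1, 0, 0),
                 col2 = c(0, 0, 2, 0),
                 col3 = 1:4)
args <- c("col1", "col2")

# Vectorized filter: keep rows where no selected column is non-zero.
keep <- rowSums(df[, args, drop = FALSE] != 0) == 0
filtered <- df[keep, ]

# The loop from the question, for comparison.
looped <- df
for (i in args) {
  looped <- looped[looped[, i] == 0, ]
}
```

Both keep only the first and fourth rows. Note that neither version handles NA in the selected columns specially, so missing values would need separate treatment.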

Fill data.frame with missing columns

I have the following function taken from R: iterative outliers detection (this is an updated version):
dropout<-function(x) {
outliers <- NULL
res <- NULL
if(length(x)<2) return (1)
vals <- rep.int(1, length(x))
r <- chisq.out.test(x)
while (r$p.value<.05 & sum(vals==1)>2) {
if (grepl("highest",r$alternative)) {
d <- which.max(ifelse(vals==1,x, NA))
res <- rbind(list(as.numeric(strsplit(r$alternative," ")[[1]][3]),as.numeric(r$p.value)),fill=TRUE)
}
else {
d <- which.min(ifelse(vals==1, x, NA))
}
vals[d] <- r$p.value
r <- chisq.out.test(x[vals==1])
}
return(res)
}
The problem is that in each round it gives me some missing rows when filling the data.frame.
I want to fill res, but in some iterations it contains missing values.
I have tried everything I could think of, e.g. rbindlist, rbind.fill, and rbind (with fill=TRUE), but nothing works.
When I do something like:
res <- c(res,as.numeric(strsplit(r$alternative," ")[[1]][3]),as.numeric(r$p.value))
it works, but it creates 2 rows for each set of (V1, V2): one with r$alternative in the last column, and a second row with the same first 2 columns but with the p-value in the last column instead.
This is how I'm calling the function, on data similar to that in the question mentioned above:
outliers <- d[, dropout(V3), list(V1, V2)]
and I always get this error: j doesn't evaluate to the same number of columns for each group

How to set a column value based on values in another column in R

I am trying to add a new column based on values in another column. (Basically, if the other column is missing or 0, set the new value to 0; otherwise set it to 1.)
What's wrong with this code below?
times=nrow(eachfile)
for(i in 1:times)
{eachfile$SalesCycleN0[i] <- ifelse(eachfile$R[i]==NA | eachfile$R[i]==0,0,1 ) }
table(eachfile$SalesCycleN0)

As long as you have tested that the column only contains 0, 1 and NA I would do:
eachfile$SalesCycleN0 <- 1
eachfile$SalesCycleN0[is.na(eachfile$R) | eachfile$R==0] <- 0

Nothing is ever "==" to NA. Just do this (no loop):
eachfile$SalesCycleN0 <- ifelse( is.na(eachfile$R) | eachfile$R==0, 0,1 )
If you want slightly more compact code, a grepl-based variant can also work. Note that grepl returns FALSE for NAs, so missing values need an explicit is.na() check to be coded as 0:
eachfile$SalesCycleN0 <- as.numeric( !is.na(eachfile$R) & !grepl("^0$", eachfile$R) )
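The point about comparisons with NA can be checked on a small vector. This is a minimal sketch with hypothetical values standing in for eachfile$R:

```r
# Hypothetical values standing in for eachfile$R.
R <- c(3, 0, NA, 7)

# Comparing with == NA never yields TRUE or FALSE, only NA.
cmp <- R == NA

# is.na() detects missing values; TRUE | NA evaluates to TRUE,
# so the missing element is caught by the first condition.
flag <- ifelse(is.na(R) | R == 0, 0, 1)
```

Every element of cmp is NA, while flag is 0 for the zero and the missing value and 1 otherwise.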

Another option is to use the sapply function rather than an explicit for loop, although the vectorized versions above will generally be faster on a huge dataset. Here is an example:
df = data.frame(x = c(1,2,0,NA,5))
# TRUE when the value is neither missing nor 0
fun = function(i) {!(is.na(df$x[i]) || (df$x[i] == 0))}
bin <- (sapply(1:nrow(df), FUN = fun))*1 ## multiplying by 1 will convert the logical vector to a binary one.
df <- cbind(df, bin)
In your case:
fun = function(i) {!(is.na(eachfile$R[i]) || (eachfile$R[i] == 0))}
bin <- (sapply(seq_len(nrow(eachfile)), FUN = fun))*1
eachfile$SalesCycleN0 <- bin

Check if something is in each row (row length>1)

Basically, I have a matrix, and for each row that contains "A" I want to append a "1" to a list; otherwise I want to append a "0".
The code is as follows:
is.there.A <- function(a,b,c,d,e) {
library(combinat)
x <- c(a,b,c,d,e)
matrix <- matrix(combn(x,3), ncol=3, byrow=T)
row <- nrow(matrix)
list <- list()
for (i in seq(row)) {
if (matrix[i,] %in% "A") {c(list, "1")}
else {c(list, "0")}
print(list)
}
}
But it doesn't work and this shows up.
Warning messages:
1: In if (matrix[i, ] %in% "A") { :
the condition has length > 1 and only the first element will be used
How can I overcome this and achieve the objective?

You can avoid your explicit loop by using apply
is.there.A <- function(a,b,s,d,e) {
library(combinat)
x <- c(a,b,s,d,e)
.matrix <- matrix(combn(x,3), ncol=3, byrow=T)
any_A <- apply(.matrix, 1, `%in%`, x = 'A')
as.list(as.numeric(any_A))
}
Never grow an object within a for loop; pre-allocate it, then fill it.
Avoid naming objects after functions (e.g. c, matrix, or list).
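If an explicit loop is still preferred, the pre-allocate-then-fill advice might look like the sketch below. The matrix m and the vector flags are hypothetical names chosen to avoid clashing with function names.

```r
# Hypothetical 3-row character matrix standing in for the combinations.
m <- matrix(c("A", "B", "C",
              "B", "C", "D",
              "A", "D", "E"), ncol = 3, byrow = TRUE)

# Pre-allocate the result once, then fill it by index.
flags <- character(nrow(m))
for (i in seq_len(nrow(m))) {
  # "A" %in% row gives a single TRUE/FALSE, which is what if() needs.
  flags[i] <- if ("A" %in% m[i, ]) "1" else "0"
}
```

Assigning into flags[i] stores each result, whereas c(list, "1") in the question builds a new object and then discards it.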

You meant to test for "A" %in% matrix[i,], not the other way around. However, note that
row <- nrow(matrix)
list <- list()
for (i in seq(row)) {
if ("A" %in% matrix[i,]) {c(list, "1")}
else {c(list, "0")}
}
can be rewritten
rowSums(matrix == "A") > 0
It returns a vector of logicals (TRUE/FALSE) which is the most appropriate output for your function. However, if you really need a list of '1' or '0', you can wrap it as follows:
as.list(ifelse(rowSums(matrix == "A") > 0, "1", "0"))
Also note that it is a bad idea to name an object matrix since it is also the name of a function in R.
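To see the rowSums approach end to end, here is a short sketch on hypothetical input letters. It uses combn from the utils package that ships with R rather than combinat, and the object name combos is an assumption chosen to avoid masking matrix.

```r
# Hypothetical input values; the question passes five such arguments.
x <- c("A", "B", "C", "D", "E")

# utils::combn returns one combination per column, so transpose
# to get one combination per row.
combos <- t(combn(x, 3))

# TRUE for every row that contains "A".
has_A <- rowSums(combos == "A") > 0

# The same check row by row, for comparison.
has_A_apply <- apply(combos, 1, function(r) "A" %in% r)
```

Of the 10 combinations of 3 letters out of 5, the 6 that include "A" are flagged TRUE by both versions.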
