I am performing data analysis and cleaning in R using the tidyverse.
I have a data frame with 23 columns containing the values 'NO', 'STEADY', 'UP' and 'down'.
I want to change all the values in these 23 columns to 0 for 'NO' and 'STEADY', and to 1 otherwise.
I created a character vector named keys that holds all my column names, and then used a for loop with ifelse and mutate.
Please have a look at the code below:
# Column names are kept in the character vector keys
keys = c('metformin', 'repaglinide', 'nateglinide', 'chlorpropamide', 'glimepiride',
'glipizide', 'glyburide', 'pioglitazone', 'rosiglitazone', 'acarbose', 'miglitol',
'insulin', 'glyburide-metformin', 'tolazamide', 'metformin-pioglitazone',
'metformin-rosiglitazone', 'glimepiride-pioglitazone', 'glipizide-metformin',
'troglitazone', 'tolbutamide', 'acetohexamide')
Then I used the following code to get the desired result:
for (col in keys){
Dataset = Dataset %>%
mutate(col = ifelse(col %in% c('No','Steady'),0,1)) }
I expected this to make the changes I need, but nothing happens (no error message and no desired result).
After researching further, I ran the following code:
for (col in keys){
print(col)}
It prints the elements of keys as character strings, e.g. "metformin".
So I thought this might be the issue, and used the code below to cast the keys as symbols:
keys_new = sym(keys)
Then I ran the same loop again:
for (col in keys_new){
Dataset = Dataset %>%
mutate(col = ifelse(col %in% c('No','Steady'),0,1))}
This gives the following error:
Error in match(x, table, nomatch = 0L) :
'match' requires vector arguments
I also tried writing a function to get the desired result, but that didn't work either:
change = function(name){
Dataset = Dataset %>%
mutate(name = ifelse(name %in% c('No','Steady'),0,1),
name = as.factor(name))
return(Dataset)}
for (col in keys){
change(col)}
This didn't do anything either (no error message and no desired result).
When I used keys_new in this code instead:
for (col in keys_new){
change(col)}
I got the same error:
Error in match(x, table, nomatch = 0L) :
'match' requires vector arguments
Any guidance would be appreciated.
If the data frame contains only these columns, there's no need to loop or keep track of column names. You can use mutate_all:
Dataset %>%
mutate_all(~ifelse(. %in% c('No','Steady'), 0, 1))
Another way, thanks to Rui Barradas:
Dataset %>%
mutate_all(~as.integer(!. %in% c('No','Steady')))
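A note on the second version: in R, `!` has lower precedence than `%in%`, so `!. %in% c('No','Steady')` is read as `!(. %in% c('No','Steady'))`, and as.integer() turns the resulting TRUE/FALSE into 1/0. A minimal check on a small made-up tibble (the data are an assumption, not the asker's):

```r
library(dplyr)

# Hypothetical toy data using the spellings from the answers above
Dataset <- tibble(
  metformin = c("No", "Steady", "Up", "Down"),
  insulin   = c("Up", "No", "Steady", "Down")
)

# `!. %in% x` parses as `!(. %in% x)`: TRUE for anything not in x
recoded <- Dataset %>%
  mutate_all(~ as.integer(!. %in% c("No", "Steady")))

recoded
```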
There's a simpler way using mutate_at and case_when:
Dataset %>% mutate_at(keys, ~case_when(. %in% c("NO", "STEADY") ~ 0, TRUE ~ 1))
mutate_at mutates only the columns named in keys. case_when then replaces each value according to the first condition it matches: 0 for "NO" or "STEADY", and 1 for everything else (the TRUE branch).
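In dplyr 1.0 and later, mutate_at() and mutate_all() are superseded by across(). Below is a sketch of the same recode with across(), assuming dplyr >= 1.0. The toupper() call is an addition not in the answer above: %in% is case-sensitive, and the question mixes 'NO' with 'down', so upper-casing first matches values regardless of case. The toy data, including the age column, are hypothetical:

```r
library(dplyr)

# Hypothetical toy data; "age" stands in for a column that must not be recoded
Dataset <- tibble(
  age                   = c(40, 55, 63),
  metformin             = c("NO", "STEADY", "UP"),
  `glyburide-metformin` = c("down", "No", "STEADY")
)
keys <- c("metformin", "glyburide-metformin")

recoded <- Dataset %>%
  mutate(across(all_of(keys),
                # compare upper-cased values so "No", "NO" and "no" all match
                ~ case_when(toupper(.x) %in% c("NO", "STEADY") ~ 0,
                            TRUE ~ 1)))
```

all_of(keys) raises an error if any name in keys is missing from the data, which catches typos in the column list early.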
This answer shows how to use mutate inside a for loop.
I don't have your data, so I made my own: I turned keys into a tibble with enframe, spread it into columns so that each column holds its row number as its value, and then checked whether each value is greater than 10.
To use a column name stored in a variable inside mutate, you have to use !! together with :=.
library(tidyverse)
df <- enframe(c('metformin', 'repaglinide', 'nateglinide', 'chlorpropamide', 'glimepiride',
'glipizide', 'glyburide', 'pioglitazone', 'rosiglitazone', 'acarbose', 'miglitol',
'insulin', 'glyburide-metformin', 'tolazamide', 'metformin-pioglitazone',
'metformin-rosiglitazone', 'glimepiride-pioglitazone', 'glipizide-metformin',
'troglitazone', 'tolbutamide', 'acetohexamide')
) %>% spread(key = value,value = name)
keys = c('metformin', 'repaglinide', 'nateglinide', 'chlorpropamide', 'glimepiride',
'glipizide', 'glyburide', 'pioglitazone', 'rosiglitazone', 'acarbose', 'miglitol',
'insulin', 'glyburide-metformin', 'tolazamide', 'metformin-pioglitazone',
'metformin-rosiglitazone', 'glimepiride-pioglitazone', 'glipizide-metformin',
'troglitazone', 'tolbutamide', 'acetohexamide')
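for (col in keys){
  df = df %>%
    # !!col := names the new column after the string in col;
    # .data[[col]] extracts that column as a plain vector
    mutate(!!col := ifelse(.data[[col]] > 10, 0, 100))
}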
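Applied to the original question: inside mutate, `col = ...` creates (or overwrites) a column literally named col, and `col %in% c('No','Steady')` compares the string "metformin" itself, not the column's values. The change() function has one more problem: its return value is never assigned back to Dataset. A sketch of the actual recode using the same !! / := idea, with a small hypothetical Dataset standing in for the real one:

```r
library(dplyr)

# Hypothetical stand-in for the asker's data
Dataset <- tibble(
  metformin = c("No", "Steady", "Up"),
  insulin   = c("Down", "No", "Steady")
)
keys <- c("metformin", "insulin")

for (col in keys) {
  # assign the result back, otherwise the change is lost
  Dataset <- Dataset %>%
    mutate(!!col := ifelse(.data[[col]] %in% c("No", "Steady"), 0, 1))
}
```

The same fix carries over to the function approach: a hypothetical two-argument change(data, name) would use .data[[name]] internally and be called as Dataset <- change(Dataset, col).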
Related
I am trying to add a new column, currency, to the data frame myfile.
The contents of that column are conditional: if the year column meets a condition, the new column gets one value, otherwise another.
When I tried if/else without a loop, it complained that the condition has length > 1, so I guessed if/else can't work on a vector with multiple elements and that I could use a for loop instead. But then this error appeared:
myfile$currency <- myfile %>% for (i in year) {if(year>2000){print("Latest")}else{"Oldest"}}
Error in for (. in i) year : 4 arguments passed to 'for' which requires 3
You can use ifelse, which is vectorized, inside mutate. See the dplyr documentation.
library(dplyr)
myfile <- myfile %>%
mutate(
currency = ifelse(year > 2000, "latest", "oldest")
)
If you have more conditions, see case_when.
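A case_when() sketch with more than two outcomes; the 1990 cut-off and the "middle" label are invented purely for illustration. Conditions are checked top to bottom, and the first match wins:

```r
library(dplyr)

# Hypothetical data standing in for myfile
myfile <- tibble(year = c(1985, 1995, 2005))

myfile <- myfile %>%
  mutate(currency = case_when(
    year > 2000 ~ "latest",
    year > 1990 ~ "middle",  # only reached when year <= 2000
    TRUE        ~ "oldest"   # everything else
  ))
```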
Or, in base R, you can use logical indexing:
myfile$currency[myfile$year > 2000] <- "latest"
myfile$currency[myfile$year <= 2000] <- "oldest"
I'm trying to loop through the column names of a data frame.
I have a dataset, testset, which has about 100 columns. To simplify, I'll call these columns a, b, c, and so on.
I wrote a function that recodes the values of a column of testset:
mutatefun<-function(i){
i1<-rlang::quo_name(enquo(i))
output<-testset%>%mutate(!! i1:=case_when({{i}} %in% c(1)~0,
{{i}} %in% c(2)~50,
{{i}} %in% c(3)~100,
TRUE~NA_real_))
return(output)
}
When I ran mutatefun(a), it worked. Now I want to apply this function to multiple columns of testset.
With the help of the post "R: looping through column names", I created varlist and ran this code:
varlist<-c("a","c","e","k")
for(i in varlist){
output<-mutatefun({{i}})}
However, it didn't return the expected results; instead, column a came back as NA. I think the function reads i as a character string rather than as a column of testset.
Would you let me know how to solve this problem?
Try using map_dfc with transmute:
library(dplyr)
purrr::map_dfc(varlist, ~{
var <- sym(.x)
transmute(testset, !!.x := case_when(!!var == 1~0,
!!var == 2~50,
!!var == 3~100,
TRUE~ NA_real_))
})
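If you also want to keep the columns that are not in varlist, you can add
%>% bind_cols(testset %>% select(-all_of(varlist)))
at the end of the chain (all_of() avoids tidyselect's warning about selecting with an external character vector).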
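Alternatively, assuming dplyr >= 1.0, mutate() with across() recodes only the columns in varlist and leaves the others in place, so bind_cols() is not needed and the original column order is preserved. The testset below is a small made-up stand-in:

```r
library(dplyr)

# Hypothetical stand-in for testset
testset <- tibble(
  a = c(1, 2, 3, 4),
  b = c(9, 9, 9, 9),
  c = c(3, 2, 1, NA)
)
varlist <- c("a", "c")

output <- testset %>%
  mutate(across(all_of(varlist), ~ case_when(
    .x == 1 ~ 0,
    .x == 2 ~ 50,
    .x == 3 ~ 100,
    TRUE    ~ NA_real_   # anything else, including NA, becomes NA
  )))
```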
I have two datasets that I'm trying to join, but the column I'm joining by doesn't match exactly between them. In the first file the column looks like 00:01:54:2145, with a leading 00: on every observation. I want to convert every value in this column to the format 01/54/2145.
I have tried several things with the stringr package, but can't get it to work.
df1 <- df %>%
str_replace_all("00:")
I'm getting this warning, but I don't think it's the only problem:
argument is not an atomic vector; coercing
Thank you
library(stringr)
library(dplyr)
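# str_replace and str_replace_all are already vectorized, so Vectorize() is not needed
my_conversion <- function(str) {
  str_replace(str, "^00:", "") %>%   # drop the leading "00:"
    str_replace_all(":", "/")        # turn the remaining ":" into "/"
}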
df <- data.frame(
a_column = 1:3, key_column = c("00:01:54:2145", "00:01:54:2145", "00:01:54:2145"))
df %>% mutate(key_column = my_conversion(key_column))
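Once the keys share a format, the join itself is straightforward. A sketch with two hypothetical data frames; str_remove(x, pattern) is stringr's shorthand for str_replace(x, pattern, ""):

```r
library(dplyr)
library(stringr)

# Hypothetical data: df1 has the "00:"-prefixed keys, df2 the slash format
df1 <- data.frame(key_column = c("00:01:54:2145", "00:02:10:0007"),
                  x = c(10, 20), stringsAsFactors = FALSE)
df2 <- data.frame(key_column = c("01/54/2145", "02/10/0007"),
                  y = c("a", "b"), stringsAsFactors = FALSE)

# Drop the leading "00:", then replace the remaining ":" with "/"
df1_clean <- df1 %>%
  mutate(key_column = str_replace_all(str_remove(key_column, "^00:"), ":", "/"))

joined <- inner_join(df1_clean, df2, by = "key_column")
```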
My data frame, dat, has two columns that look like this:
value condition
2 learning/cat
4 learning/dog
1 naming/cat
6 naming/dog
I would like to 'trim' the data frame to only include rows in which condition contains "naming".
I've tried to do this with grep:
dat = dat[grep("naming", dat$condition, value = T)]
which causes the following error:
Error in `[.data.frame`(dat, grep("naming", dat$condition, value = T)) :
undefined columns selected
Can anyone suggest a fix? Any help would be greatly appreciated!
You can split up condition using separate from tidyr:
library(dplyr)
library(tidyr)
df <- dat %>% separate(condition, into = c("condition1", "condition2"), sep = "/")
Then just use filter:
only_naming_df = df %>% filter(condition1 == "naming")
The error is easy to fix: add a comma after the closing parenthesis so that the index selects rows rather than columns, and drop value = T (see below). But I also want to list the available options for this task. Below are solutions and comments from others and from me.
Use grep or grepl
grep returns the indices (row numbers) of the matches, while grepl returns a logical vector (TRUE or FALSE) with one element per input. Note that value = T should not be used here, because grep would then return the matching strings themselves, which cannot be used to subset rows.
dat[grep("naming", dat$condition), ]
dat[grepl("naming", dat$condition), ]
Functions from dplyr and stringr
str_detect is equivalent to grepl(pattern, x), while str_which is equivalent to grep(pattern, x).
library(dplyr)
library(stringr)
dat %>% filter(str_detect(condition, "naming"))
dat %>% slice(str_which(condition, "naming"))
Data Preparation
# Create the example data frame
dat <- read.table(text = "value condition
2 learning/cat
4 learning/dog
1 naming/cat
6 naming/dog",
header = TRUE, stringsAsFactors = FALSE)
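Two more base R options on the same data: subset() with grepl(), here with the tighter pattern "^naming/" (an adjustment, not from the original) so that only values starting with "naming/" match, and startsWith(), which checks a fixed prefix without any regular expression:

```r
dat <- read.table(text = "value condition
2 learning/cat
4 learning/dog
1 naming/cat
6 naming/dog",
header = TRUE, stringsAsFactors = FALSE)

# grepl() yields one TRUE/FALSE per row; subset() keeps the TRUE rows
naming_only <- subset(dat, grepl("^naming/", condition))

# startsWith() tests a literal prefix, so no regex escaping is needed
naming_only2 <- dat[startsWith(dat$condition, "naming"), ]
```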
I have a data frame with a column that is a list.
One column holds a JSON response, and a second column is a list converted from that JSON with the following code:
vectorize_fromJSON <- Vectorize(fromJSON, USE.NAMES=FALSE)
z <- vectorize_fromJSON(data_df$json_response)
I am using rowwise with do to extract information from the list.
However, I can't get a conditional such as ifelse to work with it.
Working code:
t <- data_df %>% rowwise %>% do(
test = class(.$json_list$cbas$dslscc)
)
I want something like the following:
t <- data_df %>% rowwise %>% do(
test = ifelse(class(.$json_list$cbas$dslscc)=="list", TRUE,
.$json_list$cbas$dslscc)
)
This is the error:
Error in .$json_list$clear_bank_attributes$days_since_last_successful_check_cashed$nil :
  $ operator is invalid for atomic vectors
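One possible cause, offered as a guess since the data aren't shown: in some rows a field along the path is an atomic value (for example NA) rather than a nested list, and applying `$` to an atomic value raises exactly this error. Also, ifelse() is meant for vectors and returns a result shaped like its test, so for one TRUE/FALSE per row a plain if/else is the better fit. The sketch below checks is.list() at each level and uses purrr::map() instead of the superseded do(); the data_df contents and the helper get_dslscc() are hypothetical:

```r
library(dplyr)
library(purrr)

# Hypothetical list column mimicking parsed JSON:
# row 1 has an atomic value, row 2 a nested object, row 3 an atomic NA for cbas
data_df <- tibble(json_list = list(
  list(cbas = list(dslscc = 12)),
  list(cbas = list(dslscc = list(nil = TRUE))),
  list(cbas = NA)
))

# Return TRUE when the field is a list, its value when it is atomic,
# and NA when the path cannot be followed
get_dslscc <- function(x) {
  cbas  <- if (is.list(x)) x[["cbas"]] else NULL
  field <- if (is.list(cbas)) cbas[["dslscc"]] else NULL
  if (is.null(field)) NA
  else if (is.list(field)) TRUE
  else field
}

result <- data_df %>%
  mutate(test = map(json_list, get_dslscc))
```

Because the outcomes mix TRUE with arbitrary values, test is kept as a list column; map_dbl() or map_chr() would only be appropriate once the types are known to agree.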