I am trying to remove a row in a dataframe based on string matching. I'm using:
data <- data[- grep("my_string", data$field1),]
When there's a row with the value "my_string" in data$field1, this works as expected and drops that row. However, if no row contains "my_string", it returns an empty data frame. How do I write this so that it allows for the string not being present and still keeps my data frame intact?
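It may be better to use grepl, which returns one TRUE/FALSE per row, and negate the result with the ! operator. When nothing matches, every row is kept: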
data[!grepl("my_string", data$field1),]
Another option is to apply setdiff to the row indices and the output of grep:
data[setdiff(seq_len(nrow(data)), grep("my_string", data$field1)),]
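The reason the original approach fails: when grep finds nothing it returns integer(0), and -integer(0) is still integer(0), so indexing with it selects zero rows instead of dropping none. A minimal sketch with made-up toy data, showing the pitfall next to the grepl fix:

```r
# Toy data frame; the values are made up for illustration
data <- data.frame(field1 = c("a", "b", "c"), n = 1:3)

idx <- grep("my_string", data$field1)  # no match, so idx is integer(0)
print(-idx)                            # negating an empty vector is still integer(0)

# Negative indexing with an empty vector selects zero rows
broken <- data[-idx, ]
nrow(broken)

# grepl returns one logical per row, so negating it keeps every row
# when nothing matches
kept <- data[!grepl("my_string", data$field1), ]
nrow(kept)
```

The same grepl pattern works unchanged when the string is present, which is why it is safer than -grep.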
You can use a plain if statement.
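df <- data.frame(field = c("my_string", "my_string_not", "something", "something_else"),
                 numbers = 1:4)
# No match: result is integer(0), so the if body is skipped and df is unchanged
result <- grep("gabriel", df$field)
if (length(result)) {
  df <- df[-result, ]
}
df
# Matches rows 1 and 2 ("my_string" is a substring of "my_string_not"), so both are dropped
result <- grep("my_string", df$field)
if (length(result)) {
  df <- df[-result, ]
}
df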
I am doing data analysis and cleaning in R using the tidyverse.
I have a data frame with 23 columns containing the values 'NO', 'STEADY', 'UP' and 'down'.
I want to change all the values in these 23 columns to 0 for 'NO' or 'STEADY', and to 1 otherwise.
I created a character vector named keys that holds all these column names, and then used a for loop with ifelse and mutate.
Please have a look at the code below:
# Column names are stored in the character vector keys
keys = c('metformin', 'repaglinide', 'nateglinide', 'chlorpropamide', 'glimepiride',
'glipizide', 'glyburide', 'pioglitazone', 'rosiglitazone', 'acarbose', 'miglitol',
'insulin', 'glyburide-metformin', 'tolazamide', 'metformin-pioglitazone',
'metformin-rosiglitazone', 'glimepiride-pioglitazone', 'glipizide-metformin',
'troglitazone', 'tolbutamide', 'acetohexamide')
Then I used the following code to get the desired result:
for (col in keys){
Dataset = Dataset %>%
mutate(col = ifelse(col %in% c('No','Steady'),0,1)) }
I expected this to make the changes I need, but nothing happens (no error message and no desired result).
After researching further, I ran the following code:
for (col in keys){
print(col)}
It prints each element as a character string, e.g. "metformin".
So I thought this might be the problem, and used the code below to cast the keys to symbols:
keys_new = sym(keys)
Then I ran the same loop again:
for (col in keys_new){
Dataset = Dataset %>%
mutate(col = ifelse(col %in% c('No','Steady'),0,1))}
It gives me the following error:
Error in match(x, table, nomatch = 0L) :
'match' requires vector arguments
I also tried writing a function to get the desired result, but that didn't work either:
change = function(name){
Dataset = Dataset %>%
mutate(name = ifelse(name %in% c('No','Steady'),0,1),
name = as.factor(name))
return(Dataset)}
for (col in keys){
change(col)}
This didn't do anything either (no error message and no desired result).
When I use keys_new in this code:
for (col in keys_new){
change(col)}
I got the same error:
Error in match(x, table, nomatch = 0L) :
'match' requires vector arguments
Any guidance would be appreciated.
There's no need to loop or keep track of column names if every column in Dataset should be recoded. In that case you can use mutate_all:
Dataset %>%
mutate_all(~ifelse(. %in% c('No','Steady'), 0, 1))
Another way, thanks to Rui Barradas:
Dataset %>%
mutate_all(~as.integer(!. %in% c('No','Steady')))
There's a simpler way using mutate_at and case_when.
Dataset %>% mutate_at(keys, ~case_when(. %in% c("NO", "STEADY") ~ 0, TRUE ~ 1))
mutate_at mutates only the columns named in keys, so any other columns are left alone. case_when then checks each condition in order and returns the value for the first one that is TRUE; TRUE ~ 1 acts as the catch-all for every other value.
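In dplyr 1.0 and later, mutate_all and mutate_at are superseded by across(). A minimal sketch of the same recoding with across() and all_of(keys), using a hypothetical three-row tibble in place of the real Dataset. Note that %in% is case-sensitive, so the strings must match the data's actual spelling (this sketch assumes "No" and "Steady"):

```r
library(dplyr)

# Hypothetical toy data standing in for the real Dataset
Dataset <- tibble(
  id        = 1:3,
  metformin = c("No", "Steady", "Up"),
  insulin   = c("Down", "No", "Steady")
)
keys <- c("metformin", "insulin")

# Recode only the columns listed in keys: 0 for "No"/"Steady", 1 otherwise.
# %in% binds more tightly than !, so this is !(.x %in% ...)
recoded <- Dataset %>%
  mutate(across(all_of(keys), ~ as.integer(!.x %in% c("No", "Steady"))))
```

all_of() makes the selection fail loudly if a name in keys is missing from the data, instead of silently skipping it.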
This answer shows how to use mutate inside a for loop.
I don't have your data, so I made my own: I turned the keys into a tibble with enframe, spread it into columns so that each column holds its row number as its value, and then checked whether that value is greater than 10.
To use a column name stored in a variable as the target of mutate, you have to unquote it with !! and use := instead of = in the mutate call.
library(dplyr)
library(tibble)
library(tidyr)
keys <- c('metformin', 'repaglinide', 'nateglinide', 'chlorpropamide', 'glimepiride',
          'glipizide', 'glyburide', 'pioglitazone', 'rosiglitazone', 'acarbose', 'miglitol',
          'insulin', 'glyburide-metformin', 'tolazamide', 'metformin-pioglitazone',
          'metformin-rosiglitazone', 'glimepiride-pioglitazone', 'glipizide-metformin',
          'troglitazone', 'tolbutamide', 'acetohexamide')
# One-row tibble with one column per key, holding that key's position (1 to 21)
df <- enframe(keys) %>% spread(key = value, value = name)
for (col in keys) {
  # !! unquotes the string in col; := allows a computed name on the left-hand side.
  # .data[[col]] reads the column as a plain vector (df[col] would be a data frame)
  df <- df %>%
    mutate(!!col := ifelse(.data[[col]] > 10, 0, 100))
}
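For contrast, here is a small sketch (made-up toy data with two columns) of why the loop in the question seemed to do nothing. In mutate(col = ...), the left-hand side is taken literally, so every iteration writes a column named col; on the right-hand side, col is first the loop string (e.g. "metformin"), which never equals "No" or "Steady". The function attempt had an additional problem: change(col) returned a new data frame that was never assigned back to Dataset.

```r
library(dplyr)

# Toy data; column names borrowed from the question, values made up
d <- tibble(metformin = c("No", "Up"), insulin = c("Steady", "Down"))
keys <- c("metformin", "insulin")

# Broken: creates (and then overwrites) a column literally named "col"
broken <- d
for (col in keys) {
  broken <- broken %>% mutate(col = ifelse(col %in% c("No", "Steady"), 0, 1))
}

# Fixed: !!col := targets the named column; .data[[col]] reads its values
fixed <- d
for (col in keys) {
  fixed <- fixed %>% mutate(!!col := ifelse(.data[[col]] %in% c("No", "Steady"), 0, 1))
}
```

The original columns in broken are untouched, which is easy to miss when the extra col column is appended at the far right.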
I want to create a new column whose name combines two arguments passed to my function.
Here is some data:
data <- tribble(
~one, ~two, ~three,
'a','b', 'c',
'd', 'e', 'f'
)
If I just want to give it a normal name, this works fine:
normal_naming_func <- function(data, name) {
data %>%
mutate({{name}} := str_c(one, two))
}
But what if I want the name to be a combination of two different function parameters?
This doesn't work:
naming_func <- function(data, name_part1, name_part2) {
data %>%
mutate(str_c({{name_part1}}, {{name_part2}}) := str_c(one, two))
}
I get the error:
Error: The LHS of := must be a string or a symbol
Neither does this:
naming_func <- function(data, name_part1, name_part2) {
data %>%
mutate(str_glue("{{name_part1}}, {{name_part2}}") := str_c(one, two))
}
Thanks for your help.
You forgot to unquote the LHS. Furthermore, you need to convert the unevaluated names to strings before you can concatenate them:
naming_func <- function(data, name_part1, name_part2) {
name1 = as.character(ensym(name_part1))
name2 = as.character(ensym(name_part2))
data %>%
mutate(!! str_c(name1, name2) := str_c({{name_part1}}, {{name_part2}}))
}
Remember, {{…}} is a shortcut for enquote-then-unquote. However, to construct the new column name you need a slightly different operation: enquote-then-to-string-then-concatenate-then-unquote.
{{…}} on its own does not let you insert operations between the quoting and the unquoting, so one way to achieve this is to split the operations up and perform them manually, as is done in the code above.
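In reasonably recent versions of rlang and dplyr there is also a shorter route: the left-hand side of := may be a glue-style string, and {{ arg }} inside that string is replaced by the label of the argument (e.g. one). This is a sketch assuming such a version is installed, and assuming name_part1 and name_part2 are passed as bare column names, as in the answer above:

```r
library(dplyr)
library(stringr)
library(tibble)

data <- tribble(
  ~one, ~two, ~three,
  'a',  'b',  'c',
  'd',  'e',  'f'
)

naming_func <- function(data, name_part1, name_part2) {
  data %>%
    # "{{name_part1}}{{name_part2}}" becomes "onetwo" when called with one, two
    mutate("{{name_part1}}{{name_part2}}" := str_c({{name_part1}}, {{name_part2}}))
}

res <- naming_func(data, one, two)
```

Unlike str_glue("{{...}}"), which treats doubled braces as escaped literal braces, this string is interpreted by dplyr itself, which is why it works only directly on the left-hand side of :=.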
I am trying to write a function that subsets a dataset to the rows where a column contains a certain string.
Mock data:
library(stringr)
set.seed(1)
codedata <- data.frame(
Key = sample(1:10),
ReadCodePreferredTerm = sample(c("yes", "prefer", "Had refer"), 20, replace=TRUE)
)
User-defined function:
findterms <- function(inputdata, variable, searchterm) {
outputdata <- inputdata[str_which(inputdata$variable, regex(searchterm, ignore_case=TRUE)), ]
return(outputdata)
}
I am expecting at least a couple of rows returned, but I get 0 when I run the following code:
findterms(codedata, ReadCodePreferredTerm, " refer") #the space in front of this word is deliberate
I realise this should be simple, but I can't work out why it isn't working.
Note that the code works fine when it is not wrapped in a function:
referterms <- codedata[str_which(codedata$ReadCodePreferredTerm, regex(" refer", ignore_case=TRUE)), ]
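The problem is that inputdata$variable looks for a column literally named variable: $ does not evaluate its right-hand side, so it returns NULL and nothing matches. You can do this simply with dplyr and stringr: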
library(magrittr) # For the pipe (%>%)
library(dplyr)
library(stringr)
codedata %>%
dplyr::filter(str_detect(ReadCodePreferredTerm, '\\brefer\\b'))
You can also write your own function if you like. You will need rlang as well if you want to pass the variable name unquoted rather than as a string. Something like this works:
library(rlang)
findterms <- function(df, variable, searchterm) {
variable <- enquo(variable)
return(
df %>%
dplyr::filter(str_detect(!!variable, str_interp('\\b${ searchterm }\\b')))
)
}
findterms(codedata, ReadCodePreferredTerm, 'refer')
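If you prefer to keep the original base-R structure, the smallest fix is to pass the column name as a string and index with [[ ]], which evaluates its argument. A minimal sketch with a small made-up data set (not the set.seed(1) mock data above):

```r
library(stringr)

# variable must be a string here; [[ ]] looks up the column it names,
# whereas $ would look for a column literally called "variable"
findterms <- function(inputdata, variable, searchterm) {
  hits <- str_which(inputdata[[variable]], regex(searchterm, ignore_case = TRUE))
  inputdata[hits, , drop = FALSE]
}

# Small made-up data set for illustration
codedata <- data.frame(
  Key = 1:4,
  ReadCodePreferredTerm = c("yes", "prefer", "Had refer", "HAD REFER"),
  stringsAsFactors = FALSE
)

res <- findterms(codedata, "ReadCodePreferredTerm", " refer")
```

Here " refer" (with the leading space) matches rows 3 and 4 regardless of case, but not "prefer".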
When writing a function, how do I make the new column names depend on the name of the dataset I pass in? With this function the columns become dataset_baseline and dataset_adverse instead of, for example, Inflation_baseline and Inflation_adverse.
renaming <- function(dataset) {
dataset <- dataset %>%
rename(dataset_baseline = baseline, dataset_adverse = adverse)
return(dataset)
}
Try this:
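renaming <- function(dataset, columns) {
  # Capture the unevaluated argument, e.g. the symbol Inflation
  call = as.list(match.call())
  dataset.name <- toString(call$dataset)
  # Prefix each selected column with "<dataset name>_" (funs() is deprecated; use a ~ lambda)
  dataset %>% rename_at(columns, ~ paste0(dataset.name, "_", .))
}
Inflation <- renaming(Inflation, c("baseline", "adverse"))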
NOTE: You should not try to reassign dataset from within your function. It won't work, because dataset there refers to a variable local to the function; return the result and assign it where you call the function instead.
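With dplyr 1.0 or later, rename_at is superseded by rename_with(), which applies a function to the selected column names. A minimal sketch of the same idea, using deparse(substitute()) to capture the argument's name; the Inflation data here is made up for illustration. One caveat: if the function is called as Inflation %>% renaming(...) with the magrittr pipe, the captured name will be "." rather than "Inflation".

```r
library(dplyr)

renaming <- function(dataset, columns) {
  # Text of the expression passed as dataset, e.g. "Inflation"
  dataset.name <- deparse(substitute(dataset))
  # Prefix only the selected columns with "<dataset name>_"
  dataset %>% rename_with(~ paste0(dataset.name, "_", .x), all_of(columns))
}

# Made-up example data
Inflation <- data.frame(year = 2020:2021, baseline = c(1.5, 2.0), adverse = c(3.0, 4.5))
Inflation <- renaming(Inflation, c("baseline", "adverse"))
```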