Customizing make.names function in R? - r

I am automating an R script in which I have to use the make.names function. Its default behavior is fine with me, except when my table name contains a "-"; in that case I want a different result.
For example, the current behavior is:
> make.names("iris-ir")
[1] "iris.ir"
But when "-" is present in the table name, I want it replaced with "_" instead:
> make.names("iris-ir")
[1] "iris_ir"
How can I achieve this? EDIT: using only built-in packages.

Use the following function (the contains function is in the dplyr package):
library(dplyr)
make_names <- function(name)
{
  name <- as.character(name)
  # contains() returns the positions of the matching names, so test its length
  if (length(contains("-", vars = name)) > 0)
    sub("-", "_", name)
  else
    name
}
This should do what you want.
Without dplyr:
make_names <- function(name)
{
  name <- as.character(name)
  if (grepl("-", name, fixed = TRUE))
    sub("-", "_", name)
  else
    name
}
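If the goal is to keep the default clean-up of make.names() for everything except the hyphen, one option is to replace the hyphens first and then pass the result through make.names(), which leaves underscores alone. A minimal base-R sketch, assuming every "-" should become "_" (the name make_names_hyphen is made up for illustration):

```r
# Hypothetical wrapper: swap "-" for "_" first, then let make.names() do the rest.
make_names_hyphen <- function(name) {
  name <- as.character(name)
  # gsub() replaces every hyphen, not only the first; fixed = TRUE treats "-" literally.
  make.names(gsub("-", "_", name, fixed = TRUE))
}

make_names_hyphen(c("iris-ir", "my table", "a-b-c"))
# [1] "iris_ir"  "my.table" "a_b_c"
```

Because gsub() and make.names() are both vectorized, this version also accepts a whole vector of table names, which the if/else versions above do not.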

Related

What is causing 'object not found' error in filter() with the across() function?

This function filters/selects one or more variables from my dataset and writes it to a new CSV file. I'm getting an 'object not found' error when I call the function. Here is the function:
extract_ids <- function(filename, opp, ...) {
  #Read in data
  df <- read_csv(filename)
  #Remove rows 2,3
  df <- df[-c(1,2),]
  #Filter and select
  df_id <- filter(df, across(..., ~ !is.na(.x)) & gc == 1) %>%
    select(...) #not sure if my use of ... here is correct
  #String together variables for export file path
  path <- c("/Users/stephenpoole/Downloads/",opp,"_",...,".csv") #not sure if ... here is correct
  #Export the file
  write_csv(df_id, paste(path,collapse=''))
}
And here is the function call. I'm trying to get columns "rid" and "cintid."
extract_ids(filename = "farmers.csv",
opp = "farmers",
rid, cintid)
When I run this, I get the below error:
Error: Problem with `filter()` input `..1`.
ℹ Input `..1` is `across(..., ~!is.na(.x)) & gc == 1`.
x object 'cintid' not found
The column cintid is correct and appears in the data. I've also tried running it with just one column, rid, and get the same 'object not found' error.
If you are passing multiple values to across(), you need to collect them in the first parameter with c(); otherwise they will spread into the other parameters of across(). Try
filter(df, across(c(...), ~ !is.na(.x)))
Otherwise, every value other than the first one will be passed along as a parameter to the function you've specified in across().
Sorry for omitting this in my previous suggestion to you. Unfortunately, your original question was closed before I could post it as an answer:
If you want your function to resemble dplyr, here are a few modifications you can make. Write your function header as function(filename, opp, ...) verbatim. Then, replace !is.na(ID) with across(..., ~ !is.na(.x)) verbatim. Now you can call extract_ids() and, just as you would with any dplyr verb, specify any selection of columns from which you want to filter out NAs:
extract_ids(filename = "farmers.csv", opp = "farmers", rid, another_column_you_want_without_NAs)
Object Not Found
As MrFlick rightly suggests in their comment, you should wrap ... with c(), so everything you pass in ... is interpreted as the first argument to across(): a single tidy-selection of columns from df:
extract_ids <- function(filename, opp, ...) {
  # ...
  # Filter and select
  df_id <- df %>%
    # This format is preferred for dplyr workflows with pipes (%>%).
    filter(across(c(...), ~ !is.na(.x)) & gc == 1) %>%
    select(...)
  # ...
}
Without this precaution, R interprets rid and cintid as multiple arguments to across(), rather than as simply columns named by the first argument (the tidy-selection).
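From dplyr 1.0.4 onward, if_all() is available for this kind of row filter, and newer dplyr releases warn that across() inside filter() is deprecated. The following is a sketch of the same filter written with if_all(); the data frame toy and the helper name keep_complete are made up for illustration, and the file reading and writing steps are left out:

```r
library(dplyr)

# Hypothetical helper: keep rows where all selected columns are non-NA and gc == 1.
keep_complete <- function(df, ...) {
  df %>%
    # c(...) collects the columns into one tidy-selection, as with across().
    filter(if_all(c(...), ~ !is.na(.x)) & gc == 1) %>%
    select(...)
}

# Made-up data: only the first row has both ids present and gc == 1.
toy <- data.frame(
  rid    = c(1, NA, 3, 4),
  cintid = c("a", "b", NA, "d"),
  gc     = c(1, 1, 1, 0)
)

keep_complete(toy, rid, cintid)
```

if_all() keeps a row only when the condition holds for every selected column; if_any() would keep rows where at least one of them is non-NA.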
Variable Names in the Filepath
To get those variable names within your filepath, use
extract_ids <- function(filename, opp, ...) {
  # ...
  # Expand the '...' into a list of given variable names, which will get pasted.
  path <- c("/Users/stephenpoole/Downloads/", opp, "_", match.call(expand.dots = FALSE)$`...`, ".csv")
  # ...
}
though you might want to consider replacing match.call(expand.dots = FALSE)$`...`, which currently mushes together the variable names:
"/Users/stephenpoole/Downloads/farmers_ridcintid.csv"
In exactly the same place, you might use the expression paste(match.call(expand.dots = FALSE)$`...`, collapse = "-"), which will separate those variable names using -
"/Users/stephenpoole/Downloads/farmers_rid-cintid.csv"
or any other separator of your choice that gives a valid filename.
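The path-building step can also be pulled out into a small helper. This is only a sketch: build_path and its dir and sep arguments are hypothetical names, and it captures the unquoted column names with substitute() instead of match.call(), which is a different but common base-R idiom for the same job.

```r
# Hypothetical helper: build "<dir>/<opp>_<var1>-<var2>.csv" from unquoted column names.
build_path <- function(dir, opp, ..., sep = "-") {
  # substitute(list(...)) captures the arguments unevaluated; [-1] drops the `list` symbol.
  vars <- vapply(as.list(substitute(list(...)))[-1], deparse, character(1))
  file.path(dir, paste0(opp, "_", paste(vars, collapse = sep), ".csv"))
}

build_path("Downloads", "farmers", rid, cintid)
# [1] "Downloads/farmers_rid-cintid.csv"
```

Keeping the separator as an argument makes it easy to switch to any other character that is valid in a filename.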

Syntax of mutate combined with other expressions

I'm struggling to figure out the correct syntax of mutate combined with other functions. Here I'm trying to remove the text "incubated: " from a column called "days.incubated2".
Any ideas?
df%<%
mutate(str_remove(days.incubated2, "[incubated: ]"))
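The correct syntax would be:
library(dplyr)
library(stringr)
df <- df %>% mutate(days.incubated2 = str_remove(days.incubated2, "incubated: "))
You had an incorrect pipe operator (%<% instead of %>%). The square brackets in "[incubated: ]" also turn the pattern into a character class, which matches a single character rather than the literal text, so they are dropped here.
Give mutate() the name of the column where you want to store the result, i.e. days.incubated2 here (mutate(days.incubated2 = ....)).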
We can also use sub() from base R:
df$days.incubated2 <- sub("incubated: ", "", df$days.incubated2)
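To see why the square brackets in the original pattern matter, here is a small base-R comparison. The sample values in x are assumed for illustration; they are not taken from the asker's data.

```r
# Made-up sample values in the shape the question describes.
x <- c("incubated: 5", "incubated: 12")

# "[incubated: ]" is a character class: it matches ONE character from the set,
# so sub() removes only the first matching character (the leading "i").
bracketed <- sub("[incubated: ]", "", x)
bracketed
# [1] "ncubated: 5"  "ncubated: 12"

# Without brackets the pattern is the literal text; fixed = TRUE skips regex parsing.
literal <- sub("incubated: ", "", x, fixed = TRUE)
literal
# [1] "5"  "12"
```

The results are still character strings; wrapping the call in as.numeric() is one way to get numbers, if that is what the column should hold.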

convert character column and then split it into multiple new boolean columns using r mutate

I am attempting to split out a flags column into multiple new columns in R using the mutate_at and then separate functions. I have simplified and cleaned my solution as seen below; however, I am getting an error that indicates that the entire column of data is being passed into my function rather than each row individually. Is this normal behaviour, which just requires me to loop over each element of x inside my function? Or am I calling the mutate_at function incorrectly?
example data:
dataVariable <- data.frame(c_flags = c(".q.q.q","y..i.o","0x5a",".lll.."))
functions:
dataVariable <- read_csv("...",
  col_types = cols(
    c_date = col_datetime(format = ""),
    c_dbl = col_double(),
    c_flags = col_character(),
    c_class = col_factor(c("a", "b", "c")),
    c_skip = col_skip()
  ))
funTranslateXForNewColumn <- function(x){
  binary = ""
  if(startsWith(x, "0x")){
    binary=hex2bin(x)
  } else {
    binary = c(0,0,0,0,0,0)
    splitFlag = strsplit(x, "")[[1]]
    for(i in splitFlag){
      flagVal = 1
      if(i=="."){
        flagVal = 0
      }
      binary=append(binary, flagVal)
    }
  }
  return(paste(binary[4:12], collapse='' ))
}
mutate_at(dataVariable, vars(c_flags), funs(funTranslateXForNewColumn(.)))
separate(dataVariable, c_flags, c(NA, "flag_1","flag_2","flag_3","flag_4","flag_5","flag_6","flag_7","flag_8","flag_9"), sep="")
The error I am receiving is:
Warning messages:
1: Problem with `mutate()` input `c_flags`.
i the condition has length > 1 and only the first element will be used
After translating the string into an appropriate binary representation of the flags, I will then use the separate function to split it into new columns.
Similar to OP's logic but maybe shorter:
dataVariable$binFlags <- sapply(strsplit(dataVariable$c_flags, ''), function(x)
  paste(as.integer(x != '.'), collapse = ''))
If you want to do this using dplyr we can implement the same logic as:
library(dplyr)
dataVariable %>%
  mutate(binFlags = purrr::map_chr(strsplit(c_flags, ''),
                                   ~paste(as.integer(. != '.'), collapse = '')))
# c_flags binFlags
#1 .q.q.q 010101
#2 y..i.o 100101
#3 .lll.. 011100
mutate_at/across is used when you want to apply a function to multiple columns. Moreover, as far as I can see, you are creating only one new binary column here, not multiple new columns as mentioned in your post.
I was able to get the outcome I desired by replacing the mutate_at function with:
dataVariable$binFlags <- mapply(funTranslateXForNewColumn, dataVariable$c_flags)
However, I want to know how to use the mutate_at function correctly.
Credit to: https://datascience.stackexchange.com/questions/41964/mutate-with-custom-function-in-r-does-not-work
The above link also includes a solution that gets this function to work with mutate_at, which is to vectorize the function:
v_funTranslateXForNewColumn <- Vectorize(funTranslateXForNewColumn)
mutate_at(dataVariable, vars(c_flags), funs(v_funTranslateXForNewColumn(.)))
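The underlying reason for the warning is that mutate() and mutate_at() pass the whole column to the function as one vector, while if() expects a single TRUE or FALSE (from R 4.2.0 a longer condition is an error rather than a warning). Besides Vectorize(), one base-R way to apply a one-string-at-a-time helper is vapply(). The sketch below uses a simplified, made-up helper (flags_to_bits) that only handles the dot-style flags, not the "0x" hex case:

```r
# Scalar helper (illustrative): converts ONE flag string to a string of 0/1 digits.
flags_to_bits <- function(flag) {
  chars <- strsplit(flag, "", fixed = TRUE)[[1]]
  paste(as.integer(chars != "."), collapse = "")
}

flags <- c(".q.q.q", "y..i.o", ".lll..")

# vapply() calls the helper once per element and checks that each result is one string.
bits <- vapply(flags, flags_to_bits, character(1), USE.NAMES = FALSE)
bits
# [1] "010101" "100101" "011100"
```

The same vapply() call can be used on the right-hand side of a mutate() assignment, since it returns one value per row.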

Mutate a column selected by a string converted to symbol

I'm trying to convert a column of my dataset to lower case.
I wrote a basic, stupid function:
library(dplyr)
cleaning_tags<-function(data,col)
{
  data<-data%>%mutate(!!sym(col)=tolower(!!sym(col)))
  return (data)
}
where data is a data.frame and col is the column name as a string.
I don't understand the error I'm getting:
Error: unexpected '=' in "data%>%dplyr::mutate(!!sym("GROUPDSC") ="
The sym() conversion seems to work correctly, because if I run
data%>%select(!!sym(col))
it selects the desired column.
Thanks.
Try using := instead of = when the name on the left-hand side is an unquoted expression such as !!sym(col):
library(dplyr)
library(rlang)
cleaning_tags<-function(data,col) {
  data %>% mutate(!!sym(col) := tolower(!!sym(col)))
}
df <- data.frame(a = c("ABC", "DEF"))
cleaning_tags(df, "a")
# a
#1 abc
#2 def
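When the column name arrives as a string, the same thing can be written without sym(), using the .data pronoun and a glue-style name on the left of :=. This is a sketch that assumes a reasonably recent dplyr (1.0 or later); the function name cleaning_tags_data is made up so it does not clash with the one above.

```r
library(dplyr)

# Hypothetical variant of cleaning_tags(): `col` is a column name given as a string.
cleaning_tags_data <- function(data, col) {
  # "{col}" on the left of := builds the output name; .data[[col]] reads the column.
  data %>% mutate("{col}" := tolower(.data[[col]]))
}

df <- data.frame(a = c("ABC", "DEF"), stringsAsFactors = FALSE)
cleaning_tags_data(df, "a")
#     a
# 1 abc
# 2 def
```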
A few things in your code stand out: you can't assign the new column name with = like this, and the code is hard to read. mutate_at() accepts the column name as a string directly:
library(tidyverse)
cleaning_tags<-function(data, col) {
  data %>%
    mutate_at(col, tolower)
}
ir <- cleaning_tags(iris, "Species")
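In current dplyr, mutate_at() is superseded by across(), so the same idea can be sketched as follows. The function name cleaning_tags_across and the sample data frame are made up for illustration; all_of() tells the selection that col holds column names as strings.

```r
library(dplyr)

# Hypothetical across() version: `col` is a character vector of column names.
cleaning_tags_across <- function(data, col) {
  data %>% mutate(across(all_of(col), tolower))
}

tags <- data.frame(
  GROUPDSC = c("ABC", "Def"),
  n        = c(1, 2),
  stringsAsFactors = FALSE
)

cleaning_tags_across(tags, "GROUPDSC")
```

Because all_of() accepts a vector, the same call can lower-case several columns at once, for example cleaning_tags_across(tags, c("GROUPDSC", "other_col")) if such a column existed.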

rename columns containing pattern in r using plyr rename function

I would like to rename all columns in a data frame whose names contain a pattern in R. That is, I would like to substitute the column name "variable" for all columns containing "variable", such as "htn.variable". I thought I could use rename from plyr together with grepl. I have created an example:
exp<-data.frame(htn.variable = c(1,2,3), id = c(5,6,7), visit = c(1,3,4))
require(plyr)
rename ( exp, c(
names(exp)[grepl ( 'variable',names(exp))] = "variable" ))
But I get the following error:
Error: unexpected '=' in:
" c(
names(exp)[grepl ( 'variable',names(exp))] ="
I think this has to do with calling up a name within a function, and I would like to ask if anyone might have a suggestion how to make this work please? Thanks.
Why bother with rename at all?
colnames(exp)[grepl('variable',colnames(exp))] <- 'variable'
If you only want to replace the part of the column name that is equal to 'variable', use:
colnames(exp) <- gsub('variable', 'replace string', colnames(exp))
rename ( exp, "variable" = names(exp)[grepl ( 'variable',names(exp))])
I am not 100% sure if this is what you need, but it may be a start. I stayed away from plyr:
for (i in 1:ncol(exp)){
  if (substr(names(exp)[i],5,12) == "variable"){
    names(exp)[i] <- "new.variable" #or any new var name
  }
}
exp
You could also just remove the first four characters of the variable name.
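Instead of relying on fixed character positions, the prefix can be stripped with a regular expression. A minimal base-R sketch, assuming the prefix is everything up to and including the last dot in the column name:

```r
exp <- data.frame(htn.variable = c(1, 2, 3), id = c(5, 6, 7), visit = c(1, 3, 4))

# Find the columns whose names contain "variable".
hits <- grepl("variable", names(exp), fixed = TRUE)

# Drop everything up to and including the last "." in those names only.
names(exp)[hits] <- sub("^.*\\.", "", names(exp)[hits])

names(exp)
# [1] "variable" "id"       "visit"
```

If more than one column matches (say, a second column ending in ".variable"), this would produce duplicate column names, so it is worth checking anyDuplicated(names(exp)) afterwards.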
