I want to use the character strings from one column of a dataframe as the search string in a sub search of the character strings in another column of the dataframe on a row-by-row basis. I would like to do this using dplyr::mutate. I have figured out a way to do this using an anonymous function and apply, but I feel like apply shouldn't be necessary and I must be doing something wrong with how I'm implementing mutate. (And yes, I know that tools::file_path_sans_ext can give me the final result without needing to use mutate; I just want to understand how to use mutate.)
Here is the code that I think should work but doesn't:
files.vec <- dir(
dir.target,
full.names = T,
recursive = T,
include.dirs = F,
no.. = T
)
library(tools)
files.paths.df <- as.data.frame(
cbind(
path = files.vec,
directory = dirname(files.vec),
file = basename(files.vec),
extension = file_ext(files.vec)
)
)
library(tidyr)
library(dplyr)
files.split.df <- files.paths.df %>%
mutate(
no.ext = function(x) {
sub(paste0(".", x["extension"], "$"), "", x["file"])
}
)
| Error in mutate_impl(.data, dots) :
| Column `no.ext` is of unsupported type function
Here is the code that works, using apply:
files.split.df <- files.paths.df %>%
mutate(no.ext = apply(., 1, function(x) {
sub(paste0(".", x["extension"], "$"), "", x["file"])
}))
Can this be done without apply?
Apparently what you need is a whole bunch of parentheses. See https://stackoverflow.com/a/36906989/3277050
In your situation it looks like:
files.split.df <- files.paths.df %>%
mutate(
no.ext = (function(x) {sub(paste0(".", x["extension"], "$"), "", x["file"])})(.)
)
So it seems like if you wrap the whole function definition in brackets you can then treat it like a regular function and supply arguments to it.
New Answer
Really this is not the right way to use mutate at all though. I got focused in on the anonymous function part first without looking at what you are actually doing. What you need is a vectorized version of sub. So I used str_replace from the stringr package. Then you can just refer to columns by name because that is the beauty of dplyr:
library(tidyr)
library(dplyr)
library(stringr)
files.split.df <- files.paths.df %>%
mutate(
no.ext = str_replace(file, paste0(".", extension, "$"), ""))
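Here is a self-contained sketch of the vectorised approach, using a few made-up file names instead of the asker's directory listing. One assumption goes beyond the answer: the dot is escaped as `\\.` so that it matches a literal period rather than any character.

```r
library(dplyr)
library(stringr)
library(tools)

# Hypothetical file names standing in for basename(files.vec)
files <- c("a.txt", "b.tar.gz", "README")

files.paths.df <- data.frame(
  file = files,
  extension = file_ext(files),
  stringsAsFactors = FALSE
)

# str_replace() is vectorised over both string and pattern,
# so each row is matched against its own regular expression
files.split.df <- files.paths.df %>%
  mutate(no.ext = str_replace(file, paste0("\\.", extension, "$"), ""))

files.split.df$no.ext
```

The escaping matters for names without an extension: with an unescaped pattern of `.$`, "README" would lose its last character.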
Edit to Answer Comment
To use a user defined function where there isn't an existing vectorized function you could use Vectorize like this:
string_fun <- Vectorize(function(x, y) {sub(paste0(".", x, "$"), "", y)})
files.split.df <- files.paths.df %>%
mutate(
no.ext = string_fun(extension, file))
Or if you really don't want to name the function, which I do not recommend as it is much harder to read:
files.split.df <- files.paths.df %>%
mutate(
no.ext = (Vectorize(function(x, y) {sub(paste0(".", x, "$"), "", y)}))(extension, file))
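As a possible alternative to Vectorize, purrr::map2_chr walks two vectors in parallel and always returns a character vector. This is only a sketch on made-up data; purrr is not part of the answer above, and the anonymous function is the same one used there.

```r
library(dplyr)
library(purrr)

# Hypothetical two-row version of files.paths.df
files.paths.df <- data.frame(
  file = c("a.txt", "b.tar.gz"),
  extension = c("txt", "gz"),
  stringsAsFactors = FALSE
)

# map2_chr() calls the function once per (extension, file) pair
files.split.df <- files.paths.df %>%
  mutate(no.ext = map2_chr(extension, file, function(x, y) {
    sub(paste0(".", x, "$"), "", y)
  }))

files.split.df$no.ext
```

Unlike Vectorize, which falls back to whatever mapply can simplify to, map2_chr fails loudly if any call returns something other than a single string.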
For example:
Imagine I have an object named "cors" which contains a string only ("Spain" for example). I would like then for "cors" to be replaced in the expression (1) below by "Spain", resulting in the expression (2):
#(1)
DF <- DF %>% filter(str_detect(Country, "Germany|cors", negate = TRUE))
#(2)
DF <- DF %>% filter(str_detect(Country, "Germany|Spain", negate = TRUE))
P.S: I know that in MATLAB this could be handled with the "eval()" command, though in R it apparently has a completely different applicability.
If we have an object, then place it outside the quotes and use paste/str_c to create the string
library(dplyr)
library(stringr)
cors <- "Spain"
pat <- str_c(c("Germany", cors), collapse = "|")
DF %>%
filter(str_detect(Country, pat, negate = TRUE))
Or another option is to string interpolate with glue (assuming the cors object has only a single string element)
DF %>%
filter(str_detect(Country, glue::glue("Germany|{cors}"), negate = TRUE))
Or this can be done in base R with grepl
pat <- paste(c("Germany", cors), collapse = "|")
subset(DF, !grepl(pat, Country))
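To make the base R version concrete, here is a small worked example. The contents of DF are made up for illustration; only the Country column name comes from the question.

```r
# Hypothetical data; only the Country column matters here
DF <- data.frame(
  Country = c("Germany", "Spain", "France", "East Germany"),
  stringsAsFactors = FALSE
)

cors <- "Spain"

# Build the alternation "Germany|Spain" from the literal and the variable
pat <- paste(c("Germany", cors), collapse = "|")

# Keep only the rows whose Country matches neither alternative
result <- subset(DF, !grepl(pat, Country))
result
```

Because pat is a regular expression, it matches substrings too, so "East Germany" is dropped along with "Germany".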
If you really want eval, you could do:
cors <- 'Spain'
DF <- DF %>% filter(
eval(
parse(text=paste0('str_detect(Country, "Germany|', cors, '", negate=TRUE)'))
))
I am trying to code a function that would allow me to move certain patterns in a string in R. For example, if my strings are pattern_string1, pattern_string2, pattern_string3, pattern_string4, I want to mutate them to string1_pattern, string2_pattern, string3_pattern, string4_pattern.
In order to achieve this, I tried the following:
string_flip <- function(x, pattern){
if(str_detect(x, pattern)==TRUE){
str_remove(x, pattern) %>%
paste(x, "pattern", sep = "_")
}
}
However, when I try to apply this onto a vector of strings by the following code:
stringvector <- c("pattern_string1", "pattern_string2", "pattern_string3", "pattern_string4", "string5", "string6")
string_flip(stringvector, "pattern")
it returns a warning and changes all elements, not only the elements that contain "pattern". In addition, it does not only add pattern to the end of the string, it doubles the string itself as well, so I get the following result:
[1] "_string1_pattern_string1_pattern" "_string2_pattern_string2_pattern" "_string3_pattern_string3_pattern"
[4] "_string4_pattern_string4_pattern" "string5_string5_pattern" "string6_string6_pattern"
Can anybody help me with this?
Thanks a lot in advance!
Your function string_flip is not vectorised. It works for only one string at a time.
I think you have additional x which is why the string is doubling.
In paste, pattern should not be in quotes.
Try this function.
library(stringr)
string_flip <- function(x, pattern){
trimws(ifelse(str_detect(x, pattern),
str_remove(x, pattern) %>% paste(pattern, sep = "_"), x), whitespace = '_')
}
stringvector <- c('pattern_string1', 'pattern_string2', 'pattern_string3', 'pattern_string4')
string_flip(stringvector, "pattern")
#[1] "string1_pattern" "string2_pattern" "string3_pattern" "string4_pattern"
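Another way to get the same result is a single regular expression with backreferences, which is vectorised by construction. This is a sketch that assumes the pattern sits at the start of the string, is followed by an underscore, and contains no regex metacharacters; string_flip2 is a hypothetical name.

```r
string_flip2 <- function(x, pattern) {
  # Capture the leading pattern and the remainder, then swap the two groups.
  # Strings that do not match are returned unchanged by sub().
  sub(paste0("^(", pattern, ")_(.*)$"), "\\2_\\1", x)
}

stringvector <- c("pattern_string1", "pattern_string2", "string5")
flipped <- string_flip2(stringvector, "pattern")
flipped
```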
I have some code which I'm looking to replicate many times, each for a different country as the suffix.
Assuming 3 countries as a simple example:
country_list <- c('ALB', 'ARE', 'ARG')
I'm trying to create a series of variables called a_m5_ALB, a_m5_ARE, a_m5_ARG etc which have various functions e.g. addcol or round_df applied to reg_math_ALB, reg_math_ARE, reg_math_ARG etc
for (i in country_list) {
paste("a_m5", i , sep = "_") <- addcol(paste("reg_math", i , sep = "_"))
}
for (i in country_list) {
paste("a_m5", i , sep = "_") <- round_df(paste("reg_math", i , sep = "_"))
}
where addcol and round_df are defined as:
addcol = function(y){
dat1 = mutate(y, p.value = ((1 - pt(q = abs(reg.t.value), df = dof))*2))
return(dat1)
}
round_df <- function(x, digits) {
numeric_columns <- sapply(x, mode) == 'numeric'
x[numeric_columns] <- round(x[numeric_columns], digits)
x
}
The loop errors when any of the functions are added in brackets before the paste variable part but it works if doing it manually e.g.
a_m5_ALB <- addcol(reg_math_ALB)
Please could you help? I think it's the application of the function in a loop which I'm getting wrong.
Errors:
Error in UseMethod("mutate_") :
no applicable method for 'mutate_' applied to an object of class "character"
Error in round(x[numeric_columns], digits) :
non-numeric argument to mathematical function
Thank you
From your examples, you're really in a case where everything should be in a single dataframe. Here, keeping separate variables for each country is not the right tool for the job. Say you have your per-country dataframes saved as csv, you can rewrite everything as:
library(tidyverse)
country_list <- c('ALB', 'ARE', 'ARG')
read_data <- function(ctry){
read_csv(paste0("/path/to/file/", "reg_math_", ctry)) %>%
add_column(country = ctry)
}
total_df <- map_dfr(country_list, read_data)
total_df %>%
mutate(p.value = (1 - pt(q = abs(reg.t.value), df = dof))*2) %>%
# `digits` must be defined beforehand as the number of decimal places to keep
mutate(across(where(is.numeric), ~ round(.x, digits = digits)))
And it gives you immediate access to all other dplyr functions that are great for this kind of manipulation.
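If the per-country objects already exist in the workspace, the errors in the question come from passing the string "reg_math_ALB" to addcol instead of the data frame with that name. A minimal sketch of one way around this is to fetch the objects by name with mget and keep the results in a named list. The two small data frames are made up for illustration.

```r
library(dplyr)

# Hypothetical stand-ins for the per-country regression tables
reg_math_ALB <- data.frame(reg.t.value = c(2.1, -0.5), dof = c(30, 30))
reg_math_ARE <- data.frame(reg.t.value = c(1.3, 3.2), dof = c(25, 25))

country_list <- c("ALB", "ARE")

addcol <- function(y) {
  mutate(y, p.value = (1 - pt(q = abs(reg.t.value), df = dof)) * 2)
}

# mget() looks up the objects by name; paste() alone only builds strings
reg_math <- mget(paste("reg_math", country_list, sep = "_"))

# Apply the function to every data frame and name the results
a_m5 <- lapply(reg_math, addcol)
names(a_m5) <- paste("a_m5", country_list, sep = "_")

a_m5$a_m5_ALB
```

Keeping the results in one list avoids creating variables with assign, and each country is still reachable by name.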
I'm aiming to get a list of all files in a Google Drive folder, as well as the associated metadata for those files. When I use drive_ls, it returns 3 columns {name, id, drive_resource}. drive_resource is structured like this: list(kind = "drive#file", id = "abc",...). However, some of the list is not qualified by quotations, and commas are also occasionally used when not a separator.
Any ideas how I might properly unlist this? I can't find anywhere in the package that can handle this.
Using the package 'googledrive', I can get a list of all the files
a <- drive_ls(path = "abc", recursive = TRUE)
The below attempt gets close, but fails to get the column names and also splits some values at the wrong place based on a comma being contained in the string.
a$drive_resource <- vapply(a$drive_resource, paste, collapse = ",", character(1L))
abcd <- a%>% separate(drive_resource, sep = ",", into = c("1","2","3","4","5","6","7","8","9","10","11","12","13","14","15","16","17","18","19","20","21","22","23","24","25","26","27","28","29","30") )
You can try the following approach. It's an example with only four elements of the list (selected names are specified in the function). The function maps each list contained in each row to a tibble, so you can unnest it
require(googledrive)
require(dplyr)
f <- function(l){
l[c("version","webContentLink","viewedByMeTime","mimeType")] %>% as_tibble()
}
dr_content <- drive_ls(path = "<path>", recursive = TRUE)
dr_content <- dr_content %>% mutate(drive_resource = purrr::map(drive_resource, f))
dr_content <- dr_content %>% tidyr::unnest(drive_resource)
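Since drive_ls needs an authenticated session, here is a sketch on a mocked-up tibble with the same three columns. It uses tidyr::hoist as an alternative to map plus unnest; the file names, ids and metadata values are invented, and a real drive_resource holds many more fields.

```r
library(tibble)
library(tidyr)

# Hypothetical stand-in for the result of drive_ls()
dr_content <- tibble(
  name = c("notes.txt", "logo.png"),
  id = c("abc", "def"),
  drive_resource = list(
    list(kind = "drive#file", id = "abc", mimeType = "text/plain", version = "3"),
    list(kind = "drive#file", id = "def", mimeType = "image/png", version = "1")
  )
)

# hoist() pulls the named elements out of the list-column into regular columns
dr_flat <- dr_content %>%
  hoist(drive_resource, mime_type = "mimeType", version = "version")

dr_flat
```

Elements that are not hoisted stay in drive_resource, so nothing is lost and no string splitting on commas is needed.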
Is there a way to automatically give names to the returned list given by purrr::map?
For example, I run code like this very often.
fn <- function(x) { paste0(x, "_") }
l <- map(LETTERS, fn)
names(l) <- LETTERS
I'd like the vector that is being iterated over to automatically become the names of the resulting list.
We can use imap
imap(setNames(LETTERS, LETTERS), ~ paste0(.x, "_"))
Or map with a named vector
map(setNames(LETTERS, LETTERS), ~ paste0(.x, "_"))
This seems like a clean way to do it to me:
LETTERS %>% purrr::set_names() %>% purrr::map(paste0, "_")
Thanks to the comment left by aosmith for identifying purrr::set_names. Note if you want to set the names to something else, just say ... %>% set_names(my_names).
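For comparison, base R can do the same without purrr. This is a sketch reusing fn from the question: when the input is a character vector, sapply with USE.NAMES = TRUE uses its values as names, and simplify = FALSE keeps the result a list.

```r
fn <- function(x) { paste0(x, "_") }

# Returns a list whose names are the input strings themselves
l <- sapply(LETTERS, fn, simplify = FALSE, USE.NAMES = TRUE)

l[["A"]]
```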