Switching the order of paste() in piping in R

I am fairly new to R and I would like to paste the string "exampletext" in front of each file name within the path.
csvList <- list.files(path = "./csv_by_subject") %>%
paste0("*exampletext")
Currently this code renders things like "csv*exampletext" and I want it to be "*exampletextcsv". I would like to continue using dplyr and piping. Help appreciated!

As others pointed out, the pipe is not necessary here. But if you do want to use it, you just have to specify that the second argument to paste0 is "the thing you are piping", which you do using a period (.):
list.files(path = "./csv_by_subject") %>%
paste0("*exampletext", .)
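To see what the placeholder changes without touching the file system, here is a minimal sketch with a made-up vector of file names (the names are assumptions, not taken from the question). It uses the native pipe available in R 4.1 or later together with an anonymous function, which plays the same role as the magrittr dot; with magrittr or dplyr loaded, `files %>% paste0("*exampletext", .)` should give the same result as `prefixed`.

```r
# Hypothetical file names standing in for list.files(path = "./csv_by_subject")
files <- c("subject1.csv", "subject2.csv")

# The piped value becomes the first argument, so the extra text ends up as a suffix
suffixed <- files |> paste0("*exampletext")

# An anonymous function lets us choose where the piped value goes
prefixed <- files |> (\(f) paste0("*exampletext", f))()

print(suffixed)
print(prefixed)
```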

paste0('exampletext', csvList) should do the trick. It's not necessarily using dplyr and piping, but it's taking advantage of the vectorization features that R provides.

If you'd like to paste *exampletext before all of the file names, you can reverse the order of what you're doing now using paste0 and passing the second argument as list.files. paste0 can handle vectors as the second argument and will apply the paste to each element.
csvList <- paste0("*exampletext", list.files(path = "./csv_by_subject"))
This returns a few examples from a local folder on my machine:
csvList
[1] "*exampletext_error_metric.m"
[2] "*exampletext_get_K_clusters.m"
...

Related

Running regex in R using str_extract_all has regexp not yet implemented

I am trying to parse a file using regex. Most of the solutions for regex in R use the stringr package, and I have not found another approach or package that would work. If you have another way of going about this, that would also be acceptable.
What I am trying to accomplish is to grab a couple of values that are separated by spaces, with the last value being a comma-separated list of variable length. This should go into a matrix or data frame in a table-like format, as it is currently.
foo foo_123bar foo,bar,bazz
foo2 foo_456bar foo2,bar2
I have the working example of my regex here.
There are a couple of issues I could be running into. The first is that the regex I am writing may not be supported by R's regex engine, although I have the feeling from this that it would be supported. I have seen that R uses a POSIX-like format, which could make things interesting. The second could simply be exactly what the error message below shows: this is a feature that has not been implemented yet. That would be the most troubling, because I don't know another way to solve my problem without this package.
Below is the R code that I am using to replicate this error
library("stringr")
string = " foo foo_123bar foo,bar,bazz\n foo2 foo_456bar foo2,bar2,bazz2"
pattern = "
(?(DEFINE)
(?<blanks>[[:blank:]]+)
(?<var>\"?[[:alnum:]_]+\"?)
(?<csvar>(\"?[[:alnum:]_]+\"?,?)+)
)
^
(?&blanks)((?&var))
(?&blanks)((?&var))
(?&blanks)((?&csvar))"
# Both of these are throwing the error
str_extract_all(string, pattern)
str_extract_all(string, regex(pattern, multiline=TRUE, comments=TRUE))
> Error in stri_extract_all_regex(string, pattern, simplify = simplify, :
> Use of regexp feature that is not yet implemented. (U_REGEX_UNIMPLEMENTED)
# Using the example from ?str_extract_all runs without error
shopping_list <- c("apples x4", "bag of flour", "bag of sugar", "milk x2")
str_extract_all(shopping_list, "\\b[a-z]+\\b", simplify = TRUE)
I am looking for a solution, not necessarily a stringr solution, but this is the only way I found that fits my needs. The other simpler R regex functions only accept the pattern and not the extra parameters that include the multi line and comment functionality that I am using.
You have a PCRE regex that can only be used in functions that parse the regex with the PCRE library (or another engine with compatible Perl-style syntax, such as Boost). stringr's str_extract parses the regex with the ICU regex library. ICU regex does not support recursion or DEFINE blocks, so you can't use the in-pattern approach to define subpatterns and then reuse them.
Instead, just declare the regex parts you need to re-use as variables and build the pattern dynamically:
library("stringr")
string = " foo foo_123bar foo,bar,bazz\n foo2 foo_456bar foo2,bar2,bazz2"
blanks <- "[[:blank:]]+"
vars <- "\"?[[:alnum:]_]+\"?"
csvar <- "(?:\"?[[:alnum:]_]+\"?,?)+"
pattern <- paste0("^",blanks,"(", vars, ")",blanks,"(", vars,")",blanks,"(",csvar, ")")
str_match_all(string, pattern)
# [[1]]
# [,1] [,2] [,3] [,4]
#[1,] " foo foo_123bar foo,bar,bazz" "foo" "foo_123bar" "foo,bar,bazz"
Note: you need to use str_match (or str_match_all) to extract the capturing group values as str_extract or str_extract_all only allows access to the whole match values.
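Since the question also asked for non-stringr options, here is a hedged base R sketch of the same build-the-pattern idea. It assumes the input can be split into lines first, so no multiline flag is needed, and it relies on `regexec()` accepting `perl = TRUE` (R 3.3.0 or later). The variable names `lines`, `matches` and `fields` are illustrative only.

```r
string <- " foo foo_123bar foo,bar,bazz\n foo2 foo_456bar foo2,bar2,bazz2"

# Reusable sub-patterns, defined as ordinary R strings
blanks <- "[[:blank:]]+"
var    <- "\"?[[:alnum:]_]+\"?"
csvar  <- "(?:\"?[[:alnum:]_]+\"?,?)+"
pattern <- paste0("^", blanks, "(", var, ")",
                  blanks, "(", var, ")",
                  blanks, "(", csvar, ")$")

# Match each line separately; regexec() keeps the capture groups
lines   <- strsplit(string, "\n", fixed = TRUE)[[1]]
matches <- regmatches(lines, regexec(pattern, lines, perl = TRUE))

# Drop the whole-match element and stack the three groups into a matrix
fields <- do.call(rbind, lapply(matches, function(m) m[-1]))
print(fields)

# The comma-separated last column can then be split per row
print(strsplit(fields[, 3], ",", fixed = TRUE))
```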

Extract segment of filename

I'm trying to extract a filename and save the dataframe with that same name.
The problem I have is that if the filename for some reason is inside a folder with a similar word, stringr will return that word as well.
filename <- "~folder/testdata/2016/testdata 2016.csv"
If I run this:
library(stringr)
str <- str_trim(stringr::str_extract(filename,"[t](.*)"), "left")
it returns testdata/2016/testdata 2016.csv when all I want is testdata 2016. Ideally, I would even get testdata2016.
I've been trying several combinations but there has to be a simpler way of doing this. If there was a way of reading the path from right to left, starting at .csv stop at /, I wouldn't have this issue.
You can use either of the approaches below:
library(stringr)
str_replace(str_extract(filename,"\\w*\\s+\\w*(?=\\.)"),"\\s+","")
str_replace_all(basename(filename),"\\s+|\\.csv","")
You can use the basename approach, as suggested by Benjamin.
From ?basename:
basename removes all of the path up to and including the last path
separator (if any).
Output:
[1] "testdata2016"
Plenty of help in base R (tools pkg comes with the default R install):
gsub(" ", "",
tools::file_path_sans_ext(
basename("~folder/testdata/2016/testdata 2016.csv")))
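If several files need the same treatment, the base R steps above can be wrapped in a small helper. This is only a sketch: `path_to_name` is a hypothetical name, and the second path is made up for illustration.

```r
# Hypothetical helper: strip the directory, the extension and any whitespace
path_to_name <- function(path) {
  gsub("\\s+", "", tools::file_path_sans_ext(basename(path)))
}

paths <- c("~folder/testdata/2016/testdata 2016.csv",
           "~folder/testdata/2017/testdata 2017.csv")  # second path is invented

names_out <- path_to_name(paths)
print(names_out)

# One possible use: keep the data frames in a list named after the files
# frames <- setNames(lapply(paths, read.csv), names_out)
```

Because basename() only looks at the part after the last path separator, a folder that shares a word with the file name no longer interferes.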

Automatically using the object name as file name with write.table or write.csv

Is there a way to have the object name become the file name character string when using write.table or write.csv?
In the following, a and b are vectors. I will be doing similar comparisons for many other pairs of vectors, and would like to not write out the object name as many times as I have been doing.
unique_downa<-a[!(a%in%b)]
write.csv(unique_downa,file="unique_downa.csv")
Or if anyone has a suggestion for a better way to do this whole process, I'd be happy to hear it.
The idiomatic approach is to use deparse(substitute(blah)), e.g.
write.csv.named <- function(x, ...){
fname <- sprintf('%s.csv',deparse(substitute(x)))
write.csv(x=x, file = fname, ...)
}
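A hedged usage sketch of the same idea follows. The variant below, `write_csv_named`, is a hypothetical rewrite that adds a `dir` argument (defaulting to a temporary directory, an assumption made so the example does not write into the working directory) and returns the file path. The vectors `a` and `b` are made-up data.

```r
# Variant of the function above; `dir` is an added, illustrative argument
write_csv_named <- function(x, dir = tempdir(), ...) {
  # substitute(x) captures the expression passed as x; deparse() turns it into text
  fname <- file.path(dir, paste0(deparse(substitute(x)), ".csv"))
  write.csv(x, file = fname, ...)
  invisible(fname)
}

a <- c(1, 2, 3, 4)  # made-up data
b <- c(3, 4, 5)
unique_downa <- a[!(a %in% b)]

path <- write_csv_named(unique_downa)
print(basename(path))
```

Note that this only gives a clean file name when the argument is a plain object name; passing an expression such as `a[!(a %in% b)]` directly would deparse into a string that is not a sensible file name.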
It might be easiest to use the names of elements of a list instead of trying to use object names:
mycomparisons <-list (unique_downa = a[!(a%in%b)], unique_downb = b[!(b%in%a)])
mapply (write.csv, mycomparisons, paste (names (mycomparisons), ".csv", sep =""))
The best thing to do is probably put your vectors in a list, and then do the comparisons, the naming, and the writing out all inside the same loop, but that depends on how similar these similar comparisons are...
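One way the "everything inside the same loop" idea might look is sketched below. It assumes each comparison has the same shape (elements of one vector that are not in another); the vectors, the output directory and the file-naming scheme are all invented for illustration.

```r
# Made-up input vectors, kept in a named list
vectors <- list(a = c("x1", "x2", "x3"), b = c("x2", "x4"))

out_dir <- tempdir()       # assumption: write somewhere disposable
written <- character(0)

for (i in names(vectors)) {
  for (j in setdiff(names(vectors), i)) {
    # Elements of vector i that do not occur in vector j
    only_i <- vectors[[i]][!(vectors[[i]] %in% vectors[[j]])]
    fname <- file.path(out_dir, paste0("unique_", i, "_not_", j, ".csv"))
    write.csv(only_i, file = fname)
    written <- c(written, fname)
  }
}

print(basename(written))
```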

How to use variable in xpath in R?

When I parse a web page, this works fine:
tdata=xpathApply(data,"//table[@id='PL']")
I want to use a variable in xpathApply:
x="PL"
tdata=xpathApply(data,"//table[@id=x]")
This does not work. How do I write the XPath expression in xpathApply with a variable?
Thanks to Dason's suggestion, this works:
x="PL"
y=paste0("//table[@id='",x,"']")
tdata=xpathApply(data,y)
It is OK, but I feel it is ugly. How can I write it more elegantly?
The gsubfn package can do string interpolation somewhat along the lines of Perl if we preface the function whose arguments are to contain substitutions with fn$. Here $x means substitute in the value of x . See ?fn and the gsubfn home page.
library(gsubfn)
x <- "PL"
tdata <- fn$xpathApply(data, "//table[@id='$x']")
@Dason's suggestion of using paste or one alike is most likely the only way to go. If you find it ugly, you can sweep it under the rug by creating your own function:
my.xpathApply <- function(data, x) xpathApply(data, paste0("//table[@id='",x,"']"))
tdata <- my.xpathApply(data, "PL")
After all, you must use a lot of package functions that use paste somewhere, so you should be ok with having one of your own :-)
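Another base R option, offered as a sketch rather than the definitive answer, is `sprintf()`, which keeps the XPath readable as a template with a `%s` slot for the variable. XPath addresses attributes with `@`, so the template uses `@id`. The call to `xpathApply()` is left commented out because it needs the XML package and a parsed document named `data`, neither of which is set up here.

```r
x <- "PL"

# Build the XPath expression from a template; %s is replaced by the value of x
query <- sprintf("//table[@id='%s']", x)
print(query)

# With the XML package loaded and `data` holding a parsed document:
# tdata <- xpathApply(data, query)
```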

Do you always use row.names=F in write.csv? Changing the default values inside R (base) functions

I couldn't find a solution online, but I thought this might be quite common.
With write.csv I basically always set the argument row.names to FALSE. Is it possible to run a line once and update the default value of the argument for the rest of the session?
I tried paste <- paste(sep="") which ran and returned no error but seemed to do nothing (and didn't destroy the paste function). This is another one: I always set sep="" with paste.
Similarly, I always use exclude=NULL with table so I can see the NA values.
EDIT: So, I'm looking for a solution that will work for multiple functions if possible: paste, write.csv, table and other functions like these.
paste <- paste(sep="") puts the output of paste() into an object named "paste". You would need to do something like this instead.
paste <- function (..., sep = "", collapse = NULL) {
base::paste(..., sep=sep, collapse=collapse)
}
You can also look at the Defaults package for this sort of thing, but it doesn't currently work for two of your examples.
Try this:
paste <- paste
formals(paste)$sep <- ""
This creates a new copy of paste in your workspace, and then modifies its default value for sep to "". Subsequent calls to paste will then use the modified copy, as it sits in front of the base environment in your search path.
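The sketch below tries both ideas on small made-up inputs. The names `my_paste` and `write_csv_plain` are hypothetical, chosen so that the base functions are not masked. Since write.csv only has `...` as a formal argument, its row.names default cannot be changed through formals(), so a thin wrapper is used for it instead.

```r
# Copy paste and change the default separator on the copy
my_paste <- paste
formals(my_paste)$sep <- ""
print(my_paste("a", "b"))

# Wrapper with a different default for row.names; everything else is passed through
write_csv_plain <- function(..., row.names = FALSE) {
  utils::write.csv(..., row.names = row.names)
}

df  <- data.frame(x = 1:2, y = c("a", "b"))  # made-up data
tmp <- tempfile(fileext = ".csv")
write_csv_plain(df, file = tmp)
print(readLines(tmp))
```

The default can still be overridden per call, for example `write_csv_plain(df, file = tmp, row.names = TRUE)`.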
