I have the following problem:
I need the same syntax over and over again for different variable-sets.
They all have the same "core" name (for example: variable_1) and different suffixes like:
variable_1_a, variable_1_b, variable_1_c, variable_1_d, variable_1_e, ...
Since the syntax is long and I need to run it for (for example) variables _2, _3, _4, _5, and so on, I was wondering whether there is some kind of placeholder expression that I could define once with the "core" name, instead of copy-pasting the whole syntax and replacing every "variable_1" with the next core name.
For example, saving the core name in a term !XY! (the "!" just marks it as something atypical) and using that term throughout the syntax with "_a", "_b", "_c" attached:
!XY!_a, !XY!_b, !XY!_c, !XY!_d, !XY!_e, ...
I played around with saving the core-name in an element called XY and pasting it with the endings:
XY <- "variable_1"
paste0(as.character(XY),"_a")
"variable_1_a"
OR
as.symbol(paste0(as.character(XY),"_a"))
variable_1_a
Of course that looks horribly long, but I would accept that if the result could also be used like a variable, for example to read from it or write to it. Writing to it results in an error:
as.symbol(paste0(as.character(XY),"_a")) <- "test"
Error in as.symbol(paste0(as.character(XY),"_a")) <- "test" :
could not find function "as.symbol<-"
It would be a huge time-saver if there is a chance to write one syntax to fit all procedures!
Thanks a lot for your ideas!
Let's assume you have 5 variables ("variable_1", "variable_2" etc) and 4 letters ("_a", "_b" etc).
We can use outer like this:
n <- 1:5
l <- letters[1:4]
c(outer(n, l, function(x, y) paste("variable", x, y, sep = "_")))
#Or a bit shorter :
#paste0("variable_", c(outer(n, l, paste, sep = "_")))
#[1] "variable_1_a" "variable_2_a" "variable_3_a" "variable_4_a"
#[5] "variable_5_a" "variable_1_b" "variable_2_b" "variable_3_b"
#[9] "variable_4_b" "variable_5_b" "variable_1_c" "variable_2_c"
#[13] "variable_3_c" "variable_4_c" "variable_5_c" "variable_1_d"
#[17] "variable_2_d" "variable_3_d" "variable_4_d" "variable_5_d"
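The names generated above are only character strings. To actually read or write the variables behind them (the part of the question that failed with as.symbol), one option is assign() and get(); another is to keep the variables as columns of a data frame and index with [[ ]]. A minimal sketch, in which the data frame df and its values are made up for illustration:

```r
XY <- "variable_1"                      # the "core" name

# Write to and read from a variable whose name is built at run time
assign(paste0(XY, "_a"), "test")
get(paste0(XY, "_a"))

# If the variables are columns of a data frame, [[ ]] accepts a string
df <- data.frame(variable_1_a = 1:3, variable_1_b = 4:6)  # made-up data
for (suffix in c("_a", "_b")) {
  col <- paste0(XY, suffix)
  df[[col]] <- df[[col]] * 2            # same syntax for every suffix
}
df
```

Wrapping the loop in a function that takes XY as an argument would let the same code run for variable_2, variable_3, and so on.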
I am trying to write a function to extract package names from a list of R script files. My regular expressions do not seem to be working and I am not sure why. For starters, I am not able to match lines that include library. For example:
str <- c(" library(abc)", "library(def)", "some other text")
grep("library\\(", str, value = TRUE)
grep("library\\(+[A-z]\\)", str, value = TRUE)
Why does my second grep not return elements 1 and 2 from the str vector? I have tried so many options but all my results come back empty.
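Your second grep does not return elements 1 and 2 for two reasons:
You used value=TRUE, which makes it return the matching strings instead of their positions, and
you misplaced the +. In your pattern the + applies to \\( and [A-z] matches exactly one character, so only a one-letter package name could match. You want grep("library\\(\\w+\\)", str)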
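To see the difference, here is a small check using the str vector from the question. The sub() call that pulls out the package name is an addition for illustration, not part of the original answer:

```r
str <- c(" library(abc)", "library(def)", "some other text")

# + now applies to \\w, so names of any length match
hits  <- grep("library\\(\\w+\\)", str)                 # positions: 1 2
lines <- grep("library\\(\\w+\\)", str, value = TRUE)   # the matching strings

# Capture the text between the parentheses
pkgs <- sub(".*library\\((\\w+)\\).*", "\\1", lines)
pkgs  # "abc" "def"
```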
If you'd like something a bit more robust that will handle some edge cases (library() takes a number of parameters and the package one can be a name/symbol or a string and doesn't necessarily have to be specified first):
library(purrr)
script <- '
library(js) ; library(foo)
#
library("V8")
ls()
library(package=rvest)
TRUE
library(package="hrbrthemes")
1 + 1
library(quietly=TRUE, "ggplot2")
library(quietly=TRUE, package=dplyr, verbose=TRUE)
'
x <- parse(textConnection(script)) # parse w/o eval
keep(x, is.language) %>% # `library()` is a language object
keep(~languageEl(.x, 1) == "library") %>% # other things are too, so only keep `library()` ones
map(as.call) %>% # turn it into a `call` object
map(match.call, definition = library) %>% # so we can match up parameters and get them in the right order
map(languageEl, 2) %>% # language element 1 is `library`
map_chr(as.character) %>% # turn names/symbols into characters
sort() # why not
## [1] "dplyr" "foo" "ggplot2" "hrbrthemes" "js" "rvest" "V8"
This won't catch library() calls within functions (it could be expanded to do that), but if top-level edge cases are infrequent, there is an even smaller likelihood of ones in functions (those would likely use require() as well).
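For comparison, roughly the same top-level extraction can be sketched in base R only. This is an illustrative variant, not the pipeline above rewritten line by line; the shortened script and the names exprs, is_lib and pkgs are made up here:

```r
script <- '
library(js); library(foo)
library("V8")
ls()
library(package = rvest)
TRUE
library(quietly = TRUE, "ggplot2")
'

exprs <- as.list(parse(text = script))   # parse without evaluating

# Keep only calls whose function name is exactly `library`
is_lib <- vapply(exprs, function(e) {
  is.call(e) && identical(e[[1]], as.name("library"))
}, logical(1))

# match.call() lines the arguments up with library()'s formals,
# so `package` is found whether it was named, quoted or given second
pkgs <- vapply(exprs[is_lib], function(e) {
  as.character(match.call(library, e)[["package"]])
}, character(1))

pkgs
```

Like the purrr version, this only looks at top-level expressions.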
I am trying to understand names, lists and lists of lists in R. It would be convenient to have a way to dynamically label them like this:
> ll <- list("1" = 2)
> ll
$`1`
[1] 2
But this is not working:
> ll <- list(as.character(1) = 2)
Error: unexpected '=' in "ll <- list(as.character(1) ="
Neither is this:
> ll <- list(paste(1) = 2)
Error: unexpected '=' in "ll <- list(paste(1) ="
Why is that? Both paste() and as.character() are returning "1".
The reason is that paste(1) is a function call that evaluates to a string, not a string itself.
The R Language Definition says this:
Each argument can be tagged (tag=expr), or just be a simple expression.
It can also be empty or it can be one of the special tokens ‘...’, ‘..2’, etc.
A tag can be an identifier or a text string.
Thus, tags can't be expressions.
However, if you want to set names (which are just an attribute), you can do so with structure, e.g.
> structure(1:5, names=LETTERS[1:5])
A B C D E
1 2 3 4 5
Here, LETTERS[1:5] is most definitely an expression.
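The same idea applies to the list from the question: build the list first, then attach names that were computed by an expression. A minimal sketch using setNames() (from the stats package, which is attached by default); the names ll3 and keys are made up for illustration:

```r
# The name is computed at run time, then attached as an attribute
ll <- setNames(list(2), paste(1))
ll$`1`   # 2

# Works for several elements at once
keys <- paste0("item_", 1:3)
ll3 <- setNames(as.list(c(10, 20, 30)), keys)
names(ll3)  # "item_1" "item_2" "item_3"
```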
If your goal is simply to use integers as names (as in the question title), you can type them in with backticks or single- or double-quotes (as the OP already knows). They are converted to characters, since all names are characters in R.
I can't offer a deep technical explanation for why your later code fails beyond "the left-hand side of = is not evaluated in that context (of enumerating items in a list)". Here's one workaround:
mylist <- list()
mylist[[paste("a")]] <- 2
mylist[[paste("b")]] <- 3
mylist[[paste("c")]] <- matrix(1:4,ncol=2)
mylist[[paste("d")]] <- mean
And here's another:
library(data.table)
tmp <- rbindlist(list(
list(paste("a"), list(2)),
list(paste("b"), list(3)),
list(paste("c"), list(matrix(1:4,ncol=2))),
list(paste("d"), list(mean))
))
res <- setNames(tmp$V2,tmp$V1)
identical(mylist,res) # TRUE
The drawbacks of each approach are pretty serious, I think. On the other hand, I've never found myself in need of richer naming syntax.
Using a basic function such as this:
myname<-function(z){
nm <-deparse(substitute(z))
print(nm)
}
I'd like the name of the item to be printed (or returned) when iterating through a list e.g.
for (csv in list(acsv, bcsv, ccsv)){
myname(csv)
}
should print:
acsv
bcsv
ccsv
(and not csv).
It should be noted that acsv, bcsv, and ccsv are all data frames read in from CSV files, i.e.
acsv = read.csv("a.csv")
bcsv = read.csv("b.csv")
ccsv = read.csv("c.csv")
Edit:
I ended up using a bit of a compromise. The primary goal of this was not to simply print the frame name - that was the question, because it is a prerequisite for doing other things.
I needed to run the same functions on four identically formatted files. I then used this syntax:
for(i in 1:length(csvs)){
cat(names(csvs[i]), "\n")
print(nrow(csvs[[i]]))
print(nrow(csvs[[i]][1]))
}
Then the indexing of nested lists was utilized e.g.
print(nrow(csvs[[i]]))
which shows the row count for each of the dataframes.
print(nrow(csvs[[i]][1]))
Then provides a table for the first column of each dataframe.
I include this because it was the motivator for the question. I needed to be able to label the data for each dataframe being examined.
The list you have constructed doesn't "remember" the expressions it was constructed of anymore. But you can use a custom constructor:
named.list <- function(...) {
l <- list(...)
exprs <- lapply(substitute(list(...))[-1], deparse)
names(l) <- exprs
l
}
And so:
> named.list(1+2,sin(5),sqrt(3))
$`1 + 2`
[1] 3
$`sin(5)`
[1] -0.9589243
$`sqrt(3)`
[1] 1.732051
Pass this list to names, as Thomas suggested:
> names(named.list(1+2,sin(5),sqrt(3)))
[1] "1 + 2" "sin(5)" "sqrt(3)"
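Applied to the data frames from the question, the constructor gives a list whose names can be printed while iterating. A sketch with made-up data frames standing in for the CSV files; named.list is repeated here (with vapply instead of lapply) so the block runs on its own:

```r
named.list <- function(...) {
  l <- list(...)
  # Deparse the unevaluated argument expressions and use them as names
  names(l) <- vapply(as.list(substitute(list(...)))[-1], deparse, character(1))
  l
}

# Made-up stand-ins for read.csv("a.csv") etc.
acsv <- data.frame(x = 1:2)
bcsv <- data.frame(x = 1:3)
ccsv <- data.frame(x = 1:4)

csvs <- named.list(acsv, bcsv, ccsv)
for (nm in names(csvs)) {
  cat(nm, nrow(csvs[[nm]]), "\n")   # name, then row count
}
```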
To understand what's happening here, let's analyze the following:
> as.list(substitute(list(1+2,sqrt(5))))
[[1]]
list
[[2]]
1 + 2
[[3]]
sqrt(5)
The [-1] indexing leaves out the first element, and all remaining elements are passed to deparse, which works because of...
> lapply(as.list(substitute(list(1+2,sqrt(5))))[-1], class)
[[1]]
[1] "call"
[[2]]
[1] "call"
Note that you cannot "refactor" the call list(...) inside substitute() to use simply l. Do you see why?
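A small experiment hints at the answer. In the sketch below, broken.list is a hypothetical variant that substitutes the local variable l instead of list(...). Inside a function, substitute() replaces an ordinary local variable with its value, so the original expressions are already gone by then:

```r
working.list <- function(...) {
  l <- list(...)
  # `...` holds unevaluated arguments, so the expressions survive
  names(l) <- vapply(as.list(substitute(list(...)))[-1], deparse, character(1))
  l
}

broken.list <- function(...) {
  l <- list(...)
  # `l` is an ordinary local variable: substitute(l) yields its value
  names(l) <- vapply(substitute(l), deparse, character(1))
  l
}

names(working.list(1 + 2, sqrt(4)))  # "1 + 2" "sqrt(4)"
names(broken.list(1 + 2, sqrt(4)))   # "3" "2"
```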
I am also wondering if such a function is already available in one of the countless R packages around. I have found this post by William Dunlap effectively suggesting the same approach.
I don't know what your data look like, so here's something made up:
csvs <- list(acsv=data.frame(x=1), bcsv=data.frame(x=2), ccsv=data.frame(x=3))
for(i in 1:length(csvs))
cat(names(csvs[i]), "\n")
I have lots of character strings that are actually function definitions. How can I use those strings to execute the functions?
The strings I have are as follows:
foo1 <- "function(p1,p2){, v <- 2, print(\"value is \"), print(v)}"
foo2 <- "function(){, cName <- .Call(\"companyNames\"), return(cName)}"
foo3 <- "function(tickers,field,date){,df<-data.frame(Ticker = tickers, Field = field, Date = date), return(df)}"
...etc
I need a general method to execute all these functions.
EDIT: You've changed your question, so I've amended my answer:
do.call(eval(parse(text=foo1)), list())
You can add a named list to each of those functions in the place of list(). But frankly, what you're attempting is bordering on the absurd. I have no idea how you got into a position where you would need these kinds of tools. You're going to have all kinds of scoping problems from here on in.
Old solution:
fun <- eval(parse(text="function(p1,p2){v <- 2; print(paste0(\"value is \", v))}"))
fun()
## [1] "value is 2"
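To pass arguments, the parsed function can be called through do.call() with a named list. A minimal sketch, assuming each string holds a syntactically valid function definition (the strings in the question, with commas after the braces, would first need cleaning up); add_str and defs are made-up examples:

```r
add_str <- "function(p1, p2) { p1 + p2 }"   # made-up definition stored as text
fun <- eval(parse(text = add_str))          # turn the text into a function
res <- do.call(fun, list(p1 = 1, p2 = 2))
res  # 3

# Several definitions can be converted in one go
defs <- c(f1 = "function(x) x^2", f2 = "function(x) x + 1")
funs <- lapply(defs, function(s) eval(parse(text = s)))
funs$f1(4)  # 16
```

Evaluating text as code runs whatever the string contains, so this only makes sense for strings from a trusted source.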
I am writing a loop that takes two files per run, e.g. a0.txt and b0.txt. I am running this over 100 files that run from a0.txt and b0.txt to a999.txt and b999.txt. The pattern I use works perfectly if I do the run for files a0 and b0 to a9 and b9 with only file pairs 0-9 in the directory. But when I put more files in the directory and do the run over 0:10, the loop fails and mixes up vectors between files. I think this is because of the pattern I use, i.e.
list.files(pattern=paste('.', x, '\\.txt', sep=''))
This only looks for files whose names match '.', x, '\\.txt'.
So if '.' = a and x = 1 it finds file a1. But I think it gets confused between a0 and a10 when I do the run over more files. I cannot seem to find an appropriate pattern that will also find files up to a999 and b999.
Can anyone help with a better way to do this? Code below.
dostuff <- function(x)
{
files <- list.files(pattern=paste('.', x, '\\.txt', sep=''))
a <- read.table(files[1],header=FALSE) #file a0.txt
G <- a$V1-a$V2
b <- read.table(files[2],header=FALSE) #file b0.txt
as.factor(b$V2)
q <- tapply(b$V3,b$V2,Fun=length)
H <- b$V1-b$V2
model <- lm(G~H)
return(model$coefficients[2],q)
}
results <- sapply(0:10,dostuff)
Error in tapply(b$V3, b$V2, FUN = length) : arguments must have same length
How about getting the files directly, without searching? For example:
dostuff <- function(x)
{
a.filename <- paste('a', x, '.txt', sep='') # a<x>.txt
b.filename <- paste('b', x, '.txt', sep='') # b<x>.txt
a <- read.table(a.filename, header=FALSE)
# [...]
b <- read.table(b.filename, header=FALSE)
# [...]
}
But the error message says the problem is caused by the call to tapply rather than anything about incorrect file names, and I have literally no idea how that could happen, since I thought a data frame (which read.table creates) always has the same number of rows for each column. Did you copy-paste that error message out of R? (I have a feeling there might be a typo, and so it was, for example, q <- tapply(a$V3,b$V2,Fun=length). But I could easily be wrong)
Also, as.factor(b$V2) doesn't modify b$V2, it just returns a factor representing b$V2: after you call as.factor, b$V2 is still a plain vector. You need to assign the result to something, e.g.:
V2.factor <- as.factor(b$V2)
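A short illustration with a made-up data frame b (the column names follow read.table's defaults). R argument names are case-sensitive, so the sketch spells the tapply argument FUN:

```r
b <- data.frame(V2 = c(1, 2, 1), V3 = c(10, 20, 30))  # made-up data

as.factor(b$V2)          # the result is printed or discarded, b is unchanged
is.factor(b$V2)          # FALSE

b$V2 <- as.factor(b$V2)  # assign the result to keep it
is.factor(b$V2)          # TRUE

q <- tapply(b$V3, b$V2, FUN = length)  # number of rows per level of V2
q
```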
If the beginning of the two file names is always the same (a and b in your example), you could use this information in the pattern:
x <- 1
list.files(pattern=paste('[ab]', x, '\\.txt', sep=''))
# [1] "a1.txt" "b1.txt"
x <- 11
list.files(pattern=paste('[ab]', x, '\\.txt', sep=''))
# [1] "a11.txt" "b11.txt"
Edit: and you should include the ^ as well, as Wojciech proposed. ^ matches the beginning of a line or in your case the beginning of the filename.
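The effect of anchoring can be checked without touching the file system by running the patterns against a character vector, since list.files applies the pattern to file names in the same way. The file names below are made up for illustration:

```r
files <- c("a0.txt", "a1.txt", "a10.txt", "a11.txt",
           "b0.txt", "b1.txt", "b10.txt", "b11.txt")
x <- 1

# Pattern from the question: "." matches any character, so a11 and b11 slip in
loose <- grep(paste0(".", x, "\\.txt"), files, value = TRUE)
loose   # "a1.txt" "a11.txt" "b1.txt" "b11.txt"

# Anchored pattern: exactly one leading a or b, then x, then ".txt"
strict <- grep(paste0("^[ab]", x, "\\.txt$"), files, value = TRUE)
strict  # "a1.txt" "b1.txt"
```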