Convert string argument to regular expression - julia

Trying to get into Julia after learning python, and I'm stumbling over some seemingly easy things. I'd like to have a function that takes strings as arguments, but uses one of those arguments as a regular expression to go searching for something. So:
function patterncount(string::ASCIIString, kmer::ASCIIString)
numpatterns = eachmatch(kmer, string, true)
count(numpatterns)
end
There are a couple of problems with this. First, eachmatch expects a Regex object as the first argument and I can't seem to figure out how to convert a string. In python I'd do r"{0}".format(kmer) - is there something similar?
Second, I clearly don't understand how the count function works (from the docs):
count(p, itr) → Integer
Count the number of elements in itr for which predicate p returns true.
But I can't seem to figure out what the predicate is for just counting how many things are in an iterator. I can make a simple counter loop, but I figure that has to be built in. I just can't find it (tried the docs, tried searching SO... no luck).
Edit: I also tried numpatterns = eachmatch(r"$kmer", string, true) - no go.

To convert a string to a regex, call the Regex function on the string.
Typically, to get the length of an iterator you an use the length function. However, in this case that won't really work. The eachmatch function returns an object of type Base.RegexMatchIterator, which doesn't have a length method. So, you can use count, as you thought. The first argument (the predicate) should be a one argument function that returns true or false depending on whether you would like to count a particular item in your iterator. In this case that function can simply be the anonymous function x->true, because for all x in the RegexMatchIterator, we want to count it.
So, given that info, I would write your function like this:
patterncount(s::ASCIIString, kmer::ASCIIString) =
count(x->true, eachmatch(Regex(kmer), s, true))
EDIT: I also changed the name of the first argument to be s instead of string, because string is a Julia function. Nothing terrible would have happened if we would have left that argument name the same in this example, but it is usually good practice not to give variable names the same as a built-in function name.

Related

Why are names(x)<-y and "names<-"(x,y) not equivalent?

Consider the following:
y<-c("A","B","C")
x<-z<-c(1,2,3)
names(x)<-y
"names<-"(z,y)
If you run this code, you will discover that names(x)<-y is not identical to "names<-"(z,y). In particular, one sees that names(x)<-y actually changes the names of x whereas "names<-"(z,y) returns z with its names changed.
Why is this? I was under the impression that the difference between writing a function normally and writing it as an infix operator was only one of syntax, rather than something that actually changes the output. Where in the documentation is this difference discussed?
Short answer: names(x)<-y is actually sugar for x<-"names<-"(x,y) and not just "names<-"(x,y). See the the R-lang manual, pages 18-19 (pages 23-24 of the PDF), which comes to basically the same example.
For example, names(x) <- c("a","b") is equivalent to:
`*tmp*`<-x
x <- "names<-"(`*tmp*`, value=c("a","b"))
rm(`*tmp*`)
If more familiar with getter/setter, one can think that if somefunction is a getter function, somefunction<- is the corresponding setter. In R, where each object is immutable, it's more correct to call the setter a replacement function, because the function actually creates a new object identical to the old one, but with an attribute added/modified/removed and replaces with this new object the old one.
In the case example for instance, the names attribute are not just added to x; rather a new object with the same values of x but with the names is created and linked to the x symbol.
Since there are still some doubts about why the issue is discussed in the language doc instead directly on ?names, here is a small recap of this property of the R language.
You can define a function with the name you wish (there are some restrictions of course) and the name does not impact in any way if the function is called "normally".
However, if you name a function with the <- suffix, it becomes a replacement function and allows the parser to apply the function with the mechanism described at the beginning of this answer if called by the syntax foo(x)<-value. See here that you don't call explicitely foo<-, but with a slightly different syntax you obtain an object replacement (since the name).
Although there are not formal restrictions, it's common to define getter/setter in R with the same name (for instance names and names<-). In this case, the <- suffix function is the replacement function of the corresponding version without suffix.
As stated at the beginning, this behaviour is general and a property of the language, so it doesn't need to be discussed in any replacement function doc.
In particular, one sees that names(x)<-y actually changes the names of x whereas "names<-"(z,y) returns z with its names changed.
That’s because `names<-`1 is a regular function, albeit with an odd name2. It performs no assignment, it returns a new object with the names attribute set. In fact `names<-` is a primitive function in R but it could be implemented as follows (there are shorter, better ways of writing this in R, but I want the separate steps to be explicit):
`names<-` = function (x, value) {
new = x
attr(new, 'names') = value
new
}
That is, it
… creates a new object that’s a copy of x,
… sets the names attribute on that newly created object, and
… returns the new object.
Since virtually all objects in R are immutable, this fits naturally into R’s semantics. In fact, a better name for this exact function would be with_names3. But the creators of R found it convenient to be able to write such an assignment without repeating the name of the object. So instead of writing
x = with_names(x, c('foo', 'bar'))
or
x = `names<-`(x, c('foo', 'bar'))
R allows us to write
names(x) = c('foo', 'bar')
R handles this syntax specially by internally converting it to another expression, documented in the Subset assignment section of the R language definition, as explained in the answer by Nicola.
But the gist is that names(x) = y and `names<-`(x, y) are different because … they just are. The former is a special syntactic form that gets recognised and transformed by the R parser. The latter is a regular function call, and the weird function name is a red herring: it doesn’t affect the execution whatsoever. It does the same as if the function was named differently, and you can confirm this by assigning it a different name:
with_names = `names<-`
`another weird(!) name` = `names<-`
# These are all identical:
`names<-`(x, y)
with_names(x, y)
`another weird(!) name`(x, y)
1 I strongly encourage using backtick quotes (`) instead of straight quotes (' or ") to quote R variable names. While both are allowed in some circumstances, the latter invites confusion with strings, and is conceptually bonkers. These are not strings. Consider:
"a" = "b"
"c" = "a"
Rather than copy the value of a into c, what this code actually does is set c to literal "a", because quotes now mean different things on the left- and right-hand side of assignment.
The R documentation confirms that
The preferred quote [for variable names] is the backtick (`)
2 Regular variable names (aka “identifiers” or just “names”) in R can only contain letters, digits, underscore and the dot, must start with a letter, or with a dot not followed by a digit, and can’t be reserved words. But R allows using pretty much arbitrary characters — including punctuation and even spaces! — in variable names, provided the name is backtick-quoted.
3 In fact, R has an almost-alias for this function, called setNames — which isn’t a great name, since set… implies mutating the object, but of course it doesn’t do that.

save get'd variable (after assign)

Why can't R find this variable?
assign(paste0('my', '_var'), 2)
get(paste0('my', '_var')) ## isn't this returning an object?
save(get(paste0('my', '_var')), file = paste0('my', '_var.RDATA'))
This throws the error:
Error in save(paste0("my", "_var"), file = paste0("my", "_var.RDATA")) :
object ‘paste0("my", "_var")’ not found
From the help page, the save() function expects "the names of the objects to be saved (as symbols or character strings)." Those values are not evaulated, ie you can't put in functions that will eventually return strings or raw values themselves. Use the list= parameter if you want to call a function to return a string the the name of a variable.
save(list=paste0('my', '_var'), file = paste0('my', '_var.RDATA'))
Though using get/assign is often not a good practice in R. They are usually better ways so you might want to rethink your general approach.
And finally, if you are saving a single object, you might want to consider saveRDS() instead. Often that's the behavior people are expecting when they use the save() function.
The documentation for save says that ... should be
the names of the objects to be saved (as symbols or character strings).
And indeed if you type save into the console you can see that the source has the line
names <- as.character(substitute(list(...)))[-1L]
where substitute captures its argument and doesn't evaluate it. So as the error suggests, it is looking for an object with the name paste0('my', '_var'), not evaluating the expressions supplied.

Why does substitute change noquote text to a string in R?

I wanted to answer a question regarding plotmath but I failed to get my desired substitute output.
My desired output:paste("Hi", paste(italic(yes),"why not?"))
and what I get: paste("Hi", "paste(italic(yes),\"why not?\")")
text<-'paste(italic(yes),"why not?")'
text
[1] "paste(italic(yes),\"why not?\")"
noqoute_text<-noquote(text)
noqoute_text
[1] paste(italic(yes),"why not?")
sub<-substitute(paste("Hi",noqoute_text),
env=list(noqoute_text=noqoute_text))
sub
paste("Hi", "paste(italic(yes),\"why not?\")")
You're using the wrong function, use parse instead of noquote :
text<-'paste(italic(yes),"why not?")'
noquote_text <- parse(text=text)[[1]]
sub<- substitute(paste("Hi",noquote_text),env=list(noquote_text= noquote_text))
# paste("Hi", paste(italic(yes), "why not?"))
noquote just applies a class to an object of type character, with a specific print method not to show the quotes.
str(noquote("a"))
Class 'noquote' chr "a"
unclass(noquote("a"))
[1] "a"
Would you please elaborate on your answer?
In R you ought to be careful about the difference between what's in an object, and what is printed.
What noquote does is :
add "noquote" to the class attribute of the object
That's it
The code is :
function (obj)
{
if (!inherits(obj, "noquote"))
class(obj) <- c(attr(obj, "class"), "noquote")
obj
}
Then when you print it, the methods print.noquote :
Removes the class "noquote" from the object if it's there
calls print with the argument quote = FALSE
that's it
You can actually call print.noquote on a string too :
print.noquote("a")
[1] a
It does print in a similar fashion as quote(a) or substitute(a) would but it's a totally different beast.
In the code you tried, you've been substituting a string instead of a call.
For solving the question I think Moody_Mudskipperss answer works fine, but as you asked for some elaboration...
You need to be careful about different ways similar-looking things are actually stored in R, which means they behave differently.
Especially with the way plotmath handles labels, as they try to emulate the way character-strings are normally handled, but then applies its own rules. The 3 things you are mixing I think:
character() is the most familiar: just a string. Printing can be confusing when quotes etc. are escaped. The function noquote basically tells R to mark it's argument, so that quotes are not escaped.
calls are "unevaluated function-calls": it's an instruction as to what R should do, but it's not yet executed. Any errors in this call don't come up yet, and you can inspect it.
Note that a call does not have its own evironment given with it, which means a call can give different results if evaluated e.g. from within a function.
Expressions are like calls, but applied more generally, i.e. not always a function that needs to be executed. An expression can be a variable-name, but also a simple value such as "why not?". Also, expressions can consist of multiple units, like you would have with {
Different functions can convert between these classes, but sometimes functions (such as paste!) also convert unexpectedly:
noquote does not do that much useful, as Moody_Mudskipper already pointed out: it only changes the printing. But the object basically remains a character
substitute not only substitutes variables, but also converts its first argument into (most often) a call. Here, the print bites you, for when printing a call, there is no provision for special classes of its members. Try it: sub[[3]] from the question gives[1] paste(italic(yes),"why not?")
without any backslashes! Only when printing the full call the noquote-part is lost.
parse is used to transform a character to an expression. Nothing is evaluated yet, but some structure is introduced, so that you could manipulate the expression.
paste is often behaving annoyingly (although as documented), as it can only paste together character-strings. Therefore, if you feed it anything but a character, it firs calls as.character. So if you give it a call, you just get a text-line again. So in your question, even if you'd use parse, as soon as you start pasting thing together, you get the quotes again.
Finally, your problem is harder because it's using plotmaths internal logic.
That means that as soon as you try to evaluate your text, you'll probably get an error "could not find function italic" (or a more confusing error if there is a function italic defined elsewhere). When providing it in plotmath, it works because the call is only evaluated by plotmath, which will give it a nice environment, where italic works as expected.
This all means you need to treat it all as an expression or call. As long as evaluation cannot be done (as long as it's you that handles the expression, instead of plotmath) it all needs to remain an expression or call. Giving substitute a call works, but you can also emulate more closely what happens in R, with
call('paste', 'Hi', parse(text=text)[[1]])

Use of quotes within get function (get())

I hope to get some help on the use of quotation marks within a string for get().
Say, I want to retrieve an element from a list
some_list <- list(element1=11,element2=22,element3=33)
naturally, I can simply reference this element through
some_list[['element1']]
However, once I use this as a string within get(), R throws this error message
get("some_list[['element1']]")
> Error in get("some_list[['element1']]") :
object 'some_list[['element1']]' not found
I cannot figure out why this is the case. get() works fine when used with strings that do not have quotation marks within them, e.g.
get("some_list")
I also tried escaping the quotation marks within the string (although I don't this I would need to since they are single quotation marks) but it does not work either.
some_list[["\'"element1"\'"]]
What am I missing.
get won't do that.
some_list[['element1']] isn't the name of an object in an R environment (in a technical sense). When you type some_list[['element1']] at the console, R parses the expression, looks up the symbol some_list and then calls the function [[. get is intended just for the symbol lookup piece of that.
(Technically, my sequence of events there probably isn't right, but I listed them that way to help make the issue clear. Really, R is just parsing the expression, and then calling [[ with arguments some_list and 'element1', and those symbols are subsequently looked up.)
The quotes have nothing to do with it. Run:
get("some_list")[['element1']]

Difference between `do.call()` a function and directly call a function in R?

Here is my code:
>ipo_num_year<- do.call(length,list(as.name(paste0("all_data_align_",year))))
>ipo_num_year
>90
>ipo_num_year<- length(as.name(paste0("all_data_align_",year)))
>ipo_num_year
>1
year is an string object "1999";
In previous code,all_data_align_1999 has been assigned as an list with 90 elements,so the right result is ipo_num_year equals to 90.But the second line makes ipo_num_year equals to 1,it means length() function look the return value of as.name() as an symbol object,so its length is just 1.
Why does the return value of as.name() can not be directly used as the argument of function length()?
And why the first solution works fine?
Some one may ask that why don't you just use length(all_data_align_1999).That is because the year is an loop variable in my code.
Really appreciate your kindly reply!
Instead of as.name you should use get:
length(get(paste0("all_data_align_",year)))
You need to retrieve the object not just the name.

Resources