Is there a way to apply back-ticks to a vector of .Primitive function names so that it can be passed safely to is.primitive?
Currently I use get(x) for x in is.primitive(x). These first calls,
> is.primitive(`$`)
#[1] TRUE
> is.primitive(get("$"))
#[1] TRUE
are the right ones. All the next calls don't work.
> is.primitive("$")
#[1] FALSE
## which is a bit confusing considering the argument name in
> `$`
#.Primitive("$")
##----
## other tries ...
> is.primitive($)
#Error: unexpected '$' in "is.primitive($"
> is.primitive("`$`")
#[1] FALSE
> is.primitive(`"$"`)
#Error in is.primitive(`"$"`) : object '"$"' not found
> sQuote("$")
#[1] "‘$’"
> is.primitive(sQuote("$"))
#[1] FALSE
> as.name("$") ## most promising! ...
#`$`
> is.primitive(as.name("$")) ## ...but no
#[1] FALSE
The reason I'm doing this is because I'd like to perform some analysis on the objects in package:base using something like
Vis.primitive <- Vectorize(is.primitive)
The vector x I'll be using is
x <- ls("package:base")
It is worth reading the help for functional programming
(?Map/?Filter)
This gives examples of how to use Filter to perform such analyses.
You could do something like
Filter(is.primitive, sapply(ls(baseenv()), get, baseenv()))
An alternative would be to use match.fun to look the function up by name, e.g.
match.fun('$')
There is no need to mess around with `!
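For the vector use case in the question, a minimal sketch might look like the following. It assumes every name returned by ls("package:base") can be resolved with get(); the names is_prim and prim_names are only illustrative.

```r
# Names of all (non-hidden) objects attached from the base package
x <- ls("package:base")

# Look each name up and test the resulting object; vapply() returns
# a logical vector named after the elements of x
is_prim <- vapply(
  x,
  function(nm) is.primitive(get(nm, envir = baseenv())),
  logical(1)
)

# Keep only the names whose objects are primitive functions
prim_names <- names(is_prim)[is_prim]
head(prim_names)
```

Because the lookup happens by name inside get(), no back-ticks are needed at any point.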
I'm working on a project where I define some nouns like Haus, Boot, Kampf, ... and want to detect every form (singular/plural) and every combination of these words in sentences. For example, the algorithm should return TRUE if a sentence contains one of: Häuser, Hausboot, Häuserkampf, Kampfboot, Hausbau, Bootsanleger, ....
Are you familiar with an algorithm that can do such a thing (preferable in R)? Of course I could implement this manually, but I'm pretty sure that something should already exist.
Thanks!
You can use the stringr library together with base R's grepl function, as in this example:
> # Toy example text
> text1 <- c(" This is an example where Hausbau appears twice (Hausbau)")
> text2 <- c(" Here it does not appear the name")
> # Load library
> library(stringr)
> # Does it appear "Hausbau"?
> grepl("Hausbau", text1)
[1] TRUE
> grepl("Hausbau", text2)
[1] FALSE
> # Number of "Hausbau" in the text
> str_count(text1, "Hausbau")
[1] 2
check <- c("Der Häuser", "Das Hausboot ist", "Häuserkampf", "Kampfboot im Wasser", "NotMe", "Hausbau", "Bootsanleger", "Schauspiel")
base <- c("Haus", "Boot", "Kampf")
unlist(lapply(str_to_lower(stringi::stri_trans_general(check, "Latin-ASCII")), function(x) any(str_detect(x, str_to_lower(base)))))
# [1] TRUE TRUE TRUE TRUE FALSE TRUE TRUE TRUE
Breaking it down:
Note Roland's comment: this approach produces false positives for words like "Schauspiel", which happens to contain "haus".
You need to get rid of the special characters; you can use stri_trans_general to transliterate them to Latin-ASCII.
You need to convert your strings to lowercase (e.g. to match Boot in Kampfboot).
Then apply over the strings to test and check whether any of the base words occurs in each one. If any does, you have a match.
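The same steps can also be sketched in base R only. This is a variant under assumptions, not the approach above verbatim: the helper normalize() is hypothetical, and it uses chartr() to replace just the German umlauts (it does not handle ß or other accented characters, which stri_trans_general would).

```r
base_words <- c("Haus", "Boot", "Kampf")
check <- c("Der H\u00e4user", "Das Hausboot ist", "H\u00e4userkampf",
           "Kampfboot im Wasser", "NotMe", "Hausbau", "Bootsanleger",
           "Schauspiel")

# Hypothetical helper: strip umlauts, then lowercase
normalize <- function(x) {
  tolower(chartr("\u00e4\u00f6\u00fc\u00c4\u00d6\u00dc", "aouAOU", x))
}

# One alternation pattern covering all base words: "haus|boot|kampf"
pattern <- paste(normalize(base_words), collapse = "|")

# TRUE wherever at least one base word occurs as a substring
hits <- grepl(pattern, normalize(check))
hits
# [1]  TRUE  TRUE  TRUE  TRUE FALSE  TRUE  TRUE  TRUE
```

The last TRUE is the "Schauspiel" false positive: plain substring matching cannot tell a compound of Haus from an unrelated word that merely contains those letters.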
I have a dataframe object that is presorted, and I am trying to call a function that requires it to be sorted. Somehow is.unsorted() is returning true. R then proceeds to sort it.
Unfortunately, there are about 2 million entries, and I don't have the memory. Is there a way to force is.unsorted to be false?
A quick check of the documentation for is.unsorted turns up the following note:
Note:
This function is designed for objects with one-dimensional indices, as described above. Data frames, matrices and other arrays may give surprising results.
Therefore, you should avoid using this function on complete data frames and run it on individual columns of the data frame instead.
Take the code snippet below as an example. You can see that is.unsorted() works as expected on one-dimensional objects (vectors) but gives a surprising result on a data frame: it returns FALSE where TRUE is expected.
However, when the data frame is subset (using the $ operator) and is.unsorted() is run on the individual columns, it returns the expected result.
> vec <- c(1,2,3,4,5)
> is.unsorted(vec) # Expected: FALSE
[1] FALSE
> vec <- c(1,3,2,5,4)
> is.unsorted(vec) # Expected: TRUE
[1] TRUE
> vec <- c("A","B","C","D","E")
> is.unsorted(vec) # Expected: FALSE
[1] FALSE
> vec <- c("A","C","B","E","D")
> is.unsorted(vec) # Expected: TRUE
[1] TRUE
> dat <- data.frame(num=c(1,2,3,4,5)
+ ,chr=c("A","B","C","D","E")
+ ,stringsAsFactors=FALSE
+ )
> is.unsorted(dat) # Expected: FALSE
[1] FALSE
> dat <- data.frame(num=c(1,3,2,5,4)
+ ,chr=c("A","B","C","D","E")
+ ,stringsAsFactors=FALSE
+ )
> is.unsorted(dat) # Expected: TRUE
[1] FALSE
> is.unsorted(dat$num) # Expected: TRUE
[1] TRUE
> is.unsorted(dat$chr) # Expected: FALSE
[1] FALSE
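If the data frame is keyed on more than one column, one possible way to check sortedness column-wise is sketched below. The helper is_sorted_by() is hypothetical; it relies on order() leaving ties in their original positions, so an already sorted data frame yields the permutation 1, 2, ..., n. Note that it still allocates one integer vector of length nrow(df).

```r
# Hypothetical helper: TRUE if df is already sorted by the given columns
is_sorted_by <- function(df, cols) {
  # unname() so column names are not mistaken for arguments of order()
  ord <- do.call(order, unname(as.list(df[cols])))
  # For sorted input the ordering permutation is simply 1..n
  !is.unsorted(ord)
}

sorted_df <- data.frame(num = c(1, 2, 3, 4, 5),
                        chr = c("A", "B", "C", "D", "E"),
                        stringsAsFactors = FALSE)
shuffled_df <- data.frame(num = c(1, 3, 2, 5, 4),
                          chr = c("A", "B", "C", "D", "E"),
                          stringsAsFactors = FALSE)

is_sorted_by(sorted_df, c("num", "chr"))    # TRUE
is_sorted_by(shuffled_df, c("num", "chr"))  # FALSE
is_sorted_by(shuffled_df, "chr")            # TRUE
```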
This should be simple, but a limitation of lapply (or at least of the way I understand how to use lapply) is that it only accepts a single list as its first argument. Here is a toy example of what I am trying to do:
a = list(1, 2, 3)
b = list(3, 2, 1)
a > b
What I want as output is:
[[1]]
[1] FALSE
[[2]]
[1] FALSE
[[3]]
[1] TRUE
which, through unlist(), I will convert to
[1] FALSE FALSE TRUE
Of course, you cannot do it using a > b, so instead, I get:
Error in a > b : comparison of these types is not implemented
What is the most elegant way to compare these lists without resorting to loops, yielding an output similar to what I am looking for? Thanks!
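One possible approach, sketched here under the assumption that both lists have the same length and hold single numbers, is Map() or mapply(), which walk several lists in parallel:

```r
a <- list(1, 2, 3)
b <- list(3, 2, 1)

# Map() applies `>` to the pairs (a[[1]], b[[1]]), (a[[2]], b[[2]]), ...
# and returns a list
res_list <- Map(`>`, a, b)

# mapply() does the same but simplifies the result to a logical vector
res_vec <- mapply(`>`, a, b)
res_vec
# [1] FALSE FALSE  TRUE
```

For lists of plain scalars like these, unlist(a) > unlist(b) gives the same vector.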
I am using the binary operator %in% to subset a dataframe (I got the idea from another stackoverflow thread), but when I double check the result by switching the arguments, I get different answers. I've read the R documentation on the match() function, and it seems like neither match() nor %in% should be directionally dependent. I really need to understand exactly what is happening to be confident in my results. Could anybody provide some insight?
> filtered_ordGeneNames_proteinIDs <- ordGeneNames_ProteinIDs[ordGeneNames_ProteinIDs$V4 %in% ordDEGs$X, ];
> filtered2_ordGeneNames_proteinIDs <- ordDEGs[ordDEGs$X %in% ordGeneNames_ProteinIDs$V4, ];
> nrow(filtered_ordGeneNames_proteinIDs)
[1] 5767
> nrow(filtered2_ordGeneNames_proteinIDs)
[1] 5746
Of course you have different results:
ordGeneNames_ProteinIDs$V4 %in% ordDEGs$X
tells you which elements of ordGeneNames_ProteinIDs$V4 are also in ordDEGs$X
whereas:
ordDEGs$X %in% ordGeneNames_ProteinIDs$V4
tells you which elements of ordDEGs$X are also in ordGeneNames_ProteinIDs$V4
compare
c(1,2,3,4) %in% c(1,2,1, 2)
[1] TRUE TRUE FALSE FALSE
to
c(1,2,1, 2) %in% c(1,2,3,4)
[1] TRUE TRUE TRUE TRUE
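A small sketch with made-up identifiers (v4 and x stand in for the two columns in the question; the values are invented) shows how duplicates on one side make the two counts differ, and how comparing unique values removes the asymmetry. Duplicates are one likely cause of the 5767 vs 5746 discrepancy, though the original data would have to be checked to confirm it.

```r
# Invented stand-ins for ordGeneNames_ProteinIDs$V4 and ordDEGs$X
v4 <- c("g1", "g2", "g2", "g3", "g5")  # "g2" appears twice
x  <- c("g1", "g2", "g3", "g4")

n_left  <- sum(v4 %in% x)  # counts elements of v4, duplicates included: 4
n_right <- sum(x %in% v4)  # counts elements of x: 3

# On unique values the count is the same in both directions
n_common <- length(intersect(v4, x))  # 3
sum(unique(v4) %in% x) == sum(unique(x) %in% v4)  # TRUE
```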
I am using the bit64 package in some R code. I have created a vector
of 64 bit integers and then tried to use sapply to iterate over these
integers in a vector. Here is an example:
v = c(as.integer64(1), as.integer64(2), as.integer64(3))
sapply(v, function(x){is.integer64(x)})
sapply(v, function(x){print(x)})
Both the is.integer64(x) and print(x) calls give incorrect
(or at least unexpected) answers (FALSE and incorrect float values).
I can circumvent this by directly indexing the vector v, but I have
two questions:
Why the type conversion? Is there some rule R uses in such a scenario?
Is there any way to avoid this type conversion?
TIA.
Here is the code of lapply (which sapply calls internally):
function (X, FUN, ...)
{
FUN <- match.fun(FUN)
if (!is.vector(X) || is.object(X))
X <- as.list(X)
.Internal(lapply(X, FUN))
}
Now check this:
!is.vector(v)
#TRUE
as.list(v)
#[[1]]
#[1] 4.940656e-324
#
#[[2]]
#[1] 9.881313e-324
#
#[[3]]
#[1] 1.482197e-323
From help("as.list"):
Attributes may be dropped unless the argument already is a list or
expression.
So, either you create a list from the beginning or you add the class attribute back:
v_list <- lapply(as.list(v), function(x) {
class(x) <- "integer64"
x
})
sapply(v_list, function(x){is.integer64(x)})
#[1] TRUE TRUE TRUE
The package authors should consider writing a method for as.list. Might be worth a feature request ...
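The same mechanism can be reproduced without bit64. The sketch below uses a made-up S3 class "myclass" (purely illustrative, with no as.list method of its own), so that as.list() falls back to the default method and the elements lose their class:

```r
# A classed double vector; "myclass" is a hypothetical class name
v <- structure(c(1, 2, 3), class = "myclass")
inherits(v, "myclass")  # TRUE for the vector as a whole

# sapply() -> lapply() -> as.list(): the elements arrive without the class
lost <- sapply(v, inherits, what = "myclass")

# Re-attaching the class to each element, as in the answer above
pieces <- lapply(as.list(v), structure, class = "myclass")
kept <- vapply(pieces, inherits, logical(1), what = "myclass")
```

Here lost is FALSE for every element and kept is TRUE for every element, mirroring the integer64 behaviour shown above.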