Tracemem is doing what I need it to, but it is also producing distracting visual clutter. Here is a simple example.
a<-1
b<-2
dummyfunction<-function(x,y){return(sum(x,y))}
dummyfunction(a,b)
[1] 3
Now, I want to do something more complex, first tracemem to see if the inputs are duplicated...
dummyfunction2<-function(x,y){if (tracemem(x)==tracemem(y)){return("Input vectors are identical")}
if(sum(x %in% y)>=length(x) & sum(y %in% x)>=length(y)){print("Something something.")}
return(sum(x,y))}
This does what I want if the inputs are duplicated...
dummyfunction2(a,a)
[1] "Input vectors are identical"
When they're not duplicated, though the function still works, it spews a bunch of confusing information.
dummyfunction2(a,b)
tracemem[0x0000000009824470 -> 0x000000000a7ced80]: match %in% dummyfunction2
tracemem[0x0000000009824500 -> 0x000000000a7cedb0]: match %in% dummyfunction2
tracemem[0x0000000009824500 -> 0x000000000a7cef90]: match %in% dummyfunction2
tracemem[0x0000000009824470 -> 0x000000000a7cc1a8]: match %in% dummyfunction2
[1] 3
I'm hoping to convince non-R users to try using a function with this issue, and output like this will certainly scare them off.
What is the most elegent way to remove this visual clutter without supressing potentially informative warnings. etc that may crop up in other portions of the function?
From http://stat.ethz.ch/R-manual/R-patched/library/base/html/tracemem.html :
"This function marks an object so that a message is printed whenever the internal code copies the object."
You could stick untracemem into the function to get around it:
dummyfunction3<-function(x,y){
if (tracemem(x)==tracemem(y)){return("Input vectors are identical")}
untracemem(x)
untracemem(y)
if(sum(x %in% y)>=length(x) & sum(y %in% x)>=length(y)){print("Something something.")}
return(sum(x,y))}
output:
a <- 1
b <- 2
dummyfunction3(a,a)
# [1] "Input vectors are identical"
dummyfunction3(a,b)
# [1] 3
Don't use tracemem(). Instead you could try pryr::address() which
just returns the memory address of the input.
devtools::install_github("hadley/pryr")
library(pryr)
x <- 1:10
y <- x
address(x)
## [1] "0x100a568c8"
address(y)
## [1] "0x100a568c8"
Related
What's the difference between the in and the %in% operator in R? Why do I sometimes need the percentage signs and other times I do not?
The 3 following objects are all functions :
identity
%in%
for
We can call them this way :
`identity`(1)
#> [1] 1
`%in%`(1, 1:2)
#> [1] TRUE
`for`(x, seq(3), print("yes"))
#> [1] "yes"
#> [1] "yes"
#> [1] "yes"
But usually we don't!
"identity" is syntactic (i.e. it's a "regular" name, doesn't contain weird symbols etc), AND it is not a protected word so we can skip the tick marks and call just :
identity(1)
%in% is not syntactic but it starts and ends with "%" so it can be used in infix form. you could define your own `%fun%` <-function(x,y) ... and use it this way to, so we would call :
1 %in% 1:2
for is a control flow construct, like if, while and repeat, all of those are functions with a given number of arguments, but they come in the language with more convenient ways to call them than the above. here we'd do :
for (x in seq(3)) print("yes")
in is just used to parse the code, it's not a function here (just like else isn't either.
?`%in%` will show you what the function does.
Depending on how you define it, there is no in operator in R, only an %in% operator. Instead, in is “syntactic sugar” as part of the syntax for the for loop.
By contrast, %in% is an actual operator defined in R which tests whether the left-hand expression is contained in the right-hand expression. As other operators in R, %in% is a regular function and can be called as such:
if (`%in%`(x, seq(3, 5))) message("yes")
… or it can be redefined:
`%in%` = function (x, table) {
message("I redefined %in%!")
match(x, table, nomatch = 0L) > 0L
}
if (5 %in% 1 : 10) message("yes")
# I redefined %in%!
# yes
Usage-wise, I have figured out the answer: I can only use in when I loop through everything, and %in%for checking whether something is contained in something else, e.g.
for (x in seq(3)){
if (x %in% seq(3,5)) print("yes")
}
I have this vector
v <- c("firstOne","firstTwo","secondOne")
I would like to factor the vector assigning c("firstOne","firstTwo) to the same level (i.e., firstOne). I have tried this:
> factor(v, labels = c("firstOne", "firstOne", "secondOne"))
[1] firstOne firstOne secondOne
Levels: firstOne firstOne secondOne
But I get a duplicate factor (and a warning message advising not to use it). Instead, I would like the output to look like:
[1] firstOne firstOne secondOne
Levels: firstOne secondOne
Is there any way to get this output without brutally substituting the character strings?
Here are a couple of options:
v <- factor(ifelse(v %in% c("firstOne", "firstTwo"), "firstOne", "secondOne"))
v <- factor(v,levels = c("firstOne","secondOne")); f[is.na(f)] <- 'firstOne'
A factor is just a numeric (integer) vector with labels, and so manipulating a factor is equivalent to manipulating integers, rather than character strings. Therefore performance-wise is perfectly OK to do
f <- as.factor(v)
f[f %in% c('firstOne', 'firstTwo')] <- 'firstOne'
f <- droplevels(f)
You could use the rec-function of the sjmisc-package:
rec(v, "firstTwo=firstOne;else=copy", as.fac = T)
> [1] firstOne firstOne secondOne
> Levels: firstOne secondOne
(the output is shortened; note that the sjmisc-package supports labelled data and thus adds label attributes to the vector, which you'll see in the console output as well)
Eventually I also found a solution which looks somehow sloppy but I don't see major issues (looking forward to listen which might be possible problems with this tho):
v <- c("firstOne","firstTwo","secondOne")
factor(v)
factor(factor(v,labels = c("firstOne","firstOne","secondOne")))
Using a basic function such as this:
myname<-function(z){
nm <-deparse(substitute(z))
print(nm)
}
I'd like the name of the item to be printed (or returned) when iterating through a list e.g.
for (csv in list(acsv, bcsv, ccsv)){
myname(csv)
}
should print:
acsv
bcsv
ccsv
(and not csv).
It should be noted that acsv, bcsv, and ccsvs are all dataframes read in from csvs i.e.
acsv = read.csv("a.csv")
bcsv = read.csv("b.csv")
ccsv = read.csv("c.csv")
Edit:
I ended up using a bit of a compromise. The primary goal of this was not to simply print the frame name - that was the question, because it is a prerequisite for doing other things.
I needed to run the same functions on four identically formatted files. I then used this syntax:
for(i in 1:length(csvs)){
cat(names(csvs[i]), "\n")
print(nrow(csvs[[i]]))
print(nrow(csvs[[i]][1]))
}
Then the indexing of nested lists was utilized e.g.
print(nrow(csvs[[i]]))
which shows the row count for each of the dataframes.
print(nrow(csvs[[i]][1]))
Then provides a table for the first column of each dataframe.
I include this because it was the motivator for the question. I needed to be able to label the data for each dataframe being examined.
The list you have constructed doesn't "remember" the expressions it was constructed of anymore. But you can use a custom constructor:
named.list <- function(...) {
l <- list(...)
exprs <- lapply(substitute(list(...))[-1], deparse)
names(l) <- exprs
l
}
And so:
> named.list(1+2,sin(5),sqrt(3))
$`1 + 2`
[1] 3
$`sin(5)`
[1] -0.9589243
$`sqrt(3)`
[1] 1.732051
Use this list as parameter to names, as Thomas suggested:
> names(mylist(1+2,sin(5),sqrt(3)))
[1] "1 + 2" "sin(5)" "sqrt(3)"
To understand what's happening here, let's analyze the following:
> as.list(substitute(list(1+2,sqrt(5))))
[[1]]
list
[[2]]
1 + 2
[[3]]
sqrt(5)
The [-1] indexing leaves out the first element, and all remaining elements are passed to deparse, which works because of...
> lapply(as.list(substitute(list(1+2,sqrt(5))))[-1], class)
[[1]]
[1] "call"
[[2]]
[1] "call"
Note that you cannot "refactor" the call list(...) inside substitute() to use simply l. Do you see why?
I am also wondering if such a function is already available in one of the countless R packages around. I have found this post by William Dunlap effectively suggesting the same approach.
I don't know what your data look like, so here's something made up:
csvs <- list(acsv=data.frame(x=1), bcsv=data.frame(x=2), ccsv=data.frame(x=3))
for(i in 1:length(csvs))
cat(names(csvs[i]), "\n")
I want to use information from a field and include it in a R function, e.g.:
data #name of the data.frame with only one raw
"(if(nclusters>0){OptmizationInputs[3,3]*beta[1]}else{0})" # this is the raw
If I want to use this information inside a function how could I do it?
Another example:
A=c('x^2')
B=function (x) A
B(2)
"x^2" # this is the return. I would like to have the return something like 2^2=4.
Use body<- and parse
A <- 'x^2'
B <- function(x) {}
body(B) <- parse(text = A)
B(3)
## [1] 9
There are more ideas here
Another option using plyr:
A <- 'x^2'
library(plyr)
body(B) <- as.quoted(A)[[1]]
> B(5)
[1] 25
A <- "x^2"; x <- 2
BB <- function(z){ print( as.expression(do.call("substitute",
list( parse(text=A)[[1]], list(x=eval(x) ) )))[[1]] );
cat( "is equal to ", eval(parse(text=A)))
}
BB(2)
#2^2
#is equal to 4
Managing expressions in R is very weird. substitute refuses to evaluate its first argument so you need to use do.call to allow the evaluation to occur before the substitution. Furthermore the printed representation of the expressions hides their underlying representation. Try removing the fairly cryptic (to my way of thinking) [[1]] after the as.expression(.) result.
I'm using R, and I'm a beginner. I have two large lists (30K elements each). One is called descriptions and where each element is (maybe) a tokenized string. The other is called probes where each element is a number. I need to make a dictionary that mapsprobes to something in descriptions, if that something is there. Here's how I'm going about this:
probe2gene <- list()
for (i in 1:length(probes)){
strings<-strsplit(descriptions[i]), '//')
if (length(strings[[1]]) > 1){
probe2gene[probes[i]] = strings[[1]][2]
}
}
Which works fine, but seems slow, much slower than the roughly equivalent python:
probe2gene = {}
for p,d in zip(probes, descriptions):
try:
probe2gene[p] = descriptions.split('//')[1]
except IndexError:
pass
My question: is there an "R-thonic" way of doing what I'm trying to do? The R manual entry on for loops suggests that such loops are rare. Is there a better solution?
Edit: a typical good "description" looks like this:
"NM_009826 // Rb1cc1 // RB1-inducible coiled-coil 1 // 1 A2 // 12421 /// AB070619 // Rb1cc1 // RB1-inducible coiled-coil 1 // 1 A2 // 12421 /// ENSMUST00000027040 // Rb1cc1 // RB1-inducible coiled-coil 1 // 1 A2 // 12421"
a bad "description: looks like this
"-----"
though it can quite easily be some other not-very-helpful string. Each probe is simply a number. The probe and description vectors are the same length, and completely correspond to each other, i.e. probe[i] maps to description[i].
It's usually better in R if you use the various apply-like functions, rather than a loop. I think this solves your problem; the only drawback is that you have to use string keys.
> descriptions <- c("foo//bar", "")
> probes <- c(10, 20)
> probe2gene <- lapply(strsplit(descriptions, "//"), function (x) x[2])
> names(probe2gene) <- probes
> probe2gene <- probe2gene[!is.na(probe2gene)]
> probe2gene[["10"]]
[1] "bar"
Unfortunately, R doesn't have a good dictionary/map type. The closest I've found is using lists as a map from string-to-value. That seems to be idiomatic, but it's ugly.
If I understand correctly you are looking to save each probe-description combination where the there is more than one (split) value in description?
Probe and Description are the same length?
This is kind of messy but a quick first pass at it?
a <- list("a","b","c")
b <- list(c("a","b"),c("DEF","ABC"),c("Z"))
names(b) <- a
matches <- which(lapply(b, length)>1) #several ways to do this
b <- lapply(b[matches], function(x) x[2]) #keeps the second element only
That's my first attempt. If you have a sample dataset that would be very useful.
Best regards,
Jay
Another way.
probe<-c(4,3,1)
gene<-c('red//hair','strange','blue//blood')
probe2gene<-character()
probe2gene[probe]<-sapply(strsplit(gene,'//'),'[',2)
probe2gene
[1] "blood" NA NA "hair"
In the sapply, we take advantage of the fact that in R the subsetting operator is also a function named '[' to which we can pass the index as an argument. Also, an out-of-range index does not cause an error but gives a NA value. On the left hand of the same line, we use the fact that we can pass a vector of indices in any order and with gaps.
Here's another approach that should be fast. Note that this doesn't
remove the empty descriptions. It could be adapted to do that or you
could clean those in a post processing step using lapply. Is it the
case that you'll never have a valid description of length one?
make_desc <- function(n)
{
word <- function(x) paste(sample(letters, 5, replace=TRUE), collapse = "")
if (runif(1) < 0.70)
paste(sapply(seq_len(n), word), collapse = "//")
else
"----"
}
description <- sapply(seq_len(10), make_desc)
probes <- seq_len(length(description))
desc_parts <- strsplit(description, "//", fixed=TRUE, useBytes=TRUE)
lens <- sapply(desc_parts, length)
probes_expand <- rep(probes, lens)
ans <- split(unlist(desc_parts), probes_expand)
> description
[1] "fmbec"
[2] "----"
[3] "----"
[4] "frrii//yjxsa//wvkce//xbpkc"
[5] "kazzp//ifrlz//ztnkh//dtwow//aqvcm"
[6] "stupm//ncqhx//zaakn//kjymf//swvsr//zsexu"
[7] "wajit//sajgr//cttzf//uagwy//qtuyh//iyiue//xelrq"
[8] "nirex//awvnw//bvexw//mmzdp//lvetr//xvahy//qhgym//ggdax"
[9] "----"
[10] "ubabx//tvqrd//vcxsp//rjshu//gbmvj//fbkea//smrgm//qfmpy//tpudu//qpjbu"
> ans[[3]]
[1] "----"
> ans[[4]]
[1] "frrii" "yjxsa" "wvkce" "xbpkc"