Calling two functions with the same name from different sources() - r

In regard to this post: How can I call two functions with same name from two source files in R?
The example provided shows two functions with the same name coming from two different sources:
in "aa.R"
hi <- function(){
print("hi, aa")
}
in "bb.R"
hi <- function(){
print("hi, bb")
}
Now what I want to do is call function hi from aa.R by referencing the source. I know that when working with packages I can use:
packagename::functionname()
But when working with source(filename.R) it doesn't work.
One of the provided answers explains how to use two different environments, which I would prefer not to do, as I feel this makes the code more prone to errors.
It was also said that it's not very smart to give two functions the same name. While I totally agree that it would make more sense to use different names for functions that do different things, I would still prefer calling functions by reference to their source file for readability, since I can instantly see which file a function comes from while debugging or coding in general.
One more thing: what's stopping me from creating a package with the functions I am working with? Is there any reason not to create a package from a source file containing functions used in my main script?
Thanks for any advice.

Except for (5) these all use the example in the Note at the end.
1) separate environments Source each file into a separate environment and then qualify the name using the appropriate environment when calling the function. This seems very close to library(aa); aa::hi() in spirit.
source("aa.R", local = aa <- new.env())
source("bb.R", local = bb <- new.env())
aa$hi()
## [1] "hi, aa"
bb$hi()
## [1] "hi, bb"
1a) A variation of this is to put only one of the hi's in a separate environment. That might be useful in the case that that one is less used.
source("aa.R", local = aa <- new.env())
source("bb.R")
aa$hi()
## [1] "hi, aa"
hi()
## [1] "hi, bb"
1b) A variation of this is to attach them to the search list.
source("aa.R", local = attach(NULL, name = "aa"))
source("bb.R", local = attach(NULL, name = "bb"))
as.environment("aa")$hi()
## [1] "hi, aa"
as.environment("bb")$hi()
## [1] "hi, bb"
2) box Konrad Rudolph's box package (on CRAN) can be used for this. Again this is not much different than library(aa); aa::hi() .
box::use(./aa)
box::use(./bb)
aa$hi()
## [1] "hi, aa"
bb$hi()
## [1] "hi, bb"
3) re-source Another approach is to reread the file each time hi is called.
source("aa.R")
hi()
## [1] "hi, aa"
source("bb.R")
hi()
## [1] "hi, bb"
source("aa.R")
hi()
## [1] "hi, aa"
4) rename Yet another approach is to rename hi each time it is read. This won't work if hi itself is used within the source file but is ok otherwise. (This could also be combined with one of the above solutions.)
source("aa.R")
aa_hi <- hi
rm(hi)
source("bb.R")
bb_hi <- hi
rm(hi)
aa_hi()
## [1] "hi, aa"
bb_hi()
## [1] "hi, bb"
5) S3 In the case that the two instances of hi work on different input classes they could be made to be methods of the same generic. Change the example to this:
cat('hi <- function(x, ...) UseMethod("hi")
hi.numeric <- function(x) {
print(paste("hi, aa -", x))
}', file = "aa.R")
cat('hi <- function(x, ...) UseMethod("hi")
hi.character <- function(x) {
print(paste("hi, bb -", x))
}', file = "bb.R")
source("aa.R")
source("bb.R")
hi(1)
## [1] "hi, aa - 1"
hi("z")
## [1] "hi, bb - z"
6) modules The modules package (on CRAN) can be used.
library(modules)
aa <- use("aa.R")
bb <- use("bb.R")
aa$hi()
## [1] "hi, aa"
bb$hi()
## [1] "hi, bb"
7) package Another possibility is to convert the script to a package. Run the following to convert aa.R to a package, build, install and run it. Similarly for bb.R .
library(pkgKitten)
library(devtools)
setwd("...directory containing aa.R...")
kitten("aa", author = "me") # create empty package
file.copy("aa.R", "aa/R") # add script to it
setwd("aa")
build()
install()
setwd("..")
aa::hi()
## [1] "hi, aa"
Note
Generate the input in reproducible form.
cat('hi <- function() {
print("hi, aa")
}', file = "aa.R")
cat('hi <- function() {
print("hi, bb")
}', file = "bb.R")

Here is an approach based on environments:
my_Fun <- function(path_To_File)
{
source(path_To_File)
new_Env <- new.env()
new_Env$hi <- hi
return(new_Env)
}
env1 <- my_Fun("aa.R")
env2 <- my_Fun("bb.R")
> evalq(hi(), env1)
[1] "hi, aa"
> evalq(hi(), env2)
[1] "hi, bb"
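A variation on the environment idea, as a minimal sketch: the helper below passes the new environment straight to source() through its local argument, so hi is never defined in the global environment first. The helper name source_as and the temporary files are hypothetical and only there to make the sketch self-contained.

```r
# Hypothetical helper: source a file into its own environment and return it
source_as <- function(path) {
  env <- new.env()
  source(path, local = env)  # definitions land in env, not in the global env
  env
}

# Recreate the two example files in a temporary directory
tmp_dir <- tempfile("src_demo")
dir.create(tmp_dir)
aa_path <- file.path(tmp_dir, "aa.R")
bb_path <- file.path(tmp_dir, "bb.R")
cat('hi <- function() "hi, aa"\n', file = aa_path)
cat('hi <- function() "hi, bb"\n', file = bb_path)

aa <- source_as(aa_path)
bb <- source_as(bb_path)

# The environment name in front of $ shows which file the function came from
aa_result <- aa$hi()
bb_result <- bb$hi()
```

Calling aa$hi() and bb$hi() then reads much like the packagename::functionname() form the question asks for.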

Related

Nested List Parsing with jsonlite

This is the second time that I have faced this recently, so I wanted to reach out to see if there is a better way to parse data frames returned from jsonlite when one of the elements is an array stored as a list column in the data frame.
I know that this is part of the power of jsonlite, but I am not sure how to work with this nested structure. In the end, I suppose that I can write my own custom parsing, but given that I am almost there, I wanted to see how to work with this data.
For example:
## options
options(stringsAsFactors=F)
## packages
library(httr)
library(jsonlite)
## setup
gameid="2015020759"
SEASON = '20152016'
BASE = "http://live.nhl.com/GameData/"
URL = paste0(BASE, SEASON, "/", gameid, "/PlayByPlay.json")
## get the data
x <- GET(URL)
## parse
api_response <- content(x, as="text")
api_response <- jsonlite::fromJSON(api_response, flatten=TRUE)
## get the data of interest
pbp <- api_response$data$game$plays$play
colnames(pbp)
And exploring what comes back:
> class(pbp$aoi)
[1] "list"
> class(pbp$desc)
[1] "character"
> class(pbp$xcoord)
[1] "integer"
From above, the column pbp$aoi is a list. Here are a few entries:
> head(pbp$aoi)
[[1]]
[1] 8465009 8470638 8471695 8473419 8475792 8475902
[[2]]
[1] 8470626 8471276 8471695 8476525 8476792 8477956
[[3]]
[1] 8469619 8471695 8473492 8474625 8475727 8476525
[[4]]
[1] 8469619 8471695 8473492 8474625 8475727 8476525
[[5]]
[1] 8469619 8471695 8473492 8474625 8475727 8476525
[[6]]
[1] 8469619 8471695 8473492 8474625 8475727 8475902
I don't really care whether I parse these lists in the same dataframe, but what are my options for parsing out the data?
I would prefer to take the data out of the lists and parse them into a dataframe that can be "related" to the original record it came from.
Thanks in advance for your help.
From @hrbmstr above, I was able to get what I wanted using unnest (from tidyr, with dplyr loaded for select and %>%).
select(pbp, eventid, aoi) %>% unnest() %>% head
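For anyone who wants the same "long" shape without extra packages, here is a rough base R sketch. The pbp_toy data frame is a made-up stand-in for pbp with only two rows; the column names eventid and aoi follow the question.

```r
# Toy stand-in for pbp: one row per event, with a list column of player ids
pbp_toy <- data.frame(eventid = c(1L, 2L))
pbp_toy$aoi <- list(c(8465009L, 8470638L),
                    c(8470626L, 8471276L, 8471695L))

# Repeat each eventid once per element of its aoi vector, then flatten aoi
aoi_long <- data.frame(
  eventid = rep(pbp_toy$eventid, lengths(pbp_toy$aoi)),
  aoi     = unlist(pbp_toy$aoi)
)
```

Each row of aoi_long can be related back to the original record through eventid, for example with merge().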

Extracting hashtags from twitter - string in R error

I have Twitter data. Using library(stringr) I have extracted all the weblinks. However, when I try to do the same for hashtags I get an error. The same code worked some days ago. The following is the code:
library(stringr)
hash <- "#[a-zA-Z0-9]{1, }"
hashtag <- str_extract_all(travel$texts, hash)
The following is the error:
Error in stri_extract_all_regex(string, pattern, simplify = simplify, :
Error in {min,max} interval. (U_REGEX_BAD_INTERVAL)
I have re-installed the stringr package, but it doesn't help.
The code that I used for weblink is:
pat1 <- "http://t.co/[a-zA-Z0-9]{1,}"
twitlink <- str_extract_all(travel$texts, pat1)
The reproducible example is as follows:
rtt <- structure(data.frame(texts = c("Review Anthem of the Seas Anthems maiden voyage httptcoLPihj2sNEP #stevenewman", "#Job #Canada #Marlin Travel Agentagente de voyages Full Time in #St Catharines ON httptconMHNlDqv69", "Experience #Fiji amp #NewZealand like never before on a great 10night voyage 4033 pp departing Vancouver httptcolMvChSpaBT"), source = c("Twitter Web Client", "Catch a Job Canada", "Hootsuite"), tweet_time = c("2015-05-07 19:32:58", "2015-05-07 19:37:03", "2015-05-07 20:45:36")))
Your problem comes from the whitespace in the hash:
#Not working (note the whitespace after the comma)
str_extract_all(rtt$texts,"#[a-zA-Z0-9]{1, }")
#working
str_extract_all(rtt$texts,"#[a-zA-Z0-9]{1,}")
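As a side note, the quantifier {1,} means "one or more", which is the same as +, so the whitespace problem can be avoided entirely. A small sketch in base R, using two shortened sample strings that are assumptions for illustration:

```r
# Shortened, made-up sample tweets
texts <- c("Review Anthem of the Seas #stevenewman",
           "#Job #Canada #Marlin Travel Agent in #St Catharines")

# "+" is equivalent to "{1,}": one or more alphanumeric characters after "#"
hashtags <- regmatches(texts, gregexpr("#[a-zA-Z0-9]+", texts))
```

hashtags is a list with one character vector of matches per input string, the same shape that str_extract_all returns.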
You may want to consider using the qdapRegex package that I maintain for this task. It makes extracting URLs and hashtags easy. qdapRegex is a package that contains a bunch of canned regexes and uses the amazing stringi package as a backend to do the regex tasks.
rtt <- structure(data.frame(texts = c("Review Anthem of the Seas Anthems maiden voyage httptcoLPihj2sNEP #stevenewman", "#Job #Canada #Marlin Travel Agentagente de voyages Full Time in #St Catharines ON httptconMHNlDqv69", "Experience #Fiji amp #NewZealand like never before on a great 10night voyage 4033 pp departing Vancouver httptcolMvChSpaBT"), source = c("Twitter Web Client", "Catch a Job Canada", "Hootsuite"), tweet_time = c("2015-05-07 19:32:58", "2015-05-07 19:37:03", "2015-05-07 20:45:36")))
library(qdapRegex)
## first combine the built in url + twitter regexes into a function
rm_twitter_n_url <- rm_(pattern=pastex("@rm_twitter_url", "@rm_url"), extract=TRUE)
rm_twitter_n_url(rtt$texts)
rm_hash(rtt$texts, extract=TRUE)
Giving the following output:
## > rm_twitter_n_url(rtt$texts)
## [[1]]
## [1] "httptcoLPihj2sNEP"
##
## [[2]]
## [1] "httptconMHNlDqv69"
##
## [[3]]
## [1] "httptcolMvChSpaBT"
## > rm_hash(rtt$texts, extract=TRUE)
## [[1]]
## [1] "#stevenewman"
##
## [[2]]
## [1] "#Job" "#Canada" "#Marlin" "#St"
##
## [[3]]
## [1] "#Fiji" "#NewZealand"

Data loss during read.csv in R

I have a .csv file to be imported into R, which has more than 1K observations. However, when I used the read.csv function as usual, the imported file only has 21 observations. This is strange. I've never seen this before.
t <- read.csv("E:\\AH1_09182014.CSV",header=T, colClasses=c(rep("character",3),rep("numeric",22)),na.strings=c("null","NaN",""),stringsAsFactors=FALSE)
Can anyone help me figure out the problem? I am giving a link to my data file:
https://drive.google.com/file/d/0B86_a8ltyoL3TzBza0x1VTd2OTQ/edit?usp=sharing
You have some messy characters in your data--things like embedded control characters.
A workaround is to read the file in binary mode and then run read.csv on the lines that were read in.
This answer proposes a basic function to do those steps.
The function looks like this:
sReadLines <- function(fnam) {
f <- file(fnam, "rb")
res <- readLines(f)
close(f)
res
}
You can use it as follows:
temp <- read.csv(text = sReadLines("~/Downloads/AH1_09182014.CSV"),
stringsAsFactors = FALSE)
Have all lines been read in?
dim(temp)
# [1] 1449 25
Where is that problem line?
unlist(temp[21, ], use.names = FALSE)
# [1] "A-H Log 1" "09/18/2014" "0:19:00" "7.866" "255" "0.009"
# [7] "525" "7" "4468" "76" "4576.76" "20"
# [13] "71" "19" "77" "1222" "33857" "-3382"
# [19] "26\032)" "18.30" "84.80" "991.43" "23713.90" "0.85"
# [25] "10.54"
^^ see item [19] above.
Because of this, you won't be able to specify all of your column types up front--unless you clean the CSV first.
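One possible way to do that cleaning, as a sketch only: strip control characters from the lines before handing them to read.csv. The toy file below is an assumption, made up to mimic the embedded "\032" (Ctrl-Z) character; the real file also has a stray ")" in that field, which would need its own fix.

```r
# Made-up toy file with an embedded Ctrl-Z (octal 032) in one field
tmp <- tempfile(fileext = ".csv")
con <- file(tmp, "wb")
writeLines(c("a,b", "1,2\032", "3,4"), con)
close(con)

# Read in binary mode so the control character does not cut the read short
con <- file(tmp, "rb")
raw_lines <- readLines(con)
close(con)

# Drop control characters, then parse with the column types given up front
clean_lines <- gsub("[[:cntrl:]]", "", raw_lines)
cleaned <- read.csv(text = clean_lines, colClasses = c("numeric", "numeric"))
```

After the cleaning step the colClasses specification goes through, because the affected field is a plain number again.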

parent.env( x ) confusion

I've read the documentation for parent.env() and it seems fairly straightforward - it returns the enclosing environment. However, if I use parent.env() to walk the chain of enclosing environments, I see something that I cannot explain. First, the code (taken from "R in a nutshell")
library( PerformanceAnalytics )
x = environment(chart.RelativePerformance)
while (environmentName(x) != environmentName(emptyenv()))
{
print(environmentName(parent.env(x)))
x <- parent.env(x)
}
And the results:
[1] "imports:PerformanceAnalytics"
[1] "base"
[1] "R_GlobalEnv"
[1] "package:PerformanceAnalytics"
[1] "package:xts"
[1] "package:zoo"
[1] "tools:rstudio"
[1] "package:stats"
[1] "package:graphics"
[1] "package:utils"
[1] "package:datasets"
[1] "package:grDevices"
[1] "package:roxygen2"
[1] "package:digest"
[1] "package:methods"
[1] "Autoloads"
[1] "base"
[1] "R_EmptyEnv"
How can we explain the "base" at the top and the "base" at the bottom? Also, how can we explain "package:PerformanceAnalytics" and "imports:PerformanceAnalytics"? Everything would seem consistent without the first two lines. That is, function chart.RelativePerformance is in the package:PerformanceAnalytics environment which is created by xts, which is created by zoo, ... all the way up (or down) to base and the empty environment.
Also, the documentation is not super clear on this - is the "enclosing environment" the environment in which another environment is created and thus walking parent.env() shows a "creation" chain?
Edit
Shameless plug: I wrote a blog post that explains environments, parent.env(), enclosures, namespace/package, etc. with intuitive diagrams.
1) Regarding how base could be there twice (given that environments form a tree), it's the fault of the environmentName function. Actually the first occurrence is .BaseNamespaceEnv and the latter occurrence is baseenv().
> identical(baseenv(), .BaseNamespaceEnv)
[1] FALSE
2) Regarding the imports:PerformanceAnalytics that is a special environment that R sets up to hold the imports mentioned in the package's NAMESPACE or DESCRIPTION file so that objects in it are encountered before anything else.
Try running this for some clarity. The str(p) and following if statements will give a better idea of what p is:
library( PerformanceAnalytics )
x <- environment(chart.RelativePerformance)
str(x)
while (environmentName(x) != environmentName(emptyenv())) {
p <- parent.env(x)
cat("------------------------------\n")
str(p)
if (identical(p, .BaseNamespaceEnv)) cat("Same as .BaseNamespaceEnv\n")
if (identical(p, baseenv())) cat("Same as baseenv()\n")
x <- p
}
The first few items in your results give evidence of the rules R uses to search for variables used in functions in packages with namespaces. From the R-ext manual:
The namespace controls the search strategy for variables used by functions in the package.
If not found locally, R searches the package namespace first, then the imports, then the base
namespace and then the normal search path.
Elaborating just a bit, have a look at the first few lines of chart.RelativePerformance:
head(body(chart.RelativePerformance), 5)
# {
# Ra = checkData(Ra)
# Rb = checkData(Rb)
# columns.a = ncol(Ra)
# columns.b = ncol(Rb)
# }
When a call to chart.RelativePerformance is being evaluated, each of those symbols --- whether the checkData on line 1, or the ncol on line 3 --- needs to be found somewhere on the search path. Here are the first few enclosing environments checked:
First off is namespace:PerformanceAnalytics. checkData is found there, but ncol is not.
Next stop (and the first location listed in your results) is imports:PerformanceAnalytics. This is the list of functions specified as imports in the package's NAMESPACE file. ncol is not found here either.
The base environment namespace (where ncol will be found) is the last stop before proceeding to the normal search path. Almost any R function will use some base functions, so this stop ensures that none of that functionality can be broken by objects in the global environment or in other packages. (R's designers could have left it to package authors to explicitly import the base environment in their NAMESPACE files, but adding this default pass through base does seem like the better design decision.)
The base on the second line of the output is .BaseNamespaceEnv, while the second-to-last entry, also printed as base, is baseenv(). These are not very different; they differ mainly in their parents. The parent of .BaseNamespaceEnv is .GlobalEnv, while that of baseenv() is emptyenv().
In a package, as @Josh says, R searches the namespace of the package, then the imports, and then the base namespace (i.e., .BaseNamespaceEnv).
You can see this with, e.g.:
> library(zoo)
> packageDescription("zoo")
Package: zoo
# ... snip ...
Imports: stats, utils, graphics, grDevices, lattice (>= 0.18-1)
# ... snip ...
> x <- environment(zoo)
> x
<environment: namespace:zoo>
> ls(x) # objects in zoo
[1] "-.yearmon" "-.yearqtr" "[.yearmon"
[4] "[.yearqtr" "[.zoo" "[<-.zoo"
# ... snip ...
> y <- parent.env(x)
> y # namespace of imported packages
<environment: 0x116e37468>
attr(,"name")
[1] "imports:zoo"
> ls(y) # objects in the imported packages
[1] "?" "abline"
[3] "acf" "acf2AR"
# ... snip ...
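The parent relationships described in these answers can be checked directly. This is a small sketch that uses the stats package (shipped with R) instead of PerformanceAnalytics, so it runs without extra installs; the variable names are only for illustration.

```r
base_ns  <- .BaseNamespaceEnv  # the base namespace
base_pkg <- baseenv()          # the base package environment

# Both print as "base", which is why the name shows up twice in the walk
same_name <- identical(environmentName(base_ns), environmentName(base_pkg))

# ...but they sit at different places in the chain
ns_parent_is_global <- identical(parent.env(base_ns), globalenv())
pkg_parent_is_empty <- identical(parent.env(base_pkg), emptyenv())

# First three stops for a function in a package with a namespace:
# package namespace, then its imports, then the base namespace
ns <- environment(stats::sd)
chain <- c(environmentName(ns),
           environmentName(parent.env(ns)),
           environmentName(parent.env(parent.env(ns))))
```

From the base namespace the walk continues to the global environment and then along the normal search path, ending at baseenv() and the empty environment.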

Extracting synonymous terms from wordnet using synonym()

Suppose I pull the synonyms of "help" with the synonyms() function from wordnet and get the following:
Str = synonyms("help")
Str
[1] "c(\"aid\", \"assist\", \"assistance\", \"help\")"
[2] "c(\"aid\", \"assistance\", \"help\")"
[3] "c(\"assistant\", \"helper\", \"help\", \"supporter\")"
[4] "c(\"avail\", \"help\", \"service\")"
Then I can get a single character vector using
unique(unlist(lapply(parse(text=Str),eval)))
at the end that looks like this:
[1] "aid" "assist" "assistance" "help" "assistant" "helper" "supporter"
[8] "avail" "service"
The above process was suggested by Gabor Grothendieck. The solution works, but I can't figure out why changing the query term to "company", "boy", or some other word produces an error message.
One possible reason is that the sixth synonym of "company" (please see below) is a single term and does not follow the format "c(\"company\")".
synonyms("company")
[1] "c(\"caller\", \"company\")"
[2] "c(\"company\", \"companionship\", \"fellowship\", \"society\")"
[3] "c(\"company\", \"troupe\")"
[4] "c(\"party\", \"company\")"
[5] "c(\"ship's company\", \"company\")"
[6] "company"
Could someone kindly help me solve this problem?
Many thanks.
You can solve this by creating a little helper function that uses R's try mechanism to catch errors. In this case, if the eval produces an error, then return the original string, else return the result of eval:
Create a helper function:
evalOrValue <- function(expr, ...){
z <- try(eval(expr, ...), TRUE)
if(inherits(z, "try-error")) as.character(expr) else unlist(z)
}
unique(unlist(sapply(parse(text=Str), evalOrValue)))
Produces:
[1] "caller" "company" "companionship"
[4] "fellowship" "society" "troupe"
[7] "party" "ship's company"
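If you would rather not eval strings at all, a different technique is to pull the quoted terms out with a regular expression and fall back to the raw string when there are no quotes. This is only a sketch; the helper name extract_terms and the shortened input are assumptions.

```r
# Shortened version of the input, including the bare "company" entry
Str <- c("c(\"caller\", \"company\")",
         "c(\"ship's company\", \"company\")",
         "company")

# Hypothetical helper: return the double-quoted pieces of s,
# or s itself when it contains no quoted pieces
extract_terms <- function(s) {
  m <- regmatches(s, gregexpr("\"[^\"]*\"", s))[[1]]
  if (length(m) == 0) s else gsub("\"", "", m, fixed = TRUE)
}

terms <- unique(unlist(lapply(Str, extract_terms)))
```

This avoids evaluating text as code, at the cost of assuming the terms themselves never contain a double quote.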
I reproduced your data and then used dput to reproduce it here:
Str <- c("c(\"caller\", \"company\")", "c(\"company\", \"companionship\", \"fellowship\", \"society\")",
"c(\"company\", \"troupe\")", "c(\"party\", \"company\")", "c(\"ship's company\", \"company\")",
"company")
Those synonyms are in a form that looks like an expression, so you should be able to parse them as you illustrated. BUT: When I execute your original code above I get an error from the synonyms call because you included no part-of-speech argument.
> synonyms("help")
Error in charmatch(x, WN_synset_types) :
argument "pos" is missing, with no default
Observe that the code of synonyms uses getSynonyms, and that its code has a unique wrapped around it, so all of the pre-processing you are doing is no longer needed (if you update):
> synonyms("company", "NOUN")
[1] "caller" "companionship" "company"
[4] "fellowship" "party" "ship's company"
[7] "society" "troupe"
> synonyms
function (word, pos)
{
filter <- getTermFilter("ExactMatchFilter", word, TRUE)
terms <- getIndexTerms(pos, 1L, filter)
if (is.null(terms))
character()
else getSynonyms(terms[[1L]])
}
<environment: namespace:wordnet>
> getSynonyms
function (indexterm)
{
synsets <- .jcall(indexterm, "[Lcom/nexagis/jawbone/Synset;",
"getSynsets")
sort(unique(unlist(lapply(synsets, getWord))))
}
<environment: namespace:wordnet>
