R gotcha: `.packages()` vs `(.packages())`

I am trying to wrap my head around this:
> .packages()
> (.packages())
[1] "stats" "graphics" "grDevices" "utils" "datasets" "methods" "base"
How is it possible that the first command outputs nothing and the second one works? I guess this is yet another syntax gotcha of R.

From the help page for .packages
‘.packages()’ returns the names of the currently attached packages
_invisibly_ whereas ‘.packages(all.available = TRUE)’ gives
(visibly) _all_ packages available in the library location path
‘lib.loc’.
Read the help page on invisible for more details, but basically, if something is returned invisibly, it won't automatically print. The value is still there, so you can store it in an object; it just isn't displayed by default. Here are a few other examples:
> 3
[1] 3
> invisible(3)
> x <- invisible(3)
> x
[1] 3
We see that when wrapped in invisible, the 3 doesn't automatically print, but we can still store it in an object.
Edit: Note that invisible only suppresses printing when the result would otherwise be autoprinted by the interpreter. We can force it to print with print, or with pretty much any other function call that operates on the value, because the invisibility applies only to the value as invisible returns it. ( is itself such a function, which is why wrapping the command in parentheses prints the result.
> invisible(3) + 0
[1] 3
> I(invisible(3))
[1] 3
> (invisible(3))
[1] 3
> print(invisible(3))
[1] 3
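To check visibility directly instead of relying on what the console shows, base R's withVisible() returns both the value and a visibility flag. A minimal sketch; the function f is a hypothetical stand-in for any function that ends in invisible(), as .packages() does:

```r
# Hypothetical function that returns its result invisibly, like .packages()
f <- function() invisible(c("a", "b"))

f()                          # autoprint is suppressed: nothing is shown
(f())                        # `(` makes the value visible again

res <- withVisible(f())      # list(value = ..., visible = ...)
res$visible                  # FALSE
res$value                    # "a" "b"

withVisible((f()))$visible         # TRUE
withVisible(.packages())$visible   # FALSE, consistent with ?.packages
```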

Related

Trouble With Outputting all Elements to Console

I'm trying to use sink() to capture my console output to a text file. However, my print statements keep getting truncated, even though I have set max.print to the maximum integer in R.
I have consulted various other Stack Overflow answers, but to no avail. Has anyone solved this issue?
This is sample output, even after changing max.print:
options(max.print = .Machine$integer.max)
> print(outputFile[1])
[[1]]
+ 1681/519133 vertices, named, from 71aeda5:
[1] p_8945206-t_25 p_24353782-t_0 p_5096967-t_0
[4] p_12728438-t_2 p_1914103-t_8 p_7949965-t_59
[7] p_5171435-t_4 p_6628106-t_7 p_2535537-t_0
[10] p_45026190-t_2 p_25504870-t_8 p_796238-t_1
[13] p_135998-t_13 p_20853906-t_1 p_17154085-t_0
[16] p_29505258-t_4 p_27269129-t_13 p_6793896-t_92
[19] p_5331193-t_1 p_11521441-t_2 p_34271996-t_2
[22] p_95594-t_0 p_16395989-t_0 p_582576-t_3
[25] p_9368888-t_1 p_697462-t_28 p_80124-t_72
[28] p_7595644-t_0 p_14372110-t_4 p_2083314-t_2
+ ... omitted several vertices
Additionally, I have tried indexing but it hasn't worked.
igraph-specific options like auto.print.lines should still affect the printing of your graph objects, even when they're contained in a list. Using a combination of auto.print.lines and max.print, I'm able to get the graphs to print in full:
library(purrr)
library(igraph)
# Using purrr to create a list of multiple large graphs
# (sample_gnp() is the current name for the deprecated random.graph.game())
gs = map(1:5, ~ sample_gnp(200, 0.1))
options(max.print = .Machine$integer.max)
igraph_options(auto.print.lines = Inf)
print(gs)
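Since the original goal was to get the full text into a file with sink(), one way to combine that with the option settings is to wrap them in a function that restores everything on exit. This is a minimal base-R sketch: print_to_file is a hypothetical helper, and for igraph objects you would still set igraph_options(auto.print.lines = Inf) as above, because max.print does not control igraph's own truncation:

```r
# Hypothetical helper: write the full printed form of `obj` to a text file.
print_to_file <- function(obj, path) {
  # Lift base R's print limit for this call only; restore it on exit
  old <- options(max.print = .Machine$integer.max)
  on.exit(options(old), add = TRUE)

  sink(path)                     # divert console output to the file
  on.exit(sink(), add = TRUE)    # always close the sink, even on error
  print(obj)
  invisible(path)
}

out <- print_to_file(1:1000, tempfile(fileext = ".txt"))
length(readLines(out))
```

Base R's capture.output(print(obj), file = path) is an alternative that manages the diversion for you.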

Why does sapply over an ordered list output my content twice?

I stored the names of my CSV files using this code:
filesList <- list.files(path = "/Users/myPath/data/", pattern = "\\.csv$")  # pattern is a regular expression, not a glob
I then wanted to output it without the index markers (the [1] that usually appears at the start of each line), so I tried this:
sapply(filesList[order(filesList)], print)
The result below is copied exactly from RStudio. Why is my list of files output twice? I can work with this; I am just curious.
[1] "IMDB_Bottom250movies.csv"
[1] "IMDB_Bottom250movies2_OMDB_Detailed.csv"
[1] "IMDB_Bottom250movies2.csv"
[1] "IMDB_ErrorLogIDs1_OMDB_Detailed.csv"
[1] "IMDB_ErrorLogIDs1.csv"
[1] "IMDB_ErrorLogIDs2_OMDB_Detailed.csv"
[1] "IMDB_ErrorLogIDs2.csv"
[1] "IMDB_OMDB_Kaggle_TestSet_OMDB_Detailed.csv"
[1] "IMDB_OMDB_Kaggle_TestSet.csv"
[1] "IMDB_Top250Engmovies.csv"
[1] "IMDB_Top250Engmovies2_OMDB_Detailed.csv"
[1] "IMDB_Top250Engmovies2.csv"
[1] "IMDB_Top250Indianmovies.csv"
[1] "IMDB_Top250Indianmovies2_OMDB_Detailed.csv"
[1] "IMDB_Top250Indianmovies2.csv"
[1] "IMDB_Top250movies.csv"
[1] "IMDB_Top250movies2_OMDB_Detailed.csv"
[1] "IMDB_Top250movies2.csv"
[1] "TestDoc2_KaggleData_OMDB_Detailed.csv"
[1] "TestDoc2_KaggleData.csv"
[1] "TestDoc2_KaggleData68_OMDB_Detailed.csv"
[1] "TestDoc2_KaggleData68.csv"
[1] "TestDoc2_KaggleDataHUGE_OMDB_Detailed.csv"
[1] "TestDoc2_KaggleDataHUGE.csv"
IMDB_Bottom250movies.csv IMDB_Bottom250movies2_OMDB_Detailed.csv
"IMDB_Bottom250movies.csv" "IMDB_Bottom250movies2_OMDB_Detailed.csv"
IMDB_Bottom250movies2.csv IMDB_ErrorLogIDs1_OMDB_Detailed.csv
"IMDB_Bottom250movies2.csv" "IMDB_ErrorLogIDs1_OMDB_Detailed.csv"
IMDB_ErrorLogIDs1.csv IMDB_ErrorLogIDs2_OMDB_Detailed.csv
"IMDB_ErrorLogIDs1.csv" "IMDB_ErrorLogIDs2_OMDB_Detailed.csv"
IMDB_ErrorLogIDs2.csv IMDB_OMDB_Kaggle_TestSet_OMDB_Detailed.csv
"IMDB_ErrorLogIDs2.csv" "IMDB_OMDB_Kaggle_TestSet_OMDB_Detailed.csv"
IMDB_OMDB_Kaggle_TestSet.csv IMDB_Top250Engmovies.csv
"IMDB_OMDB_Kaggle_TestSet.csv" "IMDB_Top250Engmovies.csv"
IMDB_Top250Engmovies2_OMDB_Detailed.csv IMDB_Top250Engmovies2.csv
"IMDB_Top250Engmovies2_OMDB_Detailed.csv" "IMDB_Top250Engmovies2.csv"
IMDB_Top250Indianmovies.csv IMDB_Top250Indianmovies2_OMDB_Detailed.csv
"IMDB_Top250Indianmovies.csv" "IMDB_Top250Indianmovies2_OMDB_Detailed.csv"
IMDB_Top250Indianmovies2.csv IMDB_Top250movies.csv
"IMDB_Top250Indianmovies2.csv" "IMDB_Top250movies.csv"
IMDB_Top250movies2_OMDB_Detailed.csv IMDB_Top250movies2.csv
"IMDB_Top250movies2_OMDB_Detailed.csv" "IMDB_Top250movies2.csv"
TestDoc2_KaggleData_OMDB_Detailed.csv TestDoc2_KaggleData.csv
"TestDoc2_KaggleData_OMDB_Detailed.csv" "TestDoc2_KaggleData.csv"
TestDoc2_KaggleData68_OMDB_Detailed.csv TestDoc2_KaggleData68.csv
"TestDoc2_KaggleData68_OMDB_Detailed.csv" "TestDoc2_KaggleData68.csv"
TestDoc2_KaggleDataHUGE_OMDB_Detailed.csv TestDoc2_KaggleDataHUGE.csv
"TestDoc2_KaggleDataHUGE_OMDB_Detailed.csv" "TestDoc2_KaggleDataHUGE.csv"
The second copy (without the indexes) is close enough to copy, paste and use; I'm just wondering why this happened.
What is happening here is that sapply calls print on each element of filesList[order(filesList)], which prints each name to the screen. print also returns its argument (invisibly), so sapply collects those return values into a named character vector, and the console then autoprints that vector as the result of the sapply call. That is the second copy. You can use cat to print values without the [1], or wrap sapply in invisible to suppress the second printout. https://stackoverflow.com/a/12985020/6490232
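A small sketch of both behaviours and of the two suggested fixes; the vector files is a made-up stand-in for the list.files() result, and sort(files) gives the same result as files[order(files)]:

```r
files <- c("b.csv", "a.csv", "c.csv")   # stand-in for the list.files() output

# print() shows each name AND returns it, so sapply() builds a named
# character vector from the return values; at top level that vector is
# then autoprinted too, which produces the second copy
res <- sapply(sort(files), print)

# Fix 1: keep the per-element printing, hide sapply's own result
invisible(sapply(sort(files), print))

# Fix 2: no [1] prefix at all, one name per line
cat(sort(files), sep = "\n")
```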

Nested List Parsing with jsonlite

This is the second time I have run into this recently, so I wanted to reach out to see if there is a better way to parse data frames returned from jsonlite when one of the elements is an array stored in the data frame as a list column.
I know this is part of jsonlite's power, but I am not sure how to work with this nested structure. In the end I suppose I could write my own custom parsing, but given that I am almost there, I wanted to see how to work with this data.
For example:
## options
options(stringsAsFactors = FALSE)  # already the default since R 4.0.0
## packages
library(httr)
library(jsonlite)
## setup
gameid="2015020759"
SEASON = '20152016'
BASE = "http://live.nhl.com/GameData/"
URL = paste0(BASE, SEASON, "/", gameid, "/PlayByPlay.json")
## get the data
x <- GET(URL)
## parse
api_response <- content(x, as="text")
api_response <- jsonlite::fromJSON(api_response, flatten=TRUE)
## get the data of interest
pbp <- api_response$data$game$plays$play
colnames(pbp)
And exploring what comes back:
> class(pbp$aoi)
[1] "list"
> class(pbp$desc)
[1] "character"
> class(pbp$xcoord)
[1] "integer"
From above, the column pbp$aoi is a list. Here are a few entries:
> head(pbp$aoi)
[[1]]
[1] 8465009 8470638 8471695 8473419 8475792 8475902
[[2]]
[1] 8470626 8471276 8471695 8476525 8476792 8477956
[[3]]
[1] 8469619 8471695 8473492 8474625 8475727 8476525
[[4]]
[1] 8469619 8471695 8473492 8474625 8475727 8476525
[[5]]
[1] 8469619 8471695 8473492 8474625 8475727 8476525
[[6]]
[1] 8469619 8471695 8473492 8474625 8475727 8475902
I don't mind whether these lists end up in the same data frame; what options do I have for parsing out the data?
I would prefer to take the data out of the lists and put it into a data frame that can be "related" back to the original record it came from.
Thanks in advance for your help.
Following @hrbmstr's suggestion above, I was able to get what I wanted using unnest.
library(dplyr); library(tidyr)  # select() and %>% from dplyr, unnest() from tidyr
select(pbp, eventid, aoi) %>% unnest(cols = aoi) %>% head()
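For comparison, the same long shape can be produced in base R by repeating each eventid once per element of its aoi vector, which keeps every value "related" to the record it came from. A minimal sketch on a made-up two-row stand-in for pbp (the data frame plays and its values are illustrative only):

```r
# Toy stand-in for pbp: 'aoi' is a list column, as jsonlite produces
plays <- data.frame(eventid = c(1L, 2L))
plays$aoi <- list(c(8465009L, 8470638L), c(8470626L, 8471276L, 8471695L))

# One row per aoi value, tagged with the eventid it came from
long <- data.frame(
  eventid = rep(plays$eventid, lengths(plays$aoi)),
  aoi     = unlist(plays$aoi)
)
long
```

Like unnest() with its defaults, this drops records whose list entry is empty.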

Why do I get "Error in rbind.zoo(...) : indexes overlap" when merging two zoo objects?

I have two seemingly identical zoo objects, created by the same commands from CSV files covering different time periods. When I try to combine them into one long zoo object, it fails with an "indexes overlap" error. (merge, c and rbind all produce variants of the same error text.) As far as I can see, there are no duplicates and the time periods do not overlap. What am I doing wrong? I'm using R version 3.0.1 on Windows 7 64-bit, if that makes a difference.
> colnames(z2)
[1] "Amb" "HWS" "Diff"
> colnames(t.tmp)
[1] "Amb" "HWS" "Diff"
> max(index(z2))
[1] "2012-12-06 02:17:45 GMT"
> min(index(t.tmp))
[1] "2012-12-06 03:43:45 GMT"
> anyDuplicated(c(index(z2),index(t.tmp)))
[1] 0
> c(z2,t.tmp)
Error in rbind.zoo(...) : indexes overlap
>
UPDATE: While trying to build a reproducible example, I concluded that this is an implementation error caused by the large number of rows I'm dealing with: it fails whenever the final result is more than 311434 rows long.
> nrow(c(z2,head(t.tmp,n=101958)))
Error in rbind.zoo(...) : indexes overlap
> nrow(c(z2,head(t.tmp,n=101957)))
[1] 311434
# but row 101958 inserts fine on its own, so it's not a data problem.
> nrow(c(z2,tail(head(t.tmp,n=101958),n=2)))
[1] 209479
I'm sorry, but I don't have the R scripting skills to produce a zoo of the critical length; hopefully someone can help me out.
UPDATE 2 (responding to Jason's suggestion): The problem is in MATCH, but my R skills aren't sufficient to interpret it. Does it mean MATCH finds a duplicate value in x.t whereas anyDuplicated does not?
> x.t <- c(index(z2),index(t.tmp));
> length(x.t)
[1] 520713
> ix <- ORDER (x.t)
> length(ix)
[1] 520713
> x.t <- x.t[ix]
> length(ix)
[1] 520713
> length(x.t)
[1] 520713
> tx <- table(MATCH(x.t,x.t))
> max(tx)
[1] 2
> tx[which(tx==2)]
311371 311373 311378 311383 311384 311386 311389 311392 311400 311401
2 2 2 2 2 2 2 2 2 2
> anyDuplicated(x.t)
[1] 0
After all the testing and head-scratching, it seems the problem I'm having is time-zone related. Setting the session time zone to the same zone as the original data makes it work just fine:
Sys.setenv(TZ="GMT")
> z3<-rbind(z2,t.tmp)
> nrow(z3)
[1] 520713
Thanks to the question "How to guard against accidental time zone conversion" for the inspiration to look in that direction.
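The thread does not pin down where inside zoo the times were compared. One plausible mechanism, offered only as an assumption, is a comparison of local wall-clock text: in a zone that observes daylight saving time, two distinct instants in the repeated hour at the end of DST format to the same string, so they look duplicated even though anyDuplicated() on the numeric times reports none. GMT has no DST, which would be consistent with Sys.setenv(TZ = "GMT") fixing it. The zone Europe/London and the date below are illustrative choices, not taken from the question's data:

```r
# Two instants one hour apart, stored as distinct UTC times
t <- as.POSIXct(c("2012-10-28 00:30:00", "2012-10-28 01:30:00"), tz = "UTC")
anyDuplicated(t)        # 0: the underlying numbers differ

# UK clocks went back at 01:00 UTC that night, so both instants fall
# in the repeated 01:00-02:00 local hour and render identically
loc <- format(t, format = "%Y-%m-%d %H:%M", tz = "Europe/London")
loc
anyDuplicated(loc)      # 2: duplicates once converted to local text
```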

parent.env( x ) confusion

I've read the documentation for parent.env() and it seems fairly straightforward: it returns the enclosing environment. However, if I use parent.env() to walk the chain of enclosing environments, I see something that I cannot explain. First, the code (taken from "R in a Nutshell"):
library( PerformanceAnalytics )
x = environment(chart.RelativePerformance)
while (environmentName(x) != environmentName(emptyenv()))
{
print(environmentName(parent.env(x)))
x <- parent.env(x)
}
And the results:
[1] "imports:PerformanceAnalytics"
[1] "base"
[1] "R_GlobalEnv"
[1] "package:PerformanceAnalytics"
[1] "package:xts"
[1] "package:zoo"
[1] "tools:rstudio"
[1] "package:stats"
[1] "package:graphics"
[1] "package:utils"
[1] "package:datasets"
[1] "package:grDevices"
[1] "package:roxygen2"
[1] "package:digest"
[1] "package:methods"
[1] "Autoloads"
[1] "base"
[1] "R_EmptyEnv"
How can we explain the "base" at the top and the "base" at the bottom? Also, how can we explain "package:PerformanceAnalytics" and "imports:PerformanceAnalytics"? Everything would seem consistent without the first two lines. That is, function chart.RelativePerformance is in the package:PerformanceAnalytics environment which is created by xts, which is created by zoo, ... all the way up (or down) to base and the empty environment.
Also, the documentation is not very clear on this point: is the "enclosing environment" the environment in which another environment was created, so that walking parent.env() shows a "creation" chain?
Edit
Shameless plug: I wrote a blog post that explains environments, parent.env(), enclosures, namespace/package, etc. with intuitive diagrams.
1) Regarding how base could appear twice (given that environments form a tree), it's the fault of the environmentName function, which returns "base" for two different environments. The first occurrence is actually .BaseNamespaceEnv and the latter is baseenv().
> identical(baseenv(), .BaseNamespaceEnv)
[1] FALSE
2) Regarding imports:PerformanceAnalytics: that is a special environment R sets up to hold the objects the package imports through its NAMESPACE file, so that they are found right after the package's own namespace and before base and the normal search path.
Try running this for some clarity. The str(p) and following if statements will give a better idea of what p is:
library( PerformanceAnalytics )
x <- environment(chart.RelativePerformance)
str(x)
while (environmentName(x) != environmentName(emptyenv())) {
p <- parent.env(x)
cat("------------------------------\n")
str(p)
if (identical(p, .BaseNamespaceEnv)) cat("Same as .BaseNamespaceEnv\n")
if (identical(p, baseenv())) cat("Same as baseenv()\n")
x <- p
}
The first few items in your results reflect the rules R uses to search for variables used by functions in packages with namespaces. From the Writing R Extensions manual:
The namespace controls the search strategy for variables used by functions in the package. If not found locally, R searches the package namespace first, then the imports, then the base namespace and then the normal search path.
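That search order can be observed directly by walking parent.env() until the empty environment and labelling the two environments that environmentName() both reports as "base". A minimal sketch; env_chain is a hypothetical helper, and stats::sd is used only because the stats namespace is always available:

```r
# Hypothetical helper: names along a parent.env() chain, up to emptyenv()
env_chain <- function(env) {
  out <- character()
  while (!identical(env, emptyenv())) {
    nm <- environmentName(env)
    # environmentName() calls both of these "base"; tell them apart
    if (identical(env, .BaseNamespaceEnv)) nm <- "base (namespace)"
    if (identical(env, baseenv()))         nm <- "base (package env)"
    out <- c(out, nm)
    env <- parent.env(env)
  }
  out
}

# namespace, then imports, then the base namespace, then the search path
head(env_chain(environment(stats::sd)), 4)
```

Comparing with identical() rather than by name avoids the ambiguity that produced the two "base" lines in the question.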
Elaborating just a bit, have a look at the first few lines of chart.RelativePerformance:
head(body(chart.RelativePerformance), 5)
# {
# Ra = checkData(Ra)
# Rb = checkData(Rb)
# columns.a = ncol(Ra)
# columns.b = ncol(Rb)
# }
When a call to chart.RelativePerformance is evaluated, each of those symbols (whether checkData on line 1 or ncol on line 3) needs to be found somewhere along the chain of enclosing environments. Here are the first few enclosing environments checked:
First off is namespace:PerformanceAnalytics. checkData is found there, but ncol is not.
Next stop (and the first location listed in your results) is imports:PerformanceAnalytics. This is the list of functions specified as imports in the package's NAMESPACE file. ncol is not found here either.
The base namespace (where ncol will be found) is the last stop before proceeding to the normal search path. Almost any R function will use some base functions, so this stop ensures that none of that functionality can be broken by objects in the global environment or in other packages. (R's designers could have left it to package authors to explicitly import base in their NAMESPACE files, but adding this default pass through base does seem like the better design decision.)
The "base" on the second line of your output is .BaseNamespaceEnv, while the second-to-last "base" is baseenv(). These are different environments, which shows, for example, in their parents: the parent of .BaseNamespaceEnv is .GlobalEnv, while that of baseenv() is emptyenv().
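These claims can be checked in any fresh session with a few identical() calls; a short sketch using only base R:

```r
# environmentName() reports "base" for both environments...
environmentName(baseenv())            # "base"
environmentName(.BaseNamespaceEnv)    # "base"

# ...but they are distinct objects with different parents
identical(baseenv(), .BaseNamespaceEnv)                  # FALSE
identical(parent.env(.BaseNamespaceEnv), globalenv())    # TRUE
identical(parent.env(baseenv()), emptyenv())             # TRUE
```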
In a package, as @Josh says, R searches the namespace of the package, then the imports, and then base (i.e., .BaseNamespaceEnv).
You can see this with, e.g.:
> library(zoo)
> packageDescription("zoo")
Package: zoo
# ... snip ...
Imports: stats, utils, graphics, grDevices, lattice (>= 0.18-1)
# ... snip ...
> x <- environment(zoo)
> x
<environment: namespace:zoo>
> ls(x) # objects in zoo
[1] "-.yearmon" "-.yearqtr" "[.yearmon"
[4] "[.yearqtr" "[.zoo" "[<-.zoo"
# ... snip ...
> y <- parent.env(x)
> y # namespace of imported packages
<environment: 0x116e37468>
attr(,"name")
[1] "imports:zoo"
> ls(y) # objects in the imported packages
[1] "?" "abline"
[3] "acf" "acf2AR"
# ... snip ...

Resources