I am currently reviewing this question on SO and see that the OP stated that by adding more for loops you can expand the polynomials. How exactly would you do so? I am trying to expand to polyorder 5.
Polynomial feature expansion in R
Here is the code below:
polyexp = function(df){
df.polyexp = df
colnames = colnames(df)
for (i in 1:ncol(df)){
for (j in i:ncol(df)){
colnames = c(colnames, paste0(names(df)[i],'.',names(df)[j]))
df.polyexp = cbind(df.polyexp, df[,i]*df[,j])
}
}
names(df.polyexp) = colnames
return(df.polyexp)
}
Ultimately, I'd like to order the matrix so that it expands in order of degree. I tried using the poly function but I'm not sure if you can order the result so that it returns a matrix that starts with degree 1 then moves to degree 2, then 3, 4, and 5.
To "sort by degree" is a little ambiguous. x^2 and x*y both have degree 2. I'll assume you want to sort by total degree, and then within each of those, by degree of the 1st column; within that, by degree of the second column, etc. (I believe the default is to ignore total degree and sort by degree of the last column, within that the second last, and so on, but this is not documented so I won't count on it.)
Here's how to use polym to do this. The columns are named things like "2.0" or "1.1". You could sort these alphabetically and it would be fine up to degree 9, but if you convert those names using as.numeric_version, there's no limit. So convert the column names to version names, get the sort order, and use that plus degree to re-order the columns of the result. For example,
df <- data.frame(x = 1:6, y = 0:5, z = -(1:6))
expanded <- polym(as.matrix(df), degree = 5)
o <- order(attr(expanded, "degree"),
as.numeric_version(colnames(expanded)))
sorted <- expanded[,o]
# That lost the attributes, so put them back
attr(sorted, "degree") <- attr(expanded, "degree")[o]
attr(sorted, "coefs") <- attr(expanded, "coefs")
class(sorted) <- class(expanded)
# If you call predict(), it comes out in the default order,
# so will need sorting too:
predict(sorted, newdata = as.matrix(df[1,]))[, o]
#> 0.0.1 0.1.0 1.0.0 0.0.2 0.1.1 0.2.0
#> 0.59761430 -0.59761430 -0.59761430 0.54554473 -0.35714286 0.54554473
#> 1.0.1 1.1.0 2.0.0 0.0.3 0.1.2 0.2.1
#> -0.35714286 0.35714286 0.54554473 0.37267800 -0.32602533 0.32602533
#> 0.3.0 1.0.2 1.1.1 1.2.0 2.0.1 2.1.0
#> -0.37267800 -0.32602533 0.21343368 -0.32602533 0.32602533 -0.32602533
#> 3.0.0 0.0.4 0.1.3 0.2.2 0.3.1 0.4.0
#> -0.37267800 0.18898224 -0.22271770 0.29761905 -0.22271770 0.18898224
#> 1.0.3 1.1.2 1.2.1 1.3.0 2.0.2 2.1.1
#> -0.22271770 0.19483740 -0.19483740 0.22271770 0.29761905 -0.19483740
#> 2.2.0 3.0.1 3.1.0 4.0.0 0.0.5 0.1.4
#> 0.29761905 -0.22271770 0.22271770 0.18898224 0.06299408 -0.11293849
#> 0.2.3 0.3.2 0.4.1 0.5.0 1.0.4 1.1.3
#> 0.20331252 -0.20331252 0.11293849 -0.06299408 -0.11293849 0.13309928
#> 1.2.2 1.3.1 1.4.0 2.0.3 2.1.2 2.2.1
#> -0.17786140 0.13309928 -0.11293849 0.20331252 -0.17786140 0.17786140
#> 2.3.0 3.0.2 3.1.1 3.2.0 4.0.1 4.1.0
#> -0.20331252 -0.20331252 0.13309928 -0.20331252 0.11293849 -0.11293849
#> 5.0.0
#> -0.06299408
Created on 2020-03-21 by the reprex package (v0.3.0)
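For the loop-based approach in the question, a raw (non-orthogonal) expansion to any degree can also be written without nesting one more for loop per degree. The following is only a minimal sketch: `poly_expand` is a hypothetical helper name, not something from the posts above, and unlike `polym` with its default `raw = FALSE` it returns plain monomials such as x^2*y rather than orthogonal polynomials. Columns are named by their exponents and sorted by total degree, then by the exponent of the first column, then the second, and so on.

```r
# Hypothetical helper (not from the original posts): raw monomial expansion
# of all columns up to `degree`, with columns ordered by total degree.
poly_expand <- function(df, degree = 2) {
  X <- as.matrix(df)
  p <- ncol(X)
  # Every exponent vector with entries in 0..degree, one column per variable
  expo <- as.matrix(expand.grid(rep(list(0:degree), p)))
  total <- rowSums(expo)
  # Keep only terms whose total degree lies between 1 and `degree`
  expo <- expo[total >= 1 & total <= degree, , drop = FALSE]
  # Sort by total degree, then by the exponent of column 1, column 2, ...
  keys <- c(list(rowSums(expo)), unname(as.list(as.data.frame(expo))))
  expo <- expo[do.call(order, keys), , drop = FALSE]
  # Each output column is the product over j of X[, j]^expo[r, j]
  cols <- lapply(seq_len(nrow(expo)), function(r) {
    apply(sweep(X, 2, expo[r, ], `^`), 1, prod)
  })
  out <- matrix(unlist(cols), nrow = nrow(X))
  colnames(out) <- apply(expo, 1, paste, collapse = ".")
  out
}

df <- data.frame(x = 1:6, y = 0:5, z = -(1:6))
expanded <- poly_expand(df, degree = 5)
ncol(expanded)
head(colnames(expanded))
```

For three columns and degree 5 this yields 55 terms, the same number of columns as in the polym output above. The exponent grid grows as (degree + 1)^p before filtering, so this sketch is only practical for a handful of columns.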
this line of code previously worked on travis-ci but now fails. works fine on appveyor
RCurl::getURL( "ftp://ftp.cdc.gov/pub/data/yrbs/" , ftp.use.epsv = TRUE, dirlistonly = TRUE )
here's the text of the error-
Error in function (type, msg, asError = TRUE) :
server did not report OK, got 425
Calls: get_catalog ... <Anonymous> -> curlPerform -> .Call -> <Anonymous> -> fun
i think my .travis.yml is a pretty standard r configuration:
language: r
cache: packages
sudo: required
apt_packages:
- unixodbc-dev
- libarchive-dev
successful build log late february 2018 at https://api.travis-ci.org/v3/job/343635739/log.txt
failed build log early march 2018 https://api.travis-ci.org/v3/job/352115990/log.txt
the libcurl block looks nearly identical, but there's one noticeable difference between the success and the failure--
late february success:
2 upgraded, 35 newly installed, 1 to remove and 124 not upgraded.
mid march failure:
2 upgraded, 35 newly installed, 1 to remove and 135 not upgraded.
minor r session info changes:
late february success:
Session info ------------------------------------------------------------------
date 2018-02-20
DBI 0.7 2017-06-18 cran (#0.7)
devtools 1.13.4 2017-11-09 CRAN (R 3.4.2)
lodown 0.1.0 2018-02-20 Github (ajdamico/lodown#6a69363)
rlang 0.1.6 2017-12-21 cran (#0.1.6)
srvyr 0.3.0 2018-01-24 cran (#0.3.0)
mid march failure:
Session info ------------------------------------------------------------------
date 2018-03-12
DBI 0.8 2018-03-02 cran (#0.8)
devtools 1.13.5 2018-02-18 CRAN (R 3.4.3)
lodown 0.1.0 2018-03-12 Github (ajdamico/lodown#bef726b)
rlang 0.2.0 2018-02-20 cran (#0.2.0)
srvyr 0.3.1 2018-03-10 cran (#0.3.1)
I don't know much about graphic devices etc. All I want to do is to save plots to PDF and to embed fonts.
I use cairo_pdf() for this, but I noticed that sometimes plot elements are printed outside of the box/plot region (see screenshots of the PDFs). I can reproduce the issue on different Windows machines, different R versions, using packages cairoDevice or Cairo, and with for example lines(). But plots saved via pdf() look fine.
My questions are:
Is this reproducible? If yes, is this a bug and where?
Are there any other situations where cairo_pdf()-plots look different compared to pdf()-plots? Are there any other disadvantages of using cairo_pdf()?
Below are screenshots from details of the whole PDFs illustrating the differences. Note that, in the left image, the axes overlap with some points.
capabilities("cairo")
#> cairo
#> TRUE
set.seed(123456)
N <- 10000
v1 <- rnorm(N)
v2 <- rnorm(N)
v3 <- ifelse(v1 > 1.02 | v2 > 1.02 | v1 < -.02 | v2 < -.02, 2, 1)
cairo_pdf("plot1.pdf")
plot(v1, v2, xlim = 0:1, ylim = 0:1, col = v3, pch = 16)
dev.off()
#> null device
#> 1
pdf("plot2.pdf")
plot(v1, v2, xlim = 0:1, ylim = 0:1, col = v3, pch = 16)
dev.off()
#> null device
#> 1
devtools::session_info()
#> Session info ------------------------------------------------------------------
#> setting value
#> version R version 3.4.2 (2017-09-28)
#> system x86_64, mingw32
#> ui Rgui
#> language (EN)
#> collate German_Germany.1252
#> tz Europe/Berlin
#> date 2018-03-09
#>
#> Packages ----------------------------------------------------------------------
#> package * version date source
#> base * 3.4.2 2017-09-28 local
#> compiler 3.4.2 2017-09-28 local
#> datasets * 3.4.2 2017-09-28 local
#> devtools 1.13.5 2018-02-18 CRAN (R 3.4.3)
#> digest 0.6.15 2018-01-28 CRAN (R 3.4.3)
#> graphics * 3.4.2 2017-09-28 local
#> grDevices * 3.4.2 2017-09-28 local
#> memoise 1.1.0 2017-04-21 CRAN (R 3.4.1)
#> methods * 3.4.2 2017-09-28 local
#> stats * 3.4.2 2017-09-28 local
#> utils * 3.4.2 2017-09-28 local
#> withr 2.1.1 2017-12-19 CRAN (R 3.4.3)
This bug is fixed in R 3.6.0.
From the NEWS:
The cairo_pdf graphics device (and other Cairo-based devices) now clip correctly to the right and bottom border.
There was an off-by-one-pixel bug, reported by Lee Kelvin.
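If a script has to run on several machines, it can check whether the running R version already contains this fix before relying on cairo_pdf() for clipping. This is only a sketch: `cairo_clipping_fixed` is a hypothetical helper name, and the only fact it encodes is the 3.6.0 version number from the NEWS entry quoted above.

```r
# Hypothetical helper: TRUE if the given R version is at least 3.6.0,
# the release whose NEWS entry reports the cairo clipping fix.
cairo_clipping_fixed <- function(v = getRversion()) {
  as.numeric_version(v) >= "3.6.0"
}

cairo_clipping_fixed("3.4.2")  # the version used in the question
cairo_clipping_fixed("3.6.0")
cairo_clipping_fixed()         # the R version currently running
```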
I'm puzzled by the behaviour of the uq() function. The behavior is not the same when I use uq() or lazyeval::uq().
Here is my reproducible example :
First, I generate a fake dataset
library(tibble)
library(lazyeval)
fruits <- c("apple", "banana", "peanut")
price <- c(5,6,4)
table_fruits <- tibble(fruits, price)
Then I write a toy function, toy_function_v1, using only uq() :
toy_function_v1 <- function(data, var) {
lazyeval::f_eval(f = ~ uq(var), data = data)
}
and a second function using lazyeval::uq() :
toy_function_v2 <- function(data, var) {
lazyeval::f_eval(f = ~ lazyeval::uq(var), data = data)
}
Surprisingly, the output of v1 and v2 is not the same :
> toy_function_v1(data = table_fruits, var = ~ price)
[1] 5 6 4
> toy_function_v2(data = table_fruits, var = ~ price)
price
Is there any explanation ?
I know it's good practice to use the package::function() syntax when calling a function inside a new package. So what's the best solution in that case ?
Here is my session_info :
> devtools::session_info()
Session info ----------------------------------------------------------------------------------------------------------------------------------------------------
setting value
version R version 3.3.1 (2016-06-21)
system x86_64, linux-gnu
ui RStudio (1.0.35)
language (EN)
collate C
tz <NA>
date 2016-11-07
Packages --------------------------------------------------------------------------------------------------------------------------------------------------------
package * version date source
Rcpp 0.12.7 2016-09-05 CRAN (R 3.2.3)
assertthat 0.1 2013-12-06 CRAN (R 3.2.2)
devtools 1.12.0 2016-06-24 CRAN (R 3.2.3)
digest 0.6.10 2016-08-02 CRAN (R 3.2.3)
lazyeval * 0.2.0.9000 2016-10-14 Github (hadley/lazyeval#c155c3d)
memoise 1.0.0 2016-01-29 CRAN (R 3.2.3)
tibble * 1.2 2016-08-26 CRAN (R 3.2.3)
withr 1.0.2 2016-06-20 CRAN (R 3.2.3)
It's just a bug in the uq() function. The issue is open on Github : https://github.com/hadley/lazyeval/issues/78.
Is there a simple way to get a list of R package dependencies (all recursive dependencies) for a given package, without installing the package and its dependencies? Something similar to a fake install in portupgrade or apt.
You can use the result of the available.packages function. For example, to see what ggplot2 depends on :
pack <- available.packages()
pack["ggplot2","Depends"]
Which gives :
[1] "R (>= 2.14), stats, methods"
Note that depending on what you want to achieve, you may need to check the Imports field, too.
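The Depends and Imports fields are comma-separated strings that may carry version constraints and an entry for R itself, so they usually need a little parsing before they can be used as a list of package names. A minimal sketch in base R follows; `parse_dep_field` is a hypothetical helper name, and the input string is the Depends value shown above.

```r
# Hypothetical helper: turn one dependency field into a vector of package names.
parse_dep_field <- function(x) {
  # Packages without the field have NA in the available.packages() matrix
  if (is.na(x)) return(character(0))
  parts <- trimws(strsplit(x, ",")[[1]])
  # Drop version constraints such as " (>= 2.14)"
  pkgs <- sub("\\s*\\(.*$", "", parts)
  # "R" is a version requirement, not a package
  setdiff(pkgs[nzchar(pkgs)], "R")
}

parse_dep_field("R (>= 2.14), stats, methods")
```

Applying the same helper to both pack["ggplot2", "Depends"] and pack["ggplot2", "Imports"] and combining the results with unique(c(...)) gives the direct dependencies for one package.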
I am surprised no one mentioned tools::package_dependencies() , which is the simplest solution, and has a recursive argument (which the accepted solution does not offer).
Simple example looking at the recursive dependencies for the first 200 packages on CRAN:
library(tidyverse)
avail_pks <- available.packages()
deps <- tools::package_dependencies(packages = avail_pks[1:200, "Package"],
recursive = TRUE)
tibble(Package=names(deps),
data=map(deps, as_tibble)) %>%
unnest(data)
#> # A tibble: 7,125 x 2
#> Package value
#> <chr> <chr>
#> 1 A3 xtable
#> 2 A3 pbapply
#> 3 A3 parallel
#> 4 A3 stats
#> 5 A3 utils
#> 6 aaSEA DT
#> 7 aaSEA networkD3
#> 8 aaSEA shiny
#> 9 aaSEA shinydashboard
#> 10 aaSEA magrittr
#> # … with 7,115 more rows
Created on 2020-12-04 by the reprex package (v0.3.0)
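To illustrate what a recursive lookup does conceptually, here is a small sketch that walks a table of direct dependencies breadth-first. It is not how tools::package_dependencies() is implemented; `recursive_deps` and the package names pkgA to pkgE are made up for the example.

```r
# Toy table of direct dependencies (hypothetical package names)
direct <- list(
  pkgA = c("pkgB", "pkgC"),
  pkgB = "pkgD",
  pkgC = c("pkgD", "pkgE")
)

# Hypothetical helper: collect everything reachable from `pkg`
recursive_deps <- function(pkg, direct) {
  seen <- character(0)
  queue <- direct[[pkg]]
  while (length(queue) > 0) {
    nxt <- queue[1]
    queue <- queue[-1]
    if (nxt %in% seen) next            # already visited, skip
    seen <- c(seen, nxt)
    queue <- c(queue, direct[[nxt]])   # NULL for packages without an entry
  }
  setdiff(seen, pkg)                   # guard against circular entries
}

recursive_deps("pkgA", direct)
```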
Another neat and simple solution is the internal function recursivePackageDependencies from the packrat package. However, the package must be installed in some library on your machine. The advantage is that it works with self-made non-CRAN packages as well. Example:
packrat:::recursivePackageDependencies("ggplot2",lib.loc = .libPaths()[1])
giving:
[1] "R6" "RColorBrewer" "Rcpp" "colorspace" "dichromat" "digest" "gtable"
[8] "labeling" "lazyeval" "magrittr" "munsell" "plyr" "reshape2" "rlang"
[15] "scales" "stringi" "stringr" "tibble" "viridisLite"
I do not have R installed, and I needed to find out which R packages were dependencies of a list of R packages requested for use at my company.
I wrote a bash script that iterates over a list of R Packages in a file and will recursively discover dependencies.
The script uses a file named rinput_orig.txt as input (example below). The script will create a file named rinput.txt as it does its work.
The script will create the following files:
rdepsfound.txt - Lists dependencies found including the R Package that is dependent upon it (example below).
routput.txt - Lists all R Packages (from original list and list of dependencies) along with the license and CRAN URL (example below).
r404.txt - List of R Packages where a 404 was received when trying to curl. This is handy if your original list has any typos.
Bash script:
#!/bin/bash
# CLEANUP
rm -f routput.txt
rm -f rdepsfound.txt
rm -f r404.txt
# COPY ORIGINAL INPUT TO WORKING INPUT
cp rinput_orig.txt rinput.txt
IFS=","
while read PACKAGE; do
echo Processing $PACKAGE...
PACKAGEURL="http://cran.r-project.org/web/packages/${PACKAGE}/index.html"
if [ `curl -o /dev/null --silent --head --write-out '%{http_code}\n' ${PACKAGEURL}` != 404 ]; then
# GET LICENSE INFO OF PACKAGE
LICENSEINFO=$(curl ${PACKAGEURL} 2>/dev/null | grep -A1 "License:" | grep -v "License:" | gawk 'match($0, /<a href=".*">(.*)<\/a>/, a) {print a[0]}' | sed "s/|/,/g" | sed "s/+/,/g")
for x in ${LICENSEINFO[*]}
do
# SAVE LICENSE
LICENSE=$(echo ${x} | gawk 'match($0, /<a href=".*">(.*)<\/a>/, a) {print a[1]}')
break
done
# WRITE PACKAGE AND LICENSE TO OUTPUT FILE
echo $PACKAGE $LICENSE $PACKAGEURL >> routput.txt
# GET DEPENDENCIES OF PACKAGE
DEPS=$(curl ${PACKAGEURL} 2>/dev/null | grep -A1 "Depends:" | grep -v "Depends:" | gawk 'match($0, /<a href=".*">(.*)<\/a>/, a) {print a[0]}')
for x in ${DEPS[*]}
do
FOUNDDEP=$(echo "${x}" | gawk 'match($0, /<a href=".*">(.*)<\/a>/, a) {print a[1]}' | sed "s/<\/span>//g")
if [ "$FOUNDDEP" != "" ]; then
echo Found dependency $FOUNDDEP for $PACKAGE...
# MATCH THE WHOLE LINE SO THAT E.G. xlsx DOES NOT MATCH xlsxjars
if grep -qxF "$FOUNDDEP" rinput.txt; then
echo $FOUNDDEP already exists in package list...
else
echo Adding $FOUNDDEP to package list...
# SAVE FOUND DEPENDENCY BACK TO INPUT LIST
echo $FOUNDDEP >> rinput.txt
# SAVE FOUND DEPENDENCY TO DEPENDENCY LIST FOR EASY VIEWING OF ALL FOUND DEPENDENCIES
echo $FOUNDDEP is a dependency of $PACKAGE >> rdepsfound.txt
fi
fi
done
else
echo Skipping $PACKAGE because 404 was received...
echo $PACKAGE $PACKAGEURL >> r404.txt
fi
done < rinput.txt
echo -e "\nRESULT:"
sort -u routput.txt
Example rinput_orig.txt:
shiny
rmarkdown
xtable
RODBC
RJDBC
XLConnect
openxlsx
xlsx
Rcpp
Example console output when running script:
Processing shiny...
Processing rmarkdown...
Processing xtable...
Processing RODBC...
Processing RJDBC...
Found dependency DBI for RJDBC...
Adding DBI to package list...
Found dependency rJava for RJDBC...
Adding rJava to package list...
Processing XLConnect...
Found dependency XLConnectJars for XLConnect...
Adding XLConnectJars to package list...
Processing openxlsx...
Processing xlsx...
Found dependency rJava for xlsx...
rJava already exists in package list...
Found dependency xlsxjars for xlsx...
Adding xlsxjars to package list...
Processing Rcpp...
Processing DBI...
Processing rJava...
Processing XLConnectJars...
Processing xlsxjars...
Found dependency rJava for xlsxjars...
rJava already exists in package list...
Example rdepsfound.txt:
DBI is a dependency of RJDBC
rJava is a dependency of RJDBC
XLConnectJars is a dependency of XLConnect
xlsxjars is a dependency of xlsx
Example routput.txt:
shiny GPL-3 http://cran.r-project.org/web/packages/shiny/index.html
rmarkdown GPL-3 http://cran.r-project.org/web/packages/rmarkdown/index.html
xtable GPL-2 http://cran.r-project.org/web/packages/xtable/index.html
RODBC GPL-2 http://cran.r-project.org/web/packages/RODBC/index.html
RJDBC GPL-2 http://cran.r-project.org/web/packages/RJDBC/index.html
XLConnect GPL-3 http://cran.r-project.org/web/packages/XLConnect/index.html
openxlsx GPL-3 http://cran.r-project.org/web/packages/openxlsx/index.html
xlsx GPL-3 http://cran.r-project.org/web/packages/xlsx/index.html
Rcpp GPL-2 http://cran.r-project.org/web/packages/Rcpp/index.html
DBI LGPL-2 http://cran.r-project.org/web/packages/DBI/index.html
rJava GPL-2 http://cran.r-project.org/web/packages/rJava/index.html
XLConnectJars GPL-3 http://cran.r-project.org/web/packages/XLConnectJars/index.html
xlsxjars GPL-3 http://cran.r-project.org/web/packages/xlsxjars/index.html
I hope this helps someone!
I tested my own solution, which inspects locally installed packages, against the packrat and tools solutions.
There are clear differences between the methods.
With recursive = TRUE, tools::package_dependencies appears to return too many packages in R versions before 4.1.0, and it is the slowest of the three in the benchmark below.
R 4.1.0 NEWS
"Function tools::package_dependencies() (in package tools) can now use different dependency types for direct and recursive dependencies."
packrat:::recursivePackageDependencies uses available.packages, so it is based on the newest remote package versions rather than the locally installed ones.
By default, my function skips base packages; set the base argument to TRUE if you want to include them.
Tested under R 4.1.0:
get_deps <- function(package, fields = c("Depends", "Imports", "LinkingTo"), base = FALSE, lib.loc = NULL) {
stopifnot((length(package) == 1) && is.character(package))
stopifnot(all(fields %in% c("Depends", "Imports", "Suggests", "LinkingTo")))
stopifnot(is.logical(base))
stopifnot(package %in% rownames(utils::installed.packages(lib.loc = lib.loc)))
paks_global <- NULL
deps <- function(pak, fields) {
pks <- packageDescription(pak)
res <- NULL
for (f in fields) {
ff <- pks[[f]]
if (!is.null(ff)) {
res <- c(
res,
vapply(
strsplit(trimws(strsplit(ff, ",")[[1]]), "[ \n\\(]"),
function(x) x[1],
character(1)
)
)
}
}
if (is.null(res)) {
return(NULL)
}
for (r in res) {
if (r != "R" && !r %in% paks_global) {
paks_global <<- c(r, paks_global)
deps(r, fields)
}
}
}
deps(package, fields)
setdiff(unique(paks_global), c(
package,
"R",
if (!base) {
c(
"stats",
"graphics",
"grDevices",
"utils",
"datasets",
"methods",
"base",
"tools"
)
} else {
NULL
}
))
}
own = get_deps("shiny", fields = c("Depends", "Imports"))
packrat = packrat:::recursivePackageDependencies("shiny", lib.loc = .libPaths(), fields = c("Depends", "Imports"))
tools = tools::package_dependencies("shiny", which = c("Depends", "Imports"), recursive = TRUE)[[1]]
setdiff(own, packrat)
#> character(0)
setdiff(packrat, own)
#> character(0)
setdiff(own, tools)
#> character(0)
setdiff(tools, own)
#> [1] "methods" "utils" "grDevices" "tools" "stats" "graphics"
setdiff(packrat, tools)
#> character(0)
setdiff(tools, packrat)
#> [1] "methods" "utils" "grDevices" "tools" "stats" "graphics"
own
#> [1] "lifecycle" "ellipsis" "cachem" "jquerylib" "rappdirs"
#> [6] "fs" "sass" "bslib" "glue" "commonmark"
#> [11] "withr" "fastmap" "crayon" "sourcetools" "base64enc"
#> [16] "htmltools" "digest" "xtable" "jsonlite" "mime"
#> [21] "magrittr" "rlang" "later" "promises" "R6"
#> [26] "Rcpp" "httpuv"
packrat
#> [1] "R6" "Rcpp" "base64enc" "bslib" "cachem"
#> [6] "commonmark" "crayon" "digest" "ellipsis" "fastmap"
#> [11] "fs" "glue" "htmltools" "httpuv" "jquerylib"
#> [16] "jsonlite" "later" "lifecycle" "magrittr" "mime"
#> [21] "promises" "rappdirs" "rlang" "sass" "sourcetools"
#> [26] "withr" "xtable"
tools
#> [1] "methods" "utils" "grDevices" "httpuv" "mime"
#> [6] "jsonlite" "xtable" "digest" "htmltools" "R6"
#> [11] "sourcetools" "later" "promises" "tools" "crayon"
#> [16] "rlang" "fastmap" "withr" "commonmark" "glue"
#> [21] "bslib" "cachem" "ellipsis" "lifecycle" "sass"
#> [26] "jquerylib" "magrittr" "base64enc" "Rcpp" "stats"
#> [31] "graphics" "fs" "rappdirs"
microbenchmark::microbenchmark(get_deps("shiny", fields = c("Depends", "Imports")),
packrat:::recursivePackageDependencies("shiny", lib.loc = .libPaths(), fields = c("Depends", "Imports")),
tools = tools::package_dependencies("shiny", which = c("Depends", "Imports"), recursive = TRUE)[[1]],
times = 5
)
#> Warning in microbenchmark::microbenchmark(get_deps("shiny", fields =
#> c("Depends", : less accurate nanosecond times to avoid potential integer
#> overflows
#> Unit: milliseconds
#> expr
#> get_deps("shiny", fields = c("Depends", "Imports"))
#> packrat:::recursivePackageDependencies("shiny", lib.loc = .libPaths(), fields = c("Depends", "Imports"))
#> tools
#> min lq mean median uq max neval
#> 5.316552 5.607365 6.054568 5.674359 6.633308 7.041258 5
#> 18.767340 19.387588 21.739127 21.581457 23.526169 25.433079 5
#> 411.589734 449.179354 458.526354 465.431262 468.440211 497.991207 5
Created on 2021-06-25 by the reprex package (v0.3.0)
Proof that something was wrong with the tools solution under older R versions. Tested under R 3.6.3.
paks <- tools::package_dependencies("shiny", which = c("Depends", "Imports"), recursive = TRUE)[[1]]
"lifecycle" %in% paks
#> [1] TRUE
any(c(paks, "shiny") %in% tools::dependsOnPkgs("lifecycle"))
#> [1] FALSE
Created on 2021-06-25 by the reprex package (v0.3.0)
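The hard-coded vector of base packages in get_deps covers the common ones but not every package that ships with R (parallel, grid and splines, for example, are missing). As an alternative, the list can be read from the local installation, since installed.packages() accepts a priority filter. A minimal sketch follows; `drop_base` is a hypothetical helper name and is not part of the answer above.

```r
# Hypothetical helper: remove base-priority packages from a vector of names.
drop_base <- function(pkgs, lib.loc = NULL) {
  base_pkgs <- rownames(
    utils::installed.packages(lib.loc = lib.loc, priority = "base")
  )
  setdiff(pkgs, base_pkgs)
}

# "notARealPkg" is a made-up name used only for illustration
drop_base(c("stats", "notARealPkg", "utils"))
```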
Try this: tools::package_dependencies(recursive = TRUE)$package_name
As an example, here are the dependencies for dplyr:
tools::package_dependencies(recursive = TRUE)$dplyr
[1] "ellipsis" "generics" "glue" "lifecycle" "magrittr" "methods"
[7] "R6" "rlang" "tibble" "tidyselect" "utils" "vctrs"
[13] "cli" "crayon" "fansi" "pillar" "pkgconfig" "purrr"
[19] "digest" "assertthat" "grDevices" "utf8" "tools"