Check if two paths resolve to the same directory - r

In "Check if two file paths resolve to the same file", the solution is to use normalizePath(); however, this does not appear to be definitive for directories:
td1 <- tempdir()
td2 <- paste0(td1, "/")
dir.exists(td1) && dir.exists(td2)
#> [1] TRUE
file.create(file.path(td1, "foo.txt"))
#> [1] TRUE
file.exists(file.path(td2, "foo.txt"))
#> [1] TRUE
normalizePath(td1) == normalizePath(td2)
#> [1] FALSE
sessionInfo()
#> R version 3.5.1 (2018-07-02)
#> Platform: x86_64-w64-mingw32/x64 (64-bit)
#> Running under: Windows 10 x64 (build 17134)
#>
#> Matrix products: default
#>
#> locale:
#> [1] LC_COLLATE=English_Australia.1252 LC_CTYPE=English_Australia.1252
#> [3] LC_MONETARY=English_Australia.1252 LC_NUMERIC=C
#> [5] LC_TIME=English_Australia.1252
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> loaded via a namespace (and not attached):
#> [1] compiler_3.5.1 backports_1.1.2 magrittr_1.5 rprojroot_1.3-2
#> [5] tools_3.5.1 htmltools_0.3.6 yaml_2.2.0 Rcpp_0.12.18
#> [9] stringi_1.1.7 rmarkdown_1.10 knitr_1.20 stringr_1.3.1
#> [13] digest_0.6.16 evaluate_0.11
Created on 2018-09-05 by the reprex package (v0.2.0).
Is there a method that is reliable (or more reliable) at identifying directories?

If you're open to using a package for this, I've had a lot of success with the fs package for path operations that are robust across OSes. For example, when it normalizes a path via fs::path_norm(), it strips this trailing slash on Windows.
td1 <- tempdir()
td2 <- paste0(td1, "/")
dir.exists(td1) && dir.exists(td2)
#> [1] TRUE
file.create(file.path(td1, "foo.txt"))
#> [1] TRUE
file.exists(file.path(td2, "foo.txt"))
#> [1] TRUE
normalizePath(td1) == normalizePath(td2)
#> [1] FALSE
library(fs)
path_norm(td1) == path_norm(td2)
#> [1] TRUE
sessionInfo()
#> R version 3.4.3 (2017-11-30)
#> Platform: x86_64-w64-mingw32/x64 (64-bit)
#> Running under: Windows 10 x64 (build 17134)
#>
#> Matrix products: default
#>
#> locale:
#> [1] LC_COLLATE=English_United States.1252
#> [2] LC_CTYPE=English_United States.1252
#> [3] LC_MONETARY=English_United States.1252
#> [4] LC_NUMERIC=C
#> [5] LC_TIME=English_United States.1252
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> loaded via a namespace (and not attached):
#> [1] compiler_3.4.3 backports_1.1.2 magrittr_1.5 rprojroot_1.3-2
#> [5] tools_3.4.3 htmltools_0.3.6 yaml_2.1.16 Rcpp_0.12.18
#> [9] stringi_1.1.6 rmarkdown_1.8 knitr_1.17 stringr_1.2.0
#> [13] digest_0.6.16 evaluate_0.10.1
Created on 2018-09-05 by the reprex package (v0.2.0).
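If adding a package is not an option, a rough base R sketch of the same idea is to strip trailing separators yourself before calling normalizePath(). The helper names strip_trailing_sep() and same_dir() are made up for this illustration, and the sketch does not try to handle drive roots such as "C:/" or case-insensitive file systems.

```r
# Hypothetical helper: drop trailing "/" or "\" but leave a bare root such as "/" alone
strip_trailing_sep <- function(p) {
  sub("^(.+?)[/\\\\]+$", "\\1", p, perl = TRUE)
}

# Hypothetical helper: compare two directory paths after stripping and normalizing
same_dir <- function(a, b) {
  na <- normalizePath(strip_trailing_sep(a), winslash = "/", mustWork = FALSE)
  nb <- normalizePath(strip_trailing_sep(b), winslash = "/", mustWork = FALSE)
  na == nb
}

td1 <- tempdir()
td2 <- paste0(td1, "/")
same_dir(td1, td2)
#> [1] TRUE
```

Because both inputs are reduced to the same string before normalization, the comparison no longer depends on how the platform treats a trailing slash.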

Related

Could not find function "fpkmToTpm_matrix" [duplicate]

I installed an R package named "GeoTcgaData", loaded it with library(), and wanted to use the function "fpkmToTpm_matrix" to convert my data. But this message came out:
Error in fpkmToTpm_matrix(J_Rseq) :
could not find function "fpkmToTpm_matrix"
Wondering where the problem is.
(The version of the package is 1.1.0)
You may need to install some of the dependencies separately via Bioconductor. If you install the packages using:
install.packages("BiocManager")
BiocManager::install(pkgs = c('DESeq2', 'impute', 'edgeR', 'cqn', 'topconfects', 'ChAMP', 'clusterProfiler',
                              'org.Hs.eg.db', 'minfi', 'IlluminaHumanMethylation450kanno.ilmn12.hg19',
                              'dearseq', 'NOISeq'))
You can then (hopefully) install and run GeoTcgaData as expected:
install.packages("GeoTcgaData", type = "source")
library(GeoTcgaData)
#> =============================================================
#> Hello, friend! welcome to use GeoTcgaData!
#> -------------------------------------------------------------
#> Version:1.1.0
#> =============================================================
lung_squ_count2 <- matrix(c(0.11,0.22,0.43,0.14,0.875,0.66,0.77,0.18,0.29),ncol=3)
rownames(lung_squ_count2) <- c("DISC1","TCOF1","SPPL3")
colnames(lung_squ_count2) <- c("sample1","sample2","sample3")
lung_squ_count2
#> sample1 sample2 sample3
#> DISC1 0.11 0.140 0.77
#> TCOF1 0.22 0.875 0.18
#> SPPL3 0.43 0.660 0.29
result <- fpkmToTpm_matrix(lung_squ_count2)
result
#> sample1 sample2 sample3
#> DISC1 144736.8 83582.09 620967.7
#> TCOF1 289473.7 522388.06 145161.3
#> SPPL3 565789.5 394029.85 233871.0
sessionInfo()
#> R version 4.1.3 (2022-03-10)
#> Platform: x86_64-apple-darwin17.0 (64-bit)
#> Running under: macOS Big Sur/Monterey 10.16
#>
#> Matrix products: default
#> BLAS: /Library/Frameworks/R.framework/Versions/4.1/Resources/lib/libRblas.0.dylib
#> LAPACK: /Library/Frameworks/R.framework/Versions/4.1/Resources/lib/libRlapack.dylib
#>
#> locale:
#> [1] en_AU.UTF-8/en_AU.UTF-8/en_AU.UTF-8/C/en_AU.UTF-8/en_AU.UTF-8
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> other attached packages:
#> [1] GeoTcgaData_1.1.0
#>
#> loaded via a namespace (and not attached):
#> [1] Rcpp_1.0.9 plyr_1.8.7 pillar_1.8.0 compiler_4.1.3
#> [5] highr_0.9 R.methodsS3_1.8.2 R.utils_2.12.0 tools_4.1.3
#> [9] digest_0.6.29 evaluate_0.15 lifecycle_1.0.1 tibble_3.1.8
#> [13] R.cache_0.16.0 pkgconfig_2.0.3 rlang_1.0.4 reprex_2.0.1
#> [17] DBI_1.1.3 cli_3.3.0 rstudioapi_0.13 yaml_2.3.5
#> [21] xfun_0.31 fastmap_1.1.0 withr_2.5.0 styler_1.7.0
#> [25] stringr_1.4.0 dplyr_1.0.9 knitr_1.39 generics_0.1.3
#> [29] fs_1.5.2 vctrs_0.4.1 tidyselect_1.1.2 glue_1.6.2
#> [33] R6_2.5.1 fansi_1.0.3 rmarkdown_2.14 purrr_0.3.4
#> [37] magrittr_2.0.3 htmltools_0.5.3 splines_4.1.3 assertthat_0.2.1
#> [41] utf8_1.2.2 nor1mix_1.3-0 stringi_1.7.8 cqn_1.38.0
#> [45] R.oo_1.25.0
Created on 2022-08-08 by the reprex package (v2.0.1)
If installing the dependencies separately doesn't solve the issue, another potential alternative is to define the function yourself. From the source code:
fpkmToTpm <- function(fpkm)
{
  exp(log(fpkm) - log(sum(fpkm)) + log(1e6))
}
fpkmToTpm_matrix <- function(fpkm_matrix) {
  fpkm_matrix_new <- apply(fpkm_matrix, 2, fpkmToTpm)
}
fpkmToTpm_matrix(J_Rseq)
Does this work?
Also, please edit your question to include the output of sessionInfo().
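One detail worth knowing if you define the functions yourself: the last expression of the wrapper quoted above is an assignment, so its result is returned invisibly and nothing is printed unless you assign or print it. The sketch below drops that assignment so the result prints directly (a small change from the quoted source, not the package's own code) and checks the conversion on the small example matrix from this answer: after converting FPKM to TPM, each column should sum to one million.

```r
# Same formula as quoted above; the wrapper returns the apply() result directly
fpkmToTpm <- function(fpkm) {
  exp(log(fpkm) - log(sum(fpkm)) + log(1e6))
}
fpkmToTpm_matrix <- function(fpkm_matrix) {
  apply(fpkm_matrix, 2, fpkmToTpm)
}

# Example matrix taken from the reprex in this answer
m <- matrix(c(0.11, 0.22, 0.43, 0.14, 0.875, 0.66, 0.77, 0.18, 0.29), ncol = 3)
rownames(m) <- c("DISC1", "TCOF1", "SPPL3")
colnames(m) <- c("sample1", "sample2", "sample3")

tpm <- fpkmToTpm_matrix(m)

# Sanity check: every sample (column) sums to 1e6, up to floating-point error
colSums(tpm)
```

The log/exp form is algebraically the same as fpkm / sum(fpkm) * 1e6, so zero values map to zero and negative values produce NaN.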

R: strange results when looking at the unique elements of two simple strings

I am absolutely puzzled at what I see.
I read an excel file and when I look at the unique values in a column of strings, I do not understand the result.
I can reproduce this in a minimal reprex (see below): why does dd have two unique elements, whereas dd2 has just one?
Any suggestion is appreciated.
dd <- c("Grant", "Grant")
dd2 <- c("Grant", "Grant")
unique(dd)
#> [1] "Grant" "Grant"
length(unique(dd))
#> [1] 2
unique(dd2)
#> [1] "Grant"
length(unique(dd2))
#> [1] 1
sessionInfo()
#> R version 4.1.1 (2021-08-10)
#> Platform: x86_64-pc-linux-gnu (64-bit)
#> Running under: Debian GNU/Linux 11 (bullseye)
#>
#> Matrix products: default
#> BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0
#> LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0
#>
#> locale:
#> [1] LC_CTYPE=en_GB.UTF-8 LC_NUMERIC=C
#> [3] LC_TIME=en_GB.UTF-8 LC_COLLATE=en_GB.UTF-8
#> [5] LC_MONETARY=en_GB.UTF-8 LC_MESSAGES=en_GB.UTF-8
#> [7] LC_PAPER=en_GB.UTF-8 LC_NAME=C
#> [9] LC_ADDRESS=C LC_TELEPHONE=C
#> [11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> loaded via a namespace (and not attached):
#> [1] knitr_1.33 magrittr_2.0.1 rlang_0.4.11 fansi_0.5.0
#> [5] stringr_1.4.0 styler_1.5.1 highr_0.9 tools_4.1.1
#> [9] xfun_0.25 utf8_1.2.2 withr_2.4.2 htmltools_0.5.1.1
#> [13] ellipsis_0.3.2 yaml_2.2.1 digest_0.6.27 tibble_3.1.3
#> [17] lifecycle_1.0.0 crayon_1.4.1 purrr_0.3.4 vctrs_0.3.8
#> [21] fs_1.5.0 glue_1.4.2 evaluate_0.14 rmarkdown_2.10
#> [25] reprex_2.0.1 stringi_1.7.3 compiler_4.1.1 pillar_1.6.2
#> [29] backports_1.2.1 pkgconfig_2.0.3
Created on 2021-09-13 by the reprex package (v2.0.1)
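The raw values seem to be different, probably as an artifact of copying: the first element of dd starts with three extra bytes (ef bb bf, the UTF-8 encoding of the byte order mark U+FEFF)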
sapply(dd, charToRaw)
$`Grant`
[1] ef bb bf 47 72 61 6e 74
$Grant
[1] 47 72 61 6e 74
whereas with dd2, it is the same
sapply(dd2, charToRaw)
Grant Grant
[1,] 47 47
[2,] 72 72
[3,] 61 61
[4,] 6e 6e
[5,] 74 74
There seems to be an extra character in the first case
nchar(dd)
[1] 6 5
If we remove that first character, unique() returns a single value
unique(c(substring(dd[1],2), dd[2]))
[1] "Grant"
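If the stray character is a byte order mark, as the ef bb bf bytes suggest, one way to clean a whole vector at once is to strip a leading U+FEFF with sub(). This is only a minimal sketch: the input is rebuilt here with an explicit "\ufeff" escape so that the invisible character shows up in the code, and it assumes the mark appears only at the start of a string.

```r
# Rebuild the example with the byte order mark written explicitly
dd <- c("\ufeffGrant", "Grant")
length(unique(dd))

# Drop a leading U+FEFF from every element, then compare again
dd_clean <- sub("^\ufeff", "", dd)
unique(dd_clean)
```

For data read from Excel or CSV files, it may be cleaner to remove the mark at import time, but that depends on the reader being used.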

R: bind_rows fails because of unnamed argument 1

This has been asked several times in different forms, but here is a very simple example. I have used this function several times in the past, but apparently it fails miserably in the simplest case.
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
library(purrr)
bind_df_list <- function(ll) {
  res <- map_df(ll, bind_rows)
  return(res)
}
ll<-list(seq(4), seq(4), seq(4))
dd<-bind_df_list(ll)
#> Error: Argument 1 must have names.
print(sessionInfo())
#> R version 4.0.4 (2021-02-15)
#> Platform: x86_64-pc-linux-gnu (64-bit)
#> Running under: Debian GNU/Linux 10 (buster)
#>
#> Matrix products: default
#> BLAS: /usr/lib/x86_64-linux-gnu/openblas/libblas.so.3
#> LAPACK: /usr/lib/x86_64-linux-gnu/libopenblasp-r0.3.5.so
#>
#> locale:
#> [1] LC_CTYPE=en_GB.UTF-8 LC_NUMERIC=C
#> [3] LC_TIME=en_GB.UTF-8 LC_COLLATE=en_GB.UTF-8
#> [5] LC_MONETARY=en_GB.UTF-8 LC_MESSAGES=en_GB.UTF-8
#> [7] LC_PAPER=en_GB.UTF-8 LC_NAME=C
#> [9] LC_ADDRESS=C LC_TELEPHONE=C
#> [11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> other attached packages:
#> [1] purrr_0.3.4 dplyr_1.0.4
#>
#> loaded via a namespace (and not attached):
#> [1] knitr_1.31 magrittr_2.0.1 tidyselect_1.1.0 R6_2.5.0
#> [5] rlang_0.4.10 stringr_1.4.0 styler_1.3.2 highr_0.8
#> [9] tools_4.0.4 xfun_0.21 DBI_1.1.1 htmltools_0.5.1.1
#> [13] ellipsis_0.3.1 assertthat_0.2.1 yaml_2.2.1 digest_0.6.27
#> [17] tibble_3.0.6 lifecycle_1.0.0 crayon_1.4.1 vctrs_0.3.6
#> [21] fs_1.5.0 glue_1.4.2 evaluate_0.14 rmarkdown_2.6
#> [25] reprex_1.0.0 stringi_1.5.3 compiler_4.0.4 pillar_1.4.7
#> [29] generics_0.1.0 backports_1.2.1 pkgconfig_2.0.3
Created on 2021-03-08 by the reprex package (v1.0.0)
Any idea about how to fix this?
Many thanks!
Each element of ll is just an unnamed vector, whereas according to ?bind_rows the input should be a set of data frames:
... - Data frames to combine.
One option is to loop over the list with map, convert each element to a data.frame, and bind the results by rows with the _dfr variant
library(purrr)
map_dfr(ll, as.data.frame.list)
Or transpose and convert to data.frame
map_dfr(ll, ~ as.data.frame(t(.x)))
Or convert each element to a named vector
map_dfr(ll, ~ setNames(.x, seq_along(.x)))
Or use rbind with do.call from base R, as rbind has methods for matrix and data.frame
do.call(rbind, ll)
do.call(rbind.data.frame, ll)
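As a small illustration of the base R route, the sketch below shows what do.call(rbind, ll) produces for the list from the question. The result is a matrix rather than a data frame, so as.data.frame() is added here as an extra step (it is not part of the answer above) to get a data frame with default column names.

```r
ll <- list(seq(4), seq(4), seq(4))

# One row per list element, one column per position: a 3 x 4 integer matrix
m <- do.call(rbind, ll)
dim(m)

# Convert to a data frame; unnamed matrix columns become V1, V2, V3, V4
df <- as.data.frame(m)
names(df)
```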

Inconsistent results for the exact same gtrendsR::gtrends() call that runs in a loop

I ran into some weird behavior and I'm not sure what's causing it.
The same gtrends() call, in a loop, on past data - returns different results.
Any idea what might be causing it?
Minimal example:
library(gtrendsR)
unlist(lapply(X = 1:20, function(i) {
  g <- gtrends(
    keyword = c("/m/01k_wh", "/m/03m424"),
    time = "2018-01-01 2018-12-31",
    geo = "GB",
    hl = "en-GB"
  )
  g <- g$interest_over_time
  max(g$hits[g$keyword == "/m/01k_wh"])
}))
# [1] 72 72 72 72 72 72 72 72 72 72 72 72 72 52 52 52 52 52 52 52
Session info:
R version 3.5.2 (2018-12-20)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)
Matrix products: default
locale:
[1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C
[5] LC_TIME=English_United States.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] gtrendsR_1.4.2
loaded via a namespace (and not attached):
[1] Rcpp_1.0.0 assertthat_0.2.0 crayon_1.3.4 dplyr_0.7.8 R6_2.3.0
[6] grid_3.5.2 plyr_1.8.4 gtable_0.2.0 magrittr_1.5 scales_1.0.0
[11] ggplot2_3.1.0 pillar_1.3.1 rlang_0.3.0.1 lazyeval_0.2.1 rstudioapi_0.8
[16] bindrcpp_0.2.2 tools_3.5.2 glue_1.3.0 purrr_0.2.5 munsell_0.5.0
[21] yaml_2.2.0 compiler_3.5.2 pkgconfig_2.0.2 colorspace_1.3-2 tidyselect_0.2.5
[26] bindr_0.1.1 tibble_1.4.2

Strange behavior when subsetting with column names quoted with backticks in I of data.table

Look at the following example generated with reprex:
library(data.table)
DT <- data.table(id = letters[1:3], `counts(a>=0)` = 1:3)
DT[`counts(a>=0)` >= 2] # 1
#> id counts(a>=0)
#> 1: b 2
#> 2: c 3
DT[`counts(a>=0)` == 2] # 2
#> Error in `[.data.table`(DT, `counts(a>=0)` == 2): Column(s) [counts(a] not found in x
DT[id == "a"] # 3
#> id counts(a>=0)
#> 1: a 1
As both the lines marked with #1 and #3 work, I wonder why subsetting with `counts(a>=0)` == 2 (#2) doesn't work.
SessionInfo:
> sessionInfo()
R version 3.4.4 (2018-03-15)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 16.04.4 LTS
Matrix products: default
BLAS: /usr/lib/atlas-base/atlas/libblas.so.3.0
LAPACK: /usr/lib/atlas-base/atlas/liblapack.so.3.0
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] reprex_0.1.2 data.table_1.11.2
loaded via a namespace (and not attached):
[1] Rcpp_0.12.16 rprojroot_1.3-2 digest_0.6.15 crayon_1.3.4 withr_2.1.2 assertthat_0.2.0 R6_2.2.2
[8] backports_1.1.2 magrittr_1.5 formatR_1.5 evaluate_0.10.1 stringi_1.1.6 debugme_1.1.0 rstudioapi_0.7
[15] callr_2.0.2 whisker_0.3-2 rmarkdown_1.9 devtools_1.13.5 tools_3.4.4 stringr_1.3.0 yaml_2.1.17
[22] compiler_3.4.4 htmltools_0.3.6 memoise_1.1.0 knitr_1.20
It works for me with:
DT[as.numeric(`counts(a>=0)`) == 2]
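Another possible workaround, offered as a sketch rather than an explanation of the root cause, is to give the column a syntactic name before filtering. The new name counts is made up for illustration; setnames() is data.table's rename-by-reference function.

```r
library(data.table)

DT <- data.table(id = letters[1:3], `counts(a>=0)` = 1:3)

# Rename the awkward column by reference; "counts" is an illustrative name
setnames(DT, old = "counts(a>=0)", new = "counts")

# With a syntactic column name, the equality filter needs no backticks
DT[counts == 2]
```

This changes the column name in DT itself, so it only fits if the original name does not need to be preserved downstream.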
