I am trying to import data from a MATLAB timetable into R. I have successfully loaded the v7.3 .mat file using the raveio package. The .mat file contains two timetables, both with the same time scale. These tables are rather large (6422100 rows in the sample I am using to write the code), so it is hard to share a reproducible example.
In the list raveio::read_mat generates from the .mat file, I can see variables that contain the timetable properties:
[82] "#refs#/cc" "#refs#/d/CustomProps" "#refs#/d/VariableCustomProps"
[85] "#refs#/d/arrayProps/Description" "#refs#/d/arrayProps/TableCustomProperties" "#refs#/d/arrayProps/UserData"
[88] "#refs#/d/data" "#refs#/d/dimNames" "#refs#/d/dimNamesOrig"
[91] "#refs#/d/incompatibilityMsg" "#refs#/d/minCompatibleVersion" "#refs#/d/numDims"
[94] "#refs#/d/numRows" "#refs#/d/numVars" "#refs#/d/rowTimes"
[97] "#refs#/d/useDimNamesOrig" "#refs#/d/useVarNamesOrig" "#refs#/d/varContinuity"
[100] "#refs#/d/varDescriptions" "#refs#/d/varNames" "#refs#/d/varNamesOrig"
[103] "#refs#/d/varUnits" "#refs#/d/versionSavedFrom" "#refs#/db"
"#refs#/d/rowTimes" only has a length of 6, so that is not the key. None of the variables that are the right length have obvious timestamps in them, or datenum value.
Any clues? I'm also looking for better documentation on how timetables are created, which might help. Cheers.
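If some of those numeric variables do turn out to hold MATLAB datenum values (serial day numbers, where 719529 corresponds to 1970-01-01), they can be converted in R as below. The second helper rests on an unconfirmed assumption: that the short `rowTimes` entry describes a regular time vector (a start time, a fixed step and a row count) rather than listing every timestamp, in which case the full vector can be rebuilt. Both function names are hypothetical.

```r
# Hypothetical helpers for interpreting MATLAB time data in R.

# MATLAB datenum is a serial day number; datenum(1970,1,1) == 719529.
datenum_to_posixct <- function(datenum, tz = "UTC") {
  seconds <- (datenum - 719529) * 86400  # days -> seconds since 1970-01-01
  as.POSIXct(seconds, origin = "1970-01-01", tz = tz)
}

# Assumption: a regular timetable can be described by a start time,
# a fixed step in seconds and the number of rows.
regular_row_times <- function(start, step_seconds, n_rows) {
  start + (seq_len(n_rows) - 1) * step_seconds
}

# Example: rebuild 4 row times sampled every 0.5 s
t0 <- as.POSIXct("2020-01-01 00:00:00", tz = "UTC")
rt <- regular_row_times(t0, 0.5, 4)
```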
My code:
library(affy)  # provides list.celfiles() and ReadAffy(); also attaches Biobase for exprs()
setwd("C:/A549_ALI/4_tert-Butanol (22)/")
list.celfiles()
my.affy <- ReadAffy()
dim(exprs(my.affy))
Output:
[1] "(46) 22-B1-1_(miRNA-4_0).CEL"
[2] "(47) 22-B1-2_(miRNA-4_0).CEL"
[3] "(48) 22-B1-3_(miRNA-4_0).CEL"
[4] "(49) 22-R1-1_(miRNA-4_0).CEL"
[5] "(50) 22-NEC 1-1_(miRNA-4_0).CEL"
[6] "(51) 22-B2-1_(miRNA-4_0).CEL"
[7] "(52) 22-B2-2_(miRNA-4_0).CEL"
[8] "(53) 22-B2-3_(miRNA-4_0).CEL"
[9] "(54) 22-R2-1_(miRNA-4_0).CEL"
[10] "(55) 22-NEC 2-1_(miRNA-4_0).CEL"
[11] "(56) 22-B3-1_(miRNA-4_0).CEL"
[12] "(57) 22-B3-2_(miRNA-4_0).CEL"
[13] "(58) 22-B3-3_(miRNA-4_0).CEL"
[14] "(59) 22-R3-1_(miRNA-4_0).CEL"
[15] "(60) 22-NEC 3-1_(miRNA-4_0).CEL"
[1] 292681 15
Up to here everything works, but then I get this error message:
background correction: mas
PM/MM correction : mas
expression values: mas
background correcting...'getOption("repos")' replaces Bioconductor standard repositories, see '?repositories' for details
replacement repositories:
CRAN: https://cran.rstudio.com/
Error in getCdfInfo(object) :
Could not obtain CDF environment, problems encountered:
Specified environment does not contain miRNA-4_0
Library - package mirna40cdf not installed
Bioconductor - mirna40cdf not available
I have already tried to install this package, but I can't find it on the Bioconductor website.
Now I do not know how to proceed. Is there any other way to use the mas5calls function?
I use R 4.2.2.
Thanks for all answers.
This simply means that the package containing the CDF environment, built from the chip description file (CDF) for this type of microarray, is not distributed through Bioconductor. It seems that Affymetrix no longer provides these, but you can find the CDFs on GEO (click on the platform and then look under "Supplementary file"). Alternatively, ask the person you got the data from whether they can provide the relevant CDFs. Use cdfName() to check which ones you need.
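affy derives the name of the package it looks for from the CDF name reported by cdfName(): it lowercases the name, drops "_", "-" and spaces, and appends "cdf" (affy's own cleancdfname() does this, plus a lookup table of special cases). The base-R helper below is a simplified, hypothetical imitation, useful for checking which package name a CDF must be built under:

```r
# Simplified imitation of affy::cleancdfname(); not the real function
# (the real one also consults a table of special-case names).
expected_cdf_package <- function(cdf_name) {
  cleaned <- tolower(gsub("[_ -]", "", cdf_name))  # drop separators, lowercase
  paste0(cleaned, "cdf")
}

# With the AffyBatch from the question you would call:
# expected_cdf_package(cdfName(my.affy))
expected_cdf_package("miRNA-4_0")  # "mirna40cdf"
```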
Once you have obtained the CDF, you can build the R package that affy needs (mirna40cdf in your case) using the makecdfenv package, which you can install from Bioconductor. You could also try another package called oligo and see whether it supports your data.
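A sketch of the build step, assuming the CDF file has already been downloaded into a local folder. build_cdf_package() is a hypothetical wrapper, and the file name and folder in the example call are placeholders. makecdfenv::make.cdf.package() writes a source package directory, which install.packages() can then install with repos = NULL; any further arguments (see ?make.cdf.package, e.g. species) are passed through `...`:

```r
# Hypothetical wrapper: build and install the mirna40cdf package from a CDF file.
build_cdf_package <- function(cdf_file, cdf_dir, out_dir = tempdir(), ...) {
  # Writes the source package to file.path(out_dir, "mirna40cdf")
  makecdfenv::make.cdf.package(
    filename     = cdf_file,
    packagename  = "mirna40cdf",
    cdf.path     = cdf_dir,
    package.path = out_dir,
    ...
  )
  # Install from the local source directory instead of a repository
  install.packages(file.path(out_dir, "mirna40cdf"), repos = NULL, type = "source")
}

# Example call (placeholder file name and path):
# build_cdf_package("miRNA-4_0.cdf", "C:/path/to/cdf")
```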
When running read_xlsx() in my normal .R script, I'm able to read in the data. But when running the .R script with source() in R Markdown, it suddenly takes a very long time (more than 20 minutes; I always terminate it before it finishes), and I keep getting warning messages like these, where it evaluates every single column and expects it to be a logical:
Warning: Expecting logical in DE5073 / R5073C109: got 'HOSPITAL/CLINIC'
Warning: Expecting logical in DG5073 / R5073C111: got 'YES'
Warning: Expecting logical in CQ5074 / R5074C95: got '0'
Warning: Expecting logical in CR5074 / R5074C96: got 'MARKET/GROCERY STORE'
Warning: Expecting logical in CT5074 / R5074C98: got 'NO'
Warning: Expecting logical in CU5074 / R5074C99: got 'YES'
Warning: Expecting logical in CV5074 / R5074C100: got 'Less than one week'
Warning: Expecting logical in CW5074 / R5074C101: got 'NEXT'
Warning: Expecting logical in CX5074 / R5074C102: got '0'
.. etc
I can't share the data here, but it is just a normal xlsx file (30k obs, 110 vars). The data has responses in all capitals like YES and NO. The raw data has filters applied, some additional sheets, and some mild formatting in Excel (no borders, white fill) but I don't think these are affecting it.
My workflow is set up like this:
Dataprep.R:
setwd()
pacman::p_load() # all my packages
df <- read_xlsx("./data/Data.xlsx") %>% type_convert()
## blabla more cleaning stuff
Report.Rmd:
setwd()
pacman::p_load() # all my packages again
source("Dataprep.R")
When I run Dataprep.R, everything works in < 1 min. But when I try to source("Dataprep.R") from Report.Rmd, then it starts being slow at read_xlsx() and giving me those warnings.
I've also tried moving df <- read_xlsx() from Dataprep.R into Report.Rmd, and it is still as slow as with source(). I've also removed type_convert() and tried other things, like removing the extra sheets from the Excel file. source() was also in the setup chunk of Report.Rmd, but I took it out and the result was the same.
So I think it has something to do with R Markdown and readxl/read_xlsx(). The exact same code and data are evaluated so differently in R vs Rmd, and it's very puzzling.
Would appreciate any insight on this. Is there a fix? Or is this something I will just have to live with (i.e. convert to csv)?
> sessionInfo()
R version 4.2.0 (2022-04-22 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 22000)
Matrix products: default
locale:
[1] LC_COLLATE=English_United Kingdom.utf8 LC_CTYPE=English_United Kingdom.utf8 LC_MONETARY=English_United Kingdom.utf8
[4] LC_NUMERIC=C LC_TIME=English_United Kingdom.utf8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
loaded via a namespace (and not attached):
[1] digest_0.6.29 R6_2.5.1 lifecycle_1.0.1 pacman_0.5.1 evaluate_0.15 scales_1.2.0 rlang_1.0.2 cli_3.3.0 rstudioapi_0.13
[10] rmarkdown_2.14 tools_4.2.0 munsell_0.5.0 xfun_0.30 yaml_2.3.5 fastmap_1.1.0 compiler_4.2.0 colorspace_2.0-3 htmltools_0.5.2
[19] knitr_1.39
UPDATE:
So in Markdown, I can use the more generic read_excel() and that works in my setup chunk. But I still get the same Warning messages if I try to source() it, even if the R script sourced is also using read_excel() instead of read_xlsx(). Very puzzling all around.
When you run that code in a .R script (and probably other kinds of code that generate warnings), you get a summary of the warnings, something like "There were 50 or more warnings (use warnings() to see the first 50)".
If you run the same code in a standard R Markdown code chunk, however, every one of those 50+ warnings is printed. That could mean you are printing thousands, millions, or more warnings.
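To see how many warnings a call produces without printing each one, you can count and muffle them with base R's withCallingHandlers(). count_warnings() is a hypothetical helper, not part of readxl or knitr:

```r
# Hypothetical helper: evaluate an expression, count its warnings and
# suppress their printing.
count_warnings <- function(expr) {
  n <- 0L
  value <- withCallingHandlers(
    expr,  # the promise is forced here, so the handler sees its warnings
    warning = function(w) {
      n <<- n + 1L
      invokeRestart("muffleWarning")
    }
  )
  list(value = value, n_warnings = n)
}

res <- count_warnings({
  for (i in 1:3) warning("demo warning ", i)
  "done"
})
res$n_warnings  # 3
```

In the question's setting this would be used as, for example, count_warnings(readxl::read_xlsx("./data/Data.xlsx")).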
If your question is WHY this happens in R Markdown and not in R, I'm not sure.
But if your question is how to solve it, it's simple: add the options message=FALSE and warning=FALSE to your code chunk.
It should look something like this:
```{r chunk_name, message=FALSE, warning=FALSE}
setwd()
pacman::p_load() # all my packages again
source("Dataprep.R")
```
Now, about setwd(): I would advise against using anything that changes the state of your system (avoid "side effect" functions). They can create problems if you are not very careful. But that is a topic for another day.
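As for the warnings themselves: readxl guesses each column's type from the first guess_max rows (1000 by default), so a column that is empty near the top of the sheet is guessed as logical, and every later text cell in it raises one "Expecting logical" warning and is read as NA. That would fit warnings that only start around row 5073. One way around it, sketched below, is to read everything as text and let readr re-guess the types on the loaded data; raising guess_max is another option. read_survey_xlsx() is a hypothetical name, and the path is the one from the question:

```r
# Hypothetical helper: sidestep readxl's type guessing, then convert types.
read_survey_xlsx <- function(path) {
  # Read every column as character, so nothing is coerced to logical/NA
  raw <- readxl::read_xlsx(path, col_types = "text")
  # Re-guess column types from the character data already in memory
  readr::type_convert(raw)
}

# Alternative: keep readxl's guessing but let it look at more rows, e.g.
# readxl::read_xlsx(path, guess_max = 50000)

# df <- read_survey_xlsx("./data/Data.xlsx")
```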
I wanted to filter a data set based on some conditions. When I looked at the help for the filter function, the result was:
filter {stats} R Documentation
Linear Filtering on a Time Series
Description
Applies linear filtering to a univariate time series or to each series separately of a multivariate time series.
After searching the web, I found the filter function I needed, i.e. the one from the dplyr package. How can R have two functions with the same name? What am I missing here?
At the moment, the R interpreter would dispatch a call to filter to the dplyr environment, at least if the class of the object were among the available methods:
methods(filter)
[1] filter.data.frame* filter.default* filter.sf* filter.tbl_cube* filter.tbl_df* filter.tbl_lazy*
[7] filter.ts*
As you can see, there is a ts method, so if the object were of that class, the interpreter would dispatch to it. However, the authors of dplyr have made that method stop with an error that points you to stats::filter() instead of filtering. You would need to use:
getFromNamespace('filter', 'stats')
function (x, filter, method = c("convolution", "recursive"),
sides = 2L, circular = FALSE, init = NULL)
{ <omitting rest of function body> }
# same result also obtained with:
stats::filter
R functions are contained in namespaces, so the full designation of a function is namespace_name::function_name. Attached packages (actually "environments" in R terminology) are arranged along a search path, whose order depends on the order in which packages and their dependencies were loaded; when two of them export the same name, the one earlier on the search path masks the later one. The :: infix operator lets you name the namespace explicitly, so it reaches a function even when a package earlier on the search path masks it. The search() function displays the names of the currently attached packages and environments; see ?search. Here's mine at the moment (a rather bloated one, because I answer a lot of questions and don't usually start with a clean system):
> search()
[1] ".GlobalEnv" "package:kernlab" "package:mice" "package:plotrix"
[5] "package:survey" "package:Matrix" "package:grid" "package:DHARMa"
[9] "package:eha" "train" "package:SPARQL" "package:RCurl"
[13] "package:XML" "package:rnaturalearthdata" "package:rnaturalearth" "package:sf"
[17] "package:plotly" "package:rms" "package:SparseM" "package:Hmisc"
[21] "package:Formula" "package:survival" "package:lattice" "package:remotes"
[25] "package:forcats" "package:stringr" "package:dplyr" "package:purrr"
[29] "package:readr" "package:tidyr" "package:tibble" "package:ggplot2"
[33] "package:tidyverse" "tools:rstudio" "package:stats" "package:graphics"
[37] "package:grDevices" "package:utils" "package:datasets" "package:methods"
[41] "Autoloads"
At the moment I can find 3 versions of filter using the help system:
?filter
# brings this up in the help panel
Help on topic 'filter' was found in the following packages:
Return rows with matching conditions
(in package dplyr in library /home/david/R/x86_64-pc-linux-gnu-library/3.5.1)
Linear Filtering on a Time Series
(in package stats in library /usr/lib/R/library)
Objects exported from other packages
(in package plotly in library /home/david/R/x86_64-pc-linux-gnu-library/3.5.1)
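A quick way to see the masking in practice, using only base R: find() lists every attached package that provides a name, in search-path order, and the :: form always reaches the stats version whatever else is attached. The moving-average call is just an illustrative use of stats::filter:

```r
# Which attached packages provide something called "filter"?
# (Only "package:stats" in a fresh session; "package:dplyr" comes first once attached.)
find("filter")

# The namespace that stats::filter lives in
environmentName(environment(stats::filter))  # "stats"

# Unambiguous call: a 3-point centred moving average of a time series
x  <- ts(c(1, 2, 3, 4, 5))
ma <- stats::filter(x, rep(1 / 3, 3), sides = 2)
as.numeric(ma)
```

The first and last values of ma are NA because the centred three-point window does not fit at the ends of the series.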
I'm learning mongolite/mongoDB right now, and came across this:
https://cran.r-project.org/web/packages/mongolite/vignettes/intro.html
Inside I saw code like this:
tbl <- m$mapreduce(
map = "function(){emit({cut:this.cut, color:this.color}, 1)}",
reduce = "function(id, counts){return Array.sum(counts)}"
)
Can someone tell me what language these functions are written in? I don't think they are R functions.
The R language allows you to create environments that hold functions, which are then referenced with the $ operator, just as you would pull items from a list. So m$mapreduce calls an R function that sends that text to the database engine: http://docs.mongodb.org/manual/reference/command/mapReduce/
If you install the package and run help(package = "mongolite"), you will see that the package has a single exported function, mongo(), which gives access to all of those function calls. You can then work through the examples on the help page and in the vignette.
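A minimal base-R illustration of that pattern: data and functions stored in an environment and called with $, much like m$mapreduce(). This is a toy object, not mongolite's actual implementation:

```r
# Toy object: an environment holding data and a function that updates it.
make_counter <- function() {
  self <- new.env()
  self$count <- 0
  # The function is stored in the environment and reached with `$`
  self$add <- function(n = 1) {
    self$count <- self$count + n  # environments are modified in place
    invisible(self)
  }
  self
}

counter <- make_counter()
counter$add()
counter$add(2)
counter$count  # 3
```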
(Note: you will get an error if you do not first install and set up the database executable.)
If you run this with mongolite loaded, you get a list of the objects in the environment in which the mongo function was defined:
ls(envir=environment(mongo))
There is a set of objects in that environment that appear to hold what you might be interested in:
[14] "mongo_collection_aggregate"
[15] "mongo_collection_command"
[16] "mongo_collection_command_simple"
[17] "mongo_collection_count"
[18] "mongo_collection_create_index"
[19] "mongo_collection_distinct"
[20] "mongo_collection_drop"
[21] "mongo_collection_drop_index"
[22] "mongo_collection_find"
[23] "mongo_collection_find_indexes"
[24] "mongo_collection_insert_bson"
[25] "mongo_collection_insert_page"
[26] "mongo_collection_mapreduce"
[27] "mongo_collection_name"
[28] "mongo_collection_new"
[29] "mongo_collection_remove"
[30] "mongo_collection_rename"
[31] "mongo_collection_stats"
[32] "mongo_collection_update"
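To make the two strings in the question concrete: they are JavaScript functions, where map emits a {cut, color} key with the value 1 for each document and reduce sums those values per key, so the result is a count per cut/color combination. The same grouping in plain R, on a small made-up data frame (this is only an equivalent computation, not what mongolite runs):

```r
# Made-up stand-in for a few documents of the vignette's diamonds data
docs <- data.frame(
  cut   = c("Ideal", "Ideal", "Good", "Ideal"),
  color = c("E", "E", "J", "H")
)

# map:    emit({cut, color}, 1) for every document
docs$one <- 1
# reduce: sum the emitted 1s for each {cut, color} key
counts <- aggregate(one ~ cut + color, data = docs, FUN = sum)
counts
```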
The mapreduce functions in the mongolite package are written in JavaScript. See the package documentation on CRAN (PDF, page 3) for confirmation:
mapreduce(map, reduce, query = ’{}’, sort = ’{}’, limit = 0, out = NULL, scope = NULL)
"Performs a map reduce query. The map and reduce arguments are strings containing a JavaScript function. Set out to a string to store results in a collection instead of returning."
I am trying to open a large (347 MB) .R file in RStudio with R 3.1.2.
After mucking around with various input functions, I eventually got R to return this:
file("/Users/vincentlaufer/Desktop/all.t.subsets.R")
description class mode
"/Users/vincentlaufer/Desktop/all.t.subsets.R" "gzfile" "rt"
text opened can read
"text" "closed" "yes"
can write
"yes"
So, I tried working on opening it in the following way:
splat <- scan(gzfile("/Users/vincentlaufer/Desktop/all.t.subsets.R"), what="Factor")
(I chose Factor because RStudio told me that "RDX2" is a Factor. I don't know what RDX2 is because I cannot open the file; probably a gene name.)
Displaying the variable I created, splat, returns the following:
> splat
[1] "RDX2"
[2] "X"
[3] ""
[4] "\002"
[5] ""
[6] "\025beta.combat.x.th1th17"
[7] "\033q?\xd9\xce\a_o\xd2"
[8] "?\xe0\x85\x87\x93\u0757\xf6?\xe1&\027\xc1\xbd\xa5\022?\xec\037!-w1\x90?\xed|\xed\x91hr\xb0?\xe0-\xe0"
[9] "\033qv?\xc9\xf2\022\xd7s\030\xfc?\xe8\xa3\xd7"
[10] "=p\xa4?\xe1\xdc\xc6?\024\022\006?\xee>BZ\xeec"
[11] "?\xee\027\xc1\xbd\xa5\021\x9d?\xe0\xb0\xf2{\xb2\xfe\xc8?\xe7"
I am a bit out of my depth here. Can anyone tell me how to open this .R file?
The file is a binary file saved by R, despite its .R extension. You can tell from the RDX2 entry, which identifies the file format; see http://biostat.mc.vanderbilt.edu/wiki/Main/RBinaryFormat
You should try loading it with load("/Users/vincentlaufer/Desktop/all.t.subsets.R").
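A sketch for inspecting such a file: load() restores the saved objects into an environment and returns their names, and the first bytes of the (gzip-decompressed) file show the format marker, such as RDX2. Both helper names are hypothetical, and the demonstration writes its own temporary file rather than touching the real one:

```r
# Hypothetical helper: load an R save() file into its own environment
load_saved_objects <- function(path) {
  env <- new.env()
  object_names <- load(path, envir = env)  # load() returns the object names
  mget(object_names, envir = env)          # named list of the restored objects
}

# Hypothetical helper: read the format marker at the start of the file
rdata_magic <- function(path) {
  con <- gzfile(path, "rb")  # also reads uncompressed files transparently
  on.exit(close(con))
  readChar(con, 4, useBytes = TRUE)
}

# Demonstration with a temporary file saved in format version 2
tmp <- tempfile(fileext = ".R")
coefs <- c(0.1, 0.2)
save(coefs, file = tmp, version = 2)
rdata_magic(tmp)                # "RDX2"
names(load_saved_objects(tmp))  # "coefs"
```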