Gzip error when reading R data files into julia - r

I'm getting an error from gzip when reading an R data file. I'm trying to use the approach described here: Reading and writing RData files in Julia.
Here's a minimal example. In R, I run the following script:
var1 <- matrix( runif(9), 3, 3 )
save( var1, file='~/temp/file1.rda')
Then in julia:
using DataFrames
x = read_rda("~/temp/file1.rda")
This returns:
ERROR: GZip.GZError(-1,"gzopen failed")
in gzopen at /home/squipbar/.julia/v0.4/GZip/src/GZip.jl:250
in gzopen at /home/squipbar/.julia/v0.4/GZip/src/GZip.jl:265
in read_rda at /home/squipbar/.julia/v0.4/DataFrames/src/RDA.jl:418
I don't think that I'm doing anything dumb. The closest I've found to this error online is in the RDatasets github issues, here: https://github.com/johnmyleswhite/RDatasets.jl/issues/32
So perhaps this is somehow related to RDatasets? Suggestions very welcome.

As you found, tilde expansion is not automatic. You can use expanduser() to expand to the full file name.
julia> expanduser("~/Desktop")
"/Users/mycomputer/Desktop"

Ok, I figured this one out. It's the expansion of "~" in the location. The following works:
using DataFrames
x = read_rda("/home/squipbar/temp/file1.rda")
So I guess I learnt two things here: 1) The error message for read_rda is not that helpful, a File not found message would have saved me a lot of time, and 2) that you can't use ~ in this case (is this a general thing in Julia?)

Related

Base does not exist R

I have been trying to perform methylation data analysis, however I am stuck on the first few steps. I am trying to follow the workflow mentioned here and I am unable to read in my files as it gives me an error saying base does not exist.
library(methylationArrayAnalysis)
library(knitr)
library(limma)
library(minfi)
library(IlluminaHumanMethylation450kanno.ilmn12.hg19)
library(IlluminaHumanMethylation450kmanifest)
library(RColorBrewer)
library(missMethyl)
library(minfiData)
library(Gviz)
library(DMRcate)
library(stringr)
library(IlluminaHumanMethylationEPICanno.ilm10b4.hg19)
library(conumee)
dataDirectory <- "C:/Users/35389/Desktop/Medullos/All_combined/"
list.files(dataDirectory, recursive = TRUE)
target_EPIC <- read.metharray.sheet("C:/Users/35389/Desktop/Medullos/All_combined/", pattern = "sample_sheet_2.csv") #tried first with dataDirectory then put the link to files again by myself and still the same error of base does not exist
Error in read.metharray.sheet("C:/Users/35389/Desktop/Medullos/All_combined/", : 'base' does not exists
I have been trying to get around this error for a while now.
I also tried reading in the methylation files directly using read.metharray.exp(); however, doing it that way greatly reduces the dimensions of my data.
I wonder if any of you has had any experience with this. Any help would be greatly appreciated! :)

Error while reading mlc file from CODEML in ggtree / treeio

I'm trying to read an mlc file (output from CODEML program) using the read.codeml_mlc function from the treeio package, as follows:
library(ggtree)
library(treeio)
tree <- read.codeml_mlc(mlc_file)
However, I'm getting the following error:
Error in strsplit(., split = "[[:space:]]") :
non character argument
Does anyone have an idea of what could be wrong? This is an ordinary mlc output from PAML from a free ratio branch model, I have not modified or altered it. However, the error seems to be related to my file, because I have tried running the example from the tutorial (reading the mlc file provided with the package) and it works fine.
Put another way: where is the strsplit function called when reading the mlc file, and which part of the file is it referring to? Maybe knowing that will help solve my problem.
Thanks in advance for any help that could be provided!!

Running as.Node from data.tree package in R

I'm trying to use the as.Node function from the data.tree library in R to visualize a set of media server log data as a tree. I've subset the original data frame by month and year, so that I can run one month's worth of data at a time. My function code for turning the data into a tree, and then printing it out as a .csv, is as follows:
treetrimmer2 <- function(x, y) {
  urimodel <- as.Node(x)
  uridf <- ToDataFrameTree(urimodel, "level", "count")
  uridf <- filter(uridf, level <= y, count != 0)
  filename <- paste(x$year[1], x$month[1], ".csv", sep="")
  write.csv(uridf, file = filename, fileEncoding = "CP1252")
}
Some months finish without any issue. Other months, however, give me the following error (and traceback):
Error in (function () : unused argument (quote(<environment>))
7 (function ()
{
c(self$parent$path, self$name)
})(quote(<environment>))
6 self$AddChildNode(child)
5 mynode$AddChild(path)
4 FromDataFrameTable(x, pathName, pathDelimiter, colLevels, na.rm)
3 as.Node.data.frame(x)
2 as.Node(x) at media_visualizer.R#63
1 treetrimmer2(uricut$`2015.06`, 5)
Can anyone give me some guidance on what 'unused argument (quote(<environment>))' means? I've tried googling it, and found that in some cases, it means that a function or term has already been defined in another context. But I'm still too novice to understand what that means here.
I'm running RStudio 0.99.896 and R 3.2.4 on Mac OS 10.11.5. I would share my data set, except that it is pretty massive, and I'm not sure which lines are causing the problem...
I can't claim credit for this; Christoph Glur (see the comments on the main post) figured it out. But it might be useful for others to share the cause, and my solution:
The problem is that a few of the log files contain one of the data.tree package's reserved words, in this case, "path". The format of the lines was "/something/something/path/something/something.jpg", so that data.tree read "path" as an independent word. There were other instances of "path" as part of a larger word, e.g., "pathString" or "pathTo", that didn't cause the bug.
Once he'd figured it out, my solution was to run the following command on all of the log files in Terminal:
sed -i '' 's/\/path\//\/spath\//' *.log
I'm still a novice, but as I understand it, what that means is "find and replace, in place, instances of "/path/" with "/spath/" in all of the .log files." I don't actually care about that one word, path vs. spath (which is gibberish), so changing it didn't matter. And now the as.Node() function runs properly on the data set.
Thank you, Christoph!
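For anyone who would rather make the same substitution inside R instead of with sed, here is a minimal sketch in base R. The helper name escape_reserved_segment and its arguments are hypothetical (they are not part of data.tree), and it assumes "path" is the only colliding word, as in the case described above. Unlike the sed command, it renames every segment that is exactly "path", not just the first "/path/" on a line, and it leaves longer words such as "pathTo" alone.

```r
# Hypothetical helper (not part of data.tree): rename path segments that
# exactly match a reserved word, leaving longer words such as "pathTo" alone.
escape_reserved_segment <- function(paths, reserved = "path", replacement = "spath") {
  # Split each string on "/", swap exact matches, and rejoin with "/".
  vapply(strsplit(paths, "/", fixed = TRUE), function(parts) {
    parts[parts == reserved] <- replacement
    paste(parts, collapse = "/")
  }, character(1))
}

uris <- c("/a/b/path/c/d.jpg", "/a/pathTo/e.jpg")
fixed_uris <- escape_reserved_segment(uris)
fixed_uris
```

Applied to the column that holds the path strings before calling as.Node(), this would be expected to have the same effect as editing the log files, without touching the originals.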

source() doesn't work ("node stack overflow")

I have the following few lines of code in my R script called assign1.R:
(u <- c(1, 1, 0, 1, 0)) # a)
u[3] # b)
ones_u <- which(u == 1) # c)
ones_u
source("assign1.R")
Only, the source() function does not work. R shows me the following error message:
Error in match(x, table, nomatch = 0L) : node stack overflow
Error during wrapup: node stack overflow
What is the problem?
I didn't get exactly the same error you did, but I was able to get something pretty similar with a trivial example:
writeLines("source('badsource.R')",con="badsource.R")
source("badsource.R")
## Error in guess(ll) : node stack overflow
As one of the comments above states, the file you're sourcing is trying to source() itself.
This is how you would test for that possibility from within R, without just opening the file in a text editor (which is a much more sensible approach):
grepl("source('badsource.R')",readLines("badsource.R"),fixed=TRUE) ## TRUE
(obviously you should fill in the name of your assignment file here ...)
It feels like you should have noticed this yourself, but I'm answering anyway because the problem is delightfully recursive ...
You are sourcing the file that you are in. That source() line of code should be deleted. You only need source() when you want to run code from another R file. If all the code works in the one file without running bits of code from other files, you likely already have everything you need and there is nothing else to source.

getting the name of a dataframe from loading a .rda file in R

I am trying to load an .rda file in R that contains a saved data frame. I do not remember the name of the data frame, though.
I have tried
a<-load("al.rda")
which then does not let me do anything with a. I get the error
Error:object 'a' not found
I have also tried to use the = sign.
How do I load this .rda file so I can use it?
I restarted R and ran load("al.rda"), and I now get the following error
Error: C stack usage is too close to the limit
Use 'attach' and then 'ls' with a name argument. Something like:
attach("al.rda")
ls("file:al.rda")
The data file is now on your search path in position 2, most likely. Do:
search()
ls(pos=2)
for enlightenment. Typing the name of any object saved in al.rda will now get it, unless you have something in search path position 1, but R will probably warn you with some message about a thing masking another thing if there is.
However I now suspect you've saved nothing in your RData file. Two reasons:
You say you don't get an error message
load says there's nothing loaded
I can duplicate this situation. If you do save(file="foo.RData") then you'll get an empty RData file - what you probably meant to do was save.image(file="foo.RData") which saves all your objects.
How big is this .rda file of yours? If it's under 100 bytes (my empty RData files are 42 bytes long) then I suspect that's what's happened.
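A rough way to see what an empty file looks like, as a sketch in base R: here save() is given an explicitly empty list of object names, which is assumed to be equivalent to the save(file="foo.RData") case above, and a temporary file stands in for the real one. The exact byte count may differ between R versions, so none is claimed here.

```r
# Write an .RData file that names no objects at all (temporary file as a stand-in).
tmp <- tempfile(fileext = ".RData")
save(list = character(0), file = tmp)

# load() returns the names of the objects it restored; here there are none.
loaded <- load(tmp)
length(loaded)

# An empty file like this is only a few dozen bytes in size.
file.size(tmp)
```

If load() on your own file also returns a zero-length character vector and the file is similarly tiny, nothing was saved in it.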
I had to reinstall R...somehow it was corrupt. The simple command which I expected of
load("al.rda")
finally worked.
I had a similar issue, and solved it without reinstalling R. For example, doing
load("al.rda") works fine; however, if you do
a <- load("al.rda") it will not work.
The load function does return the list of variables that it loaded. I suspect you actually get an error when you load "al.rda". What exactly does R output when you load?
Example of how it should work:
d <- data.frame(a=11:13, b=letters[1:3])
save(d, file='foo.rda')
a <- load('foo.rda')
a # prints "d"
Just to be sure, check that the load function you actually call is the original one:
find("load") # should print "package:base"
EDIT Since you now get an error when you load the file, it is probably corrupt in some way. Try this and say what it prints:
file.info("al.rda") # Prints the file size etc...
readBin("al.rda", "raw", 50) # reads first 50 bytes from the file
Without having access to the file, it's hard to investigate more... Maybe you could share the file somehow (http://www.filedropper.com or similar)?
I usually use save to save only a single object, and I then use the following utility method to retrieve that object into a given variable name using load, but into a temporary namespace to avoid overwriting existing objects. Maybe it will be helpful for others as well:
load_first_object <- function(fname){
  e <- new.env(parent = parent.frame())
  load(fname, e)
  return(e[[ls(e)[1]]])
}
The method can of course be extended to also return named objects and lists of objects, but this simple version is for me the most useful.
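One possible shape for that extension, as a minimal sketch: the name load_all_objects is hypothetical, and the function simply returns everything in the file as a named list. It uses an empty parent environment so that nothing outside the file can leak into the result.

```r
# Hypothetical extension: return every object stored in the file as a named list.
load_all_objects <- function(fname) {
  e <- new.env(parent = emptyenv())  # isolated environment to load into
  load(fname, envir = e)
  mget(ls(e), envir = e)             # named list, names in sorted order
}

# Usage with a temporary file holding two objects.
d <- data.frame(a = 11:13, b = letters[1:3])
n <- 42
tmp <- tempfile(fileext = ".rda")
save(d, n, file = tmp)

objs <- load_all_objects(tmp)
names(objs)
```

Because ls() skips names that begin with a dot by default, any such hidden objects in the file would be left out of the list.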
