After trying to read a biom file:
rich_dense_biom <-
system.file("extdata", "D:\sample_otutable.biom", package = "phyloseq")
myData <-
import_biom(rich_dense_biom, treefilename, refseqfilename, parseFunction =
parse_taxonomy_greengenes)
I get the following error:
Error in read_biom(biom_file = BIOMfilename) :
Both attempts to read input file:
either as JSON (BIOM-v1) or HDF5 (BIOM-v2).
Check file path, file name, file itself, then try again.
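Are you sure D:\sample_otutable.biom really exists? And is it really a file inside the phyloseq package? system.file() only finds files that ship with an installed package and returns "" for anything else.
In R string literals a single backslash starts an escape sequence, so on Windows you have to write file paths either with doubled backslashes (\\) or with forward slashes (/).
This works for me: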
library("devtools")
install_github("joey711/biom")
library(biom)
library(phyloseq)  # import_biom() is provided by phyloseq, not by biom
biom.file <-
"C:\\Users\\Mark Miller\\Documents\\R\\win-library\\3.3\\biom\\extdata\\min_dense_otu_table.biom"
my.data <- import_biom(BIOMfilename = biom.file)
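A short sketch of the two path pitfalls behind the question's error, assuming the file really sits at D:/sample_otutable.biom (the question's path). The phyloseq call is left commented out because it needs that package installed; everything else is base R. The guess that system.file() handing "" to read_biom() is what produces the "Both attempts to read input file" message is an assumption, not something verified here.

```r
# A single backslash starts an escape sequence in an R string, so
# "D:\sample_otutable.biom" does not even parse ("\s" is not a valid escape).
# These two spellings refer to the same Windows file:
biom_path <- "D:/sample_otutable.biom"
biom_path_escaped <- "D:\\sample_otutable.biom"

# file.path() joins components with "/", which R on Windows also accepts
biom_path2 <- file.path("D:", "sample_otutable.biom")

# system.file() only looks inside an installed package and returns ""
# when the file is not there, so it cannot locate an arbitrary file on D:
pkg_path <- system.file("extdata", "sample_otutable.biom", package = "phyloseq")
if (!nzchar(pkg_path)) {
  message("Not shipped with phyloseq; pass the plain path to import_biom() instead")
}

# With phyloseq installed and the file present:
# my_data <- phyloseq::import_biom(biom_path,
#   parseFunction = phyloseq::parse_taxonomy_greengenes)
```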
I am attempting to load a .Rdata file without having to download the actual file to my computer.
Here is what I have so far:
primate_URL <- "https://ftp.ncbi.nlm.nih.gov/geo/series/GSE118nnn/GSE118546/suppl/GSE118546_macaque_fovea_all_10X_Jan2018.Rdata.gz"
con <- gzcon(url(primate_URL))
load(con)
When I run the script, it returns this error:
Error in load(con) :
the input does not start with a magic number compatible with loading from a connection
Any tips on what I might be doing wrong?
Try the following. I tried it with a small file of the same type from the same source.
primate_URL <- "https://ftp.ncbi.nlm.nih.gov/geo/series/GSE118nnn/GSE118992/suppl/GSE118992_supplementaryData.Rdata.gz"
# mode = "wb" keeps the binary download intact on Windows
download.file(primate_URL, "GSE118992_supplementaryData.Rdata.gz", mode = "wb")
file.x <- gzfile("GSE118992_supplementaryData.Rdata.gz", open = "r")
file.x.data <- readLines(file.x, encoding="UTF-8")
close(file.x)
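One plausible cause of the magic-number error (an assumption, not verified against these GEO files) is that the .gz wraps an .Rdata file that save() had already compressed, so after gzcon() strips the outer layer, load() still sees compressed bytes instead of the RData header. The hypothetical helper load_rdata_gz() below sidesteps that by writing the outer-decompressed bytes to a temporary file and calling load() on the path; load() on a file path detects gzip, bzip2 or xz compression itself. It does touch the disk, but only inside tempdir().

```r
# Hypothetical helper: remove one gzip layer from a local .Rdata.gz file,
# then let load() handle whatever compression save() used inside.
load_rdata_gz <- function(path, envir = parent.frame()) {
  inner <- tempfile(fileext = ".Rdata")
  on.exit(unlink(inner), add = TRUE)

  src <- gzfile(path, open = "rb")   # decompresses the outer gzip layer
  dst <- file(inner, open = "wb")
  # Copy the decompressed bytes across in 1 MB chunks
  repeat {
    chunk <- readBin(src, what = "raw", n = 1048576L)
    if (length(chunk) == 0L) break
    writeBin(chunk, dst)
  }
  close(src)
  close(dst)

  # Returns the names of the loaded objects, as load() does
  load(inner, envir = envir)
}

# Usage with the question's URL (large download):
# tmp <- tempfile(fileext = ".Rdata.gz")
# download.file(primate_URL, tmp, mode = "wb")
# load_rdata_gz(tmp)
```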
Given a .tar.gz file on my hard disk, I would like to create that exact file with R code alone (e.g. with the help of serialization). The goal is not to refer to the file itself, but to generate a plain-text variable containing the content of the file and then write the file to the file system. I thought about the following:
Take the base64 string of the file (base64 serialization).
Write it to the file system as a binary file.
But the following code generates an empty file:
zzfil <- tempfile("testfile")
zz <- file(zzfil, "wb")
file_content <- "H4sIAAAAAAAAA+1YbW/bNhD2Z/6KW/zBNpLIerHjQmvapo6HBWgyw3ZXDE1X0BJtEZFIgaTguEb/+06S7drJumJA5m6DHsAQxOM9PPF4uscyTJuUBnd0zmzbaV8Oxv3R1XBy9ctN7clgI846nfzq9Lr27rVAr2fXHM+zvV6303N6NdvxHDSDXTsAMm2owlDE/K/nfcv+H8WwzL0PZu8gkMkyxcG1lUy4ifH2XUQNmIhtxuFSMg3Nwgp9qlmL/MqU5lL4YFuOZZOLzERS5Z4SFkoaBtyQa8qFwR9DwwTZ1stCsh2H50uZKc3i2SstE7aImGKWYOYFuWQ6UDw1xRrXUjGgU5kZWOShcQNhEVFCl1Pky80mogKkYBAjcYsA4q1mMEN+0LgyTkd6AVyETBgu5hiOonNF0wgt3ERcFI+8s7BF3vCACb3Zkbi8A67zCDIkUi/JQAQyRDof3k5+On2GgadMhNqHETRfnINndSyvRa6SVCqDo/N5GkvjFjbXci2ndQKGT6e4sfmQg9PdFnlDPy0vqaGYLpUxcsNYqPsySXlMyx0RkqxzE/rg2s6zU7t76jngOL7T870uBtP/EcScbPK/n/b2zcX1YDy86A+e8ox9q/6x8Mv6d92eZztY/+6Z163q/xBg9/kBHFJjmBLNo9/fv/dpnEbU//Dh+KhFahX+33hQ/6P2P7BG0eO73a/Xv20/qH/nDGUAdKv6P3z+IxbH0hod8P2P2T5b59/zOo6d6z+706ve/4dAHX7OE34CC6ni8AdSJ3XUZLmS0YDCid3TJEUNMstEkCsMEDRhITSKU9LAuYuIBxGkCpWbhsYeV8Mq2H6TGQRIFTOqRKnJSsm2kX200Ii58srlFozGJgu5BGr8wh8gMib12211mt7NtRXR0AqkJT61C/MY9SFkms2yGO7YciqpCkEjoQkyDGkm1eOFNsSvMx6H+JghjFgsaQhbNQyNvlExHMM44jOD19eNwqMfseBuZ9oOHnoMSo8JFtifOzzymDQIqTfgVdmTSbHH8Px0u/nNFqxQwBab3Tza22ts1Z8fO3/Mq3ufISeI+VRRtWyuNWfrC+eV3gpRLrAw4piFL4+KCTlNaWuGqEDPExNQpU+AMt28P0/SeaucdgxzJpOPeISMRBWdYNBQh5DNaBYbmCItRlr13X/p+z+h4ukVwN/v/67Tc6r+/53yv1YA4cH+/7mP8u91u17V/w+B27yhr4qUfya3NOZUb+9M/l1nte4z74o+g6OZxrOyKhtMM287t+GXTyMrMvyKFMB5azGhd52rN3CFChUqfB/8AQr6tbUAGgAA"
writeBin(RCurl::base64Decode(file_content), zz)
close(zz)
file.rename(from = zzfil, to = paste0(zzfil,".tar.gz"))
How should I serialize the file instead? I.e. how should I fill the functions file_to_string and string_to_file?
file_to_string <- function(input_file){
# Return a serialized string of input_file
}
string_to_file <- function(input_string){
# Return content to write to a file
}
original_file <- "original.tar.gz"
zzfil <- tempfile("copy")
zz <- file(zzfil, "wb")
file_content <- file_to_string(original_file)
writeBin(string_to_file(file_content), zz)
close(zz)
file.rename(from = zzfil, to = paste0(zzfil,".tar.gz"))
For me, using R 3.4.4 on platform x86_64-pc-linux-gnu with RCurl version 1.95-4.10, the example code produces a non-empty file that can be read back in using readBin, so I can't reproduce your empty-file issue.
But that's not the main issue here.
Using writeBin does not achieve what you want to do: its use case is to store an R object (a vector) in a binary format on the file system and read it back in with readBin, not to read in a binary file, manipulate it and save the new version, or to generate a binary file that is meant to be understood by anything other than readBin.
In my humble opinion: R is probably not the right tool to do binary patches.
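For comparison, a minimal base-R sketch that fills in the question's file_to_string and string_to_file stubs. It swaps base64 for a plain hexadecimal encoding so that no extra package is needed; the resulting string is about twice the file size (base64 would be about 4/3). With these definitions the question's own script from original_file onwards should write a byte-identical copy, since writeBin() writes a raw vector's bytes unchanged. If you stay with RCurl instead, note that base64Decode() returns a character string by default; passing mode = "raw" returns the raw vector that writeBin() needs.

```r
# Encode a file's bytes as one hexadecimal string, e.g. "1f8b0800..."
file_to_string <- function(input_file) {
  bytes <- readBin(input_file, what = "raw", n = file.size(input_file))
  # as.character() on a raw vector gives two lowercase hex digits per byte
  paste(as.character(bytes), collapse = "")
}

# Turn such a string back into a raw vector that writeBin() can write out
string_to_file <- function(input_string) {
  n <- nchar(input_string)
  if (n == 0L) return(raw(0))
  stopifnot(n %% 2L == 0L)  # every byte is exactly two hex digits
  starts <- seq(1L, n, by = 2L)
  as.raw(strtoi(substring(input_string, starts, starts + 1L), base = 16L))
}
```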
I am trying and failing to write a process that will download a .zip archive, extract a particular Excel file from that archive, and load that Excel file into my R workspace without ever writing any of those files (the .zip or the .xls) to my hard drive.
I have written a version of this process that works for zipped .csvs, but it doesn't work for .xls. Here's how that version goes, using one of the URLs I'm targeting in my current project and using readWorksheetFromFile() instead of read.csv() at the appropriate moment:
library(XLConnect)
waed.old.link <- "http://eventdata.parusanalytics.com/data.dir/pitf.world.19950101-20121231.xls.zip"
waed.old.file <- "pitf.world.19950101-20121231.xls"
tmp <- tempfile()
download.file(waed.old.link, tmp)
tmp2 <- tempfile()
tmp2 <- unz(tmp, waed.old.file)
WAED.old <- readWorksheetFromFile(tmp2, sheet = 1, startRow = 3, startCol = 1, endCol = 73)
unlink(tmp)
unlink(tmp2)
And here's what pops up after line 8, the one that tries to ingest the spreadsheet as WAED.old:
Error in path.expand(filename) : invalid 'path' argument
I also tried read_excel() at that step and got the same result:
> WAED.old <- read_excel(tmp2, skip = 2)
Error in file.exists(path) : invalid 'file' argument
I gather that this has something to do with pointing readWorksheetFromFile() at a connection rather than a file, but I'm not sure that's right, and I don't know how to fix it if it is. I searched Stack Overflow and the web for an answer but couldn't find one that was right on point. I'd really appreciate some help.
As you say, it is because unz returns a connection object for the file within the zip (but does not explicitly unzip that file), while readWorksheetFromFile expects a path to a file.
Use unzip to explicitly unzip the file.
tmp2 <- unzip(zipfile=tmp, files = waed.old.file, exdir=tempdir())
# readWorksheetFromFile(tmp2, ...)
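The fix above written out end to end, as a sketch. extract_zip_member() is a hypothetical helper name, and readxl::read_excel(skip = 2) stands in for readWorksheetFromFile(startRow = 3). As far as I know both readers expect a file path rather than a connection, so extracting into tempdir() is the practical compromise when the disk cannot be avoided entirely.

```r
# Hypothetical helper (not part of XLConnect or readxl): extract one member
# of a local .zip archive into a fresh temporary directory and return its path.
extract_zip_member <- function(zip_path, member, exdir = tempfile("unzipped")) {
  # unzip() creates exdir if needed and returns the paths it extracted
  out <- utils::unzip(zipfile = zip_path, files = member, exdir = exdir)
  if (length(out) != 1L) {
    stop("could not extract ", member, " from ", zip_path, call. = FALSE)
  }
  out
}

# Usage with the question's archive (needs network access and readxl):
# tmp <- tempfile(fileext = ".zip")
# download.file(waed.old.link, tmp, mode = "wb")  # binary mode for the zip
# xls_path <- extract_zip_member(tmp, waed.old.file)
# WAED.old <- readxl::read_excel(xls_path, skip = 2)
# unlink(c(tmp, dirname(xls_path)), recursive = TRUE)
```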
I'm trying to read an Excel file into R. It's the following file in my working directory:
> list.files()
[1] "Keuren_Op_Afspraak.xlsx"
I installed XLConnect and am doing the following:
library(XLConnect)
demoExcelFile <- system.file("Keuren_Op_Afspraak.xlsx", package = "XLConnect")
wb <- loadWorkbook(demoExcelFile)
But this gives me the error:
Error: FileNotFoundException (Java): File '' could not be found - you may specify to automatically create the file if not existing.
But I don't understand where this is coming from. Any thoughts?
I prefer using the readxl package. It is written in C/C++, so it is faster. It also seems to handle large files better. The command would be:
library(readxl)
wb <- read_excel("Keuren_Op_Afspraak.xlsx")
You can also use the xlsx package.
library(xlsx)
wb <- read.xlsx("Keuren_Op_Afspraak.xlsx", sheetIndex = 1)
Edit (@Verena):
You can also use this function, which is much faster:
wb <- read.xlsx2("Keuren_Op_Afspraak.xlsx", sheetIndex = 1)
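A minimal sketch of the readxl route that fails early with a readable message when the path is wrong, which would have pointed straight at the empty path ('') in the question's error. read_first_sheet() is a hypothetical name; excel_sheets() and read_excel() are readxl functions.

```r
# Hypothetical helper: check the path, list the sheets, read the first one.
read_first_sheet <- function(path) {
  # A relative path such as "Keuren_Op_Afspraak.xlsx" is resolved against getwd()
  if (!file.exists(path)) {
    stop("file not found: ", path, " (working directory: ", getwd(), ")",
         call. = FALSE)
  }
  message("Sheets: ", paste(readxl::excel_sheets(path), collapse = ", "))
  readxl::read_excel(path, sheet = 1)
}

# Usage with the file from the question:
# wb <- read_first_sheet("Keuren_Op_Afspraak.xlsx")
```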
You have to change your code like this:
library(XLConnect)
demoExcelFile <- "Keuren_Op_Afspraak.xlsx"
wb <- loadWorkbook(demoExcelFile)
You probably took the example from here:
http://www.inside-r.org/packages/cran/XLConnect/docs/loadWorkbook
This line
system.file("demoFiles/mtcars.xlsx", package = "XLConnect")
is a way to get sample files that are part of a package. If you download the XLConnect package and look at its folder structure, you will see a folder demoFiles that contains mtcars.xlsx. The argument package = "XLConnect" tells system.file() to look for the file inside that package.
If you type it into the command line it returns the absolute path to the file:
"C:/Users/Expecto/Documents/R/win-library/3.1/XLConnect/demoFiles/mtcars.xlsx"
To use loadWorkbook you simply need to pass the relative or absolute filepath.
I am trying to use the R package xlsx to load a file available at this URL:
http://www.plosgenetics.org/article/fetchSingleRepresentation.action?uri=info:doi/10.1371/journal.pgen.1002236.s019
library(xlsx)
filename="/home/avilella/00x/mobile.element.insertions.1000g.journal.pgen.1002236.s019.xlsx"
system(paste("ls -l",filename))
-rw-rw-r-- 1 avilella avilella 2372143 2011-12-11 16:36 /home/avilella/00x/mobile.element.insertions.1000g.journal.pgen.1002236.s019.xlsx
Once downloaded, I try to load it in R using read.xlsx or read.xlsx2:
file <- system.file("mobile.element.insertions.1000g", filename, package = "xlsx")
res <- read.xlsx2(file, 1) # read first sheet
But I get an error:
Error in .jnew("java/io/FileInputStream", file) :
java.io.FileNotFoundException: (No such file or directory)
Any ideas?
1) xlsx package. Try using file.choose, which lets you navigate to the file interactively and thereby eliminates the possibility of misidentifying it:
library(xlsx)
fn <- file.choose()
DF <- read.xlsx(fn, 1)
2) gdata package. If the above still does not work, you might try read.xls in gdata. It uses a Perl program rather than Java. It can read both xls and xlsx files and can read data right off the net (downloading it into a temporary file and reading it from there in a manner that is transparent to the user):
library(gdata)
URL <- "http://www.plosgenetics.org/article/fetchSingleRepresentation.action?uri=info:doi/10.1371/journal.pgen.1002236.s019"
DF <- read.xls(URL)
?read.xls in gdata has more info.
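A sketch of reading the spreadsheet straight from the URL with readxl instead of gdata, assuming the URL serves an .xlsx file (read_xlsx_from_url() is a hypothetical name). Because the URL has no file extension, the download is saved under a .xlsx name so read_excel() can tell the format from the name, and the first two bytes are checked for the zip signature "PK" (every .xlsx is a zip archive) in case the server returns an HTML page instead.

```r
# Hypothetical helper: download a spreadsheet into a temporary .xlsx file
# and read one sheet with readxl. Assumes the URL serves an .xlsx file.
read_xlsx_from_url <- function(url, sheet = 1, ...) {
  tmp <- tempfile(fileext = ".xlsx")
  on.exit(unlink(tmp), add = TRUE)
  # mode = "wb" keeps the bytes intact on Windows
  download.file(url, tmp, mode = "wb", quiet = TRUE)
  # .xlsx files are zip archives, which start with the bytes "PK"
  sig <- readBin(tmp, what = "raw", n = 2L)
  if (!identical(sig, charToRaw("PK"))) {
    stop("downloaded content does not look like an .xlsx file: ", url,
         call. = FALSE)
  }
  readxl::read_excel(tmp, sheet = sheet, ...)
}

# Usage with the question's URL:
# DF <- read_xlsx_from_url(URL)
```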