How do I load a remote .Rdata file in R?

I am attempting to load a .Rdata file without having to download the actual file to my computer.
Here is what I have so far:
primate_URL <- "https://ftp.ncbi.nlm.nih.gov/geo/series/GSE118nnn/GSE118546/suppl/GSE118546_macaque_fovea_all_10X_Jan2018.Rdata.gz"
con <- gzcon(url(primate_URL))
load(con)
When I run the script, it returns this error:
Error in load(con) :
the input does not start with a magic number compatible with loading from a connection
Any tips on what I might be doing wrong?

Try the following. I tested it with a smaller file of the same type from the same source.
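primate_URL <- "https://ftp.ncbi.nlm.nih.gov/geo/series/GSE118nnn/GSE118992/suppl/GSE118992_supplementaryData.Rdata.gz"
# Download to the working directory; mode = "wb" keeps binary files intact on Windows
download.file(primate_URL, "GSE118992_supplementaryData.Rdata.gz", mode = "wb")
# gzfile() decompresses transparently; readLines() returns the contents as text lines
file.x <- gzfile("GSE118992_supplementaryData.Rdata.gz", open = "r")
file.x.data <- readLines(file.x, encoding="UTF-8")
close(file.x)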
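A minimal sketch of a more general approach, assuming the magic-number error comes from a second compression layer: save() gzip-compresses .Rdata files by default, so an .Rdata.gz file may be gzipped twice, and gzcon() only removes the outer layer. I have not verified this for the GSE118546 file. The helper names gunzip_to_tempfile() and load_remote_rdata() are hypothetical. Note that the data still has to be transferred: it goes to a temporary location that is deleted once load() has read it.

```r
# Hypothetical helpers; base R only.

# Strip one layer of compression (if any) into a new temporary file.
# gzfile() also reads uncompressed input unchanged, so this is safe either way.
gunzip_to_tempfile <- function(path) {
  out <- tempfile(fileext = ".Rdata")
  inp <- gzfile(path, open = "rb")
  on.exit(close(inp), add = TRUE)
  outc <- file(out, open = "wb")
  on.exit(close(outc), add = TRUE)
  repeat {
    chunk <- readBin(inp, what = "raw", n = 1048576L)  # 1 MiB at a time
    if (length(chunk) == 0L) break                     # end of stream
    writeBin(chunk, outc)
  }
  out
}

# Download a (possibly double-gzipped) .Rdata file to a temp file, load() it
# into a fresh environment, and delete the temp files afterwards.
load_remote_rdata <- function(url, envir = new.env()) {
  gz <- tempfile(fileext = ".gz")
  on.exit(unlink(gz), add = TRUE)
  download.file(url, destfile = gz, mode = "wb", quiet = TRUE)
  rdata <- gunzip_to_tempfile(gz)
  on.exit(unlink(rdata), add = TRUE)
  # load() on a file path handles the remaining (inner) compression itself
  load(rdata, envir = envir)
  envir
}
```

Usage would be env <- load_remote_rdata(primate_URL) followed by ls(env) to see which objects were loaded. Loading into a separate environment keeps the objects from silently overwriting variables in your workspace.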

Related

How can I download .ods data from the web into R?

I would like to know how I could download an .ods dataset from the web (specifically this site: https://knowledge4policy.ec.europa.eu/territorial/ardeco-online_en?fbclid=IwAR1CPVLzdey8MnMZDLA-9NpvMDAJqMq1WHmm6yu8FtRAk01u9K184wCU7Wc) directly into R. I tried the following read_ods code
a <- read_ods(path = url("https://knowledge4policy.ec.europa.eu/sites/default/files/RNPTD.ods"), sheet = 1)
and got the error
"Error in file.exists(file) : invalid 'file' argument"
Did I make a mistake here or does read_ods load only local files?
This seems to work fine:
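url1 <- "https://knowledge4policy.ec.europa.eu/sites/default/files/RNPTD.ods"
f <- tempfile(fileext = ".ods")                  # temporary local path
download.file(url1, destfile = f, mode = "wb")   # "wb": an .ods file is a binary (zip) archive
x <- readODS::read_ods(f)
unlink(f)                                        # delete the temporary file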
That is, read_ods() expects the path of a local file, so passing it a url() connection fails (the connection object reaches file.exists(), hence "invalid 'file' argument"). Downloading to a temporary file and reading that works.
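The same download-read-delete pattern as a reusable sketch: read_via_tempfile() is a hypothetical helper that works with any reader taking a local path, so it could be called as read_via_tempfile(url1, readODS::read_ods, fileext = ".ods", sheet = 1). The file extension is kept in case the reader looks at it, and on.exit() removes the temporary file even if reading fails.

```r
# Hypothetical helper: download a remote file to a temporary path, read it with
# any reader function that expects a local file, then delete the temp file.
read_via_tempfile <- function(url, reader, fileext = "", ...) {
  f <- tempfile(fileext = fileext)   # keep the extension for the reader
  on.exit(unlink(f), add = TRUE)     # clean up even if reading fails
  download.file(url, destfile = f, mode = "wb", quiet = TRUE)
  reader(f, ...)                     # extra arguments are passed to the reader
}
```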

fread() in R unable to open a file

I am trying to open a file in R as shown below:
library(data.table)
library(purrr)   # map_df(); purrr also re-exports %>%
data0 <- filename_a %>% map_df(~fread(., sep=",", skip=1))
Let us assume that fread fails to read this file for some reason, for example because the file is in use by another program or does not exist. In that case I would like to read filename_b instead.
At this moment, as soon as the above step fails, the code stops executing. How can I read filename_b when filename_a fails to read?
You can try using tryCatch as follows:
library(data.table)
# Try filename_a first; if fread() throws an error, read filename_b instead
data <- tryCatch(fread(filename_a, sep=",", skip=1),
                 error = function(e) fread(filename_b, sep=",", skip=1))
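Since the question maps over several files, here is a sketch that generalizes the tryCatch idea to any number of candidate paths and returns the first one that reads without an error. read_first_available() is a hypothetical helper; the reader is passed in, so you would call it as read_first_available(c(filename_a, filename_b), data.table::fread, sep = ",", skip = 1). read.csv works the same way.

```r
# Hypothetical helper: try each path in turn and return the first result
# that the reader produces without an error.
read_first_available <- function(paths, reader, ...) {
  for (p in paths) {
    result <- tryCatch(
      reader(p, ...),
      error = function(e) {
        message("Could not read ", p, ": ", conditionMessage(e))
        NULL  # mark this path as failed and move on to the next one
      }
    )
    if (!is.null(result)) return(result)
  }
  stop("None of the candidate files could be read: ", paste(paths, collapse = ", "))
}
```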

Can't open .biom file for Phyloseq tree plotting

After trying to read a biom file:
rich_dense_biom <- system.file("extdata", "D:\sample_otutable.biom", package = "phyloseq")
myData <- import_biom(rich_dense_biom, treefilename, refseqfilename,
                      parseFunction = parse_taxonomy_greengenes)
the following error appears:
Error in read_biom(biom_file = BIOMfilename) :
Both attempts to read input file:
either as JSON (BIOM-v1) or HDF5 (BIOM-v2).
Check file path, file name, file itself, then try again.
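Are you sure D:\sample_otutable.biom really exists? Note also that system.file() only looks inside an installed package's directory, so it cannot find a file on your D: drive; pass the path to import_biom() directly instead.
Also, in an R string a single backslash starts an escape sequence, so Windows paths must be written with doubled backslashes (\\) or with forward slashes (/).
This works for me: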
library(devtools)
install_github("joey711/biom")   # current devtools syntax is "user/repo"
library(biom)
library(phyloseq)                # import_biom() comes from phyloseq
biom.file <- "C:\\Users\\Mark Miller\\Documents\\R\\win-library\\3.3\\biom\\extdata\\min_dense_otu_table.biom"
my.data <- import_biom(BIOMfilename = biom.file)
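A small sketch of the two path rules above, assuming the file really is on drive D:. check_biom_path() is a hypothetical helper, and rich_dense_otu_table.biom is, as far as I know, an example file that phyloseq ships in its extdata folder; system.file() simply returns "" if it is not there.

```r
# Hypothetical helper: stop early with a clear message if the BIOM file is
# missing, and return an absolute path that uses forward slashes.
check_biom_path <- function(path) {
  if (!nzchar(path))
    stop("Empty path: system.file() returns \"\" when the file is not inside the package")
  if (!file.exists(path))
    stop("BIOM file not found: ", path)
  normalizePath(path, winslash = "/")
}

# Your own file: forward slashes or doubled backslashes both work on Windows
own_file <- "D:/sample_otutable.biom"   # same as "D:\\sample_otutable.biom"

# A file bundled with phyloseq: system.file() only looks inside the installed
# package and returns "" if the package or the file is not installed
bundled <- system.file("extdata", "rich_dense_otu_table.biom", package = "phyloseq")

# Then, with phyloseq loaded:
# myData <- import_biom(check_biom_path(own_file),
#                       parseFunction = parse_taxonomy_greengenes)
```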

Open a dta file in R

I am trying to open, in R, a Stata .dta file that is compressed with WinRAR. Here is my code:
library(foreign)
setwd("C:/Users/ASUS/Desktop/Data on oil/Oil discovery")
data <- read.dta("oil_discovery")
and I get :
Error in read.dta("oil_discovery") : unable to open file: 'No such file or directory'
I think that my problem is coming from the assignment of my working directory but I don't know how to manage it.
You need to pass the full file name to read.dta, including the file extension. That is, instead of
data <- read.dta("oil_discovery")
you need to write
data <- read.dta("oil_discovery.dta")
If there is an additional problem with the compression, I would imagine that the error message will be different. However, Error in read.dta("oil_discovery") : unable to open file: 'No such file or directory' very explicitly points out that the current error is that the file oil_discovery is not found.
A good way to check whether the name or the path is causing the error is to pick the file interactively by running the following line:
data <- read.dta(choose.files())
This opens a pop-up window where you can select the file manually (choose.files() exists only on Windows; file.choose() works on every platform). If this works, the file name or path in your code was wrong.
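Alternatively, the haven package reads .dta files too, including those written by Stata 13 and later, which foreign::read.dta does not support:
library(haven)
data <- read_dta("oil_discovery.dta")   # replace with the path to your .dta file
View(data)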
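Whether the problem is the name or the archive can also be checked from R. A minimal sketch, assuming the folder from the question: find_dta() is a hypothetical helper that looks for <stem>.dta (ignoring case) and, if it is missing, reports whether a .rar, .zip or .7z archive is there instead. read.dta() cannot read inside an archive, so the .dta file has to be extracted first (base R's unzip() handles .zip only; a .rar needs an external tool such as WinRAR).

```r
# Hypothetical helper: locate "<stem>.dta" in a folder, or explain why not.
find_dta <- function(dir, stem) {
  files <- list.files(dir, full.names = TRUE)
  target <- tolower(paste0(stem, ".dta"))
  hit <- files[tolower(basename(files)) == target]   # case-insensitive match
  if (length(hit) >= 1L) return(hit[1])
  archives <- files[grepl("\\.(rar|zip|7z)$", files, ignore.case = TRUE)]
  if (length(archives) > 0L)
    stop("No ", target, " found, but archive(s) present: ",
         paste(basename(archives), collapse = ", "),
         ". Extract the .dta file first.")
  stop("No ", target, " in ", dir, ". Files there: ",
       paste(basename(files), collapse = ", "))
}
```

Usage would be data <- foreign::read.dta(find_dta("C:/Users/ASUS/Desktop/Data on oil/Oil discovery", "oil_discovery")).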

Reading in Excel (downloaded with automated script) produces error when not manually opened and saved first

I run an automated script to download 3 .xls files from 3 websites every hour. When I later try to read in the .xls files in R to further work with them, R produces the following error message:
"Error: IOException (Java): block[ 2 ] already removed - does your POIFS have circular or duplicate block references?"
When I manually open and save the .xls files, the problem disappears and everything works normally, but since the number of files grows by 72 every day, this is not a practical workaround.
The script I use to download and save the files:
library(httr)
setwd("WORKDIRECTION")
orig_wd <- getwd()
FOLDERS <- c("NAME1","NAME2","NAME3") #representing folder names
LINKS <- c("WEBSITE_1", #the urls from which I download
"WEBSITE_2",
"WEBSITE_3")
NO <- length(FOLDERS)
for(i in 1:NO){
today <- as.character(Sys.Date())
if (!file.exists(paste(FOLDERS[i],today,sep="/"))){
dir.create(paste(FOLDERS[i],today,sep="/"))
}
setwd(paste(orig_wd,FOLDERS[i],today,sep="/"))
dat<-GET(LINKS[i])
bin <- content(dat,"raw")
now <- as.character(format(Sys.time(),"%X"))
now <- gsub(":",".",now)
writeBin(bin,paste(now,".xls",sep=""))
setwd(orig_wd)
}
I then read in the files with the following script:
require(gdata)
require(XLConnect)
require(xlsReadWrite)
wb = loadWorkbook("FILEPATH")
df = readWorksheet(wb, "Favourite List" , header = FALSE)
Does anybody have experience with this type of error and know a solution or workaround?
The problem is partly resolved by using the readxl package, available on CRAN. After installing it, the files can be read with:
library(readxl)
read_excel("PathToFile")
The only remaining problem is that the last column is omitted when reading. If I find a solution for this, I'll update the answer.
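Separately from the reading problem, the hourly download loop in the question can be written without setwd(). A sketch, assuming base R's download.file() with mode = "wb" (swapped in for httr::GET() plus writeBin()) so the .xls bytes are written unchanged. save_snapshot() is a hypothetical helper, and "%H.%M.%S" replaces the locale-dependent "%X" so file names look the same on every machine. This is not claimed to fix the POIFS error.

```r
# Hypothetical helper: download one file to
# <base_dir>/<folder>/<YYYY-MM-DD>/<HH.MM.SS>.xls without changing the
# working directory; returns the path of the saved file.
save_snapshot <- function(link, folder, base_dir = getwd(), time = Sys.time()) {
  day_dir <- file.path(base_dir, folder, format(time, "%Y-%m-%d"))
  # recursive = TRUE also creates the folder itself if it is missing
  dir.create(day_dir, recursive = TRUE, showWarnings = FALSE)
  dest <- file.path(day_dir, paste0(format(time, "%H.%M.%S"), ".xls"))
  # mode = "wb": write the bytes unchanged (matters on Windows)
  download.file(link, destfile = dest, mode = "wb", quiet = TRUE)
  dest
}
```

With the question's vectors, the loop would become for (i in seq_along(LINKS)) save_snapshot(LINKS[i], FOLDERS[i], base_dir = orig_wd).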
