From the IMF I want to read a .xls file from an URL directly into R, but all attempts fail so far. Weirdly, I can download the file manually or by download.file() and open it without problems in Microsoft Outlook or in a text editor. However, even then I can't read the data into R.
I always try with both https and http.
myUrl <- "https://www.imf.org/external/pubs/ft/weo/2019/02/weodata/WEOOct2019all.xls"
myUrl2 <- "http://www.imf.org/external/pubs/ft/weo/2019/02/weodata/WEOOct2019all.xls"
1. Classic approach – fails.
imf <- read.table(file=myUrl, sep="\t", header=TRUE)
# Error in scan(file = file, what = what, sep = sep, quote = quote, dec = dec, :
# line 51 did not have 55 elements
imf <- read.table(file=url(myUrl), sep="\t", header=TRUE)
# Error in scan(file = file, what = what, sep = sep, quote = quote, dec = dec, :
# line 51 did not have 55 elements
2. Several packages – fails.
imf <- readxl::read_xls(myUrl)
# Error: `path` does not exist: ‘https://www.imf.org/external/pubs/ft/weo/2019/02/weodata/WEOOct2019all.xls’
imf <- readxl::read_xls(myUrl2)
# Error: `path` does not exist: ‘http://www.imf.org/external/pubs/ft/weo/2019/02/weodata/WEOOct2019all.xls’
imf <- gdata::read.xls(myUrl)
# Error in xls2sep(xls, sheet, verbose = verbose, ..., method = method, :
# Intermediate file 'C:\Users\jay\AppData\Local\Temp\RtmpUtW45x\file16f873be18e0.csv' missing!
# In addition: Warning message:
# In system(cmd, intern = !verbose) :
# running command '"C:\STRAWB~1\perl\bin\perl.exe"
# "C:/Program Files/R/R-3.6.1rc/library/gdata/perl/xls2csv.pl"
# "https://www.imf.org/external/pubs/ft/weo/2019/02/weodata/WEOOct2019all.xls"
# "C:\Users\jay\AppData\Local\Temp\RtmpUtW45x\file16f873be18e0.csv" "1"' had status 2
# Error in file.exists(tfn) : invalid 'file' argument
imf <- gdata::read.xls(myUrl2) # <---------------------------------------------- THIS DOWNLOADS SOMETHING AT LEAST!
# trying URL 'http://www.imf.org/external/pubs/ft/weo/2019/02/weodata/WEOOct2019all.xls'
# Content type 'application/vnd.ms-excel' length unknown
# downloaded 8.9 MB
#
# Error in xls2sep(xls, sheet, verbose = verbose, ..., method = method, :
# Intermediate file 'C:\Users\jay\AppData\Local\Temp\RtmpUtW45x\file16f87ded406b.csv' missing!
# In addition: Warning message:
# In system(cmd, intern = !verbose) :
# running command '"C:\STRAWB~1\perl\bin\perl.exe"
# "C:/Program Files/R/R-3.6.1rc/library/gdata/perl/xls2csv.pl"
# "C:\Users\jay\AppData\Local\Temp\RtmpUtW45x\file16f87f532cb3.xls"
# "C:\Users\jay\AppData\Local\Temp\RtmpUtW45x\file16f87ded406b.csv" "1"' had status 255
# Error in file.exists(tfn) : invalid 'file' argument
3. Tempfile approach – fails.
temp <- tempfile()
download.file(myUrl, temp) # THIS WORKS...
## BUT...
imf <- gdata::read.xls(temp)
# Error in xls2sep(xls, sheet, verbose = verbose, ..., method = method, :
# Intermediate file 'C:\Users\jay\AppData\Local\Temp\RtmpUtW45x\file16f870f55e04.csv' missing!
# In addition: Warning message:
# In system(cmd, intern = !verbose) :
# running command '"C:\STRAWB~1\perl\bin\perl.exe"
# "C:/Program Files/R/R-3.6.1rc/library/gdata/perl/xls2csv.pl"
# "C:\Users\jay\AppData\Local\Temp\RtmpUtW45x\file16f8746a46db"
# "C:\Users\jay\AppData\Local\Temp\RtmpUtW45x\file16f870f55e04.csv" "1"' had status 255
# Error in file.exists(tfn) : invalid 'file' argument
# even not...
tmp1 <- readLines(temp)
# Warning message:
# In readLines(temp) :
# incomplete final line found on
# 'C:\Users\jay\AppData\Local\Temp\Rtmp00GPlq\file2334435c2905'
str(tmp1)
# chr [1:8733] "WEO Country Code\tISO\tWEO Subject Code\tCountry\tSubject
# Descriptor\tSubject Notes\tUnits\tScale\tCountry/Seri"| __truncated__ ...
4. SDMX
I also tried the SDMX the IMF offer, but also without success. Probably this would be a more sophisticated approach, but I never used SDMX.
link <- "https://www.imf.org/external/pubs/ft/weo/2019/02/weodata/WEOOct2019_SDMXData.zip"
temp <- tempfile()
download.file(link, temp, quiet=TRUE)
imf <- rsdmx::readSDMX(temp)
# Error in function (type, msg, asError = TRUE) :
# Could not resolve host: C
# imf <- rsdmx::readSDMX(unzip(temp)) # runs forever and crashes R
unlink(temp)
Now... does anybody know what's going on, and how I may load the data into R?
Why not just use fill=TRUE?
imf <- read.table(file=myUrl, sep="\t", header=TRUE, fill = TRUE)
from ?read.table
fill
logical. If TRUE then in case the rows have unequal length, blank fields are implicitly added. See ‘Details’.
Related
I have this code that I run in R regularly to read excel files from T&E folder and put into dataframe, but suddenly it has stopped working. My excel files are not open, also I have updated my R and Rstudios.
setwd("C:\\My Directory")
inputFiles = list.files(path = "T&E/", pattern="*.xlsx")
inputFiles = paste0("T&E/", inputFiles)
loadData = function(sheetName) {
Files = lapply(inputFiles, read.xlsx, sheet = sheetName)
data = data.frame(stringsAsFactors = F)
for(i in 1:length(Files)) {
data = rbind(data, Files[[i]])
}
return(data)
}
d1 = loadData('Air')
This is the error that comes up after i call the function
Error in file(con, "r") : invalid 'description' argument
In addition: Warning message:
In unzip(xlsxFile, exdir = xmlDir) : error 1 in extracting from zip file
I also did traceback()
12: file(con, "r")
11: readLines(x, warn = FALSE, encoding = "UTF-8")
10: readUTF8(workbookRelsXML)
9: paste(readUTF8(workbookRelsXML), collapse = "")
8: read.xlsx.default(X[[i]], ...)
7: FUN(X[[i]], ...)
6: lapply(inputFiles, read.xlsx, sheet = sheetName) at filename.r#57
5: loadData("Air") at filename.r#67
4: eval(ei, envir)
3: eval(ei, envir)
2: withVisible(eval(ei, envir))
1: source("C:/Users/myname/Downloads/filename.r", echo = TRUE)
I am trying to reproduce this protocol for DNA sequencing data analysis. It requires running this bash script that links to an R script. However, I am getting this error (see bottom) that I cant seem to solve.
#!/bin/bash
Project_dir=~/base
cd /${Project_dir}
SCRTP=~/scRepliseq-Pipeline
OUTNAME="bam/G1_F121_A1.adapter_filtered2"
genome_name="mm10"
bamfile=${OUTNAME}.${genome_name}.clean_srt_markdup.bam
rscript=${SCRTP}/util/Step3_R-Aneu-Fragment-bins.R
out_dir="Aneu_analysis"
Name=‘$bamfile’
Name=${name%.adapter_filtered2.${genome_name}.clean_srt_markdup.bam}
blacklist=~/blacklist/mm10-blacklist-v1_id.bed
genome_file=~/reference/UCSC_hg19_female.fa.fai
mkdir -p ${out_dir}
Rscript --vanilla $rscript ${bamfile} ${out_dir} ${name} ${blacklist} ${genome_file}
it links to this R script
args = commandArgs(TRUE)
bamfile=args[1]
out_dir=args[2]
name=args[3]
blacklist=args[4]
genome_file=args[5]
options(scipen=100)
##Extension of file name##
ext="_mapq10_blacklist_fragment.Rdata"
ext2="_mapq10_blacklist_bin.Rdata"
library(AneuFinder)
##loading black list and genome Info##
genome_tmp <- read.table(genome_file,sep="\t") #UCSC_mm9.woYwR.fa.fai
genome=data.frame(UCSC_seqlevel=genome_tmp$V1,UCSC_seqlength=genome_tmp$V2)
chromosomes=as.character(genome$UCSC_seqlevel)
##setup output directories##
out_dir_f=paste0(out_dir,"/fragment")
out_dir_b=paste0(out_dir,"/bins")
dir.create(out_dir,showWarnings = FALSE)
dir.create(out_dir_f,showWarnings = FALSE)
dir.create(out_dir_b,showWarnings = FALSE)
##save the fragment file (>10 MAPQ), filtering out the blacklist regions##
raw_reads=bam2GRanges(bamfile,remove.duplicate.reads = TRUE,min.mapq = 10,blacklist = blacklist)
save(raw_reads,file = paste0(out_dir_f,"/",name,ext))
##save the bin data file ##
bins_reads=binReads(raw_reads,
assembly=genome,
chromosomes=chromosomes,
binsizes=c(40000,80000,100000,200000,500000))
rpm=1000000/length(raw_reads)
bins_reads[["rpm"]]=rpm
save(bins_reads,file=paste(out_dir_b,"/",name,ext2,sep=""))
It shows this error:
Error in file(file, "rt") : invalid 'description' argument
Calls: read.table -> file
Execution halted
I'm new at using GIS with R and I'm trying to open an ENVI file containing hyperspectral data following the suggestions from this post R how to read ENVI .hdr-file?, but I don't seem to be able to do so. I tried three different approaches but all of them failed. I also can't seem to find any other posts where my problem is described.
# install.packages("rgdal")
# install.packages("raster")
# install.packages("caTools")
library("rgdal")
library("raster")
library("caTools")
dirname <- "S:/LAB-cavender/4_Project_Folders/oakWilt/oak_wilt_image_analyses/R_input/6.15.2021 - Revisions/ENVI export/AISA/Resampled_flights"
filename <- file.path(dirname, "AISA_Flight_4_resampled")
file.exists(filename)
The first option that I tried was using file name only
x <- read.ENVI(filename)
But I got the following error message:
#Error in read.ENVI(filename) :
# read.ENVI: Could not open input file: S:/LAB-cavender/4_Project_Folders/oakWilt/oak_wilt_image_analyses/R_input/6.15.2021 - Revisions/ENVI export/AISA/Resampled_flights/AISA_Flight_4_resampled
#In addition: Warning message:
# In nRow * nCol * nBand : NAs produced by integer overflow
I tried then the second option which is using file name + header file name read using file.path
headerfile <- file.path(dirname, "AISA_Flight_4_resampled")
x <- read.ENVI(filename = filename,headerfile = headerfile)
Again, I got an error message that says:
#Error in read.ENVI(filename = filename, headerfile = headerfile) :
# read.ENVI: Could not open input header file: S:/LAB-cavender/4_Project_Folders/oakWilt/oak_wilt_image_analyses/R_input/6.15.2021 - Revisions/ENVI export/AISA/Resampled_flights/AISA_Flight_4_resampled
Finally, I tried the third option by using file name + header file name read using readLines
hdr_file <- readLines(con = "S:/LAB-cavender/4_Project_Folders/oakWilt/oak_wilt_image_analyses/R_input/6.15.2021 - Revisions/ENVI export/AISA/Resampled_flights/AISA_Flight_4_resampled.hdr")
x <- read.ENVI(filename = filename,headerfile = hdr_file)
But I got the error message:
#Error in read.ENVI(filename = filename, headerfile = hdr_file) :
# read.ENVI: Could not open input header file: ENVIdescription = { Spectrally Resampled File. Input number of bands: 63, output number of bands: 115. [Fri Jun 25 16:57:21 2021]}samples = 5187lines = 6111bands = 115header offset = 0file type = ENVI Standarddata type = 4interleave = bilsensor type = Unknownbyte order = 0map info = {UTM, 1.000, 1.000, 482828.358, 5029367.353, 7.5000000000e-001, 7.5000000000e-001, 15, North, WGS-84, units=Meters}coordinate system string = {PROJCS["UTM_Zone_15N",GEOGCS["GCS_WGS_1984",DATUM["D_WGS_1984",SPHEROID["WGS_1984",6378137.0,298.257223563]],PRIMEM["Greenwich",0.0],UNIT["Degree",0.0174532925199433]],PROJECTION["Transverse_Mercator"],PARAMETER["False_Easting",500000.0],PARAMETER["False_Northing",0.0],PARAMETER["Central_Meridian",-93.0],PARAMETER["Scale_Factor",0.9996],PARAMETER["Latitude_Of_Origin",0.0],UNIT["Meter",1.0]]}default bands = {46,31,16}wavelength units = Nanometersdata ignore value = -9999.00000000e+000band names = { Resampled
# In addition: Warning message:
# In if (!file.exists(headerfile)) stop("read.ENVI: Could not open input header file: ", :
# the condition has length > 1 and only the first element will be used
Any help would be really appreciated!
After a bit of a gap I updated RStudio and all packages this morning.
I have a little function that I use to prettify currencies
currency <- function(n, k=FALSE) {
n <- ifelse(!k, str_c("£", comma(round(n,0))), str_c("£", comma(round(n/1000,0)),"k"))
return(n)
}
It now fails to parse - the problem is the £ sign.
Error in parse(text = lines, n = -1, srcfile = srcfile) :
[path]/plot_helpers.R:72:
25: unexpected INCOMPLETE_STRING
71: currency <- function(n, k=FALSE) {
72: n <- ifelse(!k, str_c("
^
In addition: Warning message:
In readLines(con, warn = FALSE, n = n, ok = ok, skipNul = skipNul) :
invalid input found on input connection '/home/richardc/ownCloud/prodr/R/plot_helpers.R'
However I can run the code within the editor and it works fine. What is causing readLines to fail in this way ?
After some messing about today, I realise that the problem is in devtools. To recap here is a test project testencr.prj:
library(stringr)
library(devtools)
main <- function(n) {
n <- str_c("£", n)
return(n)
}
I can run the code fine from the console, but when I use devtools it barfs on the UTF-8 character:
> devtools::load_all()
Loading testencr
Error in parse(text = lines, n = -1, srcfile = srcfile) :
/home/richardc/ownCloud/test/R/test_enc.R:6:14: unexpected INCOMPLETE_STRING
5: main <- function(n) {
6: n <- str_c("
^
In addition: There were 27 warnings (use warnings() to see them)
But when I add a specific encoding into the DESCRIPTION
Encoding: UTF-8
It is all fine (this notwithstanding the project defaults are UTF8)
Loading testencr
There were 36 warnings (use warnings() to see them)```
I was having the same problem, specifically in a Shiny app (not the rest of the time). I managed to solve it by using this unicode instead of £:
enc2utf8("\u00A3")
I am following the tutorials of Machine Learning for Hackers (https://github.com/johnmyleswhite/ML_for_Hackers) and I am using Sublime Text as a text editor. To run my code, I use SublimeREPL R.
I am using this code, taken directly from the book:
setwd("/path/to/folder")
# Load the text mining package
library(tm)
library(ggplot2)
# Loading all necessary paths
spam.path <- "data/spam/"
spam2.path <- "data/spam_2/"
easyham.path <- "data/easy_ham/"
easyham.path2 <- "data/easy_ham_2/"
hardham.path <- "data/hard_ham/"
hardham2.path <- "data/hard_ham_2/"
# Get the content of each email
get.msg <- function(path) {
con <- file(path, open = "rt", encoding = "latin1")
text <- readLines(con)
msg <- text[seq(which(text == "")[1] + 1, length(text),1)]
close(con)
return(paste(msg, collapse = "\n"))
}
# Create a vector where each element is an email
spam.docs <- dir(spam.path)
spam.docs <- spam.docs[which(spam.docs != "cmds")]
all.spam <- sapply(spam.docs, function(p) get.msg(paste(spam.path, p, sep = "")))
# Log the spam
head(all.spam)
This piece of code works fine in RStudio (with the data provided here: https://github.com/johnmyleswhite/ML_for_Hackers/tree/master/03-Classification) but when I run it in Sublime, Iget the following error message:
> all.spam <- sapply(spam.docs,
+ function(p) get.msg(file.path(spam.path, p)))
Error in seq.default(which(text == "")[1] + 1, length(text), 1) :
'from' cannot be NA, NaN or infinite
In addition: Warning messages:
1: In readLines(con) :
invalid input found on input connection 'data/spam/00006.5ab5620d3d7c6c0db76234556a16f6c1'
2: In readLines(con) :
invalid input found on input connection 'data/spam/00009.027bf6e0b0c4ab34db3ce0ea4bf2edab'
3: In readLines(con) :
invalid input found on input connection 'data/spam/00031.a78bb452b3a7376202b5e62a81530449'
4: In readLines(con) :
incomplete final line found on 'data/spam/00031.a78bb452b3a7376202b5e62a81530449'
5: In readLines(con) :
invalid input found on input connection 'data/spam/00035.7ce3307b56dd90453027a6630179282e'
6: In readLines(con) :
incomplete final line found on 'data/spam/00035.7ce3307b56dd90453027a6630179282e'
>
I get the same results when I take the code from John Myles White's repo.
How can I fix this?
Thanks
I think the problem got is in using encoding=latin1, you can just remove this one, I test it in my environment, it ran well.
spam.docs <- paste(spam.path,spam.docs,sep="")
all.spam <- sapply(spam.docs,get.msg)
Warning message:
In readLines(con) :
incomplete final line found on 'XXXXXXXXXXXXXXXXX/ML_for_Hackers-master/03-Classification/data/spam/00136.faa39d8e816c70f23b4bb8758d8a74f0'
still some warnnings in it, but it can produce the results well.
Thanks.