How to read IMF xls- or sdmx-data from url? - r

From the IMF I want to read a .xls file from an URL directly into R, but all attempts fail so far. Weirdly, I can download the file manually or by download.file() and open it without problems in Microsoft Outlook or in a text editor. However, even then I can't read the data into R.
I always try with both https and http.
myUrl <- "https://www.imf.org/external/pubs/ft/weo/2019/02/weodata/WEOOct2019all.xls"
myUrl2 <- "http://www.imf.org/external/pubs/ft/weo/2019/02/weodata/WEOOct2019all.xls"
1. Classic approach – fails.
imf <- read.table(file=myUrl, sep="\t", header=TRUE)
# Error in scan(file = file, what = what, sep = sep, quote = quote, dec = dec, :
# line 51 did not have 55 elements
imf <- read.table(file=url(myUrl), sep="\t", header=TRUE)
# Error in scan(file = file, what = what, sep = sep, quote = quote, dec = dec, :
# line 51 did not have 55 elements
2. Several packages – fails.
imf <- readxl::read_xls(myUrl)
# Error: `path` does not exist: ‘https://www.imf.org/external/pubs/ft/weo/2019/02/weodata/WEOOct2019all.xls’
imf <- readxl::read_xls(myUrl2)
# Error: `path` does not exist: ‘http://www.imf.org/external/pubs/ft/weo/2019/02/weodata/WEOOct2019all.xls’
imf <- gdata::read.xls(myUrl)
# Error in xls2sep(xls, sheet, verbose = verbose, ..., method = method, :
# Intermediate file 'C:\Users\jay\AppData\Local\Temp\RtmpUtW45x\file16f873be18e0.csv' missing!
# In addition: Warning message:
# In system(cmd, intern = !verbose) :
# running command '"C:\STRAWB~1\perl\bin\perl.exe"
# "C:/Program Files/R/R-3.6.1rc/library/gdata/perl/xls2csv.pl"
# "https://www.imf.org/external/pubs/ft/weo/2019/02/weodata/WEOOct2019all.xls"
# "C:\Users\jay\AppData\Local\Temp\RtmpUtW45x\file16f873be18e0.csv" "1"' had status 2
# Error in file.exists(tfn) : invalid 'file' argument
imf <- gdata::read.xls(myUrl2) # <---------------------------------------------- THIS DOWNLOADS SOMETHING AT LEAST!
# trying URL 'http://www.imf.org/external/pubs/ft/weo/2019/02/weodata/WEOOct2019all.xls'
# Content type 'application/vnd.ms-excel' length unknown
# downloaded 8.9 MB
#
# Error in xls2sep(xls, sheet, verbose = verbose, ..., method = method, :
# Intermediate file 'C:\Users\jay\AppData\Local\Temp\RtmpUtW45x\file16f87ded406b.csv' missing!
# In addition: Warning message:
# In system(cmd, intern = !verbose) :
# running command '"C:\STRAWB~1\perl\bin\perl.exe"
# "C:/Program Files/R/R-3.6.1rc/library/gdata/perl/xls2csv.pl"
# "C:\Users\jay\AppData\Local\Temp\RtmpUtW45x\file16f87f532cb3.xls"
# "C:\Users\jay\AppData\Local\Temp\RtmpUtW45x\file16f87ded406b.csv" "1"' had status 255
# Error in file.exists(tfn) : invalid 'file' argument
3. Tempfile approach – fails.
temp <- tempfile()
download.file(myUrl, temp) # THIS WORKS...
## BUT...
imf <- gdata::read.xls(temp)
# Error in xls2sep(xls, sheet, verbose = verbose, ..., method = method, :
# Intermediate file 'C:\Users\jay\AppData\Local\Temp\RtmpUtW45x\file16f870f55e04.csv' missing!
# In addition: Warning message:
# In system(cmd, intern = !verbose) :
# running command '"C:\STRAWB~1\perl\bin\perl.exe"
# "C:/Program Files/R/R-3.6.1rc/library/gdata/perl/xls2csv.pl"
# "C:\Users\jay\AppData\Local\Temp\RtmpUtW45x\file16f8746a46db"
# "C:\Users\jay\AppData\Local\Temp\RtmpUtW45x\file16f870f55e04.csv" "1"' had status 255
# Error in file.exists(tfn) : invalid 'file' argument
# even not...
tmp1 <- readLines(temp)
# Warning message:
# In readLines(temp) :
# incomplete final line found on
# 'C:\Users\jay\AppData\Local\Temp\Rtmp00GPlq\file2334435c2905'
str(tmp1)
# chr [1:8733] "WEO Country Code\tISO\tWEO Subject Code\tCountry\tSubject
# Descriptor\tSubject Notes\tUnits\tScale\tCountry/Seri"| __truncated__ ...
4. SDMX
I also tried the SDMX the IMF offer, but also without success. Probably this would be a more sophisticated approach, but I never used SDMX.
link <- "https://www.imf.org/external/pubs/ft/weo/2019/02/weodata/WEOOct2019_SDMXData.zip"
temp <- tempfile()
download.file(link, temp, quiet=TRUE)
imf <- rsdmx::readSDMX(temp)
# Error in function (type, msg, asError = TRUE) :
# Could not resolve host: C
# imf <- rsdmx::readSDMX(unzip(temp)) # runs forever and crashes R
unlink(temp)
Now... does anybody know what's going on, and how I may load the data into R?

Why not just use fill=TRUE?
imf <- read.table(file=myUrl, sep="\t", header=TRUE, fill = TRUE)
from ?read.table
fill
logical. If TRUE then in case the rows have unequal length, blank fields are implicitly added. See ‘Details’.

Related

Reading multiple excel files from folder in R, Error in file(con, "r") : invalid 'description' argument

I have this code that I run in R regularly to read excel files from T&E folder and put into dataframe, but suddenly it has stopped working. My excel files are not open, also I have updated my R and Rstudios.
setwd("C:\\My Directory")
inputFiles = list.files(path = "T&E/", pattern="*.xlsx")
inputFiles = paste0("T&E/", inputFiles)
loadData = function(sheetName) {
Files = lapply(inputFiles, read.xlsx, sheet = sheetName)
data = data.frame(stringsAsFactors = F)
for(i in 1:length(Files)) {
data = rbind(data, Files[[i]])
}
return(data)
}
d1 = loadData('Air')
This is the error that comes up after i call the function
Error in file(con, "r") : invalid 'description' argument
In addition: Warning message:
In unzip(xlsxFile, exdir = xmlDir) : error 1 in extracting from zip file
I also did traceback()
12: file(con, "r")
11: readLines(x, warn = FALSE, encoding = "UTF-8")
10: readUTF8(workbookRelsXML)
9: paste(readUTF8(workbookRelsXML), collapse = "")
8: read.xlsx.default(X[[i]], ...)
7: FUN(X[[i]], ...)
6: lapply(inputFiles, read.xlsx, sheet = sheetName) at filename.r#57
5: loadData("Air") at filename.r#67
4: eval(ei, envir)
3: eval(ei, envir)
2: withVisible(eval(ei, envir))
1: source("C:/Users/myname/Downloads/filename.r", echo = TRUE)

Error in file(file, "rt") : invalid 'description' argument when running R script

I am trying to reproduce this protocol for DNA sequencing data analysis. It requires running this bash script that links to an R script. However, I am getting this error (see bottom) that I cant seem to solve.
#!/bin/bash
Project_dir=~/base
cd /${Project_dir}
SCRTP=~/scRepliseq-Pipeline
OUTNAME="bam/G1_F121_A1.adapter_filtered2"
genome_name="mm10"
bamfile=${OUTNAME}.${genome_name}.clean_srt_markdup.bam
rscript=${SCRTP}/util/Step3_R-Aneu-Fragment-bins.R
out_dir="Aneu_analysis"
Name=‘$bamfile’
Name=${name%.adapter_filtered2.${genome_name}.clean_srt_markdup.bam}
blacklist=~/blacklist/mm10-blacklist-v1_id.bed
genome_file=~/reference/UCSC_hg19_female.fa.fai
mkdir -p ${out_dir}
Rscript --vanilla $rscript ${bamfile} ${out_dir} ${name} ${blacklist} ${genome_file}
it links to this R script
args = commandArgs(TRUE)
bamfile=args[1]
out_dir=args[2]
name=args[3]
blacklist=args[4]
genome_file=args[5]
options(scipen=100)
##Extension of file name##
ext="_mapq10_blacklist_fragment.Rdata"
ext2="_mapq10_blacklist_bin.Rdata"
library(AneuFinder)
##loading black list and genome Info##
genome_tmp <- read.table(genome_file,sep="\t") #UCSC_mm9.woYwR.fa.fai
genome=data.frame(UCSC_seqlevel=genome_tmp$V1,UCSC_seqlength=genome_tmp$V2)
chromosomes=as.character(genome$UCSC_seqlevel)
##setup output directories##
out_dir_f=paste0(out_dir,"/fragment")
out_dir_b=paste0(out_dir,"/bins")
dir.create(out_dir,showWarnings = FALSE)
dir.create(out_dir_f,showWarnings = FALSE)
dir.create(out_dir_b,showWarnings = FALSE)
##save the fragment file (>10 MAPQ), filtering out the blacklist regions##
raw_reads=bam2GRanges(bamfile,remove.duplicate.reads = TRUE,min.mapq = 10,blacklist = blacklist)
save(raw_reads,file = paste0(out_dir_f,"/",name,ext))
##save the bin data file ##
bins_reads=binReads(raw_reads,
assembly=genome,
chromosomes=chromosomes,
binsizes=c(40000,80000,100000,200000,500000))
rpm=1000000/length(raw_reads)
bins_reads[["rpm"]]=rpm
save(bins_reads,file=paste(out_dir_b,"/",name,ext2,sep=""))
It shows this error:
Error in file(file, "rt") : invalid 'description' argument
Calls: read.table -> file
Execution halted

How to open/read an ENVI file in R

I'm new at using GIS with R and I'm trying to open an ENVI file containing hyperspectral data following the suggestions from this post R how to read ENVI .hdr-file?, but I don't seem to be able to do so. I tried three different approaches but all of them failed. I also can't seem to find any other posts where my problem is described.
# install.packages("rgdal")
# install.packages("raster")
# install.packages("caTools")
library("rgdal")
library("raster")
library("caTools")
dirname <- "S:/LAB-cavender/4_Project_Folders/oakWilt/oak_wilt_image_analyses/R_input/6.15.2021 - Revisions/ENVI export/AISA/Resampled_flights"
filename <- file.path(dirname, "AISA_Flight_4_resampled")
file.exists(filename)
The first option that I tried was using file name only
x <- read.ENVI(filename)
But I got the following error message:
#Error in read.ENVI(filename) :
# read.ENVI: Could not open input file: S:/LAB-cavender/4_Project_Folders/oakWilt/oak_wilt_image_analyses/R_input/6.15.2021 - Revisions/ENVI export/AISA/Resampled_flights/AISA_Flight_4_resampled
#In addition: Warning message:
# In nRow * nCol * nBand : NAs produced by integer overflow
I tried then the second option which is using file name + header file name read using file.path
headerfile <- file.path(dirname, "AISA_Flight_4_resampled")
x <- read.ENVI(filename = filename,headerfile = headerfile)
Again, I got an error message that says:
#Error in read.ENVI(filename = filename, headerfile = headerfile) :
# read.ENVI: Could not open input header file: S:/LAB-cavender/4_Project_Folders/oakWilt/oak_wilt_image_analyses/R_input/6.15.2021 - Revisions/ENVI export/AISA/Resampled_flights/AISA_Flight_4_resampled
Finally, I tried the third option by using file name + header file name read using readLines
hdr_file <- readLines(con = "S:/LAB-cavender/4_Project_Folders/oakWilt/oak_wilt_image_analyses/R_input/6.15.2021 - Revisions/ENVI export/AISA/Resampled_flights/AISA_Flight_4_resampled.hdr")
x <- read.ENVI(filename = filename,headerfile = hdr_file)
But I got the error message:
#Error in read.ENVI(filename = filename, headerfile = hdr_file) :
# read.ENVI: Could not open input header file: ENVIdescription = { Spectrally Resampled File. Input number of bands: 63, output number of bands: 115. [Fri Jun 25 16:57:21 2021]}samples = 5187lines = 6111bands = 115header offset = 0file type = ENVI Standarddata type = 4interleave = bilsensor type = Unknownbyte order = 0map info = {UTM, 1.000, 1.000, 482828.358, 5029367.353, 7.5000000000e-001, 7.5000000000e-001, 15, North, WGS-84, units=Meters}coordinate system string = {PROJCS["UTM_Zone_15N",GEOGCS["GCS_WGS_1984",DATUM["D_WGS_1984",SPHEROID["WGS_1984",6378137.0,298.257223563]],PRIMEM["Greenwich",0.0],UNIT["Degree",0.0174532925199433]],PROJECTION["Transverse_Mercator"],PARAMETER["False_Easting",500000.0],PARAMETER["False_Northing",0.0],PARAMETER["Central_Meridian",-93.0],PARAMETER["Scale_Factor",0.9996],PARAMETER["Latitude_Of_Origin",0.0],UNIT["Meter",1.0]]}default bands = {46,31,16}wavelength units = Nanometersdata ignore value = -9999.00000000e+000band names = { Resampled
# In addition: Warning message:
# In if (!file.exists(headerfile)) stop("read.ENVI: Could not open input header file: ", :
# the condition has length > 1 and only the first element will be used
Any help would be really appreciated!

RStudio fails to read GBP pound sign

After a bit of a gap I updated RStudio and all packages this morning.
I have a little function that I use to prettify currencies
currency <- function(n, k=FALSE) {
n <- ifelse(!k, str_c("£", comma(round(n,0))), str_c("£", comma(round(n/1000,0)),"k"))
return(n)
}
It now fails to parse - the problem is the £ sign.
Error in parse(text = lines, n = -1, srcfile = srcfile) :
[path]/plot_helpers.R:72:
25: unexpected INCOMPLETE_STRING
71: currency <- function(n, k=FALSE) {
72: n <- ifelse(!k, str_c("
^
In addition: Warning message:
In readLines(con, warn = FALSE, n = n, ok = ok, skipNul = skipNul) :
invalid input found on input connection '/home/richardc/ownCloud/prodr/R/plot_helpers.R'
However I can run the code within the editor and it works fine. What is causing readLines to fail in this way ?
After some messing about today, I realise that the problem is in devtools. To recap here is a test project testencr.prj:
library(stringr)
library(devtools)
main <- function(n) {
n <- str_c("£", n)
return(n)
}
I can run the code fine from the console, but when I use devtools it barfs on the UTF-8 character:
> devtools::load_all()
Loading testencr
Error in parse(text = lines, n = -1, srcfile = srcfile) :
/home/richardc/ownCloud/test/R/test_enc.R:6:14: unexpected INCOMPLETE_STRING
5: main <- function(n) {
6: n <- str_c("
^
In addition: There were 27 warnings (use warnings() to see them)
But when I add a specific encoding into the DESCRIPTION
Encoding: UTF-8
It is all fine (this notwithstanding the project defaults are UTF8)
Loading testencr
There were 36 warnings (use warnings() to see them)```
I was having the same problem, specifically in a Shiny app (not the rest of the time). I managed to solve it by using this unicode instead of £:
enc2utf8("\u00A3")

R/SublimeREPL R - code not working in sublime but working in RStudio

I am following the tutorials of Machine Learning for Hackers (https://github.com/johnmyleswhite/ML_for_Hackers) and I am using Sublime Text as a text editor. To run my code, I use SublimeREPL R.
I am using this code, taken directly from the book:
setwd("/path/to/folder")
# Load the text mining package
library(tm)
library(ggplot2)
# Loading all necessary paths
spam.path <- "data/spam/"
spam2.path <- "data/spam_2/"
easyham.path <- "data/easy_ham/"
easyham.path2 <- "data/easy_ham_2/"
hardham.path <- "data/hard_ham/"
hardham2.path <- "data/hard_ham_2/"
# Get the content of each email
get.msg <- function(path) {
con <- file(path, open = "rt", encoding = "latin1")
text <- readLines(con)
msg <- text[seq(which(text == "")[1] + 1, length(text),1)]
close(con)
return(paste(msg, collapse = "\n"))
}
# Create a vector where each element is an email
spam.docs <- dir(spam.path)
spam.docs <- spam.docs[which(spam.docs != "cmds")]
all.spam <- sapply(spam.docs, function(p) get.msg(paste(spam.path, p, sep = "")))
# Log the spam
head(all.spam)
This piece of code works fine in RStudio (with the data provided here: https://github.com/johnmyleswhite/ML_for_Hackers/tree/master/03-Classification) but when I run it in Sublime, Iget the following error message:
> all.spam <- sapply(spam.docs,
+ function(p) get.msg(file.path(spam.path, p)))
Error in seq.default(which(text == "")[1] + 1, length(text), 1) :
'from' cannot be NA, NaN or infinite
In addition: Warning messages:
1: In readLines(con) :
invalid input found on input connection 'data/spam/00006.5ab5620d3d7c6c0db76234556a16f6c1'
2: In readLines(con) :
invalid input found on input connection 'data/spam/00009.027bf6e0b0c4ab34db3ce0ea4bf2edab'
3: In readLines(con) :
invalid input found on input connection 'data/spam/00031.a78bb452b3a7376202b5e62a81530449'
4: In readLines(con) :
incomplete final line found on 'data/spam/00031.a78bb452b3a7376202b5e62a81530449'
5: In readLines(con) :
invalid input found on input connection 'data/spam/00035.7ce3307b56dd90453027a6630179282e'
6: In readLines(con) :
incomplete final line found on 'data/spam/00035.7ce3307b56dd90453027a6630179282e'
>
I get the same results when I take the code from John Myles White's repo.
How can I fix this?
Thanks
I think the problem got is in using encoding=latin1, you can just remove this one, I test it in my environment, it ran well.
spam.docs <- paste(spam.path,spam.docs,sep="")
all.spam <- sapply(spam.docs,get.msg)
Warning message:
In readLines(con) :
incomplete final line found on 'XXXXXXXXXXXXXXXXX/ML_for_Hackers-master/03-Classification/data/spam/00136.faa39d8e816c70f23b4bb8758d8a74f0'
still some warnnings in it, but it can produce the results well.
Thanks.

Resources