I'm trying to create a new Excel workbook from R to save a few small datasets using the xlsx package. It was working fine before, but now I'm unable to do it again.
Code to create a new workbook
library("xlsx")
library("xlsxjars")
library("rJava")
file <- "marca_imei.xlsx"
wb <- loadWorkbook(file)
# The error:
# Error in .jcall("RJavaTools", "Ljava/lang/Object;", "invokeMethod", cl, :
# java.lang.IllegalArgumentException: Your InputStream was neither an OLE2 stream, nor an OOXML stream
I've searched for an answer, but it seems people get this same error when importing data from Excel.
I've tried what was recommended, but it didn't work. Here are some links for future searchers:
http://r.789695.n4.nabble.com/Read-shortcuts-of-MS-Excel-files-through-R-td4677020.html
http://r.789695.n4.nabble.com/Problem-with-xlsx-package-td3298470.html
sessionInfo():
locale:
[1] LC_COLLATE=Spanish_Spain.1252 LC_CTYPE=Spanish_Spain.1252 LC_MONETARY=Spanish_Spain.1252
[4] LC_NUMERIC=C LC_TIME=Spanish_Spain.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] xlsx_0.5.5 xlsxjars_0.6.0 RJDBC_0.2-3 rJava_0.9-6
[5] DBI_0.2-7 slidifyLibraries_0.3.1 slidify_0.4 knitr_1.5
[9] devtools_1.4.1 scales_0.2.3 ggplot2_0.9.3.1 data.table_1.8.11
[13] reshape2_1.2.2
loaded via a namespace (and not attached):
[1] colorspace_1.2-4 dichromat_2.0-0 digest_0.6.4 evaluate_0.5.1 formatR_0.10
[6] grid_3.0.2 gtable_0.1.2 httr_0.2 labeling_0.2 markdown_0.6.3
[11] MASS_7.3-29 memoise_0.1 munsell_0.4.2 parallel_3.0.2 plyr_1.8
[16] proto_0.3-10 RColorBrewer_1.0-5 RCurl_1.95-4.1 stringr_0.6.2 tools_3.0.2
[21] whisker_0.3-2 yaml_2.1.10
Martin,
I believe the issue is that the file you are reading in is not a valid .xlsx file. Here is a code example that reproduces your problem; you can also modify it to fix the problem. The example uses a sample data set from the web (speed camera locations in Baltimore :-)).
In essence, the download.file() call is the culprit: it saves a CSV file under an .xlsx name, and the later read.xlsx() call then fails with the error you see:
Error in .jcall("RJavaTools", "Ljava/lang/Object;", "invokeMethod", cl, :
java.lang.IllegalArgumentException: Your InputStream was neither an OLE2 stream, nor an OOXML stream
To reproduce the error, download the file "rows.csv"; invoking read.xlsx() on it triggers the error. To fix it, change the download URL to fetch "rows.xlsx" instead and rerun the script below:
#!/usr/bin/env Rscript
# Ensure Clean Setup...
# Unload packages
if (require(xlsx)) {
  detach("package:xlsx", unload=TRUE)
}
if (require(xlsxjars)) {
  detach("package:xlsxjars", unload=TRUE)
}
# Delete Environment...
rm(list = ls())
# Delete directory
if (file.exists("data")) {
  unlink("./data", recursive = TRUE)
}
# OK - we should be in a base state setup test...
if (!require(xlsx)) {
  install.packages("xlsx")
}
if (!file.exists("data")) {
  dir.create("data")
}
# Download the file as a CSV file (Deliberate mistake) not a XLSX file
# This causes the error seen when read.xlsx is invoked...
# To fix replace rows.csv with rows.xlsx
if (!file.exists("data/cameras.xlsx")) {
  fileUrl <- "https://data.baltimorecity.gov/api/views/dz54-2aru/rows.csv?accessType=DOWNLOAD"
  download.file(fileUrl, destfile = "./data/cameras.xlsx", method = "curl")
}
list.files("./data")
# Now we check the file exists and read in the data...
# read.xlsx will throw the java error as the file downloaded is not a valid excel file...
if (file.exists("./data/cameras.xlsx")) {
  cameraData.xlsx <- read.xlsx("./data/cameras.xlsx", sheetIndex=1, header = TRUE)
}
head(cameraData.xlsx)
Here is the example output:
Load rows.csv...
source('test.R')
Loading required package: xlsx
Loading required package: xlsxjars
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0 0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0100 9294 100 9294 0 0 33870 0 --:--:-- --:--:-- --:--:-- 33796
Error in .jcall("RJavaTools", "Ljava/lang/Object;", "invokeMethod", cl, :
java.lang.IllegalArgumentException: Your InputStream was neither an OLE2 stream, nor an OOXML stream
Now we replace rows.csv with rows.xlsx...
> source('test.R', echo=TRUE)
> #!/usr/bin/env Rscript
>
> # Ensure Clean Setup...
> # Unload packages
> if (require(xlsx)) {
+ detach("package:xlsx", unload=TRUE)
+ }
> if (require(xlsxjars)) {
+ detach("package:xlsxjars", unload=TRUE)
+ }
> # Delete Environment...
> rm(list = ls())
> # Delete directory
> if (file.exists("data")) {
+ unlink("./data", recursive = TRUE)
+ }
> # OK - we should be in a base state setup test...
>
> if (!require(xlsx)) {
+ install.packages("xlsx")
+ }
Loading required package: xlsx
Loading required package: xlsxjars
> if (!file.exists("data")) {
+ dir.create("data")
+ }
> # Download the file as a CSV file (Deliberate mistake) not a XLSX file
> # This causes the error seen when read.xlsx is invoked...
> # To fix replac .... [TRUNCATED]
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0100 9923 100 9923 0 0 48559 0 --:--:-- --:--:-- --:--:-- 48642
> list.files("./data")
[1] "cameras.xlsx"
> # Now we check the file exists and read in the data...
> # read.xlsx will throw the java error as the file downloaded is not a valid excel file...
> .... [TRUNCATED]
> head(cameraData.xlsx)
address direction street crossStreet intersection Location.1
1 S CATON AVE & BENSON AVE N/B Caton Ave Benson Ave Caton Ave & Benson Ave (39.2693779962, -76.6688185297)
2 S CATON AVE & BENSON AVE S/B Caton Ave Benson Ave Caton Ave & Benson Ave (39.2693157898, -76.6689698176)
3 WILKENS AVE & PINE HEIGHTS AVE E/B Wilkens Ave Pine Heights Wilkens Ave & Pine Heights (39.2720252302, -76.676960806)
4 THE ALAMEDA & E 33RD ST S/B The Alameda 33rd St The Alameda & 33rd St (39.3285013141, -76.5953545714)
5 E 33RD ST & THE ALAMEDA E/B E 33rd The Alameda E 33rd & The Alameda (39.3283410623, -76.5953594625)
6 ERDMAN AVE & N MACON ST E/B Erdman Macon St Erdman & Macon St (39.3068045671, -76.5593167803)
>
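Before handing a file to read.xlsx() or loadWorkbook(), it can help to check its first bytes. The following is a minimal sketch in base R; the helper name excel_signature() is hypothetical and not part of the xlsx package. Real .xlsx files are ZIP archives that begin with the bytes "PK", and legacy .xls (OLE2) files begin with D0 CF 11 E0. Anything else, such as a CSV saved under an .xlsx name, is the kind of input that triggers the error above.

```r
# Hypothetical helper: classify a file by its leading "magic" bytes.
excel_signature <- function(path) {
  if (!file.exists(path)) return("missing")
  bytes <- readBin(path, what = "raw", n = 8)
  zip_magic  <- as.raw(c(0x50, 0x4B, 0x03, 0x04))  # "PK" header of a ZIP archive
  ole2_magic <- as.raw(c(0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1))
  if (length(bytes) >= 4 && identical(bytes[1:4], zip_magic)) return("ooxml (zip)")
  if (length(bytes) >= 8 && identical(bytes[1:8], ole2_magic)) return("ole2 (xls)")
  "neither"
}

# Demonstration: a CSV saved under an .xlsx name is not recognised as Excel.
fake_xlsx <- tempfile(fileext = ".xlsx")
writeLines("address,direction\nS CATON AVE & BENSON AVE,N/B", fake_xlsx)
excel_signature(fake_xlsx)
```

If the result is "neither" or "missing", the problem lies with the file itself rather than with Java or the xlsx package.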
It is possible the problem is with Java rather than with the Excel package itself. Make sure Java is installed by running the test on the Java site, which confirms that Java is correctly installed. Then make sure R can find the Java runtime library (jvm.dll on Windows, if I recall the file name correctly).
Second, here is the code I have been using for a year with XLConnect, without the error message you got.
If it helps you ....
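library(XLConnect)  # provides loadWorkbook(), getSheets() and readWorksheet()

# Read one worksheet into a data frame; optionally convert character columns to factors.
read.xls <- function(filename, sheetnumber=1, sheetname=NULL, forceConversion=TRUE, startCol=0, stringsAsFactors=TRUE) {
  wb <- loadWorkbook(filename)
  # Fall back to the sheet at position `sheetnumber` when no name is given.
  if (is.null(sheetname)) sheetname <- getSheets(wb)[sheetnumber]
  df <- readWorksheet(wb, sheet=sheetname, forceConversion=forceConversion, startCol=startCol)
  if (stringsAsFactors) {
    ischar <- sapply(df, class) == "character"
    for (i in seq_along(df)) {
      if (ischar[i]) df[,i] <- factor(df[,i])
    }
  }
  df
}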
I have the following R script ~/test.R :
print(.libPaths())
print(system(command = "whoami",ignore.stderr = TRUE))
library(lubridate)
ymd("2022-09-15")
If I run this script from the terminal with /opt/R/3.6.2/lib64/R/bin/Rscript test.R > test2.log I get the following output:
[1] "/home/domain/username/R/library/3.6.2"
[2] "/applis/R/site-library/x86_64-pc-linux-gnu/3.6.2"
[3] "/opt/R/3.6.2/lib64/R/library"
username@domain
[1] 0
[1] "2022-09-15"
So it's working as intended and I have 3 paths for packages. Now let's run this script with cron :
* * * * * /opt/R/3.6.2/lib64/R/bin/Rscript $HOME/test.R > $HOME/test.log 2>&1
I get this for test.log:
[1] "/opt/R/3.6.2/lib64/R/library"
username@domain
[1] 0
Error in library(lubridate) :
there is no package called ‘lubridate’
Execution halted
So I only have one path for libraries, consequently lubridate is not found, because it's installed in /home/domain/username/R/library/3.6.2. I cannot install packages within /opt/R/3.6.2/lib64/R/library, so I'm looking for a way to add libpaths to crontab.
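One way to make the user library visible under cron is to add it inside the script itself, before any library() call. This is a minimal sketch: prepend_lib() is a hypothetical helper name, and the path is the one shown in the terminal output above. Setting the R_LIBS_USER environment variable in the crontab is another option that should have a similar effect, since cron probably starts R with a much smaller environment than an interactive shell.

```r
# Hypothetical helper: put a library directory first on R's library search path,
# but only if the directory actually exists.
prepend_lib <- function(lib) {
  if (dir.exists(lib)) {
    .libPaths(c(lib, .libPaths()))
  }
  invisible(.libPaths())
}

prepend_lib("/home/domain/username/R/library/3.6.2")  # path taken from the question
print(.libPaths())
# library(lubridate) would go after this point
```

Because the helper checks dir.exists() first, the same script still runs unchanged on a machine where that directory is absent.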
I've updated Strawberry Perl 64-bit 5.30.2001 and the gdata package. Now, when loading library(gdata), I always get these warning messages, which appear to be related to Perl.
suppressPackageStartupMessages(library(gdata))
# Warning messages:
# 1: In system(cmd, intern = intern, wait = wait | intern, show.output.on.console = wait, :
# running command 'C:\Windows\system32\cmd.exe /c ftype perl' had status 2
# 2: In system(cmd, intern = intern, wait = wait | intern, show.output.on.console = wait, :
# running command 'C:\Windows\system32\cmd.exe /c ftype perl' had status 2
However, read.xls, the function I need, seems to run well, except that the warning is repeated every time I use it.
read.xls("http://file-examples-com.github.io/uploads/2017/02/file_example_XLS_10.xls")
# trying URL 'http://file-examples-com.github.io/uploads/2017/02/file_example_XLS_10.xls'
# Content type 'application/vnd.ms-excel' length 8704 bytes
# downloaded 8704 bytes
# X0 First.Name Last.Name Gender Country Age Date Id
# 1 1 Dulce Abril Female United States 32 15/10/2017 1562
# 2 2 Mara Hashimoto Female Great Britain 25 16/08/2016 1582
# 3 3 Philip Gent Male France 36 21/05/2015 2587
# 4 4 Kathleen Hanner Female United States 25 15/10/2017 3549
# 5 5 Nereida Magwood Female United States 58 16/08/2016 2468
# 6 6 Gaston Brumm Male United States 24 21/05/2015 2554
# 7 7 Etta Hurn Female Great Britain 56 15/10/2017 3598
# 8 8 Earlean Melgar Female United States 27 16/08/2016 2456
# 9 9 Vincenza Weiland Female United States 40 21/05/2015 6548
# Warning messages:
# 1: In system(cmd, intern = intern, wait = wait | intern, show.output.on.console = wait, :
# running command 'C:\Windows\system32\cmd.exe /c ftype perl' had status 2
# 2: In system(cmd, intern = intern, wait = wait | intern, show.output.on.console = wait, :
# running command 'C:\Windows\system32\cmd.exe /c ftype perl' had status 2
I'm not sure how to deal with this warning because it tells me nothing. I could probably just ignore it and wrap suppressWarnings() around the call.
Nevertheless, does anybody know a way to fix this? I couldn't find anything by googling and don't know where to start and what's actually wrong.
> sessionInfo()
R version 4.0.2 (2020-06-22)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1
Matrix products: default
locale:
[1] LC_COLLATE=English_United States.1252
[2] LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C
[5] LC_TIME=English_United States.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] gdata_2.18.0
loaded via a namespace (and not attached):
[1] compiler_4.0.2 tools_4.0.2 gtools_3.8.2
I had the same issue with a freshly installed version of R, gdata and Strawberry Perl. I finally found an answer to a different (but related) question. Adapting the suggestion there, I ran the following in an elevated command prompt:
FTYPE perl=C:\Strawberry\perl\bin\perl.exe %1 %*
This solved the issue for me – however: I am not sure if setting the FTYPE like this might have any unwanted side effects. So be careful.
Update: The command above did suppress the warning "ftype perl' had status 2" for me, but gdata still had issues:
gdata: Unable to load perl libaries needed by read.xls()
gdata: to support 'XLSX' (Excel 2007+) files.
gdata: Run the function 'installXLSXsupport()'
gdata: to automatically download and install the perl
gdata: libaries needed to support Excel XLS and XLSX formats.
However, installXLSXsupport() failed with an unspecific error message.
I then ran
Sys.which("perl")
perl
"C:\\rtools40\\usr\\bin\\perl.exe"
and realized that the Perl version from RTools takes precedence over my Strawberry Perl installation – and apparently gdata does not "like" that Perl version.
Therefore, I decided to give Strawberry Perl precedence over RTools by changing my .Renviron file (usethis::edit_r_environ()):
PATH="${RTOOLS40_HOME}\usr\bin;${PATH}" # old
PATH="${PATH};${RTOOLS40_HOME}\usr\bin" # new
Again, I'm not entirely sure what ramifications this might have, but it fixed gdata for me.
Maybe adjusting the PATH alone would also have done the trick (without the ftype stunt I made first), but I cannot test this anymore.
What I recommend:
Adjust the PATH first.
If gdata still complains about the ftype, set the ftype.
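To see which Perl R picks up before and after changing the PATH, something like the following can be run in a fresh session. This is only a sketch: find_perl() is a hypothetical helper name, and the Strawberry Perl location in the comment is the one used above and may differ on other machines.

```r
# Hypothetical helper: report the perl executable that R resolves via PATH.
find_perl <- function() {
  perl <- unname(Sys.which("perl"))
  if (!nzchar(perl)) "perl not found on PATH" else perl
}

find_perl()

# read.xls() also takes a `perl` argument, so the interpreter can be named
# explicitly without touching PATH (requires gdata; left commented out here):
# gdata::read.xls("file_example_XLS_10.xls", perl = "C:/Strawberry/perl/bin/perl.exe")
```

If find_perl() still points at the RTools copy after editing .Renviron, R most likely needs a restart for the new PATH to take effect.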
I want to write a package with internal data, and my method is described here
My DESCRIPTION file is:
Package: cancerProfile
Title: A collection of data sets of cancer
Version: 0.1
Authors@R: person("NavyCheng", email = "navycheng2020@gmail.com", role = c("aut", "cre"))
Description: This package contain some data sets of cancers, such as RNA-seq data, TF bind data and so on.
Depends: R (>= 3.4.0)
License: What license is it under?
Encoding: UTF-8
LazyData: true
and my project is like this:
cancerProfile.Rproj
NAMESPACE
LICENSE
DESCRIPTION
R/
data/
|-- prad.rna.count.rda
Then I install my package and load it:
> library(pryr)
> library(devtools)
> install_github('hcyvan/cancerProfile')
> library(cancerProfile)
> mem_used()
82.2 MB
> invisible(prad.rna.count)
> mem_used()
356 MB
> ls()
character(0)
> prad.rna.count[1:3,1:3]
TCGA.2A.A8VL.01A TCGA.2A.A8VO.01A TCGA.2A.A8VT.01A
ENSG00000000003.13 2867 1667 3140
ENSG00000000005.5 6 0 0
ENSG00000000419.11 1354 888 1767
> rm(prad.rna.count)
Warning message:
In rm(prad.rna.count) : object 'prad.rna.count' not found
My question is why I can't 'ls' and 'rm' prad.rna.count, and how can I do this?
In your case you couldn't ls() or rm() the dataset because you never put it in your global environment. Consider the following:
# devtools::install_github("hcyvan/cancerProfile")
library(cancerProfile)
library(pryr)
mem_used()
#> 31.8 MB
data(prad.rna.count)
mem_used()
#> 32.2 MB
ls()
#> [1] "prad.rna.count"
prad.rna.count[1:3,1:3]
#> TCGA.2A.A8VL.01A TCGA.2A.A8VO.01A TCGA.2A.A8VT.01A
#> ENSG00000000003.13 2867 1667 3140
#> ENSG00000000005.5 6 0 0
#> ENSG00000000419.11 1354 888 1767
mem_used()
#> 305 MB
rm(prad.rna.count)
ls()
#> character(0)
mem_used()
#> 32.5 MB
Created on 2019-01-15 by the reprex package (v0.2.1)
Since I used data() rather than invisible(), I actually put the data into the global environment, allowing me to see it via ls() and remove it via rm(). The way I loaded the data (data()) didn't increase memory usage because it just returns a promise, but when I evaluated the promise via prad.rna.count[1:3,1:3], the memory usage shot up. Luckily, since I had a name pointing to the object by using data() rather than invisible(), when I used rm(prad.rna.count), R recognized there was no longer a name pointing to that object and released the memory. I'd check out http://adv-r.had.co.nz/memory.html#gc and http://r-pkgs.had.co.nz/data.html#data-data for more details.
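The difference between a name in the global environment and a lazily loaded package dataset can be reproduced with base R alone. In this small sketch, a scratch environment env stands in for the global environment and the built-in mtcars data stands in for prad.rna.count; both are stand-ins and not part of the original example.

```r
env <- new.env()  # stand-in for the global environment

# The dataset is reachable through its package, but no name exists in `env` yet.
exists("mtcars", envir = env, inherits = FALSE)

# data() creates a binding in the target environment, so ls() and rm() can see it.
data("mtcars", package = "datasets", envir = env)
ls(env)
rm("mtcars", envir = env)
ls(env)

# A promise costs almost nothing until it is first evaluated.
delayedAssign("big", rnorm(1e6), assign.env = env)
ls(env)                           # the name exists already
length(get("big", envir = env))   # forces the promise; the vector is allocated now
```

The delayedAssign() part only illustrates the idea of a promise; whether data() hands back a promise or the object itself for a given package is an assumption worth checking with mem_used() as above.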
I am working with data from: Environment Canada
I am using download.file() to acquire this data. When I use:
download.file(url="http://dd.weather.gc.ca/model_gem_global/25km/grib2/lat_lon/00/000/CMC_glb_VGRD_ISBL_1000_latlon.24x.24_2015091100_P000.grib2",destfile = "Local_File.grib2")
GribInfo(grib.file = "Local_File.grib2",file.type = "grib2")
It yields:
$inventory
[1] "" "*** FATAL ERROR: rd_grib2_msg, missing end section ('7777') ***"
[3] ""
attr(,"status")
[1] 8
$grid
[1] "" "*** FATAL ERROR: rd_grib2_msg, missing end section ('7777') ***"
[3] ""
attr(,"status")
[1] 8
Warning messages:
1: running command 'wgrib2 Local_File.grib2 -inv -' had status 8
2: running command 'wgrib2 Local_File.grib2 -grid' had status 8
Whilst a manual download followed by:
GribInfo(grib.file = "CMC_glb_TMP_ISBL_985_latlon.24x.24_2015091100_P000.grib2",file.type = "grib2")
Yields:
$inventory
[1] "1:0:d=2015091100:TMP:985 mb:anl:"
$grid
[1] "1:0:grid_template=0:winds(N/S):" "\tlat-lon grid:(1500 x 751) units 1e-06 input WE:SN output WE:SN res 48"
[3] "\tlat -90.000000 to 90.000000 by 0.240000" "\tlon 180.000000 to 179.760000 by 0.240000 #points=1126500"
I have attempted using the curl and wget methods within download.file(); however, they fail with a non-zero exit status. I am able to obtain these files using a wget batch file; however, I would prefer that my entire system run within R for consistency and ease of use.
As per @Martin Morgan: downloading the file in binary mode circumvents this issue. On Windows, download.file() defaults to text mode (mode = "w"), which corrupts binary files such as GRIB2. Thanks again, Martin.
download.file(url="http://dd.weather.gc.ca/model_gem_global/25km/grib2/lat_lon/00/000/CMC_glb_VGRD_ISBL_1000_latlon.24x.24_2015091100_P000.grib2",destfile = "Local_File.grib2", mode="wb")
GribInfo(grib.file = "Local_File.grib2",file.type = "grib2")
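A quick way to confirm that a download is intact is to look at its last four bytes, since wgrib2 complains about a missing '7777' end section. This is a rough sketch, assuming the helper name ends_with_7777() (hypothetical) and that the file fits comfortably in memory. It only checks the end marker of the final message, not the whole file.

```r
# Hypothetical helper: TRUE if the file ends with the GRIB end marker "7777".
ends_with_7777 <- function(path) {
  size <- file.size(path)
  if (is.na(size) || size < 4) return(FALSE)
  bytes <- readBin(path, what = "raw", n = size)
  identical(tail(bytes, 4), charToRaw("7777"))
}

# Demonstration with two small stand-in files (not real GRIB2 data).
good <- tempfile(fileext = ".grib2")
writeBin(c(charToRaw("GRIB"), as.raw(1:10), charToRaw("7777")), good)
bad <- tempfile(fileext = ".grib2")
writeBin(c(charToRaw("GRIB"), as.raw(1:10)), bad)

ends_with_7777(good)
ends_with_7777(bad)
```

Running the check on "Local_File.grib2" right after download.file() would show whether the transfer mode was the problem before wgrib2 is involved at all.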
I have downloaded the package SemiPar and I have been trying to attach the dataset fuel.frame, using the command data(fuel.frame), but without success. The error I have been getting is:
Error in read.table(zfile, header = TRUE, as.is = FALSE) :
more columns than column names
In addition: Warning messages:
1: In read.table(zfile, header = TRUE, as.is = FALSE) :
line 1 appears to contain embedded nulls
2: In read.table(zfile, header = TRUE, as.is = FALSE) :
line 5 appears to contain embedded nulls
3: In read.table(zfile, header = TRUE, as.is = FALSE) :
incomplete final line found by readTableHeader on 'C:/...
Could you please tell me what is wrong here? I have tried to look for solutions online but it seems the package works for everyone besides myself.
My sessionInfo()
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] SemiPar_1.0-4.1
loaded via a namespace (and not attached):
[1] cluster_1.15.3 grid_3.1.1 lattice_0.20-29 MASS_7.3-33 nlme_3.1-117
[6] tools_3.1.1
Thank you.
The "fuel.frame" file is actually in the ../SemiPar/data/ directory, wherever your library is. You can use the .libPaths() function to find that location. For me it returns:
> .libPaths()
[1] "/Library/Frameworks/R.framework/Versions/3.1/Resources/library"
If you look in there you should see "fuel.frame.txt.gz" which tells you that it's a gzipped file that will expand to a text file (which is what the data() call is doing before passing it to read.table() ). The top of it looks like:
car.name Weight Disp. Mileage Fuel Type
"Eagle Summit 4" 2560 97 33 3.030303 Small
"Ford Escort 4" 2345 114 33 3.030303 Small
"Ford Festiva 4" 1845 81 37 2.702703 Small
"Honda Civic 4" 2260 91 32 3.125000 Small
"Mazda Protege 4" 2440 113 32 3.125000 Small
"Mercury Tracer 4" 2285 97 26 3.846154 Small
"Nissan Sentra 4" 2275 97 33 3.030303 Small
"Pontiac LeMans 4" 2350 98 28 3.571429 Small
As you can see, the error in your message does not occur with my copy. So you may want to expand the .gz file on your system (you did not say which one you use) and investigate. (I was not getting an error with my R 3.1.1 (SnowLeopard build) running on OS X 10.7.5.) With my setup this also succeeds:
data('fuel.frame',
lib.loc='/Library/Frameworks/R.framework/Versions/3.1/Resources/library/')
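To inspect the installed file without leaving R, the first few lines can be read straight from the gzipped file. This is a sketch, assuming the helper name peek_gz() (hypothetical). system.file() returns an empty string when the package or file is not found, in which case the helper returns an empty character vector.

```r
# Hypothetical helper: return the first `n` lines of a gzipped text file.
peek_gz <- function(path, n = 5) {
  if (!nzchar(path) || !file.exists(path)) return(character(0))
  con <- gzfile(path, open = "rt")
  on.exit(close(con))
  readLines(con, n = n, warn = FALSE)
}

# Locate the data file inside the installed SemiPar package, if present.
gz_path <- system.file("data", "fuel.frame.txt.gz", package = "SemiPar")
peek_gz(gz_path)
```

If the lines do not resemble the header and rows shown above, the installed copy is probably damaged, and reinstalling the package would be the next thing to try.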