I am trying to read a BUFR binary file in R for cyclone track forecasts.
I tried reading the file as a raster, since that worked for GRIB files, but without success because the files are not recognized.
I'm getting this error every time I load my Excel files into R:
error openxlsx can only read .xlsx files
However, all the files are .xlsx files and the data looks like it loads OK. As I'm new to R, I'm not sure whether it is safe to ignore this error and treat it as a warning.
I have installed and loaded readxls & openxlsx libraries.
I have been having a problem trying to download a zipped file with American Community Survey (ACS) data. The file is a zipped folder that contains zipped sub-folders within it. I want to download and unzip the file leaving the individual zipped sub-folders. The code I am using is:
ACS.url<-"https://www2.census.gov/programs-surveys/acs/summary_file/2019/data/5_year_entire_sf/Tracts_Block_Groups_Only.zip"
dir<-getwd()
zip.file<-"CTrctACS19.zip"
zip.combine<-as.character(paste(dir,zip.file,sep="/"))
download.file(ACS.url,destfile=zip.combine,mode="wb")
unzip(zip.file)
After running the code I get what appears to be a correct download, but the unzip does not work. I get the following error message:
In unzip(zip.file) : error 1 in extracting from zip file
The downloaded file is only about half the size of the zip file I am trying to download (the original is 3.7G on the Census website, but I have about 1.8G), so I think the data is not downloading correctly. I tried to open the downloaded file with WinZip and that did not work either. I can download files from that website if they are smaller and don't include zipped subfolders. Any help would be appreciated.
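One possible cause, offered as an assumption rather than a confirmed diagnosis, is that the download is cut off by R's download timeout (the `timeout` option, 60 seconds by default). A minimal sketch that raises the timeout and checks the file size before unzipping follows. The helper name `download_ok`, the one-hour timeout and the size threshold are all hypothetical choices.

```r
# Raise the download timeout from the default 60 seconds to one hour
# (the value is an assumption; pick what suits your connection).
options(timeout = 3600)

# Hypothetical helper: TRUE only if the file exists and is at least
# `min_bytes` in size, i.e. the download was probably not truncated.
download_ok <- function(path, min_bytes) {
  file.exists(path) && file.size(path) >= min_bytes
}

# Not run (uses the variables from the question; 3.5e9 is a rough
# lower bound for the 3.7G file, not an exact figure):
# download.file(ACS.url, destfile = zip.combine, mode = "wb")
# if (download_ok(zip.combine, 3.5e9)) unzip(zip.combine, exdir = dir)
```

Checking the size first separates a truncated download from a genuine unzip problem.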
I want to know how to convert an .xlsx file residing in HDFS to a .csv file using an R script.
I tried the XLConnect and xlsx packages, but they give me a 'file not found' error. I am providing the HDFS location as input in the R script when using these packages. I am able to read .csv files from HDFS using an R script (read.csv()).
Do I need to install any new packages to read an .xlsx file stored in HDFS?
Sharing the code I used:
library(XLConnect)
d1=readWorksheetFromFile(file='hadoop fs -cat hdfs://............../filename.xlsx', sheet=1)
"Error: FileNotFoundException (Java): File 'filename.xlsx' could not be found - you may specify to automatically create the file if not existing."
I am sure the file is present in the specified location.
Hope my question is clear. Please suggest a method to resolve it.
Thanks in Advance!
hadoop fs isn't a file, but a command, so it cannot be passed as a file name. Use it to copy the file from HDFS to your local filesystem, either from outside R or from inside R using system(), and then open the local copy of the spreadsheet.
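A minimal sketch of that approach, assuming the hadoop client is on the PATH of the machine running R and that XLConnect is installed. The helper names `hdfs_get_args` and `fetch_from_hdfs` are hypothetical, and `hadoop fs -get` is used here as the copy subcommand.

```r
# Hypothetical helper: build the arguments for `hadoop fs -get`.
hdfs_get_args <- function(hdfs_path, local_path) {
  c("fs", "-get", hdfs_path, local_path)
}

# Hypothetical helper: copy a file out of HDFS and return the local path.
fetch_from_hdfs <- function(hdfs_path,
                            local_path = file.path(tempdir(), basename(hdfs_path))) {
  status <- system2("hadoop", hdfs_get_args(hdfs_path, local_path))
  if (status != 0) stop("hadoop fs -get failed with status ", status)
  local_path
}

# Not run (the HDFS path is left elided, as in the question):
# local_file <- fetch_from_hdfs("hdfs://............../filename.xlsx")
# d1 <- XLConnect::readWorksheetFromFile(local_file, sheet = 1)
# write.csv(d1, "filename.csv", row.names = FALSE)
```

Once the spreadsheet is on the local disk, the readers from the question receive an ordinary file path, which is what they expect.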
I have a couple of .csv files in C:\Users\USER_NAME\Documents, each more than 2 GB in size. I want to use Apache Spark to read the data from them in R. I am using Microsoft R Open 3.3.1 with Spark 2.0.1.
I am stuck on reading the .csv files with the function spark_read_csv(...) defined in the sparklyr package. It asks for a file path that starts with file://. I want to know the proper file path for my case, starting with file:// and ending with the name of a file in the .../Documents directory.
I had a similar problem. In my case it was necessary for the .csv file to be put into the hdfs file system before calling it with spark_read_csv.
I think you probably have a similar problem.
If your cluster is also running HDFS, you need to copy the file there first:
hdfs dfs -put <local file> <HDFS destination>
Best,
Felix
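If Spark runs in local mode rather than on a cluster, a file:// path may be enough. The sketch below only shows how such a URI is usually formed from a Windows path. The helper name `to_file_uri`, the file name data.csv and the table name are hypothetical, and whether Spark accepts the result depends on the setup.

```r
# Hypothetical helper: turn a local path into a file:// URI.
to_file_uri <- function(path) {
  p <- gsub("\\", "/", path, fixed = TRUE)    # backslashes -> forward slashes
  if (!startsWith(p, "/")) p <- paste0("/", p) # drive letters need a leading slash
  paste0("file://", p)
}

uri <- to_file_uri("C:\\Users\\USER_NAME\\Documents\\data.csv")
print(uri)
# "file:///C:/Users/USER_NAME/Documents/data.csv"

# Not run (assumes an existing sparklyr connection `sc`):
# my_data <- sparklyr::spark_read_csv(sc, name = "my_data", path = uri)
```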
Newbie R question: I have been trying to test the R script posted in FlowingData, but the script spit out the following error:
Error: XML content does not seem to be XML: 'NA'
I am running R on my Windows box, with the .gpx files in the same directory as the script. Any help is appreciated.
Not sure if you ever found the answer to this, but the XML error comes from R not knowing where your .gpx files are. The FlowingData post says the script will work if the .gpx files are in the same folder as your saved copy of the R script, but that is not enough. You must also set your working directory to that folder so that R can see your .gpx files. If your FlowingData R script file and .gpx files are in C:\Users\leon\Documents\R, add this line under the library(plotKML) line to set your working directory:
setwd("C:\\Users\\leon\\Documents\\R")
One more note: make sure you only use RunKeeper .gpx files covering a fairly small geographic area, or the plotted data will be insanely small.
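To confirm that R can actually see the .gpx files after setting the working directory, a quick check along these lines may help. This is a minimal sketch; the helper name `find_gpx_files` is hypothetical and not part of the FlowingData script.

```r
# Hypothetical helper: list the .gpx files in a directory and fail with a
# clear message if there are none.
find_gpx_files <- function(dir) {
  files <- list.files(dir, pattern = "\\.gpx$", ignore.case = TRUE,
                      full.names = TRUE)
  if (length(files) == 0) {
    stop("No .gpx files found in ", normalizePath(dir, mustWork = FALSE))
  }
  files
}

# Not run (path taken from the example above):
# setwd("C:\\Users\\leon\\Documents\\R")
# find_gpx_files(getwd())
```

An explicit error here is easier to interpret than the 'XML content does not seem to be XML' message.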