Error in df_parse_dta_file(): Failed to parse C:/Users/folder/data.dta: This version of the file format is not supported - r

I wanted to read in a .dta file in R in order to convert it to a .csv file. First, I tried to do so by using the foreign package, but it reported:
Error in read.dta(file): not a Stata version 5-12 .dta file
So I tried to do it by using teh haven package, but that also failed and reported:
Error in df_parse_dta_file(spec, encoding, cols_skip, n_max, skip, name_repair = .name_repair) : Failed to parse C:/Users/folder/data.dta: This version of the file format is not supported
I also tried to convert it with the rio package:
install.packages("rio")
library(rio)
install_formats()
convert("file.dta","file.csv")
but it reported:
Error in arg_reconcile(haven::read_dta, file = file, ..., .docall = TRUE, :
Failed to parse C:/Users/folder/data.dta: This version of the file format is not supported.
This error was generated by: haven::read_dta
With the following arguments:
"._costs.dta"
Does anyone know how to import such .dta files in R so that one can convert a .csv file ?
PS: The preamble of the .dta-file looks like this:
<stata_dta>118LSFM 23 Apr 2019 16:22

Try adding encoding = "UTF-8" or encoding = "Latin1" inside the read_dta() function to tell R import same data without encoding into numbers. It might take a little while to clean data tho :(

Related

R openxlsx error using "loadWorkbook": WARNING: simpleWarning in unzip(file, exdir = xmlDir): error 1 in extracting from zip file

Attempting to load data from an Excel spreadsheet in R using openxlsx, I get the following error/warning,
WARNING: simpleWarning in unzip(file, exdir = xmlDir): error 1 in extracting from zip file. This is not a zipped file.
Why am I getting this error?
This error is not related to zip-issues. Instead, this is probably a file-lock error. Check to be certain that the workbook in question is not currently open. If so, close it and try again.

Reading .xls-file in R

I am trying to read a .xls-file into a R dataframe. I've tried:
library(readxl)
dfTest <- readxl::read_excel("file_path/file.xls")
Which gives me:
Error:
filepath: file_path/file.xls
libxls error: Unable to open file
Next I tried:
library(xlsx)
dfTest <- xlsx::read.xlsx("file_path/file.xls",1)
Which results in:
Error in .jcall("RJavaTools", "Ljava/lang/Object;", "invokeMethod", cl, :
java.io.IOException: block[ 1462 ] already removed - does your POIFS have circular or duplicate block references?
I tried:
library(openxlsx)
dfTest <- openxlsx::read.xlsx("file_path/file.xls")
Which results in:
Error in read.xlsx.default("file_path/file.xls") :
openxlsx can not read .xls or .xlm files!
Last thing that I tried was:
library(RODBC)
conn <- odbcConnectExcel("file_path/file.xls")
Which gives me:
Error in odbcConnectExcel("file_path/file.xls") :
odbcConnectExcel is only usable with 32-bit Windows
Would anyone have an idea how I can read the Excel file? Saving the file as .csv-file and loading it into R works perfectly fine. However, I have a large amount of files that I ultimately want to read and process in a loop. Saving all by hand as .csv is teadious to say the least.
I'm restricted in changing the software installations on the computer I'm working on.
I believe for .xls files read_delim from the readr package should work.
For example:
readr::read_delim("file_path/file.xls",as.is=TRUE)

how to import .rec files in R

I have a .rec file that I want to import into R. I have saved the .rec file to my working directory. This is what I have tried.
library(foreign)
library(RODBC)
data.test <- read.epiinfo("data_in.rec")
I get this error:
Error in if (headerlength <= 0L)
stop("file has zero or fewer variables: probably not an EpiInfo file") :
missing value where TRUE/FALSE needed
In addition: Warning messages:
1:
In readLines(file, 1L, ok = TRUE) :
line 1 appears to contain an embedded nul
2:
In strsplit(line, " ") : input string 1 is invalid in this locale
I have looked online and in the read.epiinfo help package in R. The help package says
Some later versions of Epi Info use the Microsoft Access file format
to store data. That may be readable with the RODBC package.
I have two questions.
1. Is the error I am getting because the .rec file I have is from an Epi Info version later than 6?
2. How do I use the RODBC library to open the .rec file?
The .rec (or .REC) file turned out to be a .EDF (European Data Format) file type. It was easily opened in R using the library edfReader. The edfReader library help file is very useful for opening the file and extracting the time series data. See code below for what I used. Code was adapted from the help file.
install.packages('edfReader')
library(edfReader)
?edfReader
lib.dir <- system.file("data_in.rec",package="edfReader")
Cfile <- paste(lib.dir,'/edfPlusC.edf',sep='')
CHdr <- readEdfHeader("data_in.rec")
CSignals <- readEdfSignals(CHdr)
summary(CSignals)

R Importing excel file directly from web

I need to import excel file directly from NYSE website. The spreadsheet url is https://quotespeed.morningstar.com/exportChartDataToExcel.jsp?tickers=AAPL&symbols=126.1.AAPL&st=1980-12-1&ed=2015-6-8&f=m&dty=1&types=1&ver=1.6.0&qs_wsid=E43474CC03753FE0E777D89877788ECB . Tried using gdata package and changing https to http but still doesnt work. Does anybody know solution to such issue?
EDIT: Has to be imported to R directly from website (project requirement)
Without information about why using the gdata package does not work for you I have to assume. Make sure you have Perl installed - you can download it at http://www.activestate.com/activeperl
This works for me:
library('gdata')
## URL broken into multiple lines for readability
url <- paste("https://quotespeed.morningstar.com/exportChartDataToExcel.",
"jsp?tickers=AAPL&symbols=126.1.AAPL&st=1980-12-1&ed=2015-",
"6-8&f=m&dty=1&types=1&ver=1.6.0&qs_wsid=E43474CC03753FE0E",
"777D89877788ECB", sep = "")
url <- gsub("https", "http",url)
data <- read.xls(url, perl = "C:/Perl64/bin/perl.exe")
Without perl = "path_to_perl.exe" I got the error
Error in findPerl(verbose = verbose) :
perl executable not found. Use perl= argument to specify the correct path.
Error in file.exists(tfn) : invalid 'file' argument
Use the RCurl package to download the file and the readxl package by Hadley to read the excel file

Cannot read data from an xlsx file in RStudio

I have installed the required packages - gdata and ggplot2 and I have installed perl.
library(gdata)
library(ggplot2)
# Read the data from the excel spreadsheet
df = data.frame(read.xls ("AssignmentData.xlsx", sheet = "Data", header = TRUE, perl = "C:\\Strawberry\\perl\\bin\\perl.exe"))
However when I run this I get the following error:
Error in xls2sep(xls, sheet, verbose = verbose, ..., method = method, :
Intermediate file 'C:\Users\CLAIRE~1\AppData\Local\Temp\RtmpE3UYWA\file8983d8e1efc.csv' missing!
In addition: Warning message:
running command '"C:\STRAWB~1\perl\bin\perl.exe" "C:/Users/Claire1992/Documents/R/win-library/3.1/gdata/perl/xls2csv.pl" "AssignmentData.xlsx" "C:\Users\CLAIRE~1\AppData\Local\Temp\RtmpE3UYWA\file8983d8e1efc.csv" "Data"' had status 2
Error in file.exists(tfn) : invalid 'file' argument
Thanks to #Stibu I realised I had to set my work directory. This is the command you use to run in Rstudio; setwd("C/Documents..."). The file path is where the excel file is located.
I had the issue but I solved it differently.
My problem was because my file was saved as Excel (extension .xls) but it was a txt file.
I corrected the file and I did not meet any other error with the R function.

Resources