SASxport to R: Errors while reading XPT SAS file

Does anyone know how to ignore or skip errors while reading a SAS export format (XPT) file into R?
require(SASxport)
# cols is a character vector of variable names to keep
asc <- SASxport::read.xport("..\\LLCP2018.XPT_", keep = cols)
Checking if the specified file has the appropriate header
Extracting data file information...
Reading the data file...
Error in `[.data.frame`(ds, whichds) : undefined columns selected
I have many columns and don't want to check one by one whether each actually exists in the file.
I would like to ignore the missing ones, but the function has no option for that.
EDIT
Found an easy solution:
lu = SASxport::lookup.xport(xfile)
Now I can choose from lu$names and intersect with cols. Still, not every variable can be read, but it's better.
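A minimal sketch of that intersection step, assuming lu$names holds the variable names (as the edit above suggests) and cols is the desired column set:
lu <- SASxport::lookup.xport(xfile)
present <- intersect(cols, lu$names)                # keep only variables that exist in the file
asc <- SASxport::read.xport(xfile, keep = present)  # read just those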
But when I choose a few (verified) columns, I get another error I am unable to skip:
Error in if (any(tooLong)) { : missing value where TRUE/FALSE needed
Why does this stop the reading process and return NULL?
EDIT 2
Found a workaround using the same function from a different package:
asc <- foreign::read.xport(xfile)
It works but, unfortunately, loads the whole dataset - if there is some size limitation, there's probably nothing I can do about that.
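Since foreign::read.xport() returns an ordinary data frame (assuming the file holds a single dataset), one hedged follow-up is to subset right after loading:
asc <- foreign::read.xport(xfile)
asc <- asc[, intersect(cols, names(asc)), drop = FALSE]  # drop the unwanted columns afterwards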

Related

Using rlist on a list of imported Excel sheets causes problems when filtering on certain names

I have dug into rlist and purrr and found them quite helpful for working with lists of pre-structured data. I have tried to solve the problems that arise on my own, to improve my coding skills - so thanks to the community for helping out! However, I have reached a dead end now:
I want to write code such that we can throw our Excel files (in xlsm format) into a folder and R does the rest.
I import my data using:
library(readxl)                                  # provides read_excel()
vec.files <- list.files(pattern = "\\.xlsm$")    # all .xlsm files in the folder
vec.numbers <- gsub("\\.xlsm$", "", vec.files)   # file names without the extension
list.alldata <- lapply(vec.files, read_excel, sheet = "XYZ")
names(list.alldata) <- vec.numbers
The data we read is a combination of characters, dates (...).
When I try to use the rlist package, everything works fine until I try to filter on names that were not a fixed entry in the Excel file (e.g. Measurable 1) but a reference to another field (e.g. =Table1!A1, or a reference).
If I try to call such an element I get this error:
list.map(list.alldata, NameWhichWasAReferenceToAnotherFieldBefore)
Error in eval(.expr, .evalwith(.data), environment()) :
object 'Namewhichwasareferencetoanotherfieldbefore' not found
I am quite surprised because, if I call
names(list.alldata[[1]])
I get a vector with the correct entries / names.
As I identified read_excel() as the likely cause of the problem, I tried adding col_names = TRUE, but it did not help. Even col_names = FALSE brings the correct entries into the dataset.
I assume that exporting the data as a .csv would help, but this is not an option. Can this be done easily by R in a pre-loop?
In my way of working, accessing the data by name is essential and there is no workaround, so I really appreciate your help!
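Not from the question itself, but a hedged workaround sketch: purrr can extract list elements by a name given as a string, which sidesteps the non-standard evaluation that list.map() relies on:
library(purrr)
# the long name below stands in for the formula-derived header from the question
map(list.alldata, "NameWhichWasAReferenceToAnotherFieldBefore")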

How to set a readable xlsx range in read.xlsx() in openxlsx

I am using the read.xlsx() function to read an xlsx file with the colNames = FALSE, rowNames = TRUE arguments. Everything was fine, but after adding a row of data it pops up an error saying
Error in ".rowNamesDF<-"(x, value = value) :
missing values in 'row.names' are not allowed
When I check the problem with View() and rowNames = FALSE, I find that the last row was introduced as an NA row. However, the manual for read.xlsx() doesn't say how to define a range, and I can't do something like read.xlsx()[1:ncol(), ] either, so I don't know what to do.
My trials:
I tried to delete the last row in the xlsx file, but R keeps saying a missing value is introduced.
I know I could use the rowNames = FALSE argument first, remove the last row, and then assign the row names myself, but I don't want to do that because I think there is a better solution.
Can you provide an example of the data contained in your Excel file?
Then I can try something based on your data. If I understood correctly, you want to add a line at the end of it, right?
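A hedged sketch of restricting the range: openxlsx's read.xlsx() accepts rows and cols arguments that limit the read to a rectangle, so a trailing NA row can simply be left out (the file name and ranges below are made-up examples):
library(openxlsx)
df <- read.xlsx("myfile.xlsx", colNames = FALSE, rowNames = TRUE,
                rows = 1:10,  # read only these spreadsheet rows
                cols = 1:5)   # and only these columns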

R save() not producing any output but no error

I am brand new to R and I am trying to run some existing code that should clean up an input .csv and then save the cleaned data to a different location as an .RData file. This code ran fine for its previous owner.
The code seems to be pulling in the .csv and cleaning it just fine. It also looks like the save runs (there are no errors), but there is no output in the specified location. I thought maybe R was having trouble finding the location, but it pulls the input data okay and the destination is just a subfolder.
After a full day of extensive Googling, I can't find anything about a save simply not working.
Example code below:
save(data, file = "C:\\Users\\my_name\\Documents\\Project\\Data.RData", sep="")
Hard to believe you don't see any errors - unless something has switched errors off:
> data = 1:10
> save(data, file="output.RData", sep="")
Error in FUN(X[[i]], ...) : invalid first argument
It's a misleading error: the problem is the third argument, which doesn't do anything here. Remove it and it works:
> save(data, file="output.RData")
>
sep is an argument used when writing CSV files, to separate columns. save() writes binary data, which doesn't have rows and columns.
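For contrast, a minimal sketch of where a separator does belong - the delimited-text writers (object and file names are made up):
data <- 1:10
save(data, file = "output.RData")                  # binary .RData; no sep argument
write.table(data, file = "output.txt", sep = ",")  # delimited text; sep separates columns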

Reading large csv file with missing data using bigmemory package in R

I am using large datasets for my research (4.72 GB) and I discovered the bigmemory package in R, which supposedly handles large datasets (up to the range of 10 GB). However, when I use read.big.matrix to read a csv file, I get the following error:
> x <- read.big.matrix("x.csv", type = "integer", header=TRUE, backingfile="file.bin", descriptorfile="file.desc")
Error in read.big.matrix("x.csv", type = "integer", header = TRUE, :
  Dimension mismatch between header row and first data row.
I think the issue is that the csv file is not complete, i.e., it is missing values in several cells. I tried removing header = TRUE, but then R aborts and restarts the session.
Does anyone have experience with reading large csv files with missing data using read.big.matrix?
It may not solve your problem directly, but you might find a package of mine, filematrix, useful. The relevant function is fm.create.from.text.file.
Please let me know if it works for your data file.
Did you check the bigmemory PDF at https://cran.r-project.org/web/packages/bigmemory/bigmemory.pdf?
It is clearly described right there:
write.big.matrix(x, 'IrisData.txt', col.names=TRUE, row.names=TRUE)
y <- read.big.matrix("IrisData.txt", header=TRUE, has.row.names=TRUE)
# The following would fail with a dimension mismatch:
if (FALSE) y <- read.big.matrix("IrisData.txt", header=TRUE)
Basically, the error means there is a column in the CSV file holding row names. If you don't pass has.row.names = TRUE, bigmemory treats the row-name column as a data column, so the header row has one entry fewer than the first data row and you get the mismatch.
I personally found the data.table package more useful for dealing with large datasets; YMMV.
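A hedged sketch of that data.table route, assuming the x.csv from the question:
library(data.table)
x <- fread("x.csv", header = TRUE)  # fread() tolerates missing cells and scales to multi-GB files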

Error exporting data.frame as csv

Exporting a data.frame as .csv with this code:
write.csv(df, "name.csv")
LogitTV.Rda has 3000 rows and 4 columns.
My code throws an error when identifying the data.frame:
load("~/Home Automation/LogitTV.Rda")
write.csv(LogitTV.Rda, "LogitTV.csv")
Error in is.data.frame(x) : object 'LogitTV.Rda' not found
Checked the following:
1) Cleaned the console of previous history
2) Working directory set to ~/Home Automation/
Anything else to check to prevent the error?
Thanks
LogitTV.Rda is, confusingly, not the name of the object that gets loaded.
Try:
loadedObj <- load("~/Home Automation/LogitTV.Rda")
write.csv(get(loadedObj), file="LogitTV.csv")
This assumes that the .Rda file contains only a single R object, and that it is a data frame or matrix.
It would be nice if write.csv had a way to accept the name of an object instead of the object itself (so that get() was unnecessary), but I don't know of one.
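A small usage sketch building on this: load() returns the names of everything it restored, so with a multi-object .Rda you can inspect those names first (picking the first object is an assumption):
loadedObjs <- load("~/Home Automation/LogitTV.Rda")
print(loadedObjs)            # names of all restored objects
df <- get(loadedObjs[1])     # assumes the first one is the data frame
write.csv(df, file = "LogitTV.csv")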
