Issues in R with saving split dataframes as new files

I've searched a bit for answered questions related to this, but I still keep running into issues.
I have a data frame with 1.4 million rows loaded into R, containing GPS route data for ~56 vehicles. I used the split() function to break the data into smaller chunks by bus name (example bus name: '1367/E0007489'), with the following line of code:
dfs <- split(sater001_paired, f=sater001_paired[, "vehicleName"])
Where sater001_paired is my data frame and vehicleName is the variable I split on. The number of rows in each chunk is uneven, because the data was captured in real time.
The problem I'm facing now is saving each of these chunks into its own .csv file. I tried using lapply as follows:
lapply(names(dfs), function(x){write.table(dfs[[x]], file = paste("bus", x, sep = ""))})
But R returns the error message "cannot open the connection". I'm likely missing something, as I'm very rusty with lapply.
Any suggestions based off this?

MrFlick has helped me realize the issue I was having here.
So just to close this: the vehicleName column contained a forward slash halfway through each identification code. A forward slash in a file name is read as a path separator, so write.table tried to open a file inside a directory (e.g. bus1367) that did not exist, hence "cannot open the connection". I did not spot this at first, as I have only recently switched to RStudio on Windows from primarily using Mac OS.
Replacing the slash with gsub before splitting fixed it:
sater001_paired$vehicleName <- gsub('/', '-', sater001_paired$vehicleName)
The issue is now resolved. Thanks again to MrFlick for the help.
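For anyone landing here, the whole workflow might look roughly like the sketch below. The toy data frame, the out_dir folder and the added .csv extension are assumptions for illustration; only the gsub and split calls come from the posts above.

```r
# Toy stand-in for sater001_paired (values are made up for illustration)
sater001_paired <- data.frame(
  vehicleName = c("1367/E0007489", "1367/E0007489", "1402/E0007512"),
  speed = c(12.5, 14.1, 9.8),
  stringsAsFactors = FALSE
)

# Replace the slash so each name can be used as a file name
sater001_paired$vehicleName <- gsub("/", "-", sater001_paired$vehicleName, fixed = TRUE)

# One data frame per vehicle
dfs <- split(sater001_paired, sater001_paired$vehicleName)

# Hypothetical output folder; a temp directory keeps the example side-effect free
out_dir <- file.path(tempdir(), "bus_csv")
dir.create(out_dir, showWarnings = FALSE)

# invisible() suppresses the list of NULLs that lapply would otherwise print
invisible(lapply(names(dfs), function(x) {
  write.csv(dfs[[x]],
            file = file.path(out_dir, paste0("bus", x, ".csv")),
            row.names = FALSE)
}))

written <- list.files(out_dir)
```

Building the path with file.path() and adding the extension explicitly makes it easier to see exactly which file name R is trying to open when something goes wrong.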

Related

Using rlist on a list of imported Excel sheets causes problems when filtering on certain names

I have dug into rlist and purrr and found them quite helpful for working with lists of pre-structured data. I tried to solve the problems that came up on my own to improve my coding skills, so thanks to the community for helping out! However, I have now reached a dead end:
I want to write code such that we just drop our Excel files in .xlsm format into a folder and R does the rest.
I import my data using:
library(readxl)
vec.files <- list.files(pattern = "\\.xlsm$")
vec.numbers <- gsub("\\.xlsm$", "", vec.files)
list.alldata <- lapply(vec.files, read_excel, sheet = "XYZ")
names(list.alldata) <- vec.numbers
The data we read is a combination of characters, dates (...).
When I use the rlist package, everything works fine until I try to filter on column names which, in the Excel file, were not fixed entries (e.g. Measurable 1) but references to another cell (e.g. =Table1!A1, or a reference).
If I try to call such an element I get this error:
list.map(list.alldata, NameWhichWasAReferenceToAnotherFieldBefore)
Error in eval(.expr, .evalwith(.data), environment()) :
object 'Namewhichwasareferencetoanotherfieldbefore' not found
I am quite surprised, as if I call
names(list.alldata[[1]])
I get a vector with the correct entries / names.
As I identified read_excel() as the cause of the problem, I tried adding col_names=TRUE, but it did not help. With col_names=FALSE the correct entries are also read into the dataset.
I assume that exporting the data as .csv would help, but this is not an option. Can this easily be done by R in a pre-loop?
In my workflow, accessing the data by name is essential and there is no workaround, so I really appreciate your help!

For loop data frame error when working with character vectors

I am currently doing data science with R and I generally write loops to access multiple files or objects at once. Normally this goes without any problems but recently a problem occurred when trying to run the following code:
setwd(PROJECT_FOLDER)
climate_forcing <- c("cf-1", "cf-2", "cf-3", "cf-4")
# load all mean stacks from IM and create rasterstack
for (i in 1:NROW(climate_forcing)) {
  setwd(PROJECT_FOLDER)
  setwd(paste0("time frames mcor/X variable/IM/", climate_forcing[i], "/ncstack/"))
  file.names <- list.files(pattern = ".nc", recursive = TRUE, full.names = FALSE) # list all files with ".nc"
  stopwords <- c(".nc", "stack", "/dLAI") # stopwords
  names.short <- gsub(paste(stopwords, collapse = "|"), "", file.names)
  assign("names.short", paste0(names.short, climate_forcing[i]))
  for (j in 1:NROW(file.names)) {
    assign(paste0(names.short[j], "_stack"), stack(file.names[j]))
  }
}
Error message returned:
Error in data.frame(values = unlist(unname(x)), ind, stringsAsFactors = FALSE) :
arguments imply differing number of rows: 1, 0
I wrote this a while ago and ran it before, and I think it used to work, since the files created by a similar script are there.
Anyway, I did some testing and it seems that the error occurs in the inner for loop (the one over j). I am unsure what causes this bug, but it must have something to do with "file.names" and "names.short", right? When I compare them, their properties appear to be identical, which I would expect, since I create the latter from the former. I create them this way because I want to create objects that read the corresponding files in file.names.
The error I get refers to data.frame, which confuses me because I'm working with character vectors here.
Maybe somebody with more experience can figure this issue out.
Thanks for any help and if there are any questions I will try to answer them.

Alright it turns out something was wrong with the R packages, I reinstalled and reloaded them (raster) and now it works. Thanks to anyone for your contributions!
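One plausible explanation for the data.frame message (not confirmed by the poster) is that stack() resolved to the base utils::stack() rather than raster's stack(), for example because raster had not loaded properly. The sketch below calls the utils version on a hypothetical file name, "example.nc", to show that it fails when given a bare character string.

```r
# Call the utils version explicitly on a file name, as would happen if
# raster's stack() were not available in the session.
# "example.nc" is a hypothetical name; no file is read.
res <- tryCatch(utils::stack("example.nc"), error = function(e) e)

is_error <- inherits(res, "error")
msg <- if (is_error) conditionMessage(res) else NA_character_
print(msg)

# With raster installed, writing raster::stack(file.names[j]) in the loop
# would make the intended function explicit.
```

Qualifying the call with the package name also makes the script fail early with a clear message if the package is missing.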

Remove everything after an empty row on a list in R

I have a quick question that I cannot figure out. I am reading some results from an output file using the code below and storing them as a list in R, as can be seen in the picture. I want to delete all of the information after an empty row; in other words, everything after line 42:
Does anybody know anything that I could use? I tried using gsub but I was not very successful.
Thanks for all of the help. I am new to programming in R, so any help is very much appreciated.
LoadFFA <- function(filename, folder.out, TYPE = "PeakFQ_17C",
                    colStandard = TRUE) { # standardize column output names
  require(data.table)
  if (grepl("PEAKFQSA", TYPE)) { # PeakfqSA Bulletin 17C analysis
    text.list <- lapply(fileinput, readLines)
    skip.rows <- sapply(text.list, grep, pattern = '^Ann. Exc. Prob.\\s+EMA Est.') - 1
    PFA <- lapply(seq_along(text.list), function(i) read.delim(fileinput[i], skip = skip.rows[i], sep = "\n", stringsAsFactors = TRUE, blank.lines.skip = FALSE))
  }
}
EDIT
I don't know if I could upload directly so here is the google drive link.
Also, here is the command to run the function: LoadFFA("03606500peaks.out","D:/Documents/hydraulic.failures","PEAKFQSA"). The screenshot is the result of print(PFA).
The reason I am using a loop is that I am reading multiple output files, which contain a lot of data and have different lengths. I read the data beginning at Ann.Exc.Prob. and, as per the screenshot provided, I would like to stop after line 42 (at the first completely empty row). I hope that clears up some confusion.
Basically: read the output files, start reading at "Ann.Exc.Prob" and stop at the end of that block of data (line 42 for this particular file). I am using a function because I run this several times.
Again, sorry for the trouble. Thank you for your time and I appreciate your patience.
https://drive.google.com/file/d/1PGbGWIHFj7IQRevTAEfqqA9Okg4fz7Mg/view?usp=sharing
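No answer was posted for this question. One way to approach it, sketched under the assumption that the block of interest ends at the first blank line after the header, is to cut the character vector returned by readLines before parsing it. The helper name keep_until_blank and the sample lines are made up for illustration; only the header pattern comes from the question.

```r
# Hypothetical helper: keep only the lines before the first blank line
keep_until_blank <- function(lines) {
  blank <- which(!nzchar(trimws(lines)))   # indices of empty / whitespace-only lines
  if (length(blank) == 0) return(lines)    # no blank line: keep everything
  lines[seq_len(blank[1] - 1)]
}

# Made-up lines standing in for readLines() output
lines <- c("preamble",
           "Ann. Exc. Prob.   EMA Est.",
           "0.5  100",
           "0.1  250",
           "",
           "trailing notes")

# Start at the header row, then drop everything from the first blank line on
start <- grep("^Ann. Exc. Prob.\\s+EMA Est.", lines)[1]
block <- keep_until_blank(lines[start:length(lines)])
print(block)
```

The trimmed block could then be handed to read.table(text = block, ...) or a similar parser, depending on how the columns are laid out in the real file.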

How can I import data into R that is meant for use in SAS, SPSS, or STATA?

I am attempting to read data from the National Health Interview Survey in R: http://www.cdc.gov/nchs/nhis/nhis_2011_data_release.htm . The data is Sample Adult. The SAScii library actually has a function read.SAScii whose documentation has an example for the same data set I would like to use. The issue is it "doesn't work":
NHIS.11.samadult.SAS.read.in.instructions <-
  "ftp://ftp.cdc.gov/pub/Health_Statistics/NCHS/Program_Code/NHIS/2011/SAMADULT.sas"
NHIS.11.samadult.file.location <-
  "ftp://ftp.cdc.gov/pub/Health_Statistics/NCHS/Datasets/NHIS/2011/samadult.zip"
# store the NHIS file as an R data frame!
NHIS.11.samadult.df <-
  read.SAScii(
    NHIS.11.samadult.file.location,
    NHIS.11.samadult.SAS.read.in.instructions,
    zipped = TRUE
  )
# or store the NHIS SAS import instructions for use in a
# read.fwf function call outside of the read.SAScii function
NHIS.11.samadult.sas <- parse.SAScii(NHIS.11.samadult.SAS.read.in.instructions)
# save the data frame now for instantaneous loading later
save(NHIS.11.samadult.df, file = "NHIS.11.samadult.data.rda")
However, when running it I get the error: Error in toupper(SASinput) : invalid multibyte string 533.
Others on Stack Overflow with a similar error, but for functions such as read.delim and read.csv, have recommended changing the fileEncoding argument, for example to fileEncoding="latin1". The problem is that read.SAScii has no fileEncoding parameter.
See:
R: invalid multibyte string and Invalid multibyte string in read.csv

Just in case anyone has a similar problem: the solution for me was to run options( encoding = "windows-1252" ) right before running the read.SAScii code above, since the ASCII file is meant for use in SAS, and therefore on Windows, whereas I am using Linux.
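Because options(encoding = ...) changes a session-wide default, it can be worth restoring the previous value afterwards. A minimal sketch, assuming a small wrapper of my own naming (with_encoding is not part of SAScii or base R):

```r
# Hypothetical helper: evaluate an expression with a temporary default encoding
with_encoding <- function(enc, expr) {
  old <- options(encoding = enc)      # options() returns the previous setting
  on.exit(options(old), add = TRUE)   # restore it even if expr fails
  force(expr)                         # expr is evaluated lazily, i.e. here
}

before <- getOption("encoding")
inside <- with_encoding("windows-1252", getOption("encoding"))
after  <- getOption("encoding")

# In the real case the second argument would be the read.SAScii(...) call.
```

This keeps the Windows encoding scoped to the one call that needs it, so later file reads in the same session are unaffected.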
The author of the SAScii library also has another GitHub repository, asdfree, with working code for downloading CDC-NHIS datasets for all available years, as well as many other datasets from various surveys such as the American Housing Survey, FDA Drug Surveys, and many more.
The following links to the author's solution to the issue in this question. From there, you can easily find a link to the asdfree repository: https://github.com/ajdamico/SAScii/issues/3 .
As far as this dataset goes, the code in https://github.com/ajdamico/asdfree/blob/master/National%20Health%20Interview%20Survey/download%20all%20microdata.R#L8-L13 does the trick; however, it doesn't encode the columns as factors or numeric properly. The good thing is that for any given dataset in an NHIS year there are only around ten to twenty numeric columns, so converting these to numeric one by one is not so painful, and converting the rest of the columns to factors requires only a loop over the non-numeric columns.
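The conversion described above could be sketched as follows. The data frame and the column names are made up for illustration, and treating every non-numeric column as a factor is my reading of the paragraph, not something taken from the asdfree code.

```r
# Made-up stand-in for an imported NHIS table where everything arrived as character
df <- data.frame(
  AGE_P  = c("25", "40", "63"),
  SEX    = c("1", "2", "2"),
  REGION = c("3", "1", "4"),
  stringsAsFactors = FALSE
)

# Columns known to be truly numeric (listed by hand, as the post suggests)
numeric_cols <- c("AGE_P")
df[numeric_cols] <- lapply(df[numeric_cols], as.numeric)

# Everything else is treated as categorical
other_cols <- setdiff(names(df), numeric_cols)
df[other_cols] <- lapply(df[other_cols], factor)

str(df)
```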
The easiest solution for me, since I only require the Sample Adult dataset for 2011, and I was able to get my hands on a machine with SAS installed, was to run the SAS program included at http://www.cdc.gov/nchs/nhis/nhis_2011_data_release.htm to encode the columns as necessary. Finally, I used proc export to export the sas dataset onto a CSV file which I then opened in R easily with no necessary edits to the data except in dealing with missing values.
In case you want to work with NHIS datasets besides Sample Adult, it is worth noting that when I ran the available SAS program for the 2010 "Sample Adult Cancer" file (http://www.cdc.gov/nchs/nhis/nhis_2010_data_release.htm) and exported the data to a CSV, there were fewer column names than actual columns when I attempted to read the CSV file into R. Skipping the first line resolves this issue, but you lose the descriptive column names. You can, however, import this same data easily, without encoding, with the R code in the asdfree repository. Please read the documentation there for more info.

Using read.ssd to convert SAS data set into an R data.frame

I wish to read data into R from SAS data sets in Windows. The read.ssd function allows me to do so, however, it seems to have an issue when I try to import a SAS data set that has any non-alphabetic symbols in its name. For example, I can import table.sas7bdat using the following:
directory <- "C:/sas data sets"
sashome <- "/Program Files/SAS/SAS 9.1"
table.df <- read.ssd(directory, "table", sascmd = file.path(sashome, "sas.exe"))
but I can't do the same for a SAS data set named table1.sas7bdat. It returns an error:
Error in file.symlink(oldPath, linkPath) :
symbolic links are not supported on this version of Windows
Given that I do not have the option to rename these data sets, is there a way to read a SAS data set that has non-alphabetic symbols in its name into R?

Looking about, it looks like others have your problem as well. Perhaps it's just a bug.
Anyway, try the suggestion from this (old) R help post, posted by the venerable Dan Nordlund, who's pretty good at this stuff and is also active on SASL (sasl@listserv.uga.edu) if you want to try cross-posting your question there.
https://stat.ethz.ch/pipermail/r-help/2008-December/181616.html
Also, you might consider the transport method if you don't mind 8 character long variable names.

Use:
directory <- "C:/sas data sets"
sashome <- "/Program Files/SAS/SAS 9.1"
table.df <- read.ssd(library=directory, mem="table1", formats=F,
                     sasprog=file.path(sashome, "sas.exe"))
