Writing an RDA to CSV in R

I'm trying to write a script to load an RDA by filename and write an equivalent file out as CSV.
It's almost there (the loading and writing work); however, the output CSV contains the character vector of object names returned by load(), rather than the data frame the file actually contains...
$ cat convert.r
#!/usr/bin/env Rscript
argv <- commandArgs(TRUE)
inFile <- toString(argv[1])
print(inFile)
outFile <- gsub(".rda$", ".csv", inFile)
print(outFile)
inData <- load(inFile)
write.csv(inData, file=outFile)
This is the command + output...
$ ./convert.r data.rda
[1] "data.rda"
[1] "data.csv"
[1] "table.data"
So you can see it's picking up the input filename from the arguments and creating the right output filename, but inData holds only the name of the loaded object, table.data, not the object itself. When write.csv runs, the file just contains this:
$ cat data.csv
"","x"
"1","table.data"
How do I make write.csv pick up the data frame from the rda file? I understand there is a risk if the RDA contains more than one frame - maybe it should loop over them and write file-frame.csv for each?!

Clearly the best way to solve a problem yourself is to publicly ask a question.
The answer is: get()
inData <- load(inFile)
write.csv(get(inData), file=outFile)
(Answer found on https://stackoverflow.com/a/6316461/224707 by greg-snow)
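To handle the case where the .rda contains more than one object, a minimal sketch (an extension of the accepted approach, assuming every saved object is something write.csv can serialise, such as a data frame) loops over the names returned by load() and writes one file-frame.csv per object:
#!/usr/bin/env Rscript
argv <- commandArgs(TRUE)
inFile <- toString(argv[1])
# load() returns a character vector with the names of the objects it created
objNames <- load(inFile)
for (objName in objNames) {
  # e.g. data.rda containing table.data becomes data-table.data.csv
  outFile <- gsub(".rda$", paste0("-", objName, ".csv"), inFile)
  write.csv(get(objName), file = outFile)
}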

Related

Writing an R function that takes in user input file path, then writing them out to csv?

The following is part of an R script that I wrote that takes in files in dta and sav format, then converts them into CSVs
##### Reading in the data ####
#Sometimes you may need to load in multiple files and merge them. Loading in second
# dta and merging process has been commented out
dta_data <- read_dta("/ihme/limited_use/IDENT/PROJECT_FOLDERS/UNICEF_MICS/UZB/2021_2022/UZB_MICS6_2021_2022_HH_Y2022M12D27.DTA")
#dta_data_2 <- read_dta("/ihme/limited_use/IDENT/PROJECT_FOLDERS/WB_LSMS_ISA/MWI/2019_2020/MWI_LSMS_ISA_IHS5_2019_2020_HH_GEOVARIABLES_Y2022M10D26.DTA")
#merged_MWI <- merge(dta_data, dta_data_2, by = "ea_id", all.x = TRUE)
write_csv(x=dta_data, path = "UZB_MICS6_2021_2022_HH_Y2022M12D27.csv")
#This loads in SAV files, as well as checking where the output will be saved.
sav_data <- read_sav("/home/j/DATA/WB_LSMS/JAM/2012/JAM_JSLC_2012_ANNUAL_Y2022M11D29.SAV")
getwd() # this is the folder it will save into unless you specify otherwise in the path below
write_csv(x=sav_data, path="JAM_JSLC_2012_ANNUAL_Y2022M11D29.csv")
I want to re-structure this bit to take in user input on where the original files are and where they should be written out to, and I am not quite sure how to do this. I want an if/else where R determines whether the input_file is in dta or sav format; then, depending on the format, either use read_dta or read_sav, save the result to dta_or_sav, and finally write it out as a CSV to output_path.
I have some rough ideas:
convert_to_csv <- function(input_file, output_path) {
dta_or_sav <- read_dta(input_file)
write_csv(x=dta_or_sav, path=output_path)
}
I have no idea where to go from here.
You can make the function you want with grepl(), which checks the pattern of the filename (.DTA or .SAV).
library(haven)
convert_to_csv <- function(input_file, output_path) {
# .DTA -> read_dta()
# .SAV -> read_sav()
if (grepl('\\.DTA$', input_file, ignore.case = TRUE)) {
input <- read_dta(input_file)
} else {
input <- read_sav(input_file)
}
# Export to CSV
write.csv(x = input, file = output_path)
}
Note that output_path should contain the filename (ending with .csv) along with your target path.
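A slightly more defensive variant (a sketch, not from the original answer) keys off the file extension with tools::file_ext() so the match is anchored and case-insensitive, and fails loudly for anything else; the paths in the example call are placeholders:
library(haven)   # read_dta(), read_sav()
library(readr)   # write_csv()

convert_to_csv <- function(input_file, output_path) {
  ext <- tolower(tools::file_ext(input_file))
  if (ext == "dta") {
    dta_or_sav <- read_dta(input_file)
  } else if (ext == "sav") {
    dta_or_sav <- read_sav(input_file)
  } else {
    stop("Unsupported file type: ", ext)
  }
  write_csv(dta_or_sav, output_path)
}

# Example call with placeholder paths
convert_to_csv("input/UZB_MICS6_2021_2022_HH.DTA", "output/UZB_MICS6_2021_2022_HH.csv")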

Using unz() to read in SAS data set into R

I am trying to read in a data set from SAS using the unz() function in R. I do not want to unzip the file. I have successfully used the following to read one of them in:
dir <- "C:/Users/michael/data/"
setwd(dir)
dir_files <- as.character(unzip("example_data.zip", list = TRUE)$Name)
ds <- read_sas(unz("example_data.zip", dir_files))
That works great. I'm able to read the data set in and conduct the analysis. When I try to read in another data set, though, I encounter an error:
dir2_files <- as.character(unzip("data.zip", list = TRUE)$Name)
ds2 <- read_sas(unz("data.zip", dir2_files))
Error in read_connection_(con, tempfile()) :
Evaluation error: error reading from the connection.
I have read other questions on here saying that the file path may be incorrectly specified. Some answers mentioned submitting list.files() to the console to see what is listed.
list.files()
[1] "example_data.zip" "data.zip"
As you can see, both zip files are listed, and I was successfully able to read the data set in from "example_data.zip", but I cannot access the one inside "data.zip".
What am I missing? Thanks in advance.
Your "dir2_files" is String vector of the names of different files in "data.zip". So for example if the files that you want to read have them names at the positions "k" in "dir_files" and "j" in "dir2_files" then let update your script like that:
dir <- "C:/Users/michael/data/"
setwd(dir)
dir_files <- as.character(unzip("example_data.zip", list = TRUE)$Name)
ds <- read_sas(unz("example_data.zip", dir_files[k]))
dir2_files <- as.character(unzip("data.zip", list = TRUE)$Name)
ds2 <- read_sas(unz("data.zip", dir2_files[j]))
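If you don't know the positions k and j ahead of time, one option (a sketch, assuming the SAS data sets inside the archives end in .sas7bdat) is to pick the entry by extension instead of by index:
library(haven)

dir2_files <- as.character(unzip("data.zip", list = TRUE)$Name)
# keep only the SAS data set entries and read the first match
sas_entry <- dir2_files[grepl("\\.sas7bdat$", dir2_files, ignore.case = TRUE)][1]
ds2 <- read_sas(unz("data.zip", sas_entry))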

Naming an output file using an input file name

I am fairly new to R programming and I apologize if this question has been answered already; I did search for an answer, but perhaps my wording is off.
I have imported a TXT file, performed my analysis and transformation of the data, and now wish to write a CSV file for export. However, since this script is meant to run on multiple files, I would like to use the file name from the input TXT file as the name of the output CSV file.
>read.csv("C:\\Users\\Desktop\\filename.txt", header=FALSE)
>...
>...
>write.csv(Newfile, "filename.csv")
As an example, I want to be able to take the 'filename' portion of the pathway and (I would assume) create a string variable to pull into the name of the CSV file I want to write.
I know this is beginner level stuff, but any help would be appreciated. Thanks!
We can keep the filename and path in a variable, then manipulate it to make the output filename:
myInputFile <- "C:\\Users\\Desktop\\filename.txt"
myOutFile <- paste0(tools::file_path_sans_ext(myInputFile),".csv")
# test
myInputFile
# [1] "C:\\Users\\Desktop\\filename.txt"
myOutFile
# [1] "C:\\Users\\Desktop\\filename.csv"
Or, for a more general approach, I use the below to keep track of my inputs and outputs:
# define folders
folderWD <- "/users/myName/myXproject/"
folderInput <- paste0(folderWD, "data/")
folderOutput <- paste0(folderWD, "output/")
# input output files
fileInput <- paste0(folderInput, "filename.txt")
fileOutput <- paste0(folderOutput, tools::file_path_sans_ext(basename(fileInput)), ".csv")
# test
fileInput
# [1] "/users/myName/myXproject/data/filename.txt"
fileOutput
# [1] "/users/myName/myXproject/output/filename.csv"
#then codez
myInputData <- read.csv(fileInput, header = FALSE)
...
Newfile <- # do some stuff with myInputData
...
write.csv(Newfile, fileOutput)
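Since the script is meant to run on multiple files, the same idea extends to a loop over every TXT file in the input folder (a sketch; the processing step is a placeholder for whatever transformation produces Newfile):
inputFiles <- list.files(folderInput, pattern = "\\.txt$", full.names = TRUE)
for (fileInput in inputFiles) {
  fileOutput <- paste0(folderOutput, tools::file_path_sans_ext(basename(fileInput)), ".csv")
  myInputData <- read.csv(fileInput, header = FALSE)
  Newfile <- myInputData   # placeholder: do some stuff with myInputData here
  write.csv(Newfile, fileOutput)
}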

Loop for Multiple DataFrame in R

I have about 200 txt files with different names, and each file has a different number of dimensions. The code for reading them in is fine:
setwd("C:/...")
filelist<-list.files(pattern="*.txt")
for (j in 1:length(filelist)) assign(filelist[j], read.csv(filelist[j], header=TRUE))
But I want a loop over all of the above files so that the data variable gets each file's contents in turn.
for (file in filelist){
data[file]<-file
Do something with data
e.g. log(data[,6])
}
From the above the output from data is
"NameFile.txt"
The problem is that this way it does not read the data set, just the name of the file. Is there a way to get rid of the quotes, or something else I should do?
You may want to do one of the following in your for loop, the second suggested by @hvollmeier.
for (file in filelist){
## Uncomment one of these options
#=> data[file] <- eval(parse(text = file))
# OR
#=> data[file] <- get(file)
# Do something with data, e.g.
# log(data[, 6])
}
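An alternative that avoids assign()/get() altogether (a sketch, not part of the original answer) reads every file into one named list, so each data frame can be pulled out by its filename:
filelist <- list.files(pattern = "\\.txt$")
# datalist[["NameFile.txt"]] is then that file's data frame
datalist <- lapply(filelist, read.csv, header = TRUE)
names(datalist) <- filelist
for (file in filelist) {
  data <- datalist[[file]]
  # do something with data, e.g. (assumes at least 6 columns)
  print(log(data[, 6]))
}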

Loop in R to read sequentially numbered filenames and output accordingly numbered files

I'm sure this is very simple, but I'm new to doing my own programming in R and haven't quite gotten the hang of the syntax for looping.
I have code like this:
mydata1 <- read.table("ph001.txt", header=TRUE)
# ... series of formatting and merging steps
write.table(mydata4, "ph001_anno.txt", row.names=FALSE, quote=FALSE, sep="\t")
png("manhattan_ph001.png"); manhattan(mydata4); dev.off()
png("qq_ph001.png"); qq(mydata4$P); dev.off()
The input file ph001.txt is output from a linear regression algorithm, and from that file, I need to output ph001_anno.txt, manhattan_ph001.png, and qq_ph001.png. The latter two are using the qqman package.
I have a folder that contains ph001 through ph138, and would like a loop function that reads these files individually and creates the corresponding output files for each file. As I said, I'm sure there is an easy way to do this as a loop function, but the part that's tripping me up is modifying the output filenames.
You can use the stringr package to do a lot of the string manipulation you want in order to generate your file names, like so:
library(stringr)

f <- function(i) {
num <- str_pad(i, 3, pad = "0")
a <- str_c("ph", num, "_anno.txt")
m <- str_c("manhattan_ph", num, ".png")
q <- str_c("qq_ph", num, ".png")
# Put code to do stuff with these file names here
}
sapply(1:138, f)
In the above block of code, for each number in 1:138 you create the name of three files. You can then use those file names in calls to read.table or ggsave or whatever you want.
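Putting the pieces together with the read/format/plot steps from the question, a sketch of the full loop might look like the following (process_one is a made-up name, the formatting/merging step is a placeholder, and it assumes the stringr and qqman packages are installed and the ph*.txt files sit in the working directory):
library(stringr)
library(qqman)

process_one <- function(i) {
  num <- str_pad(i, 3, pad = "0")
  mydata1 <- read.table(str_c("ph", num, ".txt"), header = TRUE)
  # ... formatting and merging steps that produce mydata4 go here ...
  mydata4 <- mydata1   # placeholder
  write.table(mydata4, str_c("ph", num, "_anno.txt"),
              row.names = FALSE, quote = FALSE, sep = "\t")
  png(str_c("manhattan_ph", num, ".png")); manhattan(mydata4); dev.off()
  png(str_c("qq_ph", num, ".png")); qq(mydata4$P); dev.off()
}

sapply(1:138, process_one)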

Resources