Naming an output file using an input file name

I am fairly new to R programming and I apologize if this question has been answered already; I did search for an answer, but perhaps my wording is off.
I have imported a TXT file, performed my analysis and transformation of the data and now wish to write a CSV file for export. However, since this script is meant to run multiple files, I would like to use the file name from the input TXT file as the output CSV file.
read.csv("C:\\Users\\Desktop\\filename.txt", header=FALSE)
...
...
write.csv(Newfile, "filename.csv")
As an example, I want to be able to take the 'filename' portion of the path and (I would assume) create a string variable to pull into the name of the CSV file I want to write.
I know this is beginner level stuff, but any help would be appreciated. Thanks!

We can keep the file name and path in a variable, then manipulate it to make the output file name:
myInputFile <- "C:\\Users\\Desktop\\filename.txt"
myOutFile <- paste0(tools::file_path_sans_ext(myInputFile),".csv")
# test
myInputFile
# [1] "C:\\Users\\Desktop\\filename.txt"
myOutFile
# [1] "C:\\Users\\Desktop\\filename.csv"
Or, for a more general approach, here is what I use to keep track of my inputs and outputs:
# define folders
folderWD <- "/users/myName/myXproject/"
folderInput <- paste0(folderWD, "data/")
folderOutput <- paste0(folderWD, "output/")
# input output files
fileInput <- paste0(folderInput, "filename.txt")
fileOutput <- paste0(folderOutput, tools::file_path_sans_ext(basename(fileInput)), ".csv")
# test
fileInput
# [1] "/users/myName/myXproject/data/filename.txt"
fileOutput
# [1] "/users/myName/myXproject/output/filename.csv"
#then codez
myInputData <- read.csv(fileInput, header = FALSE)
...
Newfile <- # do some stuff with myInputData
...
write.csv(Newfile, fileOutput)
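Since the script is meant to run over multiple files, here is a minimal sketch of wrapping the same pattern in a loop (the .txt pattern and the placeholder transformation are assumptions):
inputFiles <- list.files(folderInput, pattern = "\\.txt$", full.names = TRUE)
for (f in inputFiles) {
  myInputData <- read.csv(f, header = FALSE)
  Newfile <- myInputData # placeholder: do some stuff with myInputData
  fileOutput <- file.path(folderOutput, paste0(tools::file_path_sans_ext(basename(f)), ".csv"))
  write.csv(Newfile, fileOutput)
}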

Related

Writing an R function that takes in a user input file path, then writing them out to csv?

The following is part of an R script I wrote that takes in files in dta and sav format, then converts them into CSVs:
##### Reading in the data ####
# Sometimes you may need to load in multiple files and merge them. Loading in a second
# dta and the merging process has been commented out.
dta_data <- read_dta("/ihme/limited_use/IDENT/PROJECT_FOLDERS/UNICEF_MICS/UZB/2021_2022/UZB_MICS6_2021_2022_HH_Y2022M12D27.DTA")
#dta_data_2 <- read_dta("/ihme/limited_use/IDENT/PROJECT_FOLDERS/WB_LSMS_ISA/MWI/2019_2020/MWI_LSMS_ISA_IHS5_2019_2020_HH_GEOVARIABLES_Y2022M10D26.DTA")
#merged_MWI <- merge(dta_data, dta_data_2, by = "ea_id", all.x = TRUE)
write_csv(x=dta_data, path = "UZB_MICS6_2021_2022_HH_Y2022M12D27.csv")
#This loads in SAV files, as well as checking where the output will be saved.
sav_data <- read_sav("/home/j/DATA/WB_LSMS/JAM/2012/JAM_JSLC_2012_ANNUAL_Y2022M11D29.SAV")
getwd() # this is the folder it will save into unless you specify otherwise in the path below
write_csv(x=sav_data, path="JAM_JSLC_2012_ANNUAL_Y2022M11D29.csv")
I want to restructure this bit to take in user input on where the original files are and where they should be written out to, and I am not quite sure how to do this. I want an if/else where R determines whether input_file is in dta or sav format; then, depending on the format, use either read_dta or read_sav, save the result to dta_or_sav, and finally write it out as a CSV to output_path.
I have a rough idea:
convert_to_csv <- function(input_file, output_path) {
  dta_or_sav <- read_dta(input_file)
  write_csv(x=dta_or_sav, path=output_path)
}
I have no idea where to go from here.
You can make the function that you want with grepl(), which checks the pattern of the file name (.DTA or .SAV).
library(haven)
convert_to_csv <- function(input_file, output_path) {
  # .DTA -> read_dta()
  # .SAV -> read_sav()
  if (grepl("\\.DTA$", input_file, ignore.case = TRUE)) {
    input <- read_dta(input_file)
  } else {
    input <- read_sav(input_file)
  }
  # Export to CSV (base write.csv takes file=, not path=)
  write.csv(x = input, file = output_path)
}
Note that output_path should contain the file name (ending with .csv) along with your target path.
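For illustration, calling the function might then look like this (both output paths are hypothetical):
convert_to_csv("/data/UZB_MICS6_2021_2022_HH_Y2022M12D27.DTA",
               "/output/UZB_MICS6_2021_2022_HH_Y2022M12D27.csv")
convert_to_csv("/home/j/DATA/WB_LSMS/JAM/2012/JAM_JSLC_2012_ANNUAL_Y2022M11D29.SAV",
               "/output/JAM_JSLC_2012_ANNUAL_Y2022M11D29.csv")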

How to use lapply to run a set of functions/code for a list of files in R? [closed]

I have 50 text files, all beginning with NEW. I want to loop through each text file/dataframe, run the same functions for each of these files, and then output the results via the write.table function. So, for each of the 50 files, the same functions are applied, and then 50 independent output files should be created containing the original name with the word 'output' at the end.
Here is my code. I have used the list.files function to read in the 50 files, but how can I adapt the code below so that the R commands/functions run for each of the 50 files independently and then output the corresponding 50 files?
file_list <- list.files("/data/genome/relevantfiles/AL/*NEW*.") #reading in the list of 50 files in my directory all starting with NEW
df <- file_list # each file *NEW* is a df # this does not work - how can I apply each file into a dataframe?
##code for each dataframe. The code/function is shown more simply below because it works
#running a function on the dataframe
df_output_results <- coloc.susie(dataset1 = list(beta=df$BETA, varbeta=df$Varbeta, N=1597, type="quant" ...)
#printing out results for each dataframe
final_results <- print(df_output_results$summary)
#outputting results
write.table(final_results, file = paste0(fileName,"_output.txt"), quote = F, sep = "\t")
I am unsure how to adapt the code so that each file is read into a list, the code in the block is applied to each file, and the results are output to a separate file ... repeated for 50 files. I am assuming I need to use lapply, but I'm not sure how. The code in the code block works, so there is no issue with the code itself.
From what I understand you want to import 50 files from a folder and store each file in a list. Then you want to loop a function across that list, then export those results somewhere.
I created an example folder on my desktop ("Desktop/SO Example") and put five CSV files in there. You didn't specify what format your files were in, but you can change the below code to whatever import command you need (see ?read.delim). The data in the CSVs are identical and made using:
ex_df <- data.frame(A = LETTERS[1:5], B = 1:5, C = stringr::words[1:5]) # words is from the stringr package
And look like this:
A  B  C
A  1  a
B  2  able
C  3  about
D  4  absolute
E  5  accept
I imported these and stored them in a list using lapply. Then I made a simple example function to loop through each data frame in the list and perform some operation (using lapply). Lastly, I exported those results as a CSV file back in the same folder using sapply.
Hopefully this helps!
# Define file path to desired folder
file_path <- "~/Desktop/SO Example/"
# Get file names in the folder
file_list <- list.files(path = file_path)
# Use lapply() with read.csv (if they are CSV files) to store data in a list
list_data <- lapply(file_list, function(x) read.csv(paste0(file_path, x)))
# Define some function
somefunction <- function(x){
  paste(x[,1], x[,2], x[,3])
}
# Run the function across your list data using lapply()
results <- lapply(list_data, somefunction)
# Output to the same folder using sapply
sapply(1:length(results), function(x)
  write.csv(results[x],
            paste0(file_path, "results_output_", x, ".csv"),
            row.names = FALSE))
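If you specifically need each output to keep the original file name with 'output' appended, as the question asks, a hedged variation of the export step (reusing file_list and results from above) is:
# build names like NEWfile_output.txt from NEWfile.txt
out_names <- paste0(tools::file_path_sans_ext(file_list), "_output.txt")
for (i in seq_along(results)) {
  write.table(results[[i]], file = paste0(file_path, out_names[i]),
              quote = FALSE, sep = "\t")
}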

Retaining original file names when processing multiple raster files using R

I have the following problem: I need to process multiple raster files using the same function in the R package landscapemetrics. Basically my raster files are parts of a country map, all of the same shape and size (i.e. quadrants). I figured out the code for one file, but I have to do the same with more than 600 rasters, so doing it manually is very irrational. The steps in my code are the following:
# 1. I load "raster" and "landscapemetrics" packages:
library(raster)
library(landscapemetrics)
# 2. I read in my quadrant:
Quadrant <- raster("C:\\Users\\customer\\Documents\\ ... \\2434-44.tif")
# 3. I process the raster to get landscape metrics tibble:
LS_metrics <- calculate_lsm(landscape = Quadrant)
# 4. Finally, I write it into a csv:
write.csv(LS_metrics, file = "2434-44.csv")
I need to keep the same file name for my csv files as I had for tif (e.g. results from processing quadrant "2434-44.tif", need to be stored in "2434-44.csv", possibly in a folder in wd).
I am new to R. I tried to use list.files() and then apply a for loop, but my code did not work.
I need your advice.
Yours faithfully,
Denis
Your question is really about iteration and character (filename) manipulation; not about landscapemetrics etc. There are many similar questions on this site and resources elsewhere that you can consult. The basic approach can be like this:
# get input filenames
inf <- list.files("/my/path", pattern="\\.tif$", full.names=TRUE)
# create output filenames (anchor the pattern so only the extension is replaced)
outf <- gsub("\\.tif$", ".csv", basename(inf))
# perhaps put output files in a particular folder
dir.create("out", showWarnings=FALSE)
outf <- file.path("out", outf)
# iterate
for (i in seq_along(inf)) {
  # read input
  input <- raster(inf[i])
  # do something
  output <- data.frame(id=1)
  # write output
  write.csv(output, outf[i])
}
It's very hard to help without further information. What was the issue with your approach of looping through all files using list.files()? In general, this should work.
Furthermore, most likely you don't want to calculate all available landscape metrics, but rather specify a subselection during the calculate_lsm() function call.
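For example, a minimal sketch of such a subselection (the metric names here are illustrative; list_lsm() shows what is available):
library(landscapemetrics)
# compute only patch area and landscape-level Shannon diversity
LS_metrics <- calculate_lsm(landscape = Quadrant,
                            what = c("lsm_p_area", "lsm_l_shdi"))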

How to store a folder containing over 30 zipped files into a variable in r

I used the package 'GDELTtools' to download data from GDELT. The data was downloaded; however, no variable was stored in the global environment. I want to store the data in a dataframe variable so I can analyze it.
The folder contains over 30 zipped files. Every zipped file contains one csv. I need to store all these csvs in one variable in the Global Environment of r. I hope this can be done.
Thank you in advance!
Haven't written R for a while so I will try my best.
Read the comments carefully, because they explain the procedure.
For reference, check the documentation for unzip(), read.csv(), merge(), data.frame(), and paste0().
According to the docs of GDELTtools, you can easily specify the download folder by providing local.folder="~/gdeltdata" as a parameter to the GetGDELT() function.
After that, you can use the list.files("path/to/files/directory") function to obtain a vector of the file names used in the explanation code below. Check the docs for more examples and explanation.
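As a rough sketch (the date arguments here are assumptions from memory; check ?GetGDELT for the exact signature), the download step could look like:
library(GDELTtools)
# download a date range into a folder we control, instead of a temp dir
# (start.date/end.date are assumed parameter names - verify against ?GetGDELT)
gdelt_data <- GetGDELT(start.date = "2018-01-01", end.date = "2018-01-07",
                       local.folder = "~/gdeltdata")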
# set the path for the unzip output
outDir <- "C:\\Users\\Name\\Documents\\unzipfolder"
# relative path where the zip files are stored
relativePath <- "C:\\path\\to\\my\\directory\\"
# create a variable to store all the paths to the zip files in a vector
zipPaths <- vector()
# since we have 30 files we should iterate through them
# I assume you have a vector of file names in the variable fileNamesZip
for (name in fileNamesZip) {
  # use paste0() to concatenate strings
  zipfilepath <- paste0(relativePath, name, ".zip")
  # append the file path (append() returns a new vector, so reassign it)
  zipPaths <- append(zipPaths, zipfilepath)
}
# now we have a vector which contains all the paths to the zip files
# unzip() extracts one archive at a time, so loop over zipPaths (read the official docs)
for (z in zipPaths) {
  unzip(z, exdir = outDir)
}
# initialize a dataframe for all the data. You must provide datatypes for the columns.
total <- data.frame(Doubles=double(),
                    Ints=integer(),
                    Factors=factor(),
                    Logicals=logical(),
                    Characters=character(),
                    stringsAsFactors=FALSE)
# now it's time to store the data by reading the csv files into dataframes.
# again, I assume you have a vector of file names in the variable fileNamesCSV
for (name in fileNamesCSV) {
  # create the csv file path
  csvfilepath <- file.path(outDir, paste0(name, ".csv"))
  # read the data from the csv file and store it in a dataframe
  dataFrame <- read.csv(file=csvfilepath, header=TRUE, sep=",")
  # you will only be able to merge dataframes if they are equal in structure.
  # Specify the column names to merge by.
  total <- merge(total, dataFrame, by=c("Name1","Name2"))
}
Something potentially much simpler:
list.files() lists the files in a directory
readr::read_csv() will automatically unzip files as necessary
dplyr::bind_rows() will combine data frames
So try:
lf <- list.files(pattern="\\.zip")
dfs <- lapply(lf,readr::read_csv)
result <- dplyr::bind_rows(dfs)
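If you also want to record which file each row came from, one small variation (same readr/dplyr setup) is to name the list before binding:
lf <- list.files(pattern="\\.zip$")
dfs <- lapply(lf, readr::read_csv)
names(dfs) <- lf
# .id stores the list names, i.e. the source file of each row, in a column
result <- dplyr::bind_rows(dfs, .id = "source_file")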

How to not overwrite file in R

I am trying to copy and paste tables from R into Excel. Consider the following code from a previous question:
data <- list.files(path=getwd())
n <- length(data)
for (i in 1:n) {
  data1 <- read.csv(data[i])
  outline <- data1[,2]
  outline <- as.data.frame(table(outline))
  print(outline) # this prints all n tables
  name <- paste0(i, "X.csv")
  write.csv(outline, name)
}
This code writes each table into separate Excel files (i.e. "1X.csv", "2X.csv", etc..). Is there any way of "shifting" each table down some rows instead of rewriting the previous table each time? I have also tried this code:
output <- as.data.frame(output)
wb = loadWorkbook("X.xlsx", create=TRUE)
createSheet(wb, name = "output")
writeWorksheet(wb,output,sheet="output",startRow=1,startCol=1)
writeNamedRegion(wb,output,name="output")
saveWorkbook(wb)
But this does not copy the dataframes exactly into Excel.
I think, as mentioned in the comments, the way to go is to first merge the data frames in R and then write them into one output file:
# get vector of filenames
filenames <- list.files(path=getwd())
# for each filename: load file and create outline
outlines <- lapply(filenames, function(filename) {
  data <- read.csv(filename)
  outline <- data[,2]
  outline <- as.data.frame(table(outline))
  outline
})
# merge all outlines into one data frame (by appending them row-wise)
outlines.merged <- do.call(rbind, outlines)
# save merged data frame
write.csv(outlines.merged, "all.csv")
Despite what Microsoft would like you to believe, .csv files are not Excel files; they are a common file type that can be read by Excel and many other programs.
The best approach depends on what you really want to do. Do you want all the tables to be read into a single worksheet in Excel? If so, you could just write to a single file using the append argument of write.table (note that write.csv ignores append, with a warning), or use a connection that you keep open so each new table is appended. You may want to use cat to put a couple of newlines before each new table.
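For instance, a minimal sketch of the open-connection approach (the output file name is hypothetical, and the per-file table code is lifted from the question):
# list the inputs first so we never re-read our own output
files <- setdiff(list.files(path = getwd(), pattern = "\\.csv$"), "all_tables.csv")
con <- file("all_tables.csv", open = "wt")
for (f in files) {
  outline <- as.data.frame(table(read.csv(f)[, 2]))
  cat("\n", file = con) # a newline before each new table
  write.table(outline, file = con, sep = ",", row.names = FALSE)
}
close(con)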
Your second attempt looks like it uses the XLConnect package (but you don't say, so it could be something else). I would think this is the best approach; how is the result different from what you are expecting?
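If you stay with XLConnect and want to "shift" each table down instead, a hedged sketch (reusing the outlines list built in the first answer) is to track the next free row yourself:
library(XLConnect)
wb <- loadWorkbook("X.xlsx", create = TRUE)
createSheet(wb, name = "output")
startRow <- 1
for (tbl in outlines) {
  writeWorksheet(wb, tbl, sheet = "output", startRow = startRow, startCol = 1)
  # advance past the header row, the data rows, and one blank row
  startRow <- startRow + nrow(tbl) + 2
}
saveWorkbook(wb)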
