Import multiple CSV files into R using a for loop

Hi, I am new to R and struggling to understand where my script is going wrong. I am trying to import only the CSV files that fall between the two dates Sdate and Fdate entered near the top of the script. The script runs without any errors but only pulls in the last file in the list. I am on Windows 10 and all the files are on the local machine. Any help will be appreciated. Thanks.
Sdate <- as.Date("2018-10-01")
Fdate <- as.Date("2018-10-30")
Ndate = as.character.Date(seq.Date(from = as.Date(Sdate), to = as.Date(Fdate),
by = "days"), format ="%Y%m%d")
for (i in Ndate){
MyData <- read.csv(
file=paste('D:/Data/Merlin Data/Merlin BDD/T1/BDD_',i,'_T1.csv',sep = ""),
header=TRUE, sep=",")
}

The problem is that you are overwriting your variable every time through the loop, so only the last file survives. You need to append each file to your data frame instead.
One solution is to create an initial data frame from the first file:
MyData <- read.csv(file='D:/Data/Merlin Data/Merlin BDD/T1/BDD_20181001_T1.csv', header=TRUE, sep=",")
and afterwards append the remaining files to it with rbind(). Since you have already read the first file, set Sdate <- as.Date("2018-10-02") and rebuild Ndate from it.
You should then be able to read the rest of your data with:
for (i in Ndate){
NextData <- read.csv(file=paste('D:/Data/Merlin Data/Merlin BDD/T1/BDD_',i,'_T1.csv',sep = ""), header=TRUE, sep=",")
MyData <- rbind(MyData, NextData) # append rows instead of overwriting
}
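Another option is to read every file into a list and bind the pieces once at the end, which avoids growing a data frame inside the loop. The following is only a minimal sketch: the helper name `read_range` and the temporary demo directory are assumptions made so the example runs anywhere, while the `BDD_<date>_T1.csv` naming comes from the question.

```r
# Hypothetical helper: read all BDD_<YYYYMMDD>_T1.csv files between two dates
read_range <- function(dir, from, to) {
  days  <- format(seq(as.Date(from), as.Date(to), by = "day"), "%Y%m%d")
  files <- file.path(dir, paste0("BDD_", days, "_T1.csv"))
  files <- files[file.exists(files)]          # skip days that have no file
  if (length(files) == 0) return(NULL)
  frames <- lapply(files, read.csv, header = TRUE)
  do.call(rbind, frames)                      # a single rbind at the end
}

# Demo files in a temporary directory (assumption, for illustration only)
demo_dir <- file.path(tempdir(), "bdd_demo")
unlink(demo_dir, recursive = TRUE)
dir.create(demo_dir)
for (d in c("20181001", "20181002", "20181105")) {
  write.csv(data.frame(day = d, value = 1:2),
            file.path(demo_dir, paste0("BDD_", d, "_T1.csv")),
            row.names = FALSE)
}

MyData <- read_range(demo_dir, "2018-10-01", "2018-10-30")
print(nrow(MyData))
```

With real data you would pass the folder from the question as `dir`; the `file.exists()` filter means that days without a file are skipped rather than stopping the import.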

Related

How to read in multiple .csv files in R through a if for statement?

I have many .csv files labeled by year. i.e.
2017TMAXMIN.csv
2016TMAXMIN.csv
2015TMAXMIN.csv
2014TMAXMIN.csv
.
.
.
And so on going back about 40 years.
I am attempting to read in these .csv files based on a user-supplied date range. For example, the user may specify that they would like the files from 2016 and 2017.
Below is the code I have so far.
setwd("P:/%%%%%%/FTP_Yearly_Climate_Data")
FromDate <- "20170115" #enter from date in YYYYMMDD format
ToDate <- "20161215" #enter to date in YYYYMMDD format
AnalysisType <- "maxtemp" #maxtemp, mintemp, precip
filename <- "TMAXMIN.csv"
FromYear <- as.numeric(substr(FromDate, 1, 4))
ToYear <- as.numeric(substr(ToDate, 1, 4))
if (FromYear != ToYear) {
for (i in ToYear:FromYear) {
i <- paste0(i, sub('\\..*', '', filename), '.csv')
read.csv(i, header=TRUE, sep=",")
print(i)
}
}
When I run this code, nothing imports; however, I do get a value of i labeled 2017TMAXMIN.csv. Ideally, I would like the two years to import. When the print(i) runs in the if/for loop, the output is below.
[1] "2016TMAXMIN.csv"
[1] "2017TMAXMIN.csv"
I am pretty new to R (coding in general), so any advice or help is very much appreciated. Thank you.
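Note that the loop above calls read.csv() without assigning its result, so each data frame is read and then discarded. One possible way to keep them is to store each year in a named list. This is a minimal sketch under stated assumptions: the helper `read_years` and the temporary demo files are hypothetical, and only the `<year>TMAXMIN.csv` naming is taken from the question.

```r
# Hypothetical helper: read <year>TMAXMIN.csv for every year in the range
read_years <- function(dir, from_date, to_date, suffix = "TMAXMIN.csv") {
  years <- sort(as.numeric(substr(c(from_date, to_date), 1, 4)))
  years <- seq(years[1], years[2])                 # works in either order
  files <- file.path(dir, paste0(years, suffix))
  out <- lapply(files, read.csv, header = TRUE)    # keep every result
  names(out) <- years
  out
}

# Demo files in a temporary directory (assumption, for illustration only)
demo_dir <- file.path(tempdir(), "climate_demo")
unlink(demo_dir, recursive = TRUE)
dir.create(demo_dir)
for (y in 2015:2017) {
  write.csv(data.frame(year = y, tmax = c(30, 31)),
            file.path(demo_dir, paste0(y, "TMAXMIN.csv")),
            row.names = FALSE)
}

climate <- read_years(demo_dir, "20170115", "20161215")
print(names(climate))
```

Each year is then available as, for example, `climate[["2017"]]`, and `do.call(rbind, climate)` would stack them if the columns match.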

Reading data from text file and combining it with date in r

I downloaded data from the internet. I wanted to extract the data and create a data frame. You can find the data in the following filtered data set link: http://www.esrl.noaa.gov/gmd/dv/data/index.php?category=Ozone&type=Balloon . At the bottom of the site page from the 9 filtered data sets you can choose any station. Say Suva, Fiji (SUV):
I have written the following code to create a data frame that has Launch date as part of the data frame for each file.
setwd("C:/Users/")
path = "~C:/Users/"
files <- lapply(list.files(pattern = '\\.l100'), readLines)
test.sample<-do.call(rbind, lapply(files, function(lines){
data.frame(datetime = as.POSIXct(sub('^.*Launch Date : ', '', lines[grep('Launch Date :', lines)])),
# and the data, read in as text
read.table(text = lines[(grep('Sonde Total', lines) + 1):length(lines)]))
}))
The files come from an FTP server. The file format does not look familiar to me, and treating the files as .txt did not work either. Could you please tweak the above code, or suggest any other code, to get a data frame?
Thank you in advance.
I think the problem is that the search string "Launch Date :" does not match what is in the files (at least the one I checked).
This should work
lines <- "Launch Date : 11 June 1991"
lubridate::dmy(sub('^.*Launch Date.*: ', '', lines[grep('Launch Date', lines)]))
The code would probably be easier to debug if you broke the problem down into steps rather than writing it as one statement.
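As an illustration of working in steps, the date extraction can be isolated in a small function and checked on its own before it is combined with read.table(). This is only a sketch: the helper name `extract_launch_date` and the demo header lines are made up for illustration, and the real files may be laid out differently.

```r
# Hypothetical helper: pull the launch date text out of a file's lines
extract_launch_date <- function(lines) {
  hit <- grep("Launch Date", lines, value = TRUE)    # step 1: find the line
  if (length(hit) == 0) return(NA_character_)
  value <- sub("^.*Launch Date[^:]*:", "", hit[1])   # step 2: drop the label
  trimws(value)                                      # step 3: trim whitespace
}

# Made-up header lines (assumption, not copied from a real file)
demo_lines <- c("Station      : Suva, Fiji",
                "Launch Date  : 11 June 1991",
                "Launch Time  : 02:15:00")
launch <- extract_launch_date(demo_lines)
print(launch)
```

Once the extracted string looks right, it can be passed to a date parser such as lubridate::dmy() as in the answer above.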
I took the following approach:
td <- tempdir()
setwd(td)
ftp <- 'ftp://ftp.cmdl.noaa.gov/ozwv/Ozonesonde/Suva,%20Fiji/100%20Meter%20Average%20Files/'
files <- RCurl::getURL(ftp, dirlistonly = TRUE)
files <- strsplit(files, "\n")
files <- unlist(files)
dat <- list()
for (i in 1:length(files)) {
download.file(paste0(ftp, files[i]), 'data.txt')
df <- read.delim('data.txt', sep = "", skip = 17)
ld <- as.character(read.delim('data.txt')[9, ])
ld <- strsplit(ld, ":")[[1]][2]
df$launch.date <- stringr::str_trim(ld)
dat[[i]] <- df ; rm(df)
}

Batch rename list of files to totally different names using R

I have 330 files that I would like to rename using R. I saved the original names and the new names in a .csv file. The script I used does not give an error, but it does not change the names.
Here is a sample of the new names:(df1)
D:\Modis_EVI\Original\EVI_Smoothed\ MODIS_EVI_20010101.tif
D:\Modis_EVI\Original\EVI_Smoothed\ MODIS_EVI_20010117.tif
D:\Modis_EVI\Original\EVI_Smoothed\ MODIS_EVI_20010201.tif
And a sample of the original names:(df2)
D:\Modis_EVI\Original\EVI_Smoothed\ MODIS.2001001.yL1600.EVI.tif
D:\Modis_EVI\Original\EVI_Smoothed\ MODIS.2001033.yL1600.EVI.tif
D:\Modis_EVI\Original\EVI_Smoothed\ MODIS.2001049.yL1600.EVI.tif
Then here is the script I'm using:
csv_dir <- "D:\\"
df1 <- read.csv(paste(csv_dir,"New_names.csv",sep=""), header=TRUE, sep=",") # read csv
hdfs <- df1$x
hdfs <- as.vector(hdfs)
df2 <- read.csv(paste(csv_dir,"smoothed.csv",sep=""), header=TRUE, sep=",") # read csv
tifs <- df2$x
tifs <- as.vector(tifs)
for (i in 1:length(hdfs)){
setwd("D:\\Modis_EVI\\Original\\EVI_Smoothed\\")
file.rename(from =tifs[i], to = hdfs[i])
}
Any advice please?
I think you have mixed up the old and the new files: you are trying to rename the new file names, which do not exist yet, to the old file names. This might work:
file.rename(from =hdfs[i], to = tifs[i])
A general approach would go like this:
setwd("D:\\Modis_EVI\\Original\\EVI_Smoothed\\")
fin <- list.files(pattern='tif$')
fout <- gsub("_EVI_", ".", fin)
fout <- gsub("\\.tif$", ".yL1600.EVI.tif", fout) # escape the dot and anchor at the end of the name
for (i in 1:length(fin)){
file.rename(from=fin[i], to= fout[i])
}
To fix your script (do you really need .csv files?)
setwd("D:\\Modis_EVI\\Original\\EVI_Smoothed\\")
froms <- read.csv("d:/New_names.csv", stringsAsFactors=FALSE)
froms <- as.vector(froms$x)
First check if they exist:
all(file.exists(froms))
Perhaps you need to trim the names (remove whitespace) -- that is what the examples you give suggest
library(raster)
froms <- trim(froms)
all(file.exists(froms))
If they exist
tos <- read.csv("d:/smoothed.csv", stringsAsFactors=FALSE)
tos <- as.vector(tos$x)
# tos <- trim(tos)
for (i in 1:length(froms)) {
file.rename(froms[i], tos[i])
}
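The checks above can also be folded into one small function that trims the names, renames only the sources that exist, and reports what happened. This is a minimal sketch, assuming base R's trimws() in place of raster's trim(); the helper name `rename_pairs` and the temporary demo files are hypothetical.

```r
# Hypothetical helper: rename files pairwise, skipping sources that are missing
rename_pairs <- function(from, to) {
  from <- trimws(from)
  to   <- trimws(to)
  stopifnot(length(from) == length(to))
  ok   <- file.exists(from)
  done <- rep(FALSE, length(from))
  done[ok] <- file.rename(from[ok], to[ok])   # file.rename() is vectorised
  data.frame(from = from, to = to, renamed = done)
}

# Demo files in a temporary directory (assumption, for illustration only)
demo_dir <- file.path(tempdir(), "rename_demo")
unlink(demo_dir, recursive = TRUE)
dir.create(demo_dir)
old <- file.path(demo_dir, c("MODIS.2001001.yL1600.EVI.tif",
                             "MODIS.2001017.yL1600.EVI.tif"))
new <- file.path(demo_dir, c("MODIS_EVI_20010101.tif",
                             "MODIS_EVI_20010117.tif"))
file.create(old[1])                           # only the first source exists

result <- rename_pairs(paste0(" ", old), new) # stray leading space, as in the CSV
print(result$renamed)
```

The returned data frame makes it easy to see which rows failed, which is harder to spot when file.rename() is called silently inside a loop.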

Combining files from a list based on date

I have a list of files that are all named similarly: "FlightTrackDATE.txt" where the date is expressed in YYYYMMDD. I read in all the files with the list.files() command, but this gives me all the files in that folder (only flight track files are in this folder). What I would like to do is create a new file that will combine all the files from the last 90 days (or three months, whichever is easier) and ignore the other files.
You can try this :
#date from which you want to consolidate (replace with required date)
fromDate = as.Date("2015-12-23")
for (filename in list.files()){
#extract the date from filename using substr ( characters 12- 19)
filenameDate = as.Date(substr(filename,12,19), format = "%Y%m%d")
#read and consolidate if the filedate is on or after from date
if ((filenameDate - fromDate) >=0){
#create consolidated list from first file
if (!exists('consolidated')){
consolidated <- read.table(filename, header = TRUE)
} else{
data = read.table(filename, header = TRUE)
#row bind to consolidate
consolidated = rbind(consolidated, data)
}
}
}
OUTPUT:
I have three sample files :
FlightTrack20151224.txt
FlightTrack20151223.txt
FlightTrack20151222.txt
Sample data:
Name Speed
AA101 23
Consolidated data:
Name Speed
1 AA102 24
2 AA101 23
Note:
1. Create the From date by subtracting from current date or using a fixed date like above.
2. Remember to clean up the existing consolidated data if you are running the script again. Data duplication might occur otherwise.
3. Save consolidated to file :)
Consider an lapply() solution without a need for list.files() since you know ahead of time the directory and file name structure:
path = "C:/path/to/txt/files/" # trailing slash needed because paste0() adds no separator
# LIST OF ALL LAST 90 DATES IN YYYYMMDD FORMAT
dates <- lapply(0:90, function(x) format(Sys.Date()-x, "%Y%m%d"))
# IMPORT ALL FILES INTO A LIST OF DATAFRAMES
dfList <- lapply(paste0(path, "FlightTrack", dates, ".txt"),
function(x) if (file.exists(x)) {read.table(x)})
# COMBINE EACH DATA FRAME INTO ONE
df <- do.call(rbind, dfList)
# OUTPUT FINAL FILE TO TXT
write.table(df, paste0(path, "FlightTrack90Days.txt"), sep = ",", row.names = FALSE)
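The two answers can be combined: list the files that are actually present, parse the date out of each name, and keep only those inside the window. The following is a sketch under stated assumptions; the helper name `combine_recent`, its `today` argument, and the temporary demo files are hypothetical, and it assumes each file has a header row.

```r
# Hypothetical helper: combine FlightTrackYYYYMMDD.txt files from the last n days
combine_recent <- function(dir, n_days = 90, today = Sys.Date()) {
  files <- list.files(dir, pattern = "^FlightTrack[0-9]{8}\\.txt$")
  dates <- as.Date(substr(files, 12, 19), format = "%Y%m%d")
  keep  <- !is.na(dates) & dates > today - n_days & dates <= today
  frames <- lapply(file.path(dir, files[keep]), read.table, header = TRUE)
  do.call(rbind, frames)                      # NULL if nothing matched
}

# Demo files in a temporary directory (assumption, for illustration only)
demo_dir <- file.path(tempdir(), "flight_demo")
unlink(demo_dir, recursive = TRUE)
dir.create(demo_dir)
for (d in c("20151224", "20151223", "20150101")) {
  write.table(data.frame(Name = paste0("AA", d), Speed = 23),
              file.path(demo_dir, paste0("FlightTrack", d, ".txt")),
              row.names = FALSE)
}

recent <- combine_recent(demo_dir, n_days = 90, today = as.Date("2015-12-24"))
print(recent)
```

Passing `today` explicitly keeps the example reproducible; in normal use the default `Sys.Date()` gives a rolling 90-day window.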

using cat in R to create a formatted R script

I want to read an R file or script, modify the name of the external data file being read and export the modified R code into a new R file or script. Other than the name of the data file being read (and the name of the new R file) I want the two R scripts to be identical.
I can come close, except that I cannot figure out how to retain the blank lines I use for readability and error reduction.
Here is the original R file being read. Note that some of the code in this file is non-sensical, but to me that is irrelevant. This code does not need to run.
# apple.pie.all.purpose.flour.arsc.Jun23.2013.r
library(my.library)
aa <- 10 # aa
bb <- c(1:7) # bb
my.data = convert.txt("../applepieallpurposeflour.txt",
group.df = data.frame(recipe =
c("recipe1", "recipe2", "recipe3", "recipe4", "recipe5")),
covariates = c(paste( "temp", seq_along(1:aa), sep="")))
ingredient <- c('all purpose flour')
function(make.pie){ make a pie }
Here is R code I use to read the above file, modify it and export the result. This R code runs and is the only code that needs to run to achieve the desired result (except that I cannot get the format of the new R script to match that of the original R script exactly, i.e., blank lines present in the original R script are not present in the new R script):
setwd('c:/users/mmiller21/simple r programs/')
# define new fruit
new.fruit <- 'peach'
# read flour file for original fruit
flour <- readLines('apple.pie.all.purpose.flour.arsc.Jun23.2013.r')
# create new file name
output.flour <- paste(new.fruit, ".pie.all.purpose.flour.arsc.Jun23.2013.r", sep="")
# add new file name
flour.a <- gsub("# apple.pie.all.purpose.flour.arsc.Jun23.2013.r",
paste("# ", output.flour, sep=""), flour)
# add line to read new data file
cat(file = output.flour,
gsub( "my.data = convert.txt\\(\"../applepieallpurposeflour.txt",
paste("my.data = convert.txt\\(\"../", new.fruit, "pieallpurposeflour.txt",
sep=""), flour.a),
sep=c("","\n"), fill = TRUE
)
Here is the resulting new R script:
# peach.pie.all.purpose.flour.arsc.Jun23.2013.r
library(my.library)
aa <- 10 # aa
bb <- c(1:7) # bb
my.data = convert.txt("../peachpieallpurposeflour.txt",
group.df = data.frame(recipe =
c("recipe1", "recipe2", "recipe3", "recipe4", "recipe5")),
covariates = c(paste( "temp", seq_along(1:aa), sep="")))
ingredient <- c('all purpose flour')
function(make.pie){ make a pie }
There is one blank line in the newly-created R file, but how can I insert all of the blank lines present in the original R script? Thank you for any advice.
EDIT: I cannot seem to duplicate the blank lines here on StackOverflow. They seem to be deleted automatically. StackOverflow is even deleting the indentation I am using and I cannot seem to replace it. Sorry about this. Automatic deletion of blank lines and indentation is problematic when the issue at hand is specifically about formatting. I cannot seem to fix the post to display the R code as formatted in my script. However, the code does display correctly when I am actively editing the post.
EDIT: June 27, 2013: The deletion of empty rows and indentation in the code for the original R file and in the code for the middle R file appears to be associated with my laptop rather than with StackOverflow. When I view this post and my answers on my office desktop the format is correct. When I view this post and my answers with my laptop the empty rows and indentation are gone. Perhaps my laptop monitor is malfunctioning. Sorry about assuming initially that the problem was with StackOverflow.
Here is a function that will create a new R file for every combination of two variables. Sorry the formatting of the code below is not better. The code does run and does work as intended (provided the name of the original R file ends in ".arsc.Jun26.2013.r" instead of in ".arsc.Jun23.2013.r" used in the original post):
setwd('c:/users/mmiller21/simple r programs/')
# define fruits of interest
fruits <- c('apple', 'pumpkin', 'pecan')
# define ingredients of interest
ingredients <- c('all.purpose.flour', 'sugar', 'ground.cinnamon')
# define every combination of fruit and ingredient
fruits.and.ingredients <- expand.grid(fruits, ingredients)
old.fruit <- as.character(rep('apple', nrow(fruits.and.ingredients)))
old.ingredient <- as.character(rep('all.purpose.flour', nrow(fruits.and.ingredients)))
fruits.and.ingredients2 <- cbind(old.fruit , as.character(fruits.and.ingredients[,1]),
old.ingredient, as.character(fruits.and.ingredients[,2]))
colnames(fruits.and.ingredients2) <- c('old.fruit', 'new.fruit', 'old.ingredient', 'new.ingredient')
# begin function
make.pie <- function(old.fruit, new.fruit, old.ingredient, new.ingredient) {
new.ingredient2 <- gsub('\\.', '', new.ingredient)
old.ingredient2 <- gsub('\\.', '', old.ingredient)
new.ingredient3 <- gsub('\\.', ' ', new.ingredient)
old.ingredient3 <- gsub('\\.', ' ', old.ingredient)
# file name
old.file <- paste(old.fruit, ".pie.", old.ingredient, ".arsc.Jun26.2013.r", sep="")
new.file <- paste(new.fruit, ".pie.", new.ingredient, ".arsc.Jun26.2013.r", sep="")
# read original fruit and original ingredient
flour <- readLines(old.file)
# add new file name
flour.a <- gsub(paste("# ", old.file, sep=""),
paste("# ", new.file, sep=""), flour)
# read new data file
old.data.file <- print(paste("my.data = convert.txt(\"../", old.fruit, "pie", old.ingredient2, ".txt\",", sep=""), quote=FALSE)
new.data.file <- print(paste("my.data = convert.txt(\"../", new.fruit, "pie", new.ingredient2, ".txt\",", sep=""), quote=FALSE)
flour.b <- ifelse(flour.a == old.data.file, new.data.file, flour.a)
flour.c <- ifelse(flour.b == paste('ingredient <- c(\'', old.ingredient3, '\')', sep=""),
paste('ingredient <- c(\'', new.ingredient3, '\')', sep=""), flour.b)
cat(flour.c, file = new.file, sep=c("\n"))
}
apply(fruits.and.ingredients2, 1, function(x) make.pie(x[1], x[2], x[3], x[4]))
Here is one solution that reproduces the original R script (except for the two desired changes) while also preserving the formatting of that original R script in the new R script.
setwd('c:/users/mmiller21/simple r programs/')
new.fruit <- 'peach'
flour <- readLines('apple.pie.all.purpose.flour.arsc.Jun23.2013.r')
output.flour <- paste(new.fruit, ".pie.all.purpose.flour.arsc.Jun23.2013.r", sep="")
flour.a <- gsub("# apple.pie.all.purpose.flour.arsc.Jun23.2013.r",
paste("# ", output.flour, sep=""), flour)
flour.b <- gsub( "my.data = convert.txt\\(\"../applepieallpurposeflour.txt",
paste("my.data = convert.txt\\(\"../", new.fruit, "pieallpurposeflour.txt", sep=""), flour.a)
for(i in 1:length(flour.b)) {
if(i == 1) cat(flour.b[i], file = output.flour, sep=c("\n"), fill=TRUE )
if(i > 1) cat(flour.b[i], file = output.flour, sep=c("\n"), fill=TRUE, append = TRUE)
}
Again, I apologize for my inability to format the above R code in a readable way. I have never encountered this problem on StackOverflow and do not know the solution. Regardless, the above R script solves the problem I described in the original post.
To see the formatting of the original R script you will have to click the edit button under the original post.
EDIT: June 25, 2013
I do not know what I was doing differently yesterday, but today I found that the following simple cat statement, in place of the for-loop immediately above, creates the new R script while preserving the formatting of the original R script.
cat(flour.b, file = output.flour, sep=c("\n"))
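A related option is writeLines(), which writes one vector element per line, so the empty strings that readLines() returns for blank lines are written back as blank lines. This is a minimal sketch, assuming short stand-in file names and a made-up five-line script in a temporary directory rather than the real files from the question.

```r
# Stand-in files in a temporary directory (assumption, for illustration only)
demo_dir <- tempdir()
old_file <- file.path(demo_dir, "apple.pie.demo.r")
new_file <- file.path(demo_dir, "peach.pie.demo.r")

# A tiny stand-in for the original script, including blank lines
writeLines(c("# apple.pie.demo.r",
             "",
             "aa <- 10",
             "",
             "my.data = convert.txt(\"../applepie.txt\")"),
           old_file)

script <- readLines(old_file)                 # blank lines come back as ""
script <- gsub("apple", "peach", script, fixed = TRUE)
writeLines(script, new_file)                  # one element per output line

copied <- readLines(new_file)
print(copied)
```

Because the replacement is applied element by element, the number of lines in the new script matches the original.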

Resources