How to pull and loop the data in RStudio?

It would be great if someone could help with the requirement below.
My requirement is to pull the data from a Hive table based on "Fiscal Quarter" and load it into a txt file. The process should be a loop: I would expect 3 txt files (FY19Q1_Txtfile1.txt/FY19Q2_Txtfile2.txt/FY19Q3_Txtfile3.txt) from 3 iterations.

Once your table is stored as a data.frame in R (named data, for example), you can do this (note the comma inside the brackets, which selects the matching rows and keeps all columns):
write.csv(data[data$Fiscal_Quarter == 'FY19Q1', ], 'FY19Q1_Txtfile1.txt')
write.csv(data[data$Fiscal_Quarter == 'FY19Q2', ], 'FY19Q2_Txtfile2.txt')
write.csv(data[data$Fiscal_Quarter == 'FY19Q3', ], 'FY19Q3_Txtfile3.txt')
And if you want to use a loop instead:
for (i in 1:3) {
  file_name <- paste0('FY19Q', i, '_Txtfile', i, '.txt')
  FQ <- paste0('FY19Q', i)
  write.csv(data[data$Fiscal_Quarter == FQ, ], file_name)
}
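A slightly more general sketch, assuming the quarters to export are exactly the distinct values of a Fiscal_Quarter column. The toy data frame and the output directory (tempdir()) are illustrative only, not the Hive data from the question. It loops with seq_along() so that the index in the file name matches the iteration, and writes tab-separated text with write.table():

```r
# Hypothetical toy data standing in for the table pulled from Hive
data <- data.frame(
  Fiscal_Quarter = c("FY19Q1", "FY19Q1", "FY19Q2", "FY19Q3"),
  amount         = c(10, 20, 30, 40)
)

out_dir  <- tempdir()  # assumption: write to a temporary directory
quarters <- sort(unique(data$Fiscal_Quarter))

for (i in seq_along(quarters)) {
  fq        <- quarters[i]
  file_name <- file.path(out_dir, paste0(fq, "_Txtfile", i, ".txt"))
  # The comma selects matching ROWS and keeps all columns
  subset_fq <- data[data$Fiscal_Quarter == fq, , drop = FALSE]
  write.table(subset_fq, file_name, sep = "\t", row.names = FALSE, quote = FALSE)
}

list.files(out_dir, pattern = "_Txtfile[0-9]+\\.txt$")
```

Deriving the quarters from the data avoids hard-coding 1:3, so a fourth quarter would be picked up without changing the loop.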
I hope this answers the question.

Related

Repeating same task with for loop using filename pattern in csv in r

I am repeating some tasks in R. I first read a csv file, do some repetitive work, and write a csv file based on the date in the file name. I want to use a for loop over the file name pattern, specifically the day value (e.g. 17 in the example below), say from 1 to 31. Could anyone help me write the for loop? Thanks in advance.
text <- read_csv("D://2017-10-17.csv")
... Some work here ...
write_csv(text , "2017-10-17_csv_backup.csv", na = "")
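You could do the following. Assuming the days in the file names are zero-padded (as in ISO dates such as 2017-10-01), sprintf("%02d", i) turns 1 into "01"; plain paste0 with i would produce 2017-10-1.csv instead:
library(readr)
for (i in 1:31) {
  day <- sprintf("%02d", i)
  text <- read_csv(paste0("D://2017-10-", day, ".csv"))
  # ... some work here ...
  write_csv(text, paste0("2017-10-", day, "_csv_backup.csv"), na = "")
}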
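A sketch that also skips days for which no file exists, so the loop does not stop with an error on a missing date. It uses base R's read.csv()/write.csv() so it runs without readr (the same file names work with read_csv()/write_csv()); the temporary directory and the three demo files are hypothetical stand-ins for the D: drive:

```r
in_dir <- tempdir()  # assumption: stands in for the D: drive in the question

# Hypothetical input files for three October 2017 days
for (d in c(1, 9, 17)) {
  write.csv(data.frame(day = d, value = d * 2),
            file.path(in_dir, sprintf("2017-10-%02d.csv", d)), row.names = FALSE)
}

for (i in 1:31) {
  day     <- sprintf("%02d", i)  # 1 -> "01", 17 -> "17"
  in_file <- file.path(in_dir, paste0("2017-10-", day, ".csv"))
  if (!file.exists(in_file)) next  # skip days without a file instead of stopping
  text <- read.csv(in_file)
  # ... some work here ...
  write.csv(text, file.path(in_dir, paste0("2017-10-", day, "_csv_backup.csv")),
            na = "", row.names = FALSE)
}
```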

Append files based on their names

I am new to R and I have a lot of climate data files in text format with long names in the same folder, for example "tasmax_SAM-44_ICHEC-EC-EARTH_rcp26_r12i1p1_SMHI-RCA4_v3_day_20060101-20101231.txt", where each term separated by "_" corresponds to a characteristic such as the variable, domain, institute, scenario, etc.
What I want is code that selects all the files in my folder that share the same model name, scenario name and GCM name, and appends them by rows.
What I tried was to first create a list of the files and assign variables for each part of their names, like model_name, gcm_name, etc.
Then I created a condition that compares those variables across the files in a loop.
file <- list.files ( pattern = '*.txt' )
group <- function(input){
index = which(file == input)
df=read.table(input,header=FALSE,sep="")
fname= unlist((strsplit(input,"_")),use.names=FALSE)
model_name=fname[3]
sce_name=fname[4]
gcm_name=fname[6]
m=1
for (m in 1:length(file)) {
if (model_name[m]==model_name[m+1] & sce_name[m]==sce_name[m+1] & gcm_name[m]==gcm_name[m+1]) {
data=rbind(df[m],df[m+1])
} else {}
}
}
for (i in 1:length(file)) {
group(file[i])
}
The error I had with my code is this:
Error in if (model_name[m] == model_name[m + 1] & sce_name[m] ==
sce_name[m + : missing value where TRUE/FALSE needed
In the end, the code should append the files that meet the if condition, for example making one file out of these two files:
tasmax_SAM-44_ICHEC-EC-EARTH_rcp26_r12i1p1_SMHI-RCA4_v3_day_20060101-20101231.txt
tasmax_SAM-44_ICHEC-EC-EARTH_rcp26_r12i1p1_SMHI-RCA4_v3_day_20110101-20151231.txt
Any help and suggestions are very welcome!
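For context, the error in the question can be reproduced in isolation. model_name is assigned a single string (fname[3]), so model_name[m + 1] indexes past the end of a length-1 vector and yields NA; comparing with NA gives NA, and if() cannot evaluate an NA condition. A minimal sketch using the first example file name:

```r
# Mirrors the question: fname[3] is ONE string, so model_name has length 1
fname <- unlist(strsplit(
  "tasmax_SAM-44_ICHEC-EC-EARTH_rcp26_r12i1p1_SMHI-RCA4_v3_day_20060101-20101231.txt", "_"))
model_name <- fname[3]

length(model_name)  # 1
model_name[2]       # NA: indexing past the end of a vector gives NA

# The comparison is NA, so if() fails with the error from the question
msg <- tryCatch(
  if (model_name[1] == model_name[2]) "same" else "different",
  error = function(e) conditionMessage(e)
)
msg
```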
I would suggest a completely different approach:
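Get the list of all txt files (pattern is a regular expression, not a wildcard):
file <- list.files(pattern = '\\.txt$')
Read all the files into a single data frame, adding the model, scenario and GCM names taken from each file name as columns (the files are whitespace-separated, so read_table() is used rather than read_csv()):
library(dplyr)
library(readr)
df <- bind_rows(lapply(file, function(f) {
  parts <- strsplit(f, "_")[[1]]
  suppressMessages(read_table(f, col_names = FALSE)) %>%
    mutate(model = parts[3], scenario = parts[4], gcm = parts[6])
}))
Then group by those fields and write each group to a separate csv file named after its key:
df %>%
  group_by(model, scenario, gcm) %>%
  group_walk(~ write_csv(.x, paste0(paste(.y$model, .y$scenario, .y$gcm, sep = "_"), ".csv")))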
Not sure if I understand your question completely, but this may help.
The code works as follows:
Read the values of the file you give as input.
Loop over all other files and append them if they match your conditions.
The if condition takes the name parts of your input file and compares them with the name parts of file[m]. If they match, file[m] is appended to your data. Another fix: you have to use return(data) at the end of your function.
file <- list.files ( pattern = '*.txt' )
group <- function(input){
index = which(file == input)
data=read.table(input,header=FALSE,sep="")
fname= unlist((strsplit(input,"_")),use.names=FALSE)
model_name=fname[3]
sce_name=fname[4]
gcm_name=fname[6]
for (m in 2:length(file)) {
index = file[m]
df_new=read.table(file[m],header=FALSE,sep="")
fname= unlist((strsplit(file[m],"_")),use.names=FALSE)  # split the name of file[m], not of input
if (model_name==fname[3] & sce_name==fname[4] & gcm_name==fname[6]) {
data=rbind(data,df_new)
} else {}
}
return(data)
}
group(file[1])
Problems that still have to be solved: the code only works correctly when you pass in the first file. The function starts from the file you pass to group(), but the for loop always starts at the second file, so if you call group(file[3]), the first file will be skipped and the third file will be added twice. You could fix this with another if condition that skips the input file, e.g. if (file[m] == input) next inside the loop, and then make sure the loop range is correct (for example, loop over seq_along(file)).
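A different, base-R sketch that avoids the start-index problem entirely: build a grouping key from name parts 3, 4 and 6 (the positions the question uses for model_name, sce_name and gcm_name), split the file list by that key, and row-bind each group, so every file is read exactly once. The demo folder and its three small files are hypothetical:

```r
# Hypothetical demo folder with three small whitespace-separated files
dir <- file.path(tempdir(), "climate_demo")
dir.create(dir, showWarnings = FALSE)
demo_files <- c(
  "tasmax_SAM-44_ICHEC-EC-EARTH_rcp26_r12i1p1_SMHI-RCA4_v3_day_20060101-20101231.txt",
  "tasmax_SAM-44_ICHEC-EC-EARTH_rcp26_r12i1p1_SMHI-RCA4_v3_day_20110101-20151231.txt",
  "tasmax_SAM-44_ICHEC-EC-EARTH_rcp85_r12i1p1_SMHI-RCA4_v3_day_20060101-20101231.txt"
)
for (k in seq_along(demo_files)) {
  write.table(data.frame(day = k, tasmax = 300 + k), file.path(dir, demo_files[k]),
              row.names = FALSE, col.names = FALSE)
}

files <- list.files(dir, pattern = "\\.txt$")

# Grouping key from name parts 3, 4 and 6 (model, scenario, GCM as named in the question)
key <- vapply(strsplit(files, "_"),
              function(p) paste(p[3], p[4], p[6], sep = "_"),
              character(1))

# Files sharing a key are read once each and row-bound in file-name order
groups <- lapply(split(files, key), function(fs) {
  do.call(rbind, lapply(file.path(dir, fs), read.table, header = FALSE, sep = ""))
})

# Write every group to its own file in a separate output folder
out_dir <- file.path(dir, "merged")
dir.create(out_dir, showWarnings = FALSE)
for (k in names(groups)) {
  write.table(groups[[k]], file.path(out_dir, paste0(k, ".txt")),
              row.names = FALSE, col.names = FALSE)
}
```

Because the date range is the last name part, list.files() returns the files of a group in chronological order, so the appended rows stay in time order. The merged files go to a subfolder so that rerunning the script does not pick them up as inputs.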

Specifying consecutive file names and assigning consecutive vectors with counter variable in for loops

I am trying to analyze 10 sets of data, for which I have to import the data, remove some values and plot histograms. I could do it individually, but I could save a lot of time with a for loop. I know this code is not correct, but I have no idea how to specify the names of the input files or how to name each iterated variable in R.
par(mfrow = c(10,1))
for (i in 1:10)
{
freqi <- read.delim("freqspeci.frq", sep="\t", row.names=NULL)
freqveci <- freqi$N_CHR
freqveci <- freqveci[freqveci != 0 & freqveci != 1]
hist(freqveci)
}
What I want to do is to have the counter number in every "i" in my code. Am I just approaching this the wrong way in R? I have read about the assign and paste functions, but honestly do not understand how I can apply them properly in this particular problem.
You can do it in several ways:
Use list.files() to get all files in a given directory. You can also filter them with a regular expression through its pattern argument.
If the names are consecutive (assuming files named freqspec1.frq to freqspec10.frq), then you can use:
for (i in 1:10) {
  filename <- sprintf("freqspec%d.frq", i)  # freqspec1.frq, freqspec2.frq, ...
  freqi <- read.delim(filename, sep = "\t", row.names = NULL)
  freqveci <- freqi$N_CHR
  freqveci <- freqveci[freqveci != 0 & freqveci != 1]
  hist(freqveci, main = filename)
}
You can also use paste() to create the file names:
paste("filename", 1:10, sep='_')
You could also save all your data files into an otherwise empty folder and then get the file names like this:
filenames <- dir()
for (i in seq_along(filenames)) {
  freqi <- read.delim(filenames[i], sep = "\t", row.names = NULL)
  # and here whatever else you want to do with these files
}
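A sketch that combines list.files() with consecutive names, assuming the files are called freqspec1.frq, freqspec2.frq, ... (one reading of the question's freqspeci.frq) and contain an N_CHR column. The demo folder and toy data are hypothetical. Instead of creating freq1, freq2, ... with assign(), it keeps the filtered vectors in one named list:

```r
# Hypothetical demo: three tab-separated files freqspec1.frq ... freqspec3.frq
dir <- file.path(tempdir(), "frq_demo")
dir.create(dir, showWarnings = FALSE)
for (i in 1:3) {
  write.table(data.frame(CHR = 1, N_CHR = c(0, 1, 2, 4, i + 5)),
              file.path(dir, sprintf("freqspec%d.frq", i)),
              sep = "\t", row.names = FALSE, quote = FALSE)
}

# Match freqspec<number>.frq only, then sort numerically so 10 comes after 9
filenames <- list.files(dir, pattern = "^freqspec[0-9]+\\.frq$")
filenames <- filenames[order(as.integer(gsub("\\D", "", filenames)))]

# Read each file and drop N_CHR values equal to 0 or 1
freqvecs <- lapply(filenames, function(f) {
  freq <- read.delim(file.path(dir, f), sep = "\t", row.names = NULL)
  v <- freq$N_CHR
  v[v != 0 & v != 1]
})
names(freqvecs) <- filenames
```

Each element can then be plotted, for example by setting par(mfrow = c(length(freqvecs), 1)) and calling hist(freqvecs[[f]], main = f) for every name f.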

How to automate read.csv command in R?

I'm doing something stupid and I cannot get write.csv to write a lot of files.
If I write:
write.csv(X1, file = "X1.csv")
Then it writes a ~2 MB csv file, which is fine. I have around 2000 such variables in memory, and I've tried:
for (i in seq_along(fotos)) {
write.csv(paste("X", i, sep = ""), file = paste(paste("X", i, sep = ""),"csv", sep="."))}
I obtain the desired files, but they are only ~2 KB: X1.csv contains only the string "X1", and all the files are similar (X1000.csv contains "X1000"). This is unlike write.csv(X1, file = "X1.csv"), which creates a file X1.csv containing a 96x96 matrix.
Any idea of what I'm doing wrong?
Many thanks in advance.
You can get the object by name with the function get. However, it is much better to read the data frames into a list than into objects related by having common names.
So you can create a list of the data frames:
X <- lapply(seq_along(fotos), function(i) get(paste0("X", i)))
names(X) <- fotos
And then write them (and this is what you'd use if you had a list to start with):
lapply(names(X), function(name) write.csv(X[[name]], paste(name, 'csv', sep='.')))
You could try using the get() function
for (i in seq_along(fotos)) {
write.csv(get(paste("X", i, sep = "")), file = paste(paste("X", i, sep = ""),"csv", sep="."))}
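A sketch of the list approach using mget(), which returns a named list of the objects whose names it is given, so the list names can double as file names. The objects X1 to X3 and the output directory are hypothetical; with the question's data, the names would be paste0("X", seq_along(fotos)):

```r
# Hypothetical objects standing in for the ~2000 matrices in memory
X1 <- matrix(1:4, 2); X2 <- matrix(5:8, 2); X3 <- matrix(9:12, 2)
out_dir <- tempdir()  # assumption: write to a temporary directory

# mget() looks the objects up by name and returns a named list
obj_names <- paste0("X", 1:3)
X <- mget(obj_names)

# Write each element to <name>.csv; invisible() hides the printed list of NULLs
invisible(lapply(names(X), function(name) {
  write.csv(X[[name]], file.path(out_dir, paste0(name, ".csv")))
}))
```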

R: Saving Output as xlsx in for loop

I am using the openxlsx package to write xlsx files.
I have a variable that is a vector of numbers
x <- 1:8
I then paste ".xlsx" to the end of each element of x to later create an xlsx file
new_x <- paste(x,".xlsx", sep = "")
I then call write.xlsx from the openxlsx package in a for loop to create new xlsx files:
for (i in x) {
for (j in new_x) {
write.xlsx(i,j)
}}
When I open 1.xlsx to 8.xlsx, every file contains only the number 8. What I don't understand is why 1.xlsx doesn't contain 1, 7.xlsx doesn't contain 7, and so on: why does the 8th value overwrite everything else?
I even tried creating a new output for the dataframes as most others suggested
for (i in x) {
for (j in new_x) {
output[[i]] <- i
write.xlsx(output[[i]],j)
}}
And it still comes up with the same problem. I don't understand what is going wrong.
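The problem is that you are creating each Excel file multiple times because you have nested loops: for every value of i, the inner loop writes i into all eight files, so the last pass (i = 8) overwrites everything. Try using a single loop and referring to the matching element of new_x:
library(openxlsx)
x <- 1:8
new_x <- paste0(x, ".xlsx")
for (i in seq_along(x)) {
  write.xlsx(x[i], new_x[i])
}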
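To see why every file ended up containing 8, the two loop shapes can be compared with plain text files. This sketch uses base R's writeLines() instead of write.xlsx() so it runs without openxlsx; the temporary directory and .txt names are only for illustration:

```r
out_dir <- tempdir()  # assumption: temporary directory for the demo
x <- 1:8
new_x <- file.path(out_dir, paste0(x, ".txt"))  # .txt instead of .xlsx for the demo

# Nested loops: for EVERY i, ALL 8 files are rewritten, so the last i (8) wins
for (i in x) {
  for (j in new_x) {
    writeLines(as.character(i), j)
  }
}
nested <- vapply(new_x, readLines, character(1), USE.NAMES = FALSE)

# Single loop: file k receives element k only
for (k in seq_along(x)) {
  writeLines(as.character(x[k]), new_x[k])
}
single <- vapply(new_x, readLines, character(1), USE.NAMES = FALSE)

nested  # every file contains "8"
single  # "1" through "8"
```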
If you want to read a number of .csv files and save them as xlsx files, the approach is similar; you still want only a single for loop, such as:
library(openxlsx)
# Define where to look for csv files and where to save the Excel files
csvDirectory <- "C:/Foo/Bar"
ExcelDirectory <- file.path(Sys.getenv("USERPROFILE"), "Desktop")
# Find all the csv files of interest (pattern is a regular expression)
csvFiles <- list.files(csvDirectory, pattern = "\\.csv$")
# Go through the list of files; read each one into R and save it as Excel
for (i in seq_along(csvFiles)) {
  csvFile <- read.csv(file.path(csvDirectory, csvFiles[i]))
  write.xlsx(csvFile, file.path(ExcelDirectory, sub("\\.csv$", ".xlsx", csvFiles[i])))
}
