I wish to write the output from each loop iteration to a separate .csv file, or add each iteration as a new row in a separate object, which can then be written out as a single .csv.
I've tried using write.csv, but I am unsure how to apply it in the loop so that a new file is created for each i.
for (i in seq(1,1128,12)){
  Array <- ncvar_get(temp_cont, varid = "TREFHT", start = c(1,1,1,i), count= c(144,96,60,12))
  myArray <- array(Array, dim =c(144, 96, 60, 12))
  Annual <- apply(myArray, c(1,2,3), mean)
  myArray2 <- array(Annual, dim = c(144,96,60))
  Annuallat <- apply(myArray2, c(2,3), mean)
  myArray3 <- array(Annuallat, dim = c(96,60))
  AnnualGLobal <- apply(myArray3, 2,mean)
  AnnualGlobalc <- AnnualGLobal - 273.15
  write.csv(AnnualGlobalc, file = "year[i].csv")
}
Maybe this will give you a starting point:
write.csv(AnnualGlobalc, file = paste0(i,".csv"))
You can use paste() or paste0() to generate a file name from the loop index i on the fly.
Note that I did not use paste0(year[i],".csv"), for two reasons:
I did not see a variable year defined anywhere;
You iterate i through seq(1,1128,12), so I doubt that year[i] would index anything sensible.
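If you want file names that count years rather than months, one option is to derive a year index from i. This is only a sketch: the rnorm() call is a placeholder for your ncvar_get()/apply() steps, the shortened sequence and the "year_" prefix are my own choices, and I write to tempdir() so nothing lands in your working directory.

```r
out_dir <- tempdir()                      # assumption: any writable folder works
written <- character(0)

for (i in seq(1, 37, 12)) {               # shortened stand-in for seq(1, 1128, 12)
  year_index <- as.integer((i - 1) / 12 + 1)  # 1, 2, 3, ... for i = 1, 13, 25, ...
  values <- rnorm(60)                     # placeholder for AnnualGlobalc

  # Zero-padded names such as year_001.csv sort correctly in a file browser
  out_file <- file.path(out_dir, sprintf("year_%03d.csv", year_index))
  write.csv(values, file = out_file)
  written <- c(written, out_file)
}
```

Each pass through the loop builds a different file name, so no file is overwritten.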
define a list to loop through... I then would write a function that does what i want... use the function within a call to lapply() ...
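For the other option in the question (one row per iteration, then a single .csv), a rough sketch of the function-plus-lapply() idea could look like this. annual_global() is a hypothetical stand-in for the netCDF processing, and the random numbers are placeholders, not real temperatures.

```r
# Hypothetical helper: would wrap the ncvar_get()/apply() steps for one year
annual_global <- function(i) {
  rnorm(60, mean = 14)                    # placeholder for AnnualGlobalc (60 values)
}

starts <- seq(1, 1128, 12)                # first month of each year
rows   <- lapply(starts, annual_global)   # one numeric vector per year

# Stack the vectors so each year becomes one row
result <- do.call(rbind, rows)
rownames(result) <- paste0("year_", seq_along(starts))

out_file <- file.path(tempdir(), "annual_global.csv")
write.csv(result, file = out_file)
```

This avoids growing an object inside the loop and leaves you with a single file to read back later.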
I have been struggling to find a way to create a new data frame using a loop, where the main goal is to keep only the values that are >= 0.5.
I'm using RStudio; Python is an option too.
Here is what my data frame (CSV file) looks like, along with some (incomplete) lines of the script:
df <- read.table(choose.files(), header = T, sep = ",", comment.char = "")
Site,Partition,alpha,beta,omega,alpha=beta,LRT,p-value,Total branch length
1,1,"0.000","0.000","NaN","0.000","0.000","1.000","0.000"
2,1,"0.060","0.046","0.774","0.048","0.049","0.825","0.000"
Then I use the subset function to keep only the two columns that interest me:
sdf <- subset(df, select = c("ï..Site", "alpha.beta"))
ï..Site alpha.beta
1 1 0.000
2 2 0.048
...
Then I thought of using a loop to create a new CSV file: when the second column has a value >= 0.5, print that value; when it does not satisfy this condition, print a 0 instead.
I tried different ways, but none of them worked. Here are the last lines that I tried:
for (i in names(sdf1)) {
f_sdf1 <- sdf1[sdf1[, i] >= 0.5]
write.csv(f_sdf1, paste0(i, ".csv"))
}
So in this post I'm looking for some ideas on how to write this script. Maybe it's simple, but I need to ask how to do it.
You could use subset to filter your data as in
# first get some example data
expl <- data.frame(site = 1:10, alpha.beta = runif(10))
print(expl)
# now do the filtering
expl.filtered = subset(expl, alpha.beta >= .5)
print(expl.filtered)
# Now write.table or write.csv...
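If you would rather keep every row and put a 0 where the value is below 0.5, as the question describes, a vectorised ifelse() does it without a loop. A small sketch, reusing the made-up expl data from above; the output file name is just an example.

```r
set.seed(1)                               # only to make the example repeatable
expl <- data.frame(site = 1:10, alpha.beta = runif(10))

# Keep values >= 0.5, replace everything else with 0
expl.zeroed <- transform(expl,
                         alpha.beta = ifelse(alpha.beta >= 0.5, alpha.beta, 0))

out_file <- file.path(tempdir(), "filtered.csv")
write.csv(expl.zeroed, out_file, row.names = FALSE)
```

Use subset() when you want to drop the rows, and ifelse() when you want to keep them with a 0.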
I am trying to create a large number of data frames in a for loop using the "assign" function in R. I want to use the colnames function to set the column names in the data frame. The code I am trying to emulate is the following:
county_tmax_min_df <- data.frame(array(NA,c(length(days),67)))
colnames(county_tmax_min_df) <- c('Date',sd_counties$NAME)
county_tmax_min_df$Date <- days
The code I have so far in the loop looks like this:
file_vars = c('file1','file2')
days <- seq(as.Date("1979-01-01"), as.Date("1979-01-02"), "days")
f = 1
for (f in 1:2){
assign(paste0('county_',file_vars[f]),data.frame(array(NA,c(length(days),67))))
}
I need to be able to set the column names similar to how I did in the above statement. How do I do this? I think it needs to be something like this, but I am unsure what goes in the text portion. The end result I need is just a bunch of data frames. Any help would be wonderful. Thank you.
expression(parse(text = ))
You can set the names within assign, like this:
file_vars = c('file1', 'file2')
days <- seq.Date(from = as.Date("1979-01-01"), to = as.Date("1979-01-02"), by = "days")
for (f in seq_along(file_vars)) {
  assign(x = paste0('county_', file_vars[f]),
         value = {
           df <- data.frame(array(NA, c(length(days), 67)))
           colnames(df) <- paste0("fancy_column_",
                                  sample(LETTERS, size = ncol(df), replace = TRUE))
           df
         })
}
Inside the {} block you can use colnames(df) or setNames() to assign column names in any manner desired. Your first piece of code refers to an sd_counties object that is not available here, but the general idea should work for you.
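As an alternative to assign(), you could keep the data frames in a named list, which is usually easier to loop over later. This is a different technique from the one above, and county_names is a hypothetical stand-in for sd_counties$NAME, which I don't have.

```r
file_vars <- c("file1", "file2")
days <- seq(as.Date("1979-01-01"), as.Date("1979-01-02"), by = "day")
county_names <- paste0("county_", 1:66)   # assumption: 66 county names

# Build one empty data frame with the desired column names
make_df <- function() {
  df <- setNames(data.frame(array(NA, c(length(days), 67))),
                 c("Date", county_names))
  df$Date <- days
  df
}

# One data frame per entry in file_vars, stored under a descriptive name
county_dfs <- setNames(lapply(file_vars, function(f) make_df()),
                       paste0("county_", file_vars))
```

You would then access a single data frame with county_dfs$county_file1 or county_dfs[["county_file1"]].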
I have a matrix which I would like to split automatically into overlapping parts, storing the result in a single list object. I would like a solution without loops.
mat = matrix(c(1:24), 4)
list = NULL
list[[1]] = mat[,c(1:2)]
list[[2]] = mat[,c(2:3)]
list[[3]] = mat[,c(3:4)]
list[[4]] = mat[,c(4:5)]
list[[5]] = mat[,c(5:6)]
Expected output
list
That's what I want, but without using a loop.
Try this:
lapply(seq_len(ncol(mat)-1), function(i) mat[,c(i,i+1)])
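If you later need wider windows, the same idea generalises. A small sketch, where split_overlapping and its width argument are names I made up for illustration:

```r
mat <- matrix(1:24, nrow = 4)

# Hypothetical helper: column windows of the given width, shifted by one column
split_overlapping <- function(m, width = 2) {
  lapply(seq_len(ncol(m) - width + 1),
         function(i) m[, i:(i + width - 1), drop = FALSE])
}

parts <- split_overlapping(mat)           # same pieces as mat[, 1:2], mat[, 2:3], ...
```

With width = 2 this gives the same five pieces as the hand-written version in the question.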
I am a long-time Stata user trying to familiarize myself with the syntax and logic of R. Could you help me write more efficient code than what is shown below (see "The Not-so-efficient Codes")?
The goal is to (A) read several files (each of which holds one year of data), (B) create the same variables for each file, and (C) combine the files into a single one for statistical analysis. I have finished revising part A, but am struggling with the rest, particularly part B. Could you give me some ideas on how to proceed, e.g. use unlist to unlist data.l first, or lapply over each element of data.l? I appreciate your comments. Thanks.
More Efficient Codes: Part A
# Create an empty list
data.l = list()
# Create a list of file names
fileList = list.files(path = "C:/My Data", pattern = ".dat")
# Read the ".dat" files into a single list
data.l = sapply(fileList, readLines)
The Not-so-efficient Codes: Part A, B and C
setwd("C:/My Data")
# Part A: Read the data. Each "dat" file is text file and each line in the file has 300 characters.
dx2004 <- readLines("2004.INJVERBT.dat")
dx2005 <- readLines("2005.INJVERBT.dat")
dx2006 <- readLines("2006.INJVERBT.dat")
# Part B-1: Create variables for each year of data
dt2004 <- data.frame(hhx = substr(dx2004,7,12), fmx = substr(dx2004,13,14),
                     iphow = substr(dx2004,19,318), stringsAsFactors = FALSE)
dt2005 <- data.frame(hhx = substr(dx2005,7,12), fmx = substr(dx2005,13,14),
                     iphow = substr(dx2005,19,318), stringsAsFactors = FALSE)
dt2006 <- data.frame(hhx = substr(dx2006,7,12), fmx = substr(dx2006,13,14),
                     iphow = substr(dx2006,19,318), stringsAsFactors = FALSE)
# Part B-2: Create the "iid" variable for each year of data
dt2004$iid<-paste0("2004",dt2004$hhx, dt2004$fmx, dt2004$fpx, dt2004$ipepno)
dt2005$iid<-paste0("2005",dt2005$hhx, dt2005$fmx, dt2005$fpx, dt2005$ipepno)
dt2006$iid<-paste0("2006",dt2006$hhx, dt2006$fmx, dt2006$fpx, dt2006$ipepno)
# Part C: Combine the three years of data into a single one
data = rbind(dt2004,dt2005, dt2006)
You are almost there. It's a combination of lapply and do.call/rbind, which handles lapply's list output.
Consider this example:
test1 = "Thisistextinputnumber1"
test2 = "Thisistextinputnumber2"
test3 = "Thisistextinputnumber3"
data.l = list(test1, test2, test3)
makeDF <- function(inputText){
  DF <- data.frame(hhx = substr(inputText, 7, 12),
                   fmx = substr(inputText, 13, 14),
                   iphow = substr(inputText, 19, 318),
                   stringsAsFactors = FALSE)
  DF <- within(DF, iid <- paste(hhx, fmx, iphow))
  return(DF)
}
do.call(rbind, (lapply(data.l, makeDF)))
Here test1, test2, test3 represent your dx200X, and data.l should be the list format you get from the efficient version of Part A.
In makeDF you create your desired data.frame. The do.call(rbind, ) is somewhat standard if you work with lapply-return values.
You might also want to check out the data.table package, whose rbindlist function can replace any do.call/rbind construction (and is typically much faster), along with other useful utilities for large data sets.
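To get the year into iid as in your Part B-2, one option is to name the list elements by year and pass the name along with Map(). This is a sketch in base R: the input lines are invented, and I left out fpx and ipepno because their column positions are not shown in the question.

```r
# Made-up stand-ins for the readLines() output, named by year
data.l <- list(
  "2004" = c("000000AAAAAA01xxxxinjury text one",
             "000000BBBBBB02xxxxinjury text two"),
  "2005" = c("000000CCCCCC01xxxxinjury text three")
)

makeDF <- function(lines, year) {
  DF <- data.frame(hhx = substr(lines, 7, 12),
                   fmx = substr(lines, 13, 14),
                   iphow = substr(lines, 19, 318),
                   stringsAsFactors = FALSE)
  DF$iid <- paste0(year, DF$hhx, DF$fmx)  # year prefix comes from the list name
  DF
}

# Map() pairs each element of data.l with its name; rbind stacks the results
combined <- do.call(rbind, Map(makeDF, data.l, names(data.l)))
```

If the list comes from sapply(fileList, readLines), the names will be the file names, so you would need to extract the year from them first, e.g. with substr().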
I have six data frames (a-f), each of which has a column 'nq'. I want to find the max, min and average of the nq columns.
classes <- c("a","b","c","d","e","f")
for (i in classes){
format(max(i$nq), scientific = TRUE)
format(min(i$nq), scientific = TRUE)
format(mean(i$nq), scientific = TRUE)
}
But the code is not working. Can you please help?
You can't use a character value as a data.frame name. The value "a" is not the same as the data.frame a.
You probably shouldn't have a bunch of data.frames lying around. You probably want to have them all in a list. Then you can lapply over them to get results.
mydata <- list(
a = data.frame(nq=runif(10)),
b = data.frame(nq=runif(10)),
c = data.frame(nq=runif(10)),
d = data.frame(nq=runif(10))
)
then you can do
lapply(mydata, function(x)
format(c(max(x$nq), min(x$nq), mean(x$nq)), scientific = TRUE)
)
to get all the values at once.
The reason it is not working is that 'i' is a character string. As Mr.Flick already mentioned, you should put the data frames into a list.
Alternatively, instead of writing i$nq in your loop, you can write get(i)$nq. The get() function searches the workspace for an object by name and returns the object itself. However, this is not as clean as putting the data frames into a list and using lapply.
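A short sketch of both routes, using two made-up data frames. Note that inside a for loop nothing is auto-printed, so the format() calls need an explicit print(). mget() is a convenient way to collect existing objects into a named list.

```r
set.seed(42)                              # only to make the example repeatable
a <- data.frame(nq = runif(10))
b <- data.frame(nq = runif(10))
classes <- c("a", "b")

# Route 1: look each data frame up by name with get()
for (i in classes) {
  nq <- get(i)$nq
  print(format(c(max = max(nq), min = min(nq), mean = mean(nq)),
               scientific = TRUE))
}

# Route 2: gather the objects into a named list, then summarise in one go
mydata <- mget(classes)
stats <- sapply(mydata, function(x) {
  c(max = max(x$nq), min = min(x$nq), mean = mean(x$nq))
})
```

stats is a numeric matrix with one column per data frame, which you can format or write out afterwards.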