merge tables in Loop using R - r

I have a simple question regarding a loop that I wrote. I want to access different files in different directories and extract data from these files and combine into one table. My problem is that my loop is not adding the results of the different files but only updating with the species that is currently in the loop. Here it is my code:
for(i in 1:length(splist.par))
{
results<-read.csv(paste(getwd(),"/ResultsR10arcabiotic/",splist.par[i],"/","maxentResults.csv",sep=""),h=T)
species <- splist.par[i]
AUC <- results$Test.AUC[1:10]
AUC_SD <- results$AUC.Standard.Deviation[1:10]
Variable <- "a"
Resolution <- "10arc"
table <-cbind(species,AUC,AUC_SD,Variable,Resolution)
}
This is probably an easy question but I am not an experienced programmer. Thanks for the attention
Gabriel

I'd use lapply to get the desired data from each file and add the Species information, and then combine with rbind. Something like this (untested):
do.call(rbind, lapply(splist.par, function(x) {
d <- read.csv(file.path("ResultsR10arcabiotic", x, "maxentResults.csv"))
d <- d[1:10, c("Test.AIC", "AIC.Standard.Deviation")]
names(d) <- c("AUC", "AUC_SD")
cbind(Species=x, d, stringsAsFactors=FALSE)
}))

#Aaron's lapply answer is good, and clean. But to debug your code: you put a bunch of data into table but overwrite table every time. You need to do
table <-cbind(table, species,AUC,AUC_SD,Variable,Resolution)
BTW, since table is a function in R, I'd avoid using it as a variable name. Imagine:
table(table)
:-)

Related

R - Creating subsets of several datasets in a loop

I have a quite big number of quite heavy datasets. I would like to extract a subset out of each of them and save it into different csv files (one for each dataset). These are the commands I would like to loop for all the files I have in the folder:
df <-read.csv("1985.csv",header=FALSE,stringsAsFactors=TRUE,sep="\t")
df_short <- df[df$V6=="OPP", ]
write.csv(df_short, file = "OPP_1985.csv",row.names=FALSE)
rm(df)
rm(df_short)
This is probably a very noob question, but I am struggling to understand how to do it, so I would appreciate a lot help with this!
EDIT:
Following #SimonShine's suggestion, I have run this code and it works!
You don't specify if you are trying to collect the subsets into one dataset, or if you are trying to make one file per subset. You refer to OPP_1985 that appears out of scope for the code you wrote. Did you mean to refer to df_short?
You could start by abstracting what you want to do with one datafile into a function, e.g.:
extract_and_save_from_dataset <- function(csvfile) {
df <- read.csv(csvfile, header=F, stringsAsFactors=T, sep="\t")
df_short <- df[df$V6 == "OPP",]
csvfile_short <- gsub(".csv", "_short.csv", csvfile)
write.csv(df_short, file=csvfile_short, row_names=F)
}
Assuming you have a collection of dataset filenames, you could apply this function multiple times:
# csvfiles <- c("OPP_1985.csv", "OPP_1986.csv", ...)
csvfiles <- list.files("/path/to/my/csvfiles")
for (csvfile in csvfiles) {
extract_and_save_from_dataset(csvfile)
}
The data.table approach is probably the fastest option, specially if you have a large dataset. The function fwrite{data.table} works in parallel using many CPUS, making it extremely fast.
Here is how you can divide your original data according to subgroups defined based on the values of df$V6 and save each subset into a separate .csv file.
library (data.table)
set(df)[, fwrite(.SD, paste0("output_", V6,".csv")), by = V6, .SDcols=names(df) ]
ps. The name of the files will be output_*.csv where * is the correspondent V6 value.

nested for loop to create histograms named according to list

I'm new to R and need to create a bunch of histograms that are named according to the population they came from. When I try running the loop without the "names" part, it works fine. The code below loops through the list of names and applies them in order, but I end up with 3,364 versions of the same exact histogram. If anyone has any suggestions, I'd really appreciate it.
popFiles <- list.files(pattern = "*.txt") # generates a list of the files I'm working with
popTables <- lapply(popFiles, read.table, header=TRUE, na.strings="NA")
popNames <- read.table(file.path("Path to file containing names", "popNamesR.txt"), header=FALSE,)
popNames <- as.matrix(popNames)
name <- NULL
table <- c(1:58)
for (table in popTables){
for (name in popNames){
pVals <- table$p
hist(pVals, breaks=20, xlab="P-val", main=name))
}
}
Try making a distinct iterator, and use that, rather than iterating over the table list itself. It's just easier to see what's going on. For example:
pdf("Myhistograms.pdf")
for(i in 1:length(popTables)){
table = popTables[[i]]
name = popNames[i]
pVals = table$p
hist(pVals, breaks=20, xlab="P-val", main=name))
}
dev.off()
In this case, your problem is that name and table are actually linked, but you have two for loops, so actually every combination of table and name are generated.

Concatenating CSV files into single data frame in R, why am I still getting list as the class when I use typeof?

Here's the zip file to the specdata directory with all the CSV files in it:
https://d396qusza40orc.cloudfront.net/rprog%2Fdata%2Fspecdata.zip
I'm trying to get all the files into a data frame so I can use complete.cases, this code creates a list of data frames but not a single data frame so I am currently getting errors when trying to use complete.cases. I looked at using merge but I can't seem to wrap my head around how to use merge inside a for loop with multiple files. I have tried implementing rbind and I think I'm close to getting it that way but I also can't seem to figure out how to use it correctly inside a for loop. I am a beginner, trying to understand for loops before I move on vectorized functions like lapply.
Here's the code:
complete<- function(directory, id=1:332){
data<-NULL
for (i in 1:length(id)) {
data[[i]]<- c(paste(directory, "/",formatC(id[i], width=3, flag=0),".csv",sep=""))
}
cases<-NULL
for (d in 1:length(data)) {
cases[[d]]<-c(read.csv(data[d]))
}
df<-NULL
for (c in 1:length(cases)){
df[[c]]<-(data.frame(cases[c]))
}
df
}
The first thing to do is remove the for loops (if you are a beginner, then just get into the apply family right off the bat, for-loops in R are sometimes easier, but the apply family is the R way).
files <- list.files()
data <- lapply(files,function(x) read.csv(x))
Then depending on whether you actually want merge or rbind (because they are not the same)
data_rbind <- do.call("rbind", data)
Or
merge.df <- Reduce(function(x, y) merge(x, y, all=T,by="your_value",sort=F), data, accumulate=F)

Create several data.frames via a for loop and name them accordingly

I want to apply a for-loop to every element of a list (station code of air quality stations) and create a single data.frame for each station with specific data.
My current code looks like this:
for (i in Stations))
{i_PM <- data.frame(PM2.5$DateTime,PM2.5$i)
colnames(i_PM)[1] <- "DateTime"
i_AOT <- subset(MOD2011, MOD2011$Station_ID==i)
i <- merge(i_PM, i_AOT, by="DateTime")}
Stations consists of 28 elements. The result should be a data.frame for every station with the colums DateTime, PM2.5 and several elements from MOD2011.
I just dont get it running as its supposed to be. Im sure its my fault, I couldnt find the specific answer via the internet.
Can you show me my mistake?
Try assign:
for (i in Stations)) {
dat <- data.frame(PM2.5$DateTime,PM2.5$i)
dat2 <- subset(MOD2011, MOD2011$Station_ID==i)
colnames(i_PM)[1] <- "DateTime"
assign(paste(i, "_PM", sep=""), dat)
assign(paste(i, "_AOT", sep=""), dat2)
assign(i, merge(dat, dat2, by="DateTime"))
}
Note, however, that this is bad coding practice. You should reconsider your algorithm. For instance, use a list instead.

Aggregate data from different files into data structure

I noticed I encounter this task quite often when programming in R, yet I don't think I implement it "pretty".
I get a list of file names, each containing a table or a simple vector. I want to read all the files into some construct (list of tables?) so I can later manipulate them in simple loops.
I know how to read each file into a table/vector, but I do not know how to put all these objects together in one structure (list?).
Anyway, I guess this is VERY routine so I'll be happy to hear about your tricks.
Do all the files have the same # of columns? If so, I think this should work to put them all into one dataframe.
library(plyr)
x <- c(FILENAMES)
df <- ldply(x, read.table, sep = "\t", header = T)
If they don't have all the same columns, then use llply() instead
Or, without plyr:
filenames <- c("file1.txt", "file2.txt", "file3.txt")
mydata <- array(list(NULL))
for (i in 1:length(filenames))
{
mydata[[i]] <- read.table(filenames[i])
}
You can have a look at my answer here: Merge several data.frames into one data.frame with a loop.

Resources