updating column values when table name is a variable

updating column values when table name is a variable - r

First question here, and very new to R as well.
I have a loop creating data frames according to a list of studytables. I can read all the CSVs fine, but I would like to get the field "Subject" and add the variable "study" before what is currently in the field. My trouble is with the 2nd "assign" line, I can't get R to assign the new value to "Subject".
Thanks for all your help.
study <- 'study10'
studytables <- list('ae', 'subject')
studypath <- 'C:/mypath/'
for(table in studytables) {
destinframe <- paste(table,study, sep='')
file <- paste(studypath, table, '.CSV', sep='' )
assign(destinframe, read.csv(file)) # create all dataframes
assign(destinframe['Subject'], rep('testing', nrow(get(destinframe))))
}

Using assign like that really isn't a great idea. And as you can see it doesn't work well when you try to add columns to a data.frame. It's better to add the columns before you do the assign. So replace
assign(destinframe, read.csv(file))
assign(destinframe['Subject'], rep('testing', nrow(get(destinframe))))
with
dd <- read.csv(file)
dd$Subject <- paste(study, dd$Subject)
assign(destinframe, dd)

Related

Binding rows of multiple data frames into one data frame in R

I have a vector of file paths called dfs, and I want create a dataframe of those files and bind them together into one huge dataframe, so I did something like this :
for (df in dfs){
clean_df <- bind_rows(as.data.table(read.delim(df, header=T, sep="|")))
return(clean_df)
}
but only the last item in the dataframe is being returned. How do I fix this?

I'm not sure about your file format, so I'll take common .csv as an example. Replace the a * i part with actually reading all the different files, instead of just generating mockup data.
files = list()
for (i in 1:10) {
a = read.csv('test.csv', header = FALSE)
a = a * i
files[[i]] = a
}
full_frame = data.frame(data.table::rbindlist(files))

The problem is that you can only pass one file at a time to the function read.delim(). So the solution would be to use a function like lapply() to read in each file specified in your df.
Here's an example, and you can find other answers to your question here.
library(tidyverse)
df <- c("file1.txt","file2.txt")
all.files <- lapply(df,function(i){read.delim(i, header=T, sep="|")})
clean_df <- bind_rows(all.files)
(clean_df)
Note that you don't need the function return(), putting the clean_df in parenthesis prompts R to print the variable.

nested for loop to create histograms named according to list

I'm new to R and need to create a bunch of histograms that are named according to the population they came from. When I try running the loop without the "names" part, it works fine. The code below loops through the list of names and applies them in order, but I end up with 3,364 versions of the same exact histogram. If anyone has any suggestions, I'd really appreciate it.
popFiles <- list.files(pattern = "*.txt") # generates a list of the files I'm working with
popTables <- lapply(popFiles, read.table, header=TRUE, na.strings="NA")
popNames <- read.table(file.path("Path to file containing names", "popNamesR.txt"), header=FALSE,)
popNames <- as.matrix(popNames)
name <- NULL
table <- c(1:58)
for (table in popTables){
for (name in popNames){
pVals <- table$p
hist(pVals, breaks=20, xlab="P-val", main=name))
}
}

Try making a distinct iterator, and use that, rather than iterating over the table list itself. It's just easier to see what's going on. For example:
pdf("Myhistograms.pdf")
for(i in 1:length(popTables)){
table = popTables[[i]]
name = popNames[i]
pVals = table$p
hist(pVals, breaks=20, xlab="P-val", main=name))
}
dev.off()
In this case, your problem is that name and table are actually linked, but you have two for loops, so actually every combination of table and name are generated.

Select multiple rows in multiple DFs with loop in R

I have read multiple questionnaire files into DFs in R. Now I want to create new DFs based on them, buit with only specific rows in them, via looping over all of them.The loop appears to work fine. However the selection of the rows does not seem to work. When I try selecting with simple squarebrackts, i get the error "incorrect number of dimensions". I tried it with subet(), but i dont seem to be able to set the subset correctly.
Here is what i have so far:
for (i in 1:length(subjectlist)) {
p[i] <- paste("path",subjectlist[i],sep="")
files <- list.files(path=p,full.names = T,include.dirs = T)
assign(paste("subject_",i,sep=""),read.csv(paste("path",subjectlist[i],".csv",sep=""),header=T,stringsAsFactors = T,row.names=NULL))
assign(paste("subject_",i,"_t",sep=""),sapply(paste("subject_",i,sep=""),[c((3:22),(44:63),(93:112),(140:159),(180:199),(227:246)),]))
}

Here's some code that tries to abstract away the details and do what it seems like you're trying to do. If you just want to read in a bunch of files and then select certain rows, I think you can avoid the assign functions and just use sapply to read all the data frames into a list. Let me know if this helps:
# Get the names of files we want to read in
files = list.files([arguments])
df.list = sapply(files, function(file) {
# Read in a csv file from the files vector
df = read.csv(file, header=TRUE, stringsAsFactors=FALSE)
# Add a column telling us the name of the csv file that the data came from
df$SourceFile = file
# Select only the rows we want
df = df[c(3:22,44:63,93:112,140:159,180:199,227:246), ]
}, simplify=FALSE)
If you now want to combine all the data frames into a single data frame, you can do the following (the SourceFile column tells you which file each row originally came from):
# Combine all the files into a single data frame
allDFs = do.call(rbind, df.list)

R extract variable from multiple dataframe in loop

I have a lot of result from parametric study to analyze. Fortunately there is an output file where the output file are saved. I need to save the name of file. I used this routine:
IndexJobs<-read.csv("C:/Users/.../File versione7.1/
"IndexJobs.csv",sep=",",header=TRUE,stringsAsFactors=FALSE)
dir<-IndexJobs$WORKDIR
Dir<-gsub("\\\\","/",dir)
Dir1<-gsub(" C","C",Dir)
Now I use e for in order to read CSV and create different dataframe
for(i in Dir1){
filepath <- file.path(paste(i,"eplusout.csv",sep=""))
dat<-NULL
dat<-read.table(filepath,header=TRUE,sep=",")
filenames <- substr(filepath,117,150)
names <-substr(filenames,1,21)
assign(names, dat)
}
Now I want to extract selected variables from each database, and putting together each variable for each database into separated database. I would also joint name of variable and single database in order to have a clear database for making some analysis. I try to make something but with bad results.
I tried to insert in for some other row:
for(i in Dir1){
filepath <- file.path(paste(i,"eplusout.csv",sep=""))
dat<-NULL
dat<-read.table(filepath,header=TRUE,sep=",")
filenames <- substr(filepath,117,150)
names <-substr(filenames,1,21)
assign(names, dat)
datTest<-dat$X5EC132.Surface.Outside.Face.Temperature..C..TimeStep.
nameTest<-paste(names,"_Test",sep="")
assign(nameTest,datTest)
DFtest=c[,nameTest]
}
But for each i there is an overwriting of DFtest and remain only the last database column.
Some suggestion?Thanks

Maybe it will work if you replace DFtest=c[,nameTest] with
DFtest[nameTest] <- get(nameTest)
or, alternatively,
DFtest[nameTest] <- datTest
This procedure assumes the object DFtest exists before you run the loop.
An alternative way is to create an empty list before running the loop:
DFtest <- list()
In the loop, you can use the following command:
DFtest[[nameTest]] <- datTest
After the loop, all values in the list DFtest can be combined using
do.call("cbind", DFtest)
Note that this will only work if all vectors in the list DFtesthave the same length.

merge tables in Loop using R

I have a simple question regarding a loop that I wrote. I want to access different files in different directories and extract data from these files and combine into one table. My problem is that my loop is not adding the results of the different files but only updating with the species that is currently in the loop. Here it is my code:
for(i in 1:length(splist.par))
{
results<-read.csv(paste(getwd(),"/ResultsR10arcabiotic/",splist.par[i],"/","maxentResults.csv",sep=""),h=T)
species <- splist.par[i]
AUC <- results$Test.AUC[1:10]
AUC_SD <- results$AUC.Standard.Deviation[1:10]
Variable <- "a"
Resolution <- "10arc"
table <-cbind(species,AUC,AUC_SD,Variable,Resolution)
}
This is probably an easy question but I am not an experienced programmer. Thanks for the attention
Gabriel

I'd use lapply to get the desired data from each file and add the Species information, and then combine with rbind. Something like this (untested):
do.call(rbind, lapply(splist.par, function(x) {
d <- read.csv(file.path("ResultsR10arcabiotic", x, "maxentResults.csv"))
d <- d[1:10, c("Test.AIC", "AIC.Standard.Deviation")]
names(d) <- c("AUC", "AUC_SD")
cbind(Species=x, d, stringsAsFactors=FALSE)
}))

#Aaron's lapply answer is good, and clean. But to debug your code: you put a bunch of data into table but overwrite table every time. You need to do
table <-cbind(table, species,AUC,AUC_SD,Variable,Resolution)
BTW, since table is a function in R, I'd avoid using it as a variable name. Imagine:
table(table)
:-)

Develop Reference

r css asp.net wordpress firebase qt symfony nginx http apache-flex

updating column values when table name is a variable - r

Related

Binding rows of multiple data frames into one data frame in R

nested for loop to create histograms named according to list

Select multiple rows in multiple DFs with loop in R

R extract variable from multiple dataframe in loop

merge tables in Loop using R

Categories

Resources