Write all variable values to a table - r

I'm a total R newbie and have a simple question that may seem silly, but I could not find an answer even after searching for 4 hours. I might be missing the concept.
I am writing a Monte Carlo script with a lot of variables stored in different environments. At the end of every iteration I want to write all variables (the ones listed when typing ls()) to a table.
This would be a working example (without the item I ask for) of what I want to do. (Thank you for your help so far, it helped me to build that example!)
#input data (data will be manipulated for mc later on)
ha<-5
w_eff<-1.9
v_T1<-8
n<-1000 #number of iterations
#function
T1_func <- function(ha_mc, w_eff_mc, v_T1_mc){
T1_result <- (ha_mc*10)/(w_eff_mc*v_T1_mc) #use the arguments, not the global inputs
return(T1_result)
}
for(i in 1:n){ #number of iterations
#MC manipulation (illustrative)
ha_mc<-rnorm(1, ha, sd=1)
w_eff_mc<-rnorm(1, w_eff, sd=1)
v_T1_mc<-rnorm(1, v_T1, sd=1)
#calculation
T1_mc<-T1_func(ha_mc, w_eff_mc, v_T1_mc)
#now I want to write all variables to a table
df<-data.frame(ha, w_eff, v_T1, ha_mc, w_eff_mc, v_T1_mc, T1_mc)
write.table(df, file = "result.txt", append = TRUE, quote = TRUE, sep = " ",
eol = "\n", na = "NA", dec = ".", row.names = FALSE,
col.names = !file.exists("result.txt"), qmethod = c("escape", "double"))
}
My question would be: how do I get that:
df<-data.frame(ha, w_eff, v_T1, ha_mc, w_eff_mc, v_T1_mc, T1_mc)
without writing down all the variables (ha, w_eff, v_T1, ha_mc, w_eff_mc, v_T1_mc, T1_mc) but with something like "ls()". And how do I get that for the variables in the different environments so that I will have a column named "my.env$w_eff".
Thank you very much!

I would suggest not using ls() and instead making a data.frame that contains the variables you want to store. Here I first create the file "result.txt" with the correct column headers (I'm storing the values of a, b, and c), and then in each iteration I append the corresponding values to the file. Hope this helps:
n <- 10L
write.table(data.frame("a", "b", "c"), file = "result.txt",
col.names = FALSE, row.names = FALSE)
for (i in seq_len(n)) {
#do MC
a <- rnorm(1L)
b <- exp(a)
c <- a + b
write.table(data.frame(a, b, c), file = "result.txt",
append = TRUE, row.names = i, col.names = FALSE)
}

That's the solution I found with your help, thanks!
write_table_func <- function(env_name, file_part_name, dir_name){
#write input to table
df_input<-data.frame(as.list(get(env_name), all.names=TRUE))
sort.df_input <- df_input[,order(names(df_input))]
filename<-(paste(sep="", dir_name, "/", "tabl_", process_n, "_", process_step_n, file_part_name, ".txt"));
suppressWarnings(write.table(sort.df_input, file = paste(filename), append = TRUE, quote = TRUE, sep = " ",
eol = "\n", na = "NA", dec = ".", row.names = FALSE,
col.names = !file.exists(paste(filename)), qmethod = c("escape", "double")));
rm(df_input);
rm(sort.df_input);
}
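For the part of the question about using something like ls() and getting columns named like "my.env$w_eff", a minimal sketch follows. It assumes every variable in the environment is a single value; the environment my.env, its contents, and the helper env_to_row with its label argument are hypothetical names made up for illustration, not part of the original script.

```r
# Hypothetical environment standing in for one of the Monte Carlo environments.
my.env <- new.env()
assign("w_eff", 1.9, envir = my.env)
assign("ha", 5, envir = my.env)

# Collect every (scalar) variable of an environment into a one-row data frame.
# `label` is an illustrative argument used to prefix the column names.
env_to_row <- function(env, label) {
  vars <- mget(ls(env), envir = env)            # named list, sorted by name
  row <- data.frame(vars, check.names = FALSE)  # one column per variable
  names(row) <- paste0(label, "$", names(vars)) # e.g. "my.env$w_eff"
  row
}

row <- env_to_row(my.env, "my.env")
print(row)
```

Rows built this way for several environments could be joined with cbind() before being passed to write.table(), as in the function above.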

Related

Are there functions in R that can output all the values in a row, of a data frame, if I have a specific ID from that row

I have two datasets containing a lot of data. I originally wanted to compile a list of rsIDs that were common between two data sets, so I wrote the following:
file1 <- read.csv("CKD.csv", header = TRUE, sep = ",")
file2 <- read.csv("eGFR.csv", header = TRUE, sep = ",")
write.table(file1$rsID.[match(file1$rsID., file2$rsID., nomatch = NA, incomparables = TRUE)],
"rs_matches_Raw.csv", sep = ",", row.names = FALSE, col.names = c("rsID.")
)
x <- read.csv("rs_matches_Raw.csv",header = TRUE, sep = ",")
write.table(na.omit(x), "rs_matches_final", sep = ",", row.names = FALSE, col.names = c("rsID."))
It did what I wanted it to do. Now I want some additional information; for example the chromosomal location. Is there a way that I can use my above result and apply it to the data-set to get the rest of the information?
For example: suppose that rs1, rs2, and rs3 are in both files.
x<- c("rs1", "rs2", "rs3")
f(x) = variantIDrs1, variantIDrs2, variantIDrs3
And ideally get more information than just this, but this is just an example.
I have tried using match.data.frame from EcFun and an inner join from plyr. Thanks.
I was able to get this to work using the select and filter functions from the dplyr package.
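The answer above used dplyr. As a rough sketch of the same idea in base R (swapped in here only to avoid a package dependency), the rows for the shared rsIDs can be pulled out with %in% or merge(). The toy data frames and the chr and beta columns are assumptions standing in for the real CKD.csv and eGFR.csv contents.

```r
# Toy stand-ins for CKD.csv and eGFR.csv; `chr` and `beta` are hypothetical columns.
file1 <- data.frame(rsID. = c("rs1", "rs2", "rs3", "rs9"),
                    chr = c(1, 4, 7, 12), stringsAsFactors = FALSE)
file2 <- data.frame(rsID. = c("rs3", "rs1", "rs2", "rs5"),
                    beta = c(0.1, 0.2, 0.3, 0.4), stringsAsFactors = FALSE)

# Keep the full rows of file1 whose rsID also occurs in file2.
common1 <- file1[file1$rsID. %in% file2$rsID., ]

# Or combine the columns of both files for the shared rsIDs (inner join).
both <- merge(file1, file2, by = "rsID.")

print(common1)
print(both)
```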

write results sequentially in a loop in r

I have a bunch of single files to which I need to apply a test. I need a way to automatically write the results for each file into one file. Here is what I do:
library(ape)
stud_files <- list.files("path/dir/data",full.names = T)
for (f in stud_files) {
df <- read.table(f, header=TRUE, sep=";")
df_xts <- as.xts(df$cola, order.by = as.Date(df$colb,"%m/%d/%Y"))
pet <- testa(df_xts)
res <- data.frame(estimate = pet$estimate,
p.value=pet$p.value,
logi = pet$alternative)
write.dna(res,file = "res_testa.xls",format = "sequential")
}
This loop works well, except for the last command, which is meant to write the results of each file consecutively: it saves only the last result. The results are also saved as a string, not as the table (data.frame) I defined above. Any ideas? Thanks in advance.
Check help(write.dna).
write.dna(x, file, format = "interleaved", append = FALSE,
nbcol = 6, colsep = " ", colw = 10, indent = NULL,
blocksep = 1)
append a logical, if TRUE the data are appended to the file without
erasing the data possibly existing in the file, otherwise the file (if
it exists) is overwritten (FALSE the default).
Set append = TRUE and you should be all set.
As some of the comments point out, however, you are probably better off generating your table, and then writing it all at once to a file. Unless you have billions of files, you likely won't run out of memory.
Here is how I would approach this.
library(ape)
library(xts) #provides as.xts()
library(data.table)
stud_files <- list.files("path/dir/data", full.names = TRUE)
sumfunc <- function(f) {
df <- read.table(f, header=TRUE, sep=";")
df_xts <- as.xts(df$cola, order.by = as.Date(df$colb,"%m/%d/%Y"))
pet <- testa(df_xts)
res <- data.table(estimate = pet$estimate,
p.value=pet$p.value,
logi = pet$alternative)
return(res)
}
lres <- lapply(stud_files, sumfunc)
dat <- rbindlist(lres)
write.table(dat,
file = "res_testa.csv",
sep = ",",
quote = FALSE,
row.names = FALSE)

Difficulty exporting data from R with write.csv and append

I would be delighted and most grateful if anyone can explain to me why I am having a problem exporting some data from a function which extracts coefficients from a linear model. I have hundreds to do so I’m hoping to build a loop to handle it but have fallen at an earlier hurdle.
I am using methods borrowed from someone much smarter at this stuff:
https://stat.ethz.ch/pipermail/r-sig-ecology/2008-May/000062.html
The relevant bits (data creation, the function and finally my attempt to export my data are below) but firstly I will mention that the data, “export”, is exported to the experiment.csv file as a single COLUMN. I am told that the Append property of the write.table function only works with rows. Consequently it overwrites previous runs of the same sets of commands rather than successfully appending it.
The error messages are of the form below: (they are all the same, one for each piece of information).
Warning messages:
1: In write.csv(export, file = "experiment.csv", append = TRUE, quote = TRUE, :
attempt to set 'append' ignored
#DATA CREATION
# create an empty list
mod <- list()
# start a loop for create 5 objects of class 'lm'
for (i in 1:5) {
x <- rnorm(i*10)
y <- rnorm(i*10)
mod[[paste("run",i,sep="")]] <- lm(y ~ x)
}
# FUNCTION TO EXTRACT DATA
myFun <-
function(lm)
{
out <- c(lm$coefficients[1],
lm$coefficients[2],
length(lm$model$y),
summary(lm)$coefficients[2,2],
pf(summary(lm)$fstatistic[1], summary(lm)$fstatistic[2],
summary(lm)$fstatistic[3], lower.tail = FALSE),
summary(lm)$r.squared)
names(out) <- c("intercept","slope","n","slope.SE","p.value","r.squared")
return(out)}
# FAILED ATTEMPT TO EXPORT
export <-myFun(mod$run1)
write.csv(export, file = "experiment.csv", append = TRUE, quote = TRUE, sep = " ",
eol = "\n", na = "NA", dec = ".", row.names = FALSE,
col.names = FALSE, qmethod = c("escape", "double"),
fileEncoding = "")
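The warning arises because write.csv is a deliberately inflexible wrapper around write.table: attempts to change append, col.names, sep, dec or qmethod are ignored with a warning. A possible workaround, sketched below, is to call write.table directly and to transpose the named vector so that each run becomes one row. The toy export vector and the temporary file path are assumptions used only to keep the sketch self-contained.

```r
# Same kind of object as myFun() returns: a named numeric vector (toy values).
export <- c(intercept = 0.1, slope = 0.5, n = 10)

out_file <- tempfile(fileext = ".csv")  # temporary path used only for this sketch

for (run in 1:2) {
  row <- as.data.frame(t(export))       # 1 x 3 data frame; names become columns
  first <- !file.exists(out_file)       # TRUE only before the first write
  write.table(row, file = out_file, sep = ",", row.names = FALSE,
              col.names = first,        # write the header once
              append = !first)          # append on every later write
}

result <- read.csv(out_file)
print(result)
```

In a loop over many models, row would be built from myFun(mod[[i]]) instead of the fixed vector.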

R: Dynamically create a variable name

I'm looking to create multiple data frames using a for loop and then stitch them together with merge().
I'm able to create my data frames using assign(paste(), blah). But then, in the same for loop, I need to delete the first column of each of these data frames.
Here's the relevant bits of my code:
for (j in 1:3)
{
#This is to create each data frame
#This works
assign(paste(platform, j, "df", sep = "_"), read.csv(file = paste(masterfilename, extension, sep = "."), header = FALSE, skip = 1, nrows = 100))
#This is to delete first column
#This does not work
assign(paste(platform, j, "df$V1", sep = "_"), NULL)
}
In the first situation I'm assigning my variables to a data frame, so they inherit that type. But in the second situation, I'm assigning it to NULL.
Does anyone have any suggestions on how I can work this out? Also, is there a more elegant solution than assign(), which seems to bog down my code? Thanks,
n.i.
assign can be used to build variable names, but "name$V1" isn't a variable name. The $ is an operator in R, so you're trying to build a function call, and you can't do that with assign. In fact, in this case it's best to avoid assign completely. You don't need to create a bunch of different variables. If your data.frames are related, just keep them in a list.
mydfs <- lapply(1:3, function(j) {
df<- read.csv(file = paste(masterfilename, extension, sep = "."),
header = FALSE, skip = 1, nrows = 100)
df$V1<-NULL
df
})
Now you can access them with mydfs[[1]], mydfs[[2]], etc. And you can run functions over all data sets with any of the *apply family of functions.
As @joran pointed out in his comment, the proper way of doing this would be using a list. But if you want to stick to assign you can replace your second statement with
assign(paste(platform, j, "df", sep = "_"),
get(paste(platform, j, "df", sep = "_"))[
2:length(get(paste(platform, j, "df", sep = "_")))])
If you wanted to use a list instead, your code to read the data frames would look like
dfs <- replicate(3,
read.csv(file = paste(masterfilename, extension, sep = "."),
header = FALSE, skip = 1, nrows = 100), simplify = FALSE)
Note you can use replicate because your call to read.csv does not depend on j in the loop. Then you can remove the first column of each
dfs <- lapply(dfs, function(d) d[-1])
Or, combining everything in one command
dfs <- replicate(3,
read.csv(file = paste(masterfilename, extension, sep = "."),
header = FALSE, skip = 1, nrows = 100)[-1], simplify = FALSE)

With R, loop over two files at a time

Hello my favourite coding experts,
I am trying to loop through two files at a time in R: i.e. take one 'case' file and another 'control' file, create a graph and dump it into a pdf, then take another set of 2 files and do the same and so on. I have a list indicating which file is a case and which is a control, like this:
case control
A01 G01
A02 G02
A06 G03
and so on… which can be reproduced like this:
mylist<- data.frame(rbind(c("A01","G01"),c("A02","G02"),c("A06","G03")))
colnames(mylist)<- c('case', 'control')
I cannot find a way to specify which 2 files to loop through each time.
The files (each file with many variables) are: "/Users/francy/Desktop/cc_files_A01", "/Users/francy/Desktop/cc_files_A02", "/Users/francy/Desktop/cc_files_A06", "/Users/francy/Desktop/cc_files_G01", "/Users/francy/Desktop/cc_files_G02", "/Users/francy/Desktop/cc_files_G03"
For each set of case and control, I would like to do this:
case<- read.table(file="/Users/francy/Desktop/case_files_A01.txt", sep = '\t', header = F)
case <- case[,c(1,2,19,20)]
colnames(case)<- c("ID", "fname", "lname", "Position")
control<- read.table(file="/Users/francy/Desktop/case_files_G01.txt", sep = '\t', header = F)
control <- control[,c(1,2,19,20)]
colnames(control)<- c("ID", "fname", "lname", "Position")
#t-test Position (after the subsetting above, Position is the 4th column, so refer to it by name):
test<- t.test(case$Position, control$Position)
p.value= round(test$p.value, digits=3)
mean_case= round(mean(case$Position, na.rm=T), digits=2)
mean_control= round(mean(control$Position, na.rm=T), digits=2)
boxplot(list(case$Position, control$Position), names=c(paste("case", "mean", mean_case, sep=":"),paste("control", "mean", mean_control, sep=":")))
And want to create a pdf file with all the boxplots.
This is what I have for now:
myFiles <- list.files(path= "/mypath/", pattern=".txt")
pdf('/home/graph.pdf')
for (x in myFiles) {
control <- read.table(file = myFiles[x], sep = '\t', header = F)
## How do I specify that is the other file here, and which file it is?
case <- read.table(file = myFiles[x], sep = '\t', header = F)
}
Any help is very appreciated. Thank you!
Why not just pass the pairs of files to the loops via a list?
files <- list(
c("fileA","fileB"),
c("fileC","fileD")
)
for( f in files ) {
cat("~~~~~~~~\n")
cat("f[1] is",f[1],"~ f[2] is",f[2],"\n")
}
The first time the loop runs, f contains the 1st element of the list files. Since the first element is a character vector of length two, f[1] contains the first file name of the pair, and f[2] contains the second. See the printed output of the above code, which should hopefully make it clear.
What probably makes more sense in this case, is building up the two filenames from your "list" (a data.frame?) of cases and controls.
If this "list" is present in a data.frame lcc, you could do something like:
for(i in seq_len(nrow(lcc)))
{
currentcase<-lcc$case[i]
currentcontrol<-lcc$control[i]
#paste0 avoids the spaces that paste() would insert into the file name
currentcasefilename<-paste0("someprefix_", currentcase, "_somepostfix.txt")
currentcontrolfilename<-paste0("someprefix_", currentcontrol, "_somepostfix.txt")
#now open and process both files...
}
Assuming your list of cases and controls is in an R object (dataframe or matrix) called mylist:
for (x in seq_len(nrow(mylist))) {
case <- read.table(file = paste("/my/path/", mylist[x, "case"], ".txt", sep = ""),
sep = "\t", header = F)
control <- read.table(file = paste("/my/path/", mylist[x, "control"], ".txt", sep = ""),
sep = "\t", header = F)
## your code here ##
}
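To tie the pieces together, here is a small self-contained sketch of looping over case/control pairs and writing one boxplot per pair into a single PDF. Everything about the data is an assumption made for illustration: the files are generated in a temporary directory, each holds a single numeric column, and the naming scheme "<ID>.txt" is made up.

```r
# Toy set-up: write small tab-separated files to a temporary directory.
# The directory, file naming scheme and single-column layout are assumptions.
tmp_dir <- tempfile("cc_")
dir.create(tmp_dir)
mylist <- data.frame(case = c("A01", "A02"), control = c("G01", "G02"),
                     stringsAsFactors = FALSE)
set.seed(1)
for (id in unlist(mylist)) {
  write.table(data.frame(rnorm(20)), file.path(tmp_dir, paste0(id, ".txt")),
              sep = "\t", row.names = FALSE, col.names = FALSE)
}

pdf(file.path(tmp_dir, "graph.pdf"))   # one PDF, one page per pair
p_values <- numeric(nrow(mylist))
for (i in seq_len(nrow(mylist))) {
  case <- read.table(file.path(tmp_dir, paste0(mylist$case[i], ".txt")),
                     sep = "\t", header = FALSE)
  control <- read.table(file.path(tmp_dir, paste0(mylist$control[i], ".txt")),
                        sep = "\t", header = FALSE)
  p_values[i] <- t.test(case$V1, control$V1)$p.value
  boxplot(list(case = case$V1, control = control$V1),
          main = paste(mylist$case[i], "vs", mylist$control[i]))
}
dev.off()                              # close the PDF so it is written to disk
```

Opening the device once before the loop and closing it once afterwards is what puts all plots into the same file.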
