Difficulty exporting data from R with write.csv and append - r

I would be delighted and most grateful if anyone can explain to me why I am having a problem exporting some data from a function which extracts coefficients from a linear model. I have hundreds to do so I’m hoping to build a loop to handle it but have fallen at an earlier hurdle.
I am using methods borrowed from someone much smarter at this stuff:
https://stat.ethz.ch/pipermail/r-sig-ecology/2008-May/000062.html
The relevant bits (data creation, the function and, finally, my attempt to export the data) are below. First, though, I should mention that the data, “export”, is written to the experiment.csv file as a single COLUMN. I am told that the append argument of write.table only works with rows. As a result, each run overwrites the output of the previous run of the same commands instead of appending to it.
The error messages are of the form below: (they are all the same, one for each piece of information).
Warning messages:
1: In write.csv(export, file = "experiment.csv", append = TRUE, quote = TRUE, :
attempt to set 'append' ignored
# DATA CREATION
# create an empty list
mod <- list()
# start a loop to create 5 objects of class 'lm'
for (i in 1:5) {
  x <- rnorm(i*10)
  y <- rnorm(i*10)
  mod[[paste("run",i,sep="")]] <- lm(y ~ x)
}
# FUNCTION TO EXTRACT DATA
myFun <-
function(lm)
{
  out <- c(lm$coefficients[1],
           lm$coefficients[2],
           length(lm$model$y),
           summary(lm)$coefficients[2,2],
           pf(summary(lm)$fstatistic[1], summary(lm)$fstatistic[2],
              summary(lm)$fstatistic[3], lower.tail = FALSE),
           summary(lm)$r.squared)
  names(out) <- c("intercept","slope","n","slope.SE","p.value","r.squared")
  return(out)
}
# FAILED ATTEMPT TO EXPORT
export <- myFun(mod$run1)
write.csv(export, file = "experiment.csv", append = TRUE, quote = TRUE, sep = " ",
          eol = "\n", na = "NA", dec = ".", row.names = FALSE,
          col.names = FALSE, qmethod = c("escape", "double"),
          fileEncoding = "")
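A minimal sketch of one possible way around the warning, assuming the goal is one row per model in a single file. write.csv() is documented to ignore attempts to change append, col.names, sep, dec or qmethod, whereas write.table() honours them. The named vector is turned into a one-row data frame first so that the six values land in six columns rather than one. The names extract_stats and append_row and the temporary output file are illustrative assumptions, not part of the original post.

```r
# Sketch: append one row of coefficients per model with write.table().
# `extract_stats`, `append_row` and the temporary file are illustrative names.
set.seed(1)
mod <- list()
for (i in 1:5) {
  x <- rnorm(i * 10)
  y <- rnorm(i * 10)
  mod[[paste0("run", i)]] <- lm(y ~ x)
}

# Compact restatement of the extraction function from the question
extract_stats <- function(fit) {
  s <- summary(fit)
  f <- s$fstatistic
  c(intercept = unname(coef(fit)[1]),
    slope     = unname(coef(fit)[2]),
    n         = nobs(fit),
    slope.SE  = s$coefficients[2, 2],
    p.value   = unname(pf(f[1], f[2], f[3], lower.tail = FALSE)),
    r.squared = s$r.squared)
}

append_row <- function(stats, file) {
  first <- !file.exists(file)              # decide before anything is written
  row <- as.data.frame(t(stats))           # named vector -> one-row data frame
  write.table(row, file = file, sep = ",",
              append = !first,             # append only after the first write
              col.names = first,           # header only on the first write
              row.names = FALSE)
}

out_file <- tempfile(fileext = ".csv")
for (fit in mod) append_row(extract_stats(fit), out_file)
result <- read.csv(out_file)
result
```

Running the loop again on the same file would add five more rows below the existing ones instead of overwriting them.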

Related

How to Create Table from Irregular Length Element in R

I'm new to R and am looking for some digestible guidance. I want to create a data.frame so that I can set up columns and variables for my data. I start by reading a URL into R and saving the result to Excel:
data <- read.delim("http://weather.uwyo.edu/cgi-bin/wyowx.fcgi?TYPE=sflist&DATE=20170303&HOUR=current&UNITS=M&STATION=wmkd",
                   fill = TRUE, header = TRUE, sep = "\t", stringsAsFactors = TRUE,
                   na.strings = " ", strip.white = TRUE, nrows = 27, skip = 9)
write.xlsx(data, "E:/Self Tutorial/R/data.xls")
The data have missing values in the middle of some rows, which makes the row lengths irregular. Because of the irregular lengths I used write.table instead of data.frame.
On my first attempt, the object shows up in the global environment under Values (as NULL), not under Data:
dat.table = write.table(data)
str(dat.table) # just checking; result is NULL?
Trying again:
dat.table = write.table(data,"E:/Self Tutorial/R/data.xls", sep = "\t", quote = FALSE)
dat.table ## prints nothing
Removing sep =:
dat.table = write.table(data,"E:/Self Tutorial/R/data.xls", quote = FALSE)
dat.table ## prints nothing
Since that did not work, I tried read.table:
dat.read <- read.table("E:/Self Tutorial/R/data.xls", header = T, sep = "\t")
The data loaded in the R console but, as expected, with an irregular column distribution (even though I had already used na.strings = " " and strip.white = TRUE in the read.delim call).
What should I understand from this mistake, and which mistake is it? Thank you.
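One point that may explain the NULL: write.table() is called for its side effect of writing a file and returns NULL invisibly, so assigning its result never yields a data frame. A file written this way is also plain text, whatever its extension. The sketch below illustrates this with a small made-up data frame (obs is a stand-in, not the actual weather data) and a temporary file.

```r
# Sketch with made-up data; `obs` stands in for the downloaded table.
obs <- data.frame(station = c("WMKD", "WMKD", "WMKD"),
                  temp    = c(27.1, NA, 26.4),
                  wind    = c(5, 7, NA))

out_file <- tempfile(fileext = ".txt")

# write.table() writes the file and returns NULL (invisibly)
ret <- write.table(obs, out_file, sep = "\t", quote = FALSE, row.names = FALSE)
is.null(ret)

# The data frame itself is still `obs`; reading the file back gives a copy.
# Missing values were written as "NA", which read.table() treats as NA by default.
back <- read.table(out_file, header = TRUE, sep = "\t")
str(back)
```

Under these assumptions the object to keep working with is the data frame returned by the read function, not the result of the write call.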

R find maxima of multiple variables from multiple .CSV files

I have multiple csv's, each containing multiple observations for one participant on several variables. Let's say each csv file looks something like the below, and the name of the file indicates the participant's ID:
data.frame(
happy = sample(1:20, 10),
sad = sample(1:20, 10),
angry = sample(1:20, 10)
)
I found some code in an excellent stackoverflow answer that allows me to access all files saved into a specific folder, calculate the sums of these emotions, and output them into a file:
# access all csv files in the working directory
fileNames <- Sys.glob("*.csv")
for (fileName in fileNames) {
  # read original data:
  sample <- read.csv(fileName,
                     header = TRUE,
                     sep = ",")
  # create new data based on contents of original file:
  data.summary <- data.frame(
    File = fileName,
    happy.sum = sum(sample$happy),
    sad.sum = sum(sample$sad),
    angry.sum = sum(sample$angry))
  # write new data to separate file:
  write.table(data.summary,
              "sample-allSamples.csv",
              append = TRUE,
              sep = ",",
              row.names = FALSE,
              col.names = FALSE)
}
However, I can ONLY get "sum" to work in this function. I would like to not only find the sums of each emotion for each participant, but also the maximum value of each.
When I try to modify the above:
for (fileName in fileNames) {
  # read original data:
  sample <- read.csv(fileName,
                     header = TRUE,
                     sep = ",")
  # create new data based on contents of original file:
  data.summary <- data.frame(
    File = fileName,
    happy.sum = sum(sample$happy),
    happy.max = max(sample$happy),
    sad.sum = sum(sample$sad),
    angry.sum = sum(sample$angry))
  # write new data to separate file:
  write.table(data.summary,
              "sample-allSamples.csv",
              append = TRUE,
              sep = ",",
              row.names = FALSE,
              col.names = FALSE)
}
I get the following warning message:
In max(sample$happy) : no non-missing arguments to max; returning -Inf
Would sincerely appreciate any advice anyone can give me!
Using your test data, the max() statement works fine for me. Is the warning related to a discrepancy between the sample code you have posted and your actual csv file structure?
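To illustrate the kind of discrepancy meant here, a small sketch with made-up data: if a column name in a file differs from the one used in the code (the capital H below is a hypothetical example), the column is NULL. sum() of NULL quietly returns 0, which may be why only sum appeared to work, while max() of NULL warns and returns -Inf. The helper summarise_file is an illustrative assumption that stops early when an expected column is absent.

```r
# Sketch: max() on a column that does not exist reproduces the warning.
scores <- data.frame(happy = c(3, 18, 7), sad = c(1, 12, 9), angry = c(20, 4, 6))

# A misspelled or missing column is NULL, so max() has nothing to work with
res <- withCallingHandlers(
  max(scores$Happy),                       # capital H: no such column
  warning = function(w) {
    message("caught: ", conditionMessage(w))
    invokeRestart("muffleWarning")
  }
)
res                                        # -Inf
sum(scores$Happy)                          # 0, with no warning at all

# Hypothetical helper: fail early if an expected column is absent
summarise_file <- function(df, cols = c("happy", "sad", "angry")) {
  missing_cols <- setdiff(cols, names(df))
  if (length(missing_cols) > 0) {
    stop("missing column(s): ", paste(missing_cols, collapse = ", "))
  }
  sums <- vapply(df[cols], sum, numeric(1), na.rm = TRUE)
  maxs <- vapply(df[cols], max, numeric(1), na.rm = TRUE)
  names(sums) <- paste0(cols, ".sum")
  names(maxs) <- paste0(cols, ".max")
  as.data.frame(t(c(sums, maxs)))          # one row, one column per statistic
}

summarise_file(scores)
```

Checking names(sample) right after read.csv() for one of the real files would show whether the headers match what the loop expects.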

Write all variable values to a table

I'm a total R newbie and I have a simple question which might be hilarious, but I could not find an answer even though I searched for 4 hours. I might be missing the concept.
I am writing a Monte Carlo script with a lot of variables stored in different environments. At the end of every iteration I want to write all variables (the ones listed when typing ls()) to a table.
This is a working example (without the item I am asking for) of what I want to do. (Thank you for your help so far, it helped me to build this example!)
# input data (data will be manipulated for mc later on)
ha <- 5
w_eff <- 1.9
v_T1 <- 8
n <- 1000 # number of iterations
# function
T1_func <- function(ha_mc, w_eff_mc, v_T1_mc){
  # use the arguments passed in, not the global input values
  T1_result <- (ha_mc*10)/(w_eff_mc*v_T1_mc)
  return(T1_result)
}
for(i in 1:n){ # number of iterations
  # MC manipulation (illustrative)
  ha_mc <- rnorm(1, ha, sd=1)
  w_eff_mc <- rnorm(1, w_eff, sd=1)
  v_T1_mc <- rnorm(1, v_T1, sd=1)
  # calculation
  T1_mc <- T1_func(ha_mc, w_eff_mc, v_T1_mc)
  # now I want to write all variables to a table
  df <- data.frame(ha, w_eff, v_T1, ha_mc, w_eff_mc, v_T1_mc, T1_mc)
  write.table(df, file = "result.txt", append = TRUE, quote = TRUE, sep = " ",
              eol = "\n", na = "NA", dec = ".", row.names = FALSE,
              col.names = !file.exists("result.txt"), qmethod = c("escape", "double"))
}
My question would be: how do I get that:
df<-data.frame(ha, w_eff, v_T1, ha_mc, w_eff_mc, v_T1_mc, T1_mc)
without writing down all the variables (ha, w_eff, v_T1, ha_mc, w_eff_mc, v_T1_mc, T1_mc) but with something like "ls()". And how do I get that for the variables in the different environments so that I will have a column named "my.env$w_eff".
Thank you very much!
I would suggest not using ls() and instead making a data.frame which contains the variables you want to store. Here I first create the file "result.txt" with the correct column headers (I'm storing values of a, b, and c), and then in each iteration I append the corresponding values to the file. Hope this helps:
n <- 10L
write.table(data.frame("a", "b", "c"), file = "result.txt",
            col.names = FALSE, row.names = FALSE)
for (i in seq_len(n)) {
  # do MC
  a <- rnorm(1L)
  b <- exp(a)
  c <- a + b
  write.table(data.frame(a, b, c), file = "result.txt",
              append = TRUE, row.names = i, col.names = FALSE)
}
That's the solution I found with your help, thanks!
write_table_func <- function(env_name, file_part_name, dir_name){
  # write input to table
  # (process_n and process_step_n are looked up outside the function)
  df_input <- data.frame(as.list(get(env_name), all.names=TRUE))
  sort.df_input <- df_input[,order(names(df_input))]
  filename <- paste(sep="", dir_name, "/", "tabl_", process_n, "_", process_step_n, file_part_name, ".txt")
  suppressWarnings(write.table(sort.df_input, file = filename, append = TRUE, quote = TRUE, sep = " ",
                               eol = "\n", na = "NA", dec = ".", row.names = FALSE,
                               col.names = !file.exists(filename), qmethod = c("escape", "double")))
  rm(df_input)
  rm(sort.df_input)
}
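For the second half of the question (columns named like "my.env$w_eff"), a minimal sketch of the same idea, assuming every variable in the environment is a single atomic value. The helper env_to_row and its prefix argument are hypothetical names introduced only for this illustration.

```r
# Sketch: collect every variable in an environment into a one-row data frame.
# `env_to_row` is a hypothetical helper; it assumes each variable is a
# length-one atomic value (numeric, character or logical).
env_to_row <- function(env, prefix = NULL) {
  vals <- mget(ls(env, all.names = TRUE), envir = env)
  vals <- vals[order(names(vals))]         # stable column order across calls
  row <- as.data.frame(vals)
  if (!is.null(prefix)) {
    # set names afterwards so the "$" is not altered by name checking
    names(row) <- paste0(prefix, "$", names(vals))
  }
  row
}

my.env <- new.env()
assign("w_eff", 1.9, envir = my.env)
assign("ha", 5, envir = my.env)

row <- env_to_row(my.env, prefix = "my.env")
names(row)
```

Rows built this way for several environments could be combined with cbind() before a single write.table() call per iteration.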

write function help in R

> MLest <- arima(X, order = c(1,0,0), method = c("ML"))
> MLest
Call:
arima(x = X, order = c(1, 0, 0), method = c("ML"))
Coefficients:
         ar1  intercept
      0.2657    -0.0824
s.e.  0.0680     0.1018
sigma^2 estimated as 1.121:  log likelihood = -295.23,  aic = 596.47
I would like to write the 0.2657 and 1.121 results to an output file. I have defined a path and file name, and here is my code.
When I use write(MLest, file=filename, append=TRUE, sep="\t") I get the following error:
Error in cat(list(...), file, sep, fill, labels, append) :
argument 1 (type 'list') cannot be handled by 'cat'
When I use write.table(MLest[1:2], file=filename,sep=" ", col.names = F, row.names = F)
it works, but I get:
0.265705946688229 1.12087092992291
-0.0823543583874666 1.12087092992291
I would like to have a result as:
0.265705946688229 -0.0823543583874666 1.12087092992291 (each value to different columns)
What should I use?
write.table is a bit of an overkill for writing a single line in a file. I'd recommend you use cat directly on a vector. As you can see from the error message, that's what write uses under the hood. This works:
cat(with(MLest, c(coef, sigma2)), "\n", sep = "\t",
file = filename, append = TRUE)
But I will point out: every time you run this command, a file handle is created, moved to the end of the file, a new line is written, then the filehandle is closed. It is quite inefficient and you'd better open a file connection instead:
fh <- file(filename, open = "a")  # open a connection in append mode
for (...) {
  MLest <- arima(...)
  cat(with(MLest, c(coef, sigma2)), "\n", sep = "\t", file = fh)
}
close(fh)
This way, only one filehandle is created and it always points to the end of your file.
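A self-contained sketch of the connection approach, under stated assumptions: the series are simulated with arima.sim(), three models are fitted, and the output goes to a temporary file. None of these details come from the question; they only make the example runnable.

```r
# Sketch: keep one connection open while writing many lines.
# The simulated series and the temporary file are illustrative assumptions.
set.seed(42)
out_file <- tempfile(fileext = ".txt")

con <- file(out_file, open = "a")          # "a" = append mode; "w" would overwrite
for (k in 1:3) {
  X <- arima.sim(model = list(ar = 0.3), n = 200)
  fit <- arima(X, order = c(1, 0, 0), method = "ML")
  # ar1, intercept and sigma^2 separated by tabs, one model per line
  cat(c(coef(fit), fit$sigma2), file = con, sep = "\t")
  cat("\n", file = con)
}
close(con)

written <- read.table(out_file, sep = "\t",
                      col.names = c("ar1", "intercept", "sigma2"))
written
```

Writing the newline in a separate cat() call avoids a trailing tab at the end of each line.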
Alternatively, you could wait to have all your arima outputs to create a whole data.frame or matrix of coefficients and only then print it with a single call to write.table.
Assuming you have built a list ll of arima outputs, you can create and write that matrix of coefficients by doing:
output.mat <- t(sapply(ll, with, c(coef, sigma2 = sigma2)))
write.table(output.mat, file = "test.csv", row.names = FALSE)
Try
write.table(unique(as.vector(MLest[1:2])), file=filename,sep=" ", col.names = F, row.names = F)

Avoid repeating statements when importing data

I've written the following code to import data into R:
## specify where all the data files are stored
DataFolder <- "DataFolder"
## obtain the name of each file in DataFolder
files <- list.files(DataFolder)
## obtain name of each file
LocNames <- unique(sub("^([^.]*).*", "\\1", files)) # this removes the extension and keeps the unique names
for (i in 1:length(LocNames)){
  #
  car <- read.table(paste(DataFolder, paste(LocNames[i], ".car", sep=""), sep="/"),
                    header = TRUE, sep = "\t", colClasses=c(dateTime="POSIXct"))
  car <- aggregate(car[colnames(car)[2:length(colnames(car))]],list(dateTime = cut(car$dateTime,breaks = "hour")),mean, na.rm = TRUE)
  #
  light <- read.table(paste(DataFolder, paste(LocNames[i], ".light", sep=""), sep="/"),
                      header = TRUE, sep = "\t", colClasses=c(dateTime="POSIXct"))
  light <- aggregate(light[colnames(light)[2]],list(dateTime = cut(light$dateTime, breaks = "hour")),mean, na.rm = TRUE)
}
So, here I have a DataFolder where all of my files are stored. The files are named after the location where the data were recorded, and the file extension gives the name of the variable measured. Here we have car sales and light as examples.
From here I would like to reduce the amount of code inside the loop: instead of repeating the same steps for one variable after another, I want to write only the variable name (e.g. car, light) and get the same result as the script shown.
Please let me know if my intentions are not clear.
Just use a function. Something to the effect of
## specify where all the data files are stored
DataFolder <- "DataFolder"
## obtain the name of each file in DataFolder
files <- list.files(DataFolder)
## read one file (one location, one extension) and aggregate it to hourly means
readMyFiles <- function(DataFolder, LocName, extension){
  data <- read.table(paste(DataFolder, paste(LocName, ".", extension, sep=""), sep="/"),
                     header = TRUE, sep = "\t", colClasses=c(dateTime="POSIXct"))
  data <- aggregate(data[colnames(data)[2:length(colnames(data))]],list(dateTime = cut(data$dateTime,breaks = "hour")),mean, na.rm = TRUE)
  data
}
## obtain name of each file
LocNames <- unique(sub("^([^.]*).*", "\\1", files)) # this removes the extension and keeps the unique names
for (i in 1:length(LocNames)){
  car <- readMyFiles(DataFolder, LocNames[i], "car")
  light <- readMyFiles(DataFolder, LocNames[i], "light")
}
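Going one step further, the variable names themselves can be looped over, so that adding a new variable means adding one string. The following is only a sketch under assumptions: the temporary folder, the location name siteA, the demo values and the helper read_hourly are all made up so the example runs on its own.

```r
# Sketch: one reader function applied to several extensions.
# File names, columns and the temporary folder are made up for illustration.
data_folder <- file.path(tempdir(), "DataFolder")
dir.create(data_folder, showWarnings = FALSE)

# Create two small demo files: four half-hourly readings each
times <- seq(as.POSIXct("2012-01-01 00:00:00", tz = "UTC"),
             by = "30 min", length.out = 4)
for (ext in c("car", "light")) {
  demo <- data.frame(dateTime = format(times, "%Y-%m-%d %H:%M:%S"),
                     value = c(1, 3, 5, 7))
  write.table(demo, file.path(data_folder, paste0("siteA.", ext)),
              sep = "\t", row.names = FALSE, quote = FALSE)
}

# Read one file and aggregate every non-time column to hourly means
read_hourly <- function(folder, loc_name, extension) {
  path <- file.path(folder, paste0(loc_name, ".", extension))
  dat <- read.table(path, header = TRUE, sep = "\t", stringsAsFactors = FALSE)
  dat$dateTime <- as.POSIXct(dat$dateTime, tz = "UTC")
  aggregate(dat[-1], list(dateTime = cut(dat$dateTime, breaks = "hour")),
            mean, na.rm = TRUE)
}

# A named list with one hourly table per variable
extensions <- c("car", "light")
hourly <- lapply(setNames(extensions, extensions),
                 function(ext) read_hourly(data_folder, "siteA", ext))
hourly$car
```

The result is a list, so hourly$car and hourly$light play the role of the separate car and light objects in the loop above.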
