R iterate over files and write results

I'm making my first attempt in R at iterating over data files, running an analysis, and writing the output to a file. Here is the code I have:
# load the packages used below
library(igraph)
library(dils)  # AdjacencyFromEdgelist()
library(sna)   # qaptest(), gcor()
sink('analysis-output.txt')
for (week in c("002","003","004","005","006")) {
  cat("*** ", week, "\n")
  obs <- read.table(paste(week, "obs.txt", sep = "-"), sep = "\t", header = FALSE)
  per <- read.table(paste(week, "per.txt", sep = "-"), sep = "\t", header = FALSE)
  n <- length(obs)
  mper <- AdjacencyFromEdgelist(per, check.full = TRUE)
  mobs <- AdjacencyFromEdgelist(obs, check.full = TRUE)
  g <- array(c(mper[[1]], mobs[[1]]), c(n, n, 2))
  q <- qaptest(g, gcor, g1 = 1, g2 = 2)
  summary(q)
}
sink()
When I run this, there are no errors, but analysis-output.txt contains only the output of the cat function, i.e.
*** 002
*** 003
*** 004
*** 005
*** 006
However, when I run the code inside the loop by itself, like so:
sink('analysis-output.txt')
week = "002"
cat("*** ", week, "\n")
obs <- read.table(paste(week, "obs.txt", sep = "-"), sep = "\t", header = FALSE)
per <- read.table(paste(week, "per.txt", sep = "-"), sep = "\t", header = FALSE)
n <- length(obs)
mper <- AdjacencyFromEdgelist(per, check.full = TRUE)
mobs <- AdjacencyFromEdgelist(obs, check.full = TRUE)
g <- array(c(mper[[1]], mobs[[1]]), c(n, n, 2))
q <- qaptest(g, gcor, g1 = 1, g2 = 2)
summary(q)
sink()
I do get the summary results in the output file:
*** 002
QAP Test Results
Estimated p-values:
p(f(perm) >= f(d)): 0.355
p(f(perm) <= f(d)): 0.656
Test Diagnostics:
Test Value (f(d)): -0.09045692
Replications: 1000
Distribution Summary:
Min: -0.207416
1stQ: -0.1971768
Med: -0.1720356
Mean: 0.01895663
3rdQ: 0.1185298
Max: 0.9982945
What am I doing wrong?

sink() diverts console output to a file, so R actually has to print something for it to appear in the file. I think what's missing is that, within a loop, you need to explicitly print() the output of summary() to get it to appear in the console (and thus be written by sink()).
Compare the console output of these two loops:
for (i in 1:5) {
  print(summary(cars))
}
Prints results to console.
for (i in 1:5) {
  summary(cars)
}
Does not print to console.
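Applied to the loop in the question, the fix is to wrap the qaptest summary in print(); a minimal sketch, otherwise unchanged from the original code:
sink('analysis-output.txt')
for (week in c("002","003","004","005","006")) {
  cat("*** ", week, "\n")
  obs <- read.table(paste(week, "obs.txt", sep = "-"), sep = "\t", header = FALSE)
  per <- read.table(paste(week, "per.txt", sep = "-"), sep = "\t", header = FALSE)
  n <- length(obs)
  mper <- AdjacencyFromEdgelist(per, check.full = TRUE)
  mobs <- AdjacencyFromEdgelist(obs, check.full = TRUE)
  g <- array(c(mper[[1]], mobs[[1]]), c(n, n, 2))
  q <- qaptest(g, gcor, g1 = 1, g2 = 2)
  print(summary(q))   # print() is what pushes the summary to the console and into the sink
}
sink()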

Related

R Markdown, output test results in loop

I'm looking for nicely formatted markdown output of test results that are produced within a for loop and structured with headings. For example:
df <- data.frame(x = rnorm(1000),
                 y = rnorm(1000),
                 z = rnorm(1000))
for (v in c("y", "z")) {
  cat("##", v, " (model 0)\n")
  summary(lm(x ~ 1, df))
  cat("##", v, " (model 1)\n")
  summary(lm(as.formula(paste0("x~1+", v)), df))
}
whereas the output should be
y (model 0)
Call:
lm(formula = x ~ 1, data = df)
Residuals:
Min 1Q Median 3Q Max
-3.8663 -0.6969 -0.0465 0.6998 3.1648
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.05267 0.03293 -1.6 0.11
Residual standard error: 1.041 on 999 degrees of freedom
y (model 1)
Call:
lm(formula = as.formula(paste0("x~1+", v)), data = df)
Residuals:
Min 1Q Median 3Q Max
-3.8686 -0.6915 -0.0447 0.6921 3.1504
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.05374 0.03297 -1.630 0.103
y -0.02399 0.03189 -0.752 0.452
Residual standard error: 1.042 on 998 degrees of freedom
Multiple R-squared: 0.0005668, Adjusted R-squared: -0.0004346
F-statistic: 0.566 on 1 and 998 DF, p-value: 0.452
z (model 0)
and so on...
There are several results discussing parts of the question, like here or here, suggesting the results='asis' option in combination with a cat() statement. This one includes headers.
Closest to my request seems to be this question from two years ago. However, even though highly appreciated, some of the suggestions are deprecated, like asis_output, or I cannot get them to work under general conditions, like the formattable suggestion (e.g. with lm output). I just wonder, as two years have passed since then, whether there is a modern approach that facilitates what I'm looking for.
Solution Type 1
You could do a capture.output(cat(.)) approach with some lapply-looping. Send the output to a file and use rmarkdown::render(.).
This is the R code that produces a *.pdf:
capture.output(cat("---
title: 'Test Results'
author: 'Tom & co.'
date: '11 10 2019'
output: pdf_document
---\n\n```{r setup, include=FALSE}\n
knitr::opts_chunk$set(echo = TRUE)\n
mtcars <- data.frame(mtcars)\n```\n"), file="_RMD/Tom.Rmd") # here of course your own data
lapply(seq(mtcars), function(i)
capture.output(cat("# Model", i, "\n\n```{r chunk", i, ", comment='', echo=FALSE}\n\
print(summary(lm(mpg ~ ", names(mtcars)[i] ,", mtcars)))\n```\n"),
file="_RMD/Tom.Rmd", append=TRUE))
rmarkdown::render("_RMD/Tom.Rmd")
Produces:
Solution Type 2
When we want to automate the output of multiple model summaries in the rmarkdown document itself, we can choose between (1) setting the chunk option results='asis', which renders the # Model 1 headlines but spoils the formatting of the code output, or (2) leaving it unset, which keeps the code output formatted but prints # Model 1 as literal text. The solution is to use the option and combine it with inline code that we paste() together, using another sapply() loop inside the sapply() over the models.
In the main sapply() we apply G.Grothendieck's venerable solution to nicely substitute the Call: line of the output, using do.call("lm", list(.)). We need to wrap invisible(.) around it to avoid the unnecessary sapply() output ([[1]] [[2]] ...) from the empty lists produced.
I included a ". " in the cat(), because lines with leading white space (lines 6 and 10 of the summary outputs) would otherwise not be rendered correctly.
This is the rmarkdown script producing a *.pdf; it can also be executed line by line in an ordinary R session:
---
title: "Test results"
author: "Tom & co."
date: "15 10 2019"
output: pdf_document
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```
# Overview
This is an example of an ordinary code block with output that had to be included.
```{r mtcars, fig.width=3, fig.height=3}
head(mtcars)
```
# Test results in detail
The test results follow fully automated in detail.
```{r mtcars2, echo=FALSE, message=FALSE, results="asis"}
invisible(sapply(tail(seq(mtcars), -2), function(i) {
  fo <- reformulate(names(mtcars)[i], response="mpg")
  s <- summary(do.call("lm", list(fo, quote(mtcars))))
  cat("\n## Model", i - 2, "\n")
  sapply(1:19, function(j)
    cat(paste0("`", ". ", capture.output(s)[j]), "` \n"))
  cat(" \n")
}))
```
***Note:*** This is a concluding remark to show that we still can do other stuff afterwards.
Produces:
(Note: page 3 omitted)
Context
I was hit by the same need as the OP when trying to generate multiple plots in a loop: one of them would apparently crash the graphics device (because of unpredictably bad input), even when called inside try(), and prevent all the remaining figures from being generated. I needed truly independent code blocks, as in the proposed solution.
Solution
I've thought of preprocessing the source file before it was passed to knitr, preferably inside R, and found that the jinjar package was a good candidate. It uses a dynamic template syntax based on the Jinja2 templating engine from Python/Django. There are no syntax clashes with document formats accepted by R Markdown, but the tricky part was integrating it nicely with its machinery.
My hackish solution was to create a wrapper around rmarkdown::output_format() that executes some code inside the rmarkdown::render() call environment to preprocess the source file:
preprocess_jinjar <- function(base_format) {
  if (is.character(base_format)) {
    base_format <- rmarkdown:::create_output_format_function(base_format)
  }
  function(...) {
    # Find the rmarkdown::render() environment.
    callers <- sapply(sys.calls(), function(x) deparse(as.list(x)[[1]]))
    target <- grep('^(rmarkdown::)?render$', callers)
    target <- target[length(target)] # render may be called recursively
    render_envir <- sys.frames()[[target]]
    # Modify the input with jinjar.
    input_paths <- evalq(envir = render_envir, expr = {
      original_knit_input <- sub('(\\.[[:alnum:]]+)$', '.jinjar\\1', knit_input)
      file.rename(knit_input, original_knit_input)
      input_lines <- jinjar::render(paste(input_lines, collapse = '\n'))
      writeLines(input_lines, knit_input)
      normalize_path(c(knit_input, original_knit_input))
    })
    # Add an on_exit hook to revert the modification.
    rmarkdown::output_format(
      knitr = NULL,
      pandoc = NULL,
      on_exit = function() file.rename(input_paths[2], input_paths[1]),
      base_format = base_format(...)
    )
  }
}
Then I can call, for example:
rmarkdown::render('input.Rmd', output_format = preprocess_jinjar('html_document'))
Or, more programmatically, with the output format specified in the source file metadata as usual:
html_jinjar <- preprocess_jinjar('html_document')
rmarkdown::render('input.Rmd')
Here is a minimal example for input.Rmd:
---
output:
  html_jinjar:
    toc: false
---
{% for n in [1, 2, 3] %}
# Section {{ n }}
```{r block-{{ n }}}
print({{ n }}**2)
```
{% endfor %}
Caveats
It's a hack. This code depends on the internal logic of rmarkdown::render(), and there are likely edge cases where it won't work. Use at your own risk.
For this solution to work, the output format constructor must be called by render(). Therefore, evaluating it before passing it to render() will fail:
render('input.Rmd', output_format = 'html_jinjar') # works
render('input.Rmd', output_format = html_jinjar) # works
render('input.Rmd', output_format = html_jinjar()) # fails
This second limitation could be circumvented by putting the preprocessing code inside a pre_knit() hook, but then it would only run after the other output format hooks, such as intermediates_generator() and the base format's own pre_knit() hooks; see the sketch below.
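For illustration, a minimal, untested sketch of that pre_knit() variant; the assumption that rewriting the input file at this stage is picked up by knitr, and the lack of backup handling, are mine and not from the original answer:
preprocess_jinjar_preknit <- function(base_format) {
  if (is.character(base_format)) {
    base_format <- rmarkdown:::create_output_format_function(base_format)
  }
  function(...) {
    rmarkdown::output_format(
      knitr = NULL,
      pandoc = NULL,
      pre_knit = function(input, ...) {
        # Expand the jinjar template in place so that knitr sees the generated code.
        # Note: this overwrites the source file; a real implementation should back it
        # up and restore it afterwards, as the wrapper above does with its on_exit hook.
        expanded <- jinjar::render(paste(readLines(input), collapse = "\n"))
        writeLines(expanded, input)
      },
      base_format = base_format(...)
    )
  }
}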

Load the MNIST digit recognition dataset with R and see any results

In the book "Machine Learning - A Probabilistic Perspective" by Kevin P. Murphy the first task reads:
Exercise 1.1 KNN classifier on shuffled MNIST data
Run mnist1NNdemo
and verify that the misclassification rate (on the first 1000 test
cases) of MNIST of a 1-NN classifier is 3.8%. (If you run it all on
all 10,000 test cases, the error rate is 3.09%.) Modify the code so
that you first randomly permute the features (columns of the training
and test design matrices), as in shuffledDigitsDemo, and then apply
the classifier. Verify that the error rate is not changed.
My simple understanding is that the exercise is looking for the 1-NN classifier after loading the files (kNN() in R).
The files:
train-images-idx3-ubyte.gz: training set images (9912422 bytes)
train-labels-idx1-ubyte.gz: training set labels (28881 bytes)
t10k-images-idx3-ubyte.gz: test set images (1648877 bytes)
t10k-labels-idx1-ubyte.gz: test set labels (4542 bytes)
are taken from The MNIST DATABASE.
I found a popular template for loading the files:
# for the kNN() function
library(VIM)

load_mnist <- function() {
  load_image_file <- function(filename) {
    ret = list()
    f = file(filename, 'rb')
    readBin(f, 'integer', n=1, size=4, endian='big')
    ret$n = readBin(f, 'integer', n=1, size=4, endian='big')
    nrow = readBin(f, 'integer', n=1, size=4, endian='big')
    ncol = readBin(f, 'integer', n=1, size=4, endian='big')
    x = readBin(f, 'integer', n=ret$n*nrow*ncol, size=1, signed=F)
    ret$x = matrix(x, ncol=nrow*ncol, byrow=T)
    close(f)
    ret
  }
  load_label_file <- function(filename) {
    f = file(filename, 'rb')
    readBin(f, 'integer', n=1, size=4, endian='big')
    n = readBin(f, 'integer', n=1, size=4, endian='big')
    y = readBin(f, 'integer', n=n, size=1, signed=F)
    close(f)
    y
  }
  train <<- load_image_file("train-images.idx3-ubyte")
  test <<- load_image_file("t10k-images.idx3-ubyte")
  train$y <<- load_label_file("train-labels.idx1-ubyte")
  test$y <<- load_label_file("t10k-labels.idx1-ubyte")
}

show_digit <- function(arr784, col=gray(12:1/12)) {
  image(matrix(arr784, nrow=28)[,28:1], col=col)
}
According to the comment, in the command line this should work:
# Error "Error in matrix(arr784, nrow = 28) : object 'train' not found"
show_digit(train$x[5,])
The question is: how can I use the show_digit function?
Edit: removed the extra question.
What I figured out for the problem:
First run the whole file in RStudio or ESS, then call load_mnist() from the console.
After that, execute show_digit(train$x[3,]) in the console and it works.
The kNN classifier can be run on the whole data set:
a <- knn(train, test, train$y), but it is a very slow process.
Predictions can then be cross-tabulated with table(test$y, a), where test$y holds the actual labels and a the predicted ones.
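For completeness, a hedged sketch of the exercise's 1-NN step on the first 1000 test cases, assuming load_mnist() has already been run; it uses class::knn() on the $x matrices rather than VIM::kNN(), and the error-rate line is illustrative (the exercise expects roughly 3.8%):
library(class)                       # knn() for k-nearest-neighbour classification
load_mnist()                         # creates train and test in the global environment
pred <- knn(train = train$x,
            test  = test$x[1:1000, ],
            cl    = factor(train$y),
            k     = 1)
mean(pred != test$y[1:1000])         # misclassification rate on the first 1000 test cases
table(actual = test$y[1:1000], predicted = pred)
Note that this is slow and memory-hungry with the full 60,000-row training matrix, which matches the remark above about knn() being a very slow process.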

Writing a single NCDF4 file with variables extracted from 159 separate NCDF4 files

I have 159 NetCDF4 files with 56 ensemble members in each file. I want to pull out ensemble 1 from each of the 159 input files and then produce a single NetCDF4 file containing all of the ensemble 1 data. My code is below. My problem is that only the data from the last of the 159 files is written to the output file. I think I am missing a nested loop, but I am not sure, and my attempts have failed.
rm(list=ls())
library(ncdf.tools)
library(ncdf4)
library(ncdf4.helpers)
library(RNetCDF)
setwd("D:/Rwork/Project") # set working folder
#####Write NCDF4 files#############################################
dir("D:/Rwork/Project/Test")->xlab # This is the directory where the file for analysising are
filelist <- paste("Test/",dir("Test"),sep="")
N <- length(filelist) # Loop over the individual files
for(j in 1:N){
  File<-nc_open(filelist[j])
  print(filelist[j])
  Temperature<-ncvar_get(File,"t2m")
  Lat<-ncvar_get(File, "lat")
  Lon<-ncvar_get(File,"lon")
  Time<-ncvar_get(File,"time")
  EnsambleNo.<-ncvar_get(File,"ensemble_member")
  Temperature
  Ensamble1<-Temperature[,,1,] # the ensemble wanted, 1 to 56
  Ensamble1<-round(Ensamble1,digits = 0)
  tunits<-"hours since 1800-01-01 00:00:00"
  # define dimensions
  ##################################################################
  londim<-ncdim_def("Lon","degrees_east",as.double(Lon))
  latdim<-ncdim_def("Lat", "degrees_north",as.double(Lat))
  timedim<-ncdim_def("Time",tunits,as.double(Time))
  # define variables
  ##################################################################
  fillvalue<-1e32
  dlname<-"tm2"
  tmp_def<-ncvar_def("Ensamble1","deg_K", list(londim,latdim,timedim),fillvalue,dlname,prec = "double")
  ncfname<-("D:/Rwork/Project/TrialEnsamble/TrialEnsamble.nc")
  ncout<-nc_create(ncfname,list(tmp_def),force_v4=T)
  ncvar_put(ncout,tmp_def,Ensamble1,start=NA,count = NA ) # think I need a nested loop here
  ncatt_put(ncout,"Lon","axis","X")
  ncatt_put(ncout, "Lat", "axis", "Y")
  ncatt_put(ncout, "Time","axis", "T")
  title<-c( 1:2 )
  names(title)<-c("Ian","Gillespie")
  title<-as.data.frame(title)
  ncatt_put(ncout,0,"Make_NCDF4_File",1, prec="int")
  ncatt_put(ncout,0,"Maynooth_University",1,prec="short")
  ncatt_put(ncout,0,"AR000087828",1, prec="short")
  ncatt_put(ncout,0,"mickymouse",1, prec="short")
  history <- paste("P.J. Bartlein", date(), sep=", ")
  ncatt_put(ncout,0,"description","this is the script to write NCDF4 files")
  # close the file and write data to disk
  ##########################################################
  nc_close(ncout)
}
I found it is better to prepare an empty four-dimensional array whose first three dimensions have the same sizes and dimension names as the array produced inside the for loop, plus an additional fourth dimension to hold the result of each iteration:
dir("D:/Rwork/Project/Test")->xlab # This is the directory where the file for analysising are
filelist <- paste("Test/",dir("Test"),sep="")
output <- array(, dim=c(192,94,12,160))# need to change this to length(Lat), etc
N <- length(filelist) # Loop over the individual files
for(j in 1:N){
File<-nc_open(filelist[j])
print(filelist[j])
Temperature<-ncvar_get(File,"t2m")
Lat<-ncvar_get(File, "lat")
Lon<-ncvar_get(File,"lon")
Time<-ncvar_get(File,"time")
Year<-c(1851:2010)
EnsambleNo.<-ncvar_get(File,"ensemble_member")
Temperature
Lat<-round(Lat,digits = 2)
Lon<-round(Lon,digits = 2)
Ensamble1<-Temperature[,,1,] #The Ensamble wanted, 1 to 56
Ensamble1<-round(Ensamble1,digits = 1)
dimnames(Ensamble1)<-list(Lon,Lat,Time)
dimnames(output) <- list(Lon,Lat,Time,Year)
}
print(Ensamble1)
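To then write everything out once, after the loop, here is a hedged sketch using the same ncdf4 calls as in the question; the fourth-dimension name and the output file name are illustrative, and Lon, Lat and Time are assumed to still hold the values read from the last file:
# define the four dimensions once, outside the loop
londim  <- ncdim_def("Lon",  "degrees_east",  as.double(Lon))
latdim  <- ncdim_def("Lat",  "degrees_north", as.double(Lat))
timedim <- ncdim_def("Time", "hours since 1800-01-01 00:00:00", as.double(Time))
filedim <- ncdim_def("File", "source file index", as.double(seq_len(dim(output)[4])))
fillvalue <- 1e32
tmp_def <- ncvar_def("Ensamble1", "deg_K",
                     list(londim, latdim, timedim, filedim),
                     fillvalue, "t2m ensemble member 1", prec = "double")
ncout <- nc_create("D:/Rwork/Project/TrialEnsamble/AllEnsamble1.nc", list(tmp_def), force_v4 = TRUE)
ncvar_put(ncout, tmp_def, output)   # one call writes the whole 4-D array
nc_close(ncout)
Because nc_create() is called only once, nothing gets overwritten on each iteration, which is what caused only the last file's data to survive in the original code.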

loop with rsource inside not working

I am using rsource in order to produce certain variables in this way:
set more off
local n 4
local i 100
rsource, terminator(END_OF_R) rpath(C:\Program Files\R\R-3.2.5\bin\R.exe) roptions(`" --vanilla --args "`i'" "`n'" "')
library(foreign)
trailargs <- commandArgs(trailingOnly=TRUE);
trailargs;
i<- as.numeric(trailargs[1])
n<- as.numeric(trailargs[2])
y = n-i
sample = data.frame(y)
path = paste("C:/Users/.../Desktop/sample","_",n,"_test.dta",sep="")
write.dta(sample, path)
END_OF_R
This is working, and I do get my sample_100_test.dta.
However, I would like to produce a series of these datasets, in this way:
set more off
local n 4
forvalues i = 1/10 {
rsource, terminator(END_OF_R) rpath(C:\Program Files\R\R-3.2.5\bin\R.exe) roptions(`" --vanilla --args "`i'" "`n'" "')
library(foreign)
trailargs <- commandArgs(trailingOnly=TRUE);
trailargs;
i<- as.numeric(trailargs[1])
n<- as.numeric(trailargs[2])
y = n-i
sample = data.frame(y)
path = paste("C:/Users/.../Desktop/sample","_",n,"_test.dta",sep="")
write.dta(sample, path)
END_OF_R
..... within loop
use sample","_",n,"_test.dta, clear
...estimation
erase sample","_",n,"_test.dta
}
And this is not working. What is the problem with this loop? Why do I get the error
"End of R output: command library is unrecognized"
Thanks for any suggestions.
This is the approach I would use, which is to increment i within R:
set more off
local n 100
local maxi 4
local seeds "1234 12345 123467"
rsource, terminator(END_OF_R) rpath(C:\Program Files\R\R-3.2.5\bin\R.exe) roptions(`" --vanilla --args "`n'" "`maxi'" "`seeds'" "')
library(foreign)
trailargs <- commandArgs(trailingOnly=TRUE);
trailargs;
n <- as.numeric(trailargs[1])
maxi <- as.numeric(trailargs[2])
seeds <- as.numeric(unlist(strsplit(trailargs[3], split=" ")))
print(seeds)
for(seed in seeds) {
  set.seed(seed)
  for(i in 1:maxi) {
    y = n - i
    sample = data.frame(y)
    path = paste("C:/Users/.../Desktop/sample","_",i,"_test.dta",sep="")
    write.dta(sample, path)
  }
}
END_OF_R
As I said in my comments, this should be a lot more efficient than opening and closing R 10 times.

R2WinBUGS error in R

I'm trying to duplicate some code and am running into trouble with WinBUGS. The code was written in 2010, and I think that back then the package was installed with additional files which R is now looking for and cannot find (hence the error), but I'm not sure.
R stops when it tries to run the bugs() call with bugs.directory (see code), and the error is:
Error in file(con, "rb") : cannot open the connection
In addition: Warning message:
In file(con, "rb") :
cannot open file 'C:/Users/Hiwi/Documents/R/Win-library/3.0/R2winBUGS/System/Rsrc/Registry.odc': No such file or directory
Error in bugs.run(n.burnin, bugs.directory, WINE = WINE, useWINE = useWINE, :
WinBUGS executable does not exist in C:/Users/Hiwi/Documents/R/Win-library/3.0/R2winBUGS
I have the results of the analysis, so if there is another way of conducting a Bayesian analysis for the "rawdata" file (in the 14-day model with a [-3,0] event window), or if someone would PLEASE shed some light on what's wrong with the code, I would be forever grateful.
The code is:
rm(list=ls(all=TRUE))
setwd("C:/Users/Hiwi/Dropbox/Oracle/Oracle CD files/analysis/chapter6_a")
library(foreign)
rawdata <- read.dta("nyt.dta",convert.factors = F)
library(MASS)
summary(glm.nb(rawdata$num_events_14 ~ rawdata$nyt_num))
# WinBUGS code
library("R2WinBUGS")
nb.model <- function(){
  for (i in 1:n){ # loop over all observations
    # stochastic component
    dv[i] ~ dnegbin(p[i], r)
    # link and linear predictor
    p[i] <- r/(r + lambda[i])
    log(lambda[i]) <- b[1] + b[2] * iv[i]
  }
  # prior distributions
  r <- exp(logr)
  logr ~ dnorm(0.0, 0.01)
  b[1] ~ dnorm(0, 0.001) # prior (please note: the second element is 1/variance)
  b[2] ~ dnorm(0, 0.001) # prior
}
write.model(nb.model, "negativebinomial.bug")
n <- dim(rawdata)[1] # number of observations
winbug.data <- list(dv = rawdata$num_events_14,
                    iv = rawdata$nyt_num,
                    n = n)
winbug.inits <- function(){
  list(logr = 0, b = c(2.46, -.37))
} # starting values from the uniform distribution between -1 and 1
bug.erg <- bugs(data = winbug.data,
                inits = winbug.inits,
                #inits = NULL,
                parameters.to.save = c("b", "r"),
                model.file = "negativebinomial.bug",
                n.chains = 3, n.iter = 10000, n.burnin = 5000,
                n.thin = 1,
                codaPkg = T,
                debug = F,
                #bugs.directory = "C:/Users/Hiwi/Documents/R/Win-library/3.0/R2winBUGS/"
                bugs.directory = "C:/Users/Hiwi/Documents/R/Win-library/3.0/R2winBUGS"
)
tempdir()
setwd(tempdir())
file.rename("codaIndex.txt","simIndex.txt")
file.rename("coda1.txt","sim1.txt")
file.rename("coda2.txt","sim2.txt")
file.rename("coda3.txt","sim3.txt")
posterior <- rbind(read.coda("sim1.txt","simIndex.txt"),read.coda("sim2.txt","simIndex.txt"),read.coda("sim3.txt","simIndex.txt"))
post.df <- as.data.frame(posterior)
summary(post.df)
quantile(post.df[,2],probs=c(.025,.975))
quantile(post.df[,2],probs=c(.05,.95))
quantile(post.df[,2],probs=c(.10,.90))
tempdir()
Difficult to say for sure without sitting at your PC... Maybe it is something to do with R2WinBUGS looking in the wrong directory for WinBUGS.exe? You can point R2WinBUGS to the right place using the bugs.directory argument in the bugs function.
If not, try and install OpenBUGS and give R2OpenBUGS a go.
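For reference, a hedged sketch of that change: point bugs.directory at the folder that actually contains WinBUGS14.exe, not at the R2WinBUGS package library. The path below is the common default install location and may differ on your machine:
bug.erg <- bugs(data = winbug.data,
                inits = winbug.inits,
                parameters.to.save = c("b", "r"),
                model.file = "negativebinomial.bug",
                n.chains = 3, n.iter = 10000, n.burnin = 5000,
                n.thin = 1,
                codaPkg = TRUE,
                bugs.directory = "C:/Program Files/WinBUGS14")  # folder containing WinBUGS14.exe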
