I wanted to do some simple image comparisons with R (the reason I am not using python is that the workflow is in R). I tried to search for ms-ssim implementations in packages in R, but did find any except for spatialcompare::msssim. However, as I mentioned in my post yesterday, I figured out that the result of this function might not be correct for my input (could be related to matrix to raster transformation?). Are there any more suggestions for appropriate ms-ssim codes? I actually self-implemented one based on SpatialPack::SSIM because it seems easy to just downsample the image again and again, but am not sure if I am writing it correctly. I'll put it as an answer.
My demo code for a very simple ms-ssim:
calcMSSSIM<-function(Mat1,Mat2){
weight = c(0.0448, 0.2856, 0.3001, 0.2363, 0.1333)
level<-5
array<-c()
ssim<-SpatialPack::SSIM(Mat1, Mat2, alpha=1, beta=1, gamma=1,
eps=c(0.01,0.03), L=max(tmp1,tmp2))$SSIM
array[1]<-ssim$comps["contrast"]*ssim$comps["structure"]
result<-abs(array[1])^weight[1]
for (i in 2:level) {
tmp1 <- EBImage::resize(tmp1, w=dim(tmp1)[1]/2, filter="none")
tmp2 <- EBImage::resize(tmp2, w=dim(tmp2)[1]/2, filter="none")
ssim <- SpatialPack::SSIM(tmp1,tmp2,alpha=1, beta=1, gamma=1,
eps=c(0.01,0.03), L=max(tmp1,tmp2))$SSIM
array[i]<-ssim$comps["contrast"]*ssim$comps["structure"]
result <- result*abs(array[i])^weight[i]
}
lum<-ssim$comps["luminance"]
result<-result*lum^weight[i]
return(list(array,result))
}
Is it correct?
Using expss package I am creating cross tabs by reading SPSS files in R. This actually works perfectly but the process takes lots of time to load. I have a folder which contains various SPSS files(usually 3 files only) and through R script I am fetching the last modified file among the three.
setwd('/file/path/for/this/file/SPSS')
library(expss)
expss_output_viewer()
#get all .sav files
all_sav <- list.files(pattern ='\\.sav$')
#use file.info to get the index of the file most recently modified
pass<-all_sav[with(file.info(all_sav), which.max(mtime))]
mydata = read_spss(pass,reencode = TRUE) # read SPSS file mydata
w <- data.frame(mydata)
args <- commandArgs(TRUE)
Everything is perfect and works absolutely fine but it generally takes too much time to load large files(112MB,48MB for e.g) which isn't good.
Is there a way I can make it more time-efficient and takes less time to create the table. The dropdowns are created using PHP.
I have searched for this and found another library called 'haven' but I am not sure whether that can give me significance as well. Can anyone help me with this? I would really appreciate that. Thanks in advance.
As written in the expss vignette (https://cran.r-project.org/web/packages/expss/vignettes/labels-support.html) you can use in the following way:
# we need to load packages strictly in this order to avoid conflicts
library(haven)
library(expss)
spss_data = haven::read_spss("spss_file.sav")
# add missing 'labelled' class
spss_data = add_labelled_class(spss_data)
came across this seemingly new package - skimr, which looks pretty nifty, and was trying it out and looks like I'm missing some package installation. Skim works fine except that it doesn't print the histogram, it is supposed to print for numeric variables. I am merely trying the examples given in the documentation.
Link to skimr documentation here - https://github.com/ropenscilabs/skimr#skimr
this is the code I'm using
devtools::install_github("hadley/colformat")
devtools::install_github("ropenscilabs/skimr")
library(skimr)
a<-skim(mtcars)
dim(a)
View(a)
instead of histograms being printed, I see some ASCII/unicode characters .
A solution that can be used to workaround the above problem is to set the locale of the R system to Chinese and to set the font of the R console to NSimSun.
temp <- tempfile()
cat("font = NSimSun\n", file = temp, append = TRUE)
loadRconsole(file = temp)
Sys.setlocale( locale='Chinese' )
library(skimr)
(a <- skim(mtcars))
View(a)
In RStudio this solution works only partially. Histograms generated by skim can be visualized only using View after setting the locale of R to Chinese
Sys.setlocale( locale='Chinese' )
library(skimr)
a <- skim(mtcars)
View(a)
Hope this can help you.
I have what I think is a common enough issue, on optimising workflow in R. Specifically, how can I avoid the common issue of having a folder full of output (plots, RData files, csv, etc.), without, after some time, having a clue where they came from or how they were produced? In part, it surely involves trying to be intelligent about folder structure. I have been looking around, but I'm unsure of what the best strategy is. So far, I have tackled it in a rather unsophisticated (overkill) way: I created a function metainfo (see below) that writes a text file with metadata, with a given file name. The idea is that if a plot is produced, this command is issued to produce a text file with exactly the same file name as the plot (except, of course, the extension), with information on the system, session, packages loaded, R version, function and file the metadata function was called from, etc. The questions are:
(i) How do people approach this general problem? Are there obvious ways to avoid the issue I mentioned?
(ii) If not, does anyone have any tips on improving this function? At the moment it's perhaps clunky and not ideal. Particularly, getting the file name from which the plot is produced doesn't necessarily work (the solution I use is one provided by #hadley in 1). Any ideas would be welcome!
The function assumes git, so please ignore the probable warning produced. This is the main function, stored in a file metainfo.R:
MetaInfo <- function(message=NULL, filename)
{
# message - character string - Any message to be written into the information
# file (e.g., data used).
# filename - character string - the name of the txt file (including relative
# path). Should be the same as the output file it describes (RData,
# csv, pdf).
#
if (is.null(filename))
{
stop('Provide an output filename - parameter filename.')
}
filename <- paste(filename, '.txt', sep='')
# Try to get as close as possible to getting the file name from which the
# function is called.
source.file <- lapply(sys.frames(), function(x) x$ofile)
source.file <- Filter(Negate(is.null), source.file)
t.sf <- try(source.file <- basename(source.file[[length(source.file)]]),
silent=TRUE)
if (class(t.sf) == 'try-error')
{
source.file <- NULL
}
func <- deparse(sys.call(-1))
# MetaInfo isn't always called from within another function, so func could
# return as NULL or as general environment.
if (any(grepl('eval', func, ignore.case=TRUE)))
{
func <- NULL
}
time <- strftime(Sys.time(), "%Y/%m/%d %H:%M:%S")
git.h <- system('git log --pretty=format:"%h" -n 1', intern=TRUE)
meta <- list(Message=message,
Source=paste(source.file, ' on ', time, sep=''),
Functions=func,
System=Sys.info(),
Session=sessionInfo(),
Git.hash=git.h)
sink(file=filename)
print(meta)
sink(file=NULL)
}
which can then be called in another function, stored in another file, e.g.:
source('metainfo.R')
RandomPlot <- function(x, y)
{
fn <- 'random_plot'
pdf(file=paste(fn, '.pdf', sep=''))
plot(x, y)
MetaInfo(message=NULL, filename=fn)
dev.off()
}
x <- 1:10
y <- runif(10)
RandomPlot(x, y)
This way, a text file with the same file name as the plot is produced, with information that could hopefully help figure out how and where the plot was produced.
In terms of general R organization: I like to have a single script that recreates all work done for a project. Any project should be reproducible with a single click, including all plots or papers associated with that project.
So, to stay organized: keep a different directory for each project, each project has its own functions.R script to store non-package functions associated with that project, and each project has a master script that starts like
## myproject
source("functions.R")
source("read-data.R")
source("clean-data.R")
etc... all the way through. This should help keep everything organized, and if you get new data you just go to early scripts to fix up headers or whatever and rerun the entire project with a single click.
There is a package called Project Template that helps organize and automate the typical workflow with R scripts, data files, charts, etc. There is also a number of helpful documents like this one Workflow of statistical data analysis by Oliver Kirchkamp.
If you use Emacs and ESS for your analyses, learning Org-Mode is a must. I use it to organize all my work. Here is how it integrates with R: R Source Code Blocks in Org Mode.
There is also this new free tool called Drake which is advertised as "make for data".
I think my question belies a certain level of confusion. Having looked around, as well as explored the suggestions provided so far, I have reached the conclusion that it is probably not important to know where and how a file is produced. You should in fact be able to wipe out any output, and reproduce it by rerunning code. So while I might still use the above function for extra information, it really is a question of being ruthless and indeed cleaning up folders every now and then. These ideas are more eloquently explained here. This of course does not preclude the use of Make/Drake or Project Template, which I will try to pick up on. Thanks again for the suggestions #noah and #alex!
There is also now an R package called drake (Data Frames in R for Make), independent from Factual's Drake. The R package is also a Make-like build system that links code/dependencies with output.
install.packages("drake") # It is on CRAN.
library(drake)
load_basic_example()
plot_graph(my_plan)
make(my_plan)
Like it's predecessor remake, it has the added bonus that you do not have to keep track of a cumbersome pile of files. Objects generated in R are cached during make() and can be reloaded easily.
readd(summ_regression1_small) # Read objects from the cache.
loadd(small, large) # Load objects into your R session.
print(small)
But you can still work with files as single-quoted targets. (See 'report.Rmd' and 'report.md' in my_plan from the basic example.)
There is package developed by RStudio called pins that might address this problem.
I am still new to R and I have searched around for a solution to my simple question, but I haven't found an answer that I've been able to get to work. I am looking to use a previously identified variable per data set, here variable=SNPname to include in script for automated generation of graph output in png format.
I am using this to generate a kmeans plot and have:
(cl <- kmeans(FilteredData[,6:7], 5, nstart=25))
png("C:/temp/$SNPnamegraph1.png") #SNPname to include in filename
plot(FilteredData[,6:7], col=cl$cluster)
points(cl$centers, col=1:5, pch=8)
dev.off()
where I want to include that variable in line 2 at the beginning of the file name. Is there a simple way to do this that I am just missing?
Close, you're just missing the use of paste() and setwd()
setwd("C:/temp/") # use this to set where you want things saved
...
c1 <- kmeans...
png(paste(SNPname, " graph1.png", sep=""))
...
If it's in a loop of some kind, you might need to use SNPname[loop_var]