I have been trying to obtain the time at which the content of an .xlsx file was created without any success so far. I can track the much-desired information on Windows either through File Properties -> Details -> Origin -> Content created, or by opening the Excel file and navigating to File -> Info -> Related Dates -> Created.
I was hoping that I would be able to obtain this information through openxlsx, but while I am able to track down the creators using the getCreators() function, there does not appear to be a similar function for the creation time.
I have also tried the file.info() function but it won't cut it as mtime, ctime, and atime all point to the time of the download.
Any help would be much appreciated!
I don't think openxlsx is going to do it for you, but you might want to submit a feature request for them to add/extend file metadata availability. Here's something in a pinch, assuming that the XLSX file is in the newer zip-based format and not the older binary format.
myfile <- "path/to/yourfile.xlsx"
docProps <- xml2::read_xml(unz(myfile, "docProps/core.xml"))
docProps
# {xml_document}
# <coreProperties xmlns:cp="http://schemas.openxmlformats.org/package/2006/metadata/core-properties" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:dcterms="http://purl.org/dc/terms/" xmlns:dcmitype="http://purl.org/dc/dcmitype/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
# [1] <dc:creator>r2</dc:creator>
# [2] <cp:lastModifiedBy>r2</cp:lastModifiedBy>
# [3] <dcterms:created xsi:type="dcterms:W3CDTF">2021-05-09T20:01:41Z</dcterms:created>
# [4] <dcterms:modified xsi:type="dcterms:W3CDTF">2021-05-10T00:14:14Z</dcterms:modified>
xml2::xml_text(xml2::xml_find_all(docProps, "dcterms:created"))
# [1] "2021-05-09T20:01:41Z"
It's a text file, so in a pinch you can look at it manually, but I recommend not trying to do regex on XML in general. (You could get away with it here, but it's still fraught with peril.)
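If you need this for more than one file, the same idea can be wrapped in a small helper. This is just a sketch: xlsx_created is a made-up name, and it assumes the file is the newer zip-based .xlsx with a docProps/core.xml part, as above.
# Sketch of a helper around the approach shown above; xlsx_created is a
# made-up name and the file is assumed to be a zip-based .xlsx.
xlsx_created <- function(path) {
  props <- xml2::read_xml(unz(path, "docProps/core.xml"))
  created <- xml2::xml_text(xml2::xml_find_first(props, "dcterms:created"))
  # Core properties store W3CDTF timestamps, e.g. "2021-05-09T20:01:41Z"
  as.POSIXct(created, format = "%Y-%m-%dT%H:%M:%SZ", tz = "UTC")
}
xlsx_created("path/to/yourfile.xlsx")
# [1] "2021-05-09 20:01:41 UTC"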
I want to clip a large shapefile (67MB) in R and derive a much smaller raster from around ~5% of it. Once loaded, the shapefile has 221388 features and 5 fields - and explodes to 746 MB.
My difficulty comes when trying to clip the file to a workable size - the program crashes after a few minutes. I have tried both crop (from raster) and gIntersection (from rgeos) without success. I have 8GB of RAM - clearly there is a memory issue.
I am guessing there may be a workaround. I know that there are some big-memory packages out there - but can any of them help in my kind of situation? My current code is below:
# dataset can be found at
# http://data.fao.org/map?entryId=271096b2-8a12-4050-9ff2-27c0fec16c8f
# location of files
ogrListLayers("C:/Users/Me/Documents/PNG Glob")
# import shapefile
ogrDrivers()[10,]
# shapefiles
Glob<-readOGR("C:/Users/Me/Documents/PNG Glob", layer="png_gc_adg_1")
# assign projection
Glob@proj4string <- CRS("+proj=longlat")
#object size
object.size(Glob)
# clipping
crop(Glob, extent(c(144,146,-7,-5)))
As suggested by @Pascal, GDAL's ogr2ogr is useful for this. You can call it from R with system as follows (including on Windows), though this assumes that (1) you have a working GDAL installation and (2) the GDAL binaries are on your PATH environment variable:
Download and unzip the PNG shapefile:
download.file('http://www.fao.org/geonetwork/srv/en/resources.get?id=37162&fname=png_gc_adg.zip&access=private',
f <- tempfile(fileext='.zip'))
unzip(f, exdir=tempdir())
Call ogr2ogr with system from R to clip the PNG shapefile and save the resulting .shp to the working directory:
system(sprintf('ogr2ogr -clipsrc 144 -7 146 -5 png_clip.shp %s',
file.path(tempdir(), 'png_gc_adg_1.shp')))
On my system this took around 70 seconds, and memory usage didn't seem to increase by more than about 100MB. (I did get a lot of warnings along the lines of Warning 1: Value 138717513240 of field AREA of feature 0 not successfully written. Possibly due to too larger number with respect to field width - not sure what that's about.)
Load the clipped shapefile:
library(rgdal)
p <- readOGR('.', 'png_clip')
plot(p)
I have what I think is a common enough issue, on optimising workflow in R. Specifically, how can I avoid ending up with a folder full of output (plots, RData files, csv, etc.) and, after some time, having no clue where they came from or how they were produced? In part, it surely involves trying to be intelligent about folder structure. I have been looking around, but I'm unsure of what the best strategy is.

So far, I have tackled it in a rather unsophisticated (overkill) way: I created a function MetaInfo (see below) that writes a text file with metadata, with a given file name. The idea is that if a plot is produced, this command is issued to produce a text file with exactly the same file name as the plot (except, of course, the extension), with information on the system, session, packages loaded, R version, the function and file the metadata function was called from, etc.

The questions are:
(i) How do people approach this general problem? Are there obvious ways to avoid the issue I mentioned?
(ii) If not, does anyone have any tips on improving this function? At the moment it's perhaps clunky and not ideal. Particularly, getting the file name from which the plot is produced doesn't necessarily work (the solution I use is one provided by @hadley in [1]). Any ideas would be welcome!
The function assumes git, so please ignore the probable warning produced. This is the main function, stored in a file metainfo.R:
MetaInfo <- function(message=NULL, filename)
{
# message - character string - Any message to be written into the information
# file (e.g., data used).
# filename - character string - the name of the txt file (including relative
# path). Should be the same as the output file it describes (RData,
# csv, pdf).
#
if (is.null(filename))
{
stop('Provide an output filename - parameter filename.')
}
filename <- paste(filename, '.txt', sep='')
# Try to get as close as possible to getting the file name from which the
# function is called.
source.file <- lapply(sys.frames(), function(x) x$ofile)
source.file <- Filter(Negate(is.null), source.file)
t.sf <- try(source.file <- basename(source.file[[length(source.file)]]),
silent=TRUE)
if (class(t.sf) == 'try-error')
{
source.file <- NULL
}
func <- deparse(sys.call(-1))
# MetaInfo isn't always called from within another function, so func could
# return as NULL or as general environment.
if (any(grepl('eval', func, ignore.case=TRUE)))
{
func <- NULL
}
time <- strftime(Sys.time(), "%Y/%m/%d %H:%M:%S")
git.h <- system('git log --pretty=format:"%h" -n 1', intern=TRUE)
meta <- list(Message=message,
Source=paste(source.file, ' on ', time, sep=''),
Functions=func,
System=Sys.info(),
Session=sessionInfo(),
Git.hash=git.h)
sink(file=filename)
print(meta)
sink(file=NULL)
}
which can then be called in another function, stored in another file, e.g.:
source('metainfo.R')
RandomPlot <- function(x, y)
{
fn <- 'random_plot'
pdf(file=paste(fn, '.pdf', sep=''))
plot(x, y)
MetaInfo(message=NULL, filename=fn)
dev.off()
}
x <- 1:10
y <- runif(10)
RandomPlot(x, y)
This way, a text file with the same file name as the plot is produced, with information that could hopefully help figure out how and where the plot was produced.
In terms of general R organization: I like to have a single script that recreates all work done for a project. Any project should be reproducible with a single click, including all plots or papers associated with that project.
So, to stay organized: keep a different directory for each project, each project has its own functions.R script to store non-package functions associated with that project, and each project has a master script that starts like
## myproject
source("functions.R")
source("read-data.R")
source("clean-data.R")
etc... all the way through. This should help keep everything organized, and if you get new data you just go to early scripts to fix up headers or whatever and rerun the entire project with a single click.
There is a package called ProjectTemplate that helps organize and automate the typical workflow with R scripts, data files, charts, etc. There are also a number of helpful documents, like this one: Workflow of statistical data analysis by Oliver Kirchkamp.
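For reference, a minimal ProjectTemplate session looks roughly like this (a sketch based on the package's documented workflow; the project name is made up):
install.packages("ProjectTemplate")
library(ProjectTemplate)

create.project("my-analysis")  # scaffolds data/, munge/, src/, reports/, config/, ...
setwd("my-analysis")
load.project()                 # reads the config, loads data/ and runs the munge/ scripts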
If you use Emacs and ESS for your analyses, learning Org-Mode is a must. I use it to organize all my work. Here is how it integrates with R: R Source Code Blocks in Org Mode.
There is also this new free tool called Drake which is advertised as "make for data".
I think my question belies a certain level of confusion. Having looked around, as well as explored the suggestions provided so far, I have reached the conclusion that it is probably not important to know where and how a file is produced. You should in fact be able to wipe out any output, and reproduce it by rerunning code. So while I might still use the above function for extra information, it really is a question of being ruthless and indeed cleaning up folders every now and then. These ideas are more eloquently explained here. This of course does not preclude the use of Make/Drake or ProjectTemplate, which I will try to pick up on. Thanks again for the suggestions @noah and @alex!
There is also now an R package called drake (Data Frames in R for Make), independent from Factual's Drake. The R package is also a Make-like build system that links code/dependencies with output.
install.packages("drake") # It is on CRAN.
library(drake)
load_basic_example()
plot_graph(my_plan)
make(my_plan)
Like its predecessor remake, it has the added bonus that you do not have to keep track of a cumbersome pile of files. Objects generated in R are cached during make() and can be reloaded easily.
readd(summ_regression1_small) # Read objects from the cache.
loadd(small, large) # Load objects into your R session.
print(small)
But you can still work with files as single-quoted targets. (See 'report.Rmd' and 'report.md' in my_plan from the basic example.)
There is a package developed by RStudio called pins that might address this problem.
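Roughly, the idea is that outputs get "pinned" to a board together with metadata instead of being scattered across folders. A sketch using the pins >= 1.0 API (the board type and pin name are just examples):
library(pins)

board <- board_local()                    # a board stored on this machine
pin_write(board, mtcars, name = "cars")   # save an object along with metadata
pin_read(board, "cars")                   # read it back later / from another session
pin_meta(board, "cars")                   # see when and how the pin was created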
In the world of the R statistics package, rgl allows me to generate 3D plots that I can rotate with my mouse. Is there a way I can export these plots in a portable format, load them in a web browser or other third-party tool, and rotate them there? I am especially interested in the web browser solution, since this would allow me to share the plots on an internal wiki.
If rgl does not allow this, are there other libraries or strategies that would allow me to accomplish this?
You could try the vrmlgen package. It will produce 3d VRML files that can be displayed with a browser plugin; you can find a plugin at VRML Plugin and Browser Detector.
Once you've installed a plugin, try this:
require(vrmlgen)
example(bar3d)
NB: the example code didn't automatically open in a browser for me (RStudio, Win7, Chrome) because the path got mangled. You might need to use:
require(stringr)
browseURL(str_replace_all(file.path(outdir, 'barplot.html'), fixed('\\'), '/'))
If you don't want to install a VRML plugin, you could use X3DOM instead. You'll need a converter, but your users should be able to view them with just a (modern) browser. You might have to modify the following code to get the paths right:
setwd(outdir)
aopt <- 'C:/PROGRA~1/INSTAN~1/bin/aopt' # Path to conversion program
vrml <- 'barplot.wrl'
x3dom <- 'barx.html'
command <- paste(aopt, '-i', vrml, '-N', x3dom)
system(command)
# LOG Avalon Init: 47/616, V2.0.0 build: R-21023 Jan 12 2011
# LOG Avalon Read url
# LOG Avalon Read time: 0.074000
# ============================================
# Call: writeHTML with 1 param
# Write raw-data to barx.html as text/html
# WARNING Avalon Run NodeNameSpace "scene" destructor and _nodeCount == 3
# WARNING Avalon Try to remove nodes from parents
# WARNING Avalon PopupText without component, cannot unregister
# WARNING Avalon Avalon::exitSystem() call and node/obj left: 0/3331
browseURL(file.path(outdir, 'barx.html'))
setwd(curdir)
For a simple solution, try this (it uses the rgl package):
library(rgl)

x <- sort(rnorm(1000))
y <- rnorm(1000)
z <- rnorm(1000) + atan2(x,y)
plot3d(x,y,z,
col=rainbow(1000),
type = "s",
size=1,
xlab = "x",
ylab = "y",
zlab = "z",
box=T)
# This writes a copy into a 'webGL' subdirectory of the given path, and then displays it
browseURL(paste("file://", writeWebGL(dir=file.path("C:/Your-Directory-Here/", "webGL"), width=700), sep=""))
Open the index.html file in Firefox or a similar browser that supports HTML5 and WebGL.
Pete's suggestion is worth the bounty. The WRL detour is not really necessary; it is rather easy to generate the XML file with sprintf and friends.
The problem is speed: as a comparison, I had a color-coded stomach MRI with 17000 spheres (for voxels), which was quite responsive on my screen with rgl.
When I ported it to x3dom, the system froze. A reduced set with 450 spheres works:
http://www.menne-biomed.de/uni/x3dsample.html
Browser support is inconsistent. Some of the samples on the x3dom example page work best with (believe it or not) Internet Explorer + Flash 11. Check the dynamic light example.
My example works, but looks flat on Firefox 7.0.1. Best is always Chrome.
Added later:
Here is another example:
Stomach3D as Zip
The x3d file contained in it can be displayed even with on-board graphics using the Instant Reality Viewer. The html file generated from it sometimes loads, but cannot be rotated.
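To illustrate the earlier point that the WRL detour is not strictly needed, here is a rough sketch of writing x3dom markup directly with sprintf. The HTML scaffolding is simplified and the x3dom.js/x3dom.css URLs are assumptions; element names follow the X3D conventions used by x3dom.
# Rough sketch: emit one <transform><shape><sphere> per point and wrap it in a page
x <- rnorm(50); y <- rnorm(50); z <- rnorm(50)

spheres <- paste(sprintf(
  "<transform translation='%.3f %.3f %.3f'><shape>
     <appearance><material diffuseColor='0.2 0.4 0.8'></material></appearance>
     <sphere radius='0.05'></sphere>
   </shape></transform>", x, y, z), collapse = "\n")

html <- sprintf(
  "<html><head>
     <script src='https://www.x3dom.org/download/x3dom.js'></script>
     <link rel='stylesheet' href='https://www.x3dom.org/download/x3dom.css'/>
   </head><body>
     <x3d width='700px' height='700px'><scene>%s</scene></x3d>
   </body></html>", spheres)

writeLines(html, "points_x3dom.html")
browseURL(paste0("file://", normalizePath("points_x3dom.html")))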
For ultimate flexibility, I've had great luck using Processing. It was originally written in Java, but has now been stably ported to JavaScript, and more experimentally to Python and even a few others.
http://processingjs.org
http://processing.org
It uses the HTML5 <canvas> element to process your Processing code on-the-fly. You can either link to your visualization code in another file, or write it right in your html file (reminds me of Sweave!).
Also, there is a huge resource of open source examples online. For example:
http://openprocessing.org
Lastly, here is a gist I put together to demonstrate the basic setup. Just download the processing.js file into the same folder as the gist and open up your browser.
https://gist.github.com/1295842
A couple of million years ago (OK, 2005) I wrote R code to dump graphics primitives in Mathematica (!!) graphics format, which could then be embedded and viewed with the LiveGraphics3D Java plug-in. I haven't tried to use it in 6 years, but I could try to resurrect it if there were interest.
PS here are the results of help(package="LG3d"):
get.live.jar        Download live.jar Java archive
LG.display         Display Live3D graphics in a browser
LG.html.head       header and footer files for LiveGraphics HTML files
LGmobius           Draw a 3D mobius strip
LG.open            open and close LiveGraphics3D files
LG.plot.profiles   Plot likelihood surface + profiles using Live3D
LGtorus            Draw a torus in LG graphics system
LGtoruswrap        Utility functions for LGtorus
mma.brace          Low-level graphics primitives for LiveGraphics3D
mma.edge           change edge style
mma.persp          Output a perspective plot to a LiveGraphics3D file
mma.point          Medium-level graphics primitives for LiveGraphics3D
mma.polygon        draw a Mma/LG3d polygon
The rgl package now has the rglwidget function, which is probably the cleanest and easiest method to create widgets of rgl plots.
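A minimal sketch of that route (assuming a reasonably recent rgl; htmlwidgets::saveWidget writes a standalone HTML file you can put on a wiki):
library(rgl)

plot3d(rnorm(100), rnorm(100), rnorm(100), col = "steelblue", type = "s", size = 1)
w <- rglwidget()                          # capture the current rgl scene as a widget
htmlwidgets::saveWidget(w, "scene.html")  # self-contained HTML, rotatable in a browser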
I am trying to learn R and want to bring in an SPSS file, which I can open in SPSS.
I have tried using read.spss from foreign and spss.get from Hmisc. Both error messages are the same.
Here is my code:
## install.packages("Hmisc")
library(foreign)
## change the working directory
getwd()
setwd('C:/Documents and Settings/BTIBERT/Desktop/')
## load in the file
## ?read.spss
asq <- read.spss('ASQ2010.sav', to.data.frame=T)
And the resulting error:
Error in read.spss("ASQ2010.sav", to.data.frame = T) : error reading system-file header
In addition: Warning message:
In read.spss("ASQ2010.sav", to.data.frame = T) : ASQ2010.sav: position 0: character `\000' (
Also, I tried saving out the SPSS file as a SPSS 7 .sav file (was previously using SPSS 18).
Warning messages:
1: In read.spss("ASQ2010_test.sav", to.data.frame = T) : ASQ2010_test.sav: Unrecognized record type 7, subtype 14 encountered in system file
2: In read.spss("ASQ2010_test.sav", to.data.frame = T) : ASQ2010_test.sav: Unrecognized record type 7, subtype 18 encountered in system file
I had a similar issue and solved it following a hint in read.spss help.
Using package memisc instead, you can import a portable SPSS file like this:
data <- as.data.set(spss.portable.file("filename.por"))
Similarly, for .sav files:
data <- as.data.set(spss.system.file('filename.sav'))
although in this case I seem to be missing some string values, while the portable import works seamlessly. The help page for spss.portable.file claims:
The importer mechanism is more flexible and extensible than read.spss and read.dta of package "foreign", as most of the parsing of the file headers is done in R. They are also adapted to load efficiently large data sets. Most importantly, importer objects support the labels, missing.values, and descriptions, provided by this package.
read.spss seems to be a little outdated, so I used the package called memisc instead.
To get this to work do this:
install.packages("memisc")
data <- as.data.set(spss.system.file('yourfile.sav'))
You may also try this:
setwd("C:/Users/rest of your path")
library(haven)
data <- read_sav("data.sav")
and if you want to read all files from one folder:
temp <- list.files(pattern = "*.sav")
read.all <- sapply(temp, read_sav)
I know this post is old, but I also had problems loading a Qualtrics SPSS file into R. R's read.spss code came from PSPP a long time ago, and hasn't been updated in a while. (And Hmisc's code uses read.spss(), too, so no luck there.)
The good news is that PSPP 0.6.1 should read the files fine, as long as you specify a "String Width" of "Short - 255 (SPSS 12.0 and earlier)" on the "Download Data" page in Qualtrics. Read it into PSPP, save a new copy, and you should be in business. Awkward, but free.
You can read an SPSS file from R using the above solutions or the one you are currently using. Just make sure the command is pointed at a file it can read properly. I had the same error, and the problem was that SPSS could not access the file. Make sure the file path is correct, the file is accessible, and it is in the correct format.
library(foreign)
asq <- read.spss('ASQ2010.sav', to.data.frame=TRUE)
As far as the warning message is concerned, it does not affect the data. Record type 7 is used by newer SPSS versions to store features in a way that older SPSS versions can still read the data. I have used this numerous times and no data was lost.
You can also read about this at http://r.789695.n4.nabble.com/read-spss-warning-message-Unrecognized-record-type-7-subtype-18-encountered-in-system-file-td3000775.html#a3007945
It looks like the R read.spss implementation is incomplete or broken. R2.10.1 does better than R2.8.1, however. It appears that R gets upset about custom attributes in a sav file even with 2.10.1 (The latest I have). R also may not understand the character encoding field in the file, and in particular it probably does not work with SPSS Unicode files.
You might try opening the file in SPSS, deleting any custom attributes, and resaving the file.
You can see whether there are custom attributes with the SPSS command
display attributes.
If so, delete them (see VARIABLE ATTRIBUTE and DATAFILE ATTRIBUTE commands), and try again.
HTH,
Jon Peck
If you have access to SPSS, save the file as .csv and then import it with read.csv or read.table. I can't recall any problem with .sav file importing; so far it has worked like a charm with both read.spss and spss.get. I reckon that spss.get will not give different results, since it depends on foreign::read.spss.
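For example, assuming the OP's data has been exported from SPSS as ASQ2010.csv (hypothetical file name):
# read the CSV exported from SPSS (file name is an assumption)
asq <- read.csv("ASQ2010.csv", stringsAsFactors = FALSE)
str(asq)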
Can you provide some info on SPSS/R/Hmisc/foreign version?
Another solution not mentioned here is to read SPSS data in R via ODBC. You need:
IBM SPSS Statistics Data File Driver. Standalone driver is enough.
Import SPSS data using RODBC package in R.
See the example here. However, I have to admit that there could be problems with very big data files.
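A rough sketch of the ODBC route (the DSN and table names are made up; it assumes the SPSS Statistics Data File Driver is installed and an ODBC data source has been configured to point at the .sav file):
library(RODBC)

conn <- odbcConnect("SPSS_SAV")                  # made-up DSN name
asq  <- sqlQuery(conn, "SELECT * FROM ASQ2010")  # table name depends on the driver setup
odbcClose(conn)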
For me it works well using memisc!
install.packages("memisc")
library(memisc)
Daten.Februar <-as.data.set(spss.system.file("NPS_Februar_15_Daten.sav"))
names(Daten.Februar)
I agree with @SDahm that the haven package would be the way to go. I myself have struggled a bit with string values when starting to use it, so I thought I'd share my approach on that here, too.
The "semantics" vignette has some useful information on this topic.
library(tidyverse)
library(haven)
# Some interesting information in here
vignette('semantics')
# Get data from spss file
df <- read_sav(path_to_file)
# get the variable labels before converting, so they can be used as column names
col_labels <- map(.x = df, .f = function(x) {attr(x, 'label')})

# convert value labels to factors
df <- map_df(.x = df, .f = function(x) {
if (inherits(x, c('labelled', 'haven_labelled'))) as_factor(x)
else x})

# apply the variable labels as column names
colnames(df) <- col_labels
There is no such problem with the packages you are using. The only requirement for reading an SPSS file this way is to convert it to portable format: SPSS files normally have the *.sav extension, and you need to save your file as a portable document with the *.por extension.
There is more info in http://www.statmethods.net/input/importingdata.html
In my case this warning was combined with the appearance of a new variable before the first column of my data with values -100, 2, 2, 2, ..., a shift in the correspondence between labels and values, and the deletion of the last variable. A solution that worked was (using SPSS) to create a new dummy variable in the last column of the file, fill it with random values, and execute the following code:
(filename is the path to the .sav file; in my case the original SPSS file had 62 columns, thus 63 with the additional dummy variable)
library(memisc)
data <- as.data.set(spss.system.file(filename))
copyofdata = data
for(i in 2:63){
names(data)[i] <- names(copyofdata)[i-1]
}
data[[1]] <- NULL
newcopyofdata = data
for(i in 2:62){
labels(data[[i]]) <- labels(newcopyofdata[[i-1]])
}
labels(data[[1]]) <- NULL
Hope the above code will help someone else.
Turn your UNICODE in SPSS off
Open SPSS without any data open and run the code below in your syntax editor
SET UNICODE OFF.
Open the data set and resave it to remove the Unicode
read.spss('yourdata.sav', to.data.frame=TRUE) then works correctly.
I just came across an SPSS file that I couldn't open using haven, foreign, or memisc, but readspss::read.por did the trick for me:
download.file("http://www.tcd.ie/Political_Science/elections/IMSgeneral92.zip",
"IMSgeneral92.zip")
unzip("IMSgeneral92.zip", exdir = "IMSgeneral92")
# rio, haven, foreign, memisc pkgs don't work on this file! But readspss does:
if(!require(readspss)) remotes::install_git("https://github.com/JanMarvin/readspss.git")
ims92 <- readspss::read.por("IMSgeneral92/IMS_Nov7 92.por", convert.factors = FALSE)
Nice! Thanks, @JanMarvin!
1)
I've found the program Stat/Transfer useful for importing SPSS and Stata files into R.
It resolves the issue you mention by converting the SPSS file to an R dataset. It is also very useful for subsetting super large datasets into smaller portions consumable by R. It is not free, but a very useful tool for working with datasets from different programs -- especially if you don't have access to those programs.
2)
The memisc package also has an SPSS import function worth trying.