May go beyond a SAS question, but I recently ran into an issue where the batch environment will not deliver(email) the specific file(xls) due to a size restriction. Obviously I had to move to a flat file, but also noticed that a ".data" file is also the same size (was hoping it would be smaller). Are there smaller file formats that exist (that SAS will support)?
Depending on your version of SAS and the modules you have installed, you may be able to export to .xlsx (compressed) instead of .xls, or create a zip file containing the .xls instead. Try googling for libname xlsx for more details.
Related
There is no version control at my work (we have an outdated, centralized system with sensitive patient information, so we can't save things outside of it). When I save a .RData file from a script, I would like to be able to save the exact version of that .R file with it at that time. Is there a way to do this?
E.g. if I have an R script "run_analysis.R" that has the line
save(data,file='foo.RData')
Is there a way I can do something like
save(data,run_analysis.R,file='foo.RData')
so that if I pull up the data file a year later I'll know exactly what code was used to create it?
you could zip the foo.RData file together with the run_analysis.R file and store the zipped file.
the CRAN package [zip] (https://cran.r-project.org/web/packages/zip/zip.pdf) can be used to create the zip file from within r.
I am writing SPSS .sav files from R using the package haven, which works very well for me in general. However I have noticed that the .sav file size written on disk using write_sav() seems to be much bigger than nescessary. Whenever I open and save a .sav file written by write_sav() in SPSS, the file size is reduced by a factor of up to ~10!
This matters to me as I am writing rather big data to SPSS for others and sometimes SPSS refuses to open a very big file. Maybe this would problem would not arise if write_sav() would store more efficiently in a "real" native SPSS way?
Does anyone know this issue and maybe has a helpful comment on it?
SPSS installation is needed to replicate this issue
It's not clear from the Haven write_sav() documentation, but it sounds like it is saving them as uncompressed .sav files. The default for (most) SPSS installations would be to save as compressed files. SPSS has an extra compression option of 'zCompressed' which will produce even smaller files but these generally can't be opened outside of SPSS.
You can experiment with this like so;
Save outfile = 'Uncompressed file.sav'
/UnCompressed.
Save outfile = 'Compressed file.sav'
/Compressed.
Save outfile = 'ZCompressed file.zsav'
/ZCompressed.
Note the .zsav file extension isn't necessary (could be .sav) but it's considered best practice to use this to make it clear where compatibility might be an issue.
See https://www.ibm.com/support/knowledgecenter/en/SSLVMB_21.0.0/com.ibm.spss.statistics.help/syn_save_compressed_uncompressed.htm for more info.
What form does your actual data take? Is is Codepage or Unicode; and what is Haven doing? Since SPSS 16.0 and the introduction of the UNICODE setting, there has been a tripling of string field widths when converting from Codepage to Unicode. This is a pain best suffered only once. Get your data to unicode and then stay there.
See https://www.ibm.com/support/knowledgecenter/SSLVMB_26.0.0/statistics_reference_project_ddita/spss/base/syn_set_unicode.html for more information.
If the output size is a problem, you could have a look at my package readspss. Using compression and zsav you should be able to get the best available compression. Compression in sav files depends on how the file is written. SPSS has different compression methods to store numeric information. Numerics can be stored only as doubles (no compression) or in a mix of doubles and int8_t (compression 1). Zsav used zlib to compress whatever the initial input was (compression 2). Eight integers take the size of a double hence the difference in the file size.
There are three variants of the SPSS (.sav) file format:
Uncompressed (.sav). This is haven's default output, but is rarely used in my experience.
Compressed (.sav). This is what most people use, and it has been the default save format for SPSS for many, many years.
Zcompressed (.zsav, but sometimes .sav). Added a few years ago to SPSS, but doesn't seem used much. You can get this from haven by adding compress=TRUE to write_spss()
I have submitted a pull request to make the compressed (2) format the default.
I want to save my code in R. I did:
save(Data,file="Code_Data.R")
When I open the file in R again, the code looks like hieroglyphics.
How can I save the code in a way, that I can read the code in an editor or RStudio again?
save outputs a binary copy of the objects you tell it to save, not R code. Because you are naming this file with a ".R" extension, RStudio is blindly trying to open this binary file as R code, and you are seeing the results of that mess.
Technically, the R language doesn't care what the extension of the file is. As long as you know that the file contains, you can load it back in with the command load("Code_Data.R"). However, if you want to get RStudio to recognize that this is actually a file containing binary data and not R code, try saving the file with the canonical ".RData" extension:
save(Data, file="Code_Data.RData")
Using the ".RData" extension will also help you and other programmers who look at your code avoid this confusion in the future.
I get a lot of datasets that arrive as .dat files with syntax files for converting to SPSS (.sps). I'm an R user, so I need to convert the .dat file into a .sav that R can read.
In the past, I've used PSPP to do this manually. (I can't afford SPSS!) But I'd MUCH prefer a programmatic solution.
I thought pspp-convert would do the trick, but there's something I'm not understanding about how that works in terms of inputting the syntax file:
My files are:
data.dat
data.sps (which correctly points to data.dat)
I tried
pspp-convert data.sps data.sav
But get
`data.sps' is not a system or portable file.
Makes sense since the input is supposed to be a portable file. Am I trying to do something beyond the scope of this CLI?
Generally speaking, there MUST be some way to apply an SPS file to a DAT file to get a SAV file (or any other portable file) back, right?
From an SPSS Statistics point of view, a .dat file extension most often means the data is in a fixed ASCII text format. You would need the accompanying codebook to tell you what variables to read and in what formats. The SPSS Statistics command syntax file (.sps) does this for you. But this file is simply the list of SPSS Statistics commands used to read the ASCII data. It is not a data file itself.
Elsewhere you've referenced these files as "portable files". An SPSS Statistics portable file (.por) is a very special case of an ASCII file; structured to be read and written by SPSS Statistics. In any case, if your preferred tool takes an SPSS Statistics portable file (.por), these *.dat files likely aren't it.
Assuming these *.dat files are fixed ASCII text files, you'll need to discern how the information therein is stored and then use a likely tool for reading ASCII text.
Plain and simple: is there a way to read (not run) .sas files on osx in order to rewrite old SAS programs in another language, e.g. R? Note I do not refer to reading sas data files – I know there are several ways, I am just interested in reading old SAS code.
.sas files containing SAS code should just be a text file. You can use any text editor that you like to open and modify these files. Since the system probably doesn't have .sas files associated with any particular program you can either use the "Open with" option when "right-clicking" on the file or you could open the editor first and then open the file from within the editor.
TextEdit will work. Another editor that I like is Komodo Edit.