Reading data from an SPSS file and using a stats package in R

I usually use SPSS, but I need to use a specialized statistics package in R on the same dataset. I have found ways to read my SPSS data into R, but I then need to analyze the data with that package.
The code I have to read the data is:
RED.data <- read.spss("RED.sav", use.value.labels = TRUE, to.data.frame = TRUE)
The code for the stats package:
library(twang)
data(DTA)
set.seed(1)
mnps.RED <- mnps(treat ~ illact + crimjust + subprob, data = RED, estimand = "ATT", verbose = FALSE, stop.method = c("es.mean", "ks.mean"), n.trees = 3000)
I know I am missing a step between these, but I can't figure it out so far. I am confused about the different data formats and how they are used. What is the difference between the commands below, and how can I use them?
RED <- read.csv("RED.csv")
attach(RED)
data(RED)
Thanks for your help!
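To make the difference between those three commands concrete, here is a minimal base-R sketch. It writes a tiny made-up stand-in for RED.csv to a temporary file (the columns and values are hypothetical) and then runs each command: read.csv() reads a file from disk into a data frame that you must assign to a name; data() only looks up example datasets shipped inside installed packages, so it cannot find an object you created yourself; and attach() merely puts an existing data frame's columns on the search path.

```r
# Hypothetical stand-in for RED.csv, written to a temp file so the sketch runs anywhere
red_path <- tempfile(fileext = ".csv")
write.csv(data.frame(treat = c("a", "b", "c"), illact = c(1, 2, 3)),
          red_path, row.names = FALSE)

# 1. read.csv() reads the file from disk and returns a data frame; assign it to a name
RED <- read.csv(red_path)

# 2. data() searches datasets bundled with installed packages, not your workspace,
#    so asking for RED only produces a "not found" warning
msg <- tryCatch(data("RED"), warning = function(w) conditionMessage(w))
print(msg)
data("mtcars")   # works: mtcars ships with R's datasets package

# 3. attach() puts RED's columns on the search path, so illact works without RED$
attach(RED)
avg_illact <- mean(illact)
detach(RED)      # undo attach() when done
```

In short, only read.csv() (or read.spss()) actually brings your own data into R; data() is for package examples, and attach() is an optional convenience.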

Why not use the SPSS Statistics R integration APIs for this? You would open the data in Statistics and then run R code like this:
begin program r.
library(twang)
dta <- spssdata.GetDataFromSPSS()
mnps.RED <- mnps(treat ~ illact + crimjust + subprob, data = dta, estimand = "ATT")  # add the other arguments from the question as needed
print(mnps.RED)  # or whatever else you need from the output
end program.
You need to install the appropriate version of R and the free R Essentials for your version of Statistics to make this work. More details can be provided if you indicate what version of Statistics and what platform you are using.

Thanks for the help, JKP and Laterow. I'm using R 3.2.3 and SPSS 23.
Also, I made a mistake in the code in the original question. Here is the code again:
This is what I found for reading SPSS data into R:
require(foreign)
RED.data<-read.spss("RED.sav", use.value.labels=TRUE, to.data.frame=TRUE)
but I am not sure it is necessary to do this if I want to analyze the data with the "twang" package code below:
library(twang)
data(RED)
set.seed(1)
mnps.RED<-mnps(treat~illact+crimjust+subprob, data=RED, estimand="ATT", verbose=FALSE, stop.method=c("es.mean", "ks.mean"), n.trees=3000)
JKP, I'm a little confused about the code you provided. Do I need to enter this into SPSS syntax only?
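The missing step in R alone is to pass the data frame you created to the model's data argument: the name used in data = must be the name you assigned the result of read.spss() to, and data() is not involved at all. The sketch below illustrates this under loud assumptions: because RED.sav and twang may not be available, it simulates a hypothetical stand-in for RED with made-up values and fits glm() as a stand-in for mnps() (a plain logistic regression, not twang's propensity-score method). The real calls from the question are kept as comments.

```r
# Real workflow (commented out because RED.sav and twang are assumed unavailable here):
# library(foreign)
# RED <- read.spss("RED.sav", use.value.labels = TRUE, to.data.frame = TRUE)
# library(twang)
# set.seed(1)
# mnps.RED <- mnps(treat ~ illact + crimjust + subprob, data = RED,
#                  estimand = "ATT", verbose = FALSE,
#                  stop.method = c("es.mean", "ks.mean"), n.trees = 3000)

# Hypothetical stand-in for the data frame read.spss() would return (made-up values)
set.seed(1)
n <- 100
RED <- data.frame(
  treat    = factor(sample(c("control", "treated"), n, replace = TRUE)),
  illact   = rnorm(n),
  crimjust = rnorm(n),
  subprob  = rnorm(n)
)

# The key point: the object name you assigned is what goes into data =
fit <- glm(treat ~ illact + crimjust + subprob, data = RED, family = binomial)
summary(fit)
```

Whatever modelling function you use, the pattern is the same: read the file into a named data frame once, then refer to that name in data =.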

Related

getGEO R Program

Good Afternoon Everyone,
I have been using R, Bioconductor, and GEOquery to analyze some NCBI GEO microarray datasets for a few months now.
I have been using R versions 3.4.1 and 3.4.4. The Bioconductor version available for these R versions on both my personal computer and my laboratory computer is Bioconductor 3.6.
I have been having a problem with the getGEO function. Whenever I load all of the required packages, update all of the sources as R requests, and then try to use getGEO to load a microarray expression set (both GDS and GSE Series Matrix/SOFT family files), R responds that the function "getGEO" cannot be found.
This problem started yesterday evening; it had never happened to me before while using GEOquery, Bioconductor, or R.
My code is below:
source("http://www.bioconductor.org/biocLite.R")
biocLite()
biocLite("GEOquery")
library(GEOQuery)
getPackageIfNeeded <- function(pkg){if(!require(pkg,character.only = TRUE))biocLite(pkgs=pkg)}
sapply(pkgs,getPackageIfNeeded)
library(Biobase)
library(limma)
library(affy)
gset <- getGEO("GSE69033", GSEMatrix =TRUE)
gse <- getGEO("GSE69033", GSEMatrix =TRUE)
Questions:
Was there some kind of update to R, Bioconductor, or GEOquery after which the getGEO function no longer works?
Is there something wrong with my script? Is something missing, or is there something that is not supposed to be there?
Is there another way to load NCBI GEO microarray datasets into R without the getGEO function?
Do any of you recommend any gene expression analysis techniques in MATLAB or Python? If so, could you link to an example script for either of those programs?
Any help or direction regarding my questions would be greatly appreciated. Thank you so much; I am very glad to be a part of this forum.
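One likely culprit, offered as a hedged reading of the script above: R package names are case-sensitive, and the Bioconductor package installed by biocLite("GEOquery") is GEOquery, so library(GEOQuery) fails and getGEO() is never attached. In addition, pkgs is passed to sapply() without ever being defined. The base-R sketch below defines the package vector explicitly with the correct spelling and reports which packages are missing. It deliberately downloads nothing; the install calls are left as comments because they depend on your Bioconductor version (biocLite() for releases such as the 3.6 mentioned here, BiocManager::install() for Bioconductor 3.8 and later).

```r
# Package names are case-sensitive: "GEOquery" (lower-case q), not "GEOQuery"
pkgs <- c("GEOquery", "Biobase", "limma", "affy")

# Which of these are not installed? requireNamespace() returns FALSE instead of erroring
is_installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- pkgs[!is_installed]
print(missing_pkgs)

# Install whatever is missing (needs network access), for example:
#   biocLite(missing_pkgs)               # Bioconductor <= 3.7, after source()-ing biocLite.R
#   BiocManager::install(missing_pkgs)   # Bioconductor >= 3.8

# Once GEOquery is installed, load it with the correct spelling before calling getGEO():
# library(GEOquery)
# gse <- getGEO("GSE69033", GSEMatrix = TRUE)
```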

Downloading CSV_GDX_tools.exe package

I have to work with GAMS and R to extract data, but I am a new R user and have never used GAMS before. I need to download a package called CSV_GDX_tools.exe, and I have no idea what that is.
When I try to install it in R, I get this error message:
Warning in install.packages :
package ‘CSV_GDX_tools.exe’ is not available (for R version 3.3.2)
Can anyone tell me how and where I can download this package?
First, that does not sound like an R package, but rather like an outdated utility program for GAMS.
I say outdated because GAMS now has built-in functionality to convert GDX (GAMS data) files to CSV files, which can be read by any statistics program, including R.
GAMS also gives you the option of exporting your data to an SQLite database file (.db), which can be read by R.
Have a look here:
https://www.gams.com/help/index.jsp?topic=%2Fgams.doc%2Fuserguides%2Fmccarl%2Fgdx_utilities.htm
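Once GAMS has written a symbol from a GDX file to CSV (for example with the GDX utilities linked above), reading it in R needs nothing beyond base R. The sketch below assumes a hypothetical export with made-up columns i, j, and Val and made-up values; it writes that stand-in to a temporary file so it runs without GAMS. The real column layout depends on how you export, so check the header of your own file.

```r
# Hypothetical CSV of the kind a GDX-to-CSV export might produce (made-up columns and values)
csv_path <- tempfile(fileext = ".csv")
writeLines(c('"i","j","Val"',
             '"p1","m1",50',
             '"p1","m2",300',
             '"p2","m1",275'),
           csv_path)

# Read it with base R; keep set labels as character rather than factors
flows <- read.csv(csv_path, stringsAsFactors = FALSE)

# From here it is an ordinary data frame, e.g. total Val per element of i
totals <- aggregate(Val ~ i, data = flows, FUN = sum)
print(totals)
```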

Read Stata 13 file in R

Is there a way to read a Stata version 13 dataset file in R?
I have tried the following:
library(foreign)
data <- read.dta("TEAdataSTATA.dta")
However, I got an error:
Error in read.dta("TEAdataSTATA.dta") :
not a Stata version 5-12 .dta file
Could someone point out if there is a way to fix this?
There is a new package to import Stata 13 files into a data.frame in R.
Install the package and read a Stata 13 dataset with read.dta13():
install.packages("readstata13")
library(readstata13)
dat <- read.dta13("TEAdataSTATA.dta")
Update: as of version 0.8, readstata13 also imports files from Stata 6 to 14.
More about the package: https://github.com/sjewo/readstata13
There's a new package called haven, by Hadley Wickham, which can load Stata 13 .dta files (as well as SAS and SPSS files):
library(haven) # haven package now available on CRAN
df <- read_dta('c:/somefile.dta')
See: https://github.com/hadley/haven
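If a script has to run on machines where you don't know which of these packages is installed, a small wrapper can try them in turn. This is a sketch, and read_stata_any() is a hypothetical helper name, not part of any package: it uses haven::read_dta() or readstata13::read.dta13() when available and otherwise falls back to foreign::read.dta(), which only handles files saved in Stata 12 format or older. To stay self-contained, it creates a small old-format test file with foreign::write.dta().

```r
library(foreign)

# Hypothetical helper: try the newer readers first, fall back to foreign
read_stata_any <- function(path) {
  if (requireNamespace("haven", quietly = TRUE)) {
    # haven returns a tibble; convert it to a plain data frame
    return(as.data.frame(haven::read_dta(path)))
  }
  if (requireNamespace("readstata13", quietly = TRUE)) {
    return(readstata13::read.dta13(path))
  }
  # foreign::read.dta() rejects Stata 13+ files ("not a Stata version 5-12 .dta file")
  read.dta(path)
}

# Self-contained test file in an old .dta format (write.dta() writes Stata 10 or older)
dta_path <- tempfile(fileext = ".dta")
write.dta(data.frame(id = 1:3, score = c(2.5, 3.0, 4.5)), dta_path)

dat <- read_stata_any(dta_path)
str(dat)
```

Point the helper at your own file (for example "TEAdataSTATA.dta") to use it; with a Stata 13 file it will only succeed if haven or readstata13 is installed.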
If you have Stata 13, you can load the file there and save it in Stata 12 format using the saveold command (see help saveold). Afterwards, read it into R.
If you have Stata 10-12, you can use the user-written command use13 (by Sergiy Radyakin) to load the file and save it in your version's format, then read it into R. You can install use13 by running ssc install use13.
Details can be found at http://radyakin.org/transfer/use13/use13.htm
Other alternatives, still with Stata, involve exporting the Stata format to something else that R will read, e.g. text-based files. See help export within Stata.
Update
Starting with Stata 14, saveold has a version() option, allowing you to save .dta files in formats as old as Stata 11.
Meanwhile, the savespss command has become part of the SSC archive and can be installed in Stata with: findit savespss
The homepage http://www.radyakin.org/transfer/savespss/savespss.htm continues to work, but the program should be installed from the SSC now, not from the beta location.
I am not familiar with the current state of R packages regarding their ability to read other file formats, but if someone doesn't have Stata installed and R cannot read a specific version of Stata's dta files, pandas in Python can now handle the vast majority of such conversions.
Basically, the data from the dta file are first loaded using the pandas.read_stata function. As of version 0.23.0, the supported encodings and formats can be found in a related answer of mine.
Then one can either save the data as a CSV file and import it using standard R functions, or use the pandas.DataFrame.to_feather function, which exports the data in the Feather serialization format built on Apache Arrow. The latter has extensive support in R, as it was conceived to promote interoperability with pandas.
I had the same problem. I tried read.dta13 and read.dta, but nothing worked. Then I tried the easiest and least expected option: MS Excel! It opened the file without trouble. I saved it as a .csv and used it in R. Hope this helps!

Specify my dataset as working dataset

I am a newbie to R.
I can successfully load my dataset into RStudio, and I can see my dataset in the workspace.
When I run the command summary(mydataset), I get the expected summary of all my variables.
However, when I run
data(mydataset)
I get the following warning message:
In data(mydataset) : data set ‘mydataset’ not found
I need to run the data() command as shown in the example for the fitLogRegModel() function, which is part of the PredictABEL package.
Does anybody have a hint on how I can specify mydataset as the working dataset?
You don't need to use the data command. You can just pass your data to the function
riskmodel <- fitLogRegModel(data=mydataset, cOutcome=2,
cNonGenPreds=3:10, cNonGenPredsCat=6:8,
cGenPreds=c(11, 13:16), cGenPredsCat=0)
The example uses data(ExampleData) so that it can make data that is in the package available to you. Since you already have your data, you don't need to load it.
An alternative, although it has its drawbacks, is to use attach(mydataset). You can then refer to variables without a mydataset$ prefix. The main drawback, as far as I know (although I'd welcome the views of more expert R users), is that if you modify a variable after attaching, the modified version is a separate copy and is not part of the dataset. This can cause confusion and lead to "gotchas". In any case, many expert R users counsel against the use of attach.
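The attach() pitfall described above is easy to reproduce. In this base-R sketch (mydataset and its columns are made-up stand-ins), assigning to an attached column name at the top level creates a separate copy in the global environment; the data frame itself is left unchanged, so later code that uses mydataset$age silently sees the old values.

```r
# Made-up stand-in for mydataset
mydataset <- data.frame(age = c(30, 40, 50), score = c(1, 2, 3))

attach(mydataset)
age <- age + 1          # creates a NEW global variable 'age'; the data frame is untouched
detach(mydataset)

print(age)              # 31 41 51 (the global copy)
print(mydataset$age)    # 30 40 50 (original column, unchanged)

# Safer alternatives that modify the data frame itself
mydataset$age <- mydataset$age + 1
mydataset <- within(mydataset, score <- score * 10)
```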

Using the "foreign" package in R

I need to import a STATA data set into R and I have downloaded the "foreign" package. Could you please tell me the steps to "load" the package into R and the steps to import the STATA dataset?
R helplist style answer: RTFM!
Statalist style answer: save your Stata file as usual. In R, type
help(package="foreign")
to find out what the commands are. The ones pertaining to Stata would have .dta in them, as .dta is Stata data file extension. read.dta(file="path/name.dta") should work on most occasions. If it does not, try saving your file from Stata as an old version (saveold filename.dta, replace).
BTW, it is Stata, not STATA. It's not an acronym, unlike SAS or SPSS... so you don't have to YELL.
P.S. As DWin correctly pointed out, you need to load the package:
library(foreign)
I assumed that since you seem to know R, remembering that won't be an issue.
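The help(package = "foreign") lookup can also be done from code. This short sketch attaches foreign and lists its exported functions whose names contain "dta", which, per the naming convention described above, are the Stata-related ones.

```r
library(foreign)   # attach the package (ships with standard R installations)

# Exported functions of foreign whose names mention dta, i.e. the Stata readers/writers
dta_funs <- grep("dta", ls("package:foreign"), value = TRUE)
print(dta_funs)
```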
It rather depends what you mean by "downloaded". You should not need to download anything, since 'foreign' is included in the standard R installation along with 'base', 'stats', 'utils', 'Matrix', and a few others like 'grDevices'. Whether or not you have already installed the 'foreign' package (unnecessarily) using one of the GUI commands, all you should need to do is:
library(foreign)
?read.dta # and run the example
I just had to deal with the same issue, so here is the code:
library(foreign)
setwd("path/to/your/working/directory")  # placeholder: replace with your own folder
Please note that you have to set the working directory so that R knows where to look for your Stata .dta dataset.
And finally, the code to read it:
mydata <- read.dta("yourdataset.dta")  # placeholder: replace with your file name
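The working-directory step can be illustrated without a real Stata file. This sketch (the folder and the file name mydata.dta are hypothetical) writes a small .dta file with foreign::write.dta() into a temporary directory, then reads it back two equivalent ways: by setting the working directory and using a bare file name, or by passing the full path and leaving the working directory alone. The second form is often less error-prone in scripts.

```r
library(foreign)

# Hypothetical folder and file standing in for your own Stata dataset
data_dir <- tempdir()
write.dta(data.frame(x = 1:3, y = c("a", "b", "c")), file.path(data_dir, "mydata.dta"))

# Option 1: set the working directory, then use the bare file name
old_wd <- setwd(data_dir)   # setwd() returns the previous directory
d1 <- read.dta("mydata.dta")
setwd(old_wd)               # restore the original working directory

# Option 2: skip setwd() and give read.dta() the full path
d2 <- read.dta(file.path(data_dir, "mydata.dta"))

identical(d1, d2)
```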
A video for that topic:
https://www.youtube.com/watch?v=tCkCz4cu918
