R read.spss error importing SPSS .por file - "Bad character in time"

I'm trying to import the NYPD stop-and-frisk data into R. The data is in SPSS .por files at http://www.nyc.gov/html/nypd/downloads/zip/analysis_and_planning/YYYY.zip
where YYYY is a year from 2003 to 2012.
Most of the files load fine, but the 2004, 2007, and 2008 files all give me this error:
> library(foreign)
> mydata= read.spss("2004.por", to.data.frame=TRUE)
Error in read.spss("2004.por", to.data.frame = TRUE) :
error reading portable-file dictionary
In addition: Warning message:
In read.spss("2004.por", to.data.frame = TRUE) : Bad character in time
Execution halted
Any suggestions on how to debug this? I realize that read.spss does not support the latest SPSS versions, but given that most of the files (7 out of 10) import properly I wonder whether it's something more subtle.
psppire loads all the files without complaint, but the data looks corrupted, with some fields seemingly combined with others, and binary data in some of the fields.

I had some success using memisc, as recommended in the question "Read SPSS file into R". First install memisc:
> install.packages('memisc')
You can read the data rather easily:
> library(memisc)
> data <- as.data.set(spss.portable.file('2004.por'))
While I haven't thoroughly inspected the data, it appears on first glance to be right.
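
Since only some of the files fail with foreign, one option is to try read.spss() first and fall back to memisc only when it raises an error. The following is a minimal sketch of that idea, not a tested recipe for this dataset. The names read_with_fallback and readers are hypothetical, introduced here for illustration, and the commented usage line assumes the .por files were unzipped into the working directory.

```r
# Hypothetical helper: try each reader in turn and return the first success.
read_with_fallback <- function(path, readers) {
  for (name in names(readers)) {
    # An error from one reader is swallowed so the next one can be tried.
    result <- tryCatch(readers[[name]](path), error = function(e) NULL)
    if (!is.null(result)) {
      message(path, ": read with ", name)
      return(result)
    }
  }
  stop("no reader could import ", path)
}

# The two approaches from this thread, wrapped as functions of the file path.
readers <- list(
  foreign = function(p) foreign::read.spss(p, to.data.frame = TRUE),
  memisc  = function(p) memisc::as.data.set(memisc::spss.portable.file(p))
)

# Usage (assumes 2003.por ... 2012.por exist in the working directory):
# stops <- lapply(sprintf("%d.por", 2003:2012), read_with_fallback, readers = readers)
```

Note that the two readers return different classes (a data.frame from foreign, a data.set from memisc), so the results may need converting before they are combined.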

Related

unable to open .dat files on R even with haven installed

I use SGA Tools to process my images, and it returns the results in .dat files. To work with this data in R, I tried to import a .dat file with the haven package. I installed haven and loaded it, but the import still fails with this error message:
Error: Failed to parse C:/Users/QuRana/Desktop/SGA Tools/Plate_Image_Example (1).dat: This version of the file format is not supported.
Running install.packages("haven") installs the package, but when I then load it with library(haven), nothing appears on my console except this:
> library(haven)
Then when I use this code:
datatrial1 <- read_dta("C:/Users/QuRana/Desktop/SGA Tools/Plate_Image_Example (1).dat")
It gives me the error mentioned above. When I convert my .dat file to a .csv file and load that instead, the imported data has an extra "\t" before the value in every column except the first, like this:
Flags: S - Colony spill or edge interference C - Low colony circularity
# row\tcol\tsize\tcircularity\tflags
1\t1\t4355\t0.9053\t
1\t2\t4456\t0.8401\t
1\t3\t3439\t0.8219\t
1\t4\t3215\t0.8707\t
I do not want the t's before the numeric values. Another issue is that I cannot install the gitter package on my R version, which is R 4.2.2.
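read_dta() reads Stata .dta files, so it cannot parse this file, which is plain tab-separated text. You can read your tab-separated file like so: `read.delim("file_path", header = TRUE, sep = "\t")`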
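
Because the excerpt in the question shows a preamble line before the column names, a plain header = TRUE call may pick up the wrong line as the header. Below is a hedged sketch that locates the header line first and reads only the rows after it. It assumes the column-name line starts with "# row" followed by a tab, as in the excerpt; the sample file is written to a temporary path purely for illustration.

```r
# Sample content modelled on the excerpt above; "\t" is a real tab here.
sample_lines <- c(
  "Flags: S - Colony spill or edge interference C - Low colony circularity",
  "# row\tcol\tsize\tcircularity\tflags",
  "1\t1\t4355\t0.9053\t",
  "1\t2\t4456\t0.8401\t"
)
path <- tempfile(fileext = ".dat")
writeLines(sample_lines, path)

# Find the line that holds the column names and strip its "# " prefix.
lines <- readLines(path)
header_at <- grep("^# row\t", lines)[1]
col_names <- strsplit(sub("^# ", "", lines[header_at]), "\t")[[1]]

# Skip everything up to and including the header line, then name the columns.
plate <- read.delim(path, header = FALSE, skip = header_at, col.names = col_names)
str(plate)
```

For a real file, replace path with the location of the .dat file and drop the lines that create the sample.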

Why is my .sav data file not being picked up by the read.spss function in R?

I'm trying to load a .sav dataset into R. But I am getting the following error:
Error in read.spss(file.choose(), to.data.frame = TRUE) :
file 'C:\Users\H-Zah\OneDrive\Documents\UniQ\R work\Hamzah.sav' is not in any supported SPSS format
Are there any reasons why this might be happening?

How to import SPSS sav file with variable values, labels, role and measurement level (measure)?

I am working on a personal scientific project in which I use an SPSS file outside its intended purpose, to calculate some tables. I have already done this successfully in C#, using the SPSS .NET library. However, I would like it to be easily and freely accessible through an R script.
WHAT I NEED TO DO
I am new to R. I would like to import a huge SPSS file with its actual data, variable values, labels, role and measurement level (measure).
CURRENT SETTING
I am using Microsoft R in MS Visual Studio (R Tools for Visual Studio). So far I have installed the expss and lattice packages.
WHAT I DID
Searching the internet, I found 2 ways of doing it:
dataframe
dosya<-file.choose()
data <- read.spss(dosya, to.data.frame = TRUE)
and
dataset
dosya <- file.choose()
df <- as.data.set(dosya)
data.label.table <- attr(dosya, "label.table")
missings <- attr(dosya, "missings")
(found in What is the best way to import spss file in R with value labels?)
Neither approach did what I wanted. I was hopeful about the second, but I got this error message:
Error in (function (classes, fdef, mtable) :
unable to find an inherited method for function 'as.data.set' for signature '"character"'
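You have installed the expss package but are calling the read.spss function from the foreign package. That's why it is giving you an error. (The as.data.set error has a separate cause: the function was passed the file path, a character string, for which it has no method.) If you are using the expss package, then use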
library(expss)
dosya<-file.choose()
data <- read_spss(dosya, reencode = TRUE)
Otherwise, you can install and use foreign to do this, like so:
install.packages("foreign")
library(foreign)
dosya<-file.choose()
data <- read.spss(dosya, to.data.frame = TRUE)
This is the easiest way I found after extensive search:
library(foreign)
data <- read.spss('C:/Users/mmr1/datafile/data 2017 project/hand tumors 2020.sav', use.value.labels = TRUE, to.data.frame = TRUE)
I usually like to use the sjPlot package to view all the names, value labels, etc. in the viewer in RStudio.
If you import the data according to the answer by @Bappa Das, you could then run the following to bring up a nice HTML table with that information.
library(sjPlot)
view_df(data)
Here is a good article on dealing with SPSS labels in R, if you haven't seen it: https://martinctc.github.io/blog/working-with-spss-labels-in-r/
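
When a file is read with foreign's read.spss(..., to.data.frame = TRUE), the variable labels are stored in the "variable.labels" attribute of the returned data frame, and with use.value.labels = TRUE the value labels become factor levels. The sketch below turns that attribute into a lookup table. The helper name variable_label_table is hypothetical, and the demo data frame with its labels is made up to stand in for a real import. As far as I know, foreign does not import role or measurement level, so this sketch does not cover them.

```r
# Hypothetical helper: pair each column with its SPSS variable label, if any.
variable_label_table <- function(df) {
  labels <- attr(df, "variable.labels")
  if (is.null(labels)) labels <- character(0)  # no labels attached
  data.frame(
    variable = names(df),
    label = unname(labels[names(df)]),  # NA where a column has no label
    stringsAsFactors = FALSE
  )
}

# Stand-in for the result of read.spss(); these labels are invented.
demo <- data.frame(age = c(34, 51), sex = factor(c("f", "m")))
attr(demo, "variable.labels") <- c(age = "Age in years", sex = "Sex of respondent")

variable_label_table(demo)
levels(demo$sex)  # value labels appear as factor levels
```

With a real import, call variable_label_table(data) on the data frame returned by read.spss().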

How does lazydata loading work in R package installation?

I want to expose data that is already published in my data/ directory of my R package skeleton. See this link for "External data" sharing basics: http://r-pkgs.had.co.nz/data.html.
My data is stored in .txt format. With lazy loading, the data becomes available after loading the package with require(myRpackage) and calling data(datasetName). Without lazy loading, you can read the file directly with base R functions such as read.table() or read.csv2().
My dataset is called "publishedData.txt" in this example, and can be loaded as below, which works beautifully:
tmp = read.table("/dir/to/R/package/data/publishedData.txt", sep="\t", header=TRUE)
However, when I go to re-install my R package with this new shiny & wonderful data, I get the following fail message, over and over (see pasted below).
Downloading GitHub repo myGitRepo/myRpackage#master
from URL https://api.github.com/repos/myGitRepo/myRpackage/zipball/master
Installing myRpackage
library='/Library/Frameworks/R.framework/Versions/3.5/Resources/library' --
install-tests
* installing *source* package ‘myRpackage’ ...
** R
** data
*** moving datasets to lazyload DB
Error in scan(file = file, what = what, sep = sep, quote = quote, dec =
dec, :
line 1 did not have 215 elements
ERROR: lazydata failed for package ‘myRpackage’
* removing
‘/Library/Frameworks/R.framework/Versions/3.5/Resources/library/myRpackage’
Installation failed: Command failed (1)
Note, the above Github repo isn't real. I'm writing a generic post, so don't try to install this fake R package yourself.
My question: How do I debug lazydata load, when I don't know how lazydata load is performed? i.e., what code decides if the data in my publishedData.txt in my data/ folder is "A-OK", versus "Not okay"? I know they are using scan(), yet it should know that sep="\t" in a .txt file, and other than that, I'm not sure what's tripping it up?
Things I've tried:
I've scrubbed my header names as best as I can (e.g., removing non-alphabetical characters from column or rownames strings).
I've also removed any other column besides the rownames column that has string data in it instead of numerical data, just in case stringsAsFactors is set to default of TRUE in lazydata loading (which would slow down things by a lot).
Also, I've restarted R after each re-install attempt...
Okay, so I figured out a way to get this to work, without having to actually understand what was tripping it up.
Say your dataset loads using read.table(), but the package fails to reinstall at the lazydata step as described above. Chances are, your headers or row names are off. A quick solution is to do this:
# Load your data into R the way it works
tmp = read.table("/dir/to/R/package/data/publishedData.txt", sep="\t", header=TRUE)
# Write data to same file with these arguments
write.table(tmp, file="/dir/to/R/package/data/publishedData.txt", sep="\t", row.names = TRUE, col.names = TRUE)
Then update your GitHub repo with git and try to reinstall the R package. It will work this time around! The difference in the .txt file is the header line: it has no label for the row-names column and starts directly with the name of the first data column. From row 2 onward, each line starts with its row name, followed by the data. So the header has one element fewer than each data row, and a parser that expects the same number of fields on every line would reject it.
Hope it helps someone else. :-)
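
To see what the rewrite changes, here is a small self-contained sketch using a made-up two-row data frame and a temporary file. It rests on one assumption that should be checked against the ?data help page: that data files ending in .txt are read with read.table(..., header = TRUE, as.is = FALSE), that is, with the default whitespace separator rather than sep = "\t". If that is right, it would also be one possible explanation for the original failure, since a header containing spaces would split into a different number of fields.

```r
# Made-up data standing in for publishedData.txt
tmp <- data.frame(a = 1:2, b = c(0.5, 1.5), row.names = c("r1", "r2"))
path <- tempfile(fileext = ".txt")
write.table(tmp, file = path, sep = "\t", row.names = TRUE, col.names = TRUE)

# The header line has 2 fields; each data line has 3 (row name plus 2 values).
readLines(path)

# Assumed lazydata-style call: no sep argument, so fields split on whitespace.
# Because the header is one field short, the first column becomes the row names.
back <- read.table(path, header = TRUE, as.is = FALSE)
rownames(back)
names(back)
```

Strings are quoted by write.table() by default, which is what lets values containing spaces survive a whitespace-separated read.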

how to import .rec files in R

I have a .rec file that I want to import into R. I have saved the .rec file to my working directory. This is what I have tried.
library(foreign)
library(RODBC)
data.test <- read.epiinfo("data_in.rec")
I get this error:
Error in if (headerlength <= 0L)
stop("file has zero or fewer variables: probably not an EpiInfo file") :
missing value where TRUE/FALSE needed
In addition: Warning messages:
1:
In readLines(file, 1L, ok = TRUE) :
line 1 appears to contain an embedded nul
2:
In strsplit(line, " ") : input string 1 is invalid in this locale
I have looked online and at the read.epiinfo help page in R. The help page says
Some later versions of Epi Info use the Microsoft Access file format
to store data. That may be readable with the RODBC package.
I have two questions.
1. Is the error I am getting because the .rec file I have is from an Epi Info version later than 6?
2. How do I use the RODBC library to open the .rec file?
The .rec (or .REC) file turned out to be an .EDF (European Data Format) file. It was easily opened in R using the edfReader package. The edfReader help file is very useful for opening the file and extracting the time series data. The code I used is below, adapted from the help file.
install.packages('edfReader')
library(edfReader)
?edfReader
# Read the file header first; it describes the signals stored in the file
CHdr <- readEdfHeader("data_in.rec")
# Read the signals described by that header
CSignals <- readEdfSignals(CHdr)
summary(CSignals)
