SPSS date format when imported into R - r

I have not worked with SPSS (.sav) files before and am trying to work with some data files provided to me by importing them into R. I did not receive any explanation of the files, and because communication is difficult I am trying to figure out as much as I can on my own.
Here's my first question. This is what the Date field looks like in an R data frame after import:
> dataset2$Date[1:4]
[1] 13608172800 13608259200 13608345600 13608345600
I don't know what dates the data is supposed to be for, but I found that if I divide the above numbers by 10, that seems to give a reasonable date (in February 2013). Can anyone confirm this is indeed what the above represents?
My second question is regarding another column called Begin_time. Here's what that looks like:
> dataset2$Begin_time[1:4]
[1] 29520 61800 21480 55080
Any idea what this is representing? I want to believe this is some representation of time of day because the records are for wildlife observations, but I haven't got more info than that to try to guess. I noticed that if I take the difference between End_Time and Begin_time I get numbers like 120 and 180, which seems like minutes to me (3 hours seems reasonable to observe a wild animal), but the absolute numbers are far greater than the number of minutes in a day (1440), so that leaves me puzzled. Is this some time keeping format from SPSS? If so, what's the logic?
Unfortunately, I don't have access to SPSS, so any help would be much appreciated.

I had the same problem and this function is a good solution:
pss2date <- function(x) as.Date(x/86400, origin = "1582-10-14")
This is where I found the answer:
http://scs.math.yorku.ca/index.php/R:_Importing_dates_from_SPSS

Dates in SPSS Statistics are represented as floating point doubles holding the number of seconds since Oct 1, 1582. If you use the SPSS R plugin apis, they can be automatically converted to R dates, but any proper converter should be able to do this for you.

Related

read RTF files into R containing econ data (a lot of numbers)

Recently I obtained a lot of RTF files that contain econ data for analysis I need to do. Unfortunately, this is how Bureau of Statistics of my country could help with time series data for a long time lapse. If there is a one time need to select particular indicator for 10 years or so I'm OK to find these values by hand using Word/Notepad/TestEdit(for Mac). But my problem is that I have 15 files with data that I need to combine somehow in one dataset for my work. But, before even start doing this I don't have a clue if it is possible to read those files in appropriate format (data.frame). Wanted to ask expert opinion on how to approach this task. An example of one of files could be downloaded from here:
[https://www.dropbox.com/s/863ikx6poid8unc/Export_for_SO.rtf?dl=0][1]
All values are in Russian. Dataset represents export of particular product (first column) across countries (second column) in US dollars for 2 periods.
Thank you.
Use code found on https://datascienceplus.com/how-to-import-multiple-csv-files-simultaneously-in-r-and-create-a-data-frame/
replace read_csv with read_rtf
You may want to manually convert your files to another format using an office suite or a text editor. You should be able to save as in another format.
While in R, you may want to give striprtf a try. I'm guessing you will still have to clean your data a bit afterward.
You can install the package like this:
install.packages("striprtf")

Extracting from data frame at specific time intervals ....index or posixct?

I currently have a dataframe which includes a running timeline of POSIXct (see below).
Basically given a starting time of my choosing I want to be able to take rows at specific time interval. E.g. Say I start taking rows at four pm I then want to take 9 minutes, then not take anything for the next two, then nine again. I'm guessing the best approach is possibly using indexing but I also thought something like the lubridae package could be used but not sure how to exactly do it.
Thanks!

Converting data between classes in R: large CSV file, prices recorded like "$10.00"

So full disclosure, I am new to R and programming in general. Because of that, it is very hard for me to search when I have problems because I am not even sure what keywords to use. I am learning, and all I am hoping for y'all to do is point me in the right direction.
I have a very large csv file that I imported into R. Around 2 million observations (don't worry, I am not planning on using all 2 million). The only problem is that the people recording the data formatted the file to record to prices as "$10.00". Because of this, R recognizes the data has a factor, and also treats each individual price as a separate variable because of the dollar sign. I would like to reformat this column as a numeric variable.
I am sure there is some way to go about reformatting this in R, the only problem is I am not sure which functions I need. Sorry for the very basic question, I have just hit a wall a figured I would reach out.
Any and all help is much appreciated!
Thank you!
We could also use sub
as.numeric(sub('\\D+', '', x))
#[1] 10.00 11.24 15.22
data
x<-c("$10.00","$11.24","$15.22")
Suppose that your data looks like this:
x<-c("$10.00","$11.24","$15.22")
You can use the substring function to trim the initial dollar sign (which will still leave you with strings) and then use as.numeric to turn it to a numeric vector.
newx<-as.numeric(substring(x,2))
will produce a vector named newx with value
c(10.00,11.24,15.22)
We tell the substring to start at the 2nd character (strings in R are 1-indexed), and then cast to numeric.
In your data frame (suppose it is called df), you can replace the column like
df$MoneyColumn <- as.numeric(substring(df$MoneyColumn,2))

R combining vectors into a data frame but without a continuous common variable

I have two different files which I'd like to combine to make one data frame but I've no idea how to do it! The first file has two columns; one is the date, the following is a binary code for debris flow events. Then my other file has also two columns; the date and then precipitation data.
The thing is, is that the two date columns don't all contain the same dates. The binary one is every day from April - October from 1900 - 2005, yet the precipitation file has dates from from say, 1911 to 2004, with some missing data on certain months and during certain years.
So my question is how to make a data frame which would have the date, the binary 0 or 1, and then the corresponding precip value from that particular date. I only want the precip information for the days where I have information in the binary file; the others can be disregarded.
I have tried using codes which I found in answer to other questions asked but none of them work with my issue. If I'm honest I don't really know if this is what I need. I'm hoping to end up doing a logistic regression.
I'd be really grateful if anyone could help me and suggest a way to do it!
I'm also really not very technical and am not at all comfortable with R so if you could be really basic in your advice I'd appreciate it!
If you want to combine data which have identical dates, take a look at %in% and intersect . Something like
# not tested. Beware.
new_dates<- intersect(data_1[,1],data_2[,1])
new_data <- cbind(new_dates,data_1[data_1[,1] %in% new_dates,2] , data_2{data_2 %in% new_dates,2])
(probably much cleaner ways with the ever-looming plyr package :-) )

Specific date format conversion problems in R

Basically I want to know why as.Date(200322,format="%Y%W") gives me NA. While we are at it, I would appreciate any advice on a data structure for repeated cross-section (aka pseudo-panel) in R.
I did get aggregate() to (sort of) work, but it is not flexible enough - it misses data on columns when I omit the missed values, for example.
Specifically, I have a survey that is repeated weekly for a couple of years with a bunch of similar questions answers to which I would like to combine, average, condition and plot in both dimensions. Getting the date conversion right should presumably help me towards my goal with zoo package or something similar.
Any input is appreciated.
Update: thanks for string suggestion, but as you can see in your own example, %W part doesn't work - it only identifies the year while setting the current day while I need to set a specific week (and leave the day blank).
Use a string as first argument in as.Date() and select a specific weekday (format %w, value 0-6). There are seven possible dates in each week, therefore strptime needs more information to select a unique date. Otherwise the current day and month are returned.
> as.Date(paste("200947", "0", sep="-"), format="%Y%W-%w")
[1] "2009-11-22"

Resources