How can I convert a time variable from SAS data in R? [duplicate]

This may be an odd question, but I imported a SAS (sas7) dataset into R.
One of my variables is visit_date.
It now looks like the output below. I am wondering how I can transform these values back to MM-DD-YYYY, since I need to exclude observations earlier than MDY(08-01-2010).
> chris$visit_date
[1] 17077 17091 17105 17119 17133 17069 17083 17097 17111 17125 17080 17094 17108
[14] 17122 17136 17098 17112 17210 17224 17238 17252 17266 17247 17261 17254 17268
[27] 17282 17296 17324 17237 17251 17265 17279 17293 17329 17343 17357 17385 17413
[40] 17259 17273 17287 17301 17315 17328 17342 17356 17370 17384 17335 17349 17377
[53] 17391 17405 17331 17345 17359 17373 17387 17435 17449 17463 17477 17505 17336
[66] 17364 17378 17392 17406 17352 17366 17380 17394 17408 17427 17441 17469 17483
[79] 17497 17440 17454 17468 17482 17496 17434 17448 17462 17476 17490 17419 17433
[92] 17447 17461 17475 17518 17560 17574 17588 17616 17653 17667 17681 17695 17709
[105] 17644 17658 17686 17700 17728 17755 17769 17783 17811 17825 17825 17610 17624
[118] 17638 17652 17666 18072 18114 18127 18155 18169 17651 17665 17680 17693 17707
[131] 17657 17671 17685 17699 17659 17673 17687 17701 17715 17646 17660 17674 17688
[144] 17702 17721 17735 17749 17763 17770 17734 17748 17762 17790 17861 17736 17750
[157] 17764 17778 17792 17751 17765 17779 17793 17807 17742 17756 17770 17784 17798
[170] 17772 17757 17771 17785 17799 17813 17777 17791 17819 17833 17854 17923 17937
[183] 17965 17979 17993 17825 17839 17853 17867 17909 17832 17846 17860 17874 17888
[196] 17919 17933 17961 17975 17989 17960 17974 17988 18002 18016 18183 18211 18225
[209] 18239 18253 17931 17945 17959 17973 17987 17940 17954 17968 17982 17996 17966
[222] 17980 17994 18022 18036 18021 18035 18049 18063 18091 18050 18064 18078 18092
[235] 18106 18045 18059 18073 18087 18115 18024 18038 18052 18066 18080 18056 18070
[248] 18084 18098 18112 18107 18121 18135 18149 18163 18105 18119 18133 18161 18175
[261] 18143 18171 18185 18199 18213 18203 18246 18274 18288 18302 18316 18248 18276
[274] 18290 18304 18318 18310 18324 18338 18352 18366 18315 18343 18357 18364 18378
[287] 18350 18364 18378 18406 18420 18337 18351 18365 18379 18393 18374 18388 18402
[300] 18430 18472 18344 18358 18386 18400 18414 18353 18381 18395 18409 18423 18387
[313] 18415 18429 18443 18450 18408 18422 18436 18443 18464 18430 18437 18457 18464
[326] 18471 18427 18434 18441 18455 18462 18428 18442 18456 18463 18470
Thanks

Those "dates" clearly use a different origin (offset) from the POSIX epoch, 1970-01-01, used in the conversion below (here ddd holds the visit_date values). Note that R prints dates in YYYY-MM-DD format:
as.Date(ddd, origin="1970-01-01")
> head( as.Date(ddd, origin="1970-01-01") )
[1] "2016-10-03" "2016-10-17" "2016-10-31" "2016-11-14" "2016-11-28" "2016-09-25"
So you need to establish the correct origin. If it is 1960-01-01, then none of those dates falls on or after 2010-08-01:
> sum( as.Date(ddd, origin="1960-01-01") >= as.Date("2010-08-01") )
[1] 0
> sum( as.Date(ddd, origin="1960-01-01") < as.Date("2010-08-01") )
[1] 336
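A minimal sketch of the whole workflow, assuming the values are SAS date values (SAS counts dates as days from 1960-01-01). Only four values copied from the question are used as a hypothetical subset. format() produces the MM-DD-YYYY strings for display, while the filtering is done on the Date values themselves, since comparing the formatted strings would not sort chronologically.

```r
# A few visit_date values copied from the question (hypothetical subset)
visit_date <- c(17077, 17091, 18472, 18470)

# Assumption: these are SAS date values, i.e. days counted from 1960-01-01
visit <- as.Date(visit_date, origin = "1960-01-01")

# MM-DD-YYYY strings for display only; format() returns character
visit_mdy <- format(visit, "%m-%d-%Y")
print(visit_mdy)

# Filter on the Date values, not the strings: keep visits on or after 2010-08-01
keep <- visit >= as.Date("2010-08-01")
print(visit[keep])
```

Applied to the question's data frame, the same filter would read chris[as.Date(chris$visit_date, origin = "1960-01-01") >= as.Date("2010-08-01"), ], which, as the counts above show, keeps no rows if the 1960 origin is correct.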

Related

How do I subtract date columns in R when date format is not recognized? [duplicate]

This question already has answers here: calculating time difference in R (2 answers)
I have already tried the following, without success:
Changing date format in R
t1$date <- dmy(t1$date_admission)
I have been trying to calculate the time difference between two date columns. Somehow, R does not seem to recognize the Y-m-d format in one of them and returns wrong values, as follows:
> print(df$data_int_uti)
[1] "2020-06-07" "2020-09-07" "2020-02-08" "2020-08-15" "2020-08-15" "2020-08-18" "2020-08-25" "2020-08-29" "2020-06-30"
[10] "2020-05-07" "2020-07-15" "2020-08-14" "2020-01-09" "2020-09-09" "2020-12-09" "2020-02-07" "2020-09-07" "2020-02-08"
[19] "2020-08-15" "2020-02-09" "2020-06-07" "2020-06-07" "2020-07-29" "2020-08-16" "2020-08-21" "2020-08-22" "2020-01-07"
[28] "2020-04-07" "2020-02-07" "2020-01-09" "2020-06-07" "2020-09-08" "2020-10-08" "2020-08-14" "2020-08-27" "2020-08-30"
[37] "2020-07-16" "2020-07-23" "2020-09-14" "2020-01-07" "2020-04-07" "2020-07-07" "2020-07-07" "2020-10-07" "2020-07-25"
[46] "2020-03-08" "2020-08-31" "2020-02-07" "2020-06-07" "2020-08-13" "2020-08-24" "2020-01-07" "2020-07-18" "2020-09-15"
[55] "2020-01-07" "2020-07-07" "2020-07-17" "2020-07-27" "2020-08-14" "2020-10-09" "2020-09-14" "2020-04-08" "2020-01-07"
[64] "2020-01-07" "2020-12-07" "2020-07-27" "2020-04-08" "2020-08-16" "2020-02-07" "2020-07-07" "2020-07-20" "2020-08-19"
[73] "2020-03-09" "2020-05-09"
> print(df$data_inicio_sint)
[1] "2020-06-27" NA "2020-07-29" NA "2020-07-31" "2020-08-19" "2020-08-22" "2020-08-18" "2020-06-29"
[10] "2020-06-25" "2020-07-14" "2020-05-09" "2020-01-10" "2020-08-31" "2020-08-30" "2020-06-28" "2020-09-08" "2020-07-23"
[19] "2020-12-09" "2020-08-22" "2020-04-08" "2020-06-25" "2020-07-20" "2020-08-16" "2020-12-09" "2020-08-23" "2020-06-30"
[28] "2020-06-26" "2020-03-31" "2020-08-23" "2020-06-21" "2020-07-29" "2020-07-29" "2020-08-01" "2020-08-19" "2020-08-14"
[37] "2020-06-30" "2020-07-22" "2020-09-10" "2020-07-01" "2020-02-08" "2020-06-08" "2020-06-23" "2020-06-27" "2020-07-17"
[46] "2020-07-29" "2020-08-31" "2020-06-20" "2020-03-08" "2020-02-09" "2020-08-24" "2020-01-08" "2020-06-08" "2020-10-10"
[55] "2020-06-23" "2020-05-08" "2020-10-08" "2020-07-24" "2020-07-09" "2020-08-29" "2020-10-10" "2020-02-09" "2020-06-23"
[64] "2020-06-22" "2020-08-08" "2020-07-21" "2020-07-28" "2020-05-09" "2020-06-19" "2020-07-08" "2020-07-14" "2020-10-09"
[73] "2020-01-10" "2020-12-09"
> diff(df$data_int_uti - df$data_inicio_sint)
Time differences in days
[1] NA NA NA NA -16 4 8 -10 -50 50 96 -98 10 92 -243 141 -165 50 -79 255 -78 27 -9
[24] -110 109 -174 95 27 -174 213 55 30 -58 -5 8 0 -15 3 -180 235 -30 -15 88 -94 -151 143
[47] -134 225 95 -186 -1 41 -65 -143 228 -143 86 33 5 -67 85 -227 1 288 -115 -117 210 -232 132
[70] 7 -57 110 -273
Expected outcome: the time interval between the date of symptom onset and the date of hospital admission, in days, e.g.
between 2020-06-07 and 2020-06-27 there are 20 days
So the output would look like
[1] 20
and so on
Any light would be greatly appreciated.
Here's the dput:
dput(t1)
structure(list(data_int_uti = structure(c(18420, 18512, 18300,
18489, 18489, 18492, 18499, 18503, 18443, 18389, 18458, 18488,
18270, 18514, 18605, 18299, 18512, 18300, 18489, 18301, 18420,
18420, 18472, 18490, 18495, 18496, 18268, 18359, 18299, 18270,
18420, 18513, 18543, 18488, 18501, 18504, 18459, 18466, 18519,
18268, 18359, 18450, 18450, 18542, 18468, 18329, 18505, 18299,
18420, 18487, 18498, 18268, 18461, 18520, 18268, 18450, 18460,
18470, 18488, 18544, 18519, 18360, 18268, 18268, 18603, 18470,
18360, 18490, 18299, 18450, 18463, 18493, 18330, 18391), class = "Date"),
data_inicio_sint = structure(c(18440, NA, 18472, NA, 18474,
18493, 18496, 18492, 18442, 18438, 18457, 18391, 18271, 18505,
18504, 18441, 18513, 18466, 18605, 18496, 18360, 18438, 18463,
18490, 18605, 18497, 18443, 18439, 18352, 18497, 18434, 18472,
18472, 18475, 18493, 18488, 18443, 18465, 18515, 18444, 18300,
18421, 18436, 18440, 18460, 18472, 18505, 18433, 18329, 18301,
18498, 18269, 18421, 18545, 18436, 18390, 18543, 18467, 18452,
18503, 18545, 18301, 18436, 18435, 18482, 18464, 18471, 18391,
18432, 18451, 18457, 18544, 18271, 18605), class = "Date")), row.names = c(NA,
-74L), class = c("tbl_df", "tbl", "data.frame"))

diff is the wrong function for calculating the difference between two date columns. You can subtract the dates directly:
t1$date_admission - t1$date_symptoms
#Time differences in days
# [1] -20 NA -172 NA 15 -1 3 11 1 -49 1 97 -1 9 101
#[16] -142 -1 -166 -116 -195 60 -18 9 0 -110 -1 -175 -80 -53 -227
#[31] -14 41 71 13 8 16 16 1 4 -176 59 29 14 102 8
#[46] -143 0 -134 91 186 0 -1 40 -25 -168 60 -83 3 36 41
#[61] -26 59 -168 -167 121 6 -111 99 -133 -1 6 -51 59 -214
You might have been looking for difftime:
difftime(t1$date_admission, t1$date_symptoms, units = "days")
The diff function subtracts consecutive values. For example:
diff(c(5, 9, 4, 5))
#[1] 4 -5 1
where the calculation is (9 - 5 = 4), (4 - 9 = -5) and (5 - 4 = 1). In your case you first subtract the dates and then apply diff to the result, which gives the differences between consecutive numbers rather than the per-row intervals.
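A small base R sketch of the difference, using rows 1, 3 and 5 of the question's data: subtracting two Date vectors works element by element, difftime() does the same with the units stated explicitly, and diff() instead compares neighbouring elements of a single vector. Taking abs() of the numeric result gives the interval length in the form shown in the question's expected output.

```r
# Rows 1, 3 and 5 of the question's data (admission vs. symptom onset)
admission <- as.Date(c("2020-06-07", "2020-02-08", "2020-08-15"))
symptoms  <- as.Date(c("2020-06-27", "2020-07-29", "2020-07-31"))

# Row-by-row difference: a difftime vector in days
d <- admission - symptoms
print(d)

# Same result via difftime(), with the units stated explicitly
d2 <- difftime(admission, symptoms, units = "days")

# Plain numbers; abs() gives the interval length regardless of order
n <- abs(as.numeric(d))

# diff() compares consecutive elements of ONE vector, which is not what is wanted here
print(diff(d))
```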

One solution is to use dplyr and convert the columns to Date first:
library(dplyr)
# example data
t1 <- data.frame(date_admission = c("2020-08-07","2020-07-31","2020-02-08","2020-08-15","2020-08-17","2020-08-24","2020-08-27","2020-10-09","2020-01-07"),
date_symptoms = c( "2020-06-27", NA ,"2020-07-29", NA , "2020-07-31", "2020-08-19", "2020-08-22", "2020-08-18", "2020-06-29"))
# calculation (convert all columns to Date and subtract, as in your example)
t1 %>%
  dplyr::mutate(dplyr::across(dplyr::everything(), as.Date)) %>%
  dplyr::mutate(DIF = date_admission - date_symptoms)

How do I find the closest date to a given date?

I am trying to figure out how to find the closest date in one zoo object to a given date in another zoo object (I could also use a data.frame). Suppose I have:
library(zoo)
dates.zoo <- zoo(data.frame(val=seq(1:121)), order.by = seq.Date(as.Date('2018-12-01'), as.Date('2019-03-31'), "days"))
monthly.zoo <- zoo(data.frame(val=c(1,2,4)), order.by = c(as.Date('2018-12-14'), as.Date('2019-1-2'), as.Date('2019-2-3')))
For each date in dates.zoo I would like to align it with the closest previous date in monthly.zoo. (NA if no monthly date is found). So the data.frame/zoo object I am expecting is:
...
2018-12-02 2 NA
...
2018-12-14 14 2018-12-14
2018-12-15 15 2018-12-14
2018-12-16 16 2018-12-14
...
2019-01-01 32 2018-12-14
2019-01-02 33 2019-01-02
2019-01-03 34 2019-01-02
...
NOTE: I would prefer a base R solution, but others would also be interesting to see.

Following through on Henrik's suggestion to use findInterval, we can do:
library(zoo)
interval.idx <- findInterval(index(dates.zoo), index(monthly.zoo))
interval.idx <- ifelse(interval.idx == 0, NA, interval.idx)
dates.zoo$month <- index(monthly.zoo)[interval.idx]
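Since the question prefers base R: findInterval() is itself part of base R and works directly on Date vectors, so a sketch without zoo could look like the following. It assumes the monthly dates are sorted in ascending order (findInterval() stops with an error otherwise) and, matching the expected output in the question, treats a monthly date on the same day as the closest previous one.

```r
# Plain Date vectors standing in for index(dates.zoo) and index(monthly.zoo)
days    <- seq(as.Date("2018-12-01"), as.Date("2019-03-31"), by = "day")
monthly <- as.Date(c("2018-12-14", "2019-01-02", "2019-02-03"))  # must be sorted

# For each day, findInterval() counts the monthly dates <= that day, which is the
# position of the closest previous (or same-day) monthly date; 0 means there is none
idx <- findInterval(days, monthly)
idx[idx == 0] <- NA

# Indexing with NA yields an NA date, as requested in the question
res <- data.frame(date = days, val = seq_along(days), month = monthly[idx])
print(res[c(2, 14, 15, 32, 33), ])
```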

A rolling join using data.table can be used.
See also: https://www.r-bloggers.com/understanding-data-table-rolling-joins/
There is also a solution using base R.
data.table solution
library(data.table)
dates.df <- data.table(val=seq(1:121), dates = seq.Date(as.Date('2018-12-01'), as.Date('2019-03-31'), "days"))
monthly.df <- data.table(val=c(1,2,4), dates = c(as.Date('2018-12-14'), as.Date('2019-1-2'), as.Date('2019-2-3')))
setkeyv(dates.df,"dates")
setkeyv(monthly.df,"dates")
#monthly.df[,nearest:=(dates)][dates.df,roll = 'nearest'] #closest date
monthly.df[,nearest:=(dates)][dates.df,roll = Inf] #Closest _previous_ date
base R solution
dates.df <- zoo(data.frame(val=seq(1:121)), order.by = seq.Date(as.Date('2018-12-01'), as.Date('2019-03-31'), "days"))
monthly.df <- zoo(data.frame(val=c(1,2,4)), order.by = c(as.Date('2018-12-14'), as.Date('2019-1-2'), as.Date('2019-2-3')))
dates.df <- data.frame(val=dates.df$val,dates=attributes(dates.df)$index)
monthly.df <- data.frame(val=monthly.df$val,dates=attributes(monthly.df)$index)
min_distances <- as.numeric(dates.df$dates)- matrix(rep(as.numeric(monthly.df$dates),nrow(dates.df)),ncol=length(monthly.df$dates),byrow=T)
min_distances <- as.data.frame(t(min_distances))
closest <- sapply(min_distances,function(x)
{
w <- which(x==min(x[x>0]));
ifelse(length(w)==0,NA,w)
})
dates.df$closest_month <- monthly.df$dates[closest]
Results: data.table
> monthly.df[,nearest:=(dates)][dates.df,roll = Inf]
val dates nearest i.val
1: NA 2018-12-01 <NA> 1
2: NA 2018-12-02 <NA> 2
3: NA 2018-12-03 <NA> 3
4: NA 2018-12-04 <NA> 4
5: NA 2018-12-05 <NA> 5
---
118: 4 2019-03-27 2019-02-03 117
119: 4 2019-03-28 2019-02-03 118
120: 4 2019-03-29 2019-02-03 119
121: 4 2019-03-30 2019-02-03 120
122: 4 2019-03-31 2019-02-03 121
Results base R
> dates.df[64:69,]
val dates closest_month
2019-02-02 64 2019-02-02 2019-01-02
2019-02-03 65 2019-02-03 2019-01-02
2019-02-04 66 2019-02-04 2019-02-03
2019-02-05 67 2019-02-05 2019-02-03
2019-02-06 68 2019-02-06 2019-02-03
2019-02-07 69 2019-02-07 2019-02-03

If, for each date in dates.df, you want the closest date in monthly.df that is strictly earlier than the given date, and monthly.df is sorted by date in ascending order, you can use the method below. It counts the rows of monthly.df whose index is earlier than the given date; when monthly.df is sorted ascending, that count is exactly the index of the closest earlier date. If there are no such rows, the index is set to NA.
inds <- rowSums(outer(index(dates.df), index(monthly.df), `>`))
inds[inds == 0] <- NA
dates.df_monthmatch <- index(monthly.df)[inds]
dates.df_monthmatch
# [1] NA NA NA NA NA NA
# [7] NA NA NA NA NA NA
# [13] NA NA "2018-12-14" "2018-12-14" "2018-12-14" "2018-12-14"
# [19] "2018-12-14" "2018-12-14" "2018-12-14" "2018-12-14" "2018-12-14" "2018-12-14"
# [25] "2018-12-14" "2018-12-14" "2018-12-14" "2018-12-14" "2018-12-14" "2018-12-14"
# [31] "2018-12-14" "2018-12-14" "2018-12-14" "2019-01-02" "2019-01-02" "2019-01-02"
# [37] "2019-01-02" "2019-01-02" "2019-01-02" "2019-01-02" "2019-01-02" "2019-01-02"
# [43] "2019-01-02" "2019-01-02" "2019-01-02" "2019-01-02" "2019-01-02" "2019-01-02"
# [49] "2019-01-02" "2019-01-02" "2019-01-02" "2019-01-02" "2019-01-02" "2019-01-02"
# [55] "2019-01-02" "2019-01-02" "2019-01-02" "2019-01-02" "2019-01-02" "2019-01-02"
# [61] "2019-01-02" "2019-01-02" "2019-01-02" "2019-01-02" "2019-01-02" "2019-02-03"
# [67] "2019-02-03" "2019-02-03" "2019-02-03" "2019-02-03" "2019-02-03" "2019-02-03"
# [73] "2019-02-03" "2019-02-03" "2019-02-03" "2019-02-03" "2019-02-03" "2019-02-03"
# [79] "2019-02-03" "2019-02-03" "2019-02-03" "2019-02-03" "2019-02-03" "2019-02-03"
# [85] "2019-02-03" "2019-02-03" "2019-02-03" "2019-02-03" "2019-02-03" "2019-02-03"
# [91] "2019-02-03" "2019-02-03" "2019-02-03" "2019-02-03" "2019-02-03" "2019-02-03"
# [97] "2019-02-03" "2019-02-03" "2019-02-03" "2019-02-03" "2019-02-03" "2019-02-03"
# [103] "2019-02-03" "2019-02-03" "2019-02-03" "2019-02-03" "2019-02-03" "2019-02-03"
# [109] "2019-02-03" "2019-02-03" "2019-02-03" "2019-02-03" "2019-02-03" "2019-02-03"
# [115] "2019-02-03" "2019-02-03" "2019-02-03" "2019-02-03" "2019-02-03" "2019-02-03"
# [121] "2019-02-03"

Here is one possibility, although I had to convert the object to a data frame in order to assign the zoo index dates. The code compares the month, then the year, and finally the day, requiring the day to be less than or equal to that of the date being matched against. If no date meets these criteria, NA is assigned. The individual date components are extracted with the lubridate package, and which() is then used to index the best match.
library(zoo)
library(lubridate)
dates.df <- zoo(data.frame(val=seq(1:121)), order.by = seq.Date(as.Date('2018-12-01'), as.Date('2019-03-31'), "days"))
monthly.df <- zoo(data.frame(val=c(1,2,4)), order.by = c(as.Date('2018-12-14'), as.Date('2019-1-2'), as.Date('2019-2-3')))
month_m<-month(monthly.df)
month_d<-month(dates.df)
year_m<-year(monthly.df)
year_d<-year(dates.df)
day_m<-day(monthly.df)
day_d<-day(dates.df)
index<-list()
Index<-list()
for( i in 1:length(monthly.df)){
index[[i]]<-which(month_m[i] == month_d & year_m[i] == year_d
& day_d <= day_m[i])
test<-unlist(index[[i]])
#Assigns NA if no suitable match is found
if(length(test)==0){
print("NA")
Index[[i]]=NA
}else {
Index[[i]]<-tail(test, n=1)
}
}
Test<-unlist(Index)
monthly.df_Fin<-as.data.frame(monthly.df)
dates.df_Fin<-as.data.frame(dates.df)
monthly.df_Fin$match<-as.character(row.names(dates.df_Fin)[Test])
monthly.df_Fin$value<-dates.df_Fin[Test,]
> monthly.df_Fin
val match value
2018-12-14 1 2018-12-14 14
2019-01-02 2 2019-01-02 33
2019-02-03 4 2019-02-03 65
Say we change one of the dates to a value outside the criteria range:
monthly.df <- zoo(data.frame(val=c(1,2,4)), order.by = c(as.Date('2018-12-14'), as.Date('2019-1-2'), as.Date('2017-2-3')))
....
#Result
> monthly.df_Fin
val match value
2017-02-03 4 <NA> NA
2018-12-14 1 2018-12-14 14
2019-01-02 2 2019-01-02 33

How to tell if a factor has no value in R

Here is a sample of what the data looks like. I need to replace all those empty entries with NA so that as.Date(dat[,i]) produces no errors.
> dat[,i]
[1]
[28]
[55]
[82] 6/26/2007 7/5/2007 7/5/2007 12/6/2007 2/5/2008
[109] 3/27/2008 6/29/2008 9/16/2008 11/3/2008 9/11/2008 11/24/2008 12/29/2008 11/20/2008 1/26/2009 1/8/2009 3/5/2009
[136] 4/7/2009 6/9/2009 8/23/2009 8/16/2009 9/2/2009 10/6/2009 10/14/2009 10/24/2009 10/22/2009 11/5/2009 12/9/2009 2/4/2010 3/18/2010
[163] 7/8/2010 7/7/2010 7/29/2010 10/6/2010 10/7/2010 11/18/2010 1/12/2011 1/6/2011 4/5/2011 4/21/2011 5/25/2011 6/20/2011
[190] 12/12/2011 2/29/2012 2/22/2012 3/7/2012 3/28/2012 5/16/2012 5/23/2012 6/14/2012 8/14/2012 8/16/2012 9/5/2012 9/30/2012 11/5/2012 12/25/2012 12/27/2012 3/14/2013
[217] 7/24/2013 7/31/2013 9/2/2013 10/16/2013 10/30/2013 12/13/2013 2/24/2014 3/9/2014 6/29/2014 6/23/2014
[244] 9/1/2014 9/22/2014 9/22/2014 11/23/2014 2/24/2015 3/17/2015 4/8/2015 6/23/2015 6/23/2015 7/4/2015
[271] ...
[3538] 6/29/2012 11/16/2012 11/23/2012 9/1/2012
916 Levels: 10/10/2008 10/10/2009 10/10/2012 10/11/2010 10/11/2013 10/1/2010 10/12/2009 10/14/2009 10/14/2010 10/14/2011 10/14/2014 10/15/2009 10/15/2014 10/16/2013 10/17/2011 10/19/2009 10/19/2010 10/19/2011 10/20/2012 10/21/2008 10/21/2010 10/21/2013 10/2/2010 10/2/2012 10/2/2013 ... 9/9/2014
But every element here has the same type, factor. dat[,i][1] == "" returns FALSE for both dat[,i][1] and dat[,i][3511], so how can I tell them apart in order to use apply to put NA where it belongs?
> dat[,i][1]
[1]
916 Levels: 10/10/2008 10/10/2009 10/10/2012 10/11/2010 10/11/2013 10/1/2010 10/12/2009 10/14/2009 10/14/2010 10/14/2011 10/14/2014 10/15/2009 10/15/2014 10/16/2013 10/17/2011 10/19/2009 10/19/2010 10/19/2011 10/20/2012 10/21/2008 10/21/2010 10/21/2013 10/2/2010 10/2/2012 10/2/2013 ... 9/9/2014
> class(dat[,i][1])
[1] "factor"
> dat[,i][3511]
[1] 2/20/2012
916 Levels: 10/10/2008 10/10/2009 10/10/2012 10/11/2010 10/11/2013 10/1/2010 10/12/2009 10/14/2009 10/14/2010 10/14/2011 10/14/2014 10/15/2009 10/15/2014 10/16/2013 10/17/2011 10/19/2009 10/19/2010 10/19/2011 10/20/2012 10/21/2008 10/21/2010 10/21/2013 10/2/2010 10/2/2012 10/2/2013 ... 9/9/2014
> class(dat[,i][3511])
[1] "factor"
Also, trying to "go down a level" does nothing, still just a factor:
> dat[,i][[1]]
[1]
916 Levels: 10/10/2008 10/10/2009 10/10/2012 10/11/2010 10/11/2013 10/1/2010 10/12/2009 10/14/2009 10/14/2010 10/14/2011 10/14/2014 10/15/2009 10/15/2014 10/16/2013 10/17/2011 10/19/2009 10/19/2010 10/19/2011 10/20/2012 10/21/2008 10/21/2010 10/21/2013 10/2/2010 10/2/2012 10/2/2013 ... 9/9/2014
> dat[,i][1][1]
[1]
916 Levels: 10/10/2008 10/10/2009 10/10/2012 10/11/2010 10/11/2013 10/1/2010 10/12/2009 10/14/2009 10/14/2010 10/14/2011 10/14/2014 10/15/2009 10/15/2014 10/16/2013 10/17/2011 10/19/2009 10/19/2010 10/19/2011 10/20/2012 10/21/2008 10/21/2010 10/21/2013 10/2/2010 10/2/2012 10/2/2013 ... 9/9/2014

It would have been better to show the dput() of the example. Based on the OP's post, I assume the blank levels are whitespace (' ') rather than empty (''). So we can trim the whitespace, turning them into '', and then compare with ==:
library(stringr)
sapply(dat, function(x) sum(str_trim(x)=='')==1)
#[1] TRUE FALSE
Or use grepl:
sapply(lapply(dat, grepl, pattern= '^\\s+$'), all)
#[1] TRUE FALSE
data
dat <- list(factor(' ', levels=c(' ', 1:5)), factor(1:5, levels=1:5))
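The answer above detects the blank entries; a base R sketch of the next step, replacing them with NA before converting, follows. The factor f is a hypothetical stand-in for dat[,i], and the m/d/Y format string is assumed from the sample values in the question. Supplying format explicitly also matters: when a format is given, as.Date() returns NA for strings it cannot parse instead of stopping with an error.

```r
# Hypothetical stand-in for dat[,i]: blank entries mixed with m/d/Y dates
f <- factor(c(" ", "", "6/26/2007", "2/20/2012"))

# Work on the labels, not the integer codes behind the factor
x <- as.character(f)

# Entries that are empty or whitespace-only become NA
x[trimws(x) == ""] <- NA

# With an explicit format, anything unparseable would also become NA rather than an error
d <- as.Date(x, format = "%m/%d/%Y")
print(d)
```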

How do you make a sequence using along.with for unique values in R?

Let's suppose I have a vector of numeric values:
[1] 2844 4936 4936 4972 5078 6684 6689 7264 7264 7880 8133 9018 9968 9968 10247
[16] 11267 11508 11541 11607 11717 12349 12349 12364 12651 13025 13086 13257 13427 13427 13442
[31] 13442 13442 13442 14142 14341 14429 14429 14429 14538 14872 15002 15064 15163 15163 15324
[46] 15324 15361 15361 15400 15624 15648 15648 15648 15864 15864 15881 16332 16847 17075 17136
[61] 17136 17196 17843 17925 17925 18217 18455 18578 18578 18742 18773 18806 19130 19195 19254
[76] 19254 19421 19421 19429 19585 19686 19729 19729 19760 19760 19901 20530 20530 20530 20581
[91] 20629 20629 20686 20693 20768 20902 20980 21054 21079 21156
and I want to create a sequence along this vector, but for unique numbers. For example,
length(unique(vector))
is 74, and there are 100 values in total in the vector. The sequence should contain only the numbers 1 to 74, but have length 100, since some numbers will be repeated.
Any idea on how this can be done?
Thanks.

Perhaps
res <- as.numeric(factor(v1))
head(res)
#[1] 1 2 2 3 4 5
Or
res1 <- match(v1, unique(v1))
Or
library(fastmatch)
res2 <- fmatch(v1, unique(v1))
Or
res3 <- findInterval(v1, unique(v1))
data
v1 <- c(2844, 4936, 4936, 4972, 5078, 6684, 6689, 7264, 7264, 7880,
8133, 9018, 9968, 9968, 10247, 11267, 11508, 11541, 11607, 11717,
12349, 12349, 12364, 12651, 13025, 13086, 13257, 13427, 13427,
13442, 13442, 13442, 13442, 14142, 14341, 14429, 14429, 14429,
14538, 14872, 15002, 15064, 15163, 15163, 15324, 15324, 15361,
15361, 15400, 15624, 15648, 15648, 15648, 15864, 15864, 15881,
16332, 16847, 17075, 17136, 17136, 17196, 17843, 17925, 17925,
18217, 18455, 18578, 18578, 18742, 18773, 18806, 19130, 19195,
19254, 19254, 19421, 19421, 19429, 19585, 19686, 19729, 19729,
19760, 19760, 19901, 20530, 20530, 20530, 20581, 20629, 20629,
20686, 20693, 20768, 20902, 20980, 21054, 21079, 21156)
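The approaches above give the same codes here only because v1 is sorted. A small sketch with a hypothetical unsorted vector shows the difference: factor() numbers the values in sorted order, while match(v, unique(v)) numbers them in order of first appearance. findInterval() is left out because it requires its second argument, unique(v), to be sorted.

```r
# Hypothetical unsorted vector with repeats
v <- c(30, 10, 10, 20, 30)

# Codes follow the sorted order of the values
by_value <- as.integer(factor(v))
print(by_value)

# Codes follow the order of first appearance
by_appearance <- match(v, unique(v))
print(by_appearance)

# For a sorted vector (like v1 above) the two approaches agree
s <- sort(v)
print(identical(as.integer(factor(s)), match(s, unique(s))))
```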

You could use .GRP from "data.table" for this (x is the numeric vector):
library(data.table)
y <- as.data.table(x)[, y := .GRP, by = x]
head(y)
# x y
# 1: 2844 1
# 2: 4936 2 ## Note the duplicated value
# 3: 4936 2 ## in these rows, corresponding to x
# 4: 4972 3
# 5: 5078 4
# 6: 6684 5
tail(y)
# x y
# 1: 20768 69
# 2: 20902 70
# 3: 20980 71
# 4: 21054 72
# 5: 21079 73
# 6: 21156 74 ## "y" values go to 74
