JSON to dataframe in R (empty observations, different lengths)

Through an API I accessed weather information in JSON format, and I want to convert this data to a dataframe. The problem is that the API does not return weather conditions for every date-city combination, so a few rows are empty. Second, the combinations that do return data do not all report the same aspects of the weather. My goal is to convert the JSON to a dataframe where the empty rows are still shown (which does not happen when I unlist them) and where each weather aspect ends up under the right variable, with NA values where there is no record for that variable. I've tried unlisting the data and putting it into a dataframe, flattening the table, etc., getting the error: arguments imply differing number of rows: 0, 1. I've searched for this topic, but none of the solutions worked for my case (or maybe, because I'm not that experienced, I applied them wrong). Every tip is welcome!
The input looks like this:
reviewid  dateofwriting  lon          lat
98338143  28-02-11       11,41693611  22,3193039
58929813  18-03-10       -3,7037902   40,4167754
65945346  31-05-10       -3,188267    55,953252
The output looks like this (the second observation returns 36 columns and the third one 38; the first entry is missing because there was no observation for that day, so it is not displayed):
[{},
{"daily":
[{"time":"2010-03-18",
"summary":"Partly cloudy throughout the day.",
"icon":"partly-cloudy-day",
"sunriseTime":"2010-03-18 07:22:51",
"sunsetTime":"2010-03-18 19:25:28",
"moonPhase":0.08,
"precipIntensity":0,
"precipIntensityMax":0,
"precipProbability":0,
"temperatureHigh":63.14,
"temperatureHighTime":1268928000,
"temperatureLow":45.16,
"temperatureLowTime":1268971200,
"apparentTemperatureHigh":63.14,
"apparentTemperatureHighTime":1268928000,
"apparentTemperatureLow":45.16,
"apparentTemperatureLowTime":1268971200,
"dewPoint":36.97,
"humidity":0.58,
"pressure":1025.96,
"windSpeed":1.24,
"windGust":7.87,
"windGustTime":1268866800,
"windBearing":48,
"cloudCover":0.54,
"uvIndex":5,
"uvIndexTime":1268913600,
"visibility":6.19,
"temperatureMin":43.97,
"temperatureMinTime":"2010-03-18 07:00:00",
"temperatureMax":63.14,
"temperatureMaxTime":"2010-03-18 17:00:00",
"apparentTemperatureMin":42.03,
"apparentTemperatureMinTime":"2010-03-18 08:00:00",
"apparentTemperatureMax":63.14,
"apparentTemperatureMaxTime":"2010-03-18 17:00:00"}]},
{"daily":
[{"time":"2010-05-30 01:00:00",
"summary":"Mostly cloudy until evening.",
"icon":"partly-cloudy-day",
"sunriseTime":"2010-05-30 05:38:39",
"sunsetTime":"2010-05-30 22:44:55",
"moonPhase":0.58,
"precipIntensity":0.0038,
"precipIntensityMax":0.0766,
"precipIntensityMaxTime":"2010-05-30 04:00:00",
"precipProbability":1,
"precipType":"rain",
"temperatureHigh":58.99,
"temperatureHighTime":1275242400,
"temperatureLow":36.62,
"temperatureLowTime":1275278400,
"apparentTemperatureHigh":58.99,
"apparentTemperatureHighTime":1275242400,
"apparentTemperatureLow":36.62,
"apparentTemperatureLowTime":1275278400,
"dewPoint":43.61,
"humidity":0.76,
"pressure":1011.52,
"windSpeed":4.65,
"windGust":21.4,
"windGustTime":1275224400,
"windBearing":350,
"cloudCover":0.61,
"uvIndex":5,
"uvIndexTime":1275213600,
"visibility":5.85,
"temperatureMin":45.99,
"temperatureMinTime":"2010-05-30 07:00:00",
"temperatureMax":58.99,
"temperatureMaxTime":"2010-05-30 20:00:00",
"apparentTemperatureMin":43.31,
"apparentTemperatureMinTime":"2010-05-30 06:00:00",
"apparentTemperatureMax":58.99,
"apparentTemperatureMaxTime":"2010-05-30 20:00:00"}]}]
The goal is to add these variables, one row per observation, to the input table above:
icon               sunrisetime     sunsettime      etc.
NA                 NA              NA              etc.
partly-cloudy-day  18-03-10 07:22  18-03-10 19:25  etc.
partly-cloudy-day  30-05-10 05:38  30-05-10 22:44  etc.

There is a problem dealing with the responses that return NULL. To simplify the issue, it is easier to remove these non-responses first and parse the remaining JSON; if desired, one can then go back and add empty rows for the non-responses.
library(jsonlite)
library(dplyr)

# test <- result of converting the JSON response
# vector of reviewid, used to make the initial request to the API
reviewid <- c(98338143, 58929813, 65945346)

# find only the responses that are not NULL or blank
valid <- which(sapply(1:nrow(test), function(j) length(test[[1]][[j]])) > 0)
NullResponses <- which(sapply(1:nrow(test), function(j) length(test[[1]][[j]])) == 0)

# create a list of data frames, one per valid row of the response
dflist <- lapply(valid, function(j) {
  temp <- t(as.matrix(unlist(test[j, ])))
  data.frame(reviewid = reviewid[j], temp, stringsAsFactors = FALSE)
})

# bind the rows together; bind_rows() fills missing columns with NA
answer <- bind_rows(dflist)
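To restore the empty observations afterwards, one option (a minimal sketch, assuming `answer` and `reviewid` as above, with dplyr already loaded) is to join the parsed rows back onto the full id vector:

```r
# ids with no response get NA in every weather column after the join
full <- data.frame(reviewid = reviewid) %>%
  left_join(answer, by = "reviewid")
```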

Related

Find differences between 2 dataframes with different lengths

I have two dataframes, each with two columns c("price", "size"), of different lengths.
Each price must be linked to its size; they are two lists of trade orders. I have to find the differences between the two dataframes, knowing that each database can have orders that the other doesn't, and vice versa. I would like one output with the differences, or two outputs; it doesn't matter. But I need the row numbers in the output, to locate where the differences are in the series.
Here is some sample data:
> out
price size
1: 36024.86 0.01431022
2: 36272.00 0.00138692
3: 36272.00 0.00277305
4: 36292.57 0.05420000
5: 36292.07 0.00403948
---
923598: 35053.89 0.30904890
923599: 35072.76 0.00232000
923600: 35065.60 0.00273000
923601: 35049.36 0.01760000
923602: 35037.23 0.00100000
> bit
price size
1: 37279.89 0.01340020
2: 37250.84 0.00930000
3: 37250.32 0.44284049
4: 37240.00 0.00056491
5: 37215.03 0.99891906
---
923806: 35053.89 0.30904890
923807: 35072.76 0.00232000
923808: 35065.60 0.00273000
923809: 35049.36 0.01760000
923810: 35037.23 0.00100000
For example, I need to know if the first row of the database out is in the database bit.
I've tried many functions. With comparedf():
summary(comparedf(bit, out, by = c("price", "size")))
but I got this error:
Error in vecseq(f__, len__, if (allow.cartesian || notjoin ||
!anyDuplicated(f__, :
I've tried compare_df():
compareout = compare_df(out, bit, c("price", "size"))
But I know the results are wrong: I get only 23 results, and I know there are at least 200 differences.
I've also tried the match() and which() functions, but they don't give the results I'm looking for.
If you have any other methods, I will take them.
Perhaps you could just do an inner_join on out and bit by price and size? But first create an id variable for both data.frames:
library(dplyr)
out$id <- 1:nrow(out)
bit$id <- 1:nrow(bit)
joined <- inner_join(bit, out, by = c("price", "size"))
Now we can check which id from out and bit are not present in joined table:
id_from_bit_not_included_in_out <- bit$id[!bit$id %in% joined$id.x]
id_from_out_not_included_in_bit <- out$id[!out$id %in% joined$id.y]
These ids are the rows not included in the other table: id_from_bit_not_included_in_out contains rows present in bit but not in out, and id_from_out_not_included_in_bit contains rows present in out but not in bit.
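Equivalently, `dplyr::anti_join` returns the unmatched rows directly, without building the id columns first; a minimal sketch with the same data:

```r
library(dplyr)

# rows of bit with no (price, size) match in out, and vice versa
only_in_bit <- anti_join(bit, out, by = c("price", "size"))
only_in_out <- anti_join(out, bit, by = c("price", "size"))
```

Note that anti_join drops the row numbers, so if you need them, add an id column first as shown above.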
First attempt here. It will be difficult to do a very clean job with this data, though.
The data I used:
out <- read.table(text = "price size
36024.86 0.01431022
36272.00 0.00138692
36272.00 0.00277305
36292.57 0.05420000
36292.07 0.00403948
35053.89 0.30904890
35072.76 0.00232000
35065.60 0.00273000
35049.36 0.01760000
35037.23 0.00100000", header = T)
bit <- read.table(text = "price size
37279.89 0.01340020
37250.84 0.00930000
37250.32 0.44284049
37240.00 0.00056491
37215.03 0.99891906
37240.00 0.00056491
37215.03 0.99891906
35053.89 0.30904890
35072.76 0.00232000
35065.60 0.00273000
35049.36 0.01760000
35037.23 0.00100000", header = T)
Assuming purely that row 1 of out should match row 1 of bit, a simple solution could be:
df <- cbind(distinct(out), distinct(bit))
names(df) <- make.unique(names(df))
However, judging from the data you have provided, I am not sure this is the way to go (there are big differences in the first few rows), so maybe try sorting the data first:
df <- cbind(distinct(out[order(out$price, out$size),]), distinct(bit[order(bit$price, bit$size),]))
names(df) <- make.unique(names(df))

Extract attributes in XML using R

I am trying to extract two attributes from this XML extract (taken from a large XML file), namely 'nmRegime' and 'CalendarSystemT' (this is the date). Once extracted, those two records need to be saved as two columns in an R data frame, along with the filename.
There are several 'event' nodes within one given XML file and there are nearly 100 individual XML files.
<Event tEV="FirA" clearEV="false" onEV="true" dateOriginEV="Calendar" nYrsFromStEV="" nDaysFromStEV="" tFaqEV="Blank" tAaqEV="Blank" aqStYrEV="0" aqEnYrEV="0" nmEV="Fire_Cool" categoryEV="CatUndef" tEvent="Doc" idSP="105" nmRegime="Wheat, Tilled, stubble cool burn" regimeInstance="1">
<notesEV></notesEV>
<dateEV CalendarSystemT="FixedLength">19710331</dateEV>
<FirA fracAfctFirA="0.6" fracGbfrToAtmsFirA="0.98" fracStlkToAtmsFirA="0.98" fracLeafToAtmsFirA="0.98" fracGbfrToGlitFirA="0.02" fracStlkToSlitFirA="0.02" fracLeafToLlitFirA="0.02" fracCortToCodrFirA="1.0" fracFirtToFidrFirA="1.0" fracDGlitToAtmsFirA="0.931" fracRGlitToAtmsFirA="0.931" fracDSlitToAtmsFirA="0.931" fracRSlitToAtmsFirA="0.931" fracDLlitToAtmsFirA="0.931" fracRLlitToAtmsFirA="0.931" fracDCodrToAtmsFirA="0.0" fracRCodrToAtmsFirA="0.0" fracDFidrToAtmsFirA="0.0" fracRFidrToAtmsFirA="0.0" fracDGlitToInrtFirA="0.019" fracRGlitToInrtFirA="0.019" fracDSlitToInrtFirA="0.019" fracRSlitToInrtFirA="0.019" fracDLlitToInrtFirA="0.019" fracRLlitToInrtFirA="0.019" fracDCodrToInrtFirA="0.0" fracRCodrToInrtFirA="0.0" fracDFidrToInrtFirA="0.0" fracRFidrToInrtFirA="0.0" fracSopmToAtmsFirA="" fracLrpmToAtmsFirA="" fracMrpmToAtmsFirA="" fracSommToAtmsFirA="" fracLrmmToAtmsFirA="" fracMrmmToAtmsFirA="" fracMicrToAtmsFirA="" fracSopmToInrtFirA="" fracLrpmToInrtFirA="" fracMrpmToInrtFirA="" fracSommToInrtFirA="" fracLrmmToInrtFirA="" fracMrmmToInrtFirA="" fracMicrToInrtFirA="" fracMnamNToAtmsFirA="" fracSAmmNToAtmsFirA="" fracSNtrNToAtmsFirA="" fracDAmmNToAtmsFirA="" fracDNtrNToAtmsFirA="" fixFirA="" phaFirA="" />
</Event>
I had some success extracting 'nmRegime', but no success with 'CalendarSystemT'. I used the code below for the data extraction.
The second question: is there a way to loop over the list of XML files and perform this operation on each?
# get records
library(xml2)
recs <- xml_find_all(xml, "//Event")
#extract the names
labs <- trimws(xml_attr(recs, "nmRegime"))
names <- labs[!is.na(labs)]
# Extract the date
recs_t <- xml_find_all(xml, "//Event/dateEV")
time <- trimws(xml_attr(recs_t, "CalendarSystemT"))
The calendar time value is not an attribute but is stored as the node's text content, so it is accessed directly.
Also note that if an Event node is missing a "dateEV", there will be problems aligning the "labs" with the "time". It is better to extract the time value from each parent node rather than from the entire document.
library(xml2)
library(dplyr)
xml<- read_xml('<Event tEV="FirA" clearEV="false" onEV="true" dateOriginEV="Calendar" nYrsFromStEV="" nDaysFromStEV="" tFaqEV="Blank" tAaqEV="Blank" aqStYrEV="0" aqEnYrEV="0" nmEV="Fire_Cool" categoryEV="CatUndef" tEvent="Doc" idSP="105" nmRegime="Wheat, Tilled, stubble cool burn" regimeInstance="1">
<notesEV></notesEV>
<dateEV CalendarSystemT="FixedLength">19710331</dateEV>
<FirA fracAfctFirA="0.6" fracGbfrToAtmsFirA="0.98" fracStlkToAtmsFirA="0.98" fracLeafToAtmsFirA="0.98" fracGbfrToGlitFirA="0.02" fracStlkToSlitFirA="0.02" fracLeafToLlitFirA="0.02" fracCortToCodrFirA="1.0" fracFirtToFidrFirA="1.0" fracDGlitToAtmsFirA="0.931" fracRGlitToAtmsFirA="0.931" fracDSlitToAtmsFirA="0.931" fracRSlitToAtmsFirA="0.931" fracDLlitToAtmsFirA="0.931" fracRLlitToAtmsFirA="0.931" fracDCodrToAtmsFirA="0.0" fracRCodrToAtmsFirA="0.0" fracDFidrToAtmsFirA="0.0" fracRFidrToAtmsFirA="0.0" fracDGlitToInrtFirA="0.019" fracRGlitToInrtFirA="0.019" fracDSlitToInrtFirA="0.019" fracRSlitToInrtFirA="0.019" fracDLlitToInrtFirA="0.019" fracRLlitToInrtFirA="0.019" fracDCodrToInrtFirA="0.0" fracRCodrToInrtFirA="0.0" fracDFidrToInrtFirA="0.0" fracRFidrToInrtFirA="0.0" fracSopmToAtmsFirA="" fracLrpmToAtmsFirA="" fracMrpmToAtmsFirA="" fracSommToAtmsFirA="" fracLrmmToAtmsFirA="" fracMrmmToAtmsFirA="" fracMicrToAtmsFirA="" fracSopmToInrtFirA="" fracLrpmToInrtFirA="" fracMrpmToInrtFirA="" fracSommToInrtFirA="" fracLrmmToInrtFirA="" fracMrmmToInrtFirA="" fracMicrToInrtFirA="" fracMnamNToAtmsFirA="" fracSAmmNToAtmsFirA="" fracSNtrNToAtmsFirA="" fracDAmmNToAtmsFirA="" fracDNtrNToAtmsFirA="" fixFirA="" phaFirA="" />
</Event>')
recs <- xml_find_all(xml, "//Event")
#extract the names
labs <- trimws(xml_attr(recs, "nmRegime"))
names <- labs[!is.na(labs)]
# Extract the date
time <- xml_find_first(recs, ".//dateEV") %>% xml_text() %>% trimws()
To answer your second question: yes, you can wrap the above script into a function and then use lapply to loop through your entire list of files.
See this question and answer for details: R XML - combining parent and child nodes(w same name) into data frame
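As a sketch of that approach (the folder path and helper name here are hypothetical):

```r
library(xml2)

# hypothetical helper: parse one file into a data frame of Event records
parse_event_file <- function(path) {
  xml  <- read_xml(path)
  recs <- xml_find_all(xml, "//Event")
  data.frame(
    file     = basename(path),
    nmRegime = trimws(xml_attr(recs, "nmRegime")),
    dateEV   = trimws(xml_text(xml_find_first(recs, ".//dateEV"))),
    stringsAsFactors = FALSE
  )
}

files  <- list.files("path/to/xml/folder", pattern = "\\.xml$", full.names = TRUE)
result <- do.call(rbind, lapply(files, parse_event_file))
```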

R: Replace all Values that are not equal to a set of values

Hi all,
I've been trying to solve a problem on a large data set for some time and could use some of your wisdom.
I have a DF (1.3M obs) with a column called customer, along with 30 other columns. Let's say it contains multiple instances of customers Customer1 through Customer3000. I know that I have issues with 30 of those customers. I need to find all the customers that are NOT the customers I have issues with and replace the value in the 'customer' column with the text 'Supported Customer'. That seems like it should be a simple thing; if it weren't for the number of obs, I would have loaded it up in Excel, filtered all the bad customers out, and copy/pasted the text 'Supported Customer' over what remained.
I've tried replace and str_replace_all using grepl and paste/paste0, but to no avail. My current code looks like this:
#All the customers that have issues
out <- c("Customer123", "Customer124", "Customer125", "Customer126", "Customer127",
         "Customer128", ..... , "Customer140")
#Look for everything that is NOT in the list above and replace with "Enabled"
orderData$customer <- str_replace_all(orderData$customer,
                                      paste0("[^", paste(out, collapse = "|"), "]"),
                                      "Enabled Customers")
That code gets me this error:
Error in stri_replace_all_regex(string, pattern, fix_replacement(replacement), :
In a character range [x-y], x is greater than y. (U_REGEX_INVALID_RANGE)
I've tried the inverse of this approach and pulled a list of all obs that don't match the list of out customers, something like this:
in <- orderData %>% filter(!customer %in% out) %>% select(customer) %>%
distinct(customer)
This gets me a much larger list of customers that ARE enabled (~3,100). Using the str_replace_all and paste approach seems to have issues at this scale, though: with this many patterns, paste no longer collapses using the "|" operator. Instead I get a string that looks like:
"c(\"Customer1\", \"Customer2345\", \"Customer54\", ......)
When passed into str_replace_all, this does not match any patterns.
Anyway, there's got to be an easier way to do this. Thanks for any/all help.
Here is a data.table approach.
First, some example data since you didn't provide any.
customer <- sample(paste0("Customer",1:300),5000,replace = TRUE)
orderData <- data.frame(customer = sample(paste0("Customer",1:300),5000,replace = TRUE),stringsAsFactors = FALSE)
orderData <- cbind(orderData,matrix(runif(0,100,n=5000*30),ncol=30))
out <- c("Customer123", "Customer124", "Customer125", "Customer126", "Customer127", "Customer128","Customer140")
library(data.table)
setDT(orderData)
result <- orderData[!(customer %in% out),customer := gsub("Customer","Supported Customer ",customer)]
result
customer 1 2 3 4 5 6 7 8 9
1: Supported Customer 134 65.35091 8.57117 79.594166 84.88867 97.225276 84.563997 17.15166 41.87160 3.717705
2: Supported Customer 225 72.95757 32.80893 27.318046 72.97045 28.698518 60.709381 92.51114 79.90031 7.311200
3: Supported Customer 222 39.55269 89.51003 1.626846 80.66629 9.983814 87.122153 85.80335 91.36377 14.667535
4: Supported Customer 184 24.44624 20.64762 9.555844 74.39480 49.189537 73.126275 94.05833 36.34749 3.091072
5: Supported Customer 194 42.34858 16.08034 34.182737 75.81006 35.167769 23.780069 36.08756 26.46816 31.994756
---
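For comparison, a base-R sketch of the same idea (assuming `orderData` and `out` as above); unlike the gsub version, this overwrites the whole value with a single fixed label:

```r
# replace every customer NOT in `out` with one fixed label
orderData$customer <- ifelse(orderData$customer %in% out,
                             orderData$customer,
                             "Supported Customer")
```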

Filtering datetime by vector

It's probably really simple.
In the first case, using the presidential data, I can filter by either years or years2, and I get the same result.
However, when I use POSIXct data and try to filter in a similar way, I run into problems.
When I write
school_hours2 <- as.character(c(07:18))
I can see the values in school_hours2 are
"7", "8", "9", etc.,
whereas in school_hours they are
"07", "08", "09", etc.
EDIT: I think this explains that difference then?
EDIT: I can see the problem comparing integer:character, and even when I write the vector as.character the values in the vector do not match what I want.
What I'd like is to be able to filter by school_hours2, as that would mean I could think "I'd like to filter between these two times" and put in the upper and lower bounds, rather than having to write out all the interval points in between. How do I get this?
Why is filtering by "Y" easier than filtering by "H"?
library(tidyverse)
#some data - filtering works
data(presidential)
head(presidential)
str(presidential)
presidential %>% filter(format(as.Date(start), "%Y") <= 2005)
years <- c('1979', '1980', '1981', '1982',
           '1983', '1984', '1985', '1986',
           '1987', '1988', '1989', '1990')
years2 <- c(1950:1990)
presidential %>% filter(format(as.Date(start), "%Y") %in% years2)
presidential %>% filter(format(as.Date(start), "%Y") %in% years)
#some date-time data - filtering
test_data <- sample(seq(as.POSIXct('2013/01/01'), as.POSIXct('2017/05/01'), by = "day"), 1000)
td <- as.data.frame(test_data) %>% mutate(id = row_number())
school_hours <- c('07', '08', '09', '10',
                  '11', '12', '13', '14',
                  '15', '16', '17', '18')
school_hours2 <- c(07:18)
school_years <- c(2015, 2016, 2017)
school_years2 <- c(2015:2017)
str(td)
test1 <- td %>%
  filter(id >= 79)
schools <- td %>%
  filter(format(test_data, '%H') %in% school_hours)
schools2 <- td %>%
  filter(format(test_data, '%H') %in% school_hours2)
schools3 <- td %>%
  filter(format(test_data, '%Y') == 2017)
schools4 <- td %>%
  filter(format(test_data, '%Y') %in% school_years)
schools5 <- td %>%
  filter(format(test_data, '%Y') %in% school_years2)
Here's my question:
In the code above, when I try to filter td (which contains POSIXct data) using school_hours or school_hours2, I get zero data returned.
Why?
What I'd like to be able to do is instead of writing
school_hours<-c('07', '08', '09', '10',
'11', '12', '13', '14',
'15', '16', '17', '18'
)
I'd write
school_hours2<-c(07:18)
Just like I have for school_years and the filtering would work.
This doesn't work
schools2<-td%>%
filter(format(test_data,'%H') %in% school_hours2)
This does work
schools5<-td%>%
filter(format(test_data,'%Y') %in% school_years2)
WHY?
I ask because:
I've used something similar to filter my real data, which I can't share, and I get a discrepancy.
When I use school_hours (which is character) I get 993 records and the first time is 07:00.
When I use school_hours2 (which is integer) I get 895 records and the first time is 10:00.
I know - "without the data we can't make any evaluation" - but what I can't work out is why the two vector filters behave differently. Is it because school_hours contains characters and school_hours2 integers?
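The mismatch can be reproduced in isolation: format(..., '%H') returns zero-padded character strings, while %in% coerces the integers 7:18 to "7", "8", ..., "18", so only the two-digit hours 10 to 18 can ever match:

```r
format(as.POSIXct("2017-01-01 07:30:00"), "%H")  # "07"
as.character(7:18)[1]                            # "7"
"07" %in% 7:18   # FALSE - "07" is not equal to "7"
"10" %in% 7:18   # TRUE  - "10" matches as.character(10)
```

That also explains why the character filter starts at 07:00 but the integer filter only starts at 10:00.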
EDIT:
I changed the test_data line to:
#some date time data - filtering.
test_data<-as.POSIXct(sample(seq(1127056501, 1127056501), 1000),origin = "1899-12-31",tz="UTC")
it's still problematic:
schools<-td%>%
filter(format(test_data,'%H') %in% school_hours)
generates 510 rows
schools2<-td%>%
filter(format(test_data,'%H') %in% school_hours2)
generates 379 rows
All of the data I'm really interested in looks like this:
1899-12-31 23:59:00
(where the last six digits represent a 24-hour clock time)
All I'm really trying to do is convert the time from this
1899-12-31 07:59:00
to the hour (7) and then use
school_hours2 <- c(07:18)
as a filter.
But will the hour generated by the conversion of 1899-12-31 07:59:00 be "07" or "7"? Because if it's "07", then
school_hours2 <- c(07:18)
generates 7, and
school_hours2 <- as.character(c(07:18))
generates '7'.
How do I get around this?
EDIT:
LIKE THIS:
R: how to filter a timestamp by hour and minute?
library(lubridate)
td1 <- td %>%
  mutate(timestamp_utc = ymd_hms(test_data, tz = "UTC")) %>%
  mutate(hour = hour(timestamp_utc)) %>%
  filter(hour(timestamp_utc) %in% school_hours)
td2 <- td %>%
  mutate(timestamp_utc = ymd_hms(test_data, tz = "UTC")) %>%
  mutate(hour = hour(timestamp_utc)) %>%
  filter(hour(timestamp_utc) %in% school_hours2)
td3 <- td %>%
  mutate(hour = hour(test_data)) %>%
  filter(hour(test_data) %in% school_hours2)
After a lot of mucking around and talking to myself in my question, I found this thread:
filtering a dataset by time stamp
and it helped me realise how to isolate the hour in the timestamp and then use it to filter the data properly.
The final answer is to isolate the hour with
filter(hour(timestamp_utc) %in% school_hours2)

How to convert from yearweek to yearmonthday?

I have this date vector:
dput(date)
c("1981035", "1981036", "1981037", "1981038", "1981039", "1981040",
"1981041", "1981042", "1981043", "1981044", "1981045", "1981046",
"1981047", "1981048", "1981049", "1981050", "1981051", "1981052",
"1982001", "1982002", "1982003", "1982004", "1982005", "1982006",
"1982007", "1982008", "1982009", "1982010", "1982011", "1982012",
"1982013", "1982014", "1982015", "1982016", "1982017", "1982018",
"1982019", "1982020", "1982021", "1982022", "1982023", "1982024",
"1982025", "1982026", "1982027", "1982028", "1982029", "1982030",
"1982031", "1982032", "1982033", "1982034", "1982035", "1982036",
"1982037")
It is given as yearweek [week 1 covers day-of-the-year 1 to 7]. I want to convert this to year-month-day format; I tried the following, but it didn't work:
as.Date(date, "%Y%U")
You have to assign a day of the week to them; otherwise a value cannot be converted to a specific date, since it refers to a range of dates. Choosing day 0 with %w, i.e. Sunday, you can use the code below.
as.Date(paste0(date, '0'), format = '%Y0%U%w')
Note: this assumes the fifth digit contains no info. That seems odd to me, but it is correct according to the OP.
Edit: #Kath pointed out that it probably makes more sense to read the data as being in %Y%w%U format, so you can achieve the same result with the simpler code below:
as.Date(date, format = '%Y%w%U')
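As a quick sanity check on a couple of values (note that %U/%w parsing is delegated to the platform's strptime, so behaviour can vary across OS and locale):

```r
date <- c("1981035", "1982001")
as.Date(paste0(date, "0"), format = "%Y0%U%w")
```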
