How to extract only year from the date in dataframes? [duplicate] - r

This is my data. I need to extract from it using only base R (no MySQL, PHP, Python, C#, or anything else).
**service** **Date**
disconnected 2013-01-14
disconnected 2013-03-15
disconnected 2012-02-24
disconnected 2012-12-05
disconnected 2012-06-08
disconnected 2011-05-08
disconnected 2010-10-11
disconnected 2010-12-02
I need to extract only the year from each date, and later assign it to a new variable or vector.
The output should look like this:
OUTPUT
**service** **Date**
disconnected 2013
disconnected 2013
disconnected 2012
disconnected 2012
disconnected 2012
disconnected 2011
disconnected 2010
disconnected 2010

There are many options. One way is to use substr to take the first 4 characters of the 'Date' column (assuming every year is written with four digits, i.e. no dates before the year 1000).
df1$Year <- substr(df1$Date, 1,4)
Or we can use sub to match the substring that starts at the first - and runs to the end of the string (zero or more characters), and replace it with ''.
df1$Year <- sub('-.*$', '', df1$Date)
Or we can extract the year by converting to POSIXlt class
strptime(df1$Date, '%Y-%m-%d')$year+1900
If we are allowed to use packages, library(lubridate) has a convenient function i.e. year
library(lubridate)
year(df1$Date)
data
df1 <- structure(list(service = c("disconnected", "disconnected", "disconnected",
"disconnected", "disconnected", "disconnected", "disconnected",
"disconnected"), Date = c("2013-01-14", "2013-03-15", "2012-02-24",
"2012-12-05", "2012-06-08", "2011-05-08", "2010-10-11", "2010-12-02"
)), .Names = c("service", "Date"), class = "data.frame",
row.names = c(NA, -8L))
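Note that the approaches in this answer do not all return the same type: substr and sub give character strings, whereas the strptime route gives numbers. A minimal base R sketch of the difference, using a shortened copy of the sample dates (the `dates` vector and the `year_*` names are only for illustration):

```r
# Shortened copy of the Date column from df1 (illustrative only)
dates <- c("2013-01-14", "2012-02-24", "2010-12-02")

year_chr <- substr(dates, 1, 4)                       # character: "2013" "2012" "2010"
year_num <- strptime(dates, "%Y-%m-%d")$year + 1900   # numeric: 2013 2012 2010
year_int <- as.integer(format(as.Date(dates), "%Y"))  # integer, via Date and format()

# Convert the character version when a numeric year is needed
all(as.integer(year_chr) == year_int)                 # TRUE
```

If the year is going to be used for arithmetic or sorting, one of the numeric forms is probably the safer choice.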

If you make date a Date variable, format can pull out the year quite easily.
D <- data.frame(service = rep("disconnected", 3),
date = c("2013-01-14", "2013-03-15", "2012-02-24"))
D$year <- format(as.Date(D$date), format = "%Y")
D
service date year
1 disconnected 2013-01-14 2013
2 disconnected 2013-03-15 2013
3 disconnected 2012-02-24 2012

Related

Compiling API outputs in XML format in R

I have searched everywhere trying to find an answer to this question and I haven't quite found what I'm looking for yet so I'm hoping asking directly will help.
I am working with the USPS Tracking API, which provides output in XML format. The API is limited to 35 results per call (i.e. you can only supply 35 tracking numbers each time you call the API) and I need information on ~90,000 tracking numbers, so I am running my calls in a for loop. I was able to store the results of the calls in a list, but then I had trouble exporting the list as-is into anything usable. When I tried to convert the results from the list into JSON, it dropped the attribute, which contained the tracking number I had used to generate the results.
Here is what a sample result looks like:
<TrackResponse>
<TrackInfo ID="XXXXXXXXXXX1">
<TrackSummary> Your item was delivered at 6:50 am on February 6 in BARTOW FL 33830.</TrackSummary>
<TrackDetail>February 6 6:49 am NOTICE LEFT BARTOW FL 33830</TrackDetail>
<TrackDetail>February 6 6:48 am ARRIVAL AT UNIT BARTOW FL 33830</TrackDetail>
<TrackDetail>February 6 3:49 am ARRIVAL AT UNIT LAKELAND FL 33805</TrackDetail>
<TrackDetail>February 5 7:28 pm ENROUTE 33699</TrackDetail>
<TrackDetail>February 5 7:18 pm ACCEPT OR PICKUP 33699</TrackDetail>
Here is the script I ran to get the output I'm currently working with:
final_tracking_info <- list()
for (i in 1:x) { # where x = the number of calls to the API the loop will need to make
usps = input_tracking_info[i] # input_tracking_info = GET commands
usps = read_xml(usps)
final_tracking_info[[i+1]]<-usps$TrackResponse
gc()
}
final_output <- toJSON(final_tracking_info)
write(final_output,"final_tracking_info.json") # tried converting to JSON, lost the ID attribute
cat(capture.output(print(working_list),file = "Final_Tracking_Info.txt")) # exported the list to a textfile, was not an ideal format to work with
What I ultimately want to get from this data is a table containing the tracking number, the first track detail, and the last track detail. Is there a better way to compile this in XML/JSON that will make it easier to convert to a tibble/df down the line? Is there an easy way or preferred format, given that most of the columns will have the same name ("TrackDetail") and the data frames will have different lengths (since each package has a different number of track details), when I'm compiling thousands of results into one final output?
Using XML::xmlToList() will store the ID attribute in .attrs:
$TrackSummary
[1] " Your item was delivered at 6:50 am on February 6 in BARTOW FL 33830."
$TrackDetail
[1] "February 6 6:49 am NOTICE LEFT BARTOW FL 33830"
$TrackDetail
[1] "February 6 6:48 am ARRIVAL AT UNIT BARTOW FL 33830"
$TrackDetail
[1] "February 6 3:49 am ARRIVAL AT UNIT LAKELAND FL 33805"
$TrackDetail
[1] "February 5 7:28 pm ENROUTE 33699"
$TrackDetail
[1] "February 5 7:18 pm ACCEPT OR PICKUP 33699"
$.attrs
ID
"XXXXXXXXXXX1"
A way of using that output which assumes that the Summary and ID are always present as first and last elements, respectively, is:
library(magrittr) # provides the %>% pipe used below
xml_data <- XML::xmlToList("71563898.xml") %>%
unlist() %>% # flattening
unname() # removing names
data.frame (
ID = tail(xml_data, 1), # getting last element
Summary = head(xml_data, 1), # getting first element
Info = xml_data %>% head(-1) %>% tail(-1) # remove first and last elements
)
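For the table the question asks for (tracking number plus first and last track detail), a rough sketch with the xml2 package might look like the following. It assumes xml2 is installed and that each TrackInfo node carries an ID attribute, as in the sample. The shortened `response` string and the column names `first_listed` and `last_listed` are made up for illustration, and "first" and "last" here mean document order.

```r
library(xml2)

# Hypothetical, shortened response; real responses come from the API
response <- '<TrackResponse>
  <TrackInfo ID="XXXXXXXXXXX1">
    <TrackSummary>Your item was delivered at 6:50 am on February 6 in BARTOW FL 33830.</TrackSummary>
    <TrackDetail>February 6 6:49 am NOTICE LEFT BARTOW FL 33830</TrackDetail>
    <TrackDetail>February 5 7:18 pm ACCEPT OR PICKUP 33699</TrackDetail>
  </TrackInfo>
</TrackResponse>'

doc   <- read_xml(response)
infos <- xml_find_all(doc, ".//TrackInfo")

# One row per TrackInfo node, however many TrackDetail children it has
rows <- lapply(seq_along(infos), function(i) {
  node    <- infos[[i]]
  details <- xml_text(xml_find_all(node, "./TrackDetail"))
  n       <- length(details)
  data.frame(
    ID           = xml_attr(node, "ID"),
    first_listed = if (n > 0) details[1] else NA_character_,
    last_listed  = if (n > 0) details[n] else NA_character_,
    stringsAsFactors = FALSE
  )
})
tracking <- do.call(rbind, rows)
```

Because every package is reduced to exactly one row, the differing number of TrackDetail entries per package stops being a problem when the results of many calls are bound together.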

Getting as.Date to show year only instead of automatically including today's date

When I try to create an XTS object using an existing column with years as characters, my xts object automatically included today's date instead of only the year as I specified it. Is there any way to only include the year?
Here's my code:
global_totals_ts <- xts(global_totals_m[,-1], as.Date(ts_index, format = "%Y"))
and the output that I get is:
Christians Muslims Hindus Agnostics Buddhists
1900-05-17 557754602 200318122 202973290 3028610 126956371
1910-05-17 611362430 222347113 223383337 3368564 138064000
1950-05-17 870653646 338066461 323138775 129261500 175510794
1970-05-17 1229448027 570772699 462980539 544290164 234957917
2000-05-17 1987502477 1292170756 822391937 660693376 452314303
2005-05-17 2130604801 1427056087 893077485 669224713 477436475
I want the following output:
Christians Muslims Hindus Agnostics Buddhists
1900 557754602 200318122 202973290 3028610 126956371
1910 611362430 222347113 223383337 3368564 138064000
1950 870653646 338066461 323138775 129261500 175510794
1970 1229448027 570772699 462980539 544290164 234957917
2000 1987502477 1292170756 822391937 660693376 452314303
2005 2130604801 1427056087 893077485 669224713 477436475
thanks very much!
Date objects will always have days (because dates have days).
One alternative is to keep it as a date, but floor it by year. Then the dates are always the first day of the year, so that, for instance, group_by() operations will be done by year.
library(lubridate)
global_totals_ts <- xts(global_totals_m[,-1], floor_date(as.Date(ts_index, format = "%Y"), "year"))
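A base R sketch of the same idea, for anyone who prefers not to load lubridate: paste a fixed month and day onto each year before converting, so the index no longer picks up the current date. The `ts_index` values below are illustrative stand-ins for the asker's character years.

```r
# Illustrative stand-in for the asker's character vector of years
ts_index <- c("1900", "1910", "1950")

# With only "%Y" supplied, the missing month and day default to today's:
# as.Date(ts_index, format = "%Y")

# Supplying January 1 explicitly pins every date to the start of its year
year_start <- as.Date(paste0(ts_index, "-01-01"))

format(year_start, "%Y")   # "1900" "1910" "1950"
```

The resulting `year_start` vector could then be passed as the index in place of the `as.Date(ts_index, format = "%Y")` call.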

sqlSave, How to Write data to SQL developer having date Column containing hyphen

I have a data frame, data, which contains integer columns as well as date and time columns, as shown:
>head(data,2)
PRESSURE AMBIENT_TEMP OUTLET_PRESSURE COMP_STATUS DATE TIME predict
1 14 65 21 0 2014-01-09 12:45:00 0.6025863
2 17 65 22 0 2014-01-10 06:00:00 0.6657910
Now I want to write this back to the SQL database in chunks:
sqlSave(channel,data,tablename = "ANL_ASSET_CO",append = T)
where channel is the connection name. But this gives an error:
[RODBC] Failed exec in Update
22018 1722 [Oracle][ODBC][Ora]ORA-01722: invalid number
But when I exclude the date column, it writes back without any error:
> sqlSave(channel,data[,c(1:4,7)],tablename = "ANL_ASSET_CO",append = T)
> sqlSave(channel,data[,c(1:4,6:7)],tablename = "ANL_ASSET_CO",append = T)
The data fails to write to Oracle because of the date column; the hyphen could be the problem.
How can I write it? Any help is appreciated!
>class(data$DATE)
[1] "POSIXct" "POSIXt"
So I had to change the data type to character:
>data$DATE <- as.character(data$DATE)
>sqlSave(channel,data,tablename = "ANL_ASSET_CO",append=T)
This one worked!!
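The conversion above relies on the default string form of a POSIXct value. A small sketch that makes the format explicit with format() instead; the `DATE` vector here is made up for illustration, and whether Oracle accepts the resulting text depends on the column type and session settings, which the question does not show.

```r
# Made-up POSIXct values standing in for data$DATE
DATE <- as.POSIXct(c("2014-01-09", "2014-01-10"), tz = "UTC")

# Explicit format string, so the text does not depend on print defaults
DATE_chr <- format(DATE, "%Y-%m-%d")

class(DATE_chr)   # "character"
DATE_chr          # "2014-01-09" "2014-01-10"
```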

R! posIXCT in sqldf

This is my first question, so I apologize if I missed something.
I imported an Excel file into R using XLConnect; the str() output is as follows:
'data.frame': 931 obs. of 5 variables:
$ Media : chr "EEM" "EEM" "EEM" "EEM" ...
$ Month : POSIXct, format: "2014-08-01" "2014-08-01" "2014-08-01" "2014-08-01" ...
$ Request_Row : num 8 25 26 37 38 44 53 62 69 83 ...
$ Total_Click : num 12 9 9 8 8 8 7 7 7 7 ...
$ Match_Type : chr "S" "S" "S" "S" ...
When I run the following sqldf query I get no rows selected. Any idea what could be wrong?
sqldf(" select Media, sum(Total_Click) , avg(Request_Row), min(Request_Row) , max(Request_Row), count(distinct(Media)) from All_Data
where Request_Row < 100
and month='2014-09-01'
group by 1,2 order by 2,6 desc ")
<0 rows> (or 0-length row.names)
Thanks for the help
Vj
It's not clear what is intended, but the code shown has these problems:
Month is used in the data but month is used in the SQL statement.
SQLite has no date or time types, so a POSIXct value sent to SQLite is stored as the number of seconds since the UNIX epoch (in the GMT time zone). The comparison of the month to a character string therefore won't work. You can convert the number of seconds to yyyy-mm-dd using the SQLite strftime or date functions. Alternatively, use a database that has datetime types: sqldf supports the H2 database, which has date and time types.
The statement is trying to group by Media and sum(Total_Click). Grouping by an aggregated value is not legal, although perhaps it could be done by nesting selects, depending on what you intended.
Since the statement is grouping by Media, the expression count(distinct(Media)) will always be 1, since there can only be one Media in such a group.
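The point about SQLite receiving a number rather than a date can be checked from R itself. A minimal base R sketch, using UTC for reproducibility (the asker's data is in the local time zone, so the exact number there differs):

```r
# A POSIXct value is stored as seconds since 1970-01-01 00:00:00 UTC
m    <- as.POSIXct("2014-08-01", tz = "UTC")
secs <- as.numeric(m)
secs                                   # 1406851200

# A character comparison against the raw number cannot match
as.character(secs) == "2014-08-01"     # FALSE

# Converting the seconds back to a date string is what makes the match work
back <- format(as.POSIXct(secs, origin = "1970-01-01", tz = "UTC"), "%Y-%m-%d")
back == "2014-08-01"                   # TRUE
```

This mirrors, on the R side, what the SQLite date function with the 'unixepoch' modifier does on the database side.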
You will need to clarify what your intent is but if we drop or fix up the various points we can get this:
sqldf("select
Media,
sum(Total_Click) sum_Total_Click,
avg(Request_Row) avg_Request_Row,
min(Request_Row) min_Request_Row,
max(Request_Row) max_Request_Row
from All_Data
where Request_Row < 100
and date(month, 'unixepoch', 'localtime') = '2014-08-01'
group by 1 order by 2 desc")
which gives:
Media sum_Total_Click avg_Request_Row min_Request_Row max_Request_Row
1 EEM 38 24 8 37
RH2: To use the RH2 package and H2 database instead, be sure you have Java and RH2 installed (RH2 includes the H2 database, so that does not need to be installed separately) and then:
library(RH2)
library(sqldf)
sqldf("...")
where the ... is replaced with the same SQL statement except the date comparison simplifies to this line:
and month = '2014-08-01'
Data: When posting to the SO R tag please show your data using dput. In this case this was used:
All_Data <-
structure(list(Media = c("EEM", "EEM", "EEM", "EEM"), Month = structure(c(1406865600,
1406865600, 1406865600, 1406865600), class = c("POSIXct", "POSIXt"
), tzone = ""), Request_Row = c(8, 25, 26, 37), Total_Click = c(12,
9, 9, 8), Match_Type = c("S", "S", "S", "S")), .Names = c("Media",
"Month", "Request_Row", "Total_Click", "Match_Type"), row.names = c(NA,
-4L), class = "data.frame")
Update: Misc revisions.

R: generate dataframe of Friday dates for the year [duplicate]

I would like to generate a dataframe that contains all the Friday dates for the whole year.
Is there a simple way to do this?
eg for December 2013: (6/12/13,13/12/13,20/12/13,27/12/13)
Thank you for your help.
I'm sure there is a simpler way, but you could brute force it easily enough:
dates <- seq.Date(as.Date("2013-01-01"),as.Date("2013-12-31"),by="1 day")
dates[weekdays(dates)=="Friday"]
dates[format(dates,"%w")==5]
Building on @Frank's good work, you can find all of any specific weekday between two dates like so:
pick.wkday <- function(selday,start,end) {
fwd.7 <- start + 0:6
first.day <- fwd.7[as.numeric(format(fwd.7,"%w"))==selday]
seq.Date(first.day,end,by="week")
}
start and end need to be Date objects, and selday is the day of the week you want (0-6 representing Sunday-Saturday).
i.e. - for the current query:
pick.wkday(5,as.Date("2013-01-01"),as.Date("2013-12-31"))
Here is a way.
d <- as.Date(0:364, origin = "2013-1-1") # offsets 0..364 cover 2013-01-01 through 2013-12-31
d[strftime(d,"%A") == "Friday"]
Alternately, this would be a more efficient approach for generating the data for an arbitrary number of Fridays:
wk1 <- as.Date(0:6, origin = "2013-1-1") # choose start date & make 7 consecutive days
wk1[weekdays(wk1) == "Friday"] # find Friday in the sequence of 7 days
seq.Date(wk1[weekdays(wk1) == "Friday"], length.out=50, by=7) # use it to generate fridays
by=7 says go to the next Friday.
length.out controls the number of Fridays to generate. One could also use to to control how many Fridays are generated (e.g. use to=as.Date("2013-12-31") instead of length.out).
This function takes a year as input and returns only the Fridays:
getFridays <- function(year) {
dates <- seq(as.Date(paste0(year,"-01-01")),as.Date(paste0(year,"-12-31")), by = "day")
dates[weekdays(dates) == "Friday"]
}
Example:
> getFridays(2000)
[1] "2000-01-07" "2000-01-14" "2000-01-21" "2000-01-28" "2000-02-04" "2000-02-11" "2000-02-18" "2000-02-25" "2000-03-03" "2000-03-10" "2000-03-17" "2000-03-24" "2000-03-31"
[14] "2000-04-07" "2000-04-14" "2000-04-21" "2000-04-28" "2000-05-05" "2000-05-12" "2000-05-19" "2000-05-26" "2000-06-02" "2000-06-09" "2000-06-16" "2000-06-23" "2000-06-30"
[27] "2000-07-07" "2000-07-14" "2000-07-21" "2000-07-28" "2000-08-04" "2000-08-11" "2000-08-18" "2000-08-25" "2000-09-01" "2000-09-08" "2000-09-15" "2000-09-22" "2000-09-29"
[40] "2000-10-06" "2000-10-13" "2000-10-20" "2000-10-27" "2000-11-03" "2000-11-10" "2000-11-17" "2000-11-24" "2000-12-01" "2000-12-08" "2000-12-15" "2000-12-22" "2000-12-29"
There are probably more elegant ways to do this, but here's one way to generate a vector of Fridays, given any year.
year = 2007
st <- as.POSIXlt(paste0(year, "/1/01"))
en <- as.Date(paste0(year, "/12/31"))
#get to the next Friday
skip_ahead <- 5 - st$wday
if(st$wday == 6) skip_ahead <- 6 #for Saturdays, skip 6 days ahead.
first.friday <- as.Date(st) + skip_ahead
dates <- seq(first.friday, to=en, by ="7 days")
dates
#[1] "2007-01-05" "2007-01-12" "2007-01-19" "2007-01-26"
# [5] "2007-02-02" "2007-02-09" "2007-02-16" "2007-02-23"
# [9] "2007-03-02" "2007-03-09" "2007-03-16" "2007-03-23"
I think this would be the most efficient way, and it also returns all the Fridays in the whole of 2013.
FirstWeek <- seq(as.Date("2013/1/1"), as.Date("2013/1/7"), "days")
seq(
FirstWeek[weekdays(FirstWeek) == "Friday"],
as.Date("2013/12/31"),
by = "week"
)
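One caveat with the answers that compare weekdays() to "Friday": weekdays() returns day names in the current locale, so the comparison can fail on a non-English system. A sketch that uses the numeric "%w" weekday instead (0 = Sunday, so 5 = Friday), checks it against the step-by-week approach, and wraps the result in a data frame as the question asks; the variable names are illustrative.

```r
# All days in 2013
days <- seq(as.Date("2013-01-01"), as.Date("2013-12-31"), by = "day")

# "%w" gives the weekday number as text, "0" (Sunday) to "6" (Saturday)
fridays <- days[format(days, "%w") == "5"]

# Same result as stepping by week from the first Friday
by_week <- seq(fridays[1], as.Date("2013-12-31"), by = "week")

length(fridays)            # 52
all(fridays == by_week)    # TRUE

# The data frame the question asks for
friday_df <- data.frame(date = fridays)
```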
