Add a variable including the day of the week

This may look like a duplicate, but I haven't found an answer to this exact question yet.
I have this dataframe:
Day.of.the.month Month Year Items Amount.in.euros
1 1 January 2005 Nothing 0.00
2 2 February 2008 Food 7.78
3 3 April 2009 Nothing 0.00
4 4 January 2016 Bus 2.00
I want to create a column named "day.of.the.week" containing "saturday", "sunday" and so on. If the date were formatted as '2012/02/02' I would have no problem, but with the date split across columns like this I don't know whether there is a direct way or a workaround.
Any hint?

Do you want this?
options(stringsAsFactors = FALSE)
df <- data.frame(x = c(1, 2, 3, 4), y = c("January", "February", "April", "January"), z = c(2005, 2008, 2009, 2016))
# %d is the day of the month, %B the full month name and %Y the four-digit year
weekdays(as.Date(paste(df$x, df$y, df$z), "%d %B %Y"))
Note: as a commenter pointed out, this solution is locale dependent, because %B and weekdays() use the month and day names of the current locale. If that is a problem, you can call Sys.setlocale("LC_TIME", "C") to change the setting, and Sys.getlocale() to inspect the current one.
To make this permanent for every R session, add the following to your .Rprofile file (usually located in your home directory; on Windows that is typically the Documents folder):
.First <- function() {
Sys.setlocale("LC_TIME", "C")
}
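If changing the locale is not an option, here is a minimal sketch that avoids locale-dependent parsing altogether. It assumes the month names in the data are always the English ones, so they can be matched against R's built-in month.name, and it looks the weekday up from as.POSIXlt()$wday (0 = Sunday). The names expenses and day_names are made up for this example.

```r
# Sample data using the column names from the question
expenses <- data.frame(
  Day.of.the.month = c(1, 2, 3, 4),
  Month = c("January", "February", "April", "January"),
  Year = c(2005, 2008, 2009, 2016),
  stringsAsFactors = FALSE
)

# Convert English month names to month numbers without relying on the locale
month_num <- match(expenses$Month, month.name)

# ISOdate() builds a POSIXct at 12:00 GMT, which as.Date() maps to the same calendar day
dates <- as.Date(ISOdate(expenses$Year, month_num, expenses$Day.of.the.month))

# $wday is 0 for Sunday up to 6 for Saturday, so add 1 to index the lookup vector
day_names <- c("sunday", "monday", "tuesday", "wednesday",
               "thursday", "friday", "saturday")
expenses$day.of.the.week <- day_names[as.POSIXlt(dates)$wday + 1]

print(expenses)
```

A month name that is not in month.name gives NA for that row instead of an error, so it is worth checking anyNA(dates) on real data.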

Day.of.the.month <- c(1, 2, 3, 4)
Month <- c("January", "February", "April", "January")
Year <- c(2005, 2008, 2009, 2016)
Items <- c("Nothing", "Food", "Nothing", "Bus")
Amount.in.euros <- c(0.00, 7.78, 0.00, 2.00)
# Parse "Year Month Day" into Date objects; %B expects month names in the current locale
complete.date <- as.Date(paste(Year, Month, Day.of.the.month), format = "%Y %B %d")
example1.data <- data.frame(Day.of.the.month, Month, Year, Items, Amount.in.euros, complete.date)
example1.data$weekday <- weekdays(example1.data$complete.date)

Related

Create vector through ifelse in R

I've got two values in my global environment that are used to build column names starting with Priority_: one is called week and the other rest.
week can be any number from 01 to 52,
while rest can be Priority_2018, Priority_2017 or Priority_2016.
I'm creating an ifelse statement so that if the week is 3 and the year is 2017, the output vector rolls back into year 2016, week 52. That case corresponds to week = 3 and rest = "Priority_2017".
This is the code I use to create the vector:
test<-function(year){
c(paste0(rest, seq(week, by = -1, length.out=3)),paste0(year,52))
}
Then because the year before 2017 is 2016 I enter:
test("Priority_2016")
Gives the output:
[1] "Priority_20173" "Priority_20172" "Priority_20171" "Priority_201652"
I want to wrap this in an ifelse statement because rest and week will change. I tried the code below:
test<-function(year){
ifelse((rest %in% c("Priority_2018", "Priority_2017","Priority_2016") & week ==3), c(paste0(rest, seq(week, by = -1, length.out=3)),paste0(year,52)),0)
}
But when I put in:
test("Priority_2016")
this only outputs:
[1] "Priority_20173"
Please let me know if you need any more information from me.
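A likely explanation, sketched below: ifelse() is vectorised over its test and returns a result of the same length as the test. The condition here has length one, so only the first element of the "yes" vector comes back. A plain if/else returns the whole vector. The values of rest and week are taken from the question; nothing else is assumed.

```r
rest <- "Priority_2017"  # globals as described in the question
week <- 3

# A length-one test gives a length-one result, whatever the length of "yes"
short <- ifelse(week == 3, c("a", "b", "c"), 0)
print(short)

test <- function(year) {
  valid_rest <- c("Priority_2018", "Priority_2017", "Priority_2016")
  if (rest %in% valid_rest && week == 3) {
    # Three weeks counting down from `week`, then week 52 of the previous year
    c(paste0(rest, seq(week, by = -1, length.out = 3)), paste0(year, 52))
  } else {
    0
  }
}

result <- test("Priority_2016")
print(result)
```

Here if/else is used because the condition is a single TRUE or FALSE; ifelse() is the better fit when a whole vector of conditions has to be tested element by element.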

Creating a unified time-series, with dates coming from different (natural) languages

I am using the as.Date function as follows:
x$time_date <- as.Date(x$time_date, format = "%H:%M - %d %b %Y")
This worked fine until I saw a lot of NA values in the output, which I traced back to some of the dates stemming from a different language: German.
My English dates look like this: 18:00 - 10 Dec 2014
Where the German equivalent is: 18:00 - 10 Dez 2014
The month December is abbreviated the German way. This is not recognised by the as.Date function. I have the same problem for five other months:
Mar - März
May - Mai
Jun - Juni
Jul - Juli
Oct - Okt
This looks like it would be of use, but I am unsure of how to implement it for 'unrecognised' formats:
How to change multiple Date formats in same column
I attempted to use gsub to replace all the occurrences of German months, but without luck. x below is the data.table, and I work on just the time_date column:
x$time_date <- gsub("(März)?", "Mar", x$time_date) %>%
gsub("(Mai)?", "May", .) %>%
gsub("(Juni)?", "Jun", .) %>%
gsub("(Juli)?", "Jul", .) %>%
gsub("(Okt)?", "Oct", .) %>%
gsub("(Dez)?", "Dec", .)
Not only did this not work, but it is also a very slow process and I have nearly 20 GB of pure .csv files to work through.
The as.Date documentation mentions different locales / languages, but not how to work with several at once. I also found instructions on how to use different languages, but my data is all mixed, so I can only think of a conditional loop that picks the correct language for each file, and that would also be slow.
Is there a known workaround for this, which I can't find?

Create a table tab that contains all the translations, and then use subscripting to do the actual translation. The code below seems to work for me on Windows, provided your input abbreviations are the same as the standard ones generated. The precise language names ("German", etc.) may vary depending on your system; see ?Sys.setlocale for more information.
If the abbreviations in your input differ from the ones generated here, you will have to add those to tab yourself, e.g. tab <- c(tab, Juli = "Jul")
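langs <- c("French", "German", "English")
old_locale <- Sys.getlocale("LC_TIME") # remember the current setting
tab <- unlist(lapply(langs, function(lang) {
Sys.setlocale("LC_TIME", lang)
nms <- format(ISOdate(2000, 1:12, 1), "%b")
setNames(month.abb, nms)
}))
Sys.setlocale("LC_TIME", old_locale) # restore it
tab <- c(tab, Juli = "Jul", Juni = "Jun") # spellings in the data that are not the standard abbreviations
x <- c("18:00 - 10 Juli 2014", "18:00 - 10 Mai 2014") # test input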
source_month <- gsub("[^[:alpha:]]", "", x)
mapply(sub, source_month, tab[source_month], x, USE.NAMES = FALSE)
giving:
[1] "18:00 - 10 Jul 2014" "18:00 - 10 May 2014"
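The gsub attempt in the question probably failed because of the trailing ?: a pattern such as "(Mai)?" makes the whole group optional, so it also matches the empty string at every position and the replacement is inserted everywhere. Below is a minimal sketch of a locale-free alternative that uses literal (fixed) replacements from a hand-written lookup table. The table de_to_en and the helper translate_months are hypothetical names for this example, and the table only covers the six months listed in the question.

```r
# Hypothetical lookup table: German spellings seen in the data -> English abbreviations
de_to_en <- c("M\u00e4rz" = "Mar", "Mai" = "May", "Juni" = "Jun",
              "Juli" = "Jul", "Okt" = "Oct", "Dez" = "Dec")

translate_months <- function(x, lookup = de_to_en) {
  for (de in names(lookup)) {
    # fixed = TRUE treats the pattern as a literal string, not a regular expression
    x <- gsub(de, lookup[[de]], x, fixed = TRUE)
  }
  x
}

x <- c("18:00 - 10 Dez 2014", "18:00 - 10 Dec 2014", "09:30 - 3 Juli 2014")
x_en <- translate_months(x)

# %b is locale dependent, so parse under the "C" locale and restore the old one afterwards
old_locale <- Sys.getlocale("LC_TIME")
invisible(Sys.setlocale("LC_TIME", "C"))
parsed <- as.Date(x_en, format = "%H:%M - %d %b %Y")
invisible(Sys.setlocale("LC_TIME", old_locale))

print(parsed)
```

Fixed-string matching is usually faster than regular expressions, which may matter with around 20 GB of input, although that has not been measured here.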

How do I change the index in a csv file to a proper time format?

I have a CSV file of 1700 daily prices.
They are of this format:
1 1.6
2 2.5
3 0.2
4 ..
5 ..
6
7 ..
.
.
1700 1.3
The index runs from 1 to 1700,
but I need to specify a start date and an end date:
the start of the period is, let's say, 25 January 2009,
and the last (1700th) value corresponds to 14 May 2013.
So far I have got this close:
> dseries <- ts(dseries[,1], start = ??time??, freq = 30)
How do I go about this? Thanks.
UPDATE:
I managed to create a separate object with dates as suggested in the answers and plotted it, but the y axis looks odd, as shown in the screenshot

Something like this?
as.Date("25-01-2009", format = "%d-%m-%Y") + (0:1699)
A better way, thanks to @AnandaMahto:
seq(as.Date("2009-01-25"), by="1 day", length.out=1700)
Plotting:
df <- data.frame(
myDate=seq(as.Date("2009-01-25"), by="1 day", length.out=1700),
myPrice=runif(1700)
)
plot(df)

R stores Date-classed objects as the number of days since "1970-01-01", but the as.Date.numeric function needs an offset ('origin'), which can be any starting date:
rDate <- as.Date.numeric(dseries[,1], origin="2009-01-24")
Testing:
> rDate <- as.Date.numeric(1:10, origin="2009-01-24")
> rDate
[1] "2009-01-25" "2009-01-26" "2009-01-27" "2009-01-28" "2009-01-29"
[6] "2009-01-30" "2009-01-31" "2009-02-01" "2009-02-02" "2009-02-03"
You don't need to add the .numeric suffix, since R automatically dispatches to that method when you call the generic, as.Date, with a numeric argument. I only included it because as.Date.numeric has different arguments than as.Date.character.
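As a quick check, the sketch below builds the 1700 dates both ways (a sequence from the start date, and an integer index with an origin one day earlier) and confirms that they agree. It also counts the calendar days between the two dates given in the question. The variable names are made up for this example.

```r
n <- 1700
start <- as.Date("2009-01-25")

# Approach 1: a daily sequence starting at the first date
by_seq <- seq(start, by = "day", length.out = n)

# Approach 2: treat the 1..n index as day offsets from the day before the start
by_origin <- as.Date(seq_len(n), origin = start - 1)

print(all(by_seq == by_origin))

# Calendar days from the stated start to the stated end, inclusive
span_days <- as.integer(as.Date("2013-05-14") - start) + 1
print(span_days)
print(tail(by_seq, 1))
```

The two dates in the question span fewer than 1700 calendar days, so one value per calendar day from 25 January 2009 runs past 14 May 2013. If the real series skips or repeats days, a plain daily sequence will not line up with it and the actual dates would have to come from the data source.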

Date sequence in R spanning B.C.E. to A.D

I would like to generate a sequence of dates from 10,000 B.C.E. to the present. This is easy from 0 C.E. (or A.D.) onwards:
ADtoNow <- seq.Date(from = as.Date("0/1/1"), to = Sys.Date(), by = "day")
But I am stumped as to how to generate dates before 0 AD. Obviously, I could do years before present but it would be nice to be able to graph something as BCE and AD.

To expand on Ricardo's suggestion, here is some testing of how things work. Or don't work for that matter.
I will repeat Joshua's warning taken from ?as.Date for future searchers in big bold letters:
"Note: Years before 1CE (aka 1AD) will probably not be handled correctly."
as.integer(as.Date("0/1/1"))
[1] -719528
as.integer(seq(as.Date("0/1/1"),length=2,by="-10000 years"))
[1] -719528 -4371953
seq(as.Date(-4371953,origin="1970-01-01"),Sys.Date(),by="1000 years")
# nonsense
[1] "0000-01-01" "'000-01-01" "(000-01-01" ")000-01-01" "*000-01-01"
[6] "+000-01-01" ",000-01-01" "-000-01-01" ".000-01-01" "/000-01-01"
[11] "0000-01-01" "1000-01-01" "2000-01-01"
> as.integer(seq(as.Date(-4371953,origin="1970-01-01"),Sys.Date(),by="1000 years"))
# also possibly nonsense
[1] -4371953 -4006710 -3641468 -3276225 -2910983 -2545740 -2180498 -1815255
[9] -1450013 -1084770 -719528 -354285 10957
Though this does seem to work for graphing somewhat:
yrs1000 <- seq(as.Date(-4371953,origin="1970-01-01"),Sys.Date(),by="1000 years")
plot(yrs1000,rep(1,length(yrs1000)),axes=FALSE,ann=FALSE)
box()
axis(2)
axis(1,at=yrs1000,labels=c(paste(seq(10000,1000,by=-1000),"BC",sep=""),"0AD","1000AD","2000AD"))
title(xlab="Year",ylab="Value")

Quite some time has gone by since this question was asked. With that time came a new R package, gregorian, which can handle BCE time values in the as_gregorian method.
Here's an example of piecewise constructing a list of dates that range from 10000 BCE to the current year.
library(lubridate)
library(gregorian)
# Container for the dates
dates <- c()
starting_year <- year(now())
# Add the CE dates to the list
for (year in starting_year:0){
date <- sprintf("%s-%s-%s", year, "1", "1")
dates <- c(dates, gregorian::as_gregorian(date))
}
starting_year <- -10000
# Add the BCE dates to the list
for (year in starting_year:0){
date <- sprintf("%s-%s-%s", year, "1", "1")
dates <- c(dates, gregorian::as_gregorian(date))
}
How you use the list is up to you, just know that the relevant properties of the date objects are year and bce. For example, you can loop over list of dates, parse the year, and determine if it's BCE or not.
> gregorian_date <- gregorian::as_gregorian("-10000-1-1")
> gregorian_date$bce
[1] TRUE
> gregorian_date$year
[1] 10001
Notes on 0AD
The gregorian package assumes that when you say Year 0, you really mean year 1 (shown below). I personally think an exception should be thrown, but that's the mapping users need to keep in mind.
> gregorian::as_gregorian("0-1-1")
[1] "Monday January 1, 1 CE"
This is also the case with BCE
> gregorian::as_gregorian("-0-1-1")
[1] "Saturday January 1, 1 BCE"

As @JoshuaUlrich commented, the short answer is no.
However, you can split the year out into a separate column and then convert it to integer. Would this work for you?
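A minimal sketch of that idea: keep the years as plain integers and only turn them into BCE/CE text when labelling. The sketch assumes astronomical year numbering, where 1 is 1 CE, 0 is 1 BCE and -1 is 2 BCE. That convention and the helper name label_year are choices made for this example, not something the question specifies.

```r
# Astronomical year numbering (an assumption of this sketch):
#   1 = 1 CE, 0 = 1 BCE, -1 = 2 BCE, ...
label_year <- function(y) {
  y <- as.integer(y)
  ifelse(y > 0L, paste(y, "CE"), paste(1L - y, "BCE"))
}

# One tick every 1000 years from 10000 BCE onwards
years <- seq(-9999L, 2001L, by = 1000L)
labels <- label_year(years)
print(labels)

# Plot dummy values against the integer years, with BCE/CE tick labels
if (interactive()) {
  plot(years, rep(1, length(years)), xaxt = "n", xlab = "Year", ylab = "Value")
  axis(1, at = years, labels = labels)
}
```

Integer years lose month and day resolution, so this only suits data that is yearly or coarser.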

The package lubridate seems to handle "negative" years OK, although it does create a year 0, which from the above comments seems to be inaccurate. Try:
library(lubridate)
start_year <- -10000
end_year <- 2013
# ymd() is vectorised, so no loop is needed; growing the vector with c(NULL, ...) in a loop would also drop the Date class
myrange <- ymd(paste0(start_year:end_year, "-01-01"))

R time series data, daily only working days

I am using the following code:
dates<-seq(as.Date("1991/1/4"),as.Date("2010/3/1"),"days")
However, I would like to keep only working days. How can this be done?
(Assuming that 1991/1/4 is a Monday, I would like to exclude 1991/6/4 and 1991/7/4,
and likewise for every week.)
Thank you for your help.

Would this work for you? (Note: it requires the timeDate package to be installed.)
# install.packages('timeDate')
library(timeDate)
# A 'timeDate' sequence
tS <- timeSequence(as.Date("1991/1/4"), as.Date("2010/3/1"))
tS
# Subset weekdays
tW <- tS[isWeekday(tS)]
tW
dayOfWeek(tW)

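Check how your dates are being read. 1991/1/4 is a Monday only if the input is in year/day/month order (1 April 1991). Both as.Date's default and the "%Y/%m/%d" format used below read it as 4 January 1991, which is a Friday. If your data really are in year/day/month order, use format = "%Y/%d/%m" instead.
So the full solution, assuming you want to exclude weekends, is: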
X <- seq( as.Date("1991/1/4", format="%Y/%m/%d"), as.Date("2010/3/1", format="%Y/%m/%d"),"days")
weekdays.X <- X[ ! weekdays(X) %in% c("Saturday", "Sunday") ]
# negation easier since only two cases in exclusion
# probably do not want to print that vector to screen.
str(weekdays.X)
Regarding your comment: I am unable to reproduce the problem. I get:
> table(weekdays(weekdays.X) )
Friday Monday Thursday Tuesday Wednesday
1000 1000 999 999 999
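Because weekdays() returns day names in the current locale, the comparison with "Saturday" and "Sunday" only works in an English locale. A minimal locale-independent sketch uses the numeric weekday from as.POSIXlt()$wday instead, where 0 is Sunday and 6 is Saturday. The dates are read as year/month/day, as in the code above, and the variable names are made up for this example.

```r
# Daily sequence over the same period as above, read as year-month-day
all_days <- seq(as.Date("1991-01-04"), as.Date("2010-03-01"), by = "day")

# Numeric weekday: 0 = Sunday, 1 = Monday, ..., 6 = Saturday (independent of locale)
wday <- as.POSIXlt(all_days)$wday

# Keep Monday to Friday only
working_days <- all_days[!wday %in% c(0, 6)]

print(length(working_days))
print(table(as.POSIXlt(working_days)$wday))
```

This removes weekends only; public holidays would need a separate list of dates to exclude.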

I came to this question while looking up business day functions, and since the OP requested "business days" instead of "weekdays", and timeDate also has the isBizday function, this answer uses that.
library(timeDate)
# A timeDate sequence: a short example period with three London holidays
date.sequence <- timeSequence(as.Date("1991-12-15"), as.Date("1992-01-15"))
date.sequence
# Holidays in the period
years.included <- unique(as.integer(format(x = date.sequence, format = "%Y")))
# The OP did not specify a locale, so this assumes holidayLONDON as an example;
# timeDate also provides holidayNERC, holidayNYSE, holidayTSX and holidayZURICH
holidays <- holidayLONDON(years.included)
# Subset business days
business.days <- date.sequence[isBizday(date.sequence, holidays)]
business.days
