making 2 digit month and day columns in R - r

I'm very new to R, so this might seem straightforward. But I have a data frame with an original date column that has values that look like this: 4-02-91, 5-29-93 (i.e. m-d-y). I am trying to separate this column into 3, where months, days, and years are separate. Then I need to combine them again to this format 19910402, 19930529 - I need it this way in order to compare it to another dataset with similar dates.
Here is what I've been trying to do:
# Make DATE an actual date column
dataframe$DATE <- as.Date(used$DATE, format="%m-%d-%Y")
# This changes the original date column into something that looks like this: 1991-04-02, 1993-05-29
# Separate DATE into multiple columns
dataframe$year <- year(dataframe$DATE)
dataframe$month <- month(dataframe$DATE)
dataframe$day <- day(dataframe$DATE)
# Combine dates again to get string
dataframe$raster_date<-paste(dataframe$year, dataframe$month, dataframe$day, sep = "")
The last step looks great except where the months or days are single digits. It's coming out as 199142 and 1993529 instead of 19910402 and 19930529. How do I insert zeros when the month and day values are 1 digit?

Here, we can use sprintf instead of paste as the year, month, day from lubridate extracts those as numeric values and numeric class would drop the 0 padded as prefix. We add those prefix with 0s in sprintf
sprintf("%04d%02d%02d", dataframe$year, dataframe$month, dataframe$day)

Related

Subset a dataframe based on numerical values of a string inside a variable

I have a data frame which is a time series of meteorological measurement with monthly resolution from 1961 till 2018. I am interested in the variable that measures the monthly average temperature since I need the multi-annual average temperature for the summers.
To do this I must filter from the "DateVaraible" column the fifth and sixth digit, which are the month.
The values in time column are formatted like this
"19610701". So I need the 07(Juli) after 1961.
I start coding for 1 month for other purposes, so I did not try anything worth to mention. I guess that .grepl could do the work, but I do not know how the "matching" operator works.
So I started with this code that works.
summersmonth<- Df[DateVariable %like% "19610101" I DateVariable %like% "19610201"]
I am expecting a code like this
summermonths <- Df[DateVariable %like% "**06**" I DateVariable%like% "**07**..]
So that all entries with month digit from 06 to 09 are saved in the new dataframe summermonths.
Thanks in advance for any reply or feedback regarding my question.
Update
Thank to your answers I got the first part, which is to convert the variable in a as.date with the format "month"(Class=char)
Now I need to select months from Juni to September .
A horrible way to get the result I wanted is to do several subset and a rbind afterward.
Sommer1<-subset(Df, MonthVar == "Mai")
Sommer2<-subset(Df, MonthVar == "Juli")
Sommer3<-subset(Df, MonthVar == "September")
SummerTotal<-rbind(Sommer1,Sommer2,Sommer3)
I would be very glad to see this written in a tidy way.
Update 2 - Solution
Here is the tidy way, as here Using multiple criteria in subset function and logical operators
Veg_Seas<-subset(Df, subset = MonthVar %in% c("Mai","Juni","Juli","August","September"))
You can convert your date variable as date (format) and take the month:
allmonths <- month(as.Date(Df$DateVariable, format="%Y%m%d"))
Note that of your column has been originally imported as factor you need to convert it to character first:
allmonths <- month(as.Date(as.character(Df$DateVariable), format="%Y%m%d"))
Then you can check whether it is a summermonth:
summersmonth <- Df[allmonths %in% 6:9, ]
Example:
as.Date("20190702", format="%Y%m%d")
[1] "2019-07-02"
month(as.Date("20190702", format="%Y%m%d"))
[1] 7
We can use anydate from anytime to convert to Date class and then extract the month
library(anytime)
month(anydate(as.character(Df$DateVariable)))

Date.Time column split

I am trying to split the Date.Time column in my data table to separate date and time columns. currently the column is as character.
this is what I already tried but it just gave me a column with 2019 dates. I don't want the year to be 2019 so doesn't work. even if it does, not sure how to get the time to a separate column
office$date <- as.Date(office$Date.Time, format = '%m/%d')
office$date <- as.Date(office$Date.Time, format = '%m/%d')
Date require the year field. You can remove the year field, but the result will be a character, not the date format.
office$date <- as.Date(office$Date.Time, format = 'Y%/%m/%d')
office$date <- as.character(gsub("^.{5}","",office$date))

function in R that creates dummies for given time period

There is a data frame like this:
The first two columns in the df describe the start date (month and year) and the end date (month and year). Column names describe every single month and year of a certain time period.
I need a function/loop that insterts "1" or "0" in each cell - "1" when the date from given column name is within the period described by the two first columns, and "0" if not.
I would appreciate any help.
You want to do two different things. (a) create a dummy variable and (b) see if a particular date is in an interval.
Making a dummy variable is the easiest one, in base R you can use ifelse. For example in the iris data frame:
iris$dummy <- ifelse(iris$Sepal.Width > 2.5, 1, 0)
Now working with dates is more complicated. In this answer we will use the library lubridate. First you need to convert all those dates to a format 'Month Year' to something that R can understand. For example for February you could do:
new_format_february_2016 <- interval(ymd('2016-02-01'), ymd('2016-03-01') - dseconds(1))
#[1] 2016-02-01 UTC--2016-02-29 23:59:59 UTC
This is February, the interval of time from the 1 of February to one second before the 1 of March. You can do the same with your start date column and you end date column.
To compare two intevals of time (so, to see if a particular month fall into your other intervals) you can do:
int_overlaps(new_format_february_2016, other_interval)
If this returns true, the two intervals (one particular month and another one) overlaps. This is not the same as one being inside another, but in your case it will work. Using this you can iterate over different columns and rows and build your dummy variable.
But before doing so, I would recommend to clean your data, as your current format is complicate to work with. To get all the power that vector types in R provides ideally you would want to have one row per observation and one variable per column. This does not seem to be the case with your data frame. Take a look to the chapter 'Tidy data' of 'R for Data Science' specially the spreading and gathering subsection:
Tidy data

Converting variables in form of "2015M01" to date format in R?

I have a date frame df that simply looks like this:
month values
2012M01 99904
2012M02 99616
2012M03 99530
2012M04 99500
2012M05 99380
2012M06 99103
2013M01 98533
2013M02 97600
2013M03 96431
2013M04 95369
2013M05 94527
2013M06 93783
with month that was written in form of "M01", "M02"... and so on.
Now I want to convert this column to date format, is there a way to do it in R with lubridate?
I also want to select columns that contain one certain month from each year, like only March columns from all these years, what is the best way to do it?
The short answer is that dates require a year, month and day, so you cannot convert directly to a date format. You have 2 options.
Option 1: convert to a year-month format using zoo::as.yearmon.
library(zoo)
df$yearmon <- as.yearmon(df$month, "%YM%m")
# you can get e.g. month from that
months(df$yearmon[1])
# [1] "January"
Option 2: convert to a date by assuming that the day is always the first day of the month.
df$date <- as.Date(paste(df$month, "01", sep = "-"), "%YM%m-%d")
For selection (and I think you mean select rows, not columns), you already have everything you need. For example, to select only March 2013:
library(dplyr)
df %>% filter(month == "2013M03")
Something like this will get it:
raw <- "2012M01"
dt <- strptime(raw,format = "%YM%m")
dt will be in a Posix format. The strptime function will assign a '1' as the default day of month to make it a complete date.

R - fill in values for all dates

I have a data set with sales by date, where date is not unique and not all dates are represented: my data set has dates (the date of the sale), quantity, and totalprice. This is an irregular time series.
What I'd like is a vector of sales by date, with every date represented exactly once, and quantities and totalprice summed by date, with zeros where there are no sales.
I have part of this now; I can make a sequence containing all dates:
first_date=as.Date(min(dates))
last_date=as.Date(max(dates))
all_dates=seq(first_date, by=1, to=last_date)
And I can aggregate the sales data by sale date:
quantitybydate=aggregate(quantity, by=list(as.Date(dates)), sum)
But not sure what to do next. If this were python I'd loop through one of the dates arrays, setting or getting the related quantity. But this being R I suspect there's a better way.
Make a dataframe with the all_dates as a column, then merge with quantitybydate using the by variable columns as the by.y, and all.x=TRUE. Then replace the NA's by 0.

Resources