The dataset looks pretty much like this
I searched around but found only the function that needs a delimiter. I managed to import the file to R successfully with two columns.
Then I want to separate the DATE column into "Year", "Month", and "Date", so I want to have 4 columns in total. This is where I got stuck: the column doesn't have the "/" or "-" separators that usually come with a date format.
Thanks for your help.
As @alistaire has shown, you can convert what you have to a date format that R recognises with the code below (replace the single character string with your column vector df$DATE to work on the entire data frame):
date <- as.Date( '19981201', '%Y%m%d' )
date
[1] "1998-12-01"
From there, you can separate out your year, month, and day as you please.
year <- format( date, "%Y")
year
[1] "1998"
month <- format( date, "%m" )
month
[1] "12"
day <- format( date, "%d" )
day
[1] "01"
Of course, you could also skip the date step and just split the first 8 characters into 3 shorter strings (as @warmoverflow has suggested), but I'd recommend the above as probably better in practice, mostly because you'll be best off using the date format for things like sorting and plotting, so it is a good idea to use it along the way too.
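For comparison, here is a minimal sketch of the string-splitting route using base R's substr(). The variable names are only for illustration. It also shows one reason the date route is safer: as.Date() rejects an impossible date, while substr() returns whatever characters are in those positions.

```r
raw <- "19981201"

# Split by character position: 1-4 year, 5-6 month, 7-8 day
year  <- substr(raw, 1, 4)
month <- substr(raw, 5, 6)
day   <- substr(raw, 7, 8)

# substr() does no validation, so a bad value passes through silently
bad <- "19981332"
bad_month <- substr(bad, 5, 6)           # "13", not a real month
bad_date  <- as.Date(bad, "%Y%m%d")      # NA, because the date does not exist
```

Checking for NA after as.Date() is a cheap way to spot malformed rows before splitting anything.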
IN RESPONSE TO YOUR ANSWER/FOLLOW-UP-QUESTION:
Notice in the console output in step 3, the column vector is labeled as class int (integer). You probably need to make sure it's being fed to as.Date as character. It looks like that's what you tried to do in step 4, but by surrounding the vector reference in quotations, you pass the character string "v1$DATE", which R has no idea what to do with. Instead:
v1$date_v2 <- as.Date( as.character( v1$DATE ), format = "%Y%m%d" )
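Putting the pieces together, a sketch of the whole workflow might look like the following. The data frame contents and the VALUE column name are made up for illustration; only v1 and DATE come from the question.

```r
# Hypothetical stand-in for the imported file: an integer DATE column plus one other column
v1 <- data.frame(DATE  = c(19981201L, 19981202L, 19990115L),
                 VALUE = c(0.5, 1.2, 0.8))

# Convert integer -> character -> Date (no quotes around v1$DATE)
parsed <- as.Date(as.character(v1$DATE), format = "%Y%m%d")

# Pull the three components out as character columns
v1$Year  <- format(parsed, "%Y")
v1$Month <- format(parsed, "%m")
v1$Day   <- format(parsed, "%d")

# Keep the four columns asked for in the question
v1 <- v1[, c("Year", "Month", "Day", "VALUE")]
```

If you need the components as numbers rather than strings, wrap each format() call in as.integer().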
Related
I have a large xts object with multiple variables. The index is daily, in the sense that it corresponds to exact days, but there is only one observation for each variable in a month. Is there a way to drop the day from the index and keep only year-month?
To illustrate my problem: I have var1 with an observation on 2011-06-28 and var2 with an observation on 2011-06-30. I would like to index both as 2011-06.
Thanks
Alternatively, you could "tell" R that you are using dates of a certain format with the as.Date() function and then use format() to change them to the format you want.
Like this:
dates=c("2011-06-28","2011-06-29","2011-06-30","2011-07-1") #test string with dates in original format
dates2 <- format(as.Date(dates,"%Y-%m-%d"), format="%Y-%m") #changing the "%Y-%m-%d" format to the desired "%Y-%m"
print(dates2)
Edit: If you only want to change how the index of an xts object is displayed:
indexFormat(xts_object) <- "%Y-%m"
Cheers
Chris
You can probably do this:
Use gsub (replace a pattern with whatever you want) with regex (a sequence of characters that define a search pattern in e.g. a string).
The pattern is done with regex, which has lots of metacharacters that allow you to do more advanced things. The dot (.) is a wildcard and the $ anchors it at the back. So the pattern is basically any 3 characters before the end and replace them with nothing.
your_object<-c("2011-06-28","2011-06-30")
gsub(pattern = "...$", replacement = "", x = your_object)
Here is a guide for using gsub with regex (http://uc-r.github.io/regex).
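One caveat with the "any 3 characters" pattern is that it assumes the day is always two digits. A slightly stricter sketch, assuming the strings end in a dash followed by a one- or two-digit day, would be:

```r
your_object <- c("2011-06-28", "2011-06-30", "2011-07-1")

# Remove only a trailing "-D" or "-DD"; sub() is enough since there is one match per string
year_month <- sub("-[0-9]{1,2}$", "", your_object)

# For comparison, the three-wildcard pattern cuts into the month when the day has one digit
loose <- gsub("...$", "", "2011-07-1")
```

Here year_month comes out as "2011-06" "2011-06" "2011-07", while loose is "2011-0".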
I have a dataset in R with a column called event_date.
The values look like this:
31-Dec-18
30-Dec-18
28-Dec-18
And so on.
I want to create a new column called date where I separate out the day of the event. So it looks like:
31
30
28
I'm pretty new to working with R, so I'm wondering whether a for loop is the way to go, or if there's a more efficient way I don't know about.
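If the dates are of type character, capture the leading day field (the first of the three dash-separated parts):
df$date <- sub("^([0-9]+)-.*$", "\\1", df$event_date)
Otherwise, you can look into converting the column to a date class object in R.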
If the days always have two digits, then substr would be faster
df$day <- substr(df$event_date, 1, 2)
Or convert to Date class and extract the day
df$day <- format(as.Date(df$event_date, "%d-%b-%y"), "%d")
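To answer the for-loop part of the question: none of these need a loop, because sub(), substr() and format() are all vectorized over the column. A small sketch on made-up data (the extra "5-Jan-19" row is my own addition, to show a one-digit day):

```r
df <- data.frame(event_date = c("31-Dec-18", "30-Dec-18", "28-Dec-18", "5-Jan-19"),
                 stringsAsFactors = FALSE)

# Everything before the first dash is the day; as.integer() makes it numeric
df$date <- as.integer(sub("^([0-9]+)-.*$", "\\1", df$event_date))

# substr() takes fixed positions, so a one-digit day picks up the dash
fixed_width <- substr(df$event_date, 1, 2)
```

The regex version gives 31, 30, 28, 5, whereas fixed_width ends with "5-" for the last row, so substr() is only safe when every day is zero-padded.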
I'm trying to convert a column from character to date.
The current format in the column is "YearMmonth", as in "1990M01".
It is a really weird format, so I'm wondering how R reads this when I use the as.Date code. It is basically "Year Month Month-number". I know how to use the rest of the code; I just need to know how to translate this to R.
I have tried using
df <- as.Date(df, "%YM%m", "%Y/%m")
df <- as.Date(paste0("01-", df), format = "%Y/%m/%d")
and a lot of others. The main problem is translating the character column.
There are several problems with the code in the question:
The first attempt in the question does not have a day field.
The second attempt does, but it puts the day field first while the format says it is last.
The use of df in the question suggests that the result is a data frame, yet the result is a Date class object, not a data frame.
Here are some approaches that work.
yearmon
Use as.yearmon to convert to a yearmon object or as.Date(as.yearmon(...)) to convert to a Date object. yearmon objects directly represent year and month without day so that may be preferable.
library(zoo)
as.yearmon("1990M01", "%YM%m")
## [1] "Jan 1990"
as.Date(as.yearmon("1990M01", "%YM%m"))
## [1] "1990-01-01"
Replacing the M with a minus would also work:
as.yearmon(chartr("M", "-", "1990M01"))
## [1] "Jan 1990"
Base R
A way that does not involve any packages is to append a day:
as.Date(paste("1990M01", 1), "%YM%m %d")
## [1] "1990-01-01"
or change the M to a minus and append a minus and a 1 in which case it is already in the default format and no format string is needed.
as.Date(sub("M(\\d+)", "-\\1-1", "1990M01"))
## [1] "1990-01-01"
I am trying to convert integer data from my data frame in R, to date format.
The data is under column named svcg_cycle within orig_svcg_filtered data frame.
The original data looks something like 200502, 200503, and so forth, and I expect to turn it into yyyy-mm-dd format.
I am trying to use this code:
as.Date(orig_svcg_filtered$svcg_cycle, origin = "2000-01-01")
but the output is not something that I expected:
[1] "2548-12-15" "2548-12-15" "2548-12-15" "2548-12-15" "2548-12-15"
while it is supposed to be 2005-02-01, 2005-03-01, and so forth.
How to solve this?
If you have
x <- c(200502, 200503)
Then
as.Date(x, origin = "2000-01-01")
tells R you want the dates that fall 200,502 and 200,503 days after 2000-01-01. From help("as.Date"):
as.Date will accept numeric data (the number of days since an epoch),
but only if origin is supplied.
So, integer data gives days after the origin supplied, not some sort of numeric code for the dates like 200502 for "2005-02-01".
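A quick way to convince yourself of this is to try a few small numbers with the same origin:

```r
# Numeric input is a count of days from the origin
d0  <- as.Date(0,  origin = "2000-01-01")      # the origin itself
d31 <- as.Date(31, origin = "2000-01-01")      # 31 days later

# 200502 is treated the same way: roughly 549 years' worth of days
far <- as.Date(200502, origin = "2000-01-01")
```

d0 is 2000-01-01, d31 is 2000-02-01, and far is 2548-12-15, which is exactly the unexpected output in the question.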
What you want is
as.Date(paste(substr(x, 1, 4), substr(x, 5, 6), "01", sep = "-"))
# [1] "2005-02-01" "2005-03-01"
The
paste(substr(x, 1, 4), substr(x, 5, 6), "01", sep = "-")
part takes your integers and creates strings like
# [1] "2005-02-01" "2005-03-01"
Then as.Date() knows how to deal with them.
You could alternatively do something like
as.Date(paste0(x, "01"), format = "%Y%m%d")
# [1] "2005-02-01" "2005-03-01"
This just pastes on an "01" to each element (for the day), converts to character, and tells as.Date() what format to read the date into. (See help("as.Date") and help("strptime")).
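If you would rather avoid string slicing altogether, a different technique (not the one above) is to pull the year and month out with integer arithmetic. This is only a sketch and assumes every value really is a six-digit yyyymm code:

```r
x <- c(200502, 200503)

# Integer division and remainder by 100 separate yyyy from mm
year  <- as.integer(x) %/% 100L
month <- as.integer(x) %% 100L

# Fail early if any value is not a plausible month
stopifnot(all(month >= 1L & month <= 12L))

# Build "yyyy-mm-01" strings, which as.Date() reads without a format argument
dates <- as.Date(sprintf("%04d-%02d-01", year, month))
```

The explicit month check is the main benefit here: a stray code such as 200513 stops the script instead of quietly turning into NA.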
I like to use regex to fix these kinds of string formatting issues. By default, as.Date only checks for a few standard date formats like YYYY-MM-DD. origin is used when you have a numeric date (i.e. a number of days from some reference point), but in this case your date is not a numeric date at all; it's just a date written as a string of digits.
We simply split the year and month with a dash and add a day, in this case the first of the month, to make it a valid date (you must have a day to store it as a Date object in R). The regex captures the first four digits in group one and the final two digits in group two. We then combine the two groups, separated by dashes, along with the day.
as.Date(gsub("^(\\d{4})(\\d{2})", "\\1-\\2-01", x))
[1] "2005-02-01" "2005-03-01"
You don't need to specify format in this case because YYYY-MM-DD is one of the standard formats as.Date checks. If you did want to be explicit, the argument would be format = "%Y-%m-%d".
I am fairly new to R and need help with applying operations to an entire column in a dataframe. Imagine a few values of the date_time column in the df look like this:
date_time
2017-05-01T00:00:00.000Z
2017-05-01T10:00:00.000Z
2017-05-01T20:00:00.000Z
...
Currently date_time is of type factor. If everything was formatted nicely, I think I want to do something similar to what I have below to convert it to DateTime (based off of what I've been seeing online):
df$date_time <- strptime(x = as.character(df$date_time), format = "%Y-%m-%d %H:%M:%S")
Does this look correct?
Assuming the code above is correct for converting the factor into DateTime, we still need to do some formatting for that to work. In order to use the code above, I have to get rid of the T that separates the date and time and replace that with ' ', and cut off the .000Z at the end. How can I do this for the entire column?
Thanks!
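The cleanup described in the question can be sketched as below. This is a minimal sketch on made-up data, not a definitive answer. It assumes the timestamps are in UTC (the trailing Z) and swaps strptime() for as.POSIXct(), since the POSIXlt objects that strptime() returns are awkward to store in a data frame column.

```r
# Hypothetical data frame with the factor column from the question
df <- data.frame(date_time = factor(c("2017-05-01T00:00:00.000Z",
                                      "2017-05-01T10:00:00.000Z",
                                      "2017-05-01T20:00:00.000Z")))

# sub() is vectorized, so it works on the entire column at once
cleaned <- as.character(df$date_time)
cleaned <- sub("T", " ", cleaned, fixed = TRUE)   # replace the T separator with a space
cleaned <- sub("\\.[0-9]+Z$", "", cleaned)        # drop the trailing ".000Z"

# Parse with the format from the question; tz = "UTC" is an assumption based on the Z suffix
df$date_time <- as.POSIXct(cleaned, format = "%Y-%m-%d %H:%M:%S", tz = "UTC")
```

Converting the factor with as.character() first matters, because otherwise the underlying integer codes can end up being used instead of the labels.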