R: Daily data to monthly

I have a large xts object with multiple variables. The index is daily, in the sense that each entry corresponds to an exact day, but there is only one observation per variable in each month. Is there a way to drop the day from the index and keep only the year-month?
To illustrate my problem: var1 has an observation on 2011-06-28 and var2 has one on 2011-06-30. I would like to index both as 2011-06.
Thanks

Alternatively, you could tell R that your dates are in a certain format with the as.Date() function and then use format() to change them to the format you want.
Like this:
dates <- c("2011-06-28", "2011-06-29", "2011-06-30", "2011-07-1") # test vector with dates in the original format
dates2 <- format(as.Date(dates, "%Y-%m-%d"), format = "%Y-%m") # convert from "%Y-%m-%d" to the desired "%Y-%m"
print(dates2)
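Edit: If you only want to change how the index of an xts object is displayed (the underlying daily dates stay as they are):
# newer xts releases deprecate indexFormat() in favour of tformat()
indexFormat(xts_object) <- "%Y-%m"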
Cheers
Chris
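One caveat with format(): the result is a character vector, so it no longer behaves like a date. A minimal base R sketch of collapsing daily dates to a monthly key, using the two dates from the question (the variable names are illustrative, not from the original post):

```r
# Two observations from the same month, recorded on different days
days <- as.Date(c("2011-06-28", "2011-06-30"))

# Character key: both days collapse to the same year-month label
ym <- format(days, "%Y-%m")

# Date key: first day of the month, still a real Date
month_start <- as.Date(paste0(ym, "-01"))

print(ym)
print(month_start)
```

The Date version is the one that can still serve as a time index; the character version is only a label.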

You can probably do this:
Use gsub() (which replaces a pattern with whatever you want) together with a regex (a sequence of characters that defines a search pattern in, e.g., a string).
Regex has lots of metacharacters that allow you to do more advanced things. The dot (.) is a wildcard that matches any single character, and $ anchors the match at the end of the string. So the pattern "...$" matches the last 3 characters of each string, and they are replaced with nothing.
your_object <- c("2011-06-28", "2011-06-30")
gsub(pattern = "...$", replacement = "", x = your_object)
Here is a guide for using gsub with regex (http://uc-r.github.io/regex).
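Note that "...$" removes the last three characters whatever they are. A slightly stricter sketch, assuming every string ends in a hyphen followed by a two-digit day, anchors on that shape instead; substr() is shown as a regex-free alternative:

```r
your_object <- c("2011-06-28", "2011-06-30")

# Only strip a trailing "-DD"; strings without that suffix are left alone
ym_regex <- sub("-[0-9]{2}$", "", your_object)

# Fixed-width alternative: keep the first 7 characters ("YYYY-MM")
ym_substr <- substr(your_object, 1, 7)

print(ym_regex)
print(ym_substr)
```

sub() is enough here because the pattern can match at most once per string.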

Related

How to alter multiple objects in a data frame in R?

I am working with a database that has 6 columns: 'cik', 'company.name', 'form.type', 'date.filed', 'nword.hits' and 'ticker'. The 'date.filed' column has values of the following form: 2014-02-21; these are dates. I would like to extract only the year into a new column of this dataset, so that only the number 2014 remains. First I converted the column to character with the following code:
t <- transform(fdt, date.filed = as.character(date.filed))
Then I separated the numbers using the following command:
bb <- strsplit(t$date.filed, split = "-")
In this way, the variables became as follows: '2014''02''21'.
In order to extract the years, I used the following code:
ex11 <- substr(bb, start = 1, stop = 8)
oficial <- data.frame(ex11)
I was able to extract the years, however they looked like this: c("2014". I wonder if there is any way to remove the c, the parentheses and the strings.
Thank you in advance!
Instead of converting to character and then splitting or taking substrings of the dates, it may be better to convert to Date class and use its methods to extract those components:
# assuming the format is YYYY-MM-DD
fdt$date.filed <- as.Date(fdt$date.filed)
fdt$year <- as.integer(format(fdt$date.filed, "%Y"))
In the OP's code, the output of strsplit is a list of vectors. Instead of applying substr to that list (which is already split and only needs its first component extracted, as noted in the comments), we can apply substr to the original column:
substr(fdt$date.filed, 1, 4)
NOTE: For Date columns, the recommended solution is to use Date methods instead of regex or substring
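To see where the c("2014" text came from, here is a small sketch on made-up data (the fdt_demo data frame is hypothetical and stands in for the asker's fdt):

```r
# Hypothetical stand-in for the asker's data
fdt_demo <- data.frame(date.filed = c("2014-02-21", "2015-11-03"),
                       stringsAsFactors = FALSE)

# strsplit() returns a list with one character vector per row
parts <- strsplit(fdt_demo$date.filed, split = "-")

# substr() on a list deparses each element first, hence the 'c("2014"' text
broken <- substr(parts, start = 1, stop = 8)

# Taking the first piece of each list element gives the year as text
year_chr <- vapply(parts, `[`, character(1), 1)

# Date-based route from the answer above, giving an integer column
fdt_demo$year <- as.integer(format(as.Date(fdt_demo$date.filed), "%Y"))

print(broken)
print(fdt_demo)
```

The Date-based column is the one to keep; the vapply() line is only there to show how the split list could have been used.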

How to change dataframe R column from DD/MM/YYYY to YYYY/MM/DD without returning NAs

I have a dataframe that looks like this (see picture below). I want to change the date from DD/MM/YYYY to YYYY/MM/DD but for some reason it returns "NA" values! I think it has to do with the time values behind the date (I do not need those values).
The code I used was this (supposing DF is the data frame)
DF[,1] <- as.Date(DF[,1] , format = "%d-%m-%Y")
Gregor Thomas gave me the answer: The format you show in the picture has slashes /, but the format string you use has dashes -. Try format = "%d/%m/%Y"
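Putting that fix together, a minimal sketch on made-up values (the sample strings and the DF_demo name are assumptions, since the original data was only shown as a picture):

```r
# Hypothetical sample: DD/MM/YYYY followed by a time that is not needed
DF_demo <- data.frame(when = c("25/12/2020 14:30", "01/02/2021 09:05"),
                      stringsAsFactors = FALSE)

# Slashes in the format string match the slashes in the data;
# as.Date() ignores the trailing time once the format is satisfied
parsed <- as.Date(DF_demo$when, format = "%d/%m/%Y")

# A Date prints as YYYY-MM-DD by default; use format() for YYYY/MM/DD text
DF_demo$when_ymd <- format(parsed, "%Y/%m/%d")

# The original dash-based format string does not match and yields NA
mismatch <- as.Date(DF_demo$when, format = "%d-%m-%Y")

print(DF_demo)
print(mismatch)
```

Keep in mind that when_ymd is character; keep the parsed Date column as well if the dates need to be sorted or compared.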

Add leading Zero to a Date

I have dates in a character vector. I cannot easily convert to a date vector using as.Date, because not all of the strings have the form mm/dd/yyyy, thus giving me the ambiguous date error. Some strings have the form m/dd/yyyy (months 1:9).
Here's part of the vector:
data$Date <- c("8/26/2014","3/10/2014","9/25/2014","11/12/2014","8/4/2015")
Indicator for date to let me know which strings I need to add a zero to
data$date <- grepl("[0-9]{2}/[0-9]{2}/[0-9]{4}", data$Date)
Attempt to add zeros through a conditional:
data$Date<-ifelse(data$date == "FALSE", paste0("0", data$Date), data$Date)
Doesn't work (I'm not familiar with paste). Any concise solutions to add a leading zero to single-digit months (m/dd/yyyy)? I'm guessing gsub or sub? I need all the strings to be in the form mm/dd/yyyy so I can convert to a date vector.
data <- data.frame(Date=c("8/26/2014","3/10/2014","9/25/2014","11/12/2014","8/4/2015"))
as.Date(data$Date,format="%m/%d/%Y")
works fine for me with your data. Output is
"2014-08-26" "2014-03-10" "2014-09-25" "2014-11-12" "2015-08-04"

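If zero-padded strings are still wanted, two possible routes in base R are to round-trip through Date or to pad with sub(). The sketch below assumes, as in the question, that every string is month/day/year:

```r
dates_chr <- c("8/26/2014", "3/10/2014", "9/25/2014", "11/12/2014", "8/4/2015")

# Option 1: parse, then print with zero-padded month and day
padded_via_date <- format(as.Date(dates_chr, format = "%m/%d/%Y"), "%m/%d/%Y")

# Option 2: regex only; pad a single-digit month, then a single-digit day
padded_via_sub <- sub("^([0-9])/", "0\\1/", dates_chr)
padded_via_sub <- sub("/([0-9])/", "/0\\1/", padded_via_sub)

print(padded_via_date)
print(padded_via_sub)
```

Option 1 has the side benefit that a malformed string shows up as NA rather than passing through silently.
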
How to remove the date from a column containing both date and time using R

I have read a csv file
input <- read.csv("abc.csv",header=FALSE)
and the data frame looks like..
I want my expected result like this..
In the timeStamp column I want to replace "2017/03/10 08:35:07.996" with only "08:35:07.996".
I tried a lot but could not find any solution. Please help.
We can set the digits.secs to 3, then convert the 'timeStamp' to DateTime class with strptime and format it
op <- options(digits.secs=3)
input$timeStamp <- format(strptime(input$timeStamp, "%Y/%m/%d %H:%M:%OS"), "%H:%M:%OS")
Although it is better not to use regex on timestamps, one way is to match one or more non-whitespace characters (\\S+) followed by one or more whitespace characters (\\s+) from the start (^) of the string and replace the match with an empty string (""), so that only the rest of the string, i.e. the time part, remains
input$timeStamp <- sub("^\\S+\\s+", "", input$timeStamp)
You can split the column into two using the separate function from the tidyr package...
library(tidyr) # separate() comes from tidyr
newDat <- separate(Dat, timeStamp, into = c("date", "time"), sep = " ")
Then simply remove the date column if you don't want it.
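Use the stringr library to deal with strings:
library(stringr) # run install.packages("stringr") if you don't have it
input <- read.csv("abc.csv", header = FALSE)
# str_split_fixed() returns a matrix with one row per string; column 2 is the time.
# (Indexing str_split(...)[[1]][2] would copy the first row's time into every row.)
input$timeStamp <- str_split_fixed(as.character(input$timeStamp), " ", 2)[, 2]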
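A vectorised base R sketch of the same idea, using made-up timestamps in the format from the question (input_demo is a hypothetical stand-in for the CSV data):

```r
# Hypothetical stand-in for the data read from abc.csv
input_demo <- data.frame(
  timeStamp = c("2017/03/10 08:35:07.996", "2017/03/10 08:35:08.120"),
  stringsAsFactors = FALSE
)

# Split every row on the space and keep the second piece (the time)
pieces <- strsplit(input_demo$timeStamp, " ", fixed = TRUE)
time_only <- vapply(pieces, `[`, character(1), 2)

# Same result with the regex from the answer above
time_regex <- sub("^\\S+\\s+", "", input_demo$timeStamp)

input_demo$timeStamp <- time_only
print(input_demo)
```

Both routes treat the timestamp as plain text, so they only work while every row has exactly one space between the date and the time.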

R: Separate String Without Delimiter (like Fixed Width in Excel)

The dataset looks pretty much like this
I searched around but found only the function that needs a delimiter. I managed to import the file to R successfully with two columns.
Then I want to separate the DATE column into "Year", "Month", and "Date", so that I have 4 columns in total. This is where I got stuck: the column doesn't have the usual "/" or "-" separators that normally come with a date format.
Thanks for your help.
As @alistaire has shown, you can convert what you have to a date format that R recognises with the following (replace the single character string below with your column vector df$DATE to work on the entire data frame):
date <- as.Date( '19981201', '%Y%m%d' )
date
[1] "1998-12-01"
From there, you can separate out your year, month, and day as you please.
year <- format( date, "%Y")
year
[1] "1998"
month <- format( date, "%m" )
month
[1] "12"
day <- format( date, "%d" )
day
[1] "01"
Of course, you could also skip the date step and just split the 8 characters into 3 shorter strings (as @warmoverflow has suggested), but I'd recommend the approach above as better in practice: a Date is what you'll want for things like sorting and plotting, so it is a good idea to use it along the way too.
IN RESPONSE TO YOUR ANSWER/FOLLOW-UP-QUESTION:
Notice in the console output in step 3, the column vector is labeled as class int (integer). You probably need to make sure it's being fed to as.Date as character. It looks like that's what you tried to do in step 4, but by surrounding the vector reference in quotations, you pass the character string "v1$DATE", which R has no idea what to do with. Instead:
v1$date_v2 <- as.Date( as.character( v1$DATE ), format = "%Y%m%d" )
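For the fixed-width split itself, a minimal sketch with substr(), assuming the DATE column holds eight-digit YYYYMMDD integers (v1_demo and its values are made up for illustration):

```r
# Hypothetical data: DATE stored as integer, as in the follow-up
v1_demo <- data.frame(DATE = c(19981201L, 19990115L), VALUE = c(1.5, 2.5))

# Convert to character first so positions 1-4, 5-6 and 7-8 can be cut out
date_chr <- as.character(v1_demo$DATE)
v1_demo$Year  <- substr(date_chr, 1, 4)
v1_demo$Month <- substr(date_chr, 5, 6)
v1_demo$Day   <- substr(date_chr, 7, 8)

# Keeping a real Date column alongside makes sorting and plotting easier
v1_demo$date_v2 <- as.Date(date_chr, format = "%Y%m%d")

print(v1_demo)
```

The three substr() columns are character; wrap them in as.integer() if numeric year, month and day values are needed.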
