I'm trying to group the dates on which houses in San Francisco were sold by year. I'm using the following code:
geo_big$month <- as.Date(paste0(strftime(geo_big$date, format = "%Y-%m"), "-01"))
geo_big$date_r <- cut(geo_big$month, breaks = as.Date(c("2003-04-01", "2004-01-01", "2005-01-01", "2006-01-01", "2007-01-01", "2008-11-01")), include.lowest = TRUE, labels = as.Date(c("2003-01 - 2004-12", "2004-01 - 2004-12", "2005-01 - 2005-12", "2006-01 - 2006-12", "2007-01 - 2007-12", "2008-01 - 2008-11")))
And getting this message:
Error in charToDate(x) :
character string is not in a standard unambiguous format
Anyone know what's going on?
The error should tell you that the issue is not cut but as.Date: it is complaining that it cannot determine the format of the date string.
More specifically, the problem is what you have given as labels. There is no need to wrap those in as.Date.
The labels should be character; c(.) and the quotation marks are sufficient for that.
As a side note, the code above can be cleaned up in a few areas.
Also, the lubridate package might be very useful to you.
# instead of:
geo_big$month <- as.Date(paste0(strftime(geo_big$date, format = "%Y-%m"), "-01"))
# you can use `floor_date`:
library(lubridate)
geo_big$month <- floor_date(geo_big$date, "month") # from the `lubridate` pkg
# instead of:
# ... a giant cut statement ...
# use variables for ease of reading and debugging
# bks <- as.Date(c("2003-04-01", "2004-01-01", "2005-01-01", "2006-01-01", "2007-01-01", "2008-11-01"))
# or:
# note: dmin and dmax are not defined in this snippet; presumably they are the earliest and latest dates, e.g. min(geo_big$month) and max(geo_big$month)
bks <- c(dmin, seq.Date(ceiling_date(dmin, "year"), floor_date(dmax, "year"), by="year"), dmax) # still using library(lubridate)
# basing your labels on your breaks helps guard against human error & typos
lbls <- head(floor_date(bks, "year"), -1) # dropping the last one, and adding dmax
lbls <- paste( substr(lbls, 1, 7), substr(c(lbls[-1] - 1, dmax), 1, 7), sep=" - ")
# a cleaner, more readable `cut` statement
cut(geo_big$month, breaks=bks, include.lowest=TRUE, labels=lbls)
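To make the fix concrete, here is a minimal base R sketch using a few made-up sale dates (sale_month is a hypothetical stand-in for geo_big$month, and the label text is only illustrative). The labels are plain character strings, with one label per interval, i.e. one fewer than the number of breaks. As far as I can tell, the original call passes six labels for five intervals, which cut would also reject once the as.Date wrapper is removed.

```r
# Hypothetical sale months standing in for geo_big$month
sale_month <- as.Date(c("2003-04-01", "2004-06-01", "2006-02-01", "2008-11-01"))

# Six break points define five intervals
bks <- as.Date(c("2003-04-01", "2004-01-01", "2005-01-01",
                 "2006-01-01", "2007-01-01", "2008-11-01"))

# Plain character labels, one per interval (no as.Date around them)
lbls <- c("2003-04 - 2003-12", "2004-01 - 2004-12", "2005-01 - 2005-12",
          "2006-01 - 2006-12", "2007-01 - 2008-11")

# include.lowest = TRUE keeps a date equal to the final break in the last interval
period <- cut(sale_month, breaks = bks, labels = lbls, include.lowest = TRUE)
print(period)
```

The result is a factor whose levels are the labels, so it can be passed straight to table() or used as a grouping variable.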
Related
My problem is that I am importing a CSV file, and trying to get R to recognize the date column as dates and format them as such.
So far I have managed to replace the "#yyyy-mm-dd#" format seen below with the integer date value in R.
But when I check the class before and after the transformation, it still says "character".
I need the column to be recognized as a date class so that I can use it for forecasting.
DemandCSV <- read_csv("C:/Users/pth/Desktop/Care/Demand.csv")
nrow <- nrow(DemandCSV)
for(i in 1:nrow){
DemandCSV[i,1] <-as.Date(ymd(substr(DemandCSV[i,1], 2, 11)))
}
DemandCSV[,1] <- format(DemandCSV[,1], "%Y-%m-%d")
Figured out an inelegant solution (turns out it was not a solution)
DemandCSV <- read_csv("C:/Users/pth/Desktop/Care/Demand.csv")
nrow <- nrow(DemandCSV)
for(i in 1:nrow){
DemandCSV[i,1] <-as.Date(ymd(substr(DemandCSV[i,1], 2, 11)))
DemandCSV[i,1] <- format(as.Date(as.numeric(DemandCSV[i,1],origin = "01-01-1970")), "%Y-%m-%d")}
DemandCSV %>% pad %>% fill_by_value(0)
Does including the "#" in the format string solve your problem?
data <- c("#2019-09-23#", "#2019-09-24#", "#2019-09-25#")
a <- as.Date(data,format="#%Y-%m-%d#")
or
library(dplyr) # provides mutate_at
DemandCSV <- data.frame(date=
c("#2019-09-23#", "#2019-09-24#", "#2019-09-25#"))
mutate_at(DemandCSV,"date",as.Date,format="#%Y-%m-%d#")
Maybe simpler to:
1. Substitute out the #
2. Rely on anydate from the anytime package
Demo:
R> data <- c("#2019-09-23#", "#2019-09-24#", "#2019-09-25#")
R> anytime::anydate(gsub("#", "", data))
[1] "2019-09-23" "2019-09-24" "2019-09-25"
R>
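For comparison, here is a base R sketch of the same idea without extra packages. It assumes the dates are otherwise in ISO yyyy-mm-dd form; demand and its date column are hypothetical stand-ins for the imported CSV. The point it illustrates is that converting the whole column in one assignment changes the column's class, whereas filling cells one at a time in a loop most likely leaves the column as character.

```r
# Example input in the "#yyyy-mm-dd#" form from the question
data <- c("#2019-09-23#", "#2019-09-24#", "#2019-09-25#")

# Strip the literal '#' characters, then parse the ISO dates
dates <- as.Date(gsub("#", "", data, fixed = TRUE))
print(class(dates))

# Hypothetical data frame standing in for the imported CSV
demand <- data.frame(date = data, stringsAsFactors = FALSE)

# Replace the whole column at once so that its class becomes Date
demand$date <- as.Date(gsub("#", "", demand$date, fixed = TRUE))
str(demand)
```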
I have the following code:
gsub("-","/",paste(cut(seq(as.POSIXct(Sys.Date(),format="%d-%b-%y"), by = "-1 day", length.out = 10),"days"),collapse = ","))
The output:
"2019/03/20,2019/03/19,2019/03/18,2019/03/17,2019/03/16,2019/03/15,2019/03/14,2019/03/13,2019/03/12,2019/03/11"
However the desired result is
'20/03/2019','19/03/2019','18/03/2019','17/03/2019','16/03/2019','15/03/2019','14/03/2019','13/03/2019','12/03/2019','11/03/2019'
How can I accomplish that?
Regards
Not sure what you are trying to do, but you can generate the required output with:
format(Sys.Date() - 1:10, "%d/%m/%Y")
#[1] "20/03/2019" "19/03/2019" "18/03/2019" "17/03/2019" "16/03/2019" "15/03/2019"
# "14/03/2019" "13/03/2019" "12/03/2019" "11/03/2019"
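If the goal is literally the single-quoted, comma-separated string shown in the question, a small sketch along these lines may help. It uses a fixed date (2019-03-20, taken from the question's output) instead of Sys.Date() so that the result is reproducible; swap Sys.Date() back in for real use.

```r
# Fixed reference date so the example is reproducible (an assumption;
# the question uses Sys.Date())
today <- as.Date("2019-03-20")

# Ten days counting backwards, starting with the reference date itself
days <- seq(today, by = "-1 day", length.out = 10)

# Format as dd/mm/yyyy, wrap each value in single quotes, join with commas
quoted <- paste0("'", format(days, "%d/%m/%Y"), "'", collapse = ",")
print(quoted)
```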
Sadly, the answer here does not seem to work for me.
From what I saw in the documentation, the major.format parameter has been removed in the latest version, 0.10-1, unlike earlier versions such as 0.9-7, where major.format would have easily solved my problem.
It seems like such a major feature to deprecate. Is there a new way to do this? It seems like something simple and easy, but I've been digging into this issue for hours without success.
In case the issue lies in my code, here is a snippet of what I'm using.
merra2 = read.table("C:/merra2.csv", header=TRUE, sep=",", na.strings="NA", dec=".", strip.white=TRUE)
merra2$utc = as.POSIXct(merra2$utc, format = "%Y-%m-%d %H:%M:%S", tz="UTC")
merra2$m2_power = as.xts(x=merra2[,"m2_power"],order.by=merra2[,"utc"])
merra2$doy = as.xts(x=merra2[,"doy"],order.by=merra2[,"utc"])
plot.xts(merra2$m2_power, col="blue", lwd = 2, major.ticks="weeks", subset="2012-04-01/2014-04-01")
plot.xts(merra2$m2_power, col="blue", lwd = 2, major.ticks="months", subset="2012-04-01/2014-04-01")
And the input file contains something like:
utc,m2_power,doy
"1980-01-01 00:00:00",643.000,181.5000
"1980-01-01 01:00:00",643.000,181.4583
"1980-01-01 02:00:00",354.000,181.4167
If I add the major.format parameter, nothing changes; the axis stays the same.
Here is a reproducible example:
# Generate a sequence of Dates
StartDate<-"2017-07-01"
EndDate<- "2018-07-05"
dates<-seq(as.POSIXct(StartDate, format="%Y-%m-%d", tz="UTC")
, as.POSIXct(EndDate, format="%Y-%m-%d", tz="UTC")
, by='mins')
# Generate a sequence of x
x <- seq(1, length(dates))
# Create a dataframe, renaming columns
df <- as.data.frame(cbind(as.character(dates,format="%Y-%m-%d", tz="UTC"),x))
colnames(df) <- c("Dates","x")
# Redefine format
df$Dates <- as.POSIXct(df$Dates,format="%Y-%m-%d", tz="UTC")
df$x2 <- as.xts(x= as.numeric(df$x),order.by=df$Dates )
# Plot results
plot.xts(df$x2
, col="blue"
, lwd = 2
, major.ticks="weeks"
, major.format = TRUE
, subset="2017-08-01/2017-08-30")
If you change "major.ticks", the axis changes... Have you taken a look at the "utc" variable? What is the complete time interval?
Here is a link to the data set, which has a date column and a time column along with electricity usage columns:
https://d396qusza40orc.cloudfront.net/exdata%2Fdata%2Fhousehold_power_consumption.zip
power1 <- read.csv(file = "c:/datasets/household_power_consumption.txt", stringsAsFactors=F, header = TRUE,
sep=";", dec = ".", na.strings="?", col.names = c("date1","time1","Global_active_power", "Global_reactive_power",
"Voltage","Global_intensity","Sub_metering_1","Sub_metering_2",
"Sub_metering_3"))
power1$date1 <- as.Date(power1$date1, format="%d/%m/%Y")
power2 <- subset(power1, subset=(date1 >= "2007-02-01" & date1 <= "2007-02-02"))
datetime1 <- paste(as.Date(power2$date1), power2$time1)
power2$Datetime <- as.POSIXct(datetime1)
plot(power2$Global_active_power~power2$Datetime, type="l", ylab="Global Active Power (kilowatts)", xlab="")
When I run the above, I get the graph I'm supposed to, with the days of the week on the x axis, even though when I run summary, head and str() I don't see anything in the data about days of the week.
I tried to add my own day column with mutate, but it didn't work.
It also didn't work when I subset it as follows. It subset properly, leaving only the data I needed, but it wouldn't plot with the date1 column or with the day-of-the-week column I created via mutate:
power2 <- subset(power1, subset=(as.Date(date1, format = "%d/%m/%Y") >= "2007-02-01"
& as.Date(date1, format = "%d/%m/%Y") <= "2007-02-02"))
I know that as.POSIXct will have all the metadata in there, but I don't understand why it is only when I combine the date and time columns into their own column that the graph is plotted by day of the week, without me asking.
When I run it like this, the combined date and time column is corrupted with the wrong year:
power11 <- read.csv(file = "c:/datasets/household_power_consumption.txt", stringsAsFactors=F, header = TRUE,
sep=";", dec = ".", col.names = c("date1","time1","Global_active_power", "Global_reactive_power",
"Voltage","Global_intensity","Sub_metering_1","Sub_metering_2",
"Sub_metering_3"))
#colClasses = c("Date", "character", "factor", "numeric","numeric","numeric","numeric","numeric","numeric"))
power22 <- subset(power11, subset=(as.Date(date1, format = "%d/%m/%Y") >= "2007-02-01"
& as.Date(date1, format = "%d/%m/%Y") <= "2007-02-02"))
datetime1 <- paste(as.Date(power22$date1), power22$time1)
power22$Datetime <- as.POSIXct(datetime1)
Maybe this link would be helpful:
http://earlh.com/blog/2009/07/07/plotting-with-custom-x-axis-labels-in-r-part-5-in-a-series/
Add an argument to your plot() call: xaxt='n'
plot(power2$Global_active_power~power2$Datetime, type="l", ylab="Global Active Power (kilowatts)", xlab="", xaxt='n')
That tells plot not to add x-axis labels. Then add an axis() call:
axis(side=1, at=power2$Datetime, labels=format(power2$Datetime, '%b-%y'))
I used '%b-%y' here, because that's what I saw on the site I referenced, but you would want to use the format code appropriate to your needs.
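As a self-contained sketch of the same technique, the example below uses made-up minute-level data for two days (datetime and value are hypothetical stand-ins for power2$Datetime and power2$Global_active_power). Instead of one tick per observation, it places one tick per day boundary. My understanding is that base R chooses the default axis label format for date-time axes from the span of the data, which would explain the weekday labels appearing without being asked for; treat that as an assumption and check ?axis.POSIXct for your R version.

```r
# Hypothetical minute-level series covering 1-2 February 2007
datetime <- seq(as.POSIXct("2007-02-01 00:00:00", tz = "UTC"),
                as.POSIXct("2007-02-02 23:59:00", tz = "UTC"), by = "min")
value <- sin(seq_along(datetime) / 200)

# One tick at each day boundary rather than at every observation
ticks <- seq(as.POSIXct("2007-02-01 00:00:00", tz = "UTC"),
             as.POSIXct("2007-02-03 00:00:00", tz = "UTC"), by = "day")

# Any strftime-style format works here; "%Y-%m-%d" is just one choice
tick_labels <- format(ticks, "%Y-%m-%d")

# Suppress the default x axis, then draw a custom one
plot(datetime, value, type = "l", xaxt = "n", xlab = "", ylab = "value")
axis(side = 1, at = ticks, labels = tick_labels)
```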
This is the output I am aiming for, where z will provide the x-axis labels for a plot of mine.
z<-c("2014-01", "", "", "2014-04", "", "", "2014-07", "", "","2014-10", "", "", "2015-01")
The plot will be produced once a month, and I aim to automate the axis creation. 2015-01 is my last available data point, and the plot will display a span of one year, back to 2014-01. With the next update I will want the plot to span 2014-02 to 2015-02.
As can be seen, the axis labels go back in 3-month steps, leaving the ticks in between empty.
Can I automate this process by providing just the latest label, 2015-02, and having R deduce the rest somehow?
I was thinking of converting my starting point, 2015-01, to a date format and sequencing it back to 2014-01, then making it a character vector again and using it for the axis...
myz <- as.Date(c("2014-01", "2015-01"), "%Y - %m")
But myz is empty. And of course the problem with the empty ticks is far from solved.
Any advice?
This might be done more efficiently but here is a solution:
version <- "201701"
endDate = as.Date(paste0(version,"01"), "%Y%m%d")
startDateString = paste0(as.integer(substr(version, 1, 4)) - 3, substr(version, 5, 6))
startDate = as.Date(paste0(startDateString,"01"), "%Y%m%d")
rngx <- format(seq(startDate, endDate, "3 months"), "%Y-%m")
output = c()
for(i in 1:(length(rngx)-1)) {
output = c(output, rngx[i], "", "")
}
output = c(output, rngx[length(rngx)])
version <- "201701" marks the end date, i.e. the latest data point (and is used for some other parts in the script as well, for example the setwd()).
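A shorter variant, sketched under the assumption that the plot always spans exactly 12 months ending at the latest label, is to build all 13 monthly labels and blank out the ones that should stay empty. as.Date needs a day component, which is why as.Date("2014-01", "%Y - %m") returns NA; pasting "-01" onto the string avoids that. The variable names here are hypothetical.

```r
# Latest available data point, in the "YYYY-MM" form from the question
latest <- "2015-01"

# as.Date needs a full date, so append the first day of the month
end_date <- as.Date(paste0(latest, "-01"))

# 13 monthly dates counting backwards, then reversed into chronological order
months <- rev(seq(end_date, by = "-1 month", length.out = 13))

# Format as "YYYY-MM" and keep only every third label, blanking the rest
labels <- format(months, "%Y-%m")
labels[(seq_along(labels) - 1) %% 3 != 0] <- ""
print(labels)
```

Changing latest to "2015-02" shifts the whole window by one month, so the labelled ticks become 2014-02, 2014-05, 2014-08, 2014-11 and 2015-02.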