Sadly, the answer here does not seem to work for me.
From what I saw in the documentation, the major.format parameter has been removed in the latest version, 0.10-1, unlike previous versions such as 0.9-7, where major.format would have easily solved my problem.
It seems like too major a feature to be deprecated. Is there a new way to do this? It seems like something simple, but I've been digging into this issue for hours without success.
In case the issue lies in my code, here is a snippet of what I'm using.
merra2 = read.table("C:/merra2.csv", header=TRUE, sep=",", na.strings="NA", dec=".", strip.white=TRUE)
merra2$utc = as.POSIXct(merra2$utc, format = "%Y-%m-%d %H:%M:%S", tz="UTC")
merra2$m2_power = as.xts(x=merra2[,"m2_power"],order.by=merra2[,"utc"])
merra2$doy = as.xts(x=merra2[,"doy"],order.by=merra2[,"utc"])
plot.xts(merra2$m2_power, col="blue", lwd = 2, major.ticks="weeks", subset="2012-04-01/2014-04-01")
plot.xts(merra2$m2_power, col="blue", lwd = 2, major.ticks="months", subset="2012-04-01/2014-04-01")
And the input file contains something like:
utc,m2_power,doy
"1980-01-01 00:00:00",643.000,181.5000
"1980-01-01 01:00:00",643.000,181.4583
"1980-01-01 02:00:00",354.000,181.4167
If I add the major.format parameter, nothing changes; the axis stays the same.
Here is a reproducible example:
# Generate a sequence of Dates
StartDate<-"2017-07-01"
EndDate<- "2018-07-05"
dates<-seq(as.POSIXct(StartDate, format="%Y-%m-%d", tz="UTC")
, as.POSIXct(EndDate, format="%Y-%m-%d", tz="UTC")
, by='mins')
# Generate a sequence of x
x <- seq(1, length(dates))
# Create a dataframe, renaming columns
df <- as.data.frame(cbind(as.character(dates,format="%Y-%m-%d", tz="UTC"),x))
colnames(df) <- c("Dates","x")
# Redefine format
df$Dates <- as.POSIXct(df$Dates,format="%Y-%m-%d", tz="UTC")
df$x2 <- as.xts(x= as.numeric(df$x),order.by=df$Dates )
# Plot results
plot.xts(df$x2
, col="blue"
, lwd = 2
, major.ticks="weeks"
, major.format = TRUE
, subset="2017-08-01/2017-08-30")
If you change "major.ticks", the axis changes... Have you taken a look at the "utc" variable? What is the complete time interval?
I have to pull different data sets from the same API regularly but for different reasons, so I have to write out the code for many different pulls. I'd like to create some functions to help with this, but I need some help.
I haven't been able to figure out how to set up the function so that I can change the data set but still pull from the same column each time. In this example, I have 3 columns with timestamps that mean different things (made up in this data). I need to change the timezone here to my local time zone. The column name will remain the same in all of my datasets, but the name of the dataset will change. I have a few places in my code where I need to do this, and I haven't been able to figure it out, so any suggestions would be much appreciated!
The second section of this example code is not included in the actual code, but it is there to set the data up correctly. The data comes out of the API in the format shown as GMT.
df <- data.frame(col_1 = c(1, 2, 3, 4),
time_1 = c("2021-01-20 23:58:21", "2021-01-20 21:21:00", "2021-01-20 17:14:04", "2021-01-20 01:05:18"),
time_2 = c("2021-01-19 23:58:21", "2021-01-19 21:21:00", "2021-01-19 17:14:04", "2021-01-19 01:05:18"),
time_3 = c("2021-01-18 23:46:21", "2021-01-18 36:21:00", "2021-01-18 15:14:04", "2021-01-18 01:05:18"),
time_4 = c("2021-01-17 23:58:21", "2021-01-17 20:21:00", "2021-01-17 18:14:04", "2021-01-17 02:05:18"))
# Not part of actual code
df$time_1 <- as.POSIXlt(df$time_1, tz = "GMT")
df$time_2 <- as.POSIXlt(df$time_2, tz = "GMT")
df$time_3 <- as.POSIXlt(df$time_3, tz = "GMT")
df$time_4 <- as.POSIXlt(df$time_4, tz = "GMT")
# What I want it to do
# df$time_1 <- lubridate::with_tz(df$time_1, tz = "America/Los_Angeles")
# df$time_2 <- lubridate::with_tz(df$time_2, tz = "America/Los_Angeles")
# df$time_3 <- lubridate::with_tz(df$time_3, tz = "America/Los_Angeles")
# df$time_4 <- lubridate::with_tz(df$time_4, tz = "America/Los_Angeles")
# Attempted function
timezone_cleanup <- function(my_df){
my_df$time_1 <- lubridate::with_tz(my_df$time_1, tz = "America/Los_Angeles")
my_df$time_2 <- lubridate::with_tz(my_df$time_2, tz = "America/Los_Angeles")
my_df$time_3 <- lubridate::with_tz(my_df$time_3, tz = "America/Los_Angeles")
my_df$time_4 <- lubridate::with_tz(my_df$time_4, tz = "America/Los_Angeles")
}
# how I'd like to use this function. Not working now. Even if I wrap it with data.frame(), it's not what I wanted.
new_df <- timezone_cleanup(df)
I think you need to return my_df in your function to get the changed dataframe back. However, you can use lapply or across to apply the same function to multiple columns.
library(dplyr)
timezone_cleanup <- function(my_df){
  # Apply with_tz to every column whose name starts with "time" and return the result
  my_df %>%
    mutate(across(starts_with('time'),
                  ~ lubridate::with_tz(.x, tzone = "America/Los_Angeles")))
}
new_df <- timezone_cleanup(df)
By the way, I do receive a warning message while using this: Unrecognized time zone 'America/Los_Angeles'. Are you sure you are using the correct tz value?
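For the lapply route mentioned in the answer, a minimal base R sketch might look like the following. It assumes the timestamp columns all start with "time" and arrive as GMT. The function name timezone_cleanup_base and its tz argument are made up for illustration, and the time zone is switched by resetting the tzone attribute instead of calling lubridate.

```r
# Hypothetical helper: convert every "time*" column from GMT to a display time zone
timezone_cleanup_base <- function(my_df, tz = "America/Los_Angeles") {
  time_cols <- grep("^time", names(my_df), value = TRUE)
  my_df[time_cols] <- lapply(my_df[time_cols], function(col) {
    col <- as.POSIXct(col, tz = "GMT")  # parse (or keep) the instant as GMT
    attr(col, "tzone") <- tz            # same instant, printed in the new zone
    col
  })
  my_df                                 # return the modified data frame
}

# Made-up data in the same shape as the question
df <- data.frame(col_1 = c(1, 2),
                 time_1 = c("2021-01-20 23:58:21", "2021-01-20 21:21:00"),
                 time_2 = c("2021-01-19 23:58:21", "2021-01-19 21:21:00"),
                 stringsAsFactors = FALSE)

new_df <- timezone_cleanup_base(df)
str(new_df)
```

The last expression in the function is the data frame itself, which is the piece the attempted function was missing.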
My problem is that I am importing a CSV file, and trying to get R to recognize the date column as dates and format them as such.
So far I have managed to replace the format seen below, "#yyyy-mm-dd#", with the integer date value in R.
But when I check the class before and after the transformation, it still says "character".
I need the column to be recognized as a Date class so that I can use it for forecasting.
DemandCSV <- read_csv("C:/Users/pth/Desktop/Care/Demand.csv")
nrow <- nrow(DemandCSV)
for(i in 1:nrow){
DemandCSV[i,1] <-as.Date(ymd(substr(DemandCSV[i,1], 2, 11)))
}
DemandCSV[,1] <- format(DemandCSV[,1], "%Y-%m-%d")
Figured out an inelegant solution (turns out it was not a solution)
DemandCSV <- read_csv("C:/Users/pth/Desktop/Care/Demand.csv")
nrow <- nrow(DemandCSV)
for(i in 1:nrow){
DemandCSV[i,1] <-as.Date(ymd(substr(DemandCSV[i,1], 2, 11)))
DemandCSV[i,1] <- format(as.Date(as.numeric(DemandCSV[i,1],origin = "01-01-1970")), "%Y-%m-%d")}
DemandCSV %>% pad %>% fill_by_value(0)
Does including the "#" in the format string solve your problem?
data <- c("#2019-09-23#", "#2019-09-24#", "#2019-09-25#")
a <- as.Date(data,format="#%Y-%m-%d#")
or
DemandCSV <- data.frame(date=
c("#2019-09-23#", "#2019-09-24#", "#2019-09-25#"))
mutate_at(DemandCSV,"date",as.Date,format="#%Y-%m-%d#")
Maybe simpler to:
1. Substitute out the #
2. Rely on anydate from the anytime package
Demo:
R> data <- c("#2019-09-23#", "#2019-09-24#", "#2019-09-25#")
R> anytime::anydate(gsub("#", "", data))
[1] "2019-09-23" "2019-09-24" "2019-09-25"
R>
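On why the class stays "character" in the question: format() always returns character strings, so the final format() call turns the dates back into text. A small vectorized sketch, using a made-up data frame named demand in place of the real CSV, that converts the whole column at once and keeps it as Date:

```r
# Stand-in for the imported CSV (the real file is not available here)
demand <- data.frame(date = c("#2019-09-23#", "#2019-09-24#", "#2019-09-25#"),
                     stringsAsFactors = FALSE)

# Convert the whole column in one call; the "#" characters are part of the format
demand$date <- as.Date(demand$date, format = "#%Y-%m-%d#")

# Do not call format() afterwards, or the column becomes character again
class(demand$date)
```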
Basically, I have this data set of per-minute electric consumption in a household, with 9 columns. My data is:
https://archive.ics.uci.edu/ml/datasets/Individual+household+electric+power+consumption
I tried two things and got two somewhat different outputs, and I can't seem to figure out why:
first input:
hpc$Datetime<-as.POSIXlt(hpc$Datetime, format = "%d/%m/%Y %H:%M:%S")
with(hpc,plot(Datetime,Global.active.power, ylab = "Global.active.power(Kilowatts)",
xlab = "",type = "l"))
second input:
hpc<-read.table("hpc.txt", skip = 66637, nrow = 2879, sep =";")
hpc$Time<-strptime(hpc$Time, format = "%H:%M:%S")
hpc$Date<-as.Date(hpc$Date, format = "%d/%m/%Y")
with(hpc,plot(Time,Global.active.power, ylab = "Global.active.power(Kilowatts)",
xlab = "",type = "l"))
Why is there a line appearing in the second image?
It would be great if someone could be kind enough to help me out!
Thank you in advance.
Link to the data set, which has date and time columns along with electricity usage columns:
https://d396qusza40orc.cloudfront.net/exdata%2Fdata%2Fhousehold_power_consumption.zip
power1 <- read.csv(file = "c:/datasets/household_power_consumption.txt", stringsAsFactors=F, header = TRUE,
sep=";", dec = ".", na.strings="?", col.names = c("date1","time1","Global_active_power", "Global_reactive_power",
"Voltage","Global_intensity","Sub_metering_1","Sub_metering_2",
"Sub_metering_3"))
power1$date1 <- as.Date(power1$date1, format="%d/%m/%Y")
power2 <- subset(power1, subset=(date1 >= "2007-02-01" & date1 <= "2007-02-02"))
datetime1 <- paste(as.Date(power2$date1), power2$time1)
power2$Datetime <- as.POSIXct(datetime1)
plot(power2$Global_active_power~power2$Datetime, type="l", ylab="Global Active Power (kilowatts)", xlab="")
When I run the above, I get the graph I'm supposed to, with the days of the week on the x axis, even though summary(), head() and str() show nothing in the data about days of the week.
I tried to add my own day column with mutate, but it didn't work.
It also didn't work when I subset the data as follows. The subset was correct and contained only the rows I needed, but it wouldn't plot with the date1 column or with the day-of-week column I created via mutate.
power2 <- subset(power1, subset=(as.Date(date1, format = "%d/%m/%Y") >= "2007-02-01"
& as.Date(date1, format = "%d/%m/%Y") <= "2007-02-02"))
I know that as.POSIXct keeps all the metadata, but I don't understand why the graph is plotted by day of the week, without my asking, only when I combine the date and time columns into their own column.
When I run it like this, the combined date and time column is corrupted with the wrong year:
power11 <- read.csv(file = "c:/datasets/household_power_consumption.txt", stringsAsFactors=F, header = TRUE,
sep=";", dec = ".", col.names = c("date1","time1","Global_active_power", "Global_reactive_power",
"Voltage","Global_intensity","Sub_metering_1","Sub_metering_2",
"Sub_metering_3"))
#colClasses = c("Date", "character", "factor", "numeric","numeric","numeric","numeric","numeric","numeric"))
power22 <- subset(power11, subset=(as.Date(date1, format = "%d/%m/%Y") >= "2007-02-01"
& as.Date(date1, format = "%d/%m/%Y") <= "2007-02-02"))
datetime1 <- paste(as.Date(power22$date1), power22$time1)
power22$Datetime <- as.POSIXct(datetime1)
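One likely cause of the wrong year, assuming date1 is still day/month/year text in this second version: as.Date(power22$date1) is called without a format, so the string is not read as %d/%m/%Y. A sketch that passes the full format to as.POSIXct instead, with two made-up rows standing in for the real file and tz = "UTC" chosen only for illustration:

```r
# Two made-up rows in the same shape as the raw file (dates still as text)
power22 <- data.frame(date1 = c("1/2/2007", "2/2/2007"),
                      time1 = c("00:00:00", "12:30:00"),
                      stringsAsFactors = FALSE)

# Paste the raw text together and describe its layout explicitly
datetime1 <- paste(power22$date1, power22$time1)
power22$Datetime <- as.POSIXct(datetime1, format = "%d/%m/%Y %H:%M:%S", tz = "UTC")

power22$Datetime
```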
Maybe this link would be helpful:
http://earlh.com/blog/2009/07/07/plotting-with-custom-x-axis-labels-in-r-part-5-in-a-series/
Add an argument to your plot() call: xaxt='n'
plot(power2$Global_active_power~power2$Datetime, type="l", ylab="Global Active Power (kilowatts)", xlab="", xaxt='n')
That tells plot not to add x-axis labels. Then add an axis() call:
axis(side=1, at=power2$Datetime, labels=format(power2$Datetime, '%b-%y'))
I used '%b-%y' here because that's what I saw on the site I referenced, but you would want to use the format code appropriate to your needs.
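As for where the weekday labels come from: as far as I can tell, base graphics picks the tick label format from the time span being plotted, and for a span of a few days it tends to print abbreviated weekday names, so nothing in the data has to mention days of the week. A sketch of the custom-axis idea with one tick per day instead of one per observation, using made-up minute-level data (the names power and ticks are placeholders):

```r
# Made-up stand-in for power2: two days of minute-level readings
datetime <- seq(as.POSIXct("2007-02-01 00:00:00", tz = "UTC"),
                as.POSIXct("2007-02-02 23:59:00", tz = "UTC"), by = "min")
power <- data.frame(Datetime = datetime,
                    Global_active_power = sin(seq_along(datetime) / 200) + 2)

# One tick at each midnight, including the one just after the last reading
ticks <- seq(min(power$Datetime), max(power$Datetime) + 60, by = "day")

# Suppress the default x axis, then draw one with an explicit date format
plot(power$Datetime, power$Global_active_power, type = "l",
     xlab = "", ylab = "Global Active Power (kilowatts)", xaxt = "n")
axis.POSIXct(side = 1, at = ticks, format = "%d %b %Y")
```

Swapping the format string for "%a" would give weekday names again.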
I'm trying to group the dates on which houses in San Francisco were sold by year. I'm using the following code:
geo_big$month <- as.Date(paste0(strftime(geo_big$date, format = "%Y-%m"), "-01"))
geo_big$date_r <- cut(geo_big$month, breaks = as.Date(c("2003-04-01", "2004-01-01", "2005-01-01", "2006-01-01", "2007-01-01", "2008-11-01")), include.lowest = TRUE, labels = as.Date(c("2003-01 - 2004-12", "2004-01 - 2004-12", "2005-01 - 2005-12", "2006-01 - 2006-12", "2007-01 - 2007-12", "2008-01 - 2008-11")))
And getting this message:
Error in charToDate(x) :
character string is not in a standard unambiguous format
Anyone know what's going on?
The error should indicate to you that the issue is not cut but as.Date. (It's complaining that it cannot determine the format of the date.)
More specifically, the problem is what you have given as labels. There is no need to wrap those in as.Date.
The labels should be character, and c(.) with the quotation marks is sufficient for that.
As an aside, the code above can be cleaned up in a few areas.
Also, the lubridate package might be very useful to you.
# instead of:
geo_big$month <- as.Date(paste0(strftime(geo_big$date, format = "%Y-%m"), "-01"))
# you can use `floor_date`:
library(lubridate)
geo_big$month <- floor_date(geo_big$date, "month") # from the `lubridate` pkg
# instead of:
# ... a giant cut statement ...
# use variables for ease of reading and debugging
# bks <- as.Date(c("2003-04-01", "2004-01-01", "2005-01-01", "2006-01-01", "2007-01-01", "2008-11-01"))
# or (assuming dmin and dmax are the earliest and latest months in the data):
dmin <- min(geo_big$month)
dmax <- max(geo_big$month)
bks <- c(dmin, seq.Date(ceiling_date(dmin, "year"), floor_date(dmax, "year"), by="year"), dmax) # still using library(lubridate)
# basing your labels on your breaks helps guard against human error & typos
lbls <- head(floor_date(bks, "year"), -1) # dropping the last one, and adding dmax
lbls <- paste( substr(lbls, 1, 7), substr(c(lbls[-1] - 1, dmax), 1, 7), sep=" - ")
# a cleaner, more readable `cut` statement
cut(geo_big$month, breaks=bks, include.lowest=TRUE, labels=lbls)
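For the labels fix on its own, here is a self-contained base R sketch with made-up sale months (sale_month is a placeholder for geo_big$month). Note that six breaks define five intervals, so cut expects five labels, not six. The label text is only an example.

```r
# Made-up sale months standing in for geo_big$month
sale_month <- as.Date(c("2003-06-01", "2004-03-01", "2006-09-01", "2008-10-01"))

# Six break points define five intervals
bks <- as.Date(c("2003-04-01", "2004-01-01", "2005-01-01",
                 "2006-01-01", "2007-01-01", "2008-11-01"))

# Labels are plain character strings, one per interval, with no as.Date()
lbls <- c("2003-04 - 2003-12", "2004-01 - 2004-12", "2005-01 - 2005-12",
          "2006-01 - 2006-12", "2007-01 - 2008-11")

date_r <- cut(sale_month, breaks = bks, labels = lbls, include.lowest = TRUE)
table(date_r)
```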