R string vector seq - r

This is the output I am aiming for where z will provide a plot of mine with an x-axis.
z<-c("2014-01", "", "", "2014-04", "", "", "2014-07", "", "","2014-10", "", "", "2015-01")
The plot will be produced once a month and I aim on automating the axis creation. 2015-01 is my last available data point and the plot will display a span of 1 year back to 2014-01. Next update will make me want to set the plot to 2015-02 to 2014-02.
As can be seen, the labelling of the axis goes back in 3 month steps, leaving the ticks in between empty.
Can I automate this process by providing just the latest label 2015-02 and the rest gets deducted by R somehow?
I was thinking maybe to convert my starting point 2015-01 to a date Format and sequence it back to 2014-01. Then making it a character again and using it for the axis...
myz <- as.Date(c("2014-01", "2015-01"), "%Y - %m")
But myz is empty. And ofc the problem with the empty tick is far froms olved.
Any advice?

This might be done more efficiently but here is a solution:
version <- "201701"
endDate = as.Date(paste0(version,"01"), "%Y%m%d")
startDateString = paste0(as.integer(substr(version, 1, 4)) - 3, substr(version, 5, 6))
startDate = as.Date(paste0(startDateString,"01"), "%Y%m%d")
rngx <- format(seq(startDate, endDate, "3 months"), "%Y-%m")
output = c()
for(i in 1:(length(rngx)-1)) {
output = c(output, rngx[i], "", "")
}
output = c(output, rngx[length(rngx)])
version <- "201701" marks the start date (and is used for some other parts in the script as well, for example the setwd()

Related

Transform chr to date format in R

I want to transform from chr to date format
I have this representing year -week:
2020-53
I ve tried to do this
mutate(semana=as_date(year_week,format="%Y-%U"))
but I get the same date in all dataset 2020-01-18
I also tried
mutate(semana=strptime(year_week, "%Y-%U"))
getting the same result
Here you can see the wrong convertion
Any idea?, thanks
I think I've got something that does the job.
library(tidyverse)
library(lubridate)
# Set up table like example in post
trybble <- tibble(year_week = c("2020-53", rep("2021-01", 5)),
country = c("UK", "FR", "GER", "ITA", "SPA", "UK"))
# Function to go into mutate with given info of year and week
y_wsetter <- function(fixme, yeargoal, weekgoal) {
lubridate::year(fixme) <- yeargoal
lubridate::week(fixme) <- weekgoal
return(fixme)
}
# Making a random date so col gets set right
rando <- make_datetime(year = 2021, month = 1, day = 1)
# Show time
trybble <- trybble %>%
add_column(semana = rando) %>% # Set up col of dates to fix
mutate(yerr = substr(year_week, 1, 4)) %>% # Get year as chr
mutate(week = substr(year_week, 6, 7)) %>% # Get week as chr
mutate(semana2 = y_wsetter(semana,
as.numeric(yerr),
as.numeric(week))) %>% # fixed dates
select(-c(yerr, week, semana))
Notes:
If you somehow plug in a week greater than 53, lubridate doesn't mind, and goes forward a year.
I really struggled to get mutate to play nicely without writing my own function y_wsetter. In my experience with mutates with multiple inputs, or where I'm changing a "property" of a value instead of the whole value itself, I need to probably write a function. I'm using the lubridate package to change just the year or week based on your year_week column, so this is one such situation where a quick function helps mutate out.
I was having a weird time when I tried setting rando to Sys.Date(), so I manually set it to something using make_datetime. YMMV

R: transforming character to date with only year and month to apply a dateRange input on a boxplot output in a Shiny app

I have dataframe (in the following called df) with the first column df$date being dates with type character, formatted as %Y-%m.
My first question would be: how can I transform the types of all the entries in this column from character to date, keeping the same format with only year and month? I've tried as.Date("2011-08", format(%Y-%m)), as.yearmon("2011-08") (returns a num, but I want a Date), format(as.Date("2011-08"),"%Y-%m"). All didn't work.
The reason I want to change the type of this column is that I want to implement a dateRange input in a Shiny app, ranging from the minimum to the maximum date in the column mentioned above. Maybe there is another solution to this without needing to change the type?
This is my input in the Shiny-App:
box(width = 4, height = "50px",
dateRangeInput(inputId = 'dateRange',
label = "Period of analysis : ",
format = "yyyy-mm",language="en",
start = min(df$date),
end = max(df$date),
startview = "year", separator = " - ")
)
I want to have min(df$date) resp. max(df$date) for start and end, but it is not working. Again, the problem seems to be, that df$date has the type chr, e.g. "2011-08".
My Output-Code in the server function of the Shiny-App looks like this:
output$PartPlot <- renderPlot({
PartPlot_new <- subset(df, date >= input$dateRange[1] & date <= input$dateRange[2])
boxplot(PartPlot_new[, input$Table2], xlab = "Part", ylab = "Percentage")
})
As you can see, the goal is to have boxplots from the other columns of df (containing percentages).
Appreciate any help! Thanks in advance.
You can transform date string into a date value, with make use of this handy library:
library(anytime)
times <- c("2004-03-21 12:45:33.123456",
"2004/03/21 12:45:33.123456",
"20040321 124533.123456",
"03/21/2004 12:45:33.123456",
"03-21-2004 12:45:33.123456",
"2004-03-21",
"20040321",
"03/21/2004",
"03-21-2004",
"20010101")
anydate(times)
[1] "2004-03-21" "2004-03-21" "2004-03-21" "2004-03-21" "2004-03-21" "2004-03-21"
[7] "2004-03-21" "2004-03-21" "2004-03-21" "2001-01-01"
In your case, if you want to convert with %Y-%M format, you can also try:
times <- c("2018-11")
anydate(times)
[1] "2018-11-01"
What novica commented is absolutely correct. Leave your date as a full proper date with year, month, and day, where day can be 1. Column type will obviously be date. If you need to show the year and month in a label client-side, take the date value, format as a string and drop off the day. In your shiny client page where you select the inputs, you can show the year and month "label" but the "value" you pass will be a full date (or you can add the day server-side before using dateRangeInput.

Changing date format when plotting xts object in R

Sadly this answer here seems to not work for me.
From what I saw in the documentation, in the latest version, 0.10-1, the major.format parameter has been removed, opposed to previous versions, like 0.9-7, which has the major.format, that would solve easily my question.
It seems such a major feature to be deprecated. Is there any new way to do this? Seems something simple and easy, but I've been digging this issue for hours without success.
In case the issue lies in my code, here is a snippet of what I'm using.
merra2 = read.table("C:/merra2.csv", header=TRUE, sep=",", na.strings="NA", dec=".", strip.white=TRUE)
merra2$utc = as.POSIXct(merra2$utc, format = "%Y-%m-%d %H:%M:%S", tz="UTC")
merra2$m2_power = as.xts(x=merra2[,"m2_power"],order.by=merra2[,"utc"])
merra2$doy = as.xts(x=merra2[,"doy"],order.by=merra2[,"utc"])
plot.xts(merra2$m2_power, col="blue", lwd = 2, major.ticks="weeks", subset="2012-04-01/2014-04-01")
plot.xts(merra2$m2_power, col="blue", lwd = 2, major.ticks="months", subset="2012-04-01/2014-04-01")
And the input file contains something like:
utc,m2_power,doy
"1980-01-01 00:00:00",643.000,181.5000
"1980-01-01 01:00:00",643.000,181.4583
"1980-01-01 02:00:00",354.000,181.4167
If I add the major.format parameter, nothing changes, the axis stays the same.
Here, a reproductible example :
# Generate a sequence of Dates
StartDate<-"2017-07-01"
EndDate<- "2018-07-05"
dates<-seq(as.POSIXct(StartDate, format="%Y-%m-%d", tz="UTC")
, as.POSIXct(EndDate, format="%Y-%m-%d", tz="UTC")
, by='mins')
# Generate a sequence of x
x <- seq(1, length(dates))
# Create a dataframe, renaming columns
df <- as.data.frame(cbind(as.character(dates,format="%Y-%m-%d", tz="UTC"),x))
colnames(df) <- c("Dates","x")
# Redefine format
df$Dates <- as.POSIXct(df$Dates,format="%Y-%m-%d", tz="UTC")
df$x2 <- as.xts(x= as.numeric(df$x),order.by=df$Dates )
# Plot results
plot.xts(df$x2
, col="blue"
, lwd = 2
, major.ticks="weeks"
, major.format = TRUE
, subset="2017-08-01/2017-08-30")
If you change "major.ticks" the axis change... Have you take a look on the "utc" variable ? What is the complete time interval?

Why does this code plot by day of the week?

Link to the data set which is a date and time column along with electricity usage columns
https://d396qusza40orc.cloudfront.net/exdata%2Fdata%2Fhousehold_power_consumption.zip
power1 <- read.csv(file = "c:/datasets/household_power_consumption.txt", stringsAsFactors=F, header = TRUE,
sep=";", dec = ".", na.strings="?", col.names = c("date1","time1","Global_active_power", "Global_reactive_power",
"Voltage","Global_intensity","Sub_metering_1","Sub_metering_2",
"Sub_metering_3"))
power1$date1 <- as.Date(power1$date1, format="%d/%m/%Y")
power2 <- subset(power1, subset=(date1 >= "2007-02-01" & date1 <= "2007-02-02"))
datetime1 <- paste(as.Date(power2$date1), power2$time1)
power2$Datetime <- as.POSIXct(datetime1)
plot(power2$Global_active_power~power2$Datetime, type="l", ylab="Global Active Power (kilowatts)", xlab="")
When I run the above, I get the graph like I'm supposed to with the days of the week on the x axis even when I run summary, head and str() I don't see anything in the data about days of the week.
I tried to add my own day column with mutate but it didn't work.
And it didn't work when I subset it like the following. It subset properly where I had only the data I needed, but it wouldn't plot with the date1 column or the day of the week column I created via mutate
power2 <- subset(power1, subset=(as.Date(date1, format = "%d/%m/%Y") >= "2007-02-01"
& as.Date(date1, format = "%d/%m/%Y") <= "2007-02-02"))
I know that as.Posixct will have all the metadata in there, but I don't understand why is it when I combine the date and time columns into it's own column only then it plots by day of the week graphwithout me asking.
When I run it like this, the combined date and time column data is corrupted with the wrong year
power11 <- read.csv(file = "c:/datasets/household_power_consumption.txt", stringsAsFactors=F, header = TRUE,
sep=";", dec = ".", col.names = c("date1","time1","Global_active_power", "Global_reactive_power",
"Voltage","Global_intensity","Sub_metering_1","Sub_metering_2",
"Sub_metering_3"))
#colClasses = c("Date", "character", "factor", "numeric","numeric","numeric","numeric","numeric","numeric"))
power22 <- subset(power11, subset=(as.Date(date1, format = "%d/%m/%Y") >= "2007-02-01"
& as.Date(date1, format = "%d/%m/%Y") <= "2007-02-02"))
datetime1 <- paste(as.Date(power22$date1), power22$time1)
power22$Datetime <- as.POSIXct(datetime1)
Maybe this link would be helpful:
http://earlh.com/blog/2009/07/07/plotting-with-custom-x-axis-labels-in-r-part-5-in-a-series/
add an argument to your plot() call: xaxt='n'
plot(power2$Global_active_power~power2$Datetime, type="l", ylab="Global Active Power (kilowatts)", xlab="", xaxt='n')
that tells plot not to add x-axis labels. Then add an axis() call:
axis(side=1, at=power22$Datetime, labels=format(power22$Datetime, '%b-%y'))
I used '%b-%y' here, because that's what I saw on the site I referenced, but you would want to use the format code appropriate to your needs.

Error in `cut` function

I'm trying to group all dates houses in San Francisco were sold by year. I'm using the following code
geo_big$month <- as.Date(paste0(strftime(geo_big$date, format = "%Y-%m"), "-01"))
geo_big$date_r <- cut(geo_big$month, breaks = as.Date(c("2003-04-01", "2004-01-01", "2005-01-01", "2006-01-01", "2007-01-01", "2008-11-01")), include.lowest = TRUE, labels = as.Date(c("2003-01 - 2004-12", "2004-01 - 2004-12", "2005-01 - 2005-12", "2006-01 - 2006-12", "2007-01 - 2007-12", "2008-01 - 2008-11")))
And getting this message:
Error in charToDate(x) :
character string is not in a standard unambiguous format
Anyone know what's going on?
The error given should indicate to you that the issue is not cut but as.Date. (It's complaining to you about not being able to determine the format of the date)
More specifically, it is what you have givn as labels. No need to wrap those in as.Date
The labels should be character and c(.) and the quotation marks are sufficient for that.
Just as a bit of hand, the code above can be cleaned up in a few areas.
Also, the lubridate package might be very useful to you.
# instead of:
geo_big$month <- as.Date(paste0(strftime(geo_big$date, format = "%Y-%m"), "-01"))
# you can use `floor_date`:
library(lubridate)
geo_big$month <- floor_date(geo_big$date, "month") # from the `lubridate` pkg
# instead of:
... a giant cut statement...
# use variables for ease of reading and debugging
# bks <- as.Date(c("2003-04-01", "2004-01-01", "2005-01-01", "2006-01-01", "2007-01-01", "2008-11-01"))
# or:
bks <- c(dmin, seq.Date(ceiling_date(dmin, "year"), floor_date(dmax, "year"), by="year"), dmax) # still using library(lubridate)
# basing your labels on your breaks helps guard against human error & typos
lbls <- head(floor_date(bks, "year"), -1) # dropping the last one, and adding dmax
lbls <- paste( substr(lbls, 1, 7), substr(c(lbls[-1] - 1, dmax), 1, 7), sep=" - ")
# a cleaner, more readable `cut` statement
cut(geo_big$month, breaks=bks, include.lowest=TRUE, labels=lbls)

Resources