as.POSIXlt vs as.Date and strptime - r

Basically, I have a data set of per-minute electric consumption in a household, with 9 columns. My data is:
https://archive.ics.uci.edu/ml/datasets/Individual+household+electric+power+consumption
I tried two things and got two somewhat different outputs, and I can't figure out why:
First input:
hpc$Datetime<-as.POSIXlt(hpc$Datetime, format = "%d/%m/%Y %H:%M:%S")
with(hpc,plot(Datetime,Global.active.power, ylab = "Global.active.power(Killowatts)",
xlab = "",type = "l"))
Second input:
hpc<-read.table("hpc.txt", skip = 66637, nrow = 2879, sep =";")
hpc$Time<-strptime(hpc$Time, format = "%H:%M:%S")
hpc$Date<-as.Date(hpc$Date, format = "%d/%m/%Y")
with(hpc,plot(Time,Global.active.power, ylab = "Global.active.power(Killowatts)",
xlab = "",type = "l"))
Why is there a line appearing in the second image?
It would be great if someone could help me out!
Thank you in advance.
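One likely explanation, offered as a sketch rather than a confirmed diagnosis: when strptime() is given only a time of day, it fills in the current date, so the two days of readings are drawn on top of each other and the line jumps back from 23:59 to 00:00. The two rows below are made-up stand-ins for hpc$Date and hpc$Time, and the "UTC" time zone is an assumption chosen only to keep the example reproducible.

```r
# Two made-up rows standing in for hpc$Date and hpc$Time (assumed values)
date_chr <- c("01/02/2007", "02/02/2007")
time_chr <- c("23:59:00", "00:00:00")

# Time alone: strptime() fills in the current date, so both days overlap
time_only <- strptime(time_chr, format = "%H:%M:%S")
gap_time_only <- diff(as.POSIXct(time_only))   # negative: second point is "earlier"

# Date and time together: a single, increasing timeline
datetime <- as.POSIXct(paste(date_chr, time_chr),
                       format = "%d/%m/%Y %H:%M:%S", tz = "UTC")
gap_datetime <- diff(datetime)                 # positive: one minute later
```

Plotting against a combined date-time column like datetime, as in the first input, avoids the backward jump.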

Related

Writing functions to reference specific columns

I have to pull different data sets from the same API regularly but for different reasons, so I have to write out the code for many different pulls. I'd like to create some functions to help with this, but I need some help.
I haven't been able to figure out how to set up the function so that I can change the data set but still pull from the same column each time. In this example, I have 3 columns with timestamps that mean different things (made up in this data). I need to change the timezone here to my local time zone. The column name will remain the same in all of my datasets, but the name of the dataset will change. I have a few places in my code where I need to do this, and I haven't been able to figure it out, so any suggestions would be much appreciated!
The second section of this example code is not included in the actual code, but it is there to set the data up correctly. The data comes out of the API in the format shown as GMT.
df <- data.frame(col_1 = c(1, 2, 3, 4),
time_1 = c("2021-01-20 23:58:21", "2021-01-20 21:21:00", "2021-01-20 17:14:04", "2021-01-20 01:05:18"),
time_2 = c("2021-01-19 23:58:21", "2021-01-19 21:21:00", "2021-01-19 17:14:04", "2021-01-19 01:05:18"),
time_3 = c("2021-01-18 23:46:21", "2021-01-18 36:21:00", "2021-01-18 15:14:04", "2021-01-18 01:05:18"),
time_4 = c("2021-01-17 23:58:21", "2021-01-17 20:21:00", "2021-01-17 18:14:04", "2021-01-17 02:05:18"))
# Not part of actual code
df$time_1 <- as.POSIXlt(df$time_1, tz = "GMT")
df$time_2 <- as.POSIXlt(df$time_2, tz = "GMT")
df$time_3 <- as.POSIXlt(df$time_3, tz = "GMT")
df$time_4 <- as.POSIXlt(df$time_4, tz = "GMT")
# What I want it to do
# df$time_1 <- lubridate::with_tz(df$time_1, tz = "America/Los_Angeles")
# df$time_2 <- lubridate::with_tz(df$time_2, tz = "America/Los_Angeles")
# df$time_3 <- lubridate::with_tz(df$time_3, tz = "America/Los_Angeles")
# df$time_4 <- lubridate::with_tz(df$time_4, tz = "America/Los_Angeles")
# Attempted function
timezone_cleanup <- function(my_df){
my_df$time_1 <- lubridate::with_tz(my_df$time_1, tz = "America/Los_Angeles")
my_df$time_2 <- lubridate::with_tz(my_df$time_2, tz = "America/Los_Angeles")
my_df$time_3 <- lubridate::with_tz(my_df$time_3, tz = "America/Los_Angeles")
my_df$time_4 <- lubridate::with_tz(my_df$time_4, tz = "America/Los_Angeles")
}
# how I'd like to use this function. Not working now. Even if I wrap it with data.frame(), it's not what I wanted.
new_df <- timezone_cleanup(df)
I think you need to return my_df in your function to get the changed dataframe back. However, you can use lapply or across to apply the same function to multiple columns.
library(dplyr)
timezone_cleanup <- function(my_df){
  my_df %>%
    mutate(across(starts_with('time'),
                  lubridate::with_tz, tz = "America/Los_Angeles"))
}
new_df <- timezone_cleanup(df)
By the way, I do receive a warning message while using this: Unrecognized time zone 'America/Los_Angeles'. Are you sure you are using the correct tz value?
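The lapply route mentioned above could look roughly like the following base R sketch. It does not use lubridate: it assumes the columns hold GMT timestamps (as strings or date-times), converts them to POSIXct, and changes only the display time zone through the "tzone" attribute. The tz argument and the shortened example data frame are illustrative assumptions, not part of the original code.

```r
# Sketch: convert every column whose name starts with "time" and
# return the modified data frame (the step missing from the first attempt)
timezone_cleanup <- function(my_df, tz = "America/Los_Angeles") {
  time_cols <- grep("^time", names(my_df), value = TRUE)
  my_df[time_cols] <- lapply(my_df[time_cols], function(x) {
    x <- as.POSIXct(x, tz = "GMT")  # interpret the values as GMT instants
    attr(x, "tzone") <- tz          # same instant, shown in the target zone
    x
  })
  my_df
}

# Shortened, made-up version of the example data
df <- data.frame(col_1 = c(1, 2),
                 time_1 = c("2021-01-20 23:58:21", "2021-01-20 21:21:00"),
                 time_2 = c("2021-01-19 23:58:21", "2021-01-19 21:21:00"))
new_df <- timezone_cleanup(df)
```

Because the column names are matched by pattern, the same function can be applied to any data set that follows the time_* naming.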

R plotting annual data and "January" repeated at end of graph

I'm fairly new to R and am trying to plot some expenditure data. I read the data in from excel and then do some manipulation on the dates
data <- read.csv("Spending2019.csv", header = T)
#converts time so R can use the dates
strdate <- strptime(data$DATE,"%m/%d/%Y")
newdate <- cbind(data,strdate)
finaldata <- newdate[order(strdate),]
This probably isn't the most efficient, but it gets me there :)
Here are the relevant columns of the first four rows of my finaldata data frame:
dput(droplevels(finaldata[1:4,c(5,7)]))
structure(list(AMOUNT = c(25.13, 14.96, 43.22, 18.43), strdate = structure(c(1546578000,
1546750800, 1547010000, 1547010000), class = c("POSIXct", "POSIXt"
), tzone = "")), row.names = c(NA, 4L), class = "data.frame")
The full data set has 146 rows, and the dates range from 1/4/2019 to 12/30/2019.
I then plot the data:
plot(finaldata$strdate,finaldata$AMOUNT, xlab = "Month", ylab = "Amount Spent")
and I get this plot.
This is fine for me getting started, EXCEPT why is JAN repeated at the far right end? I have tried various forms of xlim and can't seem to get it to go away.

Changing date format when plotting xts object in R

Sadly, the answer here does not seem to work for me.
From what I saw in the documentation, the major.format parameter has been removed in the latest version, 0.10-1, as opposed to previous versions such as 0.9-7, where major.format would have easily solved my problem.
It seems like a major feature to remove. Is there a new way to do this? It seems simple, but I've been digging into this issue for hours without success.
In case the issue lies in my code, here is a snippet of what I'm using.
merra2 = read.table("C:/merra2.csv", header=TRUE, sep=",", na.strings="NA", dec=".", strip.white=TRUE)
merra2$utc = as.POSIXct(merra2$utc, format = "%Y-%m-%d %H:%M:%S", tz="UTC")
merra2$m2_power = as.xts(x=merra2[,"m2_power"],order.by=merra2[,"utc"])
merra2$doy = as.xts(x=merra2[,"doy"],order.by=merra2[,"utc"])
plot.xts(merra2$m2_power, col="blue", lwd = 2, major.ticks="weeks", subset="2012-04-01/2014-04-01")
plot.xts(merra2$m2_power, col="blue", lwd = 2, major.ticks="months", subset="2012-04-01/2014-04-01")
And the input file contains something like:
utc,m2_power,doy
"1980-01-01 00:00:00",643.000,181.5000
"1980-01-01 01:00:00",643.000,181.4583
"1980-01-01 02:00:00",354.000,181.4167
If I add the major.format parameter, nothing changes, the axis stays the same.
Here is a reproducible example:
# Generate a sequence of Dates
StartDate<-"2017-07-01"
EndDate<- "2018-07-05"
dates<-seq(as.POSIXct(StartDate, format="%Y-%m-%d", tz="UTC")
, as.POSIXct(EndDate, format="%Y-%m-%d", tz="UTC")
, by='mins')
# Generate a sequence of x
x <- seq(1, length(dates))
# Create a dataframe, renaming columns
df <- as.data.frame(cbind(as.character(dates,format="%Y-%m-%d", tz="UTC"),x))
colnames(df) <- c("Dates","x")
# Redefine format
df$Dates <- as.POSIXct(df$Dates,format="%Y-%m-%d", tz="UTC")
df$x2 <- as.xts(x= as.numeric(df$x),order.by=df$Dates )
# Plot results
plot.xts(df$x2
, col="blue"
, lwd = 2
, major.ticks="weeks"
, major.format = TRUE
, subset="2017-08-01/2017-08-30")
If you change "major.ticks", the axis changes... Have you taken a look at the "utc" variable? What is the complete time interval?

Why does this code plot by day of the week?

Link to the data set which is a date and time column along with electricity usage columns
https://d396qusza40orc.cloudfront.net/exdata%2Fdata%2Fhousehold_power_consumption.zip
power1 <- read.csv(file = "c:/datasets/household_power_consumption.txt", stringsAsFactors=F, header = TRUE,
sep=";", dec = ".", na.strings="?", col.names = c("date1","time1","Global_active_power", "Global_reactive_power",
"Voltage","Global_intensity","Sub_metering_1","Sub_metering_2",
"Sub_metering_3"))
power1$date1 <- as.Date(power1$date1, format="%d/%m/%Y")
power2 <- subset(power1, subset=(date1 >= "2007-02-01" & date1 <= "2007-02-02"))
datetime1 <- paste(as.Date(power2$date1), power2$time1)
power2$Datetime <- as.POSIXct(datetime1)
plot(power2$Global_active_power~power2$Datetime, type="l", ylab="Global Active Power (kilowatts)", xlab="")
When I run the above, I get the graph I'm supposed to get, with the days of the week on the x axis. But even when I run summary(), head() and str(), I don't see anything in the data about days of the week.
I tried to add my own day column with mutate, but it didn't work.
It also didn't work when I subset the data as follows. The subset was correct, containing only the data I needed, but it wouldn't plot with the date1 column or with the day-of-week column I created via mutate.
power2 <- subset(power1, subset=(as.Date(date1, format = "%d/%m/%Y") >= "2007-02-01"
& as.Date(date1, format = "%d/%m/%Y") <= "2007-02-02"))
I know that as.POSIXct keeps all the metadata, but I don't understand why the plot is labeled by day of the week, without my asking, only when I combine the date and time columns into their own column.
When I run it like this, the combined date and time column is corrupted with the wrong year:
power11 <- read.csv(file = "c:/datasets/household_power_consumption.txt", stringsAsFactors=F, header = TRUE,
sep=";", dec = ".", col.names = c("date1","time1","Global_active_power", "Global_reactive_power",
"Voltage","Global_intensity","Sub_metering_1","Sub_metering_2",
"Sub_metering_3"))
#colClasses = c("Date", "character", "factor", "numeric","numeric","numeric","numeric","numeric","numeric"))
power22 <- subset(power11, subset=(as.Date(date1, format = "%d/%m/%Y") >= "2007-02-01"
& as.Date(date1, format = "%d/%m/%Y") <= "2007-02-02"))
datetime1 <- paste(as.Date(power22$date1), power22$time1)
power22$Datetime <- as.POSIXct(datetime1)
Maybe this link would be helpful:
http://earlh.com/blog/2009/07/07/plotting-with-custom-x-axis-labels-in-r-part-5-in-a-series/
Add an argument to your plot() call: xaxt='n'
plot(power2$Global_active_power~power2$Datetime, type="l", ylab="Global Active Power (kilowatts)", xlab="", xaxt='n')
That tells plot() not to draw the x-axis labels. Then add an axis() call, using the same Datetime column that was plotted:
axis(side=1, at=power2$Datetime, labels=format(power2$Datetime, '%b-%y'))
I used '%b-%y' here, because that's what I saw on the site I referenced, but you would want to use the format code appropriate to your needs.
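As background, plotting against a POSIXct column makes base R draw a date-time axis whose label format depends on the time span, which is why a two-day window ends up labeled with weekday names. The sketch below applies the xaxt/axis() approach with one tick per day instead of one per observation. The data are synthetic (a made-up power curve on a per-minute grid), and the "UTC" time zone and the "%Y-%m-%d" label format are assumptions chosen for illustration.

```r
# Synthetic stand-in for the two-day subset (made-up values, not the real file)
datetime <- seq(as.POSIXct("2007-02-01 00:00:00", tz = "UTC"),
                as.POSIXct("2007-02-02 23:59:00", tz = "UTC"), by = "min")
power <- abs(sin(seq_along(datetime) / 200))

# One tick at each midnight rather than one per observation
ticks <- seq(as.POSIXct("2007-02-01", tz = "UTC"),
             as.POSIXct("2007-02-03", tz = "UTC"), by = "day")
tick_labels <- format(ticks, "%Y-%m-%d")

# Suppress the automatic axis, then draw the custom one
plot(datetime, power, type = "l", xaxt = "n",
     xlab = "", ylab = "Global Active Power (kilowatts)")
axis(side = 1, at = ticks, labels = tick_labels)
```

Swapping the format string (for example "%a" for weekday names) changes only the labels, not the data being plotted.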

Set up different fonts for fragments of string in R

I have a long string txt that I want to display as margin text in a plot using mtext(). The txt string is composed of another string txt.sub, as well as of a date string, which applies a specific format to a date command argument. However, I want to display the "date" part of that string only in bold.
The string is:
date.in = as.Date( commandArgs( trailingOnly=TRUE )[1], format="%m/%d/%Y" )
date = format(date.in, "%b %d, %Y")
txt.sub = "Today's date is: "
txt = paste(txt.sub, date, sep = "")
I tried the following
## Plot is called first here.
mtext(expression(paste(txt.sub, bold(date), sep = "")), line = 0, adj = 0, cex = 0.8)
but the problem with this is that it doesn't paste the values of txt.sub and date, but rather displays literally the words "txt.sub" and "date".
Is there any way to get to the result I am looking for? Thank you!
Adjusting one of the examples from the help page on mathematical annotation (see example 'How to combine "math" and numeric variables'):
mtext(bquote(.(txt.sub) ~ bold(.(date))), line=0, adj=0, cex=0.8)
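To see why bquote() works where expression() did not, the following sketch compares the two objects without drawing anything. The fixed date string replaces the command-line argument and is an assumption made only so the example runs on its own.

```r
# Fixed date in place of commandArgs() (assumed value for illustration)
date.in <- as.Date("01/20/2021", format = "%m/%d/%Y")
date <- format(date.in, "%b %d, %Y")
txt.sub <- "Today's date is:"

# expression() keeps the variable names as unevaluated symbols
unevaluated <- expression(paste(txt.sub, bold(date), sep = ""))
vars_in_expression <- all.vars(unevaluated)   # the names, not their values

# bquote() substitutes the values wrapped in .() before plotting
label <- bquote(.(txt.sub) ~ bold(.(date)))
vars_in_label <- all.vars(label)              # no variable names remain
```

The label object can then be passed to mtext() exactly as in the line above.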
