Convert CSV with dates using lubridate - R

I have a dataset in CSV format with two columns: Date and Value. There are hundreds of rows in the file. The date format in the file is YYYY-MM-DD. When I imported this dataset, the Date column got imported as a factor, so I cannot run a regression between those two variables.
I am very new to R, but I understand that lubridate can help me convert the data in the Date column. Could someone suggest what command I should use to do so? The file name is Test.csv.

Next time, please provide some test data and show what you did. For variations see ?as.Date and ?read.csv. The following does not use any packages:
# test data
Lines <- "Date,Value
2000-01-01,12
2001-01-01,13"
# DF <- read.csv("myfile.csv")
DF <- read.csv(text = Lines)
DF$Date <- as.Date(DF$Date)
plot(Value ~ Date, DF, type = "o")
giving:
> DF
        Date Value
1 2000-01-01    12
2 2001-01-01    13
Note: since your data is a time series, you might want to use a time series representation. In this case read.zoo automatically converts the first column to "Date" class:
library(zoo)
# z <- read.zoo("myfile.csv", header = TRUE, sep = ",")
z <- read.zoo(text = Lines, header = TRUE, sep = ",")
plot(z)
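Since the question asks about lubridate specifically: its ymd() function parses "YYYY-MM-DD" strings directly and returns Date objects, so it is a drop-in alternative to as.Date here. A minimal sketch using the same test data:
library(lubridate)
DF <- read.csv(text = Lines)
# ymd() parses year-month-day strings and returns objects of class "Date"
DF$Date <- ymd(DF$Date)
class(DF$Date)  # "Date"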

Related

How to convert R date() values

I've got a data frame with a lot of dates in it that were generated by the date() command in R, resembling the first data frame below. On my computer with this version of R, the date values are formatted like this: "Thu Mar 18 11:15:23 2021". I believe this is all base R stuff.
I want to strip the weekday, the hours, minutes, and seconds away, and then transform it so that it looks like this: "2021-03-18". My goal data frame is the second data frame below. I've tried various as.Date() or strftime functions to no avail.
# what I have:
df <- data.frame(date = c(date(), date()), value = c(1, 2))
# what I want:
df <- data.frame(date = c("2021-03-18", "2021-03-18"), value = c(1, 2))
df <- data.frame(
  date = c(date(), date()),
  value = c(1, 2),
  stringsAsFactors = FALSE
)
# "%c" parses the locale's date-time representation, which matches the output
# of date(); as.Date() then drops the time part and strftime() formats it
df$date <- strftime(as.Date(df$date, "%c"), "%Y-%m-%d")
If you don't need strings, you can skip the strftime call and only use as.Date.
https://stat.ethz.ch/R-manual/R-patched/library/base/html/strptime.html
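Since "%c" is locale-dependent, a possible alternative is to spell the format out explicitly (a sketch, assuming an English locale so that %a and %b match the abbreviated day and month names produced by date()):
# explicit format for strings like "Thu Mar 18 11:15:23 2021" (English locale assumed)
df$date <- strftime(as.Date(df$date, format = "%a %b %d %H:%M:%S %Y"), "%Y-%m-%d")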

How to change data type of column in Data frame to Date from Char

I'm messing with some columns in R using RStudio and have tried to change the data type of one of the columns from Char to Date.
I have used a few options and the one that came the closest was
data$Date <- as.Date(as.character(data$Date))
Though even this doesn't seem to work, as it changes the values of the column to some weird values (the original post showed screenshots of the column before and after the conversion). I can't quite figure out why the transformation isn't working.
Here is my code up until that point
# load the tidyverse library
library("tidyverse")
setwd("C:/Users/ibrahim.cetinkaya/OneDrive - NTT/Desktop/data")
##################### Part A #####################
# data files (you need to specify the paths of the CSV files (e.g. relative or absolute))
files <- c("data/201808.csv",
           "data/201809.csv",
           "data/201810.csv",
           "data/201811.csv",
           "data/201812.csv",
           "data/201901.csv",
           "data/201902.csv",
           "data/201903.csv",
           "data/201904.csv",
           "data/201905.csv",
           "data/201906.csv",
           "data/201908.csv")
# Concatenate into one data frame.
data <- data.frame()
for (i in 1:length(files)) {
  temp <- read_csv(files[i], skip = 7)
  data <- rbind(data, temp)
}
#View to verify
view(data)
#Part 2
# Remove variables which have no data at all (all the data are NAs)
# Remove variables that don't have adequate data (70% of the number of records are NAs)
# (note: as written this filters rows with more than 90% NAs, not columns)
data <- data[rowMeans(is.na(data)) <= 0.9, ]
view(data)
#Change the column names to have no spaces between the words
names(data) <- gsub(" ", "_", names(data))
view(data)
#Convert Date to date type
#df2 <- data %>% mutate_at(vars(data), as.Date, format="%m-%d-%Y")
#data %>% mutate(data$Date==as.Date(Date, format = "%m.%d.%Y"))
data$Date <- as.Date(as.character(data$Date))
#^^^ This doesn't seem to be working properly ^^^
#Checking if it worked
typeof(data$Date)
view(data)
Any suggestions would be appreciated.
I want to be able to change the data type and then extract the month and use it for grouping some of the other data in my frame.
Use
data$Date <- as.Date(data$Date, "%m/%d/%Y")
and then, to extract the month,
data$Month <- format(data$Date, "%m")
We can also use lubridate
data$Date <- lubridate::mdy(data$Date)
and use month to extract the month.
data$Month <- lubridate::month(data$Date)
and with anytime
data$Date <- anytime::anydate(data$Date)
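Once Month exists, grouping works as usual. A minimal sketch with dplyr (already loaded via tidyverse in your script), where the Value column name is a hypothetical stand-in for one of your numeric columns:
library(dplyr)
# average a (hypothetical) Value column per month
data %>%
  group_by(Month) %>%
  summarise(mean_value = mean(Value, na.rm = TRUE))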

R check consistency of separated timeseries-table

I have a time-series table like this (the table itself was shown in the original post), which goes up to 2000 31 12 23 (12/31/2000 23:00).
I'd like to add temperature values from several weather stations to it. The problem is that the different time series obviously don't match by count of rows, so there must be gaps.
How can I check these data frames to see whether they consistently follow the pattern of 0-24 hours and 1-12 months, and get information on where these gaps are?
If your data is in the format of the link, then you can probably convert it to a POSIXct object by doing the following (assuming your data frame is called data):
data <- as.data.frame(list(YY = rep("1962", 10),
                           MM = rep("01", 10),
                           DD = rep("01", 10),
                           HH = c("00", "01", "02", "03", "04",
                                  "05", "06", "07", "08", "09")))
date <- paste(data$YY, data$MM, data$DD, sep = "-")
data$dateTime <- as.POSIXct(paste(date, data$HH, sep = " "), format = "%Y-%m-%d %H")
That should put your data into POSIXct format. If your temperature dataset also has a column called "dateTime" and it's a POSIXct object, you should be able to use the merge function to combine the two data frames:
temp <- as.data.frame(list(YY = rep("1962", 10),
                           MM = rep("01", 10),
                           DD = rep("01", 10),
                           HH = c("00", "01", "02", "03", "04",
                                  "05", "06", "07", "08", "09")))
date1 <- paste(temp$YY, temp$MM, temp$DD, sep = "-")
temp$dateTime <- as.POSIXct(paste(date1, temp$HH, sep = " "), format = "%Y-%m-%d %H")
temp$temp <- round(rnorm(10, 0, 5), 1)
temp <- temp[, c("dateTime", "temp")]
# let's say your temperature dataset is missing an entry for a certain timestamp
temp <- temp[-3, ]
# this data frame won't have an entry for 02:00:00
data1 <- merge(data, temp)
data1
# if you want to look at time differences you can try something like this
diff(data1$dateTime)
# this one will fill in the temp value as NA at 02:00:00
data2 <- merge(data, temp, all.x = TRUE)
data2
diff(data2$dateTime)
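Building on diff(), a minimal sketch for locating the gaps themselves (assuming the merged frame is sorted by dateTime):
# positions after which more than one hour (3600 s) elapses, i.e. where gaps start
gap_idx <- which(diff(as.numeric(data2$dateTime)) > 3600)
data2$dateTime[gap_idx]  # last timestamp before each gap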
I hope that helps. I often use the merge function when I'm trying to match up timestamps from ecological datasets.
Thank you for your answer and sorry for my late reply.
I couldn't have made it without your helpful hints, though I now managed to merge all my time series in a slightly different way:
Sys.setenv(TZ = 'UTC')  # setting the system time zone to UTC to avoid DST gaps
# creating an empty hourly time series for the following join
start <- strptime("1962010100", format = "%Y%m%d%H")
end <- strptime("2000123123", format = "%Y%m%d%H")
series62_00 <- data.frame(
  MESS_DATUM = seq(start, end, by = "hour", tz = 'UTC'), t = NA)
# joining all the temperature series with the same timespan using the "plyr" package
library("plyr")
t_allstations <- list(series62_00, t282, t867, t1270, t2261, t2503,
                      t2597, t3668, t3946, t4752, t5397, t5419, t5705)
t_omain_DWD <- join_all(t_allstations, by = "MESS_DATUM", type = "left")
Using join_all with type = "left" makes sure that the date column ("MESS_DATUM") is not changed and missing temperature values are filled in as NAs.
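As a quick check after the join, you can count the remaining NAs per column (a sketch; the column names depend on your individual station data frames):
# number of missing hourly values per column after the join
colSums(is.na(t_omain_DWD))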

R: aggregate quarter-hourly data to hourly data - different behaviour with same date fields

I am trying to understand why R behaves differently with the "aggregate" function. I wanted to average 15-minute data to hourly data. For this, I passed the 15-minute data together with a pre-designed "hour" array (the same date four times per hour, taken from the original POSIXct array) to the aggregate function.
After some time, I realized that the function was behaving oddly (well, probably the data was odd, but why?) when handing over the date array with
strftime(data.15min$posix, format="%Y-%m-%d %H")
However, if I handed over the data with
cut(data.15min$posix, "1 hour")
the data was averaged correctly.
Below, a minimal example is embedded, including a sample of the data.
I would be happy to understand what I did wrong.
Thanks in advance!
d <- 3  # digits for rounding
bla <- read.table("test_daten.dat", header = TRUE, sep = ",")
data.15min <- NULL
data.15min$posix <- as.POSIXct(bla$dates, tz = "UTC")
data.15min$o3 <- bla$o3
hourtimes <- unique(as.POSIXct(paste(strftime(data.15min$posix, format = "%Y-%m-%d %H"),
                                     ":00:00", sep = ""), tz = "Universal"))
agg.mean <- function(xx, yy, rm.na = TRUE)
# xx: parameter that determines the aggregation, as in list(xx), e.g. hour etc.
# yy: parameter that will be aggregated
{
  out.mean <- aggregate(yy, list(xx), FUN = mean, na.rm = rm.na)
  out.mean <- out.mean[, 2]
}
#############
data.o3.hour.mean <- round(agg.mean(strftime(data.15min$posix, format = "%m/%d/%y %H"),
                                    data.15min$o3), d)
data.o3.hour.mean[1:100]
win.graph(10, 5)
par(mar = c(5, 15, 4, 2), new = TRUE)
plot(data.15min$posix, data.15min$o3, col = 3, type = "l", ylim = c(10, 60))  # original data
par(mar = c(5, 15, 4, 2), new = TRUE)
plot(hourtimes, data.o3.hour.mean, col = 5, type = "l", ylim = c(10, 60))  # Wrong
##############
data.o3.hour.mean <- round(agg.mean(cut(data.15min$posix, "1 hour"), data.15min$o3), d)
data.o3.hour.mean[1:100]
win.graph(10, 5)
par(mar = c(5, 15, 4, 2), new = TRUE)
plot(data.15min$posix, data.15min$o3, col = 3, type = "l", ylim = c(10, 60))  # original data
par(mar = c(5, 15, 4, 2), new = TRUE)
plot(hourtimes, data.o3.hour.mean, col = 5, type = "l", ylim = c(10, 60))  # Correct
Data:
Download data
Too long for a comment.
The reason your results look different is that aggregate(...) sorts the results by your grouping variable(s). In the first case,
strftime(data.15min$posix, format="%m/%d/%y %H")
is a character vector with poorly formatted dates (they do not sort properly). So the first row corresponds to the "date" "01/01/96 00".
In your second case,
cut(data.15min$posix, "1 hour")
generates actual POSIXct dates, which sort properly. So the first row corresponds to the date: 1995-11-04 13:00:00.
If you had used
strftime(data.15min$posix, format="%Y-%m-%d %H")
in your first case, you would have gotten the same result as with cut(...).
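A minimal sketch of the sorting difference (with made-up timestamps, not the original data):
x <- as.POSIXct(c("1995-11-04 13:00", "1996-01-01 00:00"), tz = "UTC")
sort(strftime(x, "%m/%d/%y %H"))  # character sort: "01/01/96 00" comes first
sort(strftime(x, "%Y-%m-%d %H"))  # ISO-style strings sort chronologically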

How to import timestamps & durations in HH:MM:SS format from excel in R?

I have an Excel table that has a column containing timestamps in the format HH:MM:SS.
However, after reading the exported CSV into R, the values in the corresponding column data$timestamp are interpreted as very small numbers (e.g., 7,06018515869254E-04).
How can I get R to interpret the numbers as they were meant to?
I've tried the following without any success (both yield NA):
timeTest <- data$timestamp[1]
print(as.POSIXct((timeTest - 719529) * 86400, origin = "1970-01-01", tz = "UTC"))
print(as.POSIXct(strptime(timeTest, "%Y%m%d %H%M%S")))
A hint on how to achieve the desired format would be of great help!
Here is a sample from the actual CSV:
0,692210648148148|0,692534722222222|3,24074074074088E-04
As I use | as the separator, I import the data as follows:
data <- read.delim(file, header = TRUE, sep = "|")
DF <- read.table(text="x|y|timestamp
0,692210648148148|0,692534722222222|3,24074074074088E-04",
sep="|", dec=",", header=TRUE)
library(chron)
DF$timestamp <- times(DF$timestamp)
# x y timestamp
#1 0.6922106 0.6925347 00:00:28
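If you'd rather avoid an extra package, here is a base-R sketch that converts the day fraction into an HH:MM:SS string:
# day fraction -> seconds since midnight -> formatted clock time (UTC avoids offsets)
frac <- 3.24074074074088e-04
format(as.POSIXct(round(frac * 86400), origin = "1970-01-01", tz = "UTC"), "%H:%M:%S")
# "00:00:28"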
