Issue in date formatting while importing Excel file in R

I am trying to show the dates as Jan 20 to June 20 in the Month column of an rpivotTable, but they always appear in yyyy-mm-dd format, e.g. 2020-01-01. My code is below:
library(readxl)
library(rpivotTable)
myexcel <- read_excel("claimH1data_date.xlsx")
x <- myexcel$Month
as.Date(x, format, tryFormats = c("%m-%Y"),tz = "UTC",
optional = TRUE)
format(x, format="%B %Y")
View(x)
rpivotTable(myexcel, rows = "Month",cols="Action", vals = "Freq",
aggregatorName = "Count", rendererName = "Table")
Can you please help? Thanks.

You can try:
library(readxl)
#Read the data
myexcel <- read_excel("claimH1data_date.xlsx")
#Sort the data based on date
myexcel <- myexcel[order(myexcel$Month), ]
#Apply the format
myexcel$Month <- format(myexcel$Month, format="%B %Y")
myexcel
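One caveat with the approach above: format() returns character strings, so the chronological order is lost as soon as the labels are sorted alphabetically. A minimal base R sketch of one way to keep the order is to store the labels as a factor whose levels follow the dates. The month vector here is a hypothetical stand-in for the Month column, and "%b %y" is used to get the short "Jan 20" style from the question. Month names depend on the locale, and whether rpivotTable honours factor levels is not confirmed here.

```r
# Hypothetical stand-in for the Month column returned by read_excel()
month <- as.Date(c("2020-03-01", "2020-01-01", "2020-02-01", "2020-01-01"))

# Build the labels in chronological order first
labels <- format(sort(unique(month)), "%b %y")

# Store the formatted values as a factor whose levels follow that order
month_f <- factor(format(month, "%b %y"), levels = labels)
levels(month_f)
```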

Related

read.zoo is not returning needed date format

My initial data is in %y-%m-%d format...
using the code
returnsgamma <- read.zoo(returns, header = TRUE, sep = ",", FUN = as.chron)
the zoo object returns the dates in the order %m/%d/%y.
Is there any way to use read.zoo and keep the dates in the order %y/%m/%d or %d/%m/%y?
Assuming the input shown in the Note at the end, we can use the default Date class, whose printed output defaults to yyyy-mm-dd, or use chron with chron(..., out.format="y-m-d"), which produces yy-mm-dd.
library(zoo)
read.csv.zoo(text = Lines, format = "%y-%m-%d")
## 2022-12-01
## 34
library(chron)
toChron <- function(x) as.chron(x, out.format = "y-m-d")
read.csv.zoo(text = Lines, FUN = toChron)
## 22-12-01
## 34
Note
Lines <- "date,value
22-12-01,34"
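The point behind both variants is that the input format only controls parsing, while the displayed order is chosen separately when the dates are rendered. A small base R sketch of that separation, using the date from the Note (no zoo or chron needed for this part):

```r
# Parse the two-digit-year input once
d <- as.Date("22-12-01", format = "%y-%m-%d")

format(d)              # default rendering of the Date class
format(d, "%y/%m/%d")  # year/month/day with a two-digit year
format(d, "%d/%m/%y")  # day/month/year with a two-digit year
```

The stored value is the same in all three cases; only the character representation changes.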

create date vector, change to factor and format

I have the following code:
gsub("-","/",paste(cut(seq(as.POSIXct(Sys.Date(),format="%d-%b-%y"), by = "-1 day", length.out = 10),"days"),collapse = ","))
The output:
"2019/03/20,2019/03/19,2019/03/18,2019/03/17,2019/03/16,2019/03/15,2019/03/14,2019/03/13,2019/03/12,2019/03/11"
However the desired result is
'20/03/2019','19/03/2019','18/03/2019','17/03/2019','16/03/2019','15/03/2019','14/03/2019','13/03/2019','12/03/2019','11/03/2019'
How can I accomplish that?
Regards
I am not sure what you are trying to do, but you can generate the required output with:
format(Sys.Date() - 1:10, "%d/%m/%Y")
#[1] "20/03/2019" "19/03/2019" "18/03/2019" "17/03/2019" "16/03/2019" "15/03/2019"
# "14/03/2019" "13/03/2019" "12/03/2019" "11/03/2019"
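If the single-quoted, comma-separated string from the question is needed rather than a character vector, the formatted dates can be pasted together. This is a sketch with a fixed reference date so that it is reproducible; the date and the variable names are assumptions for illustration, and Sys.Date() can be substituted for the real current date.

```r
# Fixed reference date (an assumption, replace with Sys.Date() for "today")
today <- as.Date("2019-03-20")

# The reference date and the nine days before it, as dd/mm/yyyy
days <- format(today - 0:9, "%d/%m/%Y")

# Wrap each date in single quotes and join with commas
quoted <- paste0("'", days, "'", collapse = ",")
quoted
```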

R as.POSIXct try two input formats

I am reading in a .csv of dates and gps positions. I need to convert the date column to a date class.
I am using:
data = data.frame(rbind(c('2016/07/19 17:52:00',3674.64416424279,354.266660979476),
c('2016/07/19 17:54:00',3674.65121597935,354.246972537617),
c('2016/07/19 17:55:00',3674.65474186293,354.237128326737),
c('2016/07/19 17:56:00',3674.65826775671,354.227284122559)))
colnames(data) = (c('GMT_DateTime','northing','easting'))
data$GMT_DateTime<-as.POSIXct(data$GMT_DateTime, tz="GMT", format = "%Y/%m/%d %H:%M:%S")
Sometimes the date in the .csv to be read is formatted as "%Y/%m/%d %H:%M:%S" and sometimes as "%m/%d/%Y %H:%M"
Is there a way to feed in two possible formats to as.POSIXct() to try both possible formats? I imagine something like this:
data$GMT_DateTime<-as.POSIXct(data$GMT_DateTime, tz="GMT", format = "%m/%d/%Y %H:%M" or "%Y/%m/%d %H:%M:%S")
Thank you!
In what follows I will use package lubridate.
I have added two extra rows to the example dataset, with date/time values in the "%m/%d/%Y %H:%M" format. Note that the column is of class character; if it is of class factor, the code will probably throw an error.
As for the warnings, don't worry, they are just lubridate telling you that it found several formats and cannot process them all in one go.
library(lubridate)
tmp <- data$GMT_DateTime # work on a copy
na <- is.na(ymd_hms(tmp))
data$GMT_DateTime[!na] <- ymd_hms(tmp)[!na]
data$GMT_DateTime[na] <- mdy_hm(tmp)[na]
data$GMT_DateTime <- as.POSIXct(as.numeric(data$GMT_DateTime),
format = "%Y-%m-%d",
origin = "1970-01-01", tz = "GMT")
rm(tmp) # final clean up
Data in dput() format.
data <-
structure(list(GMT_DateTime = c("2016/07/19 17:52:00", "2016/07/19 17:54:00",
"2016/07/19 17:55:00", "2016/07/19 17:56:00", "07/22/2016 17:02",
"07/23/2016 17:15"), northing = c(3674.64416424279, 3674.65121597935,
3674.65474186293, 3674.65826775671, 3674.662, 3674.665), easting = c(354.266660979476,
354.246972537617, 354.237128326737, 354.227284122559, 354.2702,
354.3123)), row.names = c(NA, -6L), class = "data.frame")
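For comparison, the same "try one format, then fall back to the next" idea can be sketched in base R without lubridate. The helper name parse_mixed is hypothetical, and the sketch assumes each value matches at least one of the supplied formats (values that match none stay NA).

```r
# Hypothetical helper: try each format in turn on the values still unparsed
parse_mixed <- function(x, formats, tz = "GMT") {
  out <- as.POSIXct(rep(NA_real_, length(x)), origin = "1970-01-01", tz = tz)
  for (fmt in formats) {
    todo <- is.na(out)                                   # not parsed yet
    out[todo] <- as.POSIXct(x[todo], format = fmt, tz = tz)
  }
  out
}

stamps <- c("2016/07/19 17:52:00", "07/22/2016 17:02")
res <- parse_mixed(stamps, c("%Y/%m/%d %H:%M:%S", "%m/%d/%Y %H:%M"))
res
```

The order of the formats matters when a value could match more than one of them, so the most specific format should come first.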

Converting datetime from character to POSIXct object

I have an instrument that exports data in an unruly time format. I need to combine the date and time vectors into a new datetime vector in the following POSIXct format: %Y-%m-%d %H:%M:%S. Out of curiosity, I attempted to do this in three different ways, using as.POSIXct(), strftime(), and strptime(). When using my example data below, only the as.POSIXct() and strftime() functions work, but I am curious as to why strptime() is producing NAs? Also, I cannot convert the strftime() output into a POSIXct object using as.POSIXct()...
When trying these same functions on my real data (of which I've only provided the first four rows), I am running into an entirely different problem. Only the strftime() function is working. For some reason the as.POSIXct() function is also producing NAs, and it is the only command I actually need for converting my datetime into a POSIXct object...
It seems like there are subtle differences between these functions, and I want to know how to use them more effectively. Thanks!
Reproducible Example:
## Creating dataframe:
date <- c("2017-04-14", "2017-04-14","2017-04-14","2017-04-14")
time <- c("14:24:24.992000","14:24:25.491000","14:24:26.005000","14:24:26.511000")
value <- c("4.106e-06","4.106e-06","4.106e-06","4.106e-06")
data <- data.frame(date, time)
data <- data.frame(data, value) ## I'm sure there is a better way to combine three vectors...
head(data)
## Creating 3 different datetime vectors:
## This works in my example code, but not with my real data...
data$datetime1 <- as.POSIXct(paste(data$date, data$time), format = "%Y-%m-%d %H:%M:%S",tz="UTC")
class(data$datetime1)
## This is producing NAs, and I'm not sure why:
data$datetime2 <- strptime(paste(data$date, data$time), format = "%Y-%m-%d %H:%M%:%S", tz = "UTC")
class(data$datetime2)
## This is working just fine
data$datetime3 <- strftime(paste(data$date, data$time), format = "%Y-%m-%d %H:%M%:%S", tz = "UTC")
class(data$datetime3)
head(data)
## Since I cannot get the as.POSIXct() function to work with my real data, I tried this workaround. Unfortunately I am running into trouble...
data$datetime4 <- as.POSIXct(x$datetime3, format = "%Y-%m-%d %H:%M%:%S", tz = "UTC")
Link to real data:
here
Example using real_data.txt:
## Reading in the file:
fpath <- "~/real_data.txt"
x <- read.csv(fpath, skip = 1, header = FALSE, sep = "", stringsAsFactors = FALSE)
names(x) <- c("date","time","bscat","scat_coef","pressure_mbar","temp_K","CH1","CH2") ## This is data from a Radiance Research Integrating Nephelometer Model M903 for anyone who is interested!
## If anyone could get this to work that would be awesome!
x$datetime1 <- as.POSIXct(paste(x$date, x$time), format = "%Y-%m-%d %H:%M%:%S", tz = "UTC")
## This still doesn't work...
x$datetime2 <- strptime(paste(x$date, x$time), format = "%Y-%m-%d %H:%M%:%S", tz = "UTC")
## This works:
x$datetime3 <- strftime(paste(x$date, x$time), format = "%Y-%m-%d %H:%M%:%S", tz = "UTC")
## But I cannot convert from strftime character to POSIXct object, so it doesn't help me at all...
x$datetime4 <- as.POSIXct(x$datetime3, format = "%Y-%m-%d %H:%M%:%S", tz = "UTC")
head(x)
Solution:
I was not providing the as.POSIXct() function with the correct format string. Once I changed %Y-%m-%d %H:%M%:%S to %Y-%m-%d %H:%M:%S, the data$datetime2, data$datetime4, x$datetime1 and x$datetime2 were working properly! Big thanks to PhilC for debugging!
For your real data issue, replace the %M%: with %M: in the format string:
## Reading in the file:
fpath <- "c:/r/data/real_data.txt"
x <- read.csv(fpath, skip = 1, header = FALSE, sep = "", stringsAsFactors = FALSE)
names(x) <- c("date","time","bscat","scat_coef","pressure_mbar","temp_K","CH1","CH2") ## This is data from a Radiance Research Integrating Nephelometer Model M903 for anyone who is interested!
## issue was the stray % after %M - fixed
x$datetime1 <- as.POSIXct(paste(x$date, x$time), format = "%Y-%m-%d %H:%M:%S", tz = "UTC")
## Here too - fixed
x$datetime2 <- strptime(paste(x$date, x$time), format = "%Y-%m-%d %H:%M:%S", tz = "UTC")
head(x)
There was a format string error causing the NAs; try this:
## This is no longer producing NAs:
data$datetime2 <- strptime(paste(data$date, data$time), format = "%Y-%m-%d %H:%M:%S",tz="UTC")
class(data$datetime2)
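On the question of how these functions differ: strptime() parses text into a POSIXlt object, as.POSIXct() produces the compact POSIXct class, and strftime() goes the other way and returns plain character. A short sketch with one timestamp from the example data (the variable names are only illustrative):

```r
stamp <- "2017-04-14 14:24:24"

# Text -> date-time: strptime() returns POSIXlt
parsed <- strptime(stamp, format = "%Y-%m-%d %H:%M:%S", tz = "UTC")
class(parsed)

# POSIXlt -> POSIXct, the class usually stored in data frames
as_ct <- as.POSIXct(parsed)
class(as_ct)

# Date-time -> text: strftime() returns character, not a date-time
text <- strftime(as_ct, format = "%d/%m/%Y %H:%M", tz = "UTC")
class(text)

# The malformed format from the question ("%M%:"), reported there to give NA
bad <- strptime(stamp, format = "%Y-%m-%d %H:%M%:%S", tz = "UTC")
```

This is why strftime() appeared to "work" in the question: it always returns a string, so its result is not a date-time object.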
The format "%Y-%m-%d %H:%M:%OS" uses %OS instead of %S, so fractional seconds are read as well. To display the fractional seconds to a specific number of decimals, set the digits.secs option, e.g.:
options(digits.secs=6) # show fractional seconds to up to 6 decimal places
data$datetime1 <- lubridate::parse_date_time(paste(data$date, data$time), "%Y-%m-%d %H:%M:%OS")
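The effect of %OS can also be checked in base R. The sketch below parses one timestamp from the example data twice, once with %S and once with %OS, and compares the stored values; the variable names are illustrative.

```r
stamp <- "2017-04-14 14:24:24.992000"

# %S reads whole seconds only; the fractional part is ignored
whole <- as.POSIXct(stamp, format = "%Y-%m-%d %H:%M:%S", tz = "UTC")

# %OS also reads the fractional part
frac <- as.POSIXct(stamp, format = "%Y-%m-%d %H:%M:%OS", tz = "UTC")

# Difference in seconds between the two parsed values
as.numeric(frac) - as.numeric(whole)

# Show three decimals for this one call without changing global options
format(frac, "%Y-%m-%d %H:%M:%OS3")
```

Printed fractional seconds are truncated rather than rounded, so the last displayed digit can differ slightly from the input text.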

R POSIXlt timestamp conversion do not know how to convert 'df1$timestamp' to class “POSIXlt”

Hello all, I am facing an issue while converting a timestamp to POSIXlt. Later I need to extract the year, month, day of month, hour, minute and second from this timestamp.
2015-12-01 00:04:39 is my timestamp
and here is my try
getwd()
rm(list=ls())
library(ggplot2)
library(plyr)
library(reshape)
library(scales)
library(gridExtra)
library(SparkR)
Sys.setenv(SPARK_HOME="/usr/local/spark")
.libPaths(c(file.path(Sys.getenv("SPARK_HOME"), "R", "lib"), .libPaths()))
sc <- sparkR.init(master="local","RwordCount")
args <- commandArgs(trailing = TRUE)
sqlContext <- sparkRSQL.init(sc)
df1 <- read.df(sqlContext, "hdfs://master:9000/test.csv", header='true', source = "com.databricks.spark.csv", inferSchema='true', stringsAsFactors = F)
if("timestamp" %in% colnames(df1)){
df1$pTime <- as.POSIXlt(df1$timestamp, format= "%Y-%m-%d %H:%M:%S")
}else {
df1$pTime <- as.POSIXlt(df1$Timestamp, format= "%Y-%m-%d %H:%M:%S")
}
but I am getting the error: do not know how to convert 'df1$timestamp' to class “POSIXlt”
Later I need to find the year, month, day of month, hour, minute and second, for which I have this snippet:
df1$Year <- df1$pTime$year-100 #Year
df1$Month <- df1$pTime$mon+1 #Month 1-12
df1$Day <- df1$pTime$mday #day of month
df1$Hour <- df1$pTime$hour #0-23: hours
df1$Min <- df1$pTime$min
df1$Sec <- df1$pTime$sec
df1$WeekOfYear <- strftime(df1$pTime, format="%W")
and I am executing the above script using the following command:
bin/spark-submit --packages com.databricks:spark-csv_2.11:1.3.0 /home/script/analysis.R
**Error in as.POSIXlt.default(df1$timestamp, format = "%Y-%m-%d %H:%M:%S") :
do not know how to convert 'df1$timestamp' to class “POSIXlt”
Calls: as.POSIXlt -> as.POSIXlt.default
Execution halted**
How can I get rid of the error? Any help will be appreciated.
Thanks
You can convert your timestamp by using as.POSIXct
x <- as.POSIXct("2015-12-01 00:04:39")
and then using lubridate package, you can extract all the information
library(lubridate)
year(x)
#[1] 2015
month(x)
#[1] 12
day(x)
#[1] 1
hour(x)
#[1] 0
minute(x)
#[1] 4
second(x)
#[1] 39
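The same components are also available in base R from the fields of a POSIXlt object, which is what the snippet in the question was reaching for. This sketch works on an ordinary character string, not on a SparkR column; note that year counts from 1900 and mon from 0.

```r
x <- as.POSIXlt("2015-12-01 00:04:39", format = "%Y-%m-%d %H:%M:%S", tz = "UTC")

parts <- c(
  year  = x$year + 1900,  # stored as years since 1900
  month = x$mon + 1,      # stored as 0-11
  day   = x$mday,
  hour  = x$hour,
  min   = x$min,
  sec   = x$sec
)
parts
```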
You can extract parts of datetime values by
x <- Sys.time()
format(x, format="%Y")
for example. See
?strptime
for all options.
I cannot reproduce the first part of your question. What is the error message you get?
First, you can index your data.frame without using the if/else branch:
df1[colnames(df1) %in% "timestamp"]
To convert your entire column of values in the format 2015-12-01 00:04:39:
as.POSIXlt(strptime(as.character(df1[colnames(df1) %in% "timestamp"]),
format = "%Y-%m-%d %H:%M:%S"),
format = "%Y-%m-%d %H:%M:%S")
I had a 'trans_dtime' column of type string in my DataFrame. I converted the 'trans_dtime' column to timestamp type using SparkR:
printSchema(df)
root
|-- col1: string (nullable = true)
|-- trans_dtime: string (nullable = true)
df$trans_dtime <- from_utc_timestamp(date_format(df$trans_dtime, "yyyy-MM-dd HH:mm:ss"), "GMT")
printSchema(df)
root
|-- col1: string (nullable = true)
|-- trans_dtime: timestamp (nullable = true)
Hope it will help you. :)
