Please don't flame me before reading. Before asking, I spent three hours trying to figure this out and tried several approaches, none of which worked. If you are only here to tell me to google it, please don't. Thanks.
I had an .xls data file and tried to read it into R the way my lecturer told us to. He basically wants us to use an rJava-based library to read the .xls file with read_xlsx(). I tried, but it does not work; it cannot even read the file.
(As usual, I fell back on read_csv(): I converted the .xls to .csv and could finally read it in my R script.
And hurrah, it worked!)
Here is the tricky problem. I am explaining all the steps because I want you to feel my pain.
When I split the column into two and try to convert them, the function runs, but all the data under the Date, Time, or Date...Timestamp column disappears; all I see is NA all the way down.
I would appreciate it if you can help. Thanks a lot.
These are my column names:
sapply(myDataFrame, class)
Date...Timestamp              DO1              DO2   Controlling.DO               pH
     "character"        "numeric"        "numeric"        "numeric"        "numeric"
         Biomass    Titre..mg.mL.      Base.Buffer      Media.Batch
       "numeric"        "numeric"        "integer"        "integer"
As you can see, I have a very well-thought-out column name (Date...Timestamp). Anyway, I need to convert it from character to Date format; every value in the column is formatted like "28/03/2020 17:28".
What did I do?
# Split the Date / Timestamp column by character position
myDataFrame <- myDataFrame %>%
  mutate(Date = str_sub(Date...Timestamp, 1,11)) %>%
  mutate(Time = str_sub(Date...Timestamp, 11))
# Data frame manipulation: drop the old Date...Timestamp column
myDataFrame <- myDataFrame %>%
  select(-Date...Timestamp)
That split the column into two as wanted, date and time. Cool! But when I try to change the column types, I get a different error with each function.
I tried as.Date(), as.date.numeric(), and as.POSIXct():
# converting to datetime object
myDataFrame[["Date"]] <- as.POSIXct(myDataFrame[["Date"]], format = "%D/%M/%Y")
myDataFrame[["Time"]] <- as.POSIXct(myDataFrame[["Time"]], format = "%H:%M")
myDataFrame$Date <- as.Date(myDataFrame$Date)
myDataFrame$Time <- as.Date(myDataFrame$Time)
# tried before splitting and splitting afterwards, still error.
myDataFrame$Date...Timestamp <- as.Date(myDataFrame$Date...Timestamp)
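For what it's worth, a likely culprit is the format string: in strptime-style formats %M means minutes and %D is shorthand for %m/%d/%y, so "%D/%M/%Y" cannot match "28/03/2020" and the result is NA. Here is a minimal sketch, assuming every value looks like "28/03/2020 17:28". The two-row data frame is a made-up stand-in for the real data, and the UTC time zone is an assumption.

```r
# Hypothetical stand-in for the asker's data; only the timestamp column matters.
myDataFrame <- data.frame(
  Date...Timestamp = c("28/03/2020 17:28", "29/03/2020 08:05"),
  stringsAsFactors = FALSE
)

# Parse the whole string in one go:
# %d = day, %m = month, %Y = 4-digit year, %H = hour, %M = minute.
stamp <- as.POSIXct(myDataFrame$Date...Timestamp,
                    format = "%d/%m/%Y %H:%M", tz = "UTC")

# Derive the two columns from the parsed value instead of slicing strings.
myDataFrame$Date <- as.Date(stamp, tz = "UTC")  # Date class
myDataFrame$Time <- format(stamp, "%H:%M")      # character; base R has no time-only class
```

Keeping the parsed stamp as a single POSIXct column is often enough on its own; the split is only needed if the date and the time really have to live in separate columns.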
I have a problem with the as.Date function.
I have a list of ordinary dates in Excel, but when I import it into R, the dates become numbers like 33584. I understand that these count days since a specific date. I want my dates in the form "dd-mm-yy".
The original data is:
[Image: how the "date" variable looks in R]
I've tried:
as.date <- function(x, origin = getOption(date.origin)){
  origin <- ifelse(is.null(origin), "1900-01-01", origin)
  as.Date(date, origin)
}
and also simply
as.Date(43324, origin = "1900-01-01")
but neither of them works. It shows the error: do not know how to convert '.' to class “Date”
Thank you guys!
The janitor package has a pair of functions designed to deal with reading Excel dates in R. See the following links for usage examples:
https://www.rdocumentation.org/packages/janitor/versions/2.0.1/topics/excel_numeric_to_date
https://www.rdocumentation.org/packages/janitor/versions/2.0.1/topics/convert_to_date
janitor::excel_numeric_to_date(43324)
[1] "2018-08-12"
I've come across excel sheets read in with readxl::read_xls() that read date columns in as strings like "43488" (especially when there is a cell somewhere else that has a non-date value). I use
xldate <- function(x) {
  xn <- as.numeric(x)
  # return the Date visibly instead of ending on an assignment
  as.Date(xn, origin = "1899-12-30")
}
d <- data.frame(date=c("43488"))
d$actual_date <- xldate(d$date)
print(d$actual_date)
# [1] "2019-01-23"
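The 1899-12-30 origin looks odd next to the "1900-01-01" tried in the question, so here is a small base R sketch of why it is used. It assumes the workbook uses Excel's 1900 date system, in which serial 1 is 1900-01-01 and a non-existent 1900-02-29 is also counted. The serial numbers are the ones that appear in this thread.

```r
# Excel's 1900 date system: serial 1 is 1900-01-01, and Excel also counts a
# 1900-02-29 that never existed. For dates after February 1900 those two
# offsets combine, so the effective origin for as.Date() is 1899-12-30.
serials <- c(43324, 43488, 33584)
dates <- as.Date(serials, origin = "1899-12-30")

# Dates print as yyyy-mm-dd by default; format() gives the dd-mm-yy strings.
formatted <- format(dates, "%d-%m-%y")
print(dates)
# [1] "2018-08-12" "2019-01-23" "1991-12-12"
print(formatted)
# [1] "12-08-18" "23-01-19" "12-12-91"
```

Note that format() returns character strings, so it is usually best to keep the Date column for calculations and only format it for display.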
Dates are notoriously annoying. I would highly recommend the lubridate package for dealing with them. https://lubridate.tidyverse.org/
Use as_date() from lubridate to read numeric dates if you need to.
You can use format() to put it in dd-mm-yy.
library(lubridate)
# Excel serial dates count from 1899-12-30, not from lubridate::origin (1970-01-01)
date_vector <- as_date(c(33584, 33585), origin = "1899-12-30")
formatted_date_vector <- format(date_vector, "%d-%m-%y")
I am very new to R. I watched a YouTube video on various time series analyses, but it downloaded its data from Yahoo, and my data is in Excel. I wanted to follow the same analysis with data from a .csv file exported from Excel. I spent two days finding out that the dates must be in US style. Now I am stuck again on a basic step: loading the data so it can be analysed, which seems to be the biggest hurdle with R. Can someone please explain why the command shown below does not compute returns for the complete set of columns? I tried the zoo format, but it didn't work; then I tried xts and it worked partially. I suspect the original import from Excel is the major problem. Can I get some guidance, please?
> AllPrices <- as.zoo(AllPrices)
> head(AllPrices)
Index1 Index2 Index3 Index4 Index5 Index6 Index7 Index8 Index9 Index10
> AllRets <- dailyReturn(AllPrices)
Error in NextMethod("[<-") : incorrect number of subscripts on matrix
> AllPrices<- as.xts(AllPrices)
> AllRets <- dailyReturn(AllPrices)
> head(AllRets)
daily.returns
2012-11-06 0.000000e+00
2012-11-07 -2.220249e-02
2012-11-08 1.379504e-05
2012-11-09 2.781961e-04
2012-11-12 -2.411128e-03
2012-11-13 7.932869e-03
Try to load your data using the readr package.
library(readr)
Then, look at the documentation by running ?read_csv in the console.
I recommend reading in your data this way. Specify the column types. For instance, if your first column is the date, read it in as a character "c" and if your other columns are numeric use "n".
library(xts) # provides xts()
data <- read_csv('YOUR_DATA.csv', col_types = "cnnnnn") # date in the left column, 5 numeric columns
data$Dates <- as.Date(data$Dates, format = "%Y-%m-%d") # make the dates column a Date; replace "Dates" with your own column name, and change the format if needed
data <- as.data.frame(data) # turn the result into a data frame
data <- xts(data[,-1], order.by = data[,1]) # then make an xts: the data is everything but the date column, order.by is the date column
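On the returns themselves: the output in the question has a single daily.returns column, which suggests dailyReturn() works on one series at a time. One way to get a return for every column is plain xts arithmetic. This is only a sketch under assumptions: the prices are made up, the column names are borrowed from the question, and it computes simple (not log) returns.

```r
library(xts)

# Hypothetical prices: three dates, two index columns.
dates  <- as.Date(c("2012-11-06", "2012-11-07", "2012-11-08"))
prices <- xts(cbind(Index1 = c(100, 110, 99), Index2 = c(50, 50, 55)),
              order.by = dates)

# Simple daily return per column: today's price over yesterday's, minus 1.
# For an xts object, lag(x, 1) moves each series down one row, so every row
# is divided by the previous day's value. The first row has no previous day
# and comes out as NA.
rets <- prices / lag(prices, 1) - 1
print(rets)
```

Calling dailyReturn() on each column and merging the results should give the same numbers apart from the first row; the arithmetic version just keeps everything in one step.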
I have dates in the following formats:
08MAR1978:00:00:00
10FEB1973:00:00:00
15AUG1982:00:00:00
I would like to convert them to:
1978-03-08
1973-02-10
1982-08-15
I have tried the following in SparkR:
period_uts <- unix_timestamp(all.new$DATE_OF_BIRTH, '%d%b%Y:%H:%M:%S')
period_ts <- cast(period_uts, 'timestamp')
period_dt <- cast(period_ts, 'date')
df <- withColumn(all.new, 'p_dt', period_dt)
But when I do this, all the dates get changed into "NA".
Can anyone please provide some insights on how I can convert dates in %d%B%Y:%H:%M:%S format to dates in SparkR?
Thanks!
I don't think you need SparkR to solve this question.
What you have:
DoB <- c("08MAR1978:00:00:00", "10FEB1973:00:00:00", "15AUG1982:00:00:00")
If you want to get 1978-03-08 etc. you could just use as.Date in combination with the date format you already found yourself:
as.Date(DoB, format="%d%B%Y:%H:%M:%S")
# [1] "1978-03-08" "1973-02-10" "1982-08-15"
as.Date will ensure that R knows how to interpret your string as a date.
Note, however, that in general the way dates are displayed to you (i.e. 1978-03-08) doesn't really matter. The reason is that 'under the hood', R now understands your value as a date, so all date-related operations will be performed appropriately.
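To make the 'under the hood' point concrete, here is a short base R sketch using the three values from the question. One assumption to flag: month abbreviations such as MAR are matched against the current LC_TIME locale, so this expects an English or C locale.

```r
DoB <- c("08MAR1978:00:00:00", "10FEB1973:00:00:00", "15AUG1982:00:00:00")

# %b matches the month name case-insensitively on input (locale-dependent).
dob <- as.Date(DoB, format = "%d%b%Y:%H:%M:%S")

# Because dob is a real Date vector, date operations just work:
oldest_first <- sort(dob)                  # chronological order
gap_days <- as.numeric(dob[1] - dob[2])    # days between the first two dates
shown <- format(dob, "%d/%m/%Y")           # display format is a separate concern
```

The stored value stays the same whichever string format() produces, which is why the printed form is mostly cosmetic.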
I figured out how to do it:
all.new = all.new %>% withColumn("Date_of_Birth_Fixed", to_date(.$DATE_OF_BIRTH, "ddMMMyyyy"))
This works in Spark 2.2.x
I have a dataset with date-times like this: "{datetime:2015-07-01 09:10:00". I want to remove the text and keep both the date and the time, since as.Date returns only the date. I wrote the code below, but the problem is that the second line, with strsplit, returns only the date-time of the first row and overwrites all the others. I would love to get all my date-times, not only the first. I thought about sapply, but I can't get it right and get many errors; or maybe a for loop? I am a novice in R, so I don't really know the best way to do this.
Could you help me, please? If you have another idea for handling the date and time format, or a simpler way to do it, that would be very kind of you too.
data$`Date Time`=as.character(data$`Date Time`)
data$`Date Time`=unlist(strsplit(data[,1], split='e:'))[2]
date=substr(data$`Date Time`,0,10)
date=as.Date(date)
time=substr(data$`Date Time`,12,19)
data$Date=date
data$Time=time
Thank you very much for your help!
You could use the format argument to avoid all the strsplit:
times <- as.POSIXct(data$`Date Time`, format='{datetime:%Y-%m-%d %H:%M:%S')
(The reason for the "{datetime:" in the format is because you mentioned this is the format of your strings).
This object has both date and time in it, and then you can just store it in the dataframe as a single column of type POSIXct rather than two columns of type string e.g.
data$datetime <- times
but if you do want to store the date as a Date and the time as a string (as in your example above):
data$Date <- as.Date(times)
data$Time <- strftime(times, format='%H:%M:%S')
See ?as.Date, ?as.POSIXct, ?strptime for more details on that format argument and various conversions between date and string.
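Here is a self-contained sketch of the same idea with two made-up rows. It also shows a vectorised way to strip the prefix, since unlist(strsplit(...))[2] picks out one element, which is then recycled down the whole column. The UTC time zone is an assumption; the point is to use the same zone for parsing and for as.Date(), because otherwise times near midnight can land on the neighbouring day.

```r
# Made-up sample rows in the format described in the question.
data <- data.frame(
  `Date Time` = c("{datetime:2015-07-01 09:10:00",
                  "{datetime:2015-07-02 23:45:30"),
  check.names = FALSE, stringsAsFactors = FALSE
)

# sub() is vectorised, so every row keeps its own value.
stripped <- sub("{datetime:", "", data$`Date Time`, fixed = TRUE)

# Parse once, then derive the date and the time from the parsed value.
times <- as.POSIXct(stripped, format = "%Y-%m-%d %H:%M:%S", tz = "UTC")
data$Date <- as.Date(times, tz = "UTC")   # Date class
data$Time <- format(times, "%H:%M:%S")    # character
```

Stripping the prefix first is optional, since the format string can absorb it as shown above; it mainly helps if the cleaned text is needed elsewhere.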
I'm getting an error using smartbind to append two datasets. First, I'm pretty sure the error I'm getting:
> Error in as.vector(x, mode) : invalid 'mode' argument
is coming from the date variable in both datasets. The date variable in its raw format is month/day/year. I transformed the variable after importing the data, using as.Date and format:
> rs.month$xdeeddt <- as.Date(rs.month$xdeeddt, "%m/%d/%Y")
> rs.month$deed.year <- as.numeric(format(rs.month$xdeeddt, format = "%Y"))
> rs.month$deed.day <- as.numeric(format(rs.month$xdeeddt, format = "%d"))
> rs.month$deed.month <- as.numeric(format(rs.month$xdeeddt, format = "%m"))
The resulting date variable is as such:
> [1] "2014-03-01" "2014-03-13" "2014-01-09" "2013-10-09"
The transformation for the date was applied to both datasets (the format of the raw data was identical for both datasets). When I try to use smartbind, from the gtools package, to append the two datasets it returns with the error above. I removed the date, month, day, and year variables from both datasets and was able to append the datasets successfully with smartbind.
Any suggestions on how I can append the datasets with the date variables included?
I came here after googling for the same error message during a smartbind of two data frames. The discussion above, while not so conclusive about a solution, definitely helped me move through this error.
Both my data frames contain POSIXct date objects. Those are just a numeric vector of UNIXy seconds-since-epoch, along with a couple of attributes that provide the structure needed to interpret the vector as a date object. The solution is simply to strip the attributes from that variable, perform the smartbind, and then restore the attributes:
these.atts <- attributes(df1$date)
attributes(df1$date) <- NULL
attributes(df2$date) <- NULL
df1 <- smartbind(df1,df2)
attributes(df1$date) <- these.atts
I hope this helps someone, sometime.
-Andy
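For anyone who wants to see what the strip-and-restore step does before trying it on real data, here is a base R sketch on a single made-up timestamp. It does not call smartbind; it only illustrates the attribute handling, and the UTC time zone is an assumption.

```r
# A POSIXct value is a number of seconds since 1970-01-01 plus attributes.
stamp <- as.POSIXct("2014-03-01 12:00:00", tz = "UTC")
saved <- attributes(stamp)      # the class and the tzone

bare <- stamp
attributes(bare) <- NULL        # now a plain numeric vector
is.numeric(bare)

restored <- bare
attributes(restored) <- saved   # a POSIXct again
identical(restored, stamp)

# A Date works the same way, but counts days and only carries a class.
day_count <- unclass(as.Date("2014-03-01"))
```

Since the Date columns in the question are also numbers with a class attribute, the same trick should apply to them, though that is untested here.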