In Julia I need to convert numbers to DateTime in the same manner as Microsoft Excel.
In Excel, today's date of 23-Sep-2019 is represented by 43731 and 6pm this afternoon by 43731.75. I can ignore the fact that Excel incorrectly assumes that 1900 is a leap year since all my data is safely beyond that point. Millisecond accuracy is sufficient.
The code below seems to work, but is there a better way?
function exceldatetodate(exceldate::Integer)
    Dates.Date(1899, 12, 30) + Dates.Day(exceldate)
end

function exceldatetodate(exceldate::Real)
    t, d = modf(exceldate)
    Dates.DateTime(1899, 12, 30) + Dates.Day(d) + Dates.Millisecond(floor(t * 86400000))
end
julia> exceldatetodate(43731)
2019-09-23
julia> exceldatetodate(43731.75)
2019-09-23T18:00:00
You can overload the convert methods and create a custom type that holds the value.
using Dates
struct ExcelDate{T<:Real}
    val::T
end
function exceldatetodate(exceldate::Integer)
    Dates.DateTime(1899, 12, 30) + Dates.Day(exceldate)
end

function exceldatetodate(exceldate::Real)
    t, d = modf(exceldate)
    return Dates.DateTime(1899, 12, 30) + Dates.Day(d) + Dates.Millisecond(floor(t * 86400000))
end
function exceldatetodate(exceldate::ExcelDate)
    exceldatetodate(exceldate.val)
end
function toexceldate(date::Date)
    datetime = Dates.value(DateTime(date) - Dates.DateTime(1899, 12, 30))
    datetime = round(datetime / 86400000, digits = 3)
    return ExcelDate(datetime)
end

function toexceldate(date::DateTime)
    datetime = Dates.value(date - Dates.DateTime(1899, 12, 30))
    datetime = round(datetime / 86400000, digits = 3)
    return ExcelDate(datetime)
end
Base.convert(d::Type{Dates.DateTime}, n::ExcelDate) = exceldatetodate(n)
Base.convert(d::Type{Dates.Date}, n::ExcelDate) = convert(Date, exceldatetodate(n))
Base.convert(d::Type{T}, n::ExcelDate) where T<:Real = convert(d, n.val)
Base.convert(d::Type{ExcelDate}, n::Dates.DateTime) = toexceldate(n)
Base.convert(d::Type{ExcelDate}, n::Dates.Date) = toexceldate(n)
Then you can play with the values:
original_numbers = 40000.01:41000.01 #test numbers
excel_dates = convert.(ExcelDate,original_numbers)
dates = convert.(Date,excel_dates) #just days
datetimes = convert.(DateTime,excel_dates) #days and milliseconds
orig2 = convert.(ExcelDate,datetimes) #this preserves the original number
orig3 = convert.(ExcelDate,dates) #this does not preserve the original number
It is very important to mention that Excel treats all numbers as Float64, whereas in Julia dates are a completely different type. In my opinion, if you want a certain range of numbers to behave like a date, it's better to construct a type that reflects that behavior.
One important characteristic of an Excel date is that you can operate on dates like numbers, but the result of that operation isn't formatted as a date. That is a consequence of Excel's decision to use a Float64 to represent dates.
The defined type has more restrictions than a number; if you want to work with the dates as numbers, you can convert the ExcelDates to numbers first, but it makes more sense to just use the Julia Date type, which has better and more methods for working with dates.
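For instance, a minimal sketch (dates made up) of the contrast: in Julia, subtracting dates yields a period type rather than a bare number, unlike Excel's Float64 arithmetic:

using Dates

d1 = Date(2019, 9, 23)
d2 = Date(2019, 9, 30)
d2 - d1               # 7 days -- a Dates.Day period, not a plain number
Dates.value(d2 - d1)  # 7 -- extract the underlying integer when numeric math is needed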
Off-topic, but dates are an unsolved problem in programming, with different standards across programming languages.
Here is my code right now:
f = function(Symbol, start, end, interval){
  getSymbols(Symbols = Symbol, from = start, to = end)
  Symbol = data.frame(Symbol)
  a = length(Symbol$Symbol.Adjusted)
  b = a / interval
  c = ceiling(b)
  origData = as.data.frame(matrix(`length<-`(Symbol$Symbol.Adjusted, c * interval),
                                  ncol = interval, byrow = TRUE))
  return(origData)
}
f("SPY", "2012-01-01", "2013-12-31", 10)
Next I need to get the adjusted close price and use only this price data for the following tasks: split the daily adjusted close prices into N blocks (rows in a data frame), so that each block contains M days (columns), where M equals the time interval value. This is referred to as origData in my code.
The function is supposed to return the data frame origData, but whenever I try running this it tells me that the Symbol data frame is empty. How do I need to change my function to get the data frame output?
@IRTFM's observations are correct. Incorporating those changes, you can change your function to:
library(quantmod)
f = function(Symbol, start, end, interval){
  getSymbols(Symbols = Symbol, from = start, to = end)
  data = get(Symbol)
  col = data[, paste0(Symbol, '.Adjusted')]
  a = length(col)
  b = a / interval
  c = ceiling(b)
  origData = as.data.frame(matrix(`length<-`(col, c * interval),
                                  ncol = interval, byrow = TRUE))
  return(origData)
}
f("SPY", "2012-01-01", "2013-12-31", 10)
I haven't figured out what the set of expressions inside the data.matrix call is supposed to do, and you made no effort to explain your intent. However, your error occurs farther up the line. If you put in a debugging call to str(Symbol) you will see that Symbol will evaluate to "SPY", but that is just a character value and not an R object name. The object you want is named SPY, and the way to retrieve an object's value when you only have access to a character value is to use the R function get. So try adding this after the getSymbols call inside the function:
library(quantmod) # I'm assuming this was the package in use
...
Symbol=data.frame( get(Symbol) )
str(Symbol) # will print the result at your console
....
# then perhaps you can work on what you were trying inside the data.matrix call
You will also find that the name Symbol.Adjusted will not work (since R is not a macro language). You will need to do something like:
a=length( Symbol[[ paste0(Symbol, ".Adjusted")]] )
Oh wait. You overwrote the value for Symbol. That won't work. You need to use a different name for your data frame. So why don't you edit your question to fix the errors I've identified so far and also describe what you are trying to do with the as.data.frame call.
I have a dataset in Power BI with many columns, which contain information on incident tickets (e.g. how long it took to solve the issue, etc.).
Unfortunately the data I'm getting is not in the correct time format. I wrote a simple R function which recalculates the time and returns the correct value:
calculateHours <- function(hours) {
  x <- trunc(hours / 24)
  rest <- hours %% 24   # modulo: base R's %% (the original mod() needs an extra package such as pracma)
  y <- trunc(rest / 10)
  z <- rest %% 10
  result <- ((x + y) * 10) + z
  return(result)
}
Example: 204 hours turns into 92 hours when run through the function: x = trunc(204/24) = 8, rest = 204 %% 24 = 12, y = trunc(12/10) = 1, z = 12 %% 10 = 2, so result = (8+1)*10+2 = 92.
Now I need to have a new column with the calculated values in it.
E.g. 'Business Elapsed Time = 204' -> 'Business Elapsed Time calculated (new Column) = 92'
How can I use this function in Power BI to add a new column which uses the values from another column of this table and then calculates the correct time values?
I'm still new to Power BI and R, so any help would be appreciated! Thanks in advance!
In Power BI Query Editor you can add an R Script (Transform -> Run R script) to your query. Here's a simple example that assumes you have a column Number:
# 'dataset' holds the input data for this script
myfunction <- function(x)
{
return (x + 1)
}
dataset$NewNumber <- myfunction(dataset$Number) ## apply function and add result as new column
output <- dataset ## PowerBI uses "output" as result from this query step
Here's a more detailed intro: https://www.red-gate.com/simple-talk/sql/bi/power-bi-introduction-working-with-r-scripts-in-power-bi-desktop-part-3/
Power Query can handle most of this calculation itself using an M formula. It is much simpler than invoking an R script, better integrated, and probably faster.
In Power Query Editor, navigate to Add Column > Custom Column, then input an M formula like the one below.
let
    x = Number.IntegerDivide([Hours], 24),
    rest = Number.Mod([Hours], 24),
    y = Number.IntegerDivide(rest, 10),
    z = Number.Mod(rest, 10),
    result = (x + y) * 10 + z
in
    result
I have two arrays in Julia, X = Array{Float64,2} and Y = Array{Float64,2}. I'd like to perform a vlookup as per Excel functionality. I can't seem to find something like this.
The following code returns the first match from a detail matrix, using the related record from a master matrix.
function vlook(master, detail, val)
    val = master[findfirst(x->x==val, master[:,2]), 1]
    return detail[findfirst(x->x==val, detail[:,1]), 2]
end
julia> vlook(a,b,103)
1005
A more general approach is to use DataFrames.jl for working with tabular data.
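For instance, a minimal sketch with made-up tables (the column names and values are illustrative, not from the question), using DataFrames' innerjoin:

using DataFrames

master = DataFrame(id = [1001, 1002, 1005], code = [101, 102, 103])
detail = DataFrame(id = [1001, 1002, 1005], value = [10.5, 42.0, 99.9])

joined = innerjoin(master, detail, on = :id)  # match rows on the shared key column
joined[joined.code .== 103, :value]           # VLOOKUP-style: the value(s) where code == 103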
VLOOKUP is a popular function amongst Excel users, and has signature:
VLOOKUP(lookup_value,table_array,col_index_num,range_lookup)
I've never much liked that last argument range_lookup. First, it's not clear to me what "range_lookup" is intended to mean, and second, it's an optional argument defaulting to the much-less-likely-to-be-what-you-want value of TRUE for approximate matching, rather than FALSE for exact matching.
So in my attempt to write VLOOKUP equivalents in Julia I've dropped the range_lookup argument and added another argument keycol_index_num to allow for searching of other than the first column of table_array.
WARNING
I'm very new to Julia, so there may be some howlers in the code below. But it seems to work for me (Julia 0.6.4). Also, as already commented, using DataFrames might be a better solution for looking up values in an array-like structure.
#=
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Procedures: vlookup and vlookup_withfailsafe
Purpose : Inspired by Excel VLOOKUP. Searches a column of table_array for
lookup_values and returns the corresponding elements from another column of
table_array.
Arguments:
lookup_values: a value or array of values to be searched for inside
column keycol_index_num of table_array.
table_array: An array with two dimensions.
failsafe: a single value. The return contains this value whenever an element
of lookup_values is not found.
col_index_num: the number of the column of table_array from which values
are returned.
keycol_index_num: the number of the column of table_array to be searched.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
=#
vlookup = function(lookup_values, table_array::AbstractArray, col_index_num::Int = 2, keycol_index_num::Int = 1)
    if ndims(table_array) != 2
        error("table_array must have 2 dimensions")
    end
    if isa(lookup_values, AbstractArray)
        indexes = indexin(lookup_values, table_array[:, keycol_index_num])
        if any(indexes .== 0)
            error("at least one element of lookup_values not found in column $keycol_index_num of table_array")
        end
        return table_array[indexes, col_index_num]
    else
        index = indexin([lookup_values], table_array[:, keycol_index_num])[1]
        if index == 0
            error("lookup_values not found in column $keycol_index_num of table_array")
        end
        return table_array[index, col_index_num]
    end
end
vlookup_withfailsafe = function(lookup_values, table_array::AbstractArray, failsafe, col_index_num::Int = 2, keycol_index_num::Int = 1)
    if ndims(table_array) != 2
        error("table_array must have 2 dimensions")
    end
    if !isa(failsafe, eltype(table_array))
        error("failsafe must be of the same type as the elements of table_array")
    end
    if isa(lookup_values, AbstractArray)
        indexes = indexin(lookup_values, table_array[:, keycol_index_num])
        Result = Array{eltype(table_array)}(size(lookup_values))
        for i in 1:length(lookup_values)
            if indexes[i] == 0
                Result[i] = failsafe
            else
                Result[i] = table_array[indexes[i], col_index_num]
            end
        end
        return Result
    else
        index = indexin([lookup_values], table_array[:, keycol_index_num])[1]
        if index == 0
            return failsafe
        else
            return table_array[index, col_index_num]
        end
    end
end
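For illustration, a quick run against a small made-up table (the values are assumptions, and this relies on Julia 0.6-era indexin returning 0 for unmatched values):

table = [101 "alpha"; 102 "beta"; 103 "gamma"]

vlookup(102, table)                          # "beta"
vlookup([101, 103], table)                   # ["alpha", "gamma"]
vlookup_withfailsafe(999, table, "missing")  # returns "missing" instead of raising an error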
I have some data that is badly formatted. Specifically I have numeric columns that have some elements with spurious text in them (e.g. "8 meters" instead of "8"). I want to use readtable to read in the data, make the necessary fixes to the data and then convert the column to a Float64 so that it behaves correctly (comparison, etc).
There seems to have been a macro called @transform that would do the conversion, but it has been deleted. How do I do this now?
My best solution at the moment is to clean up the data, write it out as a csv and then re-read it using readtable and specify eltypes. But that is horrible.
What else can I do?
There is no need to run things via a csv file. You can change or update the DataFrame directly.
using DataFrames

# Let's make up some data
df = DataFrame(A = rand(5), B = ["8", "9 meters", "4.5", "3m", "12.0"])

# And then make a function to clean the data
function fixdata(arr)
    result = DataArray(Float64, length(arr))
    reg = r"[0-9]+\.*[0-9]*"
    for i = 1:length(arr)
        m = match(reg, arr[i])
        if m == nothing
            result[i] = NA
        else
            result[i] = float64(m.match)
        end
    end
    result
end

# Then just apply the function to the column to clean the data
# and then replace the column with the cleaned data.
df[:B] = fixdata(df[:B])
Let's say you have a DataFrame df and a column B that has strings to convert.
First, this converts a string to a float, returning NA on failure:
string_to_float(str) = try parse(Float64, str) catch; NA end  # parse, since strings cannot be convert()ed to Float64
Then transform that column:
df[:B] = map(s -> string_to_float(s), df[:B])
An alternative, shorter version:
df[:B] = map(string_to_float, df[:B])
I have a date in the format dd-mm-yyyy HH:mm:ss
What is the best and easiest way to validate this date?
I tried
d <- format.Date(date, format="%d-%m-%Y %H:%M:%S")
But how can I catch the error when an illegal date is passed?
Simple way:
d <- try(as.Date(date, format = "%d-%m-%Y %H:%M:%S"))
if ("try-error" %in% class(d) || is.na(d)) {
  print("That wasn't correct!")
}
Explanation: format.Date uses as.Date internally to convert date into an object of the Date class. However, it does not use a format option, so as.Date uses the default format, which is %Y-%m-%d and then %Y/%m/%d.
The format option from format.Date is used only for the output, not for the parsing. Quoting from the as.Date man page:
The ‘as.Date’ methods accept character strings, factors, logical
‘NA’ and objects of classes ‘"POSIXlt"’ and ‘"POSIXct"’. (The
last is converted to days by ignoring the time after midnight in
the representation of the time in specified timezone, default
UTC.) Also objects of class ‘"date"’ (from package ‘date’) and
‘"dates"’ (from package ‘chron’). Character strings are processed
as far as necessary for the format specified: any trailing
characters are ignored.
However, when you directly call as.Date with a format specification, nothing else will be allowed than what fits your format.
See also: ?as.Date
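A quick illustration (dates made up; the first call shows the trailing-characters rule from the quote above):

as.Date("23-09-2019")                       # default %Y-%m-%d: "23" is read as the year
                                            # and the trailing "19" is silently dropped
as.Date("23-09-2019", format = "%d-%m-%Y")  # [1] "2019-09-23"
as.Date("31-13-2019", format = "%d-%m-%Y")  # [1] NA -- month 13 cannot match the format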
You may want to look at the gsubfn package. This has functions (gsubfn specifically) that work like other regular expression functions to match pieces of a string, but then it calls a user-supplied function and passes the matching pieces to this function. So you would write your own function that looks at the year, month, and day and makes sure that they are in the correct ranges (and the range for day can depend on the passed month and year).
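A rough sketch of that idea (the pattern and the range checks are illustrative assumptions, not a complete validator):

library(gsubfn)

# gsubfn passes each capture group (day, month, year) to the function;
# its return value replaces the matched text
check_date <- function(d, m, y) {
  d <- as.numeric(d); m <- as.numeric(m)
  if (m >= 1 && m <= 12 && d >= 1 && d <= 31) "valid" else "invalid"  # real code would vary day ranges by month/year
}
gsubfn("(\\d{2})-(\\d{2})-(\\d{4})", check_date, "23-09-2019")
# [1] "valid"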
This might be helpful if flexibility is desired in a date-time entry.
I have a function where I want to allow either a date-only entry or a date-time entry, then set a flag - for use inside the function only. I'm calling this flag date_type. The flag will be used later in the larger function to select units for getting a difference between two dates with difftime. (In most cases, the function will be perfectly fine with date only, but in some cases a user might need a shorter time frame. I don't want to inconvenience users with the shorter time frame if they don't need it.)
I am posting this for two reasons: 1) to help anyone trying to allow flexibility in date arguments and 2) to welcome sanity checks in case there's a problem with the method, since this is going into a function in an R package.
dat_time_check_fn <- function(dat_time) {
  if (!anyNA(as.Date(dat_time, format = "%Y-%m-%d %H:%M:%S"))) date_type <- 1
  else if (!anyNA(as.Date(dat_time, format = "%Y-%m-%d"))) date_type <- 2
  else stop("Error: dates must either be in format '1999-12-31' or '1999-12-31 23:59:59' ")
  date_type
}
Date-time case
date5 <- "1999-12-31 23:59:59"
date_type <- dat_time_check_fn(date5)
date_type
[1] 1
Date only case:
date6 <- "1999-12-31"
date_type <- dat_time_check_fn(date6)
date_type
[1] 2
Note that if the order above in the function is reversed, the longer date-time can be inadvertently coerced to the shorter version and both types result in date_type = 1.
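To see why, note that the date-only format happily parses a date-time string, because trailing characters are ignored:

as.Date("1999-12-31 23:59:59", format = "%Y-%m-%d")
# [1] "1999-12-31" -- no NA, so a reversed check would flag both inputs as the first type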
My larger function has more than one date, but I need them to be compatible. Below, I'm checking the two dates checked above, where one was type 1 and one was type 2. Combining types gives the result with date only (type 2):
date_type <- dat_time_check_fn(c(date5, date6))
date_type
[1] 2
Here's a non-compliant version:
date7 <- "1/31/2011"
date_type <- dat_time_check_fn(date7)
Error in dat_time_check_fn(date7) :
Error: dates must either be in format '1999-12-31' or '1999-12-31 23:59:59'
Many solutions here are prone to SQL injection. They return TRUE for date = "2020-08-11; DROP * FROM my_table". Here is a vectorized base R function that works with NA:
is_date = function(x, format = NULL) {
  formatted = try(as.Date(x, format), silent = TRUE)
  is_date = as.character(formatted) == x & !is.na(formatted)  # valid and identical to input
  is_date[is.na(x)] = NA  # insert NA for NA in x
  return(is_date)
}
Let's try:
> is_date(c("2020-08-11", "2020-13-32", "2020-08-11; DROP * FROM table", NA), format = "%Y-%m-%d")
## TRUE FALSE FALSE NA
I believe that what you are looking for is the tryCatch function.
The following is an excerpt from a script I wrote which accepts any .csv file with two series that share a common x axis. The first column in 'data' is the common x-axis variable, and columns 2 & 3 are the y-axis variables. I needed the tryCatch statement to make sure the script would create a plot regardless of whether the x-axis data is a time series or some other type of variable.
### READ DATA FROM A CSV FILE
data = read.csv("STLDvsNEM2.csv", header = TRUE)

# CONVERT FIRST COLUMN OF DATA (IN MY CASE, THE COLUMN INTENDED TO BE THE X AXIS)
# TO AN ACCEPTABLE DATE FORMAT.
# IF THE FIRST COLUMN IS NOT IN AN ACCEPTABLE DATE FORMAT,
# USE THE VALUE WITHOUT ANY TRANSFORMATION.
x <- tryCatch({
  as.Date(data[, 1])
}, warning = function(w) {
  data[, 1]  # fall back to the untransformed values
}, error = function(e) {
  data[, 1]
})

y1 <- data[, 2]
y2 <- data[, 3]