Pentaho Formula - formula

I am new to Pentaho, so please be gentle.
I am, perhaps naively, wanting to use a Formula to convert a six-character string in the form YYYYMM to the date representing the final day of that month.
I imagine doing this step by step using successive lines of the Formula: checking that the string is of the correct length and, if so:
extracting the year and converting it to integer (with error checking)
extracting the month and converting it to integer (also with error checking)
converting ([year], [month], 1) to a date (the first of the month)
adding a month
subtracting a day
Some of those steps may be combined but, overall, it relies on a succession of steps to achieve a final result.
Formula does not seem to recognise the values achieved along the way though, at least not by enclosing them in square brackets as you do with fields from previous objects in the mapping.
I suppose I could have a series of Formula objects one after the other in the mapping but that seems untidy and inefficient. If a single Formula object cannot have a series of values defined on successive lines, what is the point of even having lines? How do I use a value I have defined on a previous line?

The formula step isn’t the best way to achieve that. The resulting formula will be hard to read and quite cumbersome.
It’s better (and faster) to use a calculator step. A javascript step can also be used, and it will be easier to read, but slower (though that probably won't be a major issue).
So, one way forward is to implement this on a calculator step:
Create a copy of your string field as a Date
Create 2 constant fields: 1 and -1
Add 1 month to the date field
Subtract 1 day to the result
Create a copy of the result as a string.
See screenshot:

Related

How to do addition operations intuitively with datetime in AutoHotKey?

I find that there are two ways to calculate the date in AutoHotKey:
Use EnvAdd, which is synonymous with var += value
Convert the date to the YYYYMMDDHH24MISS format, and calculate it as if it's a regular number, then convert back to date format
It seems that using EnvAdd is better, because it has a parameter to determine the time unit. (Using the second method may lead to unaccepted value, such as days 62 or month 20.) But since EnvAdd only changes the current value of the input variable, not assign the result to another variable, so if I want to have keep the original one, I have to do this:
a:=b
a+=10
This is counter-intuitive, because the original value is stored in a new variable, while it's more natural to expect the original value is stored in the old variable.
Is there a way to keep it more natural to read?
Do you mean something like this?
a:=b+10
I was confused. I should have used:
b := a
b += 10, months

How to compare 2 dates to obtain whether they are before, same as, or ahead of each other based on quarter?

I have 2 fields of datetimestamp type. I need to compare them based on the quarters those dates occur in and determine whether one occurs in a past, same as, or future quarter.
2 fields: pay.check_dt and pay.done_in_dt. I want to know if pay.check_dt occurs in a prior, same as, or future quarter in comparison to pay.done_in_date
I originally thought to use a case statement converting them using to_Char(fieldname, 'Q-YYYY'), but then I can't to the mathematical comparison because they are then character strings.
Thanks for the help!
Craig
Use the TRUNC(date) function: documentation
I have no DB available now, but something like:
TRUNC(pay.check_dt, 'Q') < TRUNC(pay.done_in_dt, 'Q')

Can I make a time series work with date objects rather than integers?

I have time series data that I'm trying to analyse in R. It was provided as a CSV from excel, which I subsequently read as a data.frame all. Let's say it has two columns: all$date and all$people, representing the count of people on a particular date. The frequency is hence daily.
Being from Excel, the dates are integers representing the number of days since 1900-01-01.
I could read the data as people = ts(all$people, start=c(all$date[1], 1), frequency=365); but that gives a silly start value of almost 40000 because the data starts in 2006. The start parameter doesn't take a date object, according to ?ts, so I can't just use as.Date():
ts - ...
start: the time of the first observation. Either a single number
or a vector of two integers, which specify a natural time unit and
a (1-based) number of samples into the time unit. See the examples
for the use of the second form.
I could of course set start=1, but it's a bit painful to figure out what season we're in when the plot tells me interesting things are happening around day 2100. (To be clear, setting frequency=365 does tell me what year we're in, but isn't useful more precise dates). Is there a useful way of expressing the date in ts in a human-readable form so that I don't have to keep calling as.Date() to understand when the interesting features are happening?

Index xts using string and return only observations at that exact time

I have an xts time series in R and am using the very handy function to subset the time series based on a string, for example
time_series["17/06/2006 12:00:00"]
This will return the nearest observation to that date/time - which is very handy in many situations. However, in this particular situation I only want to return the elements of the time series which are at that exact time. Is there a way to do this in xts using a nice date/time string like this?
In a more general case (I don't have this problem immediately now, but suspect I may run into it soon) - is it possible to extract the closest observation within a certain period of time? For example, the closest observation to the given date/time, assuming it is within 10 minutes of the given date/time - otherwise just discard that observation.
I suspect this more general case may require me writing a function to do this - which I am happy to do - I just wanted to check whether the more specific case (or the general case) was already catered for in xts.
AFAIK, the only way to do this is to use a subset that begins at the time you're interested in, then get the first observation of that.
e.g.
first(time_series["2006-06-17 12:00:00/2006-06-17 12:01"])
or, more generally, to get the 12:00 price every day, you can subset down to 1 minute of each day, then split by days and extract the first observation of each.
do.call(rbind, lapply(split(time_series["T12:00:00/T12:01"],'days'), first))
Here's a thread where Jeff (the xts author) contemplates adding the functionality you want
http://r.789695.n4.nabble.com/Find-first-trade-of-day-in-xts-object-td3598441.html#a3599887

Specific date format conversion problems in R

Basically I want to know why as.Date(200322,format="%Y%W") gives me NA. While we are at it, I would appreciate any advice on a data structure for repeated cross-section (aka pseudo-panel) in R.
I did get aggregate() to (sort of) work, but it is not flexible enough - it misses data on columns when I omit the missed values, for example.
Specifically, I have a survey that is repeated weekly for a couple of years with a bunch of similar questions answers to which I would like to combine, average, condition and plot in both dimensions. Getting the date conversion right should presumably help me towards my goal with zoo package or something similar.
Any input is appreciated.
Update: thanks for string suggestion, but as you can see in your own example, %W part doesn't work - it only identifies the year while setting the current day while I need to set a specific week (and leave the day blank).
Use a string as first argument in as.Date() and select a specific weekday (format %w, value 0-6). There are seven possible dates in each week, therefore strptime needs more information to select a unique date. Otherwise the current day and month are returned.
> as.Date(paste("200947", "0", sep="-"), format="%Y%W-%w")
[1] "2009-11-22"

Resources