IF with duration (hh:mm:ss) - datetime

I have a column of durations that I received in text format looking like 00:01:20s. I removed the s from the end and formatted it as duration in Google Sheets, so the text 00:01:20 became duration 00:01:20.
What I'd like to do is take each entry that is 0:00:00 and change it to 0:00:01. Here's my formula (where column B contains the duration):
=if(B2=00:00:00,00:00:01,B2)
When I run it, I get Formula parse error.
I double-checked to make sure that column B was usable in calculations by adding two cells, and the result was correct. The column in which I put the formula is also formatted as duration. I've been Googling, and somehow haven't found another case of this error.
What am I missing?

I avoid Duration format (at least for the time being!) but please try:
=if(VALUE(B2)=0,time(0,0,1),B2)
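For readers who want to see the logic outside the spreadsheet, here is a minimal Python sketch of what the formula above does. It assumes that the sheet stores a duration as a fraction of a day, so 0:00:00 is 0 and one second is 1/86400. The function names are hypothetical and only for illustration.

```python
from datetime import timedelta

SECONDS_PER_DAY = 24 * 60 * 60


def bump_zero_duration(value):
    """Mirror IF(VALUE(B2)=0, TIME(0,0,1), B2) on a day-fraction value."""
    one_second = 1 / SECONDS_PER_DAY  # what TIME(0,0,1) evaluates to
    return one_second if value == 0 else value


def sheet_value_to_timedelta(value):
    """Convert a day-fraction (assumed storage format) to a timedelta."""
    return timedelta(seconds=round(value * SECONDS_PER_DAY))


if __name__ == "__main__":
    # A zero duration becomes one second; anything else is unchanged.
    print(sheet_value_to_timedelta(bump_zero_duration(0)))
    print(sheet_value_to_timedelta(bump_zero_duration(80 / SECONDS_PER_DAY)))
```

The comparison is made against the number 0 rather than against a bare 00:00:00, which is not something a formula can parse as a literal.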

Related

Pentaho Formula

I am new to Pentaho, so please be gentle.
I am, perhaps naively, wanting to use a Formula to convert a six-character string in the form YYYYMM to the date representing the final day of that month.
I imagine doing this step by step using successive lines of the Formula: checking that the string is of the correct length and, if so:
extracting the year and converting it to integer (with error checking)
extracting the month and converting it to integer (also with error checking)
converting ([year], [month], 1) to a date (the first of the month)
adding a month
subtracting a day
Some of those steps may be combined but, overall, it relies on a succession of steps to achieve a final result.
Formula does not seem to recognise the values achieved along the way though, at least not by enclosing them in square brackets as you do with fields from previous objects in the mapping.
I suppose I could have a series of Formula objects one after the other in the mapping but that seems untidy and inefficient. If a single Formula object cannot have a series of values defined on successive lines, what is the point of even having lines? How do I use a value I have defined on a previous line?
The formula step isn’t the best way to achieve that. The resulting formula will be hard to read and quite cumbersome.
It’s better (and faster) to use a calculator step. A JavaScript step can also be used, and it will be easier to read, but slower (though that probably won't be a major issue).
So, one way forward is to implement this on a calculator step:
Create a copy of your string field as a Date
Create 2 constant fields: 1 and -1
Add 1 month to the date field
Subtract 1 day from the result
Create a copy of the result as a string.
See screenshot:
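Independently of Pentaho, the date arithmetic described in the question (first of the month, add a month, subtract a day) can be sketched in plain Python. This is only an illustration of the logic, not a Pentaho step, and the function name `last_day_of_month` is made up for this example.

```python
from datetime import date, timedelta


def last_day_of_month(yyyymm):
    """Return the last day of the month for a 'YYYYMM' string."""
    # Check the string is the right length and purely numeric.
    if len(yyyymm) != 6 or not yyyymm.isdigit():
        raise ValueError("expected six digits in the form YYYYMM")
    year, month = int(yyyymm[:4]), int(yyyymm[4:])
    if not 1 <= month <= 12:
        raise ValueError("month must be between 01 and 12")
    # First day of the following month (rolling over the year in December).
    if month == 12:
        next_first = date(year + 1, 1, 1)
    else:
        next_first = date(year, month + 1, 1)
    # One day earlier is the last day of the requested month.
    return next_first - timedelta(days=1)


if __name__ == "__main__":
    print(last_day_of_month("202002"))
    print(last_day_of_month("202112"))
```

Adding the month before subtracting the day is what makes leap years and 30/31-day months come out right without a lookup table.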

Reading dates as dates and not characters

The data I have been working with reads everything just fine, **except** for the date column. It always reads it as characters instead.
This would be fine except that, when I have lots of dates (like over 400 of them), then you can see something like this on a scatterplot:
Scatter Plot
In essence, I have two questions.
The first is, apart from using as.Date, which is fine when I need something temporary, how do I permanently make R read the date column as actual dates? That is, is there a way to make that date column read as dates when I am using read.csv or read.excel?
When graphing, like the graph I have included here, how can I only include some of the labels throughout so that it won't be so cramped up? I still want all the data, but really do not want all those labels.
I was hoping to add the data file, but I am unaware of how to add excel/csv files on this website and my data set is quite long (n = 491). I do have 9 columns, 1 of which is the date column. The others are numbers or actual letters (the latter of which is in fact a character). I can add maybe a few rows just to help out.
Some of the data set

Gather turns my dates into unrecognisable format

I am trying to gather a couple of columns of dates so that it's easier to offer them as choices in Shiny. However, when I gather the dates, a value such as 2020/12/14 turns into a number like 128284. I have tried as.Date and as.character, and I have tried lubridate, but nothing works. (I have been gathering in a separate script, outside Shiny.) Please see my gathering code below.
Here is my data
before gather
df<-df%>%gather(key="date.type", value="dates",
date.1, date.2, date.3, date.4)
This turns it to something like this;
after gather
This becomes a problem when I am trying to find the difference between two dates in Shiny (I have been using difftime).
The error I get in shiny is:
x character string is not in a standard unambiguous format
I am also thinking of not gathering at all, but allowing the user to choose the from date column and to date column in the UI, but I am not sure how to then find the difference in days between the from and to dates in the server.
mutate(theduration=difftime(input$to,input$from,units="days"))
This doesn't work.
OK, so I had this problem when using gather to make a dataset. When you get those 5 digit time character blocks, try this:
mutate(time=as.Date(as.numeric(time),origin="1899-12-30"))
Apparently, that 5 digit number is the number of days since the origin date. It's an MS Excel thing. Good luck!
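The origin-date conversion above is not specific to R. As a minimal sketch, assuming the numbers really are Excel-style day counts from 1899-12-30 (the origin used in the answer), the same arithmetic in Python looks like this. The helper name is hypothetical.

```python
from datetime import date, timedelta

# Origin taken from the answer above; assumed to match how the file was exported.
EXCEL_ORIGIN = date(1899, 12, 30)


def excel_serial_to_date(serial):
    """Convert a day-count serial (number or numeric string) to a date."""
    # int(float(...)) accepts "44179", "44179.0" and 44179 alike and
    # drops any fractional part, which would represent the time of day.
    return EXCEL_ORIGIN + timedelta(days=int(float(serial)))


if __name__ == "__main__":
    print(excel_serial_to_date("44179"))
```

If the converted dates look wrong by decades, the numbers are probably not day counts from this origin, and the assumption should be rechecked against the source file.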

Mixed Timed Data

I have a vector that contains time data, but there's a problem: some of the entries are listed as dates (e.g., 10/11/2017), while other entries are listed as dates with time (e.g., 12/15/2016 09:07:17). This is a problem for me, since as.Date() can't recognize the time portion and produces dates in an odd format (0012-01-20), while seemingly turning the date-with-time entries into NAs. Furthermore, as.POSIXct() doesn't work, since not all entries are a combination of date with time.
I suspect that, since these entries are entered in a consistent format, I could hypothetically use an if function to change the entries in the vector to a consistent format, such as using an if statement to remove time entirely, but I don't know enough about it to get it to work.
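Use lubridate:
library(lubridate)
Here x is the name of the data frame or table, and Date is the column that holds the dates.
The entries are in month/day/year order, some of them with a time, so ymd() or ydm() will not parse them. parse_date_time() accepts several candidate orders and tries each one:
x$newdate <- parse_date_time(x$Date, orders = c("mdy HMS", "mdy"))
If only the date is needed, wrap the result in as.Date().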
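For comparison, here is a minimal Python sketch of normalising the two formats described in the question (month/day/year, with or without a time). It tries the more specific format first and falls back to the date-only one. The format list and the function name are assumptions based on the two example values.

```python
from datetime import datetime

# Candidate formats, most specific first (assumed from the two examples).
FORMATS = ("%m/%d/%Y %H:%M:%S", "%m/%d/%Y")


def parse_mixed(text):
    """Parse a date or date-with-time string; return None if neither fits."""
    for fmt in FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt)
        except ValueError:
            continue  # try the next candidate format
    return None


if __name__ == "__main__":
    for raw in ["10/11/2017", "12/15/2016 09:07:17"]:
        parsed = parse_mixed(raw)
        # .date() drops the time so every entry ends up in one format.
        print(raw, "->", parsed.date())
```

Returning None for unparseable entries, rather than raising, makes it easy to count and inspect the rows that match neither format.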

timedeltas and datetimes subtraction and converting to duration in minutes

I am at a standstill with this problem. I outlined it in another question ( Creating data histograms/visualizations using ipython and filtering out some values ) which meandered a bit so I'd like to fix the question and give it more context since I am sure others must have a workaround for this or have the problem. I've also seen similar, not identical, questions asked and can't quite adapt any of the solutions thus far given.
I have columns in my data frame for Start Time and End Time and created a 'Duration' column for time lapsed. I'm using ipython.
The Start Time/End Time columns have fields that look like:
2014/03/30 15:45
A date and then a time in hh:mm
when I type:
pd.to_datetime(df['End Time']) and
pd.to_datetime(df['Start Time'])
I get fields resulting that look like:
2014-03-30 15:45:00
same date but with hyphens and same time but with :00 seconds appended
I then decided to create a new column for the difference between the End and Start times. The 'Duration' or time lapsed column was created by typing in one command:
df['Duration'] = pd.to_datetime(df['End Time'])-pd.to_datetime(df['Start Time'])
The format of the fields in the duration column is:
01:14:00
no date just a time lapsed in the format hh:mm:ss
to indicate time lapsed or 74 mins in the above example.
When I type:
df.Duration.dtype
dtype('m8[ns]') is returned, whereas, when I type
df.Duration.head(4)
0 00:14:00
1 00:16:00
2 00:03:00
3 00:09:00
Name: Duration, dtype: timedelta64[ns]
is returned which seems to indicate a different dtype for Duration.
How can I convert the values in the Duration column to a single integer number of minutes (time lapsed)? I see no methods that I can use. I'd write a function, but I wouldn't know how to treat the hh:mm:ss input. This must be a common requirement in data analysis. Should I be converting these dates and times differently if my end goal is a single integer indicating minutes lapsed? Should I just be using Excel? I have so far spent a day on this, and it should be a simple problem to solve.
**update:
THANK YOU!! (Jeff and Dataswede) I added a column with the command:
df['Durationendminusstart'] = pd.to_timedelta(df.Duration,unit='ns').astype('timedelta64[m]')
which seems to give me the Duration (minutes lapsed) as wanted so that huge part is solved!
What still is not clear is why there were two different dtypes for the same column depending how I asked, oh well right now it doesn't matter.**
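As a library-free illustration of the same calculation, the sketch below parses two timestamps in the question's format (2014/03/30 15:45), subtracts them, and converts the result to whole minutes. It uses only the standard library rather than pandas, and the function name is hypothetical.

```python
from datetime import datetime

# Format of the Start Time / End Time fields shown in the question.
FMT = "%Y/%m/%d %H:%M"


def minutes_between(start, end):
    """Return the whole minutes elapsed between two timestamp strings."""
    delta = datetime.strptime(end, FMT) - datetime.strptime(start, FMT)
    # total_seconds() covers days as well, unlike the .seconds attribute.
    return int(delta.total_seconds() // 60)


if __name__ == "__main__":
    print(minutes_between("2014/03/30 14:31", "2014/03/30 15:45"))
```

In pandas, the analogous operation on a timedelta64 column is `df['Duration'].dt.total_seconds() / 60`. On the two dtypes: `m8[ns]` is NumPy's short code for `timedelta64[ns]`, so both outputs describe the same type.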
