I am at a standstill with this problem. I outlined it in another question ( Creating data histograms/visualizations using ipython and filtering out some values ), which meandered a bit, so I'd like to restate the question and give it more context, since I'm sure others must have hit this problem or have a workaround for it. I've also seen similar, though not identical, questions asked and haven't been able to adapt any of the solutions given so far.
I have columns in my data frame for Start Time and End Time, and I created a 'Duration' column for the elapsed time. I'm using IPython.
The Start Time/End Time columns have fields that look like:
2014/03/30 15:45
A date and then a time in hh:mm
When I type:
pd.to_datetime(df['End Time']) and
pd.to_datetime(df['Start Time'])
I get fields that look like:
2014-03-30 15:45:00
the same date but with hyphens, and the same time but with :00 seconds appended
I then decided to create a new column for the difference between the End and Start times. The 'Duration' (elapsed time) column was created with a single command:
df['Duration'] = pd.to_datetime(df['End Time'])-pd.to_datetime(df['Start Time'])
The format of the fields in the Duration column is:
01:14:00
no date, just the elapsed time in hh:mm:ss format, i.e. 74 minutes in the example above.
When I type:
df.Duration.dtype
dtype('m8[ns]') is returned, whereas, when I type
df.Duration.head(4)
0 00:14:00
1 00:16:00
2 00:03:00
3 00:09:00
Name: Duration, dtype: timedelta64[ns]
is returned, which seems to indicate a different dtype for Duration.
How can I convert the format I have in the Duration column to a single integer value of minutes elapsed? I see no methods that I can use; I'd write a function, but I wouldn't know how to treat the hh:mm:ss input. This must be a common requirement of data analysis. Should I be converting these dates and times differently if my end goal is a single integer indicating minutes elapsed? Should I just be using Excel? I have so far spent a day on this and it seems like it should be a simple problem to solve.
**update:
THANK YOU!! (Jeff and Dataswede) I added a column with the command:
df['Durationendminusstart'] = pd.to_timedelta(df.Duration,unit='ns').astype('timedelta64[m]')
which seems to give me the Duration (minutes elapsed) as wanted, so that huge part is solved!
What is still not clear is why there were two different dtypes for the same column depending on how I asked; oh well, right now it doesn't matter.**
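For what it's worth, the two dtype reports describe the same thing: 'm8[ns]' is just NumPy's shorthand for timedelta64[ns]. As a version-robust alternative to the astype() call above, here is a minimal self-contained sketch (the sample times are made up to reproduce the 14- and 16-minute rows shown earlier):

import pandas as pd

df = pd.DataFrame({'Start Time': ['2014/03/30 15:45', '2014/03/30 16:00'],
                   'End Time':   ['2014/03/30 15:59', '2014/03/30 16:16']})

# parse the string columns and subtract to get a timedelta64[ns] column
df['Duration'] = pd.to_datetime(df['End Time']) - pd.to_datetime(df['Start Time'])

# total_seconds() works across pandas versions; divide by 60 for whole minutes
df['Duration_minutes'] = (df['Duration'].dt.total_seconds() / 60).astype(int)
print(df['Duration_minutes'].tolist())   # [14, 16]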
Related
I have a dataset file with a time variable stored as "seconds since 1981-01-01 00:00:00". What I need is to convert this time into a calendar date (YYYY-MM-DD HH:mm:ss). I've seen a lot of different ways to do this for time since the 1970 epoch (timestamp, calendar.timegm, etc.), but I'm failing to do it with a different reference date.
One option is simply to add the offset between the two epochs, 347,155,200 seconds (4,018 days: 11 years including the leap days in 1972, 1976 and 1980), to each value. You can then use the usual conversion from 1970-01-01.
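For instance, a minimal Python sketch of that idea (the function and variable names are made up; it assumes the values are plain numbers of seconds):

from datetime import datetime, timezone

# seconds between 1970-01-01 and 1981-01-01: 4018 days, including the 1972/1976/1980 leap days
OFFSET = 347_155_200

def to_calendar_date(seconds_since_1981):
    # shift to seconds-since-1970, then convert from the usual Unix epoch
    return datetime.fromtimestamp(seconds_since_1981 + OFFSET, tz=timezone.utc)

print(to_calendar_date(0))        # 1981-01-01 00:00:00+00:00
print(to_calendar_date(86400))    # 1981-01-02 00:00:00+00:00

If you are already working in pandas, pd.to_datetime(values, unit='s', origin='1981-01-01') should do the same shift in one call.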
I'm trying to crawl data from the Singapore Stock Exchange, where the web content is loaded dynamically from an API call that returns JSON, example here
I'm having a problem with the dates, which are given as numbers in the JSON. For example, 1575491760000 is supposed to be 2019-12-04 20:36:00 GMT.
After some trial and error, I've figured out a solution using R:
as.POSIXct(1575491760000/1000, origin="1970-01-01", tz = 'GMT')
# divide by 1000 because the value is in milliseconds and as.POSIXct() expects seconds
and the above code does return "2019-12-04 20:36:00 GMT" in R.
However, my question is: is there a solution for the above conversion in Excel? I've tried a few different ways, but none of them can handle such a long number combined with a date + time format. I'd appreciate it if anyone can provide a specific solution!
Here's the Excel equivalent.
=DATE(1970,1,1) + 1575491760000/(1000*60*60*24)
# 12/4/19 20:36:00 with cell formatting set to m/d/yy h:mm:ss
The timestamp here is Unix time in milliseconds: it increments by one for every millisecond since 1/1/1970. Excel datetimes increment by one for every day since 1/1/1900.
So to convert from Unix time to Excel, divide by the number of milliseconds in a day (1000*60*60*24) and add the result to the date 1/1/1970 (serial number 25569 under the hood in Excel).
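For anyone who wants to double-check that arithmetic outside Excel, here is the same calculation as a quick Python sketch (the variable name is made up):

from datetime import datetime, timedelta, timezone

ms = 1575491760000   # millisecond Unix timestamp from the JSON
print(datetime(1970, 1, 1, tzinfo=timezone.utc) + timedelta(milliseconds=ms))
# 2019-12-04 20:36:00+00:00

# Excel serial date: days since the 1900 epoch; 1970-01-01 is serial 25569
print(25569 + ms / (1000 * 60 * 60 * 24))
# 43803.858..., which Excel formats as 12/4/19 20:36:00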
This is my first time ever asking a question on Stack Overflow and I'm a programming novice, so any advice on how to improve my question-asking would be appreciated.
Onto my question: I have two CSV files, one containing three columns (a date-time in dd/mm/yyyy hh:(00 or 30) format, production of a certain product, and demand for said product), and the other containing several columns (the date-time decomposed into year, month, day, hour, and whether it is :00 or :30, represented by 1 or 2 respectively, alongside several columns of independent variables that may affect production/demand of said product).
I've only played around with the first CSV file, converting the strings into datetime objects, but the ts() function won't recognise the datetime objects as my times. I've tried adjusting the frequency parameter but ultimately failed, and I have no idea how to create a time series from half-hourly data. I would appreciate any help.
Thanks in advance!
My suggestion is to apply difftime() over all your time data. For instance, as in the following code, you can use your initial time (the time of the first record) as time_start for all comparisons, and the other times as time_finish. difftime() then returns the time intervals as a number of seconds, and you are ready to use the other column values as the values at those timestamps.
interval <- as.integer(difftime(strptime(time_finish, "%H:%M"), strptime(time_start, "%H:%M"), units = "secs"))
which gives intervals in seconds such as: 0 10 15 ....
I am totally new to the R environment and I'm stuck on Date operations. The scenario is: I have a daily database of customer activity for a certain store, and I need to extract the last 30 days of data, counting back from the current date.
In other words, suppose today is 18-NOV-2014; I need all the data from 18-OCT-2014 until today in a separate data frame. What kind of iteration logic should I write in R to extract it?
You don't need an iteration. Assuming your data.frame is called X and the date column is called DATE, you could write:
X$DATE=as.Date(X$DATE, format='%d-%B-%Y')
The 'format' argument is there to match the date format you gave in your question. Then, to get the rows you are interested in, something like:
X[X$DATE >= Sys.Date() - 30, ]
which selects all the rows from the last 30 days (Sys.Date() gives today's date in base R).
Does this help at all?
I've come into possession of hundreds of ASCII data files where the date and time are separate columns, like so:
date time
1-Jan-08 23:05
I need to convert this to a usable R date-time object, subtract 8 hours (timezone conversion from UTC to Pacific), and then turn it into Unix time. I need to do this because the data are collected every evening (from 5pm through 2am the following morning), so if I used a regular date/time format it would confound days (day 1 would span two calendar days when in fact it was just one evening of data collection). I'd like to consider each day's events separately.
Using Unix time will let me calculate time differences between events that occur each day (I will probably retain a date field in addition to the Unix time). Can someone suggest an efficient way to do this?
Here is some data to use (this is in UTC)
dummy=data.frame(date="1-Jan-08",time="23:05")
Paste them together (which works vectorised) and then parse, e.g.
datetime <- paste(dummy$date, dummy$time)
parsed <- strptime(datetime, "%d-%b-%y %H:%M")
which you can also assign as columns in the data frame.
Edit: strptime() has an optional tz="" argument you can use.
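For comparison only (not part of the R answer above), here is the same end-to-end idea sketched in pandas: parse the pasted stamps as UTC, view them in Pacific time so one evening stays on one calendar date, and take Unix seconds. The column names follow the dummy data above; 'US/Pacific' is one reasonable zone name.

import pandas as pd

dummy = pd.DataFrame({'date': ['1-Jan-08'], 'time': ['23:05']})

# paste and parse, treating the stamps as UTC
stamp = pd.to_datetime(dummy['date'] + ' ' + dummy['time'], format='%d-%b-%y %H:%M', utc=True)

# view the same instants in Pacific time, so an evening of collection keeps one date
dummy['day'] = stamp.dt.tz_convert('US/Pacific').dt.date   # 2008-01-01 (15:05 Pacific)

# Unix time in whole seconds (unchanged by the timezone view)
dummy['unix'] = (stamp - pd.Timestamp('1970-01-01', tz='UTC')) // pd.Timedelta(seconds=1)
print(dummy)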