I have a text file with many rows, each containing a date and time. My end goal is to count how many rows fall into each week, so that I can plot a scatter diagram with the week number on the x axis and the frequency on the y axis. For example, the text file (dates.txt):
Mon May 11 22:51:27 2013
Mon May 11 22:58:34 2013
Wed May 13 23:15:27 2013
Thu May 14 04:11:22 2013
Sat May 16 19:46:55 2013
Sat May 16 22:29:54 2013
Sun May 17 02:08:45 2013
Sun May 17 23:55:15 2013
Mon May 18 00:42:07 2013
So from here, week 1 will have a frequency of 6 and week 2 will have a frequency of 1.
As I want to plot a scatter diagram for this, I first want to parse the strings into date-time values using strptime() with the format %a %b.
My attempt so far has been:
time_stamp <- strptime(time_stamp, format='%a.%b')
However, R reports that the input string is too long. I'm very new to RStudio, so could somebody please help me figure this out?
Thank you
Example of the final output graph: https://imgur.com/a/3o3DivA
You could use readLines() to avoid creating a data frame, then parse the times with strptime(), and finally format the output with strftime().
strftime(strptime(readLines('dates.txt'), '%c'), '%a.%b')
# [1] "Sat.May" "Sat.May" "Mon.May" "Tue.May" "Thu.May" "Thu.May" "Fri.May" "Fri.May" "Sat.May"
Edit
It appears that your dates contain a time zone abbreviation, e.g. "Mon Apr 06 23:49:29 PDT 2009". Since it is the same for all dates, we can specify it literally in the pattern.
We use '%d_%m' in strftime() to get the day and month as numbers separated by _, which we feed to strsplit() and then convert to integers with type.convert().
Finally, we unlist, build a two-column matrix filled by row, and plot it.
strptime(readLines('timestamp.txt'), '%a %b %d %H:%M:%S PDT %Y') |>
strftime('%d_%m') |>
strsplit('_') |>
type.convert(as.is=TRUE) |>
unlist() |>
matrix(ncol=2, byrow=TRUE) |>
plot(pch=20, col=4, main='My Plot', xlab='day', ylab='month')
Note: the native |> pipe requires R >= 4.1.
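A variant that skips the strsplit()/type.convert() round trip is to format the day and month separately and bind them into the matrix directly. This is a minimal sketch, assuming English weekday/month names (so it switches LC_TIME to "C" for the session) and a literal "PDT" in every line; the second and third timestamps are hypothetical sample data.

```r
# Assumes English %a/%b names, so switch the time locale to "C".
invisible(Sys.setlocale("LC_TIME", "C"))

stamps <- c("Mon Apr 06 23:49:29 PDT 2009",  # example line quoted above
            "Tue Apr 07 08:15:00 PDT 2009",  # hypothetical
            "Fri May 01 12:00:00 PDT 2009")  # hypothetical
# With the real file: stamps <- readLines("timestamp.txt")

# "PDT" is matched literally, exactly as in the pipe version.
parsed <- strptime(stamps, "%a %b %d %H:%M:%S PDT %Y")

# One integer column per component; no string splitting needed.
dm <- cbind(day   = as.integer(format(parsed, "%d")),
            month = as.integer(format(parsed, "%m")))

plot(dm, pch = 20, col = 4, main = "My Plot", xlab = "day", ylab = "month")
```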
You first need to read (or assign) the data, parse it into a date-time type, and then use that to, e.g., get the week number.
Here is one example:
text <- "Mon May 11 22:51:27 2013
Mon May 11 22:58:34 2013
Wed May 13 23:15:27 2013
Thu May 14 04:11:22 2013
Sat May 16 19:46:55 2013
Sat May 16 22:29:54 2013
Sun May 17 02:08:45 2013
Sun May 17 23:55:15 2013
Mon May 18 00:42:07 2013"
data <- read.table(text=text, sep='\n', col.names="dates")
data$parse <- anytime::anytime(data$dates)
data$week <- as.integer(format(data$parse, "%V"))
data
The result is the original data frame with two new columns, parse and week:
> data
dates parse week
1 Mon May 11 22:51:27 2013 2013-05-11 22:51:27 19
2 Mon May 11 22:58:34 2013 2013-05-11 22:58:34 19
3 Wed May 13 23:15:27 2013 2013-05-13 23:15:27 20
4 Thu May 14 04:11:22 2013 2013-05-14 04:11:22 20
5 Sat May 16 19:46:55 2013 2013-05-16 19:46:55 20
6 Sat May 16 22:29:54 2013 2013-05-16 22:29:54 20
7 Sun May 17 02:08:45 2013 2013-05-17 02:08:45 20
8 Sun May 17 23:55:15 2013 2013-05-17 23:55:15 20
9 Mon May 18 00:42:07 2013 2013-05-18 00:42:07 20
>
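To get from the week column to the frequency-per-week scatter plot the question asks for, one option is to tabulate the week numbers and plot them. This is a base R sketch (no anytime), assuming English weekday/month abbreviations in the file, so it sets LC_TIME to "C"; it goes through POSIXct so the week is derived from the date itself rather than from the weekday name.

```r
# Assumes English %a/%b names, so switch the time locale to "C".
invisible(Sys.setlocale("LC_TIME", "C"))

stamps <- c("Mon May 11 22:51:27 2013", "Mon May 11 22:58:34 2013",
            "Wed May 13 23:15:27 2013", "Thu May 14 04:11:22 2013",
            "Sat May 16 19:46:55 2013", "Sat May 16 22:29:54 2013",
            "Sun May 17 02:08:45 2013", "Sun May 17 23:55:15 2013",
            "Mon May 18 00:42:07 2013")
# With the real file: stamps <- readLines("dates.txt")

# POSIXct keeps only the calendar date/time, so derived fields come from the date.
parsed <- as.POSIXct(strptime(stamps, "%a %b %d %H:%M:%S %Y", tz = "UTC"))
iso_week <- as.integer(format(parsed, "%V", tz = "UTC"))  # ISO 8601 week number

freq <- table(iso_week)              # number of rows in each week
week <- as.integer(names(freq))
count <- as.vector(freq)
week_no <- week - min(week) + 1L     # renumber so the first week is week 1

plot(week_no, count, pch = 19, xlab = "week", ylab = "frequency")
```

Note that 11 May 2013 was actually a Saturday, so the weekday names in the sample do not match the dates; grouping by the parsed dates gives 2 rows in ISO week 19 and 7 in week 20. Swap "%V" for "%U" if weeks starting on Sunday suit your data better.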
Related
I have the following two columns as part of a larger data frame. Timezone_Offset is the difference in hours between UTC and local time (US West Coast in the data I'm looking at). In other words, UTC + Offset = local time.
I want to convert the UTC time to local time, while also correctly changing the day of the week and the date when necessary. For instance, here are the first five rows of the two columns:
UTC Timezone_Offset
Sun Apr 08 02:42:03 +0000 2012 -7
Sun Jul 01 03:27:20 +0000 2012 -7
Wed Jul 11 04:40:18 +0000 2012 -7
Sat Nov 17 01:31:36 +0000 2012 -8
Sun Apr 08 20:50:30 +0000 2012 -7
Things get tricky when the day of the week and the date also have to change. For instance, in the first row the local time should be Sat Apr 07 19:42:03 -0700 2012. In the second row, the month also has to change.
Sorry, I'm fairly new to R. Could someone possibly explain how to do this? Thank you so much in advance.
Parse as UTC, then add the offset converted to seconds (i.e. multiplied by 60*60):
data <- read.csv(text="UTC, Timezone_Offset
Sun Apr 08 02:42:03 +0000 2012, -7
Sun Jul 01 03:27:20 +0000 2012, -7
Wed Jul 11 04:40:18 +0000 2012, -7
Sat Nov 17 01:31:36 +0000 2012, -8
Sun Apr 08 20:50:30 +0000 2012, -7", stringsAsFactors=FALSE)
data$pt <- as.POSIXct(strptime(data$UTC, "%a %b %d %H:%M:%S %z %Y", tz="UTC"))
data$local <- data$pt + data$Timezone_Offset*60*60
Result:
> data[,3:4]
pt local
1 2012-04-08 02:42:03 2012-04-07 19:42:03
2 2012-07-01 03:27:20 2012-06-30 20:27:20
3 2012-07-11 04:40:18 2012-07-10 21:40:18
4 2012-11-17 01:31:36 2012-11-16 17:31:36
5 2012-04-08 20:50:30 2012-04-08 13:50:30
>
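If every row really is US West Coast time, an alternative is to let R's time zone database apply PST/PDT instead of using the stored offset. This is a sketch under that assumption; it also assumes R can find "America/Los_Angeles" in the system's tz database and sets LC_TIME to "C" for English names.

```r
# Assumes English %a/%b names, so switch the time locale to "C".
invisible(Sys.setlocale("LC_TIME", "C"))

utc <- c("Sun Apr 08 02:42:03 +0000 2012", "Sun Jul 01 03:27:20 +0000 2012",
         "Wed Jul 11 04:40:18 +0000 2012", "Sat Nov 17 01:31:36 +0000 2012",
         "Sun Apr 08 20:50:30 +0000 2012")
pt <- as.POSIXct(strptime(utc, "%a %b %d %H:%M:%S %z %Y", tz = "UTC"))

# Same instants printed as Los Angeles wall-clock time; the weekday and date
# roll back automatically when the shift crosses midnight, and DST is applied.
local <- format(pt, "%a %b %d %H:%M:%S %Z %Y", tz = "America/Los_Angeles")
local
```

For these five rows the result agrees with the offset-based local column above. If the offsets in your data might not follow Los Angeles DST rules, the offset arithmetic is the safer choice.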
I am running the following R code in RStudio to convert a wide data frame (called merged) into a long one.
> merged
Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
2017 (A) 5980 5341 5890 5596 5753 5470 5589 5545 5749 5938 5844 5356
2017 (P) 5762 5275 5733 5411 5406 4954 5464 5536 5805 5819 5903 5630
I'm after the following output:
Description Month RN
2017 (A) Jan 5980
2017 (P) Jan 5762
2017 (A) Feb 5341
2017 (P) Feb 5275
... ... ...
I have tried the following (but with no success):
library(reshape2)
merged_long <- melt(data=merged,
id.vars="Description",
variable.name="Month",
value.name="RN")
I'm getting the following error message:
Error: id variables not found in data: Description
What am I doing wrong?
As @Sotos noted in the comments, the row names of merged are what uniquely identify each observation in the melted data set, but melt() ignores row names when given a data frame. To include them, add them as a column first:
merged$Description <- rownames(merged)
Then your original code should produce the expected result.
library(reshape2)
merged_long <- melt(data=merged,
id.vars="Description",
variable.name="Month",
value.name="RN")
Given the nature of your data (the identifiers are row names), it's easiest to use melt(as.matrix(...)), which turns the row and column names into the first two columns. Omit as.matrix() if your data is already a matrix.
melt(as.matrix(mydf))
You can use setNames to rename the columns at the same time:
setNames(melt(as.matrix(mydf)), c("Description", "Month", "RN"))
# Description Month RN
# 1 2017 (A) Jan 5980
# 2 2017 (P) Jan 5762
# 3 2017 (A) Feb 5341
# .........................
# .........................
# 23 2017 (A) Dec 5356
# 24 2017 (P) Dec 5630
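Without reshape2, base R can do the same through as.table(), whose data frame method emits one row per cell. This is a sketch that rebuilds merged from the numbers in the question's printout; as with melt(as.matrix(...)), the row names become the Description column.

```r
merged <- data.frame(
  Jan = c(5980, 5762), Feb = c(5341, 5275), Mar = c(5890, 5733),
  Apr = c(5596, 5411), May = c(5753, 5406), Jun = c(5470, 4954),
  Jul = c(5589, 5464), Aug = c(5545, 5536), Sep = c(5749, 5805),
  Oct = c(5938, 5819), Nov = c(5844, 5903), Dec = c(5356, 5630),
  row.names = c("2017 (A)", "2017 (P)")
)

# as.table() keeps row and column names as dimnames; as.data.frame() then
# walks the cells in column-major order (both rows for Jan, then Feb, ...).
merged_long <- as.data.frame(as.table(as.matrix(merged)),
                             responseName = "RN", stringsAsFactors = FALSE)
names(merged_long)[1:2] <- c("Description", "Month")
head(merged_long, 4)
```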
I have a dataset with dates in the following format:
Initial:
Jan-2015 Apr-2013 Jun-2014 Jan-2015 Jan-2016 Jan-2015 Jan-2016 Jan-2015 Apr-2012 Nov-2012 Jun-2013 Sep-2013
Final:
Feb-2014 Jan-2013 Sep-2014 Apr-2013 Sep-2014 Mar-2013 Aug-2012 Apr-2012 Oct-2012 Oct-2013 Jun-2014 Oct-2013
I would like to perform these steps:
Create dummy variables for month and year
Subtract one set of dates from the other to get the duration (final - initial) in months
How can I do this in R?
You could use as.yearmon from the zoo package for this.
library(zoo)
12 * (as.yearmon("Jan-2015", "%b-%Y") - as.yearmon("Feb-2014", "%b-%Y"))
# result
# [1] 11
To expand on @neilfws's answer, you can use the month() and year() functions from the lubridate package to add the month and year of each date as columns (your dummy variables) in your data frame.
Here is the code:
library(lubridate)
library(zoo)
df <- data.frame(Initial = c("Jan-2015", "Apr-2013", "Jun-2014", "Jan-2015", "Jan-2016", "Jan-2015",
"Jan-2016", "Jan-2015", "Apr-2012", "Nov-2012", "Jun-2013", "Sep-2013"),
Final = c("Feb-2014", "Jan-2013", "Sep-2014", "Apr-2013", "Sep-2014", "Mar-2013",
"Aug-2012", "Apr-2012", "Oct-2012", "Oct-2013", "Jun-2014", "Oct-2013"))
df$Initial <- as.character(df$Initial)
df$Final <- as.character(df$Final)
df$Initial <- as.yearmon(df$Initial, "%b-%Y")
df$Final <- as.yearmon(df$Final, "%b-%Y")
df$month_initial <- month(df$Initial)
df$year_initial <- year(df$Initial)
df$month_final <- month(df$Final)
df$year_final <- year(df$Final)
# round() strips floating-point noise from the yearmon arithmetic
df$Difference <- round(12*(df$Initial-df$Final))
And here is the final data frame:
> head(df)
Initial Final month_initial year_initial month_final year_final Difference
1 Jan 2015 Feb 2014 1 2015 2 2014 11
2 Apr 2013 Jan 2013 4 2013 1 2013 3
3 Jun 2014 Sep 2014 6 2014 9 2014 -3
4 Jan 2015 Apr 2013 1 2015 4 2013 21
5 Jan 2016 Sep 2014 1 2016 9 2014 16
6 Jan 2015 Mar 2013 1 2015 3 2013 22
Hope this helps!
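If you would rather avoid zoo and lubridate, the same columns can be computed in base R because the values have a fixed "Mon-YYYY" shape. This is a sketch under that assumption; month_num() and year_num() are hypothetical helpers, and month.abb (base R's English month abbreviations) keeps it independent of the locale.

```r
initial <- c("Jan-2015", "Apr-2013", "Jun-2014", "Jan-2015", "Jan-2016", "Jan-2015",
             "Jan-2016", "Jan-2015", "Apr-2012", "Nov-2012", "Jun-2013", "Sep-2013")
final   <- c("Feb-2014", "Jan-2013", "Sep-2014", "Apr-2013", "Sep-2014", "Mar-2013",
             "Aug-2012", "Apr-2012", "Oct-2012", "Oct-2013", "Jun-2014", "Oct-2013")

# Hypothetical helpers: split "Mon-YYYY" by position, no date parsing needed.
month_num <- function(x) match(substr(x, 1, 3), month.abb)  # 1..12, NA if unknown
year_num  <- function(x) as.integer(substr(x, 5, 8))

df <- data.frame(Initial = initial, Final = final,
                 month_initial = month_num(initial), year_initial = year_num(initial),
                 month_final   = month_num(final),   year_final   = year_num(final))

# Same sign convention as the table above (Initial minus Final), in whole months;
# swap the operands for final minus initial.
df$Difference <- 12L * (df$year_initial - df$year_final) +
  (df$month_initial - df$month_final)
head(df)
```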
diff(seq(as.Date("2016-12-21"), as.Date("2017-04-05"), by="month"))
Time differences in days
[1] 31 31 28
The code above generates the number of days in the months Dec, Jan and Feb.
However, what I need is the number of days in each month between the two dates:
#Results that I need
#monthly days from date 2016-12-21 to 2017-04-05
11, 31, 28, 31, 5
#i.e 11 days of Dec, 31 of Jan, 28 of Feb, 31 of Mar and 5 days of Apr.
I also tried days_in_month() from lubridate but was not able to get the result:
library(lubridate)
days_in_month(c(as.Date("2016-12-21"), as.Date("2017-04-05")))
Dec Apr
31 30
Try this:
x = rle(format(seq(as.Date("2016-12-21"), as.Date("2017-04-05"), by=1), '%b'))
> setNames(x$lengths, x$values)
# Dec Jan Feb Mar Apr
# 11 31 28 31 5
Although we have seen a clever replacement of table by rle and a pure table solution, I want to add two approaches using grouping. All approaches have in common that they create a sequence of days between the two given dates and aggregate by month but in different ways.
aggregate()
This one uses base R:
# create sequence of days
days <- seq(as.Date("2016-12-21"), as.Date("2017-04-05"), by = 1)
# aggregate by month
aggregate(days, list(month = format(days, "%b")), length)
# month x
#1 Apr 5
#2 Dec 11
#3 Feb 28
#4 Jan 31
#5 Mar 31
Unfortunately, the months are ordered alphabetically, as also happened with the simple table() approach. In such situations I prefer the ISO 8601 way of naming months unambiguously:
aggregate(days, list(month = format(days, "%Y-%m")), length)
# month x
#1 2016-12 11
#2 2017-01 31
#3 2017-02 28
#4 2017-03 31
#5 2017-04 5
data.table
Now that I've got used to the data.table syntax, this is my preferred approach:
library(data.table)
data.table(days)[, .N, .(month = format(days, "%b"))]
# month N
#1: Dec 11
#2: Jan 31
#3: Feb 28
#4: Mar 31
#5: Apr 5
The months are kept in the order in which they appear in the input vector.
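A base R way to keep month abbreviations in calendar order, without data.table, is to fix the factor levels to their order of first appearance before calling table(). This is a sketch; it uses month.abb so the names are English regardless of locale, and it assumes the range spans at most 12 months (otherwise the same month from two years would merge, and format(days, "%Y-%m") is the better key).

```r
days <- seq(as.Date("2016-12-21"), as.Date("2017-04-05"), by = "day")

# Numeric month via "%m" is locale-independent; month.abb maps it to a name.
mon <- month.abb[as.integer(format(days, "%m"))]

# Levels in first-appearance order stop table() from sorting alphabetically.
counts <- table(factor(mon, levels = unique(mon)))
counts
```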
I am trying to plot a series of sunset times in matplotlib but I get the following error:
"TypeError: Empty 'DataFrame': no numeric data to plot"
I have looked at several options for converting, e.g. matplotlib.dates.date2num, but that doesn't really fulfil my needs, as I would like to plot the values in a readable format, i.e. as times. All the examples I have found have times on the x-axis, but none have them on the y-axis.
Is there no way of accomplishing this task? Has anyone got an idea?
I look forward to your replies.
Best regards, Arne
3 Jan 2013 16:44:00
4 Jan 2013 16:45:00
5 Jan 2013 16:46:00
6 Jan 2013 16:47:00
7 Jan 2013 16:48:00
8 Jan 2013 16:49:00
9 Jan 2013 16:51:00
10 Jan 2013 16:52:00
11 Jan 2013 16:53:00
12 Jan 2013 16:55:00
13 Jan 2013 16:56:00
14 Jan 2013 16:57:00
It's not quite clear from your question if you're trying to plot some unspecified data on the x-axis with date/time on the y-axis or if you're trying to plot days on the x-axis with times on the y-axis.
From your question, though, I'm going to assume it's the latter.
It sounds like you might be using pandas, but for the moment, I'll just assume you have two sequences of strings: One with the day, and another sequence with the time.
To treat a given axis as dates, just call ax.xaxis_date() or ax.yaxis_date(). In this case, both will actually be dates. (The times will have today as the day, though you won't see this directly.)
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
date = ['3 Jan 2013', '4 Jan 2013', '5 Jan 2013', '6 Jan 2013', '7 Jan 2013',
'8 Jan 2013', '9 Jan 2013', '10 Jan 2013', '11 Jan 2013', '12 Jan 2013',
'13 Jan 2013', '14 Jan 2013']
time = ['16:44:00', '16:45:00', '16:46:00', '16:47:00', '16:48:00', '16:49:00',
'16:51:00', '16:52:00', '16:53:00', '16:55:00', '16:56:00', '16:57:00']
# Convert to matplotlib's internal date format.
x = mdates.datestr2num(date)
y = mdates.datestr2num(time)
fig, ax = plt.subplots()
ax.plot(x, y, 'ro-')
ax.yaxis_date()
ax.xaxis_date()
# Optional. Just rotates x-ticklabels in this case.
fig.autofmt_xdate()
plt.show()