I receive historical demand data from an Excel spreadsheet in the following format:
Part Number Requested Date Quantity
123 01/24/2013 12:53 1
122 02/07/2013 09:57 1
122 02/14/2013 09:58 7
124 11/21/2012 12:46 1
I typically provide my management with charts from Excel by part like this, but I want to become more proficient in R so I can do hundreds of these at a time.
We can try as.yearmon to create the grouping variable and then get the sum of 'Quantity' in summarise:
library(zoo)
library(dplyr)
df1 %>%
group_by(PartNumber, yearMon = as.yearmon(RequestedDate, "%m/%d/%Y %H:%M")) %>%
summarise(Quantity = sum(Quantity))
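For comparison, here is a minimal base R sketch of the same monthly aggregation, without zoo or dplyr. The data frame is rebuilt from the four sample rows in the question. The column names PartNumber, RequestedDate and Quantity follow the answer above and are assumptions about how the spreadsheet is imported.

```r
# Sample rows from the question; column names are assumed.
df1 <- data.frame(
  PartNumber    = c(123, 122, 122, 124),
  RequestedDate = c("01/24/2013 12:53", "02/07/2013 09:57",
                    "02/14/2013 09:58", "11/21/2012 12:46"),
  Quantity      = c(1, 1, 7, 1)
)

# Parse the timestamp, then keep only year and month as the grouping key.
parsed <- as.POSIXct(df1$RequestedDate, format = "%m/%d/%Y %H:%M", tz = "UTC")
df1$yearMon <- format(parsed, "%Y-%m")

# Sum Quantity for each part and month.
monthly <- aggregate(Quantity ~ PartNumber + yearMon, data = df1, FUN = sum)
print(monthly)
```

The "%Y-%m" key sorts chronologically even as text, which keeps the months in order when the result is charted per part.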
Related
I'm trying to visualize some bird data, however after grouping by month, the resulting output is out of order from the original data. It is in order for December, January, February, and March in the original, but after manipulating it results in December, February, January, March.
Any ideas how I can fix this or sort the rows?
This is the code:
BirdDataTimeClean <- BirdDataTimes %>%
group_by(Date) %>%
summarise(Gulls=sum(Gulls), Terns=sum(Terns), Sandpipers=sum(Sandpipers),
Plovers=sum(Plovers), Pelicans=sum(Pelicans), Oystercatchers=sum(Oystercatchers),
Egrets=sum(Egrets), PeregrineFalcon=sum(Peregrine_Falcon), BlackPhoebe=sum(Black_Phoebe),
Raven=sum(Common_Raven))
BirdDataTimeClean2 <- BirdDataTimeClean %>%
pivot_longer(!Date, names_to = "Species", values_to = "Count")
You haven't shared any workable data, but I run into this often when reading from CSV, where all dates and data come in as character.
As suggested, convert the date column to Date format using the lubridate package or base as.Date(); arrange() in dplyr will then work, and so will group_by().
Example (toy data created):
library(data.table)
library(dplyr)
library(lubridate)
birds <- data.table(dates = c("2020-Feb-20","2020-Jan-20","2020-Dec-20","2020-Apr-20"),
species = c('Gulls','Terns','Gulls','Sandpiper'),
Counts = c(20,30,40,50))
str(birds) will show that dates is character (and I have deliberately not kept the rows in order).
Use lubridate to convert the dates:
birds$dates %>% lubridate::ymd() will change it to the Date data type:
birds$dates%>%ymd()%>%str()
Date[1:4], format: "2020-02-20" "2020-01-20" "2020-12-20" "2020-04-20"
Save it with birds$dates <- ymd(birds$dates), or do it in your pipeline as follows.
Now simply do the dplyr analysis:
birds%>%group_by(Months= ymd(dates))%>%
summarise(N=n()
,Species_Count = sum(Counts)
)%>%arrange(Months)
will give
# A tibble: 4 x 3
Months N Species_Count
<date> <int> <dbl>
1 2020-01-20 1 30
2 2020-02-20 1 20
3 2020-04-20 1 50
4 2020-12-20 1 40
However, if you want month names such as Apr and Jan instead of numbers and reformat the dates (for example with format()), they become character again. I would suggest keeping your data as dates and formatting only when presenting output to others, with format() or, if you use DT or other data tables, with their output formatting options. That way your original data stays intact and users see what they want.
This will make it character:
birds%>%group_by(Months= as.character.Date(dates))%>%
summarise(N=n()
,Species_Count = sum(Counts)
)%>%arrange(Months)
# A tibble: 4 x 3
Months N Species_Count
<chr> <int> <dbl>
1 2020-Apr-20 1 50
2 2020-Dec-20 1 40
3 2020-Feb-20 1 20
4 2020-Jan-20 1 30
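The advice above, keeping real dates for ordering and using month names only as labels, can be sketched in base R. This is an illustration under assumptions: the toy dates are written in ISO form so that parsing does not depend on the locale, and month.abb (R's built-in English abbreviations) supplies the labels.

```r
# Toy dates from the example, written as ISO strings (an assumption made
# here so that as.Date() parses them without a locale-dependent format).
raw <- c("2020-02-20", "2020-01-20", "2020-12-20", "2020-04-20")
dates <- as.Date(raw)

# Month labels as a factor whose levels follow the calendar.
month_label <- factor(month.abb[as.integer(format(dates, "%m"))],
                      levels = month.abb)

# Plain character labels sort alphabetically ...
alphabetical <- sort(as.character(month_label))
# ... while the factor sorts in calendar order.
chronological <- as.character(sort(month_label))

print(alphabetical)
print(chronological)
```

A factor with calendar-ordered levels is also what plotting functions use to order the axis, so the labels stay readable without losing the ordering.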
I have the following data frame:
Cold Date
1 Yes "21/10/2018 22:00"
2 No "05/10/2019 15:32"
3 Yes "07/12/2020 21:20"
4 No "31/08/2019 03:45"
5 No "08/12/2020 11:12"
I would like to plot how many occurrences (counts) there are for each month. This means the months should be on the x-axis. However, as you can see, the column "Date" is formatted as a string, and it also includes the timestamp.
Furthermore, there are multiple years included in the column. I think it's best to arrange multiple plots at the same time to get a nice overview of what is happening for each month in each year. Do you guys have an idea how I could tackle this problem? I have no idea where to begin.
Are you looking for this:
library(dplyr)
library(ggplot2)
library(lubridate)
df %>% mutate(Date = dmy_hm(Date)) %>% count(Month = month(Date)) %>%
ggplot(aes(Month, n)) + geom_col()
Data used:
df
Cold Date
1 Yes 21/10/2018 22:00
2 No 05/10/2019 15:32
3 Yes 07/12/2020 21:20
4 No 31/08/2019 03:45
5 No 08/12/2020 11:12
Here is a base R option -
val <- format(sort(as.POSIXct(df$Date, format = '%d/%m/%Y %H:%M',tz = 'UTC')), '%b-%Y')
barplot(table(factor(val, unique(val))))
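Since the question also asks for one overview per year, here is a minimal base R sketch that counts occurrences by year and month and draws one bar chart per year. It rebuilds the five sample rows from the question. The layout choice (one panel per row via par(mfrow = ...)) is an assumption, not something either answer above prescribes.

```r
# Sample data from the question.
df <- data.frame(
  Cold = c("Yes", "No", "Yes", "No", "No"),
  Date = c("21/10/2018 22:00", "05/10/2019 15:32", "07/12/2020 21:20",
           "31/08/2019 03:45", "08/12/2020 11:12")
)

# Parse day/month/year timestamps, then split into year and month.
parsed <- as.POSIXct(df$Date, format = "%d/%m/%Y %H:%M", tz = "UTC")
df$Year  <- format(parsed, "%Y")
df$Month <- factor(month.abb[as.integer(format(parsed, "%m"))],
                   levels = month.abb)

# Rows are years, columns are the twelve months (zero where nothing occurred).
counts <- table(df$Year, df$Month)
print(counts)

# One bar chart per year, stacked vertically.
op <- par(mfrow = c(nrow(counts), 1))
for (yr in rownames(counts)) {
  barplot(counts[yr, ], main = yr)
}
par(op)
```

Using a factor with all twelve month levels keeps empty months on the axis, so the panels for different years line up.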
I have a character column "Date" that is not written in proper mmddyyyy format.
Example:
ID Date
125 9282007
350 10152007
225 1112007
240 1052007
How can I format the "Date" column to proper mmddyyyy format using R?
Thanks in advance.
You can do the following with dplyr + lubridate:
library(dplyr)
library(lubridate)
df %>%
mutate(Date = gsub("(\\d{2})(\\d{4}$)", "-\\1-\\2", Date),
Date = format(mdy(Date), "%m-%d-%Y"))
Result:
ID Date
1 125 09-28-2007
2 350 10-15-2007
3 225 01-11-2007
Data:
df = read.table(text = "ID Date
125 9282007
350 10152007
225 1112007", header = TRUE)
Note:
The format of your original dates is ambiguous, since 1112007 could mean 11-1-2007 or 1-11-2007. In my solution, I assumed that the day is always two digits, so 11-1-2007 would have been coded as 11012007. You should, however, investigate whether that is the case before converting.
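Under the same assumption (the day always has two digits, so only the month can lose its leading zero), the conversion can also be sketched in base R by left-padding each value to eight digits before parsing. The vector below uses the four values from the question.

```r
# Values from the question, stored as character.
raw <- c("9282007", "10152007", "1112007", "1052007")

# Assumption: day is always two digits, so a 7-digit value is missing only
# the leading zero of the month. Left-pad with zeros to 8 digits.
padded <- sprintf("%08d", as.integer(raw))

# Parse as month-day-year and print in mm-dd-yyyy form.
parsed <- as.Date(padded, format = "%m%d%Y")
formatted <- format(parsed, "%m-%d-%Y")
print(formatted)
```

If the assumption does not hold for your data, a value such as 1112007 cannot be resolved from the digits alone.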
I've found similar issues, focusing on returning X based on min of column Y, but I'm having trouble with this function. I'm trying to return the min of column X, if column Y is equal to a particular value.
Here is sample data frame (df):
event.id event.date.timestamp touchpoint.date.timestamp touchpoint.type
1 7/16/2015 11:08 11/27/2014 10:34 impression
1 7/16/2015 13:00 6/10/2015 13:19 visit
1 7/16/2015 11:08 12/15/2014 13:24 impression
2 7/16/2015 0:00 4/27/2015 23:04 impression
2 7/16/2015 11:08 11/11/2014 8:01 impression
2 7/16/2015 11:08 11/27/2014 11:50 visit
3 7/16/2015 11:08 12/4/2014 14:36 impression
3 7/16/2015 11:08 11/11/2014 8:01 impression
3 7/16/2015 11:08 12/15/2014 13:21 visit
4 7/16/2015 11:08 11/27/2014 10:01 impression
4 7/16/2015 11:08 11/27/2014 10:22 impression
I am using dplyr to group the above table by event.id. I then am trying to summarise, but want to have a new column (first_impression) that only reports the min of the touchpoint.date.timestamp column IF the touchpoint.type column is = "impression".
So far I have
> df.new.grouped <- group.by(df, event.id)
> df.new.summarised <- summarise(df.new.grouped
,first_imp = min(filter(by.imp_to_click, touchpoint.type == "impression"),touchpoint.date.timestamp))
But that's not working for sure. I know you can't filter within, it was just my most recent attempt. Any ideas?
I think this will work:
df.summarized <- df %>%
group_by(event.id) %>%
filter(touchpoint.type=="impression") %>%
mutate(touchpoint.date.timestamp = as.POSIXct(touchpoint.date.timestamp, format="%m/%d/%Y %H:%M")) %>%
summarise(first_imp = min(touchpoint.date.timestamp))
Following Richard's comment, my assumption is that your date columns are stored as strings, not dates, so min() won't give a meaningful chronological result on them until they are converted. Your problem is also a great example of how piping with %>% can make it easier to spell out and follow what you're doing without creating lots of new objects.
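The same idea (filter to impressions, convert to a date-time, take the earliest per group) can be sketched in base R. The data frame below uses the first six rows of the sample data. The helper column ts is a name introduced here for illustration only.

```r
# First six rows of the sample data from the question.
df <- data.frame(
  event.id = c(1, 1, 1, 2, 2, 2),
  touchpoint.date.timestamp = c("11/27/2014 10:34", "6/10/2015 13:19",
                                "12/15/2014 13:24", "4/27/2015 23:04",
                                "11/11/2014 8:01", "11/27/2014 11:50"),
  touchpoint.type = c("impression", "visit", "impression",
                      "impression", "impression", "visit")
)

# Convert the strings to date-times so they compare chronologically.
# 'ts' is a hypothetical helper column.
df$ts <- as.POSIXct(df$touchpoint.date.timestamp,
                    format = "%m/%d/%Y %H:%M", tz = "UTC")

# Keep impressions only, sort by id and time, then take the first row per id.
imp <- df[df$touchpoint.type == "impression", ]
imp <- imp[order(imp$event.id, imp$ts), ]
first_imp <- imp[!duplicated(imp$event.id), c("event.id", "ts")]
print(first_imp)
```

Note that an event.id with no impressions at all drops out of the result, which is also what happens when filter() runs before summarise().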
I've assumed that if the minimum date for an id corresponds to a "visit", you don't want anything returned for that id, right?
If that's correct, check my example (three different ways to choose from). If it's not, you can modify it a little, or use the answer provided by @ulfelder, which looks perfect.
library(lubridate)
library(dplyr)
# example dataset
dt = data.frame(id = c(1,1,1,2,2,2),
date = c("2015-01-02","2015-01-04","2015-01-03",
"2015-01-11","2015-01-08","2015-01-06"),
type = c("impression","visit","visit","impression","visit","visit"))
# save as datetime
dt$date = ymd(dt$date) # you'll probably need the ymd_hms function if you have time as well
dt %>%
group_by(id) %>%
arrange(date) %>%
slice(1) %>%
filter(type=="impression")
dt %>%
group_by(id) %>%
filter(min_rank(date)==1 & type=="impression")
dt %>%
group_by(id) %>%
top_n(1,desc(date)) %>%
filter(type=="impression")
I'm relatively new to R but I am very familiar with Excel and T-SQL.
I have a simple dataset that has a date with time and a numeric value associated with it. What I'd like to do is summarize the numeric values by hour of the day. I've found a couple of resources for working with time types in R, but I was hoping to find a solution similar to what Excel offers (where I can call a function, pass in my date/time data, and have it return the hour of the day).
Any suggestions would be appreciated - thanks!
library(readr)
library(dplyr)
library(lubridate)
df <- read_delim('DateTime|Value
3/14/2015 12:00:00|23
3/14/2015 13:00:00|24
3/15/2015 12:00:00|22
3/15/2015 13:00:00|40',"|")
df %>%
mutate(hour_of_day = hour(as.POSIXct(strptime(DateTime, "%m/%d/%Y %H:%M:%S")))) %>%
group_by(hour_of_day) %>%
summarise(meanValue = mean(Value))
Breakdown:
Convert the DateTime column (character) into a date-time, then use hour() from lubridate to pull out just the hour value and put it into a new column named hour_of_day.
> df %>%
mutate(hour_of_day = hour(as.POSIXct(strptime(DateTime, "%m/%d/%Y %H:%M:%S"))))
Source: local data frame [4 x 3]
DateTime Value hour_of_day
1 3/14/2015 12:00:00 23 12
2 3/14/2015 13:00:00 24 13
3 3/15/2015 12:00:00 22 12
4 3/15/2015 13:00:00 40 13
The group_by(hour_of_day) call sets the groups over which mean(Value) is computed in the summarise(...) call.
This gives the result:
hour_of_day meanValue
1 12 22.5
2 13 32.0
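For readers who want the Excel-like "give me the hour" step without extra packages, here is a minimal base R sketch of the same summary. It rebuilds the four sample rows in a data frame and extracts the hour with format(), which is one possible substitute for lubridate's hour().

```r
# Sample data from the answer above.
df <- data.frame(
  DateTime = c("3/14/2015 12:00:00", "3/14/2015 13:00:00",
               "3/15/2015 12:00:00", "3/15/2015 13:00:00"),
  Value = c(23, 24, 22, 40)
)

# Parse the text into date-times, then extract the hour as an integer.
parsed <- as.POSIXct(df$DateTime, format = "%m/%d/%Y %H:%M:%S", tz = "UTC")
df$hour_of_day <- as.integer(format(parsed, "%H"))

# Mean of Value for each hour of the day.
hourly <- aggregate(Value ~ hour_of_day, data = df, FUN = mean)
print(hourly)
```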