This question already has answers here:
How to join (merge) data frames (inner, outer, left, right)
(13 answers)
Closed 2 years ago.
I am new to R.
I've been struggling to work out how to build the columns below for plotting. The columns come from different data sets. As far as I understand, I need to use the wday() function from the lubridate package to get the weekday, pivot the data to long format to get the Direction column, take the Timeperiod column from the lockdown_dates data, and then summarise the data to get the appropriate averages.
Column      Description
Weekday     Day of week (Sunday through Saturday).
Hour        Hour of day (0 through 23).
Direction   To City or From City
Timeperiod  Before times, Lockdown
Count       Average (mean) number of cyclists per direction per hour for each time period and weekday.
I have these sample tables:
        Date   Timeperiod
1 2020-02-01 Before times
2 2020-02-02 Before times
3 2020-02-03 Before times
        Date Hour From City To City
1 2020-02-01    0         0       1
2 2020-02-01    1         0       0
3 2020-02-01    2         0       0
I don't know how to start my code. I was thinking of grouping these to form the data, but I know it won't work. I would appreciate it if someone could give me an example of how to do it.
I tried this, but it only gave me "Friday", not a column:
weekdays(as.Date("4/6/2018 20:14", "%m/%d/%Y"))
You can join the two data frames on Date and calculate the weekday for each date. The as.Date() call is only needed if Date is stored as character rather than as a Date.
result <- transform(merge(df1, df2, by = 'Date'), wday = weekdays(as.Date(Date)))
Using dplyr:
library(dplyr)
result <- inner_join(df1, df2, by = 'Date') %>% mutate(wday = weekdays(as.Date(Date)))
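The remaining steps from the question (pivot to long format, then average) could be sketched as follows. This is only a sketch under assumptions: the data frame names lockdown_dates and counts and the sample values are hypothetical stand-ins modelled on the tables in the question, and weekdays() returns names in the current locale.

```r
library(dplyr)
library(tidyr)

# Hypothetical sample data modelled on the question's two tables
lockdown_dates <- data.frame(
  Date = as.Date(c("2020-02-01", "2020-02-02", "2020-02-03")),
  Timeperiod = "Before times"
)
counts <- data.frame(
  Date = as.Date(c("2020-02-01", "2020-02-01", "2020-02-01")),
  Hour = 0:2,
  `From City` = c(0, 0, 0),
  `To City` = c(1, 0, 0),
  check.names = FALSE  # keep the spaces in the column names
)

result <- counts %>%
  inner_join(lockdown_dates, by = "Date") %>%       # adds Timeperiod
  mutate(Weekday = weekdays(Date)) %>%              # one weekday per row
  pivot_longer(c(`From City`, `To City`),           # wide -> long
               names_to = "Direction", values_to = "Count") %>%
  group_by(Weekday, Hour, Direction, Timeperiod) %>%
  summarise(Count = mean(Count), .groups = "drop")  # mean per group

result
```

lubridate::wday(Date, label = TRUE) could replace weekdays(Date) if an ordered factor (Sunday through Saturday) is preferred for plotting.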
This question already has an answer here:
use rollapply and zoo to calculate rolling average of a column of variables
(1 answer)
Closed 2 years ago.
I have a data frame like the one below (sample data). For each day, I want to add two columns showing the average and the standard deviation of sales on the same weekday over the last 3 weeks. By this I mean the 3 previous occurrences of the same weekday (last 3 Tuesdays, last 3 Wednesdays, etc.).
df <- data.frame(
stringsAsFactors = FALSE,
date = c("3/28/2019","3/27/2019",
"3/26/2019","3/25/2019","3/24/2019","3/23/2019",
"3/22/2019","3/21/2019","3/20/2019","3/19/2019","3/18/2019",
"3/17/2019","3/16/2019","3/15/2019","3/14/2019",
"3/13/2019","3/12/2020","3/11/2020","3/10/2020","3/9/2021",
"3/8/2021","3/7/2021","3/6/2022","3/5/2022",
"3/4/2022","3/3/2023"),
weekday = c(4L,3L,2L,1L,7L,6L,5L,4L,
3L,2L,1L,7L,6L,5L,4L,3L,2L,1L,7L,6L,5L,4L,
3L,2L,1L,7L),
store_id = c(344L,344L,344L,344L,344L,
344L,344L,344L,344L,344L,344L,344L,344L,344L,344L,
344L,344L,344L,344L,344L,344L,344L,344L,344L,
344L,344L),
store_sales = c(1312005L,1369065L,1354185L,
1339183L,973780L,1112763L,1378349L,1331890L,1357713L,
1366399L,1303573L,936919L,1099826L,1406752L,
1318841L,1321099L,1387767L,1281097L,873449L,1003667L,
1387767L,1281097L,873449L,1003667L,1331636L,1303804L)
)
For example, for 3/28/2019 take the average sales of 3/21/2019, 3/14/2019 and 3/7/2021, like this:
date weekday store_id store_sales avg_sameday3
3/28/2019 4 344 1312005 1310609
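We can group by store_id and weekday and calculate a rolling mean over the 3 previous entries using zoo::rollapply.
library(dplyr)
df %>%
  arrange(weekday) %>%
  group_by(store_id, weekday) %>%
  mutate(store_sales_avg = zoo::rollapply(store_sales, 4,
                             function(x) mean(x[-1]), align = "left", partial = TRUE))
Note that I have used a window size of 4 and removed the first entry from the mean calculation so that the current value is not included. Because the rows are sorted newest first, the window is left-aligned (align = "left"): the first entry of each window is the current row and the following ones are the earlier dates. With partial = TRUE the mean is still computed when fewer than 4 values are available; when no earlier value exists at all, the result is NaN.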
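The question also asks for the standard deviation. A minimal base-R sketch of the same idea, assuming rows are sorted newest first within each group as in the sample data; prev_stat is a hypothetical helper (not part of zoo or dplyr), and the four-row sample is made up from the Thursday values in the question:

```r
# Hypothetical sample: Thursdays only, newest date first
sales <- data.frame(
  date = c("3/28/2019", "3/21/2019", "3/14/2019", "3/7/2019"),
  weekday = 4L,
  store_id = 344L,
  store_sales = c(1312005, 1331890, 1318841, 1281097)
)

# For each position i, apply f to the next k values (the k earlier dates),
# excluding the value at i itself; NA when there is nothing earlier
prev_stat <- function(x, k, f) {
  pos <- seq_along(x)
  sapply(pos, function(i) {
    prev <- x[pos > i & pos <= i + k]
    if (length(prev) == 0) NA_real_ else f(prev)
  })
}

# One group per store and weekday
grp <- interaction(sales$store_id, sales$weekday, drop = TRUE)
sales$avg_sameday3 <- ave(sales$store_sales, grp,
                          FUN = function(x) prev_stat(x, 3, mean))
sales$sd_sameday3  <- ave(sales$store_sales, grp,
                          FUN = function(x) prev_stat(x, 3, sd))
sales
```

sd() of a single value is NA, so the second-to-last row of each group gets an average but no standard deviation.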
This question already has answers here:
Aggregate / summarize multiple variables per group (e.g. sum, mean)
(10 answers)
Closed 3 years ago.
I have the following data on unemployment per year and quarter. My data frame runs up to 2018, but I will show only two years as an example.
Year Unemployement
1997Q3 1914
1997Q4 1697
1998Q1 1702
1998Q2 1645
1998Q3 1742
1998Q4 1605
What code can I use to tidy the Year column and get the data below? Mainly, I want the unemployment number per year, calculated as the mean of the quarterly values for 1997 and 1998 (and for the other years in my data frame). In the final version I would like only one unemployment value per year, which should be the average of all its quarters.
Year Unemployement
1997 1805.50
1998 1673.50
Thank you!
##Data entry
library(tidyverse)
df<- tribble(
~Year,~Quarter,~Unemployement,
1997,"Q3",1914,
1997,"Q4",1697,
1998,"Q1",1702,
1998,"Q2",1645,
1998,"Q3",1742,
1998,"Q4",1605
)
##Solution
df%>%
group_by(Year)%>%
summarise(mean_year = mean(Unemployement))
# A tibble: 2 x 2
Year mean_year
<dbl> <dbl>
1 1997 1806.
2 1998 1674.
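## 2nd version: first separate the Year column
## (assumes df_raw is the original data, where Year holds values like "1997Q3";
## sep = 4 splits after the fourth character, since there is no separator to split on)
df_raw%>%
separate(Year, c("Year", "Quarter"), sep = 4)%>%
group_by(Year)%>%
summarise(mean_year = mean(Unemployement))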
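For comparison, a base-R sketch that starts from the combined "1997Q3" format shown in the question. The name df_raw is a hypothetical stand-in for the asker's data frame, and the column spelling Unemployement follows the question.

```r
# Hypothetical copy of the question's data, with year and quarter combined
df_raw <- data.frame(
  Year = c("1997Q3", "1997Q4", "1998Q1", "1998Q2", "1998Q3", "1998Q4"),
  Unemployement = c(1914, 1697, 1702, 1645, 1742, 1605)
)

# Keep only the first four characters (the year)
df_raw$Year <- substr(df_raw$Year, 1, 4)

# Mean of the quarterly values within each year
yearly <- aggregate(Unemployement ~ Year, data = df_raw, FUN = mean)
yearly
```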
This question already has answers here:
Get the difference between dates in terms of weeks, months, quarters, and years
(9 answers)
Closed 4 years ago.
I have a set of dates for multiple years and I am wondering how to place them on the same scale using a reference point.
For example, I have the following dates:
"2018-04-15" "2018-04-30" "2018-05-06" "2018-05-12" "2018-05-13"
I want to create a separate column that counts the number of days these dates are from
"2018-11-06".
Thanks so much!
You can use difftime like this:
library(data.table)
## Create data
df <- data.table(Date = c("2018-04-15", "2018-04-30", "2018-05-06", "2018-05-12", "2018-05-13"))
reference <- "2018-11-06"
## Calculate difference in dates in days
df <- df[,DaysFromRef := difftime(as.Date(reference), as.Date(Date), units = "days")]
df
Date DaysFromRef
1: 2018-04-15 205 days
2: 2018-04-30 190 days
3: 2018-05-06 184 days
4: 2018-05-12 178 days
5: 2018-05-13 177 days
## Convert DaysFromRef column to numeric
df$DaysFromRef <- as.numeric(df$DaysFromRef)
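The same calculation can be sketched without data.table, using the dates from the question. The variable names here are illustrative only.

```r
dates <- c("2018-04-15", "2018-04-30", "2018-05-06", "2018-05-12", "2018-05-13")
reference <- as.Date("2018-11-06")

# Difference in days between the reference date and each date, as plain numbers
days_from_ref <- as.numeric(difftime(reference, as.Date(dates), units = "days"))

data.frame(Date = dates, DaysFromRef = days_from_ref)
```

Passing units by name matters: the third positional argument of difftime is tz, not units.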
This question already has answers here:
Calculate the mean by group
(9 answers)
Closed 5 years ago.
I have a dataset that looks approximately like this:
> dataSet
month detrend
1 Jan 315.71
2 Jan 317.45
3 Jan 317.5
4 Jan 317.1
5 Jan 315.71
6 Feb 317.45
7 Feb 313.5
8 Feb 317.1
9 Feb 314.37
10 Feb 315.41
11 March 316.44
12 March 315.73
13 March 318.73
14 March 315.55
15 March 312.64
.
.
.
How do I compute the average by month? E.g., I want something like
> by_month
month ave_detrend
1 Jan 315.71
2 Feb 317.45
3 March 317.5
What you need to focus on is a means to group your column of interest (the "detrend") by the month. There are ways to do this within "vanilla R", but a convenient and readable way is to use tidyverse's dplyr.
I will use the example taken directly from that page:
# The summaries get new names here: writing disp = mean(disp) first would
# make sd(disp) operate on the already-averaged value and return NA
mtcars %>%
group_by(cyl) %>%
summarise(mean_disp = mean(disp), sd_disp = sd(disp))
In your case, that would be:
by_month <- dataSet %>%
group_by(month) %>%
summarize(avg = mean(detrend))
This "tidyverse" style looks quite different, and you seem quite new, so I'll explain what's happening (sorry if this is overly obvious):
First, we are grabbing the data frame, which I'm calling dataSet.
Then we are piping that data frame to our next function, which is group_by. Piping means taking the result of the last command (which in this case is just the data frame dataSet) and using it as the first argument of the next function. The function group_by takes a data frame as its first argument.
Then the result of group_by is piped to the next function, which is summarize (or summarise if you're from down under, as the author is). On its own, summarize calculates using all the data in the column. The group_by function, however, creates partitions in that column, so we now have the mean calculated for each partition that we've made, which is one per month.
This is the key: group_by creates "flags" so that summarize calculates the function (mean, in this case) separately on each group. So, for instance, all of the Jan values are grouped together and the mean is calculated only on them. Then the mean is calculated for all of the Feb values, and so on.
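To make the piping and grouping concrete, here is a small sketch that rebuilds the 15 rows shown in the question (the data frame construction is an assumption about how dataSet is stored) and checks that the piped and nested forms do the same thing:

```r
library(dplyr)

# The 15 rows printed in the question
dataSet <- data.frame(
  month = rep(c("Jan", "Feb", "March"), each = 5),
  detrend = c(315.71, 317.45, 317.5, 317.1, 315.71,
              317.45, 313.5, 317.1, 314.37, 315.41,
              316.44, 315.73, 318.73, 315.55, 312.64)
)

# Piped form: dataSet becomes the first argument of group_by,
# and the grouped result becomes the first argument of summarize
piped <- dataSet %>%
  group_by(month) %>%
  summarize(avg = mean(detrend))

# The same computation written as nested calls
nested <- summarize(group_by(dataSet, month), avg = mean(detrend))

all.equal(piped, nested)

# The Jan group's mean, computed by hand for comparison
mean(dataSet$detrend[dataSet$month == "Jan"])
```

Because month is a character column here, the groups come out in alphabetical order (Feb, Jan, March) rather than calendar order.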
HTH!!
R has a built-in mean function: mean(x, trim = 0, na.rm = FALSE, ...)
I would do something like this (using the data frame name and month labels from the question):
january <- dataSet[dataSet[, "month"] == "Jan",]
januaryVector <- january[, "detrend"]
januaryAVG <- mean(januaryVector)
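Repeating that subset for every month gets tedious. A base-R sketch that covers all months in one call, assuming dataSet holds the 15 rows shown in the question:

```r
# The 15 rows printed in the question
dataSet <- data.frame(
  month = rep(c("Jan", "Feb", "March"), each = 5),
  detrend = c(315.71, 317.45, 317.5, 317.1, 315.71,
              317.45, 313.5, 317.1, 314.37, 315.41,
              316.44, 315.73, 318.73, 315.55, 312.64)
)

# Fix the level order so the result follows the data's month order
# instead of alphabetical order
dataSet$month <- factor(dataSet$month, levels = unique(dataSet$month))

# One mean per month
by_month <- aggregate(detrend ~ month, data = dataSet, FUN = mean)
names(by_month)[2] <- "ave_detrend"
by_month
```

tapply(dataSet$detrend, dataSet$month, mean) returns the same means as a named vector.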