R dplyr historic maxima

I have used dplyr in this example to compute a maximum temperature for each season. In addition, I am now trying to include an additional column that shows, for each row, the historic maximum temperature in the winter of that specific year (e.g. the value of winter 2001 for all seasons in 2001, winter 2002 for the 2002 seasons, etc.).
I could solve this by subsetting and merging outside dplyr, but I was wondering if there is a way to do this elegantly within dplyr?
library(dplyr)
library(zoo)
library(DataCombine)

# example data: four seasons x three months per year, random temperatures
df <- expand.grid(year = 2000:2003,
                  season = c("spring", "summer", "fall", "winter"),
                  month = 1:3)
df$temp <- rpois(nrow(df), 5)  # temperature

df2 <- df %>%
  group_by(year, season) %>%
  summarise(max_temp = max(temp))

You may try:
library(dplyr)
df %>%
  group_by(year) %>%
  mutate(max_temp = max(temp[season == 'winter']))
Or an option using left_join (the summarised table shares only the year column, so that is the join key):
left_join(df,
          df %>%
            filter(season == 'winter') %>%
            group_by(year) %>%
            summarise(max_temp = max(temp)),
          by = 'year')
A compact option with data.table would be:
library(data.table)
setDT(df)[, max_temp := max(temp[season == 'winter']), year][]
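If you are on dplyr >= 1.1.0, a sketch of the same per-year computation using the newer .by argument, which avoids the explicit group_by/ungroup (assuming your installed version supports it):
library(dplyr)

# per-group mutate via .by (dplyr >= 1.1.0); same result as the grouped version
df %>%
  mutate(max_temp = max(temp[season == 'winter']), .by = year)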

Related

How to eliminate the same rows (those with an NA) from all groups using the tidyverse package?

As an example, I have daily data divided into 3 classes. Class B has an NA on day 3. I would like to eliminate day 3 (or any day containing an NA) from classes A and C as well, even though their values are not NA. I have tried the drop_na() function, but it only eliminates the row with the NA from class B.
library(tidyverse)

Class <- c(rep("A", 10),
           rep("B", 10),
           rep("C", 10))
Days <- rep(1:10, 3)
Values <- c(1:12, NA, 14:30)
DF <- data.frame(Class, Days, Values)

DF_NA <- DF %>%
  group_by(Class) %>%
  drop_na()
Do the grouping by 'Days' instead:
library(dplyr)
DF %>%
  group_by(Days) %>%
  # keep a day only if no class has an NA on that day
  filter(!any(is.na(Values))) %>%
  ungroup()
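An equivalent sketch without grouping, assuming the same DF: collect the days that contain an NA in any class, then drop them with anti_join.
library(dplyr)

# days with at least one missing value in any class
na_days <- DF %>%
  filter(is.na(Values)) %>%
  distinct(Days)

# remove those days from every class
DF %>% anti_join(na_days, by = "Days")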

How to randomly assign an index date using R

I would like to:
randomly assign an index date to people in df_1 according to the distribution of the index dates in df_2;
the newly assigned index date should be earlier than each person's death date (the death date column contains NAs).
Currently I am using:
df_1 <- df_1 %>% mutate(index_date = sample(df_2$index_date, size = n(), replace = TRUE))
However, I do not know how to limit the index date before the death_date in df_1.
I'm guessing your data-format, but rowwise should do the trick:
df_1 <- df_1 %>%
  rowwise() %>%
  mutate(index_date = sample(df_2$index_date[df_2$index_date <= death_date], size = 1)) %>%
  ungroup()
Reprex with mtcars:
mtcars %>%
  rowwise() %>%
  mutate(mpg_rand = sample(mtcars$mpg[mtcars$mpg <= mpg], 1))
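One caveat: the question says death_date contains NAs, and for those rows the subset above would contain no usable values. A minimal sketch of one way to handle that, assuming an NA death date means any index date is allowed:
library(dplyr)

df_1 <- df_1 %>%
  rowwise() %>%
  mutate(index_date = if (is.na(death_date)) {
    # no death date recorded: sample from the full distribution (assumption)
    sample(df_2$index_date, size = 1)
  } else {
    sample(df_2$index_date[df_2$index_date <= death_date], size = 1)
  }) %>%
  ungroup()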

Finding the first row after which x rows meet some criterion in R

A data wrangling question:
I have a dataframe of hourly animal tracking points with columns for id, time, and whether the animal is on land or in water (0 = water; 1 = land). It looks something like this:
set.seed(13)
n <- 100
dat <- data.frame(id = rep(1:5, each = 20),  # 5 animals x 20 hourly points
                  datetime = seq(as.POSIXct("2020-12-26 00:00:00"),
                                 as.POSIXct("2020-12-30 03:00:00"), by = "hour"),
                  land = sample(0:1, n, replace = TRUE))
What I need to do is flag the first row after which the animal uses land at least once for 3 straight days. I tried doing something like this:
library(dplyr)
library(tidyr)  # for drop_na()
dat$ymd <- as.Date(dat$datetime)  # make column for year-month-day

# add land points within each id group
land.pts <- dat %>%
  group_by(id, ymd) %>%
  arrange(id, datetime) %>%
  drop_na(land) %>%
  mutate(all.land = cumsum(land))

# flag days that have any land points
flag <- land.pts %>%
  group_by(id, ymd) %>%
  arrange(id, datetime) %>%
  slice(n()) %>%
  mutate(flag = if_else(all.land == 0, 0, 1))

# combine flagged dataframe with full dataframe
comb <- left_join(land.pts, flag)
comb[is.na(comb)] <- 1
and then I tried this:
x <- comb %>%
  group_by(id) %>%
  arrange(id, datetime) %>%
  mutate(time.land = ifelse(land == 0 | is.na(lag(land)) | lag(land) == 0 | flag == 0,
                            0,
                            difftime(datetime, lag(datetime), units = "days")))
But I still can't quite wrap my head around how to determine when the animal has been on land at least once a day for three straight days, and then flag that first point on land. Thanks so much for any help you can provide!
Create a date column from the timestamp. Summarise the data so that you keep only one row per id and date, recording whether the animal was on land at least once that day.
Then use zoo's rollapply to mark a day as TRUE if the animal was on land on that day and the two days that follow (i.e. three consecutive days).
library(dplyr)
library(zoo)

dat <- dat %>% mutate(date = as.Date(datetime))

dat %>%
  group_by(id, date) %>%
  summarise(on_land = any(land == 1)) %>%
  mutate(consec_three = rollapply(on_land, 3, all, align = 'left', fill = NA)) %>%
  ungroup() %>%
  # if you want all the rows of the data
  left_join(dat, by = c('id', 'date'))
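If you then want to pull out the first flagged land point per animal, a hedged follow-up (res is assumed to hold the joined result of the pipeline above):
# res: output of the left_join above, one row per tracking point
res %>%
  filter(consec_three, land == 1) %>%  # points inside a flagged 3-day run
  group_by(id) %>%
  slice_min(datetime, n = 1) %>%       # earliest such point per animal
  ungroup()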

R - Filter data by month

I apologize for my bad English, but I really need your help.
I have a .csv dataset with two columns, year and value, containing monthly precipitation depth from 1900 to 2019.
It looks like this:
year    value
190001    100
190002     39
190003     78
190004     45
...
201912     25
I need to create two new datasets: the first one with the data for every year from July (07) to September (09) and the second one from January (01) to March (03).
Also I need to summarize this data for every year (it means I need only one value per year).
So I have data for summer 1900-2019 and winter 1900-2019.
You can use the dplyr and stringr packages to achieve what you need. I created a mock data set first:
library(dplyr)
library(stringr)

# mock data; the raw integer sequence also contains invalid month codes
# (e.g. 190013-190099), but the month filter below simply skips them
df <- data.frame(time = 190001:201912,
                 value = runif(length(190001:201912), 0, 100))
After that, we create two separate columns for month and year:
df$year  <- as.numeric(str_extract(df$time, "^...."))  # first four digits
df$month <- as.numeric(str_extract(df$time, "..$"))    # last two digits
At this point, we can filter:
df_1 <- df %>% filter(between(month, 7, 9))  # July-September
df_2 <- df %>% filter(between(month, 1, 3))  # January-March
... and summarize each of them, so you get one value per year:
df_1 <- df_1 %>% group_by(year) %>% summarise(value = sum(value))
df_2 <- df_2 %>% group_by(year) %>% summarise(value = sum(value))
library(tidyverse)

dat <- tribble(
  ~year, ~value,
  190001, 100,
  190002, 39,
  190003, 78,
  190004, 45)
Splitting the year variable into month and year variables:
dat_prep <- dat %>%
  mutate(month = str_remove(year, "^\\d{4}"), # remove the first 4 digits
         year = str_remove(year, "\\d{2}$"),  # remove the last 2 digits
         across(everything(), as.numeric))
dat_prep %>%
  filter(month %in% 7:9) %>% # for months Jul-Sep; repeat with 1:3 for Jan-Mar
  group_by(year) %>%
  summarize(value = sum(value))

summarize weekly average using daily data in R

How do I add a column price.wk.average that equals the average price of the last week, and a column price.mo.average that equals the average price of the last month? The price.wk.average will be the same for the entire week.
Dates      Price  Demand  Price.wk.average  Price.mo.average
2010-1-1   x      x
2010-1-2   x      x
......
2015-1-1   x      x
jkl, try to post reproducible examples; it will make it easier to help you. You can use dplyr:
library(dplyr)
library(lubridate)  # for week() and month()

df <- data.frame(date = seq(as.Date("2017-1-1"), by = "day", length.out = 100),
                 price = round(runif(100) * 100 + 50, 0))

df <- df %>%
  group_by(week = week(date)) %>%
  mutate(Price.wk.average = mean(price)) %>%
  ungroup() %>%
  group_by(month = month(date)) %>%
  mutate(Price.mo.average = mean(price))
(Since I don't have enough points to comment)
I wanted to point out that Eric's answer will not distinguish average weekly price by year. Therefore, if you are interested in unique weeks (Week 1 of 2012 != Week 1 of 2015), you will need to do extra work to group by unique weeks.
df <- data.frame(Dates = c("2010-1-1", "2010-1-2", "2015-01-3"),
                 Price = c(50, 20, 40))

      Dates Price
1  2010-1-1    50
2  2010-1-2    20
3 2015-01-3    40
Just to keep your data frame tidy, I suggest converting the dates to date-time (POSIXct) format and then sorting the data frame:
library(dplyr)
library(lubridate)

df <- df %>%
  mutate(Dates = lubridate::parse_date_time(Dates, "ymd")) %>%
  arrange(Dates)
To group by unique weeks:
df <- df %>%
  group_by(yw = paste(year(Dates), week(Dates)))
Then mutate and ungroup.
To group by unique months:
df <- df %>%
  group_by(ym = paste(year(Dates), month(Dates)))
and mutate and ungroup.
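Putting those steps together, a sketch of the full year-aware pipeline (the yw/ym keys and the mean() aggregation mirror Eric's answer; treat this as a template rather than a definitive implementation):
library(dplyr)
library(lubridate)

df <- df %>%
  mutate(Dates = parse_date_time(Dates, "ymd")) %>%
  arrange(Dates) %>%
  group_by(yw = paste(year(Dates), week(Dates))) %>%   # year-week key
  mutate(Price.wk.average = mean(Price)) %>%
  ungroup() %>%
  group_by(ym = paste(year(Dates), month(Dates))) %>%  # year-month key
  mutate(Price.mo.average = mean(Price)) %>%
  ungroup()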
