I apologize for my bad English, but I really need your help.
I have a .csv dataset with two columns - year and value. There is data about height of precipitation monthly from 1900 to 2019.
It looks like this:
year value
190001 100
190002 39
190003 78
190004 45
...
201912 25
I need to create two new datasets: the first one with the data for every year from July (07) to September (09) and the second one from January (01) to March (03).
Also I need to summarize this data for every year (it means I need only one value per year).
So I have data for summer 1900-2019 and winter 1900-2019.
You can use the dplyr and stringr packages to achive what you need. I created a mock data set first:
library(dplyr)
library(stringr)
df <- data.frame(time = 190001:201219, value=runif(length(190001:201219), 0, 100))
After that, we create two separate columns for month and year:
df$year <- as.numeric(str_extract(df$time, "^...."))
df$month <- as.numeric(str_extract(df$time, "..$"))
At this point, we can filter:
df_1 <- df %>% filter(between(month,7,9))
df_2 <- df %>% filter(between(month,1,3))
... and summarize:
df <- df %>% group_by(year) %>% summarise(value = sum(value))
library(tidyverse)
dat <- tribble(
~year, ~value,
190001, 100,
190002, 39,
190003, 78,
190004, 45)
Splitting the year variable into a month and year variable:
dat_prep <- dat %>%
mutate(month = str_remove(year, "^\\d{4}"), # Remove the first 4 digits
year = str_remove(year, "\\d{2}$"), # Remove the last 2 digits
across(everything(), as.numeric))
dat_prep %>%
filter(month %in% 7:9) %>% # For months Jul-Sep. Repeat with 1:3 for Jan-Mar
group_by(year) %>%
summarize(value = sum(value))
Related
I would like to add a new column in my data frame (image1), this new column represents the number of occurrences of weekdays within the specific month, at the end, I need to have something like the "working day in the month" in "image2"
how I can achieve this result in R?
This is a solution if you only have one month
for(i in 1:length(df$day_name))
{
b<- as.character(df[i,2])
c<- a[1:i,2]
df$working_day[i] <- length(which(c==b))
}
assuming the dataframe is df, you could do something like this
df <- df %>% mutate(month = month(date), year = year(date))
df <- df %>% group_by(day_name, month)
df <- df %>% summarize(working_day_in_month = n())
df <- df %>% arrange(day_name)
df
A data wrangling question:
I have a dataframe of hourly animal tracking points with columns for id, time, and whether the animal is on land or in water (0 = water; 1 = land). It looks something like this:
set.seed(13)
n <- 100
dat <- data.frame(id = rep(1:5, each = 10),
datetime=seq(as.POSIXct("2020-12-26 00:00:00"), as.POSIXct("2020-12-30 3:00:00"), by = "hour"),
land = sample(0:1, n, replace = TRUE))
What I need to do is flag the first row after which the animal uses land at least once for 3 straight days. I tried doing something like this:
dat$ymd <- ymd(dat$datetime[1]) # make column for year-month-day
# add land points within each id group
land.pts <- dat %>%
group_by(id, ymd) %>%
arrange(id, datetime) %>%
drop_na(land) %>%
mutate(all.land = cumsum(land))
#flag days that have any land points
flag <- land.pts %>%
group_by(id, ymd) %>%
arrange(id, datetime) %>%
slice(n()) %>%
mutate(flag = if_else(all.land == 0,0,1))
# Combine flagged dataframe with full dataframe
comb <- left_join(land.pts, flag)
comb[is.na(comb)] <- 1
and then I tried this:
x = comb %>%
group_by(id) %>%
arrange(id, datetime) %>%
mutate(time.land=ifelse(land==0 | is.na(lag(land)) | lag(land)==0 | flag==0,
0,
difftime(datetime, lag(datetime), units="days")))
But I still can't quite wrap my head around what to do to make it so that I can figure out when the animal has been on land at least once for three days straight, and then flag that first point on land. Thanks so much for any help you can provide!
Create a date column from the timestamp. Summarise the data and keep only 1 row for each id and date which shows whether the animal was on land even once in the entire day.
Use zoo's rollapply function to mark the first day as TRUE if the next 3 days the animal was on land.
library(dplyr)
library(zoo)
dat <- dat %>% mutate(date = as.Date(datetime))
dat %>%
group_by(id, date) %>%
summarise(on_land = any(land == 1)) %>%
mutate(consec_three = rollapply(on_land, 3,all, align = 'left', fill = NA)) %>%
ungroup %>%
#If you want all the rows of the data
left_join(dat, by = c('id', 'date'))
I've got a data frame (df1) with an ID variable and two date variables (dat1 and dat2).
I'd like to subset the data frame so that I get the observations for which the difference between dat2 and dat1 is less than or equal to 30 days.
I'm trying to use dplyr() but I can't get it to work.
Any help would be much appreciated.
Starting point (df):
df1 <- data.frame(ID=c("a","b","c","d","e","f"),dat1=c("01/05/2017","01/05/2017","01/05/2017","01/05/2017","01/05/2017","01/05/2017"),dat2=c("14/05/2017","05/06/2017","23/05/2017","15/10/2017","15/11/2017","15/12/2017"), stringsAsFactors = FALSE)
Desired outcome (df):
dfgoal <- data.frame(ID=c("a","c"),dat1=c("01/05/2017","01/05/2017"),dat2=c("14/05/2017","23/05/2017"),newvar=c(13,22))
Current code:
library(dplyr)
df2 <- df1 %>% mutate(newvar = as.Date(dat2) - as.Date(dat1)) %>%
filter(newvar <= 30)
We need to convert to Date class before doing the subtraction
library(dplyr)
library(lubridate)
df1 %>%
mutate_at(2:3, dmy) %>%
mutate(newvar = as.numeric(dat2- dat1)) %>%
filter(newvar <=30)
The as.Date also needs to include the format argument, otherwise, it will think that the format is in the accepted %Y-%m-%d. Here, it is in %d/%m/%Y
df1 %>%
mutate(newvar = as.numeric(as.Date(dat2, "%d/%m/%Y") - as.Date(dat1, "%d/%m/%Y"))) %>%
filter(newvar <= 30)
# ID dat1 dat2 newvar
#1 a 01/05/2017 14/05/2017 13
#2 c 01/05/2017 23/05/2017 22
How to add one column price.wk.average to the data such that price.wk.average is equal to the average price of last week, and also add one column price.mo.average to the data such that it equals to the average price of last month? The price.wk.average will be the same for the entire week.
Dates Price Demand Price.wk.average Price.mo.average
2010-1-1 x x
2010-1-2 x x
......
2015-1-1 x x
jkl,
try to post reproducible examples. It will make it easier to help you. you can use dplyr:
library(dplyr)
df <- data.frame(date = seq(as.Date("2017-1-1"),by="day",length.out = 100), price = round(runif(100)*100+50,0))
df <- df %>%
group_by(week = week(date)) %>%
mutate(Price.wk.average = mean(price)) %>%
ungroup() %>%
group_by(month = month(date)) %>%
mutate(Price.mo.average = mean(price))
(Since I don't have enough points to comment)
I wanted to point out that Eric's answer will not distinguish average weekly price by year. Therefore, if you are interested in unique weeks (Week 1 of 2012 != Week 1 of 2015 ), you will need to do extra work to group by unique weeks.
df <- data.frame( Dates = c("2010-1-1", "2010-1-2", "2015-01-3"),
Price = c(50, 20, 40) )
Dates Price
1 2010-1-1 50
2 2010-1-2 20
3 2015-01-3 40
Just to keep your data frame tidy, I suggest converting dates to POSIX format then sorting the data frame:
library(lubridate)
df <- df %>%
mutate(Dates = lubridate::parse_date_time(Dates,"ymd")) %>%
arrange( Dates )
To group by unique weeks:
df <- df %>%
group_by( yw = paste( year(Dates), week(Dates)))
Then mutate and ungroup.
To group by unique months:
df <- df %>%
group_by( ym = paste( year(Dates), month(Dates)))
and mutate and ungroup.
I have used in this example to create a maximum temperature for each season. In addition, I am now trying to include an additional column that shows, for each row, the historic maximum temperature in the winter of that specific year (e.g. the value of winter 2001 for the seasons in 2001, winter 2002 for 2002 seasons, etc.).
I could solve this by subsetting and merging outside dplyr, but I was wondering if there is a way to do this elegantly within dplyr?
library(dplyr)
library(zoo)
library(DataCombine)
df = expand.grid(year = 2000:2003,
season = c("spring","summer","fall","winter"),
month=1:3)
df$temp = rpois(dim(df)[1], 5) # temperature
df2 = df %>%
group_by(year, season) %>%
summarise(max_temp=max(temp))
You may try
library(dplyr)
df %>%
group_by(year) %>%
mutate(max_temp = max(temp[season=='winter']))
Or an option using left_join
left_join(df,
df %>%
filter(season=='winter') %>%
group_by(year) %>%
summarise(max_temp=max(temp)))
A compact option with data.table would be
library(data.table)
setDT(df)[, max_temp := max(temp[season=='winter']) ,year][]