This is my tibble:
date;temp
1953-1-1;-0.2
1953-1-2;-0.2
1953-1-3;-0.2
1953-1-4;-0.1
...
1954-1-1;2
1954-1-2;3
1954-1-3;4
1954-1-4;5
...
1955-1-1;6
1955-1-2;7
1955-1-3;8
1955-1-4;9
I would now like to calculate the mean temperature per year, i.e. average all values of the temp column for each year. However, I have no idea how to work with the year part of the dates in R. Can someone tell me how to solve this?
tb <- tb %>%
  mutate(year = substr(date, start = 1, stop = 4)) %>%
  group_by(year) %>%
  summarise(mean_temp = mean(temp, na.rm = TRUE))
Otherwise, lubridate is a nice package for working with dates.
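For example, assuming the date column is (or is parsed to) a proper Date, lubridate::year() extracts the year directly, so no string manipulation is needed. A minimal sketch on toy values mirroring the question:

```r
library(dplyr)
library(lubridate)

# Toy tibble mirroring the question's data (values assumed)
tb <- tibble(
  date = as.Date(c("1953-01-01", "1953-01-02", "1954-01-01", "1954-01-02")),
  temp = c(-0.2, -0.2, 2, 3)
)

tb %>%
  group_by(year = year(date)) %>%
  summarise(mean_temp = mean(temp, na.rm = TRUE))
# year 1953 -> -0.2, year 1954 -> 2.5
```

Note that year() returns a number rather than a string, which is handy if you want to do arithmetic on the years later.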
I have a data frame with 58 columns labeled SD1 through to SD58 along with columns for date info (Date, Year, Month, Day).
I'm trying to find the date of the maximum value of each of the SD columns each year using the following code:
maxs<-aggregate(SD1~Year, data=SDtime, max)
SDMax<-merge(maxs,SDtime)
I only need the dates so I made a new df and relabeled the column as below:
SD1Max = subset(SDMax, select = c(Year, Date))
SD1Max %>%
  rename(
    SD1 = Date
  )
I want to do the same thing for every SD column but I don't want to have to repeat these steps 58 times. Is there a way to loop the process?
Assuming there are no ties (multiple days where the variable reached its maximum), this probably does what you want:
library('tidyverse')
SDtime %>%
  pivot_longer(
    cols = matches('^SD[0-9]{1,2}$')
  ) %>%
  group_by(name, Year) %>%
  filter(value == max(value, na.rm = TRUE)) %>%
  ungroup()
You might want to pivot_wider afterwards.
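To make the whole pipeline concrete, here is a sketch on a toy frame with just two SD columns and two years (the column names Date, Year, SD1, SD2 are assumed from the question), including the pivot_wider step to get one row per year with the date of each column's maximum:

```r
library(tidyverse)

# Toy frame with two SD columns and two years (structure assumed from the question)
SDtime <- tibble(
  Date = as.Date(c("2000-06-01", "2000-07-01", "2001-06-01", "2001-07-01")),
  Year = c(2000, 2000, 2001, 2001),
  SD1  = c(1, 5, 2, 4),
  SD2  = c(9, 3, 6, 8)
)

SDtime %>%
  pivot_longer(cols = matches('^SD[0-9]{1,2}$')) %>%
  group_by(name, Year) %>%
  filter(value == max(value, na.rm = TRUE)) %>%
  ungroup() %>%
  select(Year, name, Date) %>%
  pivot_wider(names_from = name, values_from = Date)
# one row per Year; SD1 and SD2 hold the date of each column's yearly maximum
```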
I would like to do some computation on several rows in a table.
I created an exemple below:
library(dplyr)
set.seed(123)
year_week <- c(200045:200053, 200145:200152, 200245:200252)
input <- as.vector(sample(1:10,25,TRUE))
partial_sum <- c(20, 12, 13, 18, 12, 13, 4, 15, 9, 13, 10, 20, 11, 9, 9, 5, 13, 13, 8, 13, 11, 15, 14, 7, 14)
df <- data.frame(year_week, input, partial_sum)
Given are the columns input and year_week. The latter represents dates, but the values are numeric in my case, with the first four digits as the year and the last two as the working week of that year.
What I need, is to iterate over each week in each year and to sum up the values from the same weeks in the other years and save the results into a column called here partial_sum. The current value is excluded from the sum.
Week 53 in the leap year 2000 gets the same treatment, but since this is the only leap year in the data, its value (3) doesn't change.
Any idea how to do this?
Thank you
I would expect something like this would work, though as pointed out in comments your example isn't exactly reproducible.
library(dplyr)
df %>%
  mutate(week = substr(year_week, 5, 6)) %>%
  group_by(week) %>%
  mutate(result = sum(input))
Perhaps this helps: group by 'week' (taken as a substring of year_week), then subtract each row's input from the group sum of input.
library(dplyr)
df %>%
  group_by(week = substring(year_week, 5)) %>%
  mutate(partial_sum2 = sum(input) - input)
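A quick way to sanity-check the grouped difference is a tiny frame where the expected sums are easy to do by hand (values below are invented for the check):

```r
library(dplyr)

# Three years, two weeks each (toy values)
df <- data.frame(
  year_week = c(200045, 200046, 200145, 200146, 200245, 200246),
  input     = c(2, 3, 4, 5, 6, 7)
)

df %>%
  group_by(week = substring(year_week, 5)) %>%
  mutate(partial_sum = sum(input) - input) %>%
  ungroup()
# week 45 rows get the sums of the *other* years' week-45 values:
# 4 + 6 = 10, 2 + 6 = 8, 2 + 4 = 6
```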
I am trying to create two frequency tables, one that is daily, and one that is hourly. I am able to get the daily values fairly easily.
C<-Data
C$Data<-format(C$Data, "%m/%d/%Y")
Freq_Day<- C %>% group_by(Data) %>% summarise(frequency = n())
However when I try to get the hourly frequency by doing the following
B<-Data
B$Data<-format(B$Data,"%m/%d/%Y %H:%M")
Freq_HRLY<-B %>% group_by(Data) %>% summarise(frequency = n())
It omits hours that simply did not occur in the data set, so it returns fewer than (# of days) * 24 rows. How would I go about getting a column of dates in one-hour increments with their corresponding frequencies, such that hours with no occurrences in Data get a value of 0?
One way would be to use tidyr::complete on the already-calculated Freq_HRLY data to fill in the missing hours, by creating a sequence of hourly timestamps between the min and max of Data.
library(dplyr)
Freq_HRLY %>%
  ungroup() %>%
  mutate(Data = as.POSIXct(Data, format = "%m/%d/%Y %H:%M")) %>%
  tidyr::complete(Data = seq(min(Data), max(Data), by = "1 hour"),
                  fill = list(frequency = 0))
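A self-contained illustration of how complete() fills the gaps (the timestamps and counts below are invented; the real Freq_HRLY comes from the grouping step above):

```r
library(dplyr)
library(tidyr)

# Toy hourly counts with a missing 01:00 hour (assumed data)
Freq_HRLY <- tibble(
  Data = as.POSIXct(c("2020-01-01 00:00", "2020-01-01 02:00"), tz = "UTC"),
  frequency = c(3, 1)
)

Freq_HRLY %>%
  complete(Data = seq(min(Data), max(Data), by = "1 hour"),
           fill = list(frequency = 0))
# the missing 01:00 row now appears with frequency 0
```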
I conducted a dietary analysis of a raptor species and I would like to calculate the percentage of occurrence of the prey items in the three different stages of its breeding cycle. I would like the occurrence to be expressed as a percentage of the sample size. For example, if the sample size is 135 and Orthoptera occurs 65 times, I would like to calculate the percentage 65/135.
So far I have tried with the long format without success; the result I am getting is not correct. Any help is highly appreciated, and sorry if this question is a repost.
The raw dataset is as it follows:
set.seed(123)
pellets_2014 <- data.frame(
  Period = sample(c("Prebreeding", "Breeding", "Postbreeding"), 12, replace = TRUE),
  Orthoptera = sample(0:10, 12, replace = TRUE),
  Coleoptera = sample(0:10, 12, replace = TRUE),
  Mammalia = sample(0:10, 12, replace = TRUE))
##I transform the file to long format
##Library all the necessary packages
library(dplyr)
library(tidyr)
library(scales)
library(naniar)
pellets2014_long <- gather(pellets_2014, Categories, Count, c(Orthoptera, Coleoptera, Mammalia))
## I transform the zero values to NAs
pellets2014_NA<-pellets2014_long %>% replace_with_na(replace = list(Count = 0))
## Try to calculate the occurrence
Occurence2014 <- pellets2014_NA %>%
  group_by(Period, Categories) %>%
  summarise(n = n())
## I get this far, but the occurrence counts are not right and I am stuck on how to get the right percentage
##If I try this:
Occurence2014 <- pellets2014_NA %>%
  group_by(Period, Categories) %>%
  summarise(n = n()) %>%
  mutate(Freq_n = n / sum(n) * 100)
## The above is also wrong because I need to divide by the sample size in each period (here 4 samples per period; the overall sample size is 12)!
The output must be the occurrence and the percentage of occurrence for each prey category in each Period.
Is this close to what you're looking for?
Occurence2014 <- pellets2014_NA %>%
  group_by(Period, Categories) %>%
  summarise(n = n()) %>%
  ungroup() %>%
  mutate(freq = n / sum(n))
Something like this?
Occurence2014 <- pellets2014_NA %>%
  group_by(Period) %>%
  mutate(period_sample_size = n()) %>%
  ungroup() %>%
  group_by(Period, Categories, period_sample_size) %>%
  summarise(n = n()) %>%
  mutate(Freq_n = n / period_sample_size * 100)
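On a small long-format frame the mechanics of that pattern are easy to verify by hand (the rows below are invented, not the seeded data from the question):

```r
library(dplyr)

# Toy long-format counts: 3 rows in Breeding, 1 in Prebreeding (assumed data)
pellets <- data.frame(
  Period     = c("Breeding", "Breeding", "Breeding", "Prebreeding"),
  Categories = c("Orthoptera", "Orthoptera", "Mammalia", "Orthoptera")
)

pellets %>%
  group_by(Period) %>%
  mutate(period_sample_size = n()) %>%   # size of each Period group
  ungroup() %>%
  group_by(Period, Categories, period_sample_size) %>%
  summarise(n = n(), .groups = "drop") %>%
  mutate(Freq_n = n / period_sample_size * 100)
# Breeding/Orthoptera: 2/3 -> 66.7%; Breeding/Mammalia: 1/3 -> 33.3%;
# Prebreeding/Orthoptera: 1/1 -> 100%
```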
I am using the code below to group by month and then sum or count. However, the SLARespond column seems to be summed over the whole data set, not per month.
Any way that I can fix the problem?
Also, instead of the sum function, can I do a count with SLAIncident$IsSlaRespondByViolated == 1?
Appreciate the help!
SLAIncident <- SLAIncident %>%
mutate(month = format(SLAIncident$CreatedDateLocal, "%m"), year = format(SLAIncident$CreatedDateLocal, "%Y")) %>%
group_by(year, month) %>%
summarise(SLARespond = sum(SLAIncident$IsSlaRespondByViolated))
If you could provide a small bit of the dataset to illustrate your example that would be great. The main problem is that referring to SLAIncident$IsSlaRespondByViolated inside summarise() bypasses the grouping and sums the whole column; refer to columns by bare name inside dplyr verbs. An ifelse wrapped in a sum should also cover the second part of your question. I am using your code here to convert the dates into month and year, but I recommend lubridate.
SLAIncident <- SLAIncident %>%
  mutate(month = as.character(format(CreatedDateLocal, "%m")),
         year = as.character(format(CreatedDateLocal, "%Y"))) %>%
  group_by(year, month) %>%
  summarise(SLARespond = sum(IsSlaRespondByViolated),
            sla_1 = sum(ifelse(IsSlaRespondByViolated == 1, 1, 0)))
Also, as hinted at in the comments, these column names are really long and could use some tidying.
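Since lubridate was recommended, here is a sketch of the same grouping with lubridate's year() and month() accessors on toy data (the column names come from the question; the values are invented):

```r
library(dplyr)
library(lubridate)

# Toy incidents (column names from the question, values assumed)
SLAIncident <- tibble(
  CreatedDateLocal = as.Date(c("2021-01-05", "2021-01-20", "2021-02-03")),
  IsSlaRespondByViolated = c(1, 0, 1)
)

SLAIncident %>%
  group_by(year = year(CreatedDateLocal), month = month(CreatedDateLocal)) %>%
  summarise(SLARespond = sum(IsSlaRespondByViolated),
            violated_count = sum(IsSlaRespondByViolated == 1),
            .groups = "drop")
# Jan 2021: SLARespond 1, violated_count 1; Feb 2021: SLARespond 1, violated_count 1
```

Note that sum(IsSlaRespondByViolated == 1) counts the matching rows directly, so the ifelse wrapper isn't strictly needed.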