I have a data frame with 365 rows, one for each day of the calendar year. I am trying to shift the county columns (Maricopa, Cook, Harris) up by one row. The data frame doesn't contain any missing values.
I tried the following code to shift them, but every value in the resulting table is NA.
covid_shift <- covid_pivot %>%
  mutate(Maricopa = lag(Maricopa), Cook = lag(Cook), Harris = lag(Harris))
Does anyone know what might be the issue?
Since covid_pivot is grouped by date, and each of these groups has only one row, lead() and lag() have no neighbouring row to take a value from within the group, so they return NA.
Try:
covid_shift <- covid_pivot %>%
  ungroup() %>%
  mutate(Maricopa = lag(Maricopa), Cook = lag(Cook), Harris = lag(Harris))
You might also consider using across():
covid_pivot %>%
  ungroup() %>%
  mutate(across(-date, ~lag(.x)))
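To see why the grouping matters, here is a small sketch on made-up data (the four-row covid_pivot below is an assumption standing in for your real table). It also shows lead(), because lag() moves values down one row while lead() moves them up, which may be what you want if "up by one row" means each row should take the next day's value.

```r
library(dplyr)

# Made-up stand-in for the real data: one row per date, grouped by date
covid_pivot <- data.frame(
  date     = as.Date("2020-01-01") + 0:3,
  Maricopa = c(1, 2, 3, 4),
  Cook     = c(10, 20, 30, 40),
  Harris   = c(100, 200, 300, 400)
) %>%
  group_by(date)

# Grouped: every group has a single row, so lag() has no previous row to use
grouped <- covid_pivot %>%
  mutate(Maricopa = lag(Maricopa))

# Ungrouped: lag() pushes values down one row ...
shifted_down <- covid_pivot %>%
  ungroup() %>%
  mutate(across(-date, ~ lag(.x)))

# ... and lead() pulls them up one row
shifted_up <- covid_pivot %>%
  ungroup() %>%
  mutate(across(-date, ~ lead(.x)))
```

With the grouping in place, grouped$Maricopa is NA in every row; after ungroup() only the first row (lag) or the last row (lead) is NA.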
I have a sequence of numeric labels for records that can be shared by a variable number of records per label (labelsequence). I also have the records themselves, but unfortunately for some of the sequence values, all records have been lost (dataframe df). I need to identify when a numeric label from labelsequence does not appear in the label column of df, copy all records within df that are associated with the closest label value that is less than the missing value, and append these to a newly filled-in dataframe, say df2.
I am trying to accomplish this in R (a dplyr answer would be ideal). I have looked at answers to questions about filling in missing rows, such as "Fill in missing rows in R" and "fill missing rows in a dataframe", and I have a working solution below, but I was wondering if anyone has a better way of doing this.
Take, for instance, this example data:
labelsequence<-data.frame(label=c(1,2,3,4,5,6))
and
df<-data.frame(label=c(1,1,1,1,3,3,4,4,4),
place=c('vermont','kentucky',
'wisconsin','wyoming','nevada',
'california','utah','georgia','kentucky'),
animal=c('wolf','wolf','cougar','cougar','lamb',
'cougar','donkey','lamb','wolf'))
with desired result...
desired_df2<-data.frame(label=c(1,1,1,1,2,2,2,2,3,3,4,4,4,5,5,5,6,6,6),
place=c('vermont','kentucky',
'wisconsin','wyoming','vermont','kentucky',
'wisconsin','wyoming','nevada',
'california','utah','georgia','kentucky','utah',
'georgia','kentucky','utah','georgia','kentucky'),
animal=c('wolf','wolf','cougar','cougar','wolf',
'wolf','cougar','cougar','lamb','cougar',
'donkey','lamb','wolf','donkey','lamb','wolf',
'donkey','lamb','wolf'))
Is there a better way (be it efficiency of code, flexibility, or resource efficiency) than the following?
df2 <- df %>%
  full_join(expand.grid(label = unique(df$label), newlabel = labelsequence$label)) %>%
  mutate(missing = ifelse(newlabel %in% label, 0, 1)) %>%
  filter(label < newlabel) %>%
  group_by(newlabel) %>%
  filter(label == max(label) & missing == 1) %>%
  ungroup() %>%
  mutate(label = newlabel, missing = NULL, newlabel = NULL) %>%
  bind_rows(df) %>%
  arrange(label)
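One possible alternative, offered as a sketch rather than a definitive answer: first work out, for each label in labelsequence, which existing label its records should be copied from, then copy those records. The helper names present, idx and lookup are made up for illustration, and findInterval() is base R rather than dplyr. It avoids building the full expand.grid cross join.

```r
library(dplyr)

labelsequence <- data.frame(label = c(1, 2, 3, 4, 5, 6))
df <- data.frame(
  label  = c(1, 1, 1, 1, 3, 3, 4, 4, 4),
  place  = c("vermont", "kentucky", "wisconsin", "wyoming", "nevada",
             "california", "utah", "georgia", "kentucky"),
  animal = c("wolf", "wolf", "cougar", "cougar", "lamb",
             "cougar", "donkey", "lamb", "wolf"),
  stringsAsFactors = FALSE
)

# Labels that actually occur in df, sorted so findInterval() can search them
present <- sort(unique(df$label))

# For each wanted label: position of the largest present label <= it
# (0 means no such label exists, which becomes NA below)
idx <- findInterval(labelsequence$label, present)
lookup <- labelsequence %>%
  mutate(source = present[ifelse(idx == 0, NA, idx)])

# Copy the records of each source label and relabel them
df2 <- bind_rows(Map(function(new_label, src) {
  df %>%
    filter(label == src) %>%
    mutate(label = new_label)
}, lookup$label, lookup$source))
```

A label that is present in df maps to itself, so its records are kept as they are; a label smaller than everything in df gets an NA source and contributes no rows.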
I have daily Statcast data going back to 2016. I am trying to aggregate this data to find the mean for each pitcher ID.
I have the following code:
aggpitch <- aggregate(pitchingstat, by=list(pitchingstat$PitcherID),
FUN=mean, na.rm = TRUE)
This call aggregates every single column. I only want to aggregate certain columns.
How would I include only certain columns?
If you have more than one column that you'd like to summarize, you can take QAsena's approach and use the summarise_at function like so:
pitchingstat %>%
  group_by(PitcherID) %>%
  summarise_at(vars(col1:coln), mean, na.rm = TRUE)
Check out the link below for more examples:
https://dplyr.tidyverse.org/reference/summarise_all.html
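Since dplyr 1.0.0, summarise_at() is superseded by across(), so the same idea can be written as below. This is only a sketch: the toy data and the column names velo, spin and inning are invented, not taken from your Statcast table.

```r
library(dplyr)

# Invented toy data; only PitcherID comes from the question
pitchingstat <- data.frame(
  PitcherID = c(1, 1, 2, 2),
  velo      = c(90, 92, 95, NA),
  spin      = c(2200, 2300, 2500, 2400),
  inning    = c(1, 2, 1, 2)
)

# Average only the chosen columns per pitcher; 'inning' is left out
aggpitch <- pitchingstat %>%
  group_by(PitcherID) %>%
  summarise(across(c(velo, spin), ~ mean(.x, na.rm = TRUE)))
```

The result has one row per PitcherID and only the velo and spin columns; the NA in velo is skipped because of na.rm = TRUE.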
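Replace the first argument (pitchingstat) with just the columns you want to aggregate, e.g. pitchingstat$col1 for a single column or pitchingstat[c("col1", "col2")] for several (col1 and col2 are placeholder names).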
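A minimal sketch of that with base R's aggregate(), again on invented toy data (velo, spin and inning are made-up column names):

```r
# Invented toy data; only PitcherID comes from the question
pitchingstat <- data.frame(
  PitcherID = c(1, 1, 2, 2),
  velo      = c(90, 92, 95, NA),
  spin      = c(2200, 2300, 2500, 2400),
  inning    = c(1, 2, 1, 2)
)

# Pass only the columns to be averaged as the first argument;
# naming the list element gives the grouping column a readable name
aggpitch <- aggregate(
  pitchingstat[c("velo", "spin")],
  by = list(PitcherID = pitchingstat$PitcherID),
  FUN = mean,
  na.rm = TRUE
)
```

Naming the element in by = list(PitcherID = ...) avoids the default Group.1 column name in the output.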
How about:
library(tidyverse)
aggpitch <- pitchingstat %>%
  group_by(PitcherID) %>%
  summarise(pitcher_mean = mean(variable)) # replace 'variable' with your variable of interest here
or
library(tidyverse)
aggpitch <- pitchingstat %>%
  select(PitcherID, var_1, var_2) %>% # keep PitcherID so it can be grouped on
  group_by(PitcherID) %>%
  summarise(pitcher_mean = mean(var_1),
            pitcher_mean2 = mean(var_2))
I think this works, but I could use a small example of your data to test it with.
In R, I have a data frame with one column holding the name of a country, a number of variables (population, number of cars, etc.) and a final column that gives the region.
I would like to sum the variables (1, 2, ...) by the value of that last column, region. I think this should be possible with dplyr and summarise_each, but I cannot get it to work.
Would someone be able to help me, please? Thanks a lot.
Going by the question, something like this should work (although it may change once you can share some of your data frame):
library(dplyr)
summarized_df <- df %>%
  group_by(region) %>%
  summarise(var1 = sum(variable1), var2 = sum(variable2), var3 = sum(variable3))
If this doesn't seem to work, maybe you can post your code and the errors even if you can't post the dataframe.
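If there are many variables, naming each one gets tedious. A sketch with across(), which sums every numeric column per region, on invented data (the country names, column names and numbers below are placeholders, not your data):

```r
library(dplyr)

# Invented placeholder data in the shape described in the question
df <- data.frame(
  country    = c("A", "B", "C", "D"),
  population = c(10, 20, 30, 40),
  cars       = c(1, 2, 3, 4),
  region     = c("North", "North", "South", "South"),
  stringsAsFactors = FALSE
)

# Sum all numeric columns within each region; the character column
# 'country' is not numeric, so it is dropped from the result
summarized_df <- df %>%
  group_by(region) %>%
  summarise(across(where(is.numeric), sum))
```

This gives one row per region with the summed population and cars columns.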
The ID column is the sequence 1-63, repeated. I wish to add two new columns, Closepctl and Quantitypctl, that rank each entry on the basis of the Close and Quantity columns, grouped by ID. Is there any way to do this in R?
I tried it in Excel but could not find a grouping option there. An Excel approach would also be appreciated.
See if the following is helpful...
library(dplyr)
data(iris)
df <- iris %>%
  group_by(Species) %>%
  mutate(RankSepal = percent_rank(Sepal.Length),
         RankPetal = percent_rank(Petal.Length))
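Applied to the column names from the question, the same pattern might look like the sketch below. The data are invented (three IDs instead of 63), and CloseRank is an extra, hypothetical column added only to show the difference between the two functions: percent_rank() returns a value between 0 and 1, while min_rank() returns an integer rank starting at 1.

```r
library(dplyr)

# Invented data: IDs 1-3 repeated four times (the real data repeat 1-63)
stocks <- data.frame(
  ID       = rep(1:3, times = 4),
  Close    = c(10, 20, 30, 11, 19, 33, 12, 18, 31, 13, 17, 32),
  Quantity = c(100, 200, 300, 400, 100, 200, 300, 400, 100, 200, 300, 400)
)

ranked <- stocks %>%
  group_by(ID) %>%
  mutate(
    Closepctl    = percent_rank(Close),     # 0 to 1 within each ID
    Quantitypctl = percent_rank(Quantity),  # 0 to 1 within each ID
    CloseRank    = min_rank(Close)          # 1, 2, 3, ... within each ID
  ) %>%
  ungroup()
```

Rows keep their original order; only the ranking is computed within each ID.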