Im new to coding with R and especially in time series. My problem is that I'd like to include a "Beep" column in a dataset. More specifically, in the dataset, there are 3 columns, ID, date and time like this
.
It would be really useful, next to these columns, to add a corresponding beep, since the individuals got many beeps during the day for some days. I'd like my final result to be something like this
.
How could I do that?
library(dplyr) # Load package dplyr
mydata <- mydata %>% # Take the dataframe, then...
group_by(Name, Dates) %>% # Group it by name and dates, then...
mutate(beep = row_number()) %>% # Add a beep column with a sequential number by name & date
ungroup() # Remove grouping
Related
I have a dataset with 51 columns and I want to add summary rows for most of these variables. Currently columns 5:48 are various metrics with each row being 1 area from 1 quarter, I am summing the metric for all quarters for each area and ultimately creating a rate. The below code works fine for doing this to one individual metric but I need to run this for 44 different columns.
example <- test %>%
group_by(Area) %>%
summarise(`Metric 1`= (sum(`Metric 1`))/(mean(Population))*10000) %>%
bind_rows(test) %>%
arrange(Area,-Quarter)%>% # sort so that total rows come after the quarters
mutate(Timeframe=if_else(is.na(Quarter),'12 month rolling', 'Quarterly'))
I have tried creating a for loop and using the column index values, however, that hasn't worked and just returns various errors. I've been unable to get the above script working with index values as well, the below gives an error ('Error: unexpected '=' in: " group_by_at(Local_Authority) %>% summarise(u17_12mo[5]=")
example <- test %>%
group_by_at(Area) %>%
summarise(test[5]= (sum(test[5]))/(mean(test[4]))*10000) %>%
bind_rows(test) %>%
arrange(Area,-Quarter)%>% # sort so that total rows come after the quarters
mutate(Timeframe=if_else(is.na(Quarter),'12 month rolling', 'Quarterly'))
Any help on setting up a for loop for this, or another way entirely would be great
Without data, its tough to help, but maybe this would work for you:
library(tidyverse)
example <- test %>%
group_by(Area) %>%
summarise(across(5:48, ~(sum(.))/(mean(Population))*10000))
Consider a set of time series having the same length. Some have missing data in the end, due to the product being out of stock, or due to delisting.
If the series contains at least four missing observations (in my case it is value = 0 and not NA) at the end, I consider the series as delisted.
In my time series panel, I want to separate the series with delisted id's from the other ones and create two different dataframes based on this separation.
I created a simple reprex to illustrate the problem:
library(tidyverse)
library(lubridate)
data <- tibble(id = as.factor(c(rep("1",24),rep("2",24))),
date = rep(c(ymd("2013-01-01")+ months(0:23)),2),
value = c(c(rep(1,17),0,0,0,0,2,2,3), c(rep(9,20),0,0,0,0))
)
I am searching for a pipeable tidyverse solution.
Here is one possibility to find delisted ids
data %>%
group_by(id) %>%
mutate(delisted = all(value[(n()- 3):n()] == 0)) %>%
group_by(delisted) %>%
group_split()
In the end I use group_split to split the data into two parts: one containing delisted ids and the other one contains the non-delisted ids.
I am trying to make country-level (by year) summaries of a long-form aggregated dataset that has individual-level data. I have tried using dplyr to summarize the average of the variable I am interested in to create a new dataset. However... there appears to be something wrong with my group_by because the answer is only one observation that appears to be the mean of every observation.
data named: "finaldata.giniE",
country variable: "iso3c",
year variable: "date",
individual-level variable of interest: "Ladder.Life.Present"
Note: there are more variables in my data-- could this be an issue?
country_summmary <- finaldata.giniE %>%
select(iso3c, date, Ladder.Life.Present) %>%
group_by(iso3c, date) %>%
summarize(averaged.M = mean(Ladder.Life.Present))
country_summmary
My output appears like this:> country_summmary
averaged.M
1 5.505455
Thank you!
I actually just changed something and added your suggested code to the front and it worked! Here is the code that was able to work!
library(dplyr)
country_summary <- finaldata.gini %>%
group_by(iso3c, date) %>%
select(Ladder.Life.Present) %>%
summarise_each(funs(mean))
I am currently struggling to transition one of my columns in my data to a row as an observation. Below is a representative example of what my data looks like:
library(tidyverse)
test_df <- tibble(unit_name=rep("Chungcheongbuk-do"),unit_n=rep(2),
can=c("Cho Bong-am","Lee Seung-man","Lee Si-yeong","Shin Heung-woo"),
pev1=rep(510014),vot1=rep(457815),vv1=rep(445955),
ivv1=rep(11860),cv1=c(25875,386665,23006,10409),
abstention=rep(52199))
As seen above, the abstention column exists at the end of my data frame, and I would like my data to look like the following:
library(tidyverse)
desired_df <- tibble(unit_name=rep("Chungcheongbuk-do"),unit_n=rep(2),
can=c("Cho Bong-am","Lee Seung-man","Lee Si-yeong","Shin Heung-woo","abstention"),
pev1=rep(510014),vot1=rep(457815),vv1=rep(445955),
ivv1=rep(11860),cv1=c(25875,386665,23006,10409,52199))
Here, abstentions are treated like a candidate, in the can column. Thus, the rest of the data is maintained, and the abstention values are their own observation in the cv1 column.
I have tried using pivot_wider, but I am unsure how to use the arguments to get what I want. I have also considered t() to transpose the column into a row, but also having a hard time slotting it back into my data. Any help is appreciated! Thanks!
Here's a strategy what will work if you have multiple unit_names
test_df %>%
group_split(unit_name) %>%
map( function(group_data) {
slice(group_data, 1) %>%
mutate(can="abstention", cv1=abstention) %>%
add_row(group_data, .) %>%
select(-abstention)
}) %>%
bind_rows()
Basically we split the data up by unit_name, then we grab the first row for each group and move the values around. Append that as a new row to each group, and then re-combine all the groups.
I have my data frame below, I want to sum the data like I have in the first row in the image below (labelled row 38). The total flowering summed for Sections A-D for each date, i also have multiple plots not just Dry1, but Dry2, Dry3 etc.
It's so simple to do in my head but I can't workout how to do it in R?
Essentially I want to do this:
with(dat1, sum(dat1$TotalFlowering[dat1$Date=="1997-07-01" & dat1$Plot=="Dry1"]))
Which tells me that the sum of total flowers for sections "A,B,C,D" in plot "Dry1" for the date "1997-07-01" = 166
I want a way to write a code so this does so for every date and plot combo, and then puts it in the data frame?
In the same format as the first row in the image I included :)
Based on your comment it seems like you just want to add a variable to keep track of TotalFlowering for every Date and Plot combination. If that's the case then can you just add a field like TotalCount below?
library(dplyr)
df %>%
group_by(Date, Plot) %>%
mutate(TotalCount = sum(TotalFlowering)) %>%
ungroup()
Or, alternatively, if all you want is the sum you could make use of dplyr's summarise like below
library(dplyr)
df %>%
group_by(Date, Plot) %>%
summarise(TotalCount = sum(TotalFlowering))