I'm trying to collapse all rows with the same Date/Time value into a single row with a count. That way I'll end up with two columns: one for the Date/Time and one for the count.
I used this call to round my observations into 15-minute periods:
dat$by15 <- cut(dat$Date_Time, breaks = "15 min")
I then tried this, but for some reason it seems to "jump" back to a previous dataset and gives me the wrong observations:
dat <- aggregate(dat, by = list(dat$by15), length)
Thank you guys!
I'm not sure I understood the question, but if you are trying to group by date and count the observations for each date, it's really simple:
library(dplyr)
grouped_dates <- dat %>%
  group_by(Date_Time) %>%
  summarise(Count = n())
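To tie this back to the 15-minute bins in the question, you can group by the by15 column produced by cut() rather than by the raw Date_Time; a minimal sketch, assuming dat$Date_Time is a POSIXct column:

library(dplyr)

# label each row with its 15-minute interval, then count rows per interval
dat$by15 <- cut(dat$Date_Time, breaks = "15 min")
counts_by_15min <- dat %>%
  group_by(by15) %>%
  summarise(Count = n())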
I would like to do some computation on several rows in a table.
I created an example below:
library(dplyr)
set.seed(123)
year_week <- c(200045:200053, 200145:200152, 200245:200252)
input <- as.vector(sample(1:10,25,TRUE))
partial_sum <- c(20, 12, 13, 18, 12, 13, 4, 15, 9, 13, 10, 20, 11, 9, 9, 5, 13, 13, 8, 13, 11, 15, 14, 7, 14)
df <- data.frame(year_week, input, partial_sum)
Given are the columns input and year_week. The latter represents dates, but the values are numeric in my case, with the first 4 digits being the year and the last two the working week of that year.
What I need is to iterate over each week in each year, sum up the values from the same week in the other years, and save the result in a column, called partial_sum here. The current value is excluded from the sum.
Week 53 in the leap year 2000 gets the same treatment, but in this case I have only one leap year, so its value 3 doesn't change.
Any idea on how to make it?
Thank you
I would expect something like this to work, though as pointed out in the comments, your example isn't exactly reproducible.
library(dplyr)
df %>%
  mutate(week = substr(year_week, 5, 6)) %>%
  group_by(week) %>%
  mutate(result = sum(input))
Perhaps this helps - grouped by 'week' (taken as the substring of 'year_week'), get the difference between the sum of 'input' and 'input' itself:
library(dplyr)
df %>%
  group_by(week = substring(year_week, 5)) %>%
  mutate(partial_sum2 = sum(input) - input)
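For comparison, a base R sketch of the same idea (assuming the df defined above; the result column name is just illustrative): ave() applies the function within each week group and returns a vector of the original length.

# group by the week substring and subtract each value from its group sum
df$partial_sum2 <- ave(df$input,
                       substring(df$year_week, 5),
                       FUN = function(x) sum(x) - x)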
x <- data.frame(product = c(rep("A", 3), rep("B", 4)),
                xdate = as.Date(c("2020-01-01", "2020-01-02", "2020-01-04",
                                  "2020-01-02", "2020-01-04", "2020-01-07", "2020-01-08")),
                number = sample(1:10, 7))
In the sample data I want to fill in missing dates by category. For category A that means adding all missing dates between its minimum date 2020-01-01 and its maximum 2020-01-04, and the same logic applies to category B. The number variable should be filled with 0s for the added rows. I am aware of the function complete, but it seems insufficient for what I am looking for.
We can use complete here as well:
library(dplyr)
library(tidyr)
x %>%
  group_by(product) %>%
  complete(xdate = seq(min(xdate), max(xdate), by = "1 day"),
           fill = list(number = 0))
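A slightly more compact spelling of the same idea (a sketch, assuming the x defined above) uses tidyr::full_seq() to build the daily sequence within each group:

library(dplyr)
library(tidyr)

x %>%
  group_by(product) %>%
  complete(xdate = full_seq(xdate, period = 1),   # daily sequence within each product
           fill = list(number = 0)) %>%
  ungroup()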
I'm new to coding in R, and especially to time series. My problem is that I'd like to include a "Beep" column in a dataset. More specifically, the dataset has 3 columns, ID, date and time, like this
(screenshot of the data: ID, date and time columns)
It would be really useful to add a corresponding beep number next to these columns, since the individuals got many beeps per day over several days. I'd like my final result to be something like this
(screenshot of the desired result with an added Beep column)
How could I do that?
library(dplyr)                     # Load package dplyr

mydata <- mydata %>%               # Take the dataframe, then...
  group_by(Name, Dates) %>%        # Group it by name and date, then...
  mutate(beep = row_number()) %>%  # Add a beep column with a sequential number by name & date
  ungroup()                        # Remove grouping
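A minimal reproducible example of that idea (the column names Name, Dates and Time are assumptions, since the original data was only shown as screenshots):

library(dplyr)

# toy data: two people, several beeps per day
mydata <- data.frame(
  Name  = c("A", "A", "A", "B", "B"),
  Dates = as.Date(c("2021-01-01", "2021-01-01", "2021-01-02",
                    "2021-01-01", "2021-01-01")),
  Time  = c("09:00", "12:00", "09:00", "10:00", "14:00")
)

mydata %>%
  group_by(Name, Dates) %>%
  mutate(beep = row_number()) %>%   # 1, 2, ... within each person and date
  ungroup()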
I've got a (very) basic level of competency with R when working with numbers, but when it comes to manipulating data based on text values in columns I'm stuck. For example, if I want to plot meal frequency vs. day of week (is Tuesday really for tacos?) using the following data frame, how would I do that? I've seen suggestions of tapply, aggregate, colSums, and others, but those have all been for slightly different scenarios and nothing gives me what I'm looking for. Should I be looking at something other than R for this problem? My end goal is a graph with day of week on the X-axis, count on the Y-axis, and a line plot for each meal.
df <- data.frame(meal = c("tacos", "spaghetti", "burgers", "tacos", "spaghetti", "spaghetti"),
                 day = c("monday", "tuesday", "wednesday", "monday", "tuesday", "wednesday"))
This is as close as I've gotten, and, to be honest, I don't fully understand what it's doing:
tapply(df$day, df$meal, FUN = function(x) length(x))
It will summarize the meal counts, but a) it doesn't have column names (my understanding is that's due to tapply returning a vector), and b) it doesn't keep an association with the day of the week.
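For reference, a base R sketch that does keep the day association is to cross-tabulate and convert the result to a data frame (using the df defined above):

# long-format counts per (meal, day) combination; columns are meal, day, Freq
meal_counts <- as.data.frame(table(meal = df$meal, day = df$day))
meal_counts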
Edit: The melt() suggestion below works for this dataset, but it won't scale to the size I need. I was, however, able to get a working graph from the dataframe produced by the melt. If anybody runs across this in the future, try:
ggplot(new, aes(day, value, group = meal, col = meal)) +
  geom_line() + geom_point() +
  scale_y_continuous(breaks = function(x) unique(floor(pretty(seq(0, (max(x) + 1) * 1.1)))))
(The part after geom_point() is to force the Y-axis to only be integers, which is what makes sense in this case.)
I tried to cut this into smaller pieces so you can understand what's going on.
library(tidyverse)

# define the dataframe
df <- data.frame(meal = c("tacos", "spaghetti", "burgers", "tacos", "spaghetti", "spaghetti"),
                 day = c("monday", "tuesday", "wednesday", "monday", "tuesday", "wednesday"))

# define a vector of days of the week (useful to display the x axis in the correct order)
ordered_days <- c("sunday", "monday", "tuesday", "wednesday",
                  "thursday", "friday", "saturday")

# count the number of meals per day of week
df_count <- df %>% group_by(meal, day) %>% count() %>% ungroup()

# a lot of combinations are missing, for example no burgers on monday,
# so I am creating all combinations with count 0
fill_0 <- expand.grid(meal = factor(unique(df$meal)),
                      day = factor(ordered_days),
                      n = 0)

# append this fill_0 to df_count;
# as some combinations already exist, group by again and sum n
# so there is only one row per (meal, day) combination
df_count <- rbind(df_count, fill_0) %>%
  group_by(meal, day) %>%
  summarise(n = sum(n)) %>%
  mutate(day = factor(day, levels = ordered_days, ordered = TRUE))

# plot this, grouping by meal
df_count %>% ggplot(aes(x = day, y = n, group = meal, col = meal)) + geom_line()
The magic is here, courtesy of #fmarm:
df_count <- df %>% group_by(meal, day) %>% count() %>% ungroup()
The fill_0 and rbind bits, also in the sample provided by #fmarm, are necessary to keep from bombing out on unspecified combinations, but it's the line above that handles counting meals by day.
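As a side note, a possible shortcut for the fill_0/rbind step (a sketch, assuming tidyr is available via the tidyverse and ordered_days is defined as above) is to let count() and complete() fill the missing combinations with 0 in one go:

library(dplyr)
library(tidyr)

df_count <- df %>%
  count(meal, day) %>%                                        # one row per observed (meal, day)
  complete(meal, day = ordered_days, fill = list(n = 0)) %>%  # add missing combinations with n = 0
  mutate(day = factor(day, levels = ordered_days, ordered = TRUE))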
I'm using dplyr and rollmean to calculate a 13-week moving average and growth rates. The following works:
NEW_DATA <- DATA %>%
  select(CAT, Inventory_Amount, Sales, Shipments, DATE) %>%
  group_by(CAT, DATE) %>%
  summarise(
    INVENTORY = sum(Inventory_Amount),
    SO = sum(Sales),
    SI = sum(Shipments)
  ) %>%
  arrange(CAT, DATE) %>%
  mutate(SO_13WK_AVG = rollmean(x = SO, 13, align = "right", fill = NA),
         GROWTH = round(((SO - lag(SO, 52)) / lag(SO, 52)) * 100, 2))
This code adds two new columns: "SO_13WK_AVG" (the 13-week sales average) and "GROWTH" (the YoY growth rate for sales).
When I try to select an additional variable from the original dataframe to include in the new summarized dataframe, the new variables being created all turn into NAs. The following code generates NAs for SO_13WK_AVG and GROWTH (all I've done is add the "WK" variable):
NEW_DATA <- DATA %>%
  select(CAT, Inventory_Amount, Sales, Shipments, DATE, WK) %>%
  group_by(CAT, DATE, WK) %>%
  summarise(
    INVENTORY = sum(Inventory_Amount),
    SO = sum(Sales),
    SI = sum(Shipments)
  ) %>%
  arrange(CAT, DATE) %>%
  mutate(SO_13WK_AVG = rollmean(x = SO, 13, align = "right", fill = NA),
         GROWTH = round(((SO - lag(SO, 52)) / lag(SO, 52)) * 100, 2))
I searched Stack Overflow and found one thread that seems related:
Group/Mutate only returns NA and not an average
That thread suggests using na.rm = TRUE to remove NA values from calculations. However, as far as I can tell, I don't have any missing values. Any help or commentary is appreciated.
I just resolved a very similar issue. I can't quite tell whether it will fix yours without spending more time thinking about it, but I was grouping by the two variables that accounted for all of the variation across my data set (location and week). As a result, the rolling mean either could not be calculated or could only produce the fill values. Not grouping by "week" solved the issue. Since "WK" is almost certainly 100% dependent on "DATE", I expect you have the same issue. Remember, summarise drops the last grouping variable from the grouping. Try grouping by WK before your summarise, and then regrouping without week or date, as sketched below.
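A hedged sketch of that suggestion, using the column names from the question (untested against the original data): keep WK in the grouping for the summarise step, then regroup by CAT alone before computing the rolling mean and growth.

library(dplyr)
library(zoo)

NEW_DATA <- DATA %>%
  select(CAT, Inventory_Amount, Sales, Shipments, DATE, WK) %>%
  group_by(CAT, DATE, WK) %>%            # WK kept only so it survives the summarise
  summarise(
    INVENTORY = sum(Inventory_Amount),
    SO = sum(Sales),
    SI = sum(Shipments),
    .groups = "drop"
  ) %>%
  arrange(CAT, DATE) %>%
  group_by(CAT) %>%                      # regroup by CAT only, so each rolling window spans 13 rows per CAT
  mutate(SO_13WK_AVG = rollmean(x = SO, 13, align = "right", fill = NA),
         GROWTH = round(((SO - lag(SO, 52)) / lag(SO, 52)) * 100, 2)) %>%
  ungroup()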
(BTW, I'm sure you've figured something out, since this was almost two years ago, but I imagine others will encounter this as well; after all, that's why I came to this question.)