I currently have a time series of weekly football data with stats such as shots and goals. I want to create a "form" function that takes the number of games (or a cut-off date) and the variable of choice (shots, goals, etc.) so that I can check players' form for a given stat over the last 4 games, 6 games, or whatever period I specify. The data frame has this form:
week = as.vector(c(rep(25, 5), rep(26, 5), rep(27, 5)))
date = as.vector(c(rep("2019-08-09 15:00:00", 5), rep("2019-08-16 15:00:00", 5), rep("2019-08-23 15:00:00", 5)))
players = c("Player 1", "Player 2", "Player 3", "Player 4", "Player 5")
name = as.vector(c(rep(players, 3)))
goals = as.vector(sample(c(0:2), 15, replace = T))
shots = as.vector(sample(c(0:8), 15, replace = T))
data = data.frame(week, date, name, goals, shots)
Would it make sense to create a function using dplyr and input variables for time period and variable type? Or is there some package that will do this for me?
The following should give you an idea of how to filter the data frame by date or by number of games played, as specified in the comments:
library(tidyverse)
library(lubridate)
data = tibble(
week = rep(31:40, each = 2),
date = seq.Date(ymd("2019-01-01"), length.out = 20, by = "months"),
name = paste0("player", rep(1:4, each = 5)),
goals = sample(c(0:2), 20, replace = T),
shots = sample(c(0:8), 20, replace = T)
)
# rows dated within the last 3 months
data %>%
filter(date > (today() %m-% months(3)))
# last 5 game weeks (max(week) - 4 through max(week))
data %>%
filter(week > (max(week) - 5))
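To wrap this idea in the reusable "form" function the question asks for, one possible sketch follows. The helper name `form`, its arguments `stat` and `n_games`, and the sample data are all hypothetical, and it assumes one row per player per game with a proper `Date` column.

```r
library(dplyr)

# Hypothetical helper: summarise one stat over each player's most recent games.
# `stat` is the column name as a string, `n_games` the number of games to keep.
form <- function(data, stat, n_games = 4) {
  data %>%
    group_by(name) %>%
    # keep the n_games most recent rows per player
    slice_max(order_by = date, n = n_games, with_ties = FALSE) %>%
    summarise(
      games   = n(),
      total   = sum(.data[[stat]]),
      average = mean(.data[[stat]]),
      .groups = "drop"
    )
}

# Made-up sample data: 2 players, 6 weekly games each
set.seed(1)
data <- tibble(
  week  = rep(25:30, each = 2),
  date  = rep(seq(as.Date("2019-08-09"), by = "week", length.out = 6), each = 2),
  name  = rep(c("Player 1", "Player 2"), times = 6),
  goals = sample(0:2, 12, replace = TRUE),
  shots = sample(0:8, 12, replace = TRUE)
)

form(data, "shots", n_games = 4)
```

Passing the column name as a string keeps the call simple; if a player has fewer than `n_games` rows, the `games` column shows how many were actually used.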
The error is shown above. I am trying to plot a graph that shows the number of tweets in each month of 2016. My question is: how can I find the number of tweets for each month, so that I can plot a graph showing which month had the most tweets?
library(ggplot2)
library(RColorBrewer)
library(rstudioapi)
current_path = rstudioapi::getActiveDocumentContext()$path
setwd(dirname(current_path ))
print( getwd() )
donaldtrump <- read.csv("random_poll_tweets.csv", stringsAsFactors = FALSE)
print(str(donaldtrump))
time8_ts <- ts(random$time8, start = c(2016,8), frequency = 12)
time7_ts <- ts(random$time7, start = c(2016,7), frequency = 12)
time6_ts <- ts(random$time6, start = c(2016,6), frequency = 12)
time5_ts <- ts(random$time5, start = c(2016,5), frequency = 12)
time4_ts <- ts(random$time4, start = c(2016,4), frequency = 12)
time3_ts <- ts(random$time3, start = c(2016,3), frequency = 12)
time2_ts <- ts(random$time2, start = c(2016,2), frequency = 12)
time1_ts <- ts(random$time1, start = c(2016,1), frequency = 12)
browser_mts <- cbind(time8_ts, time7_ts,time6_ts,time5_ts,time4_ts,time3_ts,time2_ts,time1_ts)
dimnames(browser_mts)[[2]] <- c("8","7","6","5","4","3","2","1")
pdf(file="fig_browser_tweet_R.pdf",width = 11,height = 8.5)
ts.plot(browser_mts, ylab = "Amount of Tweet", xlab = "Month",
plot.type = "single", col = 1:5)
legend("topright", colnames(browser_mts), col = 1:5, lty = 1, cex=1.75)
library(lubridate)
library(dplyr)
donaldtrump$created_at <- donaldtrump$created_at |>
mdy_hm() |>
floor_date(unit = "month")
donaldtrump |> count(created_at)
Just because you are looking at a time series doesn't mean that you must use a time series object.
If you want a plot:
library(ggplot2)
donaldtrump |>
count(created_at) |>
ggplot(aes(created_at, n)) + geom_col() +
labs(x = "Month", y = "Number of tweets")
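If you would rather avoid extra packages, the same per-month count can be sketched in base R. The timestamps below are made up, and the `"%m/%d/%Y %H:%M"` format is an assumption based on the `mdy_hm()` call above; adjust it to whatever your CSV actually contains.

```r
# Made-up timestamps in month/day/year hour:minute form (an assumption)
created_at <- c("1/5/2016 10:15", "1/20/2016 08:02", "2/1/2016 23:59",
                "3/14/2016 12:00", "3/15/2016 09:30", "3/30/2016 18:45")

# Parse the strings, then reduce each timestamp to its year-month
parsed <- as.POSIXct(created_at, format = "%m/%d/%Y %H:%M", tz = "UTC")
month  <- factor(format(parsed, "%Y-%m"))

# Tabulate the tweets per month
tweets_per_month <- as.data.frame(table(month), responseName = "n")
tweets_per_month
```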
I want to run a simple wavelet analysis using the "WaveletComp" package, with the year shown on the x-axis. But it always reports the error "Please check your calendar dates, format and time zone: dates may not be in an unambiguous format or chronological. The default numerical axis was used instead." I tried to fix the dates, but they seem fine. I really don't know which part is wrong. Thank you in advance.
Here is the code.
library('WaveletComp')
firecount <- data.frame( YEAR = c("1986-01-01","1987-01-01","1988-01-01","1989-01-01","1990-01-01"
,"1991-01-01","1992-01-01","1993-01-01","1994-01-01","1995-01-01"
,"1996-01-01","1997-01-01","1998-01-01","1999-01-01","2000-01-01"
,"2001-01-01","2002-01-01","2003-01-01","2004-01-01","2005-01-01"
,"2006-01-01","2007-01-01","2008-01-01","2009-01-01","2010-01-01"
,"2011-01-01","2012-01-01","2013-01-01","2014-01-01","2015-01-01"
,"2016-01-01","2017-01-01","2018-01-01","2019-01-01","2020-01-01"
),
COUNT = c(3,5,4,0,0,0,13,0,2,3,0,1,0,3,15,13,
59,18,42,16,20,46,44,8,68,18,7,3,9
,48,7,48,23,84,54)
)
firecount$YEAR <- as.Date(as.character(firecount$YEAR),"%Y")
my.w <- analyze.wavelet(firecount, my.series = "COUNT",
loess.span = 0.5,
dt = 1, dj = 1/35,
lowerPeriod = 2, upperPeriod = 12,
make.pval = TRUE, n.sim = 10,
)
wt.image(my.w, color.key = "interval", n.levels = 15,
legend.params = list(lab = "fire occurrence wavelet", label.digits = 2),
periodlab = "periods (years)",
# Concerning item 1 above --- plot the square root of power:
exponent = 0.5,
# Concerning item 2 above --- time axis:
show.date = TRUE,
date.format = "%F",
timelab = "",
spec.time.axis = list(at = c(paste(1986:2020, "-01-01", sep = "")),
labels = c(1986:2020)),
timetcl = -0.5)
The function analyze.wavelet automatically takes the date from a dataframe column called date. So just rename your column from YEAR to date and you're good to go.
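A minimal sketch of that rename in base R, using the data from the question. Parsing with the full `"%Y-%m-%d"` format instead of `"%Y"` is my own addition, on the assumption that the strings should keep their January 1st dates; the wavelet calls themselves are left as in the question.

```r
firecount <- data.frame(
  YEAR  = paste0(1986:2020, "-01-01"),
  COUNT = c(3, 5, 4, 0, 0, 0, 13, 0, 2, 3, 0, 1, 0, 3, 15, 13,
            59, 18, 42, 16, 20, 46, 44, 8, 68, 18, 7, 3, 9,
            48, 7, 48, 23, 84, 54)
)

# Rename YEAR to date, the column name the answer says analyze.wavelet looks for
names(firecount)[names(firecount) == "YEAR"] <- "date"

# Parse with the complete format so every value stays on January 1st
firecount$date <- as.Date(firecount$date, format = "%Y-%m-%d")

str(firecount)
# firecount can now be passed to analyze.wavelet() in place of the original data frame
```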
I would like to create a specific object based on 2 data frames. The first contains basic information about students; the second records how many points each student got on each day.
students <- data.frame(
studentId = c(1,2,3),
name = c('Sophia', 'Mike', 'John'),
age = c(13,12,15)
)
studentPoints <- data.frame(
studentId = c(1,1,1,1,1,2,2,2,2,2,3,3,3,3,3),
date = rep(c(Sys.Date()+c(1:5)),3),
point = c(5,1,3,9,9,9,5,2,4,5,8,9,5,8,4)
)
As a result I would like to get object:
result <- list(
list(
studentId = 1,
name = 'Sophia',
age = 13,
details = list(
date = c("2021-04-11", "2021-04-12", "2021-04-13", "2021-04-14", "2021-04-15"),
point = c(5,1,3,9,9)
)
),
list(
studentId = 2,
name = 'Mike',
age = 12,
details = list(
date = c("2021-04-11", "2021-04-12", "2021-04-13", "2021-04-14", "2021-04-15"),
point = c(9,5,2,4,5)
)
),
list(
studentId = 3,
name = 'John',
age = 15,
details = list(
date = c("2021-04-11", "2021-04-12", "2021-04-13", "2021-04-14", "2021-04-15"),
point = c(8,9,5,8,4)
)
)
)
I created it manually. Any idea how to create it automatically? My database contains a few thousand students.
Split both data frames by studentId, convert each piece to its own list, and combine the two with Map.
result <- Map(function(x, y) list(x, details = y),
lapply(split(students, students$studentId), as.list),
lapply(split(studentPoints, studentPoints$studentId), as.list))
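If you need the exact shape shown in the question (flat studentId, name and age fields, with details holding only date and point), a variant along these lines may be closer. It is a sketch in base R only; the fixed start date is made up so that the output is reproducible, and `points_by_student` is a hypothetical helper name.

```r
students <- data.frame(
  studentId = c(1, 2, 3),
  name = c("Sophia", "Mike", "John"),
  age = c(13, 12, 15),
  stringsAsFactors = FALSE
)
studentPoints <- data.frame(
  studentId = rep(1:3, each = 5),
  date = rep(as.Date("2021-04-10") + 1:5, 3),  # fixed dates instead of Sys.Date()
  point = c(5, 1, 3, 9, 9, 9, 5, 2, 4, 5, 8, 9, 5, 8, 4)
)

# One small data frame of (date, point) per student, keyed by studentId
points_by_student <- split(studentPoints[c("date", "point")], studentPoints$studentId)

result <- lapply(seq_len(nrow(students)), function(i) {
  pts <- points_by_student[[as.character(students$studentId[i])]]
  # student fields stay flat; details holds only the date and point vectors
  c(as.list(students[i, ]),
    list(details = list(date = as.character(pts$date), point = pts$point)))
})

str(result[[1]])
```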
TLDR: I want to label the frame slider with the three letter abbreviation instead of the number for each month.
I created a bar chart showing average snow depth each month over a 40 year period. I'm pulling my data from NOAA and then grouping by year and month using lubridate. Here is the code:
snow_depth <- govy_data$snwd %>%
replace_na(list(snwd = 0)) %>%
mutate(month_char = month(date, label = TRUE, abbr = TRUE)) %>%
group_by(year = year(date), month = month(date), month_char) %>%
summarise(avg_depth = mean(snwd))
The mutate function creates a column (month_char) in the data frame holding the three letter abbreviation for each month. The class for this column is an ordered factor.
The code below shows how I'm creating the chart/animation:
snow_plot <- snow_depth %>% plot_ly(
x = ~year,
y = ~avg_depth,
color = ~avg_temp,
frame = ~month,
text = ~paste('<i>Month</i>: ', month_char,
'<br><b>Avg. Depth</b>: ', avg_depth,
'<br><b>Avg. Temp</b>: ', avg_temp),
hoverinfo = 'text',
type = 'bar'
)
snow_plot
This code generates a plot that animates well and looks like this:
What I'd like to do is change the labels on the slider so that it shows the three letter month abbreviation instead of numbers. I've tried switching the frame to ~month_char, which is the ordered factor of three letter month abbreviations. What I end up with isn't right at all:
The data frame looks like:
I'm afraid the desired behaviour can't be achieved with the current implementation of animation sliders in R's plotly API, because custom animation steps (which include the labels) are not allowed. Please see (and support) my GitHub FR for further information.
This is the best I was currently able to come up with:
library(plotly)
DF <- data.frame(
year = rep(seq(1980L, 2020L), each = 12),
month = rep(1:12, 41),
month_char = rep(factor(month.abb), 41),
avg_depth = runif(492)
)
fig <- DF %>%
plot_ly(
x = ~year,
y = ~avg_depth,
frame = ~paste0(sprintf("%02d", month), " - ", month_char),
type = 'bar'
) %>%
animation_slider(
currentvalue = list(prefix = "Month: ")
)
fig
(Edit from OP) Here's the resulting graph using the above code:
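As a side note on the workaround: the zero-padded month number in front of each label is there so that the labels sort in calendar order as plain text. This small base R sketch only illustrates that sorting behaviour; it assumes the slider orders character frame labels alphabetically, which I have not verified against plotly's internals.

```r
# Labels built the same way as the frame in the workaround
labels <- paste0(sprintf("%02d", 1:12), " - ", month.abb)

# With the zero-padded prefix, alphabetical order equals calendar order
identical(sort(labels), labels)   # TRUE

# Plain abbreviations sorted as text lose the calendar order
sort(month.abb)[1:3]              # "Apr" "Aug" "Dec"
```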
I have a dataframe with the following structure:
df <- structure(list(Name = structure(1:9, .Label = c("task 1", "task 2",
"task 3", "task 4", "task 5", "task 6", "task 7", "task 8", "task 9"
), class = "factor"), Start = structure(c(1479799800, 1479800100,
1479800400, 1479800700, 1479801000, 1479801300, 1479801600, 1479801900,
1479802200), class = c("POSIXct", "POSIXt"), tzone = ""), End = structure(c(1479801072,
1479800892, 1479801492, 1479802092, 1479802692, 1479803292, 1479803892,
1479804492, 1479805092), class = c("POSIXct", "POSIXt"), tzone = "")), .Names = c("Name",
"Start", "End"), row.names = c(NA, -9L), class = "data.frame")
Now I want to count the items in column "Name" over time. They all have start and end datetimes, which are formatted as POSIXct.
With the help of this solution here on SO I was able to do so (or at least I think I was) with the following code:
library(data.table)
setDT(df)
dates = seq(min(df$Start), max(df$End), by = "min")
lookup = data.table(Start = dates, End = dates, key = c("Start", "End"))
ans = foverlaps(df, lookup, type = "any", which = TRUE)
library(ggplot2)
ggplot(ans[, .N, by = yid], aes(x = yid, y = N)) + geom_line()
Problem now:
How do I match my DateTime-scale to those integer values on the x-axis? Or is there a faster and better solution to solve my problem?
I tried to use x = as.POSIXct(yid, format = "%Y-%m-%dT%H:%M:%S", origin = min(df$Start)) within the aes of the ggplot(). But that didn't work.
EDIT:
When using the solution to this problem, I ran into another one: minutes with no count are displayed in the plot with the count of the latest countable item. That is why we have to merge (left join) the table of counts (ans) with a complete sequence of all datetimes and put a 0 in place of every NA, so that we get an explicit value for every necessary data point.
Like this:
# The part we use to count and match the right times
# yid is a 1-based row index into dates, so subtract 1 to land on the matching minute
df1 <- ans[, .N, by = yid] %>%
mutate(time = min(df$Start) + minutes(yid - 1))
# The part where we use the sequence from the beginning for a LEFT JOIN with the counting dataframe
df2 <- data.frame(time = dates)
dt <- merge(x = df2, y = df1, by = "time", all.x = TRUE)
dt[is.na(dt)] <- 0
In the tidyverse framework, this is a slightly different task:
1. Generate the same dates variable you have.
2. Construct a data frame with all tasks and all times (a Cartesian join).
3. Filter out the rows that are not in the interval for each task.
4. Add up the tasks that remain for each minute.
5. Plot.
That looks something like this:
library(tidyverse)
library(lubridate)
dates = seq(min(df$Start), max(df$End), by = "min")
df %>%
mutate(key = 1) %>%
left_join(tibble(key = 1, times = dates)) %>%  # tibble() replaces the deprecated data_frame()
mutate(include = times %within% interval(Start, End)) %>%
filter(include) %>%
group_by(times) %>%
summarise(count = n()) %>%
ggplot(aes(times, count)) +
geom_line()
#> Joining, by = "key"
If you need it to be faster, it will almost certainly be faster using your original data.table code.
Consider this.
library(data.table)
setDT(df)
dates = seq(min(df$Start), max(df$End), by = "min")
lookup = data.table(Start = dates, End = dates, key = c("Start", "End"))
ans = foverlaps(df, lookup, type = "any", which = TRUE)
# yid is a 1-based row index into dates, so subtract 1 to land on the matching minute
ans[, .N, by = yid] %>%
mutate(time = min(df$Start) + minutes(yid - 1)) %>%
ggplot(aes(time, N)) +
geom_line()
Now we use data.table to calculate the overlap, and then index time off the starting minute. Once we add a new column with the times, we can plot.
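For small data, a package-free sketch of the same count is also possible. It checks every minute against every task, so minutes with no active task come out as explicit zeros without a separate join. The three tasks below are made up for illustration, and this brute-force approach will be slower than foverlaps on large inputs.

```r
# Made-up tasks with a gap between 08:40 and 08:50
df <- data.frame(
  Name  = paste("task", 1:3),
  Start = as.POSIXct(c("2016-11-22 08:30:00", "2016-11-22 08:35:00",
                       "2016-11-22 08:50:00"), tz = "UTC"),
  End   = as.POSIXct(c("2016-11-22 08:40:00", "2016-11-22 08:38:00",
                       "2016-11-22 08:55:00"), tz = "UTC")
)

dates <- seq(min(df$Start), max(df$End), by = "min")

# For each minute, count the tasks whose [Start, End] interval contains it
counts <- data.frame(
  time = dates,
  N = vapply(seq_along(dates),
             function(i) sum(df$Start <= dates[i] & df$End >= dates[i]),
             integer(1))
)

head(counts)
```

Because `counts$time` is already POSIXct, it can go straight onto the x-axis of a plot.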