I've got a simple dataset.
structure(list(ID = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 3L, 4L, 4L,
4L, 5L, 5L), Primrely = c(0L, 2L, 1L, 1L, 1L, 1L, 3L, 4L, 4L,
3L, 1L, 2L, 2L), Primset = c(-4L, -3L, 1L, 2L, -4L, 5L, 3L, 1L,
2L, -4L, -2L, -3L, 3L), Primvalue = c(45L, 5L, 6L, 15L, 53L,
45L, 44L, 65L, 1L, 5L, 1L, 12L, 5L), Secrely = c(5L, 7L, 2L,
1L, 2L, 0L, 4L, 5L, 1L, 1L, 1L, 0L, 2L), Secset = c(-3L, 1L,
2L, -2L, -3L, 2L, 5L, 7L, 7L, 4L, 3L, 2L, 1L), Secvalue = c(38L,
-2L, -1L, 8L, 46L, 38L, 37L, 58L, -6L, -2L, -6L, 5L, -2L), Desired = structure(c(NA,
1L, NA, NA, 2L, 2L, NA, NA, NA, NA, NA, 1L, 1L), .Label = c("Primary",
"Secondary"), class = "factor")), .Names = c("ID", "Primrely",
"Primset", "Primvalue", "Secrely", "Secset", "Secvalue", "Desired"
), class = "data.frame", row.names = c(NA, -13L))
ID Primrely Primset Primvalue Secrely Secset Secvalue Desired
1 1 0 -4 45 5 -3 38 <NA>
2 1 2 -3 5 7 1 -2 Primary
3 1 1 1 6 2 2 -1 <NA>
4 1 1 2 15 1 -2 8 <NA>
5 2 1 -4 53 2 -3 46 Secondary
6 2 1 5 45 0 2 38 Secondary
7 2 3 3 44 4 5 37 <NA>
8 3 4 1 65 5 7 58 <NA>
9 4 4 2 1 1 7 -6 <NA>
10 4 3 -4 5 1 4 -2 <NA>
11 4 1 -2 1 1 3 -6 <NA>
12 5 2 -3 12 0 2 5 Primary
13 5 2 3 5 2 1 -2 Primary
For each ID, I'd like to select rows that meet the following criteria (Prim = primary, Sec = secondary): if Primrely is 0 or 2 and Primset is in -3:3, select all such rows for that ID. If no rows for a given ID meet the primary criteria, select rows that meet the secondary criteria (Secrely is 0 or 2 and Secset is in -3:3). Ideally, I'd like to add a column (Desired) that indicates which criterion was met (primary/secondary/NA).
I've been working with ifelse and if/else without much luck, mainly because I don't know how to tell R to ignore a given ID once the primary criteria have been met (e.g. ID 1 meets the secondary criteria but doesn't need them because it already met the primary criteria). In other words, if a 'Primary' shows up for a given ID, it trumps all the 'Secondary' matches for that ID. I would appreciate any advice.
If I understand you correctly now:
(left in the steps to show you what I was doing, you can remove them and/or do this all in one step if you want)
dat <- structure(list(ID = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 3L, 4L, 4L,
4L, 5L, 5L), Primrely = c(0L, 2L, 1L, 1L, 1L, 1L, 3L, 4L, 4L,
3L, 1L, 2L, 2L), Primset = c(-4L, -3L, 1L, 2L, -4L, 5L, 3L, 1L,
2L, -4L, -2L, -3L, 3L), Primvalue = c(45L, 5L, 6L, 15L, 53L,
45L, 44L, 65L, 1L, 5L, 1L, 12L, 5L), Secrely = c(5L, 7L, 2L,
1L, 2L, 0L, 4L, 5L, 1L, 1L, 1L, 0L, 2L), Secset = c(-3L, 1L,
2L, -2L, -3L, 2L, 5L, 7L, 7L, 4L, 3L, 2L, 1L), Secvalue = c(38L,
-2L, -1L, 8L, 46L, 38L, 37L, 58L, -6L, -2L, -6L, 5L, -2L), Desired = structure(c(NA,
1L, NA, NA, 2L, 2L, NA, NA, NA, NA, NA, 1L, 1L), .Label = c("Primary",
"Secondary"), class = "factor")), .Names = c("ID", "Primrely",
"Primset", "Primvalue", "Secrely", "Secset", "Secvalue", "Desired"
), class = "data.frame", row.names = c(NA, -13L))
within(dat, {
Desired_step1 <- ifelse(Primrely %in% c(0,2) & Primset %in% -3:3,
1, ifelse(Secrely %in% c(0,2) & Secset %in% -3:3,
2, 3))
Desired_new <- factor(ave(Desired_step1, ID, FUN = function(x)
ifelse(x == min(x), x, NA)),
levels = 1:3, labels = c('Primary', 'Secondary', 'NA'))
Desired_step1 <- c('1'='Primary','2'='Secondary','3'=NA)[Desired_step1]
})
# ID Primrely Primset Primvalue Secrely Secset Secvalue Desired Desired_new Desired_step1
# 1 1 0 -4 45 5 -3 38 <NA> <NA> <NA>
# 2 1 2 -3 5 7 1 -2 Primary Primary Primary
# 3 1 1 1 6 2 2 -1 <NA> <NA> Secondary
# 4 1 1 2 15 1 -2 8 <NA> <NA> <NA>
# 5 2 1 -4 53 2 -3 46 Secondary Secondary Secondary
# 6 2 1 5 45 0 2 38 Secondary Secondary Secondary
# 7 2 3 3 44 4 5 37 <NA> <NA> <NA>
# 8 3 4 1 65 5 7 58 <NA> NA <NA>
# 9 4 4 2 1 1 7 -6 <NA> NA <NA>
# 10 4 3 -4 5 1 4 -2 <NA> NA <NA>
# 11 4 1 -2 1 1 3 -6 <NA> NA <NA>
# 12 5 2 -3 12 0 2 5 Primary Primary Primary
# 13 5 2 3 5 2 1 -2 Primary Primary Primary
Here's my quick & dirty solution assuming your data.frame is named df. You can refine it yourself I think:
df$Desired <- ifelse((df$Primrely==0 | df$Primrely==2) & (df$Primset >= -3 & df$Primset <= 3),
"Primary",
NA)
idx <- is.na(df$Desired)
df$Desired[idx] <- ifelse((df$Secrely[idx]==0 | df$Secrely[idx]==2) & (df$Secset[idx] >= -3 & df$Secset[idx] <= 3),
"Secondary",
NA)
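Note that the snippet above labels each row on its own, so a primary match does not suppress secondary matches within the same ID (row 3 of the example data would come out as Secondary). A minimal base R sketch of the per-ID precedence could look like the following; the toy data frame and the helper names prim, sec and id_has_prim are made up for illustration and only carry the columns the rule needs.

```r
# Hypothetical toy data: only the columns needed for the rule
df <- data.frame(
  ID       = c(1, 1, 1, 2, 2, 3),
  Primrely = c(2, 1, 1, 1, 1, 4),
  Primset  = c(-3, 1, 2, -4, 5, 1),
  Secrely  = c(7, 2, 1, 2, 0, 5),
  Secset   = c(1, 2, -2, -3, 2, 7)
)

# Row-level checks for the primary and secondary criteria
prim <- df$Primrely %in% c(0, 2) & df$Primset %in% -3:3
sec  <- df$Secrely  %in% c(0, 2) & df$Secset  %in% -3:3

# TRUE for every row of an ID that has at least one primary match
id_has_prim <- as.logical(ave(prim, df$ID, FUN = any))

# Secondary only counts when the ID has no primary match at all
df$Desired <- ifelse(prim, "Primary",
                     ifelse(sec & !id_has_prim, "Secondary", NA))
df
```

Here ID 1 keeps only its Primary row, ID 2 falls back to Secondary, and ID 3 stays NA.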
I have a CSV file like the one below; once read into R it is called df_plane:
Situation flight_uses People-ID
1 1 1
2 1 1
3 0 1
1 1 2
2 1 2
3 1 2
1 1 3
2 0 3
3 1 3
1 1 4
2 1 4
3 0 4
1 1 5
2 0 5
3 0 5
1 1 6
2 1 6
3 NA 6
1 NA 7
2 1 7
3 1 7
1 1 8
2 0 8
3 0 8
1 NA 9
2 NA 9
3 1 9
1 1 10
2 1 10
3 0 10
1 0 11
2 0 11
3 0 11
I would like to find out what percentage of people use an airplane in situation 2. Is there a more efficient way than the code below? With it, I still have to calculate the percentage manually from the counts.
table(select(df_plane,situation,flight_uses))
You can use functions from the janitor package.
library(tidyverse)
library(janitor)
#>
#> Attaching package: 'janitor'
#> The following objects are masked from 'package:stats':
#>
#> chisq.test, fisher.test
df_plane <- tibble::tribble(
~Situation, ~flight_uses, ~`People-ID`,
1L, 1L, 1L,
2L, 1L, 1L,
3L, 0L, 1L,
1L, 1L, 2L,
2L, 1L, 2L,
3L, 1L, 2L,
1L, 1L, 3L,
2L, 0L, 3L,
3L, 1L, 3L,
1L, 1L, 4L,
2L, 1L, 4L,
3L, 0L, 4L,
1L, 1L, 5L,
2L, 0L, 5L,
3L, 0L, 5L,
1L, 1L, 6L,
2L, 1L, 6L,
3L, NA, 6L,
1L, NA, 7L,
2L, 1L, 7L,
3L, 1L, 7L,
1L, 1L, 8L,
2L, 0L, 8L,
3L, 0L, 8L,
1L, NA, 9L,
2L, NA, 9L,
3L, 1L, 9L,
1L, 1L, 10L,
2L, 1L, 10L,
3L, 0L, 10L,
1L, 0L, 11L,
2L, 0L, 11L,
3L, 0L, 11L
) |>
clean_names()
df_plane |>
tabyl(situation, flight_uses) |>
adorn_percentages() |>
adorn_pct_formatting()
#> situation 0 1 NA_
#> 1 9.1% 72.7% 18.2%
#> 2 36.4% 54.5% 9.1%
#> 3 54.5% 36.4% 9.1%
Created on 2022-10-26 with reprex v2.0.2
In Situation 2, 54.5% of people use an airplane.
You can use mean to calculate the proportion
> with(df_plane,mean(replace(flight_uses, is.na(flight_uses), 0)[Situation==2]))
[1] 0.5454545
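The line above treats NA as "does not fly", so the NA row stays in the denominator. Whether that is what you want depends on the question being asked. The sketch below puts both choices side by side, using the eleven situation-2 values from the question; the variable names are only illustrative.

```r
# flight_uses for Situation == 2, one value per person (IDs 1 to 11)
flight_uses_s2 <- c(1, 1, 0, 1, 0, 1, 1, 0, NA, 1, 0)

# NA counted as non-use: denominator is all 11 people
pct_all <- mean(replace(flight_uses_s2, is.na(flight_uses_s2), 0))

# NA excluded: denominator is the 10 people with a known answer
pct_known <- mean(flight_uses_s2 == 1, na.rm = TRUE)

c(all_rows = pct_all, known_only = pct_known)
```

With this data the first value is 6/11 and the second is 6/10, so it is worth stating which denominator a reported percentage uses.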
Are you asking, of those rows where Situation==2, what is the percent where flight_uses==1?
dplyr approach
dplyr is useful for these types of manipulations:
library(dplyr)
df_plane |>
filter(Situation == 2) |>
summarise(
percent_using_plane = sum(flight_uses == 1, na.rm = TRUE) / n() * 100
)
# percent_using_plane
# 1 54.54545
base R
If you want to stick with the base R table syntax (which seems fine in this case but can become unwieldy once calculations get more complicated), you were nearly there:
table(df_plane[df_plane$Situation==2,]$flight_uses) / nrow(df_plane[df_plane$Situation==2,])*100
# 0 1
# 36.36364 54.54545
Use with instead of dplyr::select and wrap it in proportions.
proportions(with(df_plane, table(flight_uses, Situation, useNA='ifany')), 2)
# Situation
# flight_uses 1 2 3
# 0 0.09090909 0.36363636 0.54545455
# 1 0.72727273 0.54545455 0.36363636
# <NA> 0.18181818 0.09090909 0.09090909
Here is some mock data related to this problem:
structure(list(HHID = c(1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 3L,
3L, 3L, 4L, 4L, 4L, 4L, 4L), PERS = c(1L, 2L, 3L, 4L, 5L, 1L,
2L, 3L, 4L, 1L, 2L, 3L, 1L, 2L, 3L, 4L, 5L), MARSTAT = c(2L,
2L, 1L, 1L, 1L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 5L, 1L, 1L
), SEX = c(1L, 2L, 2L, 1L, 1L, 2L, 1L, 2L, 1L, 2L, 2L, 1L, 2L,
1L, 2L, 2L, 1L), VAR1 = c(NA, 1L, 4L, 4L, 4L, NA, 1L, 5L, 4L,
NA, 4L, 4L, NA, 1L, 8L, 4L, 4L), VAR2 = c(NA, NA, 4L, 4L, 4L,
NA, NA, 4L, 5L, NA, NA, 6L, NA, NA, 12L, 4L, 4L), VAR3 = c(NA,
NA, NA, 6L, 6L, NA, NA, NA, 7L, NA, NA, NA, NA, NA, NA, 11L,
11L), VAR4 = c(NA, NA, NA, NA, 6L, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, 6L), VAR5 = c(NA_integer_, NA_integer_, NA_integer_,
NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_,
NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_,
NA_integer_, NA_integer_, NA_integer_, NA_integer_), FLAG = c(0L,
0L, 0L, 1L, 0L, 0L, 0L, 1L, 1L, 0L, 1L, 0L, 0L, 0L, 0L, 1L, 1L
)), .Names = c("HHID", "PERS", "MARSTAT", "SEX", "VAR1", "VAR2",
"VAR3", "VAR4", "VAR5", "FLAG"), row.names = c(NA, 17L), class = "data.frame")
For each household in my data, I want to transpose the values in the lower triangle into the upper triangle so that for each household I essentially have a symmetrical matrix with the diagonal either NA or 0 (for this analysis, 0 and NA are interchangeable). So based on the above example, I would be looking for the following dataset:
structure(list(HHID = c(1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 3L,
3L, 3L, 4L, 4L, 4L, 4L, 4L), PERS = c(1L, 2L, 3L, 4L, 5L, 1L,
2L, 3L, 4L, 1L, 2L, 3L, 1L, 2L, 3L, 4L, 5L), MARSTAT = c(2L,
2L, 1L, 1L, 1L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 5L, 1L, 1L
), SEX = c(1L, 2L, 2L, 1L, 1L, 2L, 1L, 2L, 1L, 2L, 2L, 1L, 2L,
1L, 2L, 2L, 1L), VAR1 = c(NA, 1L, 4L, 4L, 4L, NA, 1L, 5L, 4L,
NA, 4L, 4L, NA, 1L, 8L, 4L, 4L), VAR2 = c(1L, NA, 4L, 4L, 4L,
1L, NA, 4L, 5L, 4L, NA, 6L, 1L, NA, 12L, 4L, 4L), VAR3 = c(4L,
4L, NA, 6L, 6L, 5L, 4L, NA, 7L, 4L, 6L, NA, 8L, 12L, NA, 11L,
11L), VAR4 = c(4L, 4L, 6L, NA, 6L, 4L, 5L, 7L, NA, NA, NA, NA,
4L, 4L, 11L, NA, 6L), VAR5 = c(4L, 4L, 6L, 6L, NA, NA, NA, NA,
NA, NA, NA, NA, 4L, 4L, 11L, 6L, NA), FLAG = c(0L, 0L, 0L, 1L,
0L, 0L, 0L, 1L, 1L, 0L, 1L, 0L, 4L, 4L, 11L, 1L, 1L)), .Names = c("HHID",
"PERS", "MARSTAT", "SEX", "VAR1", "VAR2", "VAR3", "VAR4", "VAR5",
"FLAG"), class = "data.frame", row.names = c(NA, -17L))
I have been able to do this for one household, as follows (though it misses the HHID which I would need to distinguish between households):
HH1 <- df %>%
filter(HHID == 1) %>%
select(VAR1, VAR2, VAR3, VAR4, VAR5)
HH1 <- as.matrix(HH1)
HH1[is.na(HH1)] <- 0
T_HH1 <- t(HH1)
T_HH1[is.na(T_HH1)] <- 0
combo <- HH1 + T_HH1
A <- combo
However, how would I go about doing this for multiple households across my dataset, also keeping the "HHID" and "PERS" information so that I can link on any extra info if needed?
Thank you so much in advance!
One approach is:
Split your data frame by HHID into groups
Create a custom function to take VAR columns, make it a square matrix, and transpose
Use rbindlist to reconstruct into rows again using fill to add NA as lengths in the list differ
Replace VAR columns (5 through 9) with new VAR columns
Let me know if this works for you.
library(data.table) # provides rbindlist()
f <- function(m) {
m <- m[, 1:nrow(m)]
m[upper.tri(m)] <- t(m)[upper.tri(m)]
m
}
df1[,5:9] <- rbindlist(lapply(split(df1[,5:9], df1$HHID), f), fill = TRUE)
Output
HHID PERS MARSTAT SEX VAR1 VAR2 VAR3 VAR4 VAR5 FLAG
1 1 1 2 1 NA 1 4 4 4 0
2 1 2 2 2 1 NA 4 4 4 0
3 1 3 1 2 4 4 NA 6 6 0
4 1 4 1 1 4 4 6 NA 6 1
5 1 5 1 1 4 4 6 6 NA 0
6 2 1 2 2 NA 1 5 4 NA 0
7 2 2 2 1 1 NA 4 5 NA 0
8 2 3 1 2 5 4 NA 7 NA 1
9 2 4 1 1 4 5 7 NA NA 1
10 3 1 1 2 NA 4 4 NA NA 0
11 3 2 1 2 4 NA 6 NA NA 1
12 3 3 1 1 4 6 NA NA NA 0
13 4 1 2 2 NA 1 8 4 4 0
14 4 2 2 1 1 NA 12 4 4 0
15 4 3 5 2 8 12 NA 11 11 0
16 4 4 1 2 4 4 11 NA 6 1
17 4 5 1 1 4 4 11 6 NA 1
additional solution
library(purrr)
library(tidyverse)
df %>%
mutate_all(~ replace_na(., 0)) %>%
select(HHID, starts_with("VAR")) %>%
group_by(HHID) %>%
nest %>%
mutate(data = map(data, ~ .x + t(.x))) %>%
unnest(data) %>%
bind_cols(select(df, -starts_with("VAR"), -HHID))
You can split the data on HHID, apply a helper function (tt, defined below) to do the matrix work, then unsplit it.
vars <- grep("^VAR", names(df))
df[, vars] <- unsplit(lapply(split(df[, vars], df$HHID), tt), df$HHID)
# HHID PERS MARSTAT SEX VAR1 VAR2 VAR3 VAR4 VAR5 FLAG
# 1 1 1 2 1 0 1 4 4 4 0
# 2 1 2 2 2 1 0 4 4 4 0
# 3 1 3 1 2 4 4 0 6 6 0
# 4 1 4 1 1 4 4 6 0 6 1
# 5 1 5 1 1 4 4 6 6 0 0
# 6 2 1 2 2 0 1 5 4 0 0
# 7 2 2 2 1 1 0 4 5 0 0
# 8 2 3 1 2 5 4 0 7 0 0
# 9 2 4 1 1 4 5 7 0 0 0
# 10 3 1 1 2 0 4 4 0 0 0
# 11 3 2 1 2 4 0 6 0 0 0
# 12 3 3 1 1 4 6 0 0 0 0
# 13 4 1 2 2 0 1 8 4 4 0
# 14 4 2 2 1 1 0 12 4 4 0
# 15 4 3 5 2 8 12 0 11 11 0
# 16 4 4 1 2 4 4 11 0 6 1
# 17 4 5 1 1 4 4 11 6 0 1
Here's the helper function tt (define it before running the code above):
tt <- function(x) {
x <- x[, 1:nrow(x)] # Make it square
x[upper.tri(x)] <- 0 # replace upper triangle with 0
x + t(x) # add them together
}
I have the following dataset
mydata=structure(list(id = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), ad_id = c(111L, 111L, 111L,
111L, 1111L, 1111L, 11111L, 11111L, 11111L, 111L, 111L, 1111L,
1111L, 11111L, 11111L, 11111L, 111111L, 111111L), price = c(1L,
0L, 1L, 0L, 2L, 0L, 3L, 0L, 0L, 1L, 0L, 2L, 0L, 3L, 0L, 0L, 1L,
0L), rev = c(2L, 0L, 0L, 2L, 3L, 3L, 4L, 4L, 4L, 2L, 2L, 3L,
3L, 4L, 4L, 4L, 0L, 0L), data = structure(c(1L, 2L, 2L, 3L, 1L,
3L, 1L, 3L, 4L, 1L, 3L, 1L, 3L, 1L, 3L, 4L, 1L, 3L), .Label = c("01.01.2018",
"01.02.2018", "01.03.2018", "02.03.2018"), class = "factor")), .Names = c("id",
"ad_id", "price", "rev", "data"), class = "data.frame", row.names = c(NA,
-18L))
How can I create a dummy variable according to the following logic?
For each id and ad_id, I need to aggregate price and rev over the dates. Each ad_id has a date column (data).
If, for a given id and ad_id, rev is greater than price over a period of up to 90 days (the data column is in d.m.y format), the flag is set to 1; otherwise the flag is 0.
In this reproducible example, I use just 1 id and 4 ad_id values.
Aggregated by sum, the data looks like this:
id ad_id price rev
1 1 111 2 4
2 1 1111 2 6
3 1 11111 3 12
4 1 111111 1 0
So for id = 1, every ad_id except 111111 satisfies rev > price, so in the initial data
ad_id = 111, 1111, 11111 must have flag = 1 and 111111 must have flag = 0.
Here is the desired output:
id ad_id price rev data flag
1 1 111 1 2 01.01.2018 1
2 1 111 0 0 01.02.2018 1
3 1 111 1 0 01.02.2018 1
4 1 111 0 2 01.03.2018 1
5 1 1111 2 3 01.01.2018 1
6 1 1111 0 3 01.03.2018 1
7 1 11111 3 4 01.01.2018 1
8 1 11111 0 4 01.03.2018 1
9 1 11111 0 4 02.03.2018 1
10 1 111111 1 0 01.01.2018 0
11 1 111111 0 0 01.03.2018 0
How can I implement such a condition?
I am not sure if I understood you correctly, but is this what you are looking for?
library(tidyverse)
mydata %>% as_tibble() %>%
group_by(id, ad_id) %>%
summarise_at(vars("price", "rev"), sum) %>%
mutate(flag = if_else(rev > price, 1, 0)) %>% # flag is 1 only when rev strictly exceeds price
select(id, ad_id, flag) %>%
left_join(mydata, ., by = c("id", "ad_id"))
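The pipeline above sums over all dates and does not use the 90-day period mentioned in the question. If that period matters, one possible reading is "within 90 days of each ad's first observed date". The base R sketch below assumes that reading; the small mydata subset and the helper names (date, grp, first_date, in_window) are introduced here for illustration only.

```r
# Hypothetical subset of the question's data (same column names)
mydata <- data.frame(
  id    = 1L,
  ad_id = c(111L, 111L, 111L, 111L, 111111L, 111111L),
  price = c(1L, 0L, 1L, 0L, 1L, 0L),
  rev   = c(2L, 0L, 0L, 2L, 0L, 0L),
  data  = c("01.01.2018", "01.02.2018", "01.02.2018", "01.03.2018",
            "01.01.2018", "01.03.2018")
)

# Parse the day.month.year strings into Date objects
mydata$date <- as.Date(as.character(mydata$data), format = "%d.%m.%Y")

# One group per id / ad_id combination
grp <- interaction(mydata$id, mydata$ad_id, drop = TRUE)

# Assumption: the 90-day period starts at each ad's first observed date
first_date <- ave(as.numeric(mydata$date), grp, FUN = min)
in_window  <- as.numeric(mydata$date) - first_date <= 90

# Sum price and rev inside the window only, then compare per group
price_sum <- ave(mydata$price * in_window, grp, FUN = sum)
rev_sum   <- ave(mydata$rev * in_window, grp, FUN = sum)
mydata$flag <- as.integer(rev_sum > price_sum)
mydata
```

Because ave returns one value per original row, the flag lands directly on the unaggregated data and no join is needed.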
I have some data; an example is shown below:
a = rep(1:5, each=3)
b = rep(c("a","b","c","a","c"), each = 3)
df = data.frame(a,b)
I want to select all the rows that have "a" in column b.
I tried to do it with
df[df$a %in% a,]
Can someone give me an idea of how to extract them?
df2<- structure(list(V1 = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), V2 = structure(c(1L,
2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 1L, 2L, 3L, 4L,
5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L), .Label = c("B02", "B03",
"B04", "B05", "B06", "B07", "C02", "C03", "C04", "C05", "C06",
"C07"), class = "factor")), .Names = c("V1", "V2"), class = "data.frame", row.names = c(NA,
-24L))
I want to select specific rows that start with B, but not all of them: just 02, 03, 04 and 05.
1 B02
1 B03
1 B04
1 B05
2 B02
2 B03
2 B04
2 B05
I also want to keep the original data without those rows.
We need to check the 'b' column
df[df$b %in% 'a',]
For the updated question with 'df2', we can use paste to create the strings 'B02' to 'B05' and use %in% to subset
df2[df2$V2 %in% paste0("B0", 2:5),]
Or another option is grep
df2[grep("^B0[2-5]$", df2$V2),]
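For the second half of the question (the original data without those rows), storing the condition once and negating it should be enough. A small sketch, rebuilding df2 with plain character values instead of the factor from the question; keep, selected and remaining are illustrative names.

```r
# Rebuild df2 from the question (V2 as character rather than factor)
df2 <- data.frame(
  V1 = rep(1:2, each = 12),
  V2 = rep(c(paste0("B0", 2:7), paste0("C0", 2:7)), times = 2)
)

# Logical index of the rows to pull out
keep <- df2$V2 %in% paste0("B0", 2:5)

selected  <- df2[keep, ]   # B02 to B05 only
remaining <- df2[!keep, ]  # everything else
```

Using a logical index rather than grep positions makes the negation safe even when nothing matches, since df2[-integer(0), ] would return zero rows.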
> df
a b
1 1 a
2 1 a
3 1 a
4 2 b
5 2 b
6 2 b
7 3 c
8 3 c
9 3 c
10 4 a
11 4 a
12 4 a
13 5 c
14 5 c
15 5 c
This basically says:
From df, choose the rows where column b equals 'a', keeping all columns
> rows_with_a<-df[df$b=='a', ]
> rows_with_a
a b
1 1 a
2 1 a
3 1 a
10 4 a
11 4 a
12 4 a
I have a monthly time series - monthlyTs:
monthlyTs <- ts(all.xts , frequency = 12, start=decimal_date(ymd("2012-01-29")))
head(index(monthlyTs))
[1] "2012-01-29 00:00:00 UTC" "2012-02-26 01:22:47 UTC" "2012-03-25 02:45:35 UTC" "2012-04-29 04:29:04 UTC"
[5] "2012-05-27 05:51:52 UTC" "2012-06-24 07:14:39 UTC"
I want to apply a time window that starts in 2013:
head(window(monthlyTs, start = 2013))
2012-01-29 00:00:00 2
2012-02-26 01:22:47 8
2012-03-25 02:45:35 6
2012-04-29 04:29:04 5
2012-05-27 05:51:52 4
2012-06-24 07:14:39 4
So it looks like the window function is not filtering as expected. What is wrong?
Fully reproducible example as requested:
christmas.csv - tiny CSV file (google trends for 'Christmas' request)
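library(lubridate) # decimal_date(), ymd(), date_decimal()
library(xts) # xts(); also attaches zoo for index()
# Reading data from the CSV. Format - [week start date], [views per week]
data = read.csv('christmas.csv', sep=",", header = FALSE, skip = 3,col.names = c("Week","Views"))[[2]]
# creating the time series (data is already the Views column, so no further indexing)
myTs <- ts(data, freq=365.25/7, start=decimal_date(ymd("2012-01-29")))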
#converting from weekly to month time series
all.xts <- xts(myTs, date_decimal(index(myTs)))
monthlyTs <- ts(all.xts , frequency = 12, start=decimal_date(ymd("2012-01-29")))
head(window(monthlyTs, start = 2013))
2012-01-29 00:00:00 2
2012-02-26 01:22:47 8
2012-03-25 02:45:35 6
2012-04-29 04:29:04 5
2012-05-27 05:51:52 4
2012-06-24 07:14:39 4
There are two problems:
the object all.xts is a weekly time series, not a monthly one
the value you pass for the argument frequency is not correct
For the second point, try changing the value you pass for the start argument in your call to ts to
c(lubridate::year("2012-01-29"), lubridate::month("2012-01-29"))
and set frequency to 12, i.e. use the line:
ts(all.xts , frequency = 12, start = c(lubridate::year("2012-01-29"), lubridate::month("2012-01-29")) )
Using the output from dput, your code can be rewritten as follows:
data <- c(2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L,
2L, 3L, 3L, 3L, 4L, 5L, 5L, 6L, 8L, 11L, 16L, 22L, 33L, 42L,
45L, 55L, 64L, 8L, 4L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 4L, 4L, 5L, 6L, 8L,
12L, 16L, 21L, 27L, 43L, 47L, 56L, 79L, 10L, 5L, 2L, 2L, 2L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 3L,
3L, 3L, 4L, 5L, 5L, 6L, 8L, 12L, 17L, 21L, 27L, 43L, 47L, 53L,
87L, 12L, 5L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 4L, 5L, 6L, 6L, 8L, 13L,
17L, 20L, 27L, 44L, 50L, 54L, 100L, 15L, 6L, 3L, 2L, 2L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L,
3L, 4L, 5L, 5L, 6L, 8L, 11L, 16L, 21L, 29L, 43L, 48L, 53L, 80L,
46L, 8L, 3L, 2L)
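library(lubridate) # decimal_date(), ymd(), date_decimal()
library(xts) # attaches zoo, which provides index()
myTs <- ts(data, freq=365.25/7, start=decimal_date(ymd("2012-01-29")))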
all.xts <- xts::xts(myTs, date_decimal(index(myTs)))
monthlyTs <- ts(all.xts , frequency = 12, start = c(lubridate::year("2012-01-29"), lubridate::month("2012-01-29")) )
window(monthlyTs, start= c(2013))
The last line will print :
> window(monthlyTs, start= c(2013))
Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
2013 1 1 1 1 1 1 1 1 1 1 1 1
2014 1 1 1 1 2 2 2 2 3 3 3 4
2015 5 5 6 8 11 16 22 33 42 45 55 64
2016 8 4 2 2 2 2 2 2 1 1 1 1
2017 1 1 1 1 1 1 1 1 1 1 1 1
2018 1 1 1 1 1 1 1 2 2 2 2 2
2019 3 3 3 4 4 5 6 8 12 16 21 27
2020 43 47 56 79 10 5 2 2 2 1 1 1
2021 1 1 1 1 1 1 1 1 1 1 1 1
2022 1 1 1 1 1 1 1 1 1 1 2 2
2023 2 2 2 2 3 3 3 4 5 5 6 8
2024 12 17 21 27 43 47 53 87 12 5 2 2
2025 2 1 1 1 1 1 1 1 1 1 1 1
2026 1 1 1 1 1 1 1 1 1 1 1 1
2027 1 2 2 2 2 2 2 2 3 3 3 4
2028 5 6 6 8 13 17 20 27 44 50 54 100
2029 15 6 3 2 2 1 1 1 1 1 1 1
2030 1 1 1 1 1 1 1 1 1 1 1 1
2031 1 1 1 1 1 1 2 2 2 2 2 2
2032 3 3 3 4 5 5 6 8 11 16 21 29
2033 43 48 53 80 46 8 3 2
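Note that the printed series runs to 2033 because each weekly observation is still treated as one month, which is the first problem listed above. One way to address it is to aggregate the weekly values by calendar month before building the ts object. The base R sketch below uses a made-up weekly series and assumes each week belongs to the month of its start date and that summing is the right aggregation; views, week_start and the other names are illustrative.

```r
# Hypothetical weekly series: one value per week starting 2012-01-29
views <- rep(1L, 60)
week_start <- seq(as.Date("2012-01-29"), by = "week", length.out = length(views))

# Sum the weekly values within each calendar month
# ("YYYY-MM" labels sort chronologically)
monthly_sum <- tapply(views, format(week_start, "%Y-%m"), sum)

# Build a genuine monthly series; start = c(year, month) of the first week
monthly_ts <- ts(as.numeric(monthly_sum), start = c(2012, 1), frequency = 12)

# window() now filters on real calendar time
monthly_2013 <- window(monthly_ts, start = c(2013, 1))
monthly_2013
```

With real data, mean could replace sum if an average weekly level per month is more meaningful than a monthly total.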