I have grouped data that I want to convert to ungrouped data.
year<-c(rep(2014,4),rep(2015,4))
Age<-rep(c(22,23,24,25),2)
n<-c(1,1,3,2,0,2,3,1)
mydata<-data.frame(year,Age,n)
I would like to have a dataset like the one below created from the previous one.
year Age
1 2014 22
2 2014 23
3 2014 24
4 2014 24
5 2014 24
6 2014 25
7 2014 25
8 2015 23
9 2015 23
10 2015 24
11 2015 24
12 2015 24
13 2015 25
Try
mydata[rep(1:nrow(mydata),mydata$n),]
year Age n
1 2014 22 1
2 2014 23 1
3 2014 24 3
3.1 2014 24 3
3.2 2014 24 3
4 2014 25 2
4.1 2014 25 2
6 2015 23 2
6.1 2015 23 2
7 2015 24 3
7.1 2015 24 3
7.2 2015 24 3
8 2015 25 1
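The output above still carries the n column and the duplicated-row names (3.1, 3.2, ...). A minimal sketch of one way to tidy that up in base R, assuming the same mydata as in the question; the names idx and ungrouped are only illustrative:

```r
# Same grouped data as in the question
mydata <- data.frame(
  year = rep(c(2014, 2015), each = 4),
  Age  = rep(c(22, 23, 24, 25), times = 2),
  n    = c(1, 1, 3, 2, 0, 2, 3, 1)
)

# Repeat each row index n times; rows with n = 0 are dropped
idx <- rep(seq_len(nrow(mydata)), times = mydata$n)

# Keep only the columns wanted in the ungrouped result
ungrouped <- mydata[idx, c("year", "Age")]

# Replace row names such as "3.1" with a plain 1..N sequence
rownames(ungrouped) <- NULL
```

seq_len(nrow(mydata)) is used instead of 1:nrow(mydata) so that an empty data frame yields an empty index rather than c(1, 0).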
Here's a tidyverse solution:
library(tidyverse)
mydata %>%
uncount(n)
which gives:
year Age
1 2014 22
2 2014 23
3 2014 24
4 2014 24
5 2014 24
6 2014 25
7 2014 25
8 2015 23
9 2015 23
10 2015 24
11 2015 24
12 2015 24
13 2015 25
You can also use tidyr syntax for this:
library(tidyr)
year<-c(rep(2014,4),rep(2015,4))
Age<-rep(c(22,23,24,25),2)
n<-c(1,1,3,2,0,2,3,1)
mydata<-data.frame(year,Age,n)
uncount(mydata, n)
#> year Age
#> 1 2014 22
#> 2 2014 23
#> 3 2014 24
#> 4 2014 24
#> 5 2014 24
#> 6 2014 25
#> 7 2014 25
#> 8 2015 23
#> 9 2015 23
#> 10 2015 24
#> 11 2015 24
#> 12 2015 24
#> 13 2015 25
But of course you shouldn't use tidyr just because it is tidyr :) See "An alternate view of the Tidyverse 'dialect' of the R language, and its promotion by RStudio".
We can use tidyr::complete
library(tidyr)
library(dplyr)
mydata %>% group_by(year, Age) %>%
complete(n = seq_len(n)) %>%
select(-n) %>%
ungroup()
# A tibble: 14 × 2
year Age
<dbl> <dbl>
1 2014 22
2 2014 23
3 2014 24
4 2014 24
5 2014 24
6 2014 25
7 2014 25
8 2015 23
9 2015 23
10 2015 24
11 2015 24
12 2015 24
13 2015 25
14 2015 22
I have this time series data frame as follows:
df <- read.table(text =
"Year Month Value
2021 1 4
2021 2 11
2021 3 18
2021 4 6
2021 5 20
2021 6 5
2021 7 12
2021 8 4
2021 9 11
2021 10 18
2021 11 6
2021 12 20
2022 1 14
2022 2 11
2022 3 18
2022 4 9
2022 5 22
2022 6 19
2022 7 22
2022 8 24
2022 9 17
2022 10 28
2022 11 16
2022 12 26",
header = TRUE)
I want to turn this data frame into a time series object with only a date column and a value column, so that I can use the ts function to set the start and end points, like ts(ts, start = starts, frequency = 12). R should know that 2022 is a year and that the corresponding 1:12 are its months; the same should apply to 2021. I would prefer the lubridate package.
pacman::p_load(
dplyr,
lubridate)
UPDATE
I now use the unite function from the tidyr package.
df|>
unite(col='date', c('Year', 'Month'), sep='')
Perhaps this?
df |>
tidyr::unite(col='date', c('Year', 'Month'), sep='-') |>
mutate(date = lubridate::ym(date))
# date Value
# 1 2021-01-01 4
# 2 2021-02-01 11
# 3 2021-03-01 18
# 4 2021-04-01 6
# 5 2021-05-01 20
# 6 2021-06-01 5
# 7 2021-07-01 12
# 8 2021-08-01 4
# 9 2021-09-01 11
# 10 2021-10-01 18
# 11 2021-11-01 6
# 12 2021-12-01 20
# 13 2022-01-01 14
# 14 2022-02-01 11
# 15 2022-03-01 18
# 16 2022-04-01 9
# 17 2022-05-01 22
# 18 2022-06-01 19
# 19 2022-07-01 22
# 20 2022-08-01 24
# 21 2022-09-01 17
# 22 2022-10-01 28
# 23 2022-11-01 16
# 24 2022-12-01 26
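Since the question ultimately asks for a ts object, note that ts() does not need a date column at all: it only needs the values plus the starting year/month and the frequency. A minimal base R sketch, assuming the rows are already in chronological order with no missing months; value_ts and ts_2022_h1 are illustrative names:

```r
# Values copied from the question's data frame (Jan 2021 to Dec 2022)
df <- data.frame(
  Year  = rep(c(2021, 2022), each = 12),
  Month = rep(1:12, times = 2),
  Value = c(4, 11, 18, 6, 20, 5, 12, 4, 11, 18, 6, 20,
            14, 11, 18, 9, 22, 19, 22, 24, 17, 28, 16, 26)
)

# Monthly series: start is c(year, month) of the first row, 12 periods per year
value_ts <- ts(df$Value, start = c(df$Year[1], df$Month[1]), frequency = 12)

# window() extracts a sub-period by start and end point
ts_2022_h1 <- window(value_ts, start = c(2022, 1), end = c(2022, 6))
```

Avoid naming the object ts, as in the question's ts(ts, ...) call, because that makes the code harder to read alongside the ts() function.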
I have a dataset where I have been able to loop over different test values with ppois. For simplicity's sake, I have used an average of 4 events per month, and I wanted to know the likelihood of n or more events, given the average. Here is what I have managed to make work:
MonthlyAverage <- 4
cnt <- c(0:10)
for (i in cnt) {
CountProb <- ppois(cnt,MonthlyAverage,lower.tail=FALSE)
}
dfProb <- data.frame(cnt,CountProb)
I am interested in investigating this to figure out how many events I may expect each month given the mean of that month.
I would be looking to say:
For January, what is the probability of 0
For January, what is the probability of 1
For January, what is the probability of 2
etc...
For February, what is the probability of 0
For February, what is the probability of 1
For February, what is the probability of 2
etc.
To give something like (numbers here are just an example):
I thought about trying one loop to select the correct month and then remove the month column so I am just left with the single "Monthly Average" value and then performing the count loop, but that doesn't seem to work. I still get "Non-numeric argument to mathematical function". I feel like I'm close, but can anyone please point me in the right direction for the formatting?
A "tidy-style" solution:
library(tidyr)
library(dplyr)
## example data:
df <- data.frame(Month = c('Jan', 'Feb'),
MonthlyAverage = c(5, 2)
)
> df
Month MonthlyAverage
1 Jan 5
2 Feb 2
df |>
mutate(n = list(1:10)) |>
unnest_longer(n) |>
mutate(CountProb = ppois(n, MonthlyAverage,
lower.tail=FALSE
)
)
# A tibble: 20 x 4
Month MonthlyAverage n CountProb
<chr> <dbl> <int> <dbl>
1 Jan 5 1 0.960
2 Jan 5 2 0.875
3 Jan 5 3 0.735
4 Jan 5 4 0.560
5 Jan 5 5 0.384
6 Jan 5 6 0.238
## ...
How about something like this:
cnt <- 0:10
MonthlyAverage <- c(1.8, 1.56, 2.44, 1.86, 2.1, 2.3, 2, 2.78, 1.89, 1.86, 1.4, 1.71)
grid <- expand.grid(cnt =cnt, m_num = 1:12)
grid$MonthlyAverage <- MonthlyAverage[grid$m_num]
mnames <- c("Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")
grid$month <- mnames[grid$m_num]
grid$prob <- ppois(grid$cnt, grid$MonthlyAverage, lower.tail=FALSE)
grid[,c("month", "cnt", "prob")]
#> month cnt prob
#> 1 Jan 0 8.347011e-01
#> 2 Jan 1 5.371631e-01
#> 3 Jan 2 2.693789e-01
#> 4 Jan 3 1.087084e-01
#> 5 Jan 4 3.640666e-02
#> 6 Jan 5 1.037804e-02
#> 7 Jan 6 2.569450e-03
#> 8 Jan 7 5.615272e-04
#> 9 Jan 8 1.097446e-04
#> 10 Jan 9 1.938814e-05
#> 11 Jan 10 3.123964e-06
#> 12 Feb 0 7.898639e-01
#> 13 Feb 1 4.620517e-01
#> 14 Feb 2 2.063581e-01
#> 15 Feb 3 7.339743e-02
#> 16 Feb 4 2.154277e-02
#> 17 Feb 5 5.364120e-03
#> 18 Feb 6 1.157670e-03
#> 19 Feb 7 2.202330e-04
#> 20 Feb 8 3.743272e-05
#> 21 Feb 9 5.747339e-06
#> 22 Feb 10 8.044197e-07
#> 23 Mar 0 9.128391e-01
#> 24 Mar 1 7.001667e-01
#> 25 Mar 2 4.407062e-01
#> 26 Mar 3 2.296784e-01
#> 27 Mar 4 1.009515e-01
#> 28 Mar 5 3.813271e-02
#> 29 Mar 6 1.258642e-02
#> 30 Mar 7 3.681711e-03
#> 31 Mar 8 9.657751e-04
#> 32 Mar 9 2.294546e-04
#> 33 Mar 10 4.979244e-05
#> 34 Apr 0 8.443274e-01
#> 35 Apr 1 5.547763e-01
#> 36 Apr 2 2.854938e-01
#> 37 Apr 3 1.185386e-01
#> 38 Apr 4 4.090445e-02
#> 39 Apr 5 1.202455e-02
#> 40 Apr 6 3.071778e-03
#> 41 Apr 7 6.928993e-04
#> 42 Apr 8 1.398099e-04
#> 43 Apr 9 2.550478e-05
#> 44 Apr 10 4.244028e-06
#> 45 May 0 8.775436e-01
#> 46 May 1 6.203851e-01
#> 47 May 2 3.503686e-01
#> 48 May 3 1.613572e-01
#> 49 May 4 6.212612e-02
#> 50 May 5 2.044908e-02
#> 51 May 6 5.862118e-03
#> 52 May 7 1.486029e-03
#> 53 May 8 3.373058e-04
#> 54 May 9 6.927041e-05
#> 55 May 10 1.298297e-05
#> 56 Jun 0 8.997412e-01
#> 57 Jun 1 6.691458e-01
#> 58 Jun 2 4.039612e-01
#> 59 Jun 3 2.006529e-01
#> 60 Jun 4 8.375072e-02
#> 61 Jun 5 2.997569e-02
#> 62 Jun 6 9.361934e-03
#> 63 Jun 7 2.588841e-03
#> 64 Jun 8 6.415773e-04
#> 65 Jun 9 1.439431e-04
#> 66 Jun 10 2.948727e-05
#> 67 Jul 0 8.646647e-01
#> 68 Jul 1 5.939942e-01
#> 69 Jul 2 3.233236e-01
#> 70 Jul 3 1.428765e-01
#> 71 Jul 4 5.265302e-02
#> 72 Jul 5 1.656361e-02
#> 73 Jul 6 4.533806e-03
#> 74 Jul 7 1.096719e-03
#> 75 Jul 8 2.374473e-04
#> 76 Jul 9 4.649808e-05
#> 77 Jul 10 8.308224e-06
#> 78 Aug 0 9.379615e-01
#> 79 Aug 1 7.654944e-01
#> 80 Aug 2 5.257652e-01
#> 81 Aug 3 3.036162e-01
#> 82 Aug 4 1.492226e-01
#> 83 Aug 5 6.337975e-02
#> 84 Aug 6 2.360590e-02
#> 85 Aug 7 7.809999e-03
#> 86 Aug 8 2.320924e-03
#> 87 Aug 9 6.254093e-04
#> 88 Aug 10 1.540564e-04
#> 89 Sep 0 8.489282e-01
#> 90 Sep 1 5.634025e-01
#> 91 Sep 2 2.935807e-01
#> 92 Sep 3 1.235929e-01
#> 93 Sep 4 4.327373e-02
#> 94 Sep 5 1.291307e-02
#> 95 Sep 6 3.349459e-03
#> 96 Sep 7 7.672845e-04
#> 97 Sep 8 1.572459e-04
#> 98 Sep 9 2.913775e-05
#> 99 Sep 10 4.925312e-06
#> 100 Oct 0 8.443274e-01
#> 101 Oct 1 5.547763e-01
#> 102 Oct 2 2.854938e-01
#> 103 Oct 3 1.185386e-01
#> 104 Oct 4 4.090445e-02
#> 105 Oct 5 1.202455e-02
#> 106 Oct 6 3.071778e-03
#> 107 Oct 7 6.928993e-04
#> 108 Oct 8 1.398099e-04
#> 109 Oct 9 2.550478e-05
#> 110 Oct 10 4.244028e-06
#> 111 Nov 0 7.534030e-01
#> 112 Nov 1 4.081673e-01
#> 113 Nov 2 1.665023e-01
#> 114 Nov 3 5.372525e-02
#> 115 Nov 4 1.425330e-02
#> 116 Nov 5 3.201149e-03
#> 117 Nov 6 6.223149e-04
#> 118 Nov 7 1.065480e-04
#> 119 Nov 8 1.628881e-05
#> 120 Nov 9 2.248494e-06
#> 121 Nov 10 2.828495e-07
#> 122 Dec 0 8.191342e-01
#> 123 Dec 1 5.098537e-01
#> 124 Dec 2 2.454189e-01
#> 125 Dec 3 9.469102e-02
#> 126 Dec 4 3.025486e-02
#> 127 Dec 5 8.217692e-03
#> 128 Dec 6 1.937100e-03
#> 129 Dec 7 4.028407e-04
#> 130 Dec 8 7.489285e-05
#> 131 Dec 9 1.258275e-05
#> 132 Dec 10 1.927729e-06
Created on 2023-01-09 by the reprex package (v2.0.1)
If you have each month's mean, in base R you could easily use sapply to estimate the probability of obtaining values 0 to 10 using each month's mean value. Then you can simply combine it in a data frame:
# Data
df <- data.frame(month = month.name,
mean = c(1.8, 2.8, 1.7, 1.6, 1.8, 2,
2.3, 2.4, 2.1, 1.4, 1.9, 1.9))
probs <- sapply(1:12, function(x) ppois(0:10, df$mean[x], lower.tail = FALSE))
finaldata <- data.frame(month = rep(month.name, each = 11),
events = rep(0:10, times = 12),
prob = as.vector(probs))
Output:
# month events prob
# 1 January 0 8.347011e-01
# 2 January 1 5.371631e-01
# 3 January 2 2.693789e-01
# 4 January 3 1.087084e-01
# 5 January 4 3.640666e-02
# 6 January 5 1.037804e-02
# 7 January 6 2.569450e-03
# 8 January 7 5.615272e-04
# 9 January 8 1.097446e-04
# 10 January 9 1.938814e-05
# 11 January 10 3.123964e-06
# 12 February 0 9.391899e-01
# 13 February 1 7.689218e-01
# 14 February 2 5.305463e-01
# 15 February 3 3.080626e-01
# ...
# 131 December 9 3.044317e-05
# 132 December 10 5.172695e-06
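One caveat that applies to all of the ppois calls above: with lower.tail = FALSE, ppois(q, lambda) returns P(X > q), not P(X >= q). If "n or more events" is what is wanted, as the question states, the quantile has to be shifted to n - 1. A minimal sketch using the question's average of 4; the variable names are illustrative:

```r
lambda <- 4      # monthly average from the question
n <- 0:10        # event counts of interest

# P(X > n): what ppois(n, ..., lower.tail = FALSE) actually returns
p_more_than_n <- ppois(n, lambda, lower.tail = FALSE)

# P(X >= n): "n or more events", obtained by shifting the quantile down by one
p_n_or_more <- ppois(n - 1, lambda, lower.tail = FALSE)

# Cross-check against the definition: 1 - sum of P(X = k) for k = 0..n-1
p_check <- sapply(n, function(k) 1 - sum(dpois(seq_len(k) - 1, lambda)))

result <- data.frame(n, p_more_than_n, p_n_or_more)
```

The two columns differ by exactly dpois(n, lambda), so the distinction matters most for small counts.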
I have a time series object ts. I have mentioned the entire object here. It has data from Jan 2013 to Dec 2017 for all years. I am trying to find the daily average value so that the value is divided by the number of days in a month.
Expected output
The first value for Jan 2013 in ts is 23770, I want the value to be 23770/31 where 31 is the number of days in Jan, second value for Feb 2013 is 23482. I want the value to be 23482/28 as 28 was the number of days in Feb 2013 and so on
Tried so far:
I know monthdays() can do this, with something like ts/monthdays(); monthdays() returns the number of days in each month. I am not able to implement it here. I read about tapply somewhere, but it is not giving me the desired result, since I need values corresponding to each month-year combination.
ts
Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
2013 23770 23482 23601 22889 23401 24240 23873 23647 23378 23871 22624 23496
2014 26765 27619 26341 27320 27389 27418 26874 27005 27538 26324 27267 27583
2015 28354 27452 28336 28998 28595 28338 27806 28660 27226 28317 28666 28574
2016 30209 30659 31554 30248 30358 31091 30389 30247 31227 31839 30602 30609
2017 32180 32203 31639 31784 32375 30856 31863 32827 32506 31702 31681 32176
> cycle(ts_actual_group2)
Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
2013 1 2 3 4 5 6 7 8 9 10 11 12
2014 1 2 3 4 5 6 7 8 9 10 11 12
2015 1 2 3 4 5 6 7 8 9 10 11 12
2016 1 2 3 4 5 6 7 8 9 10 11 12
2017 1 2 3 4 5 6 7 8 9 10 11 12
Using tapply, since I read about it, but this is not giving the desired output:
tapply(ts_actual_group2, cycle(ts_actual_group2), mean)
1 2 3 4 5 6 7 8 9 10 11 12
28255.6 28283.0 28294.2 28247.8 28423.6 28388.6 28161.0 28477.2 28375.0 28410.6 28168.0 28487.6
I am not able to implement it here.
I'm not sure why you couldn't. The monthdays function from the forecast package, when applied to a ts object, returns the number of days in each month of the series. The object returned is a time-series of the same dimension as the input. So you can simply divide them.
library(forecast)
ts/monthdays(ts)
Jan Feb Mar Apr May Jun Jul
2013 766.7742 838.6429 761.3226 762.9667 754.8710 808.0000
2014 863.3871 986.3929 849.7097 910.6667 883.5161 913.9333
2015 914.6452 980.4286 914.0645 966.6000 922.4194 944.6000
2016 974.4839 1057.2069 1017.8710 1008.2667 979.2903 1036.3667
2017 1038.0645 1150.1071 1020.6129 1059.4667 1044.3548 1028.5333
monthdays(ts) # Accepts a time-series object
Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
2013 31 28 31 30 31 30 31 31 30 31 30 31
2014 31 28 31 30 31 30 31 31 30 31 30 31
2015 31 28 31 30 31 30 31 31 30 31 30 31
2016 31 29 31 30 31 30 31 31 30 31 30 31
2017 31 28 31 30 31 30 31 31 30 31 30 31
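If the forecast package is not available, the same divisor can be computed in base R. This is only a sketch, not how monthdays is implemented: days_per_month is a hypothetical helper name, and it assumes a monthly series (frequency 12).

```r
# Hypothetical helper: number of days in each month of a monthly ts
days_per_month <- function(x) {
  stopifnot(is.ts(x), frequency(x) == 12)
  yr <- as.integer(floor(time(x) + 1e-8))   # calendar year of each observation
  mo <- as.integer(cycle(x))                # month number 1..12
  first <- as.Date(sprintf("%d-%02d-01", yr, mo))
  # First day of the following month (December rolls over to next January)
  next_first <- as.Date(sprintf("%d-%02d-01", yr + (mo == 12L), mo %% 12L + 1L))
  as.numeric(next_first - first)
}

# First three values of the series in the question (Jan to Mar 2013)
x <- ts(c(23770, 23482, 23601), start = c(2013, 1), frequency = 12)
daily_avg <- x / days_per_month(x)
```

Taking the difference between consecutive month starts handles leap years without a lookup table.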
I am running a GAMM using the mgcv package. The model runs fine and gives output that makes sense, but when I use vis.gam(plot.type="persp"), my graph appears like this:
[image: persp plot]
Why is this happening? When I use vis.gam(plot.type="contour") there is no area which is transparent.
It appears not to be simply a problem with the heat color palette; the same thing happens when I change the color scheme of the "persp" plot:
[image: persp plot, "topo" colour]
The contour plot is completely filled while the persp plot is still transparent at the top.
Data:
logcpue assnage distkm fsamplingyr
1 -1.5218399 7 3.490 2015
2 -1.6863990 4 3.490 2012
3 -1.4534337 6 3.490 2014
4 -1.5207723 5 3.490 2013
5 -2.4061258 2 3.490 2010
6 -2.5427262 3 3.490 2011
7 -1.6177367 3 3.313 1998
8 -4.4067192 10 3.313 2005
9 -4.3438054 11 3.313 2006
10 -2.8834031 7 3.313 2002
11 -2.3182512 2 3.313 1997
12 -4.1108738 1 3.235 2010
13 -2.0149030 3 3.235 2012
14 -1.4900912 6 3.235 2015
15 -3.7954892 2 3.235 2011
16 -1.6499840 4 3.235 2013
17 -1.9924302 5 3.235 2014
18 -1.2122716 4 3.189 1998
19 -0.6675703 3 3.189 1997
20 -4.7957905 7 3.106 1998
21 -3.8763958 6 3.106 1997
22 -1.2205021 4 3.073 2010
23 -1.9262374 7 3.073 2013
24 -3.3463891 9 3.073 2015
25 -1.7805862 2 3.073 2008
26 -3.2451931 8 3.073 2014
27 -1.4441139 5 3.073 2011
28 -1.4395389 6 3.073 2012
29 -1.6357552 4 2.876 2014
30 -1.3449091 5 2.876 2015
31 -2.3782225 3 2.876 2013
32 -4.4886364 1 2.876 2011
33 -2.6026897 2 2.876 2012
34 -3.5765503 1 2.147 2002
35 -4.8040211 9 2.147 2010
36 -1.3993664 5 2.147 2006
37 -1.2712250 4 2.147 2005
38 -1.8495790 7 2.147 2008
39 -2.5073795 1 2.034 2012
40 -2.0654553 4 2.034 2015
41 -3.6309855 2 2.034 2013
42 -2.2643639 3 2.034 2014
43 -2.2643639 6 1.452 2006
44 -3.3900241 8 1.452 2008
45 -4.9628446 2 1.452 2002
46 -2.0088240 5 1.452 2005
47 -3.9186675 1 1.323 2013
48 -4.3438054 2 1.323 2014
49 -3.5695327 3 1.323 2015
50 -1.6986690 7 1.200 2005
51 -3.2451931 8 1.200 2006
52 -0.9024016 4 1.200 2002
library(mgcv)
f1 <- formula(logcpue ~ s(assnage)+distkm)
m1 <- gamm(f1,random = list(fsamplingyr =~ 1),
method = "REML",
data =ycsnew)
vis.gam(m1$gam,color="topo",plot.type = "persp",theta=180)
vis.gam(m1$gam,color="heat",plot.type = "persp",theta=180)
vis.gam(m1$gam,view=c("assnage","distkm"),
plot.type="contour",color="heat",las=1)
vis.gam(m1$gam,view=c("assnage","distkm"),
plot.type="contour",color="terrain",las=1,contour.col="black")
The code of vis.gam has this:
surf.col[surf.col > max.z * 2] <- NA
I am unable to understand what it is doing, and it appears to be rather ad hoc. NA color values are generally drawn as transparent. If you comment out that line in a copy of the function (named vis.gam2 here) and assign the copy's environment with:
environment(vis.gam2) <- environment(vis.gam)
you get complete coloring of the surface.
My question is trivial, but somehow I cannot find how to sort numbers. I would like the result to be ordered by group and rank (1,2,3,4,5,6,7,8,9,10,11,12,13,14).
means <- ddply(Data, ~Group ~rank, summarise, mean=mean(Foo))
#My column types
str(means)
#'data.frame': 56 obs. of 3 variables:
# $ Group: chr "dEC" "dEC" "dEC" "dEC" ...
# $ rank : chr "1" "10" "11" "12" ...
# $ mean : num 41.4 67.4 NA 65.9 71.3 ...
#means
Group rank mean
1 dEC 1 41.37500
2 dEC 10 67.37500
3 dEC 11 NA
4 dEC 12 65.88889
5 dEC 13 71.33333
6 dEC 14 69.87500
7 dEC 2 60.87500
8 dEC 3 65.75000
9 dEC 4 66.00000
10 dEC 5 64.50000
11 dEC 6 70.25000
12 dEC 7 66.75000
13 dEC 8 65.12500
14 dEC 9 68.75000
15 Sham - dEC 1 46.90909
16 Sham - dEC 10 67.54545
17 Sham - dEC 11 68.90909
18 Sham - dEC 12 70.00000
19 Sham - dEC 13 68.36364
20 Sham - dEC 14 71.27273
21 Sham - dEC 2 55.72727
22 Sham - dEC 3 62.09091
23 Sham - dEC 4 61.54545
24 Sham - dEC 5 66.09091
25 Sham - dEC 6 67.63636
26 Sham - dEC 7 66.09091
27 Sham - dEC 8 65.90909
28 Sham - dEC 9 65.81818
#Desired results
#Ordered means
Group rank mean
1 dEC 1 41.37500
7 dEC 2 60.87500
8 dEC 3 65.75000
9 dEC 4 66.00000
10 dEC 5 64.50000
11 dEC 6 70.25000
12 dEC 7 66.75000
13 dEC 8 65.12500
14 dEC 9 68.75000
2 dEC 10 67.37500
3 dEC 11 NA
4 dEC 12 65.88889
5 dEC 13 71.33333
6 dEC 14 69.87500
15 Sham - dEC 1 46.90909
21 Sham - dEC 2 55.72727
22 Sham - dEC 3 62.09091
23 Sham - dEC 4 61.54545
24 Sham - dEC 5 66.09091
25 Sham - dEC 6 67.63636
26 Sham - dEC 7 66.09091
27 Sham - dEC 8 65.90909
28 Sham - dEC 9 65.81818
16 Sham - dEC 10 67.54545
17 Sham - dEC 11 68.90909
18 Sham - dEC 12 70.00000
19 Sham - dEC 13 68.36364
20 Sham - dEC 14 71.27273
The rank column is not numeric, so it sorts alphabetically ("10" before "2"). We convert it from 'character' to 'numeric' and order by the columns 'Group' and 'rank':
means[with(means, order(Group,as.numeric(rank))),]
Or another option would be arrange from plyr (as commented by @Wistar)
library(plyr)
arrange(means, Group, as.numeric(rank))
If we are using dplyr, all the steps can be chained together (not tested)
library(dplyr)
Data %>%
group_by(Group, rank) %>%
summarise(mean=mean(Foo)) %>%
arrange(Group, as.numeric(rank))
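To see why the conversion matters, here is a small self-contained sketch on toy data (the values are made up, not the asker's Data). Character ranks sort as text, so "10" comes before "2"; converting inside order() fixes that without changing the column itself:

```r
# Toy data: rank stored as character, as in str(means) in the question
means <- data.frame(
  Group = rep(c("dEC", "Sham - dEC"), each = 3),
  rank  = c("1", "10", "2", "1", "10", "2"),
  mean  = c(41.4, 67.4, 60.9, 46.9, 67.5, 55.7),
  stringsAsFactors = FALSE
)

# Text ordering: within each group the ranks come out as 1, 10, 2
by_text <- means[order(means$Group, means$rank), ]

# Numeric ordering: within each group the ranks come out as 1, 2, 10
by_number <- means[order(means$Group, as.numeric(means$rank)), ]
```

If the ranks are needed as numbers later on, converting the column once with means$rank <- as.numeric(means$rank) avoids repeating as.numeric in every call.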