How to dynamically loop through a split dataframe in R

I have a data frame, df3:
df3
x d y
1 bbc Sep 2020 123
2 rsb Sep 2020 234
3 atc Sep 2020 345
4 svc Sep 2020 543
5 mwe Sep 2020 567
6 bpa Oct 2020 322
7 mwe Oct 2020 456
8 uhs Oct 2020 786
9 se Oct 2020 543
10 db Oct 2020 778
11 rsb Nov 2020 358
12 svc Nov 2020 678
13 db Nov 2020 321
14 rb Nov 2020 689
15 bpa Nov 2020 765
After applying the split function, I get:
> df5 = split(df3,f=df3$d)
> df5
$`Sep 2020`
x d y
1 bbc Sep 2020 123
2 rsb Sep 2020 234
3 atc Sep 2020 345
4 svc Sep 2020 543
5 mwe Sep 2020 567
$`Oct 2020`
x d y
6 bpa Oct 2020 322
7 mwe Oct 2020 456
8 uhs Oct 2020 786
9 se Oct 2020 543
10 db Oct 2020 778
$`Nov 2020`
x d y
11 rsb Nov 2020 358
12 svc Nov 2020 678
13 db Nov 2020 321
14 rb Nov 2020 689
15 bpa Nov 2020 765
I would like to loop through the split data frame dynamically.
I need to find out whether any values of x present in Nov 2020 are also present in Oct 2020.
If a name is present in both, the previous month, Sep 2020, has to be checked as well, and so on, counting the number of months in which the name has occurred. Here df3$d is in as.yearmon format. For example, if any names in df5[["Nov 2020"]]$x are also present in df5[["Oct 2020"]]$x, they should be extracted and stored in an object along with their count. Here the count is 2, since the names are present in Nov 2020 and Oct 2020. Earlier months should be checked only for names that are present in the most recent month. For this example, the output should be:
> df4
names_present present_for
1 bpa 2
2 db 2
Thank you in advance
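One possible way to loop through the split list, as a minimal sketch rather than a definitive answer. It assumes d is stored as a character string such as "Nov 2020" so that the example runs in base R (with zoo's yearmon the month number could be derived from the yearmon value instead). The names month_id, streak and alive are hypothetical helpers introduced only for this illustration.

```r
df3 <- data.frame(
  x = c("bbc", "rsb", "atc", "svc", "mwe", "bpa", "mwe", "uhs", "se", "db",
        "rsb", "svc", "db", "rb", "bpa"),
  d = rep(c("Sep 2020", "Oct 2020", "Nov 2020"), each = 5),
  y = c(123, 234, 345, 543, 567, 322, 456, 786, 543, 778, 358, 678, 321, 689, 765),
  stringsAsFactors = FALSE
)

# Assumption: d looks like "Mon YYYY"; turn it into a running month number
month_id <- as.integer(substr(df3$d, 5, 8)) * 12L +
  match(substr(df3$d, 1, 3), month.abb)

# split() orders the groups by month_id; rev() puts the most recent month first
df5 <- rev(split(df3, month_id))
ids <- as.integer(names(df5))

# Every name in the most recent month starts with a run length of 1
latest <- unique(df5[[1]]$x)
streak <- setNames(rep(1L, length(latest)), latest)
alive <- latest                               # names whose run is still unbroken

for (i in seq_along(df5)[-1]) {
  if (ids[i] != ids[i - 1] - 1L) break        # calendar gap: every run ends here
  alive <- intersect(alive, df5[[i]]$x)       # keep only names seen in this month too
  if (length(alive) == 0) break
  streak[alive] <- streak[alive] + 1L
}

df4 <- data.frame(names_present = names(streak), present_for = unname(streak),
                  stringsAsFactors = FALSE)
df4 <- df4[df4$present_for > 1, ]
df4 <- df4[order(df4$names_present), ]
rownames(df4) <- NULL
df4
```

The loop stops as soon as no name from the most recent month survives, so older months are only inspected while at least one run is still unbroken.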

Related

Find the number of times a name occurs in each 'month year' using R

I have a data frame like this:
x d y
bbc Sep 2020 123
rsb Sep 2020 234
atc Sep 2020 345
svc Sep 2020 543
mwe Sep 2020 567
bpa Oct 2020 322
mwe Oct 2020 456
uhs Oct 2020 786
se Oct 2020 543
db Oct 2020 778
rsb Nov 2020 358
svc Nov 2020 678
db Nov 2020 321
rb Nov 2020 689
bpa Nov 2020 765
The column 'd' is in as.yearmon format.
I want to find the values in column x that are repeated across the month-year values in column d.
In this data frame, bpa and db are present in both Nov 2020 and Oct 2020, so the output needs to look like this:
names_present present_for
bpa 2
db 2
A value of x should appear in the output only if it is present in consecutive months. Here svc is present in Nov 2020 and Sep 2020, but it cannot be part of the output because it is absent in Oct 2020.
I tried splitting the data frame on the df$d column, but I couldn't loop through the split data frames and get the required output. The split data frame looks like this:
$`Nov 2020`
x d y
11 rsb Nov 2020 358
12 svc Nov 2020 678
13 db Nov 2020 321
14 rb Nov 2020 689
15 bpa Nov 2020 765
$`Oct 2020`
x d y
6 bpa Oct 2020 322
7 mwe Oct 2020 456
8 uhs Oct 2020 786
9 se Oct 2020 543
10 db Oct 2020 778
$`Sep 2020`
x d y
1 bbc Sep 2020 123
2 rsb Sep 2020 234
3 atc Sep 2020 345
4 svc Sep 2020 543
5 mwe Sep 2020 567
The code needs to check the most recent month first, then the previous month, and so on: check which x names are present in Nov 2020, then in Oct 2020, then in Sep 2020, and count their occurrences. If a name is present in Nov 2020 and Sep 2020 but not in Oct 2020, it cannot be part of the output data frame. There can be any number of month-year values; the data frame given here is just a small sample. I want to find this information dynamically.
I've been struggling with this for a long time. Would be great if anyone could help me solve this. Thank you in advance.
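A loop-free sketch of the "present continuously" rule described above, offered as one possible approach under stated assumptions. It builds a presence matrix (names by months, most recent month first, with missing calendar months included as empty columns) and uses cumprod() to cut each name's run at its first absence. It assumes d is a character string like "Nov 2020"; month_id, present and streak are hypothetical names used only here.

```r
df <- data.frame(
  x = c("bbc", "rsb", "atc", "svc", "mwe", "bpa", "mwe", "uhs", "se", "db",
        "rsb", "svc", "db", "rb", "bpa"),
  d = rep(c("Sep 2020", "Oct 2020", "Nov 2020"), each = 5),
  stringsAsFactors = FALSE
)

# Assumption: d looks like "Mon YYYY"; convert to a running month number
month_id <- as.integer(substr(df$d, 5, 8)) * 12L +
  match(substr(df$d, 1, 3), month.abb)

nm <- sort(unique(df$x))
all_ids <- seq(max(month_id), min(month_id))   # most recent first, gaps included

# present[i, j] is TRUE when name i occurs in month all_ids[j]
present <- vapply(all_ids, function(m) nm %in% df$x[month_id == m],
                  logical(length(nm)))
present <- matrix(present, nrow = length(nm), dimnames = list(nm, NULL))

# cumprod() drops to 0 at the first absence, so the sum is the unbroken run length
streak <- apply(present, 1, function(p) sum(cumprod(p)))

out <- data.frame(names_present = names(streak), present_for = unname(streak),
                  stringsAsFactors = FALSE)
out <- out[out$present_for > 1, ]
rownames(out) <- NULL
out
```

With the sample data, svc gets a run length of 1 because its Oct 2020 entry is missing, so it is filtered out along with every name that is absent from the most recent month.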

Dealing with nonexistent data when converting to time-series in CRAN R

I have the following data set and I am trying to convert the consumption column to a time series. Some months are missing (e.g. there is no data for 10/2014).
year month consumption
2014 7 10617
2014 8 8318
2014 9 3199
2014 12 2066
2015 1 10825
2015 2 3096
2015 3 1665
2015 4 3651
2015 5 5807
2015 7 2951
2015 8 5885
2015 9 3653
2015 10 4266
2015 11 9706
When I use ts() in R, the missing months are filled with the wrong values:
ts(mkt$consumptions, start = c(2014,7),end=c(2015,11), frequency=12)
       Jan   Feb   Mar   Apr   May   Jun   Jul   Aug   Sep   Oct   Nov   Dec
2014                                     10617  8318  3199  2066 10825  3096
2015  1665  3651  5807  2951  5885  3653  4266  9706 10617  8318  3199
My question is: how can I simply replace the nonexistent values with zero or blank?
"ts" class requires that the data be regularly spaced, i.e. every month should be present or NA but that is not the case here. The zoo package can handle irregularly spaced series. Read the input into zoo using the "yearmon" class for the year/month and then simply use it as a "zoo" series or else convert it to "ts". If the input is in a file but otherwise is exactly the same as in Lines then replace text = Lines with something like "myfile.dat" .
Lines <- "year month consumption
2014 7 10617
2014 8 8318
2014 9 3199
2014 12 2066
2015 1 10825
2015 2 3096
2015 3 1665
2015 4 3651
2015 5 5807
2015 7 2951
2015 8 5885
2015 9 3653
2015 10 4266
2015 11 9706"
library(zoo)
toYearmon <- function(y, m) as.yearmon(paste(y, m), "%Y %m")
z <- read.zoo(text = Lines, header = TRUE, index = 1:2, FUN = toYearmon)
as.ts(z)
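Since the question asks for zeros in place of the missing months, here is a minimal base R sketch of the same idea without zoo (a swapped-in technique, not the method of the answer above). It places each observation at its position on a regular monthly grid, leaves the gaps as NA, and then replaces NA with 0. The data frame name mkt and the column name consumption follow the printed data in the question; idx, full, tser and tser0 are hypothetical names.

```r
mkt <- data.frame(
  year  = c(rep(2014, 4), rep(2015, 10)),
  month = c(7, 8, 9, 12, 1, 2, 3, 4, 5, 7, 8, 9, 10, 11),
  consumption = c(10617, 8318, 3199, 2066, 10825, 3096, 1665, 3651, 5807,
                  2951, 5885, 3653, 4266, 9706)
)

# Position of each row on a monthly grid that starts at the first observation
# (assumes the rows are sorted by year and month)
idx <- (mkt$year - mkt$year[1]) * 12 + (mkt$month - mkt$month[1]) + 1

full <- rep(NA_real_, max(idx))      # one slot per calendar month, NA by default
full[idx] <- mkt$consumption

tser <- ts(full, start = c(mkt$year[1], mkt$month[1]), frequency = 12)

tser0 <- tser
tser0[is.na(tser0)] <- 0             # zeros instead of NA, if that is really wanted
tser0
```

Keeping NA (as in tser) is usually safer than zero, because a zero is indistinguishable from a genuine measurement of no consumption.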

How to switch a row dimension to a column dimension in R? [duplicate]

I have a data frame that looks like this:
Year Month Cont
1 2011 Apr 1376
2 2012 Apr 1232
3 2013 Apr 1360
4 2014 Apr 1294
5 2015 Apr 1344
6 2011 Aug 1933
7 2012 Aug 1930
8 2013 Aug 1821
9 2014 Aug 1845
10 2015 Aug 1855
My question is: how can I turn the values in the "Month" column into separate columns? The result should look like this:
Cont Apr Aug
1 2011 1376 1933
2 2012 1232 1930
3 2013 1360 1821
4 2014 1294 1845
5 2015 1344 1855
You can use reshape2:
library(reshape2)
dcast(df, Year~Month, value.var="Cont")
Or tidyr:
library(tidyr)
spread(df, Month, Cont)
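For completeness, a base R sketch of the same long-to-wide step using reshape(), which needs no extra packages. The data frame is rebuilt here from the question's printed values so the example is self-contained; the name wide is hypothetical. (In current tidyr, spread() is superseded by pivot_wider(df, names_from = Month, values_from = Cont).)

```r
df <- data.frame(
  Year  = rep(2011:2015, times = 2),
  Month = rep(c("Apr", "Aug"), each = 5),
  Cont  = c(1376, 1232, 1360, 1294, 1344, 1933, 1930, 1821, 1845, 1855),
  stringsAsFactors = FALSE
)

# One row per Year, one column per Month; reshape() names them Cont.Apr, Cont.Aug
wide <- reshape(df, idvar = "Year", timevar = "Month", direction = "wide")

# Drop the "Cont." prefix so the columns are just the month names
names(wide) <- sub("^Cont\\.", "", names(wide))
rownames(wide) <- NULL
wide
```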
Please refer to the following code:
> dat <- read.table("data.txt", quote="\"", comment.char="")
> dat
V1 V2 V3
1 2011 Apr 1376
2 2012 Apr 1232
3 2013 Apr 1360
4 2014 Apr 1294
5 2015 Apr 1344
6 2011 Aug 1933
7 2012 Aug 1930
8 2013 Aug 1821
9 2014 Aug 1845
10 2015 Aug 1855
> library(reshape2)
> dcast(dat, V1~V2)
Using V3 as value column: use value.var to override.
V1 Apr Aug
1 2011 1376 1933
2 2012 1232 1930
3 2013 1360 1821
4 2014 1294 1845
5 2015 1344 1855

Summarising data frame using maximum counts

I have this data.frame:
counts <- data.frame(year = sort(rep(2000:2009, 12)), month = rep(month.abb,10), count = sample(1:500, 120, replace = T))
First 20 rows of data:
head(counts, 20)
year month count
1 2000 Jan 14
2 2000 Feb 182
3 2000 Mar 462
4 2000 Apr 395
5 2000 May 107
6 2000 Jun 127
7 2000 Jul 371
8 2000 Aug 158
9 2000 Sep 147
10 2000 Oct 41
11 2000 Nov 141
12 2000 Dec 27
13 2001 Jan 72
14 2001 Feb 7
15 2001 Mar 40
16 2001 Apr 351
17 2001 May 342
18 2001 Jun 81
19 2001 Jul 442
20 2001 Aug 389
Let's say I calculate the standard deviation of these data using the usual R code:
library(plyr)
ddply(counts, .(month), summarise, s.d. = sd(count))
month s.d.
1 Apr 145.3018
2 Aug 140.9949
3 Dec 173.9406
4 Feb 127.5296
5 Jan 148.2661
6 Jul 162.4893
7 Jun 133.4383
8 Mar 125.8425
9 May 168.9517
10 Nov 93.1370
11 Oct 167.9436
12 Sep 166.8740
This gives the standard deviation around the mean of each month. How can I get R to output the standard deviation around the maximum value of each month?
It sounds like you want the maximum value per month and the average difference from that maximum (which is not the same as the standard deviation):
counts <- data.frame(year = sort(rep(2000:2009, 12)), month = rep(month.abb,10), count = sample(1:500, 120, replace = T))
library(data.table)
counts=data.table(counts)
counts[,mean(count-max(count)),by=month]
The question is rather vague. If you want to calculate the standard deviation of the differences from the maximum, you can use this code:
> library(plyr)
> ddply(counts, .(month), summarise, sd = sd(count - max(count)))
month sd
1 Apr 182.5071
2 Aug 114.3068
3 Dec 117.1049
4 Feb 184.4638
5 Jan 138.1755
6 Jul 167.0677
7 Jun 100.8841
8 Mar 144.8724
9 May 173.3452
10 Nov 132.0204
11 Oct 127.4645
12 Sep 152.2162
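One caveat worth checking: sd() is unchanged when a constant is subtracted, so sd(count - max(count)) gives the same number as sd(count) for the same data (the printed tables differ only because the data were re-sampled). The sketch below verifies this in base R and, as one possible reading of "standard deviation around the maximum", also computes the root-mean-square deviation from the monthly maximum. That reading is an assumption, and rms_about_max is a hypothetical name.

```r
set.seed(42)  # fixed seed so the sketch is reproducible; the original data are random
counts <- data.frame(
  year  = rep(2000:2009, each = 12),
  month = rep(month.abb, 10),
  count = sample(1:500, 120, replace = TRUE),
  stringsAsFactors = FALSE
)

# Ordinary standard deviation per month
sd_about_mean <- tapply(counts$count, counts$month, sd)

# Subtracting the monthly maximum first: a pure shift, so the sd does not change
sd_shifted <- tapply(counts$count, counts$month, function(v) sd(v - max(v)))
isTRUE(all.equal(sd_about_mean, sd_shifted))

# Assumed interpretation: spread measured from the maximum instead of the mean
rms_about_max <- tapply(counts$count, counts$month,
                        function(v) sqrt(mean((v - max(v))^2)))
rms_about_max
```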

Order dataframe by month

I have calculated the maximum counts per month in this data.frame:
counts <- data.frame(year = sort(rep(2000:2009, 12)), month = rep(month.abb,10), count = sample(1:500, 120, replace = T))
library(plyr)
count_max <- ddply(counts, .(month), summarise, max.count = max(count))
month max.count
1 Apr 470
2 Aug 389
3 Dec 446
4 Feb 487
5 Jan 473
6 Jul 460
7 Jun 488
8 Mar 449
9 May 488
10 Nov 464
11 Oct 483
12 Sep 394
I now want to sort count_max by the month.abb vector, so that month is in the usual order Jan-Dec. This is what I tried:
count_max[match(count_max$month, month.abb),]
...but it didn't work. How can I arrange count_max$month in the order Jan-Dec?
An alternative without conversion:
count_max[order(match(count_max$month, month.abb)), ]
# month max.count
# 5 Jan 466
# 4 Feb 356
# 8 Mar 496
# 1 Apr 489
# 9 May 498
# 7 Jun 497
# 6 Jul 491
# 2 Aug 446
# 12 Sep 414
# 11 Oct 490
# 10 Nov 416
# 3 Dec 475
Note that in your example, match(count...) returns the position of each month in month.abb, which is what you want to sort by. You came very close, but instead of ordering by that vector, you subsetted by it. For example, August is the 2nd value in your original data frame but the 8th value in month.abb, so the 2nd element of the match result is 8. Subsetting by it puts the 8th row of your original data frame (March, in your case) into the 2nd position of the new one, instead of moving the 2nd row of the original into the 8th position.
The distinction is a bit of a brain twister, but it should make sense once you think it through.
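The difference between subsetting by the match positions and ordering by them can be seen on a small stand-in table. This is a sketch with made-up max.count values (1 to 12, not the question's data); pos, wrong and right are hypothetical names.

```r
# Months in the alphabetical order that ddply() produced in the question
count_max <- data.frame(
  month = c("Apr", "Aug", "Dec", "Feb", "Jan", "Jul",
            "Jun", "Mar", "May", "Nov", "Oct", "Sep"),
  max.count = 1:12,            # placeholder values, for illustration only
  stringsAsFactors = FALSE
)

# Calendar position of each row's month (Apr -> 4, Aug -> 8, ...)
pos <- match(count_max$month, month.abb)

# Subsetting by pos: row 2 of the result is row 8 of the input, i.e. March
wrong <- count_max[pos, ]

# Ordering by pos: rows are rearranged so that the positions increase
right <- count_max[order(pos), ]

wrong$month[2]
right$month
```

Here order(pos) answers "which row should come first, second, ..." whereas pos itself answers "where does each row belong", which is why only the former works as a row index.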
Convert your "month" column into an ordered factor:
factor(count_max$month, month.abb, ordered=TRUE)
# [1] Apr Aug Dec Feb Jan Jul Jun Mar May Nov Oct Sep
# Levels: Jan < Feb < Mar < Apr < May < Jun < Jul < Aug < Sep < Oct < Nov < Dec
Example:
count_max$month <- factor(count_max$month, month.abb, ordered=TRUE)
count_max[order(count_max$month), ]
# month max.count
# 5 Jan 482
# 4 Feb 408
# 8 Mar 483
# 1 Apr 489
# 9 May 369
# 7 Jun 432
# 6 Jul 344
# 2 Aug 470
# 12 Sep 474
# 11 Oct 450
# 10 Nov 492
# 3 Dec 366
