Weighted price in a data.frame with R

I have a weekly dataset of prices of a product. This product has many varieties, each with its own price. I am interested in calculating a weighted price depending on the sales volume of each.
I tried to do it with a loop, but it does not work.
Can someone help me?
Here is a minimal example of my dataset:
nrow week variety price volume
1 10 Semiduro 911 15550
2 10 Semiduro 809 13400
3 10 Semiduro 611 15200
4 10 Semiduro 517 17250
5 10 Semiduro 389 4550
6 10 Semiduro 300 1500
7 10 Paisana(o) 1100 19200
8 10 Paisana(o) 726 22900
9 10 Paisana(o) 452 10450
10 11 Semiduro 1362 13250
11 11 Semiduro 1163 7100
12 11 Semiduro 1032 15580
13 11 Semiduro 768 9700
14 11 Semiduro 703 3670
15 11 Semiduro 550 1450
16 11 Paisana(o) 1825 20200
17 11 Paisana(o) 1402 30650
18 11 Paisana(o) 838 9750
19 12 Semiduro 1050 11350
20 12 Semiduro 878 9200

We could use dplyr to group by week and variety, then take the volume-weighted mean of the price:
library(dplyr)
df1 %>%
  group_by(week, variety) %>%
  summarise(wprice = weighted.mean(price, volume))
# week variety wprice
# <int> <chr> <dbl>
#1 10 Paisana(o) 808.1598
#2 10 Semiduro 673.5663
#3 11 Paisana(o) 1452.2574
#4 11 Semiduro 1048.4625
#5 12 Semiduro 972.9976
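For readers without dplyr, the same weighted mean can be sketched in base R. This is only an illustration: it rebuilds the week 10 rows from the question, and the names `groups` and `wprice` are chosen here, not taken from the original post.

```r
# Week 10 rows copied from the question's example data
df1 <- data.frame(
  week    = rep(10, 9),
  variety = c(rep("Semiduro", 6), rep("Paisana(o)", 3)),
  price   = c(911, 809, 611, 517, 389, 300, 1100, 726, 452),
  volume  = c(15550, 13400, 15200, 17250, 4550, 1500, 19200, 22900, 10450),
  stringsAsFactors = FALSE
)

# Split into one data frame per week/variety combination
groups <- split(df1, list(df1$week, df1$variety), drop = TRUE)

# Weighted price = sum(price * volume) / sum(volume) within each group
wprice <- do.call(rbind, lapply(groups, function(g) {
  data.frame(
    week    = g$week[1],
    variety = g$variety[1],
    wprice  = sum(g$price * g$volume) / sum(g$volume)
  )
}))
wprice
```

The manual formula is what `weighted.mean(price, volume)` computes, so the two approaches should agree on the same input.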

Related

How to transpose cells with multiple values?

I'm loading in data from Excel and there are some cells with multiple values. I would like to transpose these cells such that each value gets a row.
For instance, in my data below, I'd have 10 rows for the numbers in id and time that are currently bunched in the first row.
The other values would need to be duplicated. So, as above, I'd repeat run, fish, and boat_speed ten times for the first row.
structure(list(run = c(1, 2, 3, 4, 5, 6), id = c("20 4 4 4 4 4 4 11 11 11",
"18 18 18 18 18 15 15 15 15 21 18 17 17 4 4 4 19", "8 8 8 7 7 7 7 4 4 4 4 4 4 15 15 4 4 4 4 18 18 18 18",
"7 7 7 5 16 12 12 12 4", "21 21 21 21 21 21 8 6 6 6 6 6 6 9 9 9 4 4 4 4",
"5 13 13 13 13 8"), time = c("550 1574 1575 1638 1639 1640 1641 2116 2117 2118",
"632 633 637 638 639 880 881 882 883 1365 1413 1567 1569 2204 2205 2206 2214",
"82 83 84 961 962 963 964 1527 1528 1529 1544 1545 1585 1596 1597 1649 1650 1651 1652 2001 2002 2003 2033",
"734 735 736 1119 1376 1674 1675 1676 1869", "420 421 422 423 424 425 469 926 927 936 937 938 939 1353 1354 1355 2035 2036 2037 2038",
"14 587 588 589 590 4455"), fish = c(20, 20, 20, 20, 20, 20),
boat_speed = c(0.05, 0.05, 0.05, 0.05, 0.05, 0.05)), row.names = c(NA,
-6L), class = c("tbl_df", "tbl", "data.frame"))
The tidyr::separate_rows function does exactly this. Assuming your data are stored in a data frame called df:
library(tidyverse)
df %>%
separate_rows(c(id, time))
run id time fish boat_speed
<dbl> <chr> <chr> <dbl> <dbl>
1 1 20 550 20 0.05
2 1 4 1574 20 0.05
3 1 4 1575 20 0.05
4 1 4 1638 20 0.05
5 1 4 1639 20 0.05
6 1 4 1640 20 0.05
7 1 4 1641 20 0.05
8 1 11 2116 20 0.05
9 1 11 2117 20 0.05
10 1 11 2118 20 0.05
# … with 75 more rows
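If tidyr is not available, a base R sketch of the same idea is possible with strsplit. The data frame below is a shortened, made-up version of the question's data (two rows only), and it assumes id and time always hold the same number of space-separated values in each row.

```r
# Shortened, hypothetical version of the question's data
df <- data.frame(
  run        = c(1, 2),
  id         = c("20 4 4", "18 15"),
  time       = c("550 1574 1575", "632 880"),
  fish       = c(20, 20),
  boat_speed = c(0.05, 0.05),
  stringsAsFactors = FALSE
)

# Split each cell on single spaces
ids   <- strsplit(df$id, " ", fixed = TRUE)
times <- strsplit(df$time, " ", fixed = TRUE)

# Number of values per original row; id and time must line up
n <- lengths(ids)
stopifnot(identical(n, lengths(times)))

# Repeat the scalar columns n times and flatten the split columns
long <- data.frame(
  run        = rep(df$run, n),
  id         = as.numeric(unlist(ids)),
  time       = as.numeric(unlist(times)),
  fish       = rep(df$fish, n),
  boat_speed = rep(df$boat_speed, n)
)
long
```

Unlike the separate_rows output above, this sketch converts id and time to numeric; drop the as.numeric() calls to keep them as character.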

How can I write a command in R that groups by multiple criteria?

I am looking for a function that classifies my data into five different industries based on their SIC codes:
Permno SIC Industry
1 854
2 977
3 549
4 1231
5 3295
6 2000
7 1539
8 2549
9 3950
10 4758
11 4290
12 5498
13 5248
14 142
15 3209
16 2759
17 4859
18 2569
19 739
20 4529
For example, all SICs between 100-200 and 400-700 should be in Industry 1, all SICs between 300-350 and 980-1020 should be in Industry 2, etc.
So in short: an 'if/or' function where I could list all the SICs that match a given industry.
Thank you!
You can add a new column and fill it by filtering on the SIC value.
For example:
data$Group <- 0
data$Group[data$SIC < 1000] <- 1
data$Group[data$SIC >= 1000] <- 2
Alternatively, if every block of 1000 SIC codes forms one industry, divide the SIC value by 1000 and floor the result:
df$Industry <- floor(df$SIC/1000) + 1
df
# Permno SIC Industry
#1 1 854 1
#2 2 977 1
#3 3 549 1
#4 4 1231 2
#5 5 3295 4
#6 6 2000 3
#7 7 1539 2
#8 8 2549 3
#9 9 3950 4
#10 10 4758 5
#11 11 4290 5
#12 12 5498 6
#13 13 5248 6
#14 14 142 1
#15 15 3209 4
#16 16 2759 3
#17 17 4859 5
#18 18 2569 3
#19 19 739 1
#20 20 4529 5
If there is no way to programmatically define groups you may need to individually define the ranges. It is convenient to do this with case_when in dplyr.
library(dplyr)
df %>%
  mutate(Industry = case_when(
    between(SIC, 100, 200) | between(SIC, 400, 700) ~ 'Industry 1',
    between(SIC, 300, 350) | between(SIC, 980, 1020) ~ 'Industry 2'))
# SICs that match none of the conditions get NA
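When the list of ranges grows, it may be easier to keep them in a lookup table than in code. The following base R sketch assumes a hypothetical table `ranges` holding the two industries from the question and a helper `classify_sic` written for this example; neither comes from the original answers.

```r
# Hypothetical lookup table: each row is an inclusive SIC range
ranges <- data.frame(
  lower    = c(100, 400, 300, 980),
  upper    = c(200, 700, 350, 1020),
  industry = c("Industry 1", "Industry 1", "Industry 2", "Industry 2"),
  stringsAsFactors = FALSE
)

# Assign each SIC the industry of the first range that contains it
classify_sic <- function(sic, ranges) {
  out <- rep(NA_character_, length(sic))
  for (i in seq_len(nrow(ranges))) {
    hit <- !is.na(sic) & sic >= ranges$lower[i] & sic <= ranges$upper[i]
    out[hit & is.na(out)] <- ranges$industry[i]
  }
  out
}

sic <- c(854, 977, 549, 1231, 142, 739, 320, 1000)
industry <- classify_sic(sic, ranges)
data.frame(SIC = sic, Industry = industry)
```

SICs outside every range stay NA, which mirrors what case_when does when no condition matches.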

Aggregate a data frame on variance

Say I have this data frame, df,
Day value
1 2012-06-10 552
2 2012-06-10 4850
3 2012-06-11 4642
4 2012-06-11 4132
5 2012-06-11 4190
6 2012-06-12 4186
7 2012-06-13 1139
8 2012-06-13 490
9 2012-06-13 5156
10 2012-06-13 4430
11 2012-06-13 4447
12 2012-06-14 4256
13 2012-06-14 3856
14 2012-06-14 1163
15 2012-06-17 564
16 2012-06-17 4866
17 2012-06-17 4421
18 2012-06-19 4206
19 2012-06-20 4272
20 2012-06-20 3993
21 2012-06-20 1211
22 2012-07-21 698
23 2012-07-21 5770
24 2012-07-21 5103
25 2012-07-21 775
26 2012-07-21 5140
27 2012-07-22 4868
I would like to create a data.frame, dfvar, that contains the daily variance, something like:
Day Variance
1 2012-06-10 9236402
2 2012-06-11 X
3 2012-06-12 4186
4 2012-06-13 1139
5 2012-06-14 4256
6 2012-06-17 564
7 2012-06-19 4206
8 2012-06-20 4272
9 2012-07-21 698
10 2012-07-22 4868
For example, I computed the first entry by hand:
dfvar$Variance[1] = var(c(552, 4850))
I tried to do
dfvar <- aggregate(df, by = list(Day), FUN = var)
but this isn't the output I expected. I really want the variance of the values within each day, without the other days...
Any ideas about that?
Is this what you want?
library(dplyr)
# var() returns NA if there is only one value within the group
df %>%
  group_by(Day) %>%
  dplyr::summarise(Variance = var(value))
Day Variance
<fctr> <dbl>
1 2012-06-10 9236402.00
2 2012-06-11 77961.33
3 2012-06-12 NA
4 2012-06-13 4615704.30
5 2012-06-14 2829816.33
6 2012-06-17 5596946.33
7 2012-06-19 NA
8 2012-06-20 2864514.33
9 2012-07-21 6422224.70
10 2012-07-22 NA
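The asker's aggregate() attempt can also be made to work in base R by using the formula interface, which applies var to value within each Day. A minimal sketch, using only the first six rows of the question's data:

```r
# First six rows of the question's data
df <- data.frame(
  Day   = c("2012-06-10", "2012-06-10", "2012-06-11",
            "2012-06-11", "2012-06-11", "2012-06-12"),
  value = c(552, 4850, 4642, 4132, 4190, 4186)
)

# Variance of value within each Day; single-value days give NA
dfvar <- aggregate(value ~ Day, data = df, FUN = var)
names(dfvar)[2] <- "Variance"
dfvar
```

As with the dplyr version, 2012-06-12 has a single observation, so its variance is NA rather than the value itself.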

How to calculate Riemann sums in R?

Can anyone help me find the approximate area under a curve using Riemann sums in R?
It seems there is no package in R that could help.
Sample data:
MNo1 X1 Y1 MNo2 X2 Y2
1 2981 -66287 1 595 -47797
1 2981 -66287 1 595 -47797
2 2973 -66087 2 541 -47597
2 2973 -66087 2 541 -47597
3 2963 -65887 3 485 -47397
3 2963 -65887 3 485 -47397
4 2952 -65687 4 430 -47197
4 2952 -65687 4 430 -47197
5 2942 -65486 5 375 -46998
5 2942 -65486 5 375 -46998
6 2935 -65286 6 322 -46798
6 2935 -65286 6 322 -46798
7 2932 -65086 7 270 -46598
7 2932 -65086 7 270 -46598
8 2936 -64886 8 222 -46398
8 2936 -64886 8 222 -46398
9 2948 -64685 9 176 -46198
9 2948 -64685 9 176 -46198
10 2968 -64485 10 135 -45999
10 2968 -64485 10 135 -45999
11 2998 -64284 11 97 -45799
11 2998 -64284 11 97 -45799
12 3035 -64084 12 65 -45599
12 3035 -64084 12 65 -45599
13 3077 -63883 13 37 -45399
13 3077 -63883 13 37 -45399
14 3122 -63683 14 14 -45199
14 3122 -63683 14 14 -45199
15 3168 -63482 15 -5 -44999
15 3168 -63482 15 -5 -44999
16 3212 -63282 16 -20 -44799
16 3212 -63282 16 -20 -44799
17 3250 -63081 17 -31 -44599
17 3250 -63081 17 -31 -44599
18 3280 -62881 18 -38 -44399
18 3280 -62881 18 -38 -44399
19 3301 -62680 19 -43 -44199
19 3301 -62680 19 -43 -44199
20 3313 -62480 20 -45 -43999
Check this demo:
> library(zoo)
> x <- 1:10
> y <- -x^2
> Result <- sum(diff(x) * rollmean(y, 2))
> Result
[1] -334.5
After checking this question, I found that the function trapz() from the pracma package is more efficient:
> library(pracma)
> Result.2 <- trapz(x, y)
> Result.2
[1] -334.5
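Both results above are trapezoidal approximations. For comparison, here is a base R sketch of plain left and right Riemann sums on the same demo data, with no packages; the variable names are chosen for this example.

```r
x <- 1:10
y <- -x^2

# Width of each subinterval
dx <- diff(x)

# Left sum uses the function value at the left end of each interval,
# right sum uses the value at the right end
left_sum  <- sum(dx * head(y, -1))
right_sum <- sum(dx * tail(y, -1))

# The trapezoidal rule is the average of the two
trap_sum <- (left_sum + right_sum) / 2

c(left = left_sum, right = right_sum, trapezoid = trap_sum)
#      left     right trapezoid
#    -285.0    -384.0    -334.5
```

This assumes x is sorted in increasing order; for real data such as the X1/Y1 columns in the question, the points would need to be ordered by x first.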

Get row number data frame R

I have a dataset like this:
epoch epochIndex year month
1 335 1 1850 12
2 639 2 1851 10
3 670 3 1851 11
4 366 4 1851 1
5 517 5 1851 6
6 547 6 1851 7
7 578 7 1851 8
8 1005 8 1852 10
9 1036 9 1852 11
10 1066 10 1852 12
What I would like to do is set the year and month and get the corresponding row number, for example:
MONTH <- 12
YEAR <- 1850
ROWNUMBER = 1
Many thanks
A simple which call would be enough, e.g.:
df <- read.table(textConnection("
epoch epochIndex year month
1 335 1 1850 12
2 639 2 1851 10
3 670 3 1851 11
4 366 4 1851 1
5 517 5 1851 6
6 547 6 1851 7
7 578 7 1851 8
8 1005 8 1852 10
9 1036 9 1852 11
10 1066 10 1852 12"), header=TRUE)
which(df$year == 1850 & df$month == 12)
# [1] 1
which(df$year == 1852 & df$month == 12)
# [1] 10
Sorry, I found the answer:
TIMEC <- which(df$year==YEAR & df$month==MONTH)
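One detail worth checking is what happens when no row matches. The sketch below wraps the which() call in a hypothetical helper, `find_row`, and uses the first four rows of the question's data.

```r
# First four rows of the question's data
df <- data.frame(
  epoch      = c(335, 639, 670, 366),
  epochIndex = 1:4,
  year       = c(1850, 1851, 1851, 1851),
  month      = c(12, 10, 11, 1)
)

# Hypothetical helper: positions of rows matching a year and month
find_row <- function(data, YEAR, MONTH) {
  which(data$year == YEAR & data$month == MONTH)
}

hit  <- find_row(df, 1850, 12)  # one matching row
miss <- find_row(df, 1900, 1)   # no matching row: integer(0)

hit
length(miss)
```

Because which() returns integer(0) rather than NA when nothing matches, code that uses the result as an index may want to check its length first.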
