I have the following code in R
v <- c("featureA", "featureB")
newdata <- unique(data[v])
print(unique(data[v]))
print(predict(model, newdata, type = 'response', allow.new.levels = TRUE))
And I got the following result
featureA featureB
1 bucket_in_10_to_30 bucket_in_90_to_100
2 bucket_in_10_to_30 bucket_in_50_to_90
3 bucket_in_0_to_10 bucket_in_50_to_90
4 bucket_in_0_to_10 bucket_in_90_to_100
7 bucket_in_10_to_30 bucket_in_10_to_50
10 bucket_in_30_to_100 bucket_in_90_to_100
19 bucket_in_0_to_10 bucket_in_0_to_10
33 bucket_in_0_to_10 bucket_in_10_to_50
36 bucket_in_30_to_100 bucket_in_10_to_50
38 bucket_in_10_to_30 bucket_in_0_to_10
52 bucket_in_30_to_100 bucket_in_0_to_10
150 bucket_in_30_to_100 bucket_in_50_to_90
1 2 3 4 7 10 19 33 36 38 52 150
0.001920662 0.005480186 0.000961198 0.000335883 0.006311521 0.004005570 0.000620979 0.001107773 0.013100210 0.003546136 0.007382468 0.011384935
And I'm wondering whether it's possible in R to reshape this and directly get a 3 × 4 table similar to this:
featureA \ featureB    bucket_in_0_to_10  bucket_in_10_to_50  bucket_in_50_to_90  bucket_in_90_to_100
bucket_in_0_to_10      ...                ...                 ...                 ...
bucket_in_10_to_30     ...                ...                 ...                 ...
bucket_in_30_to_100    ...                ...                 ...                 ...
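One way to get that shape (a sketch, assuming `model` and `data` are the objects above) is to attach the predictions to `newdata` and cross-tabulate them with `xtabs()`:

```r
# Predictions for each unique (featureA, featureB) combination
newdata$pred <- predict(model, newdata, type = "response",
                        allow.new.levels = TRUE)
# Pivot into a featureA x featureB table of predictions
xtabs(pred ~ featureA + featureB, data = newdata)
```

Combinations absent from `newdata` would show up as 0 in the `xtabs()` result; `reshape2::dcast(newdata, featureA ~ featureB, value.var = "pred")` is an alternative that leaves them as NA instead.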
Thanks for the help!
I have 3 variables, so I will get 7 non-empty combinations (three singletons, three pairs, one triple). I want to produce the second column in combination form. I got the following result.
P <- matrix(c(
  0.427, -0.193, 0.673,
  -0.193, 0.094, -0.428,
  0.673, -0.428, 224.099
), nrow = 3)
G <- matrix(c(
  0.238, -0.033, 0.468,
  -0.033, 0.084, -0.764,
  0.468, -0.764, 205.144
), nrow = 3)
A <- rep(1, nrow(P))
df <- do.call(rbind, lapply(1:ncol(P), function(x) {
  do.call(rbind, combn(ncol(P), x, function(y) {
    data.frame(comb = paste(y, collapse = ""),
               B = (solve(P) %*% G) %*% A,
               stringsAsFactors = FALSE)
  }, simplify = FALSE))
}))
> df
comb B
1 1 -19.7814149
2 1 -44.1515387
3 1 0.8891786
4 2 -19.7814149
5 2 -44.1515387
6 2 0.8891786
7 3 -19.7814149
8 3 -44.1515387
9 3 0.8891786
10 12 -19.7814149
11 12 -44.1515387
12 12 0.8891786
13 13 -19.7814149
14 13 -44.1515387
15 13 0.8891786
16 23 -19.7814149
17 23 -44.1515387
18 23 0.8891786
19 123 -19.7814149
20 123 -44.1515387
21 123 0.8891786
Here I got only the same 3 values (-19.7814149, -44.1515387, 0.8891786) repeated for every combination, but I wanted 12 values, like this:
comb B
1 0.5574
2 0.8936
3 0.9154
12 10.0772, 21.233
13 0.2083 , 0.9169
23 -3.1085, 0.9061
123 -19.7814, -44.1515, 0.8892
I can't manage this. Furthermore, I want to use these B values to calculate my desired result (GA), where my formula is:
b <- t(B)
gain <- do.call(rbind, lapply(1:ncol(P), function(x) {
  do.call(rbind, combn(ncol(P), x, function(y) {
    data.frame(GA = abs(round(1.76 * sum(G[y, y] %*% B[y] * A[y]) /
                                sqrt((b[y] %*% P[y, y]) %*% B[y]), 2)),
               stringsAsFactors = FALSE)
  }, simplify = FALSE))
}))
My final output is
comb B GA
1 0.5574 0.641
2 0.8936 0.4822
3 0.9154 24.1186
12 10.0772, 21.233 3.123
13 0.2083 , 0.9169 24.1748
23 -3.1085, 0.9061 24.0867
123 -19.7814, -44.1515, 0.8892 24.9097
Is there any solution?
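The repeated values come from `B = (solve(P) %*% G) %*% A`, which never uses `y`, so the full three-variable solution is computed for every combination. A sketch of a fix, assuming `P`, `G` and `A` as defined above: subset all three objects to the current combination (`drop = FALSE` keeps one-element subsets as matrices):

```r
df <- do.call(rbind, lapply(1:ncol(P), function(x) {
  do.call(rbind, combn(ncol(P), x, function(y) {
    # Solve the system restricted to the variables in this combination
    B <- solve(P[y, y, drop = FALSE]) %*% G[y, y, drop = FALSE] %*% A[y]
    data.frame(comb = paste(y, collapse = ""),
               B = paste(round(B, 4), collapse = ", "),
               stringsAsFactors = FALSE)
  }, simplify = FALSE))
}))
```

For `comb = "1"` this reduces to `0.238 / 0.427 ≈ 0.5574`, matching the first row of the desired output; the same subsetting of `P`, `G`, `B` and `A` inside the GA formula would fix the `gain` computation as well.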
data_processed <- sqldf(" select a.permno, a.number, a.mean, b.ret as med, a.std
from data_processed as a
left join data_processed2 as b
on a.permno=b.permno")
The code above is not working. I am getting the error below:
Error in result_create(conn@ptr, statement) : no such column: b.ret
Here is my data:
data_processed:
permno number mean std
1 10107 120 0.0117174000 0.06802718
2 11850 120 0.0024398083 0.04594591
3 12060 120 0.0005072167 0.08544500
4 12490 120 0.0063569167 0.05325215
5 14593 120 0.0200060583 0.08865493
6 19561 120 0.0154743500 0.07771348
7 25785 120 0.0184815583 0.16510082
8 27983 120 0.0025951333 0.09538822
9 55976 120 0.0092889000 0.04812975
10 59328 120 0.0098526167 0.07135423
data_processed2:
permno return
1 10107 0.0191920
2 11850 0.0015495
3 12060 -0.0040130
4 12490 0.0078245
5 14593 0.0231735
6 19561 0.0202610
7 25785 -0.0018760
8 27983 0.0027375
9 55976 0.0089435
10 59328 0.0166490
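A sketch of a fix, given the data frames shown above: `data_processed2` has no column named `ret`; the column is called `return`, so the select list should use that name (bracket-quoted, since `return` can clash with SQL keywords):

```r
library(sqldf)

data_processed <- sqldf("
  select a.permno, a.number, a.mean, b.[return] as med, a.std
  from data_processed as a
  left join data_processed2 as b
  on a.permno = b.permno")
```

In base R the same left join is `merge(data_processed, data_processed2, by = "permno", all.x = TRUE)`, followed by renaming `return` to `med`.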
My first question here :)
My goal is: given a data frame with predictors (each column a predictor, each row an observation), fit a regression using lm over a rolling window and then predict the value for the last observation in each window.
The data frame looks like:
> DfPredictor[1:40,]
Y X1 X2 X3 X4 X5
1 3.2860 192.5115 2.1275 83381 11.4360 8.7440
2 3.2650 190.1462 2.0050 88720 11.4359 8.8971
3 3.2213 192.9773 2.0500 74130 11.4623 8.8380
4 3.1991 193.7058 2.1050 73930 11.3366 8.7536
5 3.2224 193.5407 2.0275 80875 11.3534 8.7555
6 3.2000 190.6049 2.0950 86606 11.3290 8.8555
7 3.1939 191.1390 2.0975 91402 11.2960 8.8433
8 3.1971 192.2921 2.2700 88181 11.2930 8.8681
9 3.1873 194.9700 2.3300 115959 1.9477 8.5245
10 3.2182 194.5396 2.4200 134754 11.3200 8.4990
11 3.2409 194.5396 2.2025 136685 1.9649 8.4192
12 3.2112 195.1362 2.1900 136316 1.9750 8.3752
13 3.2231 193.3560 2.2475 140295 1.9691 8.3546
14 3.2015 192.9649 2.2575 139474 1.9500 8.3116
15 3.1744 194.0154 2.1900 146202 1.8476 8.2225
16 3.1646 194.4423 2.2650 142983 1.8600 8.1948
17 3.1708 194.9473 2.2425 141377 1.8522 8.2589
18 3.1675 193.9788 2.2400 141377 1.8600 8.2600
19 3.1744 194.2563 2.3000 149875 1.8718 8.2899
20 3.1410 193.4316 2.2300 129561 1.8480 8.2395
21 3.1266 191.2633 2.2550 122636 1.8440 8.2396
22 3.1486 192.0354 2.3600 130996 1.8570 8.8640
23 3.1282 194.3351 2.4825 92430 1.7849 8.1291
24 3.1214 193.5196 2.4750 94814 1.7624 8.1991
25 3.1230 193.2017 2.3725 87590 1.7660 8.2310
26 3.1182 192.1642 2.4475 87715 1.6955 8.2414
27 3.1203 191.3744 2.3775 89857 1.6539 8.2480
28 3.1156 192.2646 2.3725 92159 1.5976 8.1676
29 3.1270 192.7555 2.3675 97425 1.5896 8.1162
30 3.1154 194.0375 2.3725 87598 1.5277 8.2640
31 3.1104 192.0596 2.3850 93236 1.5132 7.9999
32 3.0846 192.2792 2.2900 94608 1.4990 8.1600
33 3.0569 193.2573 2.3050 84663 1.4715 8.2200
34 3.0893 192.7632 2.2550 67149 1.4955 7.9590
35 3.0991 192.1229 2.3050 75519 1.4280 7.9183
36 3.0879 192.1229 2.3100 76756 1.3839 7.9133
37 3.0965 192.0502 2.2175 61748 1.3130 7.8750
38 3.0655 191.2274 2.2300 41490 1.2823 7.8656
39 3.0636 191.6342 2.1925 51049 1.1492 7.7447
40 3.1097 190.9312 2.2150 21934 1.1626 7.6895
For instance, using a rolling window with width = 10, the regression should be estimated and then the 'Y' corresponding to X1, X2, ..., X5 predicted.
The predictions should be included in a new column 'Ypred'.
Is there some way to do that using rollapply + lm/predict + mutate?
Many thanks!!
Using the data in the Note at the end and assuming that in a window of width 10 we want to predict the last Y (i.e. the 10th), then:
library(zoo)
pred <- function(x) tail(fitted(lm(Y ~ ., as.data.frame(x))), 1)
transform(DF, pred = rollapplyr(DF, 10, pred, by.column = FALSE, fill = NA))
giving:
Y X1 X2 X3 X4 X5 pred
1 3.2860 192.5115 2.1275 83381 11.4360 8.7440 NA
2 3.2650 190.1462 2.0050 88720 11.4359 8.8971 NA
3 3.2213 192.9773 2.0500 74130 11.4623 8.8380 NA
4 3.1991 193.7058 2.1050 73930 11.3366 8.7536 NA
5 3.2224 193.5407 2.0275 80875 11.3534 8.7555 NA
6 3.2000 190.6049 2.0950 86606 11.3290 8.8555 NA
7 3.1939 191.1390 2.0975 91402 11.2960 8.8433 NA
8 3.1971 192.2921 2.2700 88181 11.2930 8.8681 NA
9 3.1873 194.9700 2.3300 115959 1.9477 8.5245 NA
10 3.2182 194.5396 2.4200 134754 11.3200 8.4990 3.219764
11 3.2409 194.5396 2.2025 136685 1.9649 8.4192 3.241614
12 3.2112 195.1362 2.1900 136316 1.9750 8.3752 3.225423
13 3.2231 193.3560 2.2475 140295 1.9691 8.3546 3.217797
14 3.2015 192.9649 2.2575 139474 1.9500 8.3116 3.205856
15 3.1744 194.0154 2.1900 146202 1.8476 8.2225 3.177928
16 3.1646 194.4423 2.2650 142983 1.8600 8.1948 3.156405
17 3.1708 194.9473 2.2425 141377 1.8522 8.2589 3.176243
18 3.1675 193.9788 2.2400 141377 1.8600 8.2600 3.177165
19 3.1744 194.2563 2.3000 149875 1.8718 8.2899 3.177211
20 3.1410 193.4316 2.2300 129561 1.8480 8.2395 3.145533
21 3.1266 191.2633 2.2550 122636 1.8440 8.2396 3.127410
22 3.1486 192.0354 2.3600 130996 1.8570 8.8640 3.148792
23 3.1282 194.3351 2.4825 92430 1.7849 8.1291 3.124913
24 3.1214 193.5196 2.4750 94814 1.7624 8.1991 3.124992
25 3.1230 193.2017 2.3725 87590 1.7660 8.2310 3.117981
26 3.1182 192.1642 2.4475 87715 1.6955 8.2414 3.117679
27 3.1203 191.3744 2.3775 89857 1.6539 8.2480 3.119898
28 3.1156 192.2646 2.3725 92159 1.5976 8.1676 3.121039
29 3.1270 192.7555 2.3675 97425 1.5896 8.1162 3.123903
30 3.1154 194.0375 2.3725 87598 1.5277 8.2640 3.119438
31 3.1104 192.0596 2.3850 93236 1.5132 7.9999 3.113963
32 3.0846 192.2792 2.2900 94608 1.4990 8.1600 3.101229
33 3.0569 193.2573 2.3050 84663 1.4715 8.2200 3.076817
34 3.0893 192.7632 2.2550 67149 1.4955 7.9590 3.083266
35 3.0991 192.1229 2.3050 75519 1.4280 7.9183 3.089377
36 3.0879 192.1229 2.3100 76756 1.3839 7.9133 3.084225
37 3.0965 192.0502 2.2175 61748 1.3130 7.8750 3.075252
38 3.0655 191.2274 2.2300 41490 1.2823 7.8656 3.063025
39 3.0636 191.6342 2.1925 51049 1.1492 7.7447 3.068808
40 3.1097 190.9312 2.2150 21934 1.1626 7.6895 3.091819
Note: Input DF in reproducible form is:
Lines <- " Y X1 X2 X3 X4 X5
1 3.2860 192.5115 2.1275 83381 11.4360 8.7440
2 3.2650 190.1462 2.0050 88720 11.4359 8.8971
3 3.2213 192.9773 2.0500 74130 11.4623 8.8380
4 3.1991 193.7058 2.1050 73930 11.3366 8.7536
5 3.2224 193.5407 2.0275 80875 11.3534 8.7555
6 3.2000 190.6049 2.0950 86606 11.3290 8.8555
7 3.1939 191.1390 2.0975 91402 11.2960 8.8433
8 3.1971 192.2921 2.2700 88181 11.2930 8.8681
9 3.1873 194.9700 2.3300 115959 1.9477 8.5245
10 3.2182 194.5396 2.4200 134754 11.3200 8.4990
11 3.2409 194.5396 2.2025 136685 1.9649 8.4192
12 3.2112 195.1362 2.1900 136316 1.9750 8.3752
13 3.2231 193.3560 2.2475 140295 1.9691 8.3546
14 3.2015 192.9649 2.2575 139474 1.9500 8.3116
15 3.1744 194.0154 2.1900 146202 1.8476 8.2225
16 3.1646 194.4423 2.2650 142983 1.8600 8.1948
17 3.1708 194.9473 2.2425 141377 1.8522 8.2589
18 3.1675 193.9788 2.2400 141377 1.8600 8.2600
19 3.1744 194.2563 2.3000 149875 1.8718 8.2899
20 3.1410 193.4316 2.2300 129561 1.8480 8.2395
21 3.1266 191.2633 2.2550 122636 1.8440 8.2396
22 3.1486 192.0354 2.3600 130996 1.8570 8.8640
23 3.1282 194.3351 2.4825 92430 1.7849 8.1291
24 3.1214 193.5196 2.4750 94814 1.7624 8.1991
25 3.1230 193.2017 2.3725 87590 1.7660 8.2310
26 3.1182 192.1642 2.4475 87715 1.6955 8.2414
27 3.1203 191.3744 2.3775 89857 1.6539 8.2480
28 3.1156 192.2646 2.3725 92159 1.5976 8.1676
29 3.1270 192.7555 2.3675 97425 1.5896 8.1162
30 3.1154 194.0375 2.3725 87598 1.5277 8.2640
31 3.1104 192.0596 2.3850 93236 1.5132 7.9999
32 3.0846 192.2792 2.2900 94608 1.4990 8.1600
33 3.0569 193.2573 2.3050 84663 1.4715 8.2200
34 3.0893 192.7632 2.2550 67149 1.4955 7.9590
35 3.0991 192.1229 2.3050 75519 1.4280 7.9183
36 3.0879 192.1229 2.3100 76756 1.3839 7.9133
37 3.0965 192.0502 2.2175 61748 1.3130 7.8750
38 3.0655 191.2274 2.2300 41490 1.2823 7.8656
39 3.0636 191.6342 2.1925 51049 1.1492 7.7447
40 3.1097 190.9312 2.2150 21934 1.1626 7.6895"
DF <- read.table(text = Lines, header = TRUE)
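If a genuinely out-of-sample prediction is wanted instead (fit on the first 9 rows of each window, then predict the 10th), a variant sketched with the same `DF`:

```r
library(zoo)

pred_oos <- function(x) {
  x <- as.data.frame(x)
  fit <- lm(Y ~ ., head(x, -1))  # fit on all but the last row of the window
  predict(fit, tail(x, 1))       # predict Y for the held-out last row
}
transform(DF, Ypred = rollapplyr(DF, 10, pred_oos, by.column = FALSE, fill = NA))
```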
I am trying, in R, to indicate in which quintile a value of a variable falls for every month of my data frame, in this case based on volatility.
For each month I want to know, for each stock, whether it is in the most volatile quintile or in one of the others.
So far I have come up with the following function (see below). Unfortunately, the function only works in some cases and often gives the following error:
Error in cut.default(df$VOLATILITY, unique(breaks), label = FALSE, na.rm = TRUE) :
  invalid number of intervals
Could you give me some advice on how to improve this code so that it works properly?
It's relatively urgent. Many thanks!
quintilesVolByMonth <- function(x) {
  months <- as.vector(unique(x$DATE))
  dfx <- data.frame()
  for (n in seq(1, length(months))) {
    num <- 5
    print(paste("Appending month", months[n], sep = ""))
    df <- subset(x, DATE == months[n])
    breaks <- quantile(df$VOLATILITY, probs = seq(0, 1, 1/num), na.rm = TRUE)
    df$volquintile <- cut(df$VOLATILITY, unique(breaks),
                          label = FALSE, na.rm = TRUE)
    dfx <- rbind(dfx, df)
  }
  return(dfx)
}
Frame.Quintile <- quintilesVolByMonth(x)
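A minimal sketch of what triggers the error: when a month has too few distinct `VOLATILITY` values, `unique(breaks)` collapses to a single number, and `cut()` then interprets that lone number as a (non-integer) count of intervals:

```r
v <- c(0.073, 0.073)                        # a month with two identical values
breaks <- quantile(v, probs = seq(0, 1, 1/5), na.rm = TRUE)
unique(breaks)                              # collapses to the single value 0.073
# cut(v, unique(breaks), label = FALSE)     # Error: invalid number of intervals
```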
EXAMPLE OF THE DATA: The last column is what I am trying to get. The data here is just an example and not actual results.
> DATE <- c("01/10/2011","01/10/2012","01/10/2010","01/08/2010","01/10/2011","01/12/2011","01/09/2011","01/10/2011","01/09/2012","01/08/2012","01/02/2010","01/01/2011","01/09/2010","01/06/2010","01/07/2010","01/01/2012","01/01/2012","01/11/2011","01/09/2011","01/10/2011")
> NAME<-c("HOEK'S MACHINE DEAD - DELIST.","WORLD SCOPE (CADB TEST STOCK)","BRILL (KON.)", "BBL DEAD - 30/06/465", "GENK LOGISTICS","GROENIJK.YLCBN. DEAD - DELIST.31/05/479", "NOORD-EUR.HOUTH.","PALTHE DEAD - 4/2/475","GENERALE BANQUE DEAD - DEL. 30/12/490","STORK DEAD - TAKEOVER 905099","LOUVAIN-LA-NEUVE","VENTOS DEAD - 06/06/384","BRAINE-LE-COMTE SUSP 14/02/460","VILENZO DEAD - 25/11/370","ECONOSTO KON. DEAD - 07/07/374","ELECTRORAIL DEAD - DELIST 21/02/387","BLYSTEIN FL.1384","OBOURG (CIMENTS)","BRUGEFI DEAD - 31/07/475","GIB NEW")
> VOLATILITY<-c(0.3383, 0.084, 0.046, 0.0945, 0.0465, 0.2008, 0.1361, 0.2183, 0.1032, 0.1083, 0.0494, 0.0538, 0.0357, 0.037, 0.0386, 0.073, 0.073, 0.0393, 0.0687, 0.3308)
> VOLQUINTILE<-c(4,1,1,2,2,3,2,3,4,2,3,2,4,1,2,1,1,2,3,4)
>
> x<-data.frame(DATE,NAME,VOLATILITY, VOLQUINTILE)
> x
DATE NAME VOLATILITY VOLQUINTILE
1 01/10/2011 HOEK'S MACHINE DEAD - DELIST. 0.3383 4
2 01/10/2012 WORLD SCOPE (CADB TEST STOCK) 0.0840 1
3 01/10/2010 BRILL (KON.) 0.0460 1
4 01/08/2010 BBL DEAD - 30/06/465 0.0945 2
5 01/10/2011 GENK LOGISTICS 0.0465 2
6 01/12/2011 GROENIJK.YLCBN. DEAD - DELIST.31/05/479 0.2008 3
7 01/09/2011 NOORD-EUR.HOUTH. 0.1361 2
8 01/10/2011 PALTHE DEAD - 4/2/475 0.2183 3
9 01/09/2012 GENERALE BANQUE DEAD - DEL. 30/12/490 0.1032 4
10 01/08/2012 STORK DEAD - TAKEOVER 905099 0.1083 2
11 01/02/2010 LOUVAIN-LA-NEUVE 0.0494 3
12 01/01/2011 VENTOS DEAD - 06/06/384 0.0538 2
13 01/09/2010 BRAINE-LE-COMTE SUSP 14/02/460 0.0357 4
14 01/06/2010 VILENZO DEAD - 25/11/370 0.0370 1
15 01/07/2010 ECONOSTO KON. DEAD - 07/07/374 0.0386 2
16 01/01/2012 ELECTRORAIL DEAD - DELIST 21/02/387 0.0730 1
17 01/01/2012 BLYSTEIN FL.1384 0.0730 1
18 01/11/2011 OBOURG (CIMENTS) 0.0393 2
19 01/09/2011 BRUGEFI DEAD - 31/07/475 0.0687 3
20 01/10/2011 GIB NEW 0.3308 4
Does this work for you?
library(plyr)
vol1 <- ddply(mydata, .(DATE), transform,
              max.name = NAME[which.max(quantile(VOLATILITY))])
DATE NAME VOLATILITY max.name
1 01/01/2011 VENTOS DEAD - 06/06/384 0.0538 VENTOS DEAD - 06/06/384
2 01/01/2012 ELECTRORAIL DEAD - DELIST 21/02/387 0.0730 ELECTRORAIL DEAD - DELIST 21/02/387
3 01/01/2012 BLYSTEIN FL.1384 0.0730 ELECTRORAIL DEAD - DELIST 21/02/387
4 01/02/2010 LOUVAIN-LA-NEUVE 0.0494 LOUVAIN-LA-NEUVE
5 01/06/2010 VILENZO DEAD - 25/11/370 0.0370 VILENZO DEAD - 25/11/370
6 01/07/2010 ECONOSTO KON. DEAD - 07/07/374 0.0386 ECONOSTO KON. DEAD - 07/07/374
7 01/08/2010 BBL DEAD - 30/06/465 0.0945 BBL DEAD - 30/06/465
8 01/08/2012 STORK DEAD - TAKEOVER 905099 0.1083 STORK DEAD - TAKEOVER 905099
9 01/09/2010 BRAINE-LE-COMTE SUSP 14/02/460 0.0357 BRAINE-LE-COMTE SUSP 14/02/460
10 01/09/2011 NOORD-EUR.HOUTH. 0.1361 <NA>
11 01/09/2011 BRUGEFI DEAD - 31/07/475 0.0687 <NA>
12 01/09/2012 GENERALE BANQUE DEAD - DEL. 30/12/490 0.1032 GENERALE BANQUE DEAD - DEL. 30/12/490
13 01/10/2010 BRILL (KON.) 0.0460 BRILL (KON.)
14 01/10/2011 HOEK'S MACHINE DEAD - DELIST. 0.3383 <NA>
15 01/10/2011 GENK LOGISTICS 0.0465 <NA>
16 01/10/2011 PALTHE DEAD - 4/2/475 0.2183 <NA>
17 01/10/2011 GIB NEW 0.3308 <NA>
18 01/10/2012 WORLD SCOPE (CADB TEST STOCK) 0.0840 WORLD SCOPE (CADB TEST STOCK)
19 01/11/2011 OBOURG (CIMENTS) 0.0393 OBOURG (CIMENTS)
20 01/12/2011 GROENIJK.YLCBN. DEAD - DELIST.31/05/479 0.2008 GROENIJK.YLCBN. DEAD - DELIST.31/05/479
Updated solution:
library(plyr)
vol2 <- ddply(x, .(DATE), transform,
              quantile = ifelse(VOLATILITY < quantile(VOLATILITY, p = 0.25), 1,
                         ifelse(VOLATILITY > quantile(VOLATILITY, p = 0.25) &
                                VOLATILITY < quantile(VOLATILITY, p = 0.5), 2,
                         ifelse(VOLATILITY > quantile(VOLATILITY, p = 0.5) &
                                VOLATILITY < quantile(VOLATILITY, p = 0.75), 3, 4))))
DATE NAME VOLATILITY quantile
1 01/01/2011 VENTOS DEAD - 06/06/384 0.0538 4
2 01/01/2012 ELECTRORAIL DEAD - DELIST 21/02/387 0.0730 4
3 01/01/2012 BLYSTEIN FL.1384 0.0730 4
4 01/02/2010 LOUVAIN-LA-NEUVE 0.0494 4
5 01/06/2010 VILENZO DEAD - 25/11/370 0.0370 4
6 01/07/2010 ECONOSTO KON. DEAD - 07/07/374 0.0386 4
7 01/08/2010 BBL DEAD - 30/06/465 0.0945 4
8 01/08/2012 STORK DEAD - TAKEOVER 905099 0.1083 4
9 01/09/2010 BRAINE-LE-COMTE SUSP 14/02/460 0.0357 4
10 01/09/2011 NOORD-EUR.HOUTH. 0.1361 4
11 01/09/2011 BRUGEFI DEAD - 31/07/475 0.0687 1
12 01/09/2012 GENERALE BANQUE DEAD - DEL. 30/12/490 0.1032 4
13 01/10/2010 BRILL (KON.) 0.0460 4
14 01/10/2011 HOEK'S MACHINE DEAD - DELIST. 0.3383 4
15 01/10/2011 GENK LOGISTICS 0.0465 1
16 01/10/2011 PALTHE DEAD - 4/2/475 0.2183 2
17 01/10/2011 GIB NEW 0.3308 3
18 01/10/2012 WORLD SCOPE (CADB TEST STOCK) 0.0840 4
19 01/11/2011 OBOURG (CIMENTS) 0.0393 4
20 01/12/2011 GROENIJK.YLCBN. DEAD - DELIST.31/05/479 0.2008 4
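A sketch of a more robust per-month quintile, assuming the same `x` as in the question: de-duplicate the breaks and use `findInterval()` (which tolerates any number of breaks) instead of `cut()`, with a fallback for months where all values are identical:

```r
quintilesVolByMonth <- function(x, num = 5) {
  x$volquintile <- ave(x$VOLATILITY, x$DATE, FUN = function(v) {
    breaks <- unique(quantile(v, probs = seq(0, 1, 1/num), na.rm = TRUE))
    if (length(breaks) < 2) return(rep(1L, length(v)))  # one distinct value only
    findInterval(v, breaks, rightmost.closed = TRUE)
  })
  x
}

Frame.Quintile <- quintilesVolByMonth(x)
```

`ave()` applies the function within each `DATE` group and returns a vector aligned with the rows of `x`, so no explicit loop or `rbind()` is needed.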