Convert a string to a number using jq

After parsing some JSON I have numbers like the following:
1 BTC 1.1 -1.27 4.5 12483.315628 209496088918
2 XRP -1.14 20.92 153.78 3.0061025564 116453842357
3 ETH -1.08 13.41 40.64 847.89295234 82049924696.0
4 BCH 0.51 -9.21 -5.22 2025.07027989 34210094446.0
5 ADA 1.12 14.9 205.14 0.9950722725 25799309000.0
6 XEM -0.02 20.4 100.84 1.4710629893 13239566903.0
and I would like to convert them to numbers before printing, because they are not very readable this way.
In the second-to-last column I'd like to limit the precision to 3 digits after the decimal point, and in the last column I'd like to divide by 1,000,000. In the JSON these fields are strings. I'm forced to use jq.
The context is this:
curl -s "https://api.coinmarketcap.com/v1/ticker/?convert=EUR&limit=20" | jq -r '.[] | [.rank, .symbol, .percent_change_1h, .percent_change_24h, .percent_change_7d, .price_eur, .market_cap_eur] | @tsv'
I added an issue on the project's GitHub, so check there for updates as well.

jq + awk solution (assuming Linux environment):
curl -s "https://api.coinmarketcap.com/v1/ticker/?convert=EUR&limit=20" \
| jq -r '.[] | [.rank, .symbol, .percent_change_1h, .percent_change_24h,
.percent_change_7d, .price_eur, .market_cap_eur] | @tsv' \
| awk '{ $6=sprintf("%.3f",$6); $7=$7/(1*10^6) }1' | column -t
The output:
1 BTC 1.62 0.76 5.63 12618.627 211768
2 XRP -4.04 13.64 142.7 2.885 111773
3 ETH -1.12 11.55 40.07 845.206 81790.2
4 BCH 1.01 -7.0 -4.13 2043.695 34524.9
5 ADA -2.79 10.38 194.99 0.966 25046.2
6 XEM -1.64 17.26 96.98 1.445 13006.7
7 XLM -3.54 -9.96 267.75 0.660 11804
8 TRX -0.48 152.86 454.55 0.169 11132.6
9 LTC -1.64 -3.04 -2.97 197.775 10801.2
10 MIOTA 5.79 4.76 19.54 3.481 9675.58
11 DASH 0.73 8.52 13.65 1045.762 8155.11
12 NEO 0.13 9.17 69.28 87.454 5684.52
13 EOS -0.25 27.87 22.32 9.652 5628.04
14 XMR 0.32 0.28 6.36 333.080 5183.6
15 BTG 1.12 2.98 -2.53 231.167 3870.93
16 QTUM 0.28 3.58 11.99 50.186 3702.88
17 XRB 4.29 15.4 160.91 25.188 3356.32
18 ETC -3.23 10.74 24.11 30.394 3005.25
19 ICX -3.59 9.69 37.95 6.352 2398.27
20 LSK 5.42 9.61 -1.45 18.794 2192.41
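The two conversions done by the awk step can also be checked in isolation. The following is a minimal Python sketch, not part of the original answer: `format_row` is a hypothetical helper name, and the sample row is the BTC line from the question. Note that both awk's `%.3f` and the `.3f` format used here round to 3 decimals rather than truncate.

```python
def format_row(fields):
    """Format one row of string fields.

    `fields` follows the column order of the jq filter:
    rank, symbol, 1h, 24h, 7d, price_eur, market_cap_eur.
    """
    out = list(fields)
    # Price: convert the string to a float and keep 3 decimals (rounded).
    out[5] = f"{float(fields[5]):.3f}"
    # Market cap: convert to a float and express it in millions.
    # The "g" format keeps 6 significant digits, similar to awk's default.
    out[6] = f"{float(fields[6]) / 1e6:g}"
    return out


row = ["1", "BTC", "1.1", "-1.27", "4.5", "12483.315628", "209496088918"]
print("\t".join(format_row(row)))
```

The fields that are not touched stay as strings, so the percent-change columns are printed exactly as the API returned them.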

Related

Combining categories of a categorical variable

I would like to combine some Brazilian political party names from a categorical variable (partido_pref) that was wrongly coded.
The categories that I would like to combine are "PC do B" and "PCdoB", and "PT do B" and "PTdoB". The parties with and without space are the same parties.
I would rather do it in Stata, but I can also work in R.
Below is the list of political parties.
. tab partido_pref
partido_pref | Freq. Percent Cum.
---------------+-----------------------------------
DEM | 2,267 2.14 2.14
NA | 34,848 32.84 34.98
Não disponível | 2 0.00 34.98
Outra situação | 19 0.02 35.00
PAN | 6 0.01 35.00
PC do B | 260 0.25 35.25
PCB | 2 0.00 35.25
PCdoB | 7 0.01 35.26
PCO | 1 0.00 35.26
PDT | 3,933 3.71 38.97
PFL | 6,811 6.42 45.39
PHS | 194 0.18 45.57
PL | 2,525 2.38 47.95
PMDB | 14,833 13.98 61.93
PMN | 410 0.39 62.31
PP | 5,467 5.15 67.47
PPB | 1,661 1.57 69.03
PPL | 10 0.01 69.04
PPS | 2,493 2.35 71.39
PR | 1,861 1.75 73.14
PRB | 298 0.28 73.43
PRN | 9 0.01 73.43
PRONA | 26 0.02 73.46
PRP | 273 0.26 73.72
PRTB | 121 0.11 73.83
PSB | 2,905 2.74 76.57
PSC | 480 0.45 77.02
PSD | 816 0.77 77.79
PSDB | 11,316 10.66 88.45
PSDC | 121 0.11 88.57
PSL | 273 0.26 88.83
PSOL | 4 0.00 88.83
PST | 48 0.05 88.87
PSTU | 1 0.00 88.88
PT | 5,258 4.96 93.83
PT do B | 139 0.13 93.96
PTB | 5,383 5.07 99.03
PTC | 140 0.13 99.17
PTdoB | 10 0.01 99.18
PTN | 108 0.10 99.28
PV | 702 0.66 99.94
Recusa | 2 0.00 99.94
Sem partido | 62 0.06 100.00
---------------+-----------------------------------
Total | 106,105 100.00
Thank you in advance!
One option is fct_collapse from the forcats package:
library(forcats)
fct_collapse(df1$partido_pref, pc = c("PC do B", "PCdoB"),
             pt = c("PT do B", "PTdoB"))
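If your problem is just getting rid of whitespace, and partido_pref is a string variable:
replace partido_pref = subinstr(partido_pref, " ", "", .) if inlist(partido_pref, "PC do B", "PT do B")
The fourth argument of subinstr() is the number of occurrences to replace; . means all of them. The if qualifier leaves other categories that contain spaces, such as "Sem partido", unchanged.
See help string_functions for more options.
R is more flexible, but Stata can handle this level of simple text management.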

Scraping an interactive table with rvest

I'm attempting to scrape the second table shown at the URL below, and I'm running into issues which may be related to the interactive nature of the table.
div_stats_standard appears to refer to the table of interest.
The code runs with no errors but returns an empty list.
url <- 'https://fbref.com/en/comps/9/stats/Premier-League-Stats'
data <- url %>%
read_html() %>%
html_nodes(xpath = '//*[(@id = "div_stats_standard")]') %>%
html_table()
Can anyone tell me where I'm going wrong?
Select the table nodes directly:
library(rvest)
url <- "https://fbref.com/en/comps/9/stats/Premier-League-Stats"
page <- read_html(url)
nodes <- html_nodes(page, "table") # you can use Selectorgadget to identify the node
table <- html_table(nodes[[1]]) # each element of the nodes list is one table that can be extracted
head(table)
Result:
head(table)
Playing Time Playing Time Playing Time Performance Performance
1 Squad # Pl MP Starts Min Gls Ast
2 Arsenal 26 27 297 2,430 39 26
3 Aston Villa 28 27 297 2,430 33 27
4 Bournemouth 25 28 308 2,520 27 17
5 Brighton 23 28 308 2,520 28 19
6 Burnley 21 28 308 2,520 32 23
Performance Performance Performance Performance Per 90 Minutes Per 90 Minutes
1 PK PKatt CrdY CrdR Gls Ast
2 2 2 64 3 1.44 0.96
3 1 3 54 1 1.22 1.00
4 1 1 60 3 0.96 0.61
5 1 1 44 2 1.00 0.68
6 2 2 53 0 1.14 0.82
Per 90 Minutes Per 90 Minutes Per 90 Minutes Expected Expected Expected Per 90 Minutes
1 G+A G-PK G+A-PK xG npxG xA xG
2 2.41 1.37 2.33 35.0 33.5 21.3 1.30
3 2.22 1.19 2.19 30.6 28.2 22.0 1.13
4 1.57 0.93 1.54 31.2 30.5 20.8 1.12
5 1.68 0.96 1.64 33.8 33.1 22.4 1.21
6 1.96 1.07 1.89 30.9 29.4 18.9 1.10
Per 90 Minutes Per 90 Minutes Per 90 Minutes Per 90 Minutes
1 xA xG+xA npxG npxG+xA
2 0.79 2.09 1.24 2.03
3 0.81 1.95 1.04 1.86
4 0.74 1.86 1.09 1.83
5 0.80 2.01 1.18 1.98
6 0.68 1.78 1.05 1.73

Taking inverse of certain rows in dataframe

I have a dataframe of market trades and need to multiply only the put returns by -1. I have the code for that, but can't figure out how to assign it back without affecting the calls.
Input df:
Date Type Stock_Open Stock_Close Stock_ROI
0 2016-04-27 Call 5.33 4.80 -0.099437
1 2016-06-03 Put 4.80 4.52 -0.058333
2 2016-06-30 Call 4.52 5.29 0.170354
3 2016-07-21 Put 5.29 4.84 -0.085066
4 2016-08-08 Call 4.84 5.35 0.105372
5 2016-08-25 Put 5.35 4.65 -0.130841
6 2016-09-21 Call 4.65 5.07 0.090323
7 2016-10-13 Put 5.07 4.12 -0.187377
8 2016-11-04 Call 4.12 4.79 0.162621
Code:
flipped_puts = trades_df[trades_df['Type']=='Put']['Stock_ROI']*-1
trades_df['Stock_ROI'] = flipped_puts
Output of flipped puts:
1 0.058333
3 0.085066
5 0.130841
7 0.187377
Output of original DF:
Date Type Stock_Open Stock_Close Stock_ROI
0 2016-04-27 Call 5.33 4.80 NaN
1 2016-06-03 Put 4.80 4.52 0.058333
2 2016-06-30 Call 4.52 5.29 NaN
3 2016-07-21 Put 5.29 4.84 0.085066
4 2016-08-08 Call 4.84 5.35 NaN
5 2016-08-25 Put 5.35 4.65 0.130841
6 2016-09-21 Call 4.65 5.07 NaN
7 2016-10-13 Put 5.07 4.12 0.187377
8 2016-11-04 Call 4.12 4.79 NaN
Try:
trades_df.loc[trades_df.Type.eq('Put'), 'Stock_ROI'] *= -1
Or:
trades_df.update(trades_df.query('Type == "Put"').Stock_ROI.mul(-1))
Both modify trades_df in place, flipping the sign of Stock_ROI for the put rows only and leaving the calls untouched.
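The NaN values in the question come from index alignment: assigning a Series that only contains the put rows to the whole column leaves every other row without a value. The sketch below reproduces both behaviours on a shortened copy of the question's data; the names `is_put`, `broken` and `fixed` are illustrative only.

```python
import pandas as pd

trades_df = pd.DataFrame({
    "Type": ["Call", "Put", "Call", "Put"],
    "Stock_ROI": [-0.099437, -0.058333, 0.170354, -0.085066],
})

# Boolean mask that is True only for the put rows.
is_put = trades_df["Type"].eq("Put")

# What the question did: the right-hand side holds only rows 1 and 3,
# so assigning it to the full column fills rows 0 and 2 with NaN.
flipped_puts = trades_df.loc[is_put, "Stock_ROI"] * -1
broken = trades_df.copy()
broken["Stock_ROI"] = flipped_puts

# Masked assignment: only the selected rows are written, the rest are kept.
fixed = trades_df.copy()
fixed.loc[is_put, "Stock_ROI"] *= -1

print(broken)
print(fixed)
```

Using the same mask on both sides of the assignment is what keeps the call rows intact.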
In R, we can use data.table. Convert the data.frame to a data.table with setDT(trades_df), specify the logical condition in i, multiply Stock_ROI by -1, and assign the result (:=) to a new column. Rows that do not match the condition are filled with NA in the new column.
library(data.table)
setDT(trades_df)[Type == 'Put', Stock_ROIN := Stock_ROI * -1][]
If we want to update the same column instead:
setDT(trades_df)[Type == 'Put', Stock_ROI := Stock_ROI * -1]
trades_df
# Date Type Stock_Open Stock_Close Stock_ROI
#1: 2016-04-27 Call 5.33 4.80 -0.099437
#2: 2016-06-03 Put 4.80 4.52 0.058333
#3: 2016-06-30 Call 4.52 5.29 0.170354
#4: 2016-07-21 Put 5.29 4.84 0.085066
#5: 2016-08-08 Call 4.84 5.35 0.105372
#6: 2016-08-25 Put 5.35 4.65 0.130841
#7: 2016-09-21 Call 4.65 5.07 0.090323
#8: 2016-10-13 Put 5.07 4.12 0.187377
#9: 2016-11-04 Call 4.12 4.79 0.162621
If we also want to change the other values to NA:
setDT(trades_df)[Type == 'Put', Stock_ROI := Stock_ROI * -1
][Type!= 'Put', Stock_ROI := NA]
trades_df
# Date Type Stock_Open Stock_Close Stock_ROI
#1: 2016-04-27 Call 5.33 4.80 NA
#2: 2016-06-03 Put 4.80 4.52 0.058333
#3: 2016-06-30 Call 4.52 5.29 NA
#4: 2016-07-21 Put 5.29 4.84 0.085066
#5: 2016-08-08 Call 4.84 5.35 NA
#6: 2016-08-25 Put 5.35 4.65 0.130841
#7: 2016-09-21 Call 4.65 5.07 NA
#8: 2016-10-13 Put 5.07 4.12 0.187377
#9: 2016-11-04 Call 4.12 4.79 NA

Error in producing the output

I have a problem with my code and cannot trace the error. I have coordinate data, coor (a 40 by 2 matrix, shown below), and rainfall data (a 14610 by 40 matrix).
No Longitude Latitude
1 100.69 6.34
2 100.77 6.24
3 100.39 6.11
4 100.43 5.53
5 100.39 5.38
6 101.00 5.71
7 101.06 5.30
8 100.80 4.98
9 101.17 4.48
10 102.26 6.11
11 102.22 5.79
12 102.28 5.31
13 102.02 5.38
14 101.97 4.88
15 102.95 5.53
16 103.13 5.32
17 103.06 4.94
18 103.42 4.76
19 103.42 4.23
20 102.38 4.24
21 101.94 4.23
22 103.04 3.92
23 103.36 3.56
24 102.66 3.03
25 103.19 2.89
26 101.35 3.70
27 101.41 3.37
28 101.75 3.16
29 101.39 2.93
30 102.07 3.09
31 102.51 2.72
32 102.26 2.76
33 101.96 2.74
34 102.19 2.36
35 102.49 2.29
36 103.02 2.38
37 103.74 2.26
38 103.97 1.85
39 103.72 1.76
40 103.75 1.47
rainfall= 14610 by 40 matrix;
coor= 40 by 2 matrix
my_prog=function(rainrain,coordinat,misss,distance)
{
rain3<-rainrain # target station i**
# neighboring stations for target station i
a=coordinat # target station i**
diss=as.matrix(distHaversine(a,coor,r=6371))
mmdis=sort(diss,decreasing=F,index.return=T)
mdis=as.matrix(mmdis$x)
mdis1=as.matrix(mmdis$ix)
dist=cbind(mdis,mdis1)
# NA creation
# create missing values in rainfall data
set.seed(100)
b=sample(1:nrow(rain3),(misss*nrow(rain3)),replace=F)
k=replace(rain3,b,NA)
# pick i closest stations
neig=mdis1[distance] # neighbouring selection distance
# target (with NA) and their neighbors
rainB=rainfal00[,neig]
rainA0=rainB[,2:ncol(rainB)]
rainA<-as.matrix(cbind(k,rainA0))
rain2=na.omit(rainA)
x=as.matrix(rain2[,1]) # used to calculate the correlation
n1=ncol(rainA)-1
#1) normal ratio(nr)
jum=as.matrix(apply(rain2,2,mean))
nr0=(jum[1]/jum)
nr=as.matrix(nr0[2:nrow(nr0),])
m01=as.matrix(rainA[is.na(k),])
m1=m01[,2:ncol(m01)]
out1=as.matrix(sapply(seq_len(nrow(m1)),
function(i) sum(nr*m1[i,],na.rm=T)/n1))
print(out1)
}
impute=my_prog(rainrain=rainfall[,1],coordinat=coor[1,],misss=0.05,distance=mdis<200)
I ran this code and got the following output:
Error in my_prog(rainrain = rainfal00[, 1], misss = 0.05, coordinat = coor[1, :
object 'mdis' not found
I have checked the program, but cannot trace the problem. I would really appreciate if someone could help me.

Merging two dataframes in R with date

I have the following 2 dataframes:
> bvg1
Parameters X18.Oct.14 X19.Oct.14 X20.Oct.14 X21.Oct.14 X22.Oct.14 X23.Oct.14 X24.Oct.14
1 24K Equivalent Plan 29.00 29.60 33.80 36.60 35.30 31.90 29.00
2 24K Equivalent Act 28.80 31.00 35.40 35.90 34.70 33.40 31.90
3 Plan Rep WS 2463.00 2513.00 2869.00 3115.00 2999.00 2714.00 2468.00
4 Act Rep WS 2447.00 2633.00 3013.00 3054.00 2953.00 2842.00 2714.00
5 Rep WS Var -16.00 120.00 144.00 -61.00 -46.00 128.00 246.00
6 Plan Rep Intakes 568.00 461.00 1159.00 1146.00 1126.00 1124.00 1106.00
7 Act Rep Intakes 707.00 494.00 1106.00 1096.00 1274.00 1087.00 1101.00
8 Rep Intakes Var 139.00 33.00 -53.00 -50.00 148.00 -37.00 -5.00
9 Plan Rep Comps_DL 468.00 54.00 836.00 1190.00 1327.00 1286.00 1108.00
10 Act Rep Comps_DL 471.00 70.00 995.00 1137.00 1323.00 1150.00 1073.00
11 Rep Comps Var_DL 3.00 16.00 159.00 -53.00 -4.00 -136.00 -35.00
12 Plan Rep Mandays_DL 148.00 19.00 260.00 368.00 412.00 398.00 345.00
13 Act Rep Mandays_DL 147.00 19.00 303.00 359.00 423.00 374.00 348.00
14 Rep Mandays Var_DL -1.00 1.00 43.00 -9.00 12.00 -24.00 3.00
15 Plan FVR Mandays_DL 0.00 0.00 4.00 18.00 18.00 18.00 18.00
16 Act FVR Mandays_DL 0.00 0.00 4.00 7.00 8.00 8.00 7.00
17 FVR Mandays Var_DL 0.00 0.00 0.00 -11.00 -10.00 -10.00 -11.00
18 Plan Rep Prod_DL 3.16 2.88 3.21 3.23 3.22 3.23 3.21
19 Act Rep Prod_DL 3.21 3.62 3.28 3.16 3.12 3.07 3.08
20 Rep Prod Var_DL 0.05 0.74 0.07 -0.07 -0.10 -0.16 -0.13
> bvg2
Parameters X18.Oct X19.Oct X20.Oct X21.Oct X22.Oct X23.Oct X24.Oct
1 24K Equivalent Plan 30.50 31.30 35.10 36.10 33.60 28.80 25.50
2 24K Equivalent Act 31.40 33.40 36.60 38.10 36.80 34.40 32.10
3 Plan Rep WS 3419.00 3509.00 3933.00 4041.00 3764.00 3220.00 2859.00
4 Act Rep WS 3514.00 3734.00 4098.00 4271.00 4122.00 3852.00 3591.00
5 Rep WS Var 95.00 225.00 165.00 230.00 358.00 632.00 732.00
6 Plan Rep Intakes 813.00 613.00 1559.00 1560.00 1506.00 1454.00 1410.00
7 Act Rep Intakes 964.00 602.00 1629.00 1532.00 1657.00 1507.00 1439.00
8 Rep Intakes Var 151.00 -11.00 70.00 -28.00 151.00 53.00 29.00
9 Plan Rep Comps_DL 675.00 175.00 1331.00 1732.00 1938.00 1706.00 1493.00
10 Act Rep Comps_DL 718.00 224.00 1389.00 1609.00 1848.00 1698.00 1537.00
11 Rep Comps Var_DL 43.00 49.00 58.00 -123.00 -90.00 -8.00 44.00
12 Plan Rep Mandays_DL 203.00 58.00 428.00 541.00 605.00 536.00 475.00
13 Act Rep Mandays_DL 215.00 63.00 472.00 542.00 608.00 556.00 523.00
14 Rep Mandays Var_DL 12.00 5.00 44.00 2.00 3.00 20.00 48.00
15 Plan FVR Mandays_DL 0.00 0.00 1.00 12.00 2.00 32.00 57.00
16 Act FVR Mandays_DL 0.00 0.00 2.00 2.00 5.00 5.00 5.00
17 FVR Mandays Var_DL 0.00 0.00 1.00 -10.00 3.00 -27.00 -52.00
18 Plan Rep Prod_DL 3.33 3.03 3.11 3.20 3.20 3.18 3.14
19 Act Rep Prod_DL 3.34 3.56 2.94 2.97 3.04 3.05 2.94
20 Rep Prod Var_DL 0.01 0.53 -0.17 -0.23 -0.16 -0.13 -0.20
This is time series data, i.e. 24K Equivalent Plan was 29 on 18 Oct, 29.60 on 19 Oct and 33.80 on 20 Oct. The first dataframe holds the data for one business unit and the second dataframe holds the data for a different business unit.
I want to merge the two dataframes into one and analyse the variance, i.e. where their values differ, and draw ggplots such as two histograms showing the difference, time series plots, etc.
I have tried the following. I can combine the two dataframes with:
joined = rbind(bvg1, bvg2)
However, I can't then tell whether a record came from bvg1 or bvg2.
If I add an identifying column, i.e.
bvg1$id = "bvg1"
bvg2$id = "bvg2"
then the rbind command fails with the following error:
Error in match.names(clabs, names(xi)) :
names do not match previous names
Any sample code would be highly appreciated.
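You can make the column names of the two datasets match by stripping the trailing . and digits from the column names of bvg1. This can be done with a regex. The code below uses a lookbehind, (?<=[A-Za-z]), which requires a letter immediately before the match; the pattern then matches a literal . (\\.) followed by any characters (.*) up to the end of the string ($), and gsub replaces that match with "". The output below assumes the id column from the question has already been added to both dataframes.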
colnames(bvg1) <-gsub("(?<=[A-Za-z])\\..*$", "", colnames(bvg1), perl=TRUE)
res <- rbind(bvg1, bvg2)
dim(res)
#[1] 40 9
head(res,3)
# Parameters X18.Oct X19.Oct X20.Oct X21.Oct X22.Oct X23.Oct X24.Oct
#1 24K Equivalent Plan 29.0 29.6 33.8 36.6 35.3 31.9 29.0
#2 24K Equivalent Act 28.8 31.0 35.4 35.9 34.7 33.4 31.9
#3 Plan Rep WS 2463.0 2513.0 2869.0 3115.0 2999.0 2714.0 2468.0
# id
#1 bvg1
#2 bvg1
#3 bvg1
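The renaming pattern can be checked on its own before touching the data. This is a small Python sketch using the same regular expression as the gsub call above; `strip_year_suffix` is a hypothetical helper name and the column lists are shortened versions of the ones in the question.

```python
import re

# Same pattern as in the R answer: a literal "." that directly follows a
# letter, plus everything after it up to the end of the string.
PATTERN = re.compile(r"(?<=[A-Za-z])\..*$")


def strip_year_suffix(name):
    """Remove a trailing '.<suffix>' that follows a letter, e.g. '.14'."""
    return PATTERN.sub("", name)


bvg1_cols = ["Parameters", "X18.Oct.14", "X19.Oct.14", "id"]
bvg2_cols = ["Parameters", "X18.Oct", "X19.Oct", "id"]

renamed = [strip_year_suffix(c) for c in bvg1_cols]
print(renamed)
print(renamed == bvg2_cols)
```

The first dot in a name like X18.Oct.14 follows a digit, so the lookbehind rejects it and only the final .14 is removed.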
