Unable to scrape data separated by <br> in <td> using Scrapy with CSS

The HTML code is the following:
<td class="column-3">
(price per 1,000 images)<br>
0-1M images -
<span class="price-data " data-amount="{"regional":{"asia-pacific-southeast":0.5,"australia-east":0.5,"brazil-south":0.5,"canada-central":0.5,"central-india":0.5,"europe-north":0.5,"europe-west":0.5,"united-kingdom-south":0.5,"us-east":0.5,"us-east-2":0.5,"us-south-central":0.5,"us-west-2":0.5,"us-west-central":0.5}}" data-decimals="3" data-decimals-force="0" data-region-unavailable="N/A" data-has-valid-price="true">$0.50</span> <br>
1M-5M images -
<span class="price-data " data-amount="{"regional":{"asia-pacific-southeast":0.4,"australia-east":0.4,"brazil-south":0.4,"canada-central":0.4,"central-india":0.4,"europe-north":0.4,"europe-west":0.4,"united-kingdom-south":0.4,"us-east":0.4,"us-east-2":0.4,"us-south-central":0.4,"us-west-2":0.4,"us-west-central":0.4}}" data-decimals="3" data-decimals-force="0" data-region-unavailable="N/A" data-has-valid-price="true">$0.40</span> <br>
5M+ images -
<span class="price-data " data-amount="{"regional":{"asia-pacific-southeast":0.325,"australia-east":0.325,"brazil-south":0.325,"canada-central":0.325,"central-india":0.325,"europe-north":0.325,"europe-west":0.325,"united-kingdom-south":0.325,"us-east":0.325,"us-east-2":0.325,"us-south-central":0.325,"us-west-2":0.325,"us-west-central":0.325}}" data-decimals="3" data-decimals-force="0" data-region-unavailable="N/A" data-has-valid-price="true">$0.325</span> <br>
</td>
url: https://azure.microsoft.com/en-in/pricing/details/search/
How can I traverse the <br> tags and scrape the data? I want to split the td content into count(br) pieces and then scrape each piece. I don't want to use XPath; I want to get the result through CSS.

import json
from scrapy.selector import Selector

dumb = 'Your response, or above text'  # the HTML snippet from the question
html_dumb = Selector(text=dumb)
td_vals = [x.strip().strip('- ') for x in
           html_dumb.xpath("//td/text()").extract() if x.strip()]  # all td text nodes
f_val = td_vals[0]  # separate the first one, here "(price per 1,000 images)"
td_vals = td_vals[1:]
span_vals = [x.strip() for x in
             html_dumb.xpath("//span/@data-amount").extract()
             if x.strip()]  # all span data; you can also get the span text if you need it
inner_json = {}
result = {}
for td_val, span_val in zip(td_vals, span_vals):
    inner_json[td_val] = json.loads(span_val)  # build the inner dictionary
result[f_val] = inner_json  # attach it under the heading
{u'(price per 1,000 images)': {u'5M+ images': {u'regional': {u'united-kingdom-south': 0.325, u'europe-north': 0.325, u'brazil-south': 0.325, u'us-west-2': 0.325, u'us-south-central': 0.325, u'central-india': 0.325, u'us-east': 0.325, u'canada-central': 0.325, u'europe-west': 0.325, u'us-east-2': 0.325, u'us-west-central': 0.325, u'asia-pacific-southeast': 0.325, u'australia-east': 0.325}}, u'0-1M images': {u'regional': {u'united-kingdom-south': 0.5, u'europe-north': 0.5, u'brazil-south': 0.5, u'us-west-2': 0.5, u'us-south-central': 0.5, u'central-india': 0.5, u'us-east': 0.5, u'canada-central': 0.5, u'europe-west': 0.5, u'us-east-2': 0.5, u'us-west-central': 0.5, u'asia-pacific-southeast': 0.5, u'australia-east': 0.5}}, u'1M-5M images': {u'regional': {u'united-kingdom-south': 0.4, u'europe-north': 0.4, u'brazil-south': 0.4, u'us-west-2': 0.4, u'us-south-central': 0.4, u'central-india': 0.4, u'us-east': 0.4, u'canada-central': 0.4, u'europe-west': 0.4, u'us-east-2': 0.4, u'us-west-central': 0.4, u'asia-pacific-southeast': 0.4, u'australia-east': 0.4}}}}

Related

Why do the results of Dunn's test in GraphPad Prism and R differ?

I have three sets of data, to which I want to apply Dunn's test. However, the test shows different results when performed in GraphPad Prism and R. I've been reading a little bit about the test here, but I couldn't understand why there is a difference in the p-values. I even tested in R all the methods to adjust the p-value, but none of them matched the GraphPad Prism result.
Below I present screenshots of the step-by-step in GraphPad Prism and the code I used in R.
library(rstatix)

Day <- rep(1:10, 3)
FLs <- c(rep("FL1", 10), rep("FL2", 10), rep("FL3", 10))
Value <- c(0.2, 0.4, 0.3, 0.2, 0.3, 0.4, 0.2, 0.25, 0.32, 0.21,
           0.9, 0.6, 0.7, 0.78, 0.74, 0.81, 0.76, 0.77, 0.79, 0.79,
           0.6, 0.58, 0.54, 0.52, 0.39, 0.6, 0.52, 0.67, 0.65, 0.56)
DF <- data.frame(FLs, Day, Value)
Dunn <- DF %>%
  dunn_test(Value ~ FLs,
            p.adjust.method = "bonferroni",
            detailed = TRUE) %>%
  add_significance()

bnlearn Error: Wrong number of conditional probability distributions

I am learning to work with bnlearn and I keep running into the following error in the last line of my code below:
Error in custom.fit(dag, cpt) : wrong number of conditional probability distributions
What am I doing wrong?
modelstring(dag) = "[s][r][nblw|r][nblg|nblw][mlw|s:r][f|s:r:mlw][mlg|mlw:f][mlgr|mlg:nblg]"
### View DAG specifics
dag
arcs(dag)
nodes(dag)
# Create levels
State <- c("State0", "State1")
## Create the given probability distributions; these are all 2-d because they have 1 or 2 nodes
cptS <- matrix(c(0.6, 0.4), ncol = 2, dimnames = list(NULL, State))
cptR <- matrix(c(0.7, 0.3), ncol = 2, dimnames = list(NULL, State))
cptNBLW <- matrix(c(0.95, 0.05, 0.05, 0.95), ncol = 2, dimnames = list(NULL, "r" = State))
cptNBLG <- matrix(c(0.9, 0.099999999999999998, 0.2, 0.8), ncol = 2,
                  dimnames = list(NULL, "nblw" = State))
cptMLG <- matrix(c(0.95, 0.05, 0.4, 0.6, 0.2, 0.8, 0.05, 0.95), ncol = 2, nrow = 2,
                 dimnames = list("mlw" = State, "f" = State))
cptMLGR <- matrix(c(0.6, 0.4, 0.95, 0.05, 0.2, 0.8, 0.55, 0.45), ncol = 2, nrow = 2,
                  dimnames = list("mlg" = State, "nblg" = State))
cptMLW <- matrix(c(0.95, 0.05, 0.1, 0.9, 0.2, 0.8, 0.01, 0.99), ncol = 2, nrow = 2, byrow = TRUE,
                 dimnames = list("r" = State, "s" = State))
# Build 3-d matrices (because you have 3 nodes, you can't use the matrix function;
# you have to build it from scratch)
cptF <- c(0.05, 0.95, 0.4, 0.6, 0.9, 0.1, 0.99, 0.01, 0.9, 0.1, 0.95, 0.05, 0.95, 0.05, 0.99,
          0.01)
dim(cptF) <- c(2, 2, 2, 2)
dimnames(cptF) <- list("s" = State, "r" = State, "mlw" = State)
### Create CPT table
cpt <- list(s = cptS, r = cptR, mlw = cptMLW, nblw = cptNBLW,
            mlg = cptMLG, nblg = cptNBLG, mlgr = cptMLGR)
# Construct BN network with Conditional Probability Table
S.net <- custom.fit(dag, cpt)
Reference: https://rpubs.com/sarataheri/bnlearnCGM
You have several errors in your CPT definitions. Primarily, you need to make sure that:
- the number of probabilities supplied is equal to the product of the number of states in the child and parent nodes,
- the number of dimensions of the matrix/array is equal to the number of parent nodes plus one, for the child node,
- the child node is given in the first dimension when the number of dimensions is greater than one,
- the names given in the dimnames arguments (e.g. the names in dimnames=list(ThisName = ...)) match the names that were defined in the DAG, in your case with modelstring and in my answer with model2network. (So my earlier suggestion of using dimnames=list(cptNBLW = ...) should be dimnames=list(nblw = ...) to match how node nblw was declared in the model string.)

You also did not add node f to your cpt list.
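These counting rules can be cross-checked outside of R. Here is a small, hypothetical NumPy sketch (not bnlearn itself) of the CPT for node f with parents s, r and mlw, two states each, filled column-major like R's `dim<-` so that the child occupies the first dimension:

```python
import numpy as np

# 16 probabilities = 2 (child f) * 2 (s) * 2 (r) * 2 (mlw)
probs = [0.05, 0.95, 0.4, 0.6, 0.9, 0.1, 0.99, 0.01,
         0.9, 0.1, 0.95, 0.05, 0.95, 0.05, 0.99, 0.01]
# order='F' reproduces R's column-major filling; axis 0 is the child node
cpt_f = np.array(probs).reshape(2, 2, 2, 2, order='F')
# number of dimensions = number of parents plus one for the child
assert cpt_f.ndim == 3 + 1
# each conditional distribution (child axis, parents fixed) must sum to 1
assert np.allclose(cpt_f.sum(axis=0), 1.0)
```

If either assertion fails for one of your tables, custom.fit will reject the list.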
Below is your code with comments where things have been changed. (I have commented out the offending lines and added ones straight after)
library(bnlearn)
dag <- model2network("[s][r][nblw|r][nblg|nblw][mlw|s:r][mlg|mlw:f][mlgr|mlg:nblg][f|s:r:mlw]")
State <- c("State0", "State1")
cptS <- matrix(c(0.6, 0.4), ncol=2, dimnames=list(NULL, State))
cptR <- matrix(c(0.7, 0.3), ncol=2, dimnames=list(NULL, State))
# add child node into first slot of dimnames
cptNBLW <- matrix(c(0.95, 0.05, 0.05, 0.95), ncol=2, dimnames=list(nblw=State, "r"= State))
cptNBLG <- matrix(c(0.9, 0.099999999999999998, 0.2, 0.8), ncol=2, dimnames=list(nblg=State,"nblw"=State))
# Use a 3d array and not matrix, and add child node into dimnames
# cptMLG <- matrix(c(0.95, 0.05, 0.4, 0.6, 0.2, 0.8, 0.05, 0.95),ncol=2,nrow = 2, dimnames=list("mlw"= State, "f"=State))
cptMLG <- array(c(0.95, 0.05, 0.4, 0.6, 0.2, 0.8, 0.05, 0.95),dim=c(2,2,2), dimnames=list(mlg = State, "mlw"= State, "f"=State))
# cptMLGR <- matrix(c(0.6,0.4,0.95,0.05,0.2,0.8,0.55,0.45),ncol=2,nrow = 2, dimnames=list("mlg"= State, "nblg"=State))
cptMLGR <- array(c(0.6,0.4,0.95,0.05,0.2,0.8,0.55,0.45), dim=c(2,2,2), dimnames=list(mlgr=State, "mlg"= State, "nblg"=State))
# cptMLW <-matrix(c(0.95, 0.05, 0.1, 0.9, 0.2, 0.8, 0.01, 0.99), ncol=2,nrow = 2,byrow = TRUE, dimnames=list("r"= State, "s"=State))
cptMLW <-array(c(0.95, 0.05, 0.1, 0.9, 0.2, 0.8, 0.01, 0.99), dim=c(2,2,2), dimnames=list(mlw=State, "r"= State, "s"=State))
# add child into first slot of dimnames
cptF <- c(0.05, 0.95, 0.4, 0.6, 0.9, 0.1, 0.99, 0.01, 0.9, 0.1, 0.95, 0.05, 0.95, 0.05, 0.99, 0.01)
dim(cptF) <- c(2, 2, 2, 2)
dimnames(cptF) <- list("f" = State, "s"=State, "r"=State, "mlw"=State)
# add missing node f into list
cpt <- list(s = cptS, r = cptR, mlw = cptMLW, nblw = cptNBLW, mlg = cptMLG, nblg = cptNBLG, mlgr = cptMLGR, f = cptF)
# Construct BN network with Conditional Probability Table
S.net <- custom.fit(dag, dist=cpt)

Complex clipping (spatial intersection ?) of polygons and lines in R

I would like to clip (or maybe the right formulation is to perform a spatial intersection of) polygons and lines using a polygon rather than a rectangle, like so:
Here is some code to make the polygons for reproducibility and examples:
p1 <- data.frame(x = c(-0.81, -0.45, -0.04, 0.32, 0.47, 0.86, 0.08, -0.46, -1, -0.76),
                 y = c(0.46, 1, 0.64, 0.99, -0.04, -0.14, -0.84, -0.24, -0.44, 0.12))
p2 <- data.frame(x = c(-0.63, -0.45, -0.2, -0.38, -0.26, -0.82, -0.57, -0.76),
                 y = c(-0.1, 0.15, -0.17, -0.79, -1, -0.97, -0.7, -0.61))
l1 <- data.frame(x = c(0.1, 0.28, 0.29, 0.52, 0.51, 0.9, 1),
                 y = c(0.19, -0.15, 0.25, 0.28, 0.64, 0.9, 0.47))
plot.new()
plot.window(xlim = c(-1, 1), ylim = c(-1,1))
polygon(p2$x, p2$y, col = "blue")
polygon(p1$x, p1$y)
lines(l1$x, l1$y)
You could use the spatstat package for this. Below, the original example is worked through. In spatstat, polygons are used as "observation windows" of point patterns, so they are of class owin. It is possible to do set intersection, union, etc. with owin objects.
p1 <- data.frame(x = c(-0.81, -0.45, -0.04, 0.32, 0.47, 0.86, 0.08, -0.46, -1, -0.76),
                 y = c(0.46, 1, 0.64, 0.99, -0.04, -0.14, -0.84, -0.24, -0.44, 0.12))
p2 <- data.frame(x = c(-0.63, -0.45, -0.2, -0.38, -0.26, -0.82, -0.57, -0.76),
                 y = c(-0.1, 0.15, -0.17, -0.79, -1, -0.97, -0.7, -0.61))
l1 <- data.frame(x = c(0.1, 0.28, 0.29, 0.52, 0.51, 0.9, 1),
                 y = c(0.19, -0.15, 0.25, 0.28, 0.64, 0.9, 0.47))
In spatstat polygons must be traversed anti-clockwise, so:
library(spatstat)
p1rev <- lapply(p1, rev)
p2rev <- lapply(p2, rev)
W1 <- owin(poly = p1rev)
W2 <- owin(poly = p2rev)
L1 <- psp(x0 = l1$x[-nrow(l1)], y0 = l1$y[-nrow(l1)],
          x1 = l1$x[-1], y1 = l1$y[-1], window = boundingbox(l1))
plot(boundingbox(W1,W2,L1), type= "n", main = "Original")
plot(W2, col = "blue", add = TRUE)
plot(W1, add = TRUE)
plot(L1, add = TRUE)
W2clip <- W2[W1]
L1clip <- L1[W1]
plot(W1, main = "Clipped")
plot(W2clip, col = "blue", add = TRUE)
plot(L1clip, add = TRUE)
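For readers coming from Python (an assumption that a cross-language comparison is useful here), the same clipping can be sketched with the shapely package; `intersection` plays the role of W2[W1] and L1[W1]:

```python
from shapely.geometry import Polygon, LineString

# same coordinates as the R example
p1 = Polygon(list(zip([-0.81, -0.45, -0.04, 0.32, 0.47, 0.86, 0.08, -0.46, -1, -0.76],
                      [0.46, 1, 0.64, 0.99, -0.04, -0.14, -0.84, -0.24, -0.44, 0.12])))
p2 = Polygon(list(zip([-0.63, -0.45, -0.2, -0.38, -0.26, -0.82, -0.57, -0.76],
                      [-0.1, 0.15, -0.17, -0.79, -1, -0.97, -0.7, -0.61])))
l1 = LineString(list(zip([0.1, 0.28, 0.29, 0.52, 0.51, 0.9, 1],
                         [0.19, -0.15, 0.25, 0.28, 0.64, 0.9, 0.47])))

p2_clip = p2.intersection(p1)  # polygon clipped by polygon
l1_clip = l1.intersection(p1)  # polyline clipped by polygon
```

Unlike spatstat, shapely does not care about vertex orientation, so no reversing of the coordinate order is needed.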

How to change the x axis to a time series in ggplot2

I'm trying to replicate the graph provided at https://www.chicagofed.org/research/data/cfnai/current-data since I will soon need similar graphs for other data sets. I'm almost there, but I can't seem to figure out how to change the x axis to dates when using ggplot2. Specifically, I would like to change it to the dates in the Date column. I have tried about a dozen ways and nothing works. The data for this graph is under "Indexes" on the website. Here's my code and the graph, where dataSet is the data from the website:
library(ggplot2)
library(reshape2)
library(tidyverse)
library(lubridate)

df = data.frame(time = index(dataSet), melt(as.data.frame(dataSet)))
df
str(df)
df$data1.Date = as.Date(as.character(df$data1.Date))
str(df)
replicaPlot1 = ggplot(df, aes(x = time, y = value)) +
  geom_area(aes(colour = variable, fill = variable)) +
  stat_summary(fun = sum, geom = "line", size = 0.4) +
  labs(title = "Chicago Fed National Activity Index (CFNAI) Current Data")
replicaPlot1 + scale_x_continuous(name = "time", breaks = waiver(), labels = waiver(),
                                  limits = df$data1.Date)
replicaPlot1
Any sort of help on this would be very much appreciated!
Not sure what your intention is with data.frame(time = index(dataSet), melt(as.data.frame(dataSet))). When I download the data and read it via readxl::read_excel, I get a nice tibble with a date(time) column which, after reshaping via tidyr::pivot_longer, can easily be plotted and, by making use of scale_x_datetime, has a nicely formatted date axis:
Using just the first 20 rows of data try this:
library(ggplot2)
library(readxl)
library(tidyr)

df <- pivot_longer(df, -Date, names_to = "variable")
ggplot(df, aes(x = Date, y = value)) +
  geom_area(aes(colour = variable, fill = variable)) +
  stat_summary(fun = sum, geom = "line", size = 0.4) +
  labs(title = "Chicago Fed National Activity Index (CFNAI) Current Data") +
  scale_x_datetime(name = "time")
#> Warning: Removed 4 rows containing non-finite values (stat_summary).
#> Warning: Removed 4 rows containing missing values (position_stack).
Created on 2021-01-28 by the reprex package (v1.0.0)
DATA
# Data downloaded from https://www.chicagofed.org/~/media/publications/cfnai/cfnai-data-series-xlsx.xlsx?la=en
# df <- readxl::read_excel("cfnai-data-series-xlsx.xlsx")
# dput(head(df, 20))
df <- structure(list(Date = structure(c(
-87004800, -84412800, -81734400,
-79142400, -76464000, -73785600, -71193600, -68515200, -65923200,
-63244800, -60566400, -58060800, -55382400, -52790400, -50112000,
-47520000, -44841600, -42163200, -39571200, -36892800
), tzone = "UTC", class = c(
"POSIXct",
"POSIXt"
)), P_I = c(
-0.26, 0.16, -0.43, -0.09, -0.19, 0.58, -0.05,
0.21, 0.51, 0.33, -0.1, 0.12, 0.07, 0.04, 0.35, 0.04, -0.1, 0.14,
0.05, 0.11
), EU_H = c(
-0.06, -0.09, 0.01, 0.04, 0.1, 0.22, -0.04,
0, 0.32, 0.16, -0.2, 0.34, 0.06, 0.17, 0.17, 0.07, 0.12, 0.12,
0.15, 0.18
), C_H = c(
-0.01, 0.01, -0.05, 0.08, -0.07, -0.01,
0.12, -0.11, 0.1, 0.15, -0.04, 0.04, 0.17, -0.03, 0.05, 0.08,
0.09, 0.05, -0.06, 0.09
), SO_I = c(
-0.01, -0.07, -0.08, 0.02,
-0.16, 0.22, -0.08, -0.07, 0.38, 0.34, -0.13, -0.1, 0.08, -0.07,
0.06, 0.07, 0.12, -0.3, 0.35, 0.14
), CFNAI = c(
-0.34, 0.02, -0.55,
0.04, -0.32, 1, -0.05, 0.03, 1.32, 0.97, -0.46, 0.39, 0.38, 0.11,
0.63, 0.25, 0.22, 0.01, 0.49, 0.52
), CFNAI_MA3 = c(
NA, NA, -0.29,
-0.17, -0.28, 0.24, 0.21, 0.33, 0.43, 0.77, 0.61, 0.3, 0.1, 0.29,
0.37, 0.33, 0.37, 0.16, 0.24, 0.34
), DIFFUSION = c(
NA, NA, -0.17,
-0.14, -0.21, 0.16, 0.11, 0.17, 0.2, 0.5, 0.41, 0.28, 0.2, 0.32,
0.36, 0.32, 0.33, 0.25, 0.31, 0.47
)), row.names = c(NA, -20L), class = c(
"tbl_df",
"tbl", "data.frame"
))
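For comparison, the pivot_longer step above corresponds to pandas' melt in Python. A small, hypothetical sketch (dates approximated from the POSIXct values; index values taken from the first rows of the data):

```python
import pandas as pd

# wide table: one Date column plus one column per index component
df = pd.DataFrame({
    'Date': pd.to_datetime(['1967-03-31', '1967-04-30', '1967-05-31']),
    'P_I': [-0.26, 0.16, -0.43],
    'EU_H': [-0.06, -0.09, 0.01],
})
# melt is the analogue of tidyr::pivot_longer(df, -Date, names_to = "variable")
long = df.melt(id_vars='Date', var_name='variable', value_name='value')
print(long.shape)  # (6, 3): 3 dates x 2 variables; columns Date, variable, value
```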

How can I create a customized colormap for geoviews (bokeh)?

I'm trying to plot an xarray dataset in Geoviews, like this:
https://geoviews.org/gallery/bokeh/xarray_image.html#bokeh-gallery-xarray-image
There I can define a colormap by cmap.
The cmap is just a list of hex-codes, like:
['#150b00',
'#9b4e00',
'#f07800',
'#ffa448',
'#a8a800',
'#dddd00',
'#ffff00',
'#ffffb3',
'#ffffff',
'#b0ffff',
'#00e8e8',
'#00bfbf',
'#008a8a',
'#79bcff',
'#0683ff',
'#0000c1',
'#000048']
I want to define the levels of values for these colors, like this list:
[-10.0,
-5.0,
-2.5,
-1.0,
-0.5,
-0.2,
-0.1,
-0.05,
0.05,
0.1,
0.2,
0.5,
1.0,
2.5,
5.0,
10.0]
How can I define these levels?
Please try setting the parameter color_levels to the wanted values. This is explained in HoloViews' Styling Plots documentation in the section Custom color intervals. HoloViews is where gv.Image comes from, therefore this should work.
cmap = ['#150b00', '#9b4e00', '#f07800', '#ffa448', '#a8a800', '#dddd00', '#ffff00', '#ffffb3', '#ffffff', '#b0ffff', '#00e8e8', '#00bfbf', '#008a8a', '#79bcff', '#0683ff', '#0000c1', '#000048']
levels = [-10.0, -5.0, -2.5, -1.0, -0.5, -0.2, -0.1, -0.05, 0.05, 0.1, 0.2, 0.5, 1.0, 2.5, 5.0, 10.0]
images.opts(
    cmap=cmap,
    color_levels=levels,
    colorbar=True,
    width=600,
    height=500) * gf.coastline
If this is not working, then I apologize; at the moment I am not able to install GeoViews on my machine to test.
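As a cross-check outside of GeoViews (an assumption on my part, since I could not test GeoViews either): the same "fixed intervals, fixed colors" idea can be expressed in plain matplotlib with ListedColormap and BoundaryNorm. Note that N boundary values define N-1 intervals, so with the 16 levels above, the two outermost of the 17 colors are left for the under/over ranges:

```python
from matplotlib.colors import ListedColormap, BoundaryNorm

colors = ['#150b00', '#9b4e00', '#f07800', '#ffa448', '#a8a800', '#dddd00',
          '#ffff00', '#ffffb3', '#ffffff', '#b0ffff', '#00e8e8', '#00bfbf',
          '#008a8a', '#79bcff', '#0683ff', '#0000c1', '#000048']
levels = [-10.0, -5.0, -2.5, -1.0, -0.5, -0.2, -0.1, -0.05,
          0.05, 0.1, 0.2, 0.5, 1.0, 2.5, 5.0, 10.0]

cmap = ListedColormap(colors[1:-1])          # 15 in-range colors for 15 intervals
cmap.set_under(colors[0])                    # values below -10
cmap.set_over(colors[-1])                    # values above +10
norm = BoundaryNorm(levels, ncolors=cmap.N)  # maps a value to its interval index
```

If color_levels in HoloViews expects the same count relationship, trimming either the color list or the level list accordingly should avoid an off-by-one mismatch.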
