How can I scrape data value? [duplicate] - web-scraping

This question already has answers here:
IMDB scrapy get all movie data
(4 answers)
Closed 2 years ago.
Extract from imdb website:
<div class="inline-block ratings-imdb-rating" name="ir" data-value="6.9">
<span class="gloabl-sprite rating-star imdb-rating"></span>
<strong>6.9</strong>
Code using Scrapy
rating = response.css(".global-sprite rating-star imdb-rating::text").extract_first()
I was using the above class 'global-sprite rating-star imdb-rating' but it doesn't grab the value 6.9 from the imdb website. How can I grab the value with the above code?

A quotation mark seems to be missing:
data-value="6.9"
Correct that and it should work.
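For what it's worth, the selector in the question looks like it has a separate problem: `.global-sprite rating-star imdb-rating` is read as a descendant chain ending in elements named `rating-star` and `imdb-rating`, and the `<span>` is empty anyway. The value sits in the `data-value` attribute of the `<div>` (and in the `<strong>` text). In Scrapy an attribute can be read with the `::attr()` pseudo-element, for example `response.css("div.ratings-imdb-rating::attr(data-value)").get()`, though I have not checked this against the live page, whose markup may have changed. The sketch below shows the same idea using only Python's standard library on the fragment from the question; `RatingParser` and `extract_rating` are names made up for the illustration.

```python
from html.parser import HTMLParser

# HTML fragment copied from the question; a closing </div> is added here
# (an assumption) so the fragment is complete.
SNIPPET = """
<div class="inline-block ratings-imdb-rating" name="ir" data-value="6.9">
<span class="gloabl-sprite rating-star imdb-rating"></span>
<strong>6.9</strong>
</div>
"""


class RatingParser(HTMLParser):
    """Collects the data-value attribute of the ratings <div>."""

    def __init__(self):
        super().__init__()
        self.rating = None

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        classes = (attributes.get("class") or "").split()
        # Match on one class of the <div>, much like the CSS selector
        # "div.ratings-imdb-rating" would.
        if tag == "div" and "ratings-imdb-rating" in classes:
            self.rating = attributes.get("data-value")


def extract_rating(html):
    """Return the data-value of the ratings <div>, or None if absent."""
    parser = RatingParser()
    parser.feed(html)
    return parser.rating


rating = extract_rating(SNIPPET)
print(rating)
```

The attribute is returned as a string, so convert it with `float(rating)` if a number is needed.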

Related

Cannot import a simple vector in R : why? [duplicate]

This question already has answers here:
R eval parse character limit
(1 answer)
Command Lines error in Rstudio console
(1 answer)
Closed 1 year ago.
I would like to import into R this vector, which contains the names of several variables:
names14 <- c("INAMI","HcwProfession","code_qualif","espaces","start_convention","end_convention","date_adh_convention","date_refus_convention","n_limitations","jour_a1","h_s_a1","m_s_a1","h_e_a1","m_e_a1","jour_b1","h_s_b1","m_s_b1","h_e_b1","h_e_b1","na1","type1","INAMI1","lieu1","postal1","ins1","local1","province1","com1","start1","end1","jour_a2","h_s_a2","m_s_a2","h_e_a2","m_e_a2","jour_b2","h_s_b2","m_s_b2","h_e_b1","h_e_b1","na2","type2","INAMI2","lieu2","postal2","ins2","local2","province2","com2","start2","end2","jour_a3","h_s_a3","m_s_a3","h_e_a3","m_e_a3","jour_b3","h_s_b3","m_s_b3","h_e_b1","h_e_b1","na3","type3","INAMI3","lieu3","postal3","ins3","local3","province3","com3","start3","end3","jour_a4","h_s_a4","m_s_a4","h_e_a4","m_e_a4","jour_b4","h_s_b4","m_s_b4","h_e_b1","h_e_b1","na4","type4","INAMI4","lieu4","postal4","ins4","local4","province4","com4","start4","end4","jour_a5","h_s_a5","m_s_a5","h_e_a5","m_e_a5","jour_b5","h_s_b5","m_s_b5","h_e_b1","h_e_b1","na5","type5","INAMI5","lieu5","postal5","ins5","local5","province5","com5","start5","end5","jour_a6","h_s_a6","m_s_a6","h_e_a6","m_e_a6","jour_b6","h_s_b6","m_s_b6","h_e_b1","h_e_b1","na6","type6","INAMI6","lieu6","postal6","ins6","local6","province6","com6","start6","end6","jour_a7","h_s_a7","m_s_a7","h_e_a7","m_e_a7","jour_b7","h_s_b7","m_s_b7","h_e_b1","h_e_b1","na7","type7","INAMI7","lieu7","postal7","ins7","local7","province7","com7","start7","end7","jour_a8","h_s_a8","m_s_a8","h_e_a8","m_e_a8","jour_b8","h_s_b8","m_s_b8","h_e_b1","h_e_b1","na8","type8","INAMI8","lieu8","postal8","ins8","local8","province8","com8","start8","end8","jour_a9","h_s_a9","m_s_a9","h_e_a9","m_e_a9","jour_b9","h_s_b9","m_s_b9","h_e_b1","h_e_b1","na9","type9","INAMI9","lieu9","postal9","ins9","local9","province9","com9","start9","end9","jour_a10","h_s_a10","m_s_a10","h_e_a10","m_e_a10","jour_b10","h_s_b10","m_s_b10","h_e_b1","h_e_b1","na10","type10","INAMI10","lieu10","postal10","ins10","local10","province10","com10",
"start10","end10","jour_a11","h_s_a11","m_s_a11","h_e_a11","m_e_a11","jour_b11","h_s_b11","m_s_b11","h_e_b1","h_e_b1","na11","type11","INAMI11","lieu11","postal11","ins11","local11","province11","com11","start11","end11","jour_a12","h_s_a12","m_s_a12","h_e_a12","m_e_a12","jour_b12","h_s_b12","m_s_b12","h_e_b1","h_e_b1","na12","type12","INAMI12","lieu12","postal12","ins12","local12","province12","com12","start12","end12","jour_a13","h_s_a13","m_s_a13","h_e_a13","m_e_a13","jour_b13","h_s_b13","m_s_b13","h_e_b1","h_e_b1","na13","type13","INAMI13","lieu13","postal13","ins13","local13","province13","com13","start13","end13","jour_a14","h_s_a14","m_s_a14","h_e_a14","m_e_a14","jour_b14","h_s_b14","m_s_b14","h_e_b1","h_e_b1","na14","type14","INAMI14","lieu14","postal14","ins14","local14","province14","com14","start14","end14","jour_a15","h_s_a15","m_s_a15","h_e_a15","m_e_a15","jour_b15","h_s_b15","m_s_b15","h_e_b1","h_e_b1","na15","type15","INAMI15","lieu15","postal15","ins15","local15","province15","com15","start15","end15","jour_a16","h_s_a16","m_s_a16","h_e_a16","m_e_a16","jour_b16","h_s_b16","m_s_b16","h_e_b1","h_e_b1","na16","type16","INAMI16","lieu16","postal16","ins16","local16","province16","com16","start16","end16","jour_a17","h_s_a17","m_s_a17","h_e_a17","m_e_a17","jour_b17","h_s_b17","m_s_b17","h_e_b1","h_e_b1","na17","type17","INAMI17","lieu17","postal17","ins17","local17","province17","com17","start17","end17","jour_a18","h_s_a18","m_s_a18","h_e_a18","m_e_a18","jour_b18","h_s_b18","m_s_b18","h_e_b1","h_e_b1","na18","type18","INAMI18","lieu18","postal18","ins18","local18","province18","com18","start18","end18","jour_a19","h_s_a19","m_s_a19","h_e_a19","m_e_a19","jour_b19","h_s_b19","m_s_b19","h_e_b1","h_e_b1","na19","type19","INAMI19","lieu19","postal19","ins19","local19","province19","com19","start19","end19","jour_a20","h_s_a20","m_s_a20","h_e_a20","m_e_a20","jour_b20","h_s_b20","m_s_b20","h_e_b1","h_e_b1","na20","type20","INAMI20","lieu20","postal20","ins20","
local20","province20","com20","start20","end20","jour_a21","h_s_a21","m_s_a21","h_e_a21","m_e_a21","jour_b21","h_s_b21","m_s_b21","h_e_b1","h_e_b1","na21","type21","INAMI21","lieu21","postal21","ins21","local21","province21","com21","start21","end21","jour_a22","h_s_a22","m_s_a22","h_e_a22","m_e_a22","jour_b22","h_s_b22","m_s_b22","h_e_b1","h_e_b1","na22","type22","INAMI22","lieu22","postal22","ins22","local22","province22","com22","start22","end22","jour_a23","h_s_a23","m_s_a23","h_e_a23","m_e_a23","jour_b23","h_s_b23","m_s_b23","h_e_b1","h_e_b1","na23","type23","INAMI23","lieu23","postal23","ins23","local23","province23","com23","start23","end23","jour_a24","h_s_a24","m_s_a24","h_e_a24","m_e_a24","jour_b24","h_s_b24","m_s_b24","h_e_b1","h_e_b1","na24","type24","INAMI24","lieu24","postal24","ins24","local24","province24","com24","start24","end24")
But R tells me something is missing (it shows "+", prompting me to add something), and I don't understand what. It works if I shorten the vector; maybe it's too long?

Adding strings at the back of an existing string help please [duplicate]

This question already has an answer here:
Add a prefix to all rows in R
(1 answer)
Closed 2 years ago.
df <- data.frame(Speaker=c('Abraham','Wassimo','Fredrick','Richard','Ravish','Rubina','Laura'),Age=c(45,47,39,33,36,28,30))
#data frame
df
gsub('.*^','Mr/Mrs.',df$Speaker)
results
Mr/Mrs.Abraham
Mr/Mrs.Wassimo
Mr/Mrs.Fredrick
Mr/Mrs.Richard
Mr/Mrs.Ravish
Mr/Mrs.Rubina
Mr/Mrs.Laura
I cannot figure out how to add a string after the names, though. Can anyone help me with that?
I don't know why you think you need a regex substitution here; just use paste0:
df$Speaker <- paste0("Mr/Mrs.", df$Speaker, "text after here")

Extract a sentence from mail with Regex [duplicate]

This question already has answers here:
Extracting a string between other two strings in R
(4 answers)
Closed 3 years ago.
I need to extract a sentence with a regex, without the <br> tag, but my pattern gives me issues:
(?<=Status:) (.*)[^<br>]
Status: i3 Naviera indicates that the container is already released<br>
This sentence comes from an email:
"<html>\r\n<head>\r\n<meta http-equiv=\"Content-Type\"
content=\"text/html; charset=utf-8\">\r\n</head>\r\n<body>\r\nStatus:
i3 Naviera indicates that the container is already
released<br>\r\nObservations: data requested.<br>\r\n<br>\r\n<img
src=\"http://test/logo/Logo2.png\">\r\n</body>\r\n</html>\r\n"
I just need to extract:
i3 Naviera indicates that the container is already released
This regex would work for your content:
(?<=Status: )(.*?)(?=<br>)
It matches the text that follows "Status: " (including the space) and stops at the first <br>, which is not included in the match.
Please note that using regex for HTML parsing requires that the HTML content does not change much.
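As a quick check of that pattern, here is a minimal sketch in Python, whose `re` module supports the same lookbehind and lookahead syntax. It assumes that "Status: " and the sentence sit on one line in the real mail (the line wraps in the quoted string are taken to be display artifacts); `extract_status` is a name invented for the example. In R, lookarounds like these need `perl = TRUE`.

```python
import re

# Mail body from the question, rebuilt under the assumption that
# "Status: " is followed by the sentence on the same line.
mail = (
    "<html>\r\n<head>\r\n<meta http-equiv=\"Content-Type\" "
    "content=\"text/html; charset=utf-8\">\r\n</head>\r\n<body>\r\n"
    "Status: i3 Naviera indicates that the container is already released<br>\r\n"
    "Observations: data requested.<br>\r\n<br>\r\n"
    "<img src=\"http://test/logo/Logo2.png\">\r\n</body>\r\n</html>\r\n"
)

# Lookbehind anchors the match after "Status: ", the lazy group stops
# at the first "<br>", and the lookahead keeps "<br>" out of the match.
STATUS_PATTERN = re.compile(r"(?<=Status: )(.*?)(?=<br>)")


def extract_status(text):
    """Return the sentence after 'Status: ', or None if it is not found."""
    match = STATUS_PATTERN.search(text)
    return match.group(1) if match else None


print(extract_status(mail))
```

Because `.` does not match a newline by default, the pattern only works when the sentence and its `<br>` are on the same line as "Status: ".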

How can I add individual html code on WooCommerce after each single product summary [duplicate]

This question already has an answer here:
How can I add html code on WooCommerce after single product summary
(1 answer)
Closed 3 years ago.
How can I add individual HTML code for individual products?
Here is the screenshot: https://i.stack.imgur.com/QeGGw.png
Regards
Golam Rabbi
You can try this:
// Print custom HTML below the product tabs on single product pages.
function html_below_product_tabs() {
    echo 'Content you want to place';
}
add_action('woocommerce_product_after_tabs', 'html_below_product_tabs');

R Convert HTML ASCII characters to character [duplicate]

This question already has answers here:
convert HTML Character Entity Encoding in R
(5 answers)
Convert HTML Entity to proper character R
(1 answer)
Closed 4 years ago.
Is there a standard way in R to transliterate ASCII HTML codes to a standard character? For example, &#39; is an apostrophe, like ' or ' (I typed an apostrophe for the second one and the HTML code for the first). I'd like to change the following text
text = "Met with Mark&#39;s boss today to discuss performance"
to be
"Met with Mark's boss today to discuss performance"
I tried using iconv like below but the HTML code is all valid encoding, so nothing changes.
iconv(text, from="ASCII", to="UTF-8//TRANSLIT")
I could get a lookup table and do it that way but thought I'd check if there's an existing method to accomplish this.
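Purely to illustrate what such a decoding step does (this is Python, not R, so it does not answer the question as asked), the standard library function `html.unescape` replaces character references with the characters they stand for. The input below assumes the apostrophe was written as the numeric reference `&#39;`.

```python
import html

# Assumed input: the sentence from the question with the apostrophe
# written as the numeric character reference &#39;.
text = "Met with Mark&#39;s boss today to discuss performance"

# html.unescape handles numeric references as well as named entities.
decoded = html.unescape(text)
print(decoded)
```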
