R web scraping using rvest: timeout error - r

library(rvest)
jobbank <- read_html("https://www.jobbank.gc.ca/LMI_bulletin.do?cid=3373&AREA=0007&INDUSTRYCD=&EVENTCD=")
Error in open.connection(x, "rb") :
Timeout was reached: Connection timed out after 10015 milliseconds
jobbank %>%
html_node(".lmiBox") %>%
html_text()
Error in eval(lhs, parent, parent) : object 'jobbank' not found
I'm trying to find keywords from the news section of the website, but it keeps showing me these two error messages.

Seems to be working fine on my side.
library(rvest)
#> Loading required package: xml2
library(stringr)
jobbank <- read_html("https://www.jobbank.gc.ca/LMI_bulletin.do?cid=3373&AREA=0007&INDUSTRYCD=&EVENTCD=")
jobbank %>%
html_node(".lmiBox") %>%
html_text() %>%
str_split("(\r\\n+\\s+)|(\\n\\s+)")
#> [[1]]
#> [1] ""
#> [2] "Week of Jan 14 - Jan 18, 2019Lowe's Canada is looking to hire about 2,650 full-time, part-time and seasonal staff at its stores in Ontario. The company will hold a National Hiring Day on February 23."
#> [3] "The Ministry of Innovation, Science, and Economic Development announced $5M in funding to support automotive innovation at APAG Elektronik Corp. and Service Mold + Aerospace Inc. in Windsor, creating 160 jobs"
#> [4] "A $1M investment by the provincial government into Kenora's Downtown Revitalization Project for a plaza and infrastructure upgrades will create 75 new jobs"
#> [5] "Redfin Corp., an American real estate brokerage, is expanding into Canada and hiring in Toronto"
#> [6] "The construction of townhomes at Walkerville Stones in Windsor is expected to begin this spring "
#> [7] "The Ontario Emerging Jobs Institute (OEJI) at the Nav Centre in Cornwall opened. The OEJI provides skills training in areas with worker shortages."
#> [8] "The Chartwell Meadowbrook Retirement Residence in Lively broke ground on their expansion project, which includes 41 new suites and 14 town homes"
#> [9] "Lambton College created an Information Technology and Communication Research Centre using a five-year, $2M grant from the Natural Sciences and Engineering Research Council of Canada. They hope to use part of the funding to employ students."
#> [10] "SnapCab, a workspace pod manufacturer in Kingston, has grown from 20 to 25 employees with more hiring expected to occur in 2019"
#> [11] "Niagara Pallet & Recyclers Ltd., a manufacturer of pallets and shipping materials in Smithville, is hiring general labour workers, AZ and DZ drivers, production staff, forklift drivers and saw operators"
#> [12] "A1 Demolition will begin demolition of the former Maliboo Club in Simcoe. The plan is to rebuild the structure with residential and commercial space."
#> [13] "MidiCi: The Neapolitan Pizza Co., Sweet Jesus, La Carnita and The Pie Commission will be among several restaurants opening in the 34,000-sq.-ft. Food District in Mississauga this spring "
#> [14] "Menkes Developments Ltd., in partnership with TD Greystone Asset Management, will renovate the former Canada Permanent Trust Building in Toronto. Work on the 270,000-sq.-ft. space is expected to take between 12 and 18 months."
#> [15] "Westmount Signs & Printing in Waterloo is hiring experienced installers after doubling the size of its workforce to 24 employees in the last year and a half"
#> [16] "Microbrewery, Heral Haus Brewing Co. opened in Stratford at the end of December"
#> [17] "Demolition is expected to start this month on Windsor's old City Hall and is expected to be complete by August"
#> [18] "Urban Planet, a clothing store, will open as early as February 2019 at the Cornwall Square mall in Cornwall"
#> [19] "The federal government committed $3.5M towards the construction of a new art gallery in Thunder Bay, bringing total government funding for the project to $27.5M"
#> [20] "The Rec Room, a 44,000-sq.-ft. entertainment complex by Cineplex Entertainment LP, is scheduled to open in Mississauga in March "
#> [21] "Yang Teashop opened a second location in Toronto with plans to open two more locations in the Greater Toronto Area"
#> [22] "Spacecraft Brewery opened in Sudbury"
#> [23] "The Town of Lakeshore will be accepting applications for 11 summer student positions until March 1"
#> [24] "Virtual reality arcade Cntrl V opened in Lindsay"
#> [25] "A new restaurant, Presqu'ile Café and Burger, opened in Brighton"
#> [26] "Beauty brand Morphe LLC opened a store in Mississauga"
#> [27] "Footwear retailer Brown Shoe Company of Canada Ltd. Inc. will open an outlet store in Halton Hills in April"
#> [28] "The Westdale Theatre in Hamilton is scheduled to reopen in February "
#> [29] "Early ON/Family Grouping will open a child care centre in Monkton"
#> [30] "The De Novo addiction treatment centre opened in Huntsville "
#> [31] "French Revolution Bakery & Crêperie opened in Dundas"
#> [32] "A Williams Fresh Cafe is slated to open in Stoney Creek, one of three new locations opening this year in southwestern Ontario"
#> [33] "Monigram Coffee Midtown cafe will open in Kitchener this winter "
#> [34] "My Roti Place opened a fourth restaurant in Toronto"
#> [35] "A Gangster Cheese restaurant opened in Whitby"
#> [36] "A Copper Branch restaurant opened in Mississauga "
#> [37] "Hallmark Canada will exit about 20 company-owned stores across Canada in 2019 by either transitioning them to independent ownership or closing them. The loacations of the affected stores have not been identified."
#> [38] "Lush Cosmetics at the Intercity Shopping Centre in Thunder Bay will close at the end of January"
#> [39] ""
Created on 2019-01-28 by the reprex package (v0.2.1)
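If the timeout persists on your side, the problem is likely the connection (a slow network, a proxy, or the site throttling requests) rather than the code. A minimal workaround sketch, assuming the httr package is installed: fetch the page with a longer timeout and a browser-like user agent, then hand the HTML to rvest.

library(httr)
library(rvest)

url <- "https://www.jobbank.gc.ca/LMI_bulletin.do?cid=3373&AREA=0007&INDUSTRYCD=&EVENTCD="
# Allow up to 60 seconds instead of the default that gave up after ~10 seconds
res <- GET(url, timeout(60), user_agent("Mozilla/5.0"))
jobbank <- read_html(content(res, as = "text"))
jobbank %>%
  html_node(".lmiBox") %>%
  html_text()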

Related

How do you scrape multiple pages from same website on Rstudio

So I want to download data from multiple pages of the same website using RStudio:
https://www.irishjobs.ie/ShowResults.aspx?Keywords=Data&autosuggestEndpoint=%2fautosuggest&Location=0&Category=&Recruiter=Company&btnSubmit=Search&Page=2
The only difference between page 2 and page 3 is that the hyperlink ends in a 3 instead of a 2.
I have no problem getting what I need from the 25 jobs on 1 page, but I want to get 100 jobs from 4 pages.
I am using the SelectorGadget Chrome extension.
I tried a for loop:
for (page_result in seq(from = 1, to = 101, by = 25)) {
link = paste0("https://www.irishjobs.ie/ShowResults.aspx?Keywords=Data&autosuggestEndpoint=%2fautosuggest&Location=0&Category=&Recruiter=Company&btnSubmit=Search&Page=2")
page = read_html(link)
I can't figure out how to do it. I think I need to fit page_result into the link, but I don't know where. I welcome any ideas.
I have the rvest and dplyr packages loaded, and I want the for loop to go through each page. Any idea how best to do this? Thanks.
The 4 links can easily be put in a for loop. Copy the CSS selector from the DOM and iterate k over 5 to 30 to get all 25 jobs on each page.
library(rvest)
library(dplyr)

AllJOBS <- vector()
for (i in 1:4) {
  # Build the URL for page i
  url <- paste0("https://www.irishjobs.ie/ShowResults.aspx?Keywords=Data&autosuggestEndpoint=%2fautosuggest&Location=0&Category=&Recruiter=Company&btnSubmit=Search&Page=", i)
  for (k in 5:30) {
    # The k-th child div of the results column holds one job listing
    jobs <- read_html(url) %>%
      html_node(css = paste0("#page > div.container > div.column-wrap.order-one-two > div.two-thirds > div:nth-child(", k, ") > div > div.job-result-logo-title > div.job-result-title > h2 > a")) %>%
      html_text()
    AllJOBS <- append(AllJOBS, jobs)
    Sys.sleep(runif(1, 1, 2))  # polite random delay between requests
  }
  print(paste0("Page ", i))
}
Output:
> AllJOBS
[1] "Senior Consultant - Fund Static Data"
[2] "Data Warehouse Engineer"
[3] "Senior Software Engineer - Big Data DevOps"
[4] "HR Data Analyst"
[5] "Data Insights Engineer - Dublin - Permanent/Contract - SQL Server"
[6] NA
[7] "Data Engineer - Master Data Services - SQL Server - Permanent/Contract"
[8] "Senior Data Protection Officer (DPO) - Contract"
[9] "QC Data Analyst (Trending)"
[10] "Senior Data Warehouse Developer"
[11] "Senior Data Analyst FTC"
[12] "Compliance Advisory and Data Protection Relationship Manager"
[13] "Contracts Manager-Data Center"
[14] "Payments Product Data Analyst"
[15] "Data Center Product Hardware Platform Engineer"
[16] "People Data Privacy Program Lead"
[17] "Head of Data Science"
[18] "Data Protection Counsel (Product or Compliance)"
[19] "Data Engineer, GMS"
[20] "Data Protection Associate General Counsel"
[21] "Senior Data Engineer"
[22] "Geospatial Data Scientist"
[23] "Data Solutions Manager"
[24] "Data Protection Solicitor"
[25] "Junior Data Scientist"
[26] "Master Data Specialist"
[27] "Temp QC Electronic Data Management Analyst"
[28] "20725 -Data Scientist - Limerick"
[29] "Technical Support Specialist - Data Centre"
[30] "Lead QC Micro Analyst (data review and compliance)"
[31] "Temp QC Data Analyst"
[32] "#Abbvie Compliance Engineer (Data Integrity)"
[33] "People Data Analyst"
[34] "Senior Electrical Design Engineer - Data Centre Ex"
[35] "Laboratory Data Entry Assistant, UCD NVRL"
[36] "Data Migrations Specialist"
[37] "Data Protection Officer"
[38] "Data Center Operations Engineer (Linux)"
[39] "Senior Electrical Engineer | Data Centre LV Design"
[40] "Data Scientist - (Process Sciences)"
[41] "Mgr Supply Logistics Global Materials Data"
[42] "Data Protection / Privacy Delivery Consultant"
[43] "Global Supply Chain Data Analyst"
[44] "QC Data Analyst"
[45] "0582GradeVIIFOIOLOL1120 - Grade VII Data Protection / Freedom of Information & Compliance Officer"
[46] "DPO001 - Deputy Data Protection Officer (General Manager) Office of the Head of Data Protection, HSE"
[47] "Senior Campaign Data Analyst"
[48] "Data & Reporting Analyst II"
[49] "Azure Data Analytics Solution Architect"
[50] "Head of Risk Assurance for IT, Data, Projects and Outsourcing"
[51] "Trainee Data Technician, Ireland"
[52] NA
You can deal with the NAs separately. Does this answer your question, or did I misinterpret it?
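A simpler variant (a sketch, not tested against the live site) skips the inner k loop and pulls every job title on a page at once with html_nodes(), which also avoids the NA entries from div positions that hold no job. The shortened selector is an assumption derived from the long one above.

library(rvest)

AllJOBS <- vector()
for (i in 1:4) {
  url <- paste0("https://www.irishjobs.ie/ShowResults.aspx?Keywords=Data&autosuggestEndpoint=%2fautosuggest&Location=0&Category=&Recruiter=Company&btnSubmit=Search&Page=", i)
  # One request per page; html_nodes() returns every matching title
  jobs <- read_html(url) %>%
    html_nodes("div.job-result-title > h2 > a") %>%
    html_text()
  AllJOBS <- append(AllJOBS, jobs)
  Sys.sleep(runif(1, 1, 2))  # polite delay between pages
}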

How to read a .txt file into a dataframe with readr?

I have the following data that I obtained from a .txt file using the read_lines function from readr:
txtread<-read_lines("expenses_copy1.txt")
txtread
[1] "Amount:Category:Date:Description"
[2] "5.25:supply:20170222:box of staples"
[3] "79.81:meal:20170222:lunch with ABC Corp. clients Al, Bob, and Cy"
[4] "43.00:travel:20170222:cab back to office"
[5] "383.75:travel:20170223:flight to Boston, to visit ABC Corp."
[6] "55.00:travel:20170223:cab to ABC Corp. in Cambridge, MA"
[7] "23.25:meal:20170223:dinner at Logan Airport"
[8] "318.47:supply:20170224:paper, toner, pens, paperclips, tape"
[9] "142.12:meal:20170226:host dinner with ABC clients, Al, Bob, Cy, Dave, Ellie"
[10] "303.94:util:20170227:Peoples Gas"
[11] "121.07:util:20170227:Verizon Wireless"
[12] "7.59:supply:20170227:Python book (used)"
[13] "79.99:supply:20170227:spare 20\" monitor"
[14] "49.86:supply:20170228:Stoch Cal for Finance II"
[15] "6.53:meal:20170302:Dunkin Donuts, drive to Big Inc. near DC"
[16] "127.23:meal:20170302:dinner, Tavern64"
[17] "33.07:meal:20170303:dinner, Uncle Julio's"
[18] "86.00:travel:20170304:mileage, drive to/from Big Inc., Reston, VA"
[19] "22.00:travel:20170304:tolls"
[20] "378.81:travel:20170304:Hyatt Hotel, Reston VA, for Big Inc. meeting"
I want to read each of these into vectors named "Amount", "Category", "Date" and "Description" and create a dataframe out of them so that I have a dataset I can work with.
I tried the following:
for (i in length(txtread) ) {
data<-read.table(textConnection(txtread[[i]]))
print(data)
}
However, this doesn't seem to work.
How can I read this data into a dataframe in R?
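One way to do this (a minimal sketch, assuming no Description field contains an extra colon): the file is colon-delimited with a header row, so readr's read_delim() can parse it in one call.

library(readr)

expenses <- read_delim("expenses_copy1.txt", delim = ":",
                       col_types = cols(
                         Amount      = col_double(),
                         Category    = col_character(),
                         Date        = col_character(),
                         Description = col_character()
                       ))
# Convert the yyyymmdd strings to proper Dates
expenses$Date <- as.Date(expenses$Date, format = "%Y%m%d")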

Are there any website content monitoring packages in R?

I know there are free website content monitoring programs that send email alerts when the content of a website changes, but is there a package (or any way to hard-code this) in R? It would be helpful to integrate it into one workflow.
R is a general purpose programming language so you can do anything with it.
The core idiom for what you are trying to do is:
Identify target site
Pull content & content metadata
Cache ^^ (you need to figure this out; RDBMS tables? NoSQL tables? Files?)
Let n time-periods pass (you need to figure this out: cron? launchd? Amazon lambda?)
Pull content & content metadata
Compare ^^ against cached versions (NOTE: this works best if you know the structure of the target site vs using an overly generic framework)
If difference is "significant", notify via whatever means you want (you need to figure this out: email? SMS? Twitter?)
For content, you may not be aware that httr::GET() returns a rich, complex data object full of metadata. I did not do a str(res) below to encourage you to do so on your own.
library(httr)
library(rvest)
library(splashr)
library(hgr) # devtools::install_github("hrbrmstr/hgr")
library(tlsh) # devtools::install_github("hrbrmstr/tlsh")
target_url <- "https://www.whitehouse.gov/briefings-statements/"
Get it like a browser would:
httr::GET(
  url = target_url,
  httr::user_agent(splashr::ua_macos_safari)
) -> res
Cache the page size and use a substantial difference as a signal for notification:
(page_size <- res$headers['content-length'])
## $`content-length`
## [1] "12783"
Calculate & cache a locality-sensitive hash value; use tlsh_simple_diff() to see if there are "substantial" hash changes, and use that as a signal to notify:
doc_text <- httr::content(res, as = "text")
(doc_hash <- tlsh_simple_hash(doc_text))
## [1] "563386E33C44683E060B739261ADF20CB2D38563EE151C88A3F95169999FF97A1F385D"
This site uses structured <div>s, so cache them and use more/fewer/different ones to signal notification:
doc <- httr::content(res)
news_items <- html_nodes(doc, "div.briefing-statement__content")
(total_news_items <- length(news_items))
## [1] 10
(headlines <- gsub("[[:space:]]+", " ", html_text(news_items, trim=TRUE)))
## [1] "News Clips CNBC: “Job Openings Hit Record 7.136 Million in August” Economy & Jobs Oct 16, 2018"
## [2] "Fact Sheets Congressional Democrats Want to Take Away Your Doctor, Outlaw Your Private Insurance, and Put Bureaucrats In Charge of Your Healthcare Healthcare Oct 16, 2018"
## [3] "Remarks Remarks by President Trump in Briefing on Hurricane Michael Land & Agriculture Oct 15, 2018"
## [4] "Remarks Remarks by President Trump and Governor Scott at FEMA Aid Distribution Center | Lynn Haven, FL Land & Agriculture Oct 15, 2018"
## [5] "Remarks Remarks by President Trump During Tour of Lynn Haven Community | Lynn Haven, FL Land & Agriculture Oct 15, 2018"
## [6] "Remarks Remarks by President Trump and Governor Scott Upon Arrival in Florida Land & Agriculture Oct 15, 2018"
## [7] "Remarks Remarks by President Trump Before Marine One Departure Foreign Policy Oct 15, 2018"
## [8] "Statements & Releases White House Appoints 2018-2019 Class of White House Fellows Oct 15, 2018"
## [9] "Statements & Releases President Donald J. Trump Approves Georgia Disaster Declaration Land & Agriculture Oct 14, 2018"
## [10] "Statements & Releases President Donald J. Trump Amends Florida Disaster Declaration Land & Agriculture Oct 14, 2018"
Use a "readability" tool to turn the contents into plaintext cache & compare with one of the many "text diff/string diff" R packages:
content_meta <- hgr::just_the_facts(target_url)
str(content_meta)
## List of 11
## $ title : chr "Briefings & Statements"
## $ content : chr "<p class=\"body-overflow\"> <header class=\"header\"> </header>\n<main id=\"main-content\"> <div class=\"page-r"| __truncated__
## $ lead_image_url: chr "https://www.whitehouse.gov/wp-content/uploads/2017/12/wh.gov-share-img_03-1024x538.png"
## $ next_page_url : chr "https://www.whitehouse.gov/briefings-statements/page/2"
## $ url : chr "https://www.whitehouse.gov/briefings-statements/"
## $ domain : chr "www.whitehouse.gov"
## $ excerpt : chr "Get official White House briefings, statements, and remarks from President Donald J. Trump and members of his Administration."
## $ word_count : int 22
## $ direction : chr "ltr"
## $ total_pages : int 2
## $ pages_rendered: int 2
## - attr(*, "row.names")= int 1
## - attr(*, "class")= chr "hgr"
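A sketch of the plaintext comparison, using base R's adist() as a stand-in for a dedicated text-diff package (the cache file and the 2% threshold are assumptions):

old_text <- readRDS("cached_content.rds")  # hypothetical cache from the previous run
new_text <- content_meta$content
# Normalised edit distance between cached and current plaintext
rel_change <- adist(old_text, new_text)[1, 1] / max(nchar(old_text), nchar(new_text))
if (rel_change > 0.02) {
  message("Readable content changed; notify")
}
saveRDS(new_text, "cached_content.rds")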
Unfortunately, you asked a general purpose computing-ish question and, as such, it is likely to get closed.

How to group similar data in a column using nlp in r? [closed]

complete dataset link : https://drive.google.com/open?id=12u0Ql1z5T2lzCXRVjp75i9ke9mNYrCWv
In this dataset you can see that the General Motors entries are not counted together because they fall under different names. There are many more manufacturers like this. I want to group them together, as with General Motors. How can I group them using NLP in R?
Try this approach to achieve your goal. Your input data.frame:
Vehicle_Manufacturer<-c("GENERAL MOTORS CORP.","FORD MOTOR COMPANY","CHRYSLER CORPORATION","PACCAR INCORPORATED","MACK TRUCKS, INCORPORATED","FOREST RIVER, INC.","BLUE BIRD BODY COMPANY","DAIMLER TRUCKS NORTH AMERICA","GENERAL MOTORS LLC","HONEYWELL INTERNATIONAL, INC.","WINNEBAGO INDUSTRIES, INC.","BMW OF NORTH AMERICA, LLC","NISSAN NORTH AMERICA, INC.","NAVISTAR INTL CORP.","INTERNATIONAL TRUCK AND ENGINE","FREIGHTLINER LLC","HONDA (AMERICAN HONDA MOTOR CO.)","NEWMAR CORPORATION","NAVISTAR, INC","INTERNATIONAL TRUCK & ENGINE CORPORATION","PIERCE MANUFACTURING","GULF STREAM COACH, INC.","FLEETWOOD ENTERPRISES, INC.","FREIGHTLINER CORPORATION","DAIMLER TRUCKS NORTH AMERICA LLC","PACCAR, INCORPORATED","WHITE MOTOR CORPORATION","BAYERISCHE MOTOREN WERKE","THOMAS BUILT BUSES, INC.","DAIMLERCHRYSLER CORPORATION","VOLKSWAGEN OF AMERICA,INC","SPARTAN MOTORS, INC.","VOLVO TRUCKS NORTH AMERICA INC","TOYOTA MOTOR ENGINEERING & MANUFACTURING","PREVOST CAR, INCORPORATED","CHAMPION BUS, INC.","ALTEC INDUSTRIES INC.","SABERSPORT","MERCEDES-BENZ USA, LLC.","HARLEY-DAVIDSON MOTOR COMPANY","COOPER TIRE & RUBBER CO.","KEYSTONE RV COMPANY","SUBARU OF AMERICA, INC.","CHRYSLER (FCA US LLC)","MONACO COACH CORPORATION","CHRYSLER GROUP LLC","JAYCO, INC.","MITSUBISHI FUSO TRUCK OF AMERICA, INC.","COLLINS BUS CORPORATION","PRO-A MOTORS, INC.","NAVISTAR, INC.")
Recalls<-c(6228,5403,2787,2317,1988,1903,1898,1737,1620,1558,1353,1297,1174,1130,1055,987,985,980,955,950,925,922,918,896,835,824,818,801,797,794,749,731,724,709,694,669,641,623,616,613,599,586,582,578,578,572,569,568,559,549,511)
df<-data.frame(Vehicle_Manufacturer,Recalls)
Using the stringdist package, find similar strings among the Vehicle_Manufacturer entries, in this example using the Jaro-Winkler distance:
library(stringdist)
dist_matrix <- stringdistmatrix(as.character(df[,1]), as.character(df[,1]), method = "jw")
Find a threshold under which similar strings are grouped, like this:
thr<-quantile(dist_matrix,probs=0.025) #2.5% quantile
Find the strings to merge (in this example with a for loop, but if you have a lot of data an lapply solution is better):
to_merge <- vector("list", nrow(df))
for (i in 1:nrow(df)) {
  to_merge[[i]] <- Vehicle_Manufacturer[dist_matrix[i, ] < thr]
}
Your output will be in the to_merge list. To see only the entries with possible merges:
to_merge[sapply(to_merge, length) > 1]
[[1]]
[1] "GENERAL MOTORS CORP." "GENERAL MOTORS LLC"
[[2]]
[1] "PACCAR INCORPORATED" "PACCAR, INCORPORATED"
[[3]]
[1] "MACK TRUCKS, INCORPORATED" "PACCAR, INCORPORATED"
[[4]]
[1] "DAIMLER TRUCKS NORTH AMERICA" "DAIMLER TRUCKS NORTH AMERICA LLC"
[[5]]
[1] "GENERAL MOTORS CORP." "GENERAL MOTORS LLC"
[[6]]
[1] "NAVISTAR INTL CORP." "NAVISTAR, INC" "NAVISTAR, INC."
[[7]]
[1] "NAVISTAR INTL CORP." "NAVISTAR, INC" "NAVISTAR, INC."
[[8]]
[1] "DAIMLER TRUCKS NORTH AMERICA" "DAIMLER TRUCKS NORTH AMERICA LLC"
[[9]]
[1] "PACCAR INCORPORATED" "MACK TRUCKS, INCORPORATED" "PACCAR, INCORPORATED"
[[10]]
[1] "NAVISTAR INTL CORP." "NAVISTAR, INC" "NAVISTAR, INC."

Collapse elements separated by ""

I have raw bibliographic data as follows:
bib =
c("Bernal, Martin, \\\"Liu Shi-p\\'ei and National Essence,\\\" in Charlotte",
"Furth, ed., *The Limit of Change, Essays on Conservative Alternatives in",
"Republican China*, Cambridge: Harvard University Press, 1976.",
"", "Chen,Hsi-yuan, \"*Last Chapter Unfinished*: The Making of the *Draft Qing",
"History* and the Crisis of Traditional Chinese Historiography,\"",
"*Historiography East & West*2.2 (Sept. 2004): 173-204", "",
"Dennerline, Jerry, *Qian Mu and the World of Seven Mansions*, New Haven:",
"Yale University Press, 1988.", "")
[1] "Bernal, Martin, \\\"Liu Shi-p\\'ei and National Essence,\\\" in Charlotte"
[2] "Furth, ed., *The Limit of Change, Essays on Conservative Alternatives in"
[3] "Republican China*, Cambridge: Harvard University Press, 1976."
[4] ""
[5] "Chen,Hsi-yuan, \"*Last Chapter Unfinished*: The Making of the *Draft Qing"
[6] "History* and the Crisis of Traditional Chinese Historiography,\""
[7] "*Historiography East & West*2.2 (Sept. 2004): 173-204"
[8] ""
[9] "Dennerline, Jerry, *Qian Mu and the World of Seven Mansions*, New Haven:"
[10] "Yale University Press, 1988."
[11] ""
I would like to collapse the elements between the ""s into one line so that:
clean_bib[1]=paste(bib[1], bib[2], bib[3])
clean_bib[2]=paste(bib[5], bib[6], bib[7])
clean_bib[3]=paste(bib[9], bib[10])
[1] "Bernal, Martin, \\\"Liu Shi-p\\'ei and National Essence,\\\" in Charlotte Furth, ed., *The Limit of Change, Essays on Conservative Alternatives in Republican China*, Cambridge: Harvard University Press, 1976."
[2] "Chen,Hsi-yuan, \"*Last Chapter Unfinished*: The Making of the *Draft Qing History* and the Crisis of Traditional Chinese Historiography,\" *Historiography East & West*2.2 (Sept. 2004): 173-204"
[3] "Dennerline, Jerry, *Qian Mu and the World of Seven Mansions*, New Haven: Yale University Press, 1988."
Is there a one-liner that does this automatically?
You can use tapply, grouping with cumsum on the "" elements, then paste the groups together:
unname(tapply(bib,cumsum(bib==""),paste,collapse=" "))
[1] "Bernal, Martin, \\\"Liu Shi-p\\'ei and National Essence,\\\" in Charlotte Furth, ed., *The Limit of Change, Essays on Conservative Alternatives in Republican China*, Cambridge: Harvard University Press, 1976."
[2] " Chen,Hsi-yuan, \"*Last Chapter Unfinished*: The Making of the *Draft Qing History* and the Crisis of Traditional Chinese Historiography,\" *Historiography East & West*2.2 (Sept. 2004): 173-204"
[3] " Dennerline, Jerry, *Qian Mu and the World of Seven Mansions*, New Haven: Yale University Press, 1988."
[4] ""
You can also do:
unname(c(by(bib,cumsum(bib==""),paste,collapse=" ")))
or
unname(tapply(bib,cumsum(grepl("^$",bib)),paste,collapse=" "))
etc.
Similar to the other answer, this uses split and sapply. The second line just removes any elements that contain only "":
vec <- unname(sapply(split(bib, f = cumsum(bib %in% "")), paste0, collapse = " "))
vec[!vec %in% ""]
