Transform a large list into a tibble with one column containing all elements [duplicate]

This question already has answers here:
Convert a list to a data frame
I'm sorry for this question because it seems quite obvious, but I can't come up with a solution myself. I have a large list of 130 elements, each of which is itself a list of 10 character strings.
I want to have this as a combined tibble with one column containing all strings.
If I try do.call(dplyr::bind_rows, y) on my list, I still get an error: Error: Argument 1 must have names
To give more insight into the list, here is the dput() output for the first sublist:
dput(bribe.test[1])
list(list("\r\n Supercharge your R/C vehicle and also this systems will boost horsepower and performance of any RC nitro engines, visit us to get online xtm racing, xtm racing rail, xtm racing engine, xtm xt2 engine, and xtm nitro engine. Visit # https://rbinnovations.com/collections/super-chargers/xtm-racing\r\n ",
"\r\n The Powermatic 2+ or Powermatic 2 Plus Electric Cigarette Rolling Machine uses an electric spoon-fed cigarette injector that will make king size or 100's cigarettes in a few seconds and you can buy it online with us at Hard Working Products. Visit https://hardworkingproducts.com/powermatic-2\r\n ",
"\r\n Hello sir, My uncle just coming india yesterday night at ahmedabad airport from New Zealand. And i gave him 2 iphone , iphone 8 plus and iphone 11 pro.. and they called by custom department. The officer told him that they are not allowed with these phone. They force him to pay 42,000/- custom duty for these phone. He just arrived that's why they haven't got money at that time. But his son gave him 600 nzd for his expenses. And these bloody corrupt office force him to pay 600$. They felt helpless at that time and gave 600$ with the passport.My uncle dont know his name. You can check cameras if you want, he was at counter around 1:00 o'clock at night. It is bloody bad experience with them. I'm going to tell my friends and all the relatives which are here to not go india ever..\nI'm felling helpless to come my home country. If you can then take strict actions against these bloody corrupt officers who are cheating with our nation. Please take strict action. Hope you can save our nation from this corrupt officers\nSingapor airlines \nSQ530 arrived at 21:50 evening on 6/1/20\nThank you\r\n ",
"\r\n Date of the incident: 29th December 2019\nTime of incident: Around 8 PM in the evening\nPlace of incident: ECR road, Pondicherry to Tamil Nadu check post.\nWhile driving back from Pondicherry to our stay near ECR road, we (4 people in the car) took 8 beer cans of 500 ml each. At the checkpost (just 100mtrs before our lodge) police stopped us, started checking the vehicle. We voluntarily declared the beer quantity and handed over to them.\nThey asked us to pay Rs 4200 and go else, they will create a case on us and arrest us, seize the vehicle. Since we took the vehicle from self drive agency, we really wanted come out of this. We apologise to them as we weren't aware of the border lines between the states. Requested them to dispose the beers and let us go. My 5 year old daughter was crying seeing the officers are not allowing me to leave. Nothing was fruitful and we literally beg them to leave us. Language was a big barrier as we don't know tamil and none of the officers understand English/hindi properly. Somehow a communication happened and I had to show them the account balance online as I didn't have that much cash with me. Finally, the officer agreed to leave us with a cost of Rs 500 and 4 beer cans.\nWe noticed at the same time, 4 college students from chennai were also got caught with a bag full of Liquors. The officer was very casual to them and also denied money from them even though they offered him 200 rupees. They may be from families where the indian law does not get applied easily. I understand that.\nWe can't speak tamil or pondi language.. Is this what you are angry on us? Is this what you discriminate us? Don't you ruin the future of your own students in the name of partiality??\r\n ",
"\r\n Dear Sir,\nThis is not the first time I am facing this issue with Rohit Gas Agency. I tried to bring it to the notice of Indane. Its of no use. Rohit Gas Agency provides worst service. We do not have option. To Deliver the Cylinder, the Delivery boy demands Rs 50 everytime. This is a common issue. If not paid he shouts badly on road and moves out. Rohit Gas Agency is always unreachable. These bugs working in the Gas Agency are eating up the money paid by Gas Subscribers. \nMany a times, the cylinder is not delivered to home. We are forced to collect the Cylinder paying additional bribe of Rs 50 near Godown. If not paid, we need to lift the cylinder and carry the same back till the car parking and drive back home. \nThe Gas Delivery - Rohit Gas Agency is unfit to manage the delivery business. Please look into the complaints and reviews on google atleast. \nRegards\nPrashanth .P\r\n ",
"\r\n I paid bribe today to a police officer who came for passport verification of my mother. Even after providing all supporting documents and required information, officer asked to pay 500Rs for Chai Pani. When I asked to reduce the amount, officer said that it is decided by higher officials of police. \nI feel very bad after paying, this practice is so common in UP. Please take necessary actions against this to prevent civilians from such corrupt people. \nOfficer Name - Indrapal Singh\nThana - New Agra Police Station\nDate - 6th Jan 2020\nPlace - Agra\r\n ",
"\r\n I have asked to pay bribe to avoid huge penalty for putting tent sheet on car windows. Police asked me to pay 1100 rs fine or pay bribe instead of that. Since I don't had that much money and I was in urgency, I paid bribe to escape from the situation. This was happened at corporation circle church opposite to church at 12 30 PM. \r\n ",
"\r\n Help desk officer prashant who are trapping people to make work done by giving bribes to higher officials at malakpet rto malakpet Hyderabad \r\n ",
"\r\n Get free shipping when you buy the Revolution the great american electric cigarette machine, within the continental US from https://hardworkingproducts.com/revolution-electric-cigarette-machine-made-in-america and also you will get this machine at best market price in USA.\r\n ",
"\r\n I Would like to Inform you that a lot of corruption is going on in the DC Office Bangalore Urban Dept. I am not paid bribe directly there is lot more agents have to collect the money and some one has do the deel not direct deel with D C Officer. Brib agents collecting the money and send it to direct DC officer house. The Officer have a one more home office in Kumarakrupa road bangalore. the deeling files as going their for officer signature. One agent is doing his job in that office his name called Mahendre Kumar (Shift car No.KA 04 MK 282) Please do the action for this. Govt officers also been included in this deels and they get commission also.\nNames Sadanada Swamy , Basavaraju, G N Shivamurthy. \r\n "))

You could use unlist() with tibble(); unlist() recursively flattens the nested list into a single character vector:
df_tib <- tibble::tibble(col = unlist(bribe.test))
Or, in base R, data.frame():
df1 <- data.frame(col = unlist(bribe.test), stringsAsFactors = FALSE)
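A minimal, self-contained sketch of the same idea, assuming a small hypothetical stand-in (bribe_demo) for bribe.test, since the real 130 x 10 list isn't reproduced here. It uses only base R, so it runs without tibble. The bind_rows() error is presumably because each unnamed sublist is treated as a row that needs column names; unlist() sidesteps that by never building rows from the sublists. The trimws() step is an optional extra for the "\r\n" padding visible in the dput() output.

```r
# Hypothetical stand-in for bribe.test: 3 unnamed sublists of 2 strings each
# (the real object has 130 sublists of 10 strings).
bribe_demo <- list(
  list("\r\n first text\r\n ", "\r\n second text\r\n "),
  list("\r\n third text\r\n ", "\r\n fourth text\r\n "),
  list("\r\n fifth text\r\n ", "\r\n sixth text\r\n ")
)

# unlist() flattens the nesting recursively into one character vector,
# so 3 x 2 strings become a single vector of length 6.
all_strings <- unlist(bribe_demo)

# One column, one row per string (base R equivalent of the tibble answer).
df_demo <- data.frame(col = all_strings, stringsAsFactors = FALSE)

# Optional clean-up: trimws() strips the leading and trailing "\r\n " padding
# (its default whitespace class is [ \t\r\n]).
df_demo$col <- trimws(df_demo$col)

str(df_demo)
```

With the real data, the same call gives 130 * 10 = 1300 rows, one per string.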

Related

Wordcloud2 - separate words for counting

I am trying to extract the words so that I can create a word cloud, but I'm having some difficulties.
This is the code:
library(readxl)
data <- read_excel("C:\\Users\\me\\OneDrive\\Desktop\\ToPandas.xlsx")
data2 <-data$articlesDescription
#install.packages("wordcloud2")
#install.packages("tidyverse")
#install.packages("tidytext")
library(wordcloud2)
library(tidyverse)
library(tidytext)
data2 <- gsub('[^[:alnum:] ]', '', data2)
data2 <- data2 %>%
  ungroup()
data3.df <- as.data.frame(data2)
data3 <- data3.df
data3 <- data3 %>%
  anti_join(get_stopwords()) %>%
  unnest_tokens(word, text) %>%
  count(word, sort = TRUE)
I put hash signs in front of the install.packages() lines so that R does not try to reinstall the packages.
Everything runs up to the gsub() on data2, but as soon as I call ungroup() I get this error:
Error in UseMethod("ungroup") :
  no applicable method for 'ungroup' applied to an object of class "character"
Then, as the script continues, I get this:
Error in anti_join():
! by must be supplied when x and y have no common variables.
ℹ use by = character()` to perform a cross-join.
I think the second error stems from the first one (ungroup), but I can't figure out how to fix it so that I can count the words.
This is a sample of what the imported xlsx file looks like:
[Image: screenshot of the imported ToPandas.xlsx]
Can anyone point me in the right direction?
Thanks :)
EDIT 1: adding data from the JSON file (I had to remove a row because it was over 3,000 characters beyond the limit):
{\"articlesName\":\"Texas threatens to become next flash point on voting rules\",\"articlesShortDescription\":\"Texas appeared on Thursday to become the next flash point on politically charged issues in Corporate America after legislation passed by the state Senate to limit voting access prompted a rebuke from American Airlines.\",\"articlesDescription\":\"[{\\u0022type\\u0022:\\u0022paragraph\\u0022,\\u0022content\\u0022:\\u0022WASHINGTON Texas appeared on Thursday to become the next flash point on politically charged issues in Corporate America after legislation passed by the state Senate to limit voting access prompted a rebuke from American Airlines.\\u0022},{\\u0022type\\u0022:\\u0022paragraph\\u0022,\\u0022content\\u0022:\\u0022\\u201cWe are strongly opposed to this bill and others like it,\\u201d Fort Worth, Texas-based American said in a statement.\\u0022},{\\u0022type\\u0022:\\u0022paragraph\\u0022,\\u0022content\\u0022:\\u0022The legislation, which is now set to go before the Texas House of Representatives, would eliminate drive-through voting, limit polling site hours and give partisan poll watchers more autonomy.\\u0022},{\\u0022type\\u0022:\\u0022paragraph\\u0022,\\u0022content\\u0022:\\u0022Southwest Airlines, also based in Texas, declined to say if it opposed the legislation but said: \\u201cWe believe every voter should have a fair opportunity to let their voice be heard. This right is essential to our nation\\u2019s success.\\u201d\\u0022},{\\u0022type\\u0022:\\u0022paragraph\\u0022,\\u0022content\\u0022:\\u0022The Texas effort drew sharp criticism from voting rights advocates and Democrats in the state, who argue that the legislation would make it more difficult for Texans, particularly those of color, to cast ballots.\\u0022},{\\u0022type\\u0022:\\u0022paragraph\\u0022,\\u0022content\\u0022:\\u0022The state already has some of the most stringent voting laws in the country, according to election experts. 
A state House of Representatives committee on Thursday was holding a hearing on a companion bill that would impose other voting restrictions.\\u0022},{\\u0022type\\u0022:\\u0022paragraph\\u0022,\\u0022content\\u0022:\\u0022Texas is one of several states, including Georgia, Florida, Arizona and Iowa, where Republican lawmakers have pursued new voting limits after former President Donald Trump falsely blamed his November loss on widespread voter fraud despite no evidence.\\u0022},{\\u0022type\\u0022:\\u0022paragraph\\u0022,\\u0022content\\u0022:\\u0022Republican lawmakers say the law is needed to ensure public confidence in election integrity.\\u0022},{\\u0022type\\u0022:\\u0022paragraph\\u0022,\\u0022content\\u0022:\\u0022The comments by American and Southwest came after Atlanta-based Delta Air Lines and Coca-Cola on Wednesday joined a growing number of companies that challenged the state of Georgia\\u2019s new voting restrictions.\\u0022},{\\u0022type\\u0022:\\u0022paragraph\\u0022,\\u0022content\\u0022:\\u0022Delta CEO Ed Bastian blasted the law on Wednesday in a reversal from an initial statement last week that sparked a popular backlash.\\u0022},{\\u0022type\\u0022:\\u0022paragraph\\u0022,\\u0022content\\u0022:\\u0022But his new stance drew condemnation from Georgia\\u2019s Republican Governor Brian Kemp and many Republicans, including Senator Marco Rubio who questioned why Delta criticized Georgia but not China.\\u0022},{\\u0022type\\u0022:\\u0022paragraph\\u0022,\\u0022content\\u0022:\\u0022\\u201cFar too many multinational corporations are too eager to make their voices heard on the woke issues of the day in the United States, but remain stunningly silent, or in Delta\\u2019s case, complicit, in real, ongoing atrocities in countries like China.\\u201d Rubio wrote.\\u0022},{\\u0022type\\u0022:\\u0022paragraph\\u0022,\\u0022content\\u0022:\\u0022Delta did not immediately comment on Rubio\\u2019s 
letter.\\u0022},{\\u0022type\\u0022:\\u0022paragraph\\u0022,\\u0022content\\u0022:\\u0022The Georgia House late Wednesday voted to repeal a jet fuel sales tax break that Delta uses but the state Senate did not act on it before the legislative session adjourned.\\u0022},{\\u0022type\\u0022:\\u0022paragraph\\u0022,\\u0022content\\u0022:\\u0022Kemp told Fox Business he thought the tax issue was \\u201cmoot\\u201d now that the legislature had adjourned.\\u0022}]\",\"minutesToRead\":3,\"primaryAssetType\":0,\"wordCount\":null,\"urlSupplier\":\"https:\\/\\/www.reuters.com\\/article\\/us-usa-election-texas\\/texas-threatens-to-become-next-flash-point-on-voting-rules-idUSKBN2BO6SI\",\"canonicalSupplier\":\"https:\\/\\/www.reuters.com\\/article\\/us-usa-election-texas-idUSKBN2BO6SI\",\"publishedAt\":{\"date\":\"2021-04-01 21:55:32.000000\",\"timezone_type\":3,\"timezone\":\"UTC\"},\"dateModified\":{\"date\":\"2021-04-02 05:15:53.000000\",\"timezone_type\":3,\"timezone\":\"UTC\"},\"files\":[{\"filesName\":null,\"filesTitle\":null,\"filesDescription\":\"Presidio County election judge Lauren Martinez folds a booth after polls and voting ended for the 2020 U.S. presidential election in Marfa, Texas, U.S., November 3, 2020. 
REUTERS\\/Adrees Latif\",\"contentType\":\"image\\/jpeg\",\"urlCdn\":\"https:\\/\\/static.reuters.com\\/resources\\/r\\/?m=02\\u0026d=20210402\\u0026t=2\\u0026i=1557110638\\u0026r=LYNXMPEH303NA\"}],\"videos\":[],\"tags\":[{\"name\":\"United States\",\"slug\":\"united-states\"},{\"name\":\"Company News\",\"slug\":\"company-news\"},{\"name\":\"Reuters Top News\",\"slug\":\"reuters-top-news\"},{\"name\":\"Government \\/ Politics\",\"slug\":\"government-politics\"},{\"name\":\"Fundamental Rights \\/ Civil Liberties\",\"slug\":\"fundamental-rights-civil-liberties\"},{\"name\":\"Lawmaking\",\"slug\":\"lawmaking\"},{\"name\":\"Airlines (TRBC level 4)\",\"slug\":\"airlines-trbc-level-4\"},{\"name\":\"Elections \\/ Voting\",\"slug\":\"elections-voting\"},{\"name\":\"Regional Airlines (TRBC level 5)\",\"slug\":\"regional-airlines-trbc-level-5\"},{\"name\":\"Texas\",\"slug\":\"texas\"},{\"name\":\"Georgia (US State)\",\"slug\":\"georgia-us-state\"},{\"name\":\"US House of Representatives\",\"slug\":\"us-house-of-representatives\"}],\"keywords\":[{\"keywordName\":\"United States\",\"keywordSlug\":\"united-states\"},{\"keywordName\":\"US\",\"keywordSlug\":\"us\"},{\"keywordName\":\"Company News\",\"keywordSlug\":\"company-news\"},{\"keywordName\":\"Reuters Top News\",\"keywordSlug\":\"reuters-top-news\"},{\"keywordName\":\"Government \\/ Politics\",\"keywordSlug\":\"government-politics\"},{\"keywordName\":\"Fundamental Rights \\/ Civil Liberties\",\"keywordSlug\":\"fundamental-rights-civil-liberties\"},{\"keywordName\":\"Lawmaking\",\"keywordSlug\":\"lawmaking\"},{\"keywordName\":\"USA\",\"keywordSlug\":\"usa\"},{\"keywordName\":\"Airlines (TRBC level 4)\",\"keywordSlug\":\"airlines-trbc-level-4\"},{\"keywordName\":\"Elections \\/ Voting\",\"keywordSlug\":\"elections-voting\"},{\"keywordName\":\"Regional Airlines (TRBC level 
5)\",\"keywordSlug\":\"regional-airlines-trbc-level-5\"},{\"keywordName\":\"Texas\",\"keywordSlug\":\"texas\"},{\"keywordName\":\"ELECTION\",\"keywordSlug\":\"election\"},{\"keywordName\":\"Georgia (US State)\",\"keywordSlug\":\"georgia-us-state\"},{\"keywordName\":\"US House of Representatives\",\"keywordSlug\":\"us-house-of-representatives\"}],\"n2\":[],\"authors\":[{\"authorName\":\"Tracy Rucinski\"},{\"authorName\":\"David Shepardson\"},{\"authorName\":\"Joseph Ax\"}]},{\"articlesName\":\"U.S. Supreme Court permits FCC to loosen media ownership rules\",\"articlesShortDescription\":\"The U.S. Supreme Court on Thursday allowed the Federal Communication Commission to loosen local media ownership restrictions, handing a victory to broadcasters in a ruling that could facilitate industry consolidation as consumers increasingly move online.\",\"articlesDescription\":\"[{\\u0022type\\u0022:\\u0022paragraph\\u0022,\\u0022content\\u0022:\\u0022WASHINGTON (Reuters) -The U.S. Supreme Court on Thursday allowed the Federal Communication Commission to loosen local media ownership restrictions, handing a victory to broadcasters in a ruling that could facilitate industry consolidation as consumers increasingly move online.\\u0022},{\\u0022type\\u0022:\\u0022paragraph\\u0022,\\u0022content\\u0022:\\u0022In a 9-0 ruling authored by Justice Brett Kavanaugh, the justices overturned a lower court decision that had blocked the FCC\\u2019s repeal of some media ownership regulations in 2017 for failing to consider the effects on ownership by racial minorities and women. 
Critics of the industry have said further consolidation could limit media choices for consumers.\\u0022},{\\u0022type\\u0022:\\u0022paragraph\\u0022,\\u0022content\\u0022:\\u0022The justices acted in appeals by the FCC, companies including News Corp, Fox Corp and Sinclair Broadcast Group Inc and the National Association of Broadcasters.\\u0022},{\\u0022type\\u0022:\\u0022paragraph\\u0022,\\u0022content\\u0022:\\u0022The associations for other broadcast networks\\u2019 local affiliates, including ABC, NBC and CBS, backed the appeals, arguing that consolidation would help ensure the economic survival of local television amid heavy competition from internet companies that provide video content. Broadcast television stations have said they are increasingly losing advertising dollars to digital platforms.\\u0022},{\\u0022type\\u0022:\\u0022paragraph\\u0022,\\u0022content\\u0022:\\u0022In 2017, the FCC - then led by Republicans during former President Donald Trump\\u2019s administration - voted to eliminate a ban in place since 1975 on cross-ownership of a newspaper and TV station in a major market. It also voted to make it easier for media companies to buy additional TV stations in the same market, and for companies to buy additional radio stations in some markets.\\u0022},{\\u0022type\\u0022:\\u0022paragraph\\u0022,\\u0022content\\u0022:\\u0022The FCC, now equally divided between Democrats and Republicans, is led by acting chairwoman Jessica Rosenworcel, a Democrat, who voted against the 2017 decision. The agency is set to have a Democratic majority once President Joe Biden nominates and the Senate confirms a new commissioner. 
The FCC could then seek to reverse the 2017 order.\\u0022},{\\u0022type\\u0022:\\u0022paragraph\\u0022,\\u0022content\\u0022:\\u0022Rosenworcel did not immediately respond to a request for comment after the ruling.\\u0022},{\\u0022type\\u0022:\\u0022paragraph\\u0022,\\u0022content\\u0022:\\u0022Writing for the unanimous court, Kavanaugh said that the FCC reasonably reviewed the ownership rules to find that repealing or modifying them \\u201cwas not likely to harm minority and female ownership.\\u201d\\u0022},{\\u0022type\\u0022:\\u0022paragraph\\u0022,\\u0022content\\u0022:\\u0022Kavanaugh added: \\u201cThe FCC reasoned that the historical justifications for those ownership rules no longer apply in today\\u2019s media market, and that permitting efficient combinations among radio stations, television stations and newspapers would benefit consumers.\\u201d\\u0022},{\\u0022type\\u0022:\\u0022paragraph\\u0022,\\u0022content\\u0022:\\u0022The case highlighted diverging views on the best way to ensure a competitive environment that promotes a broad range of local news and information. Critics of the FCC\\u2019s action have said relaxing ownership rules could jeopardize a wider array of sources at the local level.\\u0022},{\\u0022type\\u0022:\\u0022paragraph\\u0022,\\u0022content\\u0022:\\u0022The Philadelphia-based 3rd U.S. Circuit Court of Appeals had thwarted the FCC\\u2019s efforts to revise the rules since 2003 in a series of decisions. The new rules were challenged by a number of community advocacy groups led by the Prometheus Radio Project. 
The 3rd Circuit in 2019 blocked the new rules.\\u0022},{\\u0022type\\u0022:\\u0022paragraph\\u0022,\\u0022content\\u0022:\\u0022Former FCC Commissioner Mike O\\u2019Rielly, a Republican who voted for the 2017 order, said he expects there will be some \\u201cstrategic deals\\u201d to consolidate in which a local newspaper could be acquired, but that \\u201cno massive deals\\u201d are going to happen given the struggling local media sector.\\u0022},{\\u0022type\\u0022:\\u0022paragraph\\u0022,\\u0022content\\u0022:\\u0022Cheryl Leanza, a lawyer for the plaintiffs who challenged the 2017 FCC decision, said that \\u201cthe good news is the Biden FCC, once it gains a working majority, can quickly get to work building a solid record to promote the public interest standard and media ownership diversity.\\u201d\\u0022},{\\u0022type\\u0022:\\u0022paragraph\\u0022,\\u0022content\\u0022:\\u0022Advocacy group Free Press said the Biden FCC and Congress \\u201cmust recognize that hedge-fund and Wall Street-driven consolidation harms local communities, and only decimates what\\u2019s left of competition and diversity. ... The silver lining here is (the court) deferred to the agency\\u2019s judgment and left room for a new commission to get this right.\\u201d\\u0022},{\\u0022type\\u0022:\\u0022paragraph\\u0022,\\u0022content\\u0022:\\u0022David Chavern, CEO of the News Media Alliance group that represents more than 2,000 news organizations, hailed the ruling and said the previous restrictions \\u201chad shackled the newspaper industry for far too long. 
The repeal of the ban will generate much needed investments and cross-platform synergies that will help sustain local news media.\\u201d\\u0022}]\",\"minutesToRead\":5,\"primaryAssetType\":0,\"wordCount\":null,\"urlSupplier\":\"https:\\/\\/www.reuters.com\\/article\\/us-usa-court-fcc\\/u-s-supreme-court-permits-fcc-to-loosen-media-ownership-rules-idUSKBN2BO5S4\",\"canonicalSupplier\":\"https:\\/\\/www.reuters.com\\/article\\/us-usa-court-fcc-idUSKBN2BO5S4\",\"publishedAt\":{\"date\":\"2021-04-01 14:25:10.000000\",\"timezone_type\":3,\"timezone\":\"UTC\"},\"dateModified\":{\"date\":\"2021-04-01 16:45:47.000000\",\"timezone_type\":3,\"timezone\":\"UTC\"},\"files\":[{\"filesName\":null,\"filesTitle\":null,\"filesDescription\":\"The Supreme Court is seen in Washington, U.S., December 11, 2020. REUTERS\\/Joshua Roberts\\/File Photo\",\"contentType\":\"image\\/jpeg\",\"urlCdn\":\"https:\\/\\/static.reuters.com\\/resources\\/r\\/?m=02\\u0026d=20210401\\u0026t=2\\u0026i=1557040208\\u0026r=LYNXMPEH3030U\"}],\"videos\":[],\"tags\":[{\"name\":\"United States\",\"slug\":\"united-states\"},{\"name\":\"Corporate Events\",\"slug\":\"corporate-events\"},{\"name\":\"Company News\",\"slug\":\"company-news\"},{\"name\":\"Financials (Legacy)\",\"slug\":\"financials-legacy\"},{\"name\":\"Financials (TRBC level 1)\",\"slug\":\"financials-trbc-level-1\"},{\"name\":\"Arts \\/ Culture \\/ Entertainment\",\"slug\":\"arts-culture-entertainment\"},{\"name\":\"Reuters Top News\",\"slug\":\"reuters-top-news\"},{\"name\":\"Major News\",\"slug\":\"major-news\"},{\"name\":\"Cyclical Consumer Services (TRBC level 2)\",\"slug\":\"cyclical-consumer-services-trbc-level-2\"},{\"name\":\"Consumer Cyclicals (TRBC level 1)\",\"slug\":\"consumer-cyclicals-trbc-level-1\"},{\"name\":\"Technology (TRBC level 1)\",\"slug\":\"technology-trbc-level-1\"},{\"name\":\"Technology \\/ Media \\/ Telecoms\",\"slug\":\"technology-media-telecoms\"},{\"name\":\"General 
News\",\"slug\":\"general-news\"},{\"name\":\"Government \\/ Politics\",\"slug\":\"government-politics\"},{\"name\":\"Media \\u0026 Publishing (TRBC level 3)\",\"slug\":\"media-publishing-trbc-level-3\"},{\"name\":\"Media \\/ Publishing (Legacy)\",\"slug\":\"media-publishing-legacy\"},{\"name\":\"Society \\/ Social Issues\",\"slug\":\"society-social-issues\"},{\"name\":\"Crime \\/ Law \\/ Justice\",\"slug\":\"crime-law-justice\"},{\"name\":\"Corporate \\/ Market Regulation\",\"slug\":\"corporate-market-regulation\"},{\"name\":\"Judicial Process \\/ Court Cases \\/ Court Decisions\",\"slug\":\"judicial-process-court-cases-court-decisions\"},{\"name\":\"Broadcasting (TRBC level 4)\",\"slug\":\"broadcasting-trbc-level-4\"},{\"name\":\"US Government News\",\"slug\":\"us-government-news\"},{\"name\":\"Consumer Publishing (TRBC level 4)\",\"slug\":\"consumer-publishing-trbc-level-4\"},{\"name\":\"Broadcasting (NEC) (TRBC level 5)\",\"slug\":\"broadcasting-nec-trbc-level-5\"},{\"name\":\"Advertising \\u0026 Marketing (TRBC level 4)\",\"slug\":\"advertising-marketing-trbc-level-4\"},{\"name\":\"Women\\u0027s Issues\",\"slug\":\"womens-issues\"},{\"name\":\"Race Relations \\/ Ethnic Issues\",\"slug\":\"race-relations-ethnic-issues\"},{\"name\":\"Newspaper Publishing (TRBC level 5)\",\"slug\":\"newspaper-publishing-trbc-level-5\"},{\"name\":\"Television Broadcasting (TRBC level 5)\",\"slug\":\"television-broadcasting-trbc-level-5\"},{\"name\":\"US Supreme Court\",\"slug\":\"us-supreme-court\"}],\"keywords\":[{\"keywordName\":\"United States\",\"keywordSlug\":\"united-states\"},{\"keywordName\":\"US\",\"keywordSlug\":\"us\"},{\"keywordName\":\"Corporate Events\",\"keywordSlug\":\"corporate-events\"},{\"keywordName\":\"Company News\",\"keywordSlug\":\"company-news\"},{\"keywordName\":\"Financials (Legacy)\",\"keywordSlug\":\"financials-legacy\"},{\"keywordName\":\"Financials (TRBC level 1)\",\"keywordSlug\":\"financials-trbc-level-1\"},{\"keywordName\":\"Arts \\/ Culture \\/ 
Entertainment\",\"keywordSlug\":\"arts-culture-entertainment\"},{\"keywordName\":\"Reuters Top News\",\"keywordSlug\":\"reuters-top-news\"},{\"keywordName\":\"Major News\",\"keywordSlug\":\"major-news\"},{\"keywordName\":\"Cyclical Consumer Services (TRBC level 2)\",\"keywordSlug\":\"cyclical-consumer-services-trbc-level-2\"},{\"keywordName\":\"Consumer Cyclicals (TRBC level 1)\",\"keywordSlug\":\"consumer-cyclicals-trbc-level-1\"},{\"keywordName\":\"Technology (TRBC level 1)\",\"keywordSlug\":\"technology-trbc-level-1\"},{\"keywordName\":\"Technology \\/ Media \\/ Telecoms\",\"keywordSlug\":\"technology-media-telecoms\"},{\"keywordName\":\"General News\",\"keywordSlug\":\"general-news\"},{\"keywordName\":\"Government \\/ Politics\",\"keywordSlug\":\"government-politics\"},{\"keywordName\":\"Media \\u0026 Publishing (TRBC level 3)\",\"keywordSlug\":\"media-publishing-trbc-level-3\"},{\"keywordName\":\"Media \\/ Publishing (Legacy)\",\"keywordSlug\":\"media-publishing-legacy\"},{\"keywordName\":\"Society \\/ Social Issues\",\"keywordSlug\":\"society-social-issues\"},{\"keywordName\":\"Crime \\/ Law \\/ Justice\",\"keywordSlug\":\"crime-law-justice\"},{\"keywordName\":\"Corporate \\/ Market Regulation\",\"keywordSlug\":\"corporate-market-regulation\"},{\"keywordName\":\"Judicial Process \\/ Court Cases \\/ Court Decisions\",\"keywordSlug\":\"judicial-process-court-cases-court-decisions\"},{\"keywordName\":\"Broadcasting (TRBC level 4)\",\"keywordSlug\":\"broadcasting-trbc-level-4\"},{\"keywordName\":\"USA\",\"keywordSlug\":\"usa\"},{\"keywordName\":\"COURT\",\"keywordSlug\":\"court\"},{\"keywordName\":\"US Government News\",\"keywordSlug\":\"us-government-news\"},{\"keywordName\":\"Consumer Publishing (TRBC level 4)\",\"keywordSlug\":\"consumer-publishing-trbc-level-4\"},{\"keywordName\":\"Broadcasting (NEC) (TRBC level 5)\",\"keywordSlug\":\"broadcasting-nec-trbc-level-5\"},{\"keywordName\":\"Advertising \\u0026 Marketing (TRBC level 
4)\",\"keywordSlug\":\"advertising-marketing-trbc-level-4\"},{\"keywordName\":\"Women\\u0027s Issues\",\"keywordSlug\":\"womens-issues\"},{\"keywordName\":\"Race Relations \\/ Ethnic Issues\",\"keywordSlug\":\"race-relations-ethnic-issues\"},{\"keywordName\":\"Newspaper Publishing (TRBC level 5)\",\"keywordSlug\":\"newspaper-publishing-trbc-level-5\"},{\"keywordName\":\"Television Broadcasting (TRBC level 5)\",\"keywordSlug\":\"television-broadcasting-trbc-level-5\"},{\"keywordName\":\"US Supreme Court\",\"keywordSlug\":\"us-supreme-court\"},{\"keywordName\":\"FCC\",\"keywordSlug\":\"fcc\"}],\"n2\":[],\"authors\":[{\"authorName\":\"Andrew Chung\"},{\"authorName\":\"David Shepardson\"}]}
Maybe this will be enough to get you started:
test <- data.frame(Text = rep("The quick brown fox jumped over the lazy dog's back.", 5))
Now split out the words:
test.lst <- strsplit(test$Text, " ")
test.lst[[1]]
# [1] "The" "quick" "brown" "fox" "jumped" "over" "the" "lazy" "dog's" "back."
Get rid of the punctuation:
test.lst2 <- lapply(test.lst, function(x) gsub("[[:punct:]]", "", x))
test.lst2[[1]]
# [1] "The" "quick" "brown" "fox" "jumped" "over" "the" "lazy" "dogs" "back"
test.lst2 is a list with one element per row of the data. To collapse it into a single vector with unlist() and get word frequencies:
table(unlist(test.lst2))
#   back  brown   dogs    fox jumped   lazy   over  quick    the    The
#      5      5      5      5      5      5      5      5      5      5
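As for the two errors in the question: ungroup() is a dplyr verb for (grouped) data frames, but data2 is a plain character vector at that point. Then as.data.frame(data2) produces a column named data2, not text, and anti_join(get_stopwords()) runs before any word column exists, so there is nothing to join on. The usual tidytext order is to put the text in a column (e.g. tibble(text = data2)), call unnest_tokens(word, text) first, and only then anti_join(get_stopwords(), by = "word") and count(word, sort = TRUE). The sketch below does the same steps in base R, extending the answer above. The sample texts and the tiny stopword vector are hypothetical stand-ins (get_stopwords() returns a much longer list), and the word/freq column names are chosen because, as far as I know, that is the shape wordcloud2() expects.

```r
# Hypothetical sample standing in for data$articlesDescription
texts <- c("The quick brown fox jumped over the lazy dog's back.",
           "The lazy dog slept.")

# Hypothetical, deliberately tiny stopword list
stop_words_demo <- c("the", "over")

# 1. Lowercase (so "The" and "the" merge) and drop punctuation.
clean <- gsub("[^[:alnum:] ]", "", tolower(texts))

# 2. Split on runs of whitespace and flatten into one vector of words.
words <- unlist(strsplit(clean, "\\s+"))
words <- words[words != ""]

# 3. Remove stopwords (the job anti_join() does in the tidytext pipeline).
words <- words[!words %in% stop_words_demo]

# 4. Count, sort by frequency, and shape as a word/freq data frame.
freq <- sort(table(words), decreasing = TRUE)
word_counts <- data.frame(word = names(freq), freq = as.integer(freq),
                          stringsAsFactors = FALSE)
word_counts
```

Here "lazy" comes out on top with a count of 2; every other remaining word appears once.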

how to split a piece text by a word in R?( break the text after a specific word)

I need to split PDF files into their chapters. In each PDF, I added the word "Hirfar" at the beginning of every chapter as a marker on which to split the text. Consider the following example:
t <- c(" Hirfar Mark Zuckerberg has hit back at the testimony of the Facebook whistleblower Frances Haugen, saying her claims the company puts profit over people’s safety are “just not true”.
Hirfar In a blogpost, the Facebook founder and chief executive addressed one of the most damaging statements in Haugen’s opening speech to US senators on Tuesday, that Facebook puts “astronomical profits before people”.
Hirfar “At the heart of these accusations is this idea that we prioritise profit over safety and wellbeing. That’s just not true,” he said.
Hirfar He added: “The argument that we deliberately push content that makes people angry for profit is deeply illogical. We make money from ads, and advertisers consistently tell us they don’t want their ads next to harmful or angry content.”
Hirfar Zuckerberg said many of the claims made by Haugen – and in the Wall Street Journal, based on documents she leaked – “don’t make any sense”. The most damaging reporting in the WSJ, reiterated at length by Haugen in testimony to the US Senate on Tuesday, was that Facebook failed to act on internal research showing that its Instagram app was damaging teenagers’ mental health.")
Here I used this code to break it into words:
library(stringr)
wrds <- str_split(t, pattern = boundary(type = "word"))
Now I want to look for the word "Hirfar" and separate this text into 5 different texts, each of which runs from the first word after one "Hirfar" up to the last word before the next "Hirfar".
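We may use a regex lookahead: "\\s+(?=Hirfar)" splits on the whitespace in front of each "Hirfar" without consuming the marker itself. Lookarounds need perl = TRUE, because R's default regex engine does not support them.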
strsplit(t, "\\s+(?=Hirfar)", perl = TRUE)[[1]][-1]
Output:
[1] "Hirfar Mark Zuckerberg has hit back at the testimony of the Facebook whistleblower Frances Haugen, saying her claims the company puts profit over people’s safety are “just not true”."
[2] "Hirfar In a blogpost, the Facebook founder and chief executive addressed one of the most damaging statements in Haugen’s opening speech to US senators on Tuesday, that Facebook puts “astronomical profits before people”."
[3] "Hirfar “At the heart of these accusations is this idea that we prioritise profit over safety and wellbeing. That’s just not true,” he said."
[4] "Hirfar He added: “The argument that we deliberately push content that makes people angry for profit is deeply illogical. We make money from ads, and advertisers consistently tell us they don’t want their ads next to harmful or angry content.”"
[5] "Hirfar Zuckerberg said many of the claims made by Haugen – and in the Wall Street Journal, based on documents she leaked – “don’t make any sense”. The most damaging reporting in the WSJ, reiterated at length by Haugen in testimony to the US Senate on Tuesday, was that Facebook failed to act on internal research showing that its Instagram app was damaging teenagers’ mental health."
If the pieces shouldn't include "Hirfar":
strsplit(t, "Hirfar\\s+")[[1]][-1]
[1] "Mark Zuckerberg has hit back at the testimony of the Facebook whistleblower Frances Haugen, saying her claims the company puts profit over people’s safety are “just not true”.\n\n"
[2] "In a blogpost, the Facebook founder and chief executive addressed one of the most damaging statements in Haugen’s opening speech to US senators on Tuesday, that Facebook puts “astronomical profits before people”.\n\n "
[3] "“At the heart of these accusations is this idea that we prioritise profit over safety and wellbeing. That’s just not true,” he said.\n\n"
[4] "He added: “The argument that we deliberately push content that makes people angry for profit is deeply illogical. We make money from ads, and advertisers consistently tell us they don’t want their ads next to harmful or angry content.”\n\n"
[5] "Zuckerberg said many of the claims made by Haugen – and in the Wall Street Journal, based on documents she leaked – “don’t make any sense”. The most damaging reporting in the WSJ, reiterated at length by Haugen in testimony to the US Senate on Tuesday, was that Facebook failed to act on internal research showing that its Instagram app was damaging teenagers’ mental health."
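A minimal, self-contained sketch of both variants, assuming a short hypothetical text `t` shaped like the original (each part starts with "Hirfar" and parts are separated by blank lines). Because this `t` begins directly with "Hirfar", the lookahead version needs no `[-1]`; in the answer, `[-1]` presumably drops whatever precedes the first "Hirfar" in the real text. Wrapping the second variant in `trimws()` also removes the trailing "\n\n" visible in its output.

```r
# Hypothetical short text shaped like `t`: every part starts with "Hirfar"
t <- "Hirfar First part.\n\nHirfar Second part.\n\nHirfar Third part."

# Keep the marker: split on the whitespace that directly precedes "Hirfar"
# (perl = TRUE is required for the (?=...) lookahead)
with_marker <- strsplit(t, "\\s+(?=Hirfar)", perl = TRUE)[[1]]

# Drop the marker: split on "Hirfar" plus the whitespace after it, discard the
# empty piece before the first marker, and trim the leftover newlines
without_marker <- trimws(strsplit(t, "Hirfar\\s+")[[1]][-1])

print(with_marker)
print(without_marker)
```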

How to read a list of values into a data table in a sandbox?

I have a list of data in a single column; each row is a comment from a post asking for book recommendations. Here's an example containing the first two entries:
"My recommendations from books I read this year:<p>Bad Blood : Man, this book really does read like a Hollywood movie screenplay. The rise and fall of Theranos, documented through interviews with hundreds of ex-employees by the very author who came up with the first expose of Theranos. Truly shows the flaws in the "fake it before you make it" mindset and how we glorify "geniuses".<p>Shoe Dog : Biography of the founder of Nike. Really liked how it's not just a book glorifying the story of Nike, but tells the tale of how much effort, balance and even pure luck went into making the company the household name it is today.<p>Master Algorithm : It's a book about the different fields of Machine learning (from Bayesian to Genetic evolution algos) and talks about the pros and cons of each and how these can play together to create a "master algorithm" for learning. It's a good primer for people entering the field and while it's not a DIY, it shows the scope of the problem of learning as a whole.<p>Three Body Problem: Finally, after years of people telling me to read this (on HN and off), I read the trilogy (Remembrance of Earth's Past), and I must say, the series does live up to the hype. Not only is it fast paced and deeply philosophical, but it's presented in a format very accessible to casual readers as well (unlike many hard sci-fi books which seem to revel in complexity). If I had to describe this series in a single line, it's "What would happen if China was the country that made first contact with an alien race?"","A selection:<p>Sapiens (Yuval Noah Harari, 2014 [English]) - A bit late to the party on this one. Mostly enjoyed it, especially the early ancient history stuff, but I felt it got a bit contrived in the middle - like the author was forcing it. Overall a good read though.<p>How to Invent Everything (Ryan North, 2018) - First book I've pre-ordered in a long time. A look at the history of civilization and technology through a comedic lens. 
Pretty funny and enjoyable.<p>The Rise of Theodore Roosevelt (Edmund Morris, 1979) - Randomly happened across this book while browsing a used bookstore for some stuff to read on a summer vacation. Loved it. It's big, but reads pretty quick for a biography. I've been a fan of TR since I first really learned about him in High School and I would recommend this for anyone interested in TR/The West/Americana.<p>Jaws (Peter Benchley, 1974) - Quite a bit darker than the movie.<p>Sharp Objects (Gillian Flynn, 2006) - I enjoyed Gone Girl (book and film) so I wanted to read this before the HBO series. To be honest...not my cup of tea. It was <i>okay</i>.<p>The Art of Racing in the Rain (Garth Stein, 2008) - Made me cry on an airplane. Thankfully my coworkers were on a different flight."
(Note that comments are separated by ",".)
I'm trying to load this list into a data table in an R sandbox (rapporter.net). However, because of browser security restrictions, I can't load a local file (with fread or read.table).
How can I read raw data into a data table in R?
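One way around the file restriction is to paste the raw text into the script and parse it from a string. This sketch assumes the comments are written CSV-style: each wrapped in double quotes, separated by commas, with any inner double quotes doubled (`""`). The sample above contains undoubled inner quotes (e.g. around "fake it before you make it"), which would need escaping first. The `raw` string below is a hypothetical stand-in. `scan(text = ...)` is base R; data.table's `fread()` also accepts a `text =` argument.

```r
# Hypothetical stand-in for the pasted data: quoted comments separated by commas,
# with inner double quotes doubled CSV-style
raw <- '"Bad Blood: reads like a movie, truly.","A selection: Sapiens - a ""good read"" overall."'

# scan() splits on commas outside quotes and turns doubled quotes back into single ones
comments <- scan(text = raw, what = character(), sep = ",", quote = "\"", quiet = TRUE)

# One row per comment; use data.table::as.data.table(df) if a data.table is required
df <- data.frame(comment = comments, stringsAsFactors = FALSE)
print(df)
```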

Extract textbook names and journal articles from various syllabi

I am trying to use R to extract textbook names and journal articles from syllabi collected from various courses. My basic assumption is that most of these will be in some kind of citation format (e.g., APA or MLA). While I could write regexes to extract this information, I was wondering whether anyone has done this before, or whether there is an R package I could use to extract this information from differently formatted text.
Below are two examples of the syllabi I am working with. In Sample 1 the book names are not in a citation format; in Sample 2 they are. Both samples have been truncated to meet Stack Overflow's character limits.
SAMPLE 1:
"ABC State University ARTS 3366 Intermediate Digital Photography Fall 2015 JCM 4127 T/TH 2-4:30 pm Lecturer: John Smith Office Hours: T/TH prior to and after class Email: johnsmith#abcstate.edu Alternate email: johnsmith#gmail.com Prerequisites: ARTS 3364 - Introduction to Digital Photography Course Description & Objectives: This course is designed to expand and build on the skills and knowledge acquired in Introduction to Digital Photography. This course builds on the skills and knowledge acquired in Introduction to Digital Photography. Specifically, we will use the history, critical analysis, and production of photography books to: (1) explore the complexities of the medium in social, political, and aesthetic contexts; (2) develop more advanced and conceptually driven photography work; (3) work toward a greater understanding of how photography books function as self-contained art, cultural, and political objects; (4) learn how to choose subject matter and continually explore, experiment, and refine our work. The final outcomes of the class will be the creation of an on-demand book and an accompanying folio of fine prints. We will use digital cameras, inkjet printers, Adobe Photoshop, Lightroom, and Macintosh computers in this course. Through lectures, discussions and readings, we will explore and discuss historical trends in traditional (analog) photography, as well as emerging practices in contemporary digital imaging. This will serve as a foundation to help determine the approach, subject matter, and style of the work created for class. In addition to refining these skills, students will also address the practical and theoretical roles of digital imagery. The course objective will be to focus on technical, aesthetic, and conceptual growth of a student’s endeavors in the digital medium. 
This course requires the completion of: all assignments (on time), participation in all group critiques and completion of a Twelve to Fifteen image final portfolio of prints or equivalent, and three projects throughout the semester. Requirements: Coursework: This course requires the completion of: all assignments (on time), participation in all group critiques and completion of a 12-15 image final portfolio of prints or equivalent, the creation of a book printed with an on demand printing service, as well as making new photographs consistently throughout the entire semester. Suggested (not required)Books: Adobe Photoshop Lightroom 5 Book, The: The Complete Guide for Photographers By Martin Evening Published Jun 30, 2013 by Adobe Press The Photographer’s Playbook 307 Assignments and Ideas Edited by Jason Fulford and Gregory Halpern Published by Aperture On Being a Photographer: A Practical Guide by David Hurn and Bill Jay Local Stores:"
SAMPLE 2:"Physical Education Activity ProgramHealth & Fitness Strength TrainingKINE 198-837Instructor: JANE DOE Office: PEAP 230Office Hours: By appointmentPhone: (000) 000-0000E-Mail: jdoe#xyz.edu A. Activity Instructor: Jane DoeOffice: PEAP 250Office Hours: By appointmentClass Time: Thursday 2:20 pmPhone: (000) 000-0000Email: jdoe#xyz1.edu Class Meeting Site: PEAP 117B. Activity Instructor: Jane Doe Phone:Office: PEAP 239Email: jdoe#xyz1.eduOffice Hours: Thursday 10:00 am – 12:00 pmClass Time: Thursday 2:20 pmClass Meeting Site: PEAP 118C. Activity Instructor: John doe Office: PEAP 250/Doe 213KOffice Hours: Tuesday 1:00-2:00 pmClass Time: Thursday 2:20 pmPhone:Email: johndoe#xyz.eduClass Meeting Site: PEAP 120Attire: Proper clothes and shoes designed specifically for strength training on activitydays.Required Materials:Bounds, L., Agnor, D., Darnell,G., & Brekken Shea, K. (2012). Health & Fitness: AGuide to a Healthy Lifestyle (5th edition). Dubuque, IA: Kendall/Hunt Publishing Co.ISBN 978-1-4652-0712-8Cissik, J. (2001). The Basics of Strength Training (3rd Edition). McGraw-Hill,Primus Custom PublishingCourse Description:Health and Fitness is intended for the student who is seeking knowledge and practicalapplication of wellness choices to their life. The course consists of two components,lecture and activity. Students will meet face-to-face one day per week for the activityportion of the class and work approximately the equivalent of one day per week onlinewith lecture materials. The lecture portion will cover current health issues includingmental and physical health, nutrition, human sexuality, communicable and noncommunicable diseases, use and abuse of drugs, and safety. 
The activity portion willconsist of 14 class days and cover basic knowledge and techniques of strength trainingand improving the individual’s fitness through the utilization of this knowledge.Course Rationale:Research indicates that daily health/fitness related behaviors enhance learning anddetermine the quality and longevity of our life."
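As a starting point before reaching for a package, a single regex can catch APA-style references like those in Sample 2, where a four-digit year in parentheses and a period are followed by the title. This is a rough sketch: the pattern is an assumption tuned to this sample, it ends the title at the first period or opening parenthesis, and it will not find unformatted book lists like the one in Sample 1.

```r
# Excerpt of SAMPLE 2 (APA-style references run together, as in the original)
syllabus <- paste(
  "Required Materials:Bounds, L., Agnor, D., Darnell,G., & Brekken Shea, K.",
  "(2012). Health & Fitness: AGuide to a Healthy Lifestyle (5th edition).",
  "Dubuque, IA: Kendall/Hunt Publishing Co.ISBN 978-1-4652-0712-8Cissik, J.",
  "(2001). The Basics of Strength Training (3rd Edition). McGraw-Hill"
)

# "(YYYY). " followed by the title, which ends at the next "." or "("
pattern <- "\\((\\d{4})\\)\\.\\s*([^.(]+)"

# All full matches in the text
hits <- regmatches(syllabus, gregexpr(pattern, syllabus, perl = TRUE))[[1]]

# Pull the two capture groups (year, title) out of each match
refs <- data.frame(
  year  = sub(pattern, "\\1", hits, perl = TRUE),
  title = trimws(sub(pattern, "\\2", hits, perl = TRUE)),
  stringsAsFactors = FALSE
)
print(refs)
```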

Bypass Style Formatting when Parsing RSS Feed in R

I am trying to scrape and parse the following RSS feed: http://www.nestle.com/_handlers/rss.ashx?q=068f9d6282034061936dbe150c72d197. I have no problem extracting the basic items I need (e.g., title, description, pubDate) using the following code:
library(RCurl)
library(XML)
xml.url <- "http://www.nestle.com/_handlers/rss.ashx?q=068f9d6282034061936dbe150c72d197"
# Download the raw feed and parse it as XML
script <- getURL(xml.url)
doc <- xmlParse(script)
# Pull the text value of each <item>'s child elements
titles <- xpathSApply(doc,'//item/title',xmlValue)
descriptions <- xpathSApply(doc,'//item/description',xmlValue)
pubdates <- xpathSApply(doc,'//item/pubDate',xmlValue)
My problem is that the output for the item "description" includes not only the actual text but also a lot of HTML formatting markup. For example, the first element is:
descriptions[1]
[1] "<p><iframe height=\"322\" src=\"https://www.youtube-nocookie.com/embed/fhESDXnlMa0?rel=0\" frameBorder=\"0\" width=\"572\"></iframe><br />\n<br />\n<p><em>Nescafé</em> is partnering with Facebook to launch an immersive video, pioneering new technology just released for the platform.</p>\n<p>\nThe <em>Nescafé</em> <a class=\"externalLink\" title=\"Opens in a new window: Nescafé on Facebook\" href=\"https://www.facebook.com/Nescafe/videos/vb.203900255471/10156233581755472/?type=2&theater\" target=\"_blank\">‘Good Morning World’ video</a> stars people in kitchens across the world, performing the hit song ‘Don’t Worry’ using spoons, cups, forks and a jar of coffee. Uniquely, viewers can rotate their smartphones through 360˚ to explore the video, the first time this has been possible on Facebook.</p>\n<p>\n“We know young coffee lovers pick up their phone at the start of every day looking to be entertained by real experiences. The 360˚ video allows us to be engaging in an innovative way,” said Carsten Fredholm, Senior Vice President of Nestlé’s Beverage Strategic Business Unit.\n</p>\n<p><em>Nescafé</em> recently teamed up with Google to offer the first virtual reality coffee experience through the <em>Nescafé 360˚</em> app. It also became the first global brand to move its website onto Tumblr, to strengthen connections with younger fans by allowing them to create and share content.</p>\n<p>The Nestlé brand is one of only six globally to partner Facebook for the launch of this technology.</p></p>"
I could use a regex to strip out the unwanted strings. However, is there a way to access the plain-text content of the "description" items directly through XPath?
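For comparison, the regex approach mentioned in the question can be sketched in base R. It assumes the markup is simple: it does not decode HTML entities such as `&amp;`, and it can misfire on a literal `>` inside an attribute value, which is one reason to prefer an HTML parser. The `desc` string is a hypothetical, shortened stand-in for `descriptions[1]`.

```r
# Hypothetical description shaped like descriptions[1]
desc <- "<p><em>Nescafe</em> is partnering with Facebook.</p>\n<p>It is one of only six brands.</p>"

# Delete anything that looks like a tag: "<", then non-">" characters, then ">"
plain <- gsub("<[^>]+>", "", desc)
cat(plain, "\n")
```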
Any help with this issue is very much appreciated. Thank you.
You can do:
descriptions <- sapply(descriptions, function(x) {
  # Parse each description as an HTML fragment and keep only its text content
  xmlValue(xmlRoot(htmlParse(x, asText = TRUE)))
}, USE.NAMES = FALSE)
which gives the following (printed with cat(stringr::str_wrap(descriptions[[1]], 70))):
In a move that will provide young Europeans increased access to
jobs and training opportunities, Nestlé and the Alliance for YOUth
have joined the European Pact for Youth as founding members. Seven
million people in Europe under the age of 25 are still inactive -
neither in employment, education or training. The European Pact for
Youth, created by European CSR business network CSR Europe and the
European Commission, aims to work together with businesses, youth
organisations, education providers and other stakeholders to reduce
skills gaps and increase youth employability. As part of the Pact, the
Alliance for YOUth will focus on setting up ‘dual learning’ schemes
across Europe, combining formal education with apprenticeships and on-
the-job training to help match skills with jobs on the market. The
Alliance for YOUth is a group of almost 200 companies mobilised by
Nestlé to help young people in Europe find work. It has pledged to
create 100,000 employability opportunities by 2017 and has already met
half of this target in its first year. Luis Cantarell, Executive Vice
President for Nestlé and co-initiator of the European Pact for Youth,
said: “Promoting a cultural shift to dual learning schemes based on
business-education collaboration is at the heart of Nestlé’s youth
employment initiative since its start in 2013. The European Pact for
Youth will help to build a skilled workforce and will tackle youth
unemployment.” Learn more about the European Pact for Youth and read
their press release.
There are \n characters at various points in the resulting text (in almost all of the descriptions), but you can remove them with gsub.
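A minimal sketch of that gsub cleanup, using a hypothetical `descriptions` vector in place of the parsed feed. The pattern also swallows any spaces next to each newline, so words end up separated by exactly one space.

```r
# Hypothetical parsed descriptions containing stray newlines
descriptions <- c("In a move that will provide\nyoung Europeans\n increased access", "No newlines here")

# Replace each newline (and any whitespace around it) with one space, then trim the ends
descriptions <- trimws(gsub("\\s*\n\\s*", " ", descriptions))
print(descriptions)
```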
