I am trying to extract the words from a text column so that I can create a word cloud, but I am running into some difficulties.
This is the code:
library(readxl)
data <- read_excel("C:\\Users\\me\\OneDrive\\Desktop\\ToPandas.xlsx")
data2 <-data$articlesDescription
#install.packages("wordcloud2")
#install.packages("tidyverse")
#install.packages("tidytext")
library(wordcloud2)
library(tidyverse)
library(tidytext)
data2 <- gsub('[^[:alnum:] ]', '', data2)
data2 <- data2 %>%
  ungroup()
data3.df <- as.data.frame(data2)
data3 <- data3.df
data3 <- data3 %>%
  anti_join(get_stopwords()) %>%
  unnest_tokens(word, text) %>%
  count(word, sort = TRUE)
I have commented out the install.packages() calls so that the packages are not reinstalled on every run.
Everything works up to the gsub() step on data2. As soon as I call ungroup(), I get this error:
Error in UseMethod("ungroup") : no applicable method for 'ungroup'
applied to an object of class "character"
Then, when the script moves on to the anti_join() step, I get this:
Error in anti_join(): ! by must be supplied when x and y have
no common variables. i use by = character()` to perform a cross-join.
I think the second error stems from the first one (ungroup), but I can't figure out how to fix it so that I can count the words.
This is a sample of what the imported xlsx file looks like:
ToPandas_xlsx Image
Can anyone point me in the right direction?
thanks :)
EDIT 1: Adding data from the JSON file (I had to remove a row because the post was more than 3,000 characters over the limit):
{\"articlesName\":\"Texas threatens to become next flash point on voting rules\",\"articlesShortDescription\":\"Texas appeared on Thursday to become the next flash point on politically charged issues in Corporate America after legislation passed by the state Senate to limit voting access prompted a rebuke from American Airlines.\",\"articlesDescription\":\"[{\\u0022type\\u0022:\\u0022paragraph\\u0022,\\u0022content\\u0022:\\u0022WASHINGTON Texas appeared on Thursday to become the next flash point on politically charged issues in Corporate America after legislation passed by the state Senate to limit voting access prompted a rebuke from American Airlines.\\u0022},{\\u0022type\\u0022:\\u0022paragraph\\u0022,\\u0022content\\u0022:\\u0022\\u201cWe are strongly opposed to this bill and others like it,\\u201d Fort Worth, Texas-based American said in a statement.\\u0022},{\\u0022type\\u0022:\\u0022paragraph\\u0022,\\u0022content\\u0022:\\u0022The legislation, which is now set to go before the Texas House of Representatives, would eliminate drive-through voting, limit polling site hours and give partisan poll watchers more autonomy.\\u0022},{\\u0022type\\u0022:\\u0022paragraph\\u0022,\\u0022content\\u0022:\\u0022Southwest Airlines, also based in Texas, declined to say if it opposed the legislation but said: \\u201cWe believe every voter should have a fair opportunity to let their voice be heard. This right is essential to our nation\\u2019s success.\\u201d\\u0022},{\\u0022type\\u0022:\\u0022paragraph\\u0022,\\u0022content\\u0022:\\u0022The Texas effort drew sharp criticism from voting rights advocates and Democrats in the state, who argue that the legislation would make it more difficult for Texans, particularly those of color, to cast ballots.\\u0022},{\\u0022type\\u0022:\\u0022paragraph\\u0022,\\u0022content\\u0022:\\u0022The state already has some of the most stringent voting laws in the country, according to election experts. 
A state House of Representatives committee on Thursday was holding a hearing on a companion bill that would impose other voting restrictions.\\u0022},{\\u0022type\\u0022:\\u0022paragraph\\u0022,\\u0022content\\u0022:\\u0022Texas is one of several states, including Georgia, Florida, Arizona and Iowa, where Republican lawmakers have pursued new voting limits after former President Donald Trump falsely blamed his November loss on widespread voter fraud despite no evidence.\\u0022},{\\u0022type\\u0022:\\u0022paragraph\\u0022,\\u0022content\\u0022:\\u0022Republican lawmakers say the law is needed to ensure public confidence in election integrity.\\u0022},{\\u0022type\\u0022:\\u0022paragraph\\u0022,\\u0022content\\u0022:\\u0022The comments by American and Southwest came after Atlanta-based Delta Air Lines and Coca-Cola on Wednesday joined a growing number of companies that challenged the state of Georgia\\u2019s new voting restrictions.\\u0022},{\\u0022type\\u0022:\\u0022paragraph\\u0022,\\u0022content\\u0022:\\u0022Delta CEO Ed Bastian blasted the law on Wednesday in a reversal from an initial statement last week that sparked a popular backlash.\\u0022},{\\u0022type\\u0022:\\u0022paragraph\\u0022,\\u0022content\\u0022:\\u0022But his new stance drew condemnation from Georgia\\u2019s Republican Governor Brian Kemp and many Republicans, including Senator Marco Rubio who questioned why Delta criticized Georgia but not China.\\u0022},{\\u0022type\\u0022:\\u0022paragraph\\u0022,\\u0022content\\u0022:\\u0022\\u201cFar too many multinational corporations are too eager to make their voices heard on the woke issues of the day in the United States, but remain stunningly silent, or in Delta\\u2019s case, complicit, in real, ongoing atrocities in countries like China.\\u201d Rubio wrote.\\u0022},{\\u0022type\\u0022:\\u0022paragraph\\u0022,\\u0022content\\u0022:\\u0022Delta did not immediately comment on Rubio\\u2019s 
letter.\\u0022},{\\u0022type\\u0022:\\u0022paragraph\\u0022,\\u0022content\\u0022:\\u0022The Georgia House late Wednesday voted to repeal a jet fuel sales tax break that Delta uses but the state Senate did not act on it before the legislative session adjourned.\\u0022},{\\u0022type\\u0022:\\u0022paragraph\\u0022,\\u0022content\\u0022:\\u0022Kemp told Fox Business he thought the tax issue was \\u201cmoot\\u201d now that the legislature had adjourned.\\u0022}]\",\"minutesToRead\":3,\"primaryAssetType\":0,\"wordCount\":null,\"urlSupplier\":\"https:\\/\\/www.reuters.com\\/article\\/us-usa-election-texas\\/texas-threatens-to-become-next-flash-point-on-voting-rules-idUSKBN2BO6SI\",\"canonicalSupplier\":\"https:\\/\\/www.reuters.com\\/article\\/us-usa-election-texas-idUSKBN2BO6SI\",\"publishedAt\":{\"date\":\"2021-04-01 21:55:32.000000\",\"timezone_type\":3,\"timezone\":\"UTC\"},\"dateModified\":{\"date\":\"2021-04-02 05:15:53.000000\",\"timezone_type\":3,\"timezone\":\"UTC\"},\"files\":[{\"filesName\":null,\"filesTitle\":null,\"filesDescription\":\"Presidio County election judge Lauren Martinez folds a booth after polls and voting ended for the 2020 U.S. presidential election in Marfa, Texas, U.S., November 3, 2020. 
REUTERS\\/Adrees Latif\",\"contentType\":\"image\\/jpeg\",\"urlCdn\":\"https:\\/\\/static.reuters.com\\/resources\\/r\\/?m=02\\u0026d=20210402\\u0026t=2\\u0026i=1557110638\\u0026r=LYNXMPEH303NA\"}],\"videos\":[],\"tags\":[{\"name\":\"United States\",\"slug\":\"united-states\"},{\"name\":\"Company News\",\"slug\":\"company-news\"},{\"name\":\"Reuters Top News\",\"slug\":\"reuters-top-news\"},{\"name\":\"Government \\/ Politics\",\"slug\":\"government-politics\"},{\"name\":\"Fundamental Rights \\/ Civil Liberties\",\"slug\":\"fundamental-rights-civil-liberties\"},{\"name\":\"Lawmaking\",\"slug\":\"lawmaking\"},{\"name\":\"Airlines (TRBC level 4)\",\"slug\":\"airlines-trbc-level-4\"},{\"name\":\"Elections \\/ Voting\",\"slug\":\"elections-voting\"},{\"name\":\"Regional Airlines (TRBC level 5)\",\"slug\":\"regional-airlines-trbc-level-5\"},{\"name\":\"Texas\",\"slug\":\"texas\"},{\"name\":\"Georgia (US State)\",\"slug\":\"georgia-us-state\"},{\"name\":\"US House of Representatives\",\"slug\":\"us-house-of-representatives\"}],\"keywords\":[{\"keywordName\":\"United States\",\"keywordSlug\":\"united-states\"},{\"keywordName\":\"US\",\"keywordSlug\":\"us\"},{\"keywordName\":\"Company News\",\"keywordSlug\":\"company-news\"},{\"keywordName\":\"Reuters Top News\",\"keywordSlug\":\"reuters-top-news\"},{\"keywordName\":\"Government \\/ Politics\",\"keywordSlug\":\"government-politics\"},{\"keywordName\":\"Fundamental Rights \\/ Civil Liberties\",\"keywordSlug\":\"fundamental-rights-civil-liberties\"},{\"keywordName\":\"Lawmaking\",\"keywordSlug\":\"lawmaking\"},{\"keywordName\":\"USA\",\"keywordSlug\":\"usa\"},{\"keywordName\":\"Airlines (TRBC level 4)\",\"keywordSlug\":\"airlines-trbc-level-4\"},{\"keywordName\":\"Elections \\/ Voting\",\"keywordSlug\":\"elections-voting\"},{\"keywordName\":\"Regional Airlines (TRBC level 
5)\",\"keywordSlug\":\"regional-airlines-trbc-level-5\"},{\"keywordName\":\"Texas\",\"keywordSlug\":\"texas\"},{\"keywordName\":\"ELECTION\",\"keywordSlug\":\"election\"},{\"keywordName\":\"Georgia (US State)\",\"keywordSlug\":\"georgia-us-state\"},{\"keywordName\":\"US House of Representatives\",\"keywordSlug\":\"us-house-of-representatives\"}],\"n2\":[],\"authors\":[{\"authorName\":\"Tracy Rucinski\"},{\"authorName\":\"David Shepardson\"},{\"authorName\":\"Joseph Ax\"}]},{\"articlesName\":\"U.S. Supreme Court permits FCC to loosen media ownership rules\",\"articlesShortDescription\":\"The U.S. Supreme Court on Thursday allowed the Federal Communication Commission to loosen local media ownership restrictions, handing a victory to broadcasters in a ruling that could facilitate industry consolidation as consumers increasingly move online.\",\"articlesDescription\":\"[{\\u0022type\\u0022:\\u0022paragraph\\u0022,\\u0022content\\u0022:\\u0022WASHINGTON (Reuters) -The U.S. Supreme Court on Thursday allowed the Federal Communication Commission to loosen local media ownership restrictions, handing a victory to broadcasters in a ruling that could facilitate industry consolidation as consumers increasingly move online.\\u0022},{\\u0022type\\u0022:\\u0022paragraph\\u0022,\\u0022content\\u0022:\\u0022In a 9-0 ruling authored by Justice Brett Kavanaugh, the justices overturned a lower court decision that had blocked the FCC\\u2019s repeal of some media ownership regulations in 2017 for failing to consider the effects on ownership by racial minorities and women. 
Critics of the industry have said further consolidation could limit media choices for consumers.\\u0022},{\\u0022type\\u0022:\\u0022paragraph\\u0022,\\u0022content\\u0022:\\u0022The justices acted in appeals by the FCC, companies including News Corp, Fox Corp and Sinclair Broadcast Group Inc and the National Association of Broadcasters.\\u0022},{\\u0022type\\u0022:\\u0022paragraph\\u0022,\\u0022content\\u0022:\\u0022The associations for other broadcast networks\\u2019 local affiliates, including ABC, NBC and CBS, backed the appeals, arguing that consolidation would help ensure the economic survival of local television amid heavy competition from internet companies that provide video content. Broadcast television stations have said they are increasingly losing advertising dollars to digital platforms.\\u0022},{\\u0022type\\u0022:\\u0022paragraph\\u0022,\\u0022content\\u0022:\\u0022In 2017, the FCC - then led by Republicans during former President Donald Trump\\u2019s administration - voted to eliminate a ban in place since 1975 on cross-ownership of a newspaper and TV station in a major market. It also voted to make it easier for media companies to buy additional TV stations in the same market, and for companies to buy additional radio stations in some markets.\\u0022},{\\u0022type\\u0022:\\u0022paragraph\\u0022,\\u0022content\\u0022:\\u0022The FCC, now equally divided between Democrats and Republicans, is led by acting chairwoman Jessica Rosenworcel, a Democrat, who voted against the 2017 decision. The agency is set to have a Democratic majority once President Joe Biden nominates and the Senate confirms a new commissioner. 
The FCC could then seek to reverse the 2017 order.\\u0022},{\\u0022type\\u0022:\\u0022paragraph\\u0022,\\u0022content\\u0022:\\u0022Rosenworcel did not immediately respond to a request for comment after the ruling.\\u0022},{\\u0022type\\u0022:\\u0022paragraph\\u0022,\\u0022content\\u0022:\\u0022Writing for the unanimous court, Kavanaugh said that the FCC reasonably reviewed the ownership rules to find that repealing or modifying them \\u201cwas not likely to harm minority and female ownership.\\u201d\\u0022},{\\u0022type\\u0022:\\u0022paragraph\\u0022,\\u0022content\\u0022:\\u0022Kavanaugh added: \\u201cThe FCC reasoned that the historical justifications for those ownership rules no longer apply in today\\u2019s media market, and that permitting efficient combinations among radio stations, television stations and newspapers would benefit consumers.\\u201d\\u0022},{\\u0022type\\u0022:\\u0022paragraph\\u0022,\\u0022content\\u0022:\\u0022The case highlighted diverging views on the best way to ensure a competitive environment that promotes a broad range of local news and information. Critics of the FCC\\u2019s action have said relaxing ownership rules could jeopardize a wider array of sources at the local level.\\u0022},{\\u0022type\\u0022:\\u0022paragraph\\u0022,\\u0022content\\u0022:\\u0022The Philadelphia-based 3rd U.S. Circuit Court of Appeals had thwarted the FCC\\u2019s efforts to revise the rules since 2003 in a series of decisions. The new rules were challenged by a number of community advocacy groups led by the Prometheus Radio Project. 
The 3rd Circuit in 2019 blocked the new rules.\\u0022},{\\u0022type\\u0022:\\u0022paragraph\\u0022,\\u0022content\\u0022:\\u0022Former FCC Commissioner Mike O\\u2019Rielly, a Republican who voted for the 2017 order, said he expects there will be some \\u201cstrategic deals\\u201d to consolidate in which a local newspaper could be acquired, but that \\u201cno massive deals\\u201d are going to happen given the struggling local media sector.\\u0022},{\\u0022type\\u0022:\\u0022paragraph\\u0022,\\u0022content\\u0022:\\u0022Cheryl Leanza, a lawyer for the plaintiffs who challenged the 2017 FCC decision, said that \\u201cthe good news is the Biden FCC, once it gains a working majority, can quickly get to work building a solid record to promote the public interest standard and media ownership diversity.\\u201d\\u0022},{\\u0022type\\u0022:\\u0022paragraph\\u0022,\\u0022content\\u0022:\\u0022Advocacy group Free Press said the Biden FCC and Congress \\u201cmust recognize that hedge-fund and Wall Street-driven consolidation harms local communities, and only decimates what\\u2019s left of competition and diversity. ... The silver lining here is (the court) deferred to the agency\\u2019s judgment and left room for a new commission to get this right.\\u201d\\u0022},{\\u0022type\\u0022:\\u0022paragraph\\u0022,\\u0022content\\u0022:\\u0022David Chavern, CEO of the News Media Alliance group that represents more than 2,000 news organizations, hailed the ruling and said the previous restrictions \\u201chad shackled the newspaper industry for far too long. 
The repeal of the ban will generate much needed investments and cross-platform synergies that will help sustain local news media.\\u201d\\u0022}]\",\"minutesToRead\":5,\"primaryAssetType\":0,\"wordCount\":null,\"urlSupplier\":\"https:\\/\\/www.reuters.com\\/article\\/us-usa-court-fcc\\/u-s-supreme-court-permits-fcc-to-loosen-media-ownership-rules-idUSKBN2BO5S4\",\"canonicalSupplier\":\"https:\\/\\/www.reuters.com\\/article\\/us-usa-court-fcc-idUSKBN2BO5S4\",\"publishedAt\":{\"date\":\"2021-04-01 14:25:10.000000\",\"timezone_type\":3,\"timezone\":\"UTC\"},\"dateModified\":{\"date\":\"2021-04-01 16:45:47.000000\",\"timezone_type\":3,\"timezone\":\"UTC\"},\"files\":[{\"filesName\":null,\"filesTitle\":null,\"filesDescription\":\"The Supreme Court is seen in Washington, U.S., December 11, 2020. REUTERS\\/Joshua Roberts\\/File Photo\",\"contentType\":\"image\\/jpeg\",\"urlCdn\":\"https:\\/\\/static.reuters.com\\/resources\\/r\\/?m=02\\u0026d=20210401\\u0026t=2\\u0026i=1557040208\\u0026r=LYNXMPEH3030U\"}],\"videos\":[],\"tags\":[{\"name\":\"United States\",\"slug\":\"united-states\"},{\"name\":\"Corporate Events\",\"slug\":\"corporate-events\"},{\"name\":\"Company News\",\"slug\":\"company-news\"},{\"name\":\"Financials (Legacy)\",\"slug\":\"financials-legacy\"},{\"name\":\"Financials (TRBC level 1)\",\"slug\":\"financials-trbc-level-1\"},{\"name\":\"Arts \\/ Culture \\/ Entertainment\",\"slug\":\"arts-culture-entertainment\"},{\"name\":\"Reuters Top News\",\"slug\":\"reuters-top-news\"},{\"name\":\"Major News\",\"slug\":\"major-news\"},{\"name\":\"Cyclical Consumer Services (TRBC level 2)\",\"slug\":\"cyclical-consumer-services-trbc-level-2\"},{\"name\":\"Consumer Cyclicals (TRBC level 1)\",\"slug\":\"consumer-cyclicals-trbc-level-1\"},{\"name\":\"Technology (TRBC level 1)\",\"slug\":\"technology-trbc-level-1\"},{\"name\":\"Technology \\/ Media \\/ Telecoms\",\"slug\":\"technology-media-telecoms\"},{\"name\":\"General 
News\",\"slug\":\"general-news\"},{\"name\":\"Government \\/ Politics\",\"slug\":\"government-politics\"},{\"name\":\"Media \\u0026 Publishing (TRBC level 3)\",\"slug\":\"media-publishing-trbc-level-3\"},{\"name\":\"Media \\/ Publishing (Legacy)\",\"slug\":\"media-publishing-legacy\"},{\"name\":\"Society \\/ Social Issues\",\"slug\":\"society-social-issues\"},{\"name\":\"Crime \\/ Law \\/ Justice\",\"slug\":\"crime-law-justice\"},{\"name\":\"Corporate \\/ Market Regulation\",\"slug\":\"corporate-market-regulation\"},{\"name\":\"Judicial Process \\/ Court Cases \\/ Court Decisions\",\"slug\":\"judicial-process-court-cases-court-decisions\"},{\"name\":\"Broadcasting (TRBC level 4)\",\"slug\":\"broadcasting-trbc-level-4\"},{\"name\":\"US Government News\",\"slug\":\"us-government-news\"},{\"name\":\"Consumer Publishing (TRBC level 4)\",\"slug\":\"consumer-publishing-trbc-level-4\"},{\"name\":\"Broadcasting (NEC) (TRBC level 5)\",\"slug\":\"broadcasting-nec-trbc-level-5\"},{\"name\":\"Advertising \\u0026 Marketing (TRBC level 4)\",\"slug\":\"advertising-marketing-trbc-level-4\"},{\"name\":\"Women\\u0027s Issues\",\"slug\":\"womens-issues\"},{\"name\":\"Race Relations \\/ Ethnic Issues\",\"slug\":\"race-relations-ethnic-issues\"},{\"name\":\"Newspaper Publishing (TRBC level 5)\",\"slug\":\"newspaper-publishing-trbc-level-5\"},{\"name\":\"Television Broadcasting (TRBC level 5)\",\"slug\":\"television-broadcasting-trbc-level-5\"},{\"name\":\"US Supreme Court\",\"slug\":\"us-supreme-court\"}],\"keywords\":[{\"keywordName\":\"United States\",\"keywordSlug\":\"united-states\"},{\"keywordName\":\"US\",\"keywordSlug\":\"us\"},{\"keywordName\":\"Corporate Events\",\"keywordSlug\":\"corporate-events\"},{\"keywordName\":\"Company News\",\"keywordSlug\":\"company-news\"},{\"keywordName\":\"Financials (Legacy)\",\"keywordSlug\":\"financials-legacy\"},{\"keywordName\":\"Financials (TRBC level 1)\",\"keywordSlug\":\"financials-trbc-level-1\"},{\"keywordName\":\"Arts \\/ Culture \\/ 
Entertainment\",\"keywordSlug\":\"arts-culture-entertainment\"},{\"keywordName\":\"Reuters Top News\",\"keywordSlug\":\"reuters-top-news\"},{\"keywordName\":\"Major News\",\"keywordSlug\":\"major-news\"},{\"keywordName\":\"Cyclical Consumer Services (TRBC level 2)\",\"keywordSlug\":\"cyclical-consumer-services-trbc-level-2\"},{\"keywordName\":\"Consumer Cyclicals (TRBC level 1)\",\"keywordSlug\":\"consumer-cyclicals-trbc-level-1\"},{\"keywordName\":\"Technology (TRBC level 1)\",\"keywordSlug\":\"technology-trbc-level-1\"},{\"keywordName\":\"Technology \\/ Media \\/ Telecoms\",\"keywordSlug\":\"technology-media-telecoms\"},{\"keywordName\":\"General News\",\"keywordSlug\":\"general-news\"},{\"keywordName\":\"Government \\/ Politics\",\"keywordSlug\":\"government-politics\"},{\"keywordName\":\"Media \\u0026 Publishing (TRBC level 3)\",\"keywordSlug\":\"media-publishing-trbc-level-3\"},{\"keywordName\":\"Media \\/ Publishing (Legacy)\",\"keywordSlug\":\"media-publishing-legacy\"},{\"keywordName\":\"Society \\/ Social Issues\",\"keywordSlug\":\"society-social-issues\"},{\"keywordName\":\"Crime \\/ Law \\/ Justice\",\"keywordSlug\":\"crime-law-justice\"},{\"keywordName\":\"Corporate \\/ Market Regulation\",\"keywordSlug\":\"corporate-market-regulation\"},{\"keywordName\":\"Judicial Process \\/ Court Cases \\/ Court Decisions\",\"keywordSlug\":\"judicial-process-court-cases-court-decisions\"},{\"keywordName\":\"Broadcasting (TRBC level 4)\",\"keywordSlug\":\"broadcasting-trbc-level-4\"},{\"keywordName\":\"USA\",\"keywordSlug\":\"usa\"},{\"keywordName\":\"COURT\",\"keywordSlug\":\"court\"},{\"keywordName\":\"US Government News\",\"keywordSlug\":\"us-government-news\"},{\"keywordName\":\"Consumer Publishing (TRBC level 4)\",\"keywordSlug\":\"consumer-publishing-trbc-level-4\"},{\"keywordName\":\"Broadcasting (NEC) (TRBC level 5)\",\"keywordSlug\":\"broadcasting-nec-trbc-level-5\"},{\"keywordName\":\"Advertising \\u0026 Marketing (TRBC level 
4)\",\"keywordSlug\":\"advertising-marketing-trbc-level-4\"},{\"keywordName\":\"Women\\u0027s Issues\",\"keywordSlug\":\"womens-issues\"},{\"keywordName\":\"Race Relations \\/ Ethnic Issues\",\"keywordSlug\":\"race-relations-ethnic-issues\"},{\"keywordName\":\"Newspaper Publishing (TRBC level 5)\",\"keywordSlug\":\"newspaper-publishing-trbc-level-5\"},{\"keywordName\":\"Television Broadcasting (TRBC level 5)\",\"keywordSlug\":\"television-broadcasting-trbc-level-5\"},{\"keywordName\":\"US Supreme Court\",\"keywordSlug\":\"us-supreme-court\"},{\"keywordName\":\"FCC\",\"keywordSlug\":\"fcc\"}],\"n2\":[],\"authors\":[{\"authorName\":\"Andrew Chung\"},{\"authorName\":\"David Shepardson\"}]}
Maybe this will be enough to get you started:
test <- data.frame(Text = rep("The quick brown fox jumped over the lazy dog's back.", 5))
Now split out the words:
test.lst <- strsplit(test$Text, " ")
test.lst[[1]]
# [1] "The" "quick" "brown" "fox" "jumped" "over" "the" "lazy" "dog's" "back."
Get rid of the punctuation:
test.lst2 <- lapply(test.lst, function(x) gsub("[[:punct:]]", "", x))
test.lst2[[1]]
# [1] "The" "quick" "brown" "fox" "jumped" "over" "the" "lazy" "dogs" "back"
test.lst2 is a list containing one element for each row of the data. To collapse it into a single vector and get the word frequencies:
table(unlist(test.lst2))
#   back  brown   dogs    fox jumped   lazy   over  quick    the    The
#      5      5      5      5      5      5      5      5      5      5
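The same split, clean, and count steps can also be sketched in Python, in case that is easier to fit into the rest of a pipeline. This is only a rough equivalent, assuming the text is available as a plain list of strings. The names `rows`, `words_per_row`, and `counts_lower` are made up for the illustration, and the lower-casing step at the end is an addition that is not part of the R code above.

```python
import string
from collections import Counter

# Five copies of the same sentence, mirroring the `test` data frame above.
rows = ["The quick brown fox jumped over the lazy dog's back."] * 5

# Split each row on spaces, then strip punctuation from every word.
strip_punct = str.maketrans("", "", string.punctuation)
words_per_row = [[w.translate(strip_punct) for w in row.split(" ")] for row in rows]

# Flatten all rows into one stream of words and count them.
# Like table(), this is case-sensitive: "The" and "the" are separate entries.
counts = Counter(w for row in words_per_row for w in row)

# Assumption for illustration: lower-casing first merges "The" and "the".
counts_lower = Counter(w.lower() for row in words_per_row for w in row)

print(counts.most_common())
print(counts_lower.most_common())
```

For a word cloud, the lower-cased counts are usually the more useful of the two, since the same word would otherwise appear twice.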
I want to record the job posting information from this search. Is anyone aware of an API for it, or can you confirm that it is possible to scrape it with Python and Beautiful Soup? (I'm familiar with scraping; I just can't see how to get at the data on this website.)
Disclosure: I work at SerpApi.
You can use the google-search-results package to get data from Google Jobs listings. Check a demo at Repl.it.
from serpapi import GoogleSearch

# Search parameters for the Google Jobs engine; replace API_KEY with your own key.
params = {
    "engine": "google_jobs",
    "q": "sustainability jobs in mi",
    "google_domain": "google.com",
    "api_key": "API_KEY"
}

client = GoogleSearch(params)
data = client.get_dict()  # parsed JSON response as a Python dict

print("Job results")
for job_result in data['jobs_results']:
    print(f"""Title: {job_result['title']}
Company name: {job_result['company_name']}
Description: {job_result['description']}
""")

print("Filters")
for chip in data['chips']:
    print(f"Type: {chip['type']}\n")
    print("Options")
    for option in chip['options']:
        print(option['text'])
Response
{
"jobs_results":[
{
"title":"Sustainability Analyst",
"company_name":"Amcor",
"location":"Ann Arbor, MI",
"via":"via LinkedIn",
"description":"Amcor Limited Job Posting\n\nRole: Sustainability Analyst\n\nLocation: TBD, ideally in the US (Ann Arbor, MI)\n\nAbout Amcor\n\nAmcor (ASX: AMC;\n\nAmcor is proud of its recent pledge to design all of our packaging to be recyclable or reusable by 2025. The job holder will play a very important and exciting role in Amcor’s journey to deliver this important commitment.\n\nPosition Overview\n\nRead more about Amcor’s sustainability commitment:\n\nThe Sustainability function plays a key role in positioning Amcor as THE leading packaging company for the environment delivering on Amcor’s sustainability strategy, the 2025 pledge and as a supplier of choice for responsible packaging.\n\nThe Sustainability Analyst is responsible for analyzing, reporting, and coordinating selected global Sustainability activities with direction from the VP Sustainability.\n\nEssential Responsibilities And Duties\n• Track legislative activity, analyze for risk and opportunity, help to prioritize actions\n• Assist with drafting... 
positions, coordinate Amcor activity and governance around advocacy (mostly in industry group participation)\n• Assists with internal reporting and communications, including preparing decks for internal meetings\n• Partnership administration, tracking projects and payments, and liaising with corporate finance on dept budget\n• Manage compliance statements, including anti-slavery statements, conflict minerals etc.\n• Coordinates the International Costal Cleanup, as needed with other partners\n• Other similar duties as required to support the corporate sustainability program\n\nQualifications\n• Education: Master's Degree or equivalent in related field preferred\n• Three to five years of experience\n• Strong analytical skills, including ability to interpret and graphically display environmental performance data\n• Excellent written and verbal communications skills\n• Excellent working knowledge of Microsoft Office\n• Demonstrated professional work characteristics including high initiative, dependability, and ability to manage confidential information\n• Must be well organized and comfortable interfacing with all levels of management\nAmcor Leadership Framework Competencies\n• Drive for Results\n• Influencing Others\n• Customer Focus\n• Learning on the Fly\n• Interpersonal Savvy\n• Organizational Awareness\n• Priority Setting\n• Organizing\n• Functional / Technical Skills\n• Strong Computer Skills\n\nRelationships\n• Amcor Leadership\n• Direct Reports\n• External Vendors\n• Government agencies\n• Global partners/ Nonprofit organizations\n• Industry organizations\nExpected Travel: 10% Travel\n\nThe information contained herein is not intended to be an all-inclusive list of the duties and responsibilities of the job, nor are they intended to be an all-inclusive list of the skills and abilities required to do the job.\n\n#North America",
"extensions":[
"Over 1 month ago",
"Full-time"
]
},
{
"title":"Environmental Jobs in Michigan,USA",
"company_name":"freelancejobopenings.com",
"location":"Michigan",
"via":"via Freelance Job Openings",
"description":"Environmental Jobs in Michigan,USA\n\nSummer Camp Instructor\n\nenvironmental learning center at barr lake state park with a satellite office in fort collins and fieldwork outposts in environmental science, leadership, and or outdoor adventure programs for diverse audiences in formal and non formal outdoor and classroom environmental studies, biological sciences, natural resource management, or related field, with a focus in ornithology.\n\n strong summer, birding, camp, education, colorado, outdoors, teaching\n\nwebsite: barefoot student summer camp\n\nSITE LEAD\n\nenvironmental changes, and sudden work schedule changes.\n• tech savvy: frito lay is an industry leader site: fritolay the site lead is accountable for ensuring the building is operating at top performance to deliver the zone sops strategy and ensures a safe working environment. the role requires cross functional understanding in order to drive operations success.\n\nwe are open 24 hours a day, which means\n\nField Service ... Chromatography Spectrometry Instruments - Grand Rapids, MI\n\nenvironmental testing, and forensic toxicology looking to hire field service engineer to support lcms and gcms platforms. travel to client labs to perform calibrations, diagnose problems with equipment field service chromatography spectrometry instruments grand rapids, mi\n\nleader in liquid chromatography mass spectrometry and gas chromatography mass spectrometry, supporting clinical research, drug discovery, food and environmental testing, and forensic toxicology looking to hire field service engineer to support\n\nUTA Test Engineer\n\nenvironmental demands may be referenced in an attempt to municate the manner in which this position traditionally is performed. 
about capgemini:\n\na global leader in consulting, technology services and digital transformation, capgemini is at the forefront of innovation to address the entire breadth of clients’ opportunities in the evolving world of cloud, digital and platforms. building on its strong 50 year heritage and deep industry specific expertise, capgemini enables organizations to realize\n\nIndustrial Water/Wastewater Design Engineer\n\nenvironmental, civil, or chemical\n• 4+ years of industrial water wastewater system environmental, civil or chemical\n• water wastewater treatment design experience in variety industrial markets\n• experience with biological and physical chemical treatment design build experience\n\nwhat we offer engineering water wastewater\n\nbusiness line design and consulting services group (dcs)\n\ncountry",
"extensions":[
"13 hours ago",
"Full-time"
]
}
]
}
If you want more information, check out the SerpApi documentation.
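Since the goal is to record the postings, here is a minimal sketch of writing a response shaped like the one above to CSV with the standard library. The `sample_response` dict is a hypothetical, trimmed stand-in for whatever `client.get_dict()` returns, and `jobs_to_csv` and `FIELDS` are names invented for this example, not part of the SerpApi client.

```python
import csv
import io

# Hypothetical, trimmed stand-in for the dict returned by client.get_dict().
sample_response = {
    "jobs_results": [
        {
            "title": "Sustainability Analyst",
            "company_name": "Amcor",
            "location": "Ann Arbor, MI",
            "via": "via LinkedIn",
            "extensions": ["Over 1 month ago", "Full-time"],
        },
        {
            "title": "Environmental Jobs in Michigan,USA",
            "company_name": "freelancejobopenings.com",
            "location": "Michigan",
            "via": "via Freelance Job Openings",
            "extensions": ["13 hours ago", "Full-time"],
        },
    ]
}

# Columns to keep; keys missing from a job are written as empty strings.
FIELDS = ["title", "company_name", "location", "via", "extensions"]


def jobs_to_csv(response):
    """Return the job results of a response dict as CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    for job in response.get("jobs_results", []):
        row = {field: job.get(field, "") for field in FIELDS}
        # The list of extensions is flattened into one cell.
        row["extensions"] = "; ".join(job.get("extensions", []))
        writer.writerow(row)
    return buffer.getvalue()


csv_text = jobs_to_csv(sample_response)
print(csv_text)
```

To save the result to disk, the returned text can be written to a file opened with `newline=""`, or the `StringIO` buffer can be swapped for a real file handle.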
I am trying to extract textbook names and journal articles from syllabi collected from various courses, using R. My basic assumption is that most of these will be in some kind of citation format (e.g. APA, MLA, etc.). While I can try to write regexes to extract this information, I was wondering if anyone has tried to do this before, or if an R package exists that I could use to extract this information from differently formatted text.
Below are two examples of the syllabi that I am working with. In Sample 1, the book name is not in a citation format, but in Sample 2 it is. Both samples have been truncated to meet the Stack Overflow character limit.
SAMPLE 1:
"ABC State University ARTS 3366 Intermediate Digital Photography Fall 2015 JCM 4127 T/TH 24:30 pm Lecturer: John Smith Office Hours: T/TH prior to and after class Email: johnsmith#abcstate.edu Alternate email: johnsmith#gmail.com Prerequisites: ARTS 3364 Introduction to Digital Photography Course Description & Objectives: This course is designed to expand and build on the skills and knowledge acquired in Introduction to Digital Photography. This course builds on the skills and knowledge acquired in Introduction to Digital Photography. Specifically, we will use the history, critical analysis, and production of photography books to: (1) explore the complexities of the medium in social, political, and aesthetic contexts; (2) develop more advanced and conceptually driven photography work; (3) work toward a greater understanding of how photography books function as selfcontained art, cultural, and political objects; (4) learn how to choose subject matter and continually explore, experiment, and refine our work. The final outcomes ofthe class will be the creation of an ondemand book and an accompanying folio of fine prints. We will use digital cameras, inkjet printers, Adobe Photoshop, Lightroom, and Macintosh computers in this course. Through lectures, discussions and readings, we will explore and discuss historical trends in traditional (analog) photography, as well as emerging practices in contemporary digital imaging. This will serve as a foundation to help determine the approach, subject matter, and style of the work created for class. In addition to refining these skills, students will also address the practical and theoretical roles of digital imagery. The course objective will be to focus on technical, aesthetic, and conceptual growth of a student’s endeavors in the digital medium. 
This course requires the completion of: all assignments (on time), participation in all group critiques and completion of a Twelve to Fifteen image final portfolio of prints or equivalent, and three projects throughout the semester. Requirements: Coursework: This course requires the completion of: all assignments (on time), participation in all group critiques and completion of a 1215 image final portfolio of prints or equivalent, the creation of a book printed with an on demand printing service, as well as making new photographs consistently throughout the entire semester. Suggested (not required)Books: Adobe Photoshop Lightroom 5 Book, The: The Complete Guide for Photographers By Martin Evening Published Jun 30, 2013 by Adobe Press The Photographer’s Playbook 307 Assignments and Ideas Edited by Jason Fulford and Gregory Halpern Published by Aperture On Being a Photographer: A Practical Guide by David Hurn and Bill Jay Local Stores:"
SAMPLE 2:"Physical Education Activity ProgramHealth & Fitness Strength TrainingKINE 198-837Instructor: JANE DOE Office: PEAP 230Office Hours: By appointmentPhone: (000) 000-0000E-Mail: jdoe#xyz.edu A. Activity Instructor: Jane DoeOffice: PEAP 250Office Hours: By appointmentClass Time: Thursday 2:20 pmPhone: (000) 000-0000Email: jdoe#xyz1.edu Class Meeting Site: PEAP 117B. Activity Instructor: Jane Doe Phone:Office: PEAP 239Email: jdoe#xyz1.eduOffice Hours: Thursday 10:00 am – 12:00 pmClass Time: Thursday 2:20 pmClass Meeting Site: PEAP 118C. Activity Instructor: John doe Office: PEAP 250/Doe 213KOffice Hours: Tuesday 1:00-2:00 pmClass Time: Thursday 2:20 pmPhone:Email: johndoe#xyz.eduClass Meeting Site: PEAP 120Attire: Proper clothes and shoes designed specifically for strength training on activitydays.Required Materials:Bounds, L., Agnor, D., Darnell,G., & Brekken Shea, K. (2012). Health & Fitness: AGuide to a Healthy Lifestyle (5th edition). Dubuque, IA: Kendall/Hunt Publishing Co.ISBN 978-1-4652-0712-8Cissik, J. (2001). The Basics of Strength Training (3rd Edition). McGraw-Hill,Primus Custom PublishingCourse Description:Health and Fitness is intended for the student who is seeking knowledge and practicalapplication of wellness choices to their life. The course consists of two components,lecture and activity. Students will meet face-to-face one day per week for the activityportion of the class and work approximately the equivalent of one day per week onlinewith lecture materials. The lecture portion will cover current health issues includingmental and physical health, nutrition, human sexuality, communicable and noncommunicable diseases, use and abuse of drugs, and safety. 
The activity portion willconsist of 14 class days and cover basic knowledge and techniques of strength trainingand improving the individual’s fitness through the utilization of this knowledge.Course Rationale:Research indicates that daily health/fitness related behaviors enhance learning anddetermine the quality and longevity of our life."
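As a starting point for the regex route mentioned above, here is a rough sketch of a pattern for APA-style entries of the form "Surname, I., ... (Year). Title." It is written in Python rather than R, and it only targets the citation shape seen in Sample 2, so free-form listings like the ones in Sample 1 would not match. `APA_PATTERN`, `find_apa_citations`, and `SNIPPET` are names invented for this illustration, and the snippet is an excerpt copied from Sample 2.

```python
import re

# Assumed citation shape: "Surname, I., Surname, I., & Surname, I. (YYYY). Title."
APA_PATTERN = re.compile(
    r"(?P<authors>[A-Z][A-Za-z'\-]+,\s*(?:[A-Z]\.\s*)+"              # first author
    r"(?:,?\s*(?:&\s*)?[A-Z][A-Za-z'\- ]+,\s*(?:[A-Z]\.\s*)+)*)"     # further authors
    r",?\s*\((?P<year>\d{4})\)\.\s*"                                  # (year).
    r"(?P<title>[^.]+)\."                                             # title up to the next period
)


def find_apa_citations(text):
    """Return a list of dicts with authors, year, and title for each match."""
    return [
        {
            "authors": m.group("authors").strip(),
            "year": m.group("year"),
            "title": m.group("title").strip(),
        }
        for m in APA_PATTERN.finditer(text)
    ]


# Excerpt from Sample 2, kept verbatim including its missing spaces.
SNIPPET = (
    "Required Materials:Bounds, L., Agnor, D., Darnell,G., & Brekken Shea, K. (2012). "
    "Health & Fitness: AGuide to a Healthy Lifestyle (5th edition). Dubuque, IA: "
    "Kendall/Hunt Publishing Co.ISBN 978-1-4652-0712-8Cissik, J. (2001). "
    "The Basics of Strength Training (3rd Edition). McGraw-Hill,Primus Custom Publishing"
)

for citation in find_apa_citations(SNIPPET):
    print(citation)
```

Titles that contain a period (for example an abbreviation) would be cut short by this pattern, so it is best treated as a first filter whose hits are then checked by hand.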