How to correctly write a CSV or text file in R

I am trying to save a csv file or a text file in R.
One column contains paragraphs of text that include commas, so when I write:
write.table(x, file = "D:/text.csv", sep = ",", row.names = FALSE)
y <- read.csv(file = "D:/text.csv")
It writes the file, but when I read it back in I go from 50 rows to 57 rows. I understand that it's probably because of the sep = "," argument, and I could change it to "|", but these files are really large and the text column is likely to contain many different characters, so no single separator is safe.
How can I overcome this issue?
Data:
x <- structure(list(Document = c("https://www.sec.gov/Archives/edgar/data/718877/000104746916010584/a2227483z10-k.htm.1",
"https://www.sec.gov/Archives/edgar/data/718877/000104746916010584/a2227483z10-k.htm.2",
"https://www.sec.gov/Archives/edgar/data/718877/000104746916010584/a2227483z10-k.htm.3",
"https://www.sec.gov/Archives/edgar/data/718877/000104746916010584/a2227483z10-k.htm.4",
"https://www.sec.gov/Archives/edgar/data/718877/000104746916010584/a2227483z10-k.htm.5",
"https://www.sec.gov/Archives/edgar/data/718877/000104746916010584/a2227483z10-k.htm.6",
"https://www.sec.gov/Archives/edgar/data/718877/000104746916010584/a2227483z10-k.htm.7",
"https://www.sec.gov/Archives/edgar/data/718877/000104746916010584/a2227483z10-k.htm.8",
"https://www.sec.gov/Archives/edgar/data/718877/000104746916010584/a2227483z10-k.htm.9",
"https://www.sec.gov/Archives/edgar/data/718877/000104746916010584/a2227483z10-k.htm.10",
"https://www.sec.gov/Archives/edgar/data/718877/000104746916010584/a2227483z10-k.htm.11",
"https://www.sec.gov/Archives/edgar/data/718877/000104746916010584/a2227483z10-k.htm.12",
"https://www.sec.gov/Archives/edgar/data/718877/000104746916010584/a2227483z10-k.htm.13",
"https://www.sec.gov/Archives/edgar/data/718877/000104746916010584/a2227483z10-k.htm.14",
"https://www.sec.gov/Archives/edgar/data/718877/000104746916010584/a2227483z10-k.htm.15",
"https://www.sec.gov/Archives/edgar/data/718877/000104746916010584/a2227483z10-k.htm.16",
"https://www.sec.gov/Archives/edgar/data/718877/000104746916010584/a2227483z10-k.htm.17",
"https://www.sec.gov/Archives/edgar/data/718877/000104746916010584/a2227483z10-k.htm.18",
"https://www.sec.gov/Archives/edgar/data/718877/000104746916010584/a2227483z10-k.htm.19",
"https://www.sec.gov/Archives/edgar/data/718877/000104746916010584/a2227483z10-k.htm.20",
"https://www.sec.gov/Archives/edgar/data/718877/000104746916010584/a2227483z10-k.htm.21",
"https://www.sec.gov/Archives/edgar/data/718877/000104746916010584/a2227483z10-k.htm.22",
"https://www.sec.gov/Archives/edgar/data/718877/000104746916010584/a2227483z10-k.htm.23",
"https://www.sec.gov/Archives/edgar/data/718877/000104746916010584/a2227483z10-k.htm.24",
"https://www.sec.gov/Archives/edgar/data/718877/000104746916010584/a2227483z10-k.htm.25",
"https://www.sec.gov/Archives/edgar/data/718877/000104746916010584/a2227483z10-k.htm.26",
"https://www.sec.gov/Archives/edgar/data/718877/000104746916010584/a2227483z10-k.htm.27",
"https://www.sec.gov/Archives/edgar/data/718877/000104746916010584/a2227483z10-k.htm.28",
"https://www.sec.gov/Archives/edgar/data/718877/000104746916010584/a2227483z10-k.htm.29",
"https://www.sec.gov/Archives/edgar/data/718877/000104746916010584/a2227483z10-k.htm.30",
"https://www.sec.gov/Archives/edgar/data/718877/000104746916010584/a2227483z10-k.htm.31",
"https://www.sec.gov/Archives/edgar/data/718877/000104746916010584/a2227483z10-k.htm.32",
"https://www.sec.gov/Archives/edgar/data/718877/000104746916010584/a2227483z10-k.htm.33",
"https://www.sec.gov/Archives/edgar/data/718877/000104746916010584/a2227483z10-k.htm.34",
"https://www.sec.gov/Archives/edgar/data/718877/000104746916010584/a2227483z10-k.htm.35",
"https://www.sec.gov/Archives/edgar/data/718877/000104746916010584/a2227483z10-k.htm.36",
"https://www.sec.gov/Archives/edgar/data/718877/000104746916010584/a2227483z10-k.htm.37",
"https://www.sec.gov/Archives/edgar/data/718877/000104746916010584/a2227483z10-k.htm.38",
"https://www.sec.gov/Archives/edgar/data/718877/000104746916010584/a2227483z10-k.htm.39",
"https://www.sec.gov/Archives/edgar/data/718877/000104746916010584/a2227483z10-k.htm.40",
"https://www.sec.gov/Archives/edgar/data/718877/000104746916010584/a2227483z10-k.htm.41",
"https://www.sec.gov/Archives/edgar/data/718877/000104746916010584/a2227483z10-k.htm.42",
"https://www.sec.gov/Archives/edgar/data/718877/000104746916010584/a2227483z10-k.htm.43",
"https://www.sec.gov/Archives/edgar/data/718877/000104746916010584/a2227483z10-k.htm.44",
"https://www.sec.gov/Archives/edgar/data/718877/000104746916010584/a2227483z10-k.htm.45",
"https://www.sec.gov/Archives/edgar/data/718877/000104746916010584/a2227483z10-k.htm.46",
"https://www.sec.gov/Archives/edgar/data/718877/000104746916010584/a2227483z10-k.htm.47",
"https://www.sec.gov/Archives/edgar/data/718877/000104746916010584/a2227483z10-k.htm.48",
"https://www.sec.gov/Archives/edgar/data/718877/000104746916010584/a2227483z10-k.htm.49",
"https://www.sec.gov/Archives/edgar/data/718877/000104746916010584/a2227483z10-k.htm.50"
), text = c("Use these links to rapidly review the document TABLE OF CONTENTS Item 8. FINANCIAL STATEMENTS AND SUPPLEMENTARY DATA",
"Table of Contents", "UNITED STATES SECURITIES AND EXCHANGE COMMISSION Washington, D.C. 20549",
"FORM 10-K", "(Mark one) ý ANNUAL REPORT PURSUANT TO SECTION 13 OR 15(d) OF THE SECURITIES EXCHANGE ACT OF 1934 For the Fiscal Year Ended December 31, 2015 OR o TRANSITION REPORT PURSUANT TO SECTION 13 OR 15(d) OF THE SECURITIES EXCHANGE ACT OF 1934 For the transition period from to",
"Commission File Number 1-15839", "ACTIVISION BLIZZARD, INC. (Exact name of registrant as specified in its charter)",
"Delaware (State or other jurisdiction of incorporation or organization) 95-4803544 (I.R.S. Employer Identification No.) 3100 Ocean Park Boulevard, Santa Monica, CA (Address of principal executive offices) 90405 (Zip Code)",
"Registrant's telephone number, including area code: (310) 255-2000",
"Securities registered pursuant to Section 12(b) of the Act:",
"Title of each Class Name of Each Exchange on Which Registered Common Stock, par value $.000001 per share The NASDAQ Global Select Market",
"Securities registered pursuant to Section 12(g) of the Act: None",
"Indicate by check mark if the registrant is a well-known seasoned issuer, as defined in Rule 405 of the Securities Act. Yes ý No o",
"Indicate by check mark if the registrant is not required to file reports pursuant to Section 13 or Section 15 (d) of the Act. Yes o No ý",
"Indicate by check mark whether the registrant (1) has filed all reports required to be filed by Section 13 or 15(d) of the Securities Exchange Act of 1934 during the preceding 12 months (or for such shorter period that the registrant was required to file such reports), and (2) has been subject to such filing requirements for the past 90 days. Yes ý No o",
"Indicate by check mark whether the registrant has submitted electronically and posted on its corporate Web site, if any, every Interactive Data File required to be submitted and posted pursuant to Rule 405 of Regulation S-T (§ 232.405 of this chapter) during the preceding 12 months (or for such shorter period that the registrant was required to submit and post such files). Yes ý No o",
"Indicate by check mark if disclosure of delinquent filers pursuant to Item 405 of Regulation S-K is not contained herein, and will not be contained, to the best of the registrant's knowledge, in definitive proxy or information statements incorporated by reference in Part III of this Form 10-K or any amendment to this Form 10-K. ý",
"Indicate by check mark whether the registrant is a large accelerated filer, an accelerated filer, a non-accelerated filer, or a smaller reporting company. See the definitions of \"large accelerated filer,\" \"accelerated filer,\" and \"smaller reporting company\" in Rule 12b-2 of the Exchange Act.",
"Large Accelerated Filer ý Accelerated Filer o Non-accelerated Filer o (Do not check if a smaller reporting company) Smaller Reporting Company o",
"Indicate by check mark whether the registrant is a shell company (as defined in Rule 12b-2 of the Act). Yes o No ý",
"The aggregate market value of the registrant's Common Stock held by non-affiliates on June 30, 2015 (based on the closing sale price as reported on the NASDAQ) was $13,345,675,247.",
"The number of shares of the registrant's Common Stock outstanding at February 22, 2016 was 734,998,115.",
"Documents Incorporated by Reference", "Portions of the registrant's definitive Proxy Statement, to be filed with the Securities and Exchange Commission with respect to the 2016 Annual Meeting of Shareholders which is expected to be held on June 2, 2016, are incorporated by reference into Part III of this Annual Report.",
"Table of Contents", "ACTIVISION BLIZZARD, INC. AND SUBSIDIARIES Table of Contents",
"Page No. PART I. 3 Cautionary Statement 3 Item 1. Business Item 1A. Risk Factors 15 Item 1B. Unresolved Staff Comments 40 Item 2. Properties 40 Item 3. Legal Proceedings 40 Item 4. Mine Safety Disclosures 41 PART II. 42 Item 5. Market for Registrant's Common Equity, Related Stockholder Matters, and Issuer Purchases of Equity Securities 42 Item 6. Selected Financial Data 45 Item 7. Management's Discussion and Analysis of Financial Condition and Results of Operations 46 Item 7A. Quantitative and Qualitative Disclosures about Market Risk 83 Item 8. Financial Statements and Supplementary Data 86 Item 9. Changes in and Disagreements with Accountants on Accounting and Financial Disclosure 86 Item 9A. Controls and Procedures 86 Item 9B. Other Information 87 PART III. 88 Item 10. Directors, Executive Officers, and Corporate Governance 88 Item 11. Executive Compensation 88 Item 12. Security Ownership of Certain Beneficial Owners and Management and Related Stockholder Matters 88 Item 13. Certain Relationships and Related Transactions, and Director Independence 88 Item 14. Principal Accounting Fees and Services 88 PART IV. 89 Item 15. Exhibits, Financial Statement Schedule 89 SIGNATURES 90 Exhibit Index E-1",
"2", "Table of Contents", "PART I", "CAUTIONARY STATEMENT", "This Annual Report on Form 10-K contains, or incorporates by reference, certain forward-looking statements within the meaning of the Private Securities Litigation Reform Act of 1995. Such statements consist of any statement other than a recitation of historical facts and include, but are not limited to: (1) projections of revenues, expenses, income or loss, earnings or loss per share, cash flow or other financial items; (2) statements of our plans and objectives, including those relating to product releases; (3) statements of future financial or operating performance; (4) statements relating to the acquisition of King Digital Entertainment plc and expected impact of that transaction, including without limitation, the expected impact on Activision Blizzard's future financial results; and (5) statements of assumptions underlying such statements. Activision Blizzard, Inc. generally uses words such as \"outlook,\" \"forecast,\" \"will,\" \"could,\" \"should,\" \"would,\" \"to be,\" \"plan,\" \"plans,\" \"believes,\" \"may,\" \"might,\" \"expects,\" \"intends,\" \"intends as,\" \"anticipates,\" \"estimate,\" \"future,\" \"positioned,\" \"potential,\" \"project,\" \"remain,\" \"scheduled,\" \"set to,\" \"subject to,\" \"upcoming\" and other similar expressions to help identify forward-looking statements. Forward-looking statements are subject to business and economic risks, reflect management's current expectations, estimates and projections about our business, and are inherently uncertain and difficult to predict. Our actual results could differ materially from expectations stated in forward-looking statements. Some of the risk factors that could cause our actual results to differ from those stated in forward-looking statements can be found in \"Risk Factors\" included in Part I, Item 1A of this Report. 
The forward-looking statements contained herein are based upon information available to us as of the date of this Annual Report on Form 10-K and we assume no obligation to update any such forward-looking statements. Although these forward-looking statements are believed to be true when made, they may ultimately prove to be incorrect. These statements are not guarantees of our future performance and are subject to risks, uncertainties and other factors, some of which are beyond our control and may cause actual results to differ materially from current expectations.",
"Activision Blizzard Inc.'s names, abbreviations thereof, logos, and product and service designators are all either the registered or unregistered trademarks or trade names of Activision Blizzard. All other product or service names are the property of their respective owners.",
"Overview", "Activision Blizzard, Inc. is a worldwide developer and publisher of online, personal computer (\"PC\"), video game console, handheld, mobile and tablet games. The terms \"Activision Blizzard,\" the \"Company,\" \"we,\" \"us,\" and \"our\" are used to refer collectively to Activision Blizzard, Inc. and its subsidiaries. We currently offer games that operate on the Microsoft Corporation (\"Microsoft\") Xbox One (\"Xbox One\") and Xbox 360 (\"Xbox 360\"), Nintendo Co. Ltd. (\"Nintendo\") Wii U (\"Wii U\") and Wii (\"Wii\"), and Sony Computer Entertainment Inc. (\"Sony\") PlayStation 4 (\"PS4\") and PlayStation 3 (\"PS3\") console systems (Xbox One, Wii U, and PS4 are collectively referred to as \"next-generation\"; Xbox 360, Wii, and PS3 are collectively referred to as \"prior-generation\"); the PC; the Nintendo 3DS, Nintendo Dual Screen and Sony PlayStation Vita handheld game systems; and mobile and tablet devices.",
"Activision - Through Activision Publishing, Inc. (\"Activision\"), we are a leading international developer and publisher of interactive software products and content. Activision develops, markets and sells products through retail channels or digital downloads, which are principally based on our internally developed intellectual properties, as well as some licensed properties. Activision delivers content to a broad range of gamers, ranging from children to adults, and from core gamers to mass-market consumers to \"value\" buyers seeking budget-priced software, in a variety of geographies. Activision continues to focus its efforts in the areas we believe have the most opportunity for growth and higher profitability, while reducing investments in areas we believe have less profit potential and",
"3", "Table of Contents", "limited growth opportunities. To that end, investments are focused on proven intellectual properties to develop deep, high-quality content that offers engaging online gaming experiences. One of Activision's leading franchises is Call of Duty®, which launched in 2003, and has been the best-selling Western interactive franchise since its launch. In 2015, Activision released the latest installment in the franchise, Call of Duty: Black Ops III, which, according to The NPD Group, GfK Chart-Track, and Activision Blizzard internal estimates, was the #1 best-selling console game globally in 2015. Activision is currently developing, distributing, and selling additional digital content for the global community of Call of Duty: Black Ops III players, along with content for the other Call of Duty titles, in addition to developing future releases and sequels.",
"Another leading franchise for Activision is Skylanders®, which launched in 2011 with the release of Skylanders Spyro's Adventure. Games in the Skylanders franchise combine the use of physical toys with digital interactive experiences to deliver innovative gameplay to our audience. In September 2015, we released Skylanders SuperChargers, which introduced vehicles-to-life - an entirely new way for fans to experience the magic of Skylanders.",
"While focusing on proven intellectual properties is one of Activision's priorities, we also continue to make strategic investments in developing new intellectual properties that we believe have the potential for long-term growth and success. For example, on September 15, 2015, we released The Taken King, the third and largest expansion to Destiny, the game universe created by Bungie under our long-term alliance with them. We also introduced microtransactions within Destiny in October 2015 and expect to release additional content to our global community of Destiny players in 2016.",
"Blizzard - Blizzard Entertainment, Inc. (\"Blizzard\") is a leader in online PC gaming, including the subscription-based massively multi-player online role-playing game (\"MMORPG\") category in terms of both subscriber base and revenues generated through its World of Warcraft® franchise. Blizzard also develops, markets, and sells role-playing action and strategy games for the PC, console, mobile and tablet platforms, including games in the multiple-award winning Diablo®, StarCraft®, Hearthstone®: Heroes of Warcraft<U+0099> and Heroes of the Storm<U+0099> franchises. In addition, Blizzard maintains a proprietary online gaming service, Battle.net®, which facilitates the creation of user-generated content, digital distribution and online social connectivity across all Blizzard games. Blizzard distributes its products and generates revenues worldwide through various means, including: subscriptions; sales of prepaid subscription cards; in-game purchases and services; retail sales of physical \"boxed\" products; online download sales of PC products; purchases and downloads via third-party console, mobile and tablet platforms; and licensing of software to third-party or related party companies that distribute Blizzard products.",
"Blizzard has released five expansion packs to the epic World of Warcraft franchise since 2004, with the most recent release, World of Warcraft: Warlords of Draenor®, having been released in November 2014, and the next expansion, World of Warcraft: Legion<U+0099>, to be released in the summer of 2016. For Hearthstone: Heroes of Warcraft, in addition to bringing the game to iOS and Android smartphones in April 2015, three new content releases, Blackrock Mountain<U+0099>, The Grand Tournament<U+0099>, and The League of Explorers<U+0099>, were introduced in 2015 and have continued to drive performance.",
"Blizzard continues to invest in new opportunities, both by leveraging its internally developed intellectual property, such as the release of Heroes of the Storm in 2015, as well as developing new intellectual property with the upcoming team-based first person shooter, Overwatch<U+0099>, which is expected to be released commercially in the spring of 2016.",
"Other - We also engage in other business opportunities including:",
"<U+0095> The Activision Blizzard Media Networks (\"Media Networks\") business, announced in 2015, which builds on our efforts in competitive gaming and the growing eSports industry;",
"4", "Table of Contents", "<U+0095> The Activision Blizzard Studios (\"Studios\") business, announced in 2015, which is devoted to creating original film and television content based on the company's extensive library of iconic and globally-recognized intellectual properties; and <U+0095> The Activision Blizzard Distribution (\"Distribution\") business, which consists of operations in Europe that provide warehousing, logistical, and sales distribution services to third-party publishers of interactive entertainment software, our own publishing operations, and manufacturers of interactive entertainment hardware.",
"Revenues associated with the Call of Duty, World of Warcraft, Skylanders, and Destiny franchises combined accounted for 71%, 72%, and 80% of our consolidated net revenues for the years ended December 31, 2015, 2014, and 2013, respectively."
), part.name = c("", "", "", "", "", "", "", "", "", "", "",
"", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "",
"", "", "PART I", "PART I", "PART I", "PART I", "PART I", "PART I",
"PART I", "PART I", "PART I", "PART I", "PART I", "PART I", "PART I",
"PART I", "PART I", "PART I", "PART I", "PART I", "PART I", "PART I",
"PART I"), item.name = c("", "", "", "", "", "", "", "", "",
"", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "",
"", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "",
"", "", "", "", "", "", "", "", ""), Documentshort = c("a2227483z10-k",
"a2227483z10-k", "a2227483z10-k", "a2227483z10-k", "a2227483z10-k",
"a2227483z10-k", "a2227483z10-k", "a2227483z10-k", "a2227483z10-k",
"a2227483z10-k", "a2227483z10-k", "a2227483z10-k", "a2227483z10-k",
"a2227483z10-k", "a2227483z10-k", "a2227483z10-k", "a2227483z10-k",
"a2227483z10-k", "a2227483z10-k", "a2227483z10-k", "a2227483z10-k",
"a2227483z10-k", "a2227483z10-k", "a2227483z10-k", "a2227483z10-k",
"a2227483z10-k", "a2227483z10-k", "a2227483z10-k", "a2227483z10-k",
"a2227483z10-k", "a2227483z10-k", "a2227483z10-k", "a2227483z10-k",
"a2227483z10-k", "a2227483z10-k", "a2227483z10-k", "a2227483z10-k",
"a2227483z10-k", "a2227483z10-k", "a2227483z10-k", "a2227483z10-k",
"a2227483z10-k", "a2227483z10-k", "a2227483z10-k", "a2227483z10-k",
"a2227483z10-k", "a2227483z10-k", "a2227483z10-k", "a2227483z10-k",
"a2227483z10-k"), companyID = c("718877", "718877", "718877",
"718877", "718877", "718877", "718877", "718877", "718877", "718877",
"718877", "718877", "718877", "718877", "718877", "718877", "718877",
"718877", "718877", "718877", "718877", "718877", "718877", "718877",
"718877", "718877", "718877", "718877", "718877", "718877", "718877",
"718877", "718877", "718877", "718877", "718877", "718877", "718877",
"718877", "718877", "718877", "718877", "718877", "718877", "718877",
"718877", "718877", "718877", "718877", "718877")), .Names = c("Document",
"text", "part.name", "item.name", "Documentshort", "companyID"
), row.names = c(NA, 50L), class = "data.frame")

First make sure your object is a data.frame, then try this:
write.csv(df, file = 'df.csv', row.names = FALSE)
The file will then be saved in your working directory.
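
A likely reason the original call gains rows is quoting rather than the separator: write.table() escapes embedded double quotes with a backslash by default (qmethod = "escape"), while read.csv() expects them to be doubled. write.csv() uses qmethod = "double". The following is a minimal sketch of the round trip on a small made-up data frame (the column names and the temporary files are assumptions for illustration, not the asker's data):

```r
# Toy data frame (hypothetical) whose text column contains commas and double quotes
x <- data.frame(
  id   = 1:3,
  text = c("Washington, D.C. 20549",
           "See \"large accelerated filer,\" in Rule 12b-2",
           "Table of Contents"),
  stringsAsFactors = FALSE
)

tmp1 <- tempfile(fileext = ".csv")
tmp2 <- tempfile(fileext = ".csv")

# write.csv() quotes character fields and doubles embedded quotes
write.csv(x, file = tmp1, row.names = FALSE)
y1 <- read.csv(tmp1, stringsAsFactors = FALSE)

# The original write.table() call, with the quoting method made explicit
write.table(x, file = tmp2, sep = ",", row.names = FALSE, qmethod = "double")
y2 <- read.csv(tmp2, stringsAsFactors = FALSE)

# Both round trips keep the row count and the text intact
nrow(y1) == nrow(x)
identical(y1$text, x$text)
identical(y2$text, x$text)
```

With either variant the comma stays as the separator, because every text field is wrapped in quotes when written.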

How about fwrite() from data.table?
library(data.table)
fwrite(as.data.table(x), "text.csv")
Use fread() to read it back.

Related

Mutate with case_when for string values

I'm trying to create a new variable called department that tags each of these
job titles as Science. Thanks for your help!
I tried:
plist <- plist %>%
mutate(department = case_when(str_detect(Job.Title %in% c("11255 - Data Scientist",
"11256 - Data Scientist I",
"11257 - Data Scientist II",
"11258 - Data Scientist III",
"11259 - Data Scientist IV",
"11260 - Manager, Data Science",
"11261 - Senior Manager, Data Science",
"11262 - Director, Data Science",
"11438 - Lead, Data Science",
"11689 - Senior Director, Data Science",
"12489 - Data Scientist V",
"11263 - Product Scientist I",
"11264 - Product Scientist II",
"11265 - Product Scientist III",
"11266 - Product Scientist IV",
"11267 - Manager, Product Science",
"11268 - Senior Manager, Prod Science",
"11269 - Director, Product Science",
"11447 - Lead, Product Science",
"11848 - Product Scientist",
"12626 - Product Scientist V" ~"Science")))
and I receive the error:
Error in mutate():
! Problem while computing department = case_when(...).
Caused by error in check_lengths():
! argument "pattern" is missing, with no default
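
The error message points at str_detect(): it received a single argument (the result of %in%), so its pattern argument is missing. Since %in% already yields a logical vector, one possible fix is to drop str_detect() and use that vector directly as the case_when() condition. A minimal sketch, assuming a small made-up plist and a shortened list of titles (both hypothetical):

```r
library(dplyr)

# Hypothetical miniature of the plist data frame from the question
plist <- data.frame(
  Job.Title = c("11255 - Data Scientist",
                "20001 - Accountant",          # made-up non-science title
                "11848 - Product Scientist"),
  stringsAsFactors = FALSE
)

# Exact titles that should be tagged as Science (shortened list)
science_titles <- c("11255 - Data Scientist",
                    "11848 - Product Scientist",
                    "12626 - Product Scientist V")

plist <- plist %>%
  mutate(department = case_when(
    Job.Title %in% science_titles ~ "Science"  # rows with no match become NA
  ))

plist
```

If partial matching is preferred over an exact list, str_detect(Job.Title, "Scientist|Science") would be the form that supplies both the string and the pattern.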

Replacing first character in line in multi-line text column

I am trying to replace the "o " with "• " in this text:
• Direct the Department’s technical
• Perform supervisory and managerial responsibilities as leader of the program
o Set direction to ensure goals and objectives
o Select management and other key personnel
o Collaborate with executive colleagues to develop and execute corporate initiatives and
department strategy
o Oversee the preparation and execution of department’s Annual Financial Plan and budget
o Manage merit pay
• Perform other duties as assigned
Since these are at the beginning of the line, I've tried:
test<- sub(test, pattern = "o ", replacement = "• ") # does not work
test<- gsub(test, pattern = "^o ", replacement = "• ") # does not work
test<- gsub(test, pattern = "o ", replacement = "• ") # works, but it also changes "to " into "t• "
Why does "^o " not work, given that "o " only appears at the beginning of each line?
Is this all in a single value? If so, use a lookbehind to find "o " following either a line break or the start of the string:
test2 <- gsub(test, pattern = "(?<=\n|\r|^)o ", replacement = "• ", perl = TRUE)
cat(test2)
• Direct the Department’s technical
• Perform supervisory and managerial responsibilities as leader of the program
• Set direction to ensure goals and objectives
• Select management and other key personnel
• Collaborate with executive colleagues to develop and execute corporate initiatives and department strategy
• Oversee the preparation and execution of department’s Annual Financial Plan and budget
• Manage merit pay
• Perform other duties as assigned
Alternatively, split into individual values per line, then use your original regex:
test3 <- gsub(unlist(strsplit(test, "\n|\r")), pattern = "^o ", replacement = "• ")
test3
[1] "• Direct the Department’s technical"
[2] ""
[3] "• Perform supervisory and managerial responsibilities as leader of the program"
[4] ""
[5] "• Set direction to ensure goals and objectives"
[6] ""
[7] "• Select management and other key personnel"
[8] ""
[9] "• Collaborate with executive colleagues to develop and execute corporate initiatives and department strategy"
[10] ""
[11] "• Oversee the preparation and execution of department’s Annual Financial Plan and budget"
[12] ""
[13] "• Manage merit pay"
[14] ""
[15] "• Perform other duties as assigned"
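
If a single string is still needed after the per-line replacement, the pieces can be pasted back together. A small sketch on a shortened, made-up input (the bullet is written as the escape "\u2022" so the snippet does not depend on the file encoding):

```r
# Shortened, hypothetical version of the text; "\u2022" is the bullet character
test <- "\u2022 Direct\no Set direction\no Manage merit pay"

# Split into one element per line, so that ^ refers to the start of each line
lines <- unlist(strsplit(test, "\n|\r"))

# Replace the leading "o " on each line
fixed <- sub("^o ", "\u2022 ", lines)

# Rejoin into a single string with the original line breaks
rejoined <- paste(fixed, collapse = "\n")
cat(rejoined)
```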
You do not need any lookbehind here; use ^ with the (?m) flag:
test <- gsub(test, pattern = "(?m)^o ", replacement = "• ", perl=TRUE)
The (?m) flag redefines the ^ anchor so that it matches at the start of every line rather than only at the start of the whole string.
See the R demo:
test <- "• Direct the Department’s technical\n\no Set direction to ensure goals and objectives\n\no Select management and other key personnel"
cat(gsub(test, pattern = "(?m)^o ", replacement = "• ", perl=TRUE))
Output:
• Direct the Department’s technical
• Set direction to ensure goals and objectives
• Select management and other key personnel
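
To see the effect of the flag in isolation, the same pattern can be run with and without (?m) on one multi-line string. This is a minimal sketch on a made-up input that also contains "to " in the middle of a line (the bullet is written as "\u2022"):

```r
# Hypothetical multi-line value; "\u2022" is the bullet character
test <- "\u2022 Direct\no Set direction to ensure goals\no Go to the store"

# Without (?m), ^ matches only at the very start of the string, so nothing changes
no_flag <- gsub("^o ", "\u2022 ", test, perl = TRUE)

# With (?m), ^ also matches right after every line break
with_flag <- gsub("(?m)^o ", "\u2022 ", test, perl = TRUE)

no_flag == test   # TRUE: the string does not start with "o "
cat(with_flag)    # every leading "o " is replaced, "to " inside lines is untouched
```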

R extract specific word after keyword

How do I extract a specific word after a keyword in R?
I have the following input text, which contains details about a policy. I need to extract specific values such as FirstName, Surname, Father's Name and date of birth (dob).
input.txt
In Case of unit linked plan, Investment risk in Investment Portfolio is borne by the policyholder.
ly
c I ROPOSAL FORM z
Insurance
Proposal Form Number: 342525 PF 42242
Advisor Coe aranch Code 2
Ff roanumber =F SSOS™S™~™S~S rancid ate = |
IBR. Code S535353424
re GFN ——
INSTRUCTION FOR FILLING THES APPLICATION FORM ; 1. Compiets the proocsal form in CAPITAL LETTERS using = Black Ball Point P]n. 2. Sless= mark your selection by marking “X" insides the
Boe. 3. Slnsse bases 2 Blank soece after eect word, letter or initial 4. Slssse write "MA" for questions whic are not apolicatie. 5.00 NOT USE the Sor") to identify your initial or seperate the sddressiiine.
6. Sulmissson of age proof ie mandatory along wall Ge propel fonm.
IMPORTANT INSTRUCTIONS WITH REGARD TO DISCLOSURE OF INFORMATION: Inturance it a contract of UTMOST GOOD FAITH and itis required by disclose all material and nelevant
fach: complebehy, DO) NOT suppress any fac: in response by the questions in the priposal form. FAILURE TO PROVIDE COMPLETE AND ACCURATE INFORMATION OR
MISREPRESENTATION OF THE FACTS COULD DECLARE THES POLICY CONTRACT NULL AND VOID AFTER PAYMENT OF SURRENDER VALUE, IF ANY, SUBJECT TO SECTION 45 OF
INSURANCE ACT, 1998 As AMENDED FROM TIME TO TIME,
Section I - Details of the Life to be Assured
1. Tite E-] Mr. LJ Mrs. LJ Miss [J Or. LJ Others (Specify)
2. FirstName PETER PAUL
3. Surname T
44. Father's Name
46, Mother's Name ERIKA RESWE D
5. Date of Birth 13/02/1990 6, Gender E] Male ] Female
7. Age Proof L] School Certificate [] Driving License [] Passport {Birth Certificate E"] PAN Card
3, Marital Status D) Single EF] Married 0 Widower) 0 Civorcee
9, Spouse Name ERISEWQ FR
10. Maiden Name
iL. Nationality -] Resident Indian National [J Non Resident Indian (MRI) L] Others (Specify)
12, Education J Postgraduate / Doctorate Ee) Graduate [] 12thstd. Pass [J 10thstd. Pass [J Below 10th std.
OO Dliterate / Uneducated CJ Others (Specify)
13. Address For No 7¥%a vaigai street Flower
Communication Nagar selaiyur
Landmark
City Salem
Pin Code BO00 73: State TAMIL NADU
Address proof [] Passport ([] Driving License [] Voter ID [] Bank Statement [] Utility Bill G4 Others (Specify) Aadhaar Card
14, Permanent No 7¥a vaigai street Flower
Address :
Nagar selaiyur
Landmark
City Salem
Pin Code 5353535 state (TAMIL NADU
Address proof CJ] Passport [9 DrivingLicense [J Voter ID [ Bank Statement [ Utility Bill B] Others (Specify) Aadhaar Card
15. Contact Details Mobile 424242424 Phone (Home)
Office / Business
E-mail fdgrgtr13#yahoo.com
Preferred mode: ((] Letter EF) E-Mail
Preferred Language for Letter {other than English): [] Hindi [] Kannada [-] Tamil J Telugu C] Malayalam C) Gujarati
Bengali GOriya =D] Marathi
16. Occupation CL] Salaried-Govt /PSU ( Salaried-other [9 Self Employed Professional [J Aagriculturist {Farmer [Part Time Business
LJ Retired ] Landlord J Student (current Std) -] Others (Specify) Salaried - MNC
17. Full Name of the Capio software
Employers Businnes/
School/College
18, Designation & Exact nature of Work / Business Manager
19. AnnualIncomein 1,200,000.00 20. Annual Income of Husband / Father = 1,500,000.00
Figures (%) (for female and minor lives)
21. Exact nature of work / business of Husband / Father for female and minor lives Government Employee
Page 10fé
The code below works for me, but the problem is that if the line order changes, everything breaks. Is there a way to extract the keyword values irrespective of line order?
Current Code
path <- getwd()
my_txt <- readLines(paste(path, "/input.txt", sep = ""))
fName <- sub('.*FirstName', '', my_txt[7])
SName <- sub('.*Surname', '', my_txt[8])
FatherNm <- sub(".*Father's Name", '', my_txt[9])
dob <- sub("6, Gender.*", '',sub(".*Date of Birth", '', my_txt[11]))
You can combine the text into one string and extract the values based on patterns in the data. This approach works irrespective of the line number, provided the pattern holds for all the files.
my_txt <- readLines(paste(path, "/input.txt", sep = ""))
#Collapse data in one string
text <- paste0(my_txt, collapse = '\n')
#Extract text after FirstName till '\n'
fName <- sub('.*FirstName (.*?)\n.*', '\\1', text)
fName
#[1] "John Woo"
#Extract text after Surname till '\n'
SName <- sub('.*Surname (.*?)\n.*', '\\1', text)
SName
#[1] "T"
#Extract text after Father's Name till '\n'
FatherNm <- sub(".*Father's Name (.*?)\n.*", '\\1', text)
FatherNm
#[1] "Bill Woo"
#Extract numbers which come after Date of Birth.
dob <- sub(".*Date of Birth (\\d+/\\d+/\\d+).*", '\\1', text)
dob
#[1] "13/07/1970"
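
Another way to stay independent of line order is to search for the line containing each keyword instead of indexing a fixed line. The following is a rough sketch under that assumption; extract_after() is a hypothetical helper, and the sample lines stand in for readLines() on the real file:

```r
# Hypothetical sample lines, deliberately out of order
my_txt <- c(
  "Section I - Details of the Life to be Assured",
  "5. Date of Birth 13/02/1990 6, Gender Male",
  "3. Surname T",
  "2. FirstName PETER PAUL"
)

# Return the text that follows `keyword` on the first line containing it,
# or NA if no line contains the keyword.
extract_after <- function(lines, keyword) {
  hit <- grep(keyword, lines, fixed = TRUE)
  if (length(hit) == 0) return(NA_character_)
  line  <- lines[hit[1]]
  start <- as.integer(regexpr(keyword, line, fixed = TRUE))
  trimws(substring(line, start + nchar(keyword)))
}

fName <- extract_after(my_txt, "FirstName")
SName <- extract_after(my_txt, "Surname")

# For the date, keep only the dd/mm/yyyy part of what follows the keyword
dob_raw <- extract_after(my_txt, "Date of Birth")
dob <- regmatches(dob_raw, regexpr("[0-9]{2}/[0-9]{2}/[0-9]{4}", dob_raw))

fName
SName
dob
```

A keyword that is absent, such as "Father's Name" in this sample, yields NA rather than the text of some unrelated line.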

extract data from XML files - R

I'm new to extracting data from XML files. I'm trying to process the following XML file using the R XML package. The information I want is in the attribute values.
I've run into two difficulties:
1. Some attributes exist in one node but not in another. For example, "DRP" has the information in the second node but not in the first.
2. Some attributes have multiple values for an individual, and I don't know how to link them to that individual. For example, "EmpHs" has multiple records for an individual (identified by indvlPK).
Ideally, I want the output data to have a structure similar to the following:
lastNm | firstNm | indvlPK | fromDt  | orgNm                                  | hasCustComp
GIGAX  | JEFFREY | 2783477 | 03/2004 | GATEWAY FINANCIAL ADVISORS, INC        |
GIGAX  | JEFFREY | 2783477 | 03/2004 | GFA IN                                 |
GIGAX  | JEFFREY | 2783477 | 01/2007 | UNITED FIRST                           |
HINSON | BRIAN   | 2783737 | 07/1996 | LINCOLN FINANCIAL ADVISORS CORPORATION | Y
HINSON | BRIAN   | 2783737 | 07/1996 | FIRST FINANCIAL GROUP                  | Y
Is there any way I can parse the data correctly? Thanks!
The code I used, which didn't give me what I want:
library(XML)
doc <- "Test.xml"
ind <- xmlParse(doc)
xmltop <- xmlRoot(ind)
# Each call returns a flat vector, so the link between individuals and their records is lost
temp1 <- data.frame(unlist(getNodeSet(xmltop, "//Info/@lastNm")))
temp2 <- data.frame(unlist(getNodeSet(xmltop, "//Info/@firstNm")))
temp3 <- data.frame(unlist(getNodeSet(xmltop, "//Info/@indvlPK")))
temp4 <- data.frame(unlist(getNodeSet(xmltop, "//EmpHs/@fromDt")))
temp5 <- data.frame(unlist(getNodeSet(xmltop, "//DRP/@hasCustComp")))
The data is here:
<?xml version="1.0" encoding="ISO-8859-1"?>
<IAPDIndividualReport GenOn="2021-03-29">
<Indvls>
<Indvl>
<Info lastNm="GIGAX" firstNm="JEFFREY" midNm="W" indvlPK="2783477" actvAGReg="Y" link="https://adviserinfo.sec.gov/individual/summary/2783477"/>
<OthrNms/>
<CrntEmps>
<CrntEmp orgNm="CAMBRIDGE INVESTMENT RESEARCH ADVISORS, INC." orgPK="134139" str1="1776 PLEASANT PLAIN RD." city="FAIRFIELD" state="IA" cntry="United States" postlCd="52556-8757">
<CrntRgstns>
<CrntRgstn regAuth="MO" regCat="RA" st="APPROVED" stDt="2010-09-09"/>
</CrntRgstns>
<BrnchOfLocs>
<BrnchOfLoc city="O&apos;FALLON" state="MO" cntry="United States"/>
</BrnchOfLocs>
</CrntEmp>
</CrntEmps>
<Exms>
<Exm exmCd="S63" exmNm="Uniform Securities Agent State Law Examination" exmDt="1996-08-20"/>
<Exm exmCd="S65" exmNm="Uniform Investment Adviser Law Examination" exmDt="1999-12-21"/>
</Exms>
<Dsgntns/>
<PrevRgstns>
<PrevRgstn orgNm="WOODBURY FINANCIAL SERVICES, INC." orgPK="421" regBeginDt="2009-01-05" regEndDt="2009-12-03">
<BrnchOfLocs>
<BrnchOfLoc city="OFALLON" state="MO"/>
<BrnchOfLoc city="OFALLON" state="MO"/>
<BrnchOfLoc city="DUBLIN" state="CA"/>
</BrnchOfLocs>
</PrevRgstn>
<PrevRgstn orgNm="FSC SECURITIES CORPORATION" orgPK="7461" regBeginDt="2004-10-29" regEndDt="2008-12-01">
<BrnchOfLocs>
<BrnchOfLoc city="O&apos;FALLON" state="MO"/>
<BrnchOfLoc city="ST. PETERS" state="MO"/>
</BrnchOfLocs>
</PrevRgstn>
<PrevRgstn orgNm="GATEWAY FINANCIAL ADVISORS, INC." orgPK="115025" regBeginDt="2004-11-11" regEndDt="2006-10-11">
<BrnchOfLocs>
<BrnchOfLoc city="ST. PETERS" state="MO"/>
</BrnchOfLocs>
</PrevRgstn>
</PrevRgstns>
<EmpHss>
<EmpHs fromDt="03/2004" orgNm="GATEWAY FINANCIAL ADVISORS, INC" city="OFALLON" state="MO"/>
<EmpHs fromDt="03/2004" orgNm="GFA INC" city="OFALLON" state="MO"/>
<EmpHs fromDt="01/2007" orgNm="UNITED FIRST" city="OFALLON" state="MO"/>
<EmpHs fromDt="09/2010" orgNm="CAMBRIDGE INVESTMENT RESEARCH ADVISORS, INC" city="FAIRFIELD" state="IA"/>
<EmpHs fromDt="09/2010" orgNm="CAMBRIDGE INVESTMENT RESEARCH, INC" city="FAIRFIELD" state="IA"/>
</EmpHss>
<OthrBuss>
<OthrBus desc="1)STONEBRIDGE WEALTH MANAGEMENT GROUP, 728 HAWK RUN DR, O&apos;FALLON, MO, 3/2008 AS INDEPENDENT INSURANCE AGENT FOR VARIOUS INDEPENDENT INSURANCE COMPANIES. INV REL - 40/MO - 20/TRADING. 2)UNITED FIRST FINANCIAL MORTGAGE SOFTWARE SALES. START 6/1/07, 10 HOURS PER MONTH, 5 DURING TRADING HOURS. NO OWNERSHIP INTEREST. 3)MORTGAGE STOP INC., 728 HAWK RUN DR., OFALLON, MO 63368. LOAN OFFICER PROCESSING LOAN APPS FOR CLIENTS. START 6/1/2002, 25 HOURS PER MONTH, 10 DURING TRADING HOURS. NO OWNERSHIP. 4)CIRA, 1776 PLEASANT PLAIN RD, FAIRFIELD, IA, AS ADVISORY REP OF A RIA. INV REL - 40 HR/WK - 40/TRADING. SEE EMPLOYMENT HISTORY FOR START DATE. 5) THE MORTGAGE SHOP, 355 MID RIVERS MALL DRIVE, STE E, ST. PETERS, MO 63376. MORTGAGE ORIGINATOR SINCE 01/01/99. NOT INVESTMENT RELATED. WORKS 60 HOURS PER MONTH, 20 OF WHICH ARE DURING TRADING HOURS. 6.365 PROPERTIES LLC, O&apos;FALLON, MO, 8/2018 AS OWNER OF LLC THAT BUYS, SELLS, & HOLDS REAL ESTATE. NIR - 20/MO - 0/TRADING. 7. BEST OFFER HOMES, LLC, 728 HAWK RUN DRIVE, O&apos;FALLON, MO, REAL ESTATE SALES/MORTGAGE ORIGINATION/ ACCOUNTING/FINANCIAL ACTIVITIES, 06/16/20, NIR, 20/MO- 0/TRADING 8. GIGAX WEALTH MANAGEMENT, 728 HAWK RUN DRIVE, OFALLON, MO, INDEPENDENT INSURANCE AGENT FOR VARIOUS INDEPENDENT INSURANCE COMPANIES,11/23/20, INV REL, 10 HR/WK- 10 TRADING HR."/>
</OthrBuss>
<DRPs/>
</Indvl>
<Indvl>
<Info lastNm="HINSON" firstNm="BRIAN" midNm="TROY" indvlPK="2783737" actvAGReg="Y" link="https://adviserinfo.sec.gov/individual/summary/2783737"/>
<OthrNms/>
<CrntEmps>
<CrntEmp orgNm="BRIDGEWORTH WEALTH MANAGEMENT" orgPK="164100" str1="101 25TH STREET NORTH" city="BIRMINGHAM" state="AL" cntry="United States" postlCd="35203">
<CrntRgstns>
<CrntRgstn regAuth="AL" regCat="RA" st="APPROVED" stDt="2015-05-12"/>
<CrntRgstn regAuth="TX" regCat="RA" st="APPROVED_RES" stDt="2015-05-01"/>
</CrntRgstns>
<BrnchOfLocs>
<BrnchOfLoc str1="400 MERIDIAN STREET" str2="SUITE 200" city="HUNTSVILLE" state="AL" cntry="United States" postlCd="35801"/>
<BrnchOfLoc str1="101 25TH STREET NORTH" city="BIRMINGHAM" state="AL" cntry="United States" postlCd="35203"/>
</BrnchOfLocs>
</CrntEmp>
</CrntEmps>
<Exms>
<Exm exmCd="S63" exmNm="Uniform Securities Agent State Law Examination" exmDt="1996-10-11"/>
</Exms>
<Dsgntns>
<Dsgntn dsgntnNm="Certified Financial Planner"/>
<Dsgntn dsgntnNm="Chartered Financial Consultant"/>
<Dsgntn dsgntnNm="Personal Financial Specialist"/>
</Dsgntns>
<PrevRgstns>
<PrevRgstn orgNm="LINCOLN FINANCIAL ADVISORS CORPORATION" orgPK="3978" regBeginDt="2000-04-25" regEndDt="2015-05-11">
<BrnchOfLocs>
<BrnchOfLoc city="HUNTSVILLE" state="AL"/>
<BrnchOfLoc city="HUNTSVILLE" state="AL"/>
</BrnchOfLocs>
</PrevRgstn>
</PrevRgstns>
<EmpHss>
<EmpHs fromDt="04/2015" orgNm="BRIDGEWORTH, LLC" city="HUNTSVILLE" state="AL"/>
<EmpHs fromDt="07/1996" toDt="04/2015" orgNm="LINCOLN FINANCIAL ADVISORS CORPORATION" city="HUNTSVILLE" state="AL"/>
<EmpHs fromDt="07/1996" toDt="04/2015" orgNm="FIRST FINANCIAL GROUP" city="BIRMINGHAM" state="AL"/>
<EmpHs fromDt="04/2015" orgNm="LPL FINANCIAL LLC" city="HUNTSVILLE" state="AL"/>
</EmpHss>
<OthrBuss>
<OthrBus desc="1) 04/30/2015: BRIDGEWORTH FINANCIAL, LLC - DBA FOR LPL BUSINESS (ENTITY FOR LPL BUSINESS) - INV REL - AT REPORTED BUSINESS LOCATIONS - START 01/01/2015 - 1% OF TIME SPENT 2) 04/30/2015: BRIDGEWORTH, LLC - INV REL - AT REPORTED BUSINESS LOCATION(S) - REGISTERED INVESTMENT ADVISOR HYBRID - START 01/2015 - 99% OF TIME SPENT. 3) 5/11/2015: NO BUSINESS NAME - INVESTMENT RELATED - AT REPORTED BUSINESS LOCATION(S) - NON-VARIABLE INSURANCE - STARTED 4/1/2015 - TIME SPENT 1% - LINES OF INSURANCE INCLUDE TERM, WHOLE, UNIVERSAL, LTC, DISABILITY. 4) 6/2/2017 - Bridgeworth Financial - Investment Related - At Reported Business Location(s) - DBA for LPL Business (entity for LPL business) - Started 04/30/2015 - 5 Hours Per Month/3 Hours During Securities Trading. 5) 5/8/2018 - Foster Properties Ltd - Not Investment Related - Home Based - Other-Family Business - Started 12/22/1997 - 1 Hours Per Month/0 Hours During Securities Trading - Handle the majority of business matters for this family business."/>
</OthrBuss>
<DRPs>
<DRP hasRegAction="N" hasCriminal="N" hasBankrupt="N" hasCivilJudc="N" hasBond="N" hasJudgment="N" hasInvstgn="N" hasCustComp="Y" hasTermination="N"/>
</DRPs>
</Indvl>
</Indvls>
</IAPDIndividualReport>
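One possible way to address both difficulties, sketched here with the XML package, is to loop over the Indvl nodes and build one small data frame per individual: the Info attributes are repeated for every EmpHs record, and hasCustComp falls back to NA when there is no DRP element. The helper name one_indvl and the trimmed-down sample document are hypothetical. The sketch also lists every EmpHs record, whereas the desired output above shows only some of them, so a filter would still be needed.

```r
library(XML)

# Hypothetical, trimmed-down sample with the same shape as the report above
xml_txt <- '<IAPDIndividualReport>
<Indvls>
<Indvl>
<Info lastNm="GIGAX" firstNm="JEFFREY" indvlPK="2783477"/>
<EmpHss>
<EmpHs fromDt="03/2004" orgNm="GFA INC"/>
<EmpHs fromDt="01/2007" orgNm="UNITED FIRST"/>
</EmpHss>
<DRPs/>
</Indvl>
<Indvl>
<Info lastNm="HINSON" firstNm="BRIAN" indvlPK="2783737"/>
<EmpHss>
<EmpHs fromDt="07/1996" orgNm="FIRST FINANCIAL GROUP"/>
</EmpHss>
<DRPs>
<DRP hasCustComp="Y"/>
</DRPs>
</Indvl>
</Indvls>
</IAPDIndividualReport>'

doc <- xmlParse(xml_txt, asText = TRUE)

# Build one data frame per individual, one row per EmpHs record
one_indvl <- function(node) {
  info <- xmlAttrs(getNodeSet(node, "./Info")[[1]])
  emp  <- getNodeSet(node, "./EmpHss/EmpHs")
  drp  <- getNodeSet(node, "./DRPs/DRP")

  # Read one attribute from every EmpHs node; NA if there are none
  emp_attr <- function(name) {
    if (length(emp) == 0) return(NA_character_)
    vapply(emp, xmlGetAttr, character(1), name = name, default = NA_character_)
  }
  # NA when the individual has no DRP element
  comp <- if (length(drp) > 0) {
    xmlGetAttr(drp[[1]], "hasCustComp", default = NA_character_)
  } else {
    NA_character_
  }

  data.frame(
    lastNm      = info[["lastNm"]],
    firstNm     = info[["firstNm"]],
    indvlPK     = info[["indvlPK"]],
    fromDt      = emp_attr("fromDt"),
    orgNm       = emp_attr("orgNm"),
    hasCustComp = comp,
    stringsAsFactors = FALSE
  )
}

result <- do.call(rbind, lapply(getNodeSet(doc, "//Indvl"), one_indvl))
result
```

Because the XPath expressions start with "./", each lookup is evaluated relative to the current Indvl node, which is what keeps the records tied to the right individual.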

How to read csv with double quotes from WoS?

I'm trying to read CSV files from the citation report of Web of Science. This is the structure of the file:
TI=clinical case of cognitive dysfunction syndrome AND CU=MEXICO
null
Timespan=All years. Indexes=SCI-EXPANDED, SSCI, A&HCI, ESCI.
"Title","Authors","Corporate Authors","Editors","Book Editors","Source Title","Publication Date","Publication Year","Volume","Issue","Part Number","Supplement","Special Issue","Beginning Page","Ending Page","Article Number","DOI","Conference Title","Conference Date","Total Citations","Average per Year","1988","1989","1990","1991","1992","1993","1994","1995","1996","1997","1998","1999","2000","2001","2002","2003","2004","2005","2006","2007","2008","2009","2010","2011","2012","2013","2014","2015","2016"
""Didy," a clinical case of cognitive dysfunction syndrome","Heiblum, Moises; Labastida, Rocio; Chavez Gris, Gilberto; Tejeda, Alberto","","","","JOURNAL OF VETERINARY BEHAVIOR-CLINICAL APPLICATIONS AND RESEARCH","MAY-JUN 2007","2007","2","3","","","","68","72","","10.1016/j.jveb.2007.05.002","","","2","0.20","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","1","0","0","0","1","0","0","0"
""Didy," a clinical case of cognitive dysfunction syndrome (vol 2, pg 68, 2007)","Heiblum, A.; Labastida, R.; Gris, Chavez G.; Tejeda, A.; Edwards, Claudia","","","","JOURNAL OF VETERINARY BEHAVIOR-CLINICAL APPLICATIONS AND RESEARCH","SEP-OCT 2007","2007","2","5","","","","183","183","","","","","0","0.00","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0"
I managed to import it using fread; however, I still want to know which quote setting is appropriate and why read.csv assigns "Didy," as row names even though the row.names argument is NULL. These are the arguments I'm using:
s_file <- read.csv("savedrecs.txt",  # the file name must be a quoted string
                   skip = 4,
                   header = TRUE,
                   row.names = NULL,
                   quote = '\"',
                   stringsAsFactors = FALSE)
What you have shown is not a valid CSV file. There are doubled double quotes (i.e. "") that are not separated by a comma. For example, there is one at the beginning of the second line:
""Didy," a clinical case of cognitive dysfunction syndrome", etc.
So the parser thinks there is an empty field, followed by Didy, followed by " a clinical case of cognitive dysfunction syndrome". Fix up the file and you should be OK. E.g. the second line should start with
"","Didy","a clinical case of cognitive dysfunction syndrome"

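If editing the file by hand is impractical, one option is to repair the quotes in R before parsing. Standard CSV escapes a quote inside a quoted field by doubling it, and read.csv understands that convention. The sketch below is a heuristic, not a general CSV repair tool: it assumes every field is quoted, that fields are separated by exactly "," and that no field contains that three-character sequence. The two sample lines are hypothetical and shortened.

```r
# Hypothetical sample: a header plus one record with unescaped inner quotes
raw <- c(
  '"Title","Authors","Total Citations"',
  '""Didy," a clinical case","Heiblum, Moises; Labastida, Rocio","2"'
)

# Double every quote that is not a field boundary, i.e. not at the start or
# end of the line, not directly before `,"` and not directly after `",`
repaired <- gsub('(?!^)(?<!",)"(?!,"|$)', '""', raw, perl = TRUE)

s_file <- read.csv(text = repaired, stringsAsFactors = FALSE)
s_file$Title
```

For a real file, raw would come from readLines() on the export after dropping the preamble lines that precede the header.
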
Resources