Mutate with case_when for string values - case

I'm trying to create a new variable called department that tags each of these
job titles as Science. Thanks for your help!
I tried:
plist <- plist %>%
mutate(department = case_when(str_detect(Job.Title %in% c("11255 - Data Scientist",
"11256 - Data Scientist I",
"11257 - Data Scientist II",
"11258 - Data Scientist III",
"11259 - Data Scientist IV",
"11260 - Manager, Data Science",
"11261 - Senior Manager, Data Science",
"11262 - Director, Data Science",
"11438 - Lead, Data Science",
"11689 - Senior Director, Data Science",
"12489 - Data Scientist V",
"11263 - Product Scientist I",
"11264 - Product Scientist II",
"11265 - Product Scientist III",
"11266 - Product Scientist IV",
"11267 - Manager, Product Science",
"11268 - Senior Manager, Prod Science",
"11269 - Director, Product Science",
"11447 - Lead, Product Science",
"11848 - Product Scientist",
"12626 - Product Scientist V" ~"Science")))
and I receive the error:
Error in mutate():
! Problem while computing department = case_when(...).
Caused by error in check_lengths():
! argument "pattern" is missing, with no default
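
The error message points at str_detect(): it was given a single argument (the logical result of Job.Title %in% c(...)), so its pattern argument is missing. For exact matches against a fixed list, %in% is enough on its own. A minimal sketch, assuming a toy plist and a shortened, hypothetical science_titles vector standing in for the full list above:

```r
library(dplyr)

# Toy data standing in for the real plist (the third title is made up)
plist <- data.frame(
  Job.Title = c("11255 - Data Scientist",
                "11263 - Product Scientist I",
                "10001 - Accountant"),
  stringsAsFactors = FALSE
)

# Shortened stand-in for the full list of titles in the question
science_titles <- c("11255 - Data Scientist",
                    "11263 - Product Scientist I")

plist <- plist %>%
  mutate(department = case_when(
    Job.Title %in% science_titles ~ "Science",  # exact match, no str_detect needed
    TRUE ~ NA_character_                        # everything else stays NA
  ))
```

If partial matching is preferred instead, str_detect() needs both the string and a pattern, for example str_detect(Job.Title, "Scien"), though whether that pattern is specific enough for the real data is an assumption.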

Related

R extract specific word after keyword

How do I extract a specific word after a keyword in R?
I have the following input text, which contains details about a policy. I need to extract the values of specific fields such as FirstName, Surname, Father's Name and date of birth (dob).
input.txt
In Case of unit linked plan, Investment risk in Investment Portfolio is borne by the policyholder.
ly
c I ROPOSAL FORM z
Insurance
Proposal Form Number: 342525 PF 42242
Advisor Coe aranch Code 2
Ff roanumber =F SSOS™S™~™S~S rancid ate = |
IBR. Code S535353424
re GFN ——
INSTRUCTION FOR FILLING THES APPLICATION FORM ; 1. Compiets the proocsal form in CAPITAL LETTERS using = Black Ball Point P]n. 2. Sless= mark your selection by marking “X" insides the
Boe. 3. Slnsse bases 2 Blank soece after eect word, letter or initial 4. Slssse write "MA" for questions whic are not apolicatie. 5.00 NOT USE the Sor") to identify your initial or seperate the sddressiiine.
6. Sulmissson of age proof ie mandatory along wall Ge propel fonm.
IMPORTANT INSTRUCTIONS WITH REGARD TO DISCLOSURE OF INFORMATION: Inturance it a contract of UTMOST GOOD FAITH and itis required by disclose all material and nelevant
fach: complebehy, DO) NOT suppress any fac: in response by the questions in the priposal form. FAILURE TO PROVIDE COMPLETE AND ACCURATE INFORMATION OR
MISREPRESENTATION OF THE FACTS COULD DECLARE THES POLICY CONTRACT NULL AND VOID AFTER PAYMENT OF SURRENDER VALUE, IF ANY, SUBJECT TO SECTION 45 OF
INSURANCE ACT, 1998 As AMENDED FROM TIME TO TIME,
Section I - Details of the Life to be Assured
1. Tite E-] Mr. LJ Mrs. LJ Miss [J Or. LJ Others (Specify)
2. FirstName PETER PAUL
3. Surname T
44. Father's Name
46, Mother's Name ERIKA RESWE D
5. Date of Birth 13/02/1990 6, Gender E] Male ] Female
7. Age Proof L] School Certificate [] Driving License [] Passport {Birth Certificate E"] PAN Card
3, Marital Status D) Single EF] Married 0 Widower) 0 Civorcee
9, Spouse Name ERISEWQ FR
10. Maiden Name
iL. Nationality -] Resident Indian National [J Non Resident Indian (MRI) L] Others (Specify)
12, Education J Postgraduate / Doctorate Ee) Graduate [] 12thstd. Pass [J 10thstd. Pass [J Below 10th std.
OO Dliterate / Uneducated CJ Others (Specify)
13. Address For No 7¥%a vaigai street Flower
Communication Nagar selaiyur
Landmark
City Salem
Pin Code BO00 73: State TAMIL NADU
Address proof [] Passport ([] Driving License [] Voter ID [] Bank Statement [] Utility Bill G4 Others (Specify) Aadhaar Card
14, Permanent No 7¥a vaigai street Flower
Address :
Nagar selaiyur
Landmark
City Salem
Pin Code 5353535 state (TAMIL NADU
Address proof CJ] Passport [9 DrivingLicense [J Voter ID [ Bank Statement [ Utility Bill B] Others (Specify) Aadhaar Card
15. Contact Details Mobile 424242424 Phone (Home)
Office / Business
E-mail fdgrgtr13#yahoo.com
Preferred mode: ((] Letter EF) E-Mail
Preferred Language for Letter {other than English): [] Hindi [] Kannada [-] Tamil J Telugu C] Malayalam C) Gujarati
Bengali GOriya =D] Marathi
16. Occupation CL] Salaried-Govt /PSU ( Salaried-other [9 Self Employed Professional [J Aagriculturist {Farmer [Part Time Business
LJ Retired ] Landlord J Student (current Std) -] Others (Specify) Salaried - MNC
17. Full Name of the Capio software
Employers Businnes/
School/College
18, Designation & Exact nature of Work / Business Manager
19. AnnualIncomein 1,200,000.00 20. Annual Income of Husband / Father = 1,500,000.00
Figures (%) (for female and minor lives)
21. Exact nature of work / business of Husband / Father for female and minor lives Government Employee
Page 10fé
The code below works for me, but if the line order changes, everything breaks. Is there a way to extract a keyword's value irrespective of line order?
Current code:
path <- getwd()
my_txt <- readLines(paste(path, "/input.txt", sep = ""))
fName <- sub('.*FirstName', '', my_txt[7])
SName <- sub('.*Surname', '', my_txt[8])
FatherNm <- sub(".*Father's Name", '', my_txt[9])
dob <- sub("6, Gender.*", '',sub(".*Date of Birth", '', my_txt[11]))
You can collapse the text into one string and extract the values based on patterns in the data. This approach works irrespective of line numbers, provided the same pattern holds in every file.
my_txt <- readLines(paste(path, "/input.txt", sep = ""))
#Collapse data in one string
text <- paste0(my_txt, collapse = '\n')
#Extract text after FirstName till '\n'
fName <- sub('.*FirstName (.*?)\n.*', '\\1', text)
fName
#[1] "John Woo"
#Extract text after Surname till '\n'
SName <- sub('.*Surname (.*?)\n.*', '\\1', text)
SName
#[1] "T"
#Extract text after Father's Name till '\n'
FatherNm <- sub(".*Father's Name (.*?)\n.*", '\\1', text)
FatherNm
#[1] "Bill Woo"
#Extract numbers which come after Date of Birth.
dob <- sub(".*Date of Birth (\\d+/\\d+/\\d+).*", '\\1', text)
dob
#[1] "13/07/1970"
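
One caveat with sub(): when the pattern does not match, it returns the whole input unchanged, which is easy to miss. A minimal sketch of a helper that returns NA instead, assuming a hypothetical function name extract_after and a few sample lines (deliberately out of order) in place of input.txt:

```r
# Hypothetical helper: return the text that follows `key` on the same line,
# or NA when the key is absent or has no value after it.
# `key` is used as a regular expression, so escape any special characters in it.
extract_after <- function(text, key, value = "[^\n]*") {
  pattern <- paste0(key, "[ \t]*(", value, ")")
  m <- regmatches(text, regexec(pattern, text))[[1]]
  if (length(m) < 2 || !nzchar(trimws(m[2]))) NA_character_ else trimws(m[2])
}

# Sample lines in a different order than the original file
sample_lines <- c(
  "5. Date of Birth 13/02/1990 6, Gender Male",
  "3. Surname T",
  "2. FirstName PETER PAUL",
  "44. Father's Name"
)
text <- paste(sample_lines, collapse = "\n")

fName    <- extract_after(text, "FirstName")
SName    <- extract_after(text, "Surname")
FatherNm <- extract_after(text, "Father's Name")   # no value present, so NA
dob      <- extract_after(text, "Date of Birth",
                          value = "[0-9]{1,2}/[0-9]{1,2}/[0-9]{4}")
```

The value argument narrows what is captured, which helps on lines such as the date of birth, where other fields follow on the same line.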

extract data from XML files - R

I'm new to extracting data from XML files. I'm trying to process the following XML file using the R XML package. The information I want is in the attribute values.
I've run into two difficulties:
Some attributes exist in one node but not in another. For example, "DRP" has the information in the second Indvl but not in the first.
Some attributes have multiple values for an individual, and I don't know how to link them to that individual. For example, "EmpHs" has multiple records for an individual (identified by indvlPK).
Ideally, I want the output data to have a structure similar to the following:
lastNm  firstNm  indvlPK  fromDt   orgNm                                   hasCustComp
GIGAX   JEFFREY  2783477  03/2004  GATEWAY FINANCIAL ADVISORS, INC
GIGAX   JEFFREY  2783477  03/2004  GFA INC
GIGAX   JEFFREY  2783477  01/2007  UNITED FIRST
HINSON  BRIAN    2783737  07/1996  LINCOLN FINANCIAL ADVISORS CORPORATION  Y
HINSON  BRIAN    2783737  07/1996  FIRST FINANCIAL GROUP                   Y
Is there any way I can parse the data correctly? Thanks!
The code I used, which didn't give me what I want:
library(XML)
doc <- "Test.xml"
ind <- xmlParse(doc)
xmltop = xmlRoot(ind)
# XPath selects attributes with @
temp1 <- data.frame(unlist(getNodeSet(xmltop,"//Info/@lastNm")))
temp2 <- data.frame(unlist(getNodeSet(xmltop,"//Info/@firstNm")))
temp3 <- data.frame(unlist(getNodeSet(xmltop,"//Info/@indvlPK")))
temp4 <- data.frame(unlist(getNodeSet(xmltop,"//EmpHs/@fromDt")))
temp5 <- data.frame(unlist(getNodeSet(xmltop,"//DRP/@hasCustComp")))
The data is here:
<?xml version="1.0" encoding="ISO-8859-1"?>
<IAPDIndividualReport GenOn="2021-03-29">
<Indvls>
<Indvl>
<Info lastNm="GIGAX" firstNm="JEFFREY" midNm="W" indvlPK="2783477" actvAGReg="Y" link="https://adviserinfo.sec.gov/individual/summary/2783477"/>
<OthrNms/>
<CrntEmps>
<CrntEmp orgNm="CAMBRIDGE INVESTMENT RESEARCH ADVISORS, INC." orgPK="134139" str1="1776 PLEASANT PLAIN RD." city="FAIRFIELD" state="IA" cntry="United States" postlCd="52556-8757">
<CrntRgstns>
<CrntRgstn regAuth="MO" regCat="RA" st="APPROVED" stDt="2010-09-09"/>
</CrntRgstns>
<BrnchOfLocs>
<BrnchOfLoc city="O&apos;FALLON" state="MO" cntry="United States"/>
</BrnchOfLocs>
</CrntEmp>
</CrntEmps>
<Exms>
<Exm exmCd="S63" exmNm="Uniform Securities Agent State Law Examination" exmDt="1996-08-20"/>
<Exm exmCd="S65" exmNm="Uniform Investment Adviser Law Examination" exmDt="1999-12-21"/>
</Exms>
<Dsgntns/>
<PrevRgstns>
<PrevRgstn orgNm="WOODBURY FINANCIAL SERVICES, INC." orgPK="421" regBeginDt="2009-01-05" regEndDt="2009-12-03">
<BrnchOfLocs>
<BrnchOfLoc city="OFALLON" state="MO"/>
<BrnchOfLoc city="OFALLON" state="MO"/>
<BrnchOfLoc city="DUBLIN" state="CA"/>
</BrnchOfLocs>
</PrevRgstn>
<PrevRgstn orgNm="FSC SECURITIES CORPORATION" orgPK="7461" regBeginDt="2004-10-29" regEndDt="2008-12-01">
<BrnchOfLocs>
<BrnchOfLoc city="O&apos;FALLON" state="MO"/>
<BrnchOfLoc city="ST. PETERS" state="MO"/>
</BrnchOfLocs>
</PrevRgstn>
<PrevRgstn orgNm="GATEWAY FINANCIAL ADVISORS, INC." orgPK="115025" regBeginDt="2004-11-11" regEndDt="2006-10-11">
<BrnchOfLocs>
<BrnchOfLoc city="ST. PETERS" state="MO"/>
</BrnchOfLocs>
</PrevRgstn>
</PrevRgstns>
<EmpHss>
<EmpHs fromDt="03/2004" orgNm="GATEWAY FINANCIAL ADVISORS, INC" city="OFALLON" state="MO"/>
<EmpHs fromDt="03/2004" orgNm="GFA INC" city="OFALLON" state="MO"/>
<EmpHs fromDt="01/2007" orgNm="UNITED FIRST" city="OFALLON" state="MO"/>
<EmpHs fromDt="09/2010" orgNm="CAMBRIDGE INVESTMENT RESEARCH ADVISORS, INC" city="FAIRFIELD" state="IA"/>
<EmpHs fromDt="09/2010" orgNm="CAMBRIDGE INVESTMENT RESEARCH, INC" city="FAIRFIELD" state="IA"/>
</EmpHss>
<OthrBuss>
<OthrBus desc="1)STONEBRIDGE WEALTH MANAGEMENT GROUP, 728 HAWK RUN DR, O&apos;FALLON, MO, 3/2008 AS INDEPENDENT INSURANCE AGENT FOR VARIOUS INDEPENDENT INSURANCE COMPANIES. INV REL - 40/MO - 20/TRADING. 2)UNITED FIRST FINANCIAL MORTGAGE SOFTWARE SALES. START 6/1/07, 10 HOURS PER MONTH, 5 DURING TRADING HOURS. NO OWNERSHIP INTEREST. 3)MORTGAGE STOP INC., 728 HAWK RUN DR., OFALLON, MO 63368. LOAN OFFICER PROCESSING LOAN APPS FOR CLIENTS. START 6/1/2002, 25 HOURS PER MONTH, 10 DURING TRADING HOURS. NO OWNERSHIP. 4)CIRA, 1776 PLEASANT PLAIN RD, FAIRFIELD, IA, AS ADVISORY REP OF A RIA. INV REL - 40 HR/WK - 40/TRADING. SEE EMPLOYMENT HISTORY FOR START DATE. 5) THE MORTGAGE SHOP, 355 MID RIVERS MALL DRIVE, STE E, ST. PETERS, MO 63376. MORTGAGE ORIGINATOR SINCE 01/01/99. NOT INVESTMENT RELATED. WORKS 60 HOURS PER MONTH, 20 OF WHICH ARE DURING TRADING HOURS. 6.365 PROPERTIES LLC, O&apos;FALLON, MO, 8/2018 AS OWNER OF LLC THAT BUYS, SELLS, &amp; HOLDS REAL ESTATE. NIR - 20/MO - 0/TRADING. 7. BEST OFFER HOMES, LLC, 728 HAWK RUN DRIVE, O&apos;FALLON, MO, REAL ESTATE SALES/MORTGAGE ORIGINATION/ ACCOUNTING/FINANCIAL ACTIVITIES, 06/16/20, NIR, 20/MO- 0/TRADING 8. GIGAX WEALTH MANAGEMENT, 728 HAWK RUN DRIVE, OFALLON, MO, INDEPENDENT INSURANCE AGENT FOR VARIOUS INDEPENDENT INSURANCE COMPANIES,11/23/20, INV REL, 10 HR/WK- 10 TRADING HR."/>
</OthrBuss>
<DRPs/>
</Indvl>
<Indvl>
<Info lastNm="HINSON" firstNm="BRIAN" midNm="TROY" indvlPK="2783737" actvAGReg="Y" link="https://adviserinfo.sec.gov/individual/summary/2783737"/>
<OthrNms/>
<CrntEmps>
<CrntEmp orgNm="BRIDGEWORTH WEALTH MANAGEMENT" orgPK="164100" str1="101 25TH STREET NORTH" city="BIRMINGHAM" state="AL" cntry="United States" postlCd="35203">
<CrntRgstns>
<CrntRgstn regAuth="AL" regCat="RA" st="APPROVED" stDt="2015-05-12"/>
<CrntRgstn regAuth="TX" regCat="RA" st="APPROVED_RES" stDt="2015-05-01"/>
</CrntRgstns>
<BrnchOfLocs>
<BrnchOfLoc str1="400 MERIDIAN STREET" str2="SUITE 200" city="HUNTSVILLE" state="AL" cntry="United States" postlCd="35801"/>
<BrnchOfLoc str1="101 25TH STREET NORTH" city="BIRMINGHAM" state="AL" cntry="United States" postlCd="35203"/>
</BrnchOfLocs>
</CrntEmp>
</CrntEmps>
<Exms>
<Exm exmCd="S63" exmNm="Uniform Securities Agent State Law Examination" exmDt="1996-10-11"/>
</Exms>
<Dsgntns>
<Dsgntn dsgntnNm="Certified Financial Planner"/>
<Dsgntn dsgntnNm="Chartered Financial Consultant"/>
<Dsgntn dsgntnNm="Personal Financial Specialist"/>
</Dsgntns>
<PrevRgstns>
<PrevRgstn orgNm="LINCOLN FINANCIAL ADVISORS CORPORATION" orgPK="3978" regBeginDt="2000-04-25" regEndDt="2015-05-11">
<BrnchOfLocs>
<BrnchOfLoc city="HUNTSVILLE" state="AL"/>
<BrnchOfLoc city="HUNTSVILLE" state="AL"/>
</BrnchOfLocs>
</PrevRgstn>
</PrevRgstns>
<EmpHss>
<EmpHs fromDt="04/2015" orgNm="BRIDGEWORTH, LLC" city="HUNTSVILLE" state="AL"/>
<EmpHs fromDt="07/1996" toDt="04/2015" orgNm="LINCOLN FINANCIAL ADVISORS CORPORATION" city="HUNTSVILLE" state="AL"/>
<EmpHs fromDt="07/1996" toDt="04/2015" orgNm="FIRST FINANCIAL GROUP" city="BIRMINGHAM" state="AL"/>
<EmpHs fromDt="04/2015" orgNm="LPL FINANCIAL LLC" city="HUNTSVILLE" state="AL"/>
</EmpHss>
<OthrBuss>
<OthrBus desc="1) 04/30/2015: BRIDGEWORTH FINANCIAL, LLC - DBA FOR LPL BUSINESS (ENTITY FOR LPL BUSINESS) - INV REL - AT REPORTED BUSINESS LOCATIONS - START 01/01/2015 - 1% OF TIME SPENT 2) 04/30/2015: BRIDGEWORTH, LLC - INV REL - AT REPORTED BUSINESS LOCATION(S) - REGISTERED INVESTMENT ADVISOR HYBRID - START 01/2015 - 99% OF TIME SPENT. 3) 5/11/2015: NO BUSINESS NAME - INVESTMENT RELATED - AT REPORTED BUSINESS LOCATION(S) - NON-VARIABLE INSURANCE - STARTED 4/1/2015 - TIME SPENT 1% - LINES OF INSURANCE INCLUDE TERM, WHOLE, UNIVERSAL, LTC, DISABILITY. 4) 6/2/2017 - Bridgeworth Financial - Investment Related - At Reported Business Location(s) - DBA for LPL Business (entity for LPL business) - Started 04/30/2015 - 5 Hours Per Month/3 Hours During Securities Trading. 5) 5/8/2018 - Foster Properties Ltd - Not Investment Related - Home Based - Other-Family Business - Started 12/22/1997 - 1 Hours Per Month/0 Hours During Securities Trading - Handle the majority of business matters for this family business."/>
</OthrBuss>
<DRPs>
<DRP hasRegAction="N" hasCriminal="N" hasBankrupt="N" hasCivilJudc="N" hasBond="N" hasJudgment="N" hasInvstgn="N" hasCustComp="Y" hasTermination="N"/>
</DRPs>
</Indvl>
</Indvls>
</IAPDIndividualReport>
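
One way to keep each attribute tied to its individual is to loop over the Indvl nodes and query inside each one, rather than pulling every attribute from the whole document at once. A minimal sketch with the XML package, assuming a trimmed-down copy of the report (xml_text) and that every individual has at least one EmpHs record; it returns all EmpHs rows, so any further filtering to match the desired output is left out:

```r
library(XML)

# A trimmed-down copy of the report, for illustration only
xml_text <- '<IAPDIndividualReport>
  <Indvls>
    <Indvl>
      <Info lastNm="GIGAX" firstNm="JEFFREY" indvlPK="2783477"/>
      <EmpHss>
        <EmpHs fromDt="03/2004" orgNm="GFA INC"/>
        <EmpHs fromDt="01/2007" orgNm="UNITED FIRST"/>
      </EmpHss>
      <DRPs/>
    </Indvl>
    <Indvl>
      <Info lastNm="HINSON" firstNm="BRIAN" indvlPK="2783737"/>
      <EmpHss>
        <EmpHs fromDt="07/1996" orgNm="FIRST FINANCIAL GROUP"/>
      </EmpHss>
      <DRPs>
        <DRP hasCustComp="Y"/>
      </DRPs>
    </Indvl>
  </Indvls>
</IAPDIndividualReport>'

doc <- xmlParse(xml_text, asText = TRUE)

# Build one data frame per <Indvl>, then stack them
per_person <- lapply(getNodeSet(doc, "//Indvl"), function(indvl) {
  info <- getNodeSet(indvl, "./Info")[[1]]
  drp  <- getNodeSet(indvl, "./DRPs/DRP")
  data.frame(
    # Scalars from <Info> are recycled across this person's EmpHs rows
    lastNm  = xmlGetAttr(info, "lastNm"),
    firstNm = xmlGetAttr(info, "firstNm"),
    indvlPK = xmlGetAttr(info, "indvlPK"),
    fromDt  = xpathSApply(indvl, "./EmpHss/EmpHs", xmlGetAttr, "fromDt"),
    orgNm   = xpathSApply(indvl, "./EmpHss/EmpHs", xmlGetAttr, "orgNm"),
    # NA when the person has no <DRP> node
    hasCustComp = if (length(drp)) {
      xmlGetAttr(drp[[1]], "hasCustComp", NA_character_)
    } else {
      NA_character_
    },
    stringsAsFactors = FALSE
  )
})
result <- do.call(rbind, per_person)
```

Because the person-level values are scalars inside each data.frame() call, they repeat once per employment record, which handles the one-to-many link; the length check on drp handles the attribute that is missing for some individuals.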

How can I bring values in from another data frame based on a match?

I have two data frames: df1 and codesDesc
df1 contains information that has certain codes and I want to add the relevant description into df1$desc (new column) by performing a lookup in codesDesc.
I have tried something like this:
df1$desc <- codesDesc$desc[df1$code %in% codesDesc$code]
Or this:
df1$desc <- codesDesc$desc[which(df1$code %in% codesDesc$code)]
But both fail due to the number of replacement rows not matching.
What am I missing here? I'm guessing that it's a syntactic error on my part.
dput(df1):
structure(list(dx = structure(1:108, .Label = c("Dx010", "Dx0101",
"Dx0103", "Dx0104", "Dx0105", "Dx0106", "Dx0107", "Dx011", "Dx0111",
"Dx0112", "Dx01120", "Dx01121", "Dx01122", "Dx0113", "Dx0114",
"Dx0115", "Dx0116", "Dx0117", "Dx0118", "Dx0119", "Dx012", "Dx0121",
"Dx0122", "Dx0126", "Dx0127", "Dx013", "Dx014", "Dx016", "Dx0162",
"Dx02", "Dx03", "Dx05", "Dx06", "Dx07", "Dx08", "Dx09", "Dx10",
"Dx106", "Dx108", "Dx11", "Dx110", "Dx111", "Dx115", "Dx116",
"Dx117", "Dx118", "Dx119", "Dx12", "Dx120", "Dx13", "Dx14", "Dx15",
"Dx16", "Dx18", "Dx19", "Dx20", "Dx21", "Dx22", "Dx28", "Dx30",
"Dx31", "Dx32", "Dx321", "Dx322", "Dx323", "Dx324", "Dx325",
"Dx326", "Dx327", "Dx328", "Dx329", "Dx330", "Dx332", "Dx333",
"Dx334", "Dx335", "Dx336", "Dx34", "Dx35", "Dx38", "Dx39", "Dx404",
"Dx45", "Dx46", "Dx48", "Dx49", "Dx50", "Dx58", "Dx59", "Dx75",
"Dx76", "Dx77", "Dx78", "Dx80", "Dx81", "Dx82", "Dx85", "Dx86",
"Dx87", "Dx88", "Dx89", "Dx91", "Dx92", "Dx93", "Dx94", "Dx96",
"Dx97", "Dx98", "NULL"), class = "factor"), freq = c(24L, 20L,
6L, 2L, 76L, 90L, 13L, 33L, 11L, 912L, 1L, 67L, 22L, 98L, 121L,
15L, 41L, 87L, 38L, 172L, 146L, 75L, 93L, 6L, 3L, 12L, 10L, 20L,
10L, 1026L, 309L, 4255L, 3006L, 1180L, 2580L, 158L, 40L, 33L,
1893L, 4521L, 9L, 1L, 2L, 126L, 1L, 5L, 18L, 557L, 11L, 398L,
249L, 250L, 169L, 34L, 135L, 432L, 644L, 163L, 101L, 9L, 28L,
910L, 258L, 171L, 744L, 90L, 225L, 24L, 6L, 2L, 39L, 5L, 1L,
3231L, 924L, 3213L, 6L, 23L, 1101L, 1208L, 64L, 2L, 27L, 114L,
5L, 11L, 21L, 66L, 27L, 513L, 565L, 129L, 210L, 59L, 5L, 376L,
653L, 65L, 68L, 3L, 18L, 1L, 95L, 64L, 2L, 274L, 2L, 1L)), row.names = c(NA,
108L), class = "data.frame")
dput(codesDesc):
structure(list(dx = c("Dx015", "Dx019", "Dx023", "Dx027", "Dx04",
"Dx100", "Dx101", "Dx103", "Dx105", "Dx109", "Dx24", "Dx26",
"Dx27", "Dx280", "Dx29", "Dx33", "Dx36", "Dx37", "Dx380", "Dx40",
"Dx41", "Dx53", "Dx54", "Dx55", "Dx56", "Dx57", "Dx65", "Dx66",
"Dx67", "Dx68", "Dx69", "Dx70", "Dx71", "Dx72", "Dx79", "Dx",
"Dx011", "Dx012", "Dx016", "Dx02", "Dx021", "Dx03", "Dx05", "Dx06",
"Dx07", "Dx08", "Dx09", "Dx108", "Dx11", "Dx1111", "Dx118", "Dx12",
"Dx13", "Dx14", "Dx15", "Dx16", "Dx18", "Dx19", "Dx20", "Dx21",
"Dx22", "Dx28", "Dx30", "Dx31", "Dx32", "Dx325", "Dx34", "Dx35",
"Dx38", "Dx39", "Dx49", "Dx50", "Dx60", "Dx61", "Dx62", "Dx64",
"Dx75", "Dx80", "Dx81", "Dx82", "Dx85", "Dx86", "Dx87", "Dx90",
"Dx92", "Dx94", "Dx", "Dx010", "Dx0101", "Dx0102", "Dx0103",
"Dx0104", "Dx0105", "Dx0106", "Dx0107", "Dx011", "Dx0111", "Dx0112",
"Dx0113", "Dx0114", "Dx0115", "Dx0116", "Dx0117", "Dx0118", "Dx0119",
"Dx01120", "Dx01121", "Dx01122", "Dx012", "Dx013", "Dx014", "Dx016",
"Dx0161", "Dx0162", "Dx017", "Dx018", "Dx0181", "Dx02", "Dx021",
"Dx024", "Dx025", "Dx026", "Dx028", "Dx03", "Dx05", "Dx06", "Dx07",
"Dx08", "Dx09", "Dx10", "Dx106", "Dx108", "Dx11", "Dx110", "Dx111",
"Dx1111", "Dx112", "Dx113", "Dx114", "Dx115", "Dx116", "Dx117",
"Dx118 ", "Dx119", "Dx12\n", "Dx120", "Dx121", "Dx13", "Dx14",
"Dx15", "Dx16", "Dx17\n", "Dx18", "Dx19", "Dx20", "Dx21", "Dx22",
"Dx23", "Dx25", "Dx28", "Dx30", "Dx31", "Dx32", "Dx321", "Dx322",
"Dx323", "Dx324", "Dx325", "Dx326", "Dx327", "Dx328", "Dx329",
"Dx330", "Dx332", "Dx333", "Dx334", "Dx335", "Dx336", "Dx337",
"Dx34", "Dx35", "Dx38", "Dx39", "Dx42", "Dx43", "Dx45", "Dx46",
"Dx47", "Dx48", "Dx49", "Dx50", "Dx51", "Dx52", "Dx58", "Dx59",
"Dx60", "Dx63", "Dx64", "Dx73", "Dx74", "Dx75", "Dx76", "Dx77",
"Dx78", "Dx80", "Dx81", "Dx82", "Dx83", "Dx84", "Dx85", "Dx86",
"Dx87", "Dx88", "Dx89", "Dx91", "Dx92", "Dx93", "Dx94", "Dx95",
"Dx96", "DX97", "Dx98", "Dx0121", "Dx0122", "Dx0123", "Dx0125",
"Dx0126", "Dx0127", "Dx0128", "Dx400", "DX401", "DX402", "DX403",
"DX404", "DX405", "DX406", "DX407", "DX408", "DX409"), disposition = c("Priority Transport to Emergency Department ",
"Hazardous Area Response Team", "Assistance is being dispatched to arrive within 30 minutes",
"Assistance is being dispatched to arrive within 8 hours", "Go to the Emergency Department within 1 hour",
"Call Terminated Early", "Call Handler terminated the call",
"Refer To A Clinician From Our Service - Caller Unhappy With The Disposition",
"Service response is required", "Dispatch of other emergency services",
"Health Protection Emergency", "Contact Care Plan Provider within agreed timescales",
"Contact Poisons Centre", "Speak to a nurse from our service for home management advice",
"Contact Specialist Practitioner", "Speak to Clinician From our Service Within 10 Minutes",
"Refer to Health Information Advisor Immediately", "Contact Secondary Care Routine",
"Speak to a nurse from our service for home management advice",
"Refer to Health Information Advisor within 15 minutes", "Refer to Health Information Advisor next working day",
"Refer to Health Information Advisor Immediately", "Refer to Senior Colleague",
"The disposition is Locally Approved Disposition", "The disposition is Follow Admission Protocol",
"Specialist Advice – Contraception ", "Flu Line Dispositions",
"Flu Line Dispositions", "Flu Line Dispositions", "Flu Line Dispositions",
"Flu Line Dispositions", "Flu Line Dispositions", "Flu Line Dispositions",
"Direct referral to Primary Care practitioner for assessment",
"Failed Contraception", "NHS Pathways Disposition Terms ", "Emergency Department Priority 1",
"Emergency Department Priority 2", "Emergency Department Priority 4",
"Emergency Department Priority 3", "Emergency Department Priority 3",
"Emergency Department Priority 4", "Primary Care Priority 1",
"Primary Care Priority 2", "Primary Care Priority 2", "Primary Care Priority 3",
"Primary Care Priority 4", "No further triage indicated", "Primary Care Priority 1",
"Emergency Department Priority 4", "Emergency Department Priority 4",
"Primary Care Priority 1", "Primary Care Priority 2", "Primary Care Priority 2",
"Primary Care Priority 3", "Primary Care Priority 4", "Primary Care Dental Priority 2",
"Primary Care Dental Priority 2", "Primary Care Dental Priority 2",
"Primary Care Dental Priority 2", "Primary Care Dental Priority 4",
"Urgent Care Centre Pharmacist", "Primary Care Midwife Priority 1 ",
"Primary Care GUM ", "Primary Care Priority 1", "One of my clinical colleagues needs to see you - Toxic Ingestion/Inhalation ED Priority 3",
"Primary Care Priority 1", "Primary Care Priority 1", "Primary Care Priority 4",
"Primary Care Priority 4", "Emergency Department Priority 3",
"Midwife or Labour Suite immediately Priority 2", "Primary Care Centre Optician",
"Speak to the GP Practice within 20 minutes ", "999 For an Ambulance ",
"Primary Care Centre - Epidemic - Antiviral", "Primary Care Priority 4",
"Primary Care Centre Repeat Prescription within 6 hours", "Primary Care Centre Repeat Prescription within 12 hours",
"Primary Care Centre Medication Enquiry", "Primary Care Centre Repeat Prescription required within 2 hours",
"Primary Care Centre Repeat Prescription within 12 hours", "Urgent Care Centre Repeat Prescription within 24 hours",
"Repeat Prescription required ", "Emergency Department Mental Health Priority 3",
"Emergency Department Sexual Assault Assessment Priority 3",
"NHS Pathways Disposition Terms ", "\n\nEmergency Ambulance Response for Potential Cardiac Arrest \n\n",
"Emergency Ambulance Response for Potential Cardiac Arrest",
"Emergency Ambulance Response for Potential Cardiac Arrest Post Delivery ",
"Emergency Ambulance response for Fitting Now", "Emergency Ambulance Response for Major Blood Loss",
"Emergency Ambulance Response for Potential Shock ", "Emergency Ambulance Response for Respiratory Distress",
"Emergency Ambulance Response for Unconsciousness", "Emergency Ambulance Response ",
"Emergency Ambulance Response for Acute Abdomen Pregnant", "Emergency Ambulance Response for Acute Coronary Syndrome",
"Emergency Ambulance Response for Anaphylaxis", "Emergency Ambulance Response for Aortic Aneurysm Rupture/Dissection",
"Emergency Ambulance for Labour Complications", "Emergency Ambulance Response for Major Blood Loss",
"Emergency Ambulance Response for Possible Stroke Time Critical",
"Emergency Ambulance Response for Potential Shock", "Emergency Ambulance Response for Respiratory Distress Non-Trauma",
"Emergency Ambulance Response for Respiratory Distress Trauma",
"Emergency Ambulance Response for Septicaemia", "Emergency Ambulance for Unconsciousness",
"Emergency Ambulance Response (Category 3)", "Assistance needed at home due to inability to get off the floor ",
"Crew arrived before a disposition was reached ", "Non-emergency Ambulance Response ",
"Non-emergency Ambulance Response possible Viral Haemorrhagic Fever ",
"Transport to an Emergency Treatment Centre within 1 hour \n(Category 3)",
"Ambulance for Clinical Reasons", "Ambulance for Transport Reasons",
"Emergency Ambulance due to Clinical Reasons ", "Attend Emergency Treatment Centre within 1 Hour",
"Attend Emergency Treatment Centre within 1 hour possible Viral Haemorrhagic Fever",
"Assistance is being dispatched to arrive within 2 hours ", "Assistance is being dispatched to arrive within 4 hours",
"A Deferred Dispatch is being arranged ", "Assistance is being dispatched to arrive within 1 hour ",
"Attend Emergency Treatment Centre within 4 Hours ", "To contact a Primary Care Service within 2 Hours ",
"To contact a Primary Care Service within 6 Hours ", "To contact a Primary Care Service within 12 Hours ",
"To contact a Primary Care Service within 24 Hours ", "For persistent or recurrent symptoms: get in touch with the GP Practice for a Non-Urgent Appointment ",
"MUST contact own GP Practice for a Non-Urgent appointment ",
"A Clinician from our Service will call the individual back immediately to assess the problem ",
"The call is closed with no further action needed", "Speak to a Primary Care Service within 1 Hour",
"Community Nurse within 4 hours ", "Community Nurse within 24 hours ",
"Speak to a Primary Care Service within 1 hour possible Viral Haemorrhagic Fever ",
"Community Nurse next working day ", "Health Visitor next working day ",
"Community Midwife next working Day ", "Contact own GP Practice next working day for an appointment ",
"Speak to a Primary Care Service within 6 hours for Expected Death ",
"Speak to a Primary Care Service within 1 hour for Palliative Care ",
"Attend Emergency Dental Treatment Centre within 4 hours ", "Callback by Healthcare Professional within 2 hours",
"Speak to a Primary Care Service within 2 Hours ", "Callback by Healthcare Professional within 4 hours",
"Speak to a Clinician Immediately for Assessment of Symptoms",
"Speak to a Primary Care Service within 6 Hours ", "Speak to a Primary Care Service within 12 Hours ",
"Speak to a Primary Care Service within 24 Hours", "For persistent or recurrent symptoms: get in touch with the GP Practice within 3 working days ",
"To contact a Dental Service within 1 hour ", "To Contact a Dental Service within 2 hours ",
"To contact a Dental Service within 6 hours ", "To contact a Dental Service within 12 hours ",
"To contact a Dental Service within 24 hours ", "To contact a Dental Practice within 5 working days ",
"Contact Orthodontist next working day ", "Home Management ",
"Contact Pharmacist within 12 hours ", "Speak to Midwife within 1 hour ",
"Contact Genito-Urinary Clinic or other local service ", "Speak to a Clinician from our service Immediately ",
"Speak to a Clinician from our service Immediately – Refused Ambulance Disposition",
"Speak to a Clinician from our service Immediately – Refused Emergency Treatment Centre Disposition ",
"Speak to a Clinician from our service Immediately – Refused Primary Care Service Disposition ",
"Speak to a Clinician from our service Immediately – Refused Disposition ",
"Speak to a Clinician from our service immediately – Toxic Ingestion/Inhalation ",
"Speak to a Clinician from our service immediately – Frequent Caller ",
"Speak to a Clinician from our service immediately – Chemical Eye Splash (Green 3)",
"Speak to a Clinician from our service Immediately – Management of Dying Individual (Expected) (Green 3)",
"Speak to a Clinician from our service Immediately - Failed Contraception ",
"Speak to a Clinician from our Service Immediately - Burn Chemical",
"Speak to a Clinician from our service Immediately Management of Palliative Care",
"Speak to a Clinician from our service Immediately - Ambulance Validation",
"Speak to a Clinician from our service Immediately - Emergency Treatment Centre Validation",
"Speak to a Clinician from our service Immediately - Other Disposition Validation",
"Paramedic requesting callback from Healthcare Professional within 30mins",
"Speak to an Assessor Immediately for Assessment of Symptoms",
"Speak to Clinician from our service within 30 minutes ", "Speak to Clinician from our service within 2 hours ",
"Speak to Clinician from our service for home management advice ",
"Symptom Management Advice ", "Child protection Vulnerable Adult immediate referral ",
"Child protection / Vulnerable Adult non immediate referral",
"Provide Service Location Information ", "Refer to Health Information within 24 hours ",
"Refer to a Community Healthcare Professional ", "Refer to another Out-Of-Hours Service Provider ",
"999 for police (Green 4)", "Speak to Midwife or Labour Suite immediately ",
"Speak to Midwife within 2 hours ", "The call is closed with referral to the Police only ",
"No Service Clinician available refer for urgent (20 minutes ) Primary Care Clinical Assessment.",
"No Service Clinician available refer for urgent 60 minutes primary care clinical assessment",
"Contact Optician next routine appointment within 72 Hours (3 days from now) ",
"Refer to Flu line ", "Speak to the Primary Care Service within 2 hours for antiviral assessment ",
"Refer To Social Services Immediately ", "Refer To Social Services Routinely ",
"MUST contact own GP Practice within 3 working days ", "Call back by Healthcare Professional within 30 minutes ",
"Call back by Healthcare Professional within 60 minutes ", "Receive report of results or tests from laboratory ",
"Repeat Prescription required within 6 hours ", "Contact own GP Practice next working day for a repeat prescription ",
"Medication Enquiry ", "Clinician Home Management of Dying Individual (Expected) ",
"Refer to Another Agency ", "Repeat prescription required within 2 hours ",
"Repeat prescription required within 12 hours", "Repeat prescription required within 24 hours ",
"Speak to a Dental Service within 2 hours", "Attend Emergency Treatment Centre within 12 hours ",
"Unexpected death ", "Refer to Mental Health/Crisis Service within 4 hours",
"Speak to the GP Practice within 1 hour (3 calls within 4 days)",
"Attend Emergency Treatment Centre within 1 hour for Sexual Assault Assessment ",
"This call is closed with no further action required wrong service called ",
"Refer to Health Information within 12 hours ", "Emergency Contraception required within 2 hours ",
"Emergency Contraception required within 12 hours ", "Emergency Ambulance Response (Category 3)",
"Emergency Ambulance Response (Category 3)", "Emergency Ambulance Response (Category 3)",
"Emergency Ambulance Response (Category 3)", "Emergency Ambulance Response for Trauma Emergency (Category 3)",
"Emergency Ambulance Response for Pregnancy/Labour problem (Category 3)",
"Non-emergency Ambulance Response", "Speak to an Assessor Immediately for Assessment of Significant Blood Loss",
"Speak to an Assessor Immediately for Assessment of Breathing Difficulties ",
"Speak to an Assessor Immediately for Assessment of Potential Critical Illness",
"Speak to an Assessor Immediately for Assessment of Potential Life Threatening Shock",
"Speak to an Assessor Immediately for Symptomatic Assessment",
"Speak to an Assessor Immediately for Assessment of Chest Pain",
"Speak to an Assessor Immediately for Assessment of Major Trauma",
"Speak to an Assessor Immediately for Assessment of Head Injury",
" Speak to an Assessor Immediately for Assessment of Probable Stroke or Mini-Stroke",
"Speak to an Assessor Immediately for Assessment of Possible Allergic Reaction"
)), class = "data.frame", row.names = c(NA, -239L))
What about merging the datasets?
In this case, a left join:
merged <- merge(x = df1, y = codesDesc, by = "dx", all.x = TRUE)
head(merged)
      dx freq disposition
1  Dx010   24 \n\nEmergency Ambulance Response for Potential Cardiac Arrest \n\n
2 Dx0101   20 Emergency Ambulance Response for Potential Cardiac Arrest
3 Dx0103    6 Emergency Ambulance response for Fitting Now
4 Dx0104    2 Emergency Ambulance Response for Major Blood Loss
5 Dx0105   76 Emergency Ambulance Response for Potential Shock
6 Dx0106   90 Emergency Ambulance Response for Respiratory Distress
Or using dplyr:
library(dplyr)
k <- df1 %>% left_join(codesDesc)
Note that some codes appear more than once in codesDesc, so the result has more rows than df1. To list the duplicated codes:
library(dplyr)
double_ <- as.data.frame.table(table( codesDesc$dx)) %>% filter(Freq >= 2)
And some of these duplicated codes occur in df1:
df1[df1$dx %in% double_$Var1,]
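
The original attempts fail because %in% returns a logical vector as long as df1, which is then used to subset a column of codesDesc, so the result has the wrong length. match() returns positions instead and gives exactly one value per row of df1. A minimal sketch, assuming toy data (the descriptions are placeholders), a hypothetical clean_code helper, and that keeping the first description for a duplicated code is acceptable:

```r
# Toy data with the same kinds of problems as codesDesc:
# a trailing newline, an upper-case prefix, and a duplicated code
df1 <- data.frame(dx = c("Dx010", "Dx12", "Dx97", "Dx999"),
                  freq = c(24L, 557L, 2L, 1L))
codesDesc <- data.frame(
  dx = c("Dx010", "Dx12\n", "DX97", "Dx12"),
  disposition = c("desc A", "desc B", "desc C", "desc D"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: strip surrounding whitespace and normalise the prefix
clean_code <- function(x) sub("^DX", "Dx", trimws(as.character(x)))

lookup <- codesDesc
lookup$dx <- clean_code(lookup$dx)
lookup <- lookup[!duplicated(lookup$dx), ]   # keep the first description per code

# match() gives, for each df1 code, its row in lookup (NA when absent)
df1$desc <- lookup$disposition[match(clean_code(df1$dx), lookup$dx)]
```

Unlike the join, this never adds rows to df1; codes without a description simply get NA.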

How to correctly write a csv or text file in R

I am trying to save a csv file or a text file in R.
One column has paragraphs that contain commas, so when I write:
write.table(x, file = "D:/text.csv", sep = ",", row.names = FALSE)
y <- read.csv(file = "D:/text.csv")
It writes the file, but when I read it back in I go from 50 rows to 57 rows. I understand that it's probably because of the sep = "," argument, and I could change it to "|", but these files are really large and the text column is likely to contain many different characters, so no single separator would be safe.
How can I overcome this issue?
Data:
x <- structure(list(Document = c("https://www.sec.gov/Archives/edgar/data/718877/000104746916010584/a2227483z10-k.htm.1",
"https://www.sec.gov/Archives/edgar/data/718877/000104746916010584/a2227483z10-k.htm.2",
"https://www.sec.gov/Archives/edgar/data/718877/000104746916010584/a2227483z10-k.htm.3",
"https://www.sec.gov/Archives/edgar/data/718877/000104746916010584/a2227483z10-k.htm.4",
"https://www.sec.gov/Archives/edgar/data/718877/000104746916010584/a2227483z10-k.htm.5",
"https://www.sec.gov/Archives/edgar/data/718877/000104746916010584/a2227483z10-k.htm.6",
"https://www.sec.gov/Archives/edgar/data/718877/000104746916010584/a2227483z10-k.htm.7",
"https://www.sec.gov/Archives/edgar/data/718877/000104746916010584/a2227483z10-k.htm.8",
"https://www.sec.gov/Archives/edgar/data/718877/000104746916010584/a2227483z10-k.htm.9",
"https://www.sec.gov/Archives/edgar/data/718877/000104746916010584/a2227483z10-k.htm.10",
"https://www.sec.gov/Archives/edgar/data/718877/000104746916010584/a2227483z10-k.htm.11",
"https://www.sec.gov/Archives/edgar/data/718877/000104746916010584/a2227483z10-k.htm.12",
"https://www.sec.gov/Archives/edgar/data/718877/000104746916010584/a2227483z10-k.htm.13",
"https://www.sec.gov/Archives/edgar/data/718877/000104746916010584/a2227483z10-k.htm.14",
"https://www.sec.gov/Archives/edgar/data/718877/000104746916010584/a2227483z10-k.htm.15",
"https://www.sec.gov/Archives/edgar/data/718877/000104746916010584/a2227483z10-k.htm.16",
"https://www.sec.gov/Archives/edgar/data/718877/000104746916010584/a2227483z10-k.htm.17",
"https://www.sec.gov/Archives/edgar/data/718877/000104746916010584/a2227483z10-k.htm.18",
"https://www.sec.gov/Archives/edgar/data/718877/000104746916010584/a2227483z10-k.htm.19",
"https://www.sec.gov/Archives/edgar/data/718877/000104746916010584/a2227483z10-k.htm.20",
"https://www.sec.gov/Archives/edgar/data/718877/000104746916010584/a2227483z10-k.htm.21",
"https://www.sec.gov/Archives/edgar/data/718877/000104746916010584/a2227483z10-k.htm.22",
"https://www.sec.gov/Archives/edgar/data/718877/000104746916010584/a2227483z10-k.htm.23",
"https://www.sec.gov/Archives/edgar/data/718877/000104746916010584/a2227483z10-k.htm.24",
"https://www.sec.gov/Archives/edgar/data/718877/000104746916010584/a2227483z10-k.htm.25",
"https://www.sec.gov/Archives/edgar/data/718877/000104746916010584/a2227483z10-k.htm.26",
"https://www.sec.gov/Archives/edgar/data/718877/000104746916010584/a2227483z10-k.htm.27",
"https://www.sec.gov/Archives/edgar/data/718877/000104746916010584/a2227483z10-k.htm.28",
"https://www.sec.gov/Archives/edgar/data/718877/000104746916010584/a2227483z10-k.htm.29",
"https://www.sec.gov/Archives/edgar/data/718877/000104746916010584/a2227483z10-k.htm.30",
"https://www.sec.gov/Archives/edgar/data/718877/000104746916010584/a2227483z10-k.htm.31",
"https://www.sec.gov/Archives/edgar/data/718877/000104746916010584/a2227483z10-k.htm.32",
"https://www.sec.gov/Archives/edgar/data/718877/000104746916010584/a2227483z10-k.htm.33",
"https://www.sec.gov/Archives/edgar/data/718877/000104746916010584/a2227483z10-k.htm.34",
"https://www.sec.gov/Archives/edgar/data/718877/000104746916010584/a2227483z10-k.htm.35",
"https://www.sec.gov/Archives/edgar/data/718877/000104746916010584/a2227483z10-k.htm.36",
"https://www.sec.gov/Archives/edgar/data/718877/000104746916010584/a2227483z10-k.htm.37",
"https://www.sec.gov/Archives/edgar/data/718877/000104746916010584/a2227483z10-k.htm.38",
"https://www.sec.gov/Archives/edgar/data/718877/000104746916010584/a2227483z10-k.htm.39",
"https://www.sec.gov/Archives/edgar/data/718877/000104746916010584/a2227483z10-k.htm.40",
"https://www.sec.gov/Archives/edgar/data/718877/000104746916010584/a2227483z10-k.htm.41",
"https://www.sec.gov/Archives/edgar/data/718877/000104746916010584/a2227483z10-k.htm.42",
"https://www.sec.gov/Archives/edgar/data/718877/000104746916010584/a2227483z10-k.htm.43",
"https://www.sec.gov/Archives/edgar/data/718877/000104746916010584/a2227483z10-k.htm.44",
"https://www.sec.gov/Archives/edgar/data/718877/000104746916010584/a2227483z10-k.htm.45",
"https://www.sec.gov/Archives/edgar/data/718877/000104746916010584/a2227483z10-k.htm.46",
"https://www.sec.gov/Archives/edgar/data/718877/000104746916010584/a2227483z10-k.htm.47",
"https://www.sec.gov/Archives/edgar/data/718877/000104746916010584/a2227483z10-k.htm.48",
"https://www.sec.gov/Archives/edgar/data/718877/000104746916010584/a2227483z10-k.htm.49",
"https://www.sec.gov/Archives/edgar/data/718877/000104746916010584/a2227483z10-k.htm.50"
), text = c("Use these links to rapidly review the document TABLE OF CONTENTS Item 8. FINANCIAL STATEMENTS AND SUPPLEMENTARY DATA",
"Table of Contents", "UNITED STATES SECURITIES AND EXCHANGE COMMISSION Washington, D.C. 20549",
"FORM 10-K", "(Mark one) ý ANNUAL REPORT PURSUANT TO SECTION 13 OR 15(d) OF THE SECURITIES EXCHANGE ACT OF 1934 For the Fiscal Year Ended December 31, 2015 OR o TRANSITION REPORT PURSUANT TO SECTION 13 OR 15(d) OF THE SECURITIES EXCHANGE ACT OF 1934 For the transition period from to",
"Commission File Number 1-15839", "ACTIVISION BLIZZARD, INC. (Exact name of registrant as specified in its charter)",
"Delaware (State or other jurisdiction of incorporation or organization) 95-4803544 (I.R.S. Employer Identification No.) 3100 Ocean Park Boulevard, Santa Monica, CA (Address of principal executive offices) 90405 (Zip Code)",
"Registrant's telephone number, including area code: (310) 255-2000",
"Securities registered pursuant to Section 12(b) of the Act:",
"Title of each Class Name of Each Exchange on Which Registered Common Stock, par value $.000001 per share The NASDAQ Global Select Market",
"Securities registered pursuant to Section 12(g) of the Act: None",
"Indicate by check mark if the registrant is a well-known seasoned issuer, as defined in Rule 405 of the Securities Act. Yes ý No o",
"Indicate by check mark if the registrant is not required to file reports pursuant to Section 13 or Section 15 (d) of the Act. Yes o No ý",
"Indicate by check mark whether the registrant (1) has filed all reports required to be filed by Section 13 or 15(d) of the Securities Exchange Act of 1934 during the preceding 12 months (or for such shorter period that the registrant was required to file such reports), and (2) has been subject to such filing requirements for the past 90 days. Yes ý No o",
"Indicate by check mark whether the registrant has submitted electronically and posted on its corporate Web site, if any, every Interactive Data File required to be submitted and posted pursuant to Rule 405 of Regulation S-T (§ 232.405 of this chapter) during the preceding 12 months (or for such shorter period that the registrant was required to submit and post such files). Yes ý No o",
"Indicate by check mark if disclosure of delinquent filers pursuant to Item 405 of Regulation S-K is not contained herein, and will not be contained, to the best of the registrant's knowledge, in definitive proxy or information statements incorporated by reference in Part III of this Form 10-K or any amendment to this Form 10-K. ý",
"Indicate by check mark whether the registrant is a large accelerated filer, an accelerated filer, a non-accelerated filer, or a smaller reporting company. See the definitions of \"large accelerated filer,\" \"accelerated filer,\" and \"smaller reporting company\" in Rule 12b-2 of the Exchange Act.",
"Large Accelerated Filer ý Accelerated Filer o Non-accelerated Filer o (Do not check if a smaller reporting company) Smaller Reporting Company o",
"Indicate by check mark whether the registrant is a shell company (as defined in Rule 12b-2 of the Act). Yes o No ý",
"The aggregate market value of the registrant's Common Stock held by non-affiliates on June 30, 2015 (based on the closing sale price as reported on the NASDAQ) was $13,345,675,247.",
"The number of shares of the registrant's Common Stock outstanding at February 22, 2016 was 734,998,115.",
"Documents Incorporated by Reference", "Portions of the registrant's definitive Proxy Statement, to be filed with the Securities and Exchange Commission with respect to the 2016 Annual Meeting of Shareholders which is expected to be held on June 2, 2016, are incorporated by reference into Part III of this Annual Report.",
"Table of Contents", "ACTIVISION BLIZZARD, INC. AND SUBSIDIARIES Table of Contents",
"Page No. PART I. 3 Cautionary Statement 3 Item 1. Business Item 1A. Risk Factors 15 Item 1B. Unresolved Staff Comments 40 Item 2. Properties 40 Item 3. Legal Proceedings 40 Item 4. Mine Safety Disclosures 41 PART II. 42 Item 5. Market for Registrant's Common Equity, Related Stockholder Matters, and Issuer Purchases of Equity Securities 42 Item 6. Selected Financial Data 45 Item 7. Management's Discussion and Analysis of Financial Condition and Results of Operations 46 Item 7A. Quantitative and Qualitative Disclosures about Market Risk 83 Item 8. Financial Statements and Supplementary Data 86 Item 9. Changes in and Disagreements with Accountants on Accounting and Financial Disclosure 86 Item 9A. Controls and Procedures 86 Item 9B. Other Information 87 PART III. 88 Item 10. Directors, Executive Officers, and Corporate Governance 88 Item 11. Executive Compensation 88 Item 12. Security Ownership of Certain Beneficial Owners and Management and Related Stockholder Matters 88 Item 13. Certain Relationships and Related Transactions, and Director Independence 88 Item 14. Principal Accounting Fees and Services 88 PART IV. 89 Item 15. Exhibits, Financial Statement Schedule 89 SIGNATURES 90 Exhibit Index E-1",
"2", "Table of Contents", "PART I", "CAUTIONARY STATEMENT", "This Annual Report on Form 10-K contains, or incorporates by reference, certain forward-looking statements within the meaning of the Private Securities Litigation Reform Act of 1995. Such statements consist of any statement other than a recitation of historical facts and include, but are not limited to: (1) projections of revenues, expenses, income or loss, earnings or loss per share, cash flow or other financial items; (2) statements of our plans and objectives, including those relating to product releases; (3) statements of future financial or operating performance; (4) statements relating to the acquisition of King Digital Entertainment plc and expected impact of that transaction, including without limitation, the expected impact on Activision Blizzard's future financial results; and (5) statements of assumptions underlying such statements. Activision Blizzard, Inc. generally uses words such as \"outlook,\" \"forecast,\" \"will,\" \"could,\" \"should,\" \"would,\" \"to be,\" \"plan,\" \"plans,\" \"believes,\" \"may,\" \"might,\" \"expects,\" \"intends,\" \"intends as,\" \"anticipates,\" \"estimate,\" \"future,\" \"positioned,\" \"potential,\" \"project,\" \"remain,\" \"scheduled,\" \"set to,\" \"subject to,\" \"upcoming\" and other similar expressions to help identify forward-looking statements. Forward-looking statements are subject to business and economic risks, reflect management's current expectations, estimates and projections about our business, and are inherently uncertain and difficult to predict. Our actual results could differ materially from expectations stated in forward-looking statements. Some of the risk factors that could cause our actual results to differ from those stated in forward-looking statements can be found in \"Risk Factors\" included in Part I, Item 1A of this Report. 
The forward-looking statements contained herein are based upon information available to us as of the date of this Annual Report on Form 10-K and we assume no obligation to update any such forward-looking statements. Although these forward-looking statements are believed to be true when made, they may ultimately prove to be incorrect. These statements are not guarantees of our future performance and are subject to risks, uncertainties and other factors, some of which are beyond our control and may cause actual results to differ materially from current expectations.",
"Activision Blizzard Inc.'s names, abbreviations thereof, logos, and product and service designators are all either the registered or unregistered trademarks or trade names of Activision Blizzard. All other product or service names are the property of their respective owners.",
"Overview", "Activision Blizzard, Inc. is a worldwide developer and publisher of online, personal computer (\"PC\"), video game console, handheld, mobile and tablet games. The terms \"Activision Blizzard,\" the \"Company,\" \"we,\" \"us,\" and \"our\" are used to refer collectively to Activision Blizzard, Inc. and its subsidiaries. We currently offer games that operate on the Microsoft Corporation (\"Microsoft\") Xbox One (\"Xbox One\") and Xbox 360 (\"Xbox 360\"), Nintendo Co. Ltd. (\"Nintendo\") Wii U (\"Wii U\") and Wii (\"Wii\"), and Sony Computer Entertainment Inc. (\"Sony\") PlayStation 4 (\"PS4\") and PlayStation 3 (\"PS3\") console systems (Xbox One, Wii U, and PS4 are collectively referred to as \"next-generation\"; Xbox 360, Wii, and PS3 are collectively referred to as \"prior-generation\"); the PC; the Nintendo 3DS, Nintendo Dual Screen and Sony PlayStation Vita handheld game systems; and mobile and tablet devices.",
"Activision - Through Activision Publishing, Inc. (\"Activision\"), we are a leading international developer and publisher of interactive software products and content. Activision develops, markets and sells products through retail channels or digital downloads, which are principally based on our internally developed intellectual properties, as well as some licensed properties. Activision delivers content to a broad range of gamers, ranging from children to adults, and from core gamers to mass-market consumers to \"value\" buyers seeking budget-priced software, in a variety of geographies. Activision continues to focus its efforts in the areas we believe have the most opportunity for growth and higher profitability, while reducing investments in areas we believe have less profit potential and",
"3", "Table of Contents", "limited growth opportunities. To that end, investments are focused on proven intellectual properties to develop deep, high-quality content that offers engaging online gaming experiences. One of Activision's leading franchises is Call of Duty®, which launched in 2003, and has been the best-selling Western interactive franchise since its launch. In 2015, Activision released the latest installment in the franchise, Call of Duty: Black Ops III, which, according to The NPD Group, GfK Chart-Track, and Activision Blizzard internal estimates, was the #1 best-selling console game globally in 2015. Activision is currently developing, distributing, and selling additional digital content for the global community of Call of Duty: Black Ops III players, along with content for the other Call of Duty titles, in addition to developing future releases and sequels.",
"Another leading franchise for Activision is Skylanders®, which launched in 2011 with the release of Skylanders Spyro's Adventure. Games in the Skylanders franchise combine the use of physical toys with digital interactive experiences to deliver innovative gameplay to our audience. In September 2015, we released Skylanders SuperChargers, which introduced vehicles-to-life - an entirely new way for fans to experience the magic of Skylanders.",
"While focusing on proven intellectual properties is one of Activision's priorities, we also continue to make strategic investments in developing new intellectual properties that we believe have the potential for long-term growth and success. For example, on September 15, 2015, we released The Taken King, the third and largest expansion to Destiny, the game universe created by Bungie under our long-term alliance with them. We also introduced microtransactions within Destiny in October 2015 and expect to release additional content to our global community of Destiny players in 2016.",
"Blizzard - Blizzard Entertainment, Inc. (\"Blizzard\") is a leader in online PC gaming, including the subscription-based massively multi-player online role-playing game (\"MMORPG\") category in terms of both subscriber base and revenues generated through its World of Warcraft® franchise. Blizzard also develops, markets, and sells role-playing action and strategy games for the PC, console, mobile and tablet platforms, including games in the multiple-award winning Diablo®, StarCraft®, Hearthstone®: Heroes of Warcraft<U+0099> and Heroes of the Storm<U+0099> franchises. In addition, Blizzard maintains a proprietary online gaming service, Battle.net®, which facilitates the creation of user-generated content, digital distribution and online social connectivity across all Blizzard games. Blizzard distributes its products and generates revenues worldwide through various means, including: subscriptions; sales of prepaid subscription cards; in-game purchases and services; retail sales of physical \"boxed\" products; online download sales of PC products; purchases and downloads via third-party console, mobile and tablet platforms; and licensing of software to third-party or related party companies that distribute Blizzard products.",
"Blizzard has released five expansion packs to the epic World of Warcraft franchise since 2004, with the most recent release, World of Warcraft: Warlords of Draenor®, having been released in November 2014, and the next expansion, World of Warcraft: Legion<U+0099>, to be released in the summer of 2016. For Hearthstone: Heroes of Warcraft, in addition to bringing the game to iOS and Android smartphones in April 2015, three new content releases, Blackrock Mountain<U+0099>, The Grand Tournament<U+0099>, and The League of Explorers<U+0099>, were introduced in 2015 and have continued to drive performance.",
"Blizzard continues to invest in new opportunities, both by leveraging its internally developed intellectual property, such as the release of Heroes of the Storm in 2015, as well as developing new intellectual property with the upcoming team-based first person shooter, Overwatch<U+0099>, which is expected to be released commercially in the spring of 2016.",
"Other - We also engage in other business opportunities including:",
"<U+0095> The Activision Blizzard Media Networks (\"Media Networks\") business, announced in 2015, which builds on our efforts in competitive gaming and the growing eSports industry;",
"4", "Table of Contents", "<U+0095> The Activision Blizzard Studios (\"Studios\") business, announced in 2015, which is devoted to creating original film and television content based on the company's extensive library of iconic and globally-recognized intellectual properties; and <U+0095> The Activision Blizzard Distribution (\"Distribution\") business, which consists of operations in Europe that provide warehousing, logistical, and sales distribution services to third-party publishers of interactive entertainment software, our own publishing operations, and manufacturers of interactive entertainment hardware.",
"Revenues associated with the Call of Duty, World of Warcraft, Skylanders, and Destiny franchises combined accounted for 71%, 72%, and 80% of our consolidated net revenues for the years ended December 31, 2015, 2014, and 2013, respectively."
), part.name = c("", "", "", "", "", "", "", "", "", "", "",
"", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "",
"", "", "PART I", "PART I", "PART I", "PART I", "PART I", "PART I",
"PART I", "PART I", "PART I", "PART I", "PART I", "PART I", "PART I",
"PART I", "PART I", "PART I", "PART I", "PART I", "PART I", "PART I",
"PART I"), item.name = c("", "", "", "", "", "", "", "", "",
"", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "",
"", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "",
"", "", "", "", "", "", "", "", ""), Documentshort = c("a2227483z10-k",
"a2227483z10-k", "a2227483z10-k", "a2227483z10-k", "a2227483z10-k",
"a2227483z10-k", "a2227483z10-k", "a2227483z10-k", "a2227483z10-k",
"a2227483z10-k", "a2227483z10-k", "a2227483z10-k", "a2227483z10-k",
"a2227483z10-k", "a2227483z10-k", "a2227483z10-k", "a2227483z10-k",
"a2227483z10-k", "a2227483z10-k", "a2227483z10-k", "a2227483z10-k",
"a2227483z10-k", "a2227483z10-k", "a2227483z10-k", "a2227483z10-k",
"a2227483z10-k", "a2227483z10-k", "a2227483z10-k", "a2227483z10-k",
"a2227483z10-k", "a2227483z10-k", "a2227483z10-k", "a2227483z10-k",
"a2227483z10-k", "a2227483z10-k", "a2227483z10-k", "a2227483z10-k",
"a2227483z10-k", "a2227483z10-k", "a2227483z10-k", "a2227483z10-k",
"a2227483z10-k", "a2227483z10-k", "a2227483z10-k", "a2227483z10-k",
"a2227483z10-k", "a2227483z10-k", "a2227483z10-k", "a2227483z10-k",
"a2227483z10-k"), companyID = c("718877", "718877", "718877",
"718877", "718877", "718877", "718877", "718877", "718877", "718877",
"718877", "718877", "718877", "718877", "718877", "718877", "718877",
"718877", "718877", "718877", "718877", "718877", "718877", "718877",
"718877", "718877", "718877", "718877", "718877", "718877", "718877",
"718877", "718877", "718877", "718877", "718877", "718877", "718877",
"718877", "718877", "718877", "718877", "718877", "718877", "718877",
"718877", "718877", "718877", "718877", "718877")), .Names = c("Document",
"text", "part.name", "item.name", "Documentshort", "companyID"
), row.names = c(NA, 50L), class = "data.frame")
First convert your data to a data.frame, then try this:
write.csv(df, file = 'df.csv', row.names = FALSE)
The file will then be stored in your working directory.
How about data.table?
library(data.table)
fwrite(as.data.table(x), "text.csv")
Use fread() to read it back.
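A likely cause of the extra rows (an assumption, not confirmed in the thread) is quoting rather than the separator: write.table defaults to qmethod = "escape", which writes an embedded double quote as \", while read.csv expects it doubled as "". write.csv uses qmethod = "double", which may be why the write.csv answer works. The sketch below round-trips a small made-up data frame (x_demo is hypothetical) in base R.

```r
# Hypothetical sample: text containing commas, double quotes, and a line break
x_demo <- data.frame(
  id = 1:4,
  text = c("plain text",
           "commas, inside, the text",
           "he said \"outlook,\" then left",
           "first line\nsecond line"),
  stringsAsFactors = FALSE
)

path <- tempfile(fileext = ".csv")

# qmethod = "double" writes an embedded quote as "", the form read.csv expects
write.table(x_demo, file = path, sep = ",", row.names = FALSE,
            qmethod = "double")

# Read the file back and compare with the original
back <- read.csv(path, stringsAsFactors = FALSE)
nrow(back) == nrow(x_demo)
identical(back$text, x_demo$text)
```

Because every character field is quoted, the commas inside the text do not need a different separator.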

R Text Mining - the most frequent word in string across entire data frame

I am struggling to grasp text mining and determine word frequencies. I am just starting to understand R and its packages, and I just found out about tm (after reading for a while, I have a feeling that it might solve my problem).
My question is: how can I determine the two most frequently used words in a string across the entire column?
I have the following example:
structure(list(Location = c("Chicago", "Chicago", "Chicago",
"LA", "LA", "LA", "LA", "LA", "LA", "Texas", "Texas", "Texas",
"Texas", "Texas"), Code = c(4450L, 4450L, 4450L, 4450L, 4450L,
4450L, 4450L, 4450L, 4450L, 4410L, 4410L, 4410L, 4410L, 4410L
), Description = c("LABOR - CROSSOVER BOARD BRACKET", "LABOR - CROWN DOOR GASKET",
"LABOR - CROWN DOOR GASKET - APPLY 4' NEW GASKET AND RE-APPLY",
"LABOR - CUSHIONING DEVICE - END OF CAR CUSTOMER SUPPLIED MATERIAL",
"LABOR - DOOR EDGE", "LABOR - DOOR GASKET, CROWN CORNER", "LABOR - DOOR LOCK POCKET STG",
"LABOR - DOOR LOCK RECEPTICALS STG", "LABOR - DOOR LOCK STG",
"BOLT, HT, UNDER 5/8\"\" DIA & 6\"\" - SIDE POST", "BOLT, HT, UNDER 5/8\"\" DIA & 6\"\" - TRAINLINE TROLLEY",
"BOLT,HT,5/8 IN.DIA.OR LESS UNDER 6\"\" LONG - BRAKE STEP", "BOLT,HT,5/8 IN.DIA.OR LESS UNDER 6\"\" LONG - CROSSOVER BOARD",
"BOLT,HT,5/8 IN.DIA.OR LESS UNDER 6\"\" LONG - CROSSOVER BOARD BRACKET"
), `Desired Description Based on frequency` = c("Labor - CROWN DOOR GASKET",
"Labor - CROWN DOOR GASKET", "Labor - CROWN DOOR GASKET", "Labor - DOOR LOCK",
"Labor - DOOR LOCK", "Labor - DOOR LOCK", "Labor - DOOR LOCK",
"Labor - DOOR LOCK", "Labor - DOOR LOCK", "Bolt - HT", "Bolt - HT",
"Bolt - HT", "Bolt - HT", "Bolt - HT")), .Names = c("Location",
"Code", "Description", "Desired Description Based on frequency"
), row.names = c(NA, -14L), class = "data.frame")
In the end I wish I could add this column:
Desired Description Based on frequency
Labor - CROWN DOOR GASKET
Labor - CROWN DOOR GASKET
Labor - CROWN DOOR GASKET
Labor - DOOR LOCK
Labor - DOOR LOCK
Labor - DOOR LOCK
Labor - DOOR LOCK
Labor - DOOR LOCK
Labor - DOOR LOCK
Bolt - HT
Bolt - HT
Bolt - HT
Bolt - HT
Bolt - HT
Basically, I want to evaluate all the rows with code 4450 or 4410, see which words are the most common across all the descriptions in the table, and add that as a column. Another condition would be based on the location. Can someone please help me with a simple example?
Thank you so much
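I don't think there's a one-size-fits-all solution to your problem, not least because there's no exact rule for which or how many words to take for the description. However, here are two quick-and-dirty approaches that might be helpful as a starting point:
library(tm)
# df is the data frame from the question
# Replace everything except capital letters with spaces
txts <- gsub("[^A-Z]", " ", df$Description)
# One group per Location/Code combination
groups <- paste(df$Location, df$Code)
# 1: the three most frequent terms per group
opts <- list(tolower = FALSE, removePunctuation = TRUE, wordLengths = c(2, Inf))
lst <- split(txts, groups)
res <- sapply(lst, function(x) {
  # Term frequency over the whole group, relative to the number of rows
  freq <- termFreq(paste(x, collapse = " "), control = opts) / length(x)
  paste(names(freq[rank(-freq, ties.method = "first") <= 3]), collapse = " - ")
})
rep(res, lengths(lst))
# 2: among the first three words of each description, keep those found in at least half of the group's rows
lst <- lapply(strsplit(txts, "\\s+"), function(x) x[1:min(c(3, length(x)))])
lst <- split(lst, groups)
n <- lengths(lst)
# SIMPLIFY = FALSE keeps a list even when all groups have the same number of words
lst <- mapply("/", lapply(lst, function(x) sort(table(unlist(x)), decreasing = TRUE)), n, SIMPLIFY = FALSE)
rep(sapply(lst, function(x) paste(names(x)[x >= .5], collapse = " - ")), n)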
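For comparison, the same idea can be sketched in base R without tm: count words per group with table() and write the top ones back to every row with ave(). The data frame df_demo and the helper top_words are hypothetical, and taking exactly two words is an arbitrary choice for illustration.

```r
# Hypothetical miniature of the question's data
df_demo <- data.frame(
  Location = c("LA", "LA", "LA", "Texas", "Texas", "Texas"),
  Description = c("LABOR - DOOR LOCK", "LABOR - DOOR EDGE",
                  "LABOR - CROWN GASKET",
                  "BOLT HT SIDE POST", "BOLT HT BRAKE STEP", "BOLT TROLLEY"),
  stringsAsFactors = FALSE
)

# Return the n most frequent words in a vector of descriptions
top_words <- function(descriptions, n = 2) {
  # Keep capital letters only, then split on whitespace
  words <- unlist(strsplit(gsub("[^A-Z]", " ", descriptions), "\\s+"))
  words <- words[nzchar(words)]
  counts <- sort(table(words), decreasing = TRUE)
  paste(names(counts)[seq_len(min(n, length(counts)))], collapse = " - ")
}

# ave() applies top_words within each Location and recycles the result per row
df_demo$Summary <- ave(df_demo$Description, df_demo$Location, FUN = top_words)
df_demo
```

To group by both location and code, as in the question, a second grouping vector can be passed to ave() before FUN.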

Resources