How to split Character Columns into multiple columns and then into binary in R? - r

I have a data set with around 4000 observations.
It looks like this:
View(transactions)
CustomerID Description
12346 MEDIUM CERAMIC TOP STORAGE JAR
12347 c("BLACK CANDLEABRA HOLDER","AIRLINE BAG VINTAGE JET SET BROWN", ...)
12348 c("72 SWEETHEART FAIRY CAKE CASES","60 CAKE CASES DOLLY GIRL DESIGN","PACK OF 72 SKULL CAKE CASES",...)
12349 c("PARISIENNE CURIO CABINET","SWEETHEART WALL TIDY",...)
12350 c("CHOCOLATE THIS WAY METAL SIGN","RETRO MOD TRAY",...)
12352 c("CERAMIC CAKE STAND + HANGING CAKES","WOODEN HAPPY BIRTHDAY GARLAND", ...)
As a first step, I want it to look like this:
CustomerID PRODUCT_1 PRODUCT_2 PRODUCT_N
12346 MEDIUM CERAMIC TOP STORAGE JAR
12347 BLACK CANDLEABRA HOLDER AIRLINE BAG VINTAGE JET SET BROWN ...
12348 72 SWEETHEART FAIRY CAKE CASES 60 CAKE CASES DOLLY GIRL DESIGN PACK OF 72 SKULL CAKE CASES
12349 PARISIENNE CURIO CABINET SWEETHEART WALL TIDY ...
12350 CHOCOLATE THIS WAY METAL SIGN RETRO MOD TRAY ...
12352 CERAMIC CAKE STAND + HANGING CAKES WOODEN HAPPY BIRTHDAY GARLAND ...
I tried it with cSplit, but I don't know how to apply it to this data set.
In the last step, I would like to get a binary matrix like this:
       MEDIUM CERAMIC TOP STORAGE JAR  BLACK CANDLEABRA HOLDER  AIRLINE BAG VINTAGE JET SET BROWN
12346  1                               0                        0
12347  0                               1                        1
12348  0                               0                        0
12349  0                               0                        0
12350  0                               0                        0
12352  0                               0                        0
It would help me a lot if anyone can solve this problem.
Many thanks,
Marre
Edit:
dput(droplevels(head(transactions)))
structure(list(CustomerID = c(12346, 12347, 12348, 12349, 12350,
12352), Description = structure(list(`0001` = "MEDIUM CERAMIC TOP STORAGE JAR",
`0002` = c("BLACK CANDELABRA T-LIGHT HOLDER", "AIRLINE BAG VINTAGE JET SET BROWN",
"COLOUR GLASS. STAR T-LIGHT HOLDER", "MINI PAINT SET VINTAGE ",
"CLEAR DRAWER KNOB ACRYLIC EDWARDIAN", "PINK DRAWER KNOB ACRYLIC EDWARDIAN",
"GREEN DRAWER KNOB ACRYLIC EDWARDIAN", "RED DRAWER KNOB ACRYLIC EDWARDIAN",
"PURPLE DRAWERKNOB ACRYLIC EDWARDIAN", "BLUE DRAWER KNOB ACRYLIC EDWARDIAN",
"ALARM CLOCK BAKELIKE CHOCOLATE", "ALARM CLOCK BAKELIKE GREEN",
"ALARM CLOCK BAKELIKE RED ", "ALARM CLOCK BAKELIKE PINK",
"ALARM CLOCK BAKELIKE ORANGE", "FOUR HOOK WHITE LOVEBIRDS",
"BLACK GRAND BAROQUE PHOTO FRAME", "BATHROOM METAL SIGN ",
"LARGE HEART MEASURING SPOONS", "BOX OF 6 ASSORTED COLOUR TEASPOONS",
"BLUE 3 PIECE POLKADOT CUTLERY SET", "RED 3 PIECE RETROSPOT CUTLERY SET",
"PINK 3 PIECE POLKADOT CUTLERY SET", "EMERGENCY FIRST AID TIN ",
"SET OF 2 TINS VINTAGE BATHROOM ", "SET/3 DECOUPAGE STACKING TINS",
"BOOM BOX SPEAKER BOYS", "RED TOADSTOOL LED NIGHT LIGHT",
"3D DOG PICTURE PLAYING CARDS", "BLACK EAR MUFF HEADPHONES",
"CAMOUFLAGE EAR MUFF HEADPHONES", "PINK NEW BAROQUECANDLESTICK CANDLE",
"BLUE NEW BAROQUE CANDLESTICK CANDLE", "BLACK CANDELABRA T-LIGHT HOLDER",
"WOODLAND CHARLOTTE BAG", "AIRLINE BAG VINTAGE JET SET BROWN",
"AIRLINE BAG VINTAGE JET SET WHITE", "SANDWICH BATH SPONGE",
"ALARM CLOCK BAKELIKE CHOCOLATE", "ALARM CLOCK BAKELIKE GREEN",
"ALARM CLOCK BAKELIKE RED ", "ALARM CLOCK BAKELIKE PINK",
"ALARM CLOCK BAKELIKE ORANGE", "SMALL HEART MEASURING SPOONS",
"72 SWEETHEART FAIRY CAKE CASES", "60 TEATIME FAIRY CAKE CASES",
"PACK OF 60 MUSHROOM CAKE CASES", "PACK OF 60 SPACEBOY CAKE CASES",
"TEA TIME OVEN GLOVE", "RED RETROSPOT OVEN GLOVE ", "RED RETROSPOT OVEN GLOVE DOUBLE",
"SET/2 RED RETROSPOT TEA TOWELS ", "REGENCY CAKESTAND 3 TIER",
"BOX OF 6 ASSORTED COLOUR TEASPOONS", "MINI LADLE LOVE HEART RED ",
"CHOCOLATE CALCULATOR", "TOOTHPASTE TUBE PEN", "SET OF 2 TINS VINTAGE BATHROOM ",
"RED TOADSTOOL LED NIGHT LIGHT", "3D DOG PICTURE PLAYING CARDS",
"AIRLINE BAG VINTAGE JET SET WHITE", "AIRLINE BAG VINTAGE JET SET RED",
"AIRLINE BAG VINTAGE TOKYO 78", "AIRLINE BAG VINTAGE JET SET BROWN",
"RED RETROSPOT PURSE ", "ICE CREAM SUNDAE LIP GLOSS", "VINTAGE HEADS AND TAILS CARD GAME ",
"HOLIDAY FUN LUDO", "TREASURE ISLAND BOOK BOX", "WATERING CAN PINK BUNNY",
"RED DRAWER KNOB ACRYLIC EDWARDIAN", "LARGE HEART MEASURING SPOONS",
"SMALL HEART MEASURING SPOONS", "PACK OF 60 DINOSAUR CAKE CASES",
"RED RETROSPOT OVEN GLOVE DOUBLE", "REGENCY CAKESTAND 3 TIER",
"ROSES REGENCY TEACUP AND SAUCER ", "RED TOADSTOOL LED NIGHT LIGHT",
"MINI PAINT SET VINTAGE ", "3D SHEET OF DOG STICKERS", "3D SHEET OF CAT STICKERS",
"SMALL FOLDING SCISSOR(POINTED EDGE)", "GIFT BAG PSYCHEDELIC APPLES",
"SET OF 2 TINS VINTAGE BATHROOM ", "RABBIT NIGHT LIGHT",
"REGENCY TEA STRAINER", "REGENCY TEA PLATE GREEN ", "REGENCY TEA PLATE PINK",
"REGENCY TEA PLATE ROSES ", "REGENCY TEAPOT ROSES ", "REGENCY SUGAR BOWL GREEN",
"REGENCY MILK JUG PINK ", "AIRLINE BAG VINTAGE TOKYO 78",
"AIRLINE BAG VINTAGE JET SET BROWN", "VICTORIAN SEWING KIT",
"NAMASTE SWAGAT INCENSE", "TRIPLE HOOK ANTIQUE IVORY ROSE",
"SMALL HEART MEASURING SPOONS", "3D DOG PICTURE PLAYING CARDS",
"FEATHER PEN,COAL BLACK", "ALARM CLOCK BAKELIKE RED ", "ALARM CLOCK BAKELIKE CHOCOLATE",
"SET OF 60 VINTAGE LEAF CAKE CASES ", "SET 40 HEART SHAPE PETIT FOUR CASES",
"AIRLINE BAG VINTAGE JET SET BROWN", "AIRLINE BAG VINTAGE JET SET RED",
"AIRLINE BAG VINTAGE JET SET WHITE", "AIRLINE BAG VINTAGE TOKYO 78",
"AIRLINE BAG VINTAGE WORLD CHAMPION ", "WOODLAND DESIGN COTTON TOTE BAG",
"WOODLAND CHARLOTTE BAG", "ALARM CLOCK BAKELIKE RED ", "TRIPLE HOOK ANTIQUE IVORY ROSE",
"SINGLE ANTIQUE ROSE HOOK IVORY", "TEA TIME OVEN GLOVE",
"72 SWEETHEART FAIRY CAKE CASES", "60 TEATIME FAIRY CAKE CASES",
"PACK OF 60 DINOSAUR CAKE CASES", "REGENCY CAKESTAND 3 TIER",
"REGENCY MILK JUG PINK ", "3D DOG PICTURE PLAYING CARDS",
"REVOLVER WOODEN RULER ", "VINTAGE HEADS AND TAILS CARD GAME ",
"RED REFECTORY CLOCK ", "MINI LIGHTS WOODLAND MUSHROOMS",
"PINK GOOSE FEATHER TREE 60CM", "MADRAS NOTEBOOK MEDIUM",
"AIRLINE BAG VINTAGE WORLD CHAMPION ", "AIRLINE BAG VINTAGE JET SET BROWN",
"AIRLINE BAG VINTAGE TOKYO 78", "AIRLINE BAG VINTAGE JET SET RED",
"BIRDCAGE DECORATION TEALIGHT HOLDER", "CHRISTMAS METAL TAGS ASSORTED ",
"REGENCY CAKESTAND 3 TIER", "REGENCY TEAPOT ROSES ", "TEA TIME DES TEA COSY",
"TEA TIME KITCHEN APRON", "TEA TIME OVEN GLOVE", "PINK REGENCY TEACUP AND SAUCER",
"GREEN REGENCY TEACUP AND SAUCER", "3D DOG PICTURE PLAYING CARDS",
"RABBIT NIGHT LIGHT", "RED TOADSTOOL LED NIGHT LIGHT", "TREASURE ISLAND BOOK BOX",
"VINTAGE HEADS AND TAILS CARD GAME ", "MINI PLAYING CARDS DOLLY GIRL ",
"MINI PLAYING CARDS SPACEBOY ", "PLAYING CARDS KEEP CALM & CARRY ON",
"REVOLVER WOODEN RULER ", "WOODEN SCHOOL COLOURING SET",
"MINI PAINT SET VINTAGE ", "TRADITIONAL KNITTING NANCY",
"TRIPLE HOOK ANTIQUE IVORY ROSE", "PANTRY HOOK SPATULA",
"PANTRY HOOK BALLOON WHISK ", "PANTRY HOOK TEA STRAINER ",
"ROSES REGENCY TEACUP AND SAUCER ", "ALARM CLOCK BAKELIKE CHOCOLATE",
"ALARM CLOCK BAKELIKE PINK", "ALARM CLOCK BAKELIKE GREEN",
"ALARM CLOCK BAKELIKE RED ", "PACK OF 60 MUSHROOM CAKE CASES",
"PACK OF 60 SPACEBOY CAKE CASES", "SET OF 60 VINTAGE LEAF CAKE CASES ",
"60 TEATIME FAIRY CAKE CASES", "72 SWEETHEART FAIRY CAKE CASES",
"SMALL HEART MEASURING SPOONS", "LARGE HEART MEASURING SPOONS",
"WOODLAND CHARLOTTE BAG", "REGENCY TEA STRAINER", "FOOD CONTAINER SET 3 LOVE HEART ",
"CLASSIC CHROME BICYCLE BELL ", "BICYCLE PUNCTURE REPAIR KIT ",
"BOOM BOX SPEAKER BOYS", "PINK NEW BAROQUECANDLESTICK CANDLE",
"RED TOADSTOOL LED NIGHT LIGHT", "RABBIT NIGHT LIGHT", "WOODLAND CHARLOTTE BAG",
"PINK GOOSE FEATHER TREE 60CM", "CHRISTMAS TABLE SILVER CANDLE SPIKE",
"MINI PLAYING CARDS SPACEBOY ", "MINI PLAYING CARDS DOLLY GIRL "
), `0003` = c("72 SWEETHEART FAIRY CAKE CASES", "60 CAKE CASES DOLLY GIRL DESIGN",
"60 TEATIME FAIRY CAKE CASES", "60 TEATIME FAIRY CAKE CASES",
"PACK OF 72 SKULL CAKE CASES", "PACK OF 72 SKULL CAKE CASES",
"PACK OF 12 LONDON TISSUES ", "PACK OF 12 WOODLAND TISSUES ",
"PACK OF 12 SUKI TISSUES ", "SWEETIES STICKERS", "SET OF 72 SKULL PAPER DOILIES",
"SET OF 72 PINK HEART PAPER DOILIES", "60 CAKE CASES VINTAGE CHRISTMAS",
"60 CAKE CASES VINTAGE CHRISTMAS", "PACK OF 60 PINK PAISLEY CAKE CASES",
"PACK OF 60 PINK PAISLEY CAKE CASES", "POSTAGE", "PACK OF 12 RED RETROSPOT TISSUES ",
"PACK OF 12 HEARTS DESIGN TISSUES ", "MULTI HEARTS STICKERS",
"PACK OF 12 BLUE PAISLEY TISSUES ", "PACK OF 12 SKULL TISSUES",
"POSTAGE", "DOUGHNUT LIP GLOSS ", "ICE CREAM PEN LIP GLOSS ",
"ICE CREAM SUNDAE LIP GLOSS", "SET OF 9 BLACK SKULL BALLOONS",
"POSTAGE", "DOUGHNUT LIP GLOSS ", "ICE CREAM PEN LIP GLOSS ",
"POSTAGE"), `0004` = c("PARISIENNE CURIO CABINET", "SWEETHEART WALL TIDY ",
"PINK HEART SHAPE LOVE BUCKET ", "GINGHAM HEART DOORSTOP RED",
"RED HEART SHAPE LOVE BUCKET ", "FOOD CONTAINER SET 3 LOVE HEART ",
"LARGE HEART MEASURING SPOONS", "DOORMAT HEARTS", "HANGING HEART JAR T-LIGHT HOLDER",
"BROCANTE SHELF WITH HOOKS", "PLASTERS IN TIN VINTAGE PAISLEY ",
"PANTRY MAGNETIC SHOPPING LIST", "RECIPE BOX PANTRY YELLOW DESIGN",
"SET OF 3 CAKE TINS PANTRY DESIGN ", "JAM MAKING SET WITH JARS",
"SET OF 6 SPICE TINS PANTRY DESIGN", "PANTRY CHOPPING BOARD",
"DOORMAT WELCOME TO OUR HOME", "16 PIECE CUTLERY SET PANTRY DESIGN",
"SMALL WHITE RETROSPOT MUG IN BOX ", "BLACK/BLUE POLKADOT UMBRELLA",
"20 DOLLY PEGS RETROSPOT", "SET/4 WHITE RETRO STORAGE CUBES ",
"REGENCY CAKESTAND 3 TIER", "SET/5 RED RETROSPOT LID GLASS BOWLS",
"DOORMAT RED RETROSPOT", "SET/6 RED SPOTTY PAPER CUPS", "RED RETROSPOT SMALL MILK JUG",
"RED RETROSPOT SUGAR JAM BOWL", "RETROSPOT LARGE MILK JUG",
"SMALL RED RETROSPOT MUG IN BOX ", "RAIN PONCHO RETROSPOT",
"RED RETROSPOT UMBRELLA", "PLASTERS IN TIN SKULLS", "GLASS SONGBIRD STORAGE JAR",
"SET OF 12 FAIRY CAKE BAKING CASES", "SET OF 6 TEA TIME BAKING CASES",
"SET OF 6 SNACK LOAF BAKING CASES", "WRAP RED VINTAGE DOILY",
"SET OF 12 MINI LOAF BAKING CASES", "STORAGE TIN VINTAGE DOILY ",
"SET OF 4 KNICK KNACK TINS DOILY ", "DOORMAT VINTAGE LEAF",
"ROUND SNACK BOXES SET OF4 WOODLAND ", "PLASTERS IN TIN WOODLAND ANIMALS",
"PLASTERS IN TIN STRONGMAN", "RETROSPOT PARTY BAG + STICKER SET",
"VINTAGE DOILY TRAVEL SEWING KIT", "VINTAGE DOILY DELUXE SEWING KIT ",
"CLASSIC CHROME BICYCLE BELL ", "EMBROIDERED RIBBON REEL SALLY ",
"PAINTED METAL PEARS ASSORTED", "WRAP RED APPLES ", "CHRISTMAS RETROSPOT ANGEL WOOD",
"SET OF 3 WOODEN HEART DECORATIONS", "HEART T-LIGHT HOLDER WILLIE WINKIE",
"HAND WARMER RED LOVE HEART", "ZINC FOLKART SLEIGH BELLS",
"PLASTERS IN TIN CIRCUS PARADE ", "ENGLISH ROSE SPIRIT LEVEL ",
"SET OF 10 LED DOLLY LIGHTS", "FRENCH ENAMEL CANDLEHOLDER",
"GROW YOUR OWN BASIL IN ENAMEL MUG", "ENAMEL WATERING CAN CREAM",
"DOORMAT ENGLISH ROSE ", "CERAMIC STRAWBERRY DESIGN MUG",
"SWEETHEART CERAMIC TRINKET BOX", "STRAWBERRY CERAMIC TRINKET POT",
"PINK DOUGHNUT TRINKET POT ", "CERAMIC CAKE DESIGN SPOTTED MUG",
"TEA TIME TEAPOT IN GIFT BOX", "DOORMAT FAIRY CAKE", "POSTAGE"
), `0005` = c("CHOCOLATE THIS WAY METAL SIGN", "METAL SIGN NEIGHBOURHOOD WITCH ",
"RETRO MOD TRAY", "RETRO PLASTIC ELEPHANT TRAY", "TEA BAG PLATE RED RETROSPOT",
"PINK/PURPLE RETRO RADIO", "PLASTERS IN TIN SPACEBOY", "PLASTERS IN TIN VINTAGE PAISLEY ",
"CHOCOLATE CALCULATOR", "RED HARMONICA IN BOX ", "4 TRADITIONAL SPINNING TOPS",
"BATHROOM METAL SIGN ", "POSTAGE", "UNION JACK FLAG PASSPORT COVER ",
"UNION JACK FLAG LUGGAGE TAG", "BLUE POLKADOT LUGGAGE TAG ",
"BLUE POLKADOT PASSPORT COVER"), `0006` = c("WOODEN HAPPY BIRTHDAY GARLAND",
"PINK DOUGHNUT TRINKET POT ", "STRAWBERRY CERAMIC TRINKET BOX",
"CERAMIC STRAWBERRY CAKE MONEY BANK", "WOODEN OWLS LIGHT GARLAND ",
"REGENCY CAKESTAND 3 TIER", "DELUXE SEWING KIT ", "WELCOME WOODEN BLOCK LETTERS",
"LOVE BUILDING BLOCK WORD", "BATH BUILDING BLOCK WORD", "HOME BUILDING BLOCK WORD",
"CAT BOWL VINTAGE CREAM", "BIG DOUGHNUT FRIDGE MAGNETS",
"DOLLY GIRL LUNCH BOX", "LIGHT GARLAND BUTTERFILES PINK",
"POSTAGE", "DELUXE SEWING KIT ", "PINK HEART SHAPE EGG FRYING PAN",
"BAKING SET 9 PIECE RETROSPOT ", "VINTAGE CREAM DOG FOOD CONTAINER",
"Manual", "Manual", "Manual", "CERAMIC HEART FAIRY CAKE MONEY BANK",
"CERAMIC CAKE DESIGN SPOTTED MUG", "BLUE HARMONICA IN BOX ",
"PINK DOG BOWL", "PINK HEART SHAPE EGG FRYING PAN", "LANTERN CREAM GAZEBO ",
"METAL SIGN TAKE IT OR LEAVE IT ", "POSTAGE", "PINK HEART SHAPE EGG FRYING PAN",
"CERAMIC CAKE DESIGN SPOTTED MUG", "LANTERN CREAM GAZEBO ",
"PINK DOG BOWL", "CERAMIC HEART FAIRY CAKE MONEY BANK", "METAL SIGN TAKE IT OR LEAVE IT ",
"BLUE HARMONICA IN BOX ", "ANTIQUE GLASS PEDESTAL BOWL",
"PANTRY MAGNETIC SHOPPING LIST", "PANTRY SCRUBBING BRUSH",
"SET OF 3 REGENCY CAKE TINS", "SMALL GLASS HEART TRINKET POT",
"DOORMAT WELCOME TO OUR HOME", "SET OF 4 PANTRY JELLY MOULDS",
"PANTRY WASHING UP BRUSH", "BAKING SET SPACEBOY DESIGN",
"IVORY KITCHEN SCALES", "SET OF 3 CAKE TINS PANTRY DESIGN ",
"REGENCY CAKESTAND 3 TIER", "WOODEN OWLS LIGHT GARLAND ",
"LIGHT GARLAND BUTTERFILES PINK", "FAIRY CAKE BIRTHDAY CANDLE SET",
"SPOTTY BUNTING", "SET OF 4 ENGLISH ROSE COASTERS", "POSTAGE",
"OPEN CLOSED METAL SIGN", "SET OF 6 SPICE TINS PANTRY DESIGN",
"SET OF 3 REGENCY CAKE TINS", "RED TOADSTOOL LED NIGHT LIGHT",
"CHILDS BREAKFAST SET DOLLY GIRL ", "CHILDS BREAKFAST SET SPACEBOY ",
"SET OF 3 CAKE TINS PANTRY DESIGN ", "SET OF TEA COFFEE SUGAR TINS PANTRY",
"LOVE BUILDING BLOCK WORD", "HOLIDAY FUN LUDO", "BATH BUILDING BLOCK WORD",
"WOODEN OWLS LIGHT GARLAND ", "LIGHT GARLAND BUTTERFILES PINK",
"POSTAGE", "PETIT TRAY CHIC", "PANTRY ROLLING PIN", "PANTRY PASTRY BRUSH",
"WOODLAND BUNNIES LOLLY MAKERS", "PINK BABY BUNTING", "MINT KITCHEN SCALES",
"SET 12 COLOUR PENCILS SPACEBOY ", "ZINC HEART FLOWER T-LIGHT HOLDER",
"VICTORIAN GLASS HANGING T-LIGHT", "IVORY KITCHEN SCALES",
"GLASS BON BON JAR", "BLUE STRIPE CERAMIC DRAWER KNOB", "SET 12 COLOUR PENCILS DOLLY GIRL ",
"CHILDS BREAKFAST SET DOLLY GIRL ", "POSTAGE")), .Names = c("0001",
"0002", "0003", "0004", "0005", "0006"))), .Names = c("CustomerID",
"Description"), row.names = c(NA, 6L), class = "data.frame")

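We can use unnest to turn each element of the list column into its own row, create a sequence column ('ind') within each customer, and then spread the result to wide format:
library(tidyr)
library(dplyr)
# One row per product, then number the products within each customer
res <- unnest(transactions, Description) %>%
  group_by(CustomerID) %>%
  mutate(ind = paste0("PRODUCT", row_number())) %>%
  # One column per numbered product: PRODUCT1, PRODUCT2, ...
  spread(ind, Description)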
To get the second part of the question, use mtabulate from qdapTools:
library(qdapTools)
res2 <- setNames(as.data.frame(t(res[-1])), res[[1]]) %>%
  mtabulate
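As an alternative sketch for the binary matrix that needs only base R, the list column can be flattened into a long table and cross-tabulated. The toy `transactions` data frame below (three customers, products "A", "B", "C") is made up for illustration and only mimics the shape of the real data; the `trimws()` call is an assumption motivated by the trailing spaces visible in some descriptions.

```r
# Toy data with a list column, shaped like the dput() output above
transactions <- data.frame(CustomerID = c(1, 2, 3))
transactions$Description <- list("A", c("B", "C", "B "), c("A", "C"))

# Long format: repeat each CustomerID once per product it bought
long <- data.frame(
  CustomerID  = rep(transactions$CustomerID, lengths(transactions$Description)),
  Description = trimws(unlist(transactions$Description)),
  stringsAsFactors = FALSE
)

# Count purchases per customer and product, then reduce counts to 0/1
counts <- table(CustomerID = long$CustomerID, Product = long$Description)
binary <- matrix(as.integer(counts > 0),
                 nrow = nrow(counts),
                 dimnames = dimnames(counts))
binary
```

Repeated purchases of the same product collapse to a single 1, which matches the binary matrix requested in the question.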

Related

Extract string with 3 place left and 2 places right of the match in R

I have tyre data in which the size is usually written with a / between the width and the thickness, as follows:
c("120/70 R19 Pirelli Scorpion Rally STR", "120/70 ZR 19 Pirelli Scorpion Trail II",
"120/70 ZR17 Pirelli Diablo Rosso III", "120/70 ZR17 Pirelli Diablo Rosso Corsa II",
"Pirelli MT 60 RS 120/80 ZR18", "120/70 R17 V", "160/70TR17 73V Dunlop® Harley-Davidson Series, radial, GT503",
"120/70R19 60V", "MT90B16 72H", "120/70ZR-19 60W", "150/80-16,71H,BW",
"100/80 R17", "130/70B18 63H BW", "100/90B19 57H", "100/80 R17",
"100/90B19,57H,BW", "100/90B19 57H", "100/90B19,57H,BW", "130/90 B16 73H",
"120/70R19 M/C", "110/90B19,62H,BW", "BW 130/80B17 65H", "120/70ZR-19 60W",
"160/60R18,70V,BW", "130/90B16,53H,BW", "130/80B17 65H", "120/70 -R18",
"110/70-17 M/C 54S", "120/70 ZR17", "120/70ZR17", "100/90-19M/C 57S",
"120/70ZR17M/C (58W)", "120/70ZR17M/C (58W)", "120/70 ZR17",
"110/80R19 M/C (59V), tubeless", "120/70-17", "120/70ZR17M/C (58W), tubeless",
"120/70ZR17M/C (58W), tubeless", "120/70-ZR17", "110/80 x 19",
"120/70 ZR17", "120/70 ZR17", "130/90-16M/C 67H, tube type",
"120/70ZR18M/C (59W), tubeless", "130/90 B16", "100/90-19", "120/70 - R17",
"130/90-16", "100/90-R18", "100/90-18", "150/80 R17 V", "90/90-21",
"120/70 ZR17", "100/90-19 Metzeler Tourance", "100/90-18")
Now I want only the width and thickness, for example 120/70, out of the clutter of other text details for each tyre. I cannot find a suitable expression to do that.
The following line mostly does the trick, but it returns two NAs:
str_extract(g3$`Front Tyre:`, '\\d{3}/\\d{2}')
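A likely reason for the two NAs: the pattern requires exactly three digits before the slash, so "90/90-21" cannot match, and "MT90B16 72H" contains no width/thickness pair at all. A minimal base R sketch that allows two or three digits before the slash is shown below; `extract_size` is a hypothetical helper name, and the four input strings are taken from the vector above.

```r
# Hypothetical helper: return the first "width/thickness" token, or NA
extract_size <- function(x) {
  m <- regexpr("[0-9]{2,3}/[0-9]{2}", x)   # -1 where nothing matches
  out <- rep(NA_character_, length(x))
  out[m > 0] <- regmatches(x, m)           # fill only the matched positions
  out
}

tyres <- c("120/70 R19 Pirelli Scorpion Rally STR",
           "90/90-21",
           "MT90B16 72H",
           "Pirelli MT 60 RS 120/80 ZR18")
sizes <- extract_size(tyres)
sizes
```

With stringr the same pattern should work as `str_extract(x, "\\d{2,3}/\\d{2}")`. Entries such as "MT90B16 72H" stay NA because they use a different size notation.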

extract data from XML files - R

I'm new to extracting data from XML files. I'm trying to process the following XML file using the R XML package. The information I want is in the attribute values.
I have run into two difficulties:
Some attribute values exist in one node but not in another. For example, "DRP" has the information for the second individual but not for the first.
Some attributes have multiple values for one individual, and I don't know how to link them to that individual. For example, "EmpHs" has multiple records per individual (identified by indvlPK).
Ideally, the output data would have a structure similar to the following:
lastNm  firstNm  indvlPK  fromDt   orgNm                                   hasCustComp
GIGAX   JEFFREY  2783477  03/2004  GATEWAY FINANCIAL ADVISORS, INC
GIGAX   JEFFREY  2783477  03/2004  GFA IN
GIGAX   JEFFREY  2783477  01/2007  UNITED FIRST
HINSON  BRIAN    2783737  07/1996  LINCOLN FINANCIAL ADVISORS CORPORATION  Y
HINSON  BRIAN    2783737  07/1996  FIRST FINANCIAL GROUP                   Y
Is there any way I can parse the data correctly? Thanks!
The code I tried, which did not give me what I want:
library(XML)
doc <- "Test.xml"
ind <- xmlParse(doc)
xmltop <- xmlRoot(ind)
# In XPath, attributes are selected with @
temp1 <- data.frame(unlist(getNodeSet(xmltop, "//Info/@lastNm")))
temp2 <- data.frame(unlist(getNodeSet(xmltop, "//Info/@firstNm")))
temp3 <- data.frame(unlist(getNodeSet(xmltop, "//Info/@indvlPK")))
temp4 <- data.frame(unlist(getNodeSet(xmltop, "//EmpHs/@fromDt")))
temp5 <- data.frame(unlist(getNodeSet(xmltop, "//DRP/@hasCustComp")))
The data is here:
<?xml version="1.0" encoding="ISO-8859-1"?>
<IAPDIndividualReport GenOn="2021-03-29">
<Indvls>
<Indvl>
<Info lastNm="GIGAX" firstNm="JEFFREY" midNm="W" indvlPK="2783477" actvAGReg="Y" link="https://adviserinfo.sec.gov/individual/summary/2783477"/>
<OthrNms/>
<CrntEmps>
<CrntEmp orgNm="CAMBRIDGE INVESTMENT RESEARCH ADVISORS, INC." orgPK="134139" str1="1776 PLEASANT PLAIN RD." city="FAIRFIELD" state="IA" cntry="United States" postlCd="52556-8757">
<CrntRgstns>
<CrntRgstn regAuth="MO" regCat="RA" st="APPROVED" stDt="2010-09-09"/>
</CrntRgstns>
<BrnchOfLocs>
<BrnchOfLoc city="O&apos;FALLON" state="MO" cntry="United States"/>
</BrnchOfLocs>
</CrntEmp>
</CrntEmps>
<Exms>
<Exm exmCd="S63" exmNm="Uniform Securities Agent State Law Examination" exmDt="1996-08-20"/>
<Exm exmCd="S65" exmNm="Uniform Investment Adviser Law Examination" exmDt="1999-12-21"/>
</Exms>
<Dsgntns/>
<PrevRgstns>
<PrevRgstn orgNm="WOODBURY FINANCIAL SERVICES, INC." orgPK="421" regBeginDt="2009-01-05" regEndDt="2009-12-03">
<BrnchOfLocs>
<BrnchOfLoc city="OFALLON" state="MO"/>
<BrnchOfLoc city="OFALLON" state="MO"/>
<BrnchOfLoc city="DUBLIN" state="CA"/>
</BrnchOfLocs>
</PrevRgstn>
<PrevRgstn orgNm="FSC SECURITIES CORPORATION" orgPK="7461" regBeginDt="2004-10-29" regEndDt="2008-12-01">
<BrnchOfLocs>
<BrnchOfLoc city="O&apos;FALLON" state="MO"/>
<BrnchOfLoc city="ST. PETERS" state="MO"/>
</BrnchOfLocs>
</PrevRgstn>
<PrevRgstn orgNm="GATEWAY FINANCIAL ADVISORS, INC." orgPK="115025" regBeginDt="2004-11-11" regEndDt="2006-10-11">
<BrnchOfLocs>
<BrnchOfLoc city="ST. PETERS" state="MO"/>
</BrnchOfLocs>
</PrevRgstn>
</PrevRgstns>
<EmpHss>
<EmpHs fromDt="03/2004" orgNm="GATEWAY FINANCIAL ADVISORS, INC" city="OFALLON" state="MO"/>
<EmpHs fromDt="03/2004" orgNm="GFA INC" city="OFALLON" state="MO"/>
<EmpHs fromDt="01/2007" orgNm="UNITED FIRST" city="OFALLON" state="MO"/>
<EmpHs fromDt="09/2010" orgNm="CAMBRIDGE INVESTMENT RESEARCH ADVISORS, INC" city="FAIRFIELD" state="IA"/>
<EmpHs fromDt="09/2010" orgNm="CAMBRIDGE INVESTMENT RESEARCH, INC" city="FAIRFIELD" state="IA"/>
</EmpHss>
<OthrBuss>
<OthrBus desc="1)STONEBRIDGE WEALTH MANAGEMENT GROUP, 728 HAWK RUN DR, O&apos;FALLON, MO, 3/2008 AS INDEPENDENT INSURANCE AGENT FOR VARIOUS INDEPENDENT INSURANCE COMPANIES. INV REL - 40/MO - 20/TRADING. 2)UNITED FIRST FINANCIAL MORTGAGE SOFTWARE SALES. START 6/1/07, 10 HOURS PER MONTH, 5 DURING TRADING HOURS. NO OWNERSHIP INTEREST. 3)MORTGAGE STOP INC., 728 HAWK RUN DR., OFALLON, MO 63368. LOAN OFFICER PROCESSING LOAN APPS FOR CLIENTS. START 6/1/2002, 25 HOURS PER MONTH, 10 DURING TRADING HOURS. NO OWNERSHIP. 4)CIRA, 1776 PLEASANT PLAIN RD, FAIRFIELD, IA, AS ADVISORY REP OF A RIA. INV REL - 40 HR/WK - 40/TRADING. SEE EMPLOYMENT HISTORY FOR START DATE. 5) THE MORTGAGE SHOP, 355 MID RIVERS MALL DRIVE, STE E, ST. PETERS, MO 63376. MORTGAGE ORIGINATOR SINCE 01/01/99. NOT INVESTMENT RELATED. WORKS 60 HOURS PER MONTH, 20 OF WHICH ARE DURING TRADING HOURS. 6.365 PROPERTIES LLC, O&apos;FALLON, MO, 8/2018 AS OWNER OF LLC THAT BUYS, SELLS, &amp; HOLDS REAL ESTATE. NIR - 20/MO - 0/TRADING. 7. BEST OFFER HOMES, LLC, 728 HAWK RUN DRIVE, O&apos;FALLON, MO, REAL ESTATE SALES/MORTGAGE ORIGINATION/ ACCOUNTING/FINANCIAL ACTIVITIES, 06/16/20, NIR, 20/MO- 0/TRADING 8. GIGAX WEALTH MANAGEMENT, 728 HAWK RUN DRIVE, OFALLON, MO, INDEPENDENT INSURANCE AGENT FOR VARIOUS INDEPENDENT INSURANCE COMPANIES,11/23/20, INV REL, 10 HR/WK- 10 TRADING HR."/>
</OthrBuss>
<DRPs/>
</Indvl>
<Indvl>
<Info lastNm="HINSON" firstNm="BRIAN" midNm="TROY" indvlPK="2783737" actvAGReg="Y" link="https://adviserinfo.sec.gov/individual/summary/2783737"/>
<OthrNms/>
<CrntEmps>
<CrntEmp orgNm="BRIDGEWORTH WEALTH MANAGEMENT" orgPK="164100" str1="101 25TH STREET NORTH" city="BIRMINGHAM" state="AL" cntry="United States" postlCd="35203">
<CrntRgstns>
<CrntRgstn regAuth="AL" regCat="RA" st="APPROVED" stDt="2015-05-12"/>
<CrntRgstn regAuth="TX" regCat="RA" st="APPROVED_RES" stDt="2015-05-01"/>
</CrntRgstns>
<BrnchOfLocs>
<BrnchOfLoc str1="400 MERIDIAN STREET" str2="SUITE 200" city="HUNTSVILLE" state="AL" cntry="United States" postlCd="35801"/>
<BrnchOfLoc str1="101 25TH STREET NORTH" city="BIRMINGHAM" state="AL" cntry="United States" postlCd="35203"/>
</BrnchOfLocs>
</CrntEmp>
</CrntEmps>
<Exms>
<Exm exmCd="S63" exmNm="Uniform Securities Agent State Law Examination" exmDt="1996-10-11"/>
</Exms>
<Dsgntns>
<Dsgntn dsgntnNm="Certified Financial Planner"/>
<Dsgntn dsgntnNm="Chartered Financial Consultant"/>
<Dsgntn dsgntnNm="Personal Financial Specialist"/>
</Dsgntns>
<PrevRgstns>
<PrevRgstn orgNm="LINCOLN FINANCIAL ADVISORS CORPORATION" orgPK="3978" regBeginDt="2000-04-25" regEndDt="2015-05-11">
<BrnchOfLocs>
<BrnchOfLoc city="HUNTSVILLE" state="AL"/>
<BrnchOfLoc city="HUNTSVILLE" state="AL"/>
</BrnchOfLocs>
</PrevRgstn>
</PrevRgstns>
<EmpHss>
<EmpHs fromDt="04/2015" orgNm="BRIDGEWORTH, LLC" city="HUNTSVILLE" state="AL"/>
<EmpHs fromDt="07/1996" toDt="04/2015" orgNm="LINCOLN FINANCIAL ADVISORS CORPORATION" city="HUNTSVILLE" state="AL"/>
<EmpHs fromDt="07/1996" toDt="04/2015" orgNm="FIRST FINANCIAL GROUP" city="BIRMINGHAM" state="AL"/>
<EmpHs fromDt="04/2015" orgNm="LPL FINANCIAL LLC" city="HUNTSVILLE" state="AL"/>
</EmpHss>
<OthrBuss>
<OthrBus desc="1) 04/30/2015: BRIDGEWORTH FINANCIAL, LLC - DBA FOR LPL BUSINESS (ENTITY FOR LPL BUSINESS) - INV REL - AT REPORTED BUSINESS LOCATIONS - START 01/01/2015 - 1% OF TIME SPENT 2) 04/30/2015: BRIDGEWORTH, LLC - INV REL - AT REPORTED BUSINESS LOCATION(S) - REGISTERED INVESTMENT ADVISOR HYBRID - START 01/2015 - 99% OF TIME SPENT. 3) 5/11/2015: NO BUSINESS NAME - INVESTMENT RELATED - AT REPORTED BUSINESS LOCATION(S) - NON-VARIABLE INSURANCE - STARTED 4/1/2015 - TIME SPENT 1% - LINES OF INSURANCE INCLUDE TERM, WHOLE, UNIVERSAL, LTC, DISABILITY. 4) 6/2/2017 - Bridgeworth Financial - Investment Related - At Reported Business Location(s) - DBA for LPL Business (entity for LPL business) - Started 04/30/2015 - 5 Hours Per Month/3 Hours During Securities Trading. 5) 5/8/2018 - Foster Properties Ltd - Not Investment Related - Home Based - Other-Family Business - Started 12/22/1997 - 1 Hours Per Month/0 Hours During Securities Trading - Handle the majority of business matters for this family business."/>
</OthrBuss>
<DRPs>
<DRP hasRegAction="N" hasCriminal="N" hasBankrupt="N" hasCivilJudc="N" hasBond="N" hasJudgment="N" hasInvstgn="N" hasCustComp="Y" hasTermination="N"/>
</DRPs>
</Indvl>
</Indvls>
</IAPDIndividualReport>
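One way to keep each EmpHs record linked to its individual is to loop over the Indvl nodes and query Info, EmpHs and DRP relative to each node, rather than running document-wide XPath queries. The sketch below uses the XML package on a cut-down, inline copy of the structure above (the `xml_text` string is an abbreviated stand-in for Test.xml). It assumes every individual has at least one EmpHs record and at most one DRP element.

```r
library(XML)

# Abbreviated stand-in for Test.xml with the same nesting
xml_text <- '<IAPDIndividualReport><Indvls>
  <Indvl>
    <Info lastNm="GIGAX" firstNm="JEFFREY" indvlPK="2783477"/>
    <EmpHss>
      <EmpHs fromDt="03/2004" orgNm="GFA INC"/>
      <EmpHs fromDt="01/2007" orgNm="UNITED FIRST"/>
    </EmpHss>
    <DRPs/>
  </Indvl>
  <Indvl>
    <Info lastNm="HINSON" firstNm="BRIAN" indvlPK="2783737"/>
    <EmpHss>
      <EmpHs fromDt="07/1996" orgNm="FIRST FINANCIAL GROUP"/>
    </EmpHss>
    <DRPs><DRP hasCustComp="Y"/></DRPs>
  </Indvl>
</Indvls></IAPDIndividualReport>'

doc <- xmlParse(xml_text, asText = TRUE)

rows <- lapply(getNodeSet(doc, "//Indvl"), function(indvl) {
  # Relative paths keep every lookup inside the current individual
  info <- getNodeSet(indvl, "./Info")[[1]]
  emp  <- getNodeSet(indvl, "./EmpHss/EmpHs")
  drp  <- getNodeSet(indvl, "./DRPs/DRP")
  data.frame(
    lastNm  = xmlGetAttr(info, "lastNm"),
    firstNm = xmlGetAttr(info, "firstNm"),
    indvlPK = xmlGetAttr(info, "indvlPK"),
    fromDt  = vapply(emp, xmlGetAttr, character(1), name = "fromDt"),
    orgNm   = vapply(emp, xmlGetAttr, character(1), name = "orgNm"),
    # NA when the individual has no DRP element
    hasCustComp = if (length(drp) > 0) {
      xmlGetAttr(drp[[1]], "hasCustComp", NA_character_)
    } else {
      NA_character_
    },
    stringsAsFactors = FALSE
  )
})
result <- do.call(rbind, rows)
result
```

The single-valued Info and DRP attributes are recycled across that individual's EmpHs rows, which gives one output row per employment record.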

How can I bring values in from another data frame based on a match?

I have two data frames: df1 and codesDesc.
df1 contains codes, and I want to add the matching description to df1 as a new column (df1$desc) by looking each code up in codesDesc.
I have tried something like this:
df1$desc <- codesDesc$desc[df1$code %in% codesDesc$code]
Or this:
df1$desc <- codesDesc$desc[which(df1$code %in% codesDesc$code)]
But both fail due to the number of replacement rows not matching.
What am I missing here? I'm guessing that it's a syntactic error on my part.
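The two attempts fail because `%in%` yields one TRUE/FALSE per row of df1, and using that to subset codesDesc returns only as many values as there are TRUEs, not one per row of df1. A lookup with `match()` returns exactly one position (or NA) per row. The sketch below uses the column names from the dput output (`dx`, `disposition`) on small toy data frames with placeholder descriptions. The `normalise` helper is hypothetical and assumes that stray whitespace and letter case in the codes (such as "Dx12\n" or "DX97") should be ignored.

```r
# Toy stand-ins for df1 and codesDesc; descriptions are placeholders
df1 <- data.frame(dx = factor(c("Dx010", "Dx12", "Dx97", "NULL")),
                  freq = c(24L, 557L, 2L, 1L))
codesDesc <- data.frame(
  dx = c("Dx010", "Dx12\n", "DX97", "Dx010"),
  disposition = c("first", "second", "third", "duplicate"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: compare codes without whitespace or case differences
normalise <- function(x) toupper(trimws(as.character(x)))

# match() gives the position of the first matching code, or NA if none
idx <- match(normalise(df1$dx), normalise(codesDesc$dx))
df1$desc <- codesDesc$disposition[idx]
df1
```

When a code occurs more than once in codesDesc, `match()` picks the first occurrence, so duplicated codes may need to be resolved beforehand.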
dput(df1):
structure(list(dx = structure(1:108, .Label = c("Dx010", "Dx0101",
"Dx0103", "Dx0104", "Dx0105", "Dx0106", "Dx0107", "Dx011", "Dx0111",
"Dx0112", "Dx01120", "Dx01121", "Dx01122", "Dx0113", "Dx0114",
"Dx0115", "Dx0116", "Dx0117", "Dx0118", "Dx0119", "Dx012", "Dx0121",
"Dx0122", "Dx0126", "Dx0127", "Dx013", "Dx014", "Dx016", "Dx0162",
"Dx02", "Dx03", "Dx05", "Dx06", "Dx07", "Dx08", "Dx09", "Dx10",
"Dx106", "Dx108", "Dx11", "Dx110", "Dx111", "Dx115", "Dx116",
"Dx117", "Dx118", "Dx119", "Dx12", "Dx120", "Dx13", "Dx14", "Dx15",
"Dx16", "Dx18", "Dx19", "Dx20", "Dx21", "Dx22", "Dx28", "Dx30",
"Dx31", "Dx32", "Dx321", "Dx322", "Dx323", "Dx324", "Dx325",
"Dx326", "Dx327", "Dx328", "Dx329", "Dx330", "Dx332", "Dx333",
"Dx334", "Dx335", "Dx336", "Dx34", "Dx35", "Dx38", "Dx39", "Dx404",
"Dx45", "Dx46", "Dx48", "Dx49", "Dx50", "Dx58", "Dx59", "Dx75",
"Dx76", "Dx77", "Dx78", "Dx80", "Dx81", "Dx82", "Dx85", "Dx86",
"Dx87", "Dx88", "Dx89", "Dx91", "Dx92", "Dx93", "Dx94", "Dx96",
"Dx97", "Dx98", "NULL"), class = "factor"), freq = c(24L, 20L,
6L, 2L, 76L, 90L, 13L, 33L, 11L, 912L, 1L, 67L, 22L, 98L, 121L,
15L, 41L, 87L, 38L, 172L, 146L, 75L, 93L, 6L, 3L, 12L, 10L, 20L,
10L, 1026L, 309L, 4255L, 3006L, 1180L, 2580L, 158L, 40L, 33L,
1893L, 4521L, 9L, 1L, 2L, 126L, 1L, 5L, 18L, 557L, 11L, 398L,
249L, 250L, 169L, 34L, 135L, 432L, 644L, 163L, 101L, 9L, 28L,
910L, 258L, 171L, 744L, 90L, 225L, 24L, 6L, 2L, 39L, 5L, 1L,
3231L, 924L, 3213L, 6L, 23L, 1101L, 1208L, 64L, 2L, 27L, 114L,
5L, 11L, 21L, 66L, 27L, 513L, 565L, 129L, 210L, 59L, 5L, 376L,
653L, 65L, 68L, 3L, 18L, 1L, 95L, 64L, 2L, 274L, 2L, 1L)), row.names = c(NA,
108L), class = "data.frame")
dput(codesDesc):
structure(list(dx = c("Dx015", "Dx019", "Dx023", "Dx027", "Dx04",
"Dx100", "Dx101", "Dx103", "Dx105", "Dx109", "Dx24", "Dx26",
"Dx27", "Dx280", "Dx29", "Dx33", "Dx36", "Dx37", "Dx380", "Dx40",
"Dx41", "Dx53", "Dx54", "Dx55", "Dx56", "Dx57", "Dx65", "Dx66",
"Dx67", "Dx68", "Dx69", "Dx70", "Dx71", "Dx72", "Dx79", "Dx",
"Dx011", "Dx012", "Dx016", "Dx02", "Dx021", "Dx03", "Dx05", "Dx06",
"Dx07", "Dx08", "Dx09", "Dx108", "Dx11", "Dx1111", "Dx118", "Dx12",
"Dx13", "Dx14", "Dx15", "Dx16", "Dx18", "Dx19", "Dx20", "Dx21",
"Dx22", "Dx28", "Dx30", "Dx31", "Dx32", "Dx325", "Dx34", "Dx35",
"Dx38", "Dx39", "Dx49", "Dx50", "Dx60", "Dx61", "Dx62", "Dx64",
"Dx75", "Dx80", "Dx81", "Dx82", "Dx85", "Dx86", "Dx87", "Dx90",
"Dx92", "Dx94", "Dx", "Dx010", "Dx0101", "Dx0102", "Dx0103",
"Dx0104", "Dx0105", "Dx0106", "Dx0107", "Dx011", "Dx0111", "Dx0112",
"Dx0113", "Dx0114", "Dx0115", "Dx0116", "Dx0117", "Dx0118", "Dx0119",
"Dx01120", "Dx01121", "Dx01122", "Dx012", "Dx013", "Dx014", "Dx016",
"Dx0161", "Dx0162", "Dx017", "Dx018", "Dx0181", "Dx02", "Dx021",
"Dx024", "Dx025", "Dx026", "Dx028", "Dx03", "Dx05", "Dx06", "Dx07",
"Dx08", "Dx09", "Dx10", "Dx106", "Dx108", "Dx11", "Dx110", "Dx111",
"Dx1111", "Dx112", "Dx113", "Dx114", "Dx115", "Dx116", "Dx117",
"Dx118 ", "Dx119", "Dx12\n", "Dx120", "Dx121", "Dx13", "Dx14",
"Dx15", "Dx16", "Dx17\n", "Dx18", "Dx19", "Dx20", "Dx21", "Dx22",
"Dx23", "Dx25", "Dx28", "Dx30", "Dx31", "Dx32", "Dx321", "Dx322",
"Dx323", "Dx324", "Dx325", "Dx326", "Dx327", "Dx328", "Dx329",
"Dx330", "Dx332", "Dx333", "Dx334", "Dx335", "Dx336", "Dx337",
"Dx34", "Dx35", "Dx38", "Dx39", "Dx42", "Dx43", "Dx45", "Dx46",
"Dx47", "Dx48", "Dx49", "Dx50", "Dx51", "Dx52", "Dx58", "Dx59",
"Dx60", "Dx63", "Dx64", "Dx73", "Dx74", "Dx75", "Dx76", "Dx77",
"Dx78", "Dx80", "Dx81", "Dx82", "Dx83", "Dx84", "Dx85", "Dx86",
"Dx87", "Dx88", "Dx89", "Dx91", "Dx92", "Dx93", "Dx94", "Dx95",
"Dx96", "DX97", "Dx98", "Dx0121", "Dx0122", "Dx0123", "Dx0125",
"Dx0126", "Dx0127", "Dx0128", "Dx400", "DX401", "DX402", "DX403",
"DX404", "DX405", "DX406", "DX407", "DX408", "DX409"), disposition = c("Priority Transport to Emergency Department ",
"Hazardous Area Response Team", "Assistance is being dispatched to arrive within 30 minutes",
"Assistance is being dispatched to arrive within 8 hours", "Go to the Emergency Department within 1 hour",
"Call Terminated Early", "Call Handler terminated the call",
"Refer To A Clinician From Our Service - Caller Unhappy With The Disposition",
"Service response is required", "Dispatch of other emergency services",
"Health Protection Emergency", "Contact Care Plan Provider within agreed timescales",
"Contact Poisons Centre", "Speak to a nurse from our service for home management advice",
"Contact Specialist Practitioner", "Speak to Clinician From our Service Within 10 Minutes",
"Refer to Health Information Advisor Immediately", "Contact Secondary Care Routine",
"Speak to a nurse from our service for home management advice",
"Refer to Health Information Advisor within 15 minutes", "Refer to Health Information Advisor next working day",
"Refer to Health Information Advisor Immediately", "Refer to Senior Colleague",
"The disposition is Locally Approved Disposition", "The disposition is Follow Admission Protocol",
"Specialist Advice – Contraception ", "Flu Line Dispositions",
"Flu Line Dispositions", "Flu Line Dispositions", "Flu Line Dispositions",
"Flu Line Dispositions", "Flu Line Dispositions", "Flu Line Dispositions",
"Direct referral to Primary Care practitioner for assessment",
"Failed Contraception", "NHS Pathways Disposition Terms ", "Emergency Department Priority 1",
"Emergency Department Priority 2", "Emergency Department Priority 4",
"Emergency Department Priority 3", "Emergency Department Priority 3",
"Emergency Department Priority 4", "Primary Care Priority 1",
"Primary Care Priority 2", "Primary Care Priority 2", "Primary Care Priority 3",
"Primary Care Priority 4", "No further triage indicated", "Primary Care Priority 1",
"Emergency Department Priority 4", "Emergency Department Priority 4",
"Primary Care Priority 1", "Primary Care Priority 2", "Primary Care Priority 2",
"Primary Care Priority 3", "Primary Care Priority 4", "Primary Care Dental Priority 2",
"Primary Care Dental Priority 2", "Primary Care Dental Priority 2",
"Primary Care Dental Priority 2", "Primary Care Dental Priority 4",
"Urgent Care Centre Pharmacist", "Primary Care Midwife Priority 1 ",
"Primary Care GUM ", "Primary Care Priority 1", "One of my clinical colleagues needs to see you - Toxic Ingestion/Inhalation ED Priority 3",
"Primary Care Priority 1", "Primary Care Priority 1", "Primary Care Priority 4",
"Primary Care Priority 4", "Emergency Department Priority 3",
"Midwife or Labour Suite immediately Priority 2", "Primary Care Centre Optician",
"Speak to the GP Practice within 20 minutes ", "999 For an Ambulance ",
"Primary Care Centre - Epidemic - Antiviral", "Primary Care Priority 4",
"Primary Care Centre Repeat Prescription within 6 hours", "Primary Care Centre Repeat Prescription within 12 hours",
"Primary Care Centre Medication Enquiry", "Primary Care Centre Repeat Prescription required within 2 hours",
"Primary Care Centre Repeat Prescription within 12 hours", "Urgent Care Centre Repeat Prescription within 24 hours",
"Repeat Prescription required ", "Emergency Department Mental Health Priority 3",
"Emergency Department Sexual Assault Assessment Priority 3",
"NHS Pathways Disposition Terms ", "\n\nEmergency Ambulance Response for Potential Cardiac Arrest \n\n",
"Emergency Ambulance Response for Potential Cardiac Arrest",
"Emergency Ambulance Response for Potential Cardiac Arrest Post Delivery ",
"Emergency Ambulance response for Fitting Now", "Emergency Ambulance Response for Major Blood Loss",
"Emergency Ambulance Response for Potential Shock ", "Emergency Ambulance Response for Respiratory Distress",
"Emergency Ambulance Response for Unconsciousness", "Emergency Ambulance Response ",
"Emergency Ambulance Response for Acute Abdomen Pregnant", "Emergency Ambulance Response for Acute Coronary Syndrome",
"Emergency Ambulance Response for Anaphylaxis", "Emergency Ambulance Response for Aortic Aneurysm Rupture/Dissection",
"Emergency Ambulance for Labour Complications", "Emergency Ambulance Response for Major Blood Loss",
"Emergency Ambulance Response for Possible Stroke Time Critical",
"Emergency Ambulance Response for Potential Shock", "Emergency Ambulance Response for Respiratory Distress Non-Trauma",
"Emergency Ambulance Response for Respiratory Distress Trauma",
"Emergency Ambulance Response for Septicaemia", "Emergency Ambulance for Unconsciousness",
"Emergency Ambulance Response (Category 3)", "Assistance needed at home due to inability to get off the floor ",
"Crew arrived before a disposition was reached ", "Non-emergency Ambulance Response ",
"Non-emergency Ambulance Response possible Viral Haemorrhagic Fever ",
"Transport to an Emergency Treatment Centre within 1 hour \n(Category 3)",
"Ambulance for Clinical Reasons", "Ambulance for Transport Reasons",
"Emergency Ambulance due to Clinical Reasons ", "Attend Emergency Treatment Centre within 1 Hour",
"Attend Emergency Treatment Centre within 1 hour possible Viral Haemorrhagic Fever",
"Assistance is being dispatched to arrive within 2 hours ", "Assistance is being dispatched to arrive within 4 hours",
"A Deferred Dispatch is being arranged ", "Assistance is being dispatched to arrive within 1 hour ",
"Attend Emergency Treatment Centre within 4 Hours ", "To contact a Primary Care Service within 2 Hours ",
"To contact a Primary Care Service within 6 Hours ", "To contact a Primary Care Service within 12 Hours ",
"To contact a Primary Care Service within 24 Hours ", "For persistent or recurrent symptoms: get in touch with the GP Practice for a Non-Urgent Appointment ",
"MUST contact own GP Practice for a Non-Urgent appointment ",
"A Clinician from our Service will call the individual back immediately to assess the problem ",
"The call is closed with no further action needed", "Speak to a Primary Care Service within 1 Hour",
"Community Nurse within 4 hours ", "Community Nurse within 24 hours ",
"Speak to a Primary Care Service within 1 hour possible Viral Haemorrhagic Fever ",
"Community Nurse next working day ", "Health Visitor next working day ",
"Community Midwife next working Day ", "Contact own GP Practice next working day for an appointment ",
"Speak to a Primary Care Service within 6 hours for Expected Death ",
"Speak to a Primary Care Service within 1 hour for Palliative Care ",
"Attend Emergency Dental Treatment Centre within 4 hours ", "Callback by Healthcare Professional within 2 hours",
"Speak to a Primary Care Service within 2 Hours ", "Callback by Healthcare Professional within 4 hours",
"Speak to a Clinician Immediately for Assessment of Symptoms",
"Speak to a Primary Care Service within 6 Hours ", "Speak to a Primary Care Service within 12 Hours ",
"Speak to a Primary Care Service within 24 Hours", "For persistent or recurrent symptoms: get in touch with the GP Practice within 3 working days ",
"To contact a Dental Service within 1 hour ", "To Contact a Dental Service within 2 hours ",
"To contact a Dental Service within 6 hours ", "To contact a Dental Service within 12 hours ",
"To contact a Dental Service within 24 hours ", "To contact a Dental Practice within 5 working days ",
"Contact Orthodontist next working day ", "Home Management ",
"Contact Pharmacist within 12 hours ", "Speak to Midwife within 1 hour ",
"Contact Genito-Urinary Clinic or other local service ", "Speak to a Clinician from our service Immediately ",
"Speak to a Clinician from our service Immediately – Refused Ambulance Disposition",
"Speak to a Clinician from our service Immediately – Refused Emergency Treatment Centre Disposition ",
"Speak to a Clinician from our service Immediately – Refused Primary Care Service Disposition ",
"Speak to a Clinician from our service Immediately – Refused Disposition ",
"Speak to a Clinician from our service immediately – Toxic Ingestion/Inhalation ",
"Speak to a Clinician from our service immediately – Frequent Caller ",
"Speak to a Clinician from our service immediately – Chemical Eye Splash (Green 3)",
"Speak to a Clinician from our service Immediately – Management of Dying Individual (Expected) (Green 3)",
"Speak to a Clinician from our service Immediately - Failed Contraception ",
"Speak to a Clinician from our Service Immediately - Burn Chemical",
"Speak to a Clinician from our service Immediately Management of Palliative Care",
"Speak to a Clinician from our service Immediately - Ambulance Validation",
"Speak to a Clinician from our service Immediately - Emergency Treatment Centre Validation",
"Speak to a Clinician from our service Immediately - Other Disposition Validation",
"Paramedic requesting callback from Healthcare Professional within 30mins",
"Speak to an Assessor Immediately for Assessment of Symptoms",
"Speak to Clinician from our service within 30 minutes ", "Speak to Clinician from our service within 2 hours ",
"Speak to Clinician from our service for home management advice ",
"Symptom Management Advice ", "Child protection Vulnerable Adult immediate referral ",
"Child protection / Vulnerable Adult non immediate referral",
"Provide Service Location Information ", "Refer to Health Information within 24 hours ",
"Refer to a Community Healthcare Professional ", "Refer to another Out-Of-Hours Service Provider ",
"999 for police (Green 4)", "Speak to Midwife or Labour Suite immediately ",
"Speak to Midwife within 2 hours ", "The call is closed with referral to the Police only ",
"No Service Clinician available refer for urgent (20 minutes ) Primary Care Clinical Assessment.",
"No Service Clinician available refer for urgent 60 minutes primary care clinical assessment",
"Contact Optician next routine appointment within 72 Hours (3 days from now) ",
"Refer to Flu line ", "Speak to the Primary Care Service within 2 hours for antiviral assessment ",
"Refer To Social Services Immediately ", "Refer To Social Services Routinely ",
"MUST contact own GP Practice within 3 working days ", "Call back by Healthcare Professional within 30 minutes ",
"Call back by Healthcare Professional within 60 minutes ", "Receive report of results or tests from laboratory ",
"Repeat Prescription required within 6 hours ", "Contact own GP Practice next working day for a repeat prescription ",
"Medication Enquiry ", "Clinician Home Management of Dying Individual (Expected) ",
"Refer to Another Agency ", "Repeat prescription required within 2 hours ",
"Repeat prescription required within 12 hours", "Repeat prescription required within 24 hours ",
"Speak to a Dental Service within 2 hours", "Attend Emergency Treatment Centre within 12 hours ",
"Unexpected death ", "Refer to Mental Health/Crisis Service within 4 hours",
"Speak to the GP Practice within 1 hour (3 calls within 4 days)",
"Attend Emergency Treatment Centre within 1 hour for Sexual Assault Assessment ",
"This call is closed with no further action required wrong service called ",
"Refer to Health Information within 12 hours ", "Emergency Contraception required within 2 hours ",
"Emergency Contraception required within 12 hours ", "Emergency Ambulance Response (Category 3)",
"Emergency Ambulance Response (Category 3)", "Emergency Ambulance Response (Category 3)",
"Emergency Ambulance Response (Category 3)", "Emergency Ambulance Response for Trauma Emergency (Category 3)",
"Emergency Ambulance Response for Pregnancy/Labour problem (Category 3)",
"Non-emergency Ambulance Response", "Speak to an Assessor Immediately for Assessment of Significant Blood Loss",
"Speak to an Assessor Immediately for Assessment of Breathing Difficulties ",
"Speak to an Assessor Immediately for Assessment of Potential Critical Illness",
"Speak to an Assessor Immediately for Assessment of Potential Life Threatening Shock",
"Speak to an Assessor Immediately for Symptomatic Assessment",
"Speak to an Assessor Immediately for Assessment of Chest Pain",
"Speak to an Assessor Immediately for Assessment of Major Trauma",
"Speak to an Assessor Immediately for Assessment of Head Injury",
" Speak to an Assessor Immediately for Assessment of Probable Stroke or Mini-Stroke",
"Speak to an Assessor Immediately for Assessment of Possible Allergic Reaction"
)), class = "data.frame", row.names = c(NA, -239L))
What about merging the datasets?
In this case a left join:
merged <- merge(x = df1, y = codesDesc, by = "dx", all.x = TRUE)
head(merged)
dx freq disposition
1 Dx010 24 \n\nEmergency Ambulance Response for Potential Cardiac Arrest \n\n
2 Dx0101 20 Emergency Ambulance Response for Potential Cardiac Arrest
3 Dx0103 6 Emergency Ambulance response for Fitting Now
4 Dx0104 2 Emergency Ambulance Response for Major Blood Loss
5 Dx0105 76 Emergency Ambulance Response for Potential Shock
6 Dx0106 90 Emergency Ambulance Response for Respiratory Distress
Or using dplyr:
library(dplyr)
k <- df1 %>% left_join(codesDesc, by = "dx")
Note that some codes appear more than once in your codesDesc, so the result has more rows than df1. You can list the duplicated codes like this:
library(dplyr)
double_ <- as.data.frame.table(table( codesDesc$dx)) %>% filter(Freq >= 2)
And in df1 you have some of the double codes:
df1[df1$dx %in% double_$Var1,]
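If one description per code is enough, a possible follow-up is to drop the repeated codes before joining. The sketch below uses small made-up stand-ins for df1 and codesDesc (the values are hypothetical, not taken from the question) and keeps the first description found for each code, which is an assumption about which duplicate should win:

```r
# Hypothetical toy data standing in for df1 and codesDesc
df1 <- data.frame(dx = c("Dx01", "Dx02", "Dx03"),
                  freq = c(24, 20, 6),
                  stringsAsFactors = FALSE)
codesDesc <- data.frame(
  dx = c("Dx01", "Dx02", "Dx02", "Dx03"),
  disposition = c("Cardiac Arrest", "Fitting Now",
                  "Fitting Now (alternative wording)", "Major Blood Loss"),
  stringsAsFactors = FALSE
)

# A plain left join repeats the df1 row for every matching description
merged_all <- merge(df1, codesDesc, by = "dx", all.x = TRUE)

# Keep only the first description per code, then join again
codes_unique <- codesDesc[!duplicated(codesDesc$dx), ]
merged_unique <- merge(df1, codes_unique, by = "dx", all.x = TRUE)

nrow(merged_all)     # one extra row caused by the duplicated code
nrow(merged_unique)  # same number of rows as df1
```

Whether the first description is the right one to keep depends on the data, so it is worth inspecting the duplicated codes before dropping any of them.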

R Text Mining - the most frequent word in string across entire data frame

I am struggling to grasp text mining and determine word frequencies. I am just starting to understand R and its packages, and I just found out about tm (after reading for a while, I have a feeling that it might solve my problem).
My question is: how can I determine the two most frequently used words in a string across an entire column?
I have the following example:
structure(list(Location = c("Chicago", "Chicago", "Chicago",
"LA", "LA", "LA", "LA", "LA", "LA", "Texas", "Texas", "Texas",
"Texas", "Texas"), Code = c(4450L, 4450L, 4450L, 4450L, 4450L,
4450L, 4450L, 4450L, 4450L, 4410L, 4410L, 4410L, 4410L, 4410L
), Description = c("LABOR - CROSSOVER BOARD BRACKET", "LABOR - CROWN DOOR GASKET",
"LABOR - CROWN DOOR GASKET - APPLY 4' NEW GASKET AND RE-APPLY",
"LABOR - CUSHIONING DEVICE - END OF CAR CUSTOMER SUPPLIED MATERIAL",
"LABOR - DOOR EDGE", "LABOR - DOOR GASKET, CROWN CORNER", "LABOR - DOOR LOCK POCKET STG",
"LABOR - DOOR LOCK RECEPTICALS STG", "LABOR - DOOR LOCK STG",
"BOLT, HT, UNDER 5/8\"\" DIA & 6\"\" - SIDE POST", "BOLT, HT, UNDER 5/8\"\" DIA & 6\"\" - TRAINLINE TROLLEY",
"BOLT,HT,5/8 IN.DIA.OR LESS UNDER 6\"\" LONG - BRAKE STEP", "BOLT,HT,5/8 IN.DIA.OR LESS UNDER 6\"\" LONG - CROSSOVER BOARD",
"BOLT,HT,5/8 IN.DIA.OR LESS UNDER 6\"\" LONG - CROSSOVER BOARD BRACKET"
), `Desired Description Based on frequency` = c("Labor - CROWN DOOR GASKET",
"Labor - CROWN DOOR GASKET", "Labor - CROWN DOOR GASKET", "Labor - DOOR LOCK",
"Labor - DOOR LOCK", "Labor - DOOR LOCK", "Labor - DOOR LOCK",
"Labor - DOOR LOCK", "Labor - DOOR LOCK", "Bolt - HT", "Bolt - HT",
"Bolt - HT", "Bolt - HT", "Bolt - HT")), .Names = c("Location",
"Code", "Description", "Desired Description Based on frequency"
), row.names = c(NA, -14L), class = "data.frame")
In the end I wish I could add this column:
Desired Description Based on frequency
Labor - CROWN DOOR GASKET
Labor - CROWN DOOR GASKET
Labor - CROWN DOOR GASKET
Labor - DOOR LOCK
Labor - DOOR LOCK
Labor - DOOR LOCK
Labor - DOOR LOCK
Labor - DOOR LOCK
Labor - DOOR LOCK
Bolt - HT
Bolt - HT
Bolt - HT
Bolt - HT
Bolt - HT
Basically, I want to evaluate all the rows with code 4450 or 4410 and see, out of all the descriptions in the table, which is the most common, and add that as a column. Another condition would be based on the location. Can someone please help me with a simple example?
Thank you so much
I don't think there's a one-size-fits-all solution to your problem (beginning with the fact that there's no exact rule on which or how many words to take for the description). However, here are two quick-and-dirty approaches, which might be helpful as a starting point:
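library(tm)
# Keep only capital letters; everything else becomes a space
txts <- gsub("[^A-Z]", " ", df$Description)
# One group per Location/Code combination
groups <- paste(df$Location, df$Code)

# 1: the three most frequent terms per group
opts <- list(tolower = FALSE, removePunctuation = TRUE, wordLengths = c(2, Inf))
lst <- split(txts, groups)
res <- sapply(lst, function(x) {
  freq <- termFreq(paste(x, collapse = " "), opts) / length(x)
  paste(names(freq[rank(-freq, ties.method = "first") <= 3]), collapse = " - ")
})
# Repeat each label once per row (assumes the rows of df are sorted by group label)
rep(res, lengths(lst))

# 2: look only at the first three words of each description and keep the
# words whose count is at least half the number of rows in the group
lst <- lapply(strsplit(txts, "\\s+"), function(x) x[1:min(c(3, length(x)))])
lst <- split(lst, groups)
n <- lengths(lst)
# SIMPLIFY = FALSE keeps a list even if all groups have the same number of words
lst <- mapply("/", lapply(lst, function(x) sort(table(unlist(x)), decreasing = TRUE)), n, SIMPLIFY = FALSE)
rep(sapply(lst, function(x) paste(names(x)[x >= .5], collapse = " - ")), n)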
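For comparison, the same idea can be sketched in base R without tm. This is only an illustration under assumptions: the toy data frame is made up, the helper name top_words is hypothetical, grouping is by Code alone, and "most common" is taken to mean the n words with the highest counts in the group.

```r
# Hypothetical toy data with the same column names as in the question
df <- data.frame(
  Code = c(4450L, 4450L, 4450L, 4410L, 4410L),
  Description = c("LABOR - DOOR LOCK STG", "LABOR - DOOR LOCK POCKET",
                  "LABOR - DOOR EDGE", "BOLT, HT - SIDE POST",
                  "BOLT, HT - BRAKE STEP"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: the n most frequent capital-letter words in x
top_words <- function(x, n = 2) {
  words <- unlist(strsplit(gsub("[^A-Z]+", " ", x), "\\s+"))
  words <- words[nchar(words) > 0]
  freq <- sort(table(words), decreasing = TRUE)
  paste(names(freq)[seq_len(min(n, length(freq)))], collapse = " ")
}

# One label per code, then map the labels back onto the rows
top <- tapply(df$Description, df$Code, top_words)
df$Top <- as.vector(top[as.character(df$Code)])
df
```

Looking the labels up by code name, rather than repeating them with rep(), avoids depending on the row order of the data frame.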

shQuote does not work! What to do?

I would like to convert the following list into a list where every name is wrapped in double quotes (" ").
I tried shQuote, gsub(" ", "", ), and the methods from "Creating a comma separated vector", but with no success so far...
George Ezra, Faith No More, Above & Beyond, Paloma Faith, Gavin James, DJ’s Waxfiend, Jebroer, Adje, Pop Evil, Jick munro & the amazing laserbeams, Robbie Williams, Avicii, The Script, Anouk, Kensington, Eagles of Death Metal, Dotan, The Wombats, Selah Sue, Shappard, John Coffey, Magic!, Joost van Bellen, East Camoran Folkcore, Foo Fighters, Pharrel Williams, Sam Smith, One Republic, Rise Agianst, De Jeugd van Tegenwoordig, Counting Crows, Fiddler’s Green, Thyphoon, Kovacs, Kitty, Daisy & Lewis, Oscar and the Wolf, Nick Mulvey, Urbanus, Willie Wartaal, Doppelgang, Ewert and the two dragons, Pierce Brothers,Kovacs, The Kendolls, Stringcaster, Sunday Sun, Toy Dolls, A$AP Rocky, Ride, Eskmo, Temples, The Pop Group, Blank Mass, Cairo Liberation Front, Daniel Norgren, Follakzoid, Ghost Culture, John Coffey, Kevin Morby, Kuenta I Tambu, Marmozets, Mourn, Patten, Sue The Night, The Coathangers, Tora, Vessels, The Libertines, Noel Gallagher’s High Flying Brids, Noel Gallagher, AltJ, Altj, Royal Blood, Sohn, The Jesus & Mary Chain, The Tallest Man On Earth, Black Mountain, Chet Faker, Death Cab For Cutie, Ear Sweatshirt, Evian Christ, Frist Aid Kit, Future Islands, Jonny Greenwood, Mew, Of Monsters And Men, The Vaccines, Ariel Pink, Alvvays, Wolf Alice, Weval, BADBADNOTGOOD, Bass Drum Of Death, Yak, Daniel Romand, Dan Dercon, Eagulls, Gengahr, Fickle friends, Steve Gunn, Liima, Hookworms, Kate Tempest, Kiasmds, Strand of Oaks, Little May , Matthew E. White, Metz, Off!, St. Paul, St. 
Paul & The Broken Bones, Pissed Jeans, Pretty Vicious, Reigning Sound, Outfit, Sunset Sons, Waxahatchee, Daniel Wilson, Yung Lean, Kindess, Hinds,Damien Rice, The War On Drugs, Iggy Pop, FKA Twigs, Patti Smith And Her Band Perform Horses, Flying Lotus, Fat Freddy’s Drop, Damian Jr Gong Marley, Alabama Shakes, The Gaslamp Killer, Max Richter, Motorpsycho, Goat, Songhoy Blues, Andrew Brid, Glass Animals, King Gizzard & The Lizard Wizard, Misun, JD MCPherson, Happyness, Dolomite Minor, Meridian Brothers, Death From Above 1979, Blaudzun, Oscar And The Wolf, Clark, Ghost Poet, Omar Souleyman, Rhye, Bejamin Booker, Orkesta Mendoza, Ganz,The Chemical Brothers, Patrick Watson, Bleachers, The War on Drugs, The Antlers, Hot Chip, Rico & Sticks, Awolnation
A simple strsplit() will do.
my.bands <- "George Ezra, Faith No More, Above & Beyond, Paloma Faith, Gavin James, DJ’s Waxfiend, Jebroer, Adje, Pop Evil, Jick munro & the amazing laserbeams, Robbie Williams, Avicii, The Script, Anouk, Kensington, Eagles of Death Metal, Dotan, The Wombats, Selah Sue, Shappard, John Coffey, Magic!, Joost van Bellen, East Camoran Folkcore, Foo Fighters, Pharrel Williams, Sam Smith, One Republic, Rise Agianst, De Jeugd van Tegenwoordig, Counting Crows, Fiddler’s Green, Thyphoon, Kovacs, Kitty, Daisy & Lewis, Oscar and the Wolf, Nick Mulvey, Urbanus, Willie Wartaal, Doppelgang, Ewert and the two dragons, Pierce Brothers,Kovacs, The Kendolls, Stringcaster, Sunday Sun, Toy Dolls, A$AP Rocky, Ride, Eskmo, Temples, The Pop Group, Blank Mass, Cairo Liberation Front, Daniel Norgren, Follakzoid, Ghost Culture, John Coffey, Kevin Morby, Kuenta I Tambu, Marmozets, Mourn, Patten, Sue The Night, The Coathangers, Tora, Vessels, The Libertines, Noel Gallagher’s High Flying Brids, Noel Gallagher, AltJ, Altj, Royal Blood, Sohn, The Jesus & Mary Chain, The Tallest Man On Earth, Black Mountain, Chet Faker, Death Cab For Cutie, Ear Sweatshirt, Evian Christ, Frist Aid Kit, Future Islands, Jonny Greenwood, Mew, Of Monsters And Men, The Vaccines, Ariel Pink, Alvvays, Wolf Alice, Weval, BADBADNOTGOOD, Bass Drum Of Death, Yak, Daniel Romand, Dan Dercon, Eagulls, Gengahr, Fickle friends, Steve Gunn, Liima, Hookworms, Kate Tempest, Kiasmds, Strand of Oaks, Little May , Matthew E. White, Metz, Off!, St. Paul, St. 
Paul & The Broken Bones, Pissed Jeans, Pretty Vicious, Reigning Sound, Outfit, Sunset Sons, Waxahatchee, Daniel Wilson, Yung Lean, Kindess, Hinds,Damien Rice, The War On Drugs, Iggy Pop, FKA Twigs, Patti Smith And Her Band Perform Horses, Flying Lotus, Fat Freddy’s Drop, Damian Jr Gong Marley, Alabama Shakes, The Gaslamp Killer, Max Richter, Motorpsycho, Goat, Songhoy Blues, Andrew Brid, Glass Animals, King Gizzard & The Lizard Wizard, Misun, JD MCPherson, Happyness, Dolomite Minor, Meridian Brothers, Death From Above 1979, Blaudzun, Oscar And The Wolf, Clark, Ghost Poet, Omar Souleyman, Rhye, Bejamin Booker, Orkesta Mendoza, Ganz,The Chemical Brothers, Patrick Watson, Bleachers, The War on Drugs, The Antlers, Hot Chip, Rico & Sticks, Awolnation"
my.bands.vector <- strsplit(my.bands, ', ')[[1]] ## you could probably stop here, but you asked for a list, which means something specific in R
my.bands.list <- as.list(my.bands.vector)
> str(my.bands.list)
List of 159
$ : chr "George Ezra"
$ : chr "Faith No More"
$ : chr "Above & Beyond"
$ : chr "Paloma Faith"
$ : chr "Gavin James"
[list output truncated]
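One thing to watch: splitting on ', ' leaves entries such as "Pierce Brothers,Kovacs" joined, because there is no space after that comma, and it keeps stray spaces as in "Little May ". A minimal sketch of a more forgiving split, shown on a shortened made-up string rather than the full list, splits on the bare comma and trims whitespace afterwards:

```r
# Shortened, hypothetical sample with inconsistent spacing around commas
bands <- "George Ezra, Faith No More,Kovacs, Little May , Hinds,Damien Rice"

# Split on every comma, then strip leading and trailing whitespace
parts <- trimws(strsplit(bands, ",", fixed = TRUE)[[1]])
parts
```

This still cannot tell a separator from a comma inside a name (for example "Kitty, Daisy & Lewis"), so such names need to be handled by hand.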
And if you want to convert back to a single string with each name wrapped in single quotes:
paste(shQuote(my.bands.list, type = "sh"), collapse = ', ')
[1] "'George Ezra', 'Faith No More', 'Above & Beyond', 'Paloma Faith', 'Gavin James', 'DJ’s Waxfiend', 'Jebroer', 'Adje', 'Pop Evil', 'Jick munro & the amazing laserbeams', 'Robbie Williams', 'Avicii', 'The Script', 'Anouk', 'Kensington', 'Eagles of Death Metal', 'Dotan', 'The Wombats', 'Selah Sue', 'Shappard', 'John Coffey', 'Magic!', 'Joost van Bellen', 'East Camoran Folkcore', 'Foo Fighters', 'Pharrel Williams', 'Sam Smith', 'One Republic', 'Rise Agianst', 'De Jeugd van Tegenwoordig', 'Counting Crows', 'Fiddler’s Green', 'Thyphoon', 'Kovacs', 'Kitty', 'Daisy & Lewis', 'Oscar and the Wolf', 'Nick Mulvey', 'Urbanus', 'Willie Wartaal', 'Doppelgang', 'Ewert and the two dragons', 'Pierce Brothers,Kovacs', 'The Kendolls', 'Stringcaster', 'Sunday Sun', 'Toy Dolls', 'A$AP Rocky', 'Ride', 'Eskmo', 'Temples', 'The Pop Group', 'Blank Mass', 'Cairo Liberation Front', 'Daniel Norgren', 'Follakzoid', 'Ghost Culture', 'John Coffey', 'Kevin Morby', 'Kuenta I Tambu', 'Marmozets', 'Mourn', 'Patten', 'Sue The Night', 'The Coathangers', 'Tora', 'Vessels', 'The Libertines', 'Noel Gallagher’s High Flying Brids', 'Noel Gallagher', 'AltJ', 'Altj', 'Royal Blood', 'Sohn', 'The Jesus & Mary Chain', 'The Tallest Man On Earth', 'Black Mountain', 'Chet Faker', 'Death Cab For Cutie', 'Ear Sweatshirt', 'Evian Christ', 'Frist Aid Kit', 'Future Islands', 'Jonny Greenwood', 'Mew', 'Of Monsters And Men', 'The Vaccines', 'Ariel Pink', 'Alvvays', 'Wolf Alice', 'Weval', 'BADBADNOTGOOD', 'Bass Drum Of Death', 'Yak', 'Daniel Romand', 'Dan Dercon', 'Eagulls', 'Gengahr', 'Fickle friends', 'Steve Gunn', 'Liima', 'Hookworms', 'Kate Tempest', 'Kiasmds', 'Strand of Oaks', 'Little May ', 'Matthew E. White', 'Metz', 'Off!', 'St. Paul', 'St. 
Paul & The Broken Bones', 'Pissed Jeans', 'Pretty Vicious', 'Reigning Sound', 'Outfit', 'Sunset Sons', 'Waxahatchee', 'Daniel Wilson', 'Yung Lean', 'Kindess', 'Hinds,Damien Rice', 'The War On Drugs', 'Iggy Pop', 'FKA Twigs', 'Patti Smith And Her Band Perform Horses', 'Flying Lotus', 'Fat Freddy’s Drop', 'Damian Jr Gong Marley', 'Alabama Shakes', 'The Gaslamp Killer', 'Max Richter', 'Motorpsycho', 'Goat', 'Songhoy Blues', 'Andrew Brid', 'Glass Animals', 'King Gizzard & The Lizard Wizard', 'Misun', 'JD MCPherson', 'Happyness', 'Dolomite Minor', 'Meridian Brothers', 'Death From Above 1979', 'Blaudzun', 'Oscar And The Wolf', 'Clark', 'Ghost Poet', 'Omar Souleyman', 'Rhye', 'Bejamin Booker', 'Orkesta Mendoza', 'Ganz,The Chemical Brothers', 'Patrick Watson', 'Bleachers', 'The War on Drugs', 'The Antlers', 'Hot Chip', 'Rico & Sticks', 'Awolnation'"
Here's the double-quote version. Notice that the double quotes are shown escaped (\") when the string is printed.
paste(shQuote(my.bands.list, type = "cmd"), collapse = ', ')
[1] "\"George Ezra\", \"Faith No More\", \"Above & Beyond\", \"Paloma Faith\", \"Gavin James\", \"DJ’s Waxfiend\", \"Jebroer\", \"Adje\", \"Pop Evil\", \"Jick munro & the amazing laserbeams\", \"Robbie Williams\", \"Avicii\", \"The Script\", \"Anouk\", \"Kensington\", \"Eagles of Death Metal\", \"Dotan\", \"The Wombats\", \"Selah Sue\", \"Shappard\", \"John Coffey\", \"Magic!\", \"Joost van Bellen\", \"East Camoran Folkcore\", \"Foo Fighters\", \"Pharrel Williams\", \"Sam Smith\", \"One Republic\", \"Rise Agianst\", \"De Jeugd van Tegenwoordig\", \"Counting Crows\", \"Fiddler’s Green\", \"Thyphoon\", \"Kovacs\", \"Kitty\", \"Daisy & Lewis\", \"Oscar and the Wolf\", \"Nick Mulvey\", \"Urbanus\", \"Willie Wartaal\", \"Doppelgang\", \"Ewert and the two dragons\", \"Pierce Brothers,Kovacs\", \"The Kendolls\", \"Stringcaster\", \"Sunday Sun\", \"Toy Dolls\", \"A$AP Rocky\", \"Ride\", \"Eskmo\", \"Temples\", \"The Pop Group\", \"Blank Mass\", \"Cairo Liberation Front\", \"Daniel Norgren\", \"Follakzoid\", \"Ghost Culture\", \"John Coffey\", \"Kevin Morby\", \"Kuenta I Tambu\", \"Marmozets\", \"Mourn\", \"Patten\", \"Sue The Night\", \"The Coathangers\", \"Tora\", \"Vessels\", \"The Libertines\", \"Noel Gallagher’s High Flying Brids\", \"Noel Gallagher\", \"AltJ\", \"Altj\", \"Royal Blood\", \"Sohn\", \"The Jesus & Mary Chain\", \"The Tallest Man On Earth\", \"Black Mountain\", \"Chet Faker\", \"Death Cab For Cutie\", \"Ear Sweatshirt\", \"Evian Christ\", \"Frist Aid Kit\", \"Future Islands\", \"Jonny Greenwood\", \"Mew\", \"Of Monsters And Men\", \"The Vaccines\", \"Ariel Pink\", \"Alvvays\", \"Wolf Alice\", \"Weval\", \"BADBADNOTGOOD\", \"Bass Drum Of Death\", \"Yak\", \"Daniel Romand\", \"Dan Dercon\", \"Eagulls\", \"Gengahr\", \"Fickle friends\", \"Steve Gunn\", \"Liima\", \"Hookworms\", \"Kate Tempest\", \"Kiasmds\", \"Strand of Oaks\", \"Little May \", \"Matthew E. White\", \"Metz\", \"Off!\", \"St. Paul\", \"St. 
Paul & The Broken Bones\", \"Pissed Jeans\", \"Pretty Vicious\", \"Reigning Sound\", \"Outfit\", \"Sunset Sons\", \"Waxahatchee\", \"Daniel Wilson\", \"Yung Lean\", \"Kindess\", \"Hinds,Damien Rice\", \"The War On Drugs\", \"Iggy Pop\", \"FKA Twigs\", \"Patti Smith And Her Band Perform Horses\", \"Flying Lotus\", \"Fat Freddy’s Drop\", \"Damian Jr Gong Marley\", \"Alabama Shakes\", \"The Gaslamp Killer\", \"Max Richter\", \"Motorpsycho\", \"Goat\", \"Songhoy Blues\", \"Andrew Brid\", \"Glass Animals\", \"King Gizzard & The Lizard Wizard\", \"Misun\", \"JD MCPherson\", \"Happyness\", \"Dolomite Minor\", \"Meridian Brothers\", \"Death From Above 1979\", \"Blaudzun\", \"Oscar And The Wolf\", \"Clark\", \"Ghost Poet\", \"Omar Souleyman\", \"Rhye\", \"Bejamin Booker\", \"Orkesta Mendoza\", \"Ganz,The Chemical Brothers\", \"Patrick Watson\", \"Bleachers\", \"The War on Drugs\", \"The Antlers\", \"Hot Chip\", \"Rico & Sticks\", \"Awolnation\""
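The backslashes in that output come from how print() displays a string; they are not stored in the string itself. A small sketch on three names, using sprintf() as an alternative way to add the quotes, makes this visible with cat():

```r
bands <- c("George Ezra", "Faith No More", "Above & Beyond")

# Wrap every name in double quotes and join with commas
quoted <- paste(sprintf('"%s"', bands), collapse = ", ")

print(quoted)      # display form: inner quotes are shown as \"
cat(quoted, "\n")  # raw form: plain double quotes, no backslashes
```

Use cat() or writeLines() when the quoted text is meant to be copied or written to a file.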

Resources