Is there a limit to the number of levels in R?

I am wondering whether there is a limit on the number of levels a factor can have.
I am trying to restructure some curricula vitae from Xing. There are around 135 selectable industries.
My code looks like this; the actual code lists all 135 industries.
companyIndustryLevels <- c("","ACADEMIA", "ACCOUNTING", "AEROSPACE")
levels(samples[[1]]$Industry) <- companyIndustryLevels
The following combinations work fine and are selectable when filtering the list.
genderLevels <- c("M","F")
companySizeLevels <- c("","1","1-10","11-50","51-200","201-500","501-1000","1001-5000","5001-10000","10001+")
levels(samples[[1]]$Gender) <- genderLevels
levels(samples[[1]]$CompanySize) <- companySizeLevels
The problem is that, when viewing the list, the industry column shows only a factor with 1 level, not 135 levels.
EDIT:
I am using RStudio version 1.1.383 and R version 3.4.3.
As you can see in the reproducible example below, the other columns such as "Gender", "Beschäftigungsart", "Position" and "Unternehmensgroesse" also have levels.
When I select Filter in the View window in RStudio, I can filter every column by its levels except the "Industrie" column.
View(structure(
list(
ID = 1,
Gender = structure(1L, .Label = c("M",
"F"), class = "factor"),
Bildungseinrichtungen = structure(1L, .Label = "", class = "factor"),
Abschluss = structure(1L, .Label = "", class = "factor"),
Studienfach = structure(1L, .Label = "", class = "factor"),
Beschäftigungsart = structure(
1L,
.Label = c(
"",
"FULL_TIME_EMPLOYEE",
"PART_TIME_EMPLOYEE",
"INTERN",
"FREELANCER",
"OWNER",
"PARTNER",
"BOARD_MEMBER",
"VOLUNTEER"
),
class = "factor"
),
Station.Start = NA,
Station.Ende = NA,
Bezeichnung = NA,
Position = structure(
1L,
.Label = c(
"",
"STUDENT_INTERN",
"ENTRY_LEVEL",
"PROFESSIONAL_EXPERIENCED",
"MANAGER_SUPERVISOR",
"EXECUTIVE",
"SENIOR_EXECUTIVE"
),
class = "factor"
),
Unternehmen = structure(1L, .Label = "AMA", class = "factor"),
Unternehmensgroesse = structure(
1L,
.Label = c(
"",
"1",
"1-10",
"11-50",
"51-200",
"201-500",
"501-1000",
"1001-5000",
"5001-10000",
"10001+"
),
class = "factor"
),
Industrie = structure(
1L,
.Label = c(
"ACADEMIA",
"ACCOUNTING",
"AEROSPACE",
"AGRICULTURE",
"AIRLINES",
"ALTERNATIVE_MEDICINE",
"APPAREL_AND_FASHION",
"ARCHITECTURE_AND_PLANNING",
"ARTS_AND_CRAFTS",
"AUTOMOTIVE",
"BANKING",
"BIOTECHNOLOGY",
"BROADCAST_MEDIA",
"BUILDING_MATERIALS",
"BUSINESS_SUPPLIES_AND_EQUIPMENT",
"CHEMICALS",
"CIVIC_AND_SOCIAL_ORGANIZATIONS",
"CIVIL_ENGINEERING",
"CIVIL_SERVICE",
"COMPOSITES",
"COMPUTER_AND_NETWORK_SECURITY",
"COMPUTER_GAMES",
"COMPUTER_HARDWARE",
"COMPUTER_NETWORKING",
"COMPUTER_SOFTWARE",
"CONSTRUCTION",
"CONSULTING",
"CONSUMER_ELECTRONICS",
"CONSUMER_GOODS",
"CONSUMER_SERVICES",
"COSMETICS",
"DAYCARE",
"DEFENSE_MILITARY",
"DESIGN",
"EDUCATION",
"ELEARNING",
"ELECTRICAL_ENGINEERING",
"ENERGY",
"ENTERTAINMENT",
"ENVIRONMENTAL_SERVICES",
"EVENTS_SERVICES",
"FACILITIES_SERVICES",
"FACILITY_MANAGEMENT",
"FINANCIAL_SERVICES",
"FISHERY",
"FOOD",
"FUNDRAISING",
"FURNITURE",
"GARDENING_LANDSCAPING",
"GEOLOGY",
"GLASS_AND_CERAMICS",
"GRAPHIC_DESIGN",
"HEALTH_AND_FITNESS",
"HOSPITALITY",
"HUMAN_RESOURCES",
"IMPORT_AND_EXPORT",
"INDUSTRIAL_AUTOMATION",
"INFORMATION_SERVICES",
"INFORMATION_TECHNOLOGY_AND_SERVICES",
"INSURANCE",
"INTERNATIONAL_AFFAIRS",
"INTERNATIONAL_TRADE_AND_DEVELOPMENT",
"INTERNET",
"INVESTMENT_BANKING",
"JOURNALISM",
"LEGAL_SERVICES",
"LEISURE_TRAVEL_AND_TOURISM",
"LIBRARIES",
"LOGISTICS_AND_SUPPLY_CHAIN",
"LUXURY_GOODS_AND_JEWELRY",
"MACHINERY",
"MANAGEMENT_CONSULTING",
"MARITIME",
"MARKETING_AND_ADVERTISING",
"MARKET_RESEARCH",
"MECHANICAL_INDUSTRIAL_ENGINEERING",
"MEDIA_PRODUCTION",
"MEDICAL_DEVICES",
"MEDICAL_SERVICES",
"MEDICINAL_PRODUCTS",
"METAL_METALWORKING",
"METROLOGY_CONTROL_ENGINEERING",
"MINING_AND_METALS",
"MOTION_PICTURES",
"MUSEUMS_AND_CULTURAL_INSTITUTIONS",
"MUSIC",
"NANOTECHNOLOGY",
"NON_PROFIT_ORGANIZATION",
"NURSING_AND_PERSONAL_CARE",
"OIL_AND_ENERGY",
"ONLINE_MEDIA",
"OTHERS",
"OUTSOURCING_OFFSHORING",
"PACKAGING_AND_CONTAINERS",
"PAPER_AND_FOREST_PRODUCTS",
"PHOTOGRAPHY",
"PLASTICS",
"POLITICS",
"PRINTING",
"PRINT_MEDIA",
"PROCESS_MANAGEMENT",
"PROFESSIONAL_TRAINING_AND_COACHING",
"PSYCHOLOGY_PSYCHOTHERAPY",
"PUBLIC_HEALTH",
"PUBLIC_RELATIONS_AND_COMMUNICATIONS",
"PUBLISHING",
"RAILROAD",
"REAL_ESTATE",
"RECREATIONAL_FACILITIES_AND_SERVICES",
"RECYCLING_AND_WASTE_MANAGEMENT",
"RENEWABLES_AND_ENVIRONMENT",
"RESEARCH",
"RESTAURANTS_AND_FOOD_SERVICE",
"RETAIL",
"SECURITY_AND_INVESTIGATIONS",
"SEMICONDUCTORS",
"SHIPBUILDING",
"SPORTS",
"STAFFING_AND_RECRUITING",
"TAX_ACCOUNTANCY_AUDITING",
"TELECOMMUNICATION",
"TEXTILES",
"THEATER_STAGE_CINEMA",
"TIMBER",
"TRAFFIC_ENGINEERING",
"TRANSLATION_AND_LOCALIZATION",
"TRANSPORT",
"VENTURE_CAPITAL_AND_PRIVATE_EQUITY",
"VETERINARY",
"WELFARE_AND_COMMUNITY_HEALTH",
"WHOLESALE",
"WINE_AND_SPIRITS",
"WRITING_AND_EDITING",
"PHARMACEUTICALS"
),
class = "factor"
)
),
.Names = c(
"ID",
"Gender",
"Bildungseinrichtungen",
"Abschluss",
"Studienfach",
"Beschäftigungsart",
"Station.Start",
"Station.Ende",
"Bezeichnung",
"Position",
"Unternehmen",
"Unternehmensgroesse",
"Industrie"
),
row.names = 1L,
class = "data.frame"
))

It seems that the filter option in RStudio's Data Viewer (View()) offers a drop-down menu for a factor only when its number of levels (nlevels()) is less than 65. Otherwise it falls back to a search field:
df <- data.frame(x=as.factor(1:64))
View(df)
# "filter" yields a drop down menu
df <- data.frame(x=as.factor(1:65))
View(df)
# "filter" yields a search field
RStudio.Version()$version
# [1] ‘1.0.143’
Note that this has nothing to do with R itself, as already mentioned in the comments.
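To see ahead of time which columns are likely to get a drop-down, you can count the levels of each factor column. This is only a sketch: the limit of 64 comes from the experiment above with RStudio 1.0.143, is not a documented constant, and may differ in other versions. filter_widget is a made-up helper name.

```r
# Assumed threshold: observed in RStudio 1.0.143, not documented anywhere.
dropdown_limit <- 64L

# Hypothetical helper: for each factor column, report which filter widget
# the Data Viewer would presumably show.
filter_widget <- function(df, limit = dropdown_limit) {
  is_fac <- vapply(df, is.factor, logical(1))
  n_lev  <- vapply(df[is_fac], nlevels, integer(1))
  ifelse(n_lev <= limit, "drop-down", "search field")
}

df <- data.frame(
  id     = 1:65,                                      # not a factor, ignored
  gender = factor(rep(c("M", "F"), length.out = 65)), # 2 levels
  code   = factor(1:65)                               # 65 levels
)
filter_widget(df)
```

Columns reported as "search field" can still be filtered in the viewer, just by typing instead of picking from a list.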


Why does clicking on a filter box of renderDataTable send me back to the left side of the screen in R Shiny?

You can use the following piece of code as an example:
options(stringsAsFactors = FALSE)
options(encoding = "UTF-8")
library(shiny)
library(shinyWidgets)
library(shinydashboard)
library(dplyr)
library(shinycssloaders)
library(data.table)
library(tidyverse)
library(DT)
dt <- structure(list(GENE = c("SI", "ARSA", "ABCA3", "KIT", "IVD", "COL18A1"), RefSeq_ID = c("NM_001041.4", "NM_000487.6", "NM_001089.3", "NM_000222.3", "NM_002225.5", "NM_001379500.1"), Tag = c("DM", "DM", "DM", "DM", "DM?", "DM"), clinvar_clnsig = c("Pathogenic", "Uncertain_significance", "NULL", "NULL", "NULL", "Likely_pathogenic"), MutationType = c("missense", "missense", "initiation", "missense", "missense", "nonsense"), ExpectedInheritance = c("AR", "AR", "AR", "AD", "AR", "AR"), Disease = c("Sucrase isomaltase deficiency", "Metachromatic leukodystrophy", "Fatal surfactant deficiency", "Piebaldism", "Isovaleric acidaemia", "Knobloch syndrome"), hgvs = c("1022T>C", "991G>A", "3G>C", "1861G>T", "1124G>A", "1876C>T"), hgvsAll = c("1022TtoC | L341P", "991GtoA | E331K", "3GtoC | M1I", "1861GtoT | A621S", "1124GtoA | G375D", "1876CtoT | R626*"), comments = c("Descr. as T/C 1021 L340P, mut. conf. by PC <1592>.", "Found in cis with Pd allele. Descr. as G985A E329R, mut. conf. by PC <1251>.", "Descr. as M1I, base change conf. by PC <1663>.", "Descr. as 1861G>C A621S, mut. conf. by PC <1495>.", "Descr. as c.1124G>A; G375A, mut. conf. by PC <1331>.", "Descr. as c.2416C>T, posn. conf. 
by PC <1439>."), gnomad_AC = c("NULL", "NULL", "NULL", "NULL", "NULL", "NULL"), pmid = c("10903344", "12809637", "16641205", "17124503", "19089597", "18484314"), pmidAll = c("NULL", "30052522", "24871971", "NULL", "32778825|32977617", "16532212"), CHROM = c("3", "22", "16", "4", "15", "21"), POS = c("165060026", "50626052", "2326464", "54727909", "40416348", "45487489"), REF = c("A", "C", "C", "G", "G", "C"), ALT = c("G", "T", "G", "T", "A", "T"), Support = c("NULL", "1", "0", "NULL", "2", "1"), Rankscore = c("0.48", "0.17", "0.38", "0.68", "0.5871645293736492", "0.99"), gdbid = c("120377", "119007", "3770735", "120117", "119354", "138752"), omimid = c("609845", "607574", "601615", "164920", "607036", "120328"), amino = c("Leu-Pro", "Glu-Lys", "Met-Ile", "Ala-Ser", "Gly-Asp", "Arg-Term"), deletion = c("NULL", "NULL", "NULL", "NULL", "NULL", "NULL"), insertion = c("NULL", "NULL", "NULL", "NULL", "NULL", "NULL"), codon = c("341", "331", "1", "621", "375", "626"), codonAff = c("341", "331", "1", "621", "375", "626"), descr = c("Leu341Pro", "Glu331Lys", "Met1Ile", "Ala621Ser", "Gly375Asp", "Arg626Term"), refseq = c("NM_001041.4", "NM_000487.6", "NM_001089.3", "NM_000222.3", "NM_002225.5", "NM_001379500.1"), dbsnp = c("rs267607049", "NULL", "NULL", "NULL", "rs769261274", "NULL"), gnomad_AF = c("NULL", "NULL", "NULL", "NULL", "NULL", "NULL"), gnomad_AN = c("NULL", "NULL", "NULL", "NULL", "NULL", "NULL"), new_date = c("2013-06-11", "2011-09-23", "2013-09-19", "2012-11-06", "2012-01-25", "2012-07-27"), base = c("M", "M", "M", "M", "M", "M"), clinvarID = c("1413", "556001", "NULL", "NULL", "NULL", "915432"), entrezID = c("6476", "410", "21", "3815", "3712", "80781"), hgncID = c("10856", "713", "33", "6342", "6186", "2195"), svar = c("NULL", "NULL", "NULL", "NULL", "NULL", "NULL"), mut = c("Y", "Y", "Y", "Y", "Y", "Y"), poly = c("Y", "Y", "Y", "Y", "N", "Y"), ftv = c("N", "N", "N", "N", "N", "Y"), TotalMutations = c(74L, 320L, 338L, 144L, 157L, 76L), NewMutations = c(1L, 5L, 
14L, 2L, 1L, 5L), gene_date = c("1996-04-01", "1996-04-01", "2004-04-15", "1996-04-01", "1996-04-01", "2000-09-15"), author = c("Jacob", "Rafi", "Garmany", "Bondanza", "Bonilla Guerrero", "Williams"), title = c("Congenital sucrase-isomaltase deficiency arising from cleavage and secretion of a mutant form of the enzyme.", "Disease-causing mutations in cis with the common arylsulfatase A pseudodeficiency allele compound the difficulties in accurately identifying patients and carriers of metachromatic leukodystrophy.", "Surfactant composition and function in patients with ABCA3 mutations.", "Piebald trait: implication of kit mutation on in vitro melanocyte survival and on the clinical application of cultured epidermal autografts.", "Essential fatty acid profiling for routine nutritional assessment unmasks adrenoleukodystrophy in an infant with isovaleric acidaemia.", "A phenotypic variant of Knobloch syndrome."), fullname = c("J Clin Invest", "Mol Genet Metab", "Pediatr Res", "J Invest Dermatol", "J Inherit Metab Dis", "Ophthalmic Genet"), allname = c("The Journal of clinical investigation", "Molecular genetics and metabolism", "Pediatric research", "The Journal of investigative dermatology", "Journal of inherited metabolic disease", "Ophthalmic genetics"), vol = c("106", "79", "59", "127", "31S2", "29"), page = c("281", "83", "801", "676", "S453", "85"), year = c(2000L, 2003L, 2006L, 2007L, 2008L, 2008L), reftag = c("PRI", "PRI", "PRI", "PRI", "PRI", "PRI"), Some_ID = c("BM0042985", "BM0393251", "BM0673028", "BM0795183", "BM0867669", "BM0887391"), OtherNames = c("NULL", "ASA|MLD", "ABC-C|ABC3|EST111653|LBM180|SMDP3", "C-Kit|CD117|MASTC|PBT|SCFR", "ACAD2|IVDH", "GLCC|KNO|KNO1|KS"), Location = c("3q25.2-q26.2", "22q13.31-qter", "16p13.3", "4q11-q12", "15q14-q15", "21q22.3"), STRAND = c("-", "-", "-", "+", "+", "+"), FullGeneName = c("Sucrase-isomaltase", "Arylsulfatase A", "ATP binding cassette subfamily A member 3", "KIT proto-oncogene, receptor tyrosine kinase", 
"Isovaleryl-CoA dehydrogenase", "Collagen type XVIII alpha 1 chain")), row.names = c(NA, -6L), class = c("data.table", "data.frame"))
dt$gnomad_AC <- as.numeric(dt$gnomad_AC)
dt$Support <- as.numeric(dt$Support)
dt$Rankscore <- as.numeric(dt$Rankscore)
dt$gnomad_AF <- as.numeric(dt$gnomad_AF)
dt$gnomad_AN <- as.numeric(dt$gnomad_AN)
# ui
ui <- fluidPage(
theme = "slate",
navbarPage(
title = "Some Table",
header = tagList(useShinydashboard()),
tabPanel(
"Test",
fluidRow(
box(
dataTableOutput("mytable") %>% withSpinner(color="#0dc5c1"),
width = 12,
collapsible = FALSE,
title = "",
solidHeader = T
)
)
)
)
)
# server
server <- function(input, output) {
res <- reactive ({
outputdf <- withProgress(
message = "Loading ...",
expr = {sample_n(dt, size = 5) }
)
outputdf
})
output$mytable <-
renderDataTable(
res(),
filter = list(position = "top", clear = FALSE, plain = TRUE),
options = list(scrollX = TRUE,autoWidth = TRUE, search = list(regex = TRUE)),
rownames = FALSE
)
}
# app
shinyApp(ui, server)
And the output will give a table with filtering boxes on top of the table:
But the problem is that whenever I click on one of the filter boxes, it sends me back to the left side of the table. I have to scroll back again to where I clicked before I can use the filtering slider. Is there any way I can fix this?
I really appreciate any help you can provide.
This isn't a new issue. It looks like it's been reported a few times to the maintainers of the package.
In your UI, you can make the box itself scroll horizontally. In the server, turn off both scrollX and autoWidth for the table.
The change in the UI:
ui <- fluidPage(
theme = "slate",
navbarPage(
title = "Some Table",
header = tagList(useShinydashboard()),
tabPanel(
"Test",
fluidRow(
box(style = "overflow-x: scroll;", # <--- I'm new!
dataTableOutput("mytable") %>% withSpinner(color = "#0dc5c1"),
width = 12,
collapsible = FALSE,
title = "",
solidHeader = T
)
)
)
)
)
The changes in the server:
# server
server <- function(input, output) {
res <- reactive ({
outputdf <- withProgress(
message = "Loading ...",
expr = {sample_n(dt, size = 5) }
)
outputdf
})
output$mytable <-
renderDataTable(
res(),
filter = list(position = "top", clear = FALSE, plain = TRUE),
options = list(scrollX = F, autoWidth = F, # <---- both of these have flipped
search = list(regex = TRUE)),
rownames = FALSE
)
}

Create a data frame from nested entries

I have a data frame test like this:
dput(test)
structure(list(X = 1L, entityId = structure(1L, .Label = "HOST-123", class = "factor"),
displayName = structure(1L, .Label = "server1", class = "factor"),
discoveredName = structure(1L, .Label = "server1", class = "factor"),
firstSeenTimestamp = 1593860000000, lastSeenTimestamp = 1603210000000,
tags = structure(1L, .Label = "c(\"CONTEXTLESS\", \"CONTEXTLESS\", \"CONTEXTLESS\", \"CONTEXTLESS\", \"CONTEXTLESS\", \"CONTEXTLESS\", \"CONTEXTLESS\", \"CONTEXTLESS\"), c(\"app1\", \"client\", \"org\", \"app1\", \"DATA_CENTER\", \"PURPOSE\", \"REGION\", \"Test\"), c(NA, \"NONE\", \"Host:Environment:test123\", \"111\", \"222\", \"GENERAL\", \"444\", \"555\")", class = "factor")), .Names = c("X",
"entityId", "displayName", "discoveredName", "firstSeenTimestamp",
"lastSeenTimestamp", "tags"), class = "data.frame", row.names = c(NA,
-1L))
There is a column called tags which should be expanded into a data frame. I need to drop the first vector in tags (the one that keeps saying CONTEXTLESS) and turn the values of the second vector into column names. Lastly, I need to insert the values of the third vector under the corresponding new columns.
For example, it needs to look like this:
structure(list(entityId = structure(1L, .Label = "HOST-123", class = "factor"),
displayName = structure(1L, .Label = "server1", class = "factor"),
discoveredName = structure(1L, .Label = "server1", class = "factor"),
firstSeenTimestamp = 1593860000000, lastSeenTimestamp = 1603210000000,
app1 = NA, client = structure(1L, .Label = "None", class = "factor"),
org = structure(1L, .Label = "Host:Environment:test123", class = "factor"),
app1.1 = 111L, data_center = 222L, purppose = structure(1L, .Label = "general", class = "factor"),
region = 444L, test = 555L), .Names = c("entityId", "displayName",
"discoveredName", "firstSeenTimestamp", "lastSeenTimestamp",
"app1", "client", "org", "app1.1", "data_center", "purppose",
"region", "test"), class = "data.frame", row.names = c(NA, -1L
))
In other words: remove the first vector, which keeps saying "CONTEXTLESS"; use each value of the second vector as a column name; and use the last vector as the values of those new columns.
If you are willing to drop the first "row" of garbage and then do a little cleanup of the parsing side effects, then this might be a good place to start:
read.table(text=gsub("\\),", ")\n", test$tags[1]), sep=",", skip=1, #drops line
header=TRUE)
c.app1 client org app1 DATA_CENTER PURPOSE REGION Test.
1 c(NA NONE Host:Environment:test123 111 222 GENERAL 444 555)
The read.table function uses the scan function, which doesn't know that "c(" and ")" are meaningful. The other alternative might be to try eval(parse(text= .)) (which would know that they enclose vectors) on the second and third lines, but I couldn't see a clean way to do that. I initially tried to separate the lines using strsplit, but that caused me to lose the parens.
Here's a stab at some cleanup via the addition of some more gsub operations:
read.table(text=gsub("c\\(|\\)","", # gets rid of enclosing "c(" and ")"
gsub("\\),", "\n", # inserts line breaks
test$tags[1])),
sep=",", #lets commas be parsed
skip=1, #drops line
header=TRUE) # converts to colnames
app1 client org app1.1 DATA_CENTER PURPOSE REGION Test
1 NA NONE Host:Environment:test123 111 222 GENERAL 444 555
The reason for the added ".1" in the second instance of app1 is that column names in data frames need to be unique unless you override that with check.names=FALSE.
Here is a tidyverse approach:
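The effect of check.names can be seen on a toy input with a repeated header. This is a minimal sketch with made-up data, not the tags column itself:

```r
# Toy input with a repeated header name, similar in shape to the tags above.
txt <- "app1,client,app1\n1,2,3"

# Default: duplicated names are made unique via make.names().
fixed <- read.table(text = txt, sep = ",", header = TRUE)
names(fixed)

# check.names = FALSE keeps the header exactly as written.
raw <- read.table(text = txt, sep = ",", header = TRUE, check.names = FALSE)
names(raw)
```

Keeping duplicated names is rarely convenient later on, since selecting raw$app1 only ever returns the first matching column.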
library(dplyr)
library(tidyr)
str2dataframe <- function(txt, keep = "all") {
# If you can confirm that all vectors are of the same length, then we can make them into columns of a data.frame
out <- eval(parse(text = paste0("data.frame(", as.character(txt),")")))
# rename columns as X1, X2, ...
nms <- make.names(seq_along(out), unique = TRUE)
if (keep == "all")
keep <- nms
`names<-`(out, nms)[, keep]
}
df %>%
mutate(
tags = lapply(tags, str2dataframe, -1L),
tags = lapply(tags, function(d) within(d, X2 <- make.unique(X2)))
) %>%
unnest(tags) %>%
pivot_wider(names_from = "X2", values_from = "X3")
df looks like this
> df
X entityId displayName discoveredName firstSeenTimestamp lastSeenTimestamp
1 1 HOST-123 server1 server1 1.59386e+12 1.60321e+12
tags
1 c("CONTEXTLESS", "CONTEXTLESS", "CONTEXTLESS", "CONTEXTLESS", "CONTEXTLESS", "CONTEXTLESS", "CONTEXTLESS", "CONTEXTLESS"), c("app1", "client", "org", "app1", "DATA_CENTER", "PURPOSE", "REGION", "Test"), c(NA, "NONE", "Host:Environment:test123", "111", "222", "GENERAL", "444", "555")
Output looks like this
# A tibble: 1 x 14
X entityId displayName discoveredName firstSeenTimestamp lastSeenTimestamp app1 client org app1.1 DATA_CENTER PURPOSE REGION Test
<int> <fct> <fct> <fct> <dbl> <dbl> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 1 HOST-123 server1 server1 1593860000000 1603210000000 NA NONE Host:Environment:test123 111 222 GENERAL 444 555

matching strings regex exact match

This thread follows on from this answered question: Matching strings loop over multiple columns
I opened a new thread because I would like to update the code to flag exact matches only.
I have a table of keywords in separate columns as follows:
#codes table
codes <- structure(
list(
Support = structure(
c(2L, 3L, NA),
.Label = c("",
"help", "questions"),
class = "factor"
),
Online = structure(
c(1L,
3L, 2L),
.Label = c("activities", "discussion board", "quiz", "sy"),
class = "factor"
),
Resources = structure(
c(3L, 2L, NA),
.Label = c("", "pdf",
"textbook"),
class = "factor"
)
),
row.names = c(NA,-3L),
class = "data.frame"
)
I also have a comments table structured as follows:
#comments table
comments <- structure(
list(
SurveyID = structure(
1:5,
.Label = c("ID_1", "ID_2",
"ID_3", "ID_4", "ID_5"),
class = "factor"
),
Open_comments = structure(
c(2L,
4L, 3L, 5L, 1L),
.Label = c(
"I could never get the pdf to download",
"I could never get the system to work",
"I didn’t get the help I needed on time",
"my questions went unanswered",
"staying motivated to get through the textbook",
"there wasn’t enough engagement in the discussion board"
),
class = "factor"
)
),
class = "data.frame",
row.names = c(NA,-5L)
)
What I am trying to do:
Search for an exact keyword match. The following working code was provided by @Len Greski and @Ronak Shah in the previous thread (with huge thanks to both):
library(stringi)
resultsList <- lapply(1:ncol(codes),function(x){
y <- stri_detect_regex(comments$Open_comments,paste(codes[[x]],collapse = "|"))
ifelse(y == TRUE,1,0)
})
results <- as.data.frame(do.call(cbind,resultsList))
colnames(results) <- colnames(codes)
mergedData <- cbind(comments,results)
mergedData
and
comments[names(codes)] <- lapply(codes, function(x)
+(grepl(paste0(na.omit(x), collapse = "|"), comments$Open_comments)))
Both work great, but I have hit a snag and now need to match the keywords exactly. With the example tables above, the keyword "sy" causes the code to flag any comment containing the word "system". I would like to modify either of the above pieces of code so that a comment is flagged only when "sy" is present as an exact match.
Many thanks
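One common way to restrict a regular expression to whole words is to wrap the alternation in \b word boundaries. The sketch below uses toy data rather than the tables above and assumes the keywords contain no regex metacharacters; it is one possible approach, not the only one.

```r
keywords <- c("sy", "quiz", NA)
text <- c("I could never get the system to work",
          "sy was mentioned on its own",
          "the quiz was too long")

# Drop NA and empty keywords, then require a word boundary on both sides.
kw <- keywords[!is.na(keywords) & nzchar(keywords)]
pattern <- paste0("\\b(", paste(kw, collapse = "|"), ")\\b")

# Unary plus turns the logical result into 0/1 flags.
flag <- +grepl(pattern, text, perl = TRUE)
flag
```

Here "system" is not flagged because the "sy" inside it is followed by another word character, so the trailing \b does not match.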

How do I unnest a nested df and use the column name as part of the new column name?

I realize my title is probably a little confusing. I have some JSON that is a little confusing to unnest. I am trying to use the tidyverse.
Sample Data
df <- structure(list(long_abbr = c("Team11", "BBS"), short_name = c("Ac ",
"BK"), division = c("", ""), name = c("AC Slaters Muscles", "Broken Bats"
), abbr = c("T1", "T1"), owners = list(structure(list(commissioner = 0L,
name = "Chris Liss", id = "300144F8-79F4-11EA-8F25-9AE405472731"), class = "data.frame", row.names = 1L),
structure(list(commissioner = 1L, name = "Mark Ortin", id = "90849EF6-7427-11EA-95AA-4EEEAC7F8CD2"), class = "data.frame", row.names = 1L)),
id = c("1", "2"), logged_in_team = c(NA_integer_, NA_integer_
)), row.names = 1:2, class = "data.frame")
# Unnest Owners Information
df <- df %>%
unnest(owners)
I get the following error because the nested data frames contain columns called name and id, which already exist in df:
Error: Column names `name` and `id` must not be duplicated.
Is there an easy way to unnest the columns with a naming convention that puts the prefix owners (or, more generally, the name of whichever column holds the nested df) before the nested column names, i.e. owners.commissioner, owners.name, owners.id? I'd also be interested in solutions that use camel case or an underscore, i.e. ownersName or owners_name.
Set the argument names_sep:
df <- structure(
list(long_abbr = c("Team11", "BBS"),
short_name = c("Ac ", "BK"),
division = c("", ""),
name = c("AC Slaters Muscles", "Broken Bats"),
abbr = c("T1", "T1"),
owners = list(
structure(list(commissioner = 0L, name = "Chris Liss",
id = "300144F8-79F4-11EA-8F25-9AE405472731"),
class = "data.frame", row.names = 1L),
structure(list(commissioner = 1L, name = "Mark Ortin",
id = "90849EF6-7427-11EA-95AA-4EEEAC7F8CD2"),
class = "data.frame", row.names = 1L)),
id = c("1", "2"),
logged_in_team = c(NA_integer_, NA_integer_)),
row.names = 1:2, class = "data.frame"
)
tidyr::unnest(df, owners, names_sep = "_")
#> # A tibble: 2 x 10
#> long_abbr short_name division name abbr owners_commissi… owners_name
#> <chr> <chr> <chr> <chr> <chr> <int> <chr>
#> 1 Team11 "Ac " "" AC S… T1 0 Chris Liss
#> 2 BBS "BK" "" Brok… T1 1 Mark Ortin
#> # … with 3 more variables: owners_id <chr>, id <chr>, logged_in_team <int>
Created on 2020-04-26 by the reprex package (v0.3.0)
Does this solve your problem?
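For the camel-case variant asked about in the question, one option is to unnest with an underscore separator as above and then rewrite the prefixed names. The sketch below works on a plain character vector of names, so it does not depend on tidyr. to_camel is a made-up helper, and it assumes the prefix itself contains no regex metacharacters.

```r
# Hypothetical helper: turn "<prefix>_<name>" into "<prefix><Name>",
# leaving all other names untouched.
to_camel <- function(nms, prefix) {
  sub(paste0("^", prefix, "_(\\w)"), paste0(prefix, "\\U\\1"), nms, perl = TRUE)
}

nms <- c("long_abbr", "owners_commissioner", "owners_name", "owners_id", "id")
to_camel(nms, "owners")
```

On an unnested result x, this could presumably be applied with names(x) <- to_camel(names(x), "owners"). For the dotted form, passing names_sep = "." to unnest should be enough.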

Lexis function not found in R

I am using this code from the R help page in the Epi package:
# A small bogus cohort
xcoh <- structure( list( id = c("A", "B", "C"),
birth = c("14/07/1952", "01/04/1954",
"10/06/1987"),
entry = c("04/08/1965", "08/09/1972",
"23/12/1991"),
exit = c("27/06/1997", "23/05/1995",
"24/07/1998"),
fail = c(1, 0, 1) ),
.Names = c("id", "birth", "entry", "exit",
"fail"),
row.names = c("1", "2", "3"),
class = "data.frame" )
# Define a Lexis object with timescales calendar time and
age
Lcoh <- Lexis( entry = list( per=entry ),
exit = list( per=exit,
age=exit-birth ),
exit.status = fail,
data = xcoh )
But I get this error:
Error in Lexis(entry = list(per = entry), exit = list(per = exit, age = exit - :
could not find function "Lexis"
Any thoughts?
The Epi package first needs to be installed:
install.packages("Epi")
Then the package needs to be loaded:
library(Epi)
Your code then becomes:
install.packages("Epi")
library(Epi)
xcoh <- structure( list( id = c("A", "B", "C"),
birth = c("14/07/1952", "01/04/1954",
"10/06/1987"),
entry = c("04/08/1965", "08/09/1972",
"23/12/1991"),
exit = c("27/06/1997", "23/05/1995",
"24/07/1998"),
fail = c(1, 0, 1) ),
.Names = c("id", "birth", "entry", "exit",
"fail"),
row.names = c("1", "2", "3"),
class = "data.frame" )
# Define a Lexis object with timescales calendar time and
Lcoh <- Lexis( entry = list( per=entry ),
exit = list( per=exit,
age=exit-birth ),
exit.status = fail,
data = xcoh )
Note: I have removed the line that contains only age, assuming it is not relevant to the question posted here.
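One caveat: birth, entry and exit in xcoh are character strings, so exit - birth cannot be computed until they are converted to numbers. A minimal base-R sketch follows, assuming the dd/mm/yyyy format shown above and approximating a year as 365.25 days. to_decimal_year is a made-up helper; Epi has its own cal.yr function for this kind of conversion.

```r
# Hypothetical helper: convert "dd/mm/yyyy" strings to fractional calendar years.
to_decimal_year <- function(x, format = "%d/%m/%Y") {
  days <- as.numeric(as.Date(x, format = format))  # days since 1970-01-01
  1970 + days / 365.25                             # approximation, see lead-in
}

xcoh <- data.frame(
  id    = c("A", "B", "C"),
  birth = c("14/07/1952", "01/04/1954", "10/06/1987"),
  entry = c("04/08/1965", "08/09/1972", "23/12/1991"),
  exit  = c("27/06/1997", "23/05/1995", "24/07/1998"),
  fail  = c(1, 0, 1),
  stringsAsFactors = FALSE
)

date_cols <- c("birth", "entry", "exit")
xcoh[date_cols] <- lapply(xcoh[date_cols], to_decimal_year)

# Age at exit is now an ordinary numeric difference.
xcoh$exit - xcoh$birth
```

With numeric columns in place, the Lexis() call above has numbers to work with for both the per and age timescales.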
