Error in R - more columns than column names - r

I am trying to read in a file that has 5 column headers; however, column 5 contains a list of genes separated by commas.
EC2 <- read.table("david.txt", header=TRUE)
Whenever I run the code above, I get the message
"more columns than column names."
I feel like the answer is probably simple. Any ideas?
These are the first 3 lines:
Category ID Term PValue Genes
BP GO: 0006412 translation 2.711930356491234E-10 P0A7U3, P0A7K6, P68191, P0A7Q1, P0A7U7, P02359, P02358, P60438, P0A7L0, P0A7L3, P0A7L8, P0A7T3, P0A8A8, P69441, P0A8N5, P0A8N3, P02413, P0A7T7, P0AG63, P0A7D1, P0AA10 , P0ADY3, P0AG67, P0A7M2, P0A898, P0A9W3, P0A7M6, P0A7X3, P0AAR3, P0A7S3, P0A7S9, P0ADY7, P62399, P60624, P32132, P0ADZ4, P60723, P0C0U4, P0AG51, P0ADZ0, P0A7N9, P0A7J3, P0A7W7, P0AG59, P68679, P0C018 , P0A7R1, P0A7N4, P0A7R5, P0A7R9, P0AG44, P68919, P61175, P0A6K3, P0A7V0, P0A7M9, P0A7K2, P0A7V3, P0AG48
BP GO: 0051301 cell division 1.4011247561051483E-7 P0AC30, P17952, P75949, P0A6H1, P06966, P0A9R7, P64612, P36548, P60472, P45955, P0A855, P06136, P0A850, P6246, P0246, P024 P22523, P08373, P11880, P0AFB1, P60293, P18196, P0ABG4, P07026, P0A749, P29131, P0A6S5, P26648, P17443, P0ADS2, P0A8P6, P0A8P8, P0A6, P0A6A7, P0A8P8, P0A6, P0A6A7, P0A6, P0A6A7 P46889, P0A6F9, P0AE60, P0AD68, P19934, P0ABU9, P37773
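A likely cause is that read.table splits on any whitespace by default (sep = ""), so the spaces inside the Term and Genes fields produce more fields per row than there are header names. The sketch below assumes the file is tab-delimited, which the question does not confirm; the temporary file and its shortened contents are hypothetical stand-ins for david.txt.

```r
# Hypothetical stand-in for david.txt: a small tab-delimited file
tmp <- tempfile(fileext = ".txt")
writeLines(c(
  "Category\tID\tTerm\tPValue\tGenes",
  "BP\tGO:0006412\ttranslation\t2.7e-10\tP0A7U3, P0A7K6, P68191",
  "BP\tGO:0051301\tcell division\t1.4e-07\tP0AC30, P17952"
), tmp)

# Default sep = "" splits on whitespace, so each data row has more
# fields than the 5 header names and read.table stops with an error
res <- try(read.table(tmp, header = TRUE), silent = TRUE)
inherits(res, "try-error")

# Naming the real delimiter keeps each gene list in a single field
EC2 <- read.table(tmp, header = TRUE, sep = "\t", quote = "",
                  stringsAsFactors = FALSE)
dim(EC2)
```

If the real file uses a different delimiter, the same idea applies with that character passed as sep.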

How to correct the output generated through str_detect/str_contains in R

I have a column "methods_discussed" in a CSV file (link: https://github.com/pandas-dev/pandas/files/3496001/multiple_responses.zip)
multi<- read.csv("multiple_responses.csv", header = T)
This column contains the names of family planning methods, like:
methods_discussed
emergency female_sterilization male_sterilization iud NaN injectables male_condoms -77 male_condoms female_sterilization male_sterilization injectables iud male_condoms
I have created a vector of the 8 family planning methods (everything except -77 and NaN):
method_names = c('female_condoms', 'emergency', 'male_condoms', 'pill', 'injectables', 'iud', 'male_sterilization', 'female_sterilization')
I want to create a new indicator variable for each name in the vector (method_names) in the existing data frame multi2. For this I used (I)
for (abc in method_names) {
multi2[abc]<- as.integer(str_detect(multi2$methods_discussed, fixed(abc)))
}
(II)
for (abc in method_names) {
multi2[abc]<- as.integer(str_contains(abc,multi2$methods_discussed))
}
(III) I also tried
for (abc in method_names) {
multi2[abc]<- as.integer(stri_detect_fixed(multi2$methods_discussed, abc))
}
but the output does not match what I expected. Since male_sterilization is a substring of female_sterilization, it shows 1 (TRUE) for male_sterilization when only female_sterilization is present, as in row 2 of the actual output. It should show 0 (FALSE) there, because row 2 of methods_discussed contains female_sterilization, not male_sterilization. I also don't want any 0/1 (FALSE/TRUE) generated for rows where methods_discussed is -77 or blank; those cells should be blank (all highlighted in the expected output).
Actual Output
Expected Output
There is no error in the code, only in the output.
You can add word boundaries (\b in the regex) around each name, so that male_sterilization no longer matches inside female_sterilization.
multi<- read.csv("multiple_responses.csv", header = T)
method_names = c('female_condoms', 'emergency', 'male_condoms', 'pill', 'injectables', 'iud', 'male_sterilization', 'female_sterilization')
for (abc in method_names) {
multi[abc]<- as.integer(grepl(paste0('\\b', abc, '\\b'), multi$methods_discussed))
}
multi[multi$methods_discussed %in% c('', -77), method_names] <- ''
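To see the word-boundary idea without downloading the CSV, here is a minimal sketch on a hypothetical four-row stand-in for the data. It uses NA rather than '' for the skipped rows, which is a substitution on my part so that the indicator columns stay integer instead of being converted to character.

```r
# Hypothetical miniature of the methods_discussed column
multi <- data.frame(
  methods_discussed = c("emergency female_sterilization",
                        "male_sterilization iud", "-77", ""),
  stringsAsFactors = FALSE
)
method_names <- c("male_sterilization", "female_sterilization", "iud")

for (m in method_names) {
  # \\b demands a word boundary on both sides; "e" followed by "m" inside
  # "female_sterilization" is not a boundary, so the shorter name is not matched
  multi[[m]] <- as.integer(grepl(paste0("\\b", m, "\\b"),
                                 multi$methods_discussed))
}

# Rows with -77 or an empty string get NA instead of 0/1
skip <- multi$methods_discussed %in% c("", "-77")
multi[skip, method_names] <- NA
multi
```

When writing the result out, write.csv(multi, file, na = "") would render those NA cells as blanks.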

Filtering a large named list based on matches to a data frame

I don't work with lists in R often, so I'm sure there is a simple solution here. I am working with a large, named list of KEGG pathway IDs (test1). Each KEGG pathway ID (koXXXXX) holds a vector of every gene within that pathway (K#####). I have a selection of important genes (test2) and their associated KEGG IDs (test2$kegg_id; K#####). I'd like to filter test1 to include only KEGG pathways that contain at least one value matching test2$kegg_id. I'd like to retain all of the information from test1, but only for those pathways.
I'd then like to create a character vector of just those KEGG pathway IDs.
Here is a subset of the data:
dput(test1)
list(`ko00970 Aminoacyl-tRNA biosynthesis` = c("K00604", "K01042",
"K01866", "K01867", "K01868", "K01869", "K01870", "K01872", "K01873",
"K01874", "K01875", "K01876", "K01878", "K01879", "K01880", "K01881",
"K01883", "K01884", "K01885", "K01886", "K01887", "K01889", "K01890",
"K01892", "K01893", "K02433", "K02434", "K02435", "K03330", "K03341",
"K03865", "K04566", "K04567", "K06868", "K07587", "K09482", "K09698",
"K09759", "K10837", "K11627", "K14163", "K14164", "K14218", "K14219",
"K14220", "K14221", "K14222", "K14223", "K14224", "K14225", "K14226",
"K14227", "K14228", "K14229", "K14230", "K14231", "K14232", "K14233",
"K14234", "K14235", "K14236", "K14237", "K14238", "K14239", "K22503",
"K24278"), `ko02010 ABC transporters` = c("K01995", "K01996",
"K01997", "K01998", "K01999", "K02000", "K02001", "K02002", "K02006",
"K02007", "K02008", "K02009", "K02010", "K02011", "K02012", "K02017",
"K02018", "K02020", "K02036", "K02037", "K02038", "K02040", "K02041",
"K02042", "K02044", "K02045", "K02046", "K02047", "K02048", "K02062",
"K02063", "K02064", "K02065", "K02066", "K02067", "K02071", "K02072",
"K02073", "K02193", "K02194", "K02195", "K02196", "K02424", "K02471",
"K03523", "K05031", "K05032", "K05033", "K05641", "K05642", "K05643",
"K05644", "K05645", "K05646", "K05647", "K05648", "K05649", "K05650",
"K05651", "K05652", "K05653", "K05654", "K05655", "K05656", "K05657",
"K05658", "K05659", "K05660", "K05661", "K05662", "K05663", "K05664",
"K05665", "K05666", "K05667", "K05668", "K05669", "K05670", "K05671",
"K05672", "K05673", "K05674", "K05675", "K05676", "K05677", "K05678",
"K05679", "K05680", "K05681", "K05682", "K05683", "K05684", "K05685",
"K05772", "K05773", "K05776", "K05813", "K05814", "K05815", "K05816",
"K05845", "K05846", "K05847", "K06073", "K06074", "K06159", "K06160",
"K06161", "K06726", "K06857", "K06858", "K06861", "K07091", "K07122",
"K07323", "K07335", "K08711", "K08712", "K09688", "K09689", "K09690",
"K09691", "K09692", "K09693", "K09694", "K09695", "K09696", "K09697",
"K09808", "K09810", "K09811", "K09812", "K09813", "K09814", "K09815",
"K09816", "K09817", "K09969", "K09970", "K09971", "K09972", "K09996",
"K09997", "K09998", "K09999", "K10000", "K10001", "K10002", "K10003",
"K10004", "K10005", "K10006", "K10007", "K10008", "K10009", "K10010",
"K10013", "K10014", "K10015", "K10016", "K10017", "K10018", "K10019",
"K10020", "K10021", "K10022", "K10023", "K10024", "K10025", "K10036",
"K10037", "K10038", "K10039", "K10040", "K10041", "K10094", "K10107",
"K10108", "K10109", "K10110", "K10111", "K10112", "K10117", "K10118",
"K10119", "K10188", "K10189", "K10190", "K10191", "K10192", "K10193",
"K10194", "K10195", "K10196", "K10197", "K10198", "K10199", "K10200",
"K10201", "K10202", "K10227", "K10228", "K10229", "K10232", "K10233",
"K10234", "K10235", "K10236", "K10237", "K10238", "K10240", "K10241",
"K10242", "K10439", "K10440", "K10441", "K10537", "K10538", "K10539",
"K10540", "K10541", "K10542", "K10543", "K10544", "K10545", "K10546",
"K10547", "K10548", "K10549", "K10550", "K10551", "K10552", "K10553",
"K10554", "K10555", "K10556", "K10557", "K10558", "K10559", "K10560",
"K10561", "K10562", "K10820", "K10823", "K10824", "K10829", "K10830",
"K10831", "K11004", "K11050", "K11051", "K11069", "K11070", "K11071",
"K11072", "K11073", "K11074", "K11075", "K11076", "K11077", "K11078",
"K11079", "K11080", "K11081", "K11082", "K11083", "K11084", "K11085",
"K11601", "K11602", "K11603", "K11604", "K11605", "K11606", "K11607",
"K11631", "K11632", "K11704", "K11705", "K11706", "K11707", "K11708",
"K11709", "K11710", "K11720", "K11950", "K11951", "K11952", "K11953",
"K11954", "K11955", "K11956", "K11957", "K11958", "K11959", "K11960",
"K11961", "K11962", "K11963", "K12292", "K12368", "K12369", "K12370",
"K12371", "K12372", "K12533", "K12536", "K12539", "K12541", "K13409",
"K13889", "K13890", "K13891", "K13892", "K13893", "K13894", "K13895",
"K13896", "K14698", "K14699", "K15495", "K15496", "K15497", "K15551",
"K15552", "K15553", "K15554", "K15555", "K15556", "K15557", "K15558",
"K15576", "K15577", "K15578", "K15579", "K15580", "K15581", "K15582",
"K15583", "K15584", "K15585", "K15586", "K15587", "K15598", "K15599",
"K15600", "K15628", "K15770", "K15771", "K15772", "K16012", "K16013",
"K16014", "K16199", "K16200", "K16201", "K16202", "K16299", "K16783",
"K16784", "K16785", "K16786", "K16787", "K16905", "K16906", "K16907",
"K16915", "K16916", "K16917", "K16918", "K16919", "K16920", "K16921",
"K16956", "K16957", "K16958", "K16959", "K16960", "K16961", "K16962",
"K16963", "K17062", "K17063", "K17073", "K17074", "K17076", "K17077",
"K17202", "K17203", "K17204", "K17205", "K17206", "K17207", "K17208",
"K17209", "K17210", "K17213", "K17214", "K17215", "K17234", "K17235",
"K17236", "K17237", "K17238", "K17239", "K17240", "K17241", "K17242",
"K17243", "K17244", "K17245", "K17246", "K17311", "K17312", "K17313",
"K17314", "K17315", "K17316", "K17317", "K17318", "K17319", "K17320",
"K17321", "K17322", "K17323", "K17324", "K17325", "K17326", "K17327",
"K17328", "K17329", "K17330", "K17331", "K18104", "K18216", "K18217",
"K18230", "K18231", "K18232", "K18233", "K18887", "K18888", "K18889",
"K18890", "K18891", "K18892", "K18893", "K18894", "K18895", "K19079",
"K19080", "K19083", "K19084", "K19226", "K19227", "K19228", "K19229",
"K19230", "K19309", "K19310", "K19340", "K19341", "K19349", "K19350",
"K19971", "K19972", "K19973", "K19975", "K19976", "K20344", "K20386",
"K20459", "K20460", "K20461", "K20490", "K20491", "K20492", "K20494",
"K22921", "K22922", "K22923", "K23055", "K23056", "K23057", "K23058",
"K23059", "K23060", "K23061", "K23062", "K23063", "K23064", "K23125",
"K23163", "K23181", "K23182", "K23183", "K23184", "K23185", "K23186",
"K23187", "K23188", "K23227", "K23228", "K23508", "K23509", "K23510",
"K23511", "K23512", "K23513", "K23535", "K23536", "K23537", "K23545",
"K23546", "K23547"), `ko02020 Two-component system` = c("K00027",
"K00066", "K00244", "K00245", "K00246", "K00247", "K00370", "K00371",
"K00373", "K00374", "K00404", "K00405", "K00406", "K00407", "K00410",
"K00411", "K00412", "K00413", "K00424", "K00425", "K00426", "K00494",
"K00575", "K00626", "K00689", "K00692", "K00990", "K01034", "K01035",
"K01051", "K01077", "K01104", "K01113", "K01179", "K01425", "K01467",
"K01545", "K01546", "K01547", "K01548", "K01643", "K01644", "K01646",
"K01791", "K01910", "K01915", "K01991", "K02040", "K02106", "K02252",
"K02253", "K02259", "K02313", "K02398", "K02402", "K02403", "K02405",
"K02406", "K02472", "K02488", "K02489", "K02490", "K02491", "K02556",
"K02584", "K02650", "K02657", "K02658", "K02659", "K02660", "K02661",
"K02667", "K02668", "K03092", "K03367", "K03400", "K03406", "K03407",
"K03408", "K03412", "K03413", "K03415", "K03532", "K03533", "K03563",
"K03620", "K03739", "K03740", "K03776", "K04751", "K04771", "K05338",
"K05339", "K05597", "K05874", "K05875", "K05876", "K05877", "K05964",
"K05966", "K06046", "K06080", "K06281", "K06282", "K06347", "K06375",
"K06596", "K06597", "K06598", "K07165", "K07260", "K07636", "K07637",
"K07638", "K07639", "K07640", "K07641", "K07642", "K07643", "K07644",
"K07645", "K07646", "K07647", "K07648", "K07649", "K07650", "K07651",
"K07652", "K07653", "K07654", "K07655", "K07656", "K07657", "K07658",
"K07659", "K07660", "K07661", "K07662", "K07663", "K07664", "K07665",
"K07666", "K07667", "K07668", "K07669", "K07670", "K07671", "K07672",
"K07673", "K07674", "K07675", "K07676", "K07677", "K07678", "K07679",
"K07680", "K07681", "K07682", "K07683", "K07684", "K07685", "K07686",
"K07687", "K07688", "K07689", "K07690", "K07691", "K07692", "K07693",
"K07694", "K07695", "K07696", "K07697", "K07698", "K07699", "K07700",
"K07701", "K07702", "K07703", "K07704", "K07705", "K07706", "K07707",
"K07708", "K07709", "K07710", "K07711", "K07712", "K07713", "K07714",
"K07715", "K07716", "K07717", "K07718", "K07719", "K07720", "K07768",
"K07769", "K07770", "K07771", "K07772", "K07773", "K07774", "K07775",
"K07776", "K07777", "K07778", "K07780", "K07781", "K07782", "K07783",
"K07784", "K07785", "K07786", "K07787", "K07788", "K07789", "K07790",
"K07792", "K07793", "K07794", "K07795", "K07796", "K07797", "K07798",
"K07799", "K07800", "K07801", "K07803", "K07804", "K07805", "K07806",
"K07810", "K07811", "K07813", "K08082", "K08083", "K08348", "K08349",
"K08350", "K08357", "K08358", "K08359", "K08372", "K08475", "K08476",
"K08477", "K08478", "K08479", "K08641", "K08738", "K08926", "K08927",
"K08928", "K08929", "K08930", "K08939", "K09474", "K09475", "K09476",
"K09477", "K09696", "K09697", "K10001", "K10002", "K10003", "K10004",
"K10125", "K10126", "K10255", "K10681", "K10682", "K10697", "K10715",
"K10850", "K10851", "K10909", "K10910", "K10911", "K10912", "K10913",
"K10914", "K10916", "K10941", "K10942", "K10943", "K11103", "K11230",
"K11231", "K11232", "K11233", "K11326", "K11327", "K11328", "K11329",
"K11330", "K11331", "K11332", "K11354", "K11355", "K11356", "K11357",
"K11382", "K11383", "K11384", "K11443", "K11444", "K11520", "K11521",
"K11522", "K11523", "K11524", "K11525", "K11526", "K11601", "K11602",
"K11603", "K11614", "K11615", "K11616", "K11617", "K11618", "K11619",
"K11620", "K11621", "K11622", "K11623", "K11624", "K11625", "K11626",
"K11629", "K11630", "K11631", "K11632", "K11633", "K11634", "K11635",
"K11636", "K11637", "K11638", "K11639", "K11640", "K11641", "K11688",
"K11689", "K11690", "K11691", "K11692", "K11711", "K11712", "K12292",
"K12293", "K12294", "K12295", "K12296", "K12340", "K12415", "K12530",
"K12531", "K12532", "K13040", "K13041", "K13061", "K13486", "K13487",
"K13488", "K13489", "K13490", "K13491", "K13532", "K13533", "K13584",
"K13587", "K13588", "K13589", "K13598", "K13599", "K13815", "K13816",
"K13924", "K13927", "K13991", "K13994", "K14188", "K14205", "K14978",
"K14979", "K14980", "K14981", "K14982", "K14983", "K14986", "K14987",
"K14988", "K14989", "K15011", "K15012", "K15739", "K15841", "K15850",
"K15851", "K15853", "K15854", "K15859", "K15860", "K15861", "K15862",
"K16692", "K16712", "K16713", "K17060", "K17061", "K18072", "K18073",
"K18093", "K18094", "K18095", "K18321", "K18322", "K18323", "K18324",
"K18326", "K18344", "K18345", "K18346", "K18347", "K18348", "K18349",
"K18350", "K18351", "K18352", "K18353", "K18354", "K18444", "K18856",
"K18866", "K18940", "K18941", "K18986", "K18987", "K19077", "K19078",
"K19079", "K19080", "K19081", "K19082", "K19083", "K19084", "K19609",
"K19610", "K19611", "K19615", "K19616", "K19617", "K19618", "K19620",
"K19621", "K19622", "K19624", "K19641", "K19661", "K19666", "K19667",
"K19668", "K19690", "K19691", "K19692", "K20263", "K20264", "K20339",
"K20340", "K20482", "K20483", "K20484", "K20485", "K20486", "K20487",
"K20488", "K20489", "K20490", "K20491", "K20492", "K20494", "K20552",
"K20973", "K20974", "K20975", "K20976", "K20977", "K20978", "K22501",
"K23236", "K23514", "K23548", "K23549"), `ko02024 Quorum sensing` = c("K00494",
"K01114", "K01218", "K01318", "K01364", "K01399", "K01497", "K01580",
"K01626", "K01635", "K01657", "K01658", "K01728", "K01897", "K01995",
"K01996", "K01997", "K01998", "K01999", "K02031", "K02032", "K02033",
"K02034", "K02035", "K02052", "K02053", "K02054", "K02055", "K02250",
"K02251", "K02252", "K02253", "K02402", "K02403", "K02490", "K03070",
"K03071", "K03073", "K03075", "K03076", "K03106", "K03110", "K03210",
"K03217", "K03400", "K03666", "K06046", "K06352", "K06353", "K06354",
"K06355", "K06356", "K06358", "K06359", "K06360", "K06361", "K06363",
"K06364", "K06365", "K06366", "K06369", "K06375", "K06998", "K07173",
"K07344", "K07645", "K07666", "K07667", "K07680", "K07691", "K07692",
"K07699", "K07706", "K07707", "K07711", "K07715", "K07781", "K07782",
"K07800", "K07813", "K08321", "K08605", "K08642", "K08777", "K09823",
"K09936", "K10555", "K10556", "K10557", "K10558", "K10715", "K10823",
"K10909", "K10910", "K10911", "K10912", "K10913", "K10914", "K10915",
"K10916", "K10917", "K11006", "K11007", "K11031", "K11033", "K11034",
"K11035", "K11036", "K11037", "K11039", "K11063", "K11216", "K11530",
"K11531", "K11752", "K12257", "K12292", "K12293", "K12294", "K12295",
"K12296", "K12415", "K12789", "K12990", "K13060", "K13061", "K13062",
"K13063", "K13075", "K13815", "K13816", "K14051", "K14645", "K14982",
"K14983", "K15580", "K15581", "K15582", "K15583", "K15654", "K15655",
"K15656", "K15657", "K15850", "K15851", "K15852", "K15853", "K15854",
"K16619", "K17940", "K18000", "K18001", "K18002", "K18003", "K18096",
"K18098", "K18099", "K18100", "K18101", "K18139", "K18304", "K18306",
"K18307", "K18315", "K18316", "K18317", "K18318", "K18319", "K19666",
"K19731", "K19732", "K19733", "K19734", "K19735", "K20086", "K20087",
"K20088", "K20089", "K20090", "K20248", "K20249", "K20250", "K20252",
"K20253", "K20256", "K20257", "K20258", "K20259", "K20260", "K20261",
"K20262", "K20263", "K20264", "K20265", "K20266", "K20267", "K20268",
"K20269", "K20270", "K20271", "K20272", "K20273", "K20274", "K20275",
"K20276", "K20277", "K20321", "K20322", "K20323", "K20324", "K20325",
"K20326", "K20327", "K20328", "K20329", "K20330", "K20331", "K20332",
"K20333", "K20334", "K20335", "K20336", "K20337", "K20338", "K20339",
"K20340", "K20341", "K20342", "K20343", "K20344", "K20345", "K20373",
"K20374", "K20375", "K20376", "K20377", "K20378", "K20379", "K20380",
"K20381", "K20382", "K20383", "K20384", "K20385", "K20386", "K20387",
"K20388", "K20389", "K20390", "K20391", "K20480", "K20481", "K20482",
"K20483", "K20484", "K20485", "K20486", "K20487", "K20488", "K20489",
"K20490", "K20491", "K20492", "K20494", "K20527", "K20528", "K20529",
"K20530", "K20531", "K20532", "K20533", "K20539", "K20540", "K20552",
"K20554", "K20555", "K22954", "K22955", "K22956", "K22957", "K22968",
"K23133"), `ko02025 Biofilm formation - Pseudomonas aeruginosa` = c("K01657",
"K01658", "K01768", "K02398", "K02405", "K02657", "K02658", "K02659",
"K02660", "K03563", "K03651", "K06596", "K06598", "K07678", "K07689",
"K10914", "K10941", "K11444", "K11890", "K11891", "K11893", "K11895",
"K11900", "K11901", "K11902", "K11903", "K11907", "K11912", "K11913",
"K11915", "K12990", "K12992", "K13060", "K13061", "K13487", "K13488",
"K13489", "K13490", "K13491", "K16011", "K17940", "K18000", "K18001",
"K18002", "K18003", "K18099", "K18100", "K18101", "K18304", "K19291",
"K19735", "K20257", "K20258", "K20259", "K20968", "K20969", "K20970",
"K20971", "K20972", "K20973", "K20974", "K20975", "K20976", "K20977",
"K20978", "K20987", "K20997", "K20998", "K20999", "K21000", "K21001",
"K21002", "K21003", "K21004", "K21005", "K21006", "K21007", "K21008",
"K21009", "K21010", "K21011", "K21012", "K21019", "K21020", "K21021",
"K21022", "K21023", "K21024", "K21025", "K23127"), `ko02026 Biofilm formation - Escherichia coli` = c("K00688",
"K00694", "K00703", "K00975", "K01991", "K02398", "K02402", "K02403",
"K02405", "K02425", "K02777", "K03087", "K03563", "K03566", "K03567",
"K04333", "K04334", "K04335", "K04336", "K04761", "K05851", "K06204",
"K07173", "K07638", "K07648", "K07659", "K07676", "K07677", "K07678",
"K07687", "K07689", "K07773", "K07781", "K07782", "K10914", "K11531",
"K11931", "K11935", "K11936", "K11937", "K12687", "K14051", "K18502",
"K18504", "K18509", "K18515", "K18516", "K18518", "K18521", "K18522",
"K18523", "K18528", "K18968", "K21084", "K21085", "K21086", "K21087",
"K21088", "K21089", "K21090", "K21091"))
And a truncated data frame with the genes of interest:
dput(test2)
structure(list(gene_id = c("G6381", "G12285", "G10911", "G17366",
"G3593", "G17753"), kegg_id = c("K18523", "K19009", "K07782",
"K02398", "K21407", "K00922")), row.names = c(NA, 6L), class = "data.frame")
If we need the corresponding 'gene_id', create a named vector from 'test2', loop over the list ('test1'), index the named vector by each pathway's 'kegg_id' values to extract the matching 'gene_id', and remove the non-matching (NA) elements with na.omit:
nm1 <- with(test2, setNames(gene_id, kegg_id))
lst1 <- lapply(test1, function(x) as.vector(na.omit(nm1[x])))
To filter the original list:
test1[lengths(lst1) > 0]
Or to filter the subset list:
lst1[lengths(lst1) > 0]
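The question also asks for a character vector of the retained pathway IDs. A minimal sketch of that step follows, using %in% directly instead of the named-vector lookup; the small test1 and test2 objects are hypothetical cut-downs of the data above.

```r
# Hypothetical miniature of test1 and test2
test1 <- list(
  `ko02020 Two-component system` = c("K00027", "K02398", "K07782"),
  `ko00970 Aminoacyl-tRNA biosynthesis` = c("K00604", "K01042"),
  `ko02026 Biofilm formation - Escherichia coli` = c("K18523", "K00688")
)
test2 <- data.frame(gene_id = c("G6381", "G10911", "G17366"),
                    kegg_id = c("K18523", "K07782", "K02398"),
                    stringsAsFactors = FALSE)

# TRUE for each pathway that shares at least one K number with test2$kegg_id
keep <- vapply(test1, function(x) any(x %in% test2$kegg_id), logical(1))

filtered <- test1[keep]          # original list, matching pathways only
pathway_names <- names(filtered) # character vector of retained pathway names

# Only the koXXXXX part, dropping the description after the first space
pathway_ids <- sub(" .*", "", pathway_names)
```

vapply is used rather than sapply so the result is guaranteed to be a logical vector even if the list is empty.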

Too many values in one argument case_when?

I am not sure why this code doesn't run. If I break it into 2 smaller chunks, it works. Is there any way I can run this whole chunk at once?
When I run this code, a plus sign appears in the console and I can't click Run in R Markdown.
dataT4<- dataT4 %>% mutate (coupleID=case_when(id==10011~1, id==10021~2,
id==10032~3, id==10041~4,id==10062~5, id==10071~6,id==10082~7, id==10092~8,
id==10112~9, id==10121~10,id== 10131~11, id==10142~12, id==10151~13,
id==10162~14,id==10171~15, id==10181~16, id==10202~17, id==10212~18, id==10221~19,
id==10232~20, id==10242~21, id==10251~22, id==10262~23, id==10271~24, id==10292~25,
id==10311~26, id==10332~27, id==10342~28, id==10351~29, id==10361~30, id==10372~31,
id==10382~32, id==10391~33, id==10401~34, id==10412~35, id==10421~36, id==10432~37,
id==10442~38, id==10452~39, id==10461~40, id==10471~41, id==10481~42, id==10492~43,
id==10501~44, id==10511~45, id==10521~46, id==10532~47, id==10542~48, id==10562~49,
id==10581~50, id==10592~51, id==10602~52, id==10611~53, id==10642~54, id==10651~55,
id==10662~56, id==10672~57, id==10681~58, id==10702~59, id==10761~60, id==10782~61,
id==10791~62, id==10802~63, id==10812~64, id==10822~65, id==10831~66, id==10852~67,
id==10862~68, id==10881~69, id==10912~70, id==10942~71, id==10951~72, id==10962~73,
id==10972~74, id==10982~75, id==10992~76, id==11001~77, id==11031~78, id==11052~79,
id==11061~80, id==11072~81, id==11092~82, id==11101~83, id==11112~84, id==11171~85,
id==11192~86, id==11202~87, id==11221~88, id==11231~89, id==11252~90, id==11261~91,
id==11281~92, id==11292~93, id==11322~94, id==11332~95, id==11372~96, id==11382~97,
id==11391~98, id==11411~99, id==11422~100, id==11441~101, id==11461~102,
id==11471~103, id==11492~104, id==11501~105, id==11512~106,
id==11521~107,id==11562~108,id==11591~109, id==11601~110, id==11611~111,
id==11621~112, id==11632~113, id==11641~114, id==11651~115, id==11662~116,
id==11682~117,id==11691~118,id==11712~119, id==11771~120, id==11782~121,
id==11811~122, id==11821~123, id==11831~124, id==11841~125, id==11852~126,
id==11861~127,id==11872~128,id==11882~129, id==11892~130, id==11902~131,
id==11911~132, id==11922~133, id==11961~134, id==11972~135,
id==11992~136,id==12011~137, id==12041~138, id==12052~139, id==12061~140,
id==12081~141, id==12101~142, id==12111~143, id==12122~144, id==12131~145,
id==12142~146, id==12151~147, id==12161~148, id==12182~149, id==12191~150,
id==12201~151, id==12232~152, id==12261~153, id==12272~154, id==12322~155,
id==12332~156, id==12342~157, id==12352~158, id==12382~159, id==12392~160,
id==12401~161, id==12411~162, id==12421~163, id==12432~164, id==12441~165,
id==12451~166, id==12461~167, id==12471~168, id==12492~169, id==12501~170,
id==12512~171, id==12521~172, id==12542~173, id==12552~174, id==12562~175,
id==12572~176, id==12581~177, id==12612~178, id==12622~179, id==12652~180,
id==12662~181, id==12682~182, id==12701~183, id==12712~184, id==12731~185,
id==12741~186, id==12762~187, id==12792~188, id==12802~189, id==12811~190,
id==12822~191, id==12832~192, id==12841~193, id==12862~194, id==12882~195,
id==12891~196, id==12911~197, id==12931~198, id==12942~199, id==12952~200,
id==12961~201, id==12972~202, id==13011~203, id==13021~204, id==13032~205,
id==13042~206, id==13061~207, id==13082~208, id==13102~209, id==13111~210,
id==13132~211, id==13142~212, id==13151~213, id==13162~214, id==13191~215,
id==13202~216, id==13212~217, id==13262~218, id==13271~219, id==13281~220,
id==13311~221, id==13322~222, id==13331~223, id==13351~224, id==13361~225,
id==13372~226, id==13422~227, id==13432~228, id==13452~229, id==13462~230,
id==13472~231, id==13481~232, id==13501~233, id==13511~234, id==13521~235,
id==13561~236, id==13571~237, id==13601~238, id==13612~239, id==13632~240,
id==13642~241, id==13652~242, id==13662~243, id==13671~244, id==13681~245,
id==13691~246, id==13701~247, id==13711~248, id==13732~249, id==13742~250,
id==13752~251, id==13782~252, id==13842~253, id==13802~254, id==13822~255,
id==13851~256, id==13872~257, id==13882~258, id==13892~259, id==13912~260,
id==13921~261, id==13932~262, id==13941~263, id==13952~264, id==13971~265,
id==13981~266, id==13992~267, id==14011~268, id==14021~269, id==14031~270,
id==14041~271, id==14052~272, id==14072~273, id==14111~274, id==14131~275,
id==14162~276, id==14172~277, id==14182~278, id==14191~279, id==14212~280,
id==14222~281, id==14241~282, id==14261~283, id==14291~284, id==14302~285,
id==14312~286, id==14321~287, id==14342~288, id==14352~289, id==14362~290,
id==14371~291, id==14392~292, id==14402~293, id==14432~294, id==14451~295,
id==14472~296, id==14482~297, id==14491~298, id==14511~299, id==14521~300,
id==14531~301, id==14541~302, id==14552~303, id==14562~304, id==14572~305,
id==14581~306, id==14592~307, id==14602~308, id==14621~309, id==14632~310,
id==14641~311, id==14651~312, id==14671~313, id==14681~314, id==14692~315,
id==14712~316, id==14722~317, id==14732~318, id==14741~319, id==14751~320,
id==14781~321, id==14792~322, id==14812~323, id==14842~324, id==14852~325,
id==14862~326, id==14882~327, id==14892~328, id==14901~329, id==11012~330))
As a single line it is just too long to be parsed. You may be better served putting all of these values into a separate data.frame and merging it into your data instead of using a giant case_when.
Usually when I want to do something like this I'll open Excel or something similar, put column names in the first row (here that would be id and coupleID) and enter all of the values, save it as a CSV, then read the CSV into R as a data.frame, and then merge it.
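A minimal sketch of the lookup-table approach, with hypothetical data standing in for both the CSV and dataT4. In practice the lookup data frame would come from read.csv on the file you saved. merge() is shown as described above; the match() line is an order-preserving alternative I am adding, since merge() sorts the result by the key.

```r
# Hypothetical lookup table (would normally be read from a CSV)
lookup <- data.frame(id = c(10011, 10021, 10032, 11012),
                     coupleID = c(1, 2, 3, 330))

# Hypothetical stand-in for dataT4, including one id with no entry
dataT4 <- data.frame(id = c(10021, 11012, 10011, 10021, 99999))

# Left join: keeps every row of dataT4, NA where the id is not in lookup.
# Note that merge() returns the rows sorted by id.
merged <- merge(dataT4, lookup, by = "id", all.x = TRUE)

# Alternative that keeps dataT4 in its original row order
dataT4$coupleID <- lookup$coupleID[match(dataT4$id, lookup$id)]
```

Either way, the 330 pairs live in a table that is easy to inspect and extend, rather than in one very long expression.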
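If coupleID is simply the position of each id in ascending order, you can use rank. Note that the mapping in the question is not strictly ascending (for example, id 11012 maps to 330), and repeated ids would receive tied, averaged ranks, so check the result against the intended numbering:
library(dplyr)
dataT4 <- data.frame(id=c(10011, 10021, 10382, 11012))
dataT4 <- dataT4 %>% mutate(coupleID=rank(id))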
dataT4
id coupleID
1 10011 1
2 10021 2
3 10382 3
4 11012 4

Interpolating using approxm function goes wrong for one column

I have a data frame which contains three columns.
A|B|c
10|0|0
10|5|0
10|10|0
15|0|0
15|5|0
15|10|0
When I interpolate the above data frame:
df<-approxm(df,206,method="linear")
Here is the output:
A|B|c
10|0|0
10|1|0
10|2|0
10|3|0
10|4|0
10|5|0
10|6|0
10|7|0
10|8|0
10|9|0
10|10|0
11|8|0
12|6|0
13|4|0
14|2|0
15|0|0
15|1|0
15|2|0
15|3|0
15|4|0
15|5|0
15|6|0
15|7|0
15|8|0
15|9|0
15|10|0
In this output, the rows with A values 11, 12, 13 and 14 are not interpolated properly: each appears only once instead of once for every B value from 0 to 10.
My Expected output is:
A|B|c
10|0|0
10|1|0
10|2|0
10|3|0
10|4|0
10|5|0
10|6|0
10|7|0
10|8|0
10|9|0
10|10|0
11|0|0
11|1|0
11|2|0
11|3|0
11|4|0
11|5|0
11|6|0
11|7|0
11|8|0
11|9|0
11|10|0
12|0|0
12|1|0
12|2|0
12|3|0
12|4|0
12|5|0
12|6|0
12|7|0
12|8|0
12|9|0
12|10|0
13|0|0
13|1|0
13|2|0
13|3|0
13|4|0
13|5|0
13|6|0
13|7|0
13|8|0
13|9|0
13|10|0
14|0|0
14|1|0
14|2|0
14|3|0
14|4|0
14|5|0
14|6|0
14|7|0
14|8|0
14|9|0
14|10|0
15|0|0
15|1|0
15|2|0
15|3|0
15|4|0
15|5|0
15|6|0
15|7|0
15|8|0
15|9|0
15|10|0
I'm not getting this output, and I don't know where my code goes wrong.
Can someone help me out?
The complete function from tidyr worked (with tidyr loaded, so that full_seq and nesting are found):
tidyr::complete(df,A=full_seq(A,1),nesting(B=full_seq(B,1)),fill=list(c=0))
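For comparison, the same grid can be built in base R. This is a sketch of an alternative technique (expand.grid plus merge), not the tidyr call above, and it assumes a step of 1 for both A and B, as in the expected output.

```r
df <- data.frame(A = c(10, 10, 10, 15, 15, 15),
                 B = c(0, 5, 10, 0, 5, 10),
                 c = 0)

# Every combination of the full ranges of A and B in steps of 1
grid <- expand.grid(A = seq(min(df$A), max(df$A), by = 1),
                    B = seq(min(df$B), max(df$B), by = 1))

# Keep existing c values where an (A, B) pair is already in df
out <- merge(df, grid, by = c("A", "B"), all = TRUE)

# Fill c with 0 for the newly created rows
out$c[is.na(out$c)] <- 0

# Order so that B cycles from 0 to 10 within each A
out <- out[order(out$A, out$B), ]
rownames(out) <- NULL
```

This is a completion of the (A, B) grid rather than an interpolation, which is what the expected output in the question describes.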

How to access a particular sub-set of data in R Table

I have tabular (long format) data with a number of variables. I want to load the csv once and then access a particular sub-set later on from it. For example:
Blog,Region,Dim1
Individual,PK,-4.75
Individual,PK,-5.69
Individual,PK,-0.27
Individual,PK,-2.76
Individual,PK,-8.24
Individual,PK,-12.51
Individual,PK,-1.28
Individual,PK,0.95
Individual,PK,-5.96
Individual,PK,-8.81
Individual,PK,-8.46
Individual,PK,-6.15
Individual,PK,-13.98
Individual,PK,-16.43
Individual,PK,-4.09
Individual,PK,-11.06
Individual,PK,-9.04
Individual,PK,-8.56
Individual,PK,-8.13
Individual,PK,-14.46
Individual,PK,-4.21
Individual,PK,-4.96
Individual,PK,-5.48
Multiwriter,PK,-3.31
Multiwriter,PK,-5.62
Multiwriter,PK,-4.48
Multiwriter,PK,-6.08
Multiwriter,PK,-4.68
Multiwriter,PK,-6.92
Multiwriter,PK,-11.29
Multiwriter,PK,6.66
Multiwriter,PK,1.66
Multiwriter,PK,3.39
Multiwriter,PK,0.06
Multiwriter,PK,4.11
Multiwriter,PK,-1.57
Multiwriter,PK,1.33
Multiwriter,PK,-6.91
Multiwriter,PK,4.87
Multiwriter,PK,-10.87
Multiwriter,PK,6.25
Multiwriter,PK,-0.68
Multiwriter,PK,0.11
Multiwriter,PK,0.71
Multiwriter,PK,-3.8
Multiwriter,PK,-1.75
Multiwriter,PK,-5.38
Multiwriter,PK,1.24
Multiwriter,PK,-5.59
Multiwriter,PK,4.98
Multiwriter,PK,0.98
Multiwriter,PK,7.47
Multiwriter,PK,-5.25
Multiwriter,PK,-14.24
Multiwriter,PK,-1.55
Multiwriter,PK,-8.44
Multiwriter,PK,-7.67
Multiwriter,PK,5.85
Multiwriter,PK,6
Multiwriter,PK,-7.53
Multiwriter,PK,1.59
Multiwriter,PK,-9.48
Multiwriter,PK,-3.99
Multiwriter,PK,-5.82
Multiwriter,PK,1.62
Multiwriter,PK,-4.14
Multiwriter,PK,1.06
Multiwriter,PK,4.52
Multiwriter,PK,-5.6
Multiwriter,PK,-3.38
Multiwriter,PK,4.82
Multiwriter,PK,0.76
Multiwriter,PK,-4.95
Multiwriter,PK,-2.05
Column,PK,1.64
Column,PK,5.2
Column,PK,2.8
Column,PK,1.93
Column,PK,2.36
Column,PK,4.77
Column,PK,-1.92
Column,PK,-2.94
Column,PK,4.58
Column,PK,2.98
Column,PK,9.07
Column,PK,8.5
Column,PK,1.23
Column,PK,8.97
Column,PK,4.1
Column,PK,7.25
Column,PK,0.02
Column,PK,-3.48
Column,PK,1.01
Column,PK,2.7
Column,PK,-2.32
Column,PK,3.22
Column,PK,-2.37
Column,PK,-13.28
Column,PK,-4.36
Column,PK,2.91
Column,PK,4.4
Column,PK,-5.07
Column,PK,-10.24
Column,PK,12.8
Column,PK,1.92
Column,PK,13.24
Column,PK,12.32
Column,PK,12.7
Column,PK,9.95
Column,PK,12.11
Column,PK,7.63
Column,PK,11.09
Column,PK,13.04
Column,PK,12.06
Column,PK,9.49
Column,PK,8.64
Column,PK,10.05
Column,PK,6.4
Column,PK,9.64
Column,PK,3.53
Column,PK,4.78
Column,PK,9.54
Column,PK,8.49
Column,PK,2.56
Column,PK,8.82
Column,PK,-3.59
Column,PK,-3.31
Column,PK,10.05
Column,PK,-0.28
Column,PK,-0.5
Column,PK,-6.37
Column,PK,2.97
Column,PK,4.49
Column,PK,9.14
Column,PK,4.5
Column,PK,8.6
Column,PK,6.76
Column,PK,3.67
Column,PK,6.79
Column,PK,5.77
Column,PK,10.5
Column,PK,1.57
Column,PK,9.47
Individual,US,-9.85
Individual,US,-2.73
Individual,US,-0.32
Individual,US,-0.94
Individual,US,-7.51
Individual,US,-8.21
Individual,US,-7.33
Individual,US,-5.1
Individual,US,-1.58
Individual,US,-2.49
Individual,US,-1.36
Individual,US,-5.76
Individual,US,-0.48
Individual,US,-3.38
Individual,US,2.42
Individual,US,-1.71
Individual,US,-2.17
Individual,US,-2.81
Individual,US,-0.64
Individual,US,-8.88
Individual,US,-1.53
Individual,US,-1.42
Individual,US,-17.89
Individual,US,7.1
Individual,US,-4.12
Individual,US,-0.83
Individual,US,2.05
Individual,US,-5.87
Individual,US,-0.15
Individual,US,5.78
Individual,US,-1.96
Individual,US,1.77
Individual,US,-0.67
Individual,US,-10.23
Individual,US,3.37
Individual,US,-1.18
Individual,US,6.94
Individual,US,-3.86
Individual,US,2.21
Individual,US,-11.64
Individual,US,-14.71
Individual,US,-12.74
Individual,US,-6.24
Individual,US,-13.64
Individual,US,-8.53
Individual,US,-10.4
Individual,US,-6.24
Individual,US,-12.15
Individual,US,-15.96
Multiwriter,US,11.27
Multiwriter,US,3.51
Multiwriter,US,4.05
Multiwriter,US,3.81
Multiwriter,US,8.56
Multiwriter,US,6.36
Multiwriter,US,-8.99
Multiwriter,US,3.36
Multiwriter,US,3.18
Multiwriter,US,-5.22
Multiwriter,US,-8.61
Multiwriter,US,-9.02
Multiwriter,US,-6.32
Multiwriter,US,0.53
Multiwriter,US,11.03
Multiwriter,US,-5.7
Multiwriter,US,4
Multiwriter,US,-3.55
Multiwriter,US,2.79
Multiwriter,US,4.61
Multiwriter,US,-3.8
Multiwriter,US,-9.62
Multiwriter,US,-8.37
Multiwriter,US,-2.18
Multiwriter,US,-1.64
Multiwriter,US,-9.99
Multiwriter,US,-1.44
Multiwriter,US,-4.45
Multiwriter,US,-7.84
Multiwriter,US,-11.6
Multiwriter,US,-2.71
Multiwriter,US,1.2
Multiwriter,US,-6.44
Multiwriter,US,-2.64
Multiwriter,US,-11.59
Multiwriter,US,-5.9
Multiwriter,US,-3.78
Multiwriter,US,-14.99
Multiwriter,US,1.32
Multiwriter,US,-6.55
Multiwriter,US,0.92
Multiwriter,US,-5.61
Multiwriter,US,-14.16
Multiwriter,US,-10.03
Multiwriter,US,-7.08
Multiwriter,US,0.62
Multiwriter,US,-5.43
Multiwriter,US,-1.11
Multiwriter,US,-11.37
Multiwriter,US,-13.37
Multiwriter,US,-12.71
Multiwriter,US,1.86
Multiwriter,US,14.11
Multiwriter,US,-5.24
Multiwriter,US,-6.77
Multiwriter,US,-4.79
Multiwriter,US,-6.22
Multiwriter,US,3.66
Multiwriter,US,-2.65
Multiwriter,US,-2.87
Multiwriter,US,-12.32
Multiwriter,US,-7.48
Multiwriter,US,-4.84
Multiwriter,US,0.44
Column,US,8.93
Column,US,10.29
Column,US,8.31
Column,US,5.88
Column,US,8.87
Column,US,-2.9
Column,US,3.71
Column,US,8.43
Column,US,1.47
Column,US,3.05
Column,US,-1.78
Column,US,1.14
Column,US,7.2
Column,US,5.22
Column,US,5.53
Column,US,8.14
Column,US,-2.22
Column,US,0.89
Column,US,2.5
Column,US,6.77
Column,US,3.63
Column,US,2.86
Column,US,3.7
Column,US,7.52
Column,US,3.12
Column,US,0
Column,US,0.28
Column,US,6.86
Column,US,-0.32
Column,US,2.92
Column,US,-1.14
Column,US,-1.11
Column,US,4.42
Column,US,4.37
Column,US,1.09
Column,US,-3.66
Column,US,7.09
Column,US,-11.02
Column,US,-0.78
Column,US,8.44
Column,US,4.88
Column,US,-3.9
Column,US,-0.21
Column,US,6.48
Column,US,4.49
Column,US,-8.89
Column,US,-0.73
Column,US,1.76
Column,US,-4.31
Column,US,4.63
Column,US,8.91
Column,US,3.55
Column,US,6.69
Column,US,-4.45
Column,US,9.82
Column,US,6.79
Column,US,1.84
Column,US,8.97
Column,US,2.38
Column,US,4.68
Column,US,9.23
Column,US,2.85
Column,US,4.19
Column,US,2.43
Column,US,5.48
Column,US,-1.08
Column,US,7.47
Column,US,3.13
Column,US,-0.42
Column,US,-0.71
Column,US,6.51
Column,US,6.34
Column,US,3.94
Column,US,5.46
Column,US,0.39
Column,US,8.15
Column,US,7.99
Column,US,6.26
Column,US,7.91
Column,US,14.18
Column,US,7.41
Column,US,7.16
Column,US,5.6
Column,US,7.51
Column,US,6.24
Column,US,3.67
Column,US,3.84
Column,US,2.37
Column,US,-3.5
Column,US,5.02
Column,US,-6.04
Column,US,5.36
Column,US,1.98
Column,US,7.79
Column,US,0.02
Column,US,-1.9
Column,US,-2.81
Column,US,10.69
Column,US,1.65
Column,US,8.19
Column,US,1.92
How can I access the 'Dim1' values for the subset where Blog is 'Column' and Region is 'US'?
I have tried to read about the 'data frame', 'table', 'factor' and 'matrix' data types in R, but I could not find out how to access a subset of a complex table like this. (My real data includes additional vectors of numerical values like Dim1, i.e. Dim2, Dim3, Dim4, Dim5. But that should be the same in principle, so I have not included them in this example.)
I assume you want to select only the rows which have 'Column' and 'US'.
If so you can select the subset using:
data[data[,1]=='Column' & data[,2]=='US',]
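The same selection can be written with column names, which tends to be easier to read than positional indices. A minimal sketch on a hypothetical five-row excerpt of the CSV, assuming the data frame is called data as in the answer above:

```r
# Hypothetical miniature of the CSV from the question
data <- read.csv(text = "Blog,Region,Dim1
Individual,PK,-4.75
Column,PK,1.64
Individual,US,-9.85
Column,US,8.93
Column,US,10.29", stringsAsFactors = FALSE)

# Logical vector marking the rows where both conditions hold
is_col_us <- data$Blog == "Column" & data$Region == "US"

# All columns for those rows
sub_rows <- data[is_col_us, ]

# Just the Dim1 values, as a numeric vector
dim1 <- data$Dim1[is_col_us]

# subset() expresses the same filter and can pick columns by name
sub2 <- subset(data, Blog == "Column" & Region == "US", select = Dim1)
```

With more columns such as Dim2 to Dim5, the same is_col_us vector can be reused to pull any of them.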
