AdaBoost model gives a vector of outputs for one row - R

I have built a model using AdaBoost. When I give one row as input, this is the output I get, but I was expecting just one number as the prediction:
> predict(Model,testset[1,],type="prob")[,2]
[1] 0.5159268 0.5143351 0.5135043 0.5127763 0.5116162 0.5097892 0.5098299 0.5098701
[9] 0.5083176 0.5088486 0.5073487 0.5082424 0.5078101 0.5073640 0.5053638 0.5066038
[17] 0.5063418 0.5055067 0.5060952 0.5051869 0.5050157 0.5038692 0.5040837 0.5052188
[25] 0.5040825 0.5046496 0.5050795 0.5042205 0.4976465 0.5046798 0.5047607 0.4957011
[33] 0.5048601 0.5039299 0.5032739 0.5042044 0.5044005 0.5044902 0.5037352 0.4981865
[41] 0.5021579 0.5038746 0.5043289 0.5032334 0.5051926 0.5021917 0.5015447 0.5029390
[49] 0.4951465 0.5033675
> predict(Model,testset[2,],type="prob")[,2]
[1] 0.5159268 0.5143351 0.5135043 0.5127763 0.5116162 0.5097892 0.5098299 0.5098701
[9] 0.5083176 0.5088486 0.5073487 0.5082424 0.4921899 0.5073640 0.5053638 0.5066038
[17] 0.5063418 0.5055067 0.5060952 0.5051869 0.5050157 0.5038692 0.5040837 0.5052188
[25] 0.5040825 0.5046496 0.5050795 0.5042205 0.5023535 0.4953202 0.5047607 0.5042989
[33] 0.4951399 0.5039299 0.4967261 0.5042044 0.5044005 0.4955098 0.5037352 0.5018135
[41] 0.5021579 0.5038746 0.5043289 0.5032334 0.4948074 0.5021917 0.4984553 0.5029390
[49] 0.4951465 0.5033675
If I give, say, 5 rows as input, I get 5 predictions, as expected:
> predict(Model,testset[1:5,],type="prob")[,2]
[1] 0.7470780 0.7101257 0.4795726 0.7451049 0.5607364
Why is the first command giving me 50 predictions when I'm giving just one row as input?
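Since multi-row input behaves as expected, one workaround worth trying is to pass the single row twice and keep only the first prediction. The helper predict_one_row() below is hypothetical and assumes testset is a data frame; it does not explain where the 50 values come from, it only sidesteps the one-row case.

```r
# Hypothetical helper: duplicate a one-row data frame so predict() takes the
# same multi-row path that worked in the question, then keep the first result.
predict_one_row <- function(model, row, ...) {
  doubled <- row[c(1, 1), , drop = FALSE]   # assumes `row` is a 1-row data frame
  p <- predict(model, doubled, ...)
  # Vector predictions (e.g. lm) vs. matrix predictions (e.g. class probabilities)
  if (is.null(dim(p))) p[1] else p[1, , drop = FALSE]
}

# Usage with the question's objects (not run here):
# predict_one_row(Model, testset[1, ], type = "prob")[, 2]

# Self-contained check with lm(), whose predict() handles one row fine anyway
fit <- lm(mpg ~ wt, data = mtcars)
single <- predict_one_row(fit, mtcars[1, ])
single
```

For a probability matrix, p[1, , drop = FALSE] keeps the column structure, so the question's trailing [, 2] still returns a single number.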

Related

Reorder a one-dimensional data frame based on the column order of a larger data frame (R)

relevant_ods_reordered <- relevant_ods[names(cpm)]
The line above tries to reorder the columns of the data frame relevant_ods:
Plate1_DMSO_A01 Plate1_DMSO_B01 Plate1_DMSO_C01 Plate1_Lopinavir_D01
OD595 0.431 0.4495 0.4993 0.5785
Plate1_DMSO_E01 Plate1_DMSO_F01 Plate1_DMSO_G01 Plate1_DMSO_H01
OD595 0.5336 0.5133 0.527 0.5413
Plate1_DMSO_C12 Plate1_DMSO_D12 Plate1_Lopinavir_E12 Plate1_DMSO_F12
OD595 0.4137 0.4274 0.5241 0.4264
Plate1_DMSO_G12 Plate1_DMSO_H12
OD595 0.4561 0.4767
to match the order of the columns in a significantly larger data frame, cpm, whose column names are:
[1] "Plate1_DMSO_A01" "Plate1_DMSO_A12"
[3] "Plate1_DMSO_B01" "Plate1_DMSO_B12"
[5] "Plate1_DMSO_C01" "Plate1_DMSO_C12"
[7] "Plate1_DMSO_D12" "Plate1_DMSO_E01"
[9] "Plate1_DMSO_F01" "Plate1_DMSO_F12"
[11] "Plate1_DMSO_G01" "Plate1_DMSO_G12"
[13] "Plate1_DMSO_H01" "Plate1_DMSO_H12"
[15] "Plate1_Lopinavir_D01" "Plate1_Lopinavir_E12"
[17] "Plate1_NS1519_22009_A02" "Plate1_NS1519_22009_A04"
[19] "Plate1_NS1519_22009_A05" "Plate1_NS1519_22009_A06"
[21] "Plate1_NS1519_22009_A07" "Plate1_NS1519_22009_A08"
[23] "Plate1_NS1519_22009_A09" "Plate1_NS1519_22009_A10"
[25] "Plate1_NS1519_22009_A11" "Plate1_NS1519_22009_B02"
[27] "Plate1_NS1519_22009_B03" "Plate1_NS1519_22009_B04"
[29] "Plate1_NS1519_22009_B05" "Plate1_NS1519_22009_B06"
etc.
Unsurprisingly, this returns
Error in `[.data.frame`(relevant_ods, names(cpm)) :
undefined columns selected
because cpm has columns that relevant_ods does not.
I have tried:
relevant_ods_reordered <- relevant_ods[names(cpm),]
relevant_ods_reordered <- select(relevant_ods, names(cpm))
relevant_ods_reordered <- match(relevant_ods, names(cpm))
With base R, you need to find the names the two data frames have in common. intersect() works well for this and preserves the order of its first argument:
relevant_ods[intersect(names(cpm), names(relevant_ods))]
Or, with dplyr, use the select helper any_of():
select(relevant_ods, any_of(names(cpm)))
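A small base R sketch of the intersect() approach, using hypothetical stand-ins for relevant_ods and cpm. It also uses setdiff() to list the names that triggered the "undefined columns selected" error.

```r
# Hypothetical stand-ins: relevant_ods has a few columns, cpm has many more
relevant_ods <- data.frame(C = 3, A = 1, B = 2)
cpm <- data.frame(A = 0, X = 0, B = 0, Y = 0, C = 0)

# Names in cpm that relevant_ods lacks: these caused the original error
missing_cols <- setdiff(names(cpm), names(relevant_ods))

# Keep only the shared names, in cpm's order, then reorder relevant_ods
common <- intersect(names(cpm), names(relevant_ods))
relevant_ods_reordered <- relevant_ods[common]
names(relevant_ods_reordered)
```

Here missing_cols is c("X", "Y") and the reordered columns come out as A, B, C.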

Code to rename multiple columns in RStudio

I want to rename these columns in R by removing the X from each of them, so that only the numbers remain; they represent years from 1960 to 2020. The first two (Country Name and Country Code) are already sorted out.
[1] "ï..Country.Name" "Country.Code" "X1960" "X1961" "X1962"
[6] "X1963" "X1964" "X1965" "X1966" "X1967"
[11] "X1968" "X1969" "X1970" "X1971" "X1972"
[16] "X1973" "X1974" "X1975" "X1976" "X1977"
[21] "X1978" "X1979" "X1980" "X1981" "X1982"
[26] "X1983" "X1984" "X1985" "X1986" "X1987"
[31] "X1988" "X1989" "X1990" "X1991" "X1992"
[36] "X1993" "X1994" "X1995" "X1996" "X1997"
[41] "X1998" "X1999" "X2000" "X2001" "X2002"
[46] "X2003" "X2004" "X2005" "X2006" "X2007"
[51] "X2008" "X2009" "X2010" "X2011" "X2012"
[56] "X2013" "X2014" "X2015" "X2016" "X2017"
[61] "X2018" "X2019" "X2020"
names(df) <- gsub("^X", "", names(df))
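gsub() finds matches of a regular expression and replaces them. Here the regex ^X matches an X only at the beginning of the string, so the year digits are left intact. (sub() would work just as well, since an anchored pattern can match at most once per name.)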
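A quick check of the regex on a hypothetical subset of the column names above (a plain character vector stands in for names(df)):

```r
# Hypothetical subset of the column names shown in the question
nms <- c("Country.Name", "Country.Code", "X1960", "X1961", "X2020")

# "^X" anchors the match at the start; names without a leading X are untouched
cleaned <- gsub("^X", "", nms)
cleaned
```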

Find the genes present in at least n of several lists

I have a list of 20 lists (each list contains the genes of 20 populations).
For each population, I have to find the genes that are present in at least 15 lists.
I'm using R.
Any help, please?
Example:
BigList$list1$pop1
BigList$list1$pop2
BigList$list1$pop3
BigList$list2$pop1
BigList$list2$pop2
BigList$list2$pop3
BigList$list3$pop1
BigList$list3$pop2
BigList$list3$pop3
My list looks like this:
[[1]]$pop1
[1] "CFC1" "ZNF536" "TRIM67" "AC092431.3" "RP11-572M11.4" "HCG23" "AC006372.4" "RP11-6O2.4" "CACNG3"
[10] "AC129492.6" "POTEC" "RP11-862L9.3" "AC018766.5" "RP11-506O24.1" "RP11-397O8.7" "RP11-54O7.11" "RP11-335O13.7" "RP11-392O17.1"
[19] "AC140481.2" "RP11-284H18.1" "RP11-370B11.3" "SLC17A8" "RP11-474D1.2" "GOLGA8H" "RP11-815J21.3" "CTD-2135D7.2" "RP11-388M20.6"
[28] "CTD-2034I21.2" "KRT31" "USH1G" "CTC-360G5.9" "TBL1Y" "RP11-143E21.6" "SERPINA10" "RP11-303E16.3" "RP11-849F2.5"
[37] "VCAN-AS1" "OPN4" "MS4A2" "LIMS3" "SYNE1-AS1" "RP11-881M11.4" "GCSAML-AS1" "LIMS3L" "FBXW12"
[46] "RP11-364P22.1" "ADAMTS19" "AC005276.1" "RP11-513D5.5" "RP11-68L18.1" "RP11-402G3.3" "PGA3" "PGA4" "RP11-582E3.2"
[55] "LINC00943" "AC073657.1" "RP11-773H22.4" "ANKRD30B" "RP11-103J8.2" "CTA-407F11.8" "ETNPPL" "RP11-1M18.1" "RP11-277P12.10"
[64] "AC105339.1" "DDX4" "CTD-2342N23.3" "RP11-684B21.1" "NDST4" "CCDC60" "U91319.1" "RGR" "AC108868.6"
[73] "RP11-480G7.1"
[[1]]$pop2
[1] "RP11-469N6.1" "GDF5" "NELL1"
[[1]]$pop3
[1] "RP3-398G3.5" "AC010091.1" "RP11-3B12.5" "RP11-78F17.1" "C20ORF135" "CTC-325J23.3" "DBH" "FOXE3" "FOXD4L1"
[10] "AC114730.8" "AC008697.1" "RP3-323N1.2" "RP11-142M10.2" "AC005616.2" "DCDC2B" "RP11-415J8.7" "LINC00326" "IL1RAPL2"
[19] "RP11-167N4.2" "RP11-114H23.1" "RP11-57A19.2" "C17orf98" "XX-CR54.3" "DLX2" "RP11-337N6.1" "RP11-416O18.1" "RP11-25H12.1"
[28] "RP11-269F21.3" "LINC00491" "CTB-43E15.3" "GABRR1" "H2BFWT" "TRPC5OS" "HTR2C" "RP11-642C5.1" "RP11-64P14.7"
[[2]]$pop1
[1] "CNGA3" "ITLN2" "RP11-400N13.1" "RP11-331F9.4" "GPR88" "LINC01037" "RP11-255M2.2" "LA16c-329F2.1" "RP11-154H12.2" "DUXA"
[11] "RP11-36B6.1" "RP11-12A16.3"
[[2]]$pop2
[1] "AC011893.3" "ISM1-AS1" "CA10" "RP11-301L8.2" "RP11-1250I15.3" "GABRG2" "NAMA" "CLEC1B" "RP11-458D21.5"
[10] "RGPD4" "SLITRK3" "RP3-495K2.2" "C11orf87" "RCVRN" "RP5-1112F19.2" "RP3-333A15.1" "RP5-836J3.1" "METTL11B"
[19] "AC112721.1" "RP11-761N21.1" "GRID2" "GML" "CLEC2A" "RP11-834C11.8" "RP11-406H23.2" "RP4-715N11.2" "RHD"
[28] "EYA1" "TAS2R19" "GABRA1" "SLC8A3" "RP3-510H16.3" "GRM7-AS3" "RP11-71H9.1" "PPEF2" "TULP1"
[37] "RP11-704J17.5" "RP11-10C8.2" "RP11-298H24.1" "RP11-263K4.3" "METTL21C" "AC012317.1" "CCDC42" "AC139100.3" "AF015262.2"
[[2]]$pop3
[1] "SYT10" "SPATA13-AS1" "AC064834.2" "CTD-2544H17.2" "AC106786.1" "RP11-25L3.3" "IMPG1" "DDX4" "RP11-50B3.4"
I have to find the intersections, i.e. the genes present in at least:
2 lists
3 lists
Thank you.
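A base R sketch, assuming BigList is a list of lists whose inner elements are named by population (pop1, pop2, ...) and hold character vectors of gene names. The toy data and the helper genes_in_at_least() are hypothetical; with the real data you would pass n = 15 (or 2, 3).

```r
# Hypothetical toy data with the same shape as BigList: list -> population -> genes
BigList <- list(
  list1 = list(pop1 = c("DDX4", "GDF5", "CA10"), pop2 = c("RHD", "EYA1")),
  list2 = list(pop1 = c("DDX4", "CA10"),         pop2 = c("RHD")),
  list3 = list(pop1 = c("DDX4", "NELL1"),        pop2 = c("EYA1", "RHD"))
)

# For one population, count in how many lists each gene appears and keep the
# genes seen in at least n lists. unique() stops a gene repeated within a
# single list from being counted twice.
genes_in_at_least <- function(big_list, pop, n) {
  per_list <- lapply(big_list, function(x) unique(x[[pop]]))
  counts <- table(unlist(per_list))
  names(counts)[counts >= n]
}

# Apply to every population name found in the first list
pops <- names(BigList[[1]])
result <- sapply(pops, function(p) genes_in_at_least(BigList, p, n = 2),
                 simplify = FALSE)
result
```

table() sorts gene names alphabetically, so each population's result comes back sorted rather than in list order.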

R - Operations over corresponding vector items in list

Let's say I have a list of vectors, like so:
[[1]]
[1] -0.36603596 -0.41461025 -0.68573296 -0.55516173 0.05071238 0.47723472 0.10851948
[8] 0.67005116 0.25519780 -0.79428716 0.16506077 0.81905548 0.22808934 -0.39257712
[15] 0.44778539 -0.36149934 -0.90142102 -0.99826169 0.24544167 -0.18989310 -0.67592344
[22] -0.65447808 0.26617179 -0.25020153 0.19562031 0.53520465 -0.47531100 -0.60152887
[29] 0.12012461 -0.68947499 -0.33258301 0.19914520 -0.70396942 0.21574644 -0.67197365
[36] -0.12744723 -0.07113916 0.44497439 0.07592963 -0.29082130 -0.27967624 0.28314801
[43] -0.09840383 -0.55582233 -0.29474315 -0.41717316 0.51017306 -0.31227399 0.39484400
[50] -0.88843530
[[2]]
[1] -0.14763873 -0.69009083 -0.55705599 -0.43779047 0.15626341 -0.00629513 -0.95227841
[8] 0.85645849 -0.40110676 -0.35732008 0.31375323 0.71478975 0.02262899 -0.12802829
[15] 0.58750725 -0.25629463 -0.65609956 -0.83185625 -0.35244759 -0.33287717 -0.99199682
[22] -0.45836093 -0.19431609 -0.41590652 1.06120542 0.20687783 0.13268137 -0.34219985
[29] -0.18096691 -0.24496102 -0.47769117 0.89134577 -0.56128402 0.70825268 0.10426368
[36] -0.13962506 -0.72478276 -0.40178315 0.65943132 -0.82083464 0.22569929 -1.02243310
[43] -0.70983610 -1.36733592 0.68807554 0.09156598 0.76850778 -0.64040433 0.79276407
[50] -0.40297792
[[3]]
[1] 0.34405450 -0.07928067 0.08353835 -0.37919066 -0.47233278 -0.38839824 -0.13269067
[8] 0.17348495 0.42777652 -0.19297300 -0.86438130 0.75787336 -0.34358747 0.47852682
[15] 1.29980892 -0.42527812 -0.25074922 -0.59565850 0.32800193 -0.56109570 -0.72905476
[22] -0.11498356 -0.29827083 -0.21653428 0.78533418 0.64735755 0.31889828 -0.37129803
[29] -0.51252162 0.24192268 -0.29281809 1.03299397 -0.11251429 0.13157698 -0.06404053
[36] 0.01904473 -0.13162565 0.30488937 0.31933970 0.14135025 -0.31501649 0.16738399
[43] -0.19627252 -1.29613018 -0.03572980 -0.72008672 0.13932428 -0.06117093 -0.62665670
[50] -0.12662761
[[4]]
[1] 0.183303468 0.160037845 -0.053473912 0.005199917 -0.126312554 0.116465956 -0.061730281
[8] 0.392903969 -0.008337453 -0.752631038 -0.235599857 0.999534398 0.375208363 0.201100799
[15] 0.444068886 -0.575795949 -0.873388633 -0.863612264 0.076050073 -0.188358603 -0.391865671
[22] -1.726690292 -1.206992567 -0.547175750 0.290255919 1.119834989 0.551360182 -0.510140345
[29] -0.460314706 -0.245835558 -0.315087602 0.947181076 -0.132550448 0.038419545 -0.017929636
[36] 0.041870497 -0.520961791 0.195326850 -0.117783785 -0.427426472 -0.119577158 0.702550914
[43] -0.045789957 -0.794299036 0.181420440 0.407347072 0.571894407 -0.217325835 0.280283391
[50] -0.492866084
[[5]]
[1] -0.40852268 -0.33488615 -0.30609700 -0.67467326 -0.11966383 1.01161858 -0.27108333
[8] 0.92772286 0.39047166 0.29019594 0.24404167 0.07824440 0.32786441 0.21657727
[15] 0.34362648 -0.44996166 -0.27823770 -1.24962127 -0.57241699 -0.30297804 -0.66728157
[22] 0.01783441 0.50773758 -0.31477033 -0.14581338 -0.13827194 -0.25574117 0.40049840
[29] 0.38634920 -0.29027963 -0.03381480 0.48510557 -0.61594522 1.09573928 -0.27992008
[36] -0.41523542 -0.24131548 0.43480320 0.32855110 0.48579320 0.47366867 0.62697303
[43] -0.57792202 -0.81951194 0.21583044 0.15593484 -0.10270703 -0.10206812 -0.25195873
[50] -0.89835763
I want to average corresponding vector items (e.g. [[1]][1], [[2]][1], [[3]][1], etc.) to produce a single vector of averaged values. For instance, the mean of the first item across all vectors in the list would be -0.07896788. What's the best way to go about this?
Let's say the list is called mylist:
mydf=as.data.frame(do.call("rbind",mylist))
colMeans(mydf)
would that be the desired output?
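A self-contained check, using only the first two values of each of the five vectors above as a small stand-in for mylist, and assuming all vectors have the same length. It compares the rbind()/colMeans() idea with an element-wise Reduce() alternative.

```r
# First two values of each vector from the question (stand-in for mylist)
mylist <- list(
  c(-0.36603596, -0.41461025),
  c(-0.14763873, -0.69009083),
  c( 0.34405450, -0.07928067),
  c( 0.183303468, 0.160037845),
  c(-0.40852268, -0.33488615)
)

# rbind() stacks the vectors as rows; colMeans() averages each position
means_rbind <- colMeans(do.call(rbind, mylist))

# Alternative: add the vectors element-wise, then divide by their count
means_reduce <- Reduce(`+`, mylist) / length(mylist)

means_rbind
```

The first value matches the -0.07896788 quoted in the question. colMeans() accepts the numeric matrix from do.call(rbind, ...) directly, so the as.data.frame() step in the answer is optional.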

R sort list of files numerically [duplicate]

This question already has answers here:
How to sort a character vector where elements contain letters and numbers?
(6 answers)
Closed 2 years ago.
I have a list of files that I need to sort numerically, such that I can import them in order
my code is:
bed = '/files/coverage_v2'
beds=list.files(path=bed, pattern='ctcf.motif.minus[0-9]+.bed.IGTB950.bed')
for(b in beds){
  print(b)
  # list.files() returns bare file names, so prepend the directory
  read.table(file.path(bed, b))
}
> [1] "ctcf.motif.minus1.bed.IGTB950.bed" "ctcf.motif.minus10.bed.IGTB950.bed"
[3] "ctcf.motif.minus100.bed.IGTB950.bed" "ctcf.motif.minus101.bed.IGTB950.bed"
[5] "ctcf.motif.minus102.bed.IGTB950.bed" "ctcf.motif.minus103.bed.IGTB950.bed"
[7] "ctcf.motif.minus104.bed.IGTB950.bed" "ctcf.motif.minus105.bed.IGTB950.bed"
[9] "ctcf.motif.minus106.bed.IGTB950.bed" "ctcf.motif.minus107.bed.IGTB950.bed"
[11] "ctcf.motif.minus108.bed.IGTB950.bed" "ctcf.motif.minus109.bed.IGTB950.bed"
[13] "ctcf.motif.minus11.bed.IGTB950.bed" "ctcf.motif.minus110.bed.IGTB950.bed"
[15] "ctcf.motif.minus111.bed.IGTB950.bed" "ctcf.motif.minus112.bed.IGTB950.bed"
[17] "ctcf.motif.minus113.bed.IGTB950.bed" "ctcf.motif.minus114.bed.IGTB950.bed"
[19] "ctcf.motif.minus115.bed.IGTB950.bed" "ctcf.motif.minus116.bed.IGTB950.bed"
[21] "ctcf.motif.minus117.bed.IGTB950.bed" "ctcf.motif.minus118.bed.IGTB950.bed"
[23] "ctcf.motif.minus119.bed.IGTB950.bed" "ctcf.motif.minus12.bed.IGTB950.bed"
[25] "ctcf.motif.minus120.bed.IGTB950.bed" "ctcf.motif.minus121.bed.IGTB950.bed"
[27] "ctcf.motif.minus122.bed.IGTB950.bed" "ctcf.motif.minus123.bed.IGTB950.bed"
[29] "ctcf.motif.minus124.bed.IGTB950.bed" "ctcf.motif.minus125.bed.IGTB950.bed"
[31] "ctcf.motif.minus126.bed.IGTB950.bed" "ctcf.motif.minus127.bed.IGTB950.bed"
[33] "ctcf.motif.minus128.bed.IGTB950.bed" "ctcf.motif.minus129.bed.IGTB950.bed"
[35] "ctcf.motif.minus13.bed.IGTB950.bed" "ctcf.motif.minus130.bed.IGTB950.bed"
[37] "ctcf.motif.minus131.bed.IGTB950.bed" "ctcf.motif.minus132.bed.IGTB950.bed"
[39] "ctcf.motif.minus133.bed.IGTB950.bed" "ctcf.motif.minus134.bed.IGTB950.bed"
But what I really want is for it to be sorted numerically:
> "ctcf.motif.minus1.bed.IGTB950.bed"
"ctcf.motif.minus10.bed.IGTB950.bed"
"ctcf.motif.minus11.bed.IGTB950.bed"
"ctcf.motif.minus12.bed.IGTB950.bed"
"ctcf.motif.minus13.bed.IGTB950.bed"
"ctcf.motif.minus100.bed.IGTB950.bed"
"ctcf.motif.minus101.bed.IGTB950.bed"
etc., so that they will be imported in numerical order.
Thanks in advance!!
You could try mixedsort from gtools
library(gtools)
beds1 <- mixedsort(beds)
head(beds1)
#[1]"ctcf.motif.minus1.bed.IGTB950.bed" "ctcf.motif.minus10.bed.IGTB950.bed"
#[3]"ctcf.motif.minus11.bed.IGTB950.bed" "ctcf.motif.minus12.bed.IGTB950.bed"
#[5]"ctcf.motif.minus13.bed.IGTB950.bed" "ctcf.motif.minus100.bed.IGTB950.bed"
Or, using a regex (assuming that the order depends on the numbers after 'minus' and before 'bed'):
beds[order(as.numeric(gsub('\\D+|\\.bed.*', '', beds)))]
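To try the regex idea without the real directory, here is a sketch on a small hypothetical vector of file names (minus2 is included to show single digits sorting before 10). It uses sub() with a capture group to pull out the number between 'minus' and '.bed', a slightly more explicit variant of the gsub() line above rather than the answer's exact code.

```r
# Hypothetical file names in the same pattern as the question's
beds <- c("ctcf.motif.minus1.bed.IGTB950.bed",
          "ctcf.motif.minus10.bed.IGTB950.bed",
          "ctcf.motif.minus100.bed.IGTB950.bed",
          "ctcf.motif.minus2.bed.IGTB950.bed",
          "ctcf.motif.minus11.bed.IGTB950.bed")

# Capture the digits after "minus" and before ".bed", then sort on them as numbers
num <- as.numeric(sub("^.*minus([0-9]+)\\.bed.*$", "\\1", beds))
beds_sorted <- beds[order(num)]
beds_sorted
```

The sorted vector can then replace beds in the question's loop.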
