I have a list of 20 lists (each list contains the genes of 20 populations).
For each population, I have to find the genes that are present in at least 15 of the lists.
I'm using R.
Any help, please?
Example:
BigList$list1$pop1
BigList$list1$pop2
BigList$list1$pop3
BigList$list2$pop1
BigList$list2$pop2
BigList$list2$pop3
BigList$list3$pop1
BigList$list3$pop2
BigList$list3$pop3
My list looks like this:
[[1]]$pop1
[1] "CFC1" "ZNF536" "TRIM67" "AC092431.3" "RP11-572M11.4" "HCG23" "AC006372.4" "RP11-6O2.4" "CACNG3"
[10] "AC129492.6" "POTEC" "RP11-862L9.3" "AC018766.5" "RP11-506O24.1" "RP11-397O8.7" "RP11-54O7.11" "RP11-335O13.7" "RP11-392O17.1"
[19] "AC140481.2" "RP11-284H18.1" "RP11-370B11.3" "SLC17A8" "RP11-474D1.2" "GOLGA8H" "RP11-815J21.3" "CTD-2135D7.2" "RP11-388M20.6"
[28] "CTD-2034I21.2" "KRT31" "USH1G" "CTC-360G5.9" "TBL1Y" "RP11-143E21.6" "SERPINA10" "RP11-303E16.3" "RP11-849F2.5"
[37] "VCAN-AS1" "OPN4" "MS4A2" "LIMS3" "SYNE1-AS1" "RP11-881M11.4" "GCSAML-AS1" "LIMS3L" "FBXW12"
[46] "RP11-364P22.1" "ADAMTS19" "AC005276.1" "RP11-513D5.5" "RP11-68L18.1" "RP11-402G3.3" "PGA3" "PGA4" "RP11-582E3.2"
[55] "LINC00943" "AC073657.1" "RP11-773H22.4" "ANKRD30B" "RP11-103J8.2" "CTA-407F11.8" "ETNPPL" "RP11-1M18.1" "RP11-277P12.10"
[64] "AC105339.1" "DDX4" "CTD-2342N23.3" "RP11-684B21.1" "NDST4" "CCDC60" "U91319.1" "RGR" "AC108868.6"
[73] "RP11-480G7.1"
[[1]]$pop2
[1] "RP11-469N6.1" "GDF5" "NELL1"
[[1]]$pop3
[1] "RP3-398G3.5" "AC010091.1" "RP11-3B12.5" "RP11-78F17.1" "C20ORF135" "CTC-325J23.3" "DBH" "FOXE3" "FOXD4L1"
[10] "AC114730.8" "AC008697.1" "RP3-323N1.2" "RP11-142M10.2" "AC005616.2" "DCDC2B" "RP11-415J8.7" "LINC00326" "IL1RAPL2"
[19] "RP11-167N4.2" "RP11-114H23.1" "RP11-57A19.2" "C17orf98" "XX-CR54.3" "DLX2" "RP11-337N6.1" "RP11-416O18.1" "RP11-25H12.1"
[28] "RP11-269F21.3" "LINC00491" "CTB-43E15.3" "GABRR1" "H2BFWT" "TRPC5OS" "HTR2C" "RP11-642C5.1" "RP11-64P14.7"
[[2]]$pop1
[1] "CNGA3" "ITLN2" "RP11-400N13.1" "RP11-331F9.4" "GPR88" "LINC01037" "RP11-255M2.2" "LA16c-329F2.1" "RP11-154H12.2" "DUXA"
[11] "RP11-36B6.1" "RP11-12A16.3"
[[2]]$pop2
[1] "AC011893.3" "ISM1-AS1" "CA10" "RP11-301L8.2" "RP11-1250I15.3" "GABRG2" "NAMA" "CLEC1B" "RP11-458D21.5"
[10] "RGPD4" "SLITRK3" "RP3-495K2.2" "C11orf87" "RCVRN" "RP5-1112F19.2" "RP3-333A15.1" "RP5-836J3.1" "METTL11B"
[19] "AC112721.1" "RP11-761N21.1" "GRID2" "GML" "CLEC2A" "RP11-834C11.8" "RP11-406H23.2" "RP4-715N11.2" "RHD"
[28] "EYA1" "TAS2R19" "GABRA1" "SLC8A3" "RP3-510H16.3" "GRM7-AS3" "RP11-71H9.1" "PPEF2" "TULP1"
[37] "RP11-704J17.5" "RP11-10C8.2" "RP11-298H24.1" "RP11-263K4.3" "METTL21C" "AC012317.1" "CCDC42" "AC139100.3" "AF015262.2"
[[2]]$pop3
[1] "SYT10" "SPATA13-AS1" "AC064834.2" "CTD-2544H17.2" "AC106786.1" "RP11-25L3.3" "IMPG1" "DDX4" "RP11-50B3.4"
I have to find the intersections, i.e. for each population the list of genes present in at least:
2 lists
3 lists
and so on.
Thank you.
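One possible approach, sketched under the assumption that BigList is a list of lists whose inner elements are character vectors named by population (as in the example above). The toy data and the helper name genes_in_at_least are hypothetical, introduced only for illustration. For each population, every gene is counted once per list in which it occurs, and genes whose count reaches the threshold k are kept.

```r
# Hypothetical toy data with the same shape as BigList in the question
BigList <- list(
  list1 = list(pop1 = c("A", "B", "C"), pop2 = c("X", "Y")),
  list2 = list(pop1 = c("A", "B"),      pop2 = c("Y", "Z")),
  list3 = list(pop1 = c("A", "D"),      pop2 = c("Y", "X"))
)

# Hypothetical helper: genes found in at least k lists, per population
genes_in_at_least <- function(big, k) {
  pops <- unique(unlist(lapply(big, names)))
  out <- lapply(pops, function(p) {
    # unique() per list so a gene repeated inside one list counts once
    per_list <- lapply(big, function(l) unique(l[[p]]))
    counts <- table(unlist(per_list))
    names(counts)[counts >= k]
  })
  setNames(out, pops)
}

res2 <- genes_in_at_least(BigList, 2)  # genes in at least 2 lists
res3 <- genes_in_at_least(BigList, 3)  # genes in at least 3 lists
```

With the real data, the threshold from the question would be genes_in_at_least(BigList, 15).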
Hello, I have a set of daily meteorological data. Using the expression:
f <- list.files(getwd(), include.dirs=TRUE, recursive=TRUE, pattern= "PREC")
I select only the precipitation files.
I wonder how to select only the files for one month, for example January: a file named 20170103 (yyyymmdd), i.e. every file named yyyy01dd.
The files are named like this: "PREC_20010120.grd".
Try pattern='PREC_\\d{4}01\\d[2].*'.
PREC_ literally
\\d{4} four digits
01 "01" literally
\\d{2} two digits
.* any character repeatedly
Thank you, but I retrieved only 35 items instead of 31 days * 10 years. What's wrong?
[1] "20100102/PREC_20100102.tif" "20100112/PREC_20100112.tif"
[3] "20100122/PREC_20100122.tif" "20110102/PREC_20110102.tif"
[5] "20110112/PREC_20110112.tif" "20110122/PREC_20110122.tif"
[7] "20120102/PREC_20120102.tif" "20120112/PREC_20120112.tif"
[9] "20120122/PREC_20120122.tif" "20130102/PREC_20130102.tif"
[11] "20130112/PREC_20130112.tif" "20130122/PREC_20130122.tif"
[13] "20140102/PREC_20140102.tif" "20140112/PREC_20140112.tif"
[15] "20140122/PREC_20140122.tif" "20150102/PREC_20150102.tif"
[17] "20150112/PREC_20150112.tif" "20150122/PREC_20150122.tif"
[19] "20160102/PREC_20160102.tif" "20160112/PREC_20160112.tif"
[21] "20160122/PREC_20160122.tif" "20170102/PREC_20170102.tif"
[23] "20170112/PREC_20170112.tif" "20170122/PREC_20170122.tif"
[25] "20180102/PREC_20180102.tif" "20180112/PREC_20180112.tif"
[27] "20180122/PREC_20180122.tif" "20190102/PREC_20190102.tif"
[29] "20190112/PREC_20190112.tif" "20190122/PREC_20190122.tif"
[31] "20200102/PREC_20200102.tif" "20200112/PREC_20200112.tif"
[33] "20200122/PREC_20200122.tif" "20210102/PREC_20210102.tif"
[35] "20210112/PREC_20210112.tif" "20210122/PREC_20210122.tif"
Resolved with:
f <- list.files(getwd(), include.dirs=TRUE, recursive=TRUE, pattern='PREC_\\d{4}01.*')
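A likely explanation for the short listing above: every returned file falls on day 02, 12 or 22, which is what the suggested pattern produces, because \\d[2] means "one digit followed by a literal 2" rather than "two digits" (\\d{2}). The sketch below compares the two on hypothetical file names made up for illustration; only the PREC_yyyymmdd naming scheme comes from the question.

```r
# Hypothetical file names following the PREC_yyyymmdd scheme
files <- c("PREC_20100101.tif", "PREC_20100102.tif", "PREC_20100112.tif",
           "PREC_20100131.tif", "PREC_20100201.tif", "PREC_20110115.tif")

# \\d[2]: one digit, then the character "2" -> only days 02, 12, 22
typo <- grepl("PREC_\\d{4}01\\d[2].*", files)

# \\d{2}: exactly two digits -> every January day
fixed <- grepl("PREC_\\d{4}01\\d{2}", files)

files[typo]
files[fixed]
```

The pattern 'PREC_\\d{4}01.*' used in the resolution also works, since it no longer constrains the day digits at all.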
Let's say I have a list of vectors, like so:
[[1]]
[1] -0.36603596 -0.41461025 -0.68573296 -0.55516173 0.05071238 0.47723472 0.10851948
[8] 0.67005116 0.25519780 -0.79428716 0.16506077 0.81905548 0.22808934 -0.39257712
[15] 0.44778539 -0.36149934 -0.90142102 -0.99826169 0.24544167 -0.18989310 -0.67592344
[22] -0.65447808 0.26617179 -0.25020153 0.19562031 0.53520465 -0.47531100 -0.60152887
[29] 0.12012461 -0.68947499 -0.33258301 0.19914520 -0.70396942 0.21574644 -0.67197365
[36] -0.12744723 -0.07113916 0.44497439 0.07592963 -0.29082130 -0.27967624 0.28314801
[43] -0.09840383 -0.55582233 -0.29474315 -0.41717316 0.51017306 -0.31227399 0.39484400
[50] -0.88843530
[[2]]
[1] -0.14763873 -0.69009083 -0.55705599 -0.43779047 0.15626341 -0.00629513 -0.95227841
[8] 0.85645849 -0.40110676 -0.35732008 0.31375323 0.71478975 0.02262899 -0.12802829
[15] 0.58750725 -0.25629463 -0.65609956 -0.83185625 -0.35244759 -0.33287717 -0.99199682
[22] -0.45836093 -0.19431609 -0.41590652 1.06120542 0.20687783 0.13268137 -0.34219985
[29] -0.18096691 -0.24496102 -0.47769117 0.89134577 -0.56128402 0.70825268 0.10426368
[36] -0.13962506 -0.72478276 -0.40178315 0.65943132 -0.82083464 0.22569929 -1.02243310
[43] -0.70983610 -1.36733592 0.68807554 0.09156598 0.76850778 -0.64040433 0.79276407
[50] -0.40297792
[[3]]
[1] 0.34405450 -0.07928067 0.08353835 -0.37919066 -0.47233278 -0.38839824 -0.13269067
[8] 0.17348495 0.42777652 -0.19297300 -0.86438130 0.75787336 -0.34358747 0.47852682
[15] 1.29980892 -0.42527812 -0.25074922 -0.59565850 0.32800193 -0.56109570 -0.72905476
[22] -0.11498356 -0.29827083 -0.21653428 0.78533418 0.64735755 0.31889828 -0.37129803
[29] -0.51252162 0.24192268 -0.29281809 1.03299397 -0.11251429 0.13157698 -0.06404053
[36] 0.01904473 -0.13162565 0.30488937 0.31933970 0.14135025 -0.31501649 0.16738399
[43] -0.19627252 -1.29613018 -0.03572980 -0.72008672 0.13932428 -0.06117093 -0.62665670
[50] -0.12662761
[[4]]
[1] 0.183303468 0.160037845 -0.053473912 0.005199917 -0.126312554 0.116465956 -0.061730281
[8] 0.392903969 -0.008337453 -0.752631038 -0.235599857 0.999534398 0.375208363 0.201100799
[15] 0.444068886 -0.575795949 -0.873388633 -0.863612264 0.076050073 -0.188358603 -0.391865671
[22] -1.726690292 -1.206992567 -0.547175750 0.290255919 1.119834989 0.551360182 -0.510140345
[29] -0.460314706 -0.245835558 -0.315087602 0.947181076 -0.132550448 0.038419545 -0.017929636
[36] 0.041870497 -0.520961791 0.195326850 -0.117783785 -0.427426472 -0.119577158 0.702550914
[43] -0.045789957 -0.794299036 0.181420440 0.407347072 0.571894407 -0.217325835 0.280283391
[50] -0.492866084
[[5]]
[1] -0.40852268 -0.33488615 -0.30609700 -0.67467326 -0.11966383 1.01161858 -0.27108333
[8] 0.92772286 0.39047166 0.29019594 0.24404167 0.07824440 0.32786441 0.21657727
[15] 0.34362648 -0.44996166 -0.27823770 -1.24962127 -0.57241699 -0.30297804 -0.66728157
[22] 0.01783441 0.50773758 -0.31477033 -0.14581338 -0.13827194 -0.25574117 0.40049840
[29] 0.38634920 -0.29027963 -0.03381480 0.48510557 -0.61594522 1.09573928 -0.27992008
[36] -0.41523542 -0.24131548 0.43480320 0.32855110 0.48579320 0.47366867 0.62697303
[43] -0.57792202 -0.81951194 0.21583044 0.15593484 -0.10270703 -0.10206812 -0.25195873
[50] -0.89835763
I want to average corresponding vector items across the list (e.g. [[1]][1], [[2]][1], [[3]][1], etc.) to get a single vector of averaged values. For instance, the mean of the first item of every vector in the list would be -0.07896788. What's the best way to go about this?
Let's say the list is called mylist:
mydf <- as.data.frame(do.call("rbind", mylist))
colMeans(mydf)
Would that be the desired output?
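A minimal sketch of the same idea on a small hypothetical list (the values are made up so the result is easy to check by hand). It assumes every vector has the same length. The data frame conversion is optional, since colMeans also accepts the matrix that rbind returns, and Reduce gives an equivalent element-wise version.

```r
# Hypothetical list of equal-length numeric vectors
mylist <- list(c(1, 2, 3), c(3, 4, 5), c(5, 6, 7))

# Stack the vectors as rows of a matrix, then average each column
m <- do.call(rbind, mylist)
avg <- colMeans(m)

# Equivalent: add the vectors element-wise, then divide by their count
avg2 <- Reduce(`+`, mylist) / length(mylist)

avg
```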
I have built a model using AdaBoost. When I give one row as input, this is the output I get. I was expecting to get just one number as the prediction.
> predict(Model,testset[1,],type="prob")[,2]
[1] 0.5159268 0.5143351 0.5135043 0.5127763 0.5116162 0.5097892 0.5098299 0.5098701
[9] 0.5083176 0.5088486 0.5073487 0.5082424 0.5078101 0.5073640 0.5053638 0.5066038
[17] 0.5063418 0.5055067 0.5060952 0.5051869 0.5050157 0.5038692 0.5040837 0.5052188
[25] 0.5040825 0.5046496 0.5050795 0.5042205 0.4976465 0.5046798 0.5047607 0.4957011
[33] 0.5048601 0.5039299 0.5032739 0.5042044 0.5044005 0.5044902 0.5037352 0.4981865
[41] 0.5021579 0.5038746 0.5043289 0.5032334 0.5051926 0.5021917 0.5015447 0.5029390
[49] 0.4951465 0.5033675
> predict(Model,testset[2,],type="prob")[,2]
[1] 0.5159268 0.5143351 0.5135043 0.5127763 0.5116162 0.5097892 0.5098299 0.5098701
[9] 0.5083176 0.5088486 0.5073487 0.5082424 0.4921899 0.5073640 0.5053638 0.5066038
[17] 0.5063418 0.5055067 0.5060952 0.5051869 0.5050157 0.5038692 0.5040837 0.5052188
[25] 0.5040825 0.5046496 0.5050795 0.5042205 0.5023535 0.4953202 0.5047607 0.5042989
[33] 0.4951399 0.5039299 0.4967261 0.5042044 0.5044005 0.4955098 0.5037352 0.5018135
[41] 0.5021579 0.5038746 0.5043289 0.5032334 0.4948074 0.5021917 0.4984553 0.5029390
[49] 0.4951465 0.5033675
If I give, say, 5 rows as input, I get 5 predictions, as expected.
> predict(Model,testset[1:5,],type="prob")[,2]
[1] 0.7470780 0.7101257 0.4795726 0.7451049 0.5607364
Why is the first command giving me 50 predictions when I'm giving just one row as input?
I have two almost identical data.frames, and I want to find the unique column name that is added to the x.2 object.
> colnames(x.1)
[1] "listPrice" "rent" "floor" "livingArea"
[5] "rooms" "published" "constructionYear" "objectType"
[9] "booliId" "soldDate" "soldPrice" "url"
[13] "additionalArea" "isNewConstruction" "location.namedAreas" "location.address.streetAddress"
[17] "location.address.city" "location.position.latitude" "location.position.longitude" "location.region.municipalityName"
[21] "location.region.countyName" "location.distance.ocean" "source.name" "source.id"
[25] "source.type" "source.url" "areaSize" "priceDiff"
[29] "perc.priceDiff" "sqrmPrice"
> colnames(x.2)
[1] "listPrice" "livingArea" "additionalArea" "plotArea"
[5] "rooms" "published" "constructionYear" "objectType"
[9] "booliId" "soldDate" "soldPrice" "url"
[13] "isNewConstruction" "floor" "rent" "location.namedAreas"
[17] "location.address.streetAddress" "location.address.city" "location.position.latitude" "location.position.longitude"
[21] "location.region.municipalityName" "location.region.countyName" "location.distance.ocean" "source.name"
[25] "source.id" "source.type" "source.url" "areaSize"
[29] "priceDiff" "perc.priceDiff" "sqrmPrice"
You can use setdiff to get the column names that are in 'x.2' and not in 'x.1':
setdiff(colnames(x.2), colnames(x.1))
Or try:
colnames(x.2)[!colnames(x.2) %in% colnames(x.1)]
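Both suggestions can be checked on short stand-in vectors. In the sketch below, cols1 and cols2 are hypothetical names holding a subset of the column names printed in the question in place of colnames(x.1) and colnames(x.2). Note that setdiff is not symmetric, so the argument order decides which side's extra names are returned.

```r
# Stand-ins for colnames(x.1) and colnames(x.2), shortened for illustration
cols1 <- c("listPrice", "rent", "floor", "livingArea")
cols2 <- c("listPrice", "livingArea", "plotArea", "floor", "rent")

# Names present in cols2 but missing from cols1
added <- setdiff(cols2, cols1)

# Reversed order: names present in cols1 but missing from cols2
removed <- setdiff(cols1, cols2)

# Same result as the first call, written with %in%
added_in <- cols2[!cols2 %in% cols1]

added
```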
I have a list of files that I need to sort numerically, so that I can import them in order.
My code is:
bed <- '/files/coverage_v2'
beds <- list.files(path=bed, pattern='ctcf.motif.minus[0-9]+.bed.IGTB950.bed')
for(b in beds){
  print(b)
  # list.files() returns bare file names here, so prepend the directory
  read.table(file.path(bed, b))
}
> [1] "ctcf.motif.minus1.bed.IGTB950.bed" "ctcf.motif.minus10.bed.IGTB950.bed"
[3] "ctcf.motif.minus100.bed.IGTB950.bed" "ctcf.motif.minus101.bed.IGTB950.bed"
[5] "ctcf.motif.minus102.bed.IGTB950.bed" "ctcf.motif.minus103.bed.IGTB950.bed"
[7] "ctcf.motif.minus104.bed.IGTB950.bed" "ctcf.motif.minus105.bed.IGTB950.bed"
[9] "ctcf.motif.minus106.bed.IGTB950.bed" "ctcf.motif.minus107.bed.IGTB950.bed"
[11] "ctcf.motif.minus108.bed.IGTB950.bed" "ctcf.motif.minus109.bed.IGTB950.bed"
[13] "ctcf.motif.minus11.bed.IGTB950.bed" "ctcf.motif.minus110.bed.IGTB950.bed"
[15] "ctcf.motif.minus111.bed.IGTB950.bed" "ctcf.motif.minus112.bed.IGTB950.bed"
[17] "ctcf.motif.minus113.bed.IGTB950.bed" "ctcf.motif.minus114.bed.IGTB950.bed"
[19] "ctcf.motif.minus115.bed.IGTB950.bed" "ctcf.motif.minus116.bed.IGTB950.bed"
[21] "ctcf.motif.minus117.bed.IGTB950.bed" "ctcf.motif.minus118.bed.IGTB950.bed"
[23] "ctcf.motif.minus119.bed.IGTB950.bed" "ctcf.motif.minus12.bed.IGTB950.bed"
[25] "ctcf.motif.minus120.bed.IGTB950.bed" "ctcf.motif.minus121.bed.IGTB950.bed"
[27] "ctcf.motif.minus122.bed.IGTB950.bed" "ctcf.motif.minus123.bed.IGTB950.bed"
[29] "ctcf.motif.minus124.bed.IGTB950.bed" "ctcf.motif.minus125.bed.IGTB950.bed"
[31] "ctcf.motif.minus126.bed.IGTB950.bed" "ctcf.motif.minus127.bed.IGTB950.bed"
[33] "ctcf.motif.minus128.bed.IGTB950.bed" "ctcf.motif.minus129.bed.IGTB950.bed"
[35] "ctcf.motif.minus13.bed.IGTB950.bed" "ctcf.motif.minus130.bed.IGTB950.bed"
[37] "ctcf.motif.minus131.bed.IGTB950.bed" "ctcf.motif.minus132.bed.IGTB950.bed"
[39] "ctcf.motif.minus133.bed.IGTB950.bed" "ctcf.motif.minus134.bed.IGTB950.bed"
But what I really want is for it to be sorted numerically:
> "ctcf.motif.minus1.bed.IGTB950.bed"
"ctcf.motif.minus10.bed.IGTB950.bed"
"ctcf.motif.minus11.bed.IGTB950.bed"
"ctcf.motif.minus12.bed.IGTB950.bed"
"ctcf.motif.minus13.bed.IGTB950.bed"
"ctcf.motif.minus100.bed.IGTB950.bed"
"ctcf.motif.minus101.bed.IGTB950.bed"
etc., so that the files are imported in numeric order.
Thanks in advance!!
You could try mixedsort from gtools
library(gtools)
beds1 <- mixedsort(beds)
head(beds1)
#[1] "ctcf.motif.minus1.bed.IGTB950.bed"  "ctcf.motif.minus10.bed.IGTB950.bed"
#[3] "ctcf.motif.minus11.bed.IGTB950.bed" "ctcf.motif.minus12.bed.IGTB950.bed"
#[5] "ctcf.motif.minus13.bed.IGTB950.bed" "ctcf.motif.minus100.bed.IGTB950.bed"
Or using regex (assuming that the order depends on the numbers after 'minus' and before 'bed'):
beds[order(as.numeric(gsub('\\D+|\\.bed.*', '', beds)))]
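A variant of the regex approach, sketched with a capture group so that only the digits between "minus" and ".bed" are extracted, whichever regex engine is in use. The file names are a small subset of the listing in the question; num and sorted are hypothetical variable names.

```r
# A few file names from the question, in the order list.files() returned them
beds <- c("ctcf.motif.minus1.bed.IGTB950.bed",
          "ctcf.motif.minus10.bed.IGTB950.bed",
          "ctcf.motif.minus100.bed.IGTB950.bed",
          "ctcf.motif.minus11.bed.IGTB950.bed")

# Keep only the digits captured between "minus" and ".bed"
num <- as.numeric(sub("^.*minus([0-9]+)\\.bed.*$", "\\1", beds))

# Reorder the file names by that number
sorted <- beds[order(num)]

sorted
```

The sorted vector can then be looped over in place of beds when reading the files.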