relevant_ods_reordered <- relevant_ods[names(cpm)]
The above seeks to reorder the columns of a data frame relevant_ods:
Plate1_DMSO_A01 Plate1_DMSO_B01 Plate1_DMSO_C01 Plate1_Lopinavir_D01
OD595 0.431 0.4495 0.4993 0.5785
Plate1_DMSO_E01 Plate1_DMSO_F01 Plate1_DMSO_G01 Plate1_DMSO_H01
OD595 0.5336 0.5133 0.527 0.5413
Plate1_DMSO_C12 Plate1_DMSO_D12 Plate1_Lopinavir_E12 Plate1_DMSO_F12
OD595 0.4137 0.4274 0.5241 0.4264
Plate1_DMSO_G12 Plate1_DMSO_H12
OD595 0.4561 0.4767
to match the order of the columns in a significantly larger dataframe:
[1] "Plate1_DMSO_A01" "Plate1_DMSO_A12"
[3] "Plate1_DMSO_B01" "Plate1_DMSO_B12"
[5] "Plate1_DMSO_C01" "Plate1_DMSO_C12"
[7] "Plate1_DMSO_D12" "Plate1_DMSO_E01"
[9] "Plate1_DMSO_F01" "Plate1_DMSO_F12"
[11] "Plate1_DMSO_G01" "Plate1_DMSO_G12"
[13] "Plate1_DMSO_H01" "Plate1_DMSO_H12"
[15] "Plate1_Lopinavir_D01" "Plate1_Lopinavir_E12"
[17] "Plate1_NS1519_22009_A02" "Plate1_NS1519_22009_A04"
[19] "Plate1_NS1519_22009_A05" "Plate1_NS1519_22009_A06"
[21] "Plate1_NS1519_22009_A07" "Plate1_NS1519_22009_A08"
[23] "Plate1_NS1519_22009_A09" "Plate1_NS1519_22009_A10"
[25] "Plate1_NS1519_22009_A11" "Plate1_NS1519_22009_B02"
[27] "Plate1_NS1519_22009_B03" "Plate1_NS1519_22009_B04"
[29] "Plate1_NS1519_22009_B05" "Plate1_NS1519_22009_B06"
etc.
Clearly, this returns
Error in `[.data.frame`(relevant_ods, names(cpm)) :
  undefined columns selected
because cpm has columns that do not exist in relevant_ods.
I have tried
relevant_ods_reordered <- relevant_ods[names(cpm),]
relevant_ods_reordered <- select(relevant_ods, names(cpm))
relevant_ods_reordered <- match(relevant_ods, names(cpm))
With base R, you need to find the names in common. intersect is good for this and preserves the order of its first argument:
relevant_ods[intersect(names(cpm), names(relevant_ods))]
Or with dplyr, use the select helper any_of:
select(relevant_ods, any_of(names(cpm)))
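For example, a minimal sketch of both approaches, using a made-up vector cpm_names in place of names(cpm) (the column names are invented to mirror the question):

# Toy data: relevant_ods holds a subset of the columns named in cpm_names
relevant_ods <- data.frame(Plate1_DMSO_A01 = 0.431,
                           Plate1_DMSO_B01 = 0.4495,
                           Plate1_Lopinavir_D01 = 0.5785,
                           row.names = "OD595")
cpm_names <- c("Plate1_DMSO_A01", "Plate1_DMSO_A12",
               "Plate1_DMSO_B01", "Plate1_Lopinavir_D01")
# Base R: keep only the shared columns, in the order they appear in cpm_names
relevant_ods[intersect(cpm_names, names(relevant_ods))]
# dplyr: any_of() silently skips names missing from relevant_ods
dplyr::select(relevant_ods, dplyr::any_of(cpm_names))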
I want to rename these columns in R by removing the X from each of them, so that only the numbers remain, representing years from 1960 to 2020. The first two (Country Name and Country Code) are sorted out already.
[1] "ï..Country.Name" "Country.Code" "X1960" "X1961" "X1962"
[6] "X1963" "X1964" "X1965" "X1966" "X1967"
[11] "X1968" "X1969" "X1970" "X1971" "X1972"
[16] "X1973" "X1974" "X1975" "X1976" "X1977"
[21] "X1978" "X1979" "X1980" "X1981" "X1982"
[26] "X1983" "X1984" "X1985" "X1986" "X1987"
[31] "X1988" "X1989" "X1990" "X1991" "X1992"
[36] "X1993" "X1994" "X1995" "X1996" "X1997"
[41] "X1998" "X1999" "X2000" "X2001" "X2002"
[46] "X2003" "X2004" "X2005" "X2006" "X2007"
[51] "X2008" "X2009" "X2010" "X2011" "X2012"
[56] "X2013" "X2014" "X2015" "X2016" "X2017"
[61] "X2018" "X2019" "X2020"
names(df) <- gsub("^X", "", names(df))
gsub() matches a regular expression and replaces every occurrence. Here the regex ^X matches an X only at the beginning of the string, so names like Country.Code are left untouched.
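As a quick check on a hypothetical subset of the names above (nms stands in for names(df)):

nms <- c("Country.Name", "Country.Code", "X1960", "X1961", "X2020")
gsub("^X", "", nms)
# [1] "Country.Name" "Country.Code" "1960" "1961" "2020"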
Hello, I have a set of daily meteo data. Using the expression
f <- list.files(getwd(), include.dirs=TRUE, recursive=TRUE, pattern= "PREC")
I select only the precipitation files.
I wonder how to select only the files for, say, January, i.e. those named like 20170103 (yyyymmdd), so the ones named yyyy01dd.
The files are named in this way: "PREC_20010120.grd".
Try pattern='PREC_\\d{4}01\\d[2].*'.
PREC_ literally
\\d{4} four digits
01 "01" literally
\\d{2} two digits
.* any character repeatedly
Thank you, but I retrieved only 35 items instead of 31 days * 10 years. What's wrong?
[1] "20100102/PREC_20100102.tif" "20100112/PREC_20100112.tif"
[3] "20100122/PREC_20100122.tif" "20110102/PREC_20110102.tif"
[5] "20110112/PREC_20110112.tif" "20110122/PREC_20110122.tif"
[7] "20120102/PREC_20120102.tif" "20120112/PREC_20120112.tif"
[9] "20120122/PREC_20120122.tif" "20130102/PREC_20130102.tif"
[11] "20130112/PREC_20130112.tif" "20130122/PREC_20130122.tif"
[13] "20140102/PREC_20140102.tif" "20140112/PREC_20140112.tif"
[15] "20140122/PREC_20140122.tif" "20150102/PREC_20150102.tif"
[17] "20150112/PREC_20150112.tif" "20150122/PREC_20150122.tif"
[19] "20160102/PREC_20160102.tif" "20160112/PREC_20160112.tif"
[21] "20160122/PREC_20160122.tif" "20170102/PREC_20170102.tif"
[23] "20170112/PREC_20170112.tif" "20170122/PREC_20170122.tif"
[25] "20180102/PREC_20180102.tif" "20180112/PREC_20180112.tif"
[27] "20180122/PREC_20180122.tif" "20190102/PREC_20190102.tif"
[29] "20190112/PREC_20190112.tif" "20190122/PREC_20190122.tif"
[31] "20200102/PREC_20200102.tif" "20200112/PREC_20200112.tif"
[33] "20200122/PREC_20200122.tif" "20210102/PREC_20210102.tif"
[35] "20210112/PREC_20210112.tif" "20210122/PREC_20210122.tif"
Resolved with:
f <- list.files(getwd(), include.dirs=TRUE, recursive=TRUE, pattern='PREC_\\d{4}01.*')
(The suggested pattern contained \\d[2] rather than \\d{2}; [2] is a character class matching a literal "2", so only days 02, 12 and 22 were picked up.)
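If you would rather not encode the month in the regex at all, a more defensive sketch (assuming the PREC_yyyymmdd naming shown above) is to list every precipitation file and filter on the parsed month afterwards:

# List every precipitation file, then keep only those whose month is "01"
f <- list.files(getwd(), include.dirs=TRUE, recursive=TRUE, pattern="PREC_\\d{8}")
dates <- sub(".*PREC_(\\d{8}).*", "\\1", f)   # extract the yyyymmdd part
f_jan <- f[substr(dates, 5, 6) == "01"]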
I have a list of 20 lists (each list contains genes of 20 populations).
I have to find the genes (for each population) that are present in at least 15 lists.
I'm using R.
Any help, please?
example:
BigList$list1$pop1
BigList$list1$pop2
BigList$list1$pop3
BigList$list2$pop1
BigList$list2$pop2
BigList$list2$pop3
BigList$list3$pop1
BigList$list3$pop2
BigList$list3$pop3
My list looks like:
[[1]]$pop1
[1] "CFC1" "ZNF536" "TRIM67" "AC092431.3" "RP11-572M11.4" "HCG23" "AC006372.4" "RP11-6O2.4" "CACNG3"
[10] "AC129492.6" "POTEC" "RP11-862L9.3" "AC018766.5" "RP11-506O24.1" "RP11-397O8.7" "RP11-54O7.11" "RP11-335O13.7" "RP11-392O17.1"
[19] "AC140481.2" "RP11-284H18.1" "RP11-370B11.3" "SLC17A8" "RP11-474D1.2" "GOLGA8H" "RP11-815J21.3" "CTD-2135D7.2" "RP11-388M20.6"
[28] "CTD-2034I21.2" "KRT31" "USH1G" "CTC-360G5.9" "TBL1Y" "RP11-143E21.6" "SERPINA10" "RP11-303E16.3" "RP11-849F2.5"
[37] "VCAN-AS1" "OPN4" "MS4A2" "LIMS3" "SYNE1-AS1" "RP11-881M11.4" "GCSAML-AS1" "LIMS3L" "FBXW12"
[46] "RP11-364P22.1" "ADAMTS19" "AC005276.1" "RP11-513D5.5" "RP11-68L18.1" "RP11-402G3.3" "PGA3" "PGA4" "RP11-582E3.2"
[55] "LINC00943" "AC073657.1" "RP11-773H22.4" "ANKRD30B" "RP11-103J8.2" "CTA-407F11.8" "ETNPPL" "RP11-1M18.1" "RP11-277P12.10"
[64] "AC105339.1" "DDX4" "CTD-2342N23.3" "RP11-684B21.1" "NDST4" "CCDC60" "U91319.1" "RGR" "AC108868.6"
[73] "RP11-480G7.1"
[[1]]$pop2
[1] "RP11-469N6.1" "GDF5" "NELL1"
[[1]]$pop3
[1] "RP3-398G3.5" "AC010091.1" "RP11-3B12.5" "RP11-78F17.1" "C20ORF135" "CTC-325J23.3" "DBH" "FOXE3" "FOXD4L1"
[10] "AC114730.8" "AC008697.1" "RP3-323N1.2" "RP11-142M10.2" "AC005616.2" "DCDC2B" "RP11-415J8.7" "LINC00326" "IL1RAPL2"
[19] "RP11-167N4.2" "RP11-114H23.1" "RP11-57A19.2" "C17orf98" "XX-CR54.3" "DLX2" "RP11-337N6.1" "RP11-416O18.1" "RP11-25H12.1"
[28] "RP11-269F21.3" "LINC00491" "CTB-43E15.3" "GABRR1" "H2BFWT" "TRPC5OS" "HTR2C" "RP11-642C5.1" "RP11-64P14.7"
[[2]]$pop1
[1] "CNGA3" "ITLN2" "RP11-400N13.1" "RP11-331F9.4" "GPR88" "LINC01037" "RP11-255M2.2" "LA16c-329F2.1" "RP11-154H12.2" "DUXA"
[11] "RP11-36B6.1" "RP11-12A16.3"
[[2]]$pop2
[1] "AC011893.3" "ISM1-AS1" "CA10" "RP11-301L8.2" "RP11-1250I15.3" "GABRG2" "NAMA" "CLEC1B" "RP11-458D21.5"
[10] "RGPD4" "SLITRK3" "RP3-495K2.2" "C11orf87" "RCVRN" "RP5-1112F19.2" "RP3-333A15.1" "RP5-836J3.1" "METTL11B"
[19] "AC112721.1" "RP11-761N21.1" "GRID2" "GML" "CLEC2A" "RP11-834C11.8" "RP11-406H23.2" "RP4-715N11.2" "RHD"
[28] "EYA1" "TAS2R19" "GABRA1" "SLC8A3" "RP3-510H16.3" "GRM7-AS3" "RP11-71H9.1" "PPEF2" "TULP1"
[37] "RP11-704J17.5" "RP11-10C8.2" "RP11-298H24.1" "RP11-263K4.3" "METTL21C" "AC012317.1" "CCDC42" "AC139100.3" "AF015262.2"
[[2]]$pop3
[1] "SYT10" "SPATA13-AS1" "AC064834.2" "CTD-2544H17.2" "AC106786.1" "RP11-25L3.3" "IMPG1" "DDX4" "RP11-50B3.4"
I have to find the intersections, i.e. a list of the genes present in at least:
2 lists
3 lists
Thank you.
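A minimal sketch of one way to count this, assuming BigList is structured as in the example above (each outer element holds character vectors of gene names per population); the helper name genes_in_at_least is made up here:

# For one population, count how many of the outer lists contain each gene,
# then keep the genes that appear in at least k of them.
genes_in_at_least <- function(big_list, pop, k) {
  per_list <- lapply(big_list, function(l) unique(l[[pop]]))  # one vector per outer list
  counts <- table(unlist(per_list))                           # number of lists per gene
  names(counts)[counts >= k]
}
# e.g. genes of pop1 present in at least 15 of the 20 lists:
# genes_in_at_least(BigList, "pop1", 15)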
I have a vector that looks like
> inecodes
[1] "01001" "01002" "01049" "01003" "01006" "01037" "01008" "01004" "01009" "01010" "01011"
[12] "01013" "01014" "01016" "01017" "01021" "01022" "01023" "01046" "01056" "01901" "01027"
[23] "01019" "01020" "01028" "01030" "01031" "01032" "01902" "01033" "01036" "01058" "01034"
[34] "01039" "01041" "01042" "01043" "01044" "01047" "01051" "01052" "01053" "01054" "01055"
And I want to remove these "numbers" from this vector:
>pob
[1] "01001-Alegría-Dulantzi" "01002-Amurrio"
[3] "01049-Añana" "01003-Aramaio"
[5] "01006-Armiñón" "01037-Arraia-Maeztu"
[7] "01008-Arratzua-Ubarrundia" "01004-Artziniega"
[9] "01009-Asparrena" "01010-Ayala/Aiara"
[11] "01011-Baños de Ebro/Mañueta" "01013-Barrundia"
[13] "01014-Berantevilla" "01016-Bernedo"
[15] "01017-Campezo/Kanpezu" "01021-Elburgo/Burgelu"
[17] "01022-Elciego" "01023-Elvillar/Bilar"
[19] "01046-Erriberagoitia/Ribera Alta"
They are longer than these samples and they don't all have the same length. The answer must look like the following:
>pob
[1] "Alegría-Dulantzi" "Amurrio"
[3] "Añana" "Aramaio"
[5] "Armiñón" "Arraia-Maeztu"
[7] "Arratzua-Ubarrundia" "Artziniega"
[9] "Asparrena" "Ayala/Aiara"
[11] "Baños de Ebro/Mañueta" "Barrundia"
[13] "Berantevilla" "Bernedo"
[15] "Campezo/Kanpezu" "Elburgo/Burgelu"
[17] "Elciego" "Elvillar/Bilar"
[19] "Erriberagoitia/Ribera Alta"
Not sure why you need inecodes at all, since you can use sub to remove the leading digits and hyphen:
sub('^\\d+-', '', pob)
Result:
[1] "Alegría-Dulantzi" "Amurrio" "Añana"
[4] "Aramaio" "Armiñón" "Arraia-Maeztu"
[7] "Arratzua-Ubarrundia" "Artziniega" "Asparrena"
[10] "Ayala/Aiara" "Baños de Ebro/Mañueta" "Barrundia"
[13] "Berantevilla" "Bernedo" "Campezo/Kanpezu"
[16] "Elburgo/Burgelu" "Elciego" "Elvillar/Bilar"
[19] "Erriberagoitia/Ribera Alta"
One reason you might need inecodes is if pob contains codes that don't exist in inecodes, but that doesn't seem to be the case here. If you insist on using inecodes to remove the numbers from pob, you can use str_replace_all from stringr:
library(stringr)
str_replace_all(pob, setNames(rep("", length(inecodes)), paste0(inecodes, "-")))
This gives you the exact same result:
[1] "Alegría-Dulantzi" "Amurrio" "Añana"
[4] "Aramaio" "Armiñón" "Arraia-Maeztu"
[7] "Arratzua-Ubarrundia" "Artziniega" "Asparrena"
[10] "Ayala/Aiara" "Baños de Ebro/Mañueta" "Barrundia"
[13] "Berantevilla" "Bernedo" "Campezo/Kanpezu"
[16] "Elburgo/Burgelu" "Elciego" "Elvillar/Bilar"
[19] "Erriberagoitia/Ribera Alta"
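To see what that named vector looks like, here is a small sketch with just the first two codes (str_replace_all treats each name as a regex pattern and each value as its replacement):

setNames(rep("", 2), paste0(inecodes[1:2], "-"))
# 01001- 01002-
#     ""     ""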
Data:
inecodes = c("01001", "01002", "01049", "01003", "01006", "01037", "01008",
"01004", "01009", "01010", "01011", "01013", "01014", "01016",
"01017", "01021", "01022", "01023", "01046", "01056", "01901",
"01027", "01019", "01020", "01028", "01030", "01031", "01032",
"01902", "01033", "01036", "01058", "01034", "01039", "01041",
"01042", "01043", "01044", "01047", "01051", "01052", "01053",
"01054", "01055")
pob = c("01001-Alegría-Dulantzi", "01002-Amurrio", "01049-Añana", "01003-Aramaio",
"01006-Armiñón", "01037-Arraia-Maeztu", "01008-Arratzua-Ubarrundia",
"01004-Artziniega", "01009-Asparrena", "01010-Ayala/Aiara", "01011-Baños de Ebro/Mañueta",
"01013-Barrundia", "01014-Berantevilla", "01016-Bernedo", "01017-Campezo/Kanpezu",
"01021-Elburgo/Burgelu", "01022-Elciego", "01023-Elvillar/Bilar",
"01046-Erriberagoitia/Ribera Alta")
library(stringr)
for (code in inecodes) {
  ix <- which(str_detect(pob, code))
  # split only the matched entries and keep everything after the first "-"
  pob[ix] <- vapply(str_split(pob[ix], "-", n = 2), `[`, character(1), 2)
}
Try this. match should be much faster:
pos <- which(!is.na(match(sub('^([0-9]+)-.*$', '\\1', pob), inecodes)))
pob[pos] <- sub('^[0-9]+-(.*)$', '\\1', pob[pos])
Please do post the timings if you manage to get them. match usually solves many computational issues for lookups in large data sets. I would like to see if there are any scenarios where the opposite holds.
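For reference, a minimal timing sketch (big_pob is made up purely to inflate the toy data so the difference is measurable; real timings depend on your data):

big_pob <- rep(pob, 10000)                      # inflate the toy data
system.time(sub('^\\d+-', '', big_pob))         # plain sub on everything
system.time({
  pos <- which(!is.na(match(sub('^([0-9]+)-.*$', '\\1', big_pob), inecodes)))
  big_pob[pos] <- sub('^[0-9]+-(.*)$', '\\1', big_pob[pos])
})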
A bit shorter than sub, str_detect and str_replace is str_remove:
library(stringr)
c("01001-Alegría-Dulantzi", "01002-Amurrio") %>%
str_remove("[0-9]*-")
returns
"Alegría-Dulantzi" "Amurrio"