Easy solution needed to subset spectra files in list.files - R

I have a folder full of spectra files. The number of files can vary between measurements, as can the number of repetitions.
This is what I have so far, and it works:
files <- list.files(pattern = "^Q\\d+")
print(files)
and print(files) gives:
[1] "Q010101N.001" "Q010101N.002" "Q010101N.003" "Q010101N.004" "Q010101N.005" "Q010101N.006"
[7] "Q010101N.007" "Q010101N.008" "Q010101N.009" "Q010101N.010" "Q010101N.011" "Q010101N.012"
[13] "Q010101N.013" "Q010101N.014" "Q010101N.015" "Q010101N.016" "Q010101N.017" "Q010101N.018"
[19] "Q010101N.019" "Q010101N.020" "Q010101N.021" "Q010101N.022" "Q010101N.023" "Q010101N.024"
[25] "Q010101N.025" "Q021101N.001" "Q021101N.002" "Q021101N.003" "Q021101N.004" "Q021101N.005"
[31] "Q021101N.006" "Q021101N.007" "Q021101N.008" "Q021101N.009" "Q021101N.010" "Q021101N.011"
[37] "Q021101N.012" "Q021101N.013" "Q021101N.014" "Q021101N.015" "Q021101N.016" "Q021101N.017"
[43] "Q021101N.018" "Q021101N.019" "Q021101N.020" "Q021101N.021" "Q021101N.022" "Q021101N.023"
[49] "Q021101N.024" "Q021101N.025" "Q031201N.001" "Q031201N.002" "Q031201N.003" "Q031201N.004"
[55] "Q031201N.005" "Q031201N.006" "Q031201N.007" "Q031201N.008" "Q031201N.009" "Q031201N.010"
[61] "Q031201N.011" "Q031201N.012" "Q031201N.013" "Q031201N.014" "Q031201N.015" "Q031201N.016"
[67] "Q031201N.017" "Q031201N.018" "Q031201N.019" "Q031201N.020" "Q031201N.021" "Q031201N.022"
[73] "Q031201N.023" "Q031201N.024" "Q031201N.025" "Q041301N.001" "Q041301N.002" "Q041301N.003"
[79] "Q041301N.004" "Q041301N.005" "Q041301N.006" "Q041301N.007" "Q041301N.008" "Q041301N.009"
[85] "Q041301N.010" "Q041301N.011" "Q041301N.012" "Q041301N.013" "Q041301N.014" "Q041301N.015"
[91] "Q041301N.016" "Q041301N.017" "Q041301N.018" "Q041301N.019" "Q041301N.020" "Q041301N.021"
[97] "Q041301N.022" "Q041301N.023" "Q041301N.024" "Q041301N.025" "Q051401N.001" "Q051401N.002"
[103] "Q051401N.003" "Q051401N.004" "Q051401N.005" "Q051401N.006" "Q051401N.007" "Q051401N.008"
[109] "Q051401N.009" "Q051401N.010" "Q051401N.011" "Q051401N.012" "Q051401N.013" "Q051401N.014"
[115] "Q051401N.015" "Q051401N.016" "Q051401N.017" "Q051401N.018" "Q051401N.019" "Q051401N.020"
[121] "Q051401N.021" "Q051401N.022" "Q051401N.023" "Q051401N.024" "Q051401N.025" "Q061501N.001"
[127] "Q061501N.002" "Q061501N.003" "Q061501N.004" "Q061501N.005" "Q061501N.006" "Q061501N.007"
[133] "Q061501N.008" "Q061501N.009" "Q061501N.010" "Q061501N.011" "Q061501N.012" "Q061501N.013"
[139] "Q061501N.014" "Q061501N.015" "Q061501N.016" "Q061501N.017" "Q061501N.018" "Q061501N.019"
[145] "Q061501N.020" "Q061501N.021" "Q061501N.022" "Q061501N.023" "Q061501N.024" "Q061501N.025"
[151] "Q071601N.001" "Q071601N.002" "Q071601N.003" "Q071601N.004" "Q071601N.005" "Q071601N.006"
[157] "Q071601N.007" "Q071601N.008" "Q071601N.009" "Q071601N.010" "Q071601N.011" "Q071601N.012"
[163] "Q071601N.013" "Q071601N.014" "Q071601N.015" "Q071601N.016" "Q071601N.017" "Q071601N.018"
[169] "Q071601N.019" "Q071601N.020" "Q071601N.021" "Q071601N.022" "Q071601N.023" "Q071601N.024"
[175] "Q071601N.025" "Q081701N.001" "Q081701N.002" "Q081701N.003" "Q081701N.004" "Q081701N.005"
[181] "Q081701N.006" "Q081701N.007" "Q081701N.008" "Q081701N.009" "Q081701N.010" "Q081701N.011"
[187] "Q081701N.012" "Q081701N.013" "Q081701N.014" "Q081701N.015" "Q081701N.016" "Q081701N.017"
[193] "Q081701N.018" "Q081701N.019" "Q081701N.020" "Q081701N.021" "Q081701N.022" "Q081701N.023"
[199] "Q081701N.024" "Q081701N.025" "Q091801N.001" "Q091801N.002" "Q091801N.003" "Q091801N.004"
[205] "Q091801N.005" "Q091801N.006" "Q091801N.007" "Q091801N.008" "Q091801N.009" "Q091801N.010"
[211] "Q091801N.011" "Q091801N.012" "Q091801N.013" "Q091801N.014" "Q091801N.015" "Q091801N.016"
[217] "Q091801N.017" "Q091801N.018" "Q091801N.019" "Q091801N.020" "Q091801N.021" "Q091801N.022"
[223] "Q091801N.023" "Q091801N.024" "Q091801N.025" "Q101901N.001" "Q101901N.002" "Q101901N.003"
[229] "Q101901N.004" "Q101901N.005" "Q101901N.006" "Q101901N.007" "Q101901N.008" "Q101901N.009"
[235] "Q101901N.010" "Q101901N.011" "Q101901N.012" "Q101901N.013" "Q101901N.014" "Q101901N.015"
[241] "Q101901N.016" "Q101901N.017" "Q101901N.018" "Q101901N.019" "Q101901N.020" "Q101901N.021"
[247] "Q101901N.022" "Q101901N.023" "Q101901N.024" "Q101901N.025" "Q112001N.001" "Q112001N.002"
[253] "Q112001N.003" "Q112001N.004" "Q112001N.005" "Q112001N.006" "Q112001N.007" "Q112001N.008"
[259] "Q112001N.009" "Q112001N.010" "Q112001N.011" "Q112001N.012" "Q112001N.013" "Q112001N.014"
[265] "Q112001N.015" "Q112001N.016" "Q112001N.017" "Q112001N.018" "Q112001N.019" "Q112001N.020"
[271] "Q112001N.021" "Q112001N.022" "Q112001N.023" "Q112001N.024" "Q112001N.025" "Q124101N.001"
[277] "Q124101N.002" "Q124101N.003" "Q124101N.004" "Q124101N.005" "Q124101N.006" "Q124101N.007"
[283] "Q124101N.008" "Q124101N.009" "Q124101N.010" "Q124101N.011" "Q124101N.012" "Q124101N.013"
[289] "Q124101N.014" "Q124101N.015" "Q124101N.016" "Q124101N.017" "Q124101N.018" "Q124101N.019"
[295] "Q124101N.020" "Q124101N.021" "Q124101N.022" "Q124101N.023" "Q124101N.024" "Q124101N.025"
[301] "Q134201N.001" "Q134201N.002" "Q134201N.003" "Q134201N.004" "Q134201N.005" "Q134201N.006"
[307] "Q134201N.007" "Q134201N.008" "Q134201N.009" "Q134201N.010" "Q134201N.011" "Q134201N.012"
[313] "Q134201N.013" "Q134201N.014" "Q134201N.015" "Q134201N.016" "Q134201N.017" "Q134201N.018"
[319] "Q134201N.019" "Q134201N.020" "Q134201N.021" "Q134201N.022" "Q134201N.023" "Q134201N.024"
[325] "Q134201N.025" "Q144301N.001" "Q144301N.002" "Q144301N.003" "Q144301N.004" "Q144301N.005"
[331] "Q144301N.006" "Q144301N.007" "Q144301N.008" "Q144301N.009" "Q144301N.010" "Q144301N.011"
[337] "Q144301N.012" "Q144301N.013" "Q144301N.014" "Q144301N.015" "Q144301N.016" "Q144301N.017"
[343] "Q144301N.018" "Q144301N.019" "Q144301N.020" "Q144301N.021" "Q144301N.022" "Q144301N.023"
[349] "Q144301N.024" "Q144301N.025" "Q154401N.001" "Q154401N.002" "Q154401N.003" "Q154401N.004"
[355] "Q154401N.005" "Q154401N.006" "Q154401N.007" "Q154401N.008" "Q154401N.009" "Q154401N.010"
[361] "Q154401N.011" "Q154401N.012" "Q154401N.013" "Q154401N.014" "Q154401N.015" "Q154401N.016"
[367] "Q154401N.017" "Q154401N.018" "Q154401N.019" "Q154401N.020" "Q154401N.021" "Q154401N.022"
[373] "Q154401N.023" "Q154401N.024" "Q154401N.025" "Q164501N.001" "Q164501N.002" "Q164501N.003"
[379] "Q164501N.004" "Q164501N.005" "Q164501N.006" "Q164501N.007" "Q164501N.008" "Q164501N.009"
[385] "Q164501N.010" "Q164501N.011" "Q164501N.012" "Q164501N.013" "Q164501N.014" "Q164501N.015"
[391] "Q164501N.016" "Q164501N.017" "Q164501N.018" "Q164501N.019" "Q164501N.020" "Q164501N.021"
[397] "Q164501N.022" "Q164501N.023" "Q164501N.024" "Q164501N.025" "Q174601N.001" "Q174601N.002"
[403] "Q174601N.003" "Q174601N.004" "Q174601N.005" "Q174601N.006" "Q174601N.007" "Q174601N.008"
[409] "Q174601N.009" "Q174601N.010" "Q174601N.011" "Q174601N.012" "Q174601N.013" "Q174601N.014"
[415] "Q174601N.015" "Q174601N.016" "Q174601N.017" "Q174601N.018" "Q174601N.019" "Q174601N.020"
[421] "Q174601N.021" "Q174601N.022" "Q174601N.023" "Q174601N.024" "Q174601N.025"
So in this case I get 425 spectra files, with 25 repetitions of each sample. However, the total number of files could be different another time, and it could also be that one sample has 10 repetitions while the rest have 14, for example.
So I would like to subset each sample (with its repetitions) into one subset. In this case I would get 17 subsets.
And then I need to import the files, which I have done successfully before with all spectra files:
list.data <- list()
#import all spectra files
for (i in 1:length(files))
  list.data[[i]] <- read.csv(files[i])
Given that I now have subsets, that would have to be slightly different!?
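For reference, the subsetting itself can be sketched in base R, assuming the sample ID is everything before the dot in the file name (sample_id, subsets and the nested-lapply import are illustrative, not from the original code):
sample_id <- sub("\\..*$", "", files)   # "Q010101N.003" -> "Q010101N"
subsets <- split(files, sample_id)      # one list element per sample
length(subsets)                         # 17 for the listing above
list.data <- lapply(subsets, function(fs) lapply(fs, read.csv))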

You can do this via a helper function and iteration. I used dplyr, purrr and stringi. This will put all your files into a single dataframe. After that you can manipulate it as you see fit.
library(dplyr)
library(purrr)
library(stringi)
read_spectra <- function(file){
  file_name <- basename(file)
  read.csv(file) %>%
    mutate(sample = stri_extract_first_regex(file_name, "([A-Z][0-9]+)(?=.)"),
           repetition = stri_extract_first_regex(file_name, "(?<=\\.)(\\d+)")) %>%
    select(sample, repetition, everything())
}
full_data <- map_df(files, read_spectra)
The helper function:
- Takes a file name from list.files.
- Reads the CSV.
- Uses mutate to make two new columns, using regex to extract the sample ID and the repetition number.
- Orders the columns into sample, repetition, and everything else.
The iteration uses map_df() from purrr to apply read_spectra to each file in files and bind everything together into one dataframe.
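If you then want one subset per sample (17 in this case), a short follow-up sketch (by_sample is an illustrative name):
by_sample <- split(full_data, full_data$sample)  # named list, one dataframe per sample
length(by_sample)                                # 17 for the files above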

Related

How do I use the index value within the function of a for loop in R?

Multiplier <- numeric(200)
for (i in 1:200) Multiplier[i] <- 1/sqrt(i(i+1))
Like the math function f(n) = 1/sqrt(n(n+1)), putting the first 200 values in an array. But when I run the above code I get:
# Error in i(i + 1) : could not find function "i"
and when I try to use [i] I get:
# Error: unexpected '[' in "for (i in 1:200) Multiplier[i] <- 1/sqrt([
Change the i(i+1) to i*(i+1). When we write i(i+1), R parses i as a function and i+1 as the argument to that function, which is why it cannot find a function called i.
for (i in 1:200) Multiplier[i] <- 1/sqrt(i*(i+1))
Output:
> Multiplier
[1] 0.707106781 0.408248290 0.288675135 0.223606798 0.182574186 0.154303350 0.133630621 0.117851130 0.105409255 0.095346259
[11] 0.087038828 0.080064077 0.074124932 0.069006556 0.064549722 0.060633906 0.057166195 0.054073807 0.051298918 0.048795004
[21] 0.046524211 0.044455422 0.042562827 0.040824829 0.039223227 0.037742568 0.036369648 0.035093120 0.033903175 0.032791292
[31] 0.031750032 0.030772873 0.029854072 0.028988552 0.028171808 0.027399831 0.026669037 0.025976217 0.025318484 0.024693240
[41] 0.024098135 0.023531040 0.022990024 0.022473329 0.021979349 0.021506620 0.021053798 0.020619652 0.020203051 0.019802951
[51] 0.019418391 0.019048483 0.018692405 0.018349396 0.018018749 0.017699808 0.017391962 0.017094641 0.016807316 0.016529490
[61] 0.016260700 0.016000512 0.015748520 0.015504342 0.015267620 0.015038019 0.014815221 0.014598929 0.014388862 0.014184754
[71] 0.013986356 0.013793431 0.013605757 0.013423121 0.013245324 0.013072175 0.012903494 0.012739112 0.012578865 0.012422600
[81] 0.012270170 0.012121435 0.011976263 0.011834527 0.011696106 0.011560887 0.011428758 0.011299615 0.011173359 0.011049892
[91] 0.010929125 0.010810969 0.010695340 0.010582159 0.010471348 0.010362833 0.010256545 0.010152415 0.010050378 0.009950372
[101] 0.009852336 0.009756214 0.009661948 0.009569488 0.009478779 0.009389775 0.009302426 0.009216688 0.009132515 0.009049866
[111] 0.008968700 0.008888977 0.008810658 0.008733708 0.008658090 0.008583770 0.008510715 0.008438894 0.008368274 0.008298827
[121] 0.008230522 0.008163333 0.008097232 0.008032193 0.007968191 0.007905200 0.007843198 0.007782160 0.007722065 0.007662891
[131] 0.007604618 0.007547224 0.007490689 0.007434996 0.007380124 0.007326056 0.007272775 0.007220264 0.007168505 0.007117483
[141] 0.007067182 0.007017587 0.006968683 0.006920457 0.006872893 0.006825978 0.006779700 0.006734045 0.006689001 0.006644555
[151] 0.006600696 0.006557412 0.006514693 0.006472526 0.006430901 0.006389809 0.006349238 0.006309180 0.006269623 0.006230560
[161] 0.006191980 0.006153875 0.006116237 0.006079055 0.006042324 0.006006033 0.005970176 0.005934744 0.005899731 0.005865128
[171] 0.005830929 0.005797126 0.005763713 0.005730683 0.005698029 0.005665745 0.005633825 0.005602263 0.005571052 0.005540187
[181] 0.005509663 0.005479473 0.005449612 0.005420074 0.005390855 0.005361950 0.005333352 0.005305058 0.005277063 0.005249362
[191] 0.005221950 0.005194823 0.005167976 0.005141405 0.005115106 0.005089075 0.005063307 0.005037799 0.005012547 0.004987547
According to ?Paren:
Open parenthesis, (, and open brace, {, are .Primitive functions in R.
Effectively, ( is semantically equivalent to the identity function(x) x.
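As a side note, the loop is not needed here at all: R arithmetic is vectorized, so the same 200 values can be computed in one step.
n <- 1:200
Multiplier <- 1/sqrt(n * (n + 1))  # identical result, no loop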

How can I combine multiple .RDS files only for certain values within each file?

This is my first time using R. I am trying to figure out how I can conditionally combine data I have that is spread across five .RDS files. I've taken a look at this post on reading multiple .RDS files, but I do not want to concatenate the data; I want to overwrite certain entries with certain other entries while preserving the overall structure.
The data file naming convention follows the pattern: fold1.rds, fold2.rds, ... , fold5.rds
I load each file using: foldx = readRDS('foldx.rds')
A call to dim(foldx) returns [1] 12 450 4.
A simplified example of what I want to accomplish:
fold1 (when I print the variable) contains values like:
tri-a cal-c xali trop
PC3 1.000 1.000 -.0235 1.000
MCF7 -.0548 1.000 1.000 1.000
while the same indices of the fold2 data contain:
tri-a cal-c xali trop
PC3 -.078 1.000 1.000 1.000
MCF7 1.000 .0254 1.000 -.015
That is, only one of the data files will have an entry for indices [i,j] that is non-1.0. A value of 1.0 is assigned when a calculation has not been performed.
I want to combine the data from folds 1 through 5 such that I have a single file where every occurrence of a 1.0 is replaced with the corresponding non-1.0 value from whichever fold data file happens to contain it.
So my example result would look like:
tri-a cal-c xali trop
PC3 -.078 1.000 -.0235 1.000
MCF7 -.0548 .0254 1.000 -.015
I hope that this is sufficient detail. Thank you.
Edit:
The above example is a simplification of what I want to accomplish. The data are actually multi-dimensional arrays with 12 rows, 450 columns, and a third dimension with 4 entries. When I call dim on any one of them I get [1] 12 450 4.
For instance, calling dimnames(fold1) returns:
[[1]]
[1] "PC3" "MCF7" "A375" "A549" "ASC" "HA1E" "HCC515" "HEPG2" "HT29"
[10] "NPC" "VCAP" "NEU"
[[2]]
[1] "trichostatin-a" "geldanamycin"
[3] "iloprost" "wortmannin"
[5] "calcitriol" "carbacyclin"
[7] "daunorubicin" "tretinoin"
[9] "tamoxifen" "alvespimycin"
[11] "radicicol" "tanespimycin"
[13] "sirolimus" "fulvestrant"
[15] "estradiol" "mitoxantrone"
[17] "genistein" "vorinostat"
[19] "withaferin-a" "sorafenib"
[21] "triflupromazine" "noretynodrel"
[23] "benzonatate" "vemurafenib"
[25] "scopolamine" "brefeldin-a"
[27] "vinblastine" "cyclosporin-a"
[29] "reboxetine" "pinacidil"
[31] "fluticasone" "rimcazole"
[33] "methylergometrine" "nemonapride"
[35] "cromakalim" "cycloheximide"
[37] "clofarabine" "naltrindole"
[39] "tenidap" "cilastatin"
[41] "nisoxetine" "tubocurarine"
[43] "rottlerin" "flunarizine"
[45] "resveratrol" "dinoprostone"
[47] "tetramethylsilane" "danazol"
[49] "aloisine" "risperidone"
[51] "cyclopamine" "thioperamide"
[53] "luzindole" "proxymetacaine"
[55] "pyrrolidine-dithiocarbamate" "halcinonide"
[57] "pseudoephedrine" "caffeic-acid"
[59] "cabergoline" "fusaric-acid"
[61] "anandamide" "piretanide"
[63] "umbelliferone" "xaliproden"
[65] "cuneatin" "spironolactone"
[67] "norethisterone" "mammea-a"
[69] "irilin-a" "graveoline"
[71] "hydrastinine" "melperone"
[73] "ecopipam" "mevastatin"
[75] "pinocembrin" "emodic-acid"
[77] "coumaric-acid" "farnesylthiotriazole"
[79] "dehydroisoandosterone" "lonidamine"
[81] "relcovaptan" "pirfenidone"
[83] "azacyclonol" "cucurbitacin-i"
[85] "cyclopiazonic-acid" "wiskostatin"
[87] "mianserin" "procaterol"
[89] "kynuramine" "triciribine"
[91] "mifepristone" "noscapine"
[93] "yohimbine" "butabindide"
[95] "niguldipine" "bromocriptine"
[97] "vincristine" "monastrol"
[99] "tropisetron" "tropanyl-3,5-dimethylbenzoate"
[101] "bemesetron" "guggulsterone"
[103] "ciglitazone" "rosiglitazone"
[105] "pimozide" "pyrazolanthrone"
[107] "raclopride" "telenzepine"
[109] "minoxidil" "aminopurvalanol-a"
[111] "irinotecan" "desmethylclozapine"
[113] "riluzole" "piceatannol"
[115] "cilostamide" "vanoxerine"
[117] "rotenonic-acid" "glibenclamide"
[119] "clozapine" "kenpaullone"
[121] "zardaverine" "pregnenolone"
[123] "doconexent" "betulinic-acid"
[125] "ketanserin" "arctigenin"
[127] "remoxipride" "pravastatin"
[129] "loperamide" "brimonidine"
[131] "talipexole" "nimesulide"
[133] "osthol" "indolophenanthridine"
[135] "benperidol" "tetramethyl-haematoxylone"
[137] "gliquidone" "carmoxirole"
[139] "olomoucine" "prostaglandin-b2"
[141] "terconazole" "embelin"
[143] "aniracetam" "lisuride"
[145] "ciclacillin" "testosterone"
[147] "nefazodone" "anisomycin"
[149] "zatebradine" "gabazine"
[151] "isocarboxazid" "lamotrigine"
[153] "trapidil" "mead-acid"
[155] "4,5,6,7-tetrabromobenzotriazole" "quercetin"
[157] "dihydroergocristine" "coumestrol"
[159] "beclomethasone-dipropionate" "desoxycortone"
[161] "parthenolide" "formoterol"
[163] "rilmenidine" "puromycin"
[165] "clofibric-acid" "flumetasone"
[167] "verapamil" "methoprene-acid"
[169] "kavain" "hydralazine"
[171] "thenoyltrifluoroacetone" "pirinixic-acid"
[173] "forskolin" "lovastatin"
[175] "phenformin" "propoxycaine"
[177] "nafcillin" "bifemelane"
[179] "ergocornine" "scriptaid"
[181] "skimmianine" "apafant"
[183] "loreclezole" "fenretinide"
[185] "nitrocaramiphen" "fillalbin"
[187] "devazepide" "fluprostenol"
[189] "rimexolone" "tolazamide"
[191] "dipropyl-5ct" "reserpic-acid"
[193] "dextromethorphan" "alpha-linolenic-acid"
[195] "levocabastine" "levonorgestrel"
[197] "ouabain" "hinokitiol"
[199] "denbufylline" "ceforanide"
[201] "clocortolone-pivalate" "seneciphylline"
[203] "malonoben" "fenobam"
[205] "bezafibrate" "quinidine"
[207] "leoidin" "lidoflazine"
[209] "farnesylthioacetic-acid" "norcyclobenzaprine"
[211] "semaxanib" "arecaidine-but-2-ynyl-ester"
[213] "progesterone" "meprylcaine"
[215] "flurofamide" "trap-101"
[217] "flupentixol" "hippeastrine"
[219] "cefixime" "fluvoxamine"
[221] "telmisartan" "moxonidine"
[223] "estrone" "alprenolol"
[225] "nitrendipine" "mebeverine"
[227] "dexamethasone" "trimipramine"
[229] "tenoxicam" "canrenoic-acid"
[231] "rolipram" "estradiol-benzoate"
[233] "brucine" "pioglitazone"
[235] "nimodipine" "budesonide"
[237] "mirtazapine" "etodolac"
[239] "bendroflumethiazide" "bepridil"
[241] "chloroquine" "thalidomide"
[243] "naringenin" "fenoterol"
[245] "piperidolate" "caffeine"
[247] "celecoxib" "penitrem-a"
[249] "epirubicin" "diclofenac"
[251] "apigenin" "fenbufen"
[253] "nocodazole" "cinchonine"
[255] "chloroxine" "amiodarone"
[257] "simvastatin" "flutamide"
[259] "alrestatin" "niclosamide"
[261] "erythromycin-estolate" "loxapine"
[263] "proadifen" "bromhexine"
[265] "tosyl-phenylalanyl-chloromethyl-ketone" "glycocholic-acid"
[267] "spiperone" "chlorprothixene"
[269] "piperine" "dephostatin"
[271] "guanaben-acetate" "dihydrosamidin"
[273] "salicin" "fluvastatin"
[275] "altretamine" "pivmecillinam"
[277] "thiethylperazine" "tacrolimus"
[279] "trazodone" "carbamazepine"
[281] "ergocryptine" "nicergoline"
[283] "fluspirilene" "ricinine"
[285] "methyl-2,5-dihydroxycinnamate" "chlorpromazine"
[287] "amodiaquine" "riboflavin"
[289] "buspirone" "spiramide"
[291] "pifithrin-mu" "droperidol"
[293] "clonidine" "salsolinol"
[295] "warfarin" "tranylcypromine"
[297] "glycodeoxycholic-acid" "lenalidomide"
[299] "oligomycin-a" "olaparib"
[301] "etomoxir" "tivozanib"
[303] "orantinib" "serdemetan"
[305] "15-delta-prostaglandin-j2" "veliparib"
[307] "tozasertib" "rucaparib"
[309] "vecuronium" "raltitrexed"
[311] "mosapride" "voriconazole"
[313] "rifabutin" "epigallocatechin-gallate-(-)"
[315] "docetaxel" "moxifloxacin"
[317] "linezolid" "nelfinavir"
[319] "imatinib" "vinorelbine"
[321] "panobinostat" "alitretinoin"
[323] "antimycin-a" "fludarabine"
[325] "alvocidib" "lapatinib"
[327] "carbetocin" "buthionine-sulfoximine"
[329] "heraclenol" "raltegravir"
[331] "danusertib" "depudecin"
[333] "pyroxamide" "obatoclax"
[335] "enobosarm" "belinostat"
[337] "sitagliptin" "saracatinib"
[339] "cytochalasin-d" "parthenolide-(alternate-stereo)"
[341] "deforolimus" "axitinib"
[343] "tofacitinib" "zibotentan"
[345] "floxuridine" "dasatinib"
[347] "capecitabine" "cilomilast"
[349] "apicidin" "ispinesib"
[351] "epothilone-a" "entinostat"
[353] "decitabine" "andarine"
[355] "tandutinib" "motesanib"
[357] "5-iodotubercidin" "verrucarin-a"
[359] "hydroxycholesterol" "cyclazosin"
[361] "terreic-acid-(-)" "oxalomalic-acid"
[363] "hyperforin" "4-hydroxyretinoic-acid"
[365] "rifampicin" "tenovin-1"
[367] "eicosatetraynoic-acid" "dexrazoxane"
[369] "entecavir" "prostaglandin"
[371] "everolimus" "retinol"
[373] "somatostatin" "mocetinostat"
[375] "aprepitant" "danoprevir"
[377] "chenodeoxycholic-acid" "lithocholic-acid"
[379] "neurodazine" "lestaurtinib"
[381] "farnesol" "androstenol"
[383] "brivanib" "ziprasidone"
[385] "androstenedione" "taurodeoxycholic-acid"
[387] "2-aminopurine" "tamibarotene"
[389] "rivaroxaban" "3,3'-diindolylmethane"
[391] "farnesylcysteine-methyl-ester" "ochratoxin-a"
[393] "torin-1" "deoxycholic-acid"
[395] "cholic-acid" "dexketoprofen"
[397] "tetradecylthioacetic-acid" "tyrphostin-51"
[399] "iloperidone" "canertinib"
[401] "flupirtine" "palbociclib"
[403] "aurora-a-inhibitor-i" "ruxolitinib"
[405] "mirin" "dacinostat"
[407] "epoxycholesterol" "acitretin"
[409] "sphingosine" "gefitinib"
[411] "ebelactone-b" "retinyl-acetate"
[413] "oleoylethanolamide" "torin-2"
[415] "erlotinib" "staurosporine"
[417] "zosuquidar" "masitinib"
[419] "cinacalcet" "pazopanib"
[421] "acetyl-geranygeranyl-cysteine" "isotretinoin"
[423] "calcifediol" "reversine"
[425] "xanthohumol" "crizotinib"
[427] "enzastaurin" "acetyl-farnesyl-cysteine"
[429] "lavendustin-c" "taurocholic-acid"
[431] "nilotinib" "pirarubicin"
[433] "safinamide" "ellipticine"
[435] "cediranib" "lypressin"
[437] "quizartinib" "geranylgeraniol"
[439] "lopinavir" "plinabulin"
[441] "linifanib" "bosutinib"
[443] "sunitinib" "vicriviroc"
[445] "acetyl-geranyl-cysteine" "androstanol"
[447] "teicoplanin" "dopamine"
[449] "azacitidine" "idelalisib"
[[3]]
[1] "corScore" "corCmapScorePos" "corCmapScoreNeg" "corCmapScoreAvg"
I've dealt with multi-dimensional arrays in Python, but I am not sure how to deal with the names in the array while preserving their structure and moving the numerical values around. Sorry for any confusion.
Edit 2:
The solution ended up being to iterate over all three dimensions.
fold_all <- fold1
dims <- dim(fold1)
for (i in 1:dims[1]) {
  for (j in 1:dims[2]) {
    for (k in 1:dims[3]) {
      if (fold1[i,j,k] != 1) {
        fold_all[i,j,k] <- fold1[i,j,k]
      } else if (fold2[i,j,k] != 1) {
        fold_all[i,j,k] <- fold2[i,j,k]
      } else if (fold3[i,j,k] != 1) {
        fold_all[i,j,k] <- fold3[i,j,k]
      } else if (fold4[i,j,k] != 1) {
        fold_all[i,j,k] <- fold4[i,j,k]
      } else if (fold5[i,j,k] != 1) {
        fold_all[i,j,k] <- fold5[i,j,k]
      }
    }
  }
}
saveRDS(fold_all, 'all_folds.rds')
Thanks again!
Assuming you import each RDS file as a dataframe with the naming convention dat_one, dat_two, etc., you should be able to solve your problem with the following code. Let me know if this works for you!
dat_final <- tibble::tribble(
  ~names, ~'tri-a', ~'cal-c', ~xali, ~trop,
  'PC3',  1.000, 1.000, 1.000, 1.000,
  'MCF7', 1.000, 1.000, 1.000, 1.000
)
for (i in 1:nrow(dat_one)){
  for (j in 2:length(dat_one)){
    if (dat_one[i,j] != 1){
      dat_final[i,j] = dat_one[i,j]
    } else if (dat_two[i,j] != 1){
      dat_final[i,j] = dat_two[i,j]
    } else if (dat_three[i,j] != 1){
      dat_final[i,j] = dat_three[i,j]
    } else if (dat_four[i,j] != 1){
      dat_final[i,j] = dat_four[i,j]
    } else if (dat_five[i,j] != 1){
      dat_final[i,j] = dat_five[i,j]
    }
  }
}
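For the 12 x 450 x 4 arrays from the question's Edit 2, the same precedence can be written without explicit loops. This is a sketch, assuming (as stated in the question) that 1.0 marks a missing calculation:
folds <- list(fold1, fold2, fold3, fold4, fold5)
# Keep the first non-1 value encountered; the comparison acc != 1 preserves
# dim and dimnames, so the array structure survives.
fold_all <- Reduce(function(acc, f) ifelse(acc != 1, acc, f), folds)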

In a dataframe, how to sum values element-wise across a large number of entries?

my_data contains 1920 entries: from X1.V1 to X192.V10 as below:
pic <- my_data
X1.V1 X1.V2 X1.V3 X1.V4 X1.V5 X1.V6 X1.V7 X1.V8 X1.V9 X1.V10
X2.V1 X2.V2 X2.V3 X2.V4 X2.V5 X2.V6 X2.V7 X2.V8 X2.V9 X2.V10
------------------------------------------------------------
X192.V1 X192.V2 X192.V3 X192.V4 X192.V5 X192.V6 X192.V7 X192.V8 X192.V9 X192.V10
In this data, each entry contains 301 elements, e.g.,
pic$X1.V1
[1] -29.935 -7.798 -20.366 1.772 -10.796 -23.363
[7] -1.226 -13.794 -26.361 -4.224 -16.791 -29.359
[13] -7.222 -19.789 2.348 -10.219 -22.787 -0.650
[19] -13.217 -25.785 -3.647 -16.215 -28.782 -6.645
[25] -19.213 2.925 -9.643 -22.210 -0.073 -12.641
[31] -25.208 -3.071 -15.638 -28.206 -6.069 -18.636
[37] 3.501 -9.066 -21.634 0.503 -12.064 -24.632
[43] -2.494 -15.062 -27.630 -5.492 -18.060 4.078
[49] -8.490 -21.058 1.080 -11.488 -24.055 -1.918
[55] -14.485 -27.053 -4.916 -17.483 4.654 -7.913
[61] -20.481 1.656 -10.911 -23.479 -1.341 -13.909
[67] -26.477 -4.339 -16.907 -29.474 -7.337 -19.905
[73] 2.233 -10.335 -22.902 -0.765 -13.333 -25.900
[79] -3.763 -16.330 -28.898 -6.760 -19.328 2.809
[85] -9.758 -22.326 -0.188 -12.756 -25.324 -3.186
[91] -15.754 -28.321 -6.184 -18.752 3.386 -9.182
[97] -21.749 0.388 -12.180 -24.747 -2.610 -15.177
[103] -27.745 -5.608 -18.175 3.962 -8.605 -21.173
[109] 0.964 -11.603 -24.171 -2.033 -14.601 -27.168
[115] -5.031 -17.599 4.539 -8.029 -20.596 1.541
[121] -11.027 -23.594 -1.457 -14.024 -26.592 -4.455
[127] -17.022 -29.590 -7.452 -20.020 2.117 -10.450
[133] -23.018 -0.880 -13.448 -26.015 -3.878 -16.446
[139] -29.013 -6.876 -19.443 2.694 -9.874 -22.441
[145] -0.304 -12.871 -25.439 -3.302 -15.869 -28.437
[151] -6.299 -18.867 3.270 -9.297 -21.865 0.273
[157] -12.295 -24.862 -2.725 -15.293 -27.860 -5.723
[163] -18.290 3.847 -8.721 -21.288 0.849 -11.718
[169] -24.286 -2.149 -14.716 -27.284 -5.146 -17.714
[175] 4.423 -8.144 -20.712 1.426 -11.142 -23.709
[181] -1.572 -14.140 -26.707 -4.570 -17.137 -29.705
[187] -7.568 -20.135 2.002 -10.565 -23.133 -0.996
[193] -13.563 -26.131 -3.993 -16.561 -29.128 -6.991
[199] -19.559 2.579 -9.989 -22.556 -0.419 -12.987
[205] -25.554 -3.417 -15.984 -28.552 -6.415 -18.982
[211] 3.155 -9.412 -21.980 0.157 -12.410 -24.978
[217] -2.840 -15.408 -27.975 -5.838 -18.406 3.732
[223] -8.836 -21.403 0.734 -11.834 -24.401 -2.264
[229] -14.831 -27.399 -5.262 -17.829 4.308 -8.259
[235] -20.827 1.310 -11.257 -23.825 -1.687 -14.255
[241] -26.822 -4.685 -17.253 -29.820 -7.683 -20.250
[247] 1.887 -10.681 -23.248 -1.111 -13.678 -26.246
[253] -4.109 -16.676 -29.244 -7.106 -19.674 2.463
[259] -10.104 -22.672 -0.534 -13.102 -25.669 -3.532
[265] -16.100 -28.667 -6.530 -19.097 3.040 -9.528
[271] -22.095 0.042 -12.525 -25.093 -2.956 -15.523
[277] -28.091 -5.953 -18.521 3.616 -8.951 -21.519
[283] 0.619 -11.949 -24.516 -2.379 -14.947 -27.514
[289] -5.377 -17.944 4.193 -8.375 -20.942 1.195
[295] -11.372 -23.940 -1.803 -14.370 -26.938 -4.800
[301] -17.368
Suppose I want to sum 3 entries, e.g., the 1st element of pic$X1.V1 + the 1st element of pic$X2.V1 + the 1st element of pic$X3.V1, and so on. This works fine with
pic$X1.V1 + pic$X2.V1 + pic$X3.V1
However, when I want to sum from pic$X1.V1 to pic$X192.V1 this way, it is tedious to type out each one, as below:
pic$X1.V1 + pic$X2.V1 + pic$X3.V1 + ............. + pic$X192.V1
I have tried the alternative below, but it does not work:
sum(pic$X1.V1:pic$X192.V1)
[1] -435.505
Warning messages:
1: In pic$X1.V1:pic$X192.V1 :
numerical expression has 301 elements: only the first used
2: In pic$X1.V1:pic$X192.V1 :
numerical expression has 301 elements: only the first used
Please let me know a better method.
There is a rowSums function. It's optimized in C code so any other solution is probably less efficient. In this case it appears that you want the entire data.frame processed.
rowSums(pic)
If you wanted to choose only those columns whose names match the pattern Xnnn.Vmm, you could do this:
rowSums(pic[grepl("^X\\d{1,3}\\.V\\d{1,2}", colnames(pic))])
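And if the goal is specifically the sum from pic$X1.V1 through pic$X192.V1, i.e. only the .V1 columns, a sketch (v1_cols is an illustrative name):
v1_cols <- grepl("\\.V1$", colnames(pic))  # matches X1.V1 .. X192.V1, but not .V10
rowSums(pic[v1_cols])                      # 301 element-wise sums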

Looping over a character vector in the order it was produced vs alphabetical order

OK, so I pulled a list of tickers from a data frame that was sorted by date, so my symbols character vector is in that sorted order:
> tickers
[1] "SPPI" "ZGNX" "ARDM" "PRTK" "GTXI" "HEB" "FCSC" "ACOR" "ALKS" "MNKD" "HRTX" "CTIC"
[13] "ARLZ" "JAZZ" "VVUS" "DEPO" "OREX" "PLX" "PTIE" "DRRX" "SGEN" "PCRX" "PSDV" "ALIM"
[25] "INCY" "ATRS" "INSY" "CRIS" "CORT" "EBS" "RGEN" "ARNA" "AMRN" "HALO" "NAVB" "SUPN"
[37] "EXEL" "IPXL" "IMGN" "DVAX" "SCMP" "TTNP" "ENDP" "AVDL" "AVEO" "TBPH" "DCTH" "ABBV"
[49] "AMAG" "VNDA" "BMRN" "MDCO" "OMER" "BDSI" "EGRX" "ACRX" "KERX" "NKTR" "PGNX" "AEZS"
[61] "ENTA" "BCRX" "ADMS" "VRTX" "NBIX" "RMTI" "ADMP" "AMGN" "MNTA" "PTX" "EBIO" "NYMX"
[73] "VTL" "TTPH" "MACK" "LPTX" "GWPH" "SPHS" "RPRX" "OTIC" "NEOT" "CHRS" "ZFGN" "NEOS"
[85] "RDHL" "PTLA" "OPK" "CHMA" "ACAD" "NLNK" "AZN" "ICPT" "AAAP" "DERM" "OCUL" "MRNS"
[97] "RVNC" "CLVS" "GALE" "LPCN" "TSRO" "AMPE" "CYTR" "RARE" "MCRB" "ADMA" "IONS" "VTVT"
[109] "AUPH" "EARS" "ACRS" "KMDA" "RIGL" "KPTI" "TNXP" "AERI" "NVAX" "VICL" "SRPT" "GILD"
[121] "ITCI" "GNCA" "ABUS" "CEMP" "TENX" "ALNY" "PLXP" "PTN" "INNL" "ANTH" "CRBP" "BSTC"
[133] "REPH" "NOVN" "CERC" "HTBX" "LXRX" "HZNP" "SGYP" "OPHT" "AKAO" "LIFE" "PRTO" "VCEL"
[145] "IRWD" "PBMD" "AMPH" "PFE" "AGRX" "EGLT" "ADHD" "FGEN" "AGN" "GEMP" "OCRX" "CATB"
[157] "DMTX" "AVIR" "JNJ" "TCON" "SAGE" "ZSAN" "AXON" "MRK" "VRX" "ARDX" "XBIT" "CDTX"
[169] "TRVN" "CELG" "CMRX" "ARGS" "LJPC" "NDRM" "PBYI" "SCYX" "PTCT" "GALT" "KURA" "AKCA"
[181] "TGTX" "NVS" "CPRX" "LLY" "GNMX" "BLRX" "XENE" "FOMX" "SNY" "REGN" "RTTR" "CARA"
[193] "NVCR" "BMY" "ONCE" "GERN" "MESO" "OMED" "MTFB" "EIGR" "ACHN" "AKTX" "XOMA" "CAPR"
[205] "RDUS" "NTRP" "BPMX" "TXMD" "BTX" "GSK" "CORI" "FOLD" "BLPH" "SBPH" "NVO" "RETA"
[217] "ECYT" "IMDZ" "MTNB" "ARQL" "LOXO" "ZYME" "RNN" "PIRS" "FPRX" "CALA" "BGNE" "BLUE"
[229] "CLSN" "CRVS" "GLYC" "JUNO" "IOVA" "RGLS" "XLRN" "ALDX" "EPZM" "SELB" "IMUC" "BLCM"
[241] "GBT" "STML" "AGIO" "RARX" "ALDR" "ITEK" "IMRN" "QURE" "SVRA" "KDMN" "CBAY" "BVXV"
[253] "CYTX" "NVIV" "MYOK" "ZYNE" "ESPR" "GLPG" "ABIO" "CVM" "STDY" "CLLS" "INSM" "VSTM"
[265] "VYGR" "VRNA" "UTHR" "ARRY" "BPMC" "IDRA" "INO" "EPIX" "AGEN" "FENC" "MRTX" "INVA"
[277] "NBRV" "VSAR" "IPCI" "PRQR" "AZRX" "PRTA" "BHVN" "MYL" "FLXN" "ANAB" "RXDX"
I want the loop to read this character vector in the order it was produced vs alphabetic.
If I illustrate the loading of data with:
# Note: function stores a list of commands to perform over a directory of files
genCHART = function(x){
  next.symbol <- tickers[i]  # specify to start from first position in vector
  date.list <- dates[i]      # specify to start from first position in vector
  next.file <- fread(paste0("https://www.alphavantage.co/query?function=TIME_SERIES_DAILY&symbol=", next.symbol, "&outputsize=full&apikey=6RSYX9BPXKZVXUS9&datatype=csv"))
  new.df <- data.frame(next.file)
  head(new.df)
}
# Loop calls the function in order to process multiple files
for (i in 1:length(tickers)){
  genCHART(tickers[[i]])
}
# The loop does nothing but process and load all tickers, but it's enough to illustrate the point.
What we see if we print next.symbol:
> next.symbol
[1] "ANTH"
It gives me the first ticker in alphabetical order. So it returns tickers beginning with A first, versus my order above. I want it to loop through my character vector in the order it was produced.
Is there any way to overcome this?
Edit:
IF I take a vector of dates:
> dates
[1] "2009-07-05" "2009-07-16" "2009-07-16" "2009-09-04" "2009-10-09" "2009-11-02"
[7] "2009-11-02" "2009-12-01" "2009-12-18" "2010-01-22" "2010-01-27" "2010-03-15"
[13] "2010-03-15" "2010-03-19" "2010-04-09" "2010-04-30" "2010-10-11" "2010-10-28"
[19] "2011-01-19" "2011-01-28" "2011-02-01" "2011-02-25" "2011-04-29" "2011-06-22"
[25] "2011-06-24" "2011-06-24" "2011-08-19" "2011-10-31" "2011-11-11" "2011-11-11"
[31] "2011-11-16" "2011-11-23" "2011-12-08" "2012-01-05" "2012-01-30" "2012-02-17"
and I want to start from the first in the vector...
date.list <- dates[i]  # specify to start from first position in vector
Shouldn't the above work even though it is wrapped in a function?
How can I get it to work so that I read from the start of my vector? And how does this work when I'm putting my code in a function and then running that function in a loop to process multiple files?
It turned out that print(i) showed i was equal to 3, a value left over in the global environment. Setting
i = 1
was the answer.
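A more robust pattern is to pass the symbol into the function instead of reading the loop index i from the global environment. A sketch (YOUR_KEY is a placeholder, not the original key):
library(data.table)  # for fread()
genCHART <- function(next.symbol) {
  url <- paste0("https://www.alphavantage.co/query?function=TIME_SERIES_DAILY",
                "&symbol=", next.symbol,
                "&outputsize=full&apikey=YOUR_KEY&datatype=csv")
  head(data.frame(fread(url)))
}
for (ticker in tickers) {
  genCHART(ticker)  # each call receives its symbol explicitly
}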

R: Distinguish helper functions from main functions in CRAN packages

How can I distinguish helper functions from main functions in a package containing a lot of functions? Is there a way to separate them by means of some standard identifier implemented in R?
For example, I am using the sets package in my R session as follows. How would I separate the helper functions from the main functions in the list given below? I could do it by reading the documentation (sometimes a function is explicitly documented as a helper function), but is there a standard identifier implemented in R that says whether a function is a helper or a main function?
library(sets)
ls("package:sets")
[1] "%..%" "%<%" "%<=%"
[4] "%>%" "%D%" "%e%"
[7] "%is%" "as.cset" "as.gset"
[10] "as.interval" "as.set" "as.tuple"
[13] "binary_closure" "binary_reduction" "canonicalize_set_and_mapping"
[16] "charfun_generator" "closure" "cset"
[19] "cset_bound" "cset_cardinality" "cset_cartesian"
[22] "cset_charfun" "cset_combn" "cset_complement"
[25] "cset_concentrate" "cset_contains_element" "cset_core"
[28] "cset_defuzzify" "cset_difference" "cset_dilate"
[31] "cset_dissimilarity" "cset_has_missings" "cset_height"
[34] "cset_intersection" "cset_is_crisp" "cset_is_empty"
[37] "cset_is_equal" "cset_is_fuzzy_multiset" "cset_is_fuzzy_set"
[40] "cset_is_multiset" "cset_is_proper_subset" "cset_is_set"
[43] "cset_is_set_or_fuzzy_set" "cset_is_set_or_multiset" "cset_is_subset"
[46] "cset_matchfun" "cset_matchfun<-" "cset_mean"
[49] "cset_memberships" "cset_normalize" "cset_orderfun"
[52] "cset_orderfun<-" "cset_outer" "cset_peak"
[55] "cset_power" "cset_product" "cset_similarity"
[58] "cset_sum" "cset_support" "cset_symdiff"
[61] "cset_transform_memberships" "cset_union" "cset_universe"
[64] "e" "fuzzy_bell" "fuzzy_bell_gset"
[67] "fuzzy_cone" "fuzzy_cone_gset" "fuzzy_inference"
[70] "fuzzy_logic" "fuzzy_normal" "fuzzy_normal_gset"
[73] "fuzzy_partition" "fuzzy_pi3" "fuzzy_pi3_gset"
[76] "fuzzy_pi4" "fuzzy_pi4_gset" "fuzzy_rule"
[79] "fuzzy_sigmoid" "fuzzy_sigmoid_gset" "fuzzy_system"
[82] "fuzzy_trapezoid" "fuzzy_trapezoid_gset" "fuzzy_triangular"
[85] "fuzzy_triangular_gset" "fuzzy_tuple" "fuzzy_two_normals"
[88] "fuzzy_two_normals_gset" "fuzzy_variable" "gset"
[91] "gset_bound" "gset_cardinality" "gset_cartesian"
[94] "gset_charfun" "gset_combn" "gset_complement"
[97] "gset_concentrate" "gset_contains_element" "gset_core"
[100] "gset_defuzzify" "gset_difference" "gset_dilate"
[103] "gset_dissimilarity" "gset_has_missings" "gset_height"
[106] "gset_intersection" "gset_is_crisp" "gset_is_empty"
[109] "gset_is_equal" "gset_is_fuzzy_multiset" "gset_is_fuzzy_set"
[112] "gset_is_multiset" "gset_is_proper_subset" "gset_is_set"
[115] "gset_is_set_or_fuzzy_set" "gset_is_set_or_multiset" "gset_is_subset"
[118] "gset_mean" "gset_memberships" "gset_normalize"
[121] "gset_outer" "gset_peak" "gset_power"
[124] "gset_product" "gset_similarity" "gset_sum"
[127] "gset_support" "gset_symdiff" "gset_transform_memberships"
[130] "gset_union" "gset_universe" "integers"
[133] "integers2reals" "interval" "interval_complement"
[136] "interval_contains_element" "interval_difference" "interval_division"
[139] "interval_domain" "interval_intersection" "interval_is_bounded"
[142] "interval_is_closed" "interval_is_countable" "interval_is_degenerate"
[145] "interval_is_empty" "interval_is_equal" "interval_is_finite"
[148] "interval_is_greater_than" "interval_is_greater_than_or_equal" "interval_is_half_bounded"
[151] "interval_is_left_bounded" "interval_is_left_closed" "interval_is_left_open"
[154] "interval_is_left_unbounded" "interval_is_less_than" "interval_is_less_than_or_equal"
[157] "interval_is_proper" "interval_is_proper_subinterval" "interval_is_right_bounded"
[160] "interval_is_right_closed" "interval_is_right_open" "interval_is_right_unbounded"
[163] "interval_is_subinterval" "interval_is_unbounded" "interval_is_uncountable"
[166] "interval_measure" "interval_power" "interval_product"
[169] "interval_sum" "interval_symdiff" "interval_union"
[172] "is.charfun_generator" "is.cset" "is.gset"
[175] "is.interval" "is.set" "is.tuple"
[178] "is_element" "LABEL" "LABELS"
[181] "make_set_with_order" "matchfun" "naturals"
[184] "naturals0" "pair" "reals"
[187] "reals2integers" "reduction" "set"
[190] "set_cardinality" "set_cartesian" "set_combn"
[193] "set_complement" "set_contains_element" "set_dissimilarity"
[196] "set_intersection" "set_is_empty" "set_is_equal"
[199] "set_is_proper_subset" "set_is_subset" "set_outer"
[202] "set_power" "set_similarity" "set_symdiff"
[205] "set_union" "sets_options" "singleton"
[208] "triple" "tuple" "tuple_is_ntuple"
[211] "tuple_is_pair" "tuple_is_singleton" "tuple_is_triple"
[214] "tuple_outer"
Typically "helper" functions (defined as: functions used by main functions and typically not used by end users directly) are not exported and are not visible unless you use the package_name:::helper_function_name syntax. If a function is visible and documented it typically is a main function. So, if you want to see main functions, just use ls:
ls("package:sets")
If you want all functions (main + helper):
ls(getNamespace("sets"))
And finally, to get just helper functions, use setdiff:
setdiff(ls(getNamespace("sets")), ls("package:sets"))
See this SO Q/A for some discussion.
One potential ambiguity is that sometimes S3 methods are not explicitly exported, even if they are intended to be used as "main" functions.
An R package's NAMESPACE file includes a list of exports, i.e. the functions which are exposed. So there are good definitions of exposed (thus public) and non-exposed (thus internal/private) functions. There is no formal definition of 'main' and 'helper' functions. You may argue that only 'main' functions should be exposed, but I don't think the distinction is that clear.
A lazy developer sometimes includes
exportPattern("^[[:alpha:]]+")
in the NAMESPACE file, thus exposing all functions. In that case, I would suspect that the library's exposed functions are a mix of main and helper functions. However, when I checked sets's NAMESPACE file, it is very well organized. Thus, I believe sets's exposed functions are all deliberately maintained and necessary.
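To check a package's exports yourself, a quick sketch (assuming the package is installed):
getNamespaceExports("sets")                            # the names listed as exports
readLines(system.file("NAMESPACE", package = "sets"))  # the raw NAMESPACE file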
