R: Distinguish helper functions from main functions in CRAN packages

How can I distinguish helper functions from main functions in a package that contains a lot of functions? Is there a standard identifier in R that separates them?
For example, I am using the sets package in my R session as follows. How would I separate the helper functions from the main functions in the list below? I could do it by reading the documentation (sometimes a function is explicitly documented as a helper), but is there a standard identifier in R that marks a function as a helper or a main function?
library(sets)
ls("package:sets")
[1] "%..%" "%<%" "%<=%"
[4] "%>%" "%D%" "%e%"
[7] "%is%" "as.cset" "as.gset"
[10] "as.interval" "as.set" "as.tuple"
[13] "binary_closure" "binary_reduction" "canonicalize_set_and_mapping"
[16] "charfun_generator" "closure" "cset"
[19] "cset_bound" "cset_cardinality" "cset_cartesian"
[22] "cset_charfun" "cset_combn" "cset_complement"
[25] "cset_concentrate" "cset_contains_element" "cset_core"
[28] "cset_defuzzify" "cset_difference" "cset_dilate"
[31] "cset_dissimilarity" "cset_has_missings" "cset_height"
[34] "cset_intersection" "cset_is_crisp" "cset_is_empty"
[37] "cset_is_equal" "cset_is_fuzzy_multiset" "cset_is_fuzzy_set"
[40] "cset_is_multiset" "cset_is_proper_subset" "cset_is_set"
[43] "cset_is_set_or_fuzzy_set" "cset_is_set_or_multiset" "cset_is_subset"
[46] "cset_matchfun" "cset_matchfun<-" "cset_mean"
[49] "cset_memberships" "cset_normalize" "cset_orderfun"
[52] "cset_orderfun<-" "cset_outer" "cset_peak"
[55] "cset_power" "cset_product" "cset_similarity"
[58] "cset_sum" "cset_support" "cset_symdiff"
[61] "cset_transform_memberships" "cset_union" "cset_universe"
[64] "e" "fuzzy_bell" "fuzzy_bell_gset"
[67] "fuzzy_cone" "fuzzy_cone_gset" "fuzzy_inference"
[70] "fuzzy_logic" "fuzzy_normal" "fuzzy_normal_gset"
[73] "fuzzy_partition" "fuzzy_pi3" "fuzzy_pi3_gset"
[76] "fuzzy_pi4" "fuzzy_pi4_gset" "fuzzy_rule"
[79] "fuzzy_sigmoid" "fuzzy_sigmoid_gset" "fuzzy_system"
[82] "fuzzy_trapezoid" "fuzzy_trapezoid_gset" "fuzzy_triangular"
[85] "fuzzy_triangular_gset" "fuzzy_tuple" "fuzzy_two_normals"
[88] "fuzzy_two_normals_gset" "fuzzy_variable" "gset"
[91] "gset_bound" "gset_cardinality" "gset_cartesian"
[94] "gset_charfun" "gset_combn" "gset_complement"
[97] "gset_concentrate" "gset_contains_element" "gset_core"
[100] "gset_defuzzify" "gset_difference" "gset_dilate"
[103] "gset_dissimilarity" "gset_has_missings" "gset_height"
[106] "gset_intersection" "gset_is_crisp" "gset_is_empty"
[109] "gset_is_equal" "gset_is_fuzzy_multiset" "gset_is_fuzzy_set"
[112] "gset_is_multiset" "gset_is_proper_subset" "gset_is_set"
[115] "gset_is_set_or_fuzzy_set" "gset_is_set_or_multiset" "gset_is_subset"
[118] "gset_mean" "gset_memberships" "gset_normalize"
[121] "gset_outer" "gset_peak" "gset_power"
[124] "gset_product" "gset_similarity" "gset_sum"
[127] "gset_support" "gset_symdiff" "gset_transform_memberships"
[130] "gset_union" "gset_universe" "integers"
[133] "integers2reals" "interval" "interval_complement"
[136] "interval_contains_element" "interval_difference" "interval_division"
[139] "interval_domain" "interval_intersection" "interval_is_bounded"
[142] "interval_is_closed" "interval_is_countable" "interval_is_degenerate"
[145] "interval_is_empty" "interval_is_equal" "interval_is_finite"
[148] "interval_is_greater_than" "interval_is_greater_than_or_equal" "interval_is_half_bounded"
[151] "interval_is_left_bounded" "interval_is_left_closed" "interval_is_left_open"
[154] "interval_is_left_unbounded" "interval_is_less_than" "interval_is_less_than_or_equal"
[157] "interval_is_proper" "interval_is_proper_subinterval" "interval_is_right_bounded"
[160] "interval_is_right_closed" "interval_is_right_open" "interval_is_right_unbounded"
[163] "interval_is_subinterval" "interval_is_unbounded" "interval_is_uncountable"
[166] "interval_measure" "interval_power" "interval_product"
[169] "interval_sum" "interval_symdiff" "interval_union"
[172] "is.charfun_generator" "is.cset" "is.gset"
[175] "is.interval" "is.set" "is.tuple"
[178] "is_element" "LABEL" "LABELS"
[181] "make_set_with_order" "matchfun" "naturals"
[184] "naturals0" "pair" "reals"
[187] "reals2integers" "reduction" "set"
[190] "set_cardinality" "set_cartesian" "set_combn"
[193] "set_complement" "set_contains_element" "set_dissimilarity"
[196] "set_intersection" "set_is_empty" "set_is_equal"
[199] "set_is_proper_subset" "set_is_subset" "set_outer"
[202] "set_power" "set_similarity" "set_symdiff"
[205] "set_union" "sets_options" "singleton"
[208] "triple" "tuple" "tuple_is_ntuple"
[211] "tuple_is_pair" "tuple_is_singleton" "tuple_is_triple"
[214] "tuple_outer"

Typically, "helper" functions (functions used by the main functions and not usually called directly by end users) are not exported and are not visible unless you use the package_name:::helper_function_name syntax. A function that is visible and documented is typically a main function. So, to see the main functions, just use ls:
ls("package:sets")
If you want all functions (main + helper):
ls(getNamespace("sets"))
And finally, to get just helper functions, use setdiff:
setdiff(ls(getNamespace("sets")), ls("package:sets"))
See this SO Q/A for some discussion.
One potential ambiguity is that sometimes S3 methods are not explicitly exported, even if they are intended to be used as "main" functions.
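The same split can also be computed without attaching the package, using getNamespaceExports(). This is a minimal sketch that uses the stats package as a stand-in, since it ships with every R installation (swap in "sets" if it is installed); the variable names are illustrative only.

```r
pkg <- "stats"  # stand-in package; replace with "sets" if it is installed

ns <- getNamespace(pkg)

# Everything defined in the namespace, exported or not
all_objects <- ls(ns)

# Names the package exposes to users through its NAMESPACE file
exported <- getNamespaceExports(pkg)

# What is left over is internal ("helper") by the export criterion
internal <- setdiff(all_objects, exported)

head(sort(exported))
head(sort(internal))
```

Registered S3 methods may show up in internal even though they are reachable through their generic, which is the ambiguity mentioned above.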

An R package's NAMESPACE file should list its exports, that is, the functions that are exposed. So there are clear definitions of exposed (public) and non-exposed (internal/private) functions, but there is no formal definition of 'main' and 'helper' functions. You could argue that only 'main' functions should be exposed, but I don't think that is clear-cut.
A lazy developer sometimes includes
exportPattern("^[[:alpha:]]+")
in the NAMESPACE file, thus exposing every function whose name starts with a letter. In that case, I would suspect that the package's exposed functions are a mix of main and helper functions. However, when I checked the NAMESPACE file of sets, it was very well organized. Thus, I believe the exposed functions of sets are well maintained and necessary.
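One way to check whether an installed package relies on exportPattern() is to parse its NAMESPACE file with parseNamespaceFile() from base R. The sketch below again uses stats as a stand-in package, and pkg, lib_dir and ns_info are illustrative names, not part of the original answer.

```r
pkg <- "stats"  # stand-in; replace with the package you want to inspect

# parseNamespaceFile() needs the library directory that contains the package
lib_dir <- dirname(system.file(package = pkg))
ns_info <- parseNamespaceFile(pkg, lib_dir)

# Regular expressions passed to exportPattern(); empty if none were used
ns_info$exportPatterns

# Names listed explicitly via export()
head(ns_info$exports)
```

A non-empty exportPatterns element suggests the export list was generated by pattern rather than curated by hand.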

Related

How do I use the index value within the function of a for loop in R?

Multiplier <- numeric(200)
for (i in 1:200) Multiplier[i] <- 1/sqrt(i(i+1))
I want something like a math function f(n) = 1/sqrt(n(n+1)), with the first 200 values stored in an array. But when I run the above code I get:
# Error in i(i + 1) : could not find function "i"
and when I try to use [i] I get:
# Error: unexpected '[' in "for (i in 1:200) Multiplier[i] <- 1/sqrt([
Change i(i+1) to i*(i+1). When we write i(), R treats i as a function and i+1 as the argument to that function.
for (i in 1:200) Multiplier[i] <- 1/sqrt(i*(i+1))
-output
> Multiplier
[1] 0.707106781 0.408248290 0.288675135 0.223606798 0.182574186 0.154303350 0.133630621 0.117851130 0.105409255 0.095346259
[11] 0.087038828 0.080064077 0.074124932 0.069006556 0.064549722 0.060633906 0.057166195 0.054073807 0.051298918 0.048795004
[21] 0.046524211 0.044455422 0.042562827 0.040824829 0.039223227 0.037742568 0.036369648 0.035093120 0.033903175 0.032791292
[31] 0.031750032 0.030772873 0.029854072 0.028988552 0.028171808 0.027399831 0.026669037 0.025976217 0.025318484 0.024693240
[41] 0.024098135 0.023531040 0.022990024 0.022473329 0.021979349 0.021506620 0.021053798 0.020619652 0.020203051 0.019802951
[51] 0.019418391 0.019048483 0.018692405 0.018349396 0.018018749 0.017699808 0.017391962 0.017094641 0.016807316 0.016529490
[61] 0.016260700 0.016000512 0.015748520 0.015504342 0.015267620 0.015038019 0.014815221 0.014598929 0.014388862 0.014184754
[71] 0.013986356 0.013793431 0.013605757 0.013423121 0.013245324 0.013072175 0.012903494 0.012739112 0.012578865 0.012422600
[81] 0.012270170 0.012121435 0.011976263 0.011834527 0.011696106 0.011560887 0.011428758 0.011299615 0.011173359 0.011049892
[91] 0.010929125 0.010810969 0.010695340 0.010582159 0.010471348 0.010362833 0.010256545 0.010152415 0.010050378 0.009950372
[101] 0.009852336 0.009756214 0.009661948 0.009569488 0.009478779 0.009389775 0.009302426 0.009216688 0.009132515 0.009049866
[111] 0.008968700 0.008888977 0.008810658 0.008733708 0.008658090 0.008583770 0.008510715 0.008438894 0.008368274 0.008298827
[121] 0.008230522 0.008163333 0.008097232 0.008032193 0.007968191 0.007905200 0.007843198 0.007782160 0.007722065 0.007662891
[131] 0.007604618 0.007547224 0.007490689 0.007434996 0.007380124 0.007326056 0.007272775 0.007220264 0.007168505 0.007117483
[141] 0.007067182 0.007017587 0.006968683 0.006920457 0.006872893 0.006825978 0.006779700 0.006734045 0.006689001 0.006644555
[151] 0.006600696 0.006557412 0.006514693 0.006472526 0.006430901 0.006389809 0.006349238 0.006309180 0.006269623 0.006230560
[161] 0.006191980 0.006153875 0.006116237 0.006079055 0.006042324 0.006006033 0.005970176 0.005934744 0.005899731 0.005865128
[171] 0.005830929 0.005797126 0.005763713 0.005730683 0.005698029 0.005665745 0.005633825 0.005602263 0.005571052 0.005540187
[181] 0.005509663 0.005479473 0.005449612 0.005420074 0.005390855 0.005361950 0.005333352 0.005305058 0.005277063 0.005249362
[191] 0.005221950 0.005194823 0.005167976 0.005141405 0.005115106 0.005089075 0.005063307 0.005037799 0.005012547 0.004987547
According to ?Paren
Open parenthesis, (, and open brace, {, are .Primitive functions in R.
Effectively, ( is semantically equivalent to the identity function(x) x
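Because arithmetic in R is vectorized, the loop is not strictly needed here. The following sketch compares the loop from the answer with a vectorized form; n and Multiplier_vec are names introduced only for this illustration.

```r
# Loop version from the answer above
Multiplier <- numeric(200)
for (i in 1:200) Multiplier[i] <- 1 / sqrt(i * (i + 1))

# Vectorized version: arithmetic operators work element-wise on vectors
n <- 1:200
Multiplier_vec <- 1 / sqrt(n * (n + 1))

# Both approaches should give the same 200 values
all.equal(Multiplier, Multiplier_vec)
```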

How to remove the prefix of each sample

I am stuck on removing the prefix from each sample name. I tried removing all the numbers from the names, but that is not a good basis for grouping. I would like to keep only the last two components of each sample name (for example: AAP-L). The details are listed below. Thank you in advance!
geo$pd$title
[1] "AAB-HT002-AAP-L" "AAB-HT003-AAP-L" "AAB-HT006-AAP-L" "AAB-HT002-AAP-NL"
[5] "AAB-HT003-AAP-NL" "AAB-HT006-AAP-NL" "AAB-C007-AU-L" "AAB-HT001-AT-L"
[9] "AAB-N-C021-Normal-NC" "AAB-N-C022-Normal-NC" "AAB-C024-Normal-NC" "AAB-N-C025-Normal-NC"
[13] "AAB-HT010-AAP.T-L" "AAB-HT011-AAP-L" "AAB-HT012-AAP-L" "AAB-HT010-AAP.T-NL"
[17] "AAB-HT011-AAP-NL" "AAB-HT012-AAP-NL" "AAB-C013-AU-L" "AAB-C033-AU-L"
[21] "AAB-C037-AT-L" "AAB-C043-AU-L" "AAB-HT041-AU-L" "AAB-N-C026-Normal-NC"
[25] "AAB-N-C027-Normal-NC" "AAB-N-C028-Normal-NC" "AAB-N-C029-Normal-NC" "AAB-C014-AAP-L"
[29] "AAB-HT017-AAP.T-L" "AAB-HT018-AAP-L" "AAB-C014-AAP-NL" "AAB-HT017-AAP.T-NL"
[33] "AAB-HT018-AAP-NL" "AAB-C047-AT-L" "AAB-M044-AU-L" "AAB-N-C030-Normal-NC"
[37] "AAB-N-C032-Normal-NC" "AAB-N-C034-Normal-NC" "AAB-N-C035-Normal-NC" "AAB-C020-AAP.T-L"
[41] "AAB-C038-AAP-L" "AABM046-AAP-L" "AAB-C020-AAP.T-NL" "AABM046-AAP-NL"
[45] "AAB-C048-AT-L" "AAB-HT050-AT-L" "AAB-M-060-AU-L" "AAB-M-061-AU-L"
[49] "AAB-N-C036-Normal-NC" "AAB-N-C039-Normal-NC" "AAB-N-C042-Normal-NC" "AAB-N-C045-Normal-NC"
[53] "AAB-C052-AAP-L" "AAB-C076-AAP-L" "AAB-M056-AAP-L" "AAB-M058-AAP-L"
[57] "AAB-C052-AAP-NL" "AAB-C076-AAP-NL" "AAB-M056-AAP-NL" "AAB-M058-AAP-NL"
[61] "AAB-HT077-AU-L" "AAB-HT082-AU-L" "AAB-M080-AU-L" "AAB-N-C054-Normal-NC"
[65] "AAB-N-C055-Normal-NC" "AAB-N-C059-Normal-NC" "AAB-N-C062-Normal-NC" "AAB-C083-AAP-L"
[69] "AAB-HT009-AAP-L" "AAB-HT079-AAP-L" "AAB-SF086-AAP-L" "AAB-C083-AAP-NL"
[73] "AAB-HT079-AAP-NL" "AAB-SF086-AAP-NL" "AAB-C016-AU-L" "AAB-HT008-AU-L"
[77] "AAB-HT091-AT-L" "AAB-SF087-AU-L" "AAB-N-C063-Normal-NC" "AAB-N-C064-Normal-NC"
[81] "AAB-N-C065-Normal-NC" "AAB-HT103-AAP-L" "AAB-SF078-AAP.T-L" "AAB-SF099-AAP-L"
[85] "AAB-HT103-AAP-NL" "AAB-SF078-AAP.T-NL" "AAB-SF099-AAP-NL" "AAB-HT096-AT-L"
[89] "AAB-M094-AU-L" "AAB-SF089-AU-L" "AAB-SF090-AU-L" "AAB-SF100-AU-L"
[93] "AAB-N-C069-Normal-NC" "AAB-N-C070-Normal-NC" "AAB-N-C071-Normal-NC" "AAB-N-C072-Normal-NC"
[97] "AAB-N-C074-Normal-NC" "AAB-N-C075-Normal-NC" "AAB-N-C085-Normal-NC" "AAB-C092-Normal-NC"
[101] "AAB-M112-AAP-L" "AAB-SF104-AAP-L" "AAB-SF114-AAP-L" "AAB-SF115-AAP.T-L"
[105] "AAB-M112-AAP-NL" "AAB-SF104-AAP-NL" "AAB-SF114-AAP-NL" "AAB-SF115-AAP.T-NL"
[109] "AAB-C109-AU-L" "AAB-C111-AU-L" "AAB-HT101-AU-L" "AAB-M110-AT-L"
[113] "AAB-SF106-AU-L" "AAB-SF113-AU-L" "AAB-N-C098-Normal-NC" "AAB-N-C105-Normal-NC"
[117] "AAB-N-C107-Normal-NC" "AAB-N-C108-Normal-NC" "AAB-HT095-AAP.T-L" "AAB-HT095-AAP.T-NL"
[121] "AAB-HT097-AT-L" "AAB-C093-Normal-NC"
Try this:
library(stringr)
# test data:
string <- c("AAB-HT002-AAP-L", "AAB-HT017-AAP.T-L", "AAB-HT003-AAP-L", "AAB-HT006-AAP-L", "AAB-HT002-AAP-NL")
str_split_fixed(string, '-', n=3)[, 3]
# output:
[1] "AAP-L" "AAP.T-L" "AAP-L" "AAP-L" "AAP-NL"
This returns the final two dash-separated components, each made up of letters and periods:
titles <-c("AAB-HT002-AAP-L", "AAB-HT003-AA.P-L", "AAB-HT006-AAP-L", "AAB-HT002-AA.P-NL")
sub( "(.+)([-])([[:alpha:].]+[-][[:alpha:].]+$)", "\\3", titles)
[1] "AAP-L" "AA.P-L" "AAP-L" "AA.P-NL"
We could use
library(stringr)
str_remove(string, ".*\\d+-")
[1] "AAP-L" "AAP.T-L" "AAP-L" "AAP-L" "AAP-NL"
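The titles in the question do not all have the same number of dashes (for example "AAB-N-C021-Normal-NC" and "AABM046-AAP-L"), so an approach that counts fields from the left may not carry over to the full data. A base R sketch that anchors on the end of the string instead is shown below; the titles vector is a small hand-picked sample from the question and suffix is an illustrative name.

```r
titles <- c("AAB-HT002-AAP-L", "AAB-N-C021-Normal-NC", "AABM046-AAP-L",
            "AAB-HT017-AAP.T-NL", "AAB-M-060-AU-L")

# Keep only the last two dash-separated fields:
#   .*-             greedy prefix, up to the dash before the final two fields
#   ([^-]+-[^-]+)$  two dash-free fields joined by one dash, at the end
suffix <- sub(".*-([^-]+-[^-]+)$", "\\1", titles)
suffix
# c("AAP-L", "Normal-NC", "AAP-L", "AAP.T-NL", "AU-L")
```

Strings with fewer than three fields do not match the pattern and are returned unchanged.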

Get a list of the all the names of the objects in the datasets R package?

How can I get a list of the exact names of the objects in the datasets package?
I found many of them here:
data_package = data(package="datasets")
datasets <- as.data.frame(data_package[[3]])$Item
datasets
# [1] "AirPassengers" "BJsales" "BJsales.lead (BJsales)" "BOD" "CO2" "ChickWeight"
# [7] "DNase" "EuStockMarkets" "Formaldehyde" "HairEyeColor" "Harman23.cor" "Harman74.cor"
# [13] "Indometh" "InsectSprays" "JohnsonJohnson" "LakeHuron" "LifeCycleSavings" "Loblolly"
# [19] "Nile" "Orange" "OrchardSprays" "PlantGrowth" "Puromycin" "Seatbelts"
# [25] "Theoph" "Titanic" "ToothGrowth" "UCBAdmissions" "UKDriverDeaths" "UKgas"
# [31] "USAccDeaths" "USArrests" "USJudgeRatings" "USPersonalExpenditure" "UScitiesD" "VADeaths"
# [37] "WWWusage" "WorldPhones" "ability.cov" "airmiles" "airquality" "anscombe"
# [43] "attenu" "attitude" "austres" "beaver1 (beavers)" "beaver2 (beavers)" "cars"
# [49] "chickwts" "co2" "crimtab" "discoveries" "esoph" "euro"
# [55] "euro.cross (euro)" "eurodist" "faithful" "fdeaths (UKLungDeaths)" "freeny" "freeny.x (freeny)"
# [61] "freeny.y (freeny)" "infert" "iris" "iris3" "islands" "ldeaths (UKLungDeaths)"
# [67] "lh" "longley" "lynx" "mdeaths (UKLungDeaths)" "morley" "mtcars"
# [73] "nhtemp" "nottem" "npk" "occupationalStatus" "precip" "presidents"
# [79] "pressure" "quakes" "randu" "rivers" "rock" "sleep"
# [85] "stack.loss (stackloss)" "stack.x (stackloss)" "stackloss" "state.abb (state)" "state.area (state)" "state.center (state)"
# [91] "state.division (state)" "state.name (state)" "state.region (state)" "state.x77 (state)" "sunspot.month" "sunspot.year"
# [97] "sunspots" "swiss" "treering" "trees" "uspop" "volcano"
# [103] "warpbreaks" "women"
So something like this would iterate through each one
for(i in 1:length(datasets)) {
  print(get(datasets[i]))
  cat("\n\n")
}
It works for the first two datasets (AirPassengers and BJsales), but it fails on "BJsales.lead (BJsales)", since that object should be referred to as datasets::BJsales.lead.
I guess I could use a string split or similar to discard everything from the space onwards, but I wonder whether there is a neater way of obtaining a list of all the objects in the datasets package.
Notes
In addition to the above, I also tried listing everything in the datasets namespace but it gave a weird result:
ls(getNamespace("datasets"), all.names=TRUE)
# [1] ".__NAMESPACE__." ".__S3MethodsTable__." ".packageName"
There is a note on the ?data help page that states
Where the datasets have a different name from the argument that should be used to retrieve them the index will have an entry like beaver1 (beavers) which tells us that dataset beaver1 can be retrieved by the call data(beavers).
So the actual object name is the part before the parentheses at the end. Since that value is returned as a plain string, you unfortunately need to remove the parenthesized part yourself. You can do that with gsub:
datanames <- data(package="datasets")$results[,"Item"]
objnames <- gsub("\\s+\\(.*\\)","", datanames)
for(ds in objnames) {
  print(get(ds))
  cat("\n\n")
}
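As a possible alternative to cleaning the index strings, the object names can be read from the attached package environment with ls("package:datasets"). The namespace listing in the question looks empty because, as far as I understand, the data sets are lazy-loaded rather than stored as ordinary namespace objects. A minimal sketch, with objnames as an illustrative name:

```r
library(datasets)  # attached by default in a standard R session

# Object names as they exist in the attached package environment
objnames <- ls("package:datasets")

head(objnames)
"BJsales.lead" %in% objnames   # listed under its own name, no "(BJsales)" suffix
"beaver1" %in% objnames

# Look at the class of the first few objects instead of printing them in full
pkg_env <- as.environment("package:datasets")
for (ds in head(objnames, 3)) {
  cat(ds, ":", class(get(ds, envir = pkg_env)), "\n")
}
```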

easy solution needed to subset spectra files in list.files

I have a folder full of spectra files. The number of files can vary with different measurements as well as the repetitions.
This is what I have so far, and it works:
files <- list.files(pattern = "^Q\\d+")
print(files)
and print(files) gives:
[1] "Q010101N.001" "Q010101N.002" "Q010101N.003" "Q010101N.004" "Q010101N.005" "Q010101N.006"
[7] "Q010101N.007" "Q010101N.008" "Q010101N.009" "Q010101N.010" "Q010101N.011" "Q010101N.012"
[13] "Q010101N.013" "Q010101N.014" "Q010101N.015" "Q010101N.016" "Q010101N.017" "Q010101N.018"
[19] "Q010101N.019" "Q010101N.020" "Q010101N.021" "Q010101N.022" "Q010101N.023" "Q010101N.024"
[25] "Q010101N.025" "Q021101N.001" "Q021101N.002" "Q021101N.003" "Q021101N.004" "Q021101N.005"
[31] "Q021101N.006" "Q021101N.007" "Q021101N.008" "Q021101N.009" "Q021101N.010" "Q021101N.011"
[37] "Q021101N.012" "Q021101N.013" "Q021101N.014" "Q021101N.015" "Q021101N.016" "Q021101N.017"
[43] "Q021101N.018" "Q021101N.019" "Q021101N.020" "Q021101N.021" "Q021101N.022" "Q021101N.023"
[49] "Q021101N.024" "Q021101N.025" "Q031201N.001" "Q031201N.002" "Q031201N.003" "Q031201N.004"
[55] "Q031201N.005" "Q031201N.006" "Q031201N.007" "Q031201N.008" "Q031201N.009" "Q031201N.010"
[61] "Q031201N.011" "Q031201N.012" "Q031201N.013" "Q031201N.014" "Q031201N.015" "Q031201N.016"
[67] "Q031201N.017" "Q031201N.018" "Q031201N.019" "Q031201N.020" "Q031201N.021" "Q031201N.022"
[73] "Q031201N.023" "Q031201N.024" "Q031201N.025" "Q041301N.001" "Q041301N.002" "Q041301N.003"
[79] "Q041301N.004" "Q041301N.005" "Q041301N.006" "Q041301N.007" "Q041301N.008" "Q041301N.009"
[85] "Q041301N.010" "Q041301N.011" "Q041301N.012" "Q041301N.013" "Q041301N.014" "Q041301N.015"
[91] "Q041301N.016" "Q041301N.017" "Q041301N.018" "Q041301N.019" "Q041301N.020" "Q041301N.021"
[97] "Q041301N.022" "Q041301N.023" "Q041301N.024" "Q041301N.025" "Q051401N.001" "Q051401N.002"
[103] "Q051401N.003" "Q051401N.004" "Q051401N.005" "Q051401N.006" "Q051401N.007" "Q051401N.008"
[109] "Q051401N.009" "Q051401N.010" "Q051401N.011" "Q051401N.012" "Q051401N.013" "Q051401N.014"
[115] "Q051401N.015" "Q051401N.016" "Q051401N.017" "Q051401N.018" "Q051401N.019" "Q051401N.020"
[121] "Q051401N.021" "Q051401N.022" "Q051401N.023" "Q051401N.024" "Q051401N.025" "Q061501N.001"
[127] "Q061501N.002" "Q061501N.003" "Q061501N.004" "Q061501N.005" "Q061501N.006" "Q061501N.007"
[133] "Q061501N.008" "Q061501N.009" "Q061501N.010" "Q061501N.011" "Q061501N.012" "Q061501N.013"
[139] "Q061501N.014" "Q061501N.015" "Q061501N.016" "Q061501N.017" "Q061501N.018" "Q061501N.019"
[145] "Q061501N.020" "Q061501N.021" "Q061501N.022" "Q061501N.023" "Q061501N.024" "Q061501N.025"
[151] "Q071601N.001" "Q071601N.002" "Q071601N.003" "Q071601N.004" "Q071601N.005" "Q071601N.006"
[157] "Q071601N.007" "Q071601N.008" "Q071601N.009" "Q071601N.010" "Q071601N.011" "Q071601N.012"
[163] "Q071601N.013" "Q071601N.014" "Q071601N.015" "Q071601N.016" "Q071601N.017" "Q071601N.018"
[169] "Q071601N.019" "Q071601N.020" "Q071601N.021" "Q071601N.022" "Q071601N.023" "Q071601N.024"
[175] "Q071601N.025" "Q081701N.001" "Q081701N.002" "Q081701N.003" "Q081701N.004" "Q081701N.005"
[181] "Q081701N.006" "Q081701N.007" "Q081701N.008" "Q081701N.009" "Q081701N.010" "Q081701N.011"
[187] "Q081701N.012" "Q081701N.013" "Q081701N.014" "Q081701N.015" "Q081701N.016" "Q081701N.017"
[193] "Q081701N.018" "Q081701N.019" "Q081701N.020" "Q081701N.021" "Q081701N.022" "Q081701N.023"
[199] "Q081701N.024" "Q081701N.025" "Q091801N.001" "Q091801N.002" "Q091801N.003" "Q091801N.004"
[205] "Q091801N.005" "Q091801N.006" "Q091801N.007" "Q091801N.008" "Q091801N.009" "Q091801N.010"
[211] "Q091801N.011" "Q091801N.012" "Q091801N.013" "Q091801N.014" "Q091801N.015" "Q091801N.016"
[217] "Q091801N.017" "Q091801N.018" "Q091801N.019" "Q091801N.020" "Q091801N.021" "Q091801N.022"
[223] "Q091801N.023" "Q091801N.024" "Q091801N.025" "Q101901N.001" "Q101901N.002" "Q101901N.003"
[229] "Q101901N.004" "Q101901N.005" "Q101901N.006" "Q101901N.007" "Q101901N.008" "Q101901N.009"
[235] "Q101901N.010" "Q101901N.011" "Q101901N.012" "Q101901N.013" "Q101901N.014" "Q101901N.015"
[241] "Q101901N.016" "Q101901N.017" "Q101901N.018" "Q101901N.019" "Q101901N.020" "Q101901N.021"
[247] "Q101901N.022" "Q101901N.023" "Q101901N.024" "Q101901N.025" "Q112001N.001" "Q112001N.002"
[253] "Q112001N.003" "Q112001N.004" "Q112001N.005" "Q112001N.006" "Q112001N.007" "Q112001N.008"
[259] "Q112001N.009" "Q112001N.010" "Q112001N.011" "Q112001N.012" "Q112001N.013" "Q112001N.014"
[265] "Q112001N.015" "Q112001N.016" "Q112001N.017" "Q112001N.018" "Q112001N.019" "Q112001N.020"
[271] "Q112001N.021" "Q112001N.022" "Q112001N.023" "Q112001N.024" "Q112001N.025" "Q124101N.001"
[277] "Q124101N.002" "Q124101N.003" "Q124101N.004" "Q124101N.005" "Q124101N.006" "Q124101N.007"
[283] "Q124101N.008" "Q124101N.009" "Q124101N.010" "Q124101N.011" "Q124101N.012" "Q124101N.013"
[289] "Q124101N.014" "Q124101N.015" "Q124101N.016" "Q124101N.017" "Q124101N.018" "Q124101N.019"
[295] "Q124101N.020" "Q124101N.021" "Q124101N.022" "Q124101N.023" "Q124101N.024" "Q124101N.025"
[301] "Q134201N.001" "Q134201N.002" "Q134201N.003" "Q134201N.004" "Q134201N.005" "Q134201N.006"
[307] "Q134201N.007" "Q134201N.008" "Q134201N.009" "Q134201N.010" "Q134201N.011" "Q134201N.012"
[313] "Q134201N.013" "Q134201N.014" "Q134201N.015" "Q134201N.016" "Q134201N.017" "Q134201N.018"
[319] "Q134201N.019" "Q134201N.020" "Q134201N.021" "Q134201N.022" "Q134201N.023" "Q134201N.024"
[325] "Q134201N.025" "Q144301N.001" "Q144301N.002" "Q144301N.003" "Q144301N.004" "Q144301N.005"
[331] "Q144301N.006" "Q144301N.007" "Q144301N.008" "Q144301N.009" "Q144301N.010" "Q144301N.011"
[337] "Q144301N.012" "Q144301N.013" "Q144301N.014" "Q144301N.015" "Q144301N.016" "Q144301N.017"
[343] "Q144301N.018" "Q144301N.019" "Q144301N.020" "Q144301N.021" "Q144301N.022" "Q144301N.023"
[349] "Q144301N.024" "Q144301N.025" "Q154401N.001" "Q154401N.002" "Q154401N.003" "Q154401N.004"
[355] "Q154401N.005" "Q154401N.006" "Q154401N.007" "Q154401N.008" "Q154401N.009" "Q154401N.010"
[361] "Q154401N.011" "Q154401N.012" "Q154401N.013" "Q154401N.014" "Q154401N.015" "Q154401N.016"
[367] "Q154401N.017" "Q154401N.018" "Q154401N.019" "Q154401N.020" "Q154401N.021" "Q154401N.022"
[373] "Q154401N.023" "Q154401N.024" "Q154401N.025" "Q164501N.001" "Q164501N.002" "Q164501N.003"
[379] "Q164501N.004" "Q164501N.005" "Q164501N.006" "Q164501N.007" "Q164501N.008" "Q164501N.009"
[385] "Q164501N.010" "Q164501N.011" "Q164501N.012" "Q164501N.013" "Q164501N.014" "Q164501N.015"
[391] "Q164501N.016" "Q164501N.017" "Q164501N.018" "Q164501N.019" "Q164501N.020" "Q164501N.021"
[397] "Q164501N.022" "Q164501N.023" "Q164501N.024" "Q164501N.025" "Q174601N.001" "Q174601N.002"
[403] "Q174601N.003" "Q174601N.004" "Q174601N.005" "Q174601N.006" "Q174601N.007" "Q174601N.008"
[409] "Q174601N.009" "Q174601N.010" "Q174601N.011" "Q174601N.012" "Q174601N.013" "Q174601N.014"
[415] "Q174601N.015" "Q174601N.016" "Q174601N.017" "Q174601N.018" "Q174601N.019" "Q174601N.020"
[421] "Q174601N.021" "Q174601N.022" "Q174601N.023" "Q174601N.024" "Q174601N.025"
So in this case I get 425 spectra files, with 25 repetitions of each sample. However, the total number of files could be different another time, and it could also be that one sample has 10 repetitions and the rest have 14, for example.
So I would like to group each sample with its repetitions into one subset. In this case I would get 17 subsets.
Then I need to import the files, which I have done successfully before with all spectra files:
list.data <- list()
#import all spectra files
for (i in 1:length(files))
list.data[[i]] <- read.csv(files[i])
Given that I now have subsets, would the import need to be slightly different?
You can do this via a helper function and iteration. I used dplyr, purrr and stringi. This will put all your files into a single dataframe. After that you can manipulate it as you see fit.
library(dplyr)
library(purrr)
library(stringi)
read_spectra <- function(file){
  file_name <- basename(file)
  read.csv(file) %>%
    mutate(sample = stri_extract_first_regex(file_name, "([A-Z][0-9]+)(?=.)"),
           repetition = stri_extract_first_regex(file_name, "(?<=\\.)(\\d+)")) %>%
    select(sample, repetition, everything())
}
full_data <- map_df(files, read_spectra)
The helper function:
Takes a file from list.files.
Reads the csv.
Uses mutate to make two new columns using regex to extract the sample number and repetition.
Orders the columns into sample, repetition and everything else.
The iteration is using map_df() from purrr to iterate read_spectra over each file in files and bind all of this together into a dataframe.
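If separate subsets per sample are preferred over one combined data frame, base R's split() can group the file names by sample before reading. The sketch below writes a few hypothetical demo files to a temporary directory so that it runs anywhere; the column names and the names demo_dir, sample_id, file_groups and spectra are assumptions made for illustration.

```r
# Hypothetical demo files in a temporary directory (stand-ins for real spectra)
demo_dir <- file.path(tempdir(), "spectra_demo")
dir.create(demo_dir, showWarnings = FALSE)
demo_names <- c("Q010101N.001", "Q010101N.002", "Q021101N.001",
                "Q021101N.002", "Q021101N.003")
for (f in demo_names) {
  write.csv(data.frame(wavelength = 1:3, intensity = c(0.1, 0.2, 0.3)),
            file.path(demo_dir, f), row.names = FALSE)
}

files <- list.files(demo_dir, pattern = "^Q\\d+", full.names = TRUE)

# The sample ID is the file name without the trailing ".<repetition>" part
sample_id <- sub("\\.[0-9]+$", "", basename(files))

# One character vector of file paths per sample, however many repetitions exist
file_groups <- split(files, sample_id)

# Read every file, keeping the grouping: a list (per sample) of data frames
spectra <- lapply(file_groups, function(paths) lapply(paths, read.csv))

lengths(file_groups)
```

Because split() groups by whatever IDs are present, the number of samples and the number of repetitions per sample do not need to be known in advance.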

Loop read character vector in order it was produced vs alphabetical

OK, so I pulled a list of tickers from a data frame that was sorted by date. Thus my symbols character vector is in that sorted order:
> tickers
[1] "SPPI" "ZGNX" "ARDM" "PRTK" "GTXI" "HEB" "FCSC" "ACOR" "ALKS" "MNKD" "HRTX" "CTIC"
[13] "ARLZ" "JAZZ" "VVUS" "DEPO" "OREX" "PLX" "PTIE" "DRRX" "SGEN" "PCRX" "PSDV" "ALIM"
[25] "INCY" "ATRS" "INSY" "CRIS" "CORT" "EBS" "RGEN" "ARNA" "AMRN" "HALO" "NAVB" "SUPN"
[37] "EXEL" "IPXL" "IMGN" "DVAX" "SCMP" "TTNP" "ENDP" "AVDL" "AVEO" "TBPH" "DCTH" "ABBV"
[49] "AMAG" "VNDA" "BMRN" "MDCO" "OMER" "BDSI" "EGRX" "ACRX" "KERX" "NKTR" "PGNX" "AEZS"
[61] "ENTA" "BCRX" "ADMS" "VRTX" "NBIX" "RMTI" "ADMP" "AMGN" "MNTA" "PTX" "EBIO" "NYMX"
[73] "VTL" "TTPH" "MACK" "LPTX" "GWPH" "SPHS" "RPRX" "OTIC" "NEOT" "CHRS" "ZFGN" "NEOS"
[85] "RDHL" "PTLA" "OPK" "CHMA" "ACAD" "NLNK" "AZN" "ICPT" "AAAP" "DERM" "OCUL" "MRNS"
[97] "RVNC" "CLVS" "GALE" "LPCN" "TSRO" "AMPE" "CYTR" "RARE" "MCRB" "ADMA" "IONS" "VTVT"
[109] "AUPH" "EARS" "ACRS" "KMDA" "RIGL" "KPTI" "TNXP" "AERI" "NVAX" "VICL" "SRPT" "GILD"
[121] "ITCI" "GNCA" "ABUS" "CEMP" "TENX" "ALNY" "PLXP" "PTN" "INNL" "ANTH" "CRBP" "BSTC"
[133] "REPH" "NOVN" "CERC" "HTBX" "LXRX" "HZNP" "SGYP" "OPHT" "AKAO" "LIFE" "PRTO" "VCEL"
[145] "IRWD" "PBMD" "AMPH" "PFE" "AGRX" "EGLT" "ADHD" "FGEN" "AGN" "GEMP" "OCRX" "CATB"
[157] "DMTX" "AVIR" "JNJ" "TCON" "SAGE" "ZSAN" "AXON" "MRK" "VRX" "ARDX" "XBIT" "CDTX"
[169] "TRVN" "CELG" "CMRX" "ARGS" "LJPC" "NDRM" "PBYI" "SCYX" "PTCT" "GALT" "KURA" "AKCA"
[181] "TGTX" "NVS" "CPRX" "LLY" "GNMX" "BLRX" "XENE" "FOMX" "SNY" "REGN" "RTTR" "CARA"
[193] "NVCR" "BMY" "ONCE" "GERN" "MESO" "OMED" "MTFB" "EIGR" "ACHN" "AKTX" "XOMA" "CAPR"
[205] "RDUS" "NTRP" "BPMX" "TXMD" "BTX" "GSK" "CORI" "FOLD" "BLPH" "SBPH" "NVO" "RETA"
[217] "ECYT" "IMDZ" "MTNB" "ARQL" "LOXO" "ZYME" "RNN" "PIRS" "FPRX" "CALA" "BGNE" "BLUE"
[229] "CLSN" "CRVS" "GLYC" "JUNO" "IOVA" "RGLS" "XLRN" "ALDX" "EPZM" "SELB" "IMUC" "BLCM"
[241] "GBT" "STML" "AGIO" "RARX" "ALDR" "ITEK" "IMRN" "QURE" "SVRA" "KDMN" "CBAY" "BVXV"
[253] "CYTX" "NVIV" "MYOK" "ZYNE" "ESPR" "GLPG" "ABIO" "CVM" "STDY" "CLLS" "INSM" "VSTM"
[265] "VYGR" "VRNA" "UTHR" "ARRY" "BPMC" "IDRA" "INO" "EPIX" "AGEN" "FENC" "MRTX" "INVA"
[277] "NBRV" "VSAR" "IPCI" "PRQR" "AZRX" "PRTA" "BHVN" "MYL" "FLXN" "ANAB" "RXDX"
I want the loop to read this character vector in the order it was produced rather than alphabetically.
If I illustrate the loading of data with:
# Note: the function stores the list of commands to perform over a directory of files
genCHART = function(x){
  next.symbol <- tickers[i] # specify to start from first position in vector
  date.list <- dates[i] # specify to start from first position in vector
  next.file <- fread(paste0("https://www.alphavantage.co/query?function=TIME_SERIES_DAILY&symbol=",next.symbol,"&outputsize=full&apikey=6RSYX9BPXKZVXUS9&datatype=csv"))
  new.df <- data.frame(next.file)
  head(new.df)
}
# The loop calls the function in order to process multiple files
for (i in 1:length(tickers)){
  genCHART(tickers[[i]])
}
# The loop does nothing but process and load all tickers, but it is enough to illustrate the point.
This is what we see if we print tickers[i]:
> next.symbol
[1] "ANTH"
It gives me the first ticker in alphabetical order, so it returns tickers beginning with A first instead of following my order above. I want the loop to go through my character vector in the order of the tickers vector.
Is there any way to overcome this?
Edit:
If I take a vector of dates:
> dates
[1] "2009-07-05" "2009-07-16" "2009-07-16" "2009-09-04" "2009-10-09" "2009-11-02"
[7] "2009-11-02" "2009-12-01" "2009-12-18" "2010-01-22" "2010-01-27" "2010-03-15"
[13] "2010-03-15" "2010-03-19" "2010-04-09" "2010-04-30" "2010-10-11" "2010-10-28"
[19] "2011-01-19" "2011-01-28" "2011-02-01" "2011-02-25" "2011-04-29" "2011-06-22"
[25] "2011-06-24" "2011-06-24" "2011-08-19" "2011-10-31" "2011-11-11" "2011-11-11"
[31] "2011-11-16" "2011-11-23" "2011-12-08" "2012-01-05" "2012-01-30" "2012-02-17"
and I want to start from the first element of the vector...
date.list <- dates[i] # specify to start from first position in vector
shouldn't the above work even though it is wrapped in a function?
How can I get it to work so that I read from the start of my vector, and how does this work when I put my code in a function and then run the function in a loop to process multiple files?
print(i) showed that i was equal to 3. Resetting it with
i=1
was the answer.
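The underlying issue appears to be that genCHART() ignores its argument x and reads i from the global environment, so its result depends on whatever value i happens to hold when the function is called. A sketch of one way to avoid this is to pass the values in as arguments; describe_ticker is a hypothetical stand-in for genCHART that skips the download, and the short tickers and dates vectors are taken from the start of the question's data.

```r
tickers <- c("SPPI", "ZGNX", "ARDM")
dates   <- c("2009-07-05", "2009-07-16", "2009-07-16")

# Hypothetical stand-in for genCHART(): it uses only its arguments,
# so it does not depend on any global counter such as i
describe_ticker <- function(symbol, date) {
  paste(symbol, date)
}

# seq_along() walks the vector in its stored order, position 1 first
processed <- character(length(tickers))
for (i in seq_along(tickers)) {
  processed[i] <- describe_ticker(tickers[i], dates[i])
}
processed
# c("SPPI 2009-07-05", "ZGNX 2009-07-16", "ARDM 2009-07-16")
```

Written this way, calling the function by hand for a single ticker gives the same result as calling it from the loop.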
