add non permanent vectors to data frame using rbind - r

I have temporary vectors that I would like to merge into one data frame.
I'm using the following loop to create those vectors:
for (i in campagin_id){
h <- basicHeaderGatherer()
doc <- getURI(paste0(automations_url,
"/",i,
"?apikey=",accessToken,
"&count=",pagination), headerfunction = h$update)
assign(paste0('web_id',i),c(i,as.integer(substring(h$value()[as.integer(grep(SearchTerm, h$value()))],
as.integer(regexpr(SearchTerm,h$value()[as.integer(grep(SearchTerm, h$value()))]))+nchar(SearchTerm)-1,as.integer(regexpr(SearchTerm,h$value()[as.integer(grep(SearchTerm, h$value()))]))+nchar(SearchTerm)+StringLength-2))))
}
I get a set of vectors and would like to merge them with rbind, something like this:
rbind(web_id0f09cc8ddd,web_id18a71f70a8)
The issue is that I don't know how many vectors I will get; I only know the beginning of each vector's name, so I'm trying to run the following loop:
for (i in campagin_id) {
web_id <- do.call("rbind",list(paste0('web_id',i)))
}
But it inserts only one vector into the data frame.
campagin_id contains all the i values I need at a given time.
Thanks

do.call is the right idea, but rbind is a slow operation. You should add your vectors to a list one-at-a-time, and then do a single rbind at the end, something like this (untested, obviously, as the example isn't reproducible, but it should give you the idea):
result_list = list()  # start empty; elements are added by campaign id below
for (i in campagin_id) {
h <- basicHeaderGatherer()
doc <- getURI(
paste0(
automations_url,
"/",
i,
"?apikey=",
accessToken,
"&count=",
pagination
),
headerfunction = h$update
)
result_list[[i]] = c(i, as.integer(
substring(
h$value()[as.integer(grep(SearchTerm, h$value()))],
as.integer(regexpr(SearchTerm, h$value()[as.integer(grep(SearchTerm, h$value()))])) +
nchar(SearchTerm) - 1,
as.integer(regexpr(SearchTerm, h$value()[as.integer(grep(SearchTerm, h$value()))])) +
nchar(SearchTerm) + StringLength - 2
)
))
}
results = do.call(rbind, result_list)
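The collect-then-bind pattern can be tried without any network access. This is only a sketch: fetch_web_id is a hypothetical stand-in for the getURI/header parsing step, and the ids are made up. The last lines show mget, which may help if the web_id... variables already exist from the assign loop.

```r
# Hypothetical stand-in for the HTTP request and header parsing.
fetch_web_id <- function(id) nchar(id) * 100L

campaign_ids <- c("0f09cc8ddd", "18a71f70a8", "abc")

# Collect one vector per campaign in a list, keyed by campaign id.
result_list <- list()
for (id in campaign_ids) {
  result_list[[id]] <- c(campaign = id, web_id = fetch_web_id(id))
}

# A single rbind at the end; row names come from the list names.
results <- do.call(rbind, result_list)
results_df <- as.data.frame(results, stringsAsFactors = FALSE)

# If the vectors already exist as separate variables, mget() collects
# them by name so they can be bound the same way.
web_id_a <- c("a", "1")
web_id_b <- c("b", "2")
from_vars <- do.call(rbind, mget(paste0("web_id_", c("a", "b"))))
```

Note that mixing the id and the number in one vector turns everything into character, so the web_id column would need as.integer() afterwards.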

Related

Dynamically create subsets in R with a for loop

I am trying to create different subsets out of a table, and with each iteration I want to shift one column upwards. So far I have done this with the following code, but not dynamically:
subset_cor_lag00 <- subset(data_24h, select = c(price_return, sentiment_return, tweet_return))
korr_tab_lag00 <- cor(subset_cor_lag00)
subset_cor_lag01 <- transform(subset_cor_lag00, price_return = lead(price_return))
subset_cor_lag01 <- na.omit(subset_cor_lag01)
korr_tab_lag01 <- cor(subset_cor_lag01)
Now I am trying to do this dynamically, but I got stuck. Maybe someone has a hint; I would really appreciate it. I tried this:
for(i in 1:5) {
paste0("subset_cor_lag0", i) <- transform(paste0("subset_cor_lag0", i-1), price_return = lead(price_return))
paste0("subset_cor_lag0", i) <- na.omit(paste0("subset_cor_lag0", i))
paste0("korr_tab_lag0", i) <- cor(paste0("subset_cor_lag0", i))
}
You can use assign for this, but usually having sequentially named variables isn't nice to work with. The better way is to use a list:
subset_cor_lag = list(subset(data_24h, select = c(price_return, sentiment_return, tweet_return)))
for(i in 2:6) {
temp = transform(subset_cor_lag[[i - 1]], price_return = lead(price_return))
subset_cor_lag[[i]] = na.omit(temp)
}
korr_tab = lapply(subset_cor_lag, cor)
## add names, if desired:
name_vec = paste0("lag", 0:5)
names(subset_cor_lag) = name_vec
names(korr_tab) = name_vec
You can then access, e.g., subset_cor_lag[["lag2"]] or subset_cor_lag[[3]], which is easy to do programmatically in a loop or with lapply.
See my answer at How to make a list of data frames? for more discussion and examples.
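Here is a self-contained version of the list approach that can be run as-is. It is a sketch under assumptions: the data are random, and lead1 is a small base-R stand-in for dplyr::lead so that no package is needed.

```r
# Base-R stand-in for dplyr::lead(): shift values up by one, pad with NA.
lead1 <- function(x) c(x[-1], NA)

set.seed(1)
data_24h <- data.frame(
  price_return     = rnorm(20),
  sentiment_return = rnorm(20),
  tweet_return     = rnorm(20)
)

# Element 1 is lag 0; each later element shifts price_return up once more.
lagged <- list(data_24h)
for (i in 2:4) {
  temp <- transform(lagged[[i - 1]], price_return = lead1(price_return))
  lagged[[i]] <- na.omit(temp)
}
names(lagged) <- paste0("lag", 0:3)

# One correlation matrix per lag.
korr_tab <- lapply(lagged, cor)

# Each lag loses one row to na.omit().
sapply(lagged, nrow)
```

korr_tab[["lag2"]] then gives the correlation matrix for the second shift, with no sequentially named variables involved.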

trying to get a proper names(list) output

I'm trying to split a 2 level deep list of characters into a 1 level list using a suffix.
More precisely, I have a list of genes, each containing 6 lists of probes corresponding to 6 bins. The architecture looks like :
feat_indexed_probes_bin$HSPB6$bin1
[1] "cg14513218" "cg22891287" "cg20713852" "cg04719839" "cg27580050" "cg18139462" "cg02956481" "cg26608795" "cg15660498" "cg25654926" "cg04878216"
I'm trying to get a list "bins_indexed_probes" with the following architecture :
bins_indexed_probes$HSPB6_bin6 containing the same probes so I can pass it to my map-reducing function.
I tried many solutions such as melt(), for loop, etc but I can't figure how to perform a double nested loop ( on genes and on bins) and get a list output with only 1 level depth.
For the moment, my func to do so is the following :
create_map <- function(indexes = feat_indexed_probes_bin, binlist = c("bin1", "bin2", "bin3", "bin4", "bin5", "bin6"), genes = features) {
map <- list()
ret <- lapply(binlist, function(bin) {
lapply(rownames(features), function(gene) {
map[[paste(gene, "_", bin, sep = "")]] <- feat_indexed_probes_bin[[gene]][[bin]]
tmp_names <<- paste(gene, "_", bin, sep = "")
return(map)
})
names(map) <- tmp_names
rm(tmp_names)
})
return(ret)
}
it returns:
[[6]][[374]]
GDF10_bin6
"cg13565300"
[[6]][[375]]
NULL
[[6]][[376]]
[[6]][[376]]$HNF1B_bin6
[1] "cg03433642" "cg09679923" "cg17652435" "cg03348978" "cg02435495" "cg02701059" "cg05110178" "cg11862993" "cg09463047"
[[6]][[377]]
[[6]][[377]]$GPIHBP1_bin6
[1] "cg01953797" "cg00152340"
instead, I would expect something like
$GPIHBP1_bin1
"cg...." "cg...."
...
$GPIHBP1_bin6
"someotherprobe"
$someothergene_bin1
"probe" "probe"
...
I hope I'm being clear, and since this is my first time asking question, I already apologise if I didn't follow the stackoverflow protocol.
Thank you already for reading me
Consider a nested lapply with extract, [[, and setNames calls, all wrapped in do.call using c to bind return elements together.
bins_indexed_probes <- do.call(c,
lapply(1:6, function(i)
setNames(lapply(feat_indexed_probes_bin, `[[`, i),
paste0(names(feat_indexed_probes_bin), "_bin", i))
)
)
# RE-ORDER ELEMENTS BY NAME
bins_indexed_probes <- bins_indexed_probes[sort(names(bins_indexed_probes))]
Rextester Demo
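To see what the nested lapply produces, here is the same idea on a tiny made-up list (the gene and probe names are placeholders, and only two bins are used):

```r
# Toy two-level list: genes at the top, bins inside each gene.
feat_indexed_probes_bin <- list(
  HSPB6 = list(bin1 = c("cg1", "cg2"), bin2 = "cg3"),
  GDF10 = list(bin1 = "cg4", bin2 = character(0))
)
bins <- c("bin1", "bin2")

# For each bin, pull that bin out of every gene and name it gene_bin;
# do.call(c, ...) then concatenates the per-bin lists into one flat list.
flat <- do.call(c, lapply(bins, function(b)
  setNames(lapply(feat_indexed_probes_bin, `[[`, b),
           paste0(names(feat_indexed_probes_bin), "_", b))))

# Re-order elements by name so each gene's bins sit together.
flat <- flat[sort(names(flat))]
names(flat)
```

Empty bins are kept as zero-length elements here; they could be dropped afterwards with flat[lengths(flat) > 0] if the map-reduce step does not want them.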

use of double brackets unclear

I'm new to R. Reading Book Of R by Tilman Davies. An example is provided for how to use an externally defined helper function which incidentally utilizes double square brackets [[]]. Please explain what helper.call[[1]] and helper.call[[2]] are doing and use of double brackets here.
multiples_helper_ext <- function(x=foo,matrix.flags,mat=diag(2)){
indexes <- which(matrix.flags)
counter <- 0
result <- list()
for(i in indexes){
temp <- x[[i]]
if(ncol(temp)==nrow(mat)){
counter <- counter+1
result[[counter]] <- temp%*%mat
}
}
return(list(result,counter))
}
multiples4 <- function(x,mat=diag(2),str1="no valid matrices",str2=str1){
matrix.flags <- sapply(x,FUN=is.matrix)
if(!any(matrix.flags)){
return(str1)
}
helper.call <- multiples_helper_ext(x,matrix.flags,mat=diag(2))
result <- helper.call[[1]] #I dont understand this use of double bracket
counter <- helper.call[[2]] #and here either
if(counter==0){
return(str2)
} else {
return(result)
}
}
foo <- list(matrix(1:4,2,2),"not a matrix","definitely not a matrix",matrix(1:8,2,4),matrix(1:8,4,2))
In R there are two basic kinds of container: lists and atomic vectors. The items of a list can be any other objects; the items of an atomic vector are all of one basic type, such as numbers or strings.
To access items in a list, you use the double bracket [[]]. This gives back the object at that place of the list.
So
x <- 1:10
x is now a vector of integers
L <- list( x, x, "hello" )
L is a list whose first item is the vector x, its second item is the vector x, and its third item is the string "hello".
L[[2]]
This gives back a vector, 1:10, which is stored in the 2nd place in L.
L[2]
This is a bit confusing, but this gives back a list whose only item is 1:10, i.e. it only contains L[[2]].
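A quick way to convince yourself of the difference is to look at what each form returns:

```r
L <- list(1:10, 1:10, "hello")

class(L[[2]])   # "integer": the vector itself
class(L[2])     # "list": a list of length 1 wrapping that vector
length(L[2])    # 1

# Unwrapping the one-element list gives back the same vector.
identical(L[2][[1]], L[[2]])   # TRUE

# The vector can be used directly; sum(L[2]) would be an error.
total <- sum(L[[2]])           # 55
```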
In R, when you want to return multiple values, you usually do this with a list. So, you might end your function with
f <- function() {
return( list( result1="hello", result2=1:10) )
}
x = f()
Now you can access the two results with
print( x[["result1"]] )
print( x[["result2"]] )
You can also access items of a named list with $, so instead you can write
print( x$result1 )
print( x$result2 )
In R, the [[ ]] syntax extracts a single element from a list. Your helper.call is a list (of result and counter), so helper.call[[1]] returns the first element of this list (result) and helper.call[[2]] returns the second (counter).
Have a look here: Understanding list indexing and bracket conventions in R
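The same pattern in miniature, with a made-up helper (count_evens is not from the book) that returns an unnamed two-element list just like multiples_helper_ext does:

```r
# Hypothetical helper: returns the even values and how many there are.
count_evens <- function(x) {
  evens <- x[x %% 2 == 0]
  list(evens, length(evens))   # unnamed list: position 1 and position 2
}

helper.call <- count_evens(1:7)
result  <- helper.call[[1]]   # first element: 2 4 6
counter <- helper.call[[2]]   # second element: 3
```

Because the list is unnamed, the elements can only be picked out by position. Returning list(result = evens, counter = length(evens)) instead would allow helper.call$result and helper.call$counter.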

R - creating data frames inside a for loop

I have hardcoded this:
s79t5 <- read.csv("filename1.csv", header = TRUE)
s81t2 <- read.csv("filename2.csv", header = TRUE)
etc.
subsets79t5 <- subset(s79t5, Tags!='')
subsets81t2 <- subset(s81t2, Tags!='')
...
subsets100t5 <- subset(s100t5, Tags!='')
now i need to softcode it. i am almost there:
sessions <- c('s79t5', 's81t2', 's88t2', 's90t3', 's96t3', 's98t4', 's100t5')
for (i in 1:length(sessions)) {
jFileName <- c(as.character(sessions[i]))
j <- data.frame(jFileName)
subset <- subset(j, j$Tags!='')
assign(paste("subset", jFileName, sep = ""), data.frame(subset))
}
Just throwing an answer here to close this question. Discussion was in the comments.
You need the get function in your line: j <- data.frame(jFileName)
It should be: j <- as.data.frame(get(jFileName))
The get function looks in your existing objects for the string character you gave it (in this case, jFileName) and returns that object. I then make sure it is a data frame with as.data.frame.
Previously you were essentially telling R to make a data frame out of a character string. With get you are now referencing your actual dataset.
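For what it's worth, here is a small runnable sketch of the same idea using mget (the plural of get) and a list instead of assign. The two data frames are made-up stand-ins for the CSV files.

```r
# Toy stand-ins for the data frames read from CSV.
s79t5 <- data.frame(Tags = c("a", "", "b"), value = 1:3)
s81t2 <- data.frame(Tags = c("", "", "c"), value = 4:6)
sessions <- c("s79t5", "s81t2")

# get() looks up one object by its name; mget() does it for several
# names at once and returns a named list.
same_object <- identical(get("s79t5"), s79t5)
tables <- mget(sessions)

# Keep only the rows with a non-empty tag, one result per session.
subsets <- lapply(tables, function(d) subset(d, Tags != ""))
sapply(subsets, nrow)
```

subsets[["s79t5"]] then plays the role of the subsets79t5 variable from the hardcoded version.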

How do I convert this for loop into something cooler like by in R

uniq <- unique(file[,12])
pdf("SKAT.pdf")
for(i in 1:length(uniq)) {
dat <- subset(file, file[,12] == uniq[i])
names <- paste("Sample_filtered_on_", uniq[i], sep="")
qq.chisq(-2*log(as.numeric(dat[,10])), df = 2, main = names, pvals = T,
sub=subtitle)
}
dev.off()
file[,12] is an integer so I convert it to a factor when I'm trying to run it with by instead of a for loop as follows:
pdf("SKAT.pdf")
by(file, as.factor(file[,12]), function(x) { qq.chisq(-2*log(as.numeric(x[,10])), df = 2, main = paste("Sample_filtered_on_", file[1,12], sep=""), pvals = T, sub=subtitle) } )
dev.off()
It works fine to sort the data frame by this (now a factor) column. My problem is that for the plot title, I want to label it with the correct index from that column. This is easy to do in the for loop by uniq[i]. How do I do this in a by function?
Hope this makes sense.
A more vectorized (== cooler?) version would pull the common operations out of the loop and let R do the book-keeping about unique factor levels.
dat <- split(-2 * log(as.numeric(file[,10])), file[,12])
names(dat) <- paste0("Sample_filtered_on_", names(dat))
(paste0 is a convenience function for the common use case where normally one would use paste with the argument sep=""). The for loop is entirely appropriate when you're running it for its side effects (plotting pretty pictures) rather than trying to capture values for further computation; it's definitely un-cool to use T instead of TRUE, while seq_along(dat) means that your code won't produce unexpected results when length(dat) == 0.
pdf("SKAT.pdf")
for(i in seq_along(dat)) {
vals <- dat[[i]]
nm <- names(dat)[[i]]
qq.chisq(vals, main = nm, df = 2, pvals = TRUE, sub=subtitle)
}
dev.off()
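The seq_along point is easy to check with an empty list: 1:length(x) counts down from 1 to 0, so the loop body runs twice, while seq_along(x) gives an empty sequence and the loop is skipped.

```r
empty <- list()

1:length(empty)    # 1 0
seq_along(empty)   # integer(0)

# Count how often each loop body runs.
n_colon <- 0
for (i in 1:length(empty)) n_colon <- n_colon + 1

n_seq <- 0
for (i in seq_along(empty)) n_seq <- n_seq + 1

c(n_colon, n_seq)  # 2 0
```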
If you did want to capture values, the basic observation is that your function takes 2 arguments that vary. So by or tapply or sapply or ... are not appropriate; each of these assumes that just a single argument is varying. Instead, use mapply or the comparable Map:
Map(qq.chisq, dat, main=names(dat),
MoreArgs=list(df=2, pvals=TRUE, sub=subtitle))
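Since qq.chisq needs an extra package, here is a sketch of the same split/Map pattern with a made-up function (describe) and made-up data, just to show how the two varying arguments and MoreArgs line up:

```r
# Toy data: p-values and an integer grouping column.
tab <- data.frame(pval = c(0.5, 0.1, 0.9, 0.2), grp = c(1L, 2L, 1L, 2L))

# Transform once, then split by group; names become the plot titles.
dat <- split(-2 * log(tab$pval), tab$grp)
names(dat) <- paste0("Sample_filtered_on_", names(dat))

# Hypothetical stand-in for qq.chisq: takes the values, a title,
# and one argument that stays fixed across calls.
describe <- function(x, main, digits) {
  paste0(main, ": mean = ", round(mean(x), digits))
}

# x and main vary together; digits is passed unchanged via MoreArgs.
out <- Map(describe, dat, main = names(dat), MoreArgs = list(digits = 2))
names(out)
```

The result is a list with one element per group, named after the groups, so each title stays matched to its own subset.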

Resources