Subset from a list with relational operators - r

I have a list that contains several sublists, each with the same set of elements. The list below is a shortened version.
logs <- list(list(success = TRUE, details = "check", timestamp = as.Date("2017-10-06")),
list(success = FALSE, details = "uncheck", timestamp = as.Date("2017-10-07")),
list(success = FALSE, details = "check", timestamp = as.Date("2017-10-08")),
list(success = FALSE, details = "uncheck", timestamp = as.Date("2017-10-09")))
I want to create two vectors: one (success_true) that contains the second element of each sublist whose first element is TRUE, and one (success_false) that contains the second element of each sublist whose first element is FALSE. The result I'm looking for looks like this:
success_true <- c("check")
success_false <- c("uncheck", "check", "uncheck")
The sapply solution that Shaun Wilkinson came up with works.
# Solution number 1 by Shaun Wilkinson: sapply
successes <- sapply(logs, function(e) e$success)
details <- sapply(logs, function(e) e$details)
success_true <- details[successes]
success_false <- details[!successes]
I also came up with another solution that incorporates a conditional statement within a for loop.
# Solution number 2 by SHW: conditional statement with for loop
success_true <- c() # create two empty vectors
success_false <- c()
for (log in logs) {
  if (log$success == TRUE) {
    # append the details element to success_true when success is TRUE
    success_true <- c(success_true, log$details)
  } else {
    # otherwise append the details element to success_false
    success_false <- c(success_false, log$details)
  }
}

Try this:
successes <- sapply(logs, function(e) e$success)
details <- sapply(logs, function(e) e$details)
success_true <- details[successes]
success_false <- details[!successes]

In addition to Shaun's answer, I have found another solution that incorporates a conditional statement in a for loop. I think this solution allows for more flexibility, so it is the one I will be using.
success_true <- c() # create two empty vectors
success_false <- c()
for (log in logs) {
  if (log$success == TRUE) {
    # append the details element to success_true when success is TRUE
    success_true <- c(success_true, log$details)
  } else {
    # otherwise append the details element to success_false
    success_false <- c(success_false, log$details)
  }
}
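
As a possible variation on the sapply approach, the sketch below uses vapply (which checks that every extracted value has the expected type and length) and split to build both vectors in one step. The name by_outcome is introduced here for illustration only and does not come from either answer.

```r
logs <- list(
  list(success = TRUE,  details = "check",   timestamp = as.Date("2017-10-06")),
  list(success = FALSE, details = "uncheck", timestamp = as.Date("2017-10-07")),
  list(success = FALSE, details = "check",   timestamp = as.Date("2017-10-08")),
  list(success = FALSE, details = "uncheck", timestamp = as.Date("2017-10-09"))
)

# extract one logical and one character value per sublist
successes <- vapply(logs, function(e) e$success, logical(1))
details   <- vapply(logs, function(e) e$details, character(1))

# split() groups the details by the value of successes ("FALSE" / "TRUE")
by_outcome <- split(details, successes)
success_true  <- by_outcome[["TRUE"]]
success_false <- by_outcome[["FALSE"]]
```

If no sublist has success equal to TRUE (or FALSE), the corresponding entry of by_outcome is NULL, so it may be worth checking for that case.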

Related

add non permanent vectors to data frame using rbind

I have temporary (non-permanent) vectors that I would like to merge into one data frame.
I'm using the following loop to create those vectors:
for (i in campagin_id){
h <- basicHeaderGatherer()
doc <- getURI(paste0(automations_url,
"/",i,
"?apikey=",accessToken,
"&count=",pagination), headerfunction = h$update)
assign(paste0('web_id',i),c(i,as.integer(substring(h$value()[as.integer(grep(SearchTerm, h$value()))],
as.integer(regexpr(SearchTerm,h$value()[as.integer(grep(SearchTerm, h$value()))]))+nchar(SearchTerm)-1,as.integer(regexpr(SearchTerm,h$value()[as.integer(grep(SearchTerm, h$value()))]))+nchar(SearchTerm)+StringLength-2))))
}
This gives me a set of vectors, which I would like to merge with rbind, something like this:
rbind(web_id0f09cc8ddd,web_id18a71f70a8)
The issue is that I don't know how many vectors I will get; I only know the beginning of each vector's name. So I'm trying to run the following loop:
for (i in campagin_id) {
web_id <- do.call("rbind",list(paste0('web_id',i)))
}
but it inserts only one vector into the data frame.
campagin_id contains all the i values I need at a given time.
Thanks
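do.call is the right idea, but calling rbind repeatedly inside a loop is slow. You should add your vectors to a list one at a time, and then do a single rbind at the end, something like this (untested, obviously, as the example isn't reproducible, but it should give you the idea):
# pre-allocate one slot per id (assumes campagin_id is a character vector, so result_list[[i]] below fills the slots by name)
result_list = setNames(vector("list", length(campagin_id)), campagin_id)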
for (i in campagin_id) {
h <- basicHeaderGatherer()
doc <- getURI(
paste0(
automations_url,
"/",
i,
"?apikey=",
accessToken,
"&count=",
pagination
),
headerfunction = h$update
)
result_list[[i]] = c(i, as.integer(
substring(
h$value()[as.integer(grep(SearchTerm, h$value()))],
as.integer(regexpr(SearchTerm, h$value()[as.integer(grep(SearchTerm, h$value()))])) +
nchar(SearchTerm) - 1,
as.integer(regexpr(SearchTerm, h$value()[as.integer(grep(SearchTerm, h$value()))])) +
nchar(SearchTerm) + StringLength - 2
)
))
}
results = do.call(rbind, result_list)
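
A small self-contained sketch of the same pattern, without the network calls. Here fetch_web_id is a hypothetical stand-in for the value parsed from the response headers, and the two ids are the ones mentioned in the question. The last lines show, as an assumption about what the original loop was meant to do, how mget can look up existing web_id... objects by name; list(paste0('web_id', i)) only holds the name as a string, not the vector itself, which is why only one row appeared.

```r
campagin_id <- c("0f09cc8ddd", "18a71f70a8")

# hypothetical stand-in for the value extracted from the headers
fetch_web_id <- function(id) nchar(id)

# collect one vector per id in a list, then bind once at the end
result_list <- lapply(campagin_id, function(id) c(id, fetch_web_id(id)))
results <- do.call(rbind, result_list)

# if the vectors already exist as separate objects, mget() fetches them by name
web_id0f09cc8ddd <- result_list[[1]]
web_id18a71f70a8 <- result_list[[2]]
by_name <- do.call(rbind, mget(paste0("web_id", campagin_id)))
```

Note that c(id, number) coerces everything to character, so results is a character matrix; as.data.frame can convert it afterwards if a data frame is needed.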

Random extraction from a list with NO REPLACEMENT

So I am wondering how to randomly extract a string from a list in R without replacement until the list is empty.
Writing
sample(x, size=1, replace=FALSE)
does not help, since strings are extracted more than once before the list is empty.
Kind regards
In every iteration, one list element is picked and one value is removed from it. If that was the element's last value, the list element itself is removed.
x <- list(a = "bla", b = c("ble", "bla"), c = "bli")
while (length(x) > 0) {
s <- sample(x, size = 1)
column <- x[[names(s)]]
value <- sample(unlist(s, use.names = FALSE), size = 1)
list_element_without_value <- subset(column, column != value)
x[[names(s)]] <- if (length(list_element_without_value) == 0) {
NULL
} else {
list_element_without_value
}
}
sample(x)
You can't call sample with size=1 repeatedly and expect it to remember which values were already selected. You have to draw all the values you want in one call. sample(x) with no size argument shuffles your data; you can then take the first element when you need one, the second the next time, and so on.
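
The shuffle-once idea can be sketched as follows. The vector items and the names shuffled and drawn are made up for this illustration; the loop simply takes the first remaining element until nothing is left.

```r
set.seed(1)  # only to make the illustration reproducible
items <- c("bla", "ble", "bli", "blo")

# shuffle once; every value appears exactly once in the result
shuffled <- sample(items)

drawn <- character(0)
while (length(shuffled) > 0) {
  drawn <- c(drawn, shuffled[1])  # take the next value
  shuffled <- shuffled[-1]        # and remove it from the pool
}
```

Because each value is removed as it is taken, no string can be drawn twice before the pool is empty.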

dealing with empty elements in lapply()

This might be a rather beginner-level question. lapply() is useful for applying a function to each component of a list. However, when I deal with data periodically generated by the database, it sometimes happens that one or more elements of the list are empty, while all the other components are of the same class, let's say data frames.
When I use lapply() on the whole list, an error occurs when it reaches the empty elements, because their dimension, length, or class doesn't fit. What I do in this case is use an if/else check, but I guess there must be a neater way to tackle this problem.
Here is an example:
FTSR.site.app <- lapply(sortier.d.f, function(x) {
if(length(x) != 1){
FTSR <- as.numeric(get.FTSR(x))
}else FTSR <- 0})
sortier.d.f is a list of data frames with numerous rows and columns. If one of them is empty, meaning no data was generated there, it will not get along with the get.FTSR function (which I wrote for a particular calculation), because the latter can only process data frames. The length of this empty element is 1, I guess because it still exists as a 0 or a FALSE. Without such empty elements I could simply use
FTSR.site.app <- lapply(sortier.d.f, get.FTSR)
Would you please suggest a better solution for the problem with empty elements in such a case?
A simpler dummy example:
test.A <- data.frame(name = c("Michael", "John", "Mary"),
                     mathematik = c(85, 72, 90), physics = c(67, 82, 94))
test.B <- vector(length = 0, mode = "numeric")
test.L <- list(test.A, test.B)
sum.mean.calc <- function(test){
test$total <- apply(test[,2:3], MARGIN = 1, sum)
test$mean <- apply(test[,2:3], MARGIN = 1, mean)
return(test)
}
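# first attempt: fails on the empty element test.B
test.L <- lapply(test.L, sum.mean.calc)
# second attempt: skip empty elements explicitly
test.L <- lapply(test.L, function(x){
  if(length(x) != 0){
    x <- sum.mean.calc(x)
  }else x <- 0
  return(x)
})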
The first attempt to use lapply failed because test.B is a zero-length vector, so it can't be processed by sum.mean.calc. In the second attempt I therefore have to add the extra check
if(length(x) != 0){
...
}else x <- 0
to process all components of test.L, and that gets annoying when I want to use lapply a number of times on that list.
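
One possible way to avoid repeating the check, sketched under the assumption that "empty" means anything that is not a data frame with at least one row: wrap the test once in a small helper and reuse it. The helper name lapply_nonempty and its default argument are hypothetical, not part of base R.

```r
test.A <- data.frame(name = c("Michael", "John", "Mary"),
                     mathematik = c(85, 72, 90), physics = c(67, 82, 94))
test.B <- vector(length = 0, mode = "numeric")
test.L <- list(test.A, test.B)

sum.mean.calc <- function(test) {
  scores <- test[, 2:3]            # the two score columns
  test$total <- rowSums(scores)
  test$mean  <- rowMeans(scores)
  test
}

# hypothetical helper: apply f to non-empty data frames only,
# and return `default` for everything else
lapply_nonempty <- function(lst, f, default = 0) {
  lapply(lst, function(x) {
    if (is.data.frame(x) && nrow(x) > 0) f(x) else default
  })
}

result <- lapply_nonempty(test.L, sum.mean.calc)
```

The same helper can then be reused for other functions, for example lapply_nonempty(sortier.d.f, get.FTSR), so the if/else appears in one place only.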

How to create a sorted vector in r

I have a list of elements in random order. I want to read the elements one at a time and insert each into another list in sorted order. I wonder how to do this in R. I tried the code below.
lst=list()
x=c(2,3,1,4,5)
for(i in 1:length(x)) ## for reading the elements from x
{
if(lst==NULL)
{
lst=x[i]
}
else
{
lst=x[i]
print(lst)
for(k in 2: length(lst)) ## For sorting the elements in a list
{
value = lst[k]
j=k-1
while(j>=1 && lst[j]>value)
{
lst[j+1] = lst[j]
j= j-1
}
lst[j+1] = value
}
}
print(lst)
}
But I get the error:
error in if (lst == NULL) { : argument is of length zero.
For big datasets with lots of columns, you can use do.call
df1 <- df[do.call(order, df),]
Checking the order by specifying the column names,
df2 <- df[with(df, order(V1, V2, V3, V4)),]
identical(df1,df2)
#[1] TRUE
If you need to order in the reverse direction
df[do.call(order, c(df,decreasing=TRUE)),]
data
set.seed(24)
df <- as.data.frame(matrix(sample(letters,10*4,replace=TRUE),ncol=4))
First off, as commenters have pointed out, you could use sort or order. But I believe you are trying to solve an assignment.
Your problem is a typo. Try executing in a console:
lst <- list()
lst == NULL
The last line evaluates to a zero-length logical vector (logical(0)), which if() cannot interpret as TRUE or FALSE. Instead you are interested in
is.null(lst)
which will return TRUE or FALSE.
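
A short sketch of the check and of inserting values one at a time into a sorted vector. One caveat worth noting: list() is an empty list, not NULL, so is.null(list()) is FALSE and a length test is the safer way to detect "nothing inserted yet". The function name insert_sorted is made up for this example.

```r
lst <- list()
is.null(lst)       # FALSE: an empty list is not NULL
length(lst) == 0   # TRUE: this detects the empty case

# insert `value` into an already sorted vector, keeping it sorted
insert_sorted <- function(sorted, value) {
  pos <- sum(sorted <= value)         # number of elements that stay before value
  append(sorted, value, after = pos)
}

x <- c(2, 3, 1, 4, 5)
sorted <- numeric(0)
for (value in x) {
  sorted <- insert_sorted(sorted, value)
}
print(sorted)
```

Starting from numeric(0) means the empty case needs no special branch: sum() over an empty comparison is 0, so the first value is simply placed at the front.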

R: create vector from nested for loop

I have a "hit list" of genes in a matrix. Each row is a hit, and the format is "chromosome(character) start(a number) stop(a number)." I would like to see which of these hits overlap with genes in the fly genome, which is a matrix with the format "chromosome start stop gene"
I have the following function that works (prints a list of genes from column 4 of dmelGenome):
geneListBuild <- function(dmelGenome='', hitList='', binSize='', saveGeneList='')
{
genomeColumns <- c('chr', 'start', 'stop', 'gene')
genome <- read.table(dmelGenome, header=FALSE, col.names = genomeColumns)
chr <- genome[,1]
startAdjust <- genome[,2] - binSize
stopAdjust <- genome[,3] + binSize
gene <- genome[,4]
genome <- data.frame(chr, startAdjust, stopAdjust, gene)
hits <- read.table(hitList, header=TRUE)
chrHits <- hits[hits$chr == "chr3R",]
chrGenome <- genome[genome$chr == "chr3R",]
genes <- c()
for(i in 1:length(chrHits[,1]))
{
for(j in 1:length(chrGenome[,1]))
{
if( chrHits[i,2] >= chrGenome[j,2] && chrHits[i,3] <= chrGenome[j,3] )
{
print(chrGenome[j,4])
}
}
}
genes <- unique(genes[is.finite(genes)])
print(genes)
fileConn<-file(saveGeneList)
write(genes, fileConn)
close(fileConn)
}
however, when I substitute print() with:
genes[j] <- chrGenome[j,4]
R returns a vector that has some values that are present in chrGenome[,1]. I don't know how it chooses these values, because they aren't in rows that seem to fulfill the if statement. I think it's an indexing issue?
Also I'm sure that there is a more efficient way of doing this. I'm new to R, so my code isn't very efficient.
This is similar to the "writing the results from a nested loop into another vector in R," but I couldn't fix it with the information in that thread.
Thanks.
I believe the inner loop could be replaced with:
gene.in <- chrHits[i,2] >= chrGenome[,2] & chrHits[i,3] <= chrGenome[,3]
Then you can use that logical vector to select what you want. Doing
which(gene.in)
might also be of use to you.
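
A minimal sketch of how that logical vector could be used to collect the genes, with small made-up data frames standing in for chrHits and chrGenome (the coordinates and gene names are invented). Appending to genes, rather than assigning to genes[j], avoids leaving NA gaps at the positions that never matched.

```r
# toy stand-ins for the filtered genome and hit tables (values are invented)
chrGenome <- data.frame(chr = "chr3R",
                        startAdjust = c(100, 500, 900),
                        stopAdjust  = c(400, 800, 1200),
                        gene = c("geneA", "geneB", "geneC"),
                        stringsAsFactors = FALSE)
chrHits <- data.frame(chr = "chr3R",
                      start = c(150, 950, 2000),
                      stop  = c(300, 1000, 2100))

genes <- character(0)
for (i in seq_len(nrow(chrHits))) {
  # TRUE for every genome row whose adjusted interval contains hit i
  gene.in <- chrHits[i, 2] >= chrGenome[, 2] & chrHits[i, 3] <= chrGenome[, 3]
  genes <- c(genes, chrGenome[gene.in, 4])
}
genes <- unique(genes)
```

Keeping the gene column as character (stringsAsFactors = FALSE) is a deliberate choice here; if the column were a factor, the collected values could end up as integer codes rather than names.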

Resources