Stack imbalance while using RcppParallel - r

// [[Rcpp::depends(RcppParallel)]]
#include <RcppParallel.h>
#include <RcppArmadilloExtensions/sample.h>
// [[Rcpp::depends(RcppArmadillo)]]
using namespace Rcpp;
using namespace RcppArmadillo;
using namespace RcppParallel;
using namespace std;
struct Sum : public Worker
{
  vector<string> output;
  Sum() {}
  Sum(const Sum& sum, Split) {}
  void operator()(std::size_t begin, std::size_t end) {
    vector<string> states;
    states.push_back("a");
    states.push_back("b");
    states.push_back("c");
    states.push_back("d");
    vector<double> probs;
    probs.push_back(0.3);
    probs.push_back(0.4);
    probs.push_back(0.1);
    probs.push_back(0.2);
    vector<string> rstat = sample(states, 1, false, wrap(probs));
    output.push_back(rstat[0]);
  }
  void join(const Sum& rhs) {
    for(int i=0;i<rhs.output.size();i++) {
      output.push_back(rhs.output[i]);
    }
  }
};
// [[Rcpp::export]]
CharacterVector parallelVectorSum(int n) {
  Sum sum;
  parallelReduce(0, n, sum);
  return wrap(sum.output);
}
The above code is just an experiment to learn RcppParallel. After a lot of searching, I found that R data types such as CharacterVector and NumericVector should be avoided in parallel code. That is why I have used the C++ STL.
Output 1
> parallelVectorSum(1)
[1] "b"
Output 2
> parallelVectorSum(11)
[1] "d" "a" "b" "b" "d" "a" "b" "b" "d" "b" "a"
Output 3
> parallelVectorSum(111)
Warning: stack imbalance in '.Call', 7 then 6
[1] "a" "b" "d" "b" "a" "b" "d" "d" "a" "b" "a" "b" "d" "b" "b" "c" "a" "a" "a" "d" "b" "b" "b" "a" "c" "a" "b" "a"
[29] "a" "b" "b" "d" "a" "b" "c" "b" "b" "d" "d" "b" "b" "a" "b" "a" "d" "b" "b" "a" "a" "a" "b" "b" "a" "a" "b" "d"
[57] "a" "a" "b" "d" "a" "a" "c" "d" "b" "c" "a" "d" "a" "d" "d" "b" "a" "a" "d" "b" "b" "d" "d" "b" "b" "b" "a" "a"
[85] "c" "a" "b" "d" "c" "b" "b" "a" "d" "d" "b" "b" "a" "a" "d" "d" "a" "c" "b" "b" "a" "a" "b" "b" "b" "c" "d"
In the last run I got a stack imbalance warning, and I am fairly sure it is caused by the sample function of RcppArmadillo. In the definition of sample I found that R data types are being used. In fact, the fourth parameter of sample is itself a NumericVector, which is a problem.
What could be the solution to this problem? Do I need to implement my own sample function? (I don't think that is easy to do; I am a beginner.)
Any solution will be appreciated. Please help.

I've already ported the code from RcppArmadillo's sample.h to use only arma::vec.
See: https://github.com/RcppCore/RcppArmadillo/pull/101
The only issue is that this will not work with std::string, as arma has no type defined for it. (I suppose you could write it using a template.)
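For the std::string case, one possible workaround is to skip R's sampler entirely and draw weighted indices with the C++ standard library. The sketch below is not the code from the pull request above and not RcppArmadillo's sample(); the helper name sample_states and its seed parameter are hypothetical, chosen only for illustration. It samples with replacement and touches no R or Rcpp objects, so it could be called from a worker's operator(), for example with a seed derived from a base seed plus begin so that each chunk has its own generator. Note that it does not follow R's set.seed() stream.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <random>
#include <string>
#include <vector>

// Hypothetical helper (not part of Rcpp, RcppArmadillo or RcppParallel):
// draws n items from `states` with replacement, weighted by `probs`,
// using only the C++ standard library.
std::vector<std::string> sample_states(const std::vector<std::string>& states,
                                       const std::vector<double>& probs,
                                       std::size_t n,
                                       std::uint64_t seed) {
    // One weight per state, and at least one state to draw from.
    assert(!states.empty() && states.size() == probs.size());

    // Local generator: nothing is shared between threads.
    std::mt19937_64 rng(seed);

    // Returns index i with probability probs[i] / sum(probs).
    std::discrete_distribution<int> pick(probs.begin(), probs.end());

    std::vector<std::string> out;
    out.reserve(n);
    for (std::size_t i = 0; i < n; ++i) {
        out.push_back(states[pick(rng)]);
    }
    return out;
}
```

Inside the worker from the question, the call to sample(states, 1, false, wrap(probs)) could then be replaced by something like sample_states(states, probs, end - begin, base_seed + begin), where base_seed is an assumed extra member of the worker.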

Related

data.table filter doesn't work in with()?

I am filtering a data.table based on another data.table, and it gives a very odd result.
Please advise.
library(data.table)
library(magrittr)
set.seed(100)
xA = data.table(A = letters[1:4], B = sample(1:1000))
xB = data.table(A = letters[1:4], B = sample(1:100))
with(xA[30], {
sprintf(" xA A = %s B = %s", A, B) %>% print
xB[A == A]$A %>% print
print("")
xB[A == "b"]$A %>% print
})
#[1] " xA A = b B = 322"
# [1] "a" "b" "c" "d" "a" "b" "c" "d" "a" "b" "c" "d" "a" "b" "c" "d" "a" "b" "c" "d" "a" "b" "c" "d" "a" "b" "c" "d" "a" "b" "c" "d" "a" "b"
# [35] "c" "d" "a" "b" "c" "d" "a" "b" "c" "d" "a" "b" "c" "d" "a" "b" "c" "d" "a" "b" "c" "d" "a" "b" "c" "d" "a" "b" "c" "d" "a" "b" "c" "d"
# [69] "a" "b" "c" "d" "a" "b" "c" "d" "a" "b" "c" "d" "a" "b" "c" "d" "a" "b" "c" "d" "a" "b" "c" "d" "a" "b" "c" "d" "a" "b" "c" "d"
#[1] " xA A = b B = 322"
# [1] "b" "b" "b" "b" "b" "b" "b" "b" "b" "b" "b" "b" "b" "b" "b" "b" "b" "b" "b" "b" "b" "b" "b" "b" "b"
With this toy code, I expected the first filter, xB[A == A], to return only "b", just as the second filter does, but the first printout contains every row. Why is that? Thanks for any advice.
The problem becomes clear when you look at just this statement:
xB[A == A]
How is data.table supposed to know which A is a column name and which is a variable name? In this case, data.table just assumes you want all rows where column A is equal to itself (which is all of them). Try using a different variable name:
with(xA[30], {
sprintf(" xA A = %s B = %s", A, B) %>% print
a <- A
xB[A == a]$A
})

Smartest way for making a sequence of characters in R

I want to create the following sequence in R:
A A B B B A A B B B
I have used the below code:
rep(c("A","A","B","B","B"),2)
I got the correct answer as follows:
[1] "A" "A" "B" "B" "B" "A" "A" "B" "B" "B"
But I don't like my code. I would like to see the smartest way for making the above sequence. I don't know if it is possible to make the above sequence using LETTERS[1:2].
Thank you in advance
You can do it without using rep at all:
LETTERS[(0:9 %% 5 > 1) + 1]
[1] "A" "A" "B" "B" "B" "A" "A" "B" "B" "B"
Here you just replace 9 with one less than the length you want the sequence to be, since 0:9 contains 10 values.
You can use rep twice :
rep(rep(LETTERS[1:2], c(2, 3)), 2)
#[1] "A" "A" "B" "B" "B" "A" "A" "B" "B" "B"
A Reduce() version of @RonakShah's answer:
Reduce(rep, list(c(2, 3), 2), LETTERS[1:2])
# [1] "A" "A" "B" "B" "B" "A" "A" "B" "B" "B"
Another variant using rep and LETTERS:
LETTERS[rep(rep(1:2, 2:3), 2)]
# [1] "A" "A" "B" "B" "B" "A" "A" "B" "B" "B"
An option with replicate
unlist(replicate(2, Map(rep, LETTERS[1:2], c(2, 3))))
#[1] "A" "A" "B" "B" "B" "A" "A" "B" "B" "B"

How to convert DNAbin to FASTA in R?

I am trying to convert my_dnabin1, a DNAbin object of 55 samples, to FASTA format. I am using the following code to convert it:
dnabin_to_fasta <- lapply(my_dnabin1, function(x) as.character(x[1:length(x)]))
This generates a list of 55 samples which looks like:
$SS.11.01
[1] "t" "t" "a" "c" "c" "t" "a" "a" "a" "a" "a" "g" "c" "c" "g" "c" "t" "t" "c" "c" "c" "t" "c" "c" "a" "a"
[27] "c" "c" "c" "t" "a" "g" "a" "a" "g" "c" "a" "a" "a" "c" "c" "t" "t" "t" "c" "a" "a" "c" "c" "c" "c" "a"
$SS.11.02
[1] "t" "t" "a" "c" "c" "t" "a" "a" "a" "a" "a" "g" "c" "c" "g" "c" "t" "t" "c" "c" "c" "t" "c" "c" "a" "a"
[27] "c" "c" "c" "t" "a" "g" "a" "a" "g" "c" "a" "a" "a" "c" "c" "t" "t" "t" "c" "a" "a" "c" "c" "c" "c" "a"
and so on...
However, I want a FASTA-formatted file as the output, which may look something like:
>SS.11.01 ttacctga
>SS.11.02 ttacctga
You can try this:
lapply(my_dnabin1, function(x) paste0(x, collapse = ''))

Access nested structure

Is there a nice way to access the data in a nested structure, e.g.
a<-list(list(LETTERS[1:3],LETTERS[1:3]),list(LETTERS[4:6]))
lapply(a,function(x) lapply(x, function(x) x))
but unlist is not an option.
Not as good as @SimonO101's answer, but as an alternative you can do it using do.call:
> do.call(c,do.call(c, a))
[1] "A" "B" "C" "A" "B" "C" "D" "E" "F"
Also using Reduce
> do.call(c, Reduce(c, a))
[1] "A" "B" "C" "A" "B" "C" "D" "E" "F"
Recursive lapply... a.k.a rapply?
rapply( a , c )
[1] "A" "B" "C" "A" "B" "C" "D" "E" "F"

R: repeat elements of a list based on another list

I have searched for this, but in vain.
The problem is that I have two lists. The first contains the elements to be repeated,
for example
my.list<-list(c('a','b','c','d'), c('g','h'))
and the second list gives the number of times each element is to be repeated:
repeat.list<-list(c(5,7,6,1), c(2,3))
I would like to create a new list in which each element in my.list is repeated according to repeat.list,
i.e.
result:
[[1]]
[1] "a" "a" "a" "a" "a" "b" "b" "b" "b" "b" "b" "b" "c" "c" "c" "c" "c" "c" "d"
[[2]]
[1] "g" "g" "h" "h" "h"
Thank you in advance for your help
Use mapply:
mapply(rep, my.list, repeat.list)
[[1]]
[1] "a" "a" "a" "a" "a" "b" "b" "b" "b" "b" "b" "b" "c" "c" "c" "c" "c" "c" "d"
[[2]]
[1] "g" "g" "h" "h" "h"
lapply also does the trick, but is more verbose:
lapply(seq_along(my.list), function(i)rep(my.list[[i]], repeat.list[[i]]))
[[1]]
[1] "a" "a" "a" "a" "a" "b" "b" "b" "b" "b" "b" "b" "c" "c" "c" "c" "c" "c" "d"
[[2]]
[1] "g" "g" "h" "h" "h"
