Random subset of a fixed set in Julia

Suppose that you have a set A in Julia. How do you generate a random subset from A?
Is there any package or special function to do this?

The best method I can think of for sampling without replacement is the sample function from StatsBase. Unfortunately, it currently only works for indexable collections, so you have to convert your Set to an Array first and then convert the sample back to a Set.
using StatsBase
A = Set([1, 2, 3, 4, 5])
S = Set(sample(collect(A), 3, replace = false))

Related

R Saving function output to object when using assign function

I am currently trying to make my code dryer by rewriting some parts with the help of functions. One of the functions I am using is:
datasetperuniversity <- function(university, year) {
  assign(paste("data", university, sep = ""),
         subset(get(paste("originaldata", year, sep = "")),
                get(paste("allcollaboration", university, sep = "")) == 1))
}
Executing the function datasetperuniversity("Harvard","2000") would result within the function in something like this:
dataHarvard=subset(originaldata2000,allcollaborationHarvard==1)
The function runs nearly perfectly, except that it does not store the results in dataHarvard. I read that this is normal inside functions and that using <<- instead of = could solve the issue. However, since I am using the assign function, this is not really possible, because the = is just the outcome of the assign function.
Here is some data:
sales = c(2, 3, 5,6)
numberofemployees = c(1, 9, 20,12)
allcollaborationHarvard = c(0, 1, 0,1)
originaldata = data.frame(sales, numberofemployees, allcollaborationHarvard)
Generally, it's best not to embed data/a variable into the name of an object. So instead of using assign to dataHarvard, make a list data with an element called "Harvard":
# enumerate unis, attaching names for lapply to use
unis = setNames(, "Harvard")
# make a table for each subset with lapply
data = lapply(unis, function(x)
  originaldata[originaldata[[ paste0("allcollaboration", x) ]] == 1, ]
)
which gives
> data
$Harvard
  sales numberofemployees allcollaborationHarvard
2     3                 9                       1
4     6                12                       1
As seen here, you can use DF[["column name"]] to access a column instead of get as in the OP. Also, see the note in ?subset:
Warning
This is a convenience function intended for use interactively. For programming it is better to use the standard subsetting functions like [, and in particular the non-standard evaluation of argument subset can have unanticipated consequences.
Generally, it's also better not to embed data in column names if possible. If the allcollaboration* columns are mutually exclusive, they can be collapsed to a single categorical variable with values like "Harvard", "Yale", etc. Alternately, it might make sense to put the data in long form.
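As a rough sketch of the "single categorical variable" idea, assuming the indicator columns really are mutually exclusive: the allcollaborationYale column, the new university column and the helper names ind_cols, unis and hit are invented here for illustration and are not part of the original data.

```r
# Hypothetical data: the OP's columns plus an invented Yale indicator
originaldata <- data.frame(
  sales = c(2, 3, 5, 6),
  numberofemployees = c(1, 9, 20, 12),
  allcollaborationHarvard = c(0, 1, 0, 1),
  allcollaborationYale = c(1, 0, 0, 0)
)

# Find the indicator columns and strip the prefix to get the university names
ind_cols <- grep("^allcollaboration", names(originaldata), value = TRUE)
unis <- sub("^allcollaboration", "", ind_cols)

# For each row, the position of the single indicator equal to 1
# (NA when no indicator, or more than one, is set)
hit <- apply(originaldata[ind_cols] == 1, 1, function(r) {
  w <- which(r)
  if (length(w) == 1) w else NA_integer_
})
originaldata$university <- unis[hit]

# One table per university; rows with NA in university are dropped by split
data <- split(originaldata, originaldata$university)
```

With the university stored as data rather than in column names, data[["Harvard"]] plays the role that dataHarvard had in the question.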
For more guidance on arranging data, I recommend Hadley Wickham's tidy data paper.

R: Iterate/accumulate a parameter within lapply

lapply(rep(list(sample(1:100)), 10), sort, partial = 1:10)
... is what I'm trying to do. But the partial = 1:10 term is only evaluated once. What is the simplest method to evaluate the list of ten sample(1:100)'s with ten i++-style values passed to partial?
I apologize for being inarticulate.
A follow-up question is whether there is a more efficient method of generating these samples. What might this look like in a single custom function?
Thank you.
To answer your original question: you can use mapply, the multivariate version of sapply/lapply, e.g.
mapply(sort, x = list(sample(1:100, 10)), partial = 1:10)
If you want to return a list, set SIMPLIFY to FALSE.
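A small sketch of the list-returning variant, which also touches on the follow-up about generating the samples. It assumes you want ten independent shuffles rather than the same shuffle repeated ten times (rep(list(sample(1:100)), 10) evaluates sample only once). The names samples and res are made up for illustration.

```r
set.seed(1)  # only to make the sketch reproducible

# replicate re-evaluates sample() on every iteration, giving ten different shuffles
samples <- replicate(10, sample(1:100), simplify = FALSE)

# Pair the i-th sample with partial = i; SIMPLIFY = FALSE keeps the result a list
res <- mapply(sort, x = samples, partial = 1:10, SIMPLIFY = FALSE)

# After a partial sort with partial = i, position i holds the i-th smallest value
sapply(seq_along(res), function(i) res[[i]][i])
```

Map(sort, samples, partial = 1:10) should be equivalent, since Map is a thin wrapper around mapply with SIMPLIFY = FALSE.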

Sum of two elements in a vector

I want to perform the following easy calculation for an example data
a<-seq(1:10)
Now, is there a built-in function which returns the vector (a[1]+a[2], a[3]+a[4], ..., a[9]+a[10])? Note that I'm able to implement this using a for loop or using rollapply (and deleting some elements). However, I'm wondering if there is a built-in function I do not know about so far.
How about this?
a[c(TRUE, FALSE)] + a[c(FALSE, TRUE)]
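In case it is not obvious why this works: a length-2 logical index is recycled along the vector, so the first index keeps the elements at odd positions and the second keeps those at even positions. A short sketch that spells this out and shows a matrix-based alternative, assuming the vector has even length (odd_elems, even_elems, pair_sums and alt are illustrative names):

```r
a <- 1:10

# The logical index is recycled: TRUE, FALSE, TRUE, FALSE, ...
odd_elems  <- a[c(TRUE, FALSE)]   # a[1], a[3], ..., a[9]
even_elems <- a[c(FALSE, TRUE)]   # a[2], a[4], ..., a[10]
pair_sums  <- odd_elems + even_elems

# Alternative: fill a 2-row matrix column by column, so each column is one pair
alt <- colSums(matrix(a, nrow = 2))
```

Both give the sums of consecutive pairs. The matrix version generalises to groups of k by changing nrow, as long as the length is a multiple of k.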
rollapply in the zoo package can do that in a straightforward manner:
library(zoo)
rollapply(a, 2, by = 2, sum)

permutation with repetition

In R, how can I produce all the permutations of a group that contains some repeated elements?
Example :
A = {1,1,2,2,3}
solution :
1,1,2,2,3
1,1,2,3,2
1,1,3,2,2
1,2,1,2,3
1,2,2,1,3
1,2,2,3,1
.
.
Using the gtools package:
library(gtools)
x <- c(1,1,2,2,3)
permutations(5, 5, x, set = FALSE)
Just use the combinat package:
A = c(1,1,2,2,3)
library(combinat)
permn(A)
If you only need a single random permutation rather than the full list, you can do it with built-in R:
permute <- function(vec, n = length(vec)) {
  permute.index <- sample.int(length(vec), n)
  return(vec[permute.index])
}
permute(A)
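Since permute(A) draws a single random ordering per call, enumerating every distinct ordering in base R needs a different approach. Below is a minimal recursive sketch, not taken from any package. The function name unique_perms is made up, and it is only meant for small inputs, because the number of rows grows factorially.

```r
# Enumerate each distinct ordering of a vector with repeated elements once.
# At each step, every distinct value is tried as the first element, and the
# remaining elements (with one copy of that value removed) are permuted
# recursively.
unique_perms <- function(v) {
  if (length(v) <= 1) return(matrix(v, nrow = 1))
  out <- list()
  for (u in unique(v)) {
    i <- match(u, v)                 # position of the first copy of u
    rest <- unique_perms(v[-i])      # distinct orderings of what is left
    out[[length(out) + 1]] <- cbind(u, rest, deparse.level = 0)
  }
  do.call(rbind, out)
}

A <- c(1, 1, 2, 2, 3)
P <- unique_perms(A)
nrow(P)   # number of distinct orderings: 5! / (2! * 2! * 1!) = 30
```

Each row of P is one ordering. Because each distinct value is placed first only once, no ordering is generated twice, so there is no need to deduplicate afterwards.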
Using the permute package:
x <- c(1,1,2,2,3)
require(permute)
allPerms(x, observed = TRUE)
I have done extensive research on combinations and permutations. The results I found are written up in a book called Junction (an art of counting combination and permutation). To view my site, go to https://sites.google.com/site/junctionslpresentation/home
I also have a solution for your question: a way to order permutations of multiple identical objects. I call it CON of MSNO, which stands for Combination Order Number of Multiple Same Number of Objects.
To view this method of ordering, go to https://sites.google.com/site/junctionslpresentation/proof-for-advance-permutation
At the bottom of that page I have attached some Word documents. The solution you need is in the documents 12 Proof (CON of MSNO) and 13 Proof (Converse of CON of MSNO). Download them to view the material properly.

Access variable value where the name of variable is stored in a string

Similar questions have been raised for other languages: C, sql, java, etc.
But I'm trying to do this in R.
I have:
ret_series <- c(1, 2, 3)
x <- "ret_series"
How do I get (1, 2, 3) by calling some function / manipulation on x, without direct mentioning of ret_series?
You provided the answer in your question. Try get.
> get(x)
[1] 1 2 3
For one-off use, the get function works (as has been mentioned), but it does not scale well to larger projects. It is better to store your data in lists or environments, then use [[ to access the individual elements:
mydata <- list( ret_series=c(1,2,3) )
x <- 'ret_series'
mydata[[x]]
What's wrong with either of the following?
eval(as.name(x))
eval(as.symbol(x))
Note that some of the examples above wouldn't work for a data.frame.
For instance, given
x <- data.frame(a=seq(1,5))
get("x$a") would not give you x$a.
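The reason is that get looks up an object by its name, and "x$a" is an expression rather than a name. A small sketch of two ways around this, assuming the data frame name and the column name can be kept in separate strings (df_name and col_name are illustrative):

```r
x <- data.frame(a = seq(1, 5))

df_name  <- "x"   # name of the data frame, held in a string
col_name <- "a"   # name of the column, held in a string

# Look up the object by name first, then index the column by name
col_values <- get(df_name)[[col_name]]

# If the whole expression arrives as a single string, it has to be parsed and
# evaluated. This runs arbitrary code, so only use it on trusted input.
col_values2 <- eval(parse(text = "x$a"))
```

The first form is usually preferable, since it never evaluates a string as code.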
