R connect string and dynamic variables

Some other languages have this:
i=1
x&i=3
Then you get a variable x1=3.
How can I do this in R?
Please don't suggest assign(paste0('x',1),3),
because I want to iterate over i, for example:
x1=c()
for(i in 1:100){
x1=c(x1,2*i)}
I want to build x1, x2, ..., xn this way. assign(paste) only creates a variable once and gives no way to append to it.
So the x&i syntax is the core problem.
Thanks for your help.

Try this:
e <- .GlobalEnv
i <- 1
xi.name <- paste0("x", i)
# assign
e[[xi.name]] <- 3
# add
e[[xi.name]] <- e[[xi.name]] + 1
# display
e[[xi.name]]
## [1] 4
Or, using assign and get, the above could be done like this:
i <- 1
xi.name <- paste0("x", i)
# assign
assign(xi.name, 3)
# add
assign(xi.name, get(xi.name) + 1)
# display
get(xi.name)
## [1] 4
Note that normally one does not generate dynamic variables but rather puts them into a list.
L <- list()
i <- 1
xi.name <- paste0("x", i)
# assign
L[[xi.name]] <- 3
# add
L[[xi.name]] <- L[[xi.name]] + 1
# display
L[[xi.name]]
## [1] 4
Or simply index the list by position:
L <- list()
i <- 1
# assign
L[[i]] <- 3
# add
L[[i]] <- L[[i]] + 1
# display
L[[i]]
## [1] 4
Note that both approaches also work on a variable that already exists, here appending to x1:
e <- .GlobalEnv
i <- 1
xi.name <- paste0("x", i)
x1 <- 3
e[[xi.name]] <- c(e[[xi.name]], 99)
x1
## [1] 3 99
or, with assign and get:
i <- 1
xi.name <- paste0("x", i)
x1 <- 3
assign(xi.name, c(get(xi.name), 99))
x1
## [1] 3 99
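For the loop in the question, a minimal sketch of the list approach might look like the following. The names xs, n and name, and the values being appended, are assumptions made for illustration. Each of x1, x2, ..., xn becomes a named list element that can be appended to by its constructed name.

```r
# Assumed for illustration: n series, each built up inside an inner loop
n <- 3
xs <- list()
for (k in seq_len(n)) {
  name <- paste0("x", k)  # "x1", "x2", ...
  for (i in 1:5) {
    # append to the element called `name`; a missing element reads as NULL
    xs[[name]] <- c(xs[[name]], 2 * i * k)
  }
}
xs$x1       # 2 4 6 8 10
xs[["x3"]]  # 6 12 18 24 30
```

Keeping the series in one list means they can later be processed together, for example with lapply(xs, sum).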

Related

How to run a function with multiple arguments of varying length in a loop in R

I need to run this function about 6000 times to cover every combination of its arguments. The function takes 6 arguments in total. The first 3 go hand in hand and each has 75 elements. The next argument has 9 values, and the last 2 arguments have 3 values each.
#require dplyr
#data is history as list
matchloop <- function(data, data2, x, a, b, c) {
#history as list
split <- data
#history for reference
fh <- FullHistory
#start counter
n<-1
#end counter
m<-a
tempdf0.3 <- fh
#set condition for loop
while(nrow(tempdf0.3) > 1 && m <= (nrow(data2))*b) {
#put history into a variable
tempdf0.0 <- split
#put fh into a variable
tempdf0.5 <- fh
#put test path into variable from row n to m
tempdf0.1 <- as.data.frame(data2[n:m,], stringsAsFactors = FALSE)
#change column name of test path
colnames(tempdf0.1) <- "directions"
#put row n to m of history into variable
tempdf0.2 <- lapply(tempdf0.0, function(df) df[n:m,])
#put output into output
tempdf0.3 <- orderedDistancespos(tempdf0.2, tempdf0.1,
"allPaths","directions")
#add to output routeID based on reference from fh-the test path ID
tempdf0.3 <- mutate(tempdf0.3, routeID = (subset(tempdf0.5, routeID
!= x)$routeID))
#reduce output based on the matched threshold
tempdf0.3 <- subset(tempdf0.3, dists >= a*c)
#create new history based on the IDs remaining in output
split <- split[as.character(tempdf0.3$routeID)]
#create new history for reference based on the IDs remaining in output
fh <- subset(fh, routeID %in% tempdf0.3$routeID)
#increase loop counter
n <- n+a
#increase loop counter
m <- n+(a-1)
}
#show output
mylist <- list(tempdf0.3, nrow(tempdf0.3))
return(mylist)
}
I tried putting the 3 arguments with 75 elements into their own lists and using mapply. This works. But even at this level I still have to run the code 81 times to cover all the variables because, as far as I understand, mapply recycles based on the length of the longest argument.
mapply(matchloop, mylist2,mylist3,mylist4, MoreArgs = list(a=a, b=b, c=c))
data is a list of dataframes
data2 is a dataframe
x, a, b, c are all numerical.
Right now I'm trying to streamline my output so that it's in just 1 line. So if possible I would like 1 single CSV output with all 6000+ lines.
You can combine mapply and apply to cycle through all possible combinations of the a, b and c variables. To create all possible combinations you can use expand.grid. Finally, you can concatenate the list of rows into a data.frame with the help of do.call and rbind, as follows:
matchloop_stub <- function(data, data2, x, a, b, c) {
# stub
c(d = sum(data), d2 = sum(data2), x = sum(x), a = a, b = b, c = c, r = a + b + c)
}
set.seed(123)
mylist2 <- replicate(75, data.frame(rnorm(1)))
mylist3 <- replicate(75, data.frame(rnorm(2)))
mylist4 <- replicate(75, data.frame(rnorm(3)))
a <- 1:9
b <- 1:3
c <- 1:3
abc <- expand.grid(a, b, c)
names(abc) <- c("a", "b", "c")
xs <- apply(abc, 1, function(x) (mapply(matchloop_stub, mylist2, mylist3, mylist4, x[1], x[2], x[3], SIMPLIFY = FALSE)))
df <- do.call(rbind, do.call(rbind, xs))
write.csv(df, file = "temp.csv")
res <- read.csv("temp.csv")
nrow(res)
# [1] 6075
head(res)
# X d d2 x a b c r
# 1 1 -0.5604756 0.7407984 -1.362065 1 1 1 3
# 2 2 -0.5604756 0.7407984 -1.362065 2 1 1 4
# 3 3 -0.5604756 0.7407984 -1.362065 3 1 1 5
# 4 4 -0.5604756 0.7407984 -1.362065 4 1 1 6
# 5 5 -0.5604756 0.7407984 -1.362065 5 1 1 7
# 6 6 -0.5604756 0.7407984 -1.362065 6 1 1 8
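The same idea can be sketched with a single grid that also carries the index of the paired arguments, so one Map call covers every combination. Everything here is a hypothetical stand-in: f replaces matchloop, and d_list, d2_list and x_list are tiny placeholders for the three 75-element lists.

```r
# Hypothetical stand-in for matchloop(): returns one row per call
f <- function(d, d2, x, a, b, c) {
  data.frame(d = d, d2 = d2, x = x, a = a, b = b, c = c)
}

# Three lists that go hand in hand (2 elements here instead of 75)
d_list  <- list(10, 20)
d2_list <- list(1, 2)
x_list  <- list(0.1, 0.2)

# One row per combination of (paired index, a, b, c)
grid <- expand.grid(idx = seq_along(d_list), a = 1:3, b = 1:2, c = 1:2)

rows <- Map(
  function(idx, a, b, c) f(d_list[[idx]], d2_list[[idx]], x_list[[idx]], a, b, c),
  grid$idx, grid$a, grid$b, grid$c
)
out <- do.call(rbind, rows)
nrow(out)  # 2 * 3 * 2 * 2 = 24
```

With the real sizes (75, 9, 3, 3) the grid would have 6075 rows, and out could be written once with write.csv.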

Combine lists in forloop

I have a list of chromosomes chromosomes <- c(1:2, "X", "Y") that I am iterating over to generate random data n times for each chromosome.
I am doing this first by iterating over the chromosomes and generating the data using generateData() and then adding these to a list which I then combine into a data frame outside of the loop using bp_data <- as.data.frame(do.call(rbind, simByChrom)):
chromosomes <- c(1:2, "X", "Y")
simByChrom <- list()
for (c in chromosomes){
n <- sample(1:5,1)
cat(paste("Simulating", n, "breakpoints on chromosome", c), "\n")
bp_data <- generateData(c, n)
simByChrom[[c]] <- bp_data
}
bp_data <- as.data.frame(do.call(rbind, simByChrom))
rownames(bp_data) <- NULL
# generate dummy data
generateData <- function(c, n){
df <- data.frame(chrom = rep(c, n),
pos= sample(1:10000, n))
return(df)
}
chrom pos
1 1 7545
2 2 5798
3 2 3863
4 3 4036
5 3 9347
6 3 4749
I would like to iterate over this multiple times and record the iteration number in bp_data$iteration, to produce a data frame that looks like this:
chrom pos iteration
1 7215 1
1 4606 1
2 8282 1
2 3501 1
2 4350 1
2 6044 1
X 2467 1
Y 2816 1
Y 8848 1
Y 2304 1
Y 4235 1
1 3760 2
1 8205 2
1 4735 2
2 3061 2
X 56 2
X 1722 2
X 2430 2
X 6749 2
X 2081 2
Y 9646 2
However, I'm unsure how to do this. I've tried:
iterations <- 2
for (i in (1:iterations)){
cat("Running iteration", i, "\n")
simByChrom <- list()
for (c in chromosomes){
n <- sample(1:5,1)
cat(paste("Simulating", n, "breakpoints on chromosome", c), "\n")
bp_data <- generateData(c, n)
bp_data$iteration <- i
simByChrom[[c]] <- bp_data
# or
# simByChrom[[c]][[i]] <- bp_data
# or
# simByChrom[[c]] <- bp_data
# simByChrom[[c]]$iteration <- i
}
bp_data <- as.data.frame(do.call(rbind, simByChrom))
rownames(bp_data) <- NULL
}
But this results in only the last iteration being recorded.
Can anyone suggest how I can achieve my desired result?
You only see the last iteration in your result because bp_data is overwritten on each pass through the outer for loop. You need to save each iteration's result separately and then combine them at the end.
I believe just a few minor adjustments to what you already have will do the trick:
iterations <- 2
#create empty list to store each iteration result
bp_data <- list()
#run each iteration
for (i in 1:iterations){
cat("Running iteration", i, "\n")
simByChrom <- list()
for (c in chromosomes){
n <- sample(1:5,1)
cat(paste("Simulating", n, "breakpoints on chromosome", c), "\n")
aa <- generateData(c, n)
aa$iteration <- i
simByChrom[[c]] <- aa
}
result <- as.data.frame(do.call(rbind, simByChrom))
rownames(result) <- NULL
bp_data[[i]] <- result
}
#combine each iteration into one data frame
final <- as.data.frame(do.call(rbind, bp_data))
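The same accumulate-then-combine pattern can also be sketched with nested lapply calls, which return the per-iteration results as a list without an explicit counter. This is only an alternative sketch: generateData is copied from the question (with its first argument renamed), while per_iter and by_chrom are names assumed for illustration.

```r
# Dummy data generator, as in the question
generateData <- function(chrom, n) {
  data.frame(chrom = rep(chrom, n), pos = sample(1:10000, n))
}

chromosomes <- c(1:2, "X", "Y")
iterations <- 2
set.seed(1)

per_iter <- lapply(seq_len(iterations), function(i) {
  by_chrom <- lapply(chromosomes, function(ch) {
    df <- generateData(ch, sample(1:5, 1))
    df$iteration <- i  # record which iteration produced these rows
    df
  })
  do.call(rbind, by_chrom)  # one data frame per iteration
})

final <- do.call(rbind, per_iter)  # all iterations in one data frame
table(final$iteration)
```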

Computing number of bits that are set to 1 for matching rows in terms of hamming distance between two data frames

I have two data frames, df1 and df2, with the same number of columns (but not rows). For each row in df2, I was able to find the best (and second best) matching rows from df1 in terms of Hamming distance, in my previous post. In that post, we used the following example data:
set.seed(0)
df1 <- as.data.frame(matrix(sample(1:10), ncol = 2)) ## 5 rows 2 cols
df2 <- as.data.frame(matrix(sample(1:6), ncol = 2)) ## 3 rows 2 cols
I now need to compute the number of bits equal to 1 for:
1> each row in df2
2> the best matching rows in df1
3> the second best matching rows in df1
The number of bits equal to 1 of an integer a may be computed as
sum(as.integer(intToBits(a)))
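As a small illustration of that formula, it can be wrapped in a helper and checked on a few integers. The name popcount is an assumption made for this sketch; intToBits and bitwXor are base R functions.

```r
# popcount is a hypothetical helper name wrapping the formula above
popcount <- function(a) sum(as.integer(intToBits(a)))

popcount(9L)                # 9 is 1001 in binary, so 2 bits are set
popcount(7L)                # 7 is 0111 in binary, so 3 bits are set
popcount(bitwXor(9L, 7L))   # bits that differ between 9 and 7: 1110, so 3
```

The last line is the Hamming distance between the two integers, which is the quantity the matching step is based on.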
I have applied this bit count to @ZheyuanLi's original function, so I have got item 1>. However, I'm unable to apply the same logic to get items 2> and 3> by simple modification of @ZheyuanLi's function.
Below are @ZheyuanLi's functions, with my modification:
hmd <- function(x,y) {
rawx <- intToBits(x)
rawy <- intToBits(y)
nx <- length(rawx)
ny <- length(rawy)
if (nx == ny) {
## quick return
return (sum(as.logical(xor(rawx,rawy))))
} else if (nx < ny) {
## pivoting
tmp <- rawx; rawx <- rawy; rawy <- tmp
tmp <- nx; nx <- ny; ny <- tmp
}
if (nx %% ny) stop("unconformable length!") else {
nc <- nx / ny ## number of cycles
return(unname(tapply(as.logical(xor(rawx,rawy)), rep(1:nc, each=ny), sum)))
}
}
foo <- function(df1, df2, p = 2) {
## check p
if (p > nrow(df2)) p <- nrow(df2)
## transpose for CPU cache friendly code
xt <- t(as.matrix(df1))
yt <- t(as.matrix(df2))
## after transpose, we compute hamming distance column by column
## a for loop is decent; no performance gain from apply family
n <- ncol(yt)
id <- integer(n * p)
d <- numeric(n * p)
sb <- integer(n)
k <- 1:p
for (i in 1:n) {
set.bits <- sum(as.integer(intToBits(yt[,i])))
distance <- hmd(xt, yt[,i])
minp <- order(distance)[1:p]
id[k] <- minp
d[k] <- distance[minp]
sb[i] <- set.bits
k <- k + p
}
## recode "id", "d" and "sb" into data frame and return
id <- as.data.frame(matrix(id, ncol = p, byrow = TRUE))
colnames(id) <- paste0("min.", 1:p)
d <- as.data.frame(matrix(d, ncol = p, byrow = TRUE))
colnames(d) <- paste0("mindist.", 1:p)
sb <- as.data.frame(matrix(sb, ncol = 1)) ## no need for byrow as you have only 1 column
colnames(sb) <- "set.bits.1"
list(id = id, d = d, sb = sb)
}
Running these gives:
> foo(df1, df2)
$id
min.1 min.2 ## row id for best/second best match in df1
1 1 4
2 2 3
3 5 2
$d
mindist.1 mindist.2 ## minimum 2 hamming distance
1 2 2
2 1 3
3 1 3
$sb
set.bits.1 ## number of bits equal to 1 for each row of df2
1 3
2 2
3 4
OK, after reading through while re-editing your question (many times!), I think I know what you want. Essentially, nothing in hmd() needs to change. Your required items 1>, 2>, 3> can all be computed after the for loop in foo().
To get item 1>, which you called sb, we can use a tapply(). However, your computation of sb along the for loop is fine, so I will not change it. In the following, I will demonstrate the basic procedure to get item 2> and item 3>.
The id vector inside foo() stores all matching rows in df1:
id <- c(1, 4, 2, 3, 5, 2)
so we can simply extract those rows of df1 (actually, columns of xt) to compute the number of bits equal to 1. As you can see, there are many duplicates in id, so we only need to compute on unique(id):
id0 <- sort(unique(id))
## [1] 1 2 3 4 5
We now extract those subset columns of xt:
sub_xt <- xt[, id0]
## [,1] [,2] [,3] [,4] [,5]
## V1 9 3 10 5 6
## V2 2 4 8 7 1
To compute the number of bits equal to 1 for each column of sub_xt, we again use tapply() and a vectorized approach.
rawbits <- as.integer(intToBits(as.numeric(sub_xt))) ## convert sub_xt to binary
sbxt0 <- unname(tapply(X = rawbits,
INDEX = rep(1:length(id0), each = length(rawbits) / length(id0)),
FUN = sum))
## [1] 3 3 3 5 3
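The grouping step can be seen in isolation on a smaller matrix. This is a sketch with assumed names (m, bits, grp, set_bits): intToBits() returns 32 bits per integer, laid out column by column, so each column of a 2-row matrix owns a run of 64 consecutive bits.

```r
# Two columns, (9, 2) and (3, 4), taken from the first two columns of sub_xt
m <- matrix(c(9L, 2L, 3L, 4L), nrow = 2)

bits <- as.integer(intToBits(as.vector(m)))         # 32 bits per integer
grp  <- rep(seq_len(ncol(m)), each = 32 * nrow(m))  # one group label per column

set_bits <- as.vector(tapply(bits, grp, sum))
set_bits  # 3 3
```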
Now we need to map sbxt0 to sbxt:
sbxt <- sbxt0[match(id, id0)]
## [1] 3 5 3 3 3 3
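The unique()/match() pairing used here is a general way to compute something once per distinct value and then expand the result back to the original order. A minimal sketch, where slow_fn is a hypothetical stand-in for the per-row bit count:

```r
id <- c(1, 4, 2, 3, 5, 2)

# Hypothetical stand-in for an expensive per-id computation
slow_fn <- function(i) i * 10

id0   <- sort(unique(id))        # distinct ids: 1 2 3 4 5
vals0 <- slow_fn(id0)            # computed once per distinct id
vals  <- vals0[match(id, id0)]   # expanded back to the order of id
vals  # 10 40 20 30 50 20
```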
Then we can convert sbxt to a data frame sb1:
sb1 <- as.data.frame(matrix(sbxt, ncol = p, byrow = TRUE))
colnames(sb1) <- paste(paste0("min.", 1:p), "set.bits.1", sep = ".")
## min.1.set.bits.1 min.2.set.bits.1
## 1 3 5
## 2 3 3
## 3 3 3
Finally, we can put these pieces together:
foo <- function(df1, df2, p = 2) {
## check p
if (p > nrow(df2)) p <- nrow(df2)
## transpose for CPU cache friendly code
xt <- t(as.matrix(df1))
yt <- t(as.matrix(df2))
## after transpose, we compute hamming distance column by column
## a for loop is decent; no performance gain from apply family
n <- ncol(yt)
id <- integer(n * p)
d <- numeric(n * p)
sb2 <- integer(n)
k <- 1:p
for (i in 1:n) {
set.bits <- sum(as.integer(intToBits(yt[,i])))
distance <- hmd(xt, yt[,i])
minp <- order(distance)[1:p]
id[k] <- minp
d[k] <- distance[minp]
sb2[i] <- set.bits
k <- k + p
}
## compute "sb1"
id0 <- sort(unique(id))
sub_xt <- xt[, id0]
rawbits <- as.integer(intToBits(as.numeric(sub_xt))) ## convert sub_xt to binary
sbxt0 <- unname(tapply(X = rawbits,
INDEX = rep(1:length(id0), each = length(rawbits) / length(id0)),
FUN = sum))
sbxt <- sbxt0[match(id, id0)]
sb1 <- as.data.frame(matrix(sbxt, ncol = p, byrow = TRUE))
colnames(sb1) <- paste(paste0("min.", 1:p), "set.bits.1", sep = ".")
## recode "id", "d" and "sb2" into data frame and return
id <- as.data.frame(matrix(id, ncol = p, byrow = TRUE))
colnames(id) <- paste0("min.", 1:p)
d <- as.data.frame(matrix(d, ncol = p, byrow = TRUE))
colnames(d) <- paste0("mindist.", 1:p)
sb2 <- as.data.frame(matrix(sb2, ncol = 1)) ## no need for byrow as you have only 1 column
colnames(sb2) <- "set.bits.1"
list(id = id, d = d, sb1 = sb1, sb2 = sb2)
}
Now, running foo(df1, df2) gives:
> foo(df1,df2)
$id
min.1 min.2
1 1 4
2 2 3
3 5 2
$d
mindist.1 mindist.2
1 2 2
2 1 3
3 1 3
$sb1
min.1.set.bits.1 min.2.set.bits.1
1 3 5
2 3 3
3 3 3
$sb2
set.bits.1
1 3
2 2
3 4
Note that I have renamed the sb you used to sb2.

Adding elements to a list in for loop in R

I'm trying to add elements to a list in a for loop. How can I set the field name?
L <- list()
for(i in 1:N)
{
# Create object Ps...
string <- paste("element", i, sep="")
L$get(string) <- Ps
}
I want every element of the list to have a field name that depends on i (for example, the second element should be named "element2").
How can I do this? I think my error is in the use of get.
It seems like you're looking for a construct like the following:
N <- 3
x <- list()
for(i in 1:N) {
Ps <- i ## where i is whatever your Ps is
x[[paste0("element", i)]] <- Ps
}
x
# $element1
# [1] 1
#
# $element2
# [1] 2
#
# $element3
# [1] 3
Although, if you know N beforehand, then it is better practice and more efficient to allocate x and then fill it rather than adding to the existing list.
N <- 3
x <- vector("list", N)
for(i in 1:N) {
Ps <- i ## where i is whatever your Ps is
x[[i]] <- Ps
}
setNames(x, paste0("element", 1:N))
# $element1
# [1] 1
#
# $element2
# [1] 2
#
# $element3
# [1] 3
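When each element depends only on i, the loop can also be replaced by lapply() followed by setNames(). This is a sketch under that assumption, with the function body standing in for whatever creates Ps. Using seq_len(N) instead of 1:N also gives an empty list when N is 0.

```r
N <- 3
x <- setNames(
  lapply(seq_len(N), function(i) i),  # i stands in for the real Ps
  paste0("element", seq_len(N))
)
names(x)     # "element1" "element2" "element3"
x$element2   # 2
```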

add a data frame to a constructed name

I have this
for(i in 1:10)
and within it, I have a data frame:
e.g.
df<-1:100
and I want to assign the data frame to a name that I construct, something like this (not that it works):
paste("name", variable[i])<- df
Edit:
How would I then go about accessing those constructed values in another loop (assuming I've used assign)?
datalist <- paste("a",1:100,sep="")
for (i in 1:length(datalist)){
}
I suggest assign, as illustrated here:
for(i in 1:100){
df <- data.frame(x = rnorm(10),y = rpois(10,10))
assign(paste('df',i,sep=''),df)
}
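To read such objects back in a second loop, get() looks one up by its constructed name and mget() collects several into a named list. The sketch below assumes a dedicated environment env, so the objects stay out of the global workspace; env, nms and dfs are names chosen for illustration.

```r
env <- new.env()
nms <- paste0("df", 1:3)

# Create df1, df2, df3 inside env
for (i in 1:3) {
  assign(nms[i], data.frame(x = seq_len(i)), envir = env)
}

# Access them again by constructed name
for (nm in nms) {
  d <- get(nm, envir = env)
  cat(nm, "has", nrow(d), "rows\n")
}

# Or fetch them all at once as a named list
dfs <- mget(nms, envir = env)
names(dfs)
```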
You could store the output of your loop in a list:
set.seed(10)
x = list()
for(i in 1:10){
x[[i]] <- data.frame(x = rnorm(100), y = rnorm(100))
}
Then x will be a list of length 10 and each element of the list will be of dim c(100, 2)
> length(x)
[1] 10
> dim(x[[1]])
[1] 100 2
Of course you could name the elements of the list, as well:
names(x) = letters[1:10]
x[["a"]]
x y
1 0.01874617 -0.76180434
2 -0.18425254 0.41937541
3 -1.37133055 -1.03994336
4 -0.59916772 0.71157397
5 0.29454513 -0.63321301
6 0.38979430 0.56317466
...
