I am using a for-loop to do step-by-step calculations where several equations depend on each other. Because of this dependence, I cannot find a solution where I do the calculations inside a dataframe. My main motivation is to speed up the calculations when the Time vector is very large in the reprex below.
Could you please suggest alternatives to the following for-loop based calculations, preferably inside a dataframe in R? The only thing I can think of is using a for-loop in Rcpp.
Reproducible Example
last_time <- 10
STEP = 1
Time <- seq(from = 0, to = last_time, by = STEP)
## empty vectors
eq1 <- vector(mode = "double", length = length(Time))
eq2 <- vector(mode = "double", length = length(Time))
eq <- vector(mode = "double", length = length(Time))
eq3 <- vector(mode = "double", length = length(Time))
eq4 <- vector(mode = "double", length = length(Time))
## adding the first values
eq1[1] <- 25
eq2[1] <- 25
eq[1] <- 25
eq3[1] <- 100
eq4[1] <- 2
for (t in 2:length(Time)) {
## eq1
eq1[t] <- eq[t-1] + (2.5 * STEP * (1 - (eq[t-1])/25))
## eq2
eq2[t] <- (-2 * STEP) + ((-2^2) * (STEP^2)) - (2 * eq3[t-1]) - (eq[t-1] * STEP)
## min.
eq[t] <- min(eq1[t], eq2[t] )
## eq3
eq3[t-1] <- (eq[t] - eq[t-1])/(STEP)
## eq4
eq4[t] <- eq4[t-1] + (eq[t-1] * STEP) + (0.5 * eq3[t-1] * (STEP)^2)
}
Output:
my_data <- data.frame(Time, eq1, eq2, eq, eq3, eq4)
my_data
#> Time eq1 eq2 eq eq3 eq4
#> 1 0 25.00000 25.00000 25.00000 -256.00000 2.0000
#> 2 1 25.00000 -231.00000 -231.00000 25.60000 -101.0000
#> 3 2 -205.40000 225.00000 -205.40000 23.04000 -319.2000
#> 4 3 -182.36000 199.40000 -182.36000 20.73600 -513.0800
#> 5 4 -161.62400 176.36000 -161.62400 18.66240 -685.0720
#> 6 5 -142.96160 155.62400 -142.96160 16.79616 -837.3648
#> 7 6 -126.16544 136.96160 -126.16544 15.11654 -971.9283
#> 8 7 -111.04890 120.16544 -111.04890 13.60489 -1090.5355
#> 9 8 -97.44401 105.04890 -97.44401 12.24440 -1194.7819
#> 10 9 -85.19961 91.44401 -85.19961 11.01996 -1286.1037
#> 11 10 -74.17965 79.19961 -74.17965 0.00000 -1365.7934
Created on 2021-02-28 by the reprex package (v1.0.0)
You could define a recursive function. A loop is faster than recursion though.
g <- function(m, STEP, time, x=2) {
if (time == 0) m
else {
## eq1
m[x, 2] <- m[x - 1, 1] + 2.5*STEP*(1 - (m[x - 1, 1])/25)
## eq2
m[x, 3] <- -2*STEP + -2^2*STEP^2 - 2*m[x - 1, 4] - m[x - 1, 1]*STEP
## min.
m[x, 1] <- min(m[x, 2], m[x, 3])
## eq3
m[x - 1, 4] <- (m[x, 1] - m[x - 1, 1])/STEP
## eq4
m[x, 5] <- m[x - 1, 5] + m[x - 1, 1]*STEP + 0.5*m[x - 1, 4]*STEP^2
g(m, STEP, time - 1, x + 1)
}
}
Usage
last_time <- 10; STEP <- 1
First <- c(eq0=25, eq1=25, eq2=25, eq3=100, eq4=2)
m <- matrix(0, last_time + 1, length(First), dimnames=list(NULL, names(First)))
m[1, ] <- First
g(m, STEP, last_time)
# eq0 eq1 eq2 eq3 eq4
# [1,] 25.00000 25.00000 25.00000 -256.00000 2.0000
# [2,] -231.00000 25.00000 -231.00000 25.60000 -101.0000
# [3,] -205.40000 -205.40000 225.00000 23.04000 -319.2000
# [4,] -182.36000 -182.36000 199.40000 20.73600 -513.0800
# [5,] -161.62400 -161.62400 176.36000 18.66240 -685.0720
# [6,] -142.96160 -142.96160 155.62400 16.79616 -837.3648
# [7,] -126.16544 -126.16544 136.96160 15.11654 -971.9283
# [8,] -111.04890 -111.04890 120.16544 13.60489 -1090.5355
# [9,] -97.44401 -97.44401 105.04890 12.24440 -1194.7819
# [10,] -85.19961 -85.19961 91.44401 11.01996 -1286.1037
# [11,] -74.17965 -74.17965 79.19961 0.00000 -1365.7934
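For comparison, the same update order can be written as a plain loop over the matrix. This is only a sketch: `run_loop` is a hypothetical name, and it assumes the same column layout (eq0, eq1, eq2, eq3, eq4) and starting values as in the usage above.

```r
# Hypothetical loop version of g(); fills the matrix row by row.
run_loop <- function(first, STEP, last_time) {
  m <- matrix(0, last_time + 1, length(first),
              dimnames = list(NULL, names(first)))
  m[1, ] <- first
  for (x in seq_len(last_time) + 1) {
    ## eq1
    m[x, "eq1"] <- m[x - 1, "eq0"] + 2.5 * STEP * (1 - m[x - 1, "eq0"] / 25)
    ## eq2 (note that -2^2 is -(2^2) in R, i.e. -4)
    m[x, "eq2"] <- -2 * STEP - 2^2 * STEP^2 - 2 * m[x - 1, "eq3"] - m[x - 1, "eq0"] * STEP
    ## min.
    m[x, "eq0"] <- min(m[x, "eq1"], m[x, "eq2"])
    ## eq3 is written to the previous row, as in g()
    m[x - 1, "eq3"] <- (m[x, "eq0"] - m[x - 1, "eq0"]) / STEP
    ## eq4 uses the eq3 value that was just updated
    m[x, "eq4"] <- m[x - 1, "eq4"] + m[x - 1, "eq0"] * STEP + 0.5 * m[x - 1, "eq3"] * STEP^2
  }
  m
}

First <- c(eq0 = 25, eq1 = 25, eq2 = 25, eq3 = 100, eq4 = 2)
res <- run_loop(First, STEP = 1, last_time = 10)
```

Unlike the recursive version, the loop does not add a function call per time step, so it should not run into R's nesting limits when the time vector is long.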
As you asked how it works:
The recursive filter function of stats::filter can be used with mapply as follows:
dataframe <-
mapply(stats::filter,
dataframe,
filter = vector,
method = "recursive")
where vector holds the filter coefficient, e.g. c(25), which could be the same number as your first value, eq1[1] <- 25.
The recursive filter works like a recursive loop but is a bit more elegant.
For a column containing 20, 30, 40 and vector = c(25), the recursive filter would compute:
row or timepoint 1: 20
row or timepoint 2: 30 + (20 * c(25))
row or timepoint 3: 40 + ((30 + (20 * c(25))) * c(25))
It calculates the value in the first row and carries it into the next row, where it is multiplied by vector and added to that row's input. Perhaps if you play around with stats::filter and the recursive method, you can also get the same result. It is a row-based calculation over time, similar to Rcpp but more flexible.
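The arithmetic above can be checked on a single vector. This is a minimal sketch with made-up inputs (20, 30, 40 and a coefficient of 25); the second part uses lapply() rather than mapply() so that the result stays a data frame, which is a substitution on my part and not the call shown above.

```r
x <- c(20, 30, 40)

# Recursive filter: y[i] = x[i] + 25 * y[i - 1]
y <- stats::filter(x, filter = 25, method = "recursive")
as.numeric(y)
# 20 530 13290

# Applied column by column to a small, hypothetical data frame
df <- data.frame(u = c(20, 30, 40), v = c(1, 2, 3))
out <- as.data.frame(lapply(df, function(col) {
  as.numeric(stats::filter(col, filter = 25, method = "recursive"))
}))
```

Note that the recursive filter only expresses linear recurrences with fixed coefficients, so a step such as min(eq1, eq2) would still need to be handled separately.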
I need to create groups of numbers whose sums do not exceed 34.
For example: I have an array x<-c(28,26,20,5,3,2,1) and I need to create the following groups: a=(28,5,1), b=(26,3,2), c=(20) because the sums of the groups a, b and c do not exceed 34.
Is it possible to implement this procedure in R?
If I understand correctly this is what you want to do:
create_groups <- function(input, threshold) {
input <- sort(input, decreasing = TRUE)
result <- vector("list", length(input))
sums <- rep(0, length(input))
for (k in input) {
i <- match(TRUE, sums + k <= threshold)
if (!is.na(i)) {
result[[i]] <- c(result[[i]], k)
sums[i] <- sums[i] + k
}
}
result[sapply(result, is.null)] <- NULL
result
}
create_groups(x, 34)
# [[1]]
# [1] 28 5 1
#
# [[2]]
# [1] 26 3 2
#
# [[3]]
# [1] 20
However, this greedy algorithm (each number goes into the first group that still has room) is not guaranteed to output the optimal solution in terms of the number of groups. For instance:
y <- c(18, 15, 11, 9, 8, 7)
create_groups(y, 34)
# [[1]]
# [1] 18 15
#
# [[2]]
# [1] 11 9 8
#
# [[3]]
# [1] 7
while the optimal solution in this case consists of only 2 groups: list(c(18, 9, 7), c(15, 11, 8)).
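One quick way to confirm that two groups is the best possible here: no grouping can use fewer than ceiling(sum / threshold) groups. The sketch below checks the two-group solution against that bound. It is a simple lower bound, not a general optimality test, and the names `lower_bound` and `candidate` are only illustrative.

```r
y <- c(18, 15, 11, 9, 8, 7)
threshold <- 34

# Fewest groups that could possibly hold the total: 68 / 34 = 2
lower_bound <- ceiling(sum(y) / threshold)

candidate <- list(c(18, 9, 7), c(15, 11, 8))

# Every group respects the threshold, and all numbers are used exactly once
groups_ok <- all(sapply(candidate, sum) <= threshold)
covers_all <- identical(sort(unlist(candidate)), sort(y))

# TRUE when the candidate is feasible and meets the lower bound
groups_ok && covers_all && length(candidate) == lower_bound
```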
Assuming you want all possible combinations of subsets of x that meet this condition, you can use
x <- c(28, 26, 20, 5, 3, 2, 1)
combos <- lapply(seq_along(x), function(k) combn(x, k)) # all subsets of each size, one subset per column
le34 <- lapply(combos, function(z) colSums(z) <= 34) # which subsets sum to at most 34
lapply(seq_along(combos), function(i) combos[[i]][, le34[[i]], drop = FALSE]) # subsets that meet the condition
Here is my dummy dataset:
dataset<-data.frame(a=c(1,2,3,4),b=c('a','b','c','d'), c=c("HI","DD","gg","ff"))
g=list(c("a","b"),c(2,3,4), c(44,33,11,22),c("chr","ID","i","II"))
dataset$l<-g
dataset
a b c l
1 1 a HI a, b
2 2 b DD 2, 3, 4
3 3 c gg 44, 33, 11, 22
4 4 d ff chr, ID, i, II
> mode(dataset$l)
[1] "list"
when I try to write the dataset to a file:
> write.table(dataset, "dataset.txt", quote=F, sep="\t")
Error in write.table(x, file, nrow(x), p, rnames, sep, eol, na, dec, as.integer(quote), :
unimplemented type 'list' in 'EncodeElement'
How can I solve this problem?
I can think of a few options, depending on what you're trying to achieve.
If it is for display only, then you might simply want capture.output() or sink(); neither of these would be very convenient to read back into R:
capture.output(dataset, file="myfile.txt")
### Result is a text file that looks like this:
# a b c l
# 1 1 a HI a, b
# 2 2 b DD 2, 3, 4
# 3 3 c gg 44, 33, 11, 22
# 4 4 d ff chr, ID, i, II
sink("myfile.txt")
dataset
sink()
## Same result as `capture.output()` approach
If you want to be able to read the resulting table back into R (albeit without preserving the fact that column "l" is a list), you can take an approach similar to what @DWin suggested.
In the code below, the dataset2[sapply... line identifies which variables are lists and concatenates them into a single string. Thus, they become simple character variables, allowing you to use write.table().
dataset2 <- dataset # make a copy just to be on the safe side
dataset2[sapply(dataset2, is.list)] <- apply(dataset2[sapply(dataset2, is.list)],
1, function(x)
paste(unlist(x),
sep=", ", collapse=", "))
str(dataset2)
# 'data.frame': 4 obs. of 4 variables:
# $ a: num 1 2 3 4
# $ b: Factor w/ 4 levels "a","b","c","d": 1 2 3 4
# $ c: Factor w/ 4 levels "DD","ff","gg",..: 4 1 3 2
# $ l: chr "a, b" "2, 3, 4" "44, 33, 11, 22" "chr, ID, i, II"
write.table(dataset2, "myfile.txt", quote=FALSE, sep="\t")
# can be read back in with: dataset3 <- read.delim("myfile.txt")
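If the list structure is needed again after reading the file back in, the concatenated strings can be split on the same separator. This is a sketch on a small stand-in data frame (`flat` is a hypothetical name); note that every element comes back as character, so numeric entries would need converting separately.

```r
# Stand-in for what read.delim() would return for the flattened column
flat <- data.frame(a = 1:2,
                   l = c("a, b", "2, 3, 4"),
                   stringsAsFactors = FALSE)

# Split each string on ", " to rebuild a list column
flat$l <- strsplit(flat$l, ", ", fixed = TRUE)

# Elements are character vectors; convert where appropriate
nums <- as.numeric(flat$l[[2]])
```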
Output from save is unreadable. Output from dump or dput is ASCII and is readable to people who understand the structure of R objects, but I'm guessing you wanted it more conventionally arranged.
> apply(dataset, 1, function(x) paste(x, sep=",", collapse=","))
[1] "1,a,HI,c(\"a\", \"b\")"
[2] "2,b,DD,c(2, 3, 4)"
[3] "3,c,gg,c(44, 33, 11, 22)"
[4] "4,d,ff,c(\"chr\", \"ID\", \"i\", \"II\")"
The backslashes do not appear in the text-file output:
writeLines(con="test.txt", apply(dataset, 1, function(x) paste(x, sep=",", collapse=",")))
#-------output-----
1,a,HI,c("a", "b")
2,b,DD,c(2, 3, 4)
3,c,gg,c(44, 33, 11, 22)
4,d,ff,c("chr", "ID", "i", "II")
If one of the requirements is to preserve the formatting for Excel, etc., this might help:
writableTable <- tableFlatten(dataset, filler="")
# a b c l.01 l.02 l.03 l.04
# 1 a HI a b
# 2 b DD 2 3 4
# 3 c gg 44 33 11 22
# 4 d ff chr ID i II
write.csv(writableTable, "myFile.csv")
tableFlatten uses a function listFlatten which, as the name implies, takes nested lists and flattens them.
However, if the elements within the lists are of different sizes, it adds filler (which can be NAs, blank spaces, or any other user-defined option).
The code for it is below.
tableFlatten <- function(tableWithLists, filler="") {
# takes as input a table with lists and returns a flat table
# empty spots in lists are filled with value of `filler`
#
# depends on: listFlatten(.), findGroupRanges(.), fw0(.)
# index which columns are lists
listCols <- sapply(tableWithLists, is.list)
tableWithLists[listCols]
tableWithLists[!listCols]
# flatten lists into table
flattened <- sapply(tableWithLists[listCols], listFlatten, filler=filler, simplify=FALSE)
# fix names
for (i in 1:length(flattened)) colnames(flattened[[i]]) <- fw0(ncol(flattened[[i]]), 2)
# REASSEMBLE, IN ORDER
# find pivot point counts
pivots <- sapply(findGroupRanges(listCols), length)
#index markers
indNonList <- indList <- 1
# nonListGrp <- (0:(length(pivots)/2)) * 2 + 1
# ListGrp <- (1:(length(pivots)/2)) * 2
final <- data.frame(row.names=row.names(tableWithLists))
for (i in 1:length(pivots)) {
if(i %% 2 == 1) {
final <- cbind(final,
tableWithLists[!listCols][indNonList:((indNonList<-indNonList+pivots[[i]])-1)]
)
} else {
final <- cbind(final,
flattened[indList:((indList<-indList+pivots[[i]])-1)]
)
}
}
return(final)
}
#=====================================
listFlatten <- function(obj, filler=NA) {
## Flattens obj like rbind, but if elements are of different length, plugs in value filler
# Initialize Vars
bind <- FALSE
# IF ALL ELEMENTS ARE MATRIX-LIKE OR VECTORS, MAKE SURE SAME NUMBER OF COLUMNS
matLike <- sapply(obj, function(x) !is.null(dim(x)))
vecLike <- sapply(obj, is.vector)
# If all matrix-like.
if (all(matLike)) {
maxLng <- max(sapply(obj[matLike], ncol))
obj[matLike] <- lapply(obj[matLike], function(x) t(apply(x, 1, c, rep(filler, maxLng - ncol(x)))))
bind <- TRUE
# If all vector-like
} else if (all(vecLike)) {
maxLng <- max(sapply(obj[vecLike], length))
obj[vecLike] <- lapply(obj[vecLike], function(x) c(x, rep(filler, maxLng - length(x))))
bind <- TRUE
# If all are either matrix- or vector-like
} else if (all(matLike | vecLike)) {
maxLng <- max(sapply(obj[matLike], ncol), sapply(obj[vecLike], length))
# Add in filler's as needed
obj[matLike] <-
lapply(obj[matLike], function(x) t(apply(x, 1, c, rep(filler, maxLng - ncol(x)))))
obj[vecLike] <-
lapply(obj[vecLike], function(x) c(x, rep(filler, maxLng - length(x))))
bind <- TRUE
}
# If processed and ready to be returned, then just clean it up
if(bind) {
ret <- (do.call(rbind, obj))
colnames(ret) <- paste0("L", fw0(1:ncol(ret), digs=2))
return(ret)
}
# Otherwise, if obj is still a list, continue recursively
if (is.list(obj)) {
return(lapply(obj, listFlatten))
}
# If none of the above, return an error.
stop("Unknown object type")
}
#--------------------------------------------
findGroupRanges <- function(booleanVec) {
# returns list of indexes indicating a series of identical values
pivots <- which(sapply(2:length(booleanVec), function(i) booleanVec[[i]] != booleanVec[[i-1]]))
### THIS ISNT NEEDED...
# if (identical(pivots, numeric(0)))
# pivots <- length(booleanVec)
pivots <- c(0, pivots, length(booleanVec))
lapply(seq(2, length(pivots)), function(i)
seq(pivots[i-1]+1, pivots[i])
)
}
#--------------------------------------------
fw0 <- function(num, digs=NULL, mkSeq=TRUE) {
## formats digits with leading 0's.
## num should be an integer or range of integers.
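## if mkSeq=T, then a num of length 1 will be expanded to seq(1, num).
# TODO 1: put more error check
# lists are handled element by element; return() so the result is not discarded
if (is.list(num))
return(lapply(num, fw0, digs=digs, mkSeq=mkSeq))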
if (!is.vector(num)) {
stop("num should be integer or vector")
}
# convert strings to numbers
num <- as.numeric(num)
# If num is a single number and mkSeq is T, expand to seq(1, num)
if(mkSeq && !length(num)>1)
num <- (1:num)
# number of digits is that of largest number or digs, whichever is max
digs <- max(nchar(max(abs(num))), digs)
# if there are a mix of neg & pos numbers, add a space for pos numbs
posSpace <- ifelse(sign(max(num)) != sign(min(num)), " ", "")
# return: paste appropriate 0's and preface neg/pos mark
sapply(num, function(x) ifelse(x<0,
paste0("-", paste0(rep(0, max(0, digs-nchar(abs(x)))), collapse=""), abs(x)),
paste0(posSpace, paste0(rep(0, max(0, digs-nchar(abs(x)))), collapse=""), x)
))
}
#-----------------------------------------------
You can use dput for this.
dput(dataset, "dataset.txt")
You can also use save():
save(dataset, file="dataset.RData")
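Both files can be read back with the list column intact. A minimal round-trip sketch, using a smaller stand-in data frame and temporary file paths (both are assumptions for illustration):

```r
dataset <- data.frame(a = c(1, 2), b = c("a", "b"))
dataset$l <- list(c("a", "b"), c(2, 3, 4))

# dput() writes R code that recreates the object; dget() evaluates it again
txt_path <- tempfile(fileext = ".txt")
dput(dataset, txt_path)
from_dput <- dget(txt_path)

# save() writes a binary file; load() restores the object under its original name
rdata_path <- tempfile(fileext = ".RData")
save(dataset, file = rdata_path)
restored <- new.env()
load(rdata_path, envir = restored)

is.list(from_dput$l)         # the list column survives dput()/dget()
is.list(restored$dataset$l)  # and save()/load()
```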
The answer provided by @Ananda is excellent. However, I ran into an issue when I had a data frame with two columns that were lists.
dataset<-data.frame(a=c(1,2,3,4),b=c('a','b','c','d'), c=c("HI","DD","gg","ff"))
g=list(c("a","b"),c(2,3,4), c(44,33,11,22),c("chr","ID","i","II"))
dataset$l<-g
dataset$l2<-g
dataset
a b c l l2
1 1 a HI a, b a, b
2 2 b DD 2, 3, 4 2, 3, 4
3 3 c gg 44, 33, 11, 22 44, 33, 11, 22
4 4 d ff chr, ID, i, II chr, ID, i, II
Using the original answer, both list columns contain the concatenated contents of both columns.
a b c l l2
1 1 a HI a, b, a, b a, b, a, b
2 2 b DD 2, 3, 4, 2, 3, 4 2, 3, 4, 2, 3, 4
3 3 c gg 44, 33, 11, 22, 44, 33, 11, 22 44, 33, 11, 22, 44, 33, 11, 22
4 4 d ff chr, ID, i, II, chr, ID, i, II chr, ID, i, II, chr, ID, i, II
Instead, try this modified version:
dataset2 <- dataset # make a copy just to be on the safe side
dataset2[sapply(dataset2, is.list)] <-
sapply(dataset2[sapply(dataset2, is.list)],
function(x)sapply(x, function(y) paste(unlist(y),collapse=", ") ) )
dataset2
a b c l l2
1 1 a HI a, b a, b
2 2 b DD 2, 3, 4 2, 3, 4
3 3 c gg 44, 33, 11, 22 44, 33, 11, 22
4 4 d ff chr, ID, i, II chr, ID, i, II
I stumbled across this and while there are a lot of great answers, I ended up doing something else. Sharing for posterity.
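library(dplyr)
library(readr) # provides write_csv()
flatten_list = function(x){
if (typeof(x) != "list") {
return(x)
}
sapply(x, function(y) paste(y, collapse = " | "))
}
# mutate_each() and funs() are deprecated in current dplyr; across() replaces them
data %>%
mutate(across(everything(), flatten_list)) %>%
write_csv("data.csv")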