I am new to programming and have got stuck. I want to calculate the hourly temperature variation of an object throughout the year from several variables that change every hour. The original data has 60 columns and 8760 rows.
I get the desired output with a for loop, but the model takes a long time to run. Is there a way to replace the loop with functions, which I suspect would also speed up the calculation?
Here is a small reproducible example of what I did:
library(data.table)
table <- data.table("A" = c(1), "B" = c(1:5), "C" = c(10))
table
A B C
1: 1 1 10
2: 1 2 10
3: 1 3 10
4: 1 4 10
5: 1 5 10
The for loop:
for (j in (2: nrow(table))) {
table$A[j] = (table$A[j-1] + table$B[j-1]) * table$B[j]
table$C[j] = table$B[j] * table$A[j]
}
I got the output as I desired:
A B C
1: 1 1 10
2: 4 2 8
3: 18 3 54
4: 84 4 336
5: 440 5 2200
but the whole program takes 15 minutes to run on my real data (not on this example).
So I tried to use a function instead of the for loop.
I tried this:
library(dplyr)
table <- data.table("A" = c(1), "B" = c(1:5), "C" = c(10))
myfun <- function(df){
df = df %>% mutate(A = (lag(A) + lag(B)) * B,
C = B * A)
return(df)
}
myfun(table)
But the output was
A B C
1 NA 1 NA
2 4 2 8
3 9 3 27
4 16 4 64
5 25 5 125
It seems that the function refers to the rows of the original table, not to the rows updated during the calculation. Is there any way to obtain the desired output using functions? This is my first R project, so any help is very much appreciated. Thank you.
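The mismatch can be reproduced in base R without dplyr. The sketch below is only an illustration (the helper `shift` and the name `A_vec` are made up here): it evaluates the same formula in one vectorised step, so every row sees the original A rather than the value just computed for the previous row.

```r
# Columns from the example table
A <- rep(1, 5)
B <- 1:5

# Hypothetical helper: shift a vector down by one position, like dplyr::lag()
shift <- function(x) c(NA, head(x, -1))

# Vectorised version: shift(A) is built from the ORIGINAL A,
# so the result is not the running recurrence the loop computes
A_vec <- (shift(A) + shift(B)) * B
print(A_vec)  # NA 4 9 16 25
```

This matches the unexpected output above: each value is (1 + B[i-1]) * B[i], because A is still 1 everywhere when the expression is evaluated.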
A much faster alternative using data.table. Note that the calculation of C can be separated from the calculation of A so we can do less within the loop:
for (i in 2:nrow(table)) {
set(table, i = i, j = "A", value = with(table, (A[i-1] + B[i-1]) * B[i]))
}
table[-1, C := A * B]
table
# A B C
# <num> <int> <num>
# 1: 1 1 10
# 2: 4 2 8
# 3: 18 3 54
# 4: 84 4 336
# 5: 440 5 2200
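Since each A[i] depends only on A[i-1] and B, another option is to run the recurrence on plain vectors and write the result back in a single assignment. This is a minimal base R sketch, not part of the answer above; `recur_A` is a hypothetical helper name.

```r
# Hypothetical helper: A[i] = (A[i-1] + B[i-1]) * B[i], starting from a1
recur_A <- function(a1, b) {
  n <- length(b)
  a <- numeric(n)
  a[1] <- a1
  for (i in seq_len(n)[-1]) {
    a[i] <- (a[i - 1] + b[i - 1]) * b[i]
  }
  a
}

B <- 1:5
A <- recur_A(1, B)
C <- c(10, (A * B)[-1])  # the first row keeps its original C, as in the loop
print(A)  # 1 4 18 84 440
print(C)  # 10 8 54 336 2200
```

With a data.table, the vector could then be assigned in one step, for example table[, A := recur_A(A[1], B)].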
You can try Reduce, as below:
dt[
,
A := Reduce(function(x, Y) (x + Y[2]) * Y[1],
asplit(embed(B, 2), 1),
init = A[1],
accumulate = TRUE
)
][
,
C := A * B
]
which updates dt as
> dt
A B C
1: 1 1 1
2: 4 2 8
3: 18 3 54
4: 84 4 336
5: 440 5 2200
Data:
dt <- data.table("A" = c(1), "B" = c(1:5), "C" = c(10))
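To see why Y[1] is the current B and Y[2] the previous one, it may help to look at what embed returns. The sketch below is a base R illustration only; the names `pairs` and `step` are made up here, and the fold runs over row indices rather than over asplit.

```r
B <- 1:5
pairs <- embed(B, 2)  # row k is c(B[k + 1], B[k]): current value, then previous
print(pairs)

# Same recurrence as the Reduce call above, indexing the rows of `pairs`
step <- function(a_prev, k) (a_prev + pairs[k, 2]) * pairs[k, 1]
A <- Reduce(step, seq_len(nrow(pairs)), accumulate = TRUE, init = 1)
print(A)  # 1 4 18 84 440
```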
Here's a solution using purrr::accumulate2 which lets you use the result of the previous computation as the input to the next one:
library(data.table)
library(purrr)
library(magrittr)
table <- data.table("A" = c(1), "B" = c(1:5), "C" = c(10))
table$A <- accumulate2(
table$A,
seq(table$A),
~ (..1 + table$B[..3]) * table$B[..3 + 1],
.init = table$A[1]
) %>%
unlist() %>%
extract(1:nrow(table))
table$C <- table$B * table$A
table
# A B C
# 1: 1 1 1
# 2: 4 2 8
# 3: 18 3 54
# 4: 84 4 336
# 5: 440 5 2200
I have a large matrix:
id v1 v2 v3 v4 v5 v6 v7 v8
1001 37 15 30 37 4 11 35 37
2111 44 31 44 30 24 39 44 18
3121 43 49 39 34 44 43 26 24
4532 45 31 26 33 12 47 37 15
5234 23 27 34 23 30 34 23 4
6345 9 46 39 34 8 43 26 24
For each row (id), I would like to identify intervals of numbers in columns v1 to v8. An interval is defined here as a sequence of numbers that starts and ends with the same number.
For example, in the first row, there are two sequences that both start and end with 37: from column 1 to 4 (37, 15, 30, 37) and from column 4 to column 8 (37, 4, 11, 35, 37).
The focal value should only occur in start and end positions. For example, in the first row, the sequence from 37 at V1, to 37 at V8 is not included, because 37 also occurs in V4.
For each interval, I want the index of the start and end columns, the focal start and end value, and the sequence of numbers in between.
Desired output:
1001 [v1] to [v4] 37 to 37: 15,30
1001 [v4] to [v8] 37 to 37: 4, 11, 35
2111 [v1] to [v3] 44 to 44: 31
2111 [v3] to [v7] 44 to 44: 30, 24, 39
Any suggestions for an algorithm?
I managed to compute the indices for a single vector, but not for a matrix:
a <- which(x == 37)
from <- a[!(a-1) %in% a]
to <- a[!(a+1) %in% a]
rbind(from, to)
A very brute-force method: get the unique elements of the given row, keep those that occur more than once but not side by side, then lapply through each one, extracting the elements of row x between consecutive occurrences.
apply(m, 1, function(x) {
u <- unique(x)
u <- u[sapply(u, function(u) any(diff(which(x == u)) > 1))]
lapply(setNames(u, u), function(u){
ind <- which(x == u)
lapply(seq(length(ind) - 1),
function(i) x[seq(ind[i] + 1, ind[i + 1] - 1)])
})
})
Output:
# [[1]]
# [[1]]$`37`
# [[1]]$`37`[[1]]
# [1] 15 30
#
# [[1]]$`37`[[2]]
# [1] 4 11 35
#
#
#
# [[2]]
# [[2]]$`44`
# [[2]]$`44`[[1]]
# [1] 31
#
# [[2]]$`44`[[2]]
# [1] 30 24 39
#
#
#
# [[3]]
# [[3]]$`43`
# [[3]]$`43`[[1]]
# [1] 49 39 34 44
#
#
#
# [[4]]
# named list()
#
# [[5]]
# [[5]]$`23`
# [[5]]$`23`[[1]]
# [1] 27 34
#
# [[5]]$`23`[[2]]
# [1] 30 34
#
#
# [[5]]$`34`
# [[5]]$`34`[[1]]
# [1] 23 30
#
#
#
# [[6]]
# named list()
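The nested list can also be flattened into one line per interval, in the format requested in the question. This is a rough base R sketch under a few assumptions: the matrix is rebuilt here from the question's table with id as its first column, `format_intervals` is a hypothetical helper, and occurrences that sit side by side are skipped.

```r
# Sample matrix from the question, with id as the first column
m <- matrix(c(
  1001, 37, 15, 30, 37,  4, 11, 35, 37,
  2111, 44, 31, 44, 30, 24, 39, 44, 18,
  3121, 43, 49, 39, 34, 44, 43, 26, 24,
  4532, 45, 31, 26, 33, 12, 47, 37, 15,
  5234, 23, 27, 34, 23, 30, 34, 23,  4,
  6345,  9, 46, 39, 34,  8, 43, 26, 24
), ncol = 9, byrow = TRUE,
dimnames = list(NULL, c("id", paste0("v", 1:8))))

# Hypothetical helper: one formatted string per interval in a single row
format_intervals <- function(row) {
  id <- row[["id"]]
  x <- row[-1]
  out <- character(0)
  for (u in unique(x)) {
    ind <- which(x == u)
    for (k in seq_len(length(ind) - 1)) {
      if (ind[k + 1] - ind[k] > 1) {  # skip side-by-side occurrences
        between <- x[(ind[k] + 1):(ind[k + 1] - 1)]
        out <- c(out, sprintf("%s [%s] to [%s] %s to %s: %s",
                              id, names(x)[ind[k]], names(x)[ind[k + 1]],
                              u, u, paste(between, collapse = ", ")))
      }
    }
  }
  out
}

res <- unlist(lapply(seq_len(nrow(m)), function(i) format_intervals(m[i, ])))
print(res[1:2])
# "1001 [v1] to [v4] 37 to 37: 15, 30"
# "1001 [v4] to [v8] 37 to 37: 4, 11, 35"
```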
Edit: Henrik's answer inspired me to do a join-based version
library(data.table)
library(magrittr)
d <- melt(as.data.table(m), "id", variable.name = 'ci')[, ci := rowid(id)]
setorder(d, id)
options(datatable.nomatch = 0)
d[d, on = .(id, value, ci > ci)
, .(id, value, i.ci, x.ci)
, mult = 'first'] %>%
.[d, on = .(id, i.ci < ci, x.ci > ci)
, .(id, value, from_ci = x.i.ci, to_ci = x.x.ci, i.value)] %>%
.[, .(val = .(i.value))
, by = setdiff(names(.), 'i.value')]
# id value from_ci to_ci val
# 1: 1001 37 1 4 15,30
# 2: 1001 37 4 8 4,11,35
# 3: 2111 44 1 3 31
# 4: 2111 44 3 7 30,24,39
# 5: 3121 43 1 6 49,39,34,44
# 6: 5234 23 1 4 27,34
# 7: 5234 34 3 6 23,30
# 8: 5234 23 4 7 30,34
Here's a data.table alternative.
Convert matrix to data.table and melt to long format. Create a column index 'ci' to keep track of the original columns (rowid(id)). Order by 'id'.
For each 'id' and 'value' (by = .(id, value)), check if the number of rows is larger than one (if(.N > 1)), i.e. if there is at least one sequence. If so, grab the row index (.I) of the sequences and their column indexes (in the original data). For each sequence, grab the corresponding values between the start and end index. Wrap in list twice (.(.(...))) to create a list column.
library(data.table)
d <- melt(as.data.table(m), id.vars = "id")
d[ , `:=`(
ci = rowid(id),
variable = NULL)]
setorder(d, id)
d2 <- d[ , if(.N > 1){
.(from = .I[-.N], to = .I[-1],
from_ci = ci[-.N], to_ci = ci[ -1])
}, by = .(id, value)]
d2[ , val := .(.(d$value[seq(from + 1, to - 1)])), by = 1:nrow(d2)]
d2[ , `:=`(from = NULL, to = NULL)]
# id value from_ci to_ci val
# 1: 1001 37 1 4 15,30
# 2: 1001 37 4 8 4,11,35
# 3: 2111 44 1 3 31
# 4: 2111 44 3 7 30,24,39
# 5: 3121 43 1 6 49,39,34,44
# 6: 5234 23 1 4 27,34
# 7: 5234 23 4 7 30,34
# 8: 5234 34 3 6 23,30
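The pairing of start and end positions (.I[-.N] with .I[-1]) may be easier to follow on a single vector. The sketch below is a base R illustration of that one step, using the first row of the question's data; the variable names are chosen only for this example.

```r
# Row 1001 from the question (columns v1..v8)
x <- c(37, 15, 30, 37, 4, 11, 35, 37)
pos <- which(x == 37)        # 1 4 8

from <- pos[-length(pos)]    # 1 4  (all but the last, like .I[-.N])
to   <- pos[-1]              # 4 8  (all but the first, like .I[-1])

# Values strictly between each start and end position
val <- Map(function(s, e) x[seq(s + 1, e - 1)], from, to)
print(val)  # list(c(15, 30), c(4, 11, 35))
```

Note that seq(s + 1, e - 1) counts backwards when two occurrences are adjacent, so this sketch assumes at least one value between each pair.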
I have created a function in an R package which takes several arguments. One of these arguments is the name of a column for an R data.table.
Let's say I wanted to create a column with all values 42. For R data.table dt, I would do:
dt[, column_name:=42]
For R data.frame, I would do:
df$column_name = 42
I would like the function to take as an argument something that would define column_name. For instance, the function func called by
func(dt, col='hey')
would pass hey as the new name of the data.table column.
Here's a concrete example
renamer = function(colname, dt){
## do calculations on dt
dt[, colname:= 42]
}
If I call the function renamer(colname = 'foo', dt = dt), the resulting column name will still be colname, not the value I passed, 'foo'.
The new column should be named 'foo'.
How can I do this? I've also tried with an R data.frame, and tried something with
setnames(dt, "oldname", "newname")
EDIT: I think this question should be clarified:
Here is a data.table:
> library(data.table)
> DT = data.table(ID = c("b","b","b","a","a","c"), a = 1:6, b = 7:12, c = 13:18)
> DT
ID a b c
1: b 1 7 13
2: b 2 8 14
3: b 3 9 15
4: a 4 10 16
5: a 5 11 17
6: c 6 12 18
I would like to create a function such that the new name of the column will be the string the user passes it.
e.g.
colnamer = function(newcolumname, datatable){
## do calculations on dt
## create a column with whatever string is passed via 'newcolumnname'
}
If the user calls colnamer('foobar', DT), I would like the result to be
> DT
ID a b c foobar
1: b 1 7 13 ...
2: b 2 8 14 ...
3: b 3 9 15 ...
4: a 4 10 16 ...
5: a 5 11 17 ...
6: c 6 12 18 ...
EDIT: Changed to OP's new reproducible example, with two suggestions that work as per OP's problem statement:
library(data.table)
DT <- data.table(ID = c("b","b","b","a","a","c"),
a = 1:6, b = 7:12, c = 13:18)
colnamer1 <- function(newcolumname, datatable) {
## do calculations on dt
## create a column with whatever string is passed via 'newcolumnname'
set(datatable, j = newcolumname, value = 42)
}
colnamer2 <- function(newcolumname, datatable) {
## do calculations on dt
## create a column with whatever string is passed via 'newcolumnname'
datatable[, (newcolumname) := 42]
}
colnamer1("name_me", DT)
colnamer2("name_me_too", DT)
DT
# ID a b c name_me name_me_too
# 1: b 1 7 13 42 42
# 2: b 2 8 14 42 42
# 3: b 3 9 15 42 42
# 4: a 4 10 16 42 42
# 5: a 5 11 17 42 42
# 6: c 6 12 18 42 42
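The reason the original attempt failed is that an unquoted symbol on the left of := is taken literally as the column name; wrapping it in parentheses makes data.table evaluate the variable instead. A small sketch of the difference, using a made-up two-row table and the variable name `col` purely for illustration:

```r
library(data.table)

DT <- data.table(ID = c("b", "a"), a = 1:2)
col <- "foobar"

DT[, col := 1]      # creates a column literally named "col"
DT[, (col) := 42]   # creates a column named "foobar"

print(names(DT))  # "ID" "a" "col" "foobar"
```

Both forms modify DT by reference, so no reassignment is needed after the call.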
A possible data.frame solution? Although ever since adopting data.table my data.frame-ing is a bit rusty. Perhaps there is a more elegant solution for your problem when it comes to a data.frame.
df <- data.frame(ID = c("b","b","b","a","a","c"),
a = 1:6, b = 7:12, c = 13:18)
df_colnamer <- function(name_me, df) {
new_df <- df
new_df[[name_me]] <- 42
new_df
}
new_df <- df_colnamer("foo", df)
new_df
# ID a b c foo
# 1 b 1 7 13 42
# 2 b 2 8 14 42
# 3 b 3 9 15 42
# 4 a 4 10 16 42
# 5 a 5 11 17 42
# 6 c 6 12 18 42
With the data frame below of Locations, Days, and Quantities, I'm searching for a solution to create combinations of quantities by Location across each Day. In production, these combinations may grow pretty large, so a data.table or plyr approach would be appreciated.
library(gtools)
dat <- data.frame(Loc = c(51,51,51,51,51), Day = c("Mon","Mon","Tue","Tue","Wed"),
Qty = c(1,2,3,4,5))
The output for this example should be:
Loc Day Qty
1 51 Mon 1
2 51 Tue 3
3 51 Wed 5
4 51 Mon 1
5 51 Tue 4
6 51 Wed 5
7 51 Mon 2
8 51 Tue 3
9 51 Wed 5
10 51 Mon 2
11 51 Tue 4
12 51 Wed 5
I've tried a few nested lapply calls, which get me close, but then I'm not sure how to take it to the next step and use the combn() function within each store.
lapply(split(dat, dat$Loc), function(x) {
lapply(split(x, x$Day), function(y) {
y$Qty
})
})
I'm able to get the correct combinations if each Store > Day group is in its own list, but I am struggling to see how to get there from a data frame using a split-apply-combine method.
loc51_mon <- c(1,2)
loc51_tue <- c(3,4)
loc51_wed <- c(5)
unlist(lapply(loc51_mon, function(x) {
lapply(loc51_tue, function(y) {
lapply(loc51_wed, function(z) {
combn(c(x,y,z), 3)
})
})
}), recursive = FALSE)
[[1]]
[[1]][[1]]
[,1]
[1,] 1
[2,] 3
[3,] 5
[[2]]
[[2]][[1]]
[,1]
[1,] 1
[2,] 4
[3,] 5
[[3]]
[[3]][[1]]
[,1]
[1,] 2
[2,] 3
[3,] 5
[[4]]
[[4]][[1]]
[,1]
[1,] 2
[2,] 4
[3,] 5
This should work; however, further complexity would require changes to the function:
library(data.table)
dat <- data.frame(Loc = c(51,51,51,51,51), Day = c("Mon","Mon","Tue","Tue","Wed"),
Qty = c(1,2,3,4,5), stringsAsFactors = F)
setDT(dat)
comb_in <- function(Qty_In,Day_In){
temp_df <- aggregate(Qty_In ~ Day_In, cbind(Qty_In, as.character(Day_In)), paste, collapse = "|")
temp_list <- strsplit(temp_df$Qty_In, split = "|", fixed = T)
names(temp_list) <- as.character(temp_df$Day_In)
melt(as.data.table(expand.grid(temp_list))[, case_group := .I], id.vars = "case_group", variable.name = "Day", value.name = "Qty")
}
dat[, comb_in(Qty_In = Qty, Day_In = Day), by = Loc][order(Loc,case_group,Day)]
Loc case_group Day Qty
1: 51 1 Mon 1
2: 51 1 Tue 3
3: 51 1 Wed 5
4: 51 2 Mon 2
5: 51 2 Tue 3
6: 51 2 Wed 5
7: 51 3 Mon 1
8: 51 3 Tue 4
9: 51 3 Wed 5
10: 51 4 Mon 2
11: 51 4 Tue 4
12: 51 4 Wed 5
You can now filter by case_group to get each combination
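For comparison, the core of the idea (one vector of quantities per day, then every combination across days) can be sketched in base R with split and expand.grid. This is a simplified illustration in wide format rather than the long format above, and `combos_by_loc` is a name chosen only for this example.

```r
dat <- data.frame(Loc = c(51, 51, 51, 51, 51),
                  Day = c("Mon", "Mon", "Tue", "Tue", "Wed"),
                  Qty = c(1, 2, 3, 4, 5),
                  stringsAsFactors = FALSE)

# For each location: split Qty by Day, then take one value per day
# in every possible way. Days come out in alphabetical order.
combos_by_loc <- lapply(split(dat, dat$Loc), function(d) {
  expand.grid(split(d$Qty, d$Day))
})

print(combos_by_loc[["51"]])
#   Mon Tue Wed
# 1   1   3   5
# 2   2   3   5
# 3   1   4   5
# 4   2   4   5
```

Each row is one combination; the row number plays the role of case_group.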
This question is quite similar to "How to expand.grid on vectors sets rather than single elements".
For a general approach (performance is likely to be slower than a problem-specific approach):
permu.sets <- function(listoflist) {
#assumes that each list within listoflist contains vectors of equal lengths
temp <- expand.grid(listoflist)
do.call(cbind, lapply(temp, function(x) do.call(rbind, x)))
} #permu.sets
#for the problem posted in OP
dat <- data.frame(Loc = c(51,51,51,51,51), Day = c("Mon","Mon","Tue","Tue","Wed"),
Qty = c(1,2,3,4,5))
vecsets <- lapply(split(dat, dat$Day), function(x) split(as.matrix(x), row(x)))
res <- permu.sets(vecsets)
lapply(split(res, seq(nrow(res))), function(x) matrix(x, ncol=3, byrow=T ))
I have a list with 138 tables in it (prop.table). Each table can have up to 20 variables in it (numerical categories ranging from 11-95 as the colnames). I need to convert this list to a master dataframe. The first three tables look like this:
[[1]]
x
21 41 42 43 52 71 81 82
0.02007456 0.58158876 0.22483510 0.09349011 0.05248064 0.01204474 0.00544881 0.01003728
[[2]]
x
21 41 42 43 52 71 90
0.01175122 0.36973345 0.34107194 0.03066781 0.08655775 0.01633706 0.14388077
[[3]]
x
21 22 23 41 42
0.043254082 0.008307075 0.016614151 0.930392438 0.001432254
I need to convert this to a matrix so it looks like this, with NAs or 0 when the categorical variable is not available:
x<-matrix (nrow=3, ncol=11 )
colnames(x) <-c('21', '22', '23', '41', '42', '43', '52', '71', '81', '82', '90' )
I have tried using this line from a previous similar question but the table is not correct:
df <- data.frame(matrix(unlist(prop.table), nrow=138, byrow=T))
Any suggestions on how to resolve this issue and get the table I need?
Is this what you want?
x1 <- c(1, 5, 7)
names(x1) <- 1:3
x2 <- c(1, 2, 7)
names(x2) <- c(1,3,5)
l <- list(x1, x2)
m <- matrix(nrow=length(l), ncol=5)
colnames(m) <- 1:5
for (i in 1:length(l)) {
m[i, names(l[[i]])] <- l[[i]]
}
Maybe one can replace the loop with an apply function, but I'm not sure. Basically, I loop through the list and, in every row of the matrix, set those columns that match the names of the vector in the list.
Sorry for not using your data set, but you didn't provide code for it and I was too lazy to type it out.
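The same name-matching idea can be written without an explicit loop. The sketch below is one possible base R version, using the first three tables from the question typed in as named vectors; the names `tabs`, `cols` and `m` are chosen only for this example.

```r
# First three tables from the question, as named numeric vectors
tabs <- list(
  c("21" = 0.02007456, "41" = 0.58158876, "42" = 0.22483510, "43" = 0.09349011,
    "52" = 0.05248064, "71" = 0.01204474, "81" = 0.00544881, "82" = 0.01003728),
  c("21" = 0.01175122, "41" = 0.36973345, "42" = 0.34107194, "43" = 0.03066781,
    "52" = 0.08655775, "71" = 0.01633706, "90" = 0.14388077),
  c("21" = 0.043254082, "22" = 0.008307075, "23" = 0.016614151,
    "41" = 0.930392438, "42" = 0.001432254)
)

# All category names that occur anywhere in the list
cols <- sort(unique(unlist(lapply(tabs, names))))

# Index each vector by the full set of names; missing categories become NA
m <- t(sapply(tabs, function(v) setNames(v[cols], cols)))
print(dim(m))  # 3 11

# Optionally replace NA with 0
m0 <- m
m0[is.na(m0)] <- 0
```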
rbind.fill from the plyr package will do just this for you:
# make an example `prop.table`:
tbl <- 1:10
names(tbl) <- letters[1:10]
tbl <- as.matrix(tbl)
# make sure some of the columns are missing
prop.table <- list(tbl[sample(10, size=8),], tbl[sample(10, size=7),], tbl[sample(10, size=9),])
# [[1]]
# d b g c h f e i
# 4 2 7 3 8 6 5 9
# [[2]]
# h g d a j f c
# 8 7 4 1 10 6 3
# [[3]]
# c i b d j a h g e
# 3 9 2 4 10 1 8 7 5
You can use the rbind.fill function from plyr, which is just rbind but it fills missing columns out with NA. It can take in a list of data frames to rbind together, so I convert each element of prop.table into a dataframe first (needed the t to ensure each prop.table[[i]] was treated as a row, not a column)
library(plyr)
rbind.fill(lapply(prop.table, function (x) as.data.frame(t(x))))
# d b g c h f e i a j
# 1 4 2 7 3 8 6 5 9 NA NA
# 2 4 NA 7 3 8 6 NA NA 1 10
# 3 4 2 7 3 8 NA 5 9 1 10
(Note - you can sort the columns of the output dataframe with x[, order(colnames(x))])
Here is a simple way using lapply, rbind and do.call:
ptl
## [[1]]
## x
## 21 41 42 43 52 71 81 82
## 0.02007456 0.58158876 0.22483510 0.09349011 0.05248064 0.01204474 0.00544881 0.01003728
##
## [[2]]
## x
## 21 41 42 43 52 71 90
## 0.01175122 0.36973345 0.34107194 0.03066781 0.08655775 0.01633706 0.14388077
##
## [[3]]
## x
## 21 22 23 41 42
## 0.043254082 0.008307075 0.016614151 0.930392438 0.001432254
##
## [[4]]
## x
## 21 22 31 41 42 43 81
## 0.10028653 0.03123209 0.00487106 0.66103152 0.03037249 0.01604585 0.15616046
##
## [[5]]
## x
## 21 41 42 43 81
## 0.0662080825 0.8291774147 0.0005732302 0.0865577529 0.0174835196
##
## [[6]]
## x
## 21 22 31 41 42 43 81
## 0.081948424 0.002292264 0.006303725 0.825501433 0.029226361 0.020630372 0.034097421
##
# Get unique names of all columns in tables in the list
resCol <- unique(unlist(lapply(ptl, names)))
# Get dimensions of desired result
nresCol <- length(resCol)
nresRow <- length(ptl)
# Create 'Template' data.frame row
DF <- as.data.frame(matrix(rep(0, nresCol), nrow = 1, dimnames = list(1, resCol)))
# for every table in list, create copy of DF, fill it appropriately, then rbind result together using do.call
result <- do.call(rbind, lapply(ptl, function(x) {
retDF <- DF
retDF[, names(x)] <- x
return(retDF)
}))
# rename rows(optional)
rownames(result) <- 1:nrow(result)
result
## 21 41 42 43 52 71 81 82 90 22 23 31
## 1 0.02007456 0.5815888 0.2248351018 0.09349011 0.05248064 0.01204474 0.00544881 0.01003728 0.0000000 0.000000000 0.00000000 0.000000000
## 2 0.01175122 0.3697334 0.3410719404 0.03066781 0.08655775 0.01633706 0.00000000 0.00000000 0.1438808 0.000000000 0.00000000 0.000000000
## 3 0.04325408 0.9303924 0.0014322544 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000 0.0000000 0.008307075 0.01661415 0.000000000
## 4 0.10028653 0.6610315 0.0303724928 0.01604585 0.00000000 0.00000000 0.15616046 0.00000000 0.0000000 0.031232092 0.00000000 0.004871060
## 5 0.06620808 0.8291774 0.0005732302 0.08655775 0.00000000 0.00000000 0.01748352 0.00000000 0.0000000 0.000000000 0.00000000 0.000000000
## 6 0.08194842 0.8255014 0.0292263610 0.02063037 0.00000000 0.00000000 0.03409742 0.00000000 0.0000000 0.002292264 0.00000000 0.006303725
I'm just going to suggest one solution. How about concatenating all of the lists into one? Then you would have:
MyDataFrame
variable1 1 1 1 1 1 1 1 1
variable2 21 41 42 43 52 71 81 82
variable3 0.02007456 0.58158876 0.22483510 0.09349011 0.05248064 0.01204474 0.00544881 0.01003728
variable1 2 2 2 2 2 2 2
variable2 21 41 42 43 52 71 90
variable3 0.01175122 0.36973345 0.34107194 0.03066781 0.08655775 0.01633706 0.14388077
variable1 3 3 3 3 3
variable2 21 22 23 41 42
variable3 0.043254082 0.008307075 0.016614151 0.930392438 0.001432254
Once you have a single data frame, you can use the reshape package, like this:
install.packages('reshape')
library('reshape')
cast(MyDataFrame, variable1~variable2)
This won't be the most efficient, but using plyr and reshape2, and assuming your list of prop.tables is called foo:
library(plyr)
library(reshape2)
allData <- dcast(ldply(lapply(seq_along(foo), function(x) data.frame(foo[[x]], id = x))),
id ~ x, value.var = 'Freq')
or, more straightforwardly:
ff <- c('21', '22', '23', '41', '42', '43', '52', '71', '81', '82', '90')
t(sapply(foo, function(x) x[ff]))