Using approx() with groups in dplyr - r

I am trying to use approx() and dplyr to interpolate values in an existing array. My initial code looks like this ...
p = c(1,1,1,2,2,2)
q = c(1,2,3,1,2,3)
r = c(1,2,3,4,5,6)
Inputs<- data.frame(p,q,r)
new.inputs= as.numeric(c(1.5,2.5))
library(dplyr)
Interpolated <- Inputs %>%
group_by(p) %>%
arrange(p, q) %>%
mutate(new.output=approx(x=q, y=r, xout=new.inputs)$y)
I expect to see 1.5, 2.5, 4.5, 5.5 but instead I get
Error: incompatible size (2), expecting 3 (the group size) or 1
Can anyone tell me where I am going wrong?

The error occurs because mutate() requires the result to have either the length of the group (3 here) or length 1, while approx() returns one value per element of new.inputs (2 here). summarise() can return several rows per group, so you can get the values you expect using dplyr.
library(dplyr)
Inputs %>%
group_by(p) %>%
arrange(p, q, .by_group = TRUE) %>%
summarise(new.outputs = approx(x = q, y = r, xout = new.inputs)$y)
#       p new.outputs
#   <dbl>       <dbl>
# 1     1         1.5
# 2     1         2.5
# 3     2         4.5
# 4     2         5.5
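In dplyr 1.1.0 and later, returning more than one row per group from summarise() triggers a deprecation warning, and reframe() is the verb intended for this case. A minimal sketch (assuming dplyr >= 1.1.0):
library(dplyr)
Inputs %>%
  group_by(p) %>%
  arrange(p, q, .by_group = TRUE) %>%
  reframe(new.output = approx(x = q, y = r, xout = new.inputs)$y)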
You can also get the values you expect using the ddply function from plyr.
library(plyr)
# Output as coordinates
ddply(Inputs, .(p), summarise, new.output = paste(approx(
x = q, y = r, xout = new.inputs
)$y, collapse = ","))
# p new.output
# 1 1.5,2.5
# 2 4.5,5.5
#######################################
# Output as flattened per group p
ddply(Inputs,
.(p),
summarise,
new.output = approx(x = q, y = r, xout = new.inputs)$y)
# p new.output
# 1 1.5
# 1 2.5
# 2 4.5
# 2 5.5

Related

fast looping for user written functions

I have written two functions:
a) the first creates simulated data and estimates a model
b) the second iterates this process a number of times and averages the statistics from multiple simulations.
The third step I would like to do is to iterate this process across different sample sizes. I know how to do this with a for loop but it takes very long. Does anyone have suggestions on how to improve looping speed?
In particular, I would be interested in using parallel processing or evaluating alternative looping packages like purrr.
Here is an example:
# the first function simulates data and estimates the model
library(dplyr)
library(estimatr)
library(tibble)
genmodel <- function(n, meanx, meany){
  df <- as.data.frame(list(mean_x = rnorm(n = n, mean = meanx, sd = 1)))
  df <- df %>% mutate(mean_y = rnorm(n = n, mean = meany, sd = 1))
  model <- lm_robust(mean_y ~ mean_x, data = df, se_type = "stata")
  pval <- as.data.frame(list(p = summary(model)$coefficients)) %>% t()
  pval <- as.data.frame(pval) %>% rownames_to_column()
  return(pval)
}
# example
> genmodel(n=100,meanx=2,meany=1)
rowname (Intercept) mean_x
1 p.Estimate 9.984653e-01 -0.05115484
2 p.Std..Error 2.027905e-01 0.10273142
3 p.t.value 4.923630e+00 -0.49794738
4 p.Pr...t.. 3.441203e-06 0.61963671
5 p.CI.Lower 5.960341e-01 -0.25502201
6 p.CI.Upper 1.400896e+00 0.15271232
7 p.DF 9.800000e+01 98.00000000
The second function iterates the first function a number of times and averages the estimated statistics:
library(data.table)
library(purrr)
average_model <- function(nrep = 100, # number of simulations
                          n,
                          mean_x,
                          mean_y){
  tmpres <- lapply(1:nrep, function(x) genmodel(n = n, meanx = mean_x, meany = mean_y))
  tmpres <- do.call(rbind, tmpres)
  vec <- names(tmpres[2:ncol(tmpres)])
  tmpres <- unique(setDT(tmpres)[, paste("avg", (vec), sep = "_") := map(.SD, ~ mean(.x)), by = rowname, .SDcols = (vec)
                                 ][, nobs := n] %>% select(rowname, `avg_(Intercept)`, avg_mean_x, nobs))
  tmpres
}
# example
tst<-average_model(nrep=50,n=100,mean_x=2,mean_y=1)
rowname avg_(Intercept) avg_mean_x nobs
1: p.Estimate 1.06002378 -0.03100749 100
2: p.Std..Error 0.22368299 0.09921118 100
3: p.t.value 4.83878275 -0.31190506 100
4: p.Pr...t.. 0.00206157 0.45198433 100
5: p.CI.Lower 0.61613217 -0.22788884 100
6: p.CI.Upper 1.50391540 0.16587386 100
7: p.DF 98.00000000 98.00000000 100
Now my objective is to iterate this average_model function over different sample sizes and to collect everything in a single data frame. This can be easily done using a for loop
results <- data.frame()
for (i in seq(from = 100, to = 500, by = 30)){
  tmpres <- average_model(nrep = 50, n = i, mean_x = 2, mean_y = 1)
  results <- rbind(results, tmpres) # sequentially append the results
}
head(results)
rowname avg_(Intercept) avg_mean_x nobs
1: p.Estimate 1.001296821 0.000989775 100
2: p.Std..Error 0.224800002 0.099078646 100
3: p.t.value 4.530076894 0.027428073 100
4: p.Pr...t.. 0.001934362 0.504152193 100
5: p.CI.Lower 0.555188534 -0.195628574 100
6: p.CI.Upper 1.447405108 0.197608124 100
# it can also be done using `lapply`, but both approaches are quite slow
tmpres <- lapply(seq(from = 100, to = 500, by = 30),
                 function(x) average_model(nrep = 50, n = x, mean_x = 2, mean_y = 1))
tmpres <- do.call(rbind, tmpres)
The problem with this for loop is that it is extremely slow.
Is there a way I could do this using parallel processing? Other suggestions for reducing running time?
This "all data.table" approach is about twice as fast, but still disappointing.
The basic idea is to assemble all the datasets into one large data.table and then cycle through the models using data.table group by.
library(data.table)
library(estimatr)
library(tictoc)
##
tic()
mf <- data.table(nrep=1:50, meanx=2, meany=1)
mf <- mf[, .(n=seq(100, 500, 30)), by=.(nrep, meanx, meany)]
data <- mf[, .(mean_x=rnorm(n, meanx), mean_y=rnorm(n, meany)), by=.(n, nrep, meanx, meany)]
result <- data[, as.data.table(t(summary(lm_robust(mean_y ~ mean_x, .SD, se_type = 'stata'))$coefficients), keep.rownames = TRUE)
               , by = .(n, nrep, meanx, meany)][, nrep := NULL]
result <- result[, lapply(.SD, mean), by=.(n, meanx, meany, rn)]
toc()
## 2.58 sec elapsed
So this takes between 2.3 - 2.6 sec on my machine, whereas your code runs in about 4.0 - 4.1 sec. About 80% of the time is spent running lm_robust(...). If I swap that out for lm(...) in base R it runs in about 1 sec.
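Independently of these changes, the outer loop over sample sizes can also be run in parallel, as the question asks; a minimal sketch using the parallel package (assuming average_model() from the question is already defined):
library(parallel)
# one call of average_model() per sample size, spread over the available cores
# (mclapply forks on Linux/macOS; on Windows use makeCluster() + parLapply())
res_list <- mclapply(
  seq(from = 100, to = 500, by = 30),
  function(n) average_model(nrep = 50, n = n, mean_x = 2, mean_y = 1),
  mc.cores = max(1, detectCores() - 1)
)
results <- do.call(rbind, res_list)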
This can be done in a more straightforward way (the functions below are all loaded by library(tidyverse); lm_robust comes from estimatr):
library(tidyverse)
expand_grid(
nreps = 50,
n = seq.default(100, 500, by = 30),
mean_x = 2, mean_y = 1
) %>%
rowid_to_column("n_idx") %>%
uncount(nreps, .remove = FALSE) %>%
rowid_to_column("nreps_idx") %>%
rowwise() %>%
mutate(
lm_robust =
estimatr::lm_robust(
y ~ X,
data =
tibble(y = rnorm(n, mean = mean_y, sd = 1),
X = rnorm(n, mean = mean_x, sd = 1)),
se_type = "stata"
) %>%
coefficients() %>%
set_names(str_c("coef_", names(.))) %>%
list()
) %>%
unnest_wider(lm_robust) %>%
group_by(nreps_idx) %>%
summarise(
n = unique(n),
across(starts_with("coef"), mean),
)
Which results in
# A tibble: 700 × 4
#    nreps_idx     n `coef_(Intercept)`  coef_X
#        <int> <dbl>              <dbl>   <dbl>
#  1         1   100              1.34  -0.183
#  2         2   100              0.845  0.0188
#  3         3   100              0.949  0.0341
#  4         4   100              1.20  -0.0705
#  5         5   100              0.731  0.0419
#  6         6   100              0.809  0.0564
#  7         7   100              0.920  0.0558
#  8         8   100              1.22  -0.0673
#  9         9   100              1.22  -0.171
# 10        10   100              1.26  -0.127
# … with 690 more rows
Which is computed fairly quickly.
Now, I've not included all the parameters in your code, because honestly it doesn't make sense to take the mean of them, but if you want them as well then...
expand_grid(
nreps = 50,
n = seq.default(100, 500, by = 30),
mean_x = 2, mean_y = 1
) %>%
rowid_to_column("n_idx") %>%
uncount(nreps, .remove = FALSE) %>%
rowid_to_column("nreps_idx") %>%
rowwise() %>%
mutate(
lm_robust =
estimatr::lm_robust(
y ~ X,
data =
tibble(y = rnorm(n, mean = mean_y, sd = 1),
X = rnorm(n, mean = mean_x, sd = 1)),
se_type = "stata"
) %>%
# SECOND APPROACH
summary() %>%
`[[`("coefficients") %>%
as_tibble(rownames = "rowname") %>%
pivot_wider(names_from = "rowname",
values_from = everything()) %>%
# FIRST APPROACH
# coefficients() %>%
# set_names(str_c("coef_", names(.))) %>%
list()
) %>%
unnest_wider(lm_robust) %>%
print() %>%
group_by(nreps_idx) %>%
summarise(
n = unique(n),
across(starts_with("Estimate"), mean),
# insert statements here to summarise the other gathered stuff
)
But this makes things unnecessarily complicated.

Graph learning in R, igraph, tidygraph

I have a graph with each node having a value (value in red).
I would like to do the following two things (I guess 1 is a special case of 2):
Each node should be assigned the mean of the values of the direct peers directing to it. For example, node #5: (1+2)/2 = 1.5, or node #3: (0+2+0)/3 = 2/3.
Instead of direct neighbors only, include all connected nodes, but weighted by 1/n, with n being the distance to the node: the further away the information comes from, the weaker the signal.
I looked into the functions of igraph, but could not find anything that does this (I might have overlooked something, though). How could I do this computation?
Below is the code for a sample network with random values.
library(tidyverse)
library(tidygraph)
library(ggraph)
set.seed(6)
q <- tidygraph::play_erdos_renyi(6, p = 0.2) %>%
mutate(id = row_number(),
value = sample(0:3, size = 6, replace = T))
q %>%
ggraph(layout = "with_fr") +
geom_edge_link(arrow = arrow(length = unit(0.2, "inches"),
type = "closed")) +
geom_node_label(aes(label = id)) +
geom_node_text(aes(label = value), color = "red", size = 7,
nudge_x = 0.2, nudge_y = 0.2)
Edit, found a solution to 1
q %>%
mutate(value_smooth = map_local_dbl(order = 1, mindist = 1, mode = "in",
.f = function(neighborhood, ...) {
mean(as_tibble(neighborhood, active = 'nodes')$value)
}))
Edit 2, solution to 2, not the most elegant I guess
q %>%
mutate(value_smooth = map_local_dbl(order = 1, mindist = 0, mode = "in",
.f = function(neighborhood, node, ...) {
ne <- neighborhood
ne <- ne %>%
mutate(d = node_distance_to(which(as_tibble(ne,
active = "nodes")$id == node)))
as_tibble(ne, active = 'nodes') %>%
filter(d != 0) %>%
mutate(helper = value/d) %>%
summarise(m = mean(helper)) %>%
pull(m)
}))
Edit 3, a faster alternative to map_local_dbl
map_local loops through all nodes of the graph. For large graphs, this takes very long. For just computing the means, this is not needed. A much faster alternative is to use the adjacency matrix and some matrix multiplication.
q_adj <- q %>%
igraph::as_adjacency_matrix()
# out
(q_adj %*% as_tibble(q)$value) / Matrix::rowSums(q_adj)
# in
(t(q_adj) %*% as_tibble(q)$value) / Matrix::colSums(q_adj)
The square of the adjacency matrix is the second order adjacency matrix, and so forth. So a solution to problem 2 could also be created.
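For instance, here is a rough sketch of the 1/distance weighting built from matrix powers (a dense base-R version for clarity; it assumes the diffusion is truncated at max_order steps and that each peer is weighted by 1/k at the distance k at which it is first reached):
vals <- as_tibble(q)$value
A_in <- t(as.matrix(igraph::as_adjacency_matrix(q)))  # A_in[i, j] = 1 if j -> i
max_order <- 3
reached <- A_in > 0   # in-neighbours already counted (distance 1)
w <- A_in             # weights: 1/1 for distance 1
walk <- A_in
for (k in 2:max_order) {
  walk <- walk %*% A_in            # walks of length k
  new_k <- (walk > 0) & !reached   # peers first reached at distance k
  w <- w + new_k / k               # add them with weight 1/k
  reached <- reached | new_k
}
(w %*% vals) / rowSums(w)          # distance-weighted mean of peers' values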
Edit 4, direct weighted mean
Say the original graph has weights associated to each edge.
q <- q %>%
activate(edges) %>%
mutate(w = c(1,0.5,1,0.5,1,0.5,1)) %>%
activate(nodes)
We would like to compute the weighted mean of the direct peers' value.
q_adj_wgt <- q %>%
igraph::as_adjacency_matrix(attr = "w")
# out
(q_adj_wgt %*% as_tibble(q)$value) / Matrix::rowSums(q_adj_wgt)
# in
(t(q_adj_wgt) %*% as_tibble(q)$value) / Matrix::colSums(q_adj_wgt)
Probably you can try the code below
q %>%
set_vertex_attr(
name = "value",
value = sapply(
ego(., mode = "in", mindist = 1),
function(x) mean(x$value)
)
)
which gives
# A tbl_graph: 6 nodes and 7 edges
#
# A directed simple graph with 1 component
#
# Node Data: 6 x 2 (active)
id value
<int> <dbl>
1 1 0.5
2 2 NaN
3 3 0.667
4 4 NaN
5 5 1.5
6 6 NaN
#
# Edge Data: 7 x 2
from to
<int> <int>
1 3 1
2 6 1
3 1 3
# ... with 4 more rows
Each node should be assigned the mean of the value of the direct peers
directing to it.
Guessing that you really mean
Each node should be assigned the mean of the values of the direct peers directing to it, before any node values were changed
This seems trivial - maybe I am missing something? The two passes are sketched in code after this list.
1. Loop over the nodes
2. Sum the values of the adjacent nodes
3. Calculate the mean and store it in a vector by node index
4. Loop over the nodes again
5. Set each node's value to the mean stored in the previous loop
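A minimal sketch of those two passes, assuming the tidygraph object q from the question (a tbl_graph is also an igraph object, so igraph functions can be used on it directly):
library(igraph)
old_vals <- V(q)$value
# pass 1: mean of the original values of the incoming neighbours, stored by node index
new_vals <- sapply(seq_along(old_vals), function(v) {
  nb <- as.numeric(neighbors(q, v, mode = "in"))
  mean(old_vals[nb])
})
# pass 2: write the means back only after all of them have been computed
V(q)$value <- new_vals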

Looking for an apply, tidyr or dplyr solution to a nested for loop situation in R

Weirdly for this one, I think it's easier to start by viewing the df.
#reproducible data
quantiles<-c("50","90")
var=c("w","d")
df=data.frame(a=runif(20,0.01,.5),b=runif(20,0.02,.5),c=runif(20,0.03,.5),e=runif(20,0.04,.5),
q50=runif(20,1,5),q90=runif(20,10,50))
head(df)
I want to automate a function that I've created (below) to calculate vars using different combinations of values from my df.
For example, the calculation of w needs to use a and b, and d needs to use c and e such that w = a *q ^ b and d = c * q ^ e. Further, q is a quantile, so I actually want w50, w90, etc., which will correspond to q50, q90 etc. from the df.
The tricky part as I see it is setting the condition to use a & b vs. c & e without using nested loops.
I have a function to calculate vars using the appropriate columns, however I can't get all the pieces together efficiently.
#function to calculate the w, d
calc_wd <- function(df,col_name,col1,col2,col3){
#Calculate and create new column col_name for each combo of var and quantile, e.g. "w_50", "d_50", etc.
df[[col_name]] <- df[[col1]] * (df[[col2]] ^ (df[[col3]]))
df
}
I can get this to work for a single case, but not by automating the coefficient swap... you'll see I specify "a" and "b" below.
wd<-c("w_","d_")
make_wd_list<-apply(expand.grid(wd, quantiles), 1, paste,collapse="")
calc_wd(df,make_wd_list[1],"a",paste0("q",sapply(strsplit(make_wd_list[1],"_"),tail,1)),"b")
Alternatively, I have tried to make this work using nested for loops, but can't seem to append the data correctly. And it's ugly.
var=c("w","d")
dataf<-data.frame()
for(j in unique(var)){
if(j=="w"){
coeff1="a"
coeff2="b"
}else if(j=="d"){
coeff1="c"
coeff1="e"
}
print(coeff1)
print(coeff2)
for(k in unique(quantiles)){
dataf<-calc_wd(df,paste0(j,k),coeff1,paste0("q",k),coeff2)
dataf[k,j]=rbind(df,dataf) #this aint right. tried to do.call outside, etc.
}
}
In the end, I'm looking to have new columns with w_50, w_90, etc., which use q50, q90 and the corresponding coefficients as defined originally.
One approach I find easy to type is using purrr::pmap. I like this because when you use with(list(...),), you can access the column names of your data.frame by name. Additionally, you can supply additional arguments.
library(purrr)
pmap_df(df, quant = "q90", ~with(list(...),{
list(w = a * get(quant) ^ b, d = c * get(quant) ^ e)
}))
## A tibble: 20 x 2
# w d
# <dbl> <dbl>
# 1 0.239 0.295
# 2 0.152 0.392
# 3 0.476 0.828
# 4 0.344 0.236
# 5 0.439 1.00
You could combine this with for example a second map call to iterate over quantiles.
library(dplyr)
map(setNames(quantiles,quantiles),
~ pmap_df(df, quant = paste0("q",.x),
~ with(list(...),{list(w = a * get(quant) ^ b, d = c * get(quant) ^ e)}))
) %>% do.call(cbind,.)
# 50.w 50.d 90.w 90.d
#1 0.63585897 0.11045837 1.7276019 0.1784987
#2 0.17286184 0.22033649 0.2333682 0.5200265
#3 0.32437528 0.72502654 0.5722203 1.4490065
#4 0.68020897 0.33797621 0.8749206 0.6179557
#5 0.73516886 0.38481785 1.2782923 0.4870877
Then assigning a custom function is trivial.
calcwd <- function(df,quantiles){
map(setNames(quantiles,quantiles),
~ pmap_df(df, quant = paste0("q",.x),
~ with(list(...),{list(w = a * get(quant) ^ b, d = c * get(quant) ^ e)}))
) %>% do.call(cbind,.)
}
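Calling it with the quantiles vector from the question then gives the same kind of combined table as above:
calcwd(df, quantiles)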
I love @Ian's answer for the completeness and the use of classics like with and do.call. I'm late to the scene with my solution, but since I have been trying to get better with rowwise operations (including the use of rowwise), I thought I would offer up a less elegant but simpler and faster solution using just mutate, formula.tools and map_dfc.
library(dplyr)
library(purrr)
require(formula.tools)
# same type example data plus a much larger version in df2 for
# performance testing
df <- data.frame(a = runif(20, 0.01, .5),
b = runif(20, 0.02, .5),
c = runif(20, 0.03, .5),
e = runif(20, 0.04, .5),
q50 = runif(20,1,5),
q90 = runif(20,10,50)
)
df2 <- data.frame(a = runif(20000, 0.01, .5),
b = runif(20000, 0.02, .5),
c = runif(20000, 0.03, .5),
e = runif(20000, 0.04, .5),
q50 = runif(20000,1,5),
q90 = runif(20000,10,50)
)
# from your original post
quantiles <- c("q50", "q90")
wd <- c("w_", "d_")
make_wd_list <- apply(expand.grid(wd, quantiles),
1,
paste, collapse = "")
make_wd_list
#> [1] "w_q50" "d_q50" "w_q90" "d_q90"
# an empty list to hold our formulas
eqn_list <- vector(mode = "list",
length = length(make_wd_list)
)
# populate the list makes it very extensible to more outcomes
# or to more quantile levels
for (i in seq_along(make_wd_list)) {
if (substr(make_wd_list[[i]], 1, 1) == "w") {
eqn_list[[i]] <- as.formula(paste(make_wd_list[[i]], "~ a * ", substr(make_wd_list[[i]], 3, 5), " ^ b"))
} else if (substr(make_wd_list[[i]], 1, 1) == "d") {
eqn_list[[i]] <- as.formula(paste(make_wd_list[[i]], "~ c * ", substr(make_wd_list[[i]], 3, 5), " ^ e"))
}
}
# formula.tools helps us grab both left and right sides
add_column <- function(df, equation){
df <- transmute_(df, rhs(equation))
colnames(df)[ncol(df)] <- as.character(lhs(equation))
return(df)
}
result <- map_dfc(eqn_list, ~ add_column(df = df, equation = .x))
#> w_q50 d_q50 w_q90 d_q90
#> 1 0.10580863 0.29136904 0.37839737 0.9014040
#> 2 0.34798729 0.35185585 0.64196417 0.4257495
#> 3 0.79714122 0.37242915 1.57594506 0.6198531
#> 4 0.56446922 0.43432160 1.07458217 1.1082825
#> 5 0.26896574 0.07374273 0.28557366 0.1678035
#> 6 0.36840408 0.72458466 0.72741030 1.2480547
#> 7 0.64484009 0.69464045 1.93290705 2.1663690
#> 8 0.43336109 0.21265672 0.46187366 0.4365486
#> 9 0.61340404 0.47528697 0.89286358 0.5383290
#> 10 0.36983212 0.53292900 0.53996112 0.8488402
#> 11 0.11278412 0.12532491 0.12486156 0.2413191
#> 12 0.03599639 0.25578020 0.04084221 0.3284659
#> 13 0.26308183 0.05322304 0.87057854 0.1817630
#> 14 0.06533586 0.22458880 0.09085436 0.3391683
#> 15 0.11625845 0.32995233 0.12749040 0.4730407
#> 16 0.81584442 0.07733376 2.15108243 0.1041342
#> 17 0.38198254 0.60263861 0.68082354 0.8502999
#> 18 0.51756058 0.43398089 1.06683204 1.3397900
#> 19 0.34490492 0.13790601 0.69168711 0.1580659
#> 20 0.39771037 0.33286225 1.32578056 0.4141457
microbenchmark::microbenchmark(result <- map_dfc(eqn_list, ~ add_column(df = df2, equation = .x)), times = 10)
#> Unit: milliseconds
#>                                                                expr      min       lq     mean  median       uq      max neval
#>  result <- map_dfc(eqn_list, ~add_column(df = df2, equation = .x)) 10.58004 11.34603 12.56774 11.6257 13.24273 16.91417    10
The mutate and formula solution is about fifty times faster, although both rip through 20,000 rows in less than a second.
Created on 2020-04-30 by the reprex package (v0.3.0)

R: Error in calculating the average of a variable at different time intervals for many factors using for loop

I have a data frame in which a variable (var1) is measured over time in seconds. I want to calculate the mean of var1 for each sample at 10-second intervals, up to 500 seconds.
The data frame looks like this:
sample time var1
S1 1 3.5
S1 2 6.3
S1 3 7.8
S1 4 20.5
S1 … ...
S1 530 4.5
S2 1 6.7
S2 2 20.3
S2 3 5.4
S2 … ...
S2 710 70.3
...
The data frame that I want to obtain looks like this
Sample var1_mean10:20sec var1_mean20:30sec .... var1_mean490:500sec
S1
S2
..
So I wrote this code:
setwd("…")
A <- read_excel("dati.xlsx")
for (cat in unique(A$sample))
{
A.s <- subset(A, A$sample == cat)
cuts <- cut (A.s$time, breaks=seq.int(from = 0, to = 500, by = 10))
d <- by (A.s$var1, cuts, mean)
Y<-data.frame(d)
j <- t(Y)
write.csv(Y, file = paste(cat, "var1", sep = "_"))
}
But when I run it I get the error message: Error in as.data.frame.default(x[[i]], optional = TRUE) : cannot coerce class ""by"" to a data.frame
The plan is to eventually merge all the different csv.
The error comes from data.frame(d): by() returns an object of class "by", which data.frame() cannot coerce directly. If I understood your problem correctly, you are trying to average your data over 10-second intervals. I would like to propose an alternative approach that uses aggregate to compute the mean across each 10-second interval; the intervals are created through an auxiliary 'times' column that groups the rows before averaging.
# try to create some data similar to yours
A <- data.frame(sample = c(rep('A1', 530), rep('A2', 710)),
time = c(1 : 530, 1:710), var1 = runif(530+710))
A$times <- ceiling(A$time / 10)
Y <- aggregate(var1 ~ sample + times, data = A, FUN = mean)
Then you could export Y straightaway.
HTH
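If you also want the wide layout shown in the question (one row per sample, one column per 10-second bin), the aggregated result can be reshaped; a small sketch with tidyr, assuming Y from above:
library(tidyr)
# one row per sample, one column per 10-second bin
Y_wide <- pivot_wider(Y, names_from = times, values_from = var1,
                      names_prefix = "var1_bin")
head(Y_wide)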
Solved :
A <- read_excel("data.xlsx")
n <- subset(A, time <= 500)
d<-data.frame(sample= n$sample, time= n$time, ms=n$var1)
storage.data <- data.frame(matrix(nrow = 50, ncol = 0)) # 50 ten-second bins between 0 and 500
for(cat in unique(d$sample)){
g <- subset(d, d$sample == cat)
cuts <- cut (g$time, breaks=seq.int(from = 0, to = 500, by = 10))
p <- by (g$ms, cuts, mean)
storage.data[cat] = p}
View(storage.data)
storage.data_t <- t(storage.data)
View(storage.data_t)
write.csv(storage.data_t, file = "filename.csv")

group and average a large numeric vector to plot

I have an R matrix which is very data dense. It has 500,000 rows. If I plot 1:500000 (x axis) against the third column of the matrix, mat[, 3], it takes too long to plot, and sometimes even crashes. I've tried plot, matplot, and ggplot and all of them take very long.
I am looking to group the data by 10 or 20, i.e. take the first 10 elements from the vector, average them, and use that as a data point.
Is there a fast and efficient way to do this?
We can use cut and aggregate to reduce the number of points plotted:
generate some data
set.seed(123)
xmat <- data.frame(x = 1:5e5, y = runif(5e5))
use cut and aggregate
xmat$cutx <- as.numeric(cut(xmat$x, breaks = 5e5/10))
xmat.agg <- aggregate(y ~ cutx, data = xmat, mean)
make plot
plot(xmat.agg, pch = ".")
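If the rows are already in order, the blocking by 10 can also be done without cut: reshape the vector into a 10-row matrix and take column means. A small base R sketch (assuming the column length is a multiple of 10; it uses the simulated xmat here, but mat[, 3] from the question works the same way):
y <- xmat$y
y_avg <- colMeans(matrix(y, nrow = 10))                  # mean of each consecutive block of 10
x_avg <- seq(5.5, by = 10, length.out = length(y_avg))   # midpoint of each block
plot(x_avg, y_avg, pch = ".")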
more than 1 column solution:
Here, we use the data.table package to group and summarize:
generate some more data
set.seed(123)
xmat <- data.frame(x = 1:5e5,
u = runif(5e5),
z = rnorm(5e5),
p = rpois(5e5, lambda = 5),
g = rbinom(n = 5e5, size = 1, prob = 0.5))
use data.table
library(data.table)
xmat$cutx <- as.numeric(cut(xmat$x, breaks = 5e5/10))
setDT(xmat) #convert to data.table
#for each level of cutx, take the mean of each column
xmat[,lapply(.SD, mean), by = cutx] -> xmat.agg
# xmat.agg
# cutx x u z p g
# 1: 1 5.5 0.5782475 0.372984058 4.5 0.6
# 2: 2 15.5 0.5233693 0.032501186 4.6 0.8
# 3: 3 25.5 0.6155837 -0.258803746 4.6 0.4
# 4: 4 35.5 0.5378580 0.269690334 4.4 0.8
# 5: 5 45.5 0.3453964 0.312308395 4.8 0.4
# ---
# 49996: 49996 499955.5 0.4872596 0.006631221 5.6 0.4
# 49997: 49997 499965.5 0.5974486 0.022103345 4.6 0.6
# 49998: 49998 499975.5 0.5056578 -0.104263093 4.7 0.6
# 49999: 49999 499985.5 0.3083803 0.386846148 6.8 0.6
# 50000: 50000 499995.5 0.4377497 0.109197095 5.7 0.6
plot it all
par(mfrow = c(2,2))
for(i in 3:6) plot(xmat.agg[,c(1,i), with = F], pch = ".")
