My loop knowledge is very minimal, but I currently have a loop that takes values from three vectors (small.dens, med.dens, and large.dens), each of which has 17 values. The loop randomly selects 2 values, then 3, then 4, and so on up to 17. From each sample it calculates the mean and standard error (using the plotrix package) and stores the results in new vectors (small.density, small.stanerr, medium.density, medium.stanerr, large.density, and large.stanerr). Finally, outside the loop, I combine these vectors into a data frame.
library(plotrix)
small.density = rep(NA,16)
small.stanerr = rep(NA,16)
medium.density = rep(NA,16)
medium.stanerr = rep(NA,16)
large.density = rep(NA,16)
large.stanerr = rep(NA,16)
for(i in 2:17){
  # note: with indices 2:17 these length-16 vectors grow to length 17,
  # leaving position 1 as NA
  xx = sample(small.dens, i, replace = TRUE)
  small.density[[i]] = mean(xx)
  small.stanerr[[i]] = std.error(xx)
  yy = sample(med.dens, i, replace = TRUE)
  medium.density[[i]] = mean(yy)
  medium.stanerr[[i]] = std.error(yy)
  zz = sample(large.dens, i, replace = TRUE)
  large.density[[i]] = mean(zz)
  large.stanerr[[i]] = std.error(zz)
}
I then want to run this loop 100 times, ultimately taking the mean of the results. For example, I would like it to select 2, 3, 4, ... 17 values 100 times, taking the mean and standard error each time, and then take the mean across all 100 runs. Does this make sense? Would I make another for loop, turning this into a nested loop?
How would I go about doing this?
Thanks!
There are other ways to achieve what you want, but if you do not want to change your code, then just wrap it in something like this:
res <- do.call(rbind, lapply(1:100, function(x) {
  within(data.frame(
    n = x,
    size = 2:17,
    small.density = rep(NA, 16),
    small.stanerr = rep(NA, 16),
    medium.density = rep(NA, 16),
    medium.stanerr = rep(NA, 16),
    large.density = rep(NA, 16),
    large.stanerr = rep(NA, 16)
  ), {
    for (i in 2:17) {
      xx = sample(small.dens, i, replace = TRUE)
      small.density[[i - 1L]] = mean(xx)
      small.stanerr[[i - 1L]] = std.error(xx)
      yy = sample(med.dens, i, replace = TRUE)
      medium.density[[i - 1L]] = mean(yy)
      medium.stanerr[[i - 1L]] = std.error(yy)
      zz = sample(large.dens, i, replace = TRUE)
      large.density[[i - 1L]] = mean(zz)
      large.stanerr[[i - 1L]] = std.error(zz)
    }
    rm(xx, yy, zz, i)
  })
}))
res looks like this
> head(res, 20)
n size small.density small.stanerr medium.density medium.stanerr large.density large.stanerr
1 1 2 -0.04716195 0.35754422 13.1014925 4.374055 -42.089591 30.87786
2 1 3 -0.15893367 0.34557922 -0.2680632 6.206081 52.984076 36.85058
3 1 4 0.10013995 0.62374467 -0.1944930 5.784211 -112.684774 30.50707
4 1 5 0.40654132 0.40815013 1.6096970 5.026714 45.810098 46.58469
5 1 6 0.13310242 0.32104512 -6.9989844 4.232091 -22.312165 48.14705
6 1 7 0.21283027 0.53633472 -5.0702365 3.829677 -43.266482 41.74286
7 1 8 0.13870439 0.27161346 4.1629469 3.214053 -9.045643 48.49930
8 1 9 0.06495734 0.36738163 3.9742069 3.540913 -43.954345 38.23816
9 1 10 -0.01882762 0.37570468 -3.1764203 3.740403 -43.156792 38.47531
10 1 11 -0.02115580 0.26239465 -2.2026077 2.702412 7.343837 30.58314
11 1 12 0.09967753 0.27360125 3.9603382 3.214921 -13.461632 29.39910
12 1 13 0.53121414 0.27561862 4.3593802 1.872685 -38.572491 25.37029
13 1 14 0.21547909 0.36345292 -0.3377787 2.732968 17.305232 26.08317
14 1 15 0.33957964 0.23029520 0.4832063 2.886160 8.145410 18.23901
15 1 16 0.26871985 0.26846012 -6.7634873 3.436742 -4.011269 20.33814
16 1 17 0.24927792 0.20534048 -0.7481315 1.899348 9.993280 24.49623
17 2 2 -1.10840346 0.07123407 -3.4317644 6.966096 -30.384945 121.00972
18 2 3 1.73947551 0.35986535 -2.1415966 5.628115 -57.857871 10.47413
19 2 4 0.40033834 0.41963615 -4.2156733 1.206414 27.891021 13.84453
20 2 5 -0.08704736 0.52872770 0.3137693 2.974888 -3.100414 57.89126
If you want to calculate the mean of the 100 simulated values for each size, then just
aggregate(. ~ size, res[-1L], mean)
which gives you
size small.density small.stanerr medium.density medium.stanerr large.density large.stanerr
1 2 0.02872578 0.6341294 1.0938287 5.518797 3.141204 53.20675
2 3 0.16985732 0.5388110 -0.1627867 5.185643 -6.660756 49.83607
3 4 0.20543404 0.4815581 0.1385016 4.519419 -8.093673 46.64984
4 5 0.13019280 0.4546794 0.1299331 4.166335 -10.300542 41.40444
5 6 0.10675158 0.4307113 0.2191516 4.033863 -12.068151 38.95312
6 7 0.19326831 0.3834507 0.8784275 3.513812 -6.920378 36.17856
7 8 0.09020638 0.3580780 0.4388388 3.443349 -5.335405 30.49615
8 9 0.13956838 0.3558005 0.3740251 3.313501 -15.290834 31.64833
9 10 0.18368962 0.3397191 0.4600761 3.051425 -5.505220 29.46165
10 11 0.20653866 0.3116104 0.9913534 2.804659 -8.809398 28.79097
11 12 0.14653661 0.2988422 0.3337274 2.624418 -5.128882 26.78074
12 13 0.12255652 0.2864998 0.2085829 2.719396 -11.548064 27.08497
13 14 0.13102809 0.2830709 0.6448798 2.586491 -4.676053 25.21800
14 15 0.14536840 0.2749606 0.3415879 2.522826 -11.968496 24.44427
15 16 0.14871831 0.2571571 0.2218365 2.463486 -10.335511 23.64304
16 17 0.13664397 0.2461108 0.3387764 2.348594 -9.969407 22.84736
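For completeness, the nested for loop you asked about works too. Here is a minimal sketch under the same assumptions (small.dens, med.dens, and large.dens exist and plotrix is loaded); it builds the same long data frame and ends with the same aggregate() call:
library(plotrix)

n_rep <- 100
out <- vector("list", n_rep)
for (r in 1:n_rep) {                      # outer loop: replicates
  res_r <- data.frame(n = r, size = 2:17,
                      small.density = NA_real_, small.stanerr = NA_real_,
                      medium.density = NA_real_, medium.stanerr = NA_real_,
                      large.density = NA_real_, large.stanerr = NA_real_)
  for (i in 2:17) {                       # inner loop: sample sizes
    xx <- sample(small.dens, i, replace = TRUE)
    yy <- sample(med.dens, i, replace = TRUE)
    zz <- sample(large.dens, i, replace = TRUE)
    res_r[i - 1, 3:8] <- c(mean(xx), std.error(xx),
                           mean(yy), std.error(yy),
                           mean(zz), std.error(zz))
  }
  out[[r]] <- res_r
}
res2 <- do.call(rbind, out)
aggregate(. ~ size, res2[-1], mean)   # average the 100 replicates per size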
I'm rewriting some code, and I am currently creating a small population model. I have re-created the current model function below from a book; it's a simple population model based on a few parameters. I've left them at their defaults and returned the data frame. Everything works well. However, I was wondering whether I could somehow exclude the loop from the function.
I know R is great because of vectorized calculation, but I'm not sure whether that is possible in this case. I thought of using something like lead/lag to do it, but would this work? Perhaps not, as things need to be calculated sequentially?
# Nt = numbers at start of time t
# Ct = numbers removed at the end of time t
# N0 = numbers at time 0
# r = intrinsic rate of population growth
# K = carrying capacity
mod_fun = function (r = 0.5, K = 1000, N0 = 50, Ct = 0, Yrs = 10, p = 1)
{
  # sets yr1 to the year value plus 1
  yr1 <- Yrs + 1
  # creates a sequence of length Yrs + 1, from year 1 to yr1
  years <- seq(1, yr1, 1)
  # uses that length to create a vector of length Yrs + 1
  pop <- numeric(yr1)
  # sets population at time 0
  pop[1] <- N0
  # loop that calculates the model for each year after the first
  for (i in 2:yr1) {
    # sets the starting population for this step to the value calculated
    # in the previous step, so Nt is always the previous step's pop size
    Nt <- pop[i - 1]
    pop[i] <- max((Nt + (r * Nt/p) * (1 - (Nt/K)^p) - Ct), 0)
  }
  # drops the first element, so pop2 is the population at the end of each step
  pop2 <- pop[2:yr1]
  # binds together years (sequence from 1 to Yrs + 1),
  # pop, created in the loop: the population at the start of step t,
  # and pop2: the population at the end of step t
  out <- cbind(year = years, nt = pop, nt1 = c(pop2, NA))
  # sets row names to the years
  rownames(out) <- years
  out <- out[-yr1, ]
  # returns the result (one row per year)
  return(out)
}
result = mod_fun()
This is what the output looks like. Basically, row-wise starting from row 1: given the starting population of 50, the loop calculates nt1, then sets the next row's nt to lag(nt1), and things continue in a similar fashion.
result
#> year nt nt1
#> 1 1 50.0000 73.7500
#> 2 2 73.7500 107.9055
#> 3 3 107.9055 156.0364
#> 4 4 156.0364 221.8809
#> 5 5 221.8809 308.2058
#> 6 6 308.2058 414.8133
#> 7 7 414.8133 536.1849
#> 8 8 536.1849 660.5303
#> 9 9 660.5303 772.6453
#> 10 10 772.6453 860.4776
Created on 2022-04-24 by the reprex package (v2.0.1)
mod_fun = function(r = 0.5, K = 1000, N0 = 50, Ct = 0, Yrs = 10, p = 1)
{
  years <- seq_len(Yrs)
  # Reduce() threads the population forward one year at a time;
  # accumulate = TRUE keeps every intermediate value, not just the last
  pop <- Reduce(function(Nt, y) max((Nt + (r * Nt/p) * (1 - (Nt/K)^p) - Ct), 0),
                years, init = N0, accumulate = TRUE)
  data.frame(year = years, nt = head(pop, -1), nt1 = pop[-1])
}
year nt nt1
1 1 50.0000 73.7500
2 2 73.7500 107.9055
3 3 107.9055 156.0364
4 4 156.0364 221.8809
5 5 221.8809 308.2058
6 6 308.2058 414.8133
7 7 414.8133 536.1849
8 8 536.1849 660.5303
9 9 660.5303 772.6453
10 10 772.6453 860.4776
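A side note on why Reduce() fits here while lead()/lag() would not: each year's population feeds into the next, so the computation is inherently sequential, and accumulate = TRUE keeps every intermediate value rather than only the final one. A toy illustration (this just reproduces cumsum()):
# f receives the running value (acc) and the next element (x)
Reduce(function(acc, x) acc + x, 1:5, accumulate = TRUE)
#> [1]  1  3  6 10 15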
I am working with the R programming language.
I generated the following random data set that contains x and y points:
set.seed(123)
x_cor = rnorm(10,100,100)
y_cor = rnorm(10,100,100)
my_data = data.frame(x_cor,y_cor)
x_cor y_cor
1 43.95244 222.40818
2 76.98225 135.98138
3 255.87083 140.07715
4 107.05084 111.06827
5 112.92877 44.41589
6 271.50650 278.69131
7 146.09162 149.78505
8 -26.50612 -96.66172
9 31.31471 170.13559
10 55.43380 52.72086
I am trying to write a "greedy search" algorithm that shows which point is located the "shortest distance" from some starting point.
For example, suppose we start at -26.50612, -96.66172
distance <- function(x1, x2, y1, y2) {
  dist <- sqrt((x1 - x2)^2 + (y1 - y2)^2)
  return(dist)
}
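A quick sanity check of this helper on a 3-4-5 triangle:
distance(0, 3, 0, 4)
#> [1] 5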
Then I calculated the distance between -26.50612, -96.66172 and each point:
results <- list()
for (i in 1:10) {
  distance_i <- distance(-26.50612, my_data[i, 1], -96.66172, my_data[i, 2])
  index = i
  my_data_i = data.frame(distance_i, index)
  results[[i]] <- my_data_i
}
results_df <- do.call(rbind, results)
However, I don't think this is working because the distance between the starting point -26.50612, -96.66172 and itself is not 0 (see 8th row):
distance_i index
1 264.6443 1
2 238.7042 2
3 191.3048 3
4 185.0577 4
5 151.7506 5
6 306.4785 6
7 331.2483 7
8 223.3056 8
9 213.3817 9
10 331.6455 10
My Question:
Can someone please show me how to write a function that correctly finds the nearest point from an initial point
(Step 1) Then removes the nearest point and the initial point from "my_data"
(Step 2) And then re-calculates the nearest point from "my_data" using the nearest point identified Step 1 (i.e. with the removed data)
And in the end, shows the path that was taken (e.g. 5,7,1,9,3, etc)
Can someone please show me how to do this?
Thanks!
This could be helpful, and I think you can solve the further tasks by yourself:
start <- c(x = -26.50612, y = -96.66172)
library(dplyr)
my_data <- data.frame(x_cor, y_cor) %>%
  rowwise() %>%
  mutate(dist = distance(start["x"], x_cor, start["y"], y_cor))
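From there the nearest point is just the row with the smallest dist; a sketch (assuming the my_data built above, and dropping the zero distance since the start point itself is in the data):
my_data %>%
  ungroup() %>%           # drop the rowwise grouping before slicing
  filter(dist > 0) %>%    # exclude the start point itself
  slice_min(dist, n = 1)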
The solution is implemented as a recursive function distmin, which finds the point in a dataframe Y closest to an input x and then calls itself with that closest point and the dataframe without it as arguments.
EDIT: I reimplemented distmin to use dataframes.
my_data = data.frame(x_cor, y_cor) |>
  mutate(idx = row_number())

distmin <- function(x, Y) {
  if (nrow(Y) == 0) {
    NULL
  } else {
    dst <- sqrt((x$x_cor - Y$x_cor)^2 + (x$y_cor - Y$y_cor)^2)
    m <- which.min(dst)
    res <- data.frame(x, dist = dst[m], nearest = Y[m, "idx"])
    rbind(res, distmin(Y[m, ], Y[-m, ]))
  }
}
N <- 5
distmin(my_data[N,], my_data[-N,])
##> x_cor y_cor idx dist nearest
##> 5 112.92877 44.41589 5 58.09169 10
##> 10 55.43380 52.72086 10 77.90211 4
##> 4 107.05084 111.06827 4 39.04847 2
##> 2 76.98225 135.98138 2 57.02661 9
##> 9 31.31471 170.13559 9 53.77858 1
##> 1 43.95244 222.40818 1 125.32571 7
##> 7 146.09162 149.78505 7 110.20762 3
##> 3 255.87083 140.07715 3 139.49323 6
##> 6 271.50650 278.69131 6 479.27176 8
The following shows the order in which the points are visited.
library(ggplot2)

distmin(my_data[N, ], my_data[-N, ]) |>
  mutate(ord = row_number()) |>
  ggplot(aes(x = x_cor, y = y_cor)) +
  geom_text(aes(label = ord))
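If you prefer to avoid recursion (for very many points the call depth could become an issue), the same greedy walk can be written iteratively; this sketch assumes the same my_data with its idx column and returns just the visiting order:
distmin_iter <- function(x, Y) {
  path <- integer(0)
  while (nrow(Y) > 0) {
    dst <- sqrt((x$x_cor - Y$x_cor)^2 + (x$y_cor - Y$y_cor)^2)
    m <- which.min(dst)
    path <- c(path, Y$idx[m])  # record which point was chosen
    x <- Y[m, ]                # the chosen point becomes the new origin
    Y <- Y[-m, ]               # and leaves the candidate set
  }
  path
}

distmin_iter(my_data[N, ], my_data[-N, ])
# should reproduce the 'nearest' column above: 10 4 2 9 1 7 3 6 8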
I'm finding working with the arules package a bit tricky. I'm using the apriori algorithm to find association rules, similar to an example in the arules documentation.
data("AdultUCI")
dim(AdultUCI)
AdultUCI[1:2,]
#Ignore everything from here to the last two lines, this is just data preparation
## remove attributes
AdultUCI[["fnlwgt"]] <- NULL
AdultUCI[["education-num"]] <- NULL
## map metric attributes
AdultUCI[[ "age"]] <- ordered(cut(AdultUCI[[ "age"]], c(15,25,45,65,100)),
labels = c("Young", "Middle-aged", "Senior", "Old"))
AdultUCI[[ "hours-per-week"]] <- ordered(cut(AdultUCI[[ "hours-per-week"]],
c(0,25,40,60,168)),
labels = c("Part-time", "Full-time", "Over-time", "Workaholic"))
AdultUCI[[ "capital-gain"]] <- ordered(cut(AdultUCI[[ "capital-gain"]],
c(-Inf,0,median(AdultUCI[[ "capital-gain"]][AdultUCI[[ "capital-gain"]]>0]),
Inf)), labels = c("None", "Low", "High"))
AdultUCI[[ "capital-loss"]] <- ordered(cut(AdultUCI[[ "capital-loss"]],
c(-Inf,0, median(AdultUCI[[ "capital-loss"]][AdultUCI[[ "capital-loss"]]>0]),
Inf)), labels = c("None", "Low", "High"))
#resume reading here
rules <- apriori(AdultUCI, parameter=list(support=0.6, confidence=0.75, minlen=4))
inspect(rules)
Which returns the following four rules
lhs rhs support confidence lift
1 {race=White,
capital-gain=None,
native-country=United-States} => {capital-loss=None} 0.680398 0.9457029 0.9920537
2 {race=White,
capital-loss=None,
native-country=United-States} => {capital-gain=None} 0.680398 0.9083504 0.9901500
3 {race=White,
capital-gain=None,
capital-loss=None} => {native-country=United-States} 0.680398 0.9189249 1.0239581
4 {capital-gain=None,
capital-loss=None,
native-country=United-States} => {race=White} 0.680398 0.8730100 1.0210133
I must be missing something: how do you find the rows in the source data that match an lhs rule using just arules functions?
Is there an easy way to build an SQL WHERE clause from the lhs(rules)?
Thanks
This answer is based on the following answer: https://stats.stackexchange.com/questions/21340/finding-suitable-rules-for-new-data-using-arules.
The solution is very slow; I'm not sure whether it will work for large applications.
library(arules)
rules <- apriori(AdultUCI, parameter=list(support=0.4, confidence=0.75, minlen=4))
inspect(rules)
rec <- function(rules, data, iter){
  basket <- data[iter]
  # rules whose LHS is fully contained in the current basket
  rulesMatchLHS <- is.subset(rules@lhs, basket)
  # keep only rules whose RHS the basket does not already contain
  suitableRules <- rulesMatchLHS & !(is.subset(rules@rhs, basket))
  rules <- sort(rules[suitableRules], decreasing = TRUE, by = "lift")
  as(head(rules, 1), "data.frame")
}
recom_loop <- function(rules, data){
  temp <- lapply(seq_along(data), function(x) rec(rules, data, x))
  temp <- do.call("rbind", temp)
  # strip everything but the RHS item from the rule string
  recom <- gsub(".*=> |\\{|\\}", "", temp$rules)
  as.data.frame(cbind(as(data, "data.frame"), recom))
}
trans <- as(AdultUCI, "transactions")
recom <- recom_loop(rules, trans[1:50])
Here is some example output:
head(recom)
transactionID
1 1
2 2
3 3
4 4
5 5
6 6
items
1 {age=Middle-aged,workclass=State-gov,education=Bachelors,marital-status=Never-married,occupation=Adm-clerical,relationship=Not-in-family,race=White,sex=Male,capital-gain=Low,capital-loss=None,hours-per-week=Full-time,native-country=United-States,income=small}
2 {age=Senior,workclass=Self-emp-not-inc,education=Bachelors,marital-status=Married-civ-spouse,occupation=Exec-managerial,relationship=Husband,race=White,sex=Male,capital-gain=None,capital-loss=None,hours-per-week=Part-time,native-country=United-States,income=small}
3 {age=Middle-aged,workclass=Private,education=HS-grad,marital-status=Divorced,occupation=Handlers-cleaners,relationship=Not-in-family,race=White,sex=Male,capital-gain=None,capital-loss=None,hours-per-week=Full-time,native-country=United-States,income=small}
4 {age=Senior,workclass=Private,education=11th,marital-status=Married-civ-spouse,occupation=Handlers-cleaners,relationship=Husband,race=Black,sex=Male,capital-gain=None,capital-loss=None,hours-per-week=Full-time,native-country=United-States,income=small}
5 {age=Middle-aged,workclass=Private,education=Bachelors,marital-status=Married-civ-spouse,occupation=Prof-specialty,relationship=Wife,race=Black,sex=Female,capital-gain=None,capital-loss=None,hours-per-week=Full-time,native-country=Cuba,income=small}
6 {age=Middle-aged,workclass=Private,education=Masters,marital-status=Married-civ-spouse,occupation=Exec-managerial,relationship=Wife,race=White,sex=Female,capital-gain=None,capital-loss=None,hours-per-week=Full-time,native-country=United-States,income=small}
recom
1 race=White
2 race=White
3 race=White
4 race=White
5 race=White
6 capital-gain=None
As for the first question, the supporting transactions may be found using this function (it should work faster than the one from the other response):
supp_trans_ids = function(items, transactions){
  # logical matrix: entry [i, j] is TRUE when item set i is fully contained in transaction j
  tmp = is.subset(items, transactions)
  tmp2 = lapply(
    seq_len(nrow(tmp)),
    # 'which' alone would leave a name on each index, and each name is a
    # complete rule, so it would use a lot of memory
    function(i) {
      t = which(tmp[i, ])
      names(t) = NULL
      t
    }
  )
  # to easily identify sets of items
  names(tmp2) = rownames(tmp)
  tmp2
}
Now, you may find which transactions support each rule's lhs with:
AdultUCI_trans = as(AdultUCI, 'transactions')
trans_supporting = supp_trans_ids(lhs(rules), AdultUCI_trans)
e.g.
> str(trans_supporting)
List of 4
$ {race=White,capital-gain=None,native-country=United-States} : int [1:35140] 2 3 6 8 13 17 18 19 20 21 ...
$ {race=White,capital-loss=None,native-country=United-States} : int [1:36585] 1 2 3 6 8 9 10 13 17 18 ...
$ {race=White,capital-gain=None,capital-loss=None} : int [1:36164] 2 3 6 8 13 17 18 19 20 21 ...
$ {capital-gain=None,capital-loss=None,native-country=United-States}: int [1:38066] 2 3 4 6 8 11 13 14 17 18 ...
And the corresponding data you may find with:
AdultUCI_trans[trans_supporting[[1]]] # transactions supporting
AdultUCI[trans_supporting[[1]],] # data on which these transactions are based
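For completeness: recent versions of arules also ship a supportingTransactions() function that covers much of this. I believe it returns a tidLists object with one entry per association, but check ?supportingTransactions in your installed version, since the exact return type here is an assumption:
# sketch -- verify against your arules version
tids <- supportingTransactions(rules, AdultUCI_trans)
as(tids, "list")[[1]]  # transaction ids supporting the first rule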
I am creating correlations using R, with the following code:
Values<-read.csv(inputFile, header = TRUE)
O<-Values$Abundance_O
S<-Values$Abundance_S
cor(O,S)
pear_cor<-round(cor(O,S),4)
outfile<-paste(inputFile, ".jpg", sep = "")
jpeg(filename = outfile, width = 15, height = 10, units = "in", pointsize = 10, quality = 75, bg = "white", res = 300, restoreConsole = TRUE)
rx<-range(0,20000000)
ry<-range(0,200000)
plot(rx,ry, ylab="S", xlab="O", main="O vs S", type="n")
points(O,S, col="black", pch=3, lwd=1)
mtext(sprintf("%s %.4f", "pearson: ", pear_cor), adj=1, padj=0, side = 1, line = 4)
dev.off()
pear_cor
I now need to find the lower quartile for each set of data and exclude the values that fall within it. I would then like to rewrite the data without those values and use the new column in the correlation analysis (because I want to threshold the data by the lower quartile). If there is a way to write this so that the threshold is easy to change by applying arguments from Java (as I have with the input file name), that's even better!
Thank you so much.
I have now implemented the answer below and it is working, however I need to keep the pairs of data together for the correlation. Here is an example of my data (from csv):
Abundance_O Abundance_S
3635900.752 1390.883073
463299.4622 1470.92626
359101.0482 989.1609251
284966.6421 3248.832403
415283.663 2492.231265
2076456.856 10175.48946
620286.6206 5074.268802
3709754.717 269.6856808
803321.0892 118.2935093
411553.0203 4772.499758
50626.83554 17.29893001
337428.8939 203.3536852
42046.61549 152.1321255
1372013.047 5436.783169
939106.3275 7080.770535
96618.01393 1967.834701
229045.6983 948.3087208
4419414.018 23735.19352
So I need to exclude both values in a row if either one falls below my quartile threshold (the 0.25 quantile). For example, if the lower quartile for O were 45000, then the row "42046.61549, 152.1321255" would be removed. Is this possible? If I read in both columns as a data frame, can I test each column separately? Or find the quartiles and then feed those values into code that removes the appropriate rows?
Thanks again, and sorry for the evolution of the question!
Please try to provide a reproducible example, but if you have data in a data.frame, you can subset it using the quantile function as the logical test. For instance, in the following data we want to select only rows from the dataframe where the value of the measured variable 'Val' is above the bottom quartile:
# set.seed so you can reproduce these values exactly on your system
set.seed(39856)
df <- data.frame( ID = 1:10 , Val = runif(10) )
df
ID Val
1 1 0.76487516
2 2 0.59755578
3 3 0.94584374
4 4 0.72179297
5 5 0.04513418
6 6 0.95772248
7 7 0.14566118
8 8 0.84898704
9 9 0.07246594
10 10 0.14136138
# Now to select only rows where the value of our measured variable 'Val' is above the bottom 25% quartile
df[ df$Val > quantile(df$Val , 0.25 ) , ]
ID Val
1 1 0.7648752
2 2 0.5975558
3 3 0.9458437
4 4 0.7217930
6 6 0.9577225
7 7 0.1456612
8 8 0.8489870
# And check the value of the bottom 25% quantile...
quantile(df$Val , 0.25 )
25%
0.1424363
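For the paired-columns follow-up, the same logical test can be applied to both columns at once, so a row is dropped whenever either value falls in its column's bottom quartile. A sketch assuming the column names from your CSV, with the threshold pulled out into a variable so it can be passed in as an argument just like the file name:
threshold <- 0.25   # could come from commandArgs(), like inputFile
Values <- read.csv(inputFile, header = TRUE)
keep <- Values$Abundance_O > quantile(Values$Abundance_O, threshold) &
        Values$Abundance_S > quantile(Values$Abundance_S, threshold)
Values_kept <- Values[keep, ]   # rows surviving both thresholds
cor(Values_kept$Abundance_O, Values_kept$Abundance_S)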
Although this is an old question, I came across it during my own research and arrived at a solution that someone may find interesting.
I first defined a function which converts a numerical vector into its quantile groups. The parameter n determines the number of quantile groups (n = 4 for quartiles, n = 10 for deciles).
qgroup = function(numvec, n = 4){
qtile = quantile(numvec, probs = seq(0, 1, 1/n))
out = sapply(numvec, function(x) sum(x >= qtile[-(n+1)]))
return(out)
}
Function example:
v = 1:20
> qgroup(v)
[1] 1 1 1 1 1 2 2 2 2 2 3 3 3 3 3 4 4 4 4 4
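As an aside, the sapply() scan can also be written with findInterval(), which counts how many breakpoints each value is greater than or equal to, i.e. the same quantity the sum() above computes; a sketch, not tested against every tie-handling corner case:
qgroup2 = function(numvec, n = 4){
  qtile = quantile(numvec, probs = seq(0, 1, 1/n))
  findInterval(numvec, qtile[-(n+1)])
}
qgroup2(v)  # same groups as qgroup(v) above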
Consider now the following data:
library(data.table)

dt = data.table(
A0 = runif(100),
A1 = runif(100)
)
We apply qgroup() across the data to obtain two quartile group columns:
cols = colnames(dt)
qcols = c('Q0', 'Q1')
dt[, (qcols) := lapply(.SD, qgroup), .SDcols = cols]
head(dt)
           A0        A1 Q0 Q1
1: 0.72121846 0.1908863 3 1
2: 0.70373594 0.4389152 3 2
3: 0.04604934 0.5301261 1 3
4: 0.10476643 0.1108709 1 1
5: 0.76907762 0.4913463 4 2
6: 0.38265848 0.9291649 2 4
Lastly, we only include rows for which both quartile groups are above the first quartile:
dt = dt[Q0 + Q1 > 2]
I have a matrix with n rows and n columns, and I would like to compute a binned average over every 10 rows, which means in the end I am left with a matrix of size n/10-by-n. I loaded the matlab package and tried the following code:
nRemove = rem(size(a,1),10);
a = a(1:end-nRemove,:)
Avg = mean(reshape(a,10,[],n));
AvgF = squeeze(Avg);
but it didn't work. Which code should I use?
Thanks!!
Here is another way to do it:
set.seed(5)
x = matrix(runif(1000), ncol = 10)
nr = nrow(x)
# one group id per block of 10 rows
gr = rep(1:floor(nr/10), each = 10)
aggregate(x ~ gr, FUN = mean)[, -1]
which results in
NA NA.1 NA.2 NA.3 NA.4 NA.5 NA.6 NA.7
1 0.5295264 0.5957229 0.4502069 0.5168083 0.3398190 0.4075922 0.6059122 0.5127865
2 0.4778341 0.3967321 0.4069635 0.4514742 0.6172677 0.2486085 0.6340686 0.4052600
3 0.5168132 0.5117207 0.5202261 0.5068593 0.5218041 0.4925462 0.5169584 0.4919296
4 0.3299557 0.3314723 0.4503393 0.3965103 0.6166598 0.5525628 0.4943880 0.6048207
5 0.6145423 0.5853235 0.4822182 0.3377771 0.3540784 0.5974846 0.5202577 0.5769518
6 0.5009249 0.5203701 0.3940540 0.4237508 0.3199265 0.4817713 0.4655320 0.6124400
7 0.7335082 0.5856578 0.3929621 0.6403662 0.5347719 0.5658542 0.4226456 0.7196593
8 0.4976663 0.5205538 0.4529273 0.4757352 0.6980300 0.5694570 0.4384924 0.5481236
9 0.5275932 0.5014861 0.5363340 0.5664576 0.5006055 0.5611069 0.3803889 0.4680865
10 0.4560031 0.5527328 0.4419076 0.6893043 0.5161281 0.5895931 0.3965911 0.3842419
NA.8 NA.9
1 0.3711607 0.5541607
2 0.4379255 0.4159131
3 0.5048523 0.5884052
4 0.4642687 0.4572388
5 0.6054209 0.5174784
6 0.4659952 0.5332438
7 0.4568273 0.3943798
8 0.6978356 0.5087778
9 0.4897584 0.4710949
10 0.6310546 0.4775762
t(sapply(1:(NROW(A)/10), function(x) colMeans(A[(10 * x - 9):(10 * x), ])))
The index (10 * x - 9):(10 * x) picks out consecutive, non-overlapping blocks of ten rows. You need the transpose operation to re-orient the result; one often needs to do so after an 'apply' operation.
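An equivalent base-R route is rowsum(), which sums rows by group; dividing by the block size then gives the binned means. A sketch assuming, as above, a matrix A whose row count is a multiple of 10:
gr <- rep(seq_len(NROW(A) / 10), each = 10)  # block id for every row
rowsum(A, gr) / 10                           # per-block column means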