Data frame to 3D array and calculate mean in Z

I have a data frame read from a CSV which contains 14 columns and 990 rows. The rows form 9 blocks of 110: each block repeats the same structure (though not the same values), and the first 5 columns are labels.
I now want to create a new 110-row by 14-column grid such that, if columns are lettered and rows are numbered, A1 to E110 of the new grid hold the labels, F1 contains the mean of cell F1 across the 9 blocks of the original frame, and so on through to N110.
I have never used R before, and have got as far as calculating the mean of one cell with
mean(d[seq.int(3, nrow(d), 110), 6])
but I need some help with repeating this for the rest of the cells and constructing a resulting data frame, please.

To transform a matrix to a 3D array:
yourarray = array(unlist(yourmatrix), dim = c(110, 14, 9))
Then to take the average over the z values you can do something like
out = matrix(NA, 110, 14)
for (n in 1:14) {
  for (i in 1:110) { out[i, n] = mean(yourarray[i, n, ]) }
}
Example:
a = matrix(1:125, 25, 5)
b = array(unlist(a), dim = c(5, 5, 5))
out = matrix(NA, 5, 5)
for (n in 1:5) {
  for (i in 1:5) { out[i, n] = mean(b[i, n, ]) }
}
> out
[,1] [,2] [,3] [,4] [,5]
[1,] 51 56 61 66 71
[2,] 52 57 62 67 72
[3,] 53 58 63 68 73
[4,] 54 59 64 69 74
[5,] 55 60 65 70 75
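The two nested loops can also be collapsed into a single apply call over the first two margins, which returns the same matrix:
out = apply(b, c(1, 2), mean)  # average over the third (z) dimension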
Hope this is what you're after.
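For the original data frame, a minimal sketch of the whole workflow might look like this (the file name is hypothetical, and it assumes the 9 blocks of 110 rows are stacked vertically, with columns 6 to 14 numeric):
d <- read.csv("mydata.csv")               # hypothetical file name
vals <- as.matrix(d[, 6:14])              # the 9 numeric columns
arr <- array(vals, dim = c(110, 9, 9))    # row-in-block x block x column
means <- apply(arr, c(1, 3), mean)        # average each cell over the 9 blocks
result <- cbind(d[1:110, 1:5], means)     # reattach the label columns
names(result)[6:14] <- names(d)[6:14]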

workflow for image analysis and metrics using RTextureMetrics

I'm working with grayscale images (0 black; 255 white; 2560 x 2048) of surfaces and would like to extract the Haralick et al. (1973) metrics (ASM, contrast, etc.) using the package RTextureMetrics.
The following are the grayscale values (0 to 255) of the initial 5 rows and columns from the matrix (a sample image):
Image2[1:5, 1:5]
[,1] [,2] [,3] [,4] [,5]
[1,] 35 33 29 29 36
[2,] 43 41 39 40 47
[3,] 44 44 44 47 54
[4,] 46 49 51 54 60
[5,] 60 64 67 68 71
From here the workflow seems quite straightforward: generate the Grey Level Co-occurrence Matrix using genGLCM() and extract desired metrics.
library(RTextureMetrics)
# Generate Grey Level Co-occurrence Matrix (direction east, distance one pixel)
GLCM = genGLCM(1, 1, Image2)
# Get desired metrics
calcCON(GLCM)
calcHOM(GLCM)
calcDIS(GLCM)
calcASM(GLCM)
calcENT(GLCM)
However, the results seem odd. Haralick et al. report a maximum contrast of 4.709, while the above sample matrix has a contrast of 25.3388 (the table below shows the metrics for the sample matrix). Additionally, counterintuitive results are obtained when looping over files of images.
Cont     Hom       Dis     ASM         ENT
25.3388  0.192384  3.9036  0.00028943  -7.6482
Please note that I am new to texture analysis and my experience is limited. I have also been trying the package glcm, but an initial workflow with RTextureMetrics seems more desirable.

Choose the highest row value of an array

I have the following matrix as the result of a neural network classification.
[,1] [,2] [,3] [,4]
78 6.679997e-04 4.650186e-05 9.820879e-01 4.037018e-02
85 6.721164e-05 4.037081e-03 3.442273e-04 9.993829e-01
97 5.889365e-06 8.632577e-03 7.168499e-04 9.992764e-01
52 2.118997e-01 5.272690e-04 9.340079e-01 2.318471e-05
63 1.630762e-05 2.278233e-04 9.999697e-01 1.327665e-05
11 9.999995e-01 8.570293e-04 1.033523e-05 1.954824e-03
4 9.999998e-01 4.675230e-03 4.100173e-06 1.386167e-04
67 8.230676e-08 3.901855e-05 9.999998e-01 2.482015e-05
82 3.113818e-05 4.045431e-03 4.980008e-04 9.994791e-01
59 2.199707e-02 8.932616e-05 9.996509e-01 3.201505e-06
68 6.396933e-05 3.507847e-05 9.999431e-01 2.231336e-04
50 3.644305e-03 9.955089e-01 6.152610e-07 2.438749e-03
65 2.985633e-01 3.111180e-04 7.284095e-04 9.567911e-01
84 8.953203e-08 2.043904e-03 2.796990e-02 9.997651e-01
33 5.182628e-03 9.959819e-01 1.582604e-07 9.150829e-03
29 4.094475e-03 9.936016e-01 2.439294e-07 1.378562e-02
21 9.999986e-01 2.920500e-03 2.343490e-04 8.551598e-06
79 2.356930e-01 1.064989e-04 9.998469e-01 8.037159e-08
54 9.760921e-07 1.125948e-04 9.999947e-01 4.913316e-05
71 7.575290e-05 1.901314e-03 9.998013e-01 1.212056e-06
73 3.069030e-02 1.351355e-04 9.961720e-01 2.970879e-05
98 1.852377e-05 1.071308e-02 1.508556e-03 9.923317e-01
8 9.999967e-01 1.091833e-03 8.615699e-05 3.788923e-04
55 7.353873e-05 1.572100e-04 9.999848e-01 2.654150e-06
87 6.485545e-05 1.801804e-03 2.487318e-03 9.978182e-01
66 1.075623e-04 9.965178e-05 9.999943e-01 1.090936e-06
6 9.999996e-01 2.057387e-03 5.199279e-06 8.711600e-04
46 1.675466e-03 9.923240e-01 5.403372e-07 1.406461e-02
48 2.897351e-03 9.948545e-01 2.023942e-07 1.650545e-02
28 4.179047e-03 9.950091e-01 1.261037e-07 2.139333e-02
99 6.191239e-08 2.242249e-02 7.910123e-04 9.999195e-01
47 1.265915e-03 9.928326e-01 1.905755e-07 6.175589e-02
41 2.460404e-02 9.910379e-01 2.134886e-07 6.080052e-03
45 1.416097e-03 9.904895e-01 4.379419e-07 3.060463e-02
18 9.999999e-01 2.119948e-03 4.377037e-06 2.702198e-04
What I want to do is get the highest value in each row; more precisely, I want a vector containing the row-wise maxima.
For the first row, the maximum is the value in column 3:
78 6.679997e-04 4.650186e-05 **9.820879e-01** 4.037018e-02
The values represent the probability of correctly choosing the label of a case given the data.
If we want the column index of the max value per row, just use max.col
max.col(m1, "first")
Or with apply
apply(m1, 1, which.max)
In order to get the max values, we can use apply with MARGIN = 1 to loop over the rows and take the max
apply(m1, 1, max)
For columns, use MARGIN = 2
apply(m1, 2, max)
A vectorized option to get the max value per row is to index with max.col
m1[cbind(seq_len(nrow(m1)), max.col(m1, "first"))]
Or with pmax if it is a data.frame
do.call(pmax, as.data.frame(m1))
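A minimal sketch on a small matrix (m1 here is just a stand-in for the classification matrix above):
m1 <- matrix(c(0.1, 0.7, 0.2,
               0.8, 0.1, 0.1,
               0.3, 0.3, 0.4), nrow = 3, byrow = TRUE)
max.col(m1, "first")                                # 2 1 3: column of the max per row
m1[cbind(seq_len(nrow(m1)), max.col(m1, "first"))]  # 0.7 0.8 0.4: the max values
apply(m1, 1, max)                                   # same values via apply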

What can I do to find and remove semi-duplicate rows in a matrix?

Assume I have this matrix
set.seed(123)
x <- matrix(rnorm(410),205,2)
x[8,] <- c(0.13152348, -0.05235148) #similar to x[5,]
x[16,] <- c(1.21846582, 1.695452178) #similar to x[11,]
The values are very similar to those in the rows indicated above, and in the context of the whole data they are semi-duplicates. What could I do to find and remove them? My original data is an array containing many such matrices, but the position of the semi-duplicates is the same across all matrices.
I know of agrep but the function operates on vectors as far as I understand.
You will need to set a threshold, but you can simply compute the distance between each pair of rows using dist and find the points that are sufficiently close together. Of course, each point is near itself, so you need to ignore the diagonal of the distance matrix.
DM = as.matrix(dist(x))
diag(DM) = 1 ## ignore diagonal
which(DM < 0.025, arr.ind=TRUE)
row col
8 8 5
5 5 8
16 16 11
11 11 16
48 48 20
20 20 48
168 168 71
91 91 73
73 73 91
71 71 168
This finds the "close" points that you created and a few others that got generated at random.
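To actually remove the semi-duplicates, one possible sketch (reusing the 0.025 threshold from above) is to drop the higher-indexed row of each close pair:
DM <- as.matrix(dist(x))
diag(DM) <- Inf                                       # a self-distance can never be flagged
close <- which(DM < 0.025, arr.ind = TRUE)
drop <- unique(pmax(close[, "row"], close[, "col"]))  # higher index of each pair
x_clean <- x[-drop, ]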

R - Apply function with different argument value for each row/column of a matrix

I am trying to apply a function to each row (or column) of a matrix, but I need to pass a different argument value for each row.
I thought I was familiar with lapply, mapply, etc., but apparently not enough.
As a simple example :
> a<-matrix(1:100,ncol=10);
> a
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[1,] 1 11 21 31 41 51 61 71 81 91
[2,] 2 12 22 32 42 52 62 72 82 92
[3,] 3 13 23 33 43 53 63 73 83 93
[4,] 4 14 24 34 44 54 64 74 84 94
[5,] 5 15 25 35 45 55 65 75 85 95
[6,] 6 16 26 36 46 56 66 76 86 96
[7,] 7 17 27 37 47 57 67 77 87 97
[8,] 8 18 28 38 48 58 68 78 88 98
[9,] 9 19 29 39 49 59 69 79 89 99
[10,] 10 20 30 40 50 60 70 80 90 100
Let's say I want to apply a function to each row, I would do :
apply(a, 1, myFunction);
However my function takes an argument, so :
apply(a, 1, myFunction, myArgument);
But if I want my argument to take a different value for each row, I cannot find the right way to do it.
If I define a 'myArgument' with multiple values, the whole vector will obviously be passed to each call of 'myFunction'.
I think I would need a kind of hybrid between apply and the multivariate mapply. Does that make sense?
One 'dirty' way to achieve my goal is to split the matrix by rows (or columns), use mapply on the resulting list and merge the result back to a matrix :
do.call(rbind, Map(myFunction, split(a,row(a)), as.list(myArgument)));
I had a look at sweep, aggregate, and all the *apply variations, but I couldn't find a perfect match for my need. Did I miss it?
Thank you for your help.
You can use sweep to do that.
a <- matrix(rnorm(100), 10)
rmeans <- rowMeans(a)
a_new <- sweep(a, 1, rmeans, `-`)  # subtract each row's own mean
rowMeans(a_new)                    # all zero, up to floating point
I don't think there are any great answers, but you can somewhat simplify your solution by using mapply, which handles the "rbind" part for you, assuming your function always returns a vector of the same size (also, Map is really just a wrapper around mapply):
a <- matrix(1:80,ncol=8)
myFun <- function(x, y) (x - mean(x)) * y
myArg <- 1:nrow(a)
t(mapply(myFun, split(a, row(a)), myArg))
I know the topic is quite old, but I had the same issue and I solved it this way:
# Original matrix
a <- matrix(runif(n=100), ncol=5)
# Different value for each row
v <- runif(n=nrow(a))
# Result matrix -> Add a column with the row number
o <- cbind(1:nrow(a), a)
fun <- function(x, v) {
  idx <- 2:length(x)   # positions of the actual data
  i <- x[1]            # the row number carried in the first column
  r <- x[idx] / v[i]   # apply the row-specific value
  return(r)
}
o <- t(apply(o, 1, fun, v = v))
By adding a column with the row number to the left of the original matrix, the function can read the row index from the first element of each row and use it to pick the matching value from the argument vector.
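An alternative sketch that avoids the helper column is to iterate over the row indices themselves (the names here are illustrative):
a <- matrix(runif(100), ncol = 5)
v <- runif(nrow(a))
res <- t(vapply(seq_len(nrow(a)),
                function(i) a[i, ] / v[i],  # row-specific argument via the index
                numeric(ncol(a))))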

Apply over all columns and rows of two different dataframes in R

I am trying to apply a function over all rows and columns of two dataframes, but I don't know how to solve it with apply.
I think the following script explains what I intend to do and the way I tried to solve it. Any advice would be warmly appreciated! Please note that simplefunction is only intended as an example, to keep things simple.
# some data and a function
df1<-data.frame(name=c("aa","bb","cc","dd","ee"),a=sample(1:50,5),b=sample(1:50,5),c=sample(1:50,5))
df2<-data.frame(name=c("aa","bb","cc","dd","ee"),a=sample(1:50,5),b=sample(1:50,5),c=sample(1:50,5))
simplefunction<-function(a,b){a+b}
# apply on a single row
simplefunction(df1[1,2],df2[1,2])
# apply over all colums
apply(?)
## apply over all columns and rows
# create df to receive results
df3<-df2
# loop it
for (i in 2:5)df3[i]<-apply(?)
My first mapply answer!! For your simple example you have...
mapply( FUN = `+` , df1[,-1] , df2[,-1] )
# a b c
# [1,] 60 35 75
# [2,] 57 39 92
# [3,] 72 71 48
# [4,] 31 19 85
# [5,] 47 66 58
You can extend it like so...
mapply( FUN = function(x,y,z,etc){ simplefunctioncodehere} , df1[,-1] , df2[,-1] , ... other dataframes here )
The dataframes will be passed, in order, to the function: in this example df1 would be x, df2 would be y, and z and etc would be other dataframes that you specify, in that order. Hopefully that makes sense. mapply will take the first-row, first-column values of all the dataframes and apply the function, then the first-row, second-column values and apply the function, and so on.
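As a concrete instance of the schematic call above (df3 is a hypothetical third data frame built the same way as df1 and df2):
df3 <- data.frame(name=c("aa","bb","cc","dd","ee"), a=sample(1:50,5), b=sample(1:50,5), c=sample(1:50,5))
mapply(FUN = function(x, y, z) x + y + z, df1[,-1], df2[,-1], df3[,-1])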
You can also use Reduce:
set.seed(45) # for reproducibility
Reduce(function(x,y) { x + y}, list(df1[, -1], df2[,-1]))
# a b c
# 1 53 22 23
# 2 64 28 91
# 3 19 56 51
# 4 38 41 53
# 5 28 42 30
You can just do:
df1[,-1] + df2[,-1]
Which gives:
a b c
1 52 24 37
2 65 63 62
3 31 90 89
4 90 35 33
5 51 33 45
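Note that the three answers above show different numbers only because each run of sample draws fresh values. A minimal reproducible sketch (the seed is arbitrary) that also keeps the label column:
set.seed(1)  # arbitrary, just for reproducibility
df1 <- data.frame(name=c("aa","bb","cc","dd","ee"), a=sample(1:50,5), b=sample(1:50,5), c=sample(1:50,5))
df2 <- data.frame(name=c("aa","bb","cc","dd","ee"), a=sample(1:50,5), b=sample(1:50,5), c=sample(1:50,5))
cbind(df1["name"], df1[,-1] + df2[,-1])  # element-wise sums with the labels reattached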
