How to delete selected rows from a matrix in R

So basically I want to split a randomly generated matrix into 2 matrices: 1 for training and 1 for testing.
a <- s[sample(nrow(s),size=3,replace=FALSE),]
b <- s[-a,]
> s
[,1] [,2]
[1,] 0.69779187 -0.75869384
[2,] -0.46857477 -0.33813598
[3,] 0.53903809 -0.95950598
[4,] -0.33312675 -0.49951164
[5,] 0.88500834 0.08256923
[6,] 0.63664652 0.87420720
[7,] 0.61614134 0.77893294
[8,] 0.36956134 0.07586245
[9,] -0.03678593 -0.23743987
[10,] -0.27057064 -0.86067063
> a
[,1] [,2]
[1,] 0.8850083 0.08256923
[2,] 0.6366465 0.87420720
[3,] -0.2705706 -0.86067063
> b
[,1] [,2]
The idea here is to generate a 10*2 matrix and randomly pick 3 rows from it as training data, then output the training matrix and the remaining rows as the testing matrix.
Does anyone have suggestions on how to delete a from s?

The issue is that you're trying to index s with a matrix a, rather than the randomly selected indices. Modifying your code to the following should do the trick:
i <- sample(nrow(s),size=3,replace=FALSE)
a <- s[i,]
b <- s[-i,] # Note the indexing with i, rather than a
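As a self-contained sketch of the same idea: the seed, the simulated matrix, and the names train and test below are assumptions added for illustration, not part of the original question. The drop = FALSE argument is optional here; it only guarantees that the result stays a matrix even if a single row is selected.

```r
set.seed(42)                       # arbitrary seed, only for reproducibility
s <- matrix(rnorm(20), ncol = 2)   # stand-in for the 10 x 2 random matrix

# Sample row *indices*, then reuse them for both subsets
i <- sample(nrow(s), size = 3, replace = FALSE)
train <- s[i, , drop = FALSE]      # the 3 sampled rows
test  <- s[-i, , drop = FALSE]     # the remaining 7 rows

dim(train)
dim(test)
```

Negative indexing only makes sense with row positions such as i. Negating the values stored in a produces fractional numbers that are not meaningful row positions, which is consistent with the empty b shown in the question.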

Related

I want to write an R function to get the corresponding index of a pair of data

I have a dataset which has 2 variables and want to write an R function as follows: If I input one pair (one row) arbitrarily from the dataset into this function, I want to extract the corresponding index of the dataset through the function.
I used,
data = rnorm2d(10,rho = 0.4)
x = c(data[10,1],data[10,2])
print(match(x, data))
The generated dataset is:
[,1] [,2]
[1,] -0.1792099 1.3007178
[2,] 0.3280193 0.6615251
[3,] -0.4390389 -1.9611801
[4,] -1.3096660 -0.9117184
[5,] 0.5165317 -0.3229271
[6,] -1.0963584 -1.1492360
[7,] 0.3447118 0.5357070
[8,] -0.8919166 0.4934032
[9,] -0.2199690 0.5788579
[10,] -0.9864628 0.6880458
But this gave me an output as follows:
[1] 10 20
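The output 10 20 is what match() would be expected to give here: it treats the matrix as a plain vector stored column by column, so data[10,1] is element 10 and data[10,2] is element 20. One possible way to get the row index instead is to compare both columns at once. This is a minimal sketch, assuming a plain rnorm() matrix as a stand-in for the rnorm2d() output; the name row_index is hypothetical.

```r
set.seed(1)                            # arbitrary seed for reproducibility
data <- matrix(rnorm(20), ncol = 2)    # stand-in for rnorm2d(10, rho = 0.4)
x <- data[10, ]                        # the pair (row) to look up

# match() works on the column-major vector, giving element positions
match(x, data)

# Compare column by column and keep the rows where both values agree
row_index <- which(data[, 1] == x[1] & data[, 2] == x[2])
row_index
```

Exact equality with == is safe in this sketch only because x is copied directly from data; for values computed elsewhere, a tolerance-based comparison would be more robust.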

R - Dividing columns of matrix list by vector list

I have a list of matrices and a list of vectors, and I want to divide the columns of each matrix with the corresponding vector element.
For example, given
set.seed(230)
data <- list(cbind(c(NA, rnorm(6)),c(rnorm(6),NA)), cbind(runif(7), runif(7)))
divisors <- list(c(0.5,2), c(3,4))
I'm looking for a vectorized function that produces output that looks the same as
for(i in 1:length(data)){
for(j in 1:ncol(data[[i]])){data[[i]][,j] <- data[[i]][,j] / divisors[[i]][j]}
}
i.e.
[[1]]
[,1] [,2]
[1,] NA 0.28265752
[2,] -0.46967014 -0.07132588
[3,] 0.20253439 -0.37432527
[4,] 0.65736410 0.06630705
[5,] 0.72349294 0.67202129
[6,] 0.88532648 -0.80892508
[7,] 0.08162027 NA
[[2]]
[,1] [,2]
[1,] 0.26597435 0.18120979
[2,] 0.31213250 0.16493883
[3,] 0.19250804 0.14104145
[4,] 0.21196882 0.10172964
[5,] 0.10389773 0.04979742
[6,] 0.02754329 0.15064043
[7,] 0.25771766 0.23042586
The closest I have been able to come is
Map(`/`, data, divisors)
But that divides rows (rather than columns) of the matrix by the vector. Any help appreciated.
Transpose your matrices before and after:
lapply(Map(`/`, lapply(data, t), divisors), t)
# [[1]]
# [,1] [,2]
# [1,] NA 0.28265752
# [2,] -0.46967014 -0.07132588
# [3,] 0.20253439 -0.37432527
# [4,] 0.65736410 0.06630705
# [5,] 0.72349294 0.67202129
# [6,] 0.88532648 -0.80892508
# [7,] 0.08162027 NA
#
# [[2]]
# [,1] [,2]
# [1,] 0.26597435 0.18120979
# [2,] 0.31213250 0.16493883
# [3,] 0.19250804 0.14104145
# [4,] 0.21196882 0.10172964
# [5,] 0.10389773 0.04979742
# [6,] 0.02754329 0.15064043
# [7,] 0.25771766 0.23042586
I prefer the transpose approach above, but another option is to expand your divisor vectors into matrices of the same dimensions as in data:
div_mat = Map(matrix, data = divisors, nrow = sapply(data, nrow), ncol = 2, byrow = T)
Map("/", data, div_mat)
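A third option, offered here as a sketch and not taken from the answers above, is to let sweep() do the column-wise division inside Map(). The names by_sweep and by_transpose are hypothetical; the data are the ones from the question.

```r
set.seed(230)
data <- list(cbind(c(NA, rnorm(6)), c(rnorm(6), NA)),
             cbind(runif(7), runif(7)))
divisors <- list(c(0.5, 2), c(3, 4))

# sweep(m, 2, d, "/") divides column j of m by d[j]
by_sweep <- Map(function(m, d) sweep(m, 2, d, "/"), data, divisors)

# The transpose approach from the answer, for comparison
by_transpose <- lapply(Map(`/`, lapply(data, t), divisors), t)

all.equal(by_sweep, by_transpose)
```

The sweep() version avoids transposing twice and states the margin explicitly, at the cost of a small anonymous function.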

Matrix into another matrix with specified dimensions

I have a matrix with 2 columns, and I'd like to turn it into a matrix with specified dimensions.
> t <- matrix(rnorm(20), ncol=2, nrow=10)
[,1] [,2]
[1,] 1.4938530 1.2493088
[2,] -0.8079445 1.8715868
[3,] 0.5775695 -0.9277420
[4,] 0.4415969 2.6357908
[5,] 0.3209226 -1.1306049
[6,] 0.5109251 -0.8661100
[7,] 1.9495571 0.2092941
[8,] 0.7816373 1.1517466
[9,] 0.0300595 -0.1351532
[10,] 0.7550894 0.7778869
What I'd like to do is something like:
> tt <- matrix(t, ncol=4, nrow=5)
[,1] [,2] [,3] [,4]
[1,] 1.4938530 1.2493088 -0.8079445 1.8715868
[2,] 0.5775695 -0.9277420 0.4415969 2.6357908
[3,] etc.
I tried to do things with modulo but my head hurts too much for me to try even one more minute.
You can transpose your first matrix, so that data is stored in the order you want, and then fill the second matrix by row:
tt <- matrix(t(t), ncol=4, nrow=5, byrow = T)
t
# [,1] [,2]
# [1,] -1.4162465950 0.01532476
# [2,] -0.2366332875 -0.04024386
# [3,] 0.5146631983 -0.34720239
# [4,] 1.9243922633 -0.24016160
# [5,] 1.6161165230 0.63187438
# [6,] -0.3558181508 -0.73199138
# [7,] 0.7459405376 0.01934826
# [8,] -1.0428581093 -2.04422042
# [9,] 0.0003166344 0.98973993
#[10,] 0.6390745275 -0.65584930
tt
# [,1] [,2] [,3] [,4]
# [1,] -1.4162465950 0.01532476 -0.2366333 -0.04024386
# [2,] 0.5146631983 -0.34720239 1.9243923 -0.24016160
# [3,] 1.6161165230 0.63187438 -0.3558182 -0.73199138
# [4,] 0.7459405376 0.01934826 -1.0428581 -2.04422042
# [5,] 0.0003166344 0.98973993 0.6390745 -0.65584930
When you work with a matrix in R, you can think of it as a vector with the data stored column by column. Extracting data by row is therefore not as straightforward as extracting by column, which follows the storage order. After transposing the first matrix, the data is stored in the order you want to read it, so filling the second matrix by row becomes straightforward.
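The storage order is easier to see with integers than with random numbers. The sketch below uses a hypothetical matrix m filled with 1:20 in place of the random t from the question.

```r
m <- matrix(1:20, ncol = 2)   # column 1 holds 1..10, column 2 holds 11..20

as.vector(m)      # column-major storage: 1, 2, ..., 10, 11, ..., 20
as.vector(t(m))   # after transposing: 1, 11, 2, 12, ... (row by row of m)

# Fill the 5 x 4 target by row from the transposed data
tt <- matrix(t(m), nrow = 5, ncol = 4, byrow = TRUE)
tt[1, ]           # rows 1 and 2 of m side by side: 1 11 2 12
tt[2, ]           # rows 3 and 4 of m side by side: 3 13 4 14
```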

`sweep() function` in R taking `2L` as input

Very, very specific question, but I'm stuck trying to unravel the code within contr.poly() in R.
I am at what I think is the last hurdle... There is this internal function, make.poly(), which is the critical part of contr.poly(). Within make.poly I see that there is a raw matrix generated, which for contr.poly(4) is:
[,1] [,2] [,3] [,4]
[1,] 1 -1.5 1 -0.3
[2,] 1 -0.5 -1 0.9
[3,] 1 0.5 -1 -0.9
[4,] 1 1.5 1 0.3
From there the function sweep() is applied with the following call and result:
Z <- sweep(raw, 2L, apply(raw, 2L, function(x) sqrt(sum(x^2))),
"/", check.margin = FALSE)
[,1] [,2] [,3] [,4]
[1,] 0.5 -0.6708204 0.5 -0.2236068
[2,] 0.5 -0.2236068 -0.5 0.6708204
[3,] 0.5 0.2236068 -0.5 -0.6708204
[4,] 0.5 0.6708204 0.5 0.2236068
I am familiar with the apply functions, and I guess sweep is similar, at least in syntax, but I don't understand what 2L is doing, and I don't know if "/" and check.margin = F are important to understand the mathematical operation being performed.
EDIT: Quite easy... thanks to this - it just normalizes the columns to unit length: each entry of the matrix is divided ("/") by the result of function(x) applied column-wise, i.e. the Euclidean norm of its column.
Here is an example that answers the operation in the function sweep().
I start with a matrix
> set.seed(0)
> (mat = matrix(rnorm(30, 5, 3), nrow= 10))
[,1] [,2] [,3]
[1,] 8.7888629 7.290780 4.327196
[2,] 4.0212999 2.602972 6.132187
[3,] 8.9893978 1.557029 5.400009
[4,] 8.8172880 4.131615 7.412569
[5,] 6.2439243 4.102355 4.828680
[6,] 0.3801499 3.765468 6.510824
[7,] 2.2142989 5.756670 8.257308
[8,] 4.1158387 2.324237 2.927138
[9,] 4.9826985 6.307050 1.146202
[10,] 12.2139602 1.287385 5.140179
and I want to center the data columnwise. Granted, I could use scale(mat, center = T, scale = F) and be done, but I find that this function also attaches attributes to the result, such as:
attr(,"scaled:center")
[1] 6.076772 3.912556 5.208229
corresponding to the column means. Good to have, but I just wanted the matrix, clean and neat. So it turns out that this can be achieved with:
> (centered = sweep(mat, 2, apply(mat,2, function(x) mean(x)),"-"))
[,1] [,2] [,3]
[1,] 2.7120910 3.3782243 -0.88103281
[2,] -2.0554720 -1.3095838 0.92395779
[3,] 2.9126259 -2.3555271 0.19177993
[4,] 2.7405161 0.2190592 2.20433938
[5,] 0.1671524 0.1897986 -0.37954947
[6,] -5.6966220 -0.1470886 1.30259477
[7,] -3.8624730 1.8441143 3.04907894
[8,] -1.9609332 -1.5883194 -2.28109067
[9,] -1.0940734 2.3944938 -4.06202721
[10,] 6.1371883 -2.6251713 -0.06805063
So the sweep() function is understood as:
sweep(x = the matrix to sweep through, MARGIN = 2 to work column-wise or 1 to work row-wise, STATS = the values to sweep out, here computed with apply() on the same matrix (or another matrix) over the same margin with function(x) mean(x), FUN = the operation to perform, in quotes: "-" or "/")
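Written out with the argument names that sweep() documents (x, MARGIN, STATS, FUN), the centering call can be sketched as follows. Using colMeans() in place of apply(mat, 2, mean) is a substitution made here for brevity, and the comparison with scale() is added only as a sanity check.

```r
set.seed(0)
mat <- matrix(rnorm(30, 5, 3), nrow = 10)

# Subtract each column's mean from every entry of that column
centered <- sweep(x = mat, MARGIN = 2, STATS = colMeans(mat), FUN = "-")

colMeans(centered)   # numerically zero for every column

# Same numbers as scale(), but without the "scaled:center" attribute
all.equal(centered, scale(mat, center = TRUE, scale = FALSE),
          check.attributes = FALSE)
attributes(centered) # only the dimensions remain
```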
Interestingly, we could have used the means of the columns of a completely different matrix to then sweep through the original matrix - presumably a more complex operation, more in line with the reason why this function was developed.
> aux.mat = matrix(rnorm(9), nrow = 3)
> aux.mat
[,1] [,2] [,3]
[1,] -0.2793463 -0.4527840 -1.065591
[2,] 1.7579031 -0.8320433 -1.563782
[3,] 0.5607461 -1.1665705 1.156537
> (centered = sweep(mat, 2, apply(aux.mat,2, function(x) mean(x)),"-"))
[,1] [,2] [,3]
[1,] 8.1090952 8.107913 4.818142
[2,] 3.3415323 3.420105 6.623132
[3,] 8.3096302 2.374162 5.890954
[4,] 8.1375203 4.948748 7.903514
[5,] 5.5641567 4.919487 5.319625
[6,] -0.2996178 4.582600 7.001769
[7,] 1.5345313 6.573803 8.748253
[8,] 3.4360710 3.141369 3.418084
[9,] 4.3029308 7.124183 1.637147
[10,] 11.5341925 2.104517 5.631124
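To tie this back to the call in the question: 2L is simply the integer literal 2, so as a margin it behaves the same as 2, and check.margin = FALSE only skips the check that the length of the statistics matches the chosen margin. The sketch below rebuilds the raw matrix from the printed values in the question (typed in by hand, so treat them as an assumption) and reproduces the normalization.

```r
# Raw matrix as printed in the question for contr.poly(4)
raw <- cbind(1,
             c(-1.5, -0.5, 0.5, 1.5),
             c(1, -1, -1, 1),
             c(-0.3, 0.9, -0.9, 0.3))

# Euclidean norm of each column
col_norms <- apply(raw, 2L, function(x) sqrt(sum(x^2)))

# Divide every column by its norm; 2L and 2 give the same result
Z        <- sweep(raw, 2L, col_norms, "/", check.margin = FALSE)
Z_double <- sweep(raw, 2,  col_norms, "/")

is.integer(2L)            # TRUE: the L suffix marks an integer literal
all.equal(Z, Z_double)    # TRUE
colSums(Z^2)              # every column now has unit length
```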

create a matrix of samples in R

I have a probability distribution X and I would like to create samples of 100 observations:
I use sample(X,size=100,replace=TRUE). I would like to plot the PDF of the sample mean over 100, 1000, and 10000 samples, so I tried to create matrices of observations using matrix(sample(X,size=100,replace=TRUE),nrow=100,ncol=100), but it generates the same sample in all columns. Any ideas on how to create a new sample for each column?
How about this? Substitute rnorm with your sample call. This will take a new sample for each column:
replicate(3,rnorm(10))
# [,1] [,2] [,3]
# [1,] -0.439366440511456290974 0.349113310500896667499 2.10467702915785226381
# [2,] 0.788892611945899879800 0.572377925929974273878 0.92566383997665424577
# [3,] 0.098359807623723205516 -0.642162545019581476602 0.28636140673186011307
# [4,] -3.063133170307587249681 1.322694510750672014510 0.66340500173312999532
# [5,] 0.255018412772398617161 1.492588176987205361712 1.11444057062233659039
# [6,] -1.069621910039232570711 -1.460604130070508821504 -0.81534768620081377044
# [7,] -1.036421328330551894226 1.525817374339748067058 0.47070620500783272311
# [8,] -0.139135286049327872027 -0.065015174557339946992 0.21483758566831215320
# [9,] -0.370005496738202488416 1.573987068922320986530 -1.21431499328084857581
#[10,] -0.070508137614489943545 1.657541962601124518883 0.45886687983031809734
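With sample() substituted in, the same pattern might look like the sketch below. The population X, the seed, and the names n_obs, n_samples and sample_means are hypothetical placeholders, since the question does not show the actual distribution.

```r
set.seed(123)        # arbitrary seed for reproducibility
X <- 1:6             # hypothetical population; replace with your own values
n_obs     <- 100     # observations per sample
n_samples <- 1000    # number of samples (columns)

# replicate() re-evaluates sample() for every column, so each one is new
samples <- replicate(n_samples, sample(X, size = n_obs, replace = TRUE))
dim(samples)         # n_obs rows, n_samples columns

# One sample mean per column; estimate and optionally plot its density
sample_means <- colMeans(samples)
dens <- density(sample_means)
# plot(dens, main = "Distribution of the sample mean")
```

The original matrix() attempt draws only one sample of 100 values and recycles it to fill all 100 columns, which is why every column looked the same.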
