rows less than 5 [duplicate] - r

I'm trying to find which simple events have a difference between the maximum and minimum of no more than 10.
The code I have is:
library(gtools)
sampleSpace <- combinations(100, 3, seq(1, 100, by = 1), repeats.allowed = FALSE)
diff <- sample_max - sample_min
Can anyone help me?

A logical vector marking the rows of sampleSpace where the difference between max and min is no more than 10 is given by:
ss <- apply(sampleSpace, 1, function(x) abs(diff(range(x))) <= 10)
So you can do
sampleSpace[ss,]
to get just the rows meeting that criterion. There are 25,020 remaining rows. The first 30 are:
sampleSpace[ss,]
#> [,1] [,2] [,3]
#> [1,] 1 2 3
#> [2,] 1 2 4
#> [3,] 1 2 5
#> [4,] 1 2 6
#> [5,] 1 2 7
#> [6,] 1 2 8
#> [7,] 1 2 9
#> [8,] 1 2 10
#> [9,] 1 2 11
#> [10,] 1 3 2
#> [11,] 1 3 4
#> [12,] 1 3 5
#> [13,] 1 3 6
#> [14,] 1 3 7
#> [15,] 1 3 8
#> [16,] 1 3 9
#> [17,] 1 3 10
#> [18,] 1 3 11
#> [19,] 1 4 2
#> [20,] 1 4 3
#> [21,] 1 4 5
#> [22,] 1 4 6
#> [23,] 1 4 7
#> [24,] 1 4 8
#> [25,] 1 4 9
#> [26,] 1 4 10
#> [27,] 1 4 11
#> [28,] 1 5 2
#> [29,] 1 5 3
#> [30,] 1 5 4

You can simply use which(). Since sample_max and sample_min are not defined, first compute the row-wise difference:
diff <- apply(sampleSpace, 1, max) - apply(sampleSpace, 1, min)
sampleSpace <- sampleSpace[which(diff <= 10), ]
head(sampleSpace)
Here is the output:
[,1] [,2] [,3]
[1,] 1 2 3
[2,] 1 2 4
[3,] 1 2 5
[4,] 1 2 6
[5,] 1 2 7
[6,] 1 2 8

We can use rowRanges from matrixStats
library(matrixStats)
m1 <- rowRanges(sampleSpace)
out <- sampleSpace[(m1[,2]- m1[,1]) <= 10,]
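If you'd rather avoid an extra package, the same row-wise range test can be written in base R with pmax()/pmin() over the columns. This is a sketch on a small stand-in sample space so it's self-contained; asplit() requires R >= 3.6:

```r
# Base-R alternative: vectorized row-wise range via pmax()/pmin() over the
# columns, which avoids the per-row overhead of apply() on large sample spaces.
sampleSpace <- t(combn(15, 3))                 # small stand-in: all 3-subsets of 1:15
cols <- asplit(sampleSpace, 2)                 # list of column vectors (R >= 3.6)
rng  <- do.call(pmax, cols) - do.call(pmin, cols)
out  <- sampleSpace[rng <= 10, ]
```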

get.edgelist or get.data.frame to transform matrix'values into weights of an edgelist

I have a matrix that, interpreted as a network, should give me an edgelist with the matrix's values as edge weights.
library(igraph)
M4 <- matrix(rpois(36,5), nrow=6) # randomly googled that to have sample data
Mg <- graph.adjacency(M4) # make matrix into graph
Mw <- get.data.frame(Mg) # 1st try
Mw2 <- cbind(get.edgelist(Mg), round(E(Mg)$weight, 3)) # 2nd try
You'll probably see that Mw, using get.data.frame, gives me an edgelist, but without the edge weights. There is probably an attribute, which I can't find or understand.
Mw2, using this solution, returns:
Error in round(E(Mg)$weight, 3) : non-numeric argument to mathematical function.
How can I work around that? It leaves me puzzled.
If you would like to assign weights to edges, you should enable the weighted option, like below:
Mg <- graph.adjacency(M4, weighted = TRUE)
and then Mw2 will look like:
> Mw2
[,1] [,2] [,3]
[1,] 1 1 2
[2,] 1 2 9
[3,] 1 3 4
[4,] 1 4 4
[5,] 1 5 7
[6,] 1 6 2
[7,] 2 1 4
[8,] 2 2 1
[9,] 2 3 4
[10,] 2 4 5
[11,] 2 5 5
[12,] 2 6 4
[13,] 3 1 5
[14,] 3 2 2
[15,] 3 3 6
[16,] 3 4 5
[17,] 3 5 6
[18,] 3 6 4
[19,] 4 1 3
[20,] 4 2 4
[21,] 4 3 9
[22,] 4 4 8
[23,] 4 5 3
[24,] 4 6 5
[25,] 5 1 7
[26,] 5 2 5
[27,] 5 3 3
[28,] 5 4 3
[29,] 5 5 5
[30,] 5 6 6
[31,] 6 1 2
[32,] 6 2 7
[33,] 6 3 4
[34,] 6 4 7
[35,] 6 5 11
[36,] 6 6 3
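As a side note, once weighted = TRUE is set, the first attempt works too: get.data.frame() (as_data_frame() in newer igraph versions) includes edge attributes, so it returns a from/to/weight data frame directly. A sketch reusing the question's random sample data, with a seed added for reproducibility:

```r
library(igraph)

set.seed(1)                                    # make the random sample reproducible
M4 <- matrix(rpois(36, 5), nrow = 6)
Mg <- graph.adjacency(M4, weighted = TRUE)     # store matrix values as the $weight attribute
Mw <- get.data.frame(Mg)                       # columns: from, to, weight
head(Mw)
```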

unique relation between two columns X and Y using R [duplicate]

I have a data frame of integers that is a subset of all of the n choose 3 combinations of 1...n.
E.g., for n=5, it is something like:
[,1] [,2] [,3]
[1,] 1 2 3
[2,] 1 2 4
[3,] 1 2 5
[4,] 1 3 4
[5,] 1 3 5
[6,] 1 4 5
[7,] 2 1 3
[8,] 2 1 4
[9,] 2 1 5
[10,] 2 3 4
[11,] 2 3 5
[12,] 2 4 5
[13,] 3 1 2
[14,] 3 1 4
[15,] 3 1 5
[16,] 3 2 4
[17,] 3 2 5
[18,] 3 4 5
[19,] 4 1 2
[20,] 4 1 3
[21,] 4 1 5
[22,] 4 2 3
[23,] 4 2 5
[24,] 4 3 5
[25,] 5 1 2
[26,] 5 1 3
[27,] 5 1 4
[28,] 5 2 3
[29,] 5 2 4
[30,] 5 3 4
What I'd like to do is remove any rows with duplicate combinations, irrespective of ordering. E.g., [1,] 1 2 3 is the same as [1,] 2 1 3 is the same as [1,] 3 1 2.
unique, duplicated, &c. don't seem to take this into account. Also, I am working with quite a large amount of data (n is ~750), so it ought to be a pretty fast operation. Are there any base functions or packages that can do this?
Sort within the rows first, then use duplicated, see below:
# example data
dat = matrix(scan('data.txt'), ncol = 3, byrow = TRUE)
# Read 90 items
dat[ !duplicated(apply(dat, 1, sort), MARGIN = 2), ]
# [,1] [,2] [,3]
# [1,] 1 2 3
# [2,] 1 2 4
# [3,] 1 2 5
# [4,] 1 3 4
# [5,] 1 3 5
# [6,] 1 4 5
# [7,] 2 3 4
# [8,] 2 3 5
# [9,] 2 4 5
# [10,] 3 4 5
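With n around 750 the apply(dat, 1, sort) step can become the bottleneck, since sort() is called once per row. For exactly three columns, the within-row sort can be vectorized with pmin()/pmax(); a sketch of the same idea on toy data:

```r
# Vectorized within-row sort for a 3-column matrix: lo and hi come from
# pmin()/pmax(), and the middle value is whatever the row sum has left over.
dat <- rbind(c(1, 2, 3), c(2, 1, 3), c(3, 1, 2), c(1, 2, 4))  # toy data with duplicates
lo  <- pmin(dat[, 1], dat[, 2], dat[, 3])
hi  <- pmax(dat[, 1], dat[, 2], dat[, 3])
mid <- dat[, 1] + dat[, 2] + dat[, 3] - lo - hi
keep <- !duplicated(cbind(lo, mid, hi))        # compare rows of the sorted triples
dat[keep, ]
```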

R - Network Data Into and Out of R - Adjacency Matrix to Edgelist Format

I am a basic programmer using R for social-network analysis and have a problem that I am not sure how to solve.
WHAT I HAVE:
An adjacency matrix stored as a csv file with the following information:
a) Households in row 1 and households in column 1 interact with each other through sharing resources.
b) The interactions are ties represented by kinship numbers. The smaller the number the closer (or stronger) the kinship connection. For example, 1 is parent-child kinship, and 100 is no kinship. No kinship to self is NA.
c) File snippet:
[,1] [,2] [,3] [,4] [,5]
[1,] NA 100 2 1 100
[2,] 4 NA 100 100 3
[3,] 100 3 NA 2 4
[4,] 100 1 5 NA 100
[5,] 1 100 4 100 NA
WHAT I NEED:
I need to convert this adjacency matrix into an edge list with three columns ("HH1", "HH2", "HHKinRank") in order to complete additional kinship calculations.
This edge list must be saved as a new csv file for further analysis.
My greatest issue with the list is that it should contain only the numerical values. If there is no tie (NA), will the edge list still show it?
WHAT I HAVE DONE:
I tried assigning the csv file to a new variable:
HHKinRank.el <- read.csv("HouseholdKinRank.csv")
When I did this, the most frustrating part was determining which libraries I might have to use. There are many candidate functions, such as melt, so troubleshooting is an issue because I may also be assigning values incorrectly.
I can go from an edge list to a matrix, but running the commands in the opposite direction is hard.
Thank you for any assistance with this.
You can do this using the network package for R; it is probably possible in igraph as well.
library(network)
# create the example data
adjMat <- matrix(c(NA, 100, 2, 1, 100,
4, NA, 100, 100, 3,
100, 3, NA, 2, 4,
100, 1, 5, NA, 100,
1, 100, 4, 100, NA),
ncol = 5,byrow=TRUE)
# create a network object
net <- as.network(adjMat, matrix.type = 'adjacency',
                  ignore.eval = FALSE,    # read edge values from matrix as attribute
                  names.eval = 'kinship', # name the attribute
                  loops = FALSE)          # ignore self-edges
# convert to an edgelist matrix
el <- as.edgelist(net, attrname = 'kinship')
# relabel the columns
colnames(el) <- c("HH1", "HH2", "HHKinRank")
# check results
el
HH1 HH2 HHKinRank
[1,] 1 2 100
[2,] 1 3 2
[3,] 1 4 1
[4,] 1 5 100
[5,] 2 1 4
[6,] 2 3 100
[7,] 2 4 100
[8,] 2 5 3
[9,] 3 1 100
[10,] 3 2 3
[11,] 3 4 2
[12,] 3 5 4
[13,] 4 1 100
[14,] 4 2 1
[15,] 4 3 5
[16,] 4 5 100
[17,] 5 1 1
[18,] 5 2 100
[19,] 5 3 4
[20,] 5 4 100
# write edgelist matrix to csv file
write.csv(el,file = 'myEdgelist.csv')
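One small caveat with that last step: by default write.csv() prepends a column of row names, so the file ends up with four columns. Passing row.names = FALSE keeps it to exactly the three edgelist columns. A sketch with a toy stand-in edgelist:

```r
# write.csv() adds a row-name column unless told otherwise
el <- cbind(HH1 = c(1, 2), HH2 = c(2, 3), HHKinRank = c(100, 3))  # toy stand-in
f  <- tempfile(fileext = ".csv")
write.csv(el, file = f, row.names = FALSE)
read.csv(f)                                   # just HH1, HH2, HHKinRank
```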
Original data:
adj_mat <- matrix(
c(NA, 100, 2, 1, 100,
4, NA, 100, 100, 3,
100, 3, NA, 2, 4,
100, 1, 5, NA, 100,
1, 100, 4, 100, NA
),
nrow = 5, ncol = 5, byrow = TRUE
)
adj_mat
#> [,1] [,2] [,3] [,4] [,5]
#> [1,] NA 100 2 1 100
#> [2,] 4 NA 100 100 3
#> [3,] 100 3 NA 2 4
#> [4,] 100 1 5 NA 100
#> [5,] 1 100 4 100 NA
1) Assemble row indices, column indices, and the adjacency matrix's values into a list of 3 matrices:
rows_cols_vals_matrices <- list(row_indices = row(adj_mat),
col_indices = col(adj_mat),
values = adj_mat)
rows_cols_vals_matrices
#> $row_indices
#> [,1] [,2] [,3] [,4] [,5]
#> [1,] 1 1 1 1 1
#> [2,] 2 2 2 2 2
#> [3,] 3 3 3 3 3
#> [4,] 4 4 4 4 4
#> [5,] 5 5 5 5 5
#>
#> $col_indices
#> [,1] [,2] [,3] [,4] [,5]
#> [1,] 1 2 3 4 5
#> [2,] 1 2 3 4 5
#> [3,] 1 2 3 4 5
#> [4,] 1 2 3 4 5
#> [5,] 1 2 3 4 5
#>
#> $values
#> [,1] [,2] [,3] [,4] [,5]
#> [1,] NA 100 2 1 100
#> [2,] 4 NA 100 100 3
#> [3,] 100 3 NA 2 4
#> [4,] 100 1 5 NA 100
#> [5,] 1 100 4 100 NA
2) Flatten the matrices:
vectorized_matrices <- lapply(rows_cols_vals_matrices, as.vector)
vectorized_matrices
#> $row_indices
#> [1] 1 2 3 4 5 1 2 3 4 5 1 2 3 4 5 1 2 3 4 5 1 2 3 4 5
#>
#> $col_indices
#> [1] 1 1 1 1 1 2 2 2 2 2 3 3 3 3 3 4 4 4 4 4 5 5 5 5 5
#>
#> $values
#> [1] NA 4 100 100 1 100 NA 3 1 100 2 100 NA 5 4 1 100
#> [18] 2 NA 100 100 3 4 100 NA
3) Bind the vectors into a 3 column matrix:
melted <- do.call(cbind, vectorized_matrices)
head(melted)
#> row_indices col_indices values
#> [1,] 1 1 NA
#> [2,] 2 1 4
#> [3,] 3 1 100
#> [4,] 4 1 100
#> [5,] 5 1 1
#> [6,] 1 2 100
4) Drop rows where column 3 is NA:
filtered <- melted[!is.na(melted[, 3]), ]
filtered
#> row_indices col_indices values
#> [1,] 2 1 4
#> [2,] 3 1 100
#> [3,] 4 1 100
#> [4,] 5 1 1
#> [5,] 1 2 100
#> [6,] 3 2 3
#> [7,] 4 2 1
#> [8,] 5 2 100
#> [9,] 1 3 2
#> [10,] 2 3 100
#> [11,] 4 3 5
#> [12,] 5 3 4
#> [13,] 1 4 1
#> [14,] 2 4 100
#> [15,] 3 4 2
#> [16,] 5 4 100
#> [17,] 1 5 100
#> [18,] 2 5 3
#> [19,] 3 5 4
#> [20,] 4 5 100
5) Wrap it all up into a function:
as_edgelist.adj_mat <- function(x, .missing = NA) {
  # if there are row/colnames or non-numeric data, you'll need to use a
  # data frame to handle heterogeneous column types
  stopifnot(is.numeric(x) & is.null(dimnames(x)))
  melted <- do.call(cbind, lapply(list(row(x), col(x), x), as.vector))
  if (is.na(.missing)) {
    out <- melted[!is.na(melted[, 3]), ]
  } else {
    out <- melted[melted[, 3] != .missing, ]
  }
  out
}
6) Take it for a spin:
as_edgelist.adj_mat(adj_mat)
#> [,1] [,2] [,3]
#> [1,] 2 1 4
#> [2,] 3 1 100
#> [3,] 4 1 100
#> [4,] 5 1 1
#> [5,] 1 2 100
#> [6,] 3 2 3
#> [7,] 4 2 1
#> [8,] 5 2 100
#> [9,] 1 3 2
#> [10,] 2 3 100
#> [11,] 4 3 5
#> [12,] 5 3 4
#> [13,] 1 4 1
#> [14,] 2 4 100
#> [15,] 3 4 2
#> [16,] 5 4 100
#> [17,] 1 5 100
#> [18,] 2 5 3
#> [19,] 3 5 4
#> [20,] 4 5 100
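For reference, steps 1 through 4 can also be collapsed with which(..., arr.ind = TRUE), which returns the (row, column) indices of the non-NA cells directly; indexing the matrix with that index matrix then pulls out the values. Same data, same result up to column names:

```r
adj_mat <- matrix(c(NA, 100, 2, 1, 100,
                    4, NA, 100, 100, 3,
                    100, 3, NA, 2, 4,
                    100, 1, 5, NA, 100,
                    1, 100, 4, 100, NA),
                  nrow = 5, byrow = TRUE)
idx <- which(!is.na(adj_mat), arr.ind = TRUE)  # 2-column (row, col) index matrix
el  <- cbind(idx, values = adj_mat[idx])       # matrix indexing extracts the values
head(el)
```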
