Number non-zero elements by row in R

I wish to number the non-zero elements in a matrix by row. Here is a small data set and the desired result. I would prefer a solution in base R.
my.data <- matrix(c(10, 0, 0, 0, 0,
0, 3, 9, 0, 1,
2, 12, 0, 0, 0,
5, 5, 5, 0, 5,
0, 0, 0, 0, 0), nrow = 5, byrow = TRUE)
desired.result <- matrix(c( 1, 0, 0, 0, 0,
0, 1, 2, 0, 3,
1, 2, 0, 0, 0,
1, 2, 3, 0, 4,
0, 0, 0, 0, 0), nrow = 5, byrow = TRUE)

A couple of other options:
# create new matrix with multiplication
t(apply(my.data != 0, 1, cumsum)) * (my.data != 0)
# alternative:
# replace elements in original matrix
my.data[my.data != 0] = t(apply(my.data != 0, 1, cumsum))[my.data != 0]
my.data
# [,1] [,2] [,3] [,4] [,5]
# [1,] 1 0 0 0 0
# [2,] 0 1 2 0 3
# [3,] 1 2 0 0 0
# [4,] 1 2 3 0 4
# [5,] 0 0 0 0 0
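To see why the cumsum approach works, it may help to look at the intermediate steps on a smaller matrix. This is only an illustrative sketch; the names m, nz, running and numbered are made up for the example.

```r
# Small illustrative matrix (not the data from the question)
m <- matrix(c(10, 0, 0,
               0, 3, 9,
               5, 0, 5), nrow = 3, byrow = TRUE)

nz <- m != 0                        # logical mask of the non-zero cells
running <- t(apply(nz, 1, cumsum))  # running count of non-zeros along each row;
                                    # apply() returns the rows as columns, hence t()
numbered <- running * nz            # multiplying by the mask resets zero cells to 0
numbered
```

The running count keeps its last value across zero cells, which is why the final multiplication by the mask is needed.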

Here's a relatively naive base R method:
t(apply(my.data, 1, function(x) {
x[x != 0] <- seq_len(sum(x != 0))
x
}))
[,1] [,2] [,3] [,4] [,5]
[1,] 1 0 0 0 0
[2,] 0 1 2 0 3
[3,] 1 2 0 0 0
[4,] 1 2 3 0 4
[5,] 0 0 0 0 0

Related

How do I convert multiple columns into one from Binary data?

My data has 3 surveys per year (for 10 years), where 1 represents presence and 0 represents absence. The subset looks like this
x <- structure(c(0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1,
0, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1, 1,
0, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1, 1,
0, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1, 1),
.Dim = c(4L, 3L, 4L))
I want to collapse these three columns into one, so that every row that has a 1 in any survey shows 1 in the result and otherwise shows 0.
Collapse the second dimension of the array with apply:
apply(x, c(1L, 3L), function(y) as.integer(any(as.logical(y))))
## [,1] [,2] [,3] [,4]
## [1,] 0 0 0 0
## [2,] 1 1 1 1
## [3,] 0 1 1 1
## [4,] 1 1 1 1
The result is a [site, year] matrix.
We could use max
apply(x, c(1, 3), FUN = max)
[,1] [,2] [,3] [,4]
[1,] 0 0 0 0
[2,] 1 1 1 1
[3,] 0 1 1 1
[4,] 1 1 1 1
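If the array is large, a vectorised variant might be worth trying. The sketch below assumes the dimensions are [site, survey, year]. It moves the survey dimension to the end with aperm and sums over it with rowSums(dims = 2). The array a and the names by_any and by_sum are invented for the illustration.

```r
# 2 sites x 3 surveys x 2 years, filled in column-major order
a <- array(c(0, 1,  0, 1,  0, 0,    # year 1: surveys 1-3 for sites 1-2
             0, 0,  1, 0,  0, 0),   # year 2
           dim = c(2, 3, 2))

# apply-based version, as in the answers above
by_any <- apply(a, c(1, 3), function(y) as.integer(any(y != 0)))

# vectorised version: reorder to [site, year, survey], then sum over surveys
by_sum <- (rowSums(aperm(a, c(1, 3, 2)), dims = 2) > 0) + 0L

by_sum
```

Both give a [site, year] matrix with 1 wherever at least one survey recorded a presence.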

Replacing values in one matrix with values from another

I'm trying to compare two matrices. When the values aren't equal, I want to use the value from mat2 as long as it is greater than 0; if it is zero, then I want the value from mat1. As the code is currently, it appears to constantly return the value of mat1.
Here is my attempt:
mat.data1 <- c(1, 0, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 0, 1, 0, 1, 1, 0, 0, 1)
mat1 <- matrix(data = mat.data1, nrow = 5, ncol = 5, byrow = TRUE)
mat.data2 <- c(0, 0, 0, 0, 0, 0, 1, 2, 0, 0, 0, 1, 2, 2, 0, 0, 0, 1, 2, 2, 0, 2, 1, 0, 1)
mat2 <- matrix(data = mat.data2, nrow = 5, ncol = 5, byrow = TRUE)
mat3 = if(mat1 == mat2){mat1} else {if(mat2>0){mat2} else {mat1}}
the expected output should be
1 0 1 1 1
0 1 2 1 1
1 1 2 2 0
1 1 1 2 2
1 1 1 0 1
Here is one potential way to do it.
mat.data1 <- c(1, 0, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 0, 1, 0, 1, 1, 0, 0, 1)
mat1 <- matrix(data = mat.data1, nrow = 5, ncol = 5, byrow = TRUE)
mat.data2 <- c(0, 0, 0, 0, 0, 0, 1, 2, 0, 0, 0, 1, 2, 2, 0, 0, 0, 1, 2, 2, 0, 2, 1, 0, 1)
mat2 <- matrix(data = mat.data2, nrow = 5, ncol = 5, byrow = TRUE)
mat3 <- mat1
to_change <- which(mat2 != mat1 & mat2 > 0)
mat3[to_change] <- mat2[to_change]
Here, which returns the positions where mat2 differs from mat1 AND mat2 is greater than zero. You can then use those positions to copy the corresponding values from mat2 into mat3.
This output is then:
> mat3
[,1] [,2] [,3] [,4] [,5]
[1,] 1 0 1 1 1
[2,] 0 1 2 1 1
[3,] 1 1 2 2 0
[4,] 1 1 1 2 2
[5,] 1 2 1 0 1
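As for why the original attempt does not work: if expects a single TRUE or FALSE, not a whole matrix of them (recent versions of R raise an error when the condition has length greater than one). An element-wise alternative is ifelse. The sketch below uses two small made-up matrices, m1 and m2, rather than the data from the question.

```r
m1 <- matrix(c(1, 0,
               1, 0), nrow = 2, byrow = TRUE)
m2 <- matrix(c(0, 2,
               1, 0), nrow = 2, byrow = TRUE)

# take m2 wherever it is positive, otherwise fall back to m1;
# ifelse() works cell by cell and keeps the matrix shape of the test
m3 <- ifelse(m2 > 0, m2, m1)

# same result as the which()-based replacement
m3_which <- m1
idx <- which(m2 != m1 & m2 > 0)
m3_which[idx] <- m2[idx]

m3
```

The m2 != m1 test can be dropped here because, where the two matrices are equal, it makes no difference which one the value is taken from.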
We can use coalesce
library(dplyr)
out <- coalesce(replace(mat2, !mat2, NA), replace(mat1, !mat1, NA))
replace(out, is.na(out), 0)
Or as @Axeman mentioned
coalesce(out, 0)

Issues with plotting network in igraph

I am having some issues in realizing a bipartite network in R with the library igraph. Here is my script:
library(igraph)
library(reshape2)
setwd("....")
getwd()
library(readxl)
network=read_excel("network1.xlsx")
print(network)
subjects=as.character(unlist(network[,1]))
agents=colnames(network[-1])
print(network)
network = network[,-1]
g=graph.incidence(network, weighted = T)
V(g)$type
V(g)$name=c(subjects,agents)
V(g)$color = V(g)$type
V(g)$color=gsub("FALSE","red",V(g)$color)
V(g)$color=gsub("TRUE","lightblue",V(g)$color)
plot(g, edge.arrow.width = 0.3,
vertex.size = 5,
edge.arrow.size = 0.5,
vertex.size2 = 5,
vertex.label.cex = 1,
vertex.label.color="black",
asp = 0.35,
margin = 0,
edge.color="grey",
edge.width=(E(g)$weight),
layout=layout_as_bipartite)
The network is plotted properly; however, I have two issues:
(1) I don't understand the order in which the vertices are shown in the plot. They are not in the same order as in the Excel file, nor in alphabetical or numerical order. They seem to be in random order. How can I choose the order in which the vertices are placed?
(2) I don't understand why some vertices are closer together and some are farther apart. I would like all vertices to be the same distance apart. How can I do that?
Thank you a lot for your invaluable help.
Since you do not provide your data, I will illustrate with a made-up example.
Sample graph data
library(igraph)
set.seed(123)
EL = matrix(c(sample(8,18, replace=T),
sample(LETTERS[1:6], 18, replace=T)), ncol=2)
g = simplify(graph_from_edgelist(EL))
V(g)$type = bipartite_mapping(g)$type
VCol = c("#FF000066", "#0000FF66")[as.numeric(V(g)$type)+1]
plot(g, layout=layout_as_bipartite(g), vertex.color=VCol)
As with your graph, this has two problems. The nodes are ordered arbitrarily
and the lower row is oddly spaced. Let's address those problems one at a time.
To do so, we will need to take control of the layout instead of using any of
the automated layout functions. A layout is simply a vcount(g) * 2 matrix
giving the x-y coordinates of the vertices for plotting. Here, I will put one
type of nodes in the top row by specifying the y coordinate as 1 and the other
nodes in a lower row by specifying y=0. We want to specify the order horizontally
by rank (alphabetically) within each group. So
LO = matrix(0, nrow=vcount(g), ncol=2)
LO[!V(g)$type, 2] = 1
LO[V(g)$type, 1] = rank(V(g)$name[V(g)$type])
LO[!V(g)$type, 1] = rank(V(g)$name[!V(g)$type])
plot(g, layout=LO, vertex.color=VCol)
Now both rows are ordered and evenly spaced, but because there are fewer
vertices in the bottom row, there is an unattractive, unbalanced look. We
can fix that by rescaling the x coordinates of one row so that both rows span
the same width. I find it easier to get the right scale factor if the
coordinates go from 0 to (number of nodes) - 1 rather than
1 to (number of nodes) as above. Doing this, we get
LO[V(g)$type, 1] = rank(V(g)$name[V(g)$type]) - 1
LO[!V(g)$type, 1] = (rank(V(g)$name[!V(g)$type]) - 1) *
(sum(V(g)$type) - 1) / (sum(!V(g)$type) - 1)
plot(g, layout=LO, vertex.color=VCol)
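The same idea can be wrapped in a small helper that works for any logical type vector, which may make it easier to reuse on other data. This is only a sketch: two_row_layout is a hypothetical function, not part of igraph, and it assumes the vertex names are unique. It maps each row onto the interval [0, 1], so both rows always span the same width.

```r
# Hypothetical helper (not an igraph function): build a two-row layout matrix
# from a logical type vector and the vertex names.
two_row_layout <- function(type, name) {
  lo <- matrix(0, nrow = length(type), ncol = 2)
  lo[!type, 2] <- 1                      # FALSE-type vertices go in the top row
  for (grp in c(TRUE, FALSE)) {
    idx <- which(type == grp)
    n <- length(idx)
    # ranks 1..n are mapped onto [0, 1]; a lone vertex is centred
    lo[idx, 1] <- if (n > 1) (rank(name[idx]) - 1) / (n - 1) else 0.5
  }
  lo
}

# Toy input: three top-row vertices and two bottom-row vertices
type <- c(FALSE, FALSE, FALSE, TRUE, TRUE)
name <- c("3", "1", "2", "B", "A")
LO <- two_row_layout(type, name)
LO
```

With a graph g as above, this would be used as plot(g, layout = two_row_layout(V(g)$type, V(g)$name)). Note that names are ranked as character strings, so "10" sorts before "2"; convert numeric ids before ranking if that matters.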
Thank you very much. I ran your very helpful example, and with step one it works properly with my data, keeping the different edge thicknesses and everything else as in my plot, but with the proper order. This is very important, thank you. However, I have some trouble understanding how to properly rescale the top and bottom rows with my data, because they always seem to be too close together. I probably did not completely understand the coordinates I have to work with. Here are my data.
> network=read_excel("network1.xlsx",2)
> dput(network)
structure(list(`NA` = c(2333, 2439, 2450, 2451, 2452, 2453, 2454,
2455, 2456, 2457, 2458, 2459, 2460, 2461, 2480, 2490, 2491, 2492,
2493, 2494, 2495), A = c(12, 2, 2, 5, 2, 0, 5, 3, 0, 0, 7, 0,
0, 0, 6, 2, 10, 7, 1, 2, 5), B = c(0, 1, 0, 1, 0, 0, 2, 0, 0,
0, 0, 0, 1, 0, 5, 0, 2, 0, 0, 0, 0), C = c(0, 0, 0, 0, 1, 0,
4, 0, 0, 0, 0, 1, 0, 0, 2, 0, 4, 4, 2, 1, 0), D = c(2, 0, 0,
0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 0, 7, 0, 4, 0, 1, 4, 0), E = c(11,
2, 3, 3, 3, 8, 3, 6, 4, 1, 1, 0, 12, 0, 5, 0, 4, 6, 4, 8, 9),
F = c(2, 0, 0, 3, 1, 0, 10, 1, 0, 0, 0, 1, 0, 0, 9, 0, 0,
1, 1, 3, 3), G = c(0, 3, 1, 1, 0, 0, 0, 0, 0, 3, 2, 0, 0,
0, 1, 0, 0, 2, 0, 1, 0), H = c(0, 0, 2, 0, 0, 0, 1, 0, 0,
0, 0, 0, 0, 0, 2, 0, 0, 0, 0, 0, 1), I = c(0, 0, 0, 0, 0,
0, 3, 0, 6, 3, 0, 0, 1, 0, 7, 0, 0, 4, 1, 2, 0), J = c(0,
0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0,
0)), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA,
-21L), .Names = c(NA, "A", "B", "C", "D", "E", "F", "G", "H",
"I", "J"))
> print(network)
NA A B C D E F G H I J
1 2333 12 0 0 2 11 2 0 0 0 0
2 2439 2 1 0 0 2 0 3 0 0 0
3 2450 2 0 0 0 3 0 1 2 0 0
4 2451 5 1 0 0 3 3 1 0 0 0
5 2452 2 0 1 0 3 1 0 0 0 0
6 2453 0 0 0 0 8 0 0 0 0 1
7 2454 5 2 4 2 3 10 0 1 3 0
8 2455 3 0 0 0 6 1 0 0 0 0
9 2456 0 0 0 0 4 0 0 0 6 0
10 2457 0 0 0 0 1 0 3 0 3 0
11 2458 7 0 0 0 1 0 2 0 0 0
12 2459 0 0 1 0 0 1 0 0 0 0
13 2460 0 1 0 0 12 0 0 0 1 0
14 2461 0 0 0 0 0 0 0 0 0 0
15 2480 6 5 2 7 5 9 1 2 7 1
16 2490 2 0 0 0 0 0 0 0 0 0
17 2491 10 2 4 4 4 0 0 0 0 0
18 2492 7 0 4 0 6 1 2 0 4 0
19 2493 1 0 2 1 4 1 0 0 1 0
20 2494 2 0 1 4 8 3 1 0 2 0
21 2495 5 0 0 0 9 3 0 1 0 0

Get the average every 10 steps in a vector in R [duplicate]

I have a vector of values:
[1] 0 0 4 1 0 0 -1 1 1 0 -1 0 0 -2 0 0
[17] 1 2 0 2 0 1 1 1 0 1 -1 0 0 0 0 0
[33] 0 2 0 4 -2 0 0 -1 1 0 0 0 -1 -2 2 0
[49] -1 0 -1 0 3 0 0 -1 1 0 0 0 1 -3 0 -1
[65] 0 -1 0 1 1 0 1 -2 1 1 0 0 -1 -2 0 0
[81] 0 2 0 0 1 1 0 0 0 -1 -2 0 -1 -1 -1 -1
[97] 1 1 0 1
I would like to get the average every 10 steps (the average of the previous 10 numbers at that point), and thus produce a new vector with these averages. Since there are 100 values in the original vector this would give a new vector of length 10 (the 10 averages).
I know I can get access to the number at each 10th point using:
result <- my_vector[seq(1, length(my_vector), 10)]
But I need the average of the 10 previous points at that step, not just the number itself.
colMeans(matrix(x, 10))
[1] 0.4 0.7 0.8 0.2 0.0 0.4 -0.4 -0.4 -0.7 0.1
matrix(x, 10) reshapes the vector into a matrix with 10 rows, filled column by column, so each column holds one block of 10 consecutive values; colMeans then gives the mean of each block. We could have also used rowMeans, but since the matrix is populated column-wise by default we would have to add another argument byrow=TRUE and potentially hurt ourselves with all of the extra typing.
We can test our answer by explicitly finding the mean of a few of the subsetted vectors.
#Test
mean(x[1:10])
[1] 0.4
mean(x[11:20])
[1] 0.7
Data
x <- c(0, 1, 0, -1, 0, 0, 0, 2, 2, 0, -1, 2, 4, 0, 0, -1, 0, 0, 1,
2, 4, 0, 1, 0, 0, 0, -2, 3, 1, 1, 0, 1, 0, 0, 0, 1, -1, 1, 0,
0, 1, 0, 1, 1, -1, -1, -2, 0, 1, 0, 1, 1, 1, 0, 0, 1, 0, 0, 1,
-1, -1, -1, 0, 0, 0, -2, 0, 0, 0, 0, 0, 0, 0, 0, -1, 1, -1, -1,
-2, 0, -2, -3, -2, -1, 0, 0, 2, 0, 0, -1, 0, 0, 0, -1, 0, -1,
1, 1, 0, 1)
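The matrix trick relies on the vector length being an exact multiple of the block size. If that is not guaranteed, a grouping index with tapply is one possible alternative. A minimal sketch follows; the vector v and block size n are made up for the illustration.

```r
v <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12)
n <- 5                                # block size

grp <- ceiling(seq_along(v) / n)      # 1 1 1 1 1 2 2 2 2 2 3 3
chunk_means <- tapply(v, grp, mean)   # mean of each block, last block may be shorter
unname(chunk_means)
# [1]  3.0  8.0 11.5
```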

How to convert predicted values into binary variables and save them to a CSV

I have made a decision tree model on test data then used it to predict values in a test dataset.
dtpredict<-predict(ct1, testdat, type="class")
The output looks like:
1 2 3 4 5 6
Class_2 Class_2 Class_6 Class_2 Class_8 Class_2
I want to write a csv to look like:
id, Class_1, Class_2, Class_3, Class_4, Class_5, Class_6, Class_7, Class_8, Class_9
1, 0, 1, 0, 0, 0, 0, 0, 0, 0
2, 0, 1, 0, 0, 0, 0, 0, 0, 0
3, 0, 0, 0, 0, 0, 1, 0, 0, 0
4, 0, 1, 0, 0, 0, 0, 0, 0, 0
5, 0, 0, 0, 0, 0, 0, 0, 1, 0
6, 0, 1, 0, 0, 0, 0, 0, 0, 0
There's a package called dummies that does that well...
install.packages("dummies")
library(dummies)
x <- factor(c("Class_2", "Class_2", "Class_6", "Class_2", "Class_8", "Class_2"),
levels = paste("Class", 1:9, sep="_"))
dummy(x, drop = FALSE)
xClass_1 xClass_2 xClass_3 xClass_4 xClass_5 xClass_6 xClass_7 xClass_8 xClass_9
[1,] 0 1 0 0 0 0 0 0 0
[2,] 0 1 0 0 0 0 0 0 0
[3,] 0 0 0 0 0 1 0 0 0
[4,] 0 1 0 0 0 0 0 0 0
[5,] 0 0 0 0 0 0 0 1 0
[6,] 0 1 0 0 0 0 0 0 0
All that remains is to get rid of the "x" but this should not be too hard with something like this:
d <- dummy(x,drop = FALSE)
colnames(d) <- sub("x", "", colnames(d))
and then to save to disk:
write.csv(d, "somefile.csv", row.names = FALSE)
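If you would rather avoid an extra package, roughly the same table can be built in base R by comparing the factor against each of its levels. This is a sketch under the assumption that the predictions are a factor with all nine levels declared; pred, onehot, out and the temporary file are illustrative names only.

```r
pred <- factor(c("Class_2", "Class_2", "Class_6", "Class_2", "Class_8", "Class_2"),
               levels = paste0("Class_", 1:9))

# one 0/1 column per level; sapply() names the columns after the levels
onehot <- sapply(levels(pred), function(l) as.integer(pred == l))

out <- data.frame(id = seq_along(pred), onehot)

csv_file <- tempfile(fileext = ".csv")   # replace with a real path as needed
write.csv(out, csv_file, row.names = FALSE)
out
```

Declaring the levels up front is what keeps the columns for classes that never occur in the predictions.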
Uh, what are the 010101's - logicals? If so they don't make much sense in your example all are class 1 (doesn't correspond to your example dtpredict). If they are logicals....
# if dtpredict is a factor vector, where the values are the classes
# and the names are the boolean values:
values = as.numeric(as.character(names(dtpredict)))
classes = as.character(dtpredict)
# use the names of dtpredict itself as the ids
x = data.frame(id=names(dtpredict))
# add one 0/1 column per class that occurs in the predictions
for(class in sort(unique(classes))){
  x[ , class] = as.numeric(sapply(classes, FUN=function(p) p==class))
}
write.csv(x, 'blah.csv')

Resources