plotting graphs when rownames and column names are not identical - r

I tried everything and could not find any meaningful answers, so I decided to post this here. I have an adjacency matrix x4 (shown below, after the error message),
and I am trying to plot it as a simple graph:
library(graph)
g = as(x4, "graphNEL")
plot(g, "neato")
I got this error message:
Error in asMethod(object) : 'rownames(from)' and 'colnames(from)' must be identical
              Abdominal pain  Chest pain  Flu-like  Liver Damage  Nausea  Numbness  Swelling
Avandaia                   1           0         0             1       1         1         1
Warfrin                    0           1         1             0       1         1         1
Flu-like                   0           0         0             0       0         0         0
Liver Damage               0           0         0             0       0         0         0
Nausea                     0           0         0             0       0         0         0
Numbness                   0           0         0             0       0         0         0
Swelling                   0           0         0             0       0         0         0
Any advice would be helpful. Thanks.

jlhoward, when I do rownames(x4) <- colnames(x4), I make my row names and column names the same. I am interested in a graph where the row names and column names are not equal.
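One possible workaround, sketched here in base R, is to pad the rectangular drug-by-symptom matrix into a square matrix whose rows and columns both list every node. The names inc, nodes and sq are made up for this sketch, and only the two drug rows from the question are used. The final conversion is left commented out because it assumes the graph and Rgraphviz packages are installed.

```r
# Drug-by-symptom incidence matrix (the two drug rows from the question).
symptoms <- c("Abdominal pain", "Chest pain", "Flu-like", "Liver Damage",
              "Nausea", "Numbness", "Swelling")
inc <- matrix(c(1, 0, 0, 1, 1, 1, 1,
                0, 1, 1, 0, 1, 1, 1),
              nrow = 2, byrow = TRUE,
              dimnames = list(c("Avandaia", "Warfrin"), symptoms))

# Pad to a square matrix in which every drug and every symptom is a node.
nodes <- c(rownames(inc), colnames(inc))
sq <- matrix(0, nrow = length(nodes), ncol = length(nodes),
             dimnames = list(nodes, nodes))
sq[rownames(inc), colnames(inc)] <- inc

# Assumption: with identical row and column names, the conversion from the
# question should no longer raise the error.
# g <- as(sq, "graphNEL")
# plot(g, "neato")
```

The drugs and symptoms stay distinct nodes, so the drug-to-symptom structure is preserved while the row and column names become identical.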

Related

Count number of unique instances in a column depending on values in other columns

I've got the following table (which is called train) (in reality much bigger)
UNSPSC  adaptor  alert  bact  blood  collection  packet  patient  ultrasoft  whit
514415        0      0     0      0           0       0        0          1     0
514415        0      0     0      1           0       0        0          1     0
514415        0      0     1      0           0       0        0          1     0
514415        0      0     0      0           0       0        0          1     0
514415        0      0     0      0           0       0        0          1     0
514415        0      0     0      0           0       0        0          1     0
422018        0      0     0      0           0       0        0          1     0
422018        0      0     0      0           0       0        0          1     0
422018        0      0     0      1           0       0        0          1     0
411011        0      0     0      0           0       0        0          1     0
I want to calculate the number of unique UNSPSC per column where the value is equal to 1. So for column blood it will be 2 and for column ultrasoft will be 3.
I'm doing this but don't know how to continue:
apply(train[,-1], 2, ......)
I'm trying not to use loops.
To continue from where you left, we can use apply with margin=2 and calculate the length of unique values of "UNSPSC" for each column.
apply(train[-1], 2, function(x) length(unique(train$UNSPSC[x==1])))
#   adaptor      alert       bact      blood collection     packet
#         0          0          1          2          0          0
#   patient  ultrasoft       whit
#         0          3          0
A better option is sapply (or lapply), which gives the same result but, unlike apply, does not convert the data frame into a matrix.
sapply(train[-1], function(x) length(unique(train$UNSPSC[x==1])))
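If you have columns of only 0 and 1, like in the example, you can use colSums to count how many rows have a 1 in each column. Note that this is the number of rows, not the number of unique UNSPSC values (ultrasoft gives 10 here, not 3):
colSums(train[,-1]) # drop ID columns such as UNSPSC before summing
#   adaptor      alert       bact      blood collection     packet    patient
#         0          0          1          2          0          0          0
# ultrasoft       whit
#        10          0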
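To compare the two approaches on the sample table, here is a self-contained sketch that rebuilds train from the question. The names uniq and rows are made up for this example.

```r
# Rebuild the sample table from the question.
train <- data.frame(
  UNSPSC     = c(rep(514415, 6), rep(422018, 3), 411011),
  adaptor    = 0,
  alert      = 0,
  bact       = c(0, 0, 1, 0, 0, 0, 0, 0, 0, 0),
  blood      = c(0, 1, 0, 0, 0, 0, 0, 0, 1, 0),
  collection = 0,
  packet     = 0,
  patient    = 0,
  ultrasoft  = 1,
  whit       = 0
)

# Number of distinct UNSPSC codes among the rows where each column equals 1.
uniq <- sapply(train[-1], function(x) length(unique(train$UNSPSC[x == 1])))

# Number of rows where each column equals 1 (what colSums computes).
rows <- colSums(train[-1])

print(rbind(unique_UNSPSC = uniq, rows_with_1 = rows))
```

The two only agree when each UNSPSC code has at most one row with a 1 in that column.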

Compute percentage weights on rows when one column is not numeric

I have this data called out:
Dates Consumer Staples Energy Financials Health Care
1 12/31/99 0 0 0 0 0
2 03/31/00 0 0 0 0 0
3 06/30/00 0 0 0 0 0
4 09/30/00 0 0 0 0 0
5 12/31/00 0 0 0 0 0
6 03/31/01 1000 0 0 50 0
7 06/30/01 0 0 0 0 0
I would like to compute the weights for each category on each row,
but I need to avoid summing the first column, which is a date:
Weights <- round(out[2:6]/rowSums(out[2:6])*100, 2)
1/ Is there a way to keep the dates in the first column and compute
the weights of the next 5 columns in the same data set?
2/ When a date has only 0 data, how can I avoid the NAs?
Thank you for your help
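# Move the dates into the row names so that only numeric columns remain.
outN <- out[,-1]
rownames(outN) <- out[,1]
Cap_Weights <- round(outN/rowSums(outN)*100, 2)
# Rows that sum to 0 give 0/0 = NaN; set those weights to 0.
Cap_Weights[is.na(Cap_Weights)] <- 0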
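If the dates should stay as a regular first column rather than as row names, a variant along these lines might work. It is only a sketch: the data frame below is a cut-down stand-in for out with two category columns and placeholder values, and num, totals, weights and result are names introduced here.

```r
# Cut-down stand-in for `out` (placeholder values, two categories only).
out <- data.frame(
  Dates      = c("12/31/00", "03/31/01", "06/30/01"),
  Energy     = c(0, 1000, 0),
  Financials = c(0, 50, 0)
)

# Pick the numeric columns so the date column is never summed.
num <- sapply(out, is.numeric)
totals <- rowSums(out[num])

# Percentage weights per row; rows with a zero total would give NaN.
weights <- round(out[num] / totals * 100, 2)
weights[totals == 0, ] <- 0

# Put the dates back in front of the weights.
result <- cbind(out[!num], weights)
print(result)
```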

Random subsampling in R

I am new to R, so my question might be really simple.
I have 40 sites with abundances of zooplankton.
My data looks like this (columns are species abundances and rows are sites):
0 0 0 0 0 2 0 0 0
0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 85 0
0 0 0 0 0 45 5 57 0
0 0 0 0 0 13 0 3 0
0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 7 0
0 3 0 0 12 8 0 57 0
0 0 0 0 0 0 0 1 0
0 0 0 0 0 59 0 0 0
0 0 0 0 4 0 0 0 0
0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0
0 105 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0
0 0 0 0 1 0 0 100 0
0 35 0 55 0 0 0 0 0
1 4 0 0 0 0 0 0 0
0 0 0 0 0 34 21 0 0
0 0 0 0 0 9 17 0 0
0 54 0 0 0 27 5 0 0
0 1 0 0 0 1 0 0 0
0 17 0 0 0 54 3 0 0
What I would like to do is take a random sub-sample (e.g. 50 individuals) from each site without replacement several times (bootstrap), in order to calculate diversity indices on the new standardized abundances afterwards.
Try something like this (note that it draws 50 whole rows, i.e. sites, so mydata needs at least 50 rows):
mysample <- mydata[sample(1:nrow(mydata), 50, replace=FALSE),]
What the OP is probably looking for here is a way to bootstrap the data for a Hill or Simpson diversity index, which implies some assumptions about the data being sampled:
Each row is a site, each column is a species, and each value is a count.
Individuals are being sampled for the bootstrap, NOT THE COUNTS.
To do this, bootstrapping programs will often model the counts as a string of individuals. For instance, if we had a record like so:
a b c
2 3 4
The record would be modeled as:
aabbbcccc
Then, a sample is usually drawn WITH replacement from the string to create a larger set based on the model set.
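As a small illustration of this "string of individuals" model, here is a sketch in base R. The names counts, individuals, boot and boot_counts are made up for this example.

```r
# Counts per species for one site (the example record above).
counts <- c(a = 2, b = 3, c = 4)

# Expand the counts into one entry per individual: a a b b b c c c c
individuals <- rep(names(counts), times = counts)

# Draw a bootstrap sample of individuals WITH replacement.
set.seed(42)  # only to make the sketch reproducible
boot <- sample(individuals, size = 50, replace = TRUE)

# Tabulate the draws back into per-species counts.
boot_counts <- table(factor(boot, levels = names(counts)))
print(boot_counts)
```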
Bootstrapping a site: In R, we have a way to do this that is actually quite simple with the 'sample' function. If you select from the column numbers, you can provide probabilities using the count data.
# Test data.
data <- data.frame(a=2, b=3, c=4)
# Sampling from first row of data.
row <- 1
N_samples <- 50
# unlist() turns the one-row data frame into a plain numeric vector of weights.
samples <- sample(1:ncol(data), N_samples, replace=TRUE, prob=unlist(data[row,]))
Converting the sample into the format of the original table: Now we have an array of samples, with each item indicating the column number that the sample belongs to. We can convert back to the original table format in multiple ways, but here is a fairly simple one using a simple counting loop:
# Count the number of each entry and store in a list.
site_sample <- list()  # must exist before it is filled in the loop
for (i in 1:ncol(data)){
  site_sample[[i]] <- sum(samples==i)
}
# Unlist the data to get an array that represents the bootstrap row.
site_sample <- unlist(site_sample)
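The same idea can be wrapped in a function and applied to every site at once. The following is only a sketch: bootstrap_site is a hypothetical helper, tabulate replaces the counting loop, and returning all zeros for an empty site is an assumption made here, since sample cannot draw when every probability is zero.

```r
# Hypothetical helper: one bootstrap draw (with replacement) for one site.
bootstrap_site <- function(counts, n_samples) {
  # Assumption: an empty site yields an all-zero row instead of an error.
  if (sum(counts) == 0) return(rep(0L, length(counts)))
  draws <- sample(seq_along(counts), n_samples, replace = TRUE, prob = counts)
  # Count how often each column index (species) was drawn.
  tabulate(draws, nbins = length(counts))
}

set.seed(1)  # only to make the sketch reproducible
mydata <- rbind(c(2, 3, 4), c(0, 0, 0), c(0, 105, 0))  # mock sites
boot <- t(apply(mydata, 1, bootstrap_site, n_samples = 50))
print(boot)
```

Wrapping the call in replicate() would give several bootstrap tables, one per repetition.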
Just stumbled upon this thread, and the vegan package has a function called 'rrarefy' that does precisely what you're looking to do (and in the same ecological context, too).
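A minimal usage sketch, assuming the vegan package is installed (the call is skipped otherwise). The matrix comm is mock data made up for this example, with at least 50 individuals at each site.

```r
# Mock community matrix: rows are sites, columns are species counts.
comm <- matrix(c(15, 20, 20,
                 40, 30, 10),
               nrow = 2, byrow = TRUE,
               dimnames = list(c("site1", "site2"), c("sp1", "sp2", "sp3")))

sub <- NULL
if (requireNamespace("vegan", quietly = TRUE)) {
  # Randomly subsample each site down to 50 individuals, without replacement.
  sub <- vegan::rrarefy(comm, sample = 50)
  print(sub)
}
```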
This should work. It's a little more complicated than it looks at first, since each cell contains counts of a species. The solution uses the apply function to send each row of the data to the user-defined sample_species function. That function draws n distinct random numbers between 1 and the site's total count and sorts them. If there are 15 of species 1, 20 of species 2, and 20 of species 3, the random numbers between 1 and 15 signify species 1, those between 16 and 35 signify species 2, and those between 36 and 55 signify species 3.
## Initially takes in a row of the data and the number of samples to take
sample_species <- function(counts, n) {
  num_species <- length(counts)
  total_count <- sum(counts)
  samples <- sample(1:total_count, n, replace=FALSE)
  samples <- samples[order(samples)]
  result <- array(0, num_species)
  total <- 0
  for (i in 1:num_species) {
    result[i] <- length(which(samples > total & samples <= total+counts[i]))
    total <- total+counts[i]
  }
  return(result)
}
A <- matrix(sample(0:100, 10*40, replace=TRUE), ncol=10) ## mock data
B <- t(apply(A, 1, sample_species, 50)) ## results
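The question's data contains sites with fewer than 50 individuals (including all-zero rows), and sampling without replacement fails for those. A possible variant is sketched below. sample_species_safe is a hypothetical name, and returning NA for sites that are too small is an assumption made here, not part of the answer above.

```r
# Hypothetical variant: subsample n individuals without replacement,
# returning NA for sites that hold fewer than n individuals.
sample_species_safe <- function(counts, n) {
  total <- sum(counts)
  if (total < n) return(rep(NA_integer_, length(counts)))
  # One entry per individual, labelled by its species (column) index.
  individuals <- rep(seq_along(counts), times = counts)
  picked <- individuals[sample.int(total, n)]
  tabulate(picked, nbins = length(counts))
}

set.seed(1)  # only to make the sketch reproducible
A <- rbind(c(15, 20, 20), c(0, 0, 0), c(0, 105, 0))  # mock sites
B <- t(apply(A, 1, sample_species_safe, n = 50))
print(B)
```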

Vertex names by creating a network object via an edgelist (R package: network)

I want to create a network object representing a directed network on the basis of an edgelist. The first column contains the unique IDs of project leaders, the second those of project partners, let's say:
library("network")
x <- cbind(rbind(1,1,2,2,3), rbind(3,7,10,9,6))
y.nw <- network(x, matrix="edgelist", directed=TRUE, loops=FALSE)
Now my problem is: I need all vertices to have the right ID, since after creating the network object I have to convert it back to an adjacency matrix with the right corresponding firm IDs. However, I am not sure in which order I should assign them, since I sorted the data frame by column 1 (project leaders), who, however, do not always show up as project partners as well.
If your ids are sequential integers as in your example, you can produce the adjacency matrix corresponding to the edgelist in your example with:
> as.sociomatrix(y.nw)
   1 2 3 4 5 6 7 8 9 10
1  0 0 1 0 0 0 1 0 0  0
2  0 0 0 0 0 0 0 0 1  1
3  0 0 0 0 0 1 0 0 0  0
4  0 0 0 0 0 0 0 0 0  0
5  0 0 0 0 0 0 0 0 0  0
6  0 0 0 0 0 0 0 0 0  0
7  0 0 0 0 0 0 0 0 0  0
8  0 0 0 0 0 0 0 0 0  0
9  0 0 0 0 0 0 0 0 0  0
10 0 0 0 0 0 0 0 0 0  0
But maybe you have a different type of id system in your real input?
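If the IDs are not sequential, one way to keep them attached to the right rows and columns is to build the adjacency matrix directly from the edgelist in base R, without the network package. This is only a sketch: ids and adj are names introduced here, and only IDs that occur in the edgelist become vertices.

```r
# Edgelist from the question: column 1 = project leader, column 2 = partner.
x <- cbind(c(1, 1, 2, 2, 3), c(3, 7, 10, 9, 6))

# Every ID that occurs in either column becomes a vertex.
ids <- sort(unique(as.vector(x)))

# Empty adjacency matrix labelled by the IDs themselves.
adj <- matrix(0, nrow = length(ids), ncol = length(ids),
              dimnames = list(ids, ids))

# Look up each edge's row/column position by ID and set it to 1.
adj[cbind(match(x[, 1], ids), match(x[, 2], ids))] <- 1
print(adj)
```

Because the dimnames carry the IDs, the order in which the edgelist was sorted no longer matters.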

function to assign value in a matrix (j-programming)

I have two vectors (say, X and Y) which correspond to row and column numbers. I want to write a function (a verb, in J) that takes these and assigns 1 at those positions in an n x n zero matrix. Here is a simple case.
I have these vectors:
X=:1 2 1 5
Y=:0 3 3 9
and a zero matrix:
mat=: 10 10$0
and I wrote the following function (I used boxing):
1(|:(,./<"0(|:(X,:Y)))) } 10 10$0
but the problem is that it takes these vectors and assigns 1 to every column. So if I take (1,0), it assigns 1 to rows number 1 and 0 in all the columns (like (1,:) in Matlab). How can I overcome this problem?
I understand you to want to amend a boolean noun to put 1 at designated coordinates. You start with the coordinate pairs as separate lists. I recommend stitching those lists together like this:
Y,.X
0 1
3 2
3 1
9 5
Y comes before X because in J axes are naturally arranged in decreasing sequence (that is, most fine-grained to the right.) To use these as coordinate pairs with Amend, they'll need to be boxed:
<"1 Y,.X
+---+---+---+---+
|0 1|3 2|3 1|9 5|
+---+---+---+---+
Those will work with Amend to set 1 at those particular coordinates, so:
1 (<"1 Y,.X)} 10 10$0
0 1 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
0 1 1 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 1 0 0 0 0
If I've understood your question, this is the matrix you were looking to produce.
