I have created a transition matrix as a 'from cluster' (rows) 'to cluster' (columns) frequency. Think Markov chain.
Assume I have 5 'from' clusters but only 3 'to' clusters, so I get a 5*3 transition matrix. How do I force it to be a 5*5 transition matrix? Effectively, how do I show the all-zero columns?
I'm after an elegant solution, as this will be applied to a much larger problem involving hundreds of clusters. I am quite unfamiliar with R matrices, and I don't know of an elegant way to force the number of columns to equal the number of rows and fill in zeros where there is no match, other than a for loop, which my hunch says is not the best solution.
Example code:
# example data
cluster_before <- c(1,2,3,4,5)
cluster_after <- c(1,2,4,4,1)
# Table output
table(cluster_before,cluster_after)
# ncol does not = nrows. I want to rectify that
# I want output to look like this:
what_I_want <- matrix(
c(1,0,0,0,0,
0,1,0,0,0,
0,0,0,1,0,
0,0,0,1,0,
1,0,0,0,0),
byrow=TRUE,ncol=5
)
# Possible solution. But for loop can't be best solution?
empty_mat <- matrix(0,ncol=5,nrow=5)
matrix_to_update <- empty_mat
for (i in 1:length(cluster_before)) {
val_before <- cluster_before[i]
val_after <- cluster_after[i]
matrix_to_update[val_before,val_after] <- matrix_to_update[val_before,val_after]+1
}
matrix_to_update
# What's the more elegant solution?
Thanks in advance for your help. It's much appreciated.
Make them factors and then table:
levs <- union(cluster_before, cluster_after)
table(factor(cluster_before,levs), factor(cluster_after,levs))
# 1 2 3 4 5
# 1 1 0 0 0 0
# 2 0 1 0 0 0
# 3 0 0 0 1 0
# 4 0 0 0 1 0
# 5 1 0 0 0 0
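A minimal sketch of the same idea when the cluster labels do not arrive in sorted order. The sample vectors and the names `tab` and `trans` are made up for illustration; `union()` keeps labels in order of first appearance, so sorting the levels first keeps the rows and columns aligned.

```r
# Hypothetical data: labels appear out of order, clusters 3 and 5 never occur as targets
cluster_before <- c(5, 3, 1, 2, 4)
cluster_after  <- c(1, 4, 1, 2, 4)

# Shared, sorted set of levels for both factors
levs <- sort(union(cluster_before, cluster_after))

tab <- table(from = factor(cluster_before, levels = levs),
             to   = factor(cluster_after,  levels = levs))

# Drop the "table" class to get a plain integer matrix with dimnames
trans <- unclass(tab)
trans
```

Because both factors share the same levels, the result is square regardless of which clusters actually occur on either side.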
Another solution is to use matrix indices:
what_I_want <- matrix(0,ncol=5,nrow=5)
what_I_want[cbind(cluster_before,cluster_after)] <- 1
print(what_I_want)
## [,1] [,2] [,3] [,4] [,5]
##[1,] 1 0 0 0 0
##[2,] 0 1 0 0 0
##[3,] 0 0 0 1 0
##[4,] 0 0 0 1 0
##[5,] 1 0 0 0 0
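The second line sets the elements at the positions given by the row (cluster_before) and column (cluster_after) indices to 1. Note that it assigns 1 rather than counting, so a (before, after) pair that occurs several times is still recorded as 1; use the table approach if you need frequencies.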
Hope this helps.
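If actual frequencies are needed with matrix-style indexing, one possible sketch is to turn each (row, column) pair into a column-major linear index and count those with `tabulate()`. The sample data (with a repeated 3 -> 4 transition) and the names `n`, `idx`, `counts` and `probs` are assumptions for illustration, and the labels are assumed to be the integers 1..n.

```r
# Hypothetical data: the transition 3 -> 4 occurs twice
cluster_before <- c(1, 2, 3, 4, 5, 3)
cluster_after  <- c(1, 2, 4, 4, 1, 4)

n <- max(cluster_before, cluster_after)

# Column-major linear index of cell [before, after] in an n x n matrix
idx <- (cluster_after - 1) * n + cluster_before

# Count how often each cell occurs, then reshape into the square matrix
counts <- matrix(tabulate(idx, nbins = n * n), nrow = n, ncol = n)

# Row-normalise to get transition probabilities
# (a row with no observations would give NaN here)
probs <- prop.table(counts, 1)
counts
```

This avoids the explicit loop while still tallying repeated pairs.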
d <- data.frame(B1 = c(1,2,3,4),B2 = c(0,1,2,3))
d$total=rowSums(d)
B1 B2 total
1 0 1
2 1 3
3 2 5
4 3 7
Using the dataframe above, I want to create a new dataframe with the following logic:
Going by rows, if cells (B1:B2) matches d$total, return 1, else 0.
Ideally output to look like:
B1n B2n
1 0
0 0
0 0
0 0
What is the best way to do this in R?
Thank you.
You can compare the first two columns with the total value.
res <- +(d[1:2] == d$total)
res
# B1 B2
#[1,] 1 0
#[2,] 0 0
#[3,] 0 0
#[4,] 0 0
The comparison yields a logical matrix and the unary + converts it to integers, so the result is a matrix. If you want a dataframe as output, you can do res <- data.frame(res).
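A step-by-step sketch of the comparison above, with the output renamed to the B1n/B2n columns asked for in the question. The names `cols`, `flags` and `res` are chosen here for illustration only.

```r
d <- data.frame(B1 = c(1, 2, 3, 4), B2 = c(0, 1, 2, 3))
d$total <- rowSums(d)

cols <- c("B1", "B2")

# The vector d$total is recycled down each column, giving a logical matrix
flags <- d[cols] == d$total

# Unary + turns TRUE/FALSE into 1/0; then convert to a data frame and rename
res <- as.data.frame(+flags)
names(res) <- paste0(cols, "n")
res
```

Exact equality is fine for whole numbers like these; with fractional values a tolerance-based comparison may be safer.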
Here is an alternative way to solve this problem. You can use dplyr::transmute, which works like dplyr::mutate but keeps only the newly created columns, so you get just the two indicator columns. The arguments to transmute are simply the conditions.
library(dplyr)
newdf <- d %>% transmute(B1n=ifelse(B1+B2==B1,1,0),B2n=ifelse(B1+B2==B2,1,0))
> newdf
B1n B2n
1 1 0
2 0 0
3 0 0
4 0 0
I am attempting to loop a command based upon a list (fish_list). While I've found plenty of examples, I haven't found one that also includes changing the column name as part of the loop. I have figured out how to get the desired result for an individual species (the last four lines of the code below), but in the actual dataset I have ~500 species, and I'd prefer not to repeat this command 500+ times. Is there a way to substitute the values from a list where it says variable?
fishdata$variable <- ifelse(fishdata$Species == "variable", fishdata$Number, 0)
I know how to do this in ArcGIS, but I am trying to expand my horizons and learn R. This is also my first post, so please excuse any screw ups.
Thank you for any help you can provide.
fishdata <- data.frame(
  Site = c(1,1,1,2,2,2),
  Species = c("one_fish", "two_fish", "two_fish", "red_fish", "blue_fish", "blue_fish"),
  Number = c(1,1,1,1,1,1)
)
fishdata$one_fish <-0
fishdata$two_fish <-0
fishdata$red_fish <-0
fishdata$blue_fish <-0
fish_list <- c("one_fish","two_fish", "red_fish", "blue_fish")
fishdata$one_fish <- ifelse(fishdata$Species=="one_fish",fishdata$Number,0)
fishdata$two_fish <- ifelse(fishdata$Species=="two_fish",fishdata$Number,0)
fishdata$red_fish <- ifelse(fishdata$Species=="red_fish",fishdata$Number,0)
fishdata$blue_fish <- ifelse(fishdata$Species=="blue_fish",fishdata$Number,0)
You can use sapply to iterate over the unique species, so that each species gets exactly one column:
sapply(unique(fishdata$Species), function(i) ifelse(fishdata$Species == i, fishdata$Number, 0))
#      one_fish two_fish red_fish blue_fish
# [1,]        1        0        0         0
# [2,]        0        1        0         0
# [3,]        0        1        0         0
# [4,]        0        0        1         0
# [5,]        0        0        0         1
# [6,]        0        0        0         1
$ is just shorthand for the [[ ]] operator with a literal name:
a$x
a[["x"]]
So you can loop over the species names and use each one as the column name:
for (species in fish_list) {
  fishdata[[species]] <- ifelse(fishdata$Species == species, fishdata$Number, 0)
}
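For completeness, a loop-free sketch of the same idea, assuming `fishdata` is a data frame. The Number values are varied here purely so the result is easier to read, and `fish_list` is derived from the data rather than typed by hand; both are choices made for this example.

```r
# Hypothetical data frame; Number differs per row to make the output easier to follow
fishdata <- data.frame(
  Site    = c(1, 1, 1, 2, 2, 2),
  Species = c("one_fish", "two_fish", "two_fish", "red_fish", "blue_fish", "blue_fish"),
  Number  = c(1, 2, 3, 4, 5, 6),
  stringsAsFactors = FALSE
)

fish_list <- unique(fishdata$Species)

# Build one column per species and assign them all in a single step
fishdata[fish_list] <- lapply(fish_list, function(s) {
  ifelse(fishdata$Species == s, fishdata$Number, 0)
})
fishdata
```

Assigning a list to several column names at once creates one new column per name, which scales to hundreds of species without repeating the command.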
I'm working on code to construct an option pricing matrix. What I have at the moment is the values along the diagonal part of the matrix. Currently I'm working in a matrix with 4 rows and 4 columns. What I'm attempting to do is to use the values in the diagonal part of the matrix to give values in the lower triangle of the matrix. So for my matrix Omat, Omat[1,1]+Omat[2,2] will give a value for [2,1], Omat[2,2]+Omat[3,3] will give a value for [3,2]. Then using these created values, Omat[2,1]+Omat[3,2] will give a value for [3,1].
My attempt:
Omat = diag(2, 4, 4)
Omat[j+i,j] <- Omat[i-1,j]+Omat[i,j+1]
Any ideas on how one could go about this?
What I currently have, a 4 row by 4 col matrix:
Omat
# 2 0 0 0
# 0 2 0 0
# 0 0 2 0
# 0 0 0 2
What I've been attempting to create, a 4 row by 4 col matrix:
0 0 0 0
4 0 0 0
8 4 0 0
16 8 4 0
You could try calculating successive diagonals underneath the main diagonal. Code could look like:
Omat = diag(2,4)
for(i in 1:(nrow(Omat)-1)) {
for( j in (i+1):nrow(Omat)) {
Omat[j,j-i] <- Omat[j,j-i+1] + Omat[j-1,j-i]
}
}
diag(Omat) <- 0
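I am probably missing something, but why not do this (note that j runs downwards within each row, so that Omat[i,j+1] has already been filled in when it is used):
for (i in 2:nrow(Omat)) {
  for (j in (i-1):1) {
    Omat[i,j] <- Omat[i-1,j] + Omat[i,j+1]
  }
}
diag(Omat) <- 0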
,David.
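A small sketch that wraps the diagonal-by-diagonal fill in a function and checks it against the matrix requested in the question. The function name `fill_lower` and the variable names are hypothetical; the recurrence itself is the one described in the question (each cell is the sum of the cell above it and the cell to its right).

```r
# Fill the lower triangle one sub-diagonal at a time, starting next to the main diagonal
fill_lower <- function(m) {
  n <- nrow(m)
  for (k in seq_len(n - 1)) {      # k = distance below the main diagonal
    for (i in (k + 1):n) {
      j <- i - k
      # both neighbours lie on the previous (k - 1) diagonal, so they are already set
      m[i, j] <- m[i - 1, j] + m[i, j + 1]
    }
  }
  diag(m) <- 0
  m
}

result <- fill_lower(diag(2, 4))

expected <- matrix(c( 0, 0, 0, 0,
                      4, 0, 0, 0,
                      8, 4, 0, 0,
                     16, 8, 4, 0), byrow = TRUE, ncol = 4)
result
```

Working outward from the main diagonal guarantees that every value on the right-hand side has been computed before it is read.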
I would like to create a matrix of indicator variables. My initial thought was to use model.matrix, which was also suggested here: Automatically expanding an R factor into a collection of 1/0 indicator variables for every factor level
However, model.matrix does not seem to work if a factor has only one level.
Here is an example data set with three levels to the factor 'region':
dat = read.table(text = "
reg1 reg2 reg3
1 0 0
1 0 0
1 0 0
1 0 0
1 0 0
1 0 0
0 1 0
0 1 0
0 1 0
0 0 1
0 0 1
0 0 1
0 0 1
", sep = "", header = TRUE)
# model.matrix works if there are multiple regions:
region <- c(1,1,1,1,1,1,2,2,2,3,3,3,3)
df.region <- as.data.frame(region)
df.region$region <- as.factor(df.region$region)
my.matrix <- as.data.frame(model.matrix(~ -1 + df.region$region, df.region))
my.matrix
# The following for-loop works even if there is only one level to the factor
# (one region):
# region <- c(1,1,1,1,1,1,1,1,1,1,1,1,1)
my.matrix <- matrix(0, nrow=length(region), ncol=length(unique(region)))
for(i in 1:length(region)) {my.matrix[i,region[i]]=1}
my.matrix
The for-loop is effective and seems simple enough. However, I have been struggling to come up with a solution that does not involve loops. I can use the loop above, but have been trying hard to wean myself off of them. Is there a better way?
I would use matrix indexing. From ?"[":
A third form of indexing is via a numeric matrix with the one column for each dimension: each row of the index matrix then selects a single element of the array, and the result is a vector.
Making use of that nice feature:
my.matrix <- matrix(0, nrow=length(region), ncol=length(unique(region)))
my.matrix[cbind(seq_along(region), region)] <- 1
# [,1] [,2] [,3]
# [1,] 1 0 0
# [2,] 1 0 0
# [3,] 1 0 0
# [4,] 1 0 0
# [5,] 1 0 0
# [6,] 1 0 0
# [7,] 0 1 0
# [8,] 0 1 0
# [9,] 0 1 0
# [10,] 0 0 1
# [11,] 0 0 1
# [12,] 0 0 1
# [13,] 0 0 1
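The same indexing trick can be sketched for labels that are not the integers 1..k by going through a factor. The helper name `indicator_matrix` and the region labels are invented for this example; the single-level case from the question is included as a check.

```r
# Build a 0/1 indicator matrix with one column per factor level
indicator_matrix <- function(x) {
  f <- factor(x)
  m <- matrix(0, nrow = length(f), ncol = nlevels(f),
              dimnames = list(NULL, levels(f)))
  # each row of the index matrix picks one (row, column) cell
  m[cbind(seq_along(f), as.integer(f))] <- 1
  m
}

ind     <- indicator_matrix(c("north", "north", "south", "east"))
ind_one <- indicator_matrix(rep("north", 3))   # only one level
ind
```

Because the matrix is allocated from `nlevels(f)`, a factor with a single level simply produces a one-column matrix of ones.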
I came up with this solution by modifying an answer to a similar question here:
Reshaping a column from a data frame into several columns using R
region <- c(1,1,1,1,1,1,2,2,2,3,3,3,3)
site <- seq(1:length(region))
df <- cbind(site, region)
ind <- xtabs( ~ site + region, df)
ind
region <- c(1,1,1,1,1,1,1,1,1,1,1,1,1)
site <- seq(1:length(region))
df <- cbind(site, region)
ind <- xtabs( ~ site + region, df)
ind
EDIT:
The line below will extract the data frame of indicator variables from ind:
ind.matrix <- as.data.frame.matrix(ind)
I have a dataframe (df1) like this.
f1 f2 f3 f4 f5
d1 1 0 1 1 1
d2 1 0 0 1 0
d3 0 0 0 1 1
d4 0 1 0 0 1
The d1...d4 column holds the row names and the f1...f5 row holds the column names.
If I do sample(df1), I get a new dataframe with the same count of 1s as df1. So the count of 1s is conserved for the whole dataframe, but not for each row or each column.
Is it possible to do the randomization row-wise or column-wise?
I want to randomize df1 column-wise for each column, i.e. the number of 1s in each column remains the same, and each column needs to be changed at least once. For example, I may have a randomized df2 like this (note that the count of 1s in each column remains the same, but the count of 1s in each row is different):
f1 f2 f3 f4 f5
d1 1 0 0 0 1
d2 0 1 0 1 1
d3 1 0 0 1 1
d4 0 0 1 1 0
Likewise, I also want to randomize df1 row-wise for each row, i.e. the number of 1s in each row remains the same, and each row needs to be changed (but the number of changed entries could be different). For example, a randomized df3 could be something like this:
f1 f2 f3 f4 f5
d1 0 1 1 1 1 <- two entries are different
d2 0 0 1 0 1 <- four entries are different
d3 1 0 0 0 1 <- two entries are different
d4 0 0 1 0 1 <- two entries are different
PS. Many thanks for the help from Gavin Simpson, Joris Meys and Chase for the previous answers to my previous question on randomizing two columns.
Given the R data.frame:
> df1
a b c
1 1 1 0
2 1 0 0
3 0 1 0
4 0 0 0
Shuffle row-wise:
> df2 <- df1[sample(nrow(df1)),]
> df2
a b c
3 0 1 0
4 0 0 0
2 1 0 0
1 1 1 0
By default sample() randomly reorders the elements passed as the first argument. This means that the default size is the size of the passed array. Passing parameter replace=FALSE (the default) to sample(...) ensures that sampling is done without replacement which accomplishes a row wise shuffle.
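A short sketch of what the row shuffle above does and does not change. The seed and the variable name `idx` are arbitrary choices for this example.

```r
df1 <- data.frame(a = c(1, 1, 0, 0), b = c(1, 0, 1, 0), c = c(0, 0, 0, 0))

set.seed(42)                 # arbitrary seed, only for reproducibility
idx <- sample(nrow(df1))     # a random permutation of 1..nrow(df1)
df2 <- df1[idx, ]

# Whole rows are moved, so each row's contents and each column's total are unchanged
sort(idx)
colSums(df2)
```

Since entire rows are reordered, this kind of shuffle does not randomise the entries within a row or within a column independently.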
Shuffle column-wise:
> df3 <- df1[,sample(ncol(df1))]
> df3
c a b
1 0 1 1
2 0 1 0
3 0 0 1
4 0 0 0
This is another way to shuffle the data.frame, using the dplyr package (load it first with library(dplyr)):
row-wise:
df2 <- slice(df1, sample(1:n()))
or
df2 <- sample_frac(df1, 1L)
column-wise:
df2 <- select(df1, one_of(sample(names(df1))))
Take a look at permatswap() in the vegan package. Here is an example maintaining both row and column totals, but you can relax that and fix only one of the row or column sums.
library(vegan)
mat <- matrix(c(1,1,0,0,0,0,0,1,1,0,0,0,1,1,1,0,1,0,1,1), ncol = 5)
set.seed(4)
out <- permatswap(mat, times = 99, burnin = 20000, thin = 500, mtype = "prab")
This gives:
R> out$perm[[1]]
[,1] [,2] [,3] [,4] [,5]
[1,] 1 0 1 1 1
[2,] 0 1 0 1 0
[3,] 0 0 0 1 1
[4,] 1 0 0 0 1
R> out$perm[[2]]
[,1] [,2] [,3] [,4] [,5]
[1,] 1 1 0 1 1
[2,] 0 0 0 1 1
[3,] 1 0 0 1 0
[4,] 0 0 1 0 1
To explain the call:
out <- permatswap(mat, times = 99, burnin = 20000, thin = 500, mtype = "prab")
times is the number of randomised matrices you want, here 99
burnin is the number of swaps made before we start taking random samples. This allows the matrix from which we sample to be quite random before we start taking each of our randomised matrices
thin says only take a random draw every thin swaps
mtype = "prab" says treat the matrix as presence/absence, i.e. binary 0/1 data.
A couple of things to note, this doesn't guarantee that any column or row has been randomised, but if burnin is long enough there should be a good chance of that having happened. Also, you could draw more random matrices than you need and discard ones that don't match all your requirements.
Your requirement to have different numbers of changes per row, also isn't covered here. Again you could sample more matrices than you want and then discard the ones that don't meet this requirement also.
You can also use the randomizeMatrix function from the R package picante.
Example:
library(picante)
test <- matrix(c(1,1,0,1,0,1,0,0,1,0,0,1,0,1,0,0),nrow=4,ncol=4)
> test
[,1] [,2] [,3] [,4]
[1,] 1 0 1 0
[2,] 1 1 0 1
[3,] 0 0 0 0
[4,] 1 0 1 0
randomizeMatrix(test,null.model = "frequency",iterations = 1000)
[,1] [,2] [,3] [,4]
[1,] 0 1 0 1
[2,] 1 0 0 0
[3,] 1 0 1 0
[4,] 1 0 1 0
randomizeMatrix(test,null.model = "richness",iterations = 1000)
[,1] [,2] [,3] [,4]
[1,] 1 0 0 1
[2,] 1 1 0 1
[3,] 0 0 0 0
[4,] 1 0 1 0
The option null.model = "frequency" maintains column sums and null.model = "richness" maintains row sums.
Though mainly used for randomizing species presence/absence datasets in community ecology, it works well here.
This function has other null model options as well; see page 36 of the picante documentation for more details.
Of course you can sample each row:
sapply (1:4, function (row) df1[row,]<<-sample(df1[row,]))
This shuffles the entries within each row, so the number of 1's in each row doesn't change. With small changes it also works for columns, but that is an exercise for the reader :-P
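One way to sketch the within-row shuffle without the global assignment is to go through a matrix and `apply()`. The data frame below is the df1 from the question; the names `m` and `shuffled` are chosen for this example.

```r
df1 <- data.frame(f1 = c(1, 1, 0, 0), f2 = c(0, 0, 0, 1), f3 = c(1, 0, 0, 0),
                  f4 = c(1, 1, 1, 0), f5 = c(1, 0, 1, 1),
                  row.names = paste0("d", 1:4))

set.seed(1)                           # arbitrary seed for reproducibility
m <- as.matrix(df1)

# apply() returns one shuffled row per column, so transpose back
shuffled <- t(apply(m, 1, sample))

# sample() also shuffles the element names, so restore the original labels
dimnames(shuffled) <- dimnames(m)
shuffled
```

This keeps every row total fixed but gives no guarantee that a particular row actually ends up different from the original.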
If the goal is to randomly shuffle each column independently, some of the above answers don't work, since they reorder whole rows and so shuffle the columns jointly (this preserves inter-column correlations). Others require installing a package. Yet a one-liner exists (the as.data.frame call is needed because lapply on its own returns a list):
df2 <- as.data.frame(lapply(df1, sample))
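The question also asks that every column actually changes. A possible sketch is to redraw a column until it differs from the original, with a cap on the number of attempts. The helper name `shuffle_until_changed` and the `max_tries` parameter are hypothetical; the data frame is the df1 from the question.

```r
df1 <- data.frame(f1 = c(1, 1, 0, 0), f2 = c(0, 0, 0, 1), f3 = c(1, 0, 0, 0),
                  f4 = c(1, 1, 1, 0), f5 = c(1, 0, 1, 1),
                  row.names = paste0("d", 1:4))

# Reshuffle x until the result differs from x (a constant column can never change)
shuffle_until_changed <- function(x, max_tries = 100) {
  if (length(unique(x)) < 2) return(x)
  for (k in seq_len(max_tries)) {
    y <- sample(x)
    if (!identical(y, x)) return(y)
  }
  x  # give up after max_tries and return the original
}

set.seed(1)                 # arbitrary seed for reproducibility
df2 <- df1
df2[] <- lapply(df1, shuffle_until_changed)   # df2[] keeps row and column names
df2
```

Column totals are preserved because each column is only permuted; row totals are free to change.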
You can also "sample" the same number of items in your data frame with something like this:
nr<-dim(M)[1]
random_M = M[sample.int(nr),]
Random samples and permutations in a data frame:
If the data is in matrix form, convert it to a data.frame, then use the sample function from the base package to permute the row indexes:
indexes <- sample(1:nrow(df1), size = nrow(df1))
df2 <- df1[indexes, ]
See ?sample (Random Samples and Permutations).
Here is a data.table option using .N with sample like this:
library(data.table)
setDT(df)
df[sample(.N)]
#> a b c
#> 1: 0 1 0
#> 2: 1 1 0
#> 3: 1 0 0
#> 4: 0 0 0
Created on 2023-01-28 with reprex v2.0.2
Data:
df <- read.table(text = " a b c
1 1 1 0
2 1 0 0
3 0 1 0
4 0 0 0", header = TRUE)