I am working with survey data and I would like to replace specific values that follow each other in a data frame.
For example
v1 v2 v3 v4 v5
0 2 0 0 55
0 0 3 0 1
3 0 1 1 2
0 2 0 2 0
If I replace (0,2,0) with 1's and the rest of the data frame with 0's, the new matrix will look like
v1 v2 v3 v4 v5
1 1 1 0 0
0 0 0 0 0
0 0 0 0 0
1 1 1 1 1
How can I do this for a pattern of arbitrary length n, e.g. (1,3) or (1,2,4,5,8,2)?
As others have pointed out, you need to clarify your question a bit to make sure that we are answering it correctly.
My assumptions are that you are matching a pattern (c(0,2,0) in the example you show), and that you are only matching the pattern across rows. That is, a match cannot wrap from row 2, column 5 to row 3, column 1, nor will the function look for matches down the columns.
If those assumptions are correct, then the following function will work. It replicates the example you provided and returns a matrix. You can change the replacement value (rep_with) and the fill value (fill_val), 1 and 0 in your example, respectively, through the optional parameters. This function could be made more elegant, but I think it works.
Code
replace_pattern <- function(x, pattern, rep_with = 1, fill_val = 0)
{
  n <- length(pattern)
  if (n > ncol(x))
    stop("pattern is longer than number of columns")
  new_x <- matrix(fill_val, nrow = nrow(x), ncol = ncol(x))
  # loop over each row
  for (rr in seq_len(nrow(x))) {
    # start matching the pattern at the entry = length of pattern
    # and look backwards
    for (cc in n:ncol(x)) {
      cur_cols <- (cc - n + 1):cc
      cur_vals <- x[rr, cur_cols]
      # if it matches the pattern, replace the values with specified value
      if (isTRUE(all.equal(cur_vals, pattern, check.attributes = FALSE))) {
        new_x[rr, cur_cols] <- rep_with
      }
    }
  }
  new_x
}
Testing
Using your example, and setting it to xx
xx:
v1 v2 v3 v4 v5
0 2 0 0 55
0 0 3 0 1
3 0 1 1 2
0 2 0 2 0
And then calling replace_pattern(xx, c(0, 2, 0)) returns the second matrix you provided.
x2 <- xx
x2[2, 2] <- 1
replace_pattern(x2, c(1, 3))
Returns:
[,1] [,2] [,3] [,4] [,5]
[1,] 0 0 0 0 0
[2,] 0 1 1 0 0
[3,] 0 0 0 0 0
[4,] 0 0 0 0 0
And
xx3 <- rbind(xx, c(1, 2, 4, 5, 8))
replace_pattern(xx3, c(1, 2, 4, 5, 8))
Returns:
[,1] [,2] [,3] [,4] [,5]
[1,] 0 0 0 0 0
[2,] 0 0 0 0 0
[3,] 0 0 0 0 0
[4,] 0 0 0 0 0
[5,] 1 1 1 1 1
Finally, replace_pattern(xx, c(1,2,4,5,8,2)) will fail because the pattern is longer than the number of columns.
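For comparison, here is a minimal sketch of the same row-wise matching, written with exact == comparison over sliding column windows instead of all.equal. The name mark_pattern and its arguments hit and miss are hypothetical (they are not part of the answer above), and the sketch assumes numeric data without NA values.

```r
# Hypothetical helper: mark every row-wise occurrence of `pattern` in `x`.
mark_pattern <- function(x, pattern, hit = 1, miss = 0) {
  x <- as.matrix(x)                        # accept data frames as well
  n <- length(pattern)
  out <- matrix(miss, nrow(x), ncol(x), dimnames = dimnames(x))
  if (n > ncol(x)) return(out)             # pattern too long: nothing can match
  for (start in seq_len(ncol(x) - n + 1)) {
    cols <- start:(start + n - 1)
    # compare the whole window, for all rows at once, against the pattern
    same <- x[, cols, drop = FALSE] == rep(pattern, each = nrow(x))
    hits <- rowSums(same) == n             # TRUE for rows where every entry matches
    out[hits, cols] <- hit
  }
  out
}

xx <- matrix(c(0, 2, 0, 0, 55,
               0, 0, 3, 0, 1,
               3, 0, 1, 1, 2,
               0, 2, 0, 2, 0), ncol = 5, byrow = TRUE)
res <- mark_pattern(xx, c(0, 2, 0))
```

Unlike replace_pattern, this sketch returns an all-miss matrix when the pattern is longer than a row; replace the early return with stop() if an error is preferred.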
Related
I have a binary vector x=(1,0,0,1). The lower-order terms of this vector, including the vector itself, are (0,0,0,0), (0,0,0,1), (1,0,0,0) and (1,0,0,1). How do I find these lower-order vectors in R?
What I understand so far: basically we want to find subsets by replacing each 1 with 0. But how do I do that in R? I am clueless.
Here is what I have tried so far.
a<-c(1,0,0,1)
M<-length(a)
for(i in 1:M){
ifelse(a[i]==1, a[i]<-0, next)
print(a)
}
[1] 0 0 0 1
[1] 0 0 0 0
What I am looking for, in detail: suppose I have 4 factors A, B, C, D. Here (1,0,0,1) means AD.
Now I want the subsets of (1,0,0,1), i.e. of AD. These subsets cannot contain B or C. The result will be {}, {A}, {D}, {AD}, or in binary form (0,0,0,0), (1,0,0,0), (0,0,0,1), (1,0,0,1).
Here's a method relying on expand.grid to do the heavy lifting:
vecs = lapply(a, seq, 0) # keep 0s as 0, make 1s c(1, 0)
do.call(expand.grid, vecs) # generate all combinations
# Var1 Var2 Var3 Var4
# 1 1 0 0 1
# 2 0 0 0 1
# 3 1 0 0 0
# 4 0 0 0 0
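To connect the binary rows back to the factor notation in the question ({}, {A}, {D}, {AD}), here is a small sketch that reuses the expand.grid idea and labels each row. The names subsets and labels are illustrative, and mapping positions to LETTERS assumes the factors are called A, B, C, D in order.

```r
a <- c(1, 0, 0, 1)

# Same construction as above: a 0 stays 0, a 1 becomes c(1, 0).
subsets <- do.call(expand.grid, lapply(a, seq, 0))

# Label each row by the letters at the positions that are still 1.
labels <- apply(subsets, 1, function(r) paste(LETTERS[which(r == 1)], collapse = ""))
labels
```

The empty string stands for the empty subset {}.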
Using RcppAlgos::permuteGeneral.
library(RcppAlgos)
A <- t(apply(permuteGeneral(length(a), sum(a)), 1, function(x) {a[x] <- 0; a}))
A[!duplicated(A), ]
# [,1] [,2] [,3] [,4]
# [1,] 0 0 0 1
# [2,] 0 0 0 0
# [3,] 1 0 0 1
# [4,] 1 0 0 0
We can use the which, combn, and *apply functions to perform this operation. Since this is a step-by-step operation, it may be helpful to look at the results line-by-line.
Here it is wrapped in a function called find_binary_subsets:
find_binary_subsets <- function(x){
# where does x equal 1
x_eq_1 <- which(x == 1)
# combinations of indexes where x == 1
l_w_x <- lapply(length(x_eq_1):1,
FUN = function(l) combn(x_eq_1, l))
# loop over the combinations of indexes where x == 1, replace by 0, return vector
# apply(., 2) loops over the columns of a matrix, which is what we want
combs <- lapply(l_w_x,
FUN = function(d)
apply(d, 2, FUN = function(i){x[i] <- 0; x}))
# cbind results, then transpose to arrange by row
t(cbind(do.call("cbind", combs), x))
}
find_binary_subsets(a)
[,1] [,2] [,3] [,4]
0 0 0 0
0 0 0 1
1 0 0 0
x 1 0 0 1
I'm trying to find a way to create a matrix which counts values from each row of a data frame. I'd like it to recognise the values in each row of the data frame, and count how many times that value has occurred in all rows above the row the value occurs in (not the whole data frame).
The same value will never occur more than once in a single row of the data frame.
For example:
# df:
a b c
1 2 3
3 4 5
3 2 6
7 8 9
8 3 6
matrix result:
0 0 0 (none of the df values have occurred as there are no rows above)
1 0 0 (3 has occurred once above, the others have not occurred)
2 1 0 (3 has occurred twice above, 2 has occurred once above, 6 has not occurred)
0 0 0 (none of the df values have occurred in rows above)
1 3 1 (8 has occurred once, 3 has occurred 3 times, 6 has occurred once)
Here's one way:
# convert to a vector
x = as.vector(t(as.matrix(df)))
# get counts of each unique element (in the right place)
# and add them up
res = rowSums(sapply(unique(x), function(z) {
r = integer(length(x))
r[x == z] = 0:(sum(x == z) - 1)
return(r)
}))
# convert to matrix
res = matrix(res, ncol = ncol(df), byrow = T)
res
# [,1] [,2] [,3]
# [1,] 0 0 0
# [2,] 1 0 0
# [3,] 2 1 0
# [4,] 0 0 0
# [5,] 1 3 1
Using this data:
df = read.table(text = "
a b c
1 2 3
3 4 5
3 2 6
7 8 9
8 3 6", header = T)
Another...for fun
out<-matrix(1,nrow = nrow(df),ncol = ncol(df))
for(i in 1:nrow(df)){
out[i,]<-sapply(1:ncol(df),function(z) sum(unlist(df[0:(i-1),]) %in% df[i,z]))
}
out
[,1] [,2] [,3]
[1,] 0 0 0
[2,] 1 0 0
[3,] 2 1 0
[4,] 0 0 0
[5,] 1 3 1
Three other approaches:
1) with base R:
temp <- stack(df)[c(outer(c(0,5,10), 1:5, '+')),]
temp$val2 <- with(temp, ave(values, values, FUN = seq_along)) - 1
df2 <- unstack(temp, val2 ~ ind)
which gives:
> df2
a b c
1 0 0 0
2 1 0 0
3 2 1 0
4 0 0 0
5 1 3 1
2) with data.table:
library(data.table)
melt(setDT(df)[, r := .I],
id = 'r')[order(r), val2 := rowid(value) - 1
][, dcast(.SD, rowid(variable) ~ variable, value.var = 'val2')
][, variable := NULL][]
which gives the same result.
3) with the tidyverse:
library(dplyr)
library(tidyr)
df %>%
mutate(r = row_number()) %>%
gather(k, v, -4) %>%
arrange(r) %>%
group_by(v) %>%
mutate(v2 = row_number() - 1) %>%
ungroup() %>%
select(r, k, v2) %>%
spread(k, v2)
which, of course, also gives the same result.
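The base R approach above hard-codes the offsets c(0,5,10) and 1:5 for this particular 5 x 3 data frame. As a hedged alternative, here is a sketch of the same ave idea that does not depend on the dimensions. It relies on the question's guarantee that a value never occurs twice within one row; the names x, prior and res are illustrative.

```r
df <- data.frame(a = c(1, 3, 3, 7, 8),
                 b = c(2, 4, 2, 8, 3),
                 c = c(3, 5, 6, 9, 6))

# Values in row-major order, so "earlier in x" means "same row or a row above".
x <- as.vector(t(as.matrix(df)))

# Running occurrence number of each value, minus 1 = occurrences seen before.
prior <- ave(seq_along(x), x, FUN = seq_along) - 1

res <- matrix(prior, ncol = ncol(df), byrow = TRUE,
              dimnames = list(NULL, names(df)))
res
```

Because no value repeats within a row, the number of earlier occurrences in x equals the number of occurrences in the rows above.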
Here is another solution:
df = read.table(text = "a b c
1 2 3
3 4 5
3 2 6
7 8 9
8 3 6", header = T)
elements = sort(unique(unlist(df)))
frequency = sapply(elements, # for each element
function(element) {apply(df == element, 1, sum)}) # Sum the number of occurrences per row
# [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9]
# [1,] 1 1 1 0 0 0 0 0 0
# [2,] 0 0 1 1 1 0 0 0 0
# [3,] 0 1 1 0 0 1 0 0 0
# [4,] 0 0 0 0 0 0 1 1 1
# [5,] 0 0 1 0 0 1 0 1 0
results = df
for(i in 1:nrow(df)){
  for(j in 1:ncol(df))
    results[i,j] = sum(frequency[seq_len(i - 1), # Sum the previous rows' occurrences (no rows when i = 1)
                                 which(df[i,j] == elements)]) # Of the same element
}
# a b c
# 1 0 0 0
# 2 1 0 0
# 3 2 1 0
# 4 0 0 0
# 5 1 3 1
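The double loop can also be replaced by a cumulative sum over the frequency matrix followed by a matrix-index lookup. This is a sketch under the same setup as the answer above; prior, m and idx are names introduced here for illustration only.

```r
df <- data.frame(a = c(1, 3, 3, 7, 8),
                 b = c(2, 4, 2, 8, 3),
                 c = c(3, 5, 6, 9, 6))
m <- as.matrix(df)
elements <- sort(unique(as.vector(m)))

# frequency[i, k]: how often elements[k] occurs in row i
frequency <- sapply(elements, function(element) rowSums(m == element))

# prior[i, k]: how often elements[k] occurs in the rows strictly above row i
prior <- apply(frequency, 2, cumsum) - frequency

# For every cell (i, j), look up prior[i, position of m[i, j] in elements].
idx <- cbind(as.vector(row(m)), match(as.vector(m), elements))
results <- matrix(prior[idx], nrow = nrow(m), dimnames = dimnames(m))
results
```

Subtracting frequency from its running total removes the current row's own contribution, which is what "rows above" requires.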
I know we're not supposed to comment with "thanks", but thank you to all. I've marked Brian's response as the most useful because I'm pretty new to R and his was the example I could follow all the way through without needing to look anything up. I'll have fun finding out about all the other ways and new (to me) functions / approaches you've kindly shared though.
I have a data table and one of the columns is a bunch of 0's and 1's, just like vec below.
vec = c(rep(1, times = 6), rep(0, times = 10), rep(1, times = 11), rep(0, times = 4))
> vec
[1] 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0
What I want to do is to split the data every time there's a change in that column from 0 to 1 or vice versa. Here is what I have done so far:
b = c(vec[1],diff(vec))
rowby = numeric(0)
for (i in 2:(length(b))) {
if (b[i] != 0) {
rowby <- c(rowby, i-1)
}
}
splitted_data <- split(vec, cumsum(c(TRUE,(1:length(vec) %in% rowby)[-length(vec)])))
There must be something right under my nose that I can't see. What is a correct way to do this? This works for the example above, but not in general.
Try
split(vec,cumsum(c(1, abs(diff(vec)))))
#$`1`
#[1] 1 1 1 1 1 1
#$`2`
#[1] 0 0 0 0 0 0 0 0 0 0
#$`3`
#[1] 1 1 1 1 1 1 1 1 1 1 1
#$`4`
#[1] 0 0 0 0
Or use rle
split(vec,inverse.rle(within.list(rle(vec), values <- seq_along(values))))
With current versions of data.table, rleid is one function which can be used for this job:
library(data.table)#v1.9.5+
split(vec,rleid(vec))
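The cumsum(abs(diff(vec))) version works because consecutive 0/1 values differ by exactly 1. For a column with arbitrary values, a base R sketch that only tests whether neighbouring values differ could look like this; split_runs is a hypothetical helper name, and the sketch assumes a non-empty vector without NA values.

```r
# Hypothetical helper: split a vector into runs of equal consecutive values.
split_runs <- function(v) {
  changed <- v[-1] != v[-length(v)]      # TRUE where a new run starts
  split(v, cumsum(c(TRUE, changed)))     # run id increases at every change
}

vec <- c(rep(1, times = 6), rep(0, times = 10), rep(1, times = 11), rep(0, times = 4))
parts <- split_runs(vec)
lengths(parts)

# Also works when the values are not 0/1:
other <- split_runs(c(5, 5, 2, 2, 2, 9))
```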
I have a dataframe (df1) like this.
f1 f2 f3 f4 f5
d1 1 0 1 1 1
d2 1 0 0 1 0
d3 0 0 0 1 1
d4 0 1 0 0 1
The d1...d4 labels are the row names and the f1...f5 labels are the column names.
When I run sample(df1), I get a new data frame with the same count of 1s as df1. So the count of 1s is conserved for the whole data frame, but not for each row or each column.
Is it possible to do the randomization row-wise or column-wise?
I want to randomize df1 column-wise, i.e. the number of 1s in each column remains the same, and each column needs to be changed in at least one entry. For example, I may have a randomized df2 like this (note that the count of 1s in each column remains the same, but the count of 1s in each row is different):
f1 f2 f3 f4 f5
d1 1 0 0 0 1
d2 0 1 0 1 1
d3 1 0 0 1 1
d4 0 0 1 1 0
Likewise, I also want to randomize df1 row-wise, i.e. the number of 1s in each row remains the same, and each row needs to be changed (but the number of changed entries can differ between rows). For example, a randomized df3 could look like this:
f1 f2 f3 f4 f5
d1 0 1 1 1 1 <- two entries are different
d2 0 0 1 0 1 <- four entries are different
d3 1 0 0 0 1 <- two entries are different
d4 0 0 1 0 1 <- two entries are different
PS. Many thanks for the help from Gavin Simpson, Joris Meys and Chase for the previous answers to my previous question on randomizing two columns.
Given the R data.frame:
> df1
a b c
1 1 1 0
2 1 0 0
3 0 1 0
4 0 0 0
Shuffle row-wise:
> df2 <- df1[sample(nrow(df1)),]
> df2
a b c
3 0 1 0
4 0 0 0
2 1 0 0
1 1 1 0
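By default, sample() returns a random permutation of its first argument; when that argument is a single positive integer n, as with nrow(df1) here, it permutes 1:n. The default size is therefore the length of that vector. Sampling is done without replacement (replace=FALSE is the default), so every row index appears exactly once, which accomplishes a row-wise shuffle. Note that this reorders whole rows; it does not permute the values within a row.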
Shuffle column-wise:
> df3 <- df1[,sample(ncol(df1))]
> df3
c a b
1 0 1 1
2 0 1 0
3 0 0 1
4 0 0 0
This is another way to shuffle the data.frame using package dplyr:
row-wise:
library(dplyr)
df2 <- slice(df1, sample(1:n()))
or
df2 <- sample_frac(df1, 1L)
column-wise:
df2 <- select(df1, one_of(sample(names(df1))))
Take a look at permatswap() in the vegan package. Here is an example maintaining both row and column totals, but you can relax that and fix only one of the row or column sums.
library(vegan)
mat <- matrix(c(1,1,0,0,0,0,0,1,1,0,0,0,1,1,1,0,1,0,1,1), ncol = 5)
set.seed(4)
out <- permatswap(mat, times = 99, burnin = 20000, thin = 500, mtype = "prab")
This gives:
R> out$perm[[1]]
[,1] [,2] [,3] [,4] [,5]
[1,] 1 0 1 1 1
[2,] 0 1 0 1 0
[3,] 0 0 0 1 1
[4,] 1 0 0 0 1
R> out$perm[[2]]
[,1] [,2] [,3] [,4] [,5]
[1,] 1 1 0 1 1
[2,] 0 0 0 1 1
[3,] 1 0 0 1 0
[4,] 0 0 1 0 1
To explain the call:
out <- permatswap(mat, times = 99, burnin = 20000, thin = 500, mtype = "prab")
times is the number of randomised matrices you want, here 99
burnin is the number of swaps made before we start taking random samples. This allows the matrix from which we sample to be quite random before we start taking each of our randomised matrices
thin says only take a random draw every thin swaps
mtype = "prab" says treat the matrix as presence/absence, i.e. binary 0/1 data.
A couple of things to note: this doesn't guarantee that any particular column or row has been randomised, but if burnin is long enough there should be a good chance of that having happened. Also, you could draw more random matrices than you need and discard the ones that don't match all your requirements.
Your requirement to have different numbers of changes per row isn't covered here either. Again, you could sample more matrices than you want and then discard the ones that don't meet this requirement.
You can also use the randomizeMatrix function in the R package picante.
Example:
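The "draw more than you need and discard" step can be sketched in base R as follows. To keep the example self-contained, the candidates are generated by independently shuffling each column (a stand-in, not permatswap); the predicate every_col_changed and the names orig, candidates and kept are hypothetical. With vegan, the list out$perm could be filtered the same way.

```r
set.seed(42)
orig <- matrix(c(1,1,0,0, 0,0,0,1, 1,0,0,0, 1,1,1,0, 1,0,1,1), ncol = 5)

# Stand-in generator: 50 candidates, each with every column shuffled on its own.
candidates <- replicate(50, apply(orig, 2, sample), simplify = FALSE)

# Requirement: every column differs from the original in at least one entry.
every_col_changed <- function(m) all(colSums(m != orig) > 0)

# Keep only the candidates that satisfy the requirement.
kept <- Filter(every_col_changed, candidates)
length(kept)
```

Any other requirement (for example, every row changed) can be expressed as another predicate and combined with && inside the filter function.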
test <- matrix(c(1,1,0,1,0,1,0,0,1,0,0,1,0,1,0,0),nrow=4,ncol=4)
> test
[,1] [,2] [,3] [,4]
[1,] 1 0 1 0
[2,] 1 1 0 1
[3,] 0 0 0 0
[4,] 1 0 1 0
randomizeMatrix(test,null.model = "frequency",iterations = 1000)
[,1] [,2] [,3] [,4]
[1,] 0 1 0 1
[2,] 1 0 0 0
[3,] 1 0 1 0
[4,] 1 0 1 0
randomizeMatrix(test,null.model = "richness",iterations = 1000)
[,1] [,2] [,3] [,4]
[1,] 1 0 0 1
[2,] 1 1 0 1
[3,] 0 0 0 0
[4,] 1 0 1 0
The option null.model="frequency" maintains column sums and richness maintains row sums.
Though mainly used for randomizing species presence absence datasets in community ecology it works well here.
This function has other null model options as well; see the picante documentation (page 36) for more details.
Of course you can sample each row:
sapply (1:4, function (row) df1[row,]<<-sample(df1[row,]))
This shuffles the values within each row, so the number of 1's in each row doesn't change. With small changes it also works for columns, but that is an exercise for the reader :-P
If the goal is to randomly shuffle each column, some of the above answers don't work, since the columns are shuffled jointly (this preserves inter-column correlations). Others require installing a package. Yet a one-liner exists:
df2 = lapply(df1, function(x) { sample(x) })
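Note that lapply() returns a list rather than a data frame. A minimal sketch that keeps the data frame (including its row names), and applies the same idea row-wise, might look like this. The data reproduce df1 from the question; col_shuffled and row_shuffled are illustrative names.

```r
set.seed(1)
df1 <- data.frame(f1 = c(1, 1, 0, 0),
                  f2 = c(0, 0, 0, 1),
                  f3 = c(1, 0, 0, 0),
                  f4 = c(1, 1, 1, 0),
                  f5 = c(1, 0, 1, 1),
                  row.names = paste0("d", 1:4))

# Column-wise: permute each column independently; column sums are preserved.
col_shuffled <- df1
col_shuffled[] <- lapply(df1, sample)

# Row-wise: permute each row independently; row sums are preserved.
row_shuffled <- df1
row_shuffled[] <- t(apply(unname(as.matrix(df1)), 1, sample))
```

Neither shuffle guarantees that every column or row actually changes (a permutation can reproduce the original by chance), so a check and a redraw would still be needed for that requirement.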
You can also "sample" the same number of items in your data frame with something like this:
nr<-dim(M)[1]
random_M = M[sample.int(nr),]
Random Samples and Permutations in a data frame
If the data is in matrix form, convert it into a data.frame,
then use the sample function from the base package:
indexes = sample(1:nrow(df1), size=1*nrow(df1))
Random Samples and Permutations
Here is a data.table option using .N with sample like this:
library(data.table)
setDT(df)
df[sample(.N)]
#> a b c
#> 1: 0 1 0
#> 2: 1 1 0
#> 3: 1 0 0
#> 4: 0 0 0
Created on 2023-01-28 with reprex v2.0.2
Data:
df <- read.table(text = " a b c
1 1 1 0
2 1 0 0
3 0 1 0
4 0 0 0", header = TRUE)
I am working on a project and have been stuck at this point for a week. I have tried many ideas, but all my attempts to code my algorithm have failed.
Suppose we have the following simple graph:
the edges in order are: 1--3, 1--4, 3--2
For each edge, a random walk is defined on each vertex to move to one of its neighbors, like this:
For the first edge, v1=1 ,v2=3, n1=3,4 and n2=1,2 in order, so the possible moves from v1 and v2 are:
1 to 3,3 to 1
1 to 4,3 to 1
1 to 3,3 to 2
1 to 4,3 to 2
For the second edge, v1=1 ,v2=4, n1=3,4 and n2=1 in order,so the possible moves from v1 and v2 are:
1 to 3,4 to 1
1 to 4,4 to 1
For the third edge, v1=3 ,v2=2, n1=1,2 and n2=3 in order,so the possible moves from v1 and v2 are:
3 to 1,2 to 3
3 to 2,2 to 3
For the whole graph there are just 8 possible moves so I have 8 variables to construct the constraints matrix
Let us denote the moves by x's (according to their order of occurrences); i.e
(1 to 3,3 to 1) to be represented by x_1
(1 to 4,3 to 1) to be represented by x_2
:
(3 to 1,2 to 3) to be represented by x_7
(3 to 2,2 to 3) to be represented by x_8
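I want to build the required constraints matrix from these moves. The number of constraints will equal \sum_{i} ( number of neighbors of v1(i) + number of neighbors of v2(i) ), which is 10 in our graph; the number of variables (moves) is \sum_{i} ( number of neighbors of v1(i) * number of neighbors of v2(i) ), which is 8.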
My algorithm to build this matrix is:
Step1: 1) select the 1st edge and fix v1, v2 and the 1st element of n2
2) loop over n1 and fill the 1st row of the matrix with 1's in the positions of the resulting moves, and 0 where there is no such move on the graph, until all elements of n1 are done.
Step2: move to the 2nd row of the matrix, select the 2nd element of n2 and
1) loop over n1
2) fill the 2nd row with 1's in the positions of the resulting moves until all elements of n1 are done.
Step3: once all elements of n1 and n2 have been used for the vertices of the first edge, move to a new row in the matrix
Step4: select the next edges and repeat the same work until all edges are done.
Step5: select the 1st edge again and do the same work, but this time fix v1, v2 and an element of n1 and loop over n2
The resulted matrix according to this algorithm will be:
1 1 0 0 0 0 0 0
0 0 1 1 0 0 0 0
0 0 0 0 1 1 0 0
0 0 0 0 0 0 1 1
1 0 1 0 0 0 0 0
0 1 0 1 0 0 0 0
0 0 0 0 1 0 0 0
0 0 0 0 0 1 0 0
0 0 0 0 0 0 1 0
0 0 0 0 0 0 0 1
What I failed to do is this: how do I let the matrix know that there is a move, so that a 1 is stored in its position, and a 0 is stored in its position if there is no move?
My code is:
library(igraph)
graph<-matrix(c(1,3,1,4,3,2),ncol=2,byrow=TRUE)
g<-graph.data.frame(d = graph, directed = FALSE)
countercol<-0
for (edge in 1:length(E(g))){
v1<-ends(graph = g, es = edge)[1]
v2<-ends(graph = g, es = edge)[2]
n1<-neighbors(g,v1,mode=c("all"))
n2<-neighbors(g,v2,mode=c("all"))
countercol=countercol+(length(n1)*length(n2))
}
counterrow<-0
for (edge in 1:length(E(g))){
v1<-ends(graph = g, es = edge)[1]
v2<-ends(graph = g, es = edge)[2]
n1<-neighbors(g,v1,mode=c("all"))
n2<-neighbors(g,v2,mode=c("all"))
counterrow=counterrow+(length(n1)+length(n2))
}
for (edge in 1:length(E(g))){
v1<-ends(graph = g, es = edge)[1]
v2<-ends(graph = g, es = edge)[2]
n1<-neighbors(g,v1,mode=c("all"))
n2<-neighbors(g,v2,mode=c("all"))
...
...
...
}
I am not looking for someone to write the code, what I want is the idea to let the program differentiate between the possible moves and store 1's and 0's in the suitable position for the resulted move.
Many Many thanks for any kind of help
Here's a solution consisting of two parts
edgeMoves <- function(e) {
umoves <- sapply(ends(graph = g, es = e), neighbors, graph = g, mode = "all", simplify = FALSE)
do.call(paste, c(expand.grid(mapply(function(x, y)
paste(x, names(y), sep =" to "), ends(graph = g, es = e), umoves, SIMPLIFY = FALSE)), sep = ", "))
}
edgeConstraints <- function(e) {
v <- ends(graph = g, es = e)
n1 <- names(neighbors(g, v[1], mode = "all"))
n2 <- names(neighbors(g, v[2], mode = "all"))
t(cbind(sapply(n2, function(nn2) moves %in% paste0(v[1], " to ", n1, ", ", v[2], " to ", nn2)),
sapply(n1, function(nn1) moves %in% paste0(v[1], " to ", nn1, ", ", v[2], " to ", n2))))
}
moves <- do.call(c, sapply(E(g), edgeMoves))
moves
# [1] "1 to 3, 3 to 1" "1 to 4, 3 to 1" "1 to 3, 3 to 2"
# [4] "1 to 4, 3 to 2" "1 to 3, 4 to 1" "1 to 4, 4 to 1"
# [7] "3 to 1, 2 to 3" "3 to 2, 2 to 3"
do.call(rbind, sapply(E(g), edgeConstraints)) * 1
# [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
# 1 1 1 0 0 0 0 0 0
# 2 0 0 1 1 0 0 0 0
# 3 1 0 1 0 0 0 0 0
# 4 0 1 0 1 0 0 0 0
# 1 0 0 0 0 1 1 0 0
# 3 0 0 0 0 1 0 0 0
# 4 0 0 0 0 0 1 0 0
# 3 0 0 0 0 0 0 1 1
# 1 0 0 0 0 0 0 1 0
# 2 0 0 0 0 0 0 0 1
The row order is different, but I suspect that it is not a problem. Also, for a single edge you may use edgeMoves(e) and edgeConstraints(e) * 1.
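The underlying idea (give every move a fixed column index, then mark the columns that share a fixed neighbor) can also be sketched without igraph, using only the edge list. Everything named here (edges, nbrs, moves, constraint_rows, A) is illustrative, and sorting the neighbors is an assumption made to reproduce the move order x_1 ... x_8 from the question.

```r
edges <- matrix(c(1, 3, 1, 4, 3, 2), ncol = 2, byrow = TRUE)

# Neighbors of vertex v in the undirected edge list, sorted by vertex id.
nbrs <- function(v) sort(c(edges[edges[, 1] == v, 2], edges[edges[, 2] == v, 1]))

# One row per move: the edge it belongs to and where v1 and v2 step to.
# The row number of a move is its column index in the constraint matrix.
moves <- do.call(rbind, lapply(seq_len(nrow(edges)), function(e) {
  cbind(edge = e, expand.grid(to1 = nbrs(edges[e, 1]), to2 = nbrs(edges[e, 2])))
}))

# Constraints of one edge: one row per fixed element of n2, then one per element of n1.
constraint_rows <- function(e) {
  on_edge <- moves$edge == e
  fix_to2 <- lapply(unique(moves$to2[on_edge]), function(k) on_edge & moves$to2 == k)
  fix_to1 <- lapply(unique(moves$to1[on_edge]), function(k) on_edge & moves$to1 == k)
  do.call(rbind, c(fix_to2, fix_to1)) * 1    # logical rows -> 0/1 matrix
}

A <- do.call(rbind, lapply(seq_len(nrow(edges)), constraint_rows))
dim(A)
```

Each logical vector has one entry per move, so a TRUE marks "this move takes part in this constraint"; multiplying by 1 turns the stacked rows into the 0/1 matrix. The row order follows the answer above (edge by edge) rather than the order in the question.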