Creating a repeated sequence of zeros and ones with uneven "breaks" between - R

I am trying to create a sequence consisting of 1s and 0s using RStudio.
My desired output is a sequence that first has five 1s then six 0s, followed by four 1s then six 0s. This should then all be repeated until the end of a given vector.
The result should be like this:
1 1 1 1 1 0 0 0 0 0 0 1 1 1 1 0 0 0 0 0 0 1 1 1 1 1 0 0 0 0 0 0 1 1 1 1 0 0 0 0 0 0 .....
Hope someone has a good solution, and sorry if I have some grammar mistakes
Best,
HB

rep(c(rep(1,5),rep(0,6),rep(1,4),rep(0,6)),n)
This repeats your pattern n times, where n is the number of repetitions you want.
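If the goal is to fill a vector of a given length rather than to repeat the block a fixed number of times, the length.out argument of rep() can cut the pattern off at the right point. A minimal sketch, assuming a hypothetical target vector v of length 50 (the name and the length are placeholders, not taken from the question):

```r
# One block of the pattern: five 1s, six 0s, four 1s, six 0s (21 values)
pattern <- rep(c(1, 0, 1, 0), times = c(5, 6, 4, 6))

# Hypothetical target vector; only its length matters here
v <- numeric(50)

# Recycle the pattern until it is exactly as long as v
out <- rep(pattern, length.out = length(v))
length(out)
```

Because 50 is not a multiple of 21, the last block is simply truncated, which seems to be what "until the end of a given vector" asks for.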

You could use Map.
unlist(Map(function(x, ...) c(rep(x, ...), rep(0, 6)), 1, times=length(v):1))
# [1] 1 1 1 1 1 0 0 0 0 0 0 1 1 1 1 0 0 0 0 0 0 1 1 1 0 0 0 0 0 0 1 1 0 0 0 0 0 0 1 0 0 0 0 0 0
Instead of length(v):1 you may also use rev(seq(v)) but it's slower.
Data
v <- c("Vector", "of", "specific", "length", "five")

Related

R - Creating a new column within a data frame when two or more columns are a match in a row

I'm currently stuck on a part of my code that feels intuitive but I can't figure a way to do it. I have a very big data frame (nrows = 34036, ncol = 43) in which I want to create a continuous sequence of the variables where the value of the row is 1 (without having multiple columns with 1). It consists of only zeros and ones similar to the following:
A B C D
1 0 0 0
0 0 0 1
0 0 0 1
0 0 0 0
0 0 0 0
1 0 1 0
1 0 1 0
0 1 0 0
0 1 0 0
1 0 0 1
I was able to remove the zeroes using:
#find the sum of each row
placeholderData <- transform(placeholderData, sum=rowSums(placeholderData))
placeholderData <- placeholderData[!(placeholderData$sum <= 0),]
And the data frame now looks like:
A B C D sum
1 0 0 0 1
0 0 0 1 1
0 0 0 1 1
1 0 1 0 2
1 0 1 0 2
0 1 0 0 1
0 1 0 0 1
1 0 0 1 2
My main problem comes when there are two or more 1's in a row. To try to solve this, I used the following code to identify the columns that have a sum of 2 or more:
placeholderData$Matches <- lapply(apply(placeholderData == 1, 1, which), names)
Which added the following column to the data frame:
A B C D sum Matches
1 0 0 0 1 A
0 0 0 1 1 D
0 0 0 1 1 D
1 0 1 0 2 c("A","C")
1 0 1 0 2 c("A","C")
0 1 0 0 1 B
0 1 0 0 1 B
1 0 0 1 2 c("A", "D")
I added the Matches column as an approach to solve the problem, but I'm not sure how I would do it without using a lot of logical operators (I don't know in advance which columns have matches). What I would like to do is to move the rows that have two or more 1's into a new column, to get a data frame like this:
A B C D AC AD sum Matches
1 0 0 0 0 0 1 A
0 0 0 1 0 0 1 D
0 0 0 1 0 0 1 D
0 0 0 0 1 0 1 c("A","C")
0 0 0 0 1 0 1 c("A","C")
0 1 0 0 0 0 1 B
0 1 0 0 0 0 1 B
0 0 0 0 0 1 1 c("A", "D")
Then, I would be able to use my code as normal (It works just fine when there are no repeated values in rows). I tried searching to find similar questions, but I'm not sure if I was even asking the right question. I was wondering if anyone could provide some help or some ideas that I could try.
Thank you very much!
This seems a lot like making dummy variables, so I would use the model.matrix function commonly used for dummy variables (one-hot encoding):
m = read.table(header = T, text = "A B C D
1 0 0 0
0 0 0 1
0 0 0 1
0 0 0 0
0 0 0 0
1 0 1 0
1 0 1 0
0 1 0 0
0 1 0 0
1 0 0 1")
m = m[rowSums(m) > 0, ]
d = factor(sapply(apply(m == 1, 1, which), function(x) paste(names(m)[x], collapse = "")))
result = data.frame(model.matrix(~ d + 0))
names(result) = levels(d)
# A AC AD B D
# 1 1 0 0 0 0
# 2 0 0 0 0 1
# 3 0 0 0 0 1
# 4 0 1 0 0 0
# 5 0 1 0 0 0
# 6 0 0 0 1 0
# 7 0 0 0 1 0
# 8 0 0 1 0 0
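One caveat with the apply(..., which) step above is that apply() simplifies its result to a matrix when every row happens to contain the same number of 1's, which would break the later paste step. A possible variant, sketched on a small made-up data frame (m, labels and onehot are illustrative names), builds the label inside apply() so the result is always one string per row:

```r
# Small made-up example with one row that has two 1's (A and C)
m <- data.frame(A = c(1, 0, 1, 0),
                B = c(0, 0, 0, 1),
                C = c(0, 0, 1, 0),
                D = c(0, 1, 0, 0))

# One label per row: the names of all columns that equal 1, pasted together
labels <- apply(m == 1, 1, function(r) paste(names(m)[r], collapse = ""))

# One indicator column per distinct label
onehot <- sapply(unique(labels), function(l) as.integer(labels == l))
onehot
```

Each row of onehot then contains exactly one 1, with combinations such as AC getting their own column.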

Permutation position of numbers in R

I'm looking for a function in R which can do the permutation. For example, I have a vector with five 1 and ten 0 like this:
> status=c(rep(1,5),rep(0,10))
> status
[1] 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0
Now I'd like to randomly permute the positions of these numbers while keeping the same number of 0s and 1s in the vector, to get a new series of numbers, for example something like this:
1 1 0 1 0 1 0 0 0 0 0 1 0 0 0
or
1 0 0 0 0 0 0 1 1 0 0 1 0 1 0
I found that the function sample() can help with sampling, but the number of 1s and 0s is not the same each time. Do you know how I can do this in R? Thanks in advance.
We can use sample
sample(status)
#[1] 1 0 0 1 0 0 1 0 0 0 0 1 0 1 0
sample(status)
#[1] 0 0 0 0 1 1 0 0 1 1 0 0 0 1 0
If we use sample to return the entire vector, it produces a permutation, so the frequency of each unique element stays the same:
colSums(replicate(5, sample(status)))
#[1] 5 5 5 5 5
i.e. we get five 1s in each sample, so the remaining ten values are 0s.
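The varying counts mentioned in the question may have come from sampling with replacement, although that is only a guess. A short sketch contrasting the two calls (the seed is arbitrary and only makes the run repeatable):

```r
status <- c(rep(1, 5), rep(0, 10))

set.seed(1)  # arbitrary seed, only to make the run repeatable

# Permutation: every element is used exactly once, so counts are preserved
perm <- sample(status)
table(perm)

# Sampling with replacement: counts of 0 and 1 can differ from run to run
boot <- sample(status, replace = TRUE)
table(boot)
```

The first table always shows ten 0s and five 1s; the second one need not.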

Finding structural holes constraint, efficiency, ego density and effective size in R

I am working on the adjacency matrix to find the results of the egonet package function. But when I run the command index.egonet, it gives me an error.
My adjacency matrix "p2":
p2
1 2 3 4 5 7 8 9 6
1 0 1 1 1 1 0 0 0 0
2 1 0 0 0 1 1 1 1 0
3 1 0 0 0 0 1 0 1 1
4 1 0 0 0 0 0 0 0 0
5 1 1 0 0 0 0 0 0 0
7 0 1 1 0 0 0 0 0 0
8 0 1 0 0 0 0 0 0 0
9 0 1 1 0 0 0 0 0 0
6 0 0 1 0 0 0 0 0 0
I apply this command on the adjacency for the desired results but it gives me an error
index.egonet(p2)
Error in dati[ego.name, y] : subscript out of bounds
Any alternative or solution to the current error would be highly appreciated.
The ego name must be "EGO" in capital letters, as far as I could understand from working with that function.
colnames(p2) <- rownames(p2) <- c("EGO", 2:ncol(p2))
index.egonet(p2)
This should work.

changing values in vector given a location and condition with R

I'm having trouble manipulating vectors in R. I have a vector that looks like this:
stack <- append(append(rep(0,8),c(1,0,0,0,0,1)),rep(0,6))
[1] 0 0 0 0 0 0 0 0 1 0 0 0 0 1 0 0 0 0 0 0
My overall goal is to manipulate the vector as follows:
*when there is a 1, make the next three values in the vector 1.
*change the original 1 to 0.
so ultimately the vector would look like:
[1] 0 0 0 0 0 0 0 0 0 1 1 1 0 0 1 1 1 0 0 0
the second part I can do by:
replace(stack,which(stack == 1),0)
but I can't figure out how to do the first one efficiently. any help would be greatly appreciated.
You can use filter here :
c(filter(stack,c(0,0,0,0,1,1,1),circular=TRUE))
## [1] 0 0 0 0 0 0 0 0 0 1 1 1 0 0 1 1 1 0 0 0
Here's a possible base R option
temp <- which(stack == 1)
stack[as.vector(mapply(`:`, temp, temp + 3))] <- c(0, rep(1, 3))
stack
# [1] 0 0 0 0 0 0 0 0 0 1 1 1 0 0 1 1 1 0 0 0
I would go with regular expressions
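# collapse the vector into a single string such as "00000000100001000000"
stack <- paste0(stack, collapse="")
# replace each 1 and the three characters after it with "0111"
stack <- gsub("1.{3}", "0111", stack)
# split back into single characters and convert to numbers
stack <- as.numeric(strsplit(stack, "")[[1]])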
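The index-based answers above assume that every 1 has at least three positions after it. A possible variant, sketched here with illustrative names (ones, idx, out), drops any index that would fall past the end of the vector:

```r
stack <- c(rep(0, 8), 1, 0, 0, 0, 0, 1, rep(0, 6))

ones <- which(stack == 1)

# Positions 1, 2 and 3 steps after each 1
idx <- as.vector(outer(1:3, ones, `+`))

# Drop positions that would run past the end of the vector
idx <- idx[idx <= length(stack)]

# Start from all zeros (this also clears the original 1s), then mark the followers
out <- numeric(length(stack))
out[idx] <- 1
out
```

For the example vector this reproduces the desired result; a 1 near the end would simply get fewer than three followers instead of lengthening the vector.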

How can I calculate an empirical CDF in R?

I'm reading a sparse table from a file which looks like:
1 0 7 0 0 1 0 0 0 5 0 0 0 0 2 0 0 0 0 1 0 0 0 1
1 0 0 1 0 0 0 3 0 0 0 0 1 0 0 0 1
0 0 0 1 0 0 0 2 0 0 0 0 1 0 0 0 1 0 1 0 0 1
1 0 0 1 0 3 0 0 0 0 1 0 0 0 1
0 0 0 1 0 0 0 2 0 0 0 0 1 0 0 0 1 0 1 0 0 1 1 2 1 0 1 0 1
Note row lengths are different.
Each row represents a single simulation. The value in the i-th column in each row says how many times value i-1 was observed in this simulation. For example, in the first simulation (first row), we got a single result with value '0' (first column), 7 results with value '2' (third column) etc.
I wish to create an average cumulative distribution function (CDF) for all the simulation results, so I could later use it to calculate an empirical p-value for true results.
To do this I can first sum up each column, but I need to take zeros for the undef columns.
How do I read such a table with different row lengths? How do I sum up the columns, replacing 'undef' values with 0? And finally, how do I create the CDF? (I can do this manually, but I guess there is some package that can do that.)
This will read the data in:
dat <- textConnection("1 0 7 0 0 1 0 0 0 5 0 0 0 0 2 0 0 0 0 1 0 0 0 1
1 0 0 1 0 0 0 3 0 0 0 0 1 0 0 0 1
0 0 0 1 0 0 0 2 0 0 0 0 1 0 0 0 1 0 1 0 0 1
1 0 0 1 0 3 0 0 0 0 1 0 0 0 1
0 0 0 1 0 0 0 2 0 0 0 0 1 0 0 0 1 0 1 0 0 1 1 2 1 0 1 0 1")
df <- data.frame(scan(dat, fill = TRUE, what = as.list(rep(1, 29))))
names(df) <- paste("Val", 1:29)
close(dat)
Resulting in:
> head(df)
Val 1 Val 2 Val 3 Val 4 Val 5 Val 6 Val 7 Val 8 Val 9 Val 10 Val 11 Val 12
1 1 0 7 0 0 1 0 0 0 5 0 0
2 1 0 0 1 0 0 0 3 0 0 0 0
3 0 0 0 1 0 0 0 2 0 0 0 0
4 1 0 0 1 0 3 0 0 0 0 1 0
5 0 0 0 1 0 0 0 2 0 0 0 0
....
If the data are in a file, provide the file name instead of dat. This code presumes that there are a maximum of 29 columns, as per the data you supplied. Alter the 29 to suit the real data.
We get the column sums using
df.csum <- colSums(df, na.rm = TRUE)
the ecdf() function generates the ECDF you wanted,
df.ecdf <- ecdf(df.csum)
and we can plot it using the plot() method:
plot(df.ecdf, verticals = TRUE)
You can use the ecdf() (in base R) or Ecdf() (from the Hmisc package) functions.
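Note that ecdf(df.csum) treats the column totals themselves as the sample. If the totals are instead read as frequencies of the values 0, 1, 2, ..., as the question describes, a cumulative distribution can be built directly from them. A rough sketch under that reading, using three shortened, made-up rows rather than the real file (all names here are illustrative):

```r
# Three shortened, made-up rows of unequal length
lines <- c("1 0 7 0 0 1", "1 0 0 1", "0 0 0 1 0 0 0 2")

# Parse each row and pad the shorter ones with zeros
rows   <- lapply(strsplit(lines, " +"), as.numeric)
width  <- max(lengths(rows))
padded <- t(sapply(rows, function(r) c(r, rep(0, width - length(r)))))

# Column i holds how often the value i - 1 was observed
counts <- colSums(padded)

# Cumulative proportion of observations with value <= i - 1
cdf <- cumsum(counts) / sum(counts)
names(cdf) <- seq_along(cdf) - 1
cdf
```

Whether this pooled CDF or a per-simulation average is the right quantity depends on how the empirical p-value is meant to be defined, which the question leaves open.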

Resources