I have data like this:
x = c(1,2,3)
prob = c(0.13,0.13,0.74)
# Total sample size
n = 70
result = rep(x, round(n * prob))
Final<-replicate(1, sample(result))
I want to make a matrix of these 70 values (7 x 10) in which (1, 2, 3) occur with proportions of about (0.14, 0.14, 0.72). Within every run of seven values, 1 and 2 should each appear once and 3 should appear five times, like this:
3 3 3 1 2 3 3
3 3 3 3 2 1 3
3 2 1 3 3 3 3
3 3 3 1 3 3 2
2 1 3 3 3 3 3
2 1 3 3 3 3 3
3 3 3 2 1 3 3
3 3 3 3 2 3 1
3 2 1 3 3 3 3
So I will get exactly one 1 and one 2 in each row. Could you please help me write the code?
One way is to populate a matrix with 3's, then assign 1 and 2 randomly to a column position for each row.
set.seed(1)
m <- matrix(rep(3, 7*10), ncol = 7)
pos <- replicate(10, sample(1:7, 2))
for (i in 1:nrow(m)) m[i, pos[,i]] <- 1:2
m
#> [,1] [,2] [,3] [,4] [,5] [,6] [,7]
#> [1,] 1 3 3 2 3 3 3
#> [2,] 2 3 3 3 3 3 1
#> [3,] 3 1 3 3 2 3 3
#> [4,] 3 3 2 3 3 3 1
#> [5,] 3 2 3 3 3 1 3
#> [6,] 3 3 1 3 3 3 2
#> [7,] 1 3 3 3 2 3 3
#> [8,] 3 2 3 3 1 3 3
#> [9,] 3 3 3 3 3 1 2
#> [10,] 2 1 3 3 3 3 3
Created on 2022-04-15 by the reprex package (v2.0.1)
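As a possible alternative to the loop above, each row can be built directly as a shuffle of one 1, one 2 and five 3s. This is only a sketch; the names row_values and m2 are made up for illustration, and the checks at the end simply confirm the per-row counts the question asks for.

```r
set.seed(1)

# Each row is an independent shuffle of one 1, one 2 and five 3s
row_values <- c(1, 2, rep(3, 5))
m2 <- t(replicate(10, sample(row_values)))  # 10 rows, 7 columns

# Check the constraints: exactly one 1, one 2 and five 3s per row
stopifnot(all(rowSums(m2 == 1) == 1),
          all(rowSums(m2 == 2) == 1),
          all(rowSums(m2 == 3) == 5))

# Overall proportions are 1/7, 1/7 and 5/7 by construction
prop.table(table(m2))
```

Because every row contains the same counts, the overall proportions are fixed at 1/7, 1/7 and 5/7 (roughly 0.14, 0.14, 0.72) regardless of the seed.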
I have a dataset of adolescents over 5 waves. In each wave they nominate up to 3 friends. I want to add variables that indicate whether each friend was nominated in the previous wave of data collection.
My data look like this sample:
student_id wave friend1_id friend2_id friend3_id
1 1 3 NA NA
2 1 5 2 3
3 1 2 4 5
4 1 1 6 NA
5 1 1 NA 6
6 1 5 NA 2
7 1 8 NA NA
8 1 NA 9 NA
9 1 8 7 NA
10 1 7 9 NA
1 2 4 NA NA
2 2 5 3 NA
3 2 NA NA 5
4 2 NA NA NA
5 2 6 NA NA
6 2 5 NA NA
7 2 10 1 3
8 2 9 NA NA
9 2 8 6 7
10 2 7 4 NA
So the wave 2 "consistency" variables should look like this (0 = not nominated in the previous wave, 1 = nominated in the previous wave, NA = no friend nominated in that slot in wave 2):
student_id wave friend1_consit friend2_consit friend3_consit
1 2 0 NA NA
2 2 1 1 NA
3 2 NA NA 1
4 2 NA NA NA
5 2 1 NA NA
6 2 1 NA NA
7 2 0 0 0
8 2 1 NA NA
9 2 1 0 1
10 2 1 0 NA
This base R answer returns a matrix with one row per student_id and one column per wave; each cell shows whether that student was nominated by anyone in that wave:
votes_bywave <- split(df1[,3:5],df1$wave)
votes_bywave <- lapply(votes_bywave, function(x) unique(unlist(x)))
votes_bywave <- sapply(votes_bywave, function(x) unique(df1$student_id) %in% x )
> votes_bywave
1 2
[1,] TRUE TRUE
[2,] TRUE FALSE
[3,] TRUE TRUE
[4,] TRUE TRUE
[5,] TRUE TRUE
[6,] TRUE TRUE
[7,] TRUE TRUE
[8,] TRUE TRUE
[9,] TRUE TRUE
[10,] FALSE TRUE
Or, if you prefer to have the actual IDs listed, add this line at the end (cbind converts the logical values to 1/0):
cbind(student_id = unique(df1$student_id), votes_bywave)
student_id 1 2
[1,] 1 1 1
[2,] 2 1 0
[3,] 3 1 1
[4,] 4 1 1
[5,] 5 1 1
[6,] 6 1 1
[7,] 7 1 1
[8,] 8 1 1
[9,] 9 1 1
[10,] 10 0 1
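The matrix above shows who was nominated in each wave, which is not quite the per-friend indicator the question describes. A minimal base R sketch of that indicator follows, assuming waves are consecutive integers and each student has at most one row per wave. The data frame df1 is rebuilt from the sample in the question; friend_cols, consistency and result are names chosen for illustration.

```r
# Sample data from the question (waves 1 and 2)
df1 <- data.frame(
  student_id = rep(1:10, 2),
  wave       = rep(1:2, each = 10),
  friend1_id = c(3, 5, 2, 1, 1, 5, 8, NA, 8, 7,
                 4, 5, NA, NA, 6, 5, 10, 9, 8, 7),
  friend2_id = c(NA, 2, 4, 6, NA, NA, NA, 9, 7, 9,
                 NA, 3, NA, NA, NA, NA, 1, NA, 6, 4),
  friend3_id = c(NA, 3, 5, NA, 6, 2, NA, NA, NA, NA,
                 NA, NA, 5, NA, NA, NA, 3, NA, 7, NA)
)
friend_cols <- c("friend1_id", "friend2_id", "friend3_id")

consistency <- t(sapply(seq_len(nrow(df1)), function(i) {
  current  <- unlist(df1[i, friend_cols])
  # Row holding the same student's nominations in the previous wave
  prev_row <- df1$student_id == df1$student_id[i] &
              df1$wave == df1$wave[i] - 1
  if (!any(prev_row)) return(rep(NA_integer_, length(friend_cols)))
  previous <- unlist(df1[prev_row, friend_cols])
  # 1 if nominated in the previous wave, 0 if not, NA if the slot is empty
  ifelse(is.na(current), NA_integer_, as.integer(current %in% previous))
}))
colnames(consistency) <- c("friend1_consit", "friend2_consit", "friend3_consit")

result <- cbind(df1[, c("student_id", "wave")], consistency)
result[result$wave == 2, ]
```

Rows for the first wave come out as NA because there is no earlier wave to compare against. For larger data a merge on student_id and wave would likely be faster than the row-by-row loop.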
Suppose I have a column vector [1 1 1 2 2 2 3 3 3] and I want to generate all the different column vectors that can be obtained by swapping exactly two positions. For example, one such vector would be
[1 1 3 2 2 2 1 3 3].
Try this. It gives you a matrix in which each row is a unique vector with two elements swapped relative to the original vector. There are 28 such unique vectors, including the original one (which appears because swapping two equal values changes nothing):
v <- c(1,1,1,2,2,2,3,3,3)
unique(t(apply(t(combn(1:length(v), 2)), 1, function(x) {v[x] <- v[rev(x)]; v})))
with output:
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9]
[1,] 1 1 1 2 2 2 3 3 3 # original one
[2,] 2 1 1 1 2 2 3 3 3 # swap 1st & 4th elements
[3,] 2 1 1 2 1 2 3 3 3 # swap 1st & 5th
[4,] 2 1 1 2 2 1 3 3 3 # ...
[5,] 3 1 1 2 2 2 1 3 3
[6,] 3 1 1 2 2 2 3 1 3
[7,] 3 1 1 2 2 2 3 3 1
[8,] 1 2 1 1 2 2 3 3 3
[9,] 1 2 1 2 1 2 3 3 3
[10,] 1 2 1 2 2 1 3 3 3
[11,] 1 3 1 2 2 2 1 3 3
[12,] 1 3 1 2 2 2 3 1 3
[13,] 1 3 1 2 2 2 3 3 1
[14,] 1 1 2 1 2 2 3 3 3
[15,] 1 1 2 2 1 2 3 3 3
[16,] 1 1 2 2 2 1 3 3 3
[17,] 1 1 3 2 2 2 1 3 3
[18,] 1 1 3 2 2 2 3 1 3
[19,] 1 1 3 2 2 2 3 3 1
[20,] 1 1 1 3 2 2 2 3 3
[21,] 1 1 1 3 2 2 3 2 3
[22,] 1 1 1 3 2 2 3 3 2
[23,] 1 1 1 2 3 2 2 3 3
[24,] 1 1 1 2 3 2 3 2 3
[25,] 1 1 1 2 3 2 3 3 2
[26,] 1 1 1 2 2 3 2 3 3
[27,] 1 1 1 2 2 3 3 2 3
[28,] 1 1 1 2 2 3 3 3 2 # swap 6th & 9th
First, we can use the utils function combn to generate all possible pairs of positions to swap. Here, I am assuming you don't want to swap identical values (e.g. 1 and 1), so I check each pair to make sure the two values differ:
startVec <- c(1, 1, 1, 2, 2, 2, 3, 3, 3) # example vector from the question
allCombo <-
combn(seq_along(startVec), 2)
toKeep <- apply(allCombo, 2, function(x) {
startVec[x[1]] != startVec[x[2]]
})
Then, apply along those that you are keeping, and swap the positions.
outVecs <- apply(allCombo[ , toKeep], 2, function(x){
temp <- startVec
temp[x] <- startVec[rev(x)]
return(temp)
})
This returns a matrix with one swapped vector per column, but you can convert it to a list, which may be easier to manage, like so:
outVecsInList <-
as.list(as.data.frame(outVecs))
head(outVecsInList) shows:
$V1
[1] 2 1 1 1 2 2 3 3 3
$V2
[1] 2 1 1 2 1 2 3 3 3
$V3
[1] 2 1 1 2 2 1 3 3 3
$V4
[1] 3 1 1 2 2 2 1 3 3
$V5
[1] 3 1 1 2 2 2 3 1 3
$V6
[1] 3 1 1 2 2 2 3 3 1
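As a quick sanity check on the approach above, the pair filter can also be written without apply, and the number of results can be counted. This is a sketch under the assumption that the starting vector is the one from the question; pairs, differs and swapped are illustrative names.

```r
startVec <- c(1, 1, 1, 2, 2, 2, 3, 3, 3)

# All 36 position pairs, one per column
pairs <- combn(seq_along(startVec), 2)

# Vectorised version of the "different values" filter
differs <- startVec[pairs[1, ]] != startVec[pairs[2, ]]
sum(differs)  # 27 pairs hold different values

# Swap each remaining pair; one result per column
swapped <- apply(pairs[, differs], 2, function(p) {
  v <- startVec
  v[p] <- v[rev(p)]
  v
})

# Every swap of two different values gives a distinct vector
anyDuplicated(t(swapped))  # 0
```

The 27 results here plus the unchanged original account for the 28 unique rows reported by the other answer.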
My sample data looks like this
DF
n a b c d
1 NA NA NA NA
2 1 2 3 4
3 5 6 7 8
4 9 NA 11 12
5 NA NA NA NA
6 4 5 6 NA
7 8 9 10 11
8 12 13 15 16
9 NA NA NA NA
I need to subtract row 2 from row 3 and from row 4.
Similarly, I need to subtract row 6 from row 7 and from row 8.
My real data is huge. Is there a way of doing this automatically? It seems a for loop could do it, but as I am an R beginner my attempts were not successful.
Thank you for any help and tips.
UPDATE
I want to achieve something like this
DF2
rowN1 <- DF[3, -1] - DF[2, -1] # -1 drops the row-number column n
rowN2 <- DF[4, -1] - DF[2, -1]
rowN3 <- DF[7, -1] - DF[6, -1] # there is an NA in row 6, so the result should also contain NA
rowN4 <- DF[8, -1] - DF[6, -1]
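Here's one idea, using a random example matrix in which every fourth row, starting at row 2, is the reference row:
set.seed(1)
(m <- matrix(sample(c(1:9, NA), 60, TRUE), ncol=5))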
# [,1] [,2] [,3] [,4] [,5]
# [1,] 3 7 3 8 8
# [2,] 4 4 4 2 7
# [3,] 6 8 1 8 5
# [4,] NA 5 4 5 9
# [5,] 3 8 9 9 5
# [6,] 9 NA 4 7 3
# [7,] NA 4 5 8 1
# [8,] 7 8 6 6 1
# [9,] 7 NA 5 6 4
# [10,] 1 3 2 8 6
# [11,] 3 7 9 1 7
# [12,] 2 2 7 5 5
idx <- seq(2, nrow(m)-2, 4)
do.call(rbind, lapply(idx, function(x) {
rbind(m[x+1, ]-m[x, ], m[x+2, ]-m[x, ])
}))
# [1,] 2 4 -3 6 -2
# [2,] NA 1 0 3 2
# [3,] NA NA 1 1 -2
# [4,] -2 NA 2 -1 -2
# [5,] 2 4 7 -7 1
# [6,] 1 -1 5 -3 -1
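The same idea can be applied to the DF from the question without hard-coding the row spacing. The sketch below assumes that a row consisting entirely of NA separates blocks and that the first non-NA row of each block is the reference row. That reading of the sample is an assumption, and vals, is_sep, block_id and diffs are illustrative names.

```r
# Sample data from the question
DF <- data.frame(
  n = 1:9,
  a = c(NA, 1, 5, 9, NA, 4, 8, 12, NA),
  b = c(NA, 2, 6, NA, NA, 5, 9, 13, NA),
  c = c(NA, 3, 7, 11, NA, 6, 10, 15, NA),
  d = c(NA, 4, 8, 12, NA, NA, 11, 16, NA)
)
vals <- as.matrix(DF[, c("a", "b", "c", "d")])

# Assumption: an all-NA row starts a new block
is_sep   <- rowSums(!is.na(vals)) == 0
block_id <- cumsum(is_sep)

# Within each block, subtract the first row from the remaining rows
blocks <- split(which(!is_sep), block_id[!is_sep])
diffs <- do.call(rbind, lapply(blocks, function(rows) {
  ref <- rows[1]
  sweep(vals[rows[-1], , drop = FALSE], 2, vals[ref, ])
}))
diffs
# Rows 3-2, 4-2, 7-6, 8-6:
#  4  4  4  4
#  8 NA  8  8
#  4  4  4 NA
#  8  8  9 NA
```

NA values propagate as requested: column d is NA for both differences in the second block because the reference row 6 has an NA there.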
I have a list of vectors that I would like to export as a single .csv file containing all vectors as named columns.
For instance, suppose I have four vectors of roughly ten items each, from hypothetical cluster analyses of four models with varying numbers of data points, created by
library(vegan) #provides vegdist
library(cluster) #provides agnes
veglist=list.files(pattern="TXT") #create list of files
veg=lapply(veglist,read.csv,header=TRUE,row.names=1) #read list of files
vegbc=lapply(veg,vegdist,method="bray") #create dissimilarity matrix from each file
av=lapply(vegbc,agnes,method="average") #do clustering analysis with each dissimilarity matrix
av2=lapply(av,cutree,k=2) #cut the hierarchical analysis at 2 groups level
When I type fix(av2) I see:
list(c(1,1,1,1,1,1,2,2,2,2,2,2),c(1,1,1,1,1,2,2,2,2,2),c(1,1,1,2,1,2,2,2,2,2),c(1,1,1,1,2,1,2,2,2,2,2,2,2))
If I type in av2 I see
[[1]]
[1] 1 1 1 1 1 1 2 2 2 2 2 2
[[2]]
[1] 1 1 1 1 1 2 2 2 2 2
[[3]]
[1] 1 1 1 2 1 2 2 2 2 2
[[4]]
[1] 1 1 1 1 2 1 2 2 2 2 2 2 2
I have tried following the example in "How to read every .csv file in R and export them into single large file", but it did not work.
I think the underlying problem is that my vectors are not the same size. What I want to do is output the vectors into a single table that looks something like:
a b c d
1 1 1 1
1 1 1 1
1 1 1 1
1 1 2 1
1 1 1 1
1 2 2 2
2 2 2 2
2 2 2 2
2 2 2 2
2 2
2 2
2
Where a,b,c,d are in place of my actual names. Preferably it would look prettier than this, but I could work with it.
I apologize for the very long question, but I was trying to provide enough of an example to go by. I am also sorry if this has a very easy answer, but I am not yet good with R. Thanks in advance.
Here is one way to do it:
l <- list(c(1,1,1,1,1,1,2,2,2,2,2,2),c(1,1,1,1,1,2,2,2,2,2),c(1,1,1,2,1,2,2,2,2,2),c(1,1,1,1,2,1,2,2,2,2,2,2,2))
maxlength <- max(sapply(l, length))
df <- data.frame(sapply(l, function(x) c(x, rep(NA, (maxlength - length(x))))))
df
X1 X2 X3 X4
1 1 1 1 1
2 1 1 1 1
3 1 1 1 1
4 1 1 2 1
5 1 1 1 2
6 1 2 2 1
7 2 2 2 2
8 2 2 2 2
9 2 2 2 2
10 2 2 2 2
11 2 NA NA 2
12 2 NA NA 2
13 NA NA NA 2
You would first need to extend each vector to the length of the longest vector; then you could cbind them together so that write.csv writes them out as columns:
> maxlength <- max(sapply(l, length))
> mat <- cbind(sapply(l, `length<-`, maxlength))
> mat
[,1] [,2] [,3] [,4]
[1,] 1 1 1 1
[2,] 1 1 1 1
[3,] 1 1 1 1
[4,] 1 1 2 1
[5,] 1 1 1 2
[6,] 1 2 2 1
[7,] 2 2 2 2
[8,] 2 2 2 2
[9,] 2 2 2 2
[10,] 2 2 2 2
[11,] 2 NA NA 2
[12,] 2 NA NA 2
[13,] NA NA NA 2
> write.csv(mat, file="mycsv.csv")
This looks as follows in a text editor (and would be imported into Excel properly):
"","V1","V2","V3","V4"
"1",1,1,1,1
"2",1,1,1,1
"3",1,1,1,1
"4",1,1,2,1
"5",1,1,1,2
"6",1,2,2,1
"7",2,2,2,2
"8",2,2,2,2
"9",2,2,2,2
"10",2,2,2,2
"11",2,NA,NA,2
"12",2,NA,NA,2
"13",NA,NA,NA,2
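This can also be done with stri_list2matrix from the stringi package (l is the list of vectors from above). It returns a character matrix, hence the mode conversion: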
library(stringi)
m1 <- stri_list2matrix(l)
mode(m1) <- "integer"
m1
# [,1] [,2] [,3] [,4]
# [1,] 1 1 1 1
# [2,] 1 1 1 1
# [3,] 1 1 1 1
# [4,] 1 1 2 1
# [5,] 1 1 1 2
# [6,] 1 2 2 1
# [7,] 2 2 2 2
# [8,] 2 2 2 2
# [9,] 2 2 2 2
#[10,] 2 2 2 2
#[11,] 2 NA NA 2
#[12,] 2 NA NA 2
#[13,] NA NA NA 2
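To get the named columns and blank cells shown in the question's desired table, the padding approach can be combined with the names, row.names and na arguments. The following is a sketch only: the names a to d are the question's placeholders, and the output path in tempdir() is a stand-in for a real file name.

```r
l <- list(c(1,1,1,1,1,1,2,2,2,2,2,2),
          c(1,1,1,1,1,2,2,2,2,2),
          c(1,1,1,2,1,2,2,2,2,2),
          c(1,1,1,1,2,1,2,2,2,2,2,2,2))
names(l) <- c("a", "b", "c", "d")  # placeholder column names

# Pad every vector with NA up to the length of the longest one
maxlength <- max(lengths(l))
df <- as.data.frame(lapply(l, `length<-`, maxlength))

# Write without row names and with empty cells instead of "NA"
out_file <- file.path(tempdir(), "clusters.csv")  # hypothetical path
write.csv(df, out_file, row.names = FALSE, na = "")

readLines(out_file, n = 3)
```

With na = "" the shorter columns end in empty cells, so the file opens in a spreadsheet as a ragged table rather than one filled with NA.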