I'm new to R and to programming in general. I have this data: screenshot
I have 12 'IDs' (research subjects), numbered 1-12. The 'types' column tells the 'type' of each ID. For example, the first 5 numbers of the 'types' column refer to the 'types' of first 5 IDs, i.e. 'types' of first 5 IDs are 3,3,2,1,1 respectively.
The 'pairs' column describes how IDs are paired together. For example, 6 is paired with 9; 4 is paired with 7; 1 is paired with 11 and so on.
So what I need help with is that I want to create three columns using this data.
first column: lists the ID (1-12)
second column: returns the ID of the pair (like 1 was paired with 11, so second column should say 11 for ID 1)
third column: tells the 'type' of the pair (the 'type' of 11 is 3, so the third column should display 3 for ID 1).
Here's a visualization of the desired output format: output format
Any help would be much appreciated.
Thanks in advance!
You can do this with some clever indexing. I entered the raw data as a vector for types, and a list of vectors for pairs:
# Enter the raw data
type <- c(3, 3, 2, 1, 1, 1, 2, 3, 1, 1, 3, 1)
pairs <- list(c(6, 9), c(4, 7), c(1, 11), c(3, 10), c(2, 12), c(5, 8))
From this, you can create the first two columns of the desired output by stacking all of the pairs once in their original order, and then again in the reverse order. (I reversed each pair by using lapply(pairs, rev), which applies the rev command to each pair in the list.)
# Create a 12 x 2 matrix of the pairs
pairs.mat <- do.call(rbind, c(pairs, lapply(pairs, rev)))
pairs.mat
# [,1] [,2]
# [1,] 6 9
# [2,] 4 7
# [3,] 1 11
# [4,] 3 10
# [5,] 2 12
# [6,] 5 8
# [7,] 9 6
# [8,] 7 4
# [9,] 11 1
# [10,] 10 3
# [11,] 12 2
# [12,] 8 5
For cleanliness of results, I converted this into a data.frame:
# Convert to data frame
colnames(pairs.mat) <- c("id", "match")
df <- as.data.frame(pairs.mat)
Finally, we can get the type_match column by taking type in the order of the match column from the data.frame we just created.
# Add in the type_match column
df$type_match <- type[df$match]
# Print results in order
df[order(df$id), ]
# id match type_match
# 3 1 11 3
# 5 2 12 1
# 4 3 10 1
# 2 4 7 2
# 6 5 8 3
# 1 6 9 1
# 8 7 4 1
# 12 8 5 1
# 7 9 6 1
# 10 10 3 2
# 9 11 1 3
# 11 12 2 3
And that should give you the desired output.
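The same lookup can also be sketched without stacking the pairs twice, by filling a partner vector directly. This is only an illustration and assumes every ID appears in exactly one pair; the names m, partner and res are mine, not from the question.

```r
type  <- c(3, 3, 2, 1, 1, 1, 2, 3, 1, 1, 3, 1)
pairs <- list(c(6, 9), c(4, 7), c(1, 11), c(3, 10), c(2, 12), c(5, 8))

# One row per pair: columns 1 and 2 hold the two partners
m <- do.call(rbind, pairs)

# partner[i] is the ID that i is paired with
# (assumption: each ID occurs in exactly one pair)
partner <- integer(length(type))
partner[m[, 1]] <- m[, 2]
partner[m[, 2]] <- m[, 1]

# Index type by the partner to get the partner's type
res <- data.frame(id = seq_along(type),
                  match = partner,
                  type_match = type[partner])
res
```

Because partner is already indexed by ID, no final order() step is needed here.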
I'm currently working with a large two-column matrix, and I want to check whether each row (each combination of the two columns) is also present in a loaded data frame, which has two columns as well.
Example,
(obj_design <- matrix(c(2,5,4,7,6,6,20,12,4,0), nrow = 5, ncol = 2))
[,1] [,2]
[1,] 2 6
[2,] 5 20
[3,] 4 12
[4,] 7 4
[5,] 6 0
(refined_grid <- data.frame(i=1:4, j=1:12))
i j
1 1 1
2 2 2
3 3 3
4 4 4
5 1 5
6 2 6
7 3 7
8 4 8
9 1 9
10 2 10
11 3 11
12 4 12
In the reproducible example, the rows (2,6) and (4,12) would be selected.
I'm wondering if there's a function that I can use to check the whole matrix, and see if a specific line is in the dataframe, and (if possible) write separately (new dataset) which elements of the matrix it is in.
Any assistance would be wonderful.
Here is an option with match
i1 <- match(do.call(paste, as.data.frame(obj_design)),
do.call(paste, refined_grid), nomatch = 0)
refined_grid[i1,]
This code will give you which rows of the matrix exist in the dataframe.
which(paste(obj_design[,1], obj_design[,2]) %in%
paste(refined_grid$i, refined_grid$j)
)
Then you can just assign it to a vector!
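Another way to get the shared rows as a new dataset is an inner join with merge(). A minimal sketch, assuming the matrix columns correspond to i and j in that order; obj_df and common are names I made up for illustration.

```r
obj_design   <- matrix(c(2, 5, 4, 7, 6, 6, 20, 12, 4, 0), nrow = 5, ncol = 2)
refined_grid <- data.frame(i = 1:4, j = 1:12)

# Give the matrix the same column names as the data frame,
# so merge() joins on both columns by default
obj_df <- setNames(as.data.frame(obj_design), c("i", "j"))

# Inner join: keeps only the (i, j) combinations present in both objects
common <- merge(obj_df, refined_grid)
common
```

For the example data this should leave only the (2, 6) and (4, 12) rows. Unlike the which() approach, it returns the matching rows themselves rather than their positions in the matrix.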
Let's say I have 2 separate data frames, one with 10 rows of data and one with 5 rows of data, and I want to replace the last 5 rows of a specific column of data frame 1 with the values in a specific column of data frame 2. How would I go about doing this?
For simplicity, let's say there are two dataframes of just 1 column in this example
vect1<- c(1:10)
vect2<- c(11:15)
as.data.frame(vect1)
as.data.frame(vect2)
How would I go about replacing the last 5 values in vector 1 with the 5 values in vector 2? So the output would be 1, 2, 3, 4, 5, 11, 12, 13, 14, 15. Any help is greatly appreciated!
Does it work for you?
> vect1_df <- data.frame(vect1)
> vect2_df <- data.frame(vect2)
> vect1_df$vect1[6:10] <- vect2_df$vect2
> vect1_df
vect1
1 1
2 2
3 3
4 4
5 5
6 11
7 12
8 13
9 14
10 15
vect1<- c(1:10)
vect2<- c(11:15)
vect1[6:10] = vect2
This replaces the last 5 elements of vect1 with the values of vect2:
> vect1
[1] 1 2 3 4 5 11 12 13 14 15
We can use replace, selecting the last five positions with tail, to replace those elements with the second vector
vec1new <- replace(vect1, tail(seq_along(vect1), 5), vect2)
vec1new
#[1] 1 2 3 4 5 11 12 13 14 15
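If the lengths are not known in advance, the positions can be derived from the vectors instead of hard-coding 6:10 or 5. A small sketch of that idea, assuming vect2 is no longer than vect1; n and idx are illustrative names.

```r
vect1 <- 1:10
vect2 <- 11:15

# Number of trailing elements to overwrite
n <- length(vect2)

# Positions of the last n elements of vect1
idx <- tail(seq_along(vect1), n)

vect1[idx] <- vect2
vect1
#> [1]  1  2  3  4  5 11 12 13 14 15
```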
I would like to list all unique combinations of vectors of length 3 where each element of the vector can range between 1 to 9.
First I list all such combinations:
df <- expand.grid(1:9, 1:9, 1:9)
Then I would like to remove the rows that contain repetitions.
For example:
1 1 9
9 1 1
1 9 1
should only be included once.
In other words if two lines have the same numbers and the same number of each number then it should only be included once.
Note that
8 8 8 or
9 9 9 is fine as long as it only appears once.
Based on your approach and the idea to remove repetitions:
df <- expand.grid(1:2, 1:2, 1:2)
# Var1 Var2 Var3
# 1 1 1 1
# 2 2 1 1
# 3 1 2 1
# 4 2 2 1
# 5 1 1 2
# 6 2 1 2
# 7 1 2 2
# 8 2 2 2
df2 <- unique(t(apply(df, 1, sort))) #class matrix
# [,1] [,2] [,3]
# [1,] 1 1 1
# [2,] 1 1 2
# [3,] 1 2 2
# [4,] 2 2 2
df2 <- as.data.frame(df2) #class data.frame
There are probably more efficient methods, but if I understand you correctly, that is the result you want.
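As a sanity check, the sort-then-unique idea can be run on the full 1:9 case from the question and compared with the number of size-3 multisets that can be drawn from 9 values, which is choose(9 + 3 - 1, 3). This is just a sketch; res is an illustrative name.

```r
df  <- expand.grid(1:9, 1:9, 1:9)

# Sort each row so permutations of the same values become identical, then deduplicate
res <- unique(t(apply(df, 1, sort)))

nrow(res)
#> [1] 165
choose(9 + 3 - 1, 3)  # expected number of multisets of size 3 from 9 values
#> [1] 165
```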
Maybe something like this (your data frame is not large, so it is not painful!):
len <- apply(df, 1, function(x) length(unique(x)))
key <- apply(df, 1, function(x) paste(sort(x), collapse = "-"))
res <- rbind(df[len == 1, ], df[len != 1 & !duplicated(key), ])
Here is what is done:
Get the number of unique elements per row (len) and a key per row built from its sorted values (key).
The rbind comprises two parts:
First argument of rbind: rows with a single distinct value (e.g. 1 1 1, 7 7 7, etc.) can only occur once, so they go straight into the final result res.
Second argument of rbind: for rows with 2 or 3 distinct values (e.g. 1 1 9, 5 8 7, etc.), we keep only the first row for each key, because 3 3 5, 3 5 3 and 5 3 3 all share the key 3-3-5. The row product is not a safe key here, since different rows such as 1 1 4 and 1 2 2 have the same product.
I have 6 digits (1, 2, 3, 4, 5, 6), and I need to create all possible combinations (i.e. 6*5*4*3*2*1 = 720 combinations) in which no number can be used twice and 0 is not allowed. I would like to obtain combinations like: 123456, 246135, 314256, etc.
Is there a way to create them with Matlab or R? Thank you.
In Matlab you can use
y = perms(1:6);
This gives a numerical 720×6 array y, where each row is a permutation:
y =
6 5 4 3 2 1
6 5 4 3 1 2
6 5 4 2 3 1
6 5 4 2 1 3
6 5 4 1 2 3
···
If you want the result as a char array:
y = char(perms(1:6)+'0');
which produces
y =
654321
654312
654231
654213
654123
···
In R:
library(combinat)
p <- permn(1:6)
gives you a list; do.call(rbind, p) or matrix(unlist(p), ncol=6, byrow=TRUE) will give a numeric array; sapply(p,paste,collapse="") gives a vector of strings.
Here's a base R 'solution':
p <- unique(t(replicate(100000, sample(6,6), simplify="vector")))
nrow(p)
#> [1] 720
head(p)
#> [,1] [,2] [,3] [,4] [,5] [,6]
#> [1,] 3 5 4 2 1 6
#> [2,] 6 3 5 4 1 2
#> [3,] 5 1 6 2 3 4
#> [4,] 6 5 3 2 4 1
#> [5,] 5 2 3 6 4 1
#> [6,] 1 4 2 5 6 3
It's a hack of course, and this potentially only applies to the example given, but sometimes it's useful to do things in silly ways... this takes an excessive number of samples (without replacement) of the vector 1:6, then removes any duplicates. It does indeed produce the unique 720 results, but they're not sorted.
A base R approach is
x <- do.call(expand.grid, rep(list(1:6), 6))
x <- x[apply(x, MARGIN = 1, function(x) length(unique(x)) == 6), ]
which creates a data frame with 6^6 = 46656 rows, then retains only the rows that contain all 6 numbers.
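For a deterministic base R alternative that avoids building all 6^6 rows, a small recursive function can generate the permutations directly. This is a sketch only: perms here is a hypothetical helper defined below, not a function from base R or any package.

```r
# Hypothetical helper: all permutations of v, one per row
perms <- function(v) {
  if (length(v) <= 1) return(matrix(v, nrow = 1))
  # Fix each element in first position, then permute the remaining ones
  out <- lapply(seq_along(v), function(i) cbind(v[i], perms(v[-i])))
  do.call(rbind, out)
}

p <- perms(1:6)
nrow(p)
#> [1] 720

# Collapse each row into a string such as "123456"
s <- apply(p, 1, paste, collapse = "")
head(s, 3)
```

The recursion grows factorially, so it is only practical for short vectors like this one.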
df <- data.frame(DAY = character(), ID = character())
I'm running a for loop over the days (using DAYS[i] on each iteration), getting the IDs for each day and storing them in a data frame:
df <- rbind(df, data.frame(ID = IDs))
Inside the loop, I want to add the current day (DAYS[i]) in a second column on every row added for that day.
How do I do that?
As @Pascal says, this isn't the best way to create a data frame in R. R is a vectorised language, so generally you don't need for loops.
I'm assuming each ID is unique, so you can create a vector of IDs from 1 to 10:
ID <- 1:10
Then, you need a vector for your DAYs which can be the same length as your IDs, or can be recycled (i.e. if you only have a certain number of days that are repeated in the same order you can have a smaller vector that's reused). Use c() to create a vector with more than one value:
DAY <- c(1, 2, 9, 4, 4)
df <- data.frame(ID, DAY)
df
# ID DAY
# 1 1 1
# 2 2 2
# 3 3 9
# 4 4 4
# 5 5 4
# 6 6 1
# 7 7 2
# 8 8 9
# 9 9 4
# 10 10 4
Or with a DAY vector as long as ID, here drawn at random (so values may repeat and your output will differ):
DAY <- sample(1:100, 10, replace = TRUE)
df <- data.frame(ID, DAY)
df
# ID DAY
# 1 1 61
# 2 2 30
# 3 3 32
# 4 4 97
# 5 5 32
# 6 6 74
# 7 7 97
# 8 8 73
# 9 9 16
# 10 10 98
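If the IDs really do have to be fetched one day at a time, a common pattern is to build one small data frame per day and bind them once at the end, rather than growing df inside the loop. A minimal sketch: get_ids is a made-up stand-in for however the IDs are obtained, and the DAYS values are invented for illustration.

```r
# Hypothetical stand-in for the real per-day ID lookup
get_ids <- function(day) paste0(day, "_id", 1:2)

DAYS <- c("Mon", "Tue", "Wed")

# One data frame per day; the single DAY value is recycled across that day's IDs
pieces <- lapply(DAYS, function(d) data.frame(DAY = d, ID = get_ids(d)))

# Bind all pieces once at the end
df <- do.call(rbind, pieces)
df
```

Each row of df then carries the day its ID came from, which is the second column the question asks for.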