How to separate a vector every fifth element? - r

I have a vector:
a<-as.vector(diag(5))
How can I split this vector every 5 numbers and build a data.frame in which each group of five becomes one row?
My idea is to get this:
https://imgur.com/X7JLMYH
That is, one column, with each row of that column holding one row of diag(5). Each row will identify a different object, so the order shown in the image must be preserved.
The length of each entry must equal the number of values in each row.

We can use matrix (as the length is already a multiple of 5) and then wrap with as.data.frame
as.data.frame(matrix(a, ncol = 5, byrow = TRUE))
If we want a single column of strings, we can paste each row together to create that column
data.frame(col1 = do.call(paste, as.data.frame(matrix(a, ncol = 5, byrow = TRUE))))
Or place it as a list column
data.frame(col1 = I(asplit(matrix(a, ncol = 5, byrow = TRUE), 1)))
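The matrix approach relies on the length being a multiple of 5; otherwise matrix() recycles values and warns. As a minimal sketch for that other case (the vector `b` and chunk size `k` are hypothetical names, not part of the question), split() with a grouping index keeps every chunk, including a shorter last one:

```r
# Hypothetical input whose length (12) is not a multiple of 5
b <- 1:12
k <- 5

# ceiling(seq_along(b) / k) labels the elements 1,1,1,1,1,2,2,2,2,2,3,3
chunks <- split(b, ceiling(seq_along(b) / k))

# Store the chunks in a list column, one chunk per row
res <- data.frame(id = seq_along(chunks))
res$col1 <- unname(chunks)
```

Each row of `res$col1` then holds one chunk, in the original order.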

Related

Comparing each row of one dataframe with a row in another dataframe using R

I'm relatively new to R and I have looked for an answer for my problem but didn't find one. I want to compare two dataframes.
library(dplyr)
library(gtools)
v1 <- LETTERS[1:10]
combinations_from_4_letters <- (as.data.frame(combinations(n = 10, r = 4, v = v1),
stringsAsFactors = FALSE))
combinations_from_4_letters$group <- rep(1:15, each = 14)
combinations_from_2_letters <- (as.data.frame(combinations(n = 10, r = 2, v = v1),
stringsAsFactors = FALSE))
Dataframe 'combinations_from_4_letters' contains all combinations that can be made from 10 letters without repetitions and permutations. The combinations are binned into groups from 1-15. I want to find out how often pairs of the 10 letters (saved in dataframe 'combinations_from_2_letters') are found in each group (basically a frequency table). I started doing a complicated loop looping through both dataframes but I think there must be a more 'R' solution to it, similar to comparing a dataframe and a vector like:
combinations_from_4_letters %in% combinations_from_2_letters[i,]
Thank you in advance for your help!
I recommend an approach like the following:
# adding dummy column for a complete cross-join
combinations_from_4_letters = combinations_from_4_letters %>%
  mutate(ones = 1)
combinations_from_2_letters = combinations_from_2_letters %>%
  mutate(ones = 1)
joined = combinations_from_2_letters %>%
  inner_join(combinations_from_4_letters, by = "ones") %>%
  # comparison goes here
  mutate(within = ifelse(comb2 %in% comb4, 1, 0)) %>%
  group_by(comb2) %>%
  summarise(freq = sum(within))
You'll probably need to modify to ensure it matches the exact column names and your comparison condition.
Key ideas:
add a filler column so we get a complete cross-join
mutate a new indicator column for whether the two-letter pair is contained in the four-letter combination
sum the indicators for each two-letter pair
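The key ideas above can be sketched in base R as follows. This is only one possible reading of the comparison, not the answer's exact code: it assumes a pair counts when both of its letters occur in the four-letter combination, it uses combn() instead of gtools, and the column names `p1`, `p2`, `f1` to `f4` and `contains_pair` are hypothetical. merge() with `by = NULL` returns the Cartesian product, so no filler column is needed here.

```r
v1 <- LETTERS[1:10]

# All 4-letter combinations (210 rows), binned into 15 groups of 14
fours <- as.data.frame(t(combn(v1, 4)), stringsAsFactors = FALSE)
names(fours) <- paste0("f", 1:4)
fours$group <- rep(1:15, each = 14)

# All 2-letter combinations (45 rows)
pairs <- as.data.frame(t(combn(v1, 2)), stringsAsFactors = FALSE)
names(pairs) <- c("p1", "p2")

# Cross-join: every pair against every 4-letter combination
crossed <- merge(pairs, fours, by = NULL)

# Indicator: both letters of the pair occur among the four letters
four_cols <- as.matrix(crossed[, paste0("f", 1:4)])
crossed$contains_pair <- rowSums(four_cols == crossed$p1) > 0 &
  rowSums(four_cols == crossed$p2) > 0

# Sum the indicator per pair and group
freq <- aggregate(contains_pair ~ p1 + p2 + group, data = crossed, FUN = sum)
names(freq)[4] <- "freq"
```

`freq` has one row per pair and group; reshaping it to a wide table (for example with xtabs()) is left out of the sketch.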

R Extract columns in list of dataframes into lists

I have a very specific problem. I have a list of dataframes:
AB_df = data.frame(replicate(2,sample(0:130,201,rep=TRUE)))
BC_df = data.frame(replicate(2,sample(0:130,200,rep=TRUE)))
DE_df = data.frame(replicate(2,sample(0:130,197,rep=TRUE)))
FG_df = data.frame(replicate(2,sample(0:130,203,rep=TRUE)))
AB_pc = data.frame(replicate(2,sample(0:130,201,rep=TRUE)))
BC_pc = data.frame(replicate(2,sample(0:130,200,rep=TRUE)))
DE_pc = data.frame(replicate(2,sample(0:130,197,rep=TRUE)))
FG_pc = data.frame(replicate(2,sample(0:130,203,rep=TRUE)))
df_list = list(AB_df, BC_df, DE_df, FG_df, AB_pc, BC_pc, DE_pc, FG_pc)
names(df_list) = c("AB_df", "BC_df", "DE_df", "FG_df", "AB_pc", "BC_pc", "DE_pc", "FG_pc")
I want to extract now the 1st column of every 2nd dataframe into a list called "picked" and the other dataframes into a list called "unpicked". I tried doing this with a loop and sequences. The sequences are giving me the correct list entries, but in my output lists I always only get the same entry. This is my try so far:
picked = list()
unpicked = list()
for (a in 1:(length(df_list)/2)) {
for (b in seq(1,length(df_list), by = 2)){
for (c in seq(2,length(df_list), by = 2)) {
picked[[a]] = df_list[[b]][[1]]
unpicked[[a]] = df_list[[c]][[1]]}}}
I think I am close, but something is still not right.
We can use lapply to select the 1st column of every second dataframe (picked) and the 1st column of the remaining dataframes (unpicked).
picked <- lapply(df_list[c(FALSE, TRUE)], `[`, 1)
unpicked <- lapply(df_list[c(TRUE, FALSE)], `[`, 1)
We use FALSE/TRUE to select alternate list elements. So here we select element 2, 4, 6 and 8 and with [ subset 1st columns from the data. For unpicked we select list element 1, 3, 5 and 7 and get their first column.
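The alternating selection works because R recycles a short logical index along the whole object. A small sketch on a plain character vector (the vector `x` is only an illustration that mirrors the list names):

```r
x <- c("AB_df", "BC_df", "DE_df", "FG_df", "AB_pc", "BC_pc", "DE_pc", "FG_pc")

# c(FALSE, TRUE) is recycled to FALSE, TRUE, FALSE, TRUE, ...
even_elems <- x[c(FALSE, TRUE)]   # positions 2, 4, 6, 8
odd_elems  <- x[c(TRUE, FALSE)]   # positions 1, 3, 5, 7

# The same selection written with explicit positions
even_by_seq <- x[seq(2, length(x), by = 2)]
```

The explicit seq() form may be easier to read when the step is something other than 2.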
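Alternatively, we can split the list indices by parity to get both lists at once (element "0" holds the even positions, i.e. picked; element "1" holds the odd positions, i.e. unpicked)
lst1 <- lapply(split(seq_along(df_list), seq_along(df_list) %% 2),
               function(i) lapply(df_list[i], `[`, 1))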

Trying to compare two dataframes, and writing a logical result to a new dataframe in R

I have an R dataframe that contains 18 columns, I would like to write a function that compares column 1 to column 2, and if both columns contain the same value, a logical result of T or F is written to a new column (this part is not too hard for me), however I would like to repeat this process over for the next columns and write T/F to a new column.
values col 1 = values col 2, write T/F to new column, values col 3 = values col 4, write T/F to a new column (or write results to a new dataframe)
I have been trying to do this with the purrr package, and use the pmap/map function, but I know I am making a mistake and missing some important part.
This function should work if I understand your problem correctly.
df <-
  data.frame(a = c(18, 6, 2, 0),
             b = c(0, 6, 2, 18),
             c = c(1, 5, 6, 8),
             d = c(3, 5, 9, 2))
compare_columns <-
  function(x){
    n_columns <- ncol(x)
    odd_columns <- 2*1:(n_columns/2) - 1
    even_columns <- 2*1:(n_columns/2)
    comparisons_list <-
      lapply(seq_len(n_columns/2),
             function(y){
               # compare within the argument x, not the global df
               x[, odd_columns[y]] == x[, even_columns[y]]
             })
    comparisons_df <-
      as.data.frame(comparisons_list,
                    col.names = paste0("column", odd_columns, "_column", even_columns))
    return(cbind(x, comparisons_df))
  }
compare_columns(df)
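The same pairwise comparison can also be done without an explicit loop. The following is a minimal sketch, assuming an even number of columns and that columns 1 and 2, 3 and 4, and so on form the pairs; the names `odd`, `even`, `same` and the `_eq_` suffix are hypothetical:

```r
df <- data.frame(a = c(18, 6, 2, 0),
                 b = c(0, 6, 2, 18),
                 c = c(1, 5, 6, 8),
                 d = c(3, 5, 9, 2))

odd  <- seq(1, ncol(df), by = 2)
even <- seq(2, ncol(df), by = 2)

# Element-wise comparison of the odd columns against the even columns
same <- as.matrix(df[odd]) == as.matrix(df[even])
colnames(same) <- paste0(names(df)[odd], "_eq_", names(df)[even])

result <- cbind(df, same)
```

`result` keeps the original columns and appends one logical column per pair.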

Create data frame and specify row/column names in a single operation

From an old R thread captured in nabble the indication is that three separate operations are required to obtain the result described in the title of this post http://r.789695.n4.nabble.com/To-give-column-names-of-a-data-frame-td2249996.html:
results <- data.frame(matrix(c(1,2,3,4),nrow=2,ncol=2))
rownames(results) <- c("a","b")
colnames(results) <- c("c","d")
Can these be collapsed into a single operation?
We can use setNames and the row.names argument of data.frame to set them in one line
setNames(data.frame(matrix(c(1,2,3,4),nrow=2,ncol=2), row.names=c("a","b")), c("c", "d"))
# c d
#a 1 3
#b 2 4
You can use the option dimnames which is part of the matrix function. The first part of dimnames are the row names, the second part the column names.
data.frame(matrix(c(1,2,3,4),nrow = 2, ncol = 2, dimnames = list(c("a","b"), c("c","d"))))
The difference between matrix(c(1,2,3,4),nrow = 2, ncol = 2, dimnames = list(c("a","b"), c("c","d"))) and the previous line is that the matrix call will give you a matrix with a dimnames attribute. The data.frame line transforms the matrix into a data.frame with row names and column headers.
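As a quick check, the two one-line approaches can be compared side by side. This is only a sketch; the object names `via_setnames` and `via_dimnames` are made up for the comparison:

```r
# Approach 1: row.names argument plus setNames for the column names
via_setnames <- setNames(
  data.frame(matrix(c(1, 2, 3, 4), nrow = 2, ncol = 2), row.names = c("a", "b")),
  c("c", "d")
)

# Approach 2: dimnames supplied to matrix() before conversion
via_dimnames <- data.frame(
  matrix(c(1, 2, 3, 4), nrow = 2, ncol = 2,
         dimnames = list(c("a", "b"), c("c", "d")))
)

# Compare row names, column names and values of the two results
all.equal(via_setnames, via_dimnames)
```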

Is there an easy way to extract specific combinations of values from a list?

I have several (named) vectors in a list:
data = list(a=runif(n = 50, min = 1, max = 10), b=runif(n = 50, min = 1, max = 10), c=runif(n = 50, min = 1, max = 10), d=runif(n = 50, min = 1, max = 10))
I want to play around with different combinations of them depending on the row from another array called combs:
var <- letters[1:length(data)]
combs <- do.call(expand.grid, lapply(var, function(x) c("", x)))[-1,]
I would like to be able to extract each combination so that I can use the vectors created by these combinations.
All this is to be able to apply functions to each row extracted, and then to each combinations of these dataframes. So for example:
# Row 5 is "a", "c"
combs[5,]
# Use this information to extract this particular combination from my data:
# by hand it would be:
res_row5 = cbind(data[["a"]], data[["c"]])
# Extract another combination
# Row 11 is "a", "b", "d"
combs[11,]
res_row11 = cbind(data[["a"]], data[["b"]], data[["d"]])
# So that I can apply functions to each row across all these vectors
res_row_5_func = apply(res_row5, 1, sum)
# Apply another function to res_row11
res_row_11_func = apply(res_row11, 1, prod)
# Multiply the two, do other computations which can do as long as I have extracted the right vectors
I had already asked a very similar question here: Is there an easy way to match values of a list to array in R?
But can't figure out how to extract the actual data...
Thanks so much!
What you could do is first generate a list of vectors indexing the relevant entries in data:
library(magrittr)
combList <- lapply(1:nrow(combs), function(ii) combs[ii,] %>% unlist %>% setdiff(""))
You could then use this list to index the columns in data and generate a new list of the desired matrices:
dataMatrixList <- lapply(combList, function(indVec) data[indVec] %>% do.call('cbind', .))
The i-th entry in dataMatrixList then contains a matrix with columns corresponding to the i-th row in combs. You can then compute sums, products etc. using
rowSumsList <- lapply(dataMatrixList, function(x) apply(x, 1, sum))
Here is another approach that I think gives what you want. It returns a list of data frames by subsetting your data list with the non-empty elements of each row of combs:
data_sets <- apply(combs,
1,
function(x) do.call(cbind.data.frame, data[unlist(x[x!=''])])
)
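To look up a single combination by its row number, a small helper can wrap the same idea. This is a base R sketch under assumptions: the helper name `pick` is hypothetical, the vectors are shortened to 5 values, and `stringsAsFactors = FALSE` is passed to expand.grid so that the rows of combs are plain character values.

```r
set.seed(1)  # only to make the sketch reproducible
data <- list(a = runif(5, 1, 10), b = runif(5, 1, 10),
             c = runif(5, 1, 10), d = runif(5, 1, 10))

vars <- names(data)
combs <- do.call(expand.grid,
                 c(lapply(vars, function(x) c("", x)),
                   stringsAsFactors = FALSE))[-1, ]

# Hypothetical helper: bind the vectors named in row i of combs into a matrix
pick <- function(i) {
  keep <- unlist(combs[i, ])
  keep <- keep[keep != ""]
  do.call(cbind, data[keep])
}

res_row5  <- pick(5)    # columns "a" and "c"
res_row11 <- pick(11)   # columns "a", "b" and "d"

# Row-wise functions then apply as in the question
res_row_5_func  <- apply(res_row5, 1, sum)
res_row_11_func <- apply(res_row11, 1, prod)
```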

Resources