list to dataframe without unique columns - r

I have this loop to generate some values
for (j in 1:2) {
table <- rep(data.frame(
matrix(c(letters[1:2],
sample(c(rep(1,100),0), size = 1),
sample(c(rep(0,100),1), size = 1)), ncol = 2) ), j)
}
I would like to get output like this
X1 X2
a 1
b 0
a 1
b 1
That is, a table with the letters in one column and the numbers in the second column.
I tried
do.call(rbind, table)
data.frame(matrix(unlist(table), nrow=length(table), byrow=TRUE))
But I am not able to get the values into the right column of the data table.

The table is overwritten in each iteration of the loop, so only the last result survives. Instead, we may use replicate to create a list and then rbind its elements
lst1 <- replicate(2, data.frame(
matrix(c(letters[1:2],
sample(c(rep(1,100),0), size = 1),
sample(c(rep(0,100),1), size = 1)), ncol = 2) ), simplify = FALSE)
do.call(rbind, lst1)
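One caveat with the matrix-based construction: c() mixes letters and numbers, so everything is coerced to character and X2 ends up as text. A minimal sketch, assuming numeric values are wanted in X2 (make_block is a hypothetical helper name, and the seed is arbitrary):

```r
set.seed(42)  # arbitrary seed, only so the sketch is reproducible

# Build each two-row block as a data frame directly,
# so X1 stays character and X2 stays numeric
make_block <- function() {
  data.frame(
    X1 = letters[1:2],
    X2 = c(sample(c(rep(1, 100), 0), size = 1),
           sample(c(rep(0, 100), 1), size = 1))
  )
}

# One block per replication, kept as a list, then stacked row-wise
lst1 <- replicate(2, make_block(), simplify = FALSE)
out <- do.call(rbind, lst1)
str(out)
```

Because each block is already a data frame, do.call(rbind, ...) keeps the column types instead of falling back to a character matrix.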

Related

How to rename the columns that are similar to current one plus one?

Could you help me with this problem? I have a dataset whose column names are numeric values.
Some of the columns are sequential. I would like to rename each sequential column to the name of the column where the sequence started.
Here is an example dataset similar to mine:
fake_dataset <- data.frame(sample = paste0("sample_", sample(1:100, replace = T)),
"1678.47647" = runif(100, 1, 2),
"1679.84733" = runif(100, 1, 3),
"1680.87487" = runif(100, 2, 4),
"1800.35463" = runif(100, 1, 2),
"1811.47463" = runif(100, 2, 3),
"1823.52342" = runif(100, 2, 5)
)
colnames(fake_dataset) <- c("sample",
"1678.47647",
"1679.84733",
"1680.87487",
"1800.35463",
"1811.47463",
"1823.52342")
fake_dataset$sample <- NULL
My logic was to rename each next sequential column to the same name as the previous one, like this:
test <- function(data){
new_names <- c()
counter <- 0
for (i in as.integer(colnames(fake_dataset))){
counter <- counter + 1
if(as.character( as.integer( names( data[counter] ) )) == as.character( as.integer( names( data[counter] ) )+1) ) {
print("same!\n")
colname( data[, counter]) <- colnames( data[, counter + 1])
}else{
print("different!\n")
}
}
}
But I haven't managed it yet. Could anyone help?
Thank you for your time.
We may convert the colnames to integer, get the difference between adjacent elements to create a grouping variable, use that in ave to select the first element of the vector and assign it back as column names.
v1 <- as.integer(colnames(fake_dataset))
grp <- cumsum(c(TRUE, diff(v1) != 1))
new <- ave(v1, grp, FUN = function(x) x[1])
colnames(fake_dataset) <- new
-output
> colnames(fake_dataset)
[1] "1678" "1678" "1678" "1800" "1811" "1823"
NOTE: duplicate column names are poorly supported in data.frame/tibble/data.table. Subsequent transformations may make them unique via make.unique, i.e. by appending .1, .2 to the duplicates. A matrix, however, allows duplicate column names.
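The grouping step can be traced on a small matrix, where duplicate column names are kept as they are. This is only a sketch: the matrix m and the intermediate new_run are illustrative names, not part of the original answer.

```r
# Toy matrix standing in for fake_dataset, using the same column names
m <- matrix(runif(12), nrow = 2,
            dimnames = list(NULL, c("1678.47647", "1679.84733", "1680.87487",
                                    "1800.35463", "1811.47463", "1823.52342")))

v1 <- as.integer(colnames(m))        # drops the decimal part of each name
new_run <- c(TRUE, diff(v1) != 1)    # TRUE wherever a new sequence starts
grp <- cumsum(new_run)               # run id for each column
colnames(m) <- ave(v1, grp, FUN = function(x) x[1])  # first value of each run

grp
colnames(m)
```

Columns whose integer parts increase by exactly 1 share a run id, so each of them receives the name of the column that opened the run.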

Optimise row wise matrix comparison in R

I've googled extensively and can't seem to find an answer to my problem. Apologies if this has been asked before. I have two matrices, a & b, each with the same dimensions. What I am trying to do is iterate over the rows of a (from i = 1 to number of rows in a) and check if any elements found in row i of matrix a appear in the corresponding row in matrix b. I have a solution using sapply but this becomes quite slow with very large matrices. I wondered if it is possible to vectorise my solution somehow? Examples below:
# create example matrices
a = matrix(
1:9,
nrow = 3
)
b = matrix(
4:12,
nrow = 3
)
# iterate over rows in a....
# returns TRUE for each row of a where any element in ith row is found in the corresponding row i of matrix b
sapply(1:nrow(a), function(x){ any(a[x,] %in% b[x,])})
# however, for large matrices this performs quite poorly. is it possible to vectorise?
a = matrix(
runif(14000000),
nrow = 7000000
)
b = matrix(
runif(14000000),
nrow = 7000000
)
system.time({
sapply(1:nrow(a), function(x){ any(a[x,] %in% b[x,])})
})
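Use apply to find any 0 differences (note that this only detects matches at the same column position, unlike %in%, which matches an element anywhere in the row):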
a <- sample(1:3, 9, replace = TRUE)
b <- sample(1:3, 9, replace = TRUE)
a <- matrix(a, ncol = 3)
b <- matrix(b, ncol = 3)
diff <- (a - b)
apply(diff, 1, function(x) which(x == 0)) # actual indexes = 0
apply(diff, 1, function(x) any(x == 0)) # row check only
or
Maybe you can try intersect + asplit like below
lengths(Map(intersect, asplit(a, 1), asplit(b, 1))) > 0
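When the matrices are tall and narrow, as in the 7,000,000 x 2 example, another option is to loop over the few columns instead of the many rows. The following is a sketch under that assumption; row_any_match is a hypothetical helper name, and NA values are not handled.

```r
# For each row i: TRUE if any element of a[i, ] appears anywhere in b[i, ]
row_any_match <- function(a, b) {
  stopifnot(identical(dim(a), dim(b)))
  hit <- logical(nrow(a))
  for (j in seq_len(ncol(b))) {
    # b[, j] is recycled down every column of a, so a[i, k] is compared
    # with b[i, j] for all k; rowSums() > 0 flags rows with a match
    hit <- hit | (rowSums(a == b[, j]) > 0)
  }
  hit
}

a <- matrix(1:9, nrow = 3)
b <- matrix(4:12, nrow = 3)
row_any_match(a, b)
```

The loop runs ncol(b) times and each pass is fully vectorised over rows, so the cost of calling an R function once per row disappears.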

How can I delete values by column in a data frame?

I need to extract the abundance values for each column, excluding zeros, so I used an empty list and a for loop. When I delete [i] in the first line of my loop I get the desired result only in the column of total values (sum by an object), but written the way I learned to write it, I only obtain an undesired result.
set.seed(1000)
df <- data.frame(Category = sample(LETTERS[1:10]),
Object = sample(letters[1:10]),
A = sample(0:20, 10, rep = TRUE),
B = sample(0:20, 10, rep = TRUE),
C = sample(0:20, 10, rep = TRUE))
sincero <- list()
for (i in colnames(df[ , 3:5])){
sincero[i] = df[df[ , i] != 0, ]
sincero
}
sincero
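One likely culprit is the single-bracket assignment sincero[i] = ..., which tries to fit a whole data frame into one list slot and keeps only its first column. A minimal sketch, assuming the goal is one data frame per abundance column with the rows where that column is zero dropped:

```r
set.seed(1000)
df <- data.frame(Category = sample(LETTERS[1:10]),
                 Object   = sample(letters[1:10]),
                 A = sample(0:20, 10, replace = TRUE),
                 B = sample(0:20, 10, replace = TRUE),
                 C = sample(0:20, 10, replace = TRUE))

sincero <- list()
for (i in colnames(df)[3:5]) {
  # [[i]] stores the whole filtered data frame under the column's name
  sincero[[i]] <- df[df[[i]] != 0, ]
}

names(sincero)
```

Each element, for example sincero$A, is then a full data frame containing only the rows with a non-zero value in that column.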

How to create a table from a data.frame where a cell could have multiple values using R

I have tried looking at cheatsheets and looking over other questions asked on here but have been unsuccessful in finding an answer.
I am using R and my data.frame looks like this:
I want to take the second column and make it the vertical categories and make the third column the horizontal categories. The first column would then be matched to the corresponding categories in its row.
Here is an example of how I want to format the table:
Is there a way to write code to do this, in order to avoid using Excel and Word to create the table?
Try this:
df = data.frame(let = LETTERS[1:12],
vert = c(10, 10, 2.5, 5, 10, 5, 2.5, 10, 1.25, 1.25, 1.25, 1.25),
hor = c(2,2,3,2,4,2,3,4,1,4,4,1),
stringsAsFactors = F)
# find unique combinations
positions = expand.grid(unique(df$vert), unique(df$hor))
# pre-allocate matrix
M = matrix(ncol = length(unique(df$hor)),
nrow = length(unique(df$vert)))
rownames(M) <- sort(unique(df$vert))
colnames(M) <- sort(unique(df$hor))
# loop over valid positions and put them in the matrix
for (i in c(1:nrow(positions))){
# get row
row = as.numeric(positions[i,])
# gather all entries that go in position
valid = df[df$vert == row[1] & df$hor == row[2], 'let']
valid = paste(valid, collapse=",")
# get matrix indices
vert_i <- which(rownames(M) == row[1])
horiz_i <- which(colnames(M) == row[2])
# put the data in the matrix
M[vert_i, horiz_i] <- valid
}
print(M)
It could be more efficient but it gets the job done.
Here is one way:
library(tidyr)
library(dplyr)
# example data
exd <- data.frame(
Vertical = c(10,10,2.5,5,10,5,2.5,10,1.25,1.25,1.25, 1.25),
Horizontal = c(2,2,3,2,4,2,3,4,1,4,4,1),
row.names = LETTERS[1:12])
# move values into a column
exd <- mutate(exd,
Value = rownames(exd))
# aggregate by Vertical and Horizontal
exd <- summarize(group_by(exd, Vertical, Horizontal),
Value = paste(Value, collapse = ","))
# re-arrange into the desired form
spread(exd, Horizontal, Value)
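A base R alternative is also possible with tapply, which cross-tabulates by two grouping vectors and applies a function to each cell. This is a sketch rather than either answer's method; it reuses the example data from the first answer, and the name tab is illustrative.

```r
df <- data.frame(let = LETTERS[1:12],
                 vert = c(10, 10, 2.5, 5, 10, 5, 2.5, 10, 1.25, 1.25, 1.25, 1.25),
                 hor = c(2, 2, 3, 2, 4, 2, 3, 4, 1, 4, 4, 1),
                 stringsAsFactors = FALSE)

# Rows = vertical categories, columns = horizontal categories;
# letters sharing a cell are pasted together, empty cells are NA
tab <- with(df, tapply(let, list(vert, hor), paste, collapse = ","))
print(tab)
```

Empty combinations come back as NA rather than as empty strings, so they may need replacing before the table is printed or exported.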

Create a matrix from a list consisting of unequal matrices for individual bootstraps

I tried to create a matrix from a list which consists of N unequal matrices...
The reason to do this is to make R individual bootstrap samples.
In the example below you can find e.g. 2 companies, where we have 1 with 10 & 1 with just 5 observations.
Data:
set.seed(7)
Time <- c(10,5)
xv <- matrix(c(rnorm(10,5,2), rnorm(5,20,1), rnorm(10,5,2), rnorm(5,20,1)), ncol=2);
y <- matrix( c(rnorm(10,5,2), rnorm(5,20,1)));
z <- matrix(c(rnorm(10,5,2), rnorm(5,20,1), rnorm(10,5,2), rnorm(5,20,1)), ncol=2)
# create data frame of input variables which helps
# to conduct the rowise bootstrapping
data <- data.frame (y = y, xv = xv, z = z);
rows <- dim(data)[1];
cols <- dim(data)[2];
# create the index to sample from the different panels
cumTime <- c(0, cumsum (Time));
index <- findInterval (seq (1:rows), cumTime, left.open = TRUE);
# draw R individual bootstrap samples
bootList <- replicate(n = 5, list(), simplify = FALSE);  # here R = 5 samples; replicate() takes n, not R
bootList <- lapply (bootList, function(x) by (data, INDICES = index, FUN = function(x) dplyr::sample_n (tbl = x, size = dim(x)[1], replace = T)));
---------- UNLISTING ---------
Currently, I try to do it (incorrectly) like this:
Example for just 1 entry of the list:
matrix(unlist(bootList[[1]], recursive = T), ncol = cols)
The desired output is just
bootList[[1]]
as a matrix.
Do you have an idea how to do this & if possible reasonably efficient?
The matrices are then processed in unfortunately slow MLE estimations...
I found a solution for you. From what I gather, you have a data frame containing all observations of all companies, which may have different panel lengths, and you would like a bootstrap sample for each company of the same size as its original panel length.
You merely have to add a company indicator
data$company = c(rep(1, 10), rep(2, 5)) # this could even be a factor.
L1 = split(data, data$company)
L2 = lapply(L1, FUN = function(s) s[sample(x = 1:nrow(s), size = nrow(s), replace = TRUE),] )
Stop here if you would like separate bootstrap samples, e.g. in case you want to estimate each company separately. Otherwise, bind them back together:
bootdata = do.call(rbind, L2)
Best wishes,
Tim
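To get R such samples, each already as a matrix, the split/sample/rbind steps can be wrapped in a function and repeated. A minimal sketch under stated assumptions: the data here is a simplified stand-in for the question's data, and boot_one and n_boot are hypothetical names.

```r
set.seed(7)
# Simplified stand-in: two companies with 10 and 5 observations
data <- data.frame(y = rnorm(15), x = rnorm(15),
                   company = rep(c(1, 2), times = c(10, 5)))

n_boot <- 5  # number of bootstrap samples (R in the question)

# Resample rows with replacement within each company, then stack as a matrix
boot_one <- function(d) {
  parts <- lapply(split(d, d$company), function(s) {
    s[sample.int(nrow(s), replace = TRUE), , drop = FALSE]
  })
  as.matrix(do.call(rbind, parts))
}

bootList <- replicate(n_boot, boot_one(data), simplify = FALSE)
dim(bootList[[1]])
```

Every element of bootList has the same dimensions as the original data, with each company's block resampled only from that company's rows.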
