Converting large matrix to dataframe


I am sure there is a simple way to achieve this, but I cannot find one in existing questions. I have a matrix that the View command displays like a dataframe, but its structure is different: it is a list matrix, in which every cell is a separate list element holding a single numeric value.
Example data that produces this structure:
set.seed(24)
df1 <- as.data.frame(matrix(sample(0:20, 500*500, replace=TRUE), ncol=500))
colsums <- as.data.frame(t(colSums(df1)))
matrix <- matrix(colsums, nrow=2)
str(matrix)
I have tried the as.data.frame and melt functions, but neither solves the problem.

One option is to unlist the whole thing and convert to a two-rowed matrix and then convert to a data.frame class, e.g.
df <- as.data.frame(matrix(unlist(matrix), nrow = 2))
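A minimal sketch of why the unlist step helps, assuming a smaller 10 x 8 input than the question's 500 x 500. The names list_mat, num_mat and direct are hypothetical, and the list matrix is built with as.list() here rather than from the data frame as in the question:

```r
set.seed(24)
df1 <- as.data.frame(matrix(sample(0:20, 10 * 8, replace = TRUE), ncol = 8))

# A list matrix like the one in the question: every cell is a list element.
list_mat <- matrix(as.list(colSums(df1)), nrow = 2)
is.list(list_mat)

# unlist() flattens the cells into a plain numeric vector, which can then
# be reshaped into a numeric matrix and converted to a data frame.
num_mat <- matrix(unlist(list_mat), nrow = 2)
df <- as.data.frame(num_mat)
str(df)

# If the column sums are still available, the list stage can be skipped
# by passing the numeric vector from colSums() straight to matrix().
direct <- matrix(colSums(df1), nrow = 2)
```

Keeping the column sums as a numeric vector, instead of wrapping them in a data frame first, avoids creating the list matrix at all.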

Related

How can I get the column/variable names of a dataframe that fit certain parameters?

I came across a problem in my DataCamp exercise that basically asked "Remove the column names in this vector that are not factors." I know what they -wanted- me to do, and that was to simply do glimpse(df) and manually delete elements of the vector containing the column names, but that wasn't satisfying for me. I figured there was a simple way to store the column names of the dataframe that are factors into a vector. So, I tried two things that ended up working, but I worry they might be inefficient.
Example data frame:
factorVar <- as.factor(LETTERS[1:10])
df1 <- data.frame(x = 1, y = 1:10, factorVar = sample(factorVar, 10))
My first solution was this:
vector1 <- names(select_if(df1, is.factor))
This worked, but select_if builds an entire filtered copy of the dataframe just so that its column names can be read. Surely there's an easier way...
Next, I tried this:
vector2 <- colnames(df1)[sapply(df1,is.factor)]
This also worked, but I wanted to know if there's a quicker, more efficient way of filtering column names based on their type and then storing the results as a vector.
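As a hedged base-R sketch (not taken from the original thread), the second attempt can be made slightly stricter with vapply, which guarantees a logical result per column. The names is_fac, factor_cols and factor_cols2 are hypothetical:

```r
factorVar <- as.factor(LETTERS[1:10])
df1 <- data.frame(x = 1, y = 1:10, factorVar = sample(factorVar, 10))

# vapply() checks each column and always returns a logical vector,
# whereas sapply() can change its return type on unusual input.
is_fac <- vapply(df1, is.factor, logical(1))
factor_cols <- names(df1)[is_fac]

# Filter() is an equivalent one-liner; it subsets the columns first.
factor_cols2 <- names(Filter(is.factor, df1))
```

Either form only inspects the class of each column, so the cost is negligible for data frames of ordinary width.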

R populate list with samples

I have a numeric vector stock_data containing thousands of floating point numbers. I know I can sample from it using
sample(stock_data, sample_size)
I want to take 100 different samples and store them in a list of samples.
How do I do that without using a loop to append the samples to a list?
I thought of creating a list that replicates the stock data 100 times and then using lapply on it.
I tried:
all_repl <- as.list(rep(stock_data,100))
all_samples <- lapply(all_repl, sample, size=100)
But rep(stock_data, 100) produces a single numeric vector with the data repeated 100 times, so all_repl is not a list of 100 copies of the data.
Can anyone suggest what's wrong and point out a better method to do what I want?

We can use replicate
replicate(100, sample(stock_data, sample_size))
Use simplify=FALSE to get the output as a list (by default, replicate simplifies the result to a matrix when it can). With a reproducible example:
replicate(5, sample(1:9, 5), simplify=FALSE)
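A short sketch of both routes, assuming simulated data in place of the real stock_data and a hypothetical sample_size of 10. It also shows how the question's lapply idea can be repaired by replicating a list rather than a vector:

```r
set.seed(1)
stock_data <- rnorm(1000)   # stand-in for the real data (assumption)
sample_size <- 10           # hypothetical sample size

# replicate() re-evaluates the expression each time, so every sample differs.
all_samples <- replicate(100, sample(stock_data, sample_size), simplify = FALSE)

# Repairing the original approach: rep(list(x), n) yields a list of n copies,
# whereas rep(x, n) concatenates everything into one vector.
all_repl <- rep(list(stock_data), 100)
all_samples2 <- lapply(all_repl, sample, size = sample_size)
```

The replicate version avoids holding 100 copies of the data in memory, which matters when stock_data is large.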

R - creating dataframe from colMeans function

I've been trying to create a new dataframe from my original dataframe, where each row of the new dataframe is the mean of 20 consecutive rows of the old one. I discovered the colMeans function, which does the job pretty well. The remaining problem is how to turn its vector of results back into a dataframe that can be analysed further.
My code for colMeans (matrix1 is my original dataframe converted to a matrix, which was the only way I managed to get it to work):
a<-colMeans(matrix(matrix1, nrow=20));
But this gives a single numeric vector with all the results concatenated (it becomes one column if I try, for example, as.data.frame(a)). How can I get the result back into a dataframe in which each column holds only the averages for that column name, rather than all the averages together?
I hope my question is clear. Thanks for your help.

Based on methods('as.data.frame'), as.data.frame.list is one option: it turns each element of the vector into a column of a data.frame
as.data.frame.list(a)
data
m1 <- matrix(1:20, ncol=4, dimnames=list(NULL, paste0('V', 1:4)))
a <- colMeans(m1)
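A hedged sketch of two related steps. The first part reaches the same one-row result through the as.data.frame generic with as.list(), instead of calling the method by name. The second part is an assumption about what the question is ultimately after: per-column means over blocks of 20 rows, here on a made-up 40-row data frame with a hypothetical grouping vector block:

```r
m1 <- matrix(1:20, ncol = 4, dimnames = list(NULL, paste0("V", 1:4)))
a <- colMeans(m1)

# Each named element of 'a' becomes its own column in a one-row data frame.
one_row <- as.data.frame(as.list(a))

# Means of every 20 consecutive rows, computed per column.
df <- data.frame(V1 = 1:40, V2 = 41:80)        # made-up example data
block <- (seq_len(nrow(df)) - 1) %/% 20        # 0 for rows 1-20, 1 for rows 21-40
block_means <- aggregate(df, by = list(block = block), FUN = mean)
```

Because aggregate works column by column, the result stays a data frame with the original column names, so no reshaping is needed afterwards.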

Subsetting every x amount of columns as separate sites

I need a function that treats every x columns as a separate site. In df1 below there are 8 columns, making up 4 sites of 2 variables each. Previously, I have used the procedure from the answer to "Selecting column sequences and creating variables".
set.seed(24)
df1 <- as.data.frame(matrix(sample(0:20, 8*10, replace=TRUE), ncol=8))
I then need to calculate a column sum so that a total for each variable is obtained.
colsums <- as.data.frame(t(colSums(df1)))
I subsequently split the dataframe using this technique...
lst1 <- setNames(lapply(split(1:ncol(colsums), as.numeric(gl(ncol(colsums),
2, ncol(colsums)))), function(i) colsums[,i]), paste0('site', 1:4))
list2env(lst1, envir=.GlobalEnv)
And organise into one dataframe...
Combined <- as.matrix(mapply(c,site1,site2,site3,site4))
rownames(Combined) <- c("Site.1","Site.2","Site.3","Site.4")
This technique works well on smaller dataframes. With a substantial number of sites (>500), however, typing out every site in the mapply call takes a lot of code, and some sites could be missed when they are all entered manually. Is there an easy way to avoid this after the colsums stage?

A matrix is a vector with dimensions. Matrices are stored in column-major order in R.
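The call matrix(colsums, nrow=2) should help you a lot. Note that colsums is a data.frame here, so the result is a list matrix unless it is first flattened with unlist().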
NB.: Polluting the "global" environment is generally a bad idea.
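A minimal sketch of that idea, assuming the goal is the same sites-by-variables layout as Combined in the question. It works on the numeric vector from colSums() directly and uses a hypothetical vars_per_site parameter, so no per-site objects are created in the global environment:

```r
set.seed(24)
df1 <- as.data.frame(matrix(sample(0:20, 8 * 10, replace = TRUE), ncol = 8))

vars_per_site <- 2          # hypothetical: number of columns per site
totals <- colSums(df1)      # plain numeric vector, one total per column

# byrow = TRUE puts each consecutive pair of totals on its own row,
# giving one row per site and one column per variable.
Combined <- matrix(totals, ncol = vars_per_site, byrow = TRUE)
rownames(Combined) <- paste0("Site.", seq_len(nrow(Combined)))
Combined
```

The row names are generated from the number of rows, so the same code covers 4 sites or 500 without listing them by hand.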

Applying a function to a dataframe to trim empty columns within a list environment R

I am a naive user of R and am trying to come to terms with the 'apply' family of functions, which I now need because of the complexity of my data sets.
I have a large, ragged data frame that I wish to reshape before conducting a sequence of regression analyses. It is further complicated by interlaced rows of descriptive data (characters).
My approach so far has been to use a factor to split the data frame into sets with equal row lengths (i.e. a list) and remove the trailing empty columns. I then make two new, matching lists, one of data and one of chars, use reshape to produce a common column number, and recombine the sets in each list. A simplified example:
myDF <- as.data.frame(rbind(c("v1",as.character(1:10)),
c("v1",letters[1:10]),
c("v2",c(as.character(1:6),rep("",4))),
c("v2",c(letters[1:6], rep("",4)))))
myDF[,1] <- as.factor(myDF[,1])
myList <- split(myDF, myDF[,1])
myList[[1]]
I can remove the empty columns for an individual set, and I can split the data frame into two sets from the interlacing rows. However, I am stumped by the syntax for writing a function that applies the following to every element of the list, though 'lapply' with 'seq_along' should do it?
Thus for the individual set:
DF <- myList[[2]]
DF <- DF[,!sapply(DF, function(x) all(x==""))]
DF
(This comes from an earlier answer to a similar, but simpler, example on this site.) I have a large data set and would like an elegant solution; I could use a loop, but that would not use the capabilities of R effectively. Once I have done that, I ought to be able to use the same rationale to reshape the frames and then recombine them.
regards
jac

Try
lapply(split(myDF, myDF$V1), function(x) x[!colSums(x=='')])
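A hedged variant of the same idea, written as a named helper so it can be reused. The function name trim_empty is hypothetical. Unlike the colSums version, which drops any column containing at least one empty string, this sketch drops a column only when every value in it is empty, matching the single-set code in the question:

```r
myDF <- as.data.frame(rbind(c("v1", as.character(1:10)),
                            c("v1", letters[1:10]),
                            c("v2", c(as.character(1:6), rep("", 4))),
                            c("v2", c(letters[1:6], rep("", 4)))))

# Keep only the columns that are not entirely empty strings.
# drop = FALSE keeps the result a data frame even if one column remains.
trim_empty <- function(d) {
  all_empty <- vapply(d, function(col) all(col == ""), logical(1))
  d[, !all_empty, drop = FALSE]
}

# split() builds the list and lapply() applies the helper to each set.
trimmed <- lapply(split(myDF, myDF$V1), trim_empty)
```

lapply passes each data frame in the list to the helper directly, so seq_along is not needed unless the function also has to know the position of each set.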
