R pivot table with multiple column levels

I would be grateful if anyone could tell me how to create a pivot table in R, like in Python pandas, with a chosen aggregation function and more than one level in the columns.
In R I would like to get something like this Python call:
Iris.pivot_table(index='Sepal.Length',columns=['Sepal.Width','Species'],values='Petal.Length',aggfunc=sum)
I know there is the pivottabler package, but its default HTML rendering is too slow for somewhat larger tables.
I have also found the ftable function from the stats package, but it is only for contingency tables, in which I can't specify my own aggregation function.
Thank you.
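A minimal base-R sketch (not from the original thread): `tapply()` applies an arbitrary aggregation function over several grouping variables and returns an array, and `ftable()` also accepts such an array, not only contingency tables. Its `row.vars`/`col.vars` arguments choose which dimensions become the row index and which become nested column levels. Combinations with no data show up as `NA`.

```r
# Sum Petal.Length for every (Sepal.Length, Sepal.Width, Species) combination.
# tapply() returns a 3-D array; combinations with no rows are NA.
agg <- tapply(iris$Petal.Length,
              list(Sepal.Length = iris$Sepal.Length,
                   Sepal.Width  = iris$Sepal.Width,
                   Species      = iris$Species),
              sum)

# Flatten: dimension 1 becomes the row index, dimensions 2 and 3 become
# two nested column levels (Sepal.Width outer, Species inner).
pv <- ftable(agg, row.vars = 1, col.vars = c(2, 3))
```

Any function that returns a single value (`mean`, `max`, `length`, ...) can replace `sum`. Printing `pv` shows the wide table, and since printing is plain text rather than HTML it should stay fast for larger tables.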

Related

Data structure and package for a radial dendrogram in R

I'd like to create a radial dendrogram in R, but being new to the software, I don't know if I chose the correct data structure and package.
I've created a YAML file that looks as follows:
[Image: data structure]
I know the exact hierarchy of the languages, but I need R to calculate the x and y values. I'd use hclust for that, I think?
For example, I found these instructions: https://stats.stackexchange.com/questions/4062/how-to-plot-a-fan-polar-dendrogram-in-r, but they use the mtcars dataset. I'd just like to know whether it makes sense to set up my data as above or whether I should use a different structure. When I try to import the dataset, I get an error message saying there are more columns than column headers, so I must be doing something wrong.
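One way to avoid `hclust()` when the hierarchy is already known, sketched under assumptions: store it as a two-column parent/child edge list, so every row has exactly two fields (ragged rows are a common cause of the "more columns than column names" error from `read.table()`/`read.csv()`), then convert it to a Newick string. The language names below are hypothetical placeholders. The `ape` package can read Newick text with `read.tree(text = ...)` and draw a fan (radial) layout with `plot(..., type = "fan")`, which computes the x/y positions for you; that step is left as a comment so the sketch runs with base R alone.

```r
# Hypothetical hierarchy as a parent/child edge list (two fields per row).
edges <- read.csv(text = "parent,child
IndoEuropean,Germanic
IndoEuropean,Romance
Germanic,English
Germanic,German
Romance,French
Romance,Spanish", stringsAsFactors = FALSE)

# Recursively write a node and its descendants in Newick notation.
to_newick <- function(node, edges) {
  kids <- edges$child[edges$parent == node]
  if (length(kids) == 0) return(node)  # leaf: just its label
  inner <- vapply(kids, to_newick, character(1), edges = edges)
  paste0("(", paste(inner, collapse = ","), ")", node)
}

# The root is the only node that never appears as a child.
root <- setdiff(edges$parent, edges$child)
nwk <- paste0(to_newick(root, edges), ";")
nwk

# With the ape package installed, a radial dendrogram could be drawn with:
# tree <- ape::read.tree(text = nwk)
# plot(tree, type = "fan")
```

The edge-list layout also scales to any depth without changing the number of columns, which a one-column-per-level table does not.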

Convert a database from MongoDB to an R data frame using RMongo

I am trying to get a database from MongoDB into R so that I can analyze it. The bridge between the two is an R package, RMongo.
Because of policy rules I cannot show you the dataset or my output, so I will explain as well as I can.
After installing the package, my first two commands are:
mg1 <- mongoDbConnect("test", "localhost", 27018)
dbShowCollections(mg1)
This works: it shows the collections (the different variables).
Then I can use the query commands provided by the RMongo package:
query = dbGetQuery(mg1, 'address_history','{}')
This normally returns a data frame with one variable per column. But because the documents are nested, I only get the first three variables (out of around fifty), the ones at the top level. The rest arrive as a single data frame column containing JSON (covering approximately 50 variables), which I cannot manage to turn into a data frame. If anyone is familiar with this, please help me.
I have seen a Stack Overflow answer that does this by hand with gsub and pattern matching, but my data is structured differently, so doing it manually will not work.
There is also another RMongo command:
query2 = dbGetQueryForKeys(mg1, 'address_history', '{}', '{address:1}')
which returns only the variable I ask for. Unfortunately, because the data is nested, it also cannot find variables that are not at the top level.
Is there another command or package I can use? I am open to any other way of getting this (very large) dataset into an R data frame so I can make inferences.
Thank you very much!
I just tried setting up both RMongo and mongolite for R. I got mongolite working locally with the starter data in minutes, but I could not even get the data I wanted inserted using RMongo.
I think if you try installing mongolite you will find its documentation and package simpler: https://github.com/jeroen/mongolite
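If the nested fields arrive as one column of JSON strings, as described in the question, the `jsonlite` package can parse them and spread nested objects into ordinary columns. A minimal sketch: the `query` data frame and its field names below are hypothetical stand-ins for the `dbGetQuery()` result. With mongolite, `find()` returns a data frame in which nested sub-documents typically appear as nested data-frame columns, so `jsonlite::flatten()` may be applied to that result too; check this against your own collection.

```r
library(jsonlite)

# Hypothetical stand-in for the query result: one top-level field plus one
# column holding each document's nested part as a JSON string.
query <- data.frame(
  id = c(1, 2),
  address = c('{"city": "Paris", "geo": {"lat": 48.85, "lon": 2.35}}',
              '{"city": "Lyon",  "geo": {"lat": 45.76, "lon": 4.84}}'),
  stringsAsFactors = FALSE
)

# Wrap the strings in a JSON array so fromJSON() simplifies them to one data
# frame; flatten = TRUE turns nested objects into columns such as "geo.lat".
nested <- fromJSON(paste0("[", paste(query$address, collapse = ","), "]"),
                   flatten = TRUE)

# Bind the parsed columns back to the top-level fields.
result <- cbind(query["id"], nested)
str(result)
```

For a very large collection, parsing everything in one string may use a lot of memory; `jsonlite::stream_in()` reads newline-delimited JSON in pages and is worth considering instead.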

RangedSummarizedExperiment for DESeq2

I'm trying to use the DESeq2 package in R for differential gene expression analysis, but I'm having trouble creating the required RangedSummarizedExperiment object from my input data. I have found several tutorials and vignettes for doing this, but they all seem to start from a different kind of raw data than mine. My data has gene names as row names and patient IDs as column names, and the values are simply integer counts. There has to be a simple way to create the RangedSummarizedExperiment object from this type of input, but I haven't found one yet. Can anybody help? Thanks.
I had a similar problem understanding how to use this data structure. I eventually managed to do without it by using DESeqDataSetFromMatrix. You can see an example in the first code block of Modify r object with rpy2 (this code is pure R, rpy2 stuff comes after). In this example, I have genes as rows and samples as columns, so it is likely you will be able to adopt the same approach.
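A minimal sketch of the `DESeqDataSetFromMatrix()` route, assuming a count matrix with gene names as row names and patient IDs as column names. The counts, patient IDs, and the `condition` column are hypothetical; `colData` needs one row per sample, in the same order as the columns of the count matrix. The DESeq2 call is wrapped in `requireNamespace()` so the input checks still run where the Bioconductor package is not installed.

```r
# Hypothetical integer counts: genes as rows, patients as columns.
counts <- matrix(c(10L, 0L, 25L, 7L,
                   12L, 3L, 30L, 5L,
                   50L, 1L, 20L, 9L,
                   48L, 2L, 22L, 11L),
                 nrow = 4,
                 dimnames = list(c("geneA", "geneB", "geneC", "geneD"),
                                 c("patient1", "patient2", "patient3", "patient4")))

# Sample annotation: one row per column of `counts`, rows named by patient ID.
coldata <- data.frame(condition = factor(c("control", "control", "treated", "treated")),
                      row.names = colnames(counts))

# DESeq2 expects the sample order in colData to match the count columns.
stopifnot(identical(rownames(coldata), colnames(counts)))

if (requireNamespace("DESeq2", quietly = TRUE)) {
  # Builds the DESeqDataSet (a RangedSummarizedExperiment subclass) directly
  # from the matrix, so no separate RangedSummarizedExperiment step is needed.
  dds <- DESeq2::DESeqDataSetFromMatrix(countData = counts,
                                        colData   = coldata,
                                        design    = ~ condition)
}
```

In practice the `condition` factor would come from whatever per-patient annotation you have; the design formula names the column you want to compare.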

The Most Convenient Way to Insert Multiple Histograms within LaTeX-based Summary Statistics Tables (R)

I was wondering what the most convenient way is to create, in R, a summary statistics table (in LaTeX) in which one column gives a graphical presentation of each variable's distribution.
More specifically, I was looking for something like this:
Thanks!
I think you can do something like this with the sparkTable package. That package should give you TeX output.
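For comparison, a base-R alternative to sparkTable (a swapped-in technique, not the sparkTable API): save one tiny histogram per variable with `pdf()` and `hist()`, then write LaTeX table rows that embed each plot with `\includegraphics`. The built-in `iris` data, the chosen statistics, and the file names are illustrative assumptions; the output needs `\usepackage{graphicx}` and assumes the `.tex` file is compiled in the directory the plots are written to.

```r
out_dir <- tempdir()            # where the histogram PDFs are written
vars <- names(iris)[1:4]        # the four numeric columns of iris

rows <- vapply(vars, function(v) {
  x <- iris[[v]]
  # Replace dots in the file name: extra dots can confuse graphicx.
  file <- paste0("hist_", gsub(".", "_", v, fixed = TRUE), ".pdf")

  # Draw a small, bare histogram with no margins, axes, or labels.
  pdf(file.path(out_dir, file), width = 1.5, height = 0.5)
  par(mar = c(0, 0, 0, 0))
  hist(x, main = "", xlab = "", ylab = "", axes = FALSE, col = "grey")
  dev.off()

  # One LaTeX row: name, summary statistics, then the embedded histogram.
  sprintf("%s & %.2f & %.2f & %.2f & %.2f & \\includegraphics[height=1em]{%s} \\\\",
          v, mean(x), sd(x), min(x), max(x), file)
}, character(1))

cat("\\begin{tabular}{lrrrrc}",
    "Variable & Mean & SD & Min & Max & Distribution \\\\ \\hline",
    rows,
    "\\end{tabular}", sep = "\n")
```

The printed tabular can be pasted into a document or written to a file with `writeLines()`; swapping `hist()` for `boxplot()` or a density line would give other inline graphics.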

Reaching max.print in R

I just found a bunch of weather data that I would like to play around with using glmnet in R. I've been reading and organizing the data in R, and right now I am just trying to look at the raw data of each variable. Unfortunately, each variable has a lot of data and R stops before printing it all. Is there a way I can view all the raw data, either in R or in the file itself? I've tried opening the file in Excel without success. Thanks!
Try using frequency tables; you can group the values into segments.
str(), summary(), table(), pairs(), plot() etc. There are also several packages (such as descr) that make it easier to analyze numeric variables and factor levels. Let me know if you need help with any of them.
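A short base-R sketch of the exploration functions mentioned above, plus a frequency table built by cutting a numeric variable into segments. It uses the built-in `airquality` data (daily New York air-quality and weather readings from 1973) as a stand-in for the asker's file; the breakpoints are arbitrary. If you really need to print everything, `options(max.print = ...)` raises the limit (the default is 99999 entries), and in RStudio `View()` opens a data frame in a scrollable spreadsheet-like viewer.

```r
# Compact overviews instead of printing every value.
str(airquality)             # type and first few values of each column
summary(airquality$Temp)    # min, quartiles, mean, max

# Frequency table: group temperatures (degrees F) into 10-degree segments.
temp_bins <- table(cut(airquality$Temp, breaks = seq(50, 100, by = 10)))
temp_bins

# To print more, raise the limit temporarily, then restore the old value.
old <- options(max.print = 1e6)
# print(your_data)  # hypothetical object; would now print up to 1e6 entries
options(old)
```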
