R summarytools(dfSummary) alphabetical order - r

(I'm new here so hopefully I am asking this in the correct way).
I am using Rmarkdown to create a summary table, and am using dfSummary within the summarytools package. I have created the table using this code:
Here is a picture of my current R-markdown:
I would like the order of the results for character variables to be listed in order of descending frequency, not alphabetical. I have read many pages summarize the package but cannot find an answer. Any help would be great!

There is no option to do that in summarytools directly. (You can open an issue on the GitHub project page for a feature request.) The ordering by frequency is done only when there are more distinct values than allowed by max.distinct.values.
The easiest way to proceed would be to recode your character variable with forcats::fct_infreq:
library(forcats)
df$var_f <- fct_infreq(df$var)
The order of factor levels will be preserved in the the results.

Related

Crosstalk and Aggregated Data

I am trying to create a flexdashboard that includes a Data Table along with Crosstalk Filters. My data includes similar data that I would like to aggregate to find the means of certain categories according to filter criteria. Basically, the table will be aggregated data.
It is my understanding that Crosstalk wasn't meant to aggregate data, but does anyone know a way around this? I'm willing to look at any way possible.
Thank you!
You are looking for the summaryWidget package
link to github page

R pivot table with multiple column levels

I would be grateful if anyone could tell me how to create pivot table in R like python pandas with selected aggregation function and more then one level in column.
I would like to receive in R something like this in python:
Iris.pivot_table(index='Sepal.Length',columns=['Sepal.Width','Species'],values='Petal.Length',aggfunc=sum)
I know there is pivotabler package, but default rendering to html method is to slow for a bit larger tables.
I also have found ftable function from stats package but its only for contingency tables, in which I can`t specify my own aggregation function.
Thank you.

rangedummarizedexperiment for deseq2

I'm trying to use the DESeq2 package in R for differential gene expression, but I'm having trouble creating the required RangedSummarizedExperiment object from my input data. I have found several tutorials and vignettes for doing this, but they all seem to apply to a raw data set that is different from mine. My data has gene names as row names and patient id as column names, and the data is simply integer count data. There has to be a simple way to create the RangedSummarizedExperiment object from this type of input data, but I haven't yet found a way. Can anybody help? Thanks.
I had a similar problem understanding how to use this data structure. I eventually managed to do without it by using DESeqDataSetFromMatrix. You can see an example in the first code block of Modify r object with rpy2 (this code is pure R, rpy2 stuff comes after). In this example, I have genes as rows and samples as columns, so it is likely you will be able to adopt the same approach.

R plot data.frame to get more effective overview of data

At work when I want to understand a dataset (I work with portfolio data in life insurance), I would normally use pivot tables in Excel to look at e.g. the development of variables over time or dependencies between variables.
I remembered from university the nice R-function where you can plot every column of a dataframe against every other column like in:
For the dependency between issue.age and duration this plot is actually interesting because you can clearly see that high issue ages come with shorter policy durations (because there is a maximum age for each policy). However the plots involving the issue year iss.year are much less "visual". In fact you cant see anything from them. I would like to see with once glance if the distribution of issue ages has changed over the different issue.years, something like
where you could see immediately that the average age of newly issue policies has been increasing from 2014 to 2016.
I don't want to write code that needs to be customized for every dataset that I put in because then I can also do it faster manually in Excel.
So my question is, is there an easy way to plot each column of a matrix against every other column with more flexible chart types than with the standard plot(data.frame)?
The ggpairs() function from the GGally library. It has a lot of capability for visualizing columns of all different types, and provides a lot of control over what to visualize.
For example, here is a snippet from the vignette linked to above:
data(tips, package = "reshape")
ggpairs(tips)

reaching max.print on R

I just found a bunch of weather data that I would like to play around with in glmnet in R. First I've been reading and organizing the data in R, and right now I am just trying to look at the raw data of each variable. Unfortunately, each variable has a lot of data and R isn't able to print it all. Is there a way I can view all the raw data in R or just in the file itself? I've tried opening the file in excel to no success. Thanks!
Try to use Frequency tables, you can group by segments.
str() , summary(), table(), pairs(), plots() etc. There are several libraries (such as decr) which facilitate analyzing numerical and factor levels. Let me know if you need help with any.

Resources