Mutation plot splitting cohort - r

I am trying out some R packages for creating oncoplots that show different mutation types.
I am using the maftools (oncoplot function) and GenVisR (waterfall function) packages.
I would like to split the resulting plot into two parts based on clinical information (for example gender: male and female).
I would like a representation like this one, but I can't find the right parameter.
Could someone help me? Is there an argument inside the function, or something outside it (like facet_wrap), to split the cohort based on clinical information? It would be very useful for me.
Thank you in advance.
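One possible approach with maftools, sketched under assumptions: the example below uses the TCGA LAML data bundled with the package, which has no gender column, so FAB_classification stands in for the clinical variable. With your own data you would substitute your own column name and values (for example a hypothetical Gender column with "Male" and "Female").

```r
library(maftools)

# Example data shipped with maftools (stand-in for your own MAF + clinical table)
laml.maf  <- system.file("extdata", "tcga_laml.maf.gz", package = "maftools")
laml.clin <- system.file("extdata", "tcga_laml_annot.tsv", package = "maftools")
laml <- read.maf(maf = laml.maf, clinicalData = laml.clin)

# Option 1: one oncoplot, samples grouped by a clinical annotation
oncoplot(maf = laml, top = 10,
         clinicalFeatures = "FAB_classification",
         sortByAnnotation = TRUE)

# Option 2: subset the cohort on the clinical variable and draw
# the two subsets side by side
m2 <- subsetMaf(maf = laml, clinQuery = "FAB_classification %in% 'M2'")
m3 <- subsetMaf(maf = laml, clinQuery = "FAB_classification %in% 'M3'")
coOncoplot(m1 = m2, m2 = m3, m1Name = "M2", m2Name = "M3")
```

Option 1 keeps a single plot with the samples ordered by the annotation track, while Option 2 gives two separate panels, which is probably closer to a "split" figure.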

Related

R newbie- is there a way to separate or filter out items listed in a single cell for plotting purposes?

Problem
R and Stack Overflow newbie here, so please be patient with me. I am currently working on a data.frame that will act as a summary of various modeling approaches used to predict either fall events or fall rates within an in-patient setting, based on a range of hospital, environmental and individual-level variables.
My data is in long format and some studies have several rows (I have created a row for each model type, and some studies built multiple models). For some columns (e.g., model performance) I have multiple entries separated by commas (e.g., C-statistic, Hosmer-Lemeshow test, likelihood ratio, and so forth). My question is: is there a way to separate these so I can create a barplot in ggplot2 that shows the prevalence of the different methods, with one bar per statistic/test type and the height of each bar being the number of times it occurs in the data frame? At the moment this does not work, because some bars have a label that contains all of the values (e.g., "C-statistic, Hosmer-Lemeshow test, likelihood ratio"). That means several bars can contain "C-statistic", for example, because the lists differ slightly.
Screenshots and code
I have attached a screenshot of my data.frame below. The column I refer to is "Statistic.reported".
Screenshot of data frame:
I have also attached an image of what happens when I create a basic barplot with the following code:
Bar <- ggplot(Modelling.Data, aes(x = Statistic.reported)) + geom_bar() + theme_classic()
Image of plot using current basic code:
Things I have tried
I have tried using the tidyr function separate_rows; my code for this was as follows:
separate_rows(Modelling.Data,Modelling.Data$Statistic.reported, sep = ",")
From this I got an error that said "Can't subset columns that don't exist".
Hopefully, this makes sense, but I'm really new to all of this so if you need anything else please tell me. Any tips or advice would be hugely appreciated! Apologies in advance for my complete lack of knowledge.
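A likely cause of the error is that separate_rows expects the bare column name rather than Modelling.Data$Statistic.reported, which passes the column's values where column names are expected. A minimal sketch follows, using a small made-up data frame (the Study column and all values are hypothetical stand-ins for the real data):

```r
library(tidyr)

# Toy stand-in for Modelling.Data; only Statistic.reported matters here
modelling_data <- data.frame(
  Study = c("A", "B", "C"),
  Statistic.reported = c("C-statistic, Hosmer-Lemeshow test",
                         "C-statistic",
                         "Likelihood ratio, C-statistic"),
  stringsAsFactors = FALSE
)

# Pass the column name unquoted; the regex also swallows spaces after commas
long_data <- separate_rows(modelling_data, Statistic.reported, sep = ",\\s*")

# One count per individual statistic/test
stat_counts <- table(long_data$Statistic.reported)
print(stat_counts)
```

Passing long_data to the existing ggplot call in place of Modelling.Data should then give one bar per statistic. If the same statistic is spelled with different capitalisation, applying tolower() to the column first would merge those bars.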

Package for Summarizing Data in My DataFrame R

I have a huge dataset containing information about 1774 counties in the US. The variables there are things like income quartile, voter preferences, median household income etc.
I would like to know if there exists a package which would allow me to quickly see for example the number of counties which have income over a certain number and voted Republican, or the number of counties where more than 50 % work in services, while the average education attainment is HS or lower.
I know that I can do so with dplyr functions, however, that is extremely time-consuming when I want to do it with large amounts of variables.
Thank you for any recommendations!
I recommend you try the explore package.
While you can use it manually to explore specific parts of your dataset, it has additional features to explore data interactively via shiny (explore_shiny) and to generate a report of your entire dataset via rmarkdown (report).
Exploring pairs of variables (e.g. income by party voted for) is possible by specifying one variable as the target and selecting the second variable. But it won't always give you the comparison you need. Hence I would recommend the explore package as an initial starting point for understanding your data, but for specific analysis you will probably need to write your own dplyr, ggplot, and/or plotly code (or whichever other packages you favour).
Further worked examples are found in its vignette.
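For the specific counts mentioned in the question, a small helper can cut down on repeated filtering code. This is only a sketch in base R: the data frame, its column names (median_income, party, pct_services, education) and the helper count_where are all hypothetical.

```r
# Toy county data; column names and values are made up for illustration
counties <- data.frame(
  median_income = c(42000, 61000, 75000, 38000),
  party         = c("Republican", "Republican", "Democrat", "Republican"),
  pct_services  = c(55, 48, 62, 51),
  education     = c("HS", "College", "College", "Less than HS"),
  stringsAsFactors = FALSE
)

# Count rows for which a condition (written in terms of column names) is TRUE
count_where <- function(data, condition) {
  keep <- eval(substitute(condition), data, parent.frame())
  sum(keep, na.rm = TRUE)
}

n_high_income_rep <- count_where(counties,
  median_income > 50000 & party == "Republican")
n_services_low_edu <- count_where(counties,
  pct_services > 50 & education %in% c("HS", "Less than HS"))
```

Each new question then becomes a single call with a different condition, which may be quicker than writing a full filter and summarise chain each time.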

How can I do a Propensity Score Matching with Instrumental Variables in R?

I'm trying to do propensity score matching (PSM) with instrumental variables (IV). I've read this paper and would like to apply it in my R script. I've already done a PSM with my data; an example follows:
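library(Matching)  # provides Match()
d = data
X = vars  # list of covariates
Y = y     # outcome
Tr = x    # treatment indicator
# psf is the propensity-score formula (its definition is not shown here)
glm1 = glm(psf, family = binomial(link = "logit"), data = d)  # propensity-score model
XATT = Match(Y = Y, Tr = Tr, X = glm1$fitted, M = 1, estimand = "ATT",
             ties = FALSE, version = "fast")
summary(XATT)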
Now I'm interested in repeating the previous process with an IV called 'z'. Ichimura & Taber (2001) suggest two propensity-score methods that use instrumental variables, but I do not know whether there are other ways to do PSM-IV in R. If any of you know a different way to achieve it, I'd deeply appreciate hearing about it.
I'd be extremely grateful for any help.

Package in R to determine which factor values are overrepresented within a particular factor of interest

I have performed some clustering using copy number data from SNP arrays. I would like to know if those clusters are enriched for any of the clinical data/factors the samples come with.
I have searched several times for a package that does this, but the results are always about transcription factor enrichment.
Does anyone know of one?
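Without a dedicated package, one way to approach this is to test each clinical factor against the cluster assignment with Fisher's exact test and then correct for multiple testing. The sketch below uses base R only; the data frame and its columns (cluster, sex, stage) are made up for illustration.

```r
# Toy sample table: cluster assignment plus two clinical factors (all made up)
samples <- data.frame(
  cluster = rep(c("C1", "C2"), each = 10),
  sex     = c(rep("F", 9), "M", rep("M", 8), "F", "F"),
  stage   = rep(c("I", "II"), times = 10),
  stringsAsFactors = FALSE
)

clinical_factors <- c("sex", "stage")

# Fisher's exact test of cluster vs. each clinical factor
p_values <- sapply(clinical_factors, function(f) {
  fisher.test(table(samples$cluster, samples[[f]]))$p.value
})

# Benjamini-Hochberg correction across the tested factors
p_adjusted <- p.adjust(p_values, method = "BH")
print(p_adjusted)
```

Factors with a small adjusted p-value are candidates for being over- or under-represented in some cluster; inspecting the contingency table then shows which level drives the association.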

Clustering genes based on function

We would like to use either hierarchical or k-means clustering to cluster the genes in our dataset based on their function. We have the GO ID for each gene, and now we would like to cluster the genes into groups based on function, preferably hierarchically: from the bottom (where each function is unique) up to higher levels (where we have more general groups of functions). We are programming in R.
Thanks in advance for your help!
Usually one either performs a differential expression analysis between two conditions, or clusters genes based on expression across conditions or time points. After that, it is possible to look for overrepresentation of GO terms in differentially expressed gene sets or in clusters.
You may be interested in GeneMania (http://www.genemania.org/): you can enter a list of genes, which will be presented in a network (with lots of options for customisation and expansion). This tool will also provide you with the GO terms that are enriched in the network. A second tool of interest is GOrilla (http://cbl-gorilla.cs.technion.ac.il/), which shows the GO hierarchy itself, with GO terms lighting up if they are enriched.
k-means isn't a good idea for this kind of data.
Instead, look at algorithms specialized for this data, in particular biclustering algorithms.
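If clustering directly on the GO annotations is still the goal, one simple sketch of the hierarchical approach from the question is to build a gene-by-term membership matrix, compute a binary (Jaccard-style) distance between genes, and cut the resulting tree. The gene names and term IDs below are placeholders, not real annotations, and this ignores the structure of the GO graph itself.

```r
# Placeholder annotations: each gene maps to a set of (made-up) GO term IDs
gene_go <- list(
  geneA = c("GO_1", "GO_2"),
  geneB = c("GO_1", "GO_2", "GO_3"),
  geneC = c("GO_4", "GO_5"),
  geneD = c("GO_4")
)

# Binary membership matrix: rows are genes, columns are GO terms
all_terms <- sort(unique(unlist(gene_go)))
membership <- t(sapply(gene_go, function(terms) as.integer(all_terms %in% terms)))
colnames(membership) <- all_terms

# "binary" distance = share of terms not in common among terms either gene has
gene_dist <- dist(membership, method = "binary")

# Agglomerative clustering, then cut the tree into two groups
tree <- hclust(gene_dist, method = "average")
groups <- cutree(tree, k = 2)
print(groups)
```

Cutting the same tree at different heights (or different k) gives the coarser and finer groupings described in the question; plot(tree) shows the full dendrogram.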
