How to include head into sns.countplot [duplicate] - jupyter-notebook

Is it possible to show only the top/bottom n groups in a sns.countplot()?
Using an example from the seaborn website,
sns.countplot(y="deck", hue="class", data=titanic, palette="Greens_d");
Is there an easy (or even relatively straightforward) way of limiting this plot to just 3 decks (groups) instead of displaying all 7, or is this something that would be better accomplished with sns.barplot or just plain matplotlib?

import seaborn as sns
titanic = sns.load_dataset("titanic")
sns.countplot(y="deck", hue="class", data=titanic, palette="Greens_d",
              order=titanic.deck.value_counts().iloc[:3].index)

Just adding a real example instead of a toy dataset.
Assuming you have a pandas DataFrame named training_var and you want to display the counts for the top 10 values of the 'Gene' column, the 'order=' bit should look as follows (with seaborn imported as sb):
sb.countplot(x='Gene', data=training_var, order=training_var['Gene'].value_counts().iloc[:10].index)
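The question also asks about the bottom n groups. A minimal sketch of how the order list could be built for either case, using only pandas; the helper name group_order and the small decks series are hypothetical stand-ins for illustration, not part of seaborn or of the answers above:

```python
import pandas as pd


def group_order(values: pd.Series, n: int = 3, bottom: bool = False) -> list:
    """Return the n most (or least) frequent categories, most frequent first."""
    counts = values.value_counts()      # sorted by count, descending
    counts = counts[counts > 0]         # skip unused levels of a categorical column
    picked = counts.iloc[-n:] if bottom else counts.iloc[:n]
    return picked.index.tolist()


# Hypothetical stand-in for a column such as titanic["deck"].
decks = pd.Series(list("CCCCCBBBBDDDEEA"), name="deck")

top3 = group_order(decks, n=3)                  # three most frequent groups
bottom2 = group_order(decks, n=2, bottom=True)  # two least frequent groups
```

Either list can then be passed as the order= argument, for example sns.countplot(y="deck", data=titanic, order=group_order(titanic["deck"], n=3, bottom=True)). Groups left out of order are simply not drawn.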

Related

Is there a way to create a heatmap of multiple values for the same compared variables – essentially a heatmap within a heatmap?

Hi, I am new to R and trying to learn. I would like to compare the overlap of 4 clusters, each of which contains the same 4 categories. Basically, I am thinking of making a clustered heatmap like the image below, which I quickly made as an example in Excel. Does anyone know of an R package that would allow me to make a graph like this? So far, I have only found packages that limit you to one variable per X vs Y grid cell. Thanks so much for your suggestions!

How to combine two data sets into one and plotting one graph with both data sets in R Studio

Currently I have combined Apple and Samsung stock market data from 2014-2018. I combined the Date, Open, High, Low and Close columns using cbind and changed the names so they say Apple/Samsung.
My problem is with the graph. My dataset is now combined in columns, so I feel like this might be part of the problem, but nonetheless I would prefer to keep it like that. I would love a graph that shows both of the open figures over the years.
If I just use plot(Total$OpenApple, Total$OpenSam), the plot is a huge block rather than the line graph I would like.
Thanks.
Without any example data it's difficult to understand your problem fully. However, I would try using the ggplot2 and dplyr packages. You can reshape your data so that OpenApple and OpenSam are both part of the same column, and then have ggplot2 color each line according to the group it belongs to.

Change colors in r plot

I am currently trying to plot some data and can't manage to obtain a nice result. I have a set of 51 individuals, each with a specific value (Pn), split into 14 groups. The closest thing I end up with is this kind of plot. I obtain it with the simple code below, starting by ordering my values for the Individuals:
Individuals <- factor(Individuals,levels=Individuals[order(Pn)])
dotchart(Pn,label=Individuals,color=Groups)
The issue is that I only have 9 colors on this plot (so I lost information somehow) and I can't find a way to manually apply one color per group.
I've also tried the ggplot2 package, having read that it can give nice-looking results. In that case I can't manage to order the Individuals properly (the previous sorting doesn't seem to have any effect here), and I end up with only different shades of blue for the groups, which is not an efficient way to represent the information in my data set. The plot I get is accessible here and I used the following code:
ggplot(data=gps)+geom_point(mapping=aes(x=Individuals, y=Pn, color=Groups))
I apologize if this question seems redundant, but I couldn't figure out a solution on my own, even following some answers given to others...
Thank you in advance!
EDIT: Using RColorBrewer as suggested below sorted out the issue with the colors when I use the ggplot2 package.
I believe you are looking for the scale_color_manual() function within ggplot2. You didn't provide a reproducible example, but try something along the lines of this:
ggplot(data=gps, mapping=aes(x=Individuals, y=Pn, color=Groups)) +
  geom_point() +
  scale_color_manual(values = c('GROUP1' = 'color_value_1',
                                'GROUP2' = 'color_value_2',
                                'GROUP3' = 'color_value_3'))
Replace GROUPX with the values inside your Group column, and replace color_value_x with whatever colors you want to use.
A good resource for further learning about ggplot2 is chapter 3 of R For Data Science, which you can read here: http://r4ds.had.co.nz/data-visualisation.html
I can't be sure without looking at your data, but it looks like Groups may be a numeric value. Try this:
gps$Groups <- as.factor(gps$Groups)
library(RColorBrewer)
ggplot(data=gps) +
  geom_point(mapping=aes(x=Individuals, y=Pn, color=Groups)) +
  scale_colour_brewer(palette = "Set1")

Power View - Tables with columns of the similar data

Apologies in advance if my question is obvious but I'm new to this and having spent days searching and experimenting I cannot achieve the result I'm after.
I have a big table of data that I want to plot some graphs for, namely stacked columns and then filter these using other criteria such as date. I came across Power View and have been learning to use it hoping it will let me produce the reports I'm after rather than standard Excel graphs which are very clunky.
An example of the sort of data my table contains is as follows, although I have a lot of other columns and about 20 of the similar metric columns:
[image not preserved]
And this is the sort of graph I want to plot:
[image not preserved]
Where the red sections correspond to "R0" values, the orange "R1" and the green "R2" - they're essentially a rating: poor, ok and good. I can plot a single metric versus the items column with the stacked bar fine, but cannot find a way to plot the metrics along the x axis for, say, a given item or the sum of all items.
I've created measures using CALCULATE to filter by the rating, but when I try to plot these my only option in Power View is a clustered column graph where the x axis is the rating and the legend is the metrics.
I also created another small table with a single column of R0, R1 and R2 but can only link that to one metric column whereas I need it to link to all of them.
I think it's potentially a many to many mapping issue and have found a lot of links covering bridge tables and the magic CALCULATE function.
However as I'm trying to map values in a column to several other columns it doesn't seem to quite fit the many to many problem or if it does I can't see it.
I feel like what I'm after should be quite simple but I either end up with all my metric columns being made to show identical values or there's loads of cross filtering that I don't want. The "ratings" for each metric are essentially independent and I don't want to combine them in any way.
Any help is greatly appreciated and if my solution is in the links I've listed above then I'd really appreciate a bit of help with seeing it.
Thanks in advance

Cufflinks: how to subplot heatmaps w/ Cufflinks in Jupyter/ipython Notebook?

I have read the Cufflinks examples. The only subplots examples are generated from a single DataFrame with a subplots=True parameter and an optional shape parameter (i.e. df.iplot(..., subplots=True, shape=(...), ...). As I understand it, the mechanism is that when subplots=True is provided, each column of the DataFrame is plotted as a subplot.
Now, about heatmaps in Cufflinks. The example in the same link shows that the DataFrame for an N * M heatmap is simply an N * M DataFrame where the column names and index give the x and y coordinates and the values are the "heat" of each cell of the grid.
Combining the two, it seems that if I have two heatmaps (thus two DataFrames), I cannot plot both in a subplot-fashion, because subplots require a single DataFrame and I cannot combine two heatmap DataFrames into one.
Does anyone have any idea how it might work?
BTW, I also tried plotly.offline.iplot(..., subplots=True, ...) and the parameter is not supported.
EDIT
There is another question (from me, too) asking about doing the same in plotly, which got answered. So if you are working with plotly directly, that's the answer you might want to take a look at.
This question is about using Cufflinks to achieve the same. It still seems impossible (or at least very difficult) to me.
You can use the following:
import cufflinks as cf
df1=cf.datagen.heatmap()
df2=cf.datagen.heatmap()
cf.subplots([df1.figure(kind='heatmap'),df2.figure(kind='heatmap')]).iplot()
You can do this with as many heatmaps as you like, and you can also use the shape parameter.
