I have a data frame with 190 observations (rows). There are five columns, in which every entry either has the value 0 or 1.
How do I get a barplot that has the names of the five columns on the x-axis and the number of 1's (i.e. the sum) of each column as the height of the bars, preferably with ggplot?
I'm sorry I don't have any sample data, I couldn't figure out how to produce a smaller dataframe that fits the descriptions.
### Load ggplot & create sample data
library(ggplot2)
n=190  # number of observations, as in the question
sampledata=data.frame(a=rbinom(n,1,0.7),b=rbinom(n,1,0.6),
c=rbinom(n,1,0.5),d=rbinom(n,1,0.2),e=rbinom(n,1,0.3))
### Sum the data using apply & use this for beautiful barplot
sumdata=data.frame(value=apply(sampledata,2,sum))
sumdata$key=rownames(sumdata)
ggplot(data=sumdata, aes(x=key, y=value, fill=key)) +
geom_bar(colour="black", stat="identity")
Just take the column sums and make a barplot.
barplot(colSums(iris[,1:4]))
So I have a dataframe containing the 10 most upregulated genes in cancer-samples compared to control-samples and the 10 most downregulated genes.
It looks like this:
I want to create a neat boxplot to compare the spread of each gene between patient-samples and control-samples (there are 4 samples of each type).
The problem I have is that I don't get all boxes alongside each other in a single row of the same graph, but like this:
I would also like the genes' boxes to be sorted by the "log2FC" value instead of alphabetically. Does anyone know how to fix this?
This is my code I used:
#Im using the dataframe called "Dataframe"
#Make a column for the genenames
Dataframe$genenames <- rownames(Dataframe)
#Get data into a long-format (gather() comes from the tidyr package)
long_Dataframe <- gather(Dataframe, key="samples",
value="values", -c(log2FC, genenames))
#Creating a new column called "group", stating if each row belongs to patient/control
long_Dataframe$group <- rep(c("Control", "Patient"), each=40)
#Order rows by log2FC - from lowest to highest
long_Dataframe <- long_Dataframe[order(long_Dataframe$log2FC), ,
drop=FALSE]
#Use long data for boxplot of top 20 up/downregulated genes
Boxplot_top20 <- ggplot(long_Dataframe, aes(x=genenames, y=values, fill=group)) +
geom_boxplot() +
scale_fill_manual(values=c("green", "red")) +
theme_light() +
facet_wrap(~genenames, scales="free")
You may use geom_boxplot(position=position_dodge()) instead of facet_wrap() to place your boxplots by pair within group.
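A minimal sketch of that suggestion, which also orders the genes by log2FC instead of alphabetically. The data frame `long_df` and its values are made up for illustration; only the column names follow the question. It relies on base R's `reorder()` to set the factor levels.

```r
library(ggplot2)

# Hypothetical long-format data: 3 genes, 4 control and 4 patient values each
set.seed(1)
long_df <- data.frame(
  genenames = rep(c("geneA", "geneB", "geneC"), each = 8),
  log2FC    = rep(c(2.5, -3.1, 0.8), each = 8),
  group     = rep(rep(c("Control", "Patient"), each = 4), times = 3),
  values    = rnorm(24, mean = 10)
)

# Set the factor levels by log2FC so the x-axis is no longer alphabetical
long_df$genenames <- reorder(long_df$genenames, long_df$log2FC)

# One panel, boxes dodged by group within each gene (no facet_wrap)
p <- ggplot(long_df, aes(x = genenames, y = values, fill = group)) +
  geom_boxplot(position = position_dodge()) +
  scale_fill_manual(values = c("green", "red")) +
  theme_light()
```

Printing `p` draws the plot. Sorting the rows of the data frame, as in the question, does not change the axis order; the order of the factor levels does.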
I'm trying to order the bars of my percent stacked barchart in R based on descending stack segment height.
R automatically sorts my categorical data in alphabetical order (in both the barchart and its legend) but I'd like the data to be ordered so to have the biggest bars (the ones with the greatest stack segment height) on top of the barchart and the smallest at the bottom, in a descending manner.
I don't know how to do this because I cannot manually set a specific order with a vector prior to using ggplot2: my dataset is quite big and I need it to be ordered based on total field area (a quantitative variable that changes for every single city I'm considering).
Does anyone know how to help me?
You need to set your categorical variable as an ordered factor. For example, using the iris data, the default is for an alphabetical x-axis:
iris%>%
ggplot(aes(Species,Petal.Length))+
geom_col()
Using fct_reorder (from forcats, included in the tidyverse), you can change a character variable to a factor and give it an order in one step. Here I change the order of the x-axis so that it is ordered by the mean sepal width of each species.
iris%>%
mutate(Species=fct_reorder(Species,Sepal.Width,mean))%>%
ggplot(aes(Species,Petal.Length))+
geom_col()
st_des_as %>%
mutate(COLTURA=fct_reorder(COLTURA,tot_area,.desc=F)) %>%
ggplot(aes(x=" ", y=tot_area, fill=COLTURA)) +
geom_bar(position= "fill", stat="identity") +
facet_grid(~ZONA) +
labs(x=NULL, y="landcover (%)") +
scale_y_continuous(labels=function(x) paste0(x*100)) +
scale_fill_manual(name="CROP TYPE",values=colours_as) +
theme_classic() +
theme(legend.key.size = unit (10, "pt")) +
theme(legend.title = element_text(face="bold"))
Here are some of my data. As you can see, they are numerical values divided by region (ZONA) and crop type (COLTURA).
And here are the first graphs: the first one from the left is sorted correctly, while the other three are not sorted by the height of their own bars but follow the same order as the first graph, regardless of the size of their own bars.
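A factor has a single level order for the whole data frame, so every facet shares it. One possible workaround, sketched here under assumptions, is to build one plot per zone so that each can carry its own order. The data frame `crops` is a made-up stand-in for `st_des_as`, and as far as I know ggplot2 draws the first factor level at the top of a stacked bar, which is why the levels are sorted by decreasing area.

```r
library(ggplot2)

# Hypothetical stand-in for st_des_as: two zones, three crop types
crops <- data.frame(
  ZONA     = rep(c("north", "south"), each = 3),
  COLTURA  = rep(c("maize", "rice", "wheat"), times = 2),
  tot_area = c(10, 30, 20, 25, 5, 15)
)

# One data frame per zone, each with levels sorted by decreasing area
by_zone <- lapply(split(crops, crops$ZONA), function(d) {
  d$COLTURA <- reorder(d$COLTURA, -d$tot_area)
  d
})

# One percent-stacked bar per zone, using that zone's own level order
plots <- lapply(by_zone, function(d) {
  ggplot(d, aes(x = " ", y = tot_area, fill = COLTURA)) +
    geom_col(position = "fill") +
    labs(title = d$ZONA[1], x = NULL, y = "landcover (share)")
})
```

`plots$north` and `plots$south` can then be printed or arranged side by side. Passing a named colour vector to scale_fill_manual keeps each crop's colour the same across the separate plots.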
Maybe someone has run into the same issue. I am trying to plot different proteins (e.g., CS, AMPK) at two time points (pre and post) and in two intervention groups (control, treatment). I can plot a grouped barplot by group and time, but I cannot create a separate grouped barplot per protein.
My data has as columns the following variables:
PID, time, group, protein_name, protein_content (in the plot named area)
I used the following code to plot a grouped barplot
ggplot(data, aes(group, area, fill=time)) + theme_classic() +
geom_bar(position = position_dodge(), stat = "identity")
With the following output:
How do I create a plot like the one above for each protein rather than using all the protein values as a unique value? I have 5 different proteins, so I would like to plot 5 different plots like the one above.
Thank you in advance! All help is very much appreciated :)
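One way this could be approached is faceting on the protein column, which draws the same grouped barplot once per protein. The sketch below uses made-up data with the column names from the question; averaging over participants before plotting is my assumption, not something stated in the question.

```r
library(ggplot2)

# Hypothetical data with the columns listed in the question
set.seed(1)
data <- expand.grid(
  PID = 1:4,
  time = c("pre", "post"),
  group = c("control", "treatment"),
  protein_name = c("CS", "AMPK")
)
data$area <- runif(nrow(data), min = 1, max = 5)

# Keep "pre" before "post" on the plot instead of alphabetical order
data$time <- factor(data$time, levels = c("pre", "post"))

# Mean area per group, time and protein (assumed summary)
means <- aggregate(area ~ group + time + protein_name, data = data, FUN = mean)

# Same grouped barplot as in the question, one panel per protein
p <- ggplot(means, aes(group, area, fill = time)) +
  theme_classic() +
  geom_col(position = position_dodge()) +
  facet_wrap(~ protein_name, scales = "free_y")
```

With five proteins this gives five panels. `scales = "free_y"` lets each protein use its own y-range, which may or may not be wanted.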
Here's the ggplot I have:
listTimelinePlot <- ggplot(listDf, aes(x=N, y=Measurement_Value,color="List")) +
xlab("n") +
ylab("Time to append n items") +
scale_x_log10() +
scale_y_log10() +
geom_line() +
geom_point()
N is an array of integers that may contain duplicate values. As a result, in the resulting plot there are multiple points that share the same x-value:
How do I make it so that only one point is displayed per x-value, namely a point with its y-value equal to the average of the points' y-values? I'm assuming that the 'joints' created by geom_line() meet at the mean y-value.
Just compress your dataframe to means in the first place. ddply from the plyr package should do the job.
library(plyr)
newListDF <- ddply(listDf, "N", numcolwise(mean))
The first input to the function is the data, the second is the column you want to group by, and the last argument is the function you want to apply to each group (numcolwise makes mean operate on every numeric column of each group).
This will give you a data frame that holds the mean of the measured values for each value of N. Look at the names of this data frame and use it as the input to ggplot instead.
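If plyr is not available, the same compression can be sketched with base R's `aggregate()`. The data below are made up; only the column names `N` and `Measurement_Value` come from the question.

```r
# Hypothetical stand-in for listDf: repeated N values with different timings
listDf <- data.frame(
  N = c(10, 10, 100, 100, 100, 1000),
  Measurement_Value = c(1, 3, 10, 20, 30, 200)
)

# One row per distinct N, holding the mean of Measurement_Value
meanDf <- aggregate(Measurement_Value ~ N, data = listDf, FUN = mean)
```

`meanDf` has one row per distinct N and can replace `listDf` in the original ggplot call, so each x-value gets exactly one point.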
My data frame (df) consists of 5 columns with 2,000 numerical values for each one.
Using reshape I reformatted my data frame to two columns: 1st containing the values (df$Values) (a total of 10,000) and a 2nd containing the name of the column (df$Labels) from where the value in col 1 is coming from.
I will use the 2nd column as a group factor.
I generated the vectors mycolor and myshapes to set the colour and the line type of the lines.
With ggplot I tried to generate a density plot containing the density plot for the five factors.
The problem is that the count axis (the y-axis) reaches a maximum of 10,000. This value does not make sense to me, because the maximum possible count for each group should be 2,000. Does anyone know what is going on? Which code do I need to use to correct the axis?
ggplot2, geom_density() plot:
Here is the code:
ggplot(df, aes(x=Values, colour=Labels, linetype=Labels))+
geom_density(aes(y=..count..))+
theme_classic()+
scale_colour_manual(values = mycolor)+
scale_linetype_manual(values = myshapes)+
ggtitle("Title")+
scale_x_continuous(limits = c(0.5,1.5))
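A likely explanation, offered as a sketch and not a definitive answer: the `count` variable computed by geom_density is the density multiplied by the number of points in the group, not a histogram bin count. When the values sit in a narrow range the density itself is well above 1, so density times 2,000 can exceed 2,000. The base R example below uses made-up data to show the effect.

```r
# Hypothetical sample: 2,000 values concentrated around 1, like one group in df
set.seed(42)
n <- 2000
x <- rnorm(n, mean = 1, sd = 0.05)

# Kernel density estimate; the area under this curve is about 1
d <- density(x)

# Peak density is far above 1 because the values span a narrow range
peak_density <- max(d$y)

# Height of a count-scaled density curve at its peak (density * n)
peak_count <- peak_density * n

# The count-scaled curve integrates to about n, even though its peak exceeds n
area <- sum(d$y) * diff(d$x)[1] * n
```

Read this way, a peak of 10,000 with 2,000 points per group would correspond to a peak density of about 5. If the goal is a curve whose area is 1 for every group, plotting the default density instead of the count may be closer to what is wanted.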