I have attempted to use ggplot2 and the normal hist() function to display the data required. I messed around with bin widths and number of bins, but I've been getting very similar results to this.
This is my code:
library(dplyr)    # provides %>%
library(ggplot2)
geneCount = read.delim("smc_gene_expression_counts.txt")
geneCount$I98_FBS
geneCount %>% ggplot() + geom_histogram(aes(I98_FBS), binwidth = 500)
Histogram Output:
Examples of Values in Column Used (I98_FBS)
I created a plot with multiple boxplots using this code from the singer data (this is reproducible):
library(tidyverse)
library(plotly)
library(lattice)
plot_ly(singer, y = ~height, color = ~voice.part, type = "box")
It created this beautiful box plot that was broken down by the voice part:
Now, the issue I'm having is that I'm trying to do the same thing but with a quantile plot, but no matter what I do, it ends up being all clumped together still, like this:
Oh, and this is the code I used for that:
fvalfull <- (1:nrow(singer) - 0.5) / nrow(singer)
dffull <- tibble(smpl = singer$height, voice.part = singer$voice.part, fval = fvalfull)
plot2 <- ggplot(dffull, aes(sample = smpl)) +
  geom_qq(distribution = qunif)
ggplotly(plot2, color = ~dffull$voice.part)
Is there any way I can get all eight of the quantile plots to show up in the same plot? I know I can just make eight separate plots, but I think it would be more interesting to have all of them in the same plot, similar to the box plots.
Thank you!
I am not sure if this is what you want, but I created a quantile plot which shows all the voice.parts in one plot with different colors. You can use the following code:
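library(tidyverse)
library(plotly)
library(lattice)  # provides the singer data set
# Same data frame as in the question (fval is not needed by geom_qq)
dffull <- tibble(smpl = singer$height, voice.part = singer$voice.part)
# Mapping colour to voice.part makes geom_qq() compute the quantiles per group
p <- ggplot(dffull, aes(sample = smpl)) +
  geom_qq(distribution = qunif, aes(colour = voice.part)) +
  xlim(c(0, 1))
ggplotly(p)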
Output:
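For comparison, here is a minimal sketch of the same idea with the f-values computed by hand, separately within each voice part, instead of letting geom_qq() do it. The names qdat and fval are just illustrative choices, and this assumes the usual (i - 0.5) / n definition used in the question.

```r
library(dplyr)
library(ggplot2)
library(lattice)  # provides the singer data set

# Sort heights within each voice part and compute f-values per group,
# rather than over all rows at once
qdat <- singer %>%
  group_by(voice.part) %>%
  arrange(height, .by_group = TRUE) %>%
  mutate(fval = (row_number() - 0.5) / n()) %>%
  ungroup()

# One set of points per voice part, distinguished by colour
p <- ggplot(qdat, aes(fval, height, colour = voice.part)) +
  geom_point() +
  labs(x = "f-value", y = "height")
p
```

The resulting p can be passed to ggplotly() in the same way as above if an interactive version is wanted.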
I would like to use R to draw 100 random observations from a chi-square distribution with 5 degrees of freedom. After doing so, I want to calculate the mean of those observations and use ggplot2 to plot the chi-square distribution as a bar chart. The following is my code:
rm(list = ls())
library(ggplot2)
set.seed(9487)
###Step_1###
x_100 <-data.frame(rchisq(100, 5, ncp = FALSE))
###Step_2###
mean_x <- mean(x_100[,1])
class(x_100)
###Step_3###
plot_x_100 <- ggplot(data = x_100, aes(x = x_100)) +
geom_bar()
plot_x_100
First, I construct a data frame of random draws from a chi-square distribution with df = 5, obs = 100.
Second, I calculate the mean value of these observations.
Finally, I plot the graph with the ggplot2 package.
However, I get the following result:
Don't know how to automatically pick scale for object of type data.frame. Defaulting to continuous.
Error in is.finite(x) : default method not implemented for type 'list'
I have been stuck on this problem for several hours and cannot find any list in my global environment. I would appreciate it if anyone could help me and give me some suggestions.
The problem is that inside the ggplot function you are passing the same data frame (x_100) as both the data and the x variable inside aes. A data frame is itself a list, which is why the error mentions type 'list' even though there is no separate list in your environment. Remember that in ggplot, inside aes you should indicate the name of the column you wish to map. Additionally, if you want to plot the chi-square distribution, I think it is a better idea to use geom_histogram instead of geom_bar, because geom_histogram groups the observations into bins.
library(ggplot2)
# Rename the only column of your data frame as "value"
colnames(x_100) <- "value"
plot_x_100 <- ggplot(data = x_100, aes(x = value)) +
  geom_histogram(bins = 20)
# Print the plot
plot_x_100
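As a self-contained sketch of the whole workflow, the column can also be named when the data frame is created, which avoids the renaming step. The dashed line marking the sample mean is an optional extra (an assumption about what might be useful, not part of the original code).

```r
library(ggplot2)

set.seed(9487)

# Name the column at creation time so aes() can refer to it directly
x_100 <- data.frame(value = rchisq(100, df = 5))

# Sample mean of the 100 draws
mean_x <- mean(x_100$value)

plot_x_100 <- ggplot(x_100, aes(x = value)) +
  geom_histogram(bins = 20) +
  # Optional: mark the sample mean with a dashed vertical line
  geom_vline(xintercept = mean_x, linetype = "dashed")

plot_x_100
```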
I'm struggling with how to do something with R that comes very easily to me in Excel: so I'm sure this is something quite basic but I'm just not aware of the equivalent method in R.
In essence, I have two variables in my dataset: a categorical variable which has a list of names, and a numeric variable that has the frequency corresponding to that particular observation.
Something like this:
Name Freq
==== =========
X 100
Y 200
and so on.
I would like to plot a bar chart with the names listed on the X-Axis (X, Y and so on) and bars of height corresponding to the relevant value of the Freq. variable for that observation.
This is something very trivial with Excel; I can just select the relevant cells and create a bar chart.
However, in R I just can't seem to figure out how to do this! The bar charts in R seem to be univariate only and don't behave the way I want them to. Trying to plot the two variables results in a scatter plot, which is not what I'm going for.
Is there something very basic I'm missing here, or is R just not capable of performing this task?
Any pointers would be very helpful.
Edited to Add:
I was primarily trying to use base R's plot function to get the job done.
Using plot(dataset1$Name, dataset1$Freq) does not lead to a bar graph but to a scatter plot instead.
First the data.
dat <- data.frame(Name = c("X", "Y"), Freq = c(100, 200))
With base R.
barplot(dat$Freq, names.arg = dat$Name)
If you want to display a long list of names.arg, maybe the best way is to customize your horizontal axis with function staxlab from package plotrix. Here are two example plots.
One, with the axis labels rotated 45 degrees.
set.seed(3)
Name <- paste0("Name_", LETTERS[1:10])
dat2 <- data.frame(Name = Name, Freq = sample(100:200, 10))
bp <- barplot(dat2$Freq)
plotrix::staxlab(1, at = bp, labels = dat2$Name, srt = 45)
Another, with the labels spread over 3 lines.
bp <- barplot(dat2$Freq)
plotrix::staxlab(1, at = bp, labels = dat2$Name, nlines = 3)
Add colors with argument col. See help("par").
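A minimal sketch of the col argument, using the two-row example data from above; the colour names and the variable bar_cols are arbitrary choices for illustration.

```r
# Same toy data as in the first example
dat <- data.frame(Name = c("X", "Y"), Freq = c(100, 200))

# One colour per bar (any valid R colour names would do)
bar_cols <- c("steelblue", "tomato")

# barplot() returns the x positions of the bar midpoints
bp <- barplot(dat$Freq, names.arg = dat$Name, col = bar_cols,
              xlab = "Name", ylab = "Freq")
```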
With ggplot2.
library(ggplot2)
ggplot(dat, aes(Name, Freq)) +
  geom_bar(stat = "identity")
To add colors you have the aesthetics colour (for the contour of the bars) and fill (for the interior of the bars).
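For example, a small sketch that maps fill to Name and sets a fixed black outline. It uses geom_col(), which is shorthand for geom_bar(stat = "identity"); the choice of black is arbitrary.

```r
library(ggplot2)

dat <- data.frame(Name = c("X", "Y"), Freq = c(100, 200))

# fill is mapped to Name, so each bar gets its own interior colour;
# colour is set outside aes(), so every bar gets the same outline
p <- ggplot(dat, aes(Name, Freq, fill = Name)) +
  geom_col(colour = "black")

p
```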
So I have 10,000 values in a vector from a Monte Carlo simulation. I want to plot this data as a histogram and a density plot. Doing this with the hist() function is easy, and it will calculate the frequency of the different values automatically. My ambition, however, is to do this in ggplot.
My biggest problem right now is how to transform the data so ggplot can handle it. I would like my x-axis to show the "price" while the y-axis shows the frequency or density. My data has a lot of decimals, as shown in the example data below.
myData <- c(266.8997, 271.5137, 225.4786, 223.3533, 258.1245, 199.5601, 234.2341, 231.7850, 260.2091, 184.5102, 272.8287, 203.7482, 212.5140, 220.9094, 221.2627, 236.3224)
My current code, using the hist() function, and the resulting plot are shown below.
hist(myData,
xlab ="Price",
prob=TRUE)
lines(density(myData))
Histogram for the data vector containing 10000 values
How would you sort the data, and how would you do this with ggplot? Should I round the numbers as well?
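Hard to say exactly without seeing a sample of your data, but have you tried the following? Note that ggplot() expects a data frame, so the vector is wrapped in one first:
df <- data.frame(Price = myData)
ggplot(df, aes(Price)) + geom_histogram()
or:
ggplot(df, aes(Price)) + geom_density()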
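Just try this (the histogram is drawn on the density scale so that it is comparable with the density curve; after_stat() needs ggplot2 3.3.0 or later):
ggplot() +
  geom_histogram(aes(myData, after_stat(density))) +
  geom_density(aes(myData))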
I am trying to plot a histogram using ggplot(), but I am unable to deal with extreme values. I would like them to be combined within one bin (called "500 and more", for example).
I have tried the scale_x_continuous(breaks = seq(0,500, by = 50)) function, but it just removes labels from the x-axis (attached below). Any ideas of how to deal with this?
I would suggest computing the counts before plotting. Using the function cut() you can set the breaks as you need and then plot the resulting data using geom_bar(). Setting width=1 inside geom_bar() removes the space between bars.
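library(dplyr)
library(ggplot2)
library(ggplot2movies)  # provides the movies data set
data("movies")
# Bins of width 50 up to 500; the last bin collects everything above 500
df <- movies %>%
  mutate(length.class = cut(length, breaks = c(seq(0, 500, 50), 10000))) %>%
  group_by(length.class) %>%
  summarise(count = n())
ggplot(df, aes(length.class, count)) +
  geom_bar(stat = "identity", width = 1)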
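To get a last bin that is literally labelled "500 and more", the labels argument of cut() can be used. The following is only a sketch on simulated data: the vector len, the label format, and the use of Inf as the upper break are assumptions made for illustration, not taken from the original data.

```r
library(ggplot2)

# Hypothetical data standing in for the real variable:
# mostly moderate values plus a few extreme ones
set.seed(1)
len <- c(rexp(1000, rate = 1 / 100), 2500, 4000, 9000)

# Breaks every 50 up to 500, then one open-ended bin
breaks <- c(seq(0, 500, by = 50), Inf)
bin_labels <- c(paste(seq(0, 450, by = 50), seq(50, 500, by = 50), sep = "-"),
                "500 and more")

# Assign each value to a labelled bin and count the values per bin
len_class <- cut(len, breaks = breaks, labels = bin_labels,
                 include.lowest = TRUE)
counts <- as.data.frame(table(len_class))  # columns: len_class, Freq

p <- ggplot(counts, aes(len_class, Freq)) +
  geom_col(width = 1) +
  labs(x = "length", y = "count")
p
```

Because the bins are computed before plotting, the x-axis is discrete, so the labels appear exactly as given in bin_labels.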