The data has 4 columns and roughly 600 rows. It is Twitter data collected using the twitteR package and then summarized into a data frame. Each tweet is given a score based on how many words from these libraries it contains, and the summary is the number of tweets that received each combination of scores. So the columns are the two types of scores, the date, and the number of tweets with those scores.
Score1 Score2 Date Number
0 0 01/10/2015 50
0 1 01/10/2015 34
1 0 01/10/2015 10
...and so on
The dates extend over a month, and either score can go to about +/- 10.
I'm trying to plot this data as a bubble plot, with score1 on the x axis, score2 on the y axis, and the size of each bubble dependent on the number (how many tweets with those scores there were per day).
My problem is that I only know how to use ggplot.
g <- ggplot(
twitterdata,
aes(x=score1, y=score2, size=number, label=""), guide=FALSE) +
geom_point(colour="black", fill="red", shape=21) +
scale_size_area(max_size = 30) +
scale_x_continuous(name="score1", limits=c(0, 10)) +
scale_y_continuous(name="score2", limits=c(-10, 10)) +
geom_text(size=4) +
theme_bw()
That just gives me the plot for all dates, and what I need is a good way to see how the data changes over time. I've looked into using sliders and selectors, but I really have no idea what the best tool would be. I've tried subsetting the data by date, which works nicely, but ideally I could make some kind of interactive graph.
I really need some way to select certain days out of that data to plot, so it doesn't all pile up on itself, and to do it interactively so it can be presented.
Any help would be greatly appreciated, thank you.
It sounds like this won't completely satisfy your use case, but an extremely low-overhead way to add some interactivity to your plot would be to install.packages('plotly') and add the following line to your code:
# your original code
g <- ggplot(
twitterdata,
aes(x=score1, y=score2, size=number, label=""),
guide=FALSE)+
geom_point(colour="black", fill="red", shape=21) +
scale_size_area(max_size = 30) +
scale_x_continuous(name="score1", limits=c(0,10)) +
scale_y_continuous(name="score2", limits=c(-10,10)) +
geom_text(size=4) +
theme_bw()
# add this line (the plotly:: prefix means no separate library(plotly) call is needed)
gg <- plotly::ggplotly(g)
Details and demos: https://plot.ly/ggplot2/
As Eric suggested, if you want sliders and such you should check out shiny. Here's a demo combining shiny with plotly: https://plot.ly/r/shiny-tutorial/
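If a slider over the dates is all you need, ggplotly() can also build one without shiny. Here is a minimal sketch, assuming the plotly package is installed; the small `twitterdata` frame below is made-up stand-in data with the column names from the question, and the dates are written in ISO format so the frames sort chronologically. The `frame` aesthetic is a plotly extension, so ggplot2 will warn that it is an unknown aesthetic; ggplotly() picks it up anyway and adds a play button and slider.

```r
library(ggplot2)
library(plotly)

# Hypothetical stand-in for the summarised tweet data (values are invented)
twitterdata <- data.frame(
  score1 = c(0, 0, 1, 0, 1, 2),
  score2 = c(0, 1, 0, 0, 1, -1),
  date   = rep(c("2015-10-01", "2015-10-02"), each = 3),
  number = c(50, 34, 10, 42, 27, 8)
)

# frame = date gives one animation frame (one slider position) per day
g <- ggplot(twitterdata, aes(x = score1, y = score2, size = number)) +
  geom_point(aes(frame = date), colour = "black", fill = "red", shape = 21) +
  scale_size_area(max_size = 30) +
  theme_bw()

gg <- ggplotly(g)
gg  # print in an interactive session to see the slider
```

For anything beyond a single slider (date ranges, several selectors), shiny is probably still the better fit.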
I hope I can explain this well. Suppose you have a fictitious data set that has 3 columns,
Car
Color
Yes/No
Each row is an observation that indicates whether the user likes their car model and color. I'd like to create a chart that shows each car model on the x axis, with one line per color, where the y value is the percentage liked (Yes) out of the total for that combination of car and color.
What is the best approach to do this in R? I'm thinking this could be useful in general wherever the response is Yes/No and you want to show an interaction between two categorical features.
Thanks!
Ok, this is what I ended up doing. It seems to do what I need it to do ;0
PS - I'm not sure if it's appropriate to answer my own question. Thanks for the comments!
prop <- data.frame(prop.table(table(data$outcome,data$Factor1,data$Factor2),3))
names(prop) <- c("Outcome","Factor1","Factor2","Percentage")
# Remove No Percent
prop <- prop[which(prop$Outcome=="Yes"),]
# Bar plot
ggplot(data=prop, aes(x=Factor1, y=Percentage, fill=Factor2)) +
geom_bar(stat="identity", position=position_dodge())+
scale_fill_brewer(palette="Paired")+
theme_minimal()
# Line Plot
ggplot(data=prop, aes(x=Factor1, y=Percentage, group=Factor2)) +
geom_line(aes(color=Factor2))+
geom_point(aes(color=Factor2))
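One thing worth double-checking in the code above: with a margin of 3, prop.table() divides by the total for each Factor2 level, so the "Yes" values are shares of all rows with that Factor2 rather than shares within each Factor1/Factor2 combination. If the goal is the percentage liked per car/color combination, as in the question, margin = c(2, 3) may be closer. A small sketch with invented data (the `likes` frame and its values are hypothetical):

```r
# Hypothetical example data: one row per response
likes <- data.frame(
  car   = c("A", "A", "A", "A", "B", "B", "B", "B"),
  color = c("red", "red", "blue", "blue", "red", "red", "blue", "blue"),
  liked = c("Yes", "No", "Yes", "Yes", "No", "No", "Yes", "No")
)

# Three-way table: outcome x car x color
tab <- table(Outcome = likes$liked, Car = likes$car, Color = likes$color)

# margin = c(2, 3): proportions sum to 1 within each car/color combination
prop <- as.data.frame(prop.table(tab, margin = c(2, 3)),
                      responseName = "Proportion")

# Keep only the share of "Yes" answers
prop_yes <- prop[prop$Outcome == "Yes", ]
prop_yes
```

The resulting `prop_yes` can be passed to the same bar and line plots, mapping Car to x, Proportion to y and Color to the group. Combinations with no observations at all come out as NaN, so they may need to be dropped first.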
I would like to make a graph in R that I managed to make in Excel. It is a bar graph with species on the x-axis and the log number of observations on the y-axis. My current data structure in R is (I think) not suitable for making this graph, but I do not know how to change it in a smart way.
I have (amongst others) the columns 'camera_site' (site1, site2, ...), 'species' (agouti, paca, ...) and 'count' (1, 2, ...), with about 50,000 observations.
I tried making a data frame with a column 'species' (with 18 species) and a column with 'log(total observation)' for each species (see dataframe), but then I can only make a point graph.
This is how I would like the graph to look:
[image: desired graph made in Excel]
Your data seems to be in the correct format from what I can tell from your screenshot.
The minimum amount of code you would need to get a plot like that would be the following, assuming your data.frame is called df:
ggplot(df, aes(VRM_species, log_obs_count_vrm)) +
geom_col()
Many people intuitively try geom_bar(), but geom_col() is equivalent to geom_bar(stat = "identity"), which you would use if you've pre-computed observations and don't need ggplot to do the counting for you.
But you could probably decorate the plot a bit better with some additions:
ggplot(df, aes(VRM_species, log_obs_count_vrm)) +
geom_col() +
scale_x_discrete(name = "Species") +
scale_y_continuous(name = expression("Log"[10]*" Observations"),
expand = c(0,0,0.1,0)) +
theme(axis.text.x = element_text(angle = 90))
Of course, you could customize the theme any way you like.
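If you ever need to rebuild the summary from the raw observations, here is a rough sketch of one way to do it in base R. It assumes the raw columns are called `species` and `count` as in the question; the `obs` data frame and its values are invented, and log10 is an assumption about which log you want.

```r
library(ggplot2)

# Hypothetical raw observations (one row per camera record)
obs <- data.frame(
  camera_site = c("site1", "site1", "site2", "site2", "site3"),
  species     = c("agouti", "paca", "agouti", "agouti", "paca"),
  count       = c(1, 2, 3, 6, 98)
)

# Total count per species, then take the log of the totals
totals <- aggregate(count ~ species, data = obs, FUN = sum)
totals$log_count <- log10(totals$count)

# One bar per species; geom_col() plots the pre-computed values as they are
p <- ggplot(totals, aes(species, log_count)) +
  geom_col() +
  labs(x = "Species", y = "Log10 observations")
p
```

Note that this takes the log of the summed counts, not the sum of logged counts, which is usually what a "log number of observations" axis means.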
Regards
I am trying to make a count plot from RNA-seq data for individual genes. I am only interested in the comparisons between each treatment and the control group, and my data are paired, so I'm trying to show this. I have managed to make the graph on the left (Counts of single gene) by using the plotCounts function of DESeq2 and then modifying the graph a bit. The code is the following:
data <- plotCounts(dds, gene="GB41122", intgroup=c("Treatment", "Home", "Behaviour"), returnData=TRUE)
data <- ggplot(data, aes(x=Treatment, y=count, shape = Behaviour, color=Home, group=Home)) +
scale_y_log10() +
geom_point() + geom_line()
How could this be modified so that the graph looks like the one to the right?
Also, how can I reorder the treatment levels so that I have ctr to the left, then CO1 and CO2 to the right?
Thank you!
Andrea
I don't know how to change the lines, but to reorder the treatment levels, try adding this to your code:
+ scale_x_discrete(limits=c("Ctr", "CO1", "CO2"))
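An alternative that fixes the order in the data itself (so every later plot picks it up) is to set the factor levels before plotting. A minimal sketch, assuming the treatment labels are spelled exactly "Ctr", "CO1" and "CO2"; the `counts` data frame here is an invented stand-in for what plotCounts(..., returnData=TRUE) returns.

```r
# Hypothetical stand-in for the data frame returned by plotCounts()
counts <- data.frame(
  Treatment = c("CO2", "Ctr", "CO1", "Ctr", "CO2", "CO1"),
  count     = c(120, 80, 95, 70, 150, 110)
)

# Levels are plotted in this order on a discrete x axis
counts$Treatment <- factor(counts$Treatment, levels = c("Ctr", "CO1", "CO2"))
levels(counts$Treatment)
```

Any label that does not match one of the listed levels exactly (for example "ctr" in lower case) becomes NA, so it is worth checking the spelling with unique() first.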
I am fairly new to R and ggplot2 and am having some trouble plotting multiple variables in the same histogram plot.
My data is already grouped and just needs to be plotted. The data is by week and I need to plot the number for each category (A, B, C and D).
Date A B C D
01-01-2011 11 0 11 1
08-01-2011 12 0 3 3
15-01-2011 9 0 2 6
I want the Dates as the x axis and the counts plotted as different colors according to a generic y axis.
I am able to plot just one of the categories at a time, but am not able to find an example like mine.
This is what I use to plot one category. I am pretty sure I need to use position="dodge" to plot multiple as I don't want it to be stacked.
ggplot(df, aes(x=Date, y=A)) + geom_histogram(stat="identity") +
labs(title = "Number in Category A") +
ylab("Number") +
xlab("Date") +
theme(axis.text.x = element_text(angle = 90))
Also, this gives me a histogram with spaces in between the bars. Is there any way to remove this? I tried spaces=0 as you would do when plotting bar graphs, but it didn't seem to work.
I read some previous questions similar to mine, but the data was in a different format and I couldn't adapt it to fit my data.
This is some of the help I looked at:
Creating a histogram with multiple data series using multhist in R
http://www.cookbook-r.com/Graphs/Plotting_distributions_%28ggplot2%29/
I'm also not quite sure what the bin width is. I think it is how the data should be spaced or grouped, which doesn't apply to my question since it is already grouped. Please advise me if I am wrong about this.
Any help would be appreciated.
Thanks in advance!
You're not really plotting histograms, you're just plotting a bar chart that looks kind of like a histogram. I personally think this is a good case for faceting:
library(ggplot2)
library(reshape2) # for melt()
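melt_df <- melt(df, id.vars = "Date") # long format: Date, variable, value
head(melt_df) # so you can see it
ggplot(melt_df, aes(Date, value, fill = Date)) +
geom_col() + # values are pre-computed; current ggplot2 rejects geom_bar() with a y aesthetic
facet_wrap(~ variable)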
However, I think that, in general, changes over time are much better represented by a line chart:
ggplot(melt_df,aes(Date,value,group=variable,color=variable)) + geom_line()
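If you do want the side-by-side bars from the question, something along these lines should work. This is a sketch using the three sample rows from the question, and it assumes the dates are in day-month-year order; converting them to Date objects keeps the x axis in chronological rather than alphabetical order.

```r
library(ggplot2)
library(reshape2) # for melt()

# The sample rows from the question
df <- data.frame(
  Date = c("01-01-2011", "08-01-2011", "15-01-2011"),
  A = c(11, 12, 9),
  B = c(0, 0, 0),
  C = c(11, 3, 2),
  D = c(1, 3, 6)
)

# Long format: one row per Date/category pair
melt_df <- melt(df, id.vars = "Date")

# Assumption: the strings are day-month-year
melt_df$Date <- as.Date(melt_df$Date, format = "%d-%m-%Y")

# One bar per category, placed next to each other within each week
p <- ggplot(melt_df, aes(Date, value, fill = variable)) +
  geom_col(position = "dodge") +
  labs(x = "Date", y = "Number", fill = "Category") +
  theme(axis.text.x = element_text(angle = 90))
p
```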
I'm an undergrad researcher and I've been teaching myself R over the past few months. I just started trying ggplot, and have run into some trouble. I've made a series of boxplots looking at the depth of fish at different acoustic receiver stations. I'd like to add a scatterplot that shows the depths of the receiver stations. This is what I have so far:
data <- read.csv(".....MPS.csv", header=TRUE)
df <- data.frame(f1=factor(data$Tagging.location),
f2=factor(data$Station),data$Detection.depth)
df2 <- data.frame(f2=factor(data$Station), data$depth)
df$f1f2 <- interaction(df$f1, df$f2)
plot1 <- ggplot(aes(y = data$Detection.depth, x = f2, fill = f1), data = df) +
geom_boxplot() + stat_summary(fun.data = give.n, geom = "text",
position = position_dodge(height = 0, width = 0.75), size = 3)
plot1+xlab("MPS Station") + ylab("Depth(m)") +
theme(legend.title=element_blank()) + scale_y_reverse() +
coord_cartesian(ylim=c(150, -10))
plot2 <- ggplot(aes(y=data$depth, x=f2), data=df2) + geom_point()
plot2+scale_y_reverse() + coord_cartesian(ylim=c(150,-10)) +
xlab("MPS Station") + ylab("Depth (m)")
Unfortunately, since I'm a new user in this forum, I'm not allowed to upload images of these two plots. My x-axis is "Stations" (which has 12 options) and my y-axis is "Depth" (0-150 m). The boxplots are colour-coded by tagging site (which has 2 options). The depths are coming from two different columns in my spreadsheet, and they cannot be combined into one.
My goal is to combine those two plots by adding "plot2" (the station depth scatterplot) to the "plot1" boxplots (detection depths). They both look at the same variables (depth and station) and must share the same y-axis scale.
I think I could figure out a messy workaround if I were using the R base program, but I would like to learn ggplot properly, if possible. Any help is greatly appreciated!
Update: I was confused by the language used in the original post, and wrote a slightly more complicated answer than necessary. Here is the cleaned up version.
Step 1: Setting up. Here, we make sure the depth values in both data frames have the same variable name (for readability).
df <- data.frame(f1=factor(data$Tagging.location), f2=factor(data$Station), depth=data$Detection.depth)
df2 <- data.frame(f2=factor(data$Station), depth=data$depth)
Step 2: Now you can plot this with the 'ggplot' function and split the data by using the `col=f1` argument. We'll plot the detection data separately, since that requires a boxplot, and then we'll plot the depths of the stations with colored points (assuming each station only has one depth). We specify the two different layers by referencing the data from within the 'geom' functions, instead of specifying the data inside the main 'ggplot' function. It should look something like this:
ggplot() +
geom_boxplot(data=df, aes(x=f2, y=depth, col=f1)) +
geom_point(data=df2, aes(x=f2, y=depth), colour="blue") +
scale_y_reverse()
In this plot example, we use boxplots to represent the detection data and color those boxplots by the site label. The stations, however, we plot separately using a specific color of points, so we will be able to see them clearly in relation to the boxplots.
You should be able to adjust the plot from here to suit your needs.
I've created some dummy data and loaded it into the chart to show you what it would look like. Keep in mind that this is purely random data and doesn't really make sense.
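For reference, a sketch of how such dummy data could be generated and plotted. Everything here is invented for illustration (station names, site labels, depths); only the column names follow the question. Because `df2` built from the full data has one row per detection, unique() is used so that each station contributes a single point.

```r
library(ggplot2)

set.seed(1)

# Hypothetical stations and their (single) depths in metres
station_depth <- c(S1 = 40, S2 = 80, S3 = 120, S4 = 60)

# Random detections: tagging site and station per row
data <- data.frame(
  Tagging.location = sample(c("Site A", "Site B"), 80, replace = TRUE),
  Station = sample(names(station_depth), 80, replace = TRUE)
)
data$depth <- unname(station_depth[data$Station])
# Detection depths fall somewhere between the surface and the station depth
data$Detection.depth <- runif(80, min = 0, max = data$depth)

df <- data.frame(f1 = factor(data$Tagging.location),
                 f2 = factor(data$Station),
                 depth = data$Detection.depth)
# One row per station, so each station gets exactly one point
df2 <- unique(data.frame(f2 = factor(data$Station), depth = data$depth))

p <- ggplot() +
  geom_boxplot(data = df, aes(x = f2, y = depth, colour = f1)) +
  geom_point(data = df2, aes(x = f2, y = depth), colour = "blue", size = 3) +
  scale_y_reverse() +
  labs(x = "MPS Station", y = "Depth (m)", colour = NULL)
p
```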