I'm currently working with a dataframe which has this structure:
Date        Term      Frequency
2022-10-28  politics  42
2022-10-26  biology   69
It was generated to summarize the frequency of a certain word by date, from a larger database of social media posts.
Here's example data:
examp.data <- data.frame(
date = as.Date(c("2012-01-01", "2013-09-23", "2014-11-15", "2014-05-11",
"2015-03-27")),
term = c("engineering","biology","physics","mathematics","computer"),
freq = c(732,917,241,601,692),
stringsAsFactors = FALSE
)
The goal is to produce a plot that looks like this
from one that currently looks like this:
I assumed I could achieve this by creating new variables (columns), one per word, and then plotting them against the same x axis (dates). But I can't figure out a way to transform the data to do it.
I don't think you need to transform the data. You can just use ggplot aesthetics.
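library(dplyr)    # provides %>%
library(ggplot2)
# map term to color so each term gets its own line
examp.data %>%
  ggplot() +
  aes(date, freq, color = term) +
  geom_line()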
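With the example data in the question, each term has only one row, so geom_line() has nothing to connect. A minimal sketch of the same idea on made-up data with several dates per term follows; the `demo` frame and its values are assumptions for illustration, not data from the question.

```r
library(ggplot2)

# Made-up data: three dates for each of two terms (values are illustrative only)
demo <- data.frame(
  date = rep(as.Date(c("2022-10-26", "2022-10-27", "2022-10-28")), times = 2),
  term = rep(c("politics", "biology"), each = 3),
  freq = c(42, 35, 50, 69, 61, 58)
)

# One line per term; the points keep terms with a single date visible too
p <- ggplot(demo, aes(x = date, y = freq, color = term)) +
  geom_line() +
  geom_point()
p
```

The data stays in long format (one row per date and term), so no extra columns per word are needed.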
Related
I have several datasets, and my end goal is to graph them with each line representing the yearly variation for a given variable. I have joined and combined my data (originally in a per-month structure) into a table that contains only the yearly means for each item I want to graph (a column for the year, plus the yearly variation for 4 different elements).
I have one factor, the year, and 4 different variables that record yearly variations, and I would like to graph them in the same plot. My idea was to join the 4 columns into one, keyed by the factor (collapsing to one observation per row, with the year alongside), but I seem unable to do that. My thought is that this would give a structure to my y axis. I would like some advice, and to know whether my approach to the problem is sound. I am trying ggplot2, but it does not seem to work without a defined (or predefined range for the) y axis. Thanks
I would suggest the following approach. You have to reshape your data from wide to long, as in the next example. That way all the variables can be shown in one plot. As no data is provided, this solution is sketched using dummy data. You can also swap the lines for another geom, such as points:
library(tidyverse)
set.seed(123)
#Data
df <- data.frame(year=1990:2000,
v1=rnorm(11,2,1),
v2=rnorm(11,3,2),
v3=rnorm(11,4,1),
v4=rnorm(11,5,2))
#Plot
df %>% pivot_longer(-year) %>%
ggplot(aes(x=factor(year),y=value,group=name,color=name))+
geom_line()+
theme_bw()
Output:
We could use melt from reshape2 without loading multiple other packages
library(reshape2)
library(ggplot2)
ggplot(melt(df, id.var = 'year'), aes(x = factor(year), y = value,
group = variable, color = variable)) +
geom_line()
-output plot
Or with matplot from base R
matplot(as.matrix(df[-1]), type = 'l', xaxt = 'n')
axis(1, at = seq_len(nrow(df)), labels = df$year)  # label the x-axis with the years
data
set.seed(123)
df <- data.frame(year=1990:2000,
v1=rnorm(11,2,1),
v2=rnorm(11,3,2),
v3=rnorm(11,4,1),
v4=rnorm(11,5,2))
I am trying to calculate the city-wise spend on each product on a yearly basis. I also want a graphical representation, but I am not able to get the graphs in R.
Top_11 <- aggregate(Ca_spend["Amount"],
by = Ca_spend[c("City","Product","Month_Year")],
FUN="sum")
A <- ggplot(Top_11,aes(x=City,Month_Year,y=Amount))
A <-geom_bar(stat="identity",position='dodge',fill="firebrick1",colour="black")
A <- A+facet_grid(.~Type)
This is the code I am using. I am trying to plot City, Product and Year on the same graph.
VARIABLES-(City product Month_Year Amount)
(OBSERVATIONS)- New York Gold 2004 $50,0000 (Sample DATA Type)
I'd try this:
ggplot(Top_11,aes(x=City, fill = Product, y=Amount)) +
geom_col() +
facet_wrap(~Month_Year)
For your 5 rows of sample data, that gives the graph below. You can play around with which variable goes to fill (fill color), x (x-axis), and facet_wrap (for small multiples). I see in your code you tried facet_grid(.~Type), but that won't work unless you have a column named Type.
I am currently learning R, and I have an R data frame containing data I scraped from a football website.
There are 58 columns (variables, attributes) for each row. Of these variables, I wish to plot 3 in a single bar chart. The 3 important variables are 'Name', 'Goals.with.right.foot' and 'Goals.with.left.foot'.
What I want to build is a bar chart with each 'Name' on the x-axis and 2 separate bars representing the other 2 variables.
Sample row entry:
{......., RONALDO, 10(left), 5(right),............}
I have tried playing around a lot with ggplot2 geom_bar with no success.
I have also searched for similar questions, but I cannot understand the answers. Is anyone able to explain simply how to solve this problem?
My data frame is called 'Forwards' and holds the strikers in a game of football. They have the attributes Name, Goals.with.left.foot and Goals.with.right.foot.
barplot(counts, main="Goals",
xlab="Goals", col=c("darkblue","red"),
legend = rownames(counts))
You could try it this way:
I simulated a frame as a stand in for yours, just replace it with a frame containing the columns you're interested in:
df <- data.frame(names = letters[1:5], r.foot = runif(5,1,10), l.foot = runif(5,1,10))
# transform your df to long format
library(reshape2)
plotDf <- melt(df, id.vars = 'names', variable.name = 'footing', value.name = 'goals')
# plot it
library(ggplot2)
ggplot(plotDf, aes(x = names, y = goals, group = footing, fill = footing)) +
geom_col(position = position_dodge()) #does the same as geom_bar, but uses stat_identity instead of stat_count
Results in this plot:
This works because ggplot expects one variable containing the values for the y-axis and one or more variables containing the grouping factor(s).
With the melt function, your data.frame is converted into the so-called 'long format', which is exactly the layout of data that is needed.
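As an alternative sketch of the same wide-to-long step, tidyr's pivot_longer() can be used instead of melt. The stand-in frame below reuses the column names of the simulated data; the numbers are made up for illustration.

```r
library(tidyr)

# Stand-in data with made-up goal counts; column names mirror the simulated frame
df <- data.frame(names  = letters[1:5],
                 r.foot = c(3, 7, 2, 9, 5),
                 l.foot = c(1, 4, 6, 2, 8))

# Stack r.foot and l.foot into one value column, with a label column beside it
long <- pivot_longer(df, cols = c(r.foot, l.foot),
                     names_to = "footing", values_to = "goals")
long
```

Each player now occupies two rows, one per foot, which is the shape the fill and group aesthetics rely on.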
I have a dataset with a few organisms, which I would like to plot on my y-axis, against date, which I would like to plot on the x-axis. However, I want the fluctuation of the curve to represent the abundance of the organisms. I.e I would like to plot a time series with the relative abundance separated by the organism to show similar patterns with time.
However, of course, plotting just date against an organism does not yield any information on the abundance. So, my question is, is there a way to make the curve represent abundance using ggridges?
Here is my code for an example dataset:
set.seed(1)
Data <- data.frame(
Abundance = sample(1:100),
Organism = sample(c("organism1", "organism2"), 100, replace = TRUE)
)
Date = rep(seq(from = as.Date("2016-01-01"), to = as.Date("2016-10-01"), by =
'month'),times=10)
Data <- cbind(Date, Data)
ggplot(Data, aes(x = Abundance, y = Organism)) +
geom_density_ridges(scale=1.15, alpha=0.6, color="grey90")
This produces a plot with the two organisms, however, I want the date on the x-axis and not abundance. However, this doesn't work. I have read that you need to specify group=Date or change date into julian day, however, this doesn't change the fact that I do not get to incorporate abundance into the plot.
Does anyone have an example of a plot with date vs. a categorical variable (i.e. organism) plotted against a continuous variable in ggridges?
I really like the output from ggridges and would like to be able to use it for these visualizations. Thank you in advance for your help!
Cheers,
Anni
To use geom_density_ridges, it'll help to reshape the data so that each observation sits in its own row, rather than being summarized as a count in Abundance.
library(ggplot2); library(ggridges); library(dplyr)
# Uncount copies the row "Abundance" number of times
Data_sum <- Data %>%
tidyr::uncount(Abundance)
ggplot(Data_sum, aes(x = Date, y = Organism)) +
ggridges::geom_density_ridges(scale=1, alpha=0.6, color="grey90")
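To make the reshaping step concrete, here is a small sketch of what tidyr::uncount does on a two-row frame. The `small` frame and its counts are made up for illustration.

```r
library(tidyr)

# Made-up summary data: one row per date/organism with a count
small <- data.frame(
  Date      = as.Date(c("2016-01-01", "2016-02-01")),
  Organism  = c("organism1", "organism2"),
  Abundance = c(3, 1)
)

# Repeat each row Abundance times; the Abundance column is dropped by default
expanded <- uncount(small, Abundance)
expanded
```

After this step, dates with a higher abundance simply appear more often, so a density over Date rises where the organism was more abundant.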
So I'm having trouble creating a dot plot/bar graph of this data set I have. My data set looks like this. I want an output that looks like this. However, geom_bar() through ggplot will only give me counts, and won't take the individual decimal values from the table. I've tried using Plotly as well, but it doesn't seem to scale well to plots with multiple players.
I've already set up a larger data frame with 200+ variables. I'm trying to make something that can search for specific players in that data frame, and then create a plot from it. Consequently, I'm ideally looking for something that can easily handle 5-10 different series.
Any help would be greatly appreciated.
Thanks!
This is pretty straightforward. The key is to get your data from its current wide format into the long format that is more useful for plotting in R, and to use geom_point rather than geom_bar.
First, some reproducible example data (that you should use again in your question if you post another question here, makes it much easier for others to help you):
library(ggplot2)
library(reshape2)
dataset <- data.frame(
PlayerName = letters[1:6],
IsolationPossG = runif(6),
HandoffPossG = runif(6),
OffScreenPossG = runif(6)
)
This is your current data, in the wide format:
dataset
  PlayerName IsolationPossG HandoffPossG OffScreenPossG
1          a     0.78184751  0.939183520     0.74461784
2          b     0.06557433  0.745699149     0.96540299
3          c     0.21105745  0.753534811     0.02977973
4          d     0.41271918  0.555475622     0.18317886
5          e     0.38153149  0.246292074     0.74862310
6          f     0.89946318  0.008412111     0.53195933
Now we convert to the long format:
molten <- melt(
dataset,
id.vars = "PlayerName",
measure.vars = c("IsolationPossG", "HandoffPossG", "OffScreenPossG")
)
Here is the long format, much more useful for plotting in R:
head(molten)
  PlayerName       variable      value
1          a IsolationPossG 0.78184751
2          b IsolationPossG 0.06557433
3          c IsolationPossG 0.21105745
4          d IsolationPossG 0.41271918
5          e IsolationPossG 0.38153149
6          f IsolationPossG 0.89946318
Here's how to plot it:
ggplot(molten, aes(x = variable, y = value, colour = PlayerName)) +
geom_point(size = 4) +
theme_bw() +
theme(legend.position="bottom",legend.direction="horizontal")
Which gives:
h/t how to have multple labels in ggplot2 for bubble plot
If you want the shape of the data point to vary by name, as your example image shows (but it seems rather excessive to have the player name variable on two of the plot's aesthetics):
ggplot(molten, aes(x = variable, y = value, shape = PlayerName, colour = PlayerName)) +
geom_point(size = 4) +
theme_bw() +
theme(legend.position="bottom",legend.direction="horizontal")
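Since the question mentions picking specific players out of a larger data frame, one possible sketch is to subset before melting. The helper name `melt_players` and the seed are assumptions for illustration only; they are not part of the original answer.

```r
library(reshape2)

set.seed(42)  # arbitrary seed so the made-up values are reproducible
dataset <- data.frame(
  PlayerName     = letters[1:6],
  IsolationPossG = runif(6),
  HandoffPossG   = runif(6),
  OffScreenPossG = runif(6)
)

# Hypothetical helper: keep only the requested players, then reshape to long
melt_players <- function(data, players) {
  selected <- data[data$PlayerName %in% players, ]
  melt(selected, id.vars = "PlayerName")
}

molten_sel <- melt_players(dataset, c("a", "c", "f"))
molten_sel
```

The result can be passed to the same ggplot call in place of molten, so the plot only shows the chosen series.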