How to plot expenditure vs year in R - r

I have a dataset which has about 100,000 datapoints.
I want to plot two columns.
Y axis - Year
X axis - Sales
Sample Data:
Sales Year
22 2016
10 2016
3.99 2017
8.99 2017
12.99 2017
8.00 2016
12.00 2017
5.00 2016
22 2017
50 2016
53 2017
I'm using the following code:
plot(subset_4$SALES ~ subset_4$YEAR)
But the plot doesn't look great. Is there any nicer way of doing this?
Update: plot(subset_4$SALES ~ subset_4$WEEKS)

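You can try the ggplot2 library:
library(ggplot2)
# Build a data frame from the two columns used in the question
df <- data.frame(sales = subset_4$SALES, year = subset_4$YEAR)
# factor(year) gives each year its own discrete colour
ggplot(df, aes(x = sales, y = year, color = factor(year))) +
  geom_point() +
  xlab("Sales") +
  ylab("Year")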
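With around 100,000 points and only a couple of distinct years, a plain scatter plot tends to overplot heavily. One possible alternative, sketched here on the sample rows from the question, is a box plot per year with semi-transparent jittered points on top. It follows the question's plot() formula (sales on the y axis) rather than the axis labels stated in the text, and the column names sales and year are assumptions.

```r
library(ggplot2)

# Sample rows from the question; the full data set is assumed to have the same shape.
df <- data.frame(
  sales = c(22, 10, 3.99, 8.99, 12.99, 8.00, 12.00, 5.00, 22, 50, 53),
  year  = c(2016, 2016, 2017, 2017, 2017, 2016, 2017, 2016, 2017, 2016, 2017)
)

# Treat year as a category so that each year gets its own box.
p <- ggplot(df, aes(x = factor(year), y = sales)) +
  geom_boxplot(outlier.shape = NA) +        # hide outlier marks; the jittered points show them
  geom_jitter(width = 0.2, alpha = 0.3) +   # transparency reduces overplotting
  labs(x = "Year", y = "Sales")
print(p)
```

For the full data set a lower alpha (for example 0.05) may be needed before the dense regions become readable.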

Related

Two line graphs in the same plot in R

I have a large data frame. I am trying to plot sales for two different years in the same plot as a line graph, to show the month-by-month variation across the two years. I did a long series of grouping and filtering steps before arriving at this data frame.
The data frame has 3 columns (month, sales and year).
When I try to plot the sales across the different years as:
ggplot(df,aes(x=month.sales,y=sales/100000,color=year)) +
geom_line()
I get a blank graph with only the x and y labels, whereas a column graph works.
Please help.
Thank you.
I'm guessing your data looks something like this:
set.seed(69)
df <- data.frame(month.sales = factor(rep(month.abb, 2), month.abb),
year = rep(2018:2019, each = 12),
sales = runif(24, 1, 2) * 100000)
df
#> month.sales year sales
#> 1 Jan 2018 114570.1
#> 2 Feb 2018 123197.1
#> 3 Mar 2018 166092.7
#> 4 Apr 2018 163214.1
#> 5 May 2018 109486.6
#> 6 Jun 2018 131429.8
#> 7 Jul 2018 167363.6
#> 8 Aug 2018 191097.6
#> 9 Sep 2018 127427.4
#> 10 Oct 2018 145360.1
#> 11 Nov 2018 134577.1
#> 12 Dec 2018 169486.6
#> 13 Jan 2019 168493.2
#> 14 Feb 2019 147552.5
#> 15 Mar 2019 139811.3
#> 16 Apr 2019 156351.2
#> 17 May 2019 199368.3
#> 18 Jun 2019 130953.6
#> 19 Jul 2019 148150.5
#> 20 Aug 2019 166307.3
#> 21 Sep 2019 121830.8
#> 22 Oct 2019 101838.1
#> 23 Nov 2019 109716.9
#> 24 Dec 2019 125407.9
In which case you can draw a line plot like this:
library(ggplot2)
ggplot(df, aes(x = month.sales, y = sales / 100000,
color = factor(year), group = factor(year))) +
geom_line()
Note that you need to add the group aesthetic so that ggplot doesn't automatically group your data points according to the factor levels on the x axis.
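An alternative to the explicit group aesthetic, sketched below under the same guessed data, is to put a numeric month on the x axis. With a continuous x, ggplot groups only by the discrete colour variable, so each year gets its own line. The column month_num is a name introduced here for illustration.

```r
library(ggplot2)

set.seed(69)
df <- data.frame(month.sales = factor(rep(month.abb, 2), month.abb),
                 year = rep(2018:2019, each = 12),
                 sales = runif(24, 1, 2) * 100000)

# Factor codes follow the level order, so Jan = 1, ..., Dec = 12.
df$month_num <- as.integer(df$month.sales)

p <- ggplot(df, aes(x = month_num, y = sales / 100000, color = factor(year))) +
  geom_line() +
  scale_x_continuous(breaks = 1:12, labels = month.abb) +  # show month names again
  labs(x = "Month", color = "Year")
print(p)
```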

How to plot monthly data with months and years on the x-axis in RStudio

I have a dataframe where column 1 are Months, column 2 are Years and column 3 are precipitation values.
I want to plot the precipitation values for EACH month and EACH year.
My data goes from January 1961 to February 2019.
How can I plot that?
Here is my data:
If I use this:
plot(YearAn,PPMensual,type="l",col="red",xlab="años", ylab="PP media anual")
I get this:
This is wrong because it puts all the monthly values in every single year! What I'm looking for is an x axis that looks like "JAN-1961, FEB-1961, ... until FEB-2019".
It can be done easily using the ggplot2/tidyverse packages.
First let's load the packages (ggplot2 is part of the tidyverse) and create some sample data:
library(tidyverse)
set.seed(123)
df <- data.frame(month = rep(c(1:12), 2),
year = rep(c("1961", "1962"),
each = 12),
ppmensual = rnorm(24, 5, 2))
Now we can plot the data (df):
df %>%
ggplot(aes(month, ppmensual,
group = year,
color = year)) +
geom_line()
Using lubridate and ggplot2, but with no grouping:
Setup
library(dplyr) # for tibble() and %>%
library(lubridate) # for make_date()
library(ggplot2) # for the graphic
df <- tibble(month = rep(month.name, 40),
year = rep(c(1961:2000), each = 12),
PP = runif(12*40) * runif(12*40) * 10) # PP data is random here
print(df, n = 20)
month year PP
<chr> <int> <dbl>
1 January 1961 5.42
2 February 1961 0.855
3 March 1961 5.89
4 April 1961 1.37
5 May 1961 0.0894
6 June 1961 2.63
7 July 1961 1.89
8 August 1961 0.148
9 September 1961 0.142
10 October 1961 3.49
11 November 1961 1.92
12 December 1961 1.51
13 January 1962 5.60
14 February 1962 1.69
15 March 1962 1.14
16 April 1962 1.81
17 May 1962 8.11
18 June 1962 0.879
19 July 1962 4.85
20 August 1962 6.96
# … with 460 more rows
Graph
# match() turns month names into 1-12; factor() would number them alphabetically
df %>%
  ggplot(aes(x = make_date(year, match(month, month.name)), y = PP)) +
  geom_line() +
  xlab("años")
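To get axis labels in the "JAN-1961" style the question asks for, one option is to build a real Date column and let scale_x_date() format the labels. The sketch below uses made-up precipitation values and assumes the month is stored as a number from 1 to 12; the column name date is introduced here for illustration.

```r
library(ggplot2)

set.seed(1)
df <- data.frame(month = rep(1:12, times = 3),
                 year  = rep(1961:1963, each = 12),
                 PP    = runif(36) * 10)   # made-up precipitation values

# One real date per row: the first day of each month.
df$date <- as.Date(sprintf("%d-%02d-01", df$year, df$month))

p <- ggplot(df, aes(x = date, y = PP)) +
  geom_line() +
  scale_x_date(date_breaks = "6 months", date_labels = "%b-%Y") +  # e.g. Jan-1961 in an English locale
  labs(x = "Month", y = "Precipitation")
print(p)
```

For a series as long as 1961 to 2019, a wider spacing such as date_breaks = "5 years" would probably keep the labels legible.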

Boxplot not plotting all data

I'm trying to plot a boxplot for a time series (e.g. http://www.r-graph-gallery.com/146-boxplot-for-time-series/) and can get every other example to work, bar my last one. I have averages per month for six years (2011 to 2016) and have data for 2014 and 2015 (albeit in small quantities), but for some reason, boxes aren't being shown for the 2014 and 2015 data.
My input data has three columns: year, month and residency index (a value between 0 and 1). There are multiple individuals (in this example, 37) each with an average residency index per month per year (including 2014 and 2015).
For example:
year month RI
2015 1 NA
2015 2 NA
2015 3 NA
2015 4 NA
2015 5 NA
2015 6 NA
2015 7 0.387096774
2015 8 0.580645161
2015 9 0.3
2015 10 0.225806452
2015 11 0.3
2015 12 0.161290323
2016 1 0.096774194
2016 2 0.103448276
2016 3 0.161290323
2016 4 0.366666667
2016 5 0.258064516
2016 6 0.266666667
2016 7 0.387096774
2016 8 0.129032258
2016 9 0.133333333
2016 10 0.032258065
2016 11 0.133333333
2016 12 0.129032258
which is repeated for each individual fish.
My code:
#make boxplot
boxplot(RI$RI~RI$month+RI$year,
xaxt="n",xlab="",col=my_colours,pch=20,cex=0.3,ylab="Residency Index (RI)", ylim=c(0,1))
abline(v=seq(0,12*6,12)+0.5,col="grey")
axis(1,labels=unique(RI$year),at=seq(6,12*6,12))
The average trend line works as per the other examples.
a=aggregate(RI$RI,by=list(RI$month,RI$year),mean, na.rm=TRUE)
lines(a[,3],type="l",col="red",lwd=2)
Any help on this matter would be greatly appreciated.
Your problem seems to be the presence of missing values (NA) in your data; the other values are plotted correctly. I've simplified your code a bit.
boxplot(RI$RI ~ RI$month + RI$year,
ylab="Residency Index (RI)")
a <- aggregate(RI ~ month + year, data = RI, FUN = mean, na.rm = TRUE)
lines(c(rep(NA, 6), a[,3]), type="l", col="red", lwd=2)
Also, I believe that maybe a boxplot is not the best way to depict your data. You only have one value per year/month, when a boxplot would require more. Maybe a simple scatter plot will do better.
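A quick way to check which month/year groups have no usable values is to count the non-missing observations per group. The sketch below rebuilds only the sample rows shown in the question (one fish, 2015 and 2016); the real data set has more rows, and the object name counts is introduced here for illustration.

```r
# Sample rows from the question; the real data has more individuals and years.
RI <- data.frame(
  year  = rep(c(2015, 2016), each = 12),
  month = rep(1:12, times = 2),
  RI    = c(rep(NA, 6),
            0.387096774, 0.580645161, 0.3, 0.225806452, 0.3, 0.161290323,
            0.096774194, 0.103448276, 0.161290323, 0.366666667, 0.258064516, 0.266666667,
            0.387096774, 0.129032258, 0.133333333, 0.032258065, 0.133333333, 0.129032258)
)

# Number of non-NA values in each year/month cell; a 0 means boxplot() has nothing to draw there.
counts <- tapply(!is.na(RI$RI), list(year = RI$year, month = RI$month), sum)
print(counts)
```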

How to improve my R syntax for ggplot2 to produce a line chart rather than a dotplot?

I am new to R and I am plotting a chart using ggplot2. Running head(mydata1) gives me the following output of the structure of my dataframe:
PropertyCode Date MthName CY TotalRN
<chr> <date> <chr> <chr> <int>
BLU 2015-01-01 Jan CY 2015 146
BLU 2015-02-01 Feb CY 2015 278
BLU 2015-03-01 Mar CY 2015 143
BLU 2015-04-01 Apr CY 2015 365
BLU 2015-05-01 May CY 2015 198
BLU 2015-06-01 Jun CY 2015 114
Here is my ggplot2 syntax to plot a line curve:
ggplot(data=mydata1, aes(x=MthName, y=TotalRN, color=CY)) +
geom_line() +
geom_point()
The output (see image below) has 2 major problems:
(1) The x-axis is showing the MthName in alphabetical order rather than Jan, Feb, Mar....till Dec
(2) The plot looks more like a dotplot rather than a line curve
How do I correct for these 2 issues and make my plot look like the one shown further below (that one is from Excel using the same data)?
Make sure your Date column is of class Date, then try the following:
library(ggplot2)
library(lubridate)
mydata1$Date <- as.Date(mydata1$Date)
# month(..., label = TRUE) returns an ordered factor, so the axis runs Jan to Dec
ggplot(data = mydata1, aes(x = month(Date, label = TRUE, abbr = TRUE),
                           y = TotalRN, group = CY, color = CY)) +
  geom_line() +
  geom_point() +
  xlab("Month name")
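Another way to address both issues, shown here as a minimal sketch on the six rows from head(mydata1), is to turn MthName into a factor whose levels follow the calendar (using R's built-in month.abb) and to add a group aesthetic so the points of each CY are joined into one line.

```r
library(ggplot2)

# Rows from head(mydata1) in the question; only the columns needed for the plot.
mydata1 <- data.frame(
  MthName = c("Jan", "Feb", "Mar", "Apr", "May", "Jun"),
  CY      = "CY 2015",
  TotalRN = c(146L, 278L, 143L, 365L, 198L, 114L)
)

# Calendar order instead of alphabetical order on the x axis.
mydata1$MthName <- factor(mydata1$MthName, levels = month.abb)

# group = CY joins the points of one year into a single line.
p <- ggplot(mydata1, aes(x = MthName, y = TotalRN, color = CY, group = CY)) +
  geom_line() +
  geom_point()
print(p)
```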

How to merge several data frames of equal format for plotting with ggplot in one diagram in R

I have several data frames (e.g. t1, t2 and t3) with the same format but possibly different numbers of rows.
t1
year month avgTemp
2006 1 -0.95
2006 2 1.34
2006 3 3.58
2006 4 9.94
2006 5 14.67
2006 6 18.38
2006 7 23.56
2006 8 16.57
2006 9 18.08
2006 10 13.26
2006 11 8.27
2006 12 4.82
t2
year month avgTemp
2015 1 3.01
2015 2 2.16
2015 3 6.37
2015 4 10.31
2015 5 14.40
2015 6 17.84
2015 7 22.04
2015 8 21.35
2015 9 14.18
2015 10 9.40
2015 11 8.18
2015 12 7.22
and t3
year month avgTemp
2005 7 19.79
2005 8 17.54
2005 9 16.69
2005 10 11.64
2005 11 5.40
2005 12 1.97
Now, when I want to plot those 3 data frames in one diagram, I am doing this:
ggplot() +
geom_line(data=t1, aes(x = t1$month, y = t1$avgTemp, colour = t1$year)) +
geom_line(data=t2, aes(x = t2$month, y = t2$avgTemp, colour = t2$year)) +
geom_line(data=t3, aes(x = t3$month, y = t3$avgTemp, colour = t3$year))
And the output looks like this:
So far everything is OK, but the plot command is very ugly, since I have to put every data frame into a new geom_line.
Is there a more elegant way to achieve this by merging the data frames or so?
Thanks in advance.
You can try something like this:
t <- rbind(t1, t2, t3)
t$year <- as.factor(t$year)
ggplot(t, aes(x = month, y = avgTemp, col = year)) + geom_line()
It should give you the desired plot with three lines for three years.
EDIT: Adding this code option based on the comment below about leaving year as a numeric value:
t <- rbind(t1, t2, t3)
ggplot(t, aes(x = month, y = avgTemp, col = year, group = year)) + geom_line()
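When the number of data frames grows, they can be kept in a list and stacked with do.call(rbind, ...), so the plotting code does not change. The sketch below rebuilds t1, t2 and t3 from the question; the names frames and temps are introduced here for illustration (temps also avoids masking base R's t() function).

```r
library(ggplot2)

t1 <- data.frame(year = 2006, month = 1:12,
                 avgTemp = c(-0.95, 1.34, 3.58, 9.94, 14.67, 18.38,
                             23.56, 16.57, 18.08, 13.26, 8.27, 4.82))
t2 <- data.frame(year = 2015, month = 1:12,
                 avgTemp = c(3.01, 2.16, 6.37, 10.31, 14.40, 17.84,
                             22.04, 21.35, 14.18, 9.40, 8.18, 7.22))
t3 <- data.frame(year = 2005, month = 7:12,
                 avgTemp = c(19.79, 17.54, 16.69, 11.64, 5.40, 1.97))

# Stack any number of equally shaped data frames in one step.
frames <- list(t1, t2, t3)
temps <- do.call(rbind, frames)
temps$year <- factor(temps$year)   # discrete colours, one line per year

p <- ggplot(temps, aes(x = month, y = avgTemp, colour = year)) +
  geom_line() +
  scale_x_continuous(breaks = 1:12)
print(p)
```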

Resources