Binding multiple csvs in R to output to a graph in ggplot2 - r

This is an example of my data:
HUC8 YEAR RO_MM
bcc1_45Fall_1020004 1961 112.0
bcc1_45Fall_1020004 1962 243.7
bcc1_45Fall_1020004 1963 233.3
bcc1_45Fall_1020004 1964 190.3
bcc1_M_45Fall_1020004 1961 100.9
bcc1_M_45Fall_1020004 1962 132.3
bcc1_M_45Fall_1020004 1963 255.1
bcc1_M_45Fall_1020004 1964 281.9
bnuesm_45Fall_1020004 1961 89.0
bnuesm_45Fall_1020004 1962 89.5
bnuesm_45Fall_1020004 1963 126.8
bnuesm_45Fall_1020004 1964 194.3
canesm2_45Fall_1020004 1961 186.6
canesm2_45Fall_1020004 1962 197.4
canesm2_45Fall_1020004 1963 229.1
canesm2_45Fall_1020004 1964 141.8
Each of the similar prefixes represents (a segment of) a single csv. I have called them into a list and used rbind to link them. My goal is to have each csv represent a line of data, which would look like this:
Name
1961 1962 1963 1964 ...
bcc1_45Fall_1020004 112.0 243.7 233.3 190.3
bcc1_M_45Fall_1020004 100.9 132.3 255.1 281.9
bnuesm_45Fall_1020004 89.0 89.5 126.8 194.3
canesm2_45Fall_1020004 186.6 197.4 229.1 141.8
I would then like to plot these lines in a line graph using ggplot2, where each Name becomes a line of "RO_MM" data over 140 years. Remember, this is only a tiny sample. There are actually hundreds of files. I know that hundreds is too many for one graph and plan to plot them in smaller groups, but I DO NOT need to grid them together. So far I have used this code, which produced the initial data list above:
library(rio)
library(tidyverse)
library(data.table)
file_names <- list.files("~/Desktop/Rproj/splitByHUCs45/a01020004/splFall")
data_list <- lapply(file_names, read.csv , header=TRUE, sep=",")
finalTable <- do.call(rbind, data_list)
I have found this code (below). It is not what I need because I don't need the mean of anything, but I saw that it used more than one csv for input, so I'm trying to make sense of it, but don't know how to make it work for me.
#some pseudo data for testing
my_other_data <- myData
my_other_data$Data <- my_other_data$Data * 0.5
pplot <- ggplot(data=myData, aes(x=Group, y=Data)) +
stat_summary(fun = mean, geom = "line", color='red') +
stat_summary(data=my_other_data, aes(x=Group, y=Data),
fun = mean, geom = "line", color='green') +
xlab("Group") +
ylab("Data")
pplot
That said, the page on creating a reprex said that I should provide you with this:
head(finalTable, 3) %>%
+ deparse()
[1] "structure(list(HUC8 = structure(c(1L, 1L, 1L), .Label = c(\"bcc1_45Fall_1020004\", "
[2] "\"bcc1_M_45Fall_1020004\", \"bnuesm_45Fall_1020004\", \"canesm2_45Fall_1020004\", "
[3] "\"ccsm4_45Fall_1020004\", \"cnrmcm5_45Fall_1020004\", \"csiromk360_45Fall_1020004\", "
[4] "\"gfdlesm2g_45Fall_1020004\", \"gfdlesm2m_45Fall_1020004\", \"hadgem2cc_45Fall_1020004\", "
[5] "\"hadgem2es_45Fall_1020004\", \"hist_Fall_1020004\", \"inmcm4_45Fall_1020004\", "
[6] "\"ipslcm5_alr_45Fall_1020004\", \"ipslcm5_blr_45Fall_1020004\", \"ipslcm5amr_45Fall_1020004\", "
[7] "\"miroc5_45Fall_1020004\", \"mirocesm_45Fall_1020004\", \"mirocesmchem_45Fall_1020004\", "
[8] "\"mricgcm3_45Fall_1020004\", \"noresm1m_45Fall_1020004\"), class = \"factor\"), "
[9] " YEAR = 1961:1963, RO_MM = c(112, 243.7, 233.3)), row.names = c(NA, "
[10] "3L), class = \"data.frame\")"
I would appreciate getting help structuring the data so that I can bring it into ggplot2 and how to make a graph with ggplot2, and explanations would be especially helpful. Thanks.

I wasn't able to use your example data (please use dput(head(finalTable)) instead of deparse), but here is one potential solution using the data at the beginning of your question:
# Load libraries and data
library(tidyverse)
dat1 <- read.table(text = "HUC8 YEAR RO_MM
bcc1_45Fall_1020004 1961 112.0
bcc1_45Fall_1020004 1962 243.7
bcc1_45Fall_1020004 1963 233.3
bcc1_45Fall_1020004 1964 190.3
bcc1_M_45Fall_1020004 1961 100.9
bcc1_M_45Fall_1020004 1962 132.3
bcc1_M_45Fall_1020004 1963 255.1
bcc1_M_45Fall_1020004 1964 281.9
bnuesm_45Fall_1020004 1961 89.0
bnuesm_45Fall_1020004 1962 89.5
bnuesm_45Fall_1020004 1963 126.8
bnuesm_45Fall_1020004 1964 194.3
canesm2_45Fall_1020004 1961 186.6
canesm2_45Fall_1020004 1962 197.4
canesm2_45Fall_1020004 1963 229.1
canesm2_45Fall_1020004 1964 141.8",
header = TRUE)
# Create your table
dat1 %>%
pivot_wider(names_from = YEAR, values_from = RO_MM)
# A tibble: 4 x 5
# HUC8 `1961` `1962` `1963` `1964`
# <chr> <dbl> <dbl> <dbl> <dbl>
#1 bcc1_45Fall_1020004 112 244. 233. 190.
#2 bcc1_M_45Fall_1020004 101. 132. 255. 282.
#3 bnuesm_45Fall_1020004 89 89.5 127. 194.
#4 canesm2_45Fall_1020004 187. 197. 229. 142.
# Create a line plot (don't need to use the table for this)
dat1 %>%
ggplot(aes(x = YEAR, y = RO_MM, group = HUC8, color = HUC8)) +
geom_line()
And you can 'group' your results however you like, e.g.
dat1 %>%
mutate(group = ifelse(str_detect(string = HUC8, pattern = "bcc"),
"group_bcc", "group_others")) %>%
ggplot(aes(x = YEAR, y = RO_MM, group = HUC8, color = HUC8)) +
geom_line() +
facet_grid(rows = vars(group))
And, if you don't want a grid (like you said in your question):
dat1 %>%
mutate(group = ifelse(str_detect(string = HUC8, pattern = "bcc"),
"group_bcc", "group_others")) %>%
filter(group == "group_bcc") %>%
ggplot(aes(x = YEAR, y = RO_MM, group = HUC8, color = HUC8)) +
geom_line() +
ggtitle("bcc csv files only")
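Since the question mentions hundreds of files plotted in smaller groups, the same idea can be applied in a loop. This is only a minimal sketch: the series names, the random values, and the `chunk_size` of 4 are made up for illustration, and it assumes the data are already in the long HUC8 / YEAR / RO_MM layout.

```r
library(ggplot2)

# Toy long-format data: 6 hypothetical series x 4 years (values are random)
set.seed(1)
dat <- expand.grid(YEAR = 1961:1964,
                   HUC8 = paste0("model", 1:6, "_45Fall_1020004"),
                   stringsAsFactors = FALSE)
dat$RO_MM <- round(runif(nrow(dat), 80, 300), 1)

# Split the series names into chunks of at most `chunk_size` names
chunk_size <- 4
ids <- unique(dat$HUC8)
chunks <- split(ids, ceiling(seq_along(ids) / chunk_size))

# Build one stand-alone plot per chunk (no facets, no grid)
plots <- lapply(chunks, function(ids_in_chunk) {
  ggplot(dat[dat$HUC8 %in% ids_in_chunk, ],
         aes(x = YEAR, y = RO_MM, group = HUC8, color = HUC8)) +
    geom_line()
})

length(plots)  # one plot per chunk
# print(plots[[1]]) draws the first chunk
```

Each element of `plots` is an independent ggplot object, so it can be printed or saved on its own.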
You can "highlight" one specific line using a package (e.g. gghighlight) or just tell ggplot which colours you want to use, e.g.
dat1 %>%
ggplot(aes(x = YEAR, y = RO_MM, group = HUC8, color = HUC8)) +
geom_line() +
scale_color_manual(values = c("black", viridis::viridis(3, alpha = 0.33)))
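On getting from a folder of CSVs to a table like dat1: list.files() returns bare file names by default, so read.csv() only finds them when the working directory is that folder. A minimal sketch, assuming every CSV has the same HUC8, YEAR and RO_MM columns; the temporary folder and the two file names are stand-ins, not the real files.

```r
# Write two tiny CSVs to a temporary folder (stand-ins for the real files)
csv_dir <- file.path(tempdir(), "splFall_demo")
dir.create(csv_dir, showWarnings = FALSE)
write.csv(data.frame(HUC8 = "bcc1_45Fall_1020004", YEAR = 1961:1962,
                     RO_MM = c(112.0, 243.7)),
          file.path(csv_dir, "bcc1.csv"), row.names = FALSE)
write.csv(data.frame(HUC8 = "bnuesm_45Fall_1020004", YEAR = 1961:1962,
                     RO_MM = c(89.0, 89.5)),
          file.path(csv_dir, "bnuesm.csv"), row.names = FALSE)

# full.names = TRUE returns paths that read.csv() can open from anywhere
file_names <- list.files(csv_dir, pattern = "\\.csv$", full.names = TRUE)
data_list  <- lapply(file_names, read.csv)
finalTable <- do.call(rbind, data_list)

str(finalTable)  # long format: one row per series and year
```

The resulting long table can go straight into ggplot(); the wide table is only needed for display.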

Related

How to make exploratory plots using only certain rows in a column

I am making some exploratory plots to analyze zone M. I need one that plots Distance over time and another with Distance vs. MHT.
Here is what I have so far:
library(ggplot2)
ggplot(datmarsh, aes(x=Year, y=Distance)) + geom_point()
ggplot(datmarsh, aes(x=MHT, y=Distance)) + geom_point()
What I'm struggling with is specifying only zone "M" in each of these graphs.
Here is a sample of what my data looks like:
Year Distance MHT Zone
1975 253.1875 933 M
1976 229.75 877 M
1977 243.8125 963 M
1978 243.8125 957 M
1975 103.5 933 P
1976 150.375 877 P
1977 117.5625 963 P
1978 131.625 957 P
1979 145.6875 967 P
1975 234.5 933 PP
1976 314.1875 877 PP
1977 248.5625 963 PP
1978 272 957 PP
1979 290.75 967 PP
Thanks!
dplyr::filter() will let you do what you need. However, this has probably been answered elsewhere a few times, so do try searching!
library(dplyr)
library(ggplot2)
library(magrittr)
datmarsh %>%
filter(Zone == "M") %>%
ggplot(aes(x=Year, y=Distance)) +
geom_point()
datmarsh %>%
filter(Zone == "M") %>%
ggplot(aes(x=MHT, y=Distance)) +
geom_point()
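For a version that runs without the original file, here is a small sketch that rebuilds a few rows of the question's sample and uses base R subset() in place of filter(). The names `zone_m`, `p_time` and `p_mht` are just illustrative.

```r
library(ggplot2)

# A few rows copied from the question's sample
datmarsh <- data.frame(
  Year     = c(1975, 1976, 1977, 1975, 1976, 1975, 1976),
  Distance = c(253.1875, 229.75, 243.8125, 103.5, 150.375, 234.5, 314.1875),
  MHT      = c(933, 877, 963, 933, 877, 933, 877),
  Zone     = c("M", "M", "M", "P", "P", "PP", "PP")
)

# Base R equivalent of filter(Zone == "M")
zone_m <- subset(datmarsh, Zone == "M")

# Distance over time, and Distance vs. MHT, for zone M only
p_time <- ggplot(zone_m, aes(x = Year, y = Distance)) + geom_point()
p_mht  <- ggplot(zone_m, aes(x = MHT, y = Distance)) + geom_point()
```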

How to plot a specific row in a dataframe using ggplot [duplicate]

I have a dataframe that shows the number of car sales in each country for years 2000 to 2020. I wish to plot a line graph to show how the number of car sales have changed over time for only a specific country/row, with year on the x axis and sales on the y axis. How would I do this using ggplot?
You perhaps want this
#toy_data
sales
#> Country 2000 2001 2002 2003 2004 2005 2006 2007 2008 2009 2010
#> 2 A 1002 976 746 147 1207 627 157 1481 1885 1908 392
#> 3 B 846 723 1935 176 1083 636 1540 1692 899 607 1446
#> 4 C 1858 139 1250 121 1520 199 864 238 1109 1029 937
#> 5 D 534 1203 1759 553 1765 1784 1410 420 606 467 1391
library(tidyverse)
#for all countries
sales %>% pivot_longer(!Country, names_to = 'year', values_to = 'sales') %>%
mutate(year = as.numeric(year)) %>%
ggplot(aes(x = year, y = sales, color = Country)) +
geom_line()
#for one country
sales %>% pivot_longer(!Country, names_to = 'year', values_to = 'sales') %>%
mutate(year = as.numeric(year)) %>%
filter(Country == 'A') %>%
ggplot(aes(x = year, y = sales)) +
geom_line()
Created on 2021-06-07 by the reprex package (v2.0.0)
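The `sales` object is only printed above, not created. A minimal sketch of how such a wide table might be built and reshaped, using the first three year columns of countries A and B from the printout; `sales_long` is an illustrative name.

```r
library(tidyr)

# Wide table: one row per country, one column per year
sales <- data.frame(Country = c("A", "B"),
                    `2000` = c(1002, 846),
                    `2001` = c(976, 723),
                    `2002` = c(746, 1935),
                    check.names = FALSE)  # keep "2000" etc. as column names

# One row per country and year; the year arrives as text, so convert it
sales_long <- pivot_longer(sales, !Country,
                           names_to = "year", values_to = "sales")
sales_long$year <- as.numeric(sales_long$year)

sales_long  # columns: Country, year, sales
```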
Suppose your data is a matrix (or a data frame) with countries as rows and years as columns, like this:
#make dummy df
df <- matrix(sample(1:100, 63), ncol=21, nrow=3)
rownames(df) <- c("UK", "US", "UAE")
colnames(df) <- 2000:2020
Here I generated some random data for 21 years between 2000 and 2020, and for three countries. To get a line plot with ggplot for UK, I did:
data_uk <- data.frame(year=colnames(df), sales=df["UK",], row.names=NULL)
ggplot(data=data_uk, aes(x=year, y=sales, group=1)) + geom_point() + geom_line()

How to plot monthly data having in the x-axis months and Years R studio

I have a dataframe where column 1 are Months, column 2 are Years and column 3 are precipitation values.
I want to plot the precipitation values for EACH month and EACH year.
My data goes from January 1961 to February 2019.
How can I plot that?
Here is my data:
If I use this:
plot(YearAn,PPMensual,type="l",col="red",xlab="años", ylab="PP media anual")
I get this:
Which is wrong because it puts all the monthly values in every single year! What I'm looking for is an x-axis that looks like "JAN-1961, FEB-1961, ... until FEB-2019".
It can be done easily using ggplot/tidyverse packages.
First, let's load the packages (ggplot2 is part of the tidyverse) and create some sample data:
library(tidyverse)
set.seed(123)
df <- data.frame(month = rep(c(1:12), 2),
year = rep(c("1961", "1962"),
each = 12),
ppmensual = rnorm(24, 5, 2))
Now we can plot the data (df):
df %>%
ggplot(aes(month, ppmensual,
group = year,
color = year)) +
geom_line()
Using lubridate and ggplot2 but with no grouping:
Setup
library(lubridate) # for make_date()
library(ggplot2) # for the graphic
library(dplyr) # for tibble() and %>%
df <- tibble(month = rep(month.name, 40),
year = rep(c(1961:2000), each = 12),
PP = runif(12*40) * runif(12*40) * 10) # PP data is random here
print(df, n = 20)
month year PP
<chr> <int> <dbl>
1 January 1961 5.42
2 February 1961 0.855
3 March 1961 5.89
4 April 1961 1.37
5 May 1961 0.0894
6 June 1961 2.63
7 July 1961 1.89
8 August 1961 0.148
9 September 1961 0.142
10 October 1961 3.49
11 November 1961 1.92
12 December 1961 1.51
13 January 1962 5.60
14 February 1962 1.69
15 March 1962 1.14
16 April 1962 1.81
17 May 1962 8.11
18 June 1962 0.879
19 July 1962 4.85
20 August 1962 6.96
# … with 460 more rows
Graph
df %>%
# match() turns month names into 1-12; factor(month) would number them alphabetically
ggplot(aes(x = make_date(year, match(month, month.name)), y = PP)) +
geom_line() +
xlab("años")
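To get tick labels in the "JAN-1961" style the question asks for, one option is a real Date column plus scale_x_date(). This is a rough sketch with random stand-in data; it assumes numeric month and year columns, and the 3-month break spacing is arbitrary. The month abbreviations produced by "%b" depend on the locale.

```r
library(ggplot2)

# Random stand-in data: numeric month (1-12) and year columns
set.seed(42)
df <- data.frame(month = rep(1:12, 2),
                 year  = rep(1961:1962, each = 12),
                 PP    = runif(24) * 10)

# First day of each month as a Date, so the x-axis is continuous in time
df$date <- as.Date(sprintf("%d-%02d-01", df$year, df$month))

p <- ggplot(df, aes(x = date, y = PP)) +
  geom_line() +
  scale_x_date(date_breaks = "3 months", date_labels = "%b-%Y") +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5))
# print(p) draws the plot
```

With 58 years of data, a wider spacing such as date_breaks = "5 years" keeps the axis readable.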

How to create cumulative precipitation vs. temperature graph in a single plot

I have historical data for precip vs. annual temperature. I want to plot them into cool & wet, warm and wet, cool and dry, warm and dry years. Can someone help me with this?
Year Precip annual temperature
1987 821 8.5
1988 441 8
1989 574 7.9
1990 721 12.4
1991 669 10.8
1992 830 10
1993 1105 7.8
1994 772 8
1995 678 6.7
1996 834 8
1997 700 11
1998 786 11.2
1999 612 12
2000 758 10.6
2001 833 11
2002 622 10.6
2003 656 10.7
2004 799 9.9
2005 647 10.8
2006 764 12
2007 952 12.5
2008 943 10.86
2009 610 12.8
2010 766 11
2011 717 11.3
2012 602 9.5
2013 834 10.6
2014 758 11
2015 841 11
2016 630 11.5
2017 737 11.2
Average 742.32 10.36
As Majid suggested, you need to give more detail so you can get better answers. At the very least, use dput() on your data frame so we can get a reproducible copy of it. Copying and pasting from Excel is not appropriate for this kind of question.
In any case, that graph can easily be done with the ggplot2 package. You plot each year at its X and Y coordinates and then manually add the lines and the titles for each category. You do need to establish the boundaries between cool/warm and dry/wet, of course.
library(ggplot2)
rain <- read.csv('~/data/rain.csv')
limit_humid <- 800
limit_warm <- 9.5
ggplot(rain, aes(x = temp, y = precip)) +
geom_text(aes(label = year)) +
geom_vline(xintercept = limit_warm) +
geom_hline(yintercept = limit_humid) +
annotate('text', label = 'bold("Cool and wet")', size = 4, parse = T,
x = min(rain$temp), y = max(rain$precip)) +
annotate('text', label = 'bold("Warm and wet")', size = 4, parse = T,
x = max(rain$temp), y = max(rain$precip)) +
annotate('text', label = 'bold("Cool and dry")', size = 4, parse = T,
x = min(rain$temp), y = min(rain$precip)) +
annotate('text', label = 'bold("Warm and dry")', size = 4, parse = T,
x = max(rain$temp), y = min(rain$precip)) +
theme_classic() +
labs(x = 'Average Temperature (°C)',
y = 'Cumulative precipitation (mm)')
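If you also want the category stored for each year (for example to colour the labels), the two cut-offs can be turned into a column. A minimal sketch using the first six rows of the question's table and the same illustrative limits as above; the column names `year`, `precip`, `temp` and `category` are assumptions about how the CSV is laid out.

```r
# First six rows of the question's table
rain <- data.frame(year   = 1987:1992,
                   precip = c(821, 441, 574, 721, 669, 830),
                   temp   = c(8.5, 8, 7.9, 12.4, 10.8, 10))

limit_humid <- 800   # illustrative boundary between dry and wet (mm)
limit_warm  <- 9.5   # illustrative boundary between cool and warm (degrees C)

# Combine the two yes/no tests into one label per year
rain$category <- paste(
  ifelse(rain$temp > limit_warm, "Warm", "Cool"),
  "and",
  ifelse(rain$precip > limit_humid, "wet", "dry")
)

table(rain$category)
```

Adding color = category inside aes() would then colour each year by its quadrant.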

ggplot sub-plots with categorical and numeric in R

I have the following table and I need to plot it with week on the x-axis and percent on the y-axis. My code below plots nothing and only gives me a message. Can someone help me fix this?
Any help is appreciated.
dfx1:
Year State Cty Week ac_sum percent
1998 KS Coffey 10-1 79 6.4
1998 KS Coffey 10-3 764 62
1998 KS Coffey 10-4 951 77.2
1998 KS Coffey 10-5 1015 82.4
1998 KS Coffey 11-2 1231 100
1998 KS Crawford 10-3 79 6.1
1998 KS Crawford 10-4 764 15.8
1998 KS Crawford 10-5 951 84.1
1998 KS Crawford 11-2 1015 100
.
.
.
.
gg <- ggplot(dfx1, aes(Week,percent, col=Year))
gg <- gg + geom_line()
gg <- gg + facet_wrap(~Cty, 2, scales = "fixed")
gg <- gg + xlim(c(min(dfx1$Week), max(dfx1$Week)))
plot(gg)
geom_path: Each group consists of only one observation. Do you need to
adjust the group aesthetic?
Is this what you want?
dfx1 <- read.table(text="Year State Cty Week ac_sum percent
1998 KS Coffey 10-1 79 6.4
1998 KS Coffey 10-3 764 62
1998 KS Coffey 10-4 951 77.2
1998 KS Coffey 10-5 1015 82.4
1998 KS Coffey 11-2 1231 100
1998 KS Crawford 10-3 79 6.1
1998 KS Crawford 10-4 764 15.8
1998 KS Crawford 10-5 951 84.1
1998 KS Crawford 11-2 1015 100", header=T)
library(ggplot2)
ggplot(dfx1, aes(Week,percent, col=Year)) +
geom_point() +
facet_wrap(~Cty, 2, scales = "fixed")
ggplot(dfx1, aes(Week,percent, col=Year, group=1)) +
geom_point() + geom_line() +
facet_wrap(~Cty, 2, scales = "fixed")
You can look at other answers like this one to see that you're missing group = Year in your plot. Adding it in will give you what you are looking for:
library(ggplot2)
dfx1$Week <- factor(dfx1$Week, ordered = T)
ggplot(dfx1, aes(Week, percent, col = Year, group = Year)) +
geom_line() +
facet_wrap(~Cty, 2, scales = 'fixed')
With your last line it looks like you're wanting to only show the Weeks that actually have data. You can do that with scales = 'free', like so:
ggplot(dfx1, aes(Week, percent, col = Year, group = Year)) +
geom_line() +
facet_wrap(~Cty, 2, scales = 'free')
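One caveat on the factor() call above: it orders the levels alphabetically, which works for "10-1" to "11-2" but would put "9-4" after "11-2". A small sketch of ordering the levels numerically instead, assuming (this is a guess about the data) that Week means "month-week". The labels below are hypothetical.

```r
# Hypothetical week labels, including a single-digit month
week <- c("10-1", "9-4", "11-2", "10-3", "9-4")

# Split "month-week" into two integer vectors
parts <- do.call(rbind, strsplit(week, "-", fixed = TRUE))
month_num <- as.integer(parts[, 1])
week_num  <- as.integer(parts[, 2])

# Order the unique labels by month, then week, and use that as the level order
lab <- unique(week)
first_pos <- match(lab, week)
ord <- order(month_num[first_pos], week_num[first_pos])
week_f <- factor(week, levels = lab[ord], ordered = TRUE)

levels(week_f)
```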
