So i have this data, that I would like to plot onto a graph - all the lines on the same graph
>ndiveristy
Quadrant nta.shannon ntb.shannon ntc.shannon
1 1 2.188984 0.9767274 1.8206140
2 2 1.206955 1.3240481 1.3007058
3 3 1.511083 0.5805081 0.7747041
4 4 1.282976 1.4222243 0.4843907
5 5 1.943930 1.7337267 1.5736545
6 6 2.030524 1.8604619 1.6860711
7 7 2.043356 1.5707110 1.5957869
8 8 1.421275 1.4363365 1.5456799
here is the code that I am using to try to plot it:
ggplot(ndiversity,aes(x=Quadrant,y=Diversity,colour=Transect))+
geom_point()+
geom_line(aes(y=nta.shannon),colour="red")+
geom_line(aes(y=ntb.shannon),colour="blue")+
geom_line(aes(y=ntc.shannon),colour="green")
But all I am getting is the error
data must be a data frame, or other object coercible by fortify(), not a numeric vector.
Can someone tell me what I'm doing wrong
Typically, rather than using multiple geom_line calls, we would only have a single call, by pivoting your data into long format. This would create a data frame of three columns: one for Quadrant, one containing labels nta.shannon, ntb.shannon and ntc.shannon, and a column for the actual values. This allows a single geom_line call, with the label column mapped to the color aesthetic, which automatically creates an instructive legend for your plot too.
library(tidyverse)
as.data.frame(ndiversity) %>%
pivot_longer(-1, names_to = 'Type', values_to = 'Shannon') %>%
mutate(Type = substr(Type, 1, 3)) %>%
ggplot(aes(Quadrant, Shannon, color = Type)) +
geom_line(size = 1.5) +
theme_minimal(base_size = 16) +
scale_color_brewer(palette = 'Set1')
For posterity:
convert to data frame
ndiversity <- as.data.frame(ndiversity)
get rid of the excess code
ggplot(ndiversity,aes(x=Quadrant))+
geom_line(aes(y=nta.shannon),colour="red")+
geom_line(aes(y=ntb.shannon),colour="blue")+
geom_line(aes(y=ntc.shannon),colour="green")
profit
not the prettiest graph I ever made
One of the variables in my data frame is a factor denoting whether an amount was gained or spent. Every event has a "gain" value; there may or may not be a corresponding "spend" amount. Here is an image with the observations overplotted:
Adding some random jitter helps visually, however, the "spend" amounts are divorced from their corresponding gain events:
I'd like to see the blue circles "bullseyed" in their gain circles (where the "id" are equal), and jittered as a pair. Here are some sample data (three days) and code:
library(ggplot2)
ccode<-c(Gain="darkseagreen",Spend="darkblue")
ef<-data.frame(
date=as.Date(c("2021-03-01","2021-03-01","2021-03-01","2021-03-01","2021-03-01","2021-03-01","2021-03-01","2021-03-01","2021-03-02","2021-03-02","2021-03-02","2021-03-02","2021-03-02","2021-03-02","2021-03-02","2021-03-03","2021-03-03","2021-03-03","2021-03-03","2021-03-03","2021-03-03","2021-03-03","2021-03-03","2021-03-01","2021-03-01","2021-03-01","2021-03-01","2021-03-01","2021-03-01","2021-03-01","2021-03-01","2021-03-02","2021-03-02","2021-03-02","2021-03-02","2021-03-02","2021-03-02","2021-03-02","2021-03-03","2021-03-03","2021-03-03","2021-03-03","2021-03-03","2021-03-03","2021-03-03","2021-03-03")),
site=c("Castle","Temple","Temple","Temple","Temple","Temple","Palace","Palace","Castle","Castle","Castle","Temple","Temple","Palace","Palace","Castle","Castle","Castle","Castle","Castle","Temple","Temple","Palace","Castle","Temple","Temple","Temple","Temple","Temple","Palace","Palace","Castle","Castle","Castle","Temple","Temple","Palace","Palace","Castle","Castle","Castle","Castle","Castle","Temple","Temple","Palace"),
id=c("C123","T101","T93","T94","T95","T96","P102","P96","C126","C127","C128","T100","T98","P100","P98","C129","C130","C131","C132","C133","T104","T99","P99","C123","T101","T93","T94","T95","T96","P102","P96","C126","C127","C128","T100","T98","P100","P98","C129","C130","C131","C132","C133","T104","T99","P99"),
gainspend=c("Gain","Gain","Gain","Gain","Gain","Gain","Gain","Gain","Gain","Gain","Gain","Gain","Gain","Gain","Gain","Gain","Gain","Gain","Gain","Gain","Gain","Gain","Gain","Spend","Spend","Spend","Spend","Spend","Spend","Spend","Spend","Spend","Spend","Spend","Spend","Spend","Spend","Spend","Spend","Spend","Spend","Spend","Spend","Spend","Spend","Spend"),
amount=c(6,14,34,31,3,10,6,14,2,16,16,14,1,1,15,11,8,7,2,10,15,4,3,NA,NA,4,5,NA,NA,NA,NA,NA,NA,2,NA,1,NA,3,NA,NA,2,NA,NA,2,NA,3))
#▼ 3 day, points centered
ggplot(ef,aes(date,site)) +
geom_point(aes(size=amount,color=gainspend),alpha=0.5) +
scale_color_manual(values=ccode) +
scale_size_continuous(range=c(1,15),breaks=c(5,10,20))
#▼ 3 day, jitted
ggplot(ef,aes(date,site)) +
geom_point(aes(size=amount,color=gainspend),alpha=0.5,position=position_jitter(w=0,h=0.2)) +
scale_color_manual(values=ccode) +
scale_size_continuous(range=c(1,15),breaks=c(5,10,20))
My main idea is the old "add jitter manually" approach. I'm wondering if a nicer approach could be something like plotting little pie charts as points a la package scatterpie.
In this case you could add a random number for the amount of jitter to each ID so points within groups will be moved the same amount. This takes doing work outside of ggplot2.
First, draw the "jitter" to add for each ID. Since a categorical axis is 1 unit wide, I choose numbers between -.3 and .3. I use dplyr for this work and set the seed so you will get the same results.
library(dplyr)
set.seed(16)
ef2 = ef %>%
group_by(id) %>%
mutate(jitter = runif(1, min = -.3, max = .3)) %>%
ungroup()
Then the plot. I use a geom_blank() layer so that the categorical site axis is drawn before I add the jitter. I convert site to be numeric from a factor and add the jitter on; this only works for factors so luckily categorical axes in ggplot2 are based on factors.
Now paired ID's move together.
ggplot(ef2, aes(x = date, y = site)) +
geom_blank() +
geom_point(aes(size = amount, color = gainspend,
y = as.numeric(factor(site)) + jitter),
alpha=0.5) +
scale_color_manual(values = ccode) +
scale_size_continuous(range = c(1, 15), breaks = c(5, 10, 20))
#> Warning: Removed 15 rows containing missing values (geom_point).
Created on 2021-09-23 by the reprex package (v2.0.0)
You can add some jitter by id outside the ggplot() call.
jj <- data.frame(id = unique(ef$id), jtr = runif(nrow(ef), -0.3, 0.3))
ef <- merge(ef, jj, by = 'id')
ef$sitej <- as.numeric(factor(ef$site)) + ef$jtr
But you need to make site integer/numeric to do this. So when it comes to making the plot, you need to manually add axis labels with scale_y_continuous(). (Update: the geom_blank() trick from aosmith above is a better solution!)
ggplot(ef,aes(date,sitej)) +
geom_point(aes(size=amount,color=gainspend),alpha=0.5) +
scale_color_manual(values=ccode) +
scale_size_continuous(range=c(1,15),breaks=c(5,10,20)) +
scale_y_continuous(breaks = 1:3, labels= sort(unique(ef$site)))
This seems to work, but there are still a few gain/spend circles without a partner--perhaps there is a problem with the id variable.
Perhaps someone else has a better approach!
I've got data on survival/sampling dates of over 500 dogs, each dog having been sampled at least once, and several having been sampled three or four times. For e.g.
Microchip_number Date Sampling_occasion
White notched fatso 20,11,2018 First
White notched fatso 28,12,2018 Second
White notched fatso 09,04,2019 Third
White notched fatso 23,10,2019 Fourth
Tuttu Jeevan 06,12,2018 First
Tuttu Jeevan 03,01,2019 Second
Tuttu Jeevan 04,05,2019 Third
Tuppy 22,10,2018 First
Tuppy 20,11,2018 Second
Tuppy 17,04,2019 Third
Tuppy 31,07,2019 Lost to study
I've managed to plot this in ggplot, but it's a very large image which requires zooming in and scrolling to view the sampling times of each individual dog.
Plot of outcomes for all dogs
I've found suggestions to split large dataframes based on a certain variable (e.g. month) or to use facet_wrap, but in my case, I don't have any such variable to use. Is there a way to split this large plot into multiple smaller plots that don't need to be zoomed in to view all the details clearly, such as below (without having to separately plot subsets of the dataframe)?
How I'd like each split/sub-plot to appear
This is the code I'm using
outcomes <- read_xlsx("Dog outcomes.xlsx", col_types = c("text", "date", "text"))
outcomes$Microchip_number<- as.factor(outcomes$Microchip_number)
outcomes$Sampling_occasion<- factor(outcomes$Sampling_occasion,
levels = c("First", "Second", "Third", "Fourth", "Lost to study", "Died"))
g<- ggplot(outcomes)
g + geom_point(aes(x = Date, y = Microchip_number, colour = Sampling_occasion, shape = Sampling_occasion)) +
geom_line(aes(x = Date, y = Microchip_number, group = Microchip_number, colour = Sampling_occasion)) +
theme_bw()
Thanks so much, Jrm FRL, the code to add the counter and subgroup columns was exactly what I needed! As Gregor mentioned, facet_wrap just made things more difficult to view, so I used a for loop using subgroup to plot 50 dogs per pdf page (or any other device). This is the code I used, and it's worked perfectly, although for some reason, the 'Microchip_number's are displaying in reverse sequence / alphabetical order (68481, 68480, 68479 etc.), despite being organised the other way round in the main dataframe 'outcomes'. Minor quibble, however! This makes it so much easier to visualise outcomes for specific individuals. Cheers!
outcomes2 <- outcomes %>%
mutate(counter = 1 + cumsum(c(0,as.numeric(diff(Microchip_number))!=0)), # this counter starting at 1 increments for each new dog
subgroup = as.factor(ceiling(counter/50)))
pdf(file = "All_outcomes_50.pdf") #
for (i in 1:length(unique(outcomes2$subgroup))) {
outcomes2 %>%
filter(subgroup == i) -> df
ggplot(df) + geom_point(aes(x = Date, y = Microchip_number, colour = Sampling_occasion, shape = Sampling_occasion)) +
geom_line(aes(x = Date, y = Microchip_number, group = Microchip_number, colour = Sampling_occasion)) +
theme_bw() -> wow
print(wow)
}
dev.off()
New plot after using 'for' loop
You can simply divide your dasatet in sub-groups containing the same number of dogs (e.g. 10).
Add an intermediate counter column to overcome the small difficulty that there is not necessarly the same number of rows for each dog.
I would suggest :
library('dplyr')
outcomes <- outcomes %>%
mutate(counter = 1 + cumsum(c(0,as.numeric(diff(Microchip_number))!=0)), # this counter starting at 1 increments for each new dog
subgroup = as.factor(ceiling(counter/10)))
You will obtain a new dataset with a factor subgroup column whose value is different every 10th dog. Then just add a + facet_wrap(.~subgroup) to your plot.
Hope this will help.
I have been trying to plot a line plot with ggplot.
My data looks something like this:
I04 F04 I05 F05 I06 F06
CAT 3 12 2 6 6 20
DOG 0 0 0 0 0 0
BIEBER 1 0 0 1 0 0
and can be found here.
Basically, we have a certain number of CATs (or other creatures) initially in a year (this is I04), and a certain number of CATs at the end of the year (this is F04). This goes on for some time.
I can plot something like this fairly simply using the code below, and get this:
This is fantastic, but doesn't work very well for me. After all, I have these staring and ending inventory for each year. So I am interested in seeing how the initial values (I04, I05, I06) change over time. So, for each animal, I would like to create two different lines, one for initial quantity and one for final quantity (F01, F05, F06). This seems to me like now I have to consider two factors.
This is really difficult given the way my data is set up. I'm not sure how to tell ggplot that all the I prefixed years are one factor, and all the F prefixed years are another factor. When the dataframe gets melted, it's too late. I'm not sure how to control this situation.
Any advice on how I can separate these values or perhaps another, better way to tackle this situation?
Here is the code I have:
library(ggplot2)
library(reshape2)
DF <- read.csv("mydata.csv", stringsAsFactors=FALSE)
## cleaning up, converting factors to numeric, etc
text_names <- data.frame(as.character(DF$animals))
names(text_names) <- c("animals")
numeric_cols <- DF[, -c(1)]
numeric_cols <- sapply(numeric_cols, as.numeric)
plot_me <- data.frame(cbind(text_names, numeric_cols))
plot_me$animals <- as.factor(plot_me$animals)
meltedDF <- melt(plot_me)
p <- ggplot()
p <- p + geom_line(aes(seq(1:36), meltedDF$value, group=meltedDF$animals, color=meltedDF$animals))
p
Using your original data from the link:
nd <- reshape(mydata, idvar = "animals", direction = "long", varying = names(mydata)[-1], sep = "")
ggplot(nd, aes(x = time, y = I, group = animals, colour = animals)) + geom_line() + ggtitle("Development of initial inventories")
ggplot(nd, aes(x = time, y = F, group = animals, colour = animals)) + geom_line() + ggtitle("Development of final inventories")
I think from a data analyst perspective the following approach might provide better insight.
For each animal we visualize the initial and the final quantity in a separate panel. Moreover, each subplot has its own y scale because the values of the different animal types are radically different. Like this, differences within and across animal types are easier to spot.
Given the current structure of your data, we do not need two different factors. After the gather call the indicator column includes data like I04, F04, etc. We just need to separate the first character from the rest resulting in two columns type and time. We can use type as the argument for color in the ggplot call. time provides a unified x-axis across all animal types.
library(tidyr)
library(dplyr)
library(ggplot2)
data %>% gather(indicator, value, -animals) %>%
separate(indicator, c('type', 'time'), sep = 1) %>%
mutate(
time = as.numeric(time)
) %>% ggplot(aes(time, value, color = type)) +
geom_line() +
facet_grid(animals ~ ., scales = "free_y")
Of course, you might also do it the other way round, namely using a subplot for the initial and the final quantities like this:
data %>% gather(indicator, value, -animals) %>%
separate(indicator, c('type', 'time'), sep=1) %>%
mutate(
time = as.numeric(time)
) %>% ggplot(aes(time, value, color = animals)) +
geom_line() +
facet_grid(type ~ ., scales = "free_y")
But as described above, I would not recommend that because the y scale varies too much across animal types.