Overlay multiple lines from data frame with index column onto existing plot - r

I have a data frame with three columns (Id, Lat, Long). A small section of it can be constructed with the following data:
df <- data.frame(
Id=c(1,1,2,2,2,2,2,2,3,3,3,3,3,3),
Lat=c(58.12550, 58.17426, 58.46461, 58.45812, 58.45207, 58.44512, 58.43358, 58.42727, 57.77700, 57.76034, 57.73614, 57.72411, 57.70498, 57.68453),
Long=c(-5.098068, -5.314452, -4.914108, -4.899922, -4.887067, -4.873312, -4.852384, -4.840817, -5.666568, -5.648711, -5.617588, -5.594681, -5.557740, -5.509405))
The Id column is an index column. So all the rows with the same Id number have the coordinates for a single line. In my data frame this Id number varies from 1 through to 7696. So I have 7696 lines to plot.
Each Id number relates to an individual separate line of Lat and Long coordinates. What I want to do is overlay onto an existing plot all of these 7696 individual lines.
The example data above contains the Lat and Long coordinates for lines 1, 2 and 3.
What is the best way to overlay all these lines onto an existing plot? I was thinking of some kind of loop.

Using ggplot2:
#dummy data
df <- data.frame(
Id=c(1,1,2,2,2,2,2,2,3,3,3,3,3,3),
Lat=c(58.12550, 58.17426, 58.46461, 58.45812, 58.45207, 58.44512, 58.43358, 58.42727, 57.77700, 57.76034, 57.73614, 57.72411, 57.70498, 57.68453),
Long=c(-5.098068, -5.314452, -4.914108, -4.899922, -4.887067, -4.873312, -4.852384, -4.840817, -5.666568, -5.648711, -5.617588, -5.594681, -5.557740, -5.509405))
library(ggplot2)
#plot
ggplot(data=df,aes(Lat,Long,colour=as.factor(Id))) +
geom_line()
Using base R:
#plot blank
with(df,plot(Lat,Long,type="n"))
#plot lines
for(i in unique(df$Id))
with(df[ df$Id==i,],lines(Lat,Long,col=i))
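With several thousand Ids, calling lines() once per Id can become slow. One possible alternative, sketched here, relies on lines() breaking the path wherever it meets an NA: append an NA row to each group and draw everything in a single call. The helper name add_break is made up for this illustration, the data is a shortened copy of the question's example, and the plot() call is only a stand-in for whatever existing plot is being overlaid.

```r
# Shortened copy of the question's example data
df <- data.frame(
  Id   = c(1, 1, 2, 2, 2, 3, 3, 3),
  Lat  = c(58.12550, 58.17426, 58.46461, 58.45812, 58.45207,
           57.77700, 57.76034, 57.73614),
  Long = c(-5.098068, -5.314452, -4.914108, -4.899922, -4.887067,
           -5.666568, -5.648711, -5.617588)
)

# Hypothetical helper: copy one line's coordinates and append an NA break
add_break <- function(g) data.frame(Lat = c(g$Lat, NA), Long = c(g$Long, NA))
path <- do.call(rbind, lapply(split(df, df$Id), add_break))

# Stand-in for the existing plot
plot(df$Lat, df$Long, type = "n", xlab = "Lat", ylab = "Long")
# One call draws every line; the NA rows keep separate Ids from being joined
lines(path$Lat, path$Long)
```

The trade-off is that a single lines() call uses one colour for all lines, so this suits the case where per-Id colours are not needed.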

To be honest, I think any approach is going to result in a very cluttered plot, since you have so many Ids (unless their lines do not overlap much). Either way, I would probably use ggplot2 for this.
##
if( !("ggplot2" %in% installed.packages()[,1]) ){
install.packages("ggplot2",dependencies=TRUE)
}
library(ggplot2)
##
## assumes the question's data frame df (columns Id, Lat, Long) already exists
D <- data.frame(
Id=df$Id,
Lat=df$Lat,
Long=df$Long
)
##
ggplot(data=D,aes(x=Lat,y=Long,group=Id,color=Id))+
geom_point()+ ## you might want to omit geom_point() in your plot
geom_line()
##
The reason I used group=Id, color=Id in aes() rather than passing Id as a factor to aes() and just using color=Id is that you will end up with a legend containing 7000+ factor levels (the majority of which will not be visible in the plot area).
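Two details may matter for the original goal. First, geom_line() connects points in order of the x variable, whereas geom_path() connects them in the order they appear in the data, which is often what coordinate tracks need. Second, a layer can be added to an existing ggplot object by passing data to the layer itself. The sketch below assumes a placeholder base_plot standing in for the existing plot, uses a shortened copy of the example data, and picks an arbitrary alpha value to soften overplotting.

```r
library(ggplot2)

# Shortened copy of the question's example data
df <- data.frame(
  Id   = c(1, 1, 2, 2, 2, 3, 3, 3),
  Lat  = c(58.12550, 58.17426, 58.46461, 58.45812, 58.45207,
           57.77700, 57.76034, 57.73614),
  Long = c(-5.098068, -5.314452, -4.914108, -4.899922, -4.887067,
           -5.666568, -5.648711, -5.617588)
)

# Placeholder for the plot that already exists
base_plot <- ggplot() + theme_bw()

# Add the lines as a new layer; group = Id keeps each line separate,
# geom_path() draws points in data order, alpha reduces clutter
p <- base_plot +
  geom_path(data = df, aes(x = Lat, y = Long, group = Id), alpha = 0.3)
p
```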

Related

How do I make my row names appear on my x axis? And the numbers on from my variables appear as the y axis?

I created a data frame with countries as row names and percentages as observations of the variables, but when I make a histogram the percentages occupy the x axis and the country names aren't there at all. How do I make it so that the countries' names are on the x axis and the variables are on the y axis?
Country <- c('Albania','Armenia','Austria','Belarus','Belgium','Bosnia and Herzegovina','Bulgaria','Croatia','Cyprus','Czechia','Denmark','Estonia','Finland','France','Georgia','Germany','Greece','Hungary','Iceland','Ireland','Italy','Latvia','Lithuania','Luxembourg','Malta','Moldova','Montenegro','Netherlands','Norway','Poland','Portugal','Romania','Russia','Serbia','Slovakia','Slovenia','Spain','Sweden','Switzerland','Turkey','Ukraine','United Kingdom')
Anxiety.Disorders <- c(3.38,2.73,5.22,3.03,4.92,3.70,3.84,3.74,5.61,3.59,5.18,3.01,3.59,6.37,2.46,6.37,5.58,3.69,5.15,5.66,5.57,3.04,3.06,5.19,5.14,2.77,3.55,6.43,7.33,3.68,5.52,3.41,3.02,3.60,3.61,3.60,5.14,5.16,5.28,3.85,3.09,4.43)
Depressive.Disorders <- c(2.42,3.16,3.66,4.84,4.35,2.88,3.30,3.60,3.88,3.25,3.62,4.78,5.08,4.55,2.98,4.42,4.56,3.53,3.55,4.37,3.94,4.44,5.20,3.95,3.69,3.77,2.96,4.34,3.95,2.72,5.27,2.88,4.36,3.15,2.87,3.58,3.91,4.84,4.17,3.76,5.02,4.35)
Bipolar.Disorder <- c(0.72,0.77,0.95,0.73,0.91,0.79,0.67,0.77,1.04,0.75,0.99,0.71,0.99,0.93,0.67,0.79,0.93,0.74,0.97,0.80,0.95,0.71,0.73,0.95,0.97,0.67,0.74,0.94,0.85,0.76,0.97,0.78,0.70,0.74,0.76,0.75,0.97,1.04,0.98,0.85,0.73,1.05)
G08 <- data.frame(Country, Anxiety.Disorders, Depressive.Disorders, Bipolar.Disorder)
row.names(G08) <- G08$Country
G08[1] <- NULL
hist(G08$Anxiety.Disorders)
I use the melt() call to create one observation per row. Then, I use ggplot to produce the bar plot.
library(ggplot2)
library(reshape2)
Country <- c('Albania','Armenia','Austria','Belarus','Belgium','Bosnia-Herzegovina','Bulgaria','Croatia','Cyprus','Czechia','Denmark','Estonia','Finland','France','Georgia','Germany','Greece','Hungary','Iceland','Ireland','Italy','Latvia','Lithuania','Luxembourg','Malta','Moldova','Montenegro','Netherlands','Norway','Poland','Portugal','Romania','Russia','Serbia','Slovakia','Slovenia','Spain','Sweden','Switzerland','Turkey','Ukraine','United Kingdom')
Anxiety.Disorders <- c(3.38,2.73,5.22,3.03,4.92,3.70,3.84,3.74,5.61,3.59,5.18,3.01,3.59,6.37,2.46,6.37,5.58,3.69,5.15,5.66,5.57,3.04,3.06,5.19,5.14,2.77,3.55,6.43,7.33,3.68,5.52,3.41,3.02,3.60,3.61,3.60,5.14,5.16,5.28,3.85,3.09,4.43)
Depressive.Disorders <- c(2.42,3.16,3.66,4.84,4.35,2.88,3.30,3.60,3.88,3.25,3.62,4.78,5.08,4.55,2.98,4.42,4.56,3.53,3.55,4.37,3.94,4.44,5.20,3.95,3.69,3.77,2.96,4.34,3.95,2.72,5.27,2.88,4.36,3.15,2.87,3.58,3.91,4.84,4.17,3.76,5.02,4.35)
Bipolar.Disorder <- c(0.72,0.77,0.95,0.73,0.91,0.79,0.67,0.77,1.04,0.75,0.99,0.71,0.99,0.93,0.67,0.79,0.93,0.74,0.97,0.80,0.95,0.71,0.73,0.95,0.97,0.67,0.74,0.94,0.85,0.76,0.97,0.78,0.70,0.74,0.76,0.75,0.97,1.04,0.98,0.85,0.73,1.05)
G08 <- data.frame(Country, Anxiety.Disorders, Depressive.Disorders, Bipolar.Disorder)
G08melt <- melt(G08, "Country")
G08.bar <- ggplot(G08melt, aes(x = Country, y=value)) +
geom_bar(aes(fill=variable),stat="identity", position ="dodge") +
theme_bw()+
theme(axis.text.x = element_text(angle=-40, hjust=.1))
G08.bar
Looking at your question, I think you were trying to draw a grouped column diagram rather than a histogram. You can do the plot directly using the barplot function from the graphics package, but first you need to convert your data frame into a matrix. Starting from G08 with the Country column still in place (that is, before G08[1] <- NULL is run), I removed the first column:
mat<-G08[,-1]
Now simply use the barplot function on the transpose of mat, and use the names parameter of barplot to write the country names on the x-axis:
barplot(t(mat),beside=T,col=c('red','blue','gold'),border=NA,names=G08$Country,cex.names=0.45,las=2)
par(new=T)
legend('topright',c("Anxiety","Depressive","Bipolar"),fill=c("red","blue","gold"),cex=0.5,title='Disorder types')
Suggestion:
For a bit more 'fresh air' in the graph, you can set beside=F in barplot and get a stacked column diagram.
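As a rough sketch of the stacked variant, the following uses only the first three countries from the question's data so that it runs on its own. The shortened column labels and the cex value are choices made for this illustration.

```r
# First three countries from the question's data, countries as row names
mat <- rbind(
  Albania = c(Anxiety = 3.38, Depressive = 2.42, Bipolar = 0.72),
  Armenia = c(Anxiety = 2.73, Depressive = 3.16, Bipolar = 0.77),
  Austria = c(Anxiety = 5.22, Depressive = 3.66, Bipolar = 0.95)
)
cols <- c("red", "blue", "gold")

# beside = FALSE stacks the three disorders into one bar per country;
# barplot() returns the x positions of the bars
mids <- barplot(t(mat), beside = FALSE, col = cols, border = NA, las = 2)
legend("topright", legend = colnames(mat), fill = cols,
       cex = 0.7, title = "Disorder types")
```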

geom_area doesn't show data, supposedly because of x-axis data

I want to create a stacked area plot based on a data frame.
Time <- c("W37/19","W38/19","W39/19","W40/19","W41/19")
Basis <- c(20.07,20.07,20.07,20.07,20.07)
AdStock <- c(5.88,5.60,5.34,5.09,4.86)
TV <- c(0,0,0.54,0.93,1.14)
Display <- c(0.07,0.21,0.33,0.35,0.36)
df_graph <- data.frame(Time, Basis, AdStock, TV, Display)
The data is a time series; "Time" holds German calendar weeks and should stay in this order.
The first thing I do is transform the data into long format.
library(tidyr)
df_graph <- pivot_longer(df_graph[,c("Time","Basis","AdStock","TV","Display")],-Time)
Second I convert df_graph$name to a factor and reverse the order, because I want to keep the original order for the stacking.
library(forcats)
df_graph$name <-factor(df_graph$name, levels = c("Basis","AdStock","TV","Display"))
df_graph$name <- fct_rev(df_graph$name)
Then I want to plot my data.
library(ggplot2)
p <- ggplot(df_graph, aes(x=Time, y=value, fill=name))
p <- p + geom_area()
p
The plot shows both axes as well as the legend but no data.
If I replace the calendar weeks in "Time" by just an ascending series of numbers
df_graph$Time <- seq(1:5)
it works, but not with my X-Axis values.
Also, I don't think that the conversion of "name" to a factor is the problem, because I still don't get data in my plot even if I remove these two lines.
I tried different methods for the long format (e.g. gather) and also tried using the ascending series of numbers (1:5) as x-values and then replacing them with scale_x_discrete, but my areas always disappear.
What am I missing?
Many thanks in advance.
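A likely cause, offered as a sketch rather than a confirmed answer: when x is discrete, ggplot2 builds its default groups from the combination of all discrete variables in the plot, so every Time/name pair becomes its own one-point group and no area can be drawn. Setting group = name explicitly usually resolves this. Converting Time to a factor with explicit levels is an extra precaution added here to pin down the week order, and df_long is a name introduced for this example.

```r
library(ggplot2)
library(tidyr)

Time    <- c("W37/19", "W38/19", "W39/19", "W40/19", "W41/19")
Basis   <- c(20.07, 20.07, 20.07, 20.07, 20.07)
AdStock <- c(5.88, 5.60, 5.34, 5.09, 4.86)
TV      <- c(0, 0, 0.54, 0.93, 1.14)
Display <- c(0.07, 0.21, 0.33, 0.35, 0.36)
df_graph <- data.frame(Time, Basis, AdStock, TV, Display)

# Wide -> long: one row per week and series
df_long <- pivot_longer(df_graph, -Time)

# Keep the weeks in their original order and reverse the stacking order
df_long$Time <- factor(df_long$Time, levels = Time)
df_long$name <- factor(df_long$name,
                       levels = rev(c("Basis", "AdStock", "TV", "Display")))

# group = name makes each series one connected area across the discrete x axis
p <- ggplot(df_long, aes(x = Time, y = value, fill = name, group = name)) +
  geom_area()
p
```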

Graphing different variables in the same graph R- ggplot2

I have several datasets, and my end goal is to make a graph from them, with each line representing the yearly variation for the given information. I finally joined and combined my data (which was in a per-month structure) into a table that contains just the yearly means for each item I want to graph (one column for the year, and further columns for the yearly variation of 4 different elements).
I have one factor, the year, and 4 different variables that record yearly variations, and I would like to graph them in the same space. My idea was to join the 4 columns into one (collapsing to one observation per row, with the year alongside it), but I seem unable to do that. My thought is that this would give a structure to my y axis. I would like some advice, and to know whether my approach to the problem is effective. I am trying ggplot2, but it does not seem to work without a defined (or predefined range for the) y axis. Thanks
I would suggest the following approach. You have to reshape your data from wide to long, as in the example below; that way all variables can be plotted together. As no data was provided, this solution is sketched using dummy data. You can also swap the lines for any other geom you want, such as points:
library(tidyverse)
set.seed(123)
#Data
df <- data.frame(year=1990:2000,
v1=rnorm(11,2,1),
v2=rnorm(11,3,2),
v3=rnorm(11,4,1),
v4=rnorm(11,5,2))
#Plot
df %>% pivot_longer(-year) %>%
ggplot(aes(x=factor(year),y=value,group=name,color=name))+
geom_line()+
theme_bw()
We could use melt from reshape2 without loading multiple other packages
library(reshape2)
library(ggplot2)
ggplot(melt(df, id.var = 'year'), aes(x = factor(year), y = value,
group = variable, color = variable)) +
geom_line()
Or with matplot from base R
matplot(as.matrix(df[-1]), type = 'l', xaxt = 'n')
data
set.seed(123)
df <- data.frame(year=1990:2000,
v1=rnorm(11,2,1),
v2=rnorm(11,3,2),
v3=rnorm(11,4,1),
v4=rnorm(11,5,2))

How do I put multiple boxplots in the same graph in R?

Sorry I don't have example code for this question.
All I want to know is if it is possible to create multiple side-by-side boxplots in R representing different columns/variables within my data frame. Each boxplot would also only represent a single variable--I would like to set the y-scale to a range of (0,6).
If this isn't possible, how can I use something like the panel option in ggplot2 if I only want to create a boxplot using a single variable? Thanks!
Ideally, I want something like the image below but without factor grouping like in ggplot2. Again, each boxplot would represent completely separate and single columns.
ggplot2 requires that your data to be plotted on the y-axis are all in one column.
Here is an example:
set.seed(1)
df <- data.frame(
value = runif(810,0,6),
group = 1:9
)
df
library(ggplot2)
ggplot(df, aes(factor(group), value)) + geom_boxplot() + coord_cartesian(ylim = c(0,6))
The coord_cartesian(ylim = c(0,6)) call sets the visible y-axis range to between 0 and 6.
If your data are in columns, you can get them into the longform using melt from reshape2 or gather from tidyr. (other methods also available).
You can do this if you reshape your data into long format
## Some sample data
dat <- data.frame(a=rnorm(100), b=rnorm(100), c=rnorm(100))
## Reshape data wide -> long
library(reshape2)
long <- melt(dat)
plot(value ~ variable, data=long)
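If no grouping is needed at all, base R's boxplot() also accepts a data frame directly and draws one box per column, so reshaping can be skipped. A minimal sketch, using made-up uniform data on the (0, 6) range mentioned in the question:

```r
set.seed(1)
# Made-up sample data: three separate numeric columns
dat <- data.frame(a = runif(100, 0, 6),
                  b = runif(100, 0, 6),
                  c = runif(100, 0, 6))

# One box per column, with the y-axis fixed to 0..6;
# the summary statistics are returned invisibly
res <- boxplot(dat, ylim = c(0, 6), ylab = "value")
```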

adding text to ggplot geom_jitter points that match a condition

How can I add text to points rendered with geom_jitter to label them? geom_text will not work because I don't know the coordinates of the jittered dots. Could you capture the position of the jittered points so I can pass it to geom_text?
My practical usage would be to plot a boxplot with the geom_jitter over it to show the data distribution and I would like to label the outliers dots or the ones that match certain condition (for example the lower 10% for the values used for color the plots).
One solution would be to capture the xy positions of the jittered plots and use it later in another layer, is that possible?
[update]
From Joran's answer, a solution would be to calculate the jittered values with the jitter function from the base package, add them to a data frame and use them with geom_point. For filtering, he used ddply to create a filter column (a logical vector) and used it to subset the data in geom_text.
He asked for a minimal dataset. I just modified his example (a unique identifier in the label column)
dat <- data.frame(x=rep(letters[1:3],times=100),y=runif(300),
lab=paste('id_',1:300,sep=''))
This is the result of Joran's example with my data, with the displayed ids reduced to the lowest 1%
And this is a modification of the code to have colors by another variable and displaying some values of this variable (the lowest 1% for each group):
library("ggplot2")
library("plyr") # provides ddply
#Create some example data
dat <- data.frame(x=rep(letters[1:3],times=100),y=runif(300),
lab=paste('id_',1:300,sep=''),quality= rnorm(300))
#Create a copy of the data and a jittered version of the x variable
datJit <- dat
datJit$xj <- jitter(as.numeric(factor(dat$x)))
#Create indicator variables that pick out, within each level of x,
# the obs in the lowest 1% of y and the lowest 1% of quality
datJit <- ddply(datJit,.(x),.fun=function(g){
g$grp <- g$y <= quantile(g$y,0.01);
g$top_q <- g$quality <= quantile(g$quality,0.01);
g})
#Create a boxplot, overlay the jittered points and
# label the bottom 1% points
ggplot(dat,aes(x=x,y=y)) +
geom_boxplot() +
geom_point(data=datJit,aes(x=xj,colour=quality)) +
geom_text(data=subset(datJit,grp),aes(x=xj,label=lab)) +
geom_text(data=subset(datJit,top_q),aes(x=xj,label=sprintf("%0.2f",quality)))
Your question isn't completely clear; for example, you mention labeling points at one point but also mention coloring points, so I'm not sure which you really mean, or perhaps both. A reproducible example would be very helpful. But using a little guesswork on my part, the following code does what I think you're describing:
library(ggplot2)
library(plyr) # provides ddply
#Create some example data
dat <- data.frame(x=rep(letters[1:3],times=100),y=runif(300),
lab=rep('label',300))
#Create a copy of the data and a jittered version of the x variable
datJit <- dat
datJit$xj <- jitter(as.numeric(factor(dat$x)))
#Create an indicator variable that picks out those
# obs that are in the lowest 10% of y within each level of x
datJit <- ddply(datJit,.(x),.fun=function(g){
g$grp <- g$y <= quantile(g$y,0.1); g})
#Create a boxplot, overlay the jittered points and
# label the bottom 10% points
ggplot(dat,aes(x=x,y=y)) +
geom_boxplot() +
geom_point(data=datJit,aes(x=xj)) +
geom_text(data=subset(datJit,grp),aes(x=xj,label=lab))
Just an addition to Joran's wonderful solution:
I ran into trouble with the x-axis positioning when I tried to use this in a faceted plot using facet_wrap(). The problem is that ggplot2 uses 1 as the x-value on every facet. The solution is to create a vector of jittered 1s:
datJit$xj <- jitter(rep(1,length(dat$x)),amount=0.1)
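An alternative that avoids computing the jitter by hand, sketched here on the assumption of a reasonably recent ggplot2 in which position_jitter() accepts a seed argument: give the point layer and the text layer the same seeded position, so both receive identical offsets. For the offsets to line up, both layers must use the same rows, so unflagged points get an empty label instead of being dropped. The column name lab_show and the width, vjust and size values are choices made for this illustration.

```r
library(ggplot2)

set.seed(42)
dat <- data.frame(x = rep(letters[1:3], times = 100),
                  y = runif(300),
                  lab = paste0("id_", 1:300))

# Flag the lowest 10% of y within each level of x;
# every other point gets an empty label
cutoff <- ave(dat$y, dat$x, FUN = function(v) quantile(v, 0.1))
dat$lab_show <- ifelse(dat$y <= cutoff, dat$lab, "")

# The same seeded position object is applied to both layers
pos <- position_jitter(width = 0.2, height = 0, seed = 1)

p <- ggplot(dat, aes(x = x, y = y)) +
  geom_boxplot(outlier.shape = NA) +
  geom_point(position = pos) +
  geom_text(aes(label = lab_show), position = pos, vjust = -0.6, size = 3)
p
```

Because the jitter is applied by ggplot2 itself, this approach should not need the per-facet workaround described above, though that has not been checked against every faceting setup.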

Resources