I have some time series data. I've used a few functions to find specific x,y coordinates and would like to highlight them on the facet plot I've created. I'm not even sure whether that's possible.
# create sample data
library(ggplot2)
library(reshape2) # for melt()
t<-1:100
a<-rnorm(100)
b<-rnorm(100)
c<-rnorm(100)
g <-data.frame(t,a,b,c)
g <- melt(g,id="t")
# current facet graph
ggplot(g,aes(x=t,y=value,color=variable))+
geom_point()+
facet_wrap(~variable)
This looks along the lines of what I've got right now. But I've also got the additional data below, which is a dataframe of x,y coordinates.
# sample data with x,y coords
x1 <- c(10,11,15)
y1 <- c(5,6,9)
x2 <- c(50,41,35)
y2 <- c(25,27,19)
xy<-rbind(x1,y1,x2,y2)
colnames(xy)<-c("a","b","c")
I'm not sure how to make this happen. I'd like the coordinates to be graphed in their individual plots.
Thanks for the help above, as you guys suspected the format of my data was incorrect. I'm still a novice and so collecting and formatting the data is often not as straightforward.
#the format of my 'primary' dataframe
t<-seq(1:100)
a<-rnorm(1:100)
b<-rnorm(1:100)
c<-rnorm(1:100)
g <-as.data.frame(cbind(t,a,b,c))
g <- melt(g,id="t")
#and then converting the original dataframe above to something that matches
xy<-data.frame(t = c(10,11,15,50,41,35),variable=c('a','b','c'),value = c(5,6,9,25,27,19))
As soon as I converted the df to the correct format, it became much easier to fiddle around with.
ggplot(g,aes(x=t,y=value,color=variable))+
geom_point()+
geom_point(data=xy,size=4) +
facet_wrap(~variable)
The picture below shows the result. Perhaps a better title for the question would be "Plot multiple data frames on a single facet".
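For reference, the long-format xy data frame does not have to be typed by hand. A minimal base R sketch, assuming the matrix keeps the question's layout (rows named x1, y1, x2, y2 and one column per facet variable); xy_long, x_rows and y_rows are names made up for this illustration:

```r
# original wide matrix from the question
x1 <- c(10, 11, 15); y1 <- c(5, 6, 9)
x2 <- c(50, 41, 35); y2 <- c(25, 27, 19)
xy <- rbind(x1, y1, x2, y2)
colnames(xy) <- c("a", "b", "c")

# assumption: rows starting with "x" hold x values, rows starting with "y" hold y values
x_rows <- grep("^x", rownames(xy))
y_rows <- grep("^y", rownames(xy))

# transpose before flattening so values are read row by row (x1 first, then x2)
xy_long <- data.frame(
  t        = as.vector(t(xy[x_rows, , drop = FALSE])),
  variable = rep(colnames(xy), times = length(x_rows)),
  value    = as.vector(t(xy[y_rows, , drop = FALSE]))
)

xy_long
```

xy_long can then be passed to geom_point(data = xy_long, size = 4) in the same way as the hand-built xy above.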
Related
I have some data from people (i.e. reading of e-books and printed books) that I want to plot with respect to 3 different categories (i.e. age, place of living, and social status) in the same graph because the comparison would be easier to see this way. I did a plot using the tool facet_grid of ggplot2.
A code akin to the one I have is the following:
library(ggplot2)
F1 <- rep(c("F11","F12"),4)
F2 <- rep(c("F21","F22","F23","F24","F25","F26"),4)
F3 <- rep(c("F31","F32","F33","F34","F35","F36"),4)
R1 <- c(rep("R1",4),rep("R2",4))
R2 <- c(rep("R1",12),rep("R2",12))
R3 <- c(rep("R1",12),rep("R2",12))
YN1 <- rep(c(rep("Yes",2),rep("No",2)),2)
YN2 <- rep(c(rep("Yes",6),rep("No",6)),2)
YN3 <- rep(c(rep("Yes",6),rep("No",6)),2)
C <- c(rep("C3",8),rep("C2",24),rep("C1",24))
df1 <- data.frame(Fill=F1,R=R1,YN=YN1)
df2 <- data.frame(Fill=F2,R=R2,YN=YN2)
df3 <- data.frame(Fill=F3,R=R3,YN=YN3)
nums <- c(13.686722,6.246296,85.220158,83.702738,52.768593,44.885623,46.138288,45.063411,
12.448873,25.969226,17.290794,11.733145,8.288770,3.889354,84.722827,72.213003,
80.786696,86.643134,90.212283,95.047198,69.149769,53.831649,45.815887,44.073420,
43.501864,46.696141,28.021930,44.350581,52.261603,54.302859,54.999190,52.240411,
7.984016,13.276286,18.632469,24.995091,26.862302,24.478694,92.015984,86.723714,
81.367531,75.004909,73.137698,75.521306,49.535149,51.837543,56.646371,64.814451,
66.873589,69.809610,50.464851,48.162457,43.353629,35.185549,33.126411,30.190390)
df <- data.frame(C,rbind(df1,df2,df3),nums)
ggplot(df, aes(x=YN,y=nums,fill=factor(Fill,levels=c(c("F31","F32","F33","F34","F35","F36"),
c("F21","F22","F23","F24","F25","F26")
,c("F11","F12"))))) +
scale_fill_manual(values=c("#74ADD1","#ABD9E9", "#E0F3F8", "#D9EF8B","#A6D96A","#66BD63"
,rev(c("#8DD3C7","#FFFFB3","#BEBADA","#FB8072","#80B1D3","#FDB462")),
"#FBB4AE","#B3CDE3"))+
geom_bar(position=position_stack(reverse=F), stat="identity", colour="black",size=.3)+
facet_grid(C~R,scales="free") + labs(fill="Fill")
I'm new to this site so it doesn't let me add pictures, but the plot it generates is this one.
As you can see, there are three different groups of colours for the fillings of the bars, one for each category. However, the legend for these fillings displays them all as if they were one big group.
This is what I'd like to change. I'd want to separate this legend in three groups, one corresponding to each category. I've edited the plot to show precisely what my aim is. The actual plot I would like to get is something similar to this one.
I'm not sure if something like it can be achieved with code, or if instead I may have to edit the original plot with some graphic design tool like I've done here haha.
I'm rather new to coding on R so any help would be greatly appreciated. Thanks in advance.
I started learning R for data analysis and, most importantly, for data visualisation.
Since I am still in the switching process, I am trying to reproduce the activities I was doing with Graphpad Prism or Origin Pro in R. In most of the cases everything was smooth, but I could not find a smart solution for plotting multiple y columns in a single graph.
What I usually get from the software I use for data visualisation looks like this:
Each single black trace is a measurement, and I would like to obtain the same plot in R. In Prism or Origin, this will take a single copy-paste in a XY graph.
I exported the matrix of data (one X, which indicates the time, and multiple Y values, which are the traces you see in the image).
I imported my data in R with the following commands:
library(ggplot2) #loaded ggplot2
Data <- read.csv("Directory/File.txt", header=F, sep="") #imported data
DF <- data.frame(Data) #transformed data into data frame
If I plot my data now, I obtain a series of columns, where the first one (called V1) is the X axis and all the others (V2 to V140) are the traces I want to put on the same graph.
To plot the data, I tried different solutions:
ggplot(data=DF, aes(x=DF$V1, y=DF[V2:V140]))+geom_line()+theme_bw() #did not work
plot(DF, xy.coords(x=DF$V1, y=DF$V2:V140)) #gives me an error
plot(DF, xy.coords(x=V1, y=c(V2:V10))) #gives me an error
I tried the matplot, without success, following the EZH guide:
The code I used is the following: matplot(x=DF$V1, type="l", lty = 2:100)
The only solution I found would be to write a separate plot command for each single column, but that is a crazy solution. The number of columns varies among my data, and manually entering commands for 140 columns is insane.
What would you suggest?
Thank you in advance.
Some data are also attached here. Data: single X, multiple Y
I tried using matplot(). I used very simple sample data with no trend at all, so the output from my code will look terrible, but my main focus is on the code. Since you have already tried matplot(), just check against the solution below to see whether you had done it right!
set.seed(100)
df = matrix(sample(1:685765,50000,replace = T),ncol = 100)
colnames(df)=c("x",paste0("y", 1:99))
dt=as.data.frame(df)
matplot(dt[["x"]], y = dt[,c(paste0("y",1:99))], type = "l")
If you want to plot in base R, you can make a plot and then add the lines one at a time; that isn't hard to do.
We start by making some sample data. Since the data in the link all seemed to be on the same scale, I will assume your data frame only has y values and the x value is stored separately.
plotData <- as.data.frame(matrix(sort(rnorm(500)),ncol = 5))
xval <- sort(sample(200, 100))
Now we can initialize a plot with the first column.
plot(xval, plotData[[1]], type = "l",
ylim = c(min(plotData), max(plotData)))
type = "l" makes a line plot instead of a scatter plot
ylim = c(min(plotData), max(plotData)) makes sure the y-axis will fit all the data.
Now we can add the rest of the values.
apply(plotData[-1], 2, lines, x = xval)
plotData[-1] removes the column we already plotted,
apply function with 2 as the second parameter means we want to execute a function on every column,
lines defines the function we are applying to the columns. lines adds a new line to the current plot.
x = xval passes an extra parameter (x) to the lines function.
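If the x values sit in the first column of the data frame, as in the question's DF (V1 for time, V2 onwards for traces), matplot() can also draw every trace in one call, however many columns there are. A minimal sketch on simulated data; DF here is a stand-in built for the example, not the asker's file, and n_points and n_traces are made-up sizes:

```r
set.seed(1)
n_points <- 50
n_traces <- 20  # hypothetical; the real file has many more traces

# simulated stand-in: V1 is time, remaining columns are random-walk traces
traces <- replicate(n_traces, cumsum(rnorm(n_points)))
DF <- data.frame(seq_len(n_points), traces)
names(DF) <- paste0("V", seq_len(ncol(DF)))

# first column as x, all other columns as y; one solid black line per trace
matplot(DF[[1]], DF[-1], type = "l", lty = 1, col = "black",
        xlab = "time", ylab = "value")
```

Because the y argument is simply "every column except the first", the same call works unchanged when the number of traces differs between files.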
If you want to plot the data using ggplot2, the data should be transformed to long format:
library(ggplot2)
library(reshape2)
dat <- read.delim('AP.txt', header = F)
# plotting only first 9 traces
# my RStudio will crash if I plot the full data;
df <- melt(dat[1:10], id.vars = 'V1')
ggplot(df, aes(x = V1, y = value, color = variable)) + geom_line()
# if you want all traces to be in same colour, you can use
ggplot(df, aes(x = V1, y = value, group = variable)) + geom_line()
This is my first post, so go easy. Up until now (the past ~5 years?) I've been able to either tweak my R code the right way or find an answer on this or various other sites. Trust me when I say that I've looked for an answer!
I have a working script to create the attached boxplot in basic R.
http://i.stack.imgur.com/NaATo.jpg
This is fine, but I really just want to "jazz" it up in ggplot, for vain reasons.
I've looked at the following questions and they are close, but not complete:
Why does a boxplot in ggplot requires axis x and y?
How do you draw a boxplot without specifying x axis?
My data is basically like "mtcars" if all the numerical variables were on the same scale.
All I want to do is plot each variable on the same boxplot, like the basic R boxplot I made above. My y axis is the same continuous scale (0 to 1) for each box and the x axis simply labels each month plus a yearly average (think all the mtcars values the same on the y axis and the x axis is each vehicle model). Each box of my data represents 75 observations (kind of like if mtcars had 75 different vehicle models), again all the boxes are on the same scale.
What am I missing?
Though I don't think mtcars makes a great example for this, here it is:
First, we make the data (hopefully) more similar to yours by using a column instead of rownames.
mt = mtcars
mt$car = row.names(mtcars)
Then we reshape to long format:
mt_long = reshape2::melt(mt, id.vars = "car")
Then the plot is easy:
library(ggplot2)
ggplot(mt_long, aes(x = variable, y = value)) +
geom_boxplot()
Using ggplot all but requires data in "long" format rather than "wide" format. If you want something to be mapped to a graphical dimension (x-axis, y-axis, color, shape, etc.), then it should be a column in your data. Luckily, it's usually quite easy to get data in the right format with reshape2::melt or tidyr::gather. I'd recommend reading the Tidy Data paper for more on this topic.
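To make the wide/long distinction concrete, here is a small sketch that uses base R's stack() in place of melt (a swapped-in alternative that needs no extra package). The data frame wide and its columns are invented for the example:

```r
# wide format: one row per observation, one column per measured variable
wide <- data.frame(id  = 1:3,
                   jan = c(0.1, 0.4, 0.7),
                   feb = c(0.2, 0.5, 0.8))

# stack() puts all values into one column ("values") and records the
# source column name in another ("ind"); the id column is repeated to match
long <- data.frame(id = rep(wide$id, times = 2),
                   stack(wide[c("jan", "feb")]))
names(long) <- c("id", "value", "variable")

long
```

With data in this shape, aes(x = variable, y = value) works the same way as with the melted mtcars above.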
I am trying to plot (on the same graph) two sets of data versus date from two different data frames. Both data frames have the same exact dates for each of the two measurements. I would like to plot these two sets of data on the same graph, with different colors. However, I can't get them on the same graph at all. R is already reading the date as date. I tried this:
qplot( date , NO3, data=qual.arn)
+ qplot( qual.arn$date , qual.arn$DIS.O2, "O2(aq)" , add=T)
and received this error.
Error in add_ggplot(e1, e2, e2name) :
argument "e2" is missing, with no default
I tried using the ggplot function instead of qplot, but I couldn't even plot one graph this way.
ggplot(date=qual.no3.s, aes(date,NO3))
Error: ggplot2 doesn't know how to deal with data of class uneval
PLEASE HELP. Thank you!
Since you didn't provide any data (please do so in future), here's a made-up dataset to demonstrate a solution. There are (at least) two ways to do this: the right way and the wrong way. Both yield equivalent results in this very simple case.
# set up minimum reproducible example
set.seed(1) # for reproducible example
dates <- seq(as.Date("2015-01-01"),as.Date("2015-06-01"), by=1)
df1 <- data.frame(date=dates, NO3=rpois(length(dates),25))
df2 <- data.frame(date=dates, DIS.O2=rnorm(length(dates),50,10))
ggplot is designed to use data in "long" format. This means that all the y-values (the concentrations) are in a single column, and there is separate column which identifies the corresponding category ("NO3" or "DIS.O2" in your case). So first we merge the two data-sets based on date, then use melt(...) to convert from "wide" (categories in separate columns) to "long" format. Then we let ggplot worry about legends, colors, etc.
library(ggplot2)
library(reshape2) # for melt(...)
# The right way: combine the data-sets, then plot
df.mrg <- merge(df1,df2, by="date", all=TRUE)
gg.df <- melt(df.mrg, id="date", variable.name="Component", value.name="Concentration")
ggplot(gg.df, aes(x=date, y=Concentration, color=Component)) +
geom_point() + labs(x=NULL)
The "wrong" way to do this is by making separate calls to geom_point(...) for each layer. In your particular case this might be simpler, but in the long run it's better to use the other method.
# The wrong way: plot two sets of points
ggplot() +
geom_point(data=df1, aes(x=date, y=NO3, color="NO3")) +
geom_point(data=df2, aes(x=date, y=DIS.O2, color="DIS.O2")) +
scale_color_manual(name="Component",values=c("red", "blue")) +
labs(x=NULL, y="Concentration")
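If you do build the plot layer by layer, one way to keep each colour tied to its legend label is to give scale_color_manual() a named vector, so the assignment does not depend on the alphabetical order of the labels. A minimal sketch reusing the made-up df1 and df2 from above; component_colours is a name introduced here for illustration:

```r
library(ggplot2)

set.seed(1)
dates <- seq(as.Date("2015-01-01"), as.Date("2015-06-01"), by = 1)
df1 <- data.frame(date = dates, NO3 = rpois(length(dates), 25))
df2 <- data.frame(date = dates, DIS.O2 = rnorm(length(dates), 50, 10))

# names must match the strings used inside aes(color = ...)
component_colours <- c("NO3" = "blue", "DIS.O2" = "red")

p <- ggplot() +
  geom_point(data = df1, aes(x = date, y = NO3, color = "NO3")) +
  geom_point(data = df2, aes(x = date, y = DIS.O2, color = "DIS.O2")) +
  scale_color_manual(name = "Component", values = component_colours) +
  labs(x = NULL, y = "Concentration")

p
```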
I have a dataframe with 3 columns, (Id, Lat, Long), you can construct a small section of this with the following data:
df <- data.frame(
Id=c(1,1,2,2,2,2,2,2,3,3,3,3,3,3),
Lat=c(58.12550, 58.17426, 58.46461, 58.45812, 58.45207, 58.44512, 58.43358, 58.42727, 57.77700, 57.76034, 57.73614, 57.72411, 57.70498, 57.68453),
Long=c(-5.098068, -5.314452, -4.914108, -4.899922, -4.887067, -4.873312, -4.852384, -4.840817, -5.666568, -5.648711, -5.617588, -5.594681, -5.557740, -5.509405))
The Id column is an index column. So all the rows with the same Id number have the coordinates for a single line. In my data frame this Id number varies from 1 through to 7696. So I have 7696 lines to plot.
Each Id number relates to an individual separate line of Lat and Long coordinates. What I want to do is overlay onto an existing plot all of these 7696 individual lines.
With the example data above this contains the Lat & Long coordinates for lines 1, 2, 3.
What is the best way to overlay all these lines onto an existing plot, I was thinking maybe some kind of loop?
Using ggplot2:
#dummy data
df <- data.frame(
Id=c(1,1,2,2,2,2,2,2,3,3,3,3,3,3),
Lat=c(58.12550, 58.17426, 58.46461, 58.45812, 58.45207, 58.44512, 58.43358, 58.42727, 57.77700, 57.76034, 57.73614, 57.72411, 57.70498, 57.68453),
Long=c(-5.098068, -5.314452, -4.914108, -4.899922, -4.887067, -4.873312, -4.852384, -4.840817, -5.666568, -5.648711, -5.617588, -5.594681, -5.557740, -5.509405))
library(ggplot2)
#plot
ggplot(data=df,aes(Lat,Long,colour=as.factor(Id))) +
geom_line()
Using base R:
#plot blank
with(df,plot(Lat,Long,type="n"))
#plot lines
for(i in unique(df$Id))
with(df[ df$Id==i,],lines(Lat,Long,col=i))
To be honest, I think that any approach to take is going to result in a very cluttered plot since you have so many Ids (unless their lines do not overlap much). Either way, I would probably use ggplot2 for this.
##
if( !("ggplot2" %in% installed.packages()[,1]) ){
install.packages("ggplot2",dependencies=TRUE)
}
library(ggplot2)
##
D <- data.frame(   # built from the df defined in the question
Id=df$Id,
Lat=df$Lat,
Long=df$Long
)
##
ggplot(data=D,aes(x=Lat,y=Long,group=Id,color=Id))+
geom_point()+ ## you might want to omit geom_point() in your plot
geom_line()
##
The reason I used group=Id, color=Id in aes(), rather than passing Id as a factor and just using color=Id, is that the factor version would give you a legend containing 7000+ factor levels (the majority of which would not be visible in the plot area).
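Two related details may be worth knowing here. geom_line() connects points in order of their x value, whereas geom_path() connects them in the order they appear in the data, which may be closer to what is wanted for coordinate tracks. And if a discrete colour per Id is preferred after all, the legend can be switched off with show.legend = FALSE. A minimal sketch on the three example lines, under those assumptions:

```r
library(ggplot2)

df <- data.frame(
  Id   = c(1,1,2,2,2,2,2,2,3,3,3,3,3,3),
  Lat  = c(58.12550, 58.17426, 58.46461, 58.45812, 58.45207, 58.44512, 58.43358,
           58.42727, 57.77700, 57.76034, 57.73614, 57.72411, 57.70498, 57.68453),
  Long = c(-5.098068, -5.314452, -4.914108, -4.899922, -4.887067, -4.873312, -4.852384,
           -4.840817, -5.666568, -5.648711, -5.617588, -5.594681, -5.557740, -5.509405))

# group = Id keeps each line separate; factor(Id) gives discrete colours;
# show.legend = FALSE suppresses the legend for this layer
p <- ggplot(df, aes(x = Lat, y = Long, group = Id, colour = factor(Id))) +
  geom_path(show.legend = FALSE)

p
```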