Data from two data frames in one plot (R) - r

I've got two data frames in R, both of the same structure - with columns named: Year, Age, Gender and Value1.
What I'd like to do is plot (as points) Value1 (on the Y axis) against Year (on the X axis) for a particular gender and age. The plot should consist of points from both data frames (with a legend indicating which points are from which data frame).
What I've done is:
attach(df1)
plot(Value1[Gender=="Female" & Age==30] ~ Year[Gender=="Female" & Age==30])
which creates a plot with points from one data frame. The question is how to add the points from the second data frame to the same plot, and how to create a proper legend. I tried a few variations of the points() call, but it did not help.

Without a reproducible example it is not easy to help. Assuming your data frames are called df1 and df2, you can try this:
library(ggplot2)
library(dplyr)
df1$frame="1"
df2$frame="2"
df=rbind(df1,df2)
df<-filter(df,Gender=="Female"&Age==30)
ggplot(data=df,aes(x=Year,y=Value1,col=frame))+geom_point()
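Since the question asked about points() and a legend, a base R version could look roughly like the sketch below. The example data are made up to stand in for df1 and df2, and the colours and plotting symbols are arbitrary choices.

```r
# Hypothetical example data; df1 and df2 stand in for the asker's data frames.
set.seed(1)
make_df <- function() {
  d <- expand.grid(Year = 2000:2010, Age = c(30, 40),
                   Gender = c("Female", "Male"),
                   stringsAsFactors = FALSE)
  d$Value1 <- rnorm(nrow(d))
  d
}
df1 <- make_df()
df2 <- make_df()

# Subset each data frame explicitly instead of using attach().
s1 <- subset(df1, Gender == "Female" & Age == 30)
s2 <- subset(df2, Gender == "Female" & Age == 30)

# Axis limits must cover both subsets, otherwise points from df2 may be cut off.
yr <- range(c(s1$Year, s2$Year))
vr <- range(c(s1$Value1, s2$Value1))

plot(Value1 ~ Year, data = s1, col = "blue", pch = 16, xlim = yr, ylim = vr)
points(Value1 ~ Year, data = s2, col = "red", pch = 17)
legend("topright", legend = c("df1", "df2"),
       col = c("blue", "red"), pch = c(16, 17))
```

Setting xlim and ylim from both subsets matters because plot() fixes the axes from the first data set only.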

Related

Is it possible to get data from a plot in R

I have this plot where I plotted patient IDs on the x axis and BMI on the y axis. I found a cluster of data in the "severely underweight" category, as you can see in the plot. How can I get a table of all the points that are in it?
OR
How can I extract one category from a column in R?
Assuming that your data is in the d1 data frame and the category is given by the group column, one possibility is to use the dplyr package:
library(dplyr)
suw <- filter(d1, group == "Severely underweight")
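The same selection can also be sketched in base R without extra packages. The data frame below is invented for illustration: the column names id and BMI and the cutoff of 16 are assumptions, not taken from the question.

```r
# Hypothetical data: the column names id, BMI and group are assumptions.
d1 <- data.frame(
  id    = 1:6,
  BMI   = c(14.2, 22.5, 15.1, 30.3, 18.9, 13.8),
  group = c("Severely underweight", "Normal", "Severely underweight",
            "Obese", "Normal", "Severely underweight")
)

# Rows whose category matches; this mirrors the dplyr filter() call.
suw <- d1[d1$group == "Severely underweight", ]

# Alternatively, pick the cluster by a BMI cutoff (16 is an assumed threshold).
low <- subset(d1, BMI < 16)
```

Either result is an ordinary data frame, so it can be printed, viewed or written to a file like any other table.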

R - converting a table to data frame

I'm working on the Titanic dataset from R. I want to analyse the dataset using ggplot (stacked and grouped bar plots), so I wanted to convert the table into a data frame to plot the graphs. I used the following code to convert it:
df<-as.data.frame(Titanic)
View(df)
However, even when viewing it, my df looks more like a data table.
Furthermore, when I tried to use it for a plot using the code:
ggplot(data=df) + geom_bar(aes(x=Class,y=Sex))
All it shows is an empty plot, with just the labels on x and y axis, along with the categorical values of Sex as Male & Female and Class as 1st,2nd,3rd and crew.
What confuses me even more is that it's picking up the categorical values from the dataset but not the observations.
Please let me know how I can convert to dataframe correctly. Thanks :)
If I reproduce your code it gives me this error:
Error : Mapping a variable to y and also using stat="bin".
This is because you also included the y=Sex in your script. The main question therefore is, what would you like to plot?
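If this is a bar chart with the count of persons in each class, note that as.data.frame(Titanic) has one row per combination of categories with the number of persons in the Freq column, so the bars have to be weighted by Freq:
ggplot(data=df) + geom_bar(aes(x=Class, weight=Freq))
If it is the total number of females and males, it will be:
ggplot(data=df) + geom_bar(aes(x=Sex, weight=Freq))
Do not map one of them to x and the other to y in the same geom_bar() call.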
To get back to the question. There is nothing wrong with your data frame. It is your ggplot code that is faulty.
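For the stacked and grouped bar plots mentioned in the question, something along these lines might work. It is only a sketch: it relies on the Freq column that as.data.frame() creates for a table, and the choice of Class on the x axis with Sex as the fill is an assumption about what should be shown.

```r
library(ggplot2)

# as.data.frame() on the Titanic table gives one row per combination of
# Class, Sex, Age and Survived, with the number of people in Freq.
df <- as.data.frame(Titanic)

# Stacked bars: people per class, split by sex.
p_stacked <- ggplot(df, aes(x = Class, y = Freq, fill = Sex)) +
  geom_col()

# Grouped bars: sum Freq first so each Class/Sex pair is a single bar.
agg <- aggregate(Freq ~ Class + Sex, data = df, FUN = sum)
p_grouped <- ggplot(agg, aes(x = Class, y = Freq, fill = Sex)) +
  geom_col(position = "dodge")
```

Printing p_stacked or p_grouped draws the corresponding chart.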

How to plot binary data together with continuous data in time series with ggplot2?

I have several data sets containing binary and continuous data respectively.
The data sets include the datetime for each observation.
The time step in the datetime column is not the same across data sets, so I cannot merge them.
(So far I have kept the two data sets apart, especially because the time step within each data set is itself irregular.)
The binary data has a lower frequency than the continuous data.
Important: I transformed the time to POSIXct format in order to get around the irregular time steps in the data.
I would like to plot the two datasets in one time series plot with ggplot2.
The binary data (0's and 1's) should shade the continuous curve with rectangular surfaces going from y=-Inf to y=Inf.
Does it make sense?
My question: How do I do that?
How do I create a legend and control the colors of the plot?
So far I have the binary data in one plot using geom_step
and the continuous data in another plot.
I tried multiplot, but it does not seem to work.
The dream situation is, to put multiple plots of different data on top of each other as layers using the POSIXct time as reference somehow!
Not sure I can give some reproducible code..
This is how I transform the time column to POSIXct format:
D$Time <- as.POSIXct(D$Time, format="%Y/%m/%d %H:%M:%S")  # strptime() alone returns POSIXlt, not POSIXct
This is the plot with two binary data sets using geom_step:
ggplot() +
  geom_step(data=E, aes(x=Time, y=Set, group=1, col="high window")) +
  geom_step(data=D, aes(x=Time, y=Set, group=1)) +
  scale_x_datetime(limits=c(as.POSIXct('0015-01-07 08:00:00'), as.POSIXct('0015-01-07 10:00:00'))) +
  scale_y_continuous(breaks=seq(0, 1, 1))
I am currently trying to plot the plot above together with a third dataset which is continuous, which means I need another y-axis if I should continue with geom_step...
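One way the shading described above might be approached is to turn the binary series into time intervals and draw them with geom_rect() behind the continuous curve, which avoids the need for a second y-axis. The sketch below uses invented data: the data frames B and C, the Value column and the legend labels are all hypothetical, and it assumes that each binary observation stays valid until the next one.

```r
library(ggplot2)

# Hypothetical stand-ins for the asker's data: B is binary, C is continuous.
t0 <- as.POSIXct("2015-01-07 08:00:00", tz = "UTC")
B <- data.frame(Time = t0 + c(0, 900, 2400, 4000, 5500, 7200),
                Set  = c(0, 1, 0, 1, 1, 0))
C <- data.frame(Time = t0 + seq(0, 7200, by = 60))
C$Value <- sin(seq_along(C$Time) / 10)

# Assume each binary value holds until the next observation,
# so every row becomes an interval from Time to End.
B$End <- c(B$Time[-1], max(C$Time))
on <- B[B$Set == 1, ]

# Both layers share the POSIXct x axis, so the time steps need not match.
p <- ggplot() +
  geom_rect(data = on,
            aes(xmin = Time, xmax = End, ymin = -Inf, ymax = Inf,
                fill = "window open"),
            alpha = 0.3) +
  geom_line(data = C, aes(x = Time, y = Value, colour = "measurement")) +
  scale_fill_manual(name = NULL, values = c("window open" = "grey50")) +
  scale_colour_manual(name = NULL, values = c("measurement" = "steelblue"))
```

Mapping fill and colour to constant strings inside aes() is what produces the legend entries, and the two manual scales control their colours.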

Plotting multiple time-series in ggplot

I have a time-series dataset consisting of 10 variables.
I would like to create a time-series plot where each of the 10 variables is plotted in a different color, over time, on the same graph. The values should be on the Y axis and the dates on the X axis.
Click Here for dataset csv
This is the (probably wrong) code I have been using:
c.o<-read.csv(file="co.csv",head=TRUE)
ggplot(c.o, aes(Year, a, b, c, d, e,f))+geom_line()
and here's what the output from the code looks like:
Can anyone point me in the right direction? I wasn't able to find anything in previous threads.
PROBLEM SOLVED, SEE BELOW.
One additional thing I would like to know:
Is it possible to add an extra line to the plot which represents the average of all variables across time, and have some smoothing below and above that line to represent individual variations?
If your data frame is called df, something like this should work:
library(ggplot2)
library(reshape2)
meltdf <- melt(df,id="Year")
ggplot(meltdf,aes(x=Year,y=value,colour=variable,group=variable)) + geom_line()
So basically, when I use aes() in my code, I'm telling ggplot that the x-axis is Year, the y-axis is value, and the colour and grouping are given by variable.
The melt() function gets your data into the long format that ggplot2 prefers: one column for Year, one column named variable holding the original column names, and one column named value holding the numbers. ggplot2 then effectively splits the data again when you tell it to draw a separate line for each variable.
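For the follow-up about an average line with a band around it, a possible sketch is to compute the mean and standard deviation per year from the melted data and draw them with geom_ribbon() and an extra geom_line(). The data below are invented to stand in for co.csv, and using one standard deviation as the band width is an assumption; other measures of spread would work the same way.

```r
library(ggplot2)
library(reshape2)

# Hypothetical data standing in for co.csv: a Year column plus series a, b, c.
set.seed(42)
df <- data.frame(Year = 2000:2009,
                 a = rnorm(10, 5), b = rnorm(10, 6), c = rnorm(10, 7))

meltdf <- melt(df, id.vars = "Year")

# Mean and standard deviation across the variables for each year.
avg <- aggregate(value ~ Year, data = meltdf, FUN = mean)
names(avg)[2] <- "mean"
avg$sd <- aggregate(value ~ Year, data = meltdf, FUN = sd)$value

p <- ggplot() +
  # Shaded band of one standard deviation around the mean.
  geom_ribbon(data = avg, aes(x = Year, ymin = mean - sd, ymax = mean + sd),
              alpha = 0.2) +
  # Individual series, one colour per variable.
  geom_line(data = meltdf,
            aes(x = Year, y = value, colour = variable, group = variable)) +
  # Average of all variables, drawn dashed.
  geom_line(data = avg, aes(x = Year, y = mean), linetype = "dashed")
```

Drawing the ribbon first keeps it behind the individual lines.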

introducing a gap in continuous x axis using ggplot

This builds on my previous post, creating an stacked area/bar plot with missing values (all the script I run can be found there). In this post, however, I'm asking whether it's possible to leave a gap in a continuous x axis. I have a time series (month by month) over a year, but for one sample one month is missing, and I would like to show this month as a complete gap in the plot. Almost like plotting one graph for Jan-Aug (Sep is missing) and one for Oct-Dec and merging these with a gap for Sep.
The only things I have tried so far are treating the missing month as zero or NA, which creates a huge drop in the area chart for Sep, or excluding it, which gives an x axis ranging from 1-11 (see plots in dropbox folder).
The data set I'm working on can be found in my dropbox folder and it's named r_class.txt, and you can also see the two different plots (Rplots1 and 2).
Any ideas would really be appreciated!
Plot the series as two separate data frames:
#Load libraries
require(ggplot2)
require(reshape)
#Code copied from your linked post:
wa=read.table('wa_class.txt', sep="", header=FALSE, na.strings="0")
names(wa)=c("Class","Jan","Feb","Mar","Apr","May","Jun","Jul","Aug","Sep","Oct","Nov","Dec")
wam=melt(wa)
wam$variablen=as.numeric(wam$variable)
#For readability, split the melted data frame into two separate data frames
wam1 <- wam[wam$variablen %in% 1:6,]
wam2 <- wam[wam$variablen %in% 8:12, ]
ggplot() +
  geom_area(data=wam1, aes(x=variablen, y=value, fill=Class)) +
  geom_area(data=wam2, aes(x=variablen, y=value, fill=Class))
#and add lineranges, etc., accordingly
