Specify data to be graphed by one parameter - r

I want to graph some points based on whether one parameter in a data frame is true. I want to use the position coordinates only where the "Time" column is equal to 2. How do I write this in R?

In the future it'd be helpful to include some sample data using dput(). dplyr and ggplot2 are two popular packages you might want to look into.
library(dplyr)
library(ggplot2)
Create something that your data might look like:
df <- data.frame(x = runif(10), y = runif(10), Time = sample(1:2, 10, replace = TRUE))
Filter then plot:
df_tmp <- df %>% filter(Time == 2)
ggplot(df_tmp, aes(x,y)) + geom_point()
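The same filter-then-plot step can also be sketched in base R, without extra packages. This is only an illustration on made-up data; the column names x, y, and Time are assumed, as in the example above, and Time is filled deterministically so the subset is never empty:

```r
# Stand-in for the real data (column names are assumptions)
df <- data.frame(x = runif(10), y = runif(10),
                 Time = rep(1:2, times = 5))

# Keep only the rows where Time equals 2
df_time2 <- df[df$Time == 2, ]

# Scatter plot of the remaining coordinates
plot(df_time2$x, df_time2$y, xlab = "x", ylab = "y")
```

subset(df, Time == 2) would give the same rows if you prefer a function call over bracket indexing.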

Related

How to make "interactive" time series plots for exploratory data analysis

I have a time series data frame similar to the data created below. Measurements of 5 variables are taken on each individual. Individuals have unique ID numbers. Note that in this data set each individual has the same length (1000 observations each), but in my real data set each individual has a different number of observations. For each individual, I want to plot all 5 variables on top of one another (i.e. all on the y axis) against time (x axis). I want to print each of these plots to an external document of some kind (pdf, or whatever is recommended for this application) with one plot per page, meaning each individual will have its own page with a single plot. I want these time series plots to be "interactive", in that I can move my mouse over a point and it will tell me the time of that data point. My goal is to explore the association between peaks, valleys, and other regions across the 5 variables. I am not sure whether ggplot2 is still the best tool for this, but I would like the plots to be aesthetically appealing so that patterns in the data are easier to see. Also, is pasting these plots to a pdf the most sensible route? Or would I be better off using R Notebook or some other application?
ID <- rep(c("A","B","C"), each=1000)
time <- rep(c(1:1000), times = 3)
one <- rnorm(1000)
two <- rnorm(1000)
three <- rnorm(1000)
four <- rnorm(1000)
five<-rnorm(1000)
data<- data.frame(cbind(ID,time,one,two,three,four,five))
Try using the plotly package. And since you want it to be interactive, you'll want to export as something like html rather than pdf.
To produce a single faceted plot (note I added stringsAsFactors = FALSE to your sample data):
library(tidyverse)
library(plotly)
ID <- rep(c("A","B","C"), each=1000)
time <- rep(c(1:1000), times = 3)
one <- rnorm(1000)
two <- rnorm(1000)
three <- rnorm(1000)
four <- rnorm(1000)
five<-rnorm(1000)
data<- data.frame(cbind(ID,time,one,two,three,four,five),
stringsAsFactors = FALSE)
data_long <- data %>%
gather(variable,
value,
one:five) %>%
mutate(time = as.numeric(time),
value = as.numeric(value))
plot <- data_long %>%
ggplot(aes(x = time,
y = value,
color = variable)) +
geom_point() +
facet_wrap(~ID)
interactive_plot <- ggplotly(plot)
htmlwidgets::saveWidget(interactive_plot, "example.html")
If you want to produce and export an interactive plot for every ID programmatically:
walk(unique(data_long$ID),
~ htmlwidgets::saveWidget(ggplotly(data_long %>%
filter(ID == .x) %>%
ggplot(aes(x = time,
y = value,
color = variable)) +
geom_point() +
labs(title = paste(.x))),
paste("plot_for_ID_", .x, ".html", sep = "")))
Edit: I changed map() to walk() so that the plots are produced without console output (previously just a list with 3 empty elements).
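The as.numeric() calls above are needed because cbind() on mixed character and numeric vectors returns a character matrix, so every column arrives as text. A minimal sketch of the difference, using a small made-up data set (the names via_cbind and direct are illustrative only):

```r
ID <- rep(c("A", "B"), each = 3)
time <- rep(1:3, times = 2)
one <- rnorm(6)

# cbind() coerces everything to character before data.frame() sees it
via_cbind <- data.frame(cbind(ID, time, one), stringsAsFactors = FALSE)

# Passing the vectors straight to data.frame() keeps each column's type
direct <- data.frame(ID, time, one, stringsAsFactors = FALSE)

sapply(via_cbind, class)
sapply(direct, class)
```

Building the data frame directly like this would make the later as.numeric() conversion unnecessary.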

dataframe2delta: how to plot a delta function directly from the dataframe using ggplot2

I looked for an answer throughout the former threads, but with no luck.
I was wondering if it could be possible, given a data frame having a structure similar to this one
df <- data.frame(x = rep(1:100, times = 2 ),
y = c(rnorm(100), rnorm(100, 10)),
group = rep(c("a", "b"), each = 100))
to directly plot the difference between the observations of the two groups, instead of plotting the two samples in different colours, which is what I'm able to do so far using ggplot2. Of course I know I could do that with the base plotting system by simply using
plot(df[df$group == "a",]$y - df[df$group == "b",]$y)
but doing so I waste all the cool features of ggplot2.
Thanks in advance!
EB
You could try something like this:
library(reshape2)
library(ggplot2)
df <- dcast(df, x~group, value.var='y')
df$dif = df$a-df$b
ggplot(df, aes(x, dif)) + geom_line()
Or if you use data.table here is how to do it:
library(data.table)
dt=data.table(df)
dt<-dcast.data.table(dt, x~group, value.var='y')
dt[,dif:=a-b]
ggplot(dt, aes(x, dif)) + geom_line()
How does this look?
Another possibility using dplyr is the following:
library(dplyr)
library(ggplot2)
ggplot(df %>% group_by(x) %>% summarise(delta = diff(y)),
aes(x = x, y = delta)) + geom_line()
This avoids the dcast by using diff(), but it assumes the rows are in a known group order within each x. Otherwise you need to sort by the group factor first or apply a dcast to your data frame. I am quite sure that you can do something very similar using data.table.
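Note that diff(y) returns the second value minus the first, so with group a listed before group b it yields b - a, the opposite sign of the base-R plot in the question. A hedged sketch that names the groups explicitly, assuming each x has exactly one row per group (delta_df is a made-up name):

```r
library(dplyr)
library(ggplot2)

set.seed(1)
df <- data.frame(x = rep(1:100, times = 2),
                 y = c(rnorm(100), rnorm(100, 10)),
                 group = rep(c("a", "b"), each = 100))

# Subtract by group label so the result does not depend on row order
delta_df <- df %>%
  group_by(x) %>%
  summarise(delta = y[group == "a"] - y[group == "b"])

p <- ggplot(delta_df, aes(x = x, y = delta)) +
  geom_line()
p
```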
It's not completely solved, but it looks close to what I meant:
qplot(x = x,
      y = diff,
      data = dcast(data = df,
                   value.var = "y",
                   formula = x ~ "diff",
                   fun.aggregate = function(x) x[1] - x[2]))
It's quite tricky and strongly depends on what you have in your group variable, but works.
An alternative was to mutate the output of dcast, but in my case the group column was filled with TRUE and FALSE values. Thus, using mutate to obtain diff = TRUE - FALSE returned a column of 1s, which is not very useful.
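When the group column is logical, the columns that dcast produces are named "TRUE" and "FALSE", and a bare TRUE - FALSE is evaluated as the constants 1 - 0. One way around this, sketched here on made-up data, is to refer to the columns by name as strings:

```r
library(reshape2)

set.seed(1)
# Hypothetical data with a logical grouping column
df <- data.frame(x = rep(1:5, times = 2),
                 y = c(rnorm(5), rnorm(5, 10)),
                 group = rep(c(TRUE, FALSE), each = 5))

wide <- dcast(df, x ~ group, value.var = "y")

# Index by column name so R does not treat TRUE/FALSE as logical constants
wide$diff <- wide[["TRUE"]] - wide[["FALSE"]]
```

Inside mutate() the same effect should be achievable with backticks around the column names.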

Plot table objects with ggplot?

I've got this data:
        No Yes
Female 411 130
Male   435 124
which was created using the standard table command. Now with plot I can plot this as such:
plot(table(df$gender, df$fraud))
and it then outputs a 2x2 bar chart.
So my question is, how can I do this with ggplot2? Is there any way to do it without transforming the table object into a data frame? I could do that, but then you need to rename the column and row headers, and it becomes a mess for what is really quite a simple thing.
Something such as
ggplot(as.data.frame(table(df)), aes(x=gender, y = Freq, fill=fraud)) +
geom_bar(stat="identity")
gets a similar chart with a minimum amount of relabelling.
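To see why so little relabelling is needed, it helps to look at what as.data.frame(table(df)) returns: a long data frame with one row per combination and a Freq column. A sketch with data reconstructed from the counts in the question, assuming df holds only the gender and fraud columns:

```r
# Rebuild raw records that match the 2x2 table in the question
df <- data.frame(
  gender = rep(c("Female", "Male"), times = c(541, 559)),
  fraud  = c(rep(c("No", "Yes"), times = c(411, 130)),
             rep(c("No", "Yes"), times = c(435, 124)))
)

# One row per gender/fraud combination, with the count in Freq
counts <- as.data.frame(table(df))
counts
names(counts)
```

The column names come straight from df, which is why they can be used directly in aes().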
ggplot2 works with data frames, so you have to convert the table into one. Here is some sample code:
myTable <- table(gender = df$gender, fraud = df$fraud)
myFrame <- as.data.frame(myTable)
Now you can use myFrame in ggplot2:
ggplot(myFrame, aes(x = gender, y = Freq)) +
geom_bar(stat = "identity")
See Coerce to a Data Frame for more information.
For the record, janitor::tabyl() outputs contingency tables that are data.frames. As such, they are more convenient for a workflow based on tidyverse tools.
For example:
# Data
df <- data.frame(gender = c(rep("female", times = 411),
rep("female", times = 130),
rep("male", times = 435),
rep("male", times = 124)),
fraud = c(rep("no", times = 411),
rep("yes", times = 130),
rep("no", times = 435),
rep("yes", times = 124)))
# Plotting the tabulation with tidyverse tools
df |>
janitor::tabyl(gender, fraud) |>
tidyr::gather(key = fraud, value = how_many, no:yes) |>
ggplot2::ggplot(ggplot2::aes(y = how_many, x = gender, fill = fraud)) +
ggplot2::geom_col()
Note: The sequence of piped results with janitor and tidyr has the benefit of being more transparent, but it essentially replicates the same result achieved with as.data.frame(table(df)).
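If dplyr is available, count() gives the same long-format counts in one step, with no reshaping. This is an alternative sketch rather than part of the answer above; n is the default name count() gives the tally column:

```r
library(ggplot2)

# Same example data as above
df <- data.frame(gender = rep(c("female", "male"), times = c(541, 559)),
                 fraud = c(rep(c("no", "yes"), times = c(411, 130)),
                           rep(c("no", "yes"), times = c(435, 124))))

# One row per gender/fraud combination, tally stored in column n
counts <- dplyr::count(df, gender, fraud)

p <- ggplot(counts, aes(x = gender, y = n, fill = fraud)) +
  geom_col()
p
```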

Subset of data included in more than one ggplot facet

I have a population and a sample of that population. I've made a few plots comparing them using ggplot2 and its faceting option, but it occurred to me that having the sample in its own facet will distort the population plots (however slightly). Is there a way to facet the plots so that all records are in the population plot, and just the sampled records in the second plot?
Matt,
If I understood your question properly - you want to have a faceted plot where one panel contains all of your data, and the subsequent facets contain only a subset of that first plot?
There's probably a cleaner way to do this, but you can create a new data.frame object with the appropriate faceting variable that corresponds to each subset. Consider:
library(ggplot2)
df <- data.frame(x = rnorm(100), y = rnorm(100), sub = sample(letters[1:5], 100, TRUE))
df2 <- rbind(
cbind(df, faceter = "Whole Sample")
, cbind(df[df$sub == "a" ,], faceter = "Subset A")
#other subsets go here...
)
qplot(x,y, data = df2) + facet_wrap(~ faceter)
Let me know if I've misunderstood your question.
-Chase
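Applied to the population-and-sample case in the question, the same idea might look like the sketch below. The data, the sample size of 40, and the names population, sampled, both, and panel are all made up for illustration:

```r
library(ggplot2)

set.seed(1)
# Hypothetical population and a random sample drawn from it
population <- data.frame(x = rnorm(200), y = rnorm(200))
sampled <- population[sample(nrow(population), 40), ]

# Every record goes into the "Population" panel; the sampled
# records are duplicated into the "Sample" panel
both <- rbind(cbind(population, panel = "Population"),
              cbind(sampled, panel = "Sample"))

p <- ggplot(both, aes(x, y)) +
  geom_point() +
  facet_wrap(~ panel)
p
```

Because the sampled rows are copied rather than moved, the population panel still contains all records.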

Setting up a CSV file for R to display histograms

Greetings,
Basically, I have two vectors of data (let's call it experimental and baseline). I want to use the lattice library and histogram functions of R to plot the two histograms side-by-side, just as seen at the end of this page.
I have my data in a CSV file like this:
Label1,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18
Label2,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18
Label3,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18
Label4,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18
Label5,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18
Label6,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18
Each row should be a new pair of histograms. Columns 1-9 represent the data for the experiment (left-side histogram). Columns 10-18 represent the baseline data (right-side histogram).
Can anyone help me on this? Thanks.
Your data is poorly formatted for faceting with lattice. You can restructure it using reshape.
read.csv(textConnection("Label1,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18
Label2,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18
Label3,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18
Label4,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18
Label5,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18
Label6,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18"), header = F)->data
colnames(data)[1] <- "ID"
colnames(data)[2:10] <- paste("exp",1:9, sep = "_")
colnames(data)[11:19] <- paste("base", 1:9, sep = "_")
library(reshape)
data.m <- melt(data, id = "ID")
data.m <- cbind(data.m, colsplit(data.m$variable, "_", names = c("Source","Measure")))
data.m is now in the format you really want your data to be in for almost everything. I don't know if each of the 9 measurements from the experiment and the baseline are meaningful or can be meaningfully compared so I kept them distinct.
Now, you can use lattice properly.
library(lattice)
histogram(~value | Source + ID, data = data.m)
If the measurements are meaningfully compared (that is, data[,2] and data[,11] are somehow the "same"), you could recast the data to directly compare experiment to baseline
data.comp <- cast(data.m, ID + Measure ~ Source)
## I know ggplot2 better
library(ggplot2)
qplot(base, exp, data = data.comp)+
geom_abline()+
expand_limits(x = 0, y = 0)
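The same restructuring can also be sketched with tidyr, whose pivot_longer() splits the column names in the same step. This is an alternative under the same assumptions as above (columns 2-10 are the experiment, 11-19 the baseline); data_long and csv_lines are illustrative names:

```r
library(tidyr)
library(lattice)

# Rebuild the six CSV rows from the question
csv_lines <- paste0("Label", 1:6, ",", paste(1:18, collapse = ","))
data <- read.csv(text = csv_lines, header = FALSE)
names(data) <- c("ID",
                 paste("exp", 1:9, sep = "_"),
                 paste("base", 1:9, sep = "_"))

# Melt to long format and split "exp_1" into Source = "exp", Measure = "1"
data_long <- pivot_longer(data, cols = -ID,
                          names_to = c("Source", "Measure"),
                          names_sep = "_",
                          values_to = "value")
data_long <- as.data.frame(data_long)
data_long$Source <- factor(data_long$Source)
data_long$ID <- factor(data_long$ID)

histogram(~ value | Source + ID, data = data_long)
```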
Something like this should work:
library(lattice)
data <- matrix(1:18, ncol=18, nrow=3, byrow=T)
for (i in 1:nrow(data))
{
tmp <- cbind(data[i,], rep(1:2, each=9))
print(histogram(~tmp[,1]|tmp[,2]), split=c(1,i,1,nrow(data)), more=T)
}
Note: this will work only for a few rows of data. For larger datasets you may want to think of a slightly different layout (change the split argument passed to print).
