Plotting multi repetitions in R - r

I have the following dataset:
Class R1 R2 R3 R4 R5
Operator 6.5 2 18 3.6 5.1
Assest 1.3 9.5 6 6.3 7.5
Operator 10 5 9 2.2 7.5
Execute 6.3 4 2.5 9 9
Execute 6 5 5 5 1.6
Assest 6 2.5 6.6 7 7.9
Operator 10 5 13 5 7.5
Assest 5 2.5 6.6 9 7.9
I would like to generate a multiplot for each class where each individual plot represents a single run (each multiplot will have three plots based on the example).
I started by doing the following:
data <- read_csv("/home/adam/Desktop/dataa.csv")
dataset <- data %>% melt(id.vars = c("Class"))
p2_data <- dataset %>% filter(Class == "Operator")
pp2 <- p2_data %>% ggplot(aes(x=variable, y=value, group=Class, colour=Class)) +
geom_line() +
scale_x_discrete(breaks = seq(0, 1000, 100)) +
but that only gives me a plot of one class with all the runs, which is not what I want. Can you please help me solve this?

If I understand your question correctly and you would like to have separate plots for each of the three Classes with a line for each row of observations (3 for Assest, 2 for Execute and 3 for Operator), perhaps the below would help?
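library(dplyr)
library(reshape2)  # provides melt()
library(ggplot2)
data %>%
  group_by(Class) %>%
  mutate(run = row_number()) %>%  # number the rows within each Class
  ungroup() %>%
  melt(id.vars = c("Class", "run")) %>%
  mutate(run = as.factor(run)) %>%
  ggplot(aes(variable, value, colour = run, group = run)) +
  geom_point() + geom_line() + facet_wrap(~Class)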
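For reference, here is a self-contained sketch of the same idea that can be pasted straight into a session. It swaps melt for tidyr::pivot_longer (my substitution, not part of the answer above) and reads the example table inline. The names runs, long and p are made up for illustration.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Example data from the question, read inline instead of from a CSV file
runs <- read.table(text = "Class R1 R2 R3 R4 R5
Operator 6.5 2 18 3.6 5.1
Assest 1.3 9.5 6 6.3 7.5
Operator 10 5 9 2.2 7.5
Execute 6.3 4 2.5 9 9
Execute 6 5 5 5 1.6
Assest 6 2.5 6.6 7 7.9
Operator 10 5 13 5 7.5
Assest 5 2.5 6.6 9 7.9", header = TRUE)

long <- runs %>%
  group_by(Class) %>%
  mutate(run = row_number()) %>%   # 1, 2, 3 within each Class
  ungroup() %>%
  mutate(run = factor(run)) %>%
  pivot_longer(R1:R5, names_to = "variable", values_to = "value")

# One panel per Class, one line per row of the original table
p <- ggplot(long, aes(variable, value, colour = run, group = run)) +
  geom_point() +
  geom_line() +
  facet_wrap(~Class)
p
```

If every row should get a panel of its own rather than a line of its own, replacing facet_wrap(~Class) with facet_grid(run ~ Class) is one option.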

Related

Color and shape coding within ggplot

I am working with a chemical dataset, and I want to color-code the geom_points by the depth at which they were sampled and set the shape based on when they were sampled. I also want to add a thin black border to all the geom_points in order to distinguish them.
Here is a sample table:
ID Depth(m) Sampling Date Cl Br
1 1 May 4.0 .05
2 1 June 5.0 .07
3 2 May 6.0 .03
4 2 June 7.0 .05
5 3 May 8.0 .01
6 3 June 9.0 .03
7 4 May 10.0 .00
8 4 June 11.0 .01
I am trying to use the code
graph <- df %>%
ggplot(aes(x = Cl, y = Br, fill = Depth, shape = Sampling Date), color = black) +
geom_point(shape = c(21:24, size = 4) +
labs(x = "Cl", y = "Br")
graph
But every time I do this it just fills in the shapes black, ignoring the color specification. I also need to use the shapes 21:25, but every time I try to specify the number of shapes it says that it doesn't match the number of variables within my dataset.
Your code is somewhat filled with ... challenges.
Remove all spaces from the column names! That makes your life easier. Also add the shape aesthetic to geom_point and specify the shapes with a scale call.
library(ggplot2)
df <- read.table(text = "ID Depth SamplingDate Cl Br
1 1 May 4.0 .05
2 1 June 5.0 .07
3 2 May 6.0 .03
4 2 June 7.0 .05
5 3 May 8.0 .01
6 3 June 9.0 .03
7 4 May 10.0 .00
8 4 June 11.0 .01", header = T)
ggplot(df, aes(x = Cl, y = Br, fill = Depth, shape = SamplingDate)) +
geom_point(aes(shape = SamplingDate), size = 4) +
scale_shape_manual(values = 21:24)
Created on 2020-07-30 by the reprex package (v0.3.0)
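Some background on why the border and fill behave this way: point shapes 21 to 25 are the ones that have both an outline (colour) and an interior (fill), so the border is set with colour and the depth coding with fill. Below is a minimal sketch under a few assumptions of mine: Depth is treated as a discrete variable via factor(), only two of the filled shapes are needed because there are two sampling dates, and the legend override is my addition so that the fill legend keys show the fill.

```r
library(ggplot2)

# Sample table from the question, with spaces removed from the column names
df <- data.frame(
  ID = 1:8,
  Depth = rep(1:4, each = 2),
  SamplingDate = rep(c("May", "June"), times = 4),
  Cl = c(4, 5, 6, 7, 8, 9, 10, 11),
  Br = c(0.05, 0.07, 0.03, 0.05, 0.01, 0.03, 0.00, 0.01)
)

p <- ggplot(df, aes(x = Cl, y = Br, fill = factor(Depth), shape = SamplingDate)) +
  # constant black outline; stroke controls the border thickness
  geom_point(colour = "black", stroke = 0.5, size = 4) +
  # one filled shape per sampling date
  scale_shape_manual(values = c(May = 21, June = 24)) +
  # draw the fill legend keys with a fillable shape
  guides(fill = guide_legend(override.aes = list(shape = 21))) +
  labs(x = "Cl", y = "Br", fill = "Depth (m)", shape = "Sampling date")
p
```

The number of values passed to scale_shape_manual has to cover the number of levels of the variable mapped to shape (two here), not the number of rows.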

Make scatter (or X, Y) plot by treatment for different time period

I have data (an R data frame) like this:
Treatment Diameter(inches).Sep Diameter(inches).Dec
Aux_Drop NA NA
Aux_Spray 3.7 2
DMSO NA NA
Water 4.2 2
Aux_Drop 2.6 3
Aux_Spray 3.7 3
DMSO 4 2
Water 5.2 1
Aux_Drop 5.4 2
Aux_Spray 3.4 2
DMSO 4.8 2
Water 4.2 2
Aux_Drop 4.7 2
Aux_Spray 2.7 2
DMSO 3.4 2
Water 4.9 2
.......
.......
I want to make a scatter (or x, y) plot of diameter for each treatment group. I have found lattice library plot more helpful as of now and I have used:
require(lattice)
xyplot(`Diameter(inches).Sep` ~ Treatment , merged.Sep.Dec.Mar, pch= 20)
to generate the plot:
However, I want to add the scatter plot for "Diameter from Dec" next to the "Diameter of Sep" for each treatment, in a different color. So far I have not been able to find a workable example that I can use for my purpose.
A method using lattice, ggplot2, base plot, or anything else would be really helpful.
Thanks,
Something like this?
library(tidyverse)
df %>%
gather(Month, Diameter, -Treatment) %>%
ggplot(aes(Treatment, Diameter)) +
geom_point(aes(colour = Month), position = position_dodge(width = 0.9))
You can adjust the amount of separation between the different coloured points by changing width inside position_dodge.
Sample data
df <- read.table(text =
"Treatment Diameter(inches).Sep Diameter(inches).Dec
Aux_Drop NA NA
Aux_Spray 3.7 2
DMSO NA NA
Water 4.2 2
Aux_Drop 2.6 3
Aux_Spray 3.7 3
DMSO 4 2
Water 5.2 1
Aux_Drop 5.4 2
Aux_Spray 3.4 2
DMSO 4.8 2
Water 4.2 2
Aux_Drop 4.7 2
Aux_Spray 2.7 2
DMSO 3.4 2
Water 4.9 2", header = T)
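In current versions of tidyr, gather is superseded by pivot_longer. The sketch below shows the same reshape-then-dodge idea with pivot_longer on the first eight rows of the sample data. The short column names Sep and Dec and the object names long and p are my simplification, not the names in the original data frame.

```r
library(tidyr)
library(ggplot2)

# First eight rows of the sample data, with shortened column names
df <- data.frame(
  Treatment = rep(c("Aux_Drop", "Aux_Spray", "DMSO", "Water"), times = 2),
  Sep = c(NA, 3.7, NA, 4.2, 2.6, 3.7, 4.0, 5.2),
  Dec = c(NA, 2, NA, 2, 3, 3, 2, 1)
)

# One row per Treatment/Month measurement
long <- pivot_longer(df, cols = c(Sep, Dec),
                     names_to = "Month", values_to = "Diameter")

# Dodge the two months side by side within each treatment
p <- ggplot(long, aes(Treatment, Diameter, colour = Month)) +
  geom_point(position = position_dodge(width = 0.5), na.rm = TRUE)
p
```

A smaller width moves the Sep and Dec points closer together; a width near 1 spreads them across the full slot of each treatment.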
Here's a tidyverse solution. It uses tidyr::gather to put the two diameter types into one column. You can then facet on the values in that column. I hide the colour legend, since the categories are apparent from the axis labels.
This assumes the data frame is named mydata.
library(tidyverse)
mydata %>%
gather(Result, Value, -Treatment) %>%
ggplot(aes(Result, Value)) +
geom_jitter(aes(color = Result),
width = 0.1) +
facet_wrap(~Treatment) +
guides(color = "none")

Filter data frame based off factor - R

I have the following data frame (called cats, which can be accessed using library(MASS)):
Sex Bwt Hwt
1 F 2.0 7.0
2 F 2.0 7.4
3 F 2.0 9.5
4 F 2.1 7.2
5 F 2.1 7.3
6 F 2.1 7.6
7 F 2.1 8.1
8 F 2.1 8.2
9 F 2.1 8.3
10 F 2.1 8.5
I first create a factor with 3 levels:
x = cut(cats$Bwt, breaks=3)
Now I need to grab all the data that falls in the first level and plot it in a boxplot, then do the same for the other 2 levels.
I have tried:
new_data = subset(cats, cats$Bwt %in% x[1])
also
new_data = cats[which(cats$Bwt == x[1])]
I can't seem to filter this data based on the factor. How is this done?
The simple answer is that the comparison should be made against the variable you created (x), not against cats$Bwt. So:
new_data <- cats[which(x == unique(x)[1]),]
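A slightly more explicit sketch of the same subsetting, assuming only base R and MASS: it stores the bins as a column, picks the first bin with levels() (which follows the bin order, whereas unique() follows the order of appearance in the data), and uses split to get all three subsets at once. The names cats_binned, BwtBin, first_bin and by_bin are made up for illustration.

```r
library(MASS)  # provides the cats data frame

# One factor with three levels (equal-width bins of body weight)
cats_binned <- transform(cats, BwtBin = cut(Bwt, breaks = 3))

# Rows that fall in the first (lowest) bin
first_bin <- cats_binned[cats_binned$BwtBin == levels(cats_binned$BwtBin)[1], ]

# One data frame per bin, in bin order
by_bin <- split(cats_binned, cats_binned$BwtBin)

# One box of heart weight per body-weight bin, without subsetting by hand
boxplot(Hwt ~ BwtBin, data = cats_binned,
        xlab = "Body weight bin (kg)", ylab = "Heart weight (g)")
```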
Another alternative is not to subset at all, but instead to use the faceting functionality of ggplot2, something like this:
library(MASS)     # cats data
library(dplyr)
library(ggplot2)
cats %>%
  mutate(breaks = cut(Bwt, breaks = 3)) %>%
  ggplot() +
  geom_boxplot(aes(x = Sex, y = Hwt)) +
  facet_wrap(~breaks)

overlap of time series in ggplot2 keeping the x labels

How can I overlap two time series with ggplot2 and keep both X labels (one with 1970 and another with 1980)?
This is an overview of my datasets and the code I use to plot each graphic.
> dataset1.data
Date Obs
1 1/1/1970 2.0
2 1/2/1970 1.0
3 1/3/1970 0.0
4 1/4/1970 0.0
5 1/5/1970 0.5
6 1/6/1970 5.1
7 1/7/1970 0.0
8 1/8/1970 0.0
> dataset2.data
Date Obs
1 1/1/1980 3.0
2 1/2/1980 0.5
3 1/3/1980 0.5
4 1/4/1980 5.0
5 1/5/1980 0.4
6 1/6/1980 6.2
7 1/7/1980 9.0
8 1/8/1980 1.3
qplot(main="Observations 1")+xlab("Date")+ylab("Obs")+
geom_point(data = dataset1.data,aes(Date, Obs, colour="blue"),alpha = 0.7,na.rm = TRUE)+
scale_colour_identity("Legend", breaks=c("blue"), labels="1970")
qplot(main="Observations 2")+xlab("Date")+ylab("Obs")+
geom_point(data = dataset2.data,aes(Date, Obs, colour="red"),alpha = 0.7,na.rm = TRUE)+
scale_colour_identity("Legend", breaks=c("red"), labels="1980")
I would put them both in a single dataset, and then use a new Year variable for the color aesthetic:
dataset1.data = read.table('dataset1.txt')
dataset2.data = read.table('dataset2.txt')
dataset1.data$Date = as.Date(dataset1.data$Date, format='%m/%d/%Y')
dataset2.data$Date = as.Date(dataset2.data$Date, format='%m/%d/%Y')
data = rbind(dataset1.data, dataset2.data)
data = transform(data,
  MonthDay = gsub('(.+)-(.+-.+)', '\\2', data$Date),
  Year = gsub('(.+)-(.+-.+)', '\\1', data$Date))
qplot(main="Observations 1") + xlab("Date") + ylab("Obs") +
  geom_point(data = data, aes(MonthDay, Obs, colour=Year), alpha = 0.7, na.rm = TRUE)
It's probably also possible to do it by editing the grid objects. For example, see: https://github.com/hadley/ggplot2/wiki/Editing-raw-grid-objects-from-a-ggplot
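The combine-and-colour-by-year approach can also be sketched without files or regular expressions, using format() on the parsed dates. This is a minimal, self-contained variant under my own assumptions: the data frames are typed in from the question, ggplot() is used in place of qplot(), and the names d1970, d1980, both and p are made up.

```r
library(ggplot2)

# The two datasets from the question, typed in directly
d1970 <- data.frame(
  Date = c("1/1/1970", "1/2/1970", "1/3/1970", "1/4/1970",
           "1/5/1970", "1/6/1970", "1/7/1970", "1/8/1970"),
  Obs = c(2.0, 1.0, 0.0, 0.0, 0.5, 5.1, 0.0, 0.0)
)
d1980 <- data.frame(
  Date = c("1/1/1980", "1/2/1980", "1/3/1980", "1/4/1980",
           "1/5/1980", "1/6/1980", "1/7/1980", "1/8/1980"),
  Obs = c(3.0, 0.5, 0.5, 5.0, 0.4, 6.2, 9.0, 1.3)
)

both <- rbind(d1970, d1980)
both$Date <- as.Date(both$Date, format = "%m/%d/%Y")
both$Year <- format(both$Date, "%Y")         # "1970" or "1980"
both$MonthDay <- format(both$Date, "%m-%d")  # shared x axis, e.g. "01-02"

# Both years overlap on the month-day axis and are told apart by colour
p <- ggplot(both, aes(MonthDay, Obs, colour = Year)) +
  geom_point(alpha = 0.7) +
  labs(x = "Date", y = "Obs")
p
```

Note that this puts both series on one shared month-day axis; the year is carried by the colour legend rather than by two separate sets of x labels.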

drawing a stratified sample in R

Designing my stratified sample
library(survey)
design <- svydesign(id=~1,strata=~Category, data=billa, fpc=~fpc)
So far so good, but how can I now draw a sample in the same way I was able to for simple random sampling?
set.seed(67359)
samplerows <- sort(sample(x=1:N, size=n.pre$n))
If you have a stratified design, then I believe you can sample randomly within each stratum. Here is a short algorithm to do proportional sampling in each stratum, using ddply:
library(plyr)
set.seed(1)
dat <- data.frame(
id = 1:100,
Category = sample(LETTERS[1:3], 100, replace=TRUE, prob=c(0.2, 0.3, 0.5))
)
sampleOne <- function(id, fraction=0.1){
sort(sample(id, round(length(id)*fraction)))
}
ddply(dat, .(Category), summarize, sampleID=sampleOne(id, fraction=0.2))
Category sampleID
1 A 21
2 A 29
3 A 72
4 B 13
5 B 20
6 B 42
7 B 58
8 B 82
9 B 100
10 C 1
11 C 11
12 C 14
13 C 33
14 C 38
15 C 40
16 C 63
17 C 64
18 C 71
19 C 92
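The same proportional sampling can be sketched in base R with split and lapply, for readers who do not have plyr installed. The helper name sample_fraction and the objects picked and stratified are made up for illustration. The helper samples positions with sample.int rather than calling sample() on the ids directly, because sample(x, n) treats a single number x as 1:x, which would matter for a stratum with only one member. The rows drawn will generally differ from the output above, since the calls to the random number generator differ.

```r
set.seed(1)
dat <- data.frame(
  id = 1:100,
  Category = sample(LETTERS[1:3], 100, replace = TRUE, prob = c(0.2, 0.3, 0.5))
)

# Draw a fixed fraction of the ids in one stratum (hypothetical helper)
sample_fraction <- function(ids, fraction = 0.1) {
  n <- round(length(ids) * fraction)
  ids[sort(sample.int(length(ids), n))]
}

# Sample within each Category, then keep the selected rows
picked <- unlist(lapply(split(dat$id, dat$Category), sample_fraction, fraction = 0.2))
stratified <- dat[dat$id %in% picked, ]
table(stratified$Category)
```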
Take a look at the sampling package on CRAN, and the strata function in particular.
This is a good package to know if you're doing surveys; there are several vignettes available from its page on CRAN.
The task view on "Official Statistics" includes several topics that are closely related to these issues of survey design and sampling - browsing through it and the packages recommended may also introduce other tools that you can use in your work.
You can draw a stratified sample using dplyr. First we group by the column or columns we are interested in, then sample within each group. In our example, we take 3 records of each Species.
library(dplyr)
set.seed(1)
iris %>%
  group_by(Species) %>%
  sample_n(3)
Output:
Source: local data frame [9 x 5]
Groups: Species
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1 4.3 3.0 1.1 0.1 setosa
2 5.7 3.8 1.7 0.3 setosa
3 5.2 3.5 1.5 0.2 setosa
4 5.7 3.0 4.2 1.2 versicolor
5 5.2 2.7 3.9 1.4 versicolor
6 5.0 2.3 3.3 1.0 versicolor
7 6.5 3.0 5.2 2.0 virginica
8 6.4 2.8 5.6 2.2 virginica
9 7.4 2.8 6.1 1.9 virginica
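In dplyr 1.0.0 and later, sample_n is superseded by slice_sample, which covers both fixed-size and proportional draws per group. A minimal sketch follows; the object names fixed_n and proportional are made up, and the rows drawn will not match the output printed above.

```r
library(dplyr)

set.seed(1)

# Three records per Species
fixed_n <- iris %>%
  group_by(Species) %>%
  slice_sample(n = 3) %>%
  ungroup()

# 10% of each Species (5 of 50 rows per group)
proportional <- iris %>%
  group_by(Species) %>%
  slice_sample(prop = 0.1) %>%
  ungroup()

count(fixed_n, Species)
count(proportional, Species)
```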
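here's a quick way to sample up to three records per distinct 'carb' value from the mtcars data frame without replacement (two 'carb' values have only one car each, so those strata are taken whole)
# choose how many records to sample per unique 'carb' value
records.per.carb.value <- 3
# sample row numbers within one stratum; index into the vector so that a
# single row number is not expanded to 1:x by sample(), and cap the size
# at the stratum size
sample.rows <-
  function( rows , n ) rows[ sample.int( length( rows ) , min( n , length( rows ) ) ) ]
# draw the sample
your.sample <-
  mtcars[
    unlist(
      tapply(
        seq_len( nrow( mtcars ) ) ,
        mtcars$carb ,
        sample.rows ,
        records.per.carb.value
      )
    ) , ]
# print the results to the screen
your.sample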
note that the survey package is mostly used for analyzing complex sample survey data, not creating it. @Iterator is right that you should check out the sampling package for more advanced ways to create complex sample survey data. :)
