Colouring points in an ordination plot in R using a data frame

Using the following data.frame:
df<-data.frame("sites"=as.character(1:20),"type"=c(rep("small",10),rep("large",10)))
sites type
1 1 small
2 2 small
3 3 small
4 4 small
5 5 small
6 6 small
7 7 small
8 8 small
9 9 small
10 10 small
11 11 large
12 12 large
13 13 large
14 14 large
15 15 large
16 16 large
17 17 large
18 18 large
19 19 large
20 20 large
I would like to colour the text labels (i.e. 1-20) by type (i.e. "small", "large") in the following ordination plot:
library(vegan)
library(stats)
data(dune)
dist <- vegdist(wisconsin(dune))
#Ordinate data
pc<-cmdscale(dist, k=10, eig=TRUE, add=TRUE, x.ret =TRUE)
#Create ordination plot
quartz(title="PCoA on coral data")
fig<-ordiplot(scores(pc)[,c(1,2)], type="t", main="PCoA")

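It looks like the text label color is hard-coded in ordiplot, so you have to set up an empty plot and then use text() to draw the labels by group:
score <- scores(pc)[, 1:2]
fig<-ordiplot(score, type="n", main="PCoA")
color <- c("red", "blue")
# factor() keeps this working when df$type is character (the default since R 4.0.0);
# levels sort alphabetically, so "large" -> 1 (red) and "small" -> 2 (blue)
sz <- as.numeric(factor(df$type[as.numeric(rownames(score))]))
text(score, rownames(score), col=color[sz])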
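The lookup from site label to colour can also be done by name instead of by factor code, which avoids depending on the order of the factor levels. The following is only a minimal sketch in base R: the named vector `palette_by_type` and the example `labels` vector are illustrative assumptions, and `labels` stands in for `rownames(score)` from the ordination.

```r
# Same data frame as in the question
df <- data.frame(sites = as.character(1:20),
                 type = c(rep("small", 10), rep("large", 10)))

# Hypothetical mapping from type to colour, indexed by name
palette_by_type <- c(small = "red", large = "blue")

# Stand-in for rownames(score); any subset or order of site labels works
labels <- c("3", "12", "7")

# match() finds each label's row in df; as.character() guards against
# df$type being a factor (indexing by a factor would use its integer codes)
label_cols <- unname(palette_by_type[as.character(df$type[match(labels, df$sites)])])
```

The resulting vector could then be passed as `col=label_cols` to text().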

Related

R: Plot Density Graph for data in tables with respect to Labels in tables

I have data in table form that looks like this in R:
V1 V2
1 19 -1539
2 7 -1507
3 3 -1446
4 7 -1427
5 8 -1401
6 2 -422
7 22 4178
8 5 4277
9 10 4303
10 18 4431
....200 million more lines to go
I would like to draw a density plot of the values in the second column, grouped by the label in the first column (i.e. each label has one density curve, all on the same graph), but I don't know how. Any suggestions?
If I understood the question correctly, this would end up looking somewhat like a density heatmap, considering there are 200 million observations in total and V1 has a fairly wide range of values.
For that I would try ggplot and stat_binhex:
df <- read.table(text="V1 V2
1 19 -1539
2 7 -1507
3 3 -1446
4 7 -1427
5 8 -1401
6 2 -422
7 22 4178
8 5 4277
9 10 4303
10 18 4431")
library(ggplot2)
ggplot(data=df,aes(V1,V2)) +
stat_binhex() +
scale_fill_gradient(low="red", high="steelblue") +
scale_y_continuous() +
theme_bw()
stat_binhex should work well with large data and has several parameters that help with presentation, such as bins and binwidth (see ?stat_binhex).
OK, I figured it out myself.
ggplot(data, aes(x=V2, color=V1)) + geom_density(aes(group=V1))
That should do it.
However, there are two things I need to make sure of first for it to run:
V1 is a factor
V2 is a numerical value
read.table did not import the data in the form I wanted, so I had to do the following before using ggplot:
data$V1 = as.factor(data$V1)
data$V2 = as.numeric(as.character(data$V2))
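The as.character() step in the second conversion matters when a numeric column has been imported as a factor: calling as.numeric() directly on a factor returns the internal level codes, not the original numbers. A minimal base R sketch, using a small made-up factor `v2_raw` in place of the real column:

```r
# A numeric column that was (hypothetically) imported as a factor
v2_raw <- factor(c("-1539", "-1507", "4178"))

# as.numeric() on a factor returns the integer level codes
codes <- as.numeric(v2_raw)

# Going through as.character() recovers the actual values
values <- as.numeric(as.character(v2_raw))
```

Here `codes` holds small integers between 1 and 3, while `values` holds the original numbers.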

Stacking Scatterplots in ggplot2

I am creating scatterplots in R using ggplot2 to visualize populations over time. My data set looks like the following:
sampling_period cage total_1 total_2 total_3
4 y 34 95 12
4 n 89 12 13
5 n 23 10 2
I have been making individual scatterplots for the variables total_1, total_2, and total_3 with the following code:
qplot(data=BLPcaged, x=sampling_period, y=total_1, color=cage_or_control)
qplot(data=BLPcaged, x=sampling_period, y=total_2, color=cage_or_control)
qplot(data=BLPcaged, x=sampling_period, y=total_3, color=cage_or_control)
I want to create a single figure that contains the information about all three populations over time. The final product should be three scatterplots stacked one on top of another, all with the same axis scales, so that I can compare all three populations in one visualization.
I know that I can use facet to make different plots for the levels of a factor, but can it also be used to create different plots for different variables?
You can use melt() to reshape your data with total as a factor that you can facet on:
BLPcaged = data.frame(sampling_period=c(4,4,5),
cage=c('y','n','n'),
total_1=c(34,89,23),
total_2=c(95,12,10),
total_3=c(12,13,2))
library(reshape2)
BLPcaged.melted = melt(BLPcaged,
id.vars=c('sampling_period','cage'),
variable.name='total')
So now BLPcaged.melted looks like this:
sampling_period cage total value
1 4 y total_1 34
2 4 n total_1 89
3 5 n total_1 23
4 4 y total_2 95
5 4 n total_2 12
6 5 n total_2 10
7 4 y total_3 12
8 4 n total_3 13
9 5 n total_3 2
You can then facet this by total:
library(ggplot2)
ggplot(BLPcaged.melted, aes(sampling_period, value, color=cage)) +
geom_point() +
facet_grid(total~.)
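For readers who want to see what the reshaping step does without reshape2, the same long layout can be sketched in base R with stack(). This is an illustrative alternative, not the method used in the answer; the names `value_cols`, `stacked`, `ids` and `long` are assumptions made for the example.

```r
# Same toy data as in the answer
BLPcaged <- data.frame(sampling_period = c(4, 4, 5),
                       cage = c("y", "n", "n"),
                       total_1 = c(34, 89, 23),
                       total_2 = c(95, 12, 10),
                       total_3 = c(12, 13, 2))

value_cols <- c("total_1", "total_2", "total_3")

# stack() puts the measure columns into one 'values' column plus an 'ind' factor
stacked <- stack(BLPcaged[value_cols])

# Repeat the id columns once per measure column so the rows line up
ids <- BLPcaged[rep(seq_len(nrow(BLPcaged)), times = length(value_cols)),
                c("sampling_period", "cage")]

# Assemble the long table with the same column names melt() produced
long <- data.frame(ids, total = stacked$ind, value = stacked$values,
                   row.names = NULL)
```

The `long` data frame has one row per observation and variable, so it can be passed to the same ggplot() call with facet_grid(total~.).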

Frequency distribution with custom format data

I need help with an R plot, using a data format I have not worked with before. Please help if you know how.
NUMBER FREQUENCY
10 1
11 1
12 3
10 45
11 2
12 3
I need a bar plot with the numbers on the X axis (continuous, not binned as in a histogram) and frequency on the Y axis, with the frequencies for repeated numbers combined,
like
10 46
11 3
12 6
It seems simple enough, but I have 10,000 rows and large numbers in the real data, so I am looking for a good solution in R that avoids doing it manually.
What about reading in your data:
dd = read.table(textConnection("NUMBER FREQUENCY
10 1
11 1
12 3
10 45
11 2
12 3"), header=TRUE)
and then summing the frequencies for each number before plotting:
## tapply splits dd$FREQUENCY by dd$NUMBER and sums each group
barplot(tapply(dd$FREQUENCY, dd$NUMBER, sum))
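To check the aggregation separately from the plotting, the combined totals can be computed on their own. A small sketch using the sample rows from the question; aggregate() is shown as an alternative that returns a data frame, and the names `totals` and `agg` are chosen here for illustration.

```r
# Sample rows from the question
dd <- data.frame(NUMBER = c(10, 11, 12, 10, 11, 12),
                 FREQUENCY = c(1, 1, 3, 45, 2, 3))

# Named array of summed frequencies, one entry per distinct NUMBER
totals <- tapply(dd$FREQUENCY, dd$NUMBER, sum)

# The same result as a data frame with NUMBER and FREQUENCY columns
agg <- aggregate(FREQUENCY ~ NUMBER, data = dd, FUN = sum)
```

Either object can be plotted, for example with barplot(totals) or barplot(agg$FREQUENCY, names.arg = agg$NUMBER).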

How to create a stacked bar chart from summarized data in ggplot2

I'm trying to create a stacked bar graph using ggplot2. My data, in its wide form, looks like this. The numbers in each cell are the frequency of responses.
activity yes no dontknow
Social events 27 3 3
Academic skills workshops 23 5 8
Summer research 22 7 7
Research fellowship 20 6 9
Travel grants 18 8 7
Resume preparation 17 4 12
RAs 14 11 8
Faculty preparation 13 8 11
Job interview skills 11 9 12
Preparation of manuscripts 10 8 14
Courses in other campuses 5 11 15
Teaching fellowships 4 14 16
TAs 3 15 15
Access to labs in other campuses 3 11 18
Interdisciplinary research 2 11 18
Interdepartamental projects 1 12 19
I melted this table using reshape2:
melted.data <- melt(wide.data,id.vars=c("activity"),measure.vars=c("yes","no","dontknow"),variable.name="haveused",value.name="responses")
That's as far as I can get. I want to create a stacked bar chart with activities on the x axis, frequency of responses on the y axis, and each bar showing the distribution of the yes, no and dontknow responses.
I've tried
ggplot(melted.data,aes(x=activity,y=responses))+geom_bar(aes(fill=haveused))
but I'm afraid that's not the right solution
Any help is much appreciated.
You haven't said what is wrong with your solution. Some things that could be construed as problems, with one possible solution for each, are:
The x axis tick mark labels run into each other. SOLUTION - rotate the tick mark labels;
The order in which the labels (and their corresponding bars) appear is not the same as the order in the original data frame. SOLUTION - reorder the levels of the factor 'activity';
To position text labels inside the bar segments, set the vjust parameter in position_stack to 0.5.
The following might be a start.
# Load required packages
library(ggplot2)
library(reshape2)
# Read in data
df = read.table(text = "
activity yes no dontknow
Social.events 27 3 3
Academic.skills.workshops 23 5 8
Summer.research 22 7 7
Research.fellowship 20 6 9
Travel.grants 18 8 7
Resume.preparation 17 4 12
RAs 14 11 8
Faculty.preparation 13 8 11
Job.interview.skills 11 9 12
Preparation.of.manuscripts 10 8 14
Courses.in.other.campuses 5 11 15
Teaching.fellowships 4 14 16
TAs 3 15 15
Access.to.labs.in.other.campuses 3 11 18
Interdisciplinary.research 2 11 18
Interdepartamental.projects 1 12 19", header = TRUE, sep = "")
# Melt the data frame
dfm = melt(df, id.vars=c("activity"), measure.vars=c("yes","no","dontknow"),
variable.name="haveused", value.name="responses")
# Reorder the levels of activity
dfm$activity = factor(dfm$activity, levels = df$activity)
# Draw the plot
ggplot(dfm, aes(x = activity, y = responses, group = haveused)) +
geom_col(aes(fill=haveused)) +
theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.25)) +
geom_text(aes(label = responses), position = position_stack(vjust = .5), size = 3) # labels inside the bar segments
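The level reordering step is what keeps the bars in the data frame's original order: by default factor() sorts its levels, so the axis would otherwise not follow the order of the rows. A minimal base R sketch with a shortened, made-up `activity` vector; using unique() to set the level order is an assumption that plays the role of `df$activity` in the answer.

```r
# Shortened activity column, repeated as it would be after melting
activity <- c("Social.events", "RAs", "TAs", "Social.events", "RAs", "TAs")

# Default: levels are sorted, so the original row order is lost
default_factor <- factor(activity)

# Explicit levels in order of first appearance keep the original order
ordered_factor <- factor(activity, levels = unique(activity))
```

Here levels(ordered_factor) follows the order in which the activities first appear, which is the order ggplot2 uses for the x axis.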

How to subset data for additional geoms while using facets in ggplot2?

I want additional 'geoms' to apply only to a subset of the initial data, and I would like this subset to be taken from within each of the units created by facets=~.
My attempts at subsetting either the data or the plotted variables lead to subsetting of the whole data set rather than of the units created by 'facets=~', and in two different ways (apparently dependent on the sorting of the data).
This difficulty appears with any 'geom' when using 'facets'.
library(ggplot2)
test.data<-data.frame(factor=rep(c("small", "big"), each=9),
x=c(c(1,2,3,3,3,2,1,1,1), 2*c(1,2,3,3,3,2,1,1,1)),
y=c(c(1,1,1,2,3,3,3,2,1), 2*c(1,1,1,2,3,3,3,2,1)))
factor x y
1 small 1 1
2 small 2 1
3 small 3 1
4 small 3 2
5 small 3 3
6 small 2 3
7 small 1 3
8 small 1 2
9 small 1 1
10 big 2 2
11 big 4 2
12 big 6 2
13 big 6 4
14 big 6 6
15 big 4 6
16 big 2 6
17 big 2 4
18 big 2 2
qplot(data=test.data,
x=x,
y=y,
geom="polygon",
facets=~factor)+
geom_polygon(data=test.data[c(2,3,4,5,6,2),],
aes(x=x,
y=y),
fill=I("red"))
qplot(data=test.data,
x=x,
y=y,
geom="polygon",
facets=~factor)+
geom_polygon(aes(x=x[c(2,3,4,5,6,2)],
y=y[c(2,3,4,5,6,2)]),
fill=I("red"))
The answer is to subset the data within each facet group as a first step, and then pass that subset to the additional geom.
library(ggplot2)
library(plyr)
test.data<-data.frame(factor=rep(c("small", "big"), each=9),
x=c(c(1,2,3,3,3,2,1,1,1), 2*c(1,2,3,3,3,2,1,1,1)),
y=c(c(1,1,1,2,3,3,3,2,1), 2*c(1,1,1,2,3,3,3,2,1)))
subset.test<-ddply(.data=test.data,
.variables="factor",
function(data){
data[c(2,3,4,5,6,2),]})
qplot(data=test.data,
x=x,
y=y,
geom="polygon",
facets=~factor)+
geom_polygon(data=subset.test,
fill=I("red"))
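The per-group subsetting can also be sketched without plyr, using split() and lapply() from base R. This is an illustrative alternative under the same toy data, not the code from the answer; `keep` and `subset.base` are names chosen for the example.

```r
test.data <- data.frame(factor = rep(c("small", "big"), each = 9),
                        x = c(c(1, 2, 3, 3, 3, 2, 1, 1, 1), 2 * c(1, 2, 3, 3, 3, 2, 1, 1, 1)),
                        y = c(c(1, 1, 1, 2, 3, 3, 3, 2, 1), 2 * c(1, 1, 1, 2, 3, 3, 3, 2, 1)))

# Row positions to keep *within* each group (the last repeats the first to close the polygon)
keep <- c(2, 3, 4, 5, 6, 2)

# split() makes one data frame per level of 'factor'; the same rows are taken
# from each piece, then the pieces are bound back together
subset.base <- do.call(rbind,
                       lapply(split(test.data, test.data$factor),
                              function(d) d[keep, ]))
```

Because the row indices are applied inside each group, both the "small" and the "big" facet get their own six-point polygon.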
