R: plot() uses lines in a scatterplot after as.data.frame() - r

I want to create a simple scatterplot using a table with 2 variables.
The Table looks like this:
> freqs
Var1 Freq
1 1 200
2 2 50
3 3 20
I got it using freqs <- as.data.frame(table(data$V2)) to compute the frequency of numbers in another table.
What I do right now is:
plot(freqs, log="xy", main="Frequency of frequencies",
xlab="frequencies", ylab="frequency of frequencies")
The problem is that I get a plot with lines, not dots, and I don't know why.
For another list, plot() behaved differently and used dots.
It looks like this:
I know that plot depends on the datatype that it gets.
So is the problem in the way I generate freqs?
Edit:
Here is the data as requested: link
Steps were:
data <- read.csv(file="out-kant.txt",head=FALSE,sep="\t")
freqs <- as.data.frame(table(data$V2))
plot(freqs,log="xy",main="Frequency of frequencies", xlab="frequencies", ylab="frequency of frequencies")

It seems that one of your variables is not stored as a number. plot() draws a scatterplot for a two-column data frame when both columns are numeric. For example, this code gives a scatterplot, because read.table() reads both variables as integers:
freqs <- read.table(header=TRUE, text='Var1 freq
1 200
2 50
3 20')
plot(freqs, log="xy", main="Frequency of frequencies", xlab="frequencies", ylab="frequency of frequencies")
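Check what class your variables have with (class() rather than typeof(), because typeof() also reports "integer" for a factor):
class(freqs$freq)
class(freqs$Var1)
Then, if one of them is not numeric, convert it. Go through as.character() so that a factor yields its labels rather than its internal level codes:
freqs$freq <- as.integer(as.character(freqs$freq))
freqs$Var1 <- as.integer(as.character(freqs$Var1))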
EDIT: So I managed to reproduce your issue when I ran:
freqs$Var1 <- as.factor(freqs$Var1)
plot(freqs, log="xy", main="Frequency of frequencies", xlab="frequencies", ylab="frequency of frequencies")
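Perhaps your Var1 variable is stored as a factor, which is what as.data.frame(table(...)) produces. Try running the following; the as.character() step matters, because as.numeric() on a factor alone returns the level codes rather than the original values:
freqs$Var1 <- as.numeric(as.character(freqs$Var1))
EDIT2: Making freqs$Var1 numeric on the data provided in the main question's edit fixed the issue.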
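The cause can be reproduced in a small self-contained sketch. The data frame d below is a made-up stand-in for the question's data (its values are assumptions, chosen so that the counted values are not consecutive). It shows that as.data.frame(table(...)) stores the counted values as a factor, and that the labels and the level codes can differ:

```r
# Made-up stand-in for data$V2 in the question (values are assumptions)
d <- data.frame(V2 = c(rep(1, 200), rep(2, 50), rep(5, 20)))

freqs <- as.data.frame(table(d$V2))
class(freqs$Var1)   # a factor, which is why plot() does not draw points

codes  <- as.numeric(freqs$Var1)                # internal level codes
values <- as.numeric(as.character(freqs$Var1))  # the original values

# With a numeric x column, plot() on the data frame draws a scatterplot
freqs$Var1 <- values
plot(freqs, log = "xy", main = "Frequency of frequencies",
     xlab = "frequencies", ylab = "frequency of frequencies")
```

Here codes is 1, 2, 3 while values is 1, 2, 5, so skipping as.character() would silently put the last point at the wrong x position.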

Related

Combining dotplot R

I'm trying to combine two plots into the same plot in R.
My code looks like this:
#----------------------------------------------------------------------------------------#
# RING data: Mikkel
#----------------------------------------------------------------------------------------#
# Set working directory
setwd("/Users/mikkelastrup/Dropbox/Master/RING R")
#### Read data & Converting factors ####
dat <- read.table("R SUM kopi.txt", header=TRUE)
str(dat)
dat$Vial <- as.factor(dat$Vial)
dat$Line <- as.factor(dat$Line)
dat$rep <- as.factor(dat$rep)
dat$fly <- as.factor(dat$fly)
str(dat)
mtdata <- droplevels(dat[dat$Line=="20",])
mt1data <- droplevels(mtdata[mtdata$rep=="1",])
tdata <- melt(mt1data, id=c("rep","Conc","Sex","Line","Vial", "fly"))
tdata$variable <- as.factor(tdata$variable)
tfdata <- droplevels(tdata[tdata$Sex=="f",])
tmdata <- droplevels(tdata[tdata$Sex=="m",])
####Plotting####
d1 <- dotplot(tfdata$value~tdata$variable|tdata$Conc,
main="Y Position over time Line 20 Female",
xlab="Time", ylab="mm above buttom")
d2 <- dotplot(tmdata$value~tdata$variable|tdata$Conc,
main="Y Position over time Line 20 Male",
xlab="Time", ylab="mm above buttom")
grid.arrange(d1,d2,ncol=2)
And that looks like this:
I'm trying to combine them into one plot, with two different colors for male and female. I have tried to write both into one dotplot call, separated by a comma or wrapped in (), but that doesn't work, and when I don't split the data and use tdata instead of tfdata and tmdata, I get all the dots in the same color. I'm open to suggestions, including another package or another way of plotting the data that still looks somewhat like this, since I'm new to R.
All you need to do is use the groups parameter.
dotplot(value~variable|Conc, groups=Sex, data=tdata,
main="Y Position over time Line 20 All",
xlab="Time", ylab="mm above buttom")
Also, don't use the $ notation in these functions; notice that you're using value from tfdata but variable and Conc from tdata. This is a problem because there are twice as many rows in tdata! Instead, use the data argument to specify which data frame to get the variables from.
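A minimal sketch of the grouped call, on made-up data: the data frame toy and its values are assumptions standing in for the melted tdata, with only the column names taken from the question. auto.key adds a legend for the two groups.

```r
library(lattice)

set.seed(1)
# Hypothetical stand-in for the melted data (tdata) in the question
toy <- expand.grid(variable = c("t1", "t2", "t3"),
                   Conc     = c("0", "10"),
                   Sex      = c("f", "m"),
                   fly      = 1:3)
toy$value <- runif(nrow(toy), 0, 50)

# One panel per concentration; the sexes are distinguished by colour
p <- dotplot(value ~ variable | Conc, groups = Sex, data = toy,
             auto.key = list(columns = 2),
             xlab = "Time", ylab = "mm above bottom")
print(p)
```

Because every variable is looked up in the same data frame through data =, the row counts cannot get out of step the way they do with mixed $ references.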

Generating histogram that can calculate percent recovered

I have the following dataset called df:
Amp Injected Recovered Percent less_0.1_True
0.13175 25.22161274 0.96055540 3.81 0
0.26838 21.05919344 21.06294791 100.02 1
0.07602 16.88526724 16.91541763 100.18 1
0.04608 27.50209048 27.55404507 100.19 0
0.01729 8.31489333 8.31326976 99.98 1
0.31867 4.14961918 4.14876247 99.98 0
0.28756 14.65843377 14.65248551 99.96 1
0.26177 10.64754579 10.76435667 101.10 1
0.23214 6.28826689 6.28564299 99.96 1
0.20300 17.01774090 1.05925850 6.22 0
...
Here, the less_0.1_True column flags whether the Recovered period was close enough to the Injected period to be considered a successful recovery. If the flag is 1, it is a successful recovery. Based on this, I need to generate a plot (Henderson & Stassun, the Astrophysical Journal, 747:51, 2012) like the following:
I am not sure how to create a histogram like this. The closest I have been able to get is a bar plot with the following code:
breaks <- seq(0,30,by=1)
df <- split(dat, cut(dat$Injected,breaks)) # I make bins with width = 1 day
x <- seq(1,30,by=1)
len <- numeric() #Here I store the total number of objects in each bin
sum <- numeric() #Here I store the total number of 1s in each bin
for (i in 1:30){
n <- nrow(df[[i]])
len <- c(len,n)
s <- sum(df[[i]]$less_0.1_True == 1, na.rm = TRUE)
sum <- c(sum,s)
}
percent = sum/len*100 #Here I calculate what the percentage is for each bin
barplot(percent, names = x, xlab = "Period [d]" , ylab = "Percent Recovered", ylim=c(0,100))
And it generates the following bar plot:
Obviously, this plot does not look like the first one, and there are issues: for example, it does not show the range from 0 to 1 like the first graph (which I understand is because it is a bar graph and not a histogram).
Could anyone please guide me as to how I may reproduce the first figure based on my dataset?
If I run your code I get errors. You need to use border = NA to get rid of the bar borders:
set.seed(42)
hist(rnorm(1000,4), xlim=c(0,10), col = 'skyblue', border = NA, main = "Histogram", xlab = NULL)
Another example using ggplot2:
ggplot(iris, aes(x=Sepal.Length))+
geom_histogram()
I finally found a solution to the problem in StackOverflow. I guess the solved question was worded differently than mine and so I could not find it when I was looking for it initially. The solution is here: How to plot a histogram with a custom distribution?
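For reference, one possible base R sketch of the per-bin percentage, not necessarily the approach in the linked answer. The data are synthetic (the 500 rows and the 70% recovery rate are assumptions); only the column names come from the question. Drawing the bars with width 1 and no gaps makes the x axis run over the period itself, starting at 0:

```r
set.seed(42)
# Synthetic stand-in for the question's data; column names follow the question
dat <- data.frame(Injected = runif(500, 0, 30))
dat$less_0.1_True <- rbinom(500, 1, 0.7)

breaks <- seq(0, 30, by = 1)                    # 1-day bins
bins   <- cut(dat$Injected, breaks, include.lowest = TRUE)

# The mean of a 0/1 flag is the fraction of successes in the bin
percent <- 100 * tapply(dat$less_0.1_True, bins, mean)
percent[is.na(percent)] <- 0                    # empty bins, if any

# width = 1 and space = 0 put bar i on the interval [i - 1, i]
barplot(percent, width = 1, space = 0, axisnames = FALSE,
        border = NA, col = "skyblue", ylim = c(0, 100),
        xlab = "Period [d]", ylab = "Percent Recovered")
axis(1, at = seq(0, 30, by = 5))
```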

want to use another df for errorbars in R with barplot

I have these two df.
x;
experiment expression
1 HC 50
2 LC 4
3 HR 10
4 LR 2
y;
HC_conf_lo HC_conf_hi LC_conf_lo LC_conf_hi HR_conf_lo HR_conf_hi LR_conf_lo LR_conf_hi
1 63.3293 109.925 2.33971 5.26642 8.8504 16.7707 0.124013 0.434046
I want to use df y to plot the low and high confidence points. The output should be a barplot with error bars. Can someone show me how to do this using lines in the base package?
I don't know whether your data is valid, so I am assuming the confidence intervals are correct.
Here's what you can do to get error bars on your plot:
#First reading in your data
x<-read.table("x.txt", header=T)
y<-read.table("y.txt", header =T)
#reshaping y to merge it with x
y.wide <-data.frame(matrix(t(y),ncol=2,byrow=T)) #Transpose Y,
#matrix with 2 cols, byrow,
#so we get the lo and hi values in one row
names(y.wide)<-c("lo","hi") #name the columns in y.wide
#Make a data.frame of x and y.wide
xy.df <-data.frame(x,y.wide) # this will be used for plotting the error bars
#make a matrix for using with barplot (barplot takes only matrix or table)
xy<-as.matrix(cbind(expression=x$expression,y.wide))
rownames(xy)<-x$experiment #rownames, so barplot can label the bars
#Get ylimts for barplot
ylimits <-range(xy.df$expression, xy.df$lo, xy.df$hi) # xy is a matrix, so $ only works on xy.df
barx <-barplot(xy[,1],ylim=c(0,ylimits[2])) #get the x co-ords of the bars
barplot(xy[,1],ylim=c(0,ylimits[2]),main = "barplot of Expression with ? bars")
# ? as don't know if it's C.I, or what
with(xy.df, arrows(barx,expression,barx,lo,angle=90, code=1,length=0.1))
with(xy.df, arrows(barx,expression,barx,hi,angle=90, code=1,length=0.1))
Resultant Plot
But it doesn't look right. This is because your expression values don't fall between the lo and hi values.
With the hack below,
barplot(xy[,1],ylim=c(0,ylimits[2]),main = "barplot of Expression with ? bars")
with(xy.df, arrows(barx,lo,barx,hi,angle=90, code=2,length=0.1))
with(xy.df, arrows(barx,hi,barx,lo,angle=90, code=2,length=0.1))
The resultant plot
Look at both arrows() calls carefully and you will see how I achieved it.
I would recommend double-checking your calculations, though.
This is also far easier with ggplot2. See this page for examples and code:
http://docs.ggplot2.org/0.9.3.1/geom_errorbar.html
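The base-graphics approach can also be condensed into a self-contained sketch that builds the question's two data frames inline instead of reading x.txt and y.txt. The matrix name ci is an assumption for illustration. A single arrows() call with code = 3 draws a flat cap at both ends of each interval:

```r
# The question's data, entered inline
x <- data.frame(experiment = c("HC", "LC", "HR", "LR"),
                expression = c(50, 4, 10, 2))
y <- data.frame(HC_conf_lo = 63.3293,  HC_conf_hi = 109.925,
                LC_conf_lo = 2.33971,  LC_conf_hi = 5.26642,
                HR_conf_lo = 8.8504,   HR_conf_hi = 16.7707,
                LR_conf_lo = 0.124013, LR_conf_hi = 0.434046)

# y holds lo/hi pairs in experiment order, so fill a 2-column matrix by row
ci <- matrix(unlist(y), ncol = 2, byrow = TRUE,
             dimnames = list(as.character(x$experiment), c("lo", "hi")))

# barplot() returns the x positions of the bar midpoints
barx <- barplot(x$expression, names.arg = as.character(x$experiment),
                ylim = c(0, max(ci, x$expression)))

# angle = 90 with code = 3 gives a cap at lo and at hi
arrows(barx, ci[, "lo"], barx, ci[, "hi"], angle = 90, code = 3, length = 0.1)
```

As noted above, some intervals do not contain the bar height (HC has expression 50 but an interval of roughly 63 to 110), so those bars float above or below their error bars.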

Plotting Two Factors on the same Graph

Say I have two factors and I want to graph them on the same plot, both factors have the same levels.
s1 <- c(rep("male",20), rep("female", 30))
s2 <- c(rep("male",10), rep("female", 40))
s1 <- factor(s1, levels=c("male", "female"))
s2 <- factor(s2, levels=c("male", "female"))
I would have thought that using the table function would have produced the correct result for graphing, but this is what it returns:
table(s1, s2)
s2
s1 male female
male 10 10
female 0 30
So really two questions: what is the table function doing to get this result, and what other function can I use to create a graph with two series from factors that have the same levels?
Also, in case it matters, I'm using barplot2 in the gplots package to graph the factors.
You can achieve slightly more detailed results with lattice package:
s1 <- factor(c(rep("male",20), rep("female", 30)))
s2 <- factor(c(rep("male",10), rep("female", 40)))
D <- data.frame(s1, s2)
library(lattice)
histogram(~s1+s2, D, col = c("pink", "lightblue"))
Or if you want males/females side by side for easier comparison:
t1 <- table(s1)
t2 <- table(s2)
barchart(cbind(t1, t2), stack = F, horizontal = F)
From ?table:
‘table’ uses the cross-classifying factors to build a contingency
table of the counts at each combination of factor levels.
When you do table(s1,s2) what happens is that the function considers s1 and s2 as paired results. Effectively it tells you that if you were to take cbind(s1,s2) then there would be 10 rows of male-male, 10 of male-female and so on.
To understand this consider a very trivial example:
a <- c("M","M","F","F")
b <- c("F","F","M","M")
table(a,b)
b
a F M
F 0 2
M 2 0
What you should do is:
t1 <- table(s1)
t2 <- table(s2)
barplot(cbind(t1,t2), beside=TRUE, col=c("lightblue", "salmon"))
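A short sketch tying the two views together, using the factors from the question; the names both and counts are assumptions for illustration. The row and column sums of the contingency table recover the per-factor counts, which are what the side-by-side barplot needs:

```r
s1 <- factor(c(rep("male", 20), rep("female", 30)), levels = c("male", "female"))
s2 <- factor(c(rep("male", 10), rep("female", 40)), levels = c("male", "female"))

both   <- table(s1, s2)                          # paired cross-tabulation
counts <- cbind(s1 = table(s1), s2 = table(s2))  # one column per factor

rowSums(both)   # same as table(s1)
colSums(both)   # same as table(s2)

# Two groups of bars (s1, s2), one bar per level, with a legend
barplot(counts, beside = TRUE, col = c("lightblue", "salmon"),
        legend.text = rownames(counts))
```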
Two options producing slightly different forms of plots are
plot(s1, s2)
and
plot(table(s1,s2))
The former is a spineplot, a special case of the mosaic plot, which is what the plot method for table produces (the second example). See ?spineplot and ?mosaicplot for more details; you can use these functions directly rather than the generic plot() if you wish.
Also take a look at the mosaic() function in the vcd package on CRAN by Meyer et al (Link to vcd on CRAN)
table() is producing the contingency table for the two factors.
Hmm.. I don't think creating a contingency table is what Cameron was looking for. If I understood him correctly, I think he wanted to create a data frame with two variables in it, where s1 and s2 seems to be vectors of the same size. (length(s1)==length(s2)).
In this case, he would simply need to create a "table" (I think he meant data.frame) using:
df = data.frame(s1=s1, s2=s2);
And then plot the 2 series in the same plot.
So as for the second question of plotting these things, I'd use matplot. For example:
matplot(1:10, data.frame(a=rnorm(10), b=rnorm(10)), type="l", lty=1, lwd=1, col=c("blue","red"))
Given that he has his data of 2 vectors organized in a single data.frame named "df", he can just do something like:
matplot(df, type="l", lty=1, lwd=1, col=c("blue","red"))
Hope this helps.

Plotting Simple Data in R

I have a comma separated file named foo.csv containing the following data:
scale, serial, spawn, for, worker
5, 0.000178, 0.000288, 0.000292, 0.000300
10, 0.156986, 0.297926, 0.064509, 0.066297
12, 2.658998, 6.059502, 0.912733, 0.923606
15, 188.023411, 719.463264, 164.111459, 161.687982
I essentially have two questions:
1) How do I plot the first column (x-axis) versus the second column (y-axis)? I'm trying this (from reading this site):
data <- read.table("foo.csv", header=T,sep=",")
attach(data)
scale <- data[1]
serial <- data[2]
plot(scale,serial)
But I get this error back:
Error in stripchart.default(x1, ...) : invalid plotting method
Any idea what I'm doing wrong? A quick Google search reveals someone else with the same problem but no relevant answer. UPDATE: It turns out it works fine if I skip the two assignment statements in the middle. Any idea why this is?
The second question follows pretty easily after the first:
2) How do I plot the first column (x-axis) versus all the other columns on the y-axis? I presume it's pretty easy once I get around the first problem I'm running into, but am just a bit new to R so I'm still wrapping my head around it.
You don't need the two lines:
scale <- data[1]
serial <- data[2]
as scale and serial are already set from the headers in the read.table.
Also, scale <- data[1] creates a one-column data.frame
data[1]
1 5
2 10
3 12
4 15
whereas scale from the read.table is a vector
5 10 12 15
and the plot(scale, serial) function expects vector rather than a data.frame, so you just need to do
plot(scale, serial)
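The difference between the two kinds of indexing can be checked directly. This sketch reads the question's data from an inline string rather than from foo.csv; the name foo is an assumption, chosen to avoid masking the base functions data() and scale():

```r
foo <- read.csv(text = "scale, serial, spawn, for, worker
5, 0.000178, 0.000288, 0.000292, 0.000300
10, 0.156986, 0.297926, 0.064509, 0.066297
12, 2.658998, 6.059502, 0.912733, 0.923606
15, 188.023411, 719.463264, 164.111459, 161.687982",
                strip.white = TRUE)

names(foo)       # 'for' is a reserved word, so the column is renamed 'for.'

class(foo[1])    # single brackets keep the data.frame wrapper
class(foo[[1]])  # double brackets (or foo$scale) extract the vector

# Vectors are what plot(x, y) expects
plot(foo[[1]], foo[[2]], xlab = "scale", ylab = "serial")
```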
One approach to plotting the other columns of data on the y-axis:
plot(scale,serial, ylab="")
par(new=TRUE)
plot(scale,spawn,axes=F, ylab="", type="b")
par(new=TRUE)
plot(scale,for., axes=F, ylab="", type="b")
par(new=TRUE)
plot(scale,worker,axes=F, ylab="", type="b")
There are probably better ways of doing this, but that is beyond my current R knowledge....
In your example,
plot(scale, serial)
won't work because scale and serial are both data frames, e.g.
class(scale)
[1] "data.frame"
You could try the following and use points(), once the plot has been generated, to plot the remaining columns. Note, I used the ylim parameter in plot to accommodate the range in the third column.
data <- read.csv('foo.csv', header=T)
plot(data$scale, data$serial, ylim=c(0,750))
points(data$scale, data$spawn, col='red')
points(data$scale, data$for., col='green')
points(data$scale, data$worker, col='blue')
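As an alternative sketch, matplot() draws all columns in one call on a shared axis. The logarithmic y axis is my assumption here, chosen because the values span about six orders of magnitude; it is not something the question asks for. The data are read from an inline string, and the names foo and y are illustrative:

```r
foo <- read.csv(text = "scale, serial, spawn, for, worker
5, 0.000178, 0.000288, 0.000292, 0.000300
10, 0.156986, 0.297926, 0.064509, 0.066297
12, 2.658998, 6.059502, 0.912733, 0.923606
15, 188.023411, 719.463264, 164.111459, 161.687982",
                strip.white = TRUE)

y <- as.matrix(foo[-1])   # every column except scale

# One line per column; log = "y" keeps the small values visible
matplot(foo$scale, y, type = "b", log = "y",
        col = 1:4, lty = 1:4, pch = 1:4,
        xlab = "scale", ylab = "value")
legend("topleft", legend = colnames(y), col = 1:4, lty = 1:4, pch = 1:4)
```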
I'm new in R, but if you want to draw scale vs. all other columns in one plot, easy and with some elegance :) for printing or presentation, you may use Prof. Hadley Wickham's packages ggplot2 & reshape.
Installation:
install.packages("ggplot2",dep=T)
install.packages("reshape",dep=T)
Drawing your example:
library(ggplot2)
library(reshape)
#read data
data = read.table("foo.csv", header=T,sep=",")
#melt data “scale vs. all”
data2=melt(data,id=c("scale"))
data2
scale variable value
1 5 serial 0.000178
2 10 serial 0.156986
3 12 serial 2.658998
4 15 serial 188.023411
5 5 spawn 0.000288
6 10 spawn 0.297926
7 12 spawn 6.059502
8 15 spawn 719.463264
9 5 for. 0.000292
10 10 for. 0.064509
11 12 for. 0.912733
12 15 for. 164.111459
13 5 worker 0.000300
14 10 worker 0.066297
15 12 worker 0.923606
16 15 worker 161.687982
#draw all variables at once as line with different linetypes
qplot(scale,value,data=data2,geom="line",linetype=variable)
You could also use points (geom="point"), choose different colours or shapes for the different variables (colour=variable or shape=variable), adjust the axes, set individual options for every line, etc.
Link to online documentation.
I am far from being an R expert, but I think you need a data.frame:
plot(data.frame(data[1],data[2]))
It does at least plot something on my R setup!
Following advice in luapyad's answer, I came up with this. I renamed the header "scale":
scaling, serial, spawn, for, worker
5, 0.000178, 0.000288, 0.000292, 0.000300
10, 0.156986, 0.297926, 0.064509, 0.066297
12, 2.658998, 6.059502, 0.912733, 0.923606
15, 188.023411, 719.463264, 164.111459, 161.687982
then:
foo <- read.table("foo.csv", header=T,sep=",")
attach(foo)
plot( scaling, serial );
Try this:
data <- read.csv('foo.csv')
plot(serial ~ scale, data)
dev.new()
plot(spawn ~ scale, data)
dev.new()
plot(for. ~ scale, data)
dev.new()
plot(worker ~ scale, data)
There is a simple-r way of plotting it:
https://code.google.com/p/simple-r/
Using that script, you just have to type:
r -cdps, -k1:2 foo.csv
To get the plot you want. Put it in the verbose mode (-v) to see the corresponding R script.
data <- read.table(...)
plot(data$scale,data$serial)
