Barplots Not Plotting Frequency Correctly in R - r

Struggling to plot a barplot where the height of each bar is related to the value in a column (in this case, freq). Table name is Tag_Count_2
Tag.1 freq
Hello 3
My 4
Name 10
I tried:
Counts<-table(Tag_Count_2$freq)
barplot(Counts)
But it plots the # of times values under Tag.1 share the same freq. I also tried:
barplot(Tag_Count_2, height=freq)
But it said freq wasn't found.

Are you looking for
barplot(Tag_Count_2$freq)
You don't need the table command in this case as you already have the frequency .

Related

R Question: How can I create a histogram with 2 variables against eachother?

Okay, let me be as clear as I can in my problem. I'm new to R, so your patience is appreciated.
I want to create a histogram using two different vectors. The first vector contains a list of models (products). These models are listed as either integers, strings, or NA. I'm not exactly sure how R is storing them (I assume they're kept as strings), or if that is a relevant issue. I also have a vector containing a list of incidents pertaining to that model. So for example, one row in the dataframe might be:
Model Incidents
XXX1991 7
How can I create a histogram where the number of incidents for each model is shown? So the histogram will look like
| =
| =
Frequency of | =
Incidents | = =
| = = =
| = = = = =
- - - - - -
Each different Model
Just to give a general idea.
I also need to be able to map everything out with standard deviation lines, so that it's easy to see which models are the least reliable. But that's not the main question here. I just don't want to do anything that will make me unable to use standard deviation in the future.
So far, all I really understand is how to make a histogram with the frequency marked, but for some reason, the x-axis is marked with numbers, not the models' names.
I don't really care if I have to download new packages to make this work, but I suspect that this already exists in basic R or ggplot2 and I'm just too dumb to figure it out.
Feel free to ask clarfying questions. Thanks.
EDIT: I forgot to mention, there are multiple rows of incidents listed under each model. So to add to my example earlier:
Model Incidents
XXX1991 7
XXX1991 1
XXX1991 19
3
5
XXX1002 9
XXX1002 4
etc . . .
I want to add up all the incidents for a model under one label.
I am assuming that you did not mean to leave the model blank in your example, so I filled in some values.
You can add up the number of incidents by model using aggregate then make the relevant plot using barplot.
## Example Data
data = read.table(text="Model Incidents
XXX1991 7
XXX1991 1
XXX1991 19
XXX1992 3
XXX1992 5
XXX1002 9
XXX1002 4",
header=TRUE)
TAB = aggregate(data$Incidents, list(data$Model), sum)
TAB
Group.1 x
1 XXX1002 13
2 XXX1991 27
3 XXX1992 8
barplot(TAB$x, names.arg=TAB$Group.1 )

Dot Plots with multiple categories - R

I'm definitely a neophyte to R for visualizing data, so bear with me.
I'm looking to create side-by-side dot plots of seven categorical samples with many gene expression values corresponding with individual gene names. mydata.csv file looks like the following
B27 B28 B30 B31 LTNP5.IFN.1 LTNP5.IFN.2 LTNP5.IL2.1
1 13800.91 13800.91 13800.91 13800.91 13800.91 13800.91 13800.91
2 6552.52 5488.25 3611.63 6552.52 6552.52 6552.52 6552.52
3 3381.70 1533.46 1917.30 2005.85 3611.63 4267.62 5488.25
4 2985.37 1188.62 1051.96 1362.32 2717.68 2985.37 5016.01
5 1917.30 2862.19 2625.29 2493.26 2428.45 2717.68 4583.02
6 990.69 777.97 1269.05 1017.26 5488.25 5488.25 4267.62
I would like each sample data to be organized in its own dot plot in one graph. Additionally, if I could point out individual data points of interest, that would be great.
Thanks!
You can use base R, but you need to convert to matrix first.
dotchart(as.matrix(df))
or, we can transpose the matrix to arrange it by sample:
dotchart(t(as.matrix(df)))
Considering your [toy] data is stored in a data frame called a:
library(reshape2)
library(ggplot2)
a$trial<-1:dim(a)[1] # also, nrow(a)
b<-melt(data = a,varnames = colnames(a)[1:7],id.vars = "trial")
b$variable<-as.factor(b$variable)
ggplot(b,aes(trial,value))+geom_point()+facet_wrap(~variable)
produces
What we did:
Loaded required libraries (reshape2 to convert from wide to long and ggplot2 to, well, plot); melted the data into long formmat (more difficult to read, easier to process) and then plotted with ggplot.
I introduced trial to point to each "run" each variable was measured, and so I plotted trial vs value at each level of variable. The facet_wrap part puts each plot into a subplot region determined by variable.

Creating stacked chart

I have two tables that stores login attempts of users. One table contains all successful logins and the other contains fail attempts. I'm trying to create a stacked chart by using fail login counts and successful login counts. This is how my tables look like :
Success_login Table:
User_ID Site_Address Login_Attempts
1 xxx.xxx.xxx 5
2 xxx.xxy.yyy 10
Fail_login Table:
User_ID Site_Address Login_Attempts
1 xxx.xxx.xxx 2
2 xxx.xxy.yyy 8
How do I use Login_Attempts columns of those two tables to create stacked chart so that I can highlight success and failure attempt? I looked online and I found this code :
# Stacked Bar Plot with Colors and Legend
counts <- table(mtcars$vs, mtcars$gear)
barplot(counts, main="Car Distribution by Gears and VS",
xlab="Number of Gears", col=c("darkblue","red"),
legend = rownames(counts))
However, it does not work, as my two tables have different number of records. I appreciate if you could guide me to the solution.
Thanks
Discussion
First you have to unify your data into a single table. This can be done with a kind of outer join, if you're familiar with SQL. See How to join (merge) data frames (inner, outer, left, right)?. The resulting NAs (for records which failed to join to the opposite table) must be replaced with zeroes in order for the final call to barplot() to work.
You must then derive a matrix in the format required by barplot() for producing stacked bar charts, which can be done pretty easily with a single call to matrix(). Taking care to set labels/titles/legends/colors correctly, you can get a nice stacked bar chart:
Code
s <- data.frame(User_ID=c(1,2,3), Site_Address=c('xxx.xxx.xxx','xxx.xxy.yyy','xxx.yyy.zzz'), Login_Attempts=c(5,10,3) );
f <- data.frame(User_ID=c(1,2,4), Site_Address=c('xxx.xxx.xxx','xxx.xxy.yyy','xxx.yyy.zzz'), Login_Attempts=c(2,8,4) );
all <- merge(s,f,by=c('User_ID','Site_Address'),suffixes=c('.successful','.failed'),all=T);
all[is.na(all)] <- 0;
stackData <- matrix(c(all$Login_Attempts.failed, all$Login_Attempts.successful ),2,byrow=T);
colnames(stackData) <- paste0(all$User_ID, '#', all$Site_Address );
rownames(stackData) <- c('failed','successful');
barplot(stackData,main='Successful and failed login attempts',xlab='User_ID#Site_Address',ylab='Login_Attempts',col=c('red','blue'),legend=rownames(stackData));
Resulting data
r> s;
User_ID Site_Address Login_Attempts
1 1 xxx.xxx.xxx 5
2 2 xxx.xxy.yyy 10
3 3 xxx.yyy.zzz 3
r> f;
User_ID Site_Address Login_Attempts
1 1 xxx.xxx.xxx 2
2 2 xxx.xxy.yyy 8
3 4 xxx.yyy.zzz 4
r> all;
User_ID Site_Address Login_Attempts.successful Login_Attempts.failed
1 1 xxx.xxx.xxx 5 2
2 2 xxx.xxy.yyy 10 8
3 3 xxx.yyy.zzz 3 0
4 4 xxx.yyy.zzz 0 4
r> stackData;
1#xxx.xxx.xxx 2#xxx.xxy.yyy 3#xxx.yyy.zzz 4#xxx.yyy.zzz
failed 2 8 0 4
successful 5 10 3 0
Output
References
How to join (merge) data frames (inner, outer, left, right)?
R: merge unequal dataframes and replace missing rows with 0
https://stat.ethz.ch/R-manual/R-devel/library/base/html/merge.html
http://www.statmethods.net/graphs/bar.html
https://stat.ethz.ch/R-manual/R-devel/library/graphics/html/barplot.html
https://stat.ethz.ch/R-manual/R-devel/library/base/html/matrix.html
Edit: It's a little strange to create a one-bar stacked bar chart, but ok, here's how you can do it, using the above data (all) as a base:
barplot(matrix(c(sum(all$Login_Attempts.failed),sum(all$Login_Attempts.successful))),main='Successful and failed login attempts',ylab='Login_Attempts',col=c('red','blue'),legend=c('failed','successful'));
Edit: Yeah, the y-axis should really cover the stack completely by default, it's a weakness in the base graphics package that it doesn't. You can add ylim=c(0,1.2*sum(do.call(c,all[,3:4]))) as an argument to the barplot() call to force the y-axis to extend at least 20% beyond the high point of the stack. (It's unfortunate that you have to calculate that manually from the input data, but as I said, it's a weakness in the package.)
Also, with regard to my comment about the oneness of the bar, it's just more common for stacked bar charts to be used to compare multiple bars, rather than showing a single bar. (That's why my initial assumption was that you wanted a separate bar for each user/site.) Instead of a single stacked bar, normally you'd see a plain old bar chart showing the different data points side-by-side. But it really depends on your application, so do what works best for you.
Try drawing, by hand, the stacked chart you are trying to create. Does it even make sense?
When convinced that you now know what your desired result should look like, by hand, create a single data.frame or matrix necessary for barplot to create your result. Remember to include special instances e.g. where a user only has successful or unsuccessful logins.
Figure how to put your input data.frames together into the single data.frame in the previous step.
The result of step 2 is your reproducible example you need in order to ask a sensible question here.
Step 3 is what you are asking here, but it does not seem you are sure what the intermediate result should look like.
Step 1 is about visualising the end product, and working back from there.

R histogram from already summarized count

I have a really huge file, thus I had to count frequencies for histogram generation outside the R.
Couldn't find the correct answer in already existing threads. Everything I tried led me to bar plot or failure (even R's exceptions didn't let it plot as histogram the way I tried)
file looks like (it's tab delimited):
freq cov
394104974 1
387288861 3
141169009 4
105488813 2
60039934 6
45109486 5
26318120 7
9691068 8
7532886 9
3973434 10
it has sth like 3k lines.
How can I plot this with ggplot2 as a nice histogram? (cov column holds x axis values)
Cheers,
Irek

How to indicate factors in ggplot with horizontal line and Text

My data looks like this example:
dataExample<-data.frame(Time=seq(1:10),
Data1=runif(10,5.3,7.5),
Data2=runif(10,4.3,6.5),
Application=c("Substance1","Substance1","Substance1",
"Substance1","Substance2","Substance2","Substance2",
"Substance2","Substance1","Substance1"))
dataExample
Time Data1 Data2 Application
1 1 6.511573 5.385265 Substance1
2 2 5.870173 4.512775 Substance1
3 3 6.822132 5.109790 Substance1
4 4 5.940528 6.281412 Substance1
5 5 7.269394 4.680380 Substance2
6 6 6.122454 6.015899 Substance2
7 7 5.660429 6.113362 Substance2
8 8 6.649749 4.344978 Substance2
9 9 7.252656 4.764667 Substance1
10 10 7.204440 5.835590 Substance1
I would like to indicate at which time any Substance was applied that is different from dataExample$Application[1].
Here I show you the way I get this ploted, but I assume that there is a much easier way to do it with ggplot.
library(reshape2)
library(ggplot)
plotDataExample<-function(DataFrame){
longDF<-melt(DataFrame,id.vars=c("Time","Application"))
p=ggplot(longDF,aes(Time,value,color=variable))+geom_line()
maxValue=max(longDF$value)
minValue=min(longDF$value)
yAppLine=maxValue+((maxValue-minValue)/20)
xAppLine1=min(longDF$Time[which(longDF$Application!=longDF$Application[1])])
xAppLine2=max(longDF$Time[which(longDF$Application!=longDF$Application[1])])
lineData=data.frame(x=c(xAppLine1,xAppLine2),y=c(yAppLine,yAppLine))
xAppText=xAppLine1+(xAppLine2-xAppLine1)/2
yAppText=yAppLine+((maxValue-minValue)/20)
appText=longDF$Application[which(longDF$Application!=longDF$Application[1])[1]]
textData=data.frame(x=xAppText,y=yAppText,appText=appText)
p=p+geom_line(data=lineData,aes(x=x, y=y),color="black")
p=p+geom_text(data=textData,aes(x=x,y=y,label = appText),color="black")
return(p)
}
plotDataExample(dataExample)
Question:
Do you know a better way to get a similar result so that I could possibly indicate more than one factor (e.g. Substance3, Substance4 ...).
First, made new sample data to have more than 2 levels and twice repeated Substance2.
dataExample<-data.frame(Time=seq(1:10),
Data1=runif(10,5.3,7.5),
Data2=runif(10,4.3,6.5),
Application=c("Substance1","Substance1","Substance2",
"Substance2","Substance1","Substance1","Substance2",
"Substance2","Substance3","Substance3"))
Didn't make this as function to show each step.
Add new column groups to original data frame - this contains identifier for grouping of Applications - if substance changes then new group is formed.
dataExample$groups<-c(cumsum(c(1,tail(dataExample$Application,n=-1)!=head(dataExample$Application,n=-1))))
Convert to long format data for lines of data.
longDF<-melt(dataExample,id.vars=c("Time","Application","groups"))
Calculate positions for Substance identifiers. Used function ddply() from library plyr. For calculation only data that differs from first Application value are used (that's subset()). Then Application and groups are used for grouping of data. Calculated starting, middle and ending positions on x axis and y value taken as maximal value +0.3.
library(plyr)
lineData<-ddply(subset(dataExample,Application != dataExample$Application[1]),
.(Application,groups),
summarise,minT=min(Time),maxT=max(Time),
meanT=mean(Time),ypos=max(longDF$value)+0.3)
Now plot longDF data with ggplot() and geom_line() and add segments above plot with geom_segment() and text with annotate() using new data frame lineData.
ggplot(longDF,aes(Time,value,color=variable))+geom_line()+
geom_segment(data=lineData,aes(x=minT,xend=maxT,y=ypos,yend=ypos),inherit.aes=FALSE)+
annotate("text",x=lineData$meanT,y=lineData$ypos+0.1,label=lineData$Application)

Resources