I have a dataframe named tab of the following format:
Var1 Freq
1 5 1853
2 15 2862
3 25 7206
4 35 14890
5 45 19856
6 55 23837
7 65 16510
8 75 4729
9 85 830
I want to make a barplot and have the Freq displayed on the bar.
I have tried the following:
plot(tab$Var1, tab$Freq)
text(
x = tab$Var1,
y = tab$Freq,
labels = row.names( tab$Freq ),
adj = 0)
There seem to be two things wrong here:
First, the plot shows lines instead of bars, and calling barplot(tab$Freq) followed by axis(1, xaxp = tab$Var1) gives the error: Error in axis(1, xaxp = tab$Var1) : plot.new has not been called yet (PART 1 SOLVED).
Second, with the original plot and text calls, the text labels do not show up.
Thanks.
I am working with the R programming language. Suppose I have the following data:
a = rnorm(1000,10,1)
b = rnorm(200,3,1)
c = rnorm(200,13,1)
d = c(a,b,c)
index <- 1:1400
my_data = data.frame(index,d)
I can make the following histograms of the same data by adjusting the "bin" length (via the "breaks" option):
hist(my_data$d, breaks = 10, main = "Histogram #1, Breaks = 10")
hist(my_data$d, breaks = 100, main = "Histogram #2, Breaks = 100")
hist(my_data$d, breaks = 5, main = "Histogram #3, Breaks = 5")
My Question: Each of these histograms has a different number of "bars" (i.e. bins). For example, the first histogram has 8 bars and the third histogram has 4 bars. For each of these histograms, is there a way to find out which observations (from the original vector "d") are located in each bar?
Right now, I am trying to manually do this, e.g. (for histogram #3)
histogram3_bar1 <- my_data[which(my_data$d < 5 & my_data$d > 0), ]
histogram3_bar2 <- my_data[which(my_data$d < 10 & my_data$d > 5), ]
histogram3_bar3 <- my_data[which(my_data$d < 15 & my_data$d > 10), ]
histogram3_bar4 <- my_data[which(my_data$d < 20 & my_data$d > 15), ]
head(histogram3_bar1)
index d
1001 1001 4.156393
1002 1002 3.358958
1003 1003 1.605904
1004 1004 3.603535
1006 1006 2.943456
1007 1007 1.586542
But is there a more "efficient" way to do this?
Thanks!
hist itself can solve the problem of finding out which data points are in which intervals. hist returns a list whose first member, breaks, holds the bin end points.
First, make the problem reproducible by setting the RNG seed.
set.seed(2021)
a = rnorm(1000,10,1)
b = rnorm(200,3,1)
c = rnorm(200,13,1)
d = c(a,b,c)
Now, save the return value of hist and have findInterval report which bin each data point falls in.
h1 <- hist(d, breaks = 10)
f1 <- findInterval(d, h1$breaks)
h1$breaks
# [1] -2 0 2 4 6 8 10 12 14 16
head(f1)
#[1] 6 7 7 7 7 6
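The first six observations fall in bins 6 and 7, whose end points are 8, 10 and 12. Note that d[f1] indexes d by bin number rather than by observation, so the call below returns the 6th and 7th elements of d (which themselves lie in bins 6 and 7); to inspect the observations directly, look at head(d):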
head(d[f1])
#[1] 8.07743 10.26174 10.26174 10.26174 10.26174 8.07743
As for whether the intervals given by end points 8, 10 and 12 are left- or right-closed, see help("findInterval").
As a final check, table the values returned by findInterval and see if they match the histogram's counts.
table(f1)
#f1
# 1 2 3 4 5 6 7 8 9
# 2 34 130 34 17 478 512 169 24
h1$counts
#[1] 2 34 130 34 17 478 512 169 24
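The question of which end of each interval is closed can be illustrated with a tiny example. This is only a sketch with made-up values (breaks and x below are not from the question): by default findInterval treats intervals as left-closed, while hist with its default right = TRUE counts them as right-closed, so the two can disagree for values that sit exactly on a break.

```r
# Hypothetical toy values chosen so that both points sit exactly on a break
breaks <- c(0, 2, 4, 6)
x <- c(2, 4)

# Default: intervals are [0,2), [2,4), [4,6)
left_closed <- findInterval(x, breaks)

# left.open = TRUE: intervals are (0,2], (2,4], (4,6]
right_closed <- findInterval(x, breaks, left.open = TRUE)

# hist() with default right = TRUE uses right-closed bins as well
h_counts <- hist(x, breaks = breaks, plot = FALSE)$counts

left_closed
right_closed
h_counts
```

With continuous data such as d, ties with the break values are unlikely, which is presumably why the counts above match. If exact agreement matters, left.open = TRUE together with rightmost.closed = TRUE may be the closer match to the defaults of hist.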
To get the interval end points for each data point, build a data frame like the following:
bins <- data.frame(bin = f1, min = h1$breaks[f1], max = h1$breaks[f1 + 1L])
head(bins)
# bin min max
#1 6 8 10
#2 7 10 12
#3 7 10 12
#4 7 10 12
#5 7 10 12
#6 6 8 10
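To get the observations themselves grouped by bar, one option is to combine the breaks from hist with cut and split. The following is a minimal sketch rather than part of the answer above; the names h3, bin and by_bin are made up for illustration, and cut is used here in place of findInterval because its right and include.lowest arguments mirror the defaults of hist.

```r
set.seed(2021)
d <- c(rnorm(1000, 10, 1), rnorm(200, 3, 1), rnorm(200, 13, 1))
my_data <- data.frame(index = seq_along(d), d = d)

# Compute the bins without drawing the histogram
h3 <- hist(my_data$d, breaks = 5, plot = FALSE)

# Assign every observation to a bin, using the same closure rules as hist()
bin <- cut(my_data$d, breaks = h3$breaks, right = TRUE, include.lowest = TRUE)

# One data frame per bar, named by its interval
by_bin <- split(my_data, bin)

# Number of rows per bar; this should agree with h3$counts
sapply(by_bin, nrow)
```

Each element of by_bin is then the subset of my_data that falls in one bar, for example by_bin[[1]] for the first bar.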
I have a dataset that I would like to visualize with barplot(). My question is: why do some labels not show up when added with text(), and how can this be solved?
For example this is my table
table(test$Freq)
2 3 4 5 6 7 8 9 10 11 12 14 16 44
6338 2544 1072 394 102 29 11 9 5 2 3 1 1 1
And the following barplot will miss the first label:
barplot(table(test$Freq))
text(x = xx, y = test$Freq, label = test$Freq, pos = 3, cex = 0.8, col = "red")
It looks like the text is being plotted outside of your graph.
Try adjusting the ylim value when you call barplot. This should solve your problem.
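A minimal sketch of that suggestion, assuming the counts from the table shown in the question (the vector counts and the name mids are made up here, since test and xx are not available). barplot returns the bar midpoints, which can be used as the x positions for text, and the labels are placed at the bar heights rather than at the raw data values.

```r
# Counts copied from the table in the question
counts <- c(6338, 2544, 1072, 394, 102, 29, 11, 9, 5, 2, 3, 1, 1, 1)
names(counts) <- c(2:12, 14, 16, 44)

# Leave about 10% headroom so the label above the tallest bar stays inside the plot
mids <- barplot(counts, ylim = c(0, max(counts) * 1.1))

# pos = 3 places each label just above its bar
text(x = mids, y = counts, labels = counts, pos = 3, cex = 0.8, col = "red")
```

The factor 1.1 is an arbitrary choice; a larger value gives more room for bigger text.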
I'm in a little bit of pain at the moment.
I'm looking for a way to plot compositional data (https://en.wikipedia.org/wiki/Compositional_data). I have four categories, so the data can be represented in a 3D simplex (since one category is always 1 minus the sum of the others).
So I have to plot a tetrahedron (its vertices will be my four categories) that contains my data points.
I've found this GitHub gist, https://gist.github.com/rmaia/5439815, but the use of the pavo package (tcs, vismodel...) is pretty obscure to me.
I've also found something else in the compositions package, with the function plot3D. But in this case an RGL device is opened (?!), and I don't really need a rotating plot, just a static one, since I want to save it as an image and insert it into my thesis.
Update: data looks like this. Consider only columns violent_crime (total), rape, murder, robbery, aggravated_assault
   cities   violent_crime murder rape rape(legally revised) robbery aggravated_assault
1  Autauga             68      2    8                    NA       6                 52
2  Baldwin             98      0    4                    NA      18                 76
3  Barbour             17      2    2                    NA       2                 11
4  Bibb                 4      0    1                    NA       0                  3
5  Blount              90      0    6                    NA       1                 83
6  Bullock             15      0    0                    NA       3                 12
7  Butler              44      1    7                    NA       4                 32
8  Calhoun             15      0    3                    NA       1                 11
9  Chambers             4      0    0                    NA       2                  2
10 Cherokee            49      2    8                    NA       2                 37
Update: my final plot with the compositions package
Here is how you can do this without a dedicated package by using geometry and plot3D. Using the data you provided:
# Load test data
df <- read.csv("test.csv")[, c("murder", "robbery", "rape", "aggravated_assault")]
# Convert absolute data to relative
df <- t(apply(df, 1, function(x) x / sum(x)))
# Compute tetrahedron coordinates according to https://mathoverflow.net/a/184585
simplex <- function(n) {
  qr.Q(qr(matrix(1, nrow = n)), complete = TRUE)[, -1]
}
tetra <- simplex(4)
# Convert barycentric coordinates (4D) to cartesian coordinates (3D)
library(geometry)
df3D <- bary2cart(tetra, df)
# Plot data
library(plot3D)
scatter3D(df3D[,1], df3D[,2], df3D[,3],
          xlim = range(tetra[,1]), ylim = range(tetra[,2]), zlim = range(tetra[,3]),
          col = "blue", pch = 16, box = FALSE, theta = 120)
lines3D(tetra[c(1,2,3,4,1,3,1,2,4),1],
        tetra[c(1,2,3,4,1,3,1,2,4),2],
        tetra[c(1,2,3,4,1,3,1,2,4),3],
        col = "grey", add = TRUE)
text3D(tetra[,1], tetra[,2], tetra[,3],
       colnames(df), add = TRUE)
You can tweak the orientation with the phi and theta arguments in scatter3D.
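To make the barycentric-to-cartesian step less of a black box, here is a small base R sketch of what the conversion amounts to: each point is a weighted average of the tetrahedron's vertices, so it can be written as a matrix product. The matrix comp and its row names are made-up illustration values, and the result should agree with what bary2cart returns for the same inputs.

```r
# Same vertex construction as above: 4 vertices in 3D, one per row
simplex <- function(n) {
  qr.Q(qr(matrix(1, nrow = n)), complete = TRUE)[, -1]
}
tetra <- simplex(4)

# Hypothetical compositions; every row sums to 1
comp <- rbind(
  vertex1 = c(1, 0, 0, 0),             # pure first category
  centre  = c(0.25, 0.25, 0.25, 0.25), # equal shares
  mixed   = c(0.5, 0.3, 0.1, 0.1)
)

# Weighted average of the vertices: (3 x 4) %*% (4 x 3) gives one 3D point per row
xyz <- comp %*% tetra
xyz
```

A pure composition lands on its vertex, and the equal-shares composition lands at the centre of the tetrahedron, which for this construction is the origin.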
I have a question about the barplot function. I have the following function, which receives a data.frame as a parameter; the number of rows in it can vary widely. I want to draw a histogram, or something like it, as an image. The problem is that I always run into margin problems with barplot. Is there a way to draw the same histogram with another library that does not have margin problems?
function:
HIST_EPC_list<-function(DF_TAG_PHASE_EPC_counter){
num<-nrow(DF_TAG_PHASE_EPC_counter)
barplot(DF_TAG_PHASE_EPC_counter$Num_EPC, names.arg = DF_TAG_PHASE_EPC_counter$Tag_PHASE, xlab = "Tag_PHASE", ylab = "Num_EPC", main="Histograma Num tags/PHASE:", width=40)
par(mar=c(10,10,10,10))
}
data.frame example:
DF_TAG_PHASE_EPC_counter
Tag_PHASE Num_EPC
1 123.0 1
2 75.0 1
3 78.0 1
4 81.0 2
5 84.0 1
6 87.0 1
7 90.0 2
8 98.0 1
Error:
Error in plot.new() : figure margins too large
Called from: barplot(DF_TAG_RSSI_EPC_counter$Num_EPC, names.arg = DF_TAG_RSSI_EPC_counter$Tag_RSSI,
xlab = "Tag_RSSI", ylab = "Num_EPC", main = "Histograma Num tags/RSSI:",
width = 10)
Happy new year to you all!
I am plotting some graphs and would like to differentiate some plotted lines and points. This is an example of my data and the graph that I am trying to get:
anim <- c(1,2,3,4,5)
var1 <- c(32,36,40,38,39)
var2 <- c(30,31,34,36,38)
surv <- c(0,1,0,1,1)
mydf <- data.frame(anim,var1,var2,surv)
mydf
anim var1 var2 surv
1 1 32 30 0
2 2 36 31 1
3 3 40 34 0
4 4 38 36 1
5 5 39 38 1
lm.pos1 <- lm(var1~var2,data=mydf)
plot(mydf$var2,mydf$var1,xlab="ave.ear",ylab="rtemp",xlim=c(25,45),ylim=c(25,45))
abline(lm.pos1)
abline(h=37.6,v=0,col="gray10",lty=20)
abline(h=34,v=0,col="gray10",lty=20)
First, I would like to insert the label "37.6°C" on the top horizontal and continuous line and "34.0°C" on the bottom horizontal and broken line.
Second, I would like to colour those individuals (circles) as red if surv=0 (died) or green if surv=1.
Any help would be very much appreciated!
Baz
# surv is 0 or 1, so surv + 1 picks 'red' for 0 (died) and 'green' for 1 (survived)
plot(mydf$var2, mydf$var1, xlab="ave.ear", ylab="rtemp",
xlim=c(25,45), ylim=c(25,45), col=c('red', 'green')[mydf$surv+1])
abline(lm.pos1)
abline(h=37.6,v=0,col="gray10",lty=20)
text(25,38.1,parse(text='37.6*degree*C'),col='gray10')
abline(h=34,v=0,col="gray10",lty=20)
text(25,34.5,parse(text='34.0*degree*C'),col='gray10')
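The colouring relies on indexing a colour vector by surv + 1. A minimal sketch of just that step, using the surv values from the question (the name point_col is made up for illustration):

```r
surv <- c(0, 1, 0, 1, 1)

# surv + 1 is 1 for surv = 0 and 2 for surv = 1,
# so it selects "red" for animals that died and "green" for survivors
point_col <- c("red", "green")[surv + 1]
point_col
```

The resulting vector has one colour per row of mydf and can be passed directly as the col argument of plot.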