Put data into unequal bin sizes - r

I'm new to R and want to utilize it to directly work with my data. My ultimate goal is to make a histogram / bar plot.
Depth: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10
Percent: .4, .1, .5, .2, .1, .3, .9, .3, .2, .2, .8
I want to take the Depth vector and bin it into unequal chunks (0, 1-5, 6-8, 9-10), and take the Percent values and somehow sum them together for the matching chunks.
For example:
0 -> .4
1-5 -> 1.2
6-8 -> 1.4
9-10 -> 1.0
The actual data set goes into the thousands, and I feel R might be better suited for this than using C++ to group my data into a smaller table before letting R plot it.
I looked up how to use SPLIT and CUT, but I'm not quite sure how to utilize the data after I do cut it into ranges. If I do "breaks" for a CUT, I don't know how to include the Zero initial value (corresponding to .4 in the example).
Any suggestions or approaches would be appreciated.

You're on the right track with cut:
dat <- data.frame(Depth = 0:10,
Percent = c(0.4, 0.1, 0.5, 0.2, 0.1, 0.3, 0.9, 0.3, 0.2, 0.2, 0.8))
cuts <- cut(dat$Depth, breaks=c(0, 1, 6, 9, 11), right=FALSE)
Then you can use aggregate:
aggregate(dat$Percent, list(cuts), sum)
Or as a one-liner:
aggregate(dat$Percent,
list(cut(dat$Depth,
breaks=c(0, 1, 6, 9, 11),
right=FALSE)),
sum)
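Since the stated goal is a bar plot, here is a minimal sketch of how the binned sums could be labelled and plotted. The label strings and the names `bins` and `sums` are assumptions chosen for illustration; `tapply` is used in place of `aggregate` only because it returns a named vector that `barplot` accepts directly.

```r
dat <- data.frame(Depth = 0:10,
                  Percent = c(0.4, 0.1, 0.5, 0.2, 0.1, 0.3, 0.9, 0.3, 0.2, 0.2, 0.8))

# Left-closed intervals [0,1), [1,6), [6,9), [9,11) so that 0 gets its own bin.
# The labels are illustrative names for those intervals.
bins <- cut(dat$Depth,
            breaks = c(0, 1, 6, 9, 11),
            right = FALSE,
            labels = c("0", "1-5", "6-8", "9-10"))

# Sum Percent within each bin; the result is named by bin label.
sums <- tapply(dat$Percent, bins, sum)

barplot(sums, xlab = "Depth", ylab = "Summed Percent")
```

With right = FALSE every interval includes its lower bound, which is what lets the value 0 form a bin by itself.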

Related

Adjusting Plot aspect ratio in R

I'm trying to change the aspect ratio of this plot so that it's twice as long as it is tall. Here is the code:
plot(X,vw,
ylab= "Stress (MPa)",
xlab= "Strain (mm)")
title("Veronda-Westmann")
lines(X,s,col="red")
legend(x=0, y=17, c("Veronda-Westmann", "Experimental"),cex=.8,col=c("black","red"),pch=c(1,NA),lty=c(NA,1))
I used code to try to specify the height and width, but this didn't appear to work. I'm new to R, so really sorry if this is a stupid question.
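Without reproducible data it is hard to reproduce your plot exactly, but you can pass asp to plot(). asp sets the y/x aspect ratio of the data units: one unit on the y-axis is drawn asp times as long as one unit on the x-axis. Note that this changes the scaling of the axes, not the size of the plotting window. You can use the following code: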
X <- c(0,1,2,3,4,5, 6, 7, 8, 9, 10)
vw <- c(0, 0.5, 0.75, 1, 1.5, 3, 5, 8, 10, 15, 22)
s <- c(0, 0.5, 0.75, 1, 1.5, 3, 5, 8, 10, 15, 22)
plot(X,vw,
ylab= "Stress (MPa)",
xlab= "Strain (mm)", asp = 2)
title("Veronda-Westmann")
lines(X,s,col="red")
legend("topleft",
c("Veronda-Westmann", "Experimental"),
cex=.8,col=c("black","red"),
pch=c(1,NA),
lty=c(NA,1))
Are you looking to export this figure? If so, you can simply specify this on export:
x <- seq(0,5, 0.1)
y <- seq(0,15, 0.3)
Plot 1: native aspect ratio
png("test.png") # or pdf, etc
plot(x, y)
dev.off()
Plot 2: twice as wide:
png("test2.png", width = 1000, height = 500) # arbitrary dimensions in pixels, 2:1
plot(x, y)
dev.off()
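If the plot region itself (rather than the whole image) should be twice as wide as it is tall, one option is the pin graphical parameter, which sets the plot region's width and height in inches. The following is only a sketch: the 4 x 2 inch size, the temporary PDF file, and the name `pin_used` are assumptions for illustration.

```r
x <- seq(0, 5, 0.1)
y <- seq(0, 15, 0.3)

out_file <- tempfile(fileext = ".pdf")  # hypothetical output path
pdf(out_file, width = 7, height = 7)    # device large enough to hold the region

par(pin = c(4, 2))        # plot region: 4 inches wide, 2 inches tall
plot(x, y)
pin_used <- par("pin")    # read the setting back for inspection

dev.off()
```

Unlike asp, this leaves the data scales free and fixes only the physical shape of the plotting area.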

x-y Plot based on 2 column table

I'm really new to R and I'm trying to convert a 2 column table into an xy-Plot.
Here's my .csv:
x [cm];y [cm]
0.5;0
2.6;9
0.5;1
0.6;2
0.7;3
0.8;4
1;5
1.2;6
1.5;7
1.9;8
Now: plot(data$`x [cm]`,data$`y [cm]`, type="b").
However I get this result:
I'm not quite sure why (0.5/y) and (2.6/y) are connected.
What I want is a simple line connecting all the dots since they are representing electric field lines. Is there an easy way of doing that?
Sort your data first:
data <- data[order(data[,1]),]
plot(data[,1], data[,2], type="b", xlab="x [cm]", ylab="y [cm]")
The points are connected like this because the connection is created based on their order in the matrix.
m <- matrix(c(
0.5, 0,
0.5, 1,
0.6, 2,
0.7, 3,
0.8, 4,
1, 5,
1.2, 6,
1.5, 7,
1.9, 8,
2.6, 9), ncol = 2, byrow = TRUE)
colnames(m) <- c("x", "y")
plot(m, type = "b")
Simply reordering the rows of the matrix solves your problem.
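To make the row-order point concrete, here is a small self-contained sketch that reads the semicolon-separated data from the question and sorts the rows before plotting. The inline `csv_text` string stands in for the real file, and breaking ties in x by y is an assumption that fits this data set.

```r
# Inline copy of the question's CSV (stands in for the real file).
csv_text <- "x [cm];y [cm]
0.5;0
2.6;9
0.5;1
0.6;2
0.7;3
0.8;4
1;5
1.2;6
1.5;7
1.9;8"

# check.names = FALSE keeps the original column names, brackets included.
data <- read.csv(text = csv_text, sep = ";", check.names = FALSE)

# Sort rows by x, breaking ties by y, so type = "b" joins neighbouring points.
data <- data[order(data[["x [cm]"]], data[["y [cm]"]]), ]

plot(data[["x [cm]"]], data[["y [cm]"]], type = "b",
     xlab = "x [cm]", ylab = "y [cm]")
```

For a curve that is not a function of x, it may make more sense to sort by whichever variable increases along the line, which here would be y.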
You can use
library(ggplot2)
ggplot(data, aes(x=`x [cm]`, y=`y [cm]`)) + geom_point() + geom_line()
Or using base R plot
ord <- order(data$`x [cm]`)  # a single index keeps each x paired with its own y
plot(data$`x [cm]`, data$`y [cm]`,
xlim=range(data$`x [cm]`), ylim=range(data$`y [cm]`),
xlab="x [cm]", ylab="y [cm]")
lines(data$`x [cm]`[ord], data$`y [cm]`[ord])

How to get quantile from category count in R?

For example, I have a sample data of human height in a DataFrame:
df <- data_frame(height = c(1.5, 1.6, 1.7, 1.8, 1.9), number = c(20, 30, 50, 30, 20))
How can I calculate the 90% quantile of this sample?
I know ggplot2 has a function that can plot the ecdf of the sample:
ggplot(df, aes(x = height, y = number)) + stat_ecdf()
but I only need a specified quantile not the plot.
I could repeat each height number times to make a vector and use the quantile function on that vector, but as the counts get larger, this method becomes very inefficient.
EDIT:
It seems stat_ecdf is not supposed to be used in this way, and when the data distribution is skewed:
df <- data_frame(height = c(1.5, 1.6, 1.7, 1.8, 1.9), number = c(100, 2, 3, 4, 5))
only quantile of the repeated vector gives the desired result:
quantile(c(rep(1.5,100), rep(1.6,2), rep(1.7,3), rep(1.8,4), rep(1.9,5)))
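One way to avoid building the repeated vector is to work on the cumulative counts directly. The sketch below is not from the original thread: `count_quantile` is a hypothetical helper, and it assumes the inverse-ECDF definition (what `quantile()` calls `type = 1`), i.e. the smallest value whose cumulative count reaches p times the total.

```r
# Quantile from (value, count) pairs without expanding the data.
# Assumption: inverse-ECDF definition, matching quantile(..., type = 1).
count_quantile <- function(value, count, p) {
  ord <- order(value)            # make sure values are ascending
  value <- value[ord]
  cum <- cumsum(count[ord])      # cumulative number of observations
  # first value whose cumulative count reaches the requested share
  value[which(cum >= p * sum(count))[1]]
}

heights <- c(1.5, 1.6, 1.7, 1.8, 1.9)

q_even <- count_quantile(heights, c(20, 30, 50, 30, 20), 0.9)
q_skew <- count_quantile(heights, c(100, 2, 3, 4, 5), 0.9)
```

The default `quantile()` uses `type = 7`, which interpolates between neighbouring values, so its result can differ from this one when the requested quantile falls between two distinct heights.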

R: Bar plot on a continuous x-axis (time-scaled)

I'm fairly new to R so please comment on anything you see.
I have data taken at different timepoints, under two conditions (for one timepoint), and I want to plot this as a bar plot with error bars and with the bars at the appropriate timepoint.
I currently have this (stolen from another question on this site):
library(ggplot2)
example <- data.frame(tp = factor(c(0, "14a", "14b", 24, 48, 72)), means = c(1, 2.1, 1.9, 1.8, 1.7, 1.2), std = c(0.3, 0.4, 0.2, 0.6, 0.2, 0.3))
ggplot(example, aes(x = tp, y = means)) +
geom_bar(position = position_dodge()) +
geom_errorbar(aes(ymin=means-std, ymax=means+std))
Now my timepoints are a factor, but the unequal distribution of measurements across time makes the plot less nice.
This is how I imagine the graph :
I find the ggplot2 package can give you very nice graphs, but I have a lot more difficulty understanding it than I have with other R stuff.
Before we get into R, you have to realize that even in a bar plot the x axis needs a numeric value. If you treat them as factors then the software assumes equal spacing between the bars by default. What would be the x-values for each of the bars in this case? It can be (0, 14, 14, 24, 48, 72) but then it will plot two bars at point 14 which you don't seem to want. So you have to come up with the x-values.
Joran provides an elegant solution by modifying the width of the bars at position 14. Modifying the code given by joran to make the bars fall at the right position in the x-axis, the final solution is:
library(ggplot2)
example <- data.frame(tp = factor(c(0, "14a", "14b", 24, 48, 72)), means = c(1, 2.1, 1.9, 1.8, 1.7, 1.2), std = c(0.3, 0.4, 0.2, 0.6, 0.2, 0.3))
example$tp1 <- gsub("a|b","",example$tp)
example$grp <- c('a','a','b','a','a','a')
example$tp2 <- as.numeric(example$tp1)
ggplot(example, aes(x = tp2, y = means,fill = grp)) +
geom_bar(position = "dodge",stat = "identity") +
geom_errorbar(aes(ymin=means-std, ymax=means+std),position = "dodge")
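The idea of choosing explicit x-values can also be sketched in base R, which makes the positions visible. Everything specific here is an assumption for illustration: the half-width of 1.5 time units, the names `time`, `offset`, and `xpos`, and the choice to place the two measurements at time 14 side by side around it.

```r
example <- data.frame(tp = c("0", "14a", "14b", "24", "48", "72"),
                      means = c(1, 2.1, 1.9, 1.8, 1.7, 1.2),
                      std = c(0.3, 0.4, 0.2, 0.6, 0.2, 0.3))

half_width <- 1.5  # assumed bar half-width, in time units

# Numeric time: strip the trailing condition letter.
time <- as.numeric(sub("[ab]$", "", example$tp))

# Shift condition "a" left and "b" right so the two bars at 14 sit side by side.
offset <- ifelse(grepl("a$", example$tp), -half_width,
                 ifelse(grepl("b$", example$tp), half_width, 0))
xpos <- time + offset

# Empty plot with a continuous x-axis, then bars and error bars by hand.
plot(NA, type = "n",
     xlim = range(xpos) + c(-2, 2) * half_width,
     ylim = c(0, max(example$means + example$std)),
     xlab = "Time", ylab = "Mean")
rect(xpos - half_width, 0, xpos + half_width, example$means, col = "grey")
arrows(xpos, example$means - example$std,
       xpos, example$means + example$std,
       angle = 90, code = 3, length = 0.03)
```

Because the x-axis is numeric, the gaps between bars reflect the real time differences.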

How to rescale label on x axis in log2(n+1) format?

I want to format my x-axis in log2(n+1) format so the x-axis labels correspond to 1, 2, 4, 16 and so on.
Input:
x <- c(1, 2, 3, 11, 15)
y <- c(1.1, 1.2, .4, 2.1, 1.5)
plot(log2(x + 1), y, axes=FALSE)
axis(1, at=(labels=as.character(formatC(x))), cex.axis=0.9)
But the plot I get still has the original x-axis values.
How can I make my x-axis powers of 2 (1, 2, 4, 16, etc.)?
I guess this is what you want.
x<-c(1,2,3,11,15)
y<-c(1.1,1.2,.4,2.1,1.5)
lab<-c(1,2,4,16)
plot(log2(x+1),y,xaxt="n",xlab="x")
axis(1,at=log2(lab+1),labels=lab)
It might also be useful to calculate equally spaced labels:
lab<-round(2^seq(min(log2(x+1)),max(log2(x+1)),length.out=4)-1)
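If the labels should be exact powers of two rather than rounded values, they could be generated from the data range. This is a sketch under that assumption; `max_pow` is a name introduced here for illustration.

```r
x <- c(1, 2, 3, 11, 15)
y <- c(1.1, 1.2, .4, 2.1, 1.5)

# Largest exponent needed so that 2^max_pow covers max(x).
max_pow <- ceiling(log2(max(x)))
lab <- 2^(0:max_pow)

plot(log2(x + 1), y, xaxt = "n", xlab = "x")
# Ticks go where each label value lands on the log2(n + 1) scale.
axis(1, at = log2(lab + 1), labels = lab)
```

Any tick position that falls outside the plotted range is simply not drawn.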
