R plot - No control over number of items on x-axis

I am trying to draw a plot with all of the x-values shown on the x-axis.
m <- lm(MOCtfd_duration ~ CallMonth)
n <- lm(MTCtfd_duration ~ CallMonth)
summary(m)
summary(n)
title = OpName[1]
plot(MOCtfd_duration ~ CallMonth,type="l", col="red", ylim = c(600,1700), ylab="Duration", las=2, main = title )
lines(MTCtfd_duration ~ CallMonth, col="blue")
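axTicks(CallMonth)   # note: axTicks() expects a side number (1-4) and only computes tick positions; it does not draw an axis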
abline(m, col="darkred")
abline(n, col="darkblue")
The values for the x-axis items are:
> CallMonth
[1] "2014-05-01" "2014-04-01" "2014-03-01" "2014-02-01" "2014-01-01" "2013-12-01" "2013-11-01" "2013-10-01" "2013-09-01" "2013-08-01"
[11] "2013-07-01" "2013-06-01" "2013-05-01" "2013-04-01" "2013-03-01" "2013-02-01" "2013-01-01" "2012-12-01" "2012-11-01" "2012-10-01"
[21] "2012-09-01" "2012-08-01" "2012-07-01" "2012-06-01"
But I am only getting two labels on the x-axis, 2013 and 2014, even though all the points are plotted correctly. This is itself puzzling, since these are not values in the data.
I have tried both using and not using axTicks() (for which the documentation seems very limited), but this doesn't appear to have any effect.
Can someone kindly point out my elementary error?

OK, I finally sorted it out, in large part thanks to http://lukemiller.org/index.php/2014/01/make-your-r-figures-legible-in-powerpointkeynote-presentations/
Suppress the standard x-axis in plot() using xaxt="n".
Use axis.Date() to draw the axis on the chosen side, with pretty() to determine the tick positions (and hence how many ticks there are and how often they occur).
Use the format "%m-%Y" for the axis labels.
Use las=2 to turn the new axis labels through 90 degrees:
plot(MOCtfd_duration ~ CallMonth,type="l", col="red", ylim = c(0,max(MOCtfd_duration)*1.3), xaxt="n", ylab="Duration", xlab="", main = title )
lines(MTCtfd_duration ~ CallMonth, col="blue")
abline(m, col="darkred")
abline(n, col="darkblue")
legend(min(CallMonth),max(MOCtfd_duration)*1.15, c("MOC", "MTC"), fill=c("red","blue") )
axis.Date(side = 1, at=pretty(CallMonth, min.n=6, by="month"), format = "%m-%Y", las=2)
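For comparison, here is a sketch of the same approach with the monthly tick positions generated by seq() instead of pretty(), so that every month is guaranteed a tick. The data are hypothetical stand-ins (only the date range is taken from the question), and it assumes CallMonth is a Date vector, which its quoted print-out suggests. In the question, the default axis most likely placed ticks only at 1 January of each year, which would explain why just 2013 and 2014 appeared.

```r
# Hypothetical stand-in data: 24 monthly Dates spanning the range of CallMonth
CallMonth <- seq(as.Date("2012-06-01"), as.Date("2014-05-01"), by = "month")
set.seed(42)
MOCtfd_duration <- runif(length(CallMonth), 600, 1700)

# One tick per month, built explicitly instead of with pretty()
monthTicks <- seq(min(CallMonth), max(CallMonth), by = "month")

# Suppress the default Date axis (which, for this range, may label only the years)
plot(MOCtfd_duration ~ CallMonth, type = "l", col = "red",
     xaxt = "n", xlab = "", ylab = "Duration")

# Draw a label for every month, rotated (las = 2) so that all 24 fit
axis.Date(side = 1, at = monthTicks, format = "%m-%Y", las = 2)
```

The trade-off: seq(..., by = "month") always yields exactly one tick per month, whereas pretty() may choose a coarser spacing on its own.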

Related

Dates Messed Up in R

I made a list of dates called newDat that looks like the following:
> newDat
[1] 4.2.20 4.3.20 4.4.20 4.5.20 4.6.20 4.7.20 4.8.20 4.9.20
[9] 4.10.20 4.11.20 4.12.20 4.13.20 4.14.20 4.15.20 4.16.20 4.17.20
[17] 4.18.20 4.19.20 4.20.20 4.21.20 4.22.20 4.23.20 4.24.20 4.25.20
[25] 4.26.20 4.27.20 4.28.20 4.29.20 4.30.20 5.1.20 5.2.20 5.3.20
[33] 5.4.20 5.5.20 5.6.20 5.7.20 5.8.20 5.9.20 5.10.20 5.11.20
[41] 5.12.20 5.13.20 5.14.20 5.15.20 5.16.20 5.17.20 5.18.20 5.19.20
...
I then plot my data using the following code:
plot.ts(as.Date(newDat,"%m.%d.%y"), casesDifferenced, type = "l",
xlab = "Date")
But my x-axis dates are not showing up properly, as shown in the image below.
What am I missing here?
The fix was to use plot() instead of plot.ts():
plot(as.Date(newDat,"%m.%d.%y"), casesDifferenced, type = "l", xlab = "Date")
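A self-contained sketch of the fix, using a few hypothetical date strings in the same month.day.two-digit-year form as newDat and made-up values for casesDifferenced. The likely reason it works is that plot() passes the Date vector on to the axis method, which then draws date labels instead of raw day counts.

```r
# Hypothetical strings in the same "month.day.yy" form as newDat
newDat <- c("4.2.20", "4.3.20", "4.4.20", "4.30.20", "5.1.20", "5.2.20")
# Hypothetical values standing in for casesDifferenced
casesDifferenced <- c(12, -3, 7, 5, -8, 2)

# %m and %d also accept single-digit fields such as "4" and "2"
dates <- as.Date(newDat, format = "%m.%d.%y")
stopifnot(!anyNA(dates))   # every string parsed

# plot() keeps the Date class of x, so the x-axis gets date labels
plot(dates, casesDifferenced, type = "l", xlab = "Date")
```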

building circular graphic for angles in degrees

I have data on turning angles for a group of animals, separated by occupation area (breeding ground, migratory route, feeding area).
I need to plot a circular graphic in R of the angle values, in degrees, for each area.
The angle values in the data frame look like this:
[1] NA 41.027 -43.410 29.056 18.241 -7.125 -4.702 0.298
[9] 37.846 -7.545 -69.403 -7.376 17.289 7.927 60.752 -85.219
[17] 24.218 -17.482 3.703 -3.901 -8.582 -84.871 38.448 44.028
[25] -150.796 -59.679 -169.927 -6.862 51.130 -1.784 -16.468 -2.356
[33] 5.645 -6.988 4.750 -5.707 2.949 -6.150 -4.129 0.869
[41] -1.935 5.130 0.559 4.686 145.086 14.324 -169.206 1.741
[49] 53.595 15.315 36.892 49.279 21.171 10.739 122.553 -141.081
[57] 3.126 48.323 -7.139 163.742 141.473 47.320 128.430 175.918
[65] 7.447 -16.159 55.957 37.351 -2.703 -25.308 -31.338 NA
[73] NA -16.028 25.110 -31.085 -92.887 88.917 146.903 -148.539
[81] -11.576 41.030 -155.616 -129.368 -32.886 -164.284 -120.785 118.591
[89] 68.335 -98.038 40.347 166.333 19.495 -170.337 -178.322 99.111
Can someone help me with this simple question? Thank you!
It is not clear what you want, but here are two visualizations that might help. The first just plots points on the unit circle to show the angles. The second version has lines in the directions of turn. BTW, I simply left out your NAs.
The data are at the bottom.
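# Convert degrees to radians, then to points on the unit circle
x = cos(pi*Turns/180)
y = sin(pi*Turns/180)
par(mfrow=c(1,2))                             # two panels side by side
plot(x,y, pch=20, col="#22222266", asp=1)     # left: points only
plot(x,y, pch=20, col="#22222266", asp=1)     # right: points plus rays
segments(0, 0, x, y)                          # a line from the origin in each turning direction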
Data
Turns = c(41.027, -43.410, 29.056, 18.241, -7.125, -4.702, 0.298,
37.846, -7.545, -69.403, -7.376, 17.289, 7.927, 60.752, -85.219,
24.218, -17.482, 3.703, -3.901, -8.582, -84.871, 38.448, 44.028,
-150.796, -59.679, -169.927, -6.862, 51.130, -1.784, -16.468, -2.356,
5.645, -6.988, 4.750, -5.707, 2.949, -6.150, -4.129, 0.869,
-1.935, 5.130, 0.559, 4.686, 145.086, 14.324, -169.206, 1.741,
53.595, 15.315, 36.892, 49.279, 21.171, 10.739, 122.553, -141.081,
3.126, 48.323, -7.139, 163.742, 141.473, 47.320, 128.430, 175.918,
7.447, -16.159, 55.957, 37.351, -2.703, -25.308, -31.338,
-16.028, 25.110, -31.085, -92.887, 88.917, 146.903, -148.539,
-11.576, 41.030, -155.616, -129.368, -32.886, -164.284, -120.785, 118.591,
68.335, -98.038, 40.347, 166.333, 19.495, -170.337, -178.322, 99.111)
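Since the question asks for one circular graphic per occupation area, here is a sketch extending the same unit-circle idea to one panel per area. The data frame, its angle and area columns, and the area labels are hypothetical; only a handful of the angle values are taken from the question.

```r
# Hypothetical data frame: some turning angles (degrees) from the question,
# with a made-up 'area' column for the three occupation areas
turns <- data.frame(
  angle = c(41.027, -43.410, NA, 145.086, -169.206, 88.917,
            -150.796, 60.752, 7.447),
  area  = rep(c("breeding", "migratory", "feeding"), each = 3)
)
turns <- turns[!is.na(turns$angle), ]   # drop NAs, as the answer does

# Degrees -> radians -> point on the unit circle
toCircle <- function(deg) cbind(x = cos(pi * deg / 180), y = sin(pi * deg / 180))

byArea <- split(turns$angle, turns$area)     # one vector of angles per area
theta  <- seq(0, 2 * pi, length.out = 200)   # outline of the unit circle

op <- par(mfrow = c(1, length(byArea)))      # one panel per area
for (nm in names(byArea)) {
  xy <- toCircle(byArea[[nm]])
  plot(cos(theta), sin(theta), type = "l", col = "grey", asp = 1,
       xlab = "", ylab = "", main = nm)
  segments(0, 0, xy[, "x"], xy[, "y"])       # a ray in each turning direction
  points(xy, pch = 20)
}
par(op)                                       # restore the previous layout
```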

Error when generating histogram in R

I have a text file containing:
Tue Feb 11 12:19:39 +0000 2014
Tue Feb 11 12:19:56 +0000 2014
Tue Feb 11 12:20:04 +0000 2014
I read it into R with:
dataset <- read.csv("Time.txt")
and, so that R recognises the timestamps in the file, I write:
time <- strptime(dataset[,1], format = "%a %b %d %H:%M:%S %z %Y")
Whenever I try to plot a histogram with:
hist(time, breaks = 100)
it produces the histogram, but together with this error:
In breaks[-1L] + breaks[-nB] : NAs produced by integer overflow
What could be the issue that is prompting this error?
Since you asked what could be causing the error, here it is:
The error is created when the hist.default function calculates the midpoints of the histogram. The line mids <- 0.5 * (breaks[-1L] + breaks[-nB]) calculates the halfway point between each pair of adjacent breaks. The issue arises because the breaks are generated as integers:
If the argument breaks is numeric and of length 1, then hist.default (which is called by hist.POSIXt) creates a vector of breaks from the range of x and the requested number of breaks, using pretty(). For reasons I have not looked into too closely, if breaks is small enough that pretty(range(x), n = breaks, min.n = 1) returns each value only once, e.g.:
pretty(range(x), n = 35, min.n = 1)
#[1] 1392121179 1392121180 1392121181 1392121182 1392121183 1392121184
#[7] 1392121185 1392121186 1392121187 1392121188 1392121189 1392121190
#[13] 1392121191 1392121192 1392121193 1392121194 1392121195 1392121196
#[19] 1392121197 1392121198 1392121199 1392121200 1392121201 1392121202
#[25] 1392121203 1392121204
then the output is of integer type. If, however, the number of breaks is larger and some of the outputs are duplicated:
pretty(range(x), n = 36, min.n = 1)
# [1] 1392121179 1392121180 1392121180 1392121181 1392121181 1392121182
# [7] 1392121182 1392121183 1392121183 1392121184 1392121184 1392121185
#[13] 1392121185 1392121186 1392121186 1392121187 1392121187 1392121188
#[19] 1392121188 1392121189 1392121189 1392121190 1392121190 1392121191
#[25] 1392121191 1392121192 1392121192 1392121193 1392121193 1392121194
#[31] 1392121194 1392121195 1392121195 1392121196 1392121196 1392121197
#[37] 1392121197 1392121198 1392121198 1392121199 1392121199 1392121200
#[43] 1392121200 1392121201 1392121201 1392121202 1392121202 1392121203
#[49] 1392121203 1392121204 1392121204
then the output is numeric.
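R integers are 32-bit, so the largest one is .Machine$integer.max = 2147483647. These timestamps are around 1.39e9 seconds since 1970, so the sum of two of them (about 2.78e9) overflows: R cannot represent it as an integer and returns NA with a warning. When pretty() returns a double (numeric) vector, this is not a problem.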
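The overflow can be reproduced directly with two integer timestamps taken from the pretty() output above; the same midpoint computed in double precision is fine.

```r
# Two integer timestamps (seconds since 1970), like pretty()'s integer breaks
a <- 1392121179L
b <- 1392121180L

# The true sum, 2784242359, exceeds .Machine$integer.max (2147483647),
# so integer addition yields NA (the warning is suppressed here)
s <- suppressWarnings(a + b)

# The same midpoint computed with doubles has no such limit
mid <- 0.5 * (as.numeric(a) + as.numeric(b))   # exactly 1392121179.5
```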
See also: What is integer overflow in R and how can it happen?
In practice, all this means is that if you print the structure returned by hist, all of the mids values will be NA. I don't think it affects the plotting of the histogram itself, which is why it is only a warning.
EDIT:
pretty internally uses seq.int
In my environment, the following does not generate any errors:
dataset <- read.csv("Time.txt", header = F)
time <- strptime(dataset[,1], format = "%a %b %d %H:%M:%S %z %Y")
hist(as.numeric(time), breaks = 100)
If you just convert time to numeric as above, the error may disappear. It is then straightforward to relabel the x-axis of the histogram.
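A sketch of the numeric approach that also relabels the axis. It sidesteps pretty() entirely by passing explicit, double-valued breaks (every 5 seconds, an arbitrary choice), so mids cannot overflow. The three timestamps from the question are entered already parsed, in UTC, to avoid depending on the locale needed for %a and %b.

```r
# The three timestamps from the question, already parsed (UTC)
stamps <- as.POSIXct(c("2014-02-11 12:19:39", "2014-02-11 12:19:56",
                       "2014-02-11 12:20:04"), tz = "UTC")
secs <- as.numeric(stamps)   # seconds since 1970, stored as doubles

# Explicit double breaks every 5 s, so hist() never calls pretty()
brks <- seq(floor(min(secs)) - 5, ceiling(max(secs)) + 5, by = 5)

h <- hist(secs, breaks = brks, axes = FALSE, main = "", xlab = "Time (UTC)")
axis(2)
# Label the x-axis with clock times instead of raw seconds
axis(1, at = brks, las = 2,
     labels = format(as.POSIXct(brks, origin = "1970-01-01", tz = "UTC"),
                     "%H:%M:%S"))
```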
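EDIT: ggplot2 should not face this issue, and it is simpler and more modern. Plot the parsed times rather than the raw text column, so that bins applies to a continuous time axis:
library(ggplot2)
ggplot(data.frame(time = as.POSIXct(time))) + geom_histogram(aes(x = time), bins = 100)
Here time is the vector created by strptime() above. V1, the default name read.csv() gives to the single unnamed column of dataset, holds the unparsed strings, so geom_histogram(aes(x = V1), stat = "count") would count each distinct string rather than bin the times.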

Change scaling of data on the x-axis

I am plotting my data like this:
(dput(sale))
structure(c(-0.049668136, 0.023675638, -0.032249731, -0.071487224,
-0.034017265, -0.031278933, -0.052070721, -0.034305542, -0.019041209,
-0.050459175, -0.017315808, -0.012787003, -0.03341208, -0.045078144,
-0.036638132, -0.036533367, -0.012683656, -0.014388251, -0.006775188,
-0.037153807, -0.008941402, -0.011760677, -0.005077979, -0.041187417,
-0.001966554, -0.028822067, 0.021828558, 0.016208791, -0.026897492,
-0.032107207, -0.008496522, -0.028027096, -0.013746662, -0.004545603,
-0.005679941, -0.004614187, 0.004083014, -0.012624954, -0.016362079,
-0.006350167, -0.019551277), na.action = structure(42:45, class = "omit"))
[1] -0.049668136 0.023675638 -0.032249731 -0.071487224 -0.034017265
[6] -0.031278933 -0.052070721 -0.034305542 -0.019041209 -0.050459175
[11] -0.017315808 -0.012787003 -0.033412080 -0.045078144 -0.036638132
[16] -0.036533367 -0.012683656 -0.014388251 -0.006775188 -0.037153807
[21] -0.008941402 -0.011760677 -0.005077979 -0.041187417 -0.001966554
[26] -0.028822067 0.021828558 0.016208791 -0.026897492 -0.032107207
[31] -0.008496522 -0.028027096 -0.013746662 -0.004545603 -0.005679941
[36] -0.004614187 0.004083014 -0.012624954 -0.016362079 -0.006350167
[41] -0.019551277
attr(,"na.action")
[1] 42 43 44 45
attr(,"class")
[1] "omit"
(dput(purchase))
structure(c(0.042141187, 0.075875128, 0.090953485, 0.050951625,
0.082566915, 0.184396833, 0.136625887, 0.042725409, 0.135028692,
0.13201904, 0.093634104, 0.16776844, 0.13645719, 0.201365036,
0.227589832, 0.236473792, 0.269064385, 0.200981722, 0.144739536,
0.145256493, 0.040205545, 0.031577107, 0.014767345, 0.005843065,
0.034805051, 0.082493053, 0.010572227, 0.000645763, 0.033368236,
0.024326153, 0.038601182, 0.025446045, 0.000556418, 0.017201608,
0.008316872, 0.059722053, 0.059695415, 0.076940829, 0.067650014,
0.002029566, 0.008466334), na.action = structure(42:45, class = "omit"))
[1] 0.042141187 0.075875128 0.090953485 0.050951625 0.082566915 0.184396833
[7] 0.136625887 0.042725409 0.135028692 0.132019040 0.093634104 0.167768440
[13] 0.136457190 0.201365036 0.227589832 0.236473792 0.269064385 0.200981722
[19] 0.144739536 0.145256493 0.040205545 0.031577107 0.014767345 0.005843065
[25] 0.034805051 0.082493053 0.010572227 0.000645763 0.033368236 0.024326153
[31] 0.038601182 0.025446045 0.000556418 0.017201608 0.008316872 0.059722053
[37] 0.059695415 0.076940829 0.067650014 0.002029566 0.008466334
attr(,"na.action")
[1] 42 43 44 45
attr(,"class")
[1] "omit"
timeLine <- c(-20 , +20)
plot(sale,type="b", xlim=timeLine, ylim=c(-.1,.4) )
lines( purchase, type="b")
abline(v=0, col="black")
The plot I get looks like this:
What's wrong with the plot is the scaling. My graphs should run from -20 to +20, with each data point at an integer position: -20, -19, -18, ..., +19, +20. My exported CSV sheet has a row with these values. How can I make the x-axis start at -20 so that every data point falls on an integer, up to +20? Is it also possible to display every integer from -20 to +20 on the axis?
I really appreciate your answer!
UPDATE
The scaling of the axis:
By default, when x is not specified, plot() draws the values against their index (starting at 1). You have to create a vector for the x-axis.
timeLine <- c(-20 , 20)
# this command generates a sequence from -20 to 20
timeSeq <- Reduce(seq, timeLine)
# now, this sequence is passed to `x`
plot(sale, x = timeSeq, type = "b", xlim = timeLine, ylim = c(-.1, .4) )
lines(purchase, x = timeSeq, type = "b")
abline(v = 0, col = "black")
Update: how to show all x axis labels?
You can show all x axis labels if you decrease their size (cex.axis) and increase the width of the plot. Here's an example.
png("plot.png", width = 1000)
plot(sale,type="b", x = timeSeq, xlim=timeLine, ylim=c(-.1,.4),
xaxt = "n")
lines( purchase, type="b", x = timeSeq)
abline(v=0, col="black")
axis(side = 1, at = timeSeq, cex.axis = 0.75)
dev.off()
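A self-contained sketch of both fixes together, with randomly generated stand-ins for sale and purchase (only their length, 41, matches the question). It uses seq(-20, 20), which gives the same vector as Reduce(seq, timeLine), and rotates the labels with las = 2 as an alternative to widening the device.

```r
# Hypothetical stand-ins with the same length (41) as sale and purchase
set.seed(1)
sale     <- rnorm(41, mean = -0.02, sd = 0.02)
purchase <- runif(41, min = 0, max = 0.27)

timeSeq <- seq(-20, 20)   # -20, -19, ..., 20
stopifnot(length(timeSeq) == length(sale))

# Pass timeSeq as x so each value sits on its integer position, not its index
plot(timeSeq, sale, type = "b", ylim = c(-0.1, 0.4), xaxt = "n",
     xlab = "", ylab = "")
lines(timeSeq, purchase, type = "b")
abline(v = 0, col = "black")

# Rotated, slightly smaller labels make room for all 41 integers
axis(side = 1, at = timeSeq, las = 2, cex.axis = 0.75)
```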

Scatterplot with X and Y axis color scales

I am attempting a scatter plot with many points (more than 150). The goal is to distinguish points in certain areas of the graph. What I'm essentially looking for is a way to have two colour scales, one for the x-axis and one for the y-axis. Essentially, I'm looking for something like this:
Each point should get a unique mix of the colours of the two scales. What I have tried so far is a scatter plot using ggplot. I've tried setting the colour attribute, but ggplot then assigns its own colours. It also conflicts with a requirement I have: I must also create zoomed-in plots of the top-left, top-right, bottom-left and bottom-right regions of the scatter plot. If I set xlim and ylim for these additional plots, all I get is a crop, which cuts off some other points and their text labels at the edges of the plot. I can't simply create a separate plot either, as each point needs the same colour on the overall plot and on the zoomed-in plots.
png("image.png", width = 2000, height = 1500, res = 85);
ggplotXY <- ggplot(scatterPlotData, aes(x=x, y=y, colour=labels, label=labels)) +
geom_point() +
geom_text(hjust=0, vjust=0)
ggplotXY
dev.off()
Current overall plot:
Current plot of zoomed in bottom-left:
png("image.png", width = 2000, height = 1500, res = 85);
ggplotXY <- ggplot(scatterPlotData, aes(x=x, y=y, colour=labels, label=labels)) +
geom_point() +
geom_text(hjust=0, vjust=0) +
coord_cartesian(xlim=c(0,100), ylim=c(0, 2.5))
ggplotXY
dev.off()
As you can see, some of the points are clipped rather than omitted. To leave out the points outside the limits, I would have to create a new data frame containing only the points within the limits, but doing so would change the colours of the points when I create the new plot. I was thinking about including my own colour for each point as part of the data frame I'm reading in, but adding and subtracting hex colour codes is not very nice. I tried, and got something along these lines:
png("image.png", width = 2000, height = 1500, res = 85);
ggplotXYColor <- ggplot(scatterPlotData, aes(x=x, y=y, label=labels)) +
geom_point(colour=scatterPlotData$scatterPointColour)
ggplotXYColor
dev.off()
In case you are wondering, the scatterPlotData$scatterPointColour is as follows:
[1] "#2276c6" "#224dd0" "#201893" "#22459f" "#21580f" "#219998" "#201893"
[8] "#216871" "#22459f" "#201893" "#2276c6" "#22459f" "#22353d" "#201893"
[15] "#225602" "#21cabe" "#2178d3" "#21eb83" "#21eb83" "#201893" "#201893"
[22] "#22978b" "#2276c6" "#301054" "#201893" "#301054" "#225e33" "#228f59"
[29] "#226664" "#220c47" "#21eb83" "#228f59" "#227ef7" "#227ef7" "#226e95"
[36] "#21c28d" "#22459f" "#228f59" "#223d6e" "#221caa" "#22459f" "#226e95"
[43] "#225602" "#221caa" "#21d2f0" "#222d0c" "#22459f" "#201893" "#2020c4"
[50] "#210623" "#21a1c9" "#201893" "#228f59" "#201893" "#201893" "#221caa"
[57] "#220c47" "#201893" "#22a7ed" "#101893" "#22c080" "#201893" "#2276c6"
[64] "#201893" "#201893" "#21d2f0" "#222d0c" "#21c28d" "#225602" "#226664"
[71] "#226e95" "#201893" "#201893" "#21b22b" "#2020c4" "#21cabe" "#21f3b4"
[78] "#22d0e2" "#201893" "#21c28d" "#21fbe5" "#220c47" "#225602" "#230209"
[85] "#226664" "#210e55" "#211eb7" "#2170a2" "#201893" "#221caa" "#220c47"
[92] "#21f3b4" "#21fbe5" "#201893" "#201893" "#201893" "#224dd0" "#247add"
[99] "#201893" "#23fffc" "#25db1d" "#24188f" "#245a18" "#2449b6" "#24a3d3"
[106] "#201893" "#2451e7" "#24624a" "#24830e" "#2020c4" "#201893" "#201893"
[113] "#25b228" "#25eb80" "#23ced5" "#244185" "#24ed8d" "#243123" "#2449b6"
[120] "#201893" "#273b5e" "#201893" "#264dcd" "#2420c1" "#2578d0" "#264dcd"
[127] "#251eb3" "#22c8b1" "#22c080" "#22f1a7" "#249370" "#251eb3" "#2428f2"
[134] "#2428f2" "#249ba1" "#201893" "#2020c4" "#201893" "#244185" "#2472ac"
[141] "#2449b6" "#247add" "#201893" "#244185" "#243123" "#249370" "#24b435"
[148] "#2020c4" "#248b3f" "#2020c4"
I converted the hex colours to decimal, added specific decimal colours together, and then converted the result back to hex. Theoretically, it should give a nice white-to-yellow gradient along the x-axis and white-to-blue along the y-axis. As the points increase in x and y, the colours should become more green. As you can see, it's not as simple as that. I haven't come across any libraries that do two-axis colour scales.
To sum up, I need a way to give each point a unique colour from the two axis colour scales, and a way to create additional plots with exactly the same colours on a more zoomed-in canvas.
If anyone can help, it would be greatly appreciated.
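Here is a first approach using base graphics for your first problem (mixing two color gradients).
## use white->yellow for the x-axis and white->blue for the y-axis
chooseColors <- function(x, y) {
x <- 1-x/max(x)   # 1 at x = 0, 0 at max(x); assumes x >= 0
y <- 1-y/max(y)   # same rescaling for y
## blue fades as x grows (-> yellow); red and green fade as y grows (-> blue)
return(rgb(green=y, red=y, blue=x))
}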
## example values for the whole range
values <- expand.grid(1:100, 1:100)
## plot it
plot(values, col=chooseColors(values[,1], values[,2]), pch=16)
A more realistic toy example:
set.seed(1)
n <- 50
values <- cbind(sample(1:15, size=n, replace=TRUE), sample(1:15, size=n, replace=TRUE))
## plot it
plot(values, col=chooseColors(values[,1], values[,2]), pch=16)
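For the second problem (identical colours on the zoomed-in plots), one option is to compute each point's colour once on the full data, store it in a column, and only then subset. Here is a sketch in base graphics; scatterPlotData is a random stand-in, and the col column and mixColours() helper are hypothetical names (mixColours() is the same mixing as chooseColors() above). In ggplot2, a stored colour column can be used as-is via aes(colour = col) + scale_colour_identity().

```r
# Hypothetical stand-in for scatterPlotData (x, y > 0)
set.seed(1)
scatterPlotData <- data.frame(x = runif(150, 0, 200), y = runif(150, 0, 5))

# Same two-gradient mixing as chooseColors(), computed ONCE on the full data
mixColours <- function(x, y) {
  bx <- 1 - x / max(x)   # blue fades as x grows: white -> yellow along x
  ry <- 1 - y / max(y)   # red/green fade as y grows: white -> blue along y
  rgb(red = ry, green = ry, blue = bx)
}
scatterPlotData$col <- mixColours(scatterPlotData$x, scatterPlotData$y)

# Zoomed panel: drop the rows outside the window, but keep the stored colours
inZoom <- with(scatterPlotData, x <= 100 & y <= 2.5)
zoomed <- scatterPlotData[inZoom, ]

op <- par(mfrow = c(1, 2))
plot(scatterPlotData$x, scatterPlotData$y, col = scatterPlotData$col,
     pch = 16, main = "All points")
plot(zoomed$x, zoomed$y, col = zoomed$col, pch = 16,
     main = "Bottom-left", xlim = c(0, 100), ylim = c(0, 2.5))
par(op)
```

Because the colours are derived from the full data's maxima before subsetting, rescaling inside the zoomed window cannot shift them.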
