R line chart - removing vexing zero line not associated with data

I have a simple (yet very large) data set of counts made at different sites from Apr to Aug.
Between mid Apr and July there are no zero counts - yet a line at zero extends from the earliest to latest date.
Here is the part of the data used to make the above chart (columns are Site.ID, DATE, Visible.Number):
data=structure(list(Site.ID = c(302L, 302L, 302L, 302L, 302L, 302L,
302L, 302L, 302L, 302L, 302L, 302L, 304L, 304L, 304L, 304L, 304L,
304L, 304L, 304L, 304L, 304L, 304L, 304L), DATE = structure(c(1L,
2L, 5L, 3L, 4L, 6L, 8L, 7L, 9L, 10L, 11L, 12L, 1L, 2L, 5L, 3L,
4L, 6L, 8L, 7L, 9L, 10L, 11L, 12L), .Label = c("3/21/2014", "3/27/2014",
"4/17/2014", "4/28/2014", "4/8/2014", "5/13/2014", "6/17/2014",
"6/6/2014", "7/10/2014", "7/22/2014", "7/29/2014", "8/5/2014"
), class = "factor"), Visible.Number = c(0L, 0L, 5L, 14L, 20L,
21L, 6L, 8L, 0L, 0L, 0L, 0L, 0L, 0L, 2L, 7L, 7L, 7L, 7L, 5L,
0L, 0L, 0L, 0L)), .Names = c("Site.ID", "DATE", "Visible.Number"
), class = "data.frame", row.names = c(NA, -24L))
attach(data)
DATE<-as.Date(DATE,"%m/%d/%Y")
plot(data$Visible.Number~DATE, type="l", ylab="Visible Number")
I have two sites but there are three lines. How can I make R not plot a line along zero?
Thank you for your help!

Your problem is with the multiple site IDs. It plots the first one, then goes back (drawing a line) to draw the second one. Essentially, base plotting tries to draw all the lines without "lifting the pen". With base plotting, your option is to plot the sites separately with lines, perhaps in a for loop. I think stuff like this is easier with ggplot2:
library(ggplot2)
# Convert DATE inside the data frame first (the dput() above stores it as a factor).
data$DATE <- as.Date(data$DATE, "%m/%d/%Y")
ggplot(data, aes(x = DATE, y = Visible.Number, group = Site.ID)) +
  geom_line()
# if you prefer more base-like styling
ggplot(data, aes(x = DATE, y = Visible.Number, group = Site.ID)) +
  geom_line() +
  theme_bw()
In base:
# DATE must already be converted to Date class, as above.
plot(data$DATE, data$Visible.Number, type = "n",
     ylab = "Visible Number", xlab = "Date")
for (site in unique(data$Site.ID)) {
  with(subset(data, Site.ID == site),
       lines(Visible.Number ~ DATE))
}
N.B. I did not attach my data as you did, so I don't know whether the subsetting in the base solution will work properly if you do attach. In general, avoid attach(): with() is a nice way to save typing without attaching, and it is much less "risky" in that it doesn't copy your data columns into isolated vectors, which become difficult to keep track of as you subset or otherwise work with your data.
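For example, the question's three attach()-based lines can be written without attach() at all; a minimal sketch with the same data:
# Same plot as in the question, but the Date conversion happens inside the data
# frame and with() supplies the columns, so nothing drifts out of sync.
data$DATE <- as.Date(data$DATE, "%m/%d/%Y")
with(data, plot(Visible.Number ~ DATE, type = "l", ylab = "Visible Number"))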

Related

ggpairs formatting for points only

I'm looking to increase the size of the points AND outline them in black while keeping the line weight the same across the remaining plots.
library(ggplot2)
library(GGally)
pp <- ggpairs(pp.sed, columns = c(1, 2), aes(color = pond.id, alpha = 0.5)) +
  theme_bw()
print(pp)
Which gives me the following figure:
Data for reproducibility, and TIA!
> dput(pp.sed)
structure(list(Fe.259.941 = c(905.2628883, 825.7883359, 6846.128702,
1032.932924, 997.8037721, 588.9599882, 6107.641947, 798.4493611,
1046.38376, 685.2485692, 6452.273486, 730.8656684, 902.8585447,
1039.886406, 7408.801001, 2512.089991, 911.2101809, 941.3712067,
659.1069185, 1070.090445, 1017.666402, 925.3221586, 645.0500668,
954.0009756, 1022.594904, 803.5865352, 7653.184537, 1082.714082,
1048.51115, 773.9070604, 6889.060748, 973.0971769, 1002.091143,
798.9670583, 5089.035978, 2361.713222, 970.8258109, 748.3574529,
3942.04816, 889.1760124), Mn.257.611 = c(17.24667962, 14.90488024,
14.39265671, 20.51133433, 19.92596564, 11.76690074, 19.76386229,
14.29779164, 20.23646264, 13.55374658, 16.8847698, 13.11784439,
15.91777975, 20.64068844, 16.78681661, 28.61732162, 15.88328987,
19.59750367, 13.09735943, 21.59458118, 17.680152, 19.87127449,
12.8082581, 20.12050221, 17.57143193, 18.72196029, 16.21525793,
22.0518966, 18.39642397, 18.32238508, 16.17696923, 20.69668404,
17.96018218, 18.71945309, 16.50162126, 30.60719123, 17.69058768,
14.99048753, 16.28302375, 18.32277507), pond.id = structure(c(6L,
5L, 2L, 1L, 3L, 5L, 2L, 1L, 3L, 5L, 2L, 1L, 6L, 3L, 2L, 4L, 6L,
3L, 4L, 4L, 6L, 3L, 4L, 1L, 6L, 3L, 2L, 1L, 6L, 3L, 2L, 1L, 6L,
3L, 2L, 1L, 6L, 5L, 2L, 1L), .Label = c("LIL", "RHM", "SCS",
"STN", "STS", "TS"), class = "factor")), class = "data.frame", row.names = c(11L,
12L, 13L, 15L, 26L, 27L, 28L, 30L, 36L, 37L, 38L, 40L, 101L,
102L, 103L, 105L, 127L, 128L, 129L, 131L, 142L, 143L, 144L, 146L,
157L, 158L, 159L, 161L, 172L, 173L, 174L, 176L, 184L, 185L, 186L,
188L, 199L, 200L, 201L, 203L))
The GGally package already offers a family of wrap_xxx functions that can be used to set parameters and override default behaviour; e.g. with wrap you can override the default point size using wrap(ggally_points, size = 5).
To use the wrapped function instead of the default, you call
ggpairs(..., lower = list(continuous = wrap(ggally_points, size = 5))).
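As a concrete version of that call with your data (a sketch of the size-only override, before tackling the outline):
library(ggplot2)
library(GGally)
# Increase only the point size in the lower continuous panels; colors are unchanged.
ggpairs(pp.sed, columns = c(1, 2), aes(color = pond.id, alpha = 0.5),
        lower = list(continuous = wrap(ggally_points, size = 5))) +
  theme_bw()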
Switching the outline is a bit more tricky. Using wrap we could switch the shape of the points to 21 and set the outline color to "black". However, doing so means the points are no longer colored by pond.id, and unfortunately I have found no way to override the mapping through wrap alone. While it is possible to add a global fill aesthetic, a drawback of doing so is that we lose the black outline on the densities.
One option to fix that is to write a wrapper around ggally_points which adjusts the mapping so that the fill aesthetic is used instead of colour.
library(ggplot2)
library(GGally)

# Move any colour mapping over to fill, then draw the points with shape 21,
# which has a separate outline (colour) and interior (fill).
ggally_points_filled <- function(data, mapping, ...) {
  names(mapping)[grepl("^colour", names(mapping))] <- "fill"
  ggally_points(data, mapping, ..., shape = 21)
}

w_ggally_points_filled <- wrap(ggally_points_filled, size = 5, color = "black")

ggpairs(pp.sed, columns = c(1, 2), aes(color = pond.id, alpha = 0.5),
        lower = list(continuous = w_ggally_points_filled)) +
  theme_bw()

Order Bars in ggplot2 from high to low (when repeating words are used)

I am trying to reorder the bars in ggplot2's bar plot from the highest values to the lowest, so that the highest values are at the top of the bar chart and the lowest values are at the bottom.
I've used this Stack Overflow post in other plots and it works with no problem.
However, ggplot2 seems to have a problem when the same values appear in both facets: it does not produce the correct ordering in the plot.
Here is what it looks like now. As you can see, it is out of order. Ideally, I'd like the Unvax_to_Vax facet to read (from top to bottom): safe, sheep, good, dumb, stupid, scared, and I'd like the Vax_to_Unvax facet to read (from top to bottom): stupid, selfish, ignorant, dumb, unsafe, foolish.
Here is the data and code to reproduce the figure.
df <- structure(list(Var1 = structure(c(8L, 7L, 4L, 1L, 9L, 2L, 5L,
10L, 3L, 1L, 8L, 6L), .Label = c("dumb", "foolish", "good", "ignorant",
"safe", "scared", "selfish", "stupid", "unsafe", "sheep"), class = "factor"),
Freq = c(101L, 94L, 47L, 33L, 29L, 24L, 27L, 22L, 18L, 15L,
15L, 11L), Percent = c(8.82096069868996, 8.20960698689956,
4.10480349344978, 2.882096069869, 2.53275109170306, 2.09606986899563,
5.54414784394251, 4.51745379876797, 3.69609856262834, 3.08008213552361,
3.08008213552361, 2.25872689938398), Group = c("Vax_to_Unvax",
"Vax_to_Unvax", "Vax_to_Unvax", "Vax_to_Unvax", "Vax_to_Unvax",
"Vax_to_Unvax", "Unvax_to_Vax", "Unvax_to_Vax", "Unvax_to_Vax",
"Unvax_to_Vax", "Unvax_to_Vax", "Unvax_to_Vax")), row.names = c(319L,
292L, 147L, 82L, 375L, 98L, 173L, 182L, 76L, 54L, 190L, 176L), class = "data.frame")
ggplot(df, aes(x = reorder(Var1, Freq), y = Percent, fill = Group)) +
  geom_bar(stat = "identity") +
  facet_wrap(Group ~ ., scales = "free") +
  coord_flip()
Thank you for your help.
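One common way to get an independent ordering within each facet (sketched here as a suggestion, not taken from the original thread) is tidytext::reorder_within(), which makes the factor levels unique per facet before reordering:
library(ggplot2)
library(tidytext)  # provides reorder_within() and scale_x_reordered()
# reorder_within() pastes the Group onto each Var1 level so ties across facets
# can't interfere; scale_x_reordered() then strips that suffix from the axis labels.
ggplot(df, aes(x = reorder_within(Var1, Freq, Group), y = Percent, fill = Group)) +
  geom_bar(stat = "identity") +
  scale_x_reordered() +
  facet_wrap(Group ~ ., scales = "free") +
  coord_flip()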

factor order when subsetting within ggplot

I have factors on the x-axis and I order those factor levels in a way that makes the plot intuitive to read with ggplot. That works fine. However, when I use the subset command within ggplot, it re-orders my original sequence of factors.
Is it possible to subset within ggplot and preserve the order of the factor levels?
Here is the data and code:
library(ggplot2)
library(plyr)
dat <- structure(list(SubjectID = structure(c(12L, 4L, 6L, 7L, 12L,
7L, 5L, 8L, 14L, 1L, 15L, 1L, 7L, 1L, 7L, 5L, 4L, 2L, 9L, 6L,
7L, 13L, 12L, 2L, 15L, 3L, 5L, 13L, 13L, 10L, 7L, 8L, 10L, 10L,
1L, 10L, 12L, 7L, 6L, 10L), .Label = c("s001", "s002", "s003",
"s004", "s005", "s006", "s007", "s008", "s009", "s010", "s011",
"s012", "s013", "s014", "s015"), class = "factor"), Parameter = structure(c(7L,
3L, 5L, 3L, 6L, 4L, 6L, 7L, 7L, 4L, 7L, 12L, 8L, 11L, 1L, 4L,
3L, 4L, 6L, 4L, 6L, 6L, 12L, 5L, 12L, 1L, 7L, 13L, 11L, 1L, 4L,
1L, 6L, 13L, 10L, 10L, 10L, 13L, 5L, 8L), .Label = c("(Intercept)",
"c0.008", "c0.01", "c0.015", "c0.02", "c0.03", "PrevCorr1", "PrevFail1",
"c0.025", "c0.004", "c0.006", "c0.009", "c0.012", "c0.005"), class = "factor"),
Weight = c(0.0352725634087837, 1.45546697427904, 2.29457594510248,
0.479548914792514, 6.39680995359234, 1.48829600339586, 2.69253113220079,
-0.171219812386926, -0.453625394224277, 1.43732884325816,
0.742416863226952, 0.256935761466245, -0.29401087047524,
0.34653127811481, 0.33120592543102, 2.79213318878505, 2.47047299128637,
1.022450287681, 6.92891513416868, 0.648982326396105, 6.58336282626389,
6.40600461501379, 1.80062359655524, 3.86658202530889, 1.23833324887194,
-0.026560261876089, 0.121670468861011, 0.9290824087063, 0.349104382483186,
0.24722583823016, 1.82473621255801, -0.712668411699556, 6.51789901685784,
0.74682257127003, 0.0755807984938072, 0.131705709322157,
0.246465073382095, 0.876279316248929, 1.83442709571662, -0.579086982613267
)), .Names = c("SubjectID", "Parameter", "Weight"), row.names = c(2924L,
784L, 1537L, 1663L, 3138L, 1744L, 1266L, 1996L, 3548L, 86L, 3692L,
230L, 1613L, 213L, 1627L, 1024L, 832L, 384L, 2418L, 1568L, 1714L,
3362L, 3200L, 497L, 3632L, 683L, 1020L, 3281L, 3263L, 2779L,
1632L, 1995L, 2674L, 2753L, 312L, 2638L, 3198L, 1809L, 1569L,
2589L), class = "data.frame")
## Sort factors in the order that will make it intuitive to read the plot
## It goes, "(Intercept), "PrevCorr1", "PrevFail1", "c0.004", "c0.006", etc.
paramNames <- levels(dat$Parameter)
contrastNames <- sort(paramNames[grep("c0",paramNames)])
biasNames <- paramNames[!paramNames %in% contrastNames]
dat$Parameter <- factor(dat$Parameter, levels=c(biasNames, contrastNames))
## Add grouping parameter that will be used to plot different weights in different colors
dat$plotColor <-"Contrast"
dat$plotColor[dat$Parameter=="(Intercept)"] <- "Intercept"
dat$plotColor[grep("PrevCorr", dat$Parameter)] <- "PrevSuccess"
dat$plotColor[grep("PrevFail", dat$Parameter)] <- "PrevFail"
p <- ggplot(dat, aes(x = Parameter, y = Weight)) +
  # The following command, which adds geom_line to the data points of the graph,
  # changes the order of levels. If I uncomment the next line, the factor level
  # order goes wrong.
  # geom_line(subset = .(plotColor == "Contrast"), aes(group = 1),
  #           stat = "summary", fun.y = "mean", color = "grey50", size = 1) +
  geom_point(aes(group = Parameter, color = plotColor), size = 5,
             stat = "summary", fun.y = "mean") +
  geom_point(aes(group = Parameter), size = 2.5, color = "white",
             stat = "summary", fun.y = "mean") +
  theme(axis.text.x = element_text(angle = 45, vjust = 1, hjust = 1))
print(p)
Here is the plot when geom_line is commented out.
And here is what happens when geom_line is uncommented.
If you switch the order in which you plot the objects, the problem disappears:
p <- ggplot(dat, aes(x = Parameter, y = Weight)) +
  geom_point(aes(group = Parameter, color = plotColor), size = 5,
             stat = "summary", fun.y = "mean") +
  geom_line(subset = .(plotColor == "Contrast"), aes(group = 1),
            stat = "summary", fun.y = "mean", color = "grey50", size = 1) +
  geom_point(aes(group = Parameter), size = 2.5, color = "white",
             stat = "summary", fun.y = "mean") +
  theme(axis.text.x = element_text(angle = 45, vjust = 1, hjust = 1))
print(p)
I think the problem lies in plotting the subsetted data first: it ditches the levels of the original data, so when you add the points back in, ggplot doesn't know where to put them. When you plot with the original data first, the levels are maintained. I'm not sure though; you might have to take my word on it.
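Note that the subset = .(...) layer argument only exists in old versions of ggplot2. A sketch of the same fix in current ggplot2 (assumptions: passing the layer its own pre-subsetted data, and the newer fun/linewidth argument names):
p <- ggplot(dat, aes(x = Parameter, y = Weight)) +
  geom_point(aes(group = Parameter, color = plotColor), size = 5,
             stat = "summary", fun = "mean") +
  # Give the line layer only the Contrast rows; subsetting a factor this way
  # keeps the full level set, so the x-axis order is preserved.
  geom_line(data = subset(dat, plotColor == "Contrast"), aes(group = 1),
            stat = "summary", fun = "mean", color = "grey50", linewidth = 1) +
  geom_point(aes(group = Parameter), size = 2.5, color = "white",
             stat = "summary", fun = "mean") +
  theme(axis.text.x = element_text(angle = 45, vjust = 1, hjust = 1))
print(p)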

stock price prediction using nnet

stock<-structure(list(week = c(1L, 2L, 5L, 2L, 3L, 4L, 3L, 2L, 1L, 5L,
1L, 3L, 2L, 4L, 3L, 4L, 2L, 3L, 1L, 4L, 3L),
close_price = c(774000L,
852000L, 906000L, 870000L, 1049000L, 941000L, 876000L, 874000L,
909000L, 966000L, 977000L, 950000L, 990000L, 948000L, 1079000L,
NA, 913000L, 932000L, 1020000L, 872000L, 916000L),
vol = c(669L,
872L, 3115L, 2693L, 575L, 619L, 646L, 1760L, 419L, 587L, 8922L,
366L, 764L, 6628L, 1116L, NA, 572L, 592L, 971L, 1181L, 1148L),
obv = c(1344430L, 1304600L, 1325188L, 1322764L, 1365797L,
1355525L, 1308385L, 1308738L, 1353999L, 1364475L, 1326557L,
1357572L, 1362492L, 1322403L, 1364273L, NA, 1354571L, 1354804L,
1363256L, 1315441L, 1327927L)),
.Names = c("week", "close_price", "vol", "obv"),
row.names = c(16L, 337L, 245L, 277L, 193L, 109L, 323L, 342L, 106L,
170L, 226L, 133L, 72L, 234L, 208L, 329L, 107L, 103L, 71L, 284L, 253L),
class = "data.frame")
I have a data set in this form, called Nam, which has 349 observations, and I want to use nnet to predict close_price.
obs <- sample(1:21, 20 * 0.5, replace = FALSE)
tr.Nam <- stock[obs, ]; st.Nam <- stock[-obs, ]
# tr.Nam is a training data set while st.Nam is test data.
library(nnet)
Nam_nnet <- nnet(close_price ~ ., data = tr.Nam, size = 2, decay = 5e-4)
With this statement, I think I have fitted a model that predicts close_price.
summary(Nam_nnet)
y<-tr.Nam$close_price
p<-predict(Nam_nnet, tr.Nam, type="raw")
I expected p to be the predicted values of close_price, but it contains only values of 1. Why doesn't p contain continuous predictions of close_price?
tt<-table(y,p)
summary(tt)
tt
I could probably do a bit better with a reproducible example, but I think the problem may be one (or more) of several things. First, run str(data) to make sure each variable is of the correct type (factor, numeric, etc.). Also, neural nets usually respond better to standardized, scaled and centered data; otherwise the inputs with larger numeric ranges oversaturate the network, which might be the case here if the 'week' variable is treated as numeric.
In summary, definitely check the type of each variable to make sure you are inputting the correct forms, and consider scaling your data so the inputs are of comparable magnitudes.
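A minimal sketch of that advice with the data above (not from the original answer; assumptions: NA rows are dropped, every column is rescaled to [0, 1], and linout = TRUE is set so nnet fits a linear output unit rather than its default logistic output, which is bounded at 1 and matches the all-ones predictions seen here):
library(nnet)

stock2 <- na.omit(stock)
rng <- function(x) (x - min(x)) / (max(x) - min(x))  # rescale a column to [0, 1]
stock_sc <- as.data.frame(lapply(stock2, rng))

fit <- nnet(close_price ~ ., data = stock_sc, size = 2, decay = 5e-4,
            linout = TRUE, maxit = 500)

# Predictions are on the rescaled scale; map back to prices if needed.
p_sc <- predict(fit, stock_sc, type = "raw")
p <- p_sc * (max(stock2$close_price) - min(stock2$close_price)) + min(stock2$close_price)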

Scaling data in R data frame and fitting gaussian to geom_point

Two questions based on my data.frame:
structure(list(Collimator = structure(c(1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L), .Label = c("n", "y"), class = "factor"), angle = c(0L,
15L, 30L, 45L, 60L, 75L, 90L, 105L, 120L, 135L, 150L, 165L, 180L,
0L, 15L, 30L, 45L, 60L, 75L, 90L, 105L, 120L, 135L, 150L, 165L,
180L), X1 = c(2099L, 11070L, 17273L, 21374L, 23555L, 23952L,
23811L, 21908L, 19747L, 17561L, 12668L, 6008L, 362L, 53L, 21L,
36L, 1418L, 6506L, 10922L, 12239L, 8727L, 4424L, 314L, 38L, 21L,
50L), X2 = c(2126L, 10934L, 17361L, 21301L, 23101L, 23968L, 23923L,
21940L, 19777L, 17458L, 12881L, 6051L, 323L, 40L, 34L, 46L, 1352L,
6569L, 10880L, 12534L, 8956L, 4418L, 344L, 58L, 24L, 68L), X3 = c(2074L,
11109L, 17377L, 21399L, 23159L, 23861L, 23739L, 21910L, 20088L,
17445L, 12733L, 6046L, 317L, 45L, 26L, 46L, 1432L, 6495L, 10862L,
12300L, 8720L, 4343L, 343L, 38L, 34L, 60L), average = c(2099.6666666667,
11037.6666666667, 17337, 21358, 23271.6666666667, 23927, 23824.3333333333,
21919.3333333333, 19870.6666666667, 17488, 12760.6666666667,
6035, 334, 46, 27, 42.6666666667, 1400.6666666667, 6523.3333333333,
10888, 12357.6666666667, 8801, 4395, 333.6666666667, 44.6666666667,
26.3333333333, 59.3333333333)), .Names = c("Collimator", "angle",
"X1", "X2", "X3", "average"), row.names = c(NA, -26L), class = "data.frame")
I wish to plot detector counts versus angle, with and without a collimator attached to the device. I guess geom_point is probably the best way to summarise the data:
p <- ggplot(df, aes(x = angle, y = average, col = Collimator)) + geom_point() + geom_line()
Instead of plotting the average count on the y-axis, I would prefer to rescale the data so that the angle with the maximum count has a value of 1 for both Collimator 'y' and 'n'. The way I have done this seems quite cumbersome:
range01 <- function(x) { (x - min(x)) / (max(x) - min(x)) }
coly <- subset(df, Collimator == 'y')
coly$norm_count <- range01(coly$average)
coln <- subset(df, Collimator == 'n')
coln$norm_count <- range01(coln$average)
df <- rbind(coln, coly)
p <- ggplot(df, aes(x = angle, y = norm_count, col = Collimator)) + geom_point() + geom_line()
I'm sure this can be done in a more efficient manner, applying the function to the data.frame based on the variable 'Collimator'. How can I do this?
Also, I want to fit a function to the data rather than using geom_line. I think a Gaussian function may work in this case, but I have no idea how (or if) I can implement this in stat_smooth. Also, can I pull out the mean/standard deviation from such a fit?
ggplot2 goes hand in hand with the package plyr:
df <- ddply(df, .(Collimator), transform,
            norm_count1 = (average - min(average)) / (max(average) - min(average)))
joran's answer scales the highest value to 1 and the lowest to 0; if you just want to scale so that the highest value is 1 (leaving 0 as 0), it is even simpler.
library("plyr")
df <- ddply(df, .(Collimator), transform,
norm.average = average / max(average))
Then the plot is:
ggplot(df, aes(x = angle, y = norm.average, col = Collimator)) +
  geom_point() + geom_line()
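The Gaussian-fit part of the question is not covered above; a sketch of one way to do it (assumptions: the normalised counts follow y = exp(-(angle - mu)^2 / (2 * sigma^2)), and rough starting values mu = 90, sigma = 30 are good enough for nls to converge):
library(plyr)
# One nls fit per Collimator level; coef() gives the mean (mu) and standard
# deviation (sigma) of each fitted Gaussian.
fits <- dlply(df, .(Collimator), function(d)
  nls(norm.average ~ exp(-(angle - mu)^2 / (2 * sigma^2)),
      data = d, start = list(mu = 90, sigma = 30)))
lapply(fits, coef)

# The same model can be drawn per group with geom_smooth; se = FALSE is required
# because predict.nls does not return standard errors.
ggplot(df, aes(x = angle, y = norm.average, col = Collimator)) +
  geom_point() +
  geom_smooth(method = "nls",
              formula = y ~ exp(-(x - mu)^2 / (2 * sigma^2)),
              method.args = list(start = list(mu = 90, sigma = 30)),
              se = FALSE)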
