I am trying to create a line plot in R. For each 'RuleID' in my data frame I want to plot the 'ErrorCount' at each 'ProcessorTimeStamp':
DQ_Counts= data.frame(RuleID=c(1,2,1,2),
ProcessorTimeStamp=as.Date(c('2016-08-04','2016-08-04','2016-08-08','2016-08-08')),
ErrorCount=c(6,8,3,4))
# RuleID ProcessorTimeStamp ErrorCount
# 1 1 2016-08-04 6
# 2 2 2016-08-04 8
# 3 1 2016-08-08 3
# 4 2 2016-08-08 4
This is a plot I found online that I would like the end result to look like, although I am obviously not talking about trees. The code for that plot is linked as "Code for Tree Growth Plot", but I don't understand it well enough to make it work for me.
For my plot, 'ProcessorTimeStamp' would be my x and 'ErrorCount' would be my y. Each line would represent a different 'RuleID'.
One thing to note is that I have 'ErrorCount' values ranging from 0 to over 3 million (this is why I need to report on them to get them fixed!).
Thanks in advance.
This is probably the easiest way to get a basic plot like the one above with your data:
lattice::xyplot(ErrorCount~ProcessorTimeStamp, DQ_Counts,
groups=RuleID, auto.key=TRUE, type="l")
which returns a plot with one line per RuleID,
or you could use ggplot2:
library(ggplot2)
ggplot(DQ_Counts, aes(ProcessorTimeStamp, ErrorCount, color=factor(RuleID))) + geom_line()
to get the same kind of plot, with one coloured line per RuleID.
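Since the counts range from 0 to over 3 million, the smaller series can end up flattened against the axis. Here is a minimal base-graphics sketch, assuming a log1p transform (log(1 + x), which stays defined at 0) is acceptable for your report. The names rules and d are only illustrative:

```r
DQ_Counts <- data.frame(
  RuleID = c(1, 2, 1, 2),
  ProcessorTimeStamp = as.Date(c("2016-08-04", "2016-08-04", "2016-08-08", "2016-08-08")),
  ErrorCount = c(6, 8, 3, 4)
)

rules <- sort(unique(DQ_Counts$RuleID))

# Empty plot that fixes the axes; y is log(1 + ErrorCount) so zero counts stay finite
plot(DQ_Counts$ProcessorTimeStamp, log1p(DQ_Counts$ErrorCount), type = "n",
     xlab = "ProcessorTimeStamp", ylab = "log(1 + ErrorCount)")

# One line per RuleID, drawn in time order
for (k in seq_along(rules)) {
  d <- DQ_Counts[DQ_Counts$RuleID == rules[k], ]
  d <- d[order(d$ProcessorTimeStamp), ]
  lines(d$ProcessorTimeStamp, log1p(d$ErrorCount), col = k)
}
legend("topright", legend = paste("RuleID", rules), col = seq_along(rules), lty = 1)
```

If you would rather keep the raw counts on the axis, drop both log1p() calls and the rest stays the same.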
How can I plot a recurrence in R?
Any solution with base plot, ggplot2, lattice, or a dedicated package is welcome.
For example:
Imagine I have these data:
mydata <- data.frame(t=1:10, Y=runif(10))
t Y
1 0.3744869
2 0.6314202
3 0.3900789
4 0.6896278
5 0.6894134
6 0.5549006
7 0.4296244
8 0.4527201
9 0.3064433
10 0.5783539
I could transform it like this:
mydata2 <- data.frame(t=c(NA,mydata$t),Y=c(NA,mydata$Y),Y2=c(mydata$Y, NA))
t Y Y2
NA NA 0.9103703
1 0.9103703 0.1426041
2 0.1426041 0.4150476
3 0.4150476 0.2109258
4 0.2109258 0.4287504
5 0.4287504 0.1326900
6 0.1326900 0.4600964
7 0.4600964 0.9429571
8 0.9429571 0.7619739
9 0.7619739 0.9329098
10 0.9329098 NA
(or similar methods, but I can have problems with missing data)
And plot it
plot(Y2~Y, data=mydata2)
I guess I must use some grouping function such as ave or apply. But it's not an elegant solution, and if I have more columns it can become difficult to generalize the transformation.
For example
mydata3 <- data.frame(x=sample(10,100, replace=T),t=1:100, Y=2*runif(100)+1)
For every x (or combination of values on other columns) I want to plot Y_{i+1} ~ Y_i, on the same plot.
Other tools, such as Mathematica have functions to plot sequences directly.
I've found a solution, though not a very elegant one, for this sample data:
mydata <- data.frame(x=sample(4,25, replace=T),t=1:25, Y=2*runif(25)+1)
newdata <- mydata[order(mydata$x, mydata$t), ]
newdata$prev <- ave(newdata$Y, newdata$x, FUN=function(x) c(NA,head(x,-1)))
plot(Y~prev, data=newdata)
In this example there isn't a row for every t value within each x, so strictly you would first need to generate NAs for the missing values. But it's just a quick solution; in my real data I have many observations for each t.
lag.plot can plot recurrence plots but not within each subgroup.
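To make the ave() approach easier to reuse when there are more columns, the steps can be wrapped in a small helper. This is only a sketch: lag_by_group and its argument names are made up for illustration, and it pairs each value with the previous observed row in its group (it does not fill in missing t values).

```r
set.seed(1)
mydata <- data.frame(x = sample(4, 25, replace = TRUE), t = 1:25, Y = 2 * runif(25) + 1)

# Hypothetical helper: sort by group and time, then add the previous value within each group
lag_by_group <- function(data, value, group, time) {
  data <- data[order(data[[group]], data[[time]]), ]
  data$prev <- ave(data[[value]], data[[group]],
                   FUN = function(v) c(NA, head(v, -1)))
  data
}

newdata <- lag_by_group(mydata, "Y", "x", "t")

# Y_{i+1} against Y_i on one plot, coloured by group; rows with NA in prev are not drawn
plot(newdata$prev, newdata$Y, col = newdata$x, pch = 19,
     xlab = "Y[i]", ylab = "Y[i+1]")
legend("topright", legend = sort(unique(newdata$x)),
       col = sort(unique(newdata$x)), pch = 19, title = "x")
```

For a combination of several columns, interaction(col1, col2) can be stored as a new column and passed as the group.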
I am a novice R user, hence the question. I refer to the solution on creating stacked barplots from R programming: creating a stacked bar graph, with variable colors for each stacked bar.
My issue is slightly different. I have data with 4 columns. The last column is the summed total of the first 3 columns. I want to plot a bar chart showing 1) the summed total value (i.e. the 4th column) as the height of each bar, and 2) each bar split by the relative contributions of the three columns.
I was hoping someone could help.
Regards,
Bernard
If I understood it rightly, this may do the trick.
The following code works well for this example df data frame:
df <- data.frame(a   = c(1, 3, 1, 23, 5, 2, 1),
                 b   = c(9, 6, 5, 4, 12, 24, 2),
                 c   = c(8, 2, 4, 5, 3, 1, 4),
                 sum = c(18, 11, 10, 32, 20, 27, 7))
As you don't want to plot a count of observations but the actual values in your data frame, you need to use geom_bar(stat="identity") in ggplot2. Some data manipulation is necessary too. And you don't need a sum column: stacking the bars shows the sum for you.
library(reshape2) #for melt()
library(ggplot2)
df <- df[,-ncol(df)] #drop the last column (assumed to be the sum one)
df$event <- seq.int(nrow(df)) #create a column to indicate which values happened on the same row for each variable
df <- melt(df, id.vars='event') #reshape data frame to long format so ggplot can read it
px = ggplot(df, aes(x = event, y = value, fill = variable)) + geom_bar(stat = "identity")
print (px)
This code generates the plot below.
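For comparison, here is a base-graphics sketch of the same idea, in case you would rather not reshape the data. It assumes the three parts are in columns named a, b and c; the name parts is only illustrative. barplot() stacks the rows of a matrix, so each bar's total height is the row sum and the sum column never needs to be plotted.

```r
df <- data.frame(a   = c(1, 3, 1, 23, 5, 2, 1),
                 b   = c(9, 6, 5, 4, 12, 24, 2),
                 c   = c(8, 2, 4, 5, 3, 1, 4),
                 sum = c(18, 11, 10, 32, 20, 27, 7))

# barplot() wants one column per bar and one row per stacked segment
parts <- t(as.matrix(df[, c("a", "b", "c")]))
colnames(parts) <- seq_len(nrow(df))

# Stacked bars; the legend labels come from the row names a, b, c
barplot(parts, legend.text = TRUE, xlab = "event", ylab = "value")
```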
I have 2 data frames, mydf1 and mydf2
> mydf1
id a b c
1 1 2 10 2
2 2 3 11 4
3 3 5 12 6
4 4 7 13 8
5 5 8 14 10
> mydf2
id a b c
1 1 4 20 4
2 2 6 22 8
3 3 10 24 12
4 4 14 26 16
5 5 16 28 20
I would like to plot variables a, b and c against id (a sample graph is given below). I want similar graphs for variables b and c too, and I want to do it in a loop and then export each one to a local folder. So, I am using the following code:
for (i in 2:4) {
jpeg(paste("C:/Data/myplot",i,".jpg"))
ymin<-min(mydf1[,i],mydf2[,i])
ymax<-max(mydf1[,i],mydf2[,i])
plot(mydf1[,1],mydf1[,i],ylim=c(ymin,ymax),xlab="id",ylab=colnames(mydf)[i])
points(mydf2[,1],mydf2[,i],pch=2)
legend("topright",c("mydf1","mydf2"),pch=c(1,2))
dev.off()
}
My problem is that I would like to get all three graphs (id vs a, id vs b and id vs c, each showing both mydf1 and mydf2) in one figure, something like two along the first row of the figure and the third one in the second row, with a legend. I tried the following:
jpeg("C:/Data/myplot.jpg")
par(mfrow=c(2,2))
for (i in 2:4) {
ymin<-min(mydf1[,i],mydf2[,i])
ymax<-max(mydf1[,i],mydf2[,i])
plot(mydf1[,1],mydf1[,i],ylim=c(ymin,ymax),xlab="id",ylab=colnames(mydf)[i])
points(mydf2[,1],mydf2[,i],pch=2)
legend("topright",c("mydf1","mydf2"),pch=c(1,2))
dev.off()
}
But it didn't work. Any suggestion to do this?
p.s: This is the simplified version of my task. Actually I have hundreds of columns, that's why I am using a loop operation
Sample plot id vs a (mydf1 and mydf2) plotted on the same graph
It is unclear what you are trying to do. Do you want 2 plots, one for mydf1 and one for mydf2 or all on one figure? If two panels, you should change to mfrow=c(2,1) instead of c(2,2) which is currently making 4 panels?
If you want them all on a single plot, then remove the par(mfrow... line.
Then within the plots, you are plotting the first series from mydf1 and the other two series from mydf2. Is that actually what you want?
Using base graphics, you should move your plot line outside the loop so it is done once, change the loop to start at 3, and then keep the points statements inside the loop. Alternatively, you could put an if statement inside the loop to see if it is the first time.
You also have a typo in the plot statement with mydf (no number).
And move your dev.off() outside the loop so it only closes the figure once.
Here is some code that generates a single-panel plot, and you should be able to modify it to work for your desired output...
jpeg("myplot.jpg")
# compute the y range once over every plotted column, so no points fall outside the axes
ymin<-min(mydf1[,2:4],mydf2[,2:4])
ymax<-max(mydf1[,2:4],mydf2[,2:4])
for (i in 2:4) {
if (i==2){
plot(mydf1[,1],mydf1[,i],ylim=c(ymin,ymax),xlab="id",ylab=colnames(mydf1)[i])
legend("topright",c("mydf1","mydf2"),pch=c(1,2))
}
else{
points(mydf2[,1],mydf2[,i],pch=2)
}
}
dev.off()
EDIT: After your clarified question, I think the only problem is that dev.off() should be outside the loop. (I recommend PNG or PDF instead of JPEG for any plot worth presenting.)
png("myplot.png")
par(mfrow=c(2,2))
for (i in 2:4) {
ymin<-min(mydf1[,i],mydf2[,i])
ymax<-max(mydf1[,i],mydf2[,i])
plot(mydf1[,1],mydf1[,i],ylim=c(ymin,ymax),xlab="id",ylab=colnames(mydf1)[i])
legend("topright",c("mydf1","mydf2"),pch=c(1,2))
points(mydf2[,1],mydf2[,i],pch=2)
}
dev.off()
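If you want a single shared legend rather than one per panel, the unused fourth cell of the 2x2 grid can hold it. This is a sketch under a few assumptions: the data frames are re-created from the question, the file goes to a temporary folder (out_file is just a placeholder path), and pdf() is used so it runs without extra graphics libraries.

```r
mydf1 <- data.frame(id = 1:5, a = c(2, 3, 5, 7, 8), b = 10:14, c = c(2, 4, 6, 8, 10))
mydf2 <- data.frame(id = 1:5, a = c(4, 6, 10, 14, 16), b = seq(20, 28, 2),
                    c = c(4, 8, 12, 16, 20))

out_file <- file.path(tempdir(), "myplot.pdf")  # placeholder output location
pdf(out_file)
par(mfrow = c(2, 2))
for (i in 2:4) {
  # y limits cover both data frames for this column
  ylim <- range(mydf1[, i], mydf2[, i])
  plot(mydf1[, 1], mydf1[, i], ylim = ylim, xlab = "id", ylab = colnames(mydf1)[i])
  points(mydf2[, 1], mydf2[, i], pch = 2)
}
# Fourth cell: an empty plot that only holds the shared legend
plot.new()
legend("center", c("mydf1", "mydf2"), pch = c(1, 2))
dev.off()
```

With hundreds of columns you would open a new device (or a new page of the same PDF) every three columns instead of squeezing them all into one grid.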
I'd do something like
mydf1$g <- 1
mydf2$g <- 2
d3 <- rbind(mydf1, mydf2)
library(reshape2)
d3 <- melt(d3, id.vars = c('id', 'g'))
library(ggplot2)
ggplot(d3, aes(x=id, y=value)) +
geom_point(aes(colour = as.factor(g), shape = variable))
or using facets
ggplot(d3, aes(x=id, y=value)) +
geom_point(aes(colour = as.factor(g))) +
facet_wrap(~variable)
to finally export it
ggsave(filename = file.path(tempdir(), 'myplot.png'), # file.path() adds the separator that paste0() omits
plot = last_plot()
)
Consider the following frequency data:
> table(income)
income
3 5 6 7 8 5000
2 7 2 2 2 1
When I type hist(income) I get the following histogram:
As you can see, most income values are concentrated around 5 and one value is very far from the others, which makes the histogram hard to read. MS Excel can treat the 5000 value as a separate category, so the data would look like this instead:
> table(income)
income
3 5 6 7 8 more
2 7 2 2 2 1
Plotting this as a histogram would look much better, because you can see the frequencies within a shorter range:
Is there any way to do this, either with the hist() function or with other functions from lattice or ggplot2? I do not, however, want to overwrite the values that exceed a certain threshold, so that I don't lose any information.
Thanks a lot!
Data generation:
income <- c(rep(3,2), rep(5,7), rep(6,2), rep(7,2), rep(8,2), 5000)
Function for preparing data for plotting:
nice.data <- function(x, threshold=10){
x[x>threshold] <- "More"
x
}
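Plotting (the thresholded values are categories rather than numbers, so this uses geom_bar, which counts each distinct value; recent ggplot2 versions reject a discrete x in geom_histogram):
library(ggplot2)
ggplot() + geom_bar(aes(x=nice.data(income))) + xlab("Income")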
Result: one bar per income value up to the threshold, plus a final "More" bar for the outlier.
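One caveat with plain character labels is that they sort alphabetically, so a value such as "10" would be drawn before "3". A possible variation, sketched here with base graphics, returns a factor whose levels are in numeric order with "More" last. The name cap_income is made up for this example, and the original income vector is left untouched, so no information is lost.

```r
income <- c(rep(3, 2), rep(5, 7), rep(6, 2), rep(7, 2), rep(8, 2), 5000)

# Hypothetical helper: label values above the threshold as "More",
# keeping the remaining values as levels in numeric order
cap_income <- function(x, threshold = 10) {
  small <- sort(unique(x[x <= threshold]))
  labels <- ifelse(x > threshold, "More", as.character(x))
  factor(labels, levels = c(as.character(small), "More"))
}

capped <- cap_income(income)  # income itself is not modified
print(table(capped))

# One bar per category, in level order
barplot(table(capped), xlab = "Income", ylab = "Frequency")
```

The same factor can be passed to ggplot2 as aes(x = capped) with geom_bar() if you prefer that look.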