I'm working with TraMineR to do a sequence analysis of educational data. I can get R to produce a plot of the 10 most frequent sequences in the data using code similar to the following:
library(TraMineR)
##Loading the data
data(actcal)
##Creating the labels and defining the sequence object
actcal.lab <- c("> 37 hours", "19-36 hours", "1-18 hours", "no work")
actcal.seq <- seqdef(actcal, 13:24, labels=actcal.lab)
## 10 most frequent sequences in the data
actcal.freq <- seqtab(actcal.seq)
actcal.freq
## Plotting the object
seqfplot(actcal.seq, pbarw=FALSE, yaxis="pct", tlim=10:1, cex.legend=.75, withlegend="right")
However, I'd also like to have the frequencies of each sequence (which are in the object actcal.freq) along the right side of the plot. For example, the first sequence in the plot created by the code above represents 37.9% of the data (as the plot currently shows). Per the seqtab, this is 757 subjects. I'd like the number 757 to appear on the right y-axis (and so on for the other sequences).
Is this possible? I've played around with axis(side=4, ...) but never been able to get it to reproduce the spacing of the left y-axis.
OK. This is a bit of a mess, but the function resets the par setting if you include a legend by default, so you need to turn that off. Then you can set the axis a bit more easily, and then we can go back for the legend. This should work with your test data above.
#add padding to the right for axis and legend
par("mar"=c(5,4,4,8)+.1)
#plot w/o axis
seqfplot(actcal.seq, pbarw=FALSE, yaxis="pct", tlim=10:1, withlegend=F)
#plot right axis with freqs
axis(4, at = seq(.7, by=1.2, length.out=length(attr(actcal.freq,"freq")$Freq)),
labels = rev(attr(actcal.freq,"freq")$Freq),
mgp = c(1.5, 0.5, 0), las = 1, tick = FALSE)
#now put the legend on
legend("right", legend=attr(actcal.seq, "labels"),
fill=attr(actcal.seq, "cpal"),
inset=-.3, bty="o", xpd=NA, cex=.75)
You may need to play a bit with the margins, and especially the inset= parameter of the legend, to get it placed correctly. I hope your real data isn't too different from this, because you really have to dig through the function to see how it does the formatting to get things to match up.
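The at= positions used above (0.7, 1.9, 3.1, ...) coincide with the bar midpoints that base R's barplot() returns for bars of width 1 and spacing 0.2. The following is a minimal base-R sketch of that idea, assuming (not confirmed here) that seqfplot lays out its bars the same way. The counts are hypothetical apart from 757, which comes from the question.

```r
# Hypothetical frequencies standing in for attr(actcal.freq, "freq")$Freq
counts <- c(757, 255, 210, 180, 120, 90, 60, 40, 30, 20)

op <- par(mar = c(5, 4, 4, 8) + 0.1)      # extra room on the right side

# barplot() returns the midpoint of every bar; horizontal bars are drawn
# bottom-up, so the vector is reversed to put the largest count on top
mids <- barplot(rev(counts), horiz = TRUE, axes = FALSE)

# Put the counts on the right-hand axis, exactly at the bar midpoints
axis(4, at = mids, labels = rev(counts), las = 1, tick = FALSE)

par(op)                                   # restore the previous margins
```

Taking the positions from the returned midpoints instead of hard-coding them avoids trial and error when the number of bars changes.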
I made a time series plot of several random walks and by now I understand how to extract a certain part of it and how to change the ticks from years to months. But even after long testing I don't get how to manipulate the x-axis in my graph properly.
Right now, it displays 50 year-steps and only every second white vertical grid line is labelled (why? In every tutorial I watch all lines are labelled instead). What I want to achieve is to change the scaling, so less space is used horizontally (i.e. reduce the space between all the ticks on the x-axis), so the first tick would be at 2000, the second (not the third as is currently the case) at 2050, and so on. I think this should be somehow achievable with breaks, but I can't figure it out. Finally the plot starts and ends too early on the left and on the right, but I believe I can handle that.
Here is the plot:
set.seed(21)
n <- 2500
x <- matrix(replicate(20,cumsum(sample(c(-1, 1), n, TRUE))),nrow = 2500,ncol=20)
aa <- x
rnames <- seq(as.Date("2010-01-01"), length=dim(aa)[1], by="1 month") - 1
rownames(aa) <- format(as.POSIXlt(rnames, format = "%Y-%m-%d"), format = "%d.%m.%Y")
colnames(aa) <- paste0("aa", 1:ncol(aa))
library("ggplot2")
library("reshape2")
library("scales")
aa <- melt(aa, id.vars = rownames(aa))
names(aa) <- c("time","id","value")
aa$time <- as.Date(aa$time, "%d.%m.%Y")
ggplot(aa, aes(x=time,y=value,colour=id,group=id)) +
geom_line()
By default, ggplot adds a minor grid line (that is, a grid line without a tick mark or tick label) between each major grid line. To include only major grid lines add scale_x_date(minor_breaks=NULL). (If you're not seeing minor grid lines in the tutorial videos you've watched, my guess is that they are there, but difficult or impossible to see due to insufficient resolution and/or small size of the video image.)
To reduce the physical distance between tick marks, you would need to change the aspect ratio of the plot. For example, if you want the vertical extent of the plot to be, say 3", then you would need to shrink the horizontal extent until you get a small enough distance between tick marks. First, let's create a plot:
p1 <- ggplot(aa, aes(x=time,y=value,colour=id,group=id)) +
geom_line(show.legend=FALSE) +
scale_x_date(minor_breaks=NULL)
Here are two examples of rendering the plot:
UPDATE: To answer the comment: For the plots above, I used grid.arrange to create the plot layout and then saved it as a png from the RStudio plot window. I used the widths argument to make one plot thinner than the other.
library(gridExtra)
grid.arrange(p1, p1, widths=c(0.6,0.4), ncol=2)
However, you can adjust the size precisely in many different ways, depending on what format you desire. For example:
# PNG format
png("wide.png", 500,500)
p1
dev.off()
png("narrow.png", 300,500)
p1
dev.off()
# PDF format
pdf("wide.pdf", 5, 5)
p1
dev.off()
pdf("narrow.pdf", 3, 5)
p1
dev.off()
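The question also asked about controlling where the ticks fall. One way to do that is to pass explicit breaks to scale_x_date. The sketch below uses a hypothetical single random walk (the data frame d and the vector brks are illustrative names, not part of the original code) and places a labelled tick every 50 years.

```r
library(ggplot2)

# Hypothetical stand-in for the melted data: one random walk on monthly dates
set.seed(21)
d <- data.frame(
  time  = seq(as.Date("2010-01-01"), by = "1 month", length.out = 2500),
  value = cumsum(sample(c(-1, 1), 2500, TRUE))
)

# Major breaks every 50 years; minor grid lines switched off
brks <- seq(as.Date("2050-01-01"), as.Date("2200-01-01"), by = "50 years")

p <- ggplot(d, aes(x = time, y = value)) +
  geom_line() +
  scale_x_date(breaks = brks, date_labels = "%Y", minor_breaks = NULL)
p
```

The physical distance between ticks still depends on the width of the output device, so this complements rather than replaces the sizing options above.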
I want to change the x-axis in my plot, but it doesn't work properly with axis(). The data in the plot are daily and I want to show only years. I hope someone understands the problem and can find a solution. The current plot, and the plot produced with the code axis(1, at = seq(1800, 1975, by = 25), las=2), were attached as images.
Without reproducible code it is not easy to tell what the problem is, so I'll try a "quick and dirty" approach.
High-level plots are composed of elements that are themselves made of smaller parts. Drawing those parts with separate commands therefore gives finer control over the plotting procedure.
In practice, the first thing to do is plot "nothing".
> plot(x, y, type = "n", xlab = "", ylab = "", axes = F)
type = "n" causes the data to not be drawn. axes = F suppresses the axis and the box around the plot. In spite of that, the plotting region is ready to show the data.
The main benefit is that now the plotting area is correctly dimensioned. Try now to add the desired x axis as you tried before.
> points(x, y) # Plots the data in the area
> axis() # Plots the desired axis with your scale
> title() # Plots the desired titles
> box() # Prints the box surrounding the plot
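Put together, the sequence might look like the sketch below. The data are hypothetical (a random walk indexed by day number), and the tick positions are chosen only for illustration.

```r
# Hypothetical data: one value per day, x counted in days from the start
set.seed(1)
x <- 0:999
y <- cumsum(rnorm(length(x)))

# 1. Empty canvas: the plotting region is sized, but nothing is drawn
plot(x, y, type = "n", xlab = "", ylab = "", axes = FALSE)

# 2. Add each element separately
lines(x, y)                                     # the data
axis(1, at = seq(0, 1000, by = 250), las = 2)   # custom x-axis
axis(2)                                         # default y-axis
title(main = "Random walk", xlab = "day", ylab = "value")
box()                                           # frame around the plot
```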
EDITED based on comment by @scoa
As a quick and dirty solution, you can simply enter the following line after your plot() line:
# Draw the x-axis (side 1) with ticks AT day 0 through day 63917
# in steps of 9131 days (about 25 years), with the tick labels
# rotated perpendicular to the axis (las = 2) for readability.
# EDITED: the labels placed at those positions are the years
# 1800 to 1975 in 25-year increments.
axis(1, at = seq(0, 63917, by = 9131), las = 2, labels = seq(1800, 1975, by = 25))
For other parameters, check out ?axis. As @scoa mentioned, this is approximate: I used 365.25 days per year for the conversion, which is not exact but should suffice for visual accuracy at the scale you have provided. If you need a precise conversion from days to years, convert your original data set before plotting.
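If the day counter can be tied to a calendar date, the tick positions can be computed with Date arithmetic instead of a fixed days-per-year factor. The sketch below assumes that day 0 corresponds to 1 January 1800. That origin is an assumption, as are the simulated values in y.

```r
# Assumption: day 0 of the series is 1800-01-01
origin <- as.Date("1800-01-01")

# One tick every 25 calendar years, then converted to day offsets
ticks <- seq(origin, as.Date("1975-01-01"), by = "25 years")
at    <- as.numeric(ticks - origin)

# Hypothetical daily values for days 0..63917
set.seed(1)
y <- cumsum(rnorm(63918))

plot(0:63917, y, type = "l", xaxt = "n", xlab = "", ylab = "")
axis(1, at = at, labels = format(ticks, "%Y"), las = 2)
```

Because the offsets come from real calendar dates, leap years are handled without any manual correction.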
How to change the axis length? for ex:
s <- data.table(school=rep(1:3,5), wave=c(rep(1,7), rep(2,8)), v2=rpois(15,10))
plot(s$wave,s$v2)
I get a scatter plot where the data sit at the edges of the plot, leaving a lot of white space in the graph. Changing the xaxp values doesn't help (I tried xaxp=c(-1, +2, 4), but nothing happened), and when I define wave as a factor I get a box plot. I know I can "squeeze" the plot when I save it to .png, but is there any other way?
I tried to upload pictures to convey the problem but I don't have enough reputation.
Edit: thanks to whoever uploaded it (although the axes are reversed: wave is the x and v2 is the y). The thing is that there is a lot of "free space" between the 1st and the 2nd wave. The position is perfect when I define wave as a factor (it's centered and each factor level takes half the axis length), but it keeps giving me a box plot!
You can pass many more arguments to the plot function, such as the colour, the title, and the limits of the axes.
Your code:
s <- data.frame(school=rep(1:3,5), wave=c(rep(1,7), rep(2,8)), v2=rpois(15,10))
plot(s$wave,s$v2)
And now just add some more:
plot(
x = s$wave,
y = s$v2,
col = "red",
main = "This is my title",
xlab = "the label of the x-axis",
ylab = "the label of the y-axis",
xlim = c(-5, 5), # the limits of the x-axis,
ylim = c(-4, 10) # the limits of the y-axis
)
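For the spacing described in the question (two waves pushed to the edges), limits just outside the observed wave values may work better than wide ones. This is a minimal sketch, assuming the plotted column is named v2 and that there are only the two waves shown.

```r
set.seed(42)
s <- data.frame(school = rep(1:3, 5),
                wave   = c(rep(1, 7), rep(2, 8)),
                v2     = rpois(15, 10))

# Half a unit of padding on each side centres each wave in its half of the
# axis; xaxt = "n" suppresses the default ticks so that only 1 and 2 are drawn
plot(s$wave, s$v2, xlim = c(0.5, 2.5), xaxt = "n",
     xlab = "wave", ylab = "v2")
axis(1, at = unique(s$wave))
```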
You can set much more, such as the size and type of the points, just as jlhoward mentioned.
I found a function in the "lattice" package that does exactly what I want: a boxplot without the box.
The function is called stripplot.
http://www.math.ucla.edu/~anderson/rw1001/library/base/html/stripplot.html
Thank you all for the help.
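Base R's graphics package offers a similar function, stripchart(), so the same kind of plot is possible without loading lattice. This is a sketch on hypothetical data shaped like the question's, again assuming the column is called v2.

```r
set.seed(42)
s <- data.frame(school = rep(1:3, 5),
                wave   = c(rep(1, 7), rep(2, 8)),
                v2     = rpois(15, 10))

# One vertical strip of points per wave; jitter spreads out tied values
stripchart(v2 ~ wave, data = s, vertical = TRUE,
           method = "jitter", jitter = 0.1, pch = 1,
           xlab = "wave", ylab = "v2")
```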
I'm looking to plot a set of sparklines in R with just a 0 and 1 state that looks like this:
Does anyone know how I might create something like that ideally with no extra libraries?
I don't know of any simple way to do this, so I'm going to build up this plot from scratch. This would probably be a lot easier to design in illustrator or something like that, but here's one way to do it in R (if you don't want to read the whole step-by-step, I provide my solution wrapped in a reusable function at the bottom of the post).
Step 1: Sparklines
You can use the pch argument of the points function to define the plotting symbol. ASCII symbols are supported, which means you can use the "pipe" symbol for vertical lines. The ASCII code for this symbol is 124, so to use it for our plotting symbol we could do something like:
plot(df, pch=124)
Step 2: labels and numbers
We can put text on the plot by using the text command:
text(x,y,char_vect)
Step 3: Alignment
This is basically just going to take a lot of trial and error to get right, but it'll help if we use values relative to our data.
Here's the sample data I'm working with:
df = data.frame(replicate(4, rbinom(50, 1, .7)))
colnames(df) = c('steps','atewell','code','listenedtoshell')
I'm going to start out by plotting an empty box to use as our canvas. To make my life a little easier, I'm going to set the coordinates of the box relative to values meaningful to my data. The Y positions of the 4 data series will be the same across all plotting elements, so I'm going to store that for convenience.
n=ncol(df)
m=nrow(df)
plot(1:m,
seq(1,n, length.out=m),
# The following arguments suppress plotting values and axis elements
type='n',
xaxt='n',
yaxt='n',
ann=F)
With this box in place, I can start adding elements. For each element, the X values will all be the same, so we can use rep to set that vector, and seq to set the Y vector relative to Y range of our plot (1:n). I'm going to shift the positions by percentages of the X and Y ranges to align my values, and modified the size of the text using the cex parameter. Ultimately, I found that this works out:
ypos = rev(seq(1+.1*n,n*.9, length.out=n))
text(rep(1,n),
ypos,
colnames(df), # These are our labels
pos=4, # This positions the text to the right of the coordinate
cex=2) # Increase the size of the text
I reversed the sequence of Y values because I built my sequence in ascending order, and the values on the Y axis in my plot increase from bottom to top. Reversing the Y values then makes it so the series in my dataframe will print from top to bottom.
I then repeated this process for the second label, shifting the X values over but keeping the Y values the same.
text(rep(.37*m,n), # Shifted towards the middle of the plot
ypos,
colSums(df), # new label
pos=4,
cex=2)
Finally, we shift X over one last time and use points to build the sparklines with the pipe symbol as described earlier. I'm going to do something sort of weird here: I'm actually going to tell points to plot at as many positions as I have data points, but I'm going to use ifelse to determine whether or not to actually plot a pipe symbol or not. This way everything will be properly spaced. When I don't want to plot a line, I'll use a 'space' as my plotting symbol (ascii code 32). I will repeat this procedure looping through all columns in my dataframe
for(i in 1:n){
points(seq(.5*m,m, length.out=m),
rep(ypos[i],m),
pch=ifelse(df[,i], 124, 32), # This determines whether to plot or not
cex=2,
col='gray')
}
So, piecing it all together and wrapping it in a function, we have:
df = data.frame(replicate(4, rbinom(50, 1, .7)))
colnames(df) = c('steps','atewell','code','listenedtoshell')
BinarySparklines = function(df,
L_adj=1,
mid_L_adj=0.37,
mid_R_adj=0.5,
R_adj=1,
bottom_adj=0.1,
top_adj=0.9,
spark_col='gray',
cex1=2,
cex2=2,
cex3=2
){
# 'adj' parameters are scalar multipliers in [-1,1]. For most purposes, use [0,1].
# The exception is L_adj which is any value in the domain of the plot.
# L_adj < mid_L_adj < mid_R_adj < R_adj
# and
# bottom_adj < top_adj
n=ncol(df)
m=nrow(df)
plot(1:m,
seq(1,n, length.out=m),
# The following arguments suppress plotting values and axis elements
type='n',
xaxt='n',
yaxt='n',
ann=F)
ypos = rev(seq(1+bottom_adj*n,n*top_adj, length.out=n))
text(rep(L_adj,n),
ypos,
colnames(df), # These are our labels
pos=4, # This positions the text to the right of the coordinate
cex=cex1) # Increase the size of the text
text(rep(mid_L_adj*m,n), # Shifted towards the middle of the plot
ypos,
colSums(df), # new label
pos=4,
cex=cex2)
for(i in 1:n){
points(seq(mid_R_adj*m, R_adj*m, length.out=m),
rep(ypos[i],m),
pch=ifelse(df[,i], 124, 32), # This determines whether to plot or not
cex=cex3,
col=spark_col)
}
}
BinarySparklines(df)
Which gives us the following result:
Try playing with the alignment parameters and see what happens. For instance, to shrink the side margins, you could try decreasing the L_adj parameter and increasing the R_adj parameter like so:
BinarySparklines(df, L_adj=-1, R_adj=1.02)
It took a bit of trial and error to get the alignment right for the result I provided (which is what I used to inform the default values for BinarySparklines), but I hope I've given you some intuition about how I achieved it and how moving things using percentages of the plotting range made my life easier. In any event, I hope this serves as both a proof of concept and a template for your code. I'm sorry I don't have an easier solution for you, but I think this basically gets the job done.
I did my prototyping in RStudio, so I didn't have to specify the dimensions of my plot, but for posterity: the plot was 832 x 456 with the aspect ratio maintained.
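A different technique, not the one used above, is to draw each tick with segments() rather than a text symbol. The tick height is then set in plot coordinates and does not depend on the font. This sketch uses the same simulated data and puts the labels and counts in the left margin. The margin size and the 0.3 half-height are arbitrary choices.

```r
set.seed(1)
df <- data.frame(replicate(4, rbinom(50, 1, .7)))
colnames(df) <- c("steps", "atewell", "code", "listenedtoshell")
n <- ncol(df)
m <- nrow(df)

op <- par(mar = c(1, 9, 1, 1))          # wide left margin for the labels
plot.new()
plot.window(xlim = c(1, m), ylim = c(0.5, n + 0.5))

for (i in seq_len(n)) {
  y  <- n - i + 1                       # first column goes on the top row
  on <- which(df[[i]] == 1)             # positions where the state is 1
  segments(x0 = on, y0 = y - 0.3, x1 = on, y1 = y + 0.3, col = "gray")
}

# Series name plus its total, one label per row, without tick marks
axis(2, at = n:1, labels = paste(colnames(df), colSums(df)),
     las = 1, tick = FALSE)
par(op)
```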
I'm creating a histogram in R which displays the frequency of several events in a vector. Each event is represented by an integer in the range [1, 9]. I'm displaying the label for each count vertically below the chart. Here's the code:
hist(vector, axes = FALSE, breaks = chartBreaks)
axis(1, at = tickMarks, labels = eventTypes, las = 2, tick = FALSE)
Unfortunately, the labels are too long, so they are cut off by the bottom of the window. How can I make them visible? Am I even using the right chart?
Look at help(par), in particular fields mar (for the margin) and oma (for outer margin).
It may be as simple as
par(mar=c(5,3,1,1)) # bottom, left, top, right margins in lines; increase the first value if labels are still cut off
hist(vector, axes = FALSE, breaks = chartBreaks)
axis(1, at = tickMarks, labels = eventTypes, las = 2, tick = FALSE)
This doesn't sound like a job for a histogram - the event is not a continuous variable. A barplot or dotplot may be more suitable.
Some dummy data
set.seed(123)
vec <- sample(1:9, 100, replace = TRUE)
vec <- factor(vec, labels = paste("My long event name", 1:9))
A barplot is produced via the barplot() function; we provide it with the counts of each event, using the table() function for convenience. Here we need to rotate the labels using las = 2 and create some extra space for the labels in the margin
## lots of extra space in the margin for side 1
op <- par(mar = c(10,4,4,2) + 0.1)
barplot(table(vec), las = 2)
par(op) ## reset
A dotplot is produced via function dotchart() and has the added convenience of sorting out the plot margins for us
dotchart(table(vec))
The dotplot has the advantage over the barplot of using much less ink to display the same information and focuses on the differences in counts across groups rather than the magnitudes of the counts.
Note how I've set the data up as a factor. This allows us to store the event labels as the labels for the factor - thus automating the labelling of the axes in the plots. It also is a natural way of storing data like I understand you to have.
Perhaps adding \n into your labels so they will wrap onto 2 lines? It's not optimal, but it may work.
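One way to insert the line breaks automatically is strwrap(), which splits a string at whitespace to fit a target width. This sketch wraps hypothetical long labels and uses them in a barplot. The width of 12 characters and the bottom margin are arbitrary choices.

```r
# Hypothetical long event names
labels <- paste("My long event name", 1:9)

# Break each label into short lines and rejoin them with "\n"
wrapped <- vapply(labels,
                  function(s) paste(strwrap(s, width = 12), collapse = "\n"),
                  character(1), USE.NAMES = FALSE)

set.seed(123)
vec <- factor(sample(1:9, 100, replace = TRUE), levels = 1:9, labels = wrapped)

op <- par(mar = c(8, 4, 4, 2) + 0.1)    # some extra room below the bars
barplot(table(vec), las = 2)
par(op)
```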
You might want to look at this post from Cross Validated