Multiple plot by group by one function - r

I have the following data:
Animal MY Age
1 17.03672067 1
1 17.00833641 2
1 16.97995215 3
1 16.95156788 4
1 16.92318362 5
1 16.88157748 6
2 16.83997133 2
2 16.79836519 3
2 16.75675905 4
2 16.7151529 5
2 16.67354676 6
2 16.63194062 7
3 16.59033447 1
3 16.54872833 2
3 16.50712219 3
3 16.46551604 4
3 16.4239099 5
3 16.38230376 6
4 16.34069761 1
4 16.29909147 2
4 16.25748533 3
4 16.21587918 4
4 16.17427304 5
4 16.1326669 6
I want to draw a scatter plot of MY against Age for each animal. For a single animal I use:
plot(memo$MY[memo$Animal=="1223100747"]~memo$Age[memo$Animal=="1223100747"])
To add the same plot (MY vs Age) for another animal, I can call lines.
However, I have about 200 animals and do not want to repeat this manually for each one. How can I plot all of these animals with a single call, instead of repeating lines, lines, ..., lines?
Regards,
Phuong

You can use by, for example:
by(memo,memo$Animal,FUN=function(x) plot(x$MY~x$Age))

You could use a loop or matplot if you want to stay in base R, but I recommend the ggplot2 package.
DF <- read.table(text="Animal MY Age
1 17.03672067 1
1 17.00833641 2
1 16.97995215 3
1 16.95156788 4
1 16.92318362 5
1 16.88157748 6
2 16.83997133 2
2 16.79836519 3
2 16.75675905 4
2 16.7151529 5
2 16.67354676 6
2 16.63194062 7
3 16.59033447 1
3 16.54872833 2
3 16.50712219 3
3 16.46551604 4
3 16.4239099 5
3 16.38230376 6
4 16.34069761 1
4 16.29909147 2
4 16.25748533 3
4 16.21587918 4
4 16.17427304 5
4 16.1326669 6",header=TRUE)
library(ggplot2)
DF$Animal <- factor(DF$Animal)
# Age on the x-axis and MY on the y-axis, matching plot(MY ~ Age) in the question
p1 <- ggplot(DF,aes(x=Age,y=MY,colour=Animal)) + geom_line()
print(p1)
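For the base R loop mentioned in this answer, a minimal sketch might look like the following. It assumes a small stand-in data frame with the same column names as the question (Animal, Age, MY); the values and the names groups and cols are made up for illustration.

```r
# Stand-in data with the question's column names (values are illustrative)
DF <- data.frame(
  Animal = rep(1:3, each = 4),
  Age    = rep(1:4, times = 3),
  MY     = c(17.0, 16.9, 16.8, 16.7,
             16.6, 16.5, 16.4, 16.3,
             16.2, 16.1, 16.0, 15.9)
)

# One data frame per animal
groups <- split(DF, DF$Animal)
cols <- seq_along(groups)

# Empty frame spanning all the data, then one set of points and lines per animal
plot(range(DF$Age), range(DF$MY), type = "n", xlab = "Age", ylab = "MY")
for (i in seq_along(groups)) {
  g <- groups[[i]]
  lines(g$Age, g$MY, type = "b", col = cols[i])
}
legend("topright", legend = names(groups), col = cols, lty = 1, title = "Animal")
```

With around 200 animals the legend would probably be too long to be useful, so it may be better to drop the legend call in that case.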

Related

Transforming a looping factor variable into a sequence of numerics

I have a factor variable with 6 levels, which simplified looks like:
1 1 2 2 2 3 3 3 4 4 4 4 5 5 5 6 6 6 1 1 1 2 2 2 2... 1 1 1 2 2... (with n = 78)
Note that each number is usually, but not always, repeated three times.
I need to transform this variable into the following pattern:
1 1 2 2 2 3 3 3 4 4 4 4 5 5 5 6 6 6 7 7 7 8 8 8 8...
where the numbering keeps increasing across each repetition of the 6 levels.
Is there any way / any function that lets me do that?
Sorry for my bad description!
Assuming you have a numeric vector like the simplified version you posted, e.g. x = c(1,1,1,2,2,3,3,3,1,1,2,2), you can use this:
library(dplyr)
cumsum(x != lag(x, default = 0))
# [1] 1 1 1 2 2 3 3 3 4 4 5 5
which compares each value to its previous one and if they are different it adds 1 (starting from 1).
Maybe you can try rle, i.e.,
v <- rep(seq_along((v<-rle(x))$values),v$lengths)
Example with dummy data
x = c(1,1,1,2,2,3,3,3,4,4,5,6,1,1,2,2,3,3,3,4,4)
then we can get
> v
[1] 1 1 1 2 2 3 3 3 4 4 5 6 7 7 8 8 9 9
[19] 9 10 10
In base you can use diff and cumsum.
c(1, cumsum(diff(x)!=0)+1)
# [1] 1 1 2 2 2 3 3 3 4 4 4 4 5 5 5 6 6 6 7 7 7 8 8 8 8
Data:
x <- c(1,1,2,2,2,3,3,3,4,4,4,4,5,5,5,6,6,6,1,1,1,2,2,2,2)
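The question describes a factor, while the answers work on a numeric vector. As far as I know, rle rejects a factor and diff on a factor returns NA with a warning, so converting to integer codes first is probably needed. A minimal sketch, assuming a made-up factor f:

```r
# Hypothetical factor input (the question's real variable has 6 levels)
f <- factor(c(1, 1, 2, 2, 2, 3, 3, 3, 1, 1, 1, 2, 2))

# Integer level codes; only changes between neighbours matter here
codes <- as.integer(f)

# Same diff/cumsum idea as above, applied to the codes
run_id <- c(1, cumsum(diff(codes) != 0) + 1)
run_id
# [1] 1 1 2 2 2 3 3 3 4 4 4 5 5

# The rle variant on the codes gives the same numbering
r <- rle(codes)
run_id_rle <- rep(seq_along(r$values), r$lengths)
```

Both variants only detect a change between neighbouring values, so two consecutive runs of the same level would be merged into one.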

R: repeat a number sequence while keeping the order of the sequence

I want to repeat a sequence up to a specific length.
The sequence is 1:4, and I want to repeat it until it covers the number of rows in a data frame.
Let's say the data frame has 24 rows.
I tried the following:
test <- rep(1:4, each=24/4)
1 1 1 1 1 1 2 2 2 2 2 2 3 3 3 3 3 3 4 4 4 4 4 4
The length is right, but I want to keep the order of the sequence:
1 2 3 4 1 2 3 4 1 2 3 4.....
You need to use times instead of each
rep(1:4, times=24/4)
[1] 1 2 3 4 1 2 3 4 1 2 3 4 1 2 3 4 1 2 3 4 1 2 3 4
The second positional argument of rep is times, so the argument name can be omitted:
rep(1:4, 24/4)
#[1] 1 2 3 4 1 2 3 4 1 2 3 4 1 2 3 4 1 2 3 4 1 2 3 4
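If the number of rows is not a multiple of 4, dividing by 4 no longer gives a whole number of repetitions. A small sketch using the length.out argument of rep, with a hypothetical row count n:

```r
# Hypothetical row count that is not a multiple of 4
n <- 10

# Recycle 1:4 until exactly n values have been produced
ids <- rep(1:4, length.out = n)
ids
# [1] 1 2 3 4 1 2 3 4 1 2

# rep_len is a shorthand for the same thing
ids2 <- rep_len(1:4, n)
```

With a data frame df, n would typically be nrow(df).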

Pie chart of frequency counts

I've imported a one-column Excel file using gdata. The data are as follows:
3 4 3 3 1 4 1 3 2 3 1 1 4 2 3 3 2 6 1 1 3 3 2 2 2 2 1 3 2 1 6 1 3 2 2 1 2 2 4 2
I'm using pie(md[, 1]) to create a pie chart of the data. However, it treats the 40 values as 40 separate slices, each sized by its value, rather than drawing 5 segments (1, 2, 3, 4, 6) sized by how often each value appears, i.e. the frequency counts of the unique elements in the vector. How can I achieve that?
Use the table function (see ?table) to compute the frequencies before calling pie:
table(x)
#x
# 1 2 3 4 6
#10 13 11 4 2
Then, to produce the pie chart of frequencies:
pie(table(x))
Data:
x <- scan(text = "3 4 3 3 1 4 1 3 2 3 1 1 4 2 3 3 2 6 1 1 3 3 2 2 2 2 1 3 2 1 6 1 3 2 2 1 2 2 4 2")
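To show the share of each value on the chart, one option is to build the slice labels from the same table. A minimal sketch; the names counts, pct and slice_labels are introduced here for illustration:

```r
x <- scan(text = "3 4 3 3 1 4 1 3 2 3 1 1 4 2 3 3 2 6 1 1 3 3 2 2 2 2 1 3 2 1 6 1 3 2 2 1 2 2 4 2",
          quiet = TRUE)

# Frequency of each distinct value
counts <- table(x)

# Percentage of observations per value
pct <- round(100 * prop.table(counts), 1)

# Labels such as "1 (25%)"
slice_labels <- paste0(names(counts), " (", pct, "%)")

pie(counts, labels = slice_labels)
```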

Interpreting the result of 'cutree' from hclust/heatmap.2

I have the following code that performs hierarchical clustering and plots the result as a heatmap.
library(gplots) # provides heatmap.2
set.seed(538)
# generate data
y <- matrix(rnorm(50), 10, 5, dimnames=list(paste("g", 1:10, sep=""),
paste("t", 1:5, sep="")))
# the actual data is much larger than the above
# perform hierarchical clustering and plot heatmap
test <- heatmap.2(y)
What I want to do is print the cluster members at each level of the hierarchy shown in the plot. I'm not sure what a good way to do this is.
I tried this:
cutree(as.hclust(test$rowDendrogram), 1:dim(y)[1])
But I'm having trouble interpreting the result.
What is the meaning of each value in the matrix?
For example, the entry in row g9, column 9 is 8. What does 8 mean here?
    1 2 3 4 5 6 7 8 9 10
g1  1 1 1 1 1 1 1 1 1  1
g2  1 2 2 2 2 2 2 2 2  2
g3  1 2 2 3 3 3 3 3 3  3
g4  1 2 2 2 2 2 2 2 2  4
g5  1 1 1 1 1 1 1 4 4  5
g6  1 2 3 4 4 4 4 5 5  6
g7  1 2 2 2 2 5 5 6 6  7
g8  1 2 3 4 5 6 6 7 7  8
g9  1 2 3 4 4 4 7 8 8  9
g10 1 2 3 4 5 6 6 7 9 10
Your expert advice will be greatly appreciated.
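Column j tells you how the rows (g1 to g10) are grouped if you cut the tree into exactly j groups. Each value is the label of the group that row falls into for that cut.
Columns 1 and 10 are not very useful (everything in one group, or every row in its own group), but column 2 is a good example. It tells you that if you wanted exactly two groups, they would be: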
group1: {g1, g5}
group2: {g2, g3, g4, g6, g7, g8, g9, g10}
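To list the members of each group directly, the vector returned by cutree can be split by group label. A minimal sketch, assuming base hclust(dist(y)) as a stand-in for the row dendrogram returned by heatmap.2, so the groups are not guaranteed to match the matrix above; k = 3 is an arbitrary choice:

```r
set.seed(538)
y <- matrix(rnorm(50), 10, 5,
            dimnames = list(paste0("g", 1:10), paste0("t", 1:5)))

# Assumption: plain hclust on Euclidean distances instead of heatmap.2's dendrogram
hc <- hclust(dist(y))

# Group label for each row when the tree is cut into k groups
k <- 3
grp <- cutree(hc, k = k)

# Row names collected per group label
members <- split(names(grp), grp)
print(members)
```

With the heatmap.2 result from the question, as.hclust(test$rowDendrogram) could be passed to cutree in place of hc.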

Predicting LDA topics for new data

It looks like this question may have been asked a few times before (here and here), but it has yet to be answered. I'm hoping this is due to the ambiguity of the earlier questions, as indicated by the comments. I apologize if I am breaking protocol by asking a similar question again; I assumed that those questions would not be getting any new answers.
Anyway, I am new to Latent Dirichlet Allocation and am exploring its use as a means of dimension reduction for textual data. Ultimately I would like to extract a smaller set of topics from a very large bag of words and build a classification model using those topics as a few of the variables in the model. I've had success running LDA on a training set, but my problem is predicting which of those same topics appear in a separate test set. I am using R's topicmodels package right now, but if there is a way to do this with another package I am open to that as well.
Here is an example of what I am trying to do:
library(topicmodels)
data(AssociatedPress)
train <- AssociatedPress[1:100]
test <- AssociatedPress[101:150]
train.lda <- LDA(train,5)
topics(train.lda)
#how can I predict the most likely topic(s) from "train.lda" for each document in "test"?
With the help of Ben's superior document reading skills, I believe this is possible using the posterior() function.
library(topicmodels)
data(AssociatedPress)
train <- AssociatedPress[1:100]
test <- AssociatedPress[101:150]
train.lda <- LDA(train,5)
(train.topics <- topics(train.lda))
# [1] 4 5 5 1 2 3 1 2 1 2 1 3 2 3 3 2 2 5 3 4 5 3 1 2 3 1 4 4 2 5 3 2 4 5 1 5 4 3 1 3 4 3 2 1 4 2 4 3 1 2 4 3 1 1 4 4 5
# [58] 3 5 3 3 5 3 2 3 4 4 3 4 5 1 2 3 4 3 5 5 3 1 2 5 5 3 1 4 2 3 1 3 2 5 4 5 5 1 1 1 4 4 3
test.topics <- posterior(train.lda,test)
(test.topics <- apply(test.topics$topics, 1, which.max))
# [1] 3 5 5 5 2 4 5 4 2 2 3 1 3 3 2 4 3 1 5 3 5 3 1 2 2 3 4 1 2 2 4 4 3 3 5 5 5 2 2 5 2 3 2 3 3 5 5 1 2 2
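The last step works on a documents-by-topics matrix of probabilities and keeps only the most likely topic per row. A toy sketch of that step in base R, using a made-up matrix probs in place of the real posterior output:

```r
# Made-up topic probabilities: one row per document, one column per topic
probs <- matrix(c(0.70, 0.20, 0.10,
                  0.10, 0.30, 0.60,
                  0.25, 0.50, 0.25),
                nrow = 3, byrow = TRUE,
                dimnames = list(paste0("doc", 1:3), paste0("topic", 1:3)))

# Index of the most probable topic for each document
top_topic <- apply(probs, 1, which.max)
top_topic
# doc1 doc2 doc3
#    1    3    2

# Alternatively, keep the full probabilities as model features
features <- as.data.frame(probs)
```

For the classification goal described in the question, keeping every topic probability as a feature may preserve more information than the single top topic.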
