I've imported a one-column Excel file using gdata. The data is as follows:
3 4 3 3 1 4 1 3 2 3 1 1 4 2 3 3 2 6 1 1 3 3 2 2 2 2 1 3 2 1 6 1 3 2 2 1 2 2 4 2
I'm using the pie(md[, 1]) command to create a pie chart of the data, but the resulting chart is not what I want.
It treats the data as 40 separate items and sizes each slice by that item's value, rather than drawing 5 segments (1, 2, 3, 4, 6) whose widths reflect how often each value appears, i.e. the frequency counts of the unique elements in the vector. How can I achieve that?
Use the table function (see ?table) to compute frequencies before applying pie:
table(x)
#x
# 1  2  3  4  6
#10 13 11  4  2
Then, to produce the pie chart of frequencies:
pie(table(x))
Here x is the data from the question, read in with:
x <- scan(text = "3 4 3 3 1 4 1 3 2 3 1 1 4 2 3 3 2 6 1 1 3 3 2 2 2 2 1 3 2 1 6 1 3 2 2 1 2 2 4 2")
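If it helps, here is a minimal sketch that also puts the counts into the slice labels. The names freq and slice_labels are only illustrative, and the label format is an assumption about what is wanted, not part of the answer above.

```r
# Data from the question
x <- scan(text = "3 4 3 3 1 4 1 3 2 3 1 1 4 2 3 3 2 6 1 1 3 3 2 2 2 2 1 3 2 1 6 1 3 2 2 1 2 2 4 2")

# Frequency of each unique value
freq <- table(x)

# Labels of the form "value (count)", e.g. "2 (13)"
slice_labels <- paste0(names(freq), " (", freq, ")")

# One slice per unique value, sized by its frequency
pie(freq, labels = slice_labels)
```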
I have a table in which each key, made up of several columns, can have multiple rows with different values.
The table looks like this:
A B C
1 1 1 2
2 1 1 3
3 2 1 4
4 1 2 4
5 2 2 3
6 2 3 1
7 2 3 2
8 2 3 2
I have already worked out how to remove duplicate rows across multiple columns with the unique command, so duplicated data is not a problem.
For every key in the table (columns A and B in the example), I would like to keep only the row with the minimum value in the third column (column C).
In the end the table should look like this:
A B C
1 1 1 2
3 2 1 4
4 1 2 4
5 2 2 3
6 2 3 1
Thanks for any help, it is really appreciated.
If anything is unclear, feel free to ask.
con <- textConnection(" A B C
1 1 1 2
2 1 1 3
3 2 1 4
4 1 2 4
5 2 2 3
6 2 3 1
7 2 3 2
8 2 3 2")
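df <- read.table(con, header = TRUE)
close(con)
# Sort by the key (A, B) and then by C, so the first row of each key has the minimum C
df <- df[with(df, order(A, B, C)), ]
# Keep only the first row for each (A, B) key
df[!duplicated(df[1:2]), ]
#   A B C
# 1 1 1 2
# 4 1 2 4
# 3 2 1 4
# 5 2 2 3
# 6 2 3 1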
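As an alternative sketch (not the approach used above), base R's aggregate can compute the minimum of C per (A, B) key directly, without sorting first. The name res is only illustrative. Note that this returns only the key columns and the minimum, so it suits a table with a single value column like this one.

```r
# Example data from the question
df <- read.table(text = "A B C
1 1 2
1 1 3
2 1 4
1 2 4
2 2 3
2 3 1
2 3 2
2 3 2", header = TRUE)

# Minimum of C within each combination of A and B
res <- aggregate(C ~ A + B, data = df, FUN = min)
res
#   A B C
# 1 1 1 2
# 2 2 1 4
# 3 1 2 4
# 4 2 2 3
# 5 2 3 1
```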
I have been doing some hierarchical clustering in R. It has worked fine up until now, producing hclust objects left, right and center, but suddenly it no longer does. Now it only produces lists when I run:
mydata.clusters <- hclust(dist(mydata[, 1:8]))
mydata.clustercut <- cutree(mydata.clusters, 4)
and when trying to:
table(mydata.clustercut, mydata$customer_lifetime)
it doesn't produce a table, but an endless print of the values (I'm guessing from the list).
The cutree function returns the group to which each observation belongs. For example:
iris.clust <- hclust(dist(iris[,1:4]))
iris.clustcut <- cutree(iris.clust, 4)
iris.clustcut
# [1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2
# [52] 2 2 3 2 3 2 3 2 3 3 3 3 2 3 2 3 3 2 3 2 3 2 2 2 2 2 2 2 3 3 3 3 2 3 2 2 2 3 3 3 2 3 3 3 3 3 2 3 3 2 2
# [103] 4 2 2 4 3 4 2 4 2 2 2 2 2 2 2 4 4 2 2 2 4 2 2 4 2 2 2 4 4 4 2 2 2 4 2 2 2 2 2 2 2 2 2 2 2 2 2 2
Further comparisons can then be made by using this as a grouping variable for the observed data:
new.iris <- data.frame(iris, gp=iris.clustcut)
# example to visualise quickly the Species membership of each group
library(ggplot2)
ggplot(new.iris, aes(gp, fill=Species)) +
  geom_bar()
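The same comparison can also be sketched as a plain cross-tabulation, which is what the question was attempting. The second half is an assumption: the question does not show customer_lifetime, but if it is a continuous variable, table() creates one column per distinct value and the printout becomes very long. Binning with cut() first is one way around that; Sepal.Length stands in for such a variable here, and the names tab_species and tab_binned are only illustrative.

```r
iris.clust <- hclust(dist(iris[, 1:4]))
iris.clustcut <- cutree(iris.clust, 4)

# cutree returns an integer vector of group labels, not a list
is.list(iris.clustcut)

# Cross-tabulate clusters against a categorical variable
tab_species <- table(iris.clustcut, iris$Species)
tab_species

# For a continuous variable, bin it first so the table stays small
# (3 bins is an arbitrary choice for illustration)
tab_binned <- table(iris.clustcut, cut(iris$Sepal.Length, breaks = 3))
tab_binned
```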
I want to repeat a sequence up to a specific length.
The sequence is 1:4 and I want to repeat it until it matches the number of rows in a data frame.
Let's say the data frame has 24 rows.
I tried the following:
test <- rep(1:4, each=24/4)
1 1 1 1 1 1 2 2 2 2 2 2 3 3 3 3 3 3 4 4 4 4 4 4
The length is fine, but I want to keep the sequence order:
1 2 3 4 1 2 3 4 1 2 3 4.....
You need to use times instead of each:
rep(1:4, times=24/4)
#[1] 1 2 3 4 1 2 3 4 1 2 3 4 1 2 3 4 1 2 3 4 1 2 3 4
We can also pass the count without naming the argument, since times is the second argument of rep:
rep(1:4, 24/4)
#[1] 1 2 3 4 1 2 3 4 1 2 3 4 1 2 3 4 1 2 3 4 1 2 3 4
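If the number of rows is not a multiple of 4, times = n/4 is no longer a whole number. A small sketch for that case uses the length.out argument of rep (or the equivalent rep_len), which cuts the repeated sequence off at the requested length. The value n = 10 and the names ids and ids2 are made up for illustration.

```r
n <- 10  # hypothetical number of rows, not a multiple of 4

# Repeat 1:4 and truncate to exactly n elements
ids <- rep(1:4, length.out = n)
ids
# [1] 1 2 3 4 1 2 3 4 1 2

# rep_len is a shorthand for the same thing
ids2 <- rep_len(1:4, n)
```

With a real data frame df, n would be nrow(df).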
I want to generate a frequency table of values, but so far I have only found how to do that by grouping the values into classes. I want the counts of the individual, non-grouped values. Let's say I have:
values <- c(1,2,5,6,3,4,3,2,6,7)
How can I generate a frequency table from that?
Take a look at table:
> tab <- table(values)
> tab
values
1 2 3 4 5 6 7
1 2 2 1 1 2 1
If you prefer a data.frame:
> as.data.frame(tab)
  values Freq
1      1    1
2      2    2
3      3    2
4      4    1
5      5    1
6      6    2
7      7    1
Plotting:
hist(values) # histogram of `values` (groups the values into bins)
plot(tab) # vertical-line plot of `tab`, the table of frequencies
barplot(tab) # bar chart of `tab`, the table of frequencies
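As a small extension, and only a sketch that goes beyond what was asked, the same table can be turned into relative frequencies with prop.table or ordered by count with sort. The names rel and by_count are only illustrative.

```r
values <- c(1, 2, 5, 6, 3, 4, 3, 2, 6, 7)
tab <- table(values)

# Relative frequencies: each count divided by the total number of values
rel <- prop.table(tab)
rel

# Most frequent values first
by_count <- sort(tab, decreasing = TRUE)
by_count
```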
I have the following data:
Animal MY Age
1 17.03672067 1
1 17.00833641 2
1 16.97995215 3
1 16.95156788 4
1 16.92318362 5
1 16.88157748 6
2 16.83997133 2
2 16.79836519 3
2 16.75675905 4
2 16.7151529 5
2 16.67354676 6
2 16.63194062 7
3 16.59033447 1
3 16.54872833 2
3 16.50712219 3
3 16.46551604 4
3 16.4239099 5
3 16.38230376 6
4 16.34069761 1
4 16.29909147 2
4 16.25748533 3
4 16.21587918 4
4 16.17427304 5
4 16.1326669 6
I want to draw a scatter plot of MY against Age for each animal. For a single animal I use:
plot(memo$MY[memo$Animal=="1223100747"]~memo$Age[memo$Animal=="1223100747"]).
If I then want to add the same plot (MY vs Age) for another animal, I just need to use the lines function.
However, since I have about 200 animals, I do not want to do this manually 100 times. My question is: how can I plot all these animals with one call, instead of using lines, lines, ..., lines?
Regards,
Phuong
You can use by, for example (this draws one separate plot per animal):
by(memo,memo$Animal,FUN=function(x) plot(x$MY~x$Age))
You could use a loop or matplot if you want to stay in base R, but I advise you to use the ggplot2 package.
DF <- read.table(text="Animal MY Age
1 17.03672067 1
1 17.00833641 2
1 16.97995215 3
1 16.95156788 4
1 16.92318362 5
1 16.88157748 6
2 16.83997133 2
2 16.79836519 3
2 16.75675905 4
2 16.7151529 5
2 16.67354676 6
2 16.63194062 7
3 16.59033447 1
3 16.54872833 2
3 16.50712219 3
3 16.46551604 4
3 16.4239099 5
3 16.38230376 6
4 16.34069761 1
4 16.29909147 2
4 16.25748533 3
4 16.21587918 4
4 16.17427304 5
4 16.1326669 6",header=TRUE)
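library(ggplot2)
# Treat Animal as a categorical variable so each animal gets its own colour
DF$Animal <- factor(DF$Animal)
# MY on the y-axis against Age on the x-axis, one line per animal
p1 <- ggplot(DF,aes(x=Age,y=MY,colour=Animal)) + geom_line()
print(p1)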
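For the base R loop mentioned above, a minimal sketch could look like the following. It uses only the first three rows of animals 1 and 2 from the example data to stay short, and the legend placement and colours are arbitrary choices, not part of the original answer.

```r
# Subset of the example data (first three rows of animals 1 and 2)
DF <- data.frame(
  Animal = rep(c(1, 2), each = 3),
  MY     = c(17.03672067, 17.00833641, 16.97995215,
             16.83997133, 16.79836519, 16.75675905),
  Age    = c(1, 2, 3, 2, 3, 4)
)

animals <- unique(DF$Animal)

# Empty plotting frame that spans the ranges of all animals
plot(range(DF$Age), range(DF$MY), type = "n", xlab = "Age", ylab = "MY")

# Add one line per animal, each in its own colour
for (i in seq_along(animals)) {
  d <- DF[DF$Animal == animals[i], ]
  lines(d$Age, d$MY, col = i)
}

legend("topright", legend = animals, col = seq_along(animals),
       lty = 1, title = "Animal")
```

With around 200 animals the legend would become unreadable, so it is probably best dropped in that case.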