I want to build this matrix, with the dimensions labeled "cut" and "color".
What I tried:
table <- matrix(c(163,224,312,314,303,175,119,662,933,909,871,702,522,307,1513,2400,2164,2299,1824,1204,678,1603,2337,2331,2924,2360,1428,808,2834,3903,3826,4884,3115,2093,89), nrow=5, ncol=7, byrow=TRUE)
rownames(table) <- c("Fair", "Good", "Very Good", "Premium", "Ideal")
colnames(table) <- c("D", "E", "F", "G", "H", "I", "J")
but the printed result has no "cut" and "color" labels on its dimensions.
How can I add them?
Here, dimnames(table) is a list with one element per dimension. In the original matrix the elements of this list are unnamed, so names(dimnames(table)) is NULL. Assigning to names() replaces NULL with the labels you want:
names(dimnames(table)) <- c('cut', 'color')
table
#            color
# cut            D    E    F    G    H    I   J
#   Fair       163  224  312  314  303  175 119
#   Good       662  933  909  871  702  522 307
#   Very Good 1513 2400 2164 2299 1824 1204 678
#   Premium   1603 2337 2331 2924 2360 1428 808
#   Ideal     2834 3903 3826 4884 3115 2093  89
NOTE: table is also the name of a base R function, so it is better to give the object a different name.
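As a sketch (the names counts, cut_color and cut_color_tab are only illustrations), the labels can also be supplied up front through the dimnames argument of matrix(), as a named list. This avoids the separate names() step and the clash with table():

```r
# Same counts as in the question, stored under a name that does not mask table()
counts <- c(163, 224, 312, 314, 303, 175, 119,
            662, 933, 909, 871, 702, 522, 307,
            1513, 2400, 2164, 2299, 1824, 1204, 678,
            1603, 2337, 2331, 2924, 2360, 1428, 808,
            2834, 3903, 3826, 4884, 3115, 2093, 89)

# A *named* list in dimnames sets the "cut" and "color" labels in one step
cut_color <- matrix(counts, nrow = 5, ncol = 7, byrow = TRUE,
                    dimnames = list(cut   = c("Fair", "Good", "Very Good", "Premium", "Ideal"),
                                    color = c("D", "E", "F", "G", "H", "I", "J")))
cut_color

# as.table() gives the object class "table" if table-specific methods are wanted
cut_color_tab <- as.table(cut_color)
```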
Related
I am new to R. I have an R data frame with the following structure:
  164_I_.CEL 164_II.CEL 183_I.CEL 183_II.CEL 2114_I.CEL
1       4496       5310      4492       4511       2872
2        181        280       137        101         91
3       4556       5104      4379       4608       2972
4        167        217        99         79         82
5         89        110        69         58         47
I want to group the columns which have "_I.CEL" in the column name.
I need an output like NI, NI, I, NI, I,
where NI means "not I".
Combine ifelse with grepl, which searches the column names for the required pattern:
ifelse(grepl("_I\\.CEL", names(df1)), "I", "NI")
#[1] "NI" "NI" "I" "NI" "I"
where df1 is your data frame.
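Or use fixed = TRUE, so that the pattern is matched literally (in a regular expression an unescaped . matches any character):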
ifelse(grepl("_I.CEL", names(df1), fixed = TRUE), "I", "NI")
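A self-contained sketch of the fixed = TRUE approach. The df1 construction is only a reconstruction of the printed data, and groups and by_group are hypothetical names. It also shows one way to actually group the columns: split.default() splits a data frame by column when given one label per column.

```r
# Reconstruction of the question's data frame; assigning names() afterwards
# keeps names such as "164_I_.CEL" from being turned into "X164_I_.CEL"
df1 <- data.frame(c(4496, 181, 4556, 167, 89),
                  c(5310, 280, 5104, 217, 110),
                  c(4492, 137, 4379, 99, 69),
                  c(4511, 101, 4608, 79, 58),
                  c(2872, 91, 2972, 82, 47))
names(df1) <- c("164_I_.CEL", "164_II.CEL", "183_I.CEL", "183_II.CEL", "2114_I.CEL")

# One label per column: "I" if the name contains the literal text "_I.CEL"
groups <- ifelse(grepl("_I.CEL", names(df1), fixed = TRUE), "I", "NI")
groups

# split.default() splits the columns (not the rows) into a list of data frames
by_group <- split.default(df1, groups)
names(by_group$I)
```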
I'm trying to make a bar plot with the following data:
          Dept
Admit        A   B   C   D   E   F
  Admitted 601 370 322 269 147  46
  Rejected 332 215 596 523 437 668
and I have tried the following code:
admission_department <- barplot(biasUCB_d, main="Admit by deparment",
xlab="biasUCB_d[['Dept']]",
col=c("darkblue","red"),
legend = rownames(biasUCB_d[['Dept']]),
beside=TRUE)
The code used to create the data set is:
biasUCB_d <- margin.table(UCBAdmissions, c(1,3))
What am I doing wrong?
Assuming that Dept is an element of a list, this should work:
Data:
biasUCB_d <- list(Dept = read.table(header=TRUE, text='
Admit A B C D E F
Admitted 601 370 322 269 147 46
Rejected 332 215 596 523 437 668'))
Solution:
admission_department <- barplot(as.matrix(biasUCB_d$Dept[2:7]), main="Admit by department",
                                xlab="Department",
                                col=c("darkblue","red"),
                                legend.text = biasUCB_d$Dept$Admit,
                                beside=TRUE)
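Alternatively, since margin.table() already returns a two-way table, pass biasUCB_d to barplot directly: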
admission_department <- barplot(biasUCB_d, main="Admit by department",
xlab="Department",
col=c("darkblue","red"),
legend.text = rownames(biasUCB_d),
beside=TRUE)
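A runnable sketch of that second answer using the built-in UCBAdmissions data (mids is a hypothetical name for barplot's return value). The question's biasUCB_d[['Dept']] is most likely what fails: 'Dept' is the name of a dimension of the table, not of an element, so [[ ]] has nothing to extract. Passing the table itself lets barplot() use its row and column names.

```r
# UCBAdmissions ships with base R (datasets package)
biasUCB_d <- margin.table(UCBAdmissions, c(1, 3))   # Admit x Dept counts
dimnames(biasUCB_d)                                 # dimension names: Admit, Dept

# With beside = TRUE each column (department) becomes a group of bars,
# one bar per row (Admitted / Rejected); the row names label the legend
mids <- barplot(biasUCB_d, beside = TRUE,
                main = "Admit by department", xlab = "Department",
                col = c("darkblue", "red"),
                legend.text = rownames(biasUCB_d))
```

barplot() returns the bar midpoints invisibly; with beside = TRUE they come back as a matrix with the same shape as the table, which is handy for adding text labels above the bars.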
How can I cross-tabulate two variables, as table() does, but fill the cells with values from a third column or a separate list?
Example:
library(ggplot2) # diamonds data
data(diamonds)
T.matrix <- with(diamonds, table(color, clarity))
Produces:
      clarity
color  I1  SI2  SI1  VS2  VS1 VVS2 VVS1  IF
  D    42 1370 2083 1697  705  553  252  73
  E   102 1713 2426 2470 1281  991  656 158
  F   143 1609 2131 2201 1364  975  734 385
  G   150 1548 1976 2347 2148 1443  999 681
  H   162 1563 2275 1643 1169  608  585 299
  I    92  912 1424 1169  962  365  355 143
  J    50  479  750  731  542  131   74  51
I want a similar color-by-clarity table, except that each cell is filled with reference$value instead of the count from table():
reference <- expand.grid(clarity = c("I1", "SI2", "SI1", "VS2", "VS1","VVS2", "VVS1", "IF"),
color = c("D", "E", "F", "G", "H", "I", "J"))
reference$value <- 1:56
So [D, I1] would have a value of 1, [D, SI2] = 2, [H, VS2] = 36, and so on.
Try tapply:
tapply(diamonds$price, list(diamonds$color, diamonds$clarity), mean)
tapply takes the variable of interest, splits it by every combination of the grouping variables in the list, and then applies the function given as the last argument to each group. Depending on what you need, a table-shaped result may not be the most convenient format.
If you want one row per group instead (a long, tidy data frame), you might want to use dplyr:
library(dplyr)
diamonds %>%
  group_by(clarity, color) %>%
  summarise(mean_price = mean(price))   # name the summary column explicitly
Edit: the same approach works for the reference values:
tapply(reference$value, list(reference$color, reference$clarity), FUN = sum)
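You need to supply FUN: if it is omitted (NULL), tapply returns a vector of group indices instead of the table. Because each color/clarity combination occurs exactly once in reference, sum simply returns that single value.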
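A self-contained version of the tapply answer using only the reference data from the question (m1 and m2 are hypothetical names). As an alternative not mentioned in the answers, xtabs() with the value column on the left-hand side of the formula sums that column within each color/clarity cell, which amounts to a table() filled with values instead of counts:

```r
# expand.grid varies clarity fastest, so value 1..8 is color D, 9..16 is E, etc.
reference <- expand.grid(clarity = c("I1", "SI2", "SI1", "VS2", "VS1", "VVS2", "VVS1", "IF"),
                         color   = c("D", "E", "F", "G", "H", "I", "J"))
reference$value <- 1:56

# tapply: one cell per color/clarity pair; sum() just returns the single value
m1 <- with(reference, tapply(value, list(color, clarity), FUN = sum))

# xtabs: the left-hand side of the formula is summed within each cell
m2 <- xtabs(value ~ color + clarity, data = reference)

m1["D", "SI2"]   # 2
m2["H", "VS2"]   # 36
```

If some combinations could appear more than once, sum would add them up; swap in another function (for example mean) in tapply if that is not what you want.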
Even though I am getting used to R, I am still new to it, and I hope someone can help me with this task. I searched previous topics but couldn't find what I was looking for, so here I am hoping for some help.
I am trying to draw a bar plot, but I am not having much luck with some of the settings. I am using R 3.1.1 on Mac OS X 10.9.4.
My table looks like this:
family area1 area2 area3 area4 area5 area6
A         15    20   500   200    17    26
B        170   520    26    13   100    70
C         35   250   358   128    88    26
D         95   375   289   156   169   356
E        425   177   136   144   285    70
Since I saved the file as a CSV, I am doing these steps:
fam <- read.csv("family_per_area_count.csv", sep = ";", header = TRUE)
Then I convert the data to a matrix:
fam.mat <- as.matrix(fam_1, ncol = 6, byrow = T)
Then I assign row and column names:
rownames(fam.mat) <- c("A", "B", "C", "D", "E")
colnames(fam.mat) <- c("area1", "area2", "area3", "area4", "area5", "area6")
Then I run the bar plot command:
barplot(fam.mat, beside = T, col = rainbow(ncol(fam.mat)))
but most of the x-axis labels are missing and the bars look squeezed together.
I also tried a stacked (cumulative) bar plot using this code:
par(mar = c(5.1, 4.1, 4.1, 7.1), xpd = TRUE)
prop <- prop.table(data_mat, margin = 2)
barplot(data_mat, col = rainbow(length(rownames(data_mat))), width = 3)
legend("topright", inset = c(-0.25, 0), fill = rainbow(length(rownames(data_mat))),
legend = rownames(data_mat))
but the legend colours don't match the data, and again my x-axis seems off-centre. I have tried transposing the matrix, with no luck.
Can anyone make a suggestion?
Thank you so much in advance
F.
Here is a start:
DF <- read.table(text="family area1 area2 area3 area4 area5 area6
A 15 20 500 200 17 26
B 170 520 26 13 100 70
C 35 250 358 128 88 26
D 95 375 289 156 169 356
E 425 177 136 144 285 70", header=TRUE)
library(reshape2)
DF <- melt(DF, id.var="family")
library(ggplot2)
ggplot(DF, aes(x=family, y=value, fill=variable)) +
geom_bar(stat="identity", position="dodge")
Study the ggplot2 documentation and tutorials to learn how to customise the plot further.
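If you prefer to stay with base graphics, here is a sketch (not the answerer's method) aimed at the specific problems in the question. The matrix is rebuilt by hand from the table above, and cols, op and prop are illustrative names. With beside = TRUE the bars inside each group correspond to the rows, so the colour vector should have nrow() entries; las = 2 draws the axis labels perpendicular to the axis, which is one common way to keep barplot() from dropping labels that would otherwise overlap; and the stacked plot uses the proportions that the question computes but never plots, with the same colour vector as the legend.

```r
# Hand-built copy of the question's counts: families in rows, areas in columns
fam.mat <- matrix(c( 15,  20, 500, 200,  17,  26,
                    170, 520,  26,  13, 100,  70,
                     35, 250, 358, 128,  88,  26,
                     95, 375, 289, 156, 169, 356,
                    425, 177, 136, 144, 285,  70),
                  nrow = 5, byrow = TRUE,
                  dimnames = list(c("A", "B", "C", "D", "E"), paste0("area", 1:6)))

# One colour per family (row), since rows are the bars within each area group
cols <- rainbow(nrow(fam.mat))

# Extra right margin plus xpd = TRUE lets the legend sit outside the plot
op <- par(mar = c(5.1, 4.1, 4.1, 7.1), xpd = TRUE)

# Grouped bars with perpendicular axis labels
barplot(fam.mat, beside = TRUE, col = cols, las = 2)
legend("topright", inset = c(-0.25, 0), fill = cols, legend = rownames(fam.mat))

# Stacked bars of proportions: each column (area) sums to 1, and the
# legend reuses the same colour vector, so colours and labels line up
prop <- prop.table(fam.mat, margin = 2)
barplot(prop, col = cols, las = 2)
legend("topright", inset = c(-0.25, 0), fill = cols, legend = rownames(prop))

par(op)   # restore the previous graphical parameters
```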
Let's say I'd like to calculate the size of the range (maximum minus minimum) across a few columns, row by row.
set.seed(1)
dat <- data.frame(x=sample(1:1000,1000),
y=sample(1:1000,1000),
z=sample(1:1000,1000))
With a plain data.frame, I would do something like this:
dat$diff_range <- apply(dat,1,function(x) diff(range(x)))
To put it more simply, I'm looking for this operation, over each row:
diff(range(dat[i, ]))  # for i in 1:nrow(dat)
If I were doing this for the entire table, it would be something like:
library(data.table)
setDT(dat)[, diff_range := apply(dat, 1, function(x) diff(range(x)))]
But how would I do it for only named (or numbered) rows?
pmax and pmin compute the element-wise maximum and minimum across columns in a vectorized way, which is much faster than splitting the data and working on each row separately. It's also pretty concise:
dat[, r := do.call(pmax,.SD) - do.call(pmin,.SD)]
        x   y   z   r
   1: 266 531 872 606
   2: 372 685 967 595
   3: 572 383 866 483
   4: 906 953 437 516
   5: 201 118 192  83
  ---
 996: 768 945 292 653
 997:  61 231 965 904
 998: 771 145  18 753
 999: 841 148 839 693
1000: 857 252 218 639
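To restrict the pmax/pmin idea to selected rows, one option is to subset before computing. A base R sketch, assuming an arbitrary vector of row numbers (rows and sub are hypothetical names); rows that are not selected are left as NA:

```r
set.seed(1)
dat <- data.frame(x = sample(1:1000, 1000),
                  y = sample(1:1000, 1000),
                  z = sample(1:1000, 1000))

rows <- c(1:4, 15:18)                # hypothetical choice of rows to process
sub  <- dat[rows, c("x", "y", "z")]  # only the selected rows and columns

# do.call(pmax, sub) is pmax(sub$x, sub$y, sub$z): the element-wise maximum
# across the columns, i.e. one maximum per selected row (likewise for pmin)
dat$diff_range <- NA_integer_
dat$diff_range[rows] <- do.call(pmax, sub) - do.call(pmin, sub)

dat[rows, ]
```

With data.table, passing the rows in i, as in dat[rows, r := do.call(pmax, .SD) - do.call(pmin, .SD), .SDcols = c("x", "y", "z")], should have the same effect, again leaving the unselected rows as NA.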
How about this (here D is a data.table with columns x, y and z):
D[,list(I=.I,x,y,z)][,diff(range(x,y,z)),by=I][c(1:4,15:18)]
#    I  V1
#1:  1 971
#2:  2 877
#3:  3 988
#4:  4 241
#5: 15 622
#6: 16 684
#7: 17 971
#8: 18 835
# this version should be faster, because it subsets the rows first
D[c(1:4,15:18),list(I=.I,x,y,z)][,diff(range(x,y,z)),by=I]
Use .I to create a row index to pass to the by= argument; the function is then run once per row. The second call filters to the chosen row numbers first, so the range is not computed for every row; alternatively, set a key and filter on that if your real table is organised differently.
You can subset the rows before applying the function. For example, if you only want every second row:
dat_Diffs <- apply(dat[seq(2,1000,by=2),],1,function(x) diff(range(x)))
Or, for row names 1 to 10 (since no row names were specified, they are just the numbers counting up):
dat_Diffs <- apply(dat[rownames(dat) %in% 1:10,],1,function(x) diff(range(x)))
But why not just calculate the range for every row and subset later?
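A minimal base R sketch of the compute-everything-then-subset idea, assuming the dat from the question (all_ranges, every_second and first_ten are hypothetical names). Since pmax and pmin are vectorised, computing all 1000 ranges is cheap, and any subset by position or by row name can be taken afterwards:

```r
set.seed(1)
dat <- data.frame(x = sample(1:1000, 1000),
                  y = sample(1:1000, 1000),
                  z = sample(1:1000, 1000))

# Range of every row in one vectorised step, named by the row names
all_ranges <- setNames(do.call(pmax, dat) - do.call(pmin, dat), rownames(dat))

# ...then subset afterwards, by position or by row name
every_second <- all_ranges[seq(2, nrow(dat), by = 2)]
first_ten    <- all_ranges[as.character(1:10)]
```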