Boxplot: Need to capture all extreme outliers - r
I'm trying to capture all of my data in a boxplot. I found a neat example on Cross Validated, but it's not entirely working for me, and I was hoping someone could help me out.
My code is:
boxplot(x, horizontal=TRUE, boxwex=.7, axes=FALSE, frame.plot=TRUE)
axis(1, at=xlab, labels=xlab)
opar <- par()
layout(matrix(1:3, nr=1, nc=3), heights=c(1,1,1), widths=c(1,6,1))
par(oma = c(5,4,0,0) + 0.1, mar = c(0,0,1,1) + 0.1)
stripchart(x[x < -400], pch=1, cex=1, xlim=c(-1700000,-400), method="jitter")
boxplot(x[abs(x) < 400], horizontal=TRUE, ylim=c(-400,400), at=0, boxwex=.7, cex=1, method="jitter")
stripchart(x[x > 400], pch=1, cex=1, xlim=c(400,60000), method="jitter")
par(opar)
but the jitter does nothing in the boxplot call (as far as I can tell, method="jitter" is a stripchart() argument that boxplot() simply ignores), and the stripcharts start at 0 when they shouldn't. If I could figure out how to paste the output chart here, I would.
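For reference, here is a sketch of the three-panel layout I'm aiming for (untested; the middle panel gets its jitter by overlaying stripchart(..., add=TRUE) on the boxplot, since boxplot() has no method argument):

opar <- par(no.readonly=TRUE)                  # save settings before layout/par changes
layout(matrix(1:3, nrow=1), widths=c(1, 6, 1))
par(oma=c(5, 4, 0, 0) + 0.1, mar=c(0, 0, 1, 1) + 0.1)
stripchart(x[x < -400], pch=1, method="jitter", xlim=c(-1700000, -400))
boxplot(x[abs(x) <= 400], horizontal=TRUE, ylim=c(-400, 400),
        at=0, boxwex=.7, outline=FALSE)        # suppress the box's own outlier points...
stripchart(x[abs(x) <= 400], pch=1, method="jitter",
           add=TRUE, at=0)                     # ...and overlay all points, jittered
stripchart(x[x > 400], pch=1, method="jitter", xlim=c(400, 60000))
par(opar)

The vector x is: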
[1] -1620000.00 -85000.00 -32672.62 -30963.50 -28335.64 -26531.30 -18305.68 -13964.04 -13500.00
[10] -13248.48 -10975.05 -7410.00 -6034.32 -5629.00 -5349.09 -5125.00 -4994.45 -4973.72
[19] -4404.84 -4063.76 -3632.77 -3118.50 -3056.18 -3000.00 -2774.00 -2699.86 -2541.50
[28] -2327.06 -2238.89 -1750.00 -1548.63 -1343.25 -1271.67 -1187.55 -1114.80 -1087.44
[37] -1084.59 -1080.00 -977.20 -936.00 -900.00 -896.50 -853.60 -850.00 -792.00
[46] -791.44 -773.53 -750.00 -750.00 -710.82 -700.00 -697.68 -678.00 -665.00
[55] -620.00 -578.49 -513.96 -500.00 -474.18 -468.51 -412.47 -334.50 -332.50
[64] -331.20 -305.32 -300.00 -300.00 -244.04 -239.65 -212.30 -210.00 -203.32
[73] -202.15 -199.50 -198.24 -188.64 -177.25 -174.78 -169.80 -168.80 -168.25
[82] -166.75 -144.35 -140.00 -129.98 -126.74 -120.33 -120.00 -115.92 -114.99
[91] -112.45 -108.00 -106.64 -103.40 -100.00 -100.00 -98.28 -95.68 -89.36
[100] -87.84 -86.59 -75.68 -72.16 -72.04 -71.13 -65.52 -51.00 -50.00
[109] -50.00 -44.12 -41.25 -40.00 -35.18 -35.14 -34.41 -33.82 -33.80
[118] -33.60 -32.98 -30.00 -30.00 -29.13 -28.00 -27.44 -26.46 -26.32
[127] -25.92 -25.50 -25.06 -25.00 -21.84 -20.00 -19.63 -19.14 -18.64
[136] -18.60 -18.00 -17.25 -16.72 -16.69 -16.54 -15.50 -15.00 -13.51
[145] -12.16 -11.78 -11.69 -11.56 -11.26 -10.97 -10.88 -10.84 -10.62
[154] -10.45 -10.20 -10.00 -9.83 -9.04 -9.00 -8.75 -8.70 -8.50
[163] -8.28 -8.26 -7.92 -7.88 -7.74 -6.70 -6.44 -6.10 -5.35
[172] -5.04 -4.84 -4.73 -4.65 -4.50 -4.44 -4.40 -4.34 -4.25
[181] -4.00 -3.99 -3.98 -3.96 -3.94 -3.70 -3.08 -2.88 -2.85
[190] -2.75 -2.52 -2.14 -2.06 -2.00 -1.98 -1.96 -1.92 -1.74
[199] -1.68 -1.50 -1.10 -1.08 -0.89 -0.67 -0.60 -0.50 -0.48
[208] -0.42 -0.40 -0.30 -0.14 -0.04 0.00 0.00 0.00 0.00
[217] ... [684]        0.00 (all 468 values in this range are 0.00; the repeated rows are elided for brevity)
[685] 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.20 0.21
[694] 0.40 0.44 0.46 0.46 0.48 0.59 0.70 1.00 1.14
[703] 1.17 1.25 1.28 1.40 1.42 1.60 1.68 2.10 2.10
[712] 2.16 2.32 2.34 2.37 2.52 2.80 2.88 3.50 3.51
[721] 3.99 4.76 5.00 5.63 5.76 5.85 5.85 6.00 6.20
[730] 6.50 7.36 8.07 8.68 9.25 9.67 9.80 9.82 11.02
[739] 14.00 15.00 15.04 15.27 16.60 17.55 17.68 19.50 20.94
[748] 21.81 23.51 23.86 24.57 24.57 25.96 27.36 27.44 27.81
[757] 29.20 29.59 29.72 30.30 38.50 39.77 47.20 47.92 50.00
[766] 50.59 51.00 54.20 65.02 68.00 71.28 75.00 92.80 95.28
[775] 105.29 110.00 126.84 134.04 134.24 140.00 140.58 147.50 148.78
[784] 152.48 173.80 181.37 181.80 185.60 186.90 188.48 201.30 209.50
[793] 215.27 228.64 240.00 243.68 248.08 250.00 250.00 255.58 277.50
[802] 282.00 285.40 290.80 304.39 325.00 327.76 339.80 362.00 372.93
[811] 373.24 380.70 400.00 440.00 450.00 493.74 508.50 510.64 538.20
[820] 551.37 565.00 570.95 612.22 616.00 653.40 665.24 666.75 667.20
[829] 718.23 770.66 825.26 855.79 884.00 1000.00 1064.00 1064.77 1080.00
[838] 1152.00 1159.62 1177.24 1271.27 1495.52 1590.00 1670.00 1739.79 2075.68
[847] 2496.00 3570.00 3648.64 4152.64 4158.00 4556.44 4594.75 5040.00 5099.40
[856] 5150.67 5926.65 5967.81 6110.64 6144.00 6942.20 7350.00 7525.32 8667.90
[865] 9601.02 11557.20 12360.12 14425.70 15000.00 17962.14 27655.72 34709.96 45430.00
[874] 50000.00 57785.00
OK, rather than walk through the cumbersome process I used (and before I get another Tumbleweed "award"), I found a better solution posted by bdemarest in 2015 under the title "Understanding Boxplot with jitter". My dataframe is called DRP, with columns "Cost_Delta" and "Month" (the Jan-2017 data is the vector printed above); the resulting chart is at https://i.stack.imgur.com/sSWtr.png. The code is below.
DRP <- read.table("C:\\Projects\\Mat Group\\DRP\\1000_Item_Data\\RFiles\\Cost Delta\\DRP_CostDelta2.txt", header=TRUE)
DRP$Month <- as.character(DRP$Month)
DRP$Month <- factor(DRP$Month, levels=unique(DRP$Month))   # keep months in file order rather than alphabetical
library(ggplot2)
library(scales)   # needed for the comma label formatter used below
p<-ggplot(DRP, aes(x=Month, y=Cost_Delta)) +
geom_point(aes(fill=Month), size=2, shape=21, colour="grey20",
position=position_jitter(width=0.2, height=0.1)) +
geom_boxplot(outlier.colour=NA, fill=NA, colour="grey20")
p + scale_y_continuous(labels=comma,breaks=seq(-300000,350000,50000)) +
labs(x="Month-Year", y="Cost Delta (Demand-DRP Forecast)") +
#*** January Outliers
geom_text(x=1, y=-250000, label="-1,620,000",size=3) +
geom_segment(aes(x=1, xend=1, y=-275000, yend=-276000),
arrow = arrow(length = unit(0.3, "cm"),ends="last", type = "closed"),col="red") +
#*** February Outliers
geom_text(x=2, y=300000, label="1,101,786",size=3) +
geom_segment(aes(x=2, xend=2, y=325000, yend=326000),
arrow = arrow(length = unit(0.3, "cm"),ends="last", type = "closed"),col="red") +
geom_text(x=2, y=-250000, label="-7,020,000",size=3) +
geom_segment(aes(x=2, xend=2, y=-275000, yend=-276000),
arrow = arrow(length = unit(0.3, "cm"),ends="last", type = "closed"),col="red") +
#*** March Outliers
geom_text(x=3, y=-250000, label="-3,780,000",size=3) +
geom_segment(aes(x=3, xend=3, y=-275000, yend=-276000),
arrow = arrow(length = unit(0.3, "cm"),ends="last", type = "closed"),col="red") +
#*** August Outliers
geom_text(x=6, y=-225000, label="-484,960",size=3) +
geom_text(x=6, y=-250000, label="-540,000",size=3) +
geom_segment(aes(x=6, xend=6, y=-275000, yend=-276000),
arrow = arrow(length = unit(0.3, "cm"),ends="last", type = "closed"),col="red") +
#*** September Outliers
geom_text(x=7, y=300000, label="593,960",size=3) +
geom_segment(aes(x=7, xend=7, y=325000, yend=326000),
arrow = arrow(length = unit(0.3, "cm"),ends="last", type = "closed"),col="red") +
geom_text(x=7, y=-250000, label="-484,960",size=3) +
geom_segment(aes(x=7, xend=7, y=-275000, yend=-276000),
arrow = arrow(length = unit(0.3, "cm"),ends="last", type = "closed"),col="red") +
#*** October Outliers
geom_text(x=8, y=300000, label="969,920",size=3) +
geom_segment(aes(x=8, xend=8, y=325000, yend=326000),
arrow = arrow(length = unit(0.3, "cm"),ends="last", type = "closed"),col="red") +
#*** November Outliers
geom_text(x=9, y=300000, label="2,909,760",size=3) +
geom_segment(aes(x=9, xend=9, y=325000, yend=326000),
arrow = arrow(length = unit(0.3, "cm"),ends="last", type = "closed"),col="red") +
#*** December Outliers
geom_text(x=10, y=300000, label="1,080,000",size=3) +
geom_segment(aes(x=10, xend=10, y=325000, yend=326000),
arrow = arrow(length = unit(0.3, "cm"),ends="last", type = "closed"),col="red") +
geom_text(x=10, y=-250000, label="-1,939,000",size=3) +
geom_segment(aes(x=10, xend=10, y=-275000, yend=-276000),
arrow = arrow(length = unit(0.3, "cm"),ends="last", type = "closed"),col="red")
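The annotation block above repeats the same geom_text()/geom_segment() pair for every outlier. A more compact, hypothetical refactor (not what produced the chart, but equivalent in spirit) drives both layers from a single annotation data frame; the months, positions, and labels are transcribed from the calls above, and it relies on the same trick those calls already use of giving numeric x positions on the discrete Month axis:

library(grid)   # for arrow() and unit(), if not already available via ggplot2

# One row per outlier label; values transcribed from the geom_text() calls above.
outliers <- data.frame(
  month  = c(1, 2, 2, 3, 6, 6, 7, 7, 8, 9, 10, 10),
  text_y = c(-250000, 300000, -250000, -250000, -225000, -250000,
             300000, -250000, 300000, 300000, 300000, -250000),
  label  = c("-1,620,000", "1,101,786", "-7,020,000", "-3,780,000",
             "-484,960", "-540,000", "593,960", "-484,960",
             "969,920", "2,909,760", "1,080,000", "-1,939,000")
)
# Arrows point up from y=325000 or down from y=-275000, matching the calls above.
outliers$seg_y    <- ifelse(outliers$text_y > 0, 325000, -275000)
outliers$seg_yend <- ifelse(outliers$text_y > 0, 326000, -276000)

p + geom_text(data=outliers, aes(x=month, y=text_y, label=label), size=3) +
  geom_segment(data=outliers,
               aes(x=month, xend=month, y=seg_y, yend=seg_yend),
               arrow=arrow(length=unit(0.3, "cm"), type="closed"), colour="red")

These two layers would replace all of the per-month blocks in the chain above (the scale_y_continuous() and labs() calls still apply as before). Months with two labels get two coincident arrows rather than one, which draws identically.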