I want to plot a lot of boxplots in one particular style to compare them.
But when a group is empty, that group isn't plotted.
Let's say I have a dataframe:
a b
1 1 5
2 1 4
3 1 6
4 1 4
5 2 9
6 2 8
7 2 9
8 3 NaN
9 3 NaN
10 3 NaN
11 4 2
12 4 8
and I use boxplot to plot it:
boxplot(b ~ a , df)
Then I get the plot without group 3
(which I can't show because I did not have "10 reputation")
I found some solutions for removing empty groups via Google, but my problem is the other way around.
I also found the workaround at=c(1,2,4), but since I generate the R script with Python and different groups can be empty each time, I would prefer that the groups aren't dropped at all.
Also, I don't think I have the time to grapple with additional packages.
Therefore I would be thankful for solutions without them.
You can keep the empty group on the x-axis with
boxplot(b ~ a , df, na.action=na.pass)
Or
boxplot(b~factor(a), df)
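To see the difference between the calls, here is a minimal sketch. It assumes the data frame from the question is rebuilt as df, and it uses plot = FALSE only to inspect the group names without drawing anything. By default the rows with missing b are dropped before grouping, so a numeric a loses group 3, while a factor keeps level 3 as an empty group.

```r
# Rebuild the example data from the question (group 3 has only missing values)
df <- data.frame(
  a = c(1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4),
  b = c(5, 4, 6, 4, 9, 8, 9, NaN, NaN, NaN, 2, 8)
)

# Numeric grouping variable: rows with missing b are dropped first, so group 3 vanishes
dropped <- boxplot(b ~ a, df, plot = FALSE)

# Factor grouping variable: level 3 survives as an empty group
kept <- boxplot(b ~ factor(a), df, plot = FALSE)

# Same effect by keeping the missing rows in the model frame
kept_na <- boxplot(b ~ a, df, na.action = na.pass, plot = FALSE)

print(dropped$names)
print(kept$names)
print(kept_na$names)
```

Since the R script is generated from Python, wrapping the grouping variable in factor() is probably the smaller change to the generated code.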
I have a dataset of customer ages and I want to make a frequency distribution with age bins 9 years wide.
Ages=c(83,51,66,61,82,65,54,56,92,60,65,87,68,64,51,
70,75,66,74,68,44,55,78,69,98,67,82,77,79,62,38,88,76,99,
84,47,60,42,66,74,91,71,83,80,68,65,51,56,73,55)
My desired outcome would be similar to the table below; the variable names can differ (as you wish).
Could I use binCounts for this? If yes, could you help me with the code, as I am not sure what bx and idxs mean in it?
binCounts(x, idxs = NULL, bx, right = FALSE)
Age Count
38-46 3
47-55 7
56-64 7
65-73 14
74-82 10
83-91 6
92-100 3
Much Appreciated!
I don't know binCounts or even the package it is in, but here is a base R solution:
data.frame(table(cut(Ages,0:7*9+37)))
Var1 Freq
1 (37,46] 3
2 (46,55] 7
3 (55,64] 7
4 (64,73] 14
5 (73,82] 10
6 (82,91] 6
7 (91,100] 3
To exactly duplicate your results:
lowerlimit=c(37,46,55,64,73,82,91,100)#break points; each bin is (lower, upper]
Labels=paste(head(lowerlimit,-1)+1,lowerlimit[-1],sep="-")#add one to each lower break to get 38, 47, etc.
group=cut(Ages,lowerlimit,Labels)#determine which group each age belongs to
tab=table(group)#form a frequency table
as.data.frame(tab)#transform the table into a data frame
group Freq
1 38-46 3
2 47-55 7
3 56-64 7
4 65-73 14
5 74-82 10
6 83-91 6
7 92-100 3
All this can be combined as:
data.frame(table(cut(Ages,s<-0:7*9+37,paste(head(s+1,-1),s[-1],sep="-"))))
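The one-liner relies on s being assigned while cut() evaluates its breaks argument, which is easy to break when editing. A more explicit sketch of the same steps follows. The helper name bin_table and its arguments are hypothetical, introduced only for illustration, and it assumes integer-valued data.

```r
Ages <- c(83,51,66,61,82,65,54,56,92,60,65,87,68,64,51,
          70,75,66,74,68,44,55,78,69,98,67,82,77,79,62,38,88,76,99,
          84,47,60,42,66,74,91,71,83,80,68,65,51,56,73,55)

# Hypothetical helper (not from the answer above): count values in equal-width bins
bin_table <- function(x, first, width, n_bins) {
  # Break points; cut() uses intervals of the form (lower, upper]
  breaks <- first - 1 + width * 0:n_bins
  # Labels such as "38-46": lower break + 1, then the upper break
  labels <- paste(head(breaks, -1) + 1, breaks[-1], sep = "-")
  as.data.frame(table(group = cut(x, breaks, labels)))
}

res <- bin_table(Ages, first = 38, width = 9, n_bins = 7)
print(res)
```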
Probably a similar situation has already been solved but I could not find it.
I have a mapper data frame like the following
mapper
bucket_label bucket_no
1 (-Inf; 9.99) 1
2 (25.01; 29.99) 1
3 (29.99; 30.01) 1
4 (30.01; Inf) 1
5 (19.99; 20.01) 2
6 (20.01; 24.99) 2
7 (24.99; 25.01) 2
8 (9.99; 10.11) 3
9 (10.11; 14.99) 3
10 (14.99; 15.01) 3
11 (15.01; 19.99) 3
and a vector x with random data
x <- rnorm(100)*100
I need to assign the corresponding bucket to each entry of x in a quick way, and findInterval and cut do not seem to help with this.
I would like to make a bubble plot in SAS of two ordinal variables plotted against each other, with a loess line drawn through it. Could somebody help me with this?
More specifically:
The two variables contain scores between 0 and 10.
My data looks pretty much like this:
data dataset;
Obs var1 var2
1 0 4
2 3 2
3 3 2
4 2 5
5 6 9
6 7 9
7 1 7
8 7 9
What I'm doing right now is just making a scatterplot and drawing a loess line through it. But since a scatterplot of this kind of data only gives you a grid-like graph, I would like to make a bubble plot out of it to represent the frequency of each case (so in my example the bubbles at (3,2) and (7,9) would be a bit bigger than the rest).
Afterwards, however, I would still like to be able to draw that loess line through it.
Not exact, but hopefully enough to get you started:
data dataset;
input obs var1 var2;
cards;
1 0 4
2 3 2
3 3 2
4 2 5
5 6 9
6 7 9
7 1 7
8 7 9
;
run;
proc freq data=dataset noprint;
table var1*var2/out=data2;
run;
proc sgplot data=data2;
bubble x=var1 y=var2 size=count;
loess x=var1 y=var2;
run; quit;
I have an imputed dataset that I'm analysing, and I'm trying to draw boxplots, but I can't wrap my head around the proper procedure.
My data (a sample; the original has 20 observations per imputation and 13 variables per group, all values range from 0 to 25):
.imp .id FTE_RM FTE_PD OMZ_RM OMZ_PD
1 1 25 25 24 24
1 2 4 0 2 6
1 3 11 5 3 2
1 4 12 3 3 3
2 1 20 15 15 15
2 2 4 1 2 3
2 3 0 0 0 6
2 4 20 0 0 0
.imp signifies the imputation round, .id the identifier for each observation.
I want to draw all the FTE_* variables in a single plot (and the OMZ_* variables in another), but I wonder what to do with all the imputations. Can I just include all values? The imputed data now has 500 observations. With, for instance, an ANOVA I'd need to average the ANOVA results by 5 to get back to 20 observations. But is this needed for a boxplot as well, since I only deal with medians, means, max. and min.?
Such as:
data_melt <- melt(df[grep("^FTE_", colnames(df))])
ggplot(data_melt, aes(x=variable, y=value))+geom_boxplot()
I've played a couple of times with ggplot, but consider myself a complete newbie.
I assume you want to keep the identifiers .imp and .id after melting, so use this instead:
data_melt <- melt(df,c(".imp",".id"))
For completeness of the dataframe it probably helps to introduce a column that identifies the type - FTE vs. OMZ:
data_melt$type <- ifelse(grepl("FTE",data_melt$variable),"FTE","OMZ")
Having this data.frame you can, for example, facet on the type (alternatively you can just use a simple filter statement on data_melt to restrict to one type):
ggplot(data_melt, aes(x=variable, y=value))+geom_boxplot()+facet_wrap(~type,scales="free_x")
This would look like this.
EDIT: fixed the data mess-up
The following are the first 15 rows of my data:
> head(df,15)
frame.group class lane veh.count mean.speed
1 [22,319] 2 5 9 23.40345
2 [22,319] 2 4 9 24.10870
3 [22,319] 2 1 11 14.70857
4 [22,319] 2 3 8 20.88783
5 [22,319] 2 2 6 16.75327
6 (319,616] 2 5 15 22.21671
7 (319,616] 2 2 16 23.55468
8 (319,616] 2 3 12 22.84703
9 (319,616] 2 4 14 17.55428
10 (319,616] 2 1 13 16.45327
11 (319,616] 1 1 1 42.80160
12 (319,616] 1 2 1 42.34750
13 (616,913] 2 5 18 30.86468
14 (319,616] 3 3 2 26.78177
15 (616,913] 2 4 14 32.34548
'frame.group' contains time intervals, 'class' is the vehicle class (1=motorcycles, 2=cars, 3=trucks) and 'lane' contains lane numbers. I want to create 3 sets of scatter plots with frame.group on the x-axis and mean.speed on the y-axis, one set for each class. Within the set for one vehicle class, e.g. cars, I want 5 plots, one for each lane. I tried the following:
cars <- subset(df, class==2)
by(cars, lane, FUN = plot(frame.group, mean.speed))
There are two problems:
1) R does not plot as expected, i.e. 5 plots for 5 different lanes.
2) Only one plot is drawn, and it is a boxplot, probably because I used intervals instead of numbers on the x-axis.
How can I fix the above issues? Please help.
Each time a new plot command is issued, R replaces the existing plot with the new one. You can create a grid of plots by calling par(mfrow=c(1,5)) first, which gives 1 row with 5 plots (other values give other numbers of rows and columns). If you want a scatterplot instead of a boxplot, you can use plot.default.
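A minimal base-graphics sketch of that approach follows. The cars data frame here is a made-up stand-in with the same column names as in the question, and the factor is converted to its integer codes explicitly so that points are drawn instead of boxes.

```r
# Hypothetical stand-in for the 'cars' subset: 5 lanes, 3 time intervals each
intervals <- c("[22,319]", "(319,616]", "(616,913]")
cars <- data.frame(
  frame.group = factor(rep(intervals, times = 5), levels = intervals),
  lane = rep(1:5, each = 3),
  mean.speed = c(14.7, 16.5, 18.0, 16.8, 23.6, 25.0, 20.9, 22.8, 24.1,
                 24.1, 17.6, 32.3, 23.4, 22.2, 30.9)
)

old_par <- par(mfrow = c(1, 5))   # 1 row, 5 columns of panels
for (l in sort(unique(cars$lane))) {
  lane_data <- cars[cars$lane == l, ]
  # Use the integer codes of the factor on the x-axis and suppress the default ticks
  plot.default(as.integer(lane_data$frame.group), lane_data$mean.speed,
               xaxt = "n", xlab = "frame.group", ylab = "mean.speed",
               main = paste("Lane", l))
  # Label the ticks with the interval names
  axis(1, at = seq_along(intervals), labels = intervals)
}
par(old_par)   # restore the previous layout
```

Each pass through the loop fills the next panel, so the five lanes end up side by side rather than overwriting each other.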
It is easier to do all this with the ggplot2 library instead of the base graphics, and the resulting plot will look much nicer:
library(ggplot2)
ggplot(cars,aes(x=frame.group,y=mean.speed))+geom_point()+facet_wrap(~lane)
See the ggplot2 documentation for more details: http://docs.ggplot2.org/current/