Incorrect order of stack, identity geom_bar - r

I used dplyr to filter a dataset, which resulted in the tibble below. I want to create a stacked bar chart of the types of features and their capability levels. I would like the bar chart to be ordered from largest frequency to smallest.
Using the code below, the plot that is output has the first two values reversed. Is this because "Position" only has two capability levels, whereas the rest have 3? Even then the highest frequency overall is 96 and belongs to a "Distance" level.
I would ideally like to do the least amount of "brute forcing" to make the code work as the actual data I am working with have over 10 types of features, some with only one capability level.
# A tibble: 11 x 3
# Groups:   Type.of.Feature [?]
   Type.of.Feature Capability.Category  Freq
   <fct>           <chr>               <int>
 1 Diameter        <1                     75
 2 Diameter        >1.33                   5
 3 Diameter        1-1.33                 13
 4 Distance        <1                     96
 5 Distance        >1.33                   5
 6 Distance        1-1.33                  6
 7 Position        <1                     90
 8 Position        >1.33                   4
 9 Radius          <1                      7
10 Radius          >1.33                   1
11 Radius          1-1.33                  2
ggplot(freq, aes(x=reorder(Type.of.Feature, -Freq), y=Freq, fill=Capability.Category)) +
geom_bar(stat="identity", position="stack")

Please follow the below procedure to order your bars
#Import Data
file1<- readxl::read_excel(file.choose())
#Import Required Libraries
library(ggplot2)
library(dplyr)
#Split Dataframe into list based on the Type.of.Feature factor
factor_list <-split.data.frame(file1, f= file1$Type.of.Feature)
#Create new column with frequency sum for each of the level of factor above
for( lnam in names(factor_list)){
factor_list[[lnam]]["group_sum"]<- sum(factor_list[[lnam]]["Freq"])
}
#Get back the data into dataframe
# bind_rows() replaces rbind_list(), which is no longer available in current dplyr
file1 <- bind_rows(factor_list)
#Use newly created group frequency to order your bars
ggplot(file1, aes(x=reorder(Type.of.Feature, -group_sum), y=Freq, fill=Capability.Category)) +
geom_bar(stat="identity", position="stack")
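A shorter route, sketched here under the assumption that the data look like the tibble in the question: reorder() summarises each group with mean() by default, which would explain why "Position" (two rows, mean 47) lands ahead of "Distance" (three rows, mean about 35.7). Passing FUN = sum ranks the groups by total bar height instead, with no split/loop step. The data frame below is retyped from the question, and the plotting call is guarded in case ggplot2 is not installed.

```r
# Data retyped from the tibble in the question
freq <- data.frame(
  Type.of.Feature = factor(c(rep("Diameter", 3), rep("Distance", 3),
                             rep("Position", 2), rep("Radius", 3))),
  Capability.Category = c("<1", ">1.33", "1-1.33", "<1", ">1.33", "1-1.33",
                          "<1", ">1.33", "<1", ">1.33", "1-1.33"),
  Freq = c(75L, 5L, 13L, 96L, 5L, 6L, 90L, 4L, 7L, 1L, 2L)
)

# Default FUN = mean: groups are ranked by their average row value
by_mean <- levels(reorder(freq$Type.of.Feature, -freq$Freq))
# "Position" "Distance" "Diameter" "Radius"

# FUN = sum: groups are ranked by total frequency, i.e. stacked bar height
by_sum <- levels(reorder(freq$Type.of.Feature, -freq$Freq, FUN = sum))
# "Distance" "Position" "Diameter" "Radius"

# Build the plot only if ggplot2 is available
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(freq, aes(x = reorder(Type.of.Feature, -Freq, FUN = sum),
                        y = Freq, fill = Capability.Category)) +
    geom_col() +                      # same as geom_bar(stat = "identity")
    labs(x = "Type.of.Feature")
}
```

Because the ranking is done per group, this should also cope with feature types that have only one capability level.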


How to interpolate a single point where line crosses a baseline between two points [duplicate]

This question already has answers here:
get x-value given y-value: general root finding for linear / non-linear interpolation function
(2 answers)
Closed 3 years ago.
I am new to R and am trying to find an automated way to determine the x-coordinate at which the line between two points crosses a baseline (here 75; see the dotted line in the image linked below). Once the x value is found, I would like to add it to the vector of x values and add the corresponding y value (always the baseline value) to the vector of y values. In other words, the function should check every pair of consecutive input points, find any segment that crosses the baseline, and add the crossing coordinates to the output x and y vectors. Any help would be most appreciated, especially with automating this across all x,y coordinates.
https://i.stack.imgur.com/UPehz.jpg
baseline = 75
X <- c(1,2,3,4,5,6,7,8,9,10)
y <- c(75,53,37,25,95,35,50,75,75,75)
Edit: added creation of combined data frame with original data + crossing points.
Adapted from another answer related to two intersecting series with uniform X spacing.
baseline = 75
X <- c(1,2,3,4,5,6,7,8,9,10)
Y1 <- rep(baseline, 10)
Y2 <- c(75,53,37,25,95,35,50,75,75,75)
# Find points where Y1 (the baseline) is above Y2.
above <- Y1>Y2
# Points always intersect when above=TRUE, then FALSE or reverse
intersect.points<-which(diff(above)!=0)
# Find the slopes for each line segment.
Y2.slopes <- (Y2[intersect.points+1]-Y2[intersect.points]) /
(X[intersect.points+1]-X[intersect.points])
Y1.slopes <- rep(0,length(Y2.slopes))
# Find the intersection for each segment
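# Start from X[intersect.points] rather than the index, so non-integer X values also work.
X.points <- X[intersect.points] + ((Y2[intersect.points] - Y1[intersect.points]) / (Y1.slopes-Y2.slopes))
Y.points <- Y1[intersect.points] + (Y1.slopes*(X.points-X[intersect.points]))
# Plot against X, with y limits wide enough to show both series.
plot(X,Y1,type='l',ylim=range(Y1,Y2))
lines(X,Y2,type='l',col='red')
points(X.points,Y.points,col='blue')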
library(dplyr)
combined <- bind_rows( # combine rows from...
tibble(X, Y2), # table of original, plus
tibble(X = X.points,
Y2 = Y.points)) %>% # table of interpolations
distinct() %>% # and drop any repeated rows
arrange(X) # and sort by X
> combined
# A tibble: 12 x 2
X Y2
<dbl> <dbl>
1 1 75
2 2 53
3 3 37
4 4 25
5 4.71 75
6 5 95
7 5.33 75
8 6 35
9 7 50
10 8 75
11 9 75
12 10 75
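The same idea can be wrapped in a small base-R function. This is only a sketch: add_crossings is a hypothetical helper name, it assumes x is sorted in increasing order, and it treats a segment as crossing only when its endpoints lie strictly on opposite sides of the baseline, so points that already sit on the baseline are not duplicated.

```r
# Hypothetical helper: insert baseline crossings into an x/y series
add_crossings <- function(x, y, baseline) {
  d <- y - baseline
  # Segments whose endpoints are strictly on opposite sides of the baseline
  i <- which(d[-length(d)] * d[-1] < 0)
  # Linear interpolation: fraction of the segment covered before d reaches 0
  frac <- d[i] / (d[i] - d[i + 1])
  xc <- x[i] + frac * (x[i + 1] - x[i])
  out <- data.frame(x = c(x, xc),
                    y = c(y, rep(baseline, length(xc))))
  out <- out[order(out$x), ]
  rownames(out) <- NULL
  out
}

# Data from the question
X <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
y <- c(75, 53, 37, 25, 95, 35, 50, 75, 75, 75)
res <- add_crossings(X, y, baseline = 75)
print(res)
```

With the question's data this should add two rows, one between x = 4 and 5 and one between x = 5 and 6, for 12 rows in total, which agrees with the combined tibble above.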

Divide a non-rectangular plot into subplots with the spatstat package in R

I have data containing numbered subplots and the species found in each (more than 3 species within each subplot). Every species has X and Y coordinates.
> df
subplot species X Y
1 1 Apiaceae 268675 4487472
2 1 Ceyperaceae 268672 4487470
3 1 Vitaceae 268669 4487469
4 2 Ceyperaceae 268665 4487466
5 2 Apiaceae 268662 4487453
6 2 Magnoliaceae 268664 4487453
7 3 Magnoliaceae 268664 4487453
8 3 Apiaceae 268664 4487456
9 3 Vitaceae 268664 4487458
With these data, I have created a ppp object for the points of each subplot within the window of the general (big) plot.
library(spatstat)
grp <- factor(df$subplot)
# Ripley-Rasson estimate of the window containing the points
win <- ripras(df$X, df$Y)
p.patt <- ppp(df$X, df$Y, window = win, marks = grp)
Now I want to divide the plot into 3 x 3 equal subplots, because there are 9 subplots. The general plot is not rectangular; it looks like a rhombus when I plot it.
I could use the quadrats() function as below, but it divides my plot into unequal subplots. Some are quadrilaterals, others are triangles, etc., which I don't want. I want all the subplots to be equal-sized quadrats (divided by lines parallel to each side). Can anyone guide me on this?
divide <-quadrats(p.patt,3,3)
plot(divide)
Thank you!
Could you break up the plot canvas into 3x3, then run each plot?
> par(mfrow=c(3,3))
> # run code for plot 1
> # run code for plot 2
...
> # run code for plot 9
To return back to one plot on the canvas type
> par(mfrow=c(1,1))
This is a question about the spatstat package.
You can use the function quantess to divide the window into tiles of equal area. If you want the tile boundaries to be vertical lines, and you want 7 tiles, use
B <- quantess(Window(p.patt), "x", 7)
where p.patt is your point pattern.

How do I change the size of the points based on the value of the column using ggplot?

Using ggplot2 in RStudio, I am trying to change the size of the points in my scatter plot based on the log of the pvalue column. This is what my data look like.
head(BDpvalue)
id t-value pvalue mean.f mean.m Gene Chromosome
1 ILMN_1212619 3.0512842692996 0.00938046962249251 85.40076 80.02744 Mfap3l 8
2 ILMN_1212693 3.40887110529531 0.00452088152864021 87.28189 82.89533 Snx33 9
3 ILMN_1213324 -4.54750670298688 0.000414140589714275 82.68924 88.81421 Zfp961 8
4 ILMN_1213848 -3.63180275429357 0.00246745595956587 421.61780 469.51845 Itgb1bp1 12
5 ILMN_1213961 2.97573716869553 0.00960659647288939 82.01748 78.44721 Copg2 6
6 ILMN_1214482 -4.23666060706341 0.000813240203181102 136.55021 153.34681 2700081O15Rik 19
>
The code to change the size based on the log of the pvalue column seems correct to me, but for some reason I am not seeing a change in the graph. This is the code that I used.
ggplot(BDpvalue, aes(x=(log(mean.m,10)+log(mean.f,10))/2,
y=log(mean.f/mean.m,10), color=Chromosome)) + geom_point(aes(size=-log(pvalue,10)))
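For comparison, here is a minimal sketch of the size mapping on the six rows shown above. The helper columns A, M and neglogp are names made up for this example, and scale_size_continuous(range = ...) is just one way to widen the visual spread of point sizes; whether it matches the intended figure is an assumption. The plot is built only if ggplot2 is installed.

```r
# Six rows retyped from head(BDpvalue) in the question
BDpvalue <- data.frame(
  pvalue = c(0.00938046962249251, 0.00452088152864021, 0.000414140589714275,
             0.00246745595956587, 0.00960659647288939, 0.000813240203181102),
  mean.f = c(85.40076, 87.28189, 82.68924, 421.61780, 82.01748, 136.55021),
  mean.m = c(80.02744, 82.89533, 88.81421, 469.51845, 78.44721, 153.34681),
  Chromosome = factor(c(8, 9, 8, 12, 6, 19))
)

# Hypothetical helper columns: precompute the plotted quantities
BDpvalue$A <- (log10(BDpvalue$mean.m) + log10(BDpvalue$mean.f)) / 2
BDpvalue$M <- log10(BDpvalue$mean.f / BDpvalue$mean.m)
BDpvalue$neglogp <- -log10(BDpvalue$pvalue)   # larger value = smaller p-value

print(range(BDpvalue$neglogp))

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(BDpvalue, aes(x = A, y = M, colour = Chromosome)) +
    geom_point(aes(size = neglogp)) +
    scale_size_continuous(range = c(1, 8))    # smallest and largest point size
}
```

Precomputing the columns keeps the aes() calls short, which makes an unbalanced parenthesis easier to spot.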

How to create two barplots of unequal height (different max values) in R but with the same units on the Y axis?

Is it possible to make barplots (two) of unequal size (different max values on Y axis) but equal units (count data)?
The data is count data of the number of nesting attempts per season. Each species has 7 seasons of data. My objective is to present the data as clearly as possible for the reader to show the increase in the number of each of the two species nesting season on season. Although the initial pattern of increase is similar for both species, the number of species 1 nesting rises more rapidly. Plotting both sets of data on the same barplot is not a good option because the 7 seasons of data are not concurrent for the two species - rather it is the first 7 years of colonisation for each species (eg the labels on the x axis are different for the two species)
I have tried par(fig) and layout but not yet achieved what I need and I am not sure which function is better suited to what I need. Any advice welcome
Two barplots, one above the other, each taking up half the window. The Y units are the same for both graphs but the maximum for one is 300 whilst the other is 900. When they are plotted a count of 100 looks very different on the two graphs
SPECIES1 <- c(2,12,44,153,451,857)
SPECIES2 <- c(4,15,35,54,63,243)
windows(11,12)
par(oma=c(3,0.1,1,0.1),mfrow=c(2,1),mar=c(2,6,2,2.1))
barplot(SPECIES2,space=c(0.1,0),ylim=c(0,300),col="black",axes=FALSE)
axis(2,at=seq(0,300,100),las=2, cex.axis=0.9)
barplot(SPECIES1,space=c(0.1,0),ylim=c(0,900), col="black",border=NA,axes=FALSE)
axis(2,at=seq(0,900,100),las=2,cex.axis=0.9)
Here is how you could do it with the ggplot2 package, using data shaped like this:
## supp dose len
## 1 VC D0.5 6.8
## 2 VC D1 15.0
## 3 VC D2 33.0
## 4 OJ D0.5 4.2
## 5 OJ D1 10.0
## 6 OJ D2 29.5
ggplot(data=df2, aes(x=dose, y=len, fill=supp)) +
geom_bar(stat="identity", position=position_dodge())
Note that this needs a third variable (supp in the example above). Please provide the sample data you want to plot for a clearer answer.
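Staying in base graphics, one possible approach is layout() with panel heights proportional to the two axis ranges, so that a count of 100 takes up roughly the same height in both panels. This is a sketch only: plot_equal_units is a hypothetical helper, the axis maxima of 300 and 900 are taken from the question, and the match is approximate because the margins take a fixed share of each panel.

```r
SPECIES1 <- c(2, 12, 44, 153, 451, 857)
SPECIES2 <- c(4, 15, 35, 54, 63, 243)

# Axis maxima from the question; panel heights follow the same 1:3 ratio
ymax <- c(300, 900)
heights <- ymax / sum(ymax)

# Hypothetical helper: draws two stacked barplots, top panel first
plot_equal_units <- function(top, bottom, ymax, heights) {
  old <- par(oma = c(3, 0.1, 1, 0.1), mar = c(0.5, 6, 0.5, 2.1))
  on.exit({ par(old); layout(1) })   # restore settings and single-panel layout
  layout(matrix(1:2, ncol = 1), heights = heights)
  barplot(top, ylim = c(0, ymax[1]), col = "black", axes = FALSE)
  axis(2, at = seq(0, ymax[1], 100), las = 2, cex.axis = 0.9)
  barplot(bottom, ylim = c(0, ymax[2]), col = "black", border = NA, axes = FALSE)
  axis(2, at = seq(0, ymax[2], 100), las = 2, cex.axis = 0.9)
}
```

Calling plot_equal_units(SPECIES2, SPECIES1, ymax, heights) on an open device then draws the smaller series in the top quarter of the window and the larger one in the remaining three quarters.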

2 y-axes Dumbbell ggplot2

I am quite new to R and programming in general, so please forgive my ignorance; I am trying to learn.
I have two sets of data and I would like to plot them against each other. Both have 27 rows and 3 columns; one set is called "range" and the other is called "rangePx".
Column “Comp” has the different components, column “Min” is the minimum concentration in % and column “Max” is the maximum concentration in %.
I want to make a 2-y axis dumbbell plot, with the y axis being the different components and x axis being the concentration.
I can create a dumbbell plot with one y-axis, but I have trouble adding the second y-axis.
Here is a snap from the "range" data
head(range)
# A tibble: 6 x 3
Comp Min Max
<chr> <dbl> <dbl>
1 Methane 0.0100 100
2 Ethane 0.0100 65.0
3 Ethene 0.100 20.0
4 Propane 0.0100 40.0
5 Propene 0.100 6.00
6 Propadien 0.0500 2.00
and here is a snap from the "rangePx" data
head(rangePx)
# A tibble: 6 x 3
Comp Min Max
<chr> <dbl> <dbl>
1 Methane 50.0 100
2 Ethane 0.00800 14.0
3 Ethene 0 0
4 Propane 0.00800 8.00
5 Propene 0 0
6 Propadien 0 0
Here is the piece of code that I use:
library(ggplot2)
library(ggalt)
library(readxl)
theme_set(theme_classic())
range <- read_excel("range.xlsx")
rangePx <- read_excel("rangePx.xlsx")
p <- ggplot(range, aes(x=Max, xend=Min, y = Comp, group=Comp))
p <- p + geom_dumbbell(color="blue")
p
px <- ggplot(rangePx, aes(x=Max, xend=Min, y = Comp, group=Comp))
px <- px + geom_dumbbell(color="green")
p <- p + geom_dumbbell(aes(y=px, color="red"))
p
and here is the error I get when I call p:
Error: Aesthetics must be either length 1 or the same as the data (27): y, colour, x, xend, group
The snippets above show only 6 rows, but my original data are 27 x 3.
Can anyone help me?
Thanks in advance.
ggplot2 does not support two independent y-axes; this was an intentional decision by Hadley Wickham, who wrote the package. (Newer versions offer sec_axis(), but only for a secondary axis that is a one-to-one transformation of the primary one.) He explains his reasons in response to a similar question:
Plot with 2 y axes, one y axis on the left, and another y axis on the right
As mentioned in the comments and in that answer, if you want to use ggplot2 you have to use faceting to compare the two data sets. Otherwise you need a different plotting package.
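A minimal sketch of the faceting route, assuming both tables share the Comp, Min and Max columns as in the snippets above. The Source column is a name introduced for this example, and geom_segment() plus geom_point() stand in for ggalt's geom_dumbbell() so that only ggplot2 is needed. The six rows are retyped from the head() output in the question, and the plot is built only if ggplot2 is installed.

```r
comp <- c("Methane", "Ethane", "Ethene", "Propane", "Propene", "Propadien")
range <- data.frame(Comp = comp,
                    Min = c(0.01, 0.01, 0.1, 0.01, 0.1, 0.05),
                    Max = c(100, 65, 20, 40, 6, 2))
rangePx <- data.frame(Comp = comp,
                      Min = c(50, 0.008, 0, 0.008, 0, 0),
                      Max = c(100, 14, 0, 8, 0, 0))

# Stack the two tables and record where each row came from
both <- rbind(cbind(range, Source = "range"),
              cbind(rangePx, Source = "rangePx"))

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(both, aes(y = Comp, colour = Source)) +
    geom_segment(aes(x = Min, xend = Max, yend = Comp)) +  # dumbbell bar
    geom_point(aes(x = Min)) +                             # left end
    geom_point(aes(x = Max)) +                             # right end
    facet_wrap(~ Source) +                                 # one panel per data set
    labs(x = "Concentration (%)", y = NULL)
}
```

Dropping the facet_wrap() line would instead overlay the two ranges in a single panel, distinguished by colour only.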
