Ordering a variable by a specific year in ggplot bar chart R


I have a question related to ordering specific values of a bar chart created with ggplot.
My data "df" is the following:
city X2020 X2021
1 Stuttgart 2.9 3.1
2 Munich 2.3 2.4
3 Berlin 2.2 2.3
4 Hamburg 3.8 4.0
5 Dresden 3.3 3.0
6 Dortmund 2.5 2.6
7 Paderborn 1.7 1.8
8 Essen 2.6 2.6
9 Heidelberg 3.0 3.2
10 Karlsruhe 2.5 2.4
11 Kiel 2.6 2.7
12 Ravensburg 3.3 2.7
I want exactly this kind of barchart below, but cities should be only ordered by the value of 2021! I tried "reorder" in the ggplot as recommended, but this does not fit. There are some cities where the ordering is pretty weird and I do not understand what R is doing here. My code is the following:
df_melt <- melt(df, id = "city")
ggplot(df_melt, aes(value, reorder(city, -value), fill = variable)) +
geom_bar(stat="identity", position = "dodge")
str(df_melt)
'data.frame': 24 obs. of 3 variables:
$ city : chr "Stuttgart" "Munich" "Berlin" "Hamburg" ...
$ variable: Factor w/ 2 levels "X2020","X2021": 1 1 1 1 1 1 1 1 1 1 ...
$ value : num 2.9 2.3 2.2 3.8 3.3 2.5 1.7 2.6 3 2.5 ...
https://i.stack.imgur.com/rJQMV.png
I think this gets messy because the variable "value" contains the values of both 2020 and 2021, and R possibly takes the mean of both (I don't know!). But I have no idea how to deal with this further. I hope somebody can help me with my concern.
Thanks!

You could try sorting your df with arrange and then using fct_inorder to ensure that the city levels are in the order that you want.
library(tidyverse)
df <- read_table(" city X2020 X2021
1 Stuttgart 2.9 3.1
2 Munich 2.3 2.4
3 Berlin 2.2 2.3
4 Hamburg 3.8 4.0
5 Dresden 3.3 3.0
6 Dortmund 2.5 2.6
7 Paderborn 1.7 1.8
8 Essen 2.6 2.6
9 Heidelberg 3.0 3.2
10 Karlsruhe 2.5 2.4
11 Kiel 2.6 2.7
12 Ravensburg 3.3 2.7 ")
#> Warning: Missing column names filled in: 'X1' [1]
df %>%
select(-X1) %>%
pivot_longer(-city) %>%
arrange(desc(name), -value) %>%
mutate(
city = fct_inorder(city)
) %>%
ggplot(aes(city, value, fill = name)) +
geom_col(position = "dodge")
Created on 2021-07-13 by the reprex package (v1.0.0)
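On the asker's guess about the mean: reorder() summarises its second argument with mean() by default, so in long data both years feed into the ordering. Below is a minimal base R sketch of an alternative, assuming the wide df from the question (rebuilt here by hand): take the level order from the X2021 column alone and set it explicitly. The names df_long, city_order and mean_order are introduced only for illustration, and the plotting step is skipped if ggplot2 is not installed.

```r
# Wide data copied from the question
df <- data.frame(
  city  = c("Stuttgart", "Munich", "Berlin", "Hamburg", "Dresden", "Dortmund",
            "Paderborn", "Essen", "Heidelberg", "Karlsruhe", "Kiel", "Ravensburg"),
  X2020 = c(2.9, 2.3, 2.2, 3.8, 3.3, 2.5, 1.7, 2.6, 3.0, 2.5, 2.6, 3.3),
  X2021 = c(3.1, 2.4, 2.3, 4.0, 3.0, 2.6, 1.8, 2.6, 3.2, 2.4, 2.7, 2.7),
  stringsAsFactors = FALSE
)

# Long format built by hand (same shape as the melt() result in the question)
df_long <- data.frame(
  city     = rep(df$city, times = 2),
  variable = rep(c("X2020", "X2021"), each = nrow(df)),
  value    = c(df$X2020, df$X2021),
  stringsAsFactors = FALSE
)

# What reorder() does on the long data: order cities by the mean of both years
mean_order <- levels(reorder(df_long$city, df_long$value))

# Order by the 2021 values only (ascending, so the largest value is the last level)
city_order <- df$city[order(df$X2021)]
df_long$city <- factor(df_long$city, levels = city_order)

# Horizontal dodged bars; the last factor level is drawn at the top
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(df_long, aes(value, city, fill = variable)) +
    geom_col(position = "dodge")
  # print(p) draws the chart
}
```

Because city is mapped to the y axis here, no coord_flip() is needed; use rev(city_order) as the levels if the largest 2021 value should sit at the bottom instead.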

I just want to add to the previous answer that you can also take this plot and use coord_flip() to achieve the final result you were looking for. 😉

Related

R ggplot multiple columns facet by similar column name

Bear with me here as I am new to R.
I have a data frame with many columns, some of them have similar names:
> df
x1 x2 y1 z1 z2 z3
1 1 2 1.2 1.1 1.4 4.4
2 2 3 2.4 2.2 2.8 8.8
3 3 4 3.6 3.3 4.2 13.2
4 4 5 4.8 4.4 5.6 17.6
5 5 6 6.0 5.5 7.0 22.0
6 6 7 7.2 6.6 8.4 26.4
7 7 8 8.4 7.7 9.8 30.8
I want to plot all of the columns in the same figure, but each similar column name to be plotted in the same "facet" using ggplot. So for this there should be three sections, "x","y","z". Each facet should have a line for each column
Is there some type of ggplot solution using facet wrap?

Using some data wrangling to tidy your data, you could do the following (I assumed that the x-axis value should be the row number, as you asked for a line for each column, and that your data frame is called dat):
library(tidyr)
library(dplyr)
library(ggplot2)
dat_tidy <- dat |>
mutate(row = row_number()) |>
pivot_longer(-row) |>
extract(name, into = c("facet", "col"), "(.)(.)")
ggplot(dat_tidy, aes(row, value, color = col)) +
geom_line() +
facet_wrap(~facet)
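To see what the reshaping step produces, here is a rough base R sketch of the same idea, assuming the data frame from the question (rebuilt by hand as dat). It uses stack() and sub() in place of pivot_longer() and extract(), so it is an illustration of the technique rather than the answer's exact code; the regular expression is the same "(.)(.)" pattern, anchored.

```r
# Data frame from the question, rebuilt by hand
dat <- data.frame(
  x1 = 1:7, x2 = 2:8,
  y1 = 1.2 * (1:7),
  z1 = 1.1 * (1:7), z2 = 1.4 * (1:7), z3 = 4.4 * (1:7)
)

# Long format: one row per (original row, original column)
long <- stack(dat)                                   # columns: values, ind
long$row <- rep(seq_len(nrow(dat)), times = ncol(dat))

# Split each column name into its letter (facet) and its digit (line within the facet)
name <- as.character(long$ind)
long$facet <- sub("^(.)(.)$", "\\1", name)
long$col   <- sub("^(.)(.)$", "\\2", name)

head(long)
```

The resulting long data frame can be passed to the same ggplot() call as dat_tidy, with values in place of value.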

How to pass the filtered dataframe to a subsequent function?

I'm trying to pass a filtered dataframe onto a subsequent function.
Consider Iris dataframe. I filter out only on Versicolor species and then I want to use Sepal.Length and Sepal.Width column into a function that takes two vectors. I'm currently trying to implement DouglasPeuckerNbPoints, so I will use this as an example
iris %>%
filter(
(Species == "versicolor"))
I have tried:
library(kmlShape)
iris %>%
filter(
(Species == "versicolor")) %>%
DouglasPeuckerNbPoints(.$Sepal.Length,.$Sepal.Width,20)
But this is giving me the error "Error in xy.coords(x, y, setLab = FALSE) : 'x' and 'y' lengths differ".
Any help here?

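The following works. We can put the function call inside {}. When the right-hand side is wrapped in braces, the pipe no longer inserts the left-hand side as the first argument, so the dot can be used several times inside the call. See https://magrittr.tidyverse.org/reference/pipe.html for more information.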
library(tidyverse)
library(kmlShape)
iris %>%
filter(Species == "versicolor") %>%
{DouglasPeuckerNbPoints(trajx = .$Sepal.Length,
trajy = .$Sepal.Width, 20)}
# x y
# 1 7.0 3.2
# 2 4.9 2.4
# 3 6.6 2.9
# 4 5.2 2.7
# 5 5.0 2.0
# 6 5.9 3.0
# 7 6.0 2.2
# 8 5.6 2.9
# 9 6.7 3.1
# 10 5.6 3.0
# 11 6.2 2.2
# 12 5.9 3.2
# 13 6.7 3.0
# 14 5.5 2.4
# 15 5.4 3.0
# 16 6.7 3.1
# 17 6.3 2.3
# 18 5.6 3.0
# 19 5.0 2.3
# 20 5.7 2.8
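A different way to hand two columns of a filtered data frame to a function that takes two vectors is base R's with(), which avoids the dot placeholder entirely. The sketch below uses a made-up stand-in function, pair_lengths(), instead of DouglasPeuckerNbPoints so that it runs without kmlShape; the stand-in only reports what it received.

```r
# Hypothetical stand-in for a function that takes two vectors and a number
pair_lengths <- function(x, y, n) {
  c(length(x), length(y), n)
}

# Filter first, then evaluate the call with the columns visible by name
versicolor <- subset(iris, Species == "versicolor")
res <- with(versicolor, pair_lengths(Sepal.Length, Sepal.Width, 20))
res
```

Both vectors arrive with 50 elements, which is the condition the original error ("'x' and 'y' lengths differ") was complaining about.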

Trying to use a variable as label in ggplots

I'm not sure what's going on here, but when I try to run ggplots, it tells me that u and u1 are not valid lists. Did I enter u and u1 incorrectly, that it thinks these are functions, did I forget something, or did I enter things wrong into ggplots?
u1 <- function(x,y){max(utilityf1(x))}
utilityc1 <- data.frame("utilityc1" =
u(c(0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,20),
c(0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,20)))
utilityc1 <- data.frame("utilityc1" =
u1(c(0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,20),
c(0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,20)))
hhcomp <- data.frame(
pqx, pqy, utility, hours, p1qx, p1qy, utilit, utilityc1,
utilityc, u,u1, o, o1, o2
)
library(ggplot2)
ggplot(hhcomp, aes(x=utility, y=consumption))+
coord_cartesian(xlim = c(0, 16) )+
ylim(0,20)+
labs(x = "leisure(hours)",y="counsumption(units)")+
geom_line(aes(x = u, y = consumption))+
geom_line(aes(x = u1, y = consumption))
I'm not sure what else to explain, so if someone could provide some help on providing code to stack overflow that would be useful. I'm also not sure how much of a description to have, I should have enough code to be reproducible, but there is a problem that Stack Overflow only allows so much code, so it would be good to know the right amount to add.

I think you may need to read the documentation for ggplot2 and maybe R in general.
data.frame
For starters, a data.frame object is a collection of equal-length vectors bound together column-wise. Most of what you have defined as inputs for hhcomp are functions, which cannot be stored as columns of a data.frame. A canonical example of a data frame in R is iris:
head(iris)
# Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#1 5.1 3.5 1.4 0.2 setosa
#2 4.9 3.0 1.4 0.2 setosa
#3 4.7 3.2 1.3 0.2 setosa
#4 4.6 3.1 1.5 0.2 setosa
#5 5.0 3.6 1.4 0.2 setosa
#6 5.4 3.9 1.7 0.4 setosa
str(iris) #print the structure of an r object
#'data.frame': 150 obs. of 5 variables:
# $ Sepal.Length: num 5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
# $ Sepal.Width : num 3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
# $ Petal.Length: num 1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
# $ Petal.Width : num 0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
# $ Species : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...
functions
There is a lot going on with your functions. Nested functions are fine, but it seems as though you are not passing all values on. This probably means you are relying on R's scoping rules to find the missing values, which makes it ambiguous where each value comes from.
With the currently defined functions, calling u(1:2,3:4) passes 1:2 to utilityf, but utilityf's y argument is never assigned (with R's lazy evaluation we reach a different error before R realizes that this value is missing). The next function that gets evaluated in this nest is p1qyf, which is defined as follows:
p1qyf <- function(y){(w1*16)-(w1*x)}
With this definition, it does not matter what you pass to the argument y: it is never used, and the function always returns the same thing.
#with only the function defined
p1qyf()
#Error in p1qyf() : object 'w1' not found
#defining w1
w1 <- 1.5
p1qyf()
#Error in p1qyf() : object 'x' not found
x <- 10:20
#All variables defined in the function
#can now be found in the global environment
#thus the function can be called with no errors because
#w1 and x are defined somewhere...
p1qyf() #nothing assigned to y
#[1] 9.0 7.5 6.0 4.5 3.0 1.5 0.0 -1.5 -3.0 -4.5 -6.0
p1qyf(y = iris) #a data.frame assigned to y
#[1] 9.0 7.5 6.0 4.5 3.0 1.5 0.0 -1.5 -3.0 -4.5 -6.0
p1qyf(y = foo_bar) #an object that hasn't even been assigned yet
#[1] 9.0 7.5 6.0 4.5 3.0 1.5 0.0 -1.5 -3.0 -4.5 -6.0
I imagine you actually intend to define it this way
p1qyf <- function(y){(w1*16)-(w1*y)}
#Now what we pass to it affects the output
p1qyf(1:10)
#[1] 22.5 21.0 19.5 18.0 16.5 15.0 13.5 12.0 10.5 9.0
head(p1qyf(iris))
# Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#1 16.35 18.75 21.90 23.7 NA
#2 16.65 19.50 21.90 23.7 NA
#3 16.95 19.20 22.05 23.7 NA
#4 17.10 19.35 21.75 23.7 NA
#5 16.50 18.60 21.90 23.7 NA
#6 15.90 18.15 21.45 23.4 NA
You can improve this further by defining more arguments so that R doesn't need to search for missing values with its scoping rules:
p1qyf <- function(y, w1 = 1.5){(w1*16)-(w1*y)}
#w1 is defaulted to 1.5 and doesn't need to be searched for.
I would spend some time looking into your functions, because they are unclear and some, such as your p1qyf, do not use the arguments they are passed.
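As a rough sketch of where this leads, the function below takes every input as an explicit argument and its result is stored in a data frame as an ordinary numeric column. The name consumption_f, the argument names, and the reading of the formula as consumption for a given number of leisure hours are all assumptions made for illustration (guessed from the axis labels in the question), not the asker's actual model.

```r
# Hypothetical rewrite: nothing is looked up in the global environment
consumption_f <- function(leisure, wage = 1.5, total_hours = 16) {
  wage * total_hours - wage * leisure
}

# Columns of a data frame are vectors of values, not functions
hours <- 0:16
plot_data <- data.frame(
  leisure     = hours,
  consumption = consumption_f(hours)
)

str(plot_data)
```

A data frame built this way can be passed straight to ggplot(), mapping leisure and consumption by name inside aes().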
ggplot
ggplot takes a structured data object such as a data.frame or tbl_df and plots it. The aes mappings take the names of the columns you wish to map. Continuing with iris as an example:
ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width, color = Species))+
geom_point() +
geom_line()
I hope this helps clear up why you may be getting some errors. Honestly though, even if you were actually able to declare a data.frame, the problem here is that your post is still not that reproducible. Good luck.

pqxf <- function(x){(1)*(y)} # replace 1 with py and assign a value to py
pqyf <- function(y){(w * 16)-(w * x)} #
utilityf <- function(x, y) { (pqyf(x)) * ((pqxf(y)))} # the utility function C,l
hours <- c(0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,20)
w1 <- 1.5
p1qxf <- function(x){(1)*(y)} # replace 1 with py and assign a value to p1y
p1qyf <- function(y){(w1 * 16)-(w1 * x)} #
utilityf1 <- function(x, y) { (p1qyf(x)) * ((p1qxf(y)))} # the utility function (C,l)
utilitycf <- function(x,y){max(utilityf(x))/((pqyf(y)))}
utilityc1f <- function(x,y){max(utilityf1(x))/((pqyf(y)))}
u <- function(x,y){max(utilityf(x))}
u1 <- function(x,y){max(utilityf1(x))}

Plotting sales over time in R

I am trying to show the top 100 sales on a scatterplot by year. I used the below code to take top 100 games according to sales and then set it as a data frame.
top100 <- head(sort(games$NA_Sales,decreasing=TRUE), n = 100)
as.data.frame(top100)
I then tried to plot this with the below code:
ggplot(top100)+
aes(x=Year, y = Global_Sales) +
geom_point()
I get the error below when using the subset top100:
Error: data must be a data frame, or other object coercible by fortify(), not a numeric vector
If I use the actual games dataset, I get the plot attached.
Any ideas?

As pointed out in the comments by @CMichael, you have several issues in your code.
In the absence of a reproducible example, I used the iris dataset to explain what is wrong with your code.
top100 <- head(sort(games$NA_Sales,decreasing=TRUE), n = 100)
By doing that you are only extracting a single column.
The same command with the iris dataset:
> head(sort(iris$Sepal.Length, decreasing = TRUE), n = 20)
[1] 7.9 7.7 7.7 7.7 7.7 7.6 7.4 7.3 7.2 7.2 7.2 7.1 7.0 6.9 6.9 6.9 6.9 6.8 6.8 6.8
So, first, you no longer have two dimensions to plot with ggplot2. Second, the column names are not kept during the extraction, so you cannot then ask ggplot2 to plot Year and Global_Sales.
So, to solve your issue, you can do (here the example with the iris dataset):
top100 = as.data.frame(head(iris[order(iris$Sepal.Length, decreasing = TRUE), 1:2], n = 100))
And you get a data.frame of this type:
> str(top100)
'data.frame': 100 obs. of 2 variables:
$ Sepal.Length: num 7.9 7.7 7.7 7.7 7.7 7.6 7.4 7.3 7.2 7.2 ...
$ Sepal.Width : num 3.8 3.8 2.6 2.8 3 3 2.8 2.9 3.6 3.2 ...
> head(top100)
Sepal.Length Sepal.Width
132 7.9 3.8
118 7.7 3.8
119 7.7 2.6
123 7.7 2.8
136 7.7 3.0
106 7.6 3.0
You can then plot it:
library(ggplot2)
ggplot(top100, aes(x = Sepal.Length, y = Sepal.Width)) + geom_point()
Based on what you provided in your example, I suggest you do:
top100 <- as.data.frame(head(games[order(games$NA_Sales,decreasing=TRUE),c("Year","Global_Sales")], 100))
However, if this does not work for you, you should consider providing a reproducible example of your dataset (see How to make a great R reproducible example).

Multiple columns of data and getting average R program

I asked a question like this before, but I decided to simplify my data format because I'm very new at R and didn't understand what was going on. Here's the link to that question: How to handle more than multiple sets of data in R programming?
But I edited what my data should look like and decided to leave it like this..in this format...
X1.0 X X2.0 X.1
0.9 0.9 0.2 1.2
1.3 1.4 0.8 1.4
As you can see I have four columns of data, The real data I'm dealing with is up to 2000 data points.....Columns "X1.0" and "X2.0" refer "Time"...so what I want is the average of "X" and "X.1" every 100 seconds based on my 2 columns of time which are "X1.0" and "X2.0"...I can do it using this command
cuts <- cut(data$X1.0, breaks=seq(0, max(data$X1.0)+400, 400))
by(data$X, cuts, mean)
But this will only give me the average from one set of data....which is "X1.0" and "X".....How will I do it so that I could get averages from more than one data set....I also want to stop having this kind of output
cuts: (0,400]
[1] 0.7
------------------------------------------------------------
cuts: (400,800]
[1] 0.805
Note that the output was done every 400 s....I really want a list of those cuts which are the averages at different intervals...please help......I just used data=read.delim("clipboard") to get my data into the program

It is a little unclear what output you want to get.
First I change the column names, but this is optional:
colnames(dat) <- c('t1','v1','t2','v2')
Then I will use ave, which is like by but returns one value per row, so the result is easy to bind to the data. I am using a matrix as a trick to index the column pairs:
matrix(1:ncol(dat),ncol=2) ## column 1 holds cols 1 and 2, column 2 holds cols 3 and 4
[,1] [,2]
[1,] 1 3
[2,] 2 4
Then I use this matrix with apply. Here is the entire solution:
cbind(dat,
apply(matrix(1:ncol(dat),ncol=2),2,
function(x,by=10){ ## by 10 seconds! you can replace this
## with 100 or 400 in your real data
t.col <- dat[,x][,1] ## txxx
v.col <- dat[,x][,2] ## vxxx
ave(v.col,cut(t.col,
breaks=seq(0, max(t.col),by)),
FUN=mean)})
)
EDIT: correct the cut and simplify the code. The value column (the second column of each pair) is averaged within each time bucket given by the first column:
cbind(dat,
apply(matrix(1:ncol(dat),ncol=2),2,
function(x,by=10)ave(dat[,x][,2], dat[,x][,1] %/% by)))
X1.0 X X2.0 X.1 1 2
1 0.9 0.9 0.2 1.2 1.38 1.3
2 1.3 1.4 0.8 1.4 1.38 1.3
3 2.0 1.7 1.6 1.1 1.38 1.3
4 2.6 1.9 2.2 1.6 1.38 1.3
5 9.7 1.0 2.8 1.3 1.38 1.3
6 10.7 0.8 3.5 1.1 1.65 1.3
7 11.6 1.5 4.1 1.8 1.65 1.3
8 12.1 1.4 4.7 1.2 1.65 1.3
9 12.6 1.8 5.4 1.2 1.65 1.3
10 13.2 2.1 6.3 1.3 1.65 1.3
11 13.7 1.6 6.9 1.1 1.65 1.3
12 14.2 2.2 9.4 1.3 1.65 1.3
13 14.6 1.8 10.0 1.5 1.65 1.5
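If what is wanted is one average per interval as a plain named vector, rather than the block output that by() prints, tapply() over the cut intervals may be closer to the request. This is a sketch under assumptions: the first time/value pair is copied from the 13 rows shown in the answer's output, the names t1, v1 and bin_means are made up, and the bin width of 10 matches the answer's example rather than the 100 or 400 seconds of the real data.

```r
# First time/value pair, copied from the answer's output
t1 <- c(0.9, 1.3, 2.0, 2.6, 9.7, 10.7, 11.6, 12.1, 12.6, 13.2, 13.7, 14.2, 14.6)
v1 <- c(0.9, 1.4, 1.7, 1.9, 1.0, 0.8, 1.5, 1.4, 1.8, 2.1, 1.6, 2.2, 1.8)

# Breaks that cover the whole time range, so no observation falls outside a bin
by <- 10
breaks <- seq(0, ceiling(max(t1) / by) * by, by = by)

# One mean per interval, returned as a named vector
bin_means <- tapply(v1, cut(t1, breaks = breaks), mean)
bin_means
```

The same call can be repeated for the second pair of columns, or wrapped in a small function and applied to each pair in turn.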
