I am working on a music streaming project, and I am trying to get the top15 global streamings in 2020 and make it an interactive graph.
It successfully showed the top 15 song names as a dataframe, but it failed to show as a bar graph, I wonder where did I do wrong here? Although it worked after I flip the bar graph into horizontal, but the data seem to look a bit off.
It looks like this as a vertical bar graph:
The horizontical bar graph looks like this, but the data seem incorrect:
Here is the code I have:
library("dplyr")
library("ggplot2")
# load the .csv into R studio, you can do this 1 of 2 ways
#read.csv("the name of the .csv you downloaded from kaggle")
spotiify_origional <- read.csv("charts.csv")
spotiify_origional <- read.csv("https://raw.githubusercontent.com/info201a-au2022/project-group-1-section-aa/main/data/charts.csv")
View(spotiify_origional)
# filters down the data
# removes the track id, explicit, and duration columns
spotify_modify <- spotiify_origional %>%
select(name, country, date, position, streams, artists, genres = artist_genres)
#returns all the data just from 2022
#this is the data set you should you on the project
spotify_2022 <- spotify_modify %>%
filter(date >= "2022-01-01") %>%
arrange(date) %>%
group_by(date)
# use write.csv() to turn the new dataset into a .csv file
write.csv(Your DataFrame,"Path to export the DataFrame\\File Name.csv", row.names = FALSE)
write.csv(spotify_2022, "/Users/oliviasapp/Documents/info201/project-group-1-section-aa/data/spotify_2022.csv" , row.names = FALSE)
# then I pushed the spotify_2022.csv to the GitHub repo
View(spotiify_origional)
spotify_2022_global <- spotify_modify %>%
filter(date >= "2022-01-01") %>%
filter(country == "global") %>%
arrange(date) %>%
group_by(streams)
View(spotify_2022_global)
top_15 <- spotify_2022_global[order(spotify_2022_global$streams, decreasing = TRUE), ]
top_15 <- top_15[1:15,]
top_15$streams <- as.numeric(top_15$streams)
View(top_15)
col_chart <- ggplot(data = top_15) +
geom_col(mapping = aes(x = name, y = streams)) +
ggtitle("Top 15 Songs Daily Streamed Globally") +
theme(plot.title = element_text(hjust = 0.5))
col_chart <- col_chart + coord_cartesian(ylim = c(999000,1000000)) + coord_flip()
col_chart
Thank you so much! Any suggestions will hugely help!
top_15 <- spotify_2022_global[order(spotify_2022_global$streams, decreasing = TRUE), ]
This code sorts in decreasing order, but the streams data here is still of character type, so numbers like 999975 will be "higher" than 1M, which is why your data looks weird. One song had two weeks just under 1M which is why it shows up with ~2M.
If you use this instead you'll get more what you intended:
top_15 <- spotify_2022_global[order(as.numeric(spotify_2022_global$streams), decreasing = TRUE), ]
However, this is finding the highest song-weeks, not the highest songs, so in this case all 15 highest song-weeks were one song.
I'd suggest you group_by(name) and then summarize to get total streams by song, filter top 15, and then make name an ordered factor, e.g. with forcats::fct_reorder.
I would like the user to be able to page through the plots in a trelliscopejs object by group. In the following example the group I would like to paginate by is "continent". In the example, there are always 10 plots per page. The first 5 plots are from continent Africa, the next three Americas and the final two Asia. Instead, I would like each page to show one continent only by default (user could then sort and filter as needed). So the 5 Africa plots would show on the first page, then the user would click next, and view the 3 Americas plots, then click next and view 3 Asia plots, and so on.
library(trelliscopejs)
library(tidyverse)
library(gapminder)
library(purrr)
library(plotly)
# reduced gapminder dataset to use for this example
set.seed(123)
data <- gapminder %>%
filter(country %in% sample(levels(country),15)) %>%
nest(data = !one_of(c("country", "continent")))
# add a plot column with map_plot
pops <- data %>% mutate(
panel = map_plot(data, function(x) {
plot_ly(data = x, x = ~year, y = ~pop,
type = "scatter", mode = "markers")
}))
# generate trelliscope object
pops %>%
arrange(continent, country) %>%
trelliscope(name = "populations", nrow = 2, ncol = 5)
Would it be possible to paginate by group in this way?
I have a column that contains 4 variables which are( Bad , Good , Very Good , Excellent )
I need to count how much they repeats in that column and compare each of them and presint to me in pie chart and bar chart in echarts4r
For example : df <- data.frame( var = c("low","low","low","high") ) i want the same result as ggplot(df)+geom_bar(aes(var)).
First you need to create a dataframe which shows the count per var and after that you can use this in e_chart with e_bar like this:
df <- data.frame( var = c("low","low","low","hight") )
library(dplyr)
library(echarts4r)
df_result <- df %>%
count(var) %>%
arrange(n)
plot <- df_result %>%
e_charts(x = var) %>%
e_bar(n, legend = FALSE, name = "var")
Output:
Which is the same result using ggplot:
ggplot(df)+geom_bar(aes(var))
I am at the final stages of a project where i have been comparing the appraisal price vs the sold price of different properties. The complete code for data collection and tidying is below.
At this stage i am looking at different ways to visualize my data. However, I am quite new to it so my question is whether anyone has any "new" or special ways they visualizing data that they find usefull og intuitive. I have given a couple of examples of what i am able to visualize now using ggplot.
Additionally: Now my visualizations plots all 1275 observations every time. I would however also like to visualize the data both with mean and median for the Percentage, Sold and Tax variables which i am most interested in. For example to visualize the mean value of the Percentage column based on different years.
Appreciate any help!
Complete code:
#Step 1: Load needed library
library(tidyverse)
library(rvest)
library(jsonlite)
library(stringi)
library(dplyr)
library(data.table)
library(ggplot2)
#Step 2: Access the URL of where the data is located
url <- "https://www.forsvarsbygg.no/ListApi/ListContent/78635/SoldEstates/0/10/"
#Step 3: Direct JSON as format of data in URL
data <- jsonlite::fromJSON(url, flatten = TRUE)
#Step 4: Access all items in API
totalItems <- data$TotalNumberOfItems
#Step 5: Summarize all data from API
allData <- paste0('https://www.forsvarsbygg.no/ListApi/ListContent/78635/SoldEstates/0/', totalItems,'/') %>%
jsonlite::fromJSON(., flatten = TRUE) %>%
.[1] %>%
as.data.frame() %>%
rename_with(~str_replace(., "ListItems.", ""), everything())
#Step 6: removing colunms not needed
allData <- allData[, -c(1,4,8,9,11,12,13,14,15)]
#Step 7: remove whitespace and change to numeric in columns SoldAmount and Tax
#https://stackoverflow.com/questions/71440696/r-warning-argument-is-not-an-atomic-vector-when-attempting-to-remove-whites/71440806#71440806
allData[c("Tax", "SoldAmount")] <- lapply(allData[c("Tax", "SoldAmount")], function(z) as.numeric(gsub(" ", "", z)))
#Step 8: Remove rows where value is NA
#https://stackoverflow.com/questions/4862178/remove-rows-with-all-or-some-nas-missing-values-in-data-frame
alldata <- allData %>%
filter(across(where(is.numeric),
~ !is.na(.)))
#Step 9: Remove values below 10000 NOK on SoldAmount og Tax.
alldata <- alldata %>%
filter_all(any_vars(is.numeric(.) & . > 10000))
#Step 10: Calculate percentage change between tax and sold amount and create new column with percent change
#df %>% mutate(Percentage = number/sum(number))
alldata_Percent <- alldata %>% mutate(Percentage = (SoldAmount-Tax)/Tax)
Visualization
# Plot Percentage difference based on County
ggplot(data=alldata_Percent,mapping = aes(x = Percentage, y = County)) +
geom_point(size = 1.5)
#Plot County with both Date and Percentage difference The The
theme_set(new = ggthemes::theme_economist())
p <- ggplot(data = alldata_Percent,
mapping = aes(x = Date, y = Percentage, colour = County)) +
geom_line(na.rm = TRUE) +
geom_point(na.rm = TRUE)
p
I am trying to do the following:
I have a two datasets about my company. The first one has, say, the top 20 growing sellers. The second one has the bottom 20 losing sellers. So, it's something like this:
growing_seller <- c("a","b","c","d","e","f","g","h","i","h")
sales_yoy_growing <- c(100000,90000,75000,50000,37500,21000,15000,12000,10000,8000)
top_growing <- data.frame(growing_seller,sales_yoy_growing)
losing_seller <- c("i","j","k","l","m","n","o","p","q","r")
sales_yoy_losing <- c(-90000,-75000,-50000,-37500,-21000,-15000,-12000,-10000,-8000,-5000)
bottom_losing <- data.frame(losing_seller,sales_yoy_losing)
I am trying to plot both charts in the same plot using DIFFERENT categories, corresponding to the sellers' name. So what I have so far is this:
library(highcharter)
growing_seller <- c("a","b","c","d","e","f","g","h","i","h")
sales_yoy_growing <- c(100000,90000,75000,50000,37500,21000,15000,12000,10000,8000)
top_growing <- data.frame(growing_seller,sales_yoy_growing)
losing_seller <- c("i","j","k","l","m","n","o","p","q","r")
sales_yoy_losing <- c(-90000,-75000,-50000,-37500,-21000,-15000,-12000,-10000,-8000,-5000)
bottom_losing <- data.frame(losing_seller,sales_yoy_losing)
highchart() %>%
hc_add_series(
data = top_growing$sales_yoy_growing,
type = "column",
grouping = FALSE
) %>%
hc_add_series(
data = bottom_losing$sales_yoy_losing,
type = "column"
)
This is what I want to achieve graphically: Chart example
Now,I would like to have a different category array per each independent x-axis: something like the possibility to have "two hc_xAxis" controls, where I could specify per each plotted series its own categories.
My final aim is to, then, have the seller's name as I parse over each of the different columns.
Hope I was clear enough :)
Thanks
Highcharts displays the point's name in the tooltip by default. You just need to point the name value in your data.
You can do it this way:
top_growing <- data.frame(name = growing_seller, y = sales_yoy_growing)
This is the whole code:
library(highcharter)
growing_seller <- c("a","b","c","d","e","f","g","h","i","h")
sales_yoy_growing <- c(100000,90000,75000,50000,37500,21000,15000,12000,10000,8000)
top_growing <- data.frame(name = growing_seller, y = sales_yoy_growing)
losing_seller <- c("i","j","k","l","m","n","o","p","q","r")
sales_yoy_losing <- c(-90000,-75000,-50000,-37500,-21000,-15000,-12000,-10000,-8000,-5000)
bottom_losing <- data.frame(name = losing_seller, y = sales_yoy_losing)
highchart() %>%
hc_add_series(
data = top_growing,
type = "column",
grouping = FALSE
) %>%
hc_add_series(
data = bottom_losing,
type = "column"
)