R Shiny - Groups are not represented as specified in plot

I've grouped some data about the Titanic in the following way:
priceMutate <- mutate(titanic, PriceGroup = ifelse (Fare < 51,
'0 - 50',
ifelse(Fare >=51 & Fare < 101,
'51-100',
ifelse(Fare >= 101 & Fare < 151,
'101-150',
ifelse(Fare >= 151 & Fare < 201,
'151-200',
ifelse(Fare >= 201 & Fare < 251,
'201-250',
ifelse(Fare >= 251 & Fare < 301,
'251-300',
ifelse(Fare >= 301 & Fare < 351,
'301-350',
ifelse(Fare >= 351 & Fare < 401,
'351-400',
ifelse(Fare >= 401 & Fare < 451,
'401-450',
ifelse(Fare > 450,
'451+','?')))))))))))
"Fare" is the price paid for a ticket on the Titanic. I've chosen steps of $50.
Now here is my problem:
I've made a plot that shows the chance of survival by ticket price:
output$ex15 <- renderPlot({
ggplot(priceMutate,
aes(x = PriceGroup,
fill = Status)) +
geom_bar(position = "fill")+
ggtitle("Überlebenschancen nach Preis des Tickets (gruppiert)")+
theme(plot.title = element_text(size = 16))+
scale_fill_manual(values = c("grey24", "snow"))+
labs(y= "Anzahl")
})
However this plot mixes up the groups I made and does not show the "?" for the not-available data!
Can anyone see a problem/mistake that I've made?
Here is a link to my dataset: https://drive.google.com/file/d/1xsIfkv1464etX23O0J9y35CviK0mKYQl/view?usp=sharing
Thank you a lot :)

As already mentioned by @YBS in the comments, at least in your example data no observation will be assigned a "?", because all values are in the range 0 to 512 and there are no missing values.
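A side note on the "?" label, as a minimal sketch with made-up fares (the vector fare and the shortened three-way grouping below are hypothetical, not your data): even if Fare did contain missing values, a nested ifelse() like yours would probably not produce "?" for them, because ifelse() returns NA wherever its test is NA. Missing values would have to be labelled explicitly, for example:

```r
# Hypothetical fares, including one missing value
fare <- c(7.25, 512.33, NA)

# Shortened stand-in for the nested ifelse() from the question
group_raw <- ifelse(fare < 51, "0 - 50",
                    ifelse(fare <= 450, "51-450",
                           ifelse(fare > 450, "451+", "?")))
print(group_raw)  # the missing fare stays NA instead of becoming "?"

# Label missing fares explicitly
group_fixed <- group_raw
group_fixed[is.na(fare)] <- "?"
print(group_fixed)
```

With cut() the same applies: missing fares come out as NA, so they need their own label if you want them shown in the plot.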
As for your second issue: because you recoded the Fare column as a character vector, your PriceGroup values are ordered alphabetically by default, and alphabetically a string starting with a 4, like 451+, comes before a string starting with a 5, like 51-100. If you want the categories in a particular order, you have to convert to a factor with the levels set in that order. One way to do this is the cut function, which recodes a numeric vector into intervals and returns a factor. If you do this often, I would also suggest having a look at the santoku package, which makes it even easier to set nice labels.
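To see the ordering issue in isolation, here is a small sketch using four of the labels (the vector price_groups is just an illustration):

```r
price_groups <- c("0-50", "51-100", "101-150", "451+")

# Character vectors sort alphabetically, character by character.
# method = "radix" sorts in the C locale, independent of system settings.
alphabetical <- sort(price_groups, method = "radix")
print(alphabetical)

# A factor keeps whatever level order you give it
price_factor <- factor(price_groups, levels = price_groups)
print(levels(price_factor))
print(sort(price_factor))
```

ggplot2 places a character column on a discrete axis in the alphabetical order, and a factor column in the order of its levels.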
Finally, instead of using your data I created a minimal reproducible example by using some fake random example data to mimic your real data:
library(shiny)
library(tidyverse)
# Create fake example data
set.seed(123)
titanic <- data.frame(
PassengerId = 1:100,
Survived = sample(0:1, 100, replace = TRUE),
Fare = runif(100, 0, 512)
)
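# Set breaks and labels; with right = FALSE the intervals are [0, 51), [51, 101), ...
breaks <- c(0, seq(51, 451, 50), Inf)
# Label each interval by its lowest and highest whole-dollar value: "0-50", "51-100", ...
labels <- paste(breaks[-length(breaks)], breaks[-1] - 1, sep = "-")
labels[length(labels)] <- "451+"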
priceMutate <- titanic %>%
mutate(PriceGroup = cut(Fare, breaks = breaks, labels = labels, right = FALSE),
Status = recode(Survived, "0" = "Dead", "1" = "Survived"))
ui <- fluidPage(
plotOutput("ex15")
)
server <- function(input, output, session) {
output$ex15 <- renderPlot({
ggplot(priceMutate,
aes(x = PriceGroup,
fill = Status)) +
geom_bar(position = "fill")+
ggtitle("Überlebenschancen nach Preis des Tickets (gruppiert)")+
theme(plot.title = element_text(size = 16))+
scale_fill_manual(values = c("grey24", "snow"))+
labs(y= "Anzahl")
})
}
shinyApp(ui, server)
#>
#> Listening on http://127.0.0.1:8734

Related

R: Sentiment analysis with quanteda package - groups does not work -> Error: groups must have length ndoc(x)

This is my first question; I hope someone can help, as I really don't know what else to do.
It is about this website:
http://inhaltsanalyse-mit-r.de/sentiment.html
It has an example of how to do a sentiment analysis, which is what I want to do. This is the code:
if(!require("quanteda")) {install.packages("quanteda"); library("quanteda")}
if(!require("readtext")) {install.packages("readtext"); library("readtext")}
if(!require("tidyverse")) {install.packages("tidyverse"); library("tidyverse")}
if(!require("scales")) {install.packages("scales"); library("scales")}
theme_set(theme_minimal())
# ...
load("daten/twitter/trumpclinton.korpus.RData")
korpus.stats.monat <- ungroup(korpus.stats.monat)
korpus.stats.monat
# ...
korpus.trump <- corpus_subset(korpus, Kandidat == "Trump")
meine.dfm.trump <- dfm(korpus.trump, groups = c("monat", "jahr"), dictionary = sentiment.lexikon)
sentiment.trump <- convert(meine.dfm.trump, "data.frame") %>%
gather(positive, negative, key = "Polarität", value = "Wörter") %>%
mutate(Datum = as.Date(paste("01", doc_id, sep = "."), "%d.%m.%Y")) %>%
filter(Datum >= "2015-04-01" & Datum <= "2017-04-01")
ggplot(sentiment.trump, aes(Datum, Wörter, colour = Polarität, group = Polarität)) +
geom_line(size = 1) +
scale_colour_brewer(palette = "Set1") +
scale_x_date(date_breaks = "2 months", date_labels = "%b %Y") +
ggtitle("Sentiment-Scores für Donald Trump") + xlab("Monat") +
theme(axis.text.x = element_text(angle = 45, hjust = 1))
which gives me this:
meine.dfm.trump <- dfm(korpus.trump, groups = c("monat", "jahr"), dictionary = sentiment.lexikon)
Error: groups must have length ndoc(x) In addition: Warning messages:
1: 'dfm.corpus()' is deprecated. Use 'tokens()' first. 2: 'groups' is
deprecated; use dfm_group() instead
So I rewrote it for the current API, in case the deprecated calls were the problem:
korpus.trump <- corpus_subset(korpus, Kandidat == "Trump")
korpus.trump.tokens <- tokens(korpus.trump)
meine.dfm.trump <- dfm(korpus.trump.tokens)
meine.dfm.trump.group <- dfm_group (meine.dfm.trump, c(monat, jahr))
This still gives me:
Error: groups must have length ndoc(x)
As I understand it, this means the grouping object does not have the right length.
But when I look at monat and jahr in the dfm, both are shown with type character[11221], so they should have the same length.
It works when I use just monat or just jahr, but then the plot shows nothing except the names at the side.
To solve your issue you need to reorganize the code a bit. Quanteda has changed a bit since the site "Automatisierte Inhaltsanalyse mit R" was created.
The bit you need to change is below.
....
meine.dfm.trump <- tokens(korpus.trump) %>%
dfm() %>%
dfm_group(groups = interaction(monat, jahr)) %>%
dfm_lookup(dictionary = sentiment.lexikon)
....
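Why the original call fails, as far as I can tell: c(monat, jahr) concatenates the two vectors into one that is twice as long as the number of documents, whereas interaction() combines them element by element into a single grouping factor. A minimal base R sketch with three made-up documents (the values are hypothetical):

```r
# Hypothetical docvars for three documents
monat <- c("01", "02", "01")
jahr  <- c("2016", "2016", "2017")

# Concatenating doubles the length, so it no longer matches the document count
print(length(c(monat, jahr)))

# interaction() yields one group label per document
gruppe <- interaction(monat, jahr)
print(length(gruppe))
print(as.character(gruppe))
```

The resulting labels have the form month.year, which is the shape the later as.Date(paste("01", doc_id, sep = "."), "%d.%m.%Y") step in your code expects.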

Change each histogram color in chart

I have this histogram split into five categories depending on age. The problem is that I cannot change the color by category. I tried to use the marker argument with an array of the colors I want, but it didn't work as expected:
As you can see, the colors are all wrong.
This is what I tried:
less20 <- subset(dataset, dataset$EDAD <20)
between20n40 <- subset(dataset, dataset$EDAD >=20 & dataset$EDAD <40)
between40n60 <- subset(dataset, dataset$EDAD >=40 & dataset$EDAD <60)
between60n80 <- subset(dataset, dataset$EDAD >=60 & dataset$EDAD <80)
more80 <- subset(dataset, dataset$EDAD >=80)
plot_ly(alpha = 0.7, orientation = 'h', marker = list(color = c('rgba(31,119,180,1)','rgba(105,122,125,1)','rgba(183,124,67,1)', 'rgba(243,127,23,1)','rgba(255,127,14,1)'))) %>%
add_histogram(y = more80$EDAD, name = "More than 80") %>%
add_histogram(y = between60n80$EDAD, name = "Between 60 and 79") %>%
add_histogram(y = between40n60$EDAD, name = "Between 40 and 59") %>%
add_histogram(y = between20n40$EDAD, name = "Between 20 and 39") %>%
add_histogram(y = less20$EDAD, name = "Less than 20") %>%
layout(barmode = "group", title = "",orientation="h")
The correct color order is the following:
However, I want to change those colors.
Any recommendations? Thanks in advance :)
I think it might be easier to put the values and their corresponding ranges in a data frame and color the plot by range. Here is a solution using ggplot2. You can replace Values with the column from your original dataset and should get a result similar to yours. To produce the plot, this example uses a sample of 1000 normally distributed values with mean 70 and SD 30.
# Import ggplot2
library("ggplot2")
# Obtain sample values for histogram
set.seed(1234)
Values = rnorm(n = 1000, mean = 70, sd = 30)
Range = c()
# Get ranges for each value in data
for(i in 1:length(Values)){
if(Values[i] >= 80){
Range[i] = "More than 80"
} else if (Values[i] < 80 & Values[i] >= 60){
Range[i] = "Between 60 and 79"
} else if (Values[i] < 60 & Values[i] >= 40){
Range[i] = "Between 40 and 59"
} else if (Values[i] < 40 & Values[i] >= 20){
Range[i] = "Between 20 and 39"
} else {
Range[i] = "Less than 20"
}
}
# Put all data inside a data frame
plot_dat = data.frame(Values, Range)
# Order plot labels
plot_dat$Range <- factor(plot_dat$Range, levels = c("More than 80", "Between 60 and 79", "Between 40 and 59", "Between 20 and 39", "Less than 20"))
# Produce plot
ggplot(plot_dat, aes(x=Values, fill=Range)) + geom_histogram(binwidth = 5) + coord_flip() + ggtitle("Sample Histogram")
Output
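As a possible shortcut, the loop that assigns the ranges could be replaced by a single cut() call. This is only a sketch under the same assumptions as above (same simulated values, same five labels); the names range_labels and rng are mine:

```r
# Same simulated sample as in the answer
set.seed(1234)
values <- rnorm(n = 1000, mean = 70, sd = 30)

range_labels <- c("Less than 20", "Between 20 and 39", "Between 40 and 59",
                  "Between 60 and 79", "More than 80")

# right = FALSE makes the intervals [20, 40), [40, 60), ... so that 80 itself
# falls into "More than 80", as in the loop
rng <- cut(values, breaks = c(-Inf, 20, 40, 60, 80, Inf),
           labels = range_labels, right = FALSE)

# Reverse the level order so the legend starts with "More than 80"
rng <- factor(rng, levels = rev(range_labels))
print(table(rng))
```

The result can go straight into data.frame(Values = values, Range = rng) for the ggplot call.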
I just had to move the marker argument inside each add_histogram() call. That way, each histogram gets its own color.
plot_ly(alpha = 0.7, orientation = 'h') %>%
add_histogram(y = more80$EDAD, name = "More than 80", marker = list(color ='rgba(31,119,180,1)')) %>%
add_histogram(y = between60n80$EDAD, name = "Between 60 and 79", marker = list(color ='rgba(105,122,125,1)'))
# ...and likewise for the remaining add_histogram() calls
Thank you for your answers!

Problems with Shiny, arguments must be of same length

I have, once again, run into some problems with Shiny. When I run my app I get the following message:
Warning: Error in tapply: arguments must have same length
[No stack trace available]
I'm still quite new to R and don't understand why I get this error. I've been experimenting with aes_string instead of aes in the graph, but I can't find a workable solution. I would be so grateful if someone could help me.
library(shiny)
library(tidyverse)
df_bransch <- data_frame(
kommun = c("Bjuv", "Bjuv", "Bjuv", "Bromölla", "Bromölla", "Bromölla", "Båstad", "Båstad", "Båstad", "Helsingborg", "Helsingborg", "Helsingborg"),
bransch = c("Besöksnäring", "Byggnadsmaterial", "Fastigheter", "Besöksnäring", "Byggnadsmaterial", "Fastigheter",
"Besöksnäring", "Byggnadsmaterial", "Fastigheter", "Besöksnäring", "Byggnadsmaterial", "Fastigheter"),
Anställda = c(46, 369, 36, 57, 40, 36, 525, 5, 1133, 2392, 195, 1042),
Förädlingsvärde = c(20724, 579892, 91406, 26097, 136440, 51731, 252891, 3852, 1343391, 1257333, 176595, 5017640))
ui <- fluidPage(
navbarPage(title = "TEST", id = "nav",
tabPanel("Branschstruktur",
sidebarLayout(
sidebarPanel(selectInput("kom", "Kommun", choices = unique(df_bransch$kommun), selected = "Malmö"),
varSelectInput("var", "Variabel", df_bransch[c(3,4)])),
mainPanel(plotOutput("plot"))),
tabPanel("Utveckling"))))
server <- function(input, output, session) {
df <- reactive({df_bransch %>%
req(input$var, input$kom) %>%
filter(kommun == input$kom)
})
output$plot <- renderPlot({
ggplot(df(), aes(x = reorder(bransch, input$var), y = input$var)) +
geom_bar(position = "dodge", stat = "identity") +
labs(title = paste0("Branschstruktur, ", input$kom, " år 2018"),
subtitle = paste0("Variabel: ", input$var),
caption = "Källa: Bisnode") +
coord_flip()
})
}
shinyApp(ui, server)
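Please make the following change, so that reorder() and the y aesthetic receive the column values rather than the variable name:
ggplot(df(), aes(x = reorder(bransch, df()[[input$var]]), y = df()[[input$var]]))
Also add x = "Bransch", y = as.name(input$var) to labs() to get proper axis labels. Labels follow the aesthetics, so with coord_flip() the x label ("Bransch") is drawn on the vertical axis.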
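The error can be reproduced outside Shiny, which may make the cause easier to see. This is a minimal sketch, assuming the problem is that reorder() receives the variable name (one value) instead of the column (one value per row); the ASCII column name anstallda and the variable var are stand-ins for Anställda and input$var:

```r
# Stand-in for the filtered data of one kommun
df <- data.frame(
  bransch = c("Besöksnäring", "Byggnadsmaterial", "Fastigheter"),
  anstallda = c(46, 369, 36)
)
var <- "anstallda"  # plays the role of input$var

# Passing the name itself: one value against three rows, so the tapply()
# call inside reorder() stops with an error
res <- try(reorder(df$bransch, var), silent = TRUE)
print(inherits(res, "try-error"))

# Passing the column that the name refers to works
ordered_bransch <- reorder(df$bransch, df[[var]])
print(levels(ordered_bransch))
```

Inside aes(), .data[[var]] is another common way to refer to a column by a name stored in a string.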

Plotting every three rows from data frame

I would like to make some plots from my data. Unfortunately, it is hard to predict how many plots I will need, because that depends on the data and may vary. That is why I would like the code to be easily adjustable. Most often, however, each plot will come from a group of 3 rows.
So, I would like to plot rows 1-3, 4-6, 7-9, etc.
This is data:
> dput(DF_final)
structure(list(AC = c(0.0031682160632777, 0.00228591145206846,
0.00142094444568728, 0.000661218113472149, 0.0010078157353918,
0.000400289437089513, 40.4634784175177, 40.5055070858594, 0.0183737773741582
), SD = c(0.00250647379467532, 0.0013244185401148, 0.000469332241199189,
0.000294558308707343, 0.000385553400676202, 0.000104447914881357,
11.0693842400794, 8.78768774254084, 0.00696532251341454), ln_AC = c(-5.75458660556339,
-6.08099044923792, -6.556433525855, -7.32142679754668, -6.89996992823399,
-7.8233226797995, 3.70039979980691, 3.70143794229703, -3.99683077355773
), ln_SD = c(-5.98887837626238, -6.62678175351058, -7.66419963690747,
-8.13003358225542, -7.86083085139947, -9.16682203300101, 2.40418312097106,
2.17335162163583, -4.96681136795312), Percent_AC = c(126.401324043689,
172.597361244303, 302.758754023937, 224.477834753288, 261.394591157605,
383.243109777925, 365.544076706723, 460.934756361151, 263.789326894369
), Percent_SD = c(100, 100, 100, 100, 100, 100, 100, 100, 100
), TP = c(0, 40, 80, 0, 40, 80, 0, 40, 80)), row.names = c("Tim_0",
"Tim_40", "Tim_80", "Jack_0", "Jack_40", "Jack_80", "Tom_0",
"Tom_40", "Tom_80"), class = "data.frame")
Column ln_AC should be on the Y axis and column TP on the X axis. First of all, I would like to have each group on a separate graph, next to each other (keeping in mind that the number of plots may get high at some point) and, if possible, also everything on the same graph. It should be a point plot with a trend line.
Is it also possible to show the slope, the SD of the slope, and R^2 from the linear regression on the plot?
I managed to do it for a single plot, but the regression line looks strange...
The code below was used to generate this plot and regression line.
fit <- lm(DF_final$ln_AC~DF_final$TP, data=DF_final)
plot(DF_final[1:3,7], DF_final[1:3,3], type = "p", ylim = c(-10,0), xlim=c(0,100), col = "red")
lines(DF_final$TP, fitted(fit), col="blue")
In base R (without so many packages), you can do:
# splits every 3 rows
DF = split(DF_final,gsub("_[^ ]*","",rownames(DF_final) ))
# you can also do
# DF = split(DF_final, (seq_len(nrow(DF_final)) - 1) %/% 3)
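To show what the two grouping keys look like, here is a small sketch on the row names alone (the vector row_ids just copies the row names from the question):

```r
row_ids <- c("Tim_0", "Tim_40", "Tim_80", "Jack_0", "Jack_40", "Jack_80",
             "Tom_0", "Tom_40", "Tom_80")

# Key 1: position-based, every 3 consecutive rows share an index (0, 0, 0, 1, 1, 1, ...)
by_position <- (seq_along(row_ids) - 1) %/% 3
groups_by_position <- split(row_ids, by_position)

# Key 2: name-based, everything before the underscore
by_name <- sub("_.*", "", row_ids)
groups_by_name <- split(row_ids, by_name)

print(groups_by_position)
print(groups_by_name)
```

Note that the name-based split returns the groups in alphabetical order (Jack, Tim, Tom), while the position-based split keeps the row order but only works if every group has exactly 3 rows.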
To store your values (sized by the number of groups, so it adapts to your data):
slopes = vector("numeric",length(DF))
names(slopes) = names(DF)
rsq = vector("numeric",length(DF))
names(rsq) = names(DF)
To plot:
par(mfrow=c(1,length(DF)))
for(i in names(DF)){
fit <- lm(ln_AC~TP, data=DF[[i]])
plot(DF[[i]]$TP, DF[[i]]$ln_AC, type = "p", col = "red",main=i)
abline(fit, col="blue")
slopes[i]=round(fit$coefficients[2],digits=2)
rsq[i]=round(summary(fit)$r.squared,digits=2)
mtext(side=1,paste("slope=",slopes[i],"\nrsq=",rsq[i]),
padj=-2,cex=0.7)
}
And your values:
slopes
Jack Tim Tom
-0.01 -0.01 -0.10
rsq
Jack Tim Tom
0.29 0.99 0.75
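The question also asked for the SD of the slope. One way to get it, sketched here under the assumption that the standard error reported by summary(lm()) is what is meant, is to read it from the coefficient table. The data frame below only copies TP and ln_AC from the question; the person column and the name stats are mine:

```r
# TP and ln_AC copied from the question, with an explicit person column
DF <- data.frame(
  person = rep(c("Tim", "Jack", "Tom"), each = 3),
  TP = rep(c(0, 40, 80), times = 3),
  ln_AC = c(-5.75458660556339, -6.08099044923792, -6.556433525855,
            -7.32142679754668, -6.89996992823399, -7.8233226797995,
            3.70039979980691, 3.70143794229703, -3.99683077355773)
)

# Fit one regression per person and collect slope, its standard error and R^2
stats <- do.call(rbind, lapply(split(DF, DF$person), function(d) {
  s <- summary(lm(ln_AC ~ TP, data = d))
  data.frame(
    slope = s$coefficients["TP", "Estimate"],
    slope_se = s$coefficients["TP", "Std. Error"],
    r_squared = s$r.squared
  )
}))
print(round(stats, 4))
```

Each row of stats corresponds to one person, so the values can be pasted into the mtext() call in the same way as slopes and rsq.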
If I understand correctly, the reason you want 3 observations per graph is that you have different individuals (Jack, Tim, Tom). Is that so?
If you don't want to worry about that number, you can do this:
data <- DF_final # the data frame from the question
# move rownames to column
data$person <- rownames(data)
data$person <- gsub("\\_.*","",data$person) # remove TP from names
# better to use library(data.table) for this step
data <- melt(data,id.vars=c("person","TP","ln_AC"))
ggplot(data,aes(x=TP, y=ln_AC)) + geom_point() +
geom_smooth(method = "lm") + facet_grid(~person)
This results in a plot like @giocomai's, but it will also work if you have 4, 5, 6 or any other number of persons in your data.
---- Edit
If you want to add R2 values, you can do something like this. Note that it may not be the best or most elegant solution, but it works.
data <- data.frame(...)
data$person <- rownames(data)
data$person <- gsub("\\_.*","",data$person)
# run lm for all persons and save them in a data.frame
nomi <- unique(data$person)
#lmStats <- data.frame()
lmStats <- sapply(nomi,
function(ita){
model <- lm(ln_AC~TP,data= data[which(data$person == ita),])
lmStat <- summary(model)
# I only save r2, but you can get all the statistics you need
lmRow <- data.frame("r2" = lmStat$r.squared )
#lmStats <- rbind(lmStats,lmRow)
}
)
lmStats <- do.call(rbind,lmStats)
# format the output,and create a dataframe we will use to annotate facet_grid
lmStats <- as.data.frame(lmStats)
rownames(lmStats) <- gsub("\\..*","",rownames(lmStats))
lmStats$person <- rownames(lmStats)
colnames(lmStats)[1] <- "r2"
lmStats$r2 <- round(lmStats$r2,2)
lmStats$TP <- 40
lmStats$ln_AC <- 0
lmStats$lab <- paste0("r2= ",lmStats$r2)
# melt and add r2 column to the data (not necessary, but I like to have everything I plot in the data)
data <- melt(data,id.vars=c("person","TP","ln_AC"))
data$r2 <- lmStats[match(data$person,rownames(lmStats)),1]
ggplot(data,aes(x=TP, y=ln_AC)) + geom_point() +
geom_smooth(method = "lm") + facet_grid(~person) +
geom_text(data=lmStats,label=lmStats$lab)
An easier way (less steps) would be to use facet_grid(~r2), so that you have the R.square value in the title.
If I understand correctly what you mean, and assuming you will always have three observations per graph, your main issue is creating a categorical variable to separate them. Here's one way to accomplish it. Depending on the layout you prefer, you may want to check facet_wrap instead of facet_grid.
library("dplyr")
library("ggplot2")
DF_final <- structure(list(AC = c(0.0031682160632777, 0.00228591145206846,
0.00142094444568728, 0.000661218113472149, 0.0010078157353918,
0.000400289437089513, 40.4634784175177, 40.5055070858594, 0.0183737773741582
), SD = c(0.00250647379467532, 0.0013244185401148, 0.000469332241199189,
0.000294558308707343, 0.000385553400676202, 0.000104447914881357,
11.0693842400794, 8.78768774254084, 0.00696532251341454), ln_AC = c(-5.75458660556339,
-6.08099044923792, -6.556433525855, -7.32142679754668, -6.89996992823399,
-7.8233226797995, 3.70039979980691, 3.70143794229703, -3.99683077355773
), ln_SD = c(-5.98887837626238, -6.62678175351058, -7.66419963690747,
-8.13003358225542, -7.86083085139947, -9.16682203300101, 2.40418312097106,
2.17335162163583, -4.96681136795312), Percent_AC = c(126.401324043689,
172.597361244303, 302.758754023937, 224.477834753288, 261.394591157605,
383.243109777925, 365.544076706723, 460.934756361151, 263.789326894369
), Percent_SD = c(100, 100, 100, 100, 100, 100, 100, 100, 100
), TP = c(0, 40, 80, 0, 40, 80, 0, 40, 80)), row.names = c("Tim_0",
"Tim_40", "Tim_80", "Jack_0", "Jack_40", "Jack_80", "Tom_0",
"Tom_40", "Tom_80"), class = "data.frame")
DF_final %>%
mutate(id = as.character(sapply(1:(nrow(DF_final)/3), rep, 3))) %>%
ggplot(aes(x=TP, y=ln_AC)) +
geom_point() +
geom_smooth(method = "lm") +
facet_grid(~id)
Created on 2020-02-06 by the reprex package (v0.3.0)

R 3.2.1 incorrect mapping of color

This is based on R 3.2.1, reverse colors on map
I have two data points, one is more than 66%, which should be green, other is less than 33%, which should be red.
However, the less than 33% is orange.
Below is the code, which looks correct (but something is wrong)
sep <- read.csv("Out_SEP_assets_csv.csv")
Sub1 <- sep[grep("SEP.12", names(sep))]
sep$newCol <- 100*rowSums(Sub1)/rowSums(sep[4:7])
# create a new grouping variable
Percent_SEP12_Assets <- ifelse(sep[,8] <= 33, "Less than 33%", ifelse(sep[,8] >= 66, "More than 66%", "Between 33% and 66%"))
Percent_SEP12_Assets <- factor(Percent_SEP12_Assets,
levels = c("More than 66%", "Between 33% and 66%", "Less than 33%"))
# get the map
bbox <- make_bbox(sep$Longitude, sep$Latitude, f = 1)
map <- get_map(bbox)
# plot the map and use the grouping variable for the fill inside the aes
ggmap(map) +
geom_point(data=sep, aes(x = Longitude, y = Latitude, color=Percent_SEP12_Assets ), size=9, alpha=0.6) +
scale_color_manual(values=c("green","orange","red"))
The dput(sep) is
structure(list(School = structure(1:2, .Label = c("Out of City\\00L001",
"Out of City\\O308"), class = "factor"), Latitude = c(40.821367,
41.310426), Longitude = c(-73.488313, -73.837612), Windows.SEP.11 = c(4L,
69L), Mac.SEP.11 = 0:1, Windows.SEP.12 = c(3L, 26L), Mac.SEP.12 = c(16L,
1L), newCol = c(82.6086956521739, 27.8350515463918)), .Names = c("School",
"Latitude", "Longitude", "Windows.SEP.11", "Mac.SEP.11", "Windows.SEP.12",
"Mac.SEP.12", "newCol"), row.names = c(NA, -2L), class = "data.frame")
Output is this (incorrect circled in red) ........ How to fix?
Responses
Coordinates are correct, I am asking why is the point incorrectly colored. I thought this logic is correct
Percent_SEP12_Assets <- ifelse(sep[,8] <= 33, "Less than 33%", ifelse(sep[,8] >= 66, "More than 66%", "Between 33% and 66%"))
Updated code
I tried this as suggested by @bondeded, and the resulting map is the same as before
sep <- read.csv("Out_SEP_assets_csv.csv")
Sub1 <- sep[grep("SEP.12", names(sep))]
sep$newCol <- 100*rowSums(Sub1)/rowSums(sep[4:7])
# create a new grouping variable
sep$Percent_SEP12_Assets <- ifelse(sep[,8] <= 33, "Less than 33%", ifelse(sep[,8] >= 66, "More than 66%", "Between 33% and 66%"))
sep$Percent_SEP12_Assets <- factor(sep$Percent_SEP12_Assets,
levels = c("More than 66%", "Between 33% and 66%", "Less than 33%"))
# get the map
bbox <- make_bbox(sep$Longitude, sep$Latitude, f = 1)
map <- get_map(bbox)
# plot the map and use the grouping variable for the fill inside the aes
ggmap(map) +
geom_point(data=sep, aes(x = Longitude, y = Latitude, color=sep$Percent_SEP12_Assets ), size=9, alpha=0.6) +
scale_color_manual(values=c("green","orange","red"))
Actual CSV
Here is actual CSV, two rows
School Latitude Longitude Windows-SEP-11 Mac-SEP-11 Windows-SEP-12 Mac-SEP-12
Out of City\00L001 40.821367 -73.488313 4 0 3 16
Out of City\O308 41.310426 -73.837612 69 1 26 1
The problem is that by default ggplot2 drops unused levels from factors. There are two options:
Specify drop = FALSE
ggmap(map) +
geom_point(data=sep, aes(x = Longitude, y = Latitude, color=sep$Percent_SEP12_Assets ), size=9, alpha=0.6) +
scale_color_manual(values=c("green","orange","red"), drop = FALSE)
Specify the values for each level:
ggmap(map) +
geom_point(data=sep, aes(x = Longitude, y = Latitude, color=sep$Percent_SEP12_Assets ), size=9, alpha=0.6) +
scale_color_manual(values=c(`More than 66%` = "green", `Between 33% and 66%` = "orange", `Less than 33%` = "red"))
Clearly you could also do both.
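To illustrate the mechanism in plain R (this only mimics the matching, it is not ggplot2's actual code; the names lv, grp and pal are mine): once the unused middle level is dropped, the unnamed colors are handed out by position, so the second remaining level gets the second color.

```r
lv  <- c("More than 66%", "Between 33% and 66%", "Less than 33%")
grp <- factor(c("More than 66%", "Less than 33%"), levels = lv)
pal <- c("green", "orange", "red")

# Unnamed palette: colors go to the levels that are actually present, in order
used <- levels(droplevels(grp))
positional <- setNames(pal[seq_along(used)], used)
print(positional)

# Named palette: each level keeps its color, whether or not other levels occur
named <- setNames(pal, lv)
print(named[as.character(grp)])
```

This is why the point below 33% came out orange in the question: "Less than 33%" was the second level left after dropping.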
Now I get what you meant. The problem is in your ifelse structure. Maybe this can help:
ifelse(sep[,8] <= 33, "Less than 33%", ifelse(sep[,8] >= 66, "More than 66%", "Between 33% and 66%"))
[1] "More than 66%" "Less than 33%"
