My code so far (MWE):
# https://www.kaggle.com/kaggle/kaggle-survey-2017/data
#### Primary analysis of the dataset ####
response <- read.csv(file = "multipleChoiceResponses.csv",na.strings = "")
# select only some of the variables:
Variables <- c("GenderSelect","Country","Age","CurrentJobTitleSelect","MLToolNextYearSelect","LanguageRecommendationSelect","FormalEducation",
"FirstTrainingSelect","EmployerIndustry")
# Keep only the selected variables in memory:
response <- response[,Variables]
# Because of the group sizes, keep only Male and Female
Response <- response[response$GenderSelect == "Male" | response$GenderSelect == "Female",]
# add a column with the continent each country (Country) belongs to
library(countrycode)
Response$continent <- countrycode(sourcevar = Response[, "Country"],
origin = "country.name",
destination = "continent")
# Convert this new variable to a factor
Response$continent <- as.factor(Response$continent)
# Remove the rows that contain NA values
Response <- Response[complete.cases(Response), ]
# Renumber the rows consecutively
rownames(Response) <- 1:nrow(Response)
Response <- droplevels(Response)
bp_Continent <- barplot(table(Response$continent),
main = "Distribucion de DS por continentes",
ylim = c(0,3500)
)
# Add GenderSelect proportion by continent in label argument ("BLABLABLA")
text(x = bp_Continent, y = table(Response$continent), label = "BLABLABLA", pos = 3, cex = 0.8, col = "red")
Basically, the script loads the data, selects some of the variables, creates a new variable (continent), and finally cleans the data. The next step is to draw a barplot with the proportion of men and women placed on top of the bars.
What I want to do is replace "BLABLABLA" with the proportion of men to women (GenderSelect variable) for each continent.
My question is not a duplicate of:
How to display the frequency at the top of each factor in a barplot in R
because what interests me is computing the proportion and printing it above the bars.
After reading Rui's answer, I thought of another solution:
first a function that computes the ratio of men to women (by continent), and then sapply.
CreaEtiq <- function(conti){
# number of men (H) and of women (M) in the continent `conti`
NumHContin <- dim(Response[Response$GenderSelect=="Male" & Response$continent==conti,])[1]
NumMACntin <- dim(Response[Response$GenderSelect=="Female" & Response$continent==conti,])[1]
# ratio of men to women, rounded to two decimals
return(round(NumHContin/NumMACntin,2))
}
EtiquetaBarPlot <- sapply(levels(Response$continent),CreaEtiq)
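The same male-to-female ratios can also be read off a two-way table in one step. This is only a sketch on a made-up data frame: `toy` and its values are assumptions, not the survey data. With the real data, `Response` would take its place.

```r
# Made-up stand-in for Response (hypothetical values)
toy <- data.frame(
  GenderSelect = c("Male", "Male", "Female", "Male", "Female", "Female", "Female"),
  continent    = c("Asia", "Asia", "Asia", "Europe", "Europe", "Europe", "Europe")
)

# Rows are genders, columns are continents
xt <- table(toy$GenderSelect, toy$continent)

# Men divided by women within each continent, rounded like CreaEtiq()
ratio <- round(xt["Male", ] / xt["Female", ], 2)
print(ratio)
```

The result is a named vector with one value per continent, in the same order as `table(toy$continent)`, so it can be passed to `text()` as labels in the same way as the output of `sapply()`.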
And to finish:
bp_Continent <- barplot(table(Response$continent),
main = "Distribucion de DS por continentes",
ylim = c(0,3500)
)
text(x = bp_Continent, y= table(Response$continent),
label = paste("H/M = ", EtiquetaBarPlot) ,
pos = 3, cex = 0.8, col = "red")
obtaining the following graph
The code below uses a made-up data set, created at the end of this answer.
Once the proportions are computed, all that is needed is to pass them to the function text, in its argument label.
Compute the proportions.
tbl <- table(Response$continent)
xt <- xtabs( ~ GenderSelect + continent, Response)
prop <- sweep(xt, 2, tbl, `/`)
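To see what the `sweep()` call computes, here is a small sketch on made-up data. The names `toy`, `tbl_toy`, `xt_toy` and `prop_toy` are assumptions chosen for illustration. Each column of the cross-table is divided by that continent's total, which should give the same result as `prop.table()` with `margin = 2`.

```r
# Made-up data (hypothetical values, not the survey)
toy <- data.frame(
  GenderSelect = c("Male", "Male", "Female", "Male", "Female", "Female", "Female"),
  continent    = c("Asia", "Asia", "Asia", "Europe", "Europe", "Europe", "Europe")
)

tbl_toy  <- table(toy$continent)                      # totals per continent
xt_toy   <- xtabs(~ GenderSelect + continent, toy)    # gender x continent counts
prop_toy <- sweep(xt_toy, 2, tbl_toy, `/`)            # divide each column by its total

# Equivalent formulation: column-wise proportions
prop_alt <- prop.table(xt_toy, margin = 2)

print(round(prop_toy, 2))
```

Rows are ordered alphabetically, so row 1 is "Female" and row 2 is "Male". Indexing by name, for example `prop_toy["Male", ]`, avoids depending on that order.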
Now plot the bars. The labels are the proportions of "Male".
bp_Continent <- barplot(tbl,
main = "Distribucion de DS por continentes",
ylim = c(0, 3500)
)
text(x = bp_Continent, y = tbl,
label = round(prop[2, ], 2),
pos = 3, cex = 0.8, col = "red")
Other labels could be, for instance, these:
sprintf("F: %1.2f/M: %1.2f", prop[1,], prop[2,])
Data creation code.
set.seed(1234)
n <- 5e3
GenderSelect <- c("Male", "Female")
GenderSelect <- sample(GenderSelect, n, TRUE)
continent <- c("Africa", "Americas", "Asia", "Europe", "Oceania")
continent <- sample(continent, n, TRUE, prob = c(1, 20, 14, 16, 2))
Response <- data.frame(GenderSelect, continent)
Related
Error in RStudio barplot
I work for a research lab, and the previous employee left without explaining how her RStudio programs work. I have to adapt one of them to get the participant data for the month of September, but I get two errors. It should be an easy fix, but I can't see it.
I have the parameter days:
dias <- c("01","02","05","06","07","08","09","12","13","14","16","19","20","21","22","23","26","27","28","29","30")
and a part that reads data from an Excel file and then makes a barplot, but I get the error:
Error in barplot.default(Volun_citados_dias_mes, ylim = c(0, 100), names.arg = dias, :
incorrect number of names
In this part for the barplot:
jpeg("x.jpeg", width = 800, height = 800)
text(barplot(Volun_citados_dias_mes, ylim = c(0, 100), names.arg = dias, cex.names = 0.9, col = "khaki", xlab = "Días",
ylab = "Voluntarios", main = "Número de voluntarios citados en Septiembre de 2022"),
Volun_citados_dias_mes + 4, labels = round(Volun_citados_dias_mes, 1))
dev.off()
And here, when writing that data to an Excel file, I get this other error:
Error in data.frame(dias, Volun_citadosSep) :
arguments imply differing number of rows: 21, 2
write.xlsx(data.frame(dias, Volun_citadosSep), "X:/Cohorte Cantabria/2. PROYECTO/INFORMES/Exito reclutamiento/Asistencia/Septiembre/Tablas/VolunCitadosDiasMes.xlsx", row.names = FALSE)
on.exit(par(opar))
dev.off()
I tried changing that 4 (Volun_citados_dias_mes + 4) to 21, but I'm not really sure what else to change. I was expecting a graph with at least the two days added so far, 01 and 02. I would prefer to fix this before adding 20 more.
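Both messages point at a length mismatch: `names.arg` needs one name per bar, and `data.frame()` needs columns of equal length, while `dias` has 21 entries and the counts vector currently has 2. One way to keep the two in step is to derive the counts from `dias` itself. The sketch below assumes a tiny made-up `septiembre_22` and a shortened `dias`; only the two date column names are taken from the code in the question.

```r
# Shortened list of days and a made-up stand-in for septiembre_22 (hypothetical values)
dias <- c("01", "02", "05")
septiembre_22 <- data.frame(
  fecha_extraccion    = c("01/09/2022", "02/09/2022", "02/09/2022", NA),
  fecha_extraccion_vt = c(NA, "01/09/2022", NA, "05/09/2022"),
  stringsAsFactors = FALSE
)

# Build the full date strings from the day numbers
fechas <- paste0(dias, "/09/2022")

# One count per day: rows matching in either date column
Volun_citadosSep <- sapply(fechas, function(f) {
  sum(septiembre_22$fecha_extraccion == f, na.rm = TRUE) +
    sum(septiembre_22$fecha_extraccion_vt == f, na.rm = TRUE)
})
names(Volun_citadosSep) <- dias

# Lengths now agree, so both calls accept their inputs
barplot(Volun_citadosSep, names.arg = dias)
resultado <- data.frame(dias, Volun_citadosSep)
```

With this approach, adding a day to `dias` adds a bar and a row automatically, so no per-day variable has to be written by hand.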
Summarized code:
# "..." marks parts that are omitted
#libraries
library(readxl)
library("stringr")
library("xlsx")
dir <- "**...**"
#days
dias <- c("01","02","05","06","07","08","09","12","13","14","16","19","20","21","22","23","26","27","28","29","30")
#Redcap
df <- file.info(list.files("**...**", pattern = "**...**", full.names = TRUE))
lastFileCallcenter <- rownames(df)[which.max(df$mtime)]
ccList <- read.csv2(lastFileCallcenter, header = TRUE, sep = ",", quote = "\"",
dec = ",", stringsAsFactors = FALSE, encoding = "UTF-8")
colnames(ccList)[[which(str_detect(colnames(ccList), "codigo_registro"))]] <- "codigo_registro"
ccList <- unified_df(ccList)
ccList$codigo_registro <- toupper(ccList$codigo_registro)
ccList$dni_nie <- toupper(ccList$dni_nie)
septiembre_22 <- read.csv2(file.path(dir, "**...**.csv"), header = TRUE, sep = ",",
quote = "\"", dec = ",", stringsAsFactors = FALSE, encoding = "UTF-8")
colnames(septiembre_22)[[which(str_detect(colnames(septiembre_22), "codigo_registro"))]] <- "codigo_registro"
septiembre_22 <- unified_df(septiembre_22)
septiembre_22$codigo_registro <- toupper(septiembre_22$codigo_registro)
septiembre_22$dni_nie <- toupper(septiembre_22$dni_nie)
#-------------------Volunteers days of the month ---------------------------------------------------
septiembre_22_01 <- dim(septiembre_22[which(septiembre_22$fecha_extraccion == "01/09/2022"),])[1] + dim(septiembre_22[which(septiembre_22$fecha_extraccion_vt == "01/09/2022"),])[1]
septiembre_22_02 <- dim(septiembre_22[which(septiembre_22$fecha_extraccion == "02/09/2022"),])[1] + dim(septiembre_22[which(septiembre_22$fecha_extraccion_vt == "02/09/2022"),])[1]
**#...TODO, other days**
Volun_citadosSep <- c(septiembre_22_01, septiembre_22_02,**#...TODO**)
jpeg("x.jpeg", width = 800, height = 800)
text(barplot(Volun_citadosSep, ylim = c(0, 100), names.arg = dias, cex.names = 0.9, col = "khaki", xlab = "Días",
ylab = "Voluntarios", main = "Número de voluntarios citados en Septiembre de 2022"),
Volun_citadosSep + 4, labels = round(Volun_citadosSep, 1))
dev.off()
write.xlsx(data.frame(dias, Volun_citadosSep), "VolunCitadosDiasMes.xlsx", row.names = FALSE)
on.exit(par(opar))
dev.off()
I have several measured values from different sources, and I want to draw an upper and a lower limit around the median of each test ID. The tests are grouped together, as you can see in the picture: each test has about 5 sources, and each source has 3 measured values. I therefore draw one boxplot per source and show the boxplots of the different sources, grouped by test, in one plot. My problem starts when I want to draw a z-score limit over the data. Only one z-score is registered per test, but I would like a limit line that spans all the boxplots of a test, instead of single points that are connected to each other (see the picture).
Here is my code, without the data:
## Libraries call
library(readxl)
require(tidyverse)
require(rlang)
library(dplyr)
require(tidyr)
require(stringr)
require(plotly)
require(ggplot2)
require(matrixStats)
require(openxlsx)
############################
# Source comparison functions
############################
# Build Mean and Median
df$Mean = rowMeans(as.matrix(df[,c(6,7,8)]),na.rm = TRUE)
df$Median = rowMedians(as.matrix(df[,c(6,7,8)]),na.rm = TRUE)
# summarize for TestID
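df_sum <- df %>%
group_by(TestID) %>%
summarise(Max=max(Mean)
,Min=min(Mean)
,Median=median(Median)
,Std=sd(Mean)
,Mad=mad(Mean)
,z_limit_std=2*Std
,z_limit_mad=2*Mad
# Mean is computed last: summarise() lets later expressions see earlier results,
# so overwriting Mean first would make max/min/sd/mad work on the group mean
,Mean=mean(Mean)
)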
# Merge the summary with the DLG data
df_Median<- df[,c('TestID','Median')]
df_sum_Median <- df_Median%>% group_by(TestID)%>% summarise(Median=median(Median))
df = merge(x = df, y = df_sum, by = "TestID")
############################
#Box Plot
############################
Plot_Data_df <- data.frame(df$TestID
,df$`measured_value 1`
,df$`measured_value 2`
,df$`measured_value 3`
,df$Median.y
,df$z_limit_std)
# Reshape the data into a single column: stack the three measured values (the subset data contains NA)
dfboxplot <- data.frame(TestID = rep(paste0(Plot_Data_df$df.TestID, '_Test'), 3)
,measured_value = c(Plot_Data_df$df..measured_value.1.,
Plot_Data_df$df..measured_value.2.,
Plot_Data_df$df..measured_value.3.)
,Median = rep(Plot_Data_df$df.Median.y, 3)
,z_limit = rep(Plot_Data_df$df.z_limit_std, 3)
)
dfboxplot$lower_limit <- dfboxplot$Median - dfboxplot$z_limit
dfboxplot$upper_limit <- dfboxplot$Median + dfboxplot$z_limit
plot <-plot_ly(dfboxplot, x = ~TestID, y = ~measured_value , color = ~Lab, type = "box",inherit=FALSE) %>%
layout(boxmode = "group",
xaxis = list(title='Test ID'),
yaxis = list(title= ' measured_value'))%>%
plotly::add_lines(data = dfboxplot # add the median line
,y= ~Median
,x= ~TestID
,type = 'scatter'
,mode = 'lines'
,showlegend = FALSE
,line = list(color = 'rgb(0, 0, 0)',
width = 1)
,name = 'Median'
)%>% plotly::add_lines(data = dfboxplot # add the upper limit line
,y= ~upper_limit
,x= ~TestID
,type = 'scatter'
,mode = 'lines'
,showlegend = FALSE
,line = list(color = 'rgb(200, 0, 0)',
width = 1)
,name = 'upper limit'
)
plot
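A detail worth keeping in mind for the `summarise()` step: dplyr evaluates the summaries in order, and a later expression sees any column that an earlier one has already overwritten. So whether `Mean = mean(Mean)` comes before or after `max(Mean)`, `sd(Mean)` and the others changes the result. A toy sketch (the data frame `toy` is made up for illustration):

```r
library(dplyr)

# Made-up data: one test with three row means (hypothetical values)
toy <- data.frame(TestID = c("A", "A", "A"), Mean = c(1, 2, 6))

# Mean overwritten first: max() then sees the single group mean
mean_first <- toy %>%
  group_by(TestID) %>%
  summarise(Mean = mean(Mean), Max = max(Mean))

# Mean overwritten last: max() still sees the original three values
mean_last <- toy %>%
  group_by(TestID) %>%
  summarise(Max = max(Mean), Mean = mean(Mean))

print(mean_first)
print(mean_last)
```

In the first version `Max` equals the group mean, and a standard deviation computed the same way would be `NA`, which in turn would make limits such as `2 * Std` unusable.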
I'm working on a bubble map for which I generated two columns: one with a color id (column Color) and one with the text that the id refers to (column Class). This is a classification of my individuals (each Color always belongs to one Class).
Class is a factor with a specific level order, which I set with:
COME1039$Class <- factor(COME1039$Class, levels = c('moins de 100 000 F.CFP',
'entre 100 000 et 5 millions F.CFP',
'entre 5 millions et 1 milliard F.CFP',
'entre 1 milliard et 20 milliards F.CFP',
'plus de 20 milliards F.CFP'))
This is my code
g <- list(
scope = 'world',
visible = F,
showland = TRUE,
landcolor = toRGB("#EAECEE"),
showcountries = T,
countrycolor = toRGB("#D6DBDF"),
showocean = T,
oceancolor = toRGB("#808B96")
)
COM.g1 <- plot_geo(data = COME1039,
sizes = c(1, 700))
COM.g1 <- COM.g1 %>% add_markers(
x = ~LONGITUDE,
y = ~LATITUDE,
name = ~Class,
size = ~`Poids Imports`,
color = ~Color,
colors=c(ispfPalette[c(1,2,3,7,6)]),
text=sprintf("<b>%s</b> <br>Poids imports: %s tonnes<br>Valeur imports: %s millions de F.CFP",
COME1039$NomISO,
formatC(COME1039$`Poids Imports`/1000,
small.interval = ",",
digits = 1,
big.mark = " ",
decimal.mark = ",",
format = "f"),
formatC(COME1039$`Valeur Imports`/1000000,
small.interval = ",",
digits = 1,
big.mark = " ",
decimal.mark = ",",
format = "f")),
hovertemplate = "%{text}<extra></extra>"
)
COM.g1 <- COM.g1%>% layout(geo=g)
COM.g1 <- COM.g1%>% layout(dragmode=F)
COM.g1 <- COM.g1 %>% layout(showlegend=T)
COM.g1 <- COM.g1 %>% layout(legend = list(title=list(text='Valeurs des importations<br>'),
orientation = "h",
itemsizing='constant',
x=0,
y=0)) %>% hide_colorbar()
COM.g1
Unfortunately, my data are too big to include here, but this is the output I get:
As you can see, the order of the legend does not follow the order of the factor levels. How can I make it do so? If data are needed to give me a hint, I will try to reduce their size.
Many thanks!
Plotly is going to alphabetize your legend and you have to 'make' it listen. The order of the traces in your plot is the order in which the items appear in your legend. So if you rearrange the traces in the object, you'll rearrange the legend.
I don't have your data, so I used some data from rnaturalearth.
First I created a plot, using plot_geo. Then I used plotly_build() to make sure I had the trace order in the Plotly object. I used lapply to investigate the current order of the traces. Then I created a new order, rearranged the traces, and plotted it again.
The initial plot and build.
library(tidyverse)
library(plotly)
library(rnaturalearth)
canada <- ne_states(country = "Canada", returnclass = "sf")
x = plot_geo(canada, sizes = c(1, 700)) %>%
add_markers(x = ~longitude, y = ~latitude,
name = ~name, color = ~name)
x <- plotly_build(x) # capture all elements of the object
Now for the investigation; this is more so you can see how this all comes together.
# what order are they in?
y = vector()
invisible(
lapply(1:length(x$x$data),
function(i) {
z <- x$x$data[[i]]$name
message(i, " ", z)
})
)
# 1 Alberta
# 2 British Columbia
# 3 Manitoba
# 4 New Brunswick
# 5 Newfoundland and Labrador
# 6 Northwest Territories
# 7 Nova Scotia
# 8 Nunavut
# 9 Ontario
# 10 Prince Edward Island
# 11 Québec
# 12 Saskatchewan
# 13 Yukon
In your question, you show that you made the legend element a factor. That's what I've done as well with this data.
can2 = canada %>%
mutate(name = ordered(name,
levels = c("Manitoba", "New Brunswick",
"Newfoundland and Labrador",
"Northwest Territories",
"Alberta", "British Columbia",
"Nova Scotia", "Nunavut",
"Ontario", "Prince Edward Island",
"Québec", "Saskatchewan", "Yukon")))
I used the data to reorder the traces in my Plotly object. This creates a vector. It starts with the levels and their row number or order (1:13). Then I alphabetized the data by the levels (so it matches the current order in the Plotly object).
The output of this set of function calls is a vector of numbers (e.g., 5, 6, 1, etc.). Since I have 13 names, I use 1:13. You could also make it dynamic with 1:length(levels(can2$name)).
# capture order
df1 = data.frame(who = levels(can2$name), ord = 1:13) %>%
arrange(who) %>% select(ord) %>% unlist()
Now all that's left is to rearrange the object traces and visualize it.
x$x$data = x$x$data[order(c(df1))] # reorder the traces
x # visualize
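The reordering step can also be written with `match()`, which looks up each wanted name among the current trace names. The sketch below uses a plain list as a stand-in for `x$x$data`, so it runs without plotly. The list `traces` and the vector `wanted` are assumptions for illustration; in the real case `wanted` would be `levels(can2$name)`.

```r
# Stand-in for x$x$data: one list element per trace, in alphabetical order
traces <- list(
  list(name = "Alberta"),
  list(name = "Manitoba"),
  list(name = "Yukon")
)

# Desired legend order (would be the factor levels in the real plot)
wanted <- c("Manitoba", "Yukon", "Alberta")

# Names of the traces as they currently appear
trace_names <- vapply(traces, function(tr) tr$name, character(1))

# Position of each wanted name among the traces, then reorder
traces <- traces[match(wanted, trace_names)]

new_order <- vapply(traces, function(tr) tr$name, character(1))
print(new_order)
```

This version does not depend on the traces being in alphabetical order, although it does assume that every level has exactly one trace with a matching name.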
Originally:
With reordered traces:
This question already has an answer here:
How should I deal with "'someFunction' is not an exported object from 'namespace:somePackage'" error? [closed]
(1 answer)
Closed 2 years ago.
My code produces this error, but I never call "overlay" myself; maybe a library function is calling it.
Code:
d.mle=likfit(P, ini.cov.pars = c(1,30), cov.model = 'matern', kappa = 0.5)
d.mle
Xb = c(1, size, size, 1)
Yb = c(1, 1, size, size)
bordas = cbind(Xb, Yb)
polygon(bordas)
Ap = matrix(apply(bordas, 2, range))
gr <- expand.grid(x = seq(Ap[1, ], Ap[2, ], by = 1), y = seq(Ap[3, ], Ap[4, ], by = 1))
require(splancs)
gi <- polygrid(gr, borders = bordas) # delimit the area for interpolation
points(gi, pch = "+", col = 2)
KC = krige.control(obj = d.mle, type.krige = "ok", lam = 1)
d.k = krige.conv(P, loc = gr, krige = KC) # perform the interpolation by kriging
valores_preditos = d.k$predict
Ze = matrix(valores_preditos, size, size) # turn the predicted values into a matrix
plot(Ze)
plot(image(X, Y, Ze, col = gray((0 : 4) / 4), breaks = c(a., b., c., d., e., f.)))
If you do this:
??overlay
... you should get a list of all the functions in packages that mention the word "overlay". When I do it, I see two functions with that name but I strongly suspect that it is the raster-package's version that is expected by the code you are using. So do this:
install.packages('raster')
library(raster)
#re-run code
I just discovered rCharts and googleVis, and I want to thank the developers for their work.
My problem is simple: how can I use a variable as the axis label in nPlot?
I would also like to know whether nPlot can take variables like the sizevar and colorvar of gvisBubbleChart.
Thank you.
library(rCharts)
VehiculeFunction <- function(data, gamme, absciss, ordinate){
# Aim: Visualize the data as scatter plots crossing
# GMF*Cost, Ratio K * Ratio Cost, or GMF*Ratio, for
# the chosen range (gamme)
# Input: data.frame containing, among others, GAMME, PROJET, PERIMETRE, NITG, GMF.24,
# Cout.24 and libele
# Output: Plot of the chosen crossing, with the label attached
# to the point one wants to identify
if(absciss == "GMF.24"){
my.data <- data[data$RANG_NITG_PROJET_K %in% c(1, 2, 3),]
} else if(absciss == "Ratio.K") {
my.data <- data[data$RANG_NITG_PROJET_C %in% c(1, 2, 3),]
}
my.data2 <- my.data[my.data$GAMME == gamme,]
X <- my.data2[[absciss]]
Y <- my.data2[[ordinate]]
SIZEVAR <- my.data2$Ratio.K
df <- data.frame(X,Y,SIZEVAR)
plot <- nPlot(x = "X", y = "Y", size = "SIZEVAR", data = df, type = "scatterChart")
plot$xAxis(axisLabel = 'X')
plot
}
VehiculeFunction(data.vehicule, gamme = "M1", "GMF.24", "Cout.24")
Usage:
gvisBubbleChart(data, idvar = "", xvar = "", yvar = "",
colorvar = "", sizevar = "",
options = list(), chartid)
Example:
## Set color and size
bubble2 <- gvisBubbleChart(Fruits, idvar="Fruit", xvar="Sales", yvar="Expenses",
colorvar="Location", sizevar="Profit",
options=list(hAxis='{minValue:75, maxValue:125}'))
SEE http://www.inside-r.org/packages/cran/googleVis/docs/gvisBubbleChart