I'm trying to build a grouped bar chart in R with plotly, using the data frame pasted below. The problem is that the values on the y-axis are out of order: they do not increase from bottom to top. I've also posted an image of the resulting chart.
Can someone please point out where I'm going wrong?
Dataframe
chart.supp.part.defect.matrix
Supplier PaintMarking45 Seal78 AirConditioning57 Engine34 CargoCompartment543 Insulation11
1 HJRU 8 <NA> <NA> 1 <NA> <NA>
2 DJDU <NA> 1 <NA> <NA> <NA> <NA>
3 DEF7 <NA> 3 54 <NA> <NA> <NA>
4 A23 <NA> <NA> <NA> 7 <NA> <NA>
5 A52 3 <NA> <NA> <NA> 2 <NA>
6 FJUE 65 <NA> 1 <NA> <NA> 11
7 A31 <NA> 1 5 <NA> <NA> <NA>
8 DJHD <NA> <NA> <NA> <NA> <NA> <NA>
9 A38 4 <NA> 22 <NA> <NA> <NA>
Code to build chart
title <- paste( "Supplier vs Defect")
p3 <- plot_ly(chart.supp.part.defect.matrix, x = ~Supplier, y = ~PaintMarking45, type = 'bar', name = 'Paint/Marking-45') %>%
add_trace(y = ~Seal78,name = 'Seal-78') %>%
add_trace(y = ~AirConditioning57,name = 'Air conditioning - 57') %>%
add_trace(y = ~Engine34,name = 'Engine-34') %>%
add_trace(y = ~CargoCompartment543,name = 'Cargo compartment-543') %>%
add_trace(y = ~Insulation11 ,name = 'Insulation -11') %>%
add_trace(y = ~Insulation6,name = 'Insulation-6') %>%
add_trace(y = ~Engine11,name = 'Engine-11') %>%
add_trace(y = ~Propulsion32,name = 'Propulsion-32') %>%
layout(yaxis = list(title = 'Defect Count'), barmode = 'group') %>%
layout(title = title)
ggplotly(p3)
Chart
Edit
dput(chart.supp.part.defect.matrix)
structure(list(Supplier = structure(c(9L, 6L, 5L, 1L, 4L, 8L,
2L, 7L, 3L), .Label = c(" A23", " A31", " A38", " A52", " DEF7",
"DJDU", "DJHD", "FJUE", "HJRU"), class = "factor"), PaintMarking45 = structure(c(4L,
NA, NA, NA, 1L, 3L, NA, NA, 2L), .Label = c("3", "4", "65", "8"
), class = "factor"), Seal78 = structure(c(NA, 1L, 2L, NA, NA,
NA, 1L, NA, NA), .Label = c("1", "3"), class = "factor"), AirConditioning57 = structure(c(NA,
NA, 4L, NA, NA, 1L, 3L, NA, 2L), .Label = c("1", "22", "5", "54"
), class = "factor"), Engine34 = structure(c(1L, NA, NA, 2L,
NA, NA, NA, NA, NA), .Label = c("1", "7"), class = "factor"),
CargoCompartment543 = structure(c(NA, NA, NA, NA, 1L, NA,
NA, NA, NA), .Label = "2", class = "factor"), Insulation11 = structure(c(NA,
NA, NA, NA, NA, 1L, NA, NA, NA), .Label = "11", class = "factor"),
Insulation6 = structure(c(NA, NA, NA, NA, NA, NA, 1L, NA,
NA), .Label = "7", class = "factor"), Engine11 = structure(c(NA,
NA, NA, NA, NA, NA, 2L, 1L, NA), .Label = c("54", "8"), class = "factor"),
Propulsion32 = structure(c(NA, NA, NA, NA, NA, NA, NA, NA,
1L), .Label = "2", class = "factor")), .Names = c("Supplier",
"PaintMarking45", "Seal78", "AirConditioning57", "Engine34",
"CargoCompartment543", "Insulation11", "Insulation6", "Engine11",
"Propulsion32"), row.names = c(NA, -9L), class = "data.frame")
In addition to Adam Spannbauer's approach, you can also force plotly to interpret the data as numbers by setting the y-axis type to linear:
layout(yaxis=list(type='linear'))
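As @neilfws mentioned in a comment, the issue is that your y-axis data is stored as factors. You can fix this when reading the data (as @neilfws suggested) or coerce the columns to numeric before plotting. Below is how to do the latter. Note that the conversion goes through as.character(), because as.numeric() applied directly to a factor returns the internal level codes rather than the displayed values.
chart.supp.part.defect.matrix[,2:10] <- lapply(chart.supp.part.defect.matrix[,2:10], function(x) as.numeric(as.character(x)))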
p3 <- plot_ly(chart.supp.part.defect.matrix, x = ~Supplier, y = ~PaintMarking45, type = 'bar', name = 'Paint/Marking-45') %>%
add_trace(y = ~Seal78,name = 'Seal-78') %>%
add_trace(y = ~AirConditioning57,name = 'Air conditioning - 57') %>%
add_trace(y = ~Engine34,name = 'Engine-34') %>%
add_trace(y = ~CargoCompartment543,name = 'Cargo compartment-543') %>%
add_trace(y = ~Insulation11 ,name = 'Insulation -11') %>%
add_trace(y = ~Insulation6,name = 'Insulation-6') %>%
add_trace(y = ~Engine11,name = 'Engine-11') %>%
add_trace(y = ~Propulsion32,name = 'Propulsion-32') %>%
layout(yaxis = list(title = 'Defect Count'), barmode = 'group') %>%
layout(title = title)
p3
Additionally, you don't need to call ggplotly in this case. That function is only needed when you want to build your plot using ggplot2 and then add plotly's interactivity to the ggplot object.
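The factor problem can be reproduced in base R without plotly. The sketch below uses a made-up toy column (`paint` is an illustrative name, not part of the original data) to show the difference between converting a factor directly and converting it through character:

```r
# Toy factor column, similar in spirit to PaintMarking45 in the question
paint <- factor(c("8", NA, "3", "65", "4"))

# as.numeric() on a factor returns the internal level codes, not the labels
codes <- as.numeric(paint)

# Going through as.character() recovers the numbers that were displayed
values <- as.numeric(as.character(paint))

print(codes)   # level codes (positions in levels(paint))
print(values)  # the actual defect counts
```

The levels of a factor are sorted as text, which is also why a categorical y-axis built from them does not follow numeric order.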
I have a table with 5 columns (ID, var, state, loc and position). The var column contains a description of a certain variant, e.g. var1. The table contains multiple rows for var1, but they differ in state and position. I want to make a new table in which each var appears only once and the positions are split into two columns based on state.
For example, say I have four var1 rows: two with the state H and two with the state h. In the new table I need the columns to be ID, var, loc, position if H, and position if h, so that all the information for var1 is in one row. I need to be able to do this for every variant in my original data set.
Current data example
structure(list(ID = c(1234L, 1234L, 1234L, 1234L, 5678L, 5678L,
NA, NA, NA, NA), var = c("var1", "var1", "var1", "var1", "var2",
"var2", NA, NA, NA, NA), state = c("H", "H", "h", "h", "H", "h",
NA, NA, NA, NA), loc = c(4L, 4L, 4L, 4L, 12L, 12L, NA, NA, NA,
NA), position = c(6000L, 6002L, 6004L, 6006L, 3002L, 3004L, NA,
NA, NA, NA)), row.names = c("1", "2", "3", "4", "5", "6", "NA",
"NA.1", "NA.2", "NA.3"), class = "data.frame")
wanted format
structure(list(V1 = c("ID", "1234", "5678", NA, NA, NA, NA, NA,
NA, NA), V2 = c("var1", "var1", "var2", NA, NA, NA, NA, NA, NA,
NA), V3 = c("loc", "4", "12", NA, NA, NA, NA, NA, NA, NA), V4 = c("state H",
"6000 6002", "3002", NA, NA, NA, NA, NA, NA, NA), V5 = c("state h",
"6004 6006", "3004", NA, NA, NA, NA, NA, NA, NA)), row.names = c("1",
"2", "3", "NA", "NA.1", "NA.2", "NA.3", "NA.4", "NA.5", "NA.6"
), class = "data.frame")
Any guidance would be appreciated.
The answer to your question most likely involves tidyr::pivot_wider.
I changed the example data because I believe yours was inconsistent.
Data
df<-structure(list(ID = c(1234L, 1234L, 1234L, 1234L, 5678L, 5678L
), var = c("var1", "var1", "var1", "var1", "var2", "var2"), state = c("H",
"H", "h", "h", "H", "h"), loc = c(4L, 4L, 4L, 4L, 12L, 12L),
position = c(6000L, 6002L, 6004L, 6006L, 3002L, 3004L)), row.names = c("1",
"2", "3", "4", "5", "6"), class = "data.frame")
df
ID var state loc position
1 1234 var1 H 4 6000
2 1234 var1 H 4 6002
3 1234 var1 h 4 6004
4 1234 var1 h 4 6006
5 5678 var2 H 12 3002
6 5678 var2 h 12 3004
Answer
library(tidyr)
df %>% pivot_wider(names_from = state,
values_from = position,
values_fn = toString)
# A tibble: 2 × 5
ID var loc H h
<int> <chr> <int> <chr> <chr>
1 1234 var1 4 6000, 6002 6004, 6006
2 5678 var2 12 3002 3004
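If tidyr is not available, roughly the same reshaping can be sketched in base R. This is an alternative to the pivot_wider call above, not the answer's method: aggregate() collapses the positions per group with toString, and reshape() spreads the states into columns. The resulting column names (`position.H`, `position.h`) come from reshape()'s defaults.

```r
# Same cleaned example data as in the answer
df <- data.frame(
  ID = c(1234L, 1234L, 1234L, 1234L, 5678L, 5678L),
  var = c("var1", "var1", "var1", "var1", "var2", "var2"),
  state = c("H", "H", "h", "h", "H", "h"),
  loc = c(4L, 4L, 4L, 4L, 12L, 12L),
  position = c(6000L, 6002L, 6004L, 6006L, 3002L, 3004L),
  stringsAsFactors = FALSE
)

# Step 1: one row per ID/var/loc/state, positions joined into one string
agg <- aggregate(position ~ ID + var + loc + state, data = df, FUN = toString)

# Step 2: spread the state values into separate position columns
wide <- reshape(agg,
                idvar = c("ID", "var", "loc"),
                timevar = "state",
                direction = "wide")
print(wide)
```

The order of the two position columns may depend on the locale's sort order for "H" and "h", so it is safer to refer to them by name.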
I'm running the igraph package for some network analysis on this example dataset
structure(list(ï..Column1 = c(NA, NA, NA, NA), Column2 = c(NA,
NA, NA, NA), Column3 = c(NA, NA, NA, NA), Column4 = c(NA, NA,
NA, NA), Column5 = structure(c(2L, 1L, 4L, 3L), .Label = c("Eric ",
"Jim", "Matt", "Tim"), class = "factor"), Column6 = c(NA, NA,
NA, NA), Column7 = structure(c(1L, 3L, 2L, 3L), .Label = c("Eric",
"Erica", "Mary "), class = "factor"), Column8 = structure(c(3L,
2L, 1L, 3L), .Label = c("Beth", "Loranda", "Matt"), class = "factor"),
Column9 = structure(c(2L, 3L, 1L, 3L), .Label = c("Courtney ",
"Heather ", "Patrick"), class = "factor"), Column10 = structure(4:1, .Label = c("Beth",
"Heather", "John", "Loranda "), class = "factor"), Column11 = c(NA,
NA, NA, NA), Column12 = c(NA, NA, NA, NA), Column13 = c(NA,
NA, NA, NA), Column14 = c(NA, NA, NA, NA), Column15 = c(NA,
NA, NA, NA)), class = "data.frame", row.names = c(NA, -4L
))
Here is the edgelist for anyone who wants to skip the step of finding that
structure(c("Jim", "Eric ", "Tim", "Matt", "Jim", "Eric ", "Tim",
"Matt", "Jim", "Eric ", "Tim", "Matt", "Jim", "Eric ", "Tim",
"Matt", "Eric", "Mary ", "Erica", "Mary ", "Matt", "Loranda",
"Beth", "Matt", "Heather ", "Patrick", "Courtney ", "Patrick",
"Loranda ", "John", "Heather", "Beth"), .Dim = c(16L, 2L), .Dimnames = list(
NULL, c("Column5", "value")))
I'm trying to calculate centrality for each of the nodes in the network using this code (mat is my edgelist matrix)
g1=graph_from_edgelist(mat)
degree.cent <- centr_degree(g1, mode = "all")
degree.cent
My output is something like this
> degree.cent
$`res`
[1] 4 1 4 2 4 1 6 1 2 1 2 1 1 1 1
$centralization
[1] 0.1479592
$theoretical_max
[1] 392
I know `degree.cent$res` holds my centrality scores, but what isn't clear to me is which node each score belongs to. I looked up a tutorial here, but all it says is that the first score is "node 1". There's no indication of what node 1 is, or an easy way to identify it.
Firstly, you are getting incorrect results because some of the names contain trailing spaces ("Eric ", "Mary ", "Heather ", ...), so the same person ends up as two different vertices. So, let
mat <- gsub(" ", "", mat)
g1 <- graph_from_edgelist(mat)
degree.cent <- centr_degree(g1, mode = "all")
Now we may extract the corresponding names of vertices and combine them with your result:
setNames(degree.cent$res, V(g1)$name)
# Jim Eric Mary Tim Erica Matt Loranda Beth Heather
# 4 5 2 4 1 6 2 2 2
# Patrick Courtney John
# 2 1 1
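As a cross-check that does not need igraph, the same degrees can be sketched in base R by counting how often each cleaned name appears as an edge endpoint. This is only a sanity check of the result above, under the assumption that an undirected "all" degree is wanted; a self-loop such as Matt to Matt contributes two endpoints, which matches the degree shown.

```r
# Edgelist from the question (column-major: 16 sources, then 16 targets)
mat <- matrix(
  c("Jim", "Eric ", "Tim", "Matt", "Jim", "Eric ", "Tim", "Matt",
    "Jim", "Eric ", "Tim", "Matt", "Jim", "Eric ", "Tim", "Matt",
    "Eric", "Mary ", "Erica", "Mary ", "Matt", "Loranda", "Beth", "Matt",
    "Heather ", "Patrick", "Courtney ", "Patrick",
    "Loranda ", "John", "Heather", "Beth"),
  ncol = 2,
  dimnames = list(NULL, c("Column5", "value"))
)

# Strip stray whitespace, then count every appearance as an endpoint
ends <- trimws(c(mat[, 1], mat[, 2]))
deg <- table(ends)

# Highest-degree nodes first
print(sort(deg, decreasing = TRUE))
```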
I would like to be able to create a new variable based on specific values in two existing variables. My dataframe looks like:
structure(list(id = structure(c(1L, 2L, 3L, NA, NA, NA), .Label = c("blue",
"red", "yellow"), class = "factor"), value = c(-4.3, -2.5, -3.6,
NA, NA, NA)), .Names = c("id", "value"), row.names = c(NA, -6L
), class = "data.frame")
I would like to create a new column that contains only those values that pertain to blue (e.g., -4.3). All other values would result in NA, like so:
structure(list(id = structure(c(1L, 2L, 3L, NA, NA, NA), .Label = c("blue",
"red", "yellow"), class = "factor"), value = c(-4.3, -2.5, -3.6,
NA, NA, NA), newvalue = c(-4.3, NA, NA, NA, NA, NA)), .Names = c("id",
"value", "newvalue"), row.names = c(NA, -6L), class = "data.frame")
I tried the following:
b1 <- dat$id=="blue"
dat$newvalue <- dat$value[b1]
But that filled every cell in the new column with the same value (-4.3).
Due to the presence of NAs, assigning values directly by indexing becomes tricky. We can use replace instead, setting every value whose id is not "blue" to NA.
dat$newvalue <- replace(dat$value, dat$id != "blue", NA)
dat
# id value newvalue
#1 blue -4.3 -4.3
#2 red -2.5 NA
#3 yellow -3.6 NA
#4 <NA> NA NA
#5 <NA> NA NA
#6 <NA> NA NA
The equivalent ifelse statement would be :
dat$newvalue <- ifelse(dat$id != "blue", NA, dat$value)
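If you still prefer plain index assignment, one NA-safe sketch is to wrap the comparison in which(), which drops the NA comparisons and keeps only the positions where the test is TRUE. The name `is_blue` is just illustrative.

```r
# Data frame from the question
dat <- data.frame(
  id = factor(c("blue", "red", "yellow", NA, NA, NA)),
  value = c(-4.3, -2.5, -3.6, NA, NA, NA)
)

# Start with an all-NA numeric column
dat$newvalue <- NA_real_

# which() returns only the row numbers where id is "blue"; NA ids are skipped
is_blue <- which(dat$id == "blue")
dat$newvalue[is_blue] <- dat$value[is_blue]

print(dat)
```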
I have variables with the names "VA01_01", "VA01_02" etc. and "VA02_01", "VA02_02". Those variables with the prefix VA01 are data from female participants, those with the prefix VA02 are from male participants. Male participants, for example, have NAs in the variables VA01. I already have a factor with values for sex.
What I'd like to do is create a new set of variables that take over the values from both variable types. That is, if it's a male participant, he gets the values of the VA02 variables in that set of variables. So the new set of variables won't have any NAs any more because it won't be based on sex.
Does anyone have a simple solution for that question? I don't know if reshape is the answer because I don't really want to transform my data frame into long format.
Here is how it looks at the beginning:
structure(list(sex = structure(c(1L, 2L, 1L, 2L), .Label = c("female",
"male"), class = "factor"), VA01_01 = c(1, NA, 2, NA), VA01_02 = c(4,
NA, 4, NA), VA02_01 = c(NA, 3, NA, 4), VA02_02 = c(NA, 5, NA,
3)), .Names = c("sex", "VA01_01", "VA01_02", "VA02_01", "VA02_02"
), row.names = c(NA, -4L), class = "data.frame")
And here at the end (I'd like to keep the original variables):
structure(list(sex = structure(c(1L, 2L, 1L, 2L), .Label = c("female",
"male"), class = "factor"), VA_tot_01 = c(1, 3, 2, 4), VA_tot_02 = c(4,
5, 4, 3), VA01_01 = c(1, NA, 2, NA), VA01_02 = c(4, NA, 4, NA
), VA02_01 = c(NA, 3, NA, 4), VA02_02 = c(NA, 5, NA, 3)), .Names = c("sex",
"VA_tot_01", "VA_tot_02", "VA01_01", "VA01_02", "VA02_01", "VA02_02"
), row.names = c(NA, -4L), class = "data.frame")
Since the VA01 and VA02 variables never overlap (each row has values in only one of the two sets), you could simply create new variables VA_tot_xx that take the non-missing value from either. It would be something like this:
new_vars <- function(df) {
vars <- unique(gsub(
pattern = ".*_",
replacement = "_",
x = grep(
pattern = "_[0-9]{2}$",
x = names(df),
value = TRUE
)
))
for (i in vars) {
new_name <- paste0("VA_tot", i)
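female_name <- paste0("VA01", i)  # VA01_* columns hold the female participants' data
male_name <- paste0("VA02", i)  # VA02_* columns hold the male participants' data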
df[[new_name]] <- NA
df[[new_name]][!is.na(df[[female_name]])] <-
df[[female_name]][!is.na(df[[female_name]])]
df[[new_name]][!is.na(df[[male_name]])] <-
df[[male_name]][!is.na(df[[male_name]])]
}
return(df)
}
It could probably get prettier than this, but this does the job.
c <- structure(
list(
sex = structure(
c(1L, 2L, 1L, 2L),
.Label = c("female", "male"),
class = "factor"
),
VA01_01 = c(1, NA, 2, NA),
VA01_02 = c(4, NA, 4, NA),
VA02_01 = c(NA, 3, NA, 4),
VA02_02 = c(NA, 5, NA, 3)
),
.Names = c("sex", "VA01_01", "VA01_02", "VA02_01", "VA02_02"),
row.names = c(NA, -4L),
class = "data.frame"
)
new_vars(c)
# sex VA01_01 VA01_02 VA02_01 VA02_02 VA_tot_01 VA_tot_02
# 1 female 1 4 NA NA 1 4
# 2 male NA NA 3 5 3 5
# 3 female 2 4 NA NA 2 4
# 4 male NA NA 4 3 4 3
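A more compact sketch of the same idea uses ifelse() to take the VA01 value where it is present and the VA02 value otherwise. It assumes, as in the example data, that every VA01_xx column has a matching VA02_xx column and that at most one of the two is non-missing per row. The names `suffixes`, `female` and `male` are illustrative.

```r
df <- data.frame(
  sex = factor(c("female", "male", "female", "male")),
  VA01_01 = c(1, NA, 2, NA),
  VA01_02 = c(4, NA, 4, NA),
  VA02_01 = c(NA, 3, NA, 4),
  VA02_02 = c(NA, 5, NA, 3)
)

# Item suffixes such as "_01", "_02", taken from the VA01 columns
suffixes <- sub("^VA01", "", grep("^VA01_", names(df), value = TRUE))

for (s in suffixes) {
  female <- df[[paste0("VA01", s)]]
  male <- df[[paste0("VA02", s)]]
  # Use the female column where it has a value, otherwise the male column
  df[[paste0("VA_tot", s)]] <- ifelse(is.na(female), male, female)
}

print(df)
```

The original VA01 and VA02 columns are left untouched; the new columns are appended at the end.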
I have data that is organized like below M1 - M4, and I use the code from here to generate M_NEW:
M1 M2 M3 M4 M_NEW
1 1,2 0 1 1
3,4 3,4 1,2,3,4 4 3,4
NA NA 1 2 NA
It looks for numbers that occur at least a specified number of times across the four columns and reports those numbers in M_NEW. Now, I would like to append the numbers 0 and 21 to each of the observations, unless that observation is NA. However, so far I have been unable to paste 0 and 21 onto the observations without also pasting them onto the NA values. The desired output is included in df below as M_NEW1. How can this be accomplished? It appears that I am missing something with paste here.
# sample data
df <- structure(list(M1 = structure(c(3L, 4L, 2L, 2L, 1L, 5L, NA, 6L
), .Label = c("0", "1", "1,2", "1,2,3,4", "1,2,3,4,5", "3,4,5,6,7"
), class = "factor"), M2 = structure(c(3L, NA, 2L, 2L, 1L, 4L,
NA, 5L), .Label = c("0", "1,2", "1,2,3,4,5", "4,5,6", "4,5,6,7,8,9,10,11,12,13,14"
), class = "factor"), M3 = structure(c(3L, NA, 1L, 1L, 1L, 2L,
NA, 4L), .Label = c("0", "1,2,3,4", "1,2,3,4,5", "1,2,3,4,5,6,7,8"
), class = "factor"), M4 = structure(c(3L, NA, 1L, 2L, 1L, 5L,
NA, 4L), .Label = c("0", "1", "1,2,3,4,5,6", "1,2,3,4,5,6,7,8,9,10,11,12",
"4,5"), class = "factor"), M_NEW1 = structure(c(3L, NA, 1L, 2L,
1L, 5L, NA, 4L), .Label = c("0,21", "1,0,21", "1,2,3,4,5,0,21",
"3,4,5,6,7,8,0,21", "4,5,0,21"), class = "factor")), .Names = c("M1",
"M2", "M3", "M4", "M_NEW1"), class = "data.frame", row.names = c(NA,
-8L))
# function slightly modified from https://stackoverflow.com/a/23203159/1670053
f <- function(x, n=3) {
tab <- table(strsplit(paste(x, collapse=","), ","))
res <- paste(names(tab[which(tab >= n)]), collapse=",")
return(ifelse(is.na(res), NA, ifelse(res == 0, "0,21", paste(res,",0,21",sep=""))))
#return(ifelse(is.na(res), ifelse(res == 0, "0,21", NA), paste(res,",0,21",sep=""))) #https://stackoverflow.com/a/17554670/1670053
#return(ifelse(is.na(res), NA, ifelse(res == 0, "0,21", paste(na.omit(res),",0,21",sep=""))))
#return(ifelse(is.na(res), as.character(NA), ifelse(res == 0, "0,21", paste(res,",0,21",sep=""))))
}
df$M_NEW2 <- apply(df[, 1:4], 1, f)
You can add another if else statement - rather inelegant but gets you there.
f2 <- function(x, n=3) {
tab <- table(strsplit(paste(x, collapse=","), ","))
res <- paste(names(tab[which(tab >= n)]), collapse=",")
res <- ifelse(res %in% c("0", ""), "0,21", res)
if(res %in% c("NA","0,21")) res else paste(res, "0,21", sep=",")
}
apply(df[1:4], 1, f2)
# "1,2,3,4,5,0,21" "NA" "0,21" "1,0,21" "0,21" "4,5,0,21" "NA"
# "3,4,5,6,7,8,0,21"
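Note that the result above contains the string "NA" rather than a real missing value. A variant that returns a true NA could be sketched as follows. The function name `f3` and its rule (return NA when at least n of the columns are missing, and never repeat a 0 that is already appended) are assumptions about the intended behaviour, chosen to reproduce M_NEW1 from the question.

```r
# Same values as the question's sample data, stored as character
df <- data.frame(
  M1 = c("1,2", "1,2,3,4", "1", "1", "0", "1,2,3,4,5", NA, "3,4,5,6,7"),
  M2 = c("1,2,3,4,5", NA, "1,2", "1,2", "0", "4,5,6", NA,
         "4,5,6,7,8,9,10,11,12,13,14"),
  M3 = c("1,2,3,4,5", NA, "0", "0", "0", "1,2,3,4", NA, "1,2,3,4,5,6,7,8"),
  M4 = c("1,2,3,4,5,6", NA, "0", "1", "0", "4,5", NA,
         "1,2,3,4,5,6,7,8,9,10,11,12"),
  stringsAsFactors = FALSE
)

f3 <- function(x, n = 3) {
  # Assumption: a row with n or more missing columns yields a real NA
  if (sum(is.na(x)) >= n) return(NA_character_)
  # Count occurrences of each number among the non-missing columns
  vals <- unlist(strsplit(as.character(x[!is.na(x)]), ","))
  tab <- table(vals)
  keep <- names(tab)[tab >= n]
  # 0 is appended anyway, so drop it here to avoid listing it twice
  keep <- setdiff(keep, "0")
  paste(c(keep, "0", "21"), collapse = ",")
}

df$M_NEW3 <- apply(df[, 1:4], 1, f3)
print(df$M_NEW3)
```

Because the missing rows are real NA values, is.na(df$M_NEW3) can be used on the result directly.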