R haven: accessing column label from imported SPSS file

I have a dataset in SPSS that I am reading into R using the 'haven' library:
df <- structure(list(SC155Q09HA = structure(c(2, 1, 1, 2, 1, 2, 3,
4, 3, 1), label = "School's capacity using digital devices: An effective online learning support platform is available", labels = c(`Strongly disagree` = 1,
Disagree = 2, Agree = 3, `Strongly agree` = 4, `Valid Skip` = 5,
`Not Applicable` = 7, Invalid = 8, `No Response` = 9), class = "haven_labelled")), row.names = c(NA,
-10L), class = c("tbl_df", "tbl", "data.frame"))
I'm trying to extract the label from the dataframe and can do this in base R:
library(tidyverse)
library(magrittr)
library(haven)
focus <- quo(SC156Q05HA)
attr(df$SC155Q09HA,"label")
>[1] "School's capacity using digital devices: An effective online learning support platform is available"
But not in a dplyr-style way with a variable for selection:
df[quo_name(focus)] %>% attr("label")
>NULL
df %>% select(!!focus) %>% attr("label")
>NULL
I understand that the two non-working examples return tibbles, whilst the first returns a labelled double. How do I make them equivalent?

You can do:
focus <- quo(SC155Q09HA) # Changed to match the data provided
df %>% pull(!!focus) %>% attr("label")
[1] "School's capacity using digital devices: An effective online learning support platform is available"
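Your attempt using select() passes the whole tibble to attr(). The label is stored on the column, not on the tibble, so attr() finds no label attribute and returns NULL. pull() extracts the column itself, which is why it works.
If you have multiple labels to extract, use purrr::map_chr():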
df %>% purrr::map_chr(attr, "label")
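One caveat: map_chr() expects every column to return a single string, so a column without a label would stop it with an error. A minimal base R sketch that returns NA for unlabelled columns instead; the helper name get_labels and the toy data frame are made up for illustration and are not part of haven or purrr:

```r
# Hypothetical helper: return each column's "label" attribute,
# or NA when a column has no label.
get_labels <- function(data) {
  vapply(data, function(col) {
    lab <- attr(col, "label", exact = TRUE)
    if (is.null(lab)) NA_character_ else as.character(lab)
  }, character(1))
}

# Toy data (assumption): one labelled column, one unlabelled column
toy <- data.frame(a = 1:3, b = c(2, 4, 6))
attr(toy$a, "label") <- "First question"

labels <- get_labels(toy)
print(labels)
```

The result is a named character vector, so labels[["a"]] looks up the label of a single column by name.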

Related

gt R package: Giving a different color to a table's cells according to numerical threshold(s)

Aim
Giving a different color to a table's cells according to numerical threshold(s).
R Package
gt
Reproducible example
mydata <- structure(list(none = c(4, 4, 25, 18, 10), light = c(2, 3, 10,
24, 6), medium = c(3, 7, 12, 33, 7), heavy = c(2, 4, 4, 13, 2
)), row.names = c("SM", "JM", "SE", "JE", "SC"), class = "data.frame")
Using the above dataset, I can produce a table (however crude), using the following code:
mytable <- gt::gt(mydata)
Where I got stuck
It must be really easy, but I can't wrap my head around how to assign (say) red to the cells where the value is (say) larger than 20 AND blue to the cells whose value is (say) smaller than 10. I have been searching Google for days now (example HERE), but I could not find a solution. My best guess is to use the tab_style() function, but I am at a loss as to how to tune the parameters to get what I am after.

This isn't ideal if you have an arbitrarily large data frame, but for an example of your size it's certainly manageable, imo. I wrote the tests as separate functions to reduce code duplication and make it easier to adjust your thresholds.
If you're looking for a more general solution, you could loop over a vector of columns, as described here.
library(gt)
isHigh <- function(x) {
  x > 20
}
isLow <- function(x) {
  x < 10
}
mydata %>%
  gt() %>%
  tab_style(
    style = list(
      cell_fill(color = 'red'),
      cell_text(weight = 'bold')
    ),
    locations =
      list(
        cells_body(
          columns = none,
          rows = isHigh(none)
        ),
        cells_body(
          columns = light,
          rows = isHigh(light)
        ),
        cells_body(
          columns = medium,
          rows = isHigh(medium)
        ),
        cells_body(
          columns = heavy,
          rows = isHigh(heavy)
        )
      )
  ) %>%
  tab_style(
    style = list(
      cell_fill(color = 'lightblue'),
      cell_text(weight = 'bold')
    ),
    locations =
      list(
        cells_body(
          columns = none,
          rows = isLow(none)
        ),
        cells_body(
          columns = light,
          rows = isLow(light)
        ),
        cells_body(
          columns = medium,
          rows = isLow(medium)
        ),
        cells_body(
          columns = heavy,
          rows = isLow(heavy)
        )
      )
  )
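Before styling, it can help to confirm which cells each threshold actually selects. A small base R sketch (no gt involved) that lists the matching cells for the thresholds used above; the object names values, high_cells, low_cells and high_summary are only illustrative:

```r
# Same data as in the question, rebuilt with data.frame()
mydata <- data.frame(
  none   = c(4, 4, 25, 18, 10),
  light  = c(2, 3, 10, 24, 6),
  medium = c(3, 7, 12, 33, 7),
  heavy  = c(2, 4, 4, 13, 2),
  row.names = c("SM", "JM", "SE", "JE", "SC")
)

values <- as.matrix(mydata)

# Row/column positions of the cells that pass each test
high_cells <- which(values > 20, arr.ind = TRUE)
low_cells  <- which(values < 10, arr.ind = TRUE)

# Readable summary of the "high" cells
high_summary <- data.frame(
  row    = rownames(values)[high_cells[, "row"]],
  column = colnames(values)[high_cells[, "col"]],
  value  = values[high_cells]
)
print(high_summary)
```

Note that a value of exactly 10 or 20 matches neither test, because both comparisons are strict.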

On the basis of the comment I got, and after having read the earlier post here on SO, I came up with the following:
Create a dataset to work with:
mydata <- structure(list(none = c(4, 4, 25, 18, 10), light = c(2, 3, 10,
24, 6), medium = c(3, 7, 12, 33, 7), heavy = c(2, 4, 4, 13, 2
)), row.names = c("SM", "JM", "SE", "JE", "SC"), class = "data.frame")
Create a 'gt' table:
mytable <- gt::gt(mydata)
Create a vector of column names to be used later inside the 'for' loops:
col.names.vect <- colnames(mydata)
Create two 'for' loops, one for each threshold upon which we want our values to be given different colors (say, a RED text for values > 20; a BLUE text for values < 5):
for(i in seq_along(col.names.vect)) {
  mytable <- gt::tab_style(mytable,
                           style = gt::cell_text(color="red"),
                           locations = gt::cells_body(
                             columns = col.names.vect[i],
                             rows = mytable$`_data`[[col.names.vect[i]]] > 20))
}
for(i in seq_along(col.names.vect)) {
  mytable <- gt::tab_style(mytable,
                           style = gt::cell_text(color="blue"),
                           locations = gt::cells_body(
                             columns = col.names.vect[i],
                             rows = mytable$`_data`[[col.names.vect[i]]] < 5))
}
This seems to achieve the goal I had in mind.

Trouble creating lists in R for the networkD3 package

I'd like to create the radial network above utilizing the R package networkD3. I read the guide here which utilizes lists to create radial networks. Unfortunately my R skills with lists are lacking. They're actually non-existent. Fortunately there's the R4DS guide here.
After reading everything, I came up with the code below to create the diagram above.
library(networkD3)
nd3 <- list(Start = list(A = list(1, 2, 3), B = "B"))
diagonalNetwork(List = nd3, fontSize = 10, opacity = 0.9)
Alas, my attempt fails. And subsequent attempts fail to generate anything that's close to the diagram above. I'm pretty sure it's my list that's wrong. Maybe you can show me the right list and things will start to make sense.

Jason!
The issue here is that the List argument expects a very specific structure: every node is a list with a name element and, if it has child nodes, a children element holding a list of such nodes. So your code should look like this:
library(networkD3)
nd3 <- list(
  name = "Start",
  children = list(
    list(
      name = "A",
      children = list(list(name = "1"), list(name = "2"), list(name = "3"))
    ),
    list(name = "B")
  )
)
diagonalNetwork(List = nd3, fontSize = 10, opacity = 0.9)

If you're like me and the data frame/spreadsheet format is easier to wrap your head around, you could build an easy data frame with your data and then use data.tree functions to convert it to the list/json format...
library(data.tree)
library(networkD3)
source <- c("Start", "Start", "A", "A", "A")
target <- c("A", "B", "1", "2", "3")
df <- data.frame(source, target)
nd3 <- ToListExplicit(FromDataFrameNetwork(df), unname = TRUE)
diagonalNetwork(List = nd3, fontSize = 10, opacity = 0.9)
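To see what the name/children structure looks like when it is built from such an edge list, here is a minimal base R sketch that nests it recursively without data.tree. The function build_node is a hypothetical helper written for this example, and it assumes the edges form a tree (no cycles, one parent per node):

```r
# Hypothetical helper: build the nested name/children list for one node
# from a data frame of parent -> child edges.
build_node <- function(node, edges) {
  kids <- edges$target[edges$source == node]
  if (length(kids) == 0) {
    return(list(name = node))  # leaf: only a name
  }
  list(name = node, children = lapply(kids, build_node, edges = edges))
}

edges <- data.frame(
  source = c("Start", "Start", "A", "A", "A"),
  target = c("A", "B", "1", "2", "3"),
  stringsAsFactors = FALSE
)

nd3_sketch <- build_node("Start", edges)
str(nd3_sketch)
```

The resulting list has the same shape as the hand-written nd3 above, so it could be passed as the List argument in the same way.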

Grouped Bar Plot + SD from imported excel data

I've recently been trying to convert to R from Excel, and have been struggling a bit.
My goal right now is to create a grouped bar plot of the means of 2 groups: Pre-Contrast and Post-Contrast.
Each should have "Control" and "Treated" groups.
I'm not sure how to make my data reproducible (should I upload the Excel sheet?).
Data Sheet
This is the graph I'm trying to produce
library(plyr)
library(tidyverse)
library(readxl)
t2<- read_excel("t2quant.xlsx")
summary(t2)
colwise(sd)(t2)
Copy and paste from dput(t2):
structure(list(Pre_Contrast_Control = c(14, 10.4, 11), Pre_Contrast_Treated = c(16.7,
15, 12), Post_Contrast_Control = c(6, 5.8, 6.5), Post_Contrast_Treated = c(3.5,
3.7, 2.6)), .Names = c("Pre_Contrast_Control", "Pre_Contrast_Treated",
"Post_Contrast_Control", "Post_Contrast_Treated"), class = c("tbl_df",
"tbl", "data.frame"), row.names = c(NA, -3L))
I used colwise to obtain the SD of the values in each column.
How would I go about making grouped bar plots of the mean +/- the SD of each column?
I would really appreciate the help, everyone.

R: Smoothing Time Series Data by Item

I have a data series that displays purchases over time by item ID. Purchasing habits are irregular, so I would like to smooth this data out over time and by item ID.
If items had orders placed more regularly (i.e. every day), we could better plot and evaluate our ordering and set stocking levels. However, some people purchase an excess of an item so they don't have to restock. This skews our par level data, since a one-day total could really be a week's worth of product if they only order once per week.
Reproducible Example:
POData <- structure(list(a = structure(c(1499918400, 1499918400, 1499918400,
1499918400, 1499918400, 1499918400, 1496376000, 1497412800, 1497412800,
1497412800, 1497412800, 1497412800, 1497240000, 1497412800, 1497412800,
1497412800, 1501214400, 1496376000, 1496376000, 1496376000, 1496289600,
1496289600, 1496289600, 1496289600, 1496289600, 1496289600, 1501214400,
1501214400, 1501214400, 1501214400), class = c("POSIXct", "POSIXt"
), tzone = ""), b = c(446032L, 101612L, 37740L, 482207L, 152360L,
4483L, 482207L, 141729L, 81192L, 482207L, 85273L, 142955L, 460003L,
142955L, 17752L, 29763L, 309189L, 361905L, 17396L, 410762L, 437420L,
17752L, 18002L, 150698L, 163342L, 433332L, 150587L, 44159L, 433332L,
446032L), c = c(4, 1, 25, 1, 1, 1, 3, 12, 12, 1, 1, 1, 300, 1,
1, 2, 6, 6, 2, 1, 1, 1, 1, 1, 1, 1, 40, 2, 1, 2)), .Names = c("PO Date",
"PS Item ID", "PO Qty"), row.names = c(NA, 30L), class = "data.frame")
This is probably a simple question, but I hope someone has a simple way to do this.

You could use something like this
require(zoo)
require(dplyr)
df2 = POData %>%
  arrange(`PS Item ID`, `PO Date`) %>%
  group_by(`PS Item ID`) %>%
  mutate(temp_lag1 = lag(`PO Qty`)) %>%
  mutate(temp.5.previous = rollapply(data = temp_lag1,
                                     width = 2,
                                     FUN = mean,
                                     align = "left",
                                     fill = `PO Qty`,
                                     na.rm = TRUE))
It sorts the rows by PS Item ID and PO Date and then groups them by PS Item ID. The width argument of rollapply() specifies how many observations go into each moving average. For now it's set to 2 because your data is not that extensive by product ID.
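For comparison, the idea of a per-item moving average can be sketched in base R alone. This is a simpler, swapped-in variant (a trailing mean over the current and previous orders, without the lag step used above), and the helper trailing_mean and the toy orders data are assumptions made for illustration:

```r
# Hypothetical helper: mean of the current value and up to (width - 1)
# preceding values; early positions use the shorter window available.
trailing_mean <- function(x, width = 2) {
  vapply(seq_along(x), function(i) {
    mean(x[max(1, i - width + 1):i])
  }, numeric(1))
}

# Toy data (assumption): already sorted by item and order date
orders <- data.frame(
  item = c("A", "A", "A", "B", "B"),
  qty  = c(4, 0, 8, 10, 2)
)

# ave() applies the helper separately within each item
orders$smooth <- ave(orders$qty, orders$item, FUN = trailing_mean)
print(orders)
```

A wider window smooths more but needs more orders per item, which is the same trade-off as the width argument of rollapply().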

R - Loss labels when I subset a data frame after using read_sav from haven package

I use the read_sav function from the haven package to import an SPSS file, so I have column names and associated labels (class labelled).
I lose the labels when I subset the data frame with subset(). I can use a workaround by indexing with data[i], but is this behavior a bug or not?
Here is a simple example.
DataForExample <- structure(list(q0001_0001 = structure(c(2, NA, 5, 4, NA), label = "être plus rapide", class = "labelled", labels = structure(c(1,
2, 3, 4, 5), .Names = c("non, pas du tout", "non, pas vraiment",
"oui, un peu", "oui, tout à fait", "je ne sais pas"))), q0001_0002 = structure(c(NA,
3, NA, 4, 2), label = "être plus fiable", class = "labelled", labels = structure(c(1,
2, 3, 4, 5), .Names = c("non, pas du tout", "non, pas vraiment",
"oui, un peu", "oui, tout à fait", "je ne sais pas")))), .Names = c("q0001_0001",
"q0001_0002"), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA,
-5L))
View(DataForExample) # OK
Toto <- subset(DataForExample, select = q0001_0001)
View(Toto) # NOK : the labels disappeared
Toto2 <- DataForExample[1]
View(Toto2) # OK
Thanks

The same answer as for your previous question about sorting: you need to load a package that supports subsetting operations for the labelled class. It is better to load it after haven. There are at least two packages with such support: Hmisc and expss. No additional actions are needed, just library(expss) or library(Hmisc).
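If loading another package is not an option, one possible workaround is to copy the variable labels back after subsetting. A minimal base R sketch, assuming only the "label" attribute needs restoring (it does not handle value labels or the class); restore_labels is a hypothetical helper and the small survey data frame is made up:

```r
# Hypothetical helper: copy the "label" attribute from the original
# columns onto the same-named columns of a subsetted data frame.
restore_labels <- function(subsetted, original) {
  for (nm in intersect(names(subsetted), names(original))) {
    attr(subsetted[[nm]], "label") <- attr(original[[nm]], "label", exact = TRUE)
  }
  subsetted
}

# Toy data (assumption): plain numeric columns with a label attribute
survey <- data.frame(q1 = c(2, NA, 5), q2 = c(NA, 3, 4))
attr(survey$q1, "label") <- "be faster"

kept <- subset(survey, !is.na(q1), select = q1)
kept <- restore_labels(kept, survey)
print(attr(kept$q1, "label"))
```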
