I need to cross-tabulate two variables of a tibble. I used table() for it, but the output is not easy to read.
Is there a way to format the output to make it more readable?
Thanks
library(tidyverse)
# random vectors of integers between 0 and 5
a <- sample(c(0, 1, 2, 3, 4, 5), replace = TRUE, size = 100)
b <- sample(c(0, 1, 2, 3, 4, 5), replace = TRUE, size = 100)
tbl <- tibble(a, b)
cross_tab <- table(tbl$a, tbl$b)
cross_tab
I use expss for these kinds of tables:
library(expss)
cro(tbl$a,tbl$b) %>% htmlTable()
You can precede the command above with the expss command apply_labels to format the variable and value names. See the documentation for details.
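If installing an extra package is not an option, base R can already make the table easier to read. The following is only a minimal sketch under that assumption: naming the arguments of table() labels the two dimensions, addmargins() appends totals, and prop.table() gives row percentages. The seed and the variable names with_totals and row_pct are illustrative choices, not part of the question.

```r
set.seed(1)  # assumption: fixed seed so the example is reproducible
a <- sample(0:5, size = 100, replace = TRUE)
b <- sample(0:5, size = 100, replace = TRUE)

# Named arguments become the dimension labels in the printed table
cross_tab <- table(a = a, b = b)

# Append a "Sum" row and a "Sum" column
with_totals <- addmargins(cross_tab)

# Row percentages, rounded to one decimal place
row_pct <- round(100 * prop.table(cross_tab, margin = 1), 1)

print(with_totals)
print(row_pct)
```

The labelled dimensions and the margin totals are often enough for a quick look; an HTML table as in the expss answer is still the nicer choice for reports.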
I have a dataset with these three columns and other additional columns
structure(list(from = c(1, 8, 3, 3, 8, 1, 4, 5, 8, 3, 1, 8, 4,
1), to = c(8, 3, 8, 54, 3, 4, 1, 6, 7, 1, 4, 3, 8, 8), time = c(1521823032,
1521827196, 1521827196, 1522678358, 1522701516, 1522701993, 1522702123,
1522769399, 1522780956, 1522794468, 1522794468, 1522794468, 1522794468,
1522859524)), class = "data.frame", row.names = c(NA, -14L))
I need the code to take all indices up to a given number (e.g. 5) and for each of them do the following: subset the data set to the rows where the index appears in either column "from" or column "to", and apply a function to that subset (e.g. the difference between the max and min of time). As a result I expect a data frame with the indices and the results of the calculation.
This is what I have, but it does not work.
dur<-function(x)max(x)-min(x) #The function to calculate the difference. In other cases I need to use other functions of my own
filternumber <- function(number,x){ #A function to filter data x by the number in either of the two columns
x <- x%>% subset(from == number | to == number)
return(x)
}
lista <- unique(c(data$from, data$to)) # Creates a list with all the indexes in the data. I do this to avoid having non-existing indexes
lista <-lista[lista <= 5] #Limit the list to 5. In my code this number would be an argument to a function
result<-lista%>%filteremployee(.,data) %>% select(time) %>% dur() #I use select because I have many other columns in the data
The result in this case should be a dataframe with 1036492 for 1, 967272 for 3 and 92475 for 4
I've also tried putting filteremployee(.,data) %>% select(time) %>% dur() inside mutate, but that does not work either.
Perhaps you are looking for something like this:
library(purrr)
library(dplyr)
index <- c(1, 3, 4)
names(index) <- index
index %>%
  map_dfr(~ df %>%   # df is the data frame from the question
            filter(from == .x | to == .x) %>%
            summarize(result = dur(time)),
          .id = "index")
This returns
  index  result
1     1 1036492
2     3  967272
3     4   92475
The function was created with ==, which is elementwise, so it cannot take the whole vector lista at once. Here, we need to loop over its elements:
library(dplyr)
library(purrr)
map_dbl(lista, ~ filternumber(.x, data) %>%
select(time) %>%
dur)
[1] 1036492 967272 92475 0
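For reference, the same idea can be written without any packages. This is a sketch under the assumption that a plain data frame of indices and results is what is wanted; summarise_by_index is a hypothetical helper name, and the data is the sample from the question.

```r
data <- data.frame(
  from = c(1, 8, 3, 3, 8, 1, 4, 5, 8, 3, 1, 8, 4, 1),
  to   = c(8, 3, 8, 54, 3, 4, 1, 6, 7, 1, 4, 3, 8, 8),
  time = c(1521823032, 1521827196, 1521827196, 1522678358, 1522701516,
           1522701993, 1522702123, 1522769399, 1522780956, 1522794468,
           1522794468, 1522794468, 1522794468, 1522859524)
)

# The summary function from the question
dur <- function(x) max(x) - min(x)

# Hypothetical helper: apply `fun` to the times of every index <= limit
summarise_by_index <- function(data, limit, fun = dur) {
  idx <- sort(unique(c(data$from, data$to)))
  idx <- idx[idx <= limit]
  res <- vapply(
    idx,
    function(i) fun(data$time[data$from == i | data$to == i]),
    numeric(1)
  )
  data.frame(index = idx, result = res)
}

out <- summarise_by_index(data, limit = 5)
print(out)
```

Index 5 appears in a single row, so its duration is 0; filter it out afterwards if only indices with more than one observation are of interest.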
I'm trying to reverse score (recode) some items in a dataframe. All reverse scored items end in an R, and each scale has a unique start ("hc", "out", and "hm"). I normally would just select all variables that end with an "r", but the issue is that some scales are on a 5-point scale ("hc" and "out") and others are on a 7-point scale ("hm").
Here is a sample of the much, much larger dataset:
library(tidyverse)
data <- tibble(name = c("Mike", "Ray", "Hassan"),
hc_1 = c(1, 2, 3),
hc_2r = c(5, 5, 4),
out_1r = c(5, 4, 2),
out_2 = c(2, 4, 5),
out_3r = c(2, 2, 1),
hm_1 = c(6, 7, 7),
hm_2r = c(7, 1, 7))
Let's say that I want to do this one scale at a time, so I start with hm, which is on a seven-point scale.
I want to try something like this with an & statement, but I get an error:
library(tidyverse)
library(car)
data %>%
mutate_at(vars(ends_with("r") & starts_with("hm")), ~(recode(., "1=7; 2=6; 3=5; 4=4; 5=3; 6=2; 7=1")))
Error: ends_with("r") & starts_with("hc") must evaluate to column positions or names, not a logical vector
What's a clean way to make it perform the reverse scoring on these few variables at a time? Once again, the dataset is too big to practically select individual variables one at a time.
Thanks!
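It would be easier to use matches() here. It takes a regular expression, so a single pattern can require both the "hm" prefix and the "r" suffix. Calling car::recode explicitly avoids picking up dplyr::recode, which is what a bare recode resolves to when only tidyverse is attached.
library(tidyverse)
data %>%
  mutate_at(vars(matches("^hm.*r$")),
            ~ car::recode(., "1=7; 2=6; 3=5; 4=4; 5=3; 6=2; 7=1"))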
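Since reverse scoring on a k-point scale amounts to (k + 1) - x, the recode string can be avoided altogether. Below is a minimal base R sketch of that idea, assuming every item on a scale shares the same prefix and every reversed item ends in "r". reverse_scale is a hypothetical helper, not a function from any package, and the sample data is rebuilt as a plain data frame.

```r
data <- data.frame(
  name   = c("Mike", "Ray", "Hassan"),
  hc_1   = c(1, 2, 3),
  hc_2r  = c(5, 5, 4),
  out_1r = c(5, 4, 2),
  out_2  = c(2, 4, 5),
  out_3r = c(2, 2, 1),
  hm_1   = c(6, 7, 7),
  hm_2r  = c(7, 1, 7)
)

# Hypothetical helper: reverse all columns that start with `prefix`
# and end in "r", for a scale running from 1 to `n_points`
reverse_scale <- function(df, prefix, n_points) {
  cols <- grep(paste0("^", prefix, ".*r$"), names(df), value = TRUE)
  df[cols] <- lapply(df[cols], function(x) (n_points + 1) - x)
  df
}

scored <- reverse_scale(data, "hm", 7)     # seven-point scale
scored <- reverse_scale(scored, "hc", 5)   # five-point scales
scored <- reverse_scale(scored, "out", 5)
print(scored)
```

Passing the number of scale points as an argument keeps one helper usable for both the 5-point and 7-point scales.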
My dataset looks like this:
"userid","progress"
1, incomplete
2, complete
3, not attempted
4, incomplete
5, not attempted
6, complete
7, complete
8, complete
9, complete
10, incomplete
I want to make a pie chart showing the percentage of users in each status (complete, incomplete and not attempted), that is, the number of users with a given status divided by the total number of users.
This code is not working.
var1 = nrow(data1)/sum(data1$progress=="complete")
var2 = nrow(data1)/sum(data1$progress=="incomplete")
df <- data.frame(
val = c (var1, var2)
)
hchart(df, "pie")%>%hc_add_series_labels_values(values = df)
If you are trying to make a pie chart, most methods will do much of the work for you, so there is no need to calculate the percentages explicitly. The output of table() can be passed straight to pie():
# Load your data
ds <- read.csv(header = TRUE, strip.white = TRUE, text =  # strip.white drops the space after each comma
"userid,progress
1, incomplete
2, complete
3, not attempted
4, incomplete
5, not attempted
6, complete
7, complete
8, complete
9, complete
10, incomplete")
# Tabularize
tab <- table(ds$progress)
pie(tab) # Make piechart
As you see below, table counts the number of appearances of each level and returns a table object that behaves like a named integer vector. The nice thing here is that pie() computes the angles/areas from the relative frequencies and uses the names to label the chart.
print(tab)
#
#      complete    incomplete not attempted
#             5             3             2
If you insist on computing the percentages yourself, you can just use tab/sum(tab).
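As a small sketch of that last remark, the percentages can also be pasted into the slice labels. The vector below just retypes the question's data, and the names pct and labels are illustrative.

```r
progress <- c("incomplete", "complete", "not attempted", "incomplete",
              "not attempted", "complete", "complete", "complete",
              "complete", "incomplete")

tab <- table(progress)

# Percentages per status; 100 * prop.table(tab) gives the same numbers
pct <- round(100 * tab / sum(tab), 1)

# Labels such as "complete (50%)"
labels <- paste0(names(pct), " (", pct, "%)")

# pie() still receives the raw counts; only the labels change
pie(tab, labels = labels)
```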
Edit: I see that you try to use the highcharter package. Why not use hcpie in that case? That function takes a factor as input:
library("highcharter")
hcpie(ds$progress)
Like this:
userid <- c(1,2,3,4,5,6,7,8,9,10)
progress <- c("incomplete","complete", "not attempted", "incomplete", "not attempted", "complete","complete","complete", "complete","incomplete")
df <- data.frame("userid"=userid, "progress"=progress)
df$progress <- as.factor(df$progress)
var1 = nrow(df[which(df$progress=="complete"), ])/nrow(df)
var2 = nrow(df[which(df$progress=="incomplete"), ])/nrow(df)
var3 = nrow(df[which(df$progress=="not attempted"), ])/nrow(df)
data <- c(var1, var2, var3)
pie(data, labels=c("complete","incomplete", "not attempted"))
I have an R data frame with 18 columns. I would like to write a function that compares column 1 to column 2 and writes a logical T or F to a new column depending on whether the two values match (this part is not too hard for me). However, I would like to repeat this for each following pair of columns, writing T/F to a new column each time.
That is: values in col 1 == values in col 2, write T/F to a new column; values in col 3 == values in col 4, write T/F to a new column (or write the results to a new data frame).
I have been trying to do this with the purrr package, and use the pmap/map function, but I know I am making a mistake and missing some important part.
This function should work if I understand your problem correctly.
df <-
data.frame(a = c(18, 6, 2 ,0),
b = c(0, 6, 2, 18),
c = c(1, 5, 6, 8),
d = c(3, 5, 9, 2))
compare_columns <-
  function(x){
    n_columns <- ncol(x)
    odd_columns <- 2*1:(n_columns/2) - 1
    even_columns <- 2*1:(n_columns/2)
    comparisons_list <-
      lapply(seq_len(n_columns/2),
             function(y){
               # compare within the argument x, not the global df
               x[, odd_columns[y]] == x[, even_columns[y]]
             })
    comparisons_df <-
      as.data.frame(comparisons_list,
                    col.names = paste0("column", odd_columns, "_column", even_columns))
    return(cbind(x, comparisons_df))
  }
compare_columns(df)
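A more compact variant of the same pairing idea uses Map() to walk over the odd and even columns in parallel. This is only a sketch, assuming an even number of columns; the "_eq_" naming scheme is an arbitrary choice.

```r
df <- data.frame(a = c(18, 6, 2, 0),
                 b = c(0, 6, 2, 18),
                 c = c(1, 5, 6, 8),
                 d = c(3, 5, 9, 2))

stopifnot(ncol(df) %% 2 == 0)  # the pairing needs an even number of columns

odd  <- seq(1, ncol(df), by = 2)
even <- seq(2, ncol(df), by = 2)

# Compare column 1 with 2, column 3 with 4, and so on
same <- Map(`==`, df[odd], df[even])
names(same) <- paste0(names(df)[odd], "_eq_", names(df)[even])

result <- cbind(df, as.data.frame(same))
print(result)
```

For numeric columns that come out of calculations, an exact == may be too strict; a tolerance check such as abs(l - r) < 1e-8 could be swapped in.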
I have multiple data frames and I want to perform the same action on all of them, for example transforming them all into data.tables (this is just an example; I want to apply other functions too).
A simple example can be (df1=df2=df3, without loss of generality here)
df1 <- data.frame(var1 = c(1, 2, 3, 4, 5), var2 =c(1, 2, 2, 1, 2), var3 = c(10, 8, 15, 7, 9))
df2 <- data.frame(var1 = c(1, 2, 3, 4, 5), var2 =c(1, 2, 2, 1, 2), var3 = c(10, 8, 15, 7, 9))
df3 <- data.frame(var1 = c(1, 2, 3, 4, 5), var2 =c(1, 2, 2, 1, 2), var3 = c(10, 8, 15, 7, 9))
My approach was: (i) to create a list of the data frames (list.df), (ii) to create a list of how they should be called afterwards (list.dt) and (iii) to loop into those two lists:
list.df:
list.df<-vector('list',3)
for(j in 1:3){
name <- paste('df',j,sep='')
list.df[j] <- name
}
list.dt:
list.dt<-vector('list',3)
for(j in 1:3){
name <- paste('dt',j,sep='')
list.dt[j] <- name
}
Loop (to make all data frames into data tables):
for(i in 1:3){
name<-list.dt[i]
assign(unlist(name), setDT(list.df[i]))
}
I am definitely doing something wrong, as the result is three data tables with 1 variable and 1 observation each (exactly the name stored in list.df[i]).
I've tried to unlist list.df, thinking R would then recognize the entry as an entire data frame and not only as a string:
for(i in 1:3){
name<-list.dt[i]
assign(unlist(name), setDT(unlist(list.df[i])))
}
But I get the error message:
Error in setDT(unlist(list.df[i])) :
Argument 'x' to 'setDT' should be a 'list', 'data.frame' or 'data.table'
Any suggestions?
You can just put all the data into one data frame. Then, if you want to iterate over the original data frames, use dplyr::do or, preferably, other dplyr functions on the grouped result:
library(dplyr)
data =
  list(df1 = df1, df2 = df2, df3 = df3) %>%
  bind_rows(.id = "source") %>%
  group_by(source)
Change your last snippet to this:
for(i in 1:3){
name <- list.dt[i]
assign(unlist(name), setDT(get(list.df[[i]])))
}
# Alternative to using lists
list.df <- paste0("df", 1:3)
# For loop that works with the length of the input 'list'/vector
# Creates the 'dt' objects on the fly
for(i in seq_along(list.df)){
assign(paste0("dt", i), setDT(get(list.df[i])))
}
Using data.table (which deserves far more advertising):
a) If you need all your data.frames converted to data.tables, then, as was already suggested in the comments by @A5C1D2H2I1M1N2O1R2T1, iterate over your data.frames with setDT
library(data.table)
lapply(mget(paste0("df", 1:3)), setDT)
# or, if you wish to type them one by one:
lapply(list(df1, df2, df3), setDT)
class(df1) # check if coercion took place
# [1] "data.table" "data.frame"
b) If you need to bind your data.frames by rows, then use data.table::rbindlist
data <- rbindlist(mget(paste0("df", 1:3)), idcol = TRUE)
# or, if you wish to type them one by one:
data <- rbindlist(list(df1 = df1, df2 = df2, df3 = df3), idcol = TRUE)
Side note: If you like chaining/piping with the magrittr package (which you see almost always in combination with dplyr syntax), then it goes like:
library(data.table)
library(magrittr)
# for a)
mget(paste0("df", 1:3)) %>% lapply(setDT)
# for b)
data <- mget(paste0("df", 1:3)) %>% rbindlist(idcol = TRUE)
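The pattern behind all of these answers is the same: collect the data frames in a named list, then apply one function to every element. Here is a package-free sketch of that pattern, where add_ratio is a made-up stand-in for whatever transformation is actually needed.

```r
df1 <- data.frame(var1 = c(1, 2, 3, 4, 5), var2 = c(1, 2, 2, 1, 2), var3 = c(10, 8, 15, 7, 9))
df2 <- df1
df3 <- df1

# Collect the objects named df1..df3 from the current environment into a named list
dfs <- mget(paste0("df", 1:3), envir = environment())

# Hypothetical transformation, standing in for the real function
add_ratio <- function(d) {
  d$ratio <- d$var3 / d$var1
  d
}

# Apply it to every data frame; the result is again a named list
dfs <- lapply(dfs, add_ratio)

print(names(dfs))
print(dfs$df2)
```

Keeping the results in the list is usually more convenient than assign(); if separate objects are needed anyway, list2env(dfs, envir = environment()) writes them back under their names.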