Calculate and add values to a data-frame - r

My dataset looks like this:
"userid","progress"
1, incomplete
2, complete
3, not attempted
4, incomplete
5, not attempted
6, complete
7, complete
8, complete
9, complete
10, incomplete
I want to make a pie chart showing the percentage of users whose status is complete, incomplete, or not attempted, i.e. the number of users with each status divided by the total number of users.
This code is not working:
var1 = nrow(data1)/sum(data1$progress=="complete")
var2 = nrow(data1)/sum(data1$progress=="incomplete")
df <- data.frame(
val = c (var1, var2)
)
hchart(df, "pie")%>%hc_add_series_labels_values(values = df)

If you just want a pie chart, most plotting functions will do much of the work for you, so there is no need to calculate the percentages explicitly. The output of table() is exactly what pie() expects:
# Load your data (strip.white = TRUE removes the space after each comma)
ds <- read.csv(header = TRUE, strip.white = TRUE, text =
"userid,progress
1, incomplete
2, complete
3, not attempted
4, incomplete
5, not attempted
6, complete
7, complete
8, complete
9, complete
10, incomplete")
# Tabularize
tab <- table(ds$progress)
pie(tab) # Make piechart
As the printout below shows, table() counts the occurrences of each level and returns a named vector of counts. The nice thing here is that pie() computes the slice angles from the relative frequencies and uses the names to label the chart.
print(tab)
#
# complete incomplete not attempted
# 5 3 2
If you insist on computing the percentages yourself, you can just use tab/sum(tab).
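For example, to show the percentages on the slices, you can compute them with prop.table(), which is the same as tab/sum(tab), and pass them to pie() as labels. A minimal sketch, assuming the progress values from the question with the leading spaces already stripped:

```r
# Progress values from the question (leading spaces removed)
progress <- c("incomplete", "complete", "not attempted", "incomplete",
              "not attempted", "complete", "complete", "complete",
              "complete", "incomplete")

tab <- table(progress)                  # counts per status
pct <- round(100 * prop.table(tab), 1)  # same as 100 * tab / sum(tab)

# Label each slice with its status and its percentage
pie(tab, labels = paste0(names(tab), " (", pct, "%)"))
```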
Edit: I see that you are trying to use the highcharter package. In that case, why not use hcpie()? That function takes a factor as input:
library("highcharter")
hcpie(as.factor(ds$progress))  # since R 4.0.0, read.csv() returns character columns, not factors

Like this:
userid <- c(1,2,3,4,5,6,7,8,9,10)
progress <- c("incomplete","complete", "not attempted", "incomplete", "not attempted", "complete","complete","complete", "complete","incomplete")
df <- data.frame("userid"=userid, "progress"=progress)
df$progress <- as.factor(df$progress)
var1 = nrow(df[which(df$progress=="complete"), ])/nrow(df)
var2 = nrow(df[which(df$progress=="incomplete"), ])/nrow(df)
var3 = nrow(df[which(df$progress=="not attempted"), ])/nrow(df)
data <- c(var1, var2, var3)
pie(data, labels=c("complete","incomplete", "not attempted"))
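The three varN lines repeat the same computation. Because the mean of a logical vector is the proportion of TRUE values, mean(progress == s) gives the share of status s directly. A compact sketch of the same idea, looping over the statuses with sapply() (the names statuses and shares are just for illustration):

```r
progress <- c("incomplete", "complete", "not attempted", "incomplete",
              "not attempted", "complete", "complete", "complete",
              "complete", "incomplete")

statuses <- c("complete", "incomplete", "not attempted")

# mean() of a logical vector is the proportion of TRUE values, i.e. the share;
# sapply() keeps the status names because its input is a character vector
shares <- sapply(statuses, function(s) mean(progress == s))

pie(shares, labels = names(shares))
```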

Related

crosstab variables of tibble and make the output readable

I need to cross-tabulate variables of a tibble. I used table() for it, but the output is not easy to read.
Is there a way to format the output to make it more readable?
Thanks
library(tidyverse)
# random vectors of integers from 0 to 5
a <- sample(c(0, 1, 2, 3, 4, 5), replace = TRUE, size = 100)
b <- sample(c(0, 1, 2, 3, 4, 5), replace = TRUE, size = 100)
tbl <- tibble(a, b)
cross_tab <- table(tbl$a, tbl$b)
cross_tab
I use expss for these kinds of tables:
library(expss)
cro(tbl$a,tbl$b) %>% htmlTable()
You can precede the command above with the expss command apply_labels to format the variable and value names. See the documentation for details.
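If you would rather stay in base R, labelling the dimensions already helps a lot: calling table() inside with() names the rows and columns after the variables, addmargins() appends row and column totals, and prop.table() gives row percentages. A minimal sketch, using a plain data.frame in place of the tibble (with() and table() behave the same on a tibble) and a fixed seed so the example is reproducible:

```r
set.seed(42)  # reproducible random data
tbl <- data.frame(
  a = sample(0:5, replace = TRUE, size = 100),
  b = sample(0:5, replace = TRUE, size = 100)
)

# Passing the columns as bare names labels the dimensions "a" (rows) and "b" (columns)
cross_tab <- with(tbl, table(a, b))

# Add a "Sum" row and column holding the totals
with_totals <- addmargins(cross_tab)
print(with_totals)

# Row percentages, rounded to one decimal place
row_pct <- round(100 * prop.table(cross_tab, margin = 1), 1)
print(row_pct)
```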

use mutate_at for variables that meet two criteria dplyr R

I'm trying to reverse-score (recode) some items in a data frame. All reverse-scored items end in an "r", and each scale has a unique prefix ("hc", "out", and "hm"). Normally I would just select all variables that end with "r", but some scales are on a 5-point scale ("hc" and "out") and others are on a 7-point scale ("hm").
Here is a sample of the much, much larger dataset:
library(tidyverse)
data <- tibble(name = c("Mike", "Ray", "Hassan"),
hc_1 = c(1, 2, 3),
hc_2r = c(5, 5, 4),
out_1r = c(5, 4, 2),
out_2 = c(2, 4, 5),
out_3r = c(2, 2, 1),
hm_1 = c(6, 7, 7),
hm_2r = c(7, 1, 7))
Let's say that I want to do this one scale at a time, so I start with hm, which is on a seven-point scale.
I want to try something like this with an & statement, but I get an error:
library(tidyverse)
library(car)
data %>%
mutate_at(vars(ends_with("r") & starts_with("hm")), ~(recode(., "1=7; 2=6; 3=5; 4=4; 5=3; 6=2; 7=1")))
Error: ends_with("r") & starts_with("hc") must evaluate to column positions or names, not a logical vector
What's a clean way to reverse-score just these few variables at a time? Again, the dataset is too big to practically select individual variables one at a time.
Thanks!
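It is easier to use matches() here, with one regular expression that requires both the "hm" prefix and the "r" suffix. Calling car::recode() explicitly ensures that dplyr::recode(), which uses a different syntax, is not picked up instead:
library(tidyverse)
data %>%
mutate_at(vars(matches("^hm.*r$")), ~ car::recode(.,
"1=7; 2=6; 3=5; 4=4; 5=3; 6=2; 7=1"))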
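With dplyr 1.0 or later, mutate_at() is superseded by across(), and inside across() the selection helpers can be combined with &, which is what the original attempt was aiming for. Reverse-scoring a 1 to k scale can also be done arithmetically as (k + 1) - x, so no recode string is needed. A sketch, assuming dplyr >= 1.0 and the sample data from the question:

```r
library(dplyr)

data <- tibble(name = c("Mike", "Ray", "Hassan"),
               hc_1 = c(1, 2, 3),
               hc_2r = c(5, 5, 4),
               out_1r = c(5, 4, 2),
               out_2 = c(2, 4, 5),
               out_3r = c(2, 2, 1),
               hm_1 = c(6, 7, 7),
               hm_2r = c(7, 1, 7))

reversed <- data %>%
  # 7-point scale: reverse "hm" items ending in "r" (1 <-> 7, 2 <-> 6, ...)
  mutate(across(starts_with("hm") & ends_with("r"), ~ 8 - .x)) %>%
  # 5-point scales: reverse "hc" and "out" items ending in "r" (1 <-> 5, ...)
  mutate(across(starts_with(c("hc", "out")) & ends_with("r"), ~ 6 - .x))
```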

For Loop filtering data frame for rows that meet criteria from separate data frame containing null values

I have two data frames and a for loop. The first data frame's rows represent separate customers with their financial information represented in each column. The second data frame includes a new set of bounds that will help me filter my first data frame.
The for loop is meant to go through the first data frame, find any rows where at least one value is outside the bounds imposed by the second data frame and filter out these rows into a new data frame.
I keep getting the error "Error in Region[, r] : incorrect number of dimensions." I assume this is because the second data frame, which holds the new limits, includes NA values, and my current for loop has no rule to ignore them, but I'm not sure what that rule would look like.
Here is the first data frame:
ApplicationID <- c(1,2,3,4,5,6,7,8)
Profit <- c(100000,50000,-100000,200000,
200, 100, 2000000, 10000000)
Expenses <- c(50000, 10000, 5000000, 500,
100, 100, 500, 100000)
CreditScore <- c(10, 9, 4, 6,
1, 2, 3, 5)
Customer_Info <- data.frame(cbind(ApplicationID, Profit,
Expenses,CreditScore))
Second data frame:
max <-c(100000, 10000, NA)
min <-c(10000, NA, 5)
Limits <- data.frame(cbind(max, min))
rownames(Limits) <- c("Profit", "Expenses", "CreditScore")
My loop so far:
New_table<- data.frame()
for(r in rownames(Limits)){
if(!is.na(Limits[r,"max"])){
max_rows <- Customer_Info %>% subset(Customer_Info[,r]
>Limits[r,"max"])
New_table<-rbind(New_table, max_rows)}
if(!is.na(Limits[r,"min"])){
min_rows <- Customer_Info %>% subset(Customer_Info[,r]
<Limits[r,"min"])
New_table<-rbind(New_table, min_rows)}
}
If there is a function that can do this, even better. The output would be an updated version of the first data frame without the rows where any value falls outside the new limits from the second data frame.
The error occurs when your for loop reaches the "Credit" row of Limits and tries to select a Credit column from Customer_Info, which does not exist: the row names of Limits must match the column names of Customer_Info exactly. This can be solved in one of two ways:
Change:
rownames(Limits) <- c("Profit", "Expenses", "Credit")
To:
rownames(Limits) <- c("Profit", "Expenses", "CreditScore")
or change:
CreditScore <- c(10, 9, 4, 6,
1, 2, 3, 5)
To:
Credit <- c(10, 9, 4, 6,
1, 2, 3, 5)
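Since the question also asks for a more direct approach: instead of collecting offending rows with rbind(), which can add the same row twice when it breaks two limits, you can build one logical vector that is TRUE for rows inside every non-NA limit and subset with it. A minimal base-R sketch using the data from the question; like the original loop, a value equal to a limit counts as inside, and the names keep, Within_Limits and Outside_Limits are made up for illustration:

```r
ApplicationID <- c(1, 2, 3, 4, 5, 6, 7, 8)
Profit <- c(100000, 50000, -100000, 200000, 200, 100, 2000000, 10000000)
Expenses <- c(50000, 10000, 5000000, 500, 100, 100, 500, 100000)
CreditScore <- c(10, 9, 4, 6, 1, 2, 3, 5)
Customer_Info <- data.frame(ApplicationID, Profit, Expenses, CreditScore)

Limits <- data.frame(max = c(100000, 10000, NA),
                     min = c(10000, NA, 5),
                     row.names = c("Profit", "Expenses", "CreditScore"))

# Start by keeping every row, then drop rows that break any non-NA limit
keep <- rep(TRUE, nrow(Customer_Info))
for (col in rownames(Limits)) {
  upper <- Limits[col, "max"]
  lower <- Limits[col, "min"]
  if (!is.na(upper)) keep <- keep & Customer_Info[[col]] <= upper
  if (!is.na(lower)) keep <- keep & Customer_Info[[col]] >= lower
}

Within_Limits  <- Customer_Info[keep, ]   # rows inside all limits
Outside_Limits <- Customer_Info[!keep, ]  # each offending row appears once
```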

Trying to recode multiple variables into one?

I'm trying to take three different variables representing partisanship and combine them into one. The data looks like this, where each respondent has data on only one of the three variables as either a 1 or 2:
PARTISANSHIP_D PARTISANSHIP_I PARTISANSHIP_R
1 NA NA
2 NA NA
NA 1 NA
What I'm trying to create is one variable on a 1 to 6 scale based on the responses to all three. I've tried to do this using dplyr:
survey$partisan <- mutate(survey, partisan = ifelse(PARTISANSHIP_D==1, 6,
ifelse(PARTISANSHIP_D==2, 5,
ifelse(PARTISANSHIP_I==1, 4, ifelse(PARTISANSHIP_I==2, 3, ifelse(
PARTISANSHIP_R==2, 2, 1)
)))))
with car:
survey$partisan <- Recode(survey$PARTISANSHIP_D, "1=6; 2=5",
survey$PARTISANSHIP_I, "1=4; 2=3",
survey$PARTISANSHIP_R, "1=1; 2=2")
and plain ifelse commands like this:
survey$partisan <- ifelse(survey$PARTISANSHIP_D == 1, 6,
ifelse(survey$PARTISANSHIP_D == 2, 5,
ifelse(survey$PARTISANSHIP_I == 1, 4,
ifelse(survey$PARTISANSHIP_I == 2, 3,
ifelse(survey$PARTISANSHIP_R == 2, 2, 1)))))
But none of these is working. Any pointers of what I'm doing wrong?
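I got your mutate() to work by changing two things. First, change the NAs in your survey data frame to 0:
survey[is.na(survey)]<-0
This is needed because a comparison with NA (for example NA == 1) returns NA, and ifelse() then returns NA for that element instead of moving on to the next condition.
Second, don't assign the result of mutate() to survey$partisan. mutate() returns the whole data frame, so assign it back to survey: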
survey <- mutate(survey, partisan = ifelse(PARTISANSHIP_D==1, 6,
ifelse(PARTISANSHIP_D==2, 5,
ifelse(PARTISANSHIP_I==1, 4, ifelse(PARTISANSHIP_I==2, 3, ifelse(
PARTISANSHIP_R==2, 2, 1)
)))))
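Alternatively, dplyr::case_when() avoids overwriting the NAs: a condition that evaluates to NA is treated as not matched, so evaluation simply moves on to the next condition, and respondents who answered none of the three questions get NA. A sketch with a small made-up survey data frame (only the three PARTISANSHIP_* columns are assumed):

```r
library(dplyr)

# Hypothetical example data: one answer per respondent, last row has none
survey <- data.frame(
  PARTISANSHIP_D = c(1, 2, NA, NA, NA, NA, NA),
  PARTISANSHIP_I = c(NA, NA, 1, 2, NA, NA, NA),
  PARTISANSHIP_R = c(NA, NA, NA, NA, 2, 1, NA)
)

survey <- survey %>%
  mutate(partisan = case_when(
    PARTISANSHIP_D == 1 ~ 6,  # NA comparisons count as "no match"
    PARTISANSHIP_D == 2 ~ 5,
    PARTISANSHIP_I == 1 ~ 4,
    PARTISANSHIP_I == 2 ~ 3,
    PARTISANSHIP_R == 2 ~ 2,
    PARTISANSHIP_R == 1 ~ 1   # no match at all gives NA
  ))
```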
You are looking to pivot, i.e. reshape the data into a tidy (long) format. Note that gather() comes from tidyr, not dplyr.
Try this:
library(tidyr)
tidysurvey <- gather(survey, ## the source data frame
key = Partisanship, ## a name for the new key variable
value = Code, ## a name for the new value variable
PARTISANSHIP_D:PARTISANSHIP_R) ## the columns of the source data frame to reshape
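In current tidyr, gather() is superseded by pivot_longer(). A sketch using the three example rows from the question, assuming a respondent id column is added first so each answer can be traced back (the id column is hypothetical), with values_drop_na = TRUE dropping the empty cells:

```r
library(tidyr)

# The three example rows from the question
survey <- data.frame(
  PARTISANSHIP_D = c(1, 2, NA),
  PARTISANSHIP_I = c(NA, NA, 1),
  PARTISANSHIP_R = c(NA, NA, NA)
)
survey$id <- seq_len(nrow(survey))  # hypothetical respondent id

# One row per answered question; NA cells are dropped
tidysurvey <- pivot_longer(survey,
                           cols = PARTISANSHIP_D:PARTISANSHIP_R,
                           names_to = "Partisanship",
                           values_to = "Code",
                           values_drop_na = TRUE)
print(tidysurvey)
```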

Combining frequency tables in R

I have a vector containing the frequencies of molecules within their respective molecular class for all molecules measured. I also have a vector that contains the per class frequency of significant molecules identified by variable selection. How can I merge these 2 vectors into a data frame and fill in empty frequencies with 0's (in R)?
Here is a workable example:
full = rep(letters[1:4], 4:7)
fullTable = table(full)
sub = rep(letters[1:2], c(2, 4))
subTable = table(sub)
I would like the table to look like:
print(data.frame(Letter=letters[1:4], fullFreq=c(4, 5, 6, 7), subFreq=c(2, 4, 0, 0)))
Try this (I assume you meant subTable = table(sub) in your last line):
res<-merge(as.data.frame(fullTable),as.data.frame(subTable),by.x=1,by.y=1,all=TRUE)
colnames(res)<-c("Letter","fullFreq","subFreq")
res[is.na(res)]<-0
With dplyr:
library(dplyr)
full=rep(letters[1:4], 4:7)
sub=rep(letters[1:2], c(2,4))
df <- data.frame(Letter=unique(c(full, sub)))
df <- df %>%
left_join(as.data.frame(table(full)), by=c("Letter"="full")) %>%
left_join(as.data.frame(table(sub)), by=c("Letter"="sub"))
df[is.na(df)] <- 0
df
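A base-R alternative is to give both vectors the same factor levels before tabulating: table() then reports a count for every level, including 0 for letters that never occur in sub, so no merge or NA replacement is needed. A minimal sketch (the name lev is just for illustration):

```r
full <- rep(letters[1:4], 4:7)
sub <- rep(letters[1:2], c(2, 4))

# Shared set of levels covering both vectors
lev <- sort(unique(c(full, sub)))

# Counting factors with identical levels gives aligned counts, with 0 for absent levels
res <- data.frame(
  Letter = lev,
  fullFreq = as.vector(table(factor(full, levels = lev))),
  subFreq = as.vector(table(factor(sub, levels = lev))),
  stringsAsFactors = FALSE
)
print(res)
```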
