I have no problem doing group-mean centering for one variable at a time, but how can I do it for multiple variables at the same time?
library(misty)
library(dplyr)  # for %>% and mutate()
x<-(c(1, 2, 3, 4, 3, 1))
y<-(c(1, NA, 3, 5, 3, 2))
group<-as.factor(c(1, 2, 2, 1, 2, 1))
mydata<-data.frame(x, y, group)
mydata<-mydata%>% mutate(x_cwc = center(mydata$x, type="CWC", group=mydata$group))
mydata<-mydata%>% mutate(y_cwc = center(mydata$y, type="CWC", group=mydata$group))
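A minimal sketch of centering several variables in one step, assuming dplyr >= 1.0 (for `across()`). It swaps `misty::center()` for an explicit subtraction of each variable's group mean, which is what centering within cluster (CWC) does. NAs are skipped when computing each group mean; if NA handling matters for your data, compare the result with `center()` before relying on it.

```r
library(dplyr)

x <- c(1, 2, 3, 4, 3, 1)
y <- c(1, NA, 3, 5, 3, 2)
group <- as.factor(c(1, 2, 2, 1, 2, 1))
mydata <- data.frame(x, y, group)

# Centre every listed variable on its own group mean; the new columns
# are named <variable>_cwc (x_cwc, y_cwc). NAs are ignored in the means.
mydata <- mydata %>%
  group_by(group) %>%
  mutate(across(c(x, y), ~ .x - mean(.x, na.rm = TRUE), .names = "{.col}_cwc")) %>%
  ungroup()

mydata
```

To center more variables, add them inside `c(...)`, or use a selection helper such as `starts_with()`.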
I have episode duration data (in days):
dur<-c(1, 2, 1, 2, 1, 3, 11, 2, 2, 3, 2, 4, 1, 2, 2, 1, 2, 10, 1, 1, 2, 2, 18, 2, 2, 2, 1, 7, 1, 1, 11, 25, 17, 2, 2, 9, 3, 3, 2, 5, 3, 2, 3, 2, 5, 363, 1, 1, 2, 2)
That is, the first episode lasted 1 day, the second 2 days, the third 1 day, and so on.
table(dur) summarizes the duration data (12 instances of 1 day, 20 instances of 2 days, etc.).
freq.table <- table(dur)/sum(table(dur)) gives me the relative frequency of each observed duration (point estimates).
How can I get confidence intervals for freq.table in R? What would be the most appropriate approach for this kind of data?
Edit: I am interested in estimating CIs for the frequencies of episodes lasting 1, 2, ..., n days.
A fast and easy way to get CIs for proportions in R is binom.test, which returns exact (Clopper-Pearson) intervals:
dur <- c(1, 2, 1, 2, 1, 3, 11, 2, 2, 3, 2, 4,
         1, 2, 2, 1, 2, 10, 1, 1, 2, 2, 18, 2,
         2, 2, 1, 7, 1, 1, 11, 25, 17, 2, 2, 9,
         3, 3, 2, 5, 3, 2, 3, 2, 5, 363, 1, 1, 2, 2)
tab <- table(dur)   # counts per observed duration
n <- length(dur)    # total number of episodes
# one exact binomial CI per observed duration: x "successes" out of n episodes
ci <- sapply(tab, function(x) binom.test(x, n, conf.level = .95)$conf.int)
rownames(ci) <- c("lower", "upper")
print(ci)
This assumes that the data-generating process behind each duration's count is something like a binomial process.
Edit after first comment
As Roland pointed out in an earlier comment, you have not stated the problem in unambiguous statistical terms, so I made some assumptions. I suppose Roland would suggest trying to find a distribution for all possible durations as a whole. Given a mode at 2 and an observation with value 363, this is unlikely to be a common distribution such as Poisson or binomial. Knowing nothing about the data-generating process, I estimated a confidence interval for each observed outcome on its own, without regarding the distribution as a whole. For each observed outcome I assumed a binomial distribution; check whether that assumption is reasonable before you use this answer for anything serious.
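Each interval above is a 95% interval on its own. With 13 distinct durations, the chance that at least one of them misses its true proportion is well above 5%. A simple, conservative way to make the intervals hold jointly is a Bonferroni adjustment: run each `binom.test()` at level 1 - alpha/k, where k is the number of intervals. This is a swapped-in technique, not part of the answer above, and it still assumes the 50 episodes are independent.

```r
dur <- c(1, 2, 1, 2, 1, 3, 11, 2, 2, 3, 2, 4,
         1, 2, 2, 1, 2, 10, 1, 1, 2, 2, 18, 2,
         2, 2, 1, 7, 1, 1, 11, 25, 17, 2, 2, 9,
         3, 3, 2, 5, 3, 2, 3, 2, 5, 363, 1, 1, 2, 2)
tab <- table(dur)
n <- length(dur)
k <- length(tab)   # number of distinct observed durations
alpha <- 0.05

# Unadjusted: each interval has 95% coverage on its own
ci <- sapply(tab, function(x) binom.test(x, n, conf.level = 1 - alpha)$conf.int)
rownames(ci) <- c("lower", "upper")

# Bonferroni: each interval at level 1 - alpha/k, so all k intervals
# cover their true proportions simultaneously with probability >= 1 - alpha
ci_bonf <- sapply(tab, function(x) binom.test(x, n, conf.level = 1 - alpha / k)$conf.int)
rownames(ci_bonf) <- c("lower", "upper")

# Point estimates alongside the simultaneous intervals
round(rbind(estimate = as.vector(tab) / n, ci_bonf), 3)
```

The adjusted intervals are wider; the trade-off is that they can be read together as a set, which fits the edit's interest in all durations 1, 2, ..., n at once.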
I have a tibble created like this:
tibble(district = c(1, 5, 3, 5, 2, 7, 8, 1, 1, 2, 2, 4, 5, 6, 8, 6, 3),
housing = c(1, 1, 2, 1, 2, 2, 2, 1, 1, 2, 3, 2, 1, 1, 1, 3, 2))
Now I would like to know how the type of housing is distributed per district. Since the number of respondents differs between districts, I would like to work with percentages. Basically, I'm looking for two plots:
1) A bar plot in which the percentages of housing categories are shown in one bar per district (since these are percentages, all bars would be of equal height).
2) A pie chart for every district, showing the percentage of each housing category in that district.
However, I am unable to group the data the way I want, let alone compute percentages. How can I make these plots?
Thanks in advance!
Give this a shot:
library(tidyverse)  # loads dplyr and ggplot2
# original data
df <- data.frame(district = c(1, 5, 3, 5, 2, 7, 8, 1, 1, 2, 2, 4, 5, 6, 8, 6, 3),
                 housing = c(1, 1, 2, 1, 2, 2, 2, 1, 1, 2, 3, 2, 1, 1, 1, 3, 2))
# count respondents per district and housing type
# (housing is a category, so count it rather than summing the codes)
df_pct <- df %>%
  mutate(district = factor(district), housing = factor(housing)) %>%
  count(district, housing) %>%
  # make percentages within each district
  group_by(district) %>%
  mutate(housing_percentage = round(n / sum(n), 2)) %>%
  ungroup()
# bar graph: one bar per district, stacked by housing type (every bar reaches 100%)
ggplot(df_pct, aes(x = district, y = housing_percentage, fill = housing)) +
  geom_col(position = "fill") +
  scale_y_continuous(labels = scales::percent)
# pie chart: one pie per district
ggplot(df_pct, aes(x = "", y = housing_percentage, fill = housing)) +
  geom_col(width = 1, color = "white", position = "fill") +
  coord_polar("y", start = 0) +
  facet_wrap(~ district) +
  theme_void()
Which yields the following plots:
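The within-district shares that the plots should display can be checked in base R with a cross-tabulation divided by its row totals. A sketch using `table()` and `prop.table()`; the names `counts`, `pct` and `pct_long` are just illustrative.

```r
df <- data.frame(district = c(1, 5, 3, 5, 2, 7, 8, 1, 1, 2, 2, 4, 5, 6, 8, 6, 3),
                 housing = c(1, 1, 2, 1, 2, 2, 2, 1, 1, 2, 3, 2, 1, 1, 1, 3, 2))

# Rows = districts, columns = housing types, cells = number of respondents
counts <- table(district = df$district, housing = df$housing)

# Divide each row by its row total: every district's shares sum to 1
pct <- prop.table(counts, margin = 1)
round(100 * pct, 1)

# Long format (one row per district/housing pair), suitable for ggplot2
pct_long <- as.data.frame(pct, responseName = "share")
head(pct_long)
```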
I want to create a variable region based on a series of similar variables zipid1 to zipid26. My current code is like this:
dat$region <- with(dat, ifelse(zipid1 == 1, 1,
ifelse(zipid2 == 1, 2,
ifelse(zipid3 == 1, 3,
ifelse(zipid4 == 1, 4,
5)))))
How can I write a loop so that I don't have to type out zipid1 to zipid26? Thanks!
We subset the 'zipid' columns, create a logical matrix by comparing them with 1 (== 1), and get the column index of the TRUE value in each row with max.col (assuming each row contains only a single 1). That index is assigned to create 'region':
dat$region <- max.col(dat[paste0("zipid", 1:26)] == 1, "first")
Using a small reproducible example
max.col(dat[paste0("zipid", 1:5)] == 1, "first")
data
dat <- data.frame(id = 1:5, zipid1 = c(1, 3, 2, 4, 5),
zipid2 = c(2, 1, 3, 5, 4), zipid3 = c(3, 2, 1, 5, 4),
zipid4 = c(4, 3, 6, 2, 1), zipid5 = c(5, 3, 8, 1, 4))
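One difference from the nested ifelse(): when a row has no zipid equal to 1, all its columns tie at FALSE and max.col(..., "first") returns 1, whereas the ifelse() version falls through to its default (5 in the question's four-variable code). A minimal sketch that keeps that fallback, using a hypothetical helper region_from_zipids() and assuming the default for k variables should be k + 1; rows with NA in the zipid columns are not handled.

```r
# Hypothetical helper: index of the first zipid column equal to 1,
# or k + 1 when none of zipid1..zipidk equals 1 (mirrors the final ifelse default)
region_from_zipids <- function(dat, k) {
  hits <- as.matrix(dat[paste0("zipid", seq_len(k))]) == 1  # logical matrix
  region <- max.col(hits, ties.method = "first")            # first TRUE per row
  region[rowSums(hits) == 0] <- k + 1                        # no 1 found: default
  region
}

dat <- data.frame(id = 1:5, zipid1 = c(1, 3, 2, 4, 5),
                  zipid2 = c(2, 1, 3, 5, 4), zipid3 = c(3, 2, 1, 5, 4),
                  zipid4 = c(4, 3, 6, 2, 1), zipid5 = c(5, 3, 8, 1, 4))

# Same result as the question's ifelse chain over zipid1..zipid4 with default 5
dat$region <- region_from_zipids(dat, 4)
dat$region
```

For the full data, region_from_zipids(dat, 26) would cover zipid1 to zipid26.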
I am a beginner in R and have a question about making boxplots of columns. I just made a data frame:
SUS <- data.frame(RD = c(4, 3, 4, 1, 2, 2, 4, 2, 4, 1), TK = c(4, 2, 4, 2, 2, 2, 4, 4, 3, 1),
WK = c(3, 2, 4, 1, 3, 3, 4, 2, 4, 2), NW = c(2, 2, 4, 2, NA, NA, 5, 1, 4, 2),
BW = c(3, 2, 4, 1, 4, 1, 4, 1, 5, 1), EK = c(2, 4, 3, 1, 2, 4, 2, 2, 4, 2),
AN = c(3, 2, 4, 2, 3, 3, 3, 2, 4, 2))
rownames(SUS) <- c('Pleasant to use', 'Unnecessarily complex', 'Easy to use',
'Need help of a technical person', 'Different functions well integrated', 'Various functions incoherent', 'Imagine that it is easy to learn',
'Difficult to use', 'Confident during use', 'Long duration until I could work with it')
I tried a number of times, but I did not succeed in making boxplots for all rows. Can someone help me out here?
You can also do it using the tidyverse:
library(tidyverse)
SUS %>%
#create new column and save the row.names in it
mutate(variable = row.names(.)) %>%
#convert your data from wide to long
tidyr::pivot_longer(1:7, names_to = "var", values_to = "value") %>%
#plot it using ggplot2
ggplot(., aes(x = variable, y = value)) +
geom_boxplot()+
theme(axis.text.x = element_text(angle=35,hjust=1))
As @blondeclover says in the comment, boxplot(SUS) already draws one boxplot per column.
If what you want is a boxplot for each row, your current rows need to become columns. To do this, transpose the data frame before plotting:
SUS.new <- as.data.frame(t(SUS))
boxplot(SUS.new)
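boxplot() can also return the numbers it would draw (plot = FALSE), which is a quick way to confirm that, after transposing, each box summarizes one questionnaire item across the seven respondents. Drawing the boxes horizontally with las = 1 keeps the long item names readable; the left margin of 16 lines is a guess you may need to adjust for your device.

```r
SUS <- data.frame(RD = c(4, 3, 4, 1, 2, 2, 4, 2, 4, 1), TK = c(4, 2, 4, 2, 2, 2, 4, 4, 3, 1),
                  WK = c(3, 2, 4, 1, 3, 3, 4, 2, 4, 2), NW = c(2, 2, 4, 2, NA, NA, 5, 1, 4, 2),
                  BW = c(3, 2, 4, 1, 4, 1, 4, 1, 5, 1), EK = c(2, 4, 3, 1, 2, 4, 2, 2, 4, 2),
                  AN = c(3, 2, 4, 2, 3, 3, 3, 2, 4, 2))
rownames(SUS) <- c('Pleasant to use', 'Unnecessarily complex', 'Easy to use',
                   'Need help of a technical person', 'Different functions well integrated',
                   'Various functions incoherent', 'Imagine that it is easy to learn',
                   'Difficult to use', 'Confident during use',
                   'Long duration until I could work with it')

# Items become columns, respondents become rows
SUS.new <- as.data.frame(t(SUS))

# Compute the statistics without drawing:
# stats rows = lower whisker, lower hinge, median, upper hinge, upper whisker
b <- boxplot(SUS.new, plot = FALSE)
b$stats
b$n      # non-missing answers per item (the NAs in NW are dropped)
b$names  # one name per item

# Horizontal boxes with horizontal labels and a wide left margin for the item names
op <- par(mar = c(4, 16, 1, 1))
boxplot(SUS.new, horizontal = TRUE, las = 1)
par(op)
```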