Hi, I am relatively new to R. I am struggling with what seems like it should be a relatively simple task: I am trying to make a frequency histogram using ggplot2 from a subset of the columns of a larger dataframe.
An example of the data structure is in the attached picture:
https://i.stack.imgur.com/HIwQv.png
The data is from a survey where 0 means not selected and 1 means selected. The columns are numeric in the original dataset. I want a histogram of how often each variable was selected, with the column variables on the x-axis and frequency counts on the y-axis. I have various subsets like this within a dataframe and I would like each subset to have its own graph.
I first subset the columns of interest
new_dataset <- subset(df, select = c(WAB_R, WAB_B, BDAE, PNT))
When I checked the class, it was data.frame and no longer numeric.
I tried to use as.numeric to convert it back to numeric, but with no luck.
I could use some guidance in how to structure the data to then obtain a histogram.
Thanks Carla
Maybe try this approach using tidyverse functions. You have to select the desired variables and then reshape the data to long format. Here is the code, using ggplot2 for the final plot:
library(tidyverse)
#Code 1
df %>% select(c(WAB_R, WAB_B, BDAE, PNT)) %>%
pivot_longer(everything()) %>%
ggplot(aes(x=value))+
geom_histogram(stat = 'count',aes(fill=name),
position = position_dodge2(0.9,preserve = 'single'))+
labs(fill='Variable')
Output:
Or this:
#Code 2
df %>% select(c(WAB_R, WAB_B, BDAE, PNT)) %>%
pivot_longer(everything()) %>%
ggplot(aes(x=factor(value)))+
geom_histogram(stat = 'count',aes(fill=name),
position = position_dodge2(0.9,preserve = 'single'))+
labs(fill='Variable')+xlab('value')
Output:
Some data used:
#Data
df <- structure(list(ID = 1:4, WAB_R = c(0L, 1L, 0L, 1L), WAB_B = c(0L,
1L, 0L, 0L), BDAE = c(0L, 0L, 0L, 1L), PNT = c(0L, 0L, 0L, 0L
)), class = "data.frame", row.names = c(NA, -4L))
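If the goal is, as the question describes, one bar per variable showing how many times it was selected, a minimal sketch is to sum each 0/1 column and plot the totals. This is not the code from the answer above, just an alternative under the assumption that every selected column is coded 0/1; the names `vars`, `sel` and `counts` are made up for illustration.

```r
library(ggplot2)

# Sample data, same values as the data block above
df <- data.frame(ID = 1:4,
                 WAB_R = c(0L, 1L, 0L, 1L),
                 WAB_B = c(0L, 1L, 0L, 0L),
                 BDAE  = c(0L, 0L, 0L, 1L),
                 PNT   = c(0L, 0L, 0L, 0L))

# Columns that belong to this subset of the survey
vars <- c("WAB_R", "WAB_B", "BDAE", "PNT")

# With 0/1 coding, the column sum is the number of times each item was selected
sel <- colSums(df[vars])
counts <- data.frame(variable = factor(names(sel), levels = vars),
                     n = as.vector(sel))

# One bar per variable, height = number of selections
p <- ggplot(counts, aes(x = variable, y = n)) +
  geom_col() +
  labs(x = "Variable", y = "Times selected")
print(p)
```

For another subset of columns, only `vars` would need to change.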
I have the following data (sample below):
Participant Group Choice
1 Control 0
2 Control 0
3 Control 0
4 Stress 1
5 Stress 1
6 Stress 1
I want to create a bar graph depicting the frequencies of Choice (0 or 1) for Group (Stress VS Control).
Make a table and use barplot which comes with R.
barplot(with(dat, table(Choice, Group)), main = "My plot", beside = TRUE, col = 2:3)
Data:
(Forgive me that I chose slightly more interesting data :)
dat <- structure(list(Participant = 1:6, Group = c("Control", "Control",
"Control", "Stress", "Stress", "Stress"), Choice = c(0L, 1L,
0L, 0L, 1L, 1L)), class = "data.frame", row.names = c(NA, -6L
))
You can use count to count the frequencies, convert the variables to factor and plot.
library(dplyr)
library(ggplot2)
df %>%
count(Group, Choice) %>%
mutate(Choice = factor(Choice), Group = factor(Group)) %>%
ggplot() + aes(Group, n, fill = Choice) + geom_col()
I have a 3 column csv file like this
x,y1,y2
100,50,10
200,10,20
300,15,5
I want to make a barplot in R with the first column's values on the x-axis and the second and third columns' values as grouped bars for the corresponding x. I hope that is clear. Can someone please help me with this? My data is huge, so I have to import the csv file and can't enter all the data by hand. I found related posts, but none addressed exactly this.
Thank you
Use the following code
library(tidyverse)
df %>% pivot_longer(names_to = "y", values_to = "value", -x) %>%
ggplot(aes(x,value, fill=y))+geom_col(position = "dodge")
Data
df = structure(list(x = c(100L, 200L, 300L), y1 = c(50L, 10L, 15L),
y2 = c(10L, 20L, 5L)), class = "data.frame", row.names = c(NA,
-3L))
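Since the question mentions importing the CSV rather than typing the data, here is a minimal sketch of the same plot starting from a file. It assumes the file has the header `x,y1,y2` as shown in the question; the temporary file and the names `csv_path` and `long` are only there to make the example self-contained. Wrapping `x` in `factor()` is a choice made here so the groups are evenly spaced instead of placed on a continuous axis.

```r
library(ggplot2)
library(tidyr)

# Write the sample rows to a temporary file so the example runs on its own;
# with real data, point read.csv() at the actual file instead
csv_path <- tempfile(fileext = ".csv")
writeLines(c("x,y1,y2", "100,50,10", "200,10,20", "300,15,5"), csv_path)

df <- read.csv(csv_path)

# Reshape to long format: one row per (x, series) pair
long <- pivot_longer(df, cols = -x, names_to = "y", values_to = "value")

# Grouped bars: one group per x value, one bar per series
p <- ggplot(long, aes(x = factor(x), y = value, fill = y)) +
  geom_col(position = "dodge") +
  labs(x = "x")
print(p)
```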
I have already searched the forum for hours (really) and am starting to get the faint feeling that I am slowly going crazy, especially as this appears to me to be an easily solvable problem.
What do I want to do?
Basically, I want to simulate clinical data. Specifically, for each patient (column 1: id) an arbitrary score (column 3: score), dependent on the assigned treatment group (column 2: group).
set.seed(123)
# Number of subjects in study
n_patients = 1000
# Score: Mean and SDs
mean_verum = 70
sd_verum = 20
mean_placebo = 40
sd_placebo = 20
# Allocating to Treatment groups:
data = data.frame(id = as.character(1:n_patients))
data$group[1:(n_patients/2)] <- "placebo"
data$group[(n_patients/2+1):n_patients] <- "verum"
# Attach Score for each treatment group
# n must equal n_patients; with n=100, ifelse() silently recycles the same 100 draws
data$score <- ifelse(data$group == "verum", rnorm(n=n_patients, mean=mean_verum, sd=sd_verum), rnorm(n=n_patients, mean=mean_placebo, sd=sd_placebo))
So far so easy. Now, I wish to 1) calculate the probability of an event happening (logit function) depending on the score. Then, 2) I want to actually assign an event, depending on that probability (rbinom).
I want to do this for n different probabilities/events. This is the code I've used so far:
Calculate probabilities:
a = -1
b = 0.01
p1 = 1-exp(a+b*data$score)/(1+exp(a+b*data$score))
data$p_AE1 <- p1
a = -0.5
b = 0.01
p1 = 1-exp(a+b*data$score)/(1+exp(a+b*data$score))
data$p_AE2 <- p1
…
Assign Events:
data$Abbruch_AE1 <- rbinom(n_patients, 1, data$p_AE1)
data$Abbruch_AE2 <- rbinom(n_patients, 1, data$p_AE2)
…
Obviously, this is really inefficient, as I would like to easily scale this up or down, depending on how many probabilities/events I want to simulate.
The problem is that I simply do not see how I can simultaneously a) generate a new column in the dataframe for each set of values, b) apply the function to assign the probabilities/events, and c) do this for a number n of different formulas, each with its own a and b.
I am sure the solution to this problem is a simple one. What I didn't manage was to do all these things at once, which is where I would like this to be eventually. I have played around with for loops, all to no avail.
Any help would be greatly appreciated!
This is what my dataframe looks like:
structure(list(id = structure(1:3, .Label = c("1", "2", "3"), class = "factor"),
group = c("placebo", "placebo", "placebo"), score = c(25.791868726014,
45.1376741831306, 35.0661624307525), p_AE1 = c(0.677450814266315,
0.633816117436442, 0.656861351663365), p_AE2 = c(0.560226492151216,
0.512153420188678, 0.537265362130761), p_AE3 = c(0.435875409622676,
0.389033483248856, 0.413221988111604), p_AE4 = c(0.319098312196655,
0.278608032377073, 0.299294085148527), p_AE5 = c(0.221332386680766,
0.189789774534235, 0.205762225373345), p_AE6 = c(0.147051201194953,
0.124403316086538, 0.135795233451071), p_AE7 = c(0.0946686004658072,
0.0793379289917946, 0.0870131973838217), p_AE8 = c(0.0596409872667201,
0.0496714832182721, 0.0546471270895262), AbbruchAE1 = c(1L,
1L, 1L), AbbruchAE2 = c(1L, 1L, 0L), AbbruchAE3 = c(0L, 0L,
0L), AbbruchAE4 = c(0L, 1L, 0L), AbbruchAE5 = c(1L, 0L, 0L
), AbbruchAE6 = c(1L, 0L, 0L), AbbruchAE7 = c(0L, 0L, 0L),
AbbruchAE8 = c(0L, 0L, 0L)), .Names = c("id", "group", "score", "p_AE1", "p_AE2", "p_AE3", "p_AE4", "p_AE5", "p_AE6", "p_AE7", "p_AE8", "AbbruchAE1", "AbbruchAE2", "AbbruchAE3", "AbbruchAE4", "AbbruchAE5", "AbbruchAE6", "AbbruchAE7", "AbbruchAE8"), row.names = c(NA, 3L), class = "data.frame")
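One possible way to scale this, offered as a sketch rather than a definitive answer: keep the a and b values in a small parameter table and loop over its rows, creating each column by name with `[[<-` and `paste0()`. The `params` table and its third row are made up for illustration, and the stand-in `data` below only contains what the loop needs.

```r
set.seed(123)

# Small stand-in for the simulated data (only the score column matters here)
n_patients <- 1000
data <- data.frame(id = as.character(1:n_patients),
                   score = rnorm(n_patients, mean = 55, sd = 25))

# One row per event: intercept a and slope b
# (the first two rows are from the question, the third is invented)
params <- data.frame(a = c(-1, -0.5, 0), b = c(0.01, 0.01, 0.01))

for (i in seq_len(nrow(params))) {
  lin <- params$a[i] + params$b[i] * data$score
  # Same formula as in the question: 1 - exp(lin) / (1 + exp(lin))
  p <- 1 - exp(lin) / (1 + exp(lin))
  # Build the column names from the loop index and assign with [[<-
  data[[paste0("p_AE", i)]] <- p
  data[[paste0("Abbruch_AE", i)]] <- rbinom(n_patients, size = 1, prob = p)
}
```

Adding or removing events then only means adding or removing rows in `params`. The probability line could also be written as `1 - plogis(lin)`, which is the same quantity using the logistic function built into R.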
I'm trying to apply inverse probability weights to a regression, but lm() only uses analytic weights. This is part of a replication I'm working on: the original author uses pweight in Stata, and I'm trying to reproduce the results in R. The analytic weights give lower standard errors, which causes problems with the significance of some of my variables.
I've tried looking at the survey package, but am not sure how to prepare a survey object for use with svyglm(). Is this the approach I want, or is there an easier way to apply inverse probability weights?
dput :
data <- structure(list(lexptot = c(9.1595012302023, 9.86330744180814,
8.92372556833205, 8.58202430280175, 10.1133857229336), progvillm = c(1L,
1L, 1L, 1L, 0L), sexhead = c(1L, 1L, 0L, 1L, 1L), agehead = c(79L,
43L, 52L, 48L, 35L), weight = c(1.04273509979248, 1.01139605045319,
1.01139605045319, 1.01139605045319, 0.76305216550827)), .Names = c("lexptot",
"progvillm", "sexhead", "agehead", "weight"), class = c("tbl_df",
"tbl", "data.frame"), row.names = c(NA, -5L))
Linear Model (using analytic weights)
prog.lm <- lm(lexptot ~ progvillm + sexhead + agehead, data = data, weights = weight)
summary(prog.lm)
Alright, so I figured it out and thought I would update the post in case others are trying to do the same. It's actually pretty straightforward.
library(survey)
# Each row is its own sampling unit, so use a row index as the id
data$X <- 1:nrow(data)
des1 <- svydesign(id = ~X, weights = ~weight, data = data)
prog.lm <- svyglm(lexptot ~ progvillm + sexhead + agehead, design=des1)
summary(prog.lm)
Standard errors are now correct.
I want to rank the variables in my dataset in descending order of the number of plants used. I tried sorting the .csv first and then importing it into R, but even then the plot was not in the required order. Here is my dataset:
df <- structure(list(Lepidoptera.Family = structure(c(3L, 2L, 5L, 1L, 4L, 6L),
.Label = c("Hesperiidae", "Lycaenidae", "Nymphalidae", "Papilionidae", "Pieridae","Riodinidae"), class = "factor"),
LHP.Families = c(55L, 55L, 15L, 14L, 13L, 1L)),
.Names = c("Lepidoptera.Family", "LHP.Families"),
class = "data.frame", row.names = c(NA, -6L))
library(ggplot2)
library(reshape2)
gg <- melt(df,id="Lepidoptera.Family", value.name="LHP.Families", variable.name="Type")
ggplot(gg, aes(x=Lepidoptera.Family, y=LHP.Families, fill=Type))+
geom_bar(stat="identity")+
coord_flip()+facet_grid(Type~.)
How do I rank them in descending order? Also, I want to combine 3 plots into one. How can I go about it?
The reason this is happening is that ggplot plots the x variables that are factors in the ordering of the underlying values (recall that factors are stored as numbers underneath the covers). If you want to graph them in an alternate order, you should change the order of the levels before plotting
gg$Lepidoptera.Family<-with(gg,
factor(Lepidoptera.Family,
levels=Lepidoptera.Family[order(LHP.Families)]))
The trick is to reorder the levels of the Lepidoptera.Family factor, which by default are alphabetical:
df = within(df, {
# reorder() returns the factor with its levels sorted by LHP.Families; assign it back
Lepidoptera.Family <- reorder(Lepidoptera.Family, LHP.Families)
})
gg <- melt(df,id="Lepidoptera.Family", value.name="LHP.Families", variable.name="Type")
ggplot(gg, aes(x=Lepidoptera.Family, y=LHP.Families, fill=Type))+ geom_bar(stat="identity")+ coord_flip()+facet_grid(Type~.)
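For a plot without `coord_flip()`, where descending order should read left to right, a small variation is to reorder by the negated counts. This is only a sketch on the sample data from the question, without the melt and facet steps.

```r
library(ggplot2)

# Sample data with the same values as in the question
df <- data.frame(
  Lepidoptera.Family = factor(c("Nymphalidae", "Lycaenidae", "Pieridae",
                                "Hesperiidae", "Papilionidae", "Riodinidae")),
  LHP.Families = c(55L, 55L, 15L, 14L, 13L, 1L)
)

# Negating the counts makes reorder() put the largest family first
df$Lepidoptera.Family <- reorder(df$Lepidoptera.Family, -df$LHP.Families)

# Bars now run from the most to the least host plant families
p <- ggplot(df, aes(x = Lepidoptera.Family, y = LHP.Families)) +
  geom_col()
print(p)
```

With `coord_flip()`, the first level is drawn at the bottom, so the un-negated `reorder()` is the one that puts the largest bar on top.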