plotting two numeric variables in the same graph - r

I want to visualise two variables in the same graph.
The variables look like this:
> head(intp.trust_male)
# A tibble: 1 × 1
average_intp.trust
<dbl>
1 2.33
and
> head(intp.trust_fem)
# A tibble: 1 × 1
average_intp.trust
<dbl>
1 2.34
I have tried merge to put them in the same data frame, but it doesn't seem to work:
Q5 <- merge(intp.trust_fem, intp.trust_male)
ggplot(data = Q5)+
aes(fill = percent_owned) +
geom_sf() +
scale_fill_viridis_c()
Can anyone help me out here, please?
Thank you :)

I think what you want to do is stack your data frames. You can do this with dplyr::bind_rows. It's not clear from your question what you're trying to accomplish because percent_owned is not a variable in the data you've shown. Generally, you could do (using geom_point):
library(dplyr)
library(ggplot2)
intp.trust_male <- mutate(intp.trust_male, label = "intp.trust_male")
intp.trust_fem <- mutate(intp.trust_fem, label = "intp.trust_fem")
df <- bind_rows(intp.trust_male, intp.trust_fem)
ggplot(df, aes(x = label, y = average_intp.trust)) +
geom_point()
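For a version that runs without the original data, here is a minimal sketch. The two one-row tibbles are stand-ins built from the values printed in the question. The column name group and the choice of geom_col() are assumptions for illustration, not something the asker specified.

```r
library(dplyr)
library(ggplot2)

# Stand-in data: one-row tibbles holding the values printed in the question
intp.trust_male <- tibble(average_intp.trust = 2.33)
intp.trust_fem  <- tibble(average_intp.trust = 2.34)

# Naming the list elements lets bind_rows() build the label column via .id
trust <- bind_rows(
  list(male = intp.trust_male, female = intp.trust_fem),
  .id = "group"  # "group" is a hypothetical column name
)

# One bar per group; geom_col() uses the y values directly as bar heights
p <- ggplot(trust, aes(x = group, y = average_intp.trust, fill = group)) +
  geom_col()
print(p)
```

Using .id avoids the two separate mutate() calls, at the cost of wrapping the data frames in a named list.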

Related

Data wrangling for creating multiple bar graph

So, I have this tibble from which I am trying to make a multiple bar graph that shows how much was spent supporting(for) or opposing(against) each of these candidates
However, I am completely lost on how to go about doing it, and I think I want to rearrange this tibble to make it simpler to create a graph. Any pointers would be very helpful.
A tibble: 5 x 5
type clinton sanders omalley fa_camp
<chr> <dbl> <dbl> <dbl> <chr>
1 24A 51937848 859337 0 against
2 24C 15106530 900 0 for
3 24E 29651626 5307952 374821 for
4 24F 5096083 304153 0 for
5 24N 10139 0 0 against
I am hoping to eventually achieve a result that looks like this:
The different colored bars would be for/against, and the y-axis would be the amount spent.
Before plotting, I would put the data into long format.
library(tidyverse)
library(scales)
df %>%
pivot_longer(cols = -c(type, fa_camp), names_to = "candidate", values_to = "amount_spent") %>%
ggplot(aes(x = candidate, y = amount_spent, group = fa_camp, fill = fa_camp)) +
geom_bar(stat = "identity", position = "dodge") +
scale_y_continuous(labels = dollar)
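One thing to watch with dodged bars: several rows share the same candidate and for/against value, and position = "dodge" draws them on top of each other rather than adding them up. The sketch below assumes that a total per candidate and stance is what is wanted, and sums the amounts before plotting. The data frame is copied from the tibble in the question; the names long and totals are hypothetical.

```r
library(tidyr)
library(ggplot2)

# Data copied from the tibble shown in the question
df <- data.frame(
  type    = c("24A", "24C", "24E", "24F", "24N"),
  clinton = c(51937848, 15106530, 29651626, 5096083, 10139),
  sanders = c(859337, 900, 5307952, 304153, 0),
  omalley = c(0, 0, 374821, 0, 0),
  fa_camp = c("against", "for", "for", "for", "against")
)

# One row per type/candidate pair
long <- pivot_longer(df, cols = -c(type, fa_camp),
                     names_to = "candidate", values_to = "amount_spent")

# Sum over the spending types so each candidate/stance pair has one value
totals <- aggregate(amount_spent ~ candidate + fa_camp, data = long, FUN = sum)

p <- ggplot(totals, aes(x = candidate, y = amount_spent, fill = fa_camp)) +
  geom_col(position = "dodge")
print(p)
```

If the individual spending types should stay visible instead, faceting by type would be another option.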

Plotting 3 Variables on One Chart - ggplot

I have some experience with base R but am trying to learn tidyverse and ggplot. I have a dataframe with 4 columns of data. I want a simple x-y plot, where the first column of data is on the x-axis, and the data in the other 3 columns is plotted on the y-axis, resulting in 3 lines on one plot. The first 15 lines of my data look like this (sorry about the image - I don't know how to insert a sample of my data):
screen shot - first 15 rows of data
I tried to plot the second and third columns of data as follows:
ggplot(data=SWRC_SL, aes(x=SWRC_SL$pressure_head, y=SWRC_SL$UNSODA_theta)) +
geom_line(colour="red") + scale_x_log10() +
ggplot(data=SWRC_SL, aes(x=SWRC_SL$pressure_head, y=SWRC_SL$Vrugt_theta)) +
geom_line(colour="blue") + scale_x_log10()
I get this error:
Error: Don't know how to add ggplot(data = SWRC_SL, aes(x = SWRC_SL$pressure_head, y = SWRC_SL$Vrugt_theta)) to a plot
I believe I should be using something like "group=" to indicate which columns should be plotted, but I haven't been able to find an example that shows how you can use ggplot to plot data across multiple columns. What am I missing?
ggplot() is only ever called once when you create a chart. Try with the following:
ggplot() +
geom_line(data=SWRC_SL, aes(x=pressure_head, y=UNSODA_theta), colour="red") +
geom_line(data=SWRC_SL, aes(x=pressure_head, y=Vrugt_theta), colour="blue") +
scale_x_log10()
A better method would be to turn your data to long, where the UNSODA_theta and Vrugt_theta data are in the same column (say thetas), and have another column (say type_theta) indicating whether the data is for UNSODA_theta or Vrugt_theta. Then you could do the following:
ggplot(data=SWRC_SL, aes(x=pressure_head, y=thetas, colour=type_theta)) +
geom_line() +
scale_x_log10()
This is more desirable because ggplot2 will include a legend indicating what type of theta the colours are applied to.
As suggested by @Marius, the most efficient way to plot your data is to convert them into a long format.
Using tidyverse, you can use the pivot_longer function (from the tidyr package) and write the following code:
library(tidyverse)
SWRC_SL %>% pivot_longer(.,-pressure_head, names_to = "variable", values_to = "value") %>%
ggplot(aes(x = pressure_head, y = value, color = variable))+
geom_line()+
scale_x_log10()
EDIT: Illustrating example
Using this dummy dataset:
  pressure UNSODA_theta Vrugt_theta Cassel_theta
1        0   -1.4672500   1.4119747   -2.0553118
2        1    0.5210227   0.6189239    1.4817574
3        2   -0.1587546   1.4094018    2.2796175
4        3    1.4645873   2.6888733   -0.4631109
5        4   -0.7660820   2.5865884   -1.8799346
6        5   -0.4302118   0.6690922    0.9633620
First, you pivot your data into a long format:
df %>% pivot_longer(.,-pressure, names_to = "variable", values_to = "value")
# A tibble: 45 x 3
pressure variable value
<int> <chr> <dbl>
1 0 UNSODA_theta -1.47
2 0 Vrugt_theta 1.41
3 0 Cassel_theta -2.06
4 1 UNSODA_theta 0.521
5 1 Vrugt_theta 0.619
6 1 Cassel_theta 1.48
7 2 UNSODA_theta -0.159
8 2 Vrugt_theta 1.41
9 2 Cassel_theta 2.28
10 3 UNSODA_theta 1.46
# … with 35 more rows
Your data are now suitable for plotting with ggplot2. You can append the ggplot call directly to the previous command by adding a pipe (%>%) between them:
library(tidyverse)
df %>% pivot_longer(.,-pressure, names_to = "variable", values_to = "value") %>%
ggplot(aes(x = pressure, y = value, color = variable))+
geom_line()+
scale_x_log10()
And you get the following plot with legend included:
Data example
structure(list(pressure = 0:14, UNSODA_theta = c(-1.46725002909224,
0.521022742648139, -0.158754604716016, 1.4645873119698, -0.766081999604665,
-0.430211753928547, -0.926109497377437, -0.17710396143654, 0.402011779486338,
-0.731748173119606, 0.830373167981674, -1.20808278630446, -1.04798441280774,
1.44115770684428, -1.01584746530465), Vrugt_theta = c(1.41197471231751,
0.61892394889108, 1.40940183965093, 2.68887328620405, 2.58658843344197,
0.669092199317234, -1.28523553529247, 3.49766158983416, 1.66706616676549,
1.5413273359637, 0.986600476854091, 1.51010842295293, 0.835624168230333,
1.42069464325451, 0.599753256022356), Cassel_theta = c(-2.05531181632119,
1.48175740118232, 2.27961753824932, -0.46311085383842, -1.87993463341154,
0.963361958516736, -0.0670637053409687, -2.59982761023726, 0.00319778952040447,
-0.945450500892219, -0.511452869790608, -1.73485854395378, 2.7047128618762,
-0.496698054586832, -2.40827011837962)), class = "data.frame", row.names = c(NA,
-15L))
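A caveat when running the dummy example: pressure starts at 0, and log10(0) is -Inf, so ggplot2 cannot place that point on a log axis and warns about it. The sketch below is one way to handle this, assuming the zero row can simply be dropped. It uses a small stand-in with the first six rows of the dummy data (values rounded); the names long and long_pos are hypothetical.

```r
library(tidyr)
library(ggplot2)

# Stand-in: first six rows of the dummy data, values rounded
df <- data.frame(
  pressure     = 0:5,
  UNSODA_theta = c(-1.47, 0.52, -0.16, 1.46, -0.77, -0.43),
  Vrugt_theta  = c(1.41, 0.62, 1.41, 2.69, 2.59, 0.67),
  Cassel_theta = c(-2.06, 1.48, 2.28, -0.46, -1.88, 0.96)
)

long <- pivot_longer(df, -pressure, names_to = "variable", values_to = "value")

# log10(0) is -Inf, so keep only strictly positive pressures for the log axis
long_pos <- long[long$pressure > 0, ]

p <- ggplot(long_pos, aes(x = pressure, y = value, color = variable)) +
  geom_line() +
  scale_x_log10()
print(p)
```

Whether dropping the zero row is acceptable depends on the data; for real pressure-head measurements a small positive offset or a different axis transformation may be more appropriate.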

Why is geom_bar y-axis unproportional to actual numbers?

Sorry if this question already exists - was googling for a while now already and didn't find anything.
I am relatively new to R and learning while doing all of this.
I'm supposed to create some PDF via r markdown that analyses patient-data with specific main-diagnosis and secondary-diagnosis. For this I'm supposed to plot some numbers via ggplot (geom_bar and geom_boxplot).
So what I do so far is, I retrieve data-sets that include both codes via SQL and load them into data.table-objects afterwards. Afterwards I join them to get the data I need.
After this I add columns that contain sub-strings of those codes, and others that contain the counts of those sub-strings (so I can plot the occurrences of every code).
Now I want to put one of these data.tables into a geom_bar or geom_boxplot. This actually works, but my y-axis has a weird scale that doesn't fit the numbers it should show. The proportions of the bars are also not accurate.
For example: one diagnosis appears 600 times and the other one 1000 times. The y-axis shows steps of 0 - 500.000 - 1.000.000 - 1.500.000 - ....
The bar that shows 600 is super small and the bar with 1000 goes up to 1.500.000.
If I create a new variable beforehand, count what I need via count() and plot that, it just works. The columns I use for the y-axis have the same data type (integer) in both variables.
So here is just how I create the data.table that I use for plotting
exazerbationsHdComorbiditiesNd <- allExazerbationsHd[allComorbiditiesNd, on="encounter_num", nomatch=0]
exazerbationsHdComorbiditiesNd <- exazerbationsHdComorbiditiesNd[, c("i.DurationGroup", "i.DurationInDays", "i.start_date", "i.end_date", "i.duration", "i.patient_num"):=NULL]
exazerbationsHdComorbiditiesNd[ , IcdHdCodeCount := .N, by = concept_cd]
exazerbationsHdComorbiditiesNd[ , IcdHdCodeClassCount := .N, by = IcdHdClass]
If I now want to bar-plot, for example, IcdHdClass by IcdHdCodeClassCount, I do the following:
ggplot(exazerbationsHdComorbiditiesNd, aes(exazerbationsHdComorbiditiesNd$IcdHdClass, exazerbationsHdComorbiditiesNd$IcdHdCodeClassCount, label=exazerbationsHdComorbiditiesNd$IcdHdCodeClassCount)) + geom_bar(stat = "identity") + geom_text(vjust = 0, size = 5)
It outputs said bar-plot with weird proportions.
If I do first:
plotTest <- count(exazerbationsHdComorbiditiesNd, exazerbationsHdComorbiditiesNd$IcdHdClass)
And then bar-plot it:
ggplot(plotTest, aes(plotTest$`exazerbationsHdComorbiditiesNd$IcdHdClass`, plotTest$n, label=plotTest$n)) + geom_bar(stat = "identity") + geom_text(vjust = 0, size = 5)
It's all perfect and works.
I checked also data-types of the columns I needed:
sapply(exazerbationsHdComorbiditiesNd, class)
sapply(plotTest, class)
In both variables the columns I need are of the type character and integer
Edit:
Unfortunately I can't post images, so here are just the links to them.
Here is a screenshot of the plot with wrong y-axis:
https://ibb.co/CbxX1n7
And here is a screenshot of the plot shown right:
https://ibb.co/Xb8gyx1
Here is some example-data that I copied out the data.table object:
Exampledata
Since you added the class counts as an additional column--rather than aggregating--what’s happening is that for each row in your data, the class counts get stacked on top of each other:
library(tidyverse)
set.seed(42)
df <- tibble(class = sample(letters[1:3], 10, replace = TRUE)) %>%
add_count(class, name = "count")
df # this is essentially what your data looks like
#> # A tibble: 10 x 2
#> class count
#> <chr> <int>
#> 1 a 5
#> 2 a 5
#> 3 a 5
#> 4 a 5
#> 5 b 3
#> 6 b 3
#> 7 b 3
#> 8 a 5
#> 9 c 2
#> 10 c 2
ggplot(df, aes(class, count)) + geom_bar(stat = "identity")
You could use position = "identity" so that the bars don’t get stacked:
ggplot(df, aes(class, count)) +
geom_bar(stat = "identity", position = "identity")
However, that creates a whole bunch of unnecessary layers in your plot that you can’t see. A better approach would be to drop the extra rows from your data before plotting:
df %>%
distinct(class, count)
#> # A tibble: 3 x 2
#> class count
#> <chr> <int>
#> 1 a 5
#> 2 b 3
#> 3 c 2
df %>%
distinct(class, count) %>%
ggplot(aes(class, count)) +
geom_bar(stat = "identity")
Created on 2019-09-05 by the reprex package (v0.3.0.9000)
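To make the stacking arithmetic concrete: a class that occurs n times contributes n rows, each carrying the value n, so the stacked bar reaches n * n. A class with 1000 rows would therefore be drawn at 1,000,000, which is consistent with the axis scale described in the question. The sketch below reproduces this with base R on the same ten classes as above, and also shows an alternative that is not in the original answer: letting geom_bar() count the rows itself with its default stat. The name stacked_height is hypothetical.

```r
library(ggplot2)

# Same shape as the data above: one row per record,
# with the per-class count repeated on every row
df <- data.frame(class = c("a", "a", "a", "a", "b", "b", "b", "a", "c", "c"))
df$count <- as.integer(ave(seq_along(df$class), df$class, FUN = length))

# Height of each stacked bar = sum of the repeated counts = n * n
stacked_height <- tapply(df$count, df$class, sum)
print(stacked_height)  # a = 25, b = 9, c = 4

# With the default stat = "count", geom_bar() counts rows per class,
# so the extra count column is not needed at all
p <- ggplot(df, aes(class)) + geom_bar()
print(p)
```

The default-stat version gives bars of height 5, 3 and 2 without any pre-aggregation, though it does not provide a column to use for geom_text labels.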

adding rows to a tibble based on mostly replicating existing rows

I have data that only shows a variable if it is not 0. However, I would like to have gaps representing these 0s in the graph.
(I will be working from a large dataframe, but have created an example data based on how I will be manipulating it for this purpose.)
library(tidyverse)
library(ggplot2)
A <- tibble(
name = c("CTX_M", "CblA_1"),
rpkm = c(350, 4),
sample = "A"
)
B <- tibble(
name = c("CTX_M", "OXA_1", "ampC"),
rpkm = c(324, 357, 99),
sample = "B"
)
plot <- bind_rows(A, B)
ggplot()+ geom_col(data = plot, aes(x = sample, y = rpkm, fill = name),
position = "dodge")
Samples A and B both have CTX_M; however, the other three "names" are only present in either sample A or sample B. When I run the code, the output graph shows two bars for sample A and three bars for sample B. The resulting graph was:
Is there a way for me to add CblA_1 to sample B with rpkm=0, and OXA_1 and ampC to sample A with rpkm=0, while maintaining sample separation? The tibble would then look like this (order not important):
and the graph would therefore look like this:
You can use complete from tidyr.
plot <- plot %>% complete(name,sample,fill=list(rpkm=0))
# A tibble: 8 x 3
name sample rpkm
<chr> <chr> <dbl>
1 ampC A 0
2 ampC B 99
3 CblA_1 A 4
4 CblA_1 B 0
5 CTX_M A 350
6 CTX_M B 324
7 OXA_1 A 0
8 OXA_1 B 357
ggplot()+ geom_col(data = plot, aes(x = sample, y = rpkm, fill = name),
position = "dodge")

Tail of my df is being plotted in the beginning of my plot

I have a dataframe that contains time(H:M:S), thetaX(degrees), thetaY(degrees), and thetaZ(degrees). I want to plot the degrees vs time using ggplot as mentioned here.
This is the original state of my dataframe:
> head(df)
time thetaX thetaY thetaZ
1 08:27:27 0.01539380 -0.001609785 -0.03271715
2 08:27:27 0.03079389 -0.003863202 -0.06512209
3 08:27:27 0.04588598 -0.006668402 -0.09720450
4 08:27:28 0.06008822 -0.008774166 -0.12872514
5 08:27:28 0.07400642 -0.008951306 -0.15985775
6 08:27:28 0.08823425 -0.012280650 -0.19023676
I run these lines to plot each column of df over time:
df = data.frame(time, thetaX,thetaY,thetaZ)
df.m = melt(df,id="time")
ggplot(data = df.m, aes(x = x, y = value)) + geom_point() + facet_grid(variable ~ .)
But, this is what comes out:
Question: Why is my data plotted starting from what looks like the tail end of my df (around 1pm), then jumping across to the beginning (around 8am) and continuing through the rest?
