I am using the R programming language. I have two datasets:
The first dataset:
my_data_1 <- data.frame(read.table(header=TRUE,
row.names = 1,
text="
height weight age
1 13.14600 2882.7709 49
2 12.65080 3183.7991 48
3 13.84154 3138.2280 48
4 15.25780 2786.5297 49
5 15.01213 3006.9687 50
6 14.37567 3286.9644 50
7 12.99385 2881.7667 51
8 15.38893 2916.1883 50
9 14.80093 2791.7292 49
10 15.40423 2427.7706 50
11 17.55129 630.8886 20
12 18.34758 1076.6810 19
13 16.37789 1778.5550 20
14 14.98782 1401.4328 17
15 17.40527 361.3323 20
16 16.53979 869.5829 21
17 16.61986 1712.1686 19
18 17.78508 1961.6090 20
19 16.83144 1043.5052 19
20 18.66166 360.3037 20
"))
The second dataset:
prior_age = rnorm(100000, 50,5)
prior_height = rnorm(100000, 17,1)
prior_weight = rnorm(100000, 3000, 200)
my_data_2 = data.frame(prior_age, prior_height, prior_weight)
(Based on the answer from this post: ggplot combining two plots from different data.frames) I am trying to plot the "densities" of the height variables from both data sets on the same graph. However, both datasets differ in the number of rows.
I tried the following code in R:
library(ggplot2)
ggplot() +
geom_density(data=my_data1, aes(x=height), color='green') +
geom_density(data=my_data2, aes(x=prior_height), color='red')
But this produces the following error:
Error: Aesthetics must be either length 1 or the same as the data (20): x
Can someone please show me how to fix this problem?
Thanks!
Based on the code you provided, there is no need to reshape the data. Map colour inside aes() so that a legend is created, then use guides(... = guide_legend(title = ...)) and scale_colour_discrete to change the legend's title and labels manually.
ggplot() +
  geom_density(data = my_data_1, aes(x = height), color = 'green') +
  stat_density(data = my_data_1, aes(x = height, colour = "red"), geom = "line", position = "identity") +
  geom_density(data = my_data_2, aes(x = prior_height), color = 'red') +
  # this layer needs its own data argument, because ggplot() was called without one
  stat_density(data = my_data_2, aes(x = prior_height, colour = 'green'), geom = "line", position = "identity") +
  guides(colour = guide_legend(title = "new title")) +
  scale_colour_discrete(labels = c("prior", "measurements"))
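As an alternative sketch (not the approach of the answer above), the two height columns can be stacked into one long data frame with a label column, so that a single geom_density() layer draws both curves and the legend is generated automatically. Each density is normalised per group, so the differing row counts do not matter. The names combined, dataset and value are made up for illustration, and the seed is only there to make the example reproducible.

```r
library(ggplot2)

set.seed(1)  # assumption: only here for reproducibility

# heights copied from my_data_1 in the question
height <- c(13.14600, 12.65080, 13.84154, 15.25780, 15.01213,
            14.37567, 12.99385, 15.38893, 14.80093, 15.40423,
            17.55129, 18.34758, 16.37789, 14.98782, 17.40527,
            16.53979, 16.61986, 17.78508, 16.83144, 18.66166)
prior_height <- rnorm(100000, 17, 1)

# one long data frame: a value column plus a label saying where each row came from
combined <- rbind(
  data.frame(dataset = "measurements", value = height),
  data.frame(dataset = "prior",        value = prior_height)
)

# colour is mapped to the label column, so the legend entries come from the data
p <- ggplot(combined, aes(x = value, colour = dataset)) +
  geom_density() +
  scale_colour_manual(values = c(measurements = "green", prior = "red"))
p
```

The legend title can still be changed with guides(colour = guide_legend(title = ...)) as in the answer.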
I have a data like this
df<- structure(list(Number = 1:23, Value1 = c(0.054830335, 1.19531842,
3.27820329, 1.03530176, 5.77430976, 3.72944, -0.683513395, 0.029550239,
2.487922644, 0.533448117, 0.098825565, -1.089022938, 2.301631235,
-0.095666867, -1.359480317, -1.359480317, 1.089441628, 3.307589929,
4.67838434, 3.562761178, 2.630726653, 1.795107015, 2.616255192
), Value2 = c(-0.296874921, 1.491747294, 2.951219257, 1.258677675,
-8.68096591, 3.361029751, -1.824459195, -1.445827538, 1.889631269,
-15.47774216, 3.085461276, -1.078286963, 0.948056999, -2.109354753,
-1.36703068, -1.36703068, 1.074642842, 2.945589842, 3.757911793,
2.765225717, 2.44452491, 1.784451022, 1.158493893)), class = "data.frame", row.names = c(NA,
-23L))
I am trying to make a dot plot with Value1 versus Number in one color and Value2 versus Number in another. I then want to show the first 5 values and the last 5 values at a bigger size.
I tried to plot it like this
df$Number <- factor(df$Number, levels = paste0("D", 1:23), ordered = TRUE)
ggplot(df, aes(x=Value1, y=Value2, color= Number)) +
geom_text()+
theme_classic()
I can plot one of them like this
ggplot(data = df, aes(x = Number, y = Value1))+
geom_point()
When it comes to putting the second one on the same plot, things get fuzzy.
I can put them together in this way
# wide to long format (gather() comes from tidyr)
library(tidyr)
plotDf <- gather(df, Group, Myvalue, -1)
# plot
ggplot(plotDf, aes(Number, Myvalue, col = Group)) +
geom_point()
I still don't know how to show the first 5 values and the last 5 values at a bigger size.
By the first 5 and the last 5 I mean these:
df
Number Value1 Value2
1 1 0.05483034 -0.2968749
2 2 1.19531842 1.4917473
3 3 3.27820329 2.9512193
4 4 1.03530176 1.2586777
5 5 5.77430976 -8.6809659
6 6 3.72944000 3.3610298
7 7 -0.68351339 -1.8244592
8 8 0.02955024 -1.4458275
9 9 2.48792264 1.8896313
10 10 0.53344812 -15.4777422
11 11 0.09882557 3.0854613
12 12 -1.08902294 -1.0782870
13 13 2.30163123 0.9480570
14 14 -0.09566687 -2.1093548
15 15 -1.35948032 -1.3670307
16 16 -1.35948032 -1.3670307
17 17 1.08944163 1.0746428
18 18 3.30758993 2.9455898
19 19 4.67838434 3.7579118
20 20 3.56276118 2.7652257
21 21 2.63072665 2.4445249
22 22 1.79510701 1.7844510
23 23 2.61625519 1.1584939
These are the first 5
1 1 0.05483034 -0.2968749
2 2 1.19531842 1.4917473
3 3 3.27820329 2.9512193
4 4 1.03530176 1.2586777
5 5 5.77430976 -8.6809659
and these are the last 5
19 19 4.67838434 3.7579118
20 20 3.56276118 2.7652257
21 21 2.63072665 2.4445249
22 22 1.79510701 1.7844510
23 23 2.61625519 1.1584939
Using the original data (without factor):
ggplot(df, aes(Number, Value1, size = (Number <= 5 | Number > 18))) +
geom_point() +
geom_point(aes(y=Value2)) +
scale_size_manual(name = NULL, values = c("TRUE" = 2, "FALSE" = 0.5)) +
scale_x_continuous(breaks = function(z) do.call(seq, as.list(round(z,0))))
Because size= is determined by a logical condition, the names of the manual values must be the character versions of the values observed: the logicals TRUE and FALSE become "TRUE" and "FALSE". My choice of 2 and 0.5 is arbitrary.
Feel free to name the legend better with name="some name" if desired. If you want no legend (which makes sense), you can use
... +
scale_size_manual(guide = "none", values = c("TRUE" = 2, "FALSE" = 0.5))
instead.
Another alternative, in case you want to distinguish the dots by which Value column they come from, is to melt the data into a long format before plotting.
ggplot(reshape2::melt(df, "Number"),
aes(Number, value, color = variable,
size = (Number <= 5 | Number > 18))) +
geom_point() +
scale_size_manual(guide = "none", values = c("TRUE" = 2, "FALSE" = 0.5))
One can use tidyr::pivot_longer or data.table::melt with similar results, see Reshaping data.frame from wide to long format.
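A sketch of the tidyr::pivot_longer variant, using the df from the question. The column names variable and value are chosen to mirror what reshape2::melt produces, and n_big and big are hypothetical names introduced here so that the cutoff for the last rows is derived from the data instead of being hard-coded.

```r
library(ggplot2)
library(tidyr)

# df as given in the question
df <- data.frame(
  Number = 1:23,
  Value1 = c(0.054830335, 1.19531842, 3.27820329, 1.03530176, 5.77430976,
             3.72944, -0.683513395, 0.029550239, 2.487922644, 0.533448117,
             0.098825565, -1.089022938, 2.301631235, -0.095666867,
             -1.359480317, -1.359480317, 1.089441628, 3.307589929,
             4.67838434, 3.562761178, 2.630726653, 1.795107015, 2.616255192),
  Value2 = c(-0.296874921, 1.491747294, 2.951219257, 1.258677675,
             -8.68096591, 3.361029751, -1.824459195, -1.445827538,
             1.889631269, -15.47774216, 3.085461276, -1.078286963,
             0.948056999, -2.109354753, -1.36703068, -1.36703068,
             1.074642842, 2.945589842, 3.757911793, 2.765225717,
             2.44452491, 1.784451022, 1.158493893)
)

n_big <- 5  # how many rows at each end to enlarge

# wide to long: one row per (Number, variable) pair
long <- pivot_longer(df, cols = c(Value1, Value2),
                     names_to = "variable", values_to = "value")

# TRUE for the first n_big and the last n_big values of Number
long$big <- long$Number <= n_big | long$Number > max(long$Number) - n_big

p <- ggplot(long, aes(Number, value, colour = variable, size = big)) +
  geom_point() +
  scale_size_manual(guide = "none", values = c("TRUE" = 2, "FALSE" = 0.5))
p
```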
I'm trying to plot the following data frame as a bar plot, where the values for the filteredprovince column are listed in a separate column (n).
Usually, ggplot and other plotting functions work on a horizontal data frame, and after several searches I have not been able to find a way to plot this "transposed" version of the data frame.
Each cluster should group its own set of bars, and within each cluster I would like to plot each filteredprovince based on the value of the n column.
Thank you for the support.
d <- read.table(text=
" cluster PROVINCIA n filteredprovince
1 1 08 765 08
2 1 28 665 28
3 1 41 440 41
4 1 11 437 11
5 1 46 276 46
6 1 18 229 18
7 1 35 181 other
8 1 29 170 other
9 1 33 165 other
10 1 38 153 other ", header=TRUE,stringsAsFactors = FALSE)
UPDATE
Thanks to the suggestion in the comments I almost achieved the desired format:
ggplot(tab_s, aes(x = cluster, y = n, fill = factor(filteredprovince))) + geom_col()
Is there any way to show percentages instead of frequencies on the y-axis?
If I understand correctly, you're trying to use geom_bar(), which gives you problems because it wants to compute a sort of histogram by counting rows, but you have already done this kind of summary.
(If you had provided the code you have tried so far, I would not have to guess.)
In that case you can use geom_col() instead.
ggplot(d, aes(x = filteredprovince, y = n, fill = factor(PROVINCIA))) + geom_col()
Alternatively, you can change the default stat of geom_bar() from "count" to "identity"
ggplot(d, aes(x = filteredprovince, y = n, fill = factor(PROVINCIA))) +
geom_bar(stat = "identity")
See this SO question for what a stat is
EDIT: Update in response to OP's update:
To display percentages, you will have to modify the data itself.
Just divide n by the sum of all n and multiply by 100.
d$percentage <- d$n / sum(d$n) * 100
ggplot(d, aes(x = cluster, y = percentage, fill = factor(filteredprovince))) + geom_col()
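If the data ever contains more than one cluster, dividing by the overall sum would no longer give percentages within each bar. A minimal sketch of a per-cluster version follows, assuming the shares should add up to 100% inside each cluster; the column name share is made up, and scales::percent is used only to put a % sign on the axis labels.

```r
library(ggplot2)

# the same rows as d in the question, typed in directly
d <- data.frame(
  cluster = 1,
  n = c(765, 665, 440, 437, 276, 229, 181, 170, 165, 153),
  filteredprovince = c("08", "28", "41", "11", "46", "18", rep("other", 4))
)

# share of each row within its own cluster, as a fraction between 0 and 1
d$share <- d$n / ave(d$n, d$cluster, FUN = sum)

p <- ggplot(d, aes(x = factor(cluster), y = share, fill = filteredprovince)) +
  geom_col() +
  scale_y_continuous(labels = scales::percent)  # 0.25 is shown as 25%
p
```

Keeping share as a fraction rather than multiplying by 100 is a design choice here: it lets the label function do the conversion.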
I'm not sure I perfectly understand, but if the problem is the orientation of your dataframe, you can transpose it with t(data) where data is your dataframe.
I'm trying to plot the number of observations for each instance of a word, both of which are stored in a data frame.
I can generate the plot with ggplot2, but the y-axis displays "1e+05", "2e+05", etc. instead of numerical values.
How can I modify this code so that the y-axis displays numbers instead?
Here is my code:
> w
p.word p.freq
1 the 294571
2 and 158624
3 you 84152
4 for 77117
5 that 71672
6 with 47987
7 this 42768
8 was 41088
9 have 39835
10 are 36458
11 but 33899
12 not 30370
13 all 27079
14 your 26923
15 just 25507
16 from 24497
17 out 22578
18 like 22501
19 what 22150
20 will 21530
21 they 21435
22 about 21184
23 one 20877
24 its 20109
ggplot(w, aes(x = p.word, y = p.freq))+ geom_bar(stat = "identity")
Here is the plot that is generated:
"1e+05" etc are numerical values (scientific notation).
If you want the long notation (e.g. "100,000") use library(scales) and the comma formatter:
library(scales)
ggplot(w, aes(x = p.word, y = p.freq))+ geom_bar(stat = "identity") +
scale_y_continuous(labels=comma)
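The labels argument of scale_y_continuous also accepts any function that turns the break values into strings, so the same result can be sketched without the scales package. The helper plain_labels below is a hypothetical name, and only the first five rows of w are reproduced to keep the example short.

```r
library(ggplot2)

# first five rows of w from the question
w <- data.frame(p.word = c("the", "and", "you", "for", "that"),
                p.freq = c(294571, 158624, 84152, 77117, 71672))

# hypothetical helper: format break values without scientific notation,
# with a comma as the thousands separator
plain_labels <- function(x) {
  format(x, big.mark = ",", scientific = FALSE, trim = TRUE)
}

p <- ggplot(w, aes(x = p.word, y = p.freq)) +
  geom_bar(stat = "identity") +
  scale_y_continuous(labels = plain_labels)
p
```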
I've a csv file with multiple columns. I'm considering only two of them, 'Time' and 'RiseOrFall'. Both are of Factor datatype. Sample data looks like:
Time RiseOrFall
12 32
34 0
56 0
78 25
90 29
123 0
567 50
I'm trying to create a line chart in R that falls every time 'RiseOrFall' hits 0 and rises when it's not 0 ('Time' on the x-axis and 'RiseOrFall' on the y-axis).
I tried:
countFile <- read.csv(file = "counts.csv", nrows = 1000)[, c ('TIME','RiseOrFall')]
ggplot(data=countFile, aes(x=RiseOrFall)) + geom_line()
How can this be achieved in R (possibly using ggplot2 or anything else)?
My expected output chart is as follows. (The records in the sample data are very few. The actual data is much larger, with bigger 'RiseOrFall' values on the y-axis.)
Try this.
I have set the x-axis (Time) and y-axis (RiseOrFall) ticks to 6 units and 200 units respectively.
ggplot(data = countFile, aes(x = Time, y = RiseOrFall)) +
  geom_line(colour = "brown") +
  scale_x_continuous(expand = c(0, 0),
                     breaks = round(seq(min(countFile$Time), max(countFile$Time), by = 6), 1)) +
  scale_y_continuous(expand = c(0, 0),
                     breaks = round(seq(min(countFile$RiseOrFall), max(countFile$RiseOrFall), by = 200), 1)) +
  xlab("Time") + ylab("RiseOrFall") +
  theme(panel.background = element_blank(), axis.line = element_line(colour = "black"))
There must be a package for this type of data...
You could try this:
require(ggplot2)
require(zoo) #na.locf
#dummy data
countFile <- read.table(text="Time RiseOrFall
12 32
34 0
56 0
78 25
90 29
123 0
567 50",header=TRUE)
#assign values for every x value
filler <- data.frame(Time=min(countFile$Time):max(countFile$Time))
df <- merge(filler, countFile,all.x=TRUE)
df$RiseOrFall <- na.locf(df$RiseOrFall)
#plot
ggplot(df,aes(x=Time,y=RiseOrFall)) +
geom_line()
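The same fill can be sketched without zoo. Assuming the goal is simply to carry each value forward until the next observation, base R's approx() with method = "constant" does the filling, and geom_step() draws the stepped line directly from the original seven rows. The names filled and df_filled are made up for this example.

```r
library(ggplot2)

# sample data from the question
countFile <- data.frame(Time = c(12, 34, 56, 78, 90, 123, 567),
                        RiseOrFall = c(32, 0, 0, 25, 29, 0, 50))

# carry each value forward to every integer Time up to the next observation
# (f = 0 means "use the value of the left neighbour")
filled <- approx(countFile$Time, countFile$RiseOrFall,
                 xout = min(countFile$Time):max(countFile$Time),
                 method = "constant", f = 0)
df_filled <- data.frame(Time = filled$x, RiseOrFall = filled$y)

# or skip the filling and let ggplot draw the steps between observations
p <- ggplot(countFile, aes(x = Time, y = RiseOrFall)) +
  geom_step()
p
```

For a very large file the geom_step() route may be preferable, since it avoids creating one row per Time value.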
I am quite new to R. I have a number of coordinates and I want to plot them properly in R, with labels. Moreover, the axes should show the lat and long. I have tried ggplot but I cannot fit the data to the code.
id lon lat
1 2 7.173500 45.86880
2 3 7.172540 45.86887
3 4 7.171636 45.86924
4 5 7.180180 45.87158
5 6 7.178070 45.87014
6 7 7.177229 45.86923
7 8 7.175240 45.86808
8 9 7.181409 45.87177
9 10 7.179299 45.87020
10 11 7.178359 45.87070
11 12 7.175189 45.86974
12 13 7.179379 45.87081
13 14 7.175509 45.86932
14 15 7.176839 45.86939
15 17 7.180990 45.87262
16 18 7.180150 45.87248
17 19 7.181220 45.87355
18 20 7.174910 45.86922
19 25 7.154970 45.87058
20 28 7.153399 45.86954
21 29 7.152649 45.86992
22 31 7.154419 45.87004
23 32 7.156099 45.86983
To do this use the geom_text geometry:
ggplot(aes(x = lon, y = lat), data = df) + geom_text(aes(label = id))
This plots the text in the id column at the locations specified by the columns lon and lat. The data is stored in the data.frame df.
or use:
ggplot(aes(x = lon, y = lat), data = df) + geom_text(aes(label = id)) +
geom_point()
if you want to add both a point and a label. Use the hjust and vjust parameters of geom_text to change the orientation of the label relative to the point. In addition, give each point a color according to the column var by using the color parameter in the geom_point aesthetics:
ggplot(aes(x = lon, y = lat), data = df) + geom_text(aes(label = id)) +
geom_point(aes(color = var))
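A small sketch of the hjust and vjust parameters mentioned above, using the first three rows of the question's data. The particular values (hjust = 0, vjust = -0.5) are arbitrary choices for illustration: they left-align each label at its point and lift it slightly above. The data frame name pts and the axis titles are assumptions.

```r
library(ggplot2)

# first three rows of the coordinates from the question
pts <- data.frame(id  = c(2, 3, 4),
                  lon = c(7.173500, 7.172540, 7.171636),
                  lat = c(45.86880, 45.86887, 45.86924))

p <- ggplot(pts, aes(x = lon, y = lat)) +
  geom_point() +
  # hjust = 0: label starts at the point; vjust = -0.5: label sits above it
  geom_text(aes(label = id), hjust = 0, vjust = -0.5) +
  labs(x = "Longitude", y = "Latitude")
p
```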
Do note that ggplot2 cannot deal with the Spatial classes provided by the sp package. Use as.data.frame to convert point (SpatialPoints) and grid (SpatialPixels/SpatialGrid) datasets to data.frames. In addition, use fortify to convert polygon datasets (SpatialPolygons) to a data.frame.