geom_line : How to connect only a few points - r

I have this dataframe and this plot :
df <- data.frame(Groupe = rep(c("A","B"),4),
Period = gl(4,2,8,c("t0","t1","t2","t3","t4")),
rate = c(0.83,0.96,0.75,0.93,0.67,0.82,0.65,0.73))
ggplot(data = df, mapping = aes(y = rate, x = Period ,group = Groupe, colour=Groupe, shape=Groupe)) +
geom_line(size=1.2) +
geom_point(size=5)
How could i organize my data so that the points between t1 and t2 are not connected with a line ? I'd like t0 and t1 to be connected (blue or red according to the group), t2 and t3 connected in the same way, but no lines between t1 and t2. I tried several things by looking at similar questions, but it always mess up my grouping colors :/

Creating a new grouping variable manually is mostly not the best way. So, a slightly different approach which requires less hardcoding:
# create new grouping variable
df$grp <- c(1,2)[df$Period %in% c("t2","t3","t4") + 1L]
# create the plot and use the interaction between 'Group' and 'grp' as group
ggplot(df, aes(x = Period, y = rate,
group = interaction(Groupe,grp),
colour = Groupe,
shape = Groupe)) +
geom_line(size=1.2) +
geom_point(size=5)
this gives the same plot as in the other answer:

The best way to handle a problem like this in ggplot is often to create an additional column in your data frame that indicates the grouping you want to work with in your data. For example, here I've added an extra column gp to your data frame:
df$gp <- c(1,2,1,2,3,4,3,4)
ggplot(data = df, aes(y = rate, x = Period, group = gp, colour=Groupe, shape=Groupe)) +
geom_line(size=1.2) +
geom_point(size=5)
The result is, I believe, what you are looking for:
If you make Period a numerical column rather than a character vector or factor, you can more easily generate a column like gp automatically rather than manually specifying it (perhaps using ifelse or cases to create it) - this would be useful if you wanted to do the same thing many times or with a large data frame.

Related

multiple groups-boxplot-R

I have a CSV file with 3 levels of the parameter(C, C+Fe, Fe). Now I want to group the boxplot based on parameters by using geom-boxplot(fill=parameter) but only for two levels of them, not all 3.
The current code is: geom_boxplot(aes(x = gene, y=RA, fill = parameter) which yealds to:
However, I want to eliminate the blue box plot which is one of the parameters.
You may just filter the observation before plotting. Assuming your data frame is called df:
df2 <- subset(df, parameter != "Fe")
ggplot(df2, aes(x = gene, y=RA, fill=parameter)) +
geom_boxplot()

ggplot2 does not plot multiple groups of a variable, only plots one line

I would like to make a plot with multiple lines corresponding to different groups of variable "Prob" (0.1, 0.5 and 0.9) using ggplot. Although that, when I run the code, it only plots one line instead of 3. Thanks for the help :)
Here my code:
Prob <- c(0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.9,0.9,0.9,0.9,0.9,0.9,0.9,0.9,0.9,0.9,0.9,0.9,0.9,0.9,0.9,0.9,0.9,0.9,0.9,0.9,0.9,0.9,0.9,0.9,0.9,0.9,0.9,0.9,0.9,0.9,0.9,0.9,0.9,0.9,0.9,0.9,0.9)
nit <- c(0.9,0.902777775,0.90555555,0.908333325,0.9111111,0.913888875,0.91666665,0.919444425,0.9222222,0.924999975,0.92777775,0.930555525,0.9333333,0.936111075,0.93888885,0.941666625,0.9444444,0.947222175,0.94999995,0.952777725,0.9555555,0.958333275,0.96111105,0.963888825,0.9666666,0.969444375,0.97222215,0.974999925,0.9777777,0.980555475,0.98333325,0.986111025,0.9888888,0.991666575,0.99444435,0.997222125,0.9999999,0.9,0.902777775,0.90555555,0.908333325,0.9111111,0.913888875,0.91666665,0.919444425,0.9222222,0.924999975,0.92777775,0.930555525,0.9333333,0.936111075,0.93888885,0.941666625,0.9444444,0.947222175,0.94999995,0.952777725,0.9555555,0.958333275,0.96111105,0.963888825,0.9666666,0.969444375,0.97222215,0.974999925,0.9777777,0.980555475,0.98333325,0.986111025,0.9888888,0.991666575,0.99444435,0.997222125,0.9999999,0.9,0.902777775,0.90555555,0.908333325,0.9111111,0.913888875,0.91666665,0.919444425,0.9222222,0.924999975,0.92777775,0.930555525,0.9333333,0.936111075,0.93888885,0.941666625,0.9444444,0.947222175,0.94999995,0.952777725,0.9555555,0.958333275,0.96111105,0.963888825,0.9666666,0.969444375,0.97222215,0.974999925,0.9777777,0.980555475,0.98333325,0.986111025,0.9888888,0.991666575,0.99444435,0.997222125,0.9999999)
greek <- log((1-Prob)/Prob)/-10
italian <- ((0.997-nit)/(0.997-0.97))^3
Temp<-c(rep(25,111))
GT <- ((30-Temp)/(30-3.3))^3
GH <- 1-GT-italian
acid <- (-1*(((sign(GH)*(abs(GH)^(1/3)))*(7-5))-7))
Species<-c(rep("Case",111))
data <- as.data.frame(cbind(Prob,greek,GT,GH,italian, Temp,acid,nit, Species))
ggplot() +
geom_line(data = data, aes_string(x = acid, y = nit, group = Prob, color = factor(Prob)), size = 0.8)
The answer seems to be kind of two parts:
In your data frame data, the columns that should be numeric are not numeric.
The reason why you only see one line.
Fixing the Data Frame and Using aes() in place of aes_string()
I noticed something was odd when you had as.data.frame(cbind(... to make your data frame and are using aes_string(.. within the ggplot portion. If you do a quick check on data via str(data), you'll see all of your columns in data are characters, whereas in the environment the data prepared in the code for their respective columns are numeric. Ex. acid is numeric, yet data$acid is a character.
The reason for this is that you're binding the columns into a data frame by using as.data.frame(cbind(.... This results in all data being coerced into a character, so you loose the numeric nature of the data. This is also why you have to use aes_string(...) to make it work instead of aes(). To bind vectors together into a data frame, use data.frame(..., not as.data.frame(cbind(....
To fix all this, bind your columns together like this + the ggplot code:
data <- data.frame(Prob,greek,GT,GH,italian, Temp,acid,nit, Species)
# data <- as.data.frame(cbind(Prob,greek,GT,GH,italian, Temp,acid,nit, Species))
ggplot() +
geom_line(data=data, aes(x = acid, y = nit, group = Prob, color = factor(Prob)), size = 0.8)
Why is there only one line?
The simple answer to why you only see one line is that the line for each of the values of data$Prob is equal. What you see is the effect of overplotting. It means that the line for data$Prob == 0.1 is the same line when data$Prob == 0.5 and data$Prob = 0.9.
To demonstrate this, let's separate each. I'm going to do this realizing that Prob could be created by repeating 0.1, 0.5, and 0.9 each 37 times in a row. I'll create a factor that I'll use as multiplication factor for data$nit that will result in separating our our lines:
my_factor <- rep(c(1,1.1,1.5), each=37) # our multiplication fractor
data$nit <- data$nit * my_factor # new nit column
# same plot code
ggplot() +
geom_line(data=data, aes(x = acid, y = nit, group = Prob, color = factor(Prob)), size = 0.8)
There ya go. We have all lines there, you just could not see them due to overplotting. You can convince yourself of this without the multiplication business and the original data by comparing the plots for each data$Prob:
# use original dataset as above
ggplot() +
geom_line(data=data, aes(x = acid, y = nit, group = Prob, color = factor(Prob)), size = 0.8) +
facet_wrap(~Prob)

How to diplay the boxplot in order with date x - axis?

How can I make this in order of month, x axis is not in date class its in character? I tried using reorder and sort it doesn't work for my case.
Two approaches.
Fake data:
set.seed(42) # R-4.0.2
dat <- data.frame(
when = sample(c("Apr20", "Feb20", "Mar20"), size = 500, replace = TRUE),
charge = 10000 * rexp(500)
)
ggplot(dat, aes(charge, when)) +
geom_boxplot() +
coord_flip()
Date class
This is what I'll call "The Right Way (tm)", for two reasons: if the data is date-like, them let's use Date; and allow R to handle the ordering naturally.
dat$when2 <- as.Date(paste0("01", dat$when), "%d%b%y")
ggplot(dat, aes(charge, when2, group = when)) +
geom_boxplot() +
coord_flip() +
scale_y_date(labels = function(z) format(z, format = "%b%y"))
(I should note that I need both when2 and group=when: since when2 is a continuous variable, ggplot2 is not going to auto-group things based on it, so we need group=.)
factor
I think this is the wrong approach, for two reasons: (1) not using dates as the numeric data they are; and (2) the more months you have, the more you have to manually control the levels within the factors.
However, having said that:
dat$when3 <- factor(dat$when, levels = c("Feb20", "Mar20", "Apr20"))
ggplot(dat, aes(charge, when3)) +
geom_boxplot() +
coord_flip()
(You could easily overwrite dat$when instead of creating a new variable dat$when3, but I kept it separate because I went back and forth during code-testing here. Frankly, if you prefer to not go the Date route, then doing this allows other things to be ordered correctly, too.)

Manually added legend not working in ggplot2?

Here's facsimile of my data:
d1 <- data.frame(
e=rnorm(3000,10,10)
)
d2 <- data.frame(
e=rnorm(2000,30,30)
)
So, I got around the problem of plotting two different density distributions from two very different datasets on the same graph by doing this:
ggplot() +
geom_density(aes(x=e),fill="red",data=d1) +
geom_density(aes(x=e),fill="blue",data=d2)
But when I try to manually add a legend, like so:
ggplot() +
geom_density(aes(x=e),fill="red",data=d1) +
geom_density(aes(x=e),fill="blue",data=d2) +
scale_fill_manual(name="Data", values = c("XXXXX" = "red","YYYYY" = "blue"))
Nothing happens. Does anybody know what's going wrong? I thought I could actually manually add legends if need be.
Generally ggplot works best when your data is in a single data.frame and in long format. In your case we therefore want to combine the data from both data.frames. For this simple example, we just concatenate the data into a long variable called d and use an additional column id to indicate to which dataset that value belongs.
d.f <- data.frame(id = rep(c("XXXXX", "YYYYY"), c(3000, 2000)),
d = c(d1$e, d2$e))
More complex data manipulations can be done using packages such as reshape2 and tidyr. I find this cheat sheet often useful. Then when we plot we map fill to id, and ggplot will take of the legend automatically.
ggplot(d.f, aes(x = d, fill = id)) +
geom_density()

How to create histogram in R with CSV time data?

I have CSV data of a log for 24 hours that looks like this:
svr01,07:17:14,'u1#user.de','8.3.1.35'
svr03,07:17:21,'u2#sr.de','82.15.1.35'
svr02,07:17:30,'u3#fr.de','2.15.1.35'
svr04,07:17:40,'u2#for.de','2.1.1.35'
I read the data with tbl <- read.csv("logs.csv")
How can I plot this data in a histogram to see the number of hits per hour?
Ideally, I would get 4 bars representing hits per hour per srv01, srv02, srv03, srv04.
Thank you for helping me here!
I don't know if I understood you right, so I will split my answer in two parts. The first part is how to convert your time into a vector you can use for plotting.
a) Converting your data into hours:
#df being the dataframe
df$timestamp <- strptime(df$timestamp, format="%H:%M:%S")
df$hours <- as.numeric(format(df$timestamp, format="%H"))
hist(df$hours)
This gives you a histogram of hits over all servers. If you want to split the histograms this is one way but of course there are numerous others:
b) Making a histogram with ggplot2
#install.packages("ggplot2")
require(ggplot2)
ggplot(data=df) + geom_histogram(aes(x=hours), bin=1) + facet_wrap(~ server)
# or use a color instead
ggplot(data=df) + geom_histogram(aes(x=hours, fill=server), bin=1)
c) You could also use another package:
require(plotrix)
l <- split(df$hours, f=df$server)
multhist(l)
The examples are given below. The third makes comparison easier but ggplot2 simply looks better I think.
EDIT
Here is how thes solutions would look like
first solution:
second solution:
third solution:
An example dataset:
dat = data.frame(server = paste("svr", round(runif(1000, 1, 10)), sep = ""),
time = Sys.time() + sort(round(runif(1000, 1, 36000))))
The trick I use is to create a new variable which only specifies in which hour the hit was recorded:
dat$hr = strftime(dat$time, "%H")
Now we can use some plyr magick:
hits_hour = count(dat, vars = c("server","hr"))
And create the plot:
ggplot(data = hits_hour) + geom_bar(aes(x = hr, y = freq, fill = server), stat="identity", position = "dodge")
Which looks like:
I don't really like this plot, I'd be more in favor of:
ggplot(data = hits_hour) + geom_line(aes(x = as.numeric(hr), y = freq)) + facet_wrap(~ server, nrow = 1)
Which looks like:
Putting all the facets in one row allows easy comparison of the number of hits between the servers. This will look even better when using real data instead of my random data.

Resources