RStudio: Plotting lines for a large dataset with repeating columns

I have a huge dataset from which I need to create a single plot showing all the temperature data over time. The data frame consists of many repeats of the following structure:
Name, Date, Temp, Name__1, Date__1, Temp__1,...
So each line should be drawn from one block of 3 columns, the next line from the next 3 columns, and so on. I don't know whether some kind of loop could do this. Here is what I've been doing so far:
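library(ggplot2)
# name_of_mydataset is the data frame object (not a quoted string); one geom_line() per block of 3 columns
ggplot(data = name_of_mydataset) +
  geom_line(mapping = aes(Date, Temp, col = Name)) +
  geom_line(mapping = aes(Date__1, Temp__1, col = Name__1))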
If I have to repeat this for every single temperature logger, the code will be endless. I know there must be a more elegant way to do this, but I just can't figure it out. Can someone please help? Thank you so much!
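Instead of one geom_line() per logger, a common approach is to reshape the data so that every logger shares the same three columns. A single geom_line() with colour = Name then draws one line per logger, with no loop over layers. Below is a minimal base-R sketch. It assumes every logger occupies exactly three consecutive columns in the order Name, Date, Temp. The toy data frame `wide` and its values are made up for illustration.

```r
library(ggplot2)

# Hypothetical toy data mimicking the layout: Name, Date, Temp, Name__1, Date__1, Temp__1, ...
wide <- data.frame(
  Name    = "Logger A", Date    = as.Date("2021-01-01") + 0:2, Temp    = c(4.1, 5.0, 3.8),
  Name__1 = "Logger B", Date__1 = as.Date("2021-01-01") + 0:2, Temp__1 = c(6.2, 5.9, 6.5)
)

# Split the column positions into consecutive blocks of three (assumed: one block per logger)
blocks <- split(seq_along(wide), ceiling(seq_along(wide) / 3))

# Give every block the same column names and stack the blocks into one long data frame
long <- do.call(rbind, lapply(blocks, function(idx) {
  block <- wide[, idx]
  names(block) <- c("Name", "Date", "Temp")
  block
}))

# A single geom_line() now draws a separate line per logger
ggplot(long, aes(x = Date, y = Temp, colour = Name)) +
  geom_line()
```

Loggers may have different record lengths. In that case the shorter blocks are likely padded with NA in the wide file, and you can drop them with `long <- long[!is.na(long$Temp), ]` before plotting. It is also worth checking that the Date columns were read as dates rather than text.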

Related

How do I create a grouped boxplot in R?

I have a data frame whose variables are 5 probes: cg02823866, cg13474877, cg14305799, cg15837913 and cg19724470. I want to create a boxplot that groups cg02823866 and cg14305799 into a group called 'GeneBody', and cg13474877, cg14305799 and cg19724470 into a group called 'Promoter'. I then want to colour-code the boxplots by probe name. I can't figure out how to assign those variables to groups to plot the graph.
I have already created an ungrouped boxplot of the five probes.
I want the labels 'Promoter' and 'GeneBody' on the x axis. Above the 'GeneBody' label there should be the 2 boxplots for the cg02823866 and cg14305799 probes. Then a 'Promoter' label with the boxplots for cg13474877, cg14305799 and cg19724470. I then want each boxplot colour-coded to represent its probe.
My data frame that I imported into RStudio looks like this: https://i.stack.imgur.com/r4gEC.png
Assuming your data has columns named Beta (your y axis), Probe (your current x axis) and group (either "GeneBody" or "Promoter"), you can do something like the following:
library(ggplot2)
ggplot(data, aes(x = group, y = Beta, fill = Probe)) +
geom_boxplot()
If you provide a reproducible set of data, I can probably do better.
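Suppose each probe is a separate column, as in the data frame shown in the question. One way to build the Beta/Probe/group columns that the answer assumes is base R's stack() plus a lookup vector. The sketch below uses randomly generated stand-in data. The probe-to-group assignment is an assumption: the question lists cg14305799 under both groups, so this sketch places cg15837913 under Promoter instead. Adjust it to your own design.

```r
library(ggplot2)

# Hypothetical wide data: one column of beta values per probe (random values for illustration)
set.seed(1)
probes <- c("cg02823866", "cg13474877", "cg14305799", "cg15837913", "cg19724470")
wide <- as.data.frame(matrix(runif(50), ncol = 5, dimnames = list(NULL, probes)))

# Reshape to long format; stack() returns columns named "values" and "ind"
data <- stack(wide)
names(data) <- c("Beta", "Probe")

# Assumed probe-to-group assignment (adjust to your own design)
group_of <- c(cg02823866 = "GeneBody", cg14305799 = "GeneBody",
              cg13474877 = "Promoter", cg15837913 = "Promoter", cg19724470 = "Promoter")
data$group <- unname(group_of[as.character(data$Probe)])

# Groups on the x axis, one coloured box per probe within each group
ggplot(data, aes(x = group, y = Beta, fill = Probe)) +
  geom_boxplot()
```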
Adding to Ben's answer, here is the classic iris example, which you can load with data(iris):
library(ggplot2)
data(iris)
# The group aesthetic splits the boxes by Species
ggplot(iris) +
  aes(x = "", y = Sepal.Length, group = Species) +
  geom_boxplot(shape = "circle", fill = "#112446") +
  theme_minimal()
So all you need is a column that indicates which group each observation belongs to.
Of course, it gets more difficult with uncleaned data, where you might first need to reshape (e.g. transpose) the data. But those are follow-up questions, I guess.
Also, if you want to make your life easier, try the esquisse RStudio add-in.

geom_line not outputting connected points

I have the attached data frame. I want to create a line graph with ggplot that plots Total against Year, with a separate line for each offence category. I used the following code, but the output has no connected lines; it looks more like a graph of vertical lines. Any help is much appreciated :)
The code I have tried is:
library(ggplot2)
# Column names containing a space must be wrapped in backticks
ggplot(data = annual, aes(x = as.numeric(Year), y = Total, group = `Offence Category`)) +
  geom_line()
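One possible cause of vertical segments is that the data contain several rows for the same Year within a category. geom_line() then joins points stacked at the same x value. The minimal sketch below assumes that is the situation; the `annual` data is invented to mimic it. It sums Total per Year and category first, and maps the category to colour as well as group so that each line gets its own colour.

```r
library(ggplot2)

# Invented data in the assumed shape: several rows per Year and category
annual <- data.frame(
  Year = rep(c("2018", "2019", "2020"), each = 4),
  `Offence Category` = rep(c("Theft", "Assault"), times = 6),
  Total = c(10, 5, 12, 7, 11, 6, 9, 4, 14, 8, 13, 9),
  check.names = FALSE  # keep the space in "Offence Category"
)

# Sum Total per Year and category, giving one point per year on each line
per_year <- aggregate(annual["Total"],
                      by = annual[c("Year", "Offence Category")],
                      FUN = sum)
per_year$Year <- as.numeric(per_year$Year)

# Map the category to both group and colour so each category is its own line
ggplot(per_year, aes(x = Year, y = Total,
                     group = `Offence Category`, colour = `Offence Category`)) +
  geom_line()
```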

How to plot top observations in R

This is probably pretty straightforward, but I'm new to R and haven't been able to find a question quite like this one. I want to plot the top ten observations in my data set and have tried slice_max(), but I end up plotting the whole data set. Below is what I have so far. Any help would be much appreciated!
Here's my script; it plots the whole data set instead of the top ten:
Non_DFW_Orig_Counties %>%
slice_max(Non_DFW_Orig_Counties$tax_returns_by_county, n = 10) %>%
ggplot(data = Non_DFW_Orig_Counties, mapping = aes(x = Orig_County, y = tax_returns_by_county, fill = Dest_County)) +
geom_col()
Thanks to teunbrand for pointing out Stack Overflow etiquette. Is there a page that collects all of it?
Because the data set is already sorted in descending order, you can take the first ten rows:
library(dplyr)    # provides %>%
library(ggplot2)
Non_DFW_Orig_Counties[1:10, ] %>%
  ggplot(mapping = aes(x = Orig_County, y = tax_returns_by_county, fill = Dest_County)) +
  geom_col()
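This selects the first ten rows and all columns of the data set. You also don't need the data argument in ggplot() because the pipe already supplies the data. In your code, data = Non_DFW_Orig_Counties replaces the piped-in result, which is why the whole data set was plotted.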
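An alternative that does not rely on the rows already being sorted is to give slice_max() the bare column name and let the pipe supply the data to ggplot(). Here is a minimal sketch with an invented stand-in for Non_DFW_Orig_Counties; only the column names come from the question.

```r
library(dplyr)
library(ggplot2)

# Hypothetical stand-in data (column names from the question, values invented)
counties <- data.frame(
  Orig_County = paste("County", LETTERS[1:15]),
  Dest_County = rep(c("Dest A", "Dest B", "Dest C"), times = 5),
  tax_returns_by_county = c(120, 340, 55, 980, 410, 76, 230, 615, 88, 502, 301, 47, 720, 199, 860)
)

# Bare column name inside slice_max(), not counties$tax_returns_by_county
top10 <- counties %>%
  slice_max(tax_returns_by_county, n = 10)

# No data = argument, so ggplot() receives the piped top-10 rows
top10 %>%
  ggplot(aes(x = Orig_County, y = tax_returns_by_county, fill = Dest_County)) +
  geom_col()
```

By default slice_max() keeps ties (with_ties = TRUE), so it can return more than 10 rows if values are tied at the cut-off.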

Plotting graph from Text file using R

I am using an NS3-based simulator called NDNsim. It can generate trace files for analyzing performance, but I need to visualize the data they contain.
I am a complete novice with R and would like a way to visualize this data. Here is what the output I would like to plot looks like. Any help is appreciated.
It's pretty difficult to know what you're looking for, since you have almost 50,000 measurements across 9 variables. Here's one way of getting a lot of that information on the screen:
df <- read.table(paste0("https://gist.githubusercontent.com/wuodland/",
"9b2c76650ea37459f869c59d5f5f76ea/raw/",
"6131919c105c95f8ba6967457663b9c37779756a/rate.txt"),
header = TRUE)
library(ggplot2)
ggplot(df, aes(x = Time, y = Kilobytes, color = Type)) +
geom_line() +
facet_wrap(~FaceDescr)
You could build smaller subsets of your input data and plot those by node, instead of trying to get everything into a single plot call.
library(ggplot2)
df <- read.table(paste0("https://gist.githubusercontent.com/wuodland/",
                        "9b2c76650ea37459f869c59d5f5f76ea/raw/",
                        "6131919c105c95f8ba6967457663b9c37779756a/rate.txt"),
                 header = TRUE)
# Keep only the "InData" rows (the column list keeps all nine columns)
smaller_df <- df[which(df$Type == 'InData'), names(df) %in% c("Time", "Node",
                 "FaceId", "FaceDescr", "Type", "Packets", "Kilobytes",
                 "PacketRaw", "KilobyteRaw")]
# `+` must end the line; a line starting with `+` is parsed as a separate expression
ggplot(smaller_df, aes(x = Time, y = Kilobytes, color = Type)) +
  geom_line() +
  facet_wrap(~ Node)
The snippet above builds a smaller data frame from your original text data, keeping only the "InData" Type, and then plots it with one panel per node.

Reordering data based on a column in [r] to order x-value items from lowest to highest y-values in ggplot

I have a dataframe that I want to reorder to make a ggplot so I can easily see which items have the highest and lowest values in them. In my case, I've grouped the data into two groups, and it'd be nice to have a visual representation of which group tends to score higher. Based on this question I came up with:
library(ggplot2)
cor.data<- read.csv("https://dl.dropbox.com/s/p4uy6uf1vhe8yzs/cor.data.csv?dl=0",stringsAsFactors = F)
cor.data.sorted = cor.data[with(cor.data,order(r.val,pic)),] #<-- line that doesn't seem to be working
ggplot(cor.data.sorted,aes(x=pic,y=r.val,size=df.val,color=exp)) + geom_point()
This produces a plot in which the points are not ordered by r.val.
I've tried quite a few variants to reorder the data, and I feel like this should be pretty simple to achieve. To clarify, if I had successfully reorganised the data, the y-values would go up as the plot moves along the x axis. So maybe I'm focusing on the wrong part of the code to achieve this in a ggplot figure?
You could do something like this?
library(tidyverse)
cor.data %>%
mutate(pic = factor(pic, levels = as.character(pic)[order(r.val)])) %>%
ggplot(aes(x = pic, y = r.val, size = df.val, color = exp)) + geom_point()
This obviously still needs some polishing to deal with the x axis label clutter etc.
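Rather than sorting the data before creating the plot, I reorder it inside the plot call. ggplot2 places a character x variable in alphabetical (factor-level) order regardless of row order, so sorting the rows cannot change the axis; reorder() instead sets the factor levels by r.val: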
library(ggplot2)
cor.data <- read.csv("https://dl.dropbox.com/s/p4uy6uf1vhe8yzs/cor.data.csv?dl=0", stringsAsFactors = FALSE)
# Sorting the rows only controls the order in which points are drawn (slightly more readable overlaps)
cor.data.sorted <- cor.data[with(cor.data, order(r.val, pic)), ]
# reorder() sets the x-axis order: levels of pic sorted by r.val
ggplot(cor.data.sorted, aes(x = reorder(pic, r.val), y = r.val, size = df.val, color = exp)) + geom_point()
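Why does reorder() work where sorting rows does not? It converts pic into a factor whose level order follows r.val, and ggplot2 uses the level order for a discrete axis. Here is a small sketch with invented toy data that uses the question's column names.

```r
library(ggplot2)

# Invented toy data with the column names used in the question
toy <- data.frame(pic = c("b", "a", "c"), r.val = c(0.5, 0.9, 0.1),
                  df.val = c(10, 20, 30), exp = c("x", "y", "x"))

# reorder() makes pic a factor whose levels are sorted by (mean) r.val, ascending
toy$pic <- reorder(toy$pic, toy$r.val)
levels(toy$pic)   # "c" "b" "a"

# The x axis now follows the level order, so y increases from left to right
ggplot(toy, aes(x = pic, y = r.val, size = df.val, colour = exp)) +
  geom_point()
```

For a descending axis, use reorder(pic, -r.val). reorder() uses FUN = mean by default, so a pic value that appears several times is placed by its mean r.val.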
