Plotting a graph from a text file using R

I am using an ns-3-based simulator called ndnSIM. I can generate trace files that can be used to analyze performance, etc. However, I need to visualize the generated data.
I am a complete novice with R and would like a way to visualize this data. Here is what the output I would like to plot looks like. Any help is appreciated.

It's pretty difficult to know what you're looking for, since you have almost 50,000 measurements across 9 variables. Here's one way of getting a lot of that information on the screen:
library(ggplot2)
df <- read.table(paste0("https://gist.githubusercontent.com/wuodland/",
                        "9b2c76650ea37459f869c59d5f5f76ea/raw/",
                        "6131919c105c95f8ba6967457663b9c37779756a/rate.txt"),
                 header = TRUE)
ggplot(df, aes(x = Time, y = Kilobytes, color = Type)) +
  geom_line() +
  facet_wrap(~FaceDescr)
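If you want to try the plotting code without downloading the gist, read.table() can also parse text pasted directly into the script through its text argument, which makes it easy to check that columns such as Time and Kilobytes are read as numbers. The rows below are hypothetical: they follow the same column layout as the trace (Time, Node, FaceId, FaceDescr, Type, Packets, Kilobytes, PacketRaw, KilobyteRaw), but the values, node names and face descriptions are made up for illustration.

```r
# Hypothetical rows in the same column layout as the rate trace.
# All values below are made up; they are not taken from the real file.
trace_text <- "
Time Node FaceId FaceDescr Type Packets Kilobytes PacketRaw KilobyteRaw
0.5 node1 0 internal:// InData 20 22.0 10 11.0
0.5 node1 0 internal:// OutInterests 20 1.0 10 0.5
0.5 node2 1 netdev:// InData 18 19.8 9 9.9
0.5 node2 1 netdev:// OutInterests 18 0.9 9 0.45
1.0 node1 0 internal:// InData 24 26.4 12 13.2
1.0 node1 0 internal:// OutInterests 24 1.2 12 0.6
1.0 node2 1 netdev:// InData 16 17.6 8 8.8
1.0 node2 1 netdev:// OutInterests 16 0.8 8 0.4
"
# header = TRUE uses the first non-blank line as column names;
# blank lines are skipped by default.
df <- read.table(text = trace_text, header = TRUE)
str(df)  # check that Time and Kilobytes were read as numeric columns

# Only draw the plot if ggplot2 is installed.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(df, aes(x = Time, y = Kilobytes, color = Type)) +
    geom_line() +
    facet_wrap(~FaceDescr)
  print(p)
}
```

With the real file, replace the text = trace_text argument by the gist URL (or a local file path) as in the answer above.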

You could build subsets of your input data and plot them by node, rather than trying to get a single plotting call to show everything at once.
library(ggplot2)
df <- read.table(paste0("https://gist.githubusercontent.com/wuodland/",
                        "9b2c76650ea37459f869c59d5f5f76ea/raw/",
                        "6131919c105c95f8ba6967457663b9c37779756a/rate.txt"),
                 header = TRUE)
# Keep only the "InData" rows and the listed columns
smaller_df <- df[which(df$Type == "InData"),
                 names(df) %in% c("Time", "Node", "FaceId", "FaceDescr", "Type",
                                  "Packets", "Kilobytes", "PacketRaw", "KilobyteRaw")]
# Each "+" must end a line; a line starting with "+" ends the ggplot() call early
ggplot(smaller_df, aes(x = Time, y = Kilobytes, color = Type)) +
  geom_line() +
  facet_wrap(~ Node)
The snippet above builds a smaller data frame from your original text data containing only the rows whose Type is "InData", and then plots it with one panel per node.
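A node can have several faces, and geom_line() connects all points of the same colour in a panel, so faceting the InData rows by node can mix several faces into one jagged line. If you want a single curve per node, one option is to sum Kilobytes over the faces first with base R's aggregate(). This is a sketch on a few hypothetical rows (column names from the trace, values made up); summing is an assumption about what you want, and mean or max would work the same way via FUN.

```r
# Hypothetical rows using the trace's column names (values made up).
df <- data.frame(
  Time      = c(0.5, 0.5, 0.5, 1.0, 1.0, 1.0, 1.0),
  Node      = c("node1", "node1", "node2", "node1", "node1", "node2", "node2"),
  FaceId    = c(0, 1, 0, 0, 1, 0, 0),
  Type      = c(rep("InData", 6), "OutInterests"),
  Kilobytes = c(10, 5, 8, 12, 6, 7, 1)
)

# Keep only the InData rows, then add up Kilobytes across all faces
# of each node at each time step: one row per (Time, Node) pair.
in_data  <- subset(df, Type == "InData")
per_node <- aggregate(Kilobytes ~ Time + Node, data = in_data, FUN = sum)
print(per_node)

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(per_node, aes(x = Time, y = Kilobytes)) +
    geom_line() +
    facet_wrap(~ Node)
  print(p)
}
```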

Related

Creating an alluvial plot in R to demonstrate web traffic flow

I have a dataset that reads like a log file showing each user interaction with a website. I'm trying to visualize this data to show the most common sequences/pathways through the site (no, I do not have access to Google Analytics - just a data dump.) I've been able to distill the data down to a format that contains the page and the number of times it is the first, second, third page visited, etc.
I thought I might create an alluvial plot (using ggalluvial) stratified by the sequential position. I've roughed together a version of what I'm going for:
Here is a way to generate some sample data that is structured like mine:
pages <- rep(c("Home", "About", "People", "Contact", "Products"), each=6)
positions <- sample(c(1,2,3,4,5))
counts <- sample(1:100, 30)
df_colnames <- c("Page", "Position", "Count")
df <- data.frame(pages, positions, counts)
colnames(df) <- df_colnames
But I cannot seem to get ggalluvial to accept a single column as repeated strata, if that makes sense. Here's what I've got, but it's not much to go on:
library(ggalluvial)
ggplot(df,
aes(axis1 = Page,
axis2 = Position,
y = Count)) +
geom_alluvium() +
geom_stratum() +
geom_text(stat = "stratum",
label.strata = TRUE) +
theme_minimal()
This is just something that I have been trying. If you know of a better way to visualize this information, I'm all ears.
Thank you in advance.
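Before choosing a plot, it can help to look at the sample data as a Page by Position table, which is what stacked strata at each position would show. A sketch with base R's xtabs(), reusing the sample-data code from the question with a fixed seed added (the seed is an assumption, only for reproducibility). Two things stand out: each page gets six rows but there are only five positions, so one position per page repeats and xtabs() sums those counts; and the table records how often each page appears at each position, but not which visits follow which, which is the information an alluvial plot needs to draw flows between positions.

```r
set.seed(42)  # hypothetical seed, only so the random sample data is reproducible
pages <- rep(c("Home", "About", "People", "Contact", "Products"), each = 6)
positions <- sample(c(1, 2, 3, 4, 5))  # length 5, recycled across the 30 rows
counts <- sample(1:100, 30)
df <- data.frame(Page = pages, Position = positions, Count = counts)

# Rows = pages, columns = positions, cells = summed counts.
# Repeated (Page, Position) pairs are added together.
tab <- xtabs(Count ~ Page + Position, data = df)
print(tab)
```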

How to plot top observations in R

This is probably pretty straightforward, but I am new to R and have not been able to find a question quite like this one. I want to plot the top ten observations in my data set and have tried slice_max(), but I end up plotting the whole data set. Please see below for what I have so far. Any help would be much appreciated!
Summary of data set that I am trying to plot
Here's the script I use to plot the data set above; it plots the whole data set instead of the top ten.
Non_DFW_Orig_Counties %>%
slice_max(Non_DFW_Orig_Counties$tax_returns_by_county, n = 10) %>%
ggplot(data = Non_DFW_Orig_Counties, mapping = aes(x = Orig_County, y = tax_returns_by_county, fill = Dest_County)) +
geom_col()
Because the data set is already sorted, you can simply take the first ten rows:
Non_DFW_Orig_Counties[1:10,] %>%
ggplot(mapping = aes(x = Orig_County, y = tax_returns_by_county, fill = Dest_County)) +
geom_col()
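This selects the first ten rows of the data set and all columns. You also do not need the data argument in ggplot(), because the pipe already supplies the data. In fact, that argument is why your original attempt plotted everything: data = Non_DFW_Orig_Counties replaces the sliced data coming through the pipe with the full data set.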
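If the data might not already be sorted, you can sort before taking the first ten rows, so the result does not depend on row order. A base R sketch on hypothetical data (the column names come from the question; the county names, destinations and counts are made up). The dplyr equivalent is slice_max(tax_returns_by_county, n = 10) with the bare column name; note that slice_max() keeps ties by default, so it can return more than ten rows.

```r
# Hypothetical data with the question's column names (values made up).
set.seed(1)
Non_DFW_Orig_Counties <- data.frame(
  Orig_County = paste("County", 1:25),
  Dest_County = sample(c("Dest A", "Dest B"), 25, replace = TRUE),
  tax_returns_by_county = sample(100:5000, 25)  # no repeated values
)

# Row indices ordered from the largest to the smallest count,
# then keep the first 10 rows in that order.
ord   <- order(Non_DFW_Orig_Counties$tax_returns_by_county, decreasing = TRUE)
top10 <- head(Non_DFW_Orig_Counties[ord, ], 10)

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  # reorder() sorts the bars by count instead of alphabetically
  p <- ggplot(top10, aes(x = reorder(Orig_County, -tax_returns_by_county),
                         y = tax_returns_by_county, fill = Dest_County)) +
    geom_col() +
    xlab("Orig_County")
  print(p)
}
```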

Overlaying trials in separate files onto one ggplot graph

I am trying to plot one graph with multiple trials (from separate text files). In the below case, I am plotting the "place" variable with the "firing rate" variable, and it works when I use ggplot on its own:
a <- read.table("trial1.txt", header = T)
library(ggplot2)
ggplot(a, aes(x = place, y = firing_rate)) + geom_point() + geom_path()
But when I try to create a for loop to go through each trial file in the folder and plot it on the same graph, I am having issues. This is what I have so far:
files <- list.files(pattern=".txt")
for (i in files){
p <- lapply(i, read.table)
print(ggplot(p, aes(x = place, y = firing_rate)) + geom_point() + geom_path())
}
It gives me the error "Error: data must be a data frame, or other object coercible by fortify(), not a list". I am a novice in R, so I am not sure what to make of that.
Thank you in advance for the help!
In R it is often simpler to avoid explicit loops. Since you are using ggplot, you may be interested in the map_df() function from purrr (part of the tidyverse).
First, create a read function that adds the file name as a trial label:
readDataFile <- function(x) {
  a <- read.table(x, header = TRUE)
  a$trial <- x  # label every row with the name of its file
  return(a)
}
Next, use map_df():
library(purrr)
dataComplete <- map_df(files, readDataFile)
This runs the function on each file and combines the results into a single data frame (assuming all files have the same columns).
Finally, you can plot almost as before but can distinguish based on the trial variable:
ggplot(dataComplete, aes(x = place, y = firing_rate, color=trial, group=trial)) + geom_point() + geom_path()
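The error in the question comes from lapply(), which always returns a list (even for a single file), while ggplot() needs a data frame; read.table() without header = TRUE would also treat the column names as data. If you prefer to stay in base R rather than add purrr, stacking the list with do.call(rbind, ...) gives the same combined data frame. This sketch writes two hypothetical trial files to a temporary folder so it runs anywhere; the file contents are made up, and with real data you would point list.files() at your own folder.

```r
# Create two small hypothetical trial files in a temporary folder.
trial_dir <- tempfile("trials")
dir.create(trial_dir)
write.table(data.frame(place = 1:3, firing_rate = c(2.0, 3.5, 1.0)),
            file.path(trial_dir, "trial1.txt"), row.names = FALSE)
write.table(data.frame(place = 1:3, firing_rate = c(1.5, 4.0, 2.5)),
            file.path(trial_dir, "trial2.txt"), row.names = FALSE)

readDataFile <- function(x) {
  a <- read.table(x, header = TRUE)
  a$trial <- basename(x)  # file name without the folder path
  a
}

# "\\.txt$" matches names ending in .txt (a bare "." in a regex matches any character)
files <- list.files(trial_dir, pattern = "\\.txt$", full.names = TRUE)
# lapply() gives a list of data frames; rbind() stacks them into one.
dataComplete <- do.call(rbind, lapply(files, readDataFile))

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(dataComplete, aes(x = place, y = firing_rate,
                                color = trial, group = trial)) +
    geom_point() + geom_path()
  print(p)
}
```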

How to draw a violin plot with the color showing the expression of gene value?

I am trying to plot the gene expression of "gene A" among several groups.
I used ggplot2, but it did not work:
p <- ggplot(MAPK_plot, aes(x = group, y = gene_A)) + geom_violin(trim = FALSE , aes( colour = gene_A)) + theme_classic()
And I want to get the figure like this from https://www.researchgate.net/publication/313728883_Neuropilin-1_Is_Expressed_on_Lymphoid_Tissue_Residing_LTi-like_Group_3_Innate_Lymphoid_Cells_and_Associated_with_Ectopic_Lymphoid_Aggregates
You would have to provide data to get a more specific answer tailored to your problem. But I don't want you to be discouraged by the down-votes you have received so far, and, based on your link, this example may give you some food for thought.
Nice job on figuring out that you have to use geom_violin. Further, you will need some form of faceting / multi-panels. Finally, to do the full annotation like in the given link, you need to make use of the grid package functionality (which I do not use here).
I am not familiar with gene-expression data sets, so I use an IMDb movie rating data set for this example (stored in the package ggplot2movies).
library(ggplot2)
library(ggplot2movies)
library(data.table)
mv <- copy(movies)
setDT(mv)
# make some variables for our plotting example
mv[, year_10 := cut_width(year, 10)]
mv[, rating_10yr_avg := mean(rating), by = year_10]
mv[, length_3gr := cut_number(length, 3)]
ggplot(mv,
aes(x = year_10,
y = rating)) +
geom_violin(aes(fill = rating_10yr_avg),
scale = "width") +
facet_grid(rows = vars(length_3gr))
Please do not take this answer as encouragement not to post data relevant to your problem.
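The question's geom_violin(aes(colour = gene_A)) maps colour to the y variable itself, but each violin is drawn as a single shape, so ggplot2 needs one value per violin. That is why the example above precomputes rating_10yr_avg per decade before using it as fill. A sketch of the same idea with the question's names (MAPK_plot, group, gene_A); the group labels and expression values are hypothetical, and the group mean is an assumed choice of summary (a median would work the same way via ave()'s FUN argument).

```r
set.seed(123)
# Hypothetical expression values; replace with your own MAPK_plot data.
MAPK_plot <- data.frame(
  group  = rep(c("ctrl", "treatA", "treatB"), each = 30),
  gene_A = c(rnorm(30, mean = 2), rnorm(30, mean = 4), rnorm(30, mean = 6))
)
# ave() returns, for every row, the mean gene_A of that row's group,
# so all rows in a group share one value: one fill per violin.
MAPK_plot$gene_A_mean <- ave(MAPK_plot$gene_A, MAPK_plot$group)

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(MAPK_plot, aes(x = group, y = gene_A)) +
    geom_violin(aes(fill = gene_A_mean), trim = FALSE) +
    theme_classic()
  print(p)
}
```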

visualizing statistical test results with ggplot2

I would like to integrate my statistical test results into my plot. Here is an example of my script with dummy variables (the dummy data below was added after my first post):
library(dplyr)
library(ggplot2)
cases <- rep(1:5, times = 10)
var1 <- rep(11:15, times = 10)
outcome <- rep(c(1, 1, 1, 2, 2), times = 10)
maindata <- data.frame(cases, var1, outcome)
df1 <- maindata %>%
  group_by(cases) %>%
  select(cases, var1, outcome) %>%
  summarise(var1 = max(var1, na.rm = TRUE), outcome = mean(outcome, na.rm = TRUE))
wilcox.test(df1$var1[df1$outcome <= 1], df1$var1[df1$outcome > 1])
ggplot(df1, aes(x = as.factor(outcome), y = as.numeric(var1), fill = outcome)) + geom_boxplot()
With this, everything works fine, but I can't find a way to add my wilcox.test results to the plot automatically (of course, I could use annotate() and write the results manually, but that's not what I'm after).
My script produces two boxplots with the maximum value of var1 on the y-axis, grouped by outcome on the x-axis (outcome takes only two values). I would like to add my wilcox.test results to that boxplot; all other relevant data is present. I have searched forums and help files but can't find a way (at least with ggplot2).
I'm new to R and am trying to learn by using ggplot2 and dplyr, which I find the most intuitive packages for data manipulation and visualization. I don't know whether they are optimal for what I'm after, so feel free to suggest solutions from other packages too.
I think this figure shows what you want. I also added some parts to the code because you're new to ggplot2. Take them or leave them, but these are things I do to make publication-quality figures:
wtOut = wilcox.test(df1$var1[df1$outcome<=1], df1$var1[df1$outcome>1])
exampleOut <- ggplot(df1,
aes(x = as.factor(outcome), y = as.numeric(var1), fill=outcome)) +
geom_boxplot() +
scale_fill_gradient(name = paste0("P-value: ",
signif(wtOut$p.value, 3), "\nOutcome")) +
ylab("Variable 1") + xlab("Outcome") + theme_bw()
ggsave('exampleOut.jpg', exampleOut, width = 6, height = 4)
If you want to show the p-value as its own legend, it takes some extra work, but it is doable.
Or, if you want, just throw signif(wtOut$p.value, 3) into annotate(...). You'll just need to come up with rules for where to place it.
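A minimal sketch of the annotate() route. The placement rule is an arbitrary assumption: x = 1.5 puts the label midway between the two boxes on the discrete axis, and the label sits half a unit above the highest value. The summary table is rebuilt from the question's dummy variables with base R (tapply()) instead of dplyr, so the example has no extra dependencies.

```r
# Rebuild the question's dummy data and per-case summary in base R.
cases   <- rep(1:5, times = 10)
var1    <- rep(11:15, times = 10)
outcome <- rep(c(1, 1, 1, 2, 2), times = 10)
maindata <- data.frame(cases, var1, outcome)
df1 <- data.frame(
  cases   = 1:5,
  var1    = as.vector(tapply(maindata$var1, maindata$cases, max)),
  outcome = as.vector(tapply(maindata$outcome, maindata$cases, mean))
)

wtOut   <- wilcox.test(df1$var1[df1$outcome <= 1], df1$var1[df1$outcome > 1])
p_label <- paste0("Wilcoxon p = ", signif(wtOut$p.value, 3))

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(df1, aes(x = as.factor(outcome), y = var1)) +
    geom_boxplot() +
    # Numeric x positions work on a discrete axis: the boxes sit at 1 and 2.
    annotate("text", x = 1.5, y = max(df1$var1) + 0.5, label = p_label) +
    xlab("Outcome") + ylab("Variable 1") + theme_bw()
  print(p)
}
```

With this dummy data the two groups are c(11, 12, 13) and c(14, 15), so the exact two-sided test gives p = 0.2.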
