How to plot top observations in R - r

This is probably pretty straightforward, but I am new to R and have not been able to find a question quite like this one. I want to plot the top ten observations in my data set and have tried slice_max(), but I end up plotting the whole data set. Please see below for what I have so far. Any help would be much appreciated!
Summary of data set that I am trying to plot
Here's my script for when I try to plot the above data set and I get the whole data set instead of the top ten.
Non_DFW_Orig_Counties %>%
slice_max(Non_DFW_Orig_Counties$tax_returns_by_county, n = 10) %>%
ggplot(data = Non_DFW_Orig_Counties, mapping = aes(x = Orig_County, y = tax_returns_by_county, fill = Dest_County)) +
geom_col()
Thank you to teunbrand for the stackoverflow etiquette. Is there a page where all stackoverflow etiquette is populated?

Because the data set is ordered, you can do
Non_DFW_Orig_Counties[1:10,] %>%
ggplot(mapping = aes(x = Orig_County, y = tax_returns_by_county, fill = Dest_County)) +
geom_col()
This selects the first ten rows of the data set and all columns. You also do not need the data argument in ggplot() because the pipe already passes the data frame in as the first argument.
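For reference, a minimal sketch of the slice_max() route, which does not depend on the rows already being sorted. The data frame below is made-up stand-in data (the real Non_DFW_Orig_Counties is not shown in the question), and top_ten is a name introduced here for illustration. The original attempt most likely plotted everything because the explicit data = Non_DFW_Orig_Counties inside ggplot() took precedence over the piped, sliced data.

```r
library(dplyr)
library(ggplot2)

# Hypothetical stand-in for the asker's data: 25 counties with unique counts
set.seed(42)
Non_DFW_Orig_Counties <- data.frame(
  Orig_County = paste("County", 1:25),
  Dest_County = sample(c("Dallas", "Tarrant"), 25, replace = TRUE),
  tax_returns_by_county = sample(100:5000, 25)
)

# Refer to the column by its bare name; no `$` is needed inside dplyr verbs
top_ten <- Non_DFW_Orig_Counties %>%
  slice_max(tax_returns_by_county, n = 10)

# Do not pass `data =` again, so the piped (sliced) data is what gets plotted
top_ten %>%
  ggplot(aes(x = Orig_County, y = tax_returns_by_county, fill = Dest_County)) +
  geom_col()
```

Note that slice_max() keeps ties by default, so with tied values it can return more than n rows; with_ties = FALSE changes that.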

Related

Plotting multiple continuous variables by frequencies together with same scale margin in r

I am trying to visualize my data. All I need is a plot to compare the distribution of the different variables.
I already tried multi.hist, and that would actually be enough for me. The problem is that I cannot keep the axis scales the same across the histograms, because each one is fitted to its own variable, which makes the distributions hard to compare.
I also have a categorical variable in my data (topic 1-5). Maybe there is a good way to visualize this too, but it is not essential if it is not easy to do.
I have tried a lot with ggplot as well, but I am rather new to R and have not produced anything useful yet.
Below you see an example for my data.
Thank you very much in advance :)
My data:
Data
Try first converting your data to long format:
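library(tidyverse)
df2 <- df %>% pivot_longer(cols = 1:5, names_to = 'set', values_to = 'sub_means')
Then you can do a density plot, either colouring by set and faceting by topic (the aesthetics have to go inside aes(), and facet_wrap() keeps the same axis scales across panels by default):
df2 %>% ggplot(aes(x = sub_means, fill = set)) + geom_density(alpha = 0.5) + facet_wrap(~topic)
Or vice versa (topic is wrapped in factor() so it is treated as a category rather than a number):
df2 %>% ggplot(aes(x = sub_means, fill = factor(topic))) + geom_density(alpha = 0.5) + facet_wrap(~set)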
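Since the original data is not shown, here is a self-contained sketch of the same long-format idea on made-up data. The column names set1 to set3 and the simulated values are assumptions for illustration only. The shared axes come from facet_wrap()'s default scales = "fixed", which is what makes the panels directly comparable.

```r
library(tidyr)
library(ggplot2)

# Hypothetical data: three continuous variables plus a topic column (1-5)
set.seed(1)
df <- data.frame(
  set1 = rnorm(100, mean = 3),
  set2 = rnorm(100, mean = 3.5),
  set3 = rnorm(100, mean = 4),
  topic = sample(1:5, 100, replace = TRUE)
)

# One row per (observation, variable) pair
df2 <- pivot_longer(df, cols = starts_with("set"),
                    names_to = "set", values_to = "sub_means")

# Every panel shares the same x and y axes (scales = "fixed" is the default)
ggplot(df2, aes(x = sub_means, fill = set)) +
  geom_density(alpha = 0.5) +
  facet_wrap(~topic, scales = "fixed")
```

Swapping geom_density() for geom_histogram() gives the histogram version with the same shared scales.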

Plotting graph from Text file using R

I am using an NS3 based simulator called NDNsim. I can generate certain trace files that can be used to analyze performance, etc. However I need to visualize the data generated.
I am a complete novice with R and would like a way to visualize this. Here is what the output I would like to plot looks like. Any help is appreciated.
It's pretty difficult to know what you're looking for, since you have almost 50,000 measurements across 9 variables. Here's one way of getting a lot of that information on the screen:
df <- read.table(paste0("https://gist.githubusercontent.com/wuodland/",
"9b2c76650ea37459f869c59d5f5f76ea/raw/",
"6131919c105c95f8ba6967457663b9c37779756a/rate.txt"),
header = TRUE)
library(ggplot2)
ggplot(df, aes(x = Time, y = Kilobytes, color = Type)) +
geom_line() +
facet_wrap(~FaceDescr)
You could look into building a smaller subset of your input file and then graphing that by node, instead of trying to get everything into a single plot call.
df <- read.table(paste0("https://gist.githubusercontent.com/wuodland/",
"9b2c76650ea37459f869c59d5f5f76ea/raw/",
"6131919c105c95f8ba6967457663b9c37779756a/rate.txt"),
header = TRUE)
smaller_df <- df[which(df$Type=='InData'), names(df) %in% c("Time", "Node",
"FaceId", "FaceDescr", "Type", "Packets", "Kilobytes",
"PacketRaw", "KilobyteRaw")]
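library(ggplot2)
# The + must end each line; a line starting with + is parsed as a separate expression
ggplot(smaller_df, aes(x = Time, y = Kilobytes, color = Type)) +
geom_line() +
facet_wrap(~ Node)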
The above snippet makes a smaller data frame from your original text data using only the "InData" Type, and then plots that by nodes.

R Studio: Plotting lines for a large dataset with repeating columns

I have a huge dataset for which I need to plot all the temperature data over a time scale on the same plot. The data frame consists of many repeats of the following structure:
Name, Date, Temp, Name__1, Date__1, Temp__1,...
So for every line, I want to use 3 columns and then plot the next 3 columns. I don't know if it is possible to use some kind of loop for this. What I've been doing so far is the following:
ggplot(data = name_of_mydataset) +
geom_line(mapping = aes(Date, Temp, col = Name)) +
geom_line(mapping = aes(Date__1, Temp__1, col = Name__1))
If I have to repeat this for every single temperature logger, the code would be endless. I know there are more elegant ways to do this, but I just can't figure out a simpler way. Can someone please help? Thank you so much!
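One possible approach, sketched under the assumption that the columns always come in consecutive Name/Date/Temp triplets: stack each triplet into one long data frame, then a single geom_line() with colour mapped to Name draws every logger. The wide data frame and the names groups and long below are invented for illustration.

```r
library(ggplot2)

# Hypothetical wide data with two loggers (three columns each)
wide <- data.frame(
  Name = "A", Date = as.Date("2020-01-01") + 0:2, Temp = c(1, 2, 3),
  Name__1 = "B", Date__1 = as.Date("2020-01-01") + 0:2, Temp__1 = c(4, 5, 6)
)

# Column indices split into consecutive groups of three: (1,2,3), (4,5,6), ...
groups <- split(seq_along(wide), ceiling(seq_along(wide) / 3))

# Rename each triplet to common column names and stack the pieces row-wise
long <- do.call(rbind, lapply(groups, function(idx) {
  block <- wide[, idx]
  names(block) <- c("Name", "Date", "Temp")
  block
}))

# One line per logger, without repeating geom_line() for every triplet
ggplot(long, aes(Date, Temp, colour = Name)) +
  geom_line()
```

This relies purely on column position, so it only works if no triplet is incomplete or out of order.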

Clustered bar chart R using 2 Numeric Variables/Metrics

I want to create a clustered Bar chart in R using 2 numeric variables, e.g:
Movie Genre (X-axis) and Gross$ + Budget$ should be Y-axis
It's a very straightforward chart to create in Excel. However, in R, I have put Genre in my X-axis and Gross$ in Y-axis.
My question is: Where do I need to put another Numeric variable ie Budget$ in my code so that the new Budget$ will be visible beside Gross$ in the chart?
Here is my Code:
ggplot(data=HW, aes(reorder(x=HW$Genre,-HW$Gross...US, sum),
y=HW$Gross...US))+
geom_col()
P.S. In aes I have just put reorder to sort the categories.
Appreciate help!
Could you give us some data so we can recreate it?
I think you are looking for geom_bar() and one of its options, position="dodge", which tells ggplot to put the bars side by side. But without knowing your data and its structure, I can't help any further.
Melting the dataset should help in this case. A dummy-data based example below:
Data
HW <- data.frame(Genre = letters[sample(1:6, 100, replace = TRUE)],
Gross...US = rnorm(100, 1e6, sd=1e5),
Budget...US = rnorm(100, 1e5, sd=1e4))
Code
library(tidyverse)
library(reshape2)
# Stack Gross...US and Budget...US into variable/value columns, keeping Genre as the id
HW %>%
melt(id.vars = "Genre") %>%
ggplot(aes(Genre, value, fill=variable)) + geom_col(position = 'dodge')
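A variant sketch of the same idea, using tidyr's pivot_longer() in place of melt() and summing per genre before plotting. The summing step is an assumption about what the asker wants (their reorder() call used sum); without it, geom_col(position = "dodge") draws the many rows of one genre on top of each other instead of adding them up. The dummy data and the names totals and metric are illustrative only.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Dummy data in the same shape as the example above
set.seed(1)
HW <- data.frame(Genre = sample(letters[1:6], 100, replace = TRUE),
                 Gross...US = rnorm(100, 1e6, sd = 1e5),
                 Budget...US = rnorm(100, 1e5, sd = 1e4))

# Long format, then one total per genre and metric
totals <- HW %>%
  pivot_longer(cols = c(Gross...US, Budget...US),
               names_to = "metric", values_to = "value") %>%
  group_by(Genre, metric) %>%
  summarise(value = sum(value), .groups = "drop")

# Two bars side by side per genre, genres sorted from largest to smallest
ggplot(totals, aes(x = reorder(Genre, -value), y = value, fill = metric)) +
  geom_col(position = "dodge")
```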

Reordering data based on a column in [r] to order x-value items from lowest to highest y-values in ggplot

I have a dataframe that I want to reorder to make a ggplot so I can easily see which items have the highest and lowest values in them. In my case, I've grouped the data into two groups, and it'd be nice to have a visual representation of which group tends to score higher. Based on this question I came up with:
library(ggplot2)
cor.data<- read.csv("https://dl.dropbox.com/s/p4uy6uf1vhe8yzs/cor.data.csv?dl=0",stringsAsFactors = F)
cor.data.sorted = cor.data[with(cor.data,order(r.val,pic)),] #<-- line that doesn't seem to be working
ggplot(cor.data.sorted,aes(x=pic,y=r.val,size=df.val,color=exp)) + geom_point()
which produces this:
I've tried quite a few variants to reorder the data, and I feel like this should be pretty simple to achieve. To clarify, if I had successfully reorganised the data, then the y-values would go up as the plot moves along the x-axis. So maybe I'm focussing on the wrong part of the code to achieve this in a ggplot figure?
You could do something like this?
library(tidyverse);
cor.data %>%
mutate(pic = factor(pic, levels = as.character(pic)[order(r.val)])) %>%
ggplot(aes(x = pic, y = r.val, size = df.val, color = exp)) + geom_point()
This obviously still needs some polishing to deal with the x axis label clutter etc.
Rather than trying to order the data before creating the plot, I can reorder it at the time of building the plot with reorder():
library(ggplot2)
cor.data<- read.csv("https://dl.dropbox.com/s/p4uy6uf1vhe8yzs/cor.data.csv?dl=0",stringsAsFactors = F)
cor.data.sorted = cor.data[with(cor.data,order(r.val,pic)),] #<-- This line only controls the order in which points are drawn, to make a (slightly) more readable plot
ggplot(cor.data.sorted,aes(x=reorder(pic,r.val),y=r.val,size=df.val,color=exp)) + geom_point()
to create
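To see what reorder() does on its own, here is a small sketch with made-up values (the pic and r.val vectors below are hypothetical, not the data from the CSV). It returns a factor whose levels are sorted by the second argument, and ggplot uses that level order for the x axis, which is why sorting the rows of the data frame alone has no effect on the axis.

```r
# Hypothetical values: three items with their scores
pic <- c("a", "b", "c")
r.val <- c(0.3, 0.1, 0.2)

# Levels are ordered by r.val (lowest first), not alphabetically
ordered_pic <- reorder(pic, r.val)
levels(ordered_pic)
```

If an item appears more than once, reorder() sorts by the mean of its values by default; a different summary can be passed through its FUN argument.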
