Grouped barplots in R using csv - r

I have a 3 column csv file like this
x,y1,y2
100,50,10
200,10,20
300,15,5
I want to have a barplot using R, with the first column's values on the x axis and the second and third columns' values as grouped bars for the corresponding x. I hope I made it clear. Can someone please help me with this? My data is huge, so I have to import the csv file and can't enter all the data by hand. I found relevant posts, but none addressed exactly this.
Thank you

Use the following code
library(tidyverse)
df %>% pivot_longer(names_to = "y", values_to = "value", -x) %>%
ggplot(aes(x,value, fill=y))+geom_col(position = "dodge")
Data
df = structure(list(x = c(100L, 200L, 300L), y1 = c(50L, 10L, 15L),
y2 = c(10L, 20L, 5L)), class = "data.frame", row.names = c(NA,
-3L))
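Since the question starts from a CSV file, here is a minimal base R sketch of the same grouped bar chart that reads the file first. The file path is an assumption: a temporary file is written here only so the example runs on its own, and you would replace it with the path to your own CSV.

```r
# Write the sample rows to a temporary CSV so the example is self-contained;
# with real data, point read.csv() at your own file instead.
csv_path <- tempfile(fileext = ".csv")
writeLines(c("x,y1,y2", "100,50,10", "200,10,20", "300,15,5"), csv_path)

df <- read.csv(csv_path)

# barplot() draws one group of bars per matrix column, so put y1/y2 in rows
# and the x values in columns.
heights <- t(as.matrix(df[, c("y1", "y2")]))
colnames(heights) <- df$x

# beside = TRUE places y1 and y2 next to each other instead of stacking them.
barplot(heights, beside = TRUE, legend.text = TRUE,
        xlab = "x", ylab = "value")
```

The data frame returned by read.csv() can also be passed straight to the pivot_longer() and ggplot() code above.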

Related

How to visualize this data clarifying in R and recognize patterns?

This is my dataframe:
[image: screenshot of the dataframe]
Output dput(dataframe):
structure(list(ChargePoint_skey = c(2174, 2174, 2174, 2239, 2239,
2266, 2266, 2266, 2266, 2266), MonthYear = structure(c(17532,
17563, 17591, 17956, 17987, 17532, 17563, 17591, 17622, 17652
), class = "Date"), aantalsessies = c(16L, 15L, 14L, 8L, 8L,
61L, 29L, 33L, 13L, 14L)), .Names = c("ChargePoint_skey", "MonthYear",
"aantalsessies"), class = c("grouped_df", "tbl_df", "tbl", "data.frame"
), row.names = c(NA, -10L), groups = structure(list(ChargePoint_skey =
c(2174,
2239, 2266), .rows = list(1:3, 4:5, 6:10)), .Names = c("ChargePoint_skey",
".rows"), row.names = c(NA, -3L), class = c("tbl_df", "tbl",
"data.frame"), .drop = TRUE))
As you can see, there are a lot of groups in the column 'ChargePoint_skey' because there are a lot of different ChargePoints. I want to visualize this data to recognize patterns across all the ChargePoints. Does anybody have a suggestion for a type of visualization? I was thinking of a stacked bar chart like this:
[image: example stacked bar chart]
But this isn't an option for me since I have many different ChargePoints in my data.
I hope somebody can help me with this!
Hard to post as a comment so I'm posting an answer.
3D plots (surface, scatter, etc.) can be quite useful for visualizing data. Here's an example of a surface plot from one of my previous projects that I used to examine the relationship between three variables and to see where the problem becomes infeasible (voids in the plot). This is an example of a full-factorial DOE.
Back to your problem now: I prefer using plotly for an interactive output that you can play around with. This (imo) is better than static tools where rotation, pan, and zoom can only be changed via code.
CODE
library(plotly)
# dat is the data.frame from your dput output
# a 3D scatter needs x, y and z; the session count goes on the z axis
plt <- plot_ly(dat, x = ~MonthYear, y = ~ChargePoint_skey, z = ~aantalsessies) %>%
add_markers()
OUTPUT
Since this is a small sample of the dataset, the plot is rather sparse. With a larger dataset you'd likely get some better insights.
You can refer to https://plotly.com/r/3d-scatter-plots/ for more information. The plots themselves can be saved as html files (which can be opened in a browser) for sharing, using the htmlwidgets package.
htmlwidgets::saveWidget(widget = as_widget(plt), file = 'myfile.html', selfcontained = TRUE, title = 'my tab title')
Hope this is helpful!
Try something like this using geom_tile():
library(ggplot2)
library(dplyr)  # provides the %>% pipe
#Code (df is the data frame from the question's dput output)
df %>%
ggplot(aes(x=factor(ChargePoint_skey),y=factor(MonthYear),
fill=aantalsessies))+
geom_tile()+xlab('ChargePoint_skey')+ylab('MonthYear')
Output:
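The tile plot is essentially a ChargePoint-by-month matrix of session counts. As a rough base R sketch of the same idea (not the answerer's code), the data from the question can be cross-tabulated and drawn with heatmap(). The names dat, month, and sessions are introduced here for illustration only.

```r
# Data from the question, rebuilt as a plain (ungrouped) data frame.
dat <- data.frame(
  ChargePoint_skey = c(2174, 2174, 2174, 2239, 2239,
                       2266, 2266, 2266, 2266, 2266),
  MonthYear = as.Date(c(17532, 17563, 17591, 17956, 17987,
                        17532, 17563, 17591, 17622, 17652),
                      origin = "1970-01-01"),
  aantalsessies = c(16L, 15L, 14L, 8L, 8L, 61L, 29L, 33L, 13L, 14L)
)

# Label each row with its year-month, then build a matrix:
# rows are charge points, columns are months, cells are session counts.
# Combinations that never occur are left as NA.
dat$month <- format(dat$MonthYear, "%Y-%m")
sessions <- tapply(dat$aantalsessies,
                   list(dat$ChargePoint_skey, dat$month),
                   sum)

# Draw the matrix as a heatmap without clustering or rescaling.
heatmap(sessions, Rowv = NA, Colv = NA, scale = "none",
        xlab = "Month", ylab = "ChargePoint_skey")
```

With many charge points the matrix simply gets more rows, which is why a heatmap tends to scale better here than a stacked bar chart.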

Creating a frequency histogram using ggplot2

Hi, I am relatively new to R. I am struggling with what seems like it should be a relatively simple task: I am trying to make a frequency histogram using ggplot2 from a subset of data from a longer dataframe.
Here is an example of the data structure, as in the attached picture:
https://i.stack.imgur.com/HIwQv.png
The data is from a survey where 0 means not selected and 1 means selected. The columns are numeric in the original dataset. I want a histogram of how frequently each variable was selected, with the column variables on the x-axis and frequency counts on the y-axis. I have various subsets like this within a dataframe, and I would like each subset to have its own graph.
I first subset the columns of interest
new_dataset <- subset(df, select = c(WAB_R, WAB_B, BDAE, PNT))
When I checked the class, it was data.frame and no longer numeric.
I tried to use as.numeric to convert it back to numeric, but with no luck.
I could use some guidance in how to structure the data to then obtain a histogram.
Thanks Carla
Maybe try this approach using tidyverse functions. You have to reshape the data to long format, selecting the desired variables. Here is the code, using ggplot2 for the final plot:
library(tidyverse)
#Code 1
df %>% select(c(WAB_R, WAB_B, BDAE, PNT)) %>%
pivot_longer(everything()) %>%
ggplot(aes(x=value))+
geom_bar(aes(fill=name),  # geom_bar counts rows per x value by default
position = position_dodge2(0.9,preserve = 'single'))+
labs(fill='Variable')
Output:
Or this:
#Code 2
df %>% select(c(WAB_R, WAB_B, BDAE, PNT)) %>%
pivot_longer(everything()) %>%
ggplot(aes(x=factor(value)))+
geom_bar(aes(fill=name),  # geom_bar counts rows per x value by default
position = position_dodge2(0.9,preserve = 'single'))+
labs(fill='Variable')+xlab('value')
Output:
Some data used:
#Data
df <- structure(list(ID = 1:4, WAB_R = c(0L, 1L, 0L, 1L), WAB_B = c(0L,
1L, 0L, 0L), BDAE = c(0L, 0L, 0L, 1L), PNT = c(0L, 0L, 0L, 0L
)), class = "data.frame", row.names = c(NA, -4L))
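If the goal is only how often each variable was selected, the 0/1 coding means a column sum already gives that frequency. The following is a minimal base R sketch under that assumption, using the sample data above; the name selected is introduced here for illustration.

```r
# Sample data from the answer above (0 = not selected, 1 = selected).
df <- data.frame(ID = 1:4,
                 WAB_R = c(0L, 1L, 0L, 1L),
                 WAB_B = c(0L, 1L, 0L, 0L),
                 BDAE  = c(0L, 0L, 0L, 1L),
                 PNT   = c(0L, 0L, 0L, 0L))

# With 0/1 coding, each column sum is the number of times that item was selected.
selected <- colSums(df[, c("WAB_R", "WAB_B", "BDAE", "PNT")])

# One bar per variable, bar height = selection count.
barplot(selected, xlab = "Variable", ylab = "Times selected")
```

This also sidesteps the class issue from the question: the subset stays a data frame, and colSums() works on its numeric columns directly.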

R: filter %in% range not filtering values with decimals

I have a dataset e:
`structure(list(num = c(23L, 23L, 23L), code = structure(1:3, .Label = c("A",
"B", "C"), class = "factor"), ranking = c(140.5, 140.5,
2662), bottom = c(-0.0207357225475016, -0.0146710913954366,
-0.019899240924872), previous = c(0.00312288516116536,
0.00207118230618904, -0.00191931365721628), mean_of_all = c(-0.000222419352160109,
-0.00107348087538642, -0.00202343390338765)), row.names = c(NA,
-3L), class = "data.frame")`
code:
`winner_filtered <- e %>%
group_by(code) %>%
filter(ranking %in% (winner_lower:winner_upper))`
is not filtering the two values with 140.5
Any guesses? Thanks.
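The expression winner_lower:winner_upper generates a sequence in steps of 1 starting at winner_lower, so with whole-number bounds a non-integer value such as 140.5 is never an element of it. More generally, as the column 'ranking' is numeric, it may not be exactly equal to the values generated from the sequence due to floating-point precision. So the filter should test a range instead, either with the <= and >= operators or with the convenience wrapper between():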
library(dplyr)
e %>%
group_by(code) %>%
filter(between(ranking, winner_lower, winner_upper))
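The difference between membership in a sequence and a range test can be seen in a small base R sketch. The bounds 100 and 200 are hypothetical, since the question does not show the real values of winner_lower and winner_upper.

```r
# Hypothetical bounds; the question does not show the real ones.
winner_lower <- 100
winner_upper <- 200
ranking <- c(140.5, 140.5, 2662)

# 100:200 contains only 100, 101, ..., 200, so 140.5 never matches.
in_sequence <- ranking %in% (winner_lower:winner_upper)

# A range comparison keeps every value inside the bounds, whole or not.
in_range <- ranking >= winner_lower & ranking <= winner_upper

print(in_sequence)
print(in_range)
```

The range comparison is the same inclusive test that between() performs, written out with plain operators.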

'height' must be a vector or a matrix. barplot error

I am trying to create a simple bar chart, but I keep receiving the error message
'height' must be a vector or a matrix
The barplot function I have been trying is
barplot(data, xlab="Percentage", ylab="Proportion")
I have inputted my csv, and the data looks as follows:
34.88372093 0.00029997
35.07751938 0.00019998
35.27131783 0.00029997
35.46511628 0.00029997
35.65891473 0.00069993
35.85271318 0.00069993
36.04651163 0.00049995
36.24031008 0.0009999
36.43410853 0.00189981
...
Where am I going wrong here?
Thanks in advance!
EDIT:
dput(head(data)) outputs:
structure(list(V1 = c(34.88372093, 35.07751938, 35.27131783,
35.46511628, 35.65891473, 35.85271318), V2 = c(0.00029997, 0.00019998,
0.00029997, 0.00029997, 0.00069993, 0.00069993)), .Names = c("V1",
"V2"), row.names = c(NA, 6L), class = "data.frame")
and barplot(as.matrix(data)) produced a chart with all the data in one bar, as opposed to each data point on a separate bar.
You can specify the two variables you want to plot rather than passing the whole data frame, like so:
data <- structure(list(V1 = c(34.88372093, 35.07751938, 35.27131783, 35.46511628, 35.65891473, 35.85271318),
V2 = c(0.00029997, 0.00019998, 0.00029997, 0.00029997, 0.00069993, 0.00069993)),
.Names = c("V1", "V2"), row.names = c(NA, 6L), class = "data.frame")
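# the second positional argument of barplot() is the bar width, so pass V1 as labels instead
barplot(data$V2, names.arg = data$V1, xlab="Percentage", ylab="Proportion")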
Alternatively, you can use ggplot to do this:
library(ggplot2)
ggplot(data, aes(x=V1, y=V2)) + geom_bar(stat="identity") +
labs(x="Percentage", y="Proportion")
The whole data frame format is probably wrong. The same thing happened to me because I added the columns individually and then combined them into a data frame. Building a matrix and passing that to barplot avoids the error (the values and names below are placeholders):
table.values = c(value1, value2,.......)
table = matrix(table.values,nrow=number of rows ,byrow = TRUE)
colnames(table) = c("column1","column2",........)
row.names(table) = c("row1", "row2",............)
barplot(table, beside = TRUE, xlab= "X-axis",ylab= "Y-axis")
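To see why the error appears, note that barplot() expects its height argument to be a vector or a matrix, and a data frame is neither. Below is a minimal sketch using the six rows from the question; the names is_ok_height and heights are introduced here for illustration.

```r
# The six rows shown in the question's dput output.
data <- data.frame(
  V1 = c(34.88372093, 35.07751938, 35.27131783,
         35.46511628, 35.65891473, 35.85271318),
  V2 = c(0.00029997, 0.00019998, 0.00029997,
         0.00029997, 0.00069993, 0.00069993)
)

# A data frame is a list of columns, so it fails both checks.
is_ok_height <- is.vector(data) || is.matrix(data)

# Use one numeric column for the bar heights and the other (rounded) as labels.
heights <- setNames(data$V2, round(data$V1, 2))
barplot(heights, xlab = "Percentage", ylab = "Proportion")
```

This gives one bar per row, whereas as.matrix(data) treats each column as a group of values to stack.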

ranking in a descending order in R

I want to rank the variables in my dataset in descending order of the number of plants used. I tried ranking them in the .csv and then importing it into R, but even then the plot was not in the required order. Here is my dataset:
df <- structure(list(Lepidoptera.Family = structure(c(3L, 2L, 5L, 1L, 4L, 6L),
.Label = c("Hesperiidae", "Lycaenidae", "Nymphalidae", "Papilionidae", "Pieridae","Riodinidae"), class = "factor"),
LHP.Families = c(55L, 55L, 15L, 14L, 13L, 1L)),
.Names = c("Lepidoptera.Family", "LHP.Families"),
class = "data.frame", row.names = c(NA, -6L))
library(ggplot2)
library(reshape2)
gg <- melt(df,id="Lepidoptera.Family", value.name="LHP.Families", variable.name="Type")
ggplot(gg, aes(x=Lepidoptera.Family, y=LHP.Families, fill=Type))+
geom_bar(stat="identity")+
coord_flip()+facet_grid(Type~.)
How do I rank them in descending order? Also, I want to combine 3 plots into one. How can I go about it?
This happens because ggplot plots a factor on the x axis in the order of its levels, not in the order of the rows (recall that factors are stored as integer codes under the hood). If you want to plot them in a different order, change the order of the levels before plotting:
gg$Lepidoptera.Family<-with(gg,
factor(Lepidoptera.Family,
levels=Lepidoptera.Family[order(LHP.Families)]))
The trick is to reorder the levels of the Lepidoptera.Family factor, which are alphabetical by default:
df = within(df, {
# assign the result back, otherwise within() leaves the column unchanged
Lepidoptera.Family <- reorder(Lepidoptera.Family, LHP.Families)
})
gg <- melt(df,id="Lepidoptera.Family", value.name="LHP.Families", variable.name="Type")
ggplot(gg, aes(x=Lepidoptera.Family, y=LHP.Families, fill=Type))+ geom_bar(stat="identity")+ coord_flip()+facet_grid(Type~.)
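The effect of reordering can be checked without plotting by printing the factor levels before and after. This is a small base R sketch on the data from the question; negating the counts is one way to get descending order, not something taken from the answers above.

```r
# Data from the question, rebuilt with data.frame().
df <- data.frame(
  Lepidoptera.Family = factor(c("Nymphalidae", "Lycaenidae", "Pieridae",
                                "Hesperiidae", "Papilionidae", "Riodinidae")),
  LHP.Families = c(55L, 55L, 15L, 14L, 13L, 1L)
)

# Default levels are alphabetical, regardless of row order.
print(levels(df$Lepidoptera.Family))

# reorder() sorts levels by ascending score; negating the counts puts the
# family with the largest count first, i.e. descending order.
df$Lepidoptera.Family <- reorder(df$Lepidoptera.Family, -df$LHP.Families)
print(levels(df$Lepidoptera.Family))
```

Note that coord_flip() draws the first level at the bottom, so for a flipped chart with the largest bar on top, the ascending order (without the minus sign) is the one to use.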
