How to visualize this data clearly in R and recognize patterns?


This is my dataframe (output of dput(dataframe)):
structure(list(ChargePoint_skey = c(2174, 2174, 2174, 2239, 2239,
2266, 2266, 2266, 2266, 2266), MonthYear = structure(c(17532,
17563, 17591, 17956, 17987, 17532, 17563, 17591, 17622, 17652
), class = "Date"), aantalsessies = c(16L, 15L, 14L, 8L, 8L,
61L, 29L, 33L, 13L, 14L)), .Names = c("ChargePoint_skey", "MonthYear",
"aantalsessies"), class = c("grouped_df", "tbl_df", "tbl", "data.frame"
), row.names = c(NA, -10L), groups = structure(list(ChargePoint_skey =
c(2174,
2239, 2266), .rows = list(1:3, 4:5, 6:10)), .Names = c("ChargePoint_skey",
".rows"), row.names = c(NA, -3L), class = c("tbl_df", "tbl",
"data.frame"), .drop = TRUE))
As you can see, the column 'ChargePoint_skey' contains many groups because there are many different ChargePoints. I want to visualize this data to recognize patterns across all the ChargePoints. Does anybody have a suggestion for a type of visualization? I was thinking of a stacked bar chart, but that isn't an option for me since I have so many different ChargePoints in my data.
I hope somebody can help me with this!

This is hard to post as a comment, so I'm posting an answer.
3D plots (surface, scatter, etc.) can be quite useful for visualizing data. Here's an example of a surface plot from one of my previous projects, which I used to examine the relationship between three variables and to see where the problem becomes infeasible (voids in the plot). This is an example of a full-factorial DOE.
Back to your problem now: I prefer using plotly for an interactive output that you can play around with. This (imo) is better than static tools where rotation, pan, zoom, etc. can only be changed via code.
CODE
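library(plotly)
# dat is the data.frame from your dput output
# month on x, charge point on y, number of sessions on z
plt <- plot_ly(dat, x = ~MonthYear, y = ~ChargePoint_skey, z = ~aantalsessies) %>%
  add_markers()
plt  # print the widget to display the plot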
OUTPUT
Since this is a small sample of the dataset, the plot is rather sparse. With a larger dataset you'd likely get some better insights.
You can refer to https://plotly.com/r/3d-scatter-plots/ for more information. The plots themselves can be saved as html files (which open in a browser) for sharing, using the htmlwidgets package.
htmlwidgets::saveWidget(widget = as_widget(plt), file = 'myfile.html', selfcontained = TRUE, title = 'my tab title')
Hope this is helpful!

Try something like this using geom_tile():
library(ggplot2)
library(dplyr)  # provides the %>% pipe
#Code
df %>%
  ggplot(aes(x = factor(ChargePoint_skey), y = factor(MonthYear),
             fill = aantalsessies)) +
  geom_tile() + xlab('ChargePoint_skey') + ylab('MonthYear')
Output:

Related

How to get specific data from a JSON column in a dataframe (R)?

I am working with a dataset that contains this kind of column, which looks like a JSON structure (see the speakers column in the data below).
I am trying to get only the occupation. How can I access that?
I tried to replace some symbols to convert it into a chr vector:
data$speakers <- str_replace(data$speakers, "[{", "(")
data:
structure(list(X_id = c(21L, 1L, 7L, 47L, 55L), duration = c(992L,
957L, 1266L, 1126L, 1524L), event = c("TED2006", "TED2006", "TED2006",
"TEDGlobal 2005", "TED2006"), likes = c("17000", "110000", "60000",
"80000", "14000"), published_date = structure(c(1156464660, 1151367060,
1151367060, 1158019860, 1153786260), class = c("POSIXct", "POSIXt"
), tzone = "UTC"), related_videos = c("[\"144\",\"1282\",\"1379\",\"87\",\"2302\",\"2638\"]",
"[\"243\",\"547\",\"2093\",\"74405\",\"64693\",\"83767\"]", "[\"1725\",\"2274\",\"172\",\"2664\",\"2464\",\"1268\"]",
"[\"2237\",\"701\",\"1095\",\"1386\",\"76211\",\"242\"]", "[\"2228\",\"1476\",\"800\",\"2890\",\"45233\",\"2694\"]"
), speakers = c("[{\"name\":\"Mena Trott\",\"occupation\":\"Blogger; cofounder, Six Apart\"}]",
"[{\"name\":\"Al Gore\",\"occupation\":\"Climate advocate\"}]",
"[{\"name\":\"David Pogue\",\"occupation\":\"Technology columnist\"}]",
"[{\"name\":\"David Deutsch\",\"occupation\":\"Physicist, author\"}]",
"[{\"name\":\"Jehane Noujaim\",\"occupation\":\"Filmmaker\"}]"
)), row.names = c(NA, 5L), class = "data.frame")

I got some help and it worked!
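library(dplyr)  # as_tibble() and select()
# single quotes outside so the double quotes inside the JSON need no escaping
'[{"name":"Mena Trott","occupation":"Blogger; cofounder, Six Apart"}]' |>
  jsonlite::fromJSON() |> as_tibble() |> select(occupation) |> as.vector()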
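To pull the occupation out of every row rather than a single string, the same jsonlite::fromJSON() call can be applied element by element. This is only a sketch, assuming every speakers entry is a valid JSON array of objects with an occupation field. The speakers vector copies two rows from the question, and joining several speakers with " | " is my own choice, not something from the original answer.

```r
library(jsonlite)

# Two entries copied from the question's speakers column
speakers <- c(
  '[{"name":"Mena Trott","occupation":"Blogger; cofounder, Six Apart"}]',
  '[{"name":"Al Gore","occupation":"Climate advocate"}]'
)

# Parse each JSON string; fromJSON() simplifies an array of objects to a data frame.
# If a talk has several speakers, their occupations are joined with " | " (an assumption).
occupation <- vapply(
  speakers,
  function(s) paste(fromJSON(s)$occupation, collapse = " | "),
  character(1),
  USE.NAMES = FALSE
)

occupation
```

With the full data frame, the same call could be written as data$occupation <- vapply(data$speakers, ...).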

Why geom_bracket is not allowing me to plot a bracket?

I would like to add a bracket using geom_bracket for my first two groups of countries the United Kingdom (UK) and France (FR). I use the following code and it plots the three estimates:
library(ggpubr)
library(ggplot2)
df %>%
ggplot(aes(estimate, cntry)) +
geom_point()
However, whenever I add geom_bracket as below, I get an error. I tried to get around it in different ways, but it is still not working. Could someone let me know what I am doing wrong?
df %>%
ggplot(aes(estimate, cntry)) +
geom_point() +
geom_bracket(ymin = "UK", ymax = "FR", x.position = -.75, label.size = 7,
label = "group 1")
Here is a reproducible example:
structure(list(cntry = structure(1:3, .Label = c("BE", "FR",
"UK"), class = "factor"), estimate = c(-0.748, 0.436,
-0.640)), row.names = c(NA, -3L), groups = structure(list(
cntry = structure(1:3, .Label = c("BE", "FR", "UK"), class = "factor"),
.rows = structure(list(1L, 2L, 3L), ptype = integer(0), class = c("vctrs_list_of",
"vctrs_vctr", "list"))), row.names = c(NA, 3L), class = c("tbl_df",
"tbl", "data.frame"), .drop = TRUE), class = c("grouped_df",
"tbl_df", "tbl", "data.frame"))

Well, it's pretty damn late, but I figured out a workaround for this. I thought that I might as well post it here in case anyone finds it useful.
Firstly, as Basti mentioned, ymin, ymax, and x.position aren't arguments that can be used; you have to use xmin, xmax, and y.position. Now, won't this only work for a flipped graph (i.e. x = cntry, y = estimate)? Yes, it will. However, you can easily get around this by using coord_flip().
Secondly, it turns out that geom_bracket doesn't inherit the data (df) from ggplot() and won't run unless the data is passed to it directly. Why? No idea. But this is what was causing the error. Additionally, for some reason, merely supplying the data isn't enough; a label must also be added. Not a problem here, just thought I might mention it for dumb people like me who decided to use geom_bracket to add brackets to stat_compare_means.
Here's a version of the OP's example that should work, along with data generation:
library(ggplot2)
library(ggpubr)
library(tibble) #I like tibbles
df <- tibble(cntry = factor(c("BE", "FR", "UK")),
estimate = c(-0.748,0.436,-0.64)) #dataframe generation
df %>%
ggplot(aes(cntry, estimate)) +
geom_point() +
coord_flip() + #necessary if you want to keep this weird x/y orientation
geom_bracket(data = df, xmin = "UK", xmax = "FR", y.position = -.75,
label.size = 7, label = "group 1", coord.flip = T)
#coord.flip = T reflects the added coord_flip()
You can then play around with y coordinates, size, etc. You can also expand the graph using expand_limits().

Grouped barplots in R using csv

I have a 3 column csv file like this
x,y1,y2
100,50,10
200,10,20
300,15,5
I want to make a barplot in R, with the first column's values on the x axis and the second and third columns' values as grouped bars for the corresponding x. I hope I made it clear. Can someone please help me with this? My data is huge, so I have to import the csv file and can't enter all the data by hand. I found related posts, but none addressed exactly this.
Thank you

Use the following code
library(tidyverse)
df %>% pivot_longer(names_to = "y", values_to = "value", -x) %>%
ggplot(aes(x,value, fill=y))+geom_col(position = "dodge")
Data
df = structure(list(x = c(100L, 200L, 300L), y1 = c(50L, 10L, 15L),
y2 = c(10L, 20L, 5L)), class = "data.frame", row.names = c(NA,
-3L))
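Since the question asks about importing the data from a csv file, here is a base-R sketch of the same grouped bar chart. read.csv(text = ...) stands in for read.csv("yourfile.csv") (that file name is hypothetical), and the legend labels are simply taken from the column names.

```r
# Stand-in for read.csv("yourfile.csv"); the file name is hypothetical
df <- read.csv(text = "x,y1,y2
100,50,10
200,10,20
300,15,5")

# barplot() wants one column per x value and one row per series, hence t()
heights <- t(as.matrix(df[, c("y1", "y2")]))

# beside = TRUE draws the series side by side instead of stacked
mids <- barplot(heights, beside = TRUE, names.arg = df$x,
                legend.text = rownames(heights),
                xlab = "x", ylab = "value")
```

This avoids reshaping to long format, at the cost of less control over styling than the ggplot2 version.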

Clusters on separate pages of pdf. Each row may belong to different clusters

As the title says, I would like to save each cluster on a separate page of a pdf file.
Example data:
structure(list(P1 = c("ATCG00490", "AT5G17710", "AT2G42910",
"AT4G23600", "AT3G61540", "AT2G05990"), P2 = c("AT5G38420", "AT5G20070",
"AT5G04230", "AT1G08200", "AT4G30910", "AT5G52100"), clique = structure(list(
`930` = integer(0), `2090` = integer(0), `3120` = c(2L, 3L,
231L), `3663` = integer(0), `3704` = integer(0), `4156` = c(19L,
27L)), .Names = c("930", "2090", "3120", "3663", "3704",
"4156"), class = "AsIs")), .Names = c("P1", "P2", "clique"), row.names = c(930L,
2090L, 3120L, 3663L, 3704L, 4156L), class = "data.frame")
Some of the rows belong to many clusters and some of them to just a single one. Of course, all possible variants have to be considered.
If possible, I would like to keep only clusters which have at least two members.
This is the code I use when each row belongs to a single cluster:
pdf("clusters.pdf", width=12, height=18)
lapply(split(data_cluster, data_cluster$cluster), function(d) {
grid::grid.newpage()
gridExtra::grid.table(d)
}
)
dev.off()
Maybe it will help someone find an answer for me.
EDIT:
I made a mistake while preparing the example data... Please take a look at my original data and then you will find out that it's not that simple (at least in my opinion).
structure(list(P1 = c("ATCG00490", "AT5G17710", "AT2G42910",
"AT4G23600", "AT3G61540", "AT2G05990"), P2 = c("AT5G38420", "AT5G20070",
"AT5G04230", "AT1G08200", "AT4G30910", "AT5G52100"), clique = structure(list(
`930` = integer(0), `2090` = integer(0), `3120` = c(2L, 3L,
231L), `3663` = integer(0), `3704` = integer(0), `4156` = c(19L,
27L)), .Names = c("930", "2090", "3120", "3663", "3704",
"4156"), class = "AsIs")), .Names = c("P1", "P2", "clique"), row.names = c(930L,
2090L, 3120L, 3663L, 3704L, 4156L), class = "data.frame")

It seems that this is only a question of reshaping the list column into a long-format data.frame. The splitstackshape package does just that. Here is a solution using @Ananda's suggestion of listCol_l rather than cSplit.
library(splitstackshape)
data_cluster <- listCol_l(data_cluster, "clique")
data_cluster <- data_cluster[,n := .N >= 2,by=clique_ul][!is.na(clique_ul) & n,][,n :=NULL]
pdf("clusters.pdf", width=12, height=18)
lapply(unique(data_cluster$clique_ul), function(i) {
grid::grid.newpage()
gridExtra::grid.table(data_cluster[clique_ul == i,])
})
dev.off()
This will produce an empty pdf document with your dataset, since no cluster is repeated.
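For readers without splitstackshape, the reshaping step can also be sketched in base R. The toy data below is made up for illustration (the IDs and cluster numbers are not from the question). It repeats each row once per cluster it belongs to, then keeps only clusters with at least two members.

```r
# Toy data (hypothetical): a list column holding the clusters of each row
data_cluster <- data.frame(P1 = c("g1", "g2", "g3", "g4"),
                           P2 = c("h1", "h2", "h3", "h4"),
                           stringsAsFactors = FALSE)
data_cluster$clique <- I(list(integer(0), c(2L, 3L), 3L, c(2L, 19L)))

# Repeat each row once per cluster membership; rows with no cluster drop out
n_memberships <- lengths(data_cluster$clique)
row_index <- rep(seq_len(nrow(data_cluster)), n_memberships)
long <- data.frame(data_cluster[row_index, c("P1", "P2")],
                   clique = unlist(data_cluster$clique),
                   row.names = NULL)

# Keep only clusters with at least two members
sizes <- table(long$clique)
keep <- long[long$clique %in% names(sizes)[sizes >= 2], ]

# One list element per cluster, ready for a lapply()/grid.table() loop
pages <- split(keep, keep$clique)
```

Here clusters 2 and 3 each have two members and are kept, while cluster 19 has one member and is dropped.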

'height' must be a vector or a matrix. barplot error

I am trying to create a simple bar chart, but I keep receiving the error message
'height' must be a vector or a matrix
The barplot function I have been trying is
barplot(data, xlab="Percentage", ylab="Proportion")
I have inputted my csv, and the data looks as follows:
34.88372093 0.00029997
35.07751938 0.00019998
35.27131783 0.00029997
35.46511628 0.00029997
35.65891473 0.00069993
35.85271318 0.00069993
36.04651163 0.00049995
36.24031008 0.0009999
36.43410853 0.00189981
...
Where am I going wrong here?
Thanks in advance!
EDIT:
dput(head(data)) outputs:
structure(list(V1 = c(34.88372093, 35.07751938, 35.27131783,
35.46511628, 35.65891473, 35.85271318), V2 = c(0.00029997, 0.00019998,
0.00029997, 0.00029997, 0.00069993, 0.00069993)), .Names = c("V1",
"V2"), row.names = c(NA, 6L), class = "data.frame")
and barplot(as.matrix(data)) produced a chart with all the data in one bar, as opposed to each data point on a separate bar.

You can specify the two variables you want to plot rather than passing the whole data frame, like so:
data <- structure(list(V1 = c(34.88372093, 35.07751938, 35.27131783, 35.46511628, 35.65891473, 35.85271318),
V2 = c(0.00029997, 0.00019998, 0.00029997, 0.00029997, 0.00069993, 0.00069993)),
.Names = c("V1", "V2"), row.names = c(NA, 6L), class = "data.frame")
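# the second positional argument of barplot() is width, so V1 is passed as the bar labels
barplot(data$V2, names.arg = data$V1, xlab="Percentage", ylab="Proportion")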
Alternatively, you can use ggplot to do this:
library(ggplot2)
ggplot(data, aes(x=V1, y=V2)) + geom_bar(stat="identity") +
labs(x="Percentage", y="Proportion")
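The stacked result mentioned in the question's edit follows from how barplot() reads a matrix: each column becomes one bar and the rows are stacked inside it. A small sketch using the posted dput data (the names mids_matrix and mids_vector are mine):

```r
data <- data.frame(
  V1 = c(34.88372093, 35.07751938, 35.27131783,
         35.46511628, 35.65891473, 35.85271318),
  V2 = c(0.00029997, 0.00019998, 0.00029997,
         0.00029997, 0.00069993, 0.00069993)
)

# Matrix input: one stacked bar per column (V1 and V2), so V2 is dwarfed by V1
mids_matrix <- barplot(as.matrix(data))

# Vector input: one bar per value of V2, labelled with the rounded V1 values
mids_vector <- barplot(data$V2, names.arg = round(data$V1, 2),
                       xlab = "Percentage", ylab = "Proportion")
```

barplot() returns the bar midpoints, so the two return values give a quick check of how many bars each call drew.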

Probably the entire dataframe format is wrong. The same thing happened to me when I added the columns individually and then assembled the dataframe. Building a matrix instead worked:
table.values = c(value1, value2,.......)
table = matrix(table.values,nrow=number of rows ,byrow = T)
colnames(table) = c("column1","column2",........)
row.names(table) = c("row1", "row2",............)
barplot(table, beside = T, xlab= "X-axis",ylab= "Y-axis")
