ggplot2 stacked area line charts producing odd lines and holes - r

I have a data set that's structured as follows:
year color toyota honda ford
2011 blue 66 75 13
2011 red 75 91 62
2011 green 65 26 57
2012 blue 64 23 10
2012 red 84 8 62
2012 green 67 21 62
2013 blue 31 74 49
2013 red 48 43 35
2013 green 57 62 74
2014 blue 59 100 32
2014 red 72 47 67
2014 green 97 24 70
2015 blue 31 0 79
2015 red 60 35 74
2015 green 51 2 28
(My actual data, shown in the chart images below, is much larger and has hundreds of "colors", but I'm simplifying here so you can see the structure.)
I am trying to make a stacked area line chart that shows how many cars of each color are produced over time for a specific company. (i.e. each company has its own chart in which x axis = years, y axis = cars produced).
I run this code:
qplot(year, toyota, data = dataName, fill = color, group = color, geom = "area", position = "stack") +
geom_area() + theme(legend.position = "none")
However, every company's chart has issues. There are seemingly random cut-out holes as well as lines that cut across the top of the layers.
company1_chart
company2_chart
I'm confused why this is happening or even possible (especially the holes... won't the data stack down?) Would it help if I made the companies long rather than wide in the data structure?

Even with 0 values, you should not have those errors. I took your data and added 0's in the honda column sporadically.
The code (using ggplot2)
library(ggplot2)
df <- read.csv("cartest.csv", header = TRUE)
ggplot(data=df,aes(x=year,y=h,fill=color)) +
geom_area() +
ggtitle("car test")
If you are importing your data from a CSV or TSV file and the data columns are numeric, you should not have this issue. If a column was imported as character, you can convert it using:
df$h <- as.numeric(df$h)
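As a minimal sketch of the check described above, the following reads the first six rows of the question's sample data from inline text, forces the honda column to character to mimic a bad import, converts it back, and draws the stacked area chart. The data frame name cars and the simulated bad import are assumptions made for illustration; they are not taken from the original post.

```r
library(ggplot2)

# First six rows of the sample data from the question, read from inline text
cars <- read.table(header = TRUE, text = "
year color toyota honda ford
2011 blue  66 75 13
2011 red   75 91 62
2011 green 65 26 57
2012 blue  64 23 10
2012 red   84  8 62
2012 green 67 21 62
")

# Simulate a column that arrived as character (assumption for illustration)
cars$honda <- as.character(cars$honda)

# Convert back to numeric before plotting.
# For a factor column, use as.numeric(as.character(x)) instead,
# because as.numeric() on a factor returns the level codes.
cars$honda <- as.numeric(cars$honda)

# One stacked area per color; geom_area() stacks by default
p <- ggplot(cars, aes(x = year, y = honda, fill = color)) +
  geom_area()
print(p)
```

Running str(cars) before plotting is a quick way to confirm that every value column is numeric.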

Related

How to visualize a layer onto a map with R?

I am trying to produce a map which would visualize the lu_effic on my map. With the current codes I can visualize the map and the lu_effic category, but they are not layered on each other. I would like to fill the map with lu_effic colors. I tried with aes to add fill in geom_sf(data = shp_dt) but didn't work. The geom_polygon() also showed error, as the shape file data is a sf format. It has geometry column instead of longitude and latitude columns separately. Hence, failed to run aes (x= long, y=lat). I need to fill the map with lu_effic colors and other necessary visualization elements. Would anybody kindly assist me to solve this? Thanks in advance.
The data frame urb_dtf is here: https://docs.google.com/spreadsheets/d/1mZOkMTLx830oPaqtg9TVpg0OTP5fGzcR/edit?usp=sharing&ouid=114933701557583416369&rtpof=true&sd=true
urb_dtf
dtafil.GCPNT_LAT dtafil.GCPNT_LON lu_effic dtafil.UC_NM_MN
1 26.23825 88.40822 2.49722588 Atwari
2 26.03490 88.44599 1.30779927 Thakurgaon
3 26.33849 88.55293 4.41597618 Panchagarh
4 25.86100 88.36024 0.60699713 Pirganj
5 24.94617 88.21594 0.15688455 Asia
6 26.39944 88.96166 1.70447128 Asia
7 25.85807 88.65385 4.35869046 N/A
8 24.61404 88.23778 0.69757530 Chapainawabganj
9 25.62459 88.62559 0.70376589 Dinajpur
10 26.34469 89.01786 -0.78882643 Patgram
11 25.93154 88.85591 1.17247204 Nilphamari
12 25.79222 88.89938 0.53713622 Saidpur
13 24.44356 88.34488 21.83406438 Godagari
14 25.67936 89.05180 0.94306324 Bodorganj
15 24.41426 88.61983 0.67571401 Rajshahi
16 25.10153 89.02355 9.89191877 Joypurhat
17 24.80587 88.95890 1.26512380 Naogaon
18 24.37298 88.76614 0.43754723 Baneshwar
19 23.78359 88.63357 0.74975530 Meherpur
20 24.41677 88.99007 0.82625124 নাটোর
21 25.69513 89.60984 1.43875426 Ulipur
22 24.94282 89.37863 1.15731210 Bogura
23 25.32976 89.53690 1.28528943 Gaibandha
24 23.63560 88.83938 0.88394331 Chuadanga
25 24.08493 89.09479 0.36944300 Ishurdi
26 23.75310 88.94598 0.52944470 Alamdanga
27 23.90058 89.10053 0.68007198 Barkhada
28 24.65859 89.44197 1.51617122 Bogra
29 24.01567 89.26079 0.71299373 Pabna
30 23.86184 89.15752 0.82472321 Alampur
31 24.41254 89.64289 0.95057900 Sirajganj
32 23.40571 89.01174 0.69460150 Kotchadpur
33 23.04288 88.89597 0.75806861 Asia
34 23.54037 89.17593 0.61603886 Jhenaidah
35 23.40988 89.13477 1.10069949 কালীগঞ্জ
36 24.96816 89.97172 0.46869564 Jamalpur
37 23.16668 89.20843 0.52640165 Jessore
38 23.48845 89.42102 1.09687780 Magura
39 24.32668 89.94591 0.82200687 Tangail
40 24.76605 90.25557 0.56361005 Muktagacha
41 22.84015 89.55533 -1.26805716 Khulna
42 23.58577 89.83287 1.35248758 Faridpur
43 24.74904 90.38985 0.17224254 Mymensingh
44 23.84481 90.00978 0.95263604 Manikgonj
45 24.53342 90.34734 0.46884662 Trishal
46 23.80705 90.40166 0.17146435 Dhaka
47 23.91039 90.06940 0.25197463 Jaigir
48 24.54544 90.42188 0.26382334 Rampur
49 24.42903 90.38683 0.77181766 Bhaluka
50 24.49606 90.49881 0.32848142 Gafargaon
51 23.01082 89.82701 -2.48743996 Gopalganj
52 22.66161 89.77432 -0.26417887 Bagerhat
53 24.87865 90.72984 0.02135679 Netrokona
54 23.10274 90.17751 -2.59794615 Madaripur
55 24.41027 90.81461 0.29627310 Kishoreganj
56 24.57433 90.89773 0.40133470 Tarail
57 23.21486 90.33779 1.49758668 Shariatpur
58 23.55846 90.60338 0.84538204 N/A
59 24.05720 90.98408 0.15528478 Brahmanbaria
60 23.88657 90.73217 0.33951829 N/A
61 23.55013 90.66072 1.08523717 Gazaria
62 23.93776 90.86446 0.26901075 N/A
63 23.69415 90.83184 0.35016879 Homna
64 22.70696 90.35247 -0.89130028 Barisal
65 23.14673 90.80557 0.57463630 Chandpur
66 25.06408 91.40035 0.13625843 Sunamganj District
67 23.46517 90.78352 0.75471736 N/A
68 23.80230 90.97777 0.18687093 Nabinagar
69 23.89096 91.03594 0.05931816 Shibpur
70 24.06237 91.17975 0.09915729 Shahbazpur Town
71 22.99137 90.86482 0.19705905 Lakshmipur
72 22.67565 90.65828 1.14616972 Bhola
73 24.38139 91.41250 0.16456846 Habiganj
74 24.87859 91.90338 0.11481848 Sylhet
75 24.49782 91.78539 0.13333066 Moulvibazar
76 24.33299 91.73140 0.20055411 Sreemangal
77 24.82160 92.16267 0.18369562 Beanibazar
78 23.00552 91.40018 0.53240186 Feni
79 22.43880 91.82426 0.05366313 Chattogram
80 23.10985 91.98522 0.13909249 Khagrachhari
81 22.52012 91.92949 0.05425749 Raujan
82 21.75023 92.05997 0.11659127 Chakaria
83 22.20431 92.21605 0.08462762 Bandarban
84 21.43620 92.02313 0.07814110 Ramu
85 20.86985 92.29864 0.17803279 Teknaf
The shape file link: https://biogeo.ucdavis.edu/data/diva/adm/BGD_adm.zip
The Zip file is available at DIVA-GIS (a popular platform to download shape file). DIVA-GIS-> free spatial data -> country level data
The codes I ran are as follows:
library(sf)
library(ggplot2)
library(tidyverse)
urb_dt= read.csv("GHS_STAT_V1_2_BGD.csv")
urb_dtf= data.frame(dtafil$GCPNT_LAT, dtafil$GCPNT_LON, lu_effic, dtafil$UC_NM_MN)
dput(urb_dtf)
shp_dt= st_read("bgd_admbnda_adm3_bbs_20201113.shp")
str(shp_dt)
dput(shp_dt)
ggplot()+ geom_sf(data = shp_dt)+
geom_polygon(data = urb_dtf, aes(dtafil.GCPNT_LON, dtafil.GCPNT_LAT, group = lu_effic, fill = lu_effic))
Finally, I get the following image. But I would like to fill the map with the colors of lu_effic.

How to get shaded background in xyplot in lattice for different grouping variables in different panels

Pulling up the data.frame that comes with the lattice and latticeExtra packages, we get the following:
require(lattice)
require(latticeExtra)
data("environmental")
This is the way the head(environmental) looks:
ozone radiation temperature wind
41 190 67 7.4
36 118 72 8.0
12 149 74 12.6
18 313 62 11.5
23 299 65 8.6
19 99 59 13.8
Now please consider the code for the plot and the plot itself:
temp.cut <- equal.count(environmental$temperature,4)
xyplot(ozone~radiation | temp.cut,data = environmental,
panel = function(x,y,...){
panel.xyplot(x,y,...)
x <- 1:400
panel.xblocks(x,x>20&x<50,col = 'green',alpha=0.2)
panel.xblocks(x,x>80&x<100,col = 'blue',alpha=0.2)
})
My question is how to get the shaded areas to behave sensibly for each panel. Say I want the blue shade in the first panel but not in the others; the same reasoning applies to the green shaded area, which should appear in the second panel but not in the others. Thanks in advance!
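One possible approach, offered here as a sketch and not taken from the original thread, is to call lattice's panel.number() inside the panel function and draw each shaded block only in the panel where it is wanted. The mapping used below (blue in panel 1, green in panel 2) follows the description above; "first" and "second" are assumed to mean lattice's own panel order.

```r
library(lattice)
library(latticeExtra)
data("environmental")

temp.cut <- equal.count(environmental$temperature, 4)

p <- xyplot(ozone ~ radiation | temp.cut, data = environmental,
  panel = function(x, y, ...) {
    panel.xyplot(x, y, ...)
    xs <- 1:400
    # panel.number() returns the index of the panel currently being drawn
    if (panel.number() == 1) {
      panel.xblocks(xs, xs > 80 & xs < 100, col = "blue", alpha = 0.2)
    }
    if (panel.number() == 2) {
      panel.xblocks(xs, xs > 20 & xs < 50, col = "green", alpha = 0.2)
    }
  })
print(p)
```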

cluster analysis with weight

I have a data frame 'heat' demonstrating people's performance across time.
'Var1' represents the code of persons.
'Var2' represents a time line (measured by number of days from the starting point).
'value' is the score they get at a given time point.
Var1 Var2 value
1 1 36 -0.6941826
2 2 36 -0.5585414
3 3 36 0.8032384
4 4 36 0.7973031
5 5 36 0.7536959
6 6 36 -0.5942059
....
54 10 73 0.7063218
55 11 73 -0.6949616
56 12 73 -0.6641516
57 13 73 0.6890433
58 14 73 0.6310124
59 15 73 -0.6305091
60 16 73 0.6809655
61 17 73 0.8957870
....
101 13 110 0.6495796
102 14 110 0.5990869
103 15 110 -0.6210600
104 16 110 0.6441960
105 17 110 0.7838654
....
Now I want to cluster their performance and reflect it on a heatmap. So I used the functions dist() and hclust() to cluster the data frame and plotted it with ggplot2:
ggplot(data = heat) + geom_tile(aes(x = Var2, y = Var1 %>% as.character(),
fill = value)) +
scale_fill_gradient(low = "yellow",high = "red") +
geom_vline(xintercept = c(746, 2142, 2917))
It looks like this:
However, I am more interested in what happened around day 746, day 2142 and day 2917 (the black lines). I would like the scores around these days bearing more weight in the clustering. I want people demonstrating similar performance around these days to have more priority to be clustered together. Is there a way of doing this?
As long as your weights are integers, you can simply replicate those days artificially.
If you want more control, just compute the distance matrix yourself, with whatever weighted distance you want to use.
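A minimal sketch of both suggestions, assuming the scores have been reshaped into a matrix with one row per person and one column per day. The matrix scores, the weights w and the choice of a weighted Euclidean distance are all hypothetical and only illustrate the idea.

```r
# Hypothetical scores: rows are persons, columns are days
set.seed(1)
days <- c(36, 73, 110)
scores <- matrix(rnorm(5 * length(days)), nrow = 5,
                 dimnames = list(paste0("p", 1:5), days))

# Hypothetical weights: day 73 counts three times as much as the others
w <- c(1, 3, 1)

# Weighted Euclidean distance: sqrt(sum(w * (x - y)^2)).
# Scaling each column by sqrt(w) and then calling dist() gives exactly that.
weighted <- sweep(scores, 2, sqrt(w), `*`)
d <- dist(weighted)

# With integer weights, replicating the day-73 column three times
# yields the same distances
replicated <- scores[, c(1, 2, 2, 2, 3)]
same <- all.equal(as.vector(dist(replicated)), as.vector(d))

# Cluster on the weighted distances
hc <- hclust(d)
```

The order in hc$order can then be used to arrange the persons on the y axis of the heatmap.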

rCharts shows in RStudio and browser but not R viewer

Morning Community,
I wanted to ask a quick question regarding rCharts graph outputs compared to native R.
Question 1: Why are graphs from rCharts displayed in my browser rather than the viewer in R?
Question 2: How can I force (or choose to use) the graphing function in native R instead?
See these two screen shots:
Code for native R:
# Simple Scatterplot
attach(mtcars)
plot(wt, mpg, main="Scatterplot Example",
xlab="Car Weight ", ylab="Miles Per Gallon ", pch=19)
Code for rChart:
library(rCharts)
myData
plot<-Highcharts$new()
plot$chart(polar = TRUE, type = "line",height=NULL)
plot$xAxis(categories=myData$Subject.ID, tickmarkPlacement= 'on', lineWidth=1)
plot$yAxis(gridLineInterpolation= 'circle', lineWidth=1, min=NULL,max=NULL,endOnTick=T,tickInterval=10)
plot$series(data = myData[,"A"],name = "A", pointPlacement="on")
plot
rChart Data used
Subject.ID A B C
1 1 65 29 60
2 2 87 67 59
3 3 98 54 24
4 4 67 44 23
5 5 54 50 4
6 6 83 60 54
7 7 82 55 27
8 8 80 48 32
9 9 88 56 44
10 10 68 68 56
11 11 90 76 69
12 12 41 47 45
13 13 NA 82 NA
14 14 NA 55 NA
Ps: As an aside, I understand that I am graphing two different functions, a scatterplot vs radar plot. My goal is to understand whether or not native R can display (or perhaps another word) the graph output from rCharts - Even if I lose interactivity.
I have reached out to the developer for rCharts and he has replied back to me:
"The native viewer that comes with the R GUI is NOT capable of displaying html. So, the only way to view html output like what rCharts generates is to use the browser. The RStudio viewer on the other hand is capable of displaying html and so rCharts takes advantage of that."

Barchart help in R

I am trying to set up a bar chart to compare control and experimental samples taken of specific compounds. The data set is known as 'hydrocarbon3' and contains the following information:
Exp. Contr.
c12 89 49
c17 79 30
c26 78 35
c42 63 3
pris 0.5 0.8
phy 0.5 0.9
nap 87 48
nap1 83 44
nap2 78 44
nap3 73 20
acen1 81 50
acen2 86 46
fluor 83 11
fluor1 68 13
fluor2 79 17
dibe 65 7
dibe1 67 6
dibe2 56 10
phen 82 13
phen1 70 12
phen2 65 15
phen3 53 14
fluro 62 9
pyren 48 11
pyren1 34 10
pyren2 19 8
chrys 22 3
chrys1 21 3
chrys2 21 3
When I create a bar chart with the formula:
barplot(as.matrix(hydrocarbon3),
main=c("Fig 1. Change in concentrations of different hydrocarbon compounds\nin sediments with and without the presence of bacteria after 21 days"),
beside=TRUE,
xlab="Oiled sediment samples collected at 21 days",
space=c(0,2),
ylab="% loss in concentration relative to day 0")
I receive this diagram. However, I need the control and experimental samples of each chemical to sit next to each other to allow a more accurate comparison, rather than having the experimental samples bunched on the left and the control samples bunched on the right. Is there a way to correct this in R?
Try transposing your matrix:
barplot(t(as.matrix(hydrocarbon3)), beside=T)
Basically, barplot will plot things in the order they show up in the matrix, which, since a matrix is just a vector wrapped colwise, means barplot will plot all the values of the first column, then all those of the second column, etc.
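To make the ordering concrete, here is a small sketch using the first three rows of the question's data typed in by hand (the matrix name m is an assumption). With beside = TRUE, barplot() draws one group of bars per matrix column, so transposing turns each compound into its own group containing an Exp. bar and a Contr. bar.

```r
# First three compounds from the question, as a 3 x 2 matrix
m <- matrix(c(89, 79, 78, 49, 30, 35), ncol = 2,
            dimnames = list(c("c12", "c17", "c26"), c("Exp.", "Contr.")))

as.vector(m)   # 89 79 78 49 30 35: all Exp. values, then all Contr. values

tm <- t(m)     # 2 x 3: rows are Exp./Contr., columns are compounds
as.vector(tm)  # 89 49 79 30 78 35: an Exp./Contr. pair for each compound

# One group per compound, two bars side by side in each group
barplot(tm, beside = TRUE, legend.text = rownames(tm))
```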
Check this question out: Barplot with 2 variables side by side
It uses ggplot2, so you'll have to use the following code before running it:
install.packages("ggplot2")
library(ggplot2)
Hopefully this works for you. Plus it looks a little nicer with ggplot2!
> df
row exp con
1 a 1 2
2 b 2 3
3 c 3 4
> barplot(rbind(df$exp,df$con),
+ beside = TRUE,names.arg=df$row)
produces:
