I have the following data in a data frame:
TYPE_OF_COMPANY COUNT_OF_COMPANIES
AIM-Listed 876
Charitable-organisation 82
Industrial-Provident 50
Limited-Partnership 2
Limited by Guarantee 277
Limited Liability Partnership 167
Listed-LSE 1131
Not-Companies-Act 75
Private Limited Company 1163
Public-Unlisted 418
Royal-Charter 5
Unlimited 111
I want to represent this data as a bar plot. When the data was in a table my code ran, and although I had trouble sorting the data, I was happy with the output of the following code:
barplot(counts,
xlab='TYPES_OF_COMPANIES', ylab='TYPE_OF_COMPANY', ylim=c(0,1200),
names.arg = c("LP", "RC", "IP",
"N-C-A", "C-O", "Guarantee",
"U-L", "C-O",
"LLP","L-G","P-U",
"A-L", "LSE", "PL"),
main='Number of Different Types of Companies in the database')
When I tried to modify the code to use a data frame, it gave me an error. I know I could use the ggplot2 package to do it, but this is for illustration, and I want to do it in base R.
Could you show me how to sort the table, or how to make barplot work with the data frame?
Any help is greatly appreciated.
Does this work for you?
barplot(sort(counts$COUNT_OF_COMPANIES),
xlab = 'TYPES_OF_COMPANIES', ylab = 'Count', ylim = c(0, 1200),
names.arg = c("AIM", "Charity", "I-P", "L-P",
"L-G", "LLP", "L-LSE", "NCA",
"PLC", "P-UL", "RC", "UL")[order(counts$COUNT_OF_COMPANIES)],
main = 'Number of Different Types of Companies in the database')
Instead of writing out names.arg, you could do:
labs <- c("AIM", "Charity", "I-P", "L-P", "L-G", "LLP",
"L-LSE", "NCA", "PLC", "P-UL", "RC", "UL")
# counts is the data frame used above; the formula interface of barplot() needs R >= 4.0.0
new_df <- transform(counts, labels = ordered(labs, labs[order(COUNT_OF_COMPANIES)]))
barplot(COUNT_OF_COMPANIES ~ labels, new_df, xlab = 'TYPES_OF_COMPANIES',
ylab = 'Count',
main = 'Number of Different Types of Companies in the database')
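If the short labels are stored as a column of the data frame, sorting the rows once keeps counts and labels aligned without indexing names.arg separately. A minimal base R sketch, assuming the data frame is rebuilt from the table in the question; the names df, LABEL, sorted_df and mids are illustrative and not from the original code:

```r
# Data frame rebuilt from the table in the question
df <- data.frame(
  TYPE_OF_COMPANY = c("AIM-Listed", "Charitable-organisation",
                      "Industrial-Provident", "Limited-Partnership",
                      "Limited by Guarantee", "Limited Liability Partnership",
                      "Listed-LSE", "Not-Companies-Act",
                      "Private Limited Company", "Public-Unlisted",
                      "Royal-Charter", "Unlimited"),
  COUNT_OF_COMPANIES = c(876, 82, 50, 2, 277, 167, 1131, 75, 1163, 418, 5, 111),
  # Short labels taken from the answer above, in the same row order
  LABEL = c("AIM", "Charity", "I-P", "L-P", "L-G", "LLP",
            "L-LSE", "NCA", "PLC", "P-UL", "RC", "UL"),
  stringsAsFactors = FALSE
)

# Reorder whole rows so that counts and labels stay together
sorted_df <- df[order(df$COUNT_OF_COMPANIES), ]

# barplot() returns the bar midpoints, kept here in case text is added later
mids <- barplot(sorted_df$COUNT_OF_COMPANIES,
                names.arg = sorted_df$LABEL,
                xlab = "TYPES_OF_COMPANIES", ylab = "Count",
                ylim = c(0, 1200), las = 2,
                main = "Number of Different Types of Companies in the database")
```

Use order(-df$COUNT_OF_COMPANIES) instead if you want the tallest bar first.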
I am trying to use spplot to visualize plots from different months. I'd like to change this figure so the same months are in the same columns to easily compare. I would like to push May 2016 5 panels in, so all the rest of the months are in line. I hope this makes sense.
click here for figure
I have missing data for Dec2017 for now which is why it's blacked out.
Here is my code:
stack_months <- stack(May2016, June2016, July2016, Aug2016, Sep2016, Oct2016, Nov2016, Dec2016, January2017, Febuary2017, March2017, April2017, May2017, June2017, July2017, July2017, Aug2017, Sep2017, Oct2017, Nov2017, Dec2017, January2018, Febuary2018, March2017, April2018, May2018, June2017, July2017, July2018, Aug2018, Sep2018, Oct2018, Nov2018, Dec2018, January2019, Febuary2019, March2019, April2019, May2019, June2019, July2019, July2019)
spplot(stack_months, col.regions=viridis(20), names.attr = c("May2016", "June2016", "July2016", "Aug2016", "Sep2016", "Oct2016", "Nov2016", "Dec2016",
"Jan2017", "Feb2017", "March2017", "April2017", "May2017", "June2017", "July2017", "July2017", "Aug2017", "Sep2017", "Oct2017", "Nov2017", "Dec2017",
"Jan2018", "Feb2018", "March2017", "April2018", "May2018", "June2017", "July2017", "July2018", "Aug2018", "Sep2018", "Oct2018", "Nov2018", "Dec2018",
"Jan2019", "Feb2019", "March2019", "April2019", "May2019", "June2019", "July2019", "July2019"), layout = c(12,4))
Is there an easy way to manipulate the panels?
Note that you have typed some of the months more than once: July2017 appears 3 times, and March2017, June2017 and July2019 appear twice each.
Below is the complete sequence of months from May2016 to July2019, so that the months align when you plot.
library(raster)
library(sp)
library(viridis)
library(lattice)
months=c("May2016", "June2016", "July2016", "Aug2016", "Sep2016", "Oct2016",
"Nov2016", "Dec2016", "Jan2017", "Feb2017", "March2017", "April2017",
"May2017", "June2017", "July2017","Aug2017", "Sep2017",
"Oct2017", "Nov2017", "Dec2017","Jan2018", "Feb2018", "March2018",
"April2018", "May2018", "Jun2018", "July2018", "Aug2018",
"Sep2018", "Oct2018", "Nov2018", "Dec2018","Jan2019", "Feb2019",
"March2019", "April2019", "May2019", "June2019", "July2019")
I don't have your data, so I simulate something for the image:
r <- raster(system.file("external/test.grd", package="raster"))
stack_months = do.call(stack,lapply(months,function(i)runif(1)*r))
You defined layout to be c(12,4), so you will have 48 panels, which are filled by row. In your case, the first 4 and the last 5 panels should not be plotted:
SKIP = rep(FALSE,12*4)
SKIP[1:4] = TRUE
SKIP[44:48] = TRUE
Then we plot using the SKIP above:
spplot(stack_months, col.regions=viridis(20),
layout = c(12,4),
strip = strip.custom(par.strip.text = list(cex = 0.65)),
names.attr = months,
skip=SKIP
)
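The skip vector can also be derived from the calendar position of the first month instead of being hard-coded. A small sketch, assuming a 12-column layout with one column per calendar month; first_month and n_layers are illustrative names:

```r
n_cols <- 12        # one column per calendar month
n_rows <- 4         # one row per year, 2016 to 2019
first_month <- 5    # the series starts in May
n_layers <- 39      # May2016 to July2019, i.e. length(months) above

# Start with every panel skipped, then switch on the panels that hold data
SKIP <- rep(TRUE, n_cols * n_rows)
SKIP[seq(first_month, length.out = n_layers)] <- FALSE

which(SKIP)  # indices of the panels that stay empty
```

This gives the same vector as the manual assignment above and adapts if the series later starts or ends in a different month.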
Pardon me if this is a basic question; this is my first time posting here, so thanks in advance.
I have exported a report from Google Analytics with the columns Longitude, Latitude and Sessions, and I want to add these data points to a polygon map I have created in R for the administrative regions of Slovakia.
This is what I have so far.
##Load the Raster Library
library(raster)
##Get the Province Shapefile for Slovakia
slovakia_level_1 <- getData('GADM', country='SVK', level=1)
slovakia_level_2 <- getData('GADM', country='SVK', level=2)
##Plot this shapefile
plot(slovakia_level_1)
library(ggmap) ##load the ggmap package so we can access the crime data
## read our dataset with sessions from google analytics ( more on how to read excel files http://www.sthda.com/english/wiki/reading-data-from-excel-files-xls-xlsx-into-r)
library(readxl) ## this is the dataframe from Google Analytics; I would like to plot these data on the Slovakia administrative region map
lugera <- read_excel("Analytics 01. [Lugera.sk] - [Reporting View] - [Filtered Data] New Custom Report 20190101-20190627.xlsx")
But I really do not know how to move on. I followed this article http://data-analytics.net/wp-content/uploads/2014/09/geo2.html but I got stuck when I needed to plot the points.
This is a sample of what the report from Google Analytics looks like:
Longitude Latitude Sessions
17.1077 48.1486 25963
0.0000 0.0000 13366
21.2611 48.7164 4732
18.7408 49.2194 3154
21.2393 49.0018 2597
18.0335 48.8849 2462
19.1462 48.7363 2121
17.5833 48.3709 1918
18.0764 48.3061 1278
14.4378 50.0755 1099
20.2954 49.0511 715
18.1571 47.9882 663
18.6245 48.7745 653
17.8272 48.5918 620
18.9190 49.0617 542
19.1371 48.5762 464
-6.2603 53.3498 369
18.1700 48.5589 369
20.5637 48.9453 325
-0.1278 51.5074 284
21.9184 48.7557 258
Can someone help me progress from here? I am struggling to figure out how to plot those points on the polygon map.
Is it also possible to create a heat map over particular regions?
I hope this was clear; if not, please tell me and I will improve my question. This is my first time asking.
Thank you very much!
UPDATE
I tried to reproduce Jay's answer, and the first map with red dots works great. Thanks!
But for the heat map I am getting several errors and cannot reproduce the same map.
Below is my code. I am not sure where the issue is, as I named my dataframe ses, the same way as in Jay's answer.
##Load the Raster Library
library(raster) # imports library(sp)
slovakia_level_1 <- getData('GADM', country='SVK', level=1)
##Plot
plot(slovakia_level_1)
points(coordinates(slovakia_level_2), pch=20, col="red")
#ses is my google analytics dataframe where all 3 columns Longitude, Latitude and Sessions are numeric
## it is imported excel file to r and stored as a dataframe
ses
spdf <- SpatialPointsDataFrame(coords=ses[1:2], data=ses[3],
proj4string=CRS(proj4string(slovakia_level_2)))
ppl.sum <- aggregate(x=spdf["Sessions"], by=slovakia_level_2, FUN=sum)
spplot(ppl.sum, "Sessions", main="Sessions in Slovakia")
These are the errors I am getting
spdf <- SpatialPointsDataFrame(coords=ses[1:2], data=ses[3],
+ proj4string=CRS(proj4string(slovakia_level_2)))
Error in proj4string(slovakia_level_2) :
object 'slovakia_level_2' not found
> ppl.sum <- aggregate(x=spdf["Sessions"], by=slovakia_level_2, FUN=sum)
Error in aggregate(x = spdf["Sessions"], by = slovakia_level_2, FUN = sum) :
object 'spdf' not found
> spplot(ppl.sum, "Sessions", main="Sessions in Slovakia")
Error in spplot(ppl.sum, "Sessions", main = "Sessions in Slovakia") :
object 'ppl.sum' not found
Please accept my huge thanks for being so helpful with my first question; I cannot express enough respect for all the people at StackOverflow.
Thank you
Actually there's a coordinates() function included in the sp package (loaded together with raster), with which we can easily add the points to the plot.
library(raster) # imports library(sp)
slovakia_level_1 <- getData('GADM', country='SVK', level=1)
slovakia_level_2 <- getData('GADM', country='SVK', level=2)
##Plot
plot(slovakia_level_1)
points(coordinates(slovakia_level_2), pch=20, col="red")
To get a heatmap using your Google Analytics data (here ses) we can use spplot(), also included in sp. First we need to create a SpatialPointsDataFrame, which - according to this post on gis.stackexchange - we then aggregate to match the ses$Sessions points to the polygons from slovakia_level_2.
spdf <- SpatialPointsDataFrame(coords=ses[1:2], data=ses[3],
proj4string=CRS(proj4string(slovakia_level_2)))
ppl.sum <- aggregate(x=spdf["Sessions"], by=slovakia_level_2, FUN=sum)
spplot(ppl.sum, "Sessions", main="Sessions in Slovakia")
Result
Data
# your data from google analytics above
ses <- structure(list(Longitude = c(17.1077, 0, 21.2611, 18.7408, 21.2393,
18.0335, 19.1462, 17.5833, 18.0764, 14.4378, 20.2954, 18.1571,
18.6245, 17.8272, 18.919, 19.1371, -6.2603, 18.17, 20.5637, -0.1278,
21.9184), Latitude = c(48.1486, 0, 48.7164, 49.2194, 49.0018,
48.8849, 48.7363, 48.3709, 48.3061, 50.0755, 49.0511, 47.9882,
48.7745, 48.5918, 49.0617, 48.5762, 53.3498, 48.5589, 48.9453,
51.5074, 48.7557), Sessions = c(25963L, 13366L, 4732L, 3154L,
2597L, 2462L, 2121L, 1918L, 1278L, 1099L, 715L, 663L, 653L, 620L,
542L, 464L, 369L, 369L, 325L, 284L, 258L)), row.names = c(NA,
-21L), class = "data.frame")
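One thing to be aware of: the second row of ses has coordinates (0, 0), which presumably is a placeholder for sessions without a known location rather than a real place (that reading is an assumption). Points that fall outside every polygon do not contribute to the aggregated sums, but if you also plot the raw points you may want to drop that row first. A sketch in base R using only the first few rows of ses; ses_small, has_location and ses_clean are illustrative names:

```r
# First four rows of the Google Analytics data, for brevity
ses_small <- data.frame(
  Longitude = c(17.1077, 0, 21.2611, 18.7408),
  Latitude  = c(48.1486, 0, 48.7164, 49.2194),
  Sessions  = c(25963L, 13366L, 4732L, 3154L)
)

# Flag rows whose coordinates are not exactly (0, 0)
has_location <- !(ses_small$Longitude == 0 & ses_small$Latitude == 0)
ses_clean <- ses_small[has_location, ]

# Number of sessions that would be left out of the map
sum(ses_small$Sessions[!has_location])
```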
The simplest way to do it would be this (slov_df is your dataset):
library(sp)
library(ggplot2)
slov_reg <- fortify(slovakia_level_2)
ggplot() +
geom_polygon(data = slov_reg, aes(x = long, y = lat, group = group), col = "black", fill = NA) +
geom_point(data = slov_df, aes(x = Longitude, y = Latitude))
EDIT:
Nice solution by jay.sf. If you like this let me provide another option:
library(plyr) # provides join()
sp_google <- SpatialPointsDataFrame(coords=slov_df[1:2], data=slov_df[3],
proj4string=CRS(proj4string(slovakia_level_2)))
slovakia_level_2@data$Sessions <- over(slovakia_level_2, sp_google, fn = sum)$Sessions
slovakia_level_2@data$id <- row.names(slovakia_level_2@data)
slov_reg <- fortify(slovakia_level_2, region = "id")
slov_reg <- join(slov_reg, slovakia_level_2@data, by="id")
ggplot() +
geom_polygon(data = slov_reg, aes(x = long, y = lat, group = group, fill = Sessions), col = "black") +
scale_fill_gradient(low = "yellow", high = "red", na.value = "lightgrey") +
theme_bw()
It's a little bit more work, but in the end ggplot offers you a much wider range of customization options. It's a question of your preference.
I am using the argparse library to build a boxplot with the ggpubr library from the command line. I can reorder a particular column of interest manually.
However, I want to reorder that column through an argparse argument.
I cannot figure out how to use df$args$reorder.
Somehow, I need to automate the line
df$Population <- factor(df$Population, levels = c("Control", "American"))
to
get(args$reorder, df) <- factor(get(args$reorder, df), levels = c(args$new_order))
or
df$args$reorder <- factor(df$args$reorder, levels = c(args$new_order)
Here is the code I have tried
#!/usr/local/bin/Rscript
suppressWarnings(library(argparse))
suppressWarnings(library(ggpubr))
parser <- ArgumentParser(description="Tools for making plot from command line")
parser$add_argument("--file", type="character", help="Input file")
parser$add_argument("--x-ax", type="character", help="x_axis value")
parser$add_argument("--y-ax", type="character", help="y_axis value")
parser$add_argument("--color", type="character", help="color by")
parser$add_argument("--facet-col", type="character", default=NULL, help="facet by")
parser$add_argument("--reorder", type="character", default=NULL, help="reorder a column")
parser$add_argument("--new_order", type="character", default=NULL, help="new orders for the items")
args <- parser$parse_args()
df <- read.csv(args$file)
head(df)
#Population Diet BloodPressure
#1 American Vegan 167
#2 American Vegan 160
#3 American Vegan 162
#4 American Vegan 165
#5 American Vegan 159
#6 American Vegan 177
#The line below can manually reorder the items in the column.
#df$Population <- factor(df$Population, levels = c("Control", "American"))
# I want to do something like this
#get(args$reorder, df) <- factor(get(args$reorder, df), levels = c(args$new_order))
fig <- ggboxplot(df, x = args$x_ax , y = args$y_ax,
facet.by = args$facet_col, fill = args$color, palette = "npg")
ggsave("reorder_factor.png")
The data I used is here
Before reordering, the image looks like the following
and after reordering, the image looks like the following
Sorry folks!
I solved it by trial and error.
df[[args$reorder]] <- factor(df[[args$reorder]], levels = args$new_order)
Although I don't know how this works.
Happy coding.
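As to why this works: df$name takes name literally as the column name, so df$args$reorder looks for a column called args, whereas df[[x]] evaluates x first and uses the resulting string as the column name. A small sketch in base R, where col and new_order stand in for args$reorder and args$new_order; the comma-separated input string is only an assumption about how the levels might be passed on the command line:

```r
df <- data.frame(Population = c("American", "Control", "American"),
                 stringsAsFactors = FALSE)

col <- "Population"   # stands in for args$reorder
# Assumed input format: one comma-separated string, split into a vector
new_order <- strsplit("Control,American", ",", fixed = TRUE)[[1]]

# `$` looks for a column literally named "col", which does not exist
is.null(df$col)

# `[[` evaluates `col` and uses its value ("Population") as the column name
df[[col]] <- factor(df[[col]], levels = new_order)
levels(df$Population)
```

The same applies on the left-hand side of the assignment, which is why get(args$reorder, df) <- ... fails while df[[args$reorder]] <- ... succeeds.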
I want to make a histogram for each column. Each column has three values (Phase_1_Mean, Phase_2_Mean and Phase_3_Mean).
The output should be:
12 histograms (because we have 12 rows), and per histogram the 3 values shown as bars (Y axis = value, X axis = Phase_1_Mean, Phase_2_Mean and Phase_3_Mean).
Stuck: when I search the internet, almost everyone makes a "long" data frame. That is not helpful for this example, because then we would generate a single "value" column, but I want to keep the three rows separate.
You can find my data at the bottom. Any help is appreciated!
I tried this (How do I generate a histogram for each column of my table?), but here is the "long table" problem, after that I tried Multiple Plots on 1 page in R, that solved how we can plot multiple graphs on 1 page.
dput(Plots1)
structure(list(`0-0.5` = c(26.952381, 5.455598, 28.32947), `0.5-1` =
c(29.798635,
25.972696, 32.87372), `1-1.5` = c(32.922764, 41.95935, 41.73577
), `1.5-2` = c(31.844156, 69.883117, 52.25974), `2-2.5` = c(52.931034,
128.672414, 55.65517), `2.5-3` = c(40.7, 110.1, 63.1), `3-3.5` =
c(73.466667,
199.533333, 70.93333), `3.5-4` = c(38.428571, 258.571429, 95),
`4-4.5` = c(47.6, 166.5, 233.4), `4.5- 5` = c(60.846154,
371.730769, 74.61538), `5-5.5` = c(7.333333, 499.833333,
51), `5.5-6` = c(51.6, 325.4, 82.4), `6-6.5` = c(69, 411.5,
134)), class = "data.frame", .Names = c("0-0.5", "0.5-1",
"1-1.5", "1.5-2", "2-2.5", "2.5-3", "3-3.5", "3.5-4", "4-4.5",
"4.5- 5", "5-5.5", "5.5-6", "6-6.5"), row.names = c("Phase_1_Mean",
"Phase_2_Mean", "Phase_3_Mean"))
Something like what is shown in this example (which didn't work for me, because it is Python): https://www.google.com/search?rlz=1C1GCEA_enNL765NL765&biw=1366&bih=626&tbm=isch&sa=1&ei=Yqc8XOjMLZDUwQLp9KuYCA&q=multiple+histograms+r&oq=multiple+histograms+r&gs_l=img.3..0i19.4028.7585..7742...1.0..1.412.3355.0j19j1j0j1......0....1..gws-wiz-img.......0j0i67j0i30j0i5i30i19j0i8i30i19j0i5i30j0i8i30j0i30i19.j-1kDXNKZhI#imgrc=L0Lvbn1rplYaEM:
I think you have to reshape to long to make this work, but I don't see why this is a problem. I think this code achieves what you want. Note that there are 13 plots because you have 13 (not 12) columns in the dataframe you posted.
# Load libraries
library(reshape2)
library(ggplot2)
Plots1$ID <- rownames(Plots1) # Add an ID variable
Plots2 <- melt(Plots1, id.vars = "ID") # melt to long format, keeping ID as the identifier
ggplot(Plots2, aes(y = value, x = ID)) + geom_bar(stat = "identity") + facet_wrap(~variable)
Below is the resulting plot. I've kept it basic, but of course you can make it pretty by adding further layers.
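If you would rather not reshape, base R can draw one bar plot per column directly from the wide data frame. A minimal sketch using only the first two columns of Plots1 for brevity; wide and mids are illustrative names, and the mfrow grid would need to be larger (for example c(4, 4)) to hold all 13 columns:

```r
# First two columns of Plots1, kept in wide format
wide <- data.frame(
  `0-0.5` = c(26.952381, 5.455598, 28.32947),
  `0.5-1` = c(29.798635, 25.972696, 32.87372),
  check.names = FALSE,
  row.names = c("Phase_1_Mean", "Phase_2_Mean", "Phase_3_Mean")
)

op <- par(mfrow = c(1, 2))   # 1 x 2 grid of panels on one page
mids <- lapply(names(wide), function(nm) {
  # One panel per column, one bar per phase
  barplot(setNames(wide[[nm]], rownames(wide)),
          main = nm, ylab = "Value", las = 2)
})
par(op)                      # restore the previous layout
```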
I have a grouped bar graph and I am trying to customize the colors for each of the variables or columns. I was able to do it for the first, but when I tried that with the second, a third bar would appear.
What would be the best method to go about this?
My code is below:
SpendOpt <- plot_ly(
x= TV_Attribute_Solver$Channel,
y= TV_Attribute_Solver$Current.spend,
name="Current Spend",
type = "bar",
marker = list(color = "#33aFFF"))
SpendOpt <- add_trace(
SpendOpt,
x=TV_Attribute_Solver$Channel,
y=TV_Attribute_Solver$Optimized.Spend,
name = "Optimized Spend",
type = "bar"
)
the data would be:
> dput(data)
structure(list(Channel = c("13th Street", "7 TWO Sydney", "7MATE Sydney",
"Arena", "ATN-7 Sydney", "BBC Knowledge"), Current.spend = c(2782L,
2075L, 990L, 1194L, 32534L, 356L), Optimized.Spend = c(3060.2,
2282.5, 891, 1313.4, 33410.127, 391.344)), .Names = c("Channel",
"Current.spend", "Optimized.Spend"), class = "data.frame", row.names = c(NA,
-6L))
> data
Channel Current.spend Optimized.Spend
1 13th Street 2782 3060.200
2 7 TWO Sydney 2075 2282.500
3 7MATE Sydney 990 891.000
4 Arena 1194 1313.400
5 ATN-7 Sydney 32534 33410.127
6 BBC Knowledge 356 391.344
Also is there a way to create more space between the two groupings?
thank you
I figured I'd just go ahead and add an answer. This is the first time I've ever used plotly, so I'm not really sure how the code you have works. That said, if I understand the question correctly, you want the colors to depend on whether the bar shows Optimized.Spend or Current.spend. I just extended what you did above to the second trace:
SpendOpt <- plot_ly(
x= data$Channel,
y= data$Current.spend,
name="Current Spend",
type = "bar",
marker = list(color = "#33aFFF")
)
SpendOpt <- add_trace(
SpendOpt,
x=data$Channel,
y=data$Optimized.Spend,
name = "Optimized Spend",
type = "bar",
marker = list(color = "#afafaf")
)
SpendOpt
I get this:
If that's what you expected, perhaps something else is going on with your setup. I don't see the mentioned "third bar." You might want to try ls() to see what variables you have created or double check something else.
If this isn't what you were trying to achieve, feel free to comment/clarify. This goes back to my suggestion in the comments that reproducible examples and visuals speak so much louder than words: take a screenshot of what you get and mark it up with labels or hand-drawn corrections in the editor of your choice, for example.
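On the follow-up about spacing between the groups: plotly's layout() accepts bargap (space between groups) and bargroupgap (space between bars within a group), each a fraction between 0 and 1. A sketch with the data from the question; the gap values are arbitrary, and I have not compared the result against your setup:

```r
library(plotly)

# Data from the question
data <- data.frame(
  Channel = c("13th Street", "7 TWO Sydney", "7MATE Sydney",
              "Arena", "ATN-7 Sydney", "BBC Knowledge"),
  Current.spend = c(2782, 2075, 990, 1194, 32534, 356),
  Optimized.Spend = c(3060.2, 2282.5, 891, 1313.4, 33410.127, 391.344)
)

SpendOpt <- plot_ly(data, x = ~Channel, y = ~Current.spend,
                    name = "Current Spend", type = "bar",
                    marker = list(color = "#33aFFF"))
SpendOpt <- add_trace(SpendOpt, y = ~Optimized.Spend,
                      name = "Optimized Spend", type = "bar",
                      marker = list(color = "#afafaf"))

# Widen the gap between channels and add a small gap inside each pair
SpendOpt <- layout(SpendOpt, barmode = "group",
                   bargap = 0.4, bargroupgap = 0.1)
SpendOpt
```

Larger bargap values push the channel groups further apart at the cost of thinner bars.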