Excel allows you to switch rows and columns in its Chart functionality.
I am trying to replicate this in R. My data (shown below) has production for each company in rows. I am unable to figure out how to show Month-1, Month-2, etc. on the x-axis, with a series for each company in the same graph. Any help appreciated.
Data:
tibble::tribble(
  # non-syntactic column names such as Month-1 must be wrapped in backticks
  ~Company.Name, ~`Month-1`, ~`Month-2`, ~`Month-3`, ~`Month-4`,
  "Comp-1", 945.5438986, 1081.417009, 976.7388701, 864.309703,
  "Comp-2", 16448.87, 13913.19, 12005.28, 10605.32,
  "Comp-3", 346.9689321, 398.2297592, 549.1282647, 550.4207169,
  "Comp-4", 748.8806367, 949.463941, 1018.877481, 932.3773791
)
I'm going to skip the part where you want to transpose, and infer that your purpose for that was solely to help with plotting. The part I'm focusing on here is "show the Month-1, Month-2 etc in x-axis, and the series for each company in the same graph".
This is doable in base graphics, but I highly recommend using ggplot2 (or plotly or similar), due to its ease of dealing with dimensional plots like this. The "grammar of graphics" (which both tend to implement) really prefers data like this be in a "long" format, so part of what I'll do is convert to this format.
First, some data:
set.seed(2)
months <- paste0("Month", 1:30)
companies <- paste0("Comp", 1:5)
m <- matrix(abs(rnorm(length(months)*length(companies), sd=1e3)),
nrow = length(companies))
d <- cbind.data.frame(
Company = companies,
m,
stringsAsFactors = FALSE
)
colnames(d)[-1] <- months
str(d)
# 'data.frame': 5 obs. of 31 variables:
# $ Company: chr "Comp1" "Comp2" "Comp3" "Comp4" ...
# $ Month1 : num 896.9 184.8 1587.8 1130.4 80.3
# $ Month2 : num 132 708 240 1984 139
# $ Month3 : num 418 982 393 1040 1782
# $ Month4 : num 2311.1 878.6 35.8 1012.8 432.3
# (truncated)
Reshaping can be done with multiple libraries, including base R. Here are two techniques:
library(data.table)
d2 <- melt(as.data.table(d), id = 1, variable.name = "Month", value.name = "Cost")
d2[,Month := as.integer(gsub("[^0-9]", "", Month)),]
d2
# Company Month Cost
# 1: Comp1 1 896.91455
# 2: Comp2 1 184.84918
# 3: Comp3 1 1587.84533
# 4: Comp4 1 1130.37567
# 5: Comp5 1 80.25176
# ---
# 146: Comp1 30 653.67306
# 147: Comp2 30 657.10598
# 148: Comp3 30 549.90924
# 149: Comp4 30 806.72936
# 150: Comp5 30 997.37972
library(dplyr)
# library(tidyr)
d2 <- as_tibble(d) %>% # as_tibble() replaces the deprecated tbl_df()
tidyr::gather(Month, Cost, -Company) %>%
mutate(Month = as.integer(gsub("[^0-9]", "", Month)))
I also converted Month to an integer, since that makes sense for an ordinal variable. This isn't strictly necessary; without it, the plot would just treat the months as discrete values.
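Base R is mentioned above as an option but not shown, so here is a minimal sketch of the same wide-to-long reshape without extra packages. The toy data frame `wide` and its values are made up for illustration; the column names mirror the example data above.

```r
# Toy wide data (hypothetical values): one row per company, one column per month
wide <- data.frame(
  Company = c("Comp1", "Comp2"),
  Month1 = c(10, 20),
  Month2 = c(30, 40),
  Month3 = c(50, 60),
  stringsAsFactors = FALSE
)

# Every column except Company holds one month of data
month_cols <- setdiff(names(wide), "Company")

# Stack the month columns on top of each other (column by column)
long <- data.frame(
  Company = rep(wide$Company, times = length(month_cols)),
  Month = rep(as.integer(gsub("[^0-9]", "", month_cols)), each = nrow(wide)),
  Cost = unlist(wide[month_cols], use.names = FALSE),
  stringsAsFactors = FALSE
)
long
```

The resulting `long` has the same Company / Month / Cost layout as `d2`, so it should work with the same ggplot call.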
The plot is anti-climactically simple:
library(ggplot2)
ggplot(d2, aes(Month, Cost, group=Company)) +
geom_line(aes(color = Company))
Bottom line: I don't think you need to worry about transposing your data: doing so has many complications that can just confuse things. Reshaping is a good thing (in my opinion), and with this kind of data it is fast enough that if your data is stored in the wide format, you can re-transform it without too much difficulty. (If you are thinking about putting this in a database, however, I'd strongly recommend you re-think "wide"; your db schema will be challenging if you keep it.)
I have a survey file in which rows are observations and columns are questions.
Here is some fake data showing what it looks like:
People,Food,Music,People
P1,Very Bad,Bad,Good
P2,Good,Good,Very Bad
P3,Good,Bad,Good
P4,Good,Very Bad,Very Good
P5,Bad,Good,Very Good
P6,Bad,Good,Very Good
My aim is to create this kind of plot with ggplot2.
I absolutely don't care about the colors, design, etc.
The plot doesn't correspond to the fake data
Here are my fake data:
raw <- read.csv("http://pastebin.com/raw.php?i=L8cEKcxS",sep=",")
raw[,2]<-factor(raw[,2],levels=c("Very Bad","Bad","Good","Very Good"),ordered=FALSE)
raw[,3]<-factor(raw[,3],levels=c("Very Bad","Bad","Good","Very Good"),ordered=FALSE)
raw[,4]<-factor(raw[,4],levels=c("Very Bad","Bad","Good","Very Good"),ordered=FALSE)
But if I choose Y as count, then I'm facing an issue about choosing the X and the Group values... I don't know if I can succeed without using reshape2... I've also tried to use reshape with the melt function. But I don't understand how to use it...
EDIT: Many years later
For a pure ggplot2 + utils::stack() solution, see the answer by @markus!
A somewhat verbose tidyverse solution, with all non-base packages explicitly stated so that you know where each function comes from:
library(magrittr) # needed for %>% if dplyr is not attached
"http://pastebin.com/raw.php?i=L8cEKcxS" %>%
utils::read.csv(sep = ",") %>%
tidyr::pivot_longer(cols = c(Food, Music, People.1),
names_to = "variable",
values_to = "value") %>%
dplyr::group_by(variable, value) %>%
dplyr::summarise(n = dplyr::n()) %>%
dplyr::mutate(value = factor(
value,
levels = c("Very Bad", "Bad", "Good", "Very Good"))
) %>%
ggplot2::ggplot(ggplot2::aes(variable, n)) +
ggplot2::geom_bar(ggplot2::aes(fill = value),
position = "dodge",
stat = "identity")
The original answer:
First you need to get the counts for each category, i.e. how many Bads and Goods and so on are there for each group (Food, Music, People). This would be done like so:
raw <- read.csv("http://pastebin.com/raw.php?i=L8cEKcxS",sep=",")
raw[,2]<-factor(raw[,2],levels=c("Very Bad","Bad","Good","Very Good"),ordered=FALSE)
raw[,3]<-factor(raw[,3],levels=c("Very Bad","Bad","Good","Very Good"),ordered=FALSE)
raw[,4]<-factor(raw[,4],levels=c("Very Bad","Bad","Good","Very Good"),ordered=FALSE)
raw=raw[,c(2,3,4)] # getting rid of the "people" variable as I see no use for it
freq=table(col(raw), as.matrix(raw)) # get the counts of each factor level
Then you need to create a data frame out of it, melt it and plot it:
Names=c("Food","Music","People") # create list of names
data=data.frame(cbind(freq),Names) # combine them into a data frame
data=data[,c(5,3,1,2,4)] # sort columns
# melt the data frame for plotting (melt() comes from reshape2)
library(reshape2)
data.m <- melt(data, id.vars='Names')
# plot everything
library(ggplot2)
ggplot(data.m, aes(Names, value)) +
geom_bar(aes(fill = variable), position = "dodge", stat="identity")
Is this what you're after?
To clarify a little bit, in ggplot multiple grouping bar you had a data frame that looked like this:
> head(df)
ID Type Annee X1PCE X2PCE X3PCE X4PCE X5PCE X6PCE
1 1 A 1980 450 338 154 36 13 9
2 2 A 2000 288 407 212 54 16 23
3 3 A 2020 196 434 246 68 19 36
4 4 B 1980 111 326 441 90 21 11
5 5 B 2000 63 298 443 133 42 21
6 6 B 2020 36 257 462 162 55 30
Since you have numerical values in columns 4-9, which would later be plotted on the y axis, this can be easily transformed with reshape and plotted.
For our current data set, we needed something similar, so we used freq=table(col(raw), as.matrix(raw)) to get this:
> data
Names Very.Bad Bad Good Very.Good
1 Food 7 6 5 2
2 Music 5 5 7 3
3 People 6 3 7 4
Just imagine you have Very.Bad, Bad, Good and so on instead of X1PCE, X2PCE, X3PCE. See the similarity? But we needed to create such structure first. Hence the freq=table(col(raw), as.matrix(raw)).
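To make the table(col(raw), as.matrix(raw)) step less opaque, here is a small sketch on a made-up data frame (`toy` and its values are hypothetical). col() gives the column index of every cell, so cross-tabulating it against the cell values counts each answer per column.

```r
# Hypothetical two-question survey with three respondents
toy <- data.frame(
  Food = c("Bad", "Good", "Good"),
  Music = c("Good", "Good", "Bad"),
  stringsAsFactors = FALSE
)

# col(toy) is a matrix of column indices (1 for Food, 2 for Music);
# as.matrix(toy) holds the answers in the same positions
freq <- table(col(toy), as.matrix(toy))
freq
# Row "1" holds the counts for Food, row "2" the counts for Music.
# Columns are in alphabetical order because as.matrix() yields plain strings.
```

This is why the answer re-sorts the columns afterwards: any factor level order set on the original columns is lost in as.matrix().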
In @jakub's answer the calculations are done before the data is passed to ggplot(), which is why the stat in geom_bar is set to "identity" (i.e. take the data as is and do nothing with it).
Another approach is to let ggplot do the counting for you, hence we can make use of stat = "count", the default of geom_bar:
library(ggplot2)
ggplot(stack(df1[, -1]), aes(ind, fill = values)) +
geom_bar(position = "dodge")
data
df1 <- read.csv(text = "People,Food,Music,People
P1,Very Bad,Bad,Good
P2,Good,Good,Very Bad
P3,Good,Bad,Good
P4,Good,Very Bad,Very Good
P5,Bad,Good,Very Good
P6,Bad,Good,Very Good
P7,Bad,Very Bad,Good
P8,Very Good,Very Bad,Good
P9,Very Bad,Good,Bad
P10,Bad,Good,Very Bad
P11,Good,Bad,Very Bad
P12,Very Bad,Bad,Very Good
P13,Bad,Very Good,Bad
P14,Bad,Very Good,Very Bad
P15,Good,Good,Good
P16,Very Bad,Very Good,Very Bad
P17,Very Bad,Good,Good
P18,Very Bad,Very Bad,Bad
P19,Very Good,Very Bad,Very Bad
P20,Very Bad,Bad,Good", header = TRUE)
I would like to plot a shapefile loaded using read.shp from the fastshp package. However, the read.shp function returns a list of lists and not a data.frame. I'm unsure which part of the list I need to extract to obtain a correctly formatted data.frame object. This exact question has been asked on Stack Overflow already; however, the solution no longer seems to work (it is more than 7 years old). Any help is much appreciated.
remotes::install_github("s-u/fastshp") #fastshp not on CRAN
library(ggplot2);library(fastshp)
temp <- tempfile()
temp2 <- tempfile()
download.file("https://www2.census.gov/geo/tiger/TIGER2017/COUNTY/tl_2017_us_county.zip",temp)
unzip(zipfile = temp, exdir = temp2)
shp <- list.files(temp2, pattern = ".shp$",full.names=TRUE) %>% read.shp(.)
shp is a list of lists containing a plethora of information. I tried the following solution from the SO posted earlier, but to no avail:
shp.list <- sapply(shp, FUN = function(x) Polygon(cbind(lon = x$x, lat = x$y))) #throws an error here cbind(lon = x$x, lat = x$y) returns NULL
shp.poly <- Polygons(shp.list, "area")
shp.df <- fortify(shp.poly, region = "area")
I also tried the following:
shp.list <- sapply(shp, FUN = function(x) do.call(cbind, x[c("id","x","y")])) #returns NULL value here...
shp.df <- as.data.frame(do.call(rbind, shp.list))
Updated: Still no luck but closer:
file_shp<-list.files(temp2, pattern = ".shp$",full.names=TRUE) %>%
read.shp(., format = c("table"))
ggplot() +
geom_polygon(data = file_shp, aes(x = x, y = y, group = part),
colour = "black", fill = NA)
Looks like the projection is off. I'm not sure how to order the data to map correctly, also not sure how to read in the CRS data. Tried the following to no avail:
file_prj<-list.files(temp2, pattern = ".prj$",full.names=TRUE) %>%
proj4string(.)
I tried to use the census data you have in your script. However, RStudio somehow kept crashing when I applied read.shp() to the polygon data. Hence, I decided to use the example from the help page of read.shp(), which is also census data. I hope you do not mind. It took some time to figure out how to draw a map with class shp. Let me explain what I went through step by step.
This part is from the help page. I am basically getting the shapefile and importing it as a shp object.
# Census 2010 TIGER/Line(TM) state shapefile
library(fastshp)
fn <- system.file("shp", "tl_2010_us_state10.shp.xz", package="fastshp")
s <- read.shp(xzfile(fn, "rb"))
Let's check what this object, s, looks like. It contains 52 lists, and each list holds six elements. id is a unique integer representing a state, x is longitude, and y is latitude. The nasty part was parts. In the example below there is only one number, which means this state has only one polygon. But some other lists (states) have multiple numbers. These numbers are indices that indicate where each new polygon begins in the data.
#> str(s)
#List of 52
# $ :List of 6
# ..$ id : int 1
# ..$ type : int 5
# ..$ box : num [1:4] -111 41 -104 45
# ..$ parts: int 0
# ..$ x : num [1:9145] -109 -109 -109 -109 -109 ...
# ..$ y : num [1:9145] 45 45 45 45 45 ...
Here is the one for Alaska. As you can see, there are several numbers in parts. These numbers indicate where each new polygon begins. Alaska has many small islands, so the data needs this information to tell the polygons apart. We will come back to this later when we create data frames.
#List of 6
# $ id : int 18
# $ type : int 5
# $ box : num [1:4] -179.2 51.2 179.9 71.4
# $ parts: int [1:50] 0 52 88 127 175 207 244 306 341 375 ...
# $ x : num [1:14033] 177 177 177 177 177 ...
# $ y : num [1:14033] 52.1 52.1 52.1 52.1 52.1 ...
What we need is the following. For each list, we need to extract longitude (i.e., x), latitude (i.e., y), and id in order to create a data frame for one state. In addition, we need to use parts so that we can label every polygon with a unique ID. We need to create a new group variable that contains a unique ID value for each polygon. I used findInterval(), which takes the indices and creates a group variable. One tricky part was that we need left.open = TRUE in findInterval() for the groups to come out right. (It took me a while to figure out what was going on.) The map_dfr() part below handles the job I just described.
library(tidyverse)
map_dfr(.x = s,
.f = function(mylist){
temp <- data.frame(id = mylist$id,
lon = mylist$x,
lat = mylist$y)
ind <- mylist$parts
out <- mutate(temp,
subgroup = findInterval(x = 1:n(), vec = ind, left.open = TRUE),
group = paste(id, subgroup, sep = "_"))
return(out)
}) -> test
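To see why left.open = TRUE matters here, consider a small sketch with made-up offsets (`parts` and `n_points` are hypothetical, not taken from the shapefile). Since parts holds 0-based start offsets, a value of 3 means the second polygon starts at the 4th point in R's 1-based indexing.

```r
parts <- c(0, 3, 5)  # hypothetical 0-based offsets: polygons start at points 1, 4 and 6
n_points <- 7        # hypothetical total number of points

# Intervals are (0, 3], (3, 5], (5, Inf): point 3 still belongs to polygon 1
with_left_open <- findInterval(seq_len(n_points), parts, left.open = TRUE)
with_left_open
# 1 1 1 2 2 3 3

# Default intervals are [0, 3), [3, 5), [5, Inf): point 3 is wrongly pushed into polygon 2
default_rule <- findInterval(seq_len(n_points), parts)
default_rule
# 1 1 2 2 3 3 3
```

With the default rule, the last point of each polygon would be assigned to the next one, which is presumably what produces stray lines in the plot.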
Once we have test, there is one more job. Some longitude values for Alaska are positive (e.g., 179.85). As long as we have numbers like this, ggplot2 draws funny long lines, which you can see even in your example. What we need is to convert these positive numbers to negative ones so that ggplot2 can draw a proper map.
mutate(test,
lon = if_else(lon > 0, lon * -1, lon)) -> out
By this time, out looks like this.
id lon lat subgroup group
1 1 -108.6213 45.00028 1 1_1
2 1 -108.6197 45.00028 1 1_1
3 1 -108.6150 45.00031 1 1_1
4 1 -108.6134 45.00032 1 1_1
5 1 -108.6133 45.00032 1 1_1
6 1 -108.6130 45.00032 1 1_1
Now we are ready to draw a map.
ggplot() +
geom_polygon(data = out, aes(x = lon, y = lat, group = group))
I would like to discretize data with zip codes into regions
I have character data
sample:
zip_code
'45654'
'12321'
'99453'
etc
I have 6 categories with rules:
region 1 - NE: 01000-19999
region 2 - SE: 20000-39999
region 3 - MW: 40000-58999,60000-69999
region 4 - SW: 70000-79999,85000-88499
region 5 - MT: 59000-59999,80000-84999,88900-89999
region 6 - PC: 90000-99999
I would like my output to be factor data:
region
'MW'
'NE'
'PC'
etc
Obviously, I know many ways to discretize the data, but none are clean and elegant (like loops, ifelse, etc)
Is there an elegant way to apply a case with 6 categories to discretize this data?
Okay, messy, but this can work. I assume you'll have to use character objects since some zip codes start with 0. Note: replace these numbers with your zip codes.
zip_code <- c('1','6','15')
regions <- list(NE = as.character(1:3),
SE = as.character(4:6),
MW = as.character(7:9),
SW = as.character(10:12),
MT = as.character(13:15),
PC = as.character(16:19))
sapply(zip_code, function(x) names(regions[sapply(regions, function(y) x %in% y)]))
1 6 15
"NE" "SE" "MT"
Here is a data.table solution using foverlaps(...) and the full US zip code database in package zipcode for the example. Note that your definitions of the ranges are deficient: for instance there are zip codes in NH that are outside the NE range, and PR is completely missing.
library(data.table) # 1.9.4+
library(zipcode)
data(zipcode) # database of US zip codes (a data frame)
zips <- data.table(zip_code=zipcode$zip)
regions <- data.table(region=c("NE" , "SE", "MW", "MW", "SW", "SW", "MT", "MT", "MT", "PC"),
start =c(01000,20000,40000,60000,70000,85000,59000,80000,88900,90000),
end =c(19999,39999,58999,69999,79999,88499,59999,84999,89999,99999)) # SW ends at 88499, as in the question
setkey(regions,start,end)
zips[,c("start","end"):=list(as.integer(zip_code),as.integer(zip_code))]
result <- foverlaps(zips,regions)[,list(zip_code,region)]
result[sample(1:nrow(result),10)] # random sample of the result
# zip_code region
# 1: 27113 SE
# 2: 36101 SE
# 3: 55554 MW
# 4: 91801 PC
# 5: 20599 SE
# 6: 90250 PC
# 7: 95329 PC
# 8: 63435 MW
# 9: 60803 MW
# 10: 07040 NE
foverlaps(...) works this way: suppose a data.table x has columns a and b that represent a range (e.g., a <= b for all rows), and a data.table y has columns c and d that similarly represent a range. Then foverlaps(x,y) finds, for each row in x, all the rows in y which have overlapping ranges.
In your case we set up the y argument as the regions, where the ranges are the beginning and ending zipcodes for each (sub) region. Then we set up x as the original zip code database using the actual zip codes (converted to integer) for both the beginning and end of the range.
foverlaps(...) is extremely fast. In this case the full US zip code database (>44,000 zipcodes) was processed in about 23 milliseconds.
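If data.table is not available, a similar range lookup can be sketched in base R with findInterval(). This is a different technique from foverlaps(), shown only as an assumption-laden alternative: the helper name `zip_to_region` and the `ranges` table are hypothetical, with the boundaries copied from the question.

```r
# Range table from the question, one row per contiguous range, sorted by start
ranges <- data.frame(
  region = c("NE", "SE", "MW", "MT", "MW", "SW", "MT", "SW", "MT", "PC"),
  start  = c(1000, 20000, 40000, 59000, 60000, 70000, 80000, 85000, 88900, 90000),
  end    = c(19999, 39999, 58999, 59999, 69999, 79999, 84999, 88499, 89999, 99999),
  stringsAsFactors = FALSE
)

# Hypothetical helper: map character zip codes to a region factor
zip_to_region <- function(zip_code, ranges) {
  z <- as.integer(zip_code)                 # "07040" becomes 7040
  idx <- findInterval(z, ranges$start)      # last range starting at or before z (0 if none)
  idx[idx == 0] <- NA
  hit <- !is.na(idx) & z <= ranges$end[idx] # z must also fall before that range's end
  out <- rep(NA_character_, length(z))
  out[hit] <- ranges$region[idx[hit]]
  factor(out, levels = c("NE", "SE", "MW", "SW", "MT", "PC"))
}

zip_to_region(c("45654", "12321", "99453"), ranges)
```

Zip codes that fall in a gap between ranges (for example 88500 to 88899, or anything below 01000) come back as NA rather than being forced into a region.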
You could also try (using @Scott Chamberlain's data)
with(stack(regions), unique(ind[ave(values %in% zip_code, ind, FUN=I)]))
#[1] NE SE MT
#Levels: MT MW NE PC SE SW
Or
library(dplyr)
library(tidyr)
unnest(regions, region) %>%
group_by(region) %>%
filter(x %in% zip_code)
# region x
#1 NE 1
#2 SE 6
#3 MT 15
Or
r1 <- vapply(regions, function(x) any(x %in% zip_code), logical(1))
names(r1)[r1]
#[1] "NE" "SE" "MT"
Is there a clean/automatic way to convert CSV values formatted as percents (with a trailing % symbol) in R?
Here is some example data:
actual,simulated,percent error
2.1496,8.6066,-300%
0.9170,8.0266,-775%
7.9406,0.2152,97%
4.9637,3.5237,29%
Which can be read using:
junk = read.csv("Example.csv")
But all of the % columns are read as strings and converted to factors:
> str(junk)
'data.frame': 4 obs. of 3 variables:
$ actual : num 2.15 0.917 7.941 4.964
$ simulated : num 8.607 8.027 0.215 3.524
$ percent.error: Factor w/ 4 levels "-300%","-775%",..: 1 2 4 3
but I would like them to be numeric values.
Is there an additional parameter for read.csv? Is there a way to easily post process the needed columns to convert to numeric values? Other solutions?
Note: of course in this example I could simply recompute the values, but in my real application with a larger data file this is not practical.
There is no "percentage" type in R. So you need to do some post-processing:
DF <- read.table(text="actual,simulated,percent error
2.1496,8.6066,-300%
0.9170,8.0266,-775%
7.9406,0.2152,97%
4.9637,3.5237,29%", sep=",", header=TRUE)
DF[,3] <- as.numeric(gsub("%", "",DF[,3]))/100
# actual simulated percent.error
#1 2.1496 8.6066 -3.00
#2 0.9170 8.0266 -7.75
#3 7.9406 0.2152 0.97
#4 4.9637 3.5237 0.29
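For a larger file with several percent columns, the same post-processing could be wrapped in a helper that detects those columns automatically. This is only a sketch: `percent_to_numeric` and `convert_percent_columns` are hypothetical names, and the detection rule (every non-missing value looks like a number followed by %) is an assumption about the data.

```r
# Strip a trailing % and rescale to a proportion
percent_to_numeric <- function(x) {
  as.numeric(sub("%$", "", trimws(as.character(x)))) / 100
}

# Convert every column whose non-missing values all look like "12.5%" or "-300%"
convert_percent_columns <- function(df) {
  is_pct <- vapply(df, function(col) {
    vals <- trimws(as.character(col))
    vals <- vals[!is.na(vals)]
    length(vals) > 0 && all(grepl("^-?[0-9]*\\.?[0-9]+%$", vals))
  }, logical(1))
  for (nm in names(df)[is_pct]) {
    df[[nm]] <- percent_to_numeric(df[[nm]])
  }
  df
}

junk <- read.csv(text = "actual,simulated,percent error
2.1496,8.6066,-300%
0.9170,8.0266,-775%
7.9406,0.2152,97%
4.9637,3.5237,29%")
junk2 <- convert_percent_columns(junk)
str(junk2)
```

Going through as.character() means the helper should behave the same whether read.csv produced strings or factors.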
This is the same as Roland's solution except that it uses the stringr package. I'd recommend stringr when working with strings, as its interface is more intuitive.
library(stringr)
d <- str_replace(junk$percent.error, pattern="%", "")
junk$percent.error <- as.numeric(d)/100
With data.table you can achieve it as
a <- fread("file.csv")[,`percent error` := as.numeric(sub('%', '', `percent error`))/100]
Tidyverse has multiple ways of solving such issues. You can use readr's col_number() column specification (the column counterpart of parse_number()), which strips any symbols, text, etc. from around a number. Note that the values stay on the percent scale (e.g. -300), so divide by 100 if you need proportions:
library(readr)
sample_data = "actual,simulated,percent error\n 2.1496,8.6066,-300%\n 0.9170,8.0266,-775%\n7.9406,0.2152,97%\n4.9637,3.5237,29%"
DF <- read_csv(sample_data,col_types = cols(`percent error`= col_number()))
# A tibble: 4 x 3
# actual simulated `percent error`
# <chr> <dbl> <dbl>
# 1 2.1496 8.61 -300
# 2 + 0.9170 8.03 -775
# 3 + 7.9406 0.215 97.0
# 4 + 4.9637 3.52 29.0
I currently have a shapefile of the UK and have plotted the population of species in different regions of the UK. So far I have just plotted 3 levels of species population and coloured them red=high, orange=med, green=low. But what I would like to do is have a gradient plot instead of being bounded by just 3 colours.
So far I have a table called Count that has the regions as the column names and then the count of species for each region below. My lowest count being 0 and my highest being around 2500 and the regions in Count match with the regions in my shapefile. I have a function that determines what is high, med, low based on levels you input yourself
High<-colnames(Count)[which(Count>'input value here')]
and then these are plotted onto the shapefile like this:
plot(ukmap[(ukmap$Region %in% High),],col='red',add=T)
Unfortunately I can't really install any packages. I was thinking of using colorRamp, but I'm not really sure what to do.
EDIT: my data looks something like this
Wales Midlands North Scotland South East South West
1 551 32 124 1 49 28
3 23 99 291 152 164 107
4 1 7 17 11 21 14
7 192 32 12 0 1 9
9 98 97 5 1 21 0
and the first column is just a number that represents the species and currently I have a function that plots the count onto a UK shapefile but based on boundaries of high, med and low. The data above is not attached to my shapefile. I then loop through for each line (species) of my data set and plot a new map for each line (species).
All right, I'll bite. I'm not going to use base R because plot is too hard for me to understand, so instead we will be using ggplot2.
# UK shapefile found via http://www.gadm.org/download
uk.url <- "http://www.filefactory.com/file/s3dz3jt3vr/n/GBR_adm_zip"
# replace following with your working directory - no trailing slash
work.dir <- "C:/Temp/r.temp/gb_map"
# the full file path for storing file
file.loc <- paste0(work.dir, "/uk.zip")
download.file (uk.url, destfile = file.loc, mode = "wb")
unzip(file.loc, exdir = work.dir)
# open the shapefile
require(rgdal)
require(ggplot2)
uk <- readOGR(work.dir, layer = "GBR_adm2")
# use the NAME_2 field (representing counties) to create data frame
uk.map <- fortify(uk, region = "NAME_2")
# create fake count data...
uk.map$count <- round(runif(nrow(uk.map), 0, 2500), 0)
# quick visual check
ggplot(uk.map, aes(x = long, y = lat, group = group, fill = count)) +
geom_polygon(colour = "black", size = 0.5, aes(group = group)) +
theme()
This generates the output below, which may be similar to what you need.
Note that we don't explicitly specify the gradient in this case - we just leave it up to ggplot. If you wish to specify those details it is possible but more involved. If you go down that route you should create another column in uk.map to allocate each count into one of (say) 10 bins using the cut function. The uk.map data frame looks like this:
> str(uk.map)
'data.frame': 427339 obs. of 8 variables:
$ long : num -2.05 -2.05 -2.05 -2.05 -2.05 ...
$ lat : num 57.2 57.2 57.2 57.2 57.2 ...
$ order: int 1 2 3 4 5 6 7 8 9 10 ...
$ hole : logi FALSE FALSE FALSE FALSE FALSE FALSE ...
$ piece: Factor w/ 234 levels "1","2","3","4",..: 1 1 1 1 1 1 1 1 1 1 ...
$ group: Factor w/ 1136 levels "Aberdeen.1","Aberdeenshire.1",..: 1 1 1 1 1 1 1 1 1 1 ...
$ id : chr "Aberdeen" "Aberdeen" "Aberdeen" "Aberdeen" ...
$ count: num 1549 1375 433 427 1282 ...
>
OK, here is an alternative solution that doesn't use ggplot (I will leave the ggplot solution for reference). This code is simple but it should be enough to give you some ideas as to how you can adapt it to your own data.
# UK shapefile found via http://www.gadm.org/download
uk.url <- "http://www.filefactory.com/file/s3dz3jt3vr/n/GBR_adm_zip"
# replace following with your working directory - no trailing slash
work.dir <- "C:/Temp/r.temp/gb_map"
# the full file path for storing file
file.loc <- paste0(work.dir, "/uk.zip")
download.file (uk.url, destfile = file.loc, mode = "wb")
unzip(file.loc, exdir = work.dir)
# open the shapefile
require(rgdal)
uk <- readOGR(work.dir, layer = "GBR_adm2")
# make some fake data to plot
uk@data$count <- round(runif(nrow(uk@data), 0, 2500), 0)
uk@data$count <- as.numeric(uk@data$count)
# and plot it
plot(uk, col = gray(uk@data$count/2500))
The result of the code is the following plot.
EDIT following a request to include a legend, I have tweaked the code a little but in all honesty I don't understand base R's legend function well enough to get something of production quality and I have no wish to research it further. (Incidentally hat tip to this question for ideas.) A look at the plot beneath the code suggests that we need to reorder the legend colours etc but I will leave that to the original poster as an exercise or to post as another question.
# UK shapefile found via http://www.gadm.org/download
uk.url <- "http://www.filefactory.com/file/s3dz3jt3vr/n/GBR_adm_zip"
# replace following with your working directory - no trailing slash
work.dir <- "C:/Temp/r.temp/gb_map"
# the full file path for storing file
file.loc <- paste0(work.dir, "/uk.zip")
download.file (uk.url, destfile = file.loc, mode = "wb")
unzip(file.loc, exdir = work.dir)
# open the shapefile
require(rgdal)
uk <- readOGR(work.dir, layer = "GBR_adm2")
# make some fake data to plot
uk@data$count <- as.numeric(round(runif(nrow(uk@data), 0, 2500), 0))
uk@data$bin <- cut(uk@data$count, seq(0, 2500, by = 250),
include.lowest = TRUE, dig.lab = 4)
# labels for the legend
lev = levels(uk@data$bin)
lev2 <- gsub("\\,", " to ", lev)
lev3 <- gsub("\\]$", "", lev2)
lev4 <- gsub("\\(|\\)", " ", lev3)
lev5 <- gsub("^\\[", " ", lev4)
my.levels <- lev5
# Create a function to generate a continuous color palette
rbPal <- colorRampPalette(c('red','blue'))
uk@data$Col <- rbPal(10)[as.numeric(cut(uk@data$count, seq(0, 2500, by = 250)))]
# Plot
plot(uk, col = uk@data$Col)
legend("topleft", fill = uk@data$Col, legend = my.levels, col = uk@data$Col)
Have you tried colorRampPalette?
Here is how you could try to build a gradient palette
gradient_color <- colorRampPalette(c("blue", "red"))
gradient_color(10)
[1] "#0000FF" "#1C00E2" "#3800C6" "#5500AA" "#71008D" "#8D0071" "#AA0055"
[8] "#C60038" "#E2001C" "#FF0000"
An example plot
plot(rep(1,10),col=gradient_color(10))
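To connect the palette to the counts in the question, one possible sketch is to rescale each count onto the palette index range. The helper name `count_to_colour`, the 100-step palette, and the fixed 0 to 2500 range are assumptions (the range is taken from the question's description); only base R functions are used.

```r
# Hypothetical helper: map counts in [lo, hi] onto an n-step green-orange-red gradient
count_to_colour <- function(x, lo = 0, hi = 2500, n = 100) {
  pal <- colorRampPalette(c("green", "orange", "red"))(n)
  idx <- 1 + round((x - lo) / (hi - lo) * (n - 1))  # rescale to 1..n
  idx <- pmin(pmax(idx, 1), n)                      # clamp values outside [lo, hi]
  pal[idx]
}

# One species row from the question's example data
counts <- c(Wales = 551, Midlands = 32, North = 124, Scotland = 1,
            `South East` = 49, `South West` = 28)
region_cols <- count_to_colour(counts)
region_cols

# Untested sketch of how this might plug into the question's objects
# (ukmap and Count are the asker's names, not defined here):
# plot(ukmap, col = count_to_colour(unlist(Count[i, ])[match(ukmap$Region, colnames(Count))]))
```

Using a fixed range rather than each row's own minimum and maximum keeps the colours comparable across the per-species maps.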