I have a gridded field that I plotted with the image function
df <- datainSUB
yr mo dy hr lon lat cell sst avg moavg
1900 6 5 17 -73.5 -60.5 83 2.4 2.15 3.15
1900 6 7 17 -74.5 -60.5 83 3.9 2.15 3.15
1900 8 17 17 -70.5 -60.5 83 -0.9 2.15 0.60
1900 8 18 17 -73.5 -60.5 83 2.1 2.15 0.60
1900 9 20 17 -71.5 -60.5 83 0.2 2.15 2.20
1900 9 21 17 -74.5 -61.5 83 1.6 2.15 2.20
gridplot <- function(df){
pdf(paste(df$mo,".pdf"))
# Compute the ordered x- and y-values
LON <- seq(-180, 180, by = space)
LAT <- seq(-90, 90, by = space)
# Build the matrix to be plotted
moavg <- matrix(NA, nrow=length(LON), ncol=length(LAT))
moavg[cbind(match(round(df$lon, -1), LON), match(round(df$lat, -1), LAT))] <- df$moavg
# Plot the image
image(LON, LAT, moavg)
map(add=T,col="saddlebrown",interior = FALSE, database="world")
dev.off()
}
I want to add a colour legend to the plot but I don't know how to do that. Maybe ggplot is better?
Many thanks
Add the following line after plotting your data:
legend(x="topright", "your legend goes here", fill="saddlebrown")
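If what you want is a colour scale for the gridded values themselves, one common base R approach is to split the device with layout() and draw the colour bar as a second, narrow image(). The sketch below is only an illustration: image_with_legend() is a hypothetical helper, the demo field is made up, and the 10-degree grid spacing is an assumption (the question does not say what `space` is).

```r
# Hypothetical helper (not from the original post): image() plus a colour bar.
image_with_legend <- function(x, y, z, cols = heat.colors(12)) {
  zlim <- range(z, na.rm = TRUE)
  # Split the device: wide panel for the field, narrow panel for the bar.
  layout(matrix(1:2, nrow = 1), widths = c(5, 1))
  op <- par(mar = c(4, 4, 2, 1))
  on.exit({ par(op); layout(1) })
  image(x, y, z, col = cols, zlim = zlim, xlab = "lon", ylab = "lat")
  # Colour bar: one cell per colour, with cell edges at `edges`.
  edges <- seq(zlim[1], zlim[2], length.out = length(cols) + 1)
  mids <- (edges[-1] + edges[-length(edges)]) / 2
  par(mar = c(4, 1, 2, 3))
  image(c(0, 1), edges, matrix(mids, nrow = 1), col = cols, zlim = zlim,
        axes = FALSE, xlab = "", ylab = "")
  axis(4, las = 1)  # value labels on the right-hand side of the bar
  box()
  invisible(edges)
}

# Made-up demo field on an assumed 10-degree grid.
LON <- seq(-180, 180, by = 10)
LAT <- seq(-90, 90, by = 10)
field <- outer(LON, LAT, function(lon, lat) 20 * cos(lat * pi / 180))

out_file <- file.path(tempdir(), "field.pdf")
pdf(out_file)
edges <- image_with_legend(LON, LAT, field)
dev.off()
```

Because the main panel and the bar share the same zlim and colour vector, each colour in the bar corresponds to the same value range in the map. The helper assumes the field is not constant; with a single distinct value the bar edges would not be increasing.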
I have a data set of UK earthquakes that I want to plot by location on a map. (Hopefully I then want to change the size to be representative of the magnitude). I have made a map of the uk using ggmap, but I am struggling to then add the points to a map.
I however keep getting 2 errors, and cannot plot my points on the map. The errors are either
- Error: Aesthetics must be either length 1 or the same as the data (990): x, y
or
- Error in FUN(X[[i]], ...) : object 'group' not found
depending on how I try to plot the points.
this is what I have so far:
table <- data.frame(long2, lat2, mag1)
table
long2 lat2 mag1
1 -2.62 52.84 1.9
2 1.94 57.03 4.2
3 -0.24 51.16 0.6
4 -2.34 53.34 0.8
5 -3.16 55.73 2.0
6 -0.24 51.16 1.0
7 -4.11 53.03 1.5
8 -0.24 51.16 0.2
9 -0.24 51.16 1.1
10 -5.70 57.08 1.6
11 -2.40 53.00 1.4
12 -1.19 53.35 1.2
13 -1.02 53.84 1.7
14 -4.24 52.62 0.8
15 -3.23 54.24 0.3
16 -2.06 52.62 1.0
17 1.63 54.96 1.7
18 -5.24 56.05 0.7
19 -5.86 55.84 1.3
20 -3.22 54.23 0.3
21 -0.24 51.16 -1.4
22 -0.24 51.16 -0.7
23 -4.01 55.92 0.3
24 -5.18 50.08 2.3
25 -1.95 54.44 1.0
library(ggplot2)
library(maps)
w <- map_data("world", region = "uk")
uk <- ggplot(data = w, aes(x = long, y = lat, group=group)) + geom_polygon(fill = "seagreen2", colour="white") + coord_map()
uk + geom_point(data=table, aes(x=long2, y=lat2, colour="red", size=2), position="jitter", alpha=I(0.5))
Is it the way I have built my map, or how I am plotting my points? And how do I fix it?
I've made three changes to your code, and one or more of them solved the problems you were having. I'm not sure exactly which, so feel free to experiment!
- I named your data pdat (point data) instead of table. table is the name of a built-in R function, and it's best to avoid using it as a variable name.
- I placed each data= expression inside the geom function that needs that data, instead of placing data= and aes() inside the initial ggplot() call. When I use two or more data.frames in a single plot, I do this defensively and find that it avoids many problems.
- I moved colour="red" and size=2 outside of the aes() function. aes() is used to create an association between a column in your data.frame and a visual attribute of the plot. Anything that's not the name of a column doesn't belong inside aes().
# Load data.
pdat <- read.table(header=TRUE,
text="long2 lat2 mag1
1 -2.62 52.84 1.9
2 1.94 57.03 4.2
3 -0.24 51.16 0.6
4 -2.34 53.34 0.8
5 -3.16 55.73 2.0
6 -0.24 51.16 1.0
7 -4.11 53.03 1.5
8 -0.24 51.16 0.2
9 -0.24 51.16 1.1
10 -5.70 57.08 1.6
11 -2.40 53.00 1.4
12 -1.19 53.35 1.2
13 -1.02 53.84 1.7
14 -4.24 52.62 0.8
15 -3.23 54.24 0.3
16 -2.06 52.62 1.0
17 1.63 54.96 1.7
18 -5.24 56.05 0.7
19 -5.86 55.84 1.3
20 -3.22 54.23 0.3
21 -0.24 51.16 -1.4
22 -0.24 51.16 -0.7
23 -4.01 55.92 0.3
24 -5.18 50.08 2.3
25 -1.95 54.44 1.0")
library(ggplot2)
library(maps)
w <- map_data("world", region = "uk")
uk <- ggplot() +
geom_polygon(data = w,
aes(x = long, y = lat, group = group),
fill = "seagreen2", colour = "white") +
coord_map() +
geom_point(data = pdat,
aes(x = long2, y = lat2),
colour = "red", size = 2,
position = "jitter", alpha = 0.5)
ggsave("map.png", plot=uk, height=4, width=6, dpi=150)
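The question also mentions wanting the point size to reflect magnitude. Following the aes() rule described above, that is a case where the attribute does belong inside aes(), because mag1 is a column. This is a minimal sketch using a few rows of the data; the legend title "Magnitude" is my own choice, and the map layers are left out to keep it short.

```r
library(ggplot2)

# A few rows from the question's data.
pdat <- data.frame(long2 = c(-2.62, 1.94, -0.24, -2.34, -3.16),
                   lat2  = c(52.84, 57.03, 51.16, 53.34, 55.73),
                   mag1  = c(1.9, 4.2, 0.6, 0.8, 2.0))

p <- ggplot() +
  geom_point(data = pdat,
             aes(x = long2, y = lat2, size = mag1),  # column: inside aes()
             colour = "red", alpha = 0.5) +          # constants: outside aes()
  labs(size = "Magnitude")
```

The same geom_point() call can take the place of the one in the map code, which keeps the polygon layer unchanged and adds a size legend automatically.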
I'm a bit new to R and want to remove a column from a matrix by the name of that column. I know that X[,2] gives the second column and X[,-2] gives every column except the second one. What I really want to know is whether there's a similar command using column names. I've got a matrix and want to remove the "sales" column, but X[,-"sales"] doesn't work. How should I do this? I would use the column number, but I want to be able to reuse the code for other matrices later, which have different dimensions. Any help would be much appreciated.
I'm not sure why all the answers are solutions for data frames and not matrices.
Per @Sotos's and @Moody_Mudskipper's comments, here is an example with the builtin state.x77 data matrix.
dat <- head(state.x77)
dat
#> Population Income Illiteracy Life Exp Murder HS Grad Frost Area
#> Alabama 3615 3624 2.1 69.05 15.1 41.3 20 50708
#> Alaska 365 6315 1.5 69.31 11.3 66.7 152 566432
#> Arizona 2212 4530 1.8 70.55 7.8 58.1 15 113417
#> Arkansas 2110 3378 1.9 70.66 10.1 39.9 65 51945
#> California 21198 5114 1.1 71.71 10.3 62.6 20 156361
#> Colorado 2541 4884 0.7 72.06 6.8 63.9 166 103766
# for removing one column
dat[, colnames(dat) != "Area"]
#> Population Income Illiteracy Life Exp Murder HS Grad Frost
#> Alabama 3615 3624 2.1 69.05 15.1 41.3 20
#> Alaska 365 6315 1.5 69.31 11.3 66.7 152
#> Arizona 2212 4530 1.8 70.55 7.8 58.1 15
#> Arkansas 2110 3378 1.9 70.66 10.1 39.9 65
#> California 21198 5114 1.1 71.71 10.3 62.6 20
#> Colorado 2541 4884 0.7 72.06 6.8 63.9 166
# for removing more than one column
dat[, !colnames(dat) %in% c("Area", "Life Exp")]
#> Population Income Illiteracy Murder HS Grad Frost
#> Alabama 3615 3624 2.1 15.1 41.3 20
#> Alaska 365 6315 1.5 11.3 66.7 152
#> Arizona 2212 4530 1.8 7.8 58.1 15
#> Arkansas 2110 3378 1.9 10.1 39.9 65
#> California 21198 5114 1.1 10.3 62.6 20
#> Colorado 2541 4884 0.7 6.8 63.9 166
#be sure to use `colnames` and not `names`
names(state.x77)
#> NULL
Created on 2020-06-27 by the reprex package (v0.3.0)
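Since the question asks for something reusable across matrices of different dimensions, the same idea can be wrapped in a small function. This is a sketch only: drop_cols() is a hypothetical name, and drop = FALSE is added so the result stays a matrix even when a single column is left.

```r
# Hypothetical helper: remove columns from a matrix by name.
drop_cols <- function(m, cols) {
  keep <- !colnames(m) %in% cols
  m[, keep, drop = FALSE]  # drop = FALSE keeps the matrix structure
}

dat <- head(state.x77)
no_area  <- drop_cols(dat, "Area")
only_inc <- drop_cols(dat, setdiff(colnames(dat), "Income"))
dim(only_inc)  # still two-dimensional: 6 rows, 1 column
```

Names that are not present in the matrix are simply ignored, which may or may not be what you want; add a check if a misspelled name should raise an error.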
My favorite way:
# create data
df <- data.frame(x = runif(100),
y = runif(100),
remove_me = runif(100),
remove_me_too = runif(100))
# remove column
df <- df[,!names(df) %in% c("remove_me", "remove_me_too")]
so this dataframe:
> df
x y remove_me remove_me_too
1 0.731124508 0.535219259 0.33209113 0.736142042
2 0.612017350 0.404128030 0.84923974 0.624543223
3 0.415403559 0.369818154 0.53817387 0.661263087
4 0.199780006 0.679946936 0.58782429 0.085624708
5 0.343304259 0.892128112 0.02827132 0.038203599
becomes this:
> df
x y
1 0.731124508 0.535219259
2 0.612017350 0.404128030
3 0.415403559 0.369818154
4 0.199780006 0.679946936
5 0.343304259 0.892128112
As always in R there are many potential solutions. You can use the package dplyr and select() to easily remove or select columns in a data frame.
df <- data.frame(x = runif(100),
y = runif(100),
remove_me = runif(100),
remove_me_too = runif(100))
library(dplyr)
select(df, -remove_me, -remove_me_too) %>% head()
#> x y
#> 1 0.35113636 0.134590652
#> 2 0.72545356 0.165608839
#> 3 0.81000067 0.090696049
#> 4 0.29882204 0.004602398
#> 5 0.93492918 0.256870750
#> 6 0.03007377 0.395614901
You can read more about dplyr and its verbs in the package documentation.
As a general case, if you remove so many columns that only one column remains, R will convert it to a numeric vector. You can prevent it by setting drop = FALSE.
(df <- data.frame(x = runif(6),
y = runif(6),
remove_me = runif(6),
remove_me_too = runif(6)))
# x y remove_me remove_me_too
# 1 0.4839869 0.18672217 0.0973506 0.72310641
# 2 0.2467426 0.37950878 0.2472324 0.80133920
# 3 0.4449471 0.58542547 0.8185943 0.57900456
# 4 0.9119014 0.12089776 0.2153147 0.05584816
# 5 0.4979701 0.04890334 0.7420666 0.44906667
# 6 0.3266374 0.37110822 0.6809380 0.29091746
df[, -c(3, 4)]
# x y
# 1 0.4839869 0.18672217
# 2 0.2467426 0.37950878
# 3 0.4449471 0.58542547
# 4 0.9119014 0.12089776
# 5 0.4979701 0.04890334
# 6 0.3266374 0.37110822
# Result is a numeric vector
df[, -c(2, 3, 4)]
# [1] 0.4839869 0.2467426 0.4449471 0.9119014 0.4979701 0.3266374
# Keep the two-dimensional (data frame) structure
df[, -c(2, 3, 4), drop = FALSE]
# x
# 1 0.4839869
# 2 0.2467426
# 3 0.4449471
# 4 0.9119014
# 5 0.4979701
# 6 0.3266374
I need to generate bins from a data.frame based on the values of one column. I have tried the function "cut".
For example: I want to create bins of air temperature values in the column "AirTDay" in a data frame:
AirTDay (oC)
8.16
10.88
5.28
19.82
23.62
13.14
28.84
32.21
17.44
31.21
I need the bin intervals to include all values in a range of 2 degrees centigrade from that initial value (i.e. 8-9.99, 10-11.99, 12-13.99...), to be labelled with the average value of the range (i.e. 9.5, 10.5, 12.5...), and to respect blank cells, returning "NA" in the bins column.
The output should look as:
Air_T (oC) TBins
8.16 8.5
10.88 10.5
5.28 NA
NA
19.82 20.5
23.62 24.5
13.14 14.5
NA
NA
28.84 28.5
32.21 32.5
17.44 18.5
31.21 32.5
I've gotten as far as:
setwd('C:/Users/xxx')
temp_data <- read.csv("temperature.csv", sep = ",", header = TRUE)
TAir <- temp_data$AirTDay
Tmin <- round(min(TAir, na.rm = FALSE), digits = 0) # is start at minimum value
Tmax <- round(max(TAir, na.rm = FALSE), digits = 0)
int <- 2 # bin ranges 2 degrees
mean_int <- int/2
int_range <- seq(Tmin, Tmax + int, int) # generate bin sequence
bin_label <- seq(Tmin + mean_int, Tmax + mean_int, int) # generate labels
temp_data$TBins <- cut(TAir, breaks = int_range, ordered_result = FALSE, labels = bin_label)
The output table looks correct, but for some reason it shows an additional sequential column, shifts the column names, and collapses all values, eliminating the blank cells. Something like this:
Air_T (oC) TBins
1 8.16 8.5
2 10.88 10.5
3 5.28 NA
4 19.82 20.5
5 23.62 24.5
6 13.14 14.5
7 28.84 28.5
8 32.21 32.5
9 17.44 18.5
10 31.21 32.5
Any ideas on where I am going wrong and how to solve it?
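# Upper limit: the maximum temperature rounded up
v <- ceiling(max(dat$V1, na.rm = TRUE))
# Bin edges every 2 degrees, starting at 8
breaks <- seq(8, v, 2)
# One label per bin (one fewer than the number of edges)
labels <- seq(8.5, length.out = length(breaks) - 1, by = 2)
transform(dat, Tbins = cut(V1, breaks, labels))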
V1 Tbins
1 8.16 8.5
2 10.88 10.5
3 5.28 <NA>
4 NA <NA>
5 19.82 18.5
6 23.62 22.5
7 13.14 12.5
8 NA <NA>
9 NA <NA>
10 28.84 28.5
11 32.21 <NA>
12 17.44 16.5
13 31.21 30.5
This result follows the logic given: we have
paste(seq(8,v,2),seq(9.99,v,by=2),sep="-")
[1] "8-9.99" "10-11.99" "12-13.99" "14-15.99" "16-17.99" "18-19.99" "20-21.99"
[8] "22-23.99" "24-25.99" "26-27.99" "28-29.99" "30-31.99"
From this we can tell that 19.82 lies between 18 and 19.99 and is therefore given the value 18.5, just as 10.88 lies between 10 and 11.99 and is assigned the value 10.5.
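As a variation, here is a sketch that extends the last bin so that 32.21 is covered, and that uses right = FALSE so each bin includes its lower edge (8-9.99 rather than just above 8 up to 10). Both choices are assumptions on my part about what the question intends; the labels follow the lower-edge-plus-0.5 convention used above, and the vector name air_t is made up.

```r
# Values from the question, with NA standing in for the blank cells.
air_t <- c(8.16, 10.88, 5.28, NA, 19.82, 23.62, 13.14, NA, NA,
           28.84, 32.21, 17.44, 31.21)

width <- 2
start <- 8
# Number of bins needed so the maximum falls strictly inside the last bin.
n_bins <- floor((max(air_t, na.rm = TRUE) - start) / width) + 1
breaks <- seq(start, by = width, length.out = n_bins + 1)
labels <- head(breaks, -1) + 0.5

# right = FALSE makes the intervals [8, 10), [10, 12), ...
bins <- cut(air_t, breaks = breaks, labels = labels, right = FALSE)

# cut() returns a factor; convert via character to get numeric labels.
t_bins <- as.numeric(as.character(bins))
data.frame(Air_T = air_t, TBins = t_bins)
```

Values below the first edge (5.28 here) and missing inputs both come out as NA, and the row count is unchanged, so blank cells stay in place.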
I'm currently plotting several datasets of one day in R. The format of the dates in the datasets is yyyymmddhh. When I plot this, the formatting fails gloriously: on the x-axis, I now have 2016060125, 2016060150, etc. and a very weirdly shaped plot. What do I have to do to create a plot with a more "normal" date notation (e.g. June 1, 12:00 or just 12:00)?
Edit: the dates of these datasets are integers
The dataset looks like this:
> event_1
date P ETpot Q T fXS GRM_SintJorisweg
1 2016060112 0.0 0.151 0.00652 19.6 0.00477 0.39250
2 2016060113 0.0 0.134 0.00673 20.8 0.00492 0.38175
3 2016060114 0.0 0.199 0.00709 22.6 0.00492 0.36375
4 2016060115 0.0 0.201 0.00765 21.2 0.00492 0.36850
5 2016060116 19.4 0.005 0.00786 19.5 0.00492 0.36900
6 2016060117 2.8 0.005 0.00824 18.1 0.00492 0.36625
7 2016060118 2.6 0.017 0.00984 18.0 0.00508 0.35975
8 2016060119 9.7 0.000 0.01333 16.7 0.00555 0.34750
9 2016060120 7.0 0.000 0.01564 16.8 0.00524 0.33550
10 2016060121 4.1 0.000 0.01859 17.1 0.00524 0.32000
11 2016060122 9.5 0.000 0.02239 17.2 0.00539 0.30250
12 2016060123 2.6 0.000 0.03330 17.5 0.00555 0.27050
13 2016060200 11.6 0.000 0.03997 17.4 0.00555 0.23800
14 2016060201 0.9 0.000 0.04928 17.3 0.00555 0.21725
15 2016060202 0.0 0.000 0.05822 17.2 0.00555 0.20350
16 2016060203 2.3 0.002 0.06547 16.4 0.00555 0.18575
17 2016060204 0.0 0.016 0.07047 16.5 0.00555 0.16950
18 2016060205 0.0 0.027 0.07506 16.7 0.00555 0.16475
19 2016060206 0.0 0.070 0.07762 18.0 0.00555 0.16525
20 2016060207 0.0 0.285 0.08006 19.5 0.00555 0.14500
21 2016060208 0.0 0.224 0.08109 20.3 0.00555 0.15875
22 2016060209 0.0 0.362 0.07850 21.3 0.00555 0.17825
23 2016060210 0.0 0.433 0.07441 22.0 0.00524 0.19175
24 2016060211 0.0 0.417 0.07380 23.9 0.00492 0.19050
I want to plot the date on the x-axis and the Q on the y-axis
Create a minimal verifiable example with your data:
date_int <- c(2016060112,2016060113,2016060114,2016060115,2016060116,2016060117,2016060118,2016060119,2016060120,2016060121,2016060122,2016060123,2016060200,2016060201,2016060202,2016060203,2016060204,2016060205,2016060206,2016060207,2016060208,2016060209,2016060210,2016060211)
Q <- c(0.00652,0.00673,0.00709,0.00765,0.00786,0.00824,0.00984,0.01333,0.01564,0.01859,0.02239,0.0333,0.03997,0.04928,0.05822,0.06547,0.07047,0.07506,0.07762,0.08006,0.08109,0.0785,0.07441,0.0738)
df <- data.frame( date_int, Q)
So, now we have a dataframe 'df'
With the dataframe 'df' you can convert your date_int column to a date format with hours and update the dataframe:
date_time <- strptime(df$date_int, format = '%Y%m%d%H', tz= "UTC")
df$date_int <- date_time
Finally,
plot(df)
You will see a nice plot!
P.S.: Please note that the format string must use the conversion specifications described in "Date and Times in R" (e.g. "%Y%m%d%H" in this case).
Ref.: https://www.stat.berkeley.edu/~s133/dates.html
Here is a lubridate answer:
library(lubridate)
event_1$date <- ymd_h(event_1$date)
or base R:
event_1$date <- as.POSIXct(as.character(event_1$date), format = "%Y%m%d%H")
What is happening is that the dates are being interpreted as plain numbers. As indicated, you need to convert them. To get the axis formatting right, you need to do a little more:
set.seed(123)
library(lubridate)
## date
x <- ymd_h(2016060112)
y <- ymd_h(2016060223)
dfx <- data.frame(
date = as.numeric(format(seq(x, y, 3600), "%Y%m%d%H")),
yvar = rnorm(36))
dfx$date_x <- ymd_h(dfx$date)
# plot 1
plot(dfx$date, dfx$yvar)
Now using date_x which is POSIXct:
#plot 2
# converted to POSIXct
class(dfx$date_x)
## [1] "POSIXct" "POSIXt"
plot(dfx$date_x, dfx$yvar)
You will need to fix your date axis to get the format you desire:
#plot 3
# using axis.POSIXct to help things
with(dfx, plot(date_x, yvar, xaxt="n"))
r <- round(range(dfx$date_x), "hours")
axis.POSIXct(1, at = seq(r[1], r[2], by = "hour"), format = "%b-%d %H:%M")
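To see why the integer dates give an oddly shaped plot, it helps to compare the spacing before and after conversion. This is a small base R sketch with four values around midnight taken from the question; the tz = "UTC" setting is an assumption.

```r
# Four consecutive hours around midnight, stored as numbers.
date_int <- c(2016060122, 2016060123, 2016060200, 2016060201)

# As numbers, the step across midnight is 77 instead of 1,
# which stretches the x-axis between 23:00 and 00:00.
diff(date_int)

# Convert via character so the format string is applied.
date_time <- as.POSIXct(as.character(date_int),
                        format = "%Y%m%d%H", tz = "UTC")

# As date-times, every step is exactly one hour.
step_hours <- as.numeric(diff(date_time), units = "hours")

# Labels of the kind asked for in the question.
format(date_time, "%H:%M")
```

The same format() call with "%B %d, %H:%M" would give labels that include the month and day.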
I would like to generate a summary of a histogram in table format. With plot=FALSE, I am able to get the histogram object.
> hist(y,plot=FALSE)
$breaks
[1] 0.2 0.4 0.6 0.8 1.0 1.2 1.4 1.6 1.8 2.0 2.2 2.4 2.6 2.8 3.0 3.2 3.4 3.6 3.8
$counts
[1] 48 1339 20454 893070 1045286 24284 518 171 148
[10] 94 42 42 37 25 18 21 14 5
$density
[1] 0.00012086929 0.00337174962 0.05150542703 2.24884871999 2.63214538964
[6] 0.06114978928 0.00130438111 0.00043059685 0.00037268032 0.00023670236
[11] 0.00010576063 0.00010576063 0.00009317008 0.00006295276 0.00004532598
[16] 0.00005288032 0.00003525354 0.00001259055
$mids
[1] 0.3 0.5 0.7 0.9 1.1 1.3 1.5 1.7 1.9 2.1 2.3 2.5 2.7 2.9 3.1 3.3 3.5 3.7
$xname
[1] "y"
$equidist
[1] TRUE
attr(,"class")
[1] "histogram"
Is there a way to summarize this object like a Pareto chart summary? (The summary below is for different data; I'm including it as an example.)
Pareto chart analysis for counts
Frequency Cum.Freq. Percentage Cum.Percent.
c 2294652 2294652 33.689225770 33.68923
f 1605467 3900119 23.570868362 57.26009
g 896893 4797012 13.167848880 70.42794
i 464220 5261232 6.815505091 77.24345
b 365399 5626631 5.364651985 82.60810
j 332239 5958870 4.877809219 87.48591
h 215313 6174183 3.161145249 90.64705
l 129871 6304054 1.906717637 92.55377
e 107001 6411055 1.570948818 94.12472
k 104954 6516009 1.540895526 95.66562
d 103648 6619657 1.521721321 97.18734
m 56172 6675829 0.824696377 98.01203
o 51093 6726922 0.750128391 98.76216
n 49320 6776242 0.724097865 99.48626
p 32321 6808563 0.474524881 99.96079
q 1334 6809897 0.019585291 99.98037
r 620 6810517 0.009102609 99.98947
s 247 6810764 0.003626362 99.99310
u 182 6810946 0.002672056 99.99577
t 162 6811108 0.002378424 99.99815
z 126 6811234 0.001849885 100.00000
You can write a wrapper function that will convert the relevant parts of the hist output into a data.frame:
myfun <- function(x) {
h <- hist(x, plot = FALSE)
data.frame(Frequency = h$counts,
Cum.Freq = cumsum(h$counts),
Percentage = h$density/sum(h$density),
Cum.Percent = cumsum(h$density)/sum(h$density))
}
Here's an example on the built-in iris dataset:
myfun(iris$Sepal.Width)
# Frequency Cum.Freq Percentage Cum.Percent
# 1 4 4 0.026666667 0.02666667
# 2 7 11 0.046666667 0.07333333
# 3 13 24 0.086666667 0.16000000
# 4 23 47 0.153333333 0.31333333
# 5 36 83 0.240000000 0.55333333
# 6 24 107 0.160000000 0.71333333
# 7 18 125 0.120000000 0.83333333
# 8 10 135 0.066666667 0.90000000
# 9 9 144 0.060000000 0.96000000
# 10 3 147 0.020000000 0.98000000
# 11 2 149 0.013333333 0.99333333
# 12 1 150 0.006666667 1.00000000
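Note that the Percentage columns above are proportions between 0 and 1. If you want values on a 0-100 scale, the bin ranges as labels, and optionally the Pareto-style ordering from the question (largest bin first), a variant could look like the sketch below. hist_summary() and its pareto argument are hypothetical names, not part of base R.

```r
# Hypothetical variant of the wrapper above.
hist_summary <- function(x, pareto = FALSE, ...) {
  h <- hist(x, plot = FALSE, ...)
  out <- data.frame(
    Bin = paste(head(h$breaks, -1), tail(h$breaks, -1), sep = "-"),
    Frequency = h$counts
  )
  # Pareto ordering: most frequent bin first.
  if (pareto) out <- out[order(-out$Frequency), ]
  # Cumulative columns are computed after any reordering.
  out$Cum.Freq <- cumsum(out$Frequency)
  out$Percentage <- 100 * out$Frequency / sum(out$Frequency)
  out$Cum.Percent <- cumsum(out$Percentage)
  out
}

res <- hist_summary(iris$Sepal.Width)
par_res <- hist_summary(iris$Sepal.Width, pareto = TRUE)
head(par_res, 3)
```

Percentages here are computed from the counts rather than from the densities, so they stay correct even if you pass unequal breaks through the `...` argument.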