Plot observations at the same x-axis point, linked by an id variable - R

I need help. This is a view of my data:
482 940 914 1
507 824 1042 2
514 730 1450 3
477 595 913 4
My aim is to plot each row's values at the same point on the x-axis.
Example:
at x = 1 I want to plot 482, 940 and 914;
at x = 2 I want to plot 507, 824 and 1042.
So there should be three points stacked vertically at each x-axis position.

It's a good idea to share the data in a reproducible way; here I'm using readClipboard (Windows only) to read the copied values into R. Anyway, here's a quick answer:
x <- as.numeric(unlist(strsplit(readClipboard(), " ")))
This makes it into a numeric vector. We now need to split it into groups based on the description you provided. I'm using matrix to achieve this and will then convert it to a data.frame for plotting with ggplot2:
m <- matrix(x, ncol = 4, byrow = T)
> m
[,1] [,2] [,3] [,4]
[1,] 482 940 914 1
[2,] 507 824 1042 2
[3,] 514 730 1450 3
[4,] 477 595 913 4
df <- as.data.frame(m)
# Assign names to the data.frame
names(df) <- letters[1:4]
> df
a b c d
1 482 940 914 1
2 507 824 1042 2
3 514 730 1450 3
4 477 595 913 4
To get the plot:
library(ggplot2)
ggplot(df, aes(x = d)) +
  geom_point(aes(y = a), color = "red") +
  geom_point(aes(y = b), color = "green") +
  geom_point(aes(y = c), color = "blue")
(Plot output: three points, coloured red, green and blue, at each value of d.)
You can play around with ggtitle and xlab etc. to change the plot labels and add legends.
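If you want ggplot2 to build a legend automatically, one option (a sketch, assuming the tidyr package is available) is to reshape the data to long format so that colour is mapped from a single column:
library(tidyr)
library(ggplot2)
# stack columns a, b, c into one "value" column keyed by "series"
df_long <- pivot_longer(df, cols = c(a, b, c),
                        names_to = "series", values_to = "value")
ggplot(df_long, aes(x = d, y = value, colour = series)) +
  geom_point()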
Hope this is helpful!

Related

Calculating z-values for a path over a surface described by x-values, y-values, and a z-matrix in R

The x and y axes are described by longitude and latitude range values (1-d vectors), and the z axis is described by a 2-d matrix.
How can I estimate the z value for a given lon/lat position on the surface?
terra <- interp(x, y, z, xo = lons[i], yo = lats[i])
is not working as expected (akima), because z has different dimensions:
Error in interp(x, y, z, xo = lons[i], yo = lats[i]) :
Lengths of x, y, and z do not match
The matrix z has dimensions length(x) * length(y).
The bottom line here is that akima::interp is probably the wrong tool for what you are trying to do, and you should instead use:
pracma::interp2(x, y, z, xp = lat[i], yp = lon[i])
Explanation
If I am understanding you correctly, you have a matrix z that represents a 3D surface. The latitude of each row is stored in the vector y and the longitude of each column in the vector x. You wish to find out the implied value of z for any arbitrary x, y co-ordinates that are not on the original grid using 2D interpolation.
This isn't really what the function interp is for (see below for an explanation of what interp does). It would be better here to use a function that specifically interpolates a 2D matrix such as pracma::interp2.
For example, suppose I have the following x, y, z objects, similar to your own set up:
x <- seq(0, 45, length = 10)
y <- seq(-45, 45, length = 10)
z <- outer(x, y, function(x, y) x^2 + y^2)
x
#> [1] 0 5 10 15 20 25 30 35 40 45
y
#> [1] -45 -35 -25 -15 -5 5 15 25 35 45
z
#> [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
#> [1,] 2025 1225 625 225 25 25 225 625 1225 2025
#> [2,] 2050 1250 650 250 50 50 250 650 1250 2050
#> [3,] 2125 1325 725 325 125 125 325 725 1325 2125
#> [4,] 2250 1450 850 450 250 250 450 850 1450 2250
#> [5,] 2425 1625 1025 625 425 425 625 1025 1625 2425
#> [6,] 2650 1850 1250 850 650 650 850 1250 1850 2650
#> [7,] 2925 2125 1525 1125 925 925 1125 1525 2125 2925
#> [8,] 3250 2450 1850 1450 1250 1250 1450 1850 2450 3250
#> [9,] 3625 2825 2225 1825 1625 1625 1825 2225 2825 3625
#> [10,] 4050 3250 2650 2250 2050 2050 2250 2650 3250 4050
We can see that this draws a bowl-shaped surface:
persp(x, y, z, theta = 225, phi = 30, col = 'gold')
Now suppose I have some lat, lon co-ordinates that are within this domain but not directly on the original grid:
lat <- c(12.3, 4.6, 5.1)
lon <- c(-5.3, 0, 4.1)
I can work out the z value at these three points like this:
pracma::interp2(x, y, z, xp = lat, yp = lon)
#> [1] 835.75 1801.50 1817.75
This seems to be the type of output you are looking for.
Why not akima::interp here?
By contrast, akima::interp tries to create a regular matrix of x, y, z values representing a 3D surface, given only a few x, y, z co-ordinates on that surface. It therefore creates a regular grid when you don't already have one, rather than interpolating arbitrary points between an existing regular grid.
We can show this by turning our x, y and z objects into a set of points and feeding them to interp. If we expand the x and y vectors and turn z from a matrix into a vector, we create three vectors of length 100 that represent the x, y and z co-ordinates of points on our surface. Feeding these to akima::interp will result in a smooth 40 x 40 grid being created from the original data:
persp(interp::interp(rep(x, length(y)), rep(y, each = length(x)), z),
      theta = 225, phi = 30, col = 'gold')
Since you already have a regular grid, this does not seem to be what you need.
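As an aside (my own addition, not part of the original answer): the fields package also provides a bilinear interpolator, interp.surface(), which takes an image-style list(x, y, z) and a two-column matrix of locations. A minimal sketch using the example objects above:
library(fields)
# z has dimensions length(x) x length(y), the layout image()/persp() expect
interp.surface(list(x = x, y = y, z = z), loc = cbind(lat, lon))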

R - Sum a range over a lookback period, divided by the total sum over the lookback - Excel to R

I am looking to work out a percentage total over a lookback range in R.
I know how to do this in excel with the following formula:
=SUM(B2:B4)/SUM(B2:B4,C2:C4)
This sums column B over a range from today looking back 3 lines. It then divides this sum by the total sum of columns B + C, again looking back 3 lines.
I am looking to achieve the same calculation in R to run across my matrix.
The output would look something like this:
adv dec perct
1 69 376
2 113 293
3 270 150 0.355625492
4 74 371 0.359559402
5 308 96 0.513790386
6 236 173 0.491255962
7 252 134 0.663886572
8 287 129 0.639966969
9 219 187 0.627483444
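For example, the first computed value above (row 3) works out, with a lookback of 3 rows, as:
(69 + 113 + 270) / ((69 + 113 + 270) + (376 + 293 + 150))
#> [1] 0.3556255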
This is a line of code I could perhaps add the lookback range to:
perct <- apply(data.matrix[,c('adv','dec')], 1, function(x) { (x[1] / x[1] + x[2]) } )
If I could get x[1] to sum over the previous 3-line range, and x[2] to also sum over the previous 3-line range, that would do it.
I am still learning how to apply forward and lookback periods within R, so any additional explanation in the answer would be appreciated!
Here are some approaches. The first three use rollsumr and/or rollapplyr from zoo, and the last one uses only base R.
1) rollsumr Create a matrix with rollsumr whose columns contain the rolling sums, convert that to row proportions and take the "adv" column. Finally assign that to a new column frac in DF. This approach has the shortest code.
library(zoo)
DF$frac <- prop.table(rollsumr(DF, 3, fill = NA), 1)[, "adv"]
giving:
> DF
adv dec frac
1 69 376 NA
2 113 293 NA
3 270 150 0.3556255
4 74 371 0.3595594
5 308 96 0.5137904
6 236 173 0.4912560
7 252 134 0.6638866
8 287 129 0.6399670
9 219 187 0.6274834
1a) This variation is similar except instead of using prop.table we write out the ratio. The code is longer but you may find it clearer.
m <- rollsumr(DF, 3, fill = NA)
DF$frac <- with(as.data.frame(m), adv / (adv + dec))
1b) This is a variation of (1) that is the same except it uses a magrittr pipeline:
library(magrittr)
DF %>% rollsumr(3, fill = NA) %>% prop.table(1) %>% `[`(TRUE, "adv") -> DF$frac
2) rollapplyr We could use rollapplyr with by.column = FALSE like this. The result is the same.
ratio <- function(x) sum(x[, "adv"]) / sum(x)
DF$frac <- rollapplyr(DF, 3, ratio, by.column = FALSE, fill = NA)
3) Yet another variation is to compute the numerator and denominator separately:
DF$frac <- rollsumr(DF$adv, 3, fill = NA) /
  rollapplyr(DF, 3, sum, by.column = FALSE, fill = NA)
4) base This uses embed followed by rowSums on each column to get the rolling sums and then uses prop.table as in (1).
DF$frac <- prop.table(sapply(lapply(rbind(NA, NA, DF), embed, 3), rowSums), 1)[, "adv"]
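For readers unfamiliar with embed, here is a minimal illustration (a sketch) of how it builds the width-3 rolling windows that rowSums then collapses:
embed(1:5, 3)
#>      [,1] [,2] [,3]
#> [1,]    3    2    1
#> [2,]    4    3    2
#> [3,]    5    4    3
rowSums(embed(1:5, 3))   # right-aligned rolling sums of width 3
#> [1]  6  9 12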
Note: The input used in reproducible form is:
Lines <- "adv dec
1 69 376
2 113 293
3 270 150
4 74 371
5 308 96
6 236 173
7 252 134
8 287 129
9 219 187"
DF <- read.table(text = Lines, header = TRUE)
Consider an sapply that loops over the row indices so that each iteration can look back two rows:
DF$pred <- sapply(seq(nrow(DF)), function(i)
  ifelse(i >= 3, sum(DF$adv[(i-2):i]) / (sum(DF$adv[(i-2):i]) + sum(DF$dec[(i-2):i])), NA))
DF
# adv dec pred
# 1 69 376 NA
# 2 113 293 NA
# 3 270 150 0.3556255
# 4 74 371 0.3595594
# 5 308 96 0.5137904
# 6 236 173 0.4912560
# 7 252 134 0.6638866
# 8 287 129 0.6399670
# 9 219 187 0.6274834
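A slightly cleaner variant of the same idea (a sketch, not from the original answer) uses a plain if/else instead of the vectorised ifelse, since i is a scalar inside the loop:
DF$pred <- sapply(seq(nrow(DF)), function(i) {
  if (i < 3) return(NA)            # not enough history for a 3-row lookback
  num <- sum(DF$adv[(i - 2):i])
  num / (num + sum(DF$dec[(i - 2):i]))
})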

Replacing legend and making distinct colours on scatter plot

I am working on countrywide data and trying to look at the relationship between disease count and flock size. I want to change the legend for the scatter plot, i.e. to have names for the regions rather than the codes that appear on the plot posted here. I also want to improve the colours representing the 8 regions, since it is a bit hard to differentiate between the current colours. Any suggestions for improving the plot?
library(lattice)
xyplot(log(Cases2012+1) ~ Flock2012, data = orf, groups = Region.Coding,
       auto.key = list(space = "right", points = TRUE))
A portion of the data:
Region Flock2012
1 190
2 343
1 810
3 1450
1 1125
3 1305
1 750
1 227
3 1800
1 1100
2 1250
1 362
6 800
2 559
4 770
1 900
2 600
1 860
2 1450
6 1014
1 1870
4 950
1 1730
5 353
1 6000
5 1150
1 3100
1 2400
5 278
2 444
2 546
7 775
2 870
5 690
8 1032
2 2351
7 680
3 430
2 931
8 1590
2 70
5 780
2 1366
2 1900
4 730
2 1860
2 1032
7 1700
2 230
2 301
5 565
I tried this, but the plot is not showing up:
mycols <- c("red", "blue", "forestgreen", "gold", "black", "cyan", "darkorange", "darkred")
myregions <- c("East", "Midlands", "Wmidlands","NWest","NEast","Yorkshire","SEast","SWest")
xyplot(log(Flock2012+1)~ Flock2012, data=stack, groups = Regions,
col=mycols, pch=1,
key=list(space="right",
text=list(myregions),
points=list(col=mycols, cex=1.5, pch=1)
I think this should work. I would create a vector of the colours you want and a vector of names for the regions.
mycols <- c("red", "blue", "forestgreen", "gold", "black", "cyan", "darkorange", "darkred")
myregions <- c("East", "Midlands", "Wmidlands","NWest","NEast","Yorkshire","SEast","SWest")
Then rather than use the auto.key option, use the key option for a bit more flexibility.
xyplot(log(Cases2012+1) ~ Flock2012, data = orf, groups = Region.Coding,
       col = mycols, pch = 1,
       key = list(space = "right",
                  text = list(myregions),
                  points = list(col = mycols, cex = 1.5, pch = 1)))
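One caveat worth checking (an assumption about the data, not stated in the answer): the manual key only labels the groups correctly if myregions is in the same order as the levels lattice uses for Region.Coding. A quick sketch to check that order, or to relabel the codes directly so that auto.key would also show the region names:
# order of the groups as lattice will see them
levels(factor(orf$Region.Coding))
# relabel the numeric codes with region names (assumes codes 1-8 map to myregions in order)
orf$Region.Coding <- factor(orf$Region.Coding, levels = 1:8, labels = myregions)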
Hope this helps.

2D irregular aggregation of a matrix

I'm trying to bin a symmetric matrix with irregular intervals in R but am not sure how to proceed. My ideas are:
Reshape the matrix to long format, aggregate and cast it back?
Bin as-is in both dimensions (somehow... tapply, aggregate?)
Keep the regular binning but for each of my (larger) irregular bins, replace all inner values with their sum?
Here's an example of what I'm trying to do:
set.seed(42)
# symmetric matrix
a <- matrix(rpois(1e4, 2), 100)
a[upper.tri(a)] <- t(a)[upper.tri(a)]
image(x=1:100, y=1:100, a, asp=1, frame=F, axes=F)
# vector of irregular breaks for binning
breaks <- c(12, 14, 25, 60, 71, 89)
# white line show the desired bins
abline(h=breaks-.5, lwd=2, col="white")
abline(v=breaks-.5, lwd=2, col="white")
(The aim being that each rectangle drawn above be filled according to the sum of values within it.) I'd appreciate any pointers of how best to approach this.
This answer provides a great starting point using tapply:
library(reshape2)   # melt() comes from reshape2
b <- melt(a)
bb <- with(b, tapply(value,
                     list(
                       y = cut(Var1, breaks = c(0, breaks, Inf), include.lowest = TRUE),
                       x = cut(Var2, breaks = c(0, breaks, Inf), include.lowest = TRUE)
                     ),
                     sum))
bb
# x
# y [0,12] (12,14] (14,25] (25,60] (60,71] (71,89] (89,Inf]
# [0,12] 297 48 260 825 242 416 246
# (12,14] 48 3 43 141 46 59 33
# (14,25] 260 43 261 794 250 369 240
# (25,60] 825 141 794 2545 730 1303 778
# (60,71] 242 46 250 730 193 394 225
# (71,89] 416 59 369 1303 394 597 369
# (89,Inf] 246 33 240 778 225 369 230
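A quick sanity check (a sketch): since every cell of a falls into exactly one bin, the binned sums should add up to the total of the original matrix.
sum(bb) == sum(a)   # expected TRUE: the bins partition the whole matrix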
These can then be plotted as rectangular bins using a base plot and rect — i.e.:
library("reshape2")
library("magrittr")
bsq <- melt(bb)
# convert range notation to numerics
getNum <- . %>%
  # rm brackets
  gsub("\\[|\\(|\\]|\\)", "", .) %>%
  # split digits and convert
  strsplit(",") %>%
  unlist %>% as.numeric
y <- t(sapply(bsq[, 1], getNum))
x <- t(sapply(bsq[, 2], getNum))
# normalise bin intensity by area
bsq$size <- (y[, 2] - y[, 1]) * (x[, 2] - x[, 1])
bsq$norm <- bsq$value / bsq$size
# draw rectangles on top of empty plot
plot(1:100, 1:100, type = "n", frame = FALSE, axes = FALSE)
rect(ybottom = y[, 1], ytop = y[, 2],
     xleft = x[, 1], xright = x[, 2],
     col = rgb(colorRamp(c("white", "steelblue4"))(bsq$norm / max(bsq$norm)),
               alpha = 255 * (bsq$norm / max(bsq$norm)), max = 255),
     border = "white")

Multiple scatterplot figure in R

I have a slightly complicated plotting task. I am halfway there, but not quite sure how to get the rest of the way. I have a dataset of the form below, with multiple subjects, each in either Treatgroup 0 or Treatgroup 1, and each subject contributing several rows of data. Each row corresponds to a single timepoint at which there are values in the columns count1, count2, weirdname3, etc.
Task 1. I need to calculate "Days", which is just VisitDate minus StartDate, for each row. Should be an apply-type function, I guess.
Task 2. I have to make a multi-plot figure with one scatterplot for each of the count variables (a plot for count1, one for count2, etc.). In each scatterplot, I need to plot the value of the count (y-axis) against "Days" (x-axis) and connect the dots for each subject. Subjects in Treatgroup 0 are one color, subjects in Treatgroup 1 are another color. Each scatterplot should be labeled with count1, count2, etc. as appropriate.
I am trying to use the base plotting functions and have taken the approach of writing a plotting function to call later. I think this can work but I need some help with the syntax.
#Enter example data
tC <- textConnection("
ID StartDate VisitDate Treatstarted count1 count2 count3 Treatgroup
C0098 13-Jan-07 12-Feb-10 NA 457 343 957 0
C0098 13-Jan-06 2-Jul-10 NA 467 345 56 0
C0098 13-Jan-06 7-Oct-10 NA 420 234 435 0
C0098 13-Jan-05 3-Feb-11 NA 357 243 345 0
C0098 14-Jan-06 8-Jun-11 NA 209 567 254 0
C0098 13-Jan-06 9-Jul-11 NA 223 235 54 0
C0098 13-Jan-06 12-Oct-11 NA 309 245 642 0
C0110 13-Jan-06 23-Jun-10 30-Oct-10 629 2436 45 1
C0110 13-Jan-07 30-Sep-10 30-Oct-10 461 467 453 1
C0110 13-Jan-06 15-Feb-11 30-Oct-10 270 365 234 1
C0110 13-Jan-06 22-Jun-11 30-Oct-10 236 245 23 1
C0151 13-Jan-08 2-Feb-10 30-Oct-10 199 653 456 1
C0151 13-Jan-06 24-Mar-10 3-Apr-10 936 25 654 1
C0151 13-Jan-06 7-Jul-10 3-Apr-10 1147 254 666 1
C0151 13-Jan-06 9-Mar-11 3-Apr-10 1192 254 777 1
")
data1 <- read.table(header=TRUE, tC)
close.connection(tC)
# format date
data1$VisitDate <- with(data1,as.Date(VisitDate,format="%d-%b-%y"))
# stuck: need to define days as VisitDate - StartDate for each row of dataframe (I know I need an apply family fxn here)
data1$Days <- [applyfunction of some kind ](VisitDate,ID,function(x){x-data1$StartDate})))
# Unsure here. Need to define plot function
plot_one <- function(d){
  with(d, plot(Days, Count, t = "n", tck = 1, cex.main = 0.8, ylab = "", yaxt = 'n',
               xlab = "", xaxt = "n", xlim = c(0, 1000), ylim = c(0, 1200))) # set limits
  grid(lwd = 0.3, lty = 7)
  with(d[d$Treatgroup == 0, ], points(Days, Count1, col = 1))
  with(d[d$Treatgroup == 1, ], points(Days, Count1, col = 2))
}
#Create multiple plot figure
par(mfrow=c(2,2), oma = c(0.5,0.5,0.5,0.5), mar = c(0.5,0.5,0.5,0.5))
#trouble here. I need to call the column names somehow, with; plyr::d_ply(data1, ???, plot_one)
Task 1:
data1$days <- floor(as.numeric(as.POSIXlt(data1$VisitDate, format = "%d-%b-%y")
                               - as.POSIXlt(data1$StartDate, format = "%d-%b-%y")))
Task 2:
par(mfrow=c(3,1), oma = c(2,0.5,1,0.5), mar = c(2,0.5,1,0.5))
plot(data1$days, data1$count1, col=as.factor(data1$Treatgroup), main="count1")
plot(data1$days, data1$count2, col=as.factor(data1$Treatgroup), main="count2")
plot(data1$days, data1$count3, col=as.factor(data1$Treatgroup), main="count3")
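The answer above colours the points by treatment group but does not connect the dots within each subject, which Task 2 also asked for. A minimal sketch of one way to add that with base graphics (assuming data1$days and the count columns defined above):
plot(data1$days, data1$count1, col = data1$Treatgroup + 1, main = "count1")
# draw a line through each subject's points, ordered by days
for (id in unique(data1$ID)) {
  d <- data1[data1$ID == id, ]
  d <- d[order(d$days), ]
  lines(d$days, d$count1, col = d$Treatgroup[1] + 1)
}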
