Time series with MODISTools - r

I need to get full EVI time series, along with dates and quality information. After running MODISSubsets() the crude data is available, but it is not processed as conveniently as MODISSummaries() would do it.
MODISSummaries(), however, reduces the time series to summary statistics, taking the quality information into account.
Is there a way to extract time series for each tile from the crude data (see data frame crude below)? It would be great if this could return a list of data frames, where each data frame represents one tile and holds the data for EVI (or whatever variable), its date, and a quality flag.
Specifically, after doing the following ...
savedir <- './'
modis.subset <- data.frame(
lat = 47.1167,
long = 11.3175,
end.date = "2016-09-29"
)
MODISSubsets(
LoadDat = modis.subset,
Products = "MOD13Q1",
Bands = c("250m_16_days_EVI", "250m_16_days_pixel_reliability"),
Size = c(1,1),
StartDate = FALSE,
SaveDir = savedir,
TimeSeriesLength = 3
)
crude <- read.csv("./Lat47.11670Lon11.31750Start2013-01-01End2016-09-29___MOD13Q1.asc", header = FALSE, as.is = TRUE)
... how would you get to something like
nice <- list( lonX1_latY1=data.frame( date=..., var=..., qual=... ), lonX2_latX2=... )
...?

In short, what I was missing was a way to use ExtractTile() on the return value of MODISTimeSeries(). My workaround applies ExtractTile() to the data read from the ASCII file instead. Here is what works for my purpose. The function returns a list that contains an array (npixels_lon, npixels_lat, n_timesteps) with all the downloaded MODIS data, in this case EVI; an array of identical dimensions with the pixel reliability code; a vector of length n_timesteps holding the centre pixel value if its quality flag is 0, or the mean of its surrounding pixels otherwise; and a data frame with the dates:
read_crude_modis <- function( filn, savedir, expand_x, expand_y ){
# arguments:
# filn: file name of ASCII file holding MODIS "crude" data
# savedir: directory, where to look for that file
# expand_x : number of pixels to the right and left of centre
# expand_y : number of pixels to the top and bottom of centre
# MODIS quality flags:
# -1 Fill/No Data Not Processed
# 0 Good Data Use with confidence
# 1 Marginal data Useful, but look at other QA information
# 2 Snow/Ice Target covered with snow/ice
# 3 Cloudy Target not visible, covered with cloud
library( MODISTools )
library( plyr ) # provides rename() as used below
ScaleFactor <- 0.0001 # applied to output variable
ndayyear <- 365
## Read downloaded ASCII file
crude <- read.csv( paste( savedir, filn, sep="" ), header = FALSE, as.is = TRUE )
crude <- rename( crude, c( "V1"="nrows", "V2"="ncols", "V3"="modislon_ll", "V4"="modislat_ll", "V5"="dxy_m", "V6"="id", "V7"="MODISprod", "V8"="yeardoy", "V9"="coord", "V10"="MODISprocessdatetime" ) )
## this is just read to get length of time series and dates
tseries <- MODISTimeSeries( savedir, Band = "250m_16_days_EVI" )
ntsteps <- dim(tseries[[1]])[1]
tmp <- rownames( tseries[[1]] )
time <- data.frame( yr=as.numeric( substr( tmp, start=2, stop=5 )), doy=as.numeric( substr( tmp, start=6, stop=8 )) )
time$dates <- as.POSIXlt( as.Date( paste( as.character(time$yr), "-01-01", sep="" ) ) + time$doy - 1 )
time$yr_dec<- time$yr + ( time$doy - 1 ) / ndayyear
## get number of products for which data is in ascii file (not used)
nprod <- dim(crude)[1] / ntsteps
if ((dim(crude)[1]/nprod)!=ntsteps) { print("problem") }
## re-arrange data
if ( dim(crude)[2]==11 && expand_x==0 && expand_y==0 ){
## only one pixel downloaded
nice_all <- array( crude$V11[1:ntsteps] * ScaleFactor, dim=c(1,1,ntsteps) ) ## EVI data
nice_qual_flg <- array( crude$V11[(ntsteps+1):(2*ntsteps)], dim=c(1,1,ntsteps) ) ## pixel reliability data
} else if ( dim(crude)[2]>11 ){
## multiple pixels downloaded
# nice <- ExtractTile( Data = tseries, Rows = c(crude$nrows,expand_y), Cols = c(crude$ncols,expand_x), Grid = TRUE ) ## > is not working: applying ExtractTile to return of MODISTimeSeries
nice_all <- ExtractTile( Data = crude[1:ntsteps,11:dim(crude)[2]] * ScaleFactor, Rows = c(crude$nrows[1],expand_y), Cols = c(crude$ncols[1],expand_x), Grid = TRUE )
nice_qual_flg <- ExtractTile( Data = crude[(ntsteps+1):(2*ntsteps),11:dim(crude)[2]], Rows = c(crude$nrows[1],expand_y), Cols = c(crude$ncols[1],expand_x), Grid = TRUE )
} else {
print( "Not sufficient data downloaded. Adjust expand_x and expand_y.")
}
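## Clean data for centre pixel: in case quality flag is not '0', use mean over all pixels of the 3x3 window (centre included)
nice_centre <- NULL ## stays NULL unless a 3x3 window was downloaded
if ( expand_x==1 && expand_y==1 ){
nice_centre <- nice_all[2,2,]
bad <- which( nice_qual_flg[2,2,]!=0 )
nice_centre[ bad ] <- apply( nice_all[,,bad,drop=FALSE], c(3), FUN=mean ) ## drop=FALSE keeps 3 dimensions when only one time step is flagged
}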
modis <- list( nice_all=nice_all, nice_centre=nice_centre, nice_qual_flg=nice_qual_flg, time=time )
return( modis )
}
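To get from these arrays to the list of per-pixel data frames asked for in the question, something along the following lines could work. This is only a sketch on synthetic data: evi, qual and dates are hypothetical stand-ins for nice_all, nice_qual_flg and the dates, and arrays_to_pixel_list() is an illustrative helper, not part of MODISTools. It assumes the time stamps look like "A2016001" (year plus day of year), as the substr() calls above suggest.

```r
# Hypothetical stand-ins for the function's output: 3 x 3 pixels, 4 time steps
set.seed(1)
evi  <- array(runif(36), dim = c(3, 3, 4))
qual <- array(sample(0:3, 36, replace = TRUE), dim = c(3, 3, 4))

# Assumed time stamp format "AYYYYDDD"; %j parses the day of the year
stamps <- c("A2016001", "A2016017", "A2016033", "A2016049")
dates  <- as.Date(substr(stamps, 2, 8), format = "%Y%j")

# Illustrative helper: one data frame (date, var, qual) per pixel
arrays_to_pixel_list <- function(evi, qual, dates) {
  stopifnot(identical(dim(evi), dim(qual)), dim(evi)[3] == length(dates))
  idx <- expand.grid(row = seq_len(dim(evi)[1]), col = seq_len(dim(evi)[2]))
  out <- lapply(seq_len(nrow(idx)), function(k) {
    data.frame(date = dates,
               var  = evi[idx$row[k], idx$col[k], ],
               qual = qual[idx$row[k], idx$col[k], ])
  })
  names(out) <- paste0("row", idx$row, "_col", idx$col)
  out
}

nice <- arrays_to_pixel_list(evi, qual, dates)
str(nice$row2_col2)
```

The list names here use array indices; replacing them with pixel coordinates would need the lower-left corner and cell size stored in the ASCII file.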


%dopar% safe way of write to csv inside foreach loop

[EDITED]
It is a general question: I have seen some posts saying that it is not a good idea to call write.csv inside a foreach loop, because different cores try to write to the file at the same time, which results in missing rows. Still, I need to write to an external file inside the parallel loop to get my output (500000+ rows and 10+ columns). Otherwise, it crashes because of memory issues. So, I would like to know if there is a safer way to write a result file within a foreach loop.
I appreciate any help on this
I am adding some more info and a much more simple code and data than what I actually have.
Description: I have two different polygon layers (sf, polygon), each with 500000+ features. I need to calculate the area of different raster classes (1 raster layer with 3 classes) within each one of the polygons. This is the most time-consuming part of the script, specifically because I need to use sf::st_intersection multiple times. Then, I use many different combinations of if-else rules to populate a df with values.
This is the original code, which gives me memory issues with the original data:
require(sf)
require(raster)
require(rgdal)
require(rgeos)
require(dplyr)
require(stars)
## Sample data
set.seed(131)
sample_raster = raster(nrows = 1, ncols = 1, res = 0.5, xmn = 0, xmx = 11, ymn = 0, ymx = 11)
values(sample_raster) = rep(1:3, length.out = ncell(sample_raster))
crs(sample_raster) = CRS('+init=EPSG:4326')
plot(sample_raster, axes=T)
sample_raster
##
m = rbind(c(0,0), c(1,0), c(1,1), c(0,1), c(0,0))
p = st_polygon(list(m))
n = 100
l = vector("list", n)
for (i in 1:n)
l[[i]] = p + 10 * runif(2)
sample_poly = st_sfc(l)
data = data.frame(PR_ID = seq(1:100),
COND1 = rep(1:10, length.out = 100))
sample_poly = st_sf(cbind(data, sample_poly))
plot(sample_poly, col = sf.colors(categorical = TRUE, alpha = .5), add=T)
sample_poly = sample_poly %>% st_set_crs(4326)
sample_poly
##
## Code
require(parallel)
require(foreach)
require(doParallel)
idall = as.character(sample_poly$PR_ID)
area = as.numeric(st_area(sample_poly))/10000
# i=1
# listID = idall
# mainpoly = sample_poly
# mainras = sample_raster
# mainpolyarea = area
per.imovel.paralallel = function (listID, mainpoly, mainras, mainpolyarea) { # Starting the function
## Setting the parallel work up into your computer
UseCores = detectCores()-1
cl = parallel::makeCluster(UseCores, outfile="")
doParallel::registerDoParallel(cl)
writeLines(c(""), "log.txt") # Creates a LOG FILE in the folder to follow processing
FOREACH.RESULT = foreach(i = 1:length(listID), .packages=c('raster', 'rgdal', 'rgeos', 'dplyr', 'parallel',
'doParallel', 'sf', 'stars'), .inorder = T , .combine ='rbind') %dopar%
{ # Starting the parallel loop
sink("log.txt", append=TRUE) # LOG FILE in the home folder
cat(paste(i, "of", length(listID), as.character(Sys.time()),"\n")) # Write to LOG FILE
sink() # end diversion of output
########################
### Pick one poly
px = sf::st_buffer(mainpoly[mainpoly$PR_ID == listID[i],], # Conditional to select the geometry PR_ID in position i
dist = 0.1) # buffer = 0 w/ byid, selects the geometry
########################
### Intersect with raster and get area
px2 = sf::st_buffer(px, dist = 0.1) # Buffer because raster::mask() masks out partially covered cells since it call rasterize() first
desm_prop = raster::crop(mainras, as_Spatial(px2))
desm_prop_shp = if(all(is.na(values(desm_prop)))){NULL
} else {sf::st_intersection(st_cast(sf::st_as_sf(stars::st_as_stars(desm_prop)), "POLYGON"), px)}
names(desm_prop_shp)[1] = if(any(names(desm_prop_shp) == "layer")){"values"
} else {NULL}
desm_prop_bet0108 = if(is.null(desm_prop_shp)){NULL
} else {desm_prop_shp[desm_prop_shp$values == 1, ]}
desm_prop_bet0108 = if(is.null(desm_prop_bet0108) | length(desm_prop_bet0108) == 0){NULL
} else if(length(desm_prop_bet0108$values) == 0){NULL
} else {desm_prop_bet0108}
desm_prop_after08 = if(is.null(desm_prop_shp)){NULL
} else {desm_prop_shp[desm_prop_shp$values == 2, ]}
desm_prop_after08 = if(is.null(desm_prop_after08) | length(desm_prop_after08) == 0){NULL
} else if(length(desm_prop_after08$values) == 0){NULL
} else {desm_prop_after08}
desm_prop_upto00 = if(is.null(desm_prop_shp)){NULL
} else {desm_prop_shp[desm_prop_shp$values == 3, ]}
desm_prop_upto00 = if(is.null(desm_prop_upto00) | length(desm_prop_upto00) == 0){NULL
} else if(length(desm_prop_upto00$values) == 0){NULL
} else {desm_prop_upto00}
area_desm_prop_bet0108 <- if(is.null(desm_prop_bet0108)){0
} else { sum(as.numeric(sf::st_area(desm_prop_bet0108)/10000))} # Deforestation area in PX 2001 - 2008
area_desm_prop_after08 <- if(is.null(desm_prop_after08)){0
} else { sum(as.numeric(sf::st_area(desm_prop_after08)/10000))} # Deforestation area in PX after 2008
area_desm_prop_upto00 <- if(is.null(desm_prop_upto00)){0
} else { sum(as.numeric(sf::st_area(desm_prop_upto00)/10000))} # Deforestation area in PX upto 2000
########################
# RESULTS
TEMP.RESULTS = data.frame(PR_ID = as.character(listID[i]),
PR_AREA_HA = mainpolyarea[i],
PR_D09 = area_desm_prop_after08,
PR_D0108 = area_desm_prop_bet0108,
PR_D00 = area_desm_prop_upto00)
return (TEMP.RESULTS)
} # Ending the loop
parallel::stopCluster(cl) # stop the cluster before returning; code placed after return() never runs
gc()
return (FOREACH.RESULT)
} # Ending the function
#####################################################################################################
results_feach = per.imovel.paralallel (listID = idall, mainpoly = sample_poly, mainras = sample_raster, mainpolyarea = area)
warnings()
I have also tried @mischva11's (modified) suggestion by adding this:
length_of_chunk = round(length(idall)/(length(idall)/10)) # generate chunks of 10 lines
lchunks = split(idall, sort(rep_len(1:length_of_chunk, length(idall))))
for (z in 1:length_of_chunk){
# split up the data in chunks
idall_chunk = as.vector(unlist(lchunks[z]))
results_chunk = per.imovel.paralallel (listID = idall_chunk, mainpoly = sample_poly, mainras = sample_raster, mainpolyarea = area)
# save your foreach results for each chunk, append after the first one
if (z == 1) {write.table(results_chunk, file = "TESTDATAresults1.csv")
}else {write.table(results_chunk, file = "TESTDATAresults1.csv", append = TRUE, col.names = FALSE)}
print(NULL) # print(results_chunk)
}
It works like a charm for this example.
BUT I hit a setback when running it with the real script/data: it takes ages for the foreach call to finish. Watching my machine's performance and the log file, I see the CPU load go down as expected once all rows of my sf object have been processed, but it then takes more than 30 min (I did not wait for it to finish completely) for foreach to return.
Because of this, I thought about writing the output on the fly inside the foreach loop. But that is clearly not a good idea, as explained here. I have seen some posts about the package 'flock', which locks the output file while a worker writes to it. I have not tested it, but it sounds promising.
The problem here is that you need communication between the cores: one core has to wait until the previous one has finished writing to the CSV. That is not easily done, and as far as I know it is not possible with foreach itself. What foreach does provide is the .inorder argument (TRUE by default), which returns the combined results in the order of the iterations. You say you have memory issues, so one solution is to split the work into chunks if that is possible. I do not have a good dataset for this example, so I use mtcars:
library(foreach)
library(parallel)
library(doParallel)
registerDoParallel(4)
# split the input into 5 chunks; the example data is mtcars
n_chunks <- 5
chunk_id <- sort(rep_len(seq_len(n_chunks), nrow(mtcars)))
for (z in seq_len(n_chunks)) {
  # here the data gets split up
  data <- mtcars[chunk_id == z, ]
  # foreach over the rows of this chunk only
  results <- foreach(i = seq_len(nrow(data)), .combine = rbind) %dopar% {
    # ***your code***
    y <- data[i, ]
    return(y)
  }
  print(results)
  # save the foreach results of this chunk, then begin again
  if (z == 1) {
    write.table(results, file = "test.csv")
  } else {
    write.table(results, file = "test.csv", append = TRUE, col.names = FALSE)
  }
}
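Another way to avoid several workers writing to the same file is to let every task write its own small file and to combine the files after the parallel part has finished. The following is a minimal sketch of that idea, not a drop-in replacement for the code above: process_one() and the part_*.csv file names are made up for illustration, and it uses parLapply() from the base parallel package instead of foreach so that it runs without extra packages.

```r
library(parallel)

# Every task gets its own output file, so no two workers touch the same file
out_dir <- file.path(tempdir(), "chunks")
dir.create(out_dir, showWarnings = FALSE)

# Hypothetical per-task work: compute one result row and write it to disk
process_one <- function(i, out_dir) {
  res <- data.frame(id = i, value = i^2)
  write.csv(res, file.path(out_dir, sprintf("part_%05d.csv", i)),
            row.names = FALSE)
  invisible(NULL)  # return nothing, so results do not pile up in memory
}

cl <- makeCluster(2)
invisible(parLapply(cl, 1:10, process_one, out_dir = out_dir))
stopCluster(cl)

# Combine the per-task files once all workers are done
files <- sort(list.files(out_dir, pattern = "^part_.*\\.csv$", full.names = TRUE))
combined <- do.call(rbind, lapply(files, read.csv))
print(combined)
```

Because each task returns NULL, the master process never holds the full result in memory during the parallel part. With 500000+ tasks it would probably make more sense to write one file per chunk of tasks rather than one per task.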

How to read a .MAP file extension in R?

Is there a simple way to read a file of .MAP extension in R? I have tried a few options below but had no success. Here is a .MAP file for a reproducible example.
Context: For some odd reason, the spatial regionalization used in health planning policies in Brazil is only available in this format. I would like to convert it to GeoPackage so we can add it to the geobr package.
# none of these options work
mp <- sf::st_read("./se_mapas_2013/se_regsaud.MAP")
mp <- rgdal::readGDAL("./se_mapas_2013/se_regsaud.MAP")
mp <- rgdal::readOGR("./se_mapas_2013/se_regsaud.MAP")
mp <- raster::raster("./se_mapas_2013/se_regsaud.MAP")
mp <- stars::read_stars("./se_mapas_2013/se_regsaud.MAP")
ps. there is a similar question on SO focused on Python, unfortunately unanswered
UPDATE
We have found a publication that uses a custom function that reads the .MAP file. See example below. However, it returns a "polylist" object. Is there a simple way to convert it to a simple feature?
original custom function
read.map = function(filename){
zz=file(filename,"rb")
#
# header of .map
#
versao = readBin(zz,"integer",1,size=2) # 100 = versao 1.00
#Bounding Box
Leste = readBin(zz,"numeric",1,size=4)
Norte = readBin(zz,"numeric",1,size=4)
Oeste = readBin(zz,"numeric",1,size=4)
Sul = readBin(zz,"numeric",1,size=4)
geocodigo = ""
nome = ""
xleg = 0
yleg = 0
sede = FALSE
poli = list()
i = 0
#
# repeat of each object in file
#
repeat{
tipoobj = readBin(zz,"integer",1,size=1) # 0=Poligono, 1=PoligonoComSede, 2=Linha, 3=Ponto
if (length(tipoobj) == 0) break
i = i + 1
Len = readBin(zz,"integer",1,size=1) # length byte da string Pascal
geocodigo[i] = readChar(zz,10)
Len = readBin(zz,"integer",1,size=1) # length byte da string Pascal
nome[i] = substr(readChar(zz,25),1,Len)
xleg[i] = readBin(zz,"numeric",1,size=4)
yleg[i] = readBin(zz,"numeric",1,size=4)
numpontos = readBin(zz,"integer",1,size=2)
sede = sede || (tipoobj == 1) # compare with ==; a single = would overwrite tipoobj
x=0
y=0
for (j in 1:numpontos){
x[j] = readBin(zz,"numeric",1,size=4)
y[j] = readBin(zz,"numeric",1,size=4)
}
# separate polygons
xInic = x[1]
yInic = y[1]
for (j in 2:numpontos){
if (x[j] == xInic & y[j] == yInic) {x[j]=NA; y[j] = NA}
}
poli[[i]] = c(x,y)
dim(poli[[i]]) = c(numpontos,2)
}
class(poli) = "polylist"
attr(poli,"region.id") = geocodigo
attr(poli,"region.name") = nome
attr(poli,"centroid") = list(x=xleg,y=yleg)
attr(poli,"sede") = sede
attr(poli,"maplim") = list(x=c(Oeste,Leste),y=c(Sul,Norte))
close(zz)
return(poli)
}
using original custom function
mp <- read.map("./se_mapas_2013/se_regsaud.MAP")
class(mp)
>[1] "polylist"
# plot
plot(attributes(mp)$maplim, type='n', asp=1, xlab=NA, ylab=NA)
title('Map')
lapply(mp, polygon, asp=T, col=3)
The problems were: use of readChar with trailing nul bytes - changed to readBin(); 8-bit characters that rawToChar() would not accept (on my UTF-8 system); multiple slivers in some files that needed dropping; and some others. I added the edited read.map() function above to maptools, but with a different name and not exported. So now (with maptools rev 370 from https://r-forge.r-project.org/R/?group_id=943 when build completes):
library(maptools)
o <- maptools:::readMAP2polylist("se_regsaud.MAP")
oo <- maptools:::.makePolylistValid(o)
ooo <- maptools:::.polylist2SpP(oo, tol=.Machine$double.eps^(1/4))
rn <- row.names(ooo)
df <- data.frame(ID=rn, row.names=rn, stringsAsFactors=FALSE)
res <- SpatialPolygonsDataFrame(ooo, data=df)
library(sf)
res_sf <- st_as_sf(res)
res_sf
plot(st_geometry(res_sf))
This approach re-uses the maptools code dating back almost twenty years, with minor edits to handle subsequent changes in reading binary files, and fixing slivers.
EDIT: looks like this doesn't work generally across all files so proper conversion to sf would need a deeper look.
Here's a quick stab at resurrection. It might be incorrect to cumulatively sum to get the multi linestrings, I tested with se_municip.MAP and it only had NAs as the closing row of each ring. If it potentially has non-connected multi-rings (multipolygon) then this approach won't work completely.
x <- read.map("se_municip.MAP")
df <- setNames(as.data.frame(do.call(rbind, x)), c("x", "y"))
df$region.name <- rep(attr(x, "region.name"), unlist(lapply(x, nrow)))
## in case there are multi-rings
df$linestring_id <- cumsum(c(0, diff(is.na(df$x))))
df$polygon_id <- as.integer(factor(df$region.name))
df <- df[!is.na(df$x), ]
sfx <- sfheaders::sf_polygon(df, x = "x", y = "y", linestring_id = "linestring_id", polygon_id = "polygon_id", keep = TRUE)
#sf::st_crs(sfx) <- sf::st_crs(<whatever it is probably 4326>)
plot(sf::st_geometry(sfx), reset = FALSE)
maps::map(add = TRUE)
Interesting that you came across an official version of a forgotten legacy!
(BTW can I publish the data sets in a package?)
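For the conversion to simple features, the part that seems to need care is turning each NA-separated coordinate matrix into closed rings, since functions such as sf::st_polygon() expect a list of ring matrices whose first and last rows are identical. Below is a rough base R sketch of that step only. split_rings() is a hypothetical helper, and it assumes that every ring in a polylist element ends with an NA row, which is what the read.map() output for se_municip.MAP looked like; it has not been checked against other files.

```r
# Hypothetical helper: split an NA-separated coordinate matrix into closed rings
split_rings <- function(xy) {
  is_gap  <- is.na(xy[, 1])                 # NA rows mark the end of a ring
  ring_id <- cumsum(is_gap)[!is_gap]        # ring number for every real vertex
  rings   <- split.data.frame(xy[!is_gap, , drop = FALSE], ring_id)
  # close each ring again by repeating its first vertex at the end
  lapply(unname(rings), function(r) rbind(r, r[1, , drop = FALSE]))
}

# Toy element with two rings, in the layout produced by read.map()
xy <- rbind(c(0, 0), c(1, 0), c(1, 1), c(NA, NA),
            c(5, 5), c(6, 5), c(6, 6), c(NA, NA))
rings <- split_rings(xy)
rings
```

Whether two rings of one element form a hole or a separate part of a multipolygon cannot be decided from the NA separators alone, so that still has to be checked per file.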

R Interactive Sankey Diagram + Hierarchize Nodes

I am trying to visualize sequences of events using Sankey diagrams.
I have a set of events (Event1 to Event16) in sequences of different lengths.
The steps of the sequences are labelled T0, T0 - 1, T0 - 2, ...
The width of each flow corresponds to the frequency of the sequence.
I would like all the nodes belonging to a given step to be aligned vertically.
Using the googleVis package I managed to obtain the following:
[Figure: Sankey with GoogleVis]
As you can see, some events at T0 - 1, T0 - 2 and T0 - 3 ... are on the far right instead of with the others of their time step.
This seems to be because it is not possible to have nodes without children...
Do you know a way to get hierarchical nodes and/or nodes without children in googleVis?
If not, do you know another R package that offers these features for interactive plots?
My R code is below. The main variable containing the sequences is a list of lists, see picture.
[Figure: Data containing sequences]
My code:
# Package
library(googleVis)
library(dplyr)
library(reshape2)
library(tidyverse)
# Load
load("SeqCh")
# Loop -------------------------------------------------------------
# Inits
From = c()
To = c()
Freq = c()
Target = SeqCh
# Get maximum length of sequence
maxls = 0
for (kk in 1:length(Target)){
temp = length(Target[[kk]])
if (temp > maxls){
maxls = temp
}
}
# Loop on length of sequences
for (zz in 2:maxls){
# Suffix to add so that the same event at different steps gets a distinct node name
if (zz == 2){
SufixFrom = "(T0)"
SufixTo = "(T0 - 1)"
} else {
SufixFrom = paste("(T0 - ", as.character(zz-2), ")", sep = "")
SufixTo = paste("(T0 - ", as.character(zz-1), ")", sep = "")
}
# Message
cat("\n")
print(paste(" Processing events from ", SufixFrom, " to ", SufixTo))
# Loop on Target
ind = lapply(Target, function(x) length(x) == zz)
TargetSub = Target[unlist(ind)]
FreqSub = Support[unlist(ind)]
for (jj in 1:length(TargetSub)){
temp = TargetSub[[jj]]
TempFrom = paste(temp[zz-1], SufixFrom, sep = " ")
TempTo = paste(temp[zz], SufixTo, sep = " ")
From = c(From, TempFrom)
To = c(To, TempTo)
Freq = c(Freq, FreqSub[jj])
}
} # end for loop on length of sequences
# All in same variable
Flows = data.frame("From" = From, "To" = To, "Occurence_Frequency" = Freq, stringsAsFactors = FALSE)
# Plot --------------------------------------------------------------------
plot(gvisSankey(Flows, from='From', to='To', weight="Occurence_Frequency",
options=list(height=900, width=1800, sankey="{link:{color:{fill:'lightblue'}}}")))
Thanks, Romain.
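For reference, the edge table built by the nested loops above can also be written without loops. The sketch below is only meant to make the structure of Flows easier to see: seqs and support are small made-up stand-ins for SeqCh and Support, and step_label() is a hypothetical helper. Like the loop, it takes only the last transition of each sequence; the rows come out in input order rather than sorted by sequence length.

```r
# Made-up stand-ins for SeqCh (list of sequences) and Support (frequencies)
seqs    <- list(c("Event1", "Event2"),
                c("Event1", "Event2", "Event3"),
                c("Event4", "Event2"))
support <- c(0.5, 0.3, 0.2)

# Step k = 0 is labelled "(T0)", step k > 0 is labelled "(T0 - k)"
step_label <- function(k) ifelse(k == 0, "(T0)", paste0("(T0 - ", k, ")"))

n    <- lengths(seqs)
keep <- n >= 2   # a flow needs at least two events

Flows <- data.frame(
  From = paste(mapply(function(s, k) s[k - 1], seqs[keep], n[keep]),
               step_label(n[keep] - 2)),
  To   = paste(mapply(function(s, k) s[k], seqs[keep], n[keep]),
               step_label(n[keep] - 1)),
  Occurence_Frequency = support[keep],
  stringsAsFactors = FALSE
)
Flows
```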

Calculate the bearing between more than two data points

I have some tracking data and I want to calculate the bearing over the course of the track. For two points we can use the earth.bear() function from the fossil package:
library(fossil)
# earth.bear(long1, lat1, long2, lat2)
earth.bear(-10.54427, 52.11112, -10.55493, 52.10944)
# 255.6118
However, this won't work for more than two points. Here's some sample data:
tracks <- read.table(text =
"latitude, longitude
52.111122, -10.544271
52.10944, -10.554933
52.108898, -10.558025
52.108871, -10.560946
52.113991, -10.582005
52.157223, -10.626506
52.194977, -10.652878
52.240215, -10.678817
52.26421, -10.720366
52.264015, -10.720642", header = TRUE, sep = ",")
Try this:
sum(
sapply(1:(nrow(tracks) - 1), function(i){
earth.bear(tracks$longitude[i], tracks$latitude[i],
tracks$longitude[i+1], tracks$latitude[i+1] )
})
)
# 2609.871
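If what is needed is one bearing per leg of the track rather than a single number, the same idea works without sum(). The sketch below does this in base R with a hand-written great-circle initial bearing; initial_bearing() is an illustrative function, not fossil's implementation, so small differences from earth.bear() are possible. Only the first three points of the sample data are used.

```r
# Initial great-circle bearing in degrees (0 = north, 90 = east), vectorised
initial_bearing <- function(lon1, lat1, lon2, lat2) {
  to_rad <- pi / 180
  phi1 <- lat1 * to_rad
  phi2 <- lat2 * to_rad
  dlon <- (lon2 - lon1) * to_rad
  y <- sin(dlon) * cos(phi2)
  x <- cos(phi1) * sin(phi2) - sin(phi1) * cos(phi2) * cos(dlon)
  (atan2(y, x) / to_rad) %% 360
}

tracks <- data.frame(
  latitude  = c(52.111122, 52.10944, 52.108898),
  longitude = c(-10.544271, -10.554933, -10.558025)
)

# Pair every point with the next one: n points give n - 1 bearings
n <- nrow(tracks)
bearings <- initial_bearing(tracks$longitude[-n], tracks$latitude[-n],
                            tracks$longitude[-1], tracks$latitude[-1])
bearings
```

The first value should be close to the 255.6 degrees reported for the two-point example in the question.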

Ordering Merged data frames

As a fairly new R programmer I seem to have run into a strange problem, probably due to my inexperience with R.
After reading and merging successive files into a single data frame, I find that order() does not sort the data as expected.
I have multiple references in each file, but each file refers to measurement data obtained at a different time.
Here's the code:
library(reshape)
# Enter file name to Read & Save data
FileName=readline("Enter File name:\n")
# Find first occurrence of file
for ( round1 in 1 : 6) {
ReadFile=paste(round1,"C_",FileName,"_Stats.csv", sep="")
if (file.exists(ReadFile))
break
}
x = data.frame(read.csv(ReadFile, header=TRUE),rnd=round1)
for ( round2 in (round1+1) : 6) {
#
ReadFile=paste(round2,"C_",FileName,"_Stats.csv", sep="")
if (file.exists(ReadFile)) {
y = data.frame(read.csv(ReadFile, header=TRUE),rnd = round2)
if (round2 == (round1 +1))
z=data.frame(merge(x,y,all=TRUE))
z=data.frame(merge(y,z,all=TRUE))
}
}
ordered = order(z$lab_id)
results = z[ordered,]
res = data.frame( lab=results[,"lab_id"],bw=results[,"ZBW"],wi=results[,"ZWI"],pf_zbw=0,pf_zwi=0,r = results[,"rnd"])
#
# Establish no of samples recorded
nsmpls = length(res[,c("lab")])
# Evaluate Z_scores for Between Lab Results
for ( i in 1 : nsmpls) {
if (res[i,"bw"] > 3 | res[i,"bw"] < -3)
res[i,"pf_zbw"]=1
}
# Evaluate Z_scores for Within Lab Results
for ( i in 1 : nsmpls) {
if (res[i,"wi"] > 3 | res[i,"wi"] < -3)
res[i,"pf_zwi"]=1
}
dd = melt(res, id=c("lab","r"), "pf_zbw")
b = cast(dd, lab ~ r)
If anyone can see why the ordering only works for about 55 of 70 records and could steer me in the right direction, I would be obliged.
Thanks very much.
Check whether z$lab_id is a factor (with is.factor(z$lab_id)).
If it is, try
z$lab_id <- as.character(z$lab_id)
if it is supposed to be a character vector; or
z$lab_id <- as.numeric(as.character(z$lab_id))
if it is supposed to be a numeric vector.
Then order it again.
Ps. I had previously put these in the comments.
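To illustrate why this matters: order() on a factor follows the order of its levels, not the text of the labels, and as.numeric() on a factor returns the internal level codes. The values below are made up for illustration and are not taken from the question's files.

```r
# A factor whose levels are not in alphabetical order
lab_id <- factor(c("L10", "L02", "L07"), levels = c("L10", "L07", "L02"))
order(lab_id)                # follows the level order: 1 3 2
order(as.character(lab_id))  # alphabetical order of the labels: 2 3 1

# Converting a factor of numbers directly gives the level codes
num_id <- factor(c("10", "2", "33"))
as.numeric(num_id)                # internal codes: 1 2 3
as.numeric(as.character(num_id))  # the actual values: 10 2 33
```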
