I would like to use the trans function of the soil.spec package to transform spectra using continuum removal, but I don't understand the data format of the raw spectra argument "raw".
The function example is:
trans(raw, tr = "continuum removed", order = , gap = )
Could someone show me an example of the "raw" matrix?
For continuum removal you can alternatively use the prospectr package:
require(prospectr)
data(NIRsoil)
If your spectral data is in absorbance units then:
crt <- continuumRemoval(X = NIRsoil$spc, type = 'A')
matplot(x = as.numeric(colnames(NIRsoil$spc)), y = t(crt),
        type = "l", lty = 1,
        xlab = "Wavelengths (nm)",
        ylab = "Absorbance (CR)",
        col = palette(gray(seq(0, 0.9, len = 25))))
If the spectral data is in reflectance units, the type argument must be set to 'R'.
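For example, a minimal sketch assuming the same NIRsoil example data were recorded in reflectance:
wavs <- as.numeric(colnames(NIRsoil$spc))  # wavelengths taken from the column names
crr <- continuumRemoval(X = NIRsoil$spc, wav = wavs, type = 'R')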
I have to say the soil.spec package is very weak on documentation. But, based on this quote from one of the I/O tools,
read.spc reads binary spectral spc-files from a folder into R. The
spectra can be made compatible (see details in make.comp) either to
the first sample wavebands or to the standard wavebands of the ICRAF
spectral lab. Information from the scanning method is gathered to
check on spectral comparability. The default has been set to ICRAF
spectral bands
My suspicion is that you need to have your files in whatever format "spectral spc-files" refers to, assuming that is an industry standard. Your best bet may be to contact the package maintainer directly.
To obtain a continuum-removal transformation of your spectra using the soil.spec library, proceed as follows:
Prepare the raw spectra table and ensure its columns contain the spectral data to be transformed. Remove all non-spectral columns and make sure there are no missing values.
Convert the column names of the raw spectra table to numeric format.
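For illustration, a rough sketch of what such a raw table might look like (the wavebands and values here are made up, not real spectra):
# Hypothetical raw matrix: 3 samples, wavebands 350-2500 nm in 2 nm steps.
# Rows are samples, columns are spectral values, column names are numeric wavelengths.
wavebands <- seq(350, 2500, by = 2)
raw <- matrix(runif(3 * length(wavebands)), nrow = 3,
              dimnames = list(paste0("sample", 1:3), wavebands))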
Proceed to run the transformation as shown below:
raw.cw <- trans(raw,tr="continuum removed", order=1, gap=21)
raw.cw contains three objects: the raw spectra prior to the transformation, the transformed spectra matrix (in your case, continuum removed), and the transformation method used.
To see these three objects run:
names(raw.cw)
raw.cw is an arbitrary object name assigned to the results obtained via the trans function.
Your continuum-removed spectra are extracted from the results using standard R syntax:
cw.spectra <- raw.cw$trans
We are updating the documentation of the soil.spec package, and some of these explanations will be included in the next release, which will bring additional functionality for handling spectral data.
Kindly let me know if this helps; if you face any difficulty following this guideline to get the expected results, I will be glad to help.
Best,
Andrew
ICRAF
I have a set of one-dimensional data points (locations on a segment), and I would like to test for Complete Spatial Randomness. I was planning to run the Gest (nearest neighbor), Fest (empty space) and Kest (pairwise distances) functions on it.
I am not sure how I should import my data set though. I can use ppp by setting a second dimension to 0, e.g.:
myDistTEST <- data.frame(
  col1 = sample(x = 1:100, size = 50, replace = FALSE),
  col2 = paste('Event', 1:50, sep = ''),
  stringsAsFactors = FALSE)
myDistTEST <- myDistTEST[order(myDistTEST$col1), ]
myPPPTest <- ppp(x = myDistTEST[, 1], y = replicate(n = 50, expr = 0),
                 c(1, 120), c(0, 0))
But I am not sure it is the proper way to format my data. I have also tried to use lpp, but I am not sure how to set the linnet object. What would be the correct way to import my data?
Thank you for your kind attention.
It would be wrong to simply set y = 0 for all your points and then proceed as if you had a point pattern in two dimensions. Your suggestion of using lpp is good. Regarding how to define the linnet and lpp, try to look at my answer here.
I have considered making a small package to handle one-dimensional patterns more easily in spatstat, but so far I have only started the package with a single function to make the definition of the appropriate lpp easier. If you feel adventurous you can install it from the GitHub repo via the remotes package:
remotes::install_github("rubak/spatstat.1d")
The single function you can use is called lpp1. It basically just wraps up the few steps described in the linked answer.
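If you prefer to set things up by hand, here is a rough sketch, assuming your locations lie on a segment from 0 to 120, of building the one-dimensional network and the lpp directly:
library(spatstat)
# Linear network consisting of a single segment from x = 0 to x = 120 on the line y = 0
verts <- ppp(x = c(0, 120), y = c(0, 0), window = owin(c(0, 120), c(-1, 1)))
L <- linnet(vertices = verts, edges = matrix(c(1, 2), ncol = 2))
# Hypothetical one-dimensional locations placed on that segment
locs <- sort(sample(1:100, 50))
X1d <- lpp(data.frame(x = locs, y = 0), L)
plot(X1d)
linearK(X1d)   # K-function on the network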
I used some raw output files from our flow cytometer, which tell me in a .csv which intensities it measured at which wavelength for every event/cell.
This resulted in a .csv with around 25000 cells and around 240 measuring points.
Importing the .csv file into RStudio and removing some measurements yielded a matrix with 25000 obs x 73 variables.
Then I used rPhenograph to calculate the neighborhoods, which worked well.
But now the output seems to be a data frame or some other object that I genuinely have no idea how to plot.
library(readr); library(dplyr)
library(cytofkit)   # provides Rphenograph()
Data1 <- read_csv("CD4_3.csv", skip = 17)
Data_selected <- select(Data1, ends_with(".A"))
rpheno_out <- Rphenograph(Data_selected)
I hoped to get a plot which looks/resembles a tSNE plot.
Instead, I only got an error message telling me that ggplot can't handle it.
ggplot(rpheno_out) + geom_point()
Error: data must be a data frame, or other object coercible by
fortify(), not an S3 object with class communities
I think you've misunderstood what the Rphenograph() function returns; the doc states:
A simple R implementation of the PhenoGraph algorithm, which is a clustering method designed for high-dimensional single-cell data analysis. It works by creating a graph ("network") representing phenotypic similarities between cells by calculating the Jaccard coefficient between nearest-neighbor sets, and then identifying communities using the well-known Louvain method in this graph.
This only builds clusters from your data based on the information you provide; the output contains no dimensionality-reduced version of your input.
If you want to see what your clustering looks like, you have to apply your favorite dimension-reduction technique and plot the result, color-coding the points with the cluster info from Rphenograph().
To give you an example, I've done this with the code provided in the function's doc:
library(cytofkit)
## Example from Rphenograph's doc
iris_unique <- unique(iris) # Remove duplicates
data <- as.matrix(iris_unique[,1:4])
Rphenograph_out <- Rphenograph(data, k = 45)
## Added bit to see the results
pca <- prcomp(iris_unique[,1:4], retx = TRUE, rank. = 2)
par(mfrow = c(1, 2))
plot(pca$x, col = Rphenograph_out$membership, lwd = 3,
     main = "Color by Rphenograph cluster")
plot(pca$x, col = iris_unique$Species, lwd = 3,
     main = "Color by Species")
This results in a side-by-side PCA plot: points colored by Rphenograph cluster (left) and by species (right).
I hope you can help me with a problem I can't find a way to overcome. Sorry if I make some mistakes while writing this post; my English is a bit rusty right now.
Here is the question. I have .shp data that I want to analyze in R. The .shp files can be either lines, representing the lines of traps we set to catch octopuses, or points located directly over those lines, representing where we captured one.
The question I'm trying to answer is: are the octopuses statistically clustered or not?
After a bit of investigation it seems to me that I need to use R and its linearK function to answer that question, using the libraries maptools, spatstat and sp.
Here is the code I'm using in RStudio:
Loading the libraries
library(spatstat)
library(maptools)
library(sp)
Creating a linnet object with the track
t1<- as.linnet(readShapeSpatial("./20170518/t1.shp"))
I get the following warning but it seems to work
Warning messages:
1: use rgdal::readOGR or sf::st_read
2: use rgdal::readOGR or sf::st_read
Plotting it to be sure everything is ok
plot(t1)
Creating a ppp object with the points
p1<- as.ppp(readShapeSpatial("./20170518/p1.shp"))
I get the same warning here, but the real problems start when I try to plot it:
> plot(p1)
Error in if (!is.vector(xrange) || length(xrange) != 2 || xrange[2L] < :
missing value where TRUE/FALSE needed
In addition: Warning messages:
1: Interpretation of arguments maxsize and markscale has changed (in spatstat version 1.37-0 and later). Size of a circle is now measured by its diameter.
2: In plot.ppp(x, ..., multiplot = FALSE, do.plot = FALSE) :
All mark values are NA; plotting locations only.
(warnings 3 to 7 repeat the same message)
Now what is left is to join the objects into an lpp object and to analyze it with the linearK function
> pt1 <- lpp(p1,t1)
> linearK(pt1)
Function value object (class ‘fv’)
for the function r -> K[L](r)
......................................
Math.label Description
r r distance argument r
est {hat(K)[L]}(r) estimated K[L](r)
......................................
Default plot formula: .~r
where “.” stands for ‘est’
Recommended range of argument r: [0, 815.64]
Available range of argument r: [0, 815.64]
This is my situation right now. What I don't know is why the plot function is not working with my ppp object, and how to interpret the return value of the linearK function. help(linearK) didn't provide any clue. Since I have a lot of tracks, each with its own set of points, my desired outcome would be some kind of summary like: x tracks analyzed, a clustered, b dispersed and c unknown.
Thank you for your time; I'll greatly appreciate it if you can help me solve this problem.
Edit: Here is a link to a zip file containing all the shp files of one day, both tracks and points, and a txt file with my code. https://drive.google.com/open?id=0B0uvwT-2l4A5ODJpOTdCekIxWUU
First, two pieces of general advice: (1) each time you create a complicated object, print it at the terminal to see if it is what you expected; (2) when you get an error, immediately type traceback() and copy the output. This will reveal exactly where the error is detected.
A ppp object must include a specification of the study region (window). In your code, the object p1 is created by converting data of class SpatialPointsDataFrame, which do not include a specification of the study region, via the function as.ppp.SpatialPointsDataFrame, into an object of class ppp in which the window is guessed by taking the bounding box of the coordinates. Unfortunately, in your example, there is only one data point in p1, so the default bounding box is a rectangle of width 0 and height 0. [This would have been revealed by printing p1.] Such objects can usually be handled by spatstat, but this particular object triggers a bug in the function plot.solist, which expects windows to have non-zero size. I will fix the bug, but...
In your case, I suggest you do
Window(p1) <- Window(t1)
immediately after creating p1. This will ensure that p1 has the window that you probably intended.
If all else fails, read the spatstat vignette on shapefiles...
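For concreteness, a short sketch of the whole sequence with the fix applied (same files as in the question; the print call is the sanity check suggested above):
library(spatstat)
library(maptools)
t1 <- as.linnet(readShapeSpatial("./20170518/t1.shp"))
p1 <- as.ppp(readShapeSpatial("./20170518/p1.shp"))
Window(p1) <- Window(t1)   # give p1 the study region it was missing
print(p1)                  # check the pattern and its window
pt1 <- lpp(p1, t1)
plot(pt1)
linearK(pt1)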
I have managed to find a solution. As Adrian Baddeley noticed, there was a problem with the owin object. That problem seems to be bypassed (not really solved) if I create the ppp object manually instead of converting my set of points.
I have also replaced the readShapeSpatial function with rgdal::readOGR, since the former is deprecated, and that was the reason for the warnings I was getting.
This is the R script I'm using right now, commented to clarify:
#first install spatstat, maptools and sp
#load them
library(spatstat)
library(maptools)
library(sp)
#create an array of folders, will add more when everything works fine
folders <- c("20170518")
for(f in folders){
  #read all shp files from that folder, both points and tracks
  pointfiles <- list.files(paste("./", f, "/points", sep=""), pattern="*.shp$")
  trackfiles <- list.files(paste("./", f, "/tracks", sep=""), pattern="*.shp$")
  #for each point and track couple
  for(i in 1:length(pointfiles)){
    #create a linnet object with the track
    t <- as.linnet(rgdal::readOGR(paste("./", f, "/tracks/", trackfiles[i], sep="")))
    #plot(t)
    #create a ppp object for each set of points
    pre_p <- rgdal::readOGR(paste("./", f, "/points/", pointfiles[i], sep=""))
    #plot(pre_p)
    #obtain the coordinates of the current set of points
    c <- coordinates(pre_p)
    #create vector of x coords
    xc <- c()
    #create vector of y coords
    yc <- c()
    #not a very good way to fill my vectors but it works for my study area
    for(v in c){
      if(v > 4000000){ yc <- c(yc, v) }
      else { if(v < 4000000 && v > 700000){ xc <- c(xc, v) } }
    }
    print(xc)
    print(yc)
    #create a ppp object using the vectors of x and y coords, and a window object
    #extracted from my set of points
    p <- ppp(xc, yc, window = Window(as.ppp(pre_p)))
    #join them into an lpp object
    pt <- lpp(p, t)
    #plot(pt)
    #analyze it with the linearK function, nsim=9 for testing purposes
    #envelope.lpp is the method for analyzing linear point patterns
    assign(paste("results", f, i, sep="_"), envelope.lpp(pt, nsim=9, fun=linearK))
  }#end for each points & track set
}#end for each day of study
So as you can see, this script tests for CSR on each pair of points and track for each day, and it is working fine right now. Unfortunately I have not managed to create a report or report-like summary of the results yet (or even to fully understand them); I'll keep working on that. Of course I can use any advice you have, since this is my first try with R and many newbie mistakes will happen.
The script and the shp files with the updated folder structure can be found here (113 KB).
I have been running two unmarked planar point pattern data sets through a series of spatstat functions. Now I would like to use the Kcross.inhom function to describe the interaction between the two, but Kcross only works with marked data, so I have combined all the x-y data into one csv file and added a column that distinguishes the two. I have established the following point pattern object, but do not understand how to edit the subsequent example of Kcross for my purposes. Or perhaps there is a better way? Thanks for your help!
# read in data & create ppp
collisionspotholes<-read.csv("cpmulti.csv")
cp <- ppp(collisionspotholes[,3], collisionspotholes[,4],
          c(40.50390735, 40.91115166), c(-74.25262139, -73.7078596))
# synthetic example
pp <- runifpoispp(50)
pp <- pp %mark% factor(sample(0:1, npoints(pp), replace=TRUE))
K <- Kcross(pp, "0", "1")
K <- Kcross(pp, 0, 1) # equivalent
I am not really clear as to what the problem is that you are having. You seem to me to "be there" essentially. However let me, for completeness, spell out the procedure that you should follow:
Let X and Y be your two point patterns (observed, presumably, in the same window).
Put these together into a single pattern:
XY <- superimpose(X=X,Y=Y)
Note that there is no need to dick around with your csv files; it is much more efficient to use the facilities provided by spatstat.
The foregoing syntax produces a multitype point pattern with marks being a factor with levels "X" and "Y". (If you want the levels to be denoted by other symbols you can easily arrange this.)
Then just calculate the inhomogeneous Kcross function:
Ki <- Kcross.inhom(XY,"X","Y")
That is all that there is to it.
Note that the foregoing uses the default method of estimating the intensities of the two patterns, namely leave-one-out kernel smoothing with bandwidth chosen by bw.diggle(). There may be better ways of estimating the intensities, perhaps by fitting a parametric model. This depends on the nature of the information available to you.
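For example, one could fit simple parametric intensity models with ppm() and pass the fitted intensities explicitly; a rough sketch, assuming XY from above and a log-quadratic trend:
# Parametric (log-quadratic) intensity estimates for each type,
# supplied to Kcross.inhom() via lambdaI and lambdaJ.
fitX <- ppm(split(XY)$X ~ polynom(x, y, 2))
fitY <- ppm(split(XY)$Y ~ polynom(x, y, 2))
lamX <- predict(fitX, locations = split(XY)$X)   # fitted intensity at the X points
lamY <- predict(fitY, locations = split(XY)$Y)   # fitted intensity at the Y points
Ki2 <- Kcross.inhom(XY, "X", "Y", lambdaI = lamX, lambdaJ = lamY)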
Interpreting the output of Kcross.inhom() is, IMHO, subtle and difficult.
Be cautious in any conclusions that you draw.
Rolf Turner's answer is correct. However, you say that
I have combined all x-y data into one csv file and added a column that distinguishes the two.
OK, suppose the data frame is called df and it has columns named x and y giving the spatial coordinates, and h, a character vector identifying whether the corresponding point is a pothole (h="p") or a collision (h="c"). Then you could do
X <- ppp(df$x, df$y, xlim, ylim, marks=factor(df$h))
where xlim, ylim are the limits for the spatial coordinates. Or more elegantly
X <- with(df, ppp(x, y, xlim, ylim, marks=factor(h)))
Note the use of factor to ensure that the marks are categorical values. Then type
X
to check that you've got a 'multitype point pattern'.
Then you can do, e.g.
K <- Kcross(X)
Ki <- Kcross.inhom(X)
Please read the help files for Kcross, Kcross.inhom for advice about how to use these functions and how to interpret the results.
Incidentally, please do not send the same question to multiple forums at the same time. That is difficult for those who have to answer.
Your comments, suggestions, or solutions will be greatly appreciated; thank you.
I'm using the fpc package in R to do a DBSCAN analysis of some very dense data (3 sets of 40,000 points, each in the range -3 to 6).
I've found some clusters, and I need to graph just the significant ones. The problem is that I have a single cluster (the first) with about 39,000 points in it. I need to graph all other clusters but this one.
dbscan() creates a special data type to store all of this cluster data. It's not indexed like a data frame would be (but maybe there is a way to represent it as such?).
I can graph the dbscan type using a basic plot() call. But, like I said, this will graph the irrelevant 39,000 points.
tl;dr:
How do I graph only specific clusters of a dbscan data type?
If you look at the help page (?dbscan) it is organized like all others into sections labeled Description, Usage, Arguments, Details and Value. The Value section describes what the function dbscan returns. In this case it is simply a list (a standard R data type) with a few components.
The cluster component is simply an integer vector, whose length is equal to the number of rows in your data, that indicates which cluster each observation is a member of. So you can use this vector to subset your data to extract only those clusters you'd like, and then plot just those data points.
For example, if we use the first example from the help page:
library(fpc)
set.seed(665544)
n <- 600
x <- cbind(runif(10, 0, 10) + rnorm(n, sd = 0.2),
           runif(10, 0, 10) + rnorm(n, sd = 0.2))
ds <- dbscan(x, 0.2)
we can then use the result, ds, to plot only the points in clusters 1-3:
#Plot only clusters 1, 2 and 3
plot(x[ds$cluster %in% 1:3,])
Without knowing the specifics of dbscan, I can recommend that you look at the function smoothScatter. It is very useful for examining the main patterns in a scatterplot when you otherwise would have too many points to make sense of the data.
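A minimal sketch of what that looks like, on simulated data of roughly the size you describe:
# Simulated dense 2D data; smoothScatter draws a density-shaded scatterplot
set.seed(1)
xy <- matrix(rnorm(2 * 40000), ncol = 2)
smoothScatter(xy, xlab = "x", ylab = "y")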
Probably the most sensible way of plotting DBSCAN results is using alpha shapes, with the radius set to the epsilon value. Alpha shapes are closely related to convex hulls, but they are not necessarily convex; the alpha radius controls the amount of non-convexity allowed.
This is quite closely related to the DBSCAN cluster model of density connected objects, and as such will give you a useful interpretation of the set.
As I'm not using R, I don't know about the alpha shape capabilities of R. From a quick check on Google, there supposedly is a package called alphahull.
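A rough sketch of that idea, assuming the alphahull package and reusing x, ds and the eps value (0.2) from the fpc example above:
library(alphahull)
# Alpha shape around the points of one DBSCAN cluster, with radius equal to epsilon.
# Note that ashape() requires the points to be in general position (no duplicates).
pts <- x[ds$cluster == 2, ]
as2 <- ashape(pts, alpha = 0.2)
plot(as2)
points(pts, pch = 20, cex = 0.5)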