I have some classified raster layers that are categorical land cover maps. All the layers have exactly the same categories (let's say "water", "Trees", "Urban", "bare soil"), but they come from different time points (e.g. 2005 and 2015).
I load them into memory using the raster function like this:
library(raster)
comp <- raster("C:/workingDirectory4R/rasterproject/2005marsh3.rst")
ref <- raster("C:/workingDirectory4R/rasterproject/2013marsh3.rst")
"comp" is the comparison map at time t+1 and "ref" is the reference map from time t. Then I used the crosstab function to generate the confusion table, which can be used to explore how the categories changed over the time interval.
contingency.Matrix <- crosstab(comp, ref)
The result is a matrix with the "comp" categories in the columns and the "ref" categories in the rows, and the column and row names are just the numbers 1 to 4.
Now I have 2 questions, and I would really appreciate any help on how to solve them.
1- I want to assign the category names to the columns and rows of
the matrix to make it easier to interpret.
2- Now let's say I have three raster layers for 2005, 2010 and 2015.
This means I would have two confusion tables, one for 2005-2010 and
another for 2010-2015. What's the best way to automate
this process with minimal user interaction?
My idea was to ask the user to load the raster layers and have the code store them in a list. Then I would ask the user for a vector of years, but how can I make sure that the order of the raster layers matches the order of the years? And is there a more elegant way to do this?
Thanks
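For the second question, one way to keep layers and years from getting out of sync is to not ask for the years separately at all: read them from the file names, sort by year, and cross-tabulate each consecutive pair. The sketch below assumes every file name contains exactly one 4-digit year (as in "2005marsh3.rst"); the helper names year_from_name and pairwise_change are hypothetical. The tabulating function is passed in as an argument, defaulting to base table() so the logic can be tried without raster; for real RasterLayers you would pass raster::crosstab.

```r
# Extract a 4-digit year from each file name, e.g. "2005marsh3.rst" -> 2005.
# Assumption: every file name contains exactly one 4-digit year.
year_from_name <- function(paths) {
  nm <- basename(paths)
  as.integer(regmatches(nm, regexpr("[0-9]{4}", nm)))
}

# Cross-tabulate each pair of consecutive layers after sorting them by year.
# xtab defaults to base::table so the sketch runs without raster;
# for RasterLayer objects pass xtab = raster::crosstab instead.
pairwise_change <- function(layers, years, xtab = table) {
  stopifnot(length(layers) == length(years), length(layers) >= 2)
  o <- order(years)
  layers <- layers[o]
  years <- years[o]
  n <- length(layers)
  # Earlier layer is the reference (ref), later layer the comparison (comp)
  tabs <- Map(function(ref, comp) xtab(comp, ref), layers[-n], layers[-1])
  names(tabs) <- paste(years[-n], years[-1], sep = "-")
  tabs
}

# Usage with real data (requires raster; folder taken from the question):
# library(raster)
# files  <- list.files("C:/workingDirectory4R/rasterproject",
#                      pattern = "\\.rst$", full.names = TRUE)
# layers <- lapply(files, raster)
# tabs   <- pairwise_change(layers, year_from_name(files), xtab = raster::crosstab)
# tabs[["2005-2010"]]
```

Because the years are derived from the same file names that are loaded, the user only has to point at a folder, and the tables come back as a list named by interval.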
I found a partial answer to my first question. If the categorical map was created in TerrSet (IDRISI) with the ".rst" extension, I can extract the category names like this:
comp <- raster("C:/rasterproject/2005subset.rst")
attributes <- data.frame(comp@data@attributes)
categories <- as.character(attributes[,8])
and I get a vector with the category names. However, if the raster layers were created in other software (with a different file format), the code won't work. For instance, if the raster was created in ENVI, the third line of the code has to be changed to:
categories <- as.character(attributes[,2])
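Once a categories vector is available, attaching it to the confusion table only requires replacing the dimnames. The sketch below uses a small made-up table built with base table() as a stand-in for crosstab(comp, ref) (ref in rows, comp in columns, as described in the question), and assumes the class codes are 1 to 4 in the same order as the labels. Each label is looked up by its code rather than by position, so the mapping stays correct even if a class happens to be absent from one map.

```r
# Category labels in code order (code 1 = "water", ..., code 4 = "bare soil")
categories <- c("water", "Trees", "Urban", "bare soil")

# Stand-in for crosstab(comp, ref): a table whose dimnames are numeric class codes.
# These codes are made up purely for illustration.
ref.codes  <- factor(c(1, 1, 2, 3, 4, 4, 2), levels = 1:4)
comp.codes <- factor(c(1, 2, 2, 3, 4, 3, 2), levels = 1:4)
contingency.Matrix <- table(ref = ref.codes, comp = comp.codes)

# Replace each numeric code with its label, looked up by code value
rownames(contingency.Matrix) <- categories[as.integer(rownames(contingency.Matrix))]
colnames(contingency.Matrix) <- categories[as.integer(colnames(contingency.Matrix))]
contingency.Matrix
```

If the codes are not 1..n (for example 0 to 3, or 10, 20, 30), a named lookup vector such as c("10" = "water", ...) indexed by the dimnames would do the same job.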
Related
I am a beginner in R, and I want to create a data frame that maps each classified time period to its range of dates.
paleo.periods <- c("Paleoindian","Early Paleoindian", "Middle Paleoindian", "Late Paleoindian", "Archaic","Early Archaic", "Middle Archaic","Late Archaic","Woodland","Early Woodland","Middle Woodland","Late Woodland","Late Prehistoric")
paleo.dates <- c(c(13500,8000), c(13500,10050) ,c(10050,9015), c(9015,8000), c(8000,2500), c(8000,5500), c(5500,3500), c(3500,2500), c(2500,1150), c(2500,2000), c(2000,1500), c(1500,1150), c(1150,500))
I would like to arrange it so that I can refer to a given time period, e.g. "Late Woodland", and get the associated vector of its beginning and end dates, e.g. (1500,1150).
I tried simply doing this:
paleo.seg <- data.frame(paleo.periods,paleo.dates)
However, this creates 3 variables: a list of the periods, a list of the vectors, and paleo.dates. I am not sure why it creates 3 variables, since I want only 2: paleo.periods and paleo.dates. I would also like to refer to them as paleo.seg$paleo.periods, which would return the list of periods (so that I can later refer to the periods individually), and likewise for the dates.
Essentially I would like my dataframe to look a bit like this:
paleoperiods paleodates
"Late Woodland" 1500,1150
That way I could look up the string "Late Woodland" and get its vector of dates. I tried this on my current data.frame, but
"Woodland" %in% paleo.seg returns FALSE. So I feel like I am misunderstanding how to build a proper data frame, and how to match one categorical variable to two dates.
There are a few ways you could go about this, depending on what you want to do with your data frame. My recommendation would be to split the dates into two separate columns (start and end, I believe, from your description). This way you can calculate with the dates or filter on them, which I've found useful when exploring data because you can filter on either end of the range. If you would rather keep both dates in one column, you could store each range as a character string. However, that approach makes exploratory data analysis harder. An example of this would be:
paleo.dates <- c("13500,8000","13500,10050","10050,9015","9015,8000", ...)
This would let you look up "Late Woodland" and get "1500,1150", but you wouldn't be able to search for periods occurring after 1500, if that is the kind of analysis you need to do later.
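A minimal base-R sketch of the two-column (start/end) layout recommended above. It also explains the odd shape of the original attempt: c(c(13500,8000), ...) flattens everything into one numeric vector of 26 values, so the pairs have to be reshaped back into rows. Here matrix(..., byrow = TRUE) turns consecutive pairs into a 13 x 2 matrix, assuming paleo.dates lists the (start, end) pairs in the same order as paleo.periods.

```r
paleo.periods <- c("Paleoindian", "Early Paleoindian", "Middle Paleoindian", "Late Paleoindian",
                   "Archaic", "Early Archaic", "Middle Archaic", "Late Archaic",
                   "Woodland", "Early Woodland", "Middle Woodland", "Late Woodland",
                   "Late Prehistoric")
# c() of c() pairs flattens to a single vector of 26 numbers
paleo.dates <- c(c(13500,8000), c(13500,10050), c(10050,9015), c(9015,8000), c(8000,2500),
                 c(8000,5500), c(5500,3500), c(3500,2500), c(2500,1150), c(2500,2000),
                 c(2000,1500), c(1500,1150), c(1150,500))

# Reshape consecutive (start, end) pairs into rows of a 13 x 2 matrix
date.mat <- matrix(paleo.dates, ncol = 2, byrow = TRUE)
paleo.seg <- data.frame(paleo.periods, start = date.mat[, 1], end = date.mat[, 2],
                        stringsAsFactors = FALSE)

# Look up one period by name
paleo.seg[paleo.seg$paleo.periods == "Late Woodland", c("start", "end")]

# Membership tests work on the column, not on the whole data frame
"Woodland" %in% paleo.seg$paleo.periods

# Range queries: periods that ended more recently than 1500
subset(paleo.seg, end < 1500)$paleo.periods
```

The last line returns "Woodland", "Late Woodland" and "Late Prehistoric", which is exactly the kind of query the single character column cannot answer.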
I have a panel dataset with population data. I am working mostly with two vectors: population and households. The household vector (there are 3 countries) has a substantial number of missing values, while the population vector is complete. I use a model with population as the independent variable to estimate the missing household values. What function should I use to extract these values? I do not need to make any forecasts, just to impute the missing data.
Thank you.
EDIT:
This is a screenshot of my dataset:
https://imagizer.imageshack.us/v2/1366x440q90/661/RAH3uh.jpg
As you can see, many values with datatype = "original" are missing, and I need to impute them somehow. I have created several panel data models (pooled, within, between) and, without further consideration, tried to extract the missing values with each of them; however, I do not know how to do this.
EDIT 2: What I need is not how to decide which model to use, but how to get the model's predicted values for the missing observations (making the dataset more balanced).
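The general recipe is: fit the model on the rows where households is observed, call predict() on the rows where it is missing, and write those predictions back only into the gaps. The question uses plm models; this sketch swaps in base lm() with a country factor instead (the least-squares dummy-variable form of the within estimator), because predict() for lm is well established. The panel below is made-up toy data constructed so that the imputed values can be checked exactly; a pooled version would simply be lm(households ~ population).

```r
# Toy panel: 3 countries x 5 years, made-up numbers for illustration only
panel <- data.frame(
  country    = factor(rep(c("A", "B", "C"), each = 5)),
  year       = rep(2001:2005, times = 3),
  population = c(100, 110, 120, 130, 140, 200, 205, 210, 220, 230, 50, 55, 60, 62, 70)
)
# Households = country intercept + 0.25 per person (no noise), so results are checkable
true.hh <- c(A = 10, B = 20, C = 30)[as.character(panel$country)] + 0.25 * panel$population
panel$households <- unname(true.hh)
panel$households[c(2, 4, 8, 15)] <- NA   # knock out some observations

# Fit on complete rows only (lm drops NA rows by default);
# the country dummies give each country its own intercept, like a within model
fit <- lm(households ~ population + country, data = panel)

# Predict only the missing rows and keep the observed values untouched
miss <- is.na(panel$households)
panel$imputed <- miss
panel$households[miss] <- predict(fit, newdata = panel[miss, ])
panel
```

Keeping the imputed flag makes it easy to tell original and model-filled values apart later, which mirrors the datatype column in the screenshot.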
I am a new R user and an inexperienced coder, and I have a data handling problem. Hopefully someone can help:
I have a data.frame with 3 columns (firm, year, class) and about 50,000 rows. For every firm, I want to generate and store a (class x year) matrix whose elements are the class counts. Each matrix would be named automatically, something like firm.name, and stored so that I can use it later for computations. Ideally, I'd also be able to replace the simple class counts with a function of the values in columns 4 and 5 (backward and forward citations).
I am looking at 40 firms, 30 years, and about 1500 classes (so many firm-year-class counts are zero).
I realise I can get most of what I need (for counts) by simply using table(class,year,firm) as these columns have the same length. However, I don't know how to either store or access the matrices this function generates...
Any help would be greatly appreciated!
Simon
So, your question is how to deal with a table object?
Example:
#note the assignment operator
mytable <- with(ChickWeight, table(cut(weight, c(0,100,200,Inf)), Diet, Chick))
#access the data for the first chick
mytable[,,1]
#turn the table object into a data.frame
as.data.frame(mytable)
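Applied to the firm/year/class data, the same idea gives one 3-D table that can be indexed by firm name, plus an optional named list holding one class x year matrix per firm (usually easier to work with than many separately named objects created with assign()). For the citation-weighted version, xtabs() sums a numeric column given on the left-hand side of the formula instead of counting rows. The data frame and the fwd.cites column name below are made up for illustration; at 1500 classes x 30 years x 40 firms the dense table has about 1.8 million cells, which should fit comfortably in memory.

```r
# Toy patent records (made up); the real data has about 50,000 rows
patents <- data.frame(
  firm      = c("Acme", "Acme", "Acme", "Beta", "Beta"),
  year      = c(2001, 2001, 2002, 2001, 2002),
  class     = c("A01", "A01", "B02", "B02", "A01"),
  fwd.cites = c(3, 1, 4, 0, 2)
)

# 3-D count table: class x year x firm, stored once and indexed by name
counts <- with(patents, table(class, year, firm))
counts[, , "Acme"]          # class x year matrix for one firm

# One matrix per firm in a named list
firms <- dimnames(counts)$firm
firm.mats <- setNames(lapply(firms, function(f) counts[, , f]), firms)
firm.mats[["Beta"]]

# Sum a numeric column instead of counting rows
cites <- xtabs(fwd.cites ~ class + year + firm, data = patents)
cites[, , "Acme"]
```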
I am working with NDVI3g data sets. I am trying to create monthly composites from the original half-monthly data sets in R using the maximum value composite method. I have tried my best but couldn't figure it out, so I would appreciate your help. Each month has two composites, named for example as below:
AF99sep15a.n14-VI3g: first 15 days
AF99sep15b.n14-VI3g: last 15 days
I have 31 years of data (1982-2012).
How can I combine the whole data set into monthly composites?
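Given a RasterStack gimms whose half-monthly layers are in chronological order (a, b, a, b, ...), and since you want the maximum of each sequential pair, I think you can do:
library(raster)
i <- rep(1:(nlayers(gimms)/2), each = 2)
x <- stackApply(gimms, i, max)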
Make sure to also check out the gimms package, which includes the function monthlyComposite (with optional parallel support) for creating monthly maximum value composites from the original half-monthly layers. The function is largely based on stackApply from the raster package.
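The rep(..., each = 2) index above relies on the layers being perfectly ordered and paired. A slightly safer variant derives the month from each file name, so that every "a"/"b" pair is grouped by its actual year and month. This sketch assumes every name contains a tag like "99sep15a" (2-digit year, lowercase month abbreviation, "15", half-month letter) and that no file predates 1981; the helper name gimms_month_key is hypothetical. The raster part is shown as comments because it needs the real stack.

```r
# Build a "YYYY-MM" key from GIMMS NDVI3g file names such as "AF99sep15a.n14-VI3g"
gimms_month_key <- function(files) {
  nm  <- basename(files)
  # e.g. "99sep15a": 2-digit year, month abbreviation, "15", half-month letter
  tag <- regmatches(nm, regexpr("[0-9]{2}[a-z]{3}15[ab]", nm))
  stopifnot(length(tag) == length(files))        # every name must contain the tag
  yy    <- as.integer(substr(tag, 1, 2))
  year  <- ifelse(yy >= 81, 1900L, 2000L) + yy   # assumes no data before 1981
  month <- match(substr(tag, 3, 5), tolower(month.abb))
  sprintf("%d-%02d", year, month)
}

files <- c("AF99sep15a.n14-VI3g", "AF99sep15b.n14-VI3g", "AF00jan15a.n14-VI3g")
key <- gimms_month_key(files)
idx <- match(key, unique(key))   # one integer group per month, in first-seen order

# With the real RasterStack 'gimms' built from 'files' in the same order:
# library(raster)
# mvc <- stackApply(gimms, idx, fun = max, na.rm = TRUE)
# names(mvc) <- unique(key)
```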
I have about 500,000 occurrence points in R for a migratory bird species throughout the US.
I am attempting to overlay a grid on these points and then count the number of occurrences in each grid cell. Once the counts have been tallied, I want to link them to a grid cell ID.
In R, I've used the over() function to just get the points within the range map, which is a shapefile.
library(sp)     # coordinates(), over(), proj4string()
library(rgdal)  # readOGR()
#Read in occurrence data
data=read.csv("data.csv", header=TRUE)
coordinates(data)=c("LONGITUDE","LATITUDE")
#Get shapefile of the species' range map
range=readOGR(".",layer="data")
proj4string(data)=proj4string(range)
#Get points within the range map
inside.range=!is.na(over(data,as(range,"SpatialPolygons")))
The above worked exactly as I hoped, but it does not address my current problem: how to handle points stored as a SpatialPointsDataFrame together with a grid stored as a raster. Would you recommend polygonizing the raster grid and using the same method as above? Or would another approach be more efficient?
First of all, your R code doesn't work as written. I would suggest copy-pasting it into a clean session and, if it errors out for you as well, fixing the syntax errors or loading the add-on packages it needs until it runs.
That said, I assume you end up with a data.frame of two-dimensional numeric coordinates. For binning and counting them, any such data will do, so I took the liberty of simulating a dataset. Please correct me if this doesn't capture a relevant aspect of your data.
## Skip this line if you are the OP, and substitute the real data instead.
data <- data.frame(LATITUDE = runif(100, 1, 100), LONGITUDE = runif(100, 1, 100))
## Add the latitudes and longitudes between which each observation is located
## You can substitute any number of breaks you want. Or, a vector of fixed cutpoints
## LATgrid and LONgrid are going to be factors. With ugly level names.
data$LATgrid <- cut(data$LATITUDE, breaks = 10, include.lowest = TRUE)
data$LONgrid <- cut(data$LONGITUDE, breaks = 10, include.lowest = TRUE)
## Create a single factor that gives the lat,long of each observation.
data$IDgrid <- with(data, interaction(LATgrid, LONgrid))
## Now, create another factor based on the above one, with shorter IDs and no empty levels
data$IDNgrid <- factor(data$IDgrid)
levels(data$IDNgrid) <- seq_along(levels(data$IDNgrid))
## If you want total grid-cell count repeated for each observation falling into that grid cell, do this:
data$count <- ave(data$LATITUDE, data$IDNgrid, FUN = length)
## You could have also used data$LONGITUDE, doesn't matter in this case
## If you want just a table of counts at each grid-cell, do this:
aggregate(data$LATITUDE, data[, c('LATgrid', 'LONgrid', 'IDNgrid')], FUN = length)
## I included the LATgrid and LONgrid vectors so there would be some
## sort of descriptive reference accompanying the anonymous numbers in IDNgrid,
## but only IDNgrid is actually necessary
## If you want a really minimalist table, you could do this:
table(data$IDNgrid)
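Since the question asks about a raster grid specifically, another option that avoids polygonizing is to compute, for every point, the number of the grid cell it falls in, and then tabulate those numbers. The base-R sketch below uses the same cell numbering convention as the raster package (cell 1 in the top-left corner, numbered left to right, row by row). The function name count_points_in_grid and its extent/resolution arguments are hypothetical; in practice they would come from your grid (for example from bbox(range) and a chosen cell size).

```r
# Count points per cell of a regular grid, returning raster-style cell IDs
# (cell 1 = top-left, numbered row by row). Points outside the extent are dropped.
count_points_in_grid <- function(x, y, xmin, xmax, ymin, ymax, res) {
  ncol <- ceiling((xmax - xmin) / res)
  nrow <- ceiling((ymax - ymin) / res)
  inside <- x >= xmin & x <= xmax & y >= ymin & y <= ymax
  # Column counted from the left, row counted from the top; pmin() keeps
  # points lying exactly on the right/bottom edge in the last column/row
  col <- pmin(floor((x[inside] - xmin) / res) + 1, ncol)
  row <- pmin(floor((ymax - y[inside]) / res) + 1, nrow)
  cell <- (row - 1) * ncol + col
  data.frame(cell = seq_len(nrow * ncol),
             count = tabulate(cell, nbins = nrow * ncol))
}

# Toy example: a 2 x 2 grid of 1-unit cells covering 0..2 in both directions
count_points_in_grid(x = c(0.5, 0.5, 1.5, 1.9), y = c(1.5, 1.2, 0.5, 0.1),
                     xmin = 0, xmax = 2, ymin = 0, ymax = 2, res = 1)
```

With the raster package installed, table(cellFromXY(r, coordinates(data))) on a RasterLayer r should give comparable cell numbers directly; I would still check points lying exactly on cell boundaries before relying on exact agreement between the two.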