R Kohonen map - How to find the position of a single data point?

I have a data frame df with my data of interest.
I rescale it with
df.sc <- scale(df)
and make my Kohonen map with
df.grid <- somgrid(15, 10, "hexagonal")
df.som <- som(df.sc, rlen=700, grid = df.grid)
That works fine and I get a nice map.
Now I have an extra data point:
extra.sc <- as.matrix(-0.29985191, -0.35905786, -0.260923297, -0.2415673150,
-0.259426676, -0.330404078)
It is scaled exactly the same way as df.sc.
Now I want to see which unit of the Kohonen map df.som the point extra.sc belongs to.
map(df.som, extra.sc)
does not give me what I want.
How can I determine the position of extra.sc within df.som? And, preferably, how can I mark it on the map?

Maybe you defined your new data incorrectly, i.e. it does not have the same dimensions as the training data. Check the value of extra.sc by wrapping it in parentheses: (extra.sc). In fact, as.matrix() passes the extra numbers to its ... argument, where they are ignored, so your extra.sc is a 1 x 1 matrix holding only the first value. I recommend that you define extra.sc with matrix() and c() instead of as.matrix(), and give the number of rows and columns explicitly. For example:
extra.sc <- matrix(c(-0.29985191, -0.35905786, -0.260923297, -0.2415673150, -0.259426676, -0.330404078), nrow = 1, ncol = 6)
and look at the result:
(extra.sc)
It has one row and six columns. If you do not specify the shape, R treats the values as one column with six rows:
extra.sc <- matrix(c(-0.29985191, -0.35905786, -0.260923297, -0.2415673150, -0.259426676, -0.330404078))
(extra.sc)
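Once extra.sc has the right shape, what remains is reading off the winning unit and marking it. The following is a minimal sketch, assuming the kohonen package (3.x), where map() returns a list whose unit.classif element holds the index of the best-matching unit, and the trained object stores the unit coordinates in $grid$pts. The data frame df and the new point extra are simulated stand-ins; the new point's column names are set to match the training data as a precaution. kohonen::map is written out in full because maps::map or purrr::map can mask it.

```r
library(kohonen)   # somgrid(), som(), map() and a plot() method for SOMs

set.seed(1)
# Hypothetical stand-in for df: 200 rows, 6 numeric columns
df <- as.data.frame(matrix(rnorm(200 * 6), ncol = 6))

df.sc   <- scale(df)
df.grid <- somgrid(15, 10, "hexagonal")
df.som  <- som(df.sc, rlen = 700, grid = df.grid)

# Hypothetical new point: a 1 x 6 matrix with the same column names as df
extra <- matrix(rnorm(6), nrow = 1)
colnames(extra) <- colnames(df)

# Scale it with the centre and spread of the training data, not its own
extra.sc <- scale(extra,
                  center = attr(df.sc, "scaled:center"),
                  scale  = attr(df.sc, "scaled:scale"))

# Map the new row onto the trained SOM
extra.map <- kohonen::map(df.som, extra.sc)
unit <- extra.map$unit.classif      # index of the best-matching unit (1..150)
pos  <- df.som$grid$pts[unit, ]     # x/y coordinates of that unit

# Draw the map with the training data and mark the new point's unit
plot(df.som, type = "mapping")
points(pos[1], pos[2], pch = 19, col = "red", cex = 2)
```

With your own objects only the last part is needed: unit is the number of the unit, and pos is its position in the plotting coordinates of the map, which is why points() lands on the right hexagon.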

Related

R function to find the index of a value in an array nearest to a given value

I'm currently working on setting up a locator for some ECG plots in R. The idea is to click twice (locator(n = 2)) to get minimum and maximum x values and then zoom in on that part of the ECG.
The issue is that I get two rounded values in the array location_array for the new range of x values to plot, but when I use them to subset my data frame df, they are treated as row indexes, whereas they are actually x-axis values (times) returned by the locator.
Long story short: how can I get the indexes of the values in df$time that are closest to the min and max in location_array?
par(ask = TRUE)
location_array <- locator(n = 2)
location_array <- round(location_array$x)
attach(df)
#need the indexes of the values closest to location_array[1] and location_array[2] of df$time
df2 <- df[location_array[1]:location_array[2],]
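In case anyone stumbles upon my question, I solved it by replacing each clicked value with the index of the closest entry in the time column:
location_array[1] <- which.min(abs(df$time - location_array[1]))
location_array[2] <- which.min(abs(df$time - location_array[2]))
After that, df[location_array[1]:location_array[2],] selects the intended rows.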
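The same idea can be wrapped in a small helper that handles both clicks at once. This is a sketch: nearest_index() is a hypothetical helper name, the ECG data are simulated, and the fixed clicks vector stands in for locator(n = 2)$x so the example runs without a graphics device. Rounding the clicked values first is not needed, because which.min(abs(...)) already finds the closest sample.

```r
# Hypothetical ECG-like data: one sample every 0.004 s (250 Hz) for 10 s
df <- data.frame(time = seq(0, 10, by = 0.004))
df$signal <- sin(2 * pi * 1.2 * df$time)

# For each value in `targets`, return the row index of the closest entry in `x`
nearest_index <- function(x, targets) {
  vapply(targets, function(t) which.min(abs(x - t)), integer(1))
}

clicks <- c(2.3017, 4.9981)   # stand-in for locator(n = 2)$x
idx <- nearest_index(df$time, sort(clicks))

# Rows between the two clicked positions
df2 <- df[idx[1]:idx[2], ]
```

In an interactive session, replace clicks with locator(n = 2)$x. The sort() call ensures the smaller index comes first even if the right-hand point is clicked first.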

I want to use heatmap in my code but I am getting an error

heatmap(Web_Data$Timeinpage)
str(Web_Data)
heat = c(t(as.matrix(Web_Data$Timeinpage[,-1])))
heatmap(heat)
A few items to note here:
1) By wrapping the expression in c(), as in c(t(as.matrix(Web_Data$Timeinpage[,-1]))), you are creating a single vector, not a matrix. You can see this by running is.matrix(c(t(as.matrix(Web_Data$Timeinpage[-1])))), which returns FALSE. (Since Web_Data$Timeinpage is a plain vector, it has to be indexed with [-1], not [,-1].) heatmap() checks that its input is a numeric matrix, and furthermore:
2) You need to provide a matrix with at least two rows and two columns for this function to work. Currently, you are only giving it one vector (time). You will need to provide some other feature of interest to make it work, such as Continent (converted to numbers, since heatmap() only accepts numeric data).
3) If you intend to plot ONLY one field, consider using the image() function instead (I included an example below).
4) I find the look of the heatmap() output somewhat dated. You may want to consider other popular options, such as ggplot2's geom_tile().
Below is some example code that produces output:
#fake data; OTHER is a categorical feature
Web_Data <- data.frame(Timeinpage = c(123, 321, 432, 555, 332, 1221, 2, 43, 0, NA, 10, 44),
                       OTHER = rep(c("good", "bad"), 6))
#a numeric matrix with TWO columns from the data frame. Notice the c() is removed and I am not transposing.
#Timeinpage is a vector, so it is indexed with [-1] rather than [,-1].
#heatmap() only accepts numbers, so OTHER is converted to numeric codes first.
heat <- cbind(Timeinpage = Web_Data$Timeinpage[-1],
              OTHER = as.numeric(factor(Web_Data$OTHER[-1])))
#output
heatmap(heat)
#one column
heat2 <- as.matrix(sort(Web_Data$Timeinpage[-1])) #sorting as well (sort() drops the NA)
#output
image(heat2)
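Points 1 and 2 can be checked directly. This sketch uses a short made-up vector in place of Web_Data$Timeinpage and catches the errors with tryCatch() so it runs to the end: heatmap() rejects both a plain vector and a one-column matrix, and accepts a numeric matrix with at least two rows and two columns.

```r
v <- c(123, 321, 432, 555, 332)   # made-up stand-in for the time column

# c() flattens its input, so the result has no dim attribute
is.matrix(v)   # FALSE

# A plain vector fails heatmap()'s "'x' must be a numeric matrix" check
err_vec <- tryCatch(heatmap(v), error = function(e) e)

# A one-column matrix fails the "at least 2 rows and 2 columns" check
err_col <- tryCatch(heatmap(as.matrix(v)), error = function(e) e)

# A numeric matrix with two columns is accepted
m <- cbind(time = v, other = c(1, 2, 1, 2, 1))
heatmap(m)
```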

R function to count coordinates

I'm trying to do this with mapply or something similar, without iterating. I have a spatial data frame in R and would like to subset all the more complicated shapes, i.e. shapes with 10 or more coordinates. The shapefile is substantial (10k shapes), and a method that is fine for a small sample is very slow for the full file. The iterative method is:
Street$cc <-0
i <- 1
while(i <= nrow(Street)){
Street$cc[i] <-length(coordinates(Street)[[i]][[1]])/2
i<-i+1
}
How can I get the same effect in a vectorized way? My problem is accessing properties several levels down from the top (Shapefile/lines/Lines/coords).
I tried:
Street$cc <- lapply(slot(Street, "lines"),
function(x) lapply(slot(x, "Lines"),
function(y) length(slot(y, "coords"))/2))
(dividing by 2 because each coordinate is a pair of values)
but it still returns a list with the number of items per row, not an integer telling me how many items there are. How can I get the number of coordinates for each shape in a spatial data frame? Sorry, I do not have a reproducible example, but you can check this on any spatial file; it is more about accessing a low-level property than a very specific issue.
EDIT:
I resolved the issue using the function tail().
Here is a reproducible example. It is slightly different from yours, because you did not provide data, but the principle is the same. The principle when drilling down into complex S4 structures is to pay attention to whether each level is a list or a slot, using [[]] to access lists and @ for slots.
First, let's get some spatial polygons. I'll use the US state boundaries:
library(maps)      # map() and the "state" database
library(maptools)  # map2SpatialPolygons(); also loads sp
local.map = map(database = "state", fill = TRUE, plot = FALSE)
IDs = sapply(strsplit(local.map$names, ":"), function(x) x[1])
states = map2SpatialPolygons(map = local.map, ID = IDs)
Now we can subset the polygons with fewer than 200 vertices like this:
# Note: next line assumes that only interested in one Polygon per top level polygon.
# I.e. assumes that we have only single part polygons
# If you need to extend this to work with multipart polygons, it will be
# necessary to also loop over values of lower level Polygons
lengths = sapply(1:length(states), function(i)
NROW(states@polygons[[i]]@Polygons[[1]]@coords))
simple.states = states[which(lengths < 200)]
plot(simple.states)
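For the original question, where Street holds lines rather than polygons, the same drilling-down can be written without an explicit loop: vapply() over the features in the lines slot, summing the number of rows of each coords matrix, returns one integer per feature instead of a nested list. This sketch assumes the sp package and builds three hypothetical toy streets; unlike the polygon example, it counts every part of a multipart feature. nrow() of a coords matrix gives the same number as length()/2.

```r
library(sp)   # Line(), Lines(), SpatialLines()

# Hypothetical toy streets: single-part lines with n vertices each
make_street <- function(n, id) {
  xy <- cbind(seq_len(n), cumsum(rnorm(n)))
  Lines(list(Line(xy)), ID = id)
}
set.seed(42)
Street <- SpatialLines(list(make_street(3, "a"),
                            make_street(12, "b"),
                            make_street(5, "c")))

# Vertices per feature: slots via @, lists via [[ ]] or vapply(),
# summing nrow(coords) over all parts (Lines) of each feature
cc <- vapply(Street@lines,
             function(l) sum(vapply(l@Lines, function(p) nrow(p@coords), integer(1))),
             integer(1))

# Keep only the "complicated" shapes with 10 or more vertices
complex_streets <- Street[cc >= 10]
```

For a SpatialLinesDataFrame such as Street in the question, the result can be stored with Street$cc <- cc and the complex shapes selected with Street[Street$cc >= 10, ].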

How to do for loops without overwriting?

I have a large data.frame called rain with information on many species measured in different plots at different times (censuses), from which I want to extract information. This data frame has many columns, and in dataF2 I want to keep the same structure, but with only the rows from rain for the penultimate census (Census.No is one of the columns of rain) in each plot (Plot.Code is another one). In idx3 I have the number of the penultimate census for each plot.
It's easy to do it for one plot:
data1<- rain[Plot.Code==idx3[1,1] & Census.No==idx3[1,2],]
I've been trying to do this with a for loop in R, but I keep overwriting my data.frame and end up with just the result of the last iteration.
dataF2<- data.frame(nrow= nrow (rain), ncol = ncol (rain))
summary (dataF2)
for (i in 1:length (idx3[,1])){
dataF2<- rain[Plot.Code==idx3[i,1] & Census.No==idx3[i,2],]
}
Here I want to extract from a data frame the information for the penultimate census in each plot (idx3 contains the number of the penultimate census for each plot).
I've tried many things, like:
dataF2<- data.frame(nrow= nrow (rain), ncol = ncol (rain))
for (i in 1:length (idx3[,1])){
data1<- rainfor[Plot.Code==idx3[i,1] & Census.No==idx3[i,2],]
dataF2<- rbind (data1[i])
}
But nothing worked. My problem is that it keeps overwriting dataF2!
Your clarifications in the comments helped somewhat, but reproducible examples are always better. Let's start at the beginning:
dataF2<- data.frame(nrow= nrow (rain), ncol = ncol (rain))
This is wrong. I think that you're trying to create an empty data frame with the same dimensions as your data frame rain. If you examine dataF2 you'll see that this line does something quite different: it creates a one-row data frame with two columns named nrow and ncol. If you read the documentation for ?data.frame it will become clear that there are no arguments called nrow and ncol. What you probably intended was something like this:
dataF2 <- rain
dataF2[] <- NA
Inside your for loop you are overwriting your entire data frame because....you are overwriting your entire data frame.
dataF2<- rain[Plot.Code==idx3[i,1] & Census.No==idx3[i,2],]
This assigns something to dataF2, replacing it completely. If you want to assign to just a single row of dataF2 you need to assign to that specific row:
dataF2[i,] <- rain[Plot.Code==idx3[i,1] & Census.No==idx3[i,2],]
I can't guarantee that this will work correctly, since you haven't provided a sufficiently detailed example. In particular, if a plot's penultimate census contains more than one row (e.g. one row per species), the right-hand side will not fit into the single row dataF2[i,]. But this is the basic idea.
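An alternative that sidesteps both the empty data frame and the row-count problem is to store each subset in a list and bind them once at the end, or to skip the loop and join on the two key columns with merge(). The sketch below uses a tiny made-up rain and treats idx3 as a data frame with columns Plot.Code and Census.No; that is an assumption, since in the question it is indexed as a two-column matrix.

```r
# Hypothetical miniature rain: 3 plots x 3 censuses x 2 species
rain <- expand.grid(Species = c("sp1", "sp2"), Census.No = 1:3,
                    Plot.Code = c("P1", "P2", "P3"), stringsAsFactors = FALSE)
rain$Value <- seq_len(nrow(rain))

# Hypothetical idx3: each plot and the number of its penultimate census
idx3 <- data.frame(Plot.Code = c("P1", "P2", "P3"), Census.No = c(2, 2, 1),
                   stringsAsFactors = FALSE)

# Loop version: each iteration fills its own list element, nothing is overwritten
pieces <- vector("list", nrow(idx3))
for (i in seq_len(nrow(idx3))) {
  pieces[[i]] <- rain[rain$Plot.Code == idx3$Plot.Code[i] &
                      rain$Census.No == idx3$Census.No[i], ]
}
dataF2 <- do.call(rbind, pieces)   # bind all subsets once, at the end

# Loop-free version: inner join on both key columns
dataF2_merge <- merge(rain, idx3, by = c("Plot.Code", "Census.No"))
```

Because merge() performs an inner join, plots whose penultimate census has no rows in rain are simply absent from the result, just as they would contribute zero rows in the loop.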

performing a calculation with a `paste`d vector reference

So I have some lidar data that I want to calculate some metrics for (I'll attach a link to the data in a comment).
I also have ground plots that I have extracted the lidar points around, so that I have a couple hundred points per plot (19 plots). Each point has X, Y, Z, height above ground, and the associated plot.
I need to calculate a bunch of metrics on the plot level, so I created plotsgrouped with split(plotpts, plotpts$AssocPlot).
So now I have a list of data frames with a "page" for each plot, so I can calculate all my metrics by the "plot page". This works just dandy for individual plots, but I want to automate it. (Yes, I know there are only 19 plots, but it's the principle of it, darn it! :-P)
So far, I've got a for loop going that calculates the metrics and puts the results in a data frame called Results. I pulled the names of the groups into a list called groups as well.
for(i in 1:length(groups)){
Results$Plot[i] <- groups[i]
Results$Mean[i] <- mean(plotsgrouped$PLT01$Z)
Results$Std.Dev.[i] <- sd(plotsgrouped$PLT01$Z)
Results$Max[i] <- max(plotsgrouped$PLT01$Z)
Results$`75%Avg.`[i] <- mean(plotsgrouped$PLT01$Z[plotsgrouped$PLT01$Z <= quantile(plotsgrouped$PLT01$Z, .75)])
Results$`50%Avg.`[i] <- mean(plotsgrouped$PLT01$Z[plotsgrouped$PLT01$Z <= quantile(plotsgrouped$PLT01$Z, .50)])
...
and so on.
The problem arises when I try to do something like:
Results$mean[i] <- mean(paste("plotsgrouped", groups[i],"Z", sep="$")). mean() doesn't recognize the paste as a reference to the vector plotsgrouped$PLT27$Z, and instead fails. I've deduced that it's because it sees the quotes and thinks, "Oh, you're just some text, I can't get the mean of you." or something to that effect.
Btw, groups is a list of the 19 plot names: PLT01-PLT27 (non-consecutive sometimes) and FTWR, so I can't simply put a sequence for the numeric part of the name.
Anyone have an easier way to iterate across my test plots and get arbitrary metrics?
I feel like I have all the right pieces, but just don't know how they go together to give me what I want.
Also, if anyone can come up with a better title for the question, feel free to post it or change it or whatever.
Try this:
for(i in seq_along(groups)) {
Results$Plot[i] <- groups[[i]] # character name of the group
tempZ <- plotsgrouped[[groups[[i]]]][["Z"]] # [[ ]] accepts a name stored in a variable, unlike $
Results$Mean[i] <- mean(tempZ)
Results$Std.Dev.[i] <- sd(tempZ)
Results$Max[i] <- max(tempZ)
Results$`75%Avg.`[i] <- mean(tempZ[tempZ <= quantile(tempZ, .75)])
Results$`50%Avg.`[i] <- mean(tempZ[tempZ <= quantile(tempZ, .50)])
}
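Another option is to put all the metrics in one function and let sapply() apply it to every element of plotsgrouped, which also removes the need for groups. This is a sketch with simulated points: plot_metrics() is a hypothetical helper, and the names Avg75 and Avg50 are syntactic stand-ins for 75%Avg. and 50%Avg., so no backticks are needed.

```r
# Hypothetical stand-in for plotpts: Z values for three plots
set.seed(7)
plotpts <- data.frame(AssocPlot = rep(c("PLT01", "PLT05", "FTWR"), each = 50),
                      Z = runif(150, 0, 30))
plotsgrouped <- split(plotpts, plotpts$AssocPlot)

# All metrics for one plot's data frame, returned as a named vector
plot_metrics <- function(d) {
  z <- d$Z
  c(Mean = mean(z), Std.Dev. = sd(z), Max = max(z),
    Avg75 = mean(z[z <= quantile(z, 0.75)]),
    Avg50 = mean(z[z <= quantile(z, 0.50)]))
}

# sapply() gives a metrics x plots matrix; t() makes one row per plot
Results <- as.data.frame(t(sapply(plotsgrouped, plot_metrics)))
Results$Plot <- rownames(Results)
```

Adding a metric then means adding one entry to plot_metrics(), and the plot names come from split() itself, so non-consecutive names like PLT27 and FTWR need no special handling.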
