R mlr3: creating a TaskRegrST duplicates rows?

I have a data frame called tab_mlr with coordinates and 19 features in 788 rows.
str(tab_mlr)
This object has 788 observations of 21 variables (two of them are Latitude and Longitude). I create an sf object like this:
data_mlr <- sf::st_as_sf(tab_mlr, coords = c("Longitude", "Latitude"), crs = 4326)
data_mlr has 788 features, which is correct. But when I create a task from data_mlr like this:
task <- TaskRegrST$new(
"mlr",
backend = data_mlr,
target = "Hauteur"
)
the task object has 620,944 rows! Why not 788 rows?

The reason might be that each of the 788 rows is being paired with all 788 rows, which gives 788^2 = 620,944 rows as a result.
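The arithmetic checks out: 788^2 is exactly 620,944, the size of a Cartesian product (cross join) of the data with itself. The sketch below uses a hypothetical toy data frame in place of tab_mlr and reproduces that row count with base R's merge(by = NULL), which returns the Cartesian product of its inputs. It only illustrates the symptom; it makes no claim about what TaskRegrST does internally. After building a task, comparing task$nrow with nrow(data_mlr) is a quick way to catch this.

```r
# Hypothetical stand-in for tab_mlr: 788 rows, one feature plus coordinates
set.seed(1)
toy <- data.frame(
  id = seq_len(788),
  Hauteur = runif(788),
  Longitude = runif(788, -5, 5),
  Latitude = runif(788, 40, 50)
)

# merge() with by = NULL returns the Cartesian product (cross join) of x and y
crossed <- merge(toy, toy, by = NULL)

nrow(crossed)                 # 788^2 rows
nrow(crossed) == nrow(toy)^2  # TRUE: every row paired with every row
```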

Related

Adding new Data rows in R

I am trying to build a data frame so I can generate a plot with a specific set of data, but I am having trouble getting the data into a table correctly.
So, here is what I have available from a data query:
> head(c, n=10)
               EVTYPE FATALITIES INJURIES
834           TORNADO       5633    91346
856         TSTM WIND        504     6957
170             FLOOD        470     6789
130    EXCESSIVE HEAT       1903     6525
464         LIGHTNING        816     5230
275              HEAT        937     2100
427         ICE STORM         89     1975
153       FLASH FLOOD        978     1777
760 THUNDERSTORM WIND        133     1488
244              HAIL         15     1361
I then tried to build a set of vectors and combine them into a finished data.frame like this:
a <- c(c[1,1], c[1,2], c[1,3])
b <- c(c[6,1], c[4,2] + c[6,2], c[4,3] + c[6,3])
d <- c(c[2,1], c[2,2], c[2,3])
e <- c(c[3,1], c[3,2], c[3,3])
f <- c(c[5,1], c[5,2], c[5,3])
g <- c(c[7,1], c[7,2], c[7,3])
h <- c(c[8,1], c[8,2], c[8,3])
i <- c(c[9,1], c[9,2], c[9,3])
j <- c(c[10,1], c[10,2], c[10,3])
k <- c(c[11,1], c[11,2], c[11,3])
df <- data.frame(a,b,d,e,f,g,h,i,j)
names(df) <- c("Event", "Fatalities","Injuries")
But that fails badly: what I get is a long string of all the data variables, repeated 10 times. A nice trick, but not what I am looking for.
I would like to get a finished data.frame with ten (10) rows of data, like the original, but with my combined data in place. Is that possible?
I am using R version 3.5.3, and the tidyverse is not available for install on that version.
Any ideas as to how I can generate that data.frame?
If a barplot is what you're after, here's a piece of code to get you that:
First, you need to get the data into the right format (that's probably what you tried to do in df) by column-binding the two numerical variables with cbind and transposing the result with t (i.e., turning rows into columns and vice versa):
plotdata <- t(cbind(c$FATALITIES, c$INJURIES))
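To see what the transpose does, here is a small sketch on the first three rows of the question's data, typed in by hand. cbind() gives one column per variable, and t() flips that into one row per variable and one column per event, which is the layout barplot(beside = TRUE) groups by column.

```r
# First three rows of the question's FATALITIES and INJURIES columns
fatalities <- c(5633, 504, 470)
injuries   <- c(91346, 6957, 6789)

# cbind() -> 3 x 2 (events in rows); t() -> 2 x 3 (events in columns)
plotdata <- t(cbind(fatalities, injuries))
dim(plotdata)       # [1] 2 3
rownames(plotdata)  # [1] "fatalities" "injuries"
```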
Then set the layout to your plot, with a wide margin for the x-axis to accommodate your long factor names:
par(mfrow=c(1,1), mar = c(8,3,3,3))
Now you're ready to plot the data; you grab the labels from c$EVTYPE, reduce the label size in cex.names and rotate them with las to avoid overplotting:
barplot(plotdata, beside = TRUE, names.arg = c$EVTYPE, col = c("red", "blue"), cex.names = 0.7, las = 3)
(You can add main = to set the title of your plot.)
This produces a grouped bar plot with fatalities (red) and injuries (blue) side by side for each event type.
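On the data.frame itself: data.frame(a, b, d, ...) turns each vector into a column rather than a row, which is why the result comes out sideways. A minimal base-R sketch (no tidyverse needed, so it should also run on R 3.5.3) keeps the data in rows instead: relabel the rows to merge and let aggregate() sum them. It assumes, as the question's b vector suggests, that EXCESSIVE HEAT and HEAT should be combined under the label HEAT. The data frame is called storm here rather than c, because c() is also base R's combine function.

```r
# The question's ten rows, typed in by hand
storm <- data.frame(
  EVTYPE = c("TORNADO", "TSTM WIND", "FLOOD", "EXCESSIVE HEAT", "LIGHTNING",
             "HEAT", "ICE STORM", "FLASH FLOOD", "THUNDERSTORM WIND", "HAIL"),
  FATALITIES = c(5633, 504, 470, 1903, 816, 937, 89, 978, 133, 15),
  INJURIES = c(91346, 6957, 6789, 6525, 5230, 2100, 1975, 1777, 1488, 1361),
  stringsAsFactors = FALSE
)

# Give the rows to be merged the same label, then sum both counts per label
storm$EVTYPE[storm$EVTYPE %in% c("HEAT", "EXCESSIVE HEAT")] <- "HEAT"
combined <- aggregate(cbind(FATALITIES, INJURIES) ~ EVTYPE, data = storm, FUN = sum)

# aggregate() sorts by EVTYPE; put the rows back in "most injuries first" order
combined <- combined[order(combined$INJURIES, decreasing = TRUE), ]
rownames(combined) <- NULL
combined
```

The result has 9 rows. The merged HEAT row carries 1903 + 937 = 2840 fatalities and 6525 + 2100 = 8625 injuries, and combined can be passed straight to the barplot steps above.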

Stratified sampling in R with unequal weights and replacement

I have a large data set with a field containing a combined FIPS code and zip code, and another data set with population weighted centroids for block groups combined with some zip code data. I want to stratify my data by "FIPS code" and then assign each row a set of coordinates for a block group centroid, where the centroid's probability of being selected is proportional to its population.
I was originally using a sample of the data (1000 rows) and the strata function from the sampling package, which worked fine. Now that I want to do this for every row in the data set, however, I'm getting this error:
Error in strata(popCenters2, stratanames = "FIPS", method = "systematic", :
not enough obervations in the stratum 1
I suspect that this is because strata does not use replacement and my data set is much larger than the centroid data set.
This is the code I used with the strata function applied to my sample:
library(dplyr)     # group_by(), count() and the %>% pipe
library(sampling)  # strata(), getdata()
## Combine fields to match the format of the other data
popCenters2 <- within(popCenters2,
                      FIPS <- paste(stateFIPS, countyFIPS, zipcode, sep = ""))
sample %>% group_by(FIPS) %>% count() -> sampleCounts
popCenters2[order(popCenters2$FIPS), ] -> popCenters2
sampleCounts[order(sampleCounts$FIPS), ] -> sampleCounts
st <- strata(popCenters2, stratanames = "FIPS", method = "systematic",
             size = sampleCounts$n, pik = popCenters2$contribPop)
stTable <- getdata(popCenters2, st)
My sample had 5 rows with the "FIPS" variable equal to 4200117325; this is the corresponding centroid data:
FIPS tract blkGroup latitude longitude contribPop
4200117325 030200 1 +40.000254 -077.137559 452
4200117325 030200 2 +39.959070 -077.160354 324
4200117325 030400 1 +39.915855 -077.406954 194
4200117325 030400 2 +39.923503 -077.298505 131
4200117325 030400 3 +39.878509 -077.307547 173
4200117325 030400 4 +39.873705 -077.360488 176
4200117325 030400 5 +39.880362 -077.412175 108
4200117325 030500 1 +39.926149 -077.227283 630
4200117325 030500 2 +39.921269 -077.260640 459
My question is, how can I reproduce this sort of procedure if, for example, my actual data set has 20 rows corresponding to 4200117325? I've read through the documentation for the strata function and a few others (Strata from DescTools, the survey package) but have been unable to find anything that allows replacement.
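As the error suggests, strata() cannot draw more units from a stratum than the stratum contains. One alternative that does not use the sampling package at all is to sample within each FIPS stratum using base R's sample.int() with replace = TRUE and prob set to the population weights. The sketch below reproduces the centroid table above and uses a hypothetical records data frame with 20 rows in that stratum. The helper assign_centroids() is illustrative, not part of any package.

```r
# Centroids for FIPS 4200117325, copied from the table above
centroids <- data.frame(
  FIPS = "4200117325",
  tract = c("030200", "030200", "030400", "030400", "030400",
            "030400", "030400", "030500", "030500"),
  blkGroup = c(1, 2, 1, 2, 3, 4, 5, 1, 2),
  latitude = c(40.000254, 39.959070, 39.915855, 39.923503, 39.878509,
               39.873705, 39.880362, 39.926149, 39.921269),
  longitude = c(-77.137559, -77.160354, -77.406954, -77.298505, -77.307547,
                -77.360488, -77.412175, -77.227283, -77.260640),
  contribPop = c(452, 324, 194, 131, 173, 176, 108, 630, 459),
  stringsAsFactors = FALSE
)

# Hypothetical main data: 20 records in a stratum that has only 9 centroids
records <- data.frame(id = 1:20, FIPS = "4200117325", stringsAsFactors = FALSE)

# For each FIPS stratum, draw one centroid per record WITH replacement,
# with selection probability proportional to the weight column
assign_centroids <- function(records, centroids, weight = "contribPop") {
  pieces <- lapply(split(records, records$FIPS), function(rec) {
    pool <- centroids[centroids$FIPS == rec$FIPS[1], , drop = FALSE]
    if (nrow(pool) == 0) stop("no centroids for FIPS ", rec$FIPS[1])
    # sample.int() avoids sample()'s special case for a length-one input
    idx <- sample.int(nrow(pool), size = nrow(rec), replace = TRUE,
                      prob = pool[[weight]])
    picked <- pool[idx, c("tract", "blkGroup", "latitude", "longitude")]
    rownames(picked) <- NULL
    cbind(rec, picked)
  })
  out <- do.call(rbind, pieces)
  rownames(out) <- NULL
  out
}

set.seed(123)
assigned <- assign_centroids(records, centroids)
head(assigned)
```

Because prob only needs to be proportional, contribPop can be passed as is without normalizing it to sum to 1.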

Dividing Individual Spatial Polygons Equally in R

I have a shapefile of polygons that are the townships in the state of Iowa. I'd like to divide each element (i.e. each township) into 9 equal parts (a 3 x 3 grid for each township). I've figured out how to do this, but am having trouble forming a new data frame from the new polygons. My code is below. The data can be downloaded here: https://ufile.io/wi6tt
library(sf)
library(tidyverse)
setwd("~/Desktop")
iowa<-st_read( dsn="Townships/iowa", layer="PLSS_Township_Boundaries", stringsAsFactors = F) # import data
## Make division
r<-NULL
for (row in 1:nrow(iowa)) {
r[[row]]<-st_make_grid(iowa[row,],n=c(3,3))
}
# Combine together
region<-NULL
for (row in 1:nrow(iowa)) {
region<-rbind(region,r[[row]])
}
region<-st_sfc(region,crs=4326) #convert to sfc
reg_id<-data.frame(reg_id=1:length(region)) #make ID for dataframe
# Make SF
region_df<-st_sf(reg_id,region)
The last line gives the following error:
Error in `[[<-.data.frame`(`*tmp*`, all_sfc_names[i], value = list(list( : replacement has 1644 rows, data has 14796
1644 is the number of rows in the initial Iowa data frame, and 14796 = 1644 × 9, one element per grid cell.
Clearly the number of rows does not match the number of elements.
This might be a general R issue rather than a spatial one, but I figured I'd post the whole thing in case someone has an idea for doing all of this more cleanly.
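One way to avoid the mismatch is to concatenate the per-polygon grids with c() instead of rbind(). rbind() on sfc objects appears to build a matrix of list elements, while c() keeps the result a single sfc with one geometry per cell, so the ID columns can be built from its length. The sketch uses the North Carolina counties file that ships with sf as a stand-in for the Iowa townships. c() also keeps the input CRS, so forcing crs = 4326 is not needed. Note that st_make_grid() grids each polygon's bounding box: for non-rectangular polygons the cells are equal in size but may extend past the polygon boundary. Clipping them with st_intersection() would fix that, at the cost of unequal cells.

```r
library(sf)

# Stand-in for the Iowa townships: the NC counties shipped with sf (100 polygons)
polys <- st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)

# One 3 x 3 grid (9 cells) over the bounding box of each polygon
grids <- lapply(seq_len(nrow(polys)), function(i) st_make_grid(polys[i, ], n = c(3, 3)))

# c() concatenates the sfc objects into one sfc and keeps their CRS
cells <- do.call(c, grids)

region_df <- st_sf(
  reg_id = seq_along(cells),                                # one ID per cell
  parent = rep(seq_len(nrow(polys)), times = lengths(grids)), # source polygon
  geometry = cells
)
```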

R: Read raster layers as one object to apply a function in each ENVI file

I have an ENVI file (82_83_test.envi) that contains biweekly raster layers from 1982 to 1983. That is 24 layers per year, 48 layers in total. I would like to create a for loop that applies a function to perform time-series analysis per year, i.e., for each pixel, R will run through that year's 24 layers and calculate 5 parameters with the function "fun", for all pixels. Ultimately, I would like to have 5 plots (one per parameter) for each year, so a total of 10 plots for two years.
I tried working with 1 ENVI file with 2 years of data and with 2 ENVI files with 1 year of data each. I used brickstack_to_raster_list() from the spatial.tools package to read the file, and I get 48 layers. However, I would like to get 2 chunks (1982 and 1983) of 24 layers each, so that I can run the equation.
Maybe something like brickstack_to_raster_list() then merge the 1st layer to 24th into one, followed by the 25th to 48th layer into one?
library(raster)         # stack()
library(spatial.tools)  # brickstack_to_raster_list()
new <- stack("82_83_test.envi")
new1 <- brickstack_to_raster_list(new)
new1 contains 48 raster layers. For example, the first one:
new1
[[1]]
class : RasterLayer
band : 1 (of 48 bands)
dimensions : 151, 101, 15251 (nrow, ncol, ncell)
resolution : 0.08333333, 0.08333333 (x, y)
extent : -105.0833, -96.66667, 56.66667, 69.25 (xmin, xmax, ymin, ymax)
coord. ref. : +proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0
data source : C:\*\82_83_test.envi
names : Band.1
values : -32768, 5038 (min, max)
The other approach is to concatenate multiple annual ENVI files into a list.
new <- stack("1982_test.envi")
new1<- stack(new,new)
new2<- brickstack_to_raster_list(new1)
Both methods above yield the same result, although I am not certain about their efficiency. Efficiency matters a lot here, because once this is set up I will be processing data from 1982 to 2015.
Below is the function that I would like to apply in the for loop.
# A is an unknown that will be the number of components in the list.
for (i in length(A)) {
  new1[new1 <= -1000] <- 0
  Data_value <- new1/10000
  # assign 0 to pixel values less than -1000 and divide by 10000 in order to use the equation
  DOY <- (1:nlayers(new1)*15)
  # so that the unit will be in days instead of the number of weeks.
  fun <- function(x) {
    if (all(is.na(x[1]))) {
      return(rep(NA, 5))
    } else {
      fitForThisData <- nls(x ~ a + ((b/(1 + exp(-c*(DOY - e)))) - (g/(1 + exp(-d*(DOY - f))))),
                            alg = "port",
                            start = list(a = 0.1, b = 1, g = 1, c = 0.04, d = 0.04, e = 112, f = 218),
                            lower = list(a = 0, b = 0.3, g = 0.3, c = -1, d = -1, e = 20, f = 100),
                            upper = list(a = 0.4, b = 2, g = 2, c = 1, d = 1, e = 230, f = 365),
                            control = nls.control(maxiter = 2000, tol = 1e-15, minFactor = 1/1024,
                                                  warnOnly = TRUE))
      SOS <- (coef(fitForThisData)[6] - (4.562/(2*coef(fitForThisData)[4])))
      EOS <- (coef(fitForThisData)[7] - (4.562/(2*coef(fitForThisData)[5])))
      LOS <- (EOS - SOS)
      SPUDOY <- (1.317*((-1/coef(fitForThisData)[4]) + coef(fitForThisData)[6]))
      P_TAmplitude <- (SPUDOY - SOS)
      return(c(SOS, EOS, LOS, SPUDOY, P_TAmplitude))
    }
  }
}
equation <- calc(Data_value, fun, forceapply = TRUE)
plot(equation)
I would truly appreciate your advice on how to do this. Thank you very much!
After reading in your stack:
library(raster)
new <- stack("82_83_test.envi")
simply split the stack into yearly substacks using basic indexing:
year1 <- new[[1:24]]
year2 <- new[[25:48]]
UPDATE: I was able to loop the function, but after comparing the output with the true values, my guess is that the calculation was done on all raster layers. Two files with different names but the same content are produced, because both files have the same summaries.
new <- stack("82_83_test.envi")
new[new<=-1000]<-0
Data_value<-new/10000
nlayers <- nlayers(new)
nyears <- nlayers(new)/24
DOY<-((1:nlayers(new))/nyears)*15
dummy<- FALSE
for (i in 1:nyears) {
for (j in (1+24*(i-1)):(24*i)) {
fun<-function (x)
equation<-calc(Data_value,fun,forceapply=TRUE)
date<- 1981+i
writeRaster(equation,filename=paste("Output",date,".envi",sep=""),
format="ENVI",overwrite=T)
if (j == nlayers){
dummy<-TRUE
break
if (dummy) {break}
}
}
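Combining the answer's indexing with a per-year loop, each calc() call sees only that year's 24 layers, which should avoid the identical output files described in the update. The sketch below uses a hypothetical in-memory 48-layer brick instead of the ENVI file, does the question's clean-up (values at or below -1000 set to 0, then divided by 10000) inside the per-pixel function, and uses a placeholder summary_fun() that returns 5 values where the question's nls-based fun() would go. The writeRaster() call is left commented out so the sketch does not write files.

```r
library(raster)

# Hypothetical stand-in for stack("82_83_test.envi"): a small 4 x 4 x 48 brick
set.seed(1)
new <- brick(array(runif(4 * 4 * 48, -2000, 8000), dim = c(4, 4, 48)))

layers_per_year <- 24
first_year <- 1982
nyears <- nlayers(new) %/% layers_per_year

# The question's fun() reads DOY from its environment; within one year it needs
# 24 entries (15-day steps), not nlayers(new) entries
DOY <- seq_len(layers_per_year) * 15

# Placeholder returning 5 values per pixel; swap in the question's fun()
summary_fun <- function(x) c(min(x), max(x), mean(x), median(x), sd(x))

per_pixel <- function(x) {
  x[!is.na(x) & x <= -1000] <- 0   # same clean-up as in the question
  summary_fun(x / 10000)
}

results <- vector("list", nyears)
for (i in seq_len(nyears)) {
  idx <- ((i - 1) * layers_per_year + 1):(i * layers_per_year)
  results[[i]] <- calc(new[[idx]], per_pixel)   # only this year's 24 layers
  # writeRaster(results[[i]], filename = paste0("Output", first_year + i - 1, ".envi"),
  #             format = "ENVI", overwrite = TRUE)
}
names(results) <- first_year + seq_len(nyears) - 1
```

Each element of results is then a 5-layer brick for one year, so plot(results[["1982"]]) shows the five parameter maps for that year.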

How to create a heat map in R?

I am working on a multi-part project. To begin with, I had a data set giving the deposits per district over the years. After cleaning the data set, I was able to create a data frame with the growth of deposits by district. I have deposit growth for 3 different kinds of institutions (foreign banks, public banks and private banks) in 3 separate data frames, as the number of rows differs in each frame. I have been asked to create 3 maps (heat maps) of deposit growth, one for each kind of bank.
My data frame looks like the attached picture.
I want to make a heat map for the growth column.
Thanks.
This answer may be slightly off-topic, so feel free to delete it.
I'll show you how I make some heatmaps in R:
Fake data:
Gene Patient_A Patient_B Patient_C Patient_D
BRCA1 52 46 124 148
TP53 512 487 112 121
FOX3D 841 658 321 364
MAPK1 895 541 198 254
RASA1 785 554 125 69
ADAM18 12 65 85 121
If you read the data with read.table, you will probably have to convert the data frame to a matrix and use the first column as row names to avoid errors from R:
hm <- read.table("hm1.txt", sep = '\t', header = TRUE, stringsAsFactors = FALSE)
row.names(hm) <- hm$Gene
hm_mx <- data.matrix(hm)
hm_mx <- hm_mx[, -1]   # drop the Gene column; gene names are now row names
Then draw the heat map with heatmap.2() from the gplots package:
library(gplots)   # heatmap.2(), redgreen()
hmcols <- rev(redgreen(2750))
heatmap.2(hm_mx, scale = "row", key = TRUE, lhei = c(2, 5), symkey = FALSE, density.info = "none", trace = "none", cexRow = 1.1, cexCol = 1.1, col = hmcols, dendrogram = "none")
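For the question's own data, a base-R variant (using stats::heatmap() rather than gplots) is to outer-join the three growth tables by district into one district × bank-type matrix and draw that. Districts missing from one table simply become NA cells. The sketch assumes each data frame has a District column and a growth column; the district names and values below are made up for illustration.

```r
# Hypothetical growth tables with different numbers of rows
foreign_banks <- data.frame(District = c("A", "B", "C"),
                            Foreign = c(12.5, 3.1, -2.0), stringsAsFactors = FALSE)
public_banks  <- data.frame(District = c("A", "B", "C", "D"),
                            Public = c(5.4, 7.8, 1.2, 9.9), stringsAsFactors = FALSE)
private_banks <- data.frame(District = c("A", "C", "D"),
                            Private = c(15.0, 4.4, 6.1), stringsAsFactors = FALSE)

# Full outer joins keep every district; missing combinations become NA
growth <- Reduce(function(x, y) merge(x, y, by = "District", all = TRUE),
                 list(foreign_banks, public_banks, private_banks))

growth_mx <- as.matrix(growth[, -1])
rownames(growth_mx) <- growth$District

# Rowv = NA / Colv = NA turn off the dendrograms; scale = "none" keeps raw growth
heatmap(growth_mx, Rowv = NA, Colv = NA, scale = "none",
        col = rev(heat.colors(20)), margins = c(6, 6),
        main = "Deposit growth by district")
```

If separate maps per bank type are required, the same growth_mx columns can be plotted one at a time.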
