Multiplying matrices of different sizes - r

Good evening,
I am working in RStudio.
I have a problem multiplying two matrices of different sizes. It is made harder because the values in the row where d2$ID == 1 must multiply only the rows where w$sample == 1.
sample and ID identify the same sample.
In other words, from the "subset" d2$ID == 1, every single value ("L1", "ST", "GR", "CB", "HSK", "DDM") has to multiply the whole "subset" w$sample == 1 (4 rows in this case, but not always), that is, all the values in "G2", "G4", "G6", "G8", "G12".
>d2
ID L1 ST GR CB HSK DDM
1 1 0.1662000 0.2337000 0.3637000 0.11110000 0.10100000 0.024300000
2 2 0.1896576 0.2280830 0.3705740 0.09406879 0.09319434 0.024422281
3 3 0.1110259 0.2217769 0.4180797 0.11122498 0.10902635 0.028866094
4 4 0.1558785 0.2008862 0.4222565 0.09805538 0.10218119 0.020742172
5 5 0.1536421 0.1674096 0.4205395 0.14362176 0.08635519 0.028431849
6 6 0.1841964 0.1514189 0.4603306 0.10243621 0.08928011 0.012337688
> w
sample G2 G4 G6 G8 G12
1 1 10.9 15.9 21.4 28.0 37.8
2 1 11.5 16.6 22.2 29.5 38.3
3 1 10.3 15.1 20.7 28.3 36.7
4 1 11.7 18.1 24.8 31.2 39.5
5 2 11.0 16.8 22.4 30.6 38.0
6 2 10.1 15.9 22.5 30.2 36.7
7 2 12.8 17.8 22.8 28.7 37.1
8 2 11.8 16.3 20.8 27.3 34.7
9 2 11.9 16.7 21.6 28.3 34.6
10 3 12.0 18.1 24.2 30.9 40.0
11 3 12.2 17.7 24.2 31.7 40.5
12 4 11.1 16.5 22.7 31.0 39.2
13 4 12.5 19.8 27.4 32.8 38.8
14 4 12.4 19.2 25.8 33.0 39.9
15 4 12.4 19.2 26.2 33.4 38.9
16 4 13.4 18.3 23.7 30.0 38.2
17 5 13.3 18.6 24.0 30.7 38.4
18 5 13.3 18.1 22.9 30.1 36.8
19 5 13.7 19.9 26.5 33.8 43.0
20 5 12.7 18.2 24.6 32.5 41.3
21 6 12.1 17.5 24.3 33.7 42.2
22 6 14.5 20.8 28.4 35.3 43.7
I have already checked a lot of questions but I can't figure it out, especially because most of the information is for matrices of the same size.
I tried filtering the data from d2, but the data set is really big, so that is really inefficient.
I am a beginner; if you think this is easy, I would appreciate at least a hint, please!
I have several data sets like these.
Thanks in advance!

This seems to perform as requested:
res <- apply(w, 1, function(x){ unclass(
  outer(as.matrix( x[-1] ),
        # match() finds the row of d2 whose ID equals this row's sample value
        as.matrix( d2[match(x[1], d2$ID), c( "L1", "ST", "GR", "CB", "HSK", "DDM")])))})
str(res)
# result
# num [1:30, 1:22] 1.81 2.64 3.56 4.65 6.28 ...
# - attr(*, "dimnames")=List of 2
# ..$ : NULL
# ..$ : chr [1:22] "1" "2" "3" "4" ...
I almost got it right on the first pass but after some debugging found that I needed to add the as.matrix call to both arguments inside outer (so to speak ;-). To explain my logic ... I wanted to run down each row of w with apply and then use match on the value of the first column (of each row of w) to the unique row of d2. The match function is designed for just this purpose, to return a suitable number to be used for indexing. Then with the rest of the row (x[-1] by the time it was passed through the function call), I would use outer on the row values crossed with the desired row and columns of d2. If you do it without the as.matrix calls you get an error message:
Error in tcrossprod(x, y) :
requires numeric/complex matrix/vector arguments
I don't think that's a very informative error message. Both of the arguments were numeric vectors.
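As a hedged alternative, here is a minimal sketch of the same per-row idea with the matching step written out. It uses a cut-down copy of d2 and w (two value columns, three rows) purely for illustration; coef_cols, idx and res_list are names introduced here, not part of the original answer. Converting each one-row data frame to a plain numeric vector with unlist() lets outer() work without the as.matrix calls.

```r
# Cut-down versions of the question's data (values copied from the post)
d2 <- data.frame(ID = 1:2,
                 L1 = c(0.1662, 0.1896576),
                 ST = c(0.2337, 0.2280830))
w <- data.frame(sample = c(1, 1, 2),
                G2 = c(10.9, 11.5, 11.0),
                G4 = c(15.9, 16.6, 16.8))

coef_cols <- c("L1", "ST")      # hypothetical name for the d2 value columns
idx <- match(w$sample, d2$ID)   # row of d2 that belongs to each row of w

# One small matrix per row of w: every G value times every matched d2 value
res_list <- lapply(seq_len(nrow(w)), function(i) {
  outer(unlist(w[i, -1]), unlist(d2[idx[i], coef_cols]))
})

res_list[[1]]["G2", "L1"]       # the product 10.9 * 0.1662
```

Keeping the result as a list of small matrices, rather than one flattened 30 x 22 matrix, makes it easier to see which sample each block belongs to, at the cost of an extra step if a single matrix is needed later.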

Related

Tidyverse: Error in as.matrix : attempt to apply non-function

I am trying to calculate SPEI values using the SPEI package and the Hargreaves method. I want to automate the process so that I can calculate SPEI for all 6 stations in one go and save them to a new file spei.3.
SPEI is calculated in three steps. First, we calculate PET values (spei_pet), which are then subtracted from the precipitation values to calculate the climatic water balance (spei_cwbal). The CWBAL value is then passed, together with a scale, to the spei function from the SPEI package to calculate SPEI values.
I am new to R and very new to tidyverse, but the internet says it is easier to work with. I wrote the code below to do my task, but I am surely missing something (or maybe many things) because the code throws an error. Please help me identify the error in my code and find a solution.
library(tidyverse)
library(SPEI)
file_path = "I:/Proj/Excel sheets - climate/SPI/heatmap/spei_forecast_data.xlsx"
file_forecast = openxlsx::read.xlsx(file_path)
##spei calculation
spei.scale = c(3, 6, 9, 12, 15, 24)
stations = c(1:3, 5:7)
lat = c(23.29, 23.08, 22.95, 22.62, 22.43, 22.40)
lat.fn = function(i) {
if (i <= 3)
lat.fn = lat[i]
else if (i == 5)
lat.fn = lat[4]
else if (i == 6)
lat.fn = lat[5]
else if (i == 7)
lat.fn = lat[6]
}
for ( i in stations) {
file_forecast %>%
mutate(spei_pet[i] <- hargreaves(Tmin = file_forecast$paste("tmin", i),
Tmax = file_forecast$paste("tmax", i),
Pre = file_forecast$paste("p", i),
lat = lat.fn[i])) %>%
mutate(spei_cwbal[i] <- spei_pet[[i]] - file_forecast$paste("p", i)) %>%
mutate(spei.3[i] <- spei(spei_cwbal[[i]], scale = 3))
}
It throws an error
Error in as.matrix(Tmin) : attempt to apply non-function
lat.fn[i] also throws an error, which goes away if I drop the i. But I need some kind of function so that lat.fn takes a different value depending on i.
Error in lat.fn[i] : object of type 'closure' is not subsettable
Thanks.
Edit: The data is in the form of a data.frame. I converted it into a tibble to give an idea of what it looks like.
> file_forecast
# A tibble: 960 x 20
Month p7 p6 p5 p3 p2 p1 tmax7 tmax6 tmax5 tmax3 tmax2 tmax1 tmin7 tmin6 tmin5 tmin3 tmin2 tmin1
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 Jan 0.162 0.185 0.293 0.436 0.529 0.658 26.4 26.5 26.2 25.9 25.7 24.9 9.57 9.75 10.0 10.4 9.94 9.77
2 Feb 0.207 0.305 0.250 0.260 0.240 0.186 32.2 32.2 32.1 31.9 31.8 30.9 12.4 12.7 12.7 13.0 12.2 11.9
3 Mar 0.511 0.650 0.602 0.636 0.625 0.501 37.3 37.1 37.1 37.0 36.9 36.1 18.7 19.3 18.3 18.0 17.3 16.9
4 Apr 0.976 1.12 1.05 1.12 1.17 1.16 39.5 39.2 39.6 39.5 39.5 38.8 22.8 23.2 22.5 22.2 21.7 20.8
5 May 3.86 4.12 3.76 4.29 4.15 3.84 38.2 37.9 38.3 38.1 38.2 37.6 25.1 25.4 24.9 24.7 24.5 23.8
6 Jun 7.31 8.27 7.20 8.51 9.14 8.76 38.0 37.6 38.1 38.0 38.0 37.7 27.2 27.3 26.9 26.7 26.6 26.1
7 Jul 13.9 15.6 13.2 17.0 19.1 17.8 33.9 33.6 34.0 33.9 33.8 33.5 26.8 26.9 26.6 26.5 26.4 26.0
8 Aug 15.2 17.2 14.4 18.6 20.1 18.4 32.6 32.4 32.7 32.4 32.3 32.0 26.2 26.4 26.1 25.9 25.9 25.4
9 Sep 11.4 11.9 10.5 12.9 13.2 13.1 31.9 31.9 31.8 31.5 31.5 30.9 24.4 24.6 24.3 24.3 24.3 23.7
10 Oct 5.19 5.76 4.81 5.40 5.44 5.04 29.8 30.0 29.6 29.3 29.3 28.6 20.9 21.1 20.8 20.9 20.8 20.2
# ... with 950 more rows, and 1 more variable: year <dbl>
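For readers hitting the same two errors, here is a small base-R sketch of the likely causes, assuming the column names follow the tminN / tmaxN / pN pattern shown in the tibble. file_forecast$paste("tmin", i) is parsed as a call to an element named paste inside the data frame, which does not exist, hence "attempt to apply non-function". lat.fn is a function, so it has to be called with parentheses rather than indexed with brackets. The toy data frame and the helper names station_lat and get_col are made up for this illustration, and the SPEI calls themselves are left out.

```r
stations <- c(1:3, 5:7)
lat <- c(23.29, 23.08, 22.95, 22.62, 22.43, 22.40)

# Toy stand-in for file_forecast (two stations, two months)
toy <- data.frame(tmin1 = c(9.77, 11.9), tmin5 = c(10.0, 12.7))

# Look up the latitude by the station's position in `stations`
station_lat <- function(i) lat[match(i, stations)]

# Build the column name as a string, then extract it with [[ ]].
# paste0() is used because paste("tmin", 1) would give "tmin 1" (with a space).
get_col <- function(data, prefix, i) data[[paste0(prefix, i)]]

for (i in c(1, 5)) {
  cat("station", i, "lat", station_lat(i),
      "tmin", get_col(toy, "tmin", i), "\n")
}
```

The match() lookup replaces the chain of if/else branches in lat.fn and keeps working if the station list changes.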

read data from clipboard correctly in r

I want to read data into R from the clipboard, but the resulting dimensions are wrong. How can I read data from the clipboard correctly, and how can I tell which separator the data uses?
My data is this
group month Estimate lwr upr
placebo 0 18.7 17.6 19.9
placebo 6 21.5 20.3 22.7
placebo 12 24.3 22.8 25.7
placebo 18 27.0 25.2 28.9
active 0 18.7 17.6 19.9
active 6 20.8 19.6 22.0
active 12 22.9 21.4 24.3
active 18 25.0 23.1 26.8
Code I tried is this
d1 <- read.delim('clipboard')
d2 <- readClipboard()
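One likely cause, offered as a guess since the post does not show the resulting object: read.delim() assumes tab-separated fields (sep = "\t"), so if the copied text is separated by spaces, each line ends up in a single column. read.table() with its default sep = "" splits on any run of whitespace. The sketch below uses the text argument as a stand-in for the clipboard so it runs anywhere; on Windows the same call with 'clipboard' as the file should behave the same way, while readClipboard() only returns the raw lines as a character vector.

```r
# Stand-in for the clipboard contents (first rows of the question's data)
txt <- "group month Estimate lwr upr
placebo 0 18.7 17.6 19.9
placebo 6 21.5 20.3 22.7
active 0 18.7 17.6 19.9
active 6 20.8 19.6 22.0"

# Default sep = "" means "any run of whitespace" (spaces or tabs)
d_ok <- read.table(text = txt, header = TRUE)
dim(d_ok)

# Forcing a tab separator on space-separated text yields a single column
d_bad <- read.delim(text = txt)
dim(d_bad)
```

Comparing ncol() of the two results is a quick way to tell whether the copied text uses tabs or spaces.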

Error from rollingmedian using columns

I have a data frame to which I am trying to add a column containing the median of the current and the previous 2 values.
Date Value
21/07/2016 14.8
22/07/2016 14.9
23/07/2016 15.8
24/07/2016 15.0
25/07/2016 15.7
26/07/2016 15.6
27/07/2016 16.1
28/07/2016 16.1
I used the following code:
library(zoo)
dataframe$medianval <-rollmedian(dataframe$Value,k=3)
I get the following error
> Error: k <= n is not TRUE
Any suggestions?
Think about what R is trying to do here. The data frame has 8 rows, but the vector you want to append has only 6 elements. To which rows should those elements align? What should R put in the other two spots?
library(zoo)
dataframe <- read.table(text="Date Value
21/07/2016 14.8
22/07/2016 14.9
23/07/2016 15.8
24/07/2016 15.0
25/07/2016 15.7
26/07/2016 15.6
27/07/2016 16.1
28/07/2016 16.1", header=TRUE)
rollmedian(dataframe$Value,k=3)
# [1] 14.9 15.0 15.7 15.6 15.7 16.1
nrow(dataframe) # [1] 8
length(rollmedian(dataframe$Value,k=3)) # [1] 6
Because I can guess what you meant (correct me if I'm wrong), I would try:
dataframe$medianval <- c(NA, NA, rollmedian(dataframe$Value,k=3))
dataframe
# Date Value medianval
# 1 21/07/2016 14.8 NA
# 2 22/07/2016 14.9 NA
# 3 23/07/2016 15.8 14.9
# 4 24/07/2016 15.0 15.0
# 5 25/07/2016 15.7 15.7
# 6 26/07/2016 15.6 15.6
# 7 27/07/2016 16.1 15.7
# 8 28/07/2016 16.1 16.1
If you want to be able to adapt this conveniently, you should write a simple function:
med.fun <- function(var, data, k){
# Note: variable name must be in quotes
return(c(rep(NA, k-1), with(data, rollmedian(get(var), k=k))))
}
med.fun("Value", dataframe, 5)
# [1] NA NA NA NA 15.0 15.6 15.7 15.7
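For comparison, the same trailing-window padding can be sketched in base R without zoo. This is a plain loop over windows rather than zoo's implementation, and trailing_median is a name made up for this example.

```r
# Median of the current value and the k - 1 values before it;
# the first k - 1 positions have no full window and stay NA.
trailing_median <- function(x, k) {
  n <- length(x)
  out <- rep(NA_real_, n)
  if (n >= k) {
    for (i in k:n) {
      out[i] <- median(x[(i - k + 1):i])
    }
  }
  out
}

value <- c(14.8, 14.9, 15.8, 15.0, 15.7, 15.6, 16.1, 16.1)
trailing_median(value, 3)
```

Because the result always has the same length as the input, it can be assigned straight to a new data frame column.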

How to do arithmetic on tapply() output in R

I'm calling height, diameter and age from a csv file. I'm trying to calculate the volume of the tree using pi x h x r^2. In order to calculate the radius, I'm taking dbh and dividing it by 2. Then I get this error.
Error in dbh/2 : non-numeric argument to binary operator
setwd("/Users/user/Desktop/")
treeg <- read.csv("treeg.csv",row.names=1)
head(treeg)
heights <- tapply(treeg$height.ft,treeg$forest, identity)
ages <- tapply(treeg$age,treeg$forest, identity)
dbh <- tapply(treeg$dbh.in,treeg$forest, identity)
radius <- dbh / 2
The dbh object stores the diameters from the csv file grouped by forest, which is the ID.
How can I divide dbh by 2 while keeping each value stored under its respective ID (the forest, treeg$forest)? treeg is the data frame read from the csv file.
> head(treeg)
tree.ID forest habitat dbh.in height.ft age
1 1 4 5 14.6 71.4 55
2 1 4 5 12.4 61.4 45
3 1 4 5 8.8 40.1 35
4 1 4 5 7.0 28.6 25
5 1 4 5 4.0 19.6 15
6 2 4 5 20.0 103.4 107
str(dbh)
List of 9
$ 1: num [1:36] 19.9 18.6 16.2 14.2 12.3 9.4 6.8 4.9 2.6 22 ...
$ 2: num [1:60] 16.5 15.5 14.5 13.7 12.7 11.4 9.5 8 5.9 4.1 ...
$ 3: num [1:50] 18.4 17.2 15.6 13.7 11.6 8.5 5.3 2.8 13.3 10.6 ...
$ 4: num [1:81] 14.6 12.4 8.8 7 4 20 18.8 17 15.9 14 ...
$ 5: num [1:153] 28 27.2 26.1 25 23.7 21.3 19 16.7 12.2 9.8 ...
$ 6: num [1:22] 21.3 20.2 19.1 18 16.9 15.6 14.8 13.3 11.3 9.2 ...
$ 7: num [1:63] 13.9 12.4 10.6 8.1 5.8 3.4 27 25.6 23 20.2 ...
$ 8: num [1:27] 20.8 17.7 15.6 13.2 10.5 7.5 4.8 2.9 12.9 11.3 ...
$ 9: num [1:50] 23.6 20.5 16.9 14.1 11.1 8 5.1 2.9 24.1 20.9 ...
- attr(*, "dim")= int 9
- attr(*, "dimnames")=List of 1
..$ : chr [1:9] "1" "2" "3" "4" ...
Are you just trying to create a radius column that is dbh.in divided by two?
treeg <- read.table(textConnection("tree.ID forest habitat dbh.in height.ft age
1 1 4 5 14.6 71.4 55
2 1 4 5 12.4 61.4 45
3 1 4 5 8.8 40.1 35
4 1 4 5 7.0 28.6 25
5 1 4 5 4.0 19.6 15
6 2 4 5 20.0 103.4 107"), header=TRUE)
treeg$radius <- treeg$dbh.in / 2
Or do you need that dbh list for something...
dbh <- tapply(treeg$dbh.in,treeg$forest, identity)
dbh
# $`4`
# [1] 14.6 12.4 8.8 7.0 4.0 20.0
str(lapply(dbh, function(x) x/2))
# List of 1
# $ 4: num [1:6] 7.3 6.2 4.4 3.5 2 10
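If the end goal is the volume from the question (pi x h x r^2), one option is to compute it as a column first and only then group by forest. The sketch below is a hedged illustration on three rows of the posted data; it assumes dbh.in is in inches and height.ft in feet, so the radius is converted to feet before multiplying, and the column name volume.ft3 is made up here.

```r
# Three rows copied from head(treeg); forest is 4 for all of them
treeg <- data.frame(forest    = c(4, 4, 4),
                    dbh.in    = c(14.6, 12.4, 8.8),
                    height.ft = c(71.4, 61.4, 40.1))

# Assumption: diameter in inches, so radius in feet = dbh / 2 / 12
radius.ft <- treeg$dbh.in / 2 / 12
treeg$volume.ft3 <- pi * treeg$height.ft * radius.ft^2

# split() gives the same "one vector per forest" layout as
# tapply(..., identity), but as a plain named list
vol_by_forest <- split(treeg$volume.ft3, treeg$forest)
str(vol_by_forest)
```

Working on the data frame columns avoids doing arithmetic on the list returned by tapply(), which is what triggered the "non-numeric argument to binary operator" error.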

Draw histograms per row over multiple columns in R

I'm using R for the analysis in my master's thesis.
I have the following data frame, STOF (student-to-staff ratio):
HEI.ID X2007 X2008 X2009 X2010 X2011 X2012
1 OP 41.8 147.6 90.3 82.9 106.8 63.0
2 MO 20.0 20.8 21.1 20.9 12.6 20.6
3 SD 21.2 32.3 25.7 23.9 25.0 40.1
4 UN 51.8 39.8 19.9 20.9 21.6 22.5
5 WS 18.0 19.9 15.3 13.6 15.7 15.2
6 BF 11.5 36.9 20.0 23.2 18.2 23.8
7 ME 34.2 30.3 28.4 30.1 31.5 25.6
8 IM 7.7 18.1 20.5 14.6 17.2 17.1
9 OM 11.4 11.2 12.2 11.1 13.4 19.2
10 DC 14.3 28.7 20.1 17.0 22.3 16.2
11 OC 28.6 44.0 24.9 27.9 34.0 30.7
Then I rank the colleges using these commands:
HEIrank1<-(STOF[,-c(1)])
rank1 <- apply(HEIrank1,2,rank)
> HEIrank11
HEI.ID X2007 X2008 X2009 X2010 X2011 X2012
1 OP 18.0 20 20.0 20.0 20.0 20
2 MO 14.0 9 13.0 13.5 2.0 12
3 SD 15.0 16 17.0 16.0 16.0 19
4 UN 20.0 18 8.0 13.5 14.0 13
5 WS 12.0 8 4.0 7.0 6.0 8
6 BF 6.5 17 9.5 15.0 10.0 14
7 ME 17.0 15 19.0 19.0 17.0 15
8 IM 2.0 6 12.0 8.0 8.5 10
9 OM 4.5 3 2.5 3.0 3.0 11
10 DC 11.0 14 11.0 9.0 15.0 9
11 OC 16.0 19 16.0 18.0 19.0 17
I would like to draw a histogram for each HEI (that is, for each row). How can I do this?
If you use ggplot you won't need a loop; you can plot them all at once. You also need to reshape your data from wide format to long format. You can use the melt function from the reshape2 package to do so.
library(reshape2)
new.df<-melt(HEIrank11,id.vars="HEI.ID")
names(new.df)=c("HEI.ID","Year","Rank")
In the plot call below, substring just removes the leading X from each year:
library(ggplot2)
ggplot(new.df, aes(x=HEI.ID,y=Rank,fill=substring(Year,2)))+
geom_histogram(stat="identity",position="dodge")
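To make the wide-to-long step concrete without extra packages, here is a base-R sketch of what the reshaping produces. It is a hand-rolled stand-in for melt, not reshape2's implementation, and it uses only the first two rows and two year columns of the ranked data; ranks, year_cols and long are illustrative names.

```r
# Two rows and two year columns copied from the ranked data
ranks <- data.frame(HEI.ID = c("OP", "MO"),
                    X2007  = c(18, 14),
                    X2008  = c(20, 9))

year_cols <- setdiff(names(ranks), "HEI.ID")

# One row per (HEI, year) pair; unlist() stacks the columns in order
long <- data.frame(
  HEI.ID = rep(ranks$HEI.ID, times = length(year_cols)),
  Year   = rep(sub("^X", "", year_cols), each = nrow(ranks)),
  Rank   = unlist(ranks[year_cols], use.names = FALSE)
)
long
```

Each row of the long table is one bar in the plot, which is why the plotting call can map HEI.ID, Year and Rank directly to aesthetics.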
Here's a solution in lattice:
require(lattice)
barchart(X2007+X2008+X2009+X2010+X2011+X2012 ~ HEI.ID,
data=HEIrank11,
auto.key=list(space='right')
)
