Creating a 3-dimensional array of excel files in R - r

I have the following MWE, although the datasets are unavailable:
N <- 84 #Number of datasets to pull data from
dates <- c("2010.01", "2010.02", "2010.03", "2010.04", "2010.05", "2010.06", "2010.07", "2010.08",
"2010.09", "2010.10", "2010.11", "2010.12", "2011.01", "2011.02", "2011.03", "2011.04", "2011.05",
"2011.06", "2011.07", "2011.08", "2011.09", "2011.10", "2011.11", "2011.12", "2012.01", "2012.02",
"2012.03", "2012.04", "2012.05", "2012.06", "2012.07", "2012.08", "2012.09", "2012.10", "2012.11",
"2012.12", "2013.01", "2013.02", "2013.03", "2013.04", "2013.05", "2013.06", "2013.07", "2013.08",
"2013.09", "2013.10", "2013.11", "2013.12", "2014.01", "2014.02", "2014.03", "2014.04", "2014.05",
"2014.06", "2014.07", "2014.08", "2014.09", "2014.10", "2014.11", "2014.12", "2015.01", "2015.02",
"2015.03", "2015.04", "2015.05", "2015.06", "2015.07", "2015.08", "2015.09", "2015.10", "2015.11",
"2015.12", "2016.01", "2016.02", "2016.03", "2016.04", "2016.05", "2016.06", "2016.07", "2016.08",
"2016.09", "2016.10", "2016.11", "2016.12") #list of all dates to loop through
A <- list() #empty list to store excel files
for (k in seq_along(dates)) {
A[k] <- read_excel(paste0("~/R/data.", dates[k], ".xlsx"), range = "B3:EO94")
}
Total <- array(unlist(A), dim=c(91,144,84))
First <- read_excel(paste0("~/R/data.", dates[1], ".xlsx"), range="B3:EO94")
This gives me Total, the 3-dimensional array, and then First, which should be the first "slice" of the array. So if I take some arbitrary coordinate, say 15,34, then I should be able to pull the exact same value from both Total and First, so I try the following:
> Total[15,34,1]
[1] 0.000392432
> First[15,34]
# A tibble: 1 x 1
`-97.5`
<dbl>
1 0.000384
The 0.000384 is the correct number, found in cell AI18 of the Excel file, and the number given by Total is wrong. What gives? To double-check further, I compared Total[15,34,2] with the second "slice" and, alas, got the same kind of incorrect result from Total.

With single brackets, A[k] <- read_excel(...) stores only the first column of each sheet (with a "number of items to replace is not a multiple of replacement length" warning), so unlist(A) holds far fewer values than the 91 x 144 x 84 array needs and array() silently recycles them. Try using double square brackets, A[[k]], to assign each whole sheet read from the Excel files:
A <- list() #empty list to store excel files
for (k in seq_along(dates)) {
A[[k]] <- read_excel(paste0("~/R/data.", dates[k], ".xlsx"), range = "B3:EO94")
}
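As a quick sanity check (a small sketch built on the corrected list above), the first slice of the rebuilt array should now reproduce the first workbook:
Total <- array(unlist(A), dim = c(91, 144, length(A)))
First <- read_excel(paste0("~/R/data.", dates[1], ".xlsx"), range = "B3:EO94")
all.equal(Total[15, 34, 1], First[[34]][15]) # should now be TRUE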

Related

Graphing Values from multiple H5/HDF5 files at once

I've figured out how to read and name multiple H5 files from my directory, but I'm stuck on actually being able to graph with them. My problem is twofold: with this type of file, I do not know how to make the columns have the same number of rows, and I do not know how to call on specific files.
My initial setup is as follows:
library("rhdf5")
library("ggplot2")
library("fs")
library("tidyverse")
wd <- "D:/Data/1282-1329/"
setwd(wd)
testh5 <- H5Fopen("1282.h5")
H5Fclose(testh5)
y <- h5read(file = "1282.h5",
name = "/Signal")
x <- h5read(file = "1282.h5",
name = "/Scan")
The / refers to the H5 file's 'Group' and Signal or Scan refers to the 'Name'; thus "/Signal" gives me a numerical list with a length of 48 (the number of files within 1282-1329). I make multiple lists from each of these by doing:
file_paths <- fs::dir_ls("D:/Data/1282-1329/H5")
file_paths
file_Scan <- list()
for (i in seq_along(file_paths)) {
file_Scan[[i]] <- h5read(
file = file_paths[[i]],
name = "/Scan"
)
}
file_Signal <- list()
for (i in seq_along(file_paths)) {
file_Signal[[i]] <- h5read(
file = file_paths[[i]],
name = "/Signal"
)
}
file_Scan <- setNames(file_Scan, file_paths)
file_Signal <- setNames(file_Signal, file_paths)
Thus str(file_Signal) gives me something like:
List of 48
$ D:/Data/1282-1329/H5/1282.h5: num [1:8044(1d)] 11569527 11576106 10848312 11007212 11074822 ...
$ D:/Data/1282-1329/H5/1283.h5: num [1:8045(1d)] 9746633 9886735 10000637 9617273 ...
So my first problem here is [1:8044(1d)] and [1:8045(1d)] - the lengths are one element apart. But I'm unable to add in NAs or make the lengths the same as I would with a normal list. Is it because I'm thinking about this wrong? I feel like the solution is simple.
My ultimate goal is to create a single plot for each of these files in the directory, using something like
for (i in seq_along(file_paths)) {
plots[[i]] = ggplot(file_paths, aes(x=file_Signal, y=file_Scan))+
geom_point(size=1)
}
Then I want to use these to create a rolling GIF of the even-numbered files (1282, 1284, 1286, etc.) and the odd-numbered files (1283, 1285, 1287, etc.).
Thank you for any help or resources you might have to offer.
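One way to tackle both issues might be the following (a minimal sketch, not tested against the actual files; it reuses file_paths, file_Scan and file_Signal from the question and assumes Scan and Signal have the same length within any single file):
library(ggplot2)
# if vectors from different files must live in one data frame,
# pad the shorter ones with NA up to the longest length
n_max <- max(lengths(file_Signal))
pad_na <- function(v) c(v, rep(NA, n_max - length(v)))
file_Signal_pad <- lapply(file_Signal, pad_na)
file_Scan_pad <- lapply(file_Scan, pad_na)
# for one plot per file no padding is needed: build a small data frame
# per file and plot that, since ggplot() expects a data frame,
# not the vector of file paths
plots <- vector("list", length(file_paths))
for (i in seq_along(file_paths)) {
df_i <- data.frame(Scan = file_Scan[[i]], Signal = file_Signal[[i]])
plots[[i]] <- ggplot(df_i, aes(x = Scan, y = Signal)) +
geom_point(size = 1)
}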

Writing a single NCDF4 file with variables extracted from 159 separate NCDF4 files

I have 159 NCDF4 files with 56 ensemble members in each file. I want to pull out ensemble 1 from each of the 159 input files and then produce a single NCDF4 file containing all of the ensemble 1 data. My code is below. My problem is that only the data from the last of the 159 files is written to the output file. I think I am missing a nested loop, but I am not sure, and my attempts have failed.
rm(list=ls())
library(ncdf.tools)
library(ncdf4)
library(ncdf4.helpers)
library(RNetCDF)
setwd("D:/Rwork/Project") # set working folder
#####Write NCDF4 files#############################################
dir("D:/Rwork/Project/Test")->xlab # This is the directory where the file for analysising are
filelist <- paste("Test/",dir("Test"),sep="")
N <- length(filelist) # Loop over the individual files
for(j in 1:N){
File<-nc_open(filelist[j])
print(filelist[j])
Temperature<-ncvar_get(File,"t2m")
Lat<-ncvar_get(File, "lat")
Lon<-ncvar_get(File,"lon")
Time<-ncvar_get(File,"time")
EnsambleNo.<-ncvar_get(File,"ensemble_member")
Temperature
Ensamble1<-Temperature[,,1,] #The Ensamble wanted, 1 to 56
Ensamble1<-round(Ensamble1,digits = 0)
tunits<-"hours since 1800-01-01 00:00:00"
#Define dimensions
##################################################################
londim<-ncdim_def("Lon","degrees_east",as.double(Lon))
latdim<-ncdim_def("Lat", "degrees_north",as.double(Lat))
timedim<-ncdim_def("Time",tunits,as.double(Time))
#Define variables
##################################################################
fillvalue<-1e32
dlname<-"tm2"
tmp_def<-ncvar_def("Ensamble1","deg_K", list(londim,latdim,timedim),fillvalue,dlname,prec = "double")
ncfname<-("D:/Rwork/Project/TrialEnsamble/TrialEnsamble.nc")
ncout<-nc_create(ncfname,list(tmp_def),force_v4=T)
ncvar_put(ncout,tmp_def,Ensamble1,start=NA,count = NA )# Think I need a nested loop here
ncatt_put(ncout,"Lon","axis","X")
ncatt_put(ncout, "Lat", "axis", "Y")
ncatt_put(ncout, "Time","axis", "T")
title<-c( 1:2 )
names(title)<-c("Ian","Gillespie")
title<-as.data.frame(title)
ncatt_put(ncout,0,"Make_NCDF4_File",1, prec="int")
ncatt_put(ncout,0,"Maynooth_University",1,prec="short")
ncatt_put(ncout,0,"AR000087828",1, prec="short")
ncatt_put(ncout,0,"mickymouse",1, prec="short")
history <- paste("P.J. Bartlein", date(), sep=", ")
ncatt_put(ncout,0,"description","this is the script to write NCDF4 files")
#Close file and write date to disk
##########################################################
nc_close(ncout)
}
I found it is better to prepare an empty four-dimensional array whose first three dimensions have the same sizes and dimension names as the array produced inside the for loop, with an additional fourth dimension to hold the result of each iteration:
dir("D:/Rwork/Project/Test")->xlab # This is the directory where the file for analysising are
filelist <- paste("Test/",dir("Test"),sep="")
output <- array(NA, dim=c(192,94,12,160)) # need to change these to length(Lon), length(Lat), etc.
N <- length(filelist) # Loop over the individual files
for(j in 1:N){
File<-nc_open(filelist[j])
print(filelist[j])
Temperature<-ncvar_get(File,"t2m")
Lat<-ncvar_get(File, "lat")
Lon<-ncvar_get(File,"lon")
Time<-ncvar_get(File,"time")
Year<-c(1851:2010)
EnsambleNo.<-ncvar_get(File,"ensemble_member")
Temperature
Lat<-round(Lat,digits = 2)
Lon<-round(Lon,digits = 2)
Ensamble1<-Temperature[,,1,] #The Ensamble wanted, 1 to 56
Ensamble1<-round(Ensamble1,digits = 1)
dimnames(Ensamble1)<-list(Lon,Lat,Time)
output[,,,j]<-Ensamble1 # store this file's ensemble 1 as slice j of the fourth dimension
}
dimnames(output) <- list(Lon,Lat,Time,Year)
print(Ensamble1)
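To actually write the combined data to disk as one file, the filled output array can then be written once, after the loop; here is a sketch using the same ncdf4 calls as the code above (the extra "File" dimension and the names are illustrative, and it assumes every input file shares the same Lon/Lat/Time grid):
londim<-ncdim_def("Lon","degrees_east",as.double(Lon))
latdim<-ncdim_def("Lat","degrees_north",as.double(Lat))
timedim<-ncdim_def("Time","hours since 1800-01-01 00:00:00",as.double(Time))
filedim<-ncdim_def("File","index of source file",as.double(seq_len(dim(output)[4])))
fillvalue<-1e32
ens_def<-ncvar_def("Ensamble1","deg_K",list(londim,latdim,timedim,filedim),fillvalue,"ensemble member 1 from each input file",prec="double")
ncout<-nc_create("D:/Rwork/Project/TrialEnsamble/TrialEnsamble.nc",list(ens_def),force_v4=T)
ncvar_put(ncout,ens_def,output) # one call writes the whole 4-D array
nc_close(ncout)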

Incorrect number of dimensions: extracting elements from multiple rdata files

PROBLEM
I have many .RData files in one folder and I want to extract the coordinates contained in each .RData file. I'd also like to link the corresponding file name (use_hab) and datetime (dt) to each row of the respective coordinates.
CODE
file.namez<-list.files("C:/fitting/fitdata/7 27 2015") #name of files
#file.namez.rev<-file.namez[grep(".RData",file.namez)]
datastor<-data.frame(matrix(NA,length(file.namez),4))
names(datastor)<-c("use_hab",paste("B",1:3,sep=""))
allresults<-NULL
for(i in 1:length(file.namez))
{
datastor<-NULL
print(file.namez[i])
load(paste("C:/fitting/fitdata/7 27 2015/",file.namez[i], sep=""))
use_hab <- as.character(as.data.frame(strsplit(file.namez[i],"_an"))[2,])# this line is used to remove unwanted parts of the file name
use_hab <- gsub(".RData","", use_hab)
datastor <- fitdata$coords
datastor$use_hab <- use_hab
datastor$dt <- fitdata$dt
allresults <- rbind(allresults, datastor[,c(3,4,1,2)])
}
This is the only output before the error message:
[1] "fitdata_anw514_yr2008.RData"
ERROR
Error in datastor[, c(3, 4, 1, 2)] : incorrect number of dimensions
In addition: Warning message:
In datastor$use_hab <- use_hab : Coercing LHS to a list
QUESTION
How am I getting the incorrect number of dimensions? Each file should have 1098 coordinates and datetimes; in total that is 63 files x 1098 rows with 4 columns (filename, datetime, x, y).
The desired result is to have the file name as the first column, the date time as the second column, and the x and y coordinates as the third and fourth columns.
Replace
datastor <- fitdata$coords
with
datastor$coords <- fitdata$coords
The warning Coercing LHS to a list is issued when you assign with $ to something that does not support it. datastor <- fitdata$coords changes datastor to whatever type fitdata$coords has (presumably a matrix), and assigning datastor$use_hab then coerces that matrix to a plain list, which is why the two-index subset later fails with "incorrect number of dimensions".
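A tiny reproduction of that behaviour (illustrative values, not the poster's data):
datastor <- matrix(1:4, nrow = 2) # an atomic matrix, much like a plain coords matrix
datastor$use_hab <- "w514_yr2008" # Warning: Coercing LHS to a list
datastor[, c(3, 4, 1, 2)] # Error: incorrect number of dimensions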
Also, you could change
allresults<-NULL
datastor<-NULL
to
allresults <- data.frame()
datastor <- data.frame()
but this may just be my personal preference.
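An alternative sketch of the whole loop that builds the four-column result directly (this assumes fitdata$coords is a two-column matrix of x/y values, which the post does not state explicitly):
allresults <- NULL
for (i in seq_along(file.namez)) {
print(file.namez[i])
load(paste0("C:/fitting/fitdata/7 27 2015/", file.namez[i]))
use_hab <- strsplit(file.namez[i], "_an")[[1]][2] # keep the part after "_an"
use_hab <- gsub(".RData", "", use_hab, fixed = TRUE)
datastor <- data.frame(use_hab = use_hab,
dt = fitdata$dt,
x = fitdata$coords[, 1],
y = fitdata$coords[, 2])
allresults <- rbind(allresults, datastor)
}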

How to use a script on multiple Excel worksheets in RStudio

I have a rather long script, at least for me, but I'm new to R.
It looks like this:
#DATA IS LOADED
library(xlsx)
df=read.xlsx2("/Users/emiliemariafalkkallenbach/Documents/Norsk_Institut_for_vandforskning/Chase_DK.xlsx",sheetIndex=1)
df[df==""]<- NA
df=data.matrix(df)
df=data.frame(df)
attach(df)
Phyto=df[,1:6]
Submerged=df[,7:12]
Benthic=df[13:18]
# PHTYOPLANKTON
#making of boundaries
a<-(ifelse(P_Resp<=0,(1-P_Acdev),1/(1+P_Acdev)))
b=(0.95+a)/2
c=2*a-b
d=2*c-a
#Weighted boundaries
e=0.95*P_weight
e=sum(e)
f=b*P_weight
f=sum(f)
g=a*P_weight
g=sum(g)
h=c*P_weight
h=sum(h)
i=d*P_weight
i=sum(i)
#EQR weighted
j=P_EQR*P_weight
j=sum(j)
#complete
l <- ifelse(is.na(rowSums(Phyto)),1,0)
m=sum(l)
#SUBMERGED VEGETATION
#Overall assesment
y=ifelse(m<=0,j,"")
z=ifelse(y!="NA",0,(ifelse(y>f,1,(ifelse(y>g,2,(ifelse(y>h,3,(ifelse(y>i,4,5)))))))))
#making of boundaries
a.sub<-(ifelse(S_Resp<=0,(1-S_Acdev),1/(1+S_Acdev)))
a.sub=sum(a.sub)
b.sub=(0.95+a.sub)/2
b.sub=sum(b.sub)
c.sub=2*a.sub-b.sub
c.sub=sum(c.sub)
d.sub=2*c.sub-a.sub
d.sub=sum(d.sub)
#Weighted boundaries
e.sub=0.95*S_weight
e.sub=sum(e.sub)
f.sub=b.sub*S_weight
f.sub=sum(f.sub)
g.sub=a.sub*S_weight
g.sub=sum(g.sub)
h.sub=c.sub*S_weight
h.sub=sum(h.sub)
i.sub=d.sub*S_weight
i.sub=sum(i.sub)
#EQR.sub weighted
j.sub=S_EQR*S_weight
j.sub=sum(j.sub)
#complete.sub
l.sub <- ifelse(is.na(rowSums(Submerged)),1,0)
m.sub=sum(l.sub)
#Overall Assesment
q.sub=m.sub*0.75
y.sub=ifelse(m.sub<=0,j.sub,"")
z.sub=ifelse(y.sub!="NA",(ifelse(y.sub>f.sub,1,(ifelse(y.sub>g.sub,2,(ifelse(y.sub>h.sub,3,(ifelse(y.sub>i.sub,4,5)))))))),0)
#BENTHIC INVERTEBRATES
#making of boundaries
a.ben<-(ifelse(B_Resp<=0,(1-B_Acdev),1/(1+B_Acdev)))
b.ben=(0.95+a.ben)/2
c.ben=2*a.ben-b.ben
d.ben=2*c.ben-a.ben
#Weighted boundaries
e.ben=0.95*B_weight
e.ben=sum(e.ben)
f.ben=b.ben*B_weight
f.ben=sum(f.ben)
g.ben=a.ben*B_weight
g.ben=(sum(g.ben))
h.ben=c.ben*B_weight
h.ben=sum(h.ben)
i.ben=d.ben*B_weight
i.ben=sum(i.ben)
#EQR weighted
j.ben=B_EQR*B_weight
#Complete
l.ben <- ifelse(is.na(rowSums(Benthic)),1,0)
m.ben=sum(l.ben)
#ChkAccDev
n.ben=ifelse(B_Resp>0,0.53,1.1)
o.ben=ifelse(B_Acdev<0.15,-1,0)
p.ben=ifelse(B_Acdev>n.ben,1,o.ben)
#Overall Assesment
q.ben=m.ben*0.75
y.ben=ifelse(m.ben<1,j.ben,"")
z.ben=ifelse(y.ben!="NA",ifelse(y.ben>f.ben,1,(ifelse(y.ben>g.ben,2,(ifelse(y.ben>h.ben,3,(ifelse(y.ben>i.ben,4,5))))))),0)
#Final assesment
Z=max(na.omit(c(z,z.sub,z.ben)))
#Overall assesment
SCORE=(ifelse(Z<=1 && Z<2,"HIGH",(ifelse(Z>=2 && Z<3,"GOOD",(ifelse(Z>=3 && Z<4,"MODERATE",(ifelse(Z>=4 && Z<5,"BAD",(ifelse(Z>=5,"POOR","NA"))))))))))
SCORE
#Pie Chart
library('plotrix')
Phyto=33.3
Submerged=33.3
Benthic=33.3
total=100
Slices1=c=(1)
total=100
iniR=0.2
Slices1=c=(1)
pie(1, radius=iniR, init.angle=90, col=c('white'), border = NA, labels='Overall')
colors=c(ifelse(z<=1,"blue", ifelse(z<=2,"green",ifelse(z<=3,"yellow",ifelse(z<=4,"orange",ifelse(z<=5,"red","red"))))), ifelse(z.sub<=1,"blue", ifelse(z.sub<=2,"green", ifelse(z.sub<=3,"yellow",ifelse(z.sub<=4,"orange",ifelse(z.sub<=5,"red","red"))))),ifelse(z.ben<=1,"blue", ifelse(z.ben<=2,"green", ifelse(z.ben<=3, "yellow", ifelse(z.ben<=4,"orange",ifelse(z.ben<=5,"red","red"))))))
floating.pie(0,0,c(Phyto, Submerged, Benthic),radius=5*iniR, startpos=pi/2, col=colors,border=NA)
Slices1=c=(1)
total=100
iniR=0.2 # initial radius
colorst=c(ifelse(Z<=1,"blue", ifelse(Z<=2,"green",ifelse(Z<=3,"yellow",ifelse(Z<=4,"orange",ifelse(Z<=5,"red","red"))))))
floating.pie(0,0,c(total),radius=3*iniR, startpos=pi/2, col=colorst,border=NA)
angles=as.numeric(c(-10,75,80))
pie.labels(0,0,angles,c("Phyto","Submerged","Benthic"),radius=1, bg="white")
pie.labels(0,0,0,c("Overall Assessment"),radius=-0.3)
I guess it does not really matter what is in my script.
At the moment it only runs on the first sheet in the Excel file, but I have several sheets and would like to run it on all of them.
The outcome should be a table looking like this:
          z     z.sub   z.ben   Z       Pie-chart (only if possible)
Sheet 1   0     NA      NA      High
Sheet 2   ...   ...     ...     ...
Sheet 3   ...   ...     ...     ...
I'm sorry if this is an ordinary question!
Hope someone is able to help
Thanks!
A better way to read xls files is with the readxl library.
# remove "#" if you don't have these libraries installed already
# install.packages("readxl") # faster excel reader
# install.packages("data.table") # faster everything, in this case rbindlist
library(readxl)
library(data.table)
sheets = 1:5 # index numbers or names of the sheets you want to read
readmysheets = function(sheet) {
read_excel("myexcel.xls", sheet = sheet) # read_excel's first argument is the file path; 'sheet' selects the sheet
}
myfiles = lapply(sheets, readmysheets) # apply the indices/names on the readmysheets function
# you now have a list:
str(myfiles)
# bind the separate sheets together
together = rbindlist(myfiles, fill = T)
Wrap the body of your script in a "for" loop. You have several ways of doing this; here are two.
# pre-allocate an object to write to: one column per sheet, one row per statistic
out <- matrix(NA, nrow = 4, ncol = nsheets) # nsheets = number of sheets to loop over
for (i in 1:nsheets) {
df=read.xlsx2(".../Chase_DK.xlsx", sheetIndex = i)
#... do calculations
out[, i] <- c(z, z.sub, z.ben, Z)
}
# second way, no need to pre-allocate anything
sapply(1:nsheets, FUN = function(x) {
df=read.xlsx2(".../Chase_DK.xlsx", sheetIndex = x)
#... do calculations
out <- c(z, z.sub, z.ben, Z) # specify what you wish the function to return
return(out) # sapply will try to simplify the combined result on its own
})
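If the matrix returned by the sapply() call is stored in an object, say res (the snippet above does not assign it), it can be reshaped into the table asked for in the question:
out_table <- as.data.frame(t(res)) # one row per sheet, one column per statistic
colnames(out_table) <- c("z", "z.sub", "z.ben", "Z")
rownames(out_table) <- paste("Sheet", 1:nsheets)
out_table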

Load frequent subsequences from TXT

Is it possible to load a list of frequent subsequences from a .txt file, and make TraMineR recognize it as a sequence object?
Unfortunately I don't have the raw data, so I am not able to recreate the analysis. The only file I have is a .txt file containing the frequent subsequences. I assume it was created with the seqefsub() function from the TraMineR package, with maxGap=2, because the data looks like the output of that function.
read.table() reads it as a data frame, but as far as I understand, TraMineR handles event sequences as lists with many additional attributes that, for example, are not contained in this file. Or I don't know how to extract them...
This is what a couple of lines from the .txt file look like:
Subsequence Support Count
16 (WT4)-(WT3) 0.76666667 805
17 (WL2) 0.76380952 802
18 (S1) 0.76000000 798
19 (FRF,WL2) 0.74380952 781
20 (WT2)-(WT1) 0.70571429 741
To create an event sequence object from the (text) subsequences, you have to transform them into vertical time-stamped event (TSE) format. The function below does the job for your data:
## Function subseq.to.TSE
## puts the sequences into TSE format using
## position as timestamp
## sdf: a data frame with columns Id, Subsequence, Support and Count.
subseq.to.TSE <- function(sdf){
tse <- data.frame(id=0, event="", time=0, stringsAsFactors=TRUE) # event must be a factor for the level bookkeeping below
k <- 0
for (i in 1:nrow(sdf)){
id <- sdf[i,"Id"]
s <- sdf[i,"Subsequence"]
ss <- gsub("\\(","",s)
ss <- gsub("\\)","",ss)
# split transitions
st <- strsplit(ss, split="-")[[1]]
for (j in 1:length(st)){
stt <- strsplit(st[j], split=",")[[1]]
for(jj in 1:length(stt)){
k <- k+1
tse[k,1] <- id
## parsing for simultaneous events
if (!(stt[jj] %in% levels(tse[,2])))
{levels(tse[,2]) <- c(levels(tse[,2]),stt[jj])}
tse[k,2] <- stt[jj]
tse[k,3] <- j
}
}
}
return(tse)
}
Here is how you would use it on the example data.
We first create the data frame that we name s.df
s.df <- data.frame(scan(what=list(Id=0, Subsequence="", Support=double(), Count=0)))
16 (WT4)-(WT3) 0.76666667 805
17 (WL2) 0.76380952 802
18 (S1) 0.76000000 798
19 (FRF,WL2) 0.74380952 781
20 (WT2)-(WT1) 0.70571429 741
# leave a blank line to end the scan
Then we extract the TSE data from s.df and create from it the event sequence object using seqecreate. Finally, we assign the counts as sequence weights.
s.tse <- subseq.to.TSE(s.df)
seqe <- seqecreate(s.tse)
seqeweight(seqe) <- s.df[,"Count"]
Now you can for instance plot the event sequences with
seqpcplot(seqe)
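If the subsequences are sitting in a .txt file laid out like the excerpt in the question, they could presumably be read straight into s.df instead of typing them into scan(); a small sketch, assuming the file is called subsequences.txt and that its first, unnamed column holds the ids (read.table then uses it as row names):
s.df <- read.table("subsequences.txt", header = TRUE)
s.df <- data.frame(Id = as.integer(rownames(s.df)), s.df)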
