How to use a for loop to calculate the same formula across different data sets in R?

I have 64 different data sets: data1, data2, data3, ..., data64.
I need to calculate the DTW distance between each pair of data sets and assemble the results into a distance matrix. The code I am using is:
library(zoo)
library(dtw)
zooData <- zoo(data1$length, data1$Time.Elapsed)
zooData2 <- zoo(data2$length, data2$Time.Elapsed)
alignment <- dtw(zooData, zooData2)
alignment$normalizedDistance
This way, I have to change the data set names manually, one pair at a time, which is very tedious. I think a for loop could solve this: maybe I can put the 64 data sets into a list and loop over it? I am not sure how to do this in R. Can anyone help? Thank you very much!
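One possible way to do this is to collect the data sets in a list and fill a symmetric matrix with a nested loop. This is only a sketch: the four simulated data frames stand in for data1 ... data64, and pairwise_distance, dtw_distance and stand_in are hypothetical helper names, not part of any package. The DTW helper repeats the zoo()/dtw() calls from the question. If the dtw or zoo package is not installed, the example falls back to a simple placeholder distance so that it still runs.

```r
# Simulated stand-ins for data1 ... data64 (assumption: each data set has
# the columns Time.Elapsed and length, as in the question).
set.seed(1)
datasets <- lapply(1:4, function(i) {
  data.frame(Time.Elapsed = 1:10, length = cumsum(rnorm(10)))
})
names(datasets) <- paste0("data", seq_along(datasets))
# With the real objects in the workspace, something like this should work:
# datasets <- mget(paste0("data", 1:64))

# Fill a symmetric matrix by applying dist_fun to every pair of data sets.
pairwise_distance <- function(datasets, dist_fun) {
  n <- length(datasets)
  d <- matrix(0, n, n, dimnames = list(names(datasets), names(datasets)))
  if (n < 2) return(d)
  for (i in seq_len(n - 1)) {
    for (j in (i + 1):n) {
      d[i, j] <- d[j, i] <- dist_fun(datasets[[i]], datasets[[j]])
    }
  }
  d
}

# DTW distance, using the same calls as in the question.
dtw_distance <- function(a, b) {
  za <- zoo::zoo(a$length, a$Time.Elapsed)
  zb <- zoo::zoo(b$length, b$Time.Elapsed)
  dtw::dtw(za, zb)$normalizedDistance
}

# Placeholder distance (NOT DTW), used only when dtw/zoo are unavailable.
stand_in <- function(a, b) mean(abs(a$length - b$length))

have_pkgs <- requireNamespace("dtw", quietly = TRUE) &&
  requireNamespace("zoo", quietly = TRUE)
dist_matrix <- pairwise_distance(datasets,
                                 if (have_pkgs) dtw_distance else stand_in)
dist_matrix
```

Each pair is computed only once and mirrored across the diagonal, so 64 data sets need 2016 calls rather than 4096.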

Related

How to extract the matrix to a submatrix

I am new to R, so this might be a basic question, but I am stuck.
I am doing social network analysis, and one of the tasks is to extract the entries that appear in the same waves, i.e. t1t2 or t2t3.
I have a mega matrix and tried to extract the right entries according to a list.
Should I use dplyr or write an if-then condition? Or some other kind of function?
I hope someone can give me a hint. Thanks in advance!
Assuming you want to extract the rows for which t1t2Node (the object you refer to as "a list") is equal to 1, then simply:
indexes <- as.logical(my_list)
filtered_matrix <- mega_matrix[indexes, ]
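To make the logical indexing concrete, here is a small self-contained sketch. The 4 x 4 matrix and the 0/1 vector are made up for illustration; only the names mega_matrix and t1t2Node come from the thread. Since a network matrix is usually square, the last line also shows how the same index could be applied to rows and columns at once, assuming that is what is wanted.

```r
# Made-up 4 x 4 network matrix with named nodes.
nodes <- paste0("n", 1:4)
mega_matrix <- matrix(1:16, nrow = 4, dimnames = list(nodes, nodes))

# Made-up indicator: 1 if the node appears in both t1 and t2.
t1t2Node <- c(1, 0, 1, 0)

# Convert 0/1 to FALSE/TRUE and use it as a row index.
keep <- as.logical(t1t2Node)
filtered_rows <- mega_matrix[keep, , drop = FALSE]

# Submatrix restricted to the same nodes on both dimensions.
filtered_sub <- mega_matrix[keep, keep, drop = FALSE]
```

drop = FALSE keeps the result a matrix even if only one row or column is selected.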

Subset variables by name in R

I know that there are many threads called this but either the advice within hasn't worked or I haven't understood it.
I have read what was an SPSS file into R.
I cleaned some variables and added new ones.
By this point the file size is 1,000 MB.
I wanted to write it into a CSV to look at it more easily but it just stops responding - file too big I guess.
So instead I want to create a subset of only the variables I need. I tried a couple of things:
(besb <- bes[, c(1, 7, 8)])
data1 <- bes[,1:8]
I also tried referring to variables by name:
nf <- c(bes$approveGov, bes$politmoney)
All these attempts return errors about the number of dimensions.
Could somebody please explain how to create a reduced subset of variables, preferably using variable names?
An easy way to subset variables from a data.frame is with the dplyr package. You can select variables with their bare names. For example:
library(dplyr)
nf <- select(bes, approveGov, politmoney)
It's fast for large data frames too.
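If you would rather not load a package, base R can also select columns by name. A minimal sketch, using a tiny made-up bes data frame (the real one comes from the SPSS file):

```r
# Made-up stand-in for the real 'bes' data frame.
bes <- data.frame(id = 1:3,
                  approveGov = c(2, 4, 1),
                  politmoney = c(5, 3, 2),
                  other = c("a", "b", "c"))

# Select columns by name with a character vector.
keep_vars <- c("approveGov", "politmoney")
nf <- bes[, keep_vars, drop = FALSE]

# c() on two columns concatenates them into one long vector with no
# dimensions, which is why indexing it with [ , ] fails.
flat <- c(bes$approveGov, bes$politmoney)
```

Note that errors on bes[, 1:8] would suggest bes is not a plain data.frame at that point, so it may be worth checking class(bes) first.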

copying data from one data frame to other using variable in R

I am trying to transfer data from one data frame to another. I want to copy 8 columns from a huge data frame to a smaller one and name the columns n1, n2, etc.
First, I find the number of the column to start copying from:
x=as.numeric(which(colnames(old_df)=='N1_data'))
Then I paste the columns into the new data frame this way:
new_df[paste('N',1:8,'new',sep='')]=old_df[x:x+7]
However, when I run this, all 8 new columns contain exactly the same data. If I instead use the value of x directly, I get what I want:
new_df[paste('N',1:8,'new',sep='')]=old_df[10:17]
So my questions are:
1) Why am I not able to use the variable x? I added as.numeric just to make sure it is a number and not a list, but that does not seem to help.
2) Is there a better or more efficient way to achieve this?
If I'm understanding your question correctly, you may be overthinking the problem.
library(dplyr)
new_df <- select(old_df, N1_data, N2_data, N3_data, N4_data,
                 N5_data, N6_data, N7_data, N8_data)
# rename N1_data ... N8_data to n1 ... n8
colnames(new_df) <- sub("N(\\d)_data", "n\\1", colnames(new_df))
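As for why x did not work: in R the : operator binds more tightly than +, so x:x+7 is parsed as (x:x) + 7, which is the single number x + 7. That one column then gets recycled into all 8 new columns. Writing x:(x+7) gives the intended range. A small sketch with a made-up old_df (20 columns, with the N*_data columns placed at positions 10 to 17 to mirror the question):

```r
# Made-up data frame: 20 columns, N1_data ... N8_data at positions 10 to 17.
old_df <- as.data.frame(matrix(1:60, nrow = 3))
names(old_df)[10:17] <- paste0("N", 1:8, "_data")

x <- which(colnames(old_df) == "N1_data")

wrong_idx <- x:x + 7     # parsed as (x:x) + 7, a single index
right_idx <- x:(x + 7)   # the intended range of 8 columns

# Copy the 8 columns and rename them in one step.
new_df <- setNames(old_df[right_idx], paste0("N", 1:8, "new"))
```

which() already returns a plain integer, so the as.numeric() wrapper is not needed.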

How to subset a set of variables and use Aggregate function in R

I'm a beginner in R and I'm working on an automation task. I have a list of variables in a separate file, based on which the values need to be aggregated in the master dataset. The structure of the master dataset is attached,
and the referral dataset contains the variables to aggregate by.
Of the 6 variables, I need to aggregate the variables D, E, F by Sum(C) (as per the referral dataset).
The code below does what I need manually:
X<-aggregate(C,by=list(D,E,F),FUN=sum)
But I need code that does the same thing automatically. I tried writing loops, but the problem is that the two datasets don't have the same data.frame size. Can someone help me with this?
So, it seems like you want to do a few things:
1) read in the master/referent datasets
2) subset the master according to the values in the referent
3) compute column sums on the master?
Also, is there a specific reason you want to use aggregate()? There are probably lots of ways to do this. In any case, here is what I would do:
# assuming master is a dataframe or matrix, referent is a vector
# just simulating them here because not clear how you are reading them in
master = matrix(rnorm(36),6)
colnames(master) = c('A','B','C','D','E','F')
referent = c('D','E','F')
colSums(master[,referent])
So is that doing what you want? I like colSums because it's a handy built-in. I am not an R superstar, though, so it is possible that other approaches are better for some reason.
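If the goal is instead what the manual call in the question does, i.e. the sum of C within each combination of D, E and F, then aggregate() can take its grouping columns from a character vector. This is a sketch under that assumption. The master data frame below is made up, and referent is assumed to be a character vector of column names read from the referral file.

```r
# Made-up master dataset with six variables.
master <- data.frame(
  A = 1:6,
  B = 6:1,
  C = c(10, 20, 30, 40, 50, 60),
  D = c("a", "a", "b", "b", "a", "a"),
  E = c("x", "x", "y", "y", "x", "x"),
  F = c("p", "p", "q", "q", "p", "q")
)

# Grouping variables, as they might be read from the referral dataset.
referent <- c("D", "E", "F")

# Sum C within each combination of the grouping variables.
# master[referent] is a data frame, which aggregate() accepts as a list.
result <- aggregate(master["C"], by = master[referent], FUN = sum)
result
```

Because the grouping columns are picked by name, the two datasets do not need to have the same size.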

Cluster PAM in R - How to ignore a Column/variable but still keep it

I would like to use the Cluster PAM algorithm in R to cluster a dataset of around 6000 rows.
I want the PAM algorithm to ignore a column called "ID" (not use it in the clustering), but I do not want to delete that column. I want to use it later to combine my clustered data with the original dataset.
Basically, what I want is to add a cluster column to the original dataset.
I want to use PAM as a data compression/variable reduction method. I have 220 variables, and I would like to cluster some of them to reduce the dimensionality of my dataset so that I can apply a classification algorithm (most likely a tree) to the problem I am trying to solve.
If anyone knows a way around this or a better approach, please let me know.
Thank you
library(cluster)
# import data
data <- read.table("sampleiris.txt")
# execution
result <- pam(data[2:4], 3, FALSE, "euclidean")
Here the subset [2:4] assumes that ID is the first column. The code below should fetch the cluster values from PAM; you can then add them as a column to your data.
result$silinfo[[1]][1:nrow(result$silinfo[[1]])]
There is a small problem in the above code.
You should not use the silhouette information because it re-orders the rows as a preparation for the plot.
If you want to extract the cluster assignment while preserving the original dataset order and adding just a column of cluster assignment you should use $cluster. I tried it and it works like a charm.
This is the code:
library(cluster)
data <- swiss[4:6]
result <- pam(data, 3)
summary(result)
export <- result$cluster
swiss[, "Clus"] <- export
View(export)
View(swiss)
Cheers
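Putting the two answers together for the original question, a minimal sketch of clustering on everything except ID and then attaching the assignment to the full data frame might look like this. The data frame is built from the built-in iris data with a made-up ID column, and k = 3 is an arbitrary choice for illustration.

```r
library(cluster)

# Made-up example data: an ID column plus four numeric variables.
df <- data.frame(ID = paste0("row", seq_len(nrow(iris))), iris[1:4])

# Cluster on every column except ID; the ID column stays in df untouched.
features <- setdiff(names(df), "ID")
fit <- pam(df[features], k = 3)

# The clustering vector is in the original row order, so it can be
# attached directly as a new column.
df$cluster <- fit$clustering
head(df)
```

The component is called clustering in the pam object; result$cluster in the answer above reaches the same vector through partial matching of the name.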
