Perform vector operation on dataframe of coordinates - r

I currently have a data frame storing separate x,y,z coordinates from an accelerometer sensor (with timestamps), but want to perform vector operations on it.
Test data (the real data has thousands of rows, plus a timestamp column that must be preserved):
x <- c(1,3,1,0,3)
y <- c(2,4,8,8,9)
z <- c(0,1,1,2,0)
df <- data.frame(x,y,z)
proj <- function(a,b) {
as.double((a %*% b) / (b %*% b)) * b
}
v = c(1,2,3)
I want to mutate df (or create a new data frame?) by applying proj(_, v) to each row.
I have tried something along the lines of mutate(projected = proj(c(x,y,z), v)), but it doesn't work; I am probably misusing it.
What is the best way to achieve this? Should I instead be using a list of vectors to store the coordinates?

Your proj(a, b) function takes only two inputs, but in your example you seem to want to provide three: proj(c(x,y,z), v). Or did I misunderstand?
However, this would work:
dplyr::mutate(df, projected = proj(x, y)), which results in
x y z projected
1 1 2 0 0.4279476
2 3 4 1 0.8558952
3 1 8 1 1.7117904
4 0 8 2 1.7117904
5 3 9 0 1.9257642
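Note that proj(x, y) above projects the whole x column onto the whole y column, not each row onto v. If the goal is the row-wise projection described in the question, a minimal base R sketch (assuming the coordinates sit in columns named x, y and z; the output names px, py, pz are arbitrary) could apply proj() over the rows, or compute the same result in vectorized form:

```r
# Test data from the question (the real data would also carry a timestamp column)
df <- data.frame(x = c(1, 3, 1, 0, 3),
                 y = c(2, 4, 8, 8, 9),
                 z = c(0, 1, 1, 2, 0))
v <- c(1, 2, 3)

# proj() as defined in the question: projection of vector a onto vector b
proj <- function(a, b) {
  as.double((a %*% b) / (b %*% b)) * b
}

coords <- as.matrix(df[, c("x", "y", "z")])

# Row-wise: apply() calls proj(row, b = v) for every row and returns one
# column per row, so t() turns it back into one row per observation
projected <- t(apply(coords, 1, proj, b = v))
colnames(projected) <- c("px", "py", "pz")

# Vectorised equivalent: one scalar coefficient per row, then outer() with v
w <- as.vector(coords %*% v) / sum(v * v)
projected_fast <- outer(w, v)

# cbind() keeps every existing column (e.g. timestamps) and appends px, py, pz
df <- cbind(df, projected)
df
```

The apply() version reuses proj() unchanged; the outer() version avoids one function call per row, which may matter with thousands of rows. Both should produce the same matrix.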


Apply function to column by segments in R

I have a function f that needs to be applied to a single column of length n in segments of m length, where m divides n. (For example, to a column of 1000 values, apply f to the first 250 values, then to 250-500, ...).
A loop is overkill, since the column has over 16 million values. I was thinking the efficient way would be to separate the column of length n into q vectors of length m, where mq = n. Then I could apply f simultaneously to all these vectors using some lapply-like functionality. Then I could join the q vectors to obtain the transformed version of the column.
Is that the efficient way to go here? If so, what function could decompose a column into q vectors of equal length and what function should I use to broadcast f across the q vectors?
Lastly, although less importantly, what if we wanted to do this to several columns and not just one?
Context
I've programmed a function that computes the power spectrum of an EEG signal (a numeric vector). However, it is bad practice to compute the power spectrum of a whole signal at once. The correct method is to compute it epoch by epoch, in 30 or 5 second segments, and average the spectrum of all those epochs. Hence why I need to apply a function to a column (an EEG signal) by epochs (or segments).
One way to do it is to create an auxiliary grouping variable that labels each segment; depending on your function, you can then use group_by() and/or summarize() to apply it to each segment. An example:
df <- data.frame(
x = rnorm(15),
y = rnorm(15),
z = rnorm(15)
)
library(dplyr)
df %>%
mutate(
aux = rep(1:3,each = (nrow(df)/3)),
across(.cols = c(x,y,z),.fns = ~ . + 2 * aux)
)
x y z aux
1 2.164841 2.882465 2.139098 1
2 2.364115 2.205598 2.410275 1
3 2.552158 1.383564 1.441543 1
4 1.398107 1.265201 2.605371 1
5 1.006301 1.868197 1.493666 1
6 5.026785 4.310017 2.579434 2
7 4.751061 2.960320 4.127993 2
8 2.490833 3.815691 5.945851 2
9 3.904853 4.967267 4.800914 2
10 3.104052 3.891720 5.165253 2
11 3.929249 5.301579 6.358856 3
12 6.150120 5.724055 5.391443 3
13 5.920788 7.114649 5.797759 3
14 5.902631 6.550044 5.726752 3
15 6.216153 7.236676 5.531300 3
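A sketch of the split-then-apply idea from the question, in base R: split() or matrix() decomposes the column into q segments of length m, and vapply()/apply() broadcast f across them. Here f, spec_f and all data are hypothetical stand-ins; Mod(fft(seg))^2 is only a crude periodogram used to show how per-epoch vectors can be averaged.

```r
set.seed(42)
n <- 1000   # column length
m <- 250    # segment length; must divide n
signal <- rnorm(n)

# Hypothetical per-segment function: replace with your own
f <- function(seg) mean(seg)

# Option 1: split the column into q = n / m vectors, then apply f to each
segments <- split(signal, rep(seq_len(n %/% m), each = m))
res_split <- vapply(segments, f, numeric(1))

# Option 2: reshape into an m x q matrix (matrix() fills by column,
# so column j holds segment j) and apply f to each column
res_matrix <- apply(matrix(signal, nrow = m), 2, f)

# If f returns a vector per segment (e.g. a spectrum), apply() returns one
# column per segment and rowMeans() averages them across segments
spec_f <- function(seg) Mod(fft(seg))^2
avg_spec <- rowMeans(apply(matrix(signal, nrow = m), 2, spec_f))

# Several columns: repeat the same segmentation for each column of a data frame
df <- data.frame(a = rnorm(n), b = rnorm(n))
res_cols <- sapply(df, function(col) apply(matrix(col, nrow = m), 2, f))
res_cols
```

The matrix reshape avoids building a list of q vectors and is usually the simpler choice when m divides n exactly; res_cols has one row per segment and one column per original column.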

Creating a table that shows a function evaluated for sequences

How would I create a table that takes two variables composed of incremental sequences and evaluates a function of these two variables? An example of what I want to create is a multiplication table: the function would be x*y, and it would produce a table where [row, column] [1,1] = 1, [1,2] = 2, [5,5] = 25, etc.
I think you can use for loops, but I'm not sure.
Thanks in advance
JOE this is pretty basic ... try to follow a basic data manipulation tutorial.
For this type of operations you do not need loops. Read up on vector operations.
What you want to do can be easily done in R with a data frame/tibble.
base R
# create your test vectors
x <- c(1,1,5)
y <- c(1,2,5)
# store them in a data frame
df <- data.frame(x = x, y = y)
df
x y
1 1 1
2 1 2
3 5 5
# in base R you refer to a column with the object name and $ notation
df$mult <- df$x * df$y
df
x y mult
1 1 1 1
2 1 2 2
3 5 5 25
tidyverse
The tidyverse might be a bit more intuitive for vectorised operations:
library(dplyr) # the main data crunching package of the tidyverse
df <- data.frame(x = x, y = y)
# with mutate you can create a new vector (or overwrite an existing one)
df <- df %>% mutate(MULT = x * y)
df
x y MULT
1 1 1 1
2 1 2 2
3 5 5 25
Good luck with your learning journey!
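The answer above multiplies paired values row by row. For the two-way table described in the question ([1,2] = 2, [5,5] = 25), a sketch using base R's outer(), which evaluates a vectorized two-argument function on every combination of its inputs, might look like this; the second function is only a hypothetical example:

```r
x <- 1:5
y <- 1:5

# outer()'s default FUN is "*", so tab[i, j] == x[i] * y[j]
tab <- outer(x, y)
dimnames(tab) <- list(x, y)   # label rows and columns with the sequence values
tab

# Any vectorised function of two arguments works, e.g. f(x, y) = x^2 + y
tab2 <- outer(x, y, FUN = function(a, b) a^2 + b)
tab2
```

outer() passes whole vectors to FUN, so the function must be vectorized; a scalar-only function would need wrapping with Vectorize() first.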

summarize results on a vector of different length of the original - Pivot table r

I would like to use the vector:
time.int<-c(1,2,3,4,5) #vector to be used as a "guide"
and the database:
time<-c(1,1,1,1,5,5,5)
value<-c("s","s","s","t","d","d","d")
dat1<- as.data.frame(cbind(time,value))
to create the following vector, which I can then combine with "time.int" into a second data frame.
freq<-c(4,0,0,0,3) #wished result
This vector counts the events that belong to each time interval: there are four 1s in "time", so the first value is 4, and so on.
Ideally I would like to generalize this so that I can choose the interval width, for example counting into a new vector the events in "time" for every 3 values of time.int.
EDIT for generalization
time.int<-c(1,2,3,4,5,6)
time<-c(1,1,1,2,5,5,5,6)
value<-c("s","s","s","t", "t","d","d","d")
dat1<- data.frame(time,value)
Let's say I want it every 2 seconds (every 2 values of time.int):
freq<-c(4,0,4) #wished result
or every 3:
freq<-c(4,4) #wished result
I know how to do that in excel, with a pivot table.
Sorry if this is a duplicate; I could not find a fitting question on this website, and I do not even know how to phrase this or where to start.
The following will produce vector freq.
freq <- sapply(time.int, function(x) sum(x == time))
freq
[1] 4 0 0 0 3
BTW, don't use the construct as.data.frame(cbind(.)). Use instead
dat1 <- data.frame(time,value)
In order to generalize the code above to segments of time.int of any length, I believe the following function will do it. Note that since you've changed the data, the output for n == 1 is not the same as above.
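fun <- function(x, y, n){
# split the indices of x into consecutive blocks of length n (a trailing partial block is dropped)
inx <- lapply(seq_len(length(x) %/% n), function(m) seq_len(n) + n*(m - 1))
# for each block, count how many elements of y equal one of that block's values
sapply(inx, function(i) sum(y %in% x[i]))
}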
freq1 <- fun(time.int, time, 1)
freq1
[1] 3 1 0 0 3 1
freq2 <- fun(time.int, time, 2)
freq2
[1] 4 0 4
freq3 <- fun(time.int, time, 3)
freq3
[1] 4 4
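A vectorized alternative to fun() is sketched below; count_by_block() is a hypothetical helper name. It looks up each event's position in time.int with match(), maps positions to blocks of n with integer division, and counts per block with tabulate(). It assumes every value in time occurs in time.int, and like fun() it drops a trailing partial block:

```r
time.int <- c(1, 2, 3, 4, 5, 6)
time <- c(1, 1, 1, 2, 5, 5, 5, 6)

count_by_block <- function(time.int, time, n) {
  pos <- match(time, time.int)       # index of each event within time.int
  block <- (pos - 1) %/% n + 1       # which block of n indices it falls in
  # nbins fixes the output length, so empty blocks appear as 0;
  # blocks beyond nbins (a partial last block) are not counted
  tabulate(block, nbins = length(time.int) %/% n)
}

count_by_block(time.int, time, 1)
count_by_block(time.int, time, 2)
count_by_block(time.int, time, 3)
```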
We can use the table function to count the event number and use merge to create a data frame summarizing the information. event_dat is the final output.
# Create example data
time.int <- c(1,2,3,4,5)
time <- c(1,1,1,1,5,5,5)
# Count the event using table and convert to a data frame
event <- as.data.frame(table(time))
# Convert the time.int to a data frame
time_dat <- data.frame(time = time.int)
# Merge the data
event_dat <- merge(time_dat, event, by = "time", all = TRUE)
# Replace NA with 0
event_dat[is.na(event_dat)] <- 0
# See the result
event_dat
time Freq
1 1 4
2 2 0
3 3 0
4 4 0
5 5 3
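If only the counts are needed, declaring time.int as the factor levels lets table() report empty intervals as 0 directly, which avoids the merge and the NA replacement step; a sketch with the same example data:

```r
time.int <- c(1, 2, 3, 4, 5)
time <- c(1, 1, 1, 1, 5, 5, 5)

# Levels that never occur in time are still tabulated, with a count of 0
counts <- table(factor(time, levels = time.int))
counts

event_dat <- data.frame(time = time.int, Freq = as.vector(counts))
event_dat
```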

How to add output values from a for-loop to a data.frame depending on the variable loop input values in R?

x <- 1
y <- 1
for (y in 1:2){
for (x in 1:2){
z <- x+y
zresults <- data.frame(x, y, z)
}
}
Hello everyone,
sorry for my dumb question, but I am new to R and this is actually my first attempt at coding.
I created a for-loop with the indices x and y, and I want to save the output values (z) together with the corresponding x and y values in a data.frame.
The code posted is obviously wrong, but I'm not getting it.
The data.frame should look like that:
x y z
1 1 1 2
2 2 1 3
3 1 2 3
4 2 2 4
Thank you guys a lot in advance!
Greetings from Germany
Here's one way to do what you want to do:
zresults <- expand.grid(x=1:2,y=1:2);
zresults$z <- zresults$x + zresults$y;
zresults;
## x y z
## 1 1 1 2
## 2 2 1 3
## 3 1 2 3
## 4 2 2 4
Notes on your attempt:
The initial assignments to x and y are not necessary. The values are overwritten on the first iteration of each respective loop with the first value of the RHS vector (1 in each case). Also worth noting is that, unlike languages like C/C++ and Java, in R you don't have to declare variables; any variable name can be assigned a value at any time.
In your inner loop you're assigning zresults. After the first iteration, you are overwriting the previous value that existed for zresults. If you want to "build up" a data.frame one row at a time, you can use the following solutions, although note that performance will not be ideal with these approaches:
zresults[nrow(zresults)+1L,] <- c(x,y,z);
or
zresults <- rbind(zresults,c(x,y,z));
Also note that zresults would have to be initialized first, prior to the build-up loop; for example:
zresults <- data.frame(x=integer(),y=integer(),z=integer());
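Putting those pieces together, a complete (if slow for large loops) version of the build-up approach might look like this sketch:

```r
# Initialise an empty data frame with the right column types
zresults <- data.frame(x = integer(), y = integer(), z = integer())

for (y in 1:2) {
  for (x in 1:2) {
    z <- x + y
    # Append one row after the current last row
    zresults[nrow(zresults) + 1L, ] <- c(x, y, z)
  }
}
zresults
```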
In general, try to avoid for-loops in R. Instead, vectorization is preferred. There are many good sources on this; for example, see http://www.noamross.net/blog/2014/4/16/vectorization-in-r--why.html and http://alyssafrazee.com/vectorization.html.
Here is another solution
x = 1
y = 1
result = NULL
for (y in 1:2) {
for (x in 1:2) {
z = x + y
if (is.null(result)) {
result = data.frame(x,y,z)
} else {
result = rbind(result, data.frame(x,y,z))
}
}
}
result
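A common variant that avoids calling rbind() on every iteration is to collect one small data frame per iteration in a list and bind them once at the end; a sketch:

```r
rows <- list()   # one data frame per iteration
for (y in 1:2) {
  for (x in 1:2) {
    rows[[length(rows) + 1L]] <- data.frame(x = x, y = y, z = x + y)
  }
}
# Bind all rows in a single call after the loop
result <- do.call(rbind, rows)
result
```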

Replicate variable based off match of two other variables in R

I've got a seemingly simple question that I can't answer: I've got three vectors:
x <- c(1,2,3,4)
weight <- c(5,6,7,8)
y <- c(1,1,1,2,2,2)
I want to create a new vector that replicates the values of weight for each time an element in x matches y such that it produces the following new weight vector associated with y:
y_weight <- c(5,5,5,6,6,6)
Any thoughts on how to do this (either loop or vectorized)? Thanks
You want the match function:
match(y, x)
returns the indices of the matches; then use those indices to build your new weight vector:
weight[match(y, x)]
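A self-contained version of the match() approach with the question's data; as a side note, any element of y that does not occur in x gets NA:

```r
x <- c(1, 2, 3, 4)
weight <- c(5, 6, 7, 8)
y <- c(1, 1, 1, 2, 2, 2)

idx <- match(y, x)        # position in x of each element of y
y_weight <- weight[idx]   # weight looked up at those positions
y_weight

# A value of y missing from x yields NA
weight[match(c(1, 9), x)]
```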
#Using plyr
library(plyr)
df<-as.data.frame(cbind(x,weight)) # converting to dataframe
df<-rename(df,c(x="y")) # rename x as y for joining dataframes
y<-as.data.frame(y) # converting to dataframe
mydata <- join(df, y, by = "y",type="right")
> mydata
y weight
1 1 5
2 1 5
3 1 5
4 2 6
5 2 6
6 2 6
