I apologize for the poor phrasing of this question; I am still a beginner in R and still getting used to the proper terminology. I have provided sample data below:
mydata <- data.frame(x = c(1, 2, 7, 19, 45), y=c(10, 12, 15, 19, 24))
View(mydata)
My intention is to find the x speed, and for this I would need to find the difference between 1 and 2, 2 and 7, 7 and 19, and so on. How would I do this?
You can use the diff function.
> diffs <- as.data.frame(diff(as.matrix(mydata)))
> diffs
x y
1 1 2
2 5 3
3 12 4
4 26 5
> mean(diffs$x)
[1] 11
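If only the x differences are needed, diff can be applied to that column directly. The sketch below also divides by the y differences, which assumes that y holds the time of each observation. The question does not say so, so treat that part as an illustration only.

```r
mydata <- data.frame(x = c(1, 2, 7, 19, 45), y = c(10, 12, 15, 19, 24))

# Successive differences of x: 2 - 1, 7 - 2, 19 - 7, 45 - 19
dx <- diff(mydata$x)
dx
# [1]  1  5 12 26

# ASSUMPTION: y is the time of each observation (not stated in the question)
dt <- diff(mydata$y)
x_speed <- dx / dt
```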
You can use dplyr::lead() and dplyr::lag(), depending on how you want the calculations to line up:
library(dplyr)
mydata <- data.frame(x = c(1, 2, 7, 19, 45), y=c(10, 12, 15, 19, 24))
View(mydata)
mydata %>%
mutate(x_speed_diff_lead = lead(x) - x
, x_speed_diff_lag = x - lag(x))
# x y x_speed_diff_lead x_speed_diff_lag
# 1 1 10 1 NA
# 2 2 12 5 1
# 3 7 15 12 5
# 4 19 19 26 12
# 5 45 24 NA 26
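If you would rather not load dplyr, the same two alignments can be sketched in base R by padding diff with an NA at the end or at the start. The names lead_diff and lag_diff are just illustrative.

```r
x <- c(1, 2, 7, 19, 45)

lead_diff <- c(diff(x), NA)  # lines up like lead(x) - x
lag_diff  <- c(NA, diff(x))  # lines up like x - lag(x)

lead_diff
# [1]  1  5 12 26 NA
lag_diff
# [1] NA  1  5 12 26
```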
What I want to have is a matrix in which each element is itself an array.
Each array is obtained by subsetting a data frame, but the example generalizes to any array.
I tried with:
My_matrix <- matrix(array(), nrow = NROW, ncol = NCOL)
for (i in 1:NROW){
for(j in 1:NCOL){
My_matrix[i,j] <- df[df$var1 == j & df$var2== i,]$var3
}
}
but I got this error message:
Error in My_matrix[i,j] <- df[df$var1== j & df$var2== i,]$var3 :
number of items to replace is not a multiple of replacement length
How should I define and access each element of the matrix and each element of the contained array?
I think I understand that: (1) the base array is 45x3; (2) each cell has a differently sized matrix; and (3) the sizes are not known a priori. Gotcha. Not possible. An array (matrix) is always perfectly dimensioned, and while you can dynamically change one or more of the dimensions, the change applies to all cells.
Alternative: list-columns.
dat <- data.frame(x=1:3, y=11:13)
dat$z <- lapply(3:5, function(i) matrix(seq_len(i^2), nrow = i))
dat
# x y
# 1 1 11
# 2 2 12
# 3 3 13
# z
# 1 1, 2, 3, 4, 5, 6, 7, 8, 9
# 2 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16
# 3 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25
That doesn't look very appealing, but if you want a different presentation, you might consider assigning it as a tibble::tbl_df (available whenever dplyr is loaded as well). (Note that presentation is distinct from storage and accessibility.)
library(tibble)
as_tibble(dat)
# # A tibble: 3 x 3
# x y z
# <int> <int> <list>
# 1 1 11 <int[,3] [3 x 3]>
# 2 2 12 <int[,4] [4 x 4]>
# 3 3 13 <int[,5] [5 x 5]>
Subsetting is consistent:
dat$z[ dat$x == 2 & dat$y == 12 ]
# [[1]]
# [,1] [,2] [,3] [,4]
# [1,] 1 5 9 13
# [2,] 2 6 10 14
# [3,] 3 7 11 15
# [4,] 4 8 12 16
### note that you need an extra [[1]] to get to the real data
m <- dat$z[ dat$x == 2 & dat$y == 12 ][[1]]
m
# [,1] [,2] [,3] [,4]
# [1,] 1 5 9 13
# [2,] 2 6 10 14
# [3,] 3 7 11 15
# [4,] 4 8 12 16
m[3,4]
# [1] 15
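If you would rather keep the i, j indexing of the original loop, a list-matrix is another option. This is only a sketch: the data frame below is made up, and cells, n_row and n_col are illustrative names. A matrix of mode list can hold a vector of any length in each cell, as long as you assign and read with [[i, j]] rather than [i, j].

```r
# Hypothetical data: var1 picks the column, var2 picks the row
df <- data.frame(
  var1 = c(1, 1, 2, 2, 2, 1),
  var2 = c(1, 1, 1, 2, 2, 2),
  var3 = c(10, 20, 30, 40, 50, 60)
)
n_row <- 2
n_col <- 2

# A matrix whose cells are list elements, so each can have its own length
cells <- matrix(vector("list", n_row * n_col), nrow = n_row, ncol = n_col)
for (i in seq_len(n_row)) {
  for (j in seq_len(n_col)) {
    cells[[i, j]] <- df$var3[df$var1 == j & df$var2 == i]
  }
}

cells[[1, 1]]     # the whole vector stored in row 1, column 1
# [1] 10 20
cells[[2, 2]][2]  # second element of the vector in row 2, column 2
# [1] 50
```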
It seems pretty basic... but I'm trying to generate a second array in R that would correspond to the counts of the events of my primary array. For instance, if there are 14 Age[x] that are 42, I would want Age.count[x] to equal 14.
So if Age was [1] 10 14 14 13 14 12 10 I would want my Age.count to be [1] 2 3 3 1 3 1 2. It seems like it should be really simple but I haven't managed yet...
My best shot so far:
for (val in length(Age)) {
Age.count[val] <- length(subset(Age, Age==val2))
}
Unfortunately it's giving me NA values on all but the first and last values. Help?
A simple way is to use ave, i.e.,
> ave(age,age,FUN = length)
[1] 2 3 3 1 3 1 2
DATA
age <- c(10, 14, 14, 13, 14, 12, 10)
You could make it more compact than this, but at least you can see what is happening this way.
age = c(10, 14, 14, 13, 14, 12, 10)
counts = table(age)
i = match(age, names(counts))
counts[i]
# age
# 10 14 14 13 14 12 10
#  2  3  3  1  3  1  2
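For reference, a more compact form of the same lookup might look like the sketch below: index the table by the values converted to character, then drop the names with as.vector. The name age_count is illustrative.

```r
age <- c(10, 14, 14, 13, 14, 12, 10)

# table() counts each distinct value; indexing by name repeats the
# matching count once per element of age
age_count <- as.vector(table(age)[as.character(age)])
age_count
# [1] 2 3 3 1 3 1 2
```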
Let
Age = c(10, 14, 14, 13, 14, 12, 10)
X = data.frame(Age)
Test = as.data.frame(table(Age = X$Age))
Test$Age = as.numeric(as.character(Test$Age))
colnames(Test) = c("Age", "Frequency")
Then
Result = dplyr::inner_join(X, Test)
will work.
I have a dataframe like this:
V1 = paste0("AB", seq(1:48))
V2 = seq(1:48)
test = data.frame(name = V1, value = V2)
I want to calculate the means of the value-column and specific rows.
The pattern of the rows is pretty complicated:
Rows of MeanA1: 1, 5, 9
Rows of MeanA2: 2, 6, 10
Rows of MeanA3: 3, 7, 11
Rows of MeanA4: 4, 8, 12
Rows of MeanB1: 13, 17, 21
Rows of MeanB2: 14, 18, 22
Rows of MeanB3: 15, 19, 23
Rows of MeanB4: 16, 20, 24
Rows of MeanC1: 25, 29, 33
Rows of MeanC2: 26, 30, 34
Rows of MeanC3: 27, 31, 35
Rows of MeanC4: 28, 32, 36
Rows of MeanD1: 37, 41, 45
Rows of MeanD2: 38, 42, 46
Rows of MeanD3: 39, 43, 47
Rows of MeanD4: 40, 44, 48
As you can see, the groups start at 4 different points (1, 13, 25, 37); within a group the rows are always 4 apart, and each of the following 4 means starts one row further down.
I would like to have an output of all these means in one list.
Any ideas? NOTE: In this example the mean is of course always the middle number, but my real df is different.
I'm not quite sure about the output format you require, but the following code calculates what you want.
calc_mean1 <- function(x) mean(test$value[seq(x, by = 4, length.out = 3)])
calc_mean2 <- function(x){sapply(x:(x+3), calc_mean1)}
output <- lapply(seq(1, 37, 12), calc_mean2)
names(output) <- paste0('Mean', LETTERS[seq_along(output)]) # remove this line if more than 26 groups.
output
## $MeanA
## [1] 5 6 7 8
## $MeanB
## [1] 17 18 19 20
## $MeanC
## [1] 29 30 31 32
## $MeanD
## [1] 41 42 43 44
An idea via base R is to create a grouping variable for every 4 rows, split the data every 12 rows (nrow(test) / 4) and aggregate to find the mean, i.e.
test$new = rep(1:4, nrow(test)%/%4)
lapply(split(test, rep(1:4, each = nrow(test) %/% 4)), function(i)
aggregate(value ~ new, i, mean))
# $`1`
# new value
# 1 1 5
# 2 2 6
# 3 3 7
# 4 4 8
# $`2`
# new value
# 1 1 17
# 2 2 18
# 3 3 19
# 4 4 20
# $`3`
# new value
# 1 1 29
# 2 2 30
# 3 3 31
# 4 4 32
# $`4`
# new value
# 1 1 41
# 2 2 42
# 3 3 43
# 4 4 44
And yet another way.
fun <- function(DF, col, step = 4){
run <- nrow(DF)/step^2
res <- lapply(seq_len(step), function(inc){
inx <- seq_len(run*step) + (inc - 1)*run*step
dftmp <- DF[inx, ]
tapply(dftmp[[col]], rep(seq_len(step), run), mean, na.rm = TRUE)
})
names(res) <- sprintf("Mean%s", LETTERS[seq_len(step)])
res
}
fun(test, 2, 4)
#$MeanA
#1 2 3 4
#5 6 7 8
#
#$MeanB
# 1 2 3 4
#17 18 19 20
#
#$MeanC
# 1 2 3 4
#29 30 31 32
#
#$MeanD
# 1 2 3 4
#41 42 43 44
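Because the row pattern is regular, the same means can also be read off a 3-D array. This is a sketch that assumes the layout from the question (blocks of 12 rows, each holding 4 interleaved groups of 3 rows); arr and means are illustrative names.

```r
test <- data.frame(name = paste0("AB", 1:48), value = 1:48)

# Arrays fill column-major, so:
#   dim 1 = position within a run of 4 rows (the 1-4 in MeanA1..MeanA4)
#   dim 2 = which of the 3 runs inside a block
#   dim 3 = block of 12 rows (A-D)
arr <- array(test$value, dim = c(4, 3, 4),
             dimnames = list(as.character(1:4), NULL, LETTERS[1:4]))

# Average over dim 2; means["1", "A"] is MeanA1, means["2", "B"] is MeanB2, ...
means <- apply(arr, c(1, 3), mean)
means
```

as.vector(means) gives the 16 values in the order A1, A2, A3, A4, B1, and so on, if a flat vector is preferred.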
Since you said you wanted a long list of the means, I assumed it could also be a vector where you just have all these values. You would get that like this:
V1 = paste0("AB", seq(1:48))
V2 = seq(1:48)
test = data.frame(name = V1, value = V2)
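# Only the first 4 rows of each block of 12 start a group; looping over
# every row would also average unwanted triples such as rows 5, 9, 13.
starts <- as.vector(outer(1:4, seq(0, nrow(test) - 12, by = 12), "+"))
meanVector <- NULL
for (i in starts) {
x <- c(test$value[i], test$value[i+4], test$value[i+8])
m <- mean(x)
meanVector <- c(meanVector, m)
}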
Let's say I have data in wide format (samples in rows and species in columns).
species <- data.frame(
Sample = 1:10,
Lobvar = c(21, 15, 12, 11, 32, 42, 54, 10, 1, 2),
Limtru = c(2, 5, 1, 0, 2, 22, 3, 0, 1, 2),
Pocele = c(3, 52, 11, 30, 22, 22, 23, 10, 21, 32),
Genmes = c(1, 0, 22, 1, 2,32, 2, 0, 1, 2)
)
And I want to automatically change the species names, based on a reference of functional groups that I have for all of the species (so it works even if I have more references than actual species in the dataset), for example:
reference <- data.frame(
Species_name = c("Lobvar", "Ampmis", "Pocele", "Genmes", "Limtru", "Secgio", "Nasval", "Letgos", "Salnes", "Verbes"),
Functional_group = c("Crustose", "Geniculate", "Erect", "CCA", "CCA", "CCA", "Geniculate", "Turf","Turf", "Crustose"),
stringsAsFactors = FALSE
)
EDIT
Thanks to @Dan Y's suggestions, I can now change the species names to their functional group names:
names(species)[2:ncol(species)] <- reference$Functional_group[match(names(species), reference$Species_name)][-1]
However, in my actual data.frame I have more species, and this creates many columns that share the same functional group name. I would now like to sum the columns that have the same name. I updated the example to give a result in which more than one column has the same functional group name.
So I get this:
Sample Crustose CCA Erect CCA Crustose
1 21 2 3 1 2
2 15 5 52 0 3
3 12 1 11 22 4
4 11 0 30 1 1
5 32 2 22 2 0
6 42 22 22 32 0
and the final result I am looking for is this:
Sample Crustose CCA Erect
1 23 3 3
2 18 5 52
3 16 23 11
4 12 1 30
5 32 4 22
6 42 54 22
How do you advise on approaching this? Thanks for your help and the amazing suggestions I already received.
Re Q1) We can use match to do the name lookup:
names(species)[2:ncol(species)] <- reference$Functional_group[match(names(species), reference$Species_name)][-1]
Re Q2) Then we can mapply the rowSums function after some regular expression work on the colnames:
namevec <- gsub("\\.[[:digit:]]+$", "", names(species))
mapply(function(x) rowSums(species[which(namevec == x)]), unique(namevec))
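To see the column-summing step on its own, here is a small sketch. The demo data frame is made up from the first three rows of the table in the question, and check.names = FALSE is used so that the duplicated column names survive; demo, groups, summed and result are illustrative names.

```r
# Hypothetical data with repeated functional-group names
demo <- data.frame(
  Sample   = 1:3,
  Crustose = c(21, 15, 12),
  CCA      = c(2, 5, 1),
  Erect    = c(3, 52, 11),
  CCA      = c(1, 0, 22),
  Crustose = c(2, 3, 4),
  check.names = FALSE
)

# For each distinct group name, add up every column carrying that name
groups <- names(demo)[-1]
summed <- sapply(unique(groups), function(g) rowSums(demo[names(demo) == g]))

result <- data.frame(Sample = demo$Sample, summed)
result
```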
Given a vector of numbers, I'd like to map each to the smallest in a separate vector that the number does not exceed. For example:
# Given these
v1 <- 1:10
v2 <- c(2, 5, 11)
# I'd like to return
result <- c(2, 2, 5, 5, 5, 11, 11, 11, 11, 11)
Try
cut(v1, c(0, v2), labels = v2)
[1] 2 2 5 5 5 11 11 11 11 11
Levels: 2 5 11
which can be converted to a numeric vector using as.numeric(as.character(...)).
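Spelled out, that conversion could look like the sketch below. The as.character step matters because calling as.numeric on the factor directly would return the level codes 1, 2, 3 rather than the labels.

```r
v1 <- 1:10
v2 <- c(2, 5, 11)

# Breaks 0, 2, 5, 11 give the intervals (0,2], (2,5], (5,11]
binned <- cut(v1, c(0, v2), labels = v2)

# Factor labels -> character -> numeric
result <- as.numeric(as.character(binned))
result
# [1]  2  2  5  5  5 11 11 11 11 11
```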
Another way (thanks for the edit, @Ananda):
v2[findInterval(v1, v2 + 1) + 1]
# [1] 2 2 5 5 5 11 11 11 11 11
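As a cross-check, the definition from the question can also be written out directly. This sketch is slower than cut or findInterval on long vectors, and it assumes that every element of v1 is no larger than max(v2); otherwise min would be called on an empty vector.

```r
v1 <- 1:10
v2 <- c(2, 5, 11)

# For each number, take the smallest element of v2 that it does not exceed
result <- sapply(v1, function(x) min(v2[v2 >= x]))
result
# [1]  2  2  5  5  5 11 11 11 11 11
```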