My question is similar to a few that have been asked before, but I hope different enough to warrant a separate question.
See here, and here. I'll pull some of the same example data as these questions. For context: I am looking to see how my observed catch rate (of sea creatures) changed over multiple days of sampling the same area.
I want to calculate the difference between the first sample day at a given site (the first row for each letter in the data below) and each subsequent sample day (the following rows with the same letter).
#Example data
df <- data.frame(
id = c("A", "A", "A", "A", "B", "B", "B"),
num = c(1, 8, 6, 3, 7, 7 , 9),
What_I_Want = c(NA, 7, 5, 2, NA, 0, 2))
The first solution I found calculates a lagged difference between consecutive rows. I also wanted this calculation, so it was helpful to find:
# Calculate lagged differences
library(dplyr)
df_new <- df %>%
  # group by site
  group_by(id) %>%
  # difference from the previous row within each group
  mutate(diff = num - lag(num))
Here the difference is between A.1 and A.2, then between A.2 and A.3, and so on.
What I would like to do now is calculate the difference with respect to the first value of each group. So for letter A, I would like to calculate 8 - 1, then 6 - 1, and finally 3 - 1 (the values in What_I_Want). Any suggestions?
One clunky solution (linked above) is to create two (or more) columns, one for each lag distance, and somehow merge the results I want:
df_clunky = df %>%
group_by(id) %>%
mutate(
deltaLag1 = num - lag(num, 1),
deltaLag2 = num - lag(num, 2))
Here is a base R method using replace and ave:
ave(df$num, df$id, FUN = function(x) replace(x - x[1], 1, NA))
[1] NA 7 5 2 NA 0 2
ave applies the function separately to the num values of each id. Within each group, x - x[1] subtracts the group's first value from every element, and replace then puts NA in the first position.
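For comparison, the same calculation in the dplyr style used in the question, as a minimal sketch. It assumes the rows within each id are already in sampling order, because first() simply takes the first row of each group; diff_first is an illustrative column name.

```r
library(dplyr)

# Example data from the question (without the What_I_Want column)
df <- data.frame(
  id = c("A", "A", "A", "A", "B", "B", "B"),
  num = c(1, 8, 6, 3, 7, 7, 9))

df_first <- df %>%
  group_by(id) %>%
  # Subtract the group's first value from every row, then set the
  # first row of each group to NA, mirroring the ave()/replace() answer
  mutate(diff_first = replace(num - first(num), 1, NA)) %>%
  ungroup()

print(df_first)
```

Dropping the replace() call keeps 0 instead of NA on the first row of each group.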
I have a data frame df and I wish to create a new column b that is the smaller of column a and 10 - a. When a is NA, I want column b to be NA in that row as well. So column b should be c(1, 3, 1, NA). I tried the following code, but every row of b is 1. I would like a solution in the tidyverse.
library(tidyverse)
df <- data.frame(a = c(1, 3, 9, NA))
df2 <- df %>% mutate(b = min(a, 10 - a, na.rm = T))
I guess the issue arises because of how I'm applying the min function, which is complicated by the presence of NA. But I cannot figure out how to solve it.
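The behaviour described comes from min() returning a single value for all of its arguments combined (here 1, because na.rm = T drops the NA), which mutate() then recycles down the whole column. Base R's pmin() is the element-wise counterpart and works inside mutate(). A minimal sketch, loading only dplyr rather than the whole tidyverse:

```r
library(dplyr)

df <- data.frame(a = c(1, 3, 9, NA))

# pmin() takes the minimum row by row, comparing a and 10 - a in each row.
# With the default na.rm = FALSE, a row where a is NA gives NA.
df2 <- df %>% mutate(b = pmin(a, 10 - a))
print(df2)
```

Keeping na.rm = FALSE (the default) is what produces the requested NA in the last row.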
Hope you have a nice day.
Today I was trying to split one big column into two smaller ones in R. However, I haven't found a way to do it.
I have something like this (although the real data are much bigger):
name3 <- c(1, 2, 3, 4, 5, 6)
df1 <- data.frame(name3)
print(df1)
I want to end up with something like this. My intention is simply to take all the values and divide them into two equal groups:
name <- c(1, 2, 3)
name1 <- c(4, 5, 6)
df <- data.frame(name, name1)
print (df)
Thanks in advance!
One way to do it is to first write the vector as a matrix with a specified number of columns, then convert the matrix to a data frame, and finally extract each column as a vector. Because matrix() fills column by column, the first half of the values ends up in V1 and the second half in V2. This is how I did it:
name3 <- c(1, 2, 3, 4, 5, 6)
df <- as.data.frame(matrix(name3, ncol = 2))
name1 <- df$V1
name2 <- df$V2
Trying to stay as close to base R as possible, this would be my method if the order of the sub-vectors doesn't matter:
# simple function to test for even numbers
is.even <- function(x) x %% 2 == 0
# define my vector of values
name3 <- c(1, 2, 3, 4, 5, 6)
# split vector by even or odd index;
# seq_along() gives the positions 1, 2, ..., length(name3) without needing zoo
split(name3, f = is.even(seq_along(name3)))
Result:
$`FALSE`
[1] 1 3 5
$`TRUE`
[1] 2 4 6
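If the order does matter, that is, the first half of the values should go into one column and the second half into the other, a base R sketch is to split on group labels built with rep(). It assumes the length of name3 is even; the column names name and name1 are taken from the desired output in the question.

```r
name3 <- c(1, 2, 3, 4, 5, 6)

# Number of values per group; assumes length(name3) is even
half <- length(name3) %/% 2

# Label the first half 1 and the second half 2, then split on those labels
halves <- split(name3, rep(1:2, each = half))

# Put the two halves side by side, named as in the desired output
df <- data.frame(name = halves[[1]], name1 = halves[[2]])
print(df)
```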
Sorry for what might be a basic / redundant question (with an awful title to boot). I have been struggling with calculating means of columns within data frames in a list. I've tried a variety of approaches mentioned in similar questions but can never get it to work. I'm relatively new to R and am a bit out of my depth.
I have a list of data frames similar to:
df1 <- data.frame(c("Jan", "Jan", "Jan"), c("21:14:33", "21:14:33", "21:14:33"), c(1, 2, 3), c(11, 12, 13))
df2 <- data.frame(c("Feb", "Feb", "Feb"), c("22:14:33", "22:14:33", "22:14:33"), c(2, 3, 4), c(12, 13, 14))
df3 <- data.frame(c("Mar", "Mar", "Mar"), c("23:14:33", "23:14:33", "23:14:33"), c(3, 4, 5), c(13, 14, 15))
mylist <- list(df1, df2, df3)
My goal is to create a vector for each data frame that contains the month, time, mean.column3, mean.column4. For example "Jan, 21:14:33, 2, 12" for the first data frame. (Ultimately I want to combine all these vectors into a new data frame, but I can do this once I have the vectors using rbind).
I have gotten closest using a for loop to calculate the means, but the code below only gives me the means for the last data frame (df3):
for(i in seq_along(mylist)){
output <- sapply(mylist[[i]][3:4], MARGIN = 2, FUN = mean)
}
I have also tried using lapply (as suggested here), abind (as suggested here), and map (as suggested here), which makes me think I'm the problem and must be missing something.
None of these approaches begins to address the need to include month and time in the resulting vector. I've tried to do it for a single data frame using code such as this, but it gives me all the months and times, when I really just need them once.
output1 <- c(mylist[[1]][1,1:2],sapply(mylist[[1]][3:4], MARGIN = 2, FUN = mean))
Help?
I think your plan to calculate the means and then combine them into one data frame is backwards: your data frames all have the same columns, so go ahead and combine them first! Then doing grouped means is easy.
I'll use data.table here because it has nice syntax for grouped means, and its rbindlist will ignore the different (terrible) column names in your example:
library(data.table)
mydt = rbindlist(mylist)
# get better column names
setnames(mydt, c("month", "time", "x1", "x2"))
# means by group
mydt[, .(mx1 = mean(x1), mx2 = mean(x2)), by = .(month, time)]
# month time mx1 mx2
# 1: Jan 21:14:33 2 12
# 2: Feb 22:14:33 3 13
# 3: Mar 23:14:33 4 14
# (if you have more columns and you don't want to type out all the means)
mydt[, lapply(.SD, mean), by = .(month, time)]
Or, in base R:
with(do.call(rbind, lapply(mylist, function(x)
  setNames(x, paste0("X", 1:NCOL(x))))),
  aggregate(list(C3 = X3, C4 = X4), list(C1 = X1, C2 = X2), mean))
# C1 C2 C3 C4
#1 Jan 21:14:33 2 12
#2 Feb 22:14:33 3 13
#3 Mar 23:14:33 4 14
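For reference, the loop in the question only keeps the last result because output is overwritten on every iteration. A base R sketch of the loop approach that stores each result instead: it builds a one-row data frame per element rather than a vector, since an atomic vector holding the month, time, and means would have to store the means as character. The names out, month, time, mean3, mean4, and result are illustrative.

```r
df1 <- data.frame(c("Jan", "Jan", "Jan"), c("21:14:33", "21:14:33", "21:14:33"), c(1, 2, 3), c(11, 12, 13))
df2 <- data.frame(c("Feb", "Feb", "Feb"), c("22:14:33", "22:14:33", "22:14:33"), c(2, 3, 4), c(12, 13, 14))
df3 <- data.frame(c("Mar", "Mar", "Mar"), c("23:14:33", "23:14:33", "23:14:33"), c(3, 4, 5), c(13, 14, 15))
mylist <- list(df1, df2, df3)

# Pre-allocate one slot per data frame so each result is kept, not overwritten
out <- vector("list", length(mylist))

for (i in seq_along(mylist)) {
  d <- mylist[[i]]
  # Month and time from the first row, means of columns 3 and 4
  out[[i]] <- data.frame(month = as.character(d[[1]][1]),
                         time = as.character(d[[2]][1]),
                         mean3 = mean(d[[3]]),
                         mean4 = mean(d[[4]]),
                         stringsAsFactors = FALSE)
}

# Stack the one-row data frames into a single data frame
result <- do.call(rbind, out)
print(result)
```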
I have a vector containing the frequencies of molecules within their respective molecular classes for all molecules measured. I also have a vector that contains the per-class frequency of significant molecules identified by variable selection. How can I merge these two vectors into a data frame and fill in missing frequencies with zeros (in R)?
Here is a reproducible example:
full = rep(letters[1:4], 4:7)
fullTable = table(full)
sub = rep(letters[1:2], c(2, 4))
subTable = table(sub)
I would like the table to look like:
print(data.frame(Letter=letters[1:4], fullFreq=c(4, 5, 6, 7), subFreq=c(2, 4, 0, 0)))
Try this (I assume you meant subTable = table(sub) in your last line):
res <- merge(as.data.frame(fullTable), as.data.frame(subTable), by.x = 1, by.y = 1, all = TRUE)
colnames(res) <- c("Letter", "fullFreq", "subFreq")
res[is.na(res)] <- 0
With the dplyr library:
library(dplyr)
full=rep(letters[1:4], 4:7)
sub=rep(letters[1:2], c(2,4))
df <- data.frame(Letter=unique(c(full, sub)))
df <- df %>%
left_join(as.data.frame(table(full)), by=c("Letter"="full")) %>%
left_join(as.data.frame(table(sub)), by=c("Letter"="sub"))
df[is.na(df)] <- 0
df
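Another base R option avoids both the join and the NA replacement: give both vectors the same factor levels, and table() then reports a count, 0 included, for every level. A sketch, assuming every class in sub also occurs in full (a class missing from the levels would be turned into NA by factor() and silently dropped from the count); lev and res are illustrative names.

```r
full <- rep(letters[1:4], 4:7)
sub <- rep(letters[1:2], c(2, 4))

# All classes observed in the full data; assumes sub has no extra classes
lev <- sort(unique(full))

# With shared levels, table() reports every class, using 0 for absent ones
res <- data.frame(Letter = lev,
                  fullFreq = as.vector(table(factor(full, levels = lev))),
                  subFreq = as.vector(table(factor(sub, levels = lev))),
                  stringsAsFactors = FALSE)
print(res)
```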
I want to multiply two data frames that are of unequal length.
Suppose I have a data frame of observations (in reality this is around 30000 entries long):
Species number
1 3
1 3
3 5
4 40
5 22
and another data frame with conversion ratios for each species present in the first data frame (this one is only about 120 entries long):
species conversion ratio
1 3
2 5
3 4
4 2
5 2
I want to multiply each entry in the number column by the conversion ratio associated with that Species. How might I go about doing this in R?
I've attempted using the match function to no avail, and my attempts at working with arrays have only resulted in errors, as well.
See ?merge. Assuming you have species named consistently (capitals):
df3 <- merge(df1,df2)
df3$number*df3$conversion.ratio
You could merge the two data frames.
## Your example data
df.number <- matrix(c(1, 1, 3, 4, 5, 3, 3, 5, 40, 22), ncol = 2)
colnames(df.number) <- c("species", "number")
df.ratio <- matrix(c(1, 2, 3, 4, 5, 3, 5, 4, 2, 2), ncol = 2)
colnames(df.ratio) <- c("species", "ratio")
## Merge the two matrices
dat <- merge(df.number, df.ratio, by = "species")
## Multiply for your result
result <- with(dat, number * ratio)
Edit
@Frank: In your comment to James, you say that the resulting data frame after the merge is too long. Do you mean that you want to remove duplicated rows? If so:
dat2 <- subset(dat, subset = !duplicated(dat))
result2 <- with(dat2, number * ratio)
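Since the question mentions trying match(), here is a sketch of that route as well. match() returns, for each observation, the row of the ratio table with the same species, so the product keeps the original row order and length of the observations (no merge, hence no reordering and no extra rows). It assumes the column names Species, number, species, and conversion.ratio as they appear in the question and the first answer; a species with no ratio gives NA. idx and converted are illustrative names.

```r
# Example data from the question (column names as given there)
df1 <- data.frame(Species = c(1, 1, 3, 4, 5), number = c(3, 3, 5, 40, 22))
df2 <- data.frame(species = c(1, 2, 3, 4, 5), conversion.ratio = c(3, 5, 4, 2, 2))

# For each observation, the row of df2 whose species matches (NA if none)
idx <- match(df1$Species, df2$species)

# One converted value per row of df1, in df1's original order
df1$converted <- df1$number * df2$conversion.ratio[idx]
print(df1)
```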