Multiplying data frames conditionally in R

I want to multiply two data.frames that are of unequal length.
If I have a data frame of observations (in reality this is around 30000 entries long):
Species number
1 3
1 3
3 5
4 40
5 22
and another data frame with conversion ratios for each species present in the first data frame (this is only about 120 entries in length):
species conversion ratio
1 3
2 5
3 4
4 2
5 2
and I want to multiply each number column entry by the conversion ratio entry associated with that Species, how might I go about doing this in R?
I've attempted using the match function to no avail, and my attempts at working with arrays have only resulted in errors, as well.

See ?merge. Assuming you have species named consistently (capitals):
df3 <- merge(df1,df2)
df3$number*df3$conversion.ratio

You could merge the two data frames.
## Your example data
df.number <- matrix(c(1, 1, 3, 4, 5, 3, 3, 5, 40, 22), ncol = 2)
colnames(df.number) <- c("species", "number")
df.ratio <- matrix(c(1, 2, 3, 4, 5, 3, 5, 4, 2, 2), ncol = 2)
colnames(df.ratio) <- c("species", "ratio")
## Merge the two matrices
dat <- merge(df.number, df.ratio, by = "species")
## Multiply for your result
result <- with(dat, number * ratio)
Edit
@Frank: In your comment to James, you say that the resulting data frame after the merge is too long. Do you mean that you want to remove duplicated rows? If so:
dat2 <- subset(dat, subset = !duplicated(dat))
result2 <- with(dat2, number * ratio)
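Since the question mentions match, here is a minimal sketch of that route. It assumes the columns are called Species, number and conversion.ratio; the names df1, df2, idx and converted are only illustrative. Unlike merge, it keeps the observations in their original order and leaves repeated observations in place.

```r
# Example data from the question (column names are assumptions)
df1 <- data.frame(Species = c(1, 1, 3, 4, 5),
                  number  = c(3, 3, 5, 40, 22))
df2 <- data.frame(Species = c(1, 2, 3, 4, 5),
                  conversion.ratio = c(3, 5, 4, 2, 2))

# For each observation, find the row of df2 holding its species
idx <- match(df1$Species, df2$Species)

# Multiply each number by the ratio looked up for its species
df1$converted <- df1$number * df2$conversion.ratio[idx]
df1
```

A species that is missing from the ratio table gives NA in idx, and therefore NA in converted, which makes gaps in the lookup easy to spot.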

Related

How can I enter data that is already in a frequency distribution format into R?

I am taking a statistics course and I want to enter this data into R. It is already in the form of a frequency distribution.
Data 1 2 3 4 5 6
Frequency 4 7 3 3 2 2
You can use data.frame:
freq_df = data.frame(value = 1:6, frequency = c(4, 7, 3, 3, 2, 2))
On the other hand, if you wish to perform other computations, you might want to reconstruct the original dataset:
values_raw <- rep(freq_df$value, times = freq_df$frequency)
print(mean(values_raw)) # mean or other statistics
Are you looking for this?
freq_df = data.frame(value = 1:6, frequency = c(4, 7, 3, 3, 2, 2))
result <- with(freq_df, sum(value * frequency)/sum(frequency))
result
#[1] 2.904762
This is also called a weighted mean, which R provides as weighted.mean:
result <- with(freq_df, weighted.mean(value, frequency))
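The same idea extends to other statistics. As a rough sketch (the names n, w_mean, w_var and values_raw are illustrative), the sample variance can be computed directly from the frequency table and checked against the expanded data:

```r
freq_df <- data.frame(value = 1:6, frequency = c(4, 7, 3, 3, 2, 2))

# Total number of observations
n <- sum(freq_df$frequency)

# Mean and sample variance computed from the table alone
w_mean <- with(freq_df, weighted.mean(value, frequency))
w_var  <- with(freq_df, sum(frequency * (value - w_mean)^2) / (n - 1))

# Cross-check against the reconstructed raw data
values_raw <- rep(freq_df$value, times = freq_df$frequency)
all.equal(w_mean, mean(values_raw))
all.equal(w_var, var(values_raw))
```

Both all.equal calls should return TRUE, since the two routes describe the same 21 observations.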

Is it possible to split one long column into two shorter ones in R?

Hope you have a nice day.
Today I was trying to make two small columns from one big column in R. However, I haven't found a way to do it.
I have something like this (however, it is much bigger):
name3 <- c(1, 2, 3, 4, 5, 6)
df1 <- data.frame(name3)
print(df1)
I want to do something like this. My intention is just to take the total number of values and divide them into two equal groups.
name <- c(1, 2, 3)
name1 <- c(4, 5, 6)
df <- data.frame(name, name1)
print (df)
Thanks in advance!
One way to do it: first write the values as a matrix in which you specify the number of columns,
then transform the matrix into a data frame.
From the data frame you can convert each column to a vector.
This is how I did it:
name3 <- c(1, 2, 3, 4, 5, 6)
df <- as.data.frame(matrix(name3, ncol = 2))
name1 <- df$V1
name2 <- df$V2
Trying to accomplish this as close to base R as possible, this would be my method if the order of the sub-vectors doesn't matter:
# simple function to test whether a number is even
is.even <- function(x) x %% 2 == 0
# define my vector of values
name3 <- c(1, 2, 3, 4, 5, 6)
# split the vector by even or odd position; seq_along() gives the positions in base R
split(name3, f = is.even(seq_along(name3)))
Result:
$`FALSE`
[1] 1 3 5
$`TRUE`
[1] 2 4 6
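If the order does matter (first half in one group, second half in the other, as in the question), a small base R sketch along the same lines could look like this. The names half and groups are illustrative, and putting the extra element in the first group for odd lengths is just one possible choice.

```r
name3 <- c(1, 2, 3, 4, 5, 6)

# Size of the first group; for odd lengths the first group gets the extra element
half <- ceiling(length(name3) / 2)

# FALSE marks positions in the first half, TRUE positions in the second half
groups <- split(name3, seq_along(name3) > half)

name  <- groups[["FALSE"]]
name1 <- groups[["TRUE"]]
```

With an even length the two pieces can be put straight into data.frame(name, name1); with an odd length they differ in size, so they would have to stay separate vectors.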

Using lapply to subset a dataframe based on two or more factor variables

This is an extension of the StackOverflow question - Subset Data Based On Elements In List - which answered the problem of how to create a list of new dfs, each being constructed by subsetting the original one based on a grouping factor variable.
The challenge I am encountering is that I need to create the dfs using more than one grouping variable.
To generalise the problem, I have created this toy dataset - which has as the response variable the daily amount of rain, and as classifiers the temperature range and the cloudiness of that day.
rain <- c(2, 0, 4, 25, 3, 9, 4, 0, 4, 0, 8, 35)
temp <- as.factor(c("Warm","Cold","Hot","Cold","Warm","Cold","Cold","Warm","Warm","Hot","Cold", "Cold"))
clouds <- as.factor(c("Some","Lots","None","Lots","None","None","Lots","Some","Some","Lots","None", "Some"))
df <- data.frame(rain, temp, clouds)
With the following code, I can produce three new dataframes grouped on the temp variable, all combined into a single list (df_1A):
temp_levels <- unique(as.character(df$temp))
df_1A <- lapply(temp_levels, function(x){subset(df, temp == x)})
And ditto for three new dataframes grouped by the cloudiness:
cloud_levels <- unique(as.character(df$clouds))
df_1B <- lapply(cloud_levels, function(x){subset(df, clouds == x)})
However, I have not been able to come up with a simple, elegant way to produce the 9 dataframes, each of which has a unique combination of temp and cloudiness.
Thanks
You could use split to divide data based on unique levels of temp and clouds.
df_1 <- split(df, list(df$temp, df$clouds))
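One thing to be aware of with the toy data: not every temp/clouds combination actually occurs, so split returns some empty data frames. A short sketch of how the drop argument changes that (df_all and df_nonempty are illustrative names):

```r
rain <- c(2, 0, 4, 25, 3, 9, 4, 0, 4, 0, 8, 35)
temp <- as.factor(c("Warm","Cold","Hot","Cold","Warm","Cold","Cold","Warm","Warm","Hot","Cold","Cold"))
clouds <- as.factor(c("Some","Lots","None","Lots","None","None","Lots","Some","Some","Lots","None","Some"))
df <- data.frame(rain, temp, clouds)

# One element per combination of levels, including combinations with no rows
df_all <- split(df, list(df$temp, df$clouds))

# drop = TRUE keeps only the combinations that occur in the data
df_nonempty <- split(df, list(df$temp, df$clouds), drop = TRUE)

length(df_all)       # all 3 x 3 level combinations
length(df_nonempty)  # only the combinations present
df_nonempty[["Cold.Lots"]]
```

The list elements are named by joining the two levels with a dot, so a particular subset can be pulled out by name.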
Your question implies a preference for lapply but if you don't mind using dplyr there is an elegant solution.
library(dplyr)
df_list <-
  df %>%
  group_by(temp, clouds) %>%
  group_split()
# df_list
df_list[[1]]
#> # A tibble: 3 x 3
#> rain temp clouds
#> <dbl> <fct> <fct>
#> 1 0 Cold Lots
#> 2 25 Cold Lots
#> 3 4 Cold Lots
Your data
rain <- c(2, 0, 4, 25, 3, 9, 4, 0, 4, 0, 8, 35)
temp <- as.factor(c("Warm","Cold","Hot","Cold","Warm","Cold","Cold","Warm","Warm","Hot","Cold", "Cold"))
clouds <- as.factor(c("Some","Lots","None","Lots","None","None","Lots","Some","Some","Lots","None", "Some"))
df <- data.frame(rain, temp, clouds)

Reorder a list of dataframes by the number of rows in each dataframe

Say I have three dataframes in my.list, each with different numbers of rows. I would like to reorder this list so that the first element of the list is the dataframe with the highest number of rows (in the below example, d2).
d1 <- data.frame(y1 = c(1, 2, 3),
y2 = c(4, 5, 6))
d2 <- data.frame(y1 = c(3, 2, 1, 3, 2),
y2 = c(6, 5, 4, 2, 5))
d3 <- data.frame(y1 = c(2, 1),
y2 = c(3, 2))
my.list <- list(d1, d2, d3)
The expected output:
str(my.list[[1]])
## 'data.frame': 5 obs. of 2 variables:
## $ y1: num 3 2 1 3 2
## $ y2: num 6 5 4 2 5
The reason for this: I'm repeatedly plotting data from the first element in several lists of dataframes, and would like to make sure I'm plotting the dataframes with the most data points when I call plot(my.list[[1]]).
Probably a cleaner solution would be to, within the plot call, search for the element/dataframe with the highest number of rows and plot that, but I'm not sure how easy that would be.
One potentially complicating factor is that there will occasionally be a list of dataframes where there is more than one dataframe sharing the highest number of rows. In that case, it wouldn't matter which one is called--they'd both do fine--but I'm not sure whether that creates an issue here.
Assuming your list is called 'lst':
lst <- lst[order(sapply(lst, nrow), decreasing = TRUE)]
Use vapply to get the number of rows in each data frame, then use rev(order(...)) to sort them from most rows to least.
nrow_each <- vapply(my.list, nrow, numeric(1))
my.list[rev(order(nrow_each))]
Or do it in one, difficult-to-read line:
my.list[rev(order(vapply(my.list, nrow, numeric(1))))]
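If the only goal is to plot the largest data frame, reordering the whole list may not be needed. A minimal sketch of the "search within the call" idea from the question, using which.max (n_rows and biggest are illustrative names):

```r
d1 <- data.frame(y1 = c(1, 2, 3), y2 = c(4, 5, 6))
d2 <- data.frame(y1 = c(3, 2, 1, 3, 2), y2 = c(6, 5, 4, 2, 5))
d3 <- data.frame(y1 = c(2, 1), y2 = c(3, 2))
my.list <- list(d1, d2, d3)

# Number of rows in each data frame
n_rows <- vapply(my.list, nrow, integer(1))

# which.max returns the position of the first maximum, so ties are not a problem
biggest <- my.list[[which.max(n_rows)]]
str(biggest)
```

The same expression can be used inline, for example plot(my.list[[which.max(vapply(my.list, nrow, integer(1)))]]).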

Data frame operations: filtering common rows and removing rows of several data frames

dfA <- data.frame(Efficiency=c(7,2,8,9), Value=c(3, 4, 7, 8))
dfB <- data.frame(Efficiency=c(7,2,4,2,8,9), Value=c(3, 4, 4, 1, 7, 8))
dfC <- data.frame(Efficiency=c(7,9), Value=c(3, 8))
I want to get the common rows of dfA and dfB. From the resulting data.frame I want to remove the rows that have the same values as dfC.
dfA+dfB (only common rows) - dfC (overlapping rows)
This should work:
library(dplyr)
inner_join(dfA, dfB) %>% anti_join(dfC)
which gives:
Efficiency Value
1 8 7
2 2 4
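For anyone who prefers to stay in base R, a rough equivalent could be sketched with merge plus a simple row key. Here row_key is a hypothetical helper written for this example, not a base R function, and the row order of the result may differ from the dplyr output above.

```r
dfA <- data.frame(Efficiency = c(7, 2, 8, 9), Value = c(3, 4, 7, 8))
dfB <- data.frame(Efficiency = c(7, 2, 4, 2, 8, 9), Value = c(3, 4, 4, 1, 7, 8))
dfC <- data.frame(Efficiency = c(7, 9), Value = c(3, 8))

# Rows common to dfA and dfB (merge joins on all shared column names by default)
common <- merge(dfA, dfB)

# Hypothetical helper: build one string per row so that rows can be compared
row_key <- function(d) paste(d$Efficiency, d$Value)

# Keep only the common rows that do not also appear in dfC
result <- common[!row_key(common) %in% row_key(dfC), ]
result
```

Note that merge sorts its result by the joining columns, so the remaining rows come out ordered by Efficiency.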
