Aggregating categories in R

Hi, I'm new to R. I'm trying to aggregate a list and count the totals, but I'm not sure how to do it.
myList =c("A", "B", "A", "A", "B")
I can create a function that loops through the list, groups each category and counts them. But I'm sure there must be an easier way to group this so that I get each category and the number of items in it. That is, A would be 3 and B would be 2.
I tried using the function below but I believe I don't have the proper syntax.
aggr <-aggregate(myList, count)
Thanks for your help in advance.

I'm guessing that you're just looking for table and not actually aggregate:
myList =c("A", "B", "A", "A", "B")
table(myList)
# myList
# A B
# 3 2
tapply can also be handy here:
tapply(myList, myList, length)
# A B
# 3 2
And, I suppose you could "trick" aggregate in the following way:
aggregate(ind ~ myList, data.frame(myList, ind = 1), length)
# myList ind
# 1 A 3
# 2 B 2
If you're looking to understand why as well, aggregate generally takes a data.frame as its input, and you specify one or more columns to be aggregated grouped by one or more other columns (or vectors in your workspace of the same length as the number of rows).
In the example above, I converted your vector into a data.frame, adding a dummy column where all the values were "1". I then used the formula ind ~ myList (where ~ is sort of like "is grouped by") and set the aggregation function to length (there is no count in base R, though that function can be found in different packages).
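To make the "columns aggregated, grouped by other columns" idea more concrete, here is a minimal sketch. The data frame df and its value column are made up for illustration and are not part of your data. It also shows as.data.frame(table(...)) as one way to get plain counts as a data frame without a dummy column.

```r
# Hypothetical data frame: a grouping column plus a numeric column to aggregate
df <- data.frame(category = c("A", "B", "A", "A", "B"),
                 value    = c(10, 20, 30, 40, 50))

# Sum `value` within each `category` (read the formula as "value grouped by category")
totals <- aggregate(value ~ category, data = df, FUN = sum)
print(totals)

# Plain counts per category as a data frame, without needing a dummy column
counts <- as.data.frame(table(category = df$category), responseName = "n")
print(counts)
```

Swapping sum for mean, max, or length changes what is computed per group while the formula stays the same.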

Related

Replacing elements of a list with those from another list based on name

I'm trying to use the named index to replace some elements of a list.
I have three lists:
Superset
Subset
SubsetNames
My objective is to replace the old elements in Superset with the corresponding ones from Subset where Name(Subset) == Name(Superset).
Example Code (Edited for correctness):
# Setting things up
Superset <- list(1, 2, 3, 4)
names(Superset) <- c("a", "b", "c", "d")
Subset <- list(5, 6)
names(Subset) <- c("b", "c") # or any names from Superset
SubsetNames <- as.list(names(Subset))
I have tried things like this:
lapply(SubsetNames, FUN=function(x) Superset[[x]] <- Subset[[x]])
And:
Superset[SubsetNames] <- Subset
I even tried to construct a for-loop with a counter however this is not a working solution in my scenario.
In reality, Superset is a list of dataframes, each of which has almost 90k datapoints in 117 columns.
Some of those dataframes need some tweaking. I have code which successfully extracts a list of the ones needing tweaking and tweaks them... now I just need to put them back.
Your help much appreciated! Thank you!
We can use the names of the 'Subset' to subset the 'Superset' and assign it to values of 'Subset'
Superset[names(Subset)] <- Subset
Superset
#$a
#[1] 1
#$b
#[1] 5
#$c
#[1] 6
#$d
#[1] 4
If the list is created with list(1:4), the list creation is faulty. It should use as.list:
Superset <- as.list(1:4)
It will return a list of length 4 as opposed to length 1 with list(1:4)
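If you want to update every element named in Subset, you could also use modifyList. It returns the modified list, so assign the result:
Superset <- modifyList(Superset, Subset)
or, if you are only updating the elements listed in SubsetNames (unlist is needed because a list cannot be used directly as an index):
Superset <- modifyList(Superset, Subset[unlist(SubsetNames)])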
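One caveat if the elements are data frames, as in your real data: modifyList works recursively on list-like elements, and a data frame is a list of columns. The sketch below uses small made-up data frames (not your real ones) to show the difference between assignment by name, which swaps in the whole element, and modifyList, which merges columns.

```r
# Hypothetical list of data frames, standing in for the real Superset
Superset <- list(a = data.frame(x = 1:2, y = 3:4),
                 b = data.frame(x = 5:6, y = 7:8))

# The tweaked version of "a" no longer has column y
Subset <- list(a = data.frame(x = 9:10))

# Assignment by name replaces the whole element
by_name <- Superset
by_name[names(Subset)] <- Subset
names(by_name$a)    # "x"

# modifyList() recurses into the data frame, so the old column y survives
by_modify <- modifyList(Superset, Subset)
names(by_modify$a)  # "x" "y"
```

If the tweaked data frames should replace the old ones wholesale, assignment by name is probably the safer choice.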

Recursive indexing only works up to [[1:3]]

I need to refer to individual dataframes within a list of dataframes (one by one) produced from a lapply function, but I'm getting the "recursive indexing failed at level 3" error. I've found similar questions, but none of them explain why this doesn't work.
I used lapply to make a list of dataframes, each with a different filter applied. The output in my reproducible example has 4 dataframes in the output (dfs). Now I want to refer to each dataframe in turn by indexing its position in the list.
If I use the format c(dfs[[1]], dfs[[2]], dfs[[3]], dfs[[4]]) I get the output that I want, and it works for the next function I need to apply, but it seems very inefficient.
When I try to shorten it by using c(dfs[[1:4]]) instead, I get the error Error in data1[[1:4]] : recursive indexing failed at level 3. If I try c(dfs[[1:3]]), it runs, but doesn't give the output I expect (no longer a list of dataframes).
Here's an example:
library(tidyverse) # for glimpse, filter, mutate
data(mtcars)
mtcars2 <- mutate(mtcars, var = rep(c("A", "B", "C", "D"), len = 32)) # need a variable with more than 3 possible outcomes
glimpse(mtcars2)
list <- c("A", "B", "C", "D") # each new dataframe will filter based on these variables
dfs <- lapply(list, function(x) {
mtcars2 %>% filter(var == x) %>% glimpse()
}) # each dataframe now only contains A, B, C, or D
dfs # list of dataframes produced from lapply
dflist1 <- list(dfs[[1]], dfs[[2]], dfs[[3]], dfs[[4]]) # indexing one by one
dflist1 # this is what I want
dflist2 <- list(dfs[[1:4]]) # indexing all together
dflist2 # this produces an error
dflist3 <- list(dfs[[1:3]])
dflist3 # this runs, but the output is just `[[1]] [1] 4`, not a list of dataframes
I want something that looks like the output from dflist1 but that doesn't require me to add and remove list items every time the number of dataframes changes. I can't use the lapply output (dfs) as it is because my next function can't locate the variables within each dataframe as needed.
Any guidance appreciated.
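A sketch of what seems to be going on, assuming standard base R indexing rules: [[ with a vector of length greater than one indexes recursively, so dfs[[1:3]] means "element 1, then element 2 of that, then element 3 of that". Single brackets with a vector return a sub-list instead. The example below rebuilds the list with base R only (transform and logical subsetting stand in for mutate and filter), and the name groups is introduced here to avoid reusing list as a variable name.

```r
# Rebuild the example without extra packages
mtcars2 <- transform(mtcars, var = rep(c("A", "B", "C", "D"), length.out = 32))
groups <- c("A", "B", "C", "D")
dfs <- lapply(groups, function(x) mtcars2[mtcars2$var == x, ])

# [[ with a vector drills down level by level: these two are the same
recursive <- dfs[[1:3]]
stepwise  <- dfs[[1]][[2]][[3]]  # 1st data frame, 2nd column, 3rd value
identical(recursive, stepwise)

# [ with a vector keeps the list structure, whatever the number of data frames
dflist <- dfs[seq_along(dfs)]
identical(dflist, list(dfs[[1]], dfs[[2]], dfs[[3]], dfs[[4]]))
```

Since dfs is already a list of data frames, dfs[seq_along(dfs)] is just dfs itself. The point is that single brackets are the tool for picking several list elements at once.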

Split Data Frame and call subframe rows by their index

This is a very basic R programming question but I haven't found the answer anywhere, would really appreciate your help:
I split my large dataframe into 23 subframes of 4 rows in length as follows:
DataframeSplits <- split(Dataframe,rep(1:23,each=4))
Say I want to call the second subframe I can:
DataframeSplits[2]
But what if I want to call a specific row of that subframe (using index position)?
I was hoping for something like this (say I call 2nd subframe's 2nd row):
DataframeSplits[2][2,]
But that doesn't work with the error message
Error in DataframeSplits[2][2, ] : incorrect number of dimensions
If you want to subset the list returned by split and then subset the result further, you must use double brackets ([[ ]]) to get to the sub-data.frame. You can then subset that with single brackets, as you already tried:
Dataframe <- data.frame(x = rep(c("a", "b", "c", "d"), 23), y = 1)
DataframeSplits <- split(Dataframe,rep(1:23,each=4))
DataframeSplits[[2]][2,]
# x y
# 6 b 1
More info on subsetting can be found in the excellent book by Hadley Wickham.
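To see the difference between the two kinds of brackets directly, here is a small sketch using the same example data. It also shows that the pieces returned by split are named after the grouping values, so they can be looked up by name as well as by position.

```r
Dataframe <- data.frame(x = rep(c("a", "b", "c", "d"), 23), y = 1)
DataframeSplits <- split(Dataframe, rep(1:23, each = 4))

# Single brackets return a list of length one, which has no rows or columns
class(DataframeSplits[2])    # "list"

# Double brackets return the data frame stored inside the list
class(DataframeSplits[[2]])  # "data.frame"

# Same row, selected by position and by the name split() assigned ("2")
second_row <- DataframeSplits[[2]][2, ]
same_row   <- DataframeSplits[["2"]][2, ]
identical(second_row, same_row)
```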

Selecting data based on different values in the same column and unique value in another column in R [duplicate]

Sorry if the title is confusing, wasn't sure how to describe this problem. Ok so I have a dataframe with one column that is sampling site, of which I have many, and one column that is sampling method, of which there are only two. Here's a simplified version:
site <- c("X", "Y", "X","Z")
method <- c("A", "B", "B", "A")
data <- data.frame(site, method)
data
site method
1 X A
2 Y B
3 X B
4 Z A
Now some sites got sampled using both sampling method A and method B, and some got sampled by only method A or method B.
I am trying to select only those sites that got sampled using both methods. For example, the output for this data would look like this:
site method
1 X A
2 X B
I don't have a sample code because I honestly do not know how to do this. Please help!
We can use data.table. Convert the 'data.frame' to a 'data.table' with setDT(data), group by 'site', and if the number of unique 'method' values is greater than 1, return that group's rows via .SD (the Subset of Data.table).
library(data.table)
setDT(data)[, if(uniqueN(method)>1) .SD , by = site]
Or with dplyr:
library(dplyr)
data %>%
group_by(site) %>%
filter(n_distinct(method)>1)
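A possible base R option would be the following. Because 'method' is not numeric, ave returns the TRUE/FALSE results as character strings, so they are converted back to logical before subsetting:
data[as.logical(with(data, ave(as.character(method), site, FUN = function(x) length(unique(x)) > 1))), ]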
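Another base R sketch, in case it is easier to follow step by step: count the distinct methods per site with tapply, keep the names of the sites with more than one, and filter on those. The names n_methods, both and result are just illustrative.

```r
site <- c("X", "Y", "X", "Z")
method <- c("A", "B", "B", "A")
data <- data.frame(site, method)

# Number of distinct methods used at each site
n_methods <- tapply(as.character(data$method), data$site,
                    function(x) length(unique(x)))

# Sites sampled with more than one method
both <- names(n_methods)[n_methods > 1]

# Keep only the rows for those sites
result <- data[data$site %in% both, ]
print(result)
```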

how to put many rows in a dataframe by looping in r

I am looping over a list, for example ["A", "B", "C"].
I run a for loop that computes a value v, which is different on each iteration (v1, v2, v3).
On each iteration I want to use cbind("A", "v1") to build one row, so that the three rows (after three iterations) together form a dataframe.
At the end, I want to get a dataframe in the format
"A" v1
"B" v2
"C" v3
How to get this output? Thanks!
I may have misunderstood the request, but is the following what you are looking for?
input <- c("A", "B", "C")
data.frame(x=input, y=paste0("v", seq_along(input)))
# x y
# 1 A v1
# 2 B v2
# 3 C v3
Note that the approach you mentioned in your question (iteratively building a row and combining with the existing data via rbind) is a bad idea both because it will take a lot more typing (note that I could do the operation in one line) and also because it is inefficient (you can read more about that in the second circle of the R inferno).
The part I was stuck on is that I have to start with an empty dataframe:
output <- data.frame()
for (e in mylist) {
  v <- get_value(e) # get_value() is a placeholder for the function that computes v from e
  one_row <- cbind(e, v) # pair e with its corresponding v
  new_f <- data.frame(one_row)
  output <- rbind(output, new_f) # append the new row
}
At the end, I get the right output.
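If the value really does have to be computed element by element, one common way to avoid growing a data frame inside the loop is to collect the rows in a list and bind them once at the end. This is only a sketch: get_value is a hypothetical stand-in for the real per-element function, and inputs stands in for mylist.

```r
inputs <- c("A", "B", "C")

# Hypothetical stand-in for the real per-element computation
get_value <- function(e) paste0("v", match(e, inputs))

# Build one small data frame per element, then bind them all in a single call
rows <- lapply(inputs, function(e) data.frame(x = e, y = get_value(e)))
output <- do.call(rbind, rows)
print(output)
```

This keeps the "one row per iteration" structure but calls rbind only once, so the existing rows are not copied on every iteration.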

Resources