Create vectors from a contingency table [duplicate] - r

This question already has answers here:
Split a large dataframe into a list of data frames based on common value in column
(3 answers)
Closed 2 years ago.
I have a contingency table of meteorological stations and frequency of occurrence. I used logical indexing to create separate vectors like below (b1:b5) from the table. However there has to be a simpler way, perhaps from the apply family. Can someone provide such an example, thanks.
mf1<-c("USW00023047","USW00013966","USC00416740","USC00413828", "USC00414982", "USC00414982", "USW00013966", "USW00013966", "USW00003927",
"USW00003927", "USC00412019", "USC00411596", "USW00012960", "USW00012960", "USW00012960", "USW00012960", "USW00012960", "USC00417327",
"USC00417327", "USC00418433", "USC00417743", "USC00419499", "USC00419847", "USR0000TCLM", "USR0000TCOL", "USW00012921", "USW00012921",
"USW00012970", "USW00012921", "USW00012921", "USW00012924")
table(mf1)
dfcont<-as.data.frame(table(mf1))
a<-dfcont$mf1
b1<-a[dfcont$Freq == 1]
b2<-a[dfcont$Freq == 2]
b3<-a[dfcont$Freq == 3]
b4<-a[dfcont$Freq == 4]
b5<-a[dfcont$Freq == 5]

You can use split:
temp <- split(as.character(dfcont$mf1), dfcont$Freq)
This gives you a list of vectors in temp. It is usually better to keep the data in a list, but if you want them as separate vectors, assign names to the list elements and use list2env:
names(temp) <- paste0('b', seq_along(temp))
list2env(temp, .GlobalEnv)
You would now have b1, b2 etc in your global environment.

I couldn't find anything simpler than
tbl <- table(mf1)
sp <- split(names(tbl), tbl)
If the names need to be b*, assign by pasting the "b" as a prefix to the current names.
names(sp) <- paste0('b', names(sp))
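As a quick check of the split approach above, here is a minimal sketch on a shortened input. The `stations` vector is made up for illustration and is not the full mf1 from the question.

```r
# Hypothetical shortened input; not the full mf1 vector from the question
stations <- c("A", "B", "B", "C", "C", "C", "D")

tbl <- table(stations)          # counts: A = 1, B = 2, C = 3, D = 1
sp  <- split(names(tbl), tbl)   # group station names by their frequency

# Prefix the list names ("1", "2", "3") with "b"
names(sp) <- paste0("b", names(sp))

sp$b1   # stations that occur once
sp$b3   # stations that occur three times
```

Because the list names come from the frequencies themselves, a frequency that never occurs simply has no element in the list.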

Related

R: Function that uses variable dataframe names from a vector [duplicate]

This question already has answers here:
How to convert certain columns only to numeric?
(4 answers)
Make a list from ls(pattern="") [R]
(1 answer)
Closed 2 years ago.
I have a number of x dataframes (depending on previous operation). The names of the dataframes are stored in a different vector:
> list.industries
[1] "misc" "machinery" "electronics" "drugs" "chemicals"
Now, I want to set every column after the 4th as numeric. As the number of created dataframes and, therefore, the names change, I want to ask, if there is any way to do it automatically.
I tried:
for (i in 1:length(list.industries)) {
paste0(list.industries) <- lapply(paste0(list.industries)[,4:ncol(paste0(list.industries))] , as.numeric)
}
Where the function places automatically the name of the dataframe from the vector list.industries to set it as numeric.
Is there any way, how I can place the name of a dataframe as a variable from a vector?
Thanks!
You can use mget to get the data frames as a named list, convert every column from the 4th onward to numeric, and return each data frame.
new_data <- lapply(mget(list.industries), function(x) {
x[, 4:ncol(x)] <- lapply(x[, 4:ncol(x)], as.numeric)
x
})
new_data is a list of dataframes. If you want the changes to be reflected in the original dataframes, use list2env.
list2env(new_data, .GlobalEnv)
You could use this fragment (untested):
one_df <- function(x) {
dat <- get(x)
for (i in seq(4, ncol(dat))) dat[,i] <- as.numeric(dat[,i])
return(dat)
}
ans <- lapply(list.industries, one_df)
So in short: you are looking for get.
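To make the mget and list2env round trip concrete, here is a minimal sketch. The two data frames, their columns, and the helper name `to_numeric_from_4th` are hypothetical stand-ins, not the asker's real data.

```r
# Hypothetical data frames standing in for the real industry tables
misc <- data.frame(id = 1:2, a = "x", b = "y",
                   v1 = c("1", "2"), v2 = c("3.5", "4"),
                   stringsAsFactors = FALSE)
machinery <- data.frame(id = 3:4, a = "x", b = "y",
                        v1 = c("5", "6"), v2 = c("7", "8.25"),
                        stringsAsFactors = FALSE)
list.industries <- c("misc", "machinery")

# Convert columns 4..ncol(x) to numeric and return the modified data frame
to_numeric_from_4th <- function(x) {
  cols <- 4:ncol(x)
  x[cols] <- lapply(x[cols], as.numeric)
  x
}

# mget looks the names up and returns a named list of data frames
new_data <- lapply(mget(list.industries), to_numeric_from_4th)

# Write the converted copies back over the originals in the current environment
list2env(new_data, envir = environment())

str(misc)
```

Keeping the result in new_data and skipping the list2env step avoids overwriting objects by name, which tends to be easier to debug.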

How to loop through a vector of data frame names to print first columns of the df's? [duplicate]

This question already has answers here:
How to extract certain columns from a list of data frames
(3 answers)
Closed 2 years ago.
x is a vector of data frame names. I am trying to print the first column of each data frame named in the vector. So far I have tried the code below, but it doesn't work.
x = (c('Ethereum,another Df..., another DF...,'))
for (i in x){
print(i[,1])
}
sapply(toString(Ethereum), function(i) print(i[1]))
You can try this
x <- c('Ethereum','anotherDf',...)
for (i in x){
print(get(i)[,1])
}
You can use mget to get the data frames in a list, then use lapply to extract the first column of each dataframe in the list.
data <- lapply(mget(x), `[`, 1)
#Use `[[` to get it as vector.
#data <- lapply(mget(x), `[[`, 1)
Similar solution using purrr::map :
data <- purrr::map(mget(x), `[`, 1)
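The difference between `[` and `[[` in the mget approach can be seen in a small sketch. Both data frames here are invented for illustration (the question only names Ethereum; Bitcoin and the column names are assumptions).

```r
# Hypothetical data frames; x stores only their names
Ethereum <- data.frame(price = c(10, 12), volume = c(100, 90))
Bitcoin  <- data.frame(price = c(50, 55), volume = c(20, 25))
x <- c("Ethereum", "Bitcoin")

first_cols_df  <- lapply(mget(x), `[`, 1)    # list of one-column data frames
first_cols_vec <- lapply(mget(x), `[[`, 1)   # list of plain vectors

first_cols_df$Bitcoin     # still a data frame with a single column
first_cols_vec$Ethereum   # a numeric vector
```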

Creating Subset data frames in R within For loop [duplicate]

This question already has answers here:
Split a large dataframe into a list of data frames based on common value in column
(3 answers)
Closed 4 years ago.
What I am trying to do is filter a larger data frame into 78 unique data frames based on the value of the first column in the larger data frame. The only way I can think of doing it properly is by applying the filter() function inside a for() loop:
for (i in 1:nrow(plantline))
{x1 = filter(rawdta.df, Plant_Line == plantline$Plant_Line[i])}
The issue is I don't know how to create a new data frame, say x2, x3, x4... every time the loop runs.
Can someone tell me if that is possible or if I should be trying to do this some other way?
There must be many duplicates for this question
split(rawdta.df, rawdta.df$Plant_Line)
will create a list of data.frames.
However, depending on your use case, splitting the large data.frame into pieces might not be necessary as grouping can be used.
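As a sketch of what grouping instead of splitting can look like in base R, the example below computes a per-group summary with aggregate. The toy contents of rawdta.df and the Height column are assumptions made up for illustration.

```r
# Hypothetical stand-in for rawdta.df; the Height column is made up
rawdta.df <- data.frame(
  Plant_Line = c("A", "A", "B", "B", "C"),
  Height     = c(10, 14, 7, 9, 20)
)

# Per-group summary without creating one data frame per plant line
group_means <- aggregate(Height ~ Plant_Line, data = rawdta.df, FUN = mean)

# The same grouping as a list of data frames, for when the pieces are needed
pieces <- split(rawdta.df, rawdta.df$Plant_Line)

group_means
names(pieces)
```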
You could use split -
# splits the larger data frame into a list of 78 data frames based on
# the value of its first column
lst = split(large_data_frame, large_data_frame$first_column)
# copies the dataframes out of the list into the global environment,
# although this is not recommended since it is difficult to work with 78
# separate dataframes
list2env(lst, envir = .GlobalEnv)
The names of the dataframes will be the same as the values in the first column.
It would be easier if we could see the dataframes....
I propose something nevertheless. You can create a list of dataframes:
dataframes <- vector("list", nrow(plantline))
for (i in 1:nrow(plantline)){
dataframes[[i]] = filter(rawdta.df, Plant_Line == plantline$Plant_Line[i])
}
You can use assign :
for (i in 1:nrow(plantline))
{assign(paste0("x",i), filter(rawdta.df, Plant_Line == plantline$Plant_Line[i]))}
Alternatively, you can save your results in a list:
X <- list()
for (i in 1:nrow(plantline))
{X[[i]] = filter(rawdta.df, Plant_Line == plantline$Plant_Line[i])}
This would be easier with sample data. My favorite is by:
d <- data.frame(plantline = rep(LETTERS[1:3], 4),
x = 1:12,
stringsAsFactors = FALSE)
l <- by(d, d$plantline, data.frame)
print(l$A)
print(l$B)
Solution using plyr:
ma <- cbind(x = 1:10, y = (-4:5)^2, z = 1:2)
ma <- as.data.frame(ma)
library(plyr)
dlply(ma, "z") # you split ma by the column named z

what does it mean: samples[,dim(samples)[[2]],2] [duplicate]

This question already has answers here:
The difference between bracket [ ] and double bracket [[ ]] for accessing the elements of a list or dataframe
(11 answers)
Closed 5 years ago.
I'm a newbie in R (and in programming). There are some examples with one or two "[", but I am not sure what they mean.
dim(data)[[-1]] # means the column number of a data frame
dim(data)[-1] # what does it mean?
samples[,dim(samples)[[2]],2] # what does this mean?
Thanks a lot for your help!
If data is stored in an object of class data.frame, matrix or array, dim() returns a numeric vector containing the size of each dimension. So the subsetting operator is simply applied to that vector. The operations you described can be used more generally. Here is an explanation of what exactly they do.
Let vec <- dim(data)
vec[-1] - drops the first element similar to vec[2:length(vec)]
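vec[[-1]] - gives the same value as above in your example, but only because dim() of a data frame has two elements: [[ must select exactly one element, so it fails on longer vectors. It is more commonly used with data.frames and lists. Here is an example that demonstrates the difference: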
dt <- data.frame(a = rnorm(20), b = rnorm(20))
dt[-1] # returns data.frame with only b column
dt[[-1]] # returns numeric vector containing values of b column
samples[, dim(samples)[[2]], 2] - this syntax is more often used for selecting from an array (a matrix with more than two dimensions) and will return a vector that contains all rows of the last column in the second slice of the third dimension. You can play with the following to see for yourself:
array <- array(data = rnorm(8), dim = c(2, 2, 2))
array[, dim(array)[[2]], 2]
Note: Please provide example data so we don't have to guess what the objects are or replicate them.
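The behaviours described above can be checked with a small deterministic sketch. The values in `vec`, `long_vec` and `arr` are made up for illustration; `vec` mimics what dim() would return for a 20 x 2 data frame.

```r
vec <- c(20L, 2L)   # what dim() returns for a 20 x 2 data frame
vec[-1]             # drops the first element
vec[[-1]]           # same value here, because exactly one element remains

# [[ must select exactly one element, so a negative index fails
# on a vector with more than two elements
long_vec <- c(2L, 3L, 4L)
res <- try(long_vec[[-1]], silent = TRUE)
inherits(res, "try-error")

# 3-d array: all rows of the last column in the second slice
arr <- array(1:8, dim = c(2, 2, 2))
arr[, dim(arr)[[2]], 2]
```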

Alternative to FOR Loop for below [duplicate]

This question already has answers here:
Group Data in R for consecutive rows
(3 answers)
Closed 6 years ago.
I have written a for loop that takes a group of 5 rows from a dataframe and passes it to a function, the function then returns just one row after doing some operations on those 5 rows. Below is the code:
for (i in 1:nrow(features_data1)){
if (i - start == 4){
group = features_data1[start:i,]
group <- as.data.frame(group)
start <- i+1
sub_data = feature_calculation(group)
final_data = rbind(final_data,sub_data)
}
}
Can anyone please suggest an alternative to this, as the for loop is taking a lot of time? The function feature_calculation is huge.
Try this for a base R approach:
# convert features to data frame in advance so we only have to do this once
features_df <- as.data.frame(features_data1)
# assign each observation (row) to a group of 5 rows and split the data frame into a list of data frames
group_assignments <- as.factor(rep(1:ceiling(nrow(features_df) / 5), each = 5, length.out = nrow(features_df)))
groups <- split(features_df, group_assignments)
# apply your function to each group individually (i.e. to each element in the list)
sub_data <- lapply(X = groups, FUN = feature_calculation)
# bind your list of data frames into a single data frame
final_data <- do.call(rbind, sub_data)
You might be able to use the purrr and dplyr packages for a speed-up. The latter has a function bind_rows that is much quicker than do.call(rbind, list_of_data_frames) if this is likely to be very large.
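A self-contained sketch of the base R approach follows. The toy `features_df` and the trivial `feature_calculation` are hypothetical stand-ins, since the real function is not shown in the question.

```r
# Hypothetical stand-ins: a toy data frame and a trivial feature_calculation
features_df <- data.frame(value = 1:12)
feature_calculation <- function(group) {
  data.frame(n_rows = nrow(group), mean_value = mean(group$value))
}

# Assign consecutive rows to groups of 5 (the last group may be shorter)
n <- nrow(features_df)
group_assignments <- rep(seq_len(ceiling(n / 5)), each = 5, length.out = n)
groups <- split(features_df, group_assignments)

# One result row per group, bound into a single data frame
final_data <- do.call(rbind, lapply(groups, feature_calculation))
final_data
```

Note that this version also processes a trailing group with fewer than 5 rows, whereas the loop in the question appears to handle only complete groups of 5. If that matters, the short group can be dropped from groups before calling lapply.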
