I have an object that contains list of lab tests and based on the length of the object, I have created a FOR loop that processes scripts. During each loop, R should create a data frame using list in that object. Please see below.
adlb <- data.frame(subjid = c(1:20), aval = c(100:119))
adlb$paramcd <- ifelse(adlb$subjid <= 10, "ALT", "AST")
lab_list <- unique(filter(adlb, !is.na(aval))$paramcd)
for (i in 1:length(lab_list))
{
lab_name <- unlist(lab_list)[[i]]
print(lab_name)`
**???** <- adlb %>%
dplyr::filter(paramcd == lab_name) %>%
drop_na(aval)
}
When I run it, it should first create data frame named ALT followed by AST. What should I replace ??? with?
Only reason why I would prefer it this way is because it helps me to review data in question and debug scripts when needed.
Thank you in advance.
I tried lab_name[[i]] and few other options but it resulted in either error or incorrect data frame name.
I think this might help:
# example dataframes
df1 <- iris
df2 <- mtcars
df3 <- iris
#put them into list
mylist <- list(df1,df2,df3)
#give names to list
names(mylist) <- c("df_name1","df_name2","df_name3")
#put dataframes into global env
list2env(mylist ,.GlobalEnv)
Related
I am trying to create a list of all possible IP addresses in the UK, so I can plot them on a map for a data science project I am working on.
I have tried to simplify the code as much as possible, basically I have a created a blank tibble to use in the for loop, but it does not appear to get any data appended to it.
Guessing I have done the append part of the FOR LOOP incorrectly.
#install libraries
library(iptools)
library(dplyr)
library(tidyverse)
#creating a list of ip ranges for processing
ip_ranges <- list("2.96.0.0/13", "2.120.0.0/13", "2.216.0.0/13")
#converting list into dataframe
ip_ranges_to_process <- do.call(rbind.data.frame, ip_ranges)
colnames(ip_ranges_to_process) <- c("ip.ranges")
#creating a blank tibble for use in for loop
UK_ip_addresses_df <- tibble(
var_name_1 = character()
)
new_df <- tibble(
var_name_1 = character()
)
#for loop to generate all possible ip addresses in ranges
for(i in 1:nrow(ip_ranges_to_process)){
ip_numbers <- range_generate(ip_ranges_to_process$ip.ranges[i])
new_df <- ip_numbers %>%
map_df(as_tibble)
UK_ip_addresses_df[i,] <- new_df
}
Any suggestions would be welcome. Sorry I am pretty new to programming.
Have been researching this question on SO, and found only solutions for merging list elements into one large data frame. However, I am struggling with unpacking only those elements that meet certain condition.
df1 <- iris %>% filter(Sepal.Length > 2.5)
df2 <- mtcars %>% filter(qsec > 16)
not_neccessary <- head(diamonds, 10)
not_neccessary2 <- head(beaver1, 12)
data_lists <- list("#123 DATA" = df1, "CON" = not_neccessary2, "#432 DATA" = df2, "COM" = not_neccessary)
My goal is to convert only those list elements that contain "DATA" in their name. I was thinking about writing a loop function within a lapply:
a <- lapply(data_lists, function(x){if (x == "#+[1-9]+_+DATA"){new_df <- as.data.frame(x)}})
It does not work. Also was trying to make a for loop:
for (i in list){
if (i == "#+[1-9]+_+DATA"){
df <- i
}
}
It does not work neither.
Is there any effective function that will unpack my list into particular dataframes by certain condition? My R skills are very bad, especially in writing functions, although I am not really new to this language. Sorry about that.
Use grepl/grep to find lists that have 'DATA' in their name and subset the list.
result <- data_lists[grepl('DATA', names(data_lists))]
#With `grep`
#result <- data_lists[grep('DATA', names(data_lists))]
Using %like%
result <- data_lists[names(data_lists) %like% 'DATA']
I'm sure there are much better ways of doing this, I'm open to suggestions.
I have these vectors:
vkt1 <- c("df1", "df2", "df3")
vector2 <- paste("sample", wSheatx, sep="_")
The first vector contains a list of the names of dataframes stored in the environment. These are stored as strings, but I'd like to call them as variable names.
The second vector is just the first one adding "sample" at the beggining, equivalent to:
vector2 <- c('sample_df1', 'sample_df2', 'sample_df3')
These strings from vector2 would serve as the names of new data frames to be created.
Alrighty, so now I want to do something like this:
for (i in 1:length(vector){ # meaning for i in 1,2,3
vector2[i] = data.frame(which(eval(parse(text = vkt1[i])) == "Some_String", arr.ind=TRUE))
addStyle(wb, vkt1[i], cols = 1:ncol(eval(parse(text = vkt1[i]))), rows = vector2[[i]][,1]+1, style = duppedStyle, gridExpand = TRUE)
}
It may look complicated, but the idea is to make a data frames named as the strings contained in vector2, being a subset of the data frames from vkt1 when "Some_String" is found.
Then, use that created data frame and add a style to the entire row when said string is present.
vector2[[i]][,1]+1 is intended to deploy as sample_df1[,1]+1 (in the first iteration)
Note that I'm using eval(parse(text = vkt1[i])) to get the variables from the strings of vkt1. So, say, eval(parse(text = vkt1[1])) is equal do df1 (the data frame, not the string)
Like this, the code gives the following error:
In file(filename, "r") :
cannot open file 'noCoinColor_Concat': No such file or directory
Been trying to get it working like so, but I'm beginning to feel this approach might be very wrong.
It is easier to manage code and data when you keep them in a list instead of separate dataframes.
You can use mget to get all the dataframes in vkt1 in a string and let's say you want to search for 'Some_String' in the first column of each dataframe, so you can do :
new_data <- lapply(mget(vkt1), function(df) df[df[[1]] == 'Some_String', ])
I haven't included the addStyle code here because I don't know from which package it is and what it does but you can easily include it in lapply's anonymous function.
Is it not easier to combine your data frames into a list and then use apply or map family functions to adjust your data frames?
data(mtcars)
df1 <- mtcars %>% filter(cyl == 4)
df2 <- mtcars %>% filter(cyl == 6)
df3 <- mtcars %>% filter(cyl == 8)
df_old_names <- c("df1", "df2", "df3")
df_new_names <- c("df_cyl_4", "df_cyl_6", "df_cyl_8")
df_list <- lapply(df_old_names, get)
names(df_list) <- df_new_names
I have the following dataframes that are stored in a list as a result of using the map() function:
How can I extract the six dataframes from the list? I would like to do this because I would like to give each column a different name of the dataframe and then store all data in a csv file? Or do I not have to extract the dfs from the list then?
I am not sure about what you are exactly looking for, so below are something just from guessing your objective:
If you want to extract the data frame as objects in your global environment, then you can do like this:
list2env(setNames(dats1,paste0("df",seq(dats1))),envir = .GlobalEnv)
Assuming you are giving names "col1" and"col2" to two columns of different data frames in your list, maybe this can help you
dats1 <- lapply(dats1, setNames, c("col1","col2"))
You have a few options
Fake data
library(tidyverse)
df <- tibble(a = 1:9,b = letters[1:9])
x <- list(df,df,df,df)
You can bind dfs and create just one
bind_rows(x)
You can execute your logic on all dfs
logic <- . %>%
mutate(c = a*3)
x %>% map(logic)
You can can also name the dfs inside the list
names(x) <- letters[1:4]
bind_rows(x,.id = "id")
I have gotten instructions to do an analysis in R with the vegan package (concerning DCA's).
The instructions on a single dataframe are pretty straightforward, but I would like to apply the analysis on a set of dataframes.
I know this can be done with a for-loop or lapply or sapply, but I have trouble dealing with the fact that each step of the analysis a new extension is added to the name of the dataframe.
An example below
Say I have a dataframe DF, then it goes as follows:
DF.t1 <- decostand(DF, "total")
DF.t2 <- decostand(DF.t1, "max")
DF.t2.dca <- decorana(DF.t2)
DF.t2.dca.DW <- decorana(DF.t2, iweigh=1)
names(DF.t2.dca)
summary(DF.t2.dca)
DF.t2.dca.taxonscores <- scores(DF.t2.dca, display=c("species"), choices=c(1,2))
DF.t2.dca.taxonscores <- DF.t2.dca$cproj[ ,1:2]
DF.t2.dca.samplescores <- scores(DF.t2.dca, display=c("sites"), choices=1)
What I want to achieve is to run several dataframes through this analysis without writing it all out separately.
Let's say I have a set of dataframes called "DF_1", "DF_2" & "DF_3" which I want to do this analysis on.
I probably need to put the dataframes in a list, and get all the steps in a for-loop or one of the apply methods.
But how do I approach the problem with the extensions added (.ra, .t1, .t2, .t2.dca, .t2.dca.DW etc.) to the dataframe names?
Edit: I need to retain the original dataframes after the analysis, in order to do follow-up analysis on them.
Unless you have a very limited amount of data frames, I would not advise to define ca. 8 new objects for each data frame in the global environment as this can become very messy.
One approach you might consider is creating a nested list where the first level is the data frame and the second level are the modified data frames.
# some example data sets
DF1 <- mtcars
DF2 <- mtcars*2
DF3 <- mtcars*3
all_dfs <- list(DF1 = DF1, DF2 = DF2, DF3 =DF3)
some_stuff <- function(df) {
DF.t1 <- decostand(df, "total")
DF.t2 <- decostand(DF.t1, "max")
DF.t2.dca <- decorana(DF.t2)
DF.t2.dca.DW <- decorana(DF.t2, iweigh=1)
names(DF.t2.dca)
summary(DF.t2.dca)
DF.t2.dca.taxonscores <- scores(DF.t2.dca, display=c("species"), choices=c(1,2))
DF.t2.dca.taxonscores <- DF.t2.dca$cproj[ ,1:2]
DF.t2.dca.samplescores <- scores(DF.t2.dca, display=c("sites"), choices=1)
return(list(DF.t1 = DF.t1, DF.t2 = DF.t2,
DF.t2.dca = DF.t2.dca,
DF.t2.dca.DW = DF.t2.dca.DW,
DF.t2.dca.taxonscores = DF.t2.dca.taxonscores,
DF.t2.dca.taxonscores = DF.t2.dca.taxonscores
))
}
nested_list <- lapply(all_dfs, some_stuff)
# To obtain any of the objects for a specific data.frame you could, for example, run
nested_list$DF1$DF.t2.dca.DW