I have five dataframes (a-f), each of which has a column 'nq'. I want to find the max, min and average of the nq columns
classes <- c("a","b","c","d","e","f")
for (i in classes){
format(max(i$nq), scientific = TRUE)
format(min(i$nq), scientific = TRUE)
format(mean(i$nq), scientific = TRUE)
}
But the code is not working. Can you please help?
You can't use a character value as a data.frame name. The value "a" is not the same as the data.frame a.
You probably shouldn't have a bunch of data.frames lying around. You probably want to have them all in a list. Then you can lapply over them to get results.
mydata <- list(
a = data.frame(nq=runif(10)),
b = data.frame(nq=runif(10)),
c = data.frame(nq=runif(10)),
d = data.frame(nq=runif(10))
)
then you can do
lapply(mydata, function(x)
format(c(max(x$nq), min(x$nq), mean(x$nq)), scientific = TRUE)
)
to get all the values at once.
The reason it is not working is because 'i' is a character/string. As already mentioned by Mr.Flick you have to make it into a list.
Alternatively, you instead of writing i$nq in your loop you can write get(i)$nq. The get() function will search the workspace for an object by name and it will return the object itself. However, this is not as clean as making it into a list and using lapply.
Related
I am trying to create a large number of data frames in a for loop using the "assign" function in R. I want to use the colnames function to set the column names in the data frame. The code I am trying to emulate is the following:
county_tmax_min_df <- data.frame(array(NA,c(length(days),67)))
colnames(county_tmax_min_df) <- c('Date',sd_counties$NAME)
county_tmax_min_df$Date <- days
The code I have so far in the loop looks like this:
file_vars = c('file1','file2')
days <- seq(as.Date("1979-01-01"), as.Date("1979-01-02"), "days")
f = 1
for (f in 1:2){
assign(paste0('county_',file_vars[f]),data.frame(array(NA,c(length(days),67))))
}
I need to be able to set the column names similar to how I did in the above statement. How do I do this? I think it needs to be something like this, but I am unsure what goes in the text portion. The end result I need is just a bunch of data frames. Any help would be wonderful. Thank you.
expression(parse(text = ))
You can set the names within assign, like that:
file_vars = c('file1', 'file2')
days <- seq.Date(from = as.Date("1979-01-01"), to = as.Date("1979-01-02"), by = "days")
for (f in seq_along(file_vars)) {
assign(x = paste0('county_', file_vars[f]),
value = {
df <- data.frame(array(NA, c(length(days), 67)))
colnames(df) <- paste0("fancy_column_",
sample(LETTERS, size = ncol(df), replace = TRUE))
df
})
}
When in {} you can use colnames(df) or setNames to assign column names in any manner desired. In your first piece of code you are referring to sd_counties object that is not available but the generic idea should work for you.
I want to create a dataframe with 3 columns.
#First column
name_list = c("ABC_D1", "ABC_D2", "ABC_D3",
"ABC_E1", "ABC_E2", "ABC_E3",
"ABC_F1", "ABC_F2", "ABC_F3")
df1 = data.frame(C1 = name_list)
These names in column 1 are a bunch of named results of the cor.test function. The second column should consist of the correlation coefficents I get by writing ABC_D1$estimate, ABC_D2$estimate.
My problem is now that I dont want to add the $estimate manually to every single name of the first column. I tried this:
df1$C2 = paste0(df1$C1, '$estimate')
But this doesnt work, it only gives me this back:
"ABC_D1$estimate", "ABC_D2$estimate", "ABC_D3$estimate",
"ABC_E1$estimate", "ABC_E2$estimate", "ABC_E3$estimate",
"ABC_F1$estimate", "ABC_F2$estimate", "ABC_F3$estimate")
class(df1$C2)
[1] "character
How can I get the numeric result for ABC_D1$estimate in my dataframe? How can I convert these characters into Named num? The 3rd column should constist of the results of $p.value.
As pointed out by #DSGym there are several problems, including the it is not very convenient to have a list of character names, and it would be better to have a list of object instead.
Anyway, I think you can get where you want using:
estimates <- lapply(name_list, function(dat) {
dat_l <- get(dat)
dat_l[["estimate"]]
}
)
cbind(name_list, estimates)
This is not really advisable but given those premises...
Ok I think now i know what you need.
eval(parse(text = paste0("ABC_D1", '$estimate')))
You connect the two strings and use the functions parse and eval the get your results.
This it how to do it for your whole data.frame:
name_list = c("ABC_D1", "ABC_D2", "ABC_D3",
"ABC_E1", "ABC_E2", "ABC_E3",
"ABC_F1", "ABC_F2", "ABC_F3")
df1 = data.frame(C1 = name_list)
df1$C2 <- map_dbl(paste0(df1$C1, '$estimate'), function(x) eval(parse(text = x)))
I am struggling with converting several characters to vectors and making them as a list in R.
The converting rule is as follows:
Assign a number to each character. ex. A=1, B=2, C=3,...
Make a vector when the length of characters is ">=2". ex. AB = c(1,2), ABC = c(1,2,3)
Make lists containing several vectors.
For example, suppose that there is ex object with three components. For each component, I want to make it to list objects list1, list2, and list3.
ex = c("(A,B,C,D)", "(AB,BC,CD)","(AB,C)")
# 3 lists to be returned from ex object
list1 = "list(1,2,3,4)" # from (A,B,C,D)
list2 = "list(c(1,2), c(2,3), c(3,4))" # from (AB,BC,CD)
list3 = "list(c(1,2), c(3))" # from (AB,C)
Please let me know a good R function to solve the example above.
* The minor change is reflected.
lookUpTable = as.numeric(1:4) #map numbers to their respective strings
names(lookUpTable) = LETTERS[1:4]
step1<- #get rid of parentheses and split by ",".
strsplit(gsub("[()]", "", ex), ",")
result<- #split again to make things like "AB" into "A", "B", also convert the strings to numbers acc. to lookUpTable
lapply(step1, function(x){ lapply(strsplit(x, ""), function(u) unname(lookUpTable[u])) })
# assign to the global environment.
invisible(
lapply(seq_along(result), function(x) {assign(paste0("list", x), result[[x]], envir = globalenv()); NULL})
)
# get it as strings:
invisible(
lapply(seq_along(result), function(x) {assign(paste0("list_string", x), capture.output(dput(result[[x]])), envir = globalenv()); NULL})
)
data:
ex = c("(A,B,C,D)", "(AB,BC,CD)","(AB,C)")
tips and tricks:
I make use of regular expressions in gsub (and strsplit). Learn regex!
I made a lookUpTable that maps the individual strings to numbers. Make sure your lookUpTable is set up analogously.
Have a look at apply functions like in that case ?lapply.
lastly I assign the result to the global environment. I dont recommend this step but its what you have requested.
I am a long-time Stata user but am trying to familiarize myself with the syntax and logic of R. I am wondering if you could help me with writing more efficient codes as shown below (The "The Not-so-efficient Codes")
The goal is to (A) read several files (each of which represents the data of a year), (B) create the same variables for each file, and (C) combine the files into a single one for statistical analysis. I have finished revising "part A", but are struggling with the rest, particularly part B. Could you give me some ideas as to how to proceed, e.g. use unlist to unlist data.l first, or lapply to each element of data.l? I appreciate your comments-thanks.
More Efficient Codes: Part A
# Creat an empty list
data.l = list()
# Create a list of file names
fileList=list.files(path="C:/My Data, pattern=".dat")
# Read the ".dat" files into a single list
data.l = sapply(fileList, readLines)
The Not-so-efficient Codes: Part A, B and C
setwd("C:/My Data")
# Part A: Read the data. Each "dat" file is text file and each line in the file has 300 characters.
dx2004 <- readLines("2004.INJVERBT.dat")
dx2005 <- readLines("2005.INJVERBT.dat")
dx2006 <- readLines("2006.INJVERBT.dat")
# Part B-1: Create variables for each year of data
dt2004 <-data.frame(hhx = substr(dx2004,7,12),fmx = substr(dx2004,13,14),
,iphow = substr(dx2004,19,318),stringsAsFactors = FALSE)
dt2005 <-data.frame(hhx = substr(dx2005,7,12),fmx = substr(dx2005,13,14),
,iphow = substr(dx2005,19,318),stringsAsFactors = FALSE)
dt2006 <-data.frame(hhx = substr(dx2006,7,12),fmx = substr(dx2006,13,14),
iphow = substr(dx2006,19,318),stringsAsFactors = FALSE)
# Part B-2: Create the "iid" variable for each year of data
dt2004$iid<-paste0("2004",dt2004$hhx, dt2004$fmx, dt2004$fpx, dt2004$ipepno)
dt2005$iid<-paste0("2005",dt2005$hhx, dt2005$fmx, dt2005$fpx, dt2005$ipepno)
dt2006$iid<-paste0("2006",dt2006$hhx, dt2006$fmx, dt2006$fpx, dt2006$ipepno)
# Part C: Combine the three years of data into a single one
data = rbind(dt2004,dt2005, dt2006)
you are almost there. Its a combination of lapply and do.call/rbind to work with lapply's list output.
Consider this example:
test1 = "Thisistextinputnumber1"
test2 = "Thisistextinputnumber2"
test3 = "Thisistextinputnumber3"
data.l = list(test1, test2, test3)
makeDF <- function(inputText){
DF <- data.frame(hhx = substr(inputText, 7, 12), fmx = substr(inputText, 13, 14), iphow = substr(inputText, 19, 318), stringsAsFactors = FALSE)
DF <- within(DF, iid <- paste(hhx, fmx, iphow))
return(DF)
}
do.call(rbind, (lapply(data.l, makeDF)))
Here test1, test2, test3 represent your dx200X, and data.l should be the list format you get from the efficient version of Part A.
In makeDF you create your desired data.frame. The do.call(rbind, ) is somewhat standard if you work with lapply-return values.
You also might want to consider checking out the data.table-package which features the function rbindlist, replacing any do.call-rbind construction (and is much faster), next to other great utility for large data sets.
I have 9880 records in a data frame, I am trying to split it into 9 groups of 1000 each and the last group will have 880 records and also name them accordingly. I used for-loop for 1-9 groups but manually for the last 880 records, but i am sure there are better ways to achieve this,
library(sqldf)
for (i in 0:8)
{
assign(paste("test",i,sep="_"),as.data.frame(final_9880[((1000*i)+1):(1000*(i+1)), (1:53)]))
}
test_9<- num_final_9880[9001:9880,1:53]
also am unable to append all the parts in one for-loop!
#append all parts
all_9880<-rbind(test_0,test_1,test_2,test_3,test_4,test_5,test_6,test_7,test_8,test_9)
Any help is appreciated, thanks!
A small variation on this solution
ls <- split(final_9880, rep(0:9, each = 1000, length.out = 9880)) # edited to Roman's suggestion
for(i in 1:10) assign(paste("test",i,sep="_"), ls[[i]])
Your command for binding should work.
Edit
If you have many dataframes you can use a parse-eval combo. I use the package gsubfn for readability.
library(gsubfn)
nms <- paste("test", 1:10, sep="_", collapse=",")
eval(fn$parse(text='do.call(rbind, list($nms))'))
How does this work? First I create a string containing the comma-separated list of the dataframes
> paste("test", 1:10, sep="_", collapse=",")
[1] "test_1,test_2,test_3,test_4,test_5,test_6,test_7,test_8,test_9,test_10"
Then I use this string to construct the list
list(test_1,test_2,test_3,test_4,test_5,test_6,test_7,test_8,test_9,test_10)
using parse and eval with string interpolation.
eval(fn$parse(text='list($nms)'))
String interpolation is implemented via the fn$ prefix of parse, its effect is to intercept and substitute $nms with the string contained in the variable nms. Parsing and evaluating the string "list($mns)" creates the list needed. In the solution the rbind is included in the parse-eval combo.
EDIT 2
You can collect all variables with a certain pattern, put them in a list and bind them by rows.
do.call("rbind", sapply(ls(pattern = "test_"), get, simplify = FALSE))
ls finds all variables with a pattern "test_"
sapply retrieves all those variables and stores them in a list
do.call flattens the list row-wise.
No for loop required -- use split
data <- data.frame(a = 1:9880, b = sample(letters, 9880, replace = TRUE))
splitter <- (data$a-1) %/% 1000
.list <- split(data, splitter)
lapply(0:9, function(i){
assign(paste('test',i,sep='_'), .list[[(i+1)]], envir = .GlobalEnv)
return(invisible())
})
all_9880<-rbind(test_0,test_1,test_2,test_3,test_4,test_5,test_6,test_7,test_8,test_9)
identical(all_9880,data)
## [1] TRUE