Appending list to data frame in R - r

I have created an empty data frame in R with two columns:
d<-data.frame(id=c(), numobs=c())
I would like to append this data frame (in a loop) with a list, d2, that has output:
[1] 1 100
I tried using rbind:
d<-rbind(d, d2)
and merge:
d<-merge(d, d2)
And I even tried just making a list of lists and then converting it to a data frame, and then giving that data frame names:
d<-rbind(dlist1, dlist2)
dframe<-data.frame(d)
names(dframe)<-c("id","numobs")
But none of these seem to meet the standards of a routine checker (this is for a class), which gives the error:
Error: all(names(cc) %in% c("id", "nobs")) is not TRUE
Even though it works fine in my workspace.
This is frustrating since the error does not reveal where the error is occurring.
Can anyone help me to either merge 2 data frames or append a data frame with a list?

I think you are confusing the purposes of rbind and merge. rbind appends data.frames, named lists, or both vertically (as new rows), while merge combines data.frames horizontally (adding columns by matching on shared key columns).
You also seem to be confusing vectors and lists. In R, a list can hold a different data type in each element, while a vector must have all elements of the same type. Both lists and vectors are one-dimensional. When you use rbind to add a row to a data frame, you want to append a named list, not a named or unnamed vector.
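A minimal sketch of the difference, using made-up data (the data frames a, b, and extra are assumptions for illustration, not the asker's data):

```r
# Two data frames with the same columns: rbind stacks them as rows
a <- data.frame(id = 1:2, numobs = c(10, 20))
b <- data.frame(id = 3L, numobs = 30)
stacked <- rbind(a, b)
dim(stacked)   # 3 rows, 2 columns

# A data frame sharing the key column "id": merge adds its columns
extra <- data.frame(id = 1:2, site = c("x", "y"))
joined <- merge(a, extra, by = "id")
dim(joined)    # 2 rows, 3 columns
```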
Unnamed Vectors and Lists
The way you define a vector is with the c() function. The way you define an unnamed list is with the list() function, like so:
vec1 = c(1, 10)
# > vec1
# [1] 1 10
list1 = list(1, 10)
# > list1
# [[1]]
# [1] 1
#
# [[2]]
# [1] 10
Notice that both vec1 and list1 have two elements, but list1 stores the two numbers as two separate vectors (element [[1]] is the vector c(1) and [[2]] is the vector c(10)).
Named Vectors and Lists
You can also create named vectors and lists. You do this by:
vec2 = c(id = 1, numobs = 10)
# > vec2
# id numobs
# 1 10
list2 = list(id = 1, numobs = 10)
# > list2
# $id
# [1] 1
#
# $numobs
# [1] 10
Same data structure for both, but the elements are named.
Dataframes as Lists
Notice that list2 has a $ in front of each element name. This might give you a clue that data.frames are actually lists with each column an element of the list, since df$column is often used to extract a column from a dataframe. This makes sense since both lists and data.frames can hold different data types, unlike vectors.
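A short sketch that checks this directly (the data frame df here is a made-up example):

```r
df <- data.frame(id = 1:2, numobs = c(10, 20))

is.list(df)     # a data frame passes the list test
length(df)      # length counts columns, as for a list of columns
names(df)       # element names are the column names

# $ and [[ extract a column the same way they extract a list element
identical(df$numobs, df[["numobs"]])

# each column keeps its own type
sapply(df, class)
```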
The rbind function
When your first element is a dataframe, rbind requires that what you are appending has the same names as the columns of the dataframe. A named vector does not work, because the elements of a vector are not treated as columns of a dataframe, whereas a named list matches elements with columns if the names are the same.
To demonstrate:
d<-data.frame(id=c(), numobs=c())
rbind(d, c(1, 10))
# X1 X10
# 1 1 10
rbind(d, c(id = 1, numobs = 10))
# X1 X10
# 1 1 10
rbind(d, list(1, 10))
# X1 X10
# 1 1 10
rbind(d, list(id = 1, numobs = 10))
# id numobs
# 1 1 10
Knowing the above, you can also rbind two dataframes whose column names match:
df2 = data.frame(id = 1, numobs = 10)
rbind(d, df2)
# id numobs
# 1 1 10
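Since the question is about appending inside a loop, here is a hedged sketch of one common pattern: build each row as a one-row data frame, collect the rows in a list, and bind once at the end. The loop body (i * 10 as the count) is a placeholder, and the column name nobs is taken from the checker's error message rather than from the asker's code.

```r
rows <- vector("list", 3)
for (i in 1:3) {
  # placeholder computation; replace with the real observation count
  rows[[i]] <- data.frame(id = i, nobs = i * 10)
}

# one rbind call over all collected rows
result <- do.call(rbind, rows)
names(result)
nrow(result)
```

Binding once at the end avoids copying the growing data frame on every iteration, and the column names are fixed at the point where each row is created.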

For starters, the routine checker appears to be looking for columns labeled "id" and "nobs". If that doesn't match your file output, you'll get that error.
I'm taking what is probably the same class and had the same error; correcting my column names made that go away (I'd labeled the 2nd one "nob" not "nobs"!) Now I've gotten the routine checker to complete correctly, or so it seems... but it outputs three data files, and the first and last files are correct but the second one yields "Sorry, that is incorrect." No further feedback. Maddening!
No point posting my code here as it runs fine locally with all the course examples, and it's kinda hard to debug when you don't know what the script is asking for. Sigh.
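The checker's condition can be reproduced locally before submitting. This is only a sketch: out is a hypothetical stand-in for whatever data frame your function returns, and the expression is copied from the error message.

```r
# hypothetical result with the column name used in the question
out <- data.frame(id = 1, numobs = 100)
all(names(out) %in% c("id", "nobs"))   # FALSE: "numobs" is not an expected name

# rename the offending column and check again
names(out)[names(out) == "numobs"] <- "nobs"
all(names(out) %in% c("id", "nobs"))   # TRUE
```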

That d2 object is being printed as an atomic vector would be. Maybe if you showed us either dput(d2) or str(d2) you would have a better understanding of R lists. Furthermore, that first bit of code does not produce a two-column dataframe either.
> d<-data.frame(id=1, numobs=1)[0, ] # 2-column dataframe with 0 rows
> dput(d)
structure(list(id = numeric(0), numobs = numeric(0)), .Names = c("id",
"numobs"), row.names = integer(0), class = "data.frame")
> d2 <- list(id="fifty three", numobs=6) # names that match names(d)
> rbind(d,d2)
id numobs
2 fifty three 6
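Both points can be checked in a few lines. This is a sketch with made-up object names (d_bad, d_ok, v, l), not the asker's actual objects:

```r
# c() is NULL, and NULL columns are dropped, so this has no columns at all
d_bad <- data.frame(id = c(), numobs = c())
ncol(d_bad)

# typed zero-length vectors give a real two-column, zero-row data frame
d_ok <- data.frame(id = numeric(0), numobs = numeric(0))
ncol(d_ok)
names(d_ok)

# an atomic vector and a named list can hold the same values,
# but only the second one is a list
v <- c(1, 100)
l <- list(id = 1, numobs = 100)
is.list(v)
is.list(l)

# the named list lines up with the columns of d_ok
d_new <- rbind(d_ok, l)
```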

Related

Why won't R recognize data frame column names within lists?

HEADLINE: Is there a way to get R to recognize data.frame column names contained within lists in the same way that it can recognize free-floating vectors?
SETUP: Say I have a vector named varA:
(varA <- 1:6)
# [1] 1 2 3 4 5 6
To get the length of varA, I could do:
length(varA)
#[1] 6
and if the variable was contained within a larger list, the variable and its length could still be found by doing:
list <- list(vars = "varA")
length(get(list$vars[1]))
#[1] 6
PROBLEM:
This is not the case when I substitute the vector for a dataframe column and I don't know how to work around this:
rows <- 1:6
cols <- c("colA")
(df <- data.frame(matrix(NA,
nrow = length(rows),
ncol = length(cols),
dimnames = list(rows, cols))))
# colA
# 1 NA
# 2 NA
# 3 NA
# 4 NA
# 5 NA
# 6 NA
list <- list(vars = "varA",
cols = "df$colA")
length(get(list$vars[1]))
#[1] 6
length(get(list$cols[1]))
#Error in get(list$cols[1]) : object 'df$colA' not found
Though this contrived example seems inane, because I could always use the simple length(variable) approach, I'm actually interested in writing data from hundreds of variables varying in lengths onto respective dataframe columns, and so keeping them in a list that I could iterate through would be very helpful. I've tried everything I could think of, but it may be the case that it's just not possible in R, especially given that I cannot find any posts with solutions to the issue.
You could try:
> length(eval(parse(text = list$cols[1])))
[1] 6
Or:
list <- list(vars = "varA",
cols = "colA")
length(df[, list$cols[1]])
[1] 6
Or with regex:
list <- list(vars = "varA",
cols = "df$colA")
length(df[, sub(".*\\$", "", list$cols[1])])
[1] 6
If you are truly working with a data frame d, then nrow(d) is the length of all of the variables in d. There should be no reason to use length in this case.
If you are actually working with a list x containing variables of potentially different lengths, then you should use the [[ operator to extract those variables by name (see ?Extract):
x <- list(a = 1:10, b = rnorm(20L))
l <- list(vars = "a")
length(x[[l$vars[1L]]]) # 10
If you insist on using get (you shouldn't), then you need to supply a second argument telling it where to look for the variable (see ?get):
length(get(l$vars[1L], x)) # 10
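The original error arises because get() looks up a single object name, while "df$colA" is an expression. A hedged sketch of the workaround: store the column name on its own and index the data frame with [[. The spec list and its field names are assumptions made for this example.

```r
varA <- 1:6
df <- data.frame(colA = rep(NA_integer_, 6))

# keep the variable name and the target column name as separate strings
spec <- list(var = "varA", col = "colA")

length(get(spec$var))     # the free-floating vector, found by name
length(df[[spec$col]])    # the column, found by name inside df

# writing the variable into its column uses the same indexing
df[[spec$col]] <- get(spec$var)
```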

R write elements of nested list to csv

I have a list of lists which, in turn, have multiple lists in them due to the structure of some JSON files. Every list has the same number (i.e., 48 lists of 1 list, of 1 list, of 1 list, of 2 lists [where I need the first of the last two]). Now, the issue is, I need to retrieve deeply nested data from all of these lists.
For a reproducible example.
The list structure is roughly as follows (maybe one more level):
list1 = list(speech1 = 1, speech2 = 2)
list2 = list(list1, randomvariable="rando")
list3 = list(list2) #container
list4 = list(list3, name="name", stage="stage")
list5 = list(list4) #container
list6 = list(list5, date="date")
listmain1 = list(list6)
listmain2 = list(list6)
listmain3 = list(listmain1, listmain2)
The structure should look like so:
[[1]]
[[1]][[1]]
[[1]][[1]][[1]]
[[1]][[1]][[1]][[1]]
[[1]][[1]][[1]][[1]][[1]]
[[1]][[1]][[1]][[1]][[1]][[1]]
[[1]][[1]][[1]][[1]][[1]][[1]][[1]]
[[1]][[1]][[1]][[1]][[1]][[1]][[1]]$speech1
[1] 1
[[1]][[1]][[1]][[1]][[1]][[1]][[1]]$speech2
[1] 2
[[1]][[1]][[1]][[1]][[1]][[1]]$randomvariable
[1] "rando"
[[1]][[1]][[1]][[1]]$name
[1] "name"
[[1]][[1]][[1]][[1]]$stage
[1] "stage"
[[1]][[1]]$date
[1] "date"
[[2]]
[[2]][[1]]
[[2]][[1]][[1]]
[[2]][[1]][[1]][[1]]
[[2]][[1]][[1]][[1]][[1]]
[[2]][[1]][[1]][[1]][[1]][[1]]
[[2]][[1]][[1]][[1]][[1]][[1]][[1]]
[[2]][[1]][[1]][[1]][[1]][[1]][[1]]$speech1
[1] 1
[[2]][[1]][[1]][[1]][[1]][[1]][[1]]$speech2
[1] 2
[[2]][[1]][[1]][[1]][[1]][[1]]$randomvariable
[1] "rando"
[[2]][[1]][[1]][[1]]$name
[1] "name"
[[2]][[1]][[1]][[1]]$stage
[1] "stage"
[[2]][[1]]$date
[1] "date"
The end result would look like this:
date name speech1 speech2
1
2
I want to make columns out of the variables which I need and rows out of the lists that I extract them from. In the above example, I would need to retrieve variables speech1, speech2, name, and date from all of the main lists and convert to a simpler dataframe. I'm not quite sure the fastest way to do this and have been knocking my head over with lapply() and purrr for the last couple of days. Ideally, I want to treat the lists as rowIDs with flattened variables in the columns -- but that has also been tricky. Any help is appreciated.
By looping over each list, flattening it, and extracting the values, this can be done quickly with base R:
# Your data
list1 = list(speech1 = 1, speech2 = 2)
list2 = list(list1, randomvariable="rando")
list3 = list(list2) #container
list4 = list(list3, name="name", stage="stage")
list5 = list(list4) #container
list6 = list(list5, date="date")
listmain1 = list(list6)
listmain2 = list(list6)
listmain3 = list(listmain1, listmain2)
# Loop over each list inside listmain3
flatten_list <- lapply(listmain3, function(x) {
# Flatten the list and extract the values that
# you're interested in
unlist(x)[c("date", "name", "speech1", "speech2")]
})
# bind each separate list into a data frame
as.data.frame(do.call(rbind, flatten_list))
#> date name speech1 speech2
#> 1 date name 1 2
#> 2 date name 1 2
Unless you want to map the row names to particular values from each list, the row names follow the order of the lists. That is, if you run this on 48 nested lists, the row names will run from 1 to 48, so there is no need to use the row.names argument.
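One thing to watch with this approach: unlist() returns a single atomic vector, so mixing numbers and strings turns every value into character. A sketch of converting the numeric columns back afterwards, on a smaller nested structure built for this example (inner, record, and nested are illustrative names):

```r
inner  <- list(speech1 = 1, speech2 = 2)
record <- list(list(list(inner, randomvariable = "rando"), name = "name"),
               date = "date")
nested <- list(list(record), list(record))

# flatten each top-level element and keep the fields of interest
flat <- lapply(nested, function(x) {
  unlist(x)[c("date", "name", "speech1", "speech2")]
})
out <- as.data.frame(do.call(rbind, flat), stringsAsFactors = FALSE)

# all columns are character at this point; restore the numeric ones
out$speech1 <- as.numeric(out$speech1)
out$speech2 <- as.numeric(out$speech2)
str(out)
```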

Adding single rows of a data frame as columns to a large number of other datasets matching 1 by 1

I have 23 data frames each containing ~20 observations over 200 variables and another data frame containing 13 variables and 23 observations. These 13 variables hold information about the 23 data frames.
What I'm trying to do is to find a way to add the information from the lone data frame to each corresponding data frame in the list of 23, so that each observation in one out of the 23 data frames will hold the same value (e.g. the timepoint the whole data frame has been recorded).
The first line in the lone data frame corresponds to the information for the first data frame of the list of 23 and so on.
ls()
[1] "df1" "df10" "df11" "df12" "df13" "df14" "df15" "df16" "df17" "df18" "df19" "df2"
[13] "df20" "df21" "df22" "df23" "df3" "df4" "df5" "df6" "df7" "df8" "df9" "i"
[25] "lf"
After some research I tried putting this into a list but realized that I have actually no idea in which order the list stores my data. I know that df1 matches row one of the lone frame "lf" (and if the list just flips things I'll match it the wrong way).
So on a single example I tried combine which worked somewhat (but not all too well):
> testdf <- c(df1,lf[1,])
> is.data.frame(testdf)
[1] FALSE
> testdf <- as.data.frame(testdf)
> is.data.frame(testdf)
[1] TRUE
At first it was a list, but using as.data.frame and having a look at the specific columns using View() it was the result I need. e.g. a new column at the end of the frame containing a variable like "time" that has values 13:37 for all observations in "df1".
Next I tried a loop...
for (i in 1:23){
+ assign(paste0("df",i), cbind(paste0("df",i),lf[i,], row.names = NULL))
+ }
...basically just trying to do what I did first multiple times (as.data.frame() is missing here, but it doesn't change a thing). What happens is that each data frame now only has 1 observation containing the 13 variables I wanted to add at the end of the original frame.
After that everything has gone to s*** basically. I've tried to google for hours, but couldn't get anything to work really. Mostly I've tried playing around with it as a list until I realized this was a bad idea without getting the order right first (I actually know now how I can get that sorted out but right now I don't have the energy to do that. If you have a solution with a list that contains the name of each data frame as stored in the list, I'm sure I can get up to that point).
EDIT So I tried to make an example and show where I'm coming from. I hope it's more clear. I'm aware that I sadly don't solve it the "R-way" like this, which is why I tried looking at lists and apply a lot, but wasn't able to come up with a solution still.
> #create 3 data frames, 5 observations and 10 variables each
> df1 <- as.data.frame(matrix(rnorm(50, mean = 50, sd = 10), ncol = 10, nrow = 5))
> df2 <- as.data.frame(matrix(rnorm(50, mean = 50, sd = 10), ncol = 10, nrow = 5))
> df3 <- as.data.frame(matrix(rnorm(50, mean = 50, sd = 10), ncol = 10, nrow = 5))
>
> #create lone data frame with 3 observations (1 per data frame) and 2 variables
> df4 <- as.data.frame(matrix(rnorm(6, mean = 5, sd = 1), ncol = 2, nrow = 3))
>
> #create colnames for better explanation
> cn <- c()
> for (i in 1:12){
+ cn[i] <- paste0("Var",i)
+ }
> colnames(df1) <- cn[1:10]
> colnames(df2) <- cn[1:10]
> colnames(df3) <- cn[1:10]
> colnames(df4) <- cn[11:12]
>
> #working example for 1 out of 3 matches
> #adding the first row of the lone data frame "df4" containing
> #Var11 and Var12 to df1. Result is as desired
> newdf1 <- c(df1,df4[1,])
> as.data.frame(newdf1)
Var1 Var2 Var3 Var4 Var5 Var6 Var7 Var8 Var9 Var10 Var11 Var12
1 52.37538 48.47529 41.93258 45.93547 41.71611 58.86811 40.70888 41.87981 56.80464 49.73488 5.233276 4.417211
2 51.90261 61.72404 44.96621 48.59473 51.61673 51.07525 55.02000 43.48264 34.03446 48.93913 5.233276 4.417211
3 39.85056 48.72688 49.93816 60.41899 54.90524 56.84387 53.92486 55.92178 50.81779 66.03640 5.233276 4.417211
4 41.61915 53.22312 47.96660 50.79573 34.98073 41.81004 46.43976 45.49678 32.48257 58.65475 5.233276 4.417211
5 58.52455 39.70007 51.26386 39.92583 47.08723 31.41743 45.34423 63.06964 61.07181 55.44908 5.233276 4.417211
> df4
Var11 Var12
1 5.233276 4.417211
2 5.309388 5.375850
3 6.342876 5.318077
Really grateful for any help offered :)
PS: My first post here, I hope it's readable.
Having a bunch of data.frames lying around with names that have numbers in them is a sign that you're not doing things the "R way". Another sign that things aren't looking good is the use of assign(). You generally should keep such objects in a list in R. That makes everything easier to work with.
But let's say you have such data frames
df1<-data.frame(id=1:10, a=1:10)
df2<-data.frame(id=1:10, b=1:10)
df3<-data.frame(id=1:10, c=1:10)
lf<-data.frame(x=1:3, y=1:3)
We can use ls() to get their names and mget() to return them in a list. Then we can use Map() to cbind() each data.frame in the list to each row of lf. This will return a new list with all the updated data.frames
Map(function(a,b) {row.names(b)<-NULL; cbind(a, b)} ,
mget(ls(pattern="^df\\d+")),
split(lf, 1:nrow(lf))
)
Given the lack of reproducible example it's hard to know exactly what you wanted. You should provide small input data sets and show the desired output. This would make it easier to test solutions.
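On the ordering concern raised in the question: ls() returns names sorted as strings, so with 23 frames "df10" would come before "df2" and the frames would no longer line up with the rows of lf. A sketch of one way around that, assuming the frames are named df1, df2, ... in the same order as the rows of lf, is to build the names explicitly with paste0() (the small data frames here are made up):

```r
df1 <- data.frame(a = 1:2)
df2 <- data.frame(a = 3:4)
df3 <- data.frame(a = 5:6)
lf  <- data.frame(x = c(10, 20, 30), y = c("p", "q", "r"))

# names in numeric order, one per row of lf
frame_names <- paste0("df", seq_len(nrow(lf)))
frames <- mget(frame_names)

# attach row i of lf to the i-th frame; the single row is recycled
combined <- Map(function(d, i) {
  info <- lf[i, , drop = FALSE]
  row.names(info) <- NULL
  cbind(d, info)
}, frames, seq_len(nrow(lf)))

combined$df2
```

The result is a named list, so each updated frame can be retrieved by its original name.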

Use a vector/index as a row name in a dataframe using rbind

I think I'm missing something super simple, but I seem to be unable to find a solution directly relating to what I need: I've got a data frame that has a letter as the row name and two columns of numerical values. As part of a loop I'm running I create a new vector (from an index) that has both a letter and number (e.g. "f2") which I then need to be the name of a new row, then add two numbers next to it (based on some other section of code, but I'm fine with that). What I get instead is the name of the vector/index as the title of the row name, and I'm not sure if I'm missing a function of rbind or something else to make it easy.
Example code:
#Data frame and vector creation
row.names <- letters[1:5]
vector.1 <- c(1:5)
vector.2 <- c(2:6)
vector.3 <- letters[6:10]
data.frame <- data.frame(vector.1,vector.2)
rownames(data.frame) <- row.names
data.frame
index.vector <- "f2"
#what I want the data frame to look like with the new row
data.frame <- rbind(data.frame, "f2" = c(6,11))
data.frame
#what the data frame looks like when I attempt to use a vector as a row name
data.frame <- rbind(data.frame, index.vector = c(6,11))
data.frame
#"why" I can't just type "f" every time
index.vector2 = paste(index.vector, "2", sep="")
data.frame <- rbind(data.frame, index.vector2 = c(6,11))
data.frame
In my loop the "index.vector" is a random sample, hence where I can't just write the letter/number in as a row name, so need to be able to create the row name from a vector or from the index of the sample.
The loop runs and a random number of new rows will be created, so I can't specify what number the row is that needs a new name - unless there's a way to just do it for the newest or bottom row every time.
Any help would be appreciated!
Not elegant, but works:
new_row <- data.frame(setNames(list(6, 11), colnames(data.frame)), row.names = paste(index.vector, "2", sep=""))
data.frame <- rbind(data.frame, new_row)
data.frame
# vector.1 vector.2
# a 1 2
# b 2 3
# c 3 4
# d 4 5
# e 5 6
# f22 6 11
I understood the problem but was not able to resolve the issue directly, so I am suggesting an alternative way to achieve the same result.
Alternate solution: collect your row labels while binding the data in your loop, then assign the row names to your dataframe at the end.
#Data frame and vector creation
row.names <- letters[1:5]
vector.1 <- c(1:5)
vector.2 <- c(2:6)
vector.3 <- letters[6:10]
data.frame <- data.frame(vector.1,vector.2)
#loop starts
index.vector <- "f2"
data.frame <- rbind(data.frame,c(6,11))
row.names<-append(row.names,index.vector)
#loop ends
rownames(data.frame) <- row.names
data.frame
output:
vector.1 vector.2
a 1 2
b 2 3
c 3 4
d 4 5
e 5 6
f2 6 11
Hope this would be helpful.
If you manipulate the data frame with rbind, then the newest elements will always be at the "bottom" of your data frame. Hence you could also set a single row name by
rownames(data.frame)[nrow(data.frame)] = "new_name"
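Put together in a loop, this could look like the following sketch. The names in new_names and the values c(6, 11) are placeholders standing in for the random sample and the computed numbers from the question.

```r
df <- data.frame(vector.1 = 1:5, vector.2 = 2:6, row.names = letters[1:5])

new_names <- c("f2", "h7")   # placeholder for names produced inside the loop
for (nm in new_names) {
  df <- rbind(df, c(6, 11))          # the new row lands at the bottom
  rownames(df)[nrow(df)] <- nm       # so the last row is the one to rename
}
df
```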

How can I present only specific list as my output after using a condition on many other lists

Suppose I have a huge number of lists, each containing 3 rows (I present 3 of them here), and I would like to get the name (list1, list2, etc.) of the list that has the minimum values in the first and third rows out of the given 3 rows. In this case list3 is the answer (0.1948026 and 0.1125526 are the minimum values across all lists). How can I present only list3 as my output?
list1<-list(
0.3318594
,0.1296125
, 0.1262203)
list2<- list(
0.3654229
,0.1428565
,0.1552035)
list3<- list(
0.1948026
,0.1272514
,0.1125526)
data.table is probably going to be the fastest solution for this if you have lots of lists.
You could do:
library(data.table)
#add all in a list
the_lists <- list(list1, list2, list3)
Or it would probably be much better (if your lists are all in the global environment) to do the following, as per @DavidArenburg's comment:
#this will create a list with all lists in your global env
#that are named list1, list2, list3 etc.
the_lists <- mget(ls(pattern = "list.+"))
#create a data table out of them
#notice that every row represents a list here
all_lists <- rbindlist(the_lists)
#find the list with the minimum row
#which for this case means find the min location of each column
mins <- as.numeric(all_lists[, lapply(.SD, which.min)])
#> mins
#[1] 3 3 3
And then just use mins to retrieve the list you want.
For row 1 use:
> the_lists[mins[1]]
$list3
$list3[[1]]
[1] 0.1948026
$list3[[2]]
[1] 0.1272514
$list3[[3]]
[1] 0.1125526
and for row 3:
> the_lists[mins[3]]
$list3
$list3[[1]]
[1] 0.1948026
$list3[[2]]
[1] 0.1272514
$list3[[3]]
[1] 0.1125526
Using mget as suggested by @DavidArenburg, the list names are created and will be shown as above.
To get the value and the names:
> data.frame(min_loc = mins[c(1,3)], names = names(the_lists)[c(mins[c(1,3)])])
min_loc names
1 3 list3
2 3 list3
Your lists are defined in your global environment and not in a list, which is a bad habit. Despite this, you can solve your problem this way:
# first catch your list names in your environment
lnames = Filter(function(x) class(get(x))=='list', ls(pattern="list\\d+", envir=globalenv()))
# gather values in the matrix - the column names will be the list names
m = sapply(lnames, get)
# to get the name of the list(s) with min value in 1st and 3rd position
colnames(m)[unique(apply(m[c(1,3),],1,which.min))]
#[1] "list3"
Try this:
# Collect lists
collection.list <- list("list1"=list1,"list2"=list2,"list3"=list3)
#Build data
matrix <- do.call(rbind,collection.list)
# Select columns
used.columns <- c(1,3)
# Find minimum value
min.ind <- which(matrix[,used.columns]==min(unlist(matrix[,used.columns])),arr.ind = TRUE)
# Find name
names(collection.list)[min.ind[,"row"]]
I think this should work,
common_list <- mapply(c, list1, list2, list3, SIMPLIFY=FALSE)
a <- lapply(common_list, min)
b <- paste("list", unlist(lapply(common_list, which.min)))
data.frame(Min_value = unlist(a), List = unlist(b))
# Min_value List
# 1 0.1948026 list 3
# 2 0.1272514 list 3
# 3 0.1125526 list 3
However, this gives minimum for every row.
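To restrict the comparison to the first and third rows and return only the list name, a base R sketch along these lines may be enough. It assumes the lists have first been collected into one named list (here called lists), and that every list holds three single numbers as in the question.

```r
list1 <- list(0.3318594, 0.1296125, 0.1262203)
list2 <- list(0.3654229, 0.1428565, 0.1552035)
list3 <- list(0.1948026, 0.1272514, 0.1125526)
lists <- list(list1 = list1, list2 = list2, list3 = list3)

# numeric matrix: one column per list, one row per position
m <- sapply(lists, unlist)

# for rows 1 and 3, find which column holds the minimum
winners <- colnames(m)[apply(m[c(1, 3), ], 1, which.min)]
unique(winners)
```

If different lists win for row 1 and row 3, unique(winners) returns both names, so the result should be checked before assuming a single answer.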
