I have a named list in which each element is a character vector. I want to turn this list into a single data frame with two columns: one holding the name of the character vector and the other holding each element of that vector. Any help would be appreciated.
NewList <- lapply(names(List), function(X) data.frame(Names=X, Characters=List[[X]]))
do.call(rbind, NewList)
Maybe
data.frame(vecname = rep(names(ll), sapply(ll, length)), chars = unlist(ll))
to have each element of each list component correspond to a row in the final dataframe.
I'm wondering if stack provides the functions you need (using the example of Henrik)
ll <- list(x1 = c("a", "b", "c"), x2 = c("d", "e"))
stack(ll)
#-------
values ind
1 a x1
2 b x1
3 c x1
4 d x2
5 e x2
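For comparison, here is a minimal sketch that builds the same long table with rep() and unlist() and sets it next to stack(). The names long and stacked are only illustrative, and lengths() assumes R 3.2.0 or later.

```r
ll <- list(x1 = c("a", "b", "c"), x2 = c("d", "e"))

# Repeat each list name once per element, then flatten the values
long <- data.frame(
  vecname = rep(names(ll), lengths(ll)),
  chars   = unlist(ll, use.names = FALSE),
  stringsAsFactors = FALSE
)

# stack() gives the same pairs, with the names in a factor column `ind`
stacked <- stack(ll)

long
```

The main practical difference is that stack() returns the names as a factor, while the hand-built version keeps them as plain character strings.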
A very straightforward way would be to use cbind(), like this:
cbind(names(l),l)
This will result in the following matrix (cbind() returns a matrix here, not a data frame), assuming that l = list(a="ax", b="bx"):
l
a "a" "ax"
b "b" "bx"
Of course, you can rename the columns and rows by assigning to colnames() and rownames() of the result. In this example, the names of the list elements are automatically used as the row names of the result, so, depending on what you'd like to do with your data,
cbind(l)
might suffice, resulting in
l
a "ax"
b "bx"
Hope I could help.
Let's take an example data frame from which a variable column should be removed:
frame <- data.frame("a" = 1:5, "b" = 2:6, "c" = 3:7, "d" = 4:8)
rem <- readline()
frame <- subset(frame, select = -c(rem))
How do I get the variable column to be removed? This is not my real code, just wanted to present my problem in a simple code. Thanks!
Edit: I am so sorry, I am really sleepy and don't know what I typed into my code, I edited it now.
1) Do both at once. We assume that ix contains at least one column number.
ix <- 1:2
frame[-ix]
## c d
## 1 3 4
## 2 4 5
## 3 5 6
## 4 6 7
## 5 7 8
1a) or, if the case where ix has zero length (ix <- c()) matters, we can do this. The output of this and of all the remaining solutions is the same as for (1), so we won't repeat it.
ix <- 1:2
frame[setdiff(seq_along(frame), ix)]
1b) or if we have names rather than column numbers. This works even if nms is a zero length vector in which case it returns the original data frame.
nms <- c("a", "b")
frame[setdiff(names(frame), nms)]
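If the column to drop arrives as a string, as it would from readline() in the question, the name-based form in (1b) can be applied directly. A small sketch, where rem is hard-coded to "b" as a stand-in for user input:

```r
frame <- data.frame(a = 1:5, b = 2:6, c = 3:7, d = 4:8)

# Stand-in for rem <- readline(); assumed to hold a single column name
rem <- "b"

# Keep every column whose name is not in rem; unknown names are ignored
frame_out <- frame[setdiff(names(frame), rem)]
names(frame_out)
```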
2) or, if you need to do it iteratively, remove the column with the largest index first: if the columns were removed in ascending order, then after the first one is removed the second column is no longer the second but the first. If we knew that ix was already sorted we could omit the sort. We have used frame_out to hold the result so that the input is not destroyed. This works even if ix is the empty vector.
ix <- 1:2
frame_out <- frame
for(i in rev(sort(ix))) frame_out <- frame_out[-i]
frame_out
3) One way to do it independently of order is to do it by name. In this case it would be possible to remove them in ascending order. This works even if ix is the empty vector.
ix <- 1:2
nms <- names(frame)[ix]
frame_out <- frame
for(nm in nms) frame_out <- frame_out[-match(nm, names(frame_out))]
frame_out
I'm trying to add different suffixes to my data frames so that I can distinguish them after I've merged them. I have my data frames in a list and have created a vector of suffixes, but so far I have not been successful.
data2016 is the list containing my 7 data frames
new_names <- c("june2016", "july2016", "aug2016", "sep2016", "oct2016", "nov2016", "dec2016")
data2016v2 <- lapply(data2016, paste(colnames(data2016)), new_names)
Your query is not quite clear. Therefore two solutions.
The beginning is the same for either solution. Suppose you have these four dataframes:
df1x <- data.frame(v1 = rnorm(50),
v2 = runif(50))
df2x <- data.frame(v3 = rnorm(60),
v4 = runif(60))
df3x <- data.frame(v1 = rnorm(50),
v2 = runif(50))
df4x <- data.frame(v3 = rnorm(60),
v4 = runif(60))
Suppose further you assemble them in a list, something akin to your data2016, using mget and ls and a pattern to match them:
my_list <- mget(ls(pattern = "^df\\d+x$"))
The names of the dataframes in this list are the following:
names(my_list)
[1] "df1x" "df2x" "df3x" "df4x"
Solution 1:
Suppose you want to change the names of the dataframes thus:
new_names <- c("june2016", "july2016","aug2016", "sep2016")
Then you can simply assign new_names to names(my_list):
names(my_list) <- new_names
And the result is:
names(my_list)
[1] "june2016" "july2016" "aug2016" "sep2016"
Solution 2:
You want to add the new_names literally as suffixes to the 'old' names, in which case you would use paste or paste0 thus:
names(my_list) <- paste0(names(my_list), "_", new_names)
And the result is:
names(my_list)
[1] "df1x_june2016" "df2x_july2016" "df3x_aug2016" "df4x_sep2016"
You could use an index number within lapply to reference both the list and your vector of suffixes. Because there are a couple steps, I'll wrap the process in a function(). (Called an anonymous function because we aren't assigning a name to it.)
data2016v2 <- lapply(1:7, function(i) {
this_data <- data2016[[i]] # Double brackets for a list
names(this_data) <- paste0(names(this_data), new_names[i]) # Single bracket for vector
this_data # The renamed data frame to be placed into data2016v2
})
Notice in the paste0() line we are recycling the term in new_names[i], so for example if new_names[i] is "june2016" and your first data.frame has columns "A", "B", and "C" then it would give you this:
> paste0(c("A", "B", "C"), "june2016")
[1] "Ajune2016" "Bjune2016" "Cjune2016"
(You may want to add an underscore in there?)
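The same idea can be sketched with Map(), which walks the list and the suffix vector in parallel. The two small data frames below are made up for illustration and stand in for data2016; the underscore separator is a choice, not something the question specifies.

```r
# Toy stand-in for data2016 (hypothetical data)
data2016 <- list(
  data.frame(A = 1:2, B = 3:4),
  data.frame(A = 5:6, B = 7:8)
)
new_names <- c("june2016", "july2016")

# Append "_<suffix>" to every column name of the matching data frame
data2016v2 <- Map(function(d, suffix) {
  names(d) <- paste0(names(d), "_", suffix)
  d
}, data2016, new_names)

names(data2016v2[[1]])
```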
As an aside, it sounds like you might be better served by adding the "june2016" as a column in your data (like say a variable named month with "june2016" as the value in each row) and combining your data using something like bind_rows() from the dplyr package, running it "long" instead of "wide".
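A base R sketch of that "long" layout, assuming every data frame has the same columns. The toy data and the column name month are illustrative; bind_rows() from dplyr could replace the stacking step.

```r
# Toy stand-in for data2016 (hypothetical data)
data2016 <- list(
  data.frame(A = 1:2, B = 3:4),
  data.frame(A = 5:6, B = 7:8)
)
new_names <- c("june2016", "july2016")

# Tag each data frame with its month, then stack them row-wise
tagged <- Map(function(d, m) cbind(d, month = m), data2016, new_names)
long <- do.call(rbind, tagged)
rownames(long) <- NULL
long
```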
I have one data frame with column name as below
colnames(Data)
[1] "ID" "A" "B" "C" "D" "E" "F" "G"
I want to select column D and all the columns after it.
Currently these are columns E, F and G, but there may be more columns later, and there may also be more columns before D, so I cannot be sure of the position of column D.
Is there any subset command in R we can use? Like below
Datanew <- subset(Data,select=c("D","E","F","G"))
Please advise.
Find which column is D and select all the following columns (using ncol):
columnToSelect <- which(names(Data) == "D"):ncol(Data)
Datanew <- subset(Data, select = columnToSelect)
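A small sketch of this on made-up data, with a guard for the case where D is absent (the stopifnot() check is an addition, not part of the answer above):

```r
# Made-up data with the column names from the question
Data <- data.frame(ID = 1:3, A = 1, B = 2, C = 3, D = 4, E = 5, F = 6, G = 7)

start <- match("D", names(Data))   # position of column D, NA if missing
stopifnot(!is.na(start))

# Select D and everything to its right, wherever D happens to sit
Datanew <- Data[, start:ncol(Data), drop = FALSE]
names(Datanew)
```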
You can use tail to get the last n names of the data frame once you find where column D is. We can utilize it like this
tail(1:5, 3) # return the last three elements
The following is equivalent
tail(1:5, -2) # don't return the first two elements
If we use which to find column D
columnToSelect <- which(names(Data) == "D")
We can use tail to get all of the columns from D and following.
tail(names(Data), -(columnToSelect - 1))
The column selection, then, can be wrapped up in one neat little call
Data[tail(names(Data), -(which(names(Data) == "D") - 1))]
A fully reproducible example:
Data <-
lapply(LETTERS[1:10],
function(l){
x <- data.frame(l = rnorm(10))
names(x) <- l
x
})
Data <- as.data.frame(Data)
Data[tail(names(Data), -(which(names(Data) == "D") - 1))]
merger <- cbind(as.character(Date),weather1$High,weather1$Low,weather1$Avg..High,weather1$Avg.Low,sale$Scanned.Movement[a])
After cbind, the new DF automatically has the column names V1, V2, ...
I want to rename the first column with
colnames(merger)[,1] <- "Date"
but it failed. And when I use merger$V1, I get:
Error in merger$V1 : $ operator is invalid for atomic vectors
You can also name columns directly in the cbind call, e.g.
cbind(date=c(0,1), high=c(2,3))
Output:
date high
[1,] 0 2
[2,] 1 3
Try:
colnames(merger)[1] <- "Date"
Example
Here is a simple example:
a <- 1:10
b <- cbind(a, a, a)
colnames(b)
# change the first one
colnames(b)[1] <- "abc"
# change all colnames
colnames(b) <- c("aa", "bb", "cc")
You gave the following example in your question:
colnames(merger)[,1]<-"Date"
The problem is the comma: colnames() returns a vector, not a matrix, so the solution is:
colnames(merger)[1]<-"Date"
If you pass only vectors to cbind() it creates a matrix, not a dataframe. Read ?data.frame.
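A quick sketch of the difference, using made-up vectors: cbind() on plain vectors yields a matrix, where $ does not work, while data.frame() keeps per-column types and allows $.

```r
Date <- c("2016-06-01", "2016-06-02")
High <- c(25, 27)

m  <- cbind(Date, High)        # character matrix: High is coerced to text
df <- data.frame(Date, High)   # data frame: High stays numeric

colnames(m)[1] <- "date"       # renaming works on the matrix via colnames()
is.matrix(m)
is.numeric(df$High)
```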
A way of producing a data.frame and being able to do this in one line is to coerce all matrices/data frames passed to cbind into a data.frame while setting the column names attribute using setNames:
a = matrix(rnorm(10), ncol = 2)
b = matrix(runif(10), ncol = 2)
cbind(setNames(data.frame(a), c('n1', 'n2')),
setNames(data.frame(b), c('u1', 'u2')))
which produces:
n1 n2 u1 u2
1 -0.2731750 0.5030773 0.01538194 0.3775269
2 0.5177542 0.6550924 0.04871646 0.4683186
3 -1.1419802 1.0896945 0.57212043 0.9317578
4 0.6965895 1.6973815 0.36124709 0.2882133
5 0.9062591 1.0625280 0.28034347 0.7517128
Unfortunately, there is no setColNames function, analogous to setNames, that returns the object after setting its column names. However, there is nothing to stop you from adapting the code of setNames to produce one:
setColNames <- function (object = nm, nm) {
colnames(object) <- nm
object
}
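A usage sketch for this helper; the definition is repeated so that the snippet runs on its own, and the matrix is made up:

```r
setColNames <- function (object = nm, nm) {
  colnames(object) <- nm
  object
}

# Name the columns inline, without a separate assignment step
m <- setColNames(cbind(1:2, 3:4), c("low", "high"))
colnames(m)
```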
See this answer, the magrittr package contains functions for this.
If you offer cbind a set of arguments all of which are vectors, you will get not a data frame but a matrix, in this case an all-character matrix. The two have different features. You can get a data frame if some of your arguments remain data frames. Try:
merger <- cbind(Date =as.character(Date),
weather1[ , c("High", "Low", "Avg..High", "Avg.Low")] ,
ScnMov =sale$Scanned.Movement[a] )
It's easy: just put the name you want to use, in quotes, before the vector you are adding:
a_matrix <- cbind(b_matrix,'Name-Change'= c_vector)
I couldn't find a solution for this problem online, as simple as it seems.
Here it is:
#Construct test dataframe
tf <- data.frame(1:3,4:6,c("A","A","A"))
#Try the apply function I'm trying to use
test <- apply(tf,2,function(x) if(is.numeric(x)) mean(x) else unique(x)[1])
#Look at the output--all columns treated as character columns...
test
#Look at the format of the original data--the first two columns are integers.
str(tf)
In general terms, I want to differentiate what function I apply over a row/column based on what type of data that row/column contains.
Here, I want a simple mean if the column is numeric and the first unique value if the column is a character column. As you can see, apply treats all columns as characters the way I've written this function.
Just write a specialised function and put it within sapply... don't use apply(dtf, 2, fun). Besides, your character ain't so characterish as you may think - run getOption("stringsAsFactors") and see for yourself.
sapply(tf, class)
X1.3 X4.6 c..A....A....A..
"integer" "integer" "factor"
sapply(tf, storage.mode)
X1.3 X4.6 c..A....A....A..
"integer" "integer" "integer"
EDIT
Or even better - use lapply:
fn <- function(x) {
if (is.numeric(x) && !is.factor(x)) {
mean(x)
} else if (is.character(x)) {
unique(x)[1]
} else if (is.factor(x)) {
as.character(x)[1]
}
}
dtf <- data.frame(a = 1:3, b = 4:6, c = rep("A", 3), stringsAsFactors = FALSE)
dtf2 <- data.frame(a = 1:3, b = 4:6, c = rep("A", 3), stringsAsFactors = TRUE)
as.data.frame(lapply(dtf, fn))
a b c
1 2 5 A
as.data.frame(lapply(dtf2, fn))
a b c
1 2 5 A
I find the numcolwise and catcolwise functions from the plyr package useful here, for a syntactically simple solution:
First let's name the columns, to avoid ugly column names when doing the aggregation:
tf <- data.frame(a = 1:3,b=4:6, d = c("A","A","A"))
Then you get your desired result with this one-liner:
> cbind(numcolwise(mean)(tf), catcolwise( function(z) unique(z)[1] )(tf))
a b d
1 2 5 A
Explanation: numcolwise(f) converts its argument (in this case f is the mean function) into a function that takes a data frame and applies f only to its numeric columns. Similarly, catcolwise converts its function argument into a function that operates only on the categorical columns.
You want to use lapply() or sapply(), not apply(). A data.frame is a list under the hood, which apply will try to convert to a matrix before doing anything. Since at least one column in your data frame is character, every other column also gets coerced to character in forming that matrix.
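The coercion can be seen directly in a short sketch (stringsAsFactors = FALSE is set explicitly so the result does not depend on the R version):

```r
tf <- data.frame(a = 1:3, b = 4:6, d = c("A", "A", "A"),
                 stringsAsFactors = FALSE)

# apply() first converts tf to a character matrix, so every column looks like text
via_apply <- apply(tf, 2, class)

# sapply() visits the columns as list elements, so the types survive
via_sapply <- sapply(tf, class)

# Type-aware summary, column by column
summary_row <- lapply(tf, function(x) if (is.numeric(x)) mean(x) else unique(x)[1])
```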