I have a list of dataframes. Each dataframe has 6 rows. I want to create 6 boxplots. The first boxplot should take the values of the first row of the first column. The second boxplot should take the values of the second row of the first column, etc.
I want to end up with something like this: example image
Each row should be one boxplot on the horizontal axis.
Right now I have started to do it in a loop, but I think this is not the way to go:
for (counter in seq(from = 1, to = wins)) {
res <- (lapply(mylist, function(x) x[counter,1]))
boxplot(res)
}
The variable mylist contains the dataframes. I already use lapply to get the first/second/etc. row elements over all dataframes according to the counter variable. However, I think I have to also avoid the loop, but this would need a 'better' lapply which also loops over the rows of the dataframes in mylist.
Maybe not the one-liner you want, but this works for me:
# Add a column to each data frame with the row index
for (i in seq_along(mylist)) {
mylist[[i]]$rowID <- seq_len(nrow(mylist[[i]]))
}
# Stick all the data frames into one single data frame
allData <- do.call(rbind, mylist)
# Split the first column based on rowID
boxList <- split(allData[,1], allData$rowID)
# boxplot likes a list
boxplot(boxList)
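A loop-free variant of the same idea, as a minimal sketch. It assumes every data frame in mylist has the same number of rows and a numeric first column; the toy mylist and the column names below are made up for illustration.

```r
# Toy data (hypothetical): 5 data frames with 6 rows each
set.seed(1)
mylist <- replicate(5, data.frame(value = rnorm(6), other = runif(6)),
                    simplify = FALSE)

# One column per data frame, one row per row index: a 6 x 5 matrix
firstCols <- sapply(mylist, function(x) x[[1]])

# Split the matrix entries by their row index: one vector per row
boxList <- split(firstCols, row(firstCols))

# One box per row of the original data frames
boxplot(boxList, xlab = "row", ylab = "value")
```

Because sapply() already collects the first columns into a matrix, no helper column and no rbind step are needed here.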
My column titles are not correct. I want to rename all the columns of a matrix (they are currently V1, V2, V3, ...) to match a data frame, so that the name of the first matrix column becomes the name of the first data frame column, and so on. I have to repeat this for all 39 columns, so my idea was to use a for loop.
df1 is the matrix that has to be changed.
for (i in 1:39) {
names(df1[,i]) <- names(dfnorm[,i])
}
This code is not working.
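If I understand correctly, this should work. Since df1 is a matrix, use colnames() on it; names() on a matrix labels individual elements instead of columns:
colnames(df1)[1:39] <- names(dfnorm)[1:39]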
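A small sketch of the renaming on toy data. The objects below are hypothetical stand-ins for df1 (a matrix) and dfnorm (a data frame), with three columns instead of 39.

```r
# Hypothetical stand-ins for df1 (a matrix) and dfnorm (a data frame)
df1 <- matrix(1:6, nrow = 2)   # matrix without column names
dfnorm <- data.frame(height = 1:2, weight = 3:4, age = 5:6)

# colnames() is the accessor for matrix column names; the whole vector
# of names is assigned at once, so no loop is needed
colnames(df1) <- names(dfnorm)
colnames(df1)
```

If df1 were a data frame rather than a matrix, names(df1) <- names(dfnorm) would do the same job.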
I have two dataframes
Dataframe 1 has around a million rows and two columns named 'row' and 'columns', which hold the row and column indexes of cells in another dataframe (dataframe 2).
For each row in dataframe 1, I want to extract the value from dataframe 2 at the position given by 'row' and 'columns'.
I used a simple for loop to get the solution, but it takes around 9 minutes. Is there a faster way to do this with built-in R functions?
for(i in 1:nrow(dataframe1)) {
dataframe1$value[i] = dataframe2[dataframe1$row[i],dataframe1$columns[i]]
}
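You actually don't need a for loop to do this. Passing the two index vectors separately, as in dataframe2[dataframe1$row, dataframe1$columns], would return a whole block of rows and columns rather than one value per pair, so bind them into a two-column matrix and index with that:
dataframe1$value <- dataframe2[cbind(dataframe1$row, dataframe1$columns)]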
This should work a lot faster. If you wanted to try it a different way you could try adding the values to a new vector and then using cbind to join the vector to the Data Frame. The fact that you're trying to update the whole Data Frame during the loop is most likely what's slowing it down.
Maybe you can try the code below
dataframe1$value <- dataframe2[as.matrix(dataframe1[c("row","columns")])]
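To see what matrix indexing does, here is a minimal sketch on made-up data (the values are hypothetical). Each row of the index matrix is one (row, column) pair, and the result is compared against the original element-by-element loop.

```r
# Hypothetical small inputs
dataframe2 <- data.frame(a = c(10, 20, 30), b = c(40, 50, 60))
dataframe1 <- data.frame(row = c(1, 3, 2), columns = c(2, 1, 2))

# Two-column matrix: each row is one (row, column) pair
idx <- as.matrix(dataframe1[c("row", "columns")])
dataframe1$value <- dataframe2[idx]

# Reference result from the element-by-element loop
loopValue <- numeric(nrow(dataframe1))
for (i in seq_len(nrow(dataframe1))) {
  loopValue[i] <- dataframe2[dataframe1$row[i], dataframe1$columns[i]]
}

all.equal(dataframe1$value, loopValue)
```

Note that indexing a data frame with a matrix goes through as.matrix(), so if dataframe2 mixes column types the extracted values will be coerced to a common type (typically character).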
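Since your loop only considers the rows in dataframe1, you can cut the surplus rows from dataframe2 and then use cbind. Note that this pairs the two data frames row by row; it does not look up individual (row, column) cells:
dataframe2 <- dataframe2[seq_len(nrow(dataframe1)), ]
df3 <- cbind(dataframe1, dataframe2)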
I have 68 data files- all with the same identifiers-but with different indicators. I converted these individual files into a list with each data frame as a separate element.
The first row of every data frame is a year, which I would like to paste to the column name. I want to be able to separate it by "_".
For example, right now the column name is Arbeitslose, and the row under it has 2018. I would like the column name to become Arbeitslose_2018.
I know how to do this on a single data frame. The code I used is below.
RAW_2[1,] <- as.character(RAW_2[1,]) # Convert the first row to character
colnames(RAW_2) <- paste(colnames(RAW_2),RAW_2[1, ], sep = "_") # Paste the year (row 1) onto the column name
RAW_2 <- RAW_2[rownames(RAW_2) != 1, ] # Drop the first row, which holds the years and is now redundant
but I don't know how to do this for a list.
I cannot merge the data frames into a single one, because the column names are not unique. I would need to do this step for me to be able to merge it into a data set and proceed. I'm forced to work with lists, something I am horrible with.
Is there an easy way to do this? I am quite lost on how to proceed.
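You can use lapply(). Note that it has to be paste() here: paste0() has no sep argument, so sep = "_" would just be appended to every name as another string.
rename_col <- function(x){
# take the first row as a character vector, one entry per column
years <- vapply(x[1, ], as.character, character(1))
colnames(x) <- paste(colnames(x), years, sep = "_")
# drop the year row, keeping a data frame even with a single column
x[-1, , drop = FALSE]
}
# df_list is your list of data.frames; assign the result to keep it
df_list <- lapply(df_list, rename_col)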
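A self-contained sketch of this on toy data. The list contents, the second column name and the helper name add_year are hypothetical; only Arbeitslose and the year-in-first-row layout come from the question.

```r
# Hypothetical list of two raw data frames; the first row holds the year
df_list <- list(
  data.frame(Arbeitslose = c("2018", "10", "12"),
             Erwerbstaetige = c("2018", "100", "120")),
  data.frame(Arbeitslose = c("2019", "11", "13"),
             Erwerbstaetige = c("2019", "110", "130"))
)

# Append the first-row value to each column name, then drop that row
add_year <- function(x) {
  years <- vapply(x[1, ], as.character, character(1))
  colnames(x) <- paste(colnames(x), years, sep = "_")
  x[-1, , drop = FALSE]
}

renamed <- lapply(df_list, add_year)
colnames(renamed[[1]])
colnames(renamed[[2]])
```

After this step the column names differ between list elements, which is what a later merge needs. The remaining columns are still character and may need converting with as.numeric().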
I am trying to combine the corresponding columns of three different dataframes, so that for each column position I get an object with the same number of rows as the originals and three columns, one from each dataframe. Each of the original dataframes has 10 columns and 14 rows.
I tried it with a for loop, but the result is not usable for me.
t <- NULL
for(i in 1 : length(net)) {
a <- cbind(imp.qua.00.09[i], exp.qua.00.09[i], net[i])
t <- list(t, a)
}
t
But in the end I would like to get 10 separate dataframes with three columns each.
So I want to loop through this:
a <- cbind(imp.qua.00.09[i], exp.qua.00.09[i], net[i])
for every column of each original dataframe. But if I use t <- list(t, a) it constructs a crazy list. Thanks.
The code you're using to append elements to t is wrong. You should do it this way:
t <- list()
for(i in 1:length(net)) {
a <- cbind(imp.qua.00.09[i], exp.qua.00.09[i], net[i])
t[[length(t)+1]] <- a
}
t
Your code is wrong since at each step, you transform t into a list where the first element is the previous t (that is a list, except for the first iteration), and the second element is the subset. So basically in the end you're getting a sort of recursive list composed by two elements where the second one is the data.frame subset and the first is again a list of two elements with the same structure, for ten levels.
Anyway, your code is equivalent to this one-liner (that is probably more efficient since it does not perform any list concatenation):
t <- lapply(seq_along(net),
function(i){cbind(imp.qua.00.09[i], exp.qua.00.09[i], net[i])})
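If one wide data frame with all 30 columns side by side is enough (rather than 10 separate data frames), this should work: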
do.call(cbind,list(imp.qua.00.09, exp.qua.00.09, net))
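For comparison, a minimal sketch of the per-column grouping on toy data. The three small data frames below (imp, expo, net) are hypothetical stand-ins for imp.qua.00.09, exp.qua.00.09 and net, with 2 columns and 3 rows instead of 10 and 14.

```r
# Hypothetical stand-ins for the three original data frames
imp  <- data.frame(a = 1:3, b = 4:6)
expo <- data.frame(a = 7:9, b = 10:12)
net  <- data.frame(a = expo$a - imp$a, b = expo$b - imp$b)

# One three-column data frame per original column
triples <- lapply(seq_along(net), function(i) {
  data.frame(imp = imp[[i]], exp = expo[[i]], net = net[[i]])
})
names(triples) <- names(net)

# A flat list: one element per original column, each with 3 columns
length(triples)
triples$a
```

Naming the list elements after the original columns makes it possible to pull out a single group with triples$a or triples[["b"]].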
I am cleaning several excel files in R. They unfortunately are of unequal dimensions, rows and columns. Currently I am storing each excel sheet as a data frame in a list. I know how to print the 4th row of the first data frame in a list by issuing this command:
df.list1[[1]][4,]
Or a range of rows like this:
df.list1[[1]][1:10,]
My question is: How do I print a particular row for every data frame in the list? In other words:
df.list1[[i]][4,]
df.list1 has 30 data frames in it, but my other df.lists have over 140 data frames whose rows I want to extract. I'd like to be able to store particular locations across several data frames into a new list. I'm thinking the solution might involve lapply.
Furthermore, is there a way to extract rows in every data frame in a list based on a condition? For example, for all 30 data frames in the list df.list1, extract the row if the value is equal to "Apartment" or some other string of characters.
Appreciate your help, please let me know if I can help clarify my problem.
You could also just directly lapply the extraction function @Justin suggests, e.g.:
# example data of a list containing 10 data frames:
test <- replicate(10,data.frame(a=1:10),simplify=FALSE)
# extract the fourth row of each one - setting drop=FALSE means you get a
# data frame returned even if only one vector/column needs to be returned.
lapply(test,"[",4,,drop=FALSE)
The format is:
lapply(listname,"[",rows.to.return,cols.to.return,drop=FALSE)
# the example returns the fourth row only from each data frame
#[[1]]
# a
#4 4
#
#[[2]]
# a
#4 4
# etc...
To generalise this when you are completing an extraction based on a condition, you would have to change it up a little to something like the below example extracting all rows where a in each data.frame is >4. In this case, using an anonymous function is probably the clearest method, e.g.:
lapply(test, function(x) with(x,x[a>4,,drop=FALSE]) )
#[[1]]
# a
#5 5
#6 6
#7 7
#8 8
#9 9
#10 10
# etc...
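The same pattern covers the string condition from the question. This is a sketch on made-up data: the column names type and price are assumptions, since the question does not say which column holds "Apartment".

```r
# Hypothetical list of data frames with unequal numbers of rows
df.list1 <- list(
  data.frame(type = c("Apartment", "House"), price = c(100, 200)),
  data.frame(type = c("Office", "Apartment", "Apartment"),
             price = c(300, 120, 130))
)

# Keep only the rows whose type is "Apartment" in every data frame;
# %in% treats NA as "no match", whereas == would produce NA rows
apartments <- lapply(df.list1, function(x) {
  x[x$type %in% "Apartment", , drop = FALSE]
})

sapply(apartments, nrow)
```

To match several strings at once, put them all on the right-hand side, e.g. x$type %in% c("Apartment", "House").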
There is no need for a wrapper function, just use lapply and pass it a blank argument at the end (to represent the columns)
lapply(df.list, `[`, 4, )
This also works with any type of row argument that you would normally use in myDF[ . , ], e.g.: lapply(df.list, `[`, c(2, 4:6), )
I would suggest that if you are going to use a wrapper function, have it work more like [ does: eg
Grab(df.list, 2:3, 1:5) would select the second & third row and first through 5th column of every data.frame and
Grab (df.list, 2:3) would select the second & third row of all columns
Grab <- function(ll, rows, cols) {
if (missing(cols))
lapply(ll, `[`, rows, )
else
lapply(ll, `[`, rows, cols)
}
Grab (df.list, 2:3)
My suggestion is to write a function that does what you want on a single data frame:
myfun <- function(dat) {
return(dat[4, , drop=FALSE])
}
If you want to return a vector instead of a data.frame, just do return(dat[4, ]) instead (this only gives a vector when the data frame has a single column). Then use lapply to apply that function to each element of your list:
lapply(df.list1, myfun)
With that technique, you can easily come up with ways to extend myfun to more complex functions...
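One possible extension, as a sketch: give the function a rows argument and let lapply pass it through. The name get_rows and the toy list are hypothetical.

```r
# Hypothetical toy list of three identical data frames
df.list1 <- replicate(3, data.frame(a = 1:10, b = letters[1:10]),
                      simplify = FALSE)

# Same idea as myfun, but the rows to extract are an argument
get_rows <- function(dat, rows) {
  dat[rows, , drop = FALSE]
}

# Extra arguments to lapply are forwarded to the function
fourth <- lapply(df.list1, get_rows, rows = 4)
firstTwo <- lapply(df.list1, get_rows, rows = 1:2)

fourth[[1]]
```

If some data frames have fewer rows than requested, the missing rows come back filled with NA, so it may be worth checking nrow(dat) inside the function first.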
For example, you have a .csv file called hw1_data.csv and you want to retrieve the 47th row. Here is how to do that:
x<-read.csv("hw1_data.csv")
x[47,]
If it is a text file you can use read.table.