I have a list of ten data frames with different numbers of columns, and I would like to get a list of ten vectors, where each vector is the last column of the corresponding data frame.
If I want to get the last column from the first data frame, I run:
lapply(list_with_df,function(x) x[,lengths(list_with_df)[1]])
But I get the same column number for every data frame.
I have tried a for loop over the ten data frames, but I got an error. I would appreciate it if someone could help me with this. Regards.
Instead of using lengths, you can get the number of columns of each data frame with ncol. This should work:
lapply(list_with_df,function(x) x[,ncol(x)])
Edit
Just for some clarification: you got the same column number for every data frame because lengths(list_with_df)[1] always picks the first element of the lengths vector. For a list of data frames, lengths() returns the number of columns of each data frame, so that index was always the column count of the first data.frame.
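A small self-contained illustration of both points, using a made-up two-element list (the data frames here are hypothetical, not the asker's data): lengths() reports each data frame's column count, and calling ncol() on each x inside the function picks that data frame's own last column.

```r
# Hypothetical list of data frames with different numbers of columns
list_with_df <- list(
  data.frame(a = 1:2, b = 3:4),
  data.frame(a = 1:2, b = 3:4, c = 5:6)
)

# lengths() on a list of data frames gives each data frame's number of columns
lengths(list_with_df)

# Using x's own ncol() selects the last column of each data frame
last_cols <- lapply(list_with_df, function(x) x[, ncol(x)])
last_cols
```

If the elements are tibbles rather than base data frames, x[[ncol(x)]] is a safer spelling, because it always returns a plain vector.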
Related
I am working with the data frame below and want to compare the values in two of its columns:
#create a sample data frame
df<-data.frame(
item=c("a","b","c","d","e"), # five values so every column has 5 rows
price_today=c(1,2,3,4,5),
price_yesterday=c(1,2,3,4,5)
)
If a value in price_today equals the corresponding value in price_yesterday, the result should be "okay", otherwise "not okay", and the result should be stored as a new variable in the data frame.
How should I write the ifelse part?
Many thanks for your help and have a good day.
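One possible answer (not part of the original thread) is a vectorised ifelse() on the two columns. In this sketch the last price_today value is changed to 6, a hypothetical value chosen only so that one row differs; status is an arbitrary name for the new column.

```r
df <- data.frame(
  item = c("a", "b", "c", "d", "e"),
  price_today = c(1, 2, 3, 4, 6),      # last value is hypothetical, to create a mismatch
  price_yesterday = c(1, 2, 3, 4, 5)
)

# ifelse() compares row by row and returns one label per row
df$status <- ifelse(df$price_today == df$price_yesterday, "okay", "not okay")
df
```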
Modified question:
Hi all, what if the df becomes like this:
#create a sample data frame (modified)
df<-data.frame(
item=c("a","a","c","d","e"), # five values so every column has 5 rows
price_today=c(1,"",3,"XYZ",5),
price_yesterday=c(1,2,3,4,5)
)
Now column price_today contains both blank and non-numeric values, and column item contains a,a,c,d instead of a,b,c,d. I have been trying to do the following:
First, filter column item for "a" with the code below:
df_1<-df[df$item=="a",]
After df_1 is filtered, filter price_today again to remove the blank and non-numeric values with the code below:
df_1<-df[!is.numeric(df_1$price_today),]
Filtering by "a" in column item works. However, the second filter returns the original df. What did I do wrong here?
Million thanks for your help and have a good day/night.
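A likely explanation (not from the original thread): mixing "" and "XYZ" with numbers makes price_today a character column, and is.numeric() tests the whole column once, returning a single FALSE. !FALSE is TRUE, which is recycled over every row, and because that line indexes df rather than df_1, the full original df comes back. A minimal sketch of one fix converts the column with as.numeric() and keeps the rows where the conversion did not produce NA:

```r
df <- data.frame(
  item = c("a", "a", "c", "d", "e"),
  price_today = c(1, "", 3, "XYZ", 5),   # coerced to character by the "" and "XYZ"
  price_yesterday = c(1, 2, 3, 4, 5)
)

# One value for the whole column: FALSE, because the column is character
is.numeric(df$price_today)

# Step 1: keep the "a" rows
df_1 <- df[df$item == "a", ]

# Step 2: "" and "XYZ" become NA; the coercion warning is suppressed
today_num <- suppressWarnings(as.numeric(df_1$price_today))

# Index df_1 (not df) with a per-row logical vector
df_1 <- df_1[!is.na(today_num), ]
df_1
```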
I want to order the "starwars" data frame by the number of film appearances. As you can see, the "films" column contains a vector of films for each character. What is the best approach to do this?
library(tidyverse)
starwars %>%
arrange(-lengths(films))
lengths() returns the length of each element of a list. Since films is a list column, lengths(films) gives the number of films for each character.
And once we have the number of elements per row, we can simply sort/arrange by that information.
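The same idea works in base R without dplyr. A minimal sketch on a small hypothetical data frame with a list column (the names and films are made up, not taken from starwars):

```r
# Hypothetical data: each row holds a character vector of films in a list column
chars <- data.frame(name = c("A", "B", "C"))
chars$films <- list("Film 1", c("Film 1", "Film 2", "Film 3"), c("Film 1", "Film 2"))

# Number of films per row
n_films <- lengths(chars$films)

# Sort from most to fewest films, like arrange(-lengths(films))
sorted <- chars[order(-n_films), ]
sorted$name
```

In the dplyr version, arrange(desc(lengths(films))) is an equivalent spelling of the descending sort.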
I have a function "get_animals" that retrieves data for specimens of different animal species. It takes a vector of species names and returns the data for those species (location, DNA sequences, ...). The database I'm using can't handle a query with too many species names at once, so I'm trying to use lapply to query one species at a time.
I tried this:
species_list<-as.list(as.character(unique(df$species_name)))
e<-lapply(species_list, function (x) get_animals(animal_names=x))
The problem is that lapply returns one data frame per species name in "species_list", each with many columns. I only want two columns from each data frame, and then I want to combine all of them into a single data frame.
I tried to unlist the result from the lapply function:
e<-unlist(e)
But it didn't work, because it just returned all the occurrences for the first column of each data frame.
Thanks in advance for any answers
To keep only some of the columns, subset each result either by column index
lapply(species_list, function (x) get_animals(animal_names=x)[c(1, 5)])
or by column name
lapply(species_list, function (x)
get_animals(animal_names=x)[c("species_name", "location")])
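The answer above subsets each result but does not cover the second step the asker wanted: fusing the pieces into one data frame. A minimal sketch, using a hypothetical stand-in for get_animals() (the real function and its columns are not shown in the question), combines the list with do.call(rbind, ...):

```r
# Hypothetical stand-in for get_animals(); the real one queries a database
get_animals <- function(animal_names) {
  data.frame(species_name = animal_names,
             location = paste0("loc_", animal_names),
             sequence = "ACGT")
}

species_list <- as.list(c("Canis lupus", "Felis catus"))

# One two-column data frame per species
pieces <- lapply(species_list, function(x)
  get_animals(animal_names = x)[c("species_name", "location")])

# Stack the pieces row-wise into a single data frame
combined <- do.call(rbind, pieces)
combined
```

dplyr::bind_rows(pieces) is an alternative that also fills columns missing from some pieces with NA.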
I'm trying to convert a large list (220559 elements) into a data frame. Each element is either a character string ("RT") or an empty character vector (character(0)).
I tried:
data.frame(t(sapply(my.list, c)))
I got the data frame, but it turned out to be one observation with 220559 variables instead of one variable with 220559 observations.
Is there an easy way to switch the observations with the variables? Or do I have to create the data frame differently? I'm new to R and really looking forward to your help.
So you have a giant list where each element is either the character "RT" or an empty character vector (character(0)). And you want to turn this into a data frame with one row and one column for each item in the list (220559 columns).
The problem is that a data.frame requires all columns to have the same number of rows, and length("RT")==1 while length(character(0))==0. So you can either drop those columns or convert those values to NA. I'm going to assume the latter for my example.
# "large" list
xx<-sample(list(character(), "RT"), 1000, replace=TRUE)
#make into data.frame
df<-data.frame(lapply(xx, function(x) if(length(x)==0) NA else x))
#add nicer names
names(df)<-paste0("V",seq_along(df))
That's it. Normally to turn a list into a data.frame you just call data.frame(). It was just a bit trickier because of your zero-length vectors.
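The question actually asked for the transposed shape: one variable with one observation per list element. A minimal sketch under the same NA assumption uses vapply() to build a character vector and wraps it as a single column (V1 is an arbitrary column name):

```r
set.seed(42)
xx <- sample(list(character(), "RT"), 1000, replace = TRUE)

# Replace each zero-length element with NA so every element has length 1
vals <- vapply(xx, function(x) if (length(x) == 0) NA_character_ else x, character(1))

# One column, one row per list element
df <- data.frame(V1 = vals)
dim(df)
```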
I am having a problem. I have two data frames with many columns, and they have different numbers of rows: one has many rows and the other has only one row. Both data frames have some columns with the same names, and I want to multiply the matching columns with each other. I haven't been able to solve it. Please help me.
The command
mapply("*", DataFrame1, DataFrame2)
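should work if you want to multiply all columns and both data frames have the same columns in the same order, because mapply pairs columns by position, not by name; the one-row data frame's values are recycled down the rows of the other. If the relevant columns are only a subset of all columns, we first need to identify the columns present in both data frames and index both by that common set, which also puts them in the same order.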
common <- intersect(names(DataFrame1), names(DataFrame2))
mapply("*", DataFrame1[common], DataFrame2[common])
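A small self-contained check of the name-matching version, with made-up data frames: the one-row frame lists its columns in a different order and has an extra column z. as.data.frame() turns the matrix that mapply() returns back into a data frame.

```r
# Hypothetical data: many rows vs. one row, overlapping column names
DataFrame1 <- data.frame(a = 1:3, b = 4:6, c = 7:9)
DataFrame2 <- data.frame(b = 10, a = 2, z = 0)

# Shared columns, in DataFrame1's order: "a", "b"
common <- intersect(names(DataFrame1), names(DataFrame2))

# Multiply matching columns; the single row is recycled
res <- as.data.frame(mapply("*", DataFrame1[common], DataFrame2[common]))
res
```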