I have two data frames, Name_List and Fruits_List, and I would like to produce the data frame Output, in which each Name is repeated as many times as the value in its Rows_to_repeat column. Please let me know if anyone has a solution.
Name=c("rohit","murali","partha")
Rows_to_repeat=c(6,3,1)
Fruits=c("Apple","Orange","Watermelon","Mango","Banana","Kiwi","Pomo","Dates","Muskmelon","Papaya")
Person_Got_Fruit=c("rohit","rohit","rohit","rohit","rohit","rohit","murali","murali","murali","partha")
Name_List=data.frame(Name,Rows_to_repeat)
Fruits_List=data.frame(Fruits)
Output=data.frame(Fruits,Person_Got_Fruit)
This is the solution that works for me:
# repeat each name the requested number of times; name the column explicitly
Person_Got_Fruit=data.frame(Person_Got_Fruit=rep(Name_List$Name, Name_List$Rows_to_repeat))
Output=cbind(Fruits_List,Person_Got_Fruit)
I have looked at similar answers, but none of them quite addresses my problem.
I have found a very messy answer to my question but would like advice as to whether there is a simpler way.
I have a file list of many tables that I want to import into R and append columns to an empty df.
The rownames or column 1 will be the same for each imported table/df but the number of columns (sample_ids) will change.
At the moment I create a vector outside the loop and name it with the row names that I know won't change. Then I loop through the data frames and do a left_join on the shared column name.
Something like this:
final_df<-c(the row names that I want to extract)
names(final_df) <- "Sample_ID"
for (i in seq_along(files)){
  my_df<-read_tsv(files[i])
  # get the table specific sample names
  my_sn <- my_df[15,-c(1:3)]
  # get the rows I want to extract
  my_df<-filter(row names I want to extract)
  names(my_df)<-c("Sample_ID", my_sn)
  final_df<-left_join(final_df, my_df, by="Sample_ID")
}
I'm thinking there must be a more elegant way.
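One possibly tidier pattern, sketched here under stated assumptions: collect the tables in a list, then fold the list together with Reduce(). The sketch uses base R merge() with all.x = TRUE in place of left_join so that it runs without extra packages. The data frames t1 and t2 are made-up stand-ins for the imported files, and wanted is a hypothetical vector of the row names to keep; with real files the list would come from something like lapply(files, read_tsv).

```r
# Hypothetical stand-ins for two imported tables; both share a Sample_ID column
wanted <- c("geneA", "geneB", "geneC")
t1 <- data.frame(Sample_ID = c("geneA", "geneB", "geneC", "geneD"),
                 s1 = c(1, 2, 3, 4), s2 = c(5, 6, 7, 8))
t2 <- data.frame(Sample_ID = c("geneC", "geneA", "geneB"),
                 s3 = c(9, 10, 11))
tables <- list(t1, t2)

# Keep only the wanted rows in each table
tables <- lapply(tables, function(d) d[d$Sample_ID %in% wanted, , drop = FALSE])

# Start from the fixed key column and left-join every table onto it
final_df <- Reduce(
  function(acc, d) merge(acc, d, by = "Sample_ID", all.x = TRUE),
  tables,
  data.frame(Sample_ID = wanted)
)
print(final_df)
```

Because each table only adds columns, the number of sample columns can differ from file to file without changing the loop logic.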
I have a dataset that is like this: list
df
200000
5666666
This dataset continues to 5551 observations.
Another dataset also has 5551 observations. I want to merge the list dataset with the other dataset, but they have no variable in common; only the row names are the same.
I tried:
merge(list,df,by="rownames")
The error message says that by must be a valid column name.
I also tried merge_all, but that did not work either.
Could someone please help?
It's good practice to be more precise with the naming of your data frame variables. I wouldn't use list (which is also the name of a base R function) but something like df_description. Either way, merging by row names can be achieved by using by = "row.names" or by = 0. You can read more on merge() in the documentation (under "Details").
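A minimal sketch of merging on row names, using two small made-up data frames (df_description and df_values are hypothetical names, and the row names "r1" and "r2" are invented for illustration):

```r
# Two data frames with no shared columns, only shared row names
df_description <- data.frame(amount = c(200000, 5666666),
                             row.names = c("r1", "r2"))
df_values <- data.frame(score = c(1.5, 2.5),
                        row.names = c("r2", "r1"))

# by = "row.names" (or by = 0) matches rows on their row names;
# the result carries them in a new first column called Row.names
merged <- merge(df_description, df_values, by = "row.names")
print(merged)
```

Note that the row names end up in an ordinary column, so they can be restored with rownames(merged) <- merged$Row.names if needed.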
I have a data frame with a column of strings that are the question bodies of a survey, and a separate data frame in which those question bodies are matched to a question number. I want to go through the original data frame's column, check whether each value matches any value in the other data frame, and if it does, store the associated question number in a column of the original data frame. I am having a lot of trouble figuring this out. I have looked into using apply() or something similar, but I can't quite get it. Any help would be greatly appreciated.
If df1 is the first data frame, df2 the second, and question_body is the name of the question-string column in both, then:
library(dplyr)
# by must be a quoted column name; select() then drops the question text
left_join(df1, df2, by="question_body") %>% select(-question_body)
Of course, it would be easier to give you an accurate answer if you provided some actual examples of your data structure.
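In the absence of real data, here is a base R sketch of the same lookup using match(). All names and values are assumptions made up for illustration: df1 and df2 stand for the two data frames, and question_body and question_number for their columns.

```r
# Hypothetical survey data: df1 holds responses, df2 maps text to numbers
df1 <- data.frame(question_body = c("How old are you?",
                                    "Where do you live?",
                                    "Unlisted question"))
df2 <- data.frame(question_body = c("Where do you live?", "How old are you?"),
                  question_number = c(2, 1))

# match() returns, for each string in df1, its position in df2 (NA if absent);
# indexing with those positions pulls the corresponding question numbers
df1$question_number <- df2$question_number[match(df1$question_body,
                                                 df2$question_body)]
print(df1)
```

Question bodies with no match get NA, which mirrors what a left join would produce.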
I need to have this data structure in R:
column = number (ID)
column = list of N numeric values
How to do this? I tried a lot of solutions, but I either have a problem with "replacement has 2 rows, data has 1" or it does not create a list as a column.
I believe this is possible.
Thanks in advance for your answers.
Ok, so the solution was to create a data matrix where each column had its own value.
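For reference, a list column can also be created directly. The sketch below (with a made-up data frame named ids) wraps the per-row vectors in list(), so that the replacement has one element per row; assigning a bare vector of length 2 to a one-row data frame is what typically triggers the "replacement has 2 rows, data has 1" error.

```r
# One ID per row
ids <- data.frame(id = 1:3)

# Each list element belongs to one row and may have any length
ids$values <- list(c(1.5, 2.5), c(3, 4, 5), 6)

# Inspect how many numbers each row holds, and fetch one row's vector
print(lengths(ids$values))
print(ids$values[[2]])
```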
What is the easiest way to extract information from a list embedded within a data frame?
a<-data.frame(cyl=c(4,6,8),k=c("A","B","C"))
j<-by(data=mtcars,INDICES=mtcars$cyl,function(x) lm(mpg~disp,data=x))
a$l<-j
t(sapply(a$l,coef))->a$t
But this results in a matrix embedded within the dataframe and it needs some massaging in order to have it as two columns in a with their associated column names.
What I'd like is an easier method to extract this information and have it stored in dataframe a with the associated column names.
EDIT: This is what I had in mind, but I found the procedure somewhat cumbersome.
t(sapply(a$l,coef))->a$t
as.data.frame(a$t)->g
g$cyl<-as.numeric(rownames(g))
merge(x = a,y = g)->a2
a2[,-c(3,4)]->a3
Any simpler ways of doing this?
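One possible shortcut, offered as a sketch and not the only way: keep the fitted models in a plain named list, bind their coefficients into a matrix, and build the data frame from that in one step. The names fits, coefs and coef_df are introduced here for illustration.

```r
# Fit one model per cylinder count; the list is named "4", "6", "8"
fits <- lapply(split(mtcars, mtcars$cyl),
               function(d) lm(mpg ~ disp, data = d))

# One row of coefficients per model, row names taken from the list names
coefs <- do.call(rbind, lapply(fits, coef))

# Turn the matrix into a data frame with an explicit cyl column
coef_df <- data.frame(cyl = as.numeric(rownames(coefs)), coefs,
                      check.names = FALSE)

# Attach the coefficients to the original lookup table
a <- data.frame(cyl = c(4, 6, 8), k = c("A", "B", "C"))
a2 <- merge(a, coef_df, by = "cyl")
print(a2)
```

Setting check.names = FALSE keeps the coefficient name "(Intercept)" as is instead of converting it to a syntactic name.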
Now, to complicate matters: what if I'd like to get the residuals from a$l by cylinder?
sapply(a$l,function(x) x[['residuals']])->a$t
How can I generate a new dataframe in a long format with two columns: cyl and residual that later can be merged with the original dataframe a?
Well, see my previous edit for the first answer. This is for my second problem.
It does solve my problem, but I'm sure there must be a quicker and more intuitive way of solving this.
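flat.list.df<-function(list,sublist){
  nm<-names(list)
  # build one data frame per list element, then stack them with rbind
  i<-do.call(rbind,lapply(nm,function(x){
    u<-list[[x]][[sublist]]   # values to extract, e.g. the residuals
    g<-length(u)
    j<-rep(x,g)               # repeat the element name once per value
    m<-data.frame(var=j,val=u)
    m
  }))
  return(i)
}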
flat.list.df(a$l,"residuals")->w
w
merge(w,a,by.x="var",by.y="cyl")
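A shorter alternative, sketched under the assumption that the models are kept in a plain named list (called fits here, a name introduced for illustration) instead of the by object: rep() with lengths() builds the cyl column, and unlist() flattens the residuals.

```r
# Fit one model per cylinder count; the list is named "4", "6", "8"
fits <- lapply(split(mtcars, mtcars$cyl),
               function(d) lm(mpg ~ disp, data = d))

# Residuals per model, as a named list of numeric vectors
res <- lapply(fits, residuals)

# Long format: repeat each cyl label once per residual in its group
res_long <- data.frame(
  cyl = as.numeric(rep(names(res), lengths(res))),
  residual = unlist(res, use.names = FALSE)
)

# Merge back onto the lookup table by cylinder count
a <- data.frame(cyl = c(4, 6, 8), k = c("A", "B", "C"))
res_merged <- merge(res_long, a, by = "cyl")
print(head(res_merged))
```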