join dataframes of unequal length, repeating values where appropriate [closed]

Closed. This question needs debugging details. It is not currently accepting answers.
Edit the question to include desired behavior, a specific problem or error, and the shortest code necessary to reproduce the problem. This will help others answer the question.
Closed 7 years ago.
I am trying to join two tables of unequal length in R. Both share a column (LogsheetID) on which to join. The longer table has more than one row, and so more than one value in its other columns, for each value of the shared column. The shorter table has one value per column (e.g. Date, VesselID) for each LogsheetID. In the joined table I want the values from the short table's columns to be repeated wherever LogsheetID is repeated in the long table. I tried left_join, but the values in the columns joined from the short table are NA.

This should work:
merge(tableX, tableY, by = "colName", all = TRUE)
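
A minimal sketch of the behaviour asked for, using made-up tables (the names long, short and Catch and all values are assumptions, since the question gives no data). It uses all.x = TRUE, which keeps every row of the long table; all = TRUE as above would also keep LogsheetIDs that appear only in the short table. If the short table's columns still come back as NA, one possible cause is that the key values do not match exactly between the tables, but that is only a guess without the real data.

```r
# Hypothetical data: 'long' has several rows per LogsheetID, 'short' has one.
long <- data.frame(
  LogsheetID = c(1, 1, 2, 2, 2, 3),
  Catch      = c(10, 12, 7, 9, 4, 15)
)
short <- data.frame(
  LogsheetID = c(1, 2, 3),
  VesselID   = c("A", "B", "C"),
  stringsAsFactors = FALSE
)

# Left join: keep every row of 'long' and repeat the matching 'short' values.
joined <- merge(long, short, by = "LogsheetID", all.x = TRUE)
print(joined)
```

Each VesselID is repeated once per matching row of the long table, so the result has as many rows as long.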

Related

r function for counting specific row of 2 columns [closed]

Closed. This question needs debugging details. It is not currently accepting answers.
Closed 1 year ago.
I need help comparing 2 columns in RStudio, but only for the first 10 observations.
The code I used first was
damgeByage<-table(test$Vehicle_Age,test$Vehicle_Damage)
damgeByage
which gives me all 186 observations, but I only want the first 10 observations of both columns.

We can subset the data by row index:
table(test[1:10, c("Vehicle_Age", "Vehicle_Damage")])

akrun's answer works, or you can also use:
table(test$Vehicle_Age[1:10], test$Vehicle_Damage[1:10])
The table is also subsettable if you just want the first 10 rows of the result (this only works if the table has at least 10 rows):
damgeByage <- table(test$Vehicle_Age, test$Vehicle_Damage)
damgeByage[1:10, ]
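
To make the difference between the two approaches concrete, here is a small sketch with an invented stand-in for test (12 rows rather than 186; the category labels are assumptions). Subsetting the data first counts only the first 10 observations, whereas subsetting the finished table only drops rows of counts that were computed from all observations.

```r
# Hypothetical stand-in for the 'test' data frame from the question.
test <- data.frame(
  Vehicle_Age    = c("new", "mid", "old", "mid", "new", "mid",
                     "old", "new", "mid", "mid", "old", "new"),
  Vehicle_Damage = c("Yes", "No", "Yes", "Yes", "No", "No",
                     "Yes", "No", "Yes", "No", "Yes", "Yes"),
  stringsAsFactors = FALSE
)

# Cross-tabulate only the first 10 observations (subset rows, then count).
first10 <- table(test[1:10, c("Vehicle_Age", "Vehicle_Damage")])

# Same counts via column vectors, as in the second answer.
alt <- table(test$Vehicle_Age[1:10], test$Vehicle_Damage[1:10])

# Cross-tabulate everything; the cells still count all 12 observations.
full <- table(test$Vehicle_Age, test$Vehicle_Damage)

print(first10)
print(sum(first10))
print(sum(full))
```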

undefined columns selected problem when trying to run this code [closed]

Closed. This question needs debugging details. It is not currently accepting answers.
Closed 2 years ago.
I am trying to use the air dataset from the dummies package.
I tried:
library(dummies)
dumair<-air[c(5:18)]
but that throws the following error:
Error in `[.data.frame`(air, c(5:18)) : undefined columns selected
How can I overcome this?

A single index inside [ ] selects columns, so air[c(5:18)] asks for columns 5 to 18; the error means air has fewer than 18 columns. To select rows and columns from a data frame, specify both:
df[rows, columns]
If you need all columns, leave the "columns" field empty. The same applies if you want all rows but only some columns.
I believe you want to select all columns, but only rows 5 to 18, right?
So doing:
dumair <- air[5:18, ]
should work!
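
The same behaviour can be reproduced without the dummies package. The sketch below uses the built-in mtcars data frame (32 rows, 11 columns) purely as a stand-in for air, which is an assumption made so the example runs anywhere.

```r
# mtcars ships with R: 32 rows, 11 columns.
print(dim(mtcars))

# One index selects columns; columns 5..18 do not all exist, so this errors.
# tryCatch() captures the error message instead of stopping the script.
res <- tryCatch(mtcars[5:18], error = function(e) conditionMessage(e))
print(res)

# Two indices: rows 5..18, and an empty column slot meaning "all columns".
rows <- mtcars[5:18, ]
print(dim(rows))

# One index within range works and returns columns, not rows.
cols <- mtcars[5:8]
print(names(cols))
```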

Is there an easy way to order a large number of columns without using dplyr::select() in r? [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 3 years ago.
I'm working with a very large dataset with lots of columns (400+) and every time I create a new variable or add a new one I have to reorder it. I want it ordered so that all the related variables remain together so I've been using dplyr::select() to reorder things. Yet there are times when I have to go back into my script very early on and add a new variable. When I run the whole code after that, there tends to be one or two variables I forgot to put into preceding select() functions so it goes missing.
I use select() because selecting all the columns between two variables and referencing them by name is super easy (eg, Vfour:Vthreefifty). Do you have any tips for reordering datasets with lots of columns?

Given no reproducible example, but using the prefix of your 2 column names:
library(dplyr)
df %>%
  select(starts_with("V"))
You can then add further starts_with() calls as needed.
Other selection helpers include:
ends_with(), contains(), matches()
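
If the goal is to place a newly created variable next to its related columns without listing every other column, a small base R helper is one option. This is only a sketch: move_after() is a hypothetical function written for this example, and the column names are invented. Recent versions of dplyr also provide relocate() with .before and .after arguments for the same purpose.

```r
# Hypothetical helper: move 'cols' so they sit directly after 'anchor'.
# Every other column keeps its relative order, so nothing goes missing.
move_after <- function(df, cols, anchor) {
  rest <- setdiff(names(df), cols)
  pos <- match(anchor, rest)
  stopifnot(!is.na(pos))
  new_order <- append(rest, cols, after = pos)
  df[, new_order, drop = FALSE]
}

# Invented example data: Vnew was added last but belongs after Vone.
df <- data.frame(Vone = 1:2, Vtwo = 3:4, Vthree = 5:6, Vnew = 7:8)
reordered <- move_after(df, "Vnew", "Vone")
print(names(reordered))
```

Because the helper starts from names(df), a column that was forgotten elsewhere in the script still ends up in the result.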

How do you go about organizing 2 dataframes based on common row values? [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Want to improve this question? Add details and clarify the problem by editing this post.
Closed 3 years ago.
I have 2 data frame objects with common transcript names. The names are in a different order in each, so I am trying to index into them and pull out the rows based on this first column of transcript names to organize the data. I want to retain all of the values in the other columns, and just reorder the rows based on the indices. I am trying to do this in R.
In MATLAB I can do this by using intersect to find the indices.

Sounds like you want to merge on transcript name. As in df.new <- merge(df.1,df.2,by="transcript.name"). This will merge the observations (rows) that are common to both data frames. If you want to retain all the observations from the first data frame (df.1), even if they're not in df.2, then include all.x=TRUE.
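
A short sketch of both variants, plus an index-based alternative closer to the MATLAB approach. The data frames and the columns expr.a and expr.b are made up for illustration; only the transcript.name key comes from the answer above.

```r
# Invented example data with transcript names in different orders.
df.1 <- data.frame(transcript.name = c("t3", "t1", "t2"),
                   expr.a = c(30, 10, 20),
                   stringsAsFactors = FALSE)
df.2 <- data.frame(transcript.name = c("t1", "t2", "t4"),
                   expr.b = c(1, 2, 4),
                   stringsAsFactors = FALSE)

# Rows common to both data frames.
df.new <- merge(df.1, df.2, by = "transcript.name")

# Keep every row of df.1; unmatched rows get NA in df.2's columns.
df.left <- merge(df.1, df.2, by = "transcript.name", all.x = TRUE)

# Index-based alternative: find the shared names, then reorder each
# data frame to that common order with match().
common <- intersect(df.1$transcript.name, df.2$transcript.name)
aligned.1 <- df.1[match(common, df.1$transcript.name), ]
aligned.2 <- df.2[match(common, df.2$transcript.name), ]

print(df.new)
print(df.left)
```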

A few questions about data.table in r [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 6 years ago.
So I went over this tutorial, and have a few questions:
What is the exact meaning of "columns within the frame of a data.table are seen as if they are variables"?
Is there a particular meaning for the "L" after 6 in month==6L? (In the data table it's only 6, not 6L.)
I understand how to calculate mean for every column by something, but what if I simply want to calculate the mean of each column (assuming that I have many columns so I don't want to write all the names).
Thanks!

Expanding the quote: "you don’t have to use DT$ repetitively since columns within the frame of a data.table are seen as if they are variables". Inside the square brackets of a data.table you can refer to columns by name, much like using the with function, which minimizes typing and may make lines more readable.
"L" is an R marker that says to treat the preceding number as an integer, not a numeric (double).
Use .SD. For example, to get the sum of all variables grouped by the variable byVariable in the data.table dt:
myDT <- dt[, lapply(.SD, sum), by="byVariable"]
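
The first two points can be illustrated in base R without data.table installed. The sketch below uses an invented data frame (the delay column and all values are assumptions). For the ungrouped mean of every column that the question asks about, dropping by and swapping sum for mean, i.e. dt[, lapply(.SD, mean)], should be the data.table equivalent of the last step here.

```r
# Invented example data; 'month' is stored as integer, 'delay' as double.
df <- data.frame(month = c(6L, 6L, 7L), delay = c(10, 20, 60))

# with() makes the columns visible as variables, similar to how
# column names can be used directly inside a data.table's brackets.
m1 <- with(df, mean(delay[month == 6L]))
m2 <- mean(df$delay[df$month == 6L])  # same result, more typing

# The L suffix creates an integer literal; a bare 6 is a double.
print(is.integer(6L))
print(is.double(6))
print(identical(6L, 6))

# Mean of every column without naming any of them.
col_means <- sapply(df, mean)
print(col_means)
```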