I have two data frames with totally different column names and values.
Example:
Data Frame 1 ->
company value
A 10
B 11
A 9
Data Frame 2 ->
id value2
Q 7
W 8
E 9
This question has several parts that I want to achieve:
1. Extract the unique values of the company column from data frame 1 (the unique companies).
2. Copy the unique values obtained above into a new column in data frame 2, assigning them randomly (only the company field).
3. Merge the two data frames on that new column. (This is only for testing, which is why I need this step.)
All help is appreciated!!
Thank you in advance.
You could try something like this:
company <- unique(df1$company)
df2$new_column <- sample(company, nrow(df2), replace = TRUE)
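The answer covers step 2; step 3 (the merge) is not shown. A minimal sketch, assuming the example data from the question and keeping the answer's `new_column` name. Note that `merge()` matches every df1 row of a company with every df2 row that received that company, so duplicated companies multiply rows:

```r
set.seed(42)  # make the random assignment reproducible

# Example data from the question
df1 <- data.frame(company = c("A", "B", "A"), value = c(10, 11, 9))
df2 <- data.frame(id = c("Q", "W", "E"), value2 = c(7, 8, 9))

# Step 1: unique companies; step 2: assign them randomly to df2
company <- unique(df1$company)
df2$new_column <- sample(company, nrow(df2), replace = TRUE)

# Step 3: merge on the company / new_column pair (inner join by default)
merged <- merge(df1, df2, by.x = "company", by.y = "new_column")
merged
```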
Complete R novice here.
I have a wide-format data frame that includes a variable for participant_number, with each participant providing two responses (score) under a within-subjects manipulation (code).
However, I have three separate sets of values corresponding to the participant numbers in three different (between-subjects) experimental groups (e.g. control, active_1, active_2).
How can I use these sets of values to create a variable in my main data frame which indicates what experimental group the participant belongs to?
Any help, much appreciated.
The package "dplyr" is quite useful for this kind of thing. Let's consider a small working example:
df <- data.frame(ID=c(1:7))
ListActive1 <- c(1,3)
ListActive2 <- c(2,5)
ListControl <- c(4,7,6)
df is the main data frame containing the participant IDs (it may of course have further columns, e.g. the score). The three vectors contain the IDs of the participants belonging to each group; e.g. the participants with IDs 2 and 5 belong to the group "Active2".
Now we create a new column in the main data frame using the command mutate which comes with the dplyr package (make sure to install and load it).
library(dplyr)
df <- mutate(df, group = case_when(
  ID %in% ListActive1 ~ "Active1",
  ID %in% ListActive2 ~ "Active2",
  ID %in% ListControl ~ "Control"))
The command case_when checks, for each participant, which of the lists contains the ID and puts the corresponding label in the new column group. An ID that appears in none of the lists gets NA.
ID group
1 1 Active1
2 2 Active2
3 3 Active1
4 4 Control
5 5 Active2
6 6 Control
7 7 Control
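If you would rather avoid the dplyr dependency, a base-R lookup can produce the same column. This is a sketch reusing the example IDs above; the named list `groups` and the `stack()`/`match()` combination are my own choice, not part of the answer. As with case_when, IDs not found in any group get NA:

```r
df <- data.frame(ID = 1:7)

# Hypothetical named list: one vector of participant IDs per group
groups <- list(Active1 = c(1, 3), Active2 = c(2, 5), Control = c(4, 7, 6))

# stack() turns the list into a two-column lookup table: values (ID), ind (group)
lookup <- stack(groups)

# match() finds each participant's row in the lookup table
df$group <- as.character(lookup$ind[match(df$ID, lookup$values)])
df
```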
I'm a beginner in the R programming language. How can I add a list of values to a data frame column?
My expected data frame is:
U_ID Value
1 list(`First`="ty",'Second'="89")
2 list(`First`= c("20","10","40"),`Second`="user")
3 list(`First`="vendor",`Second`="yu",`Four`=list(list(`ty`="78",'pt'="kkkpp")))
4 NULL
5 list(`First`="client")
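One way to build this is a list column: assign a list with one element per row, where each element can itself be a (nested) list or NULL. A minimal base-R sketch, assuming a plain list column is acceptable for your use:

```r
df <- data.frame(U_ID = 1:5)

# One list element per row; NULL is kept as its own element by list()
df$Value <- list(
  list(First = "ty", Second = "89"),
  list(First = c("20", "10", "40"), Second = "user"),
  list(First = "vendor", Second = "yu",
       Four = list(list(ty = "78", pt = "kkkpp"))),
  NULL,
  list(First = "client")
)

# Access a nested value: third row, element Four, first entry, field pt
df$Value[[3]]$Four[[1]]$pt
```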
I know the title is terrible and generic; I have no idea how to describe concisely what I am trying to do.
I've got a two-column data frame in R: column A holds data values, and column B held data that has now been binned (it was the year associated with column A; it is now a bin label based on year ranges).
I need to generate a new data frame that uses the bin labels as columns, with the associated data values as row entries, preferably sorted and padded with NA so that all columns have the same length.
Sample data:
df <- data.frame(values=c(1,NA,3,NA,5:6,7:9),
bins=rep(c("yr1_yr2","yr2_yr3","yr3_yr4"),each=3))
SOLUTION EDIT: After a lot of experimentation I was able to do what I wanted with my data by using the 'cut_width' function from ggplot2 to slice my data into bins and then put it in a distribution graph.
Thank you all for your attempts, sorry again for the vague question and lack of sample data.
Not quite sure if this is getting close to what you want...
library(tidyverse)
reshape2::melt(df, id.vars='bins', measure.vars='values')
returns
bins variable value
1 yr1_yr2 values 1
2 yr1_yr2 values NA
3 yr1_yr2 values 3
4 yr2_yr3 values NA
5 yr2_yr3 values 5
6 yr2_yr3 values 6
7 yr3_yr4 values 7
8 yr3_yr4 values 8
9 yr3_yr4 values 9
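The melt() call gives long format, whereas the question asked for one column per bin, sorted and padded with NA. A minimal base-R sketch, assuming NA values should be dropped before sorting (sort() drops them by default):

```r
df <- data.frame(values = c(1, NA, 3, NA, 5:6, 7:9),
                 bins = rep(c("yr1_yr2", "yr2_yr3", "yr3_yr4"), each = 3))

# Split the values by bin label and sort each group (NAs are dropped)
groups <- lapply(split(df$values, df$bins), sort)

# Pad every group with NA up to the length of the longest one
n <- max(lengths(groups))
wide <- as.data.frame(lapply(groups, function(v) c(v, rep(NA, n - length(v)))))
wide
```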
I'm trying to figure out how to name a data frame df based on a variable it contains (e.g. a date), ending up with a data frame named df_2017 if the variable equals 2017, for example.
I want this because I'm importing datasets from several years and quarters, and I would like to make sure that each one is named according to its year variable. Each dataset has only one date. I know I can do it manually, but automating it would take me less time.
I know how to do this with columns and rows, but I can't figure it out for objects.
EDIT:
Example 1:
Data frame name "df"
A B Date
1 4 2017
2 3 2017
New data frame name "df_2017"
Example 2:
Data frame name "df"
A B Date
1 4 2016
2 3 2016
New data frame name "df_2016"
The assign function should do what you want. A solution could look like this:
assign(paste0("df_", year), dataframe_read_from_file, pos = 1)
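If you use assign inside a function or a loop, make sure that you set the pos (or envir) option correctly; by default assign writes into the environment it is called from, which inside a function is the function's local environment.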
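A sketch that derives the year from the Date column itself, using the data from Example 1. It assumes each dataset holds exactly one distinct date, as stated in the question. Storing the data frames in a named list (the hypothetical `datasets` below) is an alternative that avoids creating many loose objects:

```r
# Example 1 from the question
df <- data.frame(A = c(1, 2), B = c(4, 3), Date = c(2017, 2017))

# Each dataset is assumed to contain exactly one distinct date
year <- unique(df$Date)
stopifnot(length(year) == 1)

# Create df_2017 explicitly in the global environment
assign(paste0("df_", year), df, envir = globalenv())

# Alternative: keep all datasets in one named list
datasets <- list()
datasets[[paste0("df_", year)]] <- df
```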
I created a data frame from a data set with unique marketing sources. Let's say I have 20 unique marketing sources in this new data frame D1. I want to add another column that has the count of times this marketing source was in my original data frame. I'm trying to use the dplyr package but not sure how to reference more than one data frame.
The original data has 16,000 observations.
The new data frame has 20 observations, as there are only 20 unique marketing sources.
How can I use summarize in dplyr to reference two data frames?
My objective is to find the percentage of marketing sources.
My original data frame has two columns: NAME, MARKETING_SOURCE
This data frame has 16,000 observations and 20 distinct marketing sources (email, event, sales call, etc)
I created a new data frame with only the unique MARKETING_SOURCES and called that data frame D1
In my new data frame, I want to add another column that has the number of times each marketing source appeared in the original data frame.
My new Data frame should have two columns: MARKETING_SOURCE, COUNT
I don't know if you need to use dplyr for something like this...
First let's create some data.frames:
df1 <- data.frame(source = letters[sample(1:26, 400, replace = TRUE)])
df2 <- data.frame(source = letters, count = NA)
Then we can use table() to get the frequencies:
counts <- table(df1$source)
df2$count <- counts
head(df2)
source count
1 a 10
2 b 22
3 c 12
4 d 17
5 e 18
6 f 18
UPDATE:
In response to @MrFlick's wise comment below, you can take the names() of the output from table() to ensure the order is preserved:
df2$source <- names(counts)
Certainly not as elegant, and it would be even less elegant if df2 had other columns, but it is sufficient for the simple case presented above.
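Since the question asks about dplyr and about percentages, here is a sketch using group_by() and summarise() on the original data, then joining the counts onto D1 by MARKETING_SOURCE. Joining by name avoids relying on row order. The source names and counts below are made up for illustration:

```r
library(dplyr)

# Hypothetical original data: NAME and MARKETING_SOURCE columns
original <- data.frame(
  NAME = paste0("n", 1:400),
  MARKETING_SOURCE = rep(c("email", "event", "sales call"), times = c(200, 120, 80))
)

# D1: one row per unique marketing source, as in the question
D1 <- data.frame(MARKETING_SOURCE = unique(original$MARKETING_SOURCE))

# Count occurrences per source and convert to percentages of all rows
counts <- original %>%
  group_by(MARKETING_SOURCE) %>%
  summarise(COUNT = n()) %>%
  mutate(PERCENT = 100 * COUNT / sum(COUNT))

# Attach the counts to D1, matching on the source name
D1 <- left_join(D1, counts, by = "MARKETING_SOURCE")
D1
```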