I have some data frames with different variable names but the same kind of content. Unfortunately, my files follow no naming pattern, and I am now trying to standardize them. For example, I have these 4 data frames and would like to select only one variable from each:
KEY_WIN <- c(123,456,789)
COUNTRY <- c("USA","FRANCE","MEXICO")
DF1 <- data.frame(KEY_WIN,COUNTRY)
KEY_WINN <- c(12,55,889)
FOOD <- c("RICE","TOMATO","MANGO")
CAR <- c("BMW","FERRARI","TOYOTA")
DF2 <- data.frame(KEY_WINN,FOOD,CAR)
ID <- c(555,698,33)
CITY <- c("NYC","LONDON","PARIS")
DF3 <- data.frame(ID,CITY)
NUMBER <- c(3,436,1000)
OCEAN <- c("PACIFIC","ATLANTIC","INDIAN")
DF4 <- data.frame(NUMBER,OCEAN)
I would like to create a routine to select only the variables KEY_WIN, KEY_WINN, ID, NUMBER. My expected result would be:
DF_FINAL<- data.frame(KEY=c(123,456,789, 12,55,889, 555,698,33, 3,436,1000))
How would I select only those variables?
There are multiple ways I would imagine you could approach this.
First, you could put your data frames in a list:
listofDF <- list(DF1, DF2, DF3, DF4)
Then, you could bind_rows to add the data frames together, and then coalesce to merge into one column.
library(tidyverse)
bind_rows(listofDF) %>%
mutate(KEY = coalesce(KEY_WIN, KEY_WINN, ID, NUMBER)) %>%
select(KEY)
KEY
1 123
2 456
3 789
4 12
5 55
6 889
7 555
8 698
9 33
10 3
11 436
12 1000
If you knew that the first column was always your KEY column, you could simply do:
KEY = unlist(lapply(listofDF, "[[", 1))
This would extract the first column from all of your data frames:
[1] 123 456 789 12 55 889 555 698 33 3 436 1000
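If the key column is not always first, a variation is to look it up by name instead. This is only a minimal sketch, assuming the set of possible key names is known in advance; the vector `key_names` and the helper `pick_key` are hypothetical names, not from the question:

```r
# Example data from the question
DF1 <- data.frame(KEY_WIN = c(123, 456, 789), COUNTRY = c("USA", "FRANCE", "MEXICO"))
DF2 <- data.frame(KEY_WINN = c(12, 55, 889), FOOD = c("RICE", "TOMATO", "MANGO"),
                  CAR = c("BMW", "FERRARI", "TOYOTA"))
DF3 <- data.frame(ID = c(555, 698, 33), CITY = c("NYC", "LONDON", "PARIS"))
DF4 <- data.frame(NUMBER = c(3, 436, 1000), OCEAN = c("PACIFIC", "ATLANTIC", "INDIAN"))
listofDF <- list(DF1, DF2, DF3, DF4)

# Assumption: every name the key column may go by is listed here
key_names <- c("KEY_WIN", "KEY_WINN", "ID", "NUMBER")

# Hypothetical helper: return the single key column of one data frame
pick_key <- function(df) {
  hit <- intersect(names(df), key_names)
  stopifnot(length(hit) == 1)  # fail loudly if zero or several candidates match
  df[[hit]]
}

DF_FINAL <- data.frame(KEY = unlist(lapply(listofDF, pick_key)))
DF_FINAL
```

Unlike the first-column shortcut, this stops with an error when a data frame has no recognizable key column rather than silently taking whatever comes first.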
Related
My Tibble:
df1 <- tibble(a = c("123*", "123", "124", "678*", "678", "679", "677"))
# A tibble: 7 x 1
a
<chr>
1 123*
2 123
3 124
4 678*
5 678
6 679
7 677
What it should become:
# A tibble: 3 x 2
a b
<chr> <chr>
1 123 124
2 678 679
3 678 677
The values with the stars refer to the following values with no stars, until a new value with a star comes and so on.
Each value with a star should go to the first column; the other values (except those identical to the starred value apart from the star) should go to the second column. If one starred value is followed by several values, they should all stay linked to it, so the value in the first column is duplicated to keep the connection.
I know how to filter and put the values in each column, but I'm not sure how I would keep the connection.
Regards
We can use tidyverse. Create a grouping column based on the occurrence of * in 'a', extract the numeric part with parse_number, keep the distinct rows, and then, grouped by 'grp', create a new column 'a' holding the first value of 'b' and drop the first row of each group.
library(dplyr)
library(stringr)
df1 %>%
transmute(grp = cumsum(str_detect(a, fixed("*"))),
b = readr::parse_number(a)) %>%
distinct(b, .keep_all = TRUE) %>%
group_by(grp) %>%
mutate(a = first(b)) %>%
slice(-1) %>%
ungroup %>%
select(a, b)
-output
# A tibble: 3 × 2
a b
<dbl> <dbl>
1 123 124
2 678 679
3 678 677
Here is one base R option -
Using cumsum and grepl we split the data on occurrence of *.
In each group, we drop the values which are similar to the star values and create a dataframe with two columns.
Finally, combine the list of dataframes in one combined dataframe.
result <- do.call(rbind, lapply(split(df1,
cumsum(grepl('*', df1$a, fixed = TRUE))), function(x) {
a <- x[[1]]
a[1] <- sub('*', '', a[1], fixed = TRUE)
data.frame(a = a[1], b = a[a != a[1]])
}))
rownames(result) <- NULL
result
# a b
#1 123 124
#2 678 679
#3 678 677
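The grouping step that both answers rely on can be inspected on its own. A small sketch using the values from the question (the names `is_star` and `grp` are just for illustration):

```r
a <- c("123*", "123", "124", "678*", "678", "679", "677")

# TRUE wherever a starred value starts a new block
is_star <- grepl("*", a, fixed = TRUE)

# The running count of stars gives every row the id of its block
grp <- cumsum(is_star)

data.frame(a, is_star, grp)
```

Every row between one starred value and the next gets the same `grp`, which is what lets `split()` or `group_by()` keep the connection.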
I have several columns in my source data frame containing the same three possible values (1, 2 and 3) over several hundred rows. I'm using the table function to summarize each column as shown here
column1 <- table(data$column1)
column2 <- table(data$column2)
column3 <- table(data$column3)
...
These tables print out results of the form below
1 2 3
6 74 300
I'm trying to combine all of these tables into one data frame of this form
        1  2   3
column1 6 74 300
column2 2 87 298
column3 4 57 489
How do I make this happen? Thank you!
We can use the tidyverse, suppose your data is called dat:
library(tidyverse)
dat %>%
pivot_longer(cols = everything()) %>%
count(name, value) %>%
pivot_wider(names_from = value, values_from = n)
# name `1` `2` `3`
# 1 column1 6 74 300
# 2 column2 2 87 298
# 3 column3 4 57 489
Got a solution using rbind from reddit. It does exactly what I was looking for.
#selected all tables in the environment
tables = sapply(.GlobalEnv, is.table)
#rbinded them
allquestions <- do.call(rbind, mget(names(tables)[tables]))
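Note that this collects every table in the global environment, including unrelated ones, and the rows may not line up if a column happens to lack one of the values. A sketch of an alternative that works directly on the data frame, assuming the possible values are 1, 2 and 3 as in the question (the sample data `dat` below is made up):

```r
# Made-up sample data with the same shape as in the question
dat <- data.frame(
  column1 = c(1, 2, 2, 3, 3, 3),
  column2 = c(1, 1, 2, 3, 3, 3),
  column3 = c(2, 2, 2, 3, 3, 3)   # note: no 1s in this column
)

# Fixing the factor levels keeps a zero count for values that never occur,
# so every column yields a table of the same length
counts <- t(sapply(dat, function(x) table(factor(x, levels = 1:3))))

# One row per source column, one column per value
result <- as.data.frame(counts)
result
```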
Suppose we have this data frame:
avg_1 avg_2 avg_3 avg_4
132 123 23 214
DF DM RF RM
How can I convert this in R so that the output is a new data frame that looks like:
avg key
132 DF
123 DM
23 RF
214 RM
I have tried using pivot_longer from tidyverse, but the trouble is that I'm also trying to rename the columns to avg and key. Can anyone help?
In base R I would try:
setNames(data.frame(t(df), row.names = NULL), c("avg", "key"))
Output
avg key
1 132 DF
2 123 DM
3 23 RF
4 214 RM
Does this work:
library(dplyr)
library(purrr)
library(tibble)
t(df) %>% as_tibble() %>% set_names(c('avg','key')) %>% type.convert(as.is = TRUE)
# A tibble: 4 x 2
avg key
<int> <chr>
1 132 DF
2 123 DM
3 23 RF
4 214 RM
And here is a solution with R builtin methods:
# t() returns a matrix, so convert back to a data frame before naming columns
x <- as.data.frame(t(your.data.frame))
names(x) <- c("avg", "key")
Note that you might also want to change the data types to numeric and factor, if they are something different, e.g.
x$avg <- as.numeric(x$avg)
x$key <- as.factor(x$key)
I have 3 data sets. Each of them has a column called ID. I would like to list, for each ID, the matching rows from all 3 tables (I'm not sure I'm explaining this right). For example
df1
ID age
1 34
2 33
5 34
7 35
43 32
76 33
df2
ID height
1 178
2 176
5 166
7 159
43 180
76 178
df3
ID class type
1 a 1
2 b 1
5 a 2
7 b 3
43 b 2
76 a 3
I would like to have an output which looks like this
ID = 1
df1 age
34
df2 height
178
df3 class type
a 1
ID = 2
df1 age
33
df2 height
176
df3 class type
b 1
I wrote a script
listing <- function(x) {
for(i in 1:n) {
data <- print(x[x$ID == 'i', ])
print(data)
}
return(data)
}
why am I not getting the output I wanted?
This is a hack. If you want/need to export to a Word document, I strongly urge you to use something like R Markdown (for example in RStudio) with knitr (and, behind the scenes, pandoc). I'd encourage you to look at knitr::kable, for instance, as well as better looping structures for dealing with large numbers of datasets.
This hack can be improved considerably. But it gets you the output you want.
func <- function(...) {
dfnames <- as.character(match.call()[-1])
dfs <- setNames(list(...), dfnames)
IDs <- unique(unlist(lapply(dfs, `[[`, "ID")))
fmt <- paste("%", max(nchar(dfnames)), "s %s", sep = "")
for (id in IDs) {
cat(sprintf("ID = %d\n", id))
for (nm in dfnames) {
df <- dfs[[nm]][ dfs[[nm]]$ID == id, names(dfs[[nm]]) != "ID", drop =FALSE]
cat(paste(sprintf(fmt, c(nm, ""),
capture.output(print(df, row.names = FALSE))),
collapse = "\n"), "\n")
}
}
}
Execution. Though this is showing just two data.frames, you can provide an arbitrary number of data.frames (and in your preferred order) in the function arguments. It assumes you are providing them as direct variables and not subsetting within the function call ... you'll understand if you try it.
func(df1, df3)
# ID = 1
# df1 age
# 34
# df3 class type
# a 1
# ID = 2
# df1 age
# 33
# df3 class type
# b 1
# ID = 5
# df1 age
# 34
# df3 class type
# a 2
# ID = 7
# df1 age
# 35
# df3 class type
# b 3
# ID = 43
# df1 age
# 32
# df3 class type
# b 2
# ID = 76
# df1 age
# 33
# df3 class type
# a 3
(Personally, I can't imagine providing output in this format, but I don't know your tastes or use-case. There are many many other ways to show data like this. Like:
Reduce(function(x,y) merge(x, y, by = "ID"), list(df1, df2, df3))
# ID age height class type
# 1 1 34 178 a 1
# 2 2 33 176 b 1
# 3 5 34 166 a 2
# 4 7 35 159 b 3
# 5 43 32 180 b 2
# 6 76 33 178 a 3
It's much more concise. But, then again, I'm also assuming that you want to show them all at once instead of "show one, talk about it, then show another one, talk about it ...".)
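As for why the function in the question does not produce the expected output: `n` is not defined inside it, the loop runs over 1 to n rather than over the actual IDs, `'i'` in quotes is the literal string "i" rather than the loop variable (so no ID ever matches), and `return(data)` only hands back the last iteration. A minimal corrected sketch for a single data frame, keeping the name `listing` but adding an `ids` argument, which is an assumption and not part of the question:

```r
df1 <- data.frame(ID = c(1, 2, 5, 7, 43, 76), age = c(34, 33, 34, 35, 32, 33))

# Corrected sketch: iterate over the IDs themselves and compare against the
# variable `id`, not the string 'i'; collect every piece instead of the last one
listing <- function(x, ids = unique(x$ID)) {
  out <- lapply(ids, function(id) x[x$ID == id, names(x) != "ID", drop = FALSE])
  names(out) <- paste("ID =", ids)
  out
}

res <- listing(df1)
res[["ID = 5"]]
```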
Why not do a merge by ID?
df_1 <- merge(df1, df2, by = 'ID')
df_final <- merge(df_1, df3, by = 'ID')
or by using dplyr:
library(dplyr)
full_join(df1, df2, by = "ID") %>% full_join(df3, by = "ID")
I have two data frames:
df1
vehicle speed time
a 23 234
b 34 421
d 45 290
df2
vehicle speed time
a 29 215
b 54 450
c 45 21
f 40 367
Both vehicle columns are factors. I want to find the common vehicles and add the corresponding df2$time to df1, name it as time.2.
The output I want:
df1
vehicle speed time time.2
a 23 234 215
b 34 421 450
I tried:
df1 <- df1[df1$vehicle %in% df2$vehicle, ]
df2 <- df2[df2$vehicle %in% df1$vehicle, ]
df1 <- cbind(df1, time.2 = df2$time)
But after the first two commands, both df1 and df2 have 0 rows. When I previously compared the vehicle column of df1 against another data frame, it worked. I don't know why it doesn't work with df2.
Thanks!
Try:
library(dplyr)
inner_join(df1,
df2 %>%
select(-speed) %>%
rename(time.2 = time),
by = "vehicle")
Use the merge() function:
df1$vehicle <- as.character(df1$vehicle)
df2$vehicle <- as.character(df2$vehicle)
df <- merge(df1, df2, by="vehicle")
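# merge() keeps the key column as "vehicle"; only the clashing columns get .x/.y suffixes
df <- df[, c("vehicle", "speed.x", "time.x", "time.y")]
names(df) <- c("vehicle", "speed", "time", "time.2")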
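If the goal is only to pull `time` across from df2, `match()` is another option. This is a sketch under an assumption the question does not confirm: that the 0-row result comes from the vehicle values not comparing equal as stored (for example stray whitespace), hence the `trimws(as.character(...))` step. The helper names `v1`, `v2` and `idx` are illustrative.

```r
df1 <- data.frame(vehicle = factor(c("a", "b", "d")),
                  speed = c(23, 34, 45), time = c(234, 421, 290))
df2 <- data.frame(vehicle = factor(c("a", "b", "c", "f")),
                  speed = c(29, 54, 45, 40), time = c(215, 450, 21, 367))

# Compare plain, trimmed strings rather than factors
v1 <- trimws(as.character(df1$vehicle))
v2 <- trimws(as.character(df2$vehicle))

# Position of each df1 vehicle in df2 (NA when it is absent)
idx <- match(v1, v2)

# Look up the matching df2 time, then keep only the vehicles found in both
df1$time.2 <- df2$time[idx]
df1 <- df1[!is.na(idx), ]
df1
```

Inspecting `v1` and `v2` directly (for instance with `unique()`) is also a quick way to see why the `%in%` filter returned nothing.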