Turn the dataset into a script, creating the dataset - r

Is there a way to turn a data frame (imported from Excel) into code that creates this data frame?
I have a data frame that I read in from Excel, and I'd like to get code like this from it:
abc <- data.frame(a = c("a", "aa", "r"),
b = c(1, 2, 3),
c = c(T, F, T))
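One option, sketched here in base R, is dput(), which prints an R expression that rebuilds an object. The output is usually a structure(list(...)) call rather than a literal data.frame(...) call, but pasting it into a script recreates the same data frame. The round-trip check at the end (the names code and abc_copy) is only an illustration and not part of the question.

```r
abc <- data.frame(a = c("a", "aa", "r"),
                  b = c(1, 2, 3),
                  c = c(TRUE, FALSE, TRUE))

# dput() prints code that rebuilds the object; copy it into your script.
dput(abc)

# Illustration only: capture that code as text and evaluate it again
# to confirm it recreates the same data frame.
code <- paste(capture.output(dput(abc)), collapse = "\n")
abc_copy <- eval(parse(text = code))
identical(abc, abc_copy)
```

For a data frame read from Excel, the same dput() call should apply, although very large tables produce correspondingly long output.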

Related

For loop over the same variable in multiple datasets

I have multiple datasets and would like to create a contingency table for the same variable in each of them. I am attempting to write a for loop over these datasets, but am having difficulty accessing the necessary variable. Here's a fake set-up to illustrate my issue:
data1 <- data.frame(name = c("A", "B", "C"),
value1 = c(1, 2, 2),
value2 = c(1, 3, 7))
data2 <- data.frame(name = c("D", "E", "F"),
value1 = c(3, 4, 3),
value2 = c(8, 2, 1))
datasets <- c("data1", "data2")
If I manually execute table(data1$value1) then I receive a result. However, if I try something like the following:
for (i in seq_along(datasets)) {
variable <- datasets[[i]]$value1
table(variable)
}
then R throws an error message "Error: $ operator is invalid for atomic vectors." Given this, what is the best way to achieve my initial aim?
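A minimal sketch of one way around this, assuming the data frames can be collected in a list: datasets above holds the names as character strings, so datasets[[i]] is just the text "data1", and $ cannot be applied to it. Storing the data frames themselves in a named list lets lapply() build one table per dataset. The name tables is chosen here for illustration.

```r
data1 <- data.frame(name = c("A", "B", "C"),
                    value1 = c(1, 2, 2),
                    value2 = c(1, 3, 7))
data2 <- data.frame(name = c("D", "E", "F"),
                    value1 = c(3, 4, 3),
                    value2 = c(8, 2, 1))

# Store the data frames themselves, not their names.
# (If only the names are available, mget(c("data1", "data2")) builds
# a similar named list from objects in the current environment.)
datasets <- list(data1 = data1, data2 = data2)

# One contingency table of value1 per dataset.
tables <- lapply(datasets, function(d) table(d$value1))

# Inside a for loop results are not auto-printed, so print explicitly.
for (nm in names(tables)) {
  cat(nm, "\n")
  print(tables[[nm]])
}
```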

How to read and merge only the second sheet from a number of excel files (xlsm) in R?

I have a working directory with a large number of xlsm files (600ish). I need to merge all of these files into one dataframe, but ONLY the second sheet of the excel file. Since there are a lot of files, ideally I would use a loop, but I'm struggling with how to do this. Right now I have this code, which is obviously not working. Any thoughts on how to best do this would be greatly appreciated.
library(readxl)
library(tidyverse)
data.files = list.files(pattern = "*.xlsm")
data_to_merge <- lapply(data.files, read_excel(x, sheet = 2))
combined_df <- bind_rows(data_to_merge)
Not sure how to include examples of the data so it's easily reproducible since my question is dealing with excel sheets, not data that's already in r, but if this is useful, all of the 2nd sheets have the same simple structure that looks something like this:
data1 <- data.frame(id = 1:6,
x1 = c(5, 1, 4, 9, 1, 2),
x2 = c("A", "Y", "G", "F", "G", "Y"))
data2 <- data.frame(id = 4:9,
y1 = c(3, 3, 4, 1, 2, 9),
y2 = c("a", "x", "a", "x", "a", "x"))
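You were close. You just need to slightly alter your lapply statement so that the function and its argument are separated by a comma: lapply() expects the function itself (read_excel), and extra arguments such as sheet = 2 are passed along to it.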
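library(readxl)
library(tidyverse)
# pattern is a regular expression, so match the .xlsm extension explicitly
data.files = list.files(pattern = "\\.xlsm$")
# read the second sheet of every file; sheet = 2 is passed on to read_excel()
data_to_merge <- lapply(data.files, read_excel, sheet = 2)
# stack the resulting data frames into one
combined_df <- bind_rows(data_to_merge)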
Or a more tidyverse approach:
combined_df <- list.files(pattern = "\\.xlsm$") %>%
map(~ read_excel(.x, sheet = 2)) %>%
bind_rows()

Change values of a column according to values of another

I am new to R and I have a question. I have two data frames, and I want to change the values of a column in the second data frame based on the values of a column in the first data frame. Both columns are strings and contain 4 numbers separated by hyphens (-). Here is an example,
So, based on this example, column b of Table 2 should change as follows: if the first and last numbers of a set are equal to those of a set in Table 1, replace the value with the one from Table 1. Also, if a cell in column b of Table 2 has first and last numbers that do not exist in Table 1, delete that row (in this example: 2-201-2012-250).
Thank you
Is this what you're looking for?
library(stringr) #for str_split()
library(dplyr) #for left_join()
my_df <- data.frame("a" = c(1, 2, 3, 4),
"b" = c("7-1-1-100", "7-1-1-12", "31-1-1-5", "31-1-1-8"),
"c" = c(0, 0, 0, 0), stringsAsFactors = FALSE)
my_df2 <- data.frame("a" = c(1, 2, 3, 4, 5),
"b" = c("7-1-1-100", "7-1-1-12", "2-1-1-250", "31-1-1-5", "31-1-1-8"),
"c" = c("ABC", "ABCD", "AD", "ABV", "CDF"), stringsAsFactors = FALSE)
my_var <- str_split(string = my_df$b, pattern = "-", n = 4, simplify = TRUE)
my_var2 <- str_split(string = my_df2$b, pattern = "-", n = 4, simplify = TRUE)
my_df$d <- paste(my_var[, 1], my_var[, 4], sep = "-")
my_df2$d <- paste(my_var2[, 1], my_var2[, 4], sep = "-")
my_df <- left_join(my_df[, c("a", "b", "d")], my_df2[, c("d", "c")], by = "d")
my_df <- my_df[, c("a", "b", "c")]

Grouping to form more than one comma-separated columns in data.table

Problem: I basically want to group data based on the data.table syntax and in parallel create two or more columns which contain comma-separated values (as in the example below).
Approach: I thought about an lapply where I can provide a list of columns which I want to comma-separate; however this did not turn out as expected.
Any suggestions?
EDIT
I am somehow looking for an approach where I only have to provide a list/vector of columns and then apply the function on this list (similar to the not-working lapply approach)
library(data.table)
dt <- data.table(
x = c(1, 1, 1, 3, 3, 2),
y = c("AA", "BB", "CC", "BB", "EE", "AA"),
z = c("H", "A", "C", "Z", "F", "G")
)
## Attempts
dt[, paste0(y, collapse = ","), by = .(x)]
dt[, lapply(c("y", "z"), paste0, collapse = ","), by = x]
## Desired Output
   x        y        z
1: 1 AA,BB,CC  H, A, C
2: 3    BB,EE     Z, F
3: 2       AA        G
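library(data.table)
# Option 1: collapse every character column per group; toString() joins with ", "
dt[, lapply(.SD, toString), by = x, .SDcols = names(dt)[sapply(dt, is.character)]]
# Option 2: name each output column explicitly, keeping only unique values per group
dt_sum <- dt[, .(yy = toString(unique(y)), zz = toString(unique(z))), by = c("x")]
dt_sum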
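For the case described in the EDIT, where only a vector of column names should be supplied, a sketch along the following lines may work. It passes the names through .SDcols and uses paste() with collapse = "," so the separator has no trailing space. The names cols and res are hypothetical, introduced here for illustration.

```r
library(data.table)

dt <- data.table(
  x = c(1, 1, 1, 3, 3, 2),
  y = c("AA", "BB", "CC", "BB", "EE", "AA"),
  z = c("H", "A", "C", "Z", "F", "G")
)

# Columns to collapse, supplied as a plain character vector.
cols <- c("y", "z")

# .SD holds only the columns named in .SDcols; paste() collapses each
# of them into a single comma-separated string per group of x.
res <- dt[, lapply(.SD, paste, collapse = ","), by = x, .SDcols = cols]
res
```

The lapply(c("y", "z"), ...) attempt above pastes the column names themselves rather than the column contents, which is why going through .SD is used here instead.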

How to concatenate multiple columns with a comma between them

I have the following data frame in r
ID  COL.1  COL.2  COL.3  COL.4
1   a      b
2   v      b      b
3   x      a      n      h
4   t
I am new to R and I don't understand how to work with the data frame in order to get this at the end:
stream <- c("1,a,b","2,v,b,b","3,x,a,n,h","4,t")
Another problem is that I have more than 100 columns.
Try this
Reduce(function(...)paste(...,sep=","), df)
Where df is your data.frame
This might be what you're looking for, even though it's not elegant.
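# Use NA (not NULL) for empty cells: c() drops NULL, which would give
# columns of unequal length and make data.frame() fail.
my_df <- data.frame(ID = seq(1, 4, by = 1),
COL.1 = c("a", "v", "x", "t"),
COL.2 = c("b", "b", "a", NA),
COL.3 = c(NA, "b", "n", NA),
COL.4 = c(NA, NA, "h", NA))
# Paste the columns row by row; empty cells appear as the text "NA".
# The ID is kept so the result matches the desired output.
stream <- paste(my_df$ID,
my_df$COL.1,
my_df$COL.2,
my_df$COL.3,
my_df$COL.4,
sep =",")
# Remove the "NA" pieces left behind by the empty cells.
stream <- gsub(",NA", "", stream)
stream <- gsub("NA,", "", stream)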
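With more than 100 columns, naming each one in paste() becomes impractical. A minimal sketch that avoids this, assuming empty cells are stored as NA, builds each row's string from whatever values are present. The example data and the name stream mirror the question; the approach itself is only one of several possibilities.

```r
my_df <- data.frame(ID = 1:4,
                    COL.1 = c("a", "v", "x", "t"),
                    COL.2 = c("b", "b", "a", NA),
                    COL.3 = c(NA, "b", "n", NA),
                    COL.4 = c(NA, NA, "h", NA))

# For every row: convert each cell to character, drop the NA cells,
# and join what is left with commas. No column names are hard-coded.
stream <- vapply(seq_len(nrow(my_df)), function(i) {
  vals <- unlist(lapply(my_df[i, ], as.character))
  paste(vals[!is.na(vals)], collapse = ",")
}, character(1))

stream
```

Dropping NA values before pasting also avoids the gsub() clean-up, which could otherwise alter real values that happen to start with "NA".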
