Transposing a CSV in R so each row contains one data point - r

I am trying to manipulate a CSV in R to match a very specific formatting need. Pretty confident I can nest a few loops to write to a file, but I'm hoping there's an easier way in R.
Before:
1,2,400,410,420,430,450,490,75700,75701,77035,77035,77035,77035
*,Facility Name,1234 Test Street,Michigan,49503,123-456-7891,,MPI_ID_TYPE,1,Sober Living Community,Clothing,Diapers,Food Pantry
After:
1,*
2,Facility Name
400,123 Test Street
410,TestCity
420,TestState
430,12345
450,123-456-7891
75700,MPI_ID_TYPE
75701,1
77035,Sober Living Community
77035,Clothing
77035,Diapers
77035,Food Pantry
So far, I have ingested data and manipulated it to achieve the Before chunk.

You can use t() to transpose data. Note that t() returns a matrix, not a data frame.
df <- mtcars[,1:2]
df2 <- t(df)
df3 <- t(df2)
To further address your question I am keeping one row in the mtcars dataset. Then I use pivot_longer from tidyr to transpose and keep the original headers as a column in the data.
library(tidyr)
df <- mtcars[1,]
df2 <- pivot_longer(df, cols = everything())
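Applied to the two-line file from the question, the t() idea might look like the sketch below. The sample strings are a shortened, made-up stand-in for the "Before" data, and dropping pairs with an empty value is an assumption based on the "After" listing, which has no line for code 490.

```r
# Hypothetical stand-in for the two-line "Before" file (shortened)
before <- c("1,2,400,490,77035,77035",
            "*,Facility Name,1234 Test Street,,Clothing,Diapers")

# Read without a header so the codes stay data, and keep every field as text
wide <- read.csv(text = before, header = FALSE, colClasses = "character")

# t() turns the 2 x n table into an n x 2 character matrix (code, value)
long <- t(wide)

# Assumption: pairs with an empty value are not wanted in the output
long <- long[long[, 2] != "", , drop = FALSE]

# Write one "code,value" pair per line, without quotes or headers
out <- tempfile(fileext = ".csv")
write.table(long, out, sep = ",", quote = FALSE,
            row.names = FALSE, col.names = FALSE)
readLines(out)
```

With a real file you would pass its path to read.csv() instead of the text argument.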

Related

Adding a column of a dataframe to another dataframe if they match in another column

For a university project, I'm working with large data frames of stock prices.
I have two data frames.
Data frame df1 contains the daily closing prices over a certain period. The header contains each stock's abbreviation.
Data frame df2 contains the stock's abbreviation in the first column and the industry name of the stock's firm in the second column. Importantly, df2 contains more values than df1 (but every value in df1 should also be in df2).
Is there a way to insert the second column of df2 into the first row of df1 where they match (i.e. where a df1 header equals a value in the first column of df2)?
# Example Code
df1=as.data.frame(matrix(runif(20,min=0,max=1), nrow = 4))
df1
df2 <- as.data.frame(c("V1","V829","V2","V3","V493","V4","V5","V6","V992","V7"))
df2$insert <- c("test1","test2","test3","test4","test5","test6","test7","test8","test9","test10")
names(df2) <- c("Column2","test")
df1
df2
# Now insert/combine df2$test in (or over) df1[1,] as a row, if names(df1) and df2$Column2 matches
[image: DataFrame df1]
[image: DataFrame df2]
Thank you for your answers guys!
Nino
I would recommend you reshape your df1 into long format (see Reshaping data.frame from wide to long format).
library(tidyr)
library(dplyr)  # provides left_join()
df1_long <- df1 %>% gather(Instrument, value, -X)
I would organize the data this way because it makes it easier to use left_join() to match the data frames (see the description of mutating joins on the data wrangling cheat sheet).
df <- left_join(df1_long, df2, by = "Instrument")
If you want you can then make your dataframe wide again using the spread() function, which is the reverse of gather().
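As a rough sketch of this using the example df1 and df2 from the question: the day column and the names Instrument and price are made up for illustration, and pivot_longer() is used in place of gather(), which tidyr has superseded. The last line shows a base R lookup with match() that returns the df2 labels in the order of df1's columns.

```r
library(dplyr)
library(tidyr)

# Example data as in the question
df1 <- as.data.frame(matrix(runif(20, min = 0, max = 1), nrow = 4))  # columns V1..V5
df2 <- data.frame(
  Column2 = c("V1", "V829", "V2", "V3", "V493", "V4", "V5", "V6", "V992", "V7"),
  test    = paste0("test", 1:10)
)

# Long format: one row per (day, instrument); "day" is a hypothetical row id
df1_long <- df1 %>%
  mutate(day = row_number()) %>%
  pivot_longer(-day, names_to = "Instrument", values_to = "price")

# Match the instrument names against the first column of df2
joined <- left_join(df1_long, df2, by = c("Instrument" = "Column2"))

# Base R alternative: one label per column of df1, in column order
labels <- df2$test[match(names(df1), df2$Column2)]
labels
```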
For the future I recommend you generate a reproducible example, rather than linking image files of your dataframes, as the links might expire, and it makes it generally less likely to get an answer on Stack Overflow.

Need help organizing and summarizing column data into R Markdown

Sorry if this is an easy question, but I have a problem.
I have a .csv file imported into RStudio. The picture linked below is an example of what it looks like. I want to create individual data frames for each type (BMW, Mercedes, Honda) and then compute summary statistics for each subsetted data frame.
[image: example]
I am so lost that I can't even figure out a correct title for this question. Any help would be appreciated.
Creating a separate data.frame for each type can be done with the split() function. You can then calculate summary statistics for each data.frame by using lapply() on the list of data frames.
split_dfs <- split(your_data, your_data$type)
summary_stats <- lapply(split_dfs, function(x) {
  data.frame(
    mean_price = mean(x$price)
  )
})
A more modern approach is not to create separate data.frames but to use a grouped data.frame, with group_by() and summarise() from the dplyr package.
library(tidyverse)
your_data %>%
  group_by(type) %>%
  summarise(
    mean_price = mean(price)
  )
Another library that makes the computation easier and, most of all, faster for large datasets with many groups is data.table. The computation would look something like this:
library(data.table)
your_dt <- as.data.table(your_data)
summary_stats <- your_dt[, .(mean_price=mean(price)), by="type"]
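Since the original data was only posted as a picture, here is a small sketch on made-up data. The column names type and price are assumptions, as in the snippets above. It runs the split() and the dplyr versions side by side and adds a couple of extra statistics.

```r
library(dplyr)

# Hypothetical data standing in for the imported .csv
your_data <- data.frame(
  type  = c("BMW", "BMW", "Mercedes", "Honda", "Honda"),
  price = c(40000, 50000, 60000, 20000, 30000)
)

# Base R: one data frame per type, then one mean per data frame
split_dfs  <- split(your_data, your_data$type)
base_stats <- sapply(split_dfs, function(x) mean(x$price))

# dplyr: grouped summary with several statistics at once
dplyr_stats <- your_data %>%
  group_by(type) %>%
  summarise(
    mean_price = mean(price),
    max_price  = max(price),
    n          = n()
  )

base_stats
dplyr_stats
```

Both versions order the groups alphabetically by type, so the results line up row by row.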

Need Help Incorporating Tidyr's Spread into a Function that Outputs a List of Dataframes with Grouped Counts

library(tidyverse)
Using the sample data at the bottom, I want to find counts of the Gender and FP variables, then spread these variables using tidyr::spread(). I'm attempting to do this by creating a list of dataframes, one for the Gender counts, and one for FP counts. The reason I'm doing this is to eventually cbind both dataframes. However, I'm having trouble incorporating the tidyr::spread into my function.
The function below creates a list of two dataframes with counts for Gender and FP, but the counts are not "spread."
group_by_quo <- quos(Gender, FP)
DF2 <- map(group_by_quo, ~ DF %>%
  group_by(Code, !!.x) %>%
  summarise(n = n()))
If I add tidyr::spread, it doesn't work. I'm not sure how to incorporate this since each dataframe in the list has a different variable.
group_by_quo <- quos(Gender, FP)
DF2 <- map(group_by_quo, ~ DF %>%
  group_by(Code, !!.x) %>%
  summarise(n = n())) %>%
  spread(!!.x, n)
Any help would be appreciated!
Sample Code:
Subject<-c("Subject1","Subject2","Subject1","Subject3","Subject3","Subject4","Subject2","Subject1","Subject2","Subject4","Subject3","Subject4")
Code<-c("AAA","BBB","AAA","CCC","CCC","DDD","BBB","AAA","BBB","DDD","CCC","DDD")
Code2<-c("AAA2","BBB2","AAA2","CCC2","CCC2","DDD2","BBB2","AAA2","BBB2","DDD2","CCC2","DDD2")
Gender<-c("Male","Male","Female","Male","Female","Female","Female","Male","Male","Male","Male","Male")
FP<-c("F","P","P","P","F","F","F","F","F","F","F","F")
DF<-data_frame(Subject,Code,Code2,Gender,FP)
I think you misplaced the closing parenthesis. This code works for me:
library(tidyverse)
Subject<-c("Subject1","Subject2","Subject1","Subject3","Subject3","Subject4","Subject2","Subject1","Subject2","Subject4","Subject3","Subject4")
Code<-c("AAA","BBB","AAA","CCC","CCC","DDD","BBB","AAA","BBB","DDD","CCC","DDD")
Code2<-c("AAA2","BBB2","AAA2","CCC2","CCC2","DDD2","BBB2","AAA2","BBB2","DDD2","CCC2","DDD2")
Gender<-c("Male","Male","Female","Male","Female","Female","Female","Male","Male","Male","Male","Male")
FP<-c("F","P","P","P","F","F","F","F","F","F","F","F")
DF<-data_frame(Subject,Code,Code2,Gender,FP)
group_by_quo <- quos(Gender, FP)
DF2 <- map(group_by_quo,
           ~ DF %>%
             group_by(Code, !!.x) %>%
             summarise(n = n()) %>%
             spread(!!.x, n))
This last part is a bit more concise using count:
DF2 <- map(group_by_quo,
           ~ DF %>%
             count(Code, !!.x) %>%
             spread(!!.x, n))
And by using count the unnecessary grouping information is removed as well.
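For the eventual combining step, a possible sketch is below. It departs from the code above in three ways, all of them my own choices rather than part of the original answer: the variables are passed as strings, pivot_wider() replaces spread() (which tidyr has superseded), and the two tables are joined on Code instead of cbind-ed so the rows cannot get misaligned.

```r
library(dplyr)
library(tidyr)
library(purrr)

# Subset of the sample data from the question
DF <- tibble(
  Code   = c("AAA", "BBB", "AAA", "CCC", "CCC", "DDD", "BBB", "AAA", "BBB", "DDD", "CCC", "DDD"),
  Gender = c("Male", "Male", "Female", "Male", "Female", "Female", "Female", "Male", "Male", "Male", "Male", "Male"),
  FP     = c("F", "P", "P", "P", "F", "F", "F", "F", "F", "F", "F", "F")
)

group_vars <- c("Gender", "FP")

# One wide count table per variable; combinations that never occur become 0
counts <- map(group_vars, function(v) {
  DF %>%
    count(Code, !!rlang::sym(v)) %>%
    pivot_wider(names_from = all_of(v), values_from = n, values_fill = 0)
})

# Join on Code rather than cbind, so rows are matched by key
combined <- reduce(counts, left_join, by = "Code")
combined
```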

Expand each row in R dataframe with multiple rows

I need a data frame that maps the names of files matching a pattern to each line in those files. My problem is that I am unable to generate multiple rows for each row; the data frame should grow in both columns and rows, expanded per row. What I need is basically a left outer join, but I am struggling with the syntax.
library(dplyr)
app.lsts <- data.frame(
  file = list.files(path = '.', pattern = 'app.lst', recursive = TRUE)
) %>%
  mutate(command = paste0('cat ', file)) %>%
  mutate(packages = system(command, intern = TRUE))
The last mutate does not work because packages is a list of lines. How do I "unwrap" these?
First, some working (but not very good) code:
library(tidyverse)
# fnames must exist before it is used in setNames()
fnames <- list.files(path = '.', pattern = '\\.foo$', recursive = TRUE)
out_df <-
  fnames %>%
  map(readLines) %>%
  setNames(fnames) %>%
  t %>%
  as.data.frame %>%
  gather(fname, lines) %>%
  unnest(lines)
out_df
This is a tidyverse-style command to generate the data that I think you want. Since I don't have your input files, I made up these sample files:
contents of f1.foo
line_1_f1
line_2_f1
contents of f2.foo
line_1_f2
line_2_f2
line_3_f2
Changes relative to your approach:
1. Avoid using the name of the built-in function file() as a column name. I used fname instead.
2. Don't use system() to read the files; there are built-in R functions for that. Relying on system() needlessly makes your code less likely to work on other operating systems.
3. Build the data frame after all the data has been read into R, not before. Because of the way non-standard evaluation works in dplyr, it's hard to use readLines(...) inside a mutate() where the file connection to be read varies.
4. Use purrr::map() to generate a list of file content lines from a list of file names. This is a tidyverse way of writing a for loop.
5. Set the names of the list elements with setNames().
6. Munge this list into a data.frame using t() and as.data.frame().
7. Tidy the data with gather() to collapse the data frame that has one column per file into a data frame with one file per row.
8. Expand the list using unnest().
I don't think this approach is very pretty, but it works. Another approach that avoids the ugly steps 5 and 6 is a for loop.
fnames <- list.files(path = '.', pattern = '\\.foo$', recursive = TRUE)
out_df <- data.frame()
for (fname in fnames) {
  # readLines() opens and closes the file itself when given a path
  fcontents <- readLines(fname)
  this_df <- data.frame(fname = fname, lines = fcontents)
  out_df <- bind_rows(out_df, this_df)
}
The output in either case is
   fname     lines
1 f1.foo line_1_f1
2 f1.foo line_2_f1
3 f2.foo line_1_f2
4 f2.foo line_2_f2
5 f2.foo line_3_f2
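A third variant, which is not part of the original answer, is to put the file contents into a list column and unnest it directly. The sketch below writes the two sample files into a temporary directory (the directory name foo_demo is made up) so that it can be run anywhere.

```r
library(dplyr)
library(purrr)
library(tidyr)

# Recreate the sample files in a temporary directory (hypothetical location)
demo_dir <- file.path(tempdir(), "foo_demo")
dir.create(demo_dir, showWarnings = FALSE)
writeLines(c("line_1_f1", "line_2_f1"), file.path(demo_dir, "f1.foo"))
writeLines(c("line_1_f2", "line_2_f2", "line_3_f2"), file.path(demo_dir, "f2.foo"))

fnames <- list.files(demo_dir, pattern = "\\.foo$", full.names = TRUE)

# One row per file with a list column of lines, then one row per line
out_df <- tibble(
  fname = basename(fnames),
  lines = map(fnames, readLines)
) %>%
  unnest(lines)

out_df
```

This avoids both the t()/as.data.frame() steps and growing a data frame inside a loop.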

What is the best way to transpose a data.frame in R and to set one of the columns to be the header for the new transposed table?

What is the best way to transpose a data.frame in R and to set one of the columns to be the header for the new transposed table? I have coded up a way to do this below. As I am still new to R, I would like suggestions to improve my code as well as alternatives that would be more R-like. My solution is also unfortunately a bit hard-coded (i.e. the new column headings are in a certain location).
# Assume a data.frame called fooData
# Assume the column is the first column before transposing
# Transpose table
fooData.T <- t(fooData)
# Set the column headings
colnames(fooData.T) <- fooData.T[1, ]
# Get rid of the column heading row
fooData.T <- fooData.T[2:nrow(fooData.T), ]
#fooData.T now contains a transposed table with the first column as headings
Well you could do it in 2 steps by using
# Transpose table YOU WANT
fooData.T <- t(fooData[,2:ncol(fooData)])
# Set the column headings from the first column in the original table
colnames(fooData.T) <- fooData[,1]
The result is a matrix, as you're probably aware: t() coerces a data frame to a matrix, so columns of mixed types all end up as character. I don't think there is a single-line way to do this, given that the transpose step offers no way to set names.
You can do it even in one line:
fooData.T <- setNames(data.frame(t(fooData[,-1])), fooData[,1])
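To see what the one-liner produces, here is a small run on a made-up fooData (the column names name, x and y are only for illustration):

```r
# Hypothetical input: the first column holds the future column headings
fooData <- data.frame(
  name = c("a", "b", "c"),
  x    = c(1, 2, 3),
  y    = c(4, 5, 6)
)

# Transpose everything except the first column, then use it as the header
fooData.T <- setNames(data.frame(t(fooData[, -1])), fooData[, 1])

fooData.T
str(fooData.T)
```

Because the heading column is dropped before t() is called, the remaining numeric columns stay numeric, and the old column names x and y become the row names.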
There are already great answers. However, this answer might be useful for those who prefer brevity in code.
Here are my two cents using dplyr for a data.frame that has grouping columns and an id column.
library(dplyr)
id_transpose <- function(df, id) {
  df %>%
    ungroup() %>%
    select(where(is.numeric)) %>%
    t() %>%
    as_tibble() %>%
    setNames(., df %>% pull({{ id }}))
}
Here is another tidyverse/dplyr approach taken from here.
mtcars %>%
  tibble::rownames_to_column() %>%
  tidyr::pivot_longer(-rowname) %>%
  tidyr::pivot_wider(names_from = rowname, values_from = value)
Use transpose() from data.table. Suppose the column you want to use as the header after transposing is the variable group (note that the argument is spelled make.names):
fooData.transpose <- fooData %>% data.table::transpose(make.names = "group")
In addition, if you want to name the column that holds the original column names, use the keep.names argument:
fooData.transpose <- fooData %>% data.table::transpose(make.names = "group", keep.names = "column_name")
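A quick check of that call on made-up data (group, x and y are hypothetical column names), written without the pipe so it needs only data.table:

```r
library(data.table)

# Hypothetical input: "group" holds the future column headings
fooData <- data.frame(
  group = c("a", "b", "c"),
  x     = c(1, 2, 3),
  y     = c(4, 5, 6)
)

# make.names picks the header column; keep.names names the column
# that stores the original column names
fooData.transpose <- transpose(fooData,
                               make.names = "group",
                               keep.names = "column_name")
fooData.transpose
```

If purrr is also attached, calling data.table::transpose() explicitly avoids picking up purrr::transpose() by accident.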
There's now a dedicated function to transpose data frames, rotate_df from the sjmisc package. If the desired names are in the first column of the original df, you can achieve this in one line thanks to the cn argument.
Here's an example data frame:
df <- data.frame(name = c("Mary", "John", "Louise"), class = c("A", "A", "B"), score = c(40, 75, 80))
df
#    name class score
#1   Mary     A    40
#2   John     A    75
#3 Louise     B    80
Executing the function with cn = TRUE:
library(sjmisc)
rotate_df(df, cn = TRUE)
#      Mary John Louise
#class    A    A      B
#score   40   75     80
I had a similar problem: I had a variable of factors in long format and wanted each factor level to become a new column heading. unstack() (in the utils package, which ships with R) did it in one step. If the column you want as a header isn't a factor, cast() from the reshape package might work.
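A minimal illustration of the unstack() route, on made-up long data where the names values and ind are just placeholders:

```r
# Hypothetical long-format data: one value per row, labelled by a factor
long <- data.frame(
  values = c(1, 2, 3, 4, 5, 6),
  ind    = factor(c("a", "a", "b", "b", "c", "c"))
)

# Each factor level becomes its own column
wide <- unstack(long, values ~ ind)
wide
```

This gives a data frame only when every level has the same number of values; otherwise unstack() returns a list.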

Resources