I'm currently using group_by() and summarise() to get the sums of my columns. Is it possible to save that result to another data frame somehow? Maybe even create a CSV file from it. Thanks
workday %>%
  group_by(Date) %>%
  mutate_if(is.character, as.numeric) %>%
  summarise(across(Axis1:New_Sitting, sum))
Store the pipe result in a new object, say a:
a <- workday %>%
  group_by(Date) %>%
  mutate_if(is.character, as.numeric) %>%
  summarise(across(Axis1:New_Sitting, sum))
To save it to a file there are several options, including base R's write.csv():
write.csv(a, "path/to/file.csv", row.names = FALSE)  # row.names = FALSE drops the row-number column
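Since you are already using the tidyverse, readr::write_csv() is another of those options; it skips row names by default. A minimal sketch (the path is a placeholder):
library(readr)
write_csv(a, "path/to/file.csv")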
I have a data frame with a column that contains JSON values. I found the tidyjson library, which helps to extract this JSON; however, it always extracts into a new data frame.
I am looking for a way to replace the JSON in the original data frame with the result from tidyjson.
Code:
library(dplyr)    # for the pipe
library(tidyjson)

mydf <- df$response %>%
  as.tbl_json %>%
  gather_array %>%
  spread_values(text = jstring('text'))
Is there a way to replace df$response with the extracted JSON "text" value?
Thanks in advance!
This solution worked for me:
library(dplyr)
library(tidyjson)

df %>%
  as.tbl_json(json.column = 'response') %>%  # treat the response column as JSON
  gather_array %>%
  spread_values(response = jstring('text'))  # write the extracted text back as response
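For instance, with toy data (the data frame and its values here are made up purely for illustration):

df <- data.frame(
  id = 1:2,
  response = c('[{"text":"hello"}]', '[{"text":"world"}]'),
  stringsAsFactors = FALSE
)
df %>%
  as.tbl_json(json.column = 'response') %>%
  gather_array %>%
  spread_values(response = jstring('text'))

This keeps the other columns of df (here id) alongside the extracted response value.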
I am a beginner working with R, especially with JSON files, and this is probably a simple question, but I have been unsuccessful for a while.
Here is a sample row of data from a provided text file (there are ~4000 rows):
{"040070005001":4,"040070005003":4,"040138101003":4,"040130718024":4}
Each row has a variable number of values in the string.
I am trying to use a loop, but it only captures the last row of the data set rather than the data from each row:
for (row in 1:nrow(origins)) {
  json <- origins$home_cbgs[row] %>%
    fromJSON() %>%
    unlist() %>%
    as.data.frame() %>%
    rownames_to_column() %>%
    rename(
      origin_census_block_group = "rowname",
      origin_visitors = "."
    )
}
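The loop assigns every iteration's result to the same object, json, so only the final row survives. A sketch of one fix (untested against the real file; it assumes origins$home_cbgs holds one JSON string per row, as in the sample above) builds the per-row data frames and row-binds them:

library(dplyr)     # rename, %>%
library(jsonlite)  # fromJSON
library(tibble)    # rownames_to_column
library(purrr)     # map_dfr

# map_dfr() applies the function to each JSON string and binds the
# resulting per-row data frames into one data frame.
result <- map_dfr(origins$home_cbgs, function(x) {
  fromJSON(x) %>%
    unlist() %>%
    as.data.frame() %>%
    rownames_to_column() %>%
    rename(
      origin_census_block_group = "rowname",
      origin_visitors = "."
    )
})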
I have a csv file of Facebook data with around 190,000 rows. The column names are the following:
comment_id, status_id, parent_id, comment_message, comment_author, comment_published, comment_likes, Positive, Negative, Sentiment
I want to find out which comment_author has the most comments (count of comment_message) with a Sentiment > 0.
Does anybody know how to apply this filter using R?
If df is your data frame, you can use the dplyr package as follows:
library(dplyr)

df %>%
  group_by(comment_author, Sentiment) %>%
  summarize(total_number_comment = n()) %>%  # comment_message is text, so count rows rather than sum them
  as.data.frame() %>%
  arrange(desc(total_number_comment)) %>%
  filter(Sentiment > 0)
I didn't fully understand what you want to do with the Sentiment variable (an example would help), but the grouping part is done.
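If the goal is just the author with the most positive-sentiment comments, a sketch that filters first (assuming Sentiment is numeric) could be:

library(dplyr)

df %>%
  filter(Sentiment > 0) %>%            # keep positive-sentiment comments only
  count(comment_author, sort = TRUE)   # comments per author, largest count first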
library(dplyr)
library(tidyr)
library(forcats)
library(readxl)
Using the gss_cat dataset from the forcats package, I created a grouped and summarized data frame, then split the data by the marital and race variables. (If there's a better tidyverse method than using lapply here, that would be a great bonus.)
Survey <- gss_cat %>%
  select(marital, race, relig, denom) %>%
  group_by(marital, race, relig, denom) %>%
  summarise(Count = n()) %>%
  mutate(Perc = paste0(round(100 * Count / sum(Count), 2), "%")) %>%
  drop_na()
Survey %>%
  split(.$marital) %>%
  lapply(function(x) split(x, x$race))
However, I'm stuck trying to export the final list to an Excel file with readxl. More specifically, I want to export selected tables in the list to separate Excel tabs; for example, divided by race, so that each race category is on its own tab in the spreadsheet.
First, readxl does not write Excel files. See the thread for issue 231 on the readxl GitHub page. It looks like the writexl package (not [yet] part of the tidyverse) is recommended instead.
Second, split() can take a list as an argument.
list_of_dfs <- survey %>% split(list(.$marital, .$race), sep='_')
Putting it together, assuming you've installed writexl:
require(tidyverse)
require(forcats)
require(writexl)
survey <-
  gss_cat %>%
  select(marital, race, relig, denom) %>%
  group_by(marital, race, relig, denom) %>%
  summarise(Count = n()) %>%
  mutate(Perc = paste0(round(100 * Count / sum(Count), 2), "%")) %>%
  drop_na()
list_of_dfs <- survey %>% split(list(.$marital, .$race), sep='_')
write_xlsx(list_of_dfs, 'out.xlsx')
Note that there are no checks on the suitability of the names of the worksheets that write_xlsx tries to create. If your data contained illegal characters in the marital or race column, or if you used an illegal character in the sep argument to split(), then the operation would fail. (Try using sep = ':' if you don't believe me.)
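If you do run into that, one workaround (a sketch; the character class below is my reading of the characters Excel rejects in sheet names: : \ / ? * [ ]) is to sanitize the list names before writing:

# Replace characters Excel rejects in sheet names with '_'
names(list_of_dfs) <- gsub("[][*/\\\\?:]", "_", names(list_of_dfs))
write_xlsx(list_of_dfs, 'out.xlsx')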
I want to read gds_result.txt (https://raw.githubusercontent.com/juancholkovich/GEO_DataSet_Browser/master/gds_result.txt) using R and get a data frame.
The data frame should have 7 columns, named:
1. title
2. contents
3. Organism
4. Type
5. Platform
6. FTP download
7. DataSet
How can I get this?
You could start with this:
library(tidyverse)
library(stringr)
txt <- read_lines("https://raw.githubusercontent.com/juancholkovich/GEO_DataSet_Browser/master/gds_result.txt")

tibble(value = txt) %>%
  filter(value != '') %>%                      # drop blank lines
  mutate(
    new_group = as.numeric(str_detect(value, "^(\\d*?\\. )")),  # lines like "12. " start a new record
    group = cumsum(new_group),                 # running record id
    keyword = str_match(value, "^Organism|^Project|^Type|^FTP|^Sample|^Series|^Source"),
    keyword = ifelse(str_detect(tolower(value), "^dataset|^series|^sample|^platform|related platforms"), "Dataset", keyword),
    keyword = ifelse(str_detect(tolower(value), "accession"), "Accession", keyword),
    keyword = ifelse(new_group == 1, "Name", keyword),
    keyword = ifelse(is.na(keyword), "Comment", keyword)
  ) %>%
  select(-new_group) %>%
  spread(key = keyword, value = value)         # one row per record, one column per keyword
There's probably a lot more cleaning to be done, but at least you get some structure to your data.
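If you then want the seven columns named in the question, one possible follow-up (a guess: it assumes the keyword columns above all appear in your parsed data, and that Name and Comment correspond to title and contents) is:

# `gds` stands for the result of the pipe above, saved to an object
gds %>%
  select(title = Name, contents = Comment, Organism, Type,
         `FTP download` = FTP, DataSet = Dataset)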