I need to read the ''wdbc.data' in the following data folder:
http://archive.ics.uci.edu/ml/machine-learning-databases/breast-cancer-wisconsin/
Doing this in R is easy using command read.csv but as the header is missing how can I add it? I have the information but don't know how to do this and I'd prefer do not edit the data file.
You can do the following:
Load the data:
test <- read.csv(
"http://archive.ics.uci.edu/ml/machine-learning-databases/breast-cancer-wisconsin/breast-cancer-wisconsin.data",
header=FALSE)
Note that the default value of the header argument for read.csv is TRUE so in order to get all lines you need to set it to FALSE.
Add names to the different columns in the data.frame
names(test) <- c("A","B","C","D","E","F","G","H","I","J","K")
or alternative and faster as I understand (not reloading the entire dataset):
colnames(test) <- c("A","B","C","D","E","F","G","H","I","J","K")
You can also use colnames instead of names if you have data.frame or matrix
You can also solve this problem by creating an array of values and assigning that array:
newheaders <- c("a", "b", "c", ... "x")
colnames(data) <- newheaders
in case you are interested in reading some data from a .txt file and only extract few columns of that file into a new .txt file with a customized header, the following code might be useful:
# input some data from 2 different .txt files:
civit_gps <- read.csv(file="/path2/gpsFile.csv",head=TRUE,sep=",")
civit_cam <- read.csv(file="/path2/cameraFile.txt",head=TRUE,sep=",")
# assign the name for the output file:
seqName <- "seq1_data.txt"
#=========================================================
# Extract data from imported files
#=========================================================
# From Camera:
frame_idx <- civit_cam$X.frame
qx <- civit_cam$q.x.rad.
qy <- civit_cam$q.y.rad.
qz <- civit_cam$q.z.rad.
qw <- civit_cam$q.w
# From GPS:
gpsT <- civit_gps$X.gpsTime.sec.
latitude <- civit_gps$Latitude.deg.
longitude <- civit_gps$Longitude.deg.
altitude <- civit_gps$H.Ell.m.
heading <- civit_gps$Heading.deg.
pitch <- civit_gps$pitch.deg.
roll <- civit_gps$roll.deg.
gpsTime_corr <- civit_gps[frame_idx,1]
#=========================================================
# Export new data into the output txt file
#=========================================================
myData <- data.frame(c(gpsTime_corr),
c(frame_idx),
c(qx),
c(qy),
c(qz),
c(qw))
# Write :
cat("#GPSTime,frameIdx,qx,qy,qz,qw\n", file=seqName)
write.table(myData, file = seqName,row.names=FALSE,col.names=FALSE,append=TRUE,sep = ",")
Of course, you should modify this sample script based on your own application.
this should work out,
kable(dt) %>%
kable_styling("striped") %>%
add_header_above(c(" " = 1, "Group 1" = 2, "Group 2" = 2, "Group 3" = 2))
#OR
kable(dt) %>%
kable_styling(c("striped", "bordered")) %>%
add_header_above(c(" ", "Group 1" = 2, "Group 2" = 2, "Group 3" = 2)) %>%
add_header_above(c(" ", "Group 4" = 4, "Group 5" = 2)) %>%
add_header_above(c(" ", "Group 6" = 6))
for more you can check the link
Related
I'm looking for a way to automatically add new list elements/levels to an existing list:
my real-life use case has several thousand elements to add, so the manual example below for adding two elements is not feasible anymore,
I need a list because that's the format expected by an API I'm trying to access.
Example:
library(tidyverse)
x <- data.frame(id = c(1,2,3),
label = c("label 1", "label 2", "label 3"),
category = c("cat 1", "cat 2", "cat 3"))
x_list <- x %>%
as.list() %>%
transpose()
names <- c("name 1", "name 2")
# Expected final format/output
full_list <- list(list(name = names[1],
info = x_list),
list(name = names[2],
info = x_list))
So I'm looking for a way to create this list of lists, where I "glue together" all values from the names vector with a "copy" ot the x_list.
I'm not that familiar with lists, so struggling quite a bit. I know that the purrr package can do awesome list things, so I'm open/looking forward to a tidyverse approach, although I'm also gladly taking base R. Thanks.
Iterate over names using map or using the same arguments replace map with lapply in which case no packages are needed.
library(purrr)
result <- map(names, function(nm) list(name = nm, info = x_list))
identical(result, full_list)
## [1] TRUE
I'm quite a novice, but I've successfully managed to make some code do what I want.
Right now my code does what I want for one file at a time.
I want to make my code automate this process for 600 files.
I kind of have an idea, that I need to put the list of files in a vector, then maybe use lapply and a function, but I'm not sure how to do this. The syntax and code are beyond me at the moment.
Here's my code...
#Packages are callled
library(tm) #text mining
library(SnowballC) #stemming - reducing words to their root
library(stringr) #for str_trim
library(plyr)
library(dplyr)
library(readtext)
#this is my code to run the code on a bunch of text files. Obviously it's unfinished, and I'm not sure if this is the right approach. Where do I put this? Will it even work?
data_files <- list.files(path = "data/", pattern = '*.txt', full.names = T, recursive = T)
lapply(
#
# where do I put this chunk of code?
# do I need to make all the code below a function?
##this bit cleans the document
company <- "CompanyXReport2015"
txt_raw = readLines("data/CompanyXReport2015.txt")
# remove all extra white space, also splits on lines
txt_format1 <- gsub(" *\\b[[:alpha:]]{1,2}\\b *", " ", txt_raw)
txt_format1.5 <- gsub("^ +| +$|( ) +", "\\1", txt_format1)
# recombine now that all white space is stripped
txt_format2 <- str_c(txt_format1.5, collapse=" ")
#split strings on space now to get a list of all words
txt_format3 <- str_split(txt_format2," ")
txt_format3
# convert to vector
txt_format4 <- unlist(txt_format3)
# remove empty strings and those with words shorter than 3 length
txt_format5 <- txt_format4[str_length(txt_format4) > 3]
# combine document back to single string
cleaned <- str_c(txt_format5, collapse=" ")
head(cleaned, 2)
##import key words and run analysis on frequency for the document
s1_raw = readLines("data/stage1r.txt")
str(s1_raw)
s2_raw = readLines("data/stage2r.txt")
str(s2_raw)
s3_raw = readLines("data/stage3r.txt")
str(s3_raw)
s4_raw = readLines("data/stage4r.txt")
str(s4_raw)
s5_raw = readLines("data/stage5r.txt")
str(s5_raw)
# str_count(cleaned, "legal")
# apply str_count function using each stage vector
level1 <- sapply(s1_raw, str_count, string=cleaned)
level2 <- sapply(s2_raw, str_count, string=cleaned)
level3 <- sapply(s3_raw, str_count, string=cleaned)
level4 <- sapply(s4_raw, str_count, string=cleaned)
level5 <- sapply(s5_raw, str_count, string=cleaned)
#make a vector from this for the report later
wordcountresult <- c(level1,level2,level3,level4,level5)
# convert to dataframes
s1 <- as.data.frame(level1)
s2 <- as.data.frame(level2)
s3 <- as.data.frame(level3)
s4 <- as.data.frame(level4)
s5 <- as.data.frame(level5)
# add a count column that each df shares
s1$count <- s1$level1
s2$count <- s2$level2
s3$count <- s3$level3
s4$count <- s4$level4
s5$count <- s5$level5
# add a stage column to identify what stage the word is in
s1$stage <- "Stage 1"
s2$stage <- "Stage 2"
s3$stage <- "Stage 3"
s4$stage <- "Stage 4"
s5$stage <- "Stage 5"
# drop the unique column
s1 <- s1[c("count","stage")]
s2 <- s2[c("count","stage")]
s3 <- s3[c("count","stage")]
s4 <- s4[c("count","stage")]
s5 <- s5[c("count","stage")]
# s1
df <- rbind(s1, s2,s3, s4, s5)
df
#write the summary for each company to a csv
#Making the report
#Make a vector to put in the report
#get stage counts and make a vector
s1c <- sum(s1$count)
s2c <- sum(s2$count)
s3c <- sum(s3$count)
s4c <- sum(s4$count)
s5c <- sum(s5$count)
stagesvec <- c(s1c,s2c,s3c,s4c,s5c)
names(stagesvec) <- c("Stage1","Stage2","Stage3","Stage4","Stage5")
#get the company report name for a vector
companyvec <- c(company)
names(companyvec) <- c("company")
# combine the vectors for the vector row to be inserted into the report
reportresult <- c(companyvec, wordcountresult, stagesvec)
rrdf <- data.frame(t(reportresult))
newdf <- data.frame(t(reportresult))
#if working file exists-use it
if (file.exists("data/WordCount12.csv")){
write.csv(
rrdf,
"data/WordCountTemp12.csv", row.names=FALSE
)
rrdf2 <-
read.csv("data/WordCountTemp12.csv")
df2 <-
read.csv("data/WordCount12.csv")
df2 <- rbind(df2, rrdf2)
write.csv(df2,
"data/WordCount12.csv", row.names=FALSE)
}else{ #if NO working file exists-make it
write.csv(newdf,
"data/WordCount12.csv", row.names=FALSE)
}
Hello :) Here is an example of workflow, you might find better ones but I started with it when learning.
listoftextfiles = list.files(...)
analysis1 = function(an element of listoftextfiles){
# your 1st analysis
}
res1 = lapply(listoftextfile, analysis1) # results of the 1st analysis
analysis2 = function(an element of res1){
# your 2nd analysis
}
res2 = lapply(res1, analysis2) # results of the 2nd analysis
# ect.
You will find many tutorials about custom functions on internet.
library(tidyverse)
library(knitr)
library(kableExtra)
kable(head(mpg)) %>% kable_styling %>% add_header_above(c(" " = 6,
"foobar" = 5))
Works well when you want to specify the grouped column header manually:
However I want to put this feature inside a function, I've tried:
a_function <- function(my_column_head_name){
kable(head(mpg)) %>% kable_styling %>% add_header_above(c(" " = 6,
my_column_head_name = 5))
}
a_function("foobar")
But this returns the named argument as the column header name:
I'm guessing this is something to do with lazy evaluation?
The header is a named character vector with colspan as values. You have to create the vector and assign names with names().
a_function <- function(my_column_head_name){
myHeader <- c(" " = 6, my_column_head_name = 5)
names(myHeader) <- c(" ", my_column_head_name)
kable(head(mpg)) %>% kable_styling %>% add_header_above(myHeader)
}
a_function("foobar")
I am trying to create a 'flexable' object from the R package "flextable". I need to put same column names in more than one columns. When I am putting them in the the "col_key" option in the function "flextable" I am getting the error of "duplicated col_keys". Is there a way to solve this problem?
a<-c(1:8)
b<-c(1:8)
c<-c(2:9)
d<-c(2:9)
A<-flextable(A,col_keys=c("a","b","a","b"))
This is the example code for which I am getting the error.
As it stands now, flextable does not allow duplicate column keys. However, you can achieve the same result by adding a row of "headers," or a row of column labels, to the top of your table. These headers can contain duplicate values.
You do this with the "add_header_row" function.
Here is a base example using the iris data set.
ft <- add_header_row(
ft,
values = c("", "length", "width", "length", "width"),
top = FALSE
)
ft <- theme_box(ft)
https://davidgohel.github.io/flextable/articles/layout.html
I found a work around by adding the character \r to the column names to create unique column names :
library(flextable)
A <- matrix(rnorm(8), nrow = 2, ncol = 4)
A <- as.data.frame(A)
col_Names <- c("a","b","a","b")
nb_Col_Names <- length(col_Names)
for(i in 1 : nb_Col_Names)
{
col_Names[i] <- paste0(col_Names[i], paste0(rep("\r", i), collapse = ""), collapse = "")
}
colnames(A) <- col_Names
tbl_A <- flextable(A)
Currently, using set_header_labels:
library(flextable)
a<-c(1:8)
b<-c(1:8)
c<-c(2:9)
d<-c(2:9)
A <- data.frame(a,b,c,d)
flextable(A) |> set_header_labels(`a` = "a", `b` = "b", `c` = "a", `d` = "b")
I've a data frame corresponding to the sample below:
df = data.frame(subject=c("Subject A", "Subject B", "Subject C", "Subject D"),id=c(1:4))
I would like to transform this data frame to a list object that could be conveniently implemented in selectInput:
selectInput("subject", "Subject",
choices = #my_new_list )
I would like for the end-user to see the list of subjects in the selection and for the selectInput to return the corresponding numerical value (id).
If I attempt to get my list via:
df <- data.frame(lapply(df, as.character),
stringsAsFactors = FALSE)
df <- as.list(df)
The selectInput drop down menu shows all available options:
I'm only interested in listing subjects and passing the corresponding numerical values.
Use function split:
my_new_list <- split(df$id, df$subject)
my_new_list
#$`Subject A`
#[1] 1
#$`Subject B`
#[1] 2
#$`Subject C`
#[1] 3
#$`Subject D`
#[1] 4
Together with function with:
my_new_list <- with(df, split(id, subject))
For the choices argument, you can use a named list, from the doc:
If elements of the list are named then that name rather than the value
is displayed to the user
To make the named list you could try:
your_choices <- as.list(df$id)
names(your_choices) <- df$subject
And in the app:
selectInput("subject", "Subject",
choices = your_choices )
Use setNames, for example:
selectizeInput('x3', 'X3', choices = setNames(state.abb, state.name))
like in this example http://shiny.rstudio.com/gallery/option-groups-for-selectize-input.html