Replacing rows in R

In R, I am reading a file with comments as a CSV using
read.data.raw = read.csv(inputfile, sep='\t', header=F, comment.char='')
The file looks like this:
#comment line 1
data 1<tab>x<tab>y
#comment line 2
data 2<tab>x<tab>y
data 3<tab>x<tab>y
Now I extract the uncommented lines using
comment_ind = grep( '^#.*', read.data.raw[[1]])
read.data = read.data.raw[-comment_ind,]
Which leaves me:
data 1<tab>x<tab>y
data 2<tab>x<tab>y
data 3<tab>x<tab>y
I am modifying this data through a separate script that preserves the number of rows and columns. I would like to put the result back into the originally read data (with the user comments) and return it to the user like this:
#comment line 1
modified data 1<tab>x<tab>y
#comment line 2
modified data 2<tab>x<tab>y
modified data 3<tab>x<tab>y
Since the data I extracted into read.data preserves the row names (row.names(read.data)), I tried
original.read.data[as.numeric(row.names(read.data)),] = read.data
But that didn't work, and I got a bunch of NAs.
Any ideas?

Does this do what you want?
read.data.raw <- structure(list(V1 = structure(c(1L, 3L, 2L, 4L, 5L),
.Label = c("#comment line 1", "#comment line 2", "data 1", "data 2",
"data 3"), class = "factor"), V2 = structure(c(1L, 2L, 1L, 2L, 2L),
.Label = c("", "x"), class = "factor"), V3 = structure(c(1L, 2L, 1L,
2L, 2L), .Label = c("", "y"), class = "factor")), .Names = c("V1",
"V2", "V3"), class = "data.frame", row.names = c(NA, -5L))
comment_ind = grep( '^#.*', read.data.raw[[1]])
read.data <- read.data.raw[-comment_ind,]
# modify V1
read.data$V1 <- gsub("data", "DATA", read.data$V1)
# rbind() and then order() comments into original places
new.data <- rbind(read.data.raw[comment_ind,], read.data)
new.data <- new.data[order(as.numeric(rownames(new.data))),]
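Direct assignment back into the non-comment rows can also work. One plausible cause of the NAs in the question (the question does not confirm it) is that the columns were read as factors, where assigning a string that is not an existing level produces NA. The sketch below assumes character columns (stringsAsFactors = FALSE, the default for read.csv since R 4.0); the names raw_data, dat and result are made up for illustration.

```r
# Assumption: columns are character, not factor, so new strings can be assigned freely.
raw_data <- data.frame(
  V1 = c("#comment line 1", "data 1", "#comment line 2", "data 2", "data 3"),
  V2 = c("", "x", "", "x", "x"),
  V3 = c("", "y", "", "y", "y"),
  stringsAsFactors = FALSE
)

# Positions of the comment rows
comment_ind <- grep("^#", raw_data[[1]])

# Extract the data rows and modify them (stand-in for the separate script)
dat <- raw_data[-comment_ind, ]
dat$V1 <- paste("modified", dat$V1)

# Write the modified rows back into the same positions, leaving comments untouched
result <- raw_data
result[-comment_ind, ] <- dat
```

Because the same negative index is used for extraction and for assignment, the row order never has to be reconstructed from the row names.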

Related

Formatting issues when removing row numbers in datatable

I am using the R package DT to create a table. This table contains hyperlinks and the issue that I am having is that when I put rownames = FALSE to remove the row numbers, the formatting on the hyperlinks goes away. I was wondering if anyone had a solution to this problem?
Example data:
structure(list(school = structure(c(2L, 3L, 1L, 4L), .Label = c("Linfield",
"OSU", "UO", "Willamette"), class = "factor"), mascot = structure(c(2L,
3L, 4L, 1L), .Label = c("bearcats", "beavers", "ducks", "wildcats"
), class = "factor"), website = structure(c(1L, 3L, 2L, 4L), .Label = c("oregonstate.edu",
"linfield.edu", "uoregon.edu",
"willamette.edu"), class = "factor"),
School_colors = structure(c(2L, 1L, 3L, 4L), .Label = c("<span style=\"color:green\">green & yellow</span>",
"<span style=\"color:orange\">orange & black</span>", "<span style=\"color:purple\">purple and red</span>",
"<span style=\"color:red\">red and yellow</span>"), class = "factor")), class = "data.frame", row.names = c(NA,
-4L))
Code used to generate table WITH row names
datatable(df,escape = c(1,2,3))
Code used to generate table WITHOUT row names
datatable(df, rownames = FALSE,escape = c(1,2,3))
As you can see, with the second example, the formatting in the third column is no longer there. What I want to do is create a table without row numbers while keeping the formatting of the hyperlinks.
Since you deleted the row names, the indexes of your columns shifted as well.
Therefore you should change your escape argument to only 1 and 2.
datatable(df, rownames = FALSE,escape = c(1,2))
A simple escape = FALSE also works for you.
datatable(df, rownames = FALSE, escape = FALSE)
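The escape argument also accepts column names, according to the DT documentation, which avoids the index shift altogether. A minimal sketch, assuming the DT package is installed and using a cut-down version of the example data (escape_cols is a name made up for illustration):

```r
# Cut-down version of the example data; School_colors holds raw HTML.
df <- data.frame(
  school = c("OSU", "UO"),
  mascot = c("beavers", "ducks"),
  School_colors = c(
    "<span style=\"color:orange\">orange & black</span>",
    "<span style=\"color:green\">green & yellow</span>"
  ),
  stringsAsFactors = FALSE
)

# Escape by name: these columns are treated as plain text whether or not
# row names are displayed, so no index adjustment is needed.
escape_cols <- c("school", "mascot")

# Build the widget only if DT is available (assumption: it may not be installed).
if (requireNamespace("DT", quietly = TRUE)) {
  tbl <- DT::datatable(df, rownames = FALSE, escape = escape_cols)
}
```

Compared with escape = FALSE, this keeps HTML escaping on for the columns that should only ever contain plain text.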

trying to summarize survey data for questions with 'select all that apply' using R

We have a survey with 'select all that apply' questions, so each result is a quoted string with the values separated by commas, e.g. "red, black,green".
There are other questions about income, so I also have a factor with the levels 'low', 'medium', and 'high'.
I want to be able to answer questions such as: what percentage selected 'Red', and then group that by income.
I can split the string with
df4 <- c("black,silver,green")
I can create a data frame with a timestamp and the split string with
t2 <- as.data.frame(c(df2[2],l2))
I do not understand how to do this for all rows at once.
Here is a dput of the input:
structure(list(RespData = structure(1:2, .Label = c("1/20/2020",
"1/21/2020"), class = "factor"), CarColor = c("red,blue,green,yellow",
"black,silver,green")), row.names = c(NA, -2L), class = "data.frame")
and here is a dput of the desired output:
structure(list(RespData = structure(c(1L, 1L, 1L, 1L, 2L, 2L,
2L), .Label = c("1/20/2020", "1/21/2020"), class = "factor"),
Cars = structure(c(3L, 1L, 2L, 4L, 5L, 6L, 2L), .Label = c("blue",
"green", "red", "yellow", "black", "silver"), class = "factor")), row.names = c(NA,
-7L), class = "data.frame")
Example of Function:
MySplitFunc <- function(ListIn) {
  # build an empty data frame and set the column names
  x1.all <- ListIn[0,]
  names(x1.all) <- c("ResponseTime", "Descriptive")
  # for each row build the data and combine to growing list
  for(x in 1:nrow(ListIn)) {
    #print(x)
    r1 <- ListIn[x,1]
    c1 <- strsplit(ListIn[x,2],",")
    x1 <- as.data.frame(c(r1,c1))
    # set the names and combine to all
    names(x1) <- c("ResponseTime", "Descriptive")
    x1.all <- rbind(x1.all,x1)
  }
  # strip the whitespace
  x1.all <- data.frame(lapply(x1.all, trimws), stringsAsFactors = TRUE)
  return(x1.all)
}
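One way to handle all rows at once without a loop, as a minimal base R sketch: split every string in a single strsplit() call, then repeat each response date once per selected value. The input mirrors the dput above; the names parts and long are made up for illustration.

```r
# Input as in the dput above
df <- data.frame(
  RespData = factor(c("1/20/2020", "1/21/2020")),
  CarColor = c("red,blue,green,yellow", "black,silver,green"),
  stringsAsFactors = FALSE
)

# Split every row's string at the commas; the result is a list with one
# character vector per row
parts <- strsplit(df$CarColor, ",", fixed = TRUE)

# Repeat each RespData value once per colour and flatten the colours,
# trimming stray whitespace such as "red, black"
long <- data.frame(
  RespData = rep(df$RespData, lengths(parts)),
  Cars = trimws(unlist(parts)),
  stringsAsFactors = TRUE
)
```

The factor levels of Cars come out in alphabetical order here, which differs from the level order in the desired output, although the values in each row are the same. Counts per colour are then available with table(long$Cars).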

Simple text cleaning across all columns of a data frame

I have a data frame to which I would like to apply some basic formatting rules.
The dataframe is:
df <- structure(list(colname1 = structure(c(2L, 1L, 1L), .Label = c("",
"TEXTA"), class = "factor"), colname2 = structure(c(2L, 1L, 3L
), .Label = c("TEXTA", "TEXTB", "TEXTE"), class = "factor"),
colname3 = structure(c(2L, 3L, 1L), .Label = c("", "TEXTC",
"TEXTD"), class = "factor")), .Names = c("colname1", "colname2",
"colname3"), class = "data.frame", row.names = c(NA, -3L))
I tried to run the following on the whole data frame:
df2 <- as.data.frame(tolower(df))
df2 <- as.data.frame(gsub("[[:punct:]]", "", df2))
but this converts the column names of the data frame to rows. What can I do to convert all rows of the example data frame to lower case and remove punctuation from them (I am not interested in the column names)?
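We can convert to lower case and remove the punctuation characters by looping through the columns with lapply(df, ...) and assigning the output back to the original dataset. Using df[] on the left-hand side keeps the data frame structure and column names: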
df[] <- lapply(df, function(x) gsub("[[:punct:]]+", "", tolower(x)))
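Using tidyverse (dplyr 1.0.0 or later, where across() replaces the deprecated funs() and the superseded mutate_all()), this can be done by
library(dplyr)
df %>%
  mutate(across(everything(), ~ gsub("[[:punct:]]+", "", tolower(.x))))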
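tolower() and gsub() operate on character vectors, so a whole data frame passed to them is coerced to a single vector first, which is why the column-wise loop is needed. A small sketch of the base R approach, with the cleaning step pulled into a helper (the name clean is made up for illustration) and applied to the example data:

```r
# Example data from the question, built directly instead of via structure()
df <- data.frame(
  colname1 = c("TEXTA", "", ""),
  colname2 = c("TEXTB", "TEXTA", "TEXTE"),
  colname3 = c("TEXTC", "TEXTD", ""),
  stringsAsFactors = TRUE
)

# Hypothetical helper: lower-case a vector, then drop punctuation
clean <- function(x) gsub("[[:punct:]]+", "", tolower(x))

# Apply to every column; df[] keeps the data frame shape and column names
df[] <- lapply(df, clean)
```

Note that the factor columns become character columns after this step, because tolower() returns character vectors.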

Change the names of certain columns in a data frame

I want to rename every column from the second to the last. Why does my command not work?
fredTable <- structure(list(Symbol = structure(c(3L, 1L, 4L, 2L, 5L), .Label = c("CASACBM027SBOG",
"FRPACBW027SBOG", "TLAACBM027SBOG", "TOTBKCR", "USNIM"), class = "factor"),
Name = structure(1:5, .Label = c("bankAssets", "bankCash",
"bankCredWk", "bankFFRRPWk", "bankIntMargQtr"), class = "factor"),
Category = structure(c(1L, 1L, 1L, 1L, 1L), .Label = "Banks", class = "factor"),
Country = structure(c(1L, 1L, 1L, 1L, 1L), .Label = "USA", class = "factor"),
Lead = structure(c(1L, 1L, 3L, 3L, 2L), .Label = c("Monthly",
"Quarterly", "Weekly"), class = "factor"), Freq = structure(c(2L,
1L, 3L, 3L, 4L), .Label = c("1947-01-01", "1973-01-01", "1973-01-03",
"1984-01-01"), class = "factor"), Start = structure(c(1L,
1L, 1L, 1L, 1L), .Label = "Current", class = "factor"), End = c(TRUE,
TRUE, TRUE, TRUE, FALSE), SeasAdj = c(FALSE, FALSE, FALSE,
FALSE, TRUE), Percent = structure(c(1L, 1L, 1L, 1L, 1L), .Label = "Fed", class = "factor"),
Source = structure(c(1L, 1L, 1L, 1L, 1L), .Label = "Res", class = "factor"),
Series = structure(c(1L, 1L, 1L, 1L, 2L), .Label = c("Level",
"Ratio"), class = "factor")), .Names = c("Symbol", "Name",
"Category", "Country", "Lead", "Freq", "Start", "End", "SeasAdj",
"Percent", "Source", "Series"), row.names = c("1", "2", "3",
"4", "5"), class = "data.frame")
To rename the columns from the second to the last, I use the following command, but it does not work:
names(fredTable[,-1]) = paste("case", 1:ncol(fredTable[,-1]), sep = "")
or
names(fredTable)[,-1] = paste("case", 1:ncol(fredTable)[,-1], sep = "")
In general, how can one rename specific columns (for example, 2 to the end, 2 to 7, etc.) with names of one's choosing?
Replace specific column names by subsetting on the outside of the function, not within the names function as in your first attempt:
names(fredTable)[-1] <- paste("case", 1:ncol(fredTable[,-1]), sep = "")
Explanation
If we save the new names in a vector newnames we can investigate what is going on under the hood with replacement functions.
#These are the names that will replace the old names
newnames <- paste("case", 1:ncol(fredTable[,-1]), sep = "")
We should always replace specific column names with the format:
#The right way to replace the second name only
names(df)[2] <- "newvalue"
#The wrong way
names(df[2]) <- "newvalue"
The problem is that names(df[2]) <- "newvalue" renames a temporary subset rather than the data frame itself. The correct form builds the updated vector of column names and assigns it back to the data frame in one step.
The right way [Internal]
We can expand the function call with:
#We enter this:
names(fredTable)[-1] <- newnames
#This is carried out on the inside
fredTable <- `names<-`(fredTable, `[<-`(names(fredTable), -1, newnames))
The wrong way [Internal]
The internals of replacement the wrong way are like this:
#Wrong way
names(fredTable[-1]) <- newnames
#Wrong way Internal
fredTable <- `[<-`(fredTable, -1, value = `names<-`(fredTable[-1], newnames))
Notice that `names<-` is applied to a temporary copy, fredTable[-1]. That renamed copy is then written back with `[<-`, which replaces the column values but keeps the existing column names of fredTable, so the names do not change.
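The difference is easy to check on a small data frame. This sketch uses a made-up four-column data frame rather than fredTable, and also shows renaming a specific range of columns:

```r
# Hypothetical data frame for illustration
df <- data.frame(a = 1:2, b = 3:4, c = 5:6, d = 7:8)

# Wrong way: runs without error, but the names of `wrong` stay as they were
wrong <- df
names(wrong[-1]) <- paste0("case", seq_len(ncol(wrong) - 1))

# Right way: subset the names vector, not the data frame
right <- df
names(right)[-1] <- paste0("case", seq_len(ncol(right) - 1))

# Renaming a specific range, here columns 2 to 3
ranged <- df
names(ranged)[2:3] <- c("x", "y")
```

The same pattern covers any selection: put the column positions in the index applied to names(df).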

processing data frame in R

I have this data frame. I would like to list each unique Dept and place the corresponding Name values under it. As you can see, each Dept occurs more than once. For example, the final document should look like this:
Internet
  Public-Web
  Intranet
BackOffice
  Batch
  BackEnd
BackEnd
  WebLogic
  Oracle
dput(x)
structure(list(ID = c(1234L, 2345L, 6789L, 3456L, 7890L, 1987L
), Name = structure(c(5L, 3L, 2L, 1L, 6L, 4L), .Label = c("BackEnd",
"Batch", "Intranet", "Oracle", "Public-Web", "WebLogic"), class = "factor"),
Dept = structure(c(3L, 3L, 2L, 2L, 1L, 1L), .Label = c("BackEnd",
"BackOffice", "Internet"), class = "factor")), .Names = c("ID",
"Name", "Dept"), class = "data.frame", row.names = c(NA, -6L))
Any ideas how I would do this in R?
I'll assume you may have duplicates, and therefore use unique:
for(dept in unique(x$Dept)){
  print(dept)
  x2 <- subset(x,subset=Dept==dept)
  for(name in unique(x2$Name)){
    print(paste(sep=""," ",name))
  }
}
Replace the print with whatever you need.
You can use split to achieve this (x is the data frame from the question):
split(as.character(x$Name), x$Dept)
# $BackEnd
# [1] "WebLogic" "Oracle"
#
# $BackOffice
# [1] "Batch" "BackEnd"
#
# $Internet
# [1] "Public-Web" "Intranet"
If you want unique entries, then just do:
x <- unique(x[, 2:3])
split(as.character(x$Name), x$Dept)
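To get from the split() result to the indented outline in the question, the groups can be flattened into lines of text. A minimal sketch, assuming two spaces of indentation and that departments should appear in order of first occurrence rather than alphabetically (the names groups and outline are made up for illustration):

```r
# Data from the question, built directly instead of via structure()
x <- data.frame(
  ID = c(1234L, 2345L, 6789L, 3456L, 7890L, 1987L),
  Name = c("Public-Web", "Intranet", "Batch", "BackEnd", "WebLogic", "Oracle"),
  Dept = c("Internet", "Internet", "BackOffice", "BackOffice", "BackEnd", "BackEnd"),
  stringsAsFactors = FALSE
)

# Group names by department, keeping departments in order of first appearance
groups <- split(x$Name, factor(x$Dept, levels = unique(x$Dept)))

# One line per department, followed by its unique names indented by two spaces
outline <- unlist(lapply(names(groups), function(dept) {
  c(dept, paste0("  ", unique(groups[[dept]])))
}))

writeLines(outline)
```

writeLines() also takes a con argument, so the same vector can be written to a file instead of the console.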
