Index labels removed when deleting rows, do I want to fix this?

I deleted rows from my R dataframe and now the index numbers are out of order. For example, the row-index was 1,2,3,4,5 before but now it is 2,3,4 because I deleted rows 1 and 5.
Do I want to change the index labels from 2,3,4 to 1,2,3 on my new dataframe?
If so, how do I do this?
If not, why not?
library(rvest)
url <- "https://en.wikipedia.org/wiki/Mid-American_Conference"
pg <- read_html(url) # Download webpage
pg
tb <- html_table(pg, fill = TRUE) # Extract HTML tables as data frames
tb
macdf <- tb[[2]]
macdf <- subset(macdf, select=c(1,2,5))
colnames(macdf) <- c("School","Location","NumStudent")
macdf <- macdf[-c(1,8),]

You can change the labels from "2" "3" "4" "5" "6" "7" "9" "10" "11" "12" "13" "14" to "1" "2" "3" "4" "5" "6" "7" "8" "9" "10" "11" "12" using:
row.names(macdf) <- 1:nrow(macdf)

You can do something like this:
> library(data.table)
> subset(setDT(macdf, keep.rownames = TRUE), select = -rn)
Or simply:
rownames(macdf) <- NULL
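To see what actually happens to the labels, here is a minimal sketch on a toy data frame. The data frame df and its contents are made up for illustration and are not the Wikipedia table from the question.

```r
# Toy data frame standing in for macdf (hypothetical data)
df <- data.frame(School = c("A", "B", "C", "D", "E"),
                 NumStudent = c(10, 20, 30, 40, 50))

# Drop the first and last rows; the surviving rows keep their old labels
df <- df[-c(1, 5), ]
before <- rownames(df)   # "2" "3" "4"

# Reset to the default sequential labels
rownames(df) <- NULL
after <- rownames(df)    # "1" "2" "3"
```

Positional indexing such as df[1, ] behaves the same either way; the labels only matter when you print the data frame or select rows by name.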

Related

How can I create a new txt file by matching two different txt file and finding the same values in R?

I have 2 text files: File A and File B.
I want to match the first column of File A against the first row (the header) of File B.
If a value in the first column of File A also appears in the header of File B, I want to keep that column of File B, with all of its values, along with File B's first column.
File A:
"...1" "AZD5153" "I-BET-762" "I-BRD9" "JQ1" "OTX-015" "PFI-1" "RVX-208"
"1" "697" 0.155445 1.328728 7.6345 7.553337 0.496983 1.776878 24.540592
"2" "5637" 11.767517 66.561037 314.672133 3.891947 17.54448 10.27559 261.520227
"3" "22RV1" 2.144765 9.04165 193.4228 4.448654 19.315063 9.55938 72.036416
"4" "23132-87" 1.882177 41.26784 33.482054 10.959235 9.025218 19.621473 75.332425
"5" "42-MG-BA" 2.252297 26.56874 54.934795 7.92924 10.276993 7.937254 64.873664
"6" "639-V" 6.412568 16.979172 30.882936 12.444024 21.915518 6.449247 96.50391
File B:
"...1" "1321N1" "143B" "22RV1" "23132-87" "42-MG-BA"
"1" "100009676_at" 61161 62052 61249 66154 54236
"2" "10000_at" 81556 66152 45676 43519 66723
"3" "10001_at" 97864 99699 8872 91376 10029
"4" "10002_at" 37977 40304 38455 37085 36431
"5" "10003_at" 35458 38504 40458 39508 41589
"6" "100048912_at" 40034 37959 41465 39271 39157
"7" "100049716_at" 42744 46775 52087 47239 42522
Expected File:
"...1" "22RV1" "23132-87" "42-MG-BA"
"1" "100009676_at" 61249 66154 54236
"2" "10000_at" 45676 43519 66723
"3" "10001_at" 8872 91376 10029
"4" "10002_at" 38455 37085 36431
"5" "10003_at" 40458 39508 41589
"6" "100048912_at" 41465 39271 39157
"7" "100049716_at" 52087 47239 42522
First of all, ensure you have the correct paths to FILEA.txt and FILEB.txt, as well as the desired path to FILEC.txt. In my case, I did:
path_to_file_A <- path.expand("~/FILEA.txt")
path_to_file_B <- path.expand("~/FILEB.txt")
path_to_file_C <- path.expand("~/FILEC.txt")
Now the following code should work:
A <- read.table(path_to_file_A, header = TRUE, check.names = FALSE)
B <- read.table(path_to_file_B, header = TRUE, check.names = FALSE)
result <- cbind(B[1], B[na.omit(match(A[[1]], names(B)))])
write.table(result, path_to_file_C)
Which results in:
FILEC.txt
"...1" "22RV1" "23132-87" "42-MG-BA"
"1" "100009676_at" 61249 66154 54236
"2" "10000_at" 45676 43519 66723
"3" "10001_at" 8872 91376 10029
"4" "10002_at" 38455 37085 36431
"5" "10003_at" 40458 39508 41589
"6" "100048912_at" 41465 39271 39157
"7" "100049716_at" 52087 47239 42522

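The column-matching step from the answer above can be tried without any files. This is only a sketch: the two small data frames are cut-down stand-ins for File A and File B, and the first column is called id here instead of "...1".

```r
# Cut-down stand-ins for File A and File B (values taken from the question)
A <- data.frame(id = c("697", "22RV1", "23132-87"),
                AZD5153 = c(0.155445, 2.144765, 1.882177),
                check.names = FALSE)
B <- data.frame(id = c("100009676_at", "10000_at"),
                `1321N1` = c(61161, 81556),
                `22RV1` = c(61249, 45676),
                `23132-87` = c(66154, 43519),
                check.names = FALSE)

# Position of each File A value among File B's column names (NA if absent)
idx <- match(A[[1]], names(B))
keep <- idx[!is.na(idx)]

# First column of B plus every matched column
result <- cbind(B[1], B[keep])
names(result)   # "id" "22RV1" "23132-87"
```

match() returns the columns in the order the values appear in File A. If you would rather keep File B's column order, B[names(B) %in% A[[1]]] is an alternative.
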
How to create a dataframe with various length of rows in R?

I have a list of paths, shown below.
The code is:
for (each in paths)
{
print (each)
}
The output is :
[1] "1" "2"
[1] "1" "2" "3"
[1] "1" "2" "3" "5"
[1] "1" "2" "4"
[1] "1" "2" "4" "5"
[1] "1" "3"
[1] "1" "3" "5"
[1] "1" "4"
[1] "1" "4" "5"
[1] "1" "5"
[1] "2" "3"
[1] "2" "3" "5"
[1] "2" "4"
[1] "2" "4" "5"
[1] "3" "5"
[1] "4" "5"
How can I append all of these as rows of a data frame? as.data.frame fails because the rows have unequal lengths.
A data frame is rectangular by definition, with the same number of columns in each row. You could set the length of each of your rows to be the same (they will be filled in with NA), and then rbind them together:
maxlength = max(lengths(paths))
paths2 = lapply(paths, function(x) {length(x) = maxlength; return(x)})
paths_df = do.call(rbind, args = paths2)
That will give a matrix, but you can easily convert to data frame from there.
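For completeness, a small sketch of the whole route, including the conversion to a data frame. The three-element paths list is a shortened stand-in for the list in the question, and the names padded, paths_mat and paths_df are just illustrative.

```r
# Shortened stand-in for the list from the question
paths <- list(c("1", "2"), c("1", "2", "3"), c("1", "2", "3", "5"))

# Pad every vector with NA up to the longest length
maxlength <- max(lengths(paths))
padded <- lapply(paths, function(x) { length(x) <- maxlength; x })

# Stack into a character matrix, then convert to a data frame
paths_mat <- do.call(rbind, padded)
paths_df <- as.data.frame(paths_mat, stringsAsFactors = FALSE)

dim(paths_df)     # 3 rows, 4 columns
names(paths_df)   # "V1" "V2" "V3" "V4"
```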
A data.frame needs to be rectangular, and all elements of a given column need to be the same type of object. A column can, however, be of type list, and the elements of a list column can vary in length.
paths=list(1,c(1,2))
df=data.frame("pathNumber"= 1:length(paths))
df$path=paths
The result looks like this
  pathNumber path
1          1    1
2          2 1, 2
One option is to have the list as a column of a data frame. This may be desirable if you want to have some other columns.
df <- data.frame(paths = I(paths))

list of dates without commas

I have this txt file
"","x"
"1","2005-01-31"
"2","2005-03-31"
"3","2005-03-31"
"4","2005-05-31"
"5","2005-05-31"
"6","2005-07-31"
"7","2005-07-31"
"8","2005-08-31"
"9","2005-10-31"
"10","2005-10-31"
It is a list of monthly dates. How can I get the same list but without commas, like this one:
"x"
"1" "2005-01-31"
"2" "2005-02-28"
"3" "2005-03-31"
"4" "2005-04-29"
"5" "2005-05-31"
"6" "2005-06-30"
"7" "2005-07-29"
"8" "2005-08-31"
"9" "2005-09-30"
"10" "2005-10-31"
Thank you!
All you have to do is specify the separator when loading the file:
read.csv(file, header = TRUE, sep = ",")
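If the goal is a new file in the space-separated layout, one possible round trip is to read with read.csv and write back with write.table, whose default separator is a single space. This is a sketch under assumptions: infile and outfile are temporary files created here for illustration, not the original file.

```r
# Create a small comma-separated sample (stand-in for the original file)
infile <- tempfile(fileext = ".txt")
writeLines(c('"","x"', '"1","2005-01-31"', '"2","2005-03-31"'), infile)

# Read it, using the first (unnamed) column as row names
dates <- read.csv(infile, header = TRUE, sep = ",", row.names = 1,
                  stringsAsFactors = FALSE)

# Write it back out; write.table separates fields with a space by default
outfile <- tempfile(fileext = ".txt")
write.table(dates, outfile)
out <- readLines(outfile)
out
```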

write table in R

I'm trying to get my data from R.
When I type: write.table(c, "~/XXX/XXX.txt", sep="\t"). I get something like this:
"x"
"1" 3011.5648786606
"2" 15654.1820393584
"3" 12368.7319176159
"4" 3055.2054987339
"5" 4590.9390484852
"6" 15472.0519755823
"7" 22142.4386253643
"8" 43684.1996516822
"9" 20931.0908837875
"10" 15165.4255765957
"11" 21790.7749747969
"12" 42362.7562956186
.............................
How to get rid of "x", "1", "2",...?
Thanks!
Some of the arguments of write.table are TRUE by default. Change them to FALSE and it should work:
write.table(c, "~/XXX/XXX.txt", sep="\t", quote=FALSE,
row.names=FALSE, col.names=FALSE)
NOTE: c is a function, so it is better to choose object names that do not clash with function names.

How to edit "row.names" after split and cut2 in R?

I want to edit out some information from row.names that are created automatically once split and cut2 were used. See following code:
#Mock data
date_time <- as.factor(c('8/24/07 17:30','8/24/07 18:00','8/24/07 18:30',
'8/24/07 19:00','8/24/07 19:30','8/24/07 20:00',
'8/24/07 20:30','8/24/07 21:00','8/24/07 21:30',
'8/24/07 22:00','8/24/07 22:30','8/24/07 23:00',
'8/24/07 23:30','8/25/07 00:00','8/25/07 00:30'))
U. <- as.numeric(c('0.2355','0.2602','0.2039','0.2571','0.1419','0.0778','0.3557',
'0.3065','0.1559','0.0943','0.1519','0.1498','0.1574','0.1929'
,'0.1407'))
#Mock data frame
test_data <- data.frame(date_time,U.)
#To use cut2
library(Hmisc)
#Splitting the data into categories
sub_data <- split(test_data,cut2(test_data$U.,c(0,0.1,0.2)))
new_data <- do.call("rbind",sub_data)
test_data <- new_data
You will see that "test_data" would have an extra column "row.names" with values such as "[0.000,0.100).6", "[0.000,0.100).10", etc.
How do I remove "[0.000,0.100)" and keep the number after the "." such as 6 and 10 so that I can reference these rows by their original row number later?
Any other better method to do this?
You could also set the names of sub_data to NULL.
names(sub_data) <- NULL
test_data <- do.call('rbind', sub_data)
row.names(test_data)
#[1] "6" "10" "5" "9" "11" "12" "13" "14" "15" "1" "2" "3" "4" "7" "8"
You could use a Regular Expression (Regex), as follows:
rownames(test_data) = gsub(".*[]\\)]\\.", "", rownames(test_data))
It's cryptic if you're not familiar with Regular Expressions, but it basically says: match any sequence of characters (.*) that is followed by either a closing square bracket or a closing parenthesis ([]\\)]) and then by a period (\\.), and remove all of it.
The double backslashes are "escapes" indicating that the character following the double-backslash should be interpreted literally, rather than in its special Regex meaning (e.g., . means "match any single character", but \\. means "this is really just a period").
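A quick way to check the pattern is to run it on a few labels by hand. The strings in labels below are hypothetical examples in the same shape as the generated row names, not the exact output of cut2.

```r
# Hypothetical row names of the form "<interval>.<original row number>"
labels <- c("[0.000,0.100).6", "[0.100,0.200).10", "[0.200,0.356].1")

# Strip everything up to and including the ")." or "]." that ends the interval
cleaned <- gsub(".*[]\\)]\\.", "", labels)
cleaned               # "6" "10" "1"

# Convert to integers if the numbers are needed for indexing
as.integer(cleaned)
```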
Just for fun, you can also use regmatches
> Names <- rownames(test_data)
> ( rownames(test_data) <- regmatches(Names, regexpr("[0-9]+$", Names)) )
[1] "6" "10" "5" "9" "11" "12" "13" "14" "15" "1" "2" "3" "4" "7" "8"
