R: How to set the row names attribute as numeric instead of character?

I'm new to R and would like to know how I can set the row names attribute as numeric.
I was trying to sort a data frame by its row names with
df[order(row.names(df)), ]
and the result was like this:
A B C D E
1 13 6 4 4 3
10 16 5 3 8 3
100 6 4 12 14 5
101 2 14 15 3 10
102 5 2 2 9 5
103 9 1 12 3 15
104 15 1 1 8 2
105 2 10 14 7 4
106 6 2 10 2 9
107 3 1 1 3 22
108 11 4 1 6 15
109 4 29 2 6 2
11 6 29 1 4 1
I have tried
row.names(df) <- attr(df, "row.names")
row.names(df) <- as.numeric(row.names(df))
But when I check the row names again, they are still character:
[1] "character" "vector" "data.frameRowLabels" "SuperClassMethod"
I don't know what to do. Please help.

From R's help on ?row.names:
All data frames have a row names attribute, a character vector of length the number of rows with no duplicates nor missing values.
This means that the row names will always be a character vector. You would need to use workarounds, as suggested in the comments, to make them usable as integers, which basically means always coercing them. One suggestion is to create an id column of class integer and not use row.names as the id:
df$id <- as.integer(row.names(df))
df[order(df$id), ]
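A minimal sketch of the same point, using a small hypothetical df: assigning numeric row names does not change their type, and numeric sorting works if you coerce inside order() instead of relying on the stored row names.

```r
# Hypothetical example data with character row names "1", "10", "100", "2"
df <- data.frame(A = c(13, 16, 6, 2), row.names = c("1", "10", "100", "2"))

# Assigning numeric row names: they are converted back to character
row.names(df) <- as.numeric(row.names(df))
class(row.names(df))   # "character"

# Sort numerically by coercing on the fly; drop = FALSE keeps a data frame
sorted <- df[order(as.integer(row.names(df))), , drop = FALSE]
row.names(sorted)      # "1" "2" "10" "100"
```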
Dropping row names also seems to be the direction taken by popular data frame reimplementations such as data.table and tibble; neither of them uses row names.

Related

merge .csvs based on common column but of inconsistent length

Afternoon (or morning, evening)
I am trying to merge several .csv files that have a similar layout: each has a class in one column (character) and an abundance (num) in another.
When imported as a data.frame, an example would be:
print(one[1:5,])
X Class Abundance_inds
1 1 Chaetognath 2
2 2 Copepod_Calanoid_Acartia_spp 9
3 3 Copepod_Calanoid_Centropages_spp 4
4 4 Copepod_Calanoid_Temora_spp 1
5 5 Copepod_Calanoid_Unknown 55
The class column (both the number of rows and their order) changes in every CSV depending on what was found, and I want to bind several (30+) CSVs based on the class column. I had the following (which I am sure was working a while ago...):
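library(data.table)
# "\\.csv$" matches only names ending in .csv ('.csv' treats "." as any character)
DensityFiles <- list.files(CSVdirectory,
                           pattern = '\\.csv$',
                           full.names = TRUE)
Combined <- rbindlist(
  lapply(DensityFiles, fread),
  fill = TRUE,
  use.names = TRUE)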
This produces the following:
str(Combined)
Classes ‘data.table’ and 'data.frame': 461 obs. of 3 variables:
Not quite what I was after! I am looking for the following:
> print(example)
X Class CSV.NAME CSV.NAME.1
1 1 Bivalve_Larvae 1 3
2 2 Bryozoa_Larvae 4 6
3 3 Chaetognath NA 7
4 4 Cnidaria 1 8
5 5 Copepod_Calanoid_Acartia_spp 22 NA
6 6 Copepod_Calanoid_Calanus_spp 24 4
7 7 Copepod_Calanoid_Candacia_sp 5 3
8 8 Copepod_Calanoid_Centropages_spp 41 2
9 9 Copepod_Calanoid_Temora_spp 39 8
10 10 Copepod_Calanoid_Unknown 458 NA
11 11 Copepod_Cyclopoid_Corycaeus_spp 46 NA
12 12 Copepod_Cyclopoid_Oithona_spp NA 4
13 13 Copepod_Cyclopoid_Oncaea_spp NA 7
14 14 Copepod_Harpacticoid 36 NA
15 15 Copepod_Nauplii 12 9
I can get the CSV name into a column using idcol = "origin" with data.table's rbindlist, but I'm not sure whether this works for all solutions.
I have had a good hunt around, but most examples seem to deal with a consistent number of rows.
Any help would be greatly appreciated!
Jim
You can use readr and bind_rows:
library(dplyr)
library(readr)
df <- do.call(bind_rows, lapply(DensityFiles,read_csv))
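Note that bind_rows(), like rbindlist(), stacks the files on top of each other. For the wide layout in the question (one row per Class, one column per CSV), one option is a full outer join on Class. The sketch below uses base R's merge() inside Reduce(); it assumes each file has Class and Abundance_inds columns as in the question, and it writes two small hypothetical CSVs (site_a.csv, site_b.csv) to a temporary folder only so the example runs on its own.

```r
# Hypothetical folder and files, created here so the sketch is self-contained
csv_dir <- file.path(tempdir(), "density_csvs")
dir.create(csv_dir, showWarnings = FALSE)
write.csv(data.frame(Class = c("Chaetognath", "Cnidaria"), Abundance_inds = c(2, 1)),
          file.path(csv_dir, "site_a.csv"))
write.csv(data.frame(Class = c("Cnidaria", "Copepod_Nauplii"), Abundance_inds = c(8, 9)),
          file.path(csv_dir, "site_b.csv"))

DensityFiles <- list.files(csv_dir, pattern = "\\.csv$", full.names = TRUE)

# Read each file, keep Class and the abundance, and name the abundance
# column after the file (without the .csv extension)
tables <- lapply(DensityFiles, function(f) {
  d <- read.csv(f)[, c("Class", "Abundance_inds")]
  names(d)[2] <- tools::file_path_sans_ext(basename(f))
  d
})

# Full outer join on Class: a class missing from a file gets NA in that column
wide <- Reduce(function(x, y) merge(x, y, by = "Class", all = TRUE), tables)
wide
```

Because all = TRUE keeps classes from every file, the number of rows and their order can differ between CSVs without any extra handling.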

How to run a loop on different sections of the same data.frame [duplicate]

This question already has answers here:
Grouping functions (tapply, by, aggregate) and the *apply family
Suppose I have a data frame with 2 variables which I'm trying to run some basic summary stats on. I would like to run a loop to give me the difference between minimum and maximum seconds values for each unique value of number. My actual data frame is huge and contains many values for 'number' so subsetting and running individually is not a realistic option. Data looks like this:
df <- data.frame(number=c(1,1,1,2,2,2,2,3,3,4,4,4,4,4,4,5,5,5,5),
seconds=c(1,4,8,1,5,11,23,1,8,1,9,11,24,44,112,1,34,55,109))
number seconds
1 1 1
2 1 4
3 1 8
4 2 1
5 2 5
6 2 11
7 2 23
8 3 1
9 3 8
10 4 1
11 4 9
12 4 11
13 4 24
14 4 44
15 4 112
16 5 1
17 5 34
18 5 55
19 5 109
My current code only returns the difference between the minimum and maximum seconds for the entire data frame:
ZZ <- unique(df$number)
for (i in ZZ){
Y <- max(df$seconds) - min(df$seconds)
}
Since you have a lot of data, performance should matter, so you should use a data.table instead of a data.frame:
library(data.table)
dt <- as.data.table(df)
dt[, .(spread = (max(seconds) - min(seconds))), by=.(number)]
number spread
1: 1 7
2: 2 22
3: 3 7
4: 4 111
5: 5 108
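For comparison, a base R sketch using the df from the question: the original loop only needs to subset seconds to the current value of number, and tapply() (one of the grouping functions from the linked duplicate) does the same in one call.

```r
df <- data.frame(number = c(1,1,1,2,2,2,2,3,3,4,4,4,4,4,4,5,5,5,5),
                 seconds = c(1,4,8,1,5,11,23,1,8,1,9,11,24,44,112,1,34,55,109))

# The original loop, fixed: use only the rows belonging to the current number
ZZ <- unique(df$number)
Y <- numeric(length(ZZ))
for (k in seq_along(ZZ)) {
  s <- df$seconds[df$number == ZZ[k]]
  Y[k] <- max(s) - min(s)
}
names(Y) <- ZZ
Y   # 7 22 7 111 108

# Same grouping in a single call
spread <- tapply(df$seconds, df$number, function(s) max(s) - min(s))
```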

Combine all vectors whose names start with a specific prefix into a data frame in R

I have several numeric vectors of the same length, and I want to combine those with a specific name into a data frame. Let's say
I want to combine the vectors whose names start with "pred":
prednn=c(1,2,3,4,5)
prednb=c(2,6,4,7,8)
nope=c(5,7,5,1,1)
predsv=c(55,11,22,33,44)
Desired result, dfpred:
prednn prednb predsv
1 2 55
2 6 11
3 4 22
4 7 33
5 8 44
How can I do it in R?
Thanks
You can try mget
data.frame(mget(ls(pattern='^pred')))
# prednb prednn predsv
#1 2 1 55
#2 6 2 11
#3 4 3 22
#4 7 4 33
#5 8 5 44
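You can also filter the variable names returned by ls() yourself, for example with startsWith() or grep(), and pass the result to mget(). (pmatch() is less suited here: it returns NA when a prefix such as "pred" partially matches more than one name.)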
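A sketch of filtering the names by hand, assuming the vectors live in the current environment: startsWith() selects the names, mget() fetches the objects as a named list, and because ls() returns names in alphabetical order, the columns are reordered at the end to match the desired result.

```r
prednn <- c(1, 2, 3, 4, 5)
prednb <- c(2, 6, 4, 7, 8)
nope   <- c(5, 7, 5, 1, 1)
predsv <- c(55, 11, 22, 33, 44)

# All variable names in the current environment, then keep those starting with "pred"
vec_names <- ls()
vec_names <- vec_names[startsWith(vec_names, "pred")]

# mget() returns a named list; data.frame() turns each element into a column
dfpred <- data.frame(mget(vec_names))

# ls() sorts alphabetically; reorder the columns if a specific order is wanted
dfpred <- dfpred[, c("prednn", "prednb", "predsv")]
dfpred
```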

How to merge data correctly

I'm trying to merge 7 complete data frames into one large wide data frame. I figured I have to do this stepwise: merge 2 frames into 1, then merge that frame with the next one, and so on until all 7 original frames become one.
fil2005: "ID" "abr_2005" "lop_2005" "ins_2005"
fil2006: "ID" "abr_2006" "lop_2006" "ins_2006"
The variables "abr_2006" "lop_2006" "ins_2006" and their 2005 counterparts are all either 0 or 1.
The thing is, I want to either merge or do a dcast of some sort (I think) to make these two long data frames into one wide data frame where both "abr_2005" "lop_2005" "ins_2005" and "abr_2006" "lop_2006" "ins_2006" are in the final file.
When I try
fil_2006.1 <- merge(x = fil_2005, y = fil_2006, by = "ID__", all.y = TRUE)
all the variables ending in _2005 are saved to fil_2006.1, but the variables ending in _2006 are not.
I'm apparently doing something wrong. Any idea?
Is there a reason you put those underscores after ID (ID__)? Otherwise, the code you provided will work.
An example:
# create data frames of differing lengths
dat1 <- data.frame(ID = seq(1, 20, by = 2), varx2005 = 1:10, vary2005 = 2:11)
# ID (length 10) is recycled to 20 rows here, so every ID appears twice in dat2
dat2 <- data.frame(ID = 5:14, varx2006 = 1:20, vary2006 = 21:40)
head(dat1)
ID varx2005 vary2005
1 1 1 2
2 3 2 3
3 5 3 4
4 7 4 5
5 9 5 6
6 11 6 7
head(dat2)
ID varx2006 vary2006
1 5 1 21
2 6 2 22
3 7 3 23
4 8 4 24
5 9 5 25
6 10 6 26
merged <- merge(dat2, dat1, by = "ID", all = TRUE)
head(merged)
ID varx2006 vary2006 varx2005 vary2005
1 1 NA NA 1 2
2 3 NA NA 2 3
3 5 1 21 3 4
4 5 11 31 3 4
5 7 13 33 4 5
6 7 3 23 4 5
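For the original problem of seven yearly frames, the stepwise merging can be done in one call with Reduce(). The real files are not shown, so the sketch below builds hypothetical stand-ins (years 2005 to 2011, five IDs, random 0/1 values) and assumes the key column is called ID in every frame.

```r
# Hypothetical stand-ins for the seven yearly frames, with columns like abr_2005
years <- 2005:2011
set.seed(1)
yearly <- lapply(years, function(y) {
  d <- data.frame(ID = 1:5,
                  abr = rbinom(5, 1, 0.5),
                  lop = rbinom(5, 1, 0.5),
                  ins = rbinom(5, 1, 0.5))
  names(d)[-1] <- paste0(names(d)[-1], "_", y)
  d
})

# Merge all frames pairwise in one call; all = TRUE keeps IDs missing from some years
wide <- Reduce(function(x, y) merge(x, y, by = "ID", all = TRUE), yearly)
dim(wide)   # 5 rows, 1 + 7 * 3 = 22 columns
```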

How can I convert a table into a data.frame?

I printed out the summary of a column variable. Please see below the summary table printed out from R:
I would like to turn it into a data.frame. However, there are so many subject names that it is very difficult to list them all. Also, the term "OTHER" with number 31 means that there are 319 subjects which appear only once in the original data.frame.
So, the new data.frame I hope to produce would look like this:
Here is one possible solution.
Table <- table(rpois(100, 5))  # random example data; your counts will differ
as.data.frame(Table)
Var1 Freq
1 1 2
2 2 11
3 3 9
4 4 18
5 5 13
6 6 20
7 7 14
8 8 8
9 9 3
10 10 1
11 11 1
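In the question, the "OTHER" row most likely comes from summary(), which for a factor shows only the most frequent levels and lumps the rest into "(Other)". A sketch that keeps every level by counting with table() instead, using a hypothetical subjects vector, and then optionally collapses the subjects that appear only once into a single OTHER row:

```r
# Hypothetical subject column; in practice this would be a column of your data frame
subjects <- c("Math", "Math", "Physics", "Math", "Physics", "Art", "Music")

# Count every level; stringsAsFactors = FALSE keeps the names as character
counts <- as.data.frame(table(subjects), responseName = "Freq",
                        stringsAsFactors = FALSE)

# Optionally collapse subjects that occur only once into one "OTHER" row,
# whose Freq is the number of such subjects
once <- counts$Freq == 1
lumped <- rbind(counts[!once, ],
                data.frame(subjects = "OTHER", Freq = sum(once)))
lumped[order(-lumped$Freq), ]
```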
