Retain row names for subsets of csv data


I have some data in a csv file, which includes row names. I want to take a single column of the data, while retaining the row names. The csv file was produced in the following manner:
MAT <- matrix(nrow=5, ncol=2, c(1:10))
rownames(MAT) <- c("First","Second","Third","Fourth","Fifth")
write.csv(MAT, file='~/test.csv', row.names=TRUE)
The matrix MAT is given below. Ultimately I want the first column of this matrix (after loading the csv file), with the row names intact.
       [,1] [,2]
First     1    6
Second    2    7
Third     3    8
Fourth    4    9
Fifth     5   10
If I now read the csv file,
MAT2 <- read.csv(file='~/test.csv')
MAT2 is given by
       X V1 V2
1  First  1  6
2 Second  2  7
3  Third  3  8
4 Fourth  4  9
5  Fifth  5 10
The read.csv command seems to have created another column (X) that holds the row names. In any case, if I do MAT3 <- MAT2[,2], I get a plain vector rather than a matrix like the one above, and as.matrix(MAT2[,2]) does not retain the row names as I want.
Any ideas of how to proceed?

Perhaps a better starting point is:
read.csv(file='~/test.csv', row.names = 1)
V1 V2
First 1 6
Second 2 7
Third 3 8
Fourth 4 9
Fifth 5 10
You can also wrap this in as.matrix:
as.matrix(read.csv(file='~/test.csv', row.names = 1))
Compare their structures:
> str(read.csv(file='~/test.csv', row.names = 1))
'data.frame': 5 obs. of 2 variables:
$ V1: int 1 2 3 4 5
$ V2: int 6 7 8 9 10
> str(as.matrix(read.csv(file='~/test.csv', row.names = 1)))
int [1:5, 1:2] 1 2 3 4 5 6 7 8 9 10
- attr(*, "dimnames")=List of 2
..$ : chr [1:5] "First" "Second" "Third" "Fourth" ...
..$ : chr [1:2] "V1" "V2"
If all you are actually concerned about is how to extract a column while retaining the original structure, perhaps drop = FALSE is what you're after:
MAT2 <- as.matrix(read.csv(file='~/test.csv', row.names = 1))
# V1 V2
# First 1 6
# Second 2 7
# Third 3 8
# Fourth 4 9
# Fifth 5 10
MAT2[, 2]
# First Second Third Fourth Fifth
# 6 7 8 9 10
MAT2[, 2, drop = FALSE]
# V2
# First 6
# Second 7
# Third 8
# Fourth 9
# Fifth 10
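For a self-contained check of the above, here is a minimal sketch of the whole round trip. It writes to a tempfile() instead of '~/test.csv' (an assumption made only so the example does not touch your home directory) and confirms that drop = FALSE keeps both the matrix shape and the row names:

```r
# Temporary file stands in for '~/test.csv' (assumption for this sketch)
tmp <- tempfile(fileext = ".csv")

MAT <- matrix(1:10, nrow = 5, ncol = 2)
rownames(MAT) <- c("First", "Second", "Third", "Fourth", "Fifth")
write.csv(MAT, file = tmp, row.names = TRUE)

# row.names = 1 turns the first csv column back into row names
MAT2 <- as.matrix(read.csv(tmp, row.names = 1))

# drop = FALSE keeps the result a one-column matrix
first_col <- MAT2[, 1, drop = FALSE]
dim(first_col)
rownames(first_col)

unlink(tmp)
```

Without drop = FALSE the same subset is a named vector, which still carries the names but has no dim attribute.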

Related

rename a matrix column which has no initial names with dplyr

I'm trying to rename the columns of a matrix that has no column names, using dplyr:
set.seed(1234)
v1 <- table(round(runif(50,0,10)))
v2 <- table(round(runif(50,0,10)))
library(dplyr)
bind_rows(v1,v2) %>%
t
   [,1] [,2]
0     3    4
1     1    9
2     8    6
3    11    7
5     7    8
6     7    1
7     3    4
8     6    3
9     3    6
10    1   NA
4    NA    2
I usually use rename for that, in the form rename(new_name=old_name). However, because there is no old_name, it doesn't work. I've tried:
rename("v1","v2")
rename(c("v1","v2")
rename(v1=1, v2=2)
rename(v1=[,1],v2=[,v2])
rename(v1="[,1]",v2="[,v2]")
rename_(.dots = c("v1","v2"))
setNames(c("v1","v2"))
None of these work.
I know the base R way to do it (colnames(obj) <- c("v1","v2")), but I'm specifically looking for a dplyr way to do it.

One option is set_colnames from magrittr:
library(dplyr)
bind_rows(v1,v2) %>%
t %>%
magrittr::set_colnames(c("new1", "new2"))

In order to use rename you need some sort of list (like a data frame or a tibble). So you can do one of two things: either convert to a tibble and use rename, or use colnames and leave the structure as is, i.e.
new_d <- bind_rows(v1,v2) %>%
t() %>%
as_tibble() %>% # as.tibble() is deprecated in current tibble releases
rename('A' = 'V1', 'B' = 'V2')
#where
str(new_d)
#Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 11 obs. of 2 variables:
# $ A: int 3 1 8 11 7 7 3 6 3 1 ...
# $ B: int 4 9 6 7 8 1 4 3 6 NA ...
Or
new_d1 <- bind_rows(v1,v2) %>%
t() %>%
`colnames<-`(c('A', 'B'))
#where
str(new_d1)
# int [1:11, 1:2] 3 1 8 11 7 7 3 6 3 1 ...
# - attr(*, "dimnames")=List of 2
# ..$ : chr [1:11] "0" "1" "2" "3" ...
# ..$ : chr [1:2] "A" "B"
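The second variant works because `colnames<-` is an ordinary function that returns a modified copy when called directly. A minimal base R sketch on a toy matrix (m and renamed are names made up for this illustration) shows that the original is left untouched:

```r
# Toy matrix without column names (hypothetical data for illustration)
m <- matrix(1:6, nrow = 3)

# Calling the replacement function in its functional form returns a copy
renamed <- `colnames<-`(m, c("A", "B"))

colnames(m)        # still NULL
colnames(renamed)  # "A" "B"
```

This is why it can sit at the end of a pipe: it takes the matrix as its first argument and hands back a renamed matrix.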

Aggregate command in R to combine rows based on unique ID - output data structure?

I'm sure there's a super-easy answer to this. I am trying to combine ratings on subjects based on their unique ID. Here is a test dataset (called Aggregate_Test) that I created, where the ID is unique to the subject and each StaticScore was given by a different rater:
ID StaticScore
1 6
2 7
1 5
2 6
3 7
4 8
3 4
4 5
After reading other posts carefully, I used aggregate to create the following dataset with new columns:
StaticAggregate<-aggregate(StaticScore ~ ID, Aggregate_Test, c)
> StaticAggregate
  ID StaticScore.1 StaticScore.2
1  1             6             5
2  2             7             6
3  3             7             4
4  4             8             5
This data frame has the following str:
> str(StaticAggregate)
'data.frame': 4 obs. of 2 variables:
$ ID : num 1 2 3 4
$ StaticScore: num [1:4, 1:2] 6 7 7 8 5 6 4 5
If I try to create a new variable by subtracting StaticScore.1 from StaticScore.2, I get the following error:
Staticdiff<-StaticScore.1-StaticScore.2
Error: object 'StaticScore.1' not found
So, please help me - what is this data structure created by aggregate? A matrix? How could I convert StaticScore.1 and StaticScore.2 to separate variables, or barring that, what is the notation to subtract one from the other to create a new variable?

We can do a dcast to create a wide format from long and subtract those columns to create the 'StaticDiff'
library(data.table)
dcast(setDT(Aggregate_Test), ID~paste0("StaticScore", rowid(ID)), value.var="StaticScore"
)[, StaticDiff := StaticScore1 - StaticScore2]
Regarding the specific question about the aggregate behavior: we are just concatenating (c) the 'StaticScore' values by 'ID'. Because every group returns a vector of the same length (here 2), aggregate simplifies the results into a single matrix column
StaticAggregate<-aggregate(StaticScore ~ ID, Aggregate_Test, c)
This can be checked by looking at the str(StaticAggregate)
str(StaticAggregate)
#'data.frame': 4 obs. of 2 variables:
#$ ID : int 1 2 3 4
#$ StaticScore: int [1:4, 1:2] 6 7 7 8 5 6 4 5
How do we change it to normal columns?
It can be done with do.call(data.frame, ...)
StaticAggregate <- do.call(data.frame, StaticAggregate)
Check the str again
str(StaticAggregate)
#'data.frame': 4 obs. of 3 variables:
# $ ID : int 1 2 3 4
# $ StaticScore.1: int 6 7 7 8
# $ StaticScore.2: int 5 6 4 5
Now, we can do the calculation as shown in the OP's post
StaticAggregate$Staticdiff <- with(StaticAggregate, StaticScore.1-StaticScore.2)
StaticAggregate
# ID StaticScore.1 StaticScore.2 Staticdiff
#1 1 6 5 1
#2 2 7 6 1
#3 3 7 4 3
#4 4 8 5 3
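To try this without the original file, here is a minimal sketch that rebuilds the test data from the question by hand (the data.frame() call is my reconstruction of the table shown there) and then flattens the matrix column:

```r
# Rebuild the example data from the question
Aggregate_Test <- data.frame(
  ID          = c(1, 2, 1, 2, 3, 4, 3, 4),
  StaticScore = c(6, 7, 5, 6, 7, 8, 4, 5)
)

StaticAggregate <- aggregate(StaticScore ~ ID, Aggregate_Test, c)

# The second column is a matrix, not two separate columns
is.matrix(StaticAggregate$StaticScore)
ncol(StaticAggregate)

# Flatten the matrix column into ordinary columns
flat <- do.call(data.frame, StaticAggregate)
names(flat)

flat$Staticdiff <- flat$StaticScore.1 - flat$StaticScore.2
flat
```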

As the str output shown in the question indicates, StaticAggregate is a two column data.frame whose second column is a two column matrix, StaticScore. We can display the matrix like this:
StaticAggregate$StaticScore
## [,1] [,2]
## [1,] 6 5
## [2,] 7 6
## [3,] 7 4
## [4,] 8 5
To create a new column with the difference:
transform(StaticAggregate, diff = StaticScore[, 1] - StaticScore[, 2])
## ID StaticScore.1 StaticScore.2 diff
## 1 1 6 5 1
## 2 2 7 6 1
## 3 3 7 4 3
## 4 4 8 5 3
Note that there are no columns in StaticAggregate or in StaticAggregate$StaticScore named StaticScore.1 and StaticScore.2. StaticScore.1 in the heading of the data.frame print output just denotes the first column of the StaticScore matrix.
The reason that the matrix has no column names is that the aggregate function c does not produce them. If we change the original aggregate to this then they would have names:
StaticAggregate2 <- aggregate(StaticScore ~ ID, Aggregate_Test, setNames, c("A", "B"))
StaticAggregate2
## ID StaticScore.A StaticScore.B
## 1 1 6 5
## 2 2 7 6
## 3 3 7 4
## 4 4 8 5
Now we can write this using the column names of the matrix:
StaticAggregate2$StaticScore[, "A"]
## [1] 6 7 7 8
StaticAggregate2$StaticScore[, "B"]
## [1] 5 6 4 5
Note that there is a significant advantage to the way R's aggregate works: it allows simpler access to the results, because the kth column of the matrix is the kth result of the aggregate function. This is in contrast to having the (k+1)st column of the data.frame represent the kth result of the aggregate function. This may not seem like much of a simplification here, but for more complex problems it can be a significant one if you need to access the statistics matrix. Of course, you can always flatten it to 3 columns if you want:
do.call(data.frame, StaticAggregate)
but once you think about it for a while you may find that the structure it provides is actually more convenient.
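As a sketch of the "kth column is the kth result" point, here is the same data aggregated with a function that returns two named statistics. The function and the names mean and spread are my own choices for illustration, not part of the question:

```r
Aggregate_Test <- data.frame(
  ID          = c(1, 2, 1, 2, 3, 4, 3, 4),
  StaticScore = c(6, 7, 5, 6, 7, 8, 4, 5)
)

# Hypothetical summary function returning two named results per ID
res <- aggregate(StaticScore ~ ID, Aggregate_Test,
                 function(x) c(mean = mean(x), spread = diff(range(x))))

# The statistics live together in one matrix, addressable by name
colnames(res$StaticScore)
res$StaticScore[, "mean"]
res$StaticScore[, "spread"]
```

The whole block of statistics can then be passed around as res$StaticScore without having to skip over the ID column.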

Switch one column to Another

I have a very basic question, but I'm new to R so would appreciate any help.
I have a column (among other columns) in one dataset whose rows hold numeric codes.
In another dataset, I have two columns: one holds the same numeric codes and the one next to it holds the corresponding names.
Is there a way in R to replace the numeric codes in the first dataset with the names, using the second dataset as a reference?
Many thanks for your help

Some sample data:
set.seed(42) # because I use `sample`, o/w unnecessary
df1 <- data.frame(n = sample(5, size = 10, replace = TRUE))
str(df1)
# 'data.frame': 10 obs. of 1 variable:
# $ n: int 5 5 2 5 4 3 4 1 4 4
df2 <- data.frame(n = 1:5, txt = LETTERS[5:9], stringsAsFactors = FALSE)
str(df2)
# 'data.frame': 5 obs. of 2 variables:
# $ n : int 1 2 3 4 5
# $ txt: chr "E" "F" "G" "H" ...
Base R use of merge:
merge(df1, df2, by = "n")
# n txt
# 1 1 E
# 2 2 F
# 3 3 G
# 4 4 H
# 5 4 H
# 6 4 H
# 7 4 H
# 8 5 I
# 9 5 I
# 10 5 I
Notice that the order of df1 is not preserved. We can use merge(..., sort = FALSE), but the order is "unspecified" (?merge).
Using dplyr::left_join:
library(dplyr)
df1 %>%
left_join(df2, by = "n")
# n txt
# 1 5 I
# 2 5 I
# 3 2 F
# 4 5 I
# 5 4 H
# 6 3 G
# 7 4 H
# 8 1 E
# 9 4 H
# 10 4 H
(Order is preserved.)
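If you would rather stay in base R and still keep the row order, a lookup with match() is another option. This is a different technique from the merge and left_join calls above, sketched here with the codes typed in directly rather than sampled:

```r
# Codes in their original order (typed in, so no seed is needed)
df1 <- data.frame(n = c(5, 5, 2, 5, 4, 3, 4, 1, 4, 4))
df2 <- data.frame(n = 1:5, txt = LETTERS[5:9], stringsAsFactors = FALSE)

# match() gives, for each code in df1, its row position in df2
df1$txt <- df2$txt[match(df1$n, df2$n)]
df1
```

Codes that do not appear in df2 would come back as NA, which is the same outcome a left join gives.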

Converting data to data frame

I'm new to R and to programming in general. I have this data: screenshot
I have 12 'IDs' (research subjects), numbered 1-12. The 'types' column tells the 'type' of each ID. For example, the first 5 numbers of the 'types' column refer to the 'types' of first 5 IDs, i.e. 'types' of first 5 IDs are 3,3,2,1,1 respectively.
The 'pairs' column describes how IDs are paired together. For example, 6 is paired with 9; 4 is paired with 7; 1 is paired with 11 and so on.
So what I need help with is that I want to create three columns using this data.
first column: lists the ID (1-12)
second column: returns the ID of the pair (like 1 was paired with 11, so second column should say 11 for ID 1)
third column: tells the 'type' of the pair (so the 'type' of 11 is 3, and the third column should display that)
Here's a visualization of the desired output format: output format
Any help would be much appreciated.
Thanks in advance!

You can do this with some clever indexing. I entered the raw data as a vector for types, and a list of vectors for pairs:
# Enter the raw data
type <- c(3, 3, 2, 1, 1, 1, 2, 3, 1, 1, 3, 1)
pairs <- list(c(6, 9), c(4, 7), c(1, 11), c(3, 10), c(2, 12), c(5, 8))
From this, you can create the first two columns of the desired output by stacking all of the pairs once in their original order, and then again in the reverse order. (I reversed each pair by using lapply(pairs, rev), which applies the rev command to each pair in the list.)
# Create a 12 x 2 matrix of the pairs
pairs.mat <- do.call(rbind, c(pairs, lapply(pairs, rev)))
pairs.mat
# [,1] [,2]
# [1,] 6 9
# [2,] 4 7
# [3,] 1 11
# [4,] 3 10
# [5,] 2 12
# [6,] 5 8
# [7,] 9 6
# [8,] 7 4
# [9,] 11 1
# [10,] 10 3
# [11,] 12 2
# [12,] 8 5
For cleanliness of results, I converted this into a data.frame:
# Convert to data frame
colnames(pairs.mat) <- c("id", "match")
df <- as.data.frame(pairs.mat)
Finally, we can get the type_match column by taking type in the order of the match column from the data.frame we just created.
# Add in the type_match column
df$type_match <- type[df$match]
# Print results in order
df[order(df$id), ]
# id match type_match
# 3 1 11 3
# 5 2 12 1
# 4 3 10 1
# 2 4 7 2
# 6 5 8 3
# 1 6 9 1
# 8 7 4 1
# 12 8 5 1
# 7 9 6 1
# 10 10 3 2
# 9 11 1 3
# 11 12 2 3
And that should give you the desired output.
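A variation on the same indexing idea, in case you want the rows already sorted by ID: fill a partner vector in both directions and index type with it. This is only a sketch, and partner and pair_mat are names I made up for it:

```r
type <- c(3, 3, 2, 1, 1, 1, 2, 3, 1, 1, 3, 1)
pairs <- list(c(6, 9), c(4, 7), c(1, 11), c(3, 10), c(2, 12), c(5, 8))

# One row per pair: column 1 and column 2 are partners of each other
pair_mat <- do.call(rbind, pairs)

# partner[i] holds the ID that i is paired with
partner <- numeric(length(type))
partner[pair_mat[, 1]] <- pair_mat[, 2]
partner[pair_mat[, 2]] <- pair_mat[, 1]

df <- data.frame(id = seq_along(type),
                 match = partner,
                 type_match = type[partner])
df
```

This assumes every ID from 1 to 12 appears in exactly one pair, as in the question.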

How to extract counts as a vector from a table in R?

I'm trying to write a function to extract the frequencies of this table:
0 1 2 3 4 5 6 7
30 22 9 12 2 5 1 16
So I want to get c(30, 22, 9, 12, 2, 5, 1, 16).
The table changes each time I run the function, so I need something that can extract the information from the table automatically, so that I don't have to write a c() call each time.

Let's create a results object from table() and examine it:
> set.seed(42) ## be reproducible
> X <- sample(1:5, 50, replace=TRUE) ## our data
> table(X) ## our table
X
1 2 3 4 5
7 6 9 10 18
> str(table(X)) ## look at structure of object
'table' int [1:5(1d)] 7 6 9 10 18
- attr(*, "dimnames")=List of 1
..$ X: chr [1:5] "1" "2" "3" "4" ...
> as.integer(table(X)) ## and just convert to vector
[1] 7 6 9 10 18
> as.numeric(table(X)) ## or use `as.numeric()`
[1] 7 6 9 10 18
>
And for completeness, two more ways to get the data:
> unname(table(X)) ## dropping names reduces to the vector
[1] 7 6 9 10 18
> table(X)[] ## or simply access it
[1] 7 6 9 10 18
>
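Since the question asks for something reusable, the conversion can be wrapped in a small helper. This is a minimal sketch; counts_vector is a made-up name, and it uses as.vector(), which strips the table attributes in the same way as the as.integer() call above:

```r
# Hypothetical helper: tabulate x and return the bare counts
counts_vector <- function(x) {
  as.vector(table(x))
}

# Small hand-made input so the result is easy to verify
x <- c(0, 0, 1, 2, 2, 2)
counts_vector(x)

# The categories, if you need them alongside the counts
names(table(x))
```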
