Viewing single column of data frame in R [duplicate] - r

This question already has answers here:
How to subset matrix to one column, maintain matrix data type, maintain row/column names?
(1 answer)
How do I extract a single column from a data.frame as a data.frame?
(3 answers)
Closed 5 years ago.
I am running a simulation model that creates a large data frame as its output, with each column corresponding to the time-series of a particular variable:
data5<-as.data.frame(simulation3$baseline)
Occasionally I want to look at subsets, especially particular columns, of this data frame to get an idea of the output. For this I am using the View() function, like so:
View(data5[1:100,1])
for instance, if I wish to see the first 100 rows of column 1. Alternatively, I also sometimes do something like this, using the names of the time series:
timeframe=1:100
toAnalyse=c("u","u_n","u_e","u_nw")
View(data5[timeframe,toAnalyse])
In either case, there is an annoying display problem when I try to view a single column on its own (for instance with View(data5[1:100,1])). What I get looks like this:
[Screenshot: Example 1]
As you can see, the top of the table, which would usually contain the name of the variable, instead contains a string of all the values the variable takes. This problem does not appear if I select 2 or more columns:
[Screenshot: Example 2]
Does anyone know how to get rid of this issue? Is there some argument that I can feed to View to make sure that it behaves nicely when I ask it to just show a single column?

View(data5[1:100,1, drop=FALSE])
When you extract a single column with [ , R simplifies the result to a vector by default; drop=FALSE prevents that and keeps a one-column data frame, so the column name is retained.
For instance:
> df
n s b
1 2 aa TRUE
2 3 bb TRUE
3 5 cc TRUE
> df[, 1]
[1] 2 3 5
> df[, 1, drop=FALSE]
n
1 2
2 3
3 5
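A minimal, self-contained sketch of the same point, rebuilding the small df printed above (values taken from that printout). It contrasts the default simplification with drop = FALSE and with single-bracket, list-style indexing, which also returns a data frame. The View() call is wrapped in interactive() only so the script also runs outside RStudio.

```r
# Rebuild the small data frame printed above
df <- data.frame(n = c(2, 3, 5),
                 s = c("aa", "bb", "cc"),
                 b = c(TRUE, TRUE, TRUE))

# Matrix-style indexing of a single column simplifies to a plain vector
v <- df[, 1]
print(class(v))     # "numeric"

# drop = FALSE keeps a one-column data frame, so the name "n" survives
d1 <- df[, 1, drop = FALSE]
print(names(d1))    # "n"

# List-style indexing with one index (no comma) also returns a data frame
d2 <- df[1]
print(class(d2))    # "data.frame"

# In an interactive session, View() then shows "n" as the column header
if (interactive()) View(df[1:2, 1, drop = FALSE])
```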

Related

How to subset the first column (rownames) in R [duplicate]

This question already has answers here:
What is about the first column in R's dataset mtcars?
(4 answers)
Closed 3 years ago.
I have xy data for gene expression in multiple samples. I wish to subset the first column so I can order the genes alphabetically and perform some other filtering.
> setwd("C:/Users/Will/Desktop/BIOL3063/R code assignment");
> df = read.csv('R-assignments-dataset.csv', stringsAsFactors = FALSE);
Here is a simplified example of the dataset I'm working with; the full dataset has 270 columns (tissue samples) and 7065 rows (gene names).
The first column is a list of gene names (A2M, AAAS, AACS etc.) and each column is a different tissue sample, thus showing the gene expression in each tissue sample.
The question being asked is "Sort the gene names alphabetically (A-Z) and print out the first 20 gene names".
My thought process would be to subset the first column (gene names) and then perform order() to sort alphabetically, after which I can use head() to print the first 20.
However when I try
> genes <- df[1]
It simply subsets the first column that has data in it (TCGA-A6-2672_TissueA) rather than the one to its left.
Also
> genes <- df[,df$col1];
> genes;
data frame with 0 columns and 7065 rows
> order(genes);
integer(0)
This appears to create a list of gene names in RStudio's viewer, but I cannot perform any manipulation on it.
I am unable to correctly locate the first column in the data.frame, since it does not have a column header, and I also have the same problem when doing the same thing with row 1 (sample names) as well.
I'm a complete novice at R and this is part of an assignment I'm working on. It seems I'm missing something fundamental, but I cannot figure out what.
Cheers guys
Please include a sample of your text file as text instead of an image.
I have created a dataset similar to yours:
X Y
1 a b
2 c d
3 d g
Note that your tissue columns have a header but your gene names do not. Therefore these will be interpreted as rownames, see ?read.table:
If row.names is not specified and the header line has one less entry
than the number of columns, the first column is taken to be the row
names.
Reading it in R:
df <- read.table(text = ' X Y
1 a b
2 c d
3 d g')
So your gene names are not in df[1] but in rownames(df). To get them, use genes <- rownames(df); to add them to the existing df as a column, use df$gene <- rownames(df).
There are numerous ways to convert row names to a column; see, for example, this question.
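Applied to the assignment task (sort the gene names A-Z and print the first 20), a minimal sketch might look like this. The gene and sample names below are made up for illustration; the only assumption carried over from the real file is that its header line has one fewer field than the data lines, so the gene names end up as row names. read.csv() follows the same rule, since it is a wrapper around read.table() with header = TRUE and sep = ",".

```r
# Hypothetical stand-in for the CSV: the header has one fewer field than the
# data rows, so read.table() turns the first column into row names
df <- read.table(text = "SampleA SampleB
AACS 1.2 0.4
A2M 3.1 2.2
AAAS 0.7 1.9", header = TRUE)

genes <- rownames(df)           # the gene names
sorted_genes <- sort(genes)     # alphabetical order, A-Z
print(head(sorted_genes, 20))   # first 20 names (only 3 exist in this toy data)

# Alternatively, copy the row names into a regular column and sort the rows
df$gene <- rownames(df)
print(df[order(df$gene), ])
```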
If you are asking what I think you are asking, you can wrap the subset in data.frame() and give the new column a name, which then acts as the "header", as you call it. (as.data.frame() on a plain vector names the column after the expression, e.g. "df[, 1]", so naming it explicitly is safer.)
genes <- data.frame(V1 = df[, 1])
genes
  V1
1  A
2  C
3  A
4  B
5  C
6  D
7  A
8  B
As per the comment below, the issue could be avoided if you remove the comma from your subsetting syntax. When you select columns from a data.frame, you only need to index the column, not the rows.
genes <- df[1]

R count number of variables with value ="mq" per row [duplicate]

This question already has answers here:
How to count the frequency of a string for each row in R
(4 answers)
Closed 4 years ago.
I have a data frame with 70 variables, and I want to create a new variable that counts, for each row, how many of those 70 variables take the value "mq".
I am looking for something like this:
ID  Var1  Var2  Count_mq
1   mq    mq    2
2   1     mq    1
3   1     7     0
I have found this solution:
count_row_if("mq",DT)
But it gives me a vector with those values for the whole data frame and it is quite slow to compute.
I would like to find a solution using the function apply() but I don't know how to achieve this.
Best.
You can use the apply() function to count a particular value in each row of your existing data frame df:
df$count.MQ <- apply(df, 1, function(x) length(which(x=="mq")))
Here the second argument is 1 since you want to count for each row. You can read more about it from https://www.rdocumentation.org/packages/base/versions/3.5.1/topics/apply
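A minimal sketch on a toy version of the example table from the question, comparing the apply() approach with a vectorised rowSums() comparison, which avoids a per-row function call and is usually faster on wide data. Var1 and Var2 come from the question's example; with the real data you would list the 70 variable columns instead. na.rm = TRUE is an addition so that rows containing NA still get a count.

```r
# Toy data shaped like the example in the question
DT <- data.frame(ID   = 1:3,
                 Var1 = c("mq", "1", "1"),
                 Var2 = c("mq", "mq", "7"))

vars <- c("Var1", "Var2")   # with the real data: the 70 variable columns

# apply() with MARGIN = 1 passes each row as a character vector
DT$Count_mq <- apply(DT[vars], 1, function(x) sum(x == "mq", na.rm = TRUE))

# Vectorised alternative: compare all cells at once, then sum per row
DT$Count_mq2 <- rowSums(DT[vars] == "mq", na.rm = TRUE)

print(DT)
```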
I assume the name of the dataset is DT. I'm a bit confused about what you really want, but this is how I understand it: the data frame consists of 70 columns and a number of rows, some of which contain the value 'mq'.
If I understand correctly, the following counts the 'mq' values in each row:
apply(DT, MARGIN = 1, function(x) sum(x == 'mq'))

r how to subset without retaining all data info from original set? [duplicate]

This question already has answers here:
Drop unused factor levels in a subsetted data frame
(16 answers)
Closed 7 years ago.
I am trying to subset data.
Here's a link to sample data to play around with:
https://drive.google.com/file/d/0BwIbultIWxeVOFdRaE81Nm9qc2s/view?usp=sharing
In this data set, the last column is named "Type", and it has 2 values: "normal." and "back.".
Let's say I am subsetting based on the "Type" column:
test.data = read.csv(file = paste0(dd, '/data_example.csv'))
test.subdata1 = subset(test.data, test.data$Type == 'normal.')
test.subdata2 = test.data[test.data$Type == 'normal.',]
Here I'm subsetting using the two most common methods:
by using subset()
by directly filtering in the []
Supposedly, the new subsetted data should only contain rows whose Type is "normal." (there's a period after the word).
And indeed, when I view the subset data table, only "normal." rows are present.
HOWEVER, the "back." level is still retained in my subsetted data, as the following output shows:
str(test.subdata1$Type)
# Factor w/ 2 levels "back.","normal.": 2 2 2 2 2 2 2 2 2 2 ...
str(test.subdata2$Type)
# Factor w/ 2 levels "back.","normal.": 2 2 2 2 2 2 2 2 2 2 ...
So no matter which subsetting method I use, the complete factor information from the original data set is retained in my subset.
My question is:
HOW do I get rid of the extra information from the original data set that I do not want to retain in my subset?
That is, how can I see only 1 factor level in my subset data rather than 2?
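# Is this what you need?
# Re-creating the factor keeps only the levels actually present:
test.subdata1$Type = factor(test.subdata1$Type)
# or, equivalently, drop the unused levels with droplevels():
test.subdata1$Type = droplevels(test.subdata1$Type)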
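A minimal sketch with a made-up three-row Type column (the real CSV is only linked, not shown), illustrating that subsetting keeps the full set of factor levels and that droplevels() removes the unused ones. Note that since R 4.0.0, read.csv() defaults to stringsAsFactors = FALSE, so Type would be read as character and this issue only arises when factors are requested.

```r
# Made-up stand-in for the linked data: Type is a factor with two levels
test.data <- data.frame(Type = factor(c("normal.", "back.", "normal.")))

sub <- subset(test.data, Type == "normal.")
print(levels(sub$Type))    # the unused level "back." is still there

# droplevels() removes levels that no longer occur
sub$Type <- droplevels(sub$Type)
print(levels(sub$Type))    # only "normal." remains

# droplevels() also accepts a whole data frame
sub2 <- droplevels(test.data[test.data$Type == "normal.", , drop = FALSE])
print(nlevels(sub2$Type))  # 1
```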

Replacing values in data frame column using another data frame [duplicate]

This question already has answers here:
How to do vlookup and fill down (like in Excel) in R?
(9 answers)
Closed 7 years ago.
I have a table of pending bills in the Scottish Parliament. One of the columns (BillTypeID) is populated with numbers that indicate what type of bill each one is (there are seven different types of bills).
I have another table that describes which number corresponds to which bill type (1 = "Executive", 2 = "Member's", etc.).
I want to replace the number in my main table with the corresponding string that describes the type for each bill.
Data:
bills <- jsonlite::fromJSON(url("https://data.parliament.scot/api/bills"))
bill_stages <- jsonlite::fromJSON(url("https://data.parliament.scot/api/billstages"))
This is probably a duplicate but I can't find the corresponding answer ...
The easiest way to do this is with merge().
d1 <- data.frame(billtype=c(1,1,3,3),
bill=c("first","second","third","fourth"))
d2 <- data.frame(billtype=c(1,2,3),
billtypename=c("foo","bar","bletch"))
d3 <- merge(d1,d2)
##
## billtype bill billtypename
## 1 1 first foo
## 2 1 second foo
## 3 3 third bletch
## 4 3 fourth bletch
... then drop the billtype column if you don't want it any more. You can probably do it slightly more efficiently with match() (see my answer to the linked question).
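A minimal sketch of the match() lookup mentioned above, reusing the toy d1 and d2 tables. Unlike merge(), which sorts the result by the key and drops unmatched rows by default, this keeps the rows of d1 in their original order and gives NA for codes without a match. For the parliament data you would substitute the actual ID and name columns of the two tables, which are not shown here.

```r
d1 <- data.frame(billtype = c(1, 1, 3, 3),
                 bill = c("first", "second", "third", "fourth"))
d2 <- data.frame(billtype = c(1, 2, 3),
                 billtypename = c("foo", "bar", "bletch"))

# For each code in d1, find the row of d2 with the same code
idx <- match(d1$billtype, d2$billtype)

# Look up the names; codes without a match would give NA
d1$billtypename <- d2$billtypename[idx]
print(d1)

# To replace the numeric code itself instead of adding a column:
# d1$billtype <- d2$billtypename[idx]
```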

filter R data frame with one column - keep data frame format [duplicate]

This question already has an answer here:
Filtering single-column data frames
(1 answer)
Closed 7 years ago.
I am looking for a simple way to display a subset of a one-column data frame.
Let's assume I have a data frame:
> df <- data.frame(a = 1:100)
Now, I only need the first 10 rows. If I subset it by index, I'll get a result vector instead of a data frame:
> df[1:10,]
[1] 1 2 3 4 5 6 7 8 9 10
I tried to use subset(), but leaving out its subset argument results in an error (only for one-column data frames?):
subset(df[1:10,])
Error in subset.default(df[1:10, ]) :
argument "subset" is missing, with no default
There should be a very easy way to get a subset (still a data frame) filtered by row index, no?
I am looking for a solution with basic R commands (it should not depend on any special library).
You can use drop=FALSE, which prevents R from dropping the dimensions of the result, so the one-column data frame is not simplified to a vector.
df[1:10, , drop=FALSE]
a
1 1
2 2
3 3
4 4
5 5
...
For subset() you need to supply a condition through its subset argument.
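A short sketch of three base-R ways to get the first ten rows of the one-column df while keeping it a data frame: drop = FALSE, subset() with a condition on the column a, and head(), which returns a data frame when given one. Note that subset() selects by value rather than position; the two coincide here only because a is 1:100.

```r
df <- data.frame(a = 1:100)

first10 <- df[1:10, , drop = FALSE]   # index rows, keep the data frame
by_cond <- subset(df, a <= 10)        # subset() needs a logical condition
by_head <- head(df, 10)               # head() keeps the data frame too

print(class(first10))
print(class(by_cond))
print(class(by_head))
```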
