Transferring 1 row of numbers from excel to R - r

I'm trying to transfer a row of numbers from Excel to R.
The idea is to put the numbers into a vector and compare it with another vector in order to find the differences between them.
I have assembled all the numbers along a single row, each in its own cell. But when I try to copy-paste it into a vector in R, it does not appear to contain all the numbers from the Excel sheet.
The row contains a substantial amount of numbers, so I reckon it has something to do with the capacity of the vector. Is there perhaps a different method of successfully transferring my numbers from Excel to R?

Try copying and pasting as a string, then using a function like strsplit() to split the string into a long vector, and then converting it to numeric. Here is example code with steps:
Step 1: (In Excel) Remove all commas and other non-numeric characters; you can keep decimals.
Step 2: (In Excel) Copy the entire row.
Step 3: (In R)
number <- readClipboard(format = 1)     # read the copied row from the clipboard (Windows)
number <- strsplit(number, "\t")[[1]]   # Excel cells arrive tab-separated on the clipboard
number <- strsplit(number, "-")         # split each cell on the dash
final <- matrix(as.numeric(unlist(number)), nrow = length(number), byrow = TRUE)  # one row per cell
You should end up with two columns: column 1 will be the number in each cell preceding the '-' and column 2 the number succeeding the '-'.
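Once the numbers are in a vector, one quick way to find the differences mentioned in the question is setdiff(); this is a minimal sketch with made-up vectors:
excel_vec <- c(1.5, 2.0, 3.7, 4.2)   # numbers pasted from Excel (made-up values)
other_vec <- c(1.5, 3.7, 9.9)        # the vector to compare against
setdiff(excel_vec, other_vec)        # values in excel_vec but not in other_vec
setdiff(other_vec, excel_vec)        # values in other_vec but not in excel_vec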

Related

Separating out 6 numerical values from a column in R where there are several delimiters

I have a csv with 2 columns, but it should be 7. The first column is a numerical ID. The second column has the other six numerical values. However, there are several different delimiters between them. They all follow the same pattern: a numerical value, a dash ("-") OR a colon (":"), eight spaces, and then the next numerical value, until the final numerical value, with nothing after it. The delimiters start with a dash and alternate with a colon. For example:
28.3- 7.1: 62.3- 1.8: 0.5- 196
Some of these cells have missing values denoted by a single period ("."). Example:
24- .: 58.2- .: .- 174
I'm using R but I can't figure out how to accomplish this. I know it probably requires dplyr or tidyverse, but I can't find what to do when there are different delimiters and spaces.
So far, I've only successfully loaded the csv and used str() to determine that the column with these six values is a factor.
(The original question included screenshots of the raw .csv, of the data in RStudio after reading it with read.csv, and of the data after using tab as the delimiter in read.csv, as suggested in the comments.)
If that first column is the only one that needs sorting out, I would try the following:
CBC_delim <- read.table('CBC.csv', sep="\t", header=F)
head(CBC_delim)
then split that first column into two while keeping both elements:
library(dplyr)
CBC_delim <- CBC_delim %>%
  mutate(column1 = as.character(column1)) %>%   # your column names will be different, maybe just V1
  mutate(col2 = sapply(strsplit(column1, ","), `[`, 1),
         col3 = sapply(strsplit(column1, ","), `[`, 2))
That should leave you with some basic tidying up, such as deleting the original column1; you can check your column names with colnames(CBC_delim).
But also see:
how-to-read-data-with-different-separators
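For the specific pattern described in the question (a dash or a colon followed by spaces between values, with "." marking missing values), a base-R sketch along these lines may get closer; it is only an illustration using the two example strings from the question, not a tested solution for the full file:
x <- c("28.3- 7.1: 62.3- 1.8: 0.5- 196",
       "24- .: 58.2- .: .- 174")
parts <- strsplit(x, "[-:]\\s+")                                      # split on "-" or ":" plus the following spaces
vals  <- lapply(parts, function(p) suppressWarnings(as.numeric(p)))  # "." becomes NA
do.call(rbind, vals)                                                  # one row per record, six numeric columns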

Getting only the rownames containing a specific character - R

I have a Seurat R object. I would like to select only the data corresponding to a specific sample. Therefore, I want to get only the row names that contain a specific character. Example of the differences in my row names: CTAAGCTT-1 and CGTAAAT-2. I want to differentiate based on the 1 and the 2. The code below shows what I already tried, but it just returns the total number of rows, not how many rows match the character.
length <- length(rownames(seuratObject@meta.data) %in% "1")
OR
length <- length(grepl("-1", rownames(seuratObj@meta.data)))
Idents(seuratObject, cells = 1:length)
Thanks for any input.
Just missing which()
length(which(grepl("-1", rownames(seuratObject@meta.data))))
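If you also want to pull out the matching barcodes rather than just count them, a small base-R sketch, assuming the sample suffix always sits at the end of the barcode:
barcodes <- rownames(seuratObject@meta.data)
sum(grepl("-1$", barcodes))                    # how many cells come from sample 1
cells_1 <- barcodes[grepl("-1$", barcodes)]    # the matching barcodes themselves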

'dictionary' list to data.table columns

I am converting output from an API call to a bibliography database that returns content in RIS format. I would then like to get a data.table object with a row for each database item and a column for each field of the RIS output.
I will explain more about RIS later, but I am stuck on the following:
I would like to get a data.table using something like:
PubDB <- as.data.table(list(TY = "txtTY",TI = "txtTI"))
which returns:
PubDB
TY TI
1: txtTY txtTI
However, what I have is a string (actually a vector of strings returned from the API call; PubStr is one element):
PubStr
## [1] "TY = \"txtTY\",TI = \"txtTI\" "
How can I convert this string to the list needed inside the as.data.table command above?
More specifically, following the first steps of my code (resp <- GET(url), then rawToChar(resp$content), then as.data.table() after some string manipulation), I have a data.table with a row for each publication and one column called PubStr that holds a string like the one above. How do I convert this string into many columns, for each row of the data.table? Note: some rows have more or fewer fields.
I am unsure of the RIS format, but if the fields in these strings are separated by commas, and within each comma-separated piece the column name and its value are separated by an equals sign, then here is a quick and dirty function that uses base R and data.table:
library(data.table)
RIS_parser_fn <- function(x){
  string_parse_list <- lapply(lapply(x,
                              function(i) tstrsplit(i, ",")),
                              function(j) lapply(tstrsplit(j, "="),
                                                 function(k) t(gsub("\\W", "", k))))
  datatable_format <- rbindlist(lapply(lapply(string_parse_list,
                                       function(i) data.table(Reduce("rbind", i))),
                                       function(j) setnames(j, unlist(j[1, ]))[-1]), fill = TRUE)
  return(datatable_format)
}
The first statement simply creates a list of lists, each containing two matrices. The outer list has as many elements as the initial vector of strings. Each inner list has exactly two matrix elements, with the number of columns equal to the number of fields in that string (determined by the ',' separator). The first matrix in each inner list holds the column headers (determined by the '=' sign) and the second matrix holds the values they are equal to. The final gsub simply removes any special characters remaining in the matrices; you may need to modify this if you want non-alphanumeric characters to remain in the values (there were none in your example).
The second statement converts these lists into one data.table object. The Reduce() call rbinds the two matrices in each element, and the result is converted to a data.table, so there is now one list of data.tables, one per initial string element. The inner lapply ("j") sets the column names to the first row of the matrix and then removes that row from the data.table. The final rbindlist() call combines the list of data.tables, which may have varying numbers of columns; fill = TRUE allows them to be combined, with NA assigned to cells that do not have that particular field.
I added a second string element with one more field to test the code:
PubStr<-c("TY = \"txtTY1\",TI = \"txtTI1\"","TY = \"txtTY2\",TI = \"txtTI2\" ,TF = \"txtTF2\"")
RIS_parser_fn(PubStr)
Returns this:
TY TI TF
1: txtTY1 txtTI1 <NA>
2: txtTY2 txtTI2 txtTF2
Hopefully this will help you out and/or stimulate some ideas for more efficient code. Best of luck!
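For the single-string case shown in the question, a simpler base-R/data.table sketch could also work; pub_string_to_dt is a hypothetical helper name, and it assumes each field is a comma-separated KEY = "value" pair:
library(data.table)
pub_string_to_dt <- function(s) {
  fields <- strsplit(s, ",")[[1]]                          # one element per KEY = "value" pair
  parts  <- strsplit(fields, "=")
  keys   <- trimws(sapply(parts, `[`, 1))                  # field names, e.g. TY, TI
  vals   <- gsub('"', "", trimws(sapply(parts, `[`, 2)))   # strip the quotes around the values
  as.data.table(as.list(setNames(vals, keys)))
}
pub_string_to_dt("TY = \"txtTY\",TI = \"txtTI\" ")
# gives a one-row data.table with columns TY and TI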

writing single column to .csv in R

Hi folks: I'm trying to write a vector of length 100 to a single-column .csv in R. Each time I try, I get two columns in the .csv file: the first with index numbers from the vector, the second with the contents of my vector. For example:
MyPath <- "~/rstudioshared/Data/HW3"
Files <- dir(MyPath)
write.csv(Files, "Names.csv", row.names = FALSE)
If I convert the vector to a data frame and then check its dimensions,
Files<-data.frame(Files)
dim(Files)
I get 100 rows by 1 column, and the column contains the names of the files in my directory folder. This is what I want.
Then I write the csv. When I open it outside of R, or read it back in and look at it, I get a 100 x 2 data frame where the first column contains the index numbers and the second column has the names of my files.
Why does this happen?
How do I write just the single column of data to the .csv?
Thanks!
Row names are written by write.csv() by default (and by default, a data frame with n rows will have row names 1,...,n). You can see this by looking at e.g.:
dat <- data.frame(mevar=rnorm(10))
# then compare what gets written by:
write.csv(dat, "outname1.csv")
# versus:
rownames(dat) <- letters[1:10]
write.csv(dat, "outname2.csv")
Just use write.csv(dat, "outname.csv", row.names=FALSE) and the row names won't show up.
And a suggestion: it might be easier/cleaner to just write the vector directly to a text file with writeLines(your_vector, "your_outfile.txt") (you can still use read.csv() to read it back in if you prefer using that :p).
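A minimal end-to-end check of the row.names = FALSE fix, using made-up file names:
Files <- c("a.csv", "b.csv", "c.csv")
write.csv(data.frame(Files), "Names.csv", row.names = FALSE)
read.csv("Names.csv")   # one column named "Files", no index column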

changing specific area from character to numeric in R programming

I use Rstudio and imported a csv file from online.
data <- read.csv("http://databank.worldbank.org/data/download/GDP.csv", stringsAsFactors = FALSE)
In the file, column X.3 is of type character.
I want to convert rows 5 to 202 from character to numeric so that I can calculate their mean.
So when I use the line below, the column still remains character:
data[c(5:202),"X.3"] <- as.numeric(gsub(",","",data[c(5:202),"X.3"]))
When I type class(data[10, "X.3"]), it still shows the output as character.
I am able to convert the whole column to numeric using
data[,"X.3"] <- as.numeric(gsub(",","",data[,"X.3"]))
but I want to convert only specific rows, i.e. 5 to 202, because the other rows of the column become NA. I am not sure how to do it.
The following changes to your code will help you make it numeric:
data <- read.csv("http://databank.worldbank.org/data/download/GDP.csv", header = T, stringsAsFactors = FALSE, skip = 3)
# skip the first 3 rows, which are just empty space/junk, and use the next row as the header
data <- data[-1, ]
# remove the first line after the header
data$US.dollars. <- as.numeric(gsub(',', '', data$US.dollars.))
# remove the thousands-separator commas so the character values can be converted to numeric
hist(data$US.dollars.) # sample plot
As mentioned in the comment, you cannot keep part of your column as character and part numeric; R doesn't allow that, and it forces type conversion to the higher order, in this case from numeric to character. You can read more about implicit coercion in R.
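A tiny illustration of that coercion rule, using a made-up vector:
x <- c("a", "b", "c")
x[1] <- 1      # the numeric 1 is coerced to the character "1"
class(x)       # still "character"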
