Error while using colSums with tab-delimited file - r

I'm new to R and I'm trying to compute some statistics from a file. It is a large data set in a tab-delimited .txt file. The import worked without problems and all of the data shows correctly as a table in RStudio. However, when I try to do any sort of calculation using colSums,
> colSums("Wages and salaries")
Error in colSums("Wages and salaries") : 'x' must be an array of at
least two dimensions
I get the error
'x' must be an array of at least two dimensions.
"Wages and Salaries" is the name of the column I'm trying to get the sum of.
Using V1 or any other column name that R created gives me a different error:
> colSums(V2)
Error in is.data.frame(x) : object 'V2' not found
The way I'm importing the file is
rm(list=ls())
filename <- read.delim("~/filename.txt", header=FALSE)
> is.data.frame(filename)
[1] TRUE
This gives me a matrix-like table with rows and columns, the same way Excel would show the data.
I want the sum of all the numbers in one column so that I can later get the sums of several different columns.
I'm very new to R and I could not find an answer to my question, as most of the examples use a very small data set that was created in R itself.

In R you can access a column in 2 ways:
filename["Wages and salaries"]
or
filename$`Wages and salaries`
The first form returns a one-column data frame, which is what colSums needs. So, please try:
colSums(filename["Wages and salaries"])
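A minimal sketch of the idea above, using a small inline table as a hypothetical stand-in for filename.txt. It assumes the first line of the file holds the column names, so it reads with header=TRUE and check.names=FALSE (otherwise R rewrites the name to Wages.and.salaries). The column "Rent" and all values are made up for illustration.

```r
# Hypothetical tab-delimited content standing in for ~/filename.txt
txt <- "Wages and salaries\tRent\n100\t20\n250\t30\n"

# header = TRUE takes column names from the first line;
# check.names = FALSE keeps the spaces in "Wages and salaries"
wages <- read.delim(text = txt, header = TRUE, check.names = FALSE)

# colSums() needs something two-dimensional, so pass a data frame,
# not a bare string or an unqualified column name
one_col <- colSums(wages["Wages and salaries"])

# For a single column, sum() on the extracted vector also works
one_sum <- sum(wages$`Wages and salaries`)

# Several columns at once
several <- colSums(wages[c("Wages and salaries", "Rent")])
print(several)
```

With header=FALSE, as in the question, the columns would be named V1, V2, ... and would still have to be reached through the data frame, for example colSums(filename["V2"]), provided that column is numeric.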

Related

Reading .h5ad file in R using Convert

I'm trying to read a .h5ad file in my RStudio.
I first converted the .h5ad file to .h5Seurat file using the Convert() function in library(SeuratDisk).
The code for my attempt can be found here:
> library(Seurat)
> library(SeuratDisk)
> Convert("train.h5ad", "train.h5Seurat")
Warning: Unknown file type: h5ad
Warning: 'assay' not set, setting to 'RNA'
Creating h5Seurat file for version 3.1.5.9900
Adding X as data
Adding X as counts
Adding meta.features from var
Adding X_Compartment_tSNE as cell embeddings for Compartment_tSNE
Adding X_tSNE as cell embeddings for tSNE
Adding layer counts as data in assay counts
Adding layer counts as counts in assay counts
> train_seurat <- LoadH5Seurat("train.h5Seurat")
Validating h5Seurat file
Error: Ambiguous assays
The data which I'm trying to read can be found here: https://drive.google.com/drive/folders/1cXYoKNU9qY0f1bbYNh2uykWG6juVJln7
To add, I tried:
> train_seurat <- LoadH5Seurat("train.h5Seurat", assays = "RNA")
But I faced the same issue. Trying to find something quick.
Try the anndata library instead, but note that the result won't be a Seurat object as you would want. It will be an AnnData class object.

rafalib - as.fumeric error " 'x' must be a character"

I'm generating a plot with rafalib loaded.
I have a dataset with a column labeled "Tissue". The entire table is in the object "b". "hc" contains an hclust of the distances between the numeric values of "b".
When I run:
myplclust(hc, xlab="distance",main="Hierarchical Clustering Dendrogram",labels=b$Tissue,lab.col=as.fumeric(b$Tissue),cex=0.5)
RStudio responds with:
Error in as.fumeric(b$Tissue) : 'x' must be a character
What's going on here? I've reset R multiple times. I have rafalib installed and active.
SOLVED:
The names I was passing to this function were not being recognized as character values, most likely because b$Tissue was stored as a factor rather than a character vector. Converting the column with as.vector inside the as.fumeric call solved the problem.
The correct code now looks like this:
myplclust(hc, xlab="distance",main="Hierarchical Clustering Dendrogram",labels=b$Tissue,lab.col=as.fumeric(as.vector(b$Tissue)),cex=0.5)
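A small base-R sketch of why the conversion helps, assuming b$Tissue was a factor (the tissue names below are hypothetical). It does not call rafalib; the last step is just one base-R way to get one colour number per label.

```r
# Hypothetical stand-in for b$Tissue as it may have been read in
tissue <- factor(c("liver", "kidney", "liver", "colon"))

# A factor is not a character vector, which is what the error complains about
was_chr <- is.character(tissue)

# Both conversions below return a plain character vector
via_vector    <- as.vector(tissue)
via_character <- as.character(tissue)

# One numeric code per label, numbered in order of first appearance
codes <- as.numeric(factor(via_character, levels = unique(via_character)))
print(codes)
```

as.character() states the intent a little more directly than as.vector(), but for a factor both give the same result.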

Error in fstRead R

I have been using the new 'fst' package in R for a few weeks to write and read tables in the .fst format. Sometimes I cannot read a table that I've just written, and I get the following message:
> tab=read.fst("Tables R/tab.fst",as.data.table=TRUE)
Error in fstRead(fileName, columns, from, to) :
Unknown type found in column.
Do you know why this happens? Is there another way to retrieve the table?

dplyr rename command with spaces

I have tried multiple variations of the rename function in dplyr.
I have a data frame called alldata, pulled from a database, with a column named WindDirection:N. I am trying to rename it to Wind Direction. I understand that variable names containing spaces are not good practice, but I want this name to improve readability in a selectInput list in shiny. Even if I settle for renaming it to WindDirection, I get all of the same error messages.
I have tried:
rename(alldata, Wind Direction = WindDirection:N)
which gives the error message:
Error: unexpected symbol in "rename(alldata, Wind Direction"
rename(alldata, `Wind Direction` = `WindDirection:N`)
which does not give an error message, but also does not rename the variable
rename(alldata, "Wind Direction" = "WindDirection:N")
which gives the error message:
Error: Arguments to rename must be unquoted variable names. Arguments Wind Direction are not.
I then tried the same three combinations in reverse order, with the old name first and the new name second (that is how plyr works, although I never load it with the library command in my code), and got similar error messages.
I then tried specifying the package, as in the example below, and tried all six combinations again:
dplyr::rename(alldata, `Wind Direction` = `WindDirection:N`)
with results similar to the first time.
I used the following thread as a guide while trying to do this myself:
Replacement for "rename" in dplyr
As agenis pointed out, my mistake was not assigning the result back to the data frame after renaming the variable.
So where I had
dplyr::rename(alldata, `Wind Direction` = `WindDirection:N`)
I should have had
alldata <- dplyr::rename(alldata, `Wind Direction` = `WindDirection:N`)
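The difference can be seen on a tiny data frame. This is only a sketch: the values are made up, and check.names=FALSE is used so that the hypothetical column keeps the name WindDirection:N.

```r
library(dplyr)

# Hypothetical stand-in for the data frame pulled from the database
alldata <- data.frame(`WindDirection:N` = c(90, 180, 270), check.names = FALSE)

# rename() returns a modified copy; without assignment the original is untouched
renamed_copy <- rename(alldata, `Wind Direction` = `WindDirection:N`)
names_before <- names(alldata)

# Assigning the result back is what actually changes alldata
alldata <- rename(alldata, `Wind Direction` = `WindDirection:N`)
names_after <- names(alldata)

print(names_before)
print(names_after)
```

This is why the backtick version appeared to do nothing: it printed a renamed copy and left alldata as it was.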

How to read data from excel in r?

I am trying to prepare data for cluster analysis. I have prepared data tables in Excel with the headers "id", "name", "crime_type", "crime_date", "gender", "age".
Then I converted the Excel file into .csv format.
Then I ran the following commands:
> crime <- read.csv("crime_data.csv", header=TRUE)
> crime   # printing works and shows the data
> # now I will cluster with kmeans()
> kmeans.result <- kmeans(crime, 3)
But it shows an error:
Error in do_one(nmeth) : NA/NaN/Inf in foreign function call (arg 1)
In addition: Warning message:
In kmeans(crime, 3) : NAs introduced by coercion
What am I doing wrong here?
I can't speak to your specific problem without knowing what your data looks like, but it could be as simple as giving the xlsx package a try. I think it handles NaNs better.
install.packages("xlsx")
library(xlsx)
yourdata <- read.xlsx("YOURDATASHEET.xlsx", sheetName="THESHEETNAME")
It seems like you are asking two questions. For the first, you can also try reading directly from the clipboard (beware of large tables though; so far I have had good results with 40k rows and 30 columns):
d1<-read.table(file="clipboard",sep="\t",header=FALSE,stringsAsFactors=FALSE)
Set header to TRUE if you want to take the column names from the first row. You can also use what was suggested above to open Excel sheets directly, but this might not be practical if you have non-standard tables.
For the second part, perhaps you should convert the columns to numeric using the sapply function, possibly together with suppressWarnings().
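A hedged sketch of the second part: kmeans() converts its input to a numeric matrix, so text columns such as name or crime_type turn into NA, which matches the warning in the question. One option is to cluster only on numeric columns. The data frame below is a made-up stand-in for crime_data.csv (crime_date is left out for brevity), and clustering on age alone is only an illustration.

```r
# Hypothetical stand-in for the data read from crime_data.csv
crime <- data.frame(
  id         = 1:6,
  name       = c("a", "b", "c", "d", "e", "f"),
  crime_type = c("theft", "fraud", "theft", "assault", "fraud", "theft"),
  gender     = c("m", "f", "m", "m", "f", "f"),
  age        = c(19, 22, 21, 45, 51, 48),
  stringsAsFactors = FALSE
)

# Find the numeric columns; id is numeric but is a label, not a feature
is_num   <- sapply(crime, is.numeric)
features <- crime[, is_num & names(crime) != "id", drop = FALSE]

# Drop rows with missing values, which kmeans() cannot handle
features <- na.omit(features)

set.seed(42)  # kmeans() starts from random centers
kmeans.result <- kmeans(features, centers = 2, nstart = 5)
print(table(kmeans.result$cluster))
```

Categorical columns such as crime_type or gender would need an explicit numeric encoding, or a different clustering method, before they could take part in the analysis.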
