I have downloaded the latest version of R, am using RStudio, and am trying to convert a pgm image into a csv file using a readImage function.
However, any time I do
img <- readImage(file)
where file is the filepath
I get
Error in readImage(file) : could not find function "readImage"
Is there some other package I need to download, or am I using it wrong?
You can use the magick package to read pgm files.
First, you need to do:
install.packages("magick")
Now you call
library(magick)
In my case, I have a pgm file in my R home directory, so I make the file path with:
file <- path.expand("~/cat.pgm")
Now I can read the image and convert it into a matrix of RGB strings by doing:
img <- image_read(file)
ras <- as.raster(img)
mat <- as.matrix(ras)
To write this to csv format, I can do:
write.csv(mat, "cat.csv", row.names = FALSE)
So now I have the image saved as a csv file. To read this back in, and prove it works, I can do:
cat_csv <- read.csv("cat.csv")
cat_ras <- as.raster(as.matrix(cat_csv))
plot(cat_ras)
Note though that the csv file is very large - 9MB, which is one of the reasons why it is rarely a good idea to store an image as csv.
Created on 2022-02-05 by the reprex package (v2.0.1)
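If you do need a text format, writing through a gzip connection keeps the size down considerably. A minimal sketch, reusing the mat object from above (the file name cat.csv.gz is just an example):
write.csv(mat, file = gzfile("cat.csv.gz"), row.names = FALSE) # compressed on the way out
cat_csv <- read.csv("cat.csv.gz") # read.csv decompresses .gz files transparently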
I have a compressed file, cat.txt.tar.gz, that I just need to load into R and process as follows:
zip <- "cat.txt.tar.gz"
data <- read.delim(file = (untar(zip, "cat.txt")), sep = "\t")
but data is empty when I run the code. Is there any way to read a file from a .tar.gz archive?
Are you sure your file is named correctly?
Usually compressed files are named cat.tar.gz, excluding the .txt.
Second, note that untar() extracts the archive and returns a status code rather than a file path, so its result cannot be passed straight to read.delim(). Extract first, then read the extracted file:
tarfile <- "cat.txt.tar.gz" # Or "cat.tar.gz" if that is right
untar(tarfile) # extracts cat.txt into the working directory
data <- read.delim("cat.txt", sep = "\t")
If you are not sure what the file inside the archive is called, list the contents first with untar(tarfile, list = TRUE).
To read a particular csv or txt file inside a tar.gz archive without extracting it to disk first, one can use the archive package:
library(archive)
library(readr)
read_tsv(archive_read("cat.txt.tar.gz", file = 1), col_types = cols())
should work (read_tsv rather than read_csv here, since the file is tab-separated and read_csv has no sep argument).
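If you are not sure which file inside the archive you need, you can list the contents first and pass the right name (or position) as the file argument of archive_read(). A small sketch using the listing function from the same archive package:
library(archive)
archive("cat.txt.tar.gz") # returns a data frame describing the files inside the archive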
I have jpeg files in my local directory. I want to extract text from all the images one by one and record all the values in the corresponding cells. Can anyone please help me with the code? I have used the tesseract and magick packages to extract the text, but now I need to put it in a loop.
First of all, you have to know which files you want to read. Go to the directory where they are located and get their names with list.files.
old_dir <- getwd()
setwd('path/to/directory')
filenames <- list.files(pattern = '\\.jpg') # or '\\.jpeg'
Now the standard trick is to loop over the file names with one of the *apply functions. For the sake of simplicity, I will define a function that does the actual read and OCR text-extraction steps.
library(magick)
library(tesseract)
read_ocr <- function(file) {
  img <- image_read(file)  # load the image
  image_ocr(img)           # run OCR and return the extracted text
}
text_list <- lapply(filenames, read_ocr)
names(text_list) <- filenames
And reset the working directory when done.
setwd(old_dir)
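Since the question also asks to record the values in cells, one way to turn the list into a table and save it is the sketch below; it assumes one OCR text blob per image, and ocr_results.csv is just an example name:
results <- data.frame(file = filenames,
                      text = unlist(text_list),
                      stringsAsFactors = FALSE)
write.csv(results, "ocr_results.csv", row.names = FALSE)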
I have a BAM file. Does anyone know how to convert a BAM file to a csv file? I am trying to use R to open the BAM file, but I am not sure how to get the variables out of it. So far I have used the code below:
rm(list=ls())
#install bam packages
source("http://bioconductor.org/biocLite.R")
biocLite("Rsamtools",suppressUpdates=TRUE)
biocLite("RNAseqData.HNRNPC.bam.chr14",suppressUpdates=TRUE)
biocLite("GenomicAlignments",suppressUpdates=TRUE)
#load library
library(Rsamtools)
library(RNAseqData.HNRNPC.bam.chr14)
library(GenomicAlignments)
bamfile <- file.path("C:","Users","azzop","Desktop","I16-1144-01-esd_m1_CGCTCATT-AGGCGAAG_tophat2","accepted_hits.bam")
gal<-readGAlignments(bamfile)
gal
length(gal)
names(gal)
When I call names(gal) it gives me NULL; I am not sure that is correct.
I would like to convert the BAM file to csv, since that would make the data easier to read.
I would suggest converting BAM to BED and then reading BED file into R.
You can convert BAM to BED using bedtools.
Something like the following should work:
bamfile <- "C:/Users/azzop/Desktop/I16-1144-01-esd_m1_CGCTCATT-AGGCGAAG_tophat2/accepted_hits.bam"
# This code line sends command to convert BAM to BED (might take some time)
system(paste("bedtools bamtobed -i", bamfile, "> myBed.bed"))
library(data.table)
myData <- fread("myBed.bed")
Here I'm using the fread function from the data.table package for fast reading.
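Note that BED files have no header line, so fread() will call the columns V1, V2, and so on. Assuming bedtools bamtobed produced its default six columns, you can name them with data.table's setnames (a sketch):
setnames(myData, c("chrom", "start", "end", "name", "score", "strand"))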
I have the basic setup done following the link below:
http://htmlpreview.github.io/?https://github.com/Microsoft/AzureSMR/blob/master/inst/doc/tutorial.html
There is a method azureGetBlob which allows you to retrieve objects from the containers. However, it seems to only allow "raw" and "text" formats, which is not very useful for Excel. I've tested the connections etc.; I can retrieve .txt / .csv files but not .xlsx files.
Does anyone know any workaround for this?
Thanks
There is no file type in Azure Blob Storage; a blob is just a name, and the extension only means something to the operating system. To open an Excel file in R, you can use a third-party package such as readxl.
Workaround:
You could use the Get Blob API to download the blob to a local path, then use readxl to read the file. There is also more demo code at this link.
# install
install.packages("readxl")
# Loading
library("readxl")
# xls files
my_data <- read_excel("my_file.xls")
# xlsx files
my_data <- read_excel("my_file.xlsx")
Solved with the following code. Basically, read the file in as raw bytes, write it to disk, then read it into R:
excel_bytes <- azureGetBlob(sc, storageAccount = "accountname", container = "containername", blob=blob_name, type="raw")
q <- tempfile()
f <- file(q, 'wb')
writeBin(excel_bytes, f)
close(f)
library(xlsx) # read.xlsx with a sheetIndex argument comes from the xlsx package
result <- read.xlsx(q, sheetIndex = sheetIndex)
unlink(q)
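A variant of the same idea, if you prefer readxl (mentioned above) over the xlsx package. This is only a sketch; sc, accountname, containername, blob_name and sheetIndex are the same placeholders as in the code above:
excel_bytes <- azureGetBlob(sc, storageAccount = "accountname", container = "containername", blob = blob_name, type = "raw")
q <- tempfile(fileext = ".xlsx") # the extension helps readxl detect the format
writeBin(excel_bytes, q)
result <- readxl::read_excel(q, sheet = sheetIndex)
unlink(q)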
I'm trying to read an Excel file into R. It's the following file in my current working directory:
> list.files()
[1] "Keuren_Op_Afspraak.xlsx"
I installed XLConnect and am doing the following:
library(XLConnect)
demoExcelFile <- system.file("Keuren_Op_Afspraak.xlsx", package = "XLConnect")
wb <- loadWorkbook(demoExcelFile)
But this gives me the error:
Error: FileNotFoundException (Java): File '' could not be found - you may specify to automatically create the file if not existing.
But I don't understand where this is coming from. Any thoughts?
I prefer using the readxl package. It is backed by C and C++ libraries, so it is fast, and it also seems to handle large files better. The command would be:
library(readxl)
wb <- read_excel("Keuren_Op_Afspraak.xlsx")
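If the workbook has more than one sheet, read_excel can pick one by position or by name; a small sketch (the sheet name is just an example):
wb <- read_excel("Keuren_Op_Afspraak.xlsx", sheet = 1)        # by position
wb <- read_excel("Keuren_Op_Afspraak.xlsx", sheet = "Blad1")  # by name (example)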
You can also use the xlsx package.
library(xlsx)
wb <- read.xlsx("Keuren_Op_Afspraak.xlsx", sheetIndex = 1)
Edit (@Verena):
You can also use this function, which is much faster:
wb <- read.xlsx2("Keuren_Op_Afspraak.xlsx", sheetIndex = 1)
You have to change your code this way:
library(XLConnect)
demoExcelFile <- "Keuren_Op_Afspraak.xlsx"
wb <- loadWorkbook(demoExcelFile)
You probably took the example from here:
http://www.inside-r.org/packages/cran/XLConnect/docs/loadWorkbook
This line
system.file("demoFiles/mtcars.xlsx", package = "XLConnect")
is a way to get sample files that are part of a package. If you download the zip file of XLConnect and look at the folder structure, you will see a folder demoFiles that contains mtcars.xlsx. The parameter package = "XLConnect" tells the function to look for the file in that package.
If you type it into the command line it returns the absolute path to the file:
"C:/Users/Expecto/Documents/R/win-library/3.1/XLConnect/demoFiles/mtcars.xlsx"
To use loadWorkbook you simply need to pass the relative or absolute filepath.
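Once the workbook loads, the data itself is read with readWorksheet. A minimal sketch, assuming the data is on the first sheet:
library(XLConnect)
wb <- loadWorkbook("Keuren_Op_Afspraak.xlsx")
df <- readWorksheet(wb, sheet = 1)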