I have a csv file and am unsure how to get R to interpret it as a table because all the title info is in one cell and all the data relating to the titles is in a separate cell. So all the info I need is in 2 cells but it actually needs to be split up.
The cell A3 has a value called 'Team' , this corresponds to the part in the cell A4 that says 'Visitor'. Then each part after than corresponds to the bit below it. ..sorry I don't know how to describe it, but ultimately it would look like this …
Looks like the field separator in your data is a ;
read.csv has a parameter sep to change the field separator and another parameter header to tell it there is an initial line containing the column names. Use read.csv like this:
data = read.csv(file="/mydir/myfile.csv", sep=";", header=T)
To test you can print out the first 5 lines of the data table with:
head(data,5)
Related
I have got this .txt file outputed by a microscope to process.
#read the .txt file generated by microscope, skipping the first 9 lines of garbage information
df <- read.csv("Objects_Population - AllCells.txt", sep="\t", skip = 9,header=TRUE, fill = T)
Then I started looking at the structure of the dataframe, everything seems fine except I now found an extra column in the end of the data frame named "x.1" and all rows of it are NA values. I don't see this column when I open the .txt file in excel. I suspect the problem has something to do with the column names generated by microscope, they contain quite some special characters
Below is the dataframe read by Excel(only showing the last 2 columns since I have 132 columns, and their names are disgustingly long):
AllCells - Cell Contact Area with Neighbors [%] AllCells - Nucleus Nearest Neighbor Distance [µm]
0 4.82083
21.9512 0
15.7895 0
29.4118 0.584611
0 4.21569
0 1.99599
0 3.50767
...
This has happened to me before but I never took it too serious as I was always interested in a subset of my data frame. Now I'm looking at all columns then this starts to bothering me.
Is there any way I can read them correctly without R attaching that additional "X.1" column in the end? Preferably not manually delete or subset out the last column...
Cheers,
ML
If all other column names are correct, you have probably a trailing \t in the text file. R tries to include it and gives it the generic column name X.1.
You could try and read the file first as 'plain text' and remove the trailing \t and only then use read.csv:
file_connection <- file("Objects_Population - AllCells.txt")
content <- readLines(file_connection )
close(file_connection)
Now we try to get rid of these trailing \t (this might need some testing to fit your needs)
sanitized <- gsub("\\t$", "", content)
And then we read this sanitized string as if it was a file (using the argument text)
df <- read.csv(text=paste0(sanitized, collapse="\n"), sep="\t", skip = 9,header=TRUE, fill = T)
Had that problem too. Fixed it by saving the file as "CSV (MS-DOS (*csv)" instead of what I originally had as "CSV (Comma delimited)(*csv)".
This is almost certainly because you've got an extra empty column in your spreadsheet.
In Excel, open your sheet and press Ctrl-End. If you end up in an empty cell outside the range of your data, there's the problem. Select the column (Ctrl-Space), right-click, and choose Delete.
I also encountered similar problem. I found that three extra columns were created (X, X.1, X.2), after I loaded dataset from excel sheet to R studio.
Steps Followed by me:
a) I went to the excel sheet and selected those three extra columns after last column with actual values in excel sheet. Selected extra 3 columns by keeping cursor on top of columns and then right click the mouse and select delete.
b) Again loaded that excel sheet in R. I did not find those 3 columns.
I have csv file contains iphone device roadmap like version number, name of model, release of model , price etc. I have done following:
I have imported data set in Rstudio in variable name iphonedetail by following command. iphonedetail <-read.csv("iphodedata.csv")
Than i hv changed the attribute "name of model" to character by using following: iphonedetail$nameofmodel <- as.character(iphonedetail$nameofmodel)
Now i need to access 1st 5 name of model and store them in vector .
I tried this to achieve : iphonesubset <- data.frame(iphonedetail$nameofmodel)
Then on console i typed iphonesubset, but gave 0 col and row.
Could someone help in above 2 steps correct or not ? and also suggest how to fix 3rd step?
if you want to extract the first five (non unique):
iphonedf1to5 <- df[1:5,]
That means that you get the first 5 rows and all columns. Then if you want to get the unique first five elements it should be like:
iphonedf1to5 <- unique(df[1:5,])
Edit:
df means your data frame of the read csv, iphonedetail in your case.
I have a folder with tons of txt files from where I have to extract especific data. The problem is that the format of the file has changed once and the position of the data I need to extract has also changed. So I need to deal with files in different format.
To try to make it more clear, in column 4 I have the name of the variable and in 5 I have the value, but sometimes this is in a different row. Is there a way to find the name of the variable (in which row) and then extract its value?
Thanks in advance
EDITING
In some files I will have the data like this:
Column 1-------Column 2.
Device ID------A.
Voltage------- 500.
Current--------28
But in some point in life, there was a change in the software to add another variable and the new file iis like this:
Column 1-------Column 2.
Device ID------A.
Voltage------- 500.
Error------------5.
Current--------28
So I need to deal with these 2 types of data, extracting the same variables which are in different rows.
If these files can't be read with read.table use readLines and then find those lines that start with the keyword you need.
For example:
Sample file 1 (with the dashes included and extra line breaks):
Column 1-------Column 2.
Device ID------A.
Voltage------- 500.
Error------------5.
Current--------28
Sample file2 (with a comma as separator):
Column 1,Column 2.
Device ID,A.
Current,555
Voltage, 500.
Error,5.
For both cases do:
text = readLines(con = file("your filename here"))
curr = text[grepl("^Current", text, ignore.case = T)]
Which returns:
for file 1:
[1] "Current--------28"
for file 2:
[1] "Current,555"
Then use gsub to remove anything that is not a number.
I have a .txt file containing text (which I don't want) and 65 tables, as shown below (just the top of the .txt file)
Does anyone know how I can extract only the tables from this text file, such that I can open the resulting .txt file as a data.frame with my 65 tables in R? Above each table is a fixed number of lines (starting with "The result of abcpred on seq..." and ending with "Predicted B cell epitopes") and below each of them is a variable number of lines, depending on how many rows each tables has. Then it comes the next table, and it goes like that until I reach the 65th table.
Given that the tables are the only elements that start with numbers, to grep for integers at the beginning of the line is indeed the best solution. Using the shell (and not R) the command:
grep '^[0-9]' input > output
did exactly what I wanted.
I chunked several novels into a data frame called documents. I want to export each chunk as a separate .txt file.
The data frame that consists of two columns. The first column has the file names for each chunk, and the second column has the actual text that would go into the file.
documents[1,1]
[1] "Beloved.txt_1"
documents[1,2]
[1] "124 was spiteful full of a baby's venom the women......"
class(documents)
[1] "data.frame"
I'm trying to write a for loop that would take each row, make the second column into a .txt file, and make the first column the name of the file. And then to iterate for each row. I've been working with something like this:
for (i in 1:ncol(documents)) {
write(tagged_text, paste("data/taggedCorpus/",
documents[i], ".txt", sep=""))
I've also been reading that maybe the cat function would work well here?
I'm not positive this will work for you (a little more of an example of your input and desired output would help), but one issue you've got is that your for loop is by column rather than by row. If you want to do this once for every row, then it needs to be for (i in 1:nrow(documents) rather than ncol.
Assuming that "documents" is the name of your data.frame and that the column containing the text you want to save is called "tagged_text" and the column with the file name is called "file", try this:
for (i in 1:nrow(documents)) {
write(documents$tagged_text[i], paste0("data/taggedCorpus/",
documents$file[i], ".txt"))
}
Note that you don't need to specify the path every time if you already set it before you start the loop.