csv file, or tab delimited file, or whatever file [closed] - r

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 3 years ago.
I have a question about file types.
It comes up when using functions, for example in R, that import data from an outside file.
I have a file character_student.txt with data like below.
Name, Age, Gender, Test1
john, 11, M, 90
betty, 25, F, 33
I am confused. Is the above file considered a CSV (comma-separated) file, or is it a text file? When importing this file into R, is it appropriate to use, say, read.csv(file="character_student.txt")?
Then the other question that I have is, if I have a file like this:
Name Age Gender Test1
john 11 M 90
betty 25 F 33
So there is only a single space between fields. If I then saved it as a .csv file, I think the filename would become something like character_text.csv. Is this file now a space-delimited file or a comma-delimited file?
I guess my question is: how do I know whether a file is comma-separated, space-delimited, or tab-delimited?
Is it purely based on the name of the file? For example, if the name ends with csv, is it a comma-separated file, and if it ends with something else, is it a "something else"-delimited file? Does it matter what the inside of the file actually looks like? Do we have to open the file and check that a comma separates the fields to be sure it is a comma-separated file? Could a csv file have fields separated by something else inside?
Or, if it is called csv, is every field inside guaranteed to be separated by a comma (so that I don't have to open it to check)?

Extensions don't define files. They only hint to utilities and tools how a file should be processed.
Suppose you write a Python script and save it as hello.c.
You then pass it to gcc with gcc hello.c.
Nothing stops you from doing that. gcc will try to compile the file as C, but it will report lots of syntax errors.
Similarly, by specifying .csv, you are telling the tool, utility or function that you are passing a comma-separated file. Whether the contents really are comma-separated is a separate matter.
If you have a file like:
abc def, ghi jkl,
One user may want to extract the data as the four fields
"abc", "def,", "ghi" and "jkl,". That user should treat it as a space-separated file. Another user may want the fields
"abc def" and " ghi jkl", so it is useful for them to treat it as a comma-separated file.
For a particular case, you need to study the function or tool you are using and find out what layout it expects. So yes, if a tool wants a file in a particular layout, you need to check the contents before passing the file to that tool.
It's just about how you want to interpret it.
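To make the point above concrete, here is a minimal sketch in Python (not R) that interprets the same text with two different delimiters, and then parses the question's sample data from a string. The variable names are illustrative assumptions; the sample text is copied from the question, and the file name never enters into the parsing.

```python
import csv
import io

# The same line, interpreted two ways. Only the chosen delimiter matters.
line = "abc def, ghi jkl,"
space_fields = line.split(" ")   # treat it as space-separated
comma_fields = line.split(",")   # treat it as comma-separated

# The question's sample data, as it might sit in character_student.txt.
# The parser is told the delimiter explicitly; the extension is irrelevant.
sample = "Name, Age, Gender, Test1\njohn, 11, M, 90\nbetty, 25, F, 33\n"
rows = list(csv.reader(io.StringIO(sample), delimiter=",", skipinitialspace=True))

print(space_fields)
print(comma_fields)
print(rows)
```

The trailing comma in the line produces an empty last field under the comma interpretation, which is one more reason to look inside a file before choosing how to read it.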

Related

How to select a random word from a txt file in python version 3.4?

I have a txt file called 'All_Words' that consists of 2000 words, and I'm making a hangman game, so I need to choose a random word. I've already thought of picking a random number from 0 to 2000 and reading the line with that number, but I don't know how to do that. Some background info:
I am in 8th grade and I like coding. I'm trying to get better, so I try to take what people suggest and figure out what every part does, including reserved words such as 'global'.
I have also tried to just shuffle the txt file, because I already got it to print the first word; if I'm able to shuffle the file, it would print a different word, and I could add an if statement saying that if the chosen word was already chosen, it should shuffle again and pick the first word again. I got the idea of shuffling the txt file from my dad, but he only did it in something called 'DOS', before it was even called coding, so I don't know if it would work in Python. I've asked my coding teacher, and he said he doesn't know how to do that because he is used to Java and JavaScript.
This is what I have so far. I would like it to pick only one word instead of every word in order:
import random
with open("All_Words.txt") as file:
    for line in file:
        print(line)
        break
Assuming each word is on its own line in the file, you can read the text file into a list and use random.choice() to pick a random element of the list. Then you remove the word from the list so you don't pick it again.
import random

# Read every word (one per line) into a list, skipping blank lines
with open("All_Words.txt", "r") as file:
    listOfWords = [line.strip() for line in file if line.strip()]

# Pick a random word, then remove it so it cannot be picked again
randWord = random.choice(listOfWords)
print(randWord)
listOfWords.remove(randWord)

# The next pick comes from the remaining words only
newUniqueRandWord = random.choice(listOfWords)
print(newUniqueRandWord)
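The shuffling idea from the question also works in Python. The sketch below is one possible way to do it, not the answer's method: it shuffles a copy of the word list once and then takes words off the end, so no word can repeat. The helper name shuffled_words and the short word list are made up for illustration; in the real program the list would come from All_Words.txt.

```python
import random

def shuffled_words(words):
    """Return a shuffled copy of the word list, leaving the original untouched."""
    deck = list(words)
    random.shuffle(deck)  # shuffles the copy in place
    return deck

# Stand-in for the contents of All_Words.txt (illustrative data only)
words = ["apple", "banana", "cherry", "grape"]

deck = shuffled_words(words)
first = deck.pop()   # take one word off the shuffled deck
second = deck.pop()  # the next word is guaranteed to be a different entry

print(first, second)
```

Because each pop() removes the word from the deck, there is no need for an if statement that checks whether a word was already used.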

How Can I Properly Export data from DB in which some values have special character like "\r"?

I have a table in my DB, and one of its columns contains special characters like "\r" (a carriage return, as typed with Enter). Maybe these were entered by the typist who recorded the survey data; I think this column came from an essay question.
Because of this, some cells contain special characters.
With the DB tool, exporting the table to an Excel file works fine. But exporting it to a delimited file such as CSV does not, even with R's write.table. The "\r" character starts a new line, so the line count grows from 69297 to 69454.
Is there a way to handle this?
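One common way to handle this, sketched here in Python rather than with the DB tool or R, is to quote fields on export and to read the file back with a CSV-aware parser instead of counting raw lines. This is a hedged illustration with made-up rows, not a description of what the asker's tools do: Python's csv writer quotes a field that contains "\r", and the csv reader then keeps it as one field.

```python
import csv
import io

# Made-up rows; the second one has a carriage return inside a cell
rows = [["id", "comment"], ["1", "good\rservice"], ["2", "fine"]]

# Write the rows as CSV text; newline="" stops Python from altering line endings
buffer = io.StringIO(newline="")
csv.writer(buffer).writerows(rows)
text = buffer.getvalue()

# Counting raw lines overcounts, because the embedded "\r" looks like a line break
naive_line_count = len(text.splitlines())

# A CSV-aware reader respects the quotes and recovers the original rows
parsed = list(csv.reader(io.StringIO(text, newline="")))

print(naive_line_count, len(parsed))
```

The raw line count is larger than the number of records, which mirrors the jump described in the question, while the parsed result has exactly the original rows.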

Extract exactly one file (any) from each 7zip archive, in bulk (Unix)

I have 1,500 7zip archives; each archive contains 2 to 10 files, with no subdirectories.
Each file has the same extension, but the filenames vary.
I only want one file out of each archive, but I'd like to perform this in bulk. I do not care which file is taken out, as long as only one file is taken out. It can be the first file, the newest, the biggest, the smallest, it doesn't matter.
Here's an example:
aa.7z {blah 56.smc, blah 57.smc, 1 blah 58.smc}
ab.7z {xx.smc, xx 1.smc, xx_2.smc}
ac.7z {1.smc}
I want to run something equivalent to:
7z e *.7z # But somehow only extract one file
Thank you!
Ultimately my solution was to extract all files and run the following in the directory:
for n in *; do echo "$n"; done > files.txt
I then imported that list into Excel and split each filename at the special character that separates the title from the qualifying data in the filename (for example: Some Title (V1) [X2].smc); specifically, I used a bracket as the delimiter.
Then I removed all duplicates, leaving only one edition of each title from the zip. Finally I re-merged the columns (the bracket was deleted during the split, so I wrote a function to add it back whenever the next column had content) and re-saved files.txt. After reviewing Stack Overflow for answers, I deleted files based on that input file (files.txt). A word of warning: spaces in filenames cause problems with rm and xargs, so I had to wrap the variable in quotes.
Ultimately this still didn't serve me well enough so I just used a different resource entirely.
Posting this answer so others who find themselves in a similar predicament find an alternative resolution.
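The spreadsheet steps described above (split each name at a bracket, drop duplicates, keep one edition per title) can also be sketched in a few lines of Python. This is an illustration under assumptions, not the script the answer used: the helper names are hypothetical, a "title" is assumed to be the text before the first "(" or "[", and only the first file name below comes from the answer.

```python
import re

def title_of(filename):
    """Assumed notion of a title: the text before the first '(' or '['."""
    return re.split(r"[(\[]", filename, maxsplit=1)[0].strip()

def split_keep_and_delete(filenames):
    """Keep the first file seen for each title; mark the rest for deletion."""
    keep, delete, seen = [], [], set()
    for name in filenames:
        title = title_of(name)
        if title in seen:
            delete.append(name)
        else:
            seen.add(title)
            keep.append(name)
    return keep, delete

# Illustrative names; only the first one appears in the answer
names = [
    "Some Title (V1) [X2].smc",
    "Some Title (V2) [X1].smc",
    "Other Game [X1].smc",
]
keep, delete = split_keep_and_delete(names)
print(keep)
print(delete)
```

Working with the names as Python strings also sidesteps the quoting problems with spaces that the answer ran into with rm and xargs.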

In R, How to remove some unwanted charaters from the CSV file names and also extract dates? [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 7 years ago.
I have a folder containing some 2000 CSVs whose file names contain the characters '[ ]', e.g. [Residential]20151001_0000_1.csv
I want to:
1. Remove '[]' from the names, so that the file name becomes Residential_20151001_0000_1.csv, and place the new files in a new folder.
2. Then read all the files from that new folder into one data frame (without header), skipping the first row of each file.
3. Also extract 20151001 as a date (e.g. 2015-10-01) into a new vector or list, such that the result is:
File Name                          Date
Residential_20151001_0000_1.csv    2015-10-01
This code answers your first question, albeit with a small change in logic.
First, let's create a backup of all the CSVs containing [] by copying them to another folder. For example, if your CSVs are in the directory "/Users/xxxx/Desktop/Sub", we will copy them into the folder Backup.
library(tools)  # provides file_path_sans_ext()
setwd("/Users/xxxx/Desktop/Sub")
dir.create("Backup")
# pattern is a regular expression, so escape the dot and anchor it at the end
files <- data.frame(file = list.files(path = ".", pattern = "\\.csv$"),
                    stringsAsFactors = FALSE)
# file.copy() is vectorised over 'from', so no loop is needed
file.copy(from = file.path("/Users/xxxx/Desktop/Sub", files$file),
          to = "/Users/xxxx/Desktop/Sub/Backup")
This has now copied all the CSV files to the folder Backup.
Now let's work out the new names for the files in your original working directory by removing the "[]".
I have taken a slightly longer route by creating a data frame with the old names and the new names, to make things easier for you.
Name<-file_path_sans_ext(files$file)
files<-cbind(files, Name)
files$Name<-gsub("\\[", "",files$Name)
files$Name<-gsub("\\]", "_",files$Name)
files$Name<-paste(files$Name,".csv",sep="")
This data frame looks like:
files
                              file                            Name
1 [Residential]20150928_0000_4.csv Residential_20150928_0000_4.csv
2 [Residential]20151001_0000_1.csv Residential_20151001_0000_1.csv
3 [Residential]20151101_0000_3.csv Residential_20151101_0000_3.csv
4 [Residential]20151121_0000_2.csv Residential_20151121_0000_2.csv
5 [Residential]20151231_0000_5.csv Residential_20151231_0000_5.csv
Now let's rename the files to remove the "[]". The idea here is to replace file with Name:
# file.rename() is vectorised, so a single call renames every file
file.rename(from = file.path("/Users/xxxx/Desktop/Sub", files$file),
            to = file.path("/Users/xxxx/Desktop/Sub", files$Name))
Your files are now renamed. If you run list.files(path=".", pattern = "\\.csv$"), you will get the new file names:
"Residential_20150928_0000_4.csv"
"Residential_20151001_0000_1.csv"
"Residential_20151101_0000_3.csv"
"Residential_20151121_0000_2.csv"
"Residential_20151231_0000_5.csv"
Try it!
In order:
1. After googling "r replace part of string" I found: R - how to replace parts of variable strings within data frame. This should get you up and running for this issue.
2. For skipping the first line, read the documentation of read.csv. There you will find the skip argument.
3. Have a look at the strftime/strptime functions. Alternatively, have a look at lubridate.
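The renaming rule and the date extraction discussed in this thread are both plain string operations, so they can be sketched compactly. The example below is in Python rather than R and is only an illustration: the function names are hypothetical, and it assumes the date is the first run of eight digits in the name, written as YYYYMMDD.

```python
import re
from datetime import datetime

def clean_name(filename):
    """Drop '[' and turn ']' into '_', mirroring the two gsub() calls above."""
    return filename.replace("[", "").replace("]", "_")

def date_from_name(filename):
    """Parse the first run of eight digits as a YYYYMMDD date (an assumption)."""
    match = re.search(r"\d{8}", filename)
    if match is None:
        raise ValueError(f"no 8-digit date in {filename!r}")
    return datetime.strptime(match.group(), "%Y%m%d").date()

old = "[Residential]20151001_0000_1.csv"
new = clean_name(old)
when = date_from_name(new)

print(new, when.isoformat())
```

In R, the equivalent parsing step would be the strptime route suggested in the answer above.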

Reading a file into R with partly unknown filename

Is there a way to read a file into R when I do not know the complete file name? Something like:
read.csv("abc_*")
In this case I do not know the rest of the file name after abc_.
If you have exactly one file matching your criteria, you can do it like this:
read.csv(dir(pattern='^abc_')[1])
If there is more than one file, this approach just uses the first hit. In a more elaborate version you could loop over all matches and append them to one data frame, or something like that.
Note that the pattern is a regular expression, not a shell wildcard, so it differs a bit from what you may have expected (and from what I wrongly assumed in my first attempt at answering the question). Details can be found using ?regex
If the file is in another directory, you have to modify the dir command accordingly:
read.csv(dir('path/to/your/file', full.names=T, pattern="^abc"))
The path you pass in your case may be c:\\users\\user\\desktop, with the pattern as above. full.names=T forces dir() to return the whole path and not only the file name. Try running dir(...) without the read.csv to understand what is happening there.
If you want to give the path and the start of the file name as one complete string, it gets a bit more complicated:
filepath <- 'path/to/your/file/abc_'
read.csv(dir(dirname(filepath), full.names=T, pattern=paste("^", basename(filepath), sep='')))
That process will fail if your file name contains any regular expression metacharacters (such as ., + or parentheses). You would have to replace them with their escaped forms up front. But that again is another topic.
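The escaping problem mentioned above disappears if the match is done on a literal prefix instead of a regular expression. As a hedged illustration in Python (not R), the sketch below lists the files in a directory whose names start with a given prefix; the helper name find_by_prefix and the file names in the temporary directory are made up for the demonstration.

```python
import os
import tempfile

def find_by_prefix(directory, prefix):
    """Return sorted full paths of entries in `directory` whose names start with `prefix`."""
    return sorted(
        os.path.join(directory, name)
        for name in os.listdir(directory)
        if name.startswith(prefix)  # literal comparison, no regex escaping needed
    )

# Build a throwaway directory with a few illustrative files
with tempfile.TemporaryDirectory() as tmp:
    for name in ("abc_2020.csv", "abc_.x+y.csv", "xyz_1.csv"):
        with open(os.path.join(tmp, name), "w") as handle:
            handle.write("a,b\n1,2\n")
    matches = [os.path.basename(path) for path in find_by_prefix(tmp, "abc_")]

print(matches)
```

As with the dir() approach, more than one file can match, so the caller still has to decide whether to take the first hit or loop over all of them.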

Resources