How do I select columns by name while ignoring certain characters? - r

I'm trying to pull data from a file, but only pull certain columns based on the column name.
I have this bit of code:
filepath <- ([my filepath])
files <- list.files(filepath, full.names=T)
newData <- fread(file,select=c(selectCols))
selectCols contains a list of column names (as strings). But in the data I'm pulling, there may be underscores placed differently in each file for the same data.
Here's an example:
PERIOD_ID
PERIOD_ID_
_PERIOD_ID_
And so on. I know I can use gsub to change the column names once the data is already pulled:
colnames(newData) <- gsub("_","",colnames(newData))
Then I can select by column name, but given that it's a lot of data I'm not sure this is the most efficient idea.
Is there a way to ignore underscores or other characters within the fread function?
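One possible approach, sketched here under assumptions rather than taken from an answer: read only the header with fread(nrows = 0), strip the underscores from those names, and pass the matching column positions to select so that only the wanted columns are parsed. The helper name read_selected and the sample file contents are hypothetical.

```r
library(data.table)

# Hypothetical helper: read only the columns whose names match `wanted`
# once underscores are stripped from both sides.
read_selected <- function(path, wanted) {
  # Read the header only (zero data rows) to get the raw column names.
  header  <- names(fread(path, nrows = 0, header = TRUE))
  cleaned <- gsub("_", "", header, fixed = TRUE)
  keep    <- which(cleaned %in% gsub("_", "", wanted, fixed = TRUE))
  # Second pass: parse only the matching columns, selected by position.
  dt <- fread(path, select = keep, header = TRUE)
  setnames(dt, cleaned[keep])
  dt
}

# Usage with a small made-up file.
tmp <- tempfile(fileext = ".csv")
writeLines(c("_PERIOD_ID_,VALUE_,OTHER", "1,10,x", "2,20,y"), tmp)
newData <- read_selected(tmp, c("PERIOD_ID", "VALUE"))
```

Reading the header first costs one extra file open per file, but the data itself is still parsed only once and only for the selected columns.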

Related

R, Dataset without column names

Complete noob here, especially with R.
For a school project I have to work with a specific dataset which doesn't come with column names in the dataset itself, but there is a .txt that has extra information regarding the dataset, including the column names. The problem I'm having is that when I load the dataset, RStudio assumes that the first line of data is actually the column names. Initially I just substituted the names with colnames(), but by doing so I ended up ignoring/deleting the first line of data, and I'm sure that's not the right way of dealing with it.
How can I go about adding the correct column names without deleting the first line of data? (Preferably inside R due to school work requirements)
Thanks in advance!
When reading the data with read.table, use header = FALSE so that the first line is kept as data and default column names (V1, V2, ...) are assigned automatically:
df1 <- read.table('file.txt', header = FALSE)
Then we can assign the preferred column names from the other .txt file:
colnames(df1) <- scan('names.txt', what = '', quiet = TRUE)
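As a self-contained sketch of the same idea, the names can also be passed directly through the col.names argument of read.table. The two temporary files below are hypothetical stand-ins for the dataset and for a names file that contains nothing but the column names.

```r
# Hypothetical stand-ins for the data file and the names file.
data_file  <- tempfile(fileext = ".txt")
names_file <- tempfile(fileext = ".txt")
writeLines(c("1 2.5 a", "2 3.5 b"), data_file)
writeLines("id score label", names_file)

# Read the names as a character vector, then apply them while reading.
col_names <- scan(names_file, what = "", quiet = TRUE)
df1 <- read.table(data_file, header = FALSE, col.names = col_names)
```

Because header = FALSE, both lines of the data file end up as rows of df1.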

writing single column to .csv in R

Hi folks: I'm trying to write a vector of length = 100 to a single-column .csv in R. Each time I try, I get two columns in the csv file: first with index numbers from the vector, second with the contents of my vector. For example:
MyPath<-("~/rstudioshared/Data/HW3")
Files<-dir(MyPath)
write.csv(Files,"Names.csv",row.names = FALSE)
If I convert the vector to a data frame and then check its dimensions,
Files<-data.frame(Files)
dim(Files)
I get 100 rows by 1 column, and the column contains the names of the files in my directory folder. This is what I want.
Then I write the csv. When I open it outside of R or read it back in and look at it, I get a 100 X 2 DF where the first column contains the index numbers and the second column has the names of my files.
Why does this happen?
How do I write just the single column of data to the .csv?
Thanks!
Row names are written by write.csv() by default (and by default, a data frame with n rows will have row names 1,...,n). You can see this by looking at e.g.:
dat <- data.frame(mevar=rnorm(10))
# then compare what gets written by:
write.csv(dat, "outname1.csv")
# versus:
rownames(dat) <- letters[1:10]
write.csv(dat, "outname2.csv")
Just use write.csv(dat, "outname.csv", row.names=FALSE) and the row names won't show up.
And a suggestion: it might be easier/cleaner to just write the vector directly to a text file with writeLines(your_vector, "your_outfile.txt") (you can still use read.csv() with header=FALSE to read it back in if you prefer using that :p).
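A minimal sketch comparing the two options. The vector files is a made-up stand-in for the output of dir(), and the output paths are temporary files.

```r
files <- c("a.csv", "b.csv", "c.csv")   # hypothetical stand-in for dir(MyPath)

# Option 1: a one-column CSV, with the row names suppressed.
csv_path <- tempfile(fileext = ".csv")
write.csv(data.frame(Files = files), csv_path, row.names = FALSE)
back_csv <- read.csv(csv_path)

# Option 2: one element per line in a plain text file.
txt_path <- tempfile(fileext = ".txt")
writeLines(files, txt_path)
back_txt <- readLines(txt_path)
```

With row.names = FALSE the CSV reads back as a single column. The writeLines version has no header line at all.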

How to separate one column into many columns in a .txt file?

I've been given a data set for a project that I need to reformat in order to work with it.
The problem is that all of the column names and corresponding values are mashed into one column in the file. As shown in the picture.
I'm new to R so I hardly know how to work with complex commands.
My Questions:
Is there a simple way to separate this from 1 column into 12 columns?
Desired output:
I'll also need to remove the periods between the column names and the semicolons between the values.
I just need to be able to do basic statistical analysis on the table.
Thanks
Although your data is in one column, it is semicolon-separated. The read.csv function accepts a column separator via its sep argument:
df <- read.csv(file="path/to/your/file.txt", skip=1, header=FALSE, sep=";")
The above call will generate columns based on the ; separator. I skip the first line and set header=FALSE because the header is a single string. You can then assign the column names manually via:
names(df) <- c("name1", "name2", ..., "name12")
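If the header line turns out to be semicolon-separated as well (an assumption, since the original file is not shown), the names could be taken from the file instead of typed by hand. The sketch below uses a made-up three-column file and replaces the periods in the names with underscores.

```r
# Hypothetical sample file; the real one has 12 columns.
path <- tempfile(fileext = ".txt")
writeLines(c("first.name;last.name;age", "Ann;Lee;31", "Bob;Ray;45"), path)

# Split the first line on ";" and replace the periods in each name.
header    <- readLines(path, n = 1)
col_names <- gsub(".", "_", strsplit(header, ";", fixed = TRUE)[[1]], fixed = TRUE)

# Read the remaining lines as data, then attach the cleaned names.
df <- read.csv(path, skip = 1, header = FALSE, sep = ";")
names(df) <- col_names
mean_age <- mean(df$age)
```

The numeric columns are read as numbers, so basic statistics such as mean() work on them directly.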

How to enable check.names = F in read.Alteryx WITHIN ALTERYX DESIGNER

When I import a csv data frame into R I can do
read.csv("some.csv", check.names=F)
This will keep column names with spaces in them. For example, the column name "some data column" will be read in as "some data column".
The problem is when I use the R developer tool in Alteryx to read in a csv.
read.Alteryx("#1", mode="data.frame")
The column name some data column turns into some.data.column.
Now I realize I could use regular expressions and other parsing tools to rename the columns to what they were originally but I am hoping there is an alternative.
I believe something like the following will work:
df1 = read.Alteryx("#1", mode="data.frame")
df1metadata <- read.AlteryxMetaInfo("#1")
colnames(df1) <- df1metadata$Name

R - column names in read.table and write.table starting with number and containing space

I am importing a CSV of stock data into R whose column names are stock tickers that start with a number and contain a space, e.g. "5560 JP". After reading into R, the column names are prefixed with "X" and the space is replaced by ".", e.g. "X5560.JP". After all the work is done in R, I want to write the processed data back to a new CSV, but with the original column names, e.g. "5560 JP" instead of "X5560.JP". How can I do that?
Thank you!
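When you use write.table to save your data, you can set the column names to whatever you like by passing a character vector as the col.names argument. Note that write.csv ignores attempts to change col.names (with a warning), so with write.csv you have to rename the columns before writing.
But that assumes you have the original column names available.
Once you've read in the data and R has converted the names, you've lost that information. To get around this, you can suppress the conversion when reading, save the original names, and restore them before writing:
df <- read.csv("mydata.csv", check.names=FALSE)
orig.cols <- colnames(df)
colnames(df) <- make.names(colnames(df))
[your original code]
colnames(df) <- orig.cols  # restore the original names
write.csv(df, "mydata_new.csv", row.names=FALSE)  # output file name is a placeholder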
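A minimal, self-contained sketch of the round trip. It restores the saved names on the data frame before writing, since write.csv does not let col.names be changed. The ticker values and the temporary file paths are made up for illustration.

```r
# Hypothetical input file with ticker-style column names.
in_path  <- tempfile(fileext = ".csv")
out_path <- tempfile(fileext = ".csv")
writeLines(c("\"5560 JP\",\"7203 JP\"", "1.5,2.5", "3.5,4.5"), in_path)

df <- read.csv(in_path, check.names = FALSE)
orig_cols <- colnames(df)                 # keep the original names
colnames(df) <- make.names(orig_cols)     # syntactic names for processing
safe_cols <- colnames(df)

df$X5560.JP <- df$X5560.JP * 2            # stand-in for the real processing

colnames(df) <- orig_cols                 # restore before writing
write.csv(df, out_path, row.names = FALSE)
header_out <- readLines(out_path, n = 1)
```

Working with the make.names() versions in the middle is optional. Columns with non-syntactic names can also be accessed directly with backticks or df[["5560 JP"]].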
