How to enable check strings = F in read.Alteryx WITHIN ALTERYX DESIGNER - r

When I import a csv data frame into R I can do
read.csv("some.csv", check.names=F)
This will keep column names with spaces in them. For example the column name
some data column will be read in as some data column.
The problem is when I use the R developer tool in Alteryx to read in a csv.
read.Alteryx("#1", mode="data.frame")
The column name some data column turns into some.data.column.
Now I realize I could use regular expressions and other parsing tools to rename the columns to what they were originally but I am hoping there is an alternative.

I believe something like to following will work:
df1 = read.Alteryx("#1", mode="data.frame")
df1metadata <- read.AlteryxMetaInfo("#1")
colnames(df1) <- df1metadata$Name

Related

Appending two excel files into one dataframe

I am trying to append two excel files from Excel into R.
I am using the following the code to do so:
rm(list = ls(all.names = TRUE))
library(rio) #this is for the excel appending
library("dplyr") #this is required for filtering and selecting
library(tidyverse)
library(openxlsx)
path1 <- "A:/Users/Desktop/Test1.xlsx"
path2 <- "A:/Users/Desktop/Test2.xlsx"
dat = bind_rows(path1,path2)
Output
> dat = bind_rows(path1,path2)
Error: Argument 1 must have names.
Run `rlang::last_error()` to see where the error occurred
I appreciate that this is more for combining rows together, but can someone help me with combining difference workbooks into one dataframe in R Studio?
bind_rows() works with data frames AFTER they have been loaded into the R environment. Here are you merely trying to "bind" 2 strings of characters together, hence the error. First you need to import the data from Excel, which you could do with something like:
test_df1 <- readxl::read_xlsx(path1)
test_df2 <- readxl::read_xlsx(path2)
and then you should be able to run:
test_df <- bind_rows(test_df1, test_df2)
A quicker way would be to iterate the process using the map function from purrr:
test_df <- map_df(c(path1, path2), readxl::read_xlsx)
If you want to append one under the other, meaning that both excels have the same columns, I would
select the rows I wanted from the first excel and create a dataframe
select the rows from the second excel and create a second dataframe
append them with rbind().
On the other hand if you want to append the one next to another, I would choose the columns needed from first and second excel into two dataframes respectively and then I would go with cbind()

R, Dataset without column names

Complete noob here, specially with R.
For a school project I have to work with a specific dataset which doesn't come with column names in the dataset it self but there is a .txt that has extra information regarding the dataset, including the column names. The problem I'm having is that when I load the dataset rstudio assumes that the first line of data is actually the column names. Initially I just substituted the name with colnames() but by doing so I ended up ignoring/deleting the first line of data, and I'm sure that's not the right away of dealing with it.
How can I go about adding the correct column names without deleting the first line of data? (Preferably inside R due to school work requirements)
Thanks in advance!
When we read the data with read.table, use header = FALSE so that it automatically assigns a column name
df1 <- read.table('file.txt', header = FALSE)
Then, we can assign the preferred column names from the other .txt column
colnames(df1) <- scan('names.txt', what = '', quiet = TRUE)

How do I select columns by name while ignoring certain characters?

I'm trying to pull data from a file, but only pull certain columns based on the column name.
I have this bit of code:
filepath <- ([my filepath])
files <- list.files(filepath, full.names=T)
newData <- fread(file,select=c(selectCols))
selectCols contains a list of column names (as strings). But in the data I'm pulling, there may be underscores placed differently in each file for the same data.
Here's an example:
PERIOD_ID
PERIOD_ID_
_PERIOD_ID_
And so on. I know I can use gsub to change the column names once the data is already pulled:
colnames(newData) <- gsub("_","",newData)
Then I can select by column name, but given that it's a lot of data I'm not sure this is the most efficient idea.
Is there a way to do ignore underscores or other characters within the fread function?

Comparing column headers of two files to fetch data in R

I have a large CSV file, say INPUT, with about 500+ columns. I also have a dataframe DF that contains a subset of the column headers of INPUT which changes at every iteration.
I have to fetch the data from only those columns of INPUT that is present in the dataframe DF and write it into another CSV file, say OUTPUT.
In short,
INPUT.csv:
ID,Col_A,Col_B,Col_C,Col_D,Col_E,Col_F,,,,,,,,,,,,,Col_S,,,,,,,,,,,,,,,,Col_Z
1,009,abcd,67,xvz,33,50,,,,,,,,,,,,,,,,,,,,,,,,,,,,oup,,,,,,,,,,,,,,,,,,90
2,007,efgh,87,wuy,56,67,,,,,,,,,,,,,,,,,,,,,,,,,,,,ghj,,,,,,,,,,,,,,,,,,,,888
print(DF):
[1] "Col_D" "Col_Z"
[3] "Col_F" "Col_S"
OUTPUT.csv
ID,Col_D,Col_Z,Col_F,Col_S
1,xvz,90,50,oup
2,wuy,888,67,ghj
I'm a beginner when it comes to R. I would prefer for the matching of dataframe with the INPUT file to be automated, because i don't want to do this everyday when the dataframe gets updated.
I'm not sure whether this is the answer :
input <- read.table(...)
input[colnames(input) %in% colnames(DF)]
if I understand it correctly, you need to import the INPUT.csv file inside R and then match the columns of your DF with those columns of your INPUT, is that correct?
you can either use the match function or just import the INPUT.csv file inside RStudio via "Import Dataset" button and subset it. Subsetting of imported dataframes is fairly easy.
If you will import your dataset as INPUT, then you can make the subset of these columns in following way: INPUT[,c(1,2,4)]
and that will get you first, second and fourth column of the INPUT dataset.
First, to upload the csv is simple:
dataframe_read <- read.csv('/Path/to/csv/')
If I understand correctly that one dataframes columns is always a subset, the code is as follows:
### Example Dataframes
df1 <- data_frame(one = c(1,3,4), two= c(1,2,3), three = c(1,2,3))
df2 <- data_frame(one = c(1,3,4), three= c(1,2,3))
### Make new data frame
df3 <- df1[,colnames(df2)]
### Write new dataframe
write.csv(df3, 'hello.csv')

read multiple csv files with the same column headings and find the mean

Is it possible to read multiple csv excell files into R. All of the csv files have the same 4 columns. the first is a character, the second and third are numeric and the fourth is integer. I want to combine the data in each numeric column and find the mean.
I can get the csv files into R with
data <- list.files(directory)
myFiles <- paste(directory,data[id],sep="/")
I am unable to get the numbers from the individual columns add them and find the mean.
I am completely new to R and any advice is appreciated.
Here is a simple method:
Prep: Generate dummy data: (You already have this)
dummy <- data.frame(names=rep("a",4), a=1:4,b=5:8)
write.csv(dummy,file="data01.csv",row.names=F)
write.csv(dummy,file="data02.csv",row.names=F)
write.csv(dummy,file="data03.csv",row.names=F)
Step0: Load the file names: (just like you are doing)
data <- dir(getwd(),".csv")
Step1: Read and combine:
DF <- do.call(rbind,lapply(data,function(fn) read.csv(file=fn,header=T)))
DF
Step2: Find mean of appropriate columns:
apply(DF[,2:3],2,mean)
Hope that helps!!
EDIT: If you are having trouble with file path, try ?file.path.

Resources