How to pull the required columns from the csv file? [duplicate] - r

This question already has answers here:
Only read selected columns
(5 answers)
Closed 5 years ago.
I have grocery sales data with 11 columns, such as store name, item name, and price. My analysis does not require all of the columns. I need only a few of them to generate a report.
What is the R code for this?
Example: below are the column names of the sales data. I need only 6 of these columns. I tried that code, but it gives an error, and I also don't understand those answers.
STORE_NAME STORE_ID DEVICE_SERIAL_NUMBER BILL_NUMBER BARCODE ITEM_NAME VARIANT_NAME BASEPACK CATEGORY BRAND MANUFACTURER QUANTITY_SOLD PRICE PURCHASE_PRICE SELLING_PRICE SALES_VAT USER_NAME COUNTER CUSTOMER_NAME CUSTOMER_PHONE BILL_DATE CREATED_DATE

Read all the data with read.table or read.csv, then extract only the columns you need. That is what square brackets are for in R. You can select either by column number or by column name:
lots.of.cols <- data.frame(a=1:20, b=2:21, c=3:22, d=runif(20), e=runif(20))
only.first.two.cols <- lots.of.cols[,c(1,2)] #extract only column 1 and 2
str(only.first.two.cols)
only.a.and.b <- lots.of.cols[,c("a", "b")]
str(only.a.and.b)
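As a complement to the bracket subsetting above, here is a minimal sketch of the whole round trip, from reading a CSV file to keeping only some columns. The file contents and the choice of wanted columns are made up for illustration; only the column names come from the question. The second variant relies on the colClasses argument of read.csv, where the string "NULL" tells R to skip a column while reading.

```r
# Write a small, made-up CSV to a temporary file so the example is self-contained
sales_file <- tempfile(fileext = ".csv")
writeLines(c("STORE_NAME,ITEM_NAME,PRICE,USER_NAME",
             "Store A,Rice,40.5,alice",
             "Store B,Soap,12,bob"), sales_file)

# Variant 1: read everything, then keep the wanted columns by name
sales <- read.csv(sales_file, stringsAsFactors = FALSE)
wanted <- c("STORE_NAME", "ITEM_NAME", "PRICE")
report <- sales[, wanted]

# Variant 2: skip unwanted columns at read time; "NULL" drops a column
report_direct <- read.csv(sales_file,
                          colClasses = c("character", "character", "numeric", "NULL"))

str(report)
str(report_direct)
unlink(sales_file)  # remove the temporary file
```

Variant 1 is usually easier to maintain because it refers to columns by name; Variant 2 depends on the column order in the file.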

Related

Creating a loop to count the number of instances of each number, record the unique number and count of each instances in a new data frame [duplicate]

This question already has answers here:
Count number of occurences for each unique value
(14 answers)
Closed 2 years ago.
Apologies in advance, I am a beginner in R. I have loaded a CSV file into a new data frame. One of the columns holds a category number (from 1 to 6). I want to create a loop that counts the number of times each category number appears and then stores the result in a new data frame. (The new data frame would contain the category number and how many times it appears.)
I have written the script below so far, but I am unsure how to store the results in the new data frame and include the category number.
Summarydf<-NULL
unique<-c(unique(Data$Type))
for (i in unique) {
Summarydf<-c(sum(Data$Type==i))
print(Summarydf)
}
You can just convert Data$Type to a factor and call summary() on it, which returns a named vector with the number of occurrences of each type, e.g.:
L <- LETTERS
Type <- sample(1:6, 26, replace = TRUE)
Data <- data.frame(L, Type)
Data$Type = as.factor(Data$Type)
summaryType <- summary(Data$Type)
summaryType
1 2 3 4 5 6
4 4 5 7 3 3
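Since the question asks for the counts in a new data frame, one possible follow-up is to go through table() and as.data.frame(). This is only a sketch: the Type values below are a hypothetical stand-in for the real column, and the column names Type and Count are chosen here for illustration.

```r
# Hypothetical category column, standing in for Data$Type
Data <- data.frame(Type = c(1, 2, 2, 3, 3, 3, 6))

# table() counts every distinct value; as.data.frame() turns the counts
# into a two-column data frame (value, frequency)
Summarydf <- as.data.frame(table(Data$Type))
names(Summarydf) <- c("Type", "Count")
Summarydf
```

Note that the Type column of the result is a factor, and categories that never occur in the data (4 and 5 here) do not get a row.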

Automate data frame width in R [duplicate]

This question already has answers here:
how to remove multiple columns in r dataframe?
(8 answers)
Select column 2 to last column in R
(4 answers)
Closed 2 years ago.
I have a data frame that I import from Excel each week with 40+ columns. Each week the data frame has a different number of columns, but I am only interested in the first 40. I take the data frame, drop the columns after 40, and rbind it to another data frame (that has 40 columns).
The code I use to drop the columns is:
df = df[ -c(40:45) ] #assume df has 45 columns this week.
I would like to automate the number of columns to drop, similar to the length(df$x) idea. I tried width(df), but it doesn't seem to work.
Any ideas, please?
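One way to approach this, sketched under the assumption that the goal is simply "keep the first 40 columns, however many arrive": ncol() gives the width of a data frame, and selecting the columns to keep avoids having to compute which ones to drop. The 45-column data frame below is a made-up placeholder for the weekly import.

```r
# Placeholder for this week's import: 3 rows, 45 columns
df <- as.data.frame(matrix(seq_len(45 * 3), nrow = 3, ncol = 45))

ncol(df)  # the "width" of the data frame

# Keep the first 40 columns (or all of them if there are fewer than 40)
keep <- seq_len(min(40, ncol(df)))
df_first40 <- df[, keep, drop = FALSE]

ncol(df_first40)
```

Keeping by position is also safer than a negative index such as -(41:ncol(df)), which misbehaves when the data frame has exactly 40 columns because 41:40 counts downwards. Note as well that -c(40:45) removes column 40 too, leaving only 39 columns.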

Extract rows from the second data frame that are newly added compared to the first data frame [duplicate]

This question already has answers here:
Find complement of a data frame (anti - join)
(7 answers)
Closed 2 years ago.
I have two data frames, and I need to find the rows in the second data frame that are newly added. The second data frame can contain some rows from the first data frame plus some other rows. I need the rows that are not in the first data frame, that is, the rows that appear only in the second data frame.
Below is an example with the expected output:
comp1<- data.frame(sector =c('Sector_123','Sector_456','Sector_789','Sector_101','Sector_111','Sector_113','Sector_115','Sector_117'), id=c(1,2,3,4,5,6,7,8) ,stringsAsFactors = FALSE)
comp2 <- data.frame(sector = c('Sector_456','Sector_789','Sector_000','Sector_222'), id=c(2,3,6,5), stringsAsFactors = FALSE)
The expected output should look like this:
sector id
Sector_000 6
Sector_222 5
I should not use any other libraries such as compare or data.table.
Any suggestions?
This assumes we are matching rows on the sector column only. To compare on all columns, just remove that restriction.
We could use dplyr:
library(dplyr)
anti_join(comp2, comp1, by="sector")
gives us
> anti_join(comp2, comp1, by="sector")
sector id
1 Sector_000 6
2 Sector_222 5
With base R we could use
comp2[!comp2$sector %in% comp1$sector,]
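If rows should count as "already present" only when every column matches, a base R sketch is to build one key per row and compare the keys. The helper names key1, key2 and new_rows are made up for this example, and the separator is assumed not to occur in the data.

```r
comp1 <- data.frame(
  sector = c("Sector_123", "Sector_456", "Sector_789", "Sector_101",
             "Sector_111", "Sector_113", "Sector_115", "Sector_117"),
  id = c(1, 2, 3, 4, 5, 6, 7, 8),
  stringsAsFactors = FALSE
)
comp2 <- data.frame(
  sector = c("Sector_456", "Sector_789", "Sector_000", "Sector_222"),
  id = c(2, 3, 6, 5),
  stringsAsFactors = FALSE
)

# One string key per row, combining all columns
key1 <- paste(comp1$sector, comp1$id, sep = "\r")
key2 <- paste(comp2$sector, comp2$id, sep = "\r")

# Rows of comp2 whose key does not occur in comp1
new_rows <- comp2[!key2 %in% key1, ]
new_rows
```

For this data the result is the same two rows as when matching on sector alone, but the two approaches differ whenever a sector reappears with a different id.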

How to count the number of occurrences of the first character of each string of a column in R [duplicate]

This question already has answers here:
Counting unique / distinct values by group in a data frame
(12 answers)
Closed 4 years ago.
I have a data set with a single column containing multiple names.
For example:
Alex
Brad
Chrisitne
Alexa
Brandone
There are almost 100 records like this. I want to display the result as
A 2
B 2
C 1
That is, I need to show the frequencies from highest to lowest, and if there is a tie, the values should be shown in alphabetical order.
I have been trying to solve this, but I have not been able to.
Any pointers on this?
df <- data.frame(name = c("Alex", "Brad", "Brad"))
first_characters <- substr(df$name, 1, 1)
result <- sort(table(first_characters), decreasing = TRUE)
# convert the table to a data frame with one row per first character
data.frame(result)
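To make the tie-breaking rule from the question explicit (highest frequency first, alphabetical order within ties), a possible sketch is to order the counts on two keys. The sample names are the ones from the question; the names names_df, counts and letter are chosen here for illustration.

```r
names_df <- data.frame(
  name = c("Alex", "Brad", "Chrisitne", "Alexa", "Brandone"),
  stringsAsFactors = FALSE
)

# First character of every name
first_chars <- substr(names_df$name, 1, 1)

# Count the characters and turn the table into a data frame
counts <- as.data.frame(table(letter = first_chars), stringsAsFactors = FALSE)

# Sort by descending frequency, then alphabetically to break ties
counts <- counts[order(-counts$Freq, counts$letter), ]
counts
```

Ordering on both keys spells out the alphabetical tie-break instead of relying on the order in which table() happens to list its levels.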

How can I select text from a row and column of a data frame without returning all Levels in R? [duplicate]

This question already has answers here:
Closed 10 years ago.
Possible Duplicate:
dropping factor levels in a subsetted data frame in R
I would like to get the text from a cell in a data frame (and use that text to create a file). However, whenever I select a specific row and column from the data frame, the result is the contents of the data frame followed by a list of levels from the data frame. For example:
getFileNameTest<-function(
columnNames=c(cName1,cName2)
)
{
list1<-c("joe", "bob", "sue")
list2<-c("jones","smith","smith")
myDataFrame<-data.frame(list1,list2)
joeFileName<-myDataFrame[1,1]
return(joeFileName)
}
This function returns:
[1] joe
Levels: bob joe sue
But I would like just "joe" so that I can later create a file named "joe." How do I grab the contents of a specific row and column in a data frame without returning the levels?
As @joran suggests, or:
df <- data.frame(x=sample(LETTERS,10))
df[,1][[1]]
as.character(df[,1][[1]])
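Applied to the data frame from the question, a minimal sketch looks like this. stringsAsFactors is set explicitly in both variants because its default changed to FALSE in R 4.0.0; the variable names cell, myDataFrame2 and joeFileName2 are added here for illustration.

```r
# Variant 1: the column is a factor, so convert the selected cell to character
myDataFrame <- data.frame(
  list1 = c("joe", "bob", "sue"),
  list2 = c("jones", "smith", "smith"),
  stringsAsFactors = TRUE
)
cell <- myDataFrame[1, 1]          # a factor, prints with its levels
joeFileName <- as.character(cell)  # plain character string

# Variant 2: avoid factors when building the data frame
myDataFrame2 <- data.frame(
  list1 = c("joe", "bob", "sue"),
  list2 = c("jones", "smith", "smith"),
  stringsAsFactors = FALSE
)
joeFileName2 <- myDataFrame2[1, 1]

joeFileName
joeFileName2
```

Either value can then be passed to something like paste0(joeFileName, ".txt") to build a file name.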
