Importing csv with date headers - r

I am importing a CSV file exported from Excel that has a large number of columns. Each column is for a different date, e.g. March 1990, April 1990.
When I import it, the column headers are changed to numbers, for example 34355, 34356.
How do I preserve the dates?
I tried using the RStudio import function:
sales <- read_csv("W:/Sales_data/sales.csv")
Expected
First_Name Sir_name Region Jan_1980 Feb_1980 Mar_1980
George Dell LA 52 23 121
Lisa Stevens NY 234 122
Peter Hunt TX 3242 12 123
Actual
First_Name Sir_name Region 34524 34525 34526
George Dell LA 52 23 121
Lisa Stevens NY 234 122
Peter Hunt TX 3242 12 123
Any help is greatly appreciated.

You need to import the first row as data rather than as headers. Then convert the Excel serial numbers in that row to the date format you want. Finally, assign the first row as the column names and remove it from the data.
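library(readr)
# Read the header row as data; keep every column as character so the first row can be edited
sales <- read_csv("W:/Sales_data/sales.csv",
col_names = FALSE,
col_types = cols(.default = col_character()))
# Convert the Excel serial numbers in columns 4 to 6 of the first row to "Mon_YYYY"
hdr <- unlist(sales[1, ], use.names = FALSE)
hdr[4:6] <- format(as.Date(as.numeric(hdr[4:6]), origin = "1899-12-30"), "%b_%Y")
colnames(sales) <- hdr
# Drop the header row and let readr re-guess the column types
sales <- type_convert(sales[-1, ])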
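As a minimal sketch of the same idea in base R, the header names can also be converted after a normal import. The in-memory CSV text below is a hypothetical stand-in for sales.csv, and the serial numbers 29221, 29252 and 29281 are chosen because they correspond to 1 Jan, 1 Feb and 1 Mar 1980 under the "1899-12-30" origin. It assumes every all-digit column name is an Excel serial.

```r
# Hypothetical stand-in for sales.csv (not the asker's real file)
csv_text <- "First_Name,Sir_name,Region,29221,29252,29281
George,Dell,LA,52,23,121
Peter,Hunt,TX,3242,12,123"

# check.names = FALSE keeps the numeric headers as they are ("29221", ...)
sales <- read.csv(text = csv_text, check.names = FALSE)

# Assumption: a column name made only of digits is an Excel serial date
is_serial <- grepl("^[0-9]+$", names(sales))
dates <- as.Date(as.numeric(names(sales)[is_serial]), origin = "1899-12-30")

# %b is locale dependent; an English locale gives names such as Jan_1980
names(sales)[is_serial] <- format(dates, "%b_%Y")
names(sales)
```

Renaming the columns directly avoids reading the data columns as character and converting them back afterwards.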

Related

Remove specific value in R or Linux

Hi, I have a tab-separated file in the terminal with several columns, as shown below. The last column contains a comma followed by one or more characters.
1 100 Japan Na pa,cd
2 120 India Ca pa,ces
5 110 Japan Ap pa,cres
1 540 China Sn pa,cd
1 111 Nepal Le pa,b
I want to keep last column values before the comma so the file can look like
2 120 India Ca pa
5 110 Japan Ap pa
1 540 China Sn pa
1 111 Nepal Le pa
I have looked at sed, but I cannot find a way to remove them.
Regards
In R you can read the file with a tab-separator and remove the values after comma.
result <- transform(read.table('file1.txt', sep = '\t'), V5 = sub(',.*', '', V5))
V5 is used on the assumption that the 5th column is the one whose values you want to change.
We can use
df1 <- read.table('file1.txt', sep="\t")
df1$V5 <- sub("^([^,]+),.*", "\\1", df1$V5)
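The approach above can be tried without a file on disk. This is a small sketch that assumes the data has no header row and that the fifth column is the one to clean. The in-memory text is a hypothetical copy of the first two rows of the question's data.

```r
# Two tab-separated rows copied from the question, held in memory
txt <- "1\t100\tJapan\tNa\tpa,cd\n2\t120\tIndia\tCa\tpa,ces"

df1 <- read.table(text = txt, sep = "\t", stringsAsFactors = FALSE)

# Drop the first comma and everything after it in the fifth column
df1$V5 <- sub(",.*", "", df1$V5)

# Print the result as tab-separated text without quotes or names
write.table(df1, stdout(), sep = "\t", quote = FALSE,
            row.names = FALSE, col.names = FALSE)
```

Replacing stdout() with a file name would write the cleaned table back to disk instead of printing it.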

R - Converting one table to multiple table

I have a csv file called data.csv that contains 3 tables merged into one. I would like to separate them into 3 different data frames when I import it into R.
So far this is what I have got after running this code:
df <- read.csv("data.csv")
View(df)
Student
Name Score
Maria 18
Bob 25
Paul 27
Region
Country Score
Italy 65
India 99
United 88
Sub region
City Score
Paris 77
New 55
Rio 78
How can I split them in such a way that I get this result:
First:
View(StudentDataFrame)
Name Score
Maria 18
Bob 25
Paul 27
Second:
View(regionDataFrame)
Country Score
Italy 65
India 99
United 88
Third:
View(SubRegionDataFrame)
City Score
Paris 77
New 55
Rio 78
One option is to read the dataset with readLines (see "data" at the end, where 'lines' is created), build a grouping variable ('grp') from the positions of 'Student', 'Region' and 'Sub region' in 'lines', split the remaining lines by that group, and read each piece with read.table:
i1 <- trimws(lines) %in% c("Student", "Region", "Sub region")
grp <- cumsum(i1)
lst <- lapply(split(lines[!i1], grp[!i1]), function(x)
read.table(text=x, stringsAsFactors=FALSE, header=TRUE))
lst
#$`1`
# Name Score
#1 Maria 18
#2 Bob 25
#3 Paul 27
#$`2`
# Country Score
#1 Italy 65
#2 India 99
#3 United 88
#$`3`
# City Score
#1 Paris 77
#2 New 55
#3 Rio 78
data
lines <- readLines("data.csv")
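For a version that runs end to end without data.csv, here is a sketch that builds 'lines' in memory and also names the resulting list after the table titles. The sample lines are a shortened, hypothetical copy of the question's data, and the fields are assumed to be whitespace separated as in the answer above.

```r
# Hypothetical in-memory stand-in for readLines("data.csv")
lines <- c("Student", "Name Score", "Maria 18", "Bob 25", "Paul 27",
           "Region", "Country Score", "Italy 65", "India 99",
           "Sub region", "City Score", "Paris 77", "Rio 78")

titles <- c("Student", "Region", "Sub region")

# TRUE on the lines that hold a table title
i1 <- trimws(lines) %in% titles
# Running count of titles seen so far: 1 for the first table, 2 for the second, ...
grp <- cumsum(i1)

# Split the non-title lines by table and parse each piece separately
lst <- lapply(split(lines[!i1], grp[!i1]), function(x)
  read.table(text = x, header = TRUE, stringsAsFactors = FALSE))

# Name each data frame after its title instead of "1", "2", "3"
names(lst) <- trimws(lines[i1])
lst$Region
```

With named elements, each table can be accessed as lst$Student, lst$Region or lst[["Sub region"]] rather than by position.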

R. How to add sum row in data frame

I know this question is very elementary, but I'm having trouble adding an extra row that shows a summary of the other rows.
Let's say I'm creating a data.frame using the code below:
name <- c("James","Kyle","Chris","Mike")
nationality <- c("American","British","American","Japanese")
income <- c(5000,4000,4500,3000)
x <- data.frame(name,nationality,income)
The code above creates the data.frame below:
name nationality income
1 James American 5000
2 Kyle British 4000
3 Chris American 4500
4 Mike Japanese 3000
What I'm trying to do is add a 5th row that contains: name = "Total", nationality = "NA", income = total of all rows. My desired output looks like this:
name nationality income
1 James American 5000
2 Kyle British 4000
3 Chris American 4500
4 Mike Japanese 3000
5 Total NA 16500
In the real case, my data.frame has more than a thousand rows, and I need an efficient way to add the total row.
Can someone please advise? Thank you very much!
We can use rbind
rbind(x, data.frame(name='Total', nationality=NA, income = sum(x$income)))
# name nationality income
#1 James American 5000
#2 Kyle British 4000
#3 Chris American 4500
#4 Mike Japanese 3000
#5 Total <NA> 16500
using index.
name <- c("James","Kyle","Chris","Mike")
nationality <- c("American","British","American","Japanese")
income <- c(5000,4000,4500,3000)
x <- data.frame(name,nationality,income, stringsAsFactors=FALSE)
x[nrow(x)+1, ] <- c('Total', NA, sum(x$income))
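UPDATE: use list() instead of c(). c() coerces every value to character, so the income column would become character as well; a list keeps each value's type.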
x[nrow(x)+1, ] <- list('Total', NA, sum(x$income))
x
# name nationality income
# 1 James American 5000
# 2 Kyle British 4000
# 3 Chris American 4500
# 4 Mike Japanese 3000
# 5 Total <NA> 16500
sapply(x, class)
# name nationality income
# "character" "character" "numeric"
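The difference between the c() and list() versions can be checked on a small data frame. This is a minimal sketch with a hypothetical two-row, two-column frame rather than the full example above.

```r
x <- data.frame(name = c("James", "Kyle"),
                income = c(5000, 4000),
                stringsAsFactors = FALSE)

# c() builds a character vector, so income is coerced to character
y <- x
y[nrow(y) + 1, ] <- c("Total", sum(y$income))
class(y$income)

# list() keeps each element's type, so income stays numeric
z <- x
z[nrow(z) + 1, ] <- list("Total", sum(z$income))
class(z$income)
```

Keeping income numeric matters if the column is summed, sorted or plotted later.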
If you want the exact row as you put in your post, then the following should work:
newdata = rbind(x, data.frame(name='Total', nationality='NA', income = sum(x$income)))
I agree with Jaap, though, that you may not want to add this row to the end. If you need to load the data and use it for other analysis, the extra row will cause unnecessary trouble. However, you can use the following code to remove the added row before other analysis:
newdata = newdata[newdata$name != 'Total', ]

readr::read_csv(), empty strings as NA not working

I was trying to load a CSV file with readr::read_csv() in which some entries are blank. I set na="" in read_csv(), but it still loads them as blank entries.
d1 <- read_csv("sample.csv",na="") # want to load empty string as NA
The sample.csv file looks like the following:
Name,Age,Weight,City
Sam,13,30,
John,35,58,CA
Doe,20,50,IL
Ann,18,45,
d1 should look like the following (using read_csv()):
Name Age Weight City
1 Sam 13 30 NA
2 John 35 58 CA
3 Doe 20 50 IL
4 Ann 18 45 NA
The first and fourth rows of City should be NA (as shown above), but they actually show up as blank.
Based on the comments and verifying myself, the solution was to upgrade to readr_0.2.2.
Thanks to fg nu, akrun and Richard Scriven
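For comparison, where upgrading readr is not an option, base R's read.csv() offers the same behaviour through its na.strings argument. This is a sketch of that alternative, not of the readr fix itself, and it reads the question's sample data from an in-memory string instead of sample.csv.

```r
# The sample data from the question, held in memory
csv_text <- "Name,Age,Weight,City
Sam,13,30,
John,35,58,CA
Doe,20,50,IL
Ann,18,45,"

# na.strings = "" turns empty fields into NA
d1 <- read.csv(text = csv_text, na.strings = "", stringsAsFactors = FALSE)

# TRUE marks the rows whose City was blank in the file
is.na(d1$City)
```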

How to create a new column with names in a list

I have searched posts on the web for a solution but could not find one, so I decided to ask for your help. I have a list of data frames. I choose certain columns from each data frame and combine them. When combining data from two data frames, I want to add a column that contains the names of the list elements, but I could not achieve this. Here is some sample data and what I have tried.
Sample data & my attempt
### 1st dataframe
time <- seq(as.Date("2014-09-01"), by = "day", length.out = 12)
temperature <- sample(c(15:26), replace = TRUE)
weather <- sample(c("clear", "cloudy", "rain"), size = 12, replace = TRUE)
rome <- data.frame(time, temperature, weather, stringsAsFactors = F)
### 2nd dataframe
time <- seq(as.Date("2014-09-01"), by = "day", length.out = 12)
temperature <- sample(c(12:23), replace = TRUE)
weather <- sample(c("clear", "cloudy", "rain"), size = 12, replace = TRUE)
paris <- data.frame(time, temperature, weather, stringsAsFactors = F)
### Assign names to each data frame and create a list
ana <- list(rome = rome, paris = paris)
#Here are a bit of data.
#> ana
#$rome
# time temperature weather
#1 2014-09-01 19 cloudy
#2 2014-09-02 21 cloudy
#3 2014-09-03 17 clear
#$paris
# time temperature weather
#1 2014-09-01 18 clear
#2 2014-09-02 12 cloudy
#3 2014-09-03 17 cloudy
### Select 1st and 2nd column from each data frame in the list and
### combine them.
library(plyr) # provides rbind.fill()
rbind.fill(lapply(ana, `[`, 1:2))
I wanted to add something here to create the following ideal outcome with the new column, location. Please note that I trimmed the ideal outcome to save space.
time temperature location
1 2014-09-01 19 rome
2 2014-09-02 21 rome
3 2014-09-03 17 rome
13 2014-09-01 18 paris
14 2014-09-02 12 paris
15 2014-09-03 17 paris
One thing I tried was to use cbind() in the following way although I knew this would not work.
lapply(ana, function(x) cbind(x, new = names(ana)))
#$rome
# time temperature new
#1 2014-09-01 19 rome
#2 2014-09-02 21 paris
#3 2014-09-03 17 rome
#
#$paris
# time temperature new
#1 2014-09-01 18 rome
#2 2014-09-02 12 paris
#3 2014-09-03 17 rome
I have the feeling that setNames() may offer something, and that this can be done in a simple way. I could be wrong, though. Thank you very much for taking your time.
You can do
ana <- Map(cbind, ana, location = names(ana))
to append the location column before calling rbind.fill.
Nothing wrong with a simple for loop:
for (i in names(ana)) ana[[i]]$location <- i
And then use rbind.fill.
Thank you very much for great support. The for loop and Map() solutions work absolutely fine. I learned a lot. At the same time, the link Henrik provided offered me the solution I was looking for. I, therefore, decided to leave the answer here. Once again, thank you for all your support.
ldply(ana, rbind)
.id time temperature weather
1 rome 2014-09-01 19 cloudy
2 rome 2014-09-02 21 cloudy
3 rome 2014-09-03 17 clear
13 paris 2014-09-01 18 clear
14 paris 2014-09-02 12 cloudy
15 paris 2014-09-03 17 cloudy
