How to parse the blank columns in read.table?

dat="mpg cyl disp hp drat wt qsec vs am gear carb
21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4
21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4
22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1
21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1
2"
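The last line of dat holds only a value for carb, but read.table puts it in the first column (first output below). I need it in the last column, as in the second output (see row 5):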
> read.table(text=dat,fill=T,header=TRUE)
mpg cyl disp hp drat wt qsec vs am gear carb
1 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
2 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
3 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
4 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
5 2.0 NA NA NA NA NA NA NA NA NA NA
> read.table(text=dat,fill=T,header=TRUE)
mpg cyl disp hp drat wt qsec vs am gear carb
1 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
2 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
3 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
4 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
5 NA NA NA NA NA NA NA NA NA NA 2

I solved it myself.
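# Assumption: y was not defined in the original post; it is taken here to be the header line of dat.
y <- readLines(textConnection(dat), n = 1)
# Read the data rows as fixed-width fields so that blank fields become NA.
r <- read.fwf(file = textConnection(dat), fill = TRUE, skip = 1, widths = c(12, 4, 6, 4, 5, 6, 6, 3, 3, 5, 5))
colnames(r) <- unlist(strsplit(y, split = "\\s+"))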
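As a small, self-contained illustration of the read.fwf idea, here is a sketch on a made-up three-column text. The object names txt, hdr, con and res, and the widths, are assumptions for this example and not the original data. Fields are cut at fixed character positions, so a value sitting under the last header stays in the last column and the blank fields before it should come back as NA.

```r
# Hypothetical fixed-width text: the first two fields are 4 characters wide,
# the last one is 1 character wide.
txt <- c("a   b   c",
         "1   2   3",
         "        9")

# Column names come from the header line, split on whitespace.
hdr <- strsplit(txt[1], "\\s+")[[1]]

# Skip the header and cut the remaining lines at fixed positions.
con <- textConnection(txt)
res <- read.fwf(con, widths = c(4, 4, 1), skip = 1, col.names = hdr)
close(con)

print(res)
```

The widths have to match the real column positions in the file, so for other data they need to be counted from the header or taken from the file's specification.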

Related

If column A equals criteria return value of column B in column C

Using the R inbuilt dataset
mtcars
I want to make a column called "want".
mtcars$want<-NA
When column "carb" is equal to 1 (Column A), input value of column "qsec" (Column B) in column "want" (Column C).
If carb is not equal to 1 do nothing.
The first 5 rows of the new dataset should look like this:
mpg cyl disp hp drat wt qsec vs am gear carb want
Mazda RX4 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4 NA
Mazda RX4 Wag 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4 NA
Datsun 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1 18.61
Hornet Drive 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1 19.44
Hornet Sportabout 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2 NA
This should do the job:
mtcars$want <- ifelse(mtcars$carb == 1, mtcars$qsec, NA)
head(mtcars, 5)
mpg cyl disp hp drat wt qsec vs am gear carb want
Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4 NA
Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4 NA
Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1 18.61
Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1 19.44
Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2 NA
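An equivalent way to write this, sketched here in base R with a logical index instead of ifelse, is shown below. The names df and idx are chosen for this example only; working on a copy leaves the built-in mtcars untouched.

```r
df <- mtcars                    # work on a copy of the built-in dataset
df$want <- NA_real_             # numeric NA, so the column stays numeric
idx <- df$carb == 1             # TRUE for the rows that meet the criterion
df$want[idx] <- df$qsec[idx]    # copy qsec only into those rows
head(df, 5)
```

Rows where carb is not 1 are never assigned, so they keep the NA they started with.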
If you only want the blanks in the printout, you could try the following. Note that using "" turns want into a character column, so the non-matching rows hold empty strings rather than NA and the qsec values are stored as text:
mtcars$want <- ifelse(mtcars$carb == 1, mtcars$qsec, "")
head(mtcars, 5)
mpg cyl disp hp drat wt qsec vs am gear carb want
Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1 18.61
Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1 19.44
Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
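If the column should remain numeric in the data and only look blank when printed, one option is to format a separate copy for display. This is a minimal sketch; df and shown are names chosen for the example.

```r
df <- mtcars
df$want <- ifelse(df$carb == 1, df$qsec, NA)   # numeric, NA where carb != 1

shown <- df                                    # copy used only for printing
shown$want <- ifelse(is.na(df$want), "", format(df$want))

head(shown, 5)
```

Calculations should keep using df, since shown$want is text.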
If it is helpful, a loop over the rows also works. You can modify the loop or add further conditionals to fill in the other values of the column.
# written in R version 4.2.1
data(mtcars)
mtcars$want <- 0  # default for rows where carb is not 1
for (i in seq_len(nrow(mtcars))) {
  if (mtcars$carb[i] == 1) {
    mtcars$want[i] <- mtcars$qsec[i]
  }
}
Result:
head(mtcars)
# mpg cyl disp hp drat wt qsec vs am gear carb want
#Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4 0.00
#Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4 0.00
#Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1 18.61
#Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1 19.44
#Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2 0.00
#Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1 20.22
You can first give the new column "want" a default value, for example 2. Then use ifelse to apply the criterion and return the existing "want" value when the condition is not met, like this:
mtcars$want <- 2
library(dplyr)
mtcars %>%
mutate(want = ifelse(carb == 1, qsec, want)) %>%
head(5)
#> mpg cyl disp hp drat wt qsec vs am gear carb want
#> Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4 2.00
#> Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4 2.00
#> Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1 18.61
#> Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1 19.44
#> Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2 2.00
Created on 2022-06-30 by the reprex package (v2.0.1)

How to message an R data frame to the console?

If I put a data frame to the console directly, it looks nice:
> head(datasets::mtcars, 4)
mpg cyl disp hp drat wt qsec vs am gear carb
Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
How do I get the same nice output, but through message?
> message(head(datasets::mtcars, 4))
c(21, 21, 22.8, 21.4)c(6, 6, 4, 6)c(160, 160, 108, 258)c(110, 110, 93, 110)c(3.9, 3.9, 3.85, 3.08)c(2.62, 2.875, 2.32, 3.215)c(16.46, 17.02, 18.61, 19.44)c(0, 0, 1, 1)c(1, 1, 1, 0)c(4, 4, 4, 3)c(4, 4, 1, 1)
This question looks similar, but didn't help me.
We can use paste to create a vector of values and pass it on to message
message(do.call(paste, c(head(datasets::mtcars, 4), collapse="\n")))
-output
# 21 6 160 110 3.9 2.62 16.46 0 1 4 4
#21 6 160 110 3.9 2.875 17.02 0 1 4 4
#22.8 4 108 93 3.85 2.32 18.61 1 1 4 1
#21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
In order to also get the row names and the header, we could use capture.output
message(paste(capture.output(head(datasets::mtcars, 4)), collapse="\n"))
-output
# mpg cyl disp hp drat wt qsec vs am gear carb
#Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
#Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
#Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
#Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
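The capture.output approach can be wrapped in a small helper so it is easy to reuse. The function name message_df is hypothetical and not part of base R; this is only a sketch of the idea above.

```r
# message_df is a hypothetical helper name chosen for this sketch.
message_df <- function(df) {
  lines <- capture.output(print(df))        # one string per printed line
  message(paste(lines, collapse = "\n"))    # emit them as a single message
  invisible(df)
}

message_df(head(datasets::mtcars, 4))
```

Because the text goes through message, it is written to stderr and can be silenced with suppressMessages().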

Use mutate in purrr workflow

I have the following list of data frames:
dflist <- list(mtcars, mtcars)
dflist[[1]] %>%
mutate(cyl2 = cyl * 2)
This works!
dflist %>%
map(.x, ~.x$cyl2 = .x$cyl * 2)
Error: unexpected '=' in:
"dflist %>%
map(.x, ~x$cyl2 ="
This results in an error. I tried other options, but the function does not accept the = sign. What is wrong there?
Try :
library(dplyr)
library(purrr)
dflist %>% map(~.x %>% mutate(cyl2 = cyl * 2))
#[[1]]
# mpg cyl disp hp drat wt qsec vs am gear carb cyl2
#1 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4 12
#2 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4 12
#3 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1 8
#4 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1 12
#5 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2 16
#....
#[[2]]
# mpg cyl disp hp drat wt qsec vs am gear carb cyl2
#1 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4 12
#2 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4 12
#3 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1 8
#4 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1 12
#5 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2 16
#...
Or keeping it in base R:
lapply(dflist, function(x) transform(x, cyl2 = cyl * 2))
You can also try:
modify(dflist, ~ update_list(., cyl2 = ~ cyl * 2))
[[1]]
mpg cyl disp hp drat wt qsec vs am gear carb cyl2
Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4 12
Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4 12
Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1 8
Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1 12
Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2 16
[[2]]
mpg cyl disp hp drat wt qsec vs am gear carb cyl2
Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4 12
Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4 12
Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1 8
Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1 12
Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2 16
We can use transform without an anonymous function call in base R
lapply(dflist, transform, cyl2 = cyl * 2)
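The error in the question arises because an = directly inside a function call is read as argument naming, and the text to its left is not a valid argument name. A minimal base R sketch of the assignment-style version follows: the column is assigned with <- inside a function body and the modified data frame is returned. The name add_cyl2 is chosen for this example.

```r
dflist <- list(mtcars, mtcars)

# add_cyl2 is a name chosen for this sketch.
add_cyl2 <- function(d) {
  d$cyl2 <- d$cyl * 2   # assign with <- inside the function body
  d                     # return the modified data frame
}

res <- lapply(dflist, add_cyl2)
head(res[[1]], 3)
```

The same body, wrapped in braces, should also work as a purrr formula: map(dflist, ~ { .x$cyl2 <- .x$cyl * 2; .x }).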

Transform string of expression into quotable expression

How do I transform a string of expression into a quotable expression?
Example:
This is the result I want:
mutate(mtcars,answer=wt+wt)
# mpg cyl disp hp drat wt qsec vs am gear carb answer
# 1 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4 5.240
# 2 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4 5.750
# 3 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1 4.640
# 4 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1 6.430
# 5 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2 6.880
# 6 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1 6.920
...
Here's the function I am writing:
f<-function(df,string_expression){
se<-enexpr(string_expression)
mutate(df,answer=!!se)
}
It will work if I use the following functional call:
f(mtcars,wt+wt)
# mpg cyl disp hp drat wt qsec vs am gear carb answer
# 1 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4 5.240
# 2 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4 5.750
# 3 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1 4.640
# 4 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1 6.430
# 5 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2 6.880
# 6 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1 6.920
...
However, I would like to provide the expression as a string, so I must use the following function call:
f(mtcars,'wt+wt')
# mpg cyl disp hp drat wt qsec vs am gear carb answer
# 1 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4 wt+wt
# 2 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4 wt+wt
# 3 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1 wt+wt
# 4 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1 wt+wt
# 5 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2 wt+wt
...
How do I make it (either change the function definition or function call) to get the result I want?
What I have tried:
I have tried to sym(string_expression) -- didn't work.
I have tried to quo(string_expression) -- didn't work.
Thank you!
You could change your f function to something like this:
f<-function(df,string_expression){
mutate(df, answer = eval(parse(text = string_expression)))
}
head(f(mtcars,'wt+wt'))
mpg cyl disp hp drat wt qsec vs am gear carb answer
1 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4 5.24
2 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4 5.75
3 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1 4.64
4 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1 6.43
5 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2 6.88
6 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1 6.92
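For comparison, the same idea can be sketched without dplyr. The function name f_base is hypothetical; str2lang is a base R function (available from R 3.6.0) that turns a string into an unevaluated expression, which is then evaluated with the data frame's columns in scope.

```r
# f_base is a hypothetical name chosen for this sketch.
f_base <- function(df, string_expression) {
  expr <- str2lang(string_expression)           # string -> unevaluated call
  df$answer <- eval(expr, df, parent.frame())   # look up names in df first
  df
}

head(f_base(mtcars, "wt+wt"))
```

As with eval(parse(text = ...)), this runs whatever code the string contains, so it is only suitable for strings from a trusted source.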

split comma-separated column entry into rows

I have already found other versions of the same question but I was not able to adapt the answers given there for my problem. Here is an older link:
The op there had data consisting of two columns only - and the given answer handles this really nicely. But what about more than two columns? Is there a way to adapt the linked code snippet?
Here is an example:
ve <- rbind("4,2","3","1,2,3","5","6","7")
expl <- cbind(head(mtcars),ve)
row.names mpg cyl disp hp drat wt qsec vs am gear carb ve
1 Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4 4,2
2 Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4 3
3 Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1 1,2,3
4 Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1 5
5 Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2 6
6 Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1 7
I would need:
row.names mpg cyl disp hp drat wt qsec vs am gear carb ve
1 Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4 4
2 Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4 2
3 Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4 3
4 Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1 1
5 Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1 2
6 Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1 3
7 Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1 5
8 Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2 6
9 Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1 7
Thank you!
Try unnest from the tidyr package. My example uses dplyr, but you can also accomplish this with base functions.
library(dplyr)
library(tidyr)
expl %>%
mutate(ve = strsplit(as.character(ve), ",")) %>%
unnest(ve)
Here's an attempt using base R only (which also preserves the row names, in a way at least)
ve <- strsplit(ve, ",")
Res <- expl[rep(seq_len(nrow(expl)), sapply(ve, length)), ]
Res$ve <- unlist(ve)
Res
# mpg cyl disp hp drat wt qsec vs am gear carb ve
# Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4 4
# Mazda RX4.1 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4 2
# Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4 3
# Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1 1
# Datsun 710.1 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1 2
# Datsun 710.2 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1 3
# Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1 5
# Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2 6
# Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1 7
Or using data.table, one option is
library(data.table)
setDT(expl)[,
strsplit(as.character(ve), ","),
c(names(expl)[-length(expl)])
]
Another option would be
setkey(expl, ve)[setDT(expl)[, strsplit(as.character(ve), ","), ve]]
I would recommend cSplit from my "splitstackshape" package.
Since your example has rownames, I've converted your example data to a data.table with the keep.rownames = TRUE argument.
library(splitstackshape)
cSplit(as.data.table(expl, keep.rownames = TRUE), "ve", ",", "long")
# rn mpg cyl disp hp drat wt qsec vs am gear carb ve
# 1: Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4 4
# 2: Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4 2
# 3: Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4 3
# 4: Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1 1
# 5: Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1 2
# 6: Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1 3
# 7: Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1 5
# 8: Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2 6
# 9: Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1 7
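Since the question asks how to handle any number of columns, the base R approach can be generalised into a small function that takes the column name as an argument. This is a sketch under that assumption; split_rows is a hypothetical name, not a function from any of the packages mentioned above.

```r
# split_rows is a hypothetical helper: it repeats each row once per
# separated value found in column `col`, keeping all other columns.
split_rows <- function(df, col, sep = ",") {
  parts <- strsplit(as.character(df[[col]]), sep, fixed = TRUE)
  out <- df[rep(seq_len(nrow(df)), lengths(parts)), , drop = FALSE]
  out[[col]] <- unlist(parts)
  out
}

expl <- cbind(head(mtcars), ve = c("4,2", "3", "1,2,3", "5", "6", "7"))
res <- split_rows(expl, "ve")
res
```

Row names of repeated rows get a numeric suffix such as "Mazda RX4.1", because data frame row names must be unique.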
