I have a table that has the column header in row 2, with the actual data starting in row 5. How can I read the table, skipping rows 1, 3 and 4, and use row 2 as the column header?
I'm using something like the code below, but I would like to know whether there are better ways.
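# read only row 2 as the header
headers <- read.table("file_1", skip=1, header=FALSE, sep=',', nrows=1, as.is=TRUE)
# the data starts in row 5, so skip the first 4 lines
df <- read.table("file_1", skip=4, header=FALSE, sep=',')
colnames(df) <- unlist(headers)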
Not very different, but you could scan the header row and use read.table for the remainder.
Your file probably looks something like this:
tb <- ' 1 1 1 1 1 1 1 1 1 1 1
2 X1 X2 X3 X4 X5 X6 X7 X8 X9 X10
3 3 3 3 3 3 3 3 3 3 3
4 4 4 4 4 4 4 4 4 4 4
5 5 5 5 5 5 5 5 5 5 5
6 6 6 6 6 6 6 6 6 6 6
7 7 7 7 7 7 7 7 7 7 7
8 8 8 8 8 8 8 8 8 8 8
9 9 9 9 9 9 9 9 9 9 9
10 10 10 10 10 10 10 10 10 10 10'
Use scan on your file with what=character(), the number of lines to skip, and nlines=1 to read the column names into r1. For the data r2, use read.table with skip= set past all the unneeded lines. In both cases drop the first element, since it holds the row indices. Finally, use setNames to apply r1 to r2, and type.convert the result.
r1 <- scan(text=tb, what=character(), skip=1, nlines=1)[-1]
r2 <- read.table(text=tb, skip=4)[-1]
res <- r2 |>
setNames(r1) |>
type.convert(as.is=TRUE)
res
# X1 X2 X3 X4 X5 X6 X7 X8 X9 X10
# 1 5 5 5 5 5 5 5 5 5 5
# 2 6 6 6 6 6 6 6 6 6 6
# 3 7 7 7 7 7 7 7 7 7 7
# 4 8 8 8 8 8 8 8 8 8 8
# 5 9 9 9 9 9 9 9 9 9 9
# 6 10 10 10 10 10 10 10 10 10 10
Note: this depends on how the data is stored in your file, so you will probably have to adjust the skip= values.
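Since the file in the question is comma-separated, another option is to read it only once with readLines and take the header and the data from the resulting character vector. A minimal sketch, assuming a hypothetical file with the question's layout (junk in rows 1, 3 and 4, header in row 2, data from row 5); the file contents below are made up for illustration:

```r
# Hypothetical example file with the question's layout:
# row 1 = junk, row 2 = header, rows 3-4 = junk, data from row 5.
path <- tempfile(fileext = ".csv")
writeLines(c("exported by some tool",
             "a,b,c",
             "unit1,unit2,unit3",
             "----,----,----",
             "1,2,3",
             "4,5,6"), path)

lines <- readLines(path)                           # read the file once
hdr <- strsplit(lines[2], ",", fixed = TRUE)[[1]]  # column names from row 2
dat <- read.csv(text = lines[-(1:4)],              # drop rows 1-4, keep the data
                header = FALSE, col.names = hdr)
dat
#   a b c
# 1 1 2 3
# 2 4 5 6
```

This holds the whole file in memory as text first, so for very large files the two-call skip/nrows approach may be preferable.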
I would like to create lagged variables for several columns that are grouped by two conditions.
Here is the dataset:
df <- data.frame(id = c(rep(1,4),rep(2,4)), tp = rep(1:4,2), x1 = 1:8, x2 = 2:9, x3 = 3:10, x4 = 4:11)
> df
id tp x1 x2 x3 x4
1 1 1 1 2 3 4
2 1 2 2 3 4 5
3 1 3 3 4 5 6
4 1 4 4 5 6 7
5 2 1 5 6 7 8
6 2 2 6 7 8 9
7 2 3 7 8 9 10
8 2 4 8 9 10 11
I want to lag x1, x2, x3 and x4, grouped by id and tp, and create new variables x1_lag1, x2_lag1, x3_lag1 and x4_lag1, like this:
> df
id tp x1 x2 x3 x4 x1_lag1 x2_lag1 x3_lag1 x4_lag1
1 1 1 1 2 3 4 2 3 4 5
2 1 2 2 3 4 5 3 4 5 6
3 1 3 3 4 5 6 4 5 6 7
4 1 4 4 5 6 7 NA NA NA NA
5 2 1 5 6 7 8 6 7 8 9
6 2 2 6 7 8 9 7 8 9 10
7 2 3 7 8 9 10 8 9 10 11
8 2 4 8 9 10 11 NA NA NA NA
How can I achieve that?
Your result doesn't seem to be grouped by tp at all. It is grouped by id and ordered by tp within the id grouping.
Generally a "lag" is a variable that takes the value from the previous row. The columns you want labeled as "lag" columns take the value from the next row, so we use the lead function.
library(dplyr)
df %>%
group_by(id) %>%
mutate(across(starts_with("x"), lead, .names = "{.col}_lag1")) %>%
ungroup()
# A tibble: 8 × 10
id tp x1 x2 x3 x4 x1_lag1 x2_lag1 x3_lag1 x4_lag1
<dbl> <int> <int> <int> <int> <int> <int> <int> <int> <int>
1 1 1 1 2 3 4 2 3 4 5
2 1 2 2 3 4 5 3 4 5 6
3 1 3 3 4 5 6 4 5 6 7
4 1 4 4 5 6 7 NA NA NA NA
5 2 1 5 6 7 8 6 7 8 9
6 2 2 6 7 8 9 7 8 9 10
7 2 3 7 8 9 10 8 9 10 11
8 2 4 8 9 10 11 NA NA NA NA
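If you would rather avoid the dplyr dependency, the same within-group lead can be done in base R with ave(). A minimal sketch; lead1 is a hypothetical helper name, the rows are sorted by tp within id first so that "next row" means the next time point, and the _lag1 names follow the question even though the values come from the next row:

```r
df <- data.frame(id = c(rep(1,4),rep(2,4)), tp = rep(1:4,2),
                 x1 = 1:8, x2 = 2:9, x3 = 3:10, x4 = 4:11)

# Hypothetical helper: shift a vector up by one, padding the end with NA
lead1 <- function(v) c(v[-1], NA)

df <- df[order(df$id, df$tp), ]  # make "next row" mean "next tp"
for (col in c("x1", "x2", "x3", "x4")) {
  # ave() applies lead1 separately within each id group
  df[[paste0(col, "_lag1")]] <- ave(df[[col]], df$id, FUN = lead1)
}
df
```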
I have a dataset like the one generated by the code below. I want to add a column on the right that is 1 if the row contains a certain value and 0 otherwise.
For a single value I have the following code:
set.seed(4991)
my_data <- data.frame(ceiling(matrix(runif(100,4,10),ncol = 5)))
comval <- c(5)
my_data$bleh <- as.integer(apply(my_data, 1, function(r) any(comval %in% r)))
This gives the output I want. The issue is that if I have two or more values in 'comval', for instance,
comval<-c(5,10)
I get 1 in the 'bleh' column for all rows that have either 5 or 10. It behaves like an OR logical operator, but I need it to work as an AND: the 'bleh' column should be 1 only if all the values in 'comval' are present in the row.
Also, I am trying to write this as a function, so I need to take length(comval) as an input and then check all the values in 'comval' against each row.
You could check whether the length of the intersect equals the number of distinct values in comval, so that a row only gets 1 when every value in comval is present.
my_data$bleh <- as.integer(apply(my_data, 1, function(r) {
length(intersect(comval, unlist(r))) == length(unique(comval))
}))
# X1 X2 X3 X4 X5 bleh
# 1 5 10 5 6 10 1
# 2 9 9 5 8 6 0
# 3 5 10 5 5 5 1
# 4 10 8 6 5 8 1
# 5 8 6 7 9 10 0
# 6 5 10 8 10 8 1
# 7 9 8 10 5 7 1
# 8 6 8 10 6 7 0
# 9 5 5 6 6 8 0
# 10 10 5 8 6 8 1
# 11 9 10 10 7 7 0
# 12 6 8 7 10 8 0
# 13 6 9 7 6 9 0
# 14 8 6 6 10 7 0
# 15 9 9 5 7 7 0
# 16 10 9 9 10 6 0
# 17 7 10 5 10 8 1
# 18 9 8 10 9 9 0
# 19 10 8 9 6 8 0
# 20 5 8 6 7 5 0
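A vectorised alternative avoids apply() (which converts the data frame to a matrix row by row) by comparing the whole data frame against each value in comval and combining the per-value results with a logical AND. A sketch; contains_all is a hypothetical helper name, and it accepts a comval of any length, as the question asks:

```r
set.seed(4991)
my_data <- data.frame(ceiling(matrix(runif(100, 4, 10), ncol = 5)))

# Hypothetical helper: 1 if a row contains every value in `values`, else 0
contains_all <- function(data, values) {
  # one logical vector per value: does the row contain it anywhere?
  flags <- lapply(values, function(v) rowSums(data == v) > 0)
  # AND across all values; Reduce() also works when length(values) == 1
  as.integer(Reduce(`&`, flags))
}

my_data$bleh <- contains_all(my_data, c(5, 10))
```

With c(5, 10), bleh is 1 only for rows containing both a 5 and a 10. Call the helper before adding bleh (or pass only the data columns), so the flag column itself is not searched.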
How to retain only unique values in each row for a data frame
The input is as follows:
1 1 2 3 4 1 6 7 8
2 2 5 5 7 8 9 0 0
6 6 6 6 5 1 2 3 4
The output should be:
1 2 3 4 6 7 8
2 5 7 8 9 0
6 5 1 2 3 4
I tried plyr and unique, but they return the unique values of the complete data set.
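You can use sapply or lapply to accomplish it. Both work column-wise, so this assumes each of your rows is stored as a column (called x1, x2 and x3 in the output below); with the layout shown in the question, transpose first, e.g. df <- as.data.frame(t(df)).
#supposing your data.frame is called 'df', with one original row per column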
sapply(df, unique)
#$x1
#[1] 1 2 3 4 6 7 8
#
#$x2
#[1] 2 5 7 8 9 0
#
#$x3
#[1] 6 5 1 2 3 4
or
lapply(df, unique)
#$x1
#[1] 1 2 3 4 6 7 8
#
#$x2
#[1] 2 5 7 8 9 0
#
#$x3
#[1] 6 5 1 2 3 4
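# Imagine D is your data.frame object.
# Note: rle() only collapses consecutive repeats, so a value that reappears
# later in a row (e.g. the 1 in the sixth column of the first row) is kept.
apply(D, 1, function(x) rle(x)$values)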
# 'dat' is your data frame; pad each row's unique values with NA to a common length
A <- apply(dat, 1, unique)
data.frame(t(sapply(A, `length<-`, max(lengths(A)))))
X1 X2 X3 X4 X5 X6 X7
1 1 2 3 4 6 7 8
2 2 5 7 8 9 0 NA
3 6 5 1 2 3 4 NA
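One caveat with apply(dat, 1, unique): if every row happens to have the same number of unique values, apply() returns a matrix instead of a list, and the padding step no longer behaves as intended. A sketch that always returns a list, one element per row, using the question's three rows as input (dat is assumed to be the data frame):

```r
dat <- read.table(text = "1 1 2 3 4 1 6 7 8
2 2 5 5 7 8 9 0 0
6 6 6 6 5 1 2 3 4")

# lapply over row indices always yields a list, even when every row
# has the same number of unique values
row_unique <- lapply(seq_len(nrow(dat)),
                     function(i) unique(unlist(dat[i, ], use.names = FALSE)))

# Print each row's unique values on one line
writeLines(vapply(row_unique, paste, character(1), collapse = " "))
# 1 2 3 4 6 7 8
# 2 5 7 8 9 0
# 6 5 1 2 3 4
```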
Remove one element per column at increasing row positions in a data frame
x<-c(4,5,6,23,5,6,7,8,0,3)
y<-c(2,4,5,6,23,5,6,7,8,0)
z<-c(1,2,4,5,6,23,5,6,7,8)
df<-data.frame(x,y,z)
df
x y z
1 4 2 1
2 5 4 2
3 6 5 4
4 23 6 5
5 5 23 6
6 6 5 23
7 7 6 5
8 8 7 6
9 0 8 7
10 3 0 8
I would like to remove the number 23 from every column of df by removing one row per column at sequentially increasing positions (not by matching the value 23, but by its initial location in x). The expected result is:
df
x y z
1 4 2 1
2 5 4 2
3 6 5 4
4 5 6 5
5 6 5 6
6 7 6 5
7 8 7 6
8 0 8 7
9 3 0 8
Thank you
You can iterate through the columns and remove the element from each, then reassemble as a data frame:
result <- as.data.frame(lapply(1:ncol(df), function(x) df[-(x+3),x]))
names(result) <- names(df)
result
## x y z
## 1 4 2 1
## 2 5 4 2
## 3 6 5 4
## 4 5 6 5
## 5 6 5 6
## 6 7 6 5
## 7 8 7 6
## 8 0 8 7
## 9 3 0 8
df[-(x+3),x] is column x with the element at row x+3 removed, i.e. row 4 of the first column, row 5 of the second and row 6 of the third. To start at row N in the first column, use df[-(x+N-1),x] instead.
You could also try:
n <- 4
df1 <- df[-n,]
df1[] <- unlist(df,use.names=FALSE)[-seq(n, prod(dim(df)), by=nrow(df)+1)]
df1
# x y z
#1 4 2 1
#2 5 4 2
#3 6 5 4
#5 5 6 5
#6 6 5 6
#7 7 6 5
#8 8 7 6
#9 0 8 7
#10 3 0 8
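Both answers hard-code the starting row. A small generalisation, as a sketch: drop_shifted is a hypothetical helper that removes row start from the first column, row start + 1 from the second, and so on, then reassembles the columns:

```r
x <- c(4,5,6,23,5,6,7,8,0,3)
y <- c(2,4,5,6,23,5,6,7,8,0)
z <- c(1,2,4,5,6,23,5,6,7,8)
df <- data.frame(x, y, z)

# Hypothetical helper: remove row (start + j - 1) from column j
drop_shifted <- function(d, start) {
  # every column must still have the row we want to drop
  stopifnot(start >= 1, start + ncol(d) - 1 <= nrow(d))
  out <- Map(function(col, j) col[-(start + j - 1)], d, seq_along(d))
  as.data.frame(out)
}

res <- drop_shifted(df, start = 4)
res
```

With start = 4 this gives the same 9-row result as the expected output in the question.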
I have a simple data frame as follows
x = data.frame(id = seq(1,10),val = seq(1,10))
x
id val
1 1
2 2
3 3
4 4
5 5
6 6
7 7
8 8
9 9
10 10
I want to add 4 more columns: the first two hold the values from the previous two rows and the other two hold the values from the next two rows. Where those rows do not exist (in the first two and last two rows), the value should be NA.
How do I accomplish this using cast in the reshape package?
The final output would look like this:
1 1 NA NA 2 3
2 2 NA 1 3 4
3 3 1 2 4 5
4 4 2 3 5 6
... and so on...
Thanks much in advance
After you gave the example, I changed the solution:
mat <- cbind(x,
c(NA, NA, head(x$val, -2)),  # value two rows earlier
c(NA, head(x$val, -1)),      # value one row earlier
c(tail(x$val, -1), NA),      # value one row later
c(tail(x$val, -2), NA, NA))  # value two rows later
colnames(mat) <- c('id','val','prev2','prev1','next1','next2')
mat
id val prev2 prev1 next1 next2
1 1 1 NA NA 2 3
2 2 2 NA 1 3 4
3 3 3 1 2 4 5
4 4 4 2 3 5 6
5 5 5 3 4 6 7
6 6 6 4 5 7 8
7 7 7 5 6 8 9
8 8 8 6 7 9 10
9 9 9 7 8 10 NA
10 10 10 8 9 NA NA
Here is a solution with sapply. First, choose the relative offsets for the new columns (negative for previous rows, positive for next rows):
lags <- c(-2, -1, 1, 2)
Create the new columns:
newcols <- sapply(lags,
function(l) {
tmp <- seq.int(nrow(x)) + l;
x[replace(tmp, tmp < 1 | tmp > nrow(x), NA), "val"]})
Bind together:
cbind(x, newcols)
The result:
id val 1 2 3 4
1 1 1 NA NA 2 3
2 2 2 NA 1 3 4
3 3 3 1 2 4 5
4 4 4 2 3 5 6
5 5 5 3 4 6 7
6 6 6 4 5 7 8
7 7 7 5 6 8 9
8 8 8 6 7 9 10
9 9 9 7 8 10 NA
10 10 10 8 9 NA NA
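If you want descriptive column names instead of 1 to 4, the same indexing idea can be wrapped in a small helper. A sketch; shift is a hypothetical function name, where a negative k means earlier rows and a positive k means later rows:

```r
x <- data.frame(id = seq(1, 10), val = seq(1, 10))

# Hypothetical helper: value k rows away (k < 0 = earlier, k > 0 = later),
# NA where that row does not exist
shift <- function(v, k) {
  idx <- seq_along(v) + k
  idx[idx < 1 | idx > length(v)] <- NA
  v[idx]
}

for (k in c(-2, -1, 1, 2)) {
  col_name <- if (k < 0) paste0("prev", -k) else paste0("next", k)
  x[[col_name]] <- shift(x$val, k)
}
x
```

Indexing with NA returns NA, which is what fills the missing positions at both ends.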