How to convert "heading" rows into new columns - r

I have data (imported imperfectly from a PDF) that has everything in a single column, with certain rows as descriptive headers. For example:
dfx <- data.frame(V1 = c("Box 1", "abcd10", "bcde15", "Box 2", "cdefg35", "jklm40", "nopq50", "rstu52"))
V1
1 Box 1
2 abcd10
3 bcde15
4 Box 2
5 cdefg35
6 jklm40
7 nopq50
8 rstu52
I want to create a separate column where each observation takes on the value of the nearest heading above it. Like this:
V1 v2
1 abcd10 Box 1
2 bcde15 Box 1
3 cdefg35 Box 2
4 jklm40 Box 2
5 nopq50 Box 2
6 rstu52 Box 2
Nothing I've tried has gotten me close. Any help would be appreciated. Thanks!

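One approach in base R is to flag the heading rows, use cumsum on that flag to group each row with the heading above it, and then take the first value of every group: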
i1 <- grepl('Box', dfx$V1)
dfx$new <- with(dfx, ave(V1, cumsum(i1), FUN = function(i) i[1]))
subset(dfx, !i1)
# V1 new
#2 abcd10 Box 1
#3 bcde15 Box 1
#5 cdefg35 Box 2
#6 jklm40 Box 2
#7 nopq50 Box 2
#8 rstu52 Box 2
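To see why the cumsum grouping works, it can help to look at the intermediate values. This is a minimal sketch on the question's data; the names is_heading, grp, heading and result are illustrative and not part of the answer above.

```r
dfx <- data.frame(V1 = c("Box 1", "abcd10", "bcde15", "Box 2",
                         "cdefg35", "jklm40", "nopq50", "rstu52"),
                  stringsAsFactors = FALSE)

# TRUE on heading rows, FALSE elsewhere
is_heading <- grepl("Box", dfx$V1)

# The running total increases by one at each heading, so every row
# shares a group id with the nearest heading above it
grp <- cumsum(is_heading)
grp

# Within each group the first element is the heading itself
heading <- ave(dfx$V1, grp, FUN = function(v) v[1])

# Keep only the non-heading rows, paired with their heading
result <- data.frame(V1 = dfx$V1[!is_heading], v2 = heading[!is_heading])
result
```

Note that the pattern "Box" would also match a data row that happens to contain that text; a stricter pattern may be safer if that can occur.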

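You could also match the headings with a stricter pattern and index the heading values by the running count of headings: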
indx <- grepl("^Box \\d+$",dfx$V1)
transform(dfx,v2=V1[indx][cumsum(indx)])[!indx,]
V1 v2
2 abcd10 Box 1
3 bcde15 Box 1
5 cdefg35 Box 2
6 jklm40 Box 2
7 nopq50 Box 2
8 rstu52 Box 2

Create a V2 column which equals V1 for the Box rows and NA for other rows and then use na.locf0 to fill in the NAs. Finally remove the V1 Box rows.
library(zoo)
isBox <- grepl("Box", dfx$V1)
transform(dfx, V2 = na.locf0(replace(V1, !isBox, NA)))[ !isBox, ]
giving:
V1 V2
2 abcd10 Box 1
3 bcde15 Box 1
5 cdefg35 Box 2
6 jklm40 Box 2
7 nopq50 Box 2
8 rstu52 Box 2

Related

Shifting positions of values in a single column

This is my first question, so please let me know if I made any mistakes in the ask.
I am trying to create a data frame with multiple columns that all contain the same values in the same order, but shifted in position: in each successive column, the first value is moved to the end and everything else is shifted up.
For example, I would like to convert a data frame like this:
example = data.frame(x=c(1,2,3,4), y=c(1,2,3,4), z=c(1,2,3,4), w=c(1,2,3,4))
Which looks like this
x y z w
1 1 1 1
2 2 2 2
3 3 3 3
4 4 4 4
into this:
x y z w
1 2 3 4
2 3 4 1
3 4 1 2
4 1 2 3
In the new dataframe, the "peak" or # 4 has moved progressively up in rows.
I've seen advice on how to shift columns up and down, but it only fills the vacated positions with zeroes or NA. I don't know how to shift a column up and replace the bottom-most value with what was formerly at the top.
Thanks in advance for any help.
In base R, we can use Map to rotate every column after the first. For a shift of y, tail(x, -y) drops the first y elements and head(x, y) appends them at the end:
example[-1] <- Map(function(x, y) c(tail(x, -y), head(x, y)),
                   example[-1], head(seq_along(example), -1))
example
# x y z w
#1 1 2 3 4
#2 2 3 4 1
#3 3 4 1 2
#4 4 1 2 3
Another option is embed, applied to the original (unshifted) example:
example[] <- embed(unlist(example), 4)[1:4, 4:1]
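The same rotation can also be written as a small reusable helper. This is a sketch only: rotate is a hypothetical function name, not part of base R or of the answer above, and it assumes a non-empty vector.

```r
# Hypothetical helper: move the first k elements of x to the end
rotate <- function(x, k) {
  n <- length(x)
  k <- k %% n                  # a shift of n (or 0) leaves x unchanged
  if (k == 0) return(x)
  c(x[(k + 1):n], x[1:k])
}

example <- data.frame(x = c(1, 2, 3, 4), y = c(1, 2, 3, 4),
                      z = c(1, 2, 3, 4), w = c(1, 2, 3, 4))

# Column 1 is shifted by 0, column 2 by 1, and so on
example[] <- Map(rotate, example, seq_along(example) - 1)
example
```

Taking k modulo the length means the helper also copes with data frames that have more columns than rows.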

Removing character from dataframe

I have this simple code, which generates a data frame. I want to remove the V character from the middle column. Is there a simple way to do that?
Here is some test code that closely resembles the actual code (which is very long).
mat1=matrix(c(1,2,3,4,5,"V1","V2","V3","V4","V5",1,2,3,4,5), ncol=3)
mat=as.data.frame(mat1)
colnames(mat)=c("x","row","y")
mat
This is the data frame:
x row y
1 1 V1 1
2 2 V2 2
3 3 V3 3
4 4 V4 4
5 5 V5 5
I just want to remove the V's like this:
x row y
1 1 1 1
2 2 2 2
3 3 3 3
4 4 4 4
5 5 5 5
We can use str_replace from stringr, which replaces the first match of the pattern in each string:
library(stringr)
mat$row <- str_replace(mat$row, "V", "")
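If adding a package is not desirable, base R's sub does the same job. A minimal sketch, assuming the columns should end up numeric (that conversion step is an addition, not something the question asks for):

```r
mat1 <- matrix(c(1, 2, 3, 4, 5, "V1", "V2", "V3", "V4", "V5", 1, 2, 3, 4, 5), ncol = 3)
mat <- as.data.frame(mat1, stringsAsFactors = FALSE)
colnames(mat) <- c("x", "row", "y")

# fixed = TRUE treats "V" as a literal string rather than a regular expression
mat$row <- sub("V", "", mat$row, fixed = TRUE)

# Optional: the matrix was character, so every column is character too;
# convert them if numbers are needed
mat[] <- lapply(mat, as.numeric)
mat
```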

Using another data table to condition on columns in a primary data table r

Suppose I have two data tables, and I want to use the second one, which contains a row with some column values, to condition the first one.
Specifically, I want to select the rows of d1 whose values in the columns named in d2 are less than or equal to the corresponding values in d2.
library(data.table)
d1 = data.table('d'=1,'v1'=1:10, 'v2'=1:10)
d2 = data.table('v1'=5, 'v2'=5)
So I would want the output to be
d v1 v2
1: 1 1 1
2: 1 2 2
3: 1 3 3
4: 1 4 4
5: 1 5 5
But I want to do this without referencing specific names unless it's in a very general way, e.g. names(d2).
You could do it with a bit of text manipulation and a join:
d2[d1, on=sprintf("%1$s>=%1$s", names(d2)), nomatch=0]
# v1 v2 d
#1: 1 1 1
#2: 2 2 1
#3: 3 3 1
#4: 4 4 1
#5: 5 5 1
This works because sprintf expands to one non-equi join condition per column of d2:
sprintf("%1$s>=%1$s", names(d2))
#[1] "v1>=v1" "v2>=v2"
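A different technique from the join, shown here only as a sketch, is to build one logical condition per column named in d2 and combine them with Reduce. It uses plain data frames so it runs without data.table; the names conds, keep and filtered are illustrative.

```r
d1 <- data.frame(d = 1, v1 = 1:10, v2 = 1:10)
d2 <- data.frame(v1 = 5, v2 = 5)

# One logical vector per column named in d2:
# TRUE where the d1 value is <= the limit stored in d2
conds <- lapply(names(d2), function(nm) d1[[nm]] <= d2[[nm]][1])

# Combine the per-column conditions with an elementwise AND
keep <- Reduce(`&`, conds)

filtered <- d1[keep, ]
filtered
```

Only names(d2) is referenced, so the code does not change when d2 gains or loses columns, and the column order of d1 is preserved in the result.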

text to dataframe in r

I am having trouble building a data frame from a text string in R.
My text looks like this:
t1 = "[[1,5,3,4],[3,2,2,1],[19,11,1,1]]"
and I want to make this dataframe:
V1 V2 V3 V4
1 1 5 3 4
2 3 2 2 1
3 19 11 1 1
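Combining the suggestions from the comments: the string is valid JSON (an array of equal-length arrays), so you can parse it with jsonlite and convert the resulting matrix: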
yourDf <- as.data.frame(jsonlite::fromJSON(t1))
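For comparison, here is a base R sketch that avoids the jsonlite dependency. It assumes the string always has exactly this shape (one outer pair of brackets, rows separated by "],[" with no spaces, numeric entries only), so it is far less robust than a real JSON parser.

```r
t1 <- "[[1,5,3,4],[3,2,2,1],[19,11,1,1]]"

# Strip the outer "[[" and "]]", then split into one string per row
inner <- gsub("^\\[\\[|\\]\\]$", "", t1)
rows <- strsplit(inner, "],[", fixed = TRUE)[[1]]

# Split each row on commas, convert to numbers, and stack the rows
m <- do.call(rbind, lapply(strsplit(rows, ",", fixed = TRUE), as.numeric))

parsed <- as.data.frame(m)
parsed
```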

Split Column and then aggregate count of unique values

I have the following dataset:
color type
1 black chair
2 black chair
3 black sofa
4 green table
5 green sofa
I want to split this to form the following dataset:
arg value
1 color black
2 color black
3 color black
4 color green
5 color green
6 type chair
7 type chair
8 type sofa
9 type table
10 type sofa
I would then like to count how often each arg-value combination occurs:
arg value count
1 color black 3
2 color green 2
3 type chair 2
4 type sofa 2
5 type table 1
It does not need to be sorted by count. This would then be printed in the following output form:
arg unique_count_values
1 color black(3) green(2)
2 type chair(2) sofa(2) table(1)
I tried the following:
AttrList<-colnames(DataSet)
aggregate(.~ AttrList, DataSet, FUN=function(x) length(unique(x)) )
I also tried summary(DataSet) but then I am not sure how to manipulate the result to get it in the desired Output form.
I am relatively new to R. If you find something that would reduce the effort then please let me know. Thanks!
Update
So, I tried the following:
x <- matrix(c(101:104,101:104,105:106,1,2,3,3,4,5,4,5,7,5), nrow=10, ncol=2)
V1 V2
1 101 1
2 102 2
3 103 3
4 104 3
5 101 4
6 102 5
7 103 4
8 104 5
9 105 7
10 106 5
Converting to table:
as.data.frame(table(x))
Which gives me:
x Freq
1 1 1
2 2 1
3 3 2
4 4 2
5 5 3
6 7 1
7 101 2
8 102 2
9 103 2
10 104 2
11 105 1
12 106 1
What should I do so I get this:
V Val Freq
1 V2 1 1
2 V2 2 1
3 V2 3 2
4 V2 4 2
5 V2 5 3
6 V2 7 1
7 V1 101 2
8 V1 102 2
9 V1 103 2
10 V1 104 2
11 V1 105 1
12 V1 106 1
Try
library(tidyr)
library(dplyr)
df %>%
  gather(arg, value) %>%
  count(arg, value) %>%
  group_by(arg) %>%  # current dplyr returns count() output ungrouped, so group explicitly
  summarise(unique_count_values = toString(paste0(value, "(", n, ")")))
Which gives:
#Source: local data frame [2 x 2]
#
# arg unique_count_values
# (fctr) (chr)
#1 color black(3), green(2)
#2 type chair(2), sofa(2), table(1)
Here's a base R approach. I've expanded it out a bit mostly so that I can add comments as to what is happening.
The basic idea is to just use sapply to loop through the columns, tabulate the data in each column, and then use sprintf to extract the relevant parts of the tabulation to achieve your desired output (the names, followed by the values in brackets).
The stack function takes the final named vector and converts it to a data.frame.
stack( ## convert the final output to a data.frame
sapply( ## cycle through each column
mydf, function(x) {
temp <- table(x) ## calculate counts and paste together values
paste(sprintf("%s (%d)", names(temp), temp), collapse = " ")
}))
# values ind
# 1 black (3) green (2) color
# 2 chair (2) sofa (2) table (1) type
If the columns are factors, you could also try something like the following, which produces the same counts but in summary's own value:count format rather than the requested one.
stack(apply(summary(mydf), 2, function(x) paste(na.omit(x), collapse = " ")))
# values ind
# 1 black:3 green:2 color
# 2 chair:2 sofa :2 table:1 type
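The intermediate long table from the question (arg, value, count) can also be built directly in base R. This is a sketch only: mydf is reconstructed here from the question's example data, and counts is an illustrative name.

```r
mydf <- data.frame(color = c("black", "black", "black", "green", "green"),
                   type = c("chair", "chair", "sofa", "table", "sofa"),
                   stringsAsFactors = FALSE)

# Tabulate each column separately, then stack the per-column results
counts <- do.call(rbind, lapply(names(mydf), function(nm) {
  tab <- table(mydf[[nm]])
  data.frame(arg = nm,
             value = names(tab),
             count = as.integer(tab),
             stringsAsFactors = FALSE)
}))
counts
```

Because each column is tabulated on its own, the column name is kept alongside every value, which is what a single table() call over the whole data set loses.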
