I'm having trouble building a data frame from a text string in R.
My text looks like this:
t1 = "[[1,5,3,4],[3,2,2,1],[19,11,1,1]]"
and I want to make this dataframe:
V1 V2 V3 V4
1 1 5 3 4
2 3 2 2 1
3 19 11 1 1
Combining the suggestions from the comments, you can do:
yourDf <- as.data.frame(jsonlite::fromJSON(t1))
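If installing jsonlite is not an option, the same result can probably be reached in base R. This is only a sketch, and it assumes the string is always a flat list of equally long numeric rows with no spaces or nesting beyond what is shown in the question:

```r
t1 <- "[[1,5,3,4],[3,2,2,1],[19,11,1,1]]"

# Strip the outer "[[" and "]]", then split into rows on "],["
rows <- strsplit(gsub("^\\[\\[|\\]\\]$", "", t1), "\\],\\[")[[1]]

# Split each row on commas and convert the pieces to numbers
vals <- lapply(strsplit(rows, ","), as.numeric)

# Bind the rows into a matrix; as.data.frame() names the columns V1..V4
yourDf <- as.data.frame(do.call(rbind, vals))
yourDf
```

For anything that is real JSON (strings, nesting, whitespace), the jsonlite version above is the safer choice.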
Consider the following data named mydata. My intention is to put v1 and v2 in the same column by adding an identifier variable v4.
id v1 v2
1 2 3
2 4 5
3 7 8
OUTPUT required:
id v3 v4
1 2 1
2 4 1
3 7 1
1 3 2
2 5 2
3 8 2
Any help is much appreciated!
I think you are looking for something like dplyr::mutate() for adding columns, and rbind() for stacking two data frames on top of each other.
library(dplyr)
mydata <- data.frame(id = c(1,2,3),
                     v1 = c(2,4,7),
                     v2 = c(3,5,8))
# id and v1, tagged with v4 = 1
a <- data.frame(mydata$id, mydata$v1) %>%
  mutate(v4 = 1) %>%
  rename(v3 = mydata.v1, id = mydata.id)
# id and v2, tagged with v4 = 2
b <- data.frame(mydata$id, mydata$v2) %>%
  mutate(v4 = 2) %>%
  rename(v3 = mydata.v2, id = mydata.id)
> rbind(a,b)
id v3 v4
1 1 2 1
2 2 4 1
3 3 7 1
4 1 3 2
5 2 5 2
6 3 8 2
What about this:
mydata <- data.frame(c(1,2,3),c(2,4,7),c(3,5,8))
colnames(mydata) <- c("id","v1","v2")
mydata_2 <- rbind(mydata[,c(1,2)], setNames(mydata[,c(1,3)], names(mydata[,c(1,2)])))
mydata_2$v4 <- c(rep(1,length(mydata$v1)),rep(2,length(mydata$v2)))
colnames(mydata_2) <- c("id","v3","v4")
A data.table option
library(data.table)
setcolorder(
  transform(
    setnames(melt(setDT(mydata), id.vars = "id", variable.name = "v4"), "value", "v3"),
    v4 = as.numeric(factor(v4))
  ), c("id", "v3", "v4")
)[]
gives
id v3 v4
1: 1 2 1
2: 2 4 1
3: 3 7 1
4: 1 3 2
5: 2 5 2
6: 3 8 2
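For comparison, here is a base R sketch of the same stacking that needs no packages. The names value_cols and long are made up for illustration, and it assumes every column listed in value_cols has the same type:

```r
mydata <- data.frame(id = c(1, 2, 3), v1 = c(2, 4, 7), v2 = c(3, 5, 8))

# Columns to stack on top of each other (hypothetical helper name)
value_cols <- c("v1", "v2")

long <- data.frame(
  # repeat the ids once per stacked column
  id = rep(mydata$id, times = length(value_cols)),
  # concatenate the value columns in order
  v3 = unlist(mydata[value_cols], use.names = FALSE),
  # 1 for rows that came from v1, 2 for rows that came from v2
  v4 = rep(seq_along(value_cols), each = nrow(mydata))
)
long
```

Adding more columns to value_cols should extend the identifier to 3, 4, and so on without other changes.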
I have data (imported imperfectly from a PDF) that has everything in a single column, with certain rows as descriptive headers. For example:
dfx <- data.frame(V1 = c("Box 1", "abcd10", "bcde15", "Box 2", "cdefg35", "jklm40", "nopq50", "rstu52"))
V1
1 Box 1
2 abcd10
3 bcde15
4 Box 2
5 cdefg35
6 jklm40
7 nopq50
8 rstu52
I want to create a separate column where each observation takes on the value of the nearest heading above it. Like this:
V1 v2
1 abcd10 Box 1
2 bcde15 Box 1
3 cdefg35 Box 2
4 jklm40 Box 2
5 nopq50 Box 2
6 rstu52 Box 2
Nothing I've tried has gotten me close. Any help would be appreciated. Thanks!
One idea using base R:
i1 <- grepl('Box', dfx$V1)
dfx$new <- with(dfx, ave(V1, cumsum(i1), FUN = function(i) i[1]))
subset(dfx, !i1)
# V1 new
#2 abcd10 Box 1
#3 bcde15 Box 1
#5 cdefg35 Box 2
#6 jklm40 Box 2
#7 nopq50 Box 2
#8 rstu52 Box 2
You could also do:
indx <- grepl("^Box \\d+$",dfx$V1)
transform(dfx,v2=V1[indx][cumsum(indx)])[!indx,]
V1 v2
2 abcd10 Box 1
3 bcde15 Box 1
5 cdefg35 Box 2
6 jklm40 Box 2
7 nopq50 Box 2
8 rstu52 Box 2
Create a V2 column which equals V1 for the Box rows and NA for other rows and then use na.locf0 to fill in the NAs. Finally remove the V1 Box rows.
library(zoo)
isBox <- grepl("Box", dfx$V1)
transform(dfx, V2 = na.locf0(replace(V1, !isBox, NA)))[ !isBox, ]
giving:
V1 V2
2 abcd10 Box 1
3 bcde15 Box 1
5 cdefg35 Box 2
6 jklm40 Box 2
7 nopq50 Box 2
8 rstu52 Box 2
I have this simple code, which generates a data frame. I want to remove the V character from the middle column. Is there any simple way to do that?
Here is some test code (the actual code is very long) that closely resembles it.
mat1=matrix(c(1,2,3,4,5,"V1","V2","V3","V4","V5",1,2,3,4,5), ncol=3)
mat=as.data.frame(mat1)
colnames(mat)=c("x","row","y")
mat
This is the data frame:
x row y
1 1 V1 1
2 2 V2 2
3 3 V3 3
4 4 V4 4
5 5 V5 5
I just want to remove the V's like this:
x row y
1 1 1 1
2 2 2 2
3 3 3 3
4 4 4 4
5 5 5 5
We can use str_replace from stringr
library(stringr)
mat$row <- str_replace(mat$row, "V", "")
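Note that the result is still a character column. If a numeric column is wanted, a base R sketch along these lines should work. It assumes every value in row is a single leading "V" followed by digits:

```r
mat1 <- matrix(c(1,2,3,4,5,"V1","V2","V3","V4","V5",1,2,3,4,5), ncol = 3)
mat <- as.data.frame(mat1)
colnames(mat) <- c("x", "row", "y")

# Drop the leading "V" and convert what is left to numbers
mat$row <- as.numeric(sub("^V", "", mat$row))
mat
```

Because the data frame was built from a character matrix, x and y are not numeric either, so they may need a similar as.numeric() conversion.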
My problem in general:
I have a data frame in which I would like to find all bi-clusters with constant values in columns.
For example, given the initial data frame:
> df
v1 v2 v3
1 0 2 1
2 1 3 2
3 2 4 3
4 3 3 4
5 4 2 3
6 5 2 4
7 2 2 3
8 3 1 2
I would like to find a cluster like this:
> cluster1
v1 v3
1 2 3
2 2 3
I tried the biclust package and tested several of its functions, but the result was never what I wanted to achieve.
I figured out that I might be able to use the BCPlaid function with fit.model = y ~ m, but that also seems to produce different results.
Is there an efficient way to achieve this?
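For a small data frame like the one above, a brute-force search in base R might already be enough. The sketch below does not use biclust, and the function name find_constant_biclusters is made up. It reads "constant values in columns" as "at least two rows that agree on every column of a chosen column subset". It tries every subset of columns, so it only scales to a handful of columns:

```r
df <- data.frame(v1 = c(0, 1, 2, 3, 4, 5, 2, 3),
                 v2 = c(2, 3, 4, 3, 2, 2, 2, 1),
                 v3 = c(1, 2, 3, 4, 3, 4, 3, 2))

# Hypothetical helper: for each column subset, group rows that share
# the same values in all of those columns and keep groups that are big enough
find_constant_biclusters <- function(d, min_rows = 2, min_cols = 2) {
  out <- list()
  for (k in min_cols:ncol(d)) {
    for (cols in combn(names(d), k, simplify = FALSE)) {
      # one key per row, built from the values in the chosen columns
      key <- do.call(paste, c(d[cols], sep = "\r"))
      for (rows in split(seq_len(nrow(d)), key)) {
        if (length(rows) >= min_rows) {
          out[[length(out) + 1]] <- list(rows = rows, cols = cols)
        }
      }
    }
  }
  out
}

clusters <- find_constant_biclusters(df)

# First cluster found: rows 3 and 7, columns v1 and v3
df[clusters[[1]]$rows, clusters[[1]]$cols]
```

On this data it finds two clusters: rows 3 and 7 on v1/v3 (the one asked for) and rows 5 and 7 on v2/v3. It also reports clusters that are sub-clusters of larger ones, which may or may not be wanted.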
I have 2 data frames
D1 = V1 V2 V3 V4
1 2 3 4
2 3 4 5
3 5 4 2
D2 = V1 V2 V3
1 2 3
3 5 4
I am trying to match the two data frames and extract the index of the row of D2 that matches a given row of D1 using which, but I get this error:
which(D2[,1:3]==D1[3,1:3])
Error in Ops.data.frame : ‘==’ only defined for equally-sized data frames
(If I write the comparison out column by column,
which(D2[,1]==D1[3,1] & D2[,2]==D1[3,2] & D2[,3]==D1[3,3])
there is no problem, but I want to generalise it.)
Please suggest an alternative.
This does the trick:
which(apply(D2, 1, function(x) all(D1[3,1:3] == x)))
[1] 2
Data:
D1 <- read.table(text="V1 V2 V3 V4
1 2 3 4
2 3 4 5
3 5 4 2", header=TRUE)
D2 <- read.table(text="V1 V2 V3
1 2 3
3 5 4", header=TRUE)
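To generalise further and match every row of D1 at once, one option is to build a key per row from the shared columns and use match(). This is a sketch; the names common, key1 and key2 are made up, and it assumes the values paste to the same text in both data frames (for example, no 1 in one and 1.0 read as character in the other):

```r
D1 <- data.frame(V1 = c(1, 2, 3), V2 = c(2, 3, 5),
                 V3 = c(3, 4, 4), V4 = c(4, 5, 2))
D2 <- data.frame(V1 = c(1, 3), V2 = c(2, 5), V3 = c(3, 4))

# Columns present in both data frames: V1, V2, V3
common <- intersect(names(D1), names(D2))

# One string key per row, built from the shared columns
key1 <- do.call(paste, c(D1[common], sep = "\r"))
key2 <- do.call(paste, c(D2[common], sep = "\r"))

# For each row of D1: index of the matching row in D2, or NA if none
match(key1, key2)
# [1]  1 NA  2
```

The third element is 2, which agrees with the which(apply(...)) result for D1[3, ].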