The script below creates X, Y data stored in two columns:
a1 <- as.character(c(3456,2569))
a2 <- as.character(c(956,569))
a3 <- as.character(c(156,269))
mydf <- rbind(a1, a2, a3)
How can I store it in a data.frame with one column in the format “X,Y”, appending ".000" to each X and Y (as characters)?
The output should be:
"3456.000, 2569.000"
"956.000, 569.000"
"156.000, 269.000"
Something like this could work:
data.frame(col1 = apply(mydf, 1, function(x) paste(paste0(x, '.000'), collapse = ', ')))
# col1
#a1 3456.000, 2569.000
#a2 956.000, 569.000
#a3 156.000, 269.000
apply iterates over the rows of your matrix. For each row it first appends the zeros to every number (that's paste0) and then merges the results into one comma-separated string (that's paste).
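To make the two steps concrete, here is a minimal sketch of what happens to a single row; the vector x is just the first row of mydf written out by hand.

```r
# One row of the matrix, as apply() would pass it to the function
x <- c("3456", "2569")

# Step 1: paste0 appends ".000" to every element
padded <- paste0(x, ".000")

# Step 2: paste with collapse merges the elements into one string
joined <- paste(padded, collapse = ", ")
print(joined)
```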
Are all the numbers integers, or do some of them already have a decimal point? If it's the latter, you might want to do something like
sprintf("%.3f, %.3f", as.numeric(mydf[,1]), as.numeric(mydf[,2]))
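The difference between the two approaches only shows up once a value already has a decimal part. A small sketch with made-up inputs (vals is not from the question):

```r
# Made-up values: one integer, two with existing decimals
vals <- c("3456", "12.5", "0.25")

# Appending the suffix blindly produces malformed numbers
pasted <- paste0(vals, ".000")

# Converting to numeric and formatting gives three decimals throughout
formatted <- sprintf("%.3f", as.numeric(vals))

print(pasted)
print(formatted)
```

Note that sprintf rounds to three decimals, so values with more digits would lose precision.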
I have a data frame df with a column X that contains three different values a, b and c as characters. For example
df <- data.frame(X = c("a","a","a","b","b","c","c","c","c"), Y = ....)
I want to transform it into a = 1, b = 2 and c = 3 as numerics.
I first tried
df$X = as.factor(df$X)
transform(df, X = as.numeric(X))
where now I have a factor with three levels and a=1, b=2 and c=3. However the problem is that I need the column X as numeric. If I try
transform(df, X = as.numeric(as.character(X)))
or
transform(df, X = as.numeric(levels(X))[X])
I get NA for all the inputs (a, b, c).
How can I get the column X with numeric 1, 2, 3?
The solution of @jay.sf, which first encodes the characters as a factor, is quite elegant because it generalizes to arbitrary strings and not just single characters.
If the codes are single characters, there is another possible solution, which uses the builtin constant letters and returns the position therein:
sapply(df$X, function(x) {which(x == letters)})
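For comparison, here is a sketch of both ideas side by side. The factor-based version is only my assumption of what such an encoding looks like; the referenced answer is not reproduced here. match is a vectorised alternative to the sapply/which loop.

```r
df <- data.frame(X = c("a", "a", "a", "b", "b", "c", "c", "c", "c"),
                 stringsAsFactors = FALSE)

# Factor route: levels are sorted alphabetically, codes are 1, 2, 3
codes_factor <- as.integer(factor(df$X))

# Position in the built-in constant letters (single lowercase letters only)
codes_match <- match(df$X, letters)

# Store the result as a numeric column; note the assignment back to df
df$X <- as.numeric(codes_factor)
str(df)
```

The factor route also works for longer strings, and the mapping can be controlled by passing levels explicitly, e.g. factor(x, levels = c("low", "high")).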
Let's take an example data frame from which a column held in a variable should be removed:
frame <- data.frame("a" = 1:5, "b" = 2:6, "c" = 3:7, "d" = 4:8)
rem <- readline()
frame <- subset(frame, select = -c(rem))
How do I get the variable column to be removed? This is not my real code, just wanted to present my problem in a simple code. Thanks!
Edit: I am so sorry, I am really sleepy and don't know what I typed into my code, I edited it now.
1) Do both at once. We assume that ix contains at least one column number.
ix <- 1:2
frame[-ix]
## c d
## 1 3 4
## 2 4 5
## 3 5 6
## 4 6 7
## 5 7 8
1a) or, if the case where ix has zero length (ix <- c()) matters, we can do this. The output of this and all the remaining solutions is the same as for (1), so we won't repeat it.
ix <- 1:2
frame[setdiff(seq_along(frame), ix)]
1b) or if we have names rather than column numbers. This works even if nms is a zero length vector in which case it returns the original data frame.
nms <- c("a", "b")
frame[setdiff(names(frame), nms)]
2) or, if you need to do it iteratively, remove the column with the largest index first. If the columns were removed in ascending order, then after the first one is removed the second column would no longer be the second but the first. If we knew that ix was already sorted we could omit the sort. We use frame_out to hold the result so that the input is not destroyed. This works even if ix is the empty vector.
ix <- 1:2
frame_out <- frame
for(i in rev(sort(ix))) frame_out <- frame_out[-i]
frame_out
3) One way to do it independently of order is to remove the columns by name. In this case it is possible to remove them in ascending order. This works even if ix is the empty vector.
ix <- 1:2
nms <- names(frame)[ix]
frame_out <- frame
for(nm in nms) frame_out <- frame_out[-match(nm, names(frame_out))]
frame_out
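A short sketch of two points from the list above: removing a column whose name is held in a variable, and why negative indexing is unsafe for an empty index. Here rem is a hypothetical stand-in for whatever readline() returned.

```r
frame <- data.frame(a = 1:5, b = 2:6, c = 3:7, d = 4:8)

# Hypothetical user input; in the question this would come from readline()
rem <- "b"

# Name-based removal as in (1b); unknown or empty names leave frame intact
by_name <- frame[setdiff(names(frame), rem)]

# Pitfall of (1): with an empty index, frame[-ix] selects no columns at all
ix <- integer(0)
neg_empty <- frame[-ix]

# The setdiff form from (1a) keeps every column in that case
safe_empty <- frame[setdiff(seq_along(frame), ix)]

print(names(by_name))
print(ncol(neg_empty))
print(ncol(safe_empty))
```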
I have a set of lists whose names are stored in all_list.
all_list=c("LIST1","LIST2")
From these, I would like to create a data frame such that
LISTn$findings${Coli}$character is entered into the n'th column with rowname from LISTn$rowname.
DATA
LIST1=list()
LIST1[["findings"]]=list(s1a=list(character="a1",number=1,string="a1type",exp="great"),
s1b=list(number=2,string="b1type"),
in2a=list(character="c1",number=3,string="c1type"),
del3b=list(character="d1",number=4,string="d1type"))
LIST1[["rowname"]]="Row1"
LIST2=list()
LIST2[["findings"]]=list(s1a=list(character="a2",number=5,string="a2type",exp="great"),
s1b=list(character="b2",number=6,string="b2type"),
in2a=list(character="c2",number=7,string="c2type"),
del3b=list(character="d2",number=8,string="d2type"))
LIST2[["rowname"]]="Row2"
Please note that some characters are missing for which NA would suffice.
Desired output is this data frame:
s1a s1b in2a del3b
Row1 a1 NA c1 d1
Row2 a2 b2 c2 d2
There are about 1000 of these lists, so speed is a factor. Each list is about 50 MB after I load it through rjson::fromJSON(file=x).
The row and column names don't follow a particular pattern. They're names and attributes
We can use a couple of lapply/sapply combinations to loop over the nested list and extract the elements that have "Row" as the name
do.call(rbind, lapply(mget(all_list), function(x)
sapply(lapply(x$findings[grep("^Row\\d+", names(x$findings))], `[[`,
"character"), function(x) replace(x, is.null(x), NA))))
Or it can be also done by changing the names to a single value and then extract all those
do.call(rbind, lapply(mget(all_list), function(x) {
x1 <- setNames(x$findings, rep("Row", length(x$findings)) )
sapply(x1[names(x1)== "Row"], function(y)
pmin(NA, y$character[1], na.rm = TRUE)[1])}))
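purrr has a powerful function called map_chr which is built for these tasks. Note that the pipe has to sit at the end of a line, not at the start of the next one.
library(purrr)
sapply(mget(all_list), function(x) purrr::map_chr(x$findings, "character", .default = NA_character_)) %>%
  t %>%
  data.frame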
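For reference, a base R sketch that also takes the row names from LISTn$rowname. It assumes every list has the same finding names in the same order, which holds for the sample data but is not guaranteed by the question; extract_row is a helper name made up for this example, and the sample lists are trimmed to the fields that matter.

```r
LIST1 <- list(
  findings = list(s1a   = list(character = "a1", number = 1),
                  s1b   = list(number = 2),              # no "character" entry
                  in2a  = list(character = "c1", number = 3),
                  del3b = list(character = "d1", number = 4)),
  rowname = "Row1")
LIST2 <- list(
  findings = list(s1a   = list(character = "a2", number = 5),
                  s1b   = list(character = "b2", number = 6),
                  in2a  = list(character = "c2", number = 7),
                  del3b = list(character = "d2", number = 8)),
  rowname = "Row2")

# In the question's setting this would be mget(all_list)
lists <- list(LIST1 = LIST1, LIST2 = LIST2)

# Hypothetical helper: one named character vector per list, NA where missing
extract_row <- function(x) {
  vapply(x$findings, function(f) {
    if (is.null(f[["character"]])) NA_character_ else f[["character"]]
  }, character(1))
}

mat <- do.call(rbind, lapply(lists, extract_row))
rownames(mat) <- vapply(lists, function(x) x$rowname, character(1))
out <- as.data.frame(mat, stringsAsFactors = FALSE)
print(out)
```

vapply is used instead of sapply so that a malformed entry raises an error rather than silently changing the result type.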
I have a df like this:
a <- c(4,5,3,5,1)
b <- c(8,9,7,3,5)
c <- c(6,7,5,4,3)
df <- data.frame(rbind(a,b,c))
I want a new df, df2, containing the difference between the values in each cell in rows a and b and the value in row c in their respective columns.
df2 would look like this:
a <- c(-2,-2,-2,1,-2)
b <- c(2,2,2,-1,2)
df2 <- data.frame(rbind(a,b))
Here is where I'm getting stuck:
df2 <- data.frame(apply(df,c(1,2),function(x) x - df[nrow(df),the col index of x]))
How do I reference the column index of x? Is there something like JavaScript's this?
We can do this easily by replicating the third row so that the lengths match, and then subtracting it from the first two rows
out <- df[c("a", "b"),] - df["c",][col(df[c("a", "b"),])]
identical(df2, out)
#[1] TRUE
Or explicitly using rep
df[c("a", "b"),] - rep(unlist(df["c",]), each = 2)
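The same column-wise subtraction can also be sketched with sweep, which avoids referencing a column index by hand. This is an alternative I am adding, not part of the answer above, and it works on a matrix copy of the data.

```r
df <- data.frame(rbind(a = c(4, 5, 3, 5, 1),
                       b = c(8, 9, 7, 3, 5),
                       c = c(6, 7, 5, 4, 3)))

# Work on a numeric matrix so that row and column indexing is unambiguous
m <- as.matrix(df)

# sweep subtracts (its default FUN is "-") the c row from each column
# of the a and b rows; MARGIN = 2 means "per column"
res <- sweep(m[c("a", "b"), ], 2, m["c", ])
print(res)
```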
I would like to exclude rows from a data-frame which contain mirrored info. This is my input:
dfin <- 'info
c1-10-20-c2-40-50
c2-1-2-c4-20-25
c4-20-25-c2-1-2
c2-40-50-c1-10-20'
dfin <- read.table(text=dfin, header=T)
In the above example you can see that rows 1 and 4, and rows 2 and 3, represent the same information in a 'mirror'. In my context it does not matter whether I have c1-10-20-c2-40-50 or c2-40-50-c1-10-20, so I would like to filter out one of these rows (either of them). I never have more than two redundant rows. Moreover, in my actual data set these 'mirrored' rows are scattered and do not follow a pattern. My expected output:
dfout <- 'info
c1-10-20-c2-40-50
c2-1-2-c4-20-25'
dfout <- read.table(text=dfout, header=T)
We can split the 'info' column on "-", sort the pieces of each row, and pass the result to duplicated to get a logical vector, which is then used for subsetting the rows.
dfN <- dfin[!duplicated(lapply(strsplit(as.character(dfin$info), "-"), sort)),, drop=FALSE]
all.equal(dfN, dfout, check.attributes=FALSE)
#[1] TRUE
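Sorting all six pieces treats each row as a bag of tokens, so two rows with the same tokens grouped differently would also count as duplicates. A stricter sketch, assuming every entry consists of exactly two blocks of three pieces each, builds a key from the two halves instead; canon is a helper name made up for this example.

```r
dfin <- read.table(text = 'info
c1-10-20-c2-40-50
c2-1-2-c4-20-25
c4-20-25-c2-1-2
c2-40-50-c1-10-20', header = TRUE)

# Hypothetical helper: order the two halves so mirrored rows share one key
canon <- function(s) {
  p <- strsplit(s, "-", fixed = TRUE)[[1]]
  halves <- c(paste(p[1:3], collapse = "-"), paste(p[4:6], collapse = "-"))
  paste(sort(halves), collapse = "-")
}

keys <- vapply(as.character(dfin$info), canon, character(1))

# Keep the first occurrence of each key; original row order is preserved
dfN <- dfin[!duplicated(keys), , drop = FALSE]
print(dfN)
```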
Here is an approach which does not keep the original order:
dfin <- 'info-info-info-info-info-info
c1-10-20-c2-40-50
c2-1-2-c4-20-25
c4-20-25-c2-1-2
c2-40-50-c1-10-20'
df <- read.table(text=dfin, header=T, sep = "-", strip.white = T)
dfout<-as.data.frame(unique(t(apply(df, 1, sort))))
I extended your column name to make it work.