I want to make this dataframe
into this matrix
I have tried:
x <- read.csv("sample1.csv")
ax <- matrix(c(x[1,1],x[2,1],x[1,3],x[1,1],x[3,1],x[1,4],x[1,1],x[4,1],x[1,5],x[1,1],x[5,1],x[1,6],x[1,1],x[6,1],x[1,7],x[2,1],x[1,1],x[2,2],x[2,1],x[3,1],x[2,4],x[2,1],x[4,1],x[2,5],x[2,1],x[5,1],x[2,6],x[3,1],x[6,1],x[2,7],x[3,1],x[1,1],x[3,2],x[3,1],x[2,1],x[3,3],x[3,1],x[4,1],x[3,5],x[3,1],x[5,1],x[3,6],x[3,1],x[6,1],x[3,7],x[4,1],x[1,1],x[4,2],x[4,1],x[2,1],x[4,3],x[4,1],x[3,1],x[4,4],x[4,1],x[5,1],x[4,6],x[4,1],x[6,1],x[4,7],x[5,1],x[1,1],x[2,2],x[5,1],x[2,1],x[2,4],x[5,1],x[3,1],x[2,5],x[5,1],x[4,1],x[2,6],x[5,1],x[6,1],x[2,7],x[6,1],x[1,1],x[2,2],x[6,1],x[2,1],x[2,4],x[6,1],x[3,1],x[2,5],x[6,1],x[4,1],x[2,6],x[6,1],x[5,1],x[2,7]),10,3, byrow=TRUE)
bx <- ax[order(ax[,3], decreasing = TRUE),]
But this is not elegant at all, and it would be a lot of work to redo for different sample data.
So I would like to simplify it if possible. Any suggestions?
This can be achieved with the melt() function from the reshape2 package:
> a = matrix(c(1:9), nrow = 3, ncol = 3, dimnames = list(LETTERS[1:3], letters[1:3]))
> a
  a b c
A 1 4 7
B 2 5 8
C 3 6 9
> library(reshape2)
> melt(a, na.rm = TRUE)
  Var1 Var2 value
1    A    a     1
2    B    a     2
3    C    a     3
4    A    b     4
5    B    b     5
6    C    b     6
7    A    c     7
8    B    c     8
9    C    c     9
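For comparison, here is a minimal base R sketch that needs no extra package. as.data.frame(as.table()) produces the same long layout, and order() then sorts by value in decreasing order, as the question's own code does. The column names row, col and value are illustrative choices, and the small matrix a stands in for the real data.

```r
# Same example matrix as above
a <- matrix(1:9, nrow = 3, ncol = 3,
            dimnames = list(LETTERS[1:3], letters[1:3]))

# as.table() keeps the dimnames; as.data.frame() then yields one row per cell
long <- as.data.frame(as.table(a), responseName = "value",
                      stringsAsFactors = FALSE)
names(long)[1:2] <- c("row", "col")   # illustrative names, replacing Var1/Var2

# Sort by the value column, largest first
long <- long[order(long$value, decreasing = TRUE), ]
head(long, 3)
```

If the diagonal or any other cells should be left out, the long data frame can be filtered with ordinary row indexing before sorting.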
I have a data.frame with two columns. I need to convert the psw column to a 5-digit format, padding shorter values with leading zeros (for example, 3 should become 00003). How can I do this automatically for the whole psw column in R? Thanks.
Here is a reproducible data.frame:
mydat <- data.frame(ID=LETTERS[seq( from = 1, to = 6)],
                    psw=c(10501,3,80310,8930,234,1))
> mydat
  ID   psw
1  A 10501
2  B     3
3  C 80310
4  D  8930
5  E   234
6  F     1
This is my desired output:
> mydat
  ID   psw
1  A 10501
2  B 00003
3  C 80310
4  D 08930
5  E 00234
6  F 00001
You can't do that while keeping the psw column numeric, but you can format it to be a certain width as a character vector. Here are two methods for this:
In base R you can use formatC():
mydat <- data.frame(ID=LETTERS[seq( from = 1, to = 6)],
                    psw=c(10501,3,80310,8930,234,1))
mydat$psw <- formatC(mydat$psw, width = 5, format = "d", flag = "0")
mydat
#   ID   psw
# 1  A 10501
# 2  B 00003
# 3  C 80310
# 4  D 08930
# 5  E 00234
# 6  F 00001
In stringr, you can use str_pad():
install.packages("stringr")
library(stringr)
mydat <- data.frame(ID=LETTERS[seq( from = 1, to = 6)],
                    psw=c(10501,3,80310,8930,234,1))
mydat$psw <- str_pad(mydat$psw, width = 5, pad = "0")
mydat
#   ID   psw
# 1  A 10501
# 2  B 00003
# 3  C 80310
# 4  D 08930
# 5  E 00234
# 6  F 00001
You can also use sprintf() in base R, applied to the original numeric psw column:
mydat$psw <- sprintf("%05d",mydat$psw)
mydat
#   ID   psw
# 1  A 10501
# 2  B 00003
# 3  C 80310
# 4  D 08930
# 5  E 00234
# 6  F 00001
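As a small sketch of the point that the padded column is character rather than numeric: the zero-padded strings can be converted back to integers when the numeric values are needed again. The variable names psw and padded are illustrative, and the explicit as.integer() call is an added precaution rather than something the answers above require.

```r
# Numeric codes as in the question
psw <- c(10501, 3, 80310, 8930, 234, 1)

# Zero-pad to width 5; the result is a character vector
padded <- sprintf("%05d", as.integer(psw))
padded
class(padded)

# Converting back drops the leading zeros and restores the numbers
as.integer(padded)
```

Keeping the numeric column and storing the padded version in a separate column avoids having to convert back and forth.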
I want to add a field to a data frame and set it equal to an existing field in the same data frame, based on a condition on a different (existing) field.
I know this works:
is.even <- function(x) x %% 2 == 0
df <- data.frame(a = c(1,2,3,4,5,6),
                 b = c("A","B","C","D","E","F"))
df$test[is.even(df$a)] <- as.character(df[is.even(df$a), "b"])
> df
  a b test
1 1 A   NA
2 2 B    B
3 3 C   NA
4 4 D    D
5 5 E   NA
6 6 F    F
But I have a feeling this can be done much more cleanly.
With data.table this is quite easy (reusing is.even() from the question):
library(data.table)
dt = data.table(a = c(1,2,3,4,5,6),
                b = c("A","B","C","D","E","F"))
dt[is.even(a), test := b]
> dt
   a b test
1: 1 A   NA
2: 2 B    B
3: 3 C   NA
4: 4 D    D
5: 5 E   NA
6: 6 F    F
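If adding a package is not an option, a base R sketch with ifelse() gives the same column in a single assignment. It assumes b is stored as character (hence stringsAsFactors = FALSE), and NA_character_ is used so that the new column is character throughout.

```r
is.even <- function(x) x %% 2 == 0

df <- data.frame(a = c(1, 2, 3, 4, 5, 6),
                 b = c("A", "B", "C", "D", "E", "F"),
                 stringsAsFactors = FALSE)

# Take b where a is even, and a character NA everywhere else
df$test <- ifelse(is.even(df$a), df$b, NA_character_)
df
```

Unlike the data.table version, which adds the column by reference, this creates the whole column at once on an ordinary data frame.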
I have a table called data:
A 22
B 333
C Not Av.
D Not Av.
How can I get a subset, from which all rows containing "Not Av." are excluded? It is important to mention that I have the index of a column to be checked (in this case colnum = 2), but I don't have its name.
I tried this, but it does not work:
data<-subset(data,colnum!="Not Available")
df <- read.csv(text="A,22
B,333
C,Not Av.
D,Not Av.", header=F)
df[df[,2] != "Not Av.",]
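As a sketch of why the attempt in the question returns every row: inside subset(), colnum is just the number 2, so the condition compares 2 with a string, yields a single TRUE, and that TRUE is recycled across all rows. The attempt also compares against "Not Available" while the data contains "Not Av.". The column names V1 and V2 below are the defaults read.csv would assign with header = FALSE, and the values are kept as character for simplicity.

```r
data <- data.frame(V1 = c("A", "B", "C", "D"),
                   V2 = c("22", "333", "Not Av.", "Not Av."),
                   stringsAsFactors = FALSE)
colnum <- 2

# The condition does not look at the column at all: it is a single TRUE
colnum != "Not Available"

# ... so subset() keeps every row
all_rows <- subset(data, colnum != "Not Available")
nrow(all_rows)

# Indexing the column by position compares the actual values
kept <- data[data[[colnum]] != "Not Av.", ]
kept
```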
You don't really need the subset function. Just use [:
> set.seed(42)
> DF <- data.frame(x = LETTERS[1:10],
                   y = sample(c(1, 2, 3, "Not Av."), 10, replace = TRUE))
> DF
   x       y
1  A Not Av.
2  B Not Av.
3  C       2
4  D Not Av.
5  E       3
6  F       3
7  G       3
8  H       1
9  I       3
10 J       3
> DF[DF[2] != "Not Av.",]
   x y
3  C 2
5  E 3
6  F 3
7  G 3
8  H 1
9  I 3
10 J 3
In case you still want to use the subset function:
df<-subset(df,!grepl("Not Av",df[,2]))
Say I have a data frame which looks like this:
df.A
  A B C
x 1 3 4
y 5 4 6
z 8 9 1
And I want to replace the column names in the first based on column values in a second:
df.B
Low High
A D
B F
C G
Such that I get:
df.A
  D F G
x 1 3 4
y 5 4 6
z 8 9 1
How would I do it?
I have tried extracting the vector df.B$High from df.B and using this in names(df.A), but everything is in alphabetical order and shifted over one. Furthermore, this only works if the order of columns in df.A is conserved with respect to the elements in df.B$High, which is not always the case (and in my real example there is no numeric or alphabetical way to sort the two to the same order). So I think I need an rbind-type argument for matching elements, but I'm not sure.
Thanks!
You can use rename from plyr:
library(plyr)
dat <- read.table(text = " A B C
x 1 3 4
y 5 4 6
z 8 9 1",header = TRUE,sep = "")
new <- read.table(text = "Low High
A D
B F
C G",header = TRUE,sep = "")
rename(dat,replace = setNames(new$High,new$Low))
#   D F G
# x 1 3 4
# y 5 4 6
# z 8 9 1
Using match:
df.A <- read.table(sep=" ", header=T, text="
A B C
x 1 3 4
y 5 4 6
z 8 9 1")
df.B <- read.table(sep=" ", header=T, text="
Low High
A D
B F
C G")
df.C <- df.A
names(df.C) <- df.B$High[match(names(df.A), df.B$Low)]
df.C
#   D F G
# x 1 3 4
# y 5 4 6
# z 8 9 1
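One case the match() approach does not cover on its own is a column in df.A with no entry in df.B$Low: match() returns NA for it, and the column would end up with an NA name. A minimal sketch that keeps the original name in that case is shown below. The lookup table here is deliberately out of order and missing B to illustrate the point; it is not the data from the question.

```r
df.A <- data.frame(A = c(1, 5, 8), B = c(3, 4, 9), C = c(4, 6, 1),
                   row.names = c("x", "y", "z"))

# Hypothetical lookup: rows in a different order, and no entry for B
df.B <- data.frame(Low = c("C", "A"), High = c("G", "D"),
                   stringsAsFactors = FALSE)

# Position of each column name in the lookup (NA when absent)
idx <- match(names(df.A), df.B$Low)

# Use the new name where a match exists, otherwise keep the old one
names(df.A) <- ifelse(is.na(idx), names(df.A), df.B$High[idx])
names(df.A)
```

Because match() looks up each name individually, the order of the rows in df.B does not matter.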
You can play games with the row names of df.B to make a lookup more convenient:
rownames(df.B) <- df.B$Low
names(df.A) <- df.B[names(df.A),"High"]
df.A
##   D F G
## x 1 3 4
## y 5 4 6
## z 8 9 1
Here's an approach abusing factor:
f <- factor(names(df.A), levels=df.B$Low)
levels(f) <- df.B$High
f
## [1] D F G
## Levels: D F G
names(df.A) <- f
## Desired results