I have data as follows:
dataset = list()
a <- c(1,2,3)
b <- c(1,2,3)
country <- c("A","B","C")
source_country <- c("D","D","D")
dataset[[1]] <- data.frame(a,b,country, source_country)
a <- c(NA)
b <- c(NA)
country <- c(NA)
source_country <- c(NA)
dataset[[2]] <- data.frame(a,b,country, source_country)
I want to rename each list item with the source_country from the data frame of the same list item. I tried the following:
for (i in 1:length(dataset)) {
if (!is.null(dataset[[i]])) {
print ("no data")
} else if (nrow(dataset[[i]]) > 1) {
names(dataset)[i] <- dataset[[i]][["source_country"]][[1]]
}
}
But it does not seem to work.
Desired Outcome:
names(dataset)[1] <- "D"
names(dataset)[2] <- "NA"
A purrr option -
library(purrr)
set_names(dataset, map_chr(dataset, pluck, "source_country", 1))
#$D
# a b country source_country
#1 1 1 A D
#2 2 2 B D
#3 3 3 C D
#$<NA>
# a b country source_country
#1 NA NA NA NA
If your R version is less than 4.1.0 then replace \(x) with function(x):
names(dataset) <- sapply(dataset, \(x) x$source_country[1])
This will give your second element a name of NA. If you want that to be a character you can wrap with the function as.character.
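As a minimal sketch of the naming step above, assuming the literal string "NA" is wanted for the second element (as in the desired outcome): as.character() on its own keeps a missing value missing, so the sketch replaces missing names explicitly. The variable nm is a hypothetical helper name, not part of the original answer.

```r
# Rebuild the example list from the question
dataset <- list(
  data.frame(a = c(1, 2, 3), b = c(1, 2, 3),
             country = c("A", "B", "C"), source_country = c("D", "D", "D")),
  data.frame(a = NA, b = NA, country = NA, source_country = NA)
)

# Take the first source_country of each data frame as a character value;
# as.character() also covers factor columns (the default before R 4.0)
nm <- vapply(dataset, function(x) as.character(x$source_country[1]), character(1))

# Turn missing names into the literal string "NA" (an assumption about the goal)
nm[is.na(nm)] <- "NA"

names(dataset) <- nm
names(dataset)
```

vapply is used here instead of sapply only so that the result is guaranteed to be a character vector with one name per list element.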
The problem with your loop is that you're testing if each element of your list is not null (is.null tests if the element is null, !is.null inverts this). Since each element of your list is a dataframe none of them are null so your loop never enters the else if clause. The only thing you're doing in your if statement is printing so nothing is renamed.
You could do something like:
for (i in 1:length(dataset)) {
if (nrow(dataset[[i]]) == 0) {
print ("no data")
} else if (nrow(dataset[[i]]) >= 1) {
names(dataset)[i] <- dataset[[i]][["source_country"]][1]
}
}
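A slightly more defensive variant of the loop above, as a sketch only: it uses seq_along() so that an empty list is handled, and it skips elements that are NULL or have zero rows. The third list element (an empty data frame) is a hypothetical addition for illustration and is not in the original data.

```r
dataset <- list(
  data.frame(a = 1:3, source_country = c("D", "D", "D")),
  data.frame(a = NA, source_country = NA),
  data.frame(a = numeric(0), source_country = character(0))  # hypothetical empty frame
)

for (i in seq_along(dataset)) {
  d <- dataset[[i]]
  # Skip elements that carry no rows at all
  if (is.null(d) || nrow(d) == 0) {
    message("no data in element ", i)
    next
  }
  # Name the list element after the first source_country value
  names(dataset)[i] <- as.character(d[["source_country"]][1])
}

names(dataset)
```

Elements that are skipped, or whose first source_country is missing, end up with a missing name rather than a string.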
Using base R
setNames(dataset, unlist(sapply(dataset, subset,
subset = seq_along(source_country) == 1, select = source_country)))
-output
$D
a b country source_country
1 1 1 A D
2 2 2 B D
3 3 3 C D
$<NA>
a b country source_country
1 NA NA NA NA
Interestingly, I am unable to find a way to filter by column number. I do not know the name of the column because it changes, but I always know its position.
This seems pretty trivial, but it seems I can only reference the column in the i portion by its name.
table = data.table(one = c(1,2,3), two = c("a","b","c"))
> table
one two
1: 1 a
2: 2 b
3: 3 c
Assume I do not know that the second column is named "two". I just want to filter by the second column.
> table[two == "a"]
one two
1: 1 a
UPDATE:
As Ronak described, I could use
> table[table[[2]]=="a"]
one two
1: 1 a
However, I would next like to update this same column; for example, I would like to turn "a" into "c".
What I need:
> table
one two
1: 1 c
2: 2 b
3: 3 c
I have tried:
> table[table[[2]]=="a", table[[2]]:= "c"]
> table
one two a b c
1: 1 a c c c
2: 2 b <NA> <NA> <NA>
3: 3 c <NA> <NA> <NA>
So it seems like I am taking all the values in the second column and creating new columns for them instead of just changing the filtered rows to c.
> table[table[[2]]=="a", table[2]:= "c"]
Error in `[.data.table`(table, table[[2]] == "a", `:=`(table[2], "c")) :
LHS of := must be a symbol, or an atomic vector (column names or positions).
So I think I need to know the position of the second column.
Using [[ works :
library(data.table)
dt <- data.table(a = 1:5, b = 2:6)
dt[dt[[1]] == 1]
# a b
#1: 1 2
This gives the same output as dt[a == 1].
Since we know we need the 2nd column, get the 2nd column's name and use the "variable as column name" idiom; see the example:
library(data.table)
d <- data.table(one = c(1,2,3), two = c("a","b","c"))
# get the 2nd column name
myCol <- colnames(d)[ 2 ]
# subset
d[ get(myCol) == "a", ]
# subset and update
d[ get(myCol) == "a", (myCol) := "c" ]
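If only the position is known, another option is data.table's set() function, which accepts a column number for j. This is a sketch assuming the data.table package is installed; rows is a hypothetical helper variable.

```r
library(data.table)

d <- data.table(one = c(1, 2, 3), two = c("a", "b", "c"))

# Row numbers where the 2nd column equals "a", found by position only
rows <- which(d[[2]] == "a")

# Update those rows of column 2 by reference; no column name is needed
set(d, i = rows, j = 2L, value = "c")

d
```

Like :=, set() modifies the table in place, so no reassignment is required.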
We can use .SD
dt[dt[, .SD[[1]] == 1]]
# a b
#1: 1 2
data
dt <- data.table(a = 1:5, b = 2:6)
You can also try this:
table[[2]][table[[2]]=="a"] <- "c"
table
> table
one two
1: 1 c
2: 2 b
3: 3 c
I have figured it out:
> table[table[[2]]=="a", colnames(table)[2]:= "c"]
> table
one two
1: 1 c
2: 2 b
3: 3 c
Thanks!
I want to delete the header from a dataframe that I have. I read in the data from a csv file then I transposed it, but it created a new header that is the name of the file and the row that the data is from in the file.
Here's an example for a dataframe df:
a.csv.1 a.csv.2 a.csv.3 ...
x 5 6 1 ...
y 2 3 2 ...
I want to delete the a.csv.n row, but when I try df <- df[-1,] it deletes row x and not the top.
If you really, really, really don't like column names, you may convert your data frame to a matrix (keeping possible coercion of variables of different class in mind), and then remove the dimnames.
dd <- data.frame(x1 = 1:5, x2 = 11:15)
mm1 <- as.matrix(dd)
mm2 <- matrix(mm1, ncol = ncol(dd), dimnames = NULL)
I add my previous comment here as well:
?data.frame: "The column names should be non-empty, and attempts to use empty names will have unsupported results.".
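To make the coercion caveat concrete, here is a small sketch with a hypothetical mixed-type data frame (dd_mixed is not from the question). It uses unname(), which drops the dimnames of a matrix, as a shorter alternative to rebuilding the matrix.

```r
# Hypothetical data frame with one character and one numeric column
dd_mixed <- data.frame(id = c("x", "y"), value = c(5, 2))

# A matrix holds a single type, so the numeric column is coerced to
# character alongside the id column
mm <- unname(as.matrix(dd_mixed))

is.character(mm)       # TRUE
is.null(dimnames(mm))  # TRUE
```

If all columns are numeric, as in dd above, the matrix stays numeric and nothing is lost.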
Set names to NULL
names(df) <- NULL
You can also use the header option in read.csv
You can use names(df) to change the header, i.e. the column names. If newname is a list of names, as in newname <- list("col1","col2","col3"), then names(df) <- newname will give you a data frame with the column names col1, col2 and col3.
As @Henrik said, the column names should be non-empty. Setting names(df) <- NULL will give NA as the column names.
If your data is a csv file and you read it into R with header=TRUE, the data will have the same column names as the csv file. If you set header=FALSE, R will assign the column names V1, V2, ... and the column names from the original csv file will appear as the first row.
anydata.csv
a b c d
1 1 2 3 13
2 2 3 1 21
read.csv("anydata.csv",header=TRUE)
a b c d
1 1 2 3 13
2 2 3 1 21
read.csv("anydata.csv",header=FALSE)
V1 V2 V3 V4
1 a b c d
2 1 2 3 13
3 2 3 1 21
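The same comparison can be reproduced without a file on disk by passing the contents through the text argument of read.csv. This is a sketch; csv_text, with_header and without_header are hypothetical names, and the data is assumed to be comma-separated.

```r
# Same content as anydata.csv, written inline as comma-separated text
csv_text <- "a,b,c,d\n1,2,3,13\n2,3,1,21"

with_header    <- read.csv(text = csv_text, header = TRUE)
without_header <- read.csv(text = csv_text, header = FALSE)

names(with_header)     # "a" "b" "c" "d"
names(without_header)  # "V1" "V2" "V3" "V4"
nrow(with_header)      # 2
nrow(without_header)   # 3: the header line is now an ordinary data row
```

Note that with header=FALSE every column is read as text, because the first row contains letters.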
You could use
setNames(dat, rep(" ", length(dat)))
where dat is the name of the data frame. Then all columns will have the name " " and hence will be 'invisible'.
It comes with some years of delay, but you can simply rename the columns using a vector:
## if you want to delete all column names:
colnames(df)[] <- ""
## if you want to delete let's say column 1:
colnames(df)[1] <- ""
## if you want to delete 1 to 3 and 7:
colnames(df)[c(1:3,7)] <- ""
As already mentioned, a data frame without column names just isn't going to happen. I'm guessing, though, that you don't mind the names being there and simply don't want to see them when you print the data frame? If so, you can write a new print function to get around that, like so:
> dat <- data.frame(var1=c("A","B","C"),var2=rnorm(3),var3=rnorm(3))
> print(dat)
var1 var2 var3
1 A 1.2771777 -0.5726623
2 B -1.5000047 1.3249348
3 C 0.1989117 -1.4016253
> ncol.print <- function(dat) print(matrix(as.matrix(dat),ncol=ncol(dat),dimnames=NULL),quote=F)
> ncol.print(dat)
[,1] [,2] [,3]
[1,] A 1.2771777 -0.5726623
[2,] B -1.5000047 1.3249348
[3,] C 0.1989117 -1.4016253
Your other option is to set your variable names to unique amounts of whitespace, for example:
> names(dat) <- c(" ", "  ", "   ")
> dat
1 A 1.2771777 -0.5726623
2 B -1.5000047 1.3249348
3 C 0.1989117 -1.4016253
You can also write a function to do this:
> blank.names <- function(dat){
+ for(i in 1:ncol(dat)){
+ names(dat)[i] <- paste(rep(" ",i),collapse="")
+ }
+ return(dat)
+ }
> dat <- data.frame(var1=c("A","B","C"),var2=rnorm(3),var3=rnorm(3))
> dat
var1 var2 var3
1 A -1.01230289 1.2740237
2 B -0.13855777 0.4689117
3 C -0.09703034 -0.4321877
> blank.names(dat)
1 A -1.01230289 1.2740237
2 B -0.13855777 0.4689117
3 C -0.09703034 -0.4321877
But generally I don't think any of this should be done.
A function that I use in one of my R scripts:
read_matrix <- function (csvfile) {
a <- read.csv(csvfile, header=FALSE)
matrix(as.matrix(a), ncol=ncol(a), dimnames=NULL)
}
How to call this:
iops_even <- read_matrix('even_iops_Jan15.csv')
iops_odd <- read_matrix('odd_iops_Jan15.csv')
If you are working in Python with pandas rather than R, you can simply do:
print(df.to_string(header=False))
If you want to remove the row index as well, you can do:
print(df.to_string(index=False,header=False))
Following on from my last question, I have a new, related question. After editing my original post, asking there, and waiting about a week, I want to try again here.
This time with a better example:
Equip<- c(1,1,1,1,2,2,2,3,3,3,4,4,4,5,5,5,6,6,6,6,6,6)
Notif <-c(1,1,3,4,2,2,2,5,6,7,9,9,15,10,11,12,13,14,16,17,18,19)
rank <- c(1,1,2,3,1,1,1,1,2,3,1,1,2,1,2,3,1,2,3,4,5,6)
Component <- c("Ventil","Motor","Ventil","Ventil","Vergaser","Vergaser","Bremse",
"Lichtmaschine","Bremse","Lichtmaschine","Bremse","Motor","Lichtmaschine",
"Bremse","Bremse","Motor","Vergaser","Motor","Vergaser","Motor",
"Vergaser","Motor")
df <- data.frame(Equip,Notif,rank,Component)
Equip is my subject and rank is the visit number. Component is the item that has to be looked for.
I want an output like this:
If an Equip (subject) was visited 2 times (rank 1 and 2), look across all Equips with ranks 1 and 2 and check whether any Component was recorded both the first and the second time.
If an Equip (subject) was visited 3 times (rank 1, 2 and 3), look across all such Equips and check whether any Component is listed all 3 times, like Equip 1, rank 1, Component Motor; Equip 1, rank 2, Component Motor; Equip 1, rank 3, Component Motor.
The output should include the name of the Component, like TRUE "Motor".
I have some code, but with it I can only compare the 1st and 2nd visits, the 2nd and 3rd, and so on (I cannot split the results up by number of ranks, such as Equips with 2 ranks, Equips with 3 ranks, and so on).
The code is this:
a <- lapply(split(df,df$Equip),function(x){
ll <- split(x,x$rank)
if(length(ll)>1 )
ii <- intersect(ll[[1]]$Component,ll[[2]]$Component ) ## test intersection
else
ii <- NA
c(length(ii)> 0 && !is.na(ii),ii)
})
b <- unlist(a)
c <- table(b,b)
rowSums(c)
Hopefully you can help me. Please ask if there are any questions.
Regarding your question about the output, and following your approach:
Equip Component V1 idx
1: 1 Ventil TRUE 3
2: 2 NA FALSE 1
3: 3 NA FALSE 3
4: 4 NA FALSE 2
5: 5 NA FALSE 3
6: 6 NA FALSE 6
Something like that, but if it's easier, Equip and idx are not necessarily needed.
for Equip with 2 ranks:
TRUE FALSE
0 1
for Equip with 3 ranks:
TRUE FALSE
1 2
for Equip with 6 ranks:
TRUE FALSE
0 1
Here's the output I think would be of interest to you. It uses data.table.
First, we create a data.table from your data.frame df with keys = Equip, Component as follows.
require(data.table) # load package
# then create the data.table with keys as specified above
# Check that both these columns are already sorted out for you!
dt <- data.table(df, key=c("Equip", "Component"))
Second, we create a function that'll give the desired output for a given rank query (2, 3 etc..)
this.check <- function(idx) {
chk <- seq(1, idx)
o <- subset(dt[, all(chk %in% rank), by=c("Equip", "Component")], V1 == TRUE)
if (nrow(o) > 0) o[, idx:=idx]
}
What does this do? Let's run this for rank=1,2. We run this by:
> this.check(2)
# output
Equip Component V1 idx
1: 1 Ventil TRUE 2
2: 5 Bremse TRUE 2
This tells you that for Equip = 1 and 5, the Components Ventil and Bremse, respectively, occur with both rank = 1 and rank = 2 (indicated by idx=2). You also get the column V1 = TRUE, even though I, as @Carl pointed out already, don't understand the need for this. If you require, you can change the column names of this output by using setnames.
Third, we use this function to query ranks=1,2, then ranks=1,2,3 .. and so on. This can be accomplished with a simple lapply as follows:
# Let's run the function for idx = 2 to 6.
# This will check from rank = 1,2 until rank=1,2,3,4,5,6
o <- lapply(2:6, function(idx) {
this.check(idx)
})
> o
[[1]]
Equip Component V1 idx
1: 1 Ventil TRUE 2
2: 5 Bremse TRUE 2
[[2]]
Equip Component V1 idx
1: 1 Ventil TRUE 3
[[3]]
NULL
[[4]]
NULL
[[5]]
NULL
It shows that for rank=1,2 and rank=1,2,3 you have some Component. For the others there is nothing, hence NULL.
Finally, we can bind all of these together using rbind to get one single data.table as follows:
o <- do.call(rbind, o)
> o
Equip Component V1 idx
1: 1 Ventil TRUE 2
2: 5 Bremse TRUE 2
3: 1 Ventil TRUE 3
Here, idx=2 marks the Components that satisfy rank=1,2 and idx=3 marks the ones that satisfy rank=1,2,3.
Putting it all together:
this.check <- function(idx) {
chk <- seq(1, idx)
o <- subset(dt[, all(chk %in% rank), by=c("Equip", "Component")], V1 == TRUE)
if (nrow(o) > 0) o[, idx:=idx]
}
o <- do.call(rbind, lapply(2:6, function(idx) {
this.check(idx)
}))
I hope this helps.
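For readers without data.table, the grouped check inside this.check (does a Component of an Equip appear at every rank from 1 to idx?) can be sketched in base R. This is an illustration under that reading of the answer, not the answer's own method; check_ranks, groups and hit are hypothetical names.

```r
Equip <- c(1,1,1,1,2,2,2,3,3,3,4,4,4,5,5,5,6,6,6,6,6,6)
rank <- c(1,1,2,3,1,1,1,1,2,3,1,1,2,1,2,3,1,2,3,4,5,6)
Component <- c("Ventil","Motor","Ventil","Ventil","Vergaser","Vergaser","Bremse",
               "Lichtmaschine","Bremse","Lichtmaschine","Bremse","Motor","Lichtmaschine",
               "Bremse","Bremse","Motor","Vergaser","Motor","Vergaser","Motor",
               "Vergaser","Motor")
df <- data.frame(Equip, rank, Component)

check_ranks <- function(df, idx) {
  # One vector of ranks per observed Equip/Component pair
  groups <- split(df$rank, list(df$Equip, df$Component), drop = TRUE)
  # TRUE where the pair was seen at every rank 1..idx
  hit <- vapply(groups, function(r) all(seq_len(idx) %in% r), logical(1))
  # Group labels have the form "Equip.Component"
  names(hit)[hit]
}

check_ranks(df, 2)
check_ranks(df, 3)
```

On the example data, check_ranks(df, 2) picks out Ventil for Equip 1 and Bremse for Equip 5, and check_ranks(df, 3) picks out only Ventil for Equip 1, which matches the data.table output shown above.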
Edit: (After series of exchanges in comments, this is the new solution I propose. I hope this is what you are after.)
require(data.table)
dt <- data.table(df, key=c("Equip", "Component"))
dt[, `:=`(e.max=max(rank)), by=Equip]
dt[, `:=`(ec.max=max(rank)), by=c("Equip", "Component")]
setkey(dt, "e.max", "ec.max")
this.check <- function(idx) {
t1 <- dt[J(idx,idx)]
t2 <- t1[, identical(as.numeric(seq_len(idx)), as.numeric(rank)),
by=c("Equip", "Component")]
o <- table(t2$V1)
if (length(o) == 1)
o <- c(o, "TRUE"=0)
o <- c("idx"=idx, o)
}
o <- do.call(rbind, lapply(2:6, function(idx) this.check(idx)))
> o
# idx FALSE TRUE
# [1,] 2 1 0
# [2,] 3 2 1
# [3,] 4 1 0
# [4,] 5 1 0
# [5,] 6 1 0
If I make an array of your data, columnwise, as
foo<-cbind(Equip,Notif, rank, Component)
eqp<-1 # later, loop over all values
foo[c( which( foo[,1]==eqp & (foo[,3]==1 | foo[,3]==2) ) ),4]
[1] "Ventil" "Motor" "Ventil"
Feed those results to table and extract items with count ==2
Clearly any item which shows up twice is what you want.
This is not an answer I'd recommend using, since tools like ddply and aggregate will do this much more cleanly, but I want to be sure that this is the answer you're after, assuming a loop over eqp values in the original Equip .
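The "feed those results to table" step could look like the following sketch for a single Equip value. The names comps, counts and repeated are hypothetical.

```r
Equip <- c(1,1,1,1,2,2,2,3,3,3,4,4,4,5,5,5,6,6,6,6,6,6)
rank <- c(1,1,2,3,1,1,1,1,2,3,1,1,2,1,2,3,1,2,3,4,5,6)
Component <- c("Ventil","Motor","Ventil","Ventil","Vergaser","Vergaser","Bremse",
               "Lichtmaschine","Bremse","Lichtmaschine","Bremse","Motor","Lichtmaschine",
               "Bremse","Bremse","Motor","Vergaser","Motor","Vergaser","Motor",
               "Vergaser","Motor")

eqp <- 1  # later, loop over all values

# Components recorded for this Equip at rank 1 or rank 2
comps <- Component[Equip == eqp & rank %in% c(1, 2)]

# Count occurrences and keep the components seen exactly twice
counts <- table(comps)
repeated <- names(counts)[counts == 2]
repeated  # "Ventil"
```

One caveat with counting alone: a component recorded twice within the same rank also reaches a count of 2 (Equip 2 has Vergaser twice at rank 1), so this is only a rough check.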
So I have a bunch of data frames in a list object. Frames are organised such as
ID Category Value
2323 Friend 23.40
3434 Foe -4.00
And I got them into a list by following this topic. I can also run simple functions on them as shown in this topic.
Now I am trying to run a conditional function with lapply, and I'm running into trouble. In some tables the 'ID' column has a different name (say, 'recnum'), and I need to tell lapply to go through each data frame, check if there is a column named 'recnum', and change its name to 'ID', as in
colnr <- which(names(x) == "recnum")
if (length(colnr > 0)) {names(x)[colnr] <- "ID"}
But I'm running into trouble with local scope and who knows what. Any ideas?
Use the rename function from plyr; it renames by name, not position:
x <- data.frame(ID = 1:2,z=1:2)
y <- data.frame('recnum' = 1:2,z=3:4)
.list <- list(x,y)
library(plyr)
lapply(.list, rename, replace = c('recnum' = 'ID'))
[[1]]
ID z
1 1 1
2 2 2
[[2]]
ID z
1 1 3
2 2 4
Your original code works fine:
foo <- function(x){
colnr <- which(names(x) == "recnum")
if (length(colnr > 0)) {names(x)[colnr] <- "ID"}
x
}
.list <- list(x,y)
lapply(.list, foo)
Not sure what your problem was.
If you look at the second part of mnel's answer, you can see that the function foo evaluates x as its last expression. Without that, if you try to change the names of the data.frames in your list directly from within the anonymous function passed to lapply, it will likely not work.
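The point about returning x can be seen in a short sketch: lapply works on copies, so the original list stays untouched and the renamed data frames only exist in the returned list. The names frames, rename_recnum and renamed are hypothetical.

```r
frames <- list(data.frame(ID = 1:2, z = 1:2),
               data.frame(recnum = 1:2, z = 3:4))

rename_recnum <- function(x) {
  # Rename the column if present; nothing happens when there is no match
  names(x)[names(x) == "recnum"] <- "ID"
  x  # return the modified copy, otherwise the change is lost
}

renamed <- lapply(frames, rename_recnum)

names(frames[[2]])   # "recnum" "z"  (original list is unchanged)
names(renamed[[2]])  # "ID" "z"
```

To keep the change, assign the result back, for example frames <- lapply(frames, rename_recnum).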
Just as an alternative, you could use gsub and avoid loading an additional package (although plyr is a nice package):
xx <- list(data.frame("recnum" = 1:3, "recnum2" = 1:3),
data.frame("ID" = 4:6, "hat" = 4:6))
lapply(xx, function(x){
names(x) <- gsub("^recnum$", "ID", names(x))
return(x)
})
# [[1]]
# ID recnum2
# 1 1 1
# 2 2 2
# 3 3 3
# [[2]]
# ID hat
# 1 4 4
# 2 5 5
# 3 6 6