This loop doesn't give the result I expect, and I'm not sure why.
look_up <- data.frame(flat=c("160","130"),
street=c("stamford%20street", "doddington%20grove"),
city = c("London", "London"),
postcode = c("SE1%20", "se17%20"))
new <- data.frame()
for(i in 1:nrow(look_up)){
new <- rbind(new,look_up$flat[i])
}
I'd be grateful if someone could tell me why! My result should be a data frame with one column called 'flat' containing the values 160 and 130, one per row. Once I understand this, I can move on to the real thing I'm trying to do!
No need for a loop:
look_up[,"flat",drop=FALSE]
As mentioned, the problem with your loop is the automatic conversion of strings to factors. You can put options(stringsAsFactors = FALSE) at the top of your script to avoid that. (Since R 4.0.0, FALSE is already the default.)
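If the real task does need a loop, here is a minimal sketch of a version that works. It assumes stringsAsFactors = FALSE and, on each pass, binds a one-row data frame with an explicit flat column rather than a bare value, so the column keeps its name and type:

```r
look_up <- data.frame(flat = c("160", "130"),
                      street = c("stamford%20street", "doddington%20grove"),
                      city = c("London", "London"),
                      postcode = c("SE1%20", "se17%20"),
                      stringsAsFactors = FALSE)

new <- data.frame()
for (i in seq_len(nrow(look_up))) {
  # Bind a one-row data frame with a named 'flat' column, not a bare value
  new <- rbind(new, data.frame(flat = look_up$flat[i], stringsAsFactors = FALSE))
}
new
```

Growing a data frame with rbind inside a loop copies it on every iteration; for many rows it is usually faster to collect the pieces in a list and call do.call(rbind, pieces) once at the end.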
However, it's almost certain that you are approaching your actual problem in the wrong way. You should probably ask a new question, where you tell us what you actually want to achieve.
You need to look into the stringsAsFactors argument of data.frame.
look_up <- data.frame(flat=c("160","130"),
street=c("stamford%20street", "doddington%20grove"),
city = c("London", "London"),
postcode = c("SE1%20", "se17%20"),
stringsAsFactors = FALSE)
look_up[, "flat", drop = FALSE]
You could also do something like:
> look_up <- data.frame(flat=c("160","130"),
+ street=c("stamford%20street", "doddington%20grove"),
+ city = c("London", "London"),
+ postcode = c("SE1%20", "se17%20"))
>
> new <- look_up[,1,drop=FALSE]
> new
flat
1 160
2 130
> class(new)
[1] "data.frame"
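This shows that your final output is a data frame with a single column, flat, holding 160 and 130 in its two rows.
If you don't include drop=FALSE here, the result is simplified to a vector: a factor when the column was created with stringsAsFactors = TRUE (the default before R 4.0.0), otherwise a character vector.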
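A quick way to see the difference drop makes, as a minimal sketch on a cut-down look_up built with stringsAsFactors = FALSE:

```r
look_up <- data.frame(flat = c("160", "130"), city = c("London", "London"),
                      stringsAsFactors = FALSE)

kept <- look_up[, "flat", drop = FALSE]  # still a one-column data.frame
dropped <- look_up[, "flat"]             # simplified to a character vector

class(kept)
class(dropped)
```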
Hope this helps.
I have a string-splitting problem. I have a huge number of files whose names are structured like this:
filenames = c("NO2_Place1_123_456789.dat", "NO2_Nice_Place_123_456789.dat", "NO2_Nice_Place_123_456789.dat", "NO2_Place2_123_456789.dat")
I need to extract the station names, e.g. Place1, Nice_Place1 and so on. Each is either "Place" followed by a number or "Nice_Place" followed by a number.
I tried the following to get the station names for "Place" plus a number, and it works great, but it doesn't give me the correct name for "Nice_Place"... because it treats it as two words.
Station = strsplit(filenames[1], "_")[[1]][2] # Works
Station = strsplit(filenames[2], "_")[[1]][2] # Doesn't work
My idea now is to use if...else: if the station name in the example above is "Nice", append the third part of the split with an underscore. Unfortunately I am totally new to if...else conditions.
Can somebody please help?
EDIT:
Expected output:
Station = strsplit(filenames[1], "_")[[1]][2] # Station = "Place1"
Station = strsplit(filenames[2], "_")[[1]][2] # Station = "Nice" -- not correct, I want "Nice_Place"
So when I get
Station = strsplit(filenames[2], "_")[[1]][2] # Station = "Nice"
I want to add a condition: if Station is "Nice", it should append strsplit(filenames[2], "_")[[1]][3] with an underscore.
EDIT2:
I have now found a way to get what I want:
filenames = c("NO2_Place1_123_456789.dat", "NO2_Nice_Place1_123_456789.dat", "NO2_Nice_Place2_123_456789.dat", "NO2_Place2_123_456789.dat")
Station = strsplit(filenames[2], "_")[[1]][2]
if (Station == "Nice"){
Station = paste(Station, strsplit(filenames[2], "_")[[1]][3], sep = "_")
}
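The same if/else rule can be applied to every file name rather than only filenames[2]. A minimal sketch, wrapping it in a small helper (get_station is a hypothetical name) and running it over the EDIT2 file names with vapply:

```r
filenames <- c("NO2_Place1_123_456789.dat", "NO2_Nice_Place1_123_456789.dat",
               "NO2_Nice_Place2_123_456789.dat", "NO2_Place2_123_456789.dat")

# Hypothetical helper: the EDIT2 if/else rule for a single file name
get_station <- function(fname) {
  parts <- strsplit(fname, "_", fixed = TRUE)[[1]]
  station <- parts[2]
  if (station == "Nice") {
    # Re-join the second and third pieces with the underscore that split removed
    station <- paste(station, parts[3], sep = "_")
  }
  station
}

stations <- vapply(filenames, get_station, character(1), USE.NAMES = FALSE)
stations
```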
We can use sub
sub("^[^_]+_(.*Place\\d*).*", "\\1", filenames[2])
#[1] "Nice_Place1"
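Because sub is vectorized, the pattern can be applied to the whole filenames vector in one call. If the names might not always contain "Place", a variant that assumes the station name is everything between the first underscore and the last two underscore-separated fields (here "123" and "456789.dat") gives the same result for these examples:

```r
filenames <- c("NO2_Place1_123_456789.dat", "NO2_Nice_Place1_123_456789.dat",
               "NO2_Nice_Place2_123_456789.dat", "NO2_Place2_123_456789.dat")

# sub() is vectorized: one call handles every file name
stations <- sub("^[^_]+_(.*Place\\d*).*", "\\1", filenames)

# Variant that does not rely on the literal "Place"; assumes the last two
# "_"-separated fields are never part of the station name
stations2 <- sub("^[^_]+_(.*)_[^_]+_[^_]+$", "\\1", filenames)
stations2
```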
I'm a newbie to this world.
I am currently using R to analyze some sequencing data and I'm stuck.
Here's a description of the problem.
What I'd like to do is select the first word of $V3 from the pat1_01_exonic data (115 rows) and save the result to a file. (I used the strsplit function for this.)
So far I have tried code 1 below, and it worked for the first row.
The problem is that I can't do this 115 times by hand,
so it seems like a loop is necessary.
I'm not really confident writing a loop by myself, and, as I expected, it didn't work.
To build up the result I thought about using append, rbind or stack.
Can anyone give me some advice about how to fix this problem?
Big thanks in advance
#code1
pat1_01_exonic$V3 <-as.character(pat1_01_exonic$V3)
pat1 <- data.frame(head(strsplit(pat1_01_exonic$V3, ":")[[1]],1))
#code2
for (i in 1: nrow(pat1_01_exonic)) {
pat1_output <- vector()
sub[i] <- data.frame(head(strsplit(pat1_01_exonic$V3, ":")[[i]],1))
pat1_0utput <- append(sub[i])
i <- i+1
}
Often you can avoid a for loop in R. If I have understood you correctly, you can use sub here to get the first string before ":"
pat1_01_exonic$new_col <- sub(":.*", "", pat1_01_exonic$V3)
pat1_01_exonic
# V3 new_col
#1 abc:def:avd abc
#2 afd:adef afd
#3 emg:rvf:temp emg
data
pat1_01_exonic <- data.frame(V3 = c("abc:def:avd", "afd:adef", "emg:rvf:temp"),
stringsAsFactors = FALSE)
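For comparison, a minimal sketch of the loop from code2 made to work on the same example data: it preallocates a character vector instead of calling append, splits the i-th string inside the loop, drops the manual i <- i+1 (for already advances i), and writes the result to a CSV. The tempfile() path is only a placeholder for whatever output file you actually want:

```r
pat1_01_exonic <- data.frame(V3 = c("abc:def:avd", "afd:adef", "emg:rvf:temp"),
                             stringsAsFactors = FALSE)

# Preallocate the result instead of growing it with append()
first_word <- character(nrow(pat1_01_exonic))
for (i in seq_len(nrow(pat1_01_exonic))) {
  # strsplit() returns a list; [[1]][1] is the first piece of the i-th string
  first_word[i] <- strsplit(pat1_01_exonic$V3[i], ":", fixed = TRUE)[[1]][1]
}
pat1 <- data.frame(first_word = first_word, stringsAsFactors = FALSE)

# Placeholder output path for this sketch
out_file <- tempfile(fileext = ".csv")
write.csv(pat1, out_file, row.names = FALSE)
```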
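The code below is an example that creates a new variable, V3_First_Word, holding the first word of the original string. Note that stringr's word() splits on spaces by default, so the ":" separator has to be given explicitly.
library(dplyr)
library(stringr)
Want <- pat1_01_exonic %>%
  mutate(V3_First_Word = word(V3, 1, 1, sep = fixed(":"))) # new variable holding the first ":"-separated word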
In base R, we can use read.table
pat1_01_exonic$new_col <- read.table(text = pat1_01_exonic$V3, sep=":",
header = FALSE, fill = TRUE, stringsAsFactors = FALSE)[,1]
pat1_01_exonic$new_col
#[1] "abc" "afd" "emg"
Or use strsplit and select the first element:
sapply(strsplit(pat1_01_exonic$V3, ":"), `[`, 1)
data
pat1_01_exonic <- data.frame(V3 = c("abc:def:avd", "afd:adef", "emg:rvf:temp"),
stringsAsFactors = FALSE)
In a dbf file I create a new field, xyz, then try to sum the existing item1 and item2 fields, replace xyz with that sum, and write out a new dbf, but it does not work. Everything works without the for loop. I hope someone can help. Thank you.
library(foreign)
setwd("C:/temp")
dbfdata <- read.dbf("sldu_500ka.dbf", as.is = TRUE)
dbfdata$xyz <- 1:nrow(dbfdata)
for(i in 1:nrow(dbfdata)) {
row <- dbfdata[i,]
dbfdata$xyz <- dbfdata$item1 + dbfdata$item2
}
write.dbf(dbfdata, "sldu_500k1.dbf")
I'm not sure whether I understand you correctly, but
library(foreign)
setwd("C:/temp")
dbfdata <- read.dbf("sldu_500ka.dbf", as.is = TRUE)
dbfdata$xyz <- dbfdata$item1 + dbfdata$item2
write.dbf(dbfdata, "sldu_500k1.dbf")
should do the job. Instead of looping over all rows, you can add the entire columns at once, because arithmetic on vectors in R is element-wise.
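To see that the loop is unnecessary, here is a minimal sketch on a toy data frame standing in for the read.dbf result (the values are made up): the vectorized addition and an explicit element-by-element loop produce the same column. Note that the loop in the question assigned the whole column sum on every iteration instead of indexing with [i].

```r
# Toy stand-in for the read.dbf() result; the values are made up
dbfdata <- data.frame(item1 = c(1, 2, 3), item2 = c(10, 20, 30))

# Vectorized: adds the two columns element by element in one step
dbfdata$xyz <- dbfdata$item1 + dbfdata$item2

# Equivalent explicit loop: note the [i] on both sides
xyz_loop <- numeric(nrow(dbfdata))
for (i in seq_len(nrow(dbfdata))) {
  xyz_loop[i] <- dbfdata$item1[i] + dbfdata$item2[i]
}
```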
I have the following data frame
id,category,value
A,21,0.89
B,21,0.73
C,21,0.61
D,12,0.95
E,12,0.58
F,12,0.44
G,23,0.33
Note that the rows are already sorted by value within each category. What I would like to do is take the top id from each category and add it to a string, followed by the second id from each category, and so on. So for the above example it would look like
A,D,G,B,E,C,F
Is there a way to do it easily in R? Or am I better off relying on a Perl script to do it?
Thanks much in advance
This appears to work, but I'm certain we could simplify it somewhat, particularly if you are able to relax your ordering requirements:
library(plyr)
d <- read.table(text = "id,category,value
A,21,0.89
B,21,0.73
C,21,0.61
D,12,0.95
E,12,0.58
F,12,0.44
G,23,0.33",sep = ',',header = TRUE)
d <- ddply(d,.(category),transform,r = seq_along(category))
d <- arrange(d,id)
> paste(d$id[order(d$r)],collapse = ",")
[1] "A,D,G,B,E,C,F"
This version is probably more robust to ordering, and avoids plyr:
d$r <- unlist(lapply(rle(d$category)$lengths, seq_len)) # lapply, not sapply: sapply would return a matrix if all groups had the same size
d$s <- 1:nrow(d)
with(d,paste(id[order(r,s)],collapse = ","))
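A base R variant that does not need the rows of each category to be contiguous uses ave to number the rows within each category. This is a sketch; it still assumes, as stated in the question, that rows are sorted by value within each category:

```r
d <- read.table(text = "id,category,value
A,21,0.89
B,21,0.73
C,21,0.61
D,12,0.95
E,12,0.58
F,12,0.44
G,23,0.33", sep = ",", header = TRUE, stringsAsFactors = FALSE)

# Position of each row within its category: 1 for the top value, 2 for the next, ...
d$r <- ave(seq_len(nrow(d)), d$category, FUN = seq_along)

# Order by within-category rank, breaking ties by original row position
result <- paste(d$id[order(d$r, seq_len(nrow(d)))], collapse = ",")
result
```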
I am looking to use a function to speed up a data cleaning process. In the example shown I am looking to remove values reported in the am and pm columns if the ".no" column for that day has a value of 1.
df1 = data.frame (identifier = c(1:4),
mon.no = c(1,NA,NA,NA),mon.am = c(2,1,NA,3),mon.pm = c(3,4,NA,5),
tues.no = c(NA,NA,1,NA),tues.am = c(2,3,1,4),tues.pm = c(3,3,2,3))
I envisage a function that uses the day to clean the data:
clean1 = function (day) {
df1$day.am[df1$day.no==1] = NA
df1$day.pm[df1$day.no==1] = NA
return (df1)}
df2 = clean1(mon)
However this returns the following error.
Error in `$<-.data.frame`(`*tmp*`, "day.am", value = logical(0)) :
replacement has 0 rows, data has 4
I assume that this is because the function expects a full column name and cannot fill in the gaps around a text input? Is it possible to use a function in that way?
Having read these notes, I think it would be better practice to have my data in a tidy format, and I am working on a solution that involves reorganising my data. However, it would also be handy to be able to do this while the data is in its original format.
Thanks.
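You're really close. @Tyler Rinker in the comments has explained why it doesn't work: the $ operator does not evaluate what follows it, so df1$day.am looks for a column literally named day.am rather than mon.am. Here's a fix: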
clean1 = function (day) {
day.am = paste(day, "am", sep=".") # make a string from the variable day and the suffixes
day.pm = paste(day, "pm", sep=".")
day.no = paste(day, "no", sep=".")
df1[day.am][df1[day.no]==1] = NA
df1[day.pm][df1[day.no]==1] = NA
return (df1)}
df2 = clean1("mon") # "mon" should be a string
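The root cause is the difference between $ and [[: $ uses the name typed after it literally, while [[ evaluates its argument. A minimal illustration on a cut-down df1:

```r
df1 <- data.frame(identifier = 1:4, mon.am = c(2, 1, NA, 3))
day <- "mon"
col <- paste(day, "am", sep = ".")  # the string "mon.am"

df1$col     # NULL: $ looks for a column literally named "col"
df1[[col]]  # 2 1 NA 3: [[ uses the value of col, i.e. "mon.am"
```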
Somebody else might offer more efficient ways of doing this. Note that you're only ever working from your original df1 here. If you now run
df3 = clean1("tues")
you won't get a dataframe with both days cleaned. You could fix this by supplying the dataframe to be acted on to the function too:
clean2 = function(df, day){...
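A minimal sketch of that approach, using a hypothetical function name, clean_day (it is not the elided clean2 body): the data frame is passed in as an argument, [[ looks up the columns built from day, and the cleaned result of one call is fed into the next, either by nesting calls or with Reduce:

```r
df1 <- data.frame(identifier = 1:4,
                  mon.no = c(1, NA, NA, NA), mon.am = c(2, 1, NA, 3), mon.pm = c(3, 4, NA, 5),
                  tues.no = c(NA, NA, 1, NA), tues.am = c(2, 3, 1, 4), tues.pm = c(3, 3, 2, 3))

# Hypothetical helper: blank <day>.am and <day>.pm wherever <day>.no equals 1
clean_day <- function(df, day) {
  flag <- df[[paste(day, "no", sep = ".")]] %in% 1  # TRUE only where .no is 1; NA counts as FALSE
  for (suffix in c("am", "pm")) {
    col <- paste(day, suffix, sep = ".")
    df[[col]][flag] <- NA
  }
  df
}

# Feed the output of one call into the next so both days are cleaned
df3 <- clean_day(clean_day(df1, "mon"), "tues")

# Same thing for any number of days: clean_day(clean_day(df1, "mon"), "tues")
df4 <- Reduce(clean_day, c("mon", "tues"), df1)
```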