I have around 800 data frames (a1, a2, a3, ..., a800) in R, and all of them have the same number of columns and the same column names. I want to left join table a1 with the other 799 tables and store the result in an object. Similarly, I want to left join table a2 with the rest of them and store that in another object, and so on. I am unable to proceed with this! It would be great if anyone could help.
Here is an example
Table a1:
Names ID Time
X 1 2
Y 2 6
Z 3 5
K 4 8
Table a2:
Names ID Time
P 11 8
Q 12 9
R 10 7
Y 2 6
And so on. I want to join on the ID column, and I have 800 tables!
You can use data.table::rbindlist:
# Collect a1 ... a800 into a named list (list(a1, a2, ..., a800) is not valid R as written).
dataframe_name_list = mget(paste0("a", 1:800))
data.table::rbindlist(dataframe_name_list, use.names=TRUE)
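Note that rbindlist stacks the rows of all tables; it does not join them. If the goal really is a left join of each table against all the others on ID, the logic might look like the sketch below. It is plain Python rather than R, only to illustrate the idea. The `tables` dict, the `left_join_with_rest` helper, and the `source`, `Names_other` and `Time_other` field names are all hypothetical.

```python
from collections import defaultdict

# Hypothetical stand-in for a1..a800: table name -> list of row dicts.
tables = {
    "a1": [
        {"Names": "X", "ID": 1, "Time": 2},
        {"Names": "Y", "ID": 2, "Time": 6},
        {"Names": "Z", "ID": 3, "Time": 5},
        {"Names": "K", "ID": 4, "Time": 8},
    ],
    "a2": [
        {"Names": "P", "ID": 11, "Time": 8},
        {"Names": "Q", "ID": 12, "Time": 9},
        {"Names": "R", "ID": 10, "Time": 7},
        {"Names": "Y", "ID": 2, "Time": 6},
    ],
}


def left_join_with_rest(tables, base_name, key="ID"):
    """Left join one table against the stacked rows of all other tables."""
    # Index the rows of every other table by the join key.
    rest = defaultdict(list)
    for name, rows in tables.items():
        if name != base_name:
            for row in rows:
                rest[row[key]].append((name, row))

    joined = []
    for row in tables[base_name]:
        matches = rest.get(row[key])
        if not matches:
            # No match: keep the base row and fill the right-hand side with None.
            joined.append({**row, "source": None,
                           "Names_other": None, "Time_other": None})
            continue
        for name, other in matches:
            joined.append({**row, "source": name,
                           "Names_other": other["Names"],
                           "Time_other": other["Time"]})
    return joined


# One joined result per base table, as the question asks.
results = {name: left_join_with_rest(tables, name) for name in tables}
```

With the two example tables, only the row with ID 2 in a1 finds a match (in a2); the other a1 rows are kept with empty right-hand columns, which is what a left join should do.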
I have a data set similar to df1 here
import numpy as np
import pandas as pd
df1 = pd.DataFrame({'id': [1, 1, 2, 2, 2],
                    'value': [67, 45, 7, 5, 9]})
id value
1 67
1 45
2 7
2 5
2 9
I want to bring it to this form, with all the values corresponding to each id in one cell, separated by spaces.
id values
1 67 45
2 7 5 9
Here is my code
df2 = pd.DataFrame(df1['id'].unique())
df2.columns = ['id']
df2['values'] = np.nan
for i in df2['id']:
    s = ''
    for k in df1[df1['id'] == i]['value']:
        s = s + ' ' + str(k)
    df2.loc[df2['id'] == i, 'values'] = s.lstrip()
print(df2)
Is there a more Pythonic way of doing this? I have 70,000 unique ids, and each id may have anywhere from 1 to 20 values.
I am using
Anaconda python 3.5
pandas 0.20.1
numpy 1.12.1
windows 10
Also, how can we replicate the same in R?
Convert the 'value' column from int to string, then perform a groupby on 'id' and apply the str.join function:
# Convert 'value' column to string.
df1['value'] = df1['value'].astype(str)
# Perform a groupby and apply a string join.
df1 = df1.groupby('id')['value'].apply(' '.join).reset_index()
The resulting output:
id value
0 1 67 45
1 2 7 5 9
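To make the group-then-join step concrete, here is a minimal sketch of the same logic without pandas, using only the standard library. The `rows`, `grouped` and `result` names are illustrative, not part of the answer above.

```python
from collections import defaultdict

# Same example data as df1, as (id, value) pairs.
rows = [(1, 67), (1, 45), (2, 7), (2, 5), (2, 9)]

# Collect the values for each id, converting to str so they can be joined.
grouped = defaultdict(list)
for id_, value in rows:
    grouped[id_].append(str(value))

# Join each group's values with a single space.
result = {id_: " ".join(values) for id_, values in grouped.items()}
print(result)  # {1: '67 45', 2: '7 5 9'}
```

The conversion to str has to happen before the join, for the same reason the pandas version calls astype(str) first: str.join only accepts strings.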
Here is how to do it in R; the approach is the same:
df = data.frame('id'=c(1,1,2,2,2),'value'=c(67,45,7,5,9))
aggregate(cbind(values = value) ~ id,
          data = df,
          FUN = function(x) paste(x, collapse = ' '))
I have a table like this:
a b c
-- -- --
1 1 10
2 1 0
3 1 0
4 4 20
5 4 0
6 4 0
The b column 'points' to 'a', a bit as if the row with that a were the parent.
c was computed for the parents only. Now I need to propagate each parent's c value to its children.
The result would be
a b c
-- -- --
1 1 10
2 1 10
3 1 10
4 4 20
5 4 20
6 4 20
I can't come up with an UPDATE/SELECT combo that works.
So far I have a SELECT that produces the c column I'd like to get:
select t1.c from t t1 join t t2 on t1.a=t2.b;
c
----------
10
10
10
20
20
20
But I don't know how to write that back into c.
Thanks in advance
Cheers, phi
You have to look up the value with a correlated subquery:
UPDATE t
SET c = (SELECT c
         FROM t AS parent
         WHERE parent.a = t.b)
WHERE c = 0;
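As a quick check, the statement can be run against the example table with Python's built-in sqlite3 module. This is only a sketch: it assumes SQLite (the question does not name the database), and the INTEGER column types and the primary key on a are assumptions as well.

```python
import sqlite3

# In-memory database holding the example table from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (a INTEGER PRIMARY KEY, b INTEGER, c INTEGER)")
conn.executemany(
    "INSERT INTO t (a, b, c) VALUES (?, ?, ?)",
    [(1, 1, 10), (2, 1, 0), (3, 1, 0), (4, 4, 20), (5, 4, 0), (6, 4, 0)],
)

# For each child row (c = 0), look up c on the row whose a equals this row's b.
conn.execute(
    """
    UPDATE t
    SET c = (SELECT c FROM t AS parent WHERE parent.a = t.b)
    WHERE c = 0
    """
)

rows = conn.execute("SELECT a, b, c FROM t ORDER BY a").fetchall()
print(rows)
conn.close()
```

Because of the WHERE c = 0 filter, the parent rows are never written, so the subquery always reads the original parent values.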
I finally found a way to copy my initial 'temp' SELECT JOIN back into table 't'. Something like this:
create temp table u as select t1.c from t t1 join t t2 on t1.a=t2.b;
update t set c=(select * from u where rowid=t.rowid);
I'd like to know how the two solutions compare performance-wise: yours, a single UPDATE with a correlated SELECT, and mine, which takes two queries, one of them also correlated. Mine seems heavier and less elegant, but I wonder about the performance.
On the algorithm side, yours takes care to update only the child rows and leaves the parent rows alone; mine also copies each parent's value onto itself, which is a no-op but still costs some cycles :)
Cheers, Phi
I have a data frame in R with 3 columns, A, B and C:
A B C
2 3 4
5 2 7
I want to get square of each number like this
A B C
4 9 16
25 4 49
Can anyone please help me out? I am able to do this in Excel but want to do it in R.
Just do this. In R, ^ works element-wise whether the operand is a number, vector, matrix or data frame.
dataframe^2
If you want your result as a data.frame rather than a matrix, do
data.frame(dataframe^2)
Sorry if this is really simple, but I've been trying to find an answer for hours. I have two data frames that contain several columns each; an example of a similar situation is below (the actual data frames are very large and cumbersome).
First data frame
"GPS_ID" "Object_ID" "DBH_cm"
1 19426 15
2 9456 9
3 19887 11
5 18765 4
6 9322 7
And the second data frame
"Location" "ID"
block 1 9456
block 2 18765
block 2 9322
I need to create a new object that has ONLY the ID's in the second data frame matched with their corresponding DBH_cm's from the first data frame. I thought maybe merging would help, but when I tried it, it just added the Location column to the first data frame.
If I understand your final output correctly, the merge function should be what you need:
> merge(x, y, by.x = "Object_ID", by.y = "ID")
  Object_ID GPS_ID DBH_cm Location
1      9322      6      7  block 2
2      9456      2      9  block 1
3     18765      5      4  block 2
You can then trim the new data.frame by dropping any columns you don't need.
You can also use inner_join from dplyr. If x and y are the two datasets
library(dplyr)
colnames(y)[2] <- colnames(x)[2]
inner_join(x,y, by="Object_ID")
# GPS_ID Object_ID DBH_cm Location
# 1 2 9456 9 block 1
# 2 5 18765 4 block 2
# 3 6 9322 7 block 2
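For readers who want to see what the merge / inner join is doing underneath, here is a rough sketch of the same matching in plain Python. The list-of-dicts layout and the `by_object_id` and `matched` names are assumptions made for illustration, and Object_ID is assumed to be unique in the first data frame.

```python
# First data frame: GPS_ID, Object_ID, DBH_cm.
x = [
    {"GPS_ID": 1, "Object_ID": 19426, "DBH_cm": 15},
    {"GPS_ID": 2, "Object_ID": 9456, "DBH_cm": 9},
    {"GPS_ID": 3, "Object_ID": 19887, "DBH_cm": 11},
    {"GPS_ID": 5, "Object_ID": 18765, "DBH_cm": 4},
    {"GPS_ID": 6, "Object_ID": 9322, "DBH_cm": 7},
]

# Second data frame: Location, ID.
y = [
    {"Location": "block 1", "ID": 9456},
    {"Location": "block 2", "ID": 18765},
    {"Location": "block 2", "ID": 9322},
]

# Index x by Object_ID (assumed unique) for constant-time lookups.
by_object_id = {row["Object_ID"]: row for row in x}

# Keep only the IDs present in y, attaching the matching DBH_cm from x.
matched = [
    {"ID": row["ID"],
     "Location": row["Location"],
     "DBH_cm": by_object_id[row["ID"]]["DBH_cm"]}
    for row in y
    if row["ID"] in by_object_id
]

for row in matched:
    print(row["ID"], row["Location"], row["DBH_cm"])
```

Rows of x whose Object_ID does not appear in y (19426 and 19887 here) are dropped, which is the difference between an inner join and simply adding the Location column to the first data frame.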
I'm working with some data that looks like this:
AB 123 4 5 3 2 1
AB 234 4 2 7 4 3
...
The row id is actually the combination of the first two columns, so I would like to be able to reference row AB123 or AB234. However, since they are in two columns, I figured the easiest way to do this would be to merge columns 1 and 2 somehow and then convert it to a table with column 1 specified as the row names. Does anyone know how I can do this? Is there an easier way? Thanks.
row.names(df)<-paste(df[,1],df[,2],sep="")