# Another Note on Fragmented-Replicated-Join

Interesting work in the database community on the subject of FRJ…

A second join that we’ve had to face with is the computation of disjunctive equi-joins. This seems like an easy problem, though not so simple.

In pig, if I want to join table A and B that is expressed in the following SQL:

select * from A, B where (
A.ind1=B.ind1 AND
(A.ind2=B.ind2 OR A.ind3=B.ind3) AND
(A.ind4=B.ind4 OR A.ind5=B.ind5));

How would one do this? Pig only supports conjunctive equi-join of the following form:
select * from A, B where (
A.ind1=B.ind1 AND
A.ind2=B.ind2 AND
A.ind3=B.ind3 AND
A.ind4=B.ind4 AND
A.ind5=B.ind5);
C = JOIN A by (ind1, ind2, ind3, ind4, ind5),
B by (ind1, ind2, ind3, ind4, ind5),
I’m not even looking for an efficient way to do this, just any way to do it in pig or in a map-reduce algorithm would be sufficient for now.

## 2 thoughts on “Another Note on Fragmented-Replicated-Join”

1. Anonymous

You can simulate disjunctive equi-join with strict equi-joins. To achieve the following SQL:

select * from A, B where (
A.ind1=B.ind1 AND
(A.ind2=B.ind2 OR A.ind3=B.ind3) AND
(A.ind4=B.ind4 OR A.ind5=B.ind5));

AA = foreach A generate FLATTEN(
MAKEBAG(
MAKETUPLE('group1','ind1',ind1),
MAKETUPLE('group1','ind2',ind2),
MAKETUPLE('group2','ind3',ind3),
MAKETUPLE('group2','ind4',ind4)
)) as (group_type,col_type,col_value),
)) as (col_type,col_value),
);

BB = foreach B generate FLATTEN(
MAKEBAG(
MAKETUPLE('group1','ind1',ind1),
MAKETUPLE('group1','ind2',ind2),
MAKETUPLE('group2','ind3',ind3),
MAKETUPLE('group2','ind4',ind4)
)) as (group_type,col_type,col_value),
);

CC = JOIN AA by (ind1,group_type,col_type, col_value),
BB by (ind1,group_type,col_type, col_value);

DD = group CC by (ind1, group_type);
EE = foreach DD {
AT = DISTINCT AA.col_type;
BT = DISTINCT BB.col_type;
ATC = COUNT(AT);
BTC = COUNT(BT);
AFIELDS = LIMIT AA.additional_field 1;
BFIELDS = LIMIT BB.additional_field 1;
generate ATC, BTC, AFIELDS, BFIELDS;
}

FF = filter EE by (ATC>0 AND BTC>0);

RESULT = foreach FF generate FLATTEN(AFIELDS), FLATTEN(BFIELDS);

This gives the ability to perform any expression that can be expressed in CNF of equality comparisons.

I hope this wasn't home work for some undergraduate class you are taking…

2. Anonymous

For clarity, I've retyped the Pig Latin code to perform the arbitrary conjunctive normal form equi-join using some short hands for tuples and bags:

AA = foreach A generate FLATTEN(
{('group1','ind1',ind1),('group1','ind2',ind2),('group2','ind3',ind3),('group2','ind4',ind4)}) as (group_type,col_type,col_value),
;

BB = foreach B generate FLATTEN(