Simpson's Paradox

Simpson's paradox is one of those statistical phenomena that are easy to understand and surprising at the same time. It occurs when several groups of data each show a particular trend, but that trend reverses when the groups are combined. A simple example makes the paradox quick to grasp.


We consider two disjoint sets \(\#1\) and \(\#2\) together with \(G = \#1 \cup \#2\), and compare the success rates of \(A\) and \(B\) on these sets:

$$\begin{array}{l|c|c|c}
 & A & B & win \\ \hline
\#1 & \frac{1}{1}=100\% & \frac{3}{4}=75\% & A \\
\#2 & \frac{2}{5}=40\% & \frac{1}{3}=33\% & A \\
\#1 \cup \#2 & \frac{3}{6}=50\% & \frac{4}{7}=57\% & B
\end{array}$$

As it turns out, \(A\) is more successful than \(B\) on \(\#1\) as well as on \(\#2\), yet, surprisingly, \(B\) is more successful than \(A\) on \(G\). Among examples with strict inequalities and at least one success in every cell, this is also one of the smallest, with \(|G|=13\): there is no such \(G\) with \(|G|<13\) (proof by brute force).
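The brute-force claim can be checked with a short script. The following is a minimal sketch, not the author's original program: the function names `is_paradox` and `smallest_paradox` and the parameter `min_successes` are assumptions made for illustration. It enumerates all tables of success counts for \(A\) and \(B\) on two subsets by increasing total size and returns the first one in which \(A\) wins strictly on both subsets but loses strictly on their union.

```python
from fractions import Fraction
from itertools import product


def is_paradox(a1, n1, b1, m1, a2, n2, b2, m2):
    """A beats B on #1 and on #2, yet B beats A on the union (all strict).

    a_i / n_i is A's record on subset i (successes / trials),
    b_i / m_i is B's record on subset i.
    """
    a_wins_1 = Fraction(a1, n1) > Fraction(b1, m1)
    a_wins_2 = Fraction(a2, n2) > Fraction(b2, m2)
    b_wins_union = Fraction(a1 + a2, n1 + n2) < Fraction(b1 + b2, m1 + m2)
    return a_wins_1 and a_wins_2 and b_wins_union


def smallest_paradox(min_successes=1, max_size=20):
    """Return (size, table) for the first paradox found by increasing size.

    min_successes is an assumed lower bound on every success count;
    size is the total number of trials, i.e. |G|.
    """
    for size in range(4, max_size + 1):
        # Choose three of the four trial counts; the fourth is determined.
        for n1, m1, n2 in product(range(1, size), repeat=3):
            m2 = size - n1 - m1 - n2
            if m2 < 1:
                continue
            success_counts = product(
                range(min_successes, n1 + 1),
                range(min_successes, m1 + 1),
                range(min_successes, n2 + 1),
                range(min_successes, m2 + 1),
            )
            for a1, b1, a2, b2 in success_counts:
                if is_paradox(a1, n1, b1, m1, a2, n2, b2, m2):
                    return size, ((a1, n1), (b1, m1), (a2, n2), (b2, m2))
    return None


size, table = smallest_paradox(min_successes=1)
print(size)  # 13
```

With `min_successes=1` the search stops at size 13. The bound depends on that assumption: if a cell may contain zero successes, smaller tables qualify, for example \(A\): \(\frac{1}{1}, \frac{1}{4}\) and \(B\): \(\frac{2}{3}, \frac{0}{1}\) with \(|G| = 9\), which `is_paradox(1, 1, 2, 3, 1, 4, 0, 1)` confirms.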

Now we partition \(G\) into \(3\) instead of \(2\) disjoint subsets \(\#1, \, \#2, \, \#3\) with \(\#1 \cup \#2 \cup \#3 = G\). We then construct the neat case in which the following holds for all elements \(e_k \neq \emptyset\) of the power set \(P(G)\) of \(G\). Here \(G\) is read as the collection \(\{\#1, \#2, \#3\}\), \(|e_k|\) counts the subsets merged in \(e_k\), and \(win(e_k)\) is the most successful candidate on their union:
$$\forall e_1, e_2 \in P(G) \setminus \{\emptyset\}: \left(|e_1| \neq |e_2| \Rightarrow win(e_1) \neq win(e_2)\right) \land \left(|e_1| = |e_2| \Rightarrow win(e_1) = win(e_2)\right)$$

After a few hours of brute force on a standard Core i7, the following example turned up:

$$\begin{array}{l|c|c|c|c}
 & A & B & C & win \\ \hline
\#1 & \frac{6}{7}=85.71\% & \frac{12}{15}=80.00\% & \frac{22}{37}=59.46\% & A \\
\#2 & \frac{95}{167}=56.89\% & \frac{48}{88}=54.55\% & \frac{38}{67}=56.72\% & A \\
\#3 & \frac{48}{144}=33.33\% & \frac{16}{50}=32.00\% & \frac{2}{20}=10.00\% & A \\
\#1 \cup \#2 & \frac{101}{174}=58.05\% & \frac{60}{103}=58.25\% & \frac{60}{104}=57.69\% & B \\
\#1 \cup \#3 & \frac{54}{151}=35.76\% & \frac{28}{65}=43.08\% & \frac{24}{57}=42.11\% & B \\
\#2 \cup \#3 & \frac{143}{311}=45.98\% & \frac{64}{138}=46.38\% & \frac{40}{87}=45.98\% & B \\
\#1 \cup \#2 \cup \#3 & \frac{149}{318}=46.86\% & \frac{76}{153}=49.67\% & \frac{62}{124}=50.00\% & C
\end{array}$$
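The three-subset example can be re-checked directly from the raw counts. The sketch below is an illustration, not the search program itself; the names `COUNTS` and `win` are chosen here for convenience. It merges every non-empty combination of the three subsets and records which of \(A\), \(B\), \(C\) has the highest success rate, using exact fractions to avoid rounding ties.

```python
from fractions import Fraction
from itertools import combinations

# (successes, trials) per subset and candidate, taken from the rows #1, #2, #3.
COUNTS = {
    "#1": {"A": (6, 7), "B": (12, 15), "C": (22, 37)},
    "#2": {"A": (95, 167), "B": (48, 88), "C": (38, 67)},
    "#3": {"A": (48, 144), "B": (16, 50), "C": (2, 20)},
}


def win(subsets):
    """Return the candidate with the highest success rate on the merged subsets."""
    rates = {}
    for candidate in "ABC":
        successes = sum(COUNTS[s][candidate][0] for s in subsets)
        trials = sum(COUNTS[s][candidate][1] for s in subsets)
        rates[candidate] = Fraction(successes, trials)
    return max(rates, key=rates.get)


# Group the winners by how many subsets were merged.
winners_by_size = {
    k: {win(combo) for combo in combinations(COUNTS, k)}
    for k in range(1, len(COUNTS) + 1)
}
print(winners_by_size)  # {1: {'A'}, 2: {'B'}, 3: {'C'}}
```

Each size maps to exactly one winner, and the three winners differ, which is the size-dependent behavior the example was constructed to show.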

In the same way (given sufficiently long computation time), examples with \(n\) disjoint subsets showing the same behavior can be found. When such cases occur in practice, recommendations derived from the success rates of groups become hard to justify and useless.

At this point, the excellent book Causality: Models, Reasoning, and Inference by Judea Pearl is recommended reading.
