2017-09-05

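Report all possible combinations of columns

I have a question about combinations, but in a fairly complex situation for which I have not found any help. I am trying to find a way to report all possible combinations in a dataset.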

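The data come from a survey of the land-change literature and record which proximate and underlying drivers each article reports. Rows represent individual articles, and columns represent all the proximate and underlying drivers. There are six types of proximate drivers and five types of underlying drivers. For each article, a 1 is placed in the column of every driver identified in that article, and a 0 in the columns of the drivers it does not identify. The table looks roughly like this: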

key | d1 | d2 |...| d6 | i1 |...| i5 | 
-------------------------------------- 
A1 | 1 | 0 |...| 1 | 1 |...| 0 | 
A2 | 0 | 1 |...| 0 | 0 |...| 1 | 

where article A1 identifies d1 and d6 as direct drivers and i1 as an indirect driver, and so on.

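What I would like to do is find the number of articles reporting every possible combination of direct drivers, of indirect drivers, and of direct and indirect drivers together. For example, how many articles identify d1, d2 and i1; how many identify d1, d2 and i2; and so on? My student has the table in an Excel file, and I was thinking that Calc or Base might have a function to automate the process. Does anyone have an idea of how I could do this?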

Thanks!

So you want to identify all 2^11 combinations and count how many times each one occurs? That is 2048 different combinations.

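That is why I was hoping to streamline the process. The idea is to determine which combinations of drivers appear most often in the literature. – Napoletano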

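Use a small UDF to combine the drivers into a condition string (or concatenate the binary digits). Then use a pivot table to count the occurrences of each combination string. – MacroMarc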
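
The idea in the comment above (concatenate the binary digits, then count each string) can also be tried outside the spreadsheet. The following is only a rough sketch, assuming the table has been exported as whitespace-separated text with the article key in the first column. The sample rows and the function name `pattern_counts` are made up for illustration. Note that it counts exact 0/1 patterns, so an article is counted only under the full set of drivers it reports, not under every subset of that set.

```shell
#!/usr/bin/env bash
# Hypothetical sample data: article key followed by 0/1 driver columns.
data='A1 1 0 1
A2 0 1 0
A3 1 0 1'

# Join the 0/1 columns of each row into one pattern string (e.g. "101"),
# tally how often each pattern occurs, and list the most frequent first.
pattern_counts() {
    awk '{ p = ""; for (c = 2; c <= NF; c++) p = p $c; n[p]++ }
         END { for (p in n) print n[p], p }' | sort -k1,1nr -k2,2
}

printf '%s\n' "$data" | pattern_counts
```

Each output line is a count followed by the pattern, which plays the same role as the pivot table in the spreadsheet version.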

Answer

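I finally gave up and took a brute-force approach. I exported the table as text, pulled it into MySQL, and then used a bash script to iterate through the options. In case anyone else has a similar problem, here is the bash script: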

# Generate array containing factors 
faclis1=(d_inf d_com d_inm d_ind d_agr d_bos i_dem i_eco i_tec i_pol i_cul); 
#faclis=("d_inf" "d_com" "d_inm"); 
a=0 
#echo ${faclis[@]}; 


# Remove output file if exists 
if [ -e permcounts.txt ]; 
    then 
    rm permcounts.txt; 
    fi; 

# Cycle through list of factors 
for f1 in ${faclis1[@]}; 
do 
# only proceed if factor not null 
if [ ${f1} ]; 
then 
# print remaining array just to be sure 
echo "factor list is ${faclis1[@]}"; 
#echo ${faclis[@]}; 
echo "Now on factor ${f1}"; 
echo "FACTOR ${f1}" >> permcounts.txt; 
mysql -u harvey -pdavid -e "select count(clave) from genfact where \ 
${f1} = 1;" metamorelia >> permcounts.txt; 
# create sub array without current factor, 2 factors 
faclis2=(${faclis1[@]/${f1}/}); 
#set sub-counter 
b=0 
#echo "${faclis2[@]}"; 
# loop through sub array, two factors 
for f2 in ${faclis2[@]}; 
do 
if [ ${f2} ] && \ 
[ "${f1}" != "${f2}" ]; 
then 
echo "FACTOR ${f1} \ 
AND ${f2}" >> permcounts.txt; 
mysql -u harvey -pdavid -e "select count(clave) from genfact where \ 
${f1} = 1 and \ 
${f2} = 1;" metamorelia >> permcounts.txt; 

# next sub-array 
faclis3=(${faclis2[@]//${f2}}); 
c=0 
#echo "${faclis3[@]}"; 
# loop through sub-array 
for f3 in ${faclis3[@]}; 
do 
if [ ${f3} ] && \ 
[ "${f1}" != "${f3}" ] && \ 
[ "${f2}" != "${f3}" ]; 

then 
echo "FACTOR ${f1} \ 
AND ${f2} \ 
AND ${f3}" >> permcounts.txt; 
mysql -u harvey -pdavid -e "select count(clave) from genfact where \ 
${f1} = 1 and \ 
${f2} = 1 and \ 
${f3} = 1;" metamorelia >> permcounts.txt; 

# next sub-array 
faclis4=(${faclis3[@]//${f3}}); 
d=0 
#echo "${faclis4[@]}"; 
# loop through sub-array 
for f4 in ${faclis4[@]}; 
do 
if [ ${f4} ] && \ 
[ "${f1}" != "${f4}" ] && \ 
[ "${f2}" != "${f4}" ] && \ 
[ "${f3}" != "${f4}" ]; 
then 
echo "FACTOR ${f1} \ 
AND ${f2} \ 
AND ${f3} \ 
AND ${f4}" >> permcounts.txt; 
mysql -u harvey -pdavid -e "select count(clave) from genfact where \ 
${f1} = 1 and \ 
${f2} = 1 and \ 
${f3} = 1 and \ 
${f4} = 1;" metamorelia >> permcounts.txt; 

# next sub-array 
faclis5=(${faclis4[@]//${f4}}); 
e=0 
#echo "${faclis5[@]}"; 
# loop through sub-array 
for f5 in ${faclis5[@]}; 
do 
if [ ${f5} ] && \ 
[ "${f1}" != "${f5}" ] && \ 
[ "${f2}" != "${f5}" ] && \ 
[ "${f3}" != "${f5}" ] && \ 
[ "${f4}" != "${f5}" ]; 
then 
echo "FACTOR ${f1} \ 
AND ${f2} \ 
AND ${f3} \ 
AND ${f4} \ 
AND ${f5}" >> permcounts.txt; 
mysql -u harvey -pdavid -e "select count(clave) from genfact where \ 
${f1} = 1 and \ 
${f2} = 1 and \ 
${f3} = 1 and \ 
${f4} = 1 and \ 
${f5} = 1;" metamorelia >> permcounts.txt; 

# next sub-array 
faclis6=(${faclis5[@]//${f5}}); 
f=0 
#echo "${faclis6[@]}"; 
# loop through sub-array 
for f6 in ${faclis6[@]}; 
do 
if [ ${f6} ] && \ 
[ "${f1}" != "${f6}" ] && \ 
[ "${f2}" != "${f6}" ] && \ 
[ "${f3}" != "${f6}" ] && \ 
[ "${f4}" != "${f6}" ] && \ 
[ "${f5}" != "${f6}" ]; 
then 
echo "FACTOR ${f1} \ 
AND ${f2} \ 
AND ${f3} \ 
AND ${f4} \ 
AND ${f5} \ 
AND ${f6}" >> permcounts.txt; 
mysql -u harvey -pdavid -e "select count(clave) from genfact where \ 
${f1} = 1 and \ 
${f2} = 1 and \ 
${f3} = 1 and \ 
${f4} = 1 and \ 
${f5} = 1 and \ 
${f6} = 1;" metamorelia >> permcounts.txt; 

# next sub-array 
faclis7=(${faclis6[@]//${f6}}); 
g=0 
#echo "${faclis7[@]}"; 
# loop through sub-array 
for f7 in ${faclis7[@]}; 
do 
if [ ${f7} ] && \ 
[ "${f1}" != "${f7}" ] && \ 
[ "${f2}" != "${f7}" ] && \ 
[ "${f3}" != "${f7}" ] && \ 
[ "${f4}" != "${f7}" ] && \ 
[ "${f5}" != "${f7}" ] && \ 
[ "${f6}" != "${f7}" ]; 
then 
echo "FACTOR ${f1} \ 
AND ${f2} \ 
AND ${f3} \ 
AND ${f4} \ 
AND ${f5} \ 
AND ${f6} \ 
AND ${f7}" >> permcounts.txt; 
mysql -u harvey -pdavid -e "select count(clave) from genfact where \ 
${f1} = 1 and \ 
${f2} = 1 and \ 
${f3} = 1 and \ 
${f4} = 1 and \ 
${f5} = 1 and \ 
${f6} = 1 and \ 
${f7} = 1;" metamorelia >> permcounts.txt; 

# next sub-array 
faclis8=(${faclis7[@]//${f7}}); 
h=0 
#echo "${faclis8[@]}"; 
# loop through sub-array 
for f8 in ${faclis8[@]}; 
do 
if [ ${f8} ] && \ 
[ "${f1}" != "${f8}" ] && \ 
[ "${f2}" != "${f8}" ] && \ 
[ "${f3}" != "${f8}" ] && \ 
[ "${f4}" != "${f8}" ] && \ 
[ "${f5}" != "${f8}" ] && \ 
[ "${f6}" != "${f8}" ] && \ 
[ "${f7}" != "${f8}" ]; 
then 
echo "FACTOR ${f1} \ 
AND ${f2} \ 
AND ${f3} \ 
AND ${f4} \ 
AND ${f5} \ 
AND ${f6} \ 
AND ${f7} \ 
AND ${f8}" >> permcounts.txt; 
mysql -u harvey -pdavid -e "select count(clave) from genfact where \ 
${f1} = 1 and \ 
${f2} = 1 and \ 
${f3} = 1 and \ 
${f4} = 1 and \ 
${f5} = 1 and \ 
${f6} = 1 and \ 
${f7} = 1 and \ 
${f8} = 1;" metamorelia >> permcounts.txt; 

# next sub-array 
faclis9=(${faclis8[@]//${f8}}); 
i=0 
#echo "${faclis9[@]}"; 
# loop through sub-array 
for f9 in ${faclis9[@]}; 
do 
if [ ${f9} ] && \ 
[ "${f1}" != "${f9}" ] && \ 
[ "${f2}" != "${f9}" ] && \ 
[ "${f3}" != "${f9}" ] && \ 
[ "${f4}" != "${f9}" ] && \ 
[ "${f5}" != "${f9}" ] && \ 
[ "${f6}" != "${f9}" ] && \ 
[ "${f7}" != "${f9}" ] && \ 
[ "${f8}" != "${f9}" ]; 
then 
echo "FACTOR ${f1} \ 
AND ${f2} \ 
AND ${f3} \ 
AND ${f4} \ 
AND ${f5} \ 
AND ${f6} \ 
AND ${f7} \ 
AND ${f8} \ 
AND ${f9}" >> permcounts.txt; 
mysql -u harvey -pdavid -e "select count(clave) from genfact where \ 
${f1} = 1 and \ 
${f2} = 1 and \ 
${f3} = 1 and \ 
${f4} = 1 and \ 
${f5} = 1 and \ 
${f6} = 1 and \ 
${f7} = 1 and \ 
${f8} = 1 and \ 
${f9} = 1;" metamorelia >> permcounts.txt; 

# next sub-array 
faclis10=(${faclis9[@]//${f9}}); 
j=0 
#echo "${faclis10[@]}"; 
# loop through sub-array 
for f10 in ${faclis10[@]}; 
do 
if [ ${f10} ] && \ 
[ "${f1}" != "${f10}" ] && \ 
[ "${f2}" != "${f10}" ] && \ 
[ "${f3}" != "${f10}" ] && \ 
[ "${f4}" != "${f10}" ] && \ 
[ "${f5}" != "${f10}" ] && \ 
[ "${f6}" != "${f10}" ] && \ 
[ "${f7}" != "${f10}" ] && \ 
[ "${f8}" != "${f10}" ] && \ 
[ "${f9}" != "${f10}" ]; 
then 
echo "FACTOR ${f1} \ 
AND ${f2} \ 
AND ${f3} \ 
AND ${f4} \ 
AND ${f5} \ 
AND ${f6} \ 
AND ${f7} \ 
AND ${f8} \ 
AND ${f9} \ 
AND ${f10}" >> permcounts.txt; 
mysql -u harvey -pdavid -e "select count(clave) from genfact where \ 
${f1} = 1 and \ 
${f2} = 1 and \ 
${f3} = 1 and \ 
${f4} = 1 and \ 
${f5} = 1 and \ 
${f6} = 1 and \ 
${f7} = 1 and \ 
${f8} = 1 and \ 
${f9} = 1 and \ 
${f10} = 1;" metamorelia >> permcounts.txt; 

# next sub-array 
faclis11=(${faclis10[@]//${f10}}); 
k=0 
#echo "${faclis11[@]}"; 
# loop through sub-array 
for f11 in ${faclis11[@]}; 
do 
if [ ${f11} ] && \ 
[ "${f1}" != "${f11}" ] && \ 
[ "${f2}" != "${f11}" ] && \ 
[ "${f3}" != "${f11}" ] && \ 
[ "${f4}" != "${f11}" ] && \ 
[ "${f5}" != "${f11}" ] && \ 
[ "${f6}" != "${f11}" ] && \ 
[ "${f7}" != "${f11}" ] && \ 
[ "${f8}" != "${f11}" ] && \ 
[ "${f9}" != "${f11}" ] && \ 
[ "${f10}" != "${f11}" ]; 
then 
echo "FACTOR ${f1} \ 
AND ${f2} \ 
AND ${f3} \ 
AND ${f4} \ 
AND ${f5} \ 
AND ${f6} \ 
AND ${f7} \ 
AND ${f8} \ 
AND ${f9} \ 
AND ${f10} \ 
AND ${f11}" >> permcounts.txt; 
mysql -u harvey -pdavid -e "select count(clave) from genfact where \ 
${f1} = 1 and \ 
${f2} = 1 and \ 
${f3} = 1 and \ 
${f4} = 1 and \ 
${f5} = 1 and \ 
${f6} = 1 and \ 
${f7} = 1 and \ 
${f8} = 1 and \ 
${f9} = 1 and \ 
${f10} = 1 and \ 
${f11} = 1;" metamorelia >> permcounts.txt; 

unset faclis11[k]; 
k=$((${k} + 1)); 
fi; 
done; 
unset faclis10[j]; 
j=$((${j} + 1)); 
fi; 
done; 
unset faclis9[i]; 
i=$((${i} + 1)); 
fi; 
done; 
unset faclis8[h]; 
h=$((${h} + 1)); 
fi; 
done; 
unset faclis7[g]; 
g=$((${g} + 1)); 
fi; 
done; 
unset faclis6[f]; 
f=$((${f} + 1)); 
fi; 
done; 
unset faclis5[e]; 
e=$((${e} + 1)); 
fi; 
done; 
unset faclis4[d]; 
d=$((${d} + 1)); 
fi; 
done; 
unset faclis3[c]; 
c=$((${c} + 1)); 
fi; 
done; 
# Remove analyzed factors from vector 
unset faclis2[b]; 
b=$((${b} + 1)); 
fi; 
done; 
# remove nth item from array (progressively remove one item) 
unset faclis1[a]; 
# increment a for next round 
a=$((${a} + 1)); 
echo ${a}; 
fi; 
done; 

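This script is somewhat inefficient, and I think I included a lot of unnecessary operations, but it gets the job done. (I think so, at least; my student will have to go through the output file to make sure everything is there.)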
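
For comparison, the same enumeration can probably be written with one recursive function instead of eleven nested loops. This is a minimal sketch under some assumptions: the table is available as whitespace-separated text with a header row of column names, and counting is done with awk rather than MySQL. The sample rows and the names `table`, `count_rows`, and `combos` are hypothetical and introduced only for this example.

```shell
#!/usr/bin/env bash
# Hypothetical sample: header row with column names, then one row per article.
table='clave d_inf d_com i_dem
A1 1 0 1
A2 1 1 1
A3 0 1 0'

# Count the rows in which every column named in "$@" equals 1.
count_rows() {
    printf '%s\n' "$table" | awk -v want="$*" '
        NR == 1 { for (c = 1; c <= NF; c++) col[$c] = c; n = split(want, w, " "); next }
        { ok = 1; for (k = 1; k <= n; k++) if ($(col[w[k]]) != 1) ok = 0; total += ok }
        END { print total + 0 }'
}

# Extend the current combination only with factors that come later in the
# list, so each subset is visited exactly once (no reordered duplicates).
combos() {
    local start=$1; shift
    local i chosen
    for (( i = start; i < ${#factors[@]}; i++ )); do
        chosen=("$@" "${factors[i]}")
        echo "${chosen[*]}: $(count_rows "${chosen[@]}")"
        combos $(( i + 1 )) "${chosen[@]}"
    done
}

factors=(d_inf d_com i_dem)
combos 0
```

With the eleven real column names in `factors`, this would visit the 2047 non-empty combinations. As in the script above, each count is the number of articles that report at least the listed drivers.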