Bu oyunda kazanan kelimeleri bulmak için en hızlı python kodu

Bu çocuklar için aktivite kartları bir dizi bir kelime oyunudur. Kuralların altında / usr / share / dict / words kullanarak en iyi üçlüyü bulmak için kod bulunmaktadır. Bunun ilginç bir optimizasyon problemi olduğunu düşündüm ve insanların iyileştirme bulabileceğini merak ediyorum.

kurallar

Aşağıdaki setlerin her birinden bir harf seçin.
Seçilen harfleri (ve diğerlerini) kullanarak bir kelime seçin.
Kelimeyi puanlayın.
- Seçilen kümedeki her harf, kümeyle gösterilen sayıyı alır (tekrarlar dahil).
- AEIOU sayı 0
- Diğer tüm harfler -2
Yukarıdaki 1-3 arasındaki adımları tekrarlayın (1. adımda harfleri tekrar kullanmayın).
Nihai puan üç kelime puanının toplamıdır.

Setler

(1 puan 1 puan, 2 puan 2 puan vb. ayarlayın)

Kod:

from itertools import permutations
import numpy as np

points = {'LTN' : 1,
          'RDS' : 2,
          'GBM' : 3,
          'CHP' : 4,
          'FWV' : 5,
          'YKJ' : 6,
          'QXZ' : 7}

def tonum(word):
    word_array = np.zeros(26, dtype=np.int)
    for l in word:
        word_array[ord(l) - ord('A')] += 1
    return word_array.reshape((26, 1))

def to_score_array(letters):
    score_array = np.zeros(26, dtype=np.int) - 2
    for v in 'AEIOU':
        score_array[ord(v) - ord('A')] = 0
    for idx, l in enumerate(letters):
        score_array[ord(l) - ord('A')] = idx + 1
    return np.matrix(score_array.reshape(1, 26))

def find_best_words():
    wlist = [l.strip().upper() for l in open('/usr/share/dict/words') if l[0].lower() == l[0]]
    wlist = [l for l in wlist if len(l) > 4]
    orig = [l for l in wlist]
    for rep in 'AEIOU':
        wlist = [l.replace(rep, '') for l in wlist]
    wlist = np.hstack([tonum(w) for w in wlist])

    best = 0
    ct = 0
    bestwords = ()
    for c1 in ['LTN']:
        for c2 in permutations('RDS'):
            for c3 in permutations('GBM'):
                for c4 in permutations('CHP'):
                    for c5 in permutations('FWV'):
                        for c6 in permutations('YJK'):
                            for c7 in permutations('QZX'):
                                vals = [to_score_array(''.join(s)) for s in zip(c1, c2, c3, c4, c5, c6, c7)]
                                ct += 1
                                print ct, 6**6
                                scores1 = (vals[0] * wlist).A.flatten()
                                scores2 = (vals[1] * wlist).A.flatten()
                                scores3 = (vals[2] * wlist).A.flatten()
                                m1 = max(scores1)
                                m2 = max(scores2)
                                m3 = max(scores3)
                                if m1 + m2 + m3 > best:
                                    print orig[scores1.argmax()], orig[scores2.argmax()], orig[scores3.argmax()], m1 + m2 + m3
                                    best = m1 + m2 + m3
                                    bestwords = (orig[scores1.argmax()], orig[scores2.argmax()], orig[scores3.argmax()])
    return bestwords, best


if __name__ == '__main__':
    import timeit
    print timeit.timeit('print find_best_words()', 'from __main__ import find_best_words', number=1)

Matris sürümü saf python (sözlükler kullanarak ve her kelimeyi bağımsız olarak puanlama) ve diğeri numpy yazdıktan sonra matris çarpma yerine indeksleme kullanarak yazdım sonra geldi.

Bir sonraki optimizasyon ünlüleri puanlamadan tamamen kaldırmak (ve değiştirilmiş bir ord()işlev kullanmak ) olacaktır, ancak daha hızlı yaklaşımlar olup olmadığını merak ediyorum.

EDIT : timeit.timeit kodu eklendi

DÜZENLEME : En çok sevdiğim gelişmeye vereceğim bir ödül ekliyorum (veya muhtemelen birden fazla cevap, ancak durum böyleyse daha fazla itibar kazanmam gerekecek).

fastest-code python optimization

— thouis
kaynak

BTW, annesine karşı oyunu oynadığında ezberlemek için sekiz yaşındaki üç kelimemi vermek için kod yazdım. Şimdi ksilopirografinin ne anlama geldiğini biliyorum.

Bu eğlenceli bir soru. Aşağıdakileri sağlarsanız yanıt alma olasılığını artırabilirsiniz: (1) Herkes aynı veri kümesiyle çalışacak şekilde çevrimiçi bir kelime listesine bağlantı. (2) Çözümünüzü tek bir işleve koyun. (3) Zamanlamaları göstermek için time-it modülünü kullanarak bu işlevi çalıştırın. (4) Disk hızını test etmemek için sözlük verilerinin yüklenmesini işlevin dışına koymayı unutmayın. İnsanlar daha sonra çözümlerini karşılaştırmak için mevcut kodu bir çerçeve olarak kullanabilirler.

Timeit kullanmak için yeniden yazacağım, ancak adil karşılaştırmalar için kendi makinemi kullanmam gerekirdi (ki bu da çözümler gönderen insanlar için mutluyum). Çoğu sistemde bir kelime listesi bulunmalıdır, ancak değilse, burada birkaç tane vardır: wordlist.sourceforge.net

Her kullanıcı çözümünüzü ve diğer yayınlanmış çözümleri kendi makinelerinde kendi çözümlerine göre zamanlarsa adil karşılaştırmalar yapılabilir. Platformlar arasında bazı farklılıklar olacaktır, ancak genel olarak bu yöntem işe yarar.

Hm, bu durumda bunun doğru site olup olmadığını merak ediyorum. Bence SO en uygun olurdu.

— Joey

Yanıtlar:

Keith'in her kelime için mümkün olan en iyi puanı önceden hesaplama fikrini kullanarak, bilgisayarımda yürütme süresini yaklaşık 0.7 saniyeye indirmeyi başardım (75.288 kelimelik bir liste kullanarak).

İşin püf noktası, seçilen tüm harf kombinasyonları yerine çalınacak kelimelerin kombinasyonlarından geçmektir. Birkaç kelime kombinasyonu hariç tüm kelimeleri (kelime listemi kullanarak 203) görmezden gelebiliriz, çünkü daha önce bulduğumuzdan daha yüksek bir puan alamazlar. Uygulama zamanlarının neredeyse tamamı kelime puanlarını hesaplamak için harcanır.

Python 2.7:

import collections
import itertools


WORDS_SOURCE = '../word lists/wordsinf.txt'

WORDS_PER_ROUND = 3
LETTER_GROUP_STRS = ['LTN', 'RDS', 'GBM', 'CHP', 'FWV', 'YKJ', 'QXZ']
LETTER_GROUPS = [list(group) for group in LETTER_GROUP_STRS]
GROUP_POINTS = [(group, i+1) for i, group in enumerate(LETTER_GROUPS)]
POINTS_IF_NOT_CHOSEN = -2


def best_word_score(word):
    """Return the best possible score for a given word."""

    word_score = 0

    # Score the letters that are in groups, chosing the best letter for each
    # group of letters.
    total_not_chosen = 0
    for group, points_if_chosen in GROUP_POINTS:
        letter_counts_sum = 0
        max_letter_count = 0
        for letter in group:
            if letter in word:
                count = word.count(letter)
                letter_counts_sum += count
                if count > max_letter_count:
                    max_letter_count = count
        if letter_counts_sum:
            word_score += points_if_chosen * max_letter_count
            total_not_chosen += letter_counts_sum - max_letter_count
    word_score += POINTS_IF_NOT_CHOSEN * total_not_chosen

    return word_score

def best_total_score(words):
    """Return the best score possible for a given list of words.

    It is fine if the number of words provided is not WORDS_PER_ROUND. Only the
    words provided are scored."""

    num_words = len(words)
    total_score = 0

    # Score the letters that are in groups, chosing the best permutation of
    # letters for each group of letters.
    total_not_chosen = 0
    for group, points_if_chosen in GROUP_POINTS:
        letter_counts = []
        # Structure:  letter_counts[word_index][letter] = count
        letter_counts_sum = 0
        for word in words:
            this_word_letter_counts = {}
            for letter in group:
                count = word.count(letter)
                this_word_letter_counts[letter] = count
                letter_counts_sum += count
            letter_counts.append(this_word_letter_counts)

        max_chosen = None
        for letters in itertools.permutations(group, num_words):
            num_chosen = 0
            for word_index, letter in enumerate(letters):
                num_chosen += letter_counts[word_index][letter]
            if num_chosen > max_chosen:
                max_chosen = num_chosen

        total_score += points_if_chosen * max_chosen
        total_not_chosen += letter_counts_sum - max_chosen
    total_score += POINTS_IF_NOT_CHOSEN * total_not_chosen

    return total_score


def get_words():
    """Return the list of valid words."""
    with open(WORDS_SOURCE, 'r') as source:
        return [line.rstrip().upper() for line in source]

def get_words_by_score():
    """Return a dictionary mapping each score to a list of words.

    The key is the best possible score for each word in the corresponding
    list."""

    words = get_words()
    words_by_score = collections.defaultdict(list)
    for word in words:
        words_by_score[best_word_score(word)].append(word)
    return words_by_score


def get_winning_words():
    """Return a list of words for an optimal play."""

    # A word's position is a tuple of its score's index and the index of the
    # word within the list of words with this score.
    # 
    # word played: A word in the context of a combination of words to be played
    # word chosen: A word in the context of the list it was picked from

    words_by_score = get_words_by_score()
    num_word_scores = len(words_by_score)
    word_scores = sorted(words_by_score, reverse=True)
    words_by_position = []
    # Structure:  words_by_position[score_index][word_index] = word
    num_words_for_scores = []
    for score in word_scores:
        words = words_by_score[score]
        words_by_position.append(words)
        num_words_for_scores.append(len(words))

    # Go through the combinations of words in lexicographic order by word
    # position to find the best combination.
    best_score = None
    positions = [(0, 0)] * WORDS_PER_ROUND
    words = [words_by_position[0][0]] * WORDS_PER_ROUND
    scores_before_words = []
    for i in xrange(WORDS_PER_ROUND):
        scores_before_words.append(best_total_score(words[:i]))
    while True:
        # Keep track of the best possible combination of words so far.
        score = best_total_score(words)
        if score > best_score:
            best_score = score
            best_words = words[:]

        # Go to the next combination of words that could get a new best score.
        for word_played_index in reversed(xrange(WORDS_PER_ROUND)):
            # Go to the next valid word position.
            score_index, word_chosen_index = positions[word_played_index]
            word_chosen_index += 1
            if word_chosen_index == num_words_for_scores[score_index]:
                score_index += 1
                if score_index == num_word_scores:
                    continue
                word_chosen_index = 0

            # Check whether the new combination of words could possibly get a
            # new best score.
            num_words_changed = WORDS_PER_ROUND - word_played_index
            score_before_this_word = scores_before_words[word_played_index]
            further_points_limit = word_scores[score_index] * num_words_changed
            score_limit = score_before_this_word + further_points_limit
            if score_limit <= best_score:
                continue

            # Update to the new combination of words.
            position = score_index, word_chosen_index
            positions[word_played_index:] = [position] * num_words_changed
            word = words_by_position[score_index][word_chosen_index]
            words[word_played_index:] = [word] * num_words_changed
            for i in xrange(word_played_index+1, WORDS_PER_ROUND):
                scores_before_words[i] = best_total_score(words[:i])
            break
        else:
            # None of the remaining combinations of words can get a new best
            # score.
            break

    return best_words


def main():
    winning_words = get_winning_words()
    print winning_words
    print best_total_score(winning_words)

if __name__ == '__main__':
    main()

Bu ['KNICKKNACK', 'RAZZMATAZZ', 'POLYSYLLABLES']95 puanla çözümü döndürür. Keith'in çözümünün kelime listesine eklenen kelimelerle onunla aynı sonucu elde ederim. Thouis "ksilopyrografi" eklendi ile, ben ['XYLOPYROGRAPHY', 'KNICKKNACKS', 'RAZZMATAZZ']105 puan almak .

— flornquake
kaynak

İşte bir fikir - çoğu kelimenin korkunç puanları olduğunu fark ederek birçok kelimeyi kontrol etmekten kaçınabilirsiniz. Sizi 50 puan alan oldukça iyi bir skorlama oyunu bulduğunuzu varsayalım. Daha sonra 50 puandan fazla olan herhangi bir oyunda en az tavan (51/3) = 17 puanlık bir kelime olmalıdır. Dolayısıyla, 17 puan üretemeyen herhangi bir kelime göz ardı edilebilir.

İşte yukarıdakileri yapan bazı kodlar. Sözlükteki her kelime için mümkün olan en iyi puanı hesaplar ve skorla dizine eklenmiş bir dizide saklarız. Daha sonra bu diziyi yalnızca gerekli minimum puanı olan kelimeleri kontrol etmek için kullanırız.

from itertools import permutations
import time

S={'A':0,'E':0,'I':0,'O':0,'U':0,
   'L':1,'T':1,'N':1,
   'R':2,'D':2,'S':2,
   'G':3,'B':3,'M':3,
   'C':4,'H':4,'P':4,
   'F':5,'W':5,'V':5,
   'Y':6,'K':6,'J':6,
   'Q':7,'X':7,'Z':7,
   }

def best_word(min, s):
    global score_to_words
    best_score = 0
    best_word = ''
    for i in xrange(min, 100):
        for w in score_to_words[i]:
            score = (-2*len(w)+2*(w.count('A')+w.count('E')+w.count('I')+w.count('O')+w.count('U')) +
                      3*w.count(s[0])+4*w.count(s[1])+5*w.count(s[2])+6*w.count(s[3])+7*w.count(s[4])+
                      8*w.count(s[5])+9*w.count(s[6]))
            if score > best_score:
                best_score = score
                best_word = w
    return (best_score, best_word)

def load_words():
    global score_to_words
    wlist = [l.strip().upper() for l in open('/usr/share/dict/words') if l[0].lower() == l[0]]
    score_to_words = [[] for i in xrange(100)]
    for w in wlist: score_to_words[sum(S[c] for c in w)].append(w)
    for i in xrange(100):
        if score_to_words[i]: print i, len(score_to_words[i])

def find_best_words():
    load_words()
    best = 0
    bestwords = ()
    for c1 in permutations('LTN'):
        for c2 in permutations('RDS'):
            for c3 in permutations('GBM'):
            print time.ctime(),c1,c2,c3
                for c4 in permutations('CHP'):
                    for c5 in permutations('FWV'):
                        for c6 in permutations('YJK'):
                            for c7 in permutations('QZX'):
                                sets = zip(c1, c2, c3, c4, c5, c6, c7)
                                (s1, w1) = best_word((best + 3) / 3, sets[0])
                                (s2, w2) = best_word((best - s1 + 2) / 2, sets[1])
                                (s3, w3) = best_word(best - s1 - s2 + 1, sets[2])
                                score = s1 + s2 + s3
                                if score > best:
                                    best = score
                                    bestwords = (w1, w2, w3)
                                    print score, w1, w2, w3
    return bestwords, best


if __name__ == '__main__':
    import timeit
    print timeit.timeit('print find_best_words()', 'from __main__ import find_best_words', number=1)

Minimum puan 100'e kadar hızlı bir şekilde yükselir, bu da toplam /usr/share/dict/words33'ün sadece küçük bir kısmı olan 33+ puan kelimesini dikkate almamız gerektiği anlamına gelir (benim 1786 puan 33+ puan =% 0.8 olan 208662 geçerli kelimeye sahibim). Makinemde yaklaşık yarım saat içinde çalışır ve üretir:

(('MAXILLOPREMAXILLARY', 'KNICKKNACKED', 'ZIGZAGWISE'), 101)

— Keith Randall
kaynak

Güzel. Bunu matris çözümüne ekleyeceğim (puanları çok düştükçe kelimeleri kaldırmak), ancak bu, ortaya koyduğum saf python çözümlerinden önemli ölçüde daha iyi.

— thouis

Birçoğunun daha önce döngüler için iç içe olduğunu gördüğümden emin değilim.

— Peter Olson

Fikrinizi matris puanlama (ve mümkün olan en iyi skorda daha sıkı bir üst sınır) ile birleştirmek, makinemdeki zamanı yaklaşık 80 saniyeye düşürür (yaklaşık bir saatten itibaren). kod burada

— thouis

O zamanın iyi bir kısmı, çok daha hızlı yapılabilen mümkün olan en iyi skorların önceden hesaplanmasıdır.

— thouis