How to plot a ROC curve in Python

Question 1

I am trying to plot a ROC curve to evaluate the accuracy of a prediction model I developed in Python using logistic regression packages. I have computed the true positive rate as well as the false positive rate; however, I cannot figure out how to plot these correctly with matplotlib and calculate the AUC value. How can I do that?

Question 2

Here are two ways you can try, assuming your model is an sklearn predictor:

import sklearn.metrics as metrics
# calculate the fpr and tpr for all thresholds of the classification
probs = model.predict_proba(X_test)
preds = probs[:,1]
fpr, tpr, threshold = metrics.roc_curve(y_test, preds)
roc_auc = metrics.auc(fpr, tpr)

# method I: plt
import matplotlib.pyplot as plt
plt.title('Receiver Operating Characteristic')
plt.plot(fpr, tpr, 'b', label = 'AUC = %0.2f' % roc_auc)
plt.legend(loc = 'lower right')
plt.plot([0, 1], [0, 1],'r--')
plt.xlim([0, 1])
plt.ylim([0, 1])
plt.ylabel('True Positive Rate')
plt.xlabel('False Positive Rate')
plt.show()

# method II: ggplot
import pandas as pd  # needed for pd.DataFrame below
from ggplot import *
df = pd.DataFrame(dict(fpr = fpr, tpr = tpr))
ggplot(df, aes(x = 'fpr', y = 'tpr')) + geom_line() + geom_abline(linetype = 'dashed')

or try

ggplot(df, aes(x = 'fpr', ymin = 0, ymax = 'tpr')) + geom_line(aes(y = 'tpr')) + geom_area(alpha = 0.2) + ggtitle("ROC Curve w/ AUC = %s" % str(roc_auc))

Question 3

This is the simplest way to plot a ROC curve, given a set of ground truth labels and predicted probabilities. The best part is that it plots the ROC curve for ALL classes, so you also get multiple neat-looking curves.

import scikitplot as skplt
import matplotlib.pyplot as plt

y_true = ...  # placeholder: ground truth labels
y_probas = ...  # placeholder: predicted probabilities generated by sklearn classifier
skplt.metrics.plot_roc_curve(y_true, y_probas)
plt.show()

A sample curve generated by plot_roc_curve: I used the sample digits dataset from scikit-learn, so there are 10 classes. Notice that one ROC curve is plotted for each class.

Disclaimer: Note that this uses the scikit-plot library, which I built.

Question 4

It is not at all clear what the problem is here, but if you have an array true_positive_rate and an array false_positive_rate, then plotting the ROC curve and getting the AUC is as simple as:

import matplotlib.pyplot as plt
import numpy as np

x = ...  # placeholder: false_positive_rate
y = ...  # placeholder: true_positive_rate

# This is the ROC curve
plt.plot(x,y)
plt.show() 

# This is the AUC
auc = np.trapz(y,x)
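
To make the trapezoidal rule behind np.trapz(y, x) concrete, here is a minimal pure-Python sketch. The helper name trapezoid_auc and the sample points are illustrative assumptions, not part of the answer above, and the points are assumed to be sorted by increasing false positive rate.

```python
def trapezoid_auc(fpr, tpr):
    """Area under a piecewise-linear curve via the trapezoidal rule.

    Hypothetical helper; assumes fpr is sorted in non-decreasing order.
    """
    area = 0.0
    for i in range(1, len(fpr)):
        width = fpr[i] - fpr[i - 1]
        mean_height = (tpr[i] + tpr[i - 1]) / 2.0
        area += width * mean_height
    return area


# Illustrative ROC points, not taken from a real model
fpr = [0.0, 0.0, 0.5, 0.5, 1.0]
tpr = [0.0, 0.5, 0.5, 1.0, 1.0]

auc_value = trapezoid_auc(fpr, tpr)
print("AUC:", auc_value)
```

Vertical segments (where the false positive rate does not change) have zero width and therefore contribute nothing to the area.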

Question 5

ROC curve and AUC for binary classification using matplotlib

from sklearn import svm, datasets
from sklearn import metrics
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.datasets import load_breast_cancer
import matplotlib.pyplot as plt

Load the breast cancer dataset

breast_cancer = load_breast_cancer()

X = breast_cancer.data
y = breast_cancer.target

Split the dataset

X_train, X_test, y_train, y_test = train_test_split(X,y,test_size=0.33, random_state=44)

Model

clf = LogisticRegression(penalty='l2', C=0.1)
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)

Accuracy

print("Accuracy", metrics.accuracy_score(y_test, y_pred))

ROC curve and AUC

y_pred_proba = clf.predict_proba(X_test)[::,1]
fpr, tpr, _ = metrics.roc_curve(y_test,  y_pred_proba)
auc = metrics.roc_auc_score(y_test, y_pred_proba)
plt.plot(fpr,tpr,label="data 1, auc="+str(auc))
plt.legend(loc=4)
plt.show()

Question 6

Here is Python code for computing the ROC curve (as a scatter plot):

import matplotlib.pyplot as plt
import numpy as np

score = np.array([0.9, 0.8, 0.7, 0.6, 0.55, 0.54, 0.53, 0.52, 0.51, 0.505, 0.4, 0.39, 0.38, 0.37, 0.36, 0.35, 0.34, 0.33, 0.30, 0.1])
y = np.array([1,1,0, 1, 1, 1, 0, 0, 1, 0, 1,0, 1, 0, 0, 0, 1 , 0, 1, 0])

# false positive rate
fpr = []
# true positive rate
tpr = []
# Iterate thresholds from 0.0, 0.01, ... 1.0
thresholds = np.arange(0.0, 1.01, .01)

# get number of positive and negative examples in the dataset
P = sum(y)
N = len(y) - P

# iterate through all thresholds and determine fraction of true positives
# and false positives found at this threshold
for thresh in thresholds:
    FP=0
    TP=0
    for i in range(len(score)):
        if (score[i] > thresh):
            if y[i] == 1:
                TP = TP + 1
            if y[i] == 0:
                FP = FP + 1
    fpr.append(FP/float(N))
    tpr.append(TP/float(P))

plt.scatter(fpr, tpr)
plt.show()
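
The nested loops above can also be written with NumPy broadcasting. This is only a sketch: it reuses the same score and y arrays and the same strict > comparison, and the names predicted_positive, is_positive, tp and fp are my own.

```python
import numpy as np

score = np.array([0.9, 0.8, 0.7, 0.6, 0.55, 0.54, 0.53, 0.52, 0.51, 0.505,
                  0.4, 0.39, 0.38, 0.37, 0.36, 0.35, 0.34, 0.33, 0.30, 0.1])
y = np.array([1, 1, 0, 1, 1, 1, 0, 0, 1, 0, 1, 0, 1, 0, 0, 0, 1, 0, 1, 0])

thresholds = np.arange(0.0, 1.01, 0.01)

# Row t holds the predictions made at thresholds[t] (strict >, as in the loop)
predicted_positive = score[np.newaxis, :] > thresholds[:, np.newaxis]

is_positive = (y == 1)
tp = (predicted_positive & is_positive).sum(axis=1)   # true positives per threshold
fp = (predicted_positive & ~is_positive).sum(axis=1)  # false positives per threshold

tpr = tp / is_positive.sum()
fpr = fp / (~is_positive).sum()

print("lowest threshold: ", fpr[0], tpr[0])
print("highest threshold:", fpr[-1], tpr[-1])
```

The resulting fpr and tpr arrays can be passed to plt.scatter(fpr, tpr) exactly as in the loop version.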

Question 7

from sklearn import metrics
import numpy as np
import matplotlib.pyplot as plt

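y_true = ...  # placeholder: true labels
y_probas = ...  # placeholder: predicted results
# pos_label=0 treats label 0 as the positive class, so y_probas must score label 0
fpr, tpr, thresholds = metrics.roc_curve(y_true, y_probas, pos_label=0)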

# Print ROC curve
plt.plot(fpr,tpr)
plt.show() 

# Print AUC
auc = np.trapz(tpr,fpr)
print('AUC:', auc)
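
The pos_label argument decides which label roc_curve treats as the positive class, so the scores passed in have to be scores for that label. The sketch below uses a made-up four-sample dataset (an assumption for illustration) to show that combining scores for label 1 with pos_label=0 mirrors the curve and yields 1 - AUC.

```python
import numpy as np
from sklearn import metrics

# Toy data (illustrative only): scores are probabilities of label 1
y_true = np.array([0, 0, 1, 1])
y_score = np.array([0.1, 0.4, 0.35, 0.8])

# Label 1 as the positive class: matches what the scores measure
fpr1, tpr1, _ = metrics.roc_curve(y_true, y_score, pos_label=1)
auc_pos1 = metrics.auc(fpr1, tpr1)

# Label 0 as the positive class with the same scores: the curve is mirrored
fpr0, tpr0, _ = metrics.roc_curve(y_true, y_score, pos_label=0)
auc_pos0 = metrics.auc(fpr0, tpr0)

print("AUC with pos_label=1:", auc_pos1)
print("AUC with pos_label=0:", auc_pos0)
```

If an AUC comes out well below 0.5 for a model that otherwise looks reasonable, a mismatch between pos_label and the score column is worth checking first.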

Question 8

The previous answers assume that you really calculated TP/Sens yourself. Doing this manually is a bad idea, because it is easy to make mistakes in the calculations; use a library function for all of this instead.

The plot_roc example in scikit-learn does exactly what you need: http://scikit-learn.org/stable/auto_examples/model_selection/plot_roc.html

The essential part of the code is:

  for i in range(n_classes):
      fpr[i], tpr[i], _ = roc_curve(y_test[:, i], y_score[:, i])
      roc_auc[i] = auc(fpr[i], tpr[i])
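
The loop above expects y_test to be a binarized label matrix (one indicator column per class) and y_score to be a matrix of per-class scores. Here is a self-contained sketch of how those inputs might be produced, assuming the iris dataset and a LogisticRegression model purely for illustration (neither is prescribed by the answer):

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import auc, roc_curve
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import label_binarize

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train_labels, y_test_labels = train_test_split(
    X, y, test_size=0.5, random_state=0, stratify=y
)

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train_labels)
y_score = clf.predict_proba(X_test)  # shape (n_samples, n_classes)

# One indicator column per class, so column i marks "is class i"
y_test = label_binarize(y_test_labels, classes=[0, 1, 2])
n_classes = y_test.shape[1]

# One-vs-rest ROC curve and AUC for each class
fpr, tpr, roc_auc = {}, {}, {}
for i in range(n_classes):
    fpr[i], tpr[i], _ = roc_curve(y_test[:, i], y_score[:, i])
    roc_auc[i] = auc(fpr[i], tpr[i])

for i in range(n_classes):
    print("class %d: AUC = %.3f" % (i, roc_auc[i]))
```

Each fpr[i], tpr[i] pair can then be drawn with plt.plot to get one curve per class.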

Question 9

Based on multiple comments from Stack Overflow, the scikit-learn documentation, and some other sources, I made a Python package to plot the ROC curve (and other metrics) in a really simple way.

To install the package: pip install plot-metric (more info at the end of the post)

To plot a ROC curve (example taken from the documentation):

Binary classification

Let's load a simple dataset and make a train and test set:

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
X, y = make_classification(n_samples=1000, n_classes=2, weights=[1,1], random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5, random_state=2)

Train a classifier and predict the test set:

from sklearn.ensemble import RandomForestClassifier
clf = RandomForestClassifier(n_estimators=50, random_state=23)
model = clf.fit(X_train, y_train)

# Use predict_proba to predict probability of the class
y_pred = clf.predict_proba(X_test)[:,1]

You can now use plot_metric to plot the ROC curve:

import matplotlib.pyplot as plt  # needed for plt.figure and plt.show below
from plot_metric.functions import BinaryClassification
# Visualisation with plot_metric
bc = BinaryClassification(y_test, y_pred, labels=["Class 1", "Class 2"])

# Figures
plt.figure(figsize=(5,5))
bc.plot_roc_curve()
plt.show()

Result: (ROC curve figure)

You can find more examples in the package's GitHub repository and documentation:

GitHub: https://github.com/yohann84L/plot_metric
Documentation: https://plot-metric.readthedocs.io/en/latest/

Question 10

You can also follow the official scikit-learn documentation:

https://scikit-learn.org/stable/auto_examples/model_selection/plot_roc.html#sphx-glr-auto-examples-model-selection-plot-roc-py

Question 11

I made a simple function for the ROC curve and included it in a package. I have only just started practicing machine learning, so please let me know if there are any problems with this code!

Have a look at the GitHub readme file for more details! :)

https://github.com/bc123456/ROC

from sklearn.metrics import confusion_matrix, accuracy_score, roc_auc_score, roc_curve
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np

def plot_ROC(y_train_true, y_train_prob, y_test_true, y_test_prob):
    '''
    A function to plot the ROC curve for train labels and test labels.
    Uses the best threshold found on the train set to classify items in the test set.
    '''
    fpr_train, tpr_train, thresholds_train = roc_curve(y_train_true, y_train_prob, pos_label =True)
    sum_sensitivity_specificity_train = tpr_train + (1-fpr_train)
    best_threshold_id_train = np.argmax(sum_sensitivity_specificity_train)
    best_threshold = thresholds_train[best_threshold_id_train]
    best_fpr_train = fpr_train[best_threshold_id_train]
    best_tpr_train = tpr_train[best_threshold_id_train]
    y_train = y_train_prob > best_threshold

    cm_train = confusion_matrix(y_train_true, y_train)
    acc_train = accuracy_score(y_train_true, y_train)
    # AUC is computed from the probabilities so that it matches the plotted curve
    auc_train = roc_auc_score(y_train_true, y_train_prob)

    print('Train Accuracy: %s ' % acc_train)
    print('Train AUC: %s ' % auc_train)
    print('Train Confusion Matrix:')
    print(cm_train)

    fig = plt.figure(figsize=(10,5))
    ax = fig.add_subplot(121)
    curve1 = ax.plot(fpr_train, tpr_train)
    curve2 = ax.plot([0, 1], [0, 1], color='navy', linestyle='--')
    dot = ax.plot(best_fpr_train, best_tpr_train, marker='o', color='black')
    ax.text(best_fpr_train, best_tpr_train, s = '(%.3f,%.3f)' %(best_fpr_train, best_tpr_train))
    plt.xlim([0.0, 1.0])
    plt.ylim([0.0, 1.0])
    plt.xlabel('False Positive Rate')
    plt.ylabel('True Positive Rate')
    plt.title('ROC curve (Train), AUC = %.4f'%auc_train)

    fpr_test, tpr_test, thresholds_test = roc_curve(y_test_true, y_test_prob, pos_label =True)

    y_test = y_test_prob > best_threshold

    cm_test = confusion_matrix(y_test_true, y_test)
    acc_test = accuracy_score(y_test_true, y_test)
    # AUC is computed from the probabilities so that it matches the plotted curve
    auc_test = roc_auc_score(y_test_true, y_test_prob)

    print('Test Accuracy: %s ' % acc_test)
    print('Test AUC: %s ' % auc_test)
    print('Test Confusion Matrix:')
    print(cm_test)

    tpr_score = float(cm_test[1][1])/(cm_test[1][1] + cm_test[1][0])
    fpr_score = float(cm_test[0][1])/(cm_test[0][0]+ cm_test[0][1])

    ax2 = fig.add_subplot(122)
    curve1 = ax2.plot(fpr_test, tpr_test)
    curve2 = ax2.plot([0, 1], [0, 1], color='navy', linestyle='--')
    dot = ax2.plot(fpr_score, tpr_score, marker='o', color='black')
    ax2.text(fpr_score, tpr_score, s = '(%.3f,%.3f)' %(fpr_score, tpr_score))
    plt.xlim([0.0, 1.0])
    plt.ylim([0.0, 1.0])
    plt.xlabel('False Positive Rate')
    plt.ylabel('True Positive Rate')
    plt.title('ROC curve (Test), AUC = %.4f'%auc_test)
    plt.savefig('ROC', dpi = 500)
    plt.show()

    return best_threshold

A sample ROC plot produced by this code
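
The threshold selection inside plot_ROC maximizes sensitivity plus specificity, which is the same as maximizing Youden's J statistic (tpr - fpr). A minimal sketch of just that step follows; the ROC points and thresholds are made-up values for illustration, not output from a real model.

```python
import numpy as np

# Illustrative ROC points and their thresholds (not from a real model)
fpr = np.array([0.0, 0.0, 0.5, 0.5, 1.0])
tpr = np.array([0.0, 0.5, 0.5, 1.0, 1.0])
thresholds = np.array([np.inf, 0.8, 0.4, 0.35, 0.1])

# sensitivity = tpr, specificity = 1 - fpr
sensitivity_plus_specificity = tpr + (1 - fpr)

# np.argmax returns the first index on ties, here the highest tied threshold
best_index = int(np.argmax(sensitivity_plus_specificity))
best_threshold = thresholds[best_index]

print("best threshold:", best_threshold)
print("operating point (fpr, tpr):", (fpr[best_index], tpr[best_index]))
```

When several thresholds tie, as they do here, the choice between them is a design decision: the first maximum favors fewer false positives, while a later one would favor a higher true positive rate.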

Question 12

When the probabilities still have to be computed from the model, the following gets the AUC value and plots everything in one shot.

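from sklearn.metrics import RocCurveDisplay

# plot_roc_curve was removed in scikit-learn 1.2; from_estimator is its replacement
RocCurveDisplay.from_estimator(m, xs, y)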

When you already have the probabilities, the estimator-based helper above cannot be used, because it needs the fitted model. Do the following instead:

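import matplotlib.pyplot as plt
from sklearn.metrics import roc_auc_score, roc_curve

# y_probas: predicted probability of the positive class, e.g. m.oob_decision_function_[:,1]
fpr,tpr,_ = roc_curve(y,y_probas)
plt.plot(fpr,tpr, label='AUC = ' + str(round(roc_auc_score(y,y_probas), 2)))
plt.legend(loc='lower right')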

Question 13

There is a library called metriculous that will do that for you:

$ pip install metriculous

Let's first mock some data; this would usually come from the test dataset and the model(s):

import numpy as np

def normalize(array2d: np.ndarray) -> np.ndarray:
    return array2d / array2d.sum(axis=1, keepdims=True)

class_names = ["Cat", "Dog", "Pig"]
num_classes = len(class_names)
num_samples = 500

# Mock ground truth
ground_truth = np.random.choice(range(num_classes), size=num_samples, p=[0.5, 0.4, 0.1])

# Mock model predictions
perfect_model = np.eye(num_classes)[ground_truth]
noisy_model = normalize(
    perfect_model + 2 * np.random.random((num_samples, num_classes))
)
random_model = normalize(np.random.random((num_samples, num_classes)))

Now we can use metriculous to generate a table with various metrics and diagrams, including ROC curves:

import metriculous

metriculous.compare_classifiers(
    ground_truth=ground_truth,
    model_predictions=[perfect_model, noisy_model, random_model],
    model_names=["Perfect Model", "Noisy Model", "Random Model"],
    class_names=class_names,
    one_vs_all_figures=True, # This line is important to include ROC curves in the output
).save_html("model_comparison.html").display()

The output contains the ROC curves. The plots are zoomable and draggable, and hovering over a plot with the mouse shows further details.