Py Template: Data Science and Analytics Tools - Text Classification with TF-IDF and Naive Bayes

Tuesday, April 1, 2025

Notes:

What problem does it solve?
It helps businesses automatically sort text data (e.g., emails, reviews, or support tickets) into predefined categories by learning from examples that have already been labeled.
How can businesses or users benefit from customizing the code?
Users can define their own categories by relabeling the training data, or swap in more advanced models, to fine-tune the classification for their own use case.
How can businesses or users adopt the solution further, if needed?
It can be integrated with email management systems or customer service platforms so that incoming messages are routed to the right team automatically.
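As a rough illustration of automated routing, the sketch below trains the same TF-IDF + Naive Bayes combination on a few made-up tickets and sends each new message to a queue based on the predicted category. The sample texts, the queue names in `ROUTES`, the `route_message` helper, and the 0.6 confidence threshold are all hypothetical. A real integration would read messages through the email or ticketing platform's own API.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Hypothetical labeled examples standing in for 'text_data.csv'
train_texts = [
    "I was charged twice for my subscription",
    "Please refund my last payment",
    "The invoice amount is wrong",
    "The app crashes when I open settings",
    "I cannot log in after the update",
    "Error message appears on startup",
]
train_labels = ["Billing", "Billing", "Billing", "Technical", "Technical", "Technical"]

# Chain TF-IDF and Naive Bayes so new text is vectorized exactly like the training text
classifier = make_pipeline(TfidfVectorizer(), MultinomialNB())
classifier.fit(train_texts, train_labels)

# Hypothetical mapping from predicted category to a destination queue
ROUTES = {"Billing": "billing-team@example.com", "Technical": "support-tier1@example.com"}
FALLBACK_QUEUE = "manual-review"


def route_message(text, threshold=0.6):
    """Return (category, queue) for one incoming message (hypothetical helper)."""
    probabilities = classifier.predict_proba([text])[0]
    best = probabilities.argmax()
    category = str(classifier.classes_[best])
    # Send low-confidence predictions to a human instead of auto-routing them
    if probabilities[best] < threshold:
        return category, FALLBACK_QUEUE
    return category, ROUTES[category]


for message in ["My payment was charged twice", "The app shows an error on startup"]:
    print(message, "->", route_message(message))
```

The confidence threshold is a design choice: raising it sends more messages to manual review, trading automation for fewer misrouted tickets.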

Actual Python Code:

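import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import classification_report

# Load text data (e.g., emails or customer support tickets).
# The CSV is expected to have a 'Text' column and a 'Category' column.
data = pd.read_csv('text_data.csv')

# Drop rows with missing text or labels; TfidfVectorizer cannot handle NaN values
data = data.dropna(subset=['Text', 'Category'])
texts = data['Text'].astype(str)
y = data['Category']

# Split the raw text into training and testing sets first, so the test set
# stays unseen while the TF-IDF vocabulary and IDF weights are learned
X_train_text, X_test_text, y_train, y_test = train_test_split(
    texts, y, test_size=0.3, random_state=42
)

# Fit TF-IDF on the training text only, then apply the same transform to the test text
vectorizer = TfidfVectorizer()
X_train = vectorizer.fit_transform(X_train_text)
X_test = vectorizer.transform(X_test_text)

# Train a Multinomial Naive Bayes model on the TF-IDF features
model = MultinomialNB()
model.fit(X_train, y_train)

# Evaluate the model on the held-out test set (precision, recall, and F1 per category)
y_pred = model.predict(X_test)
print(classification_report(y_test, y_pred))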
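One common way to fine-tune this kind of classifier in scikit-learn, offered here as an assumption rather than part of the original template, is to wrap the vectorizer and model in a `Pipeline` and search a few hyperparameters with `GridSearchCV`. Because the pipeline refits TF-IDF inside each cross-validation fold, no vocabulary or IDF statistics leak from the validation folds. The inline dataset and the parameter grid are made up for illustration; in practice `texts` and `labels` would come from `data['Text']` and `data['Category']`.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import GridSearchCV
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

# Made-up examples; replace with data['Text'] and data['Category'] from the CSV
texts = [
    "Refund my payment please", "I was billed twice this month",
    "Wrong amount on my invoice", "Cancel my subscription and refund me",
    "Why was my card charged again", "The invoice total looks incorrect",
    "The app crashes on startup", "I cannot log in to my account",
    "Error when uploading a file", "The page will not load",
    "Password reset link is broken", "The app freezes after the update",
]
labels = ["Billing"] * 6 + ["Technical"] * 6

# Named steps let the grid refer to parameters as '<step>__<parameter>'
pipeline = Pipeline([
    ("tfidf", TfidfVectorizer()),
    ("nb", MultinomialNB()),
])

# Hypothetical search space: unigrams vs. unigrams + bigrams, and NB smoothing strength
param_grid = {
    "tfidf__ngram_range": [(1, 1), (1, 2)],
    "nb__alpha": [0.1, 0.5, 1.0],
}

# 3-fold stratified cross-validation, scored by macro-averaged F1
search = GridSearchCV(pipeline, param_grid, cv=3, scoring="f1_macro")
search.fit(texts, labels)

print("Best parameters:", search.best_params_)
print("Cross-validated macro F1:", round(search.best_score_, 3))

# The refit best pipeline accepts raw text directly
print(search.predict(["My card was charged twice"]))
```

To try a more advanced model, the `nb` step could be replaced with another scikit-learn classifier such as `sklearn.linear_model.LogisticRegression`, with the grid keys adjusted to that model's parameters.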
