Email Spam And Malware Filtering

What is Email Spam?

Email spam, also known as junk email, refers to unsolicited email messages, usually sent in bulk to a large list of recipients.

Why Gmail Marks Emails as Spam

Some common reasons an email lands in the spam folder are the use of spam trigger words, promotional subject lines, and too much HTML content in the email.
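The three signals listed above can be turned into rough numeric checks. The following is only a minimal sketch: the trigger-word list, the function name spam_signals, and the rule for a "promotional" subject are assumptions made for illustration, not how Gmail actually scores messages.

```python
import re

# Hypothetical trigger words, chosen only for illustration.
TRIGGER_WORDS = {"free", "winner", "urgent", "prize", "guaranteed"}


def spam_signals(subject: str, body: str) -> dict:
    """Return rough measurements for the three signals named in the text."""
    # Signal 1: how many words in the subject and body are trigger words.
    words = re.findall(r"[a-z']+", f"{subject} {body}".lower())
    trigger_hits = sum(word in TRIGGER_WORDS for word in words)

    # Signal 2 (assumed rule): an all-caps subject or one with "!" looks promotional.
    promotional_subject = subject.isupper() or "!" in subject

    # Signal 3: share of body characters that sit inside HTML tags.
    tag_chars = sum(len(tag) for tag in re.findall(r"<[^>]+>", body))
    html_ratio = tag_chars / len(body) if body else 0.0

    return {
        "trigger_hits": trigger_hits,
        "promotional_subject": promotional_subject,
        "html_ratio": html_ratio,
    }


print(spam_signals("FREE PRIZE!!!", "<b>Claim</b> your free prize now"))
print(spam_signals("Lunch tomorrow", "See you at noon"))
```

Hand-written rules like these are brittle, which is one reason the rest of this post trains a classifier on labeled messages instead.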

In this machine learning spam filtering application, we will develop a spam detector app using the support vector machine (SVM) technique for classification and natural language processing. Using tensor flow...

Support Vector Machine Introduction:

Support Vector Machine (SVM) is one of the most popular supervised learning algorithms. It can be used for both classification and regression problems, but in practice it is used primarily for classification.

The goal of the SVM algorithm is to create the best line or decision boundary that can segregate n-dimensional space into classes, so that we can easily put a new data point in the correct category in the future.

This best decision boundary is called a hyperplane.
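As a minimal sketch of this idea, the snippet below fits a linear SVM on a handful of made-up 2-D points and then places two new points into a class. The data values are invented for illustration; only scikit-learn's SVC class is assumed to be available.

```python
import numpy as np
from sklearn.svm import SVC

# Toy data: two well-separated groups in 2-D (values made up for illustration).
X = np.array([[1.0, 1.0], [1.5, 2.0], [2.0, 1.0],
              [6.0, 6.0], [6.5, 7.0], [7.0, 6.0]])
y = np.array([0, 0, 0, 1, 1, 1])

# Fit a linear decision boundary between the two groups.
clf = SVC(kernel="linear", C=1.0)
clf.fit(X, y)

# New, unseen points are assigned to whichever side of the boundary they fall on.
new_points = np.array([[1.2, 1.5], [6.8, 6.2]])
predictions = clf.predict(new_points)
print(predictions)
```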

Support Vector Machine Hyperplane:

The dimension of the hyperplane depends on the number of features in the dataset. If there are 2 features (as shown in the image), the hyperplane is a straight line.

If there are 3 features, the hyperplane is a 2-dimensional plane.

We always choose the hyperplane with the maximum margin, that is, the largest possible distance between the hyperplane and the nearest data points of each class. These nearest points are called support vectors.
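The margin can be checked numerically on a tiny example. The sketch below is an illustration under assumptions: the four points are made up, and a very large C is used so that scikit-learn's soft-margin SVC behaves approximately like a hard-margin SVM. For a linear SVM with weight vector w, the margin width is 2 / ||w||.

```python
import numpy as np
from sklearn.svm import SVC

# Two classes sitting on the vertical lines x1 = 0 and x1 = 2 (made-up data).
X = np.array([[0.0, 0.0], [0.0, 1.0], [2.0, 0.0], [2.0, 1.0]])
y = np.array([0, 0, 1, 1])

# A very large C approximates a hard margin (no points allowed inside it).
clf = SVC(kernel="linear", C=1e6)
clf.fit(X, y)

w = clf.coef_[0]        # normal vector of the hyperplane
b = clf.intercept_[0]   # offset of the hyperplane
margin_width = 2.0 / np.linalg.norm(w)

print("w =", w, "b =", b)
print("margin width =", margin_width)
print("support vectors:\n", clf.support_vectors_)
```

For this data the separating line lies midway between the two groups, so the reported margin width should be close to the gap of 2 between them.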

Types Of SVM:

SVM can be of two types:

1. Linear SVM: Linear SVM is used for linearly separable data. If a dataset can be divided into two classes by a single straight line, the data is called linearly separable, and the classifier used is called a linear SVM classifier.

2. Non-linear SVM: Non-linear SVM is used for data that is not linearly separable. If a dataset cannot be divided by a straight line, the data is called non-linear, and the classifier used is called a non-linear SVM classifier.
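The difference between the two types shows up clearly on data that no straight line can separate. The sketch below is an illustration, not part of the spam detector: it uses scikit-learn's make_circles to generate two concentric rings and compares a linear kernel with an RBF kernel. The RBF kernel is one common choice for a non-linear SVM and is an assumption here.

```python
from sklearn.datasets import make_circles
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Two concentric rings: not separable by any straight line.
X, y = make_circles(n_samples=400, factor=0.3, noise=0.05, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Linear SVM: restricted to a straight-line boundary.
linear_acc = SVC(kernel="linear").fit(X_train, y_train).score(X_test, y_test)

# Non-linear SVM: the RBF kernel allows a curved boundary.
rbf_acc = SVC(kernel="rbf").fit(X_train, y_train).score(X_test, y_test)

print(f"linear kernel accuracy: {linear_acc:.2f}")
print(f"RBF kernel accuracy:    {rbf_acc:.2f}")
```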

Linear SVM

Formulas For Linear SVM:
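For reference, the standard textbook formulation of a linear SVM can be written as follows. The notation is an assumption and may differ from the figure that accompanied this heading: x_i are the feature vectors, y_i in {-1, +1} are the class labels, w is the normal vector of the hyperplane, b is the offset, and C is the penalty parameter (the same C passed to SVC in the code further down).

```latex
% Separating hyperplane and decision rule
\mathbf{w}^\top \mathbf{x} + b = 0,
\qquad
\hat{y} = \operatorname{sign}\!\left(\mathbf{w}^\top \mathbf{x} + b\right)

% Width of the margin between the two classes
\text{margin} = \frac{2}{\lVert \mathbf{w} \rVert}

% Hard-margin problem (linearly separable data)
\min_{\mathbf{w},\, b} \; \frac{1}{2} \lVert \mathbf{w} \rVert^{2}
\quad \text{subject to} \quad
y_i \left(\mathbf{w}^\top \mathbf{x}_i + b\right) \ge 1,
\quad i = 1, \dots, n

% Soft-margin problem (some violations allowed, penalised by C)
\min_{\mathbf{w},\, b,\, \boldsymbol{\xi}} \;
\frac{1}{2} \lVert \mathbf{w} \rVert^{2} + C \sum_{i=1}^{n} \xi_i
\quad \text{subject to} \quad
y_i \left(\mathbf{w}^\top \mathbf{x}_i + b\right) \ge 1 - \xi_i,
\quad \xi_i \ge 0
```

Maximising the margin 2 / ||w|| is the same as minimising ||w||^2 / 2, which is why the optimisation problem is written in that form.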

Non-Linear SVM:

Solution Of non-linear SVM:
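One common way to handle non-linear data is to add an extra dimension so that the classes become linearly separable in the new space. The sketch below assumes the mapping z = x1^2 + x2^2, a frequent textbook choice that may or may not match the figure originally shown here. The ring-shaped data comes from scikit-learn's make_circles and is used purely for illustration.

```python
import numpy as np
from sklearn.datasets import make_circles
from sklearn.svm import SVC

# Two concentric rings in the (x1, x2) plane.
X, y = make_circles(n_samples=300, factor=0.3, noise=0.03, random_state=1)

# Assumed mapping: add a third feature z = x1^2 + x2^2 (squared distance
# from the origin). The inner ring gets small z, the outer ring large z.
z = (X ** 2).sum(axis=1, keepdims=True)
X_lifted = np.hstack([X, z])

# A linear SVM struggles in 2-D but can separate the lifted 3-D data.
flat_acc = SVC(kernel="linear").fit(X, y).score(X, y)
lifted_acc = SVC(kernel="linear").fit(X_lifted, y).score(X_lifted, y)

print(f"linear SVM on (x1, x2):    {flat_acc:.2f}")
print(f"linear SVM on (x1, x2, z): {lifted_acc:.2f}")
```

Kernel functions achieve the same effect without computing the extra features explicitly.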

Code for implementing the spam mail detection system using the support vector machine:

import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix


# Load dataset
data = pd.read_csv("spam.csv", encoding='latin-1')
data = data[['v1', 'v2']]  # Select relevant columns
data.columns = ['label', 'message']  # Rename columns


# Convert labels to binary values
data['label'] = data['label'].map({'ham': 0, 'spam': 1})
# Split the dataset into features and labels
X = data['message']
y = data['label']


# Convert text data to numerical features using TF-IDF
tfidf = TfidfVectorizer(stop_words='english', max_df=0.7)
X_tfidf = tfidf.fit_transform(X)


# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X_tfidf, y, test_size=0.2, random_state=42)


# Train the SVM model
svm_model = SVC(kernel='linear', C=1.0)
svm_model.fit(X_train, y_train)


# Predict using the test set
y_pred = svm_model.predict(X_test)


# Evaluate the model
accuracy = accuracy_score(y_test, y_pred)
print(f'Accuracy: {accuracy:.2f}')
print('\nClassification Report:\n', classification_report(y_test, y_pred))
print('\nConfusion Matrix:\n', confusion_matrix(y_test, y_pred))

Explanation:

Dataset: The dataset used here is a CSV file named "spam.csv" that contains labeled messages. Make sure you have this file, or use another dataset such as the UCI SMS Spam Collection dataset.
Data Preparation:
- Columns v1 and v2 are renamed to label and message.
- The label is converted to binary: ham as 0 and spam as 1.
TF-IDF Vectorization:
- The text data is converted into numerical features using TfidfVectorizer to represent each message as a weighted word frequency.
Model Training and Evaluation:
- SVM with a linear kernel is used to train the model.
- Evaluation metrics such as accuracy, classification report, and confusion matrix are used to assess the model's performance.
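Once the steps above are understood, one possible variation is to bundle the TF-IDF step and the SVM into a single scikit-learn pipeline, so that raw text can be passed straight to predict and the vectorizer is only ever fitted on the data given to fit. The sketch below is an illustration under assumptions: the eight training messages are made up so the example runs without "spam.csv", and such a tiny sample says nothing about real-world accuracy.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

# Made-up training messages (1 = spam, 0 = ham), for illustration only.
messages = [
    "Win a free prize now, click here",
    "Free entry to win cash, text now",
    "Claim your free cash prize today",
    "Urgent: you won a free holiday, call now",
    "Are we still meeting for lunch tomorrow?",
    "Can you send me the report by Friday?",
    "I'll call you after the meeting",
    "See you at the gym this evening",
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

# TF-IDF features feed directly into a linear SVM.
model = make_pipeline(
    TfidfVectorizer(stop_words="english"),
    SVC(kernel="linear", C=1.0),
)
model.fit(messages, labels)

# Classify new, unseen messages from raw text.
new_messages = ["Free cash prize, click now", "Lunch meeting tomorrow?"]
predictions = model.predict(new_messages)
for text, label in zip(new_messages, predictions):
    print(f"{'spam' if label == 1 else 'ham'}: {text}")
```

With the real dataset, the same pipeline object could be fitted on the training split of the raw messages and evaluated on the test split with the metrics described above.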

Posted by Manoj insights
