
🚀 Learning Machine Learning from Scratch: Spam Classifier ✉️🤖
Learning machine learning doesn’t have to be complicated. An ideal starter project is to build a spam email classifier using the Enron dataset. Along the way, you’ll practice:
- ✅ Text processing (tokenization, stemming, lemmatization)
- 🔢 Turning words into numbers with TF‑IDF
- 📊 Training a simple model such as Naive Bayes or even an SVM
- 📈 Evaluating with accuracy, precision, recall and F1‑score
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report
import glob

# load emails and labels (Enron folders with "spam" in the path hold spam)
emails, labels = [], []
for path in glob.glob("enron/**/*.txt", recursive=True):
    with open(path, errors="ignore") as f:
        emails.append(f.read())
    labels.append("spam" if "spam" in path else "ham")

# vectorize and train
X = TfidfVectorizer(stop_words="english").fit_transform(emails)
X_train, X_test, y_train, y_test = train_test_split(
    X, labels, test_size=0.2, stratify=labels, random_state=42
)
model = MultinomialNB().fit(X_train, y_train)
print(classification_report(y_test, model.predict(X_test)))
📝 Explanation in a Few Words
Imagine each email is a cooking recipe. First we break the recipe into words (tokenization) and convert them into numbers based on how frequent they are (TF‑IDF). Then an algorithm like Naive Bayes learns which words tend to appear in “spam” emails. After training, you just feed a new email to the model and it will tell you whether it’s junk or not. ✅
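As a hand-check of that intuition, here is a toy end-to-end run (a four-line made-up "dataset", not Enron) using the same TfidfVectorizer + MultinomialNB combination, wrapped in a Pipeline so you can feed it raw text:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# tiny invented training set -- illustrative only
emails = [
    "win a free prize, claim your money now",
    "congratulations, you won a free lottery prize",
    "meeting moved to monday, see attached agenda",
    "lunch on friday? let me know if that works",
]
labels = ["spam", "spam", "ham", "ham"]

# the pipeline bundles vectorizer + model, so predict() accepts raw strings
clf = make_pipeline(TfidfVectorizer(stop_words="english"), MultinomialNB())
clf.fit(emails, labels)

print(clf.predict(["free money prize"])[0])        # words seen only in spam -> 'spam'
print(clf.predict(["agenda for the meeting"])[0])  # words seen only in ham  -> 'ham'
```

The pipeline design also prevents a classic beginner bug: fitting the vectorizer on the test set, which leaks vocabulary statistics into evaluation.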
This approach teaches you key NLP and ML concepts in a practical way, and in no time you’ll have your first classifier working! 💡✨
More information at the link 👇
Also published on LinkedIn.
