DAZL Documentation | Data Analytics A-to-Z Processing Language



A practical quick-reference table for training ML models, showing which parameters to adjust, for which model types, and what effect they typically have.


trainModel Parameter Tuning Cheat Sheet

| Parameter | Model Type | Purpose / Effect | Guidance / Tuning Tips |
| --- | --- | --- | --- |
| learning_rate | Linear, Logistic | Controls the step size during gradient descent | Lower → more stable but slower; higher → faster convergence but may overshoot |
| max_iterations | Linear, Logistic | Maximum training iterations for gradient descent | Increase if the model hasn't converged; decrease if convergence is fast |
| normalize | Linear, Logistic, k-NN | Scales numeric features | Keep true for k-NN and gradient-based models; ensures balanced feature contributions |
| k | k-NN | Number of neighbors | Smaller → sensitive to noise (overfitting); larger → smoother predictions (underfitting) |
| distance_metric | k-NN | How distances are calculated | euclidean (default) or manhattan; affects which neighbors are selected |
| categorical | All | Columns treated as categorical | 'auto' usually works; specify columns manually if automatic detection fails |
| missing_values | All | Handling of missing data | 'error' → fail on missing values; 'ignore' → skip affected rows; 'impute' → fill missing values |
| test_size | All | Fraction of data reserved for testing | Use 0.1–0.3; smaller datasets may require less, larger datasets more |
| random_state | All | Seed for reproducibility | Pick any integer; keeps train/test splits and model initialization consistent |
| params | Model-specific | Hyperparameters for optimization | e.g., linear regression → learning_rate, max_iterations; tune incrementally for best performance |
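
Since this cheat sheet describes behavior rather than DAZL call syntax, the sketches on this page approximate each idea in plain Python. As a first illustration, the missing_values and categorical options behave roughly like the following pandas operations (the file name data.csv is a hypothetical stand-in for your input):

    import pandas as pd

    df = pd.read_csv("data.csv")  # hypothetical input file

    # missing_values='impute' -> fill numeric gaps (median is one common choice)
    imputed = df.fillna(df.median(numeric_only=True))

    # missing_values='ignore' -> skip rows that contain missing values
    ignored = df.dropna()

    # categorical='auto' analog: rely on inferred dtypes, one-hot encode text columns
    encoded = pd.get_dummies(imputed)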

Suggested Workflow for Tuning

  1. Baseline Run

    • Train with default parameters
    • Evaluate test-set metrics (MAE, MSE, R² for regression; accuracy/F1 for classification); the first sketch after this list shows one way to compute them
  2. Adjust Parameters Incrementally

    • learning_rate and max_iterations for gradient-based models
    • k and distance_metric for k-NN (see the second sketch after this list)
    • Missing-value handling or normalization if performance is poor
  3. Evaluate Effects

    • Compare metrics on the training set vs. the test set (printed side by side in the first sketch below)
    • Detect overfitting (high train score, low test score) or underfitting (low scores on both)
  4. Feature Engineering

    • Add or remove features to improve predictive power
    • Transform skewed distributions if needed (see the third sketch below)
  5. Document Each Run

    • Record parameter settings and resulting metrics (the fourth sketch below shows a lightweight log)
    • This helps identify the combination that generalizes best
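
Sketch 1 (steps 1 and 3): a baseline run followed by a train/test comparison. This is a Python/scikit-learn analogy for what trainModel does internally, not DAZL code; X and y are assumed to be an already-loaded feature matrix and target, and SGDRegressor's eta0/max_iter stand in for learning_rate/max_iterations.

    from sklearn.model_selection import train_test_split
    from sklearn.preprocessing import StandardScaler
    from sklearn.linear_model import SGDRegressor
    from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

    # test_size / random_state play the same roles as the parameters above
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=42)

    # normalize=true analog: scale features so gradient descent is well behaved
    scaler = StandardScaler().fit(X_train)
    X_train, X_test = scaler.transform(X_train), scaler.transform(X_test)

    # learning_rate / max_iterations map to eta0 / max_iter in this model
    model = SGDRegressor(learning_rate="constant", eta0=0.01, max_iter=1000,
                         random_state=42).fit(X_train, y_train)

    for split, Xs, ys in (("train", X_train, y_train), ("test", X_test, y_test)):
        pred = model.predict(Xs)
        print(split, "MAE", mean_absolute_error(ys, pred),
              "MSE", mean_squared_error(ys, pred),
              "R2", r2_score(ys, pred))
    # A high train R2 with a much lower test R2 suggests overfitting;
    # low R2 on both splits suggests underfitting.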
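
Sketch 2 (step 2): adjusting one parameter at a time for a k-NN model, reusing the split from Sketch 1; scikit-learn's n_neighbors and metric correspond to k and distance_metric above.

    from sklearn.neighbors import KNeighborsRegressor
    from sklearn.metrics import mean_absolute_error

    # Change one parameter at a time and watch the held-out metric
    for k in (3, 5, 7, 11):
        for metric in ("euclidean", "manhattan"):
            knn = KNeighborsRegressor(n_neighbors=k, metric=metric)
            knn.fit(X_train, y_train)
            mae = mean_absolute_error(y_test, knn.predict(X_test))
            print(f"k={k:2d}  metric={metric:9s}  test MAE={mae:.3f}")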
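
Sketch 3 (step 4): one common transform for a skewed distribution; the income column here is purely hypothetical.

    import numpy as np
    import pandas as pd

    df = pd.DataFrame({"income": [20_000, 35_000, 50_000, 1_200_000]})
    # log1p compresses the long right tail while keeping order intact
    df["income_log"] = np.log1p(df["income"])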
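
Sketch 4 (step 5): a lightweight way to document runs; the parameter settings, metric values, and file name below are placeholders.

    import csv

    runs = []

    def record_run(params, metrics):
        runs.append({**params, **metrics})

    # Call record_run after every training run (placeholder values shown)
    record_run({"learning_rate": 0.01, "max_iterations": 1000},
               {"train_mae": 2.1, "test_mae": 2.4})

    with open("tuning_log.csv", "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=runs[0].keys())
        writer.writeheader()
        writer.writerows(runs)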