statistical reasoning and data analysis
statistical primitive
slug: topic-map-statistical-primitive-statistical-reasoning-and-data-analysis
Vocabulary:
- Population: Complete set of all items of interest
- Sample: Subset of population selected for analysis
- Parameter: Numerical characteristic of a population
- Statistic: Numerical characteristic of a sample
- Random variable: Variable whose value is subject to randomness
- Probability distribution: Function describing likelihood of outcomes
- Probability density function (PDF): Function for continuous distributions
- Probability mass function (PMF): Function for discrete distributions
- Cumulative distribution function (CDF): Probability X ≤ x
- Expected value: Long-run average of random variable
- Variance: Measure of spread around the mean
- Standard deviation: Square root of variance
- Covariance: Measure of joint variability between two variables
- Correlation: Standardized measure of linear association
- Independence: Events where the occurrence of one does not change the probability of the other
- Conditional probability: Probability of A given B has occurred
- Bayes' theorem: Method for updating probabilities with new evidence
- Likelihood: Probability of observing data given parameters
- Prior distribution: Initial belief about parameters before seeing data
- Posterior distribution: Updated belief after observing data
- Conjugate prior: Prior that yields posterior in same family
- Central Limit Theorem: Distribution of sample means approaches normal as sample size grows
- Law of Large Numbers: Sample average converges to expected value
- Sampling distribution: Distribution of a statistic across samples
- Standard error: Standard deviation of sampling distribution
- Bias: Systematic deviation from true value
- Unbiased estimator: Estimator whose expected value equals parameter
- Consistency: Estimator converges to true value as n increases
- Efficiency: Estimator with smallest variance among unbiased estimators
- Sufficient statistic: Captures all information about parameter
- Confidence interval: Interval estimate constructed to contain the true parameter with a stated long-run frequency
- Confidence level: Nominal long-run proportion of such intervals that contain the parameter
- Coverage probability: Actual proportion of intervals containing the parameter under repeated sampling
- Hypothesis test: Statistical procedure to evaluate claims
- Null hypothesis: Statement of no effect or no difference
- Alternative hypothesis: Statement of effect or difference
- Test statistic: Value computed from sample data for testing
- p-value: Probability, under the null hypothesis, of data at least as extreme as that observed
- Significance level (alpha): Threshold for rejecting null hypothesis
- Type I error: Rejecting true null hypothesis (false positive)
- Type II error: Failing to reject false null hypothesis (false negative)
- Statistical power: Probability of rejecting false null hypothesis
- Effect size: Magnitude of difference or relationship
- Multiple testing correction: Adjustment for testing many hypotheses
- False discovery rate (FDR): Expected proportion of false positives among rejected hypotheses
- Familywise error rate (FWER): Probability of any false positive
- Bonferroni correction: Conservative multiple testing adjustment
- Permutation test: Nonparametric test using resampling
- Bootstrap: Resampling method for estimating distributions
- Jackknife: Resampling by leaving out one observation
- Cross-validation: Method for assessing model performance
- Overfitting: Model captures noise rather than signal
- Underfitting: Model too simple to capture patterns
- Bias-variance tradeoff: Balance between systematic error and variability
- Degrees of freedom: Number of independent pieces of information
- Residual: Difference between observed and predicted values
- Leverage: Influence of observation on fitted values
- Influential point: Observation with large effect on analysis
- Outlier: Observation far from others in dataset
- Robust statistics: Methods resistant to outliers
- Heteroscedasticity: Non-constant variance of errors
- Homoscedasticity: Constant variance of errors
- Autocorrelation: Correlation of variable with itself over time
- Stationarity: Statistical properties don't change over time
- Seasonality: Patterns that repeat at regular intervals
- Trend: Long-term movement in time series
- Confounding variable: Variable that affects both predictor and outcome
- Mediator: Variable through which effect operates
- Moderator: Variable that affects strength of relationship
- Interaction effect: Combined effect differs from sum of main effects
- Simpson's paradox: Trend within groups reverses or disappears when the groups are combined
- Ecological fallacy: Inferring individual from aggregate relationships
- Regression to the mean: Extreme values tend toward average on retest
- Selection bias: Non-random sample leads to systematic error
- Survivorship bias: Analyzing only surviving cases
- Measurement error: Difference between measured and true value
- Reliability: Consistency of measurement
- Validity: Whether measurement captures intended construct
- Sensitivity: True positive rate
- Specificity: True negative rate
- Precision (PPV): Proportion of positive predictions that are correct
- Recall: Same as sensitivity
- F1 score: Harmonic mean of precision and recall
- ROC curve: Plot of true positive rate against false positive rate across classification thresholds
- AUC: Area under ROC curve
- Likelihood ratio: Ratio of probabilities under two hypotheses
- Information criterion: Measure balancing fit and complexity
- AIC: Akaike Information Criterion
- BIC: Bayesian Information Criterion
- Maximum likelihood estimation (MLE): Finding parameters that maximize likelihood
- Method of moments: Equating sample and population moments
- Least squares: Minimizing sum of squared residuals
- Regularization: Adding penalty to prevent overfitting
- Ridge regression: L2 penalty on coefficients
- Lasso: L1 penalty promoting sparsity
- Elastic net: Combination of L1 and L2 penalties
- Shrinkage: Pulling estimates toward central value
- Empirical Bayes: Using data to estimate prior distribution
- Hierarchical model: Model with multiple levels of variation
- Mixed effects model: Model with fixed and random effects
- Random effects: Effects varying across groups
- Fixed effects: Effects constant across groups
- Marginal effect: Effect of one variable holding others constant
- Counterfactual: What would have happened under different conditions
- Propensity score: Probability of receiving treatment given covariates
- Instrumental variable: Variable that affects the outcome only through the treatment and is unrelated to confounders
- Difference-in-differences: Method comparing changes across groups
- Regression discontinuity: Exploiting threshold in treatment assignment
- Matching: Pairing similar units across treatment groups
- Causal inference: Determining cause-effect relationships
- Directed acyclic graph (DAG): Graph representing causal relationships
- Conditional independence: Independence given another variable
- Collider: Variable caused by two other variables
- Backdoor path: Non-causal path between variables
- Front-door criterion: Identifying causal effects through mediators
- Identifiability: Ability to estimate parameters from data
- Estimand: Quantity we want to estimate
- Estimator: Method or formula for estimation
- Estimate: Numerical result from applying estimator to data
- Score function: Derivative of log-likelihood
- Fisher information: Expected squared score
- Cramér-Rao bound: Lower bound on estimator variance
- Delta method: Approximating distribution of function of estimator
- Wald test: Test based on maximum likelihood estimates
- Likelihood ratio test: Comparing nested models
- Score test: Test based on score function
- Goodness of fit: How well model matches data
- Residual analysis: Examining errors for patterns
- Diagnostic plots: Visual checks of model assumptions
- Q-Q plot: Comparing quantiles to check distributional assumptions
- Leverage plot: Identifying influential observations
- Cook's distance: Measure of observation's influence
- VIF (Variance Inflation Factor): Measure of multicollinearity
- Parsimony: Preference for simpler models
- Occam's razor: Simpler explanations are preferable
- Box-Cox transformation: Family of power transformations
- Logit transformation: Log odds transformation
- Z-score: Standardized value (x - mean) / SD
- Percentile: Value below which a given percentage of the data falls
- Quantile: Generalization of percentile
- Interquartile range (IQR): Difference between 75th and 25th percentiles
- Median absolute deviation (MAD): Robust measure of spread
- Skewness: Measure of asymmetry
- Kurtosis: Measure of tail heaviness
- Moment: Expected value of power of variable
- Moment-generating function: Function encoding all moments
- Characteristic function: Fourier transform of distribution
- Convolution: Operation giving the distribution of a sum of independent variables
- Mixture distribution: Weighted combination of distributions
- Censoring: Observation partially known (e.g., survival past time)
- Truncation: Observations outside range not recorded
- Missing data mechanism: Process generating missingness
- Missing completely at random (MCAR): Missingness unrelated to data
- Missing at random (MAR): Missingness depends on observed data
- Missing not at random (MNAR): Missingness depends on unobserved data
- Imputation: Filling in missing values
- Multiple imputation: Creating multiple completed datasets
- Sensitivity analysis: Examining robustness to assumptions
- Meta-analysis: Statistical synthesis of multiple studies
- Effect heterogeneity: Variation in effects across studies
- Publication bias: Tendency to publish significant results
- Funnel plot: Visual check for publication bias
- Random effects meta-analysis: Modeling between-study variation
- Fixed effects meta-analysis: Assuming common true effect
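Several of the terms above (Bayes' theorem, sensitivity, specificity, precision/PPV) connect in a single calculation. A minimal worked sketch, assuming an illustrative diagnostic test with 95% sensitivity, 90% specificity, and 2% prevalence (all values are assumptions, not from the source):

```python
# Worked Bayes' theorem example: positive predictive value of a diagnostic test.
# Sensitivity, specificity, and prevalence below are illustrative assumptions.
sensitivity = 0.95   # P(test+ | disease)
specificity = 0.90   # P(test- | no disease)
prevalence = 0.02    # P(disease)

# Bayes' theorem: P(disease | test+) = P(test+ | disease) * P(disease) / P(test+)
p_pos = sensitivity * prevalence + (1 - specificity) * (1 - prevalence)
ppv = sensitivity * prevalence / p_pos

print(f"P(test positive)         = {p_pos:.4f}")
print(f"PPV = P(disease | test+) = {ppv:.4f}")   # about 0.162 despite high sensitivity
```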
Concepts:
- Statistical thinking vs deterministic thinking
- Variability as fundamental property of data
- Signal vs noise distinction
- Uncertainty quantification and propagation
- Sampling as basis for inference
- Representative sampling challenges
- Randomization as foundation of inference
- Probability as formalization of uncertainty
- Frequentist interpretation of probability
- Bayesian interpretation of probability
- Subjective vs objective probability
- Aleatory vs epistemic uncertainty
- Law of total probability
- Conditional independence structures
- Exchangeability of observations
- Sufficiency and data reduction
- Ancillary statistics
- Completeness of statistic families
- Information loss in summarization
- Minimal sufficient statistics
- Exponential families of distributions
- Location-scale families
- Transformation of random variables
- Jacobian in transformations
- Order statistics and their distributions
- Asymptotic theory and approximations
- Convergence in probability
- Convergence in distribution
- Almost sure convergence
- Consistency of estimators
- Asymptotic normality
- Delta method for variance approximation
- Slutsky's theorem
- Continuous mapping theorem
- Large sample theory
- Efficiency and relative efficiency
- Cramér-Rao lower bound
- Optimal estimation theory
- Decision theory framework
- Loss functions and risk
- Admissibility of estimators
- Minimax decision rules
- Bayes estimators
- Empirical Bayes methods
- James-Stein estimator and shrinkage
- Stein's paradox
- Hypothesis testing logic
- Neyman-Pearson framework
- Likelihood principle
- Evidential paradigm
- Multiple comparisons problem
- Sequential testing
- Interim analysis considerations
- Stopping rules
- Optional stopping problem
- Pre-registration and registered reports
- Exploratory vs confirmatory analysis
- Data dredging and p-hacking
- HARKing (Hypothesizing After the Results are Known)
- Researcher degrees of freedom
- Replication crisis and reproducibility
- Statistical vs practical significance
- Clinical vs statistical significance
- Equivalence testing
- Non-inferiority testing
- Bayesian hypothesis testing
- Bayes factors
- Prior elicitation
- Prior sensitivity analysis
- Conjugacy and computational convenience
- Noninformative priors
- Jeffreys prior
- Reference priors
- Maximum entropy priors
- Empirical priors from data
- Hierarchical priors
- Markov Chain Monte Carlo (MCMC)
- Gibbs sampling
- Metropolis-Hastings algorithm
- Hamiltonian Monte Carlo
- Variational inference
- Approximate Bayesian computation
- Posterior predictive checking
- Model comparison via marginal likelihood
- Model averaging
- Model selection vs model averaging
- Information criteria philosophy
- Cross-validation strategies
- Leave-one-out cross-validation
- K-fold cross-validation
- Time series cross-validation
- Nested vs non-nested models
- Parsimony principle
- Model complexity penalties
- Regularization philosophy
- Bias-variance decomposition
- Ensemble methods rationale
- Bootstrap aggregating (bagging)
- Random subspace methods
- Out-of-bag error estimation
- Bootstrap confidence intervals
- Percentile bootstrap
- BCa (bias-corrected and accelerated) bootstrap
- Parametric bootstrap
- Permutation tests logic
- Exact tests vs asymptotic tests
- Monte Carlo hypothesis testing
- Resampling-based inference
- Robust statistics philosophy
- Breakdown point
- Influence functions
- M-estimation
- Rank-based methods
- Nonparametric statistics
- Distribution-free methods
- Kernel density estimation
- Bandwidth selection
- Smoothing parameters
- Local polynomial regression
- Splines and basis functions
- Generalized additive models philosophy
- Semiparametric models
- Functional data analysis concepts
- High-dimensional statistics
- Curse of dimensionality
- Dimension reduction strategies
- Feature selection vs extraction
- Sparse estimation
- Variable screening
- False discovery rate control
- Multiple testing frameworks
- Closed testing procedures
- Holm-Bonferroni method
- Benjamini-Hochberg procedure
- q-values
- Local FDR
- Sequential vs simultaneous inference
- Experimental design principles
- Randomization rationale
- Blocking to reduce variation
- Factorial designs
- Fractional factorial designs
- Confounding in designs
- Aliasing of effects
- Resolution of designs
- Optimal design theory
- D-optimality, A-optimality
- Adaptive designs
- Sequential experimental design
- Response surface methodology
- Latin square designs
- Crossover designs
- Split-plot designs
- Repeated measures designs
- Power analysis and sample size
- Minimal detectable effect
- Precision-based sample size
- Adaptive sample size determination
- Interim analyses and alpha spending
- Group sequential designs
- Futility stopping
- Sample size re-estimation
- Observational study design
- Cohort studies
- Case-control studies
- Cross-sectional studies
- Ecological studies
- Natural experiments
- Quasi-experimental designs
- Instrumental variables intuition
- Regression discontinuity intuition
- Difference-in-differences logic
- Synthetic control methods
- Causal graphs and d-separation
- Identifying assumptions
- Ignorability assumption
- Exchangeability in causal inference
- Positivity assumption
- Consistency assumption (SUTVA)
- Potential outcomes framework
- Counterfactual reasoning
- Average treatment effect (ATE)
- Average treatment on treated (ATT)
- Local average treatment effect (LATE)
- Conditional average treatment effect (CATE)
- Heterogeneous treatment effects
- Subgroup analysis
- Interaction effects interpretation
- Effect modification
- Mediation analysis
- Direct vs indirect effects
- Path analysis
- Structural equation modeling concepts
- Latent variable models
- Factor analysis logic
- Measurement models
- Structural models
- Identification in SEMs
- Model fit indices
- Modification indices
- Time series concepts
- Autocorrelation structure
- Partial autocorrelation
- Stationarity and differencing
- Unit root testing
- Cointegration
- ARIMA models
- Seasonal ARIMA
- State space models
- Kalman filtering
- Exponential smoothing
- Holt-Winters method
- Spectral analysis
- Fourier analysis
- Wavelet analysis
- Change point detection
- Structural breaks
- Intervention analysis
- Transfer function models
- Vector autoregression (VAR)
- Granger causality
- Impulse response functions
- Forecast accuracy measures
- Forecast intervals
- Prediction intervals vs confidence intervals
- Forecast combination
- Longitudinal data structure
- Panel data concepts
- Within vs between variation
- Fixed effects logic
- Random effects logic
- Hausman test intuition
- Clustered standard errors
- Robust variance estimation
- Sandwich estimators
- Generalized estimating equations (GEE)
- Working correlation structures
- Missing data challenges
- Listwise deletion consequences
- Multiple imputation philosophy
- Proper vs improper imputation
- Imputation model specification
- Auxiliary variables in imputation
- Pattern mixture models
- Selection models
- Sensitivity to missingness assumptions
- Measurement error effects
- Attenuation bias
- Classical measurement error
- Berkson measurement error
- Errors-in-variables models
- Instrumental variables for measurement error
- Regression calibration
- SIMEX (simulation-extrapolation)
- Validation study designs
- Reliability vs validity
- Construct validity
- Criterion validity
- Content validity
- Test-retest reliability
- Inter-rater reliability
- Internal consistency
- Cohen's kappa
- Intraclass correlation
- Cronbach's alpha
- Item response theory
- Differential item functioning
- Meta-analysis rationale
- Fixed vs random effects in meta-analysis
- Heterogeneity assessment
- I-squared statistic
- Tau-squared
- Meta-regression
- Publication bias assessment
- Trim and fill method
- Egger's test
- P-curve analysis
- Cumulative meta-analysis
- Prospective meta-analysis
- Individual participant data meta-analysis
- Network meta-analysis
- Indirect comparisons
- Transitivity assumption
Procedures:
- Exploratory Data Analysis (EDA):
- Examine data structure and types
- Calculate summary statistics
- Identify data quality issues
- Check for outliers and anomalies
- Visualize distributions
- Explore relationships between variables
- Identify patterns and anomalies
- Document findings and questions
- Formulate hypotheses for testing
- Data cleaning and preparation:
- Handle missing values
- Identify and treat outliers
- Check for data entry errors
- Validate data against expectations
- Transform variables as needed
- Create derived variables
- Encode categorical variables
- Normalize or standardize if needed
- Split data for validation
- Assessing distributional assumptions:
- Create histograms and density plots
- Generate Q-Q plots
- Perform Shapiro-Wilk test
- Conduct Kolmogorov-Smirnov test
- Check Anderson-Darling test
- Examine skewness and kurtosis
- Consider transformations if needed
- Use robust methods if assumptions violated
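A minimal sketch of these checks, assuming a numeric sample `x`; it uses SciPy's Shapiro-Wilk and Kolmogorov-Smirnov tests plus sample skewness and kurtosis (the Anderson-Darling test is available as `scipy.stats.anderson`). The simulated data are illustrative:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
x = rng.gamma(shape=2.0, scale=1.0, size=200)   # illustrative, right-skewed sample

# Shapiro-Wilk test of normality (small to moderate n)
sw_stat, sw_p = stats.shapiro(x)

# Kolmogorov-Smirnov test against a normal with the sample's mean and SD
# (estimating parameters from the same data makes this p-value approximate)
ks_stat, ks_p = stats.kstest(x, "norm", args=(x.mean(), x.std(ddof=1)))

# Sample skewness and excess kurtosis (Fisher definition: normal -> 0)
skew = stats.skew(x)
kurt = stats.kurtosis(x)

print(f"Shapiro-Wilk: W={sw_stat:.3f}, p={sw_p:.4f}")
print(f"KS vs fitted normal: D={ks_stat:.3f}, p={ks_p:.4f}")
print(f"skewness={skew:.2f}, excess kurtosis={kurt:.2f}")
# If normality is clearly violated, consider a log/Box-Cox transform or robust methods.
```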
- Hypothesis test execution:
- State null and alternative hypotheses
- Choose appropriate test
- Check test assumptions
- Set significance level
- Calculate test statistic
- Determine p-value
- Make decision about null hypothesis
- Calculate confidence interval
- Report effect size
- Interpret results in context
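As one concrete instance of the workflow above, a sketch of a two-sample Welch t-test reporting the test statistic, p-value, effect size (Cohen's d), and a confidence interval for the mean difference; the two groups are simulated for illustration:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
a = rng.normal(loc=10.0, scale=2.0, size=40)     # group A (simulated)
b = rng.normal(loc=11.0, scale=2.5, size=45)     # group B (simulated)

alpha = 0.05
# H0: equal means; Welch's t-test does not assume equal variances
t_stat, p_value = stats.ttest_ind(a, b, equal_var=False)

# Cohen's d using the pooled standard deviation
n1, n2 = len(a), len(b)
sp = np.sqrt(((n1 - 1) * a.var(ddof=1) + (n2 - 1) * b.var(ddof=1)) / (n1 + n2 - 2))
cohens_d = (a.mean() - b.mean()) / sp

# Welch confidence interval for the difference in means
se = np.sqrt(a.var(ddof=1) / n1 + b.var(ddof=1) / n2)
df = se**4 / ((a.var(ddof=1) / n1) ** 2 / (n1 - 1) + (b.var(ddof=1) / n2) ** 2 / (n2 - 1))
t_crit = stats.t.ppf(1 - alpha / 2, df)
diff = a.mean() - b.mean()
ci = (diff - t_crit * se, diff + t_crit * se)

print(f"t={t_stat:.3f}, p={p_value:.4f}, d={cohens_d:.2f}, "
      f"95% CI for difference: ({ci[0]:.2f}, {ci[1]:.2f})")
```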
- Sample size calculation:
- Specify hypotheses or precision goal
- Determine significance level (alpha)
- Specify desired power (1-beta)
- Estimate effect size from literature or pilot
- Account for expected attrition
- Consider design complexity (clustering, etc.)
- Calculate required sample size
- Assess feasibility
- Document assumptions
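A minimal sketch of a two-sample sample-size calculation using the standard normal approximation; the effect size, alpha, power, and attrition rate below are illustrative assumptions:

```python
import math
from scipy import stats

alpha = 0.05          # two-sided significance level (assumption)
power = 0.80          # desired power (assumption)
effect_size = 0.5     # standardized difference (Cohen's d) from literature/pilot (assumption)
attrition = 0.10      # expected dropout rate (assumption)

z_alpha = stats.norm.ppf(1 - alpha / 2)
z_beta = stats.norm.ppf(power)

# Per-group n for a two-sample comparison of means (normal approximation)
n_per_group = 2 * ((z_alpha + z_beta) / effect_size) ** 2
n_per_group = math.ceil(n_per_group / (1 - attrition))   # inflate for attrition

print(f"Required per-group sample size: {n_per_group}")
# For d = 0.5, alpha = 0.05, power = 0.80 this is roughly 63 per group before attrition.
```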
- Power analysis:
- Specify sample size
- Define effect size of interest
- Set significance level
- Calculate power
- Create power curves
- Assess sensitivity to assumptions
- Consider minimal detectable effect
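A companion sketch for the power analysis above: power at a fixed per-group n across a range of effect sizes, using the same two-sample normal approximation (the sample size is an illustrative assumption):

```python
import numpy as np
from scipy import stats

n_per_group = 50                      # planned sample size per group (assumption)
alpha = 0.05
effect_sizes = np.linspace(0.1, 1.0, 10)

z_alpha = stats.norm.ppf(1 - alpha / 2)
# Approximate power of a two-sample z-test for each standardized effect size
power = stats.norm.cdf(effect_sizes * np.sqrt(n_per_group / 2) - z_alpha)

for d, p in zip(effect_sizes, power):
    print(f"d={d:.1f}: power={p:.2f}")
# The smallest d with power >= 0.80 is the minimal detectable effect at this n.
```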
- Bootstrap confidence intervals:
- Draw B bootstrap samples with replacement
- Calculate statistic for each sample
- Sort bootstrap statistics
- Extract percentile-based intervals
- Or calculate BCa intervals with bias correction
- Assess interval stability
- Report bootstrap SE and CI
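A minimal sketch of the percentile bootstrap for the mean of a sample; the data and the choice B = 5000 are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.exponential(scale=3.0, size=80)    # illustrative skewed sample

B = 5000
stat = np.mean                             # statistic of interest
boot_stats = np.empty(B)
for b in range(B):
    resample = rng.choice(x, size=len(x), replace=True)   # draw with replacement
    boot_stats[b] = stat(resample)

boot_se = boot_stats.std(ddof=1)
ci_lower, ci_upper = np.percentile(boot_stats, [2.5, 97.5])   # percentile interval

print(f"estimate={stat(x):.3f}, bootstrap SE={boot_se:.3f}, "
      f"95% percentile CI=({ci_lower:.3f}, {ci_upper:.3f})")
# BCa intervals additionally correct for bias and skewness of the bootstrap distribution.
```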
- Permutation test:
- Calculate observed test statistic
- Randomly permute group labels (or data)
- Recalculate test statistic
- Repeat many times (e.g., 10,000)
- Compare observed to permutation distribution
- Calculate p-value as the proportion of permuted statistics at least as extreme as the observed one
- Assess sensitivity to number of permutations
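A sketch of a two-sample permutation test for a difference in means, assuming simulated groups and 10,000 permutations:

```python
import numpy as np

rng = np.random.default_rng(3)
a = rng.normal(0.0, 1.0, size=30)          # group A (simulated)
b = rng.normal(0.5, 1.0, size=30)          # group B (simulated)

observed = a.mean() - b.mean()
pooled = np.concatenate([a, b])
n_a = len(a)

n_perm = 10_000
perm_stats = np.empty(n_perm)
for i in range(n_perm):
    shuffled = rng.permutation(pooled)                 # permute group labels
    perm_stats[i] = shuffled[:n_a].mean() - shuffled[n_a:].mean()

# Two-sided p-value: proportion of permuted statistics at least as extreme as observed
p_value = np.mean(np.abs(perm_stats) >= abs(observed))

print(f"observed difference={observed:.3f}, permutation p-value={p_value:.4f}")
```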
- Cross-validation procedure:
- Partition data into K folds
- For each fold:
- Train model on K-1 folds
- Validate on held-out fold
- Record performance metric
- Average performance across folds
- Calculate standard error of performance
- Select model with best CV performance
- Refit on full dataset if needed
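A NumPy-only sketch of K-fold cross-validation for a least-squares linear model, assuming a simulated design matrix and K = 5:

```python
import numpy as np

rng = np.random.default_rng(4)
n, p, K = 200, 3, 5
X = rng.normal(size=(n, p))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=1.0, size=n)   # simulated data

indices = rng.permutation(n)
folds = np.array_split(indices, K)          # partition into K folds

mse_per_fold = []
for k in range(K):
    test_idx = folds[k]
    train_idx = np.concatenate([folds[j] for j in range(K) if j != k])
    Xtr = np.column_stack([np.ones(len(train_idx)), X[train_idx]])   # add intercept
    Xte = np.column_stack([np.ones(len(test_idx)), X[test_idx]])
    beta, *_ = np.linalg.lstsq(Xtr, y[train_idx], rcond=None)        # train on K-1 folds
    resid = y[test_idx] - Xte @ beta                                 # validate on held-out fold
    mse_per_fold.append(np.mean(resid**2))

mse_per_fold = np.array(mse_per_fold)
print(f"CV MSE = {mse_per_fold.mean():.3f} "
      f"(SE = {mse_per_fold.std(ddof=1) / np.sqrt(K):.3f})")
```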
- Multiple testing correction:
- Identify family of tests
- Choose correction method (Bonferroni, FDR, etc.)
- Calculate adjusted p-values or critical values
- Apply decision rule
- Report both raw and adjusted p-values
- Interpret significant findings
- Consider power loss from correction
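A sketch of Bonferroni and Benjamini-Hochberg adjustments applied to a vector of raw p-values (the p-values below are illustrative):

```python
import numpy as np

p_raw = np.array([0.001, 0.008, 0.012, 0.030, 0.041, 0.090, 0.200, 0.520])  # illustrative
m = len(p_raw)
alpha = 0.05

# Bonferroni: controls the familywise error rate
p_bonf = np.minimum(p_raw * m, 1.0)

# Benjamini-Hochberg: controls the false discovery rate
order = np.argsort(p_raw)
ranked = p_raw[order] * m / np.arange(1, m + 1)
# enforce monotonicity of adjusted p-values from the largest rank downward
p_bh_sorted = np.minimum.accumulate(ranked[::-1])[::-1]
p_bh = np.empty(m)
p_bh[order] = np.minimum(p_bh_sorted, 1.0)

for raw, bonf, bh in zip(p_raw, p_bonf, p_bh):
    print(f"raw={raw:.3f}  bonferroni={bonf:.3f}  BH={bh:.3f}")
# Reject hypotheses whose adjusted p-value falls below alpha.
```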
- Model diagnostics for regression:
- Plot residuals vs fitted values
- Create Q-Q plot of residuals
- Check scale-location plot
- Identify influential points (Cook's D)
- Calculate VIF for multicollinearity
- Perform Durbin-Watson test for autocorrelation
- Test for heteroscedasticity (Breusch-Pagan)
- Consider remedies if assumptions violated
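A NumPy-only sketch of the numerical parts of these diagnostics (leverage, Cook's distance, VIF) for an ordinary least-squares fit; the residual, Q-Q, and scale-location plots are omitted, and the data are simulated:

```python
import numpy as np

rng = np.random.default_rng(5)
n, p = 100, 3
X = rng.normal(size=(n, p))
y = X @ np.array([1.0, 0.5, -1.0]) + rng.normal(size=n)    # simulated data

Xd = np.column_stack([np.ones(n), X])            # design matrix with intercept
beta, *_ = np.linalg.lstsq(Xd, y, rcond=None)
resid = y - Xd @ beta
k = Xd.shape[1]
s2 = resid @ resid / (n - k)                     # residual variance estimate

# Leverage: diagonal of the hat matrix H = X (X'X)^{-1} X'
H = Xd @ np.linalg.inv(Xd.T @ Xd) @ Xd.T
leverage = np.diag(H)

# Cook's distance for each observation
cooks_d = (resid**2 / (k * s2)) * leverage / (1 - leverage) ** 2

# Variance inflation factors: regress each predictor on the others
vif = []
for j in range(p):
    others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
    b, *_ = np.linalg.lstsq(others, X[:, j], rcond=None)
    r2 = 1 - np.sum((X[:, j] - others @ b) ** 2) / np.sum((X[:, j] - X[:, j].mean()) ** 2)
    vif.append(1 / (1 - r2))

print(f"max leverage={leverage.max():.3f}, max Cook's D={cooks_d.max():.3f}")
print("VIF:", np.round(vif, 2))
# Common rules of thumb: inspect points with Cook's D above roughly 4/n and VIF above ~5-10.
```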
- Variable selection:
- Assess univariate associations
- Check for multicollinearity
- Use domain knowledge for inclusion
- Apply stepwise selection (with caution)
- Use penalized regression (lasso/elastic net)
- Cross-validate selection process
- Consider stability of selection
- Report selection process clearly
- Comparing nested models:
- Fit reduced (simpler) model
- Fit full (more complex) model
- Calculate likelihood ratio statistic
- Determine degrees of freedom
- Find p-value from chi-square distribution
- Or use AIC/BIC for comparison
- Consider parsimony principle
- Validate on held-out data
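A sketch of a likelihood ratio test between two nested Gaussian linear models, with AIC and BIC for comparison; the reduced model drops one predictor, and the data are simulated:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(6)
n = 150
x1, x2 = rng.normal(size=n), rng.normal(size=n)
y = 1.0 + 2.0 * x1 + 0.3 * x2 + rng.normal(size=n)          # simulated data

def gaussian_loglik(y, X):
    """Maximized Gaussian log-likelihood of an OLS fit (sigma^2 at its MLE)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / len(y)                          # MLE of the error variance
    ll = -0.5 * len(y) * (np.log(2 * np.pi * sigma2) + 1)
    return ll, X.shape[1] + 1                                # +1 parameter for sigma^2

X_reduced = np.column_stack([np.ones(n), x1])                # fit reduced model
X_full = np.column_stack([np.ones(n), x1, x2])               # fit full model

ll_r, k_r = gaussian_loglik(y, X_reduced)
ll_f, k_f = gaussian_loglik(y, X_full)

lr_stat = 2 * (ll_f - ll_r)
df = k_f - k_r
p_value = stats.chi2.sf(lr_stat, df)                         # chi-square reference distribution

aic = lambda ll, k: 2 * k - 2 * ll
bic = lambda ll, k: k * np.log(n) - 2 * ll
print(f"LR={lr_stat:.2f}, df={df}, p={p_value:.4f}")
print(f"AIC reduced={aic(ll_r, k_r):.1f}, full={aic(ll_f, k_f):.1f}; "
      f"BIC reduced={bic(ll_r, k_r):.1f}, full={bic(ll_f, k_f):.1f}")
```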
- Time series analysis workflow:
- Plot time series
- Check for stationarity (visual and tests)
- Difference if needed
- Examine ACF and PACF
- Identify potential ARIMA orders
- Fit candidate models
- Check residual diagnostics
- Compare models (AIC, BIC)
- Validate with forecast accuracy
- Generate forecasts with intervals
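A NumPy-only sketch of the early steps of this workflow (differencing and the sample ACF) on a simulated nonstationary series; ARIMA fitting and formal unit-root tests are left to a library such as statsmodels:

```python
import numpy as np

rng = np.random.default_rng(7)
n = 300
y = np.cumsum(rng.normal(size=n)) + rng.normal(scale=0.5, size=n)   # random walk plus noise

def sample_acf(x, max_lag):
    """Sample autocorrelation function up to max_lag."""
    x = x - x.mean()
    denom = np.sum(x**2)
    return np.array([np.sum(x[lag:] * x[:len(x) - lag]) / denom
                     for lag in range(max_lag + 1)])

acf_raw = sample_acf(y, 10)
dy = np.diff(y)                      # first difference to remove the stochastic trend
acf_diff = sample_acf(dy, 10)

print("ACF of raw series (slow decay suggests nonstationarity):")
print(np.round(acf_raw, 2))
print("ACF of differenced series:")
print(np.round(acf_diff, 2))
# As a rough guide, |ACF| beyond ~2/sqrt(n) at low lags points to remaining serial structure.
```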
- Propensity score analysis:
- Identify confounders
- Fit propensity score model
- Check covariate balance
- Trim if needed for overlap
- Apply matching, weighting, or stratification
- Check balance after adjustment
- Estimate treatment effect
- Conduct sensitivity analysis
- Report assumptions and limitations
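A sketch of inverse-probability weighting with a logistic propensity model using scikit-learn; the data-generating process is simulated with a true treatment effect of 2.0, and balance checking and sensitivity analysis are left as noted in the steps above:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(8)
n = 2000
X = rng.normal(size=(n, 2))                          # measured confounders
p_treat = 1 / (1 + np.exp(-(0.5 * X[:, 0] - 0.5 * X[:, 1])))
t = rng.binomial(1, p_treat)                         # treatment assignment
y = 2.0 * t + X[:, 0] + X[:, 1] + rng.normal(size=n) # outcome; true effect = 2.0

# Propensity score model
ps = LogisticRegression().fit(X, t).predict_proba(X)[:, 1]
ps = np.clip(ps, 0.01, 0.99)                         # guard against extreme weights (overlap)

# Inverse-probability-weighted ATE estimate (Horvitz-Thompson form)
ate = np.mean(t * y / ps) - np.mean((1 - t) * y / (1 - ps))
print(f"IPW ATE estimate: {ate:.2f} (true value 2.0)")
# In practice, also check covariate balance after weighting and run sensitivity analyses.
```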
- Missing data handling:
- Assess missingness patterns
- Determine missingness mechanism (MCAR/MAR/MNAR)
- Choose handling strategy
- For multiple imputation:
- Specify imputation model
- Include auxiliary variables
- Generate m imputed datasets
- Analyze each dataset
- Pool results using Rubin's rules
- Conduct sensitivity analysis
- Report missingness and approach
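A sketch of the pooling step (Rubin's rules) given per-imputation estimates and variances; the numbers below are illustrative, and the imputation itself would come from a dedicated package (e.g., mice in R or scikit-learn's IterativeImputer):

```python
import numpy as np
from scipy import stats

# Illustrative estimates and squared standard errors from m = 5 imputed datasets
estimates = np.array([1.10, 1.25, 1.05, 1.18, 1.22])
variances = np.array([0.040, 0.045, 0.038, 0.042, 0.041])
m = len(estimates)

q_bar = estimates.mean()                     # pooled point estimate
u_bar = variances.mean()                     # within-imputation variance
b = estimates.var(ddof=1)                    # between-imputation variance
total_var = u_bar + (1 + 1 / m) * b          # Rubin's total variance

# Degrees of freedom and a 95% interval (classic Rubin formula)
r = (1 + 1 / m) * b / u_bar
df = (m - 1) * (1 + 1 / r) ** 2
half_width = stats.t.ppf(0.975, df) * np.sqrt(total_var)

print(f"pooled estimate={q_bar:.3f}, SE={np.sqrt(total_var):.3f}, "
      f"95% CI=({q_bar - half_width:.3f}, {q_bar + half_width:.3f})")
```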
- Meta-analysis procedure:
- Define inclusion criteria
- Search literature systematically
- Extract effect sizes and SEs
- Assess study quality/risk of bias
- Check for heterogeneity
- Fit fixed or random effects model
- Create forest plot
- Assess publication bias (funnel plot, tests)
- Conduct sensitivity analyses
- Report with PRISMA guidelines
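A sketch of the pooling step: inverse-variance fixed-effect and DerSimonian-Laird random-effects estimates with Q, tau-squared, and I-squared; the study effect sizes and standard errors are illustrative:

```python
import numpy as np

# Illustrative study effect sizes (e.g., log odds ratios) and standard errors
effects = np.array([0.30, 0.10, 0.45, 0.25, 0.05])
se = np.array([0.12, 0.15, 0.20, 0.10, 0.18])
k = len(effects)

# Fixed-effect (inverse-variance) pooling
w = 1 / se**2
fe_est = np.sum(w * effects) / np.sum(w)
fe_se = np.sqrt(1 / np.sum(w))

# Heterogeneity: Cochran's Q, DerSimonian-Laird tau^2, and I^2
q = np.sum(w * (effects - fe_est) ** 2)
tau2 = max(0.0, (q - (k - 1)) / (np.sum(w) - np.sum(w**2) / np.sum(w)))
i2 = max(0.0, (q - (k - 1)) / q) if q > 0 else 0.0

# Random-effects pooling with DerSimonian-Laird weights
w_re = 1 / (se**2 + tau2)
re_est = np.sum(w_re * effects) / np.sum(w_re)
re_se = np.sqrt(1 / np.sum(w_re))

print(f"fixed effect: {fe_est:.3f} (SE {fe_se:.3f})")
print(f"Q={q:.2f}, tau^2={tau2:.4f}, I^2={100 * i2:.0f}%")
print(f"random effects: {re_est:.3f} (SE {re_se:.3f})")
```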
- Bayesian analysis workflow:
- Specify likelihood
- Choose prior distributions
- Check prior predictive distributions
- Fit model (MCMC or variational)
- Check convergence diagnostics (R-hat, ESS)
- Examine trace plots
- Check posterior predictive distributions
- Summarize posterior (mean, median, intervals)
- Conduct sensitivity to prior
- Report full posterior, not just point estimates
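Full MCMC workflows are usually run in Stan or PyMC; as a minimal self-contained sketch of the prior-to-posterior logic, a conjugate Beta-Binomial update with an illustrative prior and dataset, plus a simple posterior predictive check:

```python
import numpy as np
from scipy import stats

# Beta(2, 2) prior on a success probability (illustrative, weakly informative)
a_prior, b_prior = 2.0, 2.0

# Observed data: 27 successes in 50 trials (illustrative)
successes, trials = 27, 50

# Conjugacy: Beta prior + Binomial likelihood -> Beta posterior
a_post = a_prior + successes
b_post = b_prior + trials - successes
posterior = stats.beta(a_post, b_post)

print(f"posterior mean = {posterior.mean():.3f}")
print(f"95% credible interval = ({posterior.ppf(0.025):.3f}, {posterior.ppf(0.975):.3f})")

# Posterior predictive check: simulate replicated datasets and compare to the observed count
draws = posterior.rvs(size=5000, random_state=0)
y_rep = np.random.default_rng(0).binomial(trials, draws)
print(f"P(replicated successes >= observed) = {np.mean(y_rep >= successes):.3f}")
```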
- Causal inference with DAGs:
- Draw causal DAG based on domain knowledge
- Identify confounders, mediators, colliders
- Determine minimal adjustment set
- Check if effect is identifiable
- Assess backdoor criterion
- Consider front-door criterion if needed
- Implement adjustment strategy
- Estimate causal effect
- Conduct sensitivity analysis
- Report causal assumptions explicitly
- Designing an experiment:
- Define research question clearly
- Identify primary outcome
- Specify treatment/intervention
- Determine experimental units
- Plan randomization scheme
- Calculate sample size
- Design data collection protocol
- Plan statistical analysis in advance
- Pre-register if appropriate
- Consider pilot study
Topics:
- Fundamentals of probability theory
- Combinatorics and counting methods
- Random variables and distributions
- Discrete probability distributions
- Continuous probability distributions
- Multivariate distributions
- Joint, marginal, and conditional distributions
- Transformations of random variables
- Moment generating functions
- Characteristic functions
- Order statistics
- Sampling distributions
- Central Limit Theorem and applications
- Law of Large Numbers
- Convergence concepts in probability
- Point estimation theory
- Maximum likelihood estimation
- Method of moments
- Bayesian estimation
- Properties of estimators
- Sufficiency and completeness
- Cramér-Rao lower bound
- Asymptotic theory of estimation
- Robust estimation methods
- M-estimation and robust regression
- Interval estimation
- Confidence intervals construction
- Bayesian credible intervals
- Bootstrap confidence intervals
- Prediction intervals
- Tolerance intervals
- Hypothesis testing foundations
- Neyman-Pearson theory
- Likelihood ratio tests
- Wald tests
- Score tests
- Multiple testing procedures
- False discovery rate control
- Sequential testing methods
- Equivalence and non-inferiority testing
- Bayesian hypothesis testing
- Parametric hypothesis tests
- t-tests and variations
- ANOVA (one-way, two-way, repeated measures)
- ANCOVA
- MANOVA
- Chi-square tests
- Tests for proportions
- Tests for variance
- F-tests
- Nonparametric tests
- Sign test
- Wilcoxon signed-rank test
- Mann-Whitney U test
- Kruskal-Wallis test
- Friedman test
- Rank correlation tests
- Kolmogorov-Smirnov test
- Anderson-Darling test
- Shapiro-Wilk test
- Simple linear regression
- Multiple linear regression
- Polynomial regression
- Regression diagnostics
- Residual analysis
- Influential observations
- Multicollinearity
- Variable selection methods
- Ridge regression
- Lasso regression
- Elastic net
- Principal component regression
- Partial least squares regression
- Generalized linear models
- Logistic regression
- Poisson regression
- Negative binomial regression
- Probit regression
- Ordinal regression
- Multinomial regression
- Zero-inflated models
- Hurdle models
- Survival analysis
- Kaplan-Meier estimation
- Cox proportional hazards
- Parametric survival models
- Competing risks
- Time-varying covariates
- Frailty models
- Longitudinal data analysis
- Mixed effects models
- Random intercept and slope models
- Growth curve modeling
- Generalized estimating equations
- Time series analysis
- ARIMA models
- Seasonal decomposition
- State space models
- GARCH models
- Vector autoregression
- Cointegration analysis
- Forecasting methods
- Exponential smoothing
- Structural time series models
- Multivariate analysis
- Principal component analysis
- Factor analysis
- Discriminant analysis
- Canonical correlation
- Multidimensional scaling
- Correspondence analysis
- Cluster analysis
- Hierarchical clustering
- K-means clustering
- Model-based clustering
- Density-based clustering
- Dimensionality reduction techniques
- Causal inference methods
- Potential outcomes framework
- Propensity score methods
- Instrumental variables
- Regression discontinuity designs
- Difference-in-differences
- Synthetic controls
- Mediation analysis
- Directed acyclic graphs
- Structural equation modeling
- Path analysis
- Measurement models
- Latent variable models
- Experimental design
- Randomized controlled trials
- Factorial designs
- Fractional factorial designs
- Response surface methodology
- Optimal designs
- Sequential designs
- Adaptive designs
- Crossover designs
- Latin squares
- Observational study designs
- Cohort studies
- Case-control studies
- Cross-sectional studies
- Survey sampling methods
- Simple random sampling
- Stratified sampling
- Cluster sampling
- Multistage sampling
- Sampling weights
- Survey variance estimation
- Nonresponse adjustment
- Resampling methods
- Bootstrap methods
- Jackknife methods
- Permutation tests
- Cross-validation techniques
- Monte Carlo methods
- Simulation studies
- Bayesian computational methods
- Markov Chain Monte Carlo
- Gibbs sampling
- Metropolis-Hastings
- Hamiltonian Monte Carlo
- Variational inference
- Approximate Bayesian computation
- Prior elicitation
- Posterior predictive checking
- Model selection and averaging
- Information criteria (AIC, BIC)
- Cross-validation for model selection
- Bayesian model selection
- Model averaging strategies
- Missing data methods
- Multiple imputation
- Maximum likelihood with missing data
- Inverse probability weighting
- Pattern mixture models
- Selection models
- Measurement error models
- Errors-in-variables regression
- Regression calibration
- SIMEX
- Latent class models
- Reliability and validity assessment
- Classical test theory
- Item response theory
- Meta-analysis
- Fixed effects meta-analysis
- Random effects meta-analysis
- Meta-regression
- Publication bias assessment
- Network meta-analysis
- Individual participant data meta-analysis
- Statistical learning theory
- Bias-variance tradeoff
- Regularization methods
- Ensemble methods
- Model interpretation and explanation
- High-dimensional statistics
- Sparse estimation
- Variable screening
- False discovery rate
- Functional data analysis
- Spatial statistics
- Geostatistics
- Kriging
- Spatial point processes
- Spatial regression models
- Nonparametric statistics
- Kernel methods
- Local regression
- Smoothing splines
- Generalized additive models
- Quantile regression
- Robust statistics theory
- Influence functions
- Breakdown points
- Statistical quality control
- Control charts
- Process capability analysis
- Acceptance sampling
- Reliability theory
- Extreme value theory
- Statistical graphics and visualization
- Reproducible research practices
- Statistical computing
- Numerical optimization
- Matrix computations
- Statistical software packages
Categories:
- Probability Theory
- Mathematical Statistics
- Inferential Statistics
- Descriptive Statistics
- Parametric Methods
- Nonparametric Methods
- Regression Analysis
- Time Series Analysis
- Multivariate Statistics
- Bayesian Statistics
- Frequentist Statistics
- Computational Statistics
- Resampling Methods
- Experimental Design
- Survey Methodology
- Causal Inference
- Survival Analysis
- Longitudinal Data Analysis
- Spatial Statistics
- High-Dimensional Statistics
- Robust Statistics
- Missing Data Analysis
- Measurement Theory
- Meta-Analysis
- Statistical Learning
- Stochastic Processes
- Extreme Value Statistics
- Quality Control Statistics
- Biostatistics
- Econometrics
- Psychometrics
- Statistical Computing
Themes:
- Uncertainty quantification and management
- Inference from samples to populations
- Balancing model complexity and interpretability
- Assumptions and their verification
- Robustness to departures from assumptions
- Signal detection in noisy data
- Multiple perspectives on probability and inference
- Trade-offs between different statistical properties
- Importance of study design for valid inference
- Causation vs correlation distinction
- Replication and reproducibility
- Transparency in statistical practice
- Context-dependent choice of methods
- Integration of domain knowledge and data
- Ethics in data analysis and reporting
- Communication of uncertainty
- Computational advances enabling new methods
- Bridging classical and modern approaches
- Handling real-world data complexities
- Adaptation to non-standard data structures
- Unification of statistical frameworks
- Model criticism and validation
- Sensitivity to modeling choices
- Multiplicity and its consequences
- Pre-specification vs exploratory analysis
- Theory-driven vs data-driven approaches
Trends:
- Increased focus on causal inference methods
- Bayesian methods becoming more accessible
- Machine learning integration with statistics
- Emphasis on prediction vs explanation
- High-dimensional and sparse methods
- Advances in computational Bayesian methods
- Reproducibility and replication emphasis
- Pre-registration of analyses
- Open data and open science movement
- Registered reports in journals
- Transparency and robustness checks
- Sensitivity analysis as standard practice
- Multiverse analysis
- Specification curve analysis
- Quantifying researcher degrees of freedom
- Model-agnostic interpretation methods
- Conformal prediction
- Distribution-free inference
- Robust inference without strong assumptions
- Adaptive and sequential designs
- Platform trials
- Master protocols
- Real-world evidence methods
- Integration of multiple data sources
- Privacy-preserving statistical methods
- Differential privacy
- Federated learning
- Statistical methods for algorithmic fairness
- Causal ML and double machine learning
- Targeted learning
- Reinforcement learning for treatment optimization
- Network and graphical models expansion
- Topological data analysis
- Functional data analysis growth
- Distributional regression
- Expectile regression beyond quantiles
- AI-assisted statistical analysis
- Automated model selection and tuning
- Interpretable ML vs black-box trade-offs
- Uncertainty quantification in ML
- Probabilistic programming languages
- Stan, PyMC, TensorFlow Probability
- Cloud-based statistical computing
- Big data statistical methods
- Streaming data analysis
- Online learning and updating
- Spatial and temporal big data
- Integration of structured and unstructured data
- Text as data methods
- Image and video data statistical analysis
- Wearable device data analysis
- Electronic health records analysis
- Environmental and climate statistics
- Statistical methods for complex surveys at scale
- Small area estimation advances
- Synthetic data generation
- Data fusion techniques
Use_cases:
- Clinical trial design and analysis
- Drug efficacy and safety evaluation
- Biomarker discovery and validation
- Genome-wide association studies
- Differential gene expression analysis
- Proteomics and metabolomics data analysis
- Epidemiological outbreak investigation
- Disease surveillance systems
- Risk factor identification
- Diagnostic test evaluation
- Survival and time-to-event analysis
- Meta-analysis of medical literature
- Health policy evaluation
- Quality of care assessment
- A/B testing and experimentation
- Customer segmentation and profiling
- Churn prediction and prevention
- Market mix modeling
- Pricing optimization
- Demand forecasting
- Inventory optimization
- Supply chain analytics
- Credit risk modeling
- Fraud detection
- Algorithmic trading strategies
- Portfolio optimization
- Risk management and VaR estimation
- Econometric modeling
- Policy impact evaluation
- Labor market analysis
- Housing market analysis
- Inflation and growth forecasting
- Survey data analysis
- Election forecasting
- Public opinion polling
- Census data analysis
- Social program evaluation
- Education intervention effectiveness
- Test score analysis and equating
- Learning analytics
- Environmental impact assessment
- Climate change modeling
- Species distribution modeling
- Pollution monitoring and prediction
- Water quality analysis
- Agricultural field trials
- Crop yield prediction
- Weather forecasting
- Reliability analysis for engineering systems
- Quality control in manufacturing
- Process optimization
- Accelerated life testing
- Fatigue analysis
- Six Sigma projects
- Psychometric test development
- Personality assessment validation
- Neuroimaging data analysis
- Behavioral experiment analysis
- Sports analytics and performance evaluation
- Player evaluation and scouting
- Game strategy optimization
- Sensor data analysis for IoT
- Predictive maintenance
- Network traffic analysis
- Cybersecurity threat detection
- Natural language processing applications
- Search engine ranking
- Recommendation systems
- User behavior modeling
- Content optimization
- Social network analysis
- Information diffusion studies
- Archaeological dating and analysis
- Historical data reconstruction
- Legal evidence evaluation
- Forensic statistics
- Insurance pricing and reserves
- Actuarial modeling
- Astronomy and cosmology data analysis
- Particle physics experiments
- Geospatial analysis
- Transportation planning
- Real estate valuation
- Energy consumption modeling