| Title: | Robust Pipeline for 'VALD' 'ForceDecks' Data Extraction and Analysis |
|---|---|
| Description: | Provides a robust and reproducible pipeline for extracting, cleaning, and analyzing athlete performance data generated by 'VALD' 'ForceDecks' systems. The package supports batch-oriented data processing for large datasets, standardized data transformation workflows, and visualization utilities for sports science research and performance monitoring. It is designed to facilitate reproducible analysis across multiple sports with comprehensive documentation and error handling. |
| Authors: | Praveen D Chougale [aut, cre], Usha Ananthakumar [aut] |
| Maintainer: | Praveen D Chougale <[email protected]> |
| License: | MIT + file LICENSE |
| Version: | 0.1.1 |
| Built: | 2026-05-23 06:45:10 UTC |
| Source: | https://github.com/praveenmaths89/vald.extractor |

Classify Sports from Group Names

Applies regex-based pattern matching to standardize inconsistent sport and team names into a clean categorical variable. This matters most for multi-sport organizations where names for the same sport vary (e.g., "Football", "Soccer", and "FSI" all map to "Football").

    classify_sports(data, group_col = "all_group_names", output_col = "sports_clean")

| Argument | Description |
|---|---|
| data | Data frame containing athlete metadata. |
| group_col | Character. Name of the column containing group/team names. Default is "all_group_names". |
| output_col | Character. Name for the new standardized sports column. Default is "sports_clean". |

Data frame with an additional column containing standardized sports categories.

    if (FALSE) {
      metadata <- standardize_vald_metadata(profiles, groups)
      metadata <- classify_sports(metadata)
      table(metadata$sports_clean)
    }
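
The regex-based mapping described for classify_sports() can be illustrated with a minimal base R sketch. The pattern table, the "Unclassified" fallback label, and the helper name `classify_sports_sketch` are assumptions made for illustration; the package's actual rules may differ.

```r
# Hypothetical pattern table: sport names and regexes are illustrative
# assumptions, not the package's actual rules.
sport_patterns <- c(
  Football   = "football|soccer|\\bfsi\\b",
  Basketball = "basketball",
  Swimming   = "swim"
)

classify_sports_sketch <- function(data,
                                   group_col = "all_group_names",
                                   output_col = "sports_clean") {
  groups <- data[[group_col]]
  result <- rep("Unclassified", length(groups))
  # Apply patterns in reverse order so that earlier entries take precedence
  # when a group name matches more than one pattern.
  for (sport in rev(names(sport_patterns))) {
    hit <- grepl(sport_patterns[[sport]], groups, ignore.case = TRUE, perl = TRUE)
    result[hit] <- sport
  }
  data[[output_col]] <- result
  data
}

demo <- data.frame(
  all_group_names = c("FSI U18", "Soccer Seniors", "Swim Squad", "Staff"),
  stringsAsFactors = FALSE
)
demo <- classify_sports_sketch(demo)
print(demo)
```

Keeping the patterns in one named vector makes the precedence explicit and lets an organization add its own naming variants without touching the matching loop.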

Fetch VALD ForceDecks Data in Batches

Implements chunked trial extraction from the VALD ForceDecks API with fault-tolerant error handling. Processing data in manageable chunks prevents timeout errors and memory issues when working with large datasets.

    fetch_vald_batch(start_date, chunk_size = 100, verbose = TRUE)

| Argument | Description |
|---|---|
| start_date | Character string in ISO 8601 format (e.g., "2020-01-01T00:00:00Z"). The starting date for data extraction. |
| chunk_size | Integer. Number of tests to process per batch. Default is 100. Reduce this value if you experience timeout errors. |
| verbose | Logical. If TRUE, prints progress messages. Default is TRUE. |

This function first retrieves all test metadata, then iterates through the tests in chunks to fetch the associated trial data. Each chunk is wrapped in a tryCatch block so that an error in one chunk does not halt the entire extraction.

The chunking strategy is essential for large organizations with thousands of tests, as it prevents API timeout errors and reduces memory pressure.

A list containing two data frames:

- tests: Data frame of all test metadata
- trials: Data frame of all trial (individual repetition) data

    if (FALSE) {
      # Set VALD credentials first
      valdr::set_credentials(
        client_id = "your_client_id",
        client_secret = "your_client_secret",
        tenant_id = "your_tenant_id",
        region = "aue"
      )

      # Fetch data from 2020 onwards in chunks of 100
      vald_data <- fetch_vald_batch(
        start_date = "2020-01-01T00:00:00Z",
        chunk_size = 100
      )

      # Access tests and trials
      tests_df <- vald_data$tests
      trials_df <- vald_data$trials
    }
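
The chunk-and-tryCatch strategy described for fetch_vald_batch() can be sketched in base R without touching the API. Here `process_in_chunks` and `fake_fetch` are hypothetical names; `fake_fetch` stands in for the real per-chunk request and deliberately fails on one chunk to show that the remaining chunks are still collected.

```r
# Hypothetical helper: splits ids into chunks and isolates failures per chunk.
process_in_chunks <- function(ids, chunk_size = 100, fetch_chunk, verbose = TRUE) {
  chunks <- split(ids, ceiling(seq_along(ids) / chunk_size))
  results <- lapply(seq_along(chunks), function(i) {
    tryCatch(
      fetch_chunk(chunks[[i]]),
      error = function(e) {
        if (verbose) message("Chunk ", i, " failed: ", conditionMessage(e))
        NULL  # a failed chunk contributes nothing instead of stopping the loop
      }
    )
  })
  # Drop failed chunks, then bind the successful ones into one data frame.
  results <- Filter(Negate(is.null), results)
  do.call(rbind, results)
}

# Stand-in for an API call (assumption): fails whenever test id 7 is requested.
fake_fetch <- function(chunk) {
  if (7 %in% chunk) stop("simulated timeout")
  data.frame(testId = chunk, value = chunk * 10)
}

trials <- process_in_chunks(1:10, chunk_size = 3,
                            fetch_chunk = fake_fetch, verbose = FALSE)
print(trials)
```

With a chunk size of 3, the chunk holding ids 7 to 9 fails as a unit, which illustrates why a smaller chunk_size limits how much data a single error can cost.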

Fetch VALD Metadata via OAuth2

Authenticates with the VALD API using the OAuth2 client credentials flow and retrieves complete athlete profile and group membership data. This function handles token management, pagination, and robust JSON parsing.

    fetch_vald_metadata(
      client_id,
      client_secret,
      tenant_id,
      region = "aue",
      verbose = TRUE
    )

| Argument | Description |
|---|---|
| client_id | Character. Your VALD API client ID. |
| client_secret | Character. Your VALD API client secret. |
| tenant_id | Character. Your VALD tenant ID. |
| region | Character. VALD region code (e.g., "aue" for Australia East). Default is "aue". |
| verbose | Logical. If TRUE, prints progress messages. Default is TRUE. |

A list containing two data frames:

- profiles: Complete athlete profile data
- groups: Group/team membership data

    if (FALSE) {
      metadata <- fetch_vald_metadata(
        client_id = "your_client_id",
        client_secret = "your_client_secret",
        tenant_id = "your_tenant_id"
      )
      profiles <- metadata$profiles
      groups <- metadata$groups
    }

Patch Missing Metadata from External File

Allows users to provide an external Excel or CSV file containing corrected demographic information (e.g., sex, date of birth) for athletes with missing or incorrect data in the VALD system. This function merges the corrections and updates the master metadata.

    patch_metadata(
      data,
      patch_file,
      patch_sheet = 1,
      id_col = "profileId",
      fields_to_patch = c("sex", "dateOfBirth"),
      verbose = TRUE
    )

| Argument | Description |
|---|---|
| data | Data frame. Master metadata or analysis dataset. |
| patch_file | Character. Path to Excel (.xlsx) or CSV (.csv) file containing corrections. |
| patch_sheet | Character or integer. For Excel files, which sheet to read. Default is 1 (first sheet). |
| id_col | Character. Name of the ID column in both |
| fields_to_patch | Character vector. Column names to update from the patch file. Default is c("sex", "dateOfBirth"). |
| verbose | Logical. If TRUE, prints progress messages. Default is TRUE. |

Data frame with patched metadata.

    if (FALSE) {
      # Create an Excel file with columns: profileId, sex, dateOfBirth
      # Then patch the metadata
      patched_data <- patch_metadata(
        data = athlete_metadata,
        patch_file = "corrections.xlsx",
        fields_to_patch = c("sex", "dateOfBirth")
      )

      # Check results
      table(patched_data$sex)
    }
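
The merge step behind patch_metadata() can be sketched in base R once the corrections are already in a data frame (reading the Excel or CSV file is left out here). The helper name `patch_sketch` is hypothetical, and the rule that any non-missing value in the patch overwrites the existing value is an assumption; the package may apply a different rule.

```r
# Hypothetical helper: overwrite fields in `data` with non-missing patch values,
# matching rows on the shared ID column.
patch_sketch <- function(data, patch, id_col = "profileId",
                         fields_to_patch = c("sex", "dateOfBirth")) {
  # Position of each athlete in the patch table (NA when there is no correction).
  idx <- match(data[[id_col]], patch[[id_col]])
  for (field in intersect(fields_to_patch, names(patch))) {
    new_vals <- patch[[field]][idx]
    use <- !is.na(new_vals)
    data[[field]][use] <- new_vals[use]
  }
  data
}

athletes <- data.frame(
  profileId = c("a1", "a2", "a3"),
  sex = c("Male", NA, "Unknown"),
  dateOfBirth = c("2001-04-02", "1999-11-30", NA),
  stringsAsFactors = FALSE
)
corrections <- data.frame(
  profileId = c("a2", "a3"),
  sex = c("Female", "Male"),
  dateOfBirth = c(NA, "2000-06-15"),
  stringsAsFactors = FALSE
)

patched <- patch_sketch(athletes, corrections)
print(patched)
```

Athletes absent from the corrections table, and fields left blank in it, keep their original values.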

Compare Performance Across Groups

Creates boxplots to compare performance metrics across different groups (e.g., sports, sex, teams). Useful for benchmarking and identifying performance differences between populations.

    plot_vald_compare(
      data,
      metric_col,
      group_col = "sports",
      fill_col = "sex",
      title = NULL,
      y_label = NULL
    )

| Argument | Description |
|---|---|
| data | Data frame. Test data with grouping variables and metrics. |
| metric_col | Character. Name of the metric to plot. |
| group_col | Character. Primary grouping variable (x-axis). Default is "sports". |
| fill_col | Character. Optional fill color grouping (e.g., "sex"). Default is "sex". |
| title | Character. Plot title. If NULL, auto-generates from metric name. |
| y_label | Character. Y-axis label. If NULL, uses metric_col. |

A ggplot2 object.

    if (FALSE) {
      test_datasets <- split_by_test(final_analysis_data)

      # Compare CMJ peak force across sports and sex
      plot_vald_compare(
        data = test_datasets$CMJ,
        metric_col = "PEAK_FORCE_Both",
        group_col = "sports",
        fill_col = "sex",
        title = "Peak Force Comparison by Sport and Sex"
      )
    }

Plot Longitudinal Trends for VALD Metrics

Creates line plots showing how performance metrics change over time for individual athletes or groups. Useful for tracking training adaptations, injury recovery, and seasonal trends.

    plot_vald_trends(
      data,
      date_col = "Testdate",
      metric_col,
      group_col = NULL,
      facet_col = NULL,
      title = NULL,
      smooth = FALSE
    )

| Argument | Description |
|---|---|
| data | Data frame. Test data with a date column and at least one metric. |
| date_col | Character. Name of the date column. Default is "Testdate". |
| metric_col | Character. Name of the metric to plot. |
| group_col | Character. Optional grouping variable (e.g., "profileId", "sports"). If provided, separate lines are drawn for each group. |
| facet_col | Character. Optional faceting variable (e.g., "sex"). Creates separate panels for each level. |
| title | Character. Plot title. If NULL, auto-generates from metric name. |
| smooth | Logical. If TRUE, adds a smoothed trend line. Default is FALSE. |

A ggplot2 object.

    if (FALSE) {
      library(dplyr)  # provides %>%, group_by(), and summarise()

      test_datasets <- split_by_test(final_analysis_data)

      # Plot individual athlete trends
      plot_vald_trends(
        data = test_datasets$CMJ,
        metric_col = "PEAK_FORCE_Both",
        group_col = "profileId",
        facet_col = "sex"
      )

      # Plot sport-level averages
      sport_avg <- test_datasets$CMJ %>%
        group_by(Testdate, sports) %>%
        summarise(avg_force = mean(PEAK_FORCE_Both, na.rm = TRUE), .groups = "drop")

      plot_vald_trends(
        data = sport_avg,
        date_col = "Testdate",
        metric_col = "avg_force",
        group_col = "sports"
      )
    }

Split Wide-Format Data by Test Type

Takes a master wide-format dataset and returns a named list of data frames, one per test type (e.g., CMJ, DJ, ISO). The function also strips the test-type suffix from column names within each sub-data frame, so the same analysis code works across all test types.

This follows the "DRY" (Don't Repeat Yourself) principle: users can write one analysis function that works for any test type.

    split_by_test(data, metadata_cols = NULL, verbose = TRUE)

| Argument | Description |
|---|---|
| data | Data frame. Wide-format test data with columns ending in test type suffixes (e.g., "PEAK_FORCE_Both_CMJ"). |
| metadata_cols | Character vector. Column names to retain as metadata in each split dataset. Default includes common identifiers and demographics. |
| verbose | Logical. If TRUE, prints progress messages. Default is TRUE. |

Named list of data frames, one per test type. Each data frame contains:

- All metadata columns
- Test-specific metrics with suffixes removed (e.g., "PEAK_FORCE_Both")

    if (FALSE) {
      # After joining tests, trials, and metadata into wide format
      test_datasets <- split_by_test(
        data = final_analysis_data,
        metadata_cols = c("profileId", "sex", "Testdate", "age", "sports")
      )

      # Access individual test datasets
      cmj_data <- test_datasets$CMJ
      dj_data <- test_datasets$DJ

      # Note: Column names are now generic (e.g., "PEAK_FORCE_Both" not "PEAK_FORCE_Both_CMJ")
      # This allows you to write one function that works for all test types
    }
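
The suffix-stripping behaviour described for split_by_test() can be illustrated with a small base R sketch. The helper name `split_sketch` is hypothetical, and passing the test types explicitly is an assumption made to keep the example short; how the package detects test types is not shown in this reference.

```r
# Hypothetical helper: one data frame per test type, with the "_<TYPE>" suffix
# removed from the metric column names.
split_sketch <- function(data, metadata_cols, test_types) {
  out <- list()
  for (type in test_types) {
    suffix <- paste0("_", type, "$")
    metric_cols <- grep(suffix, names(data), value = TRUE)
    if (length(metric_cols) == 0) next  # no metrics recorded for this test type
    piece <- data[, c(metadata_cols, metric_cols), drop = FALSE]
    names(piece) <- sub(suffix, "", names(piece))
    out[[type]] <- piece
  }
  out
}

wide <- data.frame(
  profileId = c("a1", "a2"),
  sex = c("Female", "Male"),
  PEAK_FORCE_Both_CMJ = c(1850, 2100),
  JUMP_HEIGHT_Both_CMJ = c(31.2, 38.5),
  PEAK_FORCE_Both_DJ = c(2400, NA)
)

pieces <- split_sketch(wide, metadata_cols = c("profileId", "sex"),
                       test_types = c("CMJ", "DJ"))
print(names(pieces$CMJ))
print(names(pieces$DJ))
```

Because both pieces expose a column named "PEAK_FORCE_Both", a single analysis function can be applied to each element of the list.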

Standardize VALD Metadata

Processes raw profile and group data to create a clean, analysis-ready metadata table. Unnests group memberships, concatenates group names, and applies sports classification logic.

    standardize_vald_metadata(profiles, groups, verbose = TRUE)

| Argument | Description |
|---|---|
| profiles | Data frame. Raw profile data from |
| groups | Data frame. Raw group data from |
| verbose | Logical. If TRUE, prints progress messages. Default is TRUE. |

A data frame with one row per athlete containing:

- Unique athlete identifier
- Athlete names
- Demographic information
- Comma-separated list of all group memberships
- Comma-separated list of all group IDs

    if (FALSE) {
      metadata <- fetch_vald_metadata(client_id, client_secret, tenant_id)
      clean_metadata <- standardize_vald_metadata(
        profiles = metadata$profiles,
        groups = metadata$groups
      )
    }
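
The step of concatenating group memberships into one row per athlete can be sketched in base R. The input layout (one membership row per athlete and group) and the column names `groupName` and `givenName` are assumptions for illustration; the raw API structure handled by standardize_vald_metadata() may differ.

```r
# Assumed long-format memberships: one row per athlete/group pair.
memberships <- data.frame(
  profileId = c("a1", "a1", "a2"),
  groupName = c("Football", "U18", "Swimming"),
  stringsAsFactors = FALSE
)
profiles <- data.frame(
  profileId = c("a1", "a2", "a3"),
  givenName = c("Asha", "Ben", "Chen"),
  stringsAsFactors = FALSE
)

# Collapse each athlete's groups into a single comma-separated string.
collapsed <- aggregate(
  groupName ~ profileId,
  data = memberships,
  FUN = function(x) paste(unique(x), collapse = ", ")
)
names(collapsed)[2] <- "all_group_names"

# Left join so athletes without any group membership are kept (with NA).
clean <- merge(profiles, collapsed, by = "profileId", all.x = TRUE)
print(clean)
```

The resulting `all_group_names` column has the shape that classify_sports() expects as its default group_col.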

Generate Summary Statistics for VALD Metrics

Creates a summary table showing mean, standard deviation, coefficient of variation, and sample size for all numeric performance metrics. Can be grouped by test type, sex, sport, or any combination thereof.

    summary_vald_metrics(
      data,
      group_vars = c("sex", "sports"),
      exclude_cols = c("profileId", "athleteId", "testId", "Testdate",
                       "dateofbirth", "age", "Weight_on_Test_Day"),
      digits = 2
    )

| Argument | Description |
|---|---|
| data | Data frame. Test data (typically from |
| group_vars | Character vector. Variables to group by. Default is c("sex", "sports"). |
| exclude_cols | Character vector. Column names to exclude from summary (typically metadata columns). Default includes common ID and date fields. |
| digits | Integer. Number of decimal places for rounding. Default is 2. |

Data frame with summary statistics (Mean, SD, CV, N) for each metric and grouping combination.

    if (FALSE) {
      test_datasets <- split_by_test(final_analysis_data)
      cmj_summary <- summary_vald_metrics(
        data = test_datasets$CMJ,
        group_vars = c("sex", "sports")
      )
      print(cmj_summary)
    }
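
The four statistics reported by summary_vald_metrics() can be computed for a single metric with a short base R sketch. The helper name `summarise_metric` is hypothetical, and expressing the coefficient of variation as a percentage (100 * SD / mean) is an assumption; the package may report it as a ratio.

```r
# Hypothetical helper: Mean, SD, CV (percent), and N for one metric per group.
summarise_metric <- function(data, metric,
                             group_vars = c("sex", "sports"), digits = 2) {
  values_by_group <- split(data[[metric]], data[group_vars], drop = TRUE)
  rows <- lapply(names(values_by_group), function(g) {
    x <- values_by_group[[g]]
    x <- x[!is.na(x)]  # missing trials do not count towards N
    data.frame(
      group = g,
      metric = metric,
      Mean = round(mean(x), digits),
      SD = round(sd(x), digits),
      CV = round(100 * sd(x) / mean(x), digits),
      N = length(x),
      stringsAsFactors = FALSE
    )
  })
  do.call(rbind, rows)
}

cmj <- data.frame(
  sex = c("Female", "Female", "Male", "Male", "Male"),
  sports = rep("Football", 5),
  PEAK_FORCE_Both = c(1800, 2000, 2200, 2400, NA)
)

stats <- summarise_metric(cmj, "PEAK_FORCE_Both")
print(stats)
```

Group labels here are the grouping values joined with a dot (for example "Female.Football"), which is the default naming produced by split() when given several grouping columns.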