The latest release of the baseballr package includes a function for acquiring player statistics from the NCAA’s website for baseball teams
across the three major divisions (I, II, III).
In order to look up teams, you can either load the teams for all
divisions from the baseballr-data
repository or access them
directly from the NCAA website for a given year and division.
Loading from the baseballr-data repository:
library(baseballr)
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
ncaa_teams_df <- load_ncaa_baseball_teams()
From the NCAA website:
try(ncaa_teams(year = most_recent_ncaa_baseball_season(), division = "1"))
#> ── NCAA Baseball Teams data from stats.ncaa.org ───────────── baseballr 1.5.0 ──
#> ℹ Data updated: 2023-03-20 10:00:07 EDT
#> # A tibble: 305 × 8
#> team_id team_name team_url confer…¹ confe…² divis…³ year seaso…⁴
#> <chr> <chr> <chr> <chr> <chr> <chr> <dbl> <chr>
#> 1 140 Cincinnati /team/140/16340 823 AAC 1 2023 16340
#> 2 196 East Carolina /team/196/16340 823 AAC 1 2023 16340
#> 3 288 Houston /team/288/16340 823 AAC 1 2023 16340
#> 4 404 Memphis /team/404/16340 823 AAC 1 2023 16340
#> 5 651 South Fla. /team/651/16340 823 AAC 1 2023 16340
#> 6 718 Tulane /team/718/16340 823 AAC 1 2023 16340
#> 7 128 UCF /team/128/16340 823 AAC 1 2023 16340
#> 8 782 Wichita St. /team/782/16340 823 AAC 1 2023 16340
#> 9 67 Boston College /team/67/16340 821 ACC 1 2023 16340
#> 10 147 Clemson /team/147/16340 821 ACC 1 2023 16340
#> # … with 295 more rows, and abbreviated variable names ¹conference_id,
#> # ²conference, ³division, ⁴season_id
The function, ncaa_team_player_stats(), requires the user to pass values for three parameters:
team_id: numerical code used by the NCAA for each school
year: a four-digit year
type: whether to pull data for batters ("batting") or pitchers ("pitching")
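Since all three values have to be well formed before the NCAA site is queried, it can help to check them first. The following is a minimal sketch only: check_stats_args() is a hypothetical helper, not part of baseballr, and it assumes that "batting" and "pitching" are the only accepted values for type, as in the calls shown in this document.

```r
# Hypothetical helper (not part of baseballr): checks the three arguments
# before they are handed to ncaa_team_player_stats().
check_stats_args <- function(team_id, year, type = c("batting", "pitching")) {
  # match.arg() raises an error unless type is one of the listed choices
  type <- match.arg(type)

  # Accept the id as a number or as a string of digits
  team_id <- suppressWarnings(as.numeric(team_id))
  if (length(team_id) != 1 || is.na(team_id)) {
    stop("team_id must be a single numeric code")
  }
  if (length(year) != 1 || !grepl("^[0-9]{4}$", as.character(year))) {
    stop("year must be a single four-digit year")
  }

  list(team_id = team_id, year = as.integer(year), type = type)
}

# 234 is the team_id shown for Florida State in the output in this document
args <- check_stats_args(team_id = "234", year = 2023, type = "batting")
str(args)
```

The returned list can then be passed on by name, for example ncaa_team_player_stats(team_id = args$team_id, year = args$year, type = args$type).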
If you want to pull batting statistics for Florida State for the 2023 season, you would use the following:
team_id <- ncaa_teams_df %>%
  dplyr::filter(.data$team_name == "Florida St.") %>%
  dplyr::select("team_id") %>%
  dplyr::distinct() %>%
  dplyr::pull("team_id")

year <- most_recent_ncaa_baseball_season()

ncaa_team_player_stats(team_id = team_id, year = year, type = "batting")
#> ── NCAA Baseball Team Batting Stats data from stats.ncaa.org ───────────────────
#> ℹ Data updated: 2023-03-20 10:00:11 EDT
#> # A tibble: 37 × 35
#> year team_name team_id confe…¹ confe…² divis…³ playe…⁴ playe…⁵ playe…⁶ Yr
#> <int> <chr> <dbl> <int> <chr> <dbl> <int> <chr> <chr> <chr>
#> 1 2023 Florida … 234 821 ACC 1 2649339 http:/… Tibbs … So
#> 2 2023 Florida … 234 821 ACC 1 2649334 http:/… Ferrer… So
#> 3 2023 Florida … 234 821 ACC 1 2478605 http:/… Carrio… Jr
#> 4 2023 Florida … 234 821 ACC 1 2468075 http:/… Vincen… Sr
#> 5 2023 Florida … 234 821 ACC 1 2112619 http:/… De Sed… Sr
#> 6 2023 Florida … 234 821 ACC 1 2649307 http:/… Rank, … So
#> 7 2023 Florida … 234 821 ACC 1 2797459 http:/… Smith,… Fr
#> 8 2023 Florida … 234 821 ACC 1 2649340 http:/… Bush, … So
#> 9 2023 Florida … 234 821 ACC 1 2797428 http:/… Kamaka… Fr
#> 10 2023 Florida … 234 821 ACC 1 2797465 http:/… Willia… So
#> # … with 27 more rows, 25 more variables: Pos <chr>, Jersey <chr>, GP <dbl>,
#> # GS <dbl>, BA <dbl>, OBPct <dbl>, SlgPct <dbl>, R <dbl>, AB <dbl>, H <dbl>,
#> # `2B` <dbl>, `3B` <dbl>, TB <dbl>, HR <dbl>, RBI <dbl>, BB <dbl>, HBP <dbl>,
#> # SF <dbl>, SH <dbl>, K <dbl>, DP <dbl>, CS <dbl>, Picked <dbl>, SB <dbl>,
#> # RBI2out <dbl>, and abbreviated variable names ¹conference_id, ²conference,
#> # ³division, ⁴player_id, ⁵player_url, ⁶player_name
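Once the batting table is in hand, ordinary dplyr verbs apply to it. The sketch below shows one possible follow-up, ranking hitters by batting average after an at-bat cutoff. The rows are placeholders invented for illustration (they are not Florida State's numbers), the 50 at-bat threshold is an arbitrary assumption, and only the column names player_name, AB and BA are taken from the output above.

```r
library(dplyr)

# Placeholder rows: names and numbers are invented for illustration only.
# Column names follow the batting output shown above.
batting <- data.frame(
  player_name = c("Player A", "Player B", "Player C"),
  AB = c(80, 12, 75),
  BA = c(0.310, 0.417, 0.280)
)

# Keep hitters with at least 50 at-bats (arbitrary cutoff),
# then sort from highest to lowest batting average
qualified <- batting %>%
  dplyr::filter(AB >= 50) %>%
  dplyr::arrange(dplyr::desc(BA))

qualified$player_name
```

With real data, the data frame built by hand here would be replaced by the tibble returned from ncaa_team_player_stats().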
To pull pitching statistics instead, change the type parameter:
ncaa_team_player_stats(team_id = team_id, year = year, type = "pitching")
#> ── NCAA Baseball Team Pitching Stats data from stats.ncaa.org ──────────────────
#> ℹ Data updated: 2023-03-20 10:00:16 EDT
#> # A tibble: 37 × 43
#> year team_name team_id confe…¹ confe…² divis…³ playe…⁴ playe…⁵ playe…⁶ Yr
#> <int> <chr> <dbl> <int> <chr> <dbl> <int> <chr> <chr> <chr>
#> 1 2023 Florida … 234 821 ACC 1 2649339 http:/… Tibbs … So
#> 2 2023 Florida … 234 821 ACC 1 2649334 http:/… Ferrer… So
#> 3 2023 Florida … 234 821 ACC 1 2478605 http:/… Carrio… Jr
#> 4 2023 Florida … 234 821 ACC 1 2468075 http:/… Vincen… Sr
#> 5 2023 Florida … 234 821 ACC 1 2112619 http:/… De Sed… Sr
#> 6 2023 Florida … 234 821 ACC 1 2649307 http:/… Rank, … So
#> 7 2023 Florida … 234 821 ACC 1 2797459 http:/… Smith,… Fr
#> 8 2023 Florida … 234 821 ACC 1 2649340 http:/… Bush, … So
#> 9 2023 Florida … 234 821 ACC 1 2797428 http:/… Kamaka… Fr
#> 10 2023 Florida … 234 821 ACC 1 2797465 http:/… Willia… So
#> # … with 27 more rows, 33 more variables: Pos <chr>, Jersey <chr>, GP <dbl>,
#> # App <dbl>, GS <dbl>, ERA <dbl>, IP <dbl>, H <dbl>, R <dbl>, ER <dbl>,
#> # BB <dbl>, SO <dbl>, SHO <dbl>, BF <dbl>, `P-OAB` <dbl>, `2B-A` <dbl>,
#> # `3B-A` <dbl>, Bk <dbl>, `HR-A` <dbl>, WP <dbl>, HB <dbl>, IBB <dbl>,
#> # `Inh Run` <dbl>, `Inh Run Score` <dbl>, SHA <dbl>, SFA <dbl>,
#> # Pitches <dbl>, GO <dbl>, FO <dbl>, W <dbl>, L <dbl>, SV <dbl>, KL <dbl>,
#> # and abbreviated variable names ¹conference_id, ²conference, ³division, …
The function depends on the user knowing the team_id used by the NCAA website. To help with that, I’ve included an ncaa_school_id_lu() function so that users can find the team_id they need.
Just pass a string to the function and it will return possible matches based on the school’s name:
ncaa_school_id_lu("Vand")
#> ───────────────────────────────────────────────────────────── baseballr 1.5.0 ──
#> # A tibble: 14 × 8
#> team_id team_name team_url conference…¹ confe…² divis…³ year seaso…⁴
#> <dbl> <chr> <chr> <dbl> <chr> <dbl> <dbl> <dbl>
#> 1 736 Vanderbilt /team/736/16340 911 SEC 1 2023 16340
#> 2 736 Vanderbilt /team/736/15860 911 SEC 1 2022 15860
#> 3 736 Vanderbilt /team/736/15580 911 SEC 1 2021 15580
#> 4 736 Vanderbilt /team/736/15204 911 SEC 1 2020 15204
#> 5 736 Vanderbilt /team/736/15204 911 SEC 1 2019 15204
#> 6 736 Vanderbilt /team/736/12973 911 SEC 1 2018 12973
#> 7 736 Vanderbilt /team/736/12560 911 SEC 1 2017 12560
#> 8 736 Vanderbilt /team/736/12360 911 SEC 1 2016 12360
#> 9 736 Vanderbilt /team/736/12080 911 SEC 1 2015 12080
#> 10 736 Vanderbilt /team/736/11620 911 SEC 1 2014 11620
#> 11 736 Vanderbilt /team/736/11320 911 SEC 1 2013 11320
#> 12 736 Vanderbilt /team/736/10942 911 SEC 1 2012 10942
#> 13 736 Vanderbilt /team/736/10561 911 SEC 1 2011 10561
#> 14 736 Vanderbilt /team/736/10240 911 SEC 1 2010 10240
#> # … with abbreviated variable names ¹conference_id, ²conference, ³division,
#> # ⁴season_id
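The lookup returns one row per season, so the same team_id appears many times. A minimal sketch of collapsing that result to a single id is shown here. To keep it runnable without a network call, the data frame is rebuilt by hand from the first three rows of the printed output above; in practice it would be the return value of ncaa_school_id_lu("Vand").

```r
library(dplyr)

# Three rows copied from the printed ncaa_school_id_lu("Vand") output above;
# the real result has one row per season.
lookup <- data.frame(
  team_id = c(736, 736, 736),
  team_name = "Vanderbilt",
  year = c(2023, 2022, 2021),
  season_id = c(16340, 15860, 15580)
)

# Collapse the per-season rows to the distinct team_id values
vandy_id <- lookup %>%
  dplyr::distinct(team_id) %>%
  dplyr::pull(team_id)

vandy_id
#> [1] 736
```

If a search string matches more than one school, the result of distinct() will contain several ids, so it is worth inspecting team_name before passing a value on to ncaa_team_player_stats().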