santoku santoku logo

CRAN status Lifecycle: experimental CRAN Downloads Per Month R-universe R-CMD-check AppVeyor build status Codecov test coverage

santoku is a versatile cutting tool for R. It provides chop(), a replacement for base::cut().

Installation

Install from r-universe:

install.packages("santoku", repos = c("https://hughjonesd.r-universe.dev", 
                                      "https://cloud.r-project.org"))

Or from CRAN:

install.packages("santoku")

Or get the development version from github:

# install.packages("remotes")
remotes::install_github("hughjonesd/santoku")

Advantages

Here are some advantages of santoku:

These advantages make santoku especially useful for exploratory analysis, where you may not know the range of your data in advance.

Examples

library(santoku)

chop returns a factor:

chop(1:5, c(2, 4))
#> [1] [1, 2) [2, 4) [2, 4) [4, 5] [4, 5]
#> Levels: [1, 2) [2, 4) [4, 5]

Include a number twice to match it exactly:

chop(1:5, c(2, 2, 4))
#> [1] [1, 2) {2}    (2, 4) [4, 5] [4, 5]
#> Levels: [1, 2) {2} (2, 4) [4, 5]

Use names in breaks for labels:

chop(1:5, c(Low = 1, Mid = 2, High = 4))
#> [1] Low  Mid  Mid  High High
#> Levels: Low Mid High

Or use lbl_* functions:

chop(1:5, c(2, 4), labels = lbl_dash())
#> [1] 1—2 2—4 2—4 4—5 4—5
#> Levels: 1—2 2—4 4—5

Chop into fixed-width intervals:

chop_width(runif(10), 0.1)
#>  [1] [0.1399, 0.2399) [0.5399, 0.6399) [0.5399, 0.6399) [0.5399, 0.6399)
#>  [5] [0.6399, 0.7399) [0.3399, 0.4399) [0.8399, 0.9399] [0.8399, 0.9399]
#>  [9] [0.5399, 0.6399) [0.1399, 0.2399)
#> 5 Levels: [0.1399, 0.2399) [0.3399, 0.4399) ... [0.8399, 0.9399]

Or into fixed-size groups:

chop_n(1:10, 5)
#>  [1] [1, 6)  [1, 6)  [1, 6)  [1, 6)  [1, 6)  [6, 10] [6, 10] [6, 10] [6, 10]
#> [10] [6, 10]
#> Levels: [1, 6) [6, 10]

Chop dates by calendar month, then tabulate:

library(lubridate)
#> 
#> Attaching package: 'lubridate'
#> The following objects are masked from 'package:base':
#> 
#>     date, intersect, setdiff, union

dates <- as.Date("2021-12-31") + 1:90

tab_width(dates, months(1), labels = lbl_discrete(fmt = "%d %b"))
#> 01 Jan—31 Jan 01 Feb—28 Feb 01 Mar—31 Mar 
#>            31            28            31

For more information, see the vignette.