Chinese Numerals in R 中文數字處理

CRAN_Status_Badge License: MIT

This R package provides useful functions to work with Chinese numerals in R, such as conversion between Chinese numerals and Arabic numerals as well as detection and extraction of Chinese numerals in character objects and string.

cnum supports:

cnum 是協助處理中文數字的R套件,提供轉換、識別及抽取中文數字的函數。

本套件支援:

cnum 是协助处理中文数字的R包,提供转换、识别及抽取中文数字的函数。

本包支援:

Installing

install.packages("cnum")

# To install the development version:
#install.packages("devtools")
devtools::install_github("elgarteo/cnum")

Using

To convert from Arabic numerals to Chinese numerals:

num2c(721)
#> [1] "七百二十一"
num2c(-6)
#> [1] "負六"
num2c(3.14)
#> [1] "三點一四"
num2c(721, literal = TRUE) # instead of "七百二十一"
#> [1] "七二一"
num2c(1.45e12, financial = TRUE) # 大寫數字
#> [1] "壹兆肆仟伍佰億"
num2c(6.85e12, lang = "sc", mode = "casualPRC")
#> [1] "六万亿八千五百亿"
num2c(1.5e9, mode = "SIprefixPRC", single = TRUE) # instead of "一吉五百兆"
#> [1] "一點五吉"

To convert from Chinese numerals to Arabic numerals:

c2num("七百二十一")
#> [1] 721
c2num("負六")
#> [1] -6
c2num("三點一四")
#> [1] 3.14
c2num("七二一", literal = TRUE)
#> [1] 721
c2num("壹兆肆仟伍佰億", financial = TRUE)
#> [1] 1.45e+12
c2num("六万亿八千五百亿", lang = "sc", mode = "casualPRC")
#> [1] 6.85e+12
c2num("一點五吉", mode = "SIprefix")
#> [1] 1.5e+09

To detect Chinese numerals in character objects or string:

is_cnum("七百二十一")
#> [1] TRUE
is_cnum(c("非數字", "七百二十一"))
#> [1] FALSE  TRUE
is_cnum("七千三") # passes the casual test
#> [1] TRUE
is_cnum("七千三", strict = TRUE) # fails the strict test
#> [1] FALSE
is_cnum("壹兆肆仟伍佰億", financial = TRUE)
#> [1] TRUE
is_cnum("六万亿八千五百亿", lang = "sc", mode = "casualPRC")
#> [1] TRUE

has_cnum("加價一百八十元")
#> [1] TRUE
has_cnum(c("非數字", "加價一百八十元"))
#>         非數字 加價一百八十元 
#>          FALSE           TRUE

To extract Chinese numerals from string:

extract_cnum("分別加價一百八十元和五十六元")
#> [[1]]
#> [1] "一百八十" "五十六"
extract_cnum("十四亿人", lang = "sc")
#> [[1]]
#> [1] "十四亿"
extract_cnum("六万亿元", lang = "sc", mode = "casualPRC")
#> [[1]]
#> [1] "六万亿"
extract_cnum(c("加價一百八十元", "加價五十六元"))
#> [[1]]
#> [1] "一百八十"
#> 
#> [[2]]
#> [1] "五十六"
extract_cnum(c("他是第一個", "我是第二個"), prefix = "第", suffix = "個")
#> [[1]]
#> [1] "第一個"
#> 
#> [[2]]
#> [1] "第二個"

The default language is Traditional Chinese. This can be changed by setting:

options(cnum.lang = "sc")

To-Do