In this example we’re going to be using the Eunomia synthetic data.
library(CDMConnector)
library(OmopSketch)
library(PhenotypeR)
library(dplyr)
library(CohortConstructor)
con <- DBI::dbConnect(duckdb::duckdb(),
CDMConnector::eunomiaDir("synpuf-1k", "5.3"))
cdm <- CDMConnector::cdmFromCon(con = con,
cdmName = "Eunomia Synpuf",
cdmSchema = "main",
writeSchema = "main",
achillesSchema = "main")
cdm$injuries <- conceptCohort(cdm = cdm,
conceptSet = list(
"ankle_sprain" = 81151L,
"ankle_fracture" = 4059173L,
"forearm_fracture" = 4278672L,
"hip_fracture" = 4230399L
),
name = "injuries")Although we may have created our study cohort, to inform analytic
decisions and interpretation of results requires an understanding of the
dataset from which it has been derived. The
databaseDiagnostics() function will help us better
understand a data source.
To run database diagnostics we just need to provide our cdm reference to the function.
Database diagnostics builds on OmopSketch package to perform the following analyses:
The output is a summarised result object.
We can use OmopSketch package functions to visualise the results obtained.
|
Database name
|
|
|---|---|
|
Eunomia Synpuf
|
|
|
Diagnostic
|
|
|
databaseDiagnostics
|
|
| Estimate |
Phenotyper version
|
| 0.3.0 | |
| General | |
| Snapshot date | 2026-01-12 |
| Person count | 1,000 |
| Vocabulary version | v5.0 06-AUG-21 |
| Cdm | |
| Source name | Synpuf |
| Version | 5.3 |
| Holder name | ohdsi |
| Release date | 2018-03-15 |
| Description | |
| Documentation reference | |
| Observation period | |
| N | 1,000 |
| Start date | 2008-01-01 |
| End date | 2010-12-31 |
| Cdm source | |
| Type | duckdb |
| Package | CDMConnector |
| Write schema | main |
|
CDM name
|
||||
|---|---|---|---|---|
|
Eunomia Synpuf
|
||||
|
Diagnostic
|
||||
|
databaseDiagnostics
|
||||
| Observation period ordinal | Variable name | Variable level | Estimate name |
Phenotyper version
|
| 0.3.0 | ||||
| all | Number records | – | N | 1,048 |
| Number subjects | – | N | 1,000 | |
| Subjects not in person table | – | N (%) | 0 (0.00%) | |
| Records per person | – | Mean (SD) | 1.05 (0.21) | |
| Median [Q25 - Q75] | 1 [1 - 1] | |||
| Range [min to max] | [1 to 2] | |||
| Duration in days | – | Mean (SD) | 979.71 (262.79) | |
| Median [Q25 - Q75] | 1,096 [1,096 - 1,096] | |||
| Range [min to max] | [1 to 1,096] | |||
| Days to next observation period | – | Mean (SD) | 172.17 (108.35) | |
| Median [Q25 - Q75] | 138 [93 - 254] | |||
| Range [min to max] | [32 to 366] | |||
| Type concept id | Period while enrolled in insurance | N (%) | 1,048 (100.00%) | |
| Start date before birth date | – | N (%) | 0 (0.00%) | |
| End date before start date | – | N (%) | 0 (0.00%) | |
| Column name | Observation period end date | N missing data (%) | 0 (0.00%) | |
| Observation period id | N missing data (%) | 0 (0.00%) | ||
| N zeros (%) | 0 (0.00%) | |||
| Observation period start date | N missing data (%) | 0 (0.00%) | ||
| Period type concept id | N missing data (%) | 0 (0.00%) | ||
| N zeros (%) | 0 (0.00%) | |||
| Person id | N missing data (%) | 0 (0.00%) | ||
| N zeros (%) | 1 (0.10%) | |||
| 1st | Number subjects | – | N | 1,000 |
| Duration in days | – | Mean (SD) | 994.16 (257.95) | |
| Median [Q25 - Q75] | 1,096 [1,096 - 1,096] | |||
| Range [min to max] | [1 to 1,096] | |||
| Days to next observation period | – | Mean (SD) | 172.17 (108.35) | |
| Median [Q25 - Q75] | 138 [93 - 254] | |||
| Range [min to max] | [32 to 366] | |||
| 2nd | Number subjects | – | N | 48 |
| Duration in days | – | Mean (SD) | 678.60 (164.50) | |
| Median [Q25 - Q75] | 730 [730 - 730] | |||
| Range [min to max] | [31 to 730] | |||
| Days to next observation period | – | Mean (SD) | – | |
| Median [Q25 - Q75] | – | |||
| Range [min to max] | – | |||
|
CDM name
|
|||
|---|---|---|---|
|
Eunomia Synpuf
|
|||
|
Diagnostic
|
|||
|
databaseDiagnostics
|
|||
| Variable name | Variable level | Estimate name |
Phenotyper version
|
| 0.3.0 | |||
| Number subjects | – | N | 1,000 |
| Number subjects not in observation | – | N (%) | 0 (0.00%) |
| Sex | Female | N (%) | 498 (49.80%) |
| Male | N (%) | 502 (50.20%) | |
| None | N (%) | 0 (0.00%) | |
| Sex source | Female | N (%) | 498 (49.80%) |
| Male | N (%) | 502 (50.20%) | |
| Race | Black or African American | N (%) | 114 (11.40%) |
| No matching concept | N (%) | 67 (6.70%) | |
| White | N (%) | 819 (81.90%) | |
| Race source | Black | N (%) | 114 (11.40%) |
| Hispanic | N (%) | 17 (1.70%) | |
| Others | N (%) | 50 (5.00%) | |
| White | N (%) | 819 (81.90%) | |
| Ethnicity | Hispanic or Latino | N (%) | 17 (1.70%) |
| No matching concept | N (%) | 50 (5.00%) | |
| Not Hispanic or Latino | N (%) | 933 (93.30%) | |
| Ethnicity source | Hispanic or Latino | N (%) | 17 (1.70%) |
| Not Hispanic or Latino | N (%) | 933 (93.30%) | |
| Others | N (%) | 50 (5.00%) | |
| Year of birth | – | Missing (%) | 0 (0.00%) |
| Median [Q25 - Q75] | 1,935 [1,928 - 1,942] | ||
| 90% Range [Q05 to Q95] | 1,919 to 1,962 | ||
| Range [min to max] | 1,909 to 1,983 | ||
| Month of birth | – | Missing (%) | 0 (0.00%) |
| Median [Q25 - Q75] | 7 [3 - 10] | ||
| 90% Range [Q05 to Q95] | 1 to 12 | ||
| Range [min to max] | 1 to 12 | ||
| Day of birth | – | Missing (%) | 0 (0.00%) |
| Median [Q25 - Q75] | 1 [1 - 1] | ||
| 90% Range [Q05 to Q95] | 1 to 1 | ||
| Range [min to max] | 1 to 1 | ||
| Location | – | Missing (%) | 0 (0.00%) |
| Zero count (%) | 1 (0.10%) | ||
| Distinct values | 626 | ||
| Provider | – | Missing (%) | 1,000 (100.00%) |
| Zero count (%) | 0 (0.00%) | ||
| Distinct values | 1 | ||
| Care site | – | Missing (%) | 1,000 (100.00%) |
| Zero count (%) | 0 (0.00%) | ||
| Distinct values | 1 | ||
|
Database name
|
|||
|---|---|---|---|
|
Eunomia Synpuf
|
|||
|
Diagnostic
|
|||
|
databaseDiagnostics
|
|||
| Variable name | Variable level | Estimate name |
Phenotyper version
|
| 0.3.0 | |||
| condition_occurrence | |||
| Number records | – | N | 160,322 |
| Number subjects | – | N (%) | 836 (100.00%) |
| Subjects not in person table | – | N (%) | 0 (0.00%) |
| Records per person | – | Mean (SD) | 191.77 (148.24) |
| Median [Q25 - Q75] | 160 [76 - 278] | ||
| Range [min to max] | [1 to 765] | ||
| In observation | Yes | N (%) | 160,322 (100.00%) |
| Domain | Condition | N (%) | 159,651 (99.58%) |
| Observation | N (%) | 666 (0.42%) | |
| Procedure | N (%) | 5 (0.00%) | |
| Source vocabulary | Icd9cm | N (%) | 160,322 (100.00%) |
| Standard concept | S | N (%) | 153,665 (95.85%) |
| – | N (%) | 6,657 (4.15%) | |
| Type concept id | Carrier claim detail - 10th position | N (%) | 308 (0.19%) |
| Carrier claim detail - 11th position | N (%) | 210 (0.13%) | |
| Carrier claim detail - 12th position | N (%) | 170 (0.11%) | |
| Carrier claim detail - 13th position | N (%) | 122 (0.08%) | |
| Carrier claim detail - 1st position | N (%) | 34,738 (21.67%) | |
| Carrier claim detail - 2nd position | N (%) | 14,405 (8.99%) | |
| Carrier claim detail - 3rd position | N (%) | 7,484 (4.67%) | |
| Carrier claim detail - 4th position | N (%) | 4,235 (2.64%) | |
| Carrier claim detail - 5th position | N (%) | 2,598 (1.62%) | |
| Carrier claim detail - 6th position | N (%) | 1,595 (0.99%) | |
| Carrier claim detail - 7th position | N (%) | 876 (0.55%) | |
| Carrier claim detail - 8th position | N (%) | 599 (0.37%) | |
| Carrier claim detail - 9th position | N (%) | 420 (0.26%) | |
| Carrier claim header - 1st position | N (%) | 34,477 (21.50%) | |
| Carrier claim header - 2nd position | N (%) | 19,670 (12.27%) | |
| Carrier claim header - 3rd position | N (%) | 11,387 (7.10%) | |
| Carrier claim header - 4th position | N (%) | 6,436 (4.01%) | |
| Carrier claim header - 5th position | N (%) | 828 (0.52%) | |
| Carrier claim header - 6th position | N (%) | 468 (0.29%) | |
| Carrier claim header - 7th position | N (%) | 238 (0.15%) | |
| Carrier claim header - 8th position | N (%) | 177 (0.11%) | |
| Inpatient header - 10th position | N (%) | 31 (0.02%) | |
| Inpatient header - 11th position | N (%) | 476 (0.30%) | |
| Inpatient header - 1st position | N (%) | 515 (0.32%) | |
| Inpatient header - 2nd position | N (%) | 491 (0.31%) | |
| Inpatient header - 3rd position | N (%) | 490 (0.31%) | |
| Inpatient header - 4th position | N (%) | 466 (0.29%) | |
| Inpatient header - 5th position | N (%) | 450 (0.28%) | |
| Inpatient header - 6th position | N (%) | 424 (0.26%) | |
| Inpatient header - 7th position | N (%) | 397 (0.25%) | |
| Inpatient header - 8th position | N (%) | 371 (0.23%) | |
| Inpatient header - 9th position | N (%) | 335 (0.21%) | |
| Outpatient header - 10th position | N (%) | 8 (0.00%) | |
| Outpatient header - 11th position | N (%) | 1,225 (0.76%) | |
| Outpatient header - 1st position | N (%) | 4,977 (3.10%) | |
| Outpatient header - 2nd position | N (%) | 3,161 (1.97%) | |
| Outpatient header - 3rd position | N (%) | 2,004 (1.25%) | |
| Outpatient header - 4th position | N (%) | 1,202 (0.75%) | |
| Outpatient header - 5th position | N (%) | 715 (0.45%) | |
| Outpatient header - 6th position | N (%) | 460 (0.29%) | |
| Outpatient header - 7th position | N (%) | 315 (0.20%) | |
| Outpatient header - 8th position | N (%) | 213 (0.13%) | |
| Outpatient header - 9th position | N (%) | 155 (0.10%) | |
| Start date before birth date | – | N (%) | 0 (0.00%) |
| End date before start date | – | N (%) | 0 (0.00%) |
| Column name | Condition concept id | N missing data (%) | 0 (0.00%) |
| N zeros (%) | 0 (0.00%) | ||
| Condition end date | N missing data (%) | 0 (0.00%) | |
| Condition end datetime | N missing data (%) | 160,322 (100.00%) | |
| Condition occurrence id | N missing data (%) | 0 (0.00%) | |
| N zeros (%) | 1 (0.00%) | ||
| Condition source concept id | N missing data (%) | 0 (0.00%) | |
| N zeros (%) | 0 (0.00%) | ||
| Condition source value | N missing data (%) | 0 (0.00%) | |
| Condition start date | N missing data (%) | 0 (0.00%) | |
| Condition start datetime | N missing data (%) | 160,322 (100.00%) | |
| Condition status concept id | N missing data (%) | 160,322 (100.00%) | |
| N zeros (%) | 0 (0.00%) | ||
| Condition status source value | N missing data (%) | 160,322 (100.00%) | |
| Condition type concept id | N missing data (%) | 0 (0.00%) | |
| N zeros (%) | 0 (0.00%) | ||
| Person id | N missing data (%) | 0 (0.00%) | |
| N zeros (%) | 191 (0.12%) | ||
| Provider id | N missing data (%) | 0 (0.00%) | |
| N zeros (%) | 73,682 (45.96%) | ||
| Stop reason | N missing data (%) | 160,322 (100.00%) | |
| Visit detail id | N missing data (%) | 160,322 (100.00%) | |
| N zeros (%) | 0 (0.00%) | ||
| Visit occurrence id | N missing data (%) | 0 (0.00%) | |
| N zeros (%) | 10 (0.01%) | ||