% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/undirected_dcsbm.R
\name{dcsbm}
\alias{dcsbm}
\title{Create an undirected degree corrected stochastic blockmodel object}
\usage{
dcsbm(
  n = NULL,
  theta = NULL,
  k = NULL,
  B = NULL,
  ...,
  pi = rep(1/k, k),
  sort_nodes = TRUE,
  force_identifiability = FALSE,
  poisson_edges = TRUE,
  allow_self_loops = TRUE
)
}
\arguments{
\item{n}{(degree heterogeneity) The number of nodes in the blockmodel.
Use when you don't want to specify the degree-heterogeneity
parameters \code{theta} by hand. When \code{n} is specified, \code{theta}
is randomly generated from a \code{LogNormal(2, 1)} distribution.
This is subject to change, and may not be reproducible.
\code{n} defaults to \code{NULL}. You must specify either \code{n}
or \code{theta}, but not both.}

\item{theta}{(degree heterogeneity) A numeric vector
explicitly specifying the degree heterogeneity
parameters. This implicitly determines the number of nodes
in the resulting graph, i.e. it will have \code{length(theta)} nodes.
Must be positive. Setting to a vector of ones recovers
a stochastic blockmodel without degree correction.
Defaults to \code{NULL}. You must specify either \code{n} or \code{theta},
but not both.}

\item{k}{(mixing matrix) The number of blocks in the blockmodel.
Use when you don't want to specify the mixing-matrix by hand.
When \code{k} is specified, the elements of \code{B} are drawn
randomly from a \code{Uniform(0, 1)} distribution.
This is subject to change, and may not be reproducible.
\code{k} defaults to \code{NULL}. You must specify either \code{k}
or \code{B}, but not both.}

\item{B}{(mixing matrix) A \code{k} by \code{k} matrix of block connection
probabilities. Ignoring degree heterogeneity, the number of edges between
a node in block \code{i} and a node in block \code{j} is distributed
\code{Poisson(B[i, j])}. Must be
a square matrix. \code{matrix} and \code{Matrix} objects are both
acceptable. If \code{B} is not symmetric, it will be
symmetrized via the update \code{B := B + t(B)}. Defaults to \code{NULL}.
You must specify either \code{k} or \code{B}, but not both.}

\item{...}{
  Arguments passed on to \code{\link[=undirected_factor_model]{undirected_factor_model}}
  \describe{
    \item{\code{expected_degree}}{If specified, the desired expected degree
of the graph. Specifying \code{expected_degree} simply rescales \code{S}
to achieve this. Defaults to \code{NULL}. Do not specify both
\code{expected_degree} and \code{expected_density} at the same time.}
    \item{\code{expected_density}}{If specified, the desired expected density
of the graph. Specifying \code{expected_density} simply rescales \code{S}
to achieve this. Defaults to \code{NULL}. Do not specify both
\code{expected_degree} and \code{expected_density} at the same time.}
  }}

\item{pi}{(relative block probabilities) Relative block
probabilities. Must be positive, but do not need to sum
to one, as they will be normalized internally.
Must have length \code{k}, matching the dimensions of \code{B}. Defaults to
\code{rep(1 / k, k)}, i.e. balanced blocks.}

\item{sort_nodes}{Logical indicating whether or not to sort the nodes
so that they are grouped by block and by \code{theta}. Useful for plotting.
Defaults to \code{TRUE}.}

\item{force_identifiability}{Logical indicating whether or not to
normalize \code{theta} such that it sums to one within each block. Defaults
to \code{FALSE}, since this behavior can be surprising when \code{theta} is set
to a vector of all ones to recover the SBM case.}

\item{poisson_edges}{Logical indicating whether or not
multiple edges are allowed to form between a pair of
nodes. Defaults to \code{TRUE}. When \code{FALSE}, sampling proceeds
as usual, and duplicate edges are removed afterwards. Further,
when \code{FALSE}, we assume that \code{S} specifies a desired between-factor
connection probability, and back-transform this \code{S} to the
appropriate Poisson intensity parameter to approximate Bernoulli
factor connection probabilities. See Section 2.3 of Rohe et al. (2017)
for some additional details.}

\item{allow_self_loops}{Logical indicating whether or not
nodes should be allowed to form edges with themselves.
Defaults to \code{TRUE}. When \code{FALSE}, sampling proceeds allowing
self-loops, and these are then removed after the fact.}
}
\value{
An \code{undirected_dcsbm} S3 object, a subclass of the
\code{\link[=undirected_factor_model]{undirected_factor_model()}} with the following additional
fields:
\itemize{
\item \code{theta}: A numeric vector of degree-heterogeneity parameters.
\item \code{z}: The community memberships of each node, as a \code{\link[=factor]{factor()}}.
The factor will have \code{k} levels, where \code{k} is the number of
communities in the stochastic blockmodel. Some communities
may contain no observed nodes.
\item \code{pi}: Sampling probabilities for each block.
\item \code{sorted}: Logical indicating whether nodes are arranged by
block, and additionally by degree-heterogeneity parameter
within each block.
}
}
\description{
To specify a degree-corrected stochastic blockmodel, you must specify
the degree-heterogeneity parameters (via \code{n} or \code{theta}),
the mixing matrix (via \code{k} or \code{B}), and the relative block
probabilities (optional, via \code{pi}). We provide defaults for most of these
options to enable rapid exploration, or you can invest the effort
for more control over the model parameters. We \strong{strongly recommend}
setting the \code{expected_degree} or \code{expected_density} argument
to avoid large memory allocations associated with
sampling large, dense graphs.
}
\section{Generative Model}{
There are two levels of randomness in a degree-corrected
stochastic blockmodel. First, we randomly choose a block
membership for each node in the blockmodel. This is
handled by \code{dcsbm()}. Then, given these block memberships,
we randomly sample edges between nodes. This second
operation is handled by \code{\link[=sample_edgelist]{sample_edgelist()}},
\code{\link[=sample_sparse]{sample_sparse()}}, \code{\link[=sample_igraph]{sample_igraph()}} and
\code{\link[=sample_tidygraph]{sample_tidygraph()}}, depending on your desired
graph representation.
\subsection{Block memberships}{

Let \eqn{z_i} represent the block membership of node \eqn{i}.
To generate \eqn{z_i} we sample from a categorical
distribution (note that this is a special case of a
multinomial) with parameter \eqn{\pi}, such that
\eqn{\pi_i} represents the probability of ending up in
the ith block. Block memberships for each node are independent.
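
A minimal base-R sketch of this step, assuming illustrative values for
\code{n} and \code{pi} (chosen only for the example; this is not
necessarily how \code{dcsbm()} samples internally):

\preformatted{
set.seed(1)

n <- 10              # number of nodes (illustrative)
pi <- c(1, 2, 4)     # relative block probabilities, need not sum to one
k <- length(pi)      # number of blocks

# sample() rescales `prob` internally, so unnormalized weights are fine
z <- factor(sample(k, n, replace = TRUE, prob = pi), levels = seq_len(k))

# a block can end up empty, but it is still a level of the factor
table(z)
}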
}

\subsection{Degree heterogeneity}{

In addition to block membership, the DCSBM also allows
nodes to have different propensities for edge formation.
We represent this propensity for node \eqn{i} by a positive
number \eqn{\theta_i}. Typically the \eqn{\theta_i} are
constrained to sum to one for identifiability purposes,
but this constraint does not matter during sampling: without it,
rescaling \eqn{B} and rescaling \eqn{\theta} can produce the same
edge probabilities, so it is impossible to tell which of the
two is responsible for a given change in scale.
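
The scaling ambiguity can be checked numerically. The following base-R
sketch uses made-up values of \code{theta}, \code{B} and \code{z}, and a
hypothetical helper \code{intensity()} that computes the Poisson
intensities defined below (all of these are assumptions for illustration):

\preformatted{
theta <- c(1, 2, 3, 4)                 # degree-heterogeneity parameters
z <- c(1, 1, 2, 2)                     # block memberships
B <- matrix(c(0.5, 0.1,
              0.1, 0.3), nrow = 2, byrow = TRUE)

# lambda[i, j] = theta[i] * B[z[i], z[j]] * theta[j]
intensity <- function(theta, B, z) outer(theta, theta) * B[z, z]

# multiplying theta by s and dividing B by s^2 leaves lambda unchanged
s <- 10
all.equal(intensity(theta, B, z), intensity(s * theta, B / s^2, z))
}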
}

\subsection{Edge formation}{

Once we know the block memberships \eqn{z} and the degree
heterogeneity parameters \eqn{\theta}, we need one more
ingredient: the mixing matrix \eqn{B}, whose entries give the
baseline intensity of connections between blocks. Each edge count
\eqn{A_{i,j}} between nodes \eqn{i} and \eqn{j} is then Poisson
distributed with parameter

\deqn{
  \lambda_{i, j} = \theta_i \cdot B_{z_i, z_j} \cdot \theta_j.
}{
  \lambda[i, j] = \theta[i] * B[z[i], z[j]] * \theta[j].
}
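
For intuition, the intensities can be formed and sampled naively in base R.
This dense sketch uses made-up parameters and is not how the samplers
listed above work; the treatment of self-loops and of the two triangles
of \eqn{A} is an assumption made for the example:

\preformatted{
set.seed(1)

theta <- c(1, 2, 3, 4)
z <- c(1, 1, 2, 2)
B <- matrix(c(0.5, 0.1,
              0.1, 0.3), nrow = 2, byrow = TRUE)
n <- length(theta)

# lambda[i, j] = theta[i] * B[z[i], z[j]] * theta[j]
lambda <- outer(theta, theta) * B[z, z]

# draw Poisson counts for the upper triangle (diagonal included),
# then mirror them so that the adjacency matrix is symmetric
A <- matrix(0, n, n)
upper <- upper.tri(A, diag = TRUE)
A[upper] <- rpois(sum(upper), lambda[upper])
A <- A + t(A) - diag(diag(A))
}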
}
}

\examples{

set.seed(27)

lazy_dcsbm <- dcsbm(n = 1000, k = 5, expected_density = 0.01)
lazy_dcsbm

# sometimes you gotta let the world burn and
# sample a wildly dense graph

dense_lazy_dcsbm <- dcsbm(n = 500, k = 3, expected_density = 0.8)
dense_lazy_dcsbm

# explicitly setting the degree heterogeneity parameter,
# mixing matrix, and relative community sizes rather
# than using randomly generated defaults

k <- 5
n <- 1000
B <- matrix(stats::runif(k * k), nrow = k, ncol = k)

theta <- round(stats::rlnorm(n, 2))

pi <- c(1, 2, 4, 1, 1)

custom_dcsbm <- dcsbm(
  theta = theta,
  B = B,
  pi = pi,
  expected_degree = 50
)

custom_dcsbm

edgelist <- sample_edgelist(custom_dcsbm)
edgelist
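
# a sketch of a simple graph (no multi-edges, no self-loops), using
# the poisson_edges and allow_self_loops arguments; the sizes and the
# expected degree here are arbitrary choices for illustration

simple_dcsbm <- dcsbm(
  n = 500,
  k = 3,
  poisson_edges = FALSE,
  allow_self_loops = FALSE,
  expected_degree = 10
)

simple_edgelist <- sample_edgelist(simple_dcsbm)
simple_edgelist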

# efficient eigendecomposition that leverages low-rank structure in
# E(A) so that you don't have to form E(A) to find eigenvectors,
# as E(A) is typically dense. computation is
# handled via RSpectra

population_eigs <- eigs_sym(custom_dcsbm)

}
\seealso{
Other stochastic block models: 
\code{\link{directed_dcsbm}()},
\code{\link{mmsbm}()},
\code{\link{overlapping_sbm}()},
\code{\link{planted_partition}()},
\code{\link{sbm}()}

Other undirected graphs: 
\code{\link{chung_lu}()},
\code{\link{erdos_renyi}()},
\code{\link{mmsbm}()},
\code{\link{overlapping_sbm}()},
\code{\link{planted_partition}()},
\code{\link{sbm}()}
}
\concept{stochastic block models}
\concept{undirected graphs}
