dnxmy package¶

Submodules¶

dnxmy.config_generator module¶

class dnxmy.config_generator.DnxmyConfig¶

Bases: object

add_arma_column(col_name: str, intercept: float = 0, sigma: float = 0, ar_initial: Optional[float] = None, ar_order: Optional[int] = None, ar_params: Optional[list] = None, ar_shock_time: Optional[list] = None, ar_shock_type: Optional[list] = None, ar_shock_value: Optional[list] = None, ma_initial: Optional[float] = None, ma_order: Optional[int] = None, ma_params: Optional[list] = None, ma_shock_time: Optional[list] = None, ma_shock_type: Optional[list] = None, ma_shock_value: Optional[list] = None)¶

Add a column configuration for an ARMA variable.

Parameters

col_name (str) – Name of the column.
intercept (float) – Intercept for the ARMA variable. Defaults to 0.
sigma (float) – Sigma for the ARMA variable. Defaults to 0.
ar_initial (float) – Initial value for the AR part of the ARMA variable. Defaults to None.
ar_order (int) – Order of the AR part of the ARMA variable. Defaults to None.
ar_params (list) – List of AR parameters for the ARMA variable. Defaults to None.
ar_shock_time (list) – List of shock times for the AR part of the ARMA variable. Defaults to None.
ar_shock_type (list) – List of shock types for the AR part of the ARMA variable. Defaults to None.
ar_shock_value (list) – List of shock values for the AR part of the ARMA variable. Defaults to None.
ma_initial (float) – Initial value for the MA part of the ARMA variable. Defaults to None.
ma_order (int) – Order of the MA part of the ARMA variable. Defaults to None.
ma_params (list) – List of MA parameters for the ARMA variable. Defaults to None.
ma_shock_time (list) – List of shock times for the MA part of the ARMA variable. Defaults to None.
ma_shock_type (list) – List of shock types for the MA part of the ARMA variable. Defaults to None.
ma_shock_value (list) – List of shock values for the MA part of the ARMA variable. Defaults to None.
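
For orientation, the intercept, sigma, and AR/MA parameters above correspond to the terms of the usual ARMA recursion. The sketch below is a generic textbook simulation built on the standard library only; it is not dnxmy's implementation. The function name simulate_arma is hypothetical, pre-sample values are assumed to be zero (so ar_initial and ma_initial are ignored), and the shock parameters are left out.

```python
import random


def simulate_arma(n, intercept=0.0, sigma=0.0, ar_params=None, ma_params=None, seed=0):
    """Simulate x_t = c + sum(phi_i * x_{t-i}) + eps_t + sum(theta_j * eps_{t-j}).

    Hypothetical helper for illustration; values before t = 0 are treated as zero.
    """
    ar_params = ar_params or []
    ma_params = ma_params or []
    rng = random.Random(seed)
    xs, eps = [], []
    for t in range(n):
        e = rng.gauss(0.0, sigma)  # white-noise innovation with standard deviation sigma
        # AR part: weighted sum of the previous observations that exist so far
        ar_term = sum(phi * xs[t - i]
                      for i, phi in enumerate(ar_params, start=1) if t - i >= 0)
        # MA part: weighted sum of the previous innovations that exist so far
        ma_term = sum(theta * eps[t - j]
                      for j, theta in enumerate(ma_params, start=1) if t - j >= 0)
        xs.append(intercept + ar_term + e + ma_term)
        eps.append(e)
    return xs


# With sigma = 0 the series is deterministic, which makes the recursion easy to follow.
series = simulate_arma(3, intercept=1.0, ar_params=[0.5])
```

In this sketch, the length of ar_params plays the role of ar_order and the length of ma_params the role of ma_order.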

add_categorical_column(col_name: str, value: list, probability: list)¶

Add a column configuration for a categorical variable.

Parameters

col_name (str) – Name of the column.
value (list) – List of values of the categorical variable.
probability (list) – List of probabilities of the values.
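
As an illustration of how value and probability pair up, the following sketch draws categorical samples with the standard library. It is not taken from dnxmy: the helper name sample_categorical is hypothetical, and the checks that both lists have equal length and that the probabilities sum to 1 are assumptions about sensible input, not documented behavior.

```python
import random


def sample_categorical(value, probability, n, seed=0):
    """Draw n samples from `value`, choosing value[i] with probability[i]."""
    if len(value) != len(probability):
        raise ValueError("value and probability must have the same length")
    if abs(sum(probability) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    rng = random.Random(seed)  # seeded generator for reproducible draws
    return rng.choices(value, weights=probability, k=n)


samples = sample_categorical(["red", "green", "blue"], [0.5, 0.3, 0.2], n=10)
```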

add_constant_column(col_name: str, constant_value: float = 1)¶

Add a column configuration for a constant variable.

Parameters

col_name (str) – Name of the column.
constant_value (float) – Value of the constant. Defaults to 1.

add_dependent_column(col_name: str, variables: list, beta: list, intercept: float, offset_column: Optional[str] = None, offset_function: str = 'default', link_function: str = 'identity')¶

Add a column configuration for a dependent variable.

Parameters

col_name (str) – Name of the column.
variables (list) – List of columns on which the variable depends.
beta (list) – List of coefficients for the dependent variable.
intercept (float) – Intercept for the dependent variable.
offset_column (str) – Name of the column to use for the offset. Defaults to None.
offset_function (str) – Function to use for the offset. Defaults to 'default'.
link_function (str) – Link function to use for the dependent variable. Defaults to 'identity'.
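
The parameters above read like those of a generalized linear model: a linear predictor built from variables, beta, and intercept, passed through the inverse of a link function. The sketch below shows that reading for a single row. It is an assumption about the intended semantics, not dnxmy's code: the helper name dependent_value is hypothetical, only 'identity' is confirmed by the signature, the 'log' and 'logit' entries are illustrative additions, and the offset parameters and any noise term are not modeled.

```python
import math

# Inverse link functions; only "identity" appears in the documented signature,
# the other names are illustrative assumptions.
INVERSE_LINKS = {
    "identity": lambda eta: eta,
    "log": math.exp,
    "logit": lambda eta: 1.0 / (1.0 + math.exp(-eta)),
}


def dependent_value(row, variables, beta, intercept, link_function="identity"):
    """Return g^{-1}(intercept + sum(beta_i * row[variables_i])) for one row."""
    eta = intercept + sum(b * row[v] for v, b in zip(variables, beta))
    return INVERSE_LINKS[link_function](eta)


row = {"x1": 2.0, "x2": 3.0}
y = dependent_value(row, variables=["x1", "x2"], beta=[0.5, 1.0], intercept=1.0)
```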

add_independent_column(col_name: str, probability_distribution_type: str = 'uniform', probability_distribution_params: dict = {'high': 1, 'low': 0})¶

Add a column configuration for an independent variable.

Parameters

col_name (str) – Name of the column.
probability_distribution_type (str) – Type of the probability distribution. Defaults to 'uniform'.
probability_distribution_params (dict, optional) – Parameters of the probability distribution. Defaults to {'low': 0, 'high': 1}.
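
To make the pairing of a distribution type with its parameter dictionary concrete, here is a small standard-library sketch. It is illustrative only: the helper name sample_independent is hypothetical, the 'uniform' type with 'low' and 'high' keys follows the documented default, and the 'normal' type with 'loc' and 'scale' keys is an assumption about what other types might look like.

```python
import random


def sample_independent(n, probability_distribution_type="uniform",
                       probability_distribution_params=None, seed=0):
    """Draw n independent samples from the named distribution."""
    params = probability_distribution_params or {"low": 0, "high": 1}
    rng = random.Random(seed)
    if probability_distribution_type == "uniform":
        return [rng.uniform(params["low"], params["high"]) for _ in range(n)]
    if probability_distribution_type == "normal":  # assumed type and key names
        return [rng.gauss(params["loc"], params["scale"]) for _ in range(n)]
    raise ValueError(f"unsupported distribution: {probability_distribution_type}")


samples = sample_independent(5, "uniform", {"low": 2, "high": 5})
```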

add_missing_config(target_col_name: str, missing_type: str = 'MCAR', missing_rate: Optional[float] = None, dependent_on: Optional[str] = None)¶

Generate a missing configuration dictionary.

Parameters

target_col_name (str) – Name of the column to which the missingness is applied.
missing_type (str) – Type of missingness. Defaults to 'MCAR'.
missing_rate (float) – Missing rate. Defaults to None.
dependent_on (str) – Condition under which values go missing, written as a query string in the format accepted by pandas.DataFrame.query. Defaults to None.

Returns

Dictionary of missing configurations.

Return type

dict
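
MCAR stands for "missing completely at random": every entry is dropped independently with the same probability, regardless of any other value. The sketch below shows that idea with the standard library. It is a generic illustration and not dnxmy's implementation; the helper name apply_mcar is hypothetical, None stands in for a missing value, and the dependent_on condition is not modeled.

```python
import random


def apply_mcar(values, missing_rate, seed=0):
    """Replace each value with None independently with probability missing_rate."""
    rng = random.Random(seed)
    return [None if rng.random() < missing_rate else v for v in values]


masked = apply_mcar([1.2, 3.4, 5.6, 7.8], missing_rate=0.25)
```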

add_time_part_column(col_name: str, start_time: str, time_unit: str, time_format: str = '%Y-%m-%d')¶

Add a column configuration for a time part variable.

Parameters

col_name (str) – Name of the column.
start_time (str) – Start time of the time variable.
time_unit (str) – Time unit of the time variable.
time_format (str) – Time format of the time variable. Defaults to '%Y-%m-%d'.
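
One plausible reading of these parameters is a sequence of timestamps that starts at start_time and advances by one time_unit per row. The sketch below illustrates that reading with the standard library. The accepted values of time_unit are not listed in this reference, so the unit names in STEP are hypothetical, as is the helper name time_part and the choice to return formatted strings.

```python
from datetime import datetime, timedelta

# Hypothetical unit names; the values dnxmy accepts are not documented here.
STEP = {
    "hour": timedelta(hours=1),
    "day": timedelta(days=1),
    "week": timedelta(weeks=1),
}


def time_part(n, start_time, time_unit, time_format="%Y-%m-%d"):
    """Return n timestamps, one time_unit apart, formatted with time_format."""
    start = datetime.strptime(start_time, time_format)
    return [(start + i * STEP[time_unit]).strftime(time_format) for i in range(n)]


dates = time_part(3, "2023-01-30", "day")
```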

delete_column_config(col_name: str)¶

Delete a column configuration.

Parameters: col_name (str) – Name of the column.

set_dataset_config(m: int)¶

Set dataset configurations based on the provided configurations or the default configurations.

t_sort()¶

Optimize the order of dataset configuration using topological sorting.
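
Topological sorting matters here because a dependent column can only be generated after the columns it depends on. The sketch below shows the general technique with graphlib from the standard library (Python 3.9+). It is not dnxmy's implementation; the helper name order_columns and the shape of the dependencies mapping are assumptions made for illustration.

```python
from graphlib import TopologicalSorter


def order_columns(dependencies):
    """Return column names ordered so that each column follows its dependencies.

    `dependencies` maps a column name to the set of columns it depends on.
    graphlib raises CycleError if the dependencies are circular.
    """
    return list(TopologicalSorter(dependencies).static_order())


# "y" depends on x1 and x2; "z" depends on y.
dependencies = {"x1": set(), "x2": set(), "y": {"x1", "x2"}, "z": {"y"}}
order = order_columns(dependencies)
```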

dnxmy.config_generator.generate_default_config(column_config: Optional[dict] = None) → dict¶

Generate a dictionary with default values for column configurations.

Parameters: column_config (dict, optional) – Dictionary containing column configurations. Defaults to None.
Returns: Dictionary with default values for column configurations.
Return type: dict

dnxmy.dnxmy module¶

class dnxmy.dnxmy.Dnxmy(n: int, m: Optional[int] = None, dnxmy_config: Optional[dnxmy.config_generator.DnxmyConfig] = None, seed: int = 0)¶

Bases: object

add_samples(n: int) → pandas.core.frame.DataFrame¶

Add samples to the generated data.

Parameters: n (int) – Number of samples to be added.
Returns: DataFrame containing the generated data with added samples.
Return type: pd.DataFrame

generate() → pandas.core.frame.DataFrame¶

Generate the data based on the provided configurations.

Returns: DataFrame containing the generated data.
Return type: pd.DataFrame

miss() → pandas.core.frame.DataFrame¶

Generate missing values based on the provided configurations.

Returns: DataFrame containing the generated data with missing values.
Return type: pd.DataFrame

dnxmy.variable_generator module¶

dnxmy.variable_generator.generate_arma_samples(col_name: str, time_series_config: dict, n: int) → pandas.core.series.Series¶

Generate a time series (ARMA model) based on the provided time series configurations.

Parameters

col_name (str) – Name of the column.
time_series_config (dict) – Dictionary containing the time series configurations.
n (int) – Number of samples to generate.

Returns

Series containing the generated samples.

Return type

pd.Series

dnxmy.variable_generator.generate_dependent_samples(col_name: str, dataset_config: list, df, dependent_on: dict, n: int, offset: Optional[dict] = None) → pandas.core.series.Series¶

Generate dependent samples based on the provided dependent_on configurations.

Parameters

col_name (str) – Name of the column.
dataset_config (list) – List containing the dataset configurations.
df (pd.DataFrame) – Dataframe containing the dataset.
dependent_on (dict) – Dictionary containing the dependent_on configurations.
n (int) – Number of samples to generate.
offset (dict, optional) – Dictionary containing the offset configurations. Defaults to None.

Returns

Series containing the generated samples.

Return type

pd.Series

dnxmy.variable_generator.generate_random_samples(col_name: str, probability_distribution: dict, n: int) → pandas.core.series.Series¶

Generate random samples based on the provided probability distribution.

Parameters

col_name (str) – Name of the column.
probability_distribution (dict) – Dictionary containing the probability distribution.
n (int) – Number of samples to be generated.

Returns

Series containing the generated samples.

Return type

pd.Series

dnxmy.variable_generator.generate_time_part(col_name: str, time_part_config: dict, n: int) → pandas.core.series.Series¶

Generate a time series based on the provided time configurations.

Parameters

col_name (str) – Name of the column.
time_part_config (dict) – Dictionary containing the time part configurations.
n (int) – Number of samples to generate.

Returns

Series containing the generated time series.

Return type

pd.Series
