dnxmy package

Submodules

dnxmy.config_generator module

class dnxmy.config_generator.DnxmyConfig

Bases: object

add_arma_column(col_name: str, intercept: float = 0, sigma: float = 0, ar_initial: Optional[float] = None, ar_order: Optional[int] = None, ar_params: Optional[list] = None, ar_shock_time: Optional[list] = None, ar_shock_type: Optional[list] = None, ar_shock_value: Optional[list] = None, ma_initial: Optional[float] = None, ma_order: Optional[int] = None, ma_params: Optional[list] = None, ma_shock_time: Optional[list] = None, ma_shock_type: Optional[list] = None, ma_shock_value: Optional[list] = None)

Add a column configuration for an ARMA variable.

Parameters
  • col_name (str) – Name of the column.

  • intercept (float) – Intercept for the ARMA variable. Defaults to 0.

  • sigma (float) – Sigma for the ARMA variable. Defaults to 0.

  • ar_initial (float) – Initial value for the AR part of the ARMA variable. Defaults to None.

  • ar_order (int) – Order of the AR part of the ARMA variable. Defaults to None.

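  • ar_params (list) – List of AR parameters for the ARMA variable. Defaults to None.

  • ar_shock_time (list) – List of shock times for the AR part of the ARMA variable. Defaults to None.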

  • ar_shock_type (list) – List of shock types for the AR part of the ARMA variable. Defaults to None.

  • ar_shock_value (list) – List of shock values for the AR part of the ARMA variable. Defaults to None.

  • ma_initial (float) – Initial value for the MA part of the ARMA variable. Defaults to None.

  • ma_order (int) – Order of the MA part of the ARMA variable. Defaults to None.

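  • ma_params (list) – List of MA parameters for the ARMA variable. Defaults to None.

  • ma_shock_time (list) – List of shock times for the MA part of the ARMA variable. Defaults to None.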

  • ma_shock_type (list) – List of shock types for the MA part of the ARMA variable. Defaults to None.

  • ma_shock_value (list) – List of shock values for the MA part of the ARMA variable. Defaults to None.
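
As a rough illustration of what the intercept, sigma, ar_params and ma_params arguments describe, the sketch below simulates an ARMA(p, q) series in plain Python. It is not dnxmy's implementation: the function name simulate_arma is hypothetical, shocks and initial values are left out, and it assumes the parameter lists are ordered from lag 1 upward.

```python
import random


def simulate_arma(n, intercept=0.0, sigma=0.0, ar_params=None, ma_params=None, seed=0):
    """Simulate x_t = intercept + sum(phi_i * x_{t-i}) + sum(theta_j * e_{t-j}) + e_t.

    Hypothetical helper for illustration only; missing lags at the start of
    the series are treated as absent (they contribute nothing).
    """
    rng = random.Random(seed)
    ar_params = list(ar_params or [])
    ma_params = list(ma_params or [])
    values, noise = [], []
    for t in range(n):
        # Gaussian innovation; with sigma == 0 the series is deterministic.
        eps = rng.gauss(0.0, sigma) if sigma > 0 else 0.0
        ar_term = sum(
            phi * values[t - 1 - i]
            for i, phi in enumerate(ar_params)
            if t - 1 - i >= 0
        )
        ma_term = sum(
            theta * noise[t - 1 - j]
            for j, theta in enumerate(ma_params)
            if t - 1 - j >= 0
        )
        values.append(intercept + ar_term + ma_term + eps)
        noise.append(eps)
    return values


series = simulate_arma(3, intercept=1.0, ar_params=[0.5])
```

With sigma left at 0 the noise terms vanish, so the example series is driven only by the intercept and the single AR coefficient.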

add_categorical_column(col_name: str, value: list, probability: list)

Add a column configuration for a categorical variable.

Parameters
  • col_name (str) – Name of the column.

  • value (list) – List of values of the categorical variable.

  • probability (list) – List of probabilities of the values.
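
The pairing of value and probability can be pictured with the standard library alone. The sketch below is an assumption about the intended semantics (one probability per value, summing to 1), not the package's code, and sample_categorical is a hypothetical name.

```python
import random


def sample_categorical(value, probability, n, seed=0):
    """Draw n items from `value`, weighting each by the matching probability."""
    if len(value) != len(probability):
        raise ValueError("value and probability must have the same length")
    if abs(sum(probability) - 1.0) > 1e-9:
        raise ValueError("probabilities are assumed to sum to 1")
    rng = random.Random(seed)
    # random.choices samples with replacement according to the weights.
    return rng.choices(value, weights=probability, k=n)


draws = sample_categorical(["red", "green", "blue"], [0.5, 0.3, 0.2], n=10)
```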

add_constant_column(col_name: str, constant_value: float = 1)

Add a column configuration for a constant variable.

Parameters
  • col_name (str) – Name of the column.

  • constant_value (float) – Value of the constant. Defaults to 1.

add_dependent_column(col_name: str, variables: list, beta: list, intercept: float, offset_column: Optional[str] = None, offset_function: str = 'default', link_function: str = 'identity')

Add a column configuration for a dependent variable.

Parameters
  • col_name (str) – Name of the column.

  • variables (list) – List of columns on which the variable depends.

  • beta (list) – List of coefficients for the dependent variable.

  • intercept (float) – Intercept for the dependent variable.

  • offset_column (str) – Name of the column to use for the offset. Defaults to None.

  • offset_function (str) – Function to use for the offset. Defaults to ‘default’.

  • link_function (str) – Link function to use for the dependent variable. Defaults to ‘identity’.
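
To make the roles of variables, beta and intercept concrete, here is a minimal sketch of a linear predictor passed through a link. Only 'identity' is named in the documentation above; the 'log' branch, the function name dependent_values and the omission of the offset are assumptions made for illustration.

```python
import math


def dependent_values(rows, variables, beta, intercept, link_function="identity"):
    """Compute intercept + sum(beta_k * x_k) per row, then apply the inverse link."""
    if len(variables) != len(beta):
        raise ValueError("variables and beta must have the same length")
    out = []
    for row in rows:
        eta = intercept + sum(b * row[v] for v, b in zip(variables, beta))
        if link_function == "identity":
            out.append(eta)
        elif link_function == "log":  # hypothetical option, not confirmed by the docs
            out.append(math.exp(eta))
        else:
            raise ValueError(f"unsupported link_function: {link_function!r}")
    return out


rows = [{"x1": 1.0, "x2": 2.0}, {"x1": 0.0, "x2": 1.0}]
y = dependent_values(rows, ["x1", "x2"], beta=[2.0, 3.0], intercept=1.0)
```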

add_independent_column(col_name: str, probability_distribution_type: str = 'uniform', probability_distribution_params: dict = {'high': 1, 'low': 0})

Add a column configuration for an independent variable.

Parameters
  • col_name (str) – Name of the column.

  • probability_distribution_type (str) – Type of the probability distribution. Defaults to ‘uniform’.

  • probability_distribution_params (dict, optional) – Parameters of the probability distribution. Defaults to {‘low’: 0, ‘high’: 1}.
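
One way to read the type-plus-params pair is as a dispatch from a distribution name to a sampler that consumes the dictionary. The sketch below assumes this reading; only 'uniform' with 'low' and 'high' comes from the documented defaults, while the 'normal' branch, its 'loc' and 'scale' keys, and the name sample_independent are hypothetical.

```python
import random


def sample_independent(n, probability_distribution_type="uniform",
                       probability_distribution_params=None, seed=0):
    """Draw n values from the named distribution using the given parameters."""
    # Avoid a mutable default argument; fall back to the documented default.
    params = probability_distribution_params or {"low": 0, "high": 1}
    rng = random.Random(seed)
    if probability_distribution_type == "uniform":
        return [rng.uniform(params["low"], params["high"]) for _ in range(n)]
    if probability_distribution_type == "normal":  # hypothetical branch
        return [rng.gauss(params["loc"], params["scale"]) for _ in range(n)]
    raise ValueError(f"unsupported distribution: {probability_distribution_type!r}")


samples = sample_independent(4)
```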

add_missing_config(target_col_name: str, missing_type: str = 'MCAR', missing_rate: Optional[float] = None, dependent_on: Optional[str] = None)

Generate a missing-value configuration dictionary.

Parameters
  • target_col_name (str) – Name of the column to which the missingness is applied.

  • missing_type (str) – Type of missingness. Defaults to ‘MCAR’.

  • missing_rate (float) – Missing rate. Defaults to None.

  • dependent_on (str) – Condition under which values go missing, written as a query string accepted by pandas.DataFrame.query. Defaults to None.

Returns

Dictionary of missing configurations.

Return type

dict
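
For intuition about the default 'MCAR' (missing completely at random) type, the sketch below blanks each value independently with probability missing_rate. It is an illustration in plain Python, not the package's masking code; apply_mcar is a hypothetical name and None stands in for a missing value.

```python
import random


def apply_mcar(values, missing_rate, seed=0):
    """Replace each value with None independently with probability missing_rate."""
    if not 0.0 <= missing_rate <= 1.0:
        raise ValueError("missing_rate must lie in [0, 1]")
    rng = random.Random(seed)
    # rng.random() is in [0, 1), so rate 0 masks nothing and rate 1 masks everything.
    return [None if rng.random() < missing_rate else v for v in values]


masked = apply_mcar([10, 20, 30, 40, 50], missing_rate=0.4)
```

Because the mask ignores the data entirely, this corresponds to the case where dependent_on is left at None.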

add_time_part_column(col_name: str, start_time: str, time_unit: str, time_format: str = '%Y-%m-%d')

Add a column configuration for a time part variable.

Parameters
  • col_name (str) – Name of the column.

  • start_time (str) – Start time of the time variable.

  • time_unit (str) – Time unit of the time variable.

  • time_format (str) – Time format of the time variable. Defaults to ‘%Y-%m-%d’.
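
The interplay of start_time, time_unit and time_format can be sketched with the datetime module. The accepted time_unit values are not listed above, so the 'day' and 'hour' names here are assumptions, as is the function name time_part.

```python
from datetime import datetime, timedelta

# Hypothetical unit names; the real set of accepted values is not documented here.
_STEPS = {"day": timedelta(days=1), "hour": timedelta(hours=1)}


def time_part(start_time, time_unit, n, time_format="%Y-%m-%d"):
    """Return n formatted timestamps starting at start_time, one per time unit."""
    start = datetime.strptime(start_time, time_format)
    step = _STEPS[time_unit]
    return [(start + i * step).strftime(time_format) for i in range(n)]


stamps = time_part("2023-01-30", "day", 3)
```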

delete_column_config(col_name: str)

Delete a column configuration.

Parameters

col_name (str) – Name of the column.

set_dataset_config(m: int)

Set dataset configurations based on the provided configurations or the default configurations.

t_sort()

Optimize the order of dataset configuration using topological sorting.
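
Topological sorting places every column after the columns it depends on, which is what a generator needs before computing dependent variables. The sketch below shows the idea with the standard-library graphlib module (Python 3.9+); the dependency mapping and the name sort_columns are hypothetical and do not reflect how DnxmyConfig stores its configuration.

```python
from graphlib import TopologicalSorter


def sort_columns(dependencies):
    """Order column names so that each appears after all of its prerequisites.

    `dependencies` maps a column name to the columns it depends on.
    graphlib raises CycleError if the dependencies are circular.
    """
    return list(TopologicalSorter(dependencies).static_order())


order = sort_columns({"y": ["x1", "x2"], "z": ["y"], "x1": [], "x2": []})
```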

dnxmy.config_generator.generate_default_config(column_config: Optional[dict] = None) → dict

Generate a dictionary with default values for column configurations.

Parameters

column_config (dict, optional) – Dictionary containing column configurations. Defaults to None.

Returns

Dictionary with default values for column configurations.

Return type

dict

dnxmy.dnxmy module

class dnxmy.dnxmy.Dnxmy(n: int, m: Optional[int] = None, dnxmy_config: Optional[dnxmy.config_generator.DnxmyConfig] = None, seed: int = 0)

Bases: object

add_samples(n: int) → pandas.core.frame.DataFrame

Add samples to the generated data.

Parameters

n (int) – Number of samples to be added.

Returns

DataFrame containing the generated data with added samples.

Return type

pd.DataFrame

generate() → pandas.core.frame.DataFrame

Generate the data based on the provided configurations.

Returns

DataFrame containing the generated data.

Return type

pd.DataFrame

miss() → pandas.core.frame.DataFrame

Generate missing values based on the provided configurations.

Returns

DataFrame containing the generated data with missing values.

Return type

pd.DataFrame

dnxmy.variable_generator module

dnxmy.variable_generator.generate_arma_samples(col_name: str, time_series_config: dict, n: int) → pandas.core.series.Series

Generate a time series (ARMA model) based on the provided time series configurations.

Parameters
  • col_name (str) – Name of the column.

  • time_series_config (dict) – Dictionary containing the time series configurations.

  • n (int) – Number of samples to generate.

Returns

Series containing the generated samples.

Return type

pd.Series

dnxmy.variable_generator.generate_dependent_samples(col_name: str, dataset_config: list, df, dependent_on: dict, n: int, offset: Optional[dict] = None) → pandas.core.series.Series

Generate dependent samples based on the provided dependent_on configurations.

Parameters
  • col_name (str) – Name of the column.

  • dataset_config (list) – List containing the dataset configurations.

  • df (pd.DataFrame) – Dataframe containing the dataset.

  • dependent_on (dict) – Dictionary containing the dependent_on configurations.

  • n (int) – Number of samples to generate.

  • offset (dict, optional) – Dictionary containing the offset configurations. Defaults to None.

Returns

Series containing the generated samples.

Return type

pd.Series

dnxmy.variable_generator.generate_random_samples(col_name: str, probability_distribution: dict, n: int) → pandas.core.series.Series

Generate random samples based on the provided probability distribution.

Parameters
  • col_name (str) – Name of the column.

  • probability_distribution (dict) – Dictionary containing the probability distribution.

  • n (int) – Number of samples to be generated.

Returns

Series containing the generated samples.

Return type

pd.Series

dnxmy.variable_generator.generate_time_part(col_name: str, time_part_config: dict, n: int) → pandas.core.series.Series

Generate a time series based on the provided time configurations.

Parameters
  • col_name (str) – Name of the column.

  • time_part_config (dict) – Dictionary containing the time configurations.

  • n (int) – Number of samples to generate.

Returns

Series containing the generated time series.

Return type

pd.Series

Module contents