dnxmy package
Submodules
dnxmy.config_generator module
- class dnxmy.config_generator.DnxmyConfig
Bases: object
- add_arma_column(col_name: str, intercept: float = 0, sigma: float = 0, ar_initial: Optional[float] = None, ar_order: Optional[int] = None, ar_params: Optional[list] = None, ar_shock_time: Optional[list] = None, ar_shock_type: Optional[list] = None, ar_shock_value: Optional[list] = None, ma_initial: Optional[float] = None, ma_order: Optional[int] = None, ma_params: Optional[list] = None, ma_shock_time: Optional[list] = None, ma_shock_type: Optional[list] = None, ma_shock_value: Optional[list] = None)
Add a column configuration for an ARMA variable.
- Parameters
col_name (str) – Name of the column.
intercept (float) – Intercept for the ARMA variable. Defaults to 0.
sigma (float) – Sigma for the ARMA variable. Defaults to 0.
ar_initial (float) – Initial value for the AR part of the ARMA variable. Defaults to None.
ar_order (int) – Order of the AR part of the ARMA variable. Defaults to None.
ar_params (list) – List of AR parameters for the ARMA variable. Defaults to None.
ar_shock_time (list) – List of shock times for the AR part of the ARMA variable. Defaults to None.
ar_shock_type (list) – List of shock types for the AR part of the ARMA variable. Defaults to None.
ar_shock_value (list) – List of shock values for the AR part of the ARMA variable. Defaults to None.
ma_initial (float) – Initial value for the MA part of the ARMA variable. Defaults to None.
ma_order (int) – Order of the MA part of the ARMA variable. Defaults to None.
ma_params (list) – List of MA parameters for the ARMA variable. Defaults to None.
ma_shock_time (list) – List of shock times for the MA part of the ARMA variable. Defaults to None.
ma_shock_type (list) – List of shock types for the MA part of the ARMA variable. Defaults to None.
ma_shock_value (list) – List of shock values for the MA part of the ARMA variable. Defaults to None.
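To make the intercept, sigma, AR, and MA parameters concrete, here is a minimal sketch of an ARMA recursion in plain Python. It is not dnxmy's implementation: the function name `simulate_arma` is hypothetical, shocks are left out, and treating lags before the start of the series as absent is an assumption (the library presumably uses `ar_initial` and `ma_initial` there).

```python
import random


def simulate_arma(n, intercept=0.0, sigma=0.0, ar_params=None, ma_params=None, seed=0):
    """Simulate x_t = c + sum(phi_i * x_{t-i}) + e_t + sum(theta_j * e_{t-j})."""
    ar_params = ar_params or []
    ma_params = ma_params or []
    rng = random.Random(seed)
    values, noises = [], []
    for _ in range(n):
        # Gaussian noise with standard deviation sigma; no noise when sigma is 0.
        noise = rng.gauss(0.0, sigma) if sigma > 0 else 0.0
        # zip stops at the shorter sequence, so early steps only use lags that exist.
        ar_term = sum(phi * x for phi, x in zip(ar_params, reversed(values)))
        ma_term = sum(theta * e for theta, e in zip(ma_params, reversed(noises)))
        values.append(intercept + ar_term + noise + ma_term)
        noises.append(noise)
    return values


# A noise-free AR(1) series converging towards intercept / (1 - phi) = 2.
series = simulate_arma(5, intercept=1.0, sigma=0.0, ar_params=[0.5])
print(series)  # [1.0, 1.5, 1.75, 1.875, 1.9375]
```

In this sketch, setting `sigma` above 0 and supplying `ma_params` adds the moving-average part on top of the same recursion.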
- add_categorical_column(col_name: str, value: list, probability: list)
Add a column configuration for a categorical variable.
- Parameters
col_name (str) – Name of the column.
value (list) – List of values of the categorical variable.
probability (list) – List of probabilities of the values.
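The pairing of `value` and `probability` can be illustrated with the standard library alone. This is only a sketch of weighted categorical sampling, not the code dnxmy runs; the category names and the requirement that the probabilities sum to 1 are assumptions made for the example.

```python
import random

values = ["red", "green", "blue"]
probabilities = [0.5, 0.3, 0.2]

# One probability per value, and (assumed here) the probabilities sum to 1.
assert len(values) == len(probabilities)
assert abs(sum(probabilities) - 1.0) < 1e-9

rng = random.Random(0)  # fixed seed so the draw is reproducible
samples = rng.choices(values, weights=probabilities, k=1000)

# Empirical counts should be roughly proportional to the probabilities.
counts = {v: samples.count(v) for v in values}
print(counts)
```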
- add_constant_column(col_name: str, constant_value: float = 1)
Add a column configuration for a constant variable.
- Parameters
col_name (str) – Name of the column.
constant_value (float) – Value of the constant. Defaults to 1.
- add_dependent_column(col_name: str, variables: list, beta: list, intercept: float, offset_column: Optional[str] = None, offset_function: str = 'default', link_function: str = 'identity')
Add a column configuration for a dependent variable.
- Parameters
col_name (str) – Name of the column.
variables (list) – List of columns on which the variable depends.
beta (list) – List of coefficients for the dependent variable.
intercept (float) – Intercept for the dependent variable.
offset_column (str) – Name of the column to use for the offset. Defaults to None.
offset_function (str) – Function to use for the offset. Defaults to 'default'.
link_function (str) – Link function to use for the dependent variable. Defaults to 'identity'.
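The roles of `variables`, `beta`, `intercept`, and `link_function` follow the usual generalized-linear-model pattern: a linear predictor is computed and then mapped through the inverse of the link. The sketch below shows that pattern under stated assumptions. Only 'identity' is named in this documentation; the 'log' entry, the helper `linear_predictor`, and the column names are hypothetical, and how dnxmy handles offsets or adds noise around the mean is not shown.

```python
import math

# Hypothetical inverse links; only 'identity' is confirmed by the documentation.
INVERSE_LINKS = {
    "identity": lambda eta: eta,
    "log": math.exp,
}


def linear_predictor(row, variables, beta, intercept):
    """Return intercept + sum(beta_i * row[variable_i])."""
    return intercept + sum(b * row[v] for v, b in zip(variables, beta))


row = {"x1": 2.0, "x2": -1.0}
eta = linear_predictor(row, ["x1", "x2"], [0.5, 2.0], intercept=1.0)

mean_identity = INVERSE_LINKS["identity"](eta)  # 1 + 0.5*2 + 2*(-1) = 0.0
mean_log = INVERSE_LINKS["log"](eta)            # exp(0.0) = 1.0
```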
- add_independent_column(col_name: str, probability_distribution_type: str = 'uniform', probability_distribution_params: dict = {'high': 1, 'low': 0})
Add a column configuration for an independent variable.
- Parameters
col_name (str) – Name of the column.
probability_distribution_type (str) – Type of the probability distribution. Defaults to 'uniform'.
probability_distribution_params (dict, optional) – Parameters of the probability distribution. Defaults to {'low': 0, 'high': 1}.
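As a rough illustration of the default setting, the sketch below draws from a uniform distribution whose bounds come from a parameter dictionary shaped like the default above. It uses only the standard library and is an assumption about what the defaults mean, not a description of how dnxmy samples internally.

```python
import random

# Same keys as the documented default parameters for the 'uniform' type.
params = {"low": 0, "high": 1}

rng = random.Random(0)  # fixed seed for reproducibility
samples = [rng.uniform(params["low"], params["high"]) for _ in range(100)]

print(min(samples), max(samples))
```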
- add_missing_config(target_col_name: str, missing_type: str = 'MCAR', missing_rate: Optional[float] = None, dependent_on: Optional[str] = None)
Generate a configuration dictionary describing missing values for a column.
- Parameters
missing_type (str) – Type of missingness. Defaults to 'MCAR'.
target_col_name (str) – Name of the column to which the missingness is applied.
missing_rate (float) – Missing rate. Defaults to None.
dependent_on (str) – Condition under which values are missing, written as a query string in the format accepted by pandas.DataFrame.query. Defaults to None.
- Returns
Dictionary of missing configurations.
- Return type
dict
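For the default 'MCAR' type (missing completely at random), each value is dropped independently of the data with probability equal to the missing rate. The sketch below shows that idea on a plain list. The helper `apply_mcar` is hypothetical and uses `None` as the missing marker; dnxmy works on DataFrames and may mark or select missing entries differently.

```python
import random


def apply_mcar(values, missing_rate, seed=0):
    """Replace each value with None independently with probability missing_rate."""
    rng = random.Random(seed)
    return [None if rng.random() < missing_rate else v for v in values]


column = list(range(1000))
masked = apply_mcar(column, missing_rate=0.2)

# The observed share of missing entries should be close to the requested rate.
observed_rate = masked.count(None) / len(masked)
print(observed_rate)
```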
- add_time_part_column(col_name: str, start_time: str, time_unit: str, time_format: str = '%Y-%m-%d')
Add a column configuration for a time part variable.
- Parameters
col_name (str) – Name of the column.
start_time (str) – Start time of the time variable.
time_unit (str) – Time unit of the time variable.
time_format (str) – Time format of the time variable. Defaults to '%Y-%m-%d'.
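One way to read `start_time`, `time_unit`, and `time_format` together is as a sequence that starts at `start_time` and advances by one `time_unit` per row. The sketch below implements that reading with the standard library. The unit names 'day' and 'hour', the helper `time_column`, and returning formatted strings are all assumptions; the values dnxmy accepts for `time_unit` are not listed in this documentation.

```python
from datetime import datetime, timedelta

# Hypothetical unit names mapped to step sizes.
STEP = {"day": timedelta(days=1), "hour": timedelta(hours=1)}


def time_column(start_time, time_unit, n, time_format="%Y-%m-%d"):
    """Return n formatted timestamps, one time_unit apart, starting at start_time."""
    start = datetime.strptime(start_time, time_format)
    return [(start + i * STEP[time_unit]).strftime(time_format) for i in range(n)]


days = time_column("2023-01-30", "day", 4)
print(days)  # ['2023-01-30', '2023-01-31', '2023-02-01', '2023-02-02']
```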
- delete_column_config(col_name: str)
Delete a column configuration.
- Parameters
col_name (str) – Name of the column.
- set_dataset_config(m: int)
Set dataset configurations based on the provided configurations or the default configurations.
- t_sort()
Optimize the order of dataset configuration using topological sorting.
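Topological sorting matters here because a dependent column can only be generated after the columns it depends on. The sketch below shows the general technique with Python's standard `graphlib` module (Python 3.9+). The column names and the dependency mapping are hypothetical, and this is not a view into how `t_sort()` is implemented.

```python
from graphlib import TopologicalSorter

# Hypothetical columns: each key maps to the set of columns it depends on.
dependencies = {
    "price": set(),
    "quantity": set(),
    "revenue": {"price", "quantity"},
    "tax": {"revenue"},
}

# static_order() yields every column after all of its dependencies.
order = list(TopologicalSorter(dependencies).static_order())
print(order)
```

A cyclic dependency (for example, two columns that depend on each other) makes `static_order()` raise `graphlib.CycleError`, since no valid generation order exists.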
- dnxmy.config_generator.generate_default_config(column_config: Optional[dict] = None) → dict
Generate a dictionary with default values for column configurations.
- Parameters
column_config (dict, optional) – Dictionary containing column configurations. Defaults to None.
- Returns
Dictionary with default values for column configurations.
- Return type
dict
dnxmy.dnxmy module
- class dnxmy.dnxmy.Dnxmy(n: int, m: Optional[int] = None, dnxmy_config: Optional[dnxmy.config_generator.DnxmyConfig] = None, seed: int = 0)
Bases: object
- add_samples(n: int) → pandas.core.frame.DataFrame
Add samples to the generated data.
- Parameters
n (int) – Number of samples to be added.
- Returns
DataFrame containing the generated data with added samples.
- Return type
pd.DataFrame
- generate() → pandas.core.frame.DataFrame
Generate the data based on the provided configurations.
- Returns
DataFrame containing the generated data.
- Return type
pd.DataFrame
- miss() → pandas.core.frame.DataFrame
Generate missing values based on the provided configurations.
- Returns
DataFrame containing the generated data with missing values.
- Return type
pd.DataFrame
dnxmy.variable_generator module
- dnxmy.variable_generator.generate_arma_samples(col_name: str, time_series_config: dict, n: int) → pandas.core.series.Series
Generate a time series (ARMA model) based on the provided time series configuration.
- Parameters
col_name (str) – Name of the column.
time_series_config (dict) – Dictionary containing the time series configuration.
n (int) – Number of samples to generate.
- Returns
Series containing the generated samples.
- Return type
pd.Series
- dnxmy.variable_generator.generate_dependent_samples(col_name: str, dataset_config: list, df, dependent_on: dict, n: int, offset: Optional[dict] = None) → pandas.core.series.Series
Generate dependent samples based on the provided dependent_on configurations.
- Parameters
col_name (str) – Name of the column.
dataset_config (list) – List containing the dataset configurations.
df (pd.DataFrame) – Dataframe containing the dataset.
dependent_on (dict) – Dictionary containing the dependent_on configurations.
n (int) – Number of samples to generate.
offset (dict, optional) – Dictionary containing the offset configurations. Defaults to None.
- Returns
Series containing the generated samples.
- Return type
pd.Series
- dnxmy.variable_generator.generate_random_samples(col_name: str, probability_distribution: dict, n: int) → pandas.core.series.Series
Generate random samples based on the provided probability distribution.
- Parameters
col_name (str) – Name of the column.
probability_distribution (dict) – Dictionary containing the probability distribution.
n (int) – Number of samples to be generated.
- Returns
Series containing the generated samples.
- Return type
pd.Series
- dnxmy.variable_generator.generate_time_part(col_name: str, time_part_config: dict, n: int) → pandas.core.series.Series
Generate a time series based on the provided time configurations.
- Parameters
col_name (str) – Name of the column.
time_part_config (dict) – Dictionary containing the time configurations.
n (int) – Number of samples to generate.
- Returns
Series containing the generated time series.
- Return type
pd.Series