Getting Data From Database

CLEASE retrieves data from ASE databases via the generic DataManager class. For convenience, concrete implementations of DataManager are provided for the most common applications.

class clease.data_manager.CorrFuncEnergyDataManager(db_name: str, tab_name: str, cf_names: List[str] | None = None, order: int = 1)

CorrFuncEnergyDataManager is a convenience class provided to handle the standard case where the features are correlation functions and the target is the DFT energy per atom.

Parameters:
  • db_name – Name of the database being passed

  • cf_names – List with the correlation function names to extract

  • tab_name – Name of the table where the correlation functions are stored

  • order – Order of the correlation function. Default 1.

get_data(select_cond: List[tuple]) → Tuple[ndarray, ndarray]

Return X and y, where X is the design matrix containing correlation functions and y is the DFT energy per atom.

Parameters:

select_cond – List with select conditions for the database (e.g. [('converged', '=', True)])
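Each select condition is a (key, operator, value) tuple, and a row is kept only if it satisfies all of them. The sketch below mimics that filtering on plain dictionaries to show the semantics. The function `row_matches`, the operator set, and the sample rows are hypothetical; the actual filtering is done by the ASE database, not by code like this.

```python
import operator
from typing import Any, Dict, List, Tuple

# Comparison operators that may appear in a select condition (assumed set).
_OPS = {
    "=": operator.eq,
    "!=": operator.ne,
    "<": operator.lt,
    "<=": operator.le,
    ">": operator.gt,
    ">=": operator.ge,
}


def row_matches(row: Dict[str, Any], select_cond: List[Tuple[str, str, Any]]) -> bool:
    """Return True if the row satisfies every (key, op, value) condition."""
    return all(
        key in row and _OPS[op](row[key], value)
        for key, op, value in select_cond
    )


# Hypothetical rows standing in for database entries.
rows = [
    {"id": 1, "converged": True, "gen": 0},
    {"id": 2, "converged": False, "gen": 0},
    {"id": 3, "converged": True, "gen": 1},
]
selected_ids = [r["id"] for r in rows if row_matches(r, [("converged", "=", True)])]
```

Passing several tuples narrows the selection further, since every condition must hold for a row to be included.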

class clease.data_manager.CorrFuncVolumeDataManager(db_name: str, tab_name: str, cf_names: List[str] | None = None, order: int = 1)

CorrFuncVolumeDataManager is a convenience class provided to handle the standard case where the features are correlation functions and the target is the volume of the relaxed cell

Parameters:
  • db_name – Name of the database being passed

  • tab_name – Name of the table where the correlation functions are stored

  • cf_names – List with the correlation function names to extract. If None, all correlation functions in the database will be extracted.

  • order – Order of the correlation functions. Default 1.

get_data(select_cond: List[tuple]) → Tuple[ndarray, ndarray]

Return X and y, where X is the design matrix containing correlation functions and y is the volume per atom.

Parameters:

select_cond – List with select conditions for the database (e.g. [('converged', '=', True)])

class clease.data_manager.CorrelationFunctionGetter(db_name: str, tab_name: str, cf_names: List[str] | None = None, order: int = 1)

CorrelationFunctionGetter is a class that extracts the correlation functions from an AtomsRow object

Parameters:
  • db_name – Name of the database

  • tab_name – Name of the external table where the correlation functions are stored

  • cf_names – List with the names of the correlation functions. If None, all correlation functions in the database will be extracted

  • order – Order of the correlation function. Default is 1.

get_property(ids: Sequence[int]) → ndarray

Extracts the design matrix associated with the database IDs. The first row in the matrix corresponds to the first item in ids, the second row corresponds to the second item in ids etc. If cf_names was None, all correlation functions in the database will be extracted. cf_names will be updated such that it reflects the names of the correlation functions that were extracted.

Parameters:

ids – Database IDs of initial structures
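The row ordering and the handling of cf_names=None can be illustrated with a small pure-Python sketch. It is not the CLEASE implementation: `cf_table` and `design_matrix` are hypothetical names, nested lists stand in for an ndarray, and the choice to sort the names when none are given is an assumption.

```python
from typing import Dict, List, Optional, Sequence, Tuple

# Hypothetical stand-in for the external table: database ID -> {cf name: value}.
cf_table: Dict[int, Dict[str, float]] = {
    1: {"c0": 1.0, "c1_0": 0.5},
    2: {"c0": 1.0, "c1_0": -0.5},
    3: {"c0": 1.0, "c1_0": 0.0},
}


def design_matrix(
    ids: Sequence[int], cf_names: Optional[List[str]] = None
) -> Tuple[List[List[float]], List[str]]:
    """Build one row per ID, in the order the IDs were passed."""
    if cf_names is None:
        # Assumption: fall back to every name stored for the first ID, sorted.
        cf_names = sorted(cf_table[ids[0]])
    matrix = [[cf_table[i][name] for name in cf_names] for i in ids]
    return matrix, cf_names


# Row 0 belongs to ID 3 and row 1 to ID 1, matching the order of ids.
cf_matrix, extracted_names = design_matrix([3, 1])
```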

property names

Return the name of each column

class clease.data_manager.CorrelationFunctionGetterVolDepECI(db_name: str, tab_name: str, cf_names: List[str], order: int | None = 0, properties: Tuple[str] = ('energy', 'pressure'), cf_order: int = 1)

Extracts correlation functions multiplied by a power of the volume per atom. The features are named after the correlation function names in the database, with a suffix _Vd appended, where d is an integer indicating the power. For example, the name c2_d0000_0_00_V2 means that the column contains the correlation function c2_d0000_0_00 multiplied by V^2, where V is the volume per atom.

Parameters:
  • db_name – Name of the database

  • tab_name – Name of the table where correlation functions are stored

  • cf_names – Name of the correlation functions that should be extracted

  • order – Each ECI will be a polynomial in the volume of the passed order (default: 0)

  • properties – Properties that should be used in fitting. Can be energy, pressure, bulk_mod (default: ('energy', 'pressure')). The pressure is always assumed to be zero (i.e. the energies passed are for relaxed structures). All entries in the database are expected to have an energy. The remaining properties (e.g. bulk_mod) are not required for all structures; the class picks up the material property for the structures where it is present.

  • cf_order – The energy is expanded up to (and including) this order in the correlation functions. Default is 1.
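The _Vd naming scheme can be illustrated with a short sketch that expands each correlation function into powers of the volume per atom. This is only an illustration of the naming and scaling described above: `volume_features` is a hypothetical helper, and both the column ordering and the use of powers 0 through order are assumptions, not details confirmed by the class documentation.

```python
from typing import List, Tuple


def volume_features(
    cf_names: List[str], cf_values: List[float], volume: float, order: int
) -> Tuple[List[str], List[float]]:
    """Expand each correlation function into powers 0..order of the volume."""
    names: List[str] = []
    values: List[float] = []
    for name, cf in zip(cf_names, cf_values):
        for d in range(order + 1):
            # Column "<name>_Vd" holds the correlation function times V^d.
            names.append(f"{name}_V{d}")
            values.append(cf * volume**d)
    return names, values


feat_names, feat_values = volume_features(
    ["c0", "c2_d0000_0_00"], [1.0, 0.5], volume=2.0, order=2
)
```

With a volume per atom of 2.0, the column c2_d0000_0_00_V2 in this sketch holds 0.5 × 2.0² = 2.0.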

build(ids: List[int]) → ndarray

Construct the design matrix and the target value required to fit a cluster expansion model to all material properties in self.properties.

Parameters:

ids – List of ids to take into account

get_data(select_cond: List[tuple]) → Tuple[ndarray, ndarray]

Return the design matrix and the target values for the entries corresponding to select_cond.

Parameters:

select_cond – ASE select condition. The design matrix and the target vector will be extracted for rows matching the passed condition.

groups() → List[int]

Return the group of each row.

class clease.data_manager.DataManager(db_name: str)

DataManager is a class for extracting data from CLEASE databases to be used to fit ECIs

Parameters:

db_name – Name of the database

get_cols(names: List[str]) → ndarray

Get all columns corresponding to the names

Parameters:

names – List of names (e.g. ['c0', 'c1_1'])
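Selecting columns by name amounts to looking up each name's position among the feature names and slicing the matrix accordingly. The sketch below shows the idea on nested lists; `select_columns` and the sample data are hypothetical and do not reflect how get_cols is implemented internally.

```python
from typing import List, Sequence


def select_columns(
    X: Sequence[Sequence[float]], all_names: List[str], names: List[str]
) -> List[List[float]]:
    """Return the columns of X whose feature names are listed in names."""
    # list.index raises ValueError if a requested name is unknown.
    col_idx = [all_names.index(name) for name in names]
    return [[row[i] for i in col_idx] for row in X]


# Hypothetical feature names and design matrix.
all_names = ["c0", "c1_0", "c1_1"]
X_full = [[1.0, 0.5, -0.2], [1.0, -0.5, 0.3]]
cols = select_columns(X_full, all_names, ["c0", "c1_1"])
```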

abstract get_data(select_cond: List[tuple]) → Tuple[ndarray, ndarray]

Return the design matrix X and the target data y

get_matching_names(pattern: str) → List[str]

Get names that match pattern

Parameters:

pattern – Substring that a name must contain to be returned.

Example:

If the names are ['abc', 'def', 'gbcr'] and the passed pattern is 'bc', then ['abc', 'gbcr'] will be returned
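The example above corresponds to a plain substring test. A minimal sketch, assuming no wildcard or regular-expression handling (`matching_names` is a hypothetical stand-alone function, not the method itself):

```python
from typing import List


def matching_names(names: List[str], pattern: str) -> List[str]:
    """Return the names that contain pattern as a substring."""
    return [name for name in names if pattern in name]


matched = matching_names(["abc", "def", "gbcr"], "bc")
```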

groups() → List[int]

Returns the group of each item in the X matrix. In the top-level DataManager it is assumed that each row in the X matrix constitutes its own group, but this method may be overridden in child classes.
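The relationship between the abstract get_data, the default grouping, and an overriding child class can be sketched as follows. `SketchDataManager`, `PlainManager`, and `PairedRowsManager` are hypothetical classes written for illustration with hard-coded data; they only mirror the contract described here and are not part of CLEASE.

```python
from abc import ABC, abstractmethod
from typing import List, Tuple

Matrix = List[List[float]]


class SketchDataManager(ABC):
    """Hypothetical stand-in illustrating the DataManager contract."""

    def __init__(self) -> None:
        self._X: Matrix = []
        self._y: List[float] = []

    @abstractmethod
    def get_data(self, select_cond: List[tuple]) -> Tuple[Matrix, List[float]]:
        """Return the design matrix X and the target data y."""

    def groups(self) -> List[int]:
        # Default: every row of X is its own group.
        return list(range(len(self._X)))


class PlainManager(SketchDataManager):
    """Child class that keeps the default grouping."""

    def get_data(self, select_cond: List[tuple]) -> Tuple[Matrix, List[float]]:
        # Hard-coded data; a real manager would query the database here.
        self._X = [[1.0, 0.1], [1.0, 0.2], [1.0, 0.3], [1.0, 0.4]]
        self._y = [-2.0, -2.1, -2.3, -2.4]
        return self._X, self._y


class PairedRowsManager(PlainManager):
    """Child class whose consecutive rows share a group."""

    def groups(self) -> List[int]:
        # Override: rows 0-1 form group 0, rows 2-3 form group 1.
        return [row // 2 for row in range(len(self._X))]


plain, paired = PlainManager(), PairedRowsManager()
plain.get_data([])
paired.get_data([])
default_groups = plain.groups()
paired_groups = paired.groups()
```

Group labels of this kind are typically useful when splitting data for cross-validation, so that rows belonging together are not separated.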

to_csv(fname: str)

Export the dataset used to fit a model y = Xc, where y is typically the DFT energy per atom and c is the vector of unknown ECIs. This function exports the data to a CSV file with the following format:

    # ECIname_1, ECIname_2, …, ECIname_n, E_DFT
    0.1, 0.4, …, -0.6, -2.0
    0.3, 0.2, …, -0.9, -2.3

Thus each row in the file contains the correlation function values and the corresponding DFT energy value.

Parameters:

fname – Filename to write to. Typically this should end with .csv
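The file layout can be reproduced with the standard library alone. The sketch below builds the text in memory rather than writing to fname; `dataset_to_csv` is a hypothetical helper, and the exact number formatting and spacing produced by to_csv may differ.

```python
import io
from typing import Sequence


def dataset_to_csv(
    names: Sequence[str],
    X: Sequence[Sequence[float]],
    y: Sequence[float],
    target: str = "E_DFT",
) -> str:
    """Format a design matrix and target vector as commented-header CSV text."""
    buf = io.StringIO()
    # Header line: feature names followed by the target name, prefixed by "#".
    buf.write("# " + ", ".join(list(names) + [target]) + "\n")
    for row, energy in zip(X, y):
        # One line per structure: feature values, then the target value.
        buf.write(", ".join(str(v) for v in list(row) + [energy]) + "\n")
    return buf.getvalue()


csv_text = dataset_to_csv(["c0", "c1_0"], [[0.1, 0.4], [0.3, 0.2]], [-2.0, -2.3])
```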

class clease.data_manager.FinalStructPropertyGetter(db_name: str, prop: str)

FinalStructPropertyGetter is a class that returns the user-defined property value corresponding to the passed AtomsRow object. The user-defined property should be located in the final atoms row.

Parameters:

db_name – Name of the database

get_property(ids: Sequence[int]) → ndarray

Extract the property of the ids passed.

Parameters:

ids – Database ids of initial structures

property name

Return the name of the target property

exception clease.data_manager.InconsistentDataError

Data is inconsistent

clease.data_manager.make_corr_func_data_manager(prop: str, db_name: str, tab_name: str, cf_names: Sequence[str], **kwargs) → DataManager

Helper function for creating a correlation function data manager