+# A Public Database of Thermoelectric Materials and System-Identified Material Representation for Data-Driven Discovery
+
+Source: https://github.com/KRICT-DATA/SIMD
+
+License: CC BY 4.0, https://creativecommons.org/licenses/by/4.0/
+
+This database is created from a database released for [Na, G. S., & Chang, H. (2022). A public database of thermoelectric materials and system-identified material representation for data-driven discovery. npj Computational Materials, 8(1), 214.](https://www.nature.com/articles/s41524-022-00897-2).
+
+The ESTM dataset (in "estm.xlsx") covers 880 unique thermoelectric materials and provides five experimentally measured thermoelectric properties: Seebeck coefficient, electrical conductivity, thermal conductivity, power factor, and figure of merit (ZT). There are a total of 5205 rows in the dataset.
+
+Changes include:
+- Standardized the DOI Format
+- Introduced constraints on specific columns like temperature and thermal conductivity
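The DOI standardization listed above can be illustrated outside the pipeline with a short Python sketch. It is not the pipeline's own transform: the function name `normalize_doi` is hypothetical, and the sketch assumes that the reference values are either bare DOIs or DOI resolver URLs of the form `http(s)://doi.org/10.…` or `http(s)://dx.doi.org/10.…`.

```python
import re

# Assumption: references are bare DOIs or DOI resolver URLs such as
# "https://doi.org/10.1038/..." or "http://dx.doi.org/10.1038/...".
_DOI_URL_PREFIX = re.compile(r"^https?://(?:dx\.)?doi\.org/", re.IGNORECASE)


def normalize_doi(reference: str) -> str:
    """Strip a leading DOI resolver URL so the value starts with '10.'."""
    return _DOI_URL_PREFIX.sub("", reference.strip())


if __name__ == "__main__":
    print(normalize_doi("https://doi.org/10.1038/s41524-022-00897-2"))
```

Values that already start with `10.` pass through unchanged, so the function can be applied to a whole column regardless of which form each row uses.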
 // XLSXInterpreter does not work, see https://github.com/jvalue/jayvee/issues/603
 
-/*
-These datasets are based on the paper cited as "Na, G. S., & Chang, H. (2022). A public database of thermoelectric materials and system-identified material representation for data-driven discovery. npj Computational Materials, 8(1), 214.".
-The ESTM dataset covers 880 unique thermoelectric materials and provides five experimentally measured thermoelectric properties: Seebeck coefficient, electrical conductivity, thermal conductivity, power factor, and figure of merit (ZT).
-In this paper, a machine learning approach is devised that predicts the ZT values of materials from unexplored material groups, improving the R2-score from 0.13 to 0.71 in an extrapolation problem.
-
-Coming to the data engineering pipeline, the earlier version had 5 sheets, but we removed "results_extrapol_0.xlsx", "results_extrapol_1.xlsx" & "results_extrapol_2.xlsx", because they were truncated versions of
-"preds_sxgb.xlsx" and contained no new data. So, we have kept only "estm.xlsx" & "preds_sxgb.xlsx" and removed the other, redundant datasets. There are a total of 5205 rows in both datasets, before and after the pipeline is executed.
-Other changes that we made were changing the datatypes of various columns. Previously all of them were text; we changed them to appropriate datatypes as we saw fit. We have also introduced constraints on specific columns like
-temperature & thermal conductivity, where we specified the range of values that the columns can allow.
-
-We have also used Transform blocks on the Reference column to standardize the DOI URLs, so that all of them start with 10.xx.xxx instead of http://xxx or https://xxxx.
-
-All the changes are discussed in detail just before the part of the code that causes them.
-*/
-
-//In the paper the range of the temperature is mentioned to be between 10 & 1275 kelvin. So, we have kept that as the allowed range of the temperature. (refer to page 2)
+// The paper mentions temperature to be between 10 & 1275 kelvin, see page 2
-//In the paper the range of the Thermal conductivity is mentioned to be between 0.07 & 77.16 W/mK and that is the allowed range for our datatype. (refer to page 2)
+// The paper mentions thermal conductivity to be between 0.07 & 77.16 W/mK, see page 2
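To make the two range constraints concrete, here is a minimal Python sketch of the same check, using the bounds quoted in the comments above (10 to 1275 K for temperature, 0.07 to 77.16 W/mK for thermal conductivity). The names `in_range` and `is_valid_row` are hypothetical, and treating both bounds as inclusive is an assumption; the comments do not say whether the endpoints are allowed.

```python
from typing import Tuple

# Bounds quoted from the pipeline comments (paper, page 2).
TEMPERATURE_RANGE_K: Tuple[float, float] = (10.0, 1275.0)
THERMAL_CONDUCTIVITY_RANGE_W_MK: Tuple[float, float] = (0.07, 77.16)


def in_range(value: float, bounds: Tuple[float, float]) -> bool:
    """Return True if value lies within bounds (assumed inclusive)."""
    lower, upper = bounds
    return lower <= value <= upper


def is_valid_row(temperature_k: float, thermal_conductivity: float) -> bool:
    """Check one row against both range constraints."""
    return (
        in_range(temperature_k, TEMPERATURE_RANGE_K)
        and in_range(thermal_conductivity, THERMAL_CONDUCTIVITY_RANGE_W_MK)
    )


if __name__ == "__main__":
    print(is_valid_row(300.0, 1.5))  # both values inside their ranges
    print(is_valid_row(5.0, 1.5))    # temperature below the lower bound
```

A row that violates either bound is rejected as a whole, which mirrors how a column constraint would drop or flag invalid rows rather than clipping values.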