Feedforward beta control in the KSTAR tokamak by deep reinforcement learning

Cited by: 47
Authors
Seo, Jaemin [1]
Na, Y. S. [1]
Kim, B. [1]
Lee, C. Y. [1]
Park, M. S. [1]
Park, S. J. [1]
Lee, Y. H. [2]
Affiliations
[1] Seoul Natl Univ, Dept Nucl Engn, Seoul, South Korea
[2] Korea Inst Fus Energy, Daejeon, South Korea
Funding
National Research Foundation of Singapore;
Keywords
machine learning; reinforcement learning; beta control; data-driven simulation; KSTAR; tokamak; GENERAL AXISYMMETRICAL EQUILIBRIA; RECONSTRUCTION; PARAMETERS; PLASMAS;
DOI
10.1088/1741-4326/ac121b
Chinese Library Classification number
O35 [Fluid mechanics]; O53 [Plasma physics];
Subject classification code
070204 ; 080103 ; 080704 ;
Abstract
In this work, we address a new feedforward control scheme for the normalized beta (β_N) in tokamak plasmas, using the deep reinforcement learning (RL) technique. The deep RL algorithm optimizes an artificial decision-making agent that adjusts the discharge scenario to obtain a given target β_N from the state-action-reward sets explored by its own trial and error in a virtual tokamak environment. The virtual environment for the RL training is constructed using a long short-term memory (LSTM) network that imitates the plasma responses to external actuator controls, which is trained using five years' worth of KSTAR experimental data. The RL agent then experiences numerous discharges with different actuator controls in the LSTM simulator, and its internal parameters are optimized in the direction of maximizing the reward. We analyze a series of KSTAR experiments conducted with the RL-determined scenarios to validate the feasibility of the beta control scheme in a real device. We discuss the successes and limitations of feedforward beta control by RL, and suggest a future research path for this area of study.
Pages: 14
Related papers
51 in total
[1]   Overview of recent experimental results from the DIII-D advanced tokamak programme [J].
Allen, SL .
NUCLEAR FUSION, 2001, 41 (10) :1341-1353
[2]   Neoclassical transport coefficients for general axisymmetric equilibria in the banana regime [J].
Angioni, C ;
Sauter, O .
PHYSICS OF PLASMAS, 2000, 7 (04) :1224-1234
[3]  
[Anonymous], 2002, 598 IPP MAX PLANCK I
[4]   Long short-term memory [J].
Hochreiter, S ;
Schmidhuber, J .
NEURAL COMPUTATION, 1997, 9 (08) :1735-1780
[5]   Integrated predictive modeling of high-mode tokamak plasmas using a combination of core and pedestal models [J].
Bateman, G ;
Bandrés, MA ;
Onjun, T ;
Kritz, AH ;
Pankin, A .
PHYSICS OF PLASMAS, 2003, 10 (11) :4358-4370
[6]   Non-inductive improved H-mode operation at ASDEX Upgrade [J].
Bock, A. ;
Fable, E. ;
Fischer, R. ;
Reich, M. ;
Rittich, D. ;
Stober, J. ;
Bernert, M. ;
Burckhart, A. ;
Doerk, H. ;
Dunne, M. ;
Geiger, B. ;
Giannone, L. ;
Igochine, V. ;
Kappatou, A. ;
McDermott, R. ;
Mlynek, A. ;
Odstrcil, T. ;
Tardini, G. ;
Zohm, H. .
NUCLEAR FUSION, 2017, 57 (12)
[7]   Formation of the internal transport barrier in KSTAR [J].
Chung, J. ;
Kim, H. S. ;
Jeon, Y. M. ;
Kim, J. ;
Choi, M. J. ;
Ko, J. ;
Lee, K. D. ;
Lee, H. H. ;
Yi, S. ;
Kwon, J. M. ;
Hahn, S. -H. ;
Ko, W. H. ;
Lee, J. H. ;
Yoon, S. W. .
NUCLEAR FUSION, 2018, 58 (01)
[8]  
Felici F, 2011, 38 EPS C PLASM PHYS
[9]  
Felici F, 2015, IEEE DECIS CONTR P, P5370, DOI 10.1109/CDC.2015.7403060
[10]   Real time equilibrium reconstruction for tokamak discharge control [J].
Ferron, JR ;
Walker, ML ;
Lao, LL ;
St John, HE ;
Humphreys, DA ;
Leuer, JA .
NUCLEAR FUSION, 1998, 38 (07) :1055-1066