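Understand meta-learning, the latest technique for supervised learning and reinforcement learning!
This book is an introduction to meta-learning, which has recently drawn considerable attention in artificial intelligence and machine learning. It aims to help readers understand the concepts of meta-learning, which may be somewhat unfamiliar, and to give them the chance to implement the individual algorithms themselves. A major strength of the book is its depth of coverage: it treats meta supervised learning, which handles regression and classification problems, and it also introduces reinforcement learning and then meta reinforcement learning, which applies meta-learning to it. The material may feel unfamiliar at first, but if you steadily reread the book and work through the exercises, you will gain a deeper understanding of meta-learning, an impressive state-of-the-art machine learning technique.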
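The author majored in computer engineering at Dongguk University and is currently a PhD student in the Department of Computer Science and Engineering at Seoul National University. The author researches meta-learning, with a recent focus on meta reinforcement learning and offline reinforcement learning.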
▣ Chapter 1: Meta-Learning Overview
1.1 Machine learning and deep learning
1.2 What is meta-learning?
1.3 Setting up the meta-learning environment
___1.3.1 Installing and using Anaconda
___1.3.2 Installing Anaconda
___1.3.3 Cloning the GitHub repository and setting up the environment
▣ Chapter 2: Meta Supervised Learning
2.1 Defining the meta-learning problem
___2.1.1 Task definition
___2.1.2 Meta-learning datasets
___2.1.3 Meta-learning
___2.1.4 Practice: Introduction to the Torchmeta library
2.2 Model-based meta-learning
___2.2.1 Core concepts of model-based meta-learning
___2.2.2 NTM (Neural Turing Machines)
___2.2.3 MANN (Memory-Augmented Neural Networks)
___2.2.4 Practice: Implementing MANN
___2.2.5 SNAIL (Simple Neural Attentive Meta-Learner)
___2.2.6 Practice: Implementing SNAIL
2.3 Optimization-based meta-learning
___2.3.1 Transfer learning and optimization-based meta-learning
___2.3.2 MAML and FOMAML
___2.3.3 Practice: MAML-Regression
___2.3.4 Practice: MAML-Classification
2.4 Metric-based meta-learning
___2.4.1 KNN and metric-based meta-learning
___2.4.2 Matching networks
___2.4.3 Practice: Implementing matching networks
___2.4.4 Prototypical networks
___2.4.5 Practice: Implementing prototypical networks
2.5 Properties, strengths, and weaknesses of meta-learning algorithms
___2.5.1 Three properties of meta-learning algorithms
___2.5.2 Comparing meta-learning algorithms
▣ Chapter 3: Reinforcement Learning Overview
3.1 Markov decision processes, policies, and value functions
___3.1.1 Markov decision processes
___3.1.2 Policies and the goal of reinforcement learning
___3.1.3 Value functions
3.2 Exploration and exploitation
3.3 Types of reinforcement learning algorithms
___3.3.1 On-policy and off-policy
___3.3.2 Policy-based algorithms
___3.3.3 Value-based algorithms
___3.3.4 Actor-critic algorithms
3.4 TRPO (Trust Region Policy Optimization)
___3.4.1 The idea behind TRPO
___3.4.2 Surrogate objective function and constraints
___3.4.3 Optimization based on the conjugate gradient method
3.5 PPO (Proximal Policy Optimization)
___3.5.1 The idea behind PPO
___3.5.2 Clipped surrogate objective function
___3.5.3 The PPO algorithm
3.6 SAC (Soft Actor-Critic)
___3.6.1 Entropy
___3.6.2 Maximum entropy reinforcement learning
___3.6.3 Learning the value function and the policy
___3.6.4 The SAC algorithm
▣ Chapter 4: Meta Reinforcement Learning
4.1 Meta reinforcement learning
___4.1.1 Introduction to the task concept
___4.1.2 Defining the meta reinforcement learning problem
___4.1.3 Introduction to MuJoCo and the Half-Cheetah environment
4.2 Recurrent-policy meta reinforcement learning
___4.2.1 GRU
___4.2.2 Recurrent-policy meta reinforcement learning
___4.2.3 RL2
___4.2.4 Practice: Implementing RL2
4.3 Optimization-based meta reinforcement learning
___4.3.1 MAML-RL
___4.3.2 Practice: Implementing MAML-RL
4.4 Context-based meta reinforcement learning
___4.4.1 Meta reinforcement learning from a task-inference perspective
___4.4.2 Context-based policies
___4.4.3 Variational inference
___4.4.4 PEARL (Probabilistic Embeddings for Actor-critic RL)
___4.4.5 Practice: Implementing PEARL
▣ Chapter 5: Open Challenges and Meta-Learning Applications
5.1 Open challenges
___5.1.1 Meta-overfitting
___5.1.2 Catastrophic forgetting and continual learning
___5.1.3 Lack of benchmarks
___5.1.4 Scarce labeled data and meta unsupervised learning
5.2 Meta-learning applications
___5.2.1 Computer vision
___5.2.2 Reinforcement learning
___5.2.3 Natural language processing
___5.2.4 Healthcare
___5.2.5 Closing remarks