Meta-Learning for Everyone (모두를 위한 메타러닝)



정창훈, 이승현, 이동민, 장성은, 이승재, 윤승제 (co-authors) / 최성준 (supervising editor) | WikiBooks (위키북스)

Publication date
2023-01-31
File format
ePub
File size
15 MB
Supported devices
PC, smartphone, tablet PC
ÇöȲ
½Åû °Ç¼ö : 0 °Ç
°£·« ½Åû ¸Þ¼¼Áö
ÄÜÅÙÃ÷ ¼Ò°³
ÀúÀÚ ¼Ò°³
¸ñÂ÷
ÇÑÁÙ¼­Æò

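About the Book

Understand meta-learning, the latest technique for supervised learning and reinforcement learning!

This book is an introduction to meta-learning, which has recently been attracting attention in artificial intelligence and machine learning. It aims to help readers understand the concepts of meta-learning, which may be somewhat unfamiliar, and to give them the opportunity to implement the individual algorithms themselves. A major strength of the book is its depth of coverage: it treats meta-supervised learning, which handles regression and classification problems, and it also introduces reinforcement learning and then meta-reinforcement learning, which applies meta-learning to it. The material may feel unfamiliar at first, but by rereading the book and working through the hands-on exercises, you will gain a deeper understanding of meta-learning, an impressive recent machine learning technique.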

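About the Author

The author majored in computer engineering at Dongguk University and is currently in the doctoral program at the Department of Computer Science and Engineering, Seoul National University. The author researches meta-learning and has recently focused on meta-reinforcement learning and offline reinforcement learning.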

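Table of Contents

▣ Chapter 1: Overview of Meta-Learning
1.1 Machine Learning and Deep Learning
1.2 What Is Meta-Learning?
1.3 Setting Up a Meta-Learning Environment
___1.3.1 Installing and Using Anaconda
___1.3.2 Installing Anaconda
___1.3.3 Cloning the GitHub Repository and Setting Up the Environment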

▣ Chapter 2: Meta-Supervised Learning
2.1 Defining the Meta-Learning Problem
___2.1.1 Task Definition
___2.1.2 Meta-Learning Datasets
___2.1.3 Meta-Learning
___2.1.4 Hands-on: Introduction to the Torchmeta Library
2.2 Model-Based Meta-Learning
___2.2.1 Core Concepts of Model-Based Meta-Learning
___2.2.2 NTM (Neural Turing Machines)
___2.2.3 MANN (Memory-Augmented Neural Networks)
___2.2.4 Hands-on: Implementing MANN
___2.2.5 SNAIL (Simple Neural Attentive Meta-Learner)
___2.2.6 Hands-on: Implementing SNAIL
2.3 Optimization-Based Meta-Learning
___2.3.1 Transfer Learning and Optimization-Based Meta-Learning
___2.3.2 MAML and FOMAML
___2.3.3 Hands-on: MAML-Regression
___2.3.4 Hands-on: MAML-Classification
2.4 Metric-Based Meta-Learning
___2.4.1 KNN and Metric-Based Meta-Learning
___2.4.2 Matching Networks
___2.4.3 Hands-on: Implementing Matching Networks
___2.4.4 Prototypical Networks
___2.4.5 Hands-on: Implementing Prototypical Networks
2.5 Properties, Strengths, and Weaknesses of Meta-Learning Algorithms
___2.5.1 Three Properties of Meta-Learning Algorithms
___2.5.2 Comparison of Meta-Learning Algorithms

▣ Chapter 3: Overview of Reinforcement Learning
3.1 Markov Decision Processes, Policies, and Value Functions
___3.1.1 Markov Decision Processes
___3.1.2 Policies and the Goal of Reinforcement Learning
___3.1.3 Value Functions
3.2 Exploration and Exploitation
3.3 Types of Reinforcement Learning Algorithms
___3.3.1 On-policy and Off-policy
___3.3.2 Policy-Based Algorithms
___3.3.3 Value-Based Algorithms
___3.3.4 Actor-Critic Algorithms
3.4 TRPO (Trust Region Policy Optimization)
___3.4.1 The Idea Behind TRPO
___3.4.2 Surrogate Objective Function and Constraints
___3.4.3 Optimization Based on the Conjugate Gradient Method
3.5 PPO (Proximal Policy Optimization)
___3.5.1 The Idea Behind PPO
___3.5.2 Clipped Surrogate Objective Function
___3.5.3 The PPO Algorithm
3.6 SAC (Soft Actor-Critic)
___3.6.1 Entropy
___3.6.2 Maximum Entropy Reinforcement Learning
___3.6.3 Learning the Value Function and Policy
___3.6.4 The SAC Algorithm

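▣ Chapter 4: Meta-Reinforcement Learning
4.1 Meta-Reinforcement Learning
___4.1.1 Introduction to the Task Concept
___4.1.2 Defining the Meta-Reinforcement Learning Problem
___4.1.3 Introduction to the MuJoCo and Half-Cheetah Environments
4.2 Recurrent Policy Meta-Reinforcement Learning
___4.2.1 GRU
___4.2.2 Recurrent Policy Meta-Reinforcement Learning
___4.2.3 RL2
___4.2.4 Hands-on: Implementing RL2
4.3 Optimization-Based Meta-Reinforcement Learning
___4.3.1 MAML-RL
___4.3.2 Hands-on: Implementing MAML-RL
4.4 Context-Based Meta-Reinforcement Learning
___4.4.1 Meta-Reinforcement Learning from a Task Inference Perspective
___4.4.2 Context-Based Policies
___4.4.3 Variational Inference
___4.4.4 PEARL (Probabilistic Embeddings for Actor-critic RL)
___4.4.5 Hands-on: Implementing PEARL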

▣ Chapter 5: Open Challenges and Meta-Learning Applications
5.1 Open Challenges
___5.1.1 Meta-Overfitting
___5.1.2 Catastrophic Forgetting and Continual Learning
___5.1.3 Lack of Benchmarks
___5.1.4 Scarce Labeled Data and Meta-Unsupervised Learning
5.2 Meta-Learning Applications
___5.2.1 Computer Vision
___5.2.2 Reinforcement Learning
___5.2.3 Natural Language Processing
___5.2.4 Healthcare
___5.2.5 Closing Remarks