Reinforcement Learning with PyTorch and Unity ML-Agents


Co-authored by 민규식, 이현호, 김영록, 정유정, 정규열, 박유민 | Wikibooks

Publication date: 2022-08-17
File format: PDF
File size: 32 MB
Supported devices: PC, smartphone, tablet PC

About the Book

With Unity you can build games yourself, and with ML-Agents you can turn them into reinforcement learning environments!

Unity ML-Agents is an invaluable tool that turns simulation environments built with the Unity game engine into environments for reinforcement learning. By letting developers and researchers build exactly the reinforcement learning environments they need, ML-Agents has become an important tool for both academic and industrial applications of reinforcement learning. Even so, reference material on ML-Agents is still scarce, especially for versions after ML-Agents 2.0, which has made the toolkit difficult to adopt.

This book covers the full range of topics needed to use Unity ML-Agents, including Unity itself, ML-Agents, and deep reinforcement learning. It is a revised edition of 『Reinforcement Learning with TensorFlow and Unity ML-Agents』, published in 2020, updated to cover the latest version of ML-Agents.
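
As a small taste of the workflow the book walks through (for example, section 2.6.1 controls an agent with random actions through the Python-API), the sketch below drives a built ML-Agents environment from Python using the mlagents_envs package. This is not code from the book: the environment path "3DBall" is a placeholder, and the single-behavior, random-action loop is a simplifying assumption.

from mlagents_envs.environment import UnityEnvironment

# Connect to a built environment (the path is a placeholder); passing
# file_name=None waits for the Unity editor instead and starts when Play is pressed.
env = UnityEnvironment(file_name="3DBall", seed=1)
env.reset()

# Each Behavior Parameters component in the scene registers a behavior name.
behavior_name = list(env.behavior_specs)[0]
spec = env.behavior_specs[behavior_name]

for episode in range(3):
    env.reset()
    decision_steps, terminal_steps = env.get_steps(behavior_name)
    while len(terminal_steps) == 0:
        # Send random actions to every agent that requested a decision this step.
        env.set_actions(behavior_name, spec.action_spec.random_action(len(decision_steps)))
        env.step()
        decision_steps, terminal_steps = env.get_steps(behavior_name)

env.close()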

About the Authors

ÇѾç´ëÇб³ ¹Ì·¡ÀÚµ¿Â÷°øÇаú¿¡¼­ ¹Ú»çÇÐÀ§¸¦ ÃëµæÇßÀ¸¸ç ÇöÀç Ä«Ä«¿À¿¡¼­ AI ¿£Áö´Ï¾î·Î ÀÏÇÏ°í ÀÖ´Ù. °­È­ÇнÀ °ü·Ã ÆäÀ̽ººÏ ±×·ìÀÎ Reinforcement Learning KoreaÀÇ ¿î¿µÁøÀ¸·Î È°µ¿ÇÏ°í ÀÖÀ¸¸ç À¯´ÏƼ ÄÚ¸®¾Æ¿¡¼­ °øÀÎÇÑ À¯´ÏƼ Àü¹®°¡ ±×·ìÀÎ Unity Masters 3~5±â·Î È°µ¿Çß´Ù.

Table of Contents

◆ Chapter 1: Overview of Reinforcement Learning
1.1 What Is Reinforcement Learning?
___1.1.1 What Is Machine Learning?
___1.1.2 Achievements of Reinforcement Learning
1.2 Basic Terminology of Reinforcement Learning
1.3 Basic Theory of Reinforcement Learning
___1.3.1 The Bellman Equation
___1.3.2 Exploration and Exploitation

◆ Chapter 2: A Look at Unity ML-Agents
2.1 Unity and ML-Agents
___2.1.1 Unity
___2.1.2 ML-Agents
2.2 Installing Unity and Basic Operations
___2.2.1 Downloading and Installing Unity Hub
___2.2.2 Activating a Unity License
___2.2.3 Installing the Unity Editor
___2.2.4 Creating a Unity Project
___2.2.5 The Unity Interface
___2.2.6 Basic Operations in Unity
2.3 Installing ML-Agents
___2.3.1 Downloading the ML-Agents Files
___2.3.2 Installing ML-Agents in Unity
___2.3.3 Installing the ML-Agents Python Packages
2.4 Components of ML-Agents
___2.4.1 Behavior Parameters
___2.4.2 Agent Script
___2.4.3 Decision Requester, Model Overrider
___2.4.4 Building the Environment
2.5 Using ML-Agents with mlagents-learn
___2.5.1 Reinforcement Learning Algorithms Provided by ML-Agents
___2.5.2 Training Methods Provided by ML-Agents
___2.5.3 Training the 3DBall Environment with the PPO Algorithm
2.6 Using ML-Agents with the Python-API
___2.6.1 Random Agent Control via the Python-API

◆ Chapter 3: Building the GridWorld Environment
3.1 Starting the Project
3.2 Explaining the GridWorld Scripts
3.3 Adding Vector Observations and Building the Environment
3.4 Extra: Optimizing the Code

◆ Chapter 4: Deep Q Network (DQN)
4.1 Background of the DQN Algorithm
___4.1.1 Value-Based Reinforcement Learning
___4.1.2 Overview of the DQN Algorithm
4.2 Techniques of the DQN Algorithm
___4.2.1 Experience Replay
___4.2.2 Target Network
4.3 Training DQN
4.4 DQN Code
___4.4.1 Importing Libraries and Setting Parameter Values
___4.4.2 The Model Class
___4.4.3 The Agent Class
___4.4.4 The Main Function
___4.4.5 Training Results

◆ Chapter 5: Advantage Actor Critic (A2C)
5.1 Overview of the A2C Algorithm
5.2 Structure of the Actor-Critic Network
5.3 Training Process of the A2C Algorithm
5.4 Overall Training Process of A2C
5.5 A2C Code
___5.5.1 Importing Libraries and Setting Parameter Values
___5.5.2 The Model Class
___5.5.3 The Agent Class
___5.5.4 The Main Function
___5.5.5 Training Results

◆ Chapter 6: Building the Drone Environment
6.1 Starting the Project
6.2 Importing the Drone Asset and Adding Objects
___6.2.1 Downloading the Drone Asset from the Asset Store
___6.2.2 Building the Drone Environment
6.3 Explaining the Scripts
___6.3.1 The DroneSetting Script
___6.3.2 The DroneAgent Script
6.4 Running and Building the Drone Environment

◆ Chapter 7: Deep Deterministic Policy Gradient (DDPG)
7.1 Overview of the DDPG Algorithm
7.2 Techniques of the DDPG Algorithm
___7.2.1 Experience Replay
___7.2.2 Target Network
___7.2.3 Soft Target Update
___7.2.4 Ornstein-Uhlenbeck (OU) Noise
7.3 Training DDPG
___7.3.1 Updating the Critic Network
___7.3.2 Updating the Actor Network
7.4 DDPG Code
___7.4.1 Importing Libraries and Setting Parameter Values
___7.4.2 The OU Noise Class
___7.4.3 The Actor Class
___7.4.4 The Critic Class
___7.4.5 The Agent Class
___7.4.6 The Main Function
___7.4.7 Training Results

◆ Chapter 8: Building the Kart Racing Environment
8.1 Starting the Project
8.2 Setting Up the Kart Racing Environment
8.3 Writing the Scripts and Building

◆ Chapter 9: Behavioral Cloning (BC)
9.1 Overview of the Behavioral Cloning Algorithm
9.2 Techniques of the Behavioral Cloning Algorithm
___9.2.1 Excluding Data with Negative Rewards
9.3 Training Behavioral Cloning
9.4 Behavioral Cloning Code
___9.4.1 Importing Libraries and Setting Parameter Values
___9.4.2 The Model Class
___9.4.3 The Agent Class
___9.4.4 The Main Function
___9.4.5 Training Results
9.5 Using the Built-in Imitation Learning in ML-Agents
___9.5.1 The Behavioral Cloning Algorithm Provided by ML-Agents
___9.5.2 The GAIL Algorithm Provided by ML-Agents
___9.5.3 Config File Settings for Imitation Learning
___9.5.4 Imitation Learning Results in ML-Agents

◆ Chapter 10: Wrapping Up
10.1 Summary of the Fundamentals Volume
10.2 Additional Learning Resources
___10.2.1 Unity
___10.2.2 Unity ML-Agents
___10.2.3 Reinforcement Learning
10.3 Topics to Be Covered in the Applications Volume