Despite increasing attention to technology-enhanced language learning in English-as-a-foreign-language contexts, the effects of multimodal technologies on affective factors (particularly emotions and grit) in digital storytelling remain underexplored. This mixed-methods study therefore examines positive and negative emotions, grit, and learner perceptions during the digital storytelling presentation process under two presentation modes (robot-assisted versus PowerPoint-assisted). The participants were 52 ninth-grade students from two intact classes in a junior high school in central Taiwan. Results from multiple data sources (an emotion questionnaire, a grit survey, a perception survey, and student in-class sharing) revealed that the robot-assisted mode elicited more positive emotions and made learners grittier, notably through higher perseverance of effort in the learning process. Students in the robot-assisted mode also responded more positively to the overall learning experience.