id: 820970
fps: 23.976
nb_cd: 1
format: srt
char_encoding: utf-8
feature_type: Movie
release_name: Architecture.101.2012.BluRay.720p.DTS.x264-CHD
release_title: title
release_group:
video_codec: h264
screen_size:
video_format:
author_comments:
bad: 0
enabled: true
votes: 1
points: 10
ratings: 10.0
thanks: 15
download_count: 12522
last_download: 2022-10-27 21:55:06 UTC
upload_date: 2012-12-14 23:39:58 UTC
comments_count: 0
favourited: 0
auto_assign: false
hearing_impaired: false
last_comment:
featured: false
featured_ts:
hd: true
auto_translation: false
foreign_parts_only: false
from_trusted: false
admin_check: false
admin_checked: true
version: 0
translator:
detected_language: English
detected_encoding: UTF-8
smart_feature_name: 2012 - Architecture 101
osdb_subtitle_id: 4741419
subfiles_count: 1
osdb_user_id: 1955708
osdb_user_info: "display_rank"=>"Trusted member", "display_user"=>"ZimukuChina", "display_author"=>"Anonymous", "uploader_badge"=>"platinum-member", "display_user_email"=>"zimuku@test.com", "display_author_rank"=>"anonymous"
slug: architecture-101-2012-bluray-720p-dts-x264-chd
uploader_id: 67003
translator_id:
created_at: 2018-09-10 15:33:53 UTC
updated_at: 2023-01-31 01:34:15 UTC
feature_id: 576466
osdb_language_id: 1075
ai_translated: false
parent_subtitle_id:
childrens_count: 0
new_download_count: 478
new_last_download: 2023-01-31 00:00:00 UTC
last_sync_files: 2022-10-27 19:58:48 UTC
The movie starts in the present day, when a woman named Yang Seo-yeon approaches an old college classmate named Lee Seung-min to help design her house. The two first met in an architecture class, where Lee started falling in love with Yang. Architecture 101 traces the evolution of their youthful romance, linking it poetically back to the present day.
Netflix's experience is all about providing users with the content they want, and this theme carries throughout their platform, from movie recommendations on the homepage down to translated subtitles. In fact, Netflix has become well known for their localization efforts on both the development and translation sides.
Many existing methods have been built to recognize the emotions of the people appearing in videos [7,8,9,10,11,12,13,14]. These methods, however, do not predict the emotions that a person experiences when watching movies. Fewer studies [15,16,17,18] have focused on the latter, i.e., predicting viewers' emotions from the movies themselves, which is the main goal of this work. More specifically, we propose a model for movie viewer emotion prediction using features extracted from video, audio, and movie subtitles.
Studies on predicting the affective responses of movie viewers typically use features extracted from video and audio streams [15,18,19,20,21]. These features are then fused with either early-fusion or late-fusion techniques, without explicitly taking the relationships among the modalities into account. To tackle this problem, we previously proposed a preliminary model based on a deep-learning architecture with self-attention, called AttendAffectNet [22]. This network predicts the emotions of movie viewers by making use of the self-attention mechanism introduced in the Transformer model [23].
The current paper extends our previous model [22]. In addition to visual and audio features, we also consider text features extracted from movie subtitles. A combination of the Feature and Temporal AttendAffectNet models is also explored by applying the self-attention mechanism to both the extracted features and the time domain.
Overview of the proposed AttendAffectNet (AAN). Feature vectors are extracted from video, audio, and movie subtitles. We reduce their dimensionality before feeding them to the self-attention-based models, which predict the affective responses of movie viewers.
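As a rough illustration of this fusion idea (a minimal sketch, not the authors' exact architecture), the snippet below treats each modality's dimensionality-reduced feature vector as a token and lets multi-head self-attention model the relationships among them before regressing a single affect score; the embedding size, number of heads, and mean pooling are illustrative assumptions.

```python
import torch
import torch.nn as nn

class ModalitySelfAttention(nn.Module):
    """Sketch: self-attention over modality-level feature vectors (video, audio, text)."""

    def __init__(self, dim=128, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(embed_dim=dim, num_heads=heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)
        self.head = nn.Linear(dim, 1)  # regresses one affect score (e.g., arousal or valence)

    def forward(self, modality_feats):
        # modality_feats: (batch, n_modalities, dim), one row per modality
        attended, _ = self.attn(modality_feats, modality_feats, modality_feats)
        fused = self.norm(attended + modality_feats).mean(dim=1)  # residual connection + pooling
        return self.head(fused)                                   # (batch, 1)

# Toy usage: three 128-D modality vectors (video, audio, subtitles) per sample.
model = ModalitySelfAttention()
scores = model(torch.randn(8, 3, 128))
print(scores.shape)  # torch.Size([8, 1])
```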
The FlowNet Simple network (FlowNetS) [34] is used to extract motion features, while VGGish [35] and the OpenSMILE toolkit [36] are used for audio feature extraction. In addition, the Bidirectional Encoder Representations from Transformers (BERT) model [37], pretrained on Wikipedia and BookCorpus [38], is used to extract features from movie subtitles. All of these features are then passed to our AttendAffectNet model to predict the emotions of movie viewers.
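For the subtitle branch, a sentence-level feature vector can be obtained from a pretrained BERT checkpoint roughly as follows; the bert-base-uncased checkpoint (trained on Wikipedia and BookCorpus) and mean pooling over the last hidden layer are illustrative choices that may differ from the exact configuration used in the paper.

```python
import torch
from transformers import BertTokenizer, BertModel

# Standard BERT checkpoint pretrained on Wikipedia + BookCorpus.
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
bert = BertModel.from_pretrained("bert-base-uncased").eval()

def subtitle_embedding(text: str) -> torch.Tensor:
    """Return a fixed-size feature vector for one subtitle segment."""
    inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=128)
    with torch.no_grad():
        outputs = bert(**inputs)
    # Mean-pool the last hidden layer (one common pooling choice, assumed here).
    return outputs.last_hidden_state.mean(dim=1).squeeze(0)  # shape: (768,)

vec = subtitle_embedding("I think I was in love with you back then.")
print(vec.shape)  # torch.Size([768])
```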
Extensive experiments were conducted on the EIMT16 [29] and extended COGNIMUSE [30,31] datasets. Note that, for the extended COGNIMUSE dataset, the clips are cut into 5-s segments as in [19]. We also perform a careful analysis of the impact of video, audio, and movie subtitles on the accuracy of the emotion prediction task.
We also investigate whether a combination of visual, audio, and text features could improve the performance of the proposed emotion prediction models. Our results show that audio features have a significant effect on driving the emotions of movie viewers, while movie subtitles are not as important as video and audio streams.
We also apply this technique to the extended COGNIMUSE dataset, except that instead of extracting 64 frames, we use the entire set of frames obtained from each 5-s movie excerpt (i.e., 125 frames). Note that all 5-s movie excerpts in this dataset have the same frame rate (25 frames per second). Subtitles are only available for the extended COGNIMUSE dataset and are not provided with the Global EIMT16 dataset; therefore, experiments on text features are conducted on the extended COGNIMUSE dataset only.
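As a side note, reading every frame of such a fixed-length excerpt is straightforward, e.g., with OpenCV; 25 frames per second over 5 s gives the 125 frames mentioned above. The file name and the assumption of a constant frame rate in the sketch below are purely illustrative.

```python
import cv2

def read_excerpt_frames(path: str, fps: int = 25, seconds: int = 5):
    """Read up to fps * seconds frames (125 for a 5-s excerpt at 25 fps) as RGB arrays."""
    cap = cv2.VideoCapture(path)
    frames = []
    for _ in range(fps * seconds):
        ok, frame = cap.read()
        if not ok:
            break
        frames.append(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
    cap.release()
    return frames

# Hypothetical excerpt file; prints 125 for a complete 5-s clip at 25 fps.
print(len(read_excerpt_frames("excerpt_0001.mp4")))
```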
In this study, in addition to the OpenSMILE toolkit, we use pretrained deep neural networks (ResNet-50, RGB-stream I3D, FlowNetS, VGGish, and BERT) as feature extractors. Each extractor propagates its input (i.e., images, audio, or movie subtitles) forward and stops at a predetermined layer, whose output forms our extracted feature vector. One advantage of this approach is that we obtain the robust and discriminative feature vectors learned by these deep neural networks without the memory usage and computation time that fine-tuning or training these networks, in whole or in part, together with the self-attention layers would require.
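A minimal sketch of this frozen-extractor idea, using ResNet-50 as an example, is given below; truncating the network after its global average pooling layer to obtain 2048-D features is a common choice and not necessarily the exact cut-off layer used in the paper.

```python
import torch
import torch.nn as nn
import torchvision.models as models

# ImageNet-pretrained ResNet-50 with the classification head removed, so the forward
# pass stops at the pooled 2048-D feature (the "predetermined layer").
resnet = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1)
extractor = nn.Sequential(*list(resnet.children())[:-1]).eval()

# No gradients are tracked: the extractor stays frozen, keeping memory and compute low.
with torch.no_grad():
    batch = torch.randn(4, 3, 224, 224)            # four preprocessed RGB frames
    feats = extractor(batch).flatten(start_dim=1)  # (4, 2048) feature vectors
print(feats.shape)
```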
The prediction accuracy in this case, however, is still better than when using only text features extracted from movie subtitles as the model input. The same observation can be made about the Temporal AAN and Mixed AAN models. We see similar results for the extended COGNIMUSE dataset: using only audio features, our proposed models also reach higher accuracy than those using only visual features. The same observation was made in [21].
The stronger influence of audio features may be explained by the fact that audio in movies is intentionally selected to set an emotional context for the viewer. In particular, music is effective at eliciting emotions in audiences, as noted in [83]. Therefore, audio may have a greater impact on elicited emotions than video and subtitles.
Whether the model input consists of feature vectors extracted from video, audio, or movie subtitles separately or of a combination of them (as shown in Table 1, Table 2, Table 3, Table 4 and Table 5), on both datasets the architecture with only fully connected layers performs better than the two-layer LSTM model. However, it performs worse than our proposed Feature AAN and Temporal AAN. In particular, for the extended COGNIMUSE dataset, when we feed all visual, audio, and text features simultaneously to the model with only fully connected layers, the MSE and PCC for arousal prediction are 0.289 and 0.229, respectively.
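For reference, the two reported metrics, mean squared error (MSE) and the Pearson correlation coefficient (PCC) between predicted and ground-truth affect scores, can be computed as in this small sketch; the numbers are toy values, not results from the paper.

```python
import numpy as np

def mse(y_true, y_pred):
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return np.mean((y_true - y_pred) ** 2)

def pcc(y_true, y_pred):
    """Pearson correlation coefficient between predictions and ground truth."""
    return np.corrcoef(np.asarray(y_true), np.asarray(y_pred))[0, 1]

# Toy arousal scores for a handful of 5-s segments (illustrative only).
truth = [0.1, 0.4, -0.2, 0.3, 0.0]
pred = [0.2, 0.3, -0.1, 0.4, 0.1]
print(f"MSE = {mse(truth, pred):.3f}, PCC = {pcc(truth, pred):.3f}")
```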
We also conducted a thorough analysis of how features extracted from the different modalities affect the accuracy of movie viewer emotion prediction. Notably, variants of the AAN trained on audio features performed better than those trained on either visual or text features. This may be due to the stronger impact of audio/music on evoked emotions compared with video and movie subtitles. A combination of audio, visual, and text features delivered the highest level of performance.