Ensemble based algorithms are widely used for eradicating the issue of overfitting in classification algorithms and the framework proposed is reducing the issue of overfitting by bootstrapping the random samples and feature samples derived from the YouTube trending videos. The utterance level preprocessing on the video frames are extracted using the label encoder algorithm and the preprocessed data is fed into the ensemble-based regression model for extracting the sentiment features using random sampling and feature sampling. Random forest algorithm is one of the most efficient ensemble algorithms for reducing overfitting and space effective out of bag error elimination prediction model used to construct the proposed opinion mining framework. The YouTube trending videos are extracted with its linguistic information and MFCC is used to interpret the audio signals extracted from the same using APP algorithms. The weighted utterance level fusion of audio and linguistic information is attained with an accuracy of 88.29%. The fusion level of utterances with the image frame set and linguistic information is preprocessed using label encoder and OpenCV commands which attained the accuracy of 72.33% of sentiment polarity obtained. The fusion at the utterance level of linguistic, acoustic and visual modalities achieved the sentiment classification accuracy of 90.06%. The proposed opinion mining framework works with 91% accuracy on sentiment polarity identification from the emotion expressed in the videos fed as input.