Unstructured Data Classification - Quiz

Q: Can we consider sentiment classification as a text classification problem?

A. Yes
B. No

Correct Option: A
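Explanation: Yes. Sentiment classification assigns a label such as positive or negative to a piece of text, which makes it a text classification problem.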

Q: Choose the correct sequence from the following.

A. Data Analysis -> Pre-Processing -> Model Building -> Predict
B. Pre-Processing -> Model Building -> Predict
C. Pre-Processing -> Predict -> Train
D. Data Analysis -> Pre-Processing -> Predict -> Train

Correct Option: A
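Explanation: Data Analysis -> Pre-Processing -> Model Building -> Predict. The data is explored first, then cleaned, then used to build (train) a model, and only a trained model can make predictions. Options C and D place prediction before training, and option B skips data analysis.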
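The four steps of option A can be sketched end to end in plain Python, which also shows sentiment classification being handled as an ordinary text classification task. This is a minimal sketch, assuming a hypothetical four-sentence corpus, a hand-picked stopword set, and multinomial Naive Bayes as the model; none of these come from the quiz itself.

```python
import math
import re
from collections import Counter

# Hypothetical toy corpus: (label, message) pairs, 1 = positive, 0 = negative.
TRAIN = [
    (1, "I love this movie, it is awesome"),
    (1, "What a great and fun book"),
    (0, "I hate this movie, it is awful"),
    (0, "Such a boring and bad book"),
]

# Small hand-picked stopword set (an assumption, not a standard list).
STOPWORDS = {"i", "this", "it", "is", "a", "and", "what", "such"}


def analyse(data):
    """Data analysis step: count how many examples each label has."""
    return Counter(label for label, _ in data)


def preprocess(text):
    """Pre-processing step: lowercase, tokenize on letters, drop stopwords."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOPWORDS]


def build_model(data):
    """Model building step: multinomial Naive Bayes word counts per label."""
    label_counts = analyse(data)
    word_counts = {label: Counter() for label in label_counts}
    for label, text in data:
        word_counts[label].update(preprocess(text))
    vocab = set().union(*word_counts.values())
    return label_counts, word_counts, vocab


def predict(model, text):
    """Predict step: return the label with the highest log-probability."""
    label_counts, word_counts, vocab = model
    total = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in label_counts.items():
        score = math.log(count / total)  # log prior of the label
        denom = sum(word_counts[label].values()) + len(vocab)
        for token in preprocess(text):
            if token in vocab:  # ignore words never seen in training
                # add-one (Laplace) smoothed log-likelihood of the token
                score += math.log((word_counts[label][token] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


print(analyse(TRAIN))
model = build_model(TRAIN)
print(predict(model, "an awesome, fun movie"))  # 1
print(predict(model, "a boring, awful book"))   # 0
```

Each function maps to one step of the sequence, so prediction is only possible once `build_model` has produced a trained model.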

Q: Which pre-processing technique removes the most common words?

A. Tokenization
B. Lemmatization
C. Stopword removal

Correct Option: C
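Explanation: Stopword removal drops very common words such as "the", "is" and "a". Tokenization splits text into individual tokens, and lemmatization reduces words to their base form.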
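A minimal sketch of tokenization followed by stopword removal, in plain Python. The stopword set below is a small hypothetical list chosen for illustration; libraries such as NLTK and scikit-learn ship longer built-in lists.

```python
import re

# Hypothetical stopword set for illustration only.
STOPWORDS = {"the", "is", "a", "an", "and", "of", "to", "in", "it", "this"}


def tokenize(text):
    """Tokenization: split lowercased text into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def remove_stopwords(tokens):
    """Stopword removal: drop very common words that carry little meaning."""
    return [t for t in tokens if t not in STOPWORDS]


tokens = tokenize("The plot of this movie is a masterpiece")
print(tokens)                    # ['the', 'plot', 'of', 'this', 'movie', 'is', 'a', 'masterpiece']
print(remove_stopwords(tokens))  # ['plot', 'movie', 'masterpiece']
```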

Q: The most widely used package for machine learning in Python is _________

A. bottle
B. django
C. sklearn
D. pillow

Correct Option: C
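Explanation: sklearn (the import name of scikit-learn) is Python's general-purpose machine learning library. Bottle and Django are web frameworks, and Pillow is an image-processing library.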
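A short sketch of a text classifier built with sklearn, assuming scikit-learn is installed. The four messages and their labels are hypothetical toy data; `CountVectorizer`, `MultinomialNB` and `make_pipeline` are standard scikit-learn components, chosen here as one reasonable setup rather than the only one.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Hypothetical toy data: 1 = positive, 0 = negative.
messages = [
    "I love this movie",
    "What a great book",
    "I hate this movie",
    "What a terrible book",
]
labels = [1, 1, 0, 0]

# CountVectorizer turns text into word-count vectors (dropping English stopwords);
# MultinomialNB is a Naive Bayes classifier suited to count features.
model = make_pipeline(CountVectorizer(stop_words="english"), MultinomialNB())
model.fit(messages, labels)

print(model.predict(["a great movie", "a terrible movie"]))
```

Wrapping the vectorizer and classifier in one pipeline means the same pre-processing is applied automatically at both training and prediction time.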

Q: Clustering is supervised classification.

A. True
B. False

Correct Option: B
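Explanation: False. Clustering is unsupervised: it groups unlabeled data by similarity, whereas supervised classification learns from labeled examples.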
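To show that clustering needs no labels, here is a minimal sketch of k-means, one common clustering algorithm picked for illustration, on hypothetical 1-D points. The hand-picked starting centroids are an assumption that keeps the run deterministic.

```python
# Unlabeled 1-D data points: no label column is given to the algorithm.
points = [1.0, 1.5, 2.0, 9.0, 9.5, 10.0]


def kmeans_1d(data, centroids, iterations=10):
    """Plain k-means: assign each point to its nearest centroid, then move
    each centroid to the mean of its assigned points."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for x in data:
            nearest = min(range(len(centroids)), key=lambda i: abs(x - centroids[i]))
            clusters[nearest].append(x)
        # Keep a centroid in place if no point was assigned to it.
        centroids = [sum(c) / len(c) if c else centroids[i] for i, c in enumerate(clusters)]
    return centroids, clusters


centroids, clusters = kmeans_1d(points, centroids=[0.0, 5.0])
print(centroids)  # [1.5, 9.5]
print(clusters)   # [[1.0, 1.5, 2.0], [9.0, 9.5, 10.0]]
```

The groups emerge purely from distances between points; a supervised classifier would instead need a label for every training point.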

Q: A True Negative is when both the predicted and the actual class of an instance are positive.

A. True
B. False

Correct Option: B
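Explanation: False. When both the predicted and the actual class are positive, the result is a True Positive. A True Negative is when both are negative.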
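The four outcomes of a binary classifier can be counted directly from paired actual and predicted labels. A minimal sketch in plain Python; the two label lists are hypothetical, and `1` is assumed to be the positive class.

```python
def confusion_counts(actual, predicted, positive=1):
    """Count TP, TN, FP and FN for a binary classifier."""
    tp = tn = fp = fn = 0
    for a, p in zip(actual, predicted):
        if p == positive and a == positive:
            tp += 1  # predicted positive, actually positive
        elif p != positive and a != positive:
            tn += 1  # predicted negative, actually negative
        elif p == positive:
            fp += 1  # predicted positive, actually negative
        else:
            fn += 1  # predicted negative, actually positive
    return {"TP": tp, "TN": tn, "FP": fp, "FN": fn}


actual = [1, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 1, 0]
print(confusion_counts(actual, predicted))  # {'TP': 2, 'TN': 2, 'FP': 1, 'FN': 1}
```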

Q:

a) Download the dataset from https://hrcdn.net/s3_pub/istreet-assets/H4_TQkbOj39HUNoBukluIQ/training.txt and load it into the variable 'sentiment_analysis_data'.
b) Name the columns 'label' and 'message'.
c) Try out the code snippets and answer the questions.

Which of the following commands shows the size of the dataset (its number of rows and columns), and what value does it return?

A. sentiment_analysis_data.size(), (6918, 2)
B. sentiment_analysis_data.shape, (6918, 3)
C. sentiment_analysis_data.shape, (6918, 2)
D. sentiment_analysis_data.size, (6918, 3)

Correct Option: C
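Explanation: sentiment_analysis_data.shape, (6918, 2). shape returns a (rows, columns) tuple, and the dataset has two columns, label and message. size is an attribute holding the total number of cells (6918 × 2 = 13836), so it takes no parentheses and does not return a tuple.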
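A minimal pandas sketch of the loading step and of shape versus size, assuming the file is tab-separated with the label first and no header row. To keep the example offline it reads a hypothetical three-line sample from memory; with the real data, the URL from the question would be passed to `read_csv` instead.

```python
import io

import pandas as pd

# Hypothetical sample in the assumed format: label<TAB>message, no header row.
sample = io.StringIO(
    "1\tI loved this book.\n"
    "0\tThe movie was boring.\n"
    "1\tGreat film!\n"
)

sentiment_analysis_data = pd.read_csv(
    sample, sep="\t", header=None, names=["label", "message"]
)

print(sentiment_analysis_data.shape)  # (3, 2): a (rows, columns) tuple
print(sentiment_analysis_data.size)   # 6: rows * columns, a single int
# sentiment_analysis_data.size() would raise TypeError, because size is an int, not a method.
```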

Q: Cross-validation evaluates a classifier by dividing the dataset into a training set, used to train the classifier, and a test set, used to evaluate it.

A. True
B. False

Correct Option: A
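Explanation: True. In k-fold cross-validation this split is repeated k times, so every sample is used for testing exactly once.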
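A minimal plain-Python sketch of how k-fold cross-validation divides sample indices into training and test sets. The fold sizing (the first `n_samples % k` folds get one extra sample, no shuffling) is one common convention, similar to scikit-learn's `KFold` with default settings; the function name is hypothetical.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_idx, test_idx) for k folds; each sample is tested exactly once."""
    # The first n_samples % k folds receive one extra sample.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, test_idx
        start += size


for train_idx, test_idx in k_fold_indices(6, 3):
    print("train:", train_idx, "test:", test_idx)
# train: [2, 3, 4, 5] test: [0, 1]
# train: [0, 1, 4, 5] test: [2, 3]
# train: [0, 1, 2, 3] test: [4, 5]
```

A classifier would be trained on each `train_idx` set and scored on the matching `test_idx` set, and the k scores are then averaged.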

Q:

a) Download the dataset from https://hrcdn.net/s3_pub/istreet-assets/H4_TQkbOj39HUNoBukluIQ/training.txt and load it into the variable 'sentiment_analysis_data'.
b) Name the columns 'label' and 'message'.
c) Try out the code snippets and answer the questions.

What is the output of the following command: print(sentiment_analysis_data['label'].unique())

A. [yes no]
B. [true false]
C. [1 0]
D. None of the options

Correct Option: C
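Explanation: [1 0]. unique() returns the distinct values of the label column, 1 and 0, as a NumPy array, which print displays without commas.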
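A small pandas sketch of `Series.unique()`, using a hypothetical label column rather than the real dataset. It returns the distinct values in order of first appearance, as a NumPy array.

```python
import pandas as pd

# Hypothetical label column; 1 and 0 stand for the two sentiment classes.
labels = pd.Series([1, 0, 1, 1, 0], name="label")

# unique() returns a NumPy array of distinct values in order of first appearance.
print(labels.unique())  # [1 0]
```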

Q: Which of the following is unstructured data?

A. Excel data
B. Data from a MySQL database
C. Image

Correct Option: C
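Explanation: Image. An image has no predefined rows and columns, whereas Excel sheets and MySQL tables store data in a fixed tabular structure.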