Essentials of Data Science – Probability and Statistical Inference – Introduction to Probability Distributions

In the previous note on Probability and Statistical Inference, we covered the important concepts of probability and statistics. First, we saw the basic theory of probability and how to model a random phenomenon so that it satisfies the axioms of probability. Further, we explored random variables […]

Essentials of Data Science – Probability and Statistical Inference – Quantiles and Tschebyschev’s Inequality

In the previous note on Probability and Statistical Inference, we learned about expectations, moments, skewness, and kurtosis, which measure the central tendency, dispersion, symmetry, and peakedness of a probability distribution, respectively. Introduction We define quantiles in terms of the distribution function. The value x_p for which the cumulative distribution function satisfies F(x_p) = p is called the p-quantile. Here, p is a value […]

Essentials of Data Science – Probability and Statistical Inference – Skewness and Kurtosis

In the previous note on Probability and Statistical Inference, we learned about expectations and moments of the probability distribution of a random variable, which describe the central tendency and variability of its values, respectively. In this note, we extend the concept of moments and study further characteristics, namely the shape and peakedness, of the probability […]

Essentials of Data Science – Probability and Statistical Inference – Moments and Variance

In the previous note on Probability and Statistical Inference, we started a new topic: characterizing the probability distribution of a random variable to extract the statistical information it carries. One such statistical tool is the expectation of a random variable, that is, the expectation of its probability distribution. We have seen […]

Automatic Language Identification in Texts – Polyglot

In the earlier note on langid in this series on automatic language identification, we introduced how to detect language using the langid tool, which uses a naive Bayes classifier with a multinomial event model over a mixture of character n-grams and is trained on 97 languages. It provides additional tools for model building, training, tokenization, etc., that are helpful […]

Automatic Language Identification in Texts – LangId

In the earlier note on sparknlp in this series on automatic language identification, we introduced how to detect language using the sparknlp library, which uses pre-trained deep learning models built with CNN architectures in TensorFlow/Keras. Currently, its published pre-trained models can detect 375 languages, which is significantly more than any other open-source library. Introduction In this […]

Automatic Language Identification in Texts – Sparknlp

In the earlier note on langdetect in this series on automatic language identification, we introduced how to detect language using the langdetect library, which uses a Naive Bayes classifier over character n-grams. In this note, we introduce another language identification library, which is part of the sparknlp package. Its authors designed and developed Deep Learning models […]

Automatic Language Identification in Texts – Langdetect

Earlier in this note series on automatic language identification, we introduced how to detect language using the gcld3 library. It is designed to run in the Chrome browser, is written in C++, is based on a neural network model, and supports over 100 languages/scripts. In this note, we introduce another language identification library called LangDetect.

Automatic Language Identification in Texts – GCLD3

In the previous note on automatic language identification, we introduced how to detect language using fasttext. Fasttext is a library created by Facebook’s AI research lab for efficient learning of text representations and classification. In this note, we introduce another language identification library called Google Compact Language Detector v3 (GCLD3). GCLD3 is designed to […]

Automatic Language Identification in Texts – Fasttext

Language detection is vital in Natural Language Processing (NLP), as different NLP tasks or activities are language-dependent. Moreover, finding a language detector that supports most natural languages, short texts, and multilingual texts is difficult. However, the Fasttext library performs well compared to other automatic language identification libraries such as gcld3, langdetect, langid, nltk_textcat, […]
