000 03332nam a22001697a 4500
999 _c297263
_d297263
020 _a9789350236741
082 _a006
_bCON.W
100 _aConway, Drew and White, Myles John
245 _aMachine Learning for Hackers
260 _aCA
_bO'Reilly Media Inc.
_c2019
300 _a xiii, 303 pages : illustrations ; 24 cm
505 _a Machine generated contents note: 1. Using R -- R for Machine Learning -- Downloading and Installing R -- IDEs and Text Editors -- Loading and Installing R Packages -- R Basics for Machine Learning -- Further Reading on R -- 2. Data Exploration -- Exploration versus Confirmation -- What Is Data? -- Inferring the Types of Columns in Your Data -- Inferring Meaning -- Numeric Summaries -- Means, Medians, and Modes -- Quantiles -- Standard Deviations and Variances -- Exploratory Data Visualization -- Visualizing the Relationships Between Columns -- 3. Classification: Spam Filtering -- This or That: Binary Classification -- Moving Gently into Conditional Probability -- Writing Our First Bayesian Spam Classifier -- Defining the Classifier and Testing It with Hard Ham -- Testing the Classifier Against All Email Types -- Improving the Results -- 4. Ranking: Priority Inbox -- How Do You Sort Something When You Don't Know the Order? -- Ordering Email Messages by Priority -- Priority Features of Email -- Writing a Priority Inbox -- Functions for Extracting the Feature Set -- Creating a Weighting Scheme for Ranking -- Weighting from Email Thread Activity -- Training and Testing the Ranker -- 5. Regression: Predicting Page Views -- Introducing Regression -- The Baseline Model -- Regression Using Dummy Variables -- Linear Regression in a Nutshell -- Predicting Web Traffic -- Defining Correlation -- 6. Regularization: Text Regression -- Nonlinear Relationships Between Columns: Beyond Straight Lines -- Introducing Polynomial Regression -- Methods for Preventing Overfitting -- Preventing Overfitting with Regularization -- Text Regression -- Logistic Regression to the Rescue -- 7. Optimization: Breaking Codes -- Introduction to Optimization -- Ridge Regression -- Code Breaking as Optimization -- 8. PCA: Building a Market Index -- Unsupervised Learning -- 9. MDS: Visually Exploring US Senator Similarity --
Clustering Based on Similarity -- A Brief Introduction to Distance Metrics and Multidimensional Scaling -- How Do US Senators Cluster? -- Analyzing US Senator Roll Call Data (101st--111th Congresses) -- 10. kNN: Recommendation Systems -- The k-Nearest Neighbors Algorithm -- R Package Installation Data -- 11. Analyzing Social Graphs -- Social Network Analysis -- Thinking Graphically -- Hacking Twitter Social Graph Data -- Working with the Google SocialGraph API -- Analyzing Twitter Networks -- Local Community Structure -- Visualizing the Clustered Twitter Network with Gephi -- Building Your Own "Who to Follow" Engine -- 12. Model Comparison -- SVMs: The Support Vector Machine -- Comparing Algorithms.
520 _aNow that storage and collection technologies are cheaper and more precise, methods for extracting relevant information from large datasets are within the reach of any experienced programmer willing to crunch data.
650 _aComputer Algorithms
650 _aElectronic Data Processing
942 _cBK