Infographic Gallery | Vertical Discovery

Technology 23 May 2026

AI workflow

Environment 05 May 2026

Biomech

Social Trends 05 May 2026

Force creatures

Social Trends 25 Apr 2026

Alcohol vs ethanol

Environment 23 Apr 2026

Hair styles

Business & Finance 21 Apr 2026

How to give money in a gift envelope

Environment 20 Apr 2026

Pablo Escobar's Hippo

Social Trends 13 Apr 2026

How to cut salmon for sashimi

Following the guide.

Business & Finance 04 Apr 2026

Real wealth

Longevity wealth.

Business & Finance 03 Apr 2026

Same pile of money

Salary still the same.

Technology 03 Apr 2026

Apache Spark

The Advanced Spark Pipeline:
Unified Streaming and Batch: The bank uses Spark's Structured Streaming to ingest a live firehose of 50,000 transactions per second from a message queue such as Apache Kafka.
Distributed Joins: Spark spreads this live stream across a cluster of 50 powerful servers. While holding the live data in memory, it joins each batch with a massive 10-terabyte historical dataset (stored in a data lake such as AWS S3 or Hadoop HDFS) to check the customer's past spending habits.
Machine Learning at Scale: Still running in parallel across all 50 servers, Spark feeds the joined data into a distributed machine learning model (built with Spark MLlib), which scores each transaction for fraud probability.
Fault Tolerance (Resiliency): If Server #14 literally catches fire and dies in the middle of processing its chunk of transactions, Spark's driver notices that the executor is gone. It reschedules the unfinished tasks on another server (say, Server #15) and rebuilds the lost chunk from its lineage, re-reading the source data from Kafka at the offsets recorded in the checkpoint where needed. The job finishes without crashing the application or losing a single swipe.
Micro-batch Output: At the end of each micro-batch, the flagged transactions are written to a database that triggers an immediate text message to the customer ("Did you just spend $5,000 in another country?").
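The sketch below makes the data flow concrete with a single-machine simulation in plain Python. It does not use PySpark or any Spark API. The stream is cut into micro-batches, and each batch is joined against a lookup table and scored. Flagged rows then go to a "sink". The names `HISTORY` and `fraud_score` and the 0.8 threshold are hypothetical. The hand-written rule only stands in for a trained MLlib model.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Iterator, List, Tuple


@dataclass(frozen=True)
class Txn:
    txn_id: int
    customer_id: str
    amount: float
    country: str


# Hypothetical stand-in for the 10 TB historical table:
# customer_id -> (average spend, home country).
HISTORY: Dict[str, Tuple[float, str]] = {
    "alice": (40.0, "TH"),
    "bob": (900.0, "US"),
}


def fraud_score(txn: Txn, avg_spend: float, home: str) -> float:
    """Toy rule standing in for a trained MLlib model; returns a score in [0, 1]."""
    score = 0.0
    if txn.amount > 10 * avg_spend:
        score += 0.6
    if txn.country != home:
        score += 0.4
    return score


def micro_batches(stream: Iterable[Txn], batch_size: int) -> Iterator[List[Txn]]:
    """Cut an unbounded stream into small batches, as micro-batch mode does."""
    batch: List[Txn] = []
    for txn in stream:
        batch.append(txn)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch


def process(stream: Iterable[Txn], batch_size: int = 2, threshold: float = 0.8) -> List[int]:
    """Join each batch with HISTORY, score it, and return the flagged txn_ids."""
    flagged: List[int] = []
    for batch in micro_batches(stream, batch_size):
        for txn in batch:
            profile = HISTORY.get(txn.customer_id)
            if profile is None:
                continue  # inner join: customers without history drop out
            avg_spend, home = profile
            if fraud_score(txn, avg_spend, home) >= threshold:
                flagged.append(txn.txn_id)  # the sink that would trigger the SMS
    return flagged


txns = [
    Txn(1, "alice", 35.0, "TH"),
    Txn(2, "alice", 5000.0, "US"),  # 125x her average, and abroad
    Txn(3, "bob", 950.0, "US"),
    Txn(4, "carol", 10.0, "TH"),    # no history, so no join match
]
print(process(txns))  # [2]
```

In the real pipeline the dictionary lookup would be a stream-static join that Spark distributes across the cluster. Here it is just a hash lookup per row.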

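The fault-tolerance step relies on lineage. Every partition can be rebuilt by replaying deterministic transformations over source data that can be re-read, so a lost task can simply be rerun elsewhere. The sketch below illustrates that idea and is not Spark's scheduler. It assumes in-memory lists in place of Kafka offset ranges and uses hypothetical worker names.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical source partitions. In the real pipeline these would be Kafka
# offset ranges that can be re-read; here they are in-memory lists.
SOURCE: Dict[int, List[float]] = {
    0: [12.0, 5000.0],
    1: [40.0, 7.5],
    2: [999.0],
}

# The lineage: deterministic steps applied to every partition, in order.
LINEAGE: List[Callable[[List[float]], List[float]]] = [
    lambda xs: [x * 1.07 for x in xs],        # e.g. convert currency
    lambda xs: [x for x in xs if x > 100.0],  # e.g. keep only large swipes
]


def compute_partition(pid: int) -> List[float]:
    """Rebuild one partition from its source data by replaying the lineage."""
    data = list(SOURCE[pid])
    for step in LINEAGE:
        data = step(data)
    return data


def run_job(workers: List[str], crashed: str) -> Tuple[Dict[int, List[float]], Dict[int, str]]:
    # Initial schedule: spread partitions round-robin over the workers.
    placement = {pid: workers[pid % len(workers)] for pid in SOURCE}
    results = {pid: compute_partition(pid) for pid in SOURCE}

    # The crashed worker's in-memory outputs are gone...
    lost = [pid for pid, worker in placement.items() if worker == crashed]
    for pid in lost:
        del results[pid]

    # ...so its tasks are rescheduled on survivors and recomputed from lineage.
    healthy = [w for w in workers if w != crashed]
    for pid in lost:
        placement[pid] = healthy[pid % len(healthy)]
        results[pid] = compute_partition(pid)
    return results, placement


results, placement = run_job(["server-13", "server-14", "server-15"], crashed="server-14")
print(placement)  # {0: 'server-13', 1: 'server-15', 2: 'server-15'}
```

The recomputed partition is identical to the lost one. As a result, the job's output is the same whether or not Server #14 dies.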
Environment 03 Apr 2026

PoseScan by Wi-Fi

A research team from Carnegie Mellon University (CMU) has developed a method called "DensePose from WiFi." It can "see" the human body and reconstruct its pose in 3D using nothing but the Wi-Fi signals from ordinary home routers.

Technology 03 Apr 2026

Cassandra

Apache Cassandra: The Cheat Sheet
1. Architecture & Consistency
Masterless: every node is equal (peer-to-peer), so there is no single point of failure.
Token Ring: the data "cake" is sliced into token ranges. Each partition key hashes to a token, which decides which node owns the row.
Replication: data is copied to several machines (as many as the replication factor, RF) for safety.
Consistency: tunable per request, trading speed against certainty. The rule for strong consistency is $R + W > RF$, where $R$ and $W$ are the numbers of replicas that must answer a read and acknowledge a write.
2. Data Modeling
Query-first: know which questions you will ask before creating the table.
Denormalization: accept duplicated data in exchange for read speed.
Partition Key: "Which node?" (Where). Crucial for spreading data across the cluster.
Clustering Key: "In what order?" (Order). Crucial for range queries, and for keeping rows in the same partition unique so they don't overwrite each other.
3. The Write Path (the way in: very fast) ⚡
Commit Log: written to disk first as a safety net.
Memtable: written to RAM for speed.
SSTable: flushed to disk as a permanent file (immutable, never edited).
Tombstone: a delete is a marker saying "this is dead."
4. The Read Path (the way out: choosing smartly)
Bloom Filter: the gatekeeper that says "this file doesn't have your data, skip it," cutting down disk reads.
Last Write Wins: conflicting values? Trust the one with the latest timestamp.
5. Gossip Protocol (rumor-spreading)
Nodes chat with random peers to check on one another's status. A probability-based suspicion level ($\Phi$, from the phi accrual failure detector) is used to catch dead nodes.
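The consistency rule works by pigeonhole. When $R + W > RF$, any $R$ replicas and any $W$ replicas out of $RF$ must share at least one node, so a read always reaches a replica holding the latest write. Last-write-wins then picks the newest reply. The Python sketch below checks the formula against brute force and models a tombstone as a value of `None`. The `Cell` type and the timestamps are illustrative assumptions, not Cassandra's storage format.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import List, Optional


def overlap_guaranteed(r: int, w: int, rf: int) -> bool:
    """The rule from the cheat sheet: R + W > RF."""
    return r + w > rf


def overlap_by_brute_force(r: int, w: int, rf: int) -> bool:
    """Check every possible write set against every possible read set."""
    nodes = range(rf)
    return all(
        set(write_set) & set(read_set)
        for write_set in combinations(nodes, w)
        for read_set in combinations(nodes, r)
    )


@dataclass(frozen=True)
class Cell:
    value: Optional[str]  # None plays the role of a tombstone (a delete)
    timestamp: int        # write time; the highest one wins


def last_write_wins(replies: List[Cell]) -> Optional[str]:
    """Keep the reply with the newest timestamp; a winning tombstone means deleted.

    Ties are resolved arbitrarily here; Cassandra has its own tie-break rules.
    """
    newest = max(replies, key=lambda cell: cell.timestamp)
    return newest.value


# RF = 3 with QUORUM (2) reads and writes: 2 + 2 > 3, so the sets always overlap.
print(overlap_guaranteed(2, 2, 3), overlap_guaranteed(1, 1, 3))  # True False
print(last_write_wins([Cell("Bangkok", 100), Cell("Tokyo", 250)]))  # Tokyo
print(last_write_wins([Cell("Tokyo", 250), Cell(None, 300)]))      # None
```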

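The write and read paths can be sketched as a tiny single-node, log-structured store:

- Writes go to a commit log and a memtable.
- A full memtable is flushed to an immutable SSTable with its own Bloom filter.
- Reads check the memtable first, then SSTables from newest to oldest, skipping any the Bloom filter rules out.

This is a simplified model, not Cassandra's implementation, and it rests on stated assumptions. It stores whole values per key, so the newest SSTable wins, whereas real Cassandra merges cells across SSTables by timestamp. A Python list stands in for the on-disk commit log. The filter size and hash count are arbitrary.

```python
import hashlib
from typing import Dict, Iterator, List, Optional, Tuple

TOMBSTONE = object()  # stored in place of a value when a key is deleted
_MISSING = object()   # internal marker for "not found yet"


class BloomFilter:
    def __init__(self, size: int = 256, hashes: int = 3) -> None:
        self.size, self.hashes = size, hashes
        self.bits = 0  # a Python int used as a bit array

    def _positions(self, key: str) -> Iterator[int]:
        for i in range(self.hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, key: str) -> None:
        for pos in self._positions(key):
            self.bits |= 1 << pos

    def might_contain(self, key: str) -> bool:
        # False means "definitely not here"; True only means "maybe".
        return all((self.bits >> pos) & 1 for pos in self._positions(key))


class SSTable:
    """Sorted and immutable: built once at flush time, never modified."""

    def __init__(self, rows: Dict[str, object]) -> None:
        self.rows = dict(sorted(rows.items()))
        self.bloom = BloomFilter()
        for key in self.rows:
            self.bloom.add(key)


class Node:
    def __init__(self, flush_threshold: int = 2) -> None:
        self.commit_log: List[Tuple[str, object]] = []  # stands in for the on-disk log
        self.memtable: Dict[str, object] = {}
        self.sstables: List[SSTable] = []  # oldest first
        self.flush_threshold = flush_threshold
        self.sstables_searched = 0  # SSTables the Bloom filter could not rule out

    def write(self, key: str, value: object) -> None:
        self.commit_log.append((key, value))  # 1. append to the commit log
        self.memtable[key] = value             # 2. update the in-memory memtable
        if len(self.memtable) >= self.flush_threshold:
            self.sstables.append(SSTable(self.memtable))  # 3. flush to an SSTable
            self.memtable = {}
            self.commit_log.clear()  # flushed writes no longer need replaying

    def delete(self, key: str) -> None:
        self.write(key, TOMBSTONE)  # a delete is just a write of a tombstone

    def read(self, key: str) -> Optional[object]:
        value = self.memtable.get(key, _MISSING)
        if value is _MISSING:
            for table in reversed(self.sstables):  # newest first
                if not table.bloom.might_contain(key):
                    continue  # skipped without a disk read
                self.sstables_searched += 1
                if key in table.rows:
                    value = table.rows[key]
                    break
        return None if value is _MISSING or value is TOMBSTONE else value


node = Node(flush_threshold=2)
node.write("alice", "TH")
node.write("bob", "US")    # memtable full: flushed to SSTable #1
node.write("carol", "JP")
node.delete("alice")       # flushed to SSTable #2, holding alice's tombstone
print(node.read("bob"), node.read("alice"), node.read("carol"))  # US None JP
```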
Technology 03 Apr 2026

DuckDB

DuckDB is a modern data-analysis tool built around three key ideas: "fast, embeddable, and columnar."
⚡ Features:
In-process: it runs inside your application, with no separate server, so there is no network latency.
Very fast processing: a columnar engine that processes data in large batches (vectorized execution), so retrieval and computation are very quick.
Reads the cloud directly (Zero-Copy): query .csv, .parquet, and .json files straight from cloud storage (S3/GCS) without importing them first.
SQL developers love: advanced SQL features that cut down repetitive code. Recent releases also support nested semi-structured data (Variant) and spatial data (Geometry).
Use Cases:
Local data analysis (EDA): an alternative to Pandas for large files; analysis stays smooth without freezing the machine.
Modern data architecture: a lightweight engine for querying data directly on a data lake (a great fit for a Kappa architecture).
Edge Analytics & Telemetry: small enough to embed in edge and IoT devices, or even to run in a web browser via WebAssembly (Wasm).
Serverless & Web Apps: handle complex hierarchical queries inside the application, without paying for or maintaining a heavyweight external database.
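The "columnar and vectorized" idea can be illustrated with a toy sketch in pure Python. It is not DuckDB's code or API. In Python it will not run faster; the point is the access pattern. A row-at-a-time loop visits every row object. The columnar version reads only the two columns the query needs, one fixed-size chunk at a time, and builds a "selection vector" of matching positions per chunk, a common technique in vectorized engines. The column names and the `vector_size` value are hypothetical.

```python
from array import array
from typing import Dict, List


def to_columns(rows: List[Dict[str, float]]) -> Dict[str, array]:
    """Row layout -> column layout: one contiguous array of doubles per column."""
    if not rows:
        return {}
    return {name: array("d", (row[name] for row in rows)) for name in rows[0]}


def sum_where_rows(rows: List[Dict[str, float]], filter_col: str,
                   min_value: float, sum_col: str) -> float:
    """Row-at-a-time: every row object is visited, whatever columns it holds."""
    total = 0.0
    for row in rows:
        if row[filter_col] >= min_value:
            total += row[sum_col]
    return total


def sum_where_vectorized(columns: Dict[str, array], filter_col: str,
                         min_value: float, sum_col: str, vector_size: int = 4) -> float:
    """Columnar + vectorized: only two columns are read, one chunk at a time."""
    filt, vals = columns[filter_col], columns[sum_col]
    total = 0.0
    for start in range(0, len(vals), vector_size):
        f_chunk = filt[start:start + vector_size]
        v_chunk = vals[start:start + vector_size]
        # Selection vector: positions in this chunk that pass the filter.
        selected = [i for i, f in enumerate(f_chunk) if f >= min_value]
        total += sum(v_chunk[i] for i in selected)
    return total


rows = [
    {"amount": 5.0, "qty": 1.0, "store_id": 7.0},
    {"amount": 50.0, "qty": 2.0, "store_id": 7.0},
    {"amount": 500.0, "qty": 3.0, "store_id": 9.0},
    {"amount": 20.0, "qty": 4.0, "store_id": 9.0},
    {"amount": 300.0, "qty": 5.0, "store_id": 7.0},
]
cols = to_columns(rows)
print(sum_where_rows(rows, "amount", 50.0, "qty"))        # 10.0
print(sum_where_vectorized(cols, "amount", 50.0, "qty"))  # 10.0
```

Both functions compute the same answer. The columnar version never touches `store_id`, which is where a real engine saves memory bandwidth.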
